NAME
    Lingua::StanfordCoreNLP - A Perl interface to Stanford's CoreNLP tool
    set.

SYNOPSIS
     # Note that Lingua::StanfordCoreNLP can't be instantiated.
     use Lingua::StanfordCoreNLP;

     # Create a new NLP pipeline (don't silence messages, do make corefs bidirectional)
     my $pipeline = new Lingua::StanfordCoreNLP::Pipeline(0, 1);

     # Process text
     # (Will output lots of debug info from the Java classes to STDERR.)
     my $result = $pipeline->process(
        'Jane looked at the IBM computer. She turned it off.'
     );

     my @seen_corefs;

     # Print results
     for my $sentence (@{$result->toArray}) {
        print "\n[Sentence ID: ", $sentence->getIDString, "]:\n";
        print "Original sentence:\n\t", $sentence->getSentence, "\n";

        print "Tagged text:\n";
        for my $token (@{$sentence->getTokens->toArray}) {
           printf "\t%s/%s/%s [%s]\n",
                  $token->getWord,
                  $token->getPOSTag,
                  $token->getNERTag,
                  $token->getLemma;
        }

        print "Dependencies:\n";
        for my $dep (@{$sentence->getDependencies->toArray}) {
           printf "\t%s(%s-%d, %s-%d) [%s]\n",
                  $dep->getRelation,
                  $dep->getGovernor->getWord,
                  $dep->getGovernorIndex,
                  $dep->getDependent->getWord,
                  $dep->getDependentIndex,
                  $dep->getLongRelation;
        }

        print "Coreferences:\n";
        for my $coref (@{$sentence->getCoreferences->toArray}) {
           printf "\t%s [%d, %d] <=> %s [%d, %d]\n",
                  $coref->getSourceToken->getWord,
                  $coref->getSourceSentence,
                  $coref->getSourceHead,
                  $coref->getTargetToken->getWord,
                  $coref->getTargetSentence,
                  $coref->getTargetHead;

           print "\t\t(Duplicate)\n"
              if(grep { $_->equals($coref) } @seen_corefs);

           push @seen_corefs, $coref;
        }
     }

DESCRIPTION
    This module implements a "StanfordCoreNLP" pipeline for annotating text
    with part-of-speech tags, dependencies, lemmas, named-entity tags, and
    coreferences.

    (Note that the archive contains the CoreNLP annotation models, which is
    why it's so darn big.)

INSTALLATION
    The following should do the job:

     $ perl Build.PL
     $ ./Build test
     $ sudo ./Build install

PREREQUISITES
    Lingua::StanfordCoreNLP consists mainly of Java code, and thus needs
    Inline::Java installed to function.

EXPORTED CLASSES
    Lingua::StanfordCoreNLP exports the following Java classes via
    Inline::Java:

  Lingua::StanfordCoreNLP::Pipeline
    The main interface to "StanfordCoreNLP". This class is the only one you
    should need to instantiate yourself.

    new
    new($silent)
    new($silent, $bidirectionalCorefs)
        Creates a new "Lingua::StanfordCoreNLP::Pipeline" object. The
        optional boolean parameter $silent silences the output from
        annotators if true, while the optional parameter
        $bidirectionalCorefs makes coreferences bidirectional; that is to
        say, the coreference is added to both the source and the target
        sentence of all coreferences (if the source and target sentence are
        different). $silent and $bidirectionalCorefs default to false.

    getAnnotatorLog
        If the pipeline was created with a true $silent argument, returns the
        logged messages as a string. Otherwise, or if no output has been
        logged, returns an empty string.

    getPipeline
        Returns a reference to the "StanfordCoreNLP" pipeline used for
        annotation. You probably won't want to touch this.

    process($str)
        Process a string. Returns a
        "Lingua::StanfordCoreNLP::PipelineSentenceList".
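
    A minimal sketch of quiet processing, using only the "process" and
    "getAnnotatorLog" methods documented above. The helper name
    "process_quietly" is hypothetical and not part of the module, and the
    sketch assumes the pipeline it receives was created with a true
    $silent argument.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::StanfordCoreNLP): run one text
# through a pipeline and return the result together with whatever the
# annotators logged. getAnnotatorLog is documented to return an empty
# string unless the pipeline was created with a true $silent argument.
sub process_quietly {
    my ($pipeline, $text) = @_;
    my $result = $pipeline->process($text);
    my $log    = $pipeline->getAnnotatorLog;
    return ($result, $log);
}
```

    It could be called along these lines:

     my $pipeline = new Lingua::StanfordCoreNLP::Pipeline(1);
     my ($result, $log) = process_quietly($pipeline, 'She turned it off.');
     warn $log if length $log;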

  Lingua::StanfordCoreNLP::PipelineItem
    Abstract superclass of
    "Pipeline{Coreference,Dependency,Sentence,Token}". Contains an ID and
    methods for getting and comparing it.

    getID
        Returns a "java.util.UUID" object which represents the item's ID.

    getIDString
        Returns the ID as a string.

    identicalTo($b)
        Returns true if $b has an identical ID to this item.

  Lingua::StanfordCoreNLP::PipelineCoreference
    An object representing a coreference between head-word W1 in sentence S1
    and head-word W2 in sentence S2. Note that both sentences and words are
    zero-indexed, unlike the default outputs of Stanford's tools.

    getSourceSentence
        Index of sentence S1.

    getTargetSentence
        Index of sentence S2.

    getSourceHead
        Index of word W1 (in S1).

    getTargetHead
        Index of word W2 (in S2).

    getSourceToken
        The "Lingua::StanfordCoreNLP::PipelineToken" representing W1.

    getTargetToken
        The "Lingua::StanfordCoreNLP::PipelineToken" representing W2.

    equals($b)
        Returns true if this "PipelineCoreference" matches $b --- if their
        "getSourceToken" and "getTargetToken" have the same ID. Note that it
        returns true even if the orders of the coreferences are reversed (if
        "$a->getSourceToken->getID == $b->getTargetToken->getID" and
        "$a->getTargetToken->getID == $b->getSourceToken->getID").

    toCompactString
        A compact String representation of the coreference ---
        "Word/Sentence:Head <=> Word/Sentence:Head".

    toString
        A String representation of the coreference --- "Word/POS-tag
        [sentence, head] <=> Word/POS-tag [sentence, head]".
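
    Because "equals" also matches a coreference whose source and target
    are swapped, it can be used to drop duplicates, for instance those
    that appear when $bidirectionalCorefs is enabled. The following is
    only a sketch: "unique_corefs" is a hypothetical helper, not part of
    the module, and it relies solely on the documented "equals" method.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first occurrence of each coreference.
# Two coreferences count as the same when equals() says so, which per the
# documentation includes the case where source and target are reversed.
sub unique_corefs {
    my @unique;
    for my $coref (@_) {
        next if grep { $_->equals($coref) } @unique;
        push @unique, $coref;
    }
    return @unique;
}
```

    A possible use is to gather the coreferences of every sentence first
    and filter them afterwards:

     my @all    = map { @{$_->getCoreferences->toArray} } @{$result->toArray};
     my @unique = unique_corefs(@all);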

  Lingua::StanfordCoreNLP::PipelineDependency
    Represents a dependency in the Stanford Typed Dependency format. For
    example, in the fragment "Walk hard", "Walk" is the governor and "hard"
    is the dependent in the relationship "advmod" ("hard" is an adverbial
    modifier of "Walk").

    getGovernor
        The governor in the relation as a
        "Lingua::StanfordCoreNLP::PipelineToken".

    getGovernorIndex
        The index of the governor within the sentence.

    getDependent
        The dependent in the relation as a
        "Lingua::StanfordCoreNLP::PipelineToken".

    getDependentIndex
        The index of the dependent within the sentence.

    getRelation
        Short name of the relation.

    getLongRelation
        Long description of the relation.

    toCompactString
    toCompactString($includeIndices)
    toString
    toString($includeIndices)
        Returns a String representation of the dependency ---
        "relation(governor-N, dependent-N) [description]". "toCompactString"
        does not include description. The optional parameter $includeIndices
        controls whether governor and dependent indices are included, and
        defaults to true. (Note that unlike those of, e.g., the Stanford
        Parser, these indices start at zero, not one.)
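
    When output has to line up with tools that count words from one, the
    zero-based indices can be shifted by hand. This is a rough sketch
    built only from the accessors documented above; "dep_to_one_based" is
    a hypothetical name, and since this README does not say how a root
    relation is represented, the sketch does not special-case it.

```perl
use strict;
use warnings;

# Hypothetical helper: format a dependency as
# "relation(governor-N, dependent-N)" with one-based indices, by adding 1
# to the zero-based indices the module reports.
sub dep_to_one_based {
    my ($dep) = @_;
    return sprintf '%s(%s-%d, %s-%d)',
        $dep->getRelation,
        $dep->getGovernor->getWord,
        $dep->getGovernorIndex() + 1,
        $dep->getDependent->getWord,
        $dep->getDependentIndex() + 1;
}
```

    For the "Walk hard" fragment, with "Walk" at index 0 and "hard" at
    index 1, this would give "advmod(Walk-1, hard-2)".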

  Lingua::StanfordCoreNLP::PipelineSentence
    An annotated sentence, containing the sentence itself, its dependencies,
    POS- and NER-tagged tokens, and coreferences.

    getSentence
        Returns a string containing the original sentence.

    getTokens
        A "Lingua::StanfordCoreNLP::PipelineTokenList" containing the POS-
        and NER-tagged and lemmatized tokens of the sentence.

    getDependencies
        A "Lingua::StanfordCoreNLP::PipelineDependencyList" containing the
        dependencies found in the sentence.

    getCoreferences
        A "Lingua::StanfordCoreNLP::PipelineCoreferenceList" of the
        coreferences between this and other sentences.

    toCompactString
    toString
        A String representation of the sentence, its coreferences,
        dependencies, and tokens. "toCompactString" separates fields by
        "\n", whereas "toString" separates them by "\n\n".

  Lingua::StanfordCoreNLP::PipelineToken
    A token, with POS- and NER-tag and lemma.

    getWord
        The textual representation of the token (i.e. the word).

    getPOSTag
        The token's Part-of-Speech tag.

    getNERTag
        The token's Named-Entity tag.

    getLemma
        The lemma of the token.

    toCompactString
    toCompactString($lemmaize)
        A compact String representation of the token --- "word/POS-tag". If
        the optional argument $lemmaize is true, returns "lemma/POS-tag".

    toString
        A String representation of the token --- "word/POS-tag/NER-tag
        [lemma]".
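
    The token accessors make it easy to pick out the words that carry a
    named-entity tag. The sketch below assumes that tokens outside any
    entity are tagged "O", which is the Stanford NER convention but is
    not stated in this README; "entity_words" is a hypothetical helper,
    not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words of a sentence whose NER tag is
# defined and is not "O". Treating "O" as "no entity" is an assumption
# taken from the Stanford NER convention, not from this module's docs.
sub entity_words {
    my ($sentence) = @_;
    my @words;
    for my $token (@{ $sentence->getTokens->toArray }) {
        my $tag = $token->getNERTag;
        next unless defined $tag && $tag ne 'O';
        push @words, $token->getWord;
    }
    return @words;
}
```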

  Lingua::StanfordCoreNLP::PipelineList
  Lingua::StanfordCoreNLP::PipelineCoreferenceList
  Lingua::StanfordCoreNLP::PipelineDependencyList
  Lingua::StanfordCoreNLP::PipelineSentenceList
  Lingua::StanfordCoreNLP::PipelineTokenList
    "Lingua::StanfordCoreNLP::PipelineList" is a generic list class which
    extends "java.util.ArrayList". It is in turn extended by
    "Pipeline{Coreference,Dependency,Sentence,Token}List" (which are the
    list-types that "Pipeline" returns). Note that all lists are
    zero-indexed.

    joinList($sep)
    joinListCompact($sep)
        Returns a string containing the output of either the "toString" or
        "toCompactString" methods of the elements in "PipelineList",
        separated by $sep.

    toArray
        Return the elements of the list as an array-reference.

    toHashMap
        Return the list as a "java.util.HashMap<String,PipelineItem>", with
        items' stringified IDs as keys.

    toCompactString
    toString
        Returns the elements of the "PipelineList" as a string containing
        the output of either their "toCompactString" or "toString" methods,
        separated by the default separator (which is "\n" for all lists
        except "PipelineTokenList" which uses " ").
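
    "toHashMap" returns a Java map. If a native Perl hash is more
    convenient, one can be built from "toArray" and "getIDString". This
    is only a sketch under that assumption; "index_by_id" is a
    hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: build a Perl hash reference that maps each item's
# stringified ID (getIDString) to the item itself, for any list object
# that provides toArray.
sub index_by_id {
    my ($list) = @_;
    my %by_id;
    $by_id{ $_->getIDString } = $_ for @{ $list->toArray };
    return \%by_id;
}
```

    Given a processed $result, "index_by_id($result)" would map sentence
    IDs to sentences, and the same call on "$sentence->getTokens" would
    map token IDs to tokens.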

TODO
    *   Custom annotator-combinations, so you won't have to load up six
        different annotator models just to POS-tag some text.

REQUESTS & BUGS
    Mail any bug-reports or feature-requests to
    <StanfordCoreNLP@fivebyfive.be>.

AUTHORS
    Kalle Räisänen <kal@cpan.org>.

COPYRIGHT
  Lingua::StanfordCoreNLP (Perl bindings)
    Copyright © 2011 Kalle Räisänen.

    This program is free software: you can redistribute it and/or modify it
    under the terms of the GNU Affero General Public License as published by
    the Free Software Foundation, either version 3 of the License, or (at
    your option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero
    General Public License for more details.

    You should have received a copy of the GNU Affero General Public License
    along with this program. If not, see <http://www.gnu.org/licenses/>.

  Stanford CoreNLP tool set
    Copyright © 2010-2011 The Board of Trustees of The Leland Stanford
    Junior University.

    This program is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation; either version 2 of the License, or (at your
    option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
    Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program; if not, see <http://www.gnu.org/licenses/>.

SEE ALSO
    <http://nlp.stanford.edu/software/corenlp.shtml>,
    Text::NLP::Stanford::EntityExtract, NLP::StanfordParser, Inline::Java.