NAME

Lingua::YaTeA::WordFromCorpus - Perl extension for managing a word of the corpus and its related information

SYNOPSIS

  use Lingua::YaTeA::WordFromCorpus;
  Lingua::YaTeA::WordFromCorpus->new($form,$lexicon,$sentences);

DESCRIPTION

The module manages an occurrence of a word in the corpus, given its inflected form $form. Each occurrence is associated with an identifier (field ID), the entry for the word in the lexicon $lexicon (field LEX_ITEM), the sentence, taken from the sentence set $sentences, in which the word occurs (field SENTENCE), and the offset of the word in that sentence (field START_CHAR).
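To make the field layout concrete, here is a minimal sketch of a record with the four fields named above. It is not the module's actual implementation: the class name WordOccurrenceSketch, the sequential identifier counter, and the plain hashes standing in for the lexicon item and the sentence are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the real Lingua::YaTeA::WordFromCorpus.
package WordOccurrenceSketch;

my $next_id = 0;    # assumption: identifiers are handed out sequentially

sub new {
    my ($class, $lex_item, $sentence, $start_char) = @_;
    my $this = {
        ID         => $next_id++,     # identifier of the occurrence
        LEX_ITEM   => $lex_item,      # lexicon entry for the word
        SENTENCE   => $sentence,      # sentence in which the word occurs
        START_CHAR => $start_char,    # offset of the word in the sentence
    };
    return bless $this, $class;
}

# Read-only accessors mirroring the names documented below.
sub getID        { my ($this) = @_; return $this->{ID}; }
sub getLexItem   { my ($this) = @_; return $this->{LEX_ITEM}; }
sub getSentence  { my ($this) = @_; return $this->{SENTENCE}; }
sub getStartChar { my ($this) = @_; return $this->{START_CHAR}; }

package main;

# Plain hashes stand in for the lexicon item and sentence objects (assumed shapes).
my $lex_item = { IF => 'cells', POS => 'NNS', LF => 'cell' };
my $sentence = { ID => 0, TEXT => 'T cells divide.' };

my $word = WordOccurrenceSketch->new($lex_item, $sentence, 2);
printf "%d %s %d\n", $word->getID, $word->getLexItem->{IF}, $word->getStartChar;
```

In the real module the lexicon item and the sentence are objects supplied by the lexicon and the sentence set, so the hash lookups above would be method calls on those objects.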

METHODS

new()

    new($form,$lexicon,$sentences);

The method creates the object corresponding to the word $form. $lexicon and $sentences are used to set the fields LEX_ITEM and SENTENCE respectively.

setLexItem()

    setLexItem($form, $lexicon);

The method sets the field LEX_ITEM of the word $form to the corresponding item in the lexicon $lexicon.

getID()

    getID();

The method returns the identifier of the current word.

getSentence()

    getSentence();

The method returns the sentence in which the current word occurs.

getDocument()

    getDocument();

The method returns the document in which the current word occurs.

getSentenceID()

    getSentenceID();

The method returns the identifier of the sentence in which the current word occurs.

getDocumentID()

    getDocumentID();

The method returns the identifier of the document in which the current word occurs.

getStartChar()

    getStartChar();

The method returns the offset (field START_CHAR) of the word in the sentence.

getLexItem()

    getLexItem();

The method returns the lexicon item (field LEX_ITEM) corresponding to the current word.

isSentenceBoundary()

    isSentenceBoundary($sentence_boundary);

The method indicates whether the word is a sentence boundary ($sentence_boundary is a string).

isDocumentBoundary()

    isDocumentBoundary($document_boundary);

The method indicates whether the word is a document boundary ($document_boundary is a string).
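As an illustration of the two boundary tests, the sketch below compares a word with a boundary string. It is only a sketch under stated assumptions: the class BoundarySketch, its FORM field, and the marker values 'SENT' and 'DOC_END' are hypothetical, and the comparison is done on the word form, whereas the real module may compare another attribute such as the POS tag.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the real Lingua::YaTeA::WordFromCorpus.
package BoundarySketch;

sub new {
    my ($class, $form) = @_;
    return bless { FORM => $form }, $class;
}

# Assumption: a word is a boundary when its form equals the boundary string.
sub isSentenceBoundary {
    my ($this, $sentence_boundary) = @_;
    return $this->{FORM} eq $sentence_boundary ? 1 : 0;
}

sub isDocumentBoundary {
    my ($this, $document_boundary) = @_;
    return $this->{FORM} eq $document_boundary ? 1 : 0;
}

package main;

# 'SENT' and 'DOC_END' are illustrative marker values, not YaTeA defaults.
my @words = map { BoundarySketch->new($_) } qw(Cells divide SENT DOC_END);

# Collect the positions of the words recognised as boundaries.
my @sentence_ends = grep { $words[$_]->isSentenceBoundary('SENT') } 0 .. $#words;
my @document_ends = grep { $words[$_]->isDocumentBoundary('DOC_END') } 0 .. $#words;

print "sentence boundaries at: @sentence_ends\n";
print "document boundaries at: @document_ends\n";
```

Passing the boundary string as an argument, as the documented signatures do, keeps the marker configurable rather than hard-coded in the word class.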

updateSentence()

    updateSentence($sentences);

The method updates the field SENTENCE according to the sentence set ($sentences).

updateStartChar()

    updateStartChar();

The method updates the field START_CHAR according to the value of the current offset in the sentence.

isChunkingFrontier()

    isChunkingFrontier($chunking_data);

The method indicates if the current word is a chunking frontier according to the defined chunking data ($chunking_data).

isChunkingException()

    isChunkingException($chunking_data);

The method indicates if the current word is a chunking exception according to the defined chunking data ($chunking_data).

isCleaningFrontier()

    isCleaningFrontier($chunking_data);

The method indicates if the current word is a cleaning frontier according to the defined chunking data ($chunking_data).

isCleaningException()

    isCleaningException($chunking_data);

The method indicates if the current word is a cleaning exception according to the defined chunking data ($chunking_data).

isCompulsory()

    isCompulsory($compulsory);

The method indicates whether the Part-Of-Speech (POS) tag of the current word is one of the required POS tags that must appear in a term.
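The check described above can be sketched as a test of one POS tag against a set of required tags. The helper is_compulsory_sketch is hypothetical and not part of Lingua::YaTeA. The form of $compulsory is an assumption too (here a string of alternatives used as a regular expression), and the Penn Treebank tags are only sample data.

```perl
use strict;
use warnings;

# Hypothetical helper; NOT part of Lingua::YaTeA.
# Assumption: $compulsory holds the required tags as regex alternatives.
sub is_compulsory_sketch {
    my ($pos, $compulsory) = @_;
    # Anchor the pattern so that 'NN' does not match inside a longer tag.
    return $pos =~ /^(?:$compulsory)$/ ? 1 : 0;
}

my $compulsory = 'NN|NNS|NNP';    # illustrative noun tags

# Each entry is [ form, POS tag ]; sample data for the sketch only.
my @tagged = ( [ 'the', 'DT' ], [ 'cell', 'NN' ], [ 'divides', 'VBZ' ] );

my @flags = map { is_compulsory_sketch($_->[1], $compulsory) } @tagged;
print join(' ', @flags), "\n";
```

A candidate term would then be kept only if at least one of its words passes this test.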

getPOS()

    getPOS();

The method returns the Part-Of-Speech tag of the current word.

isEndTrigger()

    isEndTrigger($end_trigger_set);

The method indicates whether the word is at the end of a trigger (see Lingua::YaTeA::TriggerSet and Lingua::YaTeA::Trigger).

isStartTrigger()

    isStartTrigger($start_trigger_set);

The method indicates whether the word is at the start of a trigger (see Lingua::YaTeA::TriggerSet and Lingua::YaTeA::Trigger).

getIF()

    getIF();

The method returns the inflected form of the current word.

getLF()

    getLF();

The method returns the lemmatised form of the current word.

SEE ALSO

Sophie Aubin and Thierry Hamon. Improving Term Extraction with Terminological Resources. In Advances in Natural Language Processing (5th International Conference on NLP, FinTAL 2006). pages 380-387. Tapio Salakoski, Filip Ginter, Sampo Pyysalo, Tapio Pahikkala (Eds). August 2006. LNAI 4139.

AUTHOR

Thierry Hamon <thierry.hamon@univ-paris13.fr> and Sophie Aubin <sophie.aubin@lipn.univ-paris13.fr>

COPYRIGHT AND LICENSE

Copyright (C) 2005 by Thierry Hamon and Sophie Aubin

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.