Lingua::YaTeA::TestifiedTerm - Perl extension for Testified Term
use Lingua::YaTeA::TestifiedTerm;
Lingua::YaTeA::TestifiedTerm->new($num_content_words,$words_a,$tag_set,$source,$match_type);
The module implements a representation of testified terms, i.e. terms from a terminological resource. These testified terms are used to find corresponding terms in the corpus. Each testified term is described by its identifier (ID), its inflected form (IF), its list of part-of-speech tags (POS), its lemma (LF), the terminological source (SOURCE), the list of word components (WORDS), the regular expression used to identify it in the corpus (REG_EXP), a flag indicating whether the testified term has been found (FOUND), its list of occurrences (OCCURRENCES) and the list of word index entries (INDEX_SET).
The three fields IF, POS and LF are computed from the information carried by the term's word components.
new($num_content_words,$words_a,$tag_set,$source,$match_type);
This method creates a new object representing a testified term. It sets the fields IF, POS, LF, REG_EXP, INDEX_SET and SOURCE. $words_a and $tag_set are used to initialise the linguistic information (IF, POS, LF). $source initialises the SOURCE field. $match_type defines the type of matching used to find the terms in the corpus.
isInLexicon($filtering_lexicon_h, $match_type);
This method checks whether all the words of a testified term appear in the lexicon of the text ($filtering_lexicon_h) according to the matching type $match_type: loose (each word matches either an inflected form or a lemmatised form), strict (each word matches an inflected form with the correct part-of-speech tag), or default (each word matches an inflected form). The method returns 1 if all the words of the testified term are found in the lexicon; otherwise it returns 0.
$filtering_lexicon_h is a hash table containing, for each word in the text, the inflected form, the lemmatised form, and the concatenation of the inflected form and the part-of-speech tag (separated by a ~ character).
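The three matching types can be illustrated with a small standalone sketch. It does not use the module itself: the word records (plain hashes with IF, POS and LF keys), the sample lexicon and the function name is_in_lexicon are all hypothetical, and the real implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the filtering lexicon: a hash whose keys are
# inflected forms, lemmas, and "inflected form~POS tag" strings.
my %filtering_lexicon = map { $_ => 1 } (
    'cells', 'cell', 'cells~NNS',
    'blood', 'blood~NN',
);

# Hypothetical word records; the real module uses its own word objects.
my @words = (
    { IF => 'blood', POS => 'NN',  LF => 'blood' },
    { IF => 'cells', POS => 'NNS', LF => 'cell'  },
);

# Returns 1 if every word is found in the lexicon under the given
# matching type, 0 otherwise.
sub is_in_lexicon {
    my ($words_a, $lexicon_h, $match_type) = @_;
    for my $word (@$words_a) {
        my $found;
        if ($match_type eq 'loose') {
            # inflected form or lemma
            $found = exists $lexicon_h->{ $word->{IF} }
                  || exists $lexicon_h->{ $word->{LF} };
        }
        elsif ($match_type eq 'strict') {
            # inflected form together with its POS tag
            $found = exists $lexicon_h->{ $word->{IF} . '~' . $word->{POS} };
        }
        else {
            # default: inflected form only
            $found = exists $lexicon_h->{ $word->{IF} };
        }
        return 0 unless $found;
    }
    return 1;
}

print is_in_lexicon(\@words, \%filtering_lexicon, 'strict'), "\n";
```

In this sketch a word whose inflected form is in the lexicon but whose tag differs passes the default check and fails the strict one.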
buildLinguisticInfos($words, $tagset);
The method returns the inflected form, the part-of-speech tag list and the lemma of the term candidate as an array (each piece of information is the concatenation of the word information found in the array $words and the part-of-speech tags $tagset).
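As a rough illustration of this concatenation, the following standalone sketch derives term-level information from per-word records. The record layout, the space separator and the function name build_linguistic_infos are assumptions made for the example; the module's own word objects and tag set handling are not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical word records standing in for the entries of $words.
my @words = (
    { IF => 'red',   POS => 'JJ',  LF => 'red'   },
    { IF => 'blood', POS => 'NN',  LF => 'blood' },
    { IF => 'cells', POS => 'NNS', LF => 'cell'  },
);

# Concatenate per-word information into term-level IF, POS and LF.
# Joining with a single space is an assumption made for this sketch.
sub build_linguistic_infos {
    my ($words_a) = @_;
    my $inflected = join ' ', map { $_->{IF}  } @$words_a;
    my $pos_list  = join ' ', map { $_->{POS} } @$words_a;
    my $lemma     = join ' ', map { $_->{LF}  } @$words_a;
    return ($inflected, $pos_list, $lemma);
}

my ($inflected, $pos_list, $lemma) = build_linguistic_infos(\@words);
print "$inflected | $pos_list | $lemma\n";
# prints: red blood cells | JJ NN NNS | red blood cell
```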
getWords();
The method returns the list of the words that are components of the term candidate.
setIF();
The method sets the inflected form of the term candidate.
setPOS();
The method sets the list of the part-of-speech tags of the term candidate.
setLF();
The method sets the canonical form (lemma) of the term candidate.
getIF();
The method returns the inflected form of the term candidate.
getPOS();
The method returns the list of the part-of-speech tags of the term candidate.
getLF();
The method returns the canonical form (lemma) of the term candidate.
getID();
This method returns the identifier of the term candidate.
buildKey();
This method builds the key of the testified term, i.e. the concatenation of the inflected form, the postag list and the lemma (separated by the character '~').
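A key of this shape can be sketched in a few lines. The function name build_key and the sample values are hypothetical; only the '~' separator and the order of the three parts come from the description above.

```perl
use strict;
use warnings;

# Build a key by joining inflected form, POS list and lemma with '~'.
sub build_key {
    my ($inflected, $pos_list, $lemma) = @_;
    return join '~', $inflected, $pos_list, $lemma;
}

print build_key('blood cells', 'NN NNS', 'blood cell'), "\n";
# prints: blood cells~NN NNS~blood cell
```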
getSource();
The method returns the terminological resource from which the testified term originates.
buildRegularExpression($match_type);
The method computes the regular expression corresponding to the term according to the type of matching defined by $match_type. This regular expression will be used to find the term in the corpus.
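The pattern the module actually builds is not described here, so the following is only a minimal sketch of the general idea: turning a sequence of word forms into a regular expression that finds them in running text. The helper name build_regular_expression is hypothetical, and the sketch ignores $match_type entirely.

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern that matches the given word forms
# in sequence, separated by whitespace, on word boundaries.
sub build_regular_expression {
    my (@forms) = @_;
    # Escape each form so that regex metacharacters are matched literally.
    my $body = join '\s+', map { quotemeta($_) } @forms;
    return qr/\b$body\b/;
}

my $reg_exp = build_regular_expression('blood', 'cells');
print "found\n" if 'Red blood  cells were counted.' =~ $reg_exp;
```

Escaping with quotemeta keeps forms that contain characters such as '.' or '+' from being interpreted as regex syntax.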
getRegExp();
The method returns the regular expression corresponding to the testified term (field REG_EXP).
getWord($index);
The method returns the word at position $index in the list of the components of the term candidate.
addOccurrence($phrase_occurrence,$phrase,$key,$fh);
This method looks for the current testified term in the occurrence $phrase_occurrence of the phrase $phrase (according to the key $key). The occurrence is then recorded in the list of occurrences OCCURRENCES. $fh is the file handle of a debugging file.
getPositionInPhrase($phrase,$index_a,$fh);
The method returns the position (start and end offsets) of the phrase $phrase according to the index array $index_a. $fh is the file handle of a debugging file.
setIndexSet($size);
This method initialises the index set with the numbers between 0 and $size (usually the number of words).
getIndexSet();
This method returns the index set (field INDEX_SET) of the word components.
getOccurrences();
This method returns the list of the occurrences of the term candidate, as an array reference.
Sophie Aubin and Thierry Hamon. Improving Term Extraction with Terminological Resources. In Advances in Natural Language Processing (5th International Conference on NLP, FinTAL 2006). pages 380-387. Tapio Salakoski, Filip Ginter, Sampo Pyysalo, Tapio Pahikkala (Eds). August 2006. LNAI 4139.
Thierry Hamon <thierry.hamon@univ-paris13.fr> and Sophie Aubin <sophie.aubin@lipn.univ-paris13.fr>
Copyright (C) 2005 by Thierry Hamon and Sophie Aubin
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.