Lingua::PT::PLN - Perl extension for NLP of the Portuguese Language
  use Lingua::PT::PLN;

  # occurrence counter
  %o = oco("file");
  oco({num=>1,output=>"outfile"},"file");

  $p = accent($phrase);       ## mark word accent of all words
  $w = syllable($word);       # get syllables
  $w = wordaccent($word);     # get word accent with : after vowel (default)
  $w = wordaccent($word, 1);  # get word accent with " before syllable (standard)

  initPhon;                   # initializes Phonetic dictionary (adds manual corrections)
  $w = toPhon($word);         # get phonetic transcription
This is a module for Natural Language Processing of the Portuguese language.
Because the module processes Portuguese text, you must set an appropriate locale before using it.
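One way to select a locale from Perl is sketched below, using only the core POSIX module. The locale name "pt_PT.ISO-8859-1" is an assumption chosen to match the latin1 default mentioned in this document; the names actually installed vary by system (see `locale -a`), and the module may have its own expectations about how the locale is set.

```perl
use strict;
use warnings;
use locale;                          # make case mapping and \w follow the locale
use POSIX qw(setlocale LC_CTYPE);

# Assumption: this locale name exists on the system; adjust as needed.
my $wanted = "pt_PT.ISO-8859-1";

# setlocale returns the name of the locale now in effect, or undef on failure.
my $active = setlocale(LC_CTYPE, $wanted);

if (defined $active) {
    print "LC_CTYPE set to $active\n";
} else {
    warn "Locale $wanted is not available; accented letters may be mishandled\n";
}
```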
oco
Counts word occurrences in a string or a set of files. Returns a hash with the counts or creates a sorted file with the results.
This function optionally takes a hash reference of options as its first argument, where you can specify:
num => 1 means the output should be sorted by occurrence count;
alpha => 1 means the output should be sorted lexicographically;
output => "f" means the output will be written to the file "f";
from => "string" means that the next argument (after the option hash) is a string which should be used as input for the function.
If no input source is specified, the remaining arguments to the function are filenames which should be used as input. This is the default.
To force UTF8 encoding (default latin1)
XML tags are stripped.
All words are lower-cased.
To obtain logarithmic output, use the log option. Output values are between 0 and log(1000000), that is, approximately 0..13.82.
log => 20 -- to obtain values between 0 and 20
Examples:
  oco({num=>1,output=>"f"}, "f1","f2")
    # sort by occurrence
    # store output on file "f"
    # process files "f1" and "f2"

  oco({alpha=>1,output=>"f"}, "f1","f2")
    # sort lexicographically
    # store output on file "f"
    # process files "f1" and "f2"

  %oc = oco("f1","f2")
    # return a hash with the occurrences
    # use "f1" and "f2" as input files

  %oc = oco( {from=>"string"},"text in a string")
    # use a string as input
    # return a hash with the occurrences
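To make the two sort orders concrete, here is a minimal pure-Perl sketch of the same idea: count lower-cased words in a string with XML tags stripped, then list the result by count and alphabetically. The helper count_words is hypothetical and is not how Lingua::PT::PLN is implemented; it only mimics the behaviour described above so the shape of the returned hash is easy to see.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::PT::PLN.
sub count_words {
    my ($text) = @_;
    $text =~ s/<[^>]*>/ /g;                      # strip XML tags
    my %count;
    $count{ lc $1 }++ while $text =~ /(\w+)/g;   # lower-case and count each word
    return %count;
}

my %oc = count_words("<p>O gato viu o outro gato</p>");

# Sorted by occurrence count (highest first), ties broken alphabetically.
my @by_count = sort { $oc{$b} <=> $oc{$a} or $a cmp $b } keys %oc;
print "$oc{$_}\t$_\n" for @by_count;

# Sorted lexicographically.
my @alpha = sort keys %oc;
print "$_\t$oc{$_}\n" for @alpha;
```

A hash returned by oco can be sorted with the same two sort expressions when the built-in output options are not used.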
syllable
my $sylls = syllable( $word )
Returns the word with the syllables separated by "|"
accent
my $accent = accent( $phrase )
Returns the phrase with the syllables separated by "|" and accents marked with the character ":".
wordaccent
Returns the word split into syllables and with the accent character marked.
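The marked-up strings are easy to post-process. The sketch below splits one on "|" and finds the syllable carrying the ":" marker. The sample string "ca:|sa" is a hypothetical value written in the format described here ("|" between syllables, ":" after the stressed vowel), not captured output from the module.

```perl
use strict;
use warnings;

# Hypothetical sample in the documented format; in real use this would
# come from wordaccent($word).
my $marked = "ca:|sa";

# One list element per syllable.
my @syllables = split /\|/, $marked;

# Index of the first syllable that carries the ":" accent marker, if any.
my ($stressed) = grep { $syllables[$_] =~ /:/ } 0 .. $#syllables;

# Drop the marker to recover the plain syllable-separated form.
(my $plain = $marked) =~ tr/://d;

if (defined $stressed) {
    printf "%s: %d syllables, stress on syllable %d\n",
        $plain, scalar @syllables, $stressed + 1;
}
```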
Projecto Natura (http://natura.di.uminho.pt)
Alberto Simoes (albie@alfarrabio.di.uminho.pt)
José João Almeida (jj@di.uminho.pt)
Paulo Rocha (paulo.rocha@di.uminho.pt)
Lingua::PT::PLNbase(3pm), perl(1), cqp(1)