NAME

Lingua::PT::PLN - Perl extension for NLP of the Portuguese Language

SYNOPSIS

  use Lingua::PT::PLN;

  # occurrence counter
  %o = oco("file");
  oco({num=>1,output=>"outfile"},"file");

  $p = accent($phrase);        ## mark word accent of all words

  $w = syllable($word);         # get syllables
  $w = wordaccent($word);       # get word accent with : after vowel (default)
  $w = wordaccent($word, 1);    # get word accent with " before syllable (standard)

  initPhon;                     # initializes Phonetic dictionary (adds manual corrections)
  $w = toPhon($word);           # get phonetic transcription

DESCRIPTION

This is a module for Natural Language Processing of the Portuguese language.

Because you are processing Portuguese text, you must set an appropriate locale before calling these functions.
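
A minimal sketch of setting a locale with the core POSIX module follows. The locale names tried here are assumptions: the names installed vary between systems, and the module documentation does not say which locale category it relies on, so LC_CTYPE (character classification and case mapping) is used as a reasonable guess.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use locale;    # make \w, lc, uc and friends honour the current locale

# Candidate locale names (assumption: at least one of them is installed).
my @candidates = ('pt_PT.ISO-8859-1', 'pt_PT', 'pt_PT.UTF-8', 'pt_BR.UTF-8');

# setlocale returns the name of the locale on success and undef on failure.
my $chosen;
for my $name (@candidates) {
    my $got = setlocale(LC_CTYPE, $name);
    if (defined $got) {
        $chosen = $got;
        last;
    }
}

warn "No Portuguese locale found; accented letters may not be treated as word characters\n"
    unless defined $chosen;
```

On most Unix-like systems, `locale -a` lists the locale names that are actually available.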

Occurrence counting: oco

Counts word occurrences in a string or in a set of files. Returns a hash with the counts, or writes a sorted file with the results.

This function optionally takes a hash reference of options as its first argument, where you can specify:

num => 1

means the output should be sorted by number of occurrences;

alpha => 1

means the output should be sorted lexicographically;

output => "f"

means the output will be written to the file "f";

from => "string"

means that the next argument (after the option hash) is a string which should be used as input for the function;

from => "file"

means that the remaining arguments to the function are filenames which should be used as input for the function. This is the default option.

encoding => "utf8"

forces UTF-8 encoding (the default is latin1);

ignorexml => 1

XML tags are stripped;

ignorecase => 1

all words are lower-cased;

log => 1

to obtain logarithmic output. Output values are between 0 and log(1000000), that is, 0..13.82.

  log => 20    -- to obtain values between 0 and 20

Examples:

  oco({num=>1,output=>"f"}, "f1","f2");
  # sort by number of occurrences
  # store output in file "f"
  # process files "f1" and "f2"

  oco({alpha=>1,output=>"f"}, "f1","f2");
  # sort lexicographically
  # store output in file "f"
  # process files "f1" and "f2"

  %oc = oco("f1","f2");
  # return a hash with the occurrences
  # use "f1" and "f2" as input files

  %oc = oco( {from=>"string"},"text in a string");
  # use a string as input
  # return a hash with the occurrences
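
When the result is kept as a hash instead of being written to a file, the orderings selected by the num and alpha options can be reproduced by hand. The sketch below assumes the returned hash maps each word to its count, which the text above suggests but does not state outright; %oc is a hand-written stand-in with made-up words and counts, so the module itself is not needed to run it.

```perl
use strict;
use warnings;

# Stand-in for the hash returned by oco(); words and counts are made up.
my %oc = (casa => 3, de => 7, porta => 3, janela => 1);

# Most frequent first; ties are broken alphabetically.
my @by_count = sort { $oc{$b} <=> $oc{$a} or $a cmp $b } keys %oc;

# Plain lexicographic order.
my @by_alpha = sort keys %oc;

# Print one "word count" pair per line, most frequent first.
printf "%-10s %d\n", $_, $oc{$_} for @by_count;
```

Under `use locale`, the `cmp` operator and the default `sort` follow the collation rules of the current locale, which matters for accented words.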

syllable

  my $sylls = syllable( $word )

Returns the word with its syllables separated by "|".
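
Because the result is a single string, a list of syllables can be recovered with split. In this sketch $sylls is a hand-written sample in the format described above, not actual output of syllable.

```perl
use strict;
use warnings;

# Hypothetical value in the "|"-separated format described above.
my $sylls = "ja|ne|la";

# "|" is a regex metacharacter, so it must be escaped in the pattern.
my @parts = split /\|/, $sylls;

printf "%d syllables: %s\n", scalar(@parts), join(", ", @parts);
```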

accent

  my $accent = accent( $phrase )

Returns the phrase with the syllables separated by "|" and accents marked with the character ":".

wordaccent

Returns the word split into syllables, with the accent marked by the accent character.

compacta

compara

initPhon

toPhon

carregaDicionario

chargeNoAccented

gfdict

toPhon2

AUTHOR

Projecto Natura (http://natura.di.uminho.pt)

Alberto Simoes (albie@alfarrabio.di.uminho.pt)

José João Almeida (jj@di.uminho.pt)

Paulo Rocha (paulo.rocha@di.uminho.pt)

SEE ALSO

Lingua::PT::PLNbase(3pm), perl(1), cqp(1)