Lingua::NATools::Dict, module version 0.03 (distribution Lingua-NATools-v0.7.5)

NAME

Lingua::NATools::Dict - Perl extension to encapsulate Dict interface

SYNOPSIS

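  use Lingua::NATools::Dict;

  my $dic = Lingua::NATools::Dict::open("file.bin");

  $dic->size();                   # total number of word occurrences

  $dic->exists($id);              # is $id in the dictionary?
  $dic->occ($id);                 # occurrences of word $id
  $dic->vals($id);                # probable translations of word $id

  $dic->for_each( sub{ ... } );   # visit every word

  $dic->add($dic2);               # add the contents of another dictionary

  $dic->save($filename);
  $dic->close;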

DESCRIPTION

The Dict files (with extension .bin) created by NATools are mappings from identifiers of words in one corpus to identifiers of words in another corpus. Thus, all operations performed by this module use identifiers instead of words.

You can open the dictionary using

  $dic = Lingua::NATools::Dict::open("dic.bin");

Then, all operations are available as methods, in an OO fashion. After using the dictionary, do not forget to close it using

  $dic->close();

The add method receives a dictionary object and adds it to the current contents. Notice that both dictionaries need to be congruent with respect to word identifiers. After adding, do not forget to save the result, if you wish, with

  $dic->save("new.dic.bin");
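The open, add, save and close steps described above can be combined as in the following sketch. The helper name merge_dictionaries and its three file-name parameters are hypothetical and not part of the module; only the calls documented here are used, and no assumption is made about their return values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::NATools::Dict): merge two
# dictionary files into a third one using only the documented methods.
sub merge_dictionaries {
    my ($base_file, $extra_file, $out_file) = @_;

    # Loaded lazily, so the helper can be defined before the module is needed.
    require Lingua::NATools::Dict;

    my $dic   = Lingua::NATools::Dict::open($base_file);
    my $extra = Lingua::NATools::Dict::open($extra_file);

    # Both dictionaries must be congruent with respect to word identifiers.
    $dic->add($extra);
    $dic->save($out_file);

    # Close every dictionary that was opened.
    $extra->close;
    $dic->close;

    return $out_file;
}
```

A call would look like merge_dictionaries("a.bin", "b.bin", "new.dic.bin"), where the file names are placeholders.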

The size method returns the total number of words in the corpus (the sum of all word occurrences). To get the number of occurrences of a specific word, use the occ method, passing the word identifier as its parameter.

To check whether an identifier exists in the dictionary, use the exists method, which returns a boolean value.
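Since size is the sum of all word occurrences, dividing occ by size gives the relative frequency of a word. The sketch below illustrates this; relative_frequency is a hypothetical helper, not a method of the module, and it accepts any object that offers the exists, occ and size methods described above.

```perl
use strict;
use warnings;

# Hypothetical helper: share of the corpus taken by one word identifier.
# $dic is any object offering the documented exists, occ and size methods.
sub relative_frequency {
    my ($dic, $id) = @_;

    return 0 unless $dic->exists($id);   # unknown identifier

    my $total = $dic->size;              # sum of all word occurrences
    return 0 unless $total;              # guard against an empty dictionary

    return $dic->occ($id) / $total;
}
```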

The vals method returns the probable translations for the supplied identifier as an array reference, not as a hash reference. Conceptually it is a hash: the keys are the identifiers of the possible translations, and the values are their probabilities of being a translation.
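If a real hash is more convenient, the result can be converted. The following sketch assumes that the array reference is a flat list of (translation identifier, probability) pairs; the text above does not spell out this layout, so check it against your NATools version. The helper names and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# ASSUMPTION: $vals is a flat list of (translation id, probability)
# pairs inside an array reference. This layout is not confirmed here.
sub vals_to_hash {
    my ($vals) = @_;
    die "odd number of elements\n" if @$vals % 2;
    my %prob = @$vals;
    return \%prob;
}

# Translation identifiers sorted from most to least probable.
sub ranked_translations {
    my ($vals) = @_;
    my $prob = vals_to_hash($vals);
    my @ids  = sort { $prob->{$b} <=> $prob->{$a} || $a <=> $b } keys %$prob;
    return @ids;
}

# Made-up sample data standing in for the result of $dic->vals($id).
my $sample = [ 12 => 0.25, 7 => 0.60, 31 => 0.15 ];
print join(', ', ranked_translations($sample)), "\n";   # prints "7, 12, 31"
```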

Finally, the for_each method lets you iterate over all the words in the dictionary. It receives a function reference as its argument.

  $dic->for_each( sub{ ... } );

Each time the function is called, the following is passed as @_:

  word => $id , occ => $occ , vals => $vals

where $id is the word identifier, $occ the result of calling occ with that word, and $vals is the result of calling vals with that word.
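Because the arguments arrive as a flat key/value list, a callback can copy @_ straight into a hash. The sketch below collects occurrence counts this way. To stay runnable without a dictionary file, the callback is invoked by hand with made-up values; the variable names and sample numbers are illustrative only.

```perl
use strict;
use warnings;

my %occurrences;   # word identifier => number of occurrences

# Callback in the shape described above: the arguments arrive as a
# flat key/value list, so they can be copied straight into a hash.
my $collect = sub {
    my %arg = @_;
    $occurrences{ $arg{word} } = $arg{occ};
    return;
};

# With a real dictionary this would be: $dic->for_each($collect);
# Here the callback is invoked by hand with made-up values.
$collect->( word => 3, occ => 42, vals => [] );
$collect->( word => 8, occ => 5,  vals => [] );

print "$_ occurs $occurrences{$_} times\n"
    for sort { $a <=> $b } keys %occurrences;
```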

SEE ALSO

See perl(1) and NATools documentation.

AUTHOR

Alberto Manuel Brandao Simoes, <albie@alfarrabio.di.uminho.pt>

COPYRIGHT AND LICENSE

Copyright 2002-2012 by NATURA Project http://natura.di.uminho.pt

This library is free software; you can redistribute it and/or modify it under the GNU General Public License 2, which you should find in the parent directory. This module should be distributed together with the whole NATools package and its copyright notice.
