NAME

Lingua::NATools::Client - Simple API to query NAT Objects

SYNOPSIS

  use Lingua::NATools::Client;

  $client = Lingua::NATools::Client->new();

DESCRIPTION

Lingua::NATools::Client is a simple query API to talk with NAT corpora objects. It can use a client-server approach (see nat-server) or access a corpus directly on the local filesystem.

Methods

This module includes functions to query NATools Objects. To query you must first create a client object with the new method.

new

The new method receives a hash with configuration parameters and creates a client object. For instance:

  $client = Lingua::NATools::Client->new( Local => "/opt/corpora/foo" );

Known options are:

PeerAddr

The IP address where the server is running. Defaults to 127.0.0.1.

PeerPort

The port to be used in the connection. Defaults to 4000.

Local

A local directory with a NATools object. Note that not all methods support local corpora.

LocalDumper

A local Data::Dumper object with a NATools PTD. Note that not all methods support local NATools PTDs.

If the LocalDumper value is a reference to an array, it is expected to contain two elements, one filename for each dictionary. If its value is a string, it is expected to be the name of a single file that includes BOTH dictionaries.

iterate

This method is used to iterate through a probabilistic translation dictionary. Pass a function reference to handle each dictionary entry. This function will be called with a flattened hash with the keys word, trans and count.

Pass a hash reference as the first argument to configure the method's behaviour. For instance:

  $client -> iterate( {Language => 'source'},
                      sub {
                        my %param = @_;
                        print "$param{word}\n";
                      });
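As a rough sketch of what a callback might do with these entries, the code below records the count of every word it is given. The entries are made up and fed to the callback by hand, so the snippet runs without a corpus. The shape of trans is not described here, so it is left undefined in the sample calls.

```perl
use strict;
use warnings;

# Callback that records the occurrence count of every word it sees.
my %count_for;
my $collect = sub {
    my %param = @_;               # flattened hash: word, trans, count
    $count_for{ $param{word} } = $param{count};
};

# Stand-in for the dictionary traversal, with made-up entries; with a real
# client the callback would instead be passed to iterate:
#   $client->iterate({ Language => 'source' }, $collect);
$collect->(word => 'casa', trans => undef, count => 42);
$collect->(word => 'gato', trans => undef, count => 7);

# Print words from most to least frequent.
for my $word (sort { $count_for{$b} <=> $count_for{$a} } keys %count_for) {
    print "$word\t$count_for{$word}\n";
}
```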

meta_information

list

This method is only available in server mode. It returns a hash table whose keys are corpus names (identifiers). Each value is a hash table with the keys "id", "source" and "target", holding the corpus identifier and the language names.

  $corpora = $client->list;

  # $corpora={ Crp1=> { id=> 1, source=> 'PT', target=> 'EN' } }
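To illustrate how the returned structure might be consumed, the sketch below walks a hash shaped like the one in the comment above. The data is hard-coded rather than fetched from a server, and the helper name describe_corpora is hypothetical, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::NATools::Client): turns a
# structure shaped like the result of list() into one line per corpus.
sub describe_corpora {
    my ($corpora) = @_;
    my @lines;
    # Sort by corpus name so the output order is stable.
    for my $name (sort keys %$corpora) {
        my $info = $corpora->{$name};
        push @lines, sprintf("%s (id %s): %s -> %s",
                             $name, $info->{id}, $info->{source}, $info->{target});
    }
    return @lines;
}

# Sample data mirroring the documented shape; with a live server this
# would instead be: my $corpora = $client->list;
my $corpora = { Crp1 => { id => 1, source => 'PT', target => 'EN' } };

print "$_\n" for describe_corpora($corpora);
```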

set_corpus

This method is also available only in server mode. It selects the corpus that will be used by all subsequent queries.

  $client->set_corpus(3);

ptd

This method is used to query Probabilistic Translation Dictionaries. As the first argument you may pass a hash reference with configuration options. The only mandatory argument is the word being searched.

Known options are:

crp

A corpus identifier to use. If not set, the first corpus is used, or the one previously selected with set_corpus.

direction

This option chooses the direction of the query. By default, the query is made on the source language. If direction is <~, the target language is used.

In both local corpus mode and server mode, you can query by identifier instead of by word. To do so, use ~#> or <#~ as the direction.

Returns an array reference. The first element is the occurrence count of the word, the second is a hash with the translation probabilities, and the third is the word searched.
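A minimal sketch of how this return value might be processed follows. The helper top_translations is hypothetical, and the sample result is invented so that the snippet runs without a corpus. The call shown in the comment is an assumption based on the description above, not a verified signature.

```perl
use strict;
use warnings;

# Hypothetical helper: given the array reference described above
# ([count, {translation => probability}, word]), return the $n most
# probable translations as [translation, probability] pairs.
sub top_translations {
    my ($result, $n) = @_;
    my $probs  = $result->[1];
    # Highest probability first; ties are broken alphabetically.
    my @sorted = sort { $probs->{$b} <=> $probs->{$a} || $a cmp $b } keys %$probs;
    $n = @sorted if !defined $n || $n > @sorted;
    return map { [ $_, $probs->{$_} ] } @sorted[0 .. $n - 1];
}

# Made-up sample; with a real client, something like
#   my $result = $client->ptd('casa');
# is assumed to yield a structure of the same shape.
my $result = [ 42, { house => 0.8, home => 0.15, household => 0.05 }, 'casa' ];

printf "%s occurs %d times\n", $result->[2], $result->[0];
printf "  %-10s %.2f\n", @$_ for top_translations($result, 2);
```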

attribute

Use this method to query meta-information. At the moment it only works for server corpora. Pass it a reference to a configuration hash if you need to choose the corpus (see the ptd documentation, for instance). The mandatory parameter is the name of the attribute being queried. Returns the value if found, undef otherwise.

conc

This method is used to query for concordances on the corpus. This method is not available with LocalDumper.

Mandatory arguments are one or two strings to search. The first argument may be a hash reference with configuration details:

crp

The corpus identifier to be queried. Only used in server mode. If not set, the identifier 1 is used, or the one previously selected with the set_corpus method.

direction

The direction in which the query will be done. At the moment, it defaults to querying the source side (thus ignoring the second argument). You may use <- to query the target language (which also ignores the second argument) or <-> to query both languages.

If you want to do pattern matching, use one of =>, <= or <=>.

TODO: make this interface cleaner.

count

Number of results to be presented. Defaults to 20. This value is always limited by the server.

ngrams

This method is used to query the ngram databases. Not all corpora have ngram indexes, so some answers may be just a reference to an empty list.

At the moment it takes the same configuration parameters as other methods (direction and crp), and a string with the query. For instance:

  foo *        --> all bigrams with "foo" as first word

  foo * bar    --> all trigrams with foo as first word
                   and bar as the last word

  foo bar      --> the bigram "foo bar"

It returns a list of ngrams. Each ngram is a list of the words, with the occurrence count as the last element.
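The sketch below shows one way such entries might be unpacked. The helper split_ngram is hypothetical and the entries are made up, shaped like the answer to a "foo *" query, so the snippet runs without an ngram index.

```perl
use strict;
use warnings;

# Hypothetical helper: split one ngram entry (words followed by the
# occurrence count) into its words and its count.
sub split_ngram {
    my ($ngram) = @_;
    my @words = @$ngram;          # copy, so the input is not modified
    my $count = pop @words;       # last element is the occurrence count
    return (\@words, $count);
}

# Made-up entries shaped like the result of a "foo *" query.
my @ngrams = ( [ 'foo', 'bar', 12 ], [ 'foo', 'baz', 3 ] );

for my $ngram (@ngrams) {
    my ($words, $count) = split_ngram($ngram);
    print join(' ', @$words), "\t$count\n";
}
```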

SEE ALSO

See perl(1) and NATools documentation.

AUTHOR

Alberto Manuel Brandao Simoes, <albie@alfarrabio.di.uminho.pt>

COPYRIGHT AND LICENSE

Copyright 2002-2012 by Natura Project http://natura.di.uminho.pt

This library is free software; you can redistribute it and/or modify it under the GNU General Public License 2, which you should find on parent directory. Distribution of this module should be done including all NATools package, with respective copyright notice.