Alberto Manuel Brandão Simões > Lingua-NATools-v0.7.8 > nat-create

NAME

nat-create - Command line tool to create NATools Corpora Objects

SYNOPSIS

   nat-create <file1.nat> <file2.nat>

   nat-create -tmx <file.tmx>

DESCRIPTION

This is the basic command used to create a NATools Corpora Object from the command line.

A NATools Corpora Object is a directory with:

Known Switches

tokenize

The -tokenize flag can be used to force NATools to tokenize the texts. Note that, at the moment, a Portuguese tokenizer is used for all languages; this may change in the future.

id

The -id=name flag can be used to set the name of the NATools Corpora Object. By default, the name is read interactively.

q

The -q flag can be used to force quiet mode. In this case, the name is extracted from the file names instead of being read interactively.

lang

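The -lang=PT..EN flag can be used to force the languages of the two corpora (in this example, Portuguese and English) instead of having NATools determine them.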
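To illustrate the PT..EN value format, here is a small POSIX shell sketch that splits such a value into its two language codes. The helper name split_lang is hypothetical, and the sketch assumes (this page does not confirm it) that the first code refers to the first corpus and the second code to the other one; nat-create's own parsing may differ.

```shell
#!/bin/sh
# Hypothetical helper, not part of NATools: split a -lang value such as
# "PT..EN" into its two language codes.
# Assumption: the first code is for the first corpus, the second for the other.
split_lang() {
  value=${1#-lang=}              # drop a leading "-lang=" if present
  case $value in
    *..*) ;;                     # must contain the ".." separator
    *) echo "invalid -lang value: $1" >&2; return 1 ;;
  esac
  first=${value%%..*}            # text before the first ".."
  second=${value#*..}            # text after the first ".."
  if [ -z "$first" ] || [ -z "$second" ]; then
    echo "invalid -lang value: $1" >&2
    return 1
  fi
  printf '%s %s\n' "$first" "$second"
}

split_lang -lang=PT..EN   # prints: PT EN
```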

ngrams

The -ngrams flag can be set to force NATools to create n-gram indexes.

noEM

The -noEM flag is used to bypass the EM-Algorithm (mainly useful for debugging).

ipfp

The -ipfp flag is mutually exclusive with -noEM, -samplea and -sampleb. It selects IPFP as the EM-Algorithm to be used. An optional numeric argument sets the number of iterations (default: 5).

samplea

The -samplea flag is mutually exclusive with -noEM, -ipfp and -sampleb. It selects Sample A as the EM-Algorithm to be used. An optional numeric argument sets the number of iterations (default: 10).

sampleb

The -sampleb flag is mutually exclusive with -noEM, -ipfp and -samplea. It selects Sample B as the EM-Algorithm to be used. An optional numeric argument sets the number of iterations (default: 10).
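The way the four EM-related switches combine can be summarized with the following POSIX shell sketch. It is not nat-create's implementation: the function resolve_em is hypothetical, and it assumes the optional iteration count is written as -ipfp=N, -samplea=N or -sampleb=N, a syntax this page does not confirm. It rejects any combination of -noEM, -ipfp, -samplea and -sampleb, and otherwise reports the selected algorithm with its iteration count, falling back to the documented defaults. This page does not say which algorithm runs when none of these switches is given, so the sketch only reports "default" in that case.

```shell
#!/bin/sh
# Hypothetical sketch (not nat-create's code) of how the EM switches combine.
# Assumption: an explicit iteration count is written as -ipfp=N, -samplea=N, etc.
resolve_em() {
  algo=""; iters=""; count=0
  for arg in "$@"; do
    # Pick the algorithm and its documented default iteration count.
    case $arg in
      -noEM)               name=noEM;    n=0 ;;   # EM is skipped entirely
      -ipfp|-ipfp=*)       name=ipfp;    n=5 ;;
      -samplea|-samplea=*) name=samplea; n=10 ;;
      -sampleb|-sampleb=*) name=sampleb; n=10 ;;
      *) continue ;;       # other switches and file names are ignored here
    esac
    # An explicit "=N" suffix overrides the default iteration count.
    case $arg in
      *=*) n=${arg#*=}
           case $n in
             ''|*[!0-9]*) echo "error: bad iteration count in $arg" >&2; return 1 ;;
           esac ;;
    esac
    algo=$name; iters=$n; count=$((count + 1))
  done
  # Any two EM switches together (even a repeated one) are rejected here.
  if [ "$count" -gt 1 ]; then
    echo "error: -noEM, -ipfp, -samplea and -sampleb are mutually exclusive" >&2
    return 1
  fi
  if [ "$count" -eq 0 ]; then
    echo "default"   # this page does not say which algorithm is used then
    return 0
  fi
  printf '%s %s\n' "$algo" "$iters"
}

resolve_em -ipfp=8 pt.nat en.nat   # prints: ipfp 8
```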

SEE ALSO

NATools documentation, perl(1)

AUTHOR

Alberto Manuel Brandão Simões, <ambs@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2006-2011 by Alberto Manuel Brandão Simões