NAME

sentences - Command line tool for text segmentation, tokenization and annotation

SYNOPSIS

   sentences [-tokenize[=cqp]] [-nat] [-o=output] <file>

DESCRIPTION

sentences is a command line tool for text segmentation and annotation. It uses the fsentences function from Lingua::PT::PLNbase. Its main behaviour is to detect sentences and paragraphs and to annotate them with XML-like tags: <s> for sentences, <p> for paragraphs, and <text> to delimit each input file.

If the -tokenize flag is used, words are also detected and separated from each other by a space. If -tokenize=cqp is used, each token is placed on a line by itself.

The -nat flag forces non-XML output, as used by the NATools alignment tools.

The -o flag sends the output to a specific file.
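
The flags described above can be combined as in the following sketch. The file names (sample.txt, sample.cqp, sample.nat) and the sample text are hypothetical, and the sketch assumes that output goes to standard output when -o is not given and that a blank line separates paragraphs; neither assumption is stated in this page. The commands run only if sentences is found on the PATH.

```shell
# Create a small sample input (hypothetical file name and contents).
printf 'Este texto tem duas frases. Esta segunda frase fecha o bloco.\n\nUm segundo bloco de texto.\n' > sample.txt

if command -v sentences >/dev/null 2>&1; then
    # Default mode: XML-like annotation with <text>, <p> and <s> tags
    # (assumed to be written to standard output when -o is absent).
    sentences sample.txt

    # One token per line, written to a file chosen with -o.
    sentences -tokenize=cqp -o=sample.cqp sample.txt

    # Non-XML output intended for the NATools alignment tools.
    sentences -nat -o=sample.nat sample.txt
else
    echo "sentences not found; install Lingua-PT-PLNbase first" >&2
fi
```

The flag spellings follow the SYNOPSIS exactly, including the = between -o and the output file name.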

SEE ALSO

Lingua::PT::PLNbase (3)

AUTHOR

Alberto Manuel Brandão Simões, <ambs@cpan.org>

José João Almeida, <jj@di.uminho.pt>

COPYRIGHT AND LICENSE

Copyright (C) 2007-2008 by Projecto Natura
