sentences - Command line tool for text segmentation, tokenization and annotation
sentences [-tokenize[=cqp]] [-nat] [-o=output] <file>
sentences is a command line tool for text segmentation and annotation. It uses the fsentences function from Lingua::PT::PLNbase. By default it detects sentences and paragraphs and annotates them with XML-like tags: <s> for sentences, <p> for paragraphs, and <text> for each file.
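To make the annotation scheme concrete, here is a minimal Python sketch of this kind of markup. It is not the fsentences algorithm: the function name annotate, the blank-line paragraph rule, the naive end-of-sentence rule, and the exact nesting and line layout of the tags are all assumptions made for illustration.

```python
import re

def annotate(text):
    """Toy annotator (hypothetical, not fsentences).

    Assumes paragraphs are separated by blank lines and that a sentence
    ends at '.', '!' or '?' followed by whitespace.
    """
    out = ["<text>"]
    for para in re.split(r"\n\s*\n", text.strip()):
        out.append("<p>")
        # Split after sentence-final punctuation followed by whitespace.
        for sent in re.split(r"(?<=[.!?])\s+", para.strip()):
            if sent:
                out.append("<s>" + sent + "</s>")
        out.append("</p>")
    out.append("</text>")
    return "\n".join(out)

print(annotate("Bom dia. Como está?\n\nAté logo."))
```

A real segmenter also has to cope with abbreviations, numbers and similar cases where a period does not end a sentence, which this sketch ignores.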
If the -tokenize flag is used, words are detected and separated from each other by a space. If -tokenize=cqp is used, each token is placed on a line by itself.
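The difference between the two tokenized layouts can be sketched in Python as follows. The helpers tokenize and render are hypothetical, and the regular expression is a deliberately simple stand-in for the real tokenizer, whose rules are not described here.

```python
import re

def tokenize(sentence):
    # Assumption: a token is a run of word characters or a single
    # punctuation mark. The real tool's rules are likely richer.
    return re.findall(r"\w+|[^\w\s]", sentence)

def render(sentence, cqp=False):
    # Space-separated tokens by default, one token per line in CQP style.
    sep = "\n" if cqp else " "
    return sep.join(tokenize(sentence))

print(render("Olá, mundo!"))
print(render("Olá, mundo!", cqp=True))
```

The one-token-per-line layout is the vertical format that corpus tools such as CQP typically take as input, which is presumably why the option carries that name.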
The -nat flag forces non-XML output, in the form used by the NATools alignment tools.
It is also possible to use the -o flag to send the output to a specific file.
Alberto Manuel Brandão Simões, <email@example.com>
José João Almeida, <firstname.lastname@example.org>
Copyright (C) 2007-2008 by Projecto Natura