Jeremy Kahn > Lingua-Treebank-0.16 > get_words


NAME

get_words - given collapsed treebank, print words only

SYNOPSIS

get_words [options] [file[s] or STDIN]

 Options:
   -help    brief help message
   -man     full documentation

   -sgml    put <s> and </s> tokens around words
   -nosgml

   -parens  put ( and ) tokens around words
   -noparens

OPTIONS

-help

Prints a brief help message and exits.

-man

Prints the manual page and exits.

-sgml
-nosgml

Writes <s> at the beginning of each line and </s> at the end of each line; -nosgml suppresses both tokens.

Default is -sgml.

-parens
-noparens

Writes ( at the beginning of each line and ) at the end of each line; -noparens suppresses both tokens.

Default is -noparens.

DESCRIPTION

Reads Penn-style trees, one per line, from the input files (or STDIN) and prints only the words of each tree, one tree per output line.

Providing the -sgml option (the default) makes the output pseudo-SGML by adding angle-bracketed <s> and </s> tokens at the beginning and end of each line.
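
The transformation described above can be illustrated with a small Python sketch. This is not the script's own code (get_words is a Perl program built on Lingua::Treebank); the names words_of and format_line, the regular expression, and the sample tree are all assumptions made for illustration. The sketch treats every "(TAG word)" leaf as a word and does not filter empty elements such as traces; how get_words itself handles those is not stated here.

```python
import re

# A leaf of a single-line Penn-style tree looks like "(TAG word)":
# an opening paren, a tag, whitespace, a word, and a closing paren,
# with no nested parentheses in between.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def words_of(tree_line):
    """Return the leaf words of one single-line tree, in order (hypothetical helper)."""
    return [word for _tag, word in LEAF.findall(tree_line)]


def format_line(tree_line, sgml=True):
    """Join the words with spaces; wrap them in <s> ... </s> when sgml is true."""
    tokens = words_of(tree_line)
    if sgml:
        tokens = ["<s>"] + tokens + ["</s>"]
    return " ".join(tokens)


if __name__ == "__main__":
    tree = "(S (NP (DT the) (NN cat)) (VP (VBD sat)))"
    print(format_line(tree))              # <s> the cat sat </s>
    print(format_line(tree, sgml=False))  # the cat sat
```

The sgml parameter mirrors the -sgml/-nosgml pair, including its default of being on. The -parens option is left out of the sketch because the order in which the two kinds of wrapper tokens combine is not specified above.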
