NAME

vocabulary -- extract vocabularies from Penn Treebank files

SYNOPSIS

vocabulary [-NT ntfile] [-POS posfile] [-word wordfile] [-count] [-binarized] [-verbose] file1 [file2...]

file1, file2, etc. are the names of Penn Treebank files. If none are specified, input is read from STDIN.

OPTIONS

NT: Write the non-terminal node vocabulary to ntfile.
POS: Write the part-of-speech vocabulary to posfile.
word: Write the word vocabulary to wordfile.
count: Print the frequency counts for each of the categories.
binarized: The input files are in binarized format.
verbose: Print filenames as they are processed.

DESCRIPTION

Given a list of Penn Treebank files, this script extracts the words, parts of speech, and non-terminal node names, and writes each vocabulary to its own file, ordered by frequency.

Note that giving a "-" argument for any of ntfile, posfile, or wordfile causes the results to be written to STDOUT.
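To make the three vocabularies concrete, here is a minimal Python sketch of this kind of extraction. It is not the script's actual implementation (the script itself is Perl and built on Lingua::Treebank). The function names count_vocabularies and by_frequency are hypothetical, and the sketch assumes plain bracketed trees such as "(S (NP (DT the) (NN dog)) ...)", descending frequency order, and alphabetical tie-breaking. None of these assumptions is confirmed by the documentation above, and the binarized format is not handled.

```python
import re
from collections import Counter

# A token is an opening bracket, a closing bracket, or a run of other
# non-space characters (a node label or a word).
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def count_vocabularies(text):
    """Count non-terminal labels, POS tags, and words in bracketed trees.

    Assumption: a node written "(TAG word)" is a preterminal, so TAG is a
    part of speech and word is a word; any other labeled node is a
    non-terminal.
    """
    nts, pos, words = Counter(), Counter(), Counter()
    tokens = TOKEN.findall(text)
    for i, tok in enumerate(tokens):
        if tok != "(" or i + 1 >= len(tokens):
            continue
        label = tokens[i + 1]
        if label in ("(", ")"):
            # Unlabeled wrapper bracket, e.g. "( (S ...))": nothing to count.
            continue
        following = tokens[i + 2] if i + 2 < len(tokens) else ")"
        if following in ("(", ")"):
            nts[label] += 1          # label followed by children: non-terminal
        else:
            pos[label] += 1          # label followed by a word: preterminal
            words[following] += 1
    return nts, pos, words


def by_frequency(counter):
    """Return (item, count) pairs, most frequent first, ties alphabetical."""
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = (
        "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))\n"
        "(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))\n"
    )
    nts, pos, words = count_vocabularies(sample)
    for name, counter in (("NT", nts), ("POS", pos), ("word", words)):
        print(name)
        for item, count in by_frequency(counter):
            print(f"  {item}\t{count}")
```

In this sketch the item-and-count pairs correspond roughly to what the count option reports, and each of the three printed lists corresponds to one of ntfile, posfile, and wordfile.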

AUTHOR

W.P. McNeill <billmcn@ssli.ee.washington.edu>


