NAME

rawtextFreq.pl - Perl program for finding the frequencies of words in raw text files

SYNOPSIS

rawtextFreq.pl --compfile COMPFILE --outfile OUTFILE [--stopfile=STOPFILE] {--stdin | --infile FILE [--infile FILE ...]} [--wnpath WNPATH] [--resnik] [--smooth=SCHEME] | --help | --version

OPTIONS

--compfile=filename

    The name of a file containing the compound words (collocations) in
    WordNet.

--outfile=filename

    The name of the file to which output should be written.

--stopfile=filename

    A file containing a list of stop-listed words that will not be
    considered in the frequency counts.  A sample file can be
    downloaded from
    http://www.d.umn.edu/~tpederse/Group01/WordNet/words.txt
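
    As an illustration of what excluding stop-listed words from the
    counts means, here is a minimal Python sketch.  It is not the
    script's own code: the one-word-per-line stop list format, the
    simple alphabetic tokenizer, and the helper names load_stop_words
    and count_words are all assumptions made for this example.

```python
import re
from collections import Counter


def load_stop_words(lines):
    """Build a stop list from lines holding one word each (assumed format)."""
    return {line.strip().lower() for line in lines if line.strip()}


def count_words(text, stop_words):
    """Count lower-cased alphabetic tokens, skipping stop-listed words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tok for tok in tokens if tok not in stop_words)


# Hypothetical stop list and input text, for illustration only.
stop_words = load_stop_words(["the", "of", "a", ""])
counts = count_words("The frequency of a word, the word itself.", stop_words)
print(counts.most_common())
```

    Words on the stop list never enter the counter, so they contribute
    nothing to the totals from which frequencies are later derived.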

--wnpath=path

    The location of the WordNet data files (e.g.,
    /usr/local/WordNet-2.1/dict).

--resnik

    Use Resnik (1995) frequency counting.

--smooth=SCHEME

    Apply smoothing to the computed probabilities.  SCHEME can only be
    ADD1 at this time.
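
    For readers unfamiliar with add-one smoothing, the following Python
    sketch shows the textbook form of the technique.  This page does
    not say exactly how the script applies ADD1, so treat this as a
    generic illustration under that assumption; the function name
    add_one_probabilities and the sample data are hypothetical.

```python
def add_one_probabilities(counts, vocabulary):
    """Return add-one (Laplace) smoothed probabilities over `vocabulary`.

    Every item gets its observed count plus one, so items that were
    never seen still receive a small non-zero probability.
    """
    total = sum(counts.get(item, 0) for item in vocabulary) + len(vocabulary)
    return {item: (counts.get(item, 0) + 1) / total for item in vocabulary}


# Hypothetical counts: "emu" never occurred in the input text.
observed = {"dog": 3, "cat": 1}
vocab = ["dog", "cat", "emu"]
probs = add_one_probabilities(observed, vocab)
print(probs)
```

    The point of the adjustment is that no item ends up with a
    probability of zero, which matters when a later step takes the
    logarithm of that probability.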

--help

    Show a help message.

--version

    Display version information.

--stdin

    Read the text used for counting word frequencies from standard
    input.

--infile=PATTERN

    The name of a raw text file to be used to count word frequencies.
    The value can be a filename, a directory name, or a pattern (as
    understood by Perl's glob() function).  If the value is a directory
    name, then all the files in that directory and its subdirectories
    are used.

    If you are looking for some interesting files to use, check out
    Project Gutenberg: <http://www.gutenberg.org>.

    This option may be given more than once (if more than one file
    should be used).
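
    The three kinds of --infile value described above (file, directory,
    pattern) can be pictured with this Python sketch.  It is an
    approximation only: Python's glob module does not behave exactly
    like Perl's glob() (for example, it does not expand braces), and
    the function name expand_infile is hypothetical.

```python
import glob
import os


def expand_infile(value):
    """Expand one --infile style value into a sorted list of regular files."""
    if os.path.isdir(value):
        # A directory: take every file beneath it, including subdirectories.
        found = []
        for root, _dirs, files in os.walk(value):
            found.extend(os.path.join(root, name) for name in files)
        return sorted(found)
    # A plain filename or a pattern: let glob resolve it, keeping only files.
    return sorted(p for p in glob.glob(value) if os.path.isfile(p))
```

    Giving the option several times would then amount to concatenating
    the lists returned for each value.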

AUTHORS

 Ted Pedersen, University of Minnesota, Duluth
 tpederse at d.umn.edu

 Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
 banerjee+ at cs.cmu.edu

 Siddharth Patwardhan, University of Utah, Salt Lake City
 sidd at cs.utah.edu

 Jason Michelizzi, University of Minnesota, Duluth
 mich0212 at d.umn.edu

BUGS

None.

COPYRIGHT AND LICENSE

Copyright (c) 2005, Ted Pedersen, Satanjeev Banerjee, Siddharth Patwardhan and Jason Michelizzi

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

 Free Software Foundation, Inc.
 59 Temple Place - Suite 330
 Boston, MA  02111-1307, USA