=head1 NAME 

README.Toolkit - SenseClusters Toolkit directory structure with links to 
all program documentation  

=head1 DIRECTORY STRUCTURE

This document describes the structure of the Toolkit/ directory and 
briefly explains what each program does. Directory names end with a / 
(for example, preprocess/), while program names end with the .pl suffix. 
Everything listed here is contained in the Toolkit/ directory. The 
entries are organized roughly in the order in which they are used by 
SenseClusters.

Please review the flowcharts found in doc/Flowcharts for additional 
information. 

=head2 preprocess/ (text preprocessing programs)

=over 

=item * plain/ (processes input in plain text format) 

=over

=item * L<text2sval.pl> - Convert simple plain text into Senseval-2 format

=back

=item * sval2/ (processes input in Senseval-2 format)

=over 

=item * L<balance.pl> - Balances sense distribution in a Senseval-2 
input file by removing some instances

=item * L<filter.pl> - Removes instances associated with low frequency 
sense tags from Senseval-2 input 

=item * L<frequency.pl> - Displays frequency distribution of senses 

=item * L<keyconvert.pl> - Convert KEY file from Senseval-2 format to 
SenseClusters format

=item * L<maketarget.pl> - Create a Perl regex for the target word by 
spotting all <head> tags in the given file

=item * L<prepare_sval2.pl> - Prepare Senseval-2 data for experiments 

=item * L<preprocess.pl> - Tokenize and optionally split Senseval-2 
input into training and test portions

=item * L<sval2plain.pl> - Convert a Senseval-2 input file to plain text 
format 

=item * L<windower.pl> - Cut a window of W words of context around a 
target word in a given Senseval-2 input file

=back

=back

=head2 count/ (Modify count.pl output from Text-NSP)

=over

=item * L<reduce-count.pl> - Reduce the size of the Text-NSP output 
created with huge training data

=back

=head2 matrix/ (Similarity matrix constructors)

=over 4 

=item * L<bitsimat.pl> - Create a similarity matrix for given bit 
vectors

=item * L<simat.pl> - Create a similarity matrix for given non-binary 
(integer or real) vectors

=back

=head2 vector/ (Represent contexts as vectors to be clustered)

=over

=item * L<nsp2regex.pl> - Creates regular expressions from Text-NSP 
output to represent features

=item * L<order1vec.pl> - Creates first order context vectors 

=item * L<order2vec.pl> - Creates second order context vectors

=item * L<wordvec.pl> - Creates word vectors from Text-NSP output 

=back

=head2 svd/ (SVDPACKC interface)

=over

=item * L<mat2harbo.pl> - Convert matrices from SenseClusters format to 
Harwell-Boeing format

=item * L<svdpackout.pl> - Reconstruct a matrix from its singular 
vectors as found by SVDPACKC

=back

=head2 clusterstopping/ (Cluster Stopping program)

=over

=item * L<clusterstopping.pl> - Predicts the number of clusters into 
which the given data should be divided. Provides three cluster stopping 
measures.

=back


=head2 evaluate/ (Evaluate the results of SenseClusters by comparing to gold standard data)

=over

=item * L<cluto2label.pl> - Convert clustering output of Cluto to a 
cluster-by-sense confusion matrix for evaluation

=item * L<format_clusters.pl> - Display the clustered contexts with 
their assigned sense ids, or display them in Senseval-2 format with 
their assigned sense ids

=item * L<label.pl> - Assign sense tags to the discovered clusters for 
evaluation 

=item * L<report.pl> - Report performance in terms of precision, 
recall, and F-measure, and show a confusion matrix

=back

=head2 clusterlabel/ (Cluster Labeling programs)

=over

=item * L<clusterlabeling.pl> - Selects significant word pairs from the 
contents/instances of the clusters and assigns them as labels to 
the clusters. Also creates a separate file for each cluster.

=back

=head1 AUTHOR

 Ted Pedersen, University of Minnesota, Duluth
 tpederse at d.umn.edu

=head1 COPYRIGHT

Copyright (c) 2003-2008, Ted Pedersen

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 
or any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

Note: a copy of the GNU Free Documentation License is available on 
the web at L<http://www.gnu.org/copyleft/fdl.html> and is included in 
this distribution as FDL.txt.

=cut