# run perldoc on this file to get nicely formatted documentation

=head1 NAME

CONFIG - [documentation] Description of all configuration options for measures

=head1 DESCRIPTION

The following is a list of options supported by the measures of
semantic relatedness.  This is intended to serve as a "master
list" of options so that descriptions can be copied from here
and pasted into the documentation for specific modules.

=over

=item trace

This option is supported by all measures.

The value of this parameter specifies the level of tracing to use. The
value is an integer equal to 0, 1, or 2. If the value is omitted, then the
default value, 0, is used. A value of 0 switches tracing off. A value
of 1 or 2 switches tracing on.  The difference between levels 1 and 2
depends upon the measure being used.

For vector_pairs and lesk, a value of 1 traces only the gloss overlaps
found. A value of 2 traces all the text being compared.

For the res, lin, jcn, wup, lch, path, and hso
measures, a trace of level 1 means the synsets are represented as
word#pos#sense strings, while for level 2, the synsets are represented as
word#pos#offset strings.

=item cache

This option is supported by all measures.

The value of this parameter specifies whether or not caching of the
relatedness values should be performed.  This value is an
integer equal to 0 or 1.  If the value is omitted, then the default
value, 1, is used. A value of 0 switches caching 'off', and
a value of 1 switches caching 'on'.

=item maxCacheSize

This option is supported by all measures.

The value of this parameter indicates the size of the cache used for
storing computed relatedness values. The specified value must be
a non-negative integer.  If the value is omitted, then the default
value, 5,000, is used. Setting maxCacheSize to zero has
the same effect as setting cache to zero, but setting cache to zero is
likely to be more efficient.  Caching and tracing at the same time can result
in excessive memory usage because the trace strings are also cached.  If
you intend to perform a large number of relatedness queries, then you
might want to turn tracing off.
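
As an illustration, a configuration file that sets the three general
options above might look like the following.  This is only a sketch: it
assumes the usual layout of a module name on the first line followed by
one option::value pair per line, and the choice of the path measure and
the values shown are examples rather than recommendations.

    WordNet::Similarity::path
    trace::0
    cache::1
    maxCacheSize::10000

Tracing is left off here because, as noted above, cached trace strings
can consume a great deal of memory when many queries are made.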

=item rootNode

This option is supported by the res, lin, jcn, wup, path, and lch measures.

The value of this parameter indicates whether or not a unique root node
should be used. In WordNet, there is no unique root node for the noun and
verb taxonomies. If this parameter is set to 1, then the measures that
support this option will "fake" a unique root node. If the value is set to
0, then no unique root node will be used.  If the value is omitted, then the
default value, 1, is used.

=item infocontent

This option is supported by the res, lin, and jcn measures.

The value for this parameter should be a string that specifies the path of
an information content file containing the frequency of occurrence of every
WordNet concept in a large corpus. A number of utility programs are
included in this distribution that can be used to generate an infocontent
file (see utils.pod).  If no path is specified, then the default infocontent
file is used, which was generated from SemCor using the sense-tags.
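
For example, a configuration file for the jcn measure that names its own
information content file and keeps the fake root node might look like the
sketch below.  The layout (module name, then option::value lines) is
assumed, and the path /path/to/my-infocontent.dat is hypothetical; replace
it with the location of a file generated by one of the utility programs.

    WordNet::Similarity::jcn
    rootNode::1
    infocontent::/path/to/my-infocontent.dat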

=item taxonomyDepthsFile

This option is supported only by the lch measure.

The value for this parameter should be a string that specifies the location
of a taxonomy depths file (as generated by wnDepths.pl). If no path is
specified, then the default file is used, which was generated when the
Similarity package was installed.

=item synsetDepthsFile

This option is supported only by the wup measure.

The value for this parameter should be a string that specifies the location
of a synset depths file (as generated by wnDepths.pl).  If no path is
specified, then the default file is used, which was generated when the
Similarity package was installed.

=item relation

This option is supported only by the lesk and vector_pairs measures.

The value of this parameter is the path to a file that contains a list of
WordNet relations.  The path may be either an absolute path or a relative
path.

The vector_pairs module combines the glosses of synsets related to the target
synsets by these relations and forms the gloss-vector from this combined
gloss.

The lesk module combines glosses of synsets related to the target
synsets by these relations and then searches for overlaps in these
"super-glosses."

WARNING: the format of the relation file is different for the vector_pairs
and lesk measures.  The documentation for lesk and vector_pairs describes
the respective formats for the relation files.
See I<WordNet::Similarity::vector_pairs>(3pm) and
I<WordNet::Similarity::lesk>(3pm).

=item stop

This option is supported only by the lesk and vector_pairs measures.

The value of this parameter is the path of a file containing a list of
stop words that should be ignored in the glosses.  The path may be either
an absolute path or a relative path.

=item stem

This option is supported only by the lesk and vector_pairs measures.

The value of this parameter indicates whether or not stemming should be
performed.  The value must be an integer equal to 0 or 1.  If the
value is omitted, then the default value, 0, is used.
A value of 1 switches stemming 'on', and a value of 0 switches stemming
'off'. When stemming is enabled, all the words of the
glosses are stemmed before their vectors are created for the vector_pairs
measure or their overlaps are compared for the lesk measure.

=item normalize

This option is supported only by the lesk measure.

The value of this parameter indicates whether or not normalization of
scores is performed.  The value must be an integer equal to 0 or 1.  If
the value is omitted, then the default value, 0, is used. A value of
1 switches normalizing 'on', and a value of 0 switches
normalizing 'off'. When normalizing is enabled, the score obtained by
counting the gloss overlaps is normalized by the size of the glosses.
The details are described in Banerjee and Pedersen (2002).
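
The lesk-related options described above (relation, stop, stem, and
normalize) could be combined in a single configuration file along the
lines of the following sketch.  The option::value layout is assumed, and
both file paths are hypothetical placeholders; the relation file must be
in the format that the lesk documentation describes.

    WordNet::Similarity::lesk
    relation::/path/to/lesk-relation.dat
    stop::/path/to/stoplist.txt
    stem::1
    normalize::1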

=item vectordb

This option is supported only by the vector_pairs measure.

The value of this parameter is the path to a vectors file
containing word vectors, i.e., co-occurrence vectors for all the words
in the WordNet glosses.  The value of this parameter may not be omitted:
the vector_pairs measure will not run unless a vectors file is specified
in a configuration file.
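
Because this option is mandatory, even the smallest vector_pairs
configuration file needs a vectordb line.  A minimal sketch, assuming the
option::value layout and using a hypothetical path for the vectors file:

    WordNet::Similarity::vector_pairs
    vectordb::/path/to/wordvectors.dat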

=item maxrand

This option is supported only by the random measure.

The value of this option is the maximum random number that will be generated.
The value of this option must be a positive floating-point number.  The
default value is 1.0.  All random numbers generated will be in the range
[0, maxrand).

=back

=head1 SEE ALSO

L<intro.pod>

Mailing list:
 L<http://groups.yahoo.com/group/wn-similarity>

Project Home page:
 L<http://wn-similarity.sourceforge.net>

=head1 AUTHORS

 Ted Pedersen, University of Minnesota Duluth
 tpederse at d.umn.edu

 Siddharth Patwardhan, University of Utah, Salt Lake City
 sidd at cs.utah.edu

 Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
 banerjee+ at cs.cmu.edu

 Jason Michelizzi

=head1 COPYRIGHT 

Copyright (c) 2005-2008, Ted Pedersen, Siddharth Patwardhan, Satanjeev 
Banerjee, and Jason Michelizzi 

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts.

Note: a copy of the GNU Free Documentation License is available on
the web at L<http://www.gnu.org/copyleft/fdl.html> and is included in
this distribution as FDL.txt.