# run perldoc on this file to get nicely formatted documentation
=head1 NAME
CONFIG - [documentation] Description of all configuration options for measures
=head1 DESCRIPTION
The following is a list of options supported by the measures of
semantic relatedness. This is intended to serve as a "master
list" of options so that descriptions can be copied from here
and pasted into the documentation for specific modules.
=over
=item trace
This option is supported by all measures.
The value of this parameter specifies the level of tracing that should
be employed for generating the traces. This value
is an integer equal to 0, 1, or 2. If the value is omitted, then the
default value, 0, is used. A value of 0 switches tracing off. A value
of 1 or 2 switches tracing on. The difference between a value of 1 or 2
depends upon the measure being used.
For vector_pairs and lesk, a value of 1 displays as
traces only the gloss overlaps found. A value of 2 displays as traces all
the text being compared.
For the res, lin, jcn, wup, lch, path, and hso
measures, a trace of level 1 means the synsets are represented as
word#pos#sense strings, while for level 2, the synsets are represented as
word#pos#offset strings.
=item cache
This option is supported by all measures.
The value of this parameter specifies whether or not caching of the
relatedness values should be performed. This value is an
integer equal to 0 or 1. If the value is omitted, then the default
value, 1, is used. A value of 0 switches caching 'off', and
a value of 1 switches caching 'on'.
=item maxCacheSize
This option is supported by all measures.
The value of this parameter indicates the size of the cache, used for
storing the computed relatedness value. The specified value must be
a non-negative integer. If the value is omitted, then the default
value, 5,000, is used. Setting maxCacheSize to zero has
the same effect as setting cache to zero, but setting cache to zero is
likely to be more efficient. Caching and tracing at the same time can result
in excessive memory usage because the trace strings are also cached. If
you intend to perform a large number of relatedness queries, then you
might want to turn tracing off.
=item rootNode
This option is supported by the res, lin, jcn, wup, path, and lch measures.
The value of this parameter indicates whether or not a unique root node
should be used. In WordNet, there is no unique root node for the noun and
verb taxonomies. If this parameter is set to 1 (or if the value is omitted),
then certain measures (wup, path, lch, res, lin, and jcn) will "fake" a
unique root node. If the value is set to 0, then no unique root node will
be used. If the value is omitted, then the default value, 1, is used.
=item infocontent
This option is supported by the res, lin, and jcn measures.
The value for this parameter should be a string that specifies the path of
an information content file containing the frequency of occurrence of every
WordNet concept in a large corpus. A number of utility programs are
included in this distribution that can be used to generate an infocontent
file (see utils.pod). If no path is specified, then the default infocontent
file is used, which was generated from SemCor using the sense-tags.
=item taxonomyDepthsFile
This option is supported only by the lch measure.
The value for this parameter should be a string that specifies the location
of a taxonomy depths file (as generated by wnDepths.pl). If no path is
specified, then the default file is used, which was generated when the
Similarity package was installed.
=item synsetDepthsFile
This option is supported only by the wup measure.
The value for this parameter should be a string that specifies the location
of a synset depths file (as generated by wnDepths.pl. If no path is
specified, then the default file is used, which was generated when the
Similarity package was installed.
=item relation
This option is supported only by the lesk and vector_pairs measures.
The value of this parameter is the path to a file that contains a list of
WordNet relations. The path may be either an absolute path or a relative
path.
The vector_pairs module combines the glosses of synsets related to the target
synsets by these relations and forms the gloss-vector from this combined
gloss.
The lesk module combines glosses of synsets related to the target
synsets by these relations and then searches for overlaps in these
"super-glosses."
WARNING: the format of the relation file is different for the vector_pairs
and lesk measures. The documentation for lesk and vector_pairs describe
the respective formats for the relation files.
See I<WordNet::Similarity::vector_pairs>(3pm) and
I<WordNet::Similarity::lesk>(3pm).
=item stop
This option is supported only by the lesk and vector_pairs measures.
The value of this parameter the path of a file containing a list of stop
words that should be ignored in the glosses. The path may be either an
absolute path or a relative path.
=item stem
This option is supported only by the lesk and vector_pairs measures.
The value of this parameter indicates whether or not stemming should be
performed. The value must be an integer equal to 0 or 1. If the
value is omitted, then the default value, 0, is used.
A value of 1 switches 'on' stemming, and a value of 0 switches stemming
'off'. When stemming is enabled, all the words of the
glosses are stemmed before their vectors are created for the vector
measure or their overlaps are compared for the lesk measure.
=item normalize
This option is supported only by the lesk measure.
The value of this parameter indicates whether or not normalization of
scores is performed. The value must be an integer equal to 0 or 1. If
the value is omitted, then the default value, 0, is assumed. A value of
1 switches 'on' normalizing of the score, and a value of 0 switches
normalizing 'off'. When normalizing is enabled, the score obtained by
counting the gloss overlaps is normalized by the size of the glosses.
The details are described in Banerjee and Pedersen (2002).
=item vectordb
This option is supported only by the vector_pairs measure.
The value of this parameter is the path to a Vectors file
containing word vectors, i.e. co-occurrence vectors for all the words
in the WordNet glosses. The value of this parameter may not be omitted,
and the vector_pairs measure will not run without a DB file being specified
in a configuration file.
=item maxrand
This option is supported only by the random measure.
The value of this option is the maximum random number that will be generated.
The value of this option must be a positive floating-point number. The
default value is 1.0. All random numbers generated will be in the range
[0, maxrand).
=back
=head1 SEE ALSO
L<intro.pod>
Mailing list:
L<http://groups.yahoo.com/group/wn-similarity>
Project Home page:
L<http://wn-similarity.sourceforge.net>
=head1 AUTHORS
Ted Pedersen, University of Minnesota Duluth
tpederse at d.umn.edu
Siddharth Patwardhan, University of Utah, Salt Lake City
sidd at cs.utah.edu
Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
banerjee+ at cs.cmu.edu
Jason Michelizzi
=head1 COPYRIGHT
Copyright (c) 2005-2008, Ted Pedersen, Siddharth Patwardhan, Satanjeev
Banerjee, and Jason Michelizzi
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts.
Note: a copy of the GNU Free Documentation License is available on
the web at L<http://www.gnu.org/copyleft/fdl.html> and is included in
this distribution as FDL.txt.