This is a master list of the options used in the example configuration
files found in this directory. It is not intended to be used as a
configuration file, but rather as a plain text summary of the possible
options and their values. In fact, the measures will not accept this file
as a configuration file.
All of these options have default values, which are described below. The
only exception is vectordb, which has no default. If an option is listed
without a value (as in trace:: or cache::), the default value is used.
Note that in the configuration files anything following a # is treated as
a comment, so the option listing below can be used directly in a
configuration file. Be sure, however, to change the value of each option
to fit your needs!
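The comment rule and the option::value form described above are simple enough to sketch in a few lines. The Python below is a hypothetical illustration only, not the parser WordNet::Similarity actually uses (the modules are written in Perl); the names parse_config and DEFAULTS are assumptions made for this example, and the default values in the table are copied from the option descriptions in this file.

```python
# Hypothetical sketch: read "option::value # comment" lines into a dict.
# This is NOT the WordNet::Similarity parser; it only mirrors the rules
# described in this file (comments start at '#', a missing value means
# "use the default").

# Default values copied from the option descriptions in this file.
# Path-valued defaults that depend on $INSTALLDIR are left out, and
# vectordb has no default at all.
DEFAULTS = {
    "trace": "0",
    "cache": "1",
    "maxCacheSize": "5000",
    "rootNode": "1",
    "stem": "1",
    "normalize": "0",
    "maxrand": "1",
}


def parse_config(text, defaults=DEFAULTS):
    """Map option names to string values (None if no value and no default)."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop everything after '#'
        if not line:
            continue  # blank or comment-only line
        name, sep, value = line.partition("::")
        if not sep:
            raise ValueError(f"expected option::value, got {raw!r}")
        name, value = name.strip(), value.strip()
        # An omitted value falls back to the default; a repeated option
        # simply overwrites the earlier one in this sketch.
        options[name] = value if value else defaults.get(name)
    return options


if __name__ == "__main__":
    sample = """
trace::          # no value, so the default (0) is used
cache::0         # turn caching off
maxCacheSize::unlimited
"""
    print(parse_config(sample))
    # {'trace': '0', 'cache': '0', 'maxCacheSize': 'unlimited'}
```

Options whose defaults depend on the installation directory (infocontent, relation) are deliberately missing from the table, so in this sketch they come back as None when listed without a value, just like vectordb.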
# ----------------------------------------------------------------------
# The following options are supported for all measures
trace::0 # Turns tracing off (0). Turn tracing on by setting
# it to 1 or 2. The effect of these different levels
# depends on the measure being used. The default value
# is off (0). If the value is omitted, then the default
# is used. 0, 1, and 2 are the only valid settings.
cache::1 # Turns caching on (1). Turn caching off by setting
# it to 0. The default is on (1). If the value is omitted,
# then the default is used. 0 and 1 are the only valid
# settings.
maxCacheSize::1000
# Limits the cache size to 1000 pairs of query words.
# The default is 5000. If the value is omitted, then
# the default is used. The value of this option
# must be a non-negative integer or "unlimited" (without
# the quotes).
# ----------------------------------------------------------------------
# The following option is supported by these measures:
# path, lch, wup, res, lin, jcn
rootNode::1 # Turns on (1) a (hypothetical) top-level root node for
# the nouns, and another for the verbs. Turn off the
# root nodes by setting to 0. The default is to use (1)
# a unique top-level root node. If the value is omitted,
# then the default is used. 0 and 1 are the only valid
# settings.
# ----------------------------------------------------------------------
# The following option is supported by these measures:
# res, lin, jcn
infocontent::lib/WordNet/infocontent.dat
# Specifies an information content file. The value of
# this option must be the name of a file, or a relative
# or absolute path name. The default value of this option
# is $INSTALLDIR/WordNet/ic-semcor.dat, where $INSTALLDIR
# is the directory in which the WordNet::Similarity modules
# are installed.
# ----------------------------------------------------------------------
# The following options are supported by the vector and lesk measures
stem::1 # Turns on (1) stemming. Turn off stemming by setting
# this value to 0. The default value is on (1). If the
# value is omitted, then the default is used. When
# stemming is on (1), all the words in a gloss are stemmed
# by the WordNet stemmer before overlaps are identified.
stop::samples/stoplist.txt
# Specifies the name of a stop list, which consists of
# words that are to be ignored in a gloss overlap. The
# value of this option must be a file name, or an absolute or
# relative path name. The default is to not use a stop
# list. If the value is omitted, then the default is used.
# ----------------------------------------------------------------------
# The following options are supported by the lesk measure
relation::samples/lesk-relation.dat
# Specifies a lesk relation file. This value can be a file
# name, or an absolute or relative path name. The default
# is to use the file $INSTALLDIR/WordNet/lesk-relation.dat,
# where $INSTALLDIR is the directory in which the
# WordNet::Similarity modules are installed. If the value is
# omitted, then the default is used. Please note that the
# format of the lesk relation file is not the same as
# that of the vector relation file. The lesk relation file
# consists of relation pairs that specify glosses that
# are to be compared for overlaps.
normalize::1 # Turns on (1) normalization of lesk scoring. Turn off
# by setting this value to 0. The default value is off
# (0). If the value is omitted, then the default is used.
# When normalization is enabled, the gloss overlap score
# is normalized by the size of the glosses. The details
# are described in Banerjee and Pedersen (2002).
# ----------------------------------------------------------------------
# The following options are supported by the vector measure
vectordb::lib/WordNet/wordvectors.dat
# Specifies a database file containing word vectors.
# The value of this option must be a file name, or an
# absolute or relative path name. utils/wordVectors.pl
# must be used to generate this file. This option is
# required, and there is no default value. If the
# option is not specified, or if the option is specified
# without a value, the vector measure will fail.
relation::samples/vector-relation.dat
# Specifies a vector relation file. This value can be a file
# name, or an absolute or relative path name. The default
# is to use the glos-example relation. If the value is
# omitted, then the default is used. Please note that the
# format of the vector relation file is not the same as
# that of the lesk relation file. The vector relation file
# consists of single relations that specify which glosses
# of a word are used in constructing the gloss vector.
compounds::samples/wn30compounds.txt
# Specifies a file of WordNet compounds. The value of
# this option must be a file name, or an absolute or
# relative path name. The program utils/compounds.pl can
# be used to generate this file. When compounds are
# specified, compound words that occur in glosses are
# identified prior to creating word vectors. The default
# is to ignore compound words. If the value of this
# option is omitted, then the default is used.
# ----------------------------------------------------------------------
# The following option is supported by the random measure
maxrand::1 # The random measure generates random scores between 0
# and this value. The value of this option may be an
# integer or a real number. The default value is 1.
# If the value of this option is omitted, then the
# default is used.
# ----------------------------------------------------------------------
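The value rules spread through the descriptions above (the settings that trace, cache, and rootNode accept, the allowed forms of maxCacheSize and maxrand, and the requirement that the vector measure be given a vectordb file) can be collected into a single check. The Python below is a hypothetical sketch: validate, ALLOWED, and the measure argument are names invented for illustration, and the WordNet::Similarity modules may detect and report these problems differently.

```python
# Hypothetical sketch: check option values against the rules in this file.
# The function name and messages are illustrative; WordNet::Similarity does
# its own checking, and the exact behavior there may differ.
import math
import re

# Options documented as accepting only these settings.
ALLOWED = {
    "trace": {"0", "1", "2"},
    "cache": {"0", "1"},
    "rootNode": {"0", "1"},
}


def validate(options, measure=None):
    """Raise ValueError for a value the descriptions above rule out.

    `options` maps names to string values; None means "no value given",
    which is always allowed because the default is then used.
    """
    for name, allowed in ALLOWED.items():
        value = options.get(name)
        if value is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")

    # maxCacheSize: a non-negative integer, or the word unlimited.
    size = options.get("maxCacheSize")
    if size is not None and size != "unlimited" and not re.fullmatch(r"[0-9]+", size):
        raise ValueError("maxCacheSize must be a non-negative integer or unlimited")

    # maxrand: an integer or a real number (NaN and infinity rejected here).
    rand = options.get("maxrand")
    if rand is not None:
        try:
            ok = math.isfinite(float(rand))
        except ValueError:
            ok = False
        if not ok:
            raise ValueError("maxrand must be an integer or a real number")

    # vectordb is required by the vector measure and has no default.
    if measure == "vector" and not options.get("vectordb"):
        raise ValueError("the vector measure requires vectordb::<file>")


if __name__ == "__main__":
    validate({"trace": "2", "cache": "0", "maxCacheSize": "unlimited"})
    validate({"vectordb": "lib/WordNet/wordvectors.dat"}, measure="vector")
    try:
        validate({"trace": "3"})
    except ValueError as err:
        print(err)  # trace must be one of ['0', '1', '2']
```

The check works on plain strings, so it can be fed whatever a configuration reader produces. A missing value (None) is always accepted, since the documented behavior is to fall back to the default; the one exception is vectordb for the vector measure, which has no default to fall back on.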