=head1 NAME

CHANGES - Revision history for WordNet::Similarity

=head1 DESCRIPTION

=head2 Version 2.07 (Released 10/05/2015)

=over

=item (1) 

Fixed a 'make test' error in lesktrace.t caused by overlap results being
returned in an unpredictable order. The problem is documented at
L<https://rt.cpan.org/Ticket/Display.html?id=86437>. The fix, provided by
Phil Goetz (philgoetz@gmail.com), sorts the overlaps in lesk.pm to
guarantee a stable order during testing. Note that the keys had to be
regenerated after this fix was installed, using perl t/trace.t --key. (TDP)

=item (2) 

Installed a patch to fix WordNet version detection issues on Windows. The
problem description and patch are provided at
L<https://rt.cpan.org/Ticket/Display.html?id=79065>.

=item (3) 

Added doc/update-pod.sh to create plain text documentation. (TDP)

=item (4) 

Fixed the WordNet download location in install.pod. (TDP)

=item (5) 

Updated prereqs in Makefile.PL. (TDP)

=back
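
The ordering problem fixed in item (1) is typical of output built by
iterating over a hash: the iteration order is not guaranteed, so a test
that compares against a stored key can fail intermittently. The Python
sketch below illustrates the general remedy of sorting before printing. It
is not the actual patch to lesk.pm; the function name format_overlaps and
the sample data are hypothetical.

```python
def format_overlaps(overlaps):
    """Render overlap counts as a trace string in a stable order.

    `overlaps` maps an overlapping phrase to the number of times it occurs.
    Sorting by phrase makes the output independent of insertion order.
    """
    parts = [f"{phrase}:{count}" for phrase, count in sorted(overlaps.items())]
    return " ".join(parts)


# Two dictionaries with the same content but different insertion order
# (hypothetical data, for illustration only)
first = {"domestic animal": 1, "pet": 2}
second = {"pet": 2, "domestic animal": 1}

print(format_overlaps(first))
print(format_overlaps(second))
```

Both calls print the same line, which is what a key-based trace test needs.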

=head2 Version 2.05 (Released 06/16/2008)

=over

=item (1)

Created new module WordNet::Similarity::FrequencyCounter containing common
support code for information content programs. (Sid)

=item (2)

Updated all the frequency counting programs in /utils (*Freq.pl) to use
the common code in WordNet::Similarity::FrequencyCounter. (Sid)

=item (3)

Changed the default path to Perl from /usr/local/bin to /usr/bin in all
scripts and tests in the package. (Sid)

=item (4)

Fixed incorrect handling of BNC header information. (Sid)

=item (5)

Modified the compoundify() method in WordNet::Tools to include compounds
containing special characters (period, hyphen, forward-slash,
single-quote). (Sid)

=item (6)

Updated compoundify() to handle larger compounds. (Sid)

=back
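
Items (5) and (6) concern compound detection. As a rough illustration of
the idea, the Python sketch below joins known multi-word compounds with
underscores, trying the longest candidate first. It is an assumption-laden
sketch and not the WordNet::Tools implementation: the max_words limit, the
whitespace tokenization and the sample compound list are all hypothetical.

```python
def compoundify(text, compounds, max_words=4):
    """Join known multi-word compounds with underscores, longest match first."""
    words = text.split()
    result = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shrink the window
        for size in range(min(max_words, len(words) - i), 1, -1):
            candidate = "_".join(words[i:i + size])
            if candidate in compounds:
                result.append(candidate)
                i += size
                break
        else:
            # No compound starts here; keep the single word
            result.append(words[i])
            i += 1
    return " ".join(result)


# Hypothetical compound list, including entries with special characters
known = {"new_york", "new_york_city", "st._louis"}
print(compoundify("my mother-in-law left new york city for st. louis", known))
```

Raising max_words corresponds to handling larger compounds, at the cost of
more candidate lookups per word.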

=over

=item *

04/23/08

=over

=item (1)

Fixed the "excessive ROOTs" bug in *Freq.pl. (Sid)

=item (2)

Fixed the extra verb concept counts in *Freq.pl. (Sid)

=back

=back

=head2 Version 2.04 (Released 04/19/2008)

=over

=item *

04/17/08

=over

=item (1)

Reorganized similarity_server initialization. (Sid)

=item (2)

The similarity server now prints more intuitive messages. (Sid)

=item (3)

Attached timestamps to log messages. (Sid)

=item (4)

Added additional checks to input strings from clients. (Sid)

=back

=item *

04/12/08

=over

=item (1)

Added more detailed description of information content to
rawtextFreq.pl, and made minor copy editing and formatting changes to
other /utils files (TDP)

=item (2)

Made minor copy editing and formatting changes to files in /doc (TDP)

=back

=item *

04/10/08

=over

=item (1)

Moved get_wn_info, stem and vectorFile modules under WordNet, i.e., they
are now WordNet::get_wn_info, WordNet::stem and WordNet::vectorFile. (Sid)

=item (2)

Updated all the modules and programs using the above modules. (Sid)

=item (3)

Added copyright notices in all module and program headers. (Sid)

=item (4)

Added method getCompoundsList() to WordNet::Tools. (Sid)

=item (5)

Made a more distributable version of similarity_server. The
similarity_server is now "daemonized", and is installed in /usr/bin
along with the other utils. (Sid)

=back

=item *

03/23/08

=over

=item (1)

Added SIGNATURE to distribution to enable package verification. (Sid)

=item (2)

Updated MANIFEST to reflect new SIGNATURE. (Sid)

=item (3)

Set the LICENSE to gpl in META.yml and Makefile.PL. (Sid)

=back

=item *

03/17/08

=over

=item (1)

Added NO_META option to Makefile.PL to prevent automatic generation of
META.yml during 'make dist'. (Sid)

=item (2)

Removed unused variable "loaded" from Makefile.PL. (Sid)

=back

=back

=head2 Version 2.03 (Released 03/11/2008)

=over

=item *

03/07/08

=over

=item (1)

Removed all references to WordNet::QueryData from Makefile.PL. This is
based on the following advice present in the ExtUtils::MakeMaker
documentation: "Module installation tools have ways of resolving unmet
dependencies but to do that they need a Makefile". By checking for
the presence of WordNet::QueryData during 'perl Makefile.PL', we are
preventing any opportunity for automated dependency resolution. (Sid)

=item (2)

The WordNet path (if specified by the WNHOME option during 'perl
Makefile.PL') is not checked for validity beforehand, and is now
directly provided as-is to build/Infocontent.PL and build/Depthfiles.PL.
In case of a WNHOME error, now 'make' should fail instead of 'perl
Makefile.PL' (which is more appropriate). (Sid)

=item (3)

Corrected a typo in the DepthFinder.pm synopsis that referred to
getTaxonomyRoot rather than getTaxonomies. Removed some cut-and-paste
documentation from the template used for GlossFinder.pm and
PathFinder.pm. (Ted)

=item (4)

Made synopsis examples WordNet version independent by not hard coding
offsets, etc. Did this in DepthFinder.pm, PathFinder.pm, ICFinder.pm, and
GlossFinder.pm. (Ted)

=item (5)

Made minor changes in path names and file names in the /samples
directory and the /config-files subdirectory. (Ted)

=back

=back

=head2 Version 2.02 (Released 03/04/2008)

=over

=item *

03/04/08

=over

=item (1)

Applied patch from Ben Haskell to fix a bug report (submitted by Quang Do
Xuan) about failing self-similarity of tilde#n#1 using wup and lch
measures. (Sid)

=item (2)

Added tests for above bug to t/wup.t and t/lch.t. (Sid)

=item (3)

Added WordNet::Similarity package version info to similarity.pl --version.
(Sid)

=back

=item *

01/31/08

=over

=item (1)

Changed some default options in the similarity_server.conf configuration.
(Sid)

=item (2)

Reformatted some of the similarity_server code. (Sid)

=back

=item *

01/10/08

=over

=item (1)

Reduced version requirements of some of the PREREQ_PM modules. (Sid)

=item (2)

Changed WordNet::QueryData requirements to v1.40 in the documentation.
(Sid)

=back

=back

=head2 Version 2.01 (Released 10/14/2007)

=over

=item *

10/13/07

=over

=item (1)

Fixed error in loading WordNet::Tools for similarity_server.pl. (Sid)

=item (2)

Removed the use of default (hardcoded) stoplist and word-vectors file for
similarity_server.pl. (Sid)

=item (3)

Print WordNet hash-code instead of WordNet version, for similarity.cgi
WordNet version information. (Sid)

=back

=item *

10/09/07

=over

=item (1)

Updated the PathFinder code to handle loops in the WordNet is-a hierarchy
(like the one in WN3.0). (Sid)

=item (2)

Updated MANIFEST, changelog and documentation to reflect the new changes.
(Sid)

=back

=item *

10/08/07

=over

=item (1)

The modules now are not dependent on the version() method of
WordNet::QueryData (which is no longer reliable). Instead they now use a
'hash-code' representing a specific WordNet distribution. (Sid)

=item (2)

Added module WordNet::Tools which provides the hashCode and compoundify
methods used by most of the other modules and utilities. (Sid)

=item (3)

Completely modified the build procedure to generate data files during the
'make' step instead of the 'perl Makefile.PL' step. (Sid)

=item (4)

Removed the WordNet version numbers appended to synsetdepths.dat and
treedepths.dat. (Sid)

=item (5)

Added two "build" utilities -- build/Infocontent.PL and
build/Depthfiles.PL -- which are run during the 'make' step to generate
data files. (Sid)

=item (6)

The default WordNet version is now v3.0. Changed all documentation, code
and examples to reflect this. (Sid)

=item (7)

The package now requires WordNet::QueryData version 1.46 or above. (Sid)

=item (8)

Revised all tests and test-keys for the new code and new version of
WordNet and QueryData. (Sid)

=item (9)

Removed the multiple pieces of code implementing "compoundify" and moved
it all into a single method in WordNet::Tools. (Sid)

=back

=item *

10/04/07

=over

=item (1)

Included a default word vectors file in the distribution and eliminated
the creation of a default word vectors file at install time. (Sid)

=back

=item *

02/25/07

=over

=item (1)

Fixed documentation where module WordNet::Similarity::path was referred
to as WordNet::Similarity::edge (old name). (Sid)

=back

=item *

01/30/07

=over

=item (1)

Fixed wnDepths.pl man-page to display the wnpath option consistently in
the usage and the description. (Sid)

=item (2)

Fixed the "deep recursion" error (only with WN3.0) in the findWPSDepths()
subroutine in the wnDepths.pl script. (Sid)

=back

=back
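
The 'hash-code' introduced on 10/08/07 identifies a WordNet distribution
by its contents rather than by a reported version number. The Python
sketch below shows one way such a fingerprint could be computed. It is an
assumption for illustration only: the choice of SHA-1, the file names and
the function distribution_hash are hypothetical and are not taken from
WordNet::Tools.

```python
import hashlib
import tempfile
from pathlib import Path


def distribution_hash(directory, filenames):
    """Return a hex digest computed over the named files in a fixed order."""
    digest = hashlib.sha1()
    for name in sorted(filenames):
        digest.update(name.encode("utf-8"))
        digest.update((Path(directory) / name).read_bytes())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in files; a real WordNet dict directory is much larger
    (Path(tmp) / "data.noun").write_bytes(b"noun data")
    (Path(tmp) / "data.verb").write_bytes(b"verb data")
    before = distribution_hash(tmp, ["data.noun", "data.verb"])

    # Any change to the data changes the hash-code
    (Path(tmp) / "data.verb").write_bytes(b"verb data, revised")
    after = distribution_hash(tmp, ["data.noun", "data.verb"])

print(before != after)
```

Precomputed data files can then record the hash-code they were built
from, so a mismatch with the installed WordNet is detectable.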

=head2 Version 1.04 (Released 12/13/2006)

=over

=item *

12/13/06

=over

=item (1)

Fixed major bug reported in vector_pairs, where every alternate function
is skipped because of a loop variable being incremented twice. (Sid)

=back

=item *

04/21/06

=over

=item (1)

The web interface was still not working for the vector measure, because
only one side of the client-server interface had been updated. Updated
the similarity server with code to support both the vector and
vector_pairs measures. (Sid)

=item (2)

Updated the description of the Gloss Vector measure in measures.html (web
interface). (Sid)

=back

=back
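
The vector_pairs bug fixed on 12/13/06 belongs to a common class of error:
a loop index that is advanced twice per iteration visits only every other
element. The Python sketch below reproduces that class of bug with
hypothetical function names; it is not the vector_pairs code.

```python
def call_all_buggy(functions):
    """Reproduces the bug: the index is advanced twice per iteration."""
    called = []
    i = 0
    while i < len(functions):
        called.append(functions[i]())
        i += 1  # intended increment
        i += 1  # stray second increment skips the next function
    return called


def call_all_fixed(functions):
    """Visits every function exactly once."""
    return [function() for function in functions]


steps = [lambda: "a", lambda: "b", lambda: "c", lambda: "d"]
print(call_all_buggy(steps))
print(call_all_fixed(steps))
```

Iterating over the collection directly, as in the fixed version, leaves no
index to mismanage.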

=head2 Version 1.03 (Released 04/14/2006)

=over

=item *

04/14/06

=over

=item (1)

Applied Ben Haskell's patch to ICFinder.pm (to make the behaviour of the
probability() and IC() functions consistent with their comments).

=back

=item *

04/05/06

=over

=item (1)

Updated the names for the Extended Gloss Overlaps measure and the Gloss
Vector measure in the documentation. (Sid)

=back

=item *

02/19/06

=over

=item (1)

Updated PODs for all modules. (Sid)

=item (2)

Added tests for POD errors and for POD coverage. (Sid)

=back

=item *

03/31/06

=over

=item (1)

Changed "hash-style" constants (Perl v5.8) to single line constants (Perl
v5.6) for compatibility with Perl v5.6.0. (Sid)

=back

=back

=head2 Version 1.02 (Released 02/07/2006)

=over

=item *

02/06/06

=over

=item (1)

Added utility rankFormat.pl for ranking the output of similarity.pl and
making the output suitable for input to rank.pl (to compute Spearman's
correlation coefficient) of the Text::NSP package. (Sid)

=back

=item *

01/15/06

=over

=item (1)

Fixed issue in lesk.pm where undefined values for $wc1 and $wc2 caused
errors with the normalize option. (Sid)

=item (2)

Fixed minor UI issues in wnDepths.pl. (Sid)

=back

=back

=head2 Version 1.01 (Released 12/21/2005)

=over

=item *

12/09/05

=over

=item (1)

Modified get_wn_info.pm with Wybo Wiersma's changes. (Sid)

=item (2)

Modified lesk.pm, vector.pm and vector_pairs.pm to be compatible with
above changes. (Sid)

=back

=item *

12/07/05

=over

=item (1)

Updated all utilities to use WordNet 2.1 (WordNet::QueryData 1.39 or
above). (Sid)

=item (2)

Updated all modules and test cases for WordNet 2.1. (Sid)

=back

=item *

12/05/05

=over

=item (1)

Changed order of authors in package documentation. (Sid)

=back

=back

=head2 Version 0.16 (Released 12/12/2005)

=over

=item *

12/01/05

=over

=item (1)

Added Wybo Wiersma's super-gloss caching code to GlossFinder.pm. (Sid)

=item (2)

Updated documentation to reflect above changes. (Sid)

=back

=back

=head2 Version 0.15 (Re-released 12/11/2005)

=over

=item *

12/11/05

=over

=item (1)

The tar file for the June 12 release of v0.15 unpacked as
WordNet-Similarity. It now unpacks as WordNet-Similarity-0.15, which is
consistent with all previous versions. (Ted)

=item (2)

The Similarity.pm version was shown as 0.14; it is now 0.15. Our general
convention for modules is that a module's version number changes only when
the module itself changes, so the version number tells you when the
module last changed. However, for Similarity.pm this is needlessly
confusing, so it will always carry the same version number as the
release. (Ted)

=back

=back

=head2 Version 0.15 (Released 6/12/2005)

=over

=item *

06/10/05

=over

=item (1)

Fixed a minor bug in MANIFEST. (Sid)

=item (2)

Updated modules.pod and developers.pod to reflect new software
architecture. (Jason)

=back

=back

=head2 Version 0.14 (Released 6/9/2005)

=over

=item *

06/08/05

=over

=item (1)

Re-introduced the previous (non-pairwise-comparison) vector. (Sid)

=item (2)

Updated documentation and test cases to support the new vector measure.
(Sid)

=item (3)

Added default relation file for new vector measure. (Sid)

=item (4)

Expunged erroneous references to LCSFinder, esp. in test scripts. (JM)

=back

=back

=head2 Version 0.13 (Released 5/9/2005)

=over

=item *

04/21/05

=over

=item (1)

removed LCSFinder module; moved LCS methods to DepthFinder, ICFinder, and
PathFinder (JM)

=item (2)

renamed vector measure vector_pairs (JM)

=back

=item *

03/24/05

=over

=item (1)

Modified the documentation to reflect the relation file format for vector
and for lesk. (Sid)

=back

=item *

03/02/05

=over

=item (1)

Set up selective test cases for "make test", depending upon the default
data files installed by user. (Sid)

=back

=item *

02/24/05

=over

=item (1)

Reinstated default relation files for vector and lesk. In case
the default relation files (vector-relation.dat and lesk-relation.dat) are
missing, both modules would default to the glosexample-glosexample
relation. (Sid)

=item (2)

Modified Makefile.PL to query the user before installing default
data files. (Sid)

=item (3)

Removed infocontent file generation code from Makefile.PL. Now
Makefile.PL simply calls utilities from the /utils directory (wnDepths.pl,
semCorFreq.pl and wordVectors.pl) to generate all the default data
files. (Sid)

=item (4)

Installation process now generates a default word vectors
file. The vectordb configuration variable for vector is now optional.
(Sid)

=item (5)

Earlier, the WNHOME option was given to Makefile.PL as --WNHOME
<path>, whereas the PREFIX option was written as PREFIX=<path>. This
inconsistent (and potentially confusing) notation has now been fixed. Now,
the WNHOME option is provided to Makefile.PL as WNHOME=<path>. (Sid)

=item (6)

Added some basic tests for vector in t/vector.t.

=back

=item *

12/11/04

=over

=item (1)

Created WordNet::Similarity::GlossFinder.pm, a super-class of
WordNet::Similarity::vector and WordNet::Similarity::lesk. (Sid)

=item (2)

Removed default relation file for lesk. Vector and lesk both
default to glosexample-glosexample. (Sid)

=back

=back

=head2 Version 0.12 (Released 10/29/04)

=over

=item *

10/29/04

=over

=item (1)

Added vector to the CGI interface. (JM)

=item (2)

Incorporated a configuration file into similarity_server.pl. (JM)

=back

=item *

10/28/04

=over

=item (1)

Removed readDB.pl. (JM)

=back

=item *

10/27/04

=over

=item (1)

Modified string overlap finding in lesk to use the Text::OverlapFinder
module. Removed string_compare.pm. This fixed an old bug where the
relatedness of word1 and word2 wasn't always equal to the relatedness of
word2 and word1. (JM)

=item (2)

Updated Makefile.PL, INSTALL, and doc/install.pod to reflect new
dependency on Text::OverlapFinder. (JM)

=item (3)

Removed lib/dbInterface.pm and lib/string_compare.pm from MANIFEST. (JM)

=back

=item *

10/19/04

=over

=item (1)

Word vectors are no longer stored in a BerkeleyDB database; a plain text
file is now used. Modified wordVectors.pl and WordNet::Similarity::vector
to use the plain text word vectors file. The new module vectorFile.pm is
now used to access this plain text database. Module dbInterface.pm is
obsolete. (Sid)

=item (2)

Modified Makefile.PL to no longer check for BerkeleyDB dependency. All
modules are installed. (Sid)

=back

=back

=head2 Version 0.11 (Released 09/23/04)

=over

=item *

09/23/04

=over

=item (1)

Fixed bug in wup that allowed some relatedness scores to be greater
than 1.  This bug is discussed in the archives of the mailing list. (JM)

=back

=back
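
For context on the 0.11 fix: the Wu-Palmer measure is commonly defined as
twice the depth of the least common subsumer (LCS) divided by the sum of
the depths of the two concepts. Since the LCS can be no deeper than either
concept, a score above 1 indicates inconsistent depths. The Python sketch
below uses that textbook definition with hypothetical depth values; it does
not reproduce wup.pm or the specific cause of the bug.

```python
def wup(depth_lcs, depth_1, depth_2):
    """Wu-Palmer similarity from three depths (textbook definition)."""
    if depth_lcs > min(depth_1, depth_2):
        # An LCS deeper than a concept it subsumes means the depths were
        # measured along inconsistent paths
        raise ValueError("LCS cannot be deeper than either concept")
    return 2 * depth_lcs / (depth_1 + depth_2)


print(wup(3, 4, 5))  # LCS above both concepts
print(wup(4, 4, 4))  # self similarity gives the maximum score of 1.0
```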

=head2 Version 0.10 (Released 09/03/04)

=over

=item *

09/01/04

=over

=item (1)

Modified vector to look like the other measures. It now is derived
from WordNet::Similarity.pm. (Sid)

=item (2)

Updated the MANIFEST. (Sid)

=item (3)

Fixed some minor typos in Makefile.PL. (Sid)

=item (4)

Added single test case (for vector) to t/access.t. (Sid)

=item (5)

Fixed config option name conflict in WordNet::Similarity.pm. (JM)

=item (6)

Fixed WNHOME and WNSEARCHDIR related bugs. (JM)

=item (7)

Updated documentation for the web interface. (JM)

=back

=back

=head2 Version 0.09 (Released 05/19/04)

=over

=item *

05/19/04

=over

=item (1)

Fixed over-counting problem in *Freq.pl programs.  Under certain
conditions, word senses would sometimes get counted twice. (JM)

=item (2)

Updated *Freq.pl programs to use WordNet 2.0. (JM)

=item (3)

Input files to rawtextFreq.pl are now specified with the --infile option.
(JM)

=item (4)

Improved speed of compound identification in rawtextFreq.pl by adding
',', ';', and ':' to the list of characters that we consider to be the
end of a sentence (compound identification time is proportional to the
square of the length of the sentence). (JM)

=back

=back
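
Item (4) relies on the fact that splitting a sentence into shorter
segments reduces quadratic work. The Python sketch below counts candidate
word spans, n(n+1)/2 for a segment of n words, with and without the extra
boundary characters. The sample text and the function names are
hypothetical, and the span count is only a stand-in for the real cost in
rawtextFreq.pl.

```python
import re


def segments(text, boundaries):
    """Split text at any boundary character and return lists of words."""
    pattern = "[" + re.escape(boundaries) + "]"
    return [part.split() for part in re.split(pattern, text) if part.split()]


def candidate_spans(words):
    """Number of contiguous word spans in a segment of len(words) words."""
    n = len(words)
    return n * (n + 1) // 2


text = "the dog, a domestic animal; barked at the mail carrier: loudly."
coarse = segments(text, ".!?")     # sentence-final punctuation only
fine = segments(text, ".!?,;:")    # also split at ',', ';' and ':'

print(sum(candidate_spans(s) for s in coarse))
print(sum(candidate_spans(s) for s in fine))
```

The trade-off is that a compound spanning one of the added boundary
characters can no longer be found.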

=head2 Version 0.08 (Released 04/28/04)

=over

=item *

04/28/2004

=over

=item (1)

Created a CGI-based web interface for the relatedness modules. (JM)

=back

=item *

04/19/2004

=over

=item (1)

Fixed problem with path to Perl interpreter in Makefile.PL.  This was
causing problems during installation if there was no /usr/local/bin/perl.
(JM)

=item (2)

wnDepths.pl had forgotten that on Windows some filenames are different;
for example, data.noun is noun.dat. (JM)

=back

=back

=head2 Version 0.07 (Released 03/24/04)

=over

=item *

03/23/2004

=over

=item (1)

In /t, save diff files between 0.06 and 0.07. Make sure to run diff
tests for path/0.07 and edge/0.06.

=back

=item *

03/16/2004

=over

=item (1)

make sure that every .pm and .pl file has the same GNU copyleft language.
Use PathFinder.pm as a template.

=item (2)

make sure that documentation is clear that vector and lesk require
different format relation files (ie they are not interchangeable).

=item (3)

convert README into a series of pod documents in the doc directory. In
intro.pod, provide a table-of-contents-like structure (much like
perldoc perl does).

Make sure that each pod document follows the CPAN style (name, synopsis,
etc.). This should be true of any pod documentation in the package.

=item (4)

Modify INSTALL to describe local install correctly. In particular,
the description of how to do a 'use lib' or -I may need adjustment.

=back

=item *

03/12/2004

=over

=item (1)

Make developers.pod into a self contained document that provides a step by
step tutorial on how to write a measure of relatedness. The file
NewStats.txt in NSP provides an example of the style of presentation that
is expected.

=item (2)

developers.pod should be a tutorial that explains how to create a new
measure. It should take the reader through a complete example, such as
creating a measure that returns the sum of the information content of the
concepts found in the shortest path between two concepts. This should
include an example of how to use all of the available configuration
options, and also of how to add a new one.

=back

=item *

03/11/2004

=over

=item (1)

document measure modules (lch.pm, wup.pm, etc.) with information about
the effect of the hypo root node. Take the discussion from email
explaining why it does or does not have an effect, and make it a part of
the .pm perldoc. This will eventually be used in thesis writing, so it
should be complete and detailed. Of particular importance is the behavior
of lch.pm, but all of the modules should have their expected behavior
with and without the hypo root node clearly documented. Also, you should
note what the behavior was in 0.06 for both nouns and verbs, and whether
this has changed.

=back

=item *

03/09/2004

=over

=item (1)

lch.pm does not yet support not having a hypo root. Remember that
the lack of hypo root will change (potentially) the max path length found
for each taxonomy.

=back

=item *

03/08/2004

=over

=item (1)

depth finding code should be contained within DepthFinder.pm. We should
not do any depth finding on the fly; rather, that should all be precomputed
(as we do for info content). That includes the depth of individual
concepts, and the max depths of taxonomies.

=item (2)

When wup.pm encounters two or more paths to the root, the trace output
"condenses" those paths into a single path. It would be better to show
all paths in the trace (as res does, for example). Also, make sure that
the depth reported in such cases is always the minimum (shortest path
to root).

=back

=item *

03/05/2004

=over

=item (1)

Modify wnDepths such that it shows both the depths of individual concepts
and the max distance from a root node. In the case of multiple
inheritance, wnDepths should show the depth of the concept in each case,
and also the relevant root node. wnDepths should sort these depths
from shortest to longest. The output of wnDepths should be formatted like
infocontent.dat, anticipating an eventual merger.

=back

=item *

03/02/2004

=over

=item (1)

in docs, update/replace current discussion of modules. Include example
usage as well. Make sure that path length is clearly defined for
lch, edge, and wup.

=back

=item *

02/25/2004

=over

=item (1)

In PathFinder.pm, Infocontent.pm, Similarity.pm, and LCSFinder.pm
each function should be documented in perldoc form such that their input,
output and basic functionality is described. This should then appear in
the DESCRIPTION portion of the perldoc. The SYNOPSIS should contain
examples or templates of each function being used.

=back

=item *

02/23/2004

=over

=item (1)

redo random pairs testing such that we have 60 noun-noun pairs, 25
verb-verb pairs, and 15 mixed pairs.

=back

=item *

02/20/2004

=over

=item (1)

Revisit the distance versus similarity issue in jcn.pm. It may be that
simply inverting the distance is too extreme a solution. One possibility
is to make it a linear transformation via maxdist - dist instead. (JM -
we'll stick with inverting the distance, but added a discussion of this
issue to the documentation)

=back
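
The two options discussed for jcn can be compared numerically. The
Jiang-Conrath distance is IC(c1) + IC(c2) - 2 * IC(lcs). The Python sketch
below converts it to a similarity both by inversion and by the linear
transformation maxdist - dist. The information content values and the
max_distance bound are hypothetical, and the handling of a zero distance
is an assumption rather than the behaviour of jcn.pm.

```python
def jcn_distance(ic_1, ic_2, ic_lcs):
    """Jiang-Conrath distance from information content values."""
    return ic_1 + ic_2 - 2 * ic_lcs


def inverted(distance):
    """Similarity as the reciprocal of the distance."""
    if distance == 0:
        # Assumption: the caller must decide how to score identical concepts
        raise ValueError("distance is zero; reciprocal is undefined")
    return 1.0 / distance


def linear(distance, max_distance):
    """Similarity as a linear transformation of the distance."""
    return max_distance - distance


distance = jcn_distance(8.0, 6.0, 5.0)  # hypothetical IC values
print(distance)
print(inverted(distance))
print(linear(distance, max_distance=20.0))
```

Inversion compresses large distances toward zero and grows without bound
as the distance shrinks, while the linear form keeps differences
proportional but needs a known maximum distance.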

=item *

02/18/2004

=over

=item (1)

document all multiple inheritance issues that are being handled for
measures.

=back

=item *

02/16/2004

=over

=item (1)

validateSynset should check wps format fairly closely, and issue
descriptive errors if the wps is ill formed. Words can apparently be
about anything (except #) but pos should be lower case nvra, and senses
should be digits. Error messages should point out which field is the
problem, or if there are too few or too many fields.

=item (2)

place all hypo root handling node code in PathFinder.pm. The measures
should not have any hypo root handling code in them.

=item (3)

PathFinder.pm should include a function getAllPaths that returns
all paths between two concepts, their lengths, and their "tops" (the
candidate LCSs). This should be used as the main source of input
for the getLCS* functions, and for getShortestPath.

=item (4)

remove all "input verification" code from the measures. That should
be inherited from Similarity.pm.

=item (5)

There is replicated code in the measure modules that checks validity of
input. This should be removed to a common module that can be called by
all of the measures. Any other replicated code should be removed as well.
The goal of 0.07 is to largely eliminate replicated code via the use of
inheritance, and to make the writing of new measures simpler.

=back
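
The wps checks described in item (1) for validateSynset can be sketched as
follows. This Python version only illustrates the rules stated there (word
may be nearly anything except #, pos is one of n, v, r, a in lower case,
sense is digits) and reports which field is at fault. The function name
validate_wps and the message wording are hypothetical.

```python
import re


def validate_wps(wps):
    """Return None if wps is well formed, else a descriptive error message."""
    fields = wps.split("#")
    if len(fields) < 3:
        return f"too few fields in '{wps}' (expected word#pos#sense)"
    if len(fields) > 3:
        return f"too many fields in '{wps}' (expected word#pos#sense)"
    word, pos, sense = fields
    if not word:
        return f"word field is empty in '{wps}'"
    if pos not in ("n", "v", "r", "a"):
        return f"pos field '{pos}' in '{wps}' must be one of n, v, r, a"
    if not re.fullmatch(r"[0-9]+", sense):
        return f"sense field '{sense}' in '{wps}' must be digits"
    return None


for example in ["cat#n#1", "cat#dog#1", "cat#n#n", "cat"]:
    print(example, "->", validate_wps(example) or "ok")
```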

=item *

02/13/2004

=over

=item (1)

add pod/perldoc to lib/ICFinder.pm. Should also be done for all other
files as they are modified for other reasons. In particular, introductory
material that appears in source code comments, author information, GPL,
etc. should be moved into pod and removed from source code comments. See
similarity.pl for an example.

=item (2)

path should use getShortestPath from PathFinder.pm.

=back

=item *

02/09/2004

=over

=item (1)

getLCSDepth, getLCSInfo, getLCSPath should appear in LCSFinder.pm, which
should inherit from both ICFinder and PathFinder.

=item (2)

The measures (lch, path, jcn, lin, res, wup) should  default to having
the hypo root node turned on (for both nouns and verbs). This will
eventually be true of hso, but is not currently. hypo root nodes could
also be used for lesk and vector, although they are not currently.

=back

=item *

02/04/2004

=over

=item (1)

Wps and offsets will be supported internally. The user can request
either mode via an option to getRelatedness. Offset is our default.
Profiling has shown wps to be somewhat faster, in that it makes fewer
calls to getSense, although it does make some. For input, we only
support wps. For trace output and for regular output, we support both
wps and offset.

=back

=item *

01/29/2004

=over

=item (1)

modify option in config files such that an option without a value reverts
to the default in all cases (except vectordb).

=back

=item *

01/24/2004

=over

=item (1)

Provide support for undefined values in the path finding and info content
measures (path, wup, lch, res, lin, jcn). If two concepts are not in the
same taxonomy then an error should be issued and a large negative
integer should be returned. This can occur in two cases, between the same
part of speech (noun-noun, verb-verb), or between nouns and verbs. Distinct
error messages should be issued in the two cases.

=back

=item *

01/20/2004

=over

=item (1)

Clean up configuration file examples (in samples). Make them consistent
by having a master list (all-options.conf) that is what we make changes
to. Then specific example files can be created via copy and paste. Make
sure all possible options for a measure are included, and that the
explanations describe all possible values as well as default handling.
(TDP updated all-options.conf on 12/10/03, use this as source of cut and
paste).

=back

=item *

01/19/2004

=over

=item (1)

Create test scripts that can be run to verify the correctness of output
- they should include "correct" answers that can be compared to
(automatically) and rerun as the system changes. We should use
the CPAN module Test::More, and create .t files in a /t directory that
test specific situations/problems, etc. The .t files themselves should be
documented with an explanation of what is being tested. We should have
lots of smaller, specific .t tests (rather than a few big test files).
Whenever a bug is found and fixed, a .t file should be created that
tests the fix, and this should be mentioned in the source code comments
where the fix is made (this fix is tested by t/xyz.t).

Make sure that the testing system can be easily extended/modified, and
that it can support the use of multiple input files and configuration
files. We should have multiple *.t files to run our tests, and each module
and utility should have at least its own *.t file (maybe more than one
in some cases). We should also have *.t files that are dedicated to
particular situations that affect a number of measures (like what happens
when info content is zero for one concept, what happens if one of
the concepts being compared is the lcs of the other, what if the two
concepts are the same (self similarity), and so forth).

=item (2)

Test cases for configuration file handling should include:

repeated options in configuration file, as in

    trace::0
    trace::1

bad values in configuration file, as in

    trace::nothankyou

bad options in configuration file, as in

    tracer::0

=item (3)

Test cases for similarity.pl should include:

ill formed file input for similarity.pl, as in

    cat#dog#1 cat#n#2
    cat#n#n cat#n#2
    cat

=item (4)

Test cases for measures should include:

show that wps and offset methods of path finding are equivalent

check trace output for each of the measures. use wps format, as that
is subject to fewer changes than offsets.

a "big" file of word pairs (maybe 100 pairs) that run all the measures
and  compare values to what is obtained in 0.6. If there are differences,
let's see what they are.

=item (5)

Test cases for information content programs should include:

an information content file based on one of our resident text files
that is large enough to be interesting (readme, gpl, etc.) as computed in
0.6/0.7 (should be the same). This can be used as a reference point when
we make changes in future.

Information content computed with a very small number of concepts, to
expose the counting problem that ted mentions below.

=item (6)

Test cases for wnDepth...

Generate output for 0.07 to use as a point of reference. A few
specific manual checks would be good too (leather_carp, entity, etc.)

=item (7)

run tests to determine where the system now provides different results
from version 0.06 - make sure to document these cases (that are
different).

=back

=item *

01/12/2004

=over

=item (1)

document configuration options extensively in a separate pod called
doc/config.pod. Organize such that you have options that are used
with all measures, and then those that are used with certain classes
of measures. Then, use this as a master copy to update .pm files
with.

=back

=item *

01/09/2004

=over

=item (1)

modify option handling such that multiple occurrences of an option in
a config file cause an error. For example

  trace::
  trace::1

should cause an error.

=back
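
The configuration rules collected in these entries (the name::value
syntax, an omitted value reverting to the default, and errors for repeated
options, bad values and unknown options) can be sketched in Python as
follows. The option tables ALLOWED and DEFAULTS are illustrative
assumptions, not the package's actual option list, and parse_config is a
hypothetical name.

```python
ALLOWED = {"trace": {"0", "1", "2"}, "cache": {"0", "1"}}  # illustrative only
DEFAULTS = {"trace": "0", "cache": "1"}                    # illustrative only


def parse_config(lines):
    """Parse name::value lines, raising ValueError on any invalid line."""
    options = {}
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        name, separator, value = line.partition("::")
        if not separator:
            raise ValueError(f"line {number}: expected name::value")
        if name not in ALLOWED:
            raise ValueError(f"line {number}: unknown option '{name}'")
        if name in options:
            raise ValueError(f"line {number}: option '{name}' repeated")
        if value == "":
            value = DEFAULTS[name]  # an omitted value reverts to the default
        elif value not in ALLOWED[name]:
            raise ValueError(f"line {number}: bad value '{value}' for '{name}'")
        options[name] = value
    return options


samples = {
    "valid": ["trace::1"],
    "default": ["trace::"],
    "repeated": ["trace::", "trace::1"],
    "bad value": ["trace::nothankyou"],
    "bad option": ["tracer::0"],
}
outcomes = {}
for label, lines in samples.items():
    try:
        outcomes[label] = parse_config(lines)
    except ValueError as error:
        outcomes[label] = str(error)
    print(label, "->", outcomes[label])
```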

=item *

12/17/2003

=over

=item (1)

SemCor1.7Freq.pl and SemTagFreq.pl need to be renamed. They are now
called semCorRawFreq.pl and SemCorFreq.pl. semCorRawFreq.pl counts without
sense tags and  SemCorFreq.pl counts the  sense tags. (TDP)

=back

=item *

12/09/2003

=over

=item (1)

In similarity.pl cache error strings that indicate that two input synsets
are from different parts of speech so that we only print out a warning
once for each unique word1#pos1 word2#pos2 combination (JM)

=item (2)

=over

=item (a)

Enhance similarity.pl file handling (for input files). Comments should
be allowed - this will help in creation of test data (we can explain in
the comment what "case" is being tested by a particular set of pairs).
Use the standard Perl commenting style: a line starting with a # is a
comment. Note that I don't think we can use the convention of # anywhere
in a line as being the start of a comment (due to w#p#s), but I think any
line that starts with a # can be safely treated as a comment. (JM -- we
are using // to indicate the start of a comment)

=item (b)

Enhance similarity.pl file handling (for input files). At present if
a single word (not a pair) appears on a line, no error is issued; the
line is silently ignored. This should result in an error to the effect
that the input format is invalid (only one word). Also, I'm not sure
what happens if you have more than two words on a line. An error of
some sort would also be necessary in that case. Also, I am not sure if
similarity.pl checks to see that the word pairs are "well formed", that
is to say, whether they adhere to the word, word#pos, or word#pos#number
format. It would be good to have a simple check that verifies we have
alphanumeric words, pos of n, v, a, or r, and numeric sense numbers. (JM)

=back

=back

=item *

12/08/2003

=over

=item (1)

Clean up configuration file examples (in samples). Make them consistent
by having a master list (all-options.conf) that is what we make changes
to. Then specific example files can be created via copy and paste. Make
sure all possible options for a measure are included, and that the
explanations describe all possible values as well as default handling.
(JM)(TDP updated all-options.conf on 12/10/03, use this as source of cut
and paste).

=item (2)

Determine if it is feasible (not too difficult or time consuming) to
modify  --version option so it can display both the version of
similarity.pl and the version of the module used when --type is specified.
(JM -- version will show module version as well if a module is specified)

=back

=item *

12/05/2003

=over

=item (1)

all configuration options are now printed to traceString after module
initialization.  (JM)

=item (2)

explain the distinction between compounds and collocations raised in
sample README. (Drop the distinction, and clarify what we mean by
WordNet compounds. TDP Dec 3). (JM)

=back

=item *

12/04/2003

=over

=item (1)

document caching for random (normally random uses an unlimited cache size)
(JM -- random now uses the same default as all other measures)

=item (2)

determine a reasonable default cache size. Should not be unlimited.
Current default is 1000, maybe it can be increased to 5000 or 10000.
Let lesk with trace be the standard as to what is reasonable. (JM --
default is now 5,000).

=item (3)

Improve error handling when processing config files.  Make sure the values
specified are valid and that filenames refer to extant files.  All options
should allow the value to be omitted, in which case the default is used.
(JM)

=back

=item *

12/01/2003

=over

=item (1)

Adjust Makefile.PL to account for new contents of samples directory. Added
entries to MANIFEST as well. (JM)

=item (2)

update samples/sample.pl to run with the new files (and organization)
provided in the samples directory. This was also a problem in 0.06,
where it did not run properly for hso due to a mismatch between the name
specified in sample.pl and the configuration file.

=item (3)

Rename infocontent.dat in Makefile.PL to use our standard name for
semcor information content files. Name should reflect options
used in computing information content values (if any). JM

=item (4)

relation.dat is in lib/WordNet. Should be referred to as
lesk-relation.dat. Should also have vector-relation.dat I would
think. (if not, what does vector do?).  JM (vector doesn't try
finding a default relation file--it fails silently).

=item (5)

/sample/vector-relation.dat is wrong. Calls itself LeskRelationFile. JM

=item (6)

In intro.pod, provide instructions on how to convert to HTML or another
format if the user wishes (even just pointing them to documentation that
describes this elsewhere).  JM

=back

=item *

11/28/2003

=over

=item (1)

remove WordNet 1.7.1 compounds from samples directory. (TDP)

=item (2)

change comment in Similarity.pm to explain the pluses and minuses of
using/not using a unique root node.  (JM)

=back

=item *

11/26/2003

=over

=item (1)

added info content files in samples/Infocontent

=item (2)

changed version numbers to 0.07 in all modules and utils

=item (3)

fixed bug in wup: if the user supplied car#n#1 and auto#n#1, the LCS found
by wup was motor_vehicle#n#1 rather than car#n#1

=item (4)

added POD to all programs in /samples

=back

=item *

11/24/2003

=over

=item (1)

added documentation (in the form of POD) to /doc

=back

=item *

11/21/2003

=over

=item (1)

added /doc directory to contain documentation

=back

=item *

11/18/2003

=over

=item (1)

ensured that each measure initializes a part-of-speech list in _initialize

=item (2)

all measures (except vector) now use fetchFromCache and storeToCache

=item (3)

updated README:

=over

=item (a)

Replaced most references to WordNet 1.7.1 with 2.0

=item (b)

Added some documentation on how to write a new measure

=back

=item (4)

added an INSTALL file

=item (5)

cleaned up /samples.  relation.dat is now named lesk-relation.dat, and
vector-relation.dat was added.  A sample config file is also provided for
each measure (in /samples/config-files)

=back

=item *

11/15/2003

=over

=item (1)

updated jcn, hso, random, and lesk to use the functions that have
been moved to Similarity.pm (such as the cache management functions).

=item (2)

cleaned up the /samples directory.  Removed outdated files.  Put
sample config files in samples/config-files.  Added README in /samples.

=back

=item *

11/12/2003

=over

=item (1)

Added fetchFromCache() and storeToCache() to Similarity.pm to make
caching easier and cleaner.

=item (2)

Updated wup, edge, lch, res, and lin to use fetchFromCache() and
storeToCache().

=back

=item *

10/25/2003

=over

=item (1)

Reduced the amount of duplicated code in the measure modules by
moving some common code to WordNet::Similarity.  WordNet::Similarity is
now a base class for all the measures.  Also added a module called
infocontent.pm from which all information content measures are descended
(i.e., res, lin, jcn).

=item (2)

Removed @ symbol from all email addresses in all files (I think).
This might help keep spammers from harvesting our email addresses.

=back

=back

=head2 Version 0.06

=over

=item *

10/18/2003

=over

=item (1)

Removed dependence of the vector measure on PDL. Implemented
"in-house" sparse vector manipulation functions.

=item (2)

Modified the README with updated documentation of similarity.pl
(--interact option) and wordVectors.pl.

=back

=item *

10/15/2003

=over

=item (1)

Changed Makefile.PL so that it checks for version 1.30 of QueryData

=back

=item *

10/13/2003

=over

=item (1)

Added "maxCacheSize" option to all measures.

=item (2)

Added "maxCacheSize" option info to the man/pod documentation.

=item (3)

Used the new dataPath() method of QueryData 1.31 in all the
utilities to obtain the path of the WordNet data files.

=item (4)

Modified Makefile.PL to check for PDL and BerkeleyDB dependency
during installation. vector.pm is not installed on failed
dependencies.

=back

=item *

10/11/2003

=over

=item (1)

Replaced instances of deprecated WordNet::QueryData::query with
WordNet::QueryData::queryWord in hso.pm

=item (2)

made hso.pm check QueryData version.  queryWord was broken in
QueryData 1.29 and earlier

=item (3)

added support for new relations in WordNet 2.0 to get_wn_info.pm

=item (4)

updated test scripts to work with WN 2.0 (and WN 1.7.1)

=back

=item *

10/06/2003

=over

=item (1)

Added rootNode option to wup.pm

=back

=item *

09/27/2003

=over

=item (1)

Fixed syntax error in wordVectors.pl.

=item (2)

Added readDB.pl to utils.

=item (3)

Changed contact information in docs.

=item (4)

Re-organized the samples subdirectory.

=item (5)

Fixed typo in random.pm.

=item (6)

Updated the MANIFEST.

=back

=item *

09/21/2003

=over

=item (1)

Updated POD for WordNet::Similarity::wup

=item (2)

Added option to wup to specify a cache size in a configuration file.

=item (3)

similarity.pl now 'use's QueryData 1.30 or later. Previous versions
of QueryData will not work.  t/access.t also 'use's QueryData 1.30.
get_wn_info.pm and lesk.pm both check for QueryData 1.30 and will die if
it is not found.

=item (4)

Reorganized the bibliography in README and slightly re-worded part
of the introduction.

=back

=item *

09/18/2003

=over

=item (1)

Added new Wu Palmer measure of similarity
(lib/WordNet/Similarity/wup.pm)

=item (2)

Updated README to mention wup

=item (3)

Added t/wup.t

=item (4)

Updated POD for WordNet::Similarity to mention wup

=item (5)

Updated the help message of similarity.pl to mention wup

=item (6)

Added t/wup.t and lib/WordNet/Similarity/wup.pm to MANIFEST

=back

=item *

09/05/2003

=over

=item (1)

Added '--interact' option to similarity.pl.

=item (2)

Changed the structure of the Vector Relation File.

=item (3)

Fixed a minor bug in similarity.pl. (s///g)

=item (4)

Updated the perldocs for the measures.

=item (5)

Incorporated some new features into the 'wordVectors.pl' utility.
These features were used for thesis experiments.

=item (6)

Added documentation about the Lesk and Vector relation files (they
have different formats now).

=back

=back

=head2 Version 0.05

=over 4

=item *

06/03/2003

=over

=item (1)

Added new measure of semantic relatedness, based on co-occurrence
vectors of WordNet glosses.

=item (2)

Set up the package so that similarity.pl and the other perl
utilities get installed in "/usr/local/bin".

=item (3)

Complete rewrite of similarity.pl with cleaner code and added
functionality:

=over

=item (a)

Multiple parts of speech can be specified as car#nv (noun and verb
forms of car) or cool#nar (noun, adjective and adverb forms of
cool).

=item (b)

Word senses can now be specified as car#n#2, jump#v#2, etc.

=item (c)

Added functionality to similarity.pl to use a local install of
WordNet::Similarity modules (in non-standard directories).

=item (d)

Output of similarity.pl now specifies the senses that represent the
relatedness of two words.

=back

=item (4)

Enforced limit on the cache size of modules.

=item (5)

Updated README to reflect the changes and to specify options for
local installs of similarity.pl and the other utilities.

=item (6)

Fixed the perl docs (removed leading spaces).

=item (7)

Added mailing list address to documentation --
(http://groups.yahoo.com/group/wn-similarity).

=item (8)

Improved jcn and lin tracing ("bird-crane" problem obvious now).

=item (9)

Added new utility wordVectors.pl required for
WordNet::Similarity::vector module.

=back

=back

=head2 Version 0.04

=over 4

=item *

05/02/2003

=over

=item (1)

*Fixed* newline in traces.

=item (2)

*Fixed* blank line bug in brownFreq.pl.

=item (3)

*Fixed* "--offset" option bug in similarity.pl.

=item (4)

*Fixed* lin measure non-normalized scores... added zero infocontent
handling in jcn and lin.

=item (5)

New utility rawtextFreq.pl, to generate information content files
from plain text.

=item (6)

similarity.pl supports option to specify part-of-speech of input
words while measuring relatedness.

=item (7)

Added option to specify (configuration / information content) file in
similarity.pl.

=item (8)

Added Resnik counting option to the information content generation
utilities.

=item (9)

More documentation on information content utilities.

=item (10)

Added Add-1 smoothing option to the information content generation
utilities.

=back

=back

=head2 Version 0.03

=over

=item *

03/10/2003

=over

=item (1)

Removed trace bug in hso.pm.

=item (2)

Added test cases for all modules.

=back

=back

=head2 Version 0.01

=over

=item *

02/10/2003

=over

=item (1)

Created CPAN modules from distance ver 0.11.

=item (2)

Modules are completely object oriented.

=item (3)

Added Adapted Lesk semantic relatedness measure -- lesk.pm.

=item (4)

Added simple edge counting semantic relatedness measure -- edge.pm.

=item (5)

Added a random relatedness measure -- random.pm.

=item (6)

jcn, res and lin measures now support verb hierarchies.

=item (7)

Information content files can now be specified as parameters to the
modules.

=item (8)

Tools provided to build information content files from various
publicly available corpora.

=item (9)

Various parameters now control the behavior of the modules. These
parameters are passed to the modules through 'configuration files'.

=back

=back

=head1 AUTHORS

  Ted Pedersen, University of Minnesota, Duluth
  tpederse at d.umn.edu

  Siddharth Patwardhan, University of Utah, Salt Lake City
  sidd at cs.utah.edu

  Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
  banerjee+ at cs.cmu.edu

  Jason Michelizzi

=head1 SEE ALSO

L<todo.pod>

=head1 COPYRIGHT

Copyright (c) 2005, Ted Pedersen, Siddharth Patwardhan, Satanjeev Banerjee
and Jason Michelizzi

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or any
later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

Note: a copy of the GNU Free Documentation License is available on the web
at L<http://www.gnu.org/copyleft/fdl.html> and is included in this
distribution as FDL.txt.

=cut