WordNet::Similarity Web Interface
=================================

The web interface distribution contains three Perl scripts, which are in the
cgi-bin directory:

cgi-bin/similarity.cgi
cgi-bin/similarity_server.pl
cgi-bin/wps.cgi

The interface employs a client-server model.  The two CGI scripts,
similarity.cgi and wps.cgi, are the clients that request information
from the server, similarity_server.pl. It is similarity_server.pl that
actually interacts with WordNet and computes relatedness scores.

similarity_server.pl can run on the same machine as your webserver or on
a different machine.  Running it on a different machine can be useful if
you have only limited control over the webserver machine (for example,
you might not be able to install WordNet there).

Quick Installation Instructions
-------------------------------

This guide assumes that you are using the Apache webserver.  If you are
using a different server, then the setup process will probably vary
a little.

Step 1
------
Put the similarity.cgi and wps.cgi scripts wherever CGI scripts go on your
webserver (e.g., /usr/local/apache2/cgi-bin). To keep your system
organized, you may want to put them in a subdirectory of your CGI
directory (e.g., /usr/local/apache2/cgi-bin/similarity). These
instructions generally assume the latter.

The similarity_server.pl script can also go in the same directory, or you  
can put it elsewhere. You should keep similarity_server.conf with   
similarity_server.pl, or specify the location of similarity_server.conf  
in similarity_server.pl if for some reason it resides elsewhere. As noted  
above, similarity_server.pl can even run on a different machine.

There are various HTML files, style sheets, and images in the 'doc' directory
of the distribution. These should be put wherever html documents go on  
your webserver (e.g., /usr/local/apache2/htdocs). In order to keep your  
system somewhat organized, you may want to put them in a subdirectory in  
your html directory (e.g., /usr/local/apache2/htdocs/similarity). These 
instructions generally assume the latter. 
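
The layout described in this step can be sketched as a small shell
function.  This is only an illustration: install_web_files and its two
arguments are hypothetical names, the paths are the example locations used
in this guide, and the chmod assumes your webserver needs the CGI scripts
to be executable.

```shell
# Sketch only: install_web_files is an illustrative helper, not part of the distribution.
#   $1 = top of the unpacked web interface distribution (assumed layout: cgi-bin/, doc/)
#   $2 = Apache prefix, e.g. /usr/local/apache2
install_web_files() {
    iwf_dist=$1
    iwf_apache=$2
    iwf_cgi_dir="$iwf_apache/cgi-bin/similarity"
    iwf_doc_dir="$iwf_apache/htdocs/similarity"

    mkdir -p "$iwf_cgi_dir" "$iwf_doc_dir" &&
    # The two client scripts go where CGI scripts are served from.
    cp "$iwf_dist/cgi-bin/similarity.cgi" "$iwf_dist/cgi-bin/wps.cgi" "$iwf_cgi_dir/" &&
    # Assumption: the webserver must be able to execute the scripts.
    chmod 755 "$iwf_cgi_dir/similarity.cgi" "$iwf_cgi_dir/wps.cgi" &&
    # HTML files, style sheets, and images go where static documents are served from.
    cp -R "$iwf_dist/doc/." "$iwf_doc_dir/"
}

# Example call (commented out so nothing is copied by accident):
# install_web_files . /usr/local/apache2
```

similarity_server.pl and similarity_server.conf are left out on purpose,
since they may live in another directory or on another machine.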

Step 2
------

The following three files may need to be edited:

similarity.cgi
similarity_server.pl
wps.cgi

- similarity.cgi
   * change $remote_host to be the hostname or IP address of the machine
     on which similarity_server.pl is located.  If similarity_server.pl is
     running on the same machine as your web server, then 'localhost' or
     '127.0.0.1' will work.  Note: the settings for $remote_host and
     $remote_port in similarity.cgi are not related to Apache's LISTEN
     setting.  In fact, $remote_port needs to be different from the
     port on which Apache is listening.

   * set $doc_base to be the relative path to the HTML files that are
     in the 'doc' directory in the distribution. For example, if you
     have the .cgi files in /usr/local/apache2/cgi-bin/similarity and 
     the HTML files in /usr/local/apache2/htdocs/similarity, then you  
     would set $doc_base to '../../htdocs/similarity'.  Note: this  
     variable is not (closely) related to Apache's DOCUMENT_ROOT setting.

- similarity_server.pl
   * $BASEDIR should be changed to be a directory writable by the UID under
     which the script will be running.  Usually $BASEDIR can be the
     absolute path of the directory in which this script resides.  E.g.,
     if the script is in /home/jsmith, then this could be your $BASEDIR.
     Note: the setting for this variable is not related to Apache's
     DOCUMENT_ROOT setting.

   * $wnlocation should be the location of the WordNet dictionary files.
     If you are using WordNet-2.1 on a Unix box, then
     /usr/local/WordNet-2.1/dict is probably what you want.

- wps.cgi
   * $remote_host should be the same as $remote_host for similarity.cgi. 
     $doc_base is the location of a style sheet (sim-style.css), which
     is often the same as $doc_base in similarity.cgi. 
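
The relative path used for $doc_base in the similarity.cgi example can be
derived rather than worked out by hand.  The sketch below assumes GNU
coreutils realpath is available (the --relative-to and -m options are
GNU-specific); doc_base_for is a hypothetical helper name.

```shell
# Sketch: print the path of the HTML directory relative to the CGI directory.
# Assumes GNU coreutils realpath; -m allows paths that do not exist yet.
doc_base_for() {
    dbf_cgi_dir=$1     # directory holding similarity.cgi and wps.cgi
    dbf_html_dir=$2    # directory holding the files from 'doc'
    realpath -m --relative-to="$dbf_cgi_dir" "$dbf_html_dir"
}

# The example directories from this guide:
doc_base_for /usr/local/apache2/cgi-bin/similarity /usr/local/apache2/htdocs/similarity
# prints ../../htdocs/similarity
```

The printed value is what you would then assign to $doc_base by hand in
similarity.cgi (and usually in wps.cgi as well).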

Step 3
------
Edit the similarity_server.conf file.  This should be in the same 
directory as similarity_server.pl, or the .pl should be edited to 
provide the location of the .conf file. 

There are five options that need values. The default settings look like  
this: 

    lock_file::/var/lock/similarity.lock
    error_log::/var/log/similarity.log
    vectordb::wordvectors.dat
    stop::stoplist.txt
    compounds::wn21compounds.txt

You can usually use the default values for lock_file and error_log.  The
value for the vectordb option is a file generated by the wordVectors.pl
program (one of the utility programs distributed with WordNet-Similarity).
The value of stop is a file containing a list of stopwords, and the value
of compounds is a file containing a list of compound words.  A sample
stop list and a sample compounds file are distributed in the 'samples'
directory of WordNet-Similarity.
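
To double-check what the server will read, the name::value lines can be
pulled out of the file with a short shell helper.  This is a sketch under
the assumption that each option sits on its own line, starting in the
first column, in the name::value form shown above; conf_value is a
hypothetical name, not something shipped with the distribution.

```shell
# Sketch: print the value of one option from a "name::value" style file.
#   $1 = path to the .conf file, $2 = option name (e.g. vectordb)
conf_value() {
    cv_conf=$1
    cv_name=$2
    # Strip the leading "name::" from the matching line and print the rest.
    sed -n "s/^${cv_name}:://p" "$cv_conf"
}

# Example (commented out; assumes the file is in the current directory):
# conf_value similarity_server.conf vectordb
```

With the default settings, asking for vectordb would print
wordvectors.dat, which you can then check for with ls before starting
the server.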

Step 4
------
Start similarity_server.pl in the background:

$ similarity_server.pl &
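
If you start the server from a login shell, it may be stopped when you log
out.  One possible way around this is the standard nohup utility, sketched
below.  start_similarity_server and the output file name are hypothetical,
and the sketch assumes the script is executable and does not detach from
the terminal on its own.

```shell
# Sketch: start a script in the background, immune to hangups, and print its PID.
#   $1 = path to similarity_server.pl
#   $2 = file for stray output (optional; the default name is illustrative)
start_similarity_server() {
    sss_script=$1
    sss_out=${2:-similarity_server.out}
    # Redirect all three standard streams so the process holds no terminal.
    nohup "$sss_script" >"$sss_out" 2>&1 </dev/null &
    echo "$!"
}

# Example (commented out):
# start_similarity_server ./similarity_server.pl
```

The printed PID can be passed to kill later to stop the server.  The
output file here only catches anything written to the terminal; the
server's own log is the error_log set in similarity_server.conf.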

COPYRIGHT
---------

Copyright (c) 2005, Ted Pedersen and Jason Michelizzi

This distribution is free software; you may redistribute and/or modify it
under the terms of the GNU General Public License, version 2 or, at your
option, any later version.

SEE ALSO
--------
http://wn-similarity.sourceforge.net
http://groups.yahoo.com/group/wn-similarity/