   		          /samples README

This directory contains a number of sample files that demonstrate various
aspects of the WordNet::Similarity package and related utilities.

We recommend that you save a copy of the files in this directory for 
future use.  

Information content
===================

The directory /Infocontent is intended to hold precomputed information
content files, which can be downloaded from:

http://www.d.umn.edu/~tpederse/similarity.html

These files are the output of the word counting programs brownFreq.pl,
BNCFreq.pl, treebankFreq.pl, etc. and are used by the information content
measures: res, lin, and jcn.  You will find these files very useful if
you plan on using the res, lin, or jcn measures. 

The file Infocontent/README contains a complete description of the files
that are available and how they were created. Make sure you download the
version of the information content files that matches your version of
WordNet (e.g., WordNet v2.1).
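
For readers new to the res, lin, and jcn measures mentioned above, the
following is a minimal Python sketch of how information content is
conventionally derived from concept counts and then combined by each
measure. The counts, concept names, and function names are hypothetical and
chosen only for illustration. The package's own counting programs and
measure modules deal with smoothing, the WordNet hierarchy, and edge cases
that this sketch ignores.

```python
import math

# Hypothetical concept frequencies (made up for illustration).
# A concept's count includes the counts of everything beneath it.
counts = {"entity": 1000, "vehicle": 100, "car": 40, "bus": 10}
total = counts["entity"]  # count at the root of the hierarchy


def ic(concept):
    """Information content: negative log of the concept's probability."""
    return -math.log(counts[concept] / total)


def res(lcs):
    """Resnik: IC of the least common subsumer (lcs) of the two concepts."""
    return ic(lcs)


def lin(c1, c2, lcs):
    """Lin: twice the IC of the lcs, divided by the summed IC of both concepts."""
    return 2 * ic(lcs) / (ic(c1) + ic(c2))


def jcn(c1, c2, lcs):
    """Jiang-Conrath: inverse of the distance IC(c1) + IC(c2) - 2 * IC(lcs)."""
    return 1 / (ic(c1) + ic(c2) - 2 * ic(lcs))
```

All three measures depend on the same concept counts, which is why they
share the precomputed information content files.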

WordNet Compounds
=================

The file wn21compounds.txt contains a list of "WordNet compounds" as 
found in version 2.1 of WordNet. A WordNet compound is any multi-word
expression in WordNet that includes an underscore (_). These are mainly
nouns, and include proper nouns (winston_churchill), foreign words  
(ipso_facto), expressions (face_to_face), etc. 

This file of compounds is required if you are running one of the
information content programs (BNCFreq.pl, treebankFreq.pl, etc.). It is
also a useful option for the lesk and vector measures.
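
To illustrate what treating compounds as single tokens involves, here is a
minimal Python sketch that joins adjacent words with an underscore when the
joined form appears in a compound list. The in-memory set stands in for
wn21compounds.txt, which is assumed here to hold one compound per line.
join_compounds is a hypothetical helper, not the package's own routine, and
the greedy longest-match strategy is an assumption.

```python
def join_compounds(tokens, compounds, max_len=4):
    """Greedily merge the longest run of tokens that forms a known compound."""
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = "_".join(tokens[i:i + n])
            if candidate in compounds:
                out.append(candidate)
                i += n
                break
        else:
            # No compound starts here; keep the single token.
            out.append(tokens[i])
            i += 1
    return out


# Stand-in for the contents of the compounds file.
compounds = {"winston_churchill", "ipso_facto", "face_to_face"}
tokens = "they met winston churchill face to face".split()
print(join_compounds(tokens, compounds))
```

Trying the longest candidate first keeps face_to_face from being split
whenever a shorter prefix of it is also in the list.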

Stoplist
========

The file stoplist.txt contains a list of stop words.  Stop words are words
excluded from some natural language processing task because the words are
non-informative or misleading.  For example, the word "a" has several 
senses in WordNet: the blood type "A", an abbreviation for angstrom, 
adenine, etc., but most often the word "a" is used as an indefinite article.
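
As a rough illustration of how a stop list is applied, the Python sketch
below filters stop words out of a token list. It assumes one stop word per
line, which may not match the actual layout of stoplist.txt. The helper
names load_stoplist and remove_stop_words are hypothetical.

```python
def load_stoplist(lines):
    """Build a set of stop words from an iterable of lines (one word per line assumed)."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stop_words(tokens, stop_words):
    """Drop every token whose lowercase form is in the stop list."""
    return [t for t in tokens if t.lower() not in stop_words]


# Stand-in for the lines of a stop list file; the real file's layout may differ.
stop_words = load_stoplist(["a\n", "the\n", "of\n", "\n"])
tokens = "A measure of the similarity of a pair".split()
print(remove_stop_words(tokens, stop_words))
```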

Configuration files
===================

The /config-files sub-directory contains a sample configuration file for
each measure module (WordNet::Similarity::res, WordNet::Similarity::lesk,
WordNet::Similarity::wup, etc.).

Relation files
==============

The files vector-relation.dat and lesk-relation.dat are used by the
vector_pairs and lesk measures, respectively. Run
'perldoc WordNet::Similarity::vector_pairs' and
'perldoc WordNet::Similarity::lesk' for more information.

Word files
==========

The file millercharles.txt contains the 30 word pairs used in:

Miller and Charles, 1991, Contextual Correlates of Semantic Similarity.
Language and Cognitive Processes, 6(1):1-28.

The file resnikdiab.txt contains the 27 verb pairs used in:

Resnik and Diab, 2000, Measuring Verb Similarity. In Proceedings of the
Twenty-Second Annual Meeting of the Cognitive Science Society
(COGSCI2000), Philadelphia, August.