Siddharth Patwardhan > WordNet-Similarity-1.04 > WordNet::Similarity::GlossFinder



Module Version: 1.01   Source   Latest Release: WordNet-Similarity-2.07


WordNet::Similarity::GlossFinder - module to implement gloss finding methods for WordNet::Similarity measures of semantic relatedness (specifically, lesk and vector)


  use WordNet::QueryData;

  my $wn = WordNet::QueryData->new;

  use WordNet::Similarity::GlossFinder;

  my $obj = WordNet::Similarity::GlossFinder->new ($wn);



This class is derived from (i.e., is a sub-class of) WordNet::Similarity. Two of the relatedness measures provided in this package, WordNet::Similarity::lesk and WordNet::Similarity::vector, work with WordNet glosses. This module provides methods for easy access to the required glosses.


This module inherits all the methods of WordNet::Similarity. The following methods are also defined.

Public methods


Specifies the parts of speech that measures derived from this module support (namely, nouns, verbs, adjectives and adverbs).

parameters: none

returns: true


Overrides the method of the same name in WordNet::Similarity. Prints module-specific configuration options to the trace string (if tracing is on). GlossFinder supports the module-specific options relation, stop, stem and compounds.

Parameters: none

Returns: nothing


Overrides the configure method in WordNet::Similarity. This method loads various data files, such as the stop words, compounds and relations.

Parameters: $file -- path of the configuration file.

Returns: nothing

$self->getSuperGlosses($wps1, $wps2)

This method returns a list of large blocks of concatenated glosses (super-glosses) for the specified synsets. A super-gloss is the block of text formed by concatenating the gloss of a synset with the glosses of synsets related to it in WordNet. "Related" synsets are identified by the relations listed in the "relations" file. If no relations file was specified in the configuration, only the gloss of each synset itself is returned.

Parameters: wps1 and wps2 -- two synsets.

Returns: List of superglosses for both synsets (2-D array).
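
The idea of a super-gloss can be illustrated without WordNet installed. The following is a minimal sketch, not the module's implementation: the %gloss and %related tables, the relation names 'hype' and 'hypo', the gloss texts and the function super_gloss are all hypothetical stand-ins invented for illustration.

```perl
use strict;
use warnings;

# Toy data standing in for WordNet (hypothetical, for illustration only).
my %gloss = (
    'car#n#1'           => 'a motor vehicle with four wheels',
    'motor_vehicle#n#1' => 'a self-propelled wheeled vehicle',
    'cab#n#3'           => 'a car driven by a person whose job is to take passengers',
);
my %related = (
    'car#n#1' => {
        hype => ['motor_vehicle#n#1'],
        hypo => ['cab#n#3'],
    },
);

# Build a super-gloss: the synset's own gloss followed by the glosses of
# the synsets reached through each listed relation. With an empty relation
# list, only the synset's own gloss is returned.
sub super_gloss {
    my ($wps, @relations) = @_;
    my @parts = ($gloss{$wps} // '');
    for my $rel (@relations) {
        for my $other (@{ $related{$wps}{$rel} // [] }) {
            push @parts, $gloss{$other} // '';
        }
    }
    return join ' ', @parts;
}

print super_gloss('car#n#1', 'hype'), "\n";   # own gloss plus hypernym gloss
print super_gloss('car#n#1'), "\n";           # own gloss only
```

The second call mirrors the case described above in which no relations file is configured, so only the synset's own gloss comes back.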


This method identifies all compounds in a given block of text. It uses the list of compounds present in WordNet. The words of any such compound found in the text are joined with underscores.

Parameters: block -- block of text.

Returns: Compounded block of text.
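
One way such compound detection could work is a greedy left-to-right scan that prefers the longest match. The sketch below is an assumption about the general technique, not the module's actual code: the %compounds list, the $max_words limit and the function name compoundify_sketch are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical compound list; the module takes its list from WordNet.
my %compounds = map { $_ => 1 } qw(beer_mug salt_lake_city motor_vehicle);
my $max_words = 3;   # length, in words, of the longest compound in the toy list

# Greedy scan: at each position try the longest window first and join the
# words with underscores when the result is a known compound.
sub compoundify_sketch {
    my ($block) = @_;
    my @words = split ' ', $block;
    my @out;
    my $i = 0;
    while ($i < @words) {
        my $matched = 0;
        for (my $len = $max_words; $len > 1; $len--) {
            next if $i + $len > @words;          # window would run past the end
            my $candidate = join '_', @words[$i .. $i + $len - 1];
            if ($compounds{$candidate}) {
                push @out, $candidate;
                $i += $len;
                $matched = 1;
                last;
            }
        }
        next if $matched;
        push @out, $words[$i++];                 # no compound starts here
    }
    return join ' ', @out;
}

print compoundify_sketch('a beer mug from salt lake city'), "\n";
# prints "a beer_mug from salt_lake_city"
```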

Private Methods


This method loads relations from a relation file.

Parameters: none

Returns: nothing


The semantic relatedness modules in this distribution are built as classes. The classes define four methods that are useful in finding relatedness values for pairs of synsets.


Typical Usage Examples

To create an object of the path measure, we would have the following lines of code in the Perl program.

   use WordNet::Similarity::path;
   $object = WordNet::Similarity::path->new($wn, '~/path.conf');

The reference to the initialized object is stored in the scalar variable '$object'. '$wn' contains a WordNet::QueryData object that should have been created earlier in the program. The second parameter to the 'new' method is the path of the configuration file for the path measure. If the 'new' method is unable to create the object, '$object' would be undefined. This condition, as well as any other error or warning, can be tested as follows.

   die "Unable to create path object.\n" unless defined $object;
   ($err, $errString) = $object->getError();
   die $errString."\n" if($err);

To create a Leacock-Chodorow measure object, using default values, i.e. no configuration file, we would have the following:

   use WordNet::Similarity::lch;
   $measure = WordNet::Similarity::lch->new($wn);

To find the semantic relatedness of the first sense of the noun 'car' and the second sense of the noun 'bus' using the path measure, we would write the following piece of code:

   $relatedness = $object->getRelatedness('car#n#1', 'bus#n#2');

To get traces for the above computation:

   print $object->getTraceString();

However, traces must be enabled using configuration files. By default traces are turned off.


Many of the methods in this module can work with either offsets or wps strings internally. There are several interesting consequences of each mode.

  1. An offset is not a unique identifier for a synset, but neither is a wps string. An offset only indicates a byte offset in one of the WordNet data files (data.noun, data.verb, etc. on Unix-like systems). An offset along with a part of speech, however, does uniquely identify a synset.

    A word#pos#sense string, on the other hand, is the opposite extreme. A word#pos#sense string is an identifier for a unique word sense. A synset can have several word senses in it (i.e., a synset is a set of word senses that are synonymous). The synset {beer_mug#n#1, stein#n#1} has two word senses. The wps strings 'beer_mug#n#1' and 'stein#n#1' can both be used to refer to the synset. For simplicity, we usually just use the first wps string when referring to the synset. N.B., the wps representation was developed by WordNet::QueryData.

  2. Early versions of WordNet::Similarity::* used offsets internally for finding paths, hypernym trees, subsumers, etc. The module WordNet::QueryData that is used by Similarity, however, accepts only wps strings as input to its querySense method, which is used to find hypernyms. We have found that it is more efficient (faster) to use wps strings internally.
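
The relationship between wps strings and offset-based identifiers can be sketched as follows. This is an illustration only: the %synset_of table and its offset value are invented, and parse_wps and synset_key are hypothetical helpers, not functions of WordNet::Similarity or WordNet::QueryData. The part-of-speech letters n, v, a and r are assumed to be the ones used in wps strings.

```perl
use strict;
use warnings;

# Split a word#pos#sense string into its three fields.
# Returns nothing (undef in scalar context) if the string is malformed.
sub parse_wps {
    my ($wps) = @_;
    return unless $wps =~ /^([^#]+)#([nvar])#(\d+)$/;
    return { word => $1, pos => $2, sense => $3 };
}

# Toy table: two word senses that belong to one synset.
# The offset value is invented for this example.
my %synset_of = (
    'beer_mug#n#1' => { offset => 12345678, pos => 'n' },
    'stein#n#1'    => { offset => 12345678, pos => 'n' },
);

# An offset alone is ambiguous across data files, so the key for a synset
# combines the offset with the part of speech.
sub synset_key {
    my ($wps) = @_;
    my $s = $synset_of{$wps} or return;
    return "$s->{offset}#$s->{pos}";
}

my $fields = parse_wps('beer_mug#n#1');
print "$fields->{word} / $fields->{pos} / $fields->{sense}\n";

# Both wps strings resolve to the same synset key.
print synset_key('beer_mug#n#1') eq synset_key('stein#n#1')
    ? "same synset\n"
    : "different synsets\n";
```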


 Ted Pedersen, University of Minnesota Duluth
 tpederse at

 Siddharth Patwardhan, University of Utah, Salt Lake City
 sidd at




WordNet::Similarity(3) WordNet::Similarity::vector(3) WordNet::Similarity::lesk(3)


Copyright (c) 2005, Ted Pedersen and Siddharth Patwardhan

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

    The Free Software Foundation, Inc.,
    59 Temple Place - Suite 330,
    Boston, MA  02111-1307, USA.

Note: a copy of the GNU General Public License is available on the web at and is included in this distribution as GPL.txt.
