UMLS::Similarity

SYNOPSIS

This package consists of Perl modules, along with supporting Perl programs, that implement the semantic relatedness measure described by Leacock & Chodorow (1998) and a simple path-based measure. We plan to add the measures of Jiang & Conrath (1997), Resnik (1995) and Lin (1998) in the near future.

This package is essentially a copy of Semantic::Similarity, which is a re-implementation of the WordNet::Similarity suite of modules. WordNet::Similarity is tied to the WordNet lexical database. Suppose, however, that we wish to use these techniques in another domain, such as medical informatics. Semantic::Similarity allows one to replace WordNet with a domain-specific taxonomy and use it to find the semantic relatedness of concepts in that domain.

Semantic::Similarity is not tied to a specific database, but it requires an Interface module (such as SnoMed::Interface) to communicate between it and the database. We have created UMLS::Interface to connect this module with the UMLS. In the future, we plan for UMLS::Interface to work seamlessly with all of the Semantic::Similarity functionality, not just what is available in UMLS::Similarity.

The Perl modules are designed as objects with methods that take two word senses as input and return the semantic relatedness of those senses. A quantitative measure of the degree to which two word senses are related has applications in many areas, such as word sense disambiguation and information retrieval. For example, to determine which sense of a given word is being used in a particular context, we can choose the sense that has the highest relatedness to the senses of its context words, since that sense is the most likely one. Similarly, in information retrieval, retrieving documents that contain highly related concepts is likely to yield higher precision and recall.
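
The disambiguation idea described above can be sketched roughly as follows. This is only an illustration: the sense labels, the scores in %score, and the relatedness() and best_sense() functions are all made up for the example and are not part of this package. In practice the scores would come from one of the measure objects.

```perl
use strict;
use warnings;

# Hypothetical relatedness scores between sense pairs (made-up values).
my %score = (
    'bank#finance|money#currency' => 0.50,
    'bank#river|money#currency'   => 0.10,
    'bank#finance|loan#debt'      => 0.33,
    'bank#river|loan#debt'        => 0.08,
);

# Stand-in for a measure object's method: looks up a toy score,
# returning 0 for pairs that are not in the table.
sub relatedness {
    my ($sense1, $sense2) = @_;
    return $score{"$sense1|$sense2"} // 0;
}

# Pick the candidate sense with the highest total relatedness
# to the senses of the context words.
sub best_sense {
    my ($candidates, $context) = @_;
    my ($best, $best_total);
    for my $cand (@$candidates) {
        my $total = 0;
        $total += relatedness($cand, $_) for @$context;
        if (!defined $best_total || $total > $best_total) {
            ($best, $best_total) = ($cand, $total);
        }
    }
    return $best;
}

my $chosen = best_sense(
    ['bank#finance', 'bank#river'],
    ['money#currency', 'loan#debt'],
);
print "$chosen\n";
```

With these toy scores the financial sense accumulates the larger total, so it is the one selected.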

The following sections describe the organization of this software package and how to use it. A few typical examples are given to illustrate the usage of the modules and the supporting utilities.

SEMANTIC RELATEDNESS

    We observe that humans find it extremely easy to say if two words are
    related and if one word is more related to a given word than another.
    For example, if we come across two words -- 'car' and 'bicycle', we know
    they are related as both are means of transport. Also, we easily observe
    that 'bicycle' is more related to 'car' than 'fork' is. But is there
    some way to assign a quantitative value to this relatedness? Some ideas
    have been put forth by researchers to quantify the concept of
    relatedness of words, with encouraging results.

    A number of different measures of relatedness have been implemented in
    this software package. These include a simple edge counting
    approach. The measures require a backend taxonomy that defines concepts
    in a domain (or in general), and some basic relationships between these
    concepts.
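
    As a rough sketch of how edge-counting measures turn a path into a
    score, the example below assumes the common formulations in which
    path length is counted in nodes: the path score is 1 / length, and
    the Leacock & Chodorow score is -log(length / (2 * depth)). The
    function names and the concept identifiers are hypothetical, and the
    exact formulas used by the modules in this package may differ.

```perl
use strict;
use warnings;

# Path measure: inverse of the number of nodes on the shortest path.
sub path_score {
    my (@path) = @_;
    return 0 unless @path;                  # no path found
    return 1 / scalar(@path);
}

# Leacock & Chodorow style score: -log( length / (2 * depth) ),
# where depth is the maximum depth of the taxonomy.
sub lch_score {
    my ($depth, @path) = @_;
    return 0 unless @path && $depth > 0;    # no path or unusable depth
    return -log( scalar(@path) / (2 * $depth) );
}

# Hypothetical concept identifiers forming a three-node path.
my @path = ('C0000001', 'C0000002', 'C0000003');

printf "path = %.4f\n", path_score(@path);
printf "lch  = %.4f\n", lch_score(5, @path);
```

    Shorter paths give higher scores under both formulas; the second one
    additionally scales the path length by the depth of the taxonomy.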

CONTENTS

    All the modules that will be installed in the Perl system directory are
    present in the lib/ directory tree of the package. These include the
    semantic relatedness modules:

      Semantic/Similarity/jcn.pm
      Semantic/Similarity/path.pm

    Once installed in the Perl system directory, these modules can be used
    directly by Perl programs.

    The package contains a utils/ directory that contains Perl utility
    programs. These utilities use the modules or provide some supporting
    functionality.

      queryUMLS.pl -- returns the semantic similarity of two 
                      terms or UMLS CUIs given a specified 
                      measure

INSTALL

    To install these modules, run the following commands:

      perl Makefile.PL
      make
      make test
      make install

    This will install the modules in the standard locations. You will, most
    probably, require root privileges to install in standard system
    directories. To install in a non-standard directory, specify a prefix
    during the 'perl Makefile.PL' stage as:

      perl Makefile.PL PREFIX=/home/sid

    It is possible to modify other parameters during installation. The
    details can be found in the ExtUtils::MakeMaker documentation.
    However, we strongly recommend leaving the other parameters alone
    unless you know what you are doing.

TAXONOMY INTERFACE

    The modules implemented in this package require a backend taxonomy for
    computing semantic relatedness. A taxonomy is provided to these modules
    as an interface object. An interface object (for example
    Snomed::Interface v0.01) is a Perl module that provides certain methods
    that can be used by the Semantic::Similarity modules to access the
    taxonomy. The following methods are expected in the interface object:

      $version = $interface->version();
      $depth   = $interface->depth();
      $bool    = $interface->exists($concept);
      @tList   = $interface->getTermList($concept);
      @cList   = $interface->getConceptList($term);
      @path    = $interface->findShortestPath($concept1, $concept2);
    
    The 'version' method returns the version of the UMLS that is
    being used. The 'depth' method returns the maximum depth of the
    view of the UMLS that is being used. The 'exists' method checks
    whether a concept exists in the view of the UMLS being used. The
    'getTermList' method lists all terms corresponding to a
    concept in the given UMLS view, and the 'getConceptList' method
    retrieves the list of CUIs corresponding to a given term. The
    'findShortestPath' method returns the shortest path between
    two CUIs given the view of the UMLS being used.
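
    To make the expected interface concrete, here is a minimal sketch of
    an object that provides these six methods over a tiny in-memory
    taxonomy. Toy::Interface, the concept identifiers, and the terms are
    all invented for this illustration; the behavior of each method is
    our reading of the description above, not the UMLS::Interface
    implementation.

```perl
use strict;
use warnings;

package Toy::Interface;    # hypothetical name, not part of this package

# Toy taxonomy: child => parent. Concept identifiers are made up.
my %parent = ( C2 => 'C1', C3 => 'C1', C4 => 'C2' );
my %terms  = (
    C1 => ['vehicle'],
    C2 => ['car', 'automobile'],
    C3 => ['bicycle'],
    C4 => ['taxi'],
);

# A concept followed by its ancestors, up to the root.
sub _ancestors {
    my ($c) = @_;
    my @chain = ($c);
    while (defined(my $p = $parent{ $chain[-1] })) { push @chain, $p }
    return @chain;
}

sub new     { return bless {}, shift }
sub version { return 'toy-0.0' }
sub exists  { my ($self, $c) = @_; return defined $terms{$c} ? 1 : 0 }

# Maximum depth, counted in nodes from the root to the deepest concept.
sub depth {
    my $max = 0;
    for my $c (keys %terms) {
        my @chain = _ancestors($c);
        $max = @chain if @chain > $max;
    }
    return $max;
}

sub getTermList {
    my ($self, $c) = @_;
    return @{ $terms{$c} || [] };
}

sub getConceptList {
    my ($self, $term) = @_;
    my @found;
    for my $c (sort keys %terms) {
        push @found, $c if grep { $_ eq $term } @{ $terms{$c} };
    }
    return @found;
}

# Shortest path in a tree: climb from the first concept to the lowest
# common ancestor, then descend to the second concept.
sub findShortestPath {
    my ($self, $c1, $c2) = @_;
    return () unless defined $terms{$c1} && defined $terms{$c2};
    my @up1 = _ancestors($c1);
    my @up2 = _ancestors($c2);
    my %pos2;
    @pos2{@up2} = (0 .. $#up2);
    for my $i (0 .. $#up1) {
        my $j = $pos2{ $up1[$i] };
        next unless defined $j;
        return (@up1[0 .. $i], reverse @up2[0 .. $j - 1]);
    }
    return ();
}

package main;

my $interface = Toy::Interface->new;
my @path = $interface->findShortestPath('C4', 'C3');
print join(' -> ', @path), "\n";
```

    A real interface would query the taxonomy's database instead of the
    hard-coded hashes, but a measure module only needs these method
    calls to behave as described.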

    Right now we know that this package works with UMLS::Interface.

SOFTWARE COPYRIGHT AND LICENSE

    Copyright (C) 2004-2009 Bridget T McInnes,  Siddharth Patwardhan, 
    Serguei Pakhomov and Ted Pedersen

    This suite of programs is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License as published
    by the Free Software Foundation; either version 2 of the License, or (at
    your option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
    General Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program; if not, write to the Free Software Foundation, Inc.,
    59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

    Note: The text of the GNU General Public License is provided in the file
    'GPL.txt' that you should have received with this distribution.

ACKNOWLEDGMENTS

    We would like to thank the following for their support and contribution
    towards the development of this package. We thank Jason Rennie for his
    QueryData package, the WordNet guys at Princeton for WordNet, Resnik,
    Hirst, St-Onge, Jiang, Conrath, Lin, Wu, Palmer, Leacock, and Chodorow
    for their algorithms and work on the relatedness measures. We also thank
    Bano (Satanjeev Banerjee) for his work on the adapted gloss overlap
    module.

REFERENCES

    1   Wu Z. and Palmer M. 1994. Verb Semantics and Lexical Selection. In
        Proceedings of the 32nd Annual Meeting of the Association for
        Computational Linguistics.  Las Cruces, New Mexico.

    2   Resnik P. 1995. Using information content to evaluate semantic
        similarity. In Proceedings of the 14th International Joint
        Conference on Artificial Intelligence, pages 448-453, Montreal.

    3   Jiang J. and Conrath D. 1997. Semantic similarity based on corpus
        statistics and lexical taxonomy. In Proceedings of International
        Conference on Research in Computational Linguistics, Taiwan.

    4   Fellbaum C., editor. WordNet: An electronic lexical database. MIT
        Press, 1998.

    5   Leacock C. and Chodorow M. 1998. Combining local context and WordNet
        similarity for word sense identification. In Fellbaum 1998, pp.
        265-283.

    6   Lin D. 1998. An information-theoretic definition of similarity. In
        Proceedings of the 15th International Conference on Machine
        Learning, Madison, WI.

    7   Hirst G. and St-Onge D. 1998. Lexical Chains as representations of
        context for the detection and correction of malapropisms. In
        Fellbaum 1998, pp. 305-332.

    8   Schütze H. 1998. Automatic Word Sense Discrimination. Computational
        Linguistics, 24(1):97-123.

    9   Resnik P. 1999. Semantic Similarity in a Taxonomy: An Information-
        Based Measure and its Applications to Problems of Ambiguity in
        Natural Language. Journal of Artificial Intelligence Research, 11,
        95-130.

    10  Budanitsky A. and Hirst G. 2001. Semantic distance in WordNet: An
        experimental, application-oriented evaluation of five measures. In
        Workshop on WordNet and Other Lexical Resources, Second meeting of
        the North American Chapter of the Association for Computational
        Linguistics. Pittsburgh, PA.

    11  Banerjee S. and Pedersen T. 2002. An Adapted Lesk Algorithm for Word
        Sense Disambiguation Using WordNet. In Proceedings of the Fourth
        International Conference on Computational Linguistics and
        Intelligent Text Processing (CICLING-02). Mexico City.

    12  Patwardhan S., Banerjee S. and Pedersen T. 2002. Using Semantic
        Relatedness for Word Sense Disambiguation. In Proceedings of the
        Fourth International Conference on Intelligent Text Processing and
        Computational Linguistics, Mexico City.

    13  Banerjee S. Adapting the Lesk algorithm for word sense
        disambiguation to WordNet. Master's Thesis, University of
        Minnesota, Duluth, 2002.

    14  Patwardhan S. Incorporating dictionary and corpus information into a
        vector measure of semantic relatedness. Master's Thesis, University
        of Minnesota, Duluth, 2003.

SEE ALSO

    <http://groups.yahoo.com/group/wn-similarity>,
    <http://search.cpan.org/dist/WordNet-Similarity>,
    <http://wn-similarity.sourceforge.net>

AUTHORS

     Bridget T McInnes, University of Minnesota Twin Cities
     bthomson at cs.umn.edu

     Siddharth Patwardhan, University of Utah
     sidd at cs.utah.edu

     Serguei Pakhomov, University of Minnesota Twin Cities
     pakh002 at umn.edu

     Ted Pedersen, University of Minnesota Duluth
     tpederse at d.umn.edu

DOCUMENTATION COPYRIGHT AND LICENSE

    Copyright (C) 2003-2009 Bridget T. McInnes, Siddharth Patwardhan, 
    Serguei Pakhomov and Ted Pedersen.

    Permission is granted to copy, distribute and/or modify this document
    under the terms of the GNU Free Documentation License, Version 1.2 or
    any later version published by the Free Software Foundation; with no
    Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

    Note: a copy of the GNU Free Documentation License is available on the
    web at <http://www.gnu.org/copyleft/fdl.html> and is included in this
    distribution as FDL.txt.