UMLS::SenseRelate
  SYNOPSIS
    This package consists of a set of Perl modules along with supporting
    Perl programs that perform the task of Word Sense Disambiguation. The
    program(s) attempt to disambiguate the sense of a single target word in
    a given context using measures of similarity and relatedness provided by
    the UMLS::Similarity CPAN module.

  INSTALL
        To install these modules run:

          perl Makefile.PL
          make
          make test
          make install

        This will install the modules in the standard locations. You will
        most probably need root privileges to install in standard system
        directories. To install in a non-standard directory, specify a prefix
        during the 'perl Makefile.PL' stage as:

          perl Makefile.PL PREFIX=/home

  CONTENTS
           lib/     

             Location of the UMLS::SenseRelate::TargetWord Perl module.
             Location of the UMLS::SenseRelate::AllWords Perl module.

           t/

             Location of the test programs.

           samples/

              A directory of scripts that demonstrate UMLS-SenseRelate's usage
              and functionality.

           External/

              Contains a modified version of the SemEval 2010 All Words
              Disambiguation scoring program, and a script that can be run to
              automatically install it. This is used by the
              umls-senserelate-evaluation.pl program.

           utils/
       
              Directory containing Perl utility programs. These utilities use 
              the UMLS::SenseRelate modules and provide some supporting
              functionality.

  MEASURES
    UMLS-SenseRelate disambiguates terms in running text using the measures
    implemented in the UMLS-Similarity package. Currently the following
    measures are available:

        lch    - Leacock & Chodorow (1998)
        wup    - Wu & Palmer (1994)
        nam    - Nguyen and Al-Mubaid (2006)
        cdist  - Rada et al. (1989)
        jcn    - Jiang & Conrath (1997)
        res    - Resnik (1995)
        lin    - Lin (1998)
        lesk   - Banerjee and Pedersen (2002)
        vector - Patwardhan and Pedersen (2006)
        path   - a simple path-based measure

  CONFIGURATION FILE
    UMLS-SenseRelate passes the UMLS-Similarity package a configuration file
    which allows a specified set of sources and relations to be used when
    calculating the similarity score between two CUIs.

    There are six configuration options: SAB, REL, RELA, SABDEF, RELDEF, and
    RELADEF.

    The SAB and REL options are used to determine which sources and
    relations the path information is to be obtained from. The RELA option
    narrows down the relation even further. The RELA option is only applied
    to the PAR/CHD and RB/RN relations.

    The SABDEF and RELDEF options are used to determine which sources and
    relations to use when creating the Extended Definition. The RELADEF
    option narrows down the relation even further. The RELADEF option is
    only applied to the PAR/CHD and RB/RN relations.

    The path, wup, lch, lin, jcn and res measures require the SAB and REL
    options to be set. There is also an optional RELA option.

    The vector and lesk measures require the SABDEF and RELDEF options to be
    set with an optional RELADEF.
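    As a rough illustration of these requirements, the following Perl
    sketch records which options each measure needs and reports the ones
    missing from a configuration. The %REQUIRED table and the
    missing_options function are hypothetical helpers, not part of
    UMLS::SenseRelate or UMLS::Similarity. The nam and cdist measures are
    left out because this section does not say which options they need,
    and the optional RELA/RELADEF options are not checked.

```perl
use strict;
use warnings;

# Hypothetical table: required configuration options per measure, taken
# from the description above. nam and cdist are deliberately not listed.
my %REQUIRED;
$REQUIRED{$_} = [qw(SAB REL)]       for qw(path wup lch lin jcn res);
$REQUIRED{$_} = [qw(SABDEF RELDEF)] for qw(vector lesk);

# Return the required options that are absent from a config hash
# (option name => value). Optional RELA/RELADEF are not checked.
sub missing_options {
    my ($measure, $config) = @_;
    my $required = $REQUIRED{$measure}
        or die "No requirements recorded for measure '$measure'\n";
    return grep { !exists $config->{$_} } @$required;
}

# A configuration written for a path-based measure...
my %config = (SAB => 'include MSH', REL => 'include RB, RN');

# ...is not enough for lesk, which needs the definition options.
my @missing = missing_options('lesk', \%config);
print "lesk is missing: @missing\n";    # lesk is missing: SABDEF RELDEF
```

    In other words, a single configuration file can serve both kinds of
    measures only if it sets both the SAB/REL and the SABDEF/RELDEF options.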

    You can specify a single source, multiple sources or the entire UMLS
    (using the UMLS_ALL option). Keep in mind that the more sources you
    include, the larger the search space, so obtaining path information
    between two concepts will take longer. The names of the sources in the
    configuration file are expected to be in the SAB (source abbreviation)
    form. A listing of the sources and their SABs can be found at:

    <http://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/release/source_vocabularies.html>

    You can specify any relations that exist in the set of sources you
    defined. However, only PAR/CHD and RB/RN are directional (hierarchical)
    relations. The other relations (such as RO and SIB) are not directional,
    so obtaining path information using them may take much longer than
    obtaining path information using the directional relations. A listing of
    the different relations can be found here (scroll down to the REL
    table):

    <http://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/release/abbreviations.html>

    If you plan on using multiple sources or the entire UMLS, we would
    advise you to use the --realtime option, which is explained in the
    Interface.pm documentation and in the path programs in the utils/
    directory. We also provide a UMLS_ALL option, so you do not have to
    specify each and every source and relation.

    The format of the configuration file is as follows:

    SAB :: <include|exclude> <source1, source2, ... sourceN>

    REL :: <include|exclude> <relation1, relation2, ... relationN>

    RELA :: <include|exclude> <rela1, rela2, ... relaN>

    For example, if we wanted to use the MSH vocabulary with only the RB/RN
    relations, the configuration file would be:

     SAB :: include MSH
     REL :: include RB, RN

    or

     SAB :: include MSH
     REL :: exclude PAR, CHD

    If we wanted to use the SNOMEDCT vocabulary with only the PAR/CHD
    relations that are is-a relations, the configuration file would be:

     SAB :: include SNOMEDCT
     REL :: include PAR, CHD 
     RELA :: include isa, inverse_isa
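    To make the format concrete, here is a minimal Perl sketch that parses
    lines of the form "OPTION :: include|exclude item1, item2" into a hash,
    using the SNOMEDCT example above as input. The parse_config function and
    the layout of its result are assumptions for illustration only; the
    parser actually used by UMLS-Similarity may accept or reject different
    input.

```perl
use strict;
use warnings;

# Hypothetical parser for configuration lines of the form
#   OPTION :: include|exclude item1, item2, ... itemN
# Returns a hash ref: OPTION => { mode => 'include'|'exclude', items => [...] }.
sub parse_config {
    my ($text) = @_;
    my %config;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($option, $mode, $list) =
            $line =~ /^\s*(\w+)\s*::\s*(include|exclude)\s+(.+?)\s*$/
            or die "Cannot parse configuration line: $line\n";
        $config{$option} = {
            mode  => $mode,
            items => [ split /\s*,\s*/, $list ],    # comma-separated list
        };
    }
    return \%config;
}

my $config = parse_config(<<'END');
SAB :: include SNOMEDCT
REL :: include PAR, CHD
RELA :: include isa, inverse_isa
END

for my $option (sort keys %$config) {
    my $entry = $config->{$option};
    print "$option: $entry->{mode} @{ $entry->{items} }\n";
}
# REL: include PAR CHD
# RELA: include isa inverse_isa
# SAB: include SNOMEDCT
```

    The same function would handle the SABDEF, RELDEF and RELADEF lines
    described below, since they share this format.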

    The format for SABDEF and RELDEF is similar.

    The SABDEF and RELDEF options are used to determine the sources and
    relations the extended definition is to be obtained from.

    The format of the configuration file is as follows:

    SABDEF :: <include|exclude> <source1, source2, ... sourceN>

    RELDEF :: <include|exclude> <relation1, relation2, ... relationN>

    RELADEF :: <include|exclude> <rela1, rela2, ... relaN>

    Note: RELDEF accepts any of the MRREL relations plus two special
    'relations':

          1. CUI, which refers to the CUI's definition

          2. TERM, which refers to the terms associated with the CUI

    For example, if we wanted to use the definitions from the MSH vocabulary
    and we only wanted the definition of the CUI and the definitions of the
    CUIs related to it by the SIB relation, the configuration file would be:

     SABDEF :: include MSH
     RELDEF :: include CUI, SIB

    If you wanted only the definitions from PAR/CHD relations that are is-a
    relations, the configuration file would be:

     SABDEF :: include MSH
     RELDEF :: include PAR, CHD
     RELADEF :: include isa, inverse_isa
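    The difference between the two special entries and ordinary relations
    can be summarized with a small Perl sketch. The describe_reldef function
    is hypothetical and only restates the rules above: CUI contributes the
    concept's own definition, TERM contributes its associated terms, and any
    other entry is assumed to be an MRREL relation whose related CUIs'
    definitions are used.

```perl
use strict;
use warnings;

# Hypothetical helper: describe what each RELDEF entry contributes to the
# extended definition. CUI and TERM are the two special entries; anything
# else is treated as an MRREL relation.
sub describe_reldef {
    my @items = @_;
    return map {
          $_ eq 'CUI'  ? 'the definition of the CUI itself'
        : $_ eq 'TERM' ? 'the terms associated with the CUI'
        :                "the definitions of CUIs reached via the $_ relation"
    } @items;
}

# The MSH example above: RELDEF :: include CUI, SIB
print "- $_\n" for describe_reldef(qw(CUI SIB));
# - the definition of the CUI itself
# - the definitions of CUIs reached via the SIB relation
```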

    For all of these options, there is a UMLS_ALL tag. If used with SAB or
    SABDEF, it includes all of the UMLS sources. If used with REL or RELDEF,
    it includes all of the possible relations (as well as CUI and TERM for
    RELDEF). If used with RELA or RELADEF, it includes all of the RELA
    relations, including relations that have no RELA value. This is also the
    default for RELA and RELADEF, which is why those options are optional.
    An example of using the UMLS_ALL option is as follows:

     SAB :: include UMLS_ALL
     REL :: include UMLS_ALL

    and another is:

     SABDEF :: include UMLS_ALL
     RELDEF :: include UMLS_ALL
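    The include/exclude and UMLS_ALL rules can be read as a simple selection
    test. The Perl sketch below is one interpretation, not the package's
    code: it assumes a parsed option is stored as
    { mode => ..., items => [...] } (a hypothetical layout), that "exclude"
    selects everything not listed, and that an absent RELA or RELADEF option
    behaves like "include UMLS_ALL". The is_selected function is
    hypothetical.

```perl
use strict;
use warnings;

# Hypothetical selection test for one source, relation or RELA value.
# $option is { mode => 'include'|'exclude', items => [...] } or undef.
sub is_selected {
    my ($option, $item) = @_;

    # Assumption: an absent option (RELA/RELADEF) acts like include UMLS_ALL.
    return 1 unless defined $option;

    my %listed;
    $listed{$_} = 1 for @{ $option->{items} };

    if ($option->{mode} eq 'include') {
        # UMLS_ALL selects everything; otherwise the item must be listed.
        return $listed{UMLS_ALL} || $listed{$item} ? 1 : 0;
    }
    # Assumption: exclude selects everything that is not listed.
    return $listed{$item} ? 0 : 1;
}

my $sab = { mode => 'include', items => ['UMLS_ALL'] };
my $rel = { mode => 'exclude', items => [ 'PAR', 'CHD' ] };
my $rela;    # RELA not given in the configuration file

my @results = (
    is_selected($sab,  'MSH'),    # any source, via UMLS_ALL
    is_selected($rel,  'RB'),     # not excluded
    is_selected($rel,  'PAR'),    # excluded
    is_selected($rela, 'isa'),    # default when RELA is absent
);
print "@results\n";    # 1 1 0 1
```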

    If you go to the configuration file directory, there will be example
    configuration files for the different runs that you have performed.

    For more information about the configuration options please see the
    README.

  SOFTWARE COPYRIGHT AND LICENSE
    Copyright (C) 2010-2011 Bridget T McInnes, Ying Liu, Serguei Pakhomov
    and Ted Pedersen

    This suite of programs is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License as published
    by the Free Software Foundation; either version 2 of the License, or (at
    your option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
    Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program; if not, write to the Free Software Foundation, Inc.,
    59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

    Note: The text of the GNU General Public License is provided in the file
    'GPL.txt' that you should have received with this distribution.

  CONTACT US
    If you have any trouble installing and using UMLS-SenseRelate, please
    contact us via the users mailing list:

    umls-similarity@yahoogroups.com

    You can join this group by going to:

    <http://tech.groups.yahoo.com/group/umls-similarity/>

    You may also contact us directly if you prefer:

      Bridget T. McInnes: bthomson at umn.edu
      Ted Pedersen      : tpederse at d.umn.edu

  AUTHORS
     Bridget T McInnes, University of Minnesota Twin Cities
     bthomson at umn.edu

     Ted Pedersen, University of Minnesota Duluth
     tpederse at d.umn.edu

     Ying Liu, University of Minnesota
     liux0395 at umn.edu

     Serguei Pakhomov, University of Minnesota Twin Cities
     pakh002 at umn.edu

  DOCUMENTATION COPYRIGHT AND LICENSE
    Copyright (C) 2010-2011 Bridget T. McInnes, Ying Liu, Serguei Pakhomov
    and Ted Pedersen.

    Permission is granted to copy, distribute and/or modify this document
    under the terms of the GNU Free Documentation License, Version 1.2 or
    any later version published by the Free Software Foundation; with no
    Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

    Note: a copy of the GNU Free Documentation License is available on the
    web at:

    <http://www.gnu.org/copyleft/fdl.html>

    and is included in this distribution as FDL.txt.