The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
NAME
    Todo List for UMLS-Association

SYNOPSIS
    Plans for future versions of UMLS-Association

TO DO LIST
    1. add additional test cases for util/ programs

    2. develop web interface

    3. resolve windowing ambiguity issue (see example case below)

    4. Problem with --noorder for overlapping sets. It is not well defined
    and can throw errors. This is a known issue. see functions.t and the
    last commented out test. That WILL fail and we need a theoretical base
    for how to deal with this kind of situation

  Windowing Ambiguity Issue Example
    CUICollector was changed in order to implement windowing. To do
    windowing all CUIs in each phrase were placed into a tree/hash structure
    where the sentence position (0,1,2,etc) is used as the hash key, and the
    value is a list where each CUI can only be represented once. This
    eliminated duplicate bigrams counts and allowed for easy windowing.
    However, this method introduces some ambiguity that was not present in
    the original algorithm.

    The example comes from the PMID 10257468 that is part of the PMIDs from
    1983-1985 MetaMap collection. The MetaMap entry is pasted below followed
    by the computed tree structure from CUICollector.

    10257468 ti::1:: args('MetaMap14.FML.BINARY.Linux --lexicon db -Z 2014AB
    -Z 2014AB -qE -E /tmp/text_000N_41948
    /tmp/text_000N.out_41948',[lexicon-db,mm_data_year-'2014AB',machine_outp
    ut-[],indicate_citation_end-[],infile-'/tmp/text_000N_41948',outfile-'/t
    mp/text_000N.out_41948']). aas([]). neg_list([]).
    utterance('10257468.ti.1',"Tug of war between work and leisure for
    health care managers.",154/61,[]). phrase('Tug of war between
    work',[head([lexmatch(['tug of
    war']),inputmatch(['Tug',of,war]),tag(noun),tokens([tug,of,war])]),prep(
    [lexmatch([between]),inputmatch([between]),tag(prep),tokens([between])])
    ,mod([lexmatch([work]),inputmatch([work]),tag(noun),tokens([work])])],15
    4/23,[]). candidates(5,0,0,5,[]).
    mappings([map(-687,[ev(-760,'C1319201','TUG','Timed up and go mobility
    test',[tug],[diap],[[[1,1],[1,1],0]],yes,no,['MTH','SNOMEDCT_US'],[154/3
    ],0,0),ev(-760,'C0043027','War','War',[war],[acty],[[[3,3],[1,1],0]],yes
    ,no,['AOD','CHV','CSP','LCH','MSH','MTH'],[161/3],0,0),ev(-593,'C0043227
    ','Work','Work',[work],[ocac],[[[5,5],[1,1],0]],no,no,['CHV','CSP','LCH'
    ,'LCH_NW','LNC','MSH','MTH','NCI','SNOMEDCT_US'],[173/4],0,0)]),map(-687
    ,[ev(-760,'C1422219','TUG','ASPSCR1
    gene',[tug],[gngm],[[[1,1],[1,1],0]],yes,no,['HGNC','MTH','NCI','OMIM'],
    [154/3],0,0),ev(-760,'C0043027','War','War',[war],[acty],[[[3,3],[1,1],0
    ]],yes,no,['AOD','CHV','CSP','LCH','MSH','MTH'],[161/3],0,0),ev(-593,'C0
    043227','Work','Work',[work],[ocac],[[[5,5],[1,1],0]],no,no,['CHV','CSP'
    ,'LCH','LCH_NW','LNC','MSH','MTH','NCI','SNOMEDCT_US'],[173/4],0,0)]),ma
    p(-687,[ev(-760,'C2346564','TUG','ASPSCR1 wt
    Allele',[tug],[gngm],[[[1,1],[1,1],0]],yes,no,['MTH','NCI'],[154/3],0,0)
    ,ev(-760,'C0043027','War','War',[war],[acty],[[[3,3],[1,1],0]],yes,no,['
    AOD','CHV','CSP','LCH','MSH','MTH'],[161/3],0,0),ev(-593,'C0043227','Wor
    k','Work',[work],[ocac],[[[5,5],[1,1],0]],no,no,['CHV','CSP','LCH','LCH_
    NW','LNC','MSH','MTH','NCI','SNOMEDCT_US'],[173/4],0,0)])]).
    phrase(and,[conj([lexmatch([and]),inputmatch([and]),tag(conj),tokens([an
    d])])],178/3,[]). candidates(0,0,0,0,[]). mappings([]). phrase('leisure
    for health care
    managers.',[head([lexmatch([leisure]),inputmatch([leisure]),tag(noun),to
    kens([leisure])]),prep([lexmatch([for]),inputmatch([for]),tag(prep),toke
    ns([for])]),mod([lexmatch(['health care
    managers']),inputmatch([health,care,managers]),tag(noun),tokens([health,
    care,managers])]),punc([inputmatch(['.']),tokens([])])],182/33,[]).
    candidates(8,2,0,6,[]).
    mappings([map(-761,[ev(-760,'C0086542','Leisure','Leisure',[leisure],[id
    cn],[[[1,1],[1,1],0]],yes,no,['AOD','CHV','CSP','LCH_NW','LNC','MSH','NC
    I'],[182/7],0,0),ev(-640,'C0086388','Health Care','Health
    Care',[health,care],[hlca],[[[3,4],[1,2],0]],no,no,['AOD','CHV','CSP','M
    SH','NCI','NCI_NICHD','NLMSubSyn','SNMI'],[194/11],0,0),ev(-593,'C033514
    1','MANAGERS',manager,[managers],[prog],[[[5,5],[1,1],0]],no,no,['AOD','
    CHV','LNC','MTH','NCI','SNM','SNMI','SNOMEDCT_US'],[206/8],0,0)]),map(-7
    61,[ev(-760,'C0086542','Leisure','Leisure',[leisure],[idcn],[[[1,1],[1,1
    ],0]],yes,no,['AOD','CHV','CSP','LCH_NW','LNC','MSH','NCI'],[182/7],0,0)
    ,ev(-593,'C0018684','Health','Health',[health],[idcn],[[[3,3],[1,1],0]],
    no,no,['AOD','CHV','CSP','LCH','LCH_NW','LNC','MSH','MTH','NCI','NCI_NIC
    HD','SNOMEDCT_US'],[194/6],0,0),ev(-640,'C0687694','care managers','Case
    Manager',[care,managers],[prog],[[[4,4],[1,1],0],[[5,5],[2,2],0]],no,no,
    ['CHV','LNC','MTH','NCI'],[201/13],0,0)])]). @@@@@@@

    CUI tree/hash structure:

    Sentence Position (hash key): 0 1 2 | 3 4 5

                            CUI1: C1319201  C0043027  C0043227 | C0086542  C0086388  C0335141

                            CUI2: C1422219                     |           C0018684  C0687694

                            CUI3: C2346564                     |

    Sentence positions 0-2 were part of one phrase, and positions 3-5 were
    part of the second phrase in MetaMap. The ambiguity comes from sentence
    positions 4 and 5. Originally, CUICollector would have returned the
    following CUI Bigrams for sentence positions 4-5: C0086388-C0335141,
    C0018684-C0687694

    This is because in the original CUICollector the individual phrases
    don't get mixed up and recombined in an all-by-all fashion. The CUI's in
    questions are for the terms "health care managers". The different CUI
    Bigrams reflect the difference in meaning of "health care" "managers",
    or "health" "care managers". Thus each of these concepts has different
    CUIs. In the windowing version of the code the CUIs are placed into the
    shown structure and all-by-all bigrams are generated for each sentence
    position, rather than for each phrase like before. Now we get the
    ADDITIONAL Bigrams of: C0086388-C0687694 and C0018684-C0335141 This is
    equivalent to "health" "managers" and "health care" "care"managers".

    This behavior is now present in the Perl and Hadoop versions of
    CUICollector. It is being left alone for now, but some thought needs to
    be done to figure out if this is ok or if this ambiguity needs to be
    checked for, which will probably add more complexity to CUICollector.

AUTHORS
    Sam Henry <henryst at vcu.edu> Bridget McInnes <btmcinnes at vcu.edu>

COPYRIGHT
    Copyright (C) 2015 Bridget T. McInnes

    Permission is granted to copy, distribute and/or modify this document
    under the terms of the GNU Free Documentation License, Version 1.2 or
    any later version published by the Free Software Foundation; with no
    Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

    Note: a copy of the GNU Free Documentation License is available on the
    web at <http://www.gnu.org/copyleft/fdl.html> and is included in this
    distribution as FDL.txt.