Module Version: 0.03

NAME

Text::OverlapFinder - Find Overlapping Words in Strings

SYNOPSIS

    # List the overlaps found in two strings. Overlaps are matched on
    # space-separated tokens, so there are no partial word matches
    # ('cat' will not match 'at' or 'cats', for example).

    use Text::OverlapFinder;
    my $finder = Text::OverlapFinder->new;
    defined $finder or die "Construction of Text::OverlapFinder failed";

    my $string1 = 'aaa bbb ccc ddd eee';
    my $string2 = 'aa bbb ccc dd ee aaa';

    # $overlaps is a reference to a hash that maps each overlap to the
    # number of times it occurs; $len1 and $len2 are the lengths of the
    # two strings in words

    my ($overlaps, $len1, $len2) = $finder->getOverlaps($string1, $string2);
    foreach my $overlap (keys %$overlaps) {
        print "$overlap occurred $overlaps->{$overlap} times.\n";
    }
    print "length of string 1 = $len1 length of string 2 = $len2\n";

DESCRIPTION

This module finds word overlaps between two strings. It finds the longest possible overlaps and keeps track of how many times each overlap occurs.
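
To make "longest possible overlap" concrete, here is a minimal sketch of one way such a search could work. It is an illustration only and is not taken from the module's source: the function name find_overlaps and the greedy strategy (take the longest shared run of tokens, blank it out in both strings, repeat) are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::OverlapFinder. It greedily
# extracts the longest shared run of whitespace-separated tokens,
# removes it from further consideration, and repeats.
sub find_overlaps {
    my ($string1, $string2) = @_;
    my @words1 = split ' ', $string1;
    my @words2 = split ' ', $string2;
    my ($len1, $len2) = (scalar @words1, scalar @words2);
    my %overlaps;

    while (1) {
        # Locate the longest run of consecutive tokens common to both lists.
        my ($best_len, $best_i, $best_j) = (0, 0, 0);
        for my $i (0 .. $#words1) {
            for my $j (0 .. $#words2) {
                my $k = 0;
                while ($i + $k <= $#words1 && $j + $k <= $#words2
                       && defined $words1[$i + $k]
                       && defined $words2[$j + $k]
                       && $words1[$i + $k] eq $words2[$j + $k]) {
                    $k++;
                }
                ($best_len, $best_i, $best_j) = ($k, $i, $j)
                    if $k > $best_len;
            }
        }
        last if $best_len == 0;    # nothing left in common

        my $phrase = join ' ', @words1[$best_i .. $best_i + $best_len - 1];
        $overlaps{$phrase}++;

        # Blank out the matched tokens so they cannot be matched again.
        $words1[$_] = undef for $best_i .. $best_i + $best_len - 1;
        $words2[$_] = undef for $best_j .. $best_j + $best_len - 1;
    }
    return (\%overlaps, $len1, $len2);
}

my ($overlaps, $len1, $len2) =
    find_overlaps('aaa bbb ccc ddd eee', 'aa bbb ccc dd ee aaa');
print "$_: $overlaps->{$_}\n" for sort keys %$overlaps;
# aaa: 1
# bbb ccc: 1
```

For the SYNOPSIS strings, the sketch reports "bbb ccc" as a single two-word overlap rather than as two one-word overlaps, and "aaa" as a separate overlap even though it appears at different positions in the two strings. Whether the module resolves ties or repeated phrases in exactly this way is not stated here.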

There is a mechanism available for a user to provide a stemming module, but no stemmer is provided by this package as yet.
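
The interface for supplying a stemmer is not described here, so the sketch below does not use it. It only illustrates why stemming matters for overlap finding: tokens are compared as whole words, so 'cats' and 'cat' match only after both are reduced to the same stem. The toy stemmer and the stem_string helper are hypothetical names invented for this example.

```perl
use strict;
use warnings;

# Hypothetical toy stemmer: strips one trailing "s" from words longer
# than three letters. A real stemmer handles far more cases.
my $stem = sub {
    my ($word) = @_;
    $word =~ s/s\z// if length($word) > 3;
    return $word;
};

# Hypothetical helper: split on whitespace and stem each token.
sub stem_string {
    my ($string, $stemmer) = @_;
    return join ' ', map { $stemmer->($_) } split ' ', $string;
}

print stem_string('the cats sat', $stem), "\n";   # the cat sat
print stem_string('a cat sits', $stem), "\n";     # a cat sit
```

After stemming, both strings contain the token 'cat', so an overlap finder that compares whole tokens would report it; on the unstemmed strings it would not.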

AUTHORS

 Ted Pedersen, University of Minnesota, Duluth
 tpederse at d.umn.edu

 Siddharth Patwardhan, University of Utah
 sidd at cs.utah.edu

 Satanjeev Banerjee, Carnegie Mellon University
 banerjee at cs.cmu.edu

 Jason Michelizzi 

 Ying Liu, University of Minnesota, Twin Cities
 liux0395 at umn.edu

Last modified by: $Id: OverlapFinder.pm,v 1.14 2010/06/09 21:12:49 liux0395 Exp $

COPYRIGHT AND LICENSE

Copyright (C) 2004-2010 by Jason Michelizzi, Ted Pedersen, Siddharth Patwardhan, Satanjeev Banerjee and Ying Liu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
