Module Version: 0.01, distributed in perlindex-1.302 (latest release: perlindex-1.606)

NAME

Text::English - Porter's stemming algorithm

SYNOPSIS

    use Text::English;
    @stems = Text::English::stem( @words );

DESCRIPTION

This routine applies the Porter Stemming Algorithm to its parameters, returning the stemmed words. It is derived from the C program "stemmer.c" as found in freewais and elsewhere, which contains these notes:

   Purpose:    Implementation of the Porter stemming algorithm documented 
               in: Porter, M.F., "An Algorithm For Suffix Stripping," 
               Program 14 (3), July 1980, pp. 130-137.
   Provenance: Written by B. Frakes and C. Cox, 1986.

I have re-interpreted areas that use Frakes and Cox's "WordSize" function. My version may misbehave on short words starting with "y", but I can't think of any examples.

The step numbers correspond to Frakes and Cox, and are probably in Porter's article (which I've not seen). Porter's algorithm still has rough spots (e.g. current/currency, -ings words), which I've not attempted to cure, although I have added support for the British -ise suffix.
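
To give a feel for what one of these steps looks like, here is a minimal sketch of Porter's step 1a (plural stripping) together with his "measure" of a stem. It is not the code from this module: the names cv_pattern, measure, step_1a and strip_ise are made up for illustration, and the -ise handling in particular is only an assumption about how a British spelling could be treated alongside Porter's (m>1) -ize rule, not a description of what Text::English actually does.

```perl
use strict;
use warnings;

# Illustrative helpers only; none of these are part of Text::English.

# Encode a word as a string of 'c' and 'v' using Porter's definition:
# a, e, i, o, u are vowels, and so is 'y' when it follows a consonant.
sub cv_pattern {
    my ($word) = @_;
    my $pattern = '';
    for my $ch (split //, lc $word) {
        my $is_vowel = $ch =~ /[aeiou]/
            || ($ch eq 'y' && $pattern =~ /c\z/);
        $pattern .= $is_vowel ? 'v' : 'c';
    }
    return $pattern;
}

# Porter's measure m: how many times a run of vowels is followed by a
# run of consonants, i.e. m in the form [C](VC)^m[V].
sub measure {
    my ($stem) = @_;
    my $pattern = cv_pattern($stem);
    my $m = () = $pattern =~ /v+c+/g;
    return $m;
}

# Step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (nothing).
# Only the first matching rule is applied.
sub step_1a {
    my ($word) = @_;
    return $word if $word =~ s/sses\z/ss/;
    return $word if $word =~ s/ies\z/i/;
    return $word if $word =~ /ss\z/;
    $word =~ s/s\z//;
    return $word;
}

# Assumption: treat -ise exactly like Porter's step 4 rule for -ize,
# removing it only when the remaining stem has measure greater than 1.
sub strip_ise {
    my ($word) = @_;
    if ($word =~ /\A(.+)i[sz]e\z/ && measure($1) > 1) {
        return $1;
    }
    return $word;
}

print join(' ', map { step_1a($_) } qw(caresses ponies caress cats)), "\n";
# prints: caress poni caress cat
print join(' ', map { strip_ise($_) } qw(bowdlerize organise wise)), "\n";
# prints: bowdler organ wise
```

As "poni" shows, a stem need not be a dictionary word; it only has to be the same for related forms. The measure test is what stops short words such as "wise" from losing their ending.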

NOTES

This is version 0.1. I would welcome feedback, especially improvements to the punctuation-stripping step.

AUTHOR

Ian Phillipps <ian@unipalm.pipex.com>

COPYRIGHT

Copyright Public IP Exchange Ltd (PIPEX). Available for use under the same terms as perl.
