NAME

Text::TermExtract - Extract terms from text

SYNOPSIS

    use Text::TermExtract;

    my $text = "Hey, hey, how's it going? Wanna go to Wendy's
                tonight? Wendy's has great sandwiches.";

    my $ext = Text::TermExtract->new();

    for my $word ( $ext->terms_extract( $text, { max => 3 }) ) {
        print "$word\n";
    }

    # "sandwiches"
    # "tonight"
    # "wendy"

DESCRIPTION

Text::TermExtract takes a simple approach to extracting the most interesting terms from documents of arbitrary length.

There are more scientific methods of term extraction, such as Yahoo's online term extraction API (which you can't run locally) and the Lingua::YaTeA module on CPAN (which is so poorly documented that I couldn't figure out how to use it).

So I wrote Text::TermExtract, which first tries to guess the language a text is written in, kicks out the language-specific stopwords, weighs the remaining words with a hand-crafted formula, and returns a list of (hopefully) interesting words.
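The pipeline above can be illustrated with a small, self-contained Perl sketch. It is not the module's code: language detection is skipped, a hypothetical hard-coded English stopword list stands in for the per-language lists, and raw word frequency replaces the hand-crafted weighting formula. The function name extract_terms is likewise made up for illustration.

    use strict;
    use warnings;

    # Hypothetical stand-in for language detection plus a per-language
    # stopword list: a tiny, hard-coded English list.
    my %STOPWORDS = map { $_ => 1 } qw(
        and are but can for going has have hey how its not the was wanna with you
    );

    # Hypothetical helper, not part of Text::TermExtract's API.
    sub extract_terms {
        my ( $text, $max ) = @_;

        my %count;

        # Lowercase the text and split on anything that isn't a letter,
        # so "Wendy's" yields the tokens "wendy" and "s".
        for my $word ( split /[^a-z]+/, lc $text ) {
            next if length($word) < 3;     # skip empty and very short tokens
            next if $STOPWORDS{$word};     # kick out stopwords
            $count{$word}++;
        }

        # Weigh terms by raw frequency (a stand-in for the module's formula),
        # breaking ties alphabetically so the order is deterministic.
        my @ranked = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

        # Honor an optional limit, similar in spirit to the max option.
        splice @ranked, $max if defined $max && $max < @ranked;

        return @ranked;
    }

    my $text = "Hey, hey, how's it going? Wanna go to Wendy's "
             . "tonight? Wendy's has great sandwiches.";

    print "$_\n" for extract_terms( $text, 3 );

With the SYNOPSIS text this sketch prints "wendy", "great" and "sandwiches", one per line. The ranking differs from the module's own output because plain frequency is a much cruder weighting than the formula Text::TermExtract uses.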

This is a very crude approach to term extraction. If you have a better method and want to include it in Text::TermExtract, drop me an email; I'm interested.

METHODS

new()

Constructor.

terms_extract( $text, $opts )

Goes through the text string in $text, extracts the keywords, and returns them as a list.

To limit the number of words returned, use the max option:

    $ext->terms_extract( $text, { max => 10 } );

exclude( $array_ref )

Adds a list of words to exclude. The words in the array passed in by reference will never be returned as keywords.

    $ext->exclude( ['moe', 'joe'] );
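One common way to implement an exclusion list like this is a hash used as a set. The sketch below is hypothetical: exclude_words and filter_terms are illustrative names rather than module functions, and the case-insensitive matching is an assumption, not documented behavior of Text::TermExtract.

    use strict;
    use warnings;

    # Hypothetical exclusion set: words are stored lowercased so each
    # lookup is a constant-time, case-insensitive check (an assumption).
    my %excluded;

    # Takes an array reference, mirroring the shape of exclude().
    # Repeated calls add to the set rather than replacing it.
    sub exclude_words {
        my ($words) = @_;
        $excluded{ lc $_ } = 1 for @$words;
        return;
    }

    # Drops any candidate term that appears in the exclusion set.
    sub filter_terms {
        my @terms = @_;
        return grep { !$excluded{ lc $_ } } @terms;
    }

    exclude_words( [ 'moe', 'joe' ] );
    my @kept = filter_terms(qw(Moe sandwiches joe wendy));
    print "@kept\n";    # prints "sandwiches wendy"

Filtering after scoring, as here, keeps the exclusion logic independent of the weighting step; an implementation could equally drop excluded words before counting.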

LEGALESE

AUTHOR

2008, Mike Schilli <cpan@perlmeister.com>
