NAME

KinoSearch1::Analysis::Stopalizer - suppress a "stoplist" of common words

SYNOPSIS

    my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
        language => 'fr',
    );
    my $polyanalyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
        analyzers => [ $lc_normalizer, $tokenizer, $stopalizer, $stemmer ],
    );

DESCRIPTION

A "stoplist" is a collection of "stopwords": words which are common enough to be of little value when determining search results. For example, so many documents in English contain "the", "if", and "maybe" that blocking them may improve both performance and relevance. Each blocked token's text is replaced with an empty string:

    # before
    @token_texts = ('i', 'am', 'the', 'walrus');
    
    # after
    @token_texts = ('',  '',   '',    'walrus');
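
The before/after transformation above can be sketched in plain Perl. This is only an illustration of the idea, not the module's actual implementation: the `suppress_stopwords` helper and the three-word stoplist are hypothetical, and the sketch assumes a stoplist is a hash whose keys are the words to suppress.

```perl
use strict;
use warnings;

# Hypothetical stoplist for illustration: keys are the words to suppress.
my %stoplist = map { $_ => 1 } qw( i am the );

# Hypothetical helper (not part of KinoSearch1): replace each stopword
# with an empty string so that the number of tokens stays the same.
sub suppress_stopwords {
    my ( $stoplist, @token_texts ) = @_;
    return map { exists $stoplist->{$_} ? '' : $_ } @token_texts;
}

my @token_texts = suppress_stopwords( \%stoplist, qw( i am the walrus ) );

# Quote each text so the empty strings are visible when printed.
print join( ', ', map { sprintf "'%s'", $_ } @token_texts ), "\n";
```

Note that the sketch blanks stopwords rather than deleting them, mirroring the "after" array above, so each surviving token keeps its original position in the list.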

CONSTRUCTOR

new

    my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
        language => 'de',
    );
    
    # or...
    my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
        stoplist => \%stoplist,
    );

new() takes two possible parameters, language and stoplist. If stoplist is supplied, it will be used, overriding the behavior indicated by the value of language.
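
As a rough sketch of how a custom stoplist might be prepared, the snippet below builds a hash from a word list. The `build_stoplist` helper is hypothetical, and the assumption that stopwords are the hash keys (with true values) is based on the hash-style return value of Lingua::StopWords rather than on anything stated here. The constructor call is left commented out so the snippet runs without KinoSearch1 installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch1): build a stoplist hash
# from a list of words. Assumption: stopwords are the hash keys.
sub build_stoplist {
    my @words = @_;

    # Lowercase the words, on the assumption that tokens have already
    # passed through a case-folding analyzer such as $lc_normalizer.
    my %stoplist = map { lc($_) => 1 } @words;
    return \%stoplist;
}

my $stoplist = build_stoplist(qw( The if Maybe ));

# The hash reference would then be handed to the constructor:
# my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
#     stoplist => $stoplist,
# );
```

Lowercasing matters only if the stopalizer runs after a case-folding step, as it does in the SYNOPSIS; otherwise the keys would need to match the tokens' original case.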

SEE ALSO

Lingua::StopWords

COPYRIGHT

Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch1 version 1.01.
