Marvin Humphrey > Lucy-0.3.3 > Lucy::Analysis::Normalizer

Module Version: 0.003003   Latest Release: Lucy-0.4.1

NAME

Lucy::Analysis::Normalizer - Unicode normalization, case folding and accent stripping

Normalizer is an Analyzer which normalizes tokens to one of the Unicode normalization forms.

SYNOPSIS

    my $normalizer = Lucy::Analysis::Normalizer->new;
    
    my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
        analyzers => [ $normalizer, $tokenizer, $stemmer ],
    );

DESCRIPTION

Optionally, Normalizer also performs Unicode case folding and strips accents, converting accented characters to their base characters.
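
To make case folding and accent stripping concrete, here is a minimal sketch built only on Perl's core Unicode::Normalize module and the fc built-in (Perl 5.16 or later). It does not call Lucy; normalize_token and its case_fold and strip_accents options are hypothetical names chosen to mirror the constructor parameters, and Lucy's own implementation may differ in detail.

```perl
use strict;
use warnings;
use feature 'fc';                        # full Unicode case folding, Perl 5.16+
use Unicode::Normalize qw(NFKC NFKD);

# Hypothetical helper, not part of Lucy: approximates the three operations
# described above on a single token.
sub normalize_token {
    my ( $text, %opt ) = @_;

    # Case folding maps characters to a caseless form for comparison.
    $text = fc($text) if $opt{case_fold};

    if ( $opt{strip_accents} ) {
        # Decompose so accents become separate combining marks,
        # then drop the nonspacing marks (general category Mn).
        $text = NFKD($text);
        $text =~ s/\p{Mn}+//g;
    }

    # Finish in one normalization form (NFKC chosen for this sketch).
    return NFKC($text);
}

my $token    = "Caf\x{E9}";              # "Cafe" with a precomposed e-acute
my $folded   = normalize_token( $token, case_fold => 1 );
my $stripped = normalize_token( $token, case_fold => 1, strip_accents => 1 );

binmode STDOUT, ':encoding(UTF-8)';
print "folded:   $folded\n";             # lower-cased, accent kept
print "stripped: $stripped\n";           # lower-cased, accent removed
```

With accent stripping enabled, "Café" and "cafe" end up as the same term, which is usually what a search index wants.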

If you use highlighting, run Normalizer after tokenization. Normalization might add or remove characters, so token offsets computed on normalized text may no longer line up with the original text.
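
The effect on offsets can be illustrated without Lucy. The sketch below uses the core Unicode::Normalize module and a simple \w+ match as a stand-in tokenizer; token_offsets is a hypothetical helper, not a Lucy API.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKC);

# Hypothetical stand-in for a tokenizer: returns [start, end) character
# offsets for every run of word characters.
sub token_offsets {
    my ($text) = @_;
    my @spans;
    while ( $text =~ /\w+/g ) {
        push @spans, [ $-[0], $+[0] ];
    }
    return @spans;
}

# The text starts with U+FB01, the single-character "fi" ligature.
my $original   = "\x{FB01}sh and chips";
my $normalized = NFKC($original);        # ligature expands to "f" + "i"

my @on_original   = token_offsets($original);
my @on_normalized = token_offsets($normalized);

# The second token is "and" in both cases, but it starts one character later
# in the normalized text.
printf "original:   and at [%d, %d)\n", @{ $on_original[1] };
printf "normalized: and at [%d, %d)\n", @{ $on_normalized[1] };

# Applying the normalized offsets to the original text selects the wrong span.
my ( $start, $end ) = @{ $on_normalized[1] };
my $wrong = substr( $original, $start, $end - $start );
print "highlighted from original: '$wrong'\n";
```

Tokenizing first keeps the offsets tied to the original text, and normalizing afterwards changes only the token text that gets indexed.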

CONSTRUCTORS

new( [labeled params] )

    my $normalizer = Lucy::Analysis::Normalizer->new(
        normalization_form => 'NFKC',
        case_fold          => 1,
        strip_accents      => 0,
    );
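
As a rough guide to what a normalization_form value selects, the following sketch compares the four standard Unicode normalization forms using the core Unicode::Normalize module. It does not call Lucy, and it assumes that Normalizer's forms correspond to the standard ones of the same name.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD NFKC NFKD);

# "fiance" written with the fi ligature (U+FB01) and a precomposed
# e-acute (U+00E9): five characters in total.
my $input = "\x{FB01}anc\x{E9}";

my %forms = (
    NFC  => NFC($input),     # canonical composition: nothing changes here
    NFD  => NFD($input),     # canonical decomposition: e-acute splits in two
    NFKC => NFKC($input),    # compatibility composition: ligature expands
    NFKD => NFKD($input),    # compatibility decomposition: both changes
);

for my $name (qw(NFC NFD NFKC NFKD)) {
    my $code_points = join ' ',
        map { sprintf 'U+%04X', ord } split //, $forms{$name};
    printf "%-4s length %d: %s\n", $name, length $forms{$name}, $code_points;
}
# Lengths are 5 (NFC), 6 (NFD), 6 (NFKC) and 7 (NFKD).
```

The compatibility forms (NFKC, NFKD) fold visually similar variants such as ligatures into plain letters, which tends to suit search better than the canonical forms.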

INHERITANCE

Lucy::Analysis::Normalizer isa Lucy::Analysis::Analyzer isa Lucy::Object::Obj.
