Marvin Humphrey > Lucy > Lucy::Analysis::Normalizer

Module Version: 0.004001

NAME

Lucy::Analysis::Normalizer - Unicode normalization, case folding and accent stripping.

SYNOPSIS

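    my $tokenizer  = Lucy::Analysis::StandardTokenizer->new;
    my $normalizer = Lucy::Analysis::Normalizer->new;
    my $stemmer    = Lucy::Analysis::SnowballStemmer->new( language => 'en' );

    # The normalizer runs after the tokenizer and before the stemmer.
    my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
        analyzers => [ $tokenizer, $normalizer, $stemmer ],
    );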

DESCRIPTION

Normalizer is an Analyzer which normalizes tokens to one of the Unicode normalization forms. Optionally, it performs Unicode case folding and converts accented characters to their base character.
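To make the three operations concrete, here is a minimal sketch that approximates them with Perl's core Unicode::Normalize module and the built-in fc function. It is not Lucy's implementation and may differ from it in edge cases. The helper name normalize_token is hypothetical; its option names are borrowed from the constructor parameters only for readability.

```perl
use v5.16;      # enables strict, fc() and Unicode string semantics
use warnings;
use Unicode::Normalize qw(NFD normalize);

# Hypothetical helper, not part of Lucy: approximates the effect of
# normalization, case folding and accent stripping on a single token.
sub normalize_token {
    my ( $token, %opt ) = @_;

    if ( $opt{strip_accents} ) {
        # Decompose so accents become separate combining marks,
        # then drop every nonspacing mark.
        $token = NFD($token);
        $token =~ s/\p{Mn}//g;
    }

    # Unicode case folding (Perl 5.16+).
    $token = fc($token) if $opt{case_fold};

    # Apply the requested form: 'NFC', 'NFD', 'NFKC' or 'NFKD'.
    return normalize( $opt{normalization_form}, $token );
}

# "\x{C9}cole" is "Ecole" written with U+00C9 (E with acute accent).
my $token = normalize_token(
    "\x{C9}cole",
    normalization_form => 'NFKC',
    case_fold          => 1,
    strip_accents      => 1,
);
say $token;    # prints "ecole"
```

With strip_accents turned off, the same call would keep the accent and return the folded form with U+00E9 in first position.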

If you use highlighting, run Normalizer after tokenization: it can add or remove characters, so running it first would leave the token offsets out of step with the original text.
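The length change mentioned above is easy to observe with Unicode::Normalize alone. The sketch below does not use Lucy; it only shows, on two sample strings chosen for illustration, that NFKC normalization can both lengthen and shorten a string.

```perl
use v5.16;
use warnings;
use Unicode::Normalize qw(NFKC);

# U+FB01 (LATIN SMALL LIGATURE FI) expands to the two letters "fi".
my $ligature = "\x{FB01}le";
my $expanded = NFKC($ligature);
printf "%d -> %d\n", length($ligature), length($expanded);    # 3 -> 4

# "e" followed by U+0301 (COMBINING ACUTE ACCENT) composes to U+00E9.
my $decomposed = "cafe\x{301}";
my $composed   = NFKC($decomposed);
printf "%d -> %d\n", length($decomposed), length($composed);  # 5 -> 4
```

Offsets computed on the normalized string would therefore not line up with the original text, which is why the tokenizer should see the text first.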

CONSTRUCTORS

new( [labeled params] )

    my $normalizer = Lucy::Analysis::Normalizer->new(
        normalization_form => 'NFKC',
        case_fold          => 1,
        strip_accents      => 0,
    );

INHERITANCE

Lucy::Analysis::Normalizer isa Lucy::Analysis::Analyzer isa Clownfish::Obj.
