NAME

Lingua::Stem::UniNE::DE - German stemmer

VERSION

This document describes Lingua::Stem::UniNE::DE v0.07.

SYNOPSIS

    use Lingua::Stem::UniNE::DE qw( stem_de );

    my $stem = stem_de($word);

    # alternate syntax
    $stem = Lingua::Stem::UniNE::DE::stem($word);

DESCRIPTION

Light and aggressive stemmers for the German language. The light stemmer removes plural endings and umlauts. The aggressive stemmer also removes inflectional suffixes and additional diacritics.

This module provides the stem and stem_de functions for the light stemmer, which are synonymous and can optionally be exported, plus the stem_aggressive and stem_de_aggressive functions for the aggressive stemmer. Each function accepts a single word and returns a single stem.
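The following is a minimal sketch comparing the two stemmers on the plural forms quoted in the NOTES section. It assumes the distribution is installed from CPAN. The compare_stems helper is hypothetical and not part of the module, and the aggressive function is called by its fully qualified name because this document only states that stem and stem_de are exportable. No expected output is shown, since the exact stems depend on the algorithm.

```perl
use strict;
use warnings;
use utf8;                                # this source contains literal umlauts
use open qw( :std :encoding(UTF-8) );    # print UTF-8 on the standard handles

use Lingua::Stem::UniNE::DE qw( stem_de );

# Hypothetical helper: return [word, light stem, aggressive stem]
# for each input word.
sub compare_stems {
    return map {
        [ $_, stem_de($_), Lingua::Stem::UniNE::DE::stem_de_aggressive($_) ]
    } @_;
}

for my $row ( compare_stems(qw( Frauen Bilder Söhne Äpfel )) ) {
    printf "%-8s light: %-8s aggressive: %s\n", @$row;
}
```

Running the script prints one line per word, which makes it easy to see where the aggressive stemmer removes more than the light one.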

NOTES

“In proposing stemmers for other languages than English, we think that a ‘light’ stemmer (removing inflections only for noun and adjectives) presents some advantages. […] In German, a few rules may be applied to obtain the plural form of words (e.g., ‘Frau’ into ‘Frauen’ (woman), ‘Bild’ into ‘Bilder’ (picture), ‘Sohn’ into ‘Söhne’ (son), ‘Apfel’ into ‘Äpfel’ (apple)), but the suggested algorithms do not account for person and tense variations, or for the morphological variations used by verbs (we think that indexing verbs for Italian, French or German is not of primary importance compared to nouns and adjectives).” —Jacques Savoy, IR Multilingual Resources at UniNE

“For the German corpus, Porter’s stemmer provided better retrieval performance than did the UniNE scheme (average difference of 3.7% over nine IR models). The difference between these two stemming schemes however was never statistically significant.” —Jacques Savoy, Light Stemming Approaches for the French, Portuguese, German and Hungarian Languages

SEE ALSO

Lingua::Stem::UniNE provides a stemming object with access to all of the implemented University of Neuchâtel stemmers including this one. It has additional features like stemming lists of words.
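As a rough illustration of the object interface, the sketch below stems a list of German words in one call. It assumes Lingua::Stem::UniNE is installed and that its constructor accepts a language code and its stem method accepts a list, as described in that module's own documentation. The stem_german_words helper is hypothetical.

```perl
use strict;
use warnings;
use utf8;                                # this source contains literal umlauts
use open qw( :std :encoding(UTF-8) );    # print UTF-8 on the standard handles

use Lingua::Stem::UniNE;

# Hypothetical helper: stem a list of German words with one stemmer object.
sub stem_german_words {
    my @words   = @_;
    my $stemmer = Lingua::Stem::UniNE->new( language => 'de' );
    my @stems   = $stemmer->stem(@words);    # list in, list of stems out
    return @stems;
}

my @words = qw( Frauen Bilder Söhne Äpfel );
my @stems = stem_german_words(@words);
print "$words[$_] -> $stems[$_]\n" for 0 .. $#words;
```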

Lingua::Stem::Any provides a unified interface to any stemmer on CPAN, including this one, as well as additional features like normalization, casefolding, and in-place stemming.

This module is based on stemming algorithms by Jacques Savoy of the University of Neuchâtel, implemented in C (light, aggressive).

AUTHOR

Nick Patch <patch@cpan.org>

This module is brought to you by Shutterstock. Additional open source projects from Shutterstock can be found at code.shutterstock.com.

COPYRIGHT AND LICENSE

© 2014 Shutterstock, Inc.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
