NAME

Lingua::YALI::Identifier - Module for language identification with custom models.

VERSION

version 0.012

SYNOPSIS

This module identifies languages using models provided by the user. If you want to use pretrained models, use Lingua::YALI::LanguageIdentifier.

Models trained on texts from a specific domain outperform general-purpose models on that domain.

    use Lingua::YALI::Builder;
    use Lingua::YALI::Identifier;

    # create models
    my $builder_a = Lingua::YALI::Builder->new(ngrams=>[2]);
    $builder_a->train_string("aaaaa aaaa aaa aaa aaa aaaaa aa");
    $builder_a->store("model_a.2_all.gz", 2);

    my $builder_b = Lingua::YALI::Builder->new(ngrams=>[2]);
    $builder_b->train_string("bbbbbb bbbb bbbb bbb bbbb bbbb bbb");
    $builder_b->store("model_b.2_all.gz", 2);

    # create identifier and load models
    my $identifier = Lingua::YALI::Identifier->new();
    $identifier->add_class("a", "model_a.2_all.gz");
    $identifier->add_class("b", "model_b.2_all.gz");

    # identify strings
    my $result1 = $identifier->identify_string("aaaaaaaaaaaaaaaaaaa");
    print $result1->[0]->[0] . "\t" . $result1->[0]->[1];
    # prints out a 1

    my $result2 = $identifier->identify_string("bbbbbbbbbbbbbbbbbbb");
    print $result2->[0]->[0] . "\t" . $result2->[0]->[1];
    # prints out b 1

More examples are presented in Lingua::YALI::Examples.

METHODS

BUILD

Initializes internal variables.

    # create identifier
    my $identifier = Lingua::YALI::Identifier->new();

add_class

    my $added = $identifier->add_class($class, $model);

Adds the model stored in file $model under class $class and returns whether it was added.

    print $identifier->add_class("a", "model.a1.gz") . "\n";
    # prints out 1
    print $identifier->add_class("a", "model.a2.gz") . "\n";
    # prints out 0 - class a was already added
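When several models have to be registered at once, the return value of add_class can be used to collect the classes that were not added. The following is only a minimal sketch: the helper name add_classes and the model file names are hypothetical and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::YALI): registers every
# class => model file pair and returns the classes for which
# add_class returned a false value.
sub add_classes {
    my ($identifier, $models) = @_;
    my @skipped;
    for my $class (sort keys %{$models}) {
        push @skipped, $class
            unless $identifier->add_class($class, $models->{$class});
    }
    return @skipped;
}

# Usage, assuming $identifier is a Lingua::YALI::Identifier object
# and the (hypothetical) model files exist:
#   my @skipped = add_classes($identifier, {
#       a => "model_a.2_all.gz",
#       b => "model_b.2_all.gz",
#   });
#   warn "Already registered: @skipped\n" if @skipped;
```

The helper relies only on the documented true/false return value of add_class.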

remove_class

    my $removed = $identifier->remove_class($class);

Removes the model for class $class and returns whether it was removed.

    $identifier->add_class("a", "model.a1.gz");
    print $identifier->remove_class("a") . "\n";
    # prints out 1
    print $identifier->remove_class("a") . "\n";
    # prints out 0 - class a was already removed

get_classes

    my $classes = $identifier->get_classes();

Returns a reference to an array of all registered classes.

identify_file

    my $result = $identifier->identify_file($file);

Identifies the class of file $file.

  • It returns undef if $file is undef.

  • It croaks if the file $file does not exist or is not readable.

  • Otherwise, see method "identify_handle" for details of the return value.
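Because identify_file croaks on a missing or unreadable file, a caller that processes many files may prefer to trap the exception. The following is a minimal sketch of one way to do that; the helper name try_identify_file is hypothetical and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::YALI): calls identify_file
# and turns a croak into a warning plus an undef return value.
sub try_identify_file {
    my ($identifier, $file) = @_;
    my $result;
    my $ok = eval { $result = $identifier->identify_file($file); 1 };
    if (!$ok) {
        warn "Could not identify '$file': $@";
        return;
    }
    return $result;
}

# Usage, assuming $identifier is a Lingua::YALI::Identifier object
# with at least one class added:
#   my $result = try_identify_file($identifier, "input.txt");
#   print $result->[0]->[0], "\n" if defined $result;
```

Note that with this wrapper an undef result can mean either an undef file name or a trapped error, so the warning is the only way to tell the two apart.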

identify_string

    my $result = $identifier->identify_string($string);

Identifies the class of string $string.

  • It returns undef if $string is undef.

  • Otherwise, see method "identify_handle" for details of the return value.

identify_handle

    my $result = $identifier->identify_handle($fh);

Identifies the class of the content read from file handle $fh.

  • It returns undef if $fh is undef.

  • It croaks if $fh is not a file handle.

  • Otherwise, it returns an array reference in the format [ ['class1', score1], ['class2', score2], ... ], sorted by score in descending order, so the most probable class comes first.
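The result structure described above can be walked as in the following sketch. The sample data and the helper names best_class and format_result are hypothetical and used only for illustration; real scores come from identify_string, identify_file, or identify_handle.

```perl
use strict;
use warnings;

# Hypothetical sample in the documented format:
# [ ['class1', score1], ['class2', score2], ... ], best class first.
my $result = [ [ 'a', 0.75 ], [ 'b', 0.25 ] ];

# Returns the most probable class, or undef when the result is
# undef or contains no classes.
sub best_class {
    my ($result) = @_;
    return unless defined $result && @{$result};
    return $result->[0]->[0];
}

# Formats every class with its score, one per line, best first.
sub format_result {
    my ($result) = @_;
    return '' unless defined $result;
    return join '', map { sprintf "%s\t%.2f\n", $_->[0], $_->[1] } @{$result};
}

print format_result($result);
```

Checking for undef before dereferencing matters because all three identify methods return undef when their argument is undef.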

SEE ALSO

AUTHOR

Martin Majlis <martin@majlis.cz>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2012 by Martin Majlis.

This is free software, licensed under:

  The (three-clause) BSD License