Lingua::YALI::Identifier - Module for language identification with custom models.
version 0.012
This module identifies languages using models provided by the user. If you want to use pretrained models, use Lingua::YALI::LanguageIdentifier.
Models trained on texts from a specific domain typically outperform general models on texts from that domain.
    use Lingua::YALI::Builder;
    use Lingua::YALI::Identifier;

    # create models
    my $builder_a = Lingua::YALI::Builder->new(ngrams=>[2]);
    $builder_a->train_string("aaaaa aaaa aaa aaa aaa aaaaa aa");
    $builder_a->store("model_a.2_all.gz", 2);

    my $builder_b = Lingua::YALI::Builder->new(ngrams=>[2]);
    $builder_b->train_string("bbbbbb bbbb bbbb bbb bbbb bbbb bbb");
    $builder_b->store("model_b.2_all.gz", 2);

    # create identifier and load models
    my $identifier = Lingua::YALI::Identifier->new();
    $identifier->add_class("a", "model_a.2_all.gz");
    $identifier->add_class("b", "model_b.2_all.gz");

    # identify strings
    my $result1 = $identifier->identify_string("aaaaaaaaaaaaaaaaaaa");
    print $result1->[0]->[0] . "\t" . $result1->[0]->[1];
    # prints out a 1

    my $result2 = $identifier->identify_string("bbbbbbbbbbbbbbbbbbb");
    print $result2->[0]->[0] . "\t" . $result2->[0]->[1];
    # prints out b 1
More examples are presented in Lingua::YALI::Examples.
Initializes internal variables.
    # create identifier
    my $identifier = Lingua::YALI::Identifier->new();
$added = $identifier->add_class($class, $model)
Adds the model stored in file $model under class $class and returns whether it was added.
$model
$class
    print $identifier->add_class("a", "model.a1.gz") . "\n";
    # prints out 1
    print $identifier->add_class("a", "model.a2.gz") . "\n";
    # prints out 0 - class a was already added
my $removed = $identifier->remove_class($class);
Removes the model for class $class and returns whether it was removed.
    $identifier->add_class("a", "model.a1.gz");
    print $identifier->remove_class("a") . "\n";
    # prints out 1
    print $identifier->remove_class("a") . "\n";
    # prints out 0 - class a was already removed
my $classes = $identifier->get_classes();
Returns a reference to an array of all registered classes.
my $result = $identifier->identify_file($file)
Identifies class for file $file.
$file
It returns undef if $file is undef.
It croaks if the file $file does not exist or is not readable.
Otherwise, see the method "identify_handle" for more details.
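Because identify_file croaks on a missing or unreadable file, callers that process untrusted paths may want to trap the error. The following is a minimal sketch, not part of the module: the helper name identify_file_safely and the file name no_such_file.txt are assumptions made for illustration, and the example assumes Lingua::YALI is installed.

```perl
use strict;
use warnings;
use Lingua::YALI::Identifier;

# Hypothetical helper (not part of Lingua::YALI): turns a croak from
# identify_file into a returned error message instead of a fatal error.
sub identify_file_safely {
    my ( $identifier, $file ) = @_;
    my $result;
    my $ok = eval { $result = $identifier->identify_file($file); 1 };
    return $ok ? ( $result, undef ) : ( undef, $@ || 'unknown error' );
}

my $identifier = Lingua::YALI::Identifier->new();

# The file name is an assumption; it is expected not to exist.
my ( $result, $error ) = identify_file_safely( $identifier, "no_such_file.txt" );

if ( defined $error ) {
    print "identification failed: $error\n";
}
```

Returning the error as a second value keeps the undef result for an undef $file distinguishable from a failure.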
my $result = $identifier->identify_string($string)
Identifies class for string $string.
$string
It returns undef if $string is undef.
my $result = $identifier->identify_handle($fh)
Identifies class for file handle $fh and returns:
$fh
It returns undef if $fh is undef.
It croaks if $fh is not a file handle.
It returns an array reference in the format [ ['class1', score1], ['class2', score2], ...] sorted by score in descending order, so the most probable class comes first.
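The returned structure can be walked like any array of pairs. The sketch below uses a hand-written sample in the documented format rather than real output; the scores and the helper top_class are hypothetical and only illustrate how the result might be consumed, including the undef case.

```perl
use strict;
use warnings;

# Hypothetical sample in the documented format; real scores come from the
# identify_* methods and depend on the loaded models.
my $result = [ [ 'a', 0.75 ], [ 'b', 0.25 ] ];

# Hypothetical helper (not part of Lingua::YALI): returns the label of the
# best class, or undef when the result is undef or empty.
sub top_class {
    my ($res) = @_;
    return unless defined $res && @{$res};
    return $res->[0]->[0];
}

# Print every class with its score, most probable class first.
for my $pair ( @{$result} ) {
    my ( $class, $score ) = @{$pair};
    printf "%s\t%.2f\n", $class, $score;
}

my $best = top_class($result);
print "best: $best\n";
```

Checking for undef before dereferencing matters because the identify methods return undef when their argument is undef.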
The identifier with pretrained models for language identification is Lingua::YALI::LanguageIdentifier.
The builder for these models is Lingua::YALI::Builder.
There is also a command-line tool, yali-identifier, with similar functionality.
Source code is available at https://github.com/martin-majlis/YALI.
Martin Majlis <martin@majlis.cz>
This software is Copyright (c) 2012 by Martin Majlis.
This is free software, licensed under:
The (three-clause) BSD License