NAME

Lingua::YALI::LanguageIdentifier - Module for language identification.

VERSION

version 0.009_01

SYNOPSIS

This module performs language identification and can identify 122 languages.

    use Lingua::YALI::LanguageIdentifier;

    # create identifier and register languages
    my $identifier = Lingua::YALI::LanguageIdentifier->new();
    $identifier->add_language("ces", "eng");

    # identify string
    my $result = $identifier->identify_string("CPAN, the Comprehensive Perl Archive Network, is an archive of modules written in Perl.");
    print "The most probable language is " . $result->[0]->[0] . ".\n";
    # prints out: The most probable language is eng.

More examples are presented in Lingua::YALI::Examples.

METHODS

add_language

    my $added_languages = $identifier->add_language(@languages);

Registers new languages @languages for identification and returns the number of newly added languages. Languages are identified by their ISO 639-3 code.

It croaks when an unsupported language is used.

    print $identifier->add_language("ces", "deu", "eng") . "\n";
    # prints out 3
    print $identifier->add_language("ces", "slk") . "\n";
    # prints out 1
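
Because registration croaks on an unsupported code, a caller that takes codes from user input may want to trap the error. The following is a minimal sketch, not part of the module: try_add_language is a hypothetical helper, and the method name add_language follows this section's heading and the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::YALI: tries to register one
# language and reports a failure instead of letting the croak propagate.
sub try_add_language {
    my ($identifier, $code) = @_;

    # eval returns 1 only when add_language did not die.
    my $ok = eval { $identifier->add_language($code); 1 };
    warn "Could not register '$code': $@" unless $ok;

    return $ok ? 1 : 0;
}
```

Given an identifier created as in the SYNOPSIS, try_add_language($identifier, $code) returns 1 when the call succeeds and 0 when it croaks.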

remove_language

    my $removed_languages = $identifier->remove_language(@languages);

Removes languages @languages from identification and returns the number of removed languages.

It croaks when an unsupported language is used.

    $identifier->add_language("ces", "deu", "eng");
    print $identifier->remove_language("ces", "slk") . "\n";
    # prints out 1
    print $identifier->remove_language("ces", "slk") . "\n";
    # prints out 0

get_languages

    my $languages = $identifier->get_languages();

Returns a reference to an array of all registered languages.
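
One use of this method is to reset an identifier before registering a different set of languages. The sketch below is not part of the module: clear_languages is a hypothetical helper, it assumes that get_languages returns an array reference, and it uses the method name remove_language as given in the heading of that method's section.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::YALI: unregisters every language
# that is currently registered and returns how many were removed.
sub clear_languages {
    my ($identifier) = @_;

    # Assumption: get_languages() returns an array reference.
    my @registered = @{ $identifier->get_languages() };
    return 0 unless @registered;

    return $identifier->remove_language(@registered);
}
```

Given an identifier created as in the SYNOPSIS, clear_languages($identifier) leaves it with no registered languages.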

get_available_languages

    my $languages = $identifier->get_available_languages();

Returns a reference to an array of all available languages. Currently there are 122 languages (see "LANGUAGES").
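
The list of available languages can be used to filter codes before registering them, so that an unsupported code is skipped rather than causing a croak. The sketch below is not part of the module: add_supported_languages is a hypothetical helper, it assumes that get_available_languages returns an array reference, and it uses the method name add_language as given in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::YALI: registers only the codes
# that the identifier reports as available and returns the skipped codes.
sub add_supported_languages {
    my ($identifier, @codes) = @_;

    # Assumption: get_available_languages() returns an array reference.
    my %available = map { $_ => 1 } @{ $identifier->get_available_languages() };

    my @supported = grep {  $available{$_} } @codes;
    my @skipped   = grep { !$available{$_} } @codes;

    $identifier->add_language(@supported) if @supported;

    return @skipped;
}
```

Called in list context, add_supported_languages($identifier, @codes) returns the codes that were not registered.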

identify_file

    my $result = $identifier->identify_file($file);

Identifies the language of file $file.

For more details, see the "identify_file" method in Lingua::YALI::Identifier.
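
When several files have to be classified, the call can be wrapped in a loop. The sketch below is not part of the module: identify_files is a hypothetical helper. It relies only on the result layout shown in the SYNOPSIS, where $result->[0]->[0] is the most probable language, and stores undef for a file whose result is empty.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::YALI: maps each file name to the
# most probable language reported by identify_file.
sub identify_files {
    my ($identifier, @files) = @_;

    my %language_for;
    for my $file (@files) {
        my $result = $identifier->identify_file($file);

        # Per the SYNOPSIS, $result->[0]->[0] is the most probable language.
        $language_for{$file} = ($result && @$result) ? $result->[0]->[0] : undef;
    }

    return \%language_for;
}
```

The returned hash reference can then be inspected per file, for example $languages->{$file}.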

identify_string

    my $result = $identifier->identify_string($string);

Identifies the language of string $string.

For more details, see the "identify_string" method in Lingua::YALI::Identifier.
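
The SYNOPSIS reads only the first entry of the result. The sketch below lists every candidate instead. It is not part of the module: ranked_languages is a hypothetical helper, and it assumes that each entry of the result is an array reference whose first element is the language code, ordered from most to least probable (the SYNOPSIS confirms this only for the first entry).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::YALI: returns the language codes
# from an identify_string result in the order the module reports them.
sub ranked_languages {
    my ($identifier, $text) = @_;

    my $result = $identifier->identify_string($text);
    return () unless $result;

    # Assumption: every entry is an array reference with the code first.
    return map { $_->[0] } @$result;
}
```

Called in list context, ranked_languages($identifier, $text) returns the codes in result order, so the first element matches $result->[0]->[0].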

identify_handle

    my $result = $identifier->identify_handle($fh);

Identifies the language of the content read from handle $fh.

For more details, see the "identify_handle" method in Lingua::YALI::Identifier.

LANGUAGES

More details about supported languages may be found at http://ufal.mff.cuni.cz/~majlis/w2c/download.html.

  • afr - Afrikaans

  • als - Tosk Albanian

  • amh - Amharic

  • ara - Arabic

  • arg - Aragonese

  • arz - Egyptian Arabic

  • ast - Asturian

  • aze - Azerbaijani

  • bcl - Central Bicolano

  • bel - Belarusian

  • ben - Bengali

  • bos - Bosnian

  • bpy - Bishnupriya

  • bre - Breton

  • bug - Buginese

  • bul - Bulgarian

  • cat - Catalan

  • ceb - Cebuano

  • ces - Czech

  • chv - Chuvash

  • cos - Corsican

  • cym - Welsh

  • dan - Danish

  • deu - German

  • diq - Dimli (individual language)

  • ell - Modern Greek (1453-)

  • eng - English

  • epo - Esperanto

  • est - Estonian

  • eus - Basque

  • fao - Faroese

  • fas - Persian

  • fin - Finnish

  • fra - French

  • fry - Western Frisian

  • gan - Gan Chinese

  • gla - Scottish Gaelic

  • gle - Irish

  • glg - Galician

  • glk - Gilaki

  • guj - Gujarati

  • hat - Haitian

  • hbs - Serbo-Croatian

  • heb - Hebrew

  • hif - Fiji Hindi

  • hin - Hindi

  • hrv - Croatian

  • hsb - Upper Sorbian

  • hun - Hungarian

  • hye - Armenian

  • ido - Ido

  • ina - Interlingua (International Auxiliary Language Association)

  • ind - Indonesian

  • isl - Icelandic

  • ita - Italian

  • jav - Javanese

  • jpn - Japanese

  • kan - Kannada

  • kat - Georgian

  • kaz - Kazakh

  • kor - Korean

  • kur - Kurdish

  • lat - Latin

  • lav - Latvian

  • lim - Limburgan

  • lit - Lithuanian

  • lmo - Lombard

  • ltz - Luxembourgish

  • mal - Malayalam

  • mar - Marathi

  • mkd - Macedonian

  • mlg - Malagasy

  • mon - Mongolian

  • mri - Maori

  • msa - Malay (macrolanguage)

  • mya - Burmese

  • nap - Neapolitan

  • nds - Low German

  • nep - Nepali

  • new - Newari

  • nld - Dutch

  • nno - Norwegian Nynorsk

  • nor - Norwegian

  • oci - Occitan (post 1500)

  • oss - Ossetian

  • pam - Pampanga

  • pms - Piemontese

  • pnb - Western Panjabi

  • pol - Polish

  • por - Portuguese

  • que - Quechua

  • ron - Romanian

  • rus - Russian

  • sah - Yakut

  • scn - Sicilian

  • sco - Scots

  • slk - Slovak

  • slv - Slovenian

  • spa - Spanish

  • sqi - Albanian

  • srp - Serbian

  • sun - Sundanese

  • swa - Swahili (macrolanguage)

  • swe - Swedish

  • tam - Tamil

  • tat - Tatar

  • tel - Telugu

  • tgk - Tajik

  • tgl - Tagalog

  • tha - Thai

  • tur - Turkish

  • ukr - Ukrainian

  • urd - Urdu

  • uzb - Uzbek

  • vec - Venetian

  • vie - Vietnamese

  • vol - Volapük

  • war - Waray (Philippines)

  • wln - Walloon

  • yid - Yiddish

  • yor - Yoruba

  • zho - Chinese

SEE ALSO

AUTHOR

Martin Majlis <martin@majlis.cz>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2012 by Martin Majlis.

This is free software, licensed under:

  The (three-clause) BSD License