NAME

Lingua::Translit - transliterates text between writing systems

SYNOPSIS

  use Lingua::Translit;
 
  my $tr = Lingua::Translit->new("ISO 843");
 
  my $text_tr = $tr->translit("character oriented string");
 
  if ($tr->can_reverse()) {
    $text_tr = $tr->translit_reverse("character oriented string");
  }

DESCRIPTION

Lingua::Translit can be used to convert text from one writing system to another, based on national or international transliteration tables. Where possible, a reverse transliteration is supported.

The term transliteration describes the conversion of text from one writing system or alphabet to another. Ideally the conversion is unique, mapping one character to exactly one character, so that the original spelling can be reconstructed. In practice this is not always the case, and a single letter of the original alphabet may be transcribed as two, three or even more letters.

Furthermore, there is often more than one transliteration scheme for a writing system. It is therefore essential to know which scheme will be or has been used to transliterate a text, both to work consistently and to be able to reconstruct the original data.

Reconstruction is a problem for non-unique transliterations, however: if no language-specific knowledge is available, the resulting clusters of letters may be ambiguous. For example, the Greek character "PSI" maps to "ps", but "ps" could also result from the sequence "PI", "SIGMA", since "PI" maps to "p" and "SIGMA" maps to "s". If a transliteration table leads to ambiguous conversions, the table cannot be used in reverse.

Otherwise the table can be used in both directions, if desired. For example, ISO 9 was originally created to convert Cyrillic letters to the Latin alphabet, so its reverse transliteration transforms Latin letters to Cyrillic.
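The ambiguity described above can be reproduced with a few lines of plain Perl. The sketch below does not use Lingua::Translit at all; the three-entry mapping and the helper toy_translit() are hypothetical and only mirror the "PSI" / "PI", "SIGMA" example, not the module's real ISO 843 table.

```perl
use strict;
use warnings;

# Hypothetical three-entry excerpt for illustration only,
# NOT the table shipped with Lingua::Translit.
my %to_latin = (
    "\x{03C8}" => "ps",    # GREEK SMALL LETTER PSI
    "\x{03C0}" => "p",     # GREEK SMALL LETTER PI
    "\x{03C3}" => "s",     # GREEK SMALL LETTER SIGMA
);

# Map each character through the toy table; unmapped characters pass through.
sub toy_translit {
    my ($text) = @_;
    return join '',
        map { exists $to_latin{$_} ? $to_latin{$_} : $_ } split //, $text;
}

my $from_psi      = toy_translit("\x{03C8}");            # PSI
my $from_pi_sigma = toy_translit("\x{03C0}\x{03C3}");    # PI followed by SIGMA

# Both inputs collapse to the same Latin string, so "ps" alone cannot
# tell us which Greek spelling was the original.
print "ambiguous: both yield '$from_psi'\n" if $from_psi eq $from_pi_sigma;
```

Because two different inputs produce the same output, no reverse mapping can restore the original without additional knowledge about the language.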

METHODS

new("name of table")

Initializes an object with the specific transliteration table, e.g. "ISO 9".

translit("character oriented string")

Transliterates the given text according to the object's transliteration table. Returns the transliterated text.

translit_reverse("character oriented string")

Transliterates the given text according to the object's transliteration table, but applies the table in the opposite direction. For example, ISO 9 is a transliteration scheme for the conversion of Cyrillic letters to the Latin alphabet, so when it is used in reverse, Latin letters are mapped to Cyrillic ones.

Returns the transliterated text.

can_reverse()

Returns true (1) if and only if reverse transliteration is possible, false (0) otherwise.

name()

Returns the name of the chosen transliteration table, e.g. "ISO 9".

desc()

Returns a description for the transliteration, e.g. "ISO 9:1995, Cyrillic to Latin".
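The methods above might be combined as in the following sketch. It assumes Lingua::Translit is installed; the sample word is an arbitrary choice, and no particular output is claimed for the forward or reverse result.

```perl
use strict;
use warnings;
use utf8;                               # this source contains a Cyrillic literal
use Lingua::Translit;

binmode STDOUT, ':encoding(UTF-8)';     # emit characters as UTF-8 on output

my $tr = Lingua::Translit->new("ISO 9");

print "Table:       ", $tr->name(), "\n";
print "Description: ", $tr->desc(), "\n";

my $cyrillic = "Москва";                # arbitrary sample input
my $latin    = $tr->translit($cyrillic);
print "Forward:     $latin\n";

# Only attempt the reverse direction when the table supports it.
if ($tr->can_reverse()) {
    my $back = $tr->translit_reverse($latin);
    print "Reverse:     $back\n";
}
```

Guarding translit_reverse() with can_reverse() keeps the same code usable with tables that only work in one direction.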

SUPPORTED TRANSLITERATIONS

Cyrillic

ISO 9, reversible, ISO 9:1995, Cyrillic to Latin

DIN 1460 RUS, reversible, DIN 1460:1982, Cyrillic to Latin, Russian

DIN 1460 UKR, reversible, DIN 1460:1982, Cyrillic to Latin, Ukrainian

DIN 1460 BUL, reversible, DIN 1460:1982, Cyrillic to Latin, Bulgarian

Streamlined System BUL, not reversible, The Streamlined System: 2006, Cyrillic to Latin, Bulgarian

GOST 7.79 RUS, reversible, GOST 7.79:2000 (table B), Cyrillic to Latin, Russian

GOST 7.79 RUS OLD, not reversible, GOST 7.79:2000 (table B), Cyrillic to Latin with support for Old Russian (pre 1918), Russian

GOST 7.79 UKR, reversible, GOST 7.79:2000 (table B), Cyrillic to Latin, Ukrainian

Greek

ISO 843, not reversible, ISO 843:1997, Greek to Latin

DIN 31634, not reversible, DIN 31634:1982, Greek to Latin

Greeklish, not reversible, Greeklish (Phonetic), Greek to Latin

Latin

Common CES, not reversible, Czech without diacritics

Common DEU, not reversible, German without umlauts

Common POL, not reversible, Unaccented Polish

Common RON, not reversible, Romanian without diacritics as commonly used

Common SLK, not reversible, Slovak without diacritics

Common SLV, not reversible, Slovenian without diacritics

Mongolian

Common Classical MON, reversible, Classical Mongolian to Latin
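Whether a given table is reversible can also be queried at run time rather than looked up in the list above. The following sketch assumes Lingua::Translit is installed and that the table names are passed to new() exactly as they are written in the list; the selection of three tables is arbitrary.

```perl
use strict;
use warnings;
use Lingua::Translit;

# Table names copied from the list above; the selection is arbitrary.
my @tables = ("ISO 9", "ISO 843", "Common DEU");

my %reversible;
for my $name (@tables) {
    my $tr = Lingua::Translit->new($name);

    # Normalize the return value of can_reverse() to 1 or 0.
    $reversible{$name} = $tr->can_reverse() ? 1 : 0;

    printf "%-12s %-15s %s\n",
        $name,
        ($reversible{$name} ? "reversible" : "not reversible"),
        $tr->desc();
}
```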

ADDING NEW TRANSLITERATIONS

In case you want to add your own transliteration tables to Lingua::Translit, have a look at the developer manual included in the distribution. An online version is available at http://www.lingua-systems.com/downloads/Lingua-Translit/.

A template of a transliteration table is provided as well (xml/template.xml) so you can easily start developing.

RESTRICTIONS

Lingua::Translit is designed to handle Unicode and uses comparisons and regular expressions that rely on code points. Any input is therefore expected to be character oriented (use utf8;, ...) rather than byte oriented.

However, if your data is byte oriented, be sure to pass it UTF-8 encoded to translit() and/or translit_reverse() - it will be converted internally.
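The difference between byte oriented and character oriented data can be seen with the core Encode module. This sketch does not call Lingua::Translit; it only shows how the same Greek letter looks to Perl before and after decoding. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 encoding of GREEK SMALL LETTER PSI (U+03C8), as raw bytes.
my $bytes = "\xCF\x88";

# Decoding turns the two bytes into a single character (code point).
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
# prints "bytes: 2, characters: 1"
```

Decoding input explicitly at the program's boundaries is one way to make sure that every string handed to translit() or translit_reverse() is character oriented.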

BUGS

None known.

Please report bugs to perl@lingua-systems.com.

SEE ALSO

Lingua::Translit::Tables, Encode, perlunicode

translit(1)

http://www.lingua-systems.com/transliteration/Lingua-Translit-Perl-module/

http://www.lingua-systems.com/transliteration/Lingua-Translit-Perl-module/online-transliteration.html provides an online frontend for Lingua::Translit.

CREDITS

Thanks to Dr. Daniel Eiwen, Romanisches Seminar, Universitaet Koeln for his help on Romanian transliteration.

Thanks to Bayanzul Lodoysamba <baynaa@users.sourceforge.net> for contributing the "Common Classical Mongolian" transliteration table.

AUTHORS

Alex Linke <alinke@lingua-systems.com>

Rona Linke <rlinke@lingua-systems.com>

LICENSE AND COPYRIGHT

Copyright (C) 2007-2008 Alex Linke and Rona Linke

Copyright (C) 2009-2010 Lingua-Systems Software GmbH

This module is free software. It may be used, redistributed and/or modified under the terms of either the GPL v2 or the Artistic license.