PerlIO::via::Unidecode - a perlio layer for Unidecode
  % cat utf8translit
  #!/usr/bin/perl
  use strict;
  use PerlIO::via::Unidecode;
  foreach my $f (@ARGV) {
    open IN, '<:encoding(utf8):via(Unidecode)', $f or die "$f -> $!\n";
    print while <IN>;
    close(IN);
  }
  __END__

  % od -x home_city.txt
  000000: E5 8C 97 E4 BA B0 0D 0A

Those bytes are the Chinese characters for Beijing, in UTF-8.

  % utf8translit home_city.txt
  Bei Jing
PerlIO::via::Unidecode implements a PerlIO::via layer that applies Unidecode (Text::Unidecode) to data passed through it.
You can use PerlIO::via::Unidecode on already-Unicode data, as in the example in the SYNOPSIS; or you can combine it with other layers, as in this little program that converts KOI8R text into Unicode and then feeds it to Unidecode, which then outputs an ASCII transliteration:
  % cat transkoi8r
  #!/usr/bin/perl
  use strict;
  use PerlIO::via::Unidecode;
  foreach my $f (@ARGV) {
    open IN, '<:encoding(koi8-r):via(Unidecode)', $f or die $!;
    print while <IN>;
    close(IN);
  }
  __END__

  % cat fet_koi8r.txt
  Когда читала ты мучительные строки,
  Где сердца звучный пыл сиянье льет кругом
  И страсти роковой вздымаются потоки,-
  Не вспомнила ль о чем?

  % transkoi8r fet_koi8r.txt
  Koghda chitala ty muchitiel'nyie stroki,
  Gdie sierdtsa zvuchnyi pyl siian'ie l'iet krughom
  I strasti rokovoi vzdymaiutsia potoki,-
  Nie vspomnila l' o chiem?
Of course, you could do all this by manually calling Text::Unidecode's unidecode(...) function on every line you fetch, but that's just what :via(...) layers do for you automatically.
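For comparison, here is a minimal sketch of that manual approach. It assumes only that Text::Unidecode is installed and exports unidecode() by default; the helper name translit_file is made up for this illustration and is not part of either module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Unidecode;    # exports unidecode() by default

# Hypothetical helper (not part of any module): read $path as
# $encoding, transliterate each line by hand, and return the result.
sub translit_file {
    my ($path, $encoding) = @_;
    open my $in, "<:encoding($encoding)", $path or die "$path -> $!\n";
    my $out = '';
    while (defined(my $line = <$in>)) {
        # This call is the step that :via(Unidecode) performs for you.
        $out .= unidecode($line);
    }
    close $in;
    return $out;
}

# Same effect as the utf8translit example, without the via layer.
print translit_file($_, 'utf8') for @ARGV;
```

Run it with one or more UTF-8 file names as arguments. The via layer saves you the explicit unidecode() call inside the loop, which matters most when the reading code is not yours to change.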
Note that you can also use :via(Unidecode) as an output layer. In that case, add a dummy ":utf8" after it, as below, just to silence some "wide character in print" warnings that you might otherwise see.
  % cat writebei.pl
  use PerlIO::via::Unidecode;
  open OUT, ">:via(Unidecode):utf8", "rombei.txt" or die $!;
  print OUT "\x{5317}\x{4EB0}\n";
    # those are the Chinese characters for Beijing
  close(OUT);

  % perl writebei.pl
  % cat rombei.txt
  Bei Jing
This module provides no public functions or methods -- everything is done through the via interface. If you want a function, see Text::Unidecode.
Don't forget the "use PerlIO::via::Unidecode;" line, and be sure to get the case right.
Don't type "Unicode" when you mean "Unidecode", nor vice versa.
Handy modes to remember:
  <:encoding(utf8):via(Unidecode)
  <:encoding(some-other-encoding):via(Unidecode)
  >:via(Unidecode):utf8
Text::Unidecode
PerlIO::via
Encode and Encode::Supported (even though the modes they implement are called as ":encoding(...)").
PerlIO::via::PinyinConvert
Thanks to Jarkko Hietaniemi for help with this module and many other things besides.
Copyright 2003, Sean M. Burke sburke@cpan.org, all rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The programs and documentation in this dist are distributed in the hope that they will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.
Sean M. Burke sburke@cpan.org