
Lingua::JA::MacJapanese - transcoding between Mac OS Japanese encoding and Unicode

(1) using function names exported by default:
use Lingua::JA::MacJapanese;
$wchar = decodeMacJapanese($octet);
$octet = encodeMacJapanese($wchar);
(2) using function names exported on request:
use Lingua::JA::MacJapanese qw(decode encode);
$wchar = decode($octet);
$octet = encode($wchar);
(3) using function names fully qualified:
use Lingua::JA::MacJapanese ();
$wchar = Lingua::JA::MacJapanese::decode($octet);
$octet = Lingua::JA::MacJapanese::encode($wchar);
# $wchar : a string in Perl's Unicode format
# $octet : a string in Mac OS Japanese encoding

This module decodes from and encodes to the Mac OS Japanese encoding (denoted MacJapanese hereafter).
To ensure round-trip mapping, some MacJapanese characters map to a sequence of Unicode characters rather than to a single one, and vice versa. One example is "Roman numeral thirteen": 0x85AB (MacJapanese) from/to 0xF862+0x0058+0x0049+0x0049+0x0049 (Unicode).
This module provides functions to transcode between MacJapanese and Unicode, without information loss for every MacJapanese character.
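As an illustration of the multi-character mapping described above, the following minimal sketch builds the Unicode sequence for 0x85AB by hand and lists its code points. The variable name $thirteen is only illustrative. The round trip at the end runs only if the module is installed, and its result is reported rather than assumed.

```perl
use strict;
use warnings;

# Unicode side of MacJapanese 0x85AB ("Roman numeral thirteen"):
# the private-use code point U+F862 followed by the letters X, I, I, I.
my $thirteen = "\x{F862}\x{0058}\x{0049}\x{0049}\x{0049}";

# Show that a single MacJapanese character corresponds to five code points.
printf "%d code points: %s\n", length($thirteen),
    join ' ', map { sprintf 'U+%04X', ord } split //, $thirteen;

# With the module installed, check whether the sequence survives a round trip.
if (eval { require Lingua::JA::MacJapanese; 1 }) {
    my $octet = Lingua::JA::MacJapanese::encode($thirteen);
    my $back  = Lingua::JA::MacJapanese::decode($octet);
    print $back eq $thirteen ? "round trip ok\n" : "round trip differs\n";
}
```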
Shift-JIS has 2444 User Defined Characters (a.k.a. Gaiji) [0xF040 to 0xFCFC (rows 95 to 120)], which are mapped to Unicode's PUA [0xE000 to 0xE98B].
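The two ranges above have the same size (26 rows of 94 cells, or 13 lead bytes times 188 trail bytes, gives 2444), so the arithmetic can be sketched as follows. This ASSUMES the mapping is linear in code order, which the endpoints suggest but the text above does not state. The helper gaiji_to_pua is hypothetical and is not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::JA::MacJapanese): map a two-byte
# user-defined code in 0xF040..0xFCFC to a PUA code point, ASSUMING the
# mapping is linear in code order.
sub gaiji_to_pua {
    my ($code) = @_;
    my ($lead, $trail) = ($code >> 8, $code & 0xFF);
    return if $lead < 0xF0 || $lead > 0xFC;
    return if $trail < 0x40 || $trail > 0xFC || $trail == 0x7F;
    # Trail bytes 0x40..0x7E and 0x80..0xFC give 188 cells per lead byte.
    my $cell = $trail - 0x40 - ($trail > 0x7F ? 1 : 0);
    return 0xE000 + ($lead - 0xF0) * 188 + $cell;
}

# The first and last codes land on the endpoints of the PUA range.
printf "0x%04X -> U+%04X\n", $_, gaiji_to_pua($_) for 0xF040, 0xFCFC;
```

Codes outside the user-defined range, or with the unused trail byte 0x7F, return an empty result in this sketch.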
$wchar = decode($octet)
$wchar = decode($handler, $octet)
$wchar = decodeMacJapanese($octet)
$wchar = decodeMacJapanese($handler, $octet)
Converts MacJapanese to Unicode.
decodeMacJapanese() is an alias for decode() exported by default.
If the $handler is not specified, any MacJapanese character that is not mapped to Unicode is deleted. If the $handler is a code reference, the string returned from that coderef is inserted there. If the $handler is a scalar reference, the string (a PV) in the referent is inserted there.
The 1st argument for the $handler coderef is a string of the unmapped MacJapanese character (e.g. "\xEF\xFC").
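A decoding handler might look like the following minimal sketch. The name hex_escape is hypothetical. It renders each byte of the unmapped character as a visible escape. The input "\xEF\xFC" is taken from the example above and is assumed to be unmapped. The call into the module runs only if the module is installed.

```perl
use strict;
use warnings;

# Hypothetical handler: receives the unmapped MacJapanese character as a
# byte string and returns a visible escape such as \xEF\xFC.
sub hex_escape {
    my ($char) = @_;
    return join '', map { sprintf '\\x%02X', $_ } unpack 'C*', $char;
}

# The handler can be exercised on its own.
print hex_escape("\xEF\xFC"), "\n";

# With the module installed, pass it as the first argument to decode().
if (eval { require Lingua::JA::MacJapanese; 1 }) {
    my $wchar = Lingua::JA::MacJapanese::decode(\&hex_escape, "ABC\xEF\xFC");
    print "$wchar\n";
}
```

A scalar reference such as \"?" works the same way when a fixed replacement string is enough.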
$octet = encode($wchar)
$octet = encode($handler, $wchar)
$octet = encodeMacJapanese($wchar)
$octet = encodeMacJapanese($handler, $wchar)
Converts Unicode to MacJapanese.
encodeMacJapanese() is an alias for encode() exported by default.
If the $handler is not specified, any Unicode character that is not mapped to MacJapanese is deleted. If the $handler is a code reference, the string returned from that coderef is inserted there. If the $handler is a scalar reference, the string (a PV) in the referent is inserted there.
The 1st argument for the $handler coderef is the Unicode code point (unsigned integer) of the unmapped character.
E.g.
sub hexNCR { sprintf("&#x%x;", shift) } # hexadecimal NCR
sub decNCR { sprintf("&#%d;" , shift) } # decimal NCR
print encodeMacJapanese("ABC\x{100}\x{10000}");
# "ABC"
print encodeMacJapanese(\"", "ABC\x{100}\x{10000}");
# "ABC"
print encodeMacJapanese(\"?", "ABC\x{100}\x{10000}");
# "ABC??"
print encodeMacJapanese(\&hexNCR, "ABC\x{100}\x{10000}");
# "ABC&#x100;&#x10000;"
print encodeMacJapanese(\&decNCR, "ABC\x{100}\x{10000}");
# "ABC&#256;&#65536;"

Sorry, the author does not work on Mac OS. Please let him know if you find something wrong.

SADAHIRO Tomoyuki <SADAHIRO@cpan.org>
Copyright(C) 2003-2007, SADAHIRO Tomoyuki. Japan. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/CORPCHAR.TXT