
Lingua::HE::MacHebrew - transcoding between Mac OS Hebrew encoding and Unicode

(1) using function names exported by default:
use Lingua::HE::MacHebrew;
$wchar = decodeMacHebrew($octet);
$octet = encodeMacHebrew($wchar);
(2) using function names exported on request:
use Lingua::HE::MacHebrew qw(decode encode);
$wchar = decode($octet);
$octet = encode($wchar);
(3) using function names fully qualified:
use Lingua::HE::MacHebrew ();
$wchar = Lingua::HE::MacHebrew::decode($octet);
$octet = Lingua::HE::MacHebrew::encode($wchar);
# $wchar : a string in Perl's Unicode format
# $octet : a string in Mac OS Hebrew encoding

This module provides decoding from/encoding to Mac OS Hebrew encoding (denoted MacHebrew hereafter).
The functions provided here should cope with Unicode text that contains the following directional formatting codes: PDF (U+202C), LRO (U+202D), and RLO (U+202E).
For example, decode("\xC0") returns "\x{F86A}\x{05DC}\x{05B9}" and encode("\x{F86A}\x{05DC}\x{05B9}") returns "\xC0".
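The round trip described above can be inspected code point by code point. The following is a minimal sketch, not part of the module: code_points() is a hypothetical helper introduced here for illustration, and the conversion runs only if Lingua::HE::MacHebrew is installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): render a string as a
# space-separated list of Unicode code points.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $string;
}

# Run the conversion only when Lingua::HE::MacHebrew is installed.
if (eval { require Lingua::HE::MacHebrew; 1 }) {
    # Decode the single MacHebrew octet 0xC0 and list the resulting code points.
    my $wchar = Lingua::HE::MacHebrew::decode("\xC0");
    print code_points($wchar), "\n";

    # Encode the decoded string again and show the resulting octets in hex.
    my $octet = Lingua::HE::MacHebrew::encode($wchar);
    print unpack('H*', $octet), "\n";
}
```

According to the example above, the first line printed should list U+F86A, U+05DC, and U+05B9, and the second should show the octet c0.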
$wchar = decode($octet)
$wchar = decodeMacHebrew($octet)
Converts MacHebrew to Unicode.
decodeMacHebrew() is an alias for decode() exported by default.
$octet = encode($wchar)
$octet = encode($handler, $wchar)
$octet = encodeMacHebrew($wchar)
$octet = encodeMacHebrew($handler, $wchar)
Converts Unicode to MacHebrew.
encodeMacHebrew() is an alias for encode() exported by default.
If $handler is not specified, any character that has no mapping to MacHebrew is deleted. If $handler is a code reference, the string returned by that coderef is inserted in its place. If $handler is a scalar reference, the string (a PV) that it refers to is inserted in its place.
The first argument passed to the $handler coderef is the Unicode code point (an integer) of the unmapped character.
E.g.
sub hexNCR { sprintf("&#x%x;", shift) } # hexadecimal NCR
sub decNCR { sprintf("&#%d;" , shift) } # decimal NCR
print encodeMacHebrew("ABC\x{100}\x{10000}");
# "ABC"
print encodeMacHebrew(\"", "ABC\x{100}\x{10000}");
# "ABC"
print encodeMacHebrew(\"?", "ABC\x{100}\x{10000}");
# "ABC??"
print encodeMacHebrew(\&hexNCR, "ABC\x{100}\x{10000}");
# "ABC&#x100;&#x10000;"
print encodeMacHebrew(\&decNCR, "ABC\x{100}\x{10000}");
# "ABC&#256;&#65536;"
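Because the handler coderef receives the code point of each unmapped character, it can also be used to log what was lost during encoding. The following is a minimal sketch under that assumption: record_unmapped() and @unmapped are hypothetical names introduced here, and the call to encode() runs only if Lingua::HE::MacHebrew is installed.

```perl
use strict;
use warnings;

# Hypothetical handler (not provided by the module): remember each
# unmapped code point and substitute "?" for it.
my @unmapped;
sub record_unmapped {
    my ($code_point) = @_;
    push @unmapped, sprintf('U+%04X', $code_point);
    return '?';
}

# Run the conversion only when Lingua::HE::MacHebrew is installed.
if (eval { require Lingua::HE::MacHebrew; 1 }) {
    my $octet = Lingua::HE::MacHebrew::encode(\&record_unmapped, "ABC\x{100}\x{10000}");
    print "$octet\n";
    print "unmapped: @unmapped\n";
}
```

Given the behavior documented above for a "?" replacement, the encoded string here should match the "ABC??" case, with the two unmapped code points collected in @unmapped.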

Sorry, the author does not work on Mac OS. Please let him know if you find something wrong.
Possible bug: the (default) paragraph direction is not resolved. Does Mac OS always surround characters whose bidirectional type is to be overridden with LRO..PDF or RLO..PDF?

SADAHIRO Tomoyuki <SADAHIRO@cpan.org>
Copyright(C) 2003-2007, SADAHIRO Tomoyuki. Japan. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/HEBREW.TXT
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/CORPCHAR.TXT