


Module Version: 0.20


Lingua::AR::MacArabic - transcoding between Mac OS Arabic encoding and Unicode


(1) using function names exported by default:

    use Lingua::AR::MacArabic;
    $wchar = decodeMacArabic($octet);
    $octet = encodeMacArabic($wchar);

(2) using function names exported on request:

    use Lingua::AR::MacArabic qw(decode encode);
    $wchar = decode($octet);
    $octet = encode($wchar);

(3) using function names fully qualified:

    use Lingua::AR::MacArabic ();
    $wchar = Lingua::AR::MacArabic::decode($octet);
    $octet = Lingua::AR::MacArabic::encode($wchar);

   # $wchar : a string in Perl's Unicode format
   # $octet : a string in Mac OS Arabic encoding


This module decodes from and encodes to the Mac OS Arabic encoding (denoted MacArabic hereafter).


bidi support

The functions provided here are intended to cope with Unicode text that contains the following directional formatting codes: PDF (U+202C), LRO (U+202D), and RLO (U+202E).
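
As an illustration of the three formatting codes named above, the following sketch builds a Unicode string in which a run of characters is wrapped in an explicit override. The helper with_override() is hypothetical and not part of this module; how encode() treats such a string is not shown or verified here.

```perl
use strict;
use warnings;

# Directional formatting codes named in the text above.
my $PDF = "\x{202C}";   # POP DIRECTIONAL FORMATTING
my $LRO = "\x{202D}";   # LEFT-TO-RIGHT OVERRIDE
my $RLO = "\x{202E}";   # RIGHT-TO-LEFT OVERRIDE

# Hypothetical helper (an assumption for this sketch, not the module's API):
# wrap a run of text in an override of the given direction.
sub with_override {
    my ($dir, $text) = @_;
    return ($dir eq 'rtl' ? $RLO : $LRO) . $text . $PDF;
}

# Arabic-Indic digits one, two, three inside a right-to-left override.
my $wchar = with_override(rtl => "\x{0661}\x{0662}\x{0663}");

# Show the code points instead of printing wide characters directly.
printf "U+%04X ", ord($_) for split //, $wchar;
print "\n";
```

A string built this way is in Perl's Unicode format, so it has the same shape as the $wchar argument used throughout this document.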

additional mapping

Arabic-Indic digits and some related characters in Unicode are encoded in MacArabic as if they were ordinary digits (U+0030..U+0039) when they appear in the left-to-right direction.


$wchar = decode($octet)
$wchar = decodeMacArabic($octet)

Converts MacArabic to Unicode.

decodeMacArabic() is an alias for decode() and is exported by default.

$octet = encode($wchar)
$octet = encode($handler, $wchar)
$octet = encodeMacArabic($wchar)
$octet = encodeMacArabic($handler, $wchar)

Converts Unicode to MacArabic.

encodeMacArabic() is an alias for encode() and is exported by default.

If the $handler is not specified, any character that is not mapped to MacArabic is deleted. If the $handler is a code reference, the string returned from that coderef is inserted there. If the $handler is a scalar reference, the string (a PV) held in the referent is inserted there.

The 1st argument for the $handler coderef is the Unicode code point (integer) of the unmapped character.
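
The three $handler cases described above can be pictured with the following minimal sketch. The function apply_handler() is hypothetical and is not how the module is implemented; it only mirrors the documented rules for a single unmapped code point. Dying on any other kind of argument is an assumption of this sketch, since the behaviour for other reference types is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::AR::MacArabic: returns the
# replacement string for one unmapped code point, following the rules above.
sub apply_handler {
    my ($handler, $codepoint) = @_;
    return ''                     unless defined $handler;      # no handler: character is deleted
    return $handler->($codepoint) if ref $handler eq 'CODE';    # coderef: called with the code point
    return $$handler              if ref $handler eq 'SCALAR';  # scalar ref: referent is inserted
    die "unsupported handler type\n";                           # assumption of this sketch
}

sub hex_ncr { sprintf("&#x%x;", shift) }   # hexadecimal NCR

print apply_handler(undef,     0x100), "\n";   # empty replacement
print apply_handler(\"?",      0x100), "\n";   # fixed replacement string
print apply_handler(\&hex_ncr, 0x100), "\n";   # replacement computed from the code point
```

A coderef is the most flexible choice because it receives the code point, so the replacement can differ for each unmapped character; a scalar reference always inserts the same string.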


   sub hexNCR { sprintf("&#x%x;", shift) } # hexadecimal NCR
   sub decNCR { sprintf("&#%d;" , shift) } # decimal NCR

   print encodeMacArabic("ABC\x{100}\x{10000}");
   # "ABC"

   print encodeMacArabic(\"", "ABC\x{100}\x{10000}");
   # "ABC"

   print encodeMacArabic(\"?", "ABC\x{100}\x{10000}");
   # "ABC??"

   print encodeMacArabic(\&hexNCR, "ABC\x{100}\x{10000}");
   # "ABC&#x100;&#x10000;"

   print encodeMacArabic(\&decNCR, "ABC\x{100}\x{10000}");
   # "ABC&#256;&#65536;"


Sorry, the author does not work on Mac OS. Please let him know if you find something wrong.

Possible bug: the (default) paragraph direction is not resolved. Does Mac OS always surround characters whose bidirectional type is to be overridden with LRO..PDF or RLO..PDF?


SADAHIRO Tomoyuki

Copyright(C) 2003-2011, SADAHIRO Tomoyuki. Japan. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


Map (external version) from Mac OS Arabic character set to Unicode 2.1 and later (version: c02 2005-Apr-04)

Registry (external version) of Apple use of Unicode corporate-zone characters (version: c03 2005-Apr-04)

The Bidirectional Algorithm
