Module Version: 0.06

NAME

ShiftJIS::CP932::Correct - corrects a string in Windows CP-932 (a variant of Shift_JIS)

SYNOPSIS

    use ShiftJIS::CP932::Correct;

    $corrected_cp932 = correct_cp932($cp932_string);

DESCRIPTION

The Microsoft Code Page 932 (CP-932) table comprises 7915 characters:

    JIS X 0201-1997 single-byte characters (159 characters),
    JIS X 0211-1994 single-byte characters (32 characters),
    JIS X 0208-1997 double-byte characters (6879 characters),
    NEC special characters (83 characters from SJIS row 13),
    NEC-selected IBM extended characters (374 characters from SJIS rows 89 to 92),
    and IBM extended characters (388 characters from SJIS rows 115 to 119).

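The table contains duplicates: distinct CP-932 code points that map to the same Unicode character, so they cannot all survive a round trip through Unicode. These duplicates come from the vendor-defined characters of NEC and IBM.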

For example, there are two characters mapped to U+2252, namely, 0x81e0 (a JIS X 0208 character) and 0x8790 (an NEC special character).

As a result, a program converting Unicode to CP-932 may carelessly map U+2252 to 0x8790 instead of 0x81e0.

Such behavior is undesirable because NEC special characters (and other vendor-defined characters) are less widely compatible.
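The duplicate can be observed without this module. The following is a minimal sketch that assumes the core Encode module and its 'cp932' encoding are available; it is not part of ShiftJIS::CP932::Correct. Which of the two byte sequences the encoder returns depends on the conversion table in use, so the last line prints the result rather than assuming it.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Both CP-932 byte sequences decode to the same Unicode character.
my $from_jis = decode('cp932', "\x81\xe0");   # JIS X 0208 code point
my $from_nec = decode('cp932', "\x87\x90");   # NEC special character

printf "0x81e0 -> U+%04X\n", ord $from_jis;
printf "0x8790 -> U+%04X\n", ord $from_nec;

# Encoding can return only one byte sequence for U+2252,
# so at least one of the two inputs cannot round trip.
my $back = encode('cp932', $from_nec);
printf "U+2252 -> 0x%s\n", unpack('H*', $back);
```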

This module corrects (or normalizes) such CP-932 strings, which are legal but 'wrong'.

This module uses a map provided in Microsoft PRB: Conversion Problem Between Shift-JIS and Unicode (Article ID: Q170559).

correct_cp932(STRING)

Corrects a CP-932 string; namely, converts the less preferred code points of duplicates (doubly-defined characters) to the preferred ones.

Does not affect characters that can be round-trip mapped to Unicode. Any undefined characters are deleted.

For example, converts \x87\x90 to \x81\xe0.
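A short usage sketch of the conversion described above, assuming the module is installed. The variable names are illustrative only, and the commented results follow from the description rather than from a recorded run.

```perl
use strict;
use warnings;
use ShiftJIS::CP932::Correct;   # correct_cp932 is exported by default

# NEC special character 0x8790 duplicates the JIS X 0208 code point 0x81e0.
my $input     = "\x87\x90";
my $corrected = correct_cp932($input);
printf "%s -> %s\n", unpack('H*', $input), unpack('H*', $corrected);
# per the description, this should show 8790 -> 81e0

# A character that round trips (0x82a0, HIRAGANA LETTER A) should be left as is.
my $plain = "\x82\xa0";
print "unchanged\n" if correct_cp932($plain) eq $plain;
```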

is_corrected_cp932(STRING)

Returns a boolean indicating whether the string is a corrected CP-932 string.

is_cp932(STRING)

Returns a boolean indicating whether the string is a CP-932 string.

EXPORT

  correct_cp932 and is_corrected_cp932 by default.
  is_cp932 on request.
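The sketch below requests is_cp932 explicitly and uses the two predicates around a correction. It assumes the module follows the usual Exporter convention, under which an explicit import list replaces the defaults, so all three functions are named; the exact true and false values returned are not documented, so they are only tested in boolean context.

```perl
use strict;
use warnings;
# Name every function needed: an explicit list replaces the default exports
# (assuming standard Exporter behavior).
use ShiftJIS::CP932::Correct qw(correct_cp932 is_corrected_cp932 is_cp932);

my $str = "\x87\x90";   # legal CP-932, but a less preferred duplicate

print "valid CP-932\n"     if is_cp932($str);
print "needs correction\n" unless is_corrected_cp932($str);

$str = correct_cp932($str);
print "corrected\n"        if is_corrected_cp932($str);
```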

CAVEAT

A corrected CP-932 string may still contain a vendor-defined character.
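To illustrate, the sketch below uses 0x8740 (CIRCLED DIGIT ONE, U+2460), an NEC special character with no duplicate elsewhere in the table. Since it round trips to Unicode, the description implies it passes through unchanged; that outcome is an inference from the documentation, not a recorded result.

```perl
use strict;
use warnings;
use ShiftJIS::CP932::Correct;

# 0x8740 is vendor-defined (NEC row 13) but has no preferred counterpart,
# so correction is expected to leave it in place.
my $circled_one = "\x87\x40";
my $result      = correct_cp932($circled_one);

printf "%s -> %s\n", unpack('H*', $circled_one), unpack('H*', $result);
print "still vendor-defined after correction\n" if $result eq $circled_one;
```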

IT SHOULD BE NOTED THAT CP-932 IS DIFFERENT FROM SHIFT_JIS !!

AUTHOR

SADAHIRO Tomoyuki <SADAHIRO@cpan.org>

Copyright(C) 2001-2002, SADAHIRO Tomoyuki. Japan. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

  1. Microsoft PRB: Conversion Problem Between Shift-JIS and Unicode (Article ID: Q170559)
  2. ShiftJIS::CP932::MapUTF