Module Version: 0.27

NAME

Lingua::KO::Hangul::Util - utility functions for Hangul in Unicode

SYNOPSIS

  use Lingua::KO::Hangul::Util qw(:all);

  decomposeSyllable("\x{AC00}");          # "\x{1100}\x{1161}"
  composeSyllable("\x{1100}\x{1161}");    # "\x{AC00}"
  decomposeJamo("\x{1101}");              # "\x{1100}\x{1100}"
  composeJamo("\x{1100}\x{1100}");        # "\x{1101}"

  getHangulName(0xAC00);                  # "HANGUL SYLLABLE GA"
  parseHangulName("HANGUL SYLLABLE GA");  # 0xAC00

DESCRIPTION

A Hangul syllable consists of Hangul jamo (Hangul letters).

Hangul letters are classified into three classes:

  CHOSEONG  (the initial sound) as a leading consonant (L),
  JUNGSEONG (the medial sound)  as a vowel (V),
  JONGSEONG (the final sound)   as a trailing consonant (T).

Any Hangul syllable is a composition of (i) L + V, or (ii) L + V + T.

Composition and Decomposition

$resultant_string = decomposeSyllable($string)

It decomposes a precomposed syllable (LV or LVT) to a sequence of conjoining jamo (L + V or L + V + T) and returns the result as a string.

Any characters other than Hangul syllables are not affected.
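The decomposition described above can be sketched with the Hangul syllable arithmetic from the Unicode Standard. This is only an illustration, not the module's own code: sketch_decompose_syllable and _split_syllable are hypothetical names, and the constants are those of the standard algorithm.

```perl
use strict;
use warnings;

# Constants of the standard Hangul syllable arithmetic.
use constant {
    SBASE  => 0xAC00,   # first precomposed syllable (GA)
    LBASE  => 0x1100,   # first leading consonant
    VBASE  => 0x1161,   # first vowel
    TBASE  => 0x11A7,   # one before the first trailing consonant
    VCOUNT => 21,
    TCOUNT => 28,
};

# Hypothetical helper: split one syllable code point into L + V (+ T).
sub _split_syllable {
    my ($cp) = @_;
    my $s = $cp - SBASE;
    my $l = LBASE + int($s / (VCOUNT * TCOUNT));
    my $v = VBASE + int(($s % (VCOUNT * TCOUNT)) / TCOUNT);
    my $t = $s % TCOUNT;                 # 0 means the syllable has no final
    return chr($l) . chr($v) . ($t ? chr(TBASE + $t) : '');
}

# Hypothetical stand-in for decomposeSyllable: other characters pass through.
sub sketch_decompose_syllable {
    my ($str) = @_;
    $str =~ s/([\x{AC00}-\x{D7A3}])/_split_syllable(ord($1))/ge;
    return $str;
}

print join(' ', map { sprintf('U+%04X', ord($_)) }
                split(//, sketch_decompose_syllable("\x{AE00}"))), "\n";
# prints "U+1100 U+1173 U+11AF"
```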

$resultant_string = composeSyllable($string)

It composes a sequence of conjoining jamo (L + V or L + V + T) to a precomposed syllable (LV or LVT) if possible, and returns the result as a string. A syllable LV followed by a final jamo T is also composed (to a syllable LVT).

Any characters other than Hangul jamo and syllables are not affected.
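A minimal sketch of the two composition steps (L + V to LV, then LV + T to LVT), again using the standard arithmetic rather than the module's implementation. The names sketch_compose_syllable, _compose_lv and _compose_lvt are hypothetical.

```perl
use strict;
use warnings;

use constant {
    SBASE  => 0xAC00, LBASE  => 0x1100, VBASE => 0x1161, TBASE => 0x11A7,
    VCOUNT => 21,     TCOUNT => 28,
};

# Hypothetical helper: L + V -> LV.
sub _compose_lv {
    my ($l, $v) = @_;
    return chr(SBASE + (($l - LBASE) * VCOUNT + ($v - VBASE)) * TCOUNT);
}

# Hypothetical helper: LV + T -> LVT. A syllable that already has a final
# (an LVT syllable) is left as it is, followed by the unchanged T.
sub _compose_lvt {
    my ($s, $t) = @_;
    return chr($s) . chr($t) if ($s - SBASE) % TCOUNT;
    return chr($s + $t - TBASE);
}

# Hypothetical stand-in for composeSyllable.
sub sketch_compose_syllable {
    my ($str) = @_;
    $str =~ s/([\x{1100}-\x{1112}])([\x{1161}-\x{1175}])/_compose_lv(ord($1), ord($2))/ge;
    $str =~ s/([\x{AC00}-\x{D7A3}])([\x{11A8}-\x{11C2}])/_compose_lvt(ord($1), ord($2))/ge;
    return $str;
}

printf "U+%04X\n", ord(sketch_compose_syllable("\x{1100}\x{1173}\x{11AF}"));
# prints "U+AE00"
```

Running the L + V step first means that a full L + V + T sequence and an existing LV syllable followed by T both go through the same second step.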

$resultant_string = decomposeJamo($string)

It decomposes a complex jamo to a sequence of simple jamo if possible, and returns the result as a string. Any characters other than complex jamo are not affected.

  e.g.
      CHOSEONG SIOS-PIEUP to CHOSEONG SIOS + PIEUP
      JUNGSEONG AE        to JUNGSEONG A + I
      JUNGSEONG WE        to JUNGSEONG U + EO + I
      JONGSEONG SSANGSIOS to JONGSEONG SIOS + SIOS

$resultant_string = composeJamo($string)

It composes a sequence of simple jamo (L1 + L2, V1 + V2 + V3, etc.) to a complex jamo if possible, and returns the result as a string. Any characters other than simple jamo are not affected.

  e.g.
      CHOSEONG SIOS + PIEUP to CHOSEONG SIOS-PIEUP
      JUNGSEONG A + I       to JUNGSEONG AE
      JUNGSEONG U + EO + I  to JUNGSEONG WE
      JONGSEONG SIOS + SIOS to JONGSEONG SSANGSIOS

$resultant_string = decomposeFull($string)

It decomposes a syllable/complex jamo to a sequence of simple jamo. Equivalent to decomposeJamo(decomposeSyllable($string)).
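The chaining of the two steps can be sketched as follows. The jamo table here is a deliberately tiny excerpt (four entries taken from the examples in this document), so this is an illustration of the idea only; %JAMO_PARTS, _split_syllable and sketch_decompose_full are hypothetical names, not part of the module.

```perl
use strict;
use warnings;

# Tiny excerpt of a complex-jamo table; a real table has many more entries.
my %JAMO_PARTS = (
    "\x{1101}" => "\x{1100}\x{1100}",          # CHOSEONG SSANGKIYEOK -> KIYEOK + KIYEOK
    "\x{1162}" => "\x{1161}\x{1175}",          # JUNGSEONG AE -> A + I
    "\x{1170}" => "\x{116E}\x{1165}\x{1175}",  # JUNGSEONG WE -> U + EO + I
    "\x{11BB}" => "\x{11BA}\x{11BA}",          # JONGSEONG SSANGSIOS -> SIOS + SIOS
);

# Hypothetical helper: arithmetic decomposition of one precomposed syllable
# (588 = 21 vowels * 28 finals).
sub _split_syllable {
    my ($cp) = @_;
    my $s = $cp - 0xAC00;
    my $t = $s % 28;
    return chr(0x1100 + int($s / 588))
         . chr(0x1161 + int(($s % 588) / 28))
         . ($t ? chr(0x11A7 + $t) : '');
}

# Hypothetical stand-in for decomposeFull.
sub sketch_decompose_full {
    my ($str) = @_;
    # Step 1: syllable -> conjoining jamo.
    $str =~ s/([\x{AC00}-\x{D7A3}])/_split_syllable(ord($1))/ge;
    # Step 2: complex jamo -> simple jamo (table lookup).
    $str =~ s/(.)/exists $JAMO_PARTS{$1} ? $JAMO_PARTS{$1} : $1/gse;
    return $str;
}

# U+AC1C (HANGUL SYLLABLE GAE) -> KIYEOK + AE -> KIYEOK + A + I
print join(' ', map { sprintf('U+%04X', ord($_)) }
                split(//, sketch_decompose_full("\x{AC1C}"))), "\n";
# prints "U+1100 U+1161 U+1175"
```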

Composition and Decomposition (Old-interface, deprecated!)

$string_decomposed = decomposeHangul($code_point)
@codepoints = decomposeHangul($code_point)

If the specified code point is of a Hangul syllable, it returns a list of code points (in a list context) or a string (in a scalar context) of its decomposition.

   decomposeHangul(0xAC00) # U+AC00 is HANGUL SYLLABLE GA.
      returns "\x{1100}\x{1161}" or (0x1100, 0x1161);

   decomposeHangul(0xAE00) # U+AE00 is HANGUL SYLLABLE GEUL.
      returns "\x{1100}\x{1173}\x{11AF}" or (0x1100, 0x1173, 0x11AF);

Otherwise, returns false (empty string or empty list).

   decomposeHangul(0x0041) # outside Hangul syllables
      returns empty string or empty list.

$string_composed = composeHangul($src_string)
@code_points_composed = composeHangul($src_string)

Any sequence of an initial jamo L and a medial jamo V is composed to a syllable LV; then any sequence of a syllable LV and a final jamo T is composed to a syllable LVT.

Any characters other than Hangul jamo and syllables are not affected.

   composeHangul("\x{1100}\x{1173}\x{11AF}.")
   # returns "\x{AE00}." or (0xAE00,0x2E);

$code_point_composite = getHangulComposite($code_point_here, $code_point_next)

It returns the code point of the composite if both code points, $code_point_here and $code_point_next, are Hangul and composable.

Otherwise, returns undef.
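The pairwise check can be sketched as below, assuming that "composable" means either an L + V pair or an LV syllable followed by T, as in the standard composition arithmetic. sketch_hangul_composite is a hypothetical name and not the module's code.

```perl
use strict;
use warnings;

use constant {
    SBASE  => 0xAC00, LBASE  => 0x1100, VBASE  => 0x1161, TBASE => 0x11A7,
    LCOUNT => 19,     VCOUNT => 21,     TCOUNT => 28,     SCOUNT => 11172,
};

# Hypothetical stand-in for getHangulComposite, working on code points.
sub sketch_hangul_composite {
    my ($here, $next) = @_;

    # L + V -> LV
    if (   $here >= LBASE && $here < LBASE + LCOUNT
        && $next >= VBASE && $next < VBASE + VCOUNT) {
        return SBASE + (($here - LBASE) * VCOUNT + ($next - VBASE)) * TCOUNT;
    }

    # LV + T -> LVT (only if the syllable has no final yet)
    if (   $here >= SBASE && $here < SBASE + SCOUNT
        && ($here - SBASE) % TCOUNT == 0
        && $next > TBASE && $next < TBASE + TCOUNT) {
        return $here + $next - TBASE;
    }

    return;   # not composable: undef in scalar context
}

printf "U+%04X\n", sketch_hangul_composite(0x1100, 0x1161);   # prints "U+AC00"
```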

Hangul Syllable Name

The following functions handle only a precomposed Hangul syllable (from U+AC00 to U+D7A3), but not a Hangul jamo or other Hangul-related character.

Names of Hangul syllables have a format of "HANGUL SYLLABLE %s".
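The %s part of such a name is the concatenation of the short names of the syllable's L, V and T jamo. A rough sketch of both directions, assuming the jamo short names from the Unicode Character Database; sketch_hangul_name and sketch_parse_hangul_name are hypothetical names, and the reverse lookup simply indexes all 11172 generated names.

```perl
use strict;
use warnings;

# Jamo short names in code point order (19 L, 21 V, 28 T including "no final").
my @L_NAME = (qw(G GG N D DD R M B BB S SS), '', qw(J JJ C K T P H));
my @V_NAME = qw(A AE YA YAE EO YEO E YE O WA WAE OE YO U WEO WE WI YU EU YI I);
my @T_NAME = ('', qw(G GG GS N NJ NH D L LG LM LB LS LT LP LH
                     M B BS S SS NG J C K T P H));

# Hypothetical stand-in for getHangulName.
sub sketch_hangul_name {
    my ($cp) = @_;
    return if $cp < 0xAC00 || $cp > 0xD7A3;   # not a precomposed syllable
    my $s = $cp - 0xAC00;
    return 'HANGUL SYLLABLE '
         . $L_NAME[ int($s / 588) ]
         . $V_NAME[ int(($s % 588) / 28) ]
         . $T_NAME[ $s % 28 ];
}

# Reverse index: name -> code point, built once from the generator above.
my %CODE_POINT_OF;
$CODE_POINT_OF{ sketch_hangul_name($_) } = $_ for 0xAC00 .. 0xD7A3;

# Hypothetical stand-in for parseHangulName: undef for unknown names.
sub sketch_parse_hangul_name {
    my ($name) = @_;
    return $CODE_POINT_OF{$name};
}

print sketch_hangul_name(0xAE00), "\n";   # prints "HANGUL SYLLABLE GEUL"
```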

$name = getHangulName($code_point)

If the specified code point is of a Hangul syllable, it returns its name; otherwise it returns undef.

   getHangulName(0xAC00) returns "HANGUL SYLLABLE GA";
   getHangulName(0x0041) returns undef.

$codepoint = parseHangulName($name)

If the specified name is of a Hangul syllable, it returns its code point; otherwise it returns undef.

   parseHangulName("HANGUL SYLLABLE GEUL") returns 0xAE00;

   parseHangulName("LATIN SMALL LETTER A") returns undef;

   parseHangulName("HANGUL SYLLABLE PERL") returns undef;
    # Regrettably, HANGUL SYLLABLE PERL does not exist :-)

Standard Korean Syllable Block

A standard Korean syllable block consists of L+ V+ T* (a sequence of one or more L, one or more V, and zero or more T) according to the conjoining jamo behavior revised in Unicode 3.2 (cf. UAX #28). A sequence of L followed by T, with no V between them, is not a single syllable block; it consists of two nonstandard syllable blocks: one without V, and another without L and V.

$bool = isStandardForm($string)

It returns a boolean: true only if the string is encoded in the standard form, that is, if it contains no nonstandard sequence.
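The L+ V+ T* rule can be sketched as a single regular expression. This sketch makes two assumptions that are not taken from the module: the jamo classes are the ranges listed under CAVEAT (with the fillers counted as L and V), and the string contains conjoining jamo only, for instance because decomposeSyllable was applied first. sketch_is_standard_form is a hypothetical name.

```perl
use strict;
use warnings;

# Jamo classes, assumed from the ranges listed under CAVEAT.
my $L    = qr/[\x{1100}-\x{1159}\x{115F}]/;   # leading consonants + choseong filler
my $V    = qr/[\x{1160}-\x{11A2}]/;           # jungseong filler + vowels
my $T    = qr/[\x{11A8}-\x{11F9}]/;           # trailing consonants
my $JAMO = qr/[\x{1100}-\x{1159}\x{115F}-\x{11A2}\x{11A8}-\x{11F9}]/;

# Hypothetical stand-in for isStandardForm (precomposed syllables are NOT
# handled here; they are treated like any other non-jamo character).
sub sketch_is_standard_form {
    my ($str) = @_;
    # The whole string must be a run of standard blocks and non-jamo characters.
    return $str =~ /\A (?: $L+ $V+ $T* | (?! $JAMO ) . )* \z/xs ? 1 : 0;
}

print sketch_is_standard_form("\x{1100}\x{1161}\x{11A8}"), "\n";   # prints "1"
print sketch_is_standard_form("\x{1100}\x{11A8}"), "\n";           # prints "0"
```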

$resultant_string = insertFiller($string)

It transforms the string into standard form by inserting fillers into each nonstandard syllable block, and returns the result as a string. A choseong filler (Lf, U+115F) is inserted into a syllable block without L. A jungseong filler (Vf, U+1160) is inserted into a syllable block without V.
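One possible reading of these rules, sketched in Perl. It has not been checked against the module's behavior: the jamo classes are assumed from the ranges listed under CAVEAT, only conjoining jamo are handled, and sketch_insert_filler and _fill_chunk are hypothetical names. Each maximal run of L* V* T* is examined and repaired on its own.

```perl
use strict;
use warnings;

my $L = qr/[\x{1100}-\x{1159}\x{115F}]/;
my $V = qr/[\x{1160}-\x{11A2}]/;
my $T = qr/[\x{11A8}-\x{11F9}]/;
my ($LF, $VF) = ("\x{115F}", "\x{1160}");   # choseong / jungseong filler

# Hypothetical helper: repair one run of L* V* T* (at least one part non-empty).
sub _fill_chunk {
    my ($l, $v, $t) = @_;
    return $l . $v . $t  if length($l) && length($v);   # already standard
    return $LF . $v . $t if length($v);                 # block without L
    my $out = '';
    $out .= $l . $VF       if length($l);               # block without V
    $out .= $LF . $VF . $t if length($t);               # block without L and V
    return $out;
}

# Hypothetical stand-in for insertFiller.
sub sketch_insert_filler {
    my ($str) = @_;
    # The lookahead guarantees that each match starts at a jamo.
    $str =~ s/(?=$L|$V|$T)($L*)($V*)($T*)/_fill_chunk($1, $2, $3)/ge;
    return $str;
}

# L followed by T becomes two blocks: L + Vf, then Lf + Vf + T.
print join(' ', map { sprintf('U+%04X', ord($_)) }
                split(//, sketch_insert_filler("\x{1100}\x{11A8}"))), "\n";
# prints "U+1100 U+1160 U+115F U+1160 U+11A8"
```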

$type = getSyllableType($code_point)

It returns the Hangul syllable type (cf. HangulSyllableType.txt) for the specified code point as a string: "L" for leading jamo, "V" for vowel jamo, "T" for trailing jamo, "LV" for LV syllables, "LVT" for LVT syllables, and "NA" for other code points (as Not Applicable).
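A minimal sketch of such a classifier, assuming the jamo ranges listed under CAVEAT (fillers counted as L and V) rather than the ranges of the current HangulSyllableType.txt. sketch_syllable_type is a hypothetical name, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for getSyllableType, taking a code point.
sub sketch_syllable_type {
    my ($cp) = @_;
    return 'L' if ($cp >= 0x1100 && $cp <= 0x1159) || $cp == 0x115F;
    return 'V' if $cp >= 0x1160 && $cp <= 0x11A2;
    return 'T' if $cp >= 0x11A8 && $cp <= 0x11F9;
    if ($cp >= 0xAC00 && $cp <= 0xD7A3) {
        # Every 28th syllable has no final consonant, i.e. is an LV syllable.
        return ($cp - 0xAC00) % 28 == 0 ? 'LV' : 'LVT';
    }
    return 'NA';
}

print join(' ', map { sketch_syllable_type($_) }
                0x1100, 0x1161, 0x11A8, 0xAC00, 0xAC01, 0x0041), "\n";
# prints "L V T LV LVT NA"
```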

EXPORT

By default:

    decomposeHangul
    composeHangul
    getHangulName
    parseHangulName
    getHangulComposite

On request:

    decomposeSyllable
    composeSyllable
    decomposeJamo
    composeJamo
    decomposeFull
    isStandardForm
    insertFiller
    getSyllableType

CAVEAT

This module does not support Hangul jamo assigned in Unicode 5.2.0 (2009).

A list of Hangul characters this module supports:

    1100..1159 ; 1.1 # [90] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG YEORINHIEUH
    115F..11A2 ; 1.1 # [68] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG SSANGARAEA
    11A8..11F9 ; 1.1 # [82] HANGUL JONGSEONG KIYEOK..HANGUL JONGSEONG YEORINHIEUH
    AC00..D7A3 ; 2.0 # [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH

AUTHOR

SADAHIRO Tomoyuki <SADAHIRO@cpan.org>

Copyright(C) 2001, 2003, 2005, SADAHIRO Tomoyuki. Japan. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

Unicode Normalization Forms (UAX #15)

http://www.unicode.org/reports/tr15/

Conjoining Jamo Behavior (revision) in UAX #28

http://www.unicode.org/reports/tr28/#3_11_conjoining_jamo_behavior

Hangul Syllable Type

http://www.unicode.org/Public/UNIDATA/HangulSyllableType.txt

Jamo Decomposition in Old Unicode

http://www.unicode.org/Public/2.1-Update3/UnicodeData-2.1.8.txt

ISO/IEC JTC1/SC22/WG20 N954

Paper by K. KIM: New canonical decomposition and composition processes for Hangeul

http://std.dkuug.dk/JTC1/SC22/WG20/docs/N954.PDF

(summary: http://std.dkuug.dk/JTC1/SC22/WG20/docs/N953.PDF) (cf. http://std.dkuug.dk/JTC1/SC22/WG20/docs/documents.html)
