NAME

Unicode::Unihan - The Unihan Data Base 3.2.0

SYNOPSIS

  use Unicode::Unihan;
  my $db = new Unicode::Unihan;
  print join("," => $db->Mandarin("\x{5c0f}\x{98fc}\x{5f3e}")), "\n";

ABSTRACT

This module provides a user-friendly interface to the Unicode Unihan Database 3.2. With this module, the Unihan database is as easy to use as shown in the SYNOPSIS above.

DESCRIPTION

The first thing you do is make the database available. Just say

  use Unicode::Unihan;
  my $db = new Unicode::Unihan;

That's all you have to say. After that, you can access the database via $db->tag($string), where tag is the tag name in the Unihan Database without the 'k' prefix.

$data = $db->tag($string)

@data = $db->tag($string)

The first form (scalar context) returns the Unihan Database entry of the first character in $string. The second form (list context) returns the entry for each character in $string.

  @data = $db->Mandarin("\x{5c0f}\x{98fc}\x{5f3e}");
  # @data is now ('SHAO4 XIAO3','SI4','DAN4')

  @data = $db->JapaneseKun("\x{5c0f}\x{98fc}\x{5f3e}");
  # @data is now ('CHIISAI KO O','KAU YASHINAU','TAMA HAZUMU HIKU')

SEE ALSO

perluniintro
perlunicode
The Unihan Database, in Text

http://www.unicode.org/Public/3.2-Update/Unihan-3.2.0.txt.gz

AUTHOR

For the Module:

Dan Kogai <dankogai@home.dan.intra>

For the Source Data:

Unicode, Inc.

COPYRIGHT AND LICENSE

For the Module:

Copyright 2002 by Dan Kogai. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

For the Source Data:

Copyright (c) 1996-2002 Unicode, Inc. All Rights reserved.

 Name: Unihan database
 Unicode version: 3.2.0
 Table version: 1.1
 Date: 15 March 2002

TAGS

The following is auto-generated out of Unihan-3.2.0.txt. The 'k' prefix in the original source is omitted.
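As a minimal sketch of this naming rule, the following converts field names as they appear in the source file into the tag names listed below. The helper name field_to_tag is hypothetical and is not part of Unicode::Unihan.

```perl
use strict;
use warnings;

# Hypothetical helper: map a Unihan.txt field name such as "kMandarin"
# to the tag name used in this document ("Mandarin").
sub field_to_tag {
    my ($field) = @_;
    (my $tag = $field) =~ s/^k//;    # drop only a leading lowercase 'k'
    return $tag;
}

print field_to_tag($_), "\n" for qw(kMandarin kIRG_GSource kKangXi);
```

Only a leading lowercase 'k' is removed, so a tag such as KangXi that is passed in without the prefix comes back unchanged.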

AccountingNumeric

The value of the character when used in the writing of accounting numerals.

AlternateKangXi

An alternate possible position for the character in the KangXi dictionary

AlternateMorohashi

An alternate possible position for the character in the Morohashi dictionary

BigFive

The Big Five mapping for this character in hex; note that this does *not* cover any of the Big Five extensions in common use, including the ETEN extensions.

CCCII

The CCCII mapping for this character in hex

CNS1986

The CNS 11643-1986 mapping for this character in hex

CNS1992

The CNS 11643-1992 mapping for this character in hex

Cangjie*

The cangjie input code for the character. This incorporates data from the file cangjie-table.b5 by Christian Wittern

Cantonese

The Cantonese pronunciation(s) for this character

The romanization used is a modified version of the Yale romanization, modified as follows:

(1) No effort is made to distinguish between Yale's "high level" and "high falling" tones, which are not universally reflected in all Cantonese romanizations and which appear to be no longer distinctive in Hong Kong Cantonese. As a general rule, syllables which end with a stop (p, t, or k) have the "high level" tone; but there are numerous exceptions.

(2) Digits 1-6 are used to indicate the tones --

  1 == High level/high falling
  2 == High rising
  3 == Middle level
  4 == Low falling
  5 == Low rising
  6 == Low level

(3) Accordingly, the letter "H" is *not* used as a tone indicator

Cantonese pronunciations are sorted alphabetically, not in order of frequency
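The tone digits above can be decoded with a small lookup table. The following is only a sketch: the helper name split_tone and the sample entry 'SIU2 SIU3' are made up for illustration and are not taken from the database, and it assumes that multiple pronunciations are separated by spaces, as in the Mandarin example in the DESCRIPTION.

```perl
use strict;
use warnings;

# Tone digits as described in item (2) above.
my %TONE = (
    1 => 'High level/high falling',
    2 => 'High rising',
    3 => 'Middle level',
    4 => 'Low falling',
    5 => 'Low rising',
    6 => 'Low level',
);

# Hypothetical helper: split one syllable such as "SIU2" into
# (letters, tone name). Returns an empty list when the syllable
# does not end in a tone digit 1-6.
sub split_tone {
    my ($syllable) = @_;
    my ($letters, $digit) = $syllable =~ /^([A-Za-z]+)([1-6])$/ or return;
    return ($letters, $TONE{$digit});
}

# Illustrative entry only, not taken from the database.
for my $syllable (split ' ', 'SIU2 SIU3') {
    my ($letters, $tone) = split_tone($syllable);
    print "$letters: $tone\n";
}
```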

CihaiT*

The position of this character in the Cihai (\x{8fad}\x{6d77}) dictionary, single volume edition, published in Hong Kong by the Zhonghua Bookstore, 1983 (reprint of the 1947 edition), ISBN 962-231-005-2.

The position is indicated by a decimal number. The digits to the left of the decimal are the page number. The first digit after the decimal is the row on the page, and the remaining two digits after the decimal are the position on the row.
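A minimal sketch of how such a position could be split into its parts follows. The helper name parse_cihai and the sample value '1311.302' are hypothetical and serve only to illustrate the layout described above.

```perl
use strict;
use warnings;

# Hypothetical helper: split "page.RPP" into page, row (one digit)
# and position on the row (two digits).
sub parse_cihai {
    my ($pos) = @_;
    my ($page, $row, $col) = $pos =~ /^(\d+)\.(\d)(\d\d)$/ or return;
    return { page => $page + 0, row => $row + 0, position => $col + 0 };
}

# Illustrative value only, not taken from the database.
my $cihai = parse_cihai('1311.302');
printf "page %d, row %d, position %d\n", @{$cihai}{qw(page row position)};
```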

CompatibilityVariant*

The compatibility decomposition for this ideograph, derived from the UnicodeData.txt file.

Cowles*

The index of this character in Roy T. Cowles, _A Pocket Dictionary of Cantonese_, Hong Kong: University Press, 1999.

DaeJaweon

The position of this character in the Dae Jaweon (Korean) dictionary used in the four-dictionary sorting algorithm. The position is in the form "page.position" with the final digit in the position being "0" for characters actually in the dictionary and "1" for characters not found in the dictionary and assigned a "virtual" position in the dictionary.

Thus, "1187.060" indicates the sixth character on page 1187. A character not in this dictionary but assigned a position between the 6th and 7th characters on page 1187 for sorting purposes would have the code "1187.061"

The edition used is the first edition, published in Seoul by Samseong Publishing Co., Ltd., 1988.
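The "page.position" form can be taken apart with a single pattern. This is a sketch under the assumptions stated in the text above (two digits for the character's position on the page, one trailing digit for the virtual flag); parse_page_position is a hypothetical helper name, not a method of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: split "page.PPV" into the page number, the
# position of the character on the page, and a flag that is true
# when the final digit marks a "virtual" position.
sub parse_page_position {
    my ($code) = @_;
    my ($page, $index, $flag) = $code =~ /^(\d+)\.(\d\d)(\d)$/ or return;
    return {
        page    => $page + 0,
        index   => $index + 0,
        virtual => $flag ne '0' ? 1 : 0,
    };
}

for my $code ('1187.060', '1187.061') {
    my $p = parse_page_position($code);
    printf "%s: page %d, character %d%s\n",
        $code, $p->{page}, $p->{index}, $p->{virtual} ? ' (virtual)' : '';
}
```

The same layout is described for the KangXi, IRGDaeJaweon and IRGKangXi tags, so the sketch should apply to those values as well.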

Definition

An English definition for this character

EACC

The EACC mapping for this character in hex

Fenn*

Data on the character from _Fenn's Chinese-English Pocket Dictionary_ by Courtenay H. Fenn, Cambridge, Mass.: Harvard University Press, 1942. The data here consists of a decimal number followed by a letter A through K. The decimal number gives the Soothill number for the character's phonetic, and the letter is a rough frequency indication, with A indicating the 500 most common ideographs, B the next five hundred, and so on.
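As a rough sketch, the frequency letter can be turned into an approximate rank range by treating each letter as a block of 500 ideographs. The helper name parse_fenn and the sample value '123B' are hypothetical, the sketch assumes the number is a plain integer, and extending the 500-per-letter rule all the way to K is an extrapolation of "and so on" rather than something the source states.

```perl
use strict;
use warnings;

# Hypothetical helper: split a value such as "123B" into the Soothill
# number and the frequency letter, and estimate the rank range that
# the letter stands for (A = 1-500, B = 501-1000, ...).
sub parse_fenn {
    my ($value) = @_;
    my ($soothill, $band) = $value =~ /^(\d+)([A-K])$/ or return;
    my $n = ord($band) - ord('A');
    return {
        soothill  => $soothill + 0,
        band      => $band,
        rank_from => $n * 500 + 1,
        rank_to   => ($n + 1) * 500,
    };
}

# Illustrative value only, not taken from the database.
my $fenn = parse_fenn('123B');
printf "Soothill %d, band %s (ranks %d-%d)\n",
    @{$fenn}{qw(soothill band rank_from rank_to)};
```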

Frequency

A rough frequency measurement for the character based on analysis of Chinese USENET postings

GB0

The GB 2312-80 mapping for this character in ku/ten form

GB1

The GB 12345-90 mapping for this character in ku/ten form

GB3

The GB 7589-87 mapping for this character in ku/ten form

GB5

The GB 7590-87 mapping for this character in ku/ten form

GB7

The "General Use Characters for Modern Chinese" mapping for this character

GB8

The GB 8565-89 mapping for this character in ku/ten form

GradeLevel*

The grade in the Hong Kong school system by which a student is expected to know the character.

HanYu

The position of this character in the Hanyu Da Zidian (HDZ) Chinese character dictionary (bibliographic information below).

The character references are given in the form "ABCDE.XYZ", in which: "A" is the volume number [1..8]; "BCDE" is the zero-padded page number [0001..4809]; "XY" is the zero-padded number of the character on the page [01..32]; "Z" is "0" for a character actually in the dictionary, and greater than 0 for a character assigned a "virtual" position in the dictionary. For example, 53044.060 indicates an actual HDZ character, the 6th character on Page 3,044 of Volume 5 (i.e. [U+269a4]). Note that the Volume 8 "BCDE" references are in the range [0008..0044] inclusive, referring to the pagination of the "Appendix of Addendum" at the end of that volume (beginning after p. 5746).
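The "ABCDE.XYZ" layout can be decoded as sketched below, using the example reference from the paragraph above. The helper name parse_hanyu is hypothetical and is not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: split "ABCDE.XYZ" into volume (A), page (BCDE),
# character number on the page (XY) and the virtual flag (Z > 0).
sub parse_hanyu {
    my ($ref) = @_;
    my ($vol, $page, $index, $flag) =
        $ref =~ /^([1-8])(\d{4})\.(\d\d)(\d)$/ or return;
    return {
        volume  => $vol + 0,
        page    => $page + 0,
        index   => $index + 0,
        virtual => $flag > 0 ? 1 : 0,
    };
}

my $hdz = parse_hanyu('53044.060');
printf "volume %d, page %d, character %d, %s\n",
    $hdz->{volume}, $hdz->{page}, $hdz->{index},
    $hdz->{virtual} ? 'virtual position' : 'actual entry';
```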

Release information:

This data set contains a total of 56097 records, 54728 of which are actual HDZ character references (positions are given for all HDZ head entries, including source-internal unifications), and 1369 of which are virtual character positions (see note below). All HDZ references in this data set are unique. Because of IRG source-internal unifications, a given UCS-4 Scalar Value (USV) may have more than one HDZ reference. Source-internal unifications are of two types: (1) unifications of graphical variants; (2) unifications of duplicate head entries.

The proofing of all references was done primarily on the basis of cross-checks of three versions of the reference data: (1) the original print source; (2) the "kIRGHanyuDaZidian" field of Unihan.txt (release 3.1.1d1); (3) "HDZ.txt", originally produced and proofed for Academia Sinica's Institute of Information Technology (Document Processing Laboratory). In addition, the data was checked against the "kHanYu" and "kAlternateHanYu" fields of Unihan.txt (release 3.1.1d1), which the present data set supersedes.

String value, string length, compound key, field count, and page total validations were all performed. Altogether, 578 omissions/errors in source (2) were identified and corrected. Any remaining errors will likely relate to virtual positions, or to the ordering of actual characters within a given page. It is unlikely that errors across page breaks remain. Possible future deunifications of source-internal unifications will necessitate updating the USV for some references. Under no circumstances should the source-internal unification (duplicate USV) mappings be removed from this data set.

Note: Source (3) contributed only actual HDZ character references to the proofing process, while source (2) contributed all virtual positions. It seems that the compilers of source (2) usually assigned virtual positions based on stroke count, though occasionally the virtual position brings the virtual character together with the actual HDZ character of which it is a variant, without regard to actual stroke count.

Bibliographic information for the print source:

<<Hanyu Da Zidian>> ['Great Chinese Character Dictionary' (in 8 Volumes)]. XU Zhongshu (Editor in Chief). Wuhan, Hubei Province (PRC): Hubei and Sichuan Dictionary Publishing Collectives, 1986-1990. ISBN: 7-5403-0030-2/H.16. \x{300a}\x{6f22}\x{8a9e}\x{5927}\x{5b57}\x{5178}\x{300b}\x{3002} \x{8a31}\x{529b}\x{4ee5}\x{4e3b}\x{4efb}\x{ff0c}\x{5f90}\x{4e2d} \x{8212}\x{4e3b}\x{7de8}\x{ff0c} \x{ff08}\x{6f22}\x{8a9e}\x{5927}\x{5b57}\x{5178}\x{5de5}\x{4f5c} \x{59d4}\x{54e1}\x{6703}\x{ff09}\x{3002}\x{6b66}\x{6f22}\x{ff1a} \x{56db}\x{5ddd}\x{8fad}\x{66f8} \x{51fa}\x{7248}\x{793e}\x{ff0c}\x{6e56}\x{5317}\x{8fad}\x{66f8} \x{51fa}\x{7248}\x{793e},1986-1990. ISBN: 7-5403-0030-2/H.16.

HKGlyph*

The index of the character in \x{5e38}\x{7528}\x{5b57}\x{5b57}\x{5f62}\x{8868} (\x{4e8c}\x{96f6}\x{96f6}\x{96f6}\x{5e74}\x{4fee}\x{8a02}\x{672c}), \x{9999}\x{6e2f}: \x{9999}\x{6e2f}\x{6559}\x{80b2}\x{5b78}\x{9662}, 2000, ISBN 962-949-040-4.

This publication gives the "proper" shapes for characters as used in the Hong Kong school system.

HKSCS

Mappings to the Big Five extended code points used for the Hong Kong Supplementary Character Set

IBMJapan

The IBM Japanese mapping for this character in hex

IRG_GSource

The IRG "G" source mapping for this character in hex. The IRG "G" source consists of data from the following national standards, publications, and lists from the People's Republic of China and Singapore. The versions of the standards used are those provided by the PRC to the IRG and may not always reflect published versions of the standards generally available.

  4K    Siku Quanshu
  BK    Chinese Encyclopedia
  CH    The Ci Hai (PRC edition)
  CY    The Ci Yuan
  FZ and FZ_BK  Founder Press System
  G0    GB2312-80
  G1    GB12345-90 with 58 Hong Kong and 92 Korean "Idu" characters
  G3    GB7589-87 unsimplified forms
  G5    GB7590-87 unsimplified forms
  G7    General Purpose Hanzi List for Modern Chinese Language, and
        General List of Simplified Hanzi
  GS    Singapore characters
  G8    GB8685-88
  GE    GB16500-95
  HC    The Hanyu Da Cidian
  HZ    The Hanyu Da Zidian
  KX    The KangXi dictionary

IRG_HSource

The IRG "H" source mapping for this character in hex. The IRG "H" source consists of data from the Hong Kong Supplementary Character Set.

IRG_JSource

The IRG "J" source mapping for this character in hex. The IRG "J" source consists of data from the following national standards and lists from Japan.

  J0    JIS X 0208-1990
  J1    JIS X 0212-1990
  J3    JIS X 0213-2000
  J4    JIS X 0213-2000
  JA    Unified Japanese IT Vendors Contemporary Ideographs, 1993

IRG_KSource

The IRG "K" source mapping for this character in hex. The IRG "K" source consists of data from the following national standards and lists from the Republic of Korea (South Korea).

  K0    KS C 5601-1987
  K1    KS C 5657-1991
  K2    PKS C 5700-1 1994
  K3    PKS C 5700-2 1994
  K4    PKS 5700-3:1998

IRG_KPSource

The IRG "KP" source mapping for this character in hex. The IRG "KP" source consists of data from the following national standards and lists from the Democratic People's Republic of Korea (North Korea).

  KP0   KPS 9566-97
  KP1   KPS 10721-2000

IRG_TSource

The IRG "T" source mapping for this character in hex. The IRG "T" source consists of data from the following national standards and lists from the Republic of China (Taiwan).

  T1    CNS 11643-1992, plane 1
  T2    CNS 11643-1992, plane 2
  T3    CNS 11643-1992, plane 3 (with some additional characters)
  T4    CNS 11643-1992, plane 4
  T5    CNS 11643-1992, plane 5
  T6    CNS 11643-1992, plane 6
  T7    CNS 11643-1992, plane 7
  TF    CNS 11643-1992, plane 15

IRG_VSource

The IRG "V" source mapping for this character in hex. The IRG "V" source consists of data from the following national standards and lists from Vietnam.

  V0    TCVN 5773:1993
  V1    VHN 01:1998
  V2    VHN 02:1998
  V3    TCVN 6056:1995

IRGDaeJaweon

The position of this character in the Dae Jaweon (Korean) dictionary used in the four-dictionary sorting algorithm. The position is in the form "page.position" with the final digit in the position being "0" for characters actually in the dictionary and "1" for characters not found in the dictionary and assigned a "virtual" position in the dictionary.

Thus, "1187.060" indicates the sixth character on page 1187. A character not in this dictionary but assigned a position between the 6th and 7th characters on page 1187 for sorting purposes would have the code "1187.061"

This field represents the official position of the character within the Dae Jaweon dictionary as used by the IRG in the four-dictionary sorting algorithm.

The edition used is the first edition, published in Seoul by Samseong Publishing Co., Ltd., 1988.

IRGDaiKanwaZiten

The index of this character in the Dai Kanwa Ziten, aka Morohashi dictionary (Japanese) used in the four-dictionary sorting algorithm.

This field represents the official position of the character within the DaiKanwa dictionary as used by the IRG in the four-dictionary sorting algorithm.

The edition used is the revised edition, published in Tokyo by Taishuukan Shoten, 1986.

IRGHanyuDaZidian

The position of this character in the Hanyu Da Zidian (PRC) dictionary used in the four-dictionary sorting algorithm. The position is in the form "volume page.position" with the final digit in the position being "0" for characters actually in the dictionary and "1" for characters not found in the dictionary and assigned a "virtual" position in the dictionary.

Thus, "32264.080" indicates the eighth character on page 2264 in volume 3. A character not in this dictionary but assigned a position between the 8th and 9th characters on this page for sorting purposes would have the code "32264.081"

This field represents the official position of the character within the Hanyu Da Zidian dictionary as used by the IRG in the four-dictionary sorting algorithm.

The edition of the Hanyu Da Zidian used is the first edition, published in Chengdu by Sichuan Cishu Publishing, 1986.

IRGKangXi

The position of this character in the KangXi dictionary used in the four-dictionary sorting algorithm. The position is in the form "page.position" with the final digit in the position being "0" for characters actually in the dictionary and "1" for characters not found in the dictionary and assigned a "virtual" position in the dictionary.

Thus, "1187.060" indicates the sixth character on page 1187. A character not in this dictionary but assigned a position between the 6th and 7th characters on page 1187 for sorting purposes would have the code "1187.061"

This field represents the official position of the character within the KangXi dictionary as used by the IRG in the four-dictionary sorting algorithm.

The edition of the KangXi dictionary used is the 7th edition published by Zhonghua Bookstore in Beijing, 1989.

JapaneseKun

The Japanese pronunciation(s) of this character

JapaneseOn

The Sino-Japanese pronunciation(s) of this character

JIS0213

The JIS X 0213-2000 mapping for this character in men,ku,ten form

Jis0

The JIS X 0208-1990 mapping for this character in ku/ten form

Jis1

The JIS X 0212-1990 mapping for this character in ku/ten form

KPS0

The KPS 9566-97 mapping for this character in hexadecimal form.

KPS1

The KPS 10721-2000 mapping for this character in hexadecimal form.

KSC0

The KS X 1001:1992 (KS C 5601-1989) mapping for this character in ku/ten form

KSC1

The KS X 1002:1991 (KS C 5657-1991) mapping for this character in ku/ten form

KangXi

The position of this character in the KangXi dictionary used in the four-dictionary sorting algorithm. The position is in the form "page.position" with the final digit in the position being "0" for characters actually in the dictionary and "1" for characters not found in the dictionary and assigned a "virtual" position in the dictionary.

Thus, "1187.060" indicates the sixth character on page 1187. A character not in this dictionary but assigned a position between the 6th and 7th characters on page 1187 for sorting purposes would have the code "1187.061"

The edition of the KangXi dictionary used is the 7th edition published by Zhonghua Bookstore in Beijing, 1989.

Karlgren*

The index of this character in _Analytic Dictionary of Chinese and Sino-Japanese_ by Bernhard Karlgren, New York: Dover Publications, Inc., 1974.

If the index is followed by an asterisk (*), then the index is an interpolated one, indicating where the character would be found if it were to have been included in the dictionary.

Korean

The Korean pronunciation(s) of this character

Lau*

The index of this character in _A Practical Cantonese-English Dictionary_ by Sidney Lau, Hong Kong: The Government Printer, 1977.

MainlandTelegraph

The PRC telegraph code for this character, derived from "Kanzi denpou koudo henkan-hyou" ("Chinese character telegraph code conversion table"), Lin Jinyi, KDD Engineering and Consulting, Tokyo, 1984

Mandarin

The Mandarin pronunciation(s) for this character in pinyin; Mandarin pronunciations are sorted alphabetically, not in order of frequency

Matthews

The index of this character in _Mathews' Chinese-English Dictionary_ by Robert H. Mathews, Cambridge: Harvard University Press, 1975. Note that the field name is kMatthews instead of kMathews to maintain compatibility with earlier versions of this file, where it was inadvertently misspelled.

MeyerWempe*

The index of this character in the Student's Cantonese-English Dictionary by Bernard F. Meyer and Theodore F. Wempe (3rd edition, 1947)

Morohashi

The index of this character in the Dai Kanwa Ziten, aka Morohashi dictionary (Japanese) used in the four-dictionary sorting algorithm.

The edition used is the revised edition, published in Tokyo by Taishuukan Shoten, 1986.

Nelson

The index of this character in _The Modern Reader's Japanese-English Character Dictionary_ by Andrew Nathaniel Nelson, Rutland, Vermont: Charles E. Tuttle Company, 1974.

OtherNumeric

The numeric value for the character in certain unusual, specialized contexts.

Phonetic*

The phonetic index for the character from _Ten Thousand Characters: An Analytic Dictionary_ by G. Hugh Casey, S.J. Hong Kong: Kelley and Walsh, 1980.

PrimaryNumeric

The value of the character when used in the writing of numbers in the standard fashion.

PseudoGB1

A "GB 12345-90" code point assigned this character for the purposes of including it within Unihan. Pseudo-GB1 codes were used to provide official code points for characters not already in national standards, such as characters used to write Cantonese, and so on.

RSJapanese

A Japanese radical/stroke count for this character in the form "radical.additional strokes". A ' after the radical indicates the simplified version of the given radical

RSKanWa

A Morohashi radical/stroke count for this character in the form "radical.additional strokes". A ' after the radical indicates the simplified version of the given radical

RSKangXi

A KangXi radical/stroke count for this character in the form "radical.additional strokes". A ' after the radical indicates the simplified version of the given radical

RSKorean

A Korean radical/stroke count for this character in the form "radical.additional strokes". A ' after the radical indicates the simplified version of the given radical

RSUnicode

A standard radical/stroke count for this character in the form "radical.additional strokes". A ' after the radical indicates the simplified version of the given radical
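The "radical.additional strokes" form shared by the RS tags above can be parsed as in the following sketch. The helper name parse_rs and both sample values are hypothetical; they only illustrate the layout, including the ' marker for a simplified radical.

```perl
use strict;
use warnings;

# Hypothetical helper: split "radical.additional strokes" into its
# parts. A ' directly after the radical number marks the simplified
# form of that radical.
sub parse_rs {
    my ($rs) = @_;
    my ($radical, $mark, $strokes) = $rs =~ /^(\d+)('?)\.(\d+)$/ or return;
    return {
        radical            => $radical + 0,
        simplified         => $mark eq "'" ? 1 : 0,
        additional_strokes => $strokes + 0,
    };
}

# Illustrative values only, not taken from the database.
for my $rs ("9.5", "120'.3") {
    my $p = parse_rs($rs);
    printf "%s: radical %d%s, %d additional strokes\n",
        $rs, $p->{radical}, $p->{simplified} ? ' (simplified)' : '',
        $p->{additional_strokes};
}
```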

SemanticVariant

The Unicode value for a semantic variant for this character. A semantic variant is an x- or y-variant with similar or identical meaning which can generally be used in place of the indicated character.

SBGY

The position of this character in the Song Ben Guang Yun (SBGY) Medieval Chinese character dictionary (bibliographic and general information below).

The 25330 character references are given in the form "ABC.XY", in which: "ABC" is the zero-padded page number [004..546]; "XY" is the zero-padded number of the character on the page [01..73]. For example, 364.38 indicates the 38th character on Page 364 (i.e. \x{6f8d}). Where a given Unicode Scalar Value (USV) has more than one reference, these are space-delimited.
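Because one character may carry several space-delimited references, a parser for this tag naturally returns a list. The sketch below uses the reference 364.38 from the text; the second reference in the sample entry and the helper name parse_sbgy are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a space-delimited entry such as
# "364.38 012.05" into a list of { page, index } records.
# Tokens that do not match "ABC.XY" are skipped.
sub parse_sbgy {
    my ($entry) = @_;
    return map {
        /^(\d{3})\.(\d\d)$/ ? +{ page => $1 + 0, index => $2 + 0 } : ()
    } split ' ', $entry;
}

# The second reference is illustrative only.
for my $ref (parse_sbgy('364.38 012.05')) {
    printf "character %d on page %d\n", $ref->{index}, $ref->{page};
}
```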

Release information (20020310):

This data set contains a total of 25330 references, for 19511 different hanzi. The original data was input under the direction of Prof. LUO Fengzhu at Taiwan Taoyuanxian Yuan Zhi University (see below) using an early version of the Big5-based CDP encoding scheme developed at Academia Sinica. During 2000-2002 this raw data was processed and revised by Richard Cook as follows: the data was converted to Unicode encoding using his revised kHanYu mapping tables (first provided to the Unicode Consortium for the Unihan.txt release 3.1.1d1) and also using several other mapping tables developed specifically for this project; the kSBGY indices were generated based on hand-counts of all page totals; numerous indexing errors were corrected; and the data underwent final proofing.

About the print sources: The SBGY text, which dates to the beginning of the Song Dynasty (c. 1008, edited by \x{9673}\x{5f6d}\x{5e74} CHEN Pengnian et al.) is an enlargement of an earlier text known as Qie Yun (dated to c. 601, edited by \x{9678}\x{6cd5}\x{8a00} LU Fayan). With 25,330 head entries, this large early lexicon is important in part for the information which it provides for historical Chinese phonology. The GY dictionary employs a Chinese transcription method (known as \x{53cd}\x{5207}) to give pronunciations for each of its head entries. In addition, each syllable is also given a brief gloss. It must be emphasized that the mapping of a particular SBGY glyph to a single USV may in some cases be merely an approximation or may have required the choice of a "best possible glyph" (out of those available in the Unicode repertoire). This indexing data in conjunction with the print sources will be useful for evaluating the degree of distinctive variation in the character forms appearing in this text, and future proofing of this data may reveal additional Chinese glyphs for IRG encoding.

Bibliographic information on the print sources: \x{300a}\x{5b8b}\x{672c}\x{5ee3}\x{97fb}\x{300b} <<Song Ben Guang Yun>> ['Song Dynasty edition of the Guang Yun Rhyming Dictionary'], edited by \x{9673}\x{5f6d}\x{5e74} CHEN Pengnian et al. (c. 1008).

Two modern editions of this work were consulted in building the kSBGY indices:

\x{300a}\x{65b0}\x{6821}\x{6b63}\x{5207}\x{5b8b}\x{672c}\x{5ee3} \x{97fb}\x{300b}\x{3002}\x{53f0}\x{7063}\x{9ece}\x{660e}\x{6587} \x{5316}\x{4e8b}\x{696d}\x{516c}\x{53f8}\x{51fa}\x{7248}\x{ff0c} \x{6797}\x{5c39}\x{6821}\x{8a02}1976\x{5e74}\x{51fa}\x{7248}\x{3002} [This was the edition used by Prof. LUO \x{53f0}\x{7063}\x{6843}\x{5712}\x{7e23}\x{5143}\x{667a}\x{5927} \x{5b78}\x{4e2d}\x{8a9e}\x{7cfb}\x{7f85}\x{9cf3}\x{73e0}, and in the subsequent revision, conversion, indexing and proofing.]

\x{300a}\x{65b0}\x{6821}\x{4e92}\x{8a3b}\x{2027}\x{5b8b}\x{672c} \x{5ee3}\x{97fb}\x{300b}\x{3002}\x{9999}\x{6e2f}\x{4e2d}\x{6587} \x{5927}\x{5b78},\x{4f59}\x{8ffa}\x{6c38}1993,2000\x{5e74}\x{51fa} \x{7248}\x{3002} ISBN: 962-201-413-5; 7-5326-0685-6. [Textual problems were resolved on the basis of this extensively annotated modern edition of the text.]

Further Information: For further information on this index data and the databases from which it is excerpted, or to report errata, please contact Richard S. Cook <rscook@socrates.berkeley.edu>.

SimplifiedVariant

The Unicode value for the simplified Chinese variant for this character (if any).

Note that a character can be *both* a traditional Chinese character in its own right *and* the simplified variant for other characters (e.g., U+53F0).

In such case, the character is listed as its own simplified variant and one of its own traditional variants. This distinguishes this from the case where the character is not the simplified form for any character (e.g., U+4E95).

Much of the data on simplified and traditional variants was supplied by Wenlin <http://www.wenlin.com>
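To go from a variant value back to a character, the code point can be converted with hex and chr. The sketch below assumes, without confirmation from this document, that values are written in "U+XXXX" notation and that multiple values are separated by spaces; variant_chars is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: convert an entry such as "U+53F0" (or several
# space-separated values) into a list of characters. Tokens that are
# not in U+XXXX notation are skipped.
sub variant_chars {
    my ($entry) = @_;
    return map {
        /^U\+([0-9A-Fa-f]{4,6})$/ ? chr(hex($1)) : ()
    } split ' ', $entry;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $char (variant_chars('U+53F0')) {
    printf "U+%04X %s\n", ord($char), $char;
}
```

Each returned character can then be passed to any tag method of this module, for example to look up the readings of a variant.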

SpecializedSemanticVariant

The Unicode value for a specialized semantic variant for this character.

A specialized semantic variant is an x- or y-variant with similar or identical meaning only in certain contexts (such as accountants' numerals).

TaiwanTelegraph

The Taiwanese telegraph code for this character, derived from "Kanzi denpou koudo henkan-hyou" ("Chinese character telegraph code conversion table"), Lin Jinyi, KDD Engineering and Consulting, Tokyo, 1984

Tang*

The Tang dynasty pronunciation(s) of this character, derived from _T'ang Poetic Vocabulary_ by Hugh M. Stimson, Far Eastern Publications, Yale Univ. 1976. Stimson's romanization has been modified as follows:

The tones are indicated using numerals 1 through 4. Stimson leaves the level (tone 1) and entering (tone 4) tones unmarked (the latter being found in syllables ending in a stop, -p, -t, or -k), uses a hacek accent for the rising tone (tone 2), and a grave accent for the departing tone (tone 3).

Stimson's script a (\x{0251}, U+0251) is replaced with a-umlaut (\x{00e4}, U+00E4)

Stimson's open e (\x{025b}, U+025B) is replaced with e-umlaut (\x{00eb}, U+00EB)

Stimson's schwa (\x{0259}, U+0259) is replaced with e-circumflex (\x{00ea}, U+00EA)

TotalStrokes

The total number of strokes in the character (including the radical)

TraditionalVariant

The Unicode value(s) for the traditional Chinese variant(s) for this character.

Note that a character can be *both* a traditional Chinese character in its own right *and* the simplified variant for other characters (e.g., U+53F0).

In such case, the character is listed as its own simplified variant and one of its own traditional variants. This distinguishes this from the case where the character is not the simplified form for any character (e.g., U+4E95).

Much of the data on simplified and traditional variants was supplied by Wenlin Institute, Inc. <http://www.wenlin.com>

Vietnamese

The character's pronunciation(s) in Qu\x{1ed1}c ng\x{1eef}

Xerox

The Xerox code for this character

ZVariant

The Unicode value(s) for known z-variants of this character

ACCURACY OF THE DATA:

Not all of these fields have been checked and proofed as carefully as some others have been. Please report errata, corrections, and additions at <http://www.unicode.org/unicode/reporting.html>.

The following fields may be taken as completely accurate and their values are *normative* parts of Unicode and ISO/IEC 10646-1 and -2:

kIRG_GSource, kIRG_TSource, kIRG_JSource, kIRG_KSource, kIRG_KPSource, kIRG_VSource

The IRG dictionary fields have also been extensively proofed by IRG experts and may be taken as accurate.

The following fields have been extensively proofed by experts world-wide and may be taken as accurate:

kBigFive, kCNS1986, kGB0, kGB1, kGB3, kGB5, kGB7, kGB8, kJis0, kJis1, kJIS0213, kKSC0, kKSC1, kPseudoGB1, kCCCII, kCNS1992, kDaeJaweon, kHanYu, kIBMJapan, kKangXi, kMatthews, kMorohashi, kNelson, kXerox

The remaining fields have not been as extensively proofed and their values should be taken as provisional. Some of these fields are still in the process of being populated; more data will be available in future releases of this file. Such fields are marked in this header with an asterisk (*).
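A program that wants to treat provisional data with more caution could encode the three groups above as lookup tables. This is only a sketch: the helper name accuracy and the labels 'normative', 'proofed' and 'provisional' are invented here, and reading "the IRG dictionary fields" as the four IRG dictionary tags listed under TAGS is an assumption.

```perl
use strict;
use warnings;

# Fields named above as normative.
my %NORMATIVE = map { $_ => 1 } qw(
    IRG_GSource IRG_TSource IRG_JSource IRG_KSource IRG_KPSource IRG_VSource
);

# Fields named above as extensively proofed, plus the four IRG
# dictionary fields (assumed reading of "the IRG dictionary fields").
my %PROOFED = map { $_ => 1 } qw(
    IRGDaeJaweon IRGDaiKanwaZiten IRGHanyuDaZidian IRGKangXi
    BigFive CNS1986 GB0 GB1 GB3 GB5 GB7 GB8 Jis0 Jis1 JIS0213
    KSC0 KSC1 PseudoGB1 CCCII CNS1992 DaeJaweon HanYu IBMJapan
    KangXi Matthews Morohashi Nelson Xerox
);

# Hypothetical helper: classify a tag, with or without the 'k' prefix.
# Anything not listed above is reported as provisional.
sub accuracy {
    my ($name) = @_;
    (my $tag = $name) =~ s/^k//;
    return 'normative' if $NORMATIVE{$tag};
    return 'proofed'   if $PROOFED{$tag};
    return 'provisional';
}

printf "%-14s %s\n", $_, accuracy($_) for qw(kIRG_GSource KangXi Mandarin);
```

Note that IRG_HSource does not appear in the normative list above, so this sketch reports it as provisional.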

KNOWN ERRORS:

U+6B06 should map to the kIRG_KSource 2-3D7B, not 7-3D7B. This error is in a normative part of the standard; the relevant standards bodies are aware of it, but we cannot fix it in this file until the fix is officially adopted

U+2F958 should map to the kIRG_TSource 6-4267, not 6-4627. This error is in a normative part of the standard; the relevant standards bodies are aware of it, but we cannot fix it in this file until the fix is officially adopted

The Japanese and Korean readings need to be normalized. The Mandarin vowel \x{00dc} is not consistently represented as pinyin requires