Lingua::Han::Utils - Utility tools for Chinese characters (HanZi)
    use Lingua::Han::Utils qw/Unihan_value csplit cdecode csubstr clength/;

    # cdecode
    # the same as decode('cp936', $word) in ASCII editing mode
    # and decode('utf8', $word) in Unicode editing mode
    my $word = cdecode($word);

    # Unihan_value
    # return the first field of Unihan.txt on unicode.org
    my $word = "我";
    my $unihan = Unihan_value($word); # return '6211'
    my $words = "爱你";
    my @unihan = Unihan_value($words); # return (7231, 4F60)
    my $unihan = Unihan_value($words); # return 72314F60

    # csplit
    # split the Chinese characters into an array
    my $words = "我爱你";
    my @words = csplit($words); # return ("我", "爱", "你")

    # csubstr
    # treat each Chinese character as one unit,
    # so it's the same as splice(csplit($words), $offset, $length)
    my $words = "我爱你啊";
    my @words = csubstr($words, 1, 2); # return ("爱", "你")
    my @words = csubstr($words, 1); # return ("爱", "你", "啊")
    my $words = csubstr($words, 1, 2); # 爱你

    # clength
    # treat each Chinese character as one unit
    my $words = "我爱你";
    print clength($words); # 3
Nothing is exported by default.
cdecode uses Encode::Guess to decode the string. It behaves like decode('cp936', $word) under ASCII editing mode and decode('utf8', $word) under Unicode editing mode.
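The guess-then-decode idea can be sketched with the core Encode and Encode::Guess modules. This is only an illustration under stated assumptions: guess_decode is a hypothetical helper, not the module's cdecode, and the fallback order (strict UTF-8 first, then cp936) is an assumption made for this example.

```perl
use strict;
use warnings;
use utf8;                        # the literals below are UTF-8 source text
use Encode qw(decode encode);
use Encode::Guess;               # exports guess_encoding

# Hypothetical helper for illustration; not the module's cdecode().
sub guess_decode {
    my ($bytes) = @_;

    # Add cp936 to the default suspects (ascii, utf8, BOM-marked UTF-16/32).
    my $enc = guess_encoding($bytes, 'cp936');
    return $enc->decode($bytes) if ref $enc;    # unambiguous guess

    # Guess failed or was ambiguous: try strict UTF-8, then fall back to cp936.
    my $copy = $bytes;                          # decode() with a CHECK flag may modify its input
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return defined $text ? $text : decode('cp936', $bytes);
}

# The same character arriving as cp936 bytes and as UTF-8 bytes.
my $from_gbk  = guess_decode(encode('cp936', "我"));
my $from_utf8 = guess_decode(encode('UTF-8', "我"));
```

Both calls should produce the same one-character string, which is the property the two "editing modes" above rely on.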
Unihan_value: the first field of Unihan.txt is the Unicode scalar value, written as U+[x]xxxx. This function returns the [x]xxxx part.
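For a decoded string, the hexadecimal value after "U+" is simply the character's code point, so the documented results can be reproduced with core Perl. In the sketch below, codepoints_hex is a hypothetical helper, not the module's implementation; it uses wantarray to mimic the list and scalar behaviour described in the synopsis.

```perl
use strict;
use warnings;
use utf8;    # the literals below are UTF-8 source text

# Hypothetical helper for illustration; not part of Lingua::Han::Utils.
sub codepoints_hex {
    my ($text) = @_;
    # One uppercase hex code point per character.
    my @hex = map { sprintf '%X', ord($_) } split //, $text;
    # A list in list context, one concatenated string in scalar context.
    return wantarray ? @hex : join '', @hex;
}

my $single = codepoints_hex("我");      # "6211"
my @list   = codepoints_hex("爱你");    # ("7231", "4F60")
my $joined = codepoints_hex("爱你");    # "72314F60"
```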
csplit splits the Chinese characters into an array. English words can be mixed in.
csubstr treats each Chinese character as one unit and takes a substring by character.
(BE CAREFUL! It is NOT an lvalue, so we can't write csubstr($word, 2, 3) = $REPLACEMENT.)
If no LENGTH is specified, it returns everything from OFFSET to the END.
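Because csubstr cannot be assigned to, replacing a range of characters has to be done another way. One possible approach, sketched here with core Perl on an already decoded string, is to split into characters, splice in the replacement, and join again. replace_chars is a hypothetical helper and is not provided by the module.

```perl
use strict;
use warnings;
use utf8;    # the literals below are UTF-8 source text

# Hypothetical helper for illustration; not part of Lingua::Han::Utils.
sub replace_chars {
    my ($text, $offset, $length, $replacement) = @_;
    my @chars = split //, $text;                                # one element per character
    splice @chars, $offset, $length, split(//, $replacement);   # swap the range in place
    return join '', @chars;
}

# Replace the two characters starting at offset 1.
my $result = replace_chars("我爱你啊", 1, 2, "你爱");    # "我你爱啊"
```

On a decoded string, the four-argument form substr($text, $offset, $length, $replacement) achieves the same result.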
clength treats each Chinese character as one unit (length 1).
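Why a character-aware length matters can be seen by comparing a decoded string with its encoded bytes. The sketch below uses only core Perl and is an illustration of the byte versus character distinction, not the module's code.

```perl
use strict;
use warnings;
use utf8;                 # the literals below are UTF-8 source text
use Encode qw(encode);

my $text  = "我爱你啊";                # decoded character string
my $bytes = encode('UTF-8', $text);    # the same text as raw UTF-8 bytes

my $char_count = length $text;         # 4: one per character
my $byte_count = length $bytes;        # 12: three bytes per character in UTF-8

my $mid  = substr($text, 1, 2);        # "爱你"
my $tail = substr($text, 1);           # "爱你啊" (no LENGTH: to the end)
```

Applying length or substr to the undecoded bytes would count and cut in the middle of characters, which is the problem the character-based functions avoid.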
A Chinese version of this document can be found at http://www.fayland.org/journal/Lingua-Han-Utils.html
Fayland Lam, <fayland at gmail.com>
Please report any bugs or feature requests to bug-lingua-han-utils at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Lingua-Han-Utils. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Lingua::Han::Utils
You can also look for information at:
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/Lingua-Han-Utils
CPAN Ratings
http://cpanratings.perl.org/d/Lingua-Han-Utils
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Lingua-Han-Utils
Search CPAN
http://search.cpan.org/dist/Lingua-Han-Utils
the wonderful Encode::Guess
Copyright 2005 Fayland Lam, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.