Unicode::Util - Unicode-aware versions of built-in Perl functions
This document describes Unicode::Util version 0.06.
    use Unicode::Util qw( graph_length code_length byte_length );

    # grapheme cluster ю́: Cyrillic small letter yu + combining acute accent
    my $grapheme = "\x{44E}\x{301}";

    say graph_length($grapheme);          # 1
    say code_length($grapheme);           # 2
    say byte_length($grapheme, 'UTF-8');  # 4
This module provides Unicode-aware versions of Perl’s built-in string functions, tailored to work on grapheme clusters as opposed to code points or bytes.
Functions may each be exported explicitly, or by using the :all tag for everything or the :length tag for the length functions.
graph_length returns the length of the given string in grapheme clusters. This is the closest to the number of “characters” that many people would count in a printed string.
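As a rough illustration of what counting grapheme clusters means, the following sketch uses only core Perl and the regex escape \X, which matches one extended grapheme cluster. The helper name sketch_graph_length is hypothetical and this is not necessarily how the module itself is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Unicode::Util): count grapheme
# clusters by counting how many times \X matches.
sub sketch_graph_length {
    my ($string) = @_;
    my $count = () = $string =~ /\X/g;
    return $count;
}

# Cyrillic small letter yu + combining acute accent
my $grapheme = "\x{44E}\x{301}";

print sketch_graph_length($grapheme), "\n";  # 1 grapheme cluster
print length($grapheme), "\n";               # 2 code points
```

The built-in length counts code points, which is why the two numbers differ for a base letter followed by a combining mark.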
code_length returns the length of the given string in code points. This is likely the number of “characters” that many programmers and programming languages would count in a string. If the optional Unicode normalization form is supplied, the returned length is that of the string as if it had been normalized to that form.
Valid normalization forms are C or NFC, D or NFD, KC or NFKC, and KD or NFKD.
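To show how a normalization form can change the code point count, here is a minimal sketch built on the core Unicode::Normalize module. The helper sketch_code_length and its lookup table are assumptions made for illustration, not the module's own code.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC NFD NFKC NFKD );

# Hypothetical lookup table: accept both the short and long form names.
my %normalizer = (
    C  => \&NFC,  NFC  => \&NFC,
    D  => \&NFD,  NFD  => \&NFD,
    KC => \&NFKC, NFKC => \&NFKC,
    KD => \&NFKD, NFKD => \&NFKD,
);

# Hypothetical helper (not part of Unicode::Util): code point length,
# optionally measured after normalizing a copy of the string.
sub sketch_code_length {
    my ($string, $form) = @_;
    return length $string unless defined $form;
    my $normalize = $normalizer{$form}
        or die "Unknown normalization form: $form\n";
    return length $normalize->($string);
}

my $decomposed = "e\x{301}";  # e + combining acute accent

print sketch_code_length($decomposed), "\n";         # 2
print sketch_code_length($decomposed, 'NFC'), "\n";  # 1 (composes to U+00E9)
print sketch_code_length("\x{E9}", 'D'), "\n";       # 2 (decomposes again)
```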
byte_length returns the length of the given string in bytes, as if it were encoded using the specified encoding, or UTF-8 if no encoding is supplied. If the optional Unicode normalization form is supplied, the returned length is that of the string as if it had been normalized to that form.
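The dependence of byte length on the encoding can be sketched with the core Encode module. The helper sketch_byte_length is hypothetical, it leaves out the optional normalization step for brevity, and it is not claimed to match the module's implementation.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper (not part of Unicode::Util): encode a copy of the
# string and count the resulting bytes. Defaults to UTF-8.
sub sketch_byte_length {
    my ($string, $encoding) = @_;
    $encoding = 'UTF-8' unless defined $encoding;
    return length encode($encoding, $string);
}

# Cyrillic small letter yu + combining acute accent
my $grapheme = "\x{44E}\x{301}";

print sketch_byte_length($grapheme), "\n";              # 4 (two 2-byte sequences)
print sketch_byte_length($grapheme, 'UTF-16LE'), "\n";  # 4 (two 16-bit units)
print sketch_byte_length($grapheme, 'UTF-32LE'), "\n";  # 8 (two 32-bit units)
```

The explicit little-endian variants are used here so that no byte order mark is added to the count.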
Returns the given string with the last grapheme cluster chopped off. Does not modify the original value, unlike the built-in chop.
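The difference from the built-in chop can be sketched in core Perl by splitting the string into clusters with \X and dropping the last one. The helper sketch_graph_chop is a hypothetical name used for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Unicode::Util): return a copy of the
# string without its last grapheme cluster. The caller's value is left
# untouched because only the local copy is split and rejoined.
sub sketch_graph_chop {
    my ($string) = @_;
    my @clusters = $string =~ /\X/g;
    pop @clusters;
    return join '', @clusters;
}

my $original = "ab\x{44E}\x{301}";  # a, b, then yu + combining acute

my $chopped = sketch_graph_chop($original);
print length($chopped), "\n";   # 2: the whole final cluster is gone
print length($original), "\n";  # 4: the original is unchanged

# The built-in chop removes one code point and modifies its argument.
my $copy = $original;
chop $copy;
print length($copy), "\n";      # 3: only the combining accent was removed
```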
Returns the given string with its grapheme clusters in reverse order.
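Why reversing by grapheme cluster matters can be seen in a small core-Perl sketch that compares it with the built-in reverse. The helpers sketch_graph_reverse and code_points are hypothetical names introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Unicode::Util): reverse the order of
# grapheme clusters while keeping each cluster intact.
sub sketch_graph_reverse {
    my ($string) = @_;
    my @clusters = $string =~ /\X/g;
    return join '', reverse @clusters;
}

# Hypothetical helper: show a string as a list of code points.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf('U+%04X', ord $_) } split //, $string;
}

my $string = "a\x{44E}\x{301}b";  # a, yu + combining acute, b

# Cluster-wise: the accent stays attached to the yu.
print code_points(sketch_graph_reverse($string)), "\n";
# U+0062 U+044E U+0301 U+0061

# Built-in reverse works on code points: the accent now follows b.
print code_points(scalar reverse $string), "\n";
# U+0062 U+0301 U+044E U+0061
```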
graph_substr, graph_index, graph_rindex
Unicode::GCString, String::Multibyte, Perl6::Str, http://perlcabal.org/syn/S32/Str.html
Nick Patch <patch@cpan.org>
© 2011–2012 Nick Patch
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.