Unicode::Semantics - Work around *the* Perl 5 Unicode bug
    use Unicode::Semantics;

    $foo;      # could be anything
    us($foo);  # force Unicode semantics

or:

    us($foo) =~ s/\W/_/g;  # Upgrade and use immediately
Perl uses Unicode semantics when the internal encoding for a string is UTF-8, but it uses ASCII semantics when the internal encoding is ISO-8859-1. This means that the non-ASCII part of the character set is ignored for the following operations:
* uc, lc, ucfirst, lcfirst, \U, \L, \u, \l
* \d, \s, \w, \D, \S, \W
* /.../i, (?i:...)
* /[[:posix:]]/
Because you shouldn't (and often don't) know what the internal encoding will be, it's hard to predict whether these operations will actually do what you want. Unicode::Semantics::us() gives you predictable results.
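The difference can be seen without installing anything, using only the core utf8::upgrade and utf8::downgrade functions. This is a minimal sketch; it assumes the code runs without `use feature 'unicode_strings'`, `use locale`, or a `use v5.12` (or later) declaration, any of which changes the behaviour shown. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

# "\xE9" is e-acute (U+00E9). A literal like this is normally stored
# in Perl's one-byte-per-character (ISO-8859-1) internal form.
my $str = "\xE9";
utf8::downgrade($str);    # make the internal form explicit for the demo

my $uc_before = uc $str;                 # legacy semantics: left unchanged
my $w_before  = $str =~ /\w/ ? 1 : 0;    # legacy semantics: not a word char

utf8::upgrade($str);      # same character, UTF-8 internal form

my $uc_after = uc $str;                  # Unicode semantics: E-acute (U+00C9)
my $w_after  = $str =~ /\w/ ? 1 : 0;     # Unicode semantics: a word char

printf "uc changed the string: before=%d after=%d\n",
    ($uc_before ne "\xE9") ? 1 : 0, ($uc_after eq "\xC9") ? 1 : 0;
printf "\\w matched: before=%d after=%d\n", $w_before, $w_after;
```

The string holds the same single character throughout; only its internal representation changes, and with it the result of uc and \w.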
us
This module exports us, which upgrades your string to UTF-8 internally and returns the string.

utf8::upgrade
You can also use utf8::upgrade, which does exactly the same thing, except that it does not return the string. This module was released because it's less typing in a large program :)
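To illustrate the relationship between the two, here is a rough sketch of how a us-like wrapper could be built on top of utf8::upgrade. The name my_us is hypothetical and this is not taken from the module's source; the :lvalue attribute is an assumption about how a call can be used directly with s///. As before, it assumes no `unicode_strings` feature or locale is in effect.

```perl
use strict;
use warnings;

# Hypothetical stand-in for us(); not the module's actual source.
# The :lvalue attribute lets the call appear on the left of =~ s///.
sub my_us :lvalue {
    utf8::upgrade($_[0]);    # upgrade the caller's variable in place
    $_[0];                   # hand the same variable back
}

my $plain = "caf\xE9 bar";   # e-acute stored in the one-byte form
my $fixed = "caf\xE9 bar";

$plain =~ s/\W/_/g;          # legacy semantics: e-acute counts as \W
my_us($fixed) =~ s/\W/_/g;   # Unicode semantics: only the space is \W

print "plain lost its e-acute\n" if $plain eq "caf__bar";
print "fixed kept its e-acute\n" if $fixed eq "caf\xE9_bar";
```

With plain utf8::upgrade the same result takes two statements: one to upgrade $fixed, one to run the substitution.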
Obviously, these broken text operations are no problem when you're dealing with bytes instead of characters. Don't upgrade your binary strings!
Juerd Waalboer <#####@juerd.nl>
perlunitut, perlunifaq