Unicode::Properties - find out what properties a character has
    use utf8;
    use Unicode::Properties 'uniprops';
    my @prop_list = uniprops ('☺'); # Unicode smiley face
    print "@prop_list\n";
produces output
Any Assigned Common InMiscellaneousSymbols
(This example is included as synopsis.pl in the distribution.)
You can then use, for example, \p{InMiscellaneousSymbols} to match this character in a regular expression.
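As a minimal sketch of that kind of match, using only core Perl: the helper name in_misc_symbols is made up for illustration and is not part of Unicode::Properties.

```perl
use strict;
use warnings;

# in_misc_symbols is a hypothetical helper name, not part of the module.
# It reports whether a single character lies in the Miscellaneous
# Symbols block.
sub in_misc_symbols {
    my ($char) = @_;
    return $char =~ /\A\p{InMiscellaneousSymbols}\z/ ? 1 : 0;
}

# U+263A WHITE SMILING FACE, the smiley used in the synopsis.
my $smiley = "\x{263A}";

print "smiley: ", in_misc_symbols ($smiley), "\n";
print "letter: ", in_misc_symbols ('a'), "\n";
```

The smiley is written as an escape here so that the script does not depend on the encoding of the source file.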
This documents Unicode::Properties version 0.07 corresponding to git commit c1071fba77751ea891c887d26d8e3ea9ce91f631 released on Mon Jan 30 09:30:10 2017 +0900.
Unicode::Properties provides a way to go from a character to its list of properties.
my @prop_list = uniprops ('☺'); # Unicode smiley face
Given a character, returns a list of properties which the character has. This works by testing its argument against \p{} regular expressions for every possible category the module knows about, so it is not an efficient method.
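The approach described above can be sketched roughly as follows. This is not the module's own code: props_of is a hypothetical name, and the short candidate list is a hand-picked assumption standing in for the much longer list the module uses.

```perl
use strict;
use warnings;

# A short, hand-picked candidate list (an assumption for illustration).
my @candidates = qw(Any Assigned ASCII Common InBasicLatin
                    InMiscellaneousSymbols Latin);

# props_of is a hypothetical name, not the module's uniprops.
# It tries each candidate property in turn against the character.
sub props_of {
    my ($char) = @_;
    my @found;
    for my $name (@candidates) {
        my $pattern = '\A\p{' . $name . '}\z';
        # eval guards against property names this Perl does not know.
        my $ok = eval { $char =~ /$pattern/ ? 1 : 0 };
        push @found, $name if $ok;
    }
    return @found;
}

print join (',', props_of ('2')), "\n";
```

Each call runs one regular expression match per candidate property, which is why the cost grows with the length of the property list.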
    use Unicode::Properties 'uniprops';
    print join (',',uniprops('2')), "\n";
ASCII,Any,Assigned,Common,IDContinue,InBasicLatin
(This example is included as univer.pl in the distribution.)
my @matching = matchchars ($property);
This returns a list of all the characters which match a particular property. If $property is not found in the list of possible Unicode properties, it treats it as a regular expression.
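One way such a lookup could work is to test every code point against the property. The sketch below is an assumption about the general technique, not the module's implementation. The name chars_matching is hypothetical, and the scan is limited to the Basic Multilingual Plane to keep the run short.

```perl
use strict;
use warnings;

# chars_matching is a hypothetical name, not the module's matchchars.
# It returns a reference to an array of the matching characters.
sub chars_matching {
    my ($property) = @_;
    no warnings 'utf8';    # stay quiet about non-characters on old Perls
    my $pattern = '\A\p{' . $property . '}\z';
    my $re = qr/$pattern/;
    my @matching;
    for my $code (0x0000 .. 0xFFFF) {
        # Surrogate code points do not represent characters.
        next if $code >= 0xD800 && $code <= 0xDFFF;
        my $char = chr ($code);
        push @matching, $char if $char =~ $re;
    }
    return \@matching;
}

my $type = 'InCJKUnifiedIdeographs';
my $matching = chars_matching ($type);
printf "There are %d characters of type %s.\n", scalar (@$matching), $type;
```

A property with members outside the Basic Multilingual Plane would need the loop extended to 0x10FFFF, at the cost of a longer run.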
It can also return an array reference:
    use utf8;
    use FindBin '$Bin';
    use Unicode::Properties ':all';
    my $type = 'InCJKUnifiedIdeographs';
    my $matching = matchchars ($type);
    printf "There are %d characters of type %s.\n", scalar (@$matching), $type;
There are 20992 characters of type InCJKUnifiedIdeographs.
(This example is included as matchchars.pl in the distribution.)
$unicode_version is the version of Unicode supplied with your version of Perl, taken from "Unicode::UCD". To override the Unicode version and get properties for a different version of Unicode, set this to a desired value.
"uniprops" and "matchchars" are exported on demand. A tag :all exports all the functions of the module.
Unicode::UCD (Unicode Character Database) is used to find the version of Unicode which your Perl supplies.
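To see which Unicode version your own Perl supplies, you can ask Unicode::UCD directly. This short sketch uses only the core module and does not touch Unicode::Properties.

```perl
use strict;
use warnings;
use Unicode::UCD;

# UnicodeVersion returns a dotted version string for the Unicode
# data shipped with the running Perl.
my $version = Unicode::UCD::UnicodeVersion ();
print "This Perl supplies Unicode $version\n";
```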
This module uses a list taken from the "perlunicode" documentation. It should use Perl's internals or the Unicode files to get the list.
As of version 0.07, the Unicode data dates from an older version of Perl.
Depending on your Perl and Unicode version, you'll get different results. For example "Balinese" was added in Unicode version 5.0.0, so if you are using Perl 5.8.8 unpatched, your Unicode version is 4.1.0 so you won't get "Balinese" in the results list.
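If you want to check whether your Perl knows a given property before relying on it, something like the following may help. It is a sketch under the assumption that an unknown property name makes the regular expression fail with an error. The name property_known is hypothetical and not part of this module.

```perl
use strict;
use warnings;

# property_known is a hypothetical helper, not part of the module.
sub property_known {
    my ($name) = @_;
    my $pattern = '\p{' . $name . '}';
    # An unknown property makes the match die, so trap that with eval.
    return eval { 'a' =~ /$pattern/; 1 } ? 1 : 0;
}

for my $name (qw(Latin Balinese NoSuchProperty)) {
    printf "%-15s %s\n", $name, property_known ($name) ? 'known' : 'unknown';
}
```

Whether "Balinese" is reported as known depends on the Unicode version of the Perl running the script.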
Also, I don't know the behaviour of Unicode versions other than 4.1.0 and 5.0.0, so this module only covers those two. I couldn't get Perl 5.8.5 to install on my computer, so I've set the minimum version to 5.8.8 for this module.
This script was written because the author (Tom Christiansen, <TCHRIST>) was dissatisfied with Unicode::Properties. Unfortunately, it uses the same method as this module, of parsing the Perl documentation to get the information. The last time I tested it, it only worked for Perl versions 5.12 or 5.14, but that was about three years ago.
See perlunicode for Unicode documentation, and perluniprops for details of all the different properties. There is also a tutorial in perlunitut, and some more advice in perlunifaq.
Tutorial on Perl and Unicode is a tutorial for people new to Unicode and Perl.
Get the Unicode value of a character in Perl explains how to get the Unicode value of a single character.
What characters match a regular expression? is a Perl script which shows what single characters match a particular regular expression, like \s or \p{InCJKUnifiedIdeographs}.
Ben Bullock, <bkb@cpan.org>
This package and associated files are copyright (C) 2011-2017 Ben Bullock.
You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.