
NAME

Unicode::Properties - find out what properties a character has

SYNOPSIS

    use utf8;
    use Unicode::Properties 'uniprops';
    my @prop_list = uniprops ('☺'); # Unicode smiley face
    print "@prop_list\n";

produces output

    Any Assigned Common InMiscellaneousSymbols

(This example is included as synopsis.pl in the distribution.)

You can then use, for example, \p{InMiscellaneousSymbols} to match this character in a regular expression.
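As a minimal sketch of that usage (the sample string and variable names are made up for illustration, and it assumes your Perl accepts the block name in this spelling):

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# Made-up sample text containing the smiley from the synopsis.
my $text = 'Have a nice day ☺';

# Collect every character that belongs to the Miscellaneous Symbols block.
my @symbols = $text =~ /(\p{InMiscellaneousSymbols})/g;
printf "Found %d symbol(s): %s\n", scalar @symbols, join (' ', @symbols);
```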

VERSION

This documents Unicode::Properties version 0.07 corresponding to git commit c1071fba77751ea891c887d26d8e3ea9ce91f631 released on Mon Jan 30 09:30:10 2017 +0900.

DESCRIPTION

Unicode::Properties provides a way to go from a character to its list of properties.

FUNCTIONS

uniprops

    my @prop_list = uniprops ('☺'); # Unicode smiley face

Given a character, returns a list of the properties which the character has. This works by testing its argument against a \p{} regular expression for every property the module knows about, so it is not an efficient method.

    use Unicode::Properties 'uniprops';
    print join (',',uniprops('2')), "\n";

produces output

    ASCII,Any,Assigned,Common,IDContinue,InBasicLatin

(This example is included as univer.pl in the distribution.)
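The testing approach described above can be sketched roughly as follows. This is an illustration only, not the module's actual code: the name props_of and the short candidate list are hypothetical, and the real list of properties is much longer.

```perl
use strict;
use warnings;

# Hypothetical, abbreviated candidate list for illustration.
my @candidates = qw(ASCII Any Assigned Common Latin InBasicLatin);

# Return the candidate properties which the single character $char has.
sub props_of
{
    my ($char) = @_;
    my @found;
    for my $name (@candidates) {
        # The regex is compiled at run time; eval catches the error Perl
        # raises if this Unicode version does not know the property.
        my $ok = eval { $char =~ /\A\p{$name}\z/ };
        push @found, $name if $ok;
    }
    return @found;
}

print join (',', props_of ('2')), "\n";
```

Each call compiles and runs one regular expression per candidate property, which is why this approach is slow when the list is long.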

matchchars

    my @matching = matchchars ($property);

This returns a list of all the characters which match a particular property. If $property is not found in the list of possible Unicode properties, it is treated as a regular expression.

It can also return an array reference:

    use utf8;
    use FindBin '$Bin';
    use Unicode::Properties ':all';
    my $type = 'InCJKUnifiedIdeographs';
    my $matching = matchchars ($type);
    printf "There are %d characters of type %s.\n", scalar (@$matching), $type;

produces output

    There are 20992 characters of type InCJKUnifiedIdeographs.

(This example is included as matchchars.pl in the distribution.)
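One way to produce such a list is a brute-force scan over every code point. The sketch below is written under that assumption and is not necessarily how matchchars is implemented. The name chars_matching is hypothetical, and choosing between a list and an array reference with wantarray is also an assumption.

```perl
use strict;
use warnings;

# Return every character matching $re: a list in list context,
# an array reference in scalar context.
sub chars_matching
{
    my ($re) = @_;
    my @matching;
    for my $cp (0 .. 0x10FFFF) {
        # Skip the surrogate range, which does not hold real characters.
        next if $cp >= 0xD800 && $cp <= 0xDFFF;
        my $char = chr $cp;
        push @matching, $char if $char =~ $re;
    }
    return wantarray ? @matching : \@matching;
}

my $digits = chars_matching (qr/\A\p{PosixDigit}\z/);
printf "%d characters match.\n", scalar @$digits;
```

The scan tests more than a million code points on each call, so it is best suited to one-off exploration.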

VARIABLES

$unicode_version

$unicode_version is the version of Unicode supplied with your version of Perl, taken from "Unicode::UCD". To get properties for a different version of Unicode, set this variable to the desired version.
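To see which Unicode version your Perl supplies, you can ask Unicode::UCD directly. This is a minimal sketch; it is an assumption that this is the same call the module uses for its default value.

```perl
use strict;
use warnings;
use Unicode::UCD;

# UnicodeVersion returns a string of the form "major.minor.patch".
my $version = Unicode::UCD::UnicodeVersion ();
print "This Perl supplies Unicode $version\n";
```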

EXPORTS

"uniprops" and "matchchars" are exported on demand. The tag :all exports all of the module's functions.

DEPENDENCIES

Unicode::UCD

Unicode::UCD (Unicode Character Database) is used to find the version of Unicode which your Perl supplies.

BUGS

Data source

This module uses a list taken from the "perlunicode" documentation. It should use Perl's internals or the Unicode files to get the list.

Outdated data

As of version 0.07, the Unicode data dates from an older version of Perl.

Perl & Unicode version

Results depend on your Perl and Unicode versions. For example, "Balinese" was added in Unicode version 5.0.0. An unpatched Perl 5.8.8 supplies Unicode version 4.1.0, so with it you won't get "Balinese" in the results list.

Also, I don't know the behaviour of Unicode versions other than 4.1.0 and 5.0.0, so this module only covers those two. I couldn't get Perl 5.8.5 to install on my computer, so I've set the minimum version to 5.8.8 for this module.

SEE ALSO

Other CPAN modules

"uniprops" in Unicode::Tussle

This script was written because the author (Tom Christiansen, <TCHRIST>) was dissatisfied with Unicode::Properties. Unfortunately, it uses the same method as this module: parsing the Perl documentation to get the information. The last time I tested it, it only worked for Perl versions 5.12 or 5.14, but that was about three years ago.

Information about Perl and Unicode

Perl Unicode documentation

See perlunicode for Unicode documentation, and perluniprops for details of all the different properties. There is also a tutorial in perlunitut, and some more advice in perlunifaq.

Other Unicode and Perl information

Tutorial on Perl and Unicode is a tutorial for people new to Unicode and Perl.

Get the Unicode value of a character in Perl explains how to get the Unicode value of a single character.

What characters match a regular expression? is a Perl script which shows what single characters match a particular regular expression, like \s or \p{InCJKUnifiedIdeographs}.

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

This package and associated files are copyright (C) 2011-2017 Ben Bullock.

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.