Module Version: 2009102801

NAME

Regexp::CharClasses - Provide character classes

SYNOPSIS

 use Regexp::CharClasses;               # Import all.

 "..." =~ /\p{IsDigit0}/;
 "..." =~ /\P{IsThaiDigit}/;

 use Regexp::CharClasses qw [IsDigit2]; # Import a property.

 use Regexp::CharClasses ':perl';       # Properties tagged ':perl'

DESCRIPTION

Using the module Regexp::CharClasses in your package allows you to use several "Unicode Property" character classes in addition to the standard ones. Such character classes are all of the form \p{IsProperty} (which matches a character adhering to the property) and \P{IsProperty} (which matches a character not adhering to the property). For details, see "Unicode Properties" in perlrecharclass.
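Perl resolves a \p{IsFoo} that is not a built-in property by calling a subroutine named IsFoo, which returns the matching code points as lines of hexadecimal (see "User-Defined Character Properties" in perlunicode). The following is a minimal sketch of that mechanism. The name IsDemoVowel is hypothetical and is not exported by Regexp::CharClasses, and the sketch does not claim to be how the module itself is written.

```perl
use strict;
use warnings;

# Hypothetical user-defined property (not part of Regexp::CharClasses).
# Each line of the returned string is one code point in hex.
sub IsDemoVowel {
    return join '', map { sprintf "%04X\n", ord($_) } qw(a e i o u);
}

my @chars  = qw(a b e z);
my @hits   = grep { $_ =~ /\p{IsDemoVowel}/ } @chars;   # have the property
my @misses = grep { $_ =~ /\P{IsDemoVowel}/ } @chars;   # lack the property

print "with: @hits\n";       # with: a e
print "without: @misses\n";  # without: b z
```

\p{...} and \P{...} are complements, so every character lands in exactly one of the two lists.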

By default, all the properties listed below are imported into your namespace. You can instead name the properties you want as arguments on the use line, or import one or more tags. Each property below states the tag it belongs to.

Properties

The following properties are exported from Regexp::CharClasses:

\p{IsDigit0}
\p{IsDigit1}
\p{IsDigit2}
\p{IsDigit3}
\p{IsDigit4}
\p{IsDigit5}
\p{IsDigit6}
\p{IsDigit7}
\p{IsDigit8}
\p{IsDigit9}

\p{IsDigit0} matches any digit 0, \p{IsDigit1} any digit 1, and so on, in one of the following scripts: Latin, Arabic-Indic, Extended Arabic-Indic, Nko, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Tibetan, Myanmar, Khmer, Mongolian, Limbu, New Tai Lue, Balinese, Osmanya, Fullwidth, Mathematical Bold, Mathematical Double-Struck, Mathematical Sans-Serif, Mathematical Sans-Serif Bold, Mathematical Monospace. The code points of the characters matched can be found by adding the value of the digit being matched to each of the following base code points (in hex):

  0030  0660  06F0  07C0  0966  09E6  0A66  0AE6  0B66  0BE6  0C66 
  0CE6  0D66  0E50  0ED0  0F20  1040  17E0  1810  1946  19D0  1B50
  FF10 104A0 1D7CE 1D7D8 1D7E2 1D7EC 1D7F6

The properties are imported when asking for the tag :digits.
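The "base plus digit" rule can be checked with a short worked example. This is only a sketch: the helpers digit_bases and digit_code_points and the property IsDemoDigit5 are hypothetical names introduced here, not part of the module. The base list lives inside a subroutine so that it is available whenever Perl decides to resolve the property.

```perl
use strict;
use warnings;
use charnames ':full';

# Base code points (the digit zero of each script), copied from the list above.
sub digit_bases {
    return map { hex } qw(
        0030  0660  06F0  07C0  0966  09E6  0A66  0AE6  0B66  0BE6  0C66
        0CE6  0D66  0E50  0ED0  0F20  1040  17E0  1810  1946  19D0  1B50
        FF10 104A0 1D7CE 1D7D8 1D7E2 1D7EC 1D7F6
    );
}

# Hypothetical helper: code points of one digit across all listed scripts.
sub digit_code_points {
    my ($digit) = @_;
    return map { $_ + $digit } digit_bases();
}

# Hypothetical stand-in for \p{IsDigit5}: one hex code point per line.
sub IsDemoDigit5 {
    return join '', map { sprintf "%X\n", $_ } digit_code_points(5);
}

my @fives = digit_code_points(5);
printf "U+%04X %s\n", $_, charnames::viacode($_) for @fives[0, 13];
# U+0035 DIGIT FIVE
# U+0E55 THAI DIGIT FIVE

print "Thai five matches\n" if "\N{THAI DIGIT FIVE}" =~ /\p{IsDemoDigit5}/;
```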

\p{IsLatinDigit}
\p{IsArabicIndicDigit}
\p{IsExtendedArabicIndicDigit}
\p{IsNkoDigit}
\p{IsDevanagariDigit}
\p{IsBengaliDigit}
\p{IsGurmukhiDigit}
\p{IsGujaratiDigit}
\p{IsOriyaDigit}
\p{IsTamilDigit}
\p{IsTeluguDigit}
\p{IsKannadaDigit}
\p{IsMalayalamDigit}
\p{IsThaiDigit}
\p{IsLaoDigit}
\p{IsTibetanDigit}
\p{IsMyanmarDigit}
\p{IsKhmerDigit}
\p{IsMongolianDigit}
\p{IsLimbuDigit}
\p{IsNewTaiLueDigit}
\p{IsBalineseDigit}
\p{IsOsmanyaDigit}
\p{IsFullwidthDigit}
\p{IsMathematicalBoldDigit}
\p{IsMathematicalSansSerifDigit}
\p{IsMathematicalSansSerifBoldDigit}
\p{IsMathematicalMonospaceDigit}

These properties match the characters representing the digits 0 .. 9 in the given script.

The properties are imported when asking for the tag :digits.
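A per-script property covers ten consecutive code points starting at that script's base. As a sketch, the hypothetical property IsDemoThaiDigit below (a stand-in, not the module's own \p{IsThaiDigit}) uses the Thai base 0E50 from the list of code points given earlier, written as a tab-separated hex range.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical stand-in for \p{IsThaiDigit}: U+0E50 .. U+0E59.
sub IsDemoThaiDigit {
    return "0E50\t0E59\n";
}

my $thai5 = "\N{THAI DIGIT FIVE}";

print "Thai five: ",  ($thai5 =~ /\A\p{IsDemoThaiDigit}\z/ ? "match" : "no match"), "\n";
print "Latin five: ", ("5"    =~ /\A\p{IsDemoThaiDigit}\z/ ? "match" : "no match"), "\n";
# Thai five: match
# Latin five: no match
```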

\p{IsPerlSigil}

This property matches all the characters that are sigils in Perl. It's equivalent to [\$\@%&*]. This property is imported when asking for the tag :perl.
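As an illustration, the sketch below uses the equivalent bracketed class to split a variable name into sigil and identifier. The helper split_sigil is a hypothetical name; with the module imported, \p{IsPerlSigil} could presumably be used in place of the class.

```perl
use strict;
use warnings;

# The bracketed class the text gives as equivalent to \p{IsPerlSigil}.
my $sigil = qr/[\$\@%&*]/;

# Hypothetical helper: return (sigil, identifier), or an empty list.
sub split_sigil {
    my ($name) = @_;
    return $name =~ /\A($sigil)(\w+)\z/ ? ($1, $2) : ();
}

for my $name ('$count', '@items', '%seen', '&handler', '*STDOUT', 'bare') {
    my ($s, $id) = split_sigil($name);
    if (defined $s) {
        print "$name has sigil $s and identifier $id\n";
    }
    else {
        print "$name has no sigil\n";
    }
}
```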

\p{IsLeftParen}
\p{IsRightParen}
\p{IsParen}

These properties match left (opening) parentheses, right (closing) parentheses, and any parenthesis, respectively. The classes are equivalent to [(<[{], [)>\]}] and [()<>[\]{}]. These properties are imported when asking for the tag :perl.
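Note that angle brackets count as parentheses here. A small sketch using the equivalent bracketed classes (the sample string is made up for illustration) shows the effect:

```perl
use strict;
use warnings;

# Bracketed classes the text gives as equivalent to the three properties.
my $left  = qr/[(<\[{]/;       # \p{IsLeftParen}
my $right = qr/[)>\]}]/;       # \p{IsRightParen}
my $any   = qr/[()<>\[\]{}]/;  # \p{IsParen}

my $text = 'f(x[0]) < {y}';

# The "= () =" idiom counts the matches of a /g pattern.
my $opens  = () = $text =~ /$left/g;
my $closes = () = $text =~ /$right/g;
my $total  = () = $text =~ /$any/g;

print "$opens opening, $closes closing, $total total\n";
# 4 opening, 3 closing, 7 total
```

The lone < in the sample is counted as an opening parenthesis, which is why the two counts differ.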

\p{IsLcVowel}
\p{IsUcVowel}
\p{IsVowel}

These properties match vowels in the English language. \p{IsLcVowel} matches lowercase vowels, \p{IsUcVowel} matches uppercase vowels, while \p{IsVowel} matches vowels in any case. The properties are equivalent to the character classes [aeiou], [AEIOU] and [aeiouAEIOU].

The properties are imported when asking for the tag :english.

\p{IsLcConsonant}
\p{IsUcConsonant}
\p{IsConsonant}

These properties match consonants in the English language. \p{IsLcConsonant} matches lowercase consonants, \p{IsUcConsonant} matches uppercase consonants, while \p{IsConsonant} matches consonants in any case. The properties are equivalent to the character classes [bcdfghjklmnpqrstvwxyz], [BCDFGHJKLMNPQRSTVWXYZ] and [bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ].

The properties are imported when asking for the tag :english.
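The following sketch counts vowels and consonants with the equivalent character classes quoted above. The helper count_letters is a hypothetical name. Because y is in the consonant class, a word such as "Rhythm" is reported as having no vowels at all.

```perl
use strict;
use warnings;

# Classes the text gives as equivalent to \p{IsVowel} and \p{IsConsonant}.
my $vowel     = qr/[aeiouAEIOU]/;
my $consonant = qr/[bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ]/;

# Hypothetical helper: return (vowel count, consonant count) for a word.
sub count_letters {
    my ($word) = @_;
    my $v = () = $word =~ /$vowel/g;
    my $c = () = $word =~ /$consonant/g;
    return ($v, $c);
}

for my $word (qw(Perl Rhythm)) {
    my ($v, $c) = count_letters($word);
    print "$word: $v vowel(s), $c consonant(s)\n";
}
# Perl: 1 vowel(s), 3 consonant(s)
# Rhythm: 0 vowel(s), 6 consonant(s)
```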

\p{IsUuencode}

This property matches the characters used in uuencoded strings. Uuencode uses 64 characters to encode, with space and grave accent (backtick) interchangeable, so this property matches 65 different characters; it's equivalent to the character class [ !"#\$%&'()*+,\-./0-9:;<=>?\@A-Z[\\\]^_`].

\p{IsUuencode} is imported when asking for the tag :encode.
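The 65 characters form one contiguous run from space (0x20) to backtick (0x60), so the class can also be written as a range. The sketch below, which uses only core Perl and the range form rather than the module's property, counts the members of the class and checks the output of Perl's own uuencoder, pack 'u'.

```perl
use strict;
use warnings;

# Range form of the class quoted above: space (0x20) through backtick (0x60).
my $uu_char = qr/[\x20-\x60]/;

# Count how many ASCII characters the class accepts.
my $class_size = grep { chr($_) =~ $uu_char } 0x00 .. 0x7F;
print "class size: $class_size\n";   # class size: 65

# Every line produced by pack 'u' should consist of class members only.
my @lines = split /\n/, pack('u', 'Hello, world');
my $bad   = grep { $_ !~ /\A$uu_char+\z/ } @lines;
print $bad == 0 ? "all lines valid\n" : "$bad invalid line(s)\n";
```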

\p{IsBase64}
\p{IsBase64url}
\p{IsBase32}
\p{IsBase32hex}
\p{IsBase16}

These properties match the characters used in various encodings as described in RFC 4648, of which base64 is probably the best known. \p{IsBase64} and \p{IsBase64url} use 64 characters to encode, \p{IsBase32} and \p{IsBase32hex} use 32 characters, and \p{IsBase16} uses 16. All but \p{IsBase16} use the = character for padding. The properties are equivalent to the following character classes: [0-9A-Za-z+/=] (\p{IsBase64}), [0-9A-Za-z\-_=] (\p{IsBase64url}), [2-7A-Z=] (\p{IsBase32}), [0-9A-V=] (\p{IsBase32hex}), and [0-9A-F] (\p{IsBase16}).

The properties are imported when asking for the tag :encode.
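The difference between the base64 and base64url alphabets shows up on input whose encoding uses the last two symbols. The sketch below relies on the core module MIME::Base64 and on the equivalent character classes quoted above; the helper all_match is a hypothetical name.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Classes the text gives as equivalent to the properties.
my $base64    = qr/[0-9A-Za-z+\/=]/;   # \p{IsBase64}
my $base64url = qr/[0-9A-Za-z\-_=]/;   # \p{IsBase64url}
my $base16    = qr/[0-9A-F]/;          # \p{IsBase16}

# Hypothetical helper: true if every character of $string is in $class.
sub all_match {
    my ($string, $class) = @_;
    return $string =~ /\A$class+\z/ ? 1 : 0;
}

my $bytes = "\xFB\xFF";
my $b64   = encode_base64($bytes, '');     # '+/8='
(my $url  = $b64) =~ tr{+/}{-_};           # '-_8=' (URL-safe alphabet)
my $hex   = uc unpack 'H*', $bytes;        # 'FBFF'

print "base64 text in base64 class:    ", all_match($b64, $base64),    "\n";  # 1
print "base64 text in base64url class: ", all_match($b64, $base64url), "\n";  # 0
print "url text in base64url class:    ", all_match($url, $base64url), "\n";  # 1
print "hex text in base16 class:       ", all_match($hex, $base16),    "\n";  # 1
```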

\p{IsBinHex}

This property matches the characters used when encoding a string using the binhex method. The encoding uses 64 characters; it's equivalent to the character class [!"#\$%&'()*+,\-0-9\@A-NP-VX-Z[`a-fh-mp-r].

\p{IsBinHex} is imported when asking for the tag :encode.

EXAMPLES

 use Regexp::CharClasses;

 "[" =~ /\p{IsLeftParen}/;     # Match
 "[" =~ /\p{IsRightParen}/;    # No match
 "[" =~ /\p{IsParen}/;         # Match

 use charnames ":full";
 $thai5 = "\N{THAI DIGIT FIVE}";
 $thai5 =~ /\p{IsDigit5}/;     # Match
 $thai5 =~ /\P{IsDigit6}/;     # Match
 $thai5 =~ /\p{IsThaiDigit}/;  # Match

INSTALLATION

To install this module, type the following:

   perl Makefile.PL
   make
   make test
   make install

BUGS

Sometimes y and w are used as vowels in English, but these properties always treat them as consonants.

DEVELOPMENT

The current sources of this module are found on GitHub, git://github.com/Abigail/regexp--charclasses.git.

AUTHOR

Abigail, regexp-charclasses@abigail.be.

COPYRIGHT and LICENSE

This program is copyright 2008 - 2009 by Abigail.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
