brian d foy > Unicode-Tussle-1.05 > uninames

NAME

uninames - show selected Unicode character descriptions

SYNOPSIS

uninames [options] criteria

Options must use double-dash form only:

  --version   print version information
  --help      this message
  --man       full manpage

  --word      patterns wrapped with \b ... \b
  --bmp       restrict matches to Basic Multilingual Plane
  --astral    restrict matches to above the Basic Multilingual Plane

  --debug     print debugging and exit

All other arguments are patterns, all of which must match. A pattern is matched case-insensitively if it contains any lowercase letters. A single leading minus negates the match.

  eg: uninames greek alpha
  eg: uninames GREEK ALPHA tonos
  eg: uninames LATIN LETTER -greek -WITH

DESCRIPTION

The uninames program searches the Unicode NamesList.txt file for character descriptions, showing all entries matching the selection criteria that were given as program arguments. Without any arguments, the entire file is displayed.

A typical entry looks like this:

    007C  VERTICAL LINE
        = vertical bar
        * used in pairs to indicate absolute value
        x (latin letter dental click - 01C0)
        x (hebrew punctuation paseq - 05C0)
        x (divides - 2223)
        x (light vertical bar - 2758)

The official name is in all caps, but later parts of the description are in mixed case, usually lowercase. You can use this property to restrict which part of the entry you do or do not match. Note also that code points are given in hex, in case you want to match them.
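As a rough illustration of that entry structure, here is a minimal Python sketch (not the program's own Perl) that splits names-list text into per-character entries. It assumes each entry begins with a line that starts with a hex code point and that continuation lines are indented with a tab; `split_entries` and `code_point` are hypothetical helper names, and the sample text is abbreviated. Header lines in the real file are not handled here.

```python
import re

# Abbreviated sample in the assumed NamesList layout (tab-separated).
SAMPLE = (
    "007C\tVERTICAL LINE\n"
    "\t= vertical bar\n"
    "\t* used in pairs to indicate absolute value\n"
    "\tx (latin letter dental click - 01C0)\n"
    "01C0\tLATIN LETTER DENTAL CLICK\n"
)

# An entry starts at any line that begins with 4-6 uppercase hex digits.
ENTRY_START = re.compile(r"^(?=[0-9A-F]{4,6}\s)", re.MULTILINE)

def split_entries(text):
    """Split names-list text into one string per character entry."""
    return [chunk for chunk in ENTRY_START.split(text) if chunk.strip()]

def code_point(entry):
    """Return the entry's code point, parsed from its leading hex digits."""
    return int(entry.split(None, 1)[0], 16)

entries = split_entries(SAMPLE)
for entry in entries:
    print(hex(code_point(entry)), entry.splitlines()[0])
```

Because the cross-reference `01C0` inside the first entry is not at the start of a line, it does not start a new entry.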

Although typical arguments are words, each argument is a regular expression. Each argument is wrapped in word boundaries only if the --word option is given. If any lowercase letters occur in a given argument (other than in regex escapes), that argument is matched case-insensitively.
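The rules for one argument can be sketched as follows. This is a Python approximation, not the program's Perl: `parse_criterion` is a hypothetical name, the way escapes are stripped before looking for lowercase letters is an assumption, and so is the non-capturing group used for the word-boundary wrapping.

```python
import re

def parse_criterion(arg, word=False):
    """Turn one pattern argument into (compiled_regex, negated).

    Double-dash options are assumed to have been removed already.
    """
    # A single leading minus marks a negated match.
    negated = arg.startswith("-")
    pattern = arg[1:] if negated else arg

    # Ignore backslash escapes such as \s or \b when looking for lowercase.
    bare = re.sub(r"\\.", "", pattern)

    # Rough Python analogue of Perl's /x, /m, and /s.
    flags = re.VERBOSE | re.MULTILINE | re.DOTALL
    if re.search(r"[a-z]", bare):
        flags |= re.IGNORECASE

    if word:
        # Assumed wrapping; the group keeps alternations inside the \b pair.
        pattern = r"\b(?:" + pattern + r")\b"

    return re.compile(pattern, flags), negated
```

Under these assumptions, `greek` is case-insensitive, `GREEK` is case-sensitive, and a pattern such as `'^ \s+ = \s+'` stays case-sensitive because its only lowercase letters are part of escapes.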

Each pattern is compiled with /x, /m, and /s. Each entry in the names file is examined in succession, and if all criteria match, that entry is printed, prefixed with its literal character and the code point's decimal value, which appears right before its hex value.
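A minimal Python sketch of that matching step, under stated assumptions: `re.VERBOSE`, `re.MULTILINE`, and `re.DOTALL` stand in for Perl's /x, /m, and /s; `entry_matches` and `render` are hypothetical names; and the exact spacing of the printed prefix is a guess, since only the order (character, decimal, hex) is described.

```python
import re

# Rough analogue of /x (ignore pattern whitespace), /m (^ and $ match at
# line boundaries), and /s (dot matches newline).
FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL

def entry_matches(entry, wanted, unwanted=()):
    """True if every wanted pattern matches and no unwanted pattern does."""
    return (all(re.search(p, entry, FLAGS) for p in wanted)
            and not any(re.search(p, entry, FLAGS) for p in unwanted))

def render(entry):
    """Prefix an entry with its literal character and decimal code point."""
    hex_part, rest = entry.split(None, 1)
    cp = int(hex_part, 16)
    # Assumed layout: character, decimal value, then the original hex and text.
    return f"{chr(cp)} {cp} {hex_part} {rest}"

entry = "007C\tVERTICAL LINE\n\t= vertical bar\n"
if entry_matches(entry, [r"^ \s+ = \s+"], ["<control>"]):
    print(render(entry), end="")
```

The /m analogue is what lets `^` anchor at the start of the alias line inside a multi-line entry, and the /x analogue is why the spaces in the pattern are not matched literally.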

Output is piped through the user's pager, or more if none is set.
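The pager fallback can be sketched in Python as below. This is an illustration only: it assumes the pager is taken from the `PAGER` environment variable and that an empty value counts as unset; `pager_command` and `page` are hypothetical names.

```python
import os
import subprocess

def pager_command(env=None):
    """Return the pager command line, falling back to more."""
    env = os.environ if env is None else env
    # Assumption: an empty PAGER is treated the same as an unset one.
    return env.get("PAGER") or "more"

def page(text, env=None):
    """Send text to the pager's standard input and wait for it to exit."""
    # shell=True so that a value like "less -r" is split by the shell.
    subprocess.run(pager_command(env), shell=True, input=text, text=True)
```

Calling `page(output)` hands the whole output to the pager; nothing is run until `page` is called.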

EXAMPLES

Find entries matching both "greek" and "alpha", case insensitively:

  $ uninames greek alpha

Find entries matching both "GREEK" and "ALPHA" case sensitively, and "tonos" case insensitively:

  $ uninames GREEK ALPHA tonos

Find entries matching both "LATIN" and "LETTER" case sensitively, but matching neither "greek" case insensitively nor "WITH" case sensitively:

  $ uninames LATIN LETTER -greek -WITH

Find entries whose official name ends with "ETH" and are from the Basic Multilingual Plane:

  $ uninames --bmp "ETH$"

Find entries containing "latin" case insensitively anywhere in the description at word boundaries, and which are not from the Basic Multilingual Plane:

  $ uninames --word --astral latin
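The plane restriction behind --bmp and --astral comes down to a comparison against U+FFFF, the last code point of the Basic Multilingual Plane. A small Python sketch, with hypothetical function names and an assumed treatment of the two flags as independent filters:

```python
BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane

def in_bmp(cp):
    """True for code points U+0000 through U+FFFF."""
    return 0 <= cp <= BMP_MAX

def is_astral(cp):
    """True for code points above the Basic Multilingual Plane."""
    return cp > BMP_MAX

def plane_allows(cp, bmp=False, astral=False):
    """Apply the restrictions; with neither flag set, everything passes."""
    if bmp and not in_bmp(cp):
        return False
    if astral and not is_astral(cp):
        return False
    return True
```

For example, U+00D0 (LATIN CAPITAL LETTER ETH) passes the BMP filter, while a code point such as U+1D400 passes only the astral one.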

Find entries with aliased names, except for those named "<control>":

  $ uninames '^ \s+ = \s+' -'<control>'

Find entries marked as used in French:

  $ uninames '\* .* French'

Find entries marked as used in either Spanish or Portuguese:

  $ uninames '^ \s+ \* .* (Spanish|Portuguese)'

FILES

$privlib/unicore/NamesList.txt

PROGRAMS

less(1)

BUGS

It's hard to remember to type a double-dash for options.

If your system's idea of valid Unicode lags behind your font's, you may have to call less yourself, passing it -r so it displays the real characters instead of "<U+XXXX>".

May be subclever in inferring case sensitivity.

SEE ALSO

unichars, uniprops, perlunicode

Tim Bray's article discussing the astral planes: http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF

AUTHOR

Tom Christiansen <tchrist@perl.com>

HISTORY

3.0

Fri Aug 8 20:44:08 MDT 2008

"Conway made me do it"

4.0

Sun Oct 24 14:05:56 MDT 2010

Restructured code.
