NAME

Text::Transliterator::Unaccent - Compile a transliterator from Unicode tables, to remove accents from text

SYNOPSIS

  my $unaccenter = Text::Transliterator::Unaccent->new(script => 'Latin',
                                                       wide   => 0,
                                                       upper  => 0);
  $unaccenter->($string);

  my $map   = Text::Transliterator::Unaccent->char_map(script => 'Latin');

  my $descr = Text::Transliterator::Unaccent->char_map_descr();

DESCRIPTION

This package compiles a transliteration function that will replace accented characters by unaccented characters. That function is fast, because it uses the builtin tr/.../.../ Perl operator; it is compact, because it only treats the Unicode subset that you need for your language; and it is complete, because it relies on the builtin Unicode character tables shipped with your Perl installation.

The algorithm for detecting accented characters is derived from the notion of compositions in Unicode; that notion is explained in perluniintro. Characters considered "accented" are the precomposed characters for which the Unicode canonical decomposition contains more than one codepoint; for such decompositions, the first codepoint is the unaccented character to which the accented one will be mapped. This definition seems to work well for the Latin script; I presume that it also makes sense for other scripts, but I'm not able to test that.
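The detection rule can be sketched with the core Unicode::Normalize module. This is only an illustration under stated assumptions: sketch_char_map is a hypothetical helper, not part of this package, and it relies on the full canonical decomposition (NFD), which may differ in detail from what the module actually computes.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper (an assumption, not the module's code): build an
# accented => unaccented map for one inclusive range of code points.
sub sketch_char_map {
    my ($first, $last) = @_;
    my %map;
    for my $cp ($first .. $last) {
        my $char   = chr $cp;
        my $decomp = NFD($char);            # canonical decomposition
        next if length($decomp) < 2;        # a single codepoint: not "accented"
        $map{$char} = substr($decomp, 0, 1); # first codepoint = base character
    }
    return \%map;
}

# Accented letters of the Latin-1 Supplement block
my $map = sketch_char_map(0x00C0, 0x00FF);
printf "U+%04X => %s\n", ord($_), $map->{$_} for sort keys %$map;
```

Characters such as U+00DF (sharp s) or U+00D7 (multiplication sign) have no multi-codepoint canonical decomposition, so under this rule they are left out of the map.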

METHODS

new

  my $unaccenter = Text::Transliterator::Unaccent->new(@range_description);
  # or
  my $unaccenter = Text::Transliterator::Unaccent->new(); # script => 'Latin'

Compiles a new 'unaccenter' function. The @range_description argument specifies which ranges of characters will be handled, and consists of pairs of the following shapes:

script => $unicode_script

$unicode_script is the name of a Unicode script, such as 'Latin', 'Greek' or 'Cyrillic'. For a complete list of Unicode scripts, see

  perl -MUnicode::UCD=charscripts -e "print join ', ', keys %{charscripts()}"

block => $unicode_block

$unicode_block is the name of a Unicode block. For a complete list of Unicode blocks, see

  perl -MUnicode::UCD=charblocks -e "print join ', ', keys %{charblocks()}"

range => \@codepoint_ranges

@codepoint_ranges is a list of arrayrefs that contain start-of-range, end-of-range code point pairs.

wide => $bool

Decides if wide characters (i.e. characters with code points above 255) are kept or not within the map. The default is true.

upper => $bool

Decides if uppercase characters are kept or not within the map. The default is true.

lower => $bool

Decides if lowercase characters are kept or not within the map. The default is true.

The @range_description may contain a list of several scripts, blocks and/or ranges; all will get concatenated into a single correspondence map. If the list is empty, the default range is script => 'Latin'.
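To make the concatenation of scripts, blocks and ranges more concrete, here is a minimal sketch built on the core Unicode::UCD module. The helper sketch_ranges is hypothetical and is not claimed to be how the package resolves its arguments; it only shows how the three kinds of pairs could be flattened into one list of code point ranges. The wide, upper and lower flags are left out for brevity.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript charblock);

# Hypothetical helper (an assumption, not the module's code): flatten
# script/block/range pairs into a list of [start, end] code point pairs.
sub sketch_ranges {
    my @pairs = @_;
    my @ranges;
    while (my ($kind, $value) = splice @pairs, 0, 2) {
        if    ($kind eq 'script') { push @ranges, @{ charscript($value) || [] } }
        elsif ($kind eq 'block')  { push @ranges, @{ charblock($value)  || [] } }
        elsif ($kind eq 'range')  { push @ranges, @$value }
        else                      { die "unknown kind of range: $kind" }
    }
    # Unicode::UCD ranges carry a third element (the name); keep start/end only
    return map { [ @{$_}[0, 1] ] } @ranges;
}

my @ranges = sketch_ranges(
    script => 'Greek',
    block  => 'Latin Extended-A',
    range  => [ [0x00C0, 0x00FF] ],
);
printf "U+%04X..U+%04X\n", @$_ for @ranges;
```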

The return value of the new method is actually a reference to a function, not an object. That function is called as

  $unaccenter->(@strings);

and modifies every member of @strings in place, like the tr/.../.../ operator. The return value is the number of transliterated characters in the last member of @strings.
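The calling convention can be illustrated with a small stand-in that does not require the module. The map below is hand-written and the closure is an assumption about the general technique (a tr/.../.../ operator compiled once through a string eval); it is not the package's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the function returned by new(): a tiny,
# hand-written map instead of one derived from the Unicode tables.
my %map  = ("\x{E9}" => 'e', "\x{E8}" => 'e', "\x{E0}" => 'a', "\x{FC}" => 'u');
my @keys = sort keys %map;
my $from = join '', map { sprintf('\\x{%04X}', ord $_) } @keys;
my $to   = join '', @map{@keys};

# tr/// does not interpolate variables, so the operator is compiled with
# a string eval. Inside the loop, $_ aliases each caller argument, which
# is why the strings are modified in place.
my $unaccenter = eval "sub { my \$n; \$n = tr/$from/$to/ for \@_; \$n }"
    or die $@;

my @strings = ("\x{E9}t\x{E9}", "d\x{E9}j\x{E0} vu");
my $count   = $unaccenter->(@strings);
print "$strings[0] / $strings[1] / $count\n";   # prints "ete / deja vu / 2"
```

Both strings are changed, but the count only reflects the last one, which matches the return value described above.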

char_map

  my $map = Text::Transliterator::Unaccent->char_map(@range_description);

Utility class method that returns a hashref of the accented characters in @range_description, mapped to their corresponding unaccented characters, according to the algorithm described in the introduction. The @range_description format is exactly the same as for the new() method.

char_map_descr

  my $descr = Text::Transliterator::Unaccent->char_map_descr(@range_descr);

Utility class method that returns a textual description of the map generated by @range_descr.

AUTHOR

Laurent Dami, <dami@cpan.org>

BUGS

Please report any bugs or feature requests to bug-text-transliterator at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Transliterator. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Text::Transliterator::Unaccent

You can also look for information at:

RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Text-Transliterator

AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/Text-Transliterator

CPAN Ratings

http://cpanratings.perl.org/d/Text-Transliterator

Search CPAN

http://search.cpan.org/dist/Text-Transliterator/

LICENSE AND COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.
