NAME

Encode::Escape::Unicode - Perl extension for encoding Unicode escape sequences

SYNOPSIS

  use Encode::Escape::Unicode;

  $escaped = "What is \\x{D384}? It's Perl!";
  $string = decode 'unicode-escape', $escaped;

  # Now, $string is equivalent to "What is \x{D384}? It's Perl!"

  Encode::Escape::Unicode->demode('python');

  $python_unicode_escape = "And \\u041f\\u0435\\u0440\\u043b? It's Perl, too.";
  $string = decode 'unicode-escape', $python_unicode_escape;

  # Now, $string eq "And \x{041F}\x{0435}\x{0440}\x{043B}? It's Perl, too."

Suppose you have a text file 'unicode-escape.txt' that contains these lines:

  What is \x{D384}? It's Perl!\n
  And \x{041F}\x{0435}\x{0440}\x{043B}? It's Perl, too.\n

To treat each line as if it were an ordinary double-quoted string in source code, try this:

  use Encode::Escape::Unicode;

  open(my $fh, '<', 'unicode-escape.txt')
    or die "Cannot open unicode-escape.txt: $!";

  while (<$fh>) {
    chomp;
    print encode 'utf8', decode 'unicode-escape', $_;
  }

DESCRIPTION

The Encode::Escape::Unicode module implements an encoding for Unicode escape sequences.

Simply put, it decodes escape sequences into Perl's internal string representation (characters \x{0000} through \x{ffff}) and encodes Perl strings back into escape sequences.
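
The two directions can be illustrated without the module. The following is a minimal sketch in core Perl, not the module's implementation: to_perl_escape and from_perl_escape are hypothetical helper names, and the sketch assumes that only characters outside printable ASCII need escaping. It does not escape literal backslashes, so it is not a safe round trip for text that already contains \x{...}.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Encode::Escape::Unicode:
# replace every character outside printable ASCII with a \x{XXXX} escape.
sub to_perl_escape {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7e])/sprintf('\\x{%04X}', ord $1)/ge;
    return $text;
}

# Hypothetical inverse: turn \x{XXXX} escapes back into characters.
sub from_perl_escape {
    my ($text) = @_;
    $text =~ s/\\x\{([0-9A-Fa-f]{1,4})\}/chr hex $1/ge;
    return $text;
}

my $string  = "What is \x{D384}? It's Perl!";
my $escaped = to_perl_escape($string);

print "$escaped\n";    # What is \x{D384}? It's Perl!
```

Here $escaped holds only ASCII characters, so it can be printed or stored without choosing an output encoding, and from_perl_escape($escaped) gives back the original string.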

MODES AND SUPPORTED ESCAPE SEQUENCES

default or perl mode

 Escape Sequences     Description
 ---------------      --------------------------
 \a                   Alarm (beep)
 \b                   Backspace
 \e                   Escape
 \f                   Formfeed
 \n                   Newline
 \r                   Carriage return
 \t                   Tab
 \000     - \377      octal character value. \0, \00, and \000 are equivalent.
 \x00     - \xff      hexadecimal character value. \x0 and \x00 are equivalent.
 \x{0000} - \x{ffff}  hexadecimal Unicode code point. \x{0}, \x{00}, \x{000},
                      and \x{0000} are equivalent.

 \\                   Backslash
 \$                   Dollar sign
 \@                   At sign
 \"                   Double quote
 \                    Escape the next character if known; otherwise print it as-is

This is the default mode. You don't need to select it explicitly unless you have previously selected another mode.
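
To make the table concrete, here is a rough sketch of a decoder for these sequences in core Perl. It is an illustration only and is not how the module is implemented: decode_perl_mode, _unescape, and %SIMPLE are hypothetical names, and the handling of unknown escapes (drop the backslash, keep the character) is an assumption based on the last table row.

```perl
use strict;
use warnings;

# Single-character escapes from the table.
my %SIMPLE = (
    a => "\a", b => "\b", e => "\e", f => "\f",
    n => "\n", r => "\r", t => "\t",
);

# Map one matched escape to the character it stands for.
# Exactly one of the four arguments is defined per match.
sub _unescape {
    my ($braced_hex, $short_hex, $octal, $other) = @_;
    return chr hex $braced_hex if defined $braced_hex;   # \x{0} .. \x{ffff}
    return chr hex $short_hex  if defined $short_hex;    # \x0 .. \xff
    return chr oct $octal      if defined $octal;        # \0 .. \377
    # Known letter escapes; anything else (\\, \$, \@, \") is kept literally.
    return exists $SIMPLE{$other} ? $SIMPLE{$other} : $other;
}

# Hypothetical helper: decode perl-mode escape sequences in $text.
sub decode_perl_mode {
    my ($text) = @_;
    $text =~ s/\\(?:x\{([0-9A-Fa-f]{1,4})\}|x([0-9A-Fa-f]{1,2})|([0-3]?[0-7]{1,2})|(.))/_unescape($1, $2, $3, $4)/ges;
    return $text;
}

print decode_perl_mode('[\x41\101\x{42}]'), "\n";    # [AAB]
```

The alternation tries the braced form before the short \xHH form, so \x{41} is never misread as \x followed by literal text.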

python or java mode

Python, Java, and C# use the \uxxxx escape sequence for Unicode characters.

 Escape Sequences     Description
 ---------------      --------------------------
 \a                   Alarm (beep)
 \b                   Backspace
 \e                   Escape
 \f                   Formfeed
 \n                   Newline
 \r                   Carriage return
 \t                   Tab
 \000   - \377        octal character value. \0, \00, and \000 are equivalent.
 \x00   - \xff        hexadecimal character value. \x0 and \x00 are equivalent.
 \u0000 - \uffff      hexadecimal Unicode code point.

 \\                   Backslash
 \$                   Dollar Sign
 \@                   At sign
 \"                   Double quote
 \                    Escape the next character if known; otherwise print it as-is
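
As an illustration of the \uxxxx row, the following core-Perl sketch turns each \uxxxx escape into the character it names. decode_u_escapes is a hypothetical helper, not the module's code; it assumes exactly four hex digits per escape and does not treat an escaped backslash (\\u0041) specially.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's implementation:
# turn each \uXXXX escape (exactly four hex digits) into its character.
sub decode_u_escapes {
    my ($text) = @_;
    $text =~ s/\\u([0-9A-Fa-f]{4})/chr hex $1/ge;
    return $text;
}

my $decoded = decode_u_escapes('And \u041f\u0435\u0440\u043b? It\'s Perl, too.');

# The result contains wide characters, so pick an output encoding first.
binmode STDOUT, ':encoding(UTF-8)';
print "$decoded\n";
```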

If you have data which contains \uxxxx escape sequences, this will translate them to utf8-encoded characters:

 use Encode::Escape;

 Encode::Escape::demode 'unicode-escape', 'python';

 while(<>) {
        chomp;
        print encode 'utf8', decode 'unicode-escape', $_;
 }

And this will translate \uxxxx to \x{xxxx}.

 use Encode::Escape;

 Encode::Escape::enmode 'unicode-escape', 'perl';
 Encode::Escape::demode 'unicode-escape', 'python';

 while(<>) {
        chomp;
        print encode 'unicode-escape', decode 'unicode-escape', $_;
 }

SEE ALSO

See Encode::Escape.

AUTHOR

you, <you at cpan dot org>

COPYRIGHT AND LICENSE

Copyright (C) 2007 by you

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.