Tatsuhiko Miyagawa > Encode-DoubleEncodedUTF8 > Encode::DoubleEncodedUTF8

Download:
Encode-DoubleEncodedUTF8-0.04.tar.gz

Dependencies

Annotate this POD

CPAN RT

Open  0
Report a bug
Module Version: 0.04   Source  

NAME ^

Encode::DoubleEncodedUTF8 - Fix double encoded UTF-8 bytes to the correct one

SYNOPSIS ^

  use Encode;
  use Encode::DoubleEncodedUTF8;

  my $dodgy_utf8 = "Some byte strings from the web/DB with double-encoded UTF-8 bytes";
  my $fixed = decode("utf-8-de", $dodgy_utf8); # Fix it

DESCRIPTION ^

Encode::DoubleEncodedUTF8 adds a new encoding utf-8-de and fixes double encoded utf-8 bytes found in the original bytes to the correct Unicode entity.

The double encoded utf-8 frequently happens when strings with UTF-8 flag and without are concatenated, for instance:

  my $string = "L\x{e9}on";   # latin-1
  utf8::upgrade($string);
  my $bytes  = "L\xc3\xa9on"; # utf-8

  my $dodgy_utf8 = encode_utf8($string . " " . $bytes); # $bytes is now double encoded

  my $fixed = decode("utf-8-de", $dodgy_utf8); # "L\x{e9}on L\x{e9}on";

See encoding::warnings for more details.

AUTHOR ^

Tatsuhiko Miyagawa <miyagawa@bulknews.net>

LICENSE ^

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO ^

encoding::warnings, Test::utf8