Module Version: 0.2

NAME

IsUTF8 - detects if UTF8 characters are present

SYNOPSIS

    use IsUTF8;
    $result = IsUTF8::isUTF8;
    $result = IsUTF8::isUTF8($line);

    use IsUTF8 qw(isUTF8);
    $result = isUTF8;
    $result = isUTF8($line);

    use IsUTF8 qw(isUTF8 debug);
    $result = isUTF8;
    $result = isUTF8($line);

    # Test for undef first: undef and 0 are both false, and undef == 0 is true.
    if (not defined $result) {
        print "Contains some non-UTF8 characters with 8th bit set!";
    }
    elsif ($result == 0) {
        print "Plain ASCII (0..127)";
    }
    else {
        print "Contains UTF8";
    }

DESCRIPTION

This tests the given line and returns true if it contains at least one UTF8 character sequence. (The test actually returns after the first sequence found.) undef is returned if there is some other character with the 8th bit set. 0 is returned if there are only characters from 0x00 to 0x7f.
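Because undef and 0 are both false in Perl, the three outcomes are easy to confuse. The following is a minimal sketch of telling them apart. The helper describe_result is hypothetical and not part of IsUTF8; it only assumes a value shaped like the return value described above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of IsUTF8: maps the three possible
# return values described above to a human-readable label.
sub describe_result {
    my ($result) = @_;
    return 'non-UTF8 8-bit characters' unless defined $result;  # undef
    return 'plain ASCII'               unless $result;          # 0
    return 'contains UTF8';                                     # any true value
}

# Print one label for each of the three cases.
for my $value (undef, 0, 1) {
    print describe_result($value), "\n";
}
```

Checking defined first is the important part: a plain boolean test cannot distinguish the undef case from the 0 case.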

BACKGROUND

UTF8 encoding looks like this (bit pattern, bit range, hex range, and the number of bytes that follow a lead byte):

    1111.0x:   1111.0000-1111.0111 0xF0 - 0xF7, followed by 3 bytes
    1110.xx:   1110.0000-1110.1111 0xE0 - 0xEF, followed by 2 bytes
    110x.xx:   1100.0000-1101.1111 0xC0 - 0xDF, followed by 1 byte
    10xx.xx:   1000.0000-1011.1111 0x80 - 0xBF  (following byte as above)
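The table above can be turned into a small byte-level check. What follows is only an illustrative sketch, not the module's actual code, which may behave differently. The names classify_bytes and $SEQ are hypothetical, the input is assumed to be a byte string (not decoded characters), and the sketch assumes that the first byte with the 8th bit set decides the result.

```perl
use strict;
use warnings;

# One multi-byte sequence as listed in the table: a lead byte followed
# by the required number of continuation bytes (0x80 - 0xBF).
my $SEQ = qr/ [\xC0-\xDF][\x80-\xBF]
            | [\xE0-\xEF][\x80-\xBF]{2}
            | [\xF0-\xF7][\x80-\xBF]{3} /x;

# Hypothetical function for illustration only.
# Returns 0 for pure ASCII, 1 if the first 8-bit byte starts a complete
# sequence from the table, and undef otherwise.
sub classify_bytes {
    my ($bytes) = @_;
    return 0 unless $bytes =~ /[\x80-\xFF]/;   # only 0x00 - 0x7F present
    return $bytes =~ /^[\x00-\x7F]*$SEQ/ ? 1 : undef;
}

for my $sample ("abc", "\xC3\xA4", "\xE4") {
    my $r     = classify_bytes($sample);
    my $label = defined $r ? $r : 'undef';
    print "$label\n";
}
```

Like the table, this sketch accepts every lead byte from 0xC0 to 0xF7 and does not apply the stricter checks of current UTF-8 definitions, such as rejecting overlong encodings.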

SEE ALSO

Encode::Guess and Encode::Detect

BUGS

First release. Please do not rely on a stable API yet. If you're interested in a stable API, please tell me.

Probably. Not tested a lot!

AUTHOR

Heiko Schlittermann <hs@schlittermann.de>
