Module Version: 2.0.2

NAME

CharsetDetector - A charset detector optimized for East Asian charsets and website content

SYNOPSIS

        use CharsetDetector;
        use CharsetDetector qw(detect detect1);

        # simple usage
        $charset = CharsetDetector::detect($octets);

        # with a length limit
        $charset = CharsetDetector::detect($octets, $max_len);

        # do not consider the HTML head charset as a factor when detecting
        $charset = CharsetDetector::detect1($octets);
        $charset = CharsetDetector::detect1($octets, $max_len);

Basic Functions

detect - detect charset

        $charset = CharsetDetector::detect($octets);
        $charset = CharsetDetector::detect($octets,$max_len);

detect1 - detect charset from the binary content only

Detects the charset without considering the HTML head charset as a factor. By default, the detector (detect) considers the HTML header (e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />) as a factor when detecting the charset. If you do not want the detector to consider the HTML header, use detect1 instead of detect.

        $charset = CharsetDetector::detect1($octets);
        $charset = CharsetDetector::detect1($octets,$max_len);
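To see whether the HTML header influences the result for a given page, both functions can be run on the same octets and compared. The following is a minimal sketch, not part of the module: the helper name compare_detectors and the sample page are hypothetical, and the detectors are passed in as code references so the helper itself does not depend on how the module is loaded.

```perl
use strict;
use warnings;

# Hypothetical sample page: the <meta> tag declares UTF-8 and the body
# holds UTF-8 encoded bytes.
my $sample = '<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
    . "</head><body>\xE4\xBD\xA0\xE5\xA5\xBD</body></html>";

# Run both detectors on the same octets and report whether they agree.
# compare_detectors is an illustrative helper, not a CharsetDetector function.
sub compare_detectors {
    my ($octets, $detect, $detect1) = @_;
    my $with_header = $detect->($octets);     # may use the HTML header
    my $binary_only = $detect1->($octets);    # ignores the HTML header
    return {
        with_header => $with_header,
        binary_only => $binary_only,
        agree       => ($with_header eq $binary_only) ? 1 : 0,
    };
}

# Only call the module if it is installed.
if (eval { require CharsetDetector; 1 }) {
    my $r = compare_detectors(
        $sample,
        \&CharsetDetector::detect,
        \&CharsetDetector::detect1,
    );
    printf "detect: %s, detect1: %s, agree: %s\n",
        $r->{with_header}, $r->{binary_only}, $r->{agree} ? 'yes' : 'no';
}
```

When the two results differ, the declared charset in the header and the byte content point in different directions, and the caller has to decide which one to trust.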

Return Value

If $octets is null (undefined), returns ''.

If $octets is '', returns 'iso-8859-1'.

Otherwise, returns the charset name.
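Because of the return values described above, a caller may want to tell a real detection apart from the two special cases. The following is a minimal sketch under stated assumptions: charset_or_default is a hypothetical helper, "null" is taken to mean an undefined value, and treating the 'iso-8859-1' result for empty input as a placeholder rather than a detection is a design choice of this sketch, not documented behavior.

```perl
use strict;
use warnings;

# Interpret a detection result according to the documented return values.
# $charset is whatever detect or detect1 returned for $octets.
# charset_or_default is an illustrative helper, not part of CharsetDetector.
sub charset_or_default {
    my ($octets, $charset, $default) = @_;

    # Documented: null input yields ''. Treat that as "nothing detected".
    return $default if !defined $charset || $charset eq '';

    # Documented: empty input yields 'iso-8859-1'. There were no bytes to
    # inspect, so this sketch prefers the caller's default (an assumption).
    return $default if defined $octets && $octets eq '';

    return $charset;
}

# Usage with the module (requires CharsetDetector to be installed):
#   my $charset = charset_or_default(
#       $octets, CharsetDetector::detect($octets), 'utf8');
```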

Supported Charset List

        return value: alias
        
        ascii       : ascii
        iso-8859-1  : iso-8859-1
        utf8        : utf8 utf-8-strict
        utf16       : utf16
        cp936       : euc-cn(gb2312) cp936(gbk) gb18030
        big5-eten   : big5-eten
        euc-jp      : euc-jp
        shiftjis    : shiftjis
        iso-2022-jp : iso-2022-jp
        euc-kr      : euc-kr
        iso-2022-kr : iso-2022-kr
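Several of the return values in the table, such as cp936, euc-jp and shiftjis, are also encoding names known to Perl's core Encode module, so a detected name can often be used directly to decode the octets into a Perl string. The sketch below assumes this and nothing more: decode_detected is a hypothetical helper, and whether every name in the table resolves through Encode is not confirmed here, so unknown names are handled by returning undef.

```perl
use strict;
use warnings;
use Encode ();

# Decode octets using a charset name such as one returned by detect.
# Returns the decoded string, or undef if the name is empty or unknown
# to Encode. decode_detected is an illustrative helper, not part of
# CharsetDetector.
sub decode_detected {
    my ($octets, $charset) = @_;
    return unless defined $charset && $charset ne '';

    # find_encoding returns undef for names Encode does not recognize.
    my $enc = Encode::find_encoding($charset) or return;
    return $enc->decode($octets);
}

# Usage with the module (requires CharsetDetector to be installed):
#   my $charset = CharsetDetector::detect($octets);
#   my $text    = decode_detected($octets, $charset);
```

Looking the name up with find_encoding first avoids the exception that Encode::decode raises for an unknown encoding name.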

COPYRIGHT

The CharsetDetector module is Copyright (c) 2003-2006 QIAN YU. All rights reserved.

You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.
