CharsetDetector - a charset detector optimized for East Asian charsets and website content
    use CharsetDetector;
    use CharsetDetector qw(detect detect1);

    # simple use
    $charset = CharsetDetector::detect($octets);

    # with a length limit
    $charset = CharsetDetector::detect($octets, $max_len);

    # don't consider the HTML head charset as a factor when detecting
    $charset = CharsetDetector::detect1($octets);
    $charset = CharsetDetector::detect1($octets, $max_len);
    $charset = CharsetDetector::detect($octets);
    $charset = CharsetDetector::detect($octets, $max_len);

detect returns the detected charset of $octets. By default, the detector treats the charset declared in the HTML header (e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />) as one factor when detecting the charset. If you don't want the HTML header to be taken into account, use detect1 instead of detect. detect1 works like detect but ignores the HTML head charset:

    $charset = CharsetDetector::detect1($octets);
    $charset = CharsetDetector::detect1($octets, $max_len);
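To make the difference between detect and detect1 concrete, here is a minimal sketch of pulling the declared charset out of an HTML <meta> tag. The helper declared_charset is hypothetical and relies only on a regular expression; it is not how CharsetDetector reads the header internally, and real pages may declare their charset in ways this pattern misses.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CharsetDetector): return the charset
# declared in an HTML <meta> tag, lowercased, or undef if none is found.
# It only illustrates the kind of header declaration that detect() may
# weigh and detect1() ignores.
sub declared_charset {
    my ($html) = @_;
    return undef unless defined $html;

    # Matches both <meta charset="..."> and the older
    # <meta http-equiv="Content-Type" content="...; charset=..."> form.
    if ($html =~ /<meta\b[^>]*?\bcharset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)/i) {
        return lc $1;
    }
    return undef;
}

my $page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />';
print declared_charset($page), "\n";    # prints "utf-8"
```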
If $octets is undef (null), the functions return ''. If $octets is the empty string '', they return 'iso-8859-1'. Otherwise they return the detected charset name.
Possible return values, and the charsets (aliases) each one covers:

    Return value   Aliases
    ------------   ---------------------------------------
    ascii          ascii
    iso-8859-1     iso-8859-1
    utf8           utf8, utf-8-strict
    utf16          utf16
    cp936          euc-cn (gb2312), cp936 (gbk), gb18030
    big5-eten      big5-eten
    euc-jp         euc-jp
    shiftjis       shiftjis
    iso-2022-jp    iso-2022-jp
    euc-kr         euc-kr
    iso-2022-kr    iso-2022-kr
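A common next step is to decode the octets into a Perl character string. The sketch below assumes the returned names are accepted by the core Encode module (names such as cp936, big5-eten, shiftjis and utf8 are Encode encoding names, but check find_encoding before relying on any particular one). The helper octets_to_text is hypothetical and not part of CharsetDetector; it maps the '' result for undef input to undef.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper (not part of CharsetDetector): decode raw octets
# using a charset name such as one returned by CharsetDetector::detect.
# Returns undef when there is no input, no charset, or the name is
# unknown to Encode.
sub octets_to_text {
    my ($charset, $octets) = @_;
    return undef if !defined $octets || !defined $charset || $charset eq '';

    # Guard against names Encode cannot resolve instead of letting
    # decode() die.
    return undef unless find_encoding($charset);

    return decode($charset, $octets);
}

# U+4E2D encoded as GBK (cp936) octets.
my $text = octets_to_text('cp936', "\xd6\xd0");
printf "U+%04X\n", ord $text;    # prints "U+4E2D"
```

In a real program the charset would come from the detector, e.g. octets_to_text(CharsetDetector::detect($octets), $octets).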
The CharsetDetector module is Copyright (c) 2003-2006 QIAN YU. All rights reserved.
You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.
To install CharsetDetector, copy and paste the appropriate command into your terminal.
cpanm

    cpanm CharsetDetector

CPAN shell

    perl -MCPAN -e shell
    install CharsetDetector
For more information on module installation, please visit the detailed CPAN module installation guide.