Encode::Detect::CJK

From 小鱼工作室

OFFICIAL SITE

http://search.cpan.org/dist/Encode-Detect-CJK/

NAME

Encode::Detect::CJK - A charset detector optimized for East Asian charsets and website content

SYNOPSIS

use Encode::Detect::CJK; # just load the module

use Encode::Detect::CJK qw(detect); # load the module and export the detect function

# simple usage
my $charset = CharsetDetector::detect($octets);

# usage with advanced options
my $charset = CharsetDetector::detect($octets, $max_len, $is_consider_html_head_charset);
# returns the charset of the binary string $octets
# $max_len: detection is slow when $octets is large, so you sometimes need to cap the
#       length used for detection; undef (null) selects the DEFAULT, which is unlimited
# $is_consider_html_head_charset: by DEFAULT, the detector treats the
#       HTML header (e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />) as a factor when detecting the charset;
#       if you don't want the detector to consider the HTML header, set $is_consider_html_head_charset to "" or 0

Basic Function

detect - detect the charset of a string

$charset=CharsetDetector::detect($octets,$max_len,$is_consider_html_head_charset);
$charset=CharsetDetector::detect($octets,$max_len); # same as CharsetDetector::detect($octets,$max_len,1);
$charset=CharsetDetector::detect($octets); # same as CharsetDetector::detect($octets,undef);
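
A typical next step after detection is decoding the octets into a Perl character string. The following is a minimal sketch of that step, not part of the module: decode_detected is a hypothetical helper, the detector is passed in as a code reference so the sketch does not depend on how detect is imported, and $fake_detector is a stand-in that does not reproduce the module's algorithm. The fallback to 'iso-8859-1' is an assumption of this sketch.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not provided by the module): run the supplied
# detector on the octets, then decode them with the detected charset.
sub decode_detected {
    my ($octets, $detector, $max_len) = @_;
    my $charset = $detector->($octets, $max_len);
    # Assumption of this sketch: fall back to iso-8859-1 when the
    # detector returns nothing usable.
    $charset = 'iso-8859-1' unless defined $charset && length $charset;
    return ($charset, decode($charset, $octets));
}

# Stand-in detector for demonstration only; it is NOT the module's algorithm.
my $fake_detector = sub {
    my ($octets) = @_;
    return $octets =~ /[^\x00-\x7f]/ ? 'utf8' : 'ascii';
};

# "caf\xc3\xa9" is the UTF-8 byte sequence for a four-character word.
my ($charset, $text) = decode_detected("caf\xc3\xa9", $fake_detector);
print "$charset: ", length($text), " characters\n";
```

With the module installed, the detector argument would presumably be \&CharsetDetector::detect, following the calls shown above.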

Param $octets - input binary string

The binary string (raw octets) whose charset is to be detected.

Param $max_len - max length for charset detector

Detection is slow when $octets is large, so you sometimes need to specify $max_len to cap the length used for detection. Passing undef (null) selects the DEFAULT, which is unlimited.
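
For very large files, a caller can also avoid loading the whole input by reading only a bounded prefix before calling the detector. The sketch below shows that idea with core Perl only; read_sample is a hypothetical helper, not part of the module, and the 16-byte limit is an arbitrary value chosen for the demonstration. A sample cut at a fixed byte offset may end in the middle of a multi-byte character.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): read at most
# $max_len bytes from an open filehandle, in binary mode.
sub read_sample {
    my ($fh, $max_len) = @_;
    binmode $fh;
    my $octets = '';
    my $got = read($fh, $octets, $max_len);
    die "read failed: $!" unless defined $got;
    return $octets;
}

# Demonstration with an in-memory file of 100 bytes.
my $data = "0123456789" x 10;
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $sample = read_sample($fh, 16);
close $fh;
print length($sample), " bytes sampled\n";
```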

Param $is_consider_html_head_charset

By DEFAULT, the detector treats the HTML header (e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />) as a factor when detecting the charset. If you don't want the detector to consider the HTML header, set $is_consider_html_head_charset to "" or 0.
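
To make the idea of an HTML header hint concrete, here is a rough sketch of how a charset declaration could be pulled out of a page. It is an illustration only: html_declared_charset is a hypothetical helper, the regular expression and the 2048-byte scan window are assumptions, and nothing here is taken from the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): look for a charset
# declaration inside a <meta> tag near the start of an HTML document.
sub html_declared_charset {
    my ($octets) = @_;
    my $head = substr($octets, 0, 2048);    # only scan the start of the document
    if ($head =~ /<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9_.:-]+)/i) {
        return lc $1;                       # normalize to lower case
    }
    return;                                 # no declaration found
}

my $html = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />';
my $declared = html_declared_charset($html);
print defined $declared ? $declared : '(none)', "\n";
```

A declared charset is only a hint, since pages can be mislabeled; this is presumably why the detector treats it as one factor rather than the final answer.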


Return Value $charset

if $octets is null return if $octets is return 'iso-8859-1'


Supported Charset List

return value: alias
       
ascii       : ascii
iso-8859-1  : iso-8859-1
utf8        : utf8 utf-8-strict
utf16       : utf16
cp936       : euc-cn(gb2312) cp936(gbk) gb18030
big5-eten   : big5-eten
euc-jp      : euc-jp
shiftjis    : shiftjis
iso-2022-jp : iso-2022-jp
euc-kr      : euc-kr
iso-2022-kr : iso-2022-kr
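
When comparing a charset name obtained elsewhere (for example from an HTTP header) with the detector's return value, it can help to map aliases onto the return values listed above. The lookup below is a sketch transcribed from that table; return_value_for is a hypothetical helper, and reading the parenthesized names (gb2312, gbk) as additional aliases is an assumption.

```perl
use strict;
use warnings;

# Alias => return value, transcribed from the table above.
my %RETURN_VALUE_FOR = (
    'ascii'        => 'ascii',
    'iso-8859-1'   => 'iso-8859-1',
    'utf8'         => 'utf8',
    'utf-8-strict' => 'utf8',
    'utf16'        => 'utf16',
    'euc-cn'       => 'cp936',
    'gb2312'       => 'cp936',
    'cp936'        => 'cp936',
    'gbk'          => 'cp936',
    'gb18030'      => 'cp936',
    'big5-eten'    => 'big5-eten',
    'euc-jp'       => 'euc-jp',
    'shiftjis'     => 'shiftjis',
    'iso-2022-jp'  => 'iso-2022-jp',
    'euc-kr'       => 'euc-kr',
    'iso-2022-kr'  => 'iso-2022-kr',
);

# Hypothetical helper: map an alias (case-insensitive) to the listed
# return value, or undef when the name is not in the table.
sub return_value_for {
    my ($name) = @_;
    return unless defined $name;
    return $RETURN_VALUE_FOR{ lc $name };
}

my $mapped = return_value_for('GBK');
print defined $mapped ? $mapped : '(unknown)', "\n";
```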

COPYRIGHT

The CharsetDetector module is Copyright (c) 2003-2008 QIAN YU. All rights reserved.

You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.
