INABA Hitoshi > jacode-2.13.4.10 > jacode.pl

NAME

jacode.pl - Perl library for Japanese character code conversion

SYNOPSIS

    require 'jacode.pl';

    # note: file name is 'jacode.pl', but package name is 'jcode'

    # Perl4 interface:

    &jcode'getcode(*line)
    &jcode'convert(*line, $ocode [, $icode [, $option]])
    &jcode'xxx2yyy(*line [, $option])
    &jcode'to($ocode, $line [, $icode [, $option]])
    &jcode'jis($line [, $icode [, $option]])
    &jcode'euc($line [, $icode [, $option]])
    &jcode'sjis($line [, $icode [, $option]])
    &jcode'utf8($line [, $icode [, $option]])
    &jcode'jis_inout($in, $out)
    &jcode'get_inout($string)
    &jcode'cache()
    &jcode'nocache()
    &jcode'flushcache()
    &jcode'flush()
    &jcode'h2z_xxx(*line)
    &jcode'z2h_xxx(*line)
    &jcode'tr(*line, $from, $to [, $option])
    &jcode'trans($line, $from, $to [, $option])
    &jcode'init()

    $jcode'convf{'xxx', 'yyy'}
    $jcode'z2hf{'xxx'}
    $jcode'h2zf{'xxx'}

    # Perl5 interface:

    jcode::getcode(\$line)
    jcode::convert(\$line, $ocode [, $icode [, $option]])
    jcode::xxx2yyy(\$line [, $option])
    jcode::to($ocode, $line [, $icode [, $option]])
    jcode::jis($line [, $icode [, $option]])
    jcode::euc($line [, $icode [, $option]])
    jcode::sjis($line [, $icode [, $option]])
    jcode::utf8($line [, $icode [, $option]])
    jcode::jis_inout($in, $out)
    jcode::get_inout($string)
    jcode::cache()
    jcode::nocache()
    jcode::flushcache()
    jcode::flush()
    jcode::h2z_xxx(\$line)
    jcode::z2h_xxx(\$line)
    jcode::tr(\$line, $from, $to [, $option])
    jcode::trans($line, $from, $to [, $option])
    jcode::init()

    &{$jcode::convf{'xxx', 'yyy'}}(\$line)
    &{$jcode::z2hf{'xxx'}}(\$line)
    &{$jcode::h2zf{'xxx'}}(\$line)

ABSTRACT

This software is upward compatible with jcode.pl. The 'ja' in its name is the ISO 639-1 code for Japanese and is unrelated to the 'JA Group' organization.

Conversion from 'sjis' to 'utf8' uses the following table.

http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT

Conversion from 'utf8' to 'sjis' uses CP932.TXT and the following table.

PRB: Conversion Problem Between Shift-JIS and Unicode

http://support.microsoft.com/kb/170559/en-us

What's this software good for ...

DEPENDENCIES

This software requires perl 4.036 or later.

PERL4 INTERFACE

&jcode'getcode(*line)
  Return 'jis', 'sjis', 'euc', 'utf8' or undef according
  to the Japanese character code in $line.  Return 'binary'
  if the data contains non-character codes.

  When evaluated in array context, it returns a list of
  two items.  The first value is the number of characters
  that matched the expected code, and the second value is
  the code name.  This is useful if and only if the number
  is not 0 and the code is undef; that case means it
  couldn't tell 'euc' from 'sjis' because the evaluation
  scores were exactly the same.  This interface is rather
  tricky, though.

  Code detection between euc and sjis is very difficult,
  sometimes impossible, and may even give a wrong result
  when the data includes JIS X0201 KANA characters.
&jcode'convert(*line, $ocode [, $icode [, $option]])
  Convert the contents of $line to the Japanese code
  given in the second argument $ocode.  $ocode can be any
  of "jis", "sjis", "euc" or "utf8"; use "noconv" when you
  don't want any code conversion.  The input code is
  recognized automatically from the line itself when
  $icode is not supplied.  It is better to specify $icode,
  since &jcode'getcode's guess is not always right.  The
  xxx2yyy routines are more efficient when both codes are
  known.

  It returns the code of the input string in scalar
  context, and a list of the pointer to the convert
  subroutine and the input code in array context.

  Japanese character codes JIS X0201, X0208, X0212 and
  ASCII are supported.  JIS X0212 characters cannot be
  represented in sjis or utf8, and they are replaced by
  the "geta" character when converted to sjis.
  JIS X0213 characters cannot be represented at all.

  On Perl 5.8.1 or later, &jcode'convert acts as a wrapper
  around Encode::from_to.  When $ocode or $icode is none of
  "jis", "sjis", "euc" or "utf8", and the Encode module is
  available,

  Encode::from_to( $line, $icode, $ocode )

  is executed instead of

  &jcode'convert(*line, $ocode, $icode, $option).

  In this case, no meaningful pointer to a convert
  subroutine is returned in array context.

  See the next entry for the $option parameter.
&jcode'xxx2yyy(*line [, $option])
  Convert the Japanese code from xxx to yyy.  xxx and yyy
  are any combination of "jis", "euc", "sjis" or "utf8".
  These routines return the *approximate* number of
  converted bytes, so a return value of 0 means the line
  was not converted at all.

  The optional parameter $option selects an additional
  conversion.  The string "z" converts JIS X0201 KANA to
  JIS X0208 KANA, and "h" does the reverse.
$jcode'convf{'xxx', 'yyy'}
  The value of this associative array is pointer to the
  subroutine jcode'xxx2yyy().
&jcode'to($ocode, $line [, $icode [, $option]])
&jcode'jis($line [, $icode [, $option]])
&jcode'euc($line [, $icode [, $option]])
&jcode'sjis($line [, $icode [, $option]])
&jcode'utf8($line [, $icode [, $option]])
  These functions provide an easy-to-use
  call/return-by-value interface.  You can use them in
  an s///e operation or anywhere else a value is
  convenient.
&jcode'jis_inout($in, $out)
  Set or inquire the JIS start and end sequences.  The
  defaults are "ESC-$-B" and "ESC-(-B".  If you supply
  only one character for each, "ESC-$" or "ESC-(" is
  prepended respectively.  Actually "ESC-(-B" is not a
  sequence to end JIS code but a sequence to start the
  ASCII code set, so `in' and `out' are somewhat misleading.
&jcode'get_inout($string)
  Get JIS start and end sequences from $string.
&jcode'cache()
&jcode'nocache()
&jcode'flushcache()
&jcode'flush()
  Usually, converted characters are cached in memory so
  that the same calculation is not repeated many times.
  To disable this caching, call &jcode'nocache().  It
  can be re-enabled by &jcode'cache(), and the cache is
  flushed by calling &jcode'flushcache().  &jcode'cache()
  and &jcode'nocache() return the previous caching state.
  &jcode'flush() is an alias of &jcode'flushcache(), kept
  for compatibility with old documentation.
&jcode'h2z_xxx(*line)
  JIS X0201 KANA (so-called Hankaku-KANA) to JIS X0208 KANA
  (Zenkaku-KANA) code conversion routine.  xxx is any of
  "jis", "sjis", "euc" and "utf8".  Because it is difficult
  to recognize the code set from a 1-byte KATAKANA string,
  automatic code recognition is not supported.
&jcode'z2h_xxx(*line)
  JIS X0208 to JIS X0201 KANA code conversion routine.
  String xxx is any of "jis", "sjis", "euc" and "utf8".
$jcode'z2hf{'xxx'}
$jcode'h2zf{'xxx'}
  These are pointer to the corresponding function just
  as $jcode'convf.
&jcode'tr(*line, $from, $to [, $option])
  &jcode'tr emulates the tr operator for 2-byte codes.
  Only 'd' is interpreted as an option.

  A range operator like `A-Z' for 2-byte codes is partially
  supported.  The code must be JIS or EUC, and the first
  byte must be the same in the first and last characters.

  CAUTION: Handling of the range operator is a kind of
  trick and is not perfect.  If you need to translate the
  `-' character itself, be sure to put it at the beginning
  or the end of the $from and $to strings.
&jcode'trans($line, $from, $to [, $option])
  Same as &jcode'tr, but accepts a string and returns
  the string after translation.
&jcode'init()
  Initialize the variables used in this package.  You
  don't have to call this when loading jacode.pl with `do'
  or `require'.  Call it first if you have embedded
  jacode.pl at the end of your script.
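
To make the JIS start and end sequences described under &jcode'jis_inout more concrete, here is a minimal sketch in plain Perl 5 that does not call jacode.pl at all. It assumes the input consists only of JIS X0208 byte pairs encoded in EUC; the helper name euc_pairs_to_jis is hypothetical and is not part of the library, and the library's own conversion handles many more cases.

```perl
use strict;
use warnings;

my $KANJI_IN = "\e\$B";   # ESC $ B : switch to JIS X0208
my $ASCII_IN = "\e(B";    # ESC ( B : switch back to ASCII

# Hypothetical helper: wrap EUC byte pairs (JIS X0208 only) in JIS escapes.
sub euc_pairs_to_jis {
    my ($euc) = @_;
    # EUC stores each JIS X0208 byte with the high bit set; clear it.
    (my $seven = $euc) =~ tr/\xa1-\xfe/\x21-\x7e/;
    return $KANJI_IN . $seven . $ASCII_IN;
}

my $jis = euc_pairs_to_jis("\xa4\xa2");   # HIRAGANA LETTER A in EUC
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $jis)), "\n";
# prints: 1B 24 42 24 22 1B 28 42
```

The first three bytes are the default start sequence and the last three are the default "end" sequence, which really just selects ASCII again.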

PERL5 INTERFACE

The current jacode.pl is written in Perl 4, but it can be used from Perl 5 by passing `references'. A fully Perl 5 capable version is left for the future.

Since a lexical variable is not reachable through a typeglob, a *string style call doesn't work if the variable is declared with `my'. The same thing happens to the special variable $_ if perl is compiled with thread support. So using a reference is generally recommended to avoid this mysterious error.
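
The difference can be seen without jacode.pl. The following is a minimal sketch in plain Perl 5; mark_glob and mark_ref are hypothetical stand-ins for a routine called in the *string style and in the reference style, and are not functions of the library.

```perl
use strict;
use warnings;

# Perl 4 style: receive a typeglob and modify the package scalar behind it.
sub mark_glob {
    local (*main::line) = @_;
    $main::line .= '!';
}

# Perl 5 style: receive a reference and modify the scalar it points to.
sub mark_ref {
    my ($ref) = @_;
    $$ref .= '!';
}

our $global  = 'abc';   # package variable, reachable through *global
my  $lexical = 'abc';   # lexical variable, not in the symbol table

mark_glob(*global);     # modifies $main::global as intended
mark_glob(*lexical);    # silently modifies the unrelated $main::lexical
mark_ref(\$lexical);    # modifies the lexical itself

print "global:        $global\n";          # abc!
print "lexical:       $lexical\n";         # abc!  (changed by mark_ref only)
print "main::lexical: $main::lexical\n";   # !
```

The glob call on the `my' variable raises no error; it just operates on a different, empty variable, which is why the failure looks mysterious.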

jcode::getcode(\$line)
jcode::convert(\$line, $ocode [, $icode [, $option]])
jcode::xxx2yyy(\$line [, $option])
&{$jcode::convf{'xxx', 'yyy'}}(\$line)
jcode::to($ocode, $line [, $icode [, $option]])
jcode::jis($line [, $icode [, $option]])
jcode::euc($line [, $icode [, $option]])
jcode::sjis($line [, $icode [, $option]])
jcode::utf8($line [, $icode [, $option]])
jcode::jis_inout($in, $out)
jcode::get_inout($string)
jcode::cache()
jcode::nocache()
jcode::flushcache()
jcode::flush()
jcode::h2z_xxx(\$line)
jcode::z2h_xxx(\$line)
&{$jcode::z2hf{'xxx'}}(\$line)
&{$jcode::h2zf{'xxx'}}(\$line)
jcode::tr(\$line, $from, $to [, $option])
jcode::trans($line, $from, $to [, $option])
jcode::init()
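
As an illustration of the reference-style calls listed above, here is a minimal sketch that works under `use strict' with `my' variables. The helper name line_to_utf8 is hypothetical and not part of jacode.pl, and the sketch assumes jacode.pl can be found in @INC.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of jacode.pl: convert one line to UTF-8.
# Returns (converted line, input code).  When jcode::getcode cannot
# decide (undef) or reports 'binary', the line is returned unchanged.
sub line_to_utf8 {
    my ($line, $icode) = @_;
    require 'jacode.pl';    # assumed to be in @INC; defines package jcode

    unless (defined $icode) {
        # List context gives (matched count, code name).
        my ($matched, $guess) = jcode::getcode(\$line);
        return ($line, $guess)
            if !defined($guess) || $guess eq 'binary';
        $icode = $guess;
    }

    jcode::convert(\$line, 'utf8', $icode);   # modifies $line in place
    return ($line, $icode);
}

# Typical use as a filter, with the input code stated explicitly:
#   while (defined(my $s = <STDIN>)) {
#       my ($utf8) = line_to_utf8($s, 'sjis');
#       print $utf8;
#   }
```

Passing the input code explicitly skips the guess, which the description of convert recommends because the guess is not always right.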

SAMPLES

Convert SJIS to JIS and print each line with code name.

  #require 'jcode.pl';
  require 'jacode.pl';
  while (defined($s = <>)) {
      $code = &jcode'convert(*s, 'jis', 'sjis');
      print $code, "\t", $s;
  }

Convert all lines to JIS according to the first recognized line.

  #require 'jcode.pl';
  require 'jacode.pl';
  while (defined($s = <>)) {
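      # The line is held in $s, so print $s explicitly (a bare print would use $_).
      print($s), next unless $s =~ /[\x1b\x80-\xff]/;
      (*f, $icode) = &jcode'convert(*s, 'jis');
      print $s;
      defined(&f) || next;
      while (defined($s = <>)) { &f(*s); print $s; }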
      last;
  }

The safest way of JIS conversion.

  #require 'jcode.pl';
  require 'jacode.pl';
  while (defined($s = <>)) {
      ($matched, $icode) = &jcode'getcode(*s);
      if (@buf == 0 && $matched == 0) {
          print $s;
          next;
      }
      push(@buf, $s);
      next unless $icode;
      while (defined($s = shift(@buf))) {
          &jcode'convert(*s, 'jis', $icode);
          print $s;
      }
      while (defined($s = <>)) {
          &jcode'convert(*s, 'jis', $icode);
          print $s;
      }
      last;
  }
  print @buf if @buf;

Convert SJIS to UTF-8 and print each line, on Perl 4.036 or later.

  #require 'jcode.pl';
  require 'jacode.pl';
  while (defined($s = <>)) {
      &jcode'convert(*s, 'utf8', 'sjis');
      print $s;
  }

Convert SJIS to UTF16-BE and print each line, on Perl 5.8.1 or later.

  require 'jacode.pl';
  use 5.8.1;
  while (defined($s = <>)) {
      jcode::convert(\$s, 'UTF16-BE', 'sjis');
      print $s;
  }

Convert SJIS to MIME-Header-ISO_2022_JP and print each line, on Perl 5.8.1 or later.

  require 'jacode.pl';
  use 5.8.1;
  while (defined($s = <>)) {
      jcode::convert(\$s, 'MIME-Header-ISO_2022_JP', 'sjis');
      print $s;
  }

BUGS AND LIMITATIONS

You must use the -Llatin switch when running under JPerl.

Patches and problem reports to the author are welcome.

AUTHOR

This project was originated by Kazumasa Utashiro <utashiro@iij.ad.jp>.

LICENSE AND COPYRIGHT

This software is free software;

Copyright (c) 2010, 2011 INABA Hitoshi <ina@cpan.org>

The latest version is available here:

http://search.cpan.org/dist/jacode/

 *** CAUTION ***
 Redistributing this software by the name of jcode.pl infringes on
 the copyright of jcode.pl.

Original version `jcode.pl' is ...

Copyright (c) 1995-2000 Kazumasa Utashiro <utashiro@iij.ad.jp> Internet Initiative Japan Inc. 3-13 Kanda Nishiki-cho, Chiyoda-ku, Tokyo 101-0054, Japan

Copyright (c) 1992,1993,1994 Kazumasa Utashiro Software Research Associates, Inc.

Use and redistribution for ANY PURPOSE are granted as long as all copyright notices are retained. Redistribution with modification is allowed provided that you make your modified version obviously distinguishable from the original one. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The original version was developed under the name of srekcah@sra.co.jp in February 1992, and it was called kconv.pl at the beginning. This address was a pen name for a group of individuals and it is no longer valid.

The latest version is available here:

ftp://ftp.iij.ad.jp/pub/IIJ/dist/utashiro/perl/

SEE ALSO

 Programming Perl, Second Edition
 By Larry Wall, Tom Christiansen, Randal L. Schwartz
 October 1996
 Pages: 670
 ISBN 10: 1-56592-149-6 | ISBN 13: 9781565921498
 http://shop.oreilly.com/product/9781565921498.do

 Programming Perl, Third Edition
 By Larry Wall, Tom Christiansen, Jon Orwant
 Third Edition  July 2000
 Pages: 1104
 ISBN 10: 0-596-00027-8 | ISBN 13: 9780596000271
 http://shop.oreilly.com/product/9780596000271.do

 Programming Perl, 4th Edition
 By: Tom Christiansen, brian d foy, Larry Wall, Jon Orwant
 Publisher: O'Reilly Media
 Formats: Print, Ebook, Safari Books Online
 Print: January 2012
 Ebook: December 2011
 Pages: 1054
 Print ISBN: 978-0-596-00492-7 | ISBN 10: 0-596-00492-3
 Ebook ISBN: 978-1-4493-9890-3 | ISBN 10: 1-4493-9890-1
 http://shop.oreilly.com/product/9780596004927.do

 Perl Cookbook, Second Edition
 By Tom Christiansen, Nathan Torkington
 Second Edition  August 2003
 Pages: 964
 ISBN 10: 0-596-00313-7 | ISBN 13: 9780596003135
 http://shop.oreilly.com/product/9780596003135.do

 Perl in a Nutshell, Second Edition
 By Stephen Spainhour, Ellen Siever, Nathan Patwardhan
 Second Edition  June 2002
 Pages: 760
 Series: In a Nutshell
 ISBN 10: 0-596-00241-6 | ISBN 13: 9780596002411
 http://shop.oreilly.com/product/9780596002411.do

 Learning Perl on Win32 Systems
 By Randal L. Schwartz, Erik Olson, Tom Christiansen
 August 1997
 Pages: 306
 ISBN 10: 1-56592-324-3 | ISBN 13: 9781565923249
 http://shop.oreilly.com/product/9781565923249.do

 Learning Perl, Fifth Edition
 By Randal L. Schwartz, Tom Phoenix, brian d foy
 June 2008
 Pages: 352
 Print ISBN: 978-0-596-52010-6 | ISBN 10: 0-596-52010-7
 Ebook ISBN: 978-0-596-10316-3 | ISBN 10: 0-596-10316-6
 http://shop.oreilly.com/product/9780596520113.do

 Perl RESOURCE KIT UNIX EDITION
 Futato, Irving, Jepson, Patwardhan, Siever
 ISBN 10: 1-56592-370-7
 http://shop.oreilly.com/product/9781565923706.do

 Understanding Japanese Information Processing
 By Ken Lunde
 September 1993
 Pages: 470
 ISBN 10: 1-56592-043-0 | ISBN 13: 9781565920439
 http://shop.oreilly.com/product/9781565920439.do

 CJKV Information Processing
 Chinese, Japanese, Korean & Vietnamese Computing
 By Ken Lunde
 First Edition  January 1999
 Pages: 1128
 ISBN 10: 1-56592-224-7 | ISBN 13: 9781565922242
 http://shop.oreilly.com/product/9781565922242.do

 Mastering Regular Expressions, Second Edition
 By Jeffrey E. F. Friedl
 Second Edition  July 2002
 Pages: 484
 ISBN 10: 0-596-00289-0 | ISBN 13: 9780596002893
 http://shop.oreilly.com/product/9780596002893.do

 Mastering Regular Expressions, Third Edition
 By Jeffrey E. F. Friedl
 Third Edition  August 2006
 Pages: 542
 ISBN 10: 0-596-52812-4 | ISBN 13: 9780596528126
 http://shop.oreilly.com/product/9780596528126.do

 Regular Expressions Cookbook
 By Jan Goyvaerts, Steven Levithan
 May 2009
 Pages: 512
 ISBN 10: 0-596-52068-9 | ISBN 13: 978-0-596-52068-7
 http://shop.oreilly.com/product/9780596520694.do

 PERL PUROGURAMINGU
 Larry Wall, Randal L.Schwartz, Yoshiyuki Kondo
 December 1997
 ISBN 4-89052-384-7
 http://www.context.co.jp/~cond/books/old-books.html

 JIS KANJI JITEN
 Kouji Shibano
 Pages: 1456
 ISBN 4-542-20129-5
 http://www.webstore.jsa.or.jp/lib/lib.asp?fn=/manual/mnl01_12.htm

 UNIX MAGAZINE
 1993 Aug
 Pages: 172
 T1008901080816 ZASSHI 08901-8
 http://ascii.asciimw.jp/books/books/detail/978-4-7561-5008-0.shtml

 MacPerl Power and Ease
 By Vicki Brown, Chris Nandor
 April 1998
 Pages: 350
 ISBN 10: 1881957322 | ISBN 13: 978-1881957324
 http://www.amazon.com/Macperl-Power-Ease-Vicki-Brown/dp/1881957322

 Other Tools
 http://search.cpan.org/dist/Char/
 http://search.cpan.org/dist/Char-Sjis/

 BackPAN
 http://backpan.perl.org/authors/id/I/IN/INA/

ACKNOWLEDGEMENTS

This software was written with reference to the software and documents produced by the following people. I am grateful to all of them.

 Larry Wall, Perl
 http://www.perl.org/

 Kazumasa Utashiro, jcode.pl
 ftp://ftp.iij.ad.jp/pub/IIJ/dist/utashiro/perl/
 http://mail.pm.org/pipermail/tokyo-pm/2002-March/001319.html

 mikeneko creator club, Private manual of jcode.pl
 http://mikeneko.creator.club.ne.jp/~lab/kcode/jcode.html

 gama, getcode.pl
 http://www2d.biglobe.ne.jp/~gama/cgi/jcode/jcode.htm

 Gappai, jcodeg.diff
 http://www.vector.co.jp/soft/win95/prog/se347514.html

 OHZAKI Hiroki, Perl memo
 http://www.din.or.jp/~ohzaki/perl.htm#JP_Code

 NAKATA Yoshinori, Ad hoc patch to reduce warnings in h2z_euc
 http://white.niu.ne.jp/yapw/yapw.cgi/jcode.pl%A4%CE%A5%A8%A5%E9%A1%BC%CD%DE%C0%A9

 Dan Kogai, Jcode module and Encode module
 http://search.cpan.org/dist/Jcode/
 http://search.cpan.org/dist/Encode/
 http://blog.livedoor.jp/dankogai/archives/50116398.html
 http://blog.livedoor.jp/dankogai/archives/51004472.html

 Donzoko CGI+--, Jcode like Encode Wrapper
 http://www.donzoko.net/cgi/jencode/

 Yusuke Kawasaki, Encode561 module
 http://www.kawa.net/works/perl/i18n-emoji/i18n-emoji.html#Encode561

 Tokyo-pm archive
 http://mail.pm.org/pipermail/tokyo-pm/