NAME
Text::KyTea - Perl wrapper for KyTea
SYNOPSIS
use Text::KyTea;
use utf8;

my $kytea   = Text::KyTea->new(%config);
my $results = $kytea->parse($text);

for my $result (@{$results})
{
    print $result->{surface};

    for my $tags (@{$result->{tags}})
    {
        print "\t";

        for my $tag (@{$tags})
        {
            print " ", $tag->{feature}, "/", $tag->{score};
        }
    }

    print "\n";
}
DESCRIPTION
KyTea is a general toolkit developed for analyzing text, with a focus on
Japanese, Chinese and other languages requiring word or morpheme
segmentation.
This module requires KyTea 0.3.2 or later. It may not work with older
versions of KyTea.
If you installed KyTea in a non-default directory, please install
Text::KyTea in interactive mode (e.g., cpanm --interactive or cpanm
-v).
For more information about KyTea, please see the "SEE ALSO" section.
METHODS
new(%config)
Creates a new Text::KyTea instance.
my $kytea = Text::KyTea->new(
    model   => 'model.bin', # default is '/usr/local/share/kytea/model.bin'
    h2z     => 0,           # default is 1 (enable)
    notag   => [1,2],       # default is []
    nounk   => 0,           # default is 0 (estimates the pronunciation of unknown words)
    unkbeam => 50,          # default is 50
    tagmax  => 3,           # default is 3
    deftag  => 'UNK',       # default is 'UNK'
    unktag  => '',          # default is ''
);
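The defaults listed in the comments above can be collected in one place and
merged with caller overrides. The following is only a sketch: kytea_config
is a hypothetical helper, not part of Text::KyTea, and rejecting unlisted
option names is a design choice made here for illustration.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Defaults copied from the option comments in this document.
my %DEFAULTS = (
    model   => '/usr/local/share/kytea/model.bin',
    h2z     => 1,
    notag   => [],
    nounk   => 0,
    unkbeam => 50,
    tagmax  => 3,
    deftag  => 'UNK',
    unktag  => '',
);

# Hypothetical helper (not part of Text::KyTea): merge overrides into the
# documented defaults and reject option names that are not listed above.
sub kytea_config {
    my (%overrides) = @_;
    for my $key (keys %overrides) {
        croak "unknown option: $key" unless exists $DEFAULTS{$key};
    }
    return (%DEFAULTS, %overrides);
}

my %config = kytea_config(h2z => 0, notag => [1, 2]);

# With Text::KyTea and a model file installed, the hash can be passed on:
# my $kytea = Text::KyTea->new(%config);
```

Catching a misspelled option name before the constructor is called makes
it harder for a typo to leave a default in effect unnoticed.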
new(h2z => 1)
Converts $text from hankaku (half-width) to zenkaku (full-width)
characters before parsing it. This option improves parsing accuracy
with most model files.
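To illustrate what a hankaku-to-zenkaku conversion means, here is a minimal
pure-Perl sketch that maps printable ASCII to the corresponding full-width
forms. It is not KyTea's implementation: ascii_to_zenkaku is a hypothetical
name, and which characters the h2z option actually covers (for example,
half-width katakana) is not stated in this document.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical illustration only: map printable ASCII (U+0021..U+007E) to
# the full-width forms (U+FF01..U+FF5E). The space character and all
# non-ASCII characters are left unchanged.
sub ascii_to_zenkaku {
    my ($text) = @_;
    (my $wide = $text) =~ tr/\x{21}-\x{7E}/\x{FF01}-\x{FF5E}/;
    return $wide;
}

binmode STDOUT, ':encoding(UTF-8)';
print ascii_to_zenkaku('KyTea 0.3.2'), "\n";
```

Both ranges contain 94 characters, so each ASCII character is shifted by a
fixed offset of 0xFEE0.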
read_model($path)
Reads the given model file. In most cases, the model file should be
loaded with new(model => $path) instead of calling this method directly.
Model files are available at
http://www.phontron.com/kytea/model.html
parse($text)
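Parses the given text with KyTea and returns the analysis results as an
array reference. As the SYNOPSIS shows, each element is a hash reference
with a surface string and a tags array reference; each entry of tags is
itself an array reference of hash references holding a feature and a
score.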
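The returned structure can be walked without KyTea installed by using
hand-written data of the same shape. In the sketch below, format_results is
a hypothetical helper that reproduces the output layout of the SYNOPSIS
loop, and the sample words, features and scores are made up for
illustration; they are not real KyTea output.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: build one line per word, in the SYNOPSIS layout.
sub format_results {
    my ($results) = @_;
    my $out = '';
    for my $result (@{$results}) {
        $out .= $result->{surface};
        for my $tags (@{ $result->{tags} }) {
            $out .= "\t";
            $out .= " $_->{feature}/$_->{score}" for @{$tags};
        }
        $out .= "\n";
    }
    return $out;
}

# Made-up sample in the shape used by the SYNOPSIS; values are illustrative.
my $sample = [
    {
        surface => 'コーパス',
        tags    => [
            [ { feature => '名詞',     score => 1.5 } ],
            [ { feature => 'こーぱす', score => 0.5 } ],
        ],
    },
];

binmode STDOUT, ':encoding(UTF-8)';
print format_results($sample);
```

With a real instance, the same helper could be called as
format_results($kytea->parse($text)).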
AUTHOR
pawa <pawapawa@cpan.org>
SEE ALSO
http://www.phontron.com/kytea/
LICENSE
Copyright (C) 2012 pawa. All rights reserved.
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.