Lingua::JA::TFIDF - TF/IDF calculator based on MeCab.
    use Lingua::JA::TFIDF;
    use Data::Dumper;

    my $calc = Lingua::JA::TFIDF->new(%config);

    # calculate TF/IDF and return a result object.
    my $result = $calc->tfidf($text);
    print Dumper $result->list;

    # dump the result object.
    print Dumper $result->dump;

    # or calculate just TF
    print Dumper $calc->tf($text)->list;
* This software is still in alpha release *
Lingua::JA::TFIDF is a TF/IDF calculator based on MeCab. It ships with a DF (Document Frequency) data set that was fetched in advance from the Yahoo Search API.
new(%config) instantiates a new Lingua::JA::TFIDF object. All of the following parameters are optional.
    my $calc = Lingua::JA::TFIDF->new(
        df_file         => 'my_df_file',           # default is undef
        ng_word         => \@original_ngword,      # default is undef
        fetch_df        => 1,                      # default is undef
        fetch_df_save   => 'my_df_file',           # default is undef
        LWP_UserAgent   => \%lwp_useragent_config, # default is undef
        XML_TreePP      => \%xml_treepp_config,    # default is undef
        yahoo_api_appid => $myid,                  # default is undef
    );
tfidf($text) calculates the TF/IDF score of the text. If the text includes unknown words (words missing from the DF data set), their Document Frequency is replaced by the average Document Frequency of the known words. If you set the fetch_df constructor parameter to a true value, the calculator fetches the Document Frequency of unknown words from the Yahoo Search API.
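The average-DF fallback described above can be sketched in plain Perl. This is only an illustration, not the module's own code: the weighting TF * log(N / DF), the function names average_known_df and tfidf_with_fallback, the sample words, and the document total are all assumptions made for the example.

```perl
use strict;
use warnings;
use utf8;
use List::Util qw(sum);

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical inputs: term frequencies for one text, a small DF table,
# and the total number of documents behind that table.
my %tf = ( '自然' => 2, '言語' => 3, '処理' => 1 );
my %df = ( '自然' => 1000, '言語' => 3000 );    # '処理' is unknown
my $total_docs = 1_000_000;

# Average DF of the words in the text that appear in the DF table.
# Falls back to 1 when no word is known (an assumption of this sketch).
sub average_known_df {
    my ($tf, $df) = @_;
    my @known = grep { exists $df->{$_} } keys %$tf;
    return @known ? sum( map { $df->{$_} } @known ) / @known : 1;
}

# Assumed weighting: TF * log(N / DF), using the average DF of the
# known words for any word that is missing from the DF table.
sub tfidf_with_fallback {
    my ($tf, $df, $n) = @_;
    my $avg = average_known_df($tf, $df);
    my %score;
    for my $word (keys %$tf) {
        my $word_df = exists $df->{$word} ? $df->{$word} : $avg;
        $score{$word} = $tf->{$word} * log( $n / $word_df );
    }
    return \%score;
}

my $score = tfidf_with_fallback(\%tf, \%df, $total_docs);
printf "%s\t%.3f\n", $_, $score->{$_} for sort keys %$score;
```

In this sketch the unknown word is scored with a DF of 2000, the mean of the two known values, so it is ranked as a word of ordinary rarity rather than being dropped or treated as extremely rare.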
tf($text) calculates only the TF score of the text, without the IDF weighting.
Accessor method for the NG word list (see the ng_word constructor parameter). Use it to replace the list.
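As a rough illustration of how a TF count and an NG word list interact, the following sketch counts tokens while skipping any token on the list. It is not the module's implementation: count_tf is a hypothetical helper, TF is assumed to be a raw occurrence count, and the tokens are assumed to be segmented already (the module itself relies on MeCab for that step).

```perl
use strict;
use warnings;

# Count how often each token occurs, skipping tokens on the NG list.
# Both arguments are array references; the NG list may be omitted.
sub count_tf {
    my ($tokens, $ng_words) = @_;
    my %ng = map { $_ => 1 } @{ $ng_words || [] };
    my %tf;
    for my $token (@$tokens) {
        next if $ng{$token};    # excluded word: do not count it
        $tf{$token}++;
    }
    return \%tf;
}

# Hypothetical, pre-segmented input and NG list.
my @tokens  = qw(search engine search index the the);
my @ng_word = qw(the);

my $tf = count_tf(\@tokens, \@ng_word);
print "$_\t$tf->{$_}\n" for sort keys %$tf;
```

Replacing the NG list changes which tokens ever reach the score, which is why a list tuned to the target corpus tends to matter more than small changes in the weighting.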
Internal accessor method.
Takeshi Miki <miki@cpan.org>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.