NAME

AI::Classifier::Text::Analyzer - computing feature vectors from documents

VERSION

version 0.03

SYNOPSIS

    use AI::Classifier::Text::Analyzer;

    my $analyzer = AI::Classifier::Text::Analyzer->new();
    
    my $features = $analyzer->analyze( 'aaaa http://www.example.com/bbb?xx=yy&bb=cc;dd=ff' );

DESCRIPTION

Computes feature vectors of text using some heuristics and adds word counts (using Text::WordCounter by default).

The object is immutable, but some methods take a second parameter that serves as an accumulator for the features found in the given text.

It relies on specific values and heuristics that work for our case but are not guaranteed to give good results universally; see the source for details.

ATTRIBUTES

word_counter

An object with a word_count method that calculates the frequency of words in a text document. Defaults to a Text::WordCounter instance.
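As a rough illustration of swapping in a different counter, the sketch below defines a hypothetical My::LowercaseCounter class. It assumes the analyzer calls word_count( $text, $features ) with a hash reference as the accumulator; that calling convention is an assumption here, so check the Text::WordCounter source before relying on it. The analyzer is only wired in if the module is installed.

```perl
use strict;
use warnings;

# Hypothetical replacement counter; the class name is made up for this example.
package My::LowercaseCounter;

sub new { my ($class) = @_; return bless {}, $class }

# Assumed calling convention: word_count( $text, $features ),
# where $features is a hash ref that accumulates the counts.
sub word_count {
    my ( $self, $text, $features ) = @_;
    $features ||= {};
    # Count every run of word characters, folded to lower case.
    $features->{ lc $1 }++ while $text =~ /(\w+)/g;
    return $features;
}

package main;

# The counter can be exercised on its own.
my $counter = My::LowercaseCounter->new;
my $counts  = $counter->word_count('Spam spam eggs');

# Pass it to the analyzer through the documented word_counter attribute,
# but only when AI::Classifier::Text::Analyzer is available.
if ( eval { require AI::Classifier::Text::Analyzer; 1 } ) {
    my $analyzer = AI::Classifier::Text::Analyzer->new( word_counter => $counter );
    my $features = $analyzer->analyze('Spam spam eggs');
}
```

Any object that responds to word_count in this way should be usable here, which makes it straightforward to add stemming or stop-word removal without touching the analyzer itself.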

global_feature_weight

The weight assigned to the computed features of the text document. Defaults to 2.

METHODS

new(word_counter => $foo, global_feature_weight => 3)

Creates a new AI::Classifier::Text::Analyzer object. Both arguments are optional.

analyze($document, $features)

Computes the feature vector of the given document and adds it to the initial vector passed in $features.
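A minimal sketch of the accumulator pattern, assuming analyze returns the feature hash reference when a second argument is supplied (as it does in the SYNOPSIS without one). The message parts and the SHORT_MESSAGE key are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;
use AI::Classifier::Text::Analyzer;

# global_feature_weight => 3 overrides the documented default of 2.
my $analyzer = AI::Classifier::Text::Analyzer->new( global_feature_weight => 3 );

# Hypothetical message parts, used only for illustration.
my $subject = 'Cheap watches';
my $body    = 'Visit http://www.example.com/watches for details';

# Start from an initial vector and thread it through both calls,
# taking the return value each time.
my $features = $analyzer->analyze( $subject, { SHORT_MESSAGE => 1 } );
$features = $analyzer->analyze( $body, $features );

# Print whatever ended up in the vector; the exact keys depend on
# the module's heuristics and on the word counter in use.
printf "%s => %s\n", $_, $features->{$_} for sort keys %$features;
```

Threading the return value through each call avoids depending on whether the hash passed in is modified in place.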

analyze_urls($document, $features)

Computes a vector of special URL-related features of the given text. The features currently used are NO_URLS, MANY_URLS and REPEATED_URLS.
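To make the three feature names concrete, here is a standalone sketch of how such features could be derived. It does not call the module and is not its implementation: the url_features function, the URL regex, and both thresholds are assumptions chosen for illustration. The module's real rules are in its source.

```perl
use strict;
use warnings;

# Hypothetical thresholds, chosen only for this illustration.
my $CHARS_PER_URL = 120;    # more than one URL per this many characters => MANY_URLS
my $MAX_REPEATS   = 3;      # same URL seen more than this many times   => REPEATED_URLS

sub url_features {
    my ( $text, $features, $weight ) = @_;
    $features ||= {};
    $weight //= 2;          # mirrors the documented default of global_feature_weight

    # Naive URL matcher; a real implementation would use a proper URI finder.
    my @urls = $text =~ m{(https?://[^\s<>"']+)}g;

    $features->{NO_URLS}   = $weight if !@urls;
    $features->{MANY_URLS} = $weight if @urls > length($text) / $CHARS_PER_URL;

    # Flag the text as soon as any single URL exceeds the repeat limit.
    my %seen;
    for my $url (@urls) {
        if ( ++$seen{$url} > $MAX_REPEATS ) {
            $features->{REPEATED_URLS} = $weight;
            last;
        }
    }
    return $features;
}

my $plain = url_features('no links here at all');
my $spam  = url_features( join ' ', ('http://example.com/x') x 5 );
```

With these assumed thresholds, $plain carries only NO_URLS, while $spam carries both MANY_URLS and REPEATED_URLS, each set to the weight of 2.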

filter($document)

Removes HTML-related parts from the text.

SEE ALSO

AI::NaiveBayes(3), AI::Classifier::Text(3)

AUTHOR

Zbigniew Lukasiak <zlukasiak@opera.com>, Tadeusz Sośnierz <tsosnierz@opera.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2012 by Opera Software ASA.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
