
NAME

Data::Classifier - A tool for classifying data with regular expressions

SYNOPSIS

    use strict;
    use warnings;
    
    use Data::Classifier;
    
    my $yaml = <<EOY;
    ---
    name: Root
    children:
        - name: BMW
          children:
              - name: Diesel
                match:
                      model: "d\$"
              - name: Sports
                match:
                      model: "i\$"
                      seats: 2
              - name: Really Expensive
                match:
                      model: "^M"
    EOY
    
    my $classifier = Data::Classifier->new(yaml => $yaml);
    my $attributes1 = { model => '325i', seats => 4 };
    my $class1 = $classifier->process($attributes1);
    my $attributes2 = { model => '535d', seats => 4 };
    my $class2 = $classifier->process($attributes2);
    my $attributes3 = { model => 'M3', seats => 2 };
    my $class3 = $classifier->process($attributes3);
    print "$attributes2->{model}: ", $class2->fqn, "\n";
    print "$attributes3->{model}: ", $class3->fqn, "\n";
    #no real sports car has 4 seats
    print "$attributes1->{model}: ", $class1->fqn, "\n";

OVERVIEW

This module provides tools to classify sets of data contained in hashes against a predefined class hierarchy. Testing against a class is performed using regular expressions stored in the class hierarchy. The behavior of the system can also be modified by subclassing and overriding a few methods.

Note that this module may not be particularly useful on its own. It is designed to be used as a base class for implementing other systems, such as Config::BuildHelper.

USAGE

To use this module, create an instance of the classifier object, passing in the class hierarchy as a YAML file, a YAML string, or a prebuilt data structure, along with any optional arguments:

    $classifier = Data::Classifier->new(file => 'classes.yaml', debug => 1);
    $classifier = Data::Classifier->new(yaml => $yaml_string);
    $classifier = Data::Classifier->new(tree => $hashref);

Class Definition File

The class definition file is a very specific tree format, normally stored in a YAML file. Each node of the tree is a map with the same set of keys, some of which are optional:

name

The textual name of the node being defined.

data (optional)

Extra data to be returned with classification results.

children (optional)

A sequence of nodes that exists under this node.

match (optional)

A map whose keys name the fields of the incoming data to test and whose values are the regular expressions to apply to those fields. For a match to be true, every item in the map must match the data.
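
As an illustration, the hierarchy from the SYNOPSIS can be written as a plain Perl data structure using the keys listed above. This is a sketch that assumes the tree argument to new accepts the same shape a YAML parser would produce; node_names is a hypothetical helper that only walks the structure and is not part of the module.

```perl
use strict;
use warnings;

# The SYNOPSIS hierarchy as nested hashes and arrays.
my $tree = {
    name     => 'Root',
    children => [
        {
            name     => 'BMW',
            children => [
                { name => 'Diesel', match => { model => 'd$' } },
                { name => 'Sports', match => { model => 'i$', seats => '2' } },
                { name => 'Really Expensive', match => { model => '^M' } },
            ],
        },
    ],
};

# Hypothetical helper: return every node name, indented by depth.
sub node_names {
    my ($node, $depth) = @_;
    $depth //= 0;
    my $indent = '  ' x $depth;
    my @names  = ("$indent$node->{name}");
    for my $child (@{ $node->{children} || [] }) {
        push @names, node_names($child, $depth + 1);
    }
    return @names;
}

print "$_\n" for node_names($tree);
```

Note that single-quoted Perl strings need no backslash before the dollar sign, unlike the interpolating heredoc used in the SYNOPSIS.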

Matching Semantics

By default, this class has very specific matching semantics. For a dataset to match a node, everything listed under the match definition must match the specified data. Additionally, a node which contains no match definition will have all of its children searched but can never be a match itself.
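
The "everything must match" rule can be sketched in a few lines of standalone Perl. This is not the module's implementation; all_match is a hypothetical function, and treating an attribute that is missing from the data as a failed match is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the default rule: every regular expression
# in the match map must match the corresponding attribute.
sub all_match {
    my ($match, $attr) = @_;
    for my $key (keys %$match) {
        my $value = $attr->{$key};
        return 0 unless defined $value;               # assumption: missing means no match
        return 0 unless $value =~ /$match->{$key}/;
    }
    return 1;
}

# The "Sports" node from the SYNOPSIS.
my $sports = { model => 'i$', seats => '2' };

my $four_seats = all_match($sports, { model => '325i', seats => 4 });
my $two_seats  = all_match($sports, { model => '325i', seats => 2 });

print "four seats: ", ($four_seats ? "match" : "no match"), "\n";
print "two seats:  ", ($two_seats  ? "match" : "no match"), "\n";
```

Because the patterns are ordinary unanchored regular expressions, a pattern such as 2 would also match a value of 12; anchor the pattern with ^ and $ when an exact value is intended.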

Methods

$result = $classifier->process($attr)

Classify the data contained in the hash reference stored in $attr and return an instance of Data::Classifier::Result. See the documentation for that class for more information.

$classifier->dump

Return a textual representation of the class hierarchy stored in RAM.

More Information

The rest of this module is documented in Data::Classifier::Result, which you use to access the results of classification.

SUBCLASSING

This class can be subclassed to change its behavior. The following methods are available for overriding:

$classifier->return_result($result)

This method is invoked by $classifier->process() when it needs to return a new instance of a result class. Simply return an instance of your class here, such as:

    sub return_result {
            my ($self, $result) = @_;
            return Data::Classifier::Result->new($result);
    }

$classifier->check_match($matchlist, $attributes)

This method is invoked by $classifier->recursive_match() at each node of the tree that contains a match attribute. The entire contents of the match attribute will be passed in as $matchlist and the hashref given to $classifier->process() will be passed in via $attributes. Return true to indicate a match and false to indicate no match.
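
A minimal sketch of such an override, which makes every pattern case-insensitive. My::Classifier is a hypothetical name, and the handling of attributes missing from the input is an assumption rather than documented behavior.

```perl
use strict;
use warnings;

package My::Classifier;    # hypothetical subclass name

# -norequire only keeps this sketch loadable on its own; in real code,
# load the parent first, e.g. with: use parent 'Data::Classifier';
use parent -norequire, 'Data::Classifier';

# Same contract as described above: true for a match, false otherwise.
# Assumption: an attribute missing from the input counts as "no match".
sub check_match {
    my ($self, $matchlist, $attributes) = @_;
    for my $key (keys %$matchlist) {
        my $value = $attributes->{$key};
        return 0 unless defined $value;
        return 0 unless $value =~ /$matchlist->{$key}/i;    # case-insensitive
    }
    return 1;
}

package main;

# Call the override directly, without building a classifier.
my $hit = My::Classifier->check_match({ model => '^m' }, { model => 'M3' });
print $hit ? "match\n" : "no match\n";
```

With this subclass, a pattern such as ^m in the class definition would match both M3 and m3.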

$classifier->recursive_search($attributes, $node)

This method is invoked by $classifier->process() to recursively search the entire tree. If you need to change how the classifier treats nodes without a match attribute, you would do that here.

IMPROVEMENTS

Here are a few ideas for improvements to this class:

Data::Classifier::SQLTree

A class that stores its tree in a SQL database, reconstructs it at startup, and passes it in using the tree argument to new.
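
The reconstruction step could look roughly like the sketch below. Everything here is hypothetical: the column names (id, parent_id, name, match), the rows_to_tree function, and the in-memory @rows array, which stands in for the result of a database query.

```perl
use strict;
use warnings;

# Hypothetical rows, as they might come back from a SQL query.
my @rows = (
    { id => 1, parent_id => undef, name => 'Root' },
    { id => 2, parent_id => 1,     name => 'BMW' },
    { id => 3, parent_id => 2,     name => 'Diesel', match => { model => 'd$' } },
    { id => 4, parent_id => 2,     name => 'Sports', match => { model => 'i$', seats => '2' } },
);

# Rebuild the nested node structure from flat parent/child rows.
sub rows_to_tree {
    my @rows = @_;
    my (%node_for, $root);

    # First pass: create one node per row, indexed by id.
    for my $row (@rows) {
        my $node = { name => $row->{name} };
        $node->{match} = $row->{match} if $row->{match};
        $node_for{ $row->{id} } = $node;
    }

    # Second pass: attach each node to its parent; the parentless row is the root.
    for my $row (@rows) {
        my $node = $node_for{ $row->{id} };
        if (defined $row->{parent_id}) {
            push @{ $node_for{ $row->{parent_id} }{children} }, $node;
        }
        else {
            $root = $node;
        }
    }
    return $root;
}

my $tree = rows_to_tree(@rows);
print "$tree->{name} has ", scalar @{ $tree->{children} }, " child node(s)\n";
```

The resulting hashref could then be handed to the constructor as Data::Classifier->new(tree => $tree).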

AUTHORS

This module was created and documented by Tyler Riddle <triddle@gmail.com>.

BUGS

There are no known bugs at this time.

Please report any bugs or feature requests to bug-data-classifier@rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data::Classifier. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.