Data::Classifier - A tool for classifying data with regular expressions
    use strict;
    use warnings;
    use Data::Classifier;

    my $yaml = <<EOY;
    ---
    name: Root
    children:
        - name: BMW
          children:
              - name: Diesel
                match:
                    model: "d\$"
              - name: Sports
                match:
                    model: "i\$"
                    seats: 2
              - name: Really Expensive
                match:
                    model: "^M"
    EOY

    my $classifier = Data::Classifier->new(yaml => $yaml);

    my $attributes1 = { model => '325i', seats => 4 };
    my $class1 = $classifier->process($attributes1);

    my $attributes2 = { model => '535d', seats => 4 };
    my $class2 = $classifier->process($attributes2);

    my $attributes3 = { model => 'M3', seats => 2 };
    my $class3 = $classifier->process($attributes3);

    print "$attributes2->{model}: ", $class2->fqn, "\n";
    print "$attributes3->{model}: ", $class3->fqn, "\n";

    #no real sports car has 4 seats
    print "$attributes1->{model}: ", $class1->fqn, "\n";
This module provides tools to classify sets of data contained in hashes against a predefined class hierarchy. Testing against a class is performed using regular expressions stored in the class hierarchy. It is also possible to modify the behavior of the system by subclassing and overriding a few methods.
Note that this module may not be particularly useful on its own. It is designed to be used as a base class for implementing other systems, such as Config::BuildHelper.
Using this module involves creating an instance of the classifier object and passing in the class hierarchy as a YAML file, a YAML string, or a prebuilt data structure, along with any optional arguments:
    $classifier = Data::Classifier->new(file => 'classes.yaml', debug => 1);
    $classifier = Data::Classifier->new(yaml => $yaml_string);
    $classifier = Data::Classifier->new(tree => $hashref);
The class definition file is a very specific tree format, normally stored in a YAML file. Each node of the tree is a map with the same set of keys, some of which are optional:
The textual name of the node being defined.
Extra data to be returned with classification results.
A sequence of nodes that exists under this node.
A map of keys to test against incoming data and regular expressions to apply to that data. For a match to be true, all items in the map must match the data.
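As an illustration of the node format, here is a minimal sketch of a prebuilt tree written as a Perl data structure, together with a small checker that walks it. The key names name, children, and match are taken from the synopsis; the validate_node helper is hypothetical and is not part of Data::Classifier.

```perl
use strict;
use warnings;

# A prebuilt tree following the node format described above.
my $tree = {
    name     => 'Root',
    children => [
        { name => 'Diesel', match => { model => 'd$' } },
        { name => 'Sports', match => { model => 'i$', seats => '2' } },
    ],
};

# Hypothetical helper (not part of Data::Classifier): walk the tree and
# return one message for every node that breaks the expected shape.
sub validate_node {
    my ($node, $path) = @_;
    my @errors;

    return ("$path: node is not a map") unless ref($node) eq 'HASH';

    my $has_name = defined $node->{name} && length $node->{name};
    push @errors, "$path: missing name" unless $has_name;
    my $here = $path . '/' . ($has_name ? $node->{name} : '?');

    if (exists $node->{match}) {
        if (ref($node->{match}) eq 'HASH') {
            for my $key (sort keys %{ $node->{match} }) {
                my $pattern = $node->{match}{$key};
                # Compiling an invalid pattern dies; trap that and report it.
                my $ok = defined $pattern
                    && eval { my $re = qr/$pattern/; 1 };
                push @errors, "$here: bad regular expression for '$key'"
                    unless $ok;
            }
        }
        else {
            push @errors, "$here: match is not a map";
        }
    }

    if (exists $node->{children}) {
        if (ref($node->{children}) eq 'ARRAY') {
            push @errors, validate_node($_, $here)
                for @{ $node->{children} };
        }
        else {
            push @errors, "$here: children is not a sequence";
        }
    }

    return @errors;
}

my @errors = validate_node($tree, '');
print @errors ? join("\n", @errors) . "\n" : "tree looks well formed\n";
```

A check like this could be run before handing the structure to new() via the tree argument, so that shape problems surface early.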
By default, this class has very specific matching semantics. For a dataset to match a node, everything listed under the match definition must match the specified data. Additionally, a node which contains no match definition will have all of its children searched but can never be a match itself.
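The semantics described above can be sketched in plain Perl. This is not the module's implementation: the helpers all_match and find_class are hypothetical, and the sketch assumes that a missing attribute counts as a failed test, that the first matching node in depth-first order wins, and that the subtree under a node whose match definition fails is skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if every key in the match map names a
# defined attribute whose value matches the regular expression.
sub all_match {
    my ($matchlist, $attributes) = @_;
    for my $key (keys %$matchlist) {
        my $value = $attributes->{$key};
        return 0 unless defined $value;            # assumption: missing fails
        return 0 unless $value =~ /$matchlist->{$key}/;
    }
    return 1;
}

# Hypothetical depth-first search. Returns an array reference holding the
# node names from the root to the matching node, or undef if nothing matches.
sub find_class {
    my ($node, $attributes, @path) = @_;
    push @path, $node->{name};

    if ($node->{match}) {
        # A node with a match definition either matches or is skipped.
        return all_match($node->{match}, $attributes) ? [@path] : undef;
    }

    # A node without a match definition is never a match itself, but its
    # children are still searched.
    for my $child (@{ $node->{children} || [] }) {
        my $found = find_class($child, $attributes, @path);
        return $found if defined $found;
    }
    return undef;
}

# The hierarchy from the synopsis, as a Perl data structure.
my $tree = {
    name     => 'Root',
    children => [ {
        name     => 'BMW',
        children => [
            { name => 'Diesel', match => { model => 'd$' } },
            { name => 'Sports', match => { model => 'i$', seats => '2' } },
            { name => 'Really Expensive', match => { model => '^M' } },
        ],
    } ],
};

for my $car ({ model => '535d', seats => 4 }, { model => '325i', seats => 4 }) {
    my $found = find_class($tree, $car);
    print "$car->{model}: ", ($found ? join(' / ', @$found) : 'no match'), "\n";
}
```

Under these assumptions the 325i with four seats matches nothing, because the Sports node requires both the model and the seats tests to succeed.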
Classify the data contained in the hash reference stored in $attr and return an instance of Data::Classifier::Result. See the documentation for that class for more information.
Return a textual representation of the class hierarchy stored in RAM.
The rest of this module is documented in Data::Classifier::Result, which you use to access the results of classification.
This class can be subclassed to change its behavior. The following methods are available for overriding:
This method is invoked by $classifier->process() when it needs to return a new instance of a result class. Simply return an instance of your class here, such as:
    sub return_result {
        my ($self, $result) = @_;

        return Data::Classifier::Result->new($result);
    }
This method is invoked by $classifier->recursive_match() at each node of the tree that contains a match attribute. The entire contents of the match attribute will be passed in as $matchlist and the hashref given to $classifier->process() will be passed in via $attributes. Return true to indicate a match and false to indicate no match.
This method is invoked by $classifier->process() to recursively search the entire tree. If you need to change the semantics of how the classifier treats matches against nodes without a match attribute, you would do that here.
Here are a few ideas for improvements to this class:
A class that stores its tree in a SQL database, reconstructs it at startup, and passes it in using the tree argument to new.
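One way the reconstruction step might look is sketched below. The row layout is purely an assumption for illustration (a nodes table with id, parent_id, and name, and a matches table with node_id, attribute, and regex), the rows are hard-coded rather than fetched from a database, and build_tree is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical rows, shaped as a SELECT might return them. The root row
# has an undefined parent_id.
my @nodes = (
    { id => 1, parent_id => undef, name => 'Root' },
    { id => 2, parent_id => 1,     name => 'BMW' },
    { id => 3, parent_id => 2,     name => 'Diesel' },
    { id => 4, parent_id => 2,     name => 'Sports' },
);
my @matches = (
    { node_id => 3, attribute => 'model', regex => 'd$' },
    { node_id => 4, attribute => 'model', regex => 'i$' },
    { node_id => 4, attribute => 'seats', regex => '2' },
);

# Hypothetical helper: rebuild the nested node structure from flat rows.
sub build_tree {
    my ($nodes, $matches) = @_;
    my (%by_id, $root);

    # First pass: create one hash per node, keyed by id.
    $by_id{ $_->{id} } = { name => $_->{name} } for @$nodes;

    # Second pass: attach each node to its parent, preserving row order.
    for my $row (@$nodes) {
        my $node = $by_id{ $row->{id} };
        if (defined $row->{parent_id}) {
            my $parent = $by_id{ $row->{parent_id} }
                or die "node $row->{id} has an unknown parent\n";
            push @{ $parent->{children} }, $node;
        }
        else {
            $root = $node;
        }
    }

    # Third pass: fill in the match map of each node that has one.
    for my $m (@$matches) {
        my $node = $by_id{ $m->{node_id} }
            or die "match row refers to unknown node $m->{node_id}\n";
        $node->{match}{ $m->{attribute} } = $m->{regex};
    }

    return $root;
}

my $tree = build_tree(\@nodes, \@matches);
print "root: $tree->{name}, first child: $tree->{children}[0]{name}\n";
```

The resulting hash reference has the same shape as a tree loaded from YAML, so it could be passed to new() with the tree argument.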
This module was created and documented by Tyler Riddle <triddle@gmail.com>.
There are no known bugs at this time.
Please report any bugs or feature requests to bug-data-classifier@rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data::Classifier. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
To install Data::Classifier, copy and paste the appropriate command into your terminal.
cpanm:
    cpanm Data::Classifier
CPAN shell:
    perl -MCPAN -e shell
    install Data::Classifier