NAME
Algorithm::AM - Classify data with Analogical Modeling
VERSION
version 3.05
SYNOPSIS
use Algorithm::AM;
my $dataset = dataset_from_file('finnverb');
my $am = Algorithm::AM->new(training_set => $dataset);
my $result = $am->classify($dataset->get_item(0));
print @{ $result->winners };
print ${ $result->statistical_summary };
DESCRIPTION
This module provides an object-oriented interface for classifying single
items using the analogical modeling algorithm. To work with sets of
items needing to be classified, see Algorithm::AM::Batch.
This module logs information using Log::Any, so if you want automatic
print-outs you need to set an adaptor. See the "classify" method for
more information on logged data.
BACKGROUND AND TERMINOLOGY
Analogical Modeling (or AM) was developed as an exemplar-based approach
to modeling language usage, and has also been found useful in modeling
other "sticky" phenomena. AM is especially suited to this because it
predicts probabilistic occurrences instead of assigning static labels to
instances.
The AM algorithm can be called a probabilistic
<http://en.wikipedia.org/wiki/Probabilistic_classification>,
instance-based <http://en.wikipedia.org/wiki/Instance-based_learning>
classifier. However, the probabilities given for each classification are
not degrees of certainty, but actual probabilities of occurring in real
usage. Thus in AM literature the classification is supposed to produce
dynamic "outcomes", not static "labels". In AM proper, the last step of
classification is to produce an outcome at random based on the
calculated probability distribution. AM therefore predicts that "sticky"
phenomena are "sticky" because they vary probabilistically, defying
absolute prediction.
In this software, an outcome can be chosen probabilistically using
"random_outcome" in Algorithm::AM::Result. However, in practice, usually
only the highest-probability prediction(s) are used for classification
tasks. These can be retrieved via "winners" in Algorithm::AM::Result, or
"result" in Algorithm::AM::Result if you're just interested in
classification accuracy on a test set. The entire outcome probability
distribution can also be retrieved via "scores_normalized" in
Algorithm::AM::Result. See Algorithm::AM::Result for other types of
information available after classification. See Algorithm::AM::algorithm
for details on the actual mechanism of classification.
Outside of the "random_outcome" method mentioned above, the rest of the
software uses more general machine learning terminology. What would
properly be called an "exemplar" is referred to simply as an "item",
and, as is customary, "training" and "test" sets are used, even though
AM never does any actual "training". Training items are assigned "class
labels" (not "outcomes"), and classification results in a set of scores
(or probabilities) for different "class labels", even though they would
properly be called "outcomes". Finally, items contain vectors of
"features", which were called "variables" in previous versions of this
software.
EXPORTS
When this module is imported, it also imports the following:
Algorithm::AM::Result
Algorithm::AM::DataSet
Also imports "dataset_from_file" in Algorithm::AM::DataSet.
Algorithm::AM::DataSet::Item
Also imports "new_item" in Algorithm::AM::DataSet::Item.
Algorithm::AM::BigInt
Also imports "bigcmp" in Algorithm::AM::BigInt.
METHODS
"new"
Creates a new instance of an analogical modeling classifier. This method
takes named parameters which set state described in the documentation
for the relevant methods. The only required parameter is "training_set",
which should be an instance of Algorithm::AM::DataSet, and which defines
the set of items used for training during classification. All of the
accepted parameters are listed below:
"training_set"
"exclude_nulls"
"exclude_given"
"linear"
"training_set"
Returns (but will not set) the dataset used for training. This is an
instance of Algorithm::AM::DataSet.
"exclude_nulls"
Get/set a boolean value indicating whether features with null values in
the test item should be ignored. If false, they will be treated as
having a specific value representing null. Defaults to true.
"exclude_given"
Get/set a boolean value indicating whether the test item should be
removed from the training set if it is found there during
classification. Defaults to true.
"linear"
Get/set a boolean value indicating whether the analogical set should be
computed using *occurrences* (linearly) or *pointers* (quadratically).
To understand what this means, you should read the algorithm page. A
false value indicates quadratic counting. Defaults to false.
"classify"
$am->classify(new_item(features => ['a','b','c']));
Using the analogical modeling algorithm, this method classifies the
input test item and returns a Result object.
Log::Any is used for logging. The full classification configuration is
logged at the info level. A notice is printed at the warning level if no
training items can be compared with the test item, preventing any
classification.
HISTORY
Initially, Analogical Modeling was implemented as a Pascal program.
Subsequently, it was ported to Perl, with substantial improvements made
in 2000. In 2001, the core of the algorithm was rewritten in C, while
the parsing, printing, and statistical routines remained in C; this was
accomplished by embedding a Perl interpreter into the C code.
In 2004, the algorithm was again rewritten, this time in order to handle
more features and large data sets. The algorithm breaks the
supracontextual lattice into the direct product of four smaller ones,
which the algorithm manipulates individually before recombining. These
lattices can be manipulated in parallel when using the right hardware,
and so the module was named "AM::Parallel". This implementation was
written with the core lattice-filling algorithm in XS, and hooks were
provided to help the user create custom reports and control
classification dynamically.
The present version has been renamed to "Algorithm::AM", which seemed a
better fit for CPAN. While the XS has largely remained intact, the Perl
code has been completely reorganized and updated to be both more
"modern" and modular. Most of the functionality of "AM::Parallel"
remains.
SEE ALSO
The <home page|http://humanities.byu.edu/am/> for Analogical Modeling
includes information about current research and publications, as well as
sample data sets.
The Wikipedia article <http://en.wikipedia.org/wiki/Analogical_modeling>
has details and even illustrations on analogical modeling.
SUPPORT
Bugs / Feature Requests
Please report any bugs or feature requests through the issue tracker at
<https://github.com/garfieldnate/Algorithm-AM/issues>. You will be
notified automatically of any progress on your issue.
Source Code
This is open source software. The code repository is available for
public review and contribution under the terms of the license.
<https://github.com/garfieldnate/Algorithm-AM>
git clone https://github.com/garfieldnate/Algorithm-AM.git
AUTHOR
Theron Stanford <shixilun@yahoo.com>, Nathan Glenn
<garfieldnate@gmail.com>
CONTRIBUTORS
* garfieldnate <garfieldnate@gmail.com>
* Nathan Glenn <garfieldnate@gmail.com>
* Nick <nlogan@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by Royal Skousen.
This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.