The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
package Algorithm::AM::algorithm;
use strict;
use warnings;
# ABSTRACT: How the Analogical Modeling algorithm works
# VERSION

1;

__END__

=pod

=encoding UTF-8

=head1 NAME

Algorithm::AM::algorithm - How the Analogical Modeling algorithm works

=head1 VERSION

version 3.10

=head1 DESCRIPTION

First, the user must create a set of data items, with their outcomes,
and some test items.  All of these items are represented by I<feature
vectors>.  These feature vectors are I<not> created by the AM
algorithm; they could be generated by hand or script, it matters not
to AM.  All the feature vectors must be of the same length, call it
I<n>.

=head2 The supracontextual lattice

AM requires the construction of a I<supracontextual lattice>.  It is
merely a complete distributive lattice of sets called
I<supracontexts>, each one labeled with an integer in the range 0
to S<2^I<n> - 1>.  If I<a> and I<b> are labels of two supracontexts,
then I<a> & I<b> = I<b> (that's bitwise AND) iff the supracontext
labeled by I<a> is a superset of the supracontext labeled by I<b>.

The supracontextual lattice starts out with every element being the
empty set.  The AM algorithm adds I<subcontexts> to them one at a
time.

=head2 Subcontexts

Subcontexts are also sets, and they are also labeled with an integer
in the range 0 to S<2^I<n> - 1>.  The elements of the subcontexts are
data items.

=over 4

=item Example

Suppose that the test item has feature vector

('S', 'O', '0', 'S', 'R', '0', 'T', 'A')

and a data item has the feature vector

('P', 'Y', 'V', 'S', 'R', '0', 'T', 'a').

Compare the corresponding features, using a 1 for different and 0 for
same.  Then the binary number

0b11100001

is the label of the subcontext to which the data item belongs.

=back

=head2 Filling the supracontextual lattice

The elements of the supracontexts are the subcontexts.  If a
subcontext has label I<a>, then it will be an element of any
supracontext whose label I<b> satisfies I<b> & I<a> = I<a>.

Thus, the subcontext labeled by 0b11100001 will belong to 16 different
supracontexts.  The labels of these supracontexts are found by
replacing the 0s by 1s in all possible ways; the original 1s are left
untouched.

=head2 Homogeneity and Heterogeneity

A supracontext is I<heterogeneous> if the following two conditions
hold:

=over 4

=item 1

The supracontext contains two or more subcontexts.

=item 2

The data items contained in these subcontexts do not share a common
outcome.

=back

Otherwise, the supracontext is I<homogeneous>.  Only homogeneous
supracontexts determine the possible outcomes.

=head2 The Analogical Set

The I<analogical set> is a list of all data items that appear in the
subcontexts in the homogeneous supracontexts.  With each data item is
associated a number which is one of the following:

=over 4

=item Number of I<occurrences>

This is merely the number of supracontexts containing the subcontext
containing the data item.

=item Number of I<pointers>

Assign to each supracontext a number representing the total number of
data items in the subcontexts it contains.  The number of pointers of
a particular data item is the sum of the numbers assigned to the
supracontexts containing the subcontext containing the data item.
(Number of occurrences can be thought of as assigning 1 to each
supracontext.)

=back

Using pointers gives rise to I<gang effects>: the ability of many data
items less similar to the test item but appearing in the same
subcontext to have more influence on the outcome than a few data items
more similar to the test item.

With the analogical set in hand, one can then describe the likelihood
of the various outcomes actually occurring by looking at the numbers
assigned to its data items.

=head1 OVERVIEW

I<Exemplar-based modeling> works as follows: there is a set of data
items, each assigned an outcome, and there is a test item.  The test
item is compared with the data items; the result of this comparison
tells what the possible outcomes of the test item are, along with
their likelihoods.

Exemplar-based modeling is often contrasted with I<rule-based
modeling>.  Note that in rule-based modeling, there can be only one
possible outcome, unless the model is fudged by introducing
probability.  (Some types of exemplar-based modeling also give only
one possible outcome.)

I<Analogical Modeling> (AM) is one way to do the comparison and
determine the outcome.  Some of its salient features are as follows:

=over 4

=item *

Exemplars that seem less similar to the test item than those that seem
more similar can still have a magnified effect if there are many of
them.  This is known as the I<gang effect>.

=item *

AM accounts for I<leakage>.

For instance, it is possible for someone to accidentally say "snew"
instead of "snowed", in analogy with "know/knew", "grow/grew",
"throw/threw", "blow/blew", etc.  (I've never done this myself, though
I know someone who has.)  In rule-based modeling, this could never
occur; in AM, this is predicted to occur, though with very low
frequency.

=back

=head1 AUTHOR

Theron Stanford <shixilun@yahoo.com>, Nathan Glenn <garfieldnate@gmail.com>

=head1 COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by Royal Skousen.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

=cut