Algorithm::AM::algorithm - How the Analogical Modeling algorithm works
version 3.09
First, the user must create a set of data items, with their outcomes, and some test items. All of these items are represented by feature vectors. These feature vectors are not created by the AM algorithm; they could be generated by hand or script, it matters not to AM. All the feature vectors must be of the same length, call it n.
AM requires the construction of a supracontextual lattice. It is merely a complete distributive lattice of sets called supracontexts, each one labeled with an integer in the range 0 to 2^n - 1. If a and b are labels of two supracontexts, then a & b = b (that's bitwise AND) iff the supracontext labeled by a is a superset of the supracontext labeled by b.
The supracontextual lattice starts out with every element being the empty set. The AM algorithm adds subcontexts to them one at a time.
Subcontexts are also sets, and they are also labeled with an integer in the range 0 to 2^n - 1. The elements of the subcontexts are data items.
Suppose that the test item has feature vector
('S', 'O', '0', 'S', 'R', '0', 'T', 'A')
and a data item has the feature vector
('P', 'Y', 'V', 'S', 'R', '0', 'T', 'a').
Compare the corresponding features, using a 1 for different and 0 for same. Then the binary number
0b11100001
is the label of the subcontext to which the data item belongs.
The elements of the supracontexts are the subcontexts. If a subcontext has label a, then it will be an element of any supracontext whose label b satisfies b & a = a.
Thus, the subcontext labeled by 0b11100001 will belong to 16 different supracontexts. The labels of these supracontexts are found by replacing the 0s by 1s in all possible ways; the original 1s are left untouched.
A supracontext is heterogeneous if the following two conditions hold:
Otherwise, the supracontext is homogeneous. Only homogeneous supracontexts determine the possible outcomes.
The analogical set is a list of all data items that appear in the subcontexts in the homogeneous supracontexts. With each data item is associated a number which is one of the following:
This is merely the number of supracontexts containing the subcontext containing the data item.
Assign to each supracontext a number representing the total number of data items in the subcontexts it contains. The number of pointers of a particular data item is the sum of the numbers assigned to the supracontexts containing the subcontext containing the data item. (Number of occurrences can be thought of as assigning 1 to each supracontext.)
Using pointers gives rise to gang effects: the ability of many data items less similar to the test item but appearing in the same subcontext to have more influence on the outcome than a few data items more similar to the test item.
With the analogical set in hand, one can then describe the likelihood of the various outcomes actually occurring by looking at the numbers assigned to its data items.
Exemplar-based modeling works as follows: there is a set of data items, each assigned an outcome, and there is a test item. The test item is compared with the data items; the result of this comparison tells what the possible outcomes of the test item are, along with their likelihoods.
Exemplar-based modeling is often contrasted with rule-based modeling. Note that in rule-based modeling, there can be only one possible outcome, unless the model is fudged by introducing probability. (Some types of exemplar-based modeling also give only one possible outcome.)
Analogical Modeling (AM) is one way to do the comparison and determine the outcome. Some of its salient features are as follows:
For instance, it is possible for someone to accidentally say "snew" instead of "snowed", in analogy with "know/knew", "grow/grew", "throw/threw", "blow/blew", etc. (I've never done this myself, though I know someone who has.) In rule-based modeling, this could never occur; in AM, this is predicted to occur, though with very low frequency.
Theron Sanford <shixilun@yahoo.com>, Nathan Glenn <garfieldnate@gmail.com>
This software is copyright (c) 2013 by Royal Skousen.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.