The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.
package Algorithm::AM::lattice;
use strict;
use warnings;
# ABSTRACT: How to store and manipulate large lattices in C
# VERSION

1;

__END__

=pod

=head1 NAME

Algorithm::AM::lattice - How to store and manipulate large lattices in C

=head1 VERSION

version 2.45

=head1 DESCRIPTION

The Analogical Modeling (AM) algorithm requires constructing and
traversing large completely distributive lattices, also known as
Boolean algebras.  This document tells how we do it in F<Parallel.xs>.

A lot of what appears below could be used generally to store lattices;
that which applies specifically to C<AM::Parallel> is so marked.

=head1 BOOLEAN ALGEBRAS AND LATTICES

If I<n> is a positive integer, then the set of positive integers that
can be expressed in I<n> or fewer bits, along with the operations &
(bitwise AND) and | (bitwise OR), is an example of a I<Boolean
algebra>.  It is also an example of a I<lattice> which is I<completely
distributive>.  Although all the lattices used in AM are in fact
Boolean algebras, it is customary to refer to them merely as lattices.

Any lattice can have a I<partial order> imposed upon it; this is done
by defining I<a> E<lt>= I<b> whenever I<a> & I<b> = I<b> (or
equivalently, I<a> | I<b> = I<a>).  This partial order is symmetric,
transitive, and antireflexive (if I<a> E<lt>= I<b> and I<b> E<lt>=
I<a>, then I<a> = I<b>).  It's called a I<partial> order because it is
often the case that neither I<a> E<lt>= I<b> nor I<b> E<lt>= I<a>.

A common way to draw lattices on paper is by putting elements that are
greater than other elements higher up on the paper, using line
segments to indicate the partial order.  Here's an example:

                000
                /|\
               / | \
              /  |  \
             /   |   \
           001  010  100
           | \  / \  / |
           |  \/   \/  |
           |  /\   /\  |
           | /  \ /  \ |
           011  101  110
             \   |   /
              \  |  /
               \ | /
                \|/
                111

If you can get from one element to another by only going down along
the line segments, then the first element is greater than the second.

=head2 Notes for AM

=over 4

=item *

The elements of the lattice created by AM are sets.  The partial order
is defined as follows: I<A> E<lt>= I<B> if I<B> is a subset of I<A>.
If you draw the lattice, the smaller sets are at the top.  This
lattice is known as the I<supracontextual lattice>; its elements are
called I<supracontexts>.

=item *

The value of I<n>, and thus the size of the lattice, is determined by
the length of the I<feature vector> of the test item (see F<AM.pod>
for more explanation).  There is a set corresponding to each I<n>-bit
positive integer; furthermore, if set I<A> corresponds to integer I<a>
and set I<B> corresponds to integer I<b>, then I<A> is a superset of
(or "below") I<B> if I<a> & I<b> = I<b>.

=item *

Many of the elements of the supracontextual lattice are equal as sets;
i.e., they have precisely the same members.  Thus, for those of you
who know a lot of math, it is important not to confuse the
supracontextual lattice with the Boolean algebra generated by the
power set of a set.  The supracontextual lattice I<is> a Boolean
algebra of sets; where these sets come from is explained in F<AM.pod>.

To store the supracontextual lattice, it is enough to create an array
C<lattice[]> of length 2^I<n>, where C<lattice[>I<a>C<]> contains a
pointer to a structure containing information about the elements of
the set corresponding to I<a>.

Of course, the size of C<lattice[]> grows exponentially with I<n>; to
overcome that, see the section on L<lattices as products of smaller
lattices|"LATTICES AS PRODUCTS OF SMALLER LATTICES">.

=item *

The supracontextual lattice is built up by adding elements to these
sets one at a time.  When a new element is added to a set, it is a
simple thing to C<memcpy> (actually, we use Perl's safe equivalent,
C<Copy>) the original set to a new location, append the new element,
and change the pointer.  We only have to do this once; F<Parallel.xs>
keeps track of the creation of new sets, so sometimes all that is
necessary is the changing of a pointer.

=back

=head1 TRAVERSING A LATTICE

During the course of the AM algorithm, it is necessary to visit all
the supracontexts that lie "below" a given supracontext.  For
example, given that a supracontext is labeled

 1001011

AM requires an iterator that produces

 1001111
 1011011
 1011111
 1101011
 1101111
 1111011
 1111111

though the order in which these seven are produced is immaterial.

F<Parallel.xs> does this by using a I<Gray code>.  This is a method by
which only one bit flips (either from 0 to 1 or from 1 to 0) at each
step.  Deciding which bit to flip is done as follows:

=over 4

=item 1

List the "gaps"; for

 1001011

the gaps are

 0100000 = gaps[0]
 0010000 = gaps[1]
 0000100 = gaps[2]

Each gap has exactly one 1 bit which lines up with a 0 in the original
number.

=item 2

If there are I<g> gaps, list the I<g>-bit integers in reverse order:
in this case, 111, 110, ..., 001, 000.

=item 3

Take each of these numbers in succession.  Determine where the
rightmost 1 is; its position determines which bit to flip:

 1001011
 1101011 = 1001011 ^               0100000 (rightmost 1 of 111 is bit 0, use gap[0])
 1111011 = 1101011 ^        0010000        (rightmost 1 of 110 is bit 1, use gap[1])
 1011011 = 1111011 ^               0100000 (rightmost 1 of 101 is bit 0, use gap[0])
 1011111 = 1011011 ^ 0000100               (rightmost 1 of 100 is bit 2, use gap[2])
 1111111 = 1011111 ^               0100000 (rightmost 1 of 011 is bit 0, use gap[0])
 1101111 = 1111111 ^        0010000        (rightmost 1 of 010 is bit 1, use gap[1])
 1001111 = 1101111 ^               0100000 (rightmost 1 of 001 is bit 0, use gap[0])

=back

(As I write this, I see that finding the bit to flip is the same
problem of deciding which disk to move in the Towers of Hanoi
problem.)

=head1 LATTICES AS PRODUCTS OF SMALLER LATTICES

Consider the following lattice: the first number is the binary label,
and the other numbers represent the elements of the set with that
label:

 label  elements

  0000

  0001  3
  0010
  0100  6
  1000

  0011  3
  0101  3 6
  0110  6
  1001  1 3
  1010  4
  1100  5 6

  0111  2 3 6
  1011  1 3 4 7
  1101  1 3 5 6
  1110  4 5 6

  1111  1 2 3 4 5 6 7

(Verify that this is a lattice.)

This lattice can be stored as two smaller lattices:

 label  elements

    00  3
    01  2 3 6
    10  1 3 4 7
    11  1 2 3 4 5 6 7

 label  elements

    00  5 6
    01  1 3 5 6
    10  4 5 6
    11  1 2 3 4 5 6 7

The set labeled by 1001 in the large lattice is precisely the
intersection of the sets labeled by 10 and 01 respectively in the
smaller lattices: {1, 3} is the intersection of {1, 3, 4, 7} and {1,
3, 5, 6}.

C<AM::Parallel> breaks the supracontextual lattice up into 4 smaller
lattices, resulting in a great savings of memory at the expense of
finding a lot of intersections.  But since the elements of the sets
are listed as increasing sequences of positive integers, finding the
intersection is actually quite straightforward.

To initialize, set I<i> to point to the largest element of the first
set and I<j> to point to the largest element of the second set.

=over 4

=item 1

Move I<i> to the left as long as it points to an integer larger than
that pointed to by I<j>.

=item 2

If I<i> points to an integer less than the integer pointed to by I<j>,
swap I<i> and I<j> so they point into the opposite sets; go to step 1.

=item 3

If I<i> and I<j> point to equal values, put this equal value into the
intersection and move both I<i> and I<j> once to the left.

=item 4

If I<i> can't be moved, the algorithm ends.

=back

=head1 AUTHOR

Theron Stanford <shixilun@yahoo.com>, Nathan Glenn <garfieldnate@gmail.com>

=head1 COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by Royal Skousen.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

=cut