AI::NeuralNet::SOM - Perl extension for Kohonen Maps
    use AI::NeuralNet::SOM::Rect;
    my $nn = new AI::NeuralNet::SOM::Rect (output_dim => "5x6",
                                           input_dim  => 3);
    $nn->initialize;
    $nn->train (30, [ 3, 2, 4 ], [ -1, -1, -1 ], [ 0, 4, -3]);

    my @mes = $nn->train (30, ...);      # learn about the smallest errors
                                         # during training

    print $nn->as_data;                  # dump the raw data
    print $nn->as_string;                # prepare a somehow formatted string

    use AI::NeuralNet::SOM::Torus;
    # similar to above

    use AI::NeuralNet::SOM::Hexa;
    my $nn = new AI::NeuralNet::SOM::Hexa (output_dim => 6,
                                           input_dim  => 4);
    $nn->initialize ( [ 0, 0, 0, 0 ] );  # all get this value

    $nn->value (3, 2, [ 1, 1, 1, 1 ]);   # change value for a neuron
    print $nn->value (3, 2);

    $nn->label (3, 2, 'Danger');         # add a label to the neuron
    print $nn->label (3, 2);
This package is a stripped-down implementation of Kohonen maps (self-organizing maps). It is NOT meant as a demonstration or for use together with visualisation software. And while it is not (yet) optimized for speed, some care has been taken that it is not overly slow.
Particular emphasis has been placed on making the package play nicely with others: no use of files, no arcane dependencies, etc.
The basic idea is that the neural network consists of a 2-dimensional array of N-dimensional vectors. When the training is started these vectors may be completely random, but over time the network learns from the sample data, which is a set of N-dimensional vectors.
Slowly, the vectors in the network will try to approximate the sample vectors fed in. If the sample vectors contain clusters, these clusters will show up as neighbourhoods within the rectangle (or whatever topology you are using).
Technically, you have reduced your dimension from N to 2.
The constructor takes arguments:
input_dim: (mandatory, no default)
A positive integer specifying the dimension of the sample vectors (and hence that of the vectors in the grid).
learning_rate: (optional, default
This is a magic number which controls how strongly the vectors in the grid can be influenced. Stronger movement can mean faster learning if the clusters are very pronounced. If they are not, the movement acts like noise and convergence is poor. To mitigate that effect, the learning rate is reduced over the iterations.
sigma0: (optional, defaults to radius)
A non-negative number representing the start value for the learning radius. In practice, the value should be chosen so that it covers a large part of the map. During the learning process this value is narrowed down, so that the learning radius affects fewer and fewer neurons.
NOTE: Do not choose 1, as the log function is used on this value.
Subclasses will (re)define some of these parameters and add others:
my $nn = new AI::NeuralNet::SOM::Rect (output_dim => "5x6", input_dim => 3);
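The text above says that both the learning rate and the learning radius shrink during training, but it does not spell out the schedule. The following is only a sketch under an assumed exponential decay, a common choice for Kohonen maps. The function names decayed_rate and decayed_sigma, the start values 0.1 and 5, and the time constant $epochs / log($sigma0) are illustrative assumptions, not the package's API. Under that assumption it also shows why a sigma0 of 1 is troublesome: log(1) is 0.

```perl
use strict;
use warnings;

# Assumed schedule (not taken from the package): exponential decay.
# Learning rate at epoch $t out of $epochs, starting from $l0.
sub decayed_rate {
    my ($l0, $t, $epochs) = @_;
    return $l0 * exp(-$t / $epochs);
}

# Learning radius at epoch $t, starting from $sigma0.
# The assumed time constant uses log($sigma0), so $sigma0 == 1
# would lead to a division by zero.
sub decayed_sigma {
    my ($sigma0, $t, $epochs) = @_;
    die "sigma0 must be positive and not 1\n"
        if $sigma0 <= 0 || $sigma0 == 1;
    my $time_constant = $epochs / log($sigma0);
    return $sigma0 * exp(-$t / $time_constant);
}

my $epochs = 30;
for my $t (0, 10, 20, 30) {
    printf "epoch %2d: rate %.4f, sigma %.4f\n",
        $t, decayed_rate(0.1, $t, $epochs), decayed_sigma(5, $t, $epochs);
}
```

With this particular time constant the radius has shrunk to exactly 1 by the last epoch, whatever the start value was.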
You need to initialize all vectors in the map before training. There are several ways to do this:
If you provide a list of vectors, these will be used in turn to seed the neurons. If the list is shorter than the number of neurons, the list will be started over. That way it is trivial to zero everything:
$nn->initialize ( [ 0, 0, 0 ] );
If you provide no data, all vectors will get randomized values (in the range [ -0.5 .. 0.5 ]).
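The seeding rules described above can be sketched in plain Perl. This is not the package's code: the helper seed_grid, its argument order, and the order in which neurons are visited are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: fill a $rows x $cols grid of $dim-dimensional vectors.
# With seed vectors, use them in turn and start over when the list runs out.
# Without seed vectors, use random values in [-0.5, 0.5).
sub seed_grid {
    my ($rows, $cols, $dim, @seeds) = @_;
    my @grid;
    my $i = 0;
    for my $x (0 .. $rows - 1) {
        for my $y (0 .. $cols - 1) {
            $grid[$x][$y] = @seeds
                ? [ @{ $seeds[ $i++ % scalar @seeds ] } ]  # copy of next seed
                : [ map { rand() - 0.5 } 1 .. $dim ];      # random vector
        }
    }
    return \@grid;
}

# A single zero vector is reused for every neuron.
my $zeroed = seed_grid(5, 6, 3, [ 0, 0, 0 ]);
print "neuron (4, 5): @{ $zeroed->[4][5] }\n";
```

Each neuron gets its own copy of the seed, so changing one neuron later does not change the others.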
$nn->train ( $epochs, @vectors )
@mes = $nn->train ( $epochs, @vectors )
The training uses the list of sample vectors to make the network learn. Each vector is simply a reference to an array of values.
The epochs parameter controls how many passes are made over the list of vectors. The vectors are NOT used in sequence, but picked randomly from the list; within one epoch, however, all vectors are visited exactly once. Because of the random order, it is wise to run several epochs, not just one.
$nn->train (30, [ 3, 2, 4 ], [ -1, -1, -1 ], [ 0, 4, -3]);
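The visiting order described above (random, but every vector exactly once per epoch) can be sketched with a shuffle. The driver run_epochs and its callback are hypothetical names; the actual update of the map is left out, and the callback here only counts visits.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical driver: call $step->($vector) for every sample vector,
# once per epoch, in a fresh random order each epoch.
sub run_epochs {
    my ($epochs, $step, @vectors) = @_;
    for my $epoch (1 .. $epochs) {
        $step->($_) for shuffle @vectors;
    }
}

# Count how often each vector is visited over 30 epochs.
my %seen;
run_epochs(30, sub { $seen{ join ',', @{ $_[0] } }++ },
           [ 3, 2, 4 ], [ -1, -1, -1 ], [ 0, 4, -3 ]);
printf "%-10s visited %d times\n", $_, $seen{$_} for sort keys %seen;
```

Every vector ends up with as many visits as there are epochs; only the order within each epoch varies.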
($x, $y, $distance) = $nn->bmu ($vector)
This method finds the best matching unit, i.e. that neuron which is closest to the vector passed in. The method returns the coordinates and the actual distance.
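To make the idea concrete, here is a minimal sketch of a best-matching-unit search over a small grid. It assumes Euclidean distance, which the text does not state, and the names distance and find_bmu as well as the sample grid are made up for illustration.

```perl
use strict;
use warnings;

# Euclidean distance between two vectors of equal length (assumed metric).
sub distance {
    my ($u, $v) = @_;
    my $sum = 0;
    $sum += ($u->[$_] - $v->[$_]) ** 2 for 0 .. $#$u;
    return sqrt $sum;
}

# Hypothetical stand-in for bmu: scan a 2-dimensional grid of vectors and
# return (x, y, distance) for the neuron closest to $vector.
sub find_bmu {
    my ($grid, $vector) = @_;
    my @best;
    for my $x (0 .. $#$grid) {
        for my $y (0 .. $#{ $grid->[$x] }) {
            my $d = distance($grid->[$x][$y], $vector);
            @best = ($x, $y, $d) if !@best || $d < $best[2];
        }
    }
    return @best;
}

my $grid = [
    [ [ 0, 0, 0 ], [ 1, 1, 1 ] ],
    [ [ 3, 2, 4 ], [ 0, 4, -3 ] ],
];
my ($x, $y, $d) = find_bmu($grid, [ 3, 2, 5 ]);
print "bmu at ($x, $y), distance $d\n";
```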
$me = $nn->mean_error (@vectors)
This method takes a number of vectors and produces the mean distance, i.e. the average error which the SOM makes when finding the bmus for the vectors. At least one vector must be passed in.
Obviously, the longer you let your SOM be trained, the smaller the error should become.
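As a rough illustration of the quantity being computed, the sketch below averages the distance from each sample vector to its closest neuron. It assumes Euclidean distance and a flat list of neuron vectors; distance and mean_bmu_distance are hypothetical names, not the package's methods.

```perl
use strict;
use warnings;
use List::Util qw(min sum);

# Euclidean distance between two vectors of equal length (assumed metric).
sub distance {
    my ($u, $v) = @_;
    return sqrt sum map { ($u->[$_] - $v->[$_]) ** 2 } 0 .. $#$u;
}

# Hypothetical stand-in for mean_error: for each sample vector take the
# distance to its closest neuron, then average those distances.
sub mean_bmu_distance {
    my ($neurons, @vectors) = @_;
    die "at least one vector is required\n" unless @vectors;
    my @errors = map {
        my $v = $_;
        min map { distance($_, $v) } @$neurons;
    } @vectors;
    return sum(@errors) / scalar @errors;
}

my $neurons = [ [ 0, 0 ], [ 3, 4 ] ];
my $me = mean_bmu_distance($neurons, [ 0, 1 ], [ 3, 0 ]);
print "mean error: $me\n";
```

Here the first sample is at distance 1 from its closest neuron and the second at distance 3, so the mean is 2.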
$ns = $nn->neighbors ($sigma, $x, $y)
Finds all neighbors of (X, Y) with a distance smaller than SIGMA. Returns a list reference of (X, Y, distance) triples.
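For a rectangular map the search could look like the sketch below. It assumes Euclidean distance between grid coordinates, which is an assumption (other topologies will measure distance differently), and rect_neighbors is a hypothetical helper rather than the package's method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for neighbors on a rectangular $rows x $cols grid.
# Returns a list reference of [x, y, distance] triples for all neurons
# closer than $sigma to ($x0, $y0).
sub rect_neighbors {
    my ($rows, $cols, $sigma, $x0, $y0) = @_;
    my @found;
    for my $x (0 .. $rows - 1) {
        for my $y (0 .. $cols - 1) {
            my $d = sqrt(($x - $x0) ** 2 + ($y - $y0) ** 2);
            push @found, [ $x, $y, $d ] if $d < $sigma;
        }
    }
    return \@found;
}

my $ns = rect_neighbors(5, 6, 1.5, 0, 0);
printf "(%d, %d) at distance %.3f\n", @$_ for @$ns;
```

In this sketch the neuron itself is part of the result, at distance 0.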
$dim = $nn->output_dim
Returns the output dimensions of the map as passed in at constructor time.
$radius = $nn->radius
Returns the radius of the map. Different topologies interpret this differently.
$m = $nn->map
This method returns a reference to the map data. See the appropriate subclass for the data representation.
$val = $nn->value ($x, $y)
$nn->value ($x, $y, $val)
Set or get the current vector value for a particular neuron. The neuron is addressed via its coordinates.
$label = $nn->label ($x, $y)
$nn->label ($x, $y, $label)
Set or get the label for a particular neuron. The neuron is addressed via its coordinates. The label can be anything, it is just attached to the position.
The as_string method creates a pretty-printed version of the current vectors.
The as_data method creates a string containing the raw vector data, row by row. This can be fed into gnuplot, for instance.
See the example script in the directory examples provided in the distribution. It uses PDL (for speed and scalability, but the results are not as good as I had thought).
See the example script in the directory examples. It uses Storable to directly dump the data structure onto disk. Storage and retrieval is quite fast.
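The example script itself is not reproduced here. As a minimal sketch of the Storable round trip, the following stores a plain nested structure in a temporary file and reads it back. The structure and its keys are illustrative stand-ins for a trained map, not the package's internal layout.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# A plain nested structure standing in for a trained map (illustrative data).
my $map = {
    output_dim => "2x2",
    input_dim  => 3,
    vectors    => [ [ [ 0, 0, 0 ], [ 1, 1, 1 ] ],
                    [ [ 3, 2, 4 ], [ 0, 4, -3 ] ] ],
};

# Temporary file that is removed when the program exits.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

store($map, $file);            # write the structure to disk
my $copy = retrieve($file);    # read it back as a new structure

print "restored vector: @{ $copy->{vectors}[1][0] }\n";
```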
There is most likely a mismatch between the input_dim you specified and the dimension your vectors actually have.
Bugs should always be submitted via the CPAN bug tracker https://rt.cpan.org/Dist/Display.html?Status=Active&Queue=AI-NeuralNet-SOM
Explanation of the algorithm:
Old version of AI::NeuralNet::SOM from Alexander Voischev:
Robert Barta, <firstname.lastname@example.org>
Copyright (C) 200 by Robert Barta
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.