Algorithm::DistanceMatrix - Compute distance matrix for any distance metric
version 0.04
    use Algorithm::DistanceMatrix;
    my $m = Algorithm::DistanceMatrix->new(
        metric  => \&mydistance,
        objects => \@myarray,
    );
    my $distmatrix = $m->distancematrix;

    use Algorithm::Cluster qw/treecluster/;
    # method =>
    #   s: single-linkage clustering
    #      http://en.wikipedia.org/wiki/Single-linkage_clustering
    #   m: maximum- (or complete-) linkage clustering
    #      http://en.wikipedia.org/wiki/Complete_linkage_clustering
    #   a: average-linkage clustering (UPGMA)
    #      http://en.wikipedia.org/wiki/UPGMA
    my $tree = treecluster(data => $distmatrix, method => 'a');

    # Get your objects and the cluster IDs they belong to, assuming 5 clusters
    my $cluster_ids = $tree->cut(5);
    # Index corresponds to that of the original objects
    print $myarray[2], ' belongs to cluster ', $cluster_ids->[2], "\n";
This is a small helper package for Algorithm::Cluster. That module provides many facilities for clustering data. It also provides a distancematrix function, but that function assumes tabular data, which is the standard for gene expression data.
If your data is tabular, you should first have a look at distancematrix in Algorithm::Cluster:
http://cpansearch.perl.org/src/MDEHOON/Algorithm-Cluster-1.48/doc/cluster.pdf
Otherwise, this package provides a simple distance matrix, given an arbitrary distance function. It does not assume anything about your data. You simply provide a callback function for measuring the distance between any two objects. It produces a lower diagonal (by default) distance matrix that is fit to be used by the clustering algorithms of Algorithm::Cluster.
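To make the idea of a callback-driven, lower diagonal matrix concrete, here is a minimal plain-Perl sketch. It is not the module's implementation: lower_diagonal is a hypothetical helper, and the ragged layout (row i holds i entries, row 0 is empty) is an assumption based on the layout Algorithm::Cluster describes for its own distance matrices.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::DistanceMatrix.
# Assumed layout: row $i holds the distances from object $i to each
# object before it, so row 0 is empty and row $i has $i entries.
sub lower_diagonal {
    my ($metric, $objects) = @_;
    my @matrix;
    for my $i (0 .. $#$objects) {
        my @row;
        for my $j (0 .. $i - 1) {
            # The callback receives the two objects being compared
            push @row, $metric->($objects->[$i], $objects->[$j]);
        }
        $matrix[$i] = \@row;
    }
    return \@matrix;
}

# abs(X-Y) on plain numbers, the default metric this document describes
my $matrix = lower_diagonal(sub { abs($_[0] - $_[1]) }, [1, 4, 9]);
# $matrix is now [ [], [3], [8, 5] ]
```

Only n*(n-1)/2 calls to the callback are needed in this layout, which matters when the metric is expensive.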
One of qw/lower upper full/ for a lower diagonal, upper diagonal, or full distance matrix.
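The difference between the three shapes can be sketched by listing which index pairs each one covers. This is an illustration only: pairs_for_mode is a hypothetical helper, and it assumes that the lower and upper shapes leave out the diagonal while the full shape includes it. The module's actual row layout may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: the (i, j) index pairs each shape would cover
# for $n objects, assuming 'lower' and 'upper' exclude the diagonal.
sub pairs_for_mode {
    my ($n, $mode) = @_;
    my @pairs;
    for my $i (0 .. $n - 1) {
        for my $j (0 .. $n - 1) {
            next if $mode eq 'lower' && $j >= $i;   # keep only j < i
            next if $mode eq 'upper' && $j <= $i;   # keep only j > i
            push @pairs, [$i, $j];
        }
    }
    return \@pairs;
}

for my $mode (qw/lower upper full/) {
    printf "%-5s covers %d pairs\n", $mode, scalar @{ pairs_for_mode(4, $mode) };
}
```

For a symmetric metric the lower and upper shapes carry the same information, so the full shape is mainly useful when the measure depends on argument order.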
Callback for computing the distance, similarity, or whatever measure you like.
    $matrix->metric(\&mydistance);
Where mydistance receives two objects as its first two arguments.
If you need to pass special parameters to your method:
    $matrix->metric(sub { my ($x, $y) = @_; mydistance(first => $x, second => $y, mode => 'fast') });
You may use any metric, and may return any number or object. Note that if you plan to use this with Algorithm::Cluster, this needs to be a distance metric. So, if you're measuring how similar two things are, on a scale of 1-10, then you should return 10-$similarity to get a distance.
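The similarity-to-distance conversion can be wrapped once and reused. The sketch below is only an illustration: as_distance and the toy similarity function are hypothetical names, not part of this module.

```perl
use strict;
use warnings;

# Hypothetical wrapper: turns a similarity callback with a known
# maximum score into a distance callback ($max - similarity).
sub as_distance {
    my ($similarity, $max) = @_;
    return sub { $max - $similarity->(@_) };
}

# Toy similarity on a 1-10 scale: 10 for equal strings, 1 otherwise
my $sim  = sub { $_[0] eq $_[1] ? 10 : 1 };
my $dist = as_distance($sim, 10);

print "same:      ", $dist->('cat', 'cat'), "\n";   # prints 0
print "different: ", $dist->('cat', 'dog'), "\n";   # prints 9
```

The resulting code reference takes two objects, so it has the shape this document describes for a metric callback.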
The default is the absolute value of the scalar difference (i.e. abs(X-Y)).
Array reference. It doesn't matter what kind of objects are in the array, as long as your metric can process them.
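As an example of non-scalar objects, the sketch below pairs an array of hash references with a Euclidean metric. The data and the $euclid callback are made up for illustration. The block runs without the module installed, and the commented-out constructor call follows the form shown at the top of this document.

```perl
use strict;
use warnings;

# Hypothetical objects: 2D points stored as hash references
my @points = (
    { x => 0, y => 0 },
    { x => 3, y => 4 },
    { x => 6, y => 8 },
);

# A metric that knows how to compare two such points
my $euclid = sub {
    my ($p, $q) = @_;
    return sqrt( ($p->{x} - $q->{x})**2 + ($p->{y} - $q->{y})**2 );
};

print $euclid->($points[0], $points[1]), "\n";   # prints 5

# With the module installed, these would be passed as:
#   Algorithm::DistanceMatrix->new(metric => $euclid, objects => \@points);
```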
2D array of distances (or similarities, or whatever) between your objects.
(An ArrayRef of ArrayRefs.)
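Reading a value back out of a lower diagonal result takes a little care, since only one of each pair of entries is stored. The accessor below is a hypothetical sketch, not part of the module. It assumes a ragged layout in which row i holds the distances to objects 0 .. i-1, a symmetric metric, and zero distance from an object to itself.

```perl
use strict;
use warnings;

# Hypothetical accessor for a ragged lower-diagonal ArrayRef of ArrayRefs
sub distance_between {
    my ($matrix, $i, $j) = @_;
    return 0 if $i == $j;               # assumes d(x, x) = 0
    ($i, $j) = ($j, $i) if $i < $j;     # only entries with $i > $j are stored
    return $matrix->[$i][$j];
}

# abs(X-Y) distances for the objects 1, 4, 9 in the assumed layout
my $lower = [ [], [3], [8, 5] ];

print distance_between($lower, 0, 2), "\n";   # prints 8
print distance_between($lower, 2, 1), "\n";   # prints 5
```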
Chad A. Davis <chad.a.davis@gmail.com>
This software is copyright (c) 2011 by Chad A. Davis.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
To install Algorithm::DistanceMatrix, copy and paste the appropriate command into your terminal.
cpanm
cpanm Algorithm::DistanceMatrix
CPAN shell
perl -MCPAN -e shell
install Algorithm::DistanceMatrix