
Statistics::Cluto - Perl binding for CLUTO

Download CLUTO from http://glaros.dtc.umn.edu/gkhome/views/cluto.
Find the libcluto.a that matches your platform and place it in your library path, or point to its directory with the LIBS option as shown below (-L takes the directory that contains libcluto.a, not the file itself).
Then run:
perl Makefile.PL [LIBS='-L/path/to/cluto/lib -lcluto']
make
make test
make install
Tested with cluto-2.1.2/Darwin-i386, cluto-2.1.2/Darwin-ppc and cluto-2.1.1/Linux-i686.

use Statistics::Cluto;
use Data::Dumper;
my $c = new Statistics::Cluto;
$c->set_dense_matrix(4, 5, [
[8, 8, 0, 3, 2],
[2, 9, 9, 1, 4],
[7, 6, 1, 2, 3],
[1, 7, 8, 2, 1]
]);
$c->set_options({
rowlabels => [ 'row0', 'row1', 'row2', 'row3' ],
collabels => [ 'col0', 'col1', 'col2', 'col3', 'col4' ],
nclusters => 2,
rowmodel => CLUTO_ROWMODEL_NONE,
colmodel => CLUTO_COLMODEL_NONE,
pretty_format => 1,
});
my $clusters = $c->VP_ClusterRB;
print Dumper $clusters;
my $cluster_features = $c->V_GetClusterFeatures;
print Dumper $cluster_features;

This is a Perl binding for CLUTO. See sections 5.6 to 5.8 of the CLUTO manual for details of each function; Statistics::Cluto provides a corresponding method for every function described there.
The initial matrix can be loaded with either the set_dense_matrix or the set_sparse_matrix method.
# loading 4x5 dense matrix
#
# 1 1 0 1 1
# 1 0 0 1 0
# 0 1 1 0 0
# 0 0 1 0 0
my $c = new Statistics::Cluto;
my $nrows = 4;
my $ncols = 5;
my $rowval = [
[1, 1, 0, 1, 1],
[1, 0, 0, 1, 0],
[0, 1, 1, 0, 0],
[0, 0, 1, 0, 0]
];
$c->set_dense_matrix($nrows, $ncols, $rowval);
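Hard-coding $nrows and $ncols makes it easy for them to drift out of step with the data. The pure-Perl sketch below derives both from the row array and rejects ragged rows before anything reaches set_dense_matrix. dense_dims is a hypothetical helper, not part of Statistics::Cluto, and the module itself is not loaded, so the snippet runs without libcluto.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Cluto): return (nrows, ncols)
# for a dense matrix given as an array ref of row array refs; die on an empty
# matrix or on rows of differing length.
sub dense_dims {
    my ($rows) = @_;
    my $nrows = scalar @$rows;
    die "empty matrix\n" unless $nrows;
    my $ncols = scalar @{ $rows->[0] };
    for my $i (0 .. $nrows - 1) {
        my $len = scalar @{ $rows->[$i] };
        die "row $i has $len columns, expected $ncols\n" if $len != $ncols;
    }
    return ($nrows, $ncols);
}

my $rowval = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
];
my ($nrows, $ncols) = dense_dims($rowval);
print "$nrows x $ncols\n";    # prints "4 x 5"

# With the binding installed, the result would feed straight into:
# $c->set_dense_matrix($nrows, $ncols, $rowval);
```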
# loading 4x5 sparse matrix
#
# 1 1 0 1 1
# 1 0 0 1 0
# 0 1 1 0 0
# 0 0 1 0 0
my $c = new Statistics::Cluto;
my $nrows = 4;
my $ncols = 5;
my $rowval = [
[1, 1, 2, 1, 4, 1, 5, 1],
[1, 1, 4, 1],
[2, 1, 3, 1],
[3, 1]
];
$c->set_sparse_matrix($nrows, $ncols, $rowval);
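In set_sparse_matrix, each row is a flat list of (column, value) pairs, and the example above uses 1-based column numbers. A minimal sketch of producing that layout from dense rows follows. dense_to_sparse_rows is a hypothetical helper; zero entries are simply omitted, and how the binding treats an all-zero row (an empty list here) is an assumption worth checking.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Cluto): convert dense rows
# into per-row lists of (column, value) pairs with 1-based column numbers,
# the layout used with set_sparse_matrix above. Zero entries are dropped.
sub dense_to_sparse_rows {
    my ($dense) = @_;
    my @sparse;
    for my $row (@$dense) {
        my @pairs;
        for my $j (0 .. $#$row) {
            next if $row->[$j] == 0;
            push @pairs, $j + 1, $row->[$j];    # 1-based column, then value
        }
        push @sparse, \@pairs;
    }
    return \@sparse;
}

my $sparse = dense_to_sparse_rows([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
]);
# $sparse is [[1,1, 2,1, 4,1, 5,1], [1,1, 4,1], [2,1, 3,1], [3,1]]
# $c->set_sparse_matrix(4, 5, $sparse);
```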
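A sparse matrix can also be loaded with set_raw_sparse_matrix, which takes the arrays rowptr, rowind and rowval in the format described in section 3.3 (Figure 16) of the manual. Note that rowind holds 0-based column indices, whereas set_sparse_matrix uses 1-based column numbers.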
# loading sparse matrix via set_raw_sparse_matrix()
#
# 1 1 0 1 1
# 1 0 0 1 0
# 0 1 1 0 0
# 0 0 1 0 0
my $c = new Statistics::Cluto;
my $nrows = 4;
my $ncols = 5;
my $rowptr = [0, 4, 6, 8, 9];
my $rowind = [0, 1, 3, 4, 0, 3, 1, 2, 2];
my $rowval = [1, 1, 1, 1, 1, 1, 1, 1, 1];
$c->set_raw_sparse_matrix($nrows, $ncols, $rowptr, $rowind, $rowval);
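The rowptr/rowind/rowval arrays are compressed sparse row (CSR) storage: row i's entries occupy positions rowptr[i] through rowptr[i+1] - 1 of rowind and rowval. The sketch below builds those arrays from a dense matrix and reproduces the values in the example above. dense_to_csr is a hypothetical pure-Perl helper, not part of the binding.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Cluto): build the CSR arrays
# (rowptr, rowind, rowval) taken by set_raw_sparse_matrix from a dense matrix.
# Column indices are 0-based; rowptr has nrows + 1 entries and row i occupies
# positions rowptr[i] .. rowptr[i+1] - 1 of rowind and rowval.
sub dense_to_csr {
    my ($dense) = @_;
    my @rowptr = (0);
    my (@rowind, @rowval);
    for my $row (@$dense) {
        for my $j (0 .. $#$row) {
            next if $row->[$j] == 0;
            push @rowind, $j;
            push @rowval, $row->[$j];
        }
        push @rowptr, scalar @rowind;    # end of this row = start of the next
    }
    return (\@rowptr, \@rowind, \@rowval);
}

my ($rowptr, $rowind, $rowval) = dense_to_csr([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
]);
# $rowptr is [0, 4, 6, 8, 9]
# $rowind is [0, 1, 3, 4, 0, 3, 1, 2, 2]
# $rowval is [1, 1, 1, 1, 1, 1, 1, 1, 1]
# $c->set_raw_sparse_matrix(4, 5, $rowptr, $rowind, $rowval);
```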
The input parameters nrows, ncols, rowptr, rowind and rowval are set automatically when the initial matrix is loaded. All other input parameters should be set with the set_options method before a clustering function is called; see sections 5.6 to 5.8 of the manual for the parameters each function needs.
$c->set_options({
rowlabels => ['row0', 'row1', 'row2', 'row3'],
collabels => ['col0', 'col1', 'col2', 'col3', 'col4'],
nclusters => 2,
nfeatures => 2,
clfun => CLUTO_CLFUN_I2,
treetype => CLUTO_TREE_TOP,
});
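Because rowlabels and collabels are supplied separately from the matrix, their lengths can drift out of step with nrows and ncols. Assuming one label per row and per column is expected, a hypothetical check such as label_problems below can run before any clustering method. It only inspects Perl data and does not call the binding.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Cluto): compare the lengths of
# rowlabels/collabels in an options hash with the matrix dimensions, assuming
# one label per row and one per column. Returns a list of problem strings.
sub label_problems {
    my ($nrows, $ncols, $opts) = @_;
    my @problems;
    for my $spec (['rowlabels', $nrows], ['collabels', $ncols]) {
        my ($key, $want) = @$spec;
        next unless exists $opts->{$key};
        my $got = scalar @{ $opts->{$key} };
        push @problems, "$key has $got entries, expected $want" if $got != $want;
    }
    return @problems;
}

my $opts = {
    rowlabels => ['row0', 'row1', 'row2', 'row3', 'row4'],    # one too many for 4 rows
    collabels => ['col0', 'col1', 'col2', 'col3', 'col4'],
};
my @problems = label_problems(4, 5, $opts);
warn "$_\n" for @problems;    # rowlabels has 5 entries, expected 4
# $c->set_options($opts) unless @problems;
```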
The CLUTO API functions described in sections 5.6 to 5.8 of the manual can be called as methods of the same name, minus the "CLUTO_" prefix.
For example, CLUTO_VP_ClusterDirect (section 5.6.1) is called VP_ClusterDirect in this package.
Routines with a single output parameter return a single value or array reference. Routines with multiple output parameters return a list of those parameters, in the same order as they appear in the manual.
# suppose $c is initialized with a 5x5 sparse matrix:
# col0 ... col4
# row0: 2 2 0 2 2
# row1: 2 1 0 1 4
# row2: 0 2 5 0 0
# row3: 0 1 6 0 0
# row4: 2 1 0 3 4
$c->set_options({
rowlabels => ['row0', 'row1', 'row2', 'row3', 'row4'],
collabels => ['col0', 'col1', 'col2', 'col3', 'col4'],
nclusters => 2,
nfeatures => 2,
});
my $part = $c->VP_ClusterDirect;
# $part = [
# '1',
# '1',
# '0',
# '0',
# '1'
# ];
my ($internalids, $internalwgts, $externalids, $externalwgts) = $c->V_GetClusterFeatures;
# $internalids =
# [
# '2',
# '0',
# '4',
# '0'
# ]
# $internalwgts =
# [
# '1',
# '0',
# '0.598181843757629',
# '0.209491595625877'
# ]
# $externalids =
# [
# '2',
# '4',
# '2',
# '4'
# ]
# $externalwgts =
# [
# '0.5',
# '0.299090921878815',
# '0.5',
# '0.299090921878815'
# ]
Please refer to the manual for the details of the returned data structure.
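Judging from the example above (nclusters = 2, nfeatures = 2), the flat arrays hold nfeatures entries per cluster, cluster by cluster: cluster 0's descriptive features are ids 2 and 0, and cluster 1's are 4 and 0, which matches the pretty_format output further below. A sketch of splitting them per cluster under that assumption follows; features_by_cluster is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Cluto): split the flat
# id/weight arrays returned by V_GetClusterFeatures into one list per
# cluster, assuming cluster i's features sit at positions
# i*nfeatures .. (i+1)*nfeatures - 1 (as in the example above).
sub features_by_cluster {
    my ($ids, $wgts, $nfeatures) = @_;
    my @clusters;
    for my $k (0 .. $#$ids) {
        my $cid = int($k / $nfeatures);
        push @{ $clusters[$cid] }, [ $ids->[$k], $wgts->[$k] ];    # [id, weight]
    }
    return \@clusters;
}

# Values copied from the $internalids / $internalwgts example above.
my $internal = features_by_cluster(
    ['2', '0', '4', '0'],
    ['1', '0', '0.598181843757629', '0.209491595625877'],
    2,
);
# cluster 0: [['2', '1'], ['0', '0']]
# cluster 1: [['4', '0.598181843757629'], ['0', '0.209491595625877']]
```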
When the pretty_format option is set to 1, each method returns a single reference (an array reference in the examples below) whose entries carry the row and column labels, which is hopefully a little easier to read. The meaning of the returned data should be largely self-explanatory.
# with the same matrix and options as above...
$c->set_options({ pretty_format => 1 });
my $result = $c->VP_ClusterDirect;
# $result =
# [
# [
# { 'row' => 2, 'rowlabel' => 'row2' },
# { 'row' => 3, 'rowlabel' => 'row3' }
# ],
# [
# { 'row' => 0, 'rowlabel' => 'row0' },
# { 'row' => 1, 'rowlabel' => 'row1' },
# { 'row' => 4, 'rowlabel' => 'row4' }
# ]
# ];
$result = $c->V_GetClusterFeatures;
# $result =
# [
# [
# {
# 'discriminating' => [
# {
# 'externalwgt' => '0.5',
# 'collabel' => 'col2',
# 'externalid' => 2
# },
# {
# 'externalwgt' => '0.299090921878815',
# 'collabel' => 'col4',
# 'externalid' => 4
# }
# ],
# 'descriptive' => [
# {
# 'internalid' => 2,
# 'internalwgt' => '1',
# 'collabel' => 'col2'
# },
# {
# 'internalid' => 0,
# 'internalwgt' => '0',
# 'collabel' => 'col0'
# }
# ]
# },
# {
# 'discriminating' => [
# {
# 'externalwgt' => '0.5',
# 'collabel' => 'col2',
# 'externalid' => 2
# },
# {
# 'externalwgt' => '0.299090921878815',
# 'collabel' => 'col4',
# 'externalid' => 4
# }
# ],
# 'descriptive' => [
# {
# 'internalid' => 4,
# 'internalwgt' => '0.598181843757629',
# 'collabel' => 'col4'
# },
# {
# 'internalid' => 0,
# 'internalwgt' => '0.209491595625877',
# 'collabel' => 'col0'
# }
# ]
# }
# ]
# ];
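The pretty_format partition above can be reproduced from the raw partition vector and the row labels, which may help when pretty_format is off. group_partition is a hypothetical pure-Perl helper; skipping negative cluster ids is a defensive assumption, not documented behavior of the binding.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Cluto): regroup a raw partition
# vector into one array ref per cluster of { row, rowlabel } hashes, the shape
# of the pretty_format result shown above. Negative cluster ids are skipped
# as a defensive assumption.
sub group_partition {
    my ($part, $rowlabels) = @_;
    my @clusters;
    for my $row (0 .. $#$part) {
        my $cid = $part->[$row];
        next if $cid < 0;
        push @{ $clusters[$cid] }, { row => $row, rowlabel => $rowlabels->[$row] };
    }
    return \@clusters;
}

# Raw partition from the VP_ClusterDirect example above.
my $grouped = group_partition(
    ['1', '1', '0', '0', '1'],
    ['row0', 'row1', 'row2', 'row3', 'row4'],
);
for my $cid (0 .. $#$grouped) {
    my @labels = map { $_->{rowlabel} } @{ $grouped->[$cid] || [] };
    print "cluster $cid: @labels\n";
}
# cluster 0: row2 row3
# cluster 1: row0 row1 row4
```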

use Statistics::Cluto qw(:all);
This exports all constants defined in cluto.h (the export list was auto-generated by h2xs). See section 5 of the CLUTO manual, or cluto.h, for details.
CLUTO_CLFUN_CLINK CLUTO_CLFUN_CLINK_W CLUTO_CLFUN_CUT CLUTO_CLFUN_E1 CLUTO_CLFUN_G1 CLUTO_CLFUN_G1P CLUTO_CLFUN_H1 CLUTO_CLFUN_H2 CLUTO_CLFUN_I1 CLUTO_CLFUN_I2 CLUTO_CLFUN_MMCUT CLUTO_CLFUN_NCUT CLUTO_CLFUN_RCUT CLUTO_CLFUN_SLINK CLUTO_CLFUN_SLINK_W CLUTO_CLFUN_UPGMA CLUTO_CLFUN_UPGMA_W
CLUTO_COLMODEL_IDF CLUTO_COLMODEL_NONE
CLUTO_CSTYPE_BESTFIRST CLUTO_CSTYPE_LARGEFIRST CLUTO_CSTYPE_LARGESUBSPACEFIRST
CLUTO_DBG_APROGRESS CLUTO_DBG_CCMPSTAT CLUTO_DBG_CPROGRESS CLUTO_DBG_MPROGRESS CLUTO_DBG_PROGRESS CLUTO_DBG_RPROGRESS
CLUTO_GRMODEL_ASYMETRIC_DIRECT CLUTO_GRMODEL_ASYMETRIC_LINKS CLUTO_GRMODEL_EXACT_ASYMETRIC_DIRECT CLUTO_GRMODEL_EXACT_ASYMETRIC_LINKS CLUTO_GRMODEL_EXACT_SYMETRIC_DIRECT CLUTO_GRMODEL_EXACT_SYMETRIC_LINKS CLUTO_GRMODEL_INEXACT_ASYMETRIC_DIRECT CLUTO_GRMODEL_INEXACT_ASYMETRIC_LINKS CLUTO_GRMODEL_INEXACT_SYMETRIC_DIRECT CLUTO_GRMODEL_INEXACT_SYMETRIC_LINKS CLUTO_GRMODEL_NONE CLUTO_GRMODEL_SYMETRIC_DIRECT CLUTO_GRMODEL_SYMETRIC_LINKS
CLUTO_MEM_NOREUSE CLUTO_MEM_REUSE
CLUTO_MTYPE_HEDGE CLUTO_MTYPE_HSTAR CLUTO_MTYPE_HSTAR2
CLUTO_OPTIMIZER_MULTILEVEL CLUTO_OPTIMIZER_SINGLELEVEL
CLUTO_ROWMODEL_LOG CLUTO_ROWMODEL_MAXTF CLUTO_ROWMODEL_NONE CLUTO_ROWMODEL_SQRT
CLUTO_SIM_CORRCOEF CLUTO_SIM_COSINE CLUTO_SIM_EDISTANCE CLUTO_SIM_EJACCARD
CLUTO_SUMMTYPE_MAXCLIQUES CLUTO_SUMMTYPE_MAXITEMSETS
CLUTO_TREE_FULL CLUTO_TREE_TOP
CLUTO_VER_MAJOR CLUTO_VER_MINOR CLUTO_VER_SUBMINOR

http://glaros.dtc.umn.edu/gkhome/views/cluto

Ikuhiro IHARA <tsukue@gmail.com>

Copyright (C) 2007 by Ikuhiro IHARA
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.5 or, at your option, any later version of Perl 5 you may have available.