Module Version: 0.1203
Statistics::RankCorrelation - Compute the rank correlation between two vectors

    use Statistics::RankCorrelation;

    $x = [ 8, 7, 6, 5, 4, 3, 2, 1 ];
    $y = [ 2, 1, 5, 3, 4, 7, 8, 6 ];

    $c = Statistics::RankCorrelation->new( $x, $y, sorted => 1 );

    $n = $c->spearman;
    $t = $c->kendall;
    $m = $c->csim;

    $s = $c->size;
    $xd = $c->x_data;
    $yd = $c->y_data;
    $xr = $c->x_rank;
    $yr = $c->y_rank;
    $xt = $c->x_ties;
    $yt = $c->y_ties;

This module computes rank correlation coefficient measures between two sample vectors.

Examples can be found in the distribution's `eg/` directory and in the method tests.

    $c = Statistics::RankCorrelation->new;
    $c = Statistics::RankCorrelation->new( \@u, \@v );
    $c = Statistics::RankCorrelation->new( \@u, \@v, sorted => 1 );

This method constructs a new `Statistics::RankCorrelation` object.

If given two numeric vectors (as array references), the statistical ranks are computed. If the vectors are of different size, the shorter is padded with zeros.

If the `sorted` flag is set, both are sorted by the first (**x**) vector.

    $c->x_data( $x );
    $x = $c->x_data;

Set or return the x data as a one-dimensional array reference. This is the "unit" array, used as a reference for size and iteration.

    $c->y_data( $y );
    $y = $c->y_data;

Set or return the y data as a one-dimensional array reference. This vector is dependent on the x vector.

    $c->size( $s );
    $s = $c->size;

Set or return the number of array elements.

    $c->x_rank( $r );
    $r = $c->x_rank;

Set or return the ranks as an array reference.

    $c->y_rank( $r );
    $r = $c->y_rank;

Set or return the ranks as an array reference.

    $c->x_ties( $t );
    $t = $c->x_ties;

Set or return the ties as a hash reference.

    $c->y_ties( $t );
    $t = $c->y_ties;

Set or return the ties as a hash reference.

    $n = $c->spearman;

$$1 - \frac{6 \sum_{i=1}^{n} (x_i - y_i)^2}{n^3 - n}$$

Return Spearman's rho.

Spearman's rho rank-order correlation is a nonparametric measure of association based on the rank of the data values and is a special case of the Pearson product-moment correlation.

Here `x` and `y` are the two rank vectors and `i` is an index running from one to **n**, the number of samples.
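The formula above can be checked by hand with a few lines of plain Perl. This is only a minimal sketch: `spearman_sketch` is a hypothetical helper, not part of the module, and it assumes that both arguments are already rank vectors of equal length with no tie correction applied.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): Spearman's rho computed
# directly from two equal-length rank vectors with the formula above.
sub spearman_sketch {
    my ( $x, $y ) = @_;
    my $n = @$x;
    die "rank vectors must have the same length\n" unless $n == @$y;
    die "need at least two samples\n" if $n < 2;

    # Sum of squared rank differences.
    my $sum = 0;
    $sum += ( $x->[$_] - $y->[$_] ) ** 2 for 0 .. $n - 1;

    return 1 - ( 6 * $sum ) / ( $n ** 3 - $n );
}

my $rho_same = spearman_sketch( [ 1, 2, 3, 4 ], [ 1, 2, 3, 4 ] );
my $rho_rev  = spearman_sketch( [ 1, 2, 3, 4 ], [ 4, 3, 2, 1 ] );
printf "%.2f %.2f\n", $rho_same, $rho_rev;    # prints "1.00 -1.00"
```

Identical rank vectors give 1 and exactly reversed ones give -1, which are the two extremes of the measure.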

    $t = $c->kendall;

$$t = \frac{c - d}{n (n - 1) / 2}$$

Return Kendall's tau.

Here **c** and **d** are the numbers of concordant and discordant pairs, and **n** is the number of samples.
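To make "concordant" and "discordant" concrete, the following sketch counts the pairs by brute force. `kendall_sketch` is a hypothetical name, and the treatment of ties (a tied pair counts as neither concordant nor discordant) is an assumption of this sketch, not a statement about how the module handles them.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): Kendall's tau by counting
# concordant and discordant pairs over all i < j.
sub kendall_sketch {
    my ( $x, $y ) = @_;
    my $n = @$x;
    die "vectors must have the same length\n" unless $n == @$y;
    die "need at least two samples\n" if $n < 2;

    my ( $c, $d ) = ( 0, 0 );
    for my $i ( 0 .. $n - 2 ) {
        for my $j ( $i + 1 .. $n - 1 ) {
            # Positive product: both vectors order the pair the same way.
            my $s = ( $x->[$i] <=> $x->[$j] ) * ( $y->[$i] <=> $y->[$j] );
            $c++ if $s > 0;
            $d++ if $s < 0;    # assumption: ties add to neither count
        }
    }
    return ( $c - $d ) / ( $n * ( $n - 1 ) / 2 );
}

printf "%.3f\n", kendall_sketch( [ 1, 2, 3 ], [ 1, 3, 2 ] );    # prints "0.333"
```

In the example there are three pairs, two concordant and one discordant, so the result is (2 - 1) / 3.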

    $n = $c->csim;

Return the contour similarity index measure. This is a single dimensional measure of the similarity between two vectors.

This returns a measure in the (inclusive) range `[-1..1]` and is computed using matrices of binary data representing "higher or lower" values in the original vectors.

This measure has been studied in musical contour analysis.

    $v = [qw(1 3.2 2.1 3.2 3.2 4.3)];
    $ranks = rank($v);                # [1, 4, 2, 4, 4, 6]
    my( $ranks, $ties ) = rank($v);   # [1, 4, 2, 4, 4, 6], { 1=>[], 3.2=>[]}

Return a list consisting of an array reference of the ordinal ranks and a hash reference of the tied data.

In the case of a tie in the data (identical values) the rank numbers are averaged. An example will elucidate:

    sorted data:    [ 1.0, 2.1, 3.2, 3.2, 3.2, 4.3 ]
    ranks:          [ 1,   2,   3,   4,   5,   6   ]
    tied ranks:     3, 4, and 5
    tied average:   (3 + 4 + 5) / 3 == 4
    averaged ranks: [ 1,   2,   4,   4,   4,   6   ]
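The averaging step in the worked example can be sketched in plain Perl as follows. `rank_sketch` is a hypothetical helper written for illustration; it returns only the ranks (not the ties hash) and makes no claim to match the module's internals.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): average ranks for tied values.
sub rank_sketch {
    my ($v) = @_;

    # Record the 1-based sorted positions occupied by each distinct value.
    my @sorted = sort { $a <=> $b } @$v;
    my %positions;
    push @{ $positions{ 0 + $sorted[$_] } }, $_ + 1 for 0 .. $#sorted;

    # A value's rank is the mean of the positions it occupies.
    my %avg;
    for my $value ( keys %positions ) {
        my $sum = 0;
        $sum += $_ for @{ $positions{$value} };
        $avg{$value} = $sum / @{ $positions{$value} };
    }

    # Report the ranks in the original (unsorted) order of the data.
    return [ map { $avg{ 0 + $_ } } @$v ];
}

my $ranks = rank_sketch( [ 1, 3.2, 2.1, 3.2, 3.2, 4.3 ] );
print "@$ranks\n";    # prints "1 4 2 4 4 6"
```

The three occurrences of 3.2 occupy sorted positions 3, 4 and 5, so each receives the averaged rank 4, as in the example above.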

    ( $u, $v ) = pad_vectors( [ 1, 2, 3, 4 ], [ 9, 8 ] );
    # [1, 2, 3, 4], [9, 8, 0, 0]

Append zeros to either input vector for all values in the other that do not have a corresponding value. That is, "pad" the tail of the shorter vector with zero values.
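The padding behaviour described here is simple enough to sketch directly. `pad_vectors_sketch` is a hypothetical stand-in for illustration; it copies its inputs rather than modifying them, which is a design choice of this sketch and not necessarily what the module does.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): pad the shorter vector's
# tail with zeros so that both vectors have the same length.
sub pad_vectors_sketch {
    my ( $u, $v ) = @_;
    my @u = @$u;    # work on copies; the callers' arrays are left untouched
    my @v = @$v;

    push @u, (0) x ( @v - @u ) if @u < @v;
    push @v, (0) x ( @u - @v ) if @v < @u;

    return ( \@u, \@v );
}

my ( $u, $v ) = pad_vectors_sketch( [ 1, 2, 3, 4 ], [ 9, 8 ] );
print "@$u | @$v\n";    # prints "1 2 3 4 | 9 8 0 0"
```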

    ( $u, $v ) = co_sort( $u, $v );

Sort the vectors as two-dimensional data-point pairs, with the **u** values sorted first.
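One way to sort two vectors as pairs is to sort an index list and then apply it to both arrays. The sketch below does that; `co_sort_sketch` is a hypothetical helper, and breaking ties in **u** by the **v** value is an assumption made here for determinism.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): reorder both vectors so that
# the (u, v) pairs are sorted by u, with v as an assumed tie-breaker.
sub co_sort_sketch {
    my ( $u, $v ) = @_;
    die "vectors must have the same length\n" unless @$u == @$v;

    # Sort the indices, then use them to slice both arrays in step.
    my @order = sort { $u->[$a] <=> $u->[$b] || $v->[$a] <=> $v->[$b] }
        0 .. $#$u;

    return ( [ @{$u}[@order] ], [ @{$v}[@order] ] );
}

my ( $su, $sv ) = co_sort_sketch( [ 3, 1, 2 ], [ 30, 10, 20 ] );
print "@$su | @$sv\n";    # prints "1 2 3 | 10 20 30"
```

Each **v** value stays attached to its original **u** partner, which is the point of sorting the vectors together rather than separately.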

    $matrix = correlation_matrix( $u );

Return the correlation matrix for a single vector.

This function builds a square, binary matrix that represents "higher or lower" value within the vector itself.
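A "higher or lower" matrix of this kind might look like the sketch below. `correlation_matrix_sketch` is a hypothetical helper, and the cell convention (1 when the row element is lower than the column element, 0 otherwise) is an assumption of this sketch; the module's convention may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): square binary matrix in which
# cell [i][j] is 1 when element i is lower than element j, else 0.
sub correlation_matrix_sketch {
    my ($u) = @_;
    my @m;
    for my $i ( 0 .. $#$u ) {
        for my $j ( 0 .. $#$u ) {
            $m[$i][$j] = $u->[$i] < $u->[$j] ? 1 : 0;    # assumed convention
        }
    }
    return \@m;
}

my $matrix = correlation_matrix_sketch( [ 1, 3, 2 ] );
print "@$_\n" for @$matrix;
# prints:
# 0 1 1
# 0 0 0
# 0 1 0
```

The diagonal is always 0 under this convention, since no element is lower than itself.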

Return the sign of a number: 0 for zero, 1 for a positive number, or -1 for a negative number.

Handle any number of vectors instead of just two.

Implement other rank correlation measures that are out there...

For the `csim` method:

http://personal.systemsbiology.net/ilya/Publications/JNMRcontour.pdf

For the `spearman` and `kendall` methods:

http://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html

http://en.wikipedia.org/wiki/Kendall's_tau

For helping make this sturdier code:

Thomas Breslin <thomas@thep.lu.se>

Jerome <jerome.hert@free.fr>

Jon Schutz <Jon.Schutz@youramigo.com>

Andy Lee <yikes2000@yahoo.com>

Gene Boggs <gene@cpan.org>

Copyright 2010, Gene Boggs, All Rights Reserved.

This program is free software; you can redistribute or modify it under the same terms as Perl itself.
