# NAME

Statistics::RankCorrelation - Compute the rank correlation between two vectors

version 0.1204

# SYNOPSIS

```
use Statistics::RankCorrelation;

$x = [ 8, 7, 6, 5, 4, 3, 2, 1 ];
$y = [ 2, 1, 5, 3, 4, 7, 8, 6 ];

$c = Statistics::RankCorrelation->new( $x, $y, sorted => 1 );

$n = $c->spearman;
$t = $c->kendall;
$m = $c->csim;

$s = $c->size;
$xd = $c->x_data;
$yd = $c->y_data;
$xr = $c->x_rank;
$yr = $c->y_rank;
$xt = $c->x_ties;
$yt = $c->y_ties;
```

# DESCRIPTION

This module computes rank correlation coefficient measures between two sample vectors.

Examples can be found in the distribution's `eg/` directory and in the methods test.

# METHODS

## new

```
$c = Statistics::RankCorrelation->new;
$c = Statistics::RankCorrelation->new( \@u, \@v );
$c = Statistics::RankCorrelation->new( \@u, \@v, sorted => 1 );
```

This method constructs a new `Statistics::RankCorrelation` object.

If given two numeric vectors (as array references), the statistical ranks are computed. If the vectors are of different size, the shorter is padded with zeros.

If the `sorted` flag is set, both vectors are sorted by the first (x) vector.

## x_data

```
$c->x_data( $x );
$x = $c->x_data;
```

Set or return the x data as a one-dimensional array reference. This is the "unit" array, used as a reference for size and iteration.

## y_data

```
$c->y_data( $y );
$y = $c->y_data;
```

Set or return the y data as a one-dimensional array reference. This vector is dependent on the x vector.

## size

```
$c->size( $s );
$s = $c->size;
```

Set or return the number of array elements.

## x_rank

```
$c->x_rank( $r );
$r = $c->x_rank;
```

Set or return the x ranks as an array reference.

## y_rank

```
$c->y_rank( $r );
$r = $c->y_rank;
```

Set or return the y ranks as an array reference.

## x_ties

```
$c->x_ties( $t );
$t = $c->x_ties;
```

Set or return the x ties as a hash reference.

## y_ties

```
$c->y_ties( $t );
$t = $c->y_ties;
```

Set or return the y ties as a hash reference.

## spearman

```
$n = $c->spearman;

    6 * sum( (xi - yi)^2 )
1 - --------------------------
             n^3 - n
```

Return Spearman's rho.

Spearman's rho rank-order correlation is a nonparametric measure of association based on the rank of the data values and is a special case of the Pearson product-moment correlation.

Here `x` and `y` are the two rank vectors, `n` is the number of samples, and `i` is an index running from 1 to `n`.
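As an illustration only, the formula above can be evaluated directly from two rank vectors. This is a minimal sketch that assumes equal-length rank vectors and applies no tie correction; the helper name `spearman_rho` is hypothetical and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::RankCorrelation):
# Spearman's rho from two equal-length rank vectors, without tie correction.
sub spearman_rho {
    my ( $x, $y ) = @_;
    my $n = @$x;
    die "rank vectors must have the same size\n" unless $n == @$y;
    die "need at least two samples\n" if $n < 2;

    # Sum of squared rank differences: sum( (xi - yi)^2 )
    my $sum = 0;
    $sum += ( $x->[$_] - $y->[$_] ) ** 2 for 0 .. $n - 1;

    return 1 - ( 6 * $sum ) / ( $n ** 3 - $n );
}

# The SYNOPSIS vectors hold the values 1..8 with no ties, so each value
# is also its own ascending rank.
my @x_rank = ( 8, 7, 6, 5, 4, 3, 2, 1 );
my @y_rank = ( 2, 1, 5, 3, 4, 7, 8, 6 );
printf "%.4f\n", spearman_rho( \@x_rank, \@y_rank );    # prints -0.8333
```

For these vectors the squared differences sum to 154, giving 1 - 924 / 504.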

## kendall

```
$t = $c->kendall;

        c - d
t = -------------
    n (n - 1) / 2
```

Return Kendall's tau.

Here, `c` and `d` are the numbers of concordant and discordant pairs, and `n` is the number of samples.
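The pair counting can be sketched as follows. This is an illustrative example, not the module's code: the helper name `kendall_tau` is hypothetical, pairs tied in either vector are counted as neither concordant nor discordant, and no tie correction is applied.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::RankCorrelation):
# Kendall's tau as (c - d) / (n (n - 1) / 2), without tie correction.
sub kendall_tau {
    my ( $x, $y ) = @_;
    my $n = @$x;
    die "vectors must have the same size\n" unless $n == @$y;
    die "need at least two samples\n" if $n < 2;

    my ( $c, $d ) = ( 0, 0 );
    for my $i ( 0 .. $n - 2 ) {
        for my $j ( $i + 1 .. $n - 1 ) {
            # A pair is concordant when both vectors order i and j the same
            # way, and discordant when they order them in opposite ways.
            my $s = ( $x->[$i] <=> $x->[$j] ) * ( $y->[$i] <=> $y->[$j] );
            $c++ if $s > 0;
            $d++ if $s < 0;
        }
    }

    return ( $c - $d ) / ( $n * ( $n - 1 ) / 2 );
}

# The SYNOPSIS vectors: 5 concordant and 23 discordant pairs out of 28.
printf "%.4f\n",
    kendall_tau( [ 8, 7, 6, 5, 4, 3, 2, 1 ], [ 2, 1, 5, 3, 4, 7, 8, 6 ] );    # prints -0.6429
```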

## csim

`$n = $c->csim;`

Return the contour similarity index measure. This is a single dimensional measure of the similarity between two vectors.

This returns a measure in the (inclusive) range `[-1..1]` and is computed using matrices of binary data representing "higher or lower" values in the original vectors.

This measure has been studied in musical contour analysis.

# FUNCTIONS

## rank

```
$v = [qw(1 3.2 2.1 3.2 3.2 4.3)];
$ranks = rank($v);
# [1, 4, 2, 4, 4, 6]
my( $ranks, $ties ) = rank($v);
# [1, 4, 2, 4, 4, 6], { 1=>[], 3.2=>[]}
```

Return a list containing an array reference of the ordinal ranks and a hash reference of the tied data.

In the case of a tie in the data (identical values) the rank numbers are averaged. An example will elucidate:

```
sorted data:    [ 1.0, 2.1, 3.2, 3.2, 3.2, 4.3 ]
ranks:          [ 1,   2,   3,   4,   5,   6   ]
tied ranks:     3, 4, and 5
tied average:   (3 + 4 + 5) / 3 == 4
averaged ranks: [ 1,   2,   4,   4,   4,   6   ]
```
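The tie averaging shown above can be sketched in a few lines of Perl. This is an illustration under stated assumptions, not the module's `rank` implementation: the helper name `averaged_ranks` is hypothetical, ranks are assigned in ascending order, and only the ranks (not the hash of ties) are returned.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's rank() function):
# ascending ordinal ranks, with tied values sharing the average of their ranks.
sub averaged_ranks {
    my ($data) = @_;

    # Indices of the data, ordered by ascending value.
    my @order = sort { $data->[$a] <=> $data->[$b] } 0 .. $#$data;

    my @ranks;
    my $i = 0;
    while ( $i < @order ) {
        # Extend $j to the end of the run of values equal to the one at $i.
        my $j = $i;
        $j++ while $j < $#order
            && $data->[ $order[ $j + 1 ] ] == $data->[ $order[$i] ];

        # Ranks are 1-based: sorted positions $i..$j hold ranks $i+1..$j+1,
        # and the mean of that run is the mean of its two end points.
        my $average = ( ( $i + 1 ) + ( $j + 1 ) ) / 2;
        $ranks[ $order[$_] ] = $average for $i .. $j;

        $i = $j + 1;
    }

    return \@ranks;
}

my $example_ranks = averaged_ranks( [ 1, 3.2, 2.1, 3.2, 3.2, 4.3 ] );
print join( ', ', @$example_ranks ), "\n";    # prints 1, 4, 2, 4, 4, 6
```

The ranks are returned in the order of the original data, which is why the three tied values of 3.2 appear at positions two, four and five.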

## pad_vectors

```
( $u, $v ) = pad_vectors( [ 1, 2, 3, 4 ], [ 9, 8 ] );
# [1, 2, 3, 4], [9, 8, 0, 0]
```

Append zeros to either input vector for all values in the other that do not have a corresponding value. That is, "pad" the tail of the shorter vector with zero values.
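A minimal sketch of this padding, for illustration only. The helper name `pad_with_zeros` is hypothetical, and returning padded copies rather than modifying the inputs in place is an assumption of this sketch, not a statement about the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's pad_vectors() function):
# append zeros to the shorter vector until both have the same size.
sub pad_with_zeros {
    my ( $u, $v ) = @_;

    # Work on copies so the caller's arrays are left untouched (an assumption).
    my @u = @$u;
    my @v = @$v;

    # At most one of these applies; "(0) x N" builds a list of N zeros.
    push @u, (0) x ( @v - @u ) if @u < @v;
    push @v, (0) x ( @u - @v ) if @v < @u;

    return ( \@u, \@v );
}

my ( $padded_u, $padded_v ) = pad_with_zeros( [ 1, 2, 3, 4 ], [ 9, 8 ] );
print join( ', ', @$padded_u ), "\n";    # prints 1, 2, 3, 4
print join( ', ', @$padded_v ), "\n";    # prints 9, 8, 0, 0
```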

## co_sort

`( $u, $v ) = co_sort( $u, $v );`

Sort the two vectors as two-dimensional data-point pairs, with the u values sorted first.
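One way to picture this pairwise sort is the sketch below. It is not the module's code: the helper name `sort_pairs` is hypothetical, and both the ascending order and the use of the v value to break ties between equal u values are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's co_sort() function):
# sort (u, v) pairs by u, so each v value travels with its u value.
sub sort_pairs {
    my ( $u, $v ) = @_;
    die "vectors must have the same size\n" unless @$u == @$v;

    # Build [u, v] pairs, then order them by u and, for equal u, by v.
    my @pairs =
        sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }
        map  { [ $u->[$_], $v->[$_] ] } 0 .. $#$u;

    # Split the sorted pairs back into two vectors.
    return ( [ map { $_->[0] } @pairs ], [ map { $_->[1] } @pairs ] );
}

my ( $sorted_u, $sorted_v ) = sort_pairs( [ 3, 1, 2 ], [ 30, 10, 20 ] );
print join( ', ', @$sorted_u ), "\n";    # prints 1, 2, 3
print join( ', ', @$sorted_v ), "\n";    # prints 10, 20, 30
```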

## correlation_matrix

`$matrix = correlation_matrix( $u );`

Return the correlation matrix for a single vector.

This function builds a square, binary matrix that represents "higher or lower" value within the vector itself.
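To make the idea of a "higher or lower" matrix concrete, here is a hedged sketch. The helper name `higher_lower_matrix` is hypothetical, and the cell convention (1 when the row value is lower than the column value, 0 otherwise) is an assumption of this example rather than a documented detail of `correlation_matrix`.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's correlation_matrix() function):
# square binary matrix where cell [i][j] is 1 if u[i] < u[j], else 0.
sub higher_lower_matrix {
    my ($u) = @_;

    my @matrix;
    for my $i ( 0 .. $#$u ) {
        for my $j ( 0 .. $#$u ) {
            $matrix[$i][$j] = $u->[$i] < $u->[$j] ? 1 : 0;
        }
    }

    return \@matrix;
}

# One output line per matrix row.
print join( ' ', @$_ ), "\n" for @{ higher_lower_matrix( [ 1, 3, 2 ] ) };
# prints:
# 0 1 1
# 0 0 0
# 0 1 0
```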

## sign

Return 0, 1 or -1 for a zero, positive or negative number, respectively.
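In Perl, one way to obtain these three values is to compare the number with zero using the numeric comparison operator `<=>`. The sketch below uses the hypothetical name `sign_of` and is not the module's `sign` function.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's sign() function):
# "<=>" returns -1, 0 or 1 for less than, equal to or greater than.
sub sign_of {
    my ($number) = @_;
    return $number <=> 0;
}

print join( ', ', map { sign_of($_) } ( -3.5, 0, 42 ) ), "\n";    # prints -1, 0, 1
```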

# TO DO

Handle any number of vectors instead of just two.

Implement other rank correlation measures that are out there...

# SEE ALSO

For the `csim` method:

http://personal.systemsbiology.net/ilya/Publications/JNMRcontour.pdf

For the `spearman` and `kendall` methods:

http://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html

http://en.wikipedia.org/wiki/Kendall's_tau

# THANK YOU

For helping make this sturdier code:

Thomas Breslin <thomas@thep.lu.se>

Jerome <jerome.hert@free.fr>

Jon Schutz <Jon.Schutz@youramigo.com>

Andy Lee <yikes2000@yahoo.com>

# AUTHOR

Gene Boggs <gene@cpan.org>

# COPYRIGHT AND LICENSE

This software is copyright (c) 2015 by Gene Boggs.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
