
NAME

Statistics::Sequences - Tests of sequences for runs, joins, bunches, turns, doublets, trinomes, potential energy, etc.

SYNOPSIS

  use Statistics::Sequences 0.11;
  $seq = Statistics::Sequences->new();
  
  my @data = (); # make it up:
  push @data, int(rand(2)) foreach 0 .. 300;

  $seq->load(\@data); # or @data or dataname => \@data
  print $seq->observed(stat => 'runs'); # expected, variance, z_value, p_value
  print $seq->observed(stat => 'pot', state => 1); # expected, variance, z_value, p_value
  print $seq->test(stat => 'vnomes', length => 2); # length of "v" (for mononomes/singlets, dinomes/doublets, etc.)
  $seq->dump(stat => 'runs', values => {observed => 1, z_value => 1, p_value => 1}, exact => 1, tails => 1);
  # see also Statistics::Data for inherited methods for handling the loaded data

DESCRIPTION

This module loads and prepares data for statistical tests of their sequential structure via Statistics::Sequences::Joins, Statistics::Sequences::Pot, Statistics::Sequences::Runs, Statistics::Sequences::Turns and Statistics::Sequences::Vnomes. Examples of the use of each test are given in these pages.

In general, to access the tests, you use this base module to directly create a Statistics::Sequences object with the new method. You then load data into it, and then access each test by calling the test method and specifying the stat attribute: either joins, pot, runs, turns or vnomes. This way, you can run several tests on the same data, as the data are immediately available to each test (of joins, pot, runs, turns or vnomes). See the SYNOPSIS for a simple example.

Otherwise, you can use each sub-module directly, and restrict your analyses to the sub-module's test. That is, if you only want to perform a test of one type (e.g., runs), you might simply use the relevant sub-package, create a class object specific to it, and load data specifically for its use; see the SYNOPSIS for the particular test, i.e., Joins, Pot, Runs, Turns or Vnomes. You won't be able to access other tests of the same data by this approach, unless you create another object for that test, and then specifically pass the data from the earlier object into the new one.

There are also methods to cache data anonymously or by name. The data might need to be reduced to a dichotomous format before a valid test can be run. Several dichotomizing methods are provided for use once data are loaded; they are accessible via the generic or specific class objects, as above.

METHODS

The package provides an object-oriented interface for performing the tests of sequences in the form of Runs, Joins, Pot(ential energy), Turns or Vnomes.

Most methods are named with aliases, should you be used to referring to Perl statistics methods by one or another of the many conventions. Present conventions are mostly based on those used in Juan Yun-Fang's modules, e.g., Statistics::ChisqIndep.

new

 $seq = Statistics::Sequences->new();

Returns a new Statistics::Sequences object (inherited from Statistics::Data) by which all the methods for caching, reading and testing data can be accessed, including each of the methods for performing the Runs-, Joins-, Pot-, Turns- or Vnomes-tests.

Sub-packages also have their own new method, so, e.g., Statistics::Sequences::Runs can be individually imported, and its own new method can be called:

 use Statistics::Sequences::Runs;
 $runs = Statistics::Sequences::Runs->new();

In this case, data are not automatically shared across packages, and only one test (in this case, the Runs-test) can be accessed through the class-object returned by new.

load, add, unload

All these operations on the basic data are inherited from Statistics::Data - see Statistics::Data for details of these and other possible methods.

Dichotomous data: Both the runs- and joins-tests expect dichotomous data: a binary or binomial or Bernoulli sequence, but with whatever characters to symbolize the two possible events. They test their "loads" to make sure the data are dichotomous. To reduce numerical and categorical data to a dichotomous level, see the pool, match, split, swing, shrink (boolwin) and other methods in Statistics::Data::Dichotomize.
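As a rough illustration of what a dichotomy check involves, here is a minimal core-Perl sketch. The helper is_dichotomous is hypothetical and is not part of Statistics::Sequences; it assumes "dichotomous" means exactly two distinct states, whatever characters represent them.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's own check):
# returns 1 if the sequence contains exactly two distinct states, else 0.
sub is_dichotomous {
    my @data = @_;
    my %seen = map { $_ => 1 } @data;
    return scalar(keys %seen) == 2 ? 1 : 0;
}

print is_dichotomous(qw/banana banana cheese banana/) ? "dichotomous\n" : "not dichotomous\n";
print is_dichotomous(1, 2, 3, 1)                      ? "dichotomous\n" : "not dichotomous\n";
```

Data that fail such a check would first need reducing with one of the Statistics::Data::Dichotomize methods named above.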

observed, observation

 $v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
 $v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for partic.stats
 $v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', label => 'myLabelledLoadedData'); # just needs args for partic.stats

Return the observed value of the statistic for the loaded data, or data sent with this call, e.g., how many runs in the sequence (1, 1, 0, 1). See the particular statistic's manpage for any other arguments needed or optional.
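To make the runs example concrete, the following core-Perl sketch counts runs by hand. The helper count_runs is hypothetical and only illustrates the idea of a run as a maximal block of identical adjacent states; it is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: count runs, i.e., maximal blocks of identical
# adjacent states. A new run starts wherever a value differs from the
# one before it.
sub count_runs {
    my @data = @_;
    return 0 unless @data;
    my $runs = 1;
    for my $i (1 .. $#data) {
        $runs++ if $data[$i] ne $data[$i - 1];
    }
    return $runs;
}

print count_runs(1, 1, 0, 1), "\n";    # prints 3: (1 1), (0), (1)
```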

expected, expectation

 $v = $seq->expected(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
 $v = $seq->expected(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for partic.stats

Return the expected value of the statistic for the loaded data, or data sent with this call, e.g., how many runs should occur in a 4-length sequence of two possible events. See the statistic's manpage for any other arguments needed or optional.
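For the runs statistic, the expectation can be sketched in core Perl as follows. This assumes the standard Wald-Wolfowitz formula, E = 2*n1*n2/N + 1, where n1 and n2 are the counts of the two states and N is their sum; the helper expected_runs is hypothetical, and the module's own computation is described in Statistics::Sequences::Runs.

```perl
use strict;
use warnings;

# Hypothetical helper: expected number of runs under the standard
# Wald-Wolfowitz formula (an assumption here, not taken from the module).
# The data are assumed to be dichotomous already.
sub expected_runs {
    my @data = @_;
    my %freq;
    $freq{$_}++ for @data;
    # Counts of the two states; pad with zeros if fewer than two occur.
    my ($n1, $n2) = (values %freq, 0, 0);
    my $n = $n1 + $n2;
    return 0 unless $n;
    return 2 * $n1 * $n2 / $n + 1;
}

print expected_runs(1, 1, 0, 1), "\n";    # 2.5 (three 1s, one 0)
print expected_runs(1, 0, 1, 0), "\n";    # 3   (two of each)
```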

variance

 $seq->variance(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
 $seq->variance(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for partic.stats

Returns the variance of the statistic: the expected range of deviation in its observed value for the given number of trials.
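As an illustration for the runs statistic, the sketch below computes the variance from the two state counts. It assumes the standard Wald-Wolfowitz variance, 2*n1*n2*(2*n1*n2 - N) / (N**2 * (N - 1)); the helper variance_runs is hypothetical and is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: variance of the number of runs, given the counts
# of the two states (standard Wald-Wolfowitz formula, assumed here).
sub variance_runs {
    my ($n1, $n2) = @_;
    my $n = $n1 + $n2;
    return 0 if $n < 2;    # variance is undefined for fewer than 2 trials
    my $prod = 2 * $n1 * $n2;
    return $prod * ($prod - $n) / ($n**2 * ($n - 1));
}

print variance_runs(3, 1), "\n";    # 0.25, e.g., for the sequence (1, 1, 0, 1)
```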

obsdev, observed_deviation

 $v = $seq->obsdev(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
 $v = $seq->obsdev(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for partic.stats

Returns the deviation of (difference between) observed and expected values of the statistic for the loaded/given sequence (O - E).

stdev, standard_deviation

 $v = $seq->stdev(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
 $v = $seq->stdev(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for partic.stats

Returns square-root of the variance.

z_value, zscore

 $v = $seq->zscore(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
 $v = $seq->zscore(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for partic.stats

Return the deviation ratio: observed deviation to standard deviation. Use argument ccorr for continuity correction.
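The ratio can be sketched in core Perl as below. The helper z_value_sketch is hypothetical, and the continuity correction shown is the conventional one (shrinking the absolute deviation by 0.5, never past zero); see Statistics::Zed for what the module itself does.

```perl
use strict;
use warnings;

# Hypothetical helper: z = (observed - expected) / sqrt(variance),
# optionally with the conventional half-unit continuity correction
# (an assumption here, not taken from the module's source).
sub z_value_sketch {
    my ($observed, $expected, $variance, $ccorr) = @_;
    my $dev = $observed - $expected;
    if ($ccorr) {
        my $abs = abs($dev) - 0.5;
        $abs = 0 if $abs < 0;                   # do not overshoot zero
        $dev = $dev < 0 ? -$abs : $abs;
    }
    return $dev / sqrt($variance);
}

print z_value_sketch(10, 6, 4, 0), "\n";    # 2
print z_value_sketch(10, 6, 4, 1), "\n";    # 1.75
```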

p_value, test

 $p = $seq->test(stat => 'runs');
 $p = $seq->test(stat => 'joins');
 $p = $seq->test(stat => 'turns');
 $p = $seq->test(stat => 'pot', state => 'a value appearing in the data');
 $p = $seq->test(stat => 'vnomes', length => 'an integer greater than zero and less than sample-size');

Returns the probability of observing so many runs, joins, etc., versus those expected, relative to the expected variance.

When using a Statistics::Sequences class-object, this method requires naming which test to perform, i.e., runs, joins, pot, turns or vnomes. This is not required when the class-object already refers to one of the sub-modules, as created by the new method within Statistics::Sequences::Runs, Statistics::Sequences::Joins, Statistics::Sequences::Pot, Statistics::Sequences::Turns and Statistics::Sequences::Vnomes.

General options

Options to test available to all the sub-package tests are as follows.

data => 'string'

Optionally specify the name of the data to be tested. By default, this is not required: the data tested are those that were last loaded, either anonymously, or as returned by one of the Statistics::Data::Dichotomize methods. Otherwise, if the data are already ready for testing in a dichotomous format, data that were previously loaded by name can be individually tested. For example, here are two sets of data that are loaded by name, and then a single test of one of them is performed.

 @chimps = (qw/banana banana cheese banana cheese banana banana banana/);
 @mice = (qw/banana cheese cheese cheese cheese cheese cheese cheese/);
 $seq->load(chimps => \@chimps, mice => \@mice);
 $p = $seq->test(stat => 'runs', data => 'chimps');

ccorr => boolean

Specify whether or not to perform the continuity-correction on the observed deviation. Default is false. Relevant only for those tests relying on a Z-test. See Statistics::Zed.

tails => 1|2

Specify whether the z-value is calculated for both sides of the normal (or chi-square) distribution (2, the default for most tested data) or only one side (the default for data prepared with the swing method).
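The effect of the tails setting on a normal-distribution p-value can be sketched as follows. The helper p_from_z is hypothetical; it uses the complementary error function from the core POSIX module (available from Perl 5.22) and, for one tail, assumes the tail beyond |z|, which may not match how the module treats direction.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: normal-distribution p-value for a z-value.
# tails => 2 doubles the one-sided probability beyond |z|.
sub p_from_z {
    my ($z, $tails) = @_;
    $tails = 2 unless defined $tails;
    my $one_tailed = 0.5 * POSIX::erfc(abs($z) / sqrt(2));
    return $tails == 1 ? $one_tailed : 2 * $one_tailed;
}

printf "%.4f\n", p_from_z(1.96, 2);    # 0.0500
printf "%.4f\n", p_from_z(1.96, 1);    # 0.0250
```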

Test-specific required settings and options

Some sub-package tests need to have parameters defined in the call to test, and/or have specific options, as follows.

Joins : The Joins-test optionally allows the setting of a probability value; see test in the Statistics::Sequences::Joins manpage.

Pot : The Pot-test requires the setting of a state to be tested; see test in the Statistics::Sequences::Pot manpage.

Vnomes : The Seriality test for V-nomes requires a length, i.e., the value of v; see test in the Statistics::Sequences::Vnomes manpage.

Runs, Turns : There are presently no specific requirements or options for the Runs- and Turns-tests.

stats_hash

 $href = $seq->stats_hash(stat => 'runs', values => {observed => 1, expected => 1, variance => 1, z_value => 1, p_value => 1});

Returns a hashref with values for any of the descriptives and probability value relevant to the specified statistic. Include other required or optional arguments relevant to any of the values requested, e.g., ccorr if getting a z_value, tails and exact if getting a p_value, state if testing pot, prob if testing joins, ... precision_s, precision_p ...

dump

 $seq->dump(stat => 'runs|joins|pot ...', values => {}, format => 'table|labline|csv', flag => '1|0', precision_s => 'integer', precision_p => 'integer');

Alias: print_summary

Print results of the last-conducted test to STDOUT. By default, if no parameters to dump are passed, a single line of test statistics is printed. Options are as follows.

values => hashref

Hashref of the statistical parameters to dump. Default is observed value and p-value for the given stat.

flag => boolean

If true, the p-value associated with the z-value is appended with a single asterisk if the value is below .05, and with two asterisks if it is below .01.

If false (default), nothing is appended to the p-value.
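The flagging rule can be sketched in core Perl as below. The helper flag_p is hypothetical; the seven-digit formatting and the absence of a space before the asterisks are assumptions, not taken from the module's output.

```perl
use strict;
use warnings;

# Hypothetical helper: append '*' if p < .05, '**' if p < .01,
# and nothing otherwise or when flagging is off.
sub flag_p {
    my ($p, $flag) = @_;
    my $str = sprintf('%.7f', $p);
    return $str unless $flag;
    return $str . ($p < .01 ? '**' : $p < .05 ? '*' : '');
}

print flag_p(0.03,    1), "\n";    # 0.0300000*
print flag_p(0.005,   1), "\n";    # 0.0050000**
print flag_p(0.85968, 1), "\n";    # 0.8596800
```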

format => 'table|labline|csv'

Default is 'csv', to print the stats hash as a comma-separated string (no newline), e.g., '4.0000,0.8596800'. If specifying 'labline', you get something like "observed = 4.0000, p_value = 0.8596800\n". If specifying 'table', this is a dump from Text::SimpleTable with the stat methods as headers and column length set to the maximum required for the given headers, level of precision, flag, etc. For example, with precision_s => 4 and precision_p => 7, you get:

 .-----------+-----------.
 | observed  | p_value   |
 +-----------+-----------+
 | 4.0000    | 0.8596800 |
 '-----------+-----------'

verbose => 1|0

If true, includes a title giving the name of the statistic, details about the hypothesis tested (if p_value => 1 in the values hashref), etc. No effect if format is not defined or equals 'csv'.

precision_s => 'non-negative integer'

Precision of the statistic values (observed, expected, variance, z_value).

precision_p => 'non-negative integer'

Specify rounding of the probability associated with the z-value to so many digits. If zero or undefined, you get everything available.
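The two precision options amount to fixed-point rounding, which can be sketched with sprintf. The helper round_to is hypothetical; it mirrors the rule stated above that a zero or undefined precision leaves the value untouched, and it reproduces the values shown in the table example.

```perl
use strict;
use warnings;

# Hypothetical helper: round to a given number of decimal places;
# a zero or undefined precision returns the value unchanged.
sub round_to {
    my ($value, $precision) = @_;
    return $value unless $precision;
    return sprintf("%.${precision}f", $value);
}

print round_to(4,       4), "\n";    # 4.0000    (as with precision_s => 4)
print round_to(0.85968, 7), "\n";    # 0.8596800 (as with precision_p => 7)
print round_to(0.85968),    "\n";    # 0.85968   (no rounding)
```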

dump_data

 $seq->dump_data(delim => "\n");

Prints to STDOUT a space-separated line of the tested data - as dichotomized and put to test. Optionally, give a value for delim to specify how the datapoints should be separated. Inherited from "dump_data" in Statistics::Data.

REVISIONS

The series testing methods (series_init, series_update and series_test) have been moved to Statistics::Zed as of v0.03.

Simply giving the first argument to test as the name of the test, unkeyed, is deprecated.

See CHANGES file in installation dist.

BUNDLING?

This module uses its sub-modules implicitly - so a bundled program using this module might need to explicitly use its sub-modules if these need to be included in the bundle itself.

AUTHOR/LICENSE

rgarton AT cpan DOT org

This program is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see http://www.perl.com/perl/misc/Artistic.html).

Disclaimer

To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.