NAME

Statistics::Sequences - Manage sequences (ordered list of literals) for testing their runs, joins, turns, trinomes, potential energy, etc.

VERSION

This is documentation for Version 0.12 of Statistics::Sequences.

SYNOPSIS

 use Statistics::Sequences 0.12;
 $seq = Statistics::Sequences->new();
 my @data = (1, 'a', 'a', 1); # ordered list of literal scalars (numbers, strings), as permitted by specific test
 $seq->load(\@data); # or @data or dataname => \@data
 print $seq->observed(stat => 'runs'); # expected, variance, z_value, p_value - assuming sub-module Runs.pm is installed
 print $seq->test(stat => 'vnomes', length => 2); # assuming sub-module Vnomes.pm is installed
 $seq->dump(stat => 'runs', values => {observed => 1, z_value => 1, p_value => 1}, exact => 1, tails => 1);
 # see also Statistics::Data for inherited methods

DESCRIPTION

Loading, updating and accessing data as an ordered list of literal scalars (numbers, strings) for statistical tests of their sequential structure via Statistics::Sequences::Joins, Statistics::Sequences::Pot, Statistics::Sequences::Runs, Statistics::Sequences::Turns and Statistics::Sequences::Vnomes. Note that none of these sub-modules is installed by default; to use this module as intended, install one or more of them.

To access the tests, use this base module to create a Statistics::Sequences object with new, then load data into it and access each test by calling the test method, specifying the stat attribute: either joins, pot, runs, turns or vnomes, where the relevant sub-module is installed. This allows running several tests on the same data, as the data are immediately available to each test (of joins, pot, runs, turns or vnomes). See the SYNOPSIS for a simple example.

Alternatively, use each sub-module directly, and restrict analyses to the sub-module's test; this module is used implicitly as their base. That is, to perform a test of one type (e.g., runs), use the relevant sub-package and load data via its constructor; see the SYNOPSIS for the particular test, i.e., Joins, Pot, Runs, Turns or Vnomes. With this approach, other tests of the same data are not accessible unless you create another object for each such test and explicitly pass the data from the earlier object into the new one.

SUBROUTINES/METHODS

new

 $seq = Statistics::Sequences->new();

Returns a new Statistics::Sequences object (inherited from Statistics::Data) by which all the methods for caching, reading and testing data can be accessed, including each of the methods for performing the Runs-, Joins-, Pot-, Turns- or Vnomes-tests.

Sub-packages also have their own new method, so that, e.g., Statistics::Sequences::Runs can be imported individually and its own new method called:

 use Statistics::Sequences::Runs;
 $runs = Statistics::Sequences::Runs->new();

In this case, data are not automatically shared across packages, and only one test (in this case, the Runs-test) can be accessed through the class-object returned by new.

load, add, access, unload

All these operations on the basic data are inherited from Statistics::Data; see its documentation for details of these and other possible methods.

Dichotomous data: Both the runs- and joins-tests expect dichotomous data: a binary or binomial or Bernoulli sequence, but with whatever characters to symbolize the two possible events. They test their "loads" to make sure the data are dichotomous. To reduce numerical and categorical data to a dichotomous level, see the pool, match, split, swing, shrink (boolwin) and other methods in Statistics::Data::Dichotomize.
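
The kind of check described above can be sketched in plain Perl, without the module. The helper name is_dichotomous is hypothetical and is not part of Statistics::Sequences; how the sub-modules actually validate their loads is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences): true if the
# sequence contains exactly two distinct values, whatever symbols are used.
# Values are compared as strings, so 1 and '1' count as the same event.
sub is_dichotomous {
    my @data = @_;
    my %seen = map { $_ => 1 } @data;
    return keys(%seen) == 2 ? 1 : 0;
}

print is_dichotomous(qw/banana banana cheese banana/) ? "dichotomous\n" : "not dichotomous\n";
print is_dichotomous(1, 'a', 'b', 1)                  ? "dichotomous\n" : "not dichotomous\n";
```

The first call reports a dichotomous sequence; the second does not, because it holds three distinct values.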

observed, observation

 $v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
 $v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for partic.stats
 $v = $seq->observed(stat => 'joins|pot|runs|turns|vnomes', label => 'myLabelledLoadedData'); # just needs args for partic.stats

Returns the observed value of the statistic for the loaded data, or for data sent with this call, e.g., how many runs there are in the sequence (1, 1, 0, 1). See the particular statistic's manpage for any other required or optional arguments.
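
As an illustration of what an observed value means for the runs statistic, here is a minimal sketch in plain Perl. The helper count_runs is hypothetical and only mimics the idea; it is not the sub-module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences): count runs,
# i.e., maximal stretches of identical consecutive values.
sub count_runs {
    my @data = @_;
    return 0 unless @data;
    my $runs = 1;    # the first element always opens a run
    for my $i (1 .. $#data) {
        $runs++ if $data[$i] ne $data[$i - 1];    # a change of value opens a new run
    }
    return $runs;
}

print count_runs(1, 1, 0, 1), "\n";    # prints 3: the runs are (1 1), (0), (1)
```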

expected, expectation

 $v = $seq->expected(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
 $v = $seq->expected(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for partic.stats

Returns the expected value of the statistic for the loaded data, or for data sent with this call, e.g., how many runs should occur in a 4-length sequence of two possible events. See the statistic's manpage for any other required or optional arguments.

variance

 $seq->variance(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
 $seq->variance(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for partic.stats

Returns the variance of the statistic, i.e., the expected spread of its observed value around the expected value for the given number of trials.
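
For the runs statistic, expectation and variance can be sketched with the textbook Wald-Wolfowitz formulas. That Statistics::Sequences::Runs computes them in exactly this way is an assumption, and the helper name runs_expected_variance is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences): expected number
# of runs and its variance for a dichotomous sequence with $n1 events of
# one kind and $n2 of the other (Wald-Wolfowitz formulas).
sub runs_expected_variance {
    my ($n1, $n2) = @_;
    my $n = $n1 + $n2;
    die "need at least two observations\n" if $n < 2;
    my $expected = 2 * $n1 * $n2 / $n + 1;
    my $variance = 2 * $n1 * $n2 * (2 * $n1 * $n2 - $n) / ($n**2 * ($n - 1));
    return ($expected, $variance);
}

# The sequence (1, 1, 0, 1) has three 1s and one 0.
my ($e, $v) = runs_expected_variance(3, 1);
printf "expected = %.4f, variance = %.4f\n", $e, $v;    # expected = 2.5000, variance = 0.2500
```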

obsdev, observed_deviation

 $v = $seq->obsdev(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
 $v = $seq->obsdev(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for partic.stats

Returns the deviation of (difference between) observed and expected values of the statistic for the loaded/given sequence (O - E).

stdev, standard_deviation

 $v = $seq->stdev(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
 $v = $seq->stdev(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for partic.stats

Returns square-root of the variance.

z_value, zscore

 $v = $seq->zscore(stat => 'joins|pot|runs|turns|vnomes', %args); # gets data from cache, with any args needed by the stat
 $v = $seq->zscore(stat => 'joins|pot|runs|turns|vnomes', data => [qw/blah bing blah blah blah/]); # just needs args for partic.stats

Return the deviation ratio: observed deviation to standard deviation. Use argument ccorr for continuity correction.
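
The ratio can be sketched as follows. The helper z_value_sketch is hypothetical, and the form of the continuity correction (reducing the absolute deviation by 0.5) is a common convention assumed here, not a statement of what Statistics::Zed does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences):
# z = (O - E) / SD, optionally with a continuity correction that shrinks
# the absolute deviation by 0.5 (an assumed, commonly used form).
sub z_value_sketch {
    my ($observed, $expected, $variance, $ccorr) = @_;
    die "variance must be positive\n" unless $variance > 0;
    my $dev = $observed - $expected;    # observed deviation (O - E)
    if ($ccorr) {
        my $abs = abs($dev) - 0.5;
        $abs = 0 if $abs < 0;           # do not let the correction flip the sign
        $dev = $dev < 0 ? -$abs : $abs;
    }
    return $dev / sqrt($variance);
}

# Three runs observed, 2.5 expected, variance 0.25:
printf "%.4f\n", z_value_sketch(3, 2.5, 0.25);       # prints 1.0000
printf "%.4f\n", z_value_sketch(3, 2.5, 0.25, 1);    # prints 0.0000
```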

p_value, test

 $p = $seq->test(stat => 'runs');
 $p = $seq->test(stat => 'joins');
 $p = $seq->test(stat => 'turns');
 $p = $seq->test(stat => 'pot', state => 'a value appearing in the data');
 $p = $seq->test(stat => 'vnomes', length => 'an integer greater than zero and less than sample-size');

Returns the probability of observing so many runs, joins, etc., versus those expected, relative to the expected variance.

When using a Statistics::Sequences class-object, this method requires naming which test to perform, i.e., runs, joins, pot, turns or vnomes. This is not required when the class-object already refers to one of the sub-modules, as created by the new method within Statistics::Sequences::Runs, Statistics::Sequences::Joins, Statistics::Sequences::Pot, Statistics::Sequences::Turns and Statistics::Sequences::Vnomes.
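
For tests that rely on a z-value, the probability can be sketched with the normal approximation below. The helper p_from_z is hypothetical; it ignores the exact and chi-square variants that some sub-modules may offer, and it assumes a Perl whose POSIX module exports erfc (Perl 5.22 or later).

```perl
use strict;
use warnings;
use POSIX qw(erfc);    # erfc is exported by POSIX from Perl 5.22

# Hypothetical helper (not part of Statistics::Sequences): p-value for a
# z-value under the standard normal distribution, for 1 or 2 tails.
sub p_from_z {
    my ($z, $tails) = @_;
    $tails = 2 unless defined $tails;
    my $one_tailed = 0.5 * erfc(abs($z) / sqrt(2));    # area beyond |z| on one side
    return $tails == 1 ? $one_tailed : 2 * $one_tailed;
}

printf "two-tailed: %.4f\n", p_from_z(1.96);
printf "one-tailed: %.4f\n", p_from_z(1.96, 1);
```

With z = 1.96, the two-tailed probability is close to .05 and the one-tailed probability close to .025.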

Common options

Options common to all the sub-package tests are as follows.

data => 'string'

Optionally specify the name of the data to be tested. By default, this is not required: the data tested are those that were last loaded, either anonymously, or as returned by one of the Statistics::Data::Dichotomize methods. Otherwise, if the data are already in a dichotomous format ready for testing, data that were previously loaded by name can be tested individually. For example, here two sets of data are loaded by name, and then a single test of one of them is performed.

 @chimps = (qw/banana banana cheese banana cheese banana banana banana/);
 @mice = (qw/banana cheese cheese cheese cheese cheese cheese cheese/);
 $seq->load(chimps => \@chimps, mice => \@mice);
 $p = $seq->test(stat => 'runs', data => 'chimps');

ccorr => boolean

Specify whether or not to perform the continuity-correction on the observed deviation. Default is false. Relevant only for those tests relying on a Z-test. See Statistics::Zed.

tails => 1|2

Specify whether the z-value is calculated for both sides of the normal (or chi-square) distribution (2, the default for most tested data) or only one side (1, the default for data prepared with the swing method).

Test-specific required settings and options

Some sub-package tests need to have parameters defined in the call to test, and/or have specific options, as follows.

Joins : The Joins test optionally allows the setting of a probability value; see test in the Statistics::Sequences::Joins manpage.

Pot : The Pot test requires the setting of a state to be tested; see test in the Statistics::Sequences::Pot manpage.

Vnomes : The Serial test for v-nomes requires a length, i.e., the value of v; see test in the Statistics::Sequences::Vnomes manpage.

Runs, Turns : There are presently no specific requirements nor options for the Runs- and Turns-tests.

stats_hash

 $href = $seq->stats_hash(stat => 'runs', values => {observed => 1, expected => 1, variance => 1, z_value => 1, p_value => 1});

Returns a hashref with values for any of the descriptives and probability value relevant to the specified statistic. Include other required or optional arguments relevant to any of the values requested, e.g., ccorr if getting a z_value, tails and exact if getting a p_value, state if testing pot, prob if testing joins, ... precision_s, precision_p ...

dump

 $seq->dump(stat => 'runs|joins|pot ...', values => {}, format => 'table|labline|csv', flag => '1|0', precision_s => 'integer', precision_p => 'integer');

Alias: print_summary

Prints results of the last-conducted test to STDOUT. By default, if no parameters are passed to dump, a single line of test statistics is printed. Options are as follows.

values => hashref

Hashref of the statistical parameters to dump. Default is observed value and p-value for the given stat.

flag => boolean

If true, the p-value associated with the z-value is appended with a single asterisk if the value is below .05, and with two asterisks if it is below .01.

If false (default), nothing is appended to the p-value.

format => 'table|labline|csv'

Default is 'csv', to print the stats hash as a comma-separated string (no newline), e.g., '4.0000,0.8596800'. If specifying 'labline', you get something like "observed = 4.0000, p_value = 0.8596800\n". If specifying 'table', this is a dump from Text::SimpleTable with the stat methods as headers and column length set to the maximum required for the given headers, level of precision, flag, etc. For example, with precision_s => 4 and precision_p => 7, you get:

 .-----------+-----------.
 | observed  | p_value   |
 +-----------+-----------+
 | 4.0000    | 0.8596800 |
 '-----------+-----------'

verbose => 1|0

If true, includes a title giving the name of the statistic, details about the hypothesis tested (if p_value => 1 in the values hashref), etc. No effect if format is not defined or equals 'csv'.

precision_s => 'non-negative integer'

Precision of the statistic values (observed, expected, variance, z_value).

precision_p => 'non-negative integer'

Specify rounding of the probability associated with the z-value to so many digits. If zero or undefined, you get everything available.
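
How the format, flag and precision options combine can be sketched in plain Perl. The helper format_stats and its argument layout are hypothetical and do not reproduce the module's own code; the sketch covers only the 'csv' and 'labline' formats and assumes both precisions are given.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences): format an
# ordered list of [name, value] pairs in the 'csv' or 'labline' style.
sub format_stats {
    my (%args) = @_;
    my @cells;
    for my $pair (@{ $args{values} }) {
        my ($name, $value) = @{$pair};
        # p-values use precision_p; all other statistics use precision_s
        my $prec = $name eq 'p_value' ? $args{precision_p} : $args{precision_s};
        my $str = sprintf "%.${prec}f", $value;
        if ($args{flag} && $name eq 'p_value') {
            $str .= $value < .01 ? '**' : $value < .05 ? '*' : '';
        }
        push @cells, $args{format} eq 'labline' ? "$name = $str" : $str;
    }
    return $args{format} eq 'labline' ? join(', ', @cells) . "\n" : join(',', @cells);
}

my @values = ([observed => 4], [p_value => 0.85968]);
print format_stats(values => \@values, format => 'csv',
                   precision_s => 4, precision_p => 7), "\n";
print format_stats(values => \@values, format => 'labline',
                   precision_s => 4, precision_p => 7);
```

With these inputs, the two calls reproduce the 'csv' and 'labline' examples given for the format option above.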

dump_data

 $seq->dump_data(delim => "\n");

Prints to STDOUT a space-separated line of the tested data - as dichotomized and put to test. Optionally, give a value for delim to specify how the datapoints should be separated. Inherited from Statistics::Data.

BUNDLING

This module uses its sub-modules implicitly - so a bundled program using this module might need to explicitly use its sub-modules if these need to be included in the bundle itself.

AUTHOR

Roderick Garton, <rgarton at cpan.org>

LICENSE AND COPYRIGHT

This program is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see http://www.perl.com/perl/misc/Artistic.html).

Disclaimer

To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.