package Statistics::Sequences::Runs;
use 5.008008;
use strict;
use warnings FATAL => 'all';
use Carp qw(carp croak);
use base qw(Statistics::Sequences);
use List::AllUtils qw(mesh sum uniq);
use Number::Misc qw(is_even);
use Statistics::Zed 0.08;
$Statistics::Sequences::Runs::VERSION = '0.21';

=pod

=head1 NAME

Statistics::Sequences::Runs - descriptives, deviation and combinatorial tests of Wald-Wolfowitz runs

=head1 VERSION

This is documentation for Version 0.21 of Statistics::Sequences::Runs.

=head1 SYNOPSIS

 use strict;
 use Statistics::Sequences::Runs 0.21; # not compatible with versions < .10
 my $runs = Statistics::Sequences::Runs->new();

 # Data make up a dichotomous sequence: 
 my @data = (qw/1 0 0 0 1 1 0 1 1 0 0 1 0 0 1 1 1 1 0 1/);
 my $val;

 # - Pre-load data to use for all methods:
 $runs->load(\@data);
 $val = $runs->observed();
 $val = $runs->expected();

 # - or send data as "data => $aref" to each method:
 $val = $runs->observed(data => \@data);
 
 # - or send frequencies of each of the 2 elements:
 $val = $runs->expected(freqs => [11, 9]); # works with other methods except observed()

 # Deviation ratio:
 $val = $runs->z_value(ccorr => 1);

 # Probability:
 my ($z, $p) = $runs->z_value(ccorr => 1, tails => 1); # dev. ratio with p-value
 $val = $runs->p_value(tails => 1); # normal dist. p-value itself
 $val = $runs->p_value(exact => 1, tails => 1); # by combinatorics

 # Keyed list of descriptives etc.:
 my $href = $runs->stats_hash(values => {observed => 1, p_value => 1}, exact => 1);

 # Print descriptives etc. in the same way:
 $runs->dump(
  values => {observed => 1, expected => 1, p_value => 1},
  exact => 1,
  flag => 1,
  precision_s => 3,
  precision_p => 7
 );
 # prints: observed = 11.000, expected = 10.900, p_value = 0.5700167

=head1 DESCRIPTION

The module returns statistical information about Wald-Wolfowitz runs, where a run is a sequence of identical events on one or more consecutive trials. For example, in a signal-detection test composed of match (H) and miss (M) events over time like H-H-M-H-M-M-M-M-H, there are 5 runs: 3 of Hs and 2 of Ms. The observed number of runs of the two events can be compared with the number expected to occur by chance over the number of trials, relative to the expected variance (see REFERENCES). More runs than expected ("negative serial dependence") can denote irregularity, instability, or mixing up of alternatives (a bias to produce too many alternations). Fewer runs than expected ("positive serial dependence") can denote cohesion, insulation, or isolation of alternatives (a bias to produce too many repetitions). Either departure can indicate sequential dependency.

The distribution of runs is asymptotically normal. A deviation-based test of extra-chance occurrence is conventionally considered reliable when at least one alternative has more than 20 occurrences (Siegel's rule), or when both event occurrences exceed 10 (Kelly, 1982); otherwise, the module provides an "exact test" option, based on combinatorics.

Have non-dichotomous, continuous or multinomial data? See L<Statistics::Data::Dichotomize|Statistics::Data::Dichotomize> for how to prepare them for test of runs.

=head1 SUBROUTINES/METHODS

=head2 Data-handling

=head3 new

 $runs = Statistics::Sequences::Runs->new();

Returns a new Runs object. Expects/accepts no arguments but the classname.

=head3 load

 $runs->load(@data);
 $runs->load(\@data);
 $runs->load(foodat => \@data); # labelled whatever

Loads data anonymously or by name - see L<load|Statistics::Data/load, load_data> in the Statistics::Data manpage for details on the various ways data can be loaded and then retrieved (more than shown here).

After the load, the data are L<read|Statistics::Data/read, read_data, get_data> to ensure that they contain only two unique elements - if not, carp occurs and 0 rather than 1 is returned. 

Alternatively, skip this action; data don't always have to be loaded to use the stats methods here. To get the observed number of runs, data of course have to be loaded, but other stats can be got if given the observed count - otherwise, they too depend on data having been loaded.

Every load unloads all previous loads and any additions to them.

=head3 add, access, unload

See L<Statistics::Data|Statistics::Data> for these additional operations on data that have been loaded.

=head2 Descriptives

=head3 observed

 $v = $runs->observed(); # use the first data loaded anonymously
 $v = $runs->observed(index => 1); # ... or give the required "index" for the loaded data
 $v = $runs->observed(label => 'foodat'); # ... or its "label" value
 $v = $runs->observed(data => [1, 0, 1, 1]); # ... or just give the data now

Returns the total observed number of runs in the loaded or given data. For example,

 $v = $runs->observed(data => [qw/H H H T T H H/]);

returns 3.

I<Aliases>: runcount_observed, rco

=cut

sub observed {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    return $args->{'observed'} if defined $args->{'observed'};
    my $data = _get_data( $self, $args );
    my $rco = 1;
    for ( 1 .. scalar @{$data} - 1 ) {
        $rco++ if $data->[$_] ne $data->[ $_ - 1 ];
    }
    return $rco;
}
*runcount_observed = \&observed;
*rco               = \&observed;

=head3 observed_per_state

 @freq = $runs->observed_per_state(data => \@data);
 $href = $runs->observed_per_state(data => \@data);

Returns the number of runs per state - as an array where the first element gives the count for the first state in the data, and so for the second. A keyed hashref is returned if not called in array context. For example:

 @ari = $runs->observed_per_state(data => [qw/H H H T T H H/]); # returns (2, 1)
 $ref = $runs->observed_per_state(data => [qw/H H H T T H H/]); # returns { H => 2, T => 1}

=cut

sub observed_per_state {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my $data   = _get_data( $self, $args );
    my @states = uniq @{$data};
    my @freqs  = $data->[0] eq $states[0] ? ( 1, 0 ) : ( 0, 1 );
    for ( 1 .. scalar @{$data} - 1 ) {
        if ( $data->[$_] ne $data->[ $_ - 1 ] ) {
            if ( $data->[$_] eq $states[0] ) {
                $freqs[0]++;
            }
            else {
                $freqs[1]++;
            }
        }
    }
    return wantarray ? @freqs : { mesh @states, @freqs };
}

=head3 expected

 $v = $runs->expected(); # or specify loaded data by "index" or "label", or give it as "data" - see observed()
 $v = $runs->expected(data => [qw/blah bing blah blah blah/]); # use these data
 $v = $runs->expected(freqs => [12, 7]); # don't use actual data; calculate from these two Ns

Returns the expected number of runs across the loaded data. Expectation is given as follows: 

=for html <p>&nbsp;&nbsp;<i>E[R]</i> = ( (2<i>n</i><sub>1</sub><i>n</i><sub>2</sub>) / (<i>n</i><sub>1</sub> + <i>n</i><sub>2</sub>) ) + 1</p>

where I<n>1 and I<n>2 are the numbers of observations of each of the two elements in the data.
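
As a worked check of this formula, the following minimal sketch (plain Perl, independent of this module; the helper name C<expected_runs> is made up for illustration) computes the expectation from two frequencies:

 use strict;
 use warnings;

 # Hypothetical helper, not part of this module:
 # E[R] = 2 * n1 * n2 / (n1 + n2) + 1
 sub expected_runs {
     my ( $n1, $n2 ) = @_;
     my $sum = $n1 + $n2;
     return undef if !$sum;    # no observations, so no expectation
     return ( 2 * $n1 * $n2 ) / $sum + 1;
 }

 printf "%.3f\n", expected_runs( 11, 9 );    # prints 10.900, as in the SYNOPSIS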

I<Aliases>: runcount_expected, rce

=cut 

sub expected {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my ( $n1, $n2 ) = $self->bi_frequency($args);
    my $sum = $n1 + $n2;
    return $sum ? ( ( 2 * $n1 * $n2 ) / $sum ) + 1 : undef;
}
*rce               = \&expected;
*runcount_expected = \&expected;

=head3 variance

 $v = $runs->variance(); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
 $v = $runs->variance(data => [qw/blah bing blah blah blah/]); # use these data
 $v = $runs->variance(freqs => [5, 12]); # use these trial numbers - not any particular sequence of data

Returns the variance in the number of runs for the given data.

=for html <p>&nbsp;&nbsp;<i>V[R]</i> = ( (2<i>n</i><sub>1</sub><i>n</i><sub>2</sub>)([2<i>n</i><sub>1</sub><i>n</i><sub>2</sub>] &ndash; [<i>n</i><sub>1</sub> + <i>n</i><sub>2</sub>]) ) / ( ((<i>n</i><sub>1</sub> + <i>n</i><sub>2</sub>)<sup>2</sup>)((<i>n</i><sub>1</sub> + <i>n</i><sub>2</sub>) &ndash; 1) ) </p>

with I<n>1 and I<n>2 defined as above for L<expected|Statistics::Sequences::Runs/expected>.

The data to test can already have been L<load|/load>ed, or can be sent directly as a flat referenced array keyed as B<data>.
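
As a worked check of the variance formula, here is a minimal sketch in plain Perl, independent of this module. The helper name C<runs_variance> is made up for illustration, and the sketch assumes at least 2 observations in total (the method itself returns 1 when there are fewer):

 use strict;
 use warnings;

 # Hypothetical helper, not part of this module:
 # V[R] = 2*n1*n2 * (2*n1*n2 - N) / (N**2 * (N - 1)), where N = n1 + n2
 sub runs_variance {
     my ( $n1, $n2 ) = @_;
     my $sum  = $n1 + $n2;
     my $prod = 2 * $n1 * $n2;
     return $prod * ( $prod - $sum ) / ( $sum**2 * ( $sum - 1 ) );
 }

 printf "%.4f\n", runs_variance( 11, 9 );    # prints 4.6374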

I<Aliases>: runcount_variance, rcv

=cut

sub variance {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my ( $n1, $n2 ) = $self->bi_frequency($args);
    my $sum = $n1 + $n2;
    return $sum < 2
      ? 1
      : ( ( 2 * $n1 * $n2 * ( ( 2 * $n1 * $n2 ) - $sum ) ) /
          ( ( $sum**2 ) * ( $sum - 1 ) ) );
}
*rcv               = \&variance;
*runcount_variance = \&variance;

=head3 observed_deviation

 $v = $runs->obsdev(); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
 $v = $runs->obsdev(data => [qw/blah bing blah blah blah/]); # use these data

Returns the deviation of (difference between) observed and expected runs for the loaded/given sequence (I<O> - I<E>). 

I<Alias>: obsdev

=cut

sub observed_deviation {
    return observed(@_) - expected(@_);
}
*obsdev = \&observed_deviation;

=head3 standard_deviation

 $v = $runs->stdev(); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
 $v = $runs->stdev(data => [qw/blah bing blah blah blah/]);

Returns square-root of the variance.

I<Alias>: stdev

=cut

sub standard_deviation {
    return sqrt variance(@_);
}
*stdev = \&standard_deviation;

=head3 skewness

Returns run skewness as given by Barton & David (1958) based on the frequencies of the two different elements in the sequence.

=cut

sub skewness {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my ( $n1, $n2 ) = $self->bi_frequency($args);
    my $k3 = 0;
    if ( $n1 != $n2 ) {
        my $sum = $n1 + $n2;
        $k3 =
          ( ( 2 * $n1 * $n2 ) / $sum**3 ) *
          ( ( ( 16 * $n1**2 * $n2**2 ) / $sum**2 ) -
              ( ( 4 * $n1 * $n2 * ( $sum + 3 ) ) / $sum ) +
              3 * $sum );
    }
    return $k3;
}

=head3 kurtosis

Returns run kurtosis as given by Barton & David (1958) based on the frequencies of the two different elements in the sequence.

=cut

sub kurtosis {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my ( $n1, $n2 ) = $self->bi_frequency($args);
    my $sum = $n1 + $n2;
    my $k4 = ( ( 2 * $n1 * $n2 ) / $sum**4 ) * (
        ( ( 48 * ( 5 * $sum - 6 ) * $n1**3 * $n2**3 ) / ( $sum**2 * $sum**2 ) )
        - (
            ( 48 * ( 2 * $sum**2 + 3 * $sum - 6 ) * $n1**2 * $n2**2 ) /
              ( $sum**2 * $sum )
          ) + (
            ( 2 * ( 4 * $sum**3 + 45 * $sum**2 - 37 * $sum - 18 ) * $n1 * $n2 )
            / $sum**2
          ) - ( 7 * $sum**2 + 13 * $sum - 6 )
    );
    return $k4;
}

=head2 Distribution and tests

=head3 pmf

 $p = $runs->pmf(data => \@data); # or no args to use last pre-loaded data
 $p = $runs->pmf(observed => 5, freqs => [5, 20]);

Implements the runs probability mass function, returning the probability for a particular number of runs given so many dichotomous events (e.g., as in Swed & Eisenhart, 1943, p. 66); i.e., for I<u>' the observed number of runs, I<P>{I<u> = I<u>'}. The required function parameters are the observed number of runs, and the frequencies (counts) of each state in the sequence, which can be given directly, as above, in the arguments B<observed> and B<freqs>, respectively, or these will be worked out from a given data sequence itself (given here or as pre-loaded). For derivation, see its public internal methods L<n_max_seq|Statistics::Sequences::Runs/n_max_seq> and L<m_seq_k|Statistics::Sequences::Runs/m_seq_k>.
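
The calculation can be sketched in plain Perl, independent of this module. The helper names C<choose> and C<n_seq_with_runs> are made up for illustration; the counts follow the even/odd cases of Swed & Eisenhart (1943), and the figures are those of the diner data in the EXAMPLE section (11 runs from 11 empty and 5 occupied seats):

 use strict;
 use warnings;

 # Binomial coefficient "n choose k" (0 when k is out of range)
 sub choose {
     my ( $n, $k ) = @_;
     return 0 if $k < 0 || $k > $n;
     my $res = 1;
     $res = $res * ( $n - $_ + 1 ) / $_ for 1 .. $k;
     return $res;
 }

 # Number of arrangements of $m and $n elements that give exactly $u runs
 sub n_seq_with_runs {
     my ( $u, $m, $n ) = @_;
     if ( $u % 2 == 0 ) {    # even: each state forms k runs
         my $k = $u / 2;
         return 2 * choose( $m - 1, $k - 1 ) * choose( $n - 1, $k - 1 );
     }
     my $k = ( $u + 1 ) / 2;    # odd: one state forms k runs, the other k - 1
     return choose( $m - 1, $k - 1 ) * choose( $n - 1, $k - 2 )
       + choose( $m - 1, $k - 2 ) * choose( $n - 1, $k - 1 );
 }

 my ( $u, $m, $n ) = ( 11, 11, 5 );
 my $p = n_seq_with_runs( $u, $m, $n ) / choose( $m + $n, $m );
 printf "%.7f\n", $p;    # prints 0.0576923 (that is, 252 / 4368)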

=cut

sub pmf {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my ( $n1, $n2 ) = $self->bi_frequency($args);
    my $u = $self->observed($args);
    return _pmf_num( $u, $n1, $n2 ) / _pmf_denom( $n1, $n2 );
}

=head3 cdf

 $p = $runs->cdf(data => \@data); # or no args to use last pre-loaded data
 $p = $runs->cdf(observed => 5, freqs => [5, 20]);

Implements the cumulative distribution function for runs, returning the probability of obtaining the observed number of runs or fewer, down to the minimum possible number of 2 (assuming that the two possible events are actually represented in the data), as per Swed & Eisenhart (1943), p. 66; i.e., for I<u>' the observed number of runs, I<P>{I<u> <= I<u>'}. The summation is over the probability mass function L<pmf|Statistics::Sequences::Runs/pmf>. The function parameters are the observed number of runs, and the frequencies (counts) of the two events, which can be given directly, as above, in the arguments B<observed> and B<freqs>, respectively, or these will be worked out from a given data sequence itself (given here or as pre-loaded).

=cut

sub cdf {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my ( $n1, $n2 ) = $self->bi_frequency($args);
    my $u   = $self->observed($args);
    my $sum = 0;
    for ( 2 .. $u ) {
        $sum += _pmf_num( $_, $n1, $n2 );
    }
    return $sum / _pmf_denom( $n1, $n2 );
}

=head3 cdfi

 $p = $runs->cdfi(data => \@data); # or no args for last pre-loaded data
 $p = $runs->cdfi(observed => 11, freqs => [5, 11]);

Implements the (inverse) cumulative distribution function for runs, returning the probability of obtaining the observed number of runs or more (assuming that the two possible events are actually represented in the data, so that the minimum possible number of runs is 2), as per Swed & Eisenhart (1943), p. 66; i.e., for I<u>' the observed number of runs, I<P> = 1 - I<P>{I<u> <= I<u>' - 1}. The summation is over the probability mass function L<pmf|Statistics::Sequences::Runs/pmf>. The function parameters are the observed number of runs, and the frequencies (counts) of the two events, which can be given directly, as above, in the arguments B<observed> and B<freqs>, respectively, or these will be worked out from a given data sequence itself (given here or as pre-loaded).
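
How the two tails relate can be sketched in plain Perl, independent of this module. All helper names here (C<choose>, C<runs_pmf>, C<runs_cdf>, C<runs_cdfi>) are made up for illustration, and the example frequencies of 5 and 5 are arbitrary:

 use strict;
 use warnings;

 # Binomial coefficient "n choose k" (0 when k is out of range)
 sub choose {
     my ( $n, $k ) = @_;
     return 0 if $k < 0 || $k > $n;
     my $res = 1;
     $res = $res * ( $n - $_ + 1 ) / $_ for 1 .. $k;
     return $res;
 }

 # Probability of exactly $u runs given $m and $n elements of each kind
 sub runs_pmf {
     my ( $u, $m, $n ) = @_;
     my $k = int( ( $u + 1 ) / 2 );
     my $ways =
       $u % 2
       ? choose( $m - 1, $k - 1 ) * choose( $n - 1, $k - 2 )
       + choose( $m - 1, $k - 2 ) * choose( $n - 1, $k - 1 )
       : 2 * choose( $m - 1, $k - 1 ) * choose( $n - 1, $k - 1 );
     return $ways / choose( $m + $n, $m );
 }

 # P(U <= u): sum the pmf from the minimum of 2 runs up to u
 sub runs_cdf {
     my ( $u, $m, $n ) = @_;
     my $p = 0;
     $p += runs_pmf( $_, $m, $n ) for 2 .. $u;
     return $p;
 }

 # P(U >= u): complement of P(U <= u - 1)
 sub runs_cdfi {
     my ( $u, $m, $n ) = @_;
     return 1 - runs_cdf( $u - 1, $m, $n );
 }

 printf "%.4f %.4f\n", runs_cdf( 4, 5, 5 ), runs_cdfi( 5, 5, 5 );    # prints 0.1667 0.8333

The two values sum to 1 because "4 runs or fewer" and "5 runs or more" together cover every possible sequence.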

=cut

sub cdfi {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my ( $n1, $n2 ) = $self->bi_frequency($args);
    my $u   = $self->observed($args);
    my $sum = 0;
    for ( 2 .. $u - 1 ) {
        $sum += _pmf_num( $_, $n1, $n2 );
    }
    return 1 - $sum / _pmf_denom( $n1, $n2 );
}

=head3 z_value

 $v = $runs->z_value(ccorr => 1); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
 $v = $runs->z_value(data => $aref, ccorr => 1);
 ($zvalue, $pvalue) = $runs->z_value(data => $aref, ccorr => 1, tails => 2); # same but wanting an array, get the p-value too

Returns the zscore from a test of runcount deviation, taking the runcount expected away from that observed and dividing by the root expected runcount variance, by default with a continuity correction to expectation. Called wanting an array, returns the z-value with its I<p>-value for the tails (1 or 2) given.

The data to test can already have been L<load|/load>ed, or sent directly as an aref keyed as B<data>.

Other options are B<precision_s> (for the z_value) and B<precision_p> (for the p_value).
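
The arithmetic behind the z-value can be sketched in plain Perl, independent of this module and of Statistics::Zed. The sketch assumes that the continuity correction reduces the absolute deviation by 0.5; how Statistics::Zed applies it internally is not shown here. The figures are those of the diner data in the EXAMPLE section:

 use strict;
 use warnings;

 my ( $n1, $n2, $observed ) = ( 11, 5, 11 );
 my $sum      = $n1 + $n2;
 my $prod     = 2 * $n1 * $n2;
 my $expected = $prod / $sum + 1;    # 7.875
 my $variance = $prod * ( $prod - $sum ) / ( $sum**2 * ( $sum - 1 ) );

 # Assumed continuity correction: shrink |O - E| by 0.5, but not below zero
 my $dev = abs( $observed - $expected ) - 0.5;
 $dev = 0 if $dev < 0;

 my $sign = $observed < $expected ? -1 : 1;
 my $z    = $sign * $dev / sqrt($variance);
 printf "%.2f\n", $z;    # prints 1.60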

I<Aliases>: runcount_zscore, rzs, zscore

=cut

sub z_value {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my $zed = Statistics::Zed->new();
    my ( $zval, $pval ) = $zed->zscore(
        observed => $self->rco($args),
        expected => $self->rce($args),
        variance => $self->rcv($args),
        ccorr => ( defined $args->{'ccorr'} ? $args->{'ccorr'} : 1 ),
        tails => ( $args->{'tails'} || 2 ),
        precision_s => $args->{'precision_s'},
        precision_p => $args->{'precision_p'},
    );
    return wantarray ? ( $zval, $pval ) : $zval;
}
*rzs             = \&z_value;
*runcount_zscore = \&z_value;
*zscore          = \&z_value;

=head3 p_value

 $p = $runs->p_value(); # using loaded data and default args
 $p = $runs->p_value(ccorr => 0|1, tails => 1|2); # normal-approx. for last-loaded data
 $p = $runs->p_value(exact => 1); # calc combinatorially for observed >= or < than expectation
 $p = $runs->p_value(data => [1, 0, 1, 1, 0], exact => 1); #  given data
 $p = $runs->p_value(freqs => [12, 12], observed => 8); # no data sequence, specify known params

Returns the probability of a deviation from the expected number of runs at least as extreme as that observed, given the number of each of the two events. By default, a large sample is assumed, and the probability is obtained from the normalized deviation, as given by the L<zscore|Statistics::Sequences::Runs/z_value> method (which is two-tailed unless B<tails> => 1 is given).

If the option B<exact> is defined and not zero, then the probability is worked out combinatorially, as per Swed & Eisenhart (1943), Eq. 1, p. 66 (see also Siegel, 1956, Eqs. 6.12a and 6.12b, p. 138). By default, this is a one-tailed test in the direction of the observed deviation: if the observed number of runs is no less than the value returned by the L<expected|Statistics::Sequences::Runs/expected> method, the probability of that many runs or more is returned (see L<cdfi|Statistics::Sequences::Runs/cdfi>); otherwise, the probability of that many runs or fewer is returned (see L<cdf|Statistics::Sequences::Runs/cdf>). Setting B<tails> => 2 simply doubles the one-tailed I<p>-value from any of these tests. Output from these tests has been checked against the tables and examples in Swed & Eisenhart (given to 7 decimal places), and found to agree.

The option B<precision_p> gives the returned I<p>-value to so many decimal places.

I<Aliases>: test, runs_test, rct

=cut

sub p_value {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my $pval;
    if ( $args->{'exact'} ) {
        $pval =
          ( $self->observed($args) - $self->rce($args) >= 0 )
          ? $self->cdfi($args)
          : $self->cdf($args);
        $pval *= 2 if $args->{'tails'} and $args->{'tails'} == 2;
        $pval = sprintf( q{%.} . $args->{'precision_p'} . 'f', $pval )
          if $args->{'precision_p'};
    }
    else {
        $pval = ( $self->zscore($args) )[1];
    }
    return $pval;
}
*test      = \&p_value;
*runs_test = \&p_value;
*rct       = \&p_value;

=head3 lrx2

Likelihood ratio chi-square test for runs by length. Not yet implemented: the method currently returns nothing.

=cut

sub lrx2 {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my ( $n1, $n2 ) = $self->bi_frequency($args);
    return;
}

=head3 ztest_ok

Returns true for the loaded sequence if its constituent sample numbers are sufficient for their expected runs to be normally approximated, using Siegel's (1956, p. 140) rule: ok if I<either> of the two I<N>s is greater than 20.

=cut

sub ztest_ok {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my ( $n1, $n2 ) = $self->bi_frequency($args);
    my $retval =
      $n1 > 20 || $n2 > 20
      ? 1
      : 0
      ; # Siegel's rule (p. 140) - ok if either of the two Ns is greater than 20
    return $retval;
}

=head2 Utils

Methods used internally, or for returning/printing descriptives, etc., in a bunch.

=head3 bi_frequency

 @freq = $runs->bi_frequency(data => \@data); # or no args if using last pre-loaded data

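Returns the frequencies of the two elements as a two-item list, with zero for any absent element; croaks if more than 2 different elements are found. If B<freqs> is given, that list is returned as is. The order of the two frequencies is not guaranteed to follow the order of the elements in the data.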

=cut

sub bi_frequency {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    carp
'Argument named \'trials\' is deprecated; use \'freqs\' to give aref of frequencies per state'
      if $args->{'trials'};
    return @{ $args->{'freqs'} } if ref $args->{'freqs'};
    my $data = _get_data( $self, $args );
    my %states = ();
    $states{$_}++ for @{$data};    # hash keying each element with its frequency
    croak 'Cannot compute runs: More than two states were found in the data: '
      . join( q{, }, keys %states )
      if scalar keys %states > 2;
    my @vals = values %states;
    $vals[1] = 0 if scalar @vals < 2;
    return @vals;
}

=head3 n_max_seq

 $n = $runs->n_max_seq(); # loaded data
 $n = $runs->n_max_seq(data => \@data); # this sequence
 $n = $runs->n_max_seq(observed => int, freqs => [int, int]); # these specs

Returns the number of possible sequences for the two given state frequencies. Suppose the proverbial urn contains I<N>1 black balls and I<N>2 white balls, well mixed, and I<N>1 + I<N>2 drawings are taken from it without replacement, so that any sequence has the same probability of occurring; how many different sequences of black and white balls are possible? For the two counts, this is "I<N>1 + I<N>2 I<choose> I<N>1", or:

=for html <p>&nbsp;&nbsp;&nbsp;<i>N</i><sub>max</sub> = ( <i>N</i><sub>1</sub> + <i>N</i><sub>2</sub> )! / <i>N</i><sub>1</sub>!<i>N</i><sub>2</sub>!</p>

With the usual definition of a probability as M / N, this is the denominator term in the runs L<probability mass function (pmf)|Statistics::Sequences::Runs/pmf>. This does not take into account the probability of obtaining so many of each event, of the proportion of black and white balls in the urn. (That's work and play for another day.)

=cut

sub n_max_seq {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my @freqs = $self->bi_frequency($args);
    return _pmf_denom(@freqs);
}

=head3 m_seq_k

 $n = $runs->m_seq_k(); # loaded data
 $n = $runs->m_seq_k(data => \@data); # this sequence
 $n = $runs->m_seq_k(observed => int, freqs => [int, int]); # these specs

Returns the number of sequences that can produce I<k> runs from I<m> elements of a single kind, with all other kinds of elements in the sequence assumed to be of a single kind, under the conditions of L<n_max_seq|/n_max_seq>. See Swed and Eisenhart (1943), or Barton and David (1958, p. 253). With the usual definition of a probability as M / N, this is the numerator term in the runs L<probability mass function (pmf)|Statistics::Sequences::Runs/pmf>.

=cut

sub m_seq_k {
    my ( $self, @args ) = @_;
    my $args  = ref $args[0] ? $args[0] : {@args};
    my $u     = $self->observed($args);
    my @freqs = $self->bi_frequency($args);
    return _pmf_num( $u, @freqs );
}

=head3 stats_hash

 $href = $runs->stats_hash(values => {observed => 1, expected => 1, variance => 1, z_value => 1, p_value => 1}, exact => 0, ccorr => 1);

Returns a hashref for the counts and stats as specified in its "values" argument, and with any options for calculating them (e.g., exact for p_value). See L<Statistics::Sequences/stats_hash> for details. If calling via a "runs" object, the option "stat => 'runs'" is not needed (unlike when using the parent "sequences" object).

=head3 dump

 $runs->dump(values => { observed => 1, variance => 1, p_value => 1}, exact => 1, flag => 1,  precision_s => 3); # among other options

Print Runs-test results to STDOUT. See L<Statistics::Sequences/dump> for details of what stats to dump (default is observed() and p_value()). Optionally also give the data directly.

=cut

sub dump {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    $args->{'stat'} = 'runs';
    $self->SUPER::dump($args);
    return;
}

=head3 dump_data

 $runs->dump_data(delim => "\n"); # print whatever's loaded (or specify by label, index, or as "data")

See L<Statistics::Sequences/dump_data> for details.

=cut

# Private methods:

sub _pmf_num {
    my ( $u, $m, $n ) = @_;
    my $f;
    if ( is_even($u) ) {
        my $k = $u / 2 - 1;
        $f = 2 * _choose( $m - 1, $k ) * _choose( $n - 1, $k );
    }
    else {
        my $k = ( $u + 1 ) / 2;
        $f =
          _choose( $m - 1, $k - 1 ) * _choose( $n - 1, $k - 2 ) +
          _choose( $m - 1, $k - 2 ) * _choose( $n - 1, $k - 1 );
    }
    return $f;
}

sub _pmf_denom {
    return _choose( sum(@_), $_[0] );

    #return _factorial(sum(@_)) / ( _factorial($_[0]) * _factorial($_[1]) );
}

sub _choose {    # from Orwant et al., p. 573
    my ( $n, $k ) = @_;
    my ( $res, $j ) = ( 1, 1 );
    return 0 if $k > $n || $k < 0;
    $k = ( $n - $k ) if ( $n - $k ) < $k;
    while ( $j <= $k ) {
        $res *= $n--;
        $res /= $j++;
    }
    return $res;
}

sub _factorial {
    my ( $n, $res ) = ( shift, 1 );
    return undef unless $n >= 0 and $n == int($n);
    $res *= $n-- while $n > 1;
    return $res;
}

sub _get_data {
    my ( $self, $args ) = @_;
    return ref $args->{'data'} ? $args->{'data'} : $self->read($args);
}

1;

__END__

=head1 EXAMPLE

=head2 Seating at the diner

Swed and Eisenhart (1943) list the occupied (O) and empty (E) seats in a row at a lunch counter. Have people taken up their seats on a random basis?

 use Statistics::Sequences::Runs;
 my $runs = Statistics::Sequences::Runs->new();
 my @seating = (qw/E O E E O E E E O E E E O E O E/); # data already form a single sequence with dichotomous observations
 $runs->dump(data => \@seating, tails => 1); # normal approximation (the default)

Suggesting some non-random basis for people taking their seats, this prints:

 observed = 11, p_value = 0.054834

But these data would fail Siegel's rule (L<ztest_ok|Statistics::Sequences::Runs/ztest_ok> = 0), as neither state has more than 20 observations. So check the exact probability of the hypothesis that the observed deviation is greater than zero (1-tailed):

 $runs->dump(data => \@seating, values => {p_value => 1}, exact => 1, tails => 1);

This prints a I<p>-value of .0576923 (so the normal approximation seems good in any case).

These data are also used in an example of testing for L<Vnomes|Statistics::Sequences::Vnomes/EXAMPLE>.

=head2 Runs in multinomial matching

In a single run of a classic ESP test, there are 25 trials, each composed of a randomly generated event (typically, one of 5 possible geometric figures), and a human-generated event arbitrarily drawn from the same pool of alternatives. Tests of the match between the random and human data are typically for number of matches observed versus expected. The I<runs> of matches and misses can be tested by dichotomizing the data on the basis of the L<match|Statistics::Data::Dichotomize/match> of the random "targets" with the human "responses", as described by Kelly (1982):

 use Statistics::Sequences::Runs;
 use Statistics::Data::Dichotomize;
 my @targets = (qw/p c p w s p r w p c r c r s s s s r w p r w c w c/);
 my @responses = (qw/p c s c s s p r w r w c c s s r w s w p c r w p r/);

 # Test for runs of matches between targets and responses:
 my $runs = Statistics::Sequences::Runs->new();
 my $ddat = Statistics::Data::Dichotomize->new();
 $runs->load($ddat->match(data => [\@targets, \@responses]));
 $runs->dump_data(delim => ' '); # have a look at the match sequence; prints "1 1 0 0 1 0 0 0 0 0 0 1 0 1 1 0 0 0 1 1 0 0 0 0 0\n"
 print "Probability of these many runs vs expectation: ", $runs->test(), "\n"; # 0.51436
 # or test for runs in matching when responses are matched to targets one trial behind:
 print $runs->test(data => $ddat->match(data => [\@targets, \@responses], lag => -1)), "\n"; # 0.73766

=head1 REFERENCES

These papers provide the implemented algorithms and/or the sample data used in examples and tests.

B<Barton, D. E., & David, F. N.> (1958). Non-randomness in a sequence of two alternatives: II. Runs test. I<Biometrika>, I<45>, 253-256. doi: L<10.2307/2333062|http://dx.doi.org/10.2307/2333062> 

B<Kelly, E. F.> (1982). On grouping of hits in some exceptional psi performers. I<Journal of the American Society for Psychical Research>, I<76>, 101-142.

B<Siegel, S.> (1956). I<Nonparametric statistics for the behavioral sciences>. New York, NY, US: McGraw-Hill.

B<Swed, F., & Eisenhart, C.> (1943). Tables for testing randomness of grouping in a sequence of alternatives. I<Annals of Mathematical Statistics>, I<14>, 66-87. doi: L<10.1214/aoms/1177731494|http://dx.doi.org/10.1214/aoms/1177731494>

B<Wald, A., & Wolfowitz, J.> (1940). On a test whether two samples are from the same population. I<Annals of Mathematical Statistics>, I<11>, 147-162. doi: L<10.1214/aoms/1177731909|http://dx.doi.org/10.1214/aoms/1177731909>

B<Wolfowitz, J.> (1943). On the theory of runs with some applications to quality control. I<Annals of Mathematical Statistics>, I<14>, 280-288. doi: L<10.1214/aoms/1177731421|http://dx.doi.org/10.1214/aoms/1177731421>

=head1 SEE ALSO

L<Statistics::Sequences|Statistics::Sequences> for other tests of sequences, such as L<Statistics::Sequences::Pot|Statistics::Sequences::Pot>, and for sharing data between these tests.

=head1 AUTHOR

Roderick Garton, C<< <rgarton at cpan.org> >>

=head1 LICENSE AND COPYRIGHT

=over 4

=item Copyright (c) 2006-2015 Roderick Garton

This program is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see L<http://www.perl.com/perl/misc/Artistic.html>).

=item Disclaimer

To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.

=back

=cut

# end of Statistics::Sequences::Runs