Module Version: 0.061

# NAME

Statistics::Sequences::Joins - Wishart-Hirshfeld statistics for number of alternations between two elements of a dichotomous sequence

# SYNOPSIS

```
use Statistics::Sequences::Joins;
$joins = Statistics::Sequences::Joins->new();
$joins->load(qw/0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 1 1 1 1 0 0/); # or load multinomial data and then dichotomise it
$val = $joins->observed(); # also expected() and variance()
($val, $sig) = $joins->zscore();
$joins->test()->dump(); # print all the descriptives and the zscore, lumping each into the object as well
```

# DESCRIPTION

Joins are similar to runs, but a join is counted for every alternation between dichotomous events (state, element, letter ...), whereas a run is counted for each continuous segment between alternations. Joins are marked out with asterisks for the following sequence:

```
0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 1 1 1 1 0 0
   * *     * *       *   *     *       *
```

So there's a join (of 0 and 1) at indices 1 and 2, then immediately another join (of 1 and 0) at indices 2 and 3, and then another join at 5 and 6 ... for a total count of eight joins.
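The counting rule above can be sketched in plain Perl. This is only an illustration of the definition, not the module's own code: `count_joins` is a hypothetical helper, and the indices it reports are zero-based, as in the text.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences::Joins):
# returns one [i - 1, i] index pair for every alternation between
# adjacent elements of the sequence.
sub count_joins {
    my @seq = @_;
    my @at;    # zero-based index pairs at which a join occurs
    for my $i (1 .. $#seq) {
        push @at, [$i - 1, $i] if $seq[$i] ne $seq[$i - 1];
    }
    return @at;
}

my @data  = qw/0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 1 1 1 1 0 0/;
my @joins = count_joins(@data);
printf "%d joins; first at indices %d and %d\n", scalar(@joins), @{ $joins[0] };
# prints: 8 joins; first at indices 1 and 2
```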

There are methods to get the observed and expected joincounts, and the expected variance in joincount. Counting up the observed number of joins needs some data to count through. The expectation and variance, however, can be computed from just the number of trials and, optionally, the probability of one of the two events (default = 0.50), when no data are sent in the call or already cached via load. Note that this also differs from the way runs are treated: the expected joincount, and its variance, are worked out from the relative probabilities of the two events, whereas for runs they are based on the event frequencies counted off the given data (or as told). Alternatively, the probabilities can be estimated from the proportional frequencies in the data at hand.

See Statistics::Sequences for ways to dichotomise a multinomial or continuous numerical sequence.

# METHODS

Methods are those described in Statistics::Sequences, but can be used directly from this module, as follows.

## new

`$joins = Statistics::Sequences::Joins->new();`

Returns a new Joins object. Expects/accepts no arguments but the classname.

## load

```
$joins->load(@data);
$joins->load('sample1' => \@data1, 'sample2' => \@data2);
$joins->load({'sample1' => \@data1, 'sample2' => \@data2});
```

Optionally, pre-load some data, either anonymously or by name. See load in the Statistics::Sequences manpage. If dichotomising around a central tendency, values equal to the central tendency itself should be ignored.

Alternatively, skip this step and send the data directly to the descriptive methods that follow. Counting up the observed number of joins needs some data to count through, but the expectation and variance of the joincount can be computed from just the number of trials and, optionally, the probability of one of the two events.

## observed, joincount_observed, jco

```
$count = $joins->observed(); # assumes testdata have already been loaded
$count = $joins->observed(data => [1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1]); # assumes window = 1
```

Returns the number of joins in a sequence - i.e., when, from trial 2 on, the event on trial i doesn't equal the event on trial i - 1. So the following sequence adds up to 7 joins like this:

```
Sequence:  1 0 0 0 1 0 0 1 0 1 1 0
JoinCount: 0 1 1 1 2 3 3 4 5 6 6 7
```

The data to test can already have been loaded, or you send it directly as a flat referenced array keyed as `data`.

## expected, joincount_expected, jce

```
$val = $joins->expected(); # assumes testdata have already been loaded, uses default prob value (.5)
$val = $joins->expected(data => [1, 0, 0, 0, 1, 0, 0, 1, 0, 1]); # count these data, use default prob value (.5)
$val = $joins->expected(data => [1, 0, 0, 0, 1, 0, 0, 1, 0, 1], prob => .2); # count these data, use given prob value
$val = $joins->expected(data => [1, 0, 0, 0, 1, 0, 0, 1, 0, 1], state => 1); # count off trial numbers and prob. of event
$val = $joins->expected(prob => .2, trials => 10); # use this trial number and probability of one of the 2 events
```

Returns the expected number of joins between every element of the given data, or for data of the given attributes, using:

E[J] = 2(N – 1)pq

where N is the number of observations/trials (width = 1 segments),

p is the expected probability of the joined event taking on its observed value, and

q is (1 - p), the expected probability of the joined event not taking on its observed value.

The data to test can already have been loaded, or you send it directly as a flat referenced array keyed as `data`. The data are only needed to count off the number of trials, and the proportion of 1s (or other given state of the two), if the `trials` and `prob` attributes are not defined. If `state` is defined, then `prob` is worked out from the actual data (as long as there are some, or 1/2 is assumed). If `state` is not defined, `prob` takes the value you give to it, or, if it too is not defined, then 1/2.
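As a rough check on the formula, E[J] can be worked out by hand in plain Perl. The sketch below does not call the module; `expected_joins` is a hypothetical helper, and taking `prob` as the observed proportion of 1s is meant to mirror what the text describes for `state => 1`.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's API): E[J] = 2(N - 1)pq,
# with prob defaulting to 0.5 as described in the text.
sub expected_joins {
    my ($trials, $prob) = @_;
    $prob = 0.5 unless defined $prob;
    return 2 * ($trials - 1) * $prob * (1 - $prob);
}

my @data  = (1, 0, 0, 0, 1, 0, 0, 1, 0, 1);
my $n     = scalar @data;                            # 10 trials
my $p_obs = scalar(grep { $_ == 1 } @data) / $n;     # 4 of 10 are 1s

printf "%.2f\n", expected_joins($n);            # default prob: 4.50
printf "%.2f\n", expected_joins($n, 0.2);       # given prob:   2.88
printf "%.2f\n", expected_joins($n, $p_obs);    # from data:    4.32
```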

## variance, joincount_variance, jcv

```
$val = $joins->variance(); # assume the data are already "loaded" for counting
$val = $joins->variance(data => $aref); # use inplace array reference, will use default prob of 1/2
$val = $joins->variance(data => [1, 0, 0, 0, 1, 0, 0, 1, 0, 1], state => 1); # count off trial numbers and prob. of event
$val = $joins->variance(trials => $number, prob => $prob); # use this trial number and probability of one of the 2 events
```

Returns the expected variance in the number of joins for the given data.

V[J] = 4Npq(1 – 3pq) – 2pq(3 – 10pq)

defined as above for joincount_expected.

The data to test can already have been loaded, or you send it directly as a flat referenced array keyed as `data`. The data are only needed to count off the number of trials, and the proportion of 1s (or other given state of the two), if the `trials` and `prob` attributes aren't defined. If `state` is defined, then `prob` is worked out from the actual data (as long as there are some, or expect a `croak`). If `state` is not defined, `prob` takes the value you give to it, or, if it too is not defined, then 1/2.
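The variance formula can be checked the same way. This is a sketch under the formula as printed above, with a hypothetical helper name (`joins_variance`) rather than the module's own routine. One sanity check: with p = 0.5 the expression reduces to (N - 1)/4.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's API):
# V[J] = 4Npq(1 - 3pq) - 2pq(3 - 10pq), with prob defaulting to 0.5.
sub joins_variance {
    my ($trials, $prob) = @_;
    $prob = 0.5 unless defined $prob;
    my $pq = $prob * (1 - $prob);
    return 4 * $trials * $pq * (1 - 3 * $pq) - 2 * $pq * (3 - 10 * $pq);
}

printf "%.2f\n", joins_variance(22);         # 5.25, i.e. (22 - 1)/4
printf "%.2f\n", joins_variance(10, 0.2);    # 2.88
```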

## zscore, joincount_zscore, jzs, z_value

```
$val = $joins->zscore(); # data already loaded, use default windows and prob
$val = $joins->zscore(data => $aref, prob => .5, ccorr => 1);
($zvalue, $pvalue) = $joins->zscore(data => $aref, prob => .5, ccorr => 1, tails => 2); # same but wanting an array, get the p-value too
```

Returns the zscore from a test of joincount deviation: the expected joincount is subtracted from the observed joincount, and the difference is divided by the square root of the expected joincount variance, by default with a continuity correction in the numerator. Called wanting an array, returns the z-value with its p-value for the tails (1 or 2) given.

The data to test can already have been loaded, or you send it directly as a flat referenced array keyed as `data`.
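Putting the pieces together, the z-value might be computed as sketched below. The helper `joins_zscore` is hypothetical, and the form of the continuity correction (shrinking the absolute deviation by 0.5) is an assumption; the text only says that a correction is applied in the numerator, so the module's exact rule may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's API): z = (O - E) / sqrt(V).
# ASSUMPTION: the continuity correction reduces |O - E| by 0.5, floored at 0.
sub joins_zscore {
    my ($observed, $trials, $prob, $ccorr) = @_;
    my $pq       = $prob * (1 - $prob);
    my $expected = 2 * ($trials - 1) * $pq;
    my $variance = 4 * $trials * $pq * (1 - 3 * $pq) - 2 * $pq * (3 - 10 * $pq);
    return undef unless $variance > 0;    # no defined z-value without variance

    my $dev = $observed - $expected;
    if ($ccorr) {
        my $abs = abs($dev) - 0.5;
        $abs = 0 if $abs < 0;
        $dev = $dev < 0 ? -$abs : $abs;
    }
    return $dev / sqrt($variance);
}

# 8 joins observed over 22 trials, prob = 0.5 (the sequence in the DESCRIPTION)
printf "%.4f\n", joins_zscore(8, 22, 0.5, 0);    # -1.0911
printf "%.4f\n", joins_zscore(8, 22, 0.5, 1);    # -0.8729
```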

## test, joins_test, jnt

`$joins->test();`

Test the currently loaded data for significance of the number of joins. Returns the Joins object, lumped with a `z_value`, `p_value`, and the descriptives `observed`, `expected` and `variance`.

## dump

`$joins->dump(flag => '1|0', text => '0|1|2');`

Print Joins-test results to STDOUT. See dump in the Statistics::Sequences manpage for details.

# EXAMPLE

Here the problem is to assess the degree of consistency in the number of matches between target and response obtained in each of 200 runs of 25 trials each. The number of matches expected on the basis of chance is 5 per run. To test for sustained high- or low-scoring sequences, a join is defined as the point at which a score on one side of this value (4, 3, 2, etc.) is followed by a score on the other side (6, 7, 8, etc.). Ignoring scores equalling the expectation value (5), the probability of a join is 1/2, or 0.5 (the default value to test), assuming that, say, a score of 4 is as likely as a score of 6, and that any deviation of more than 5 (from 5) is improbable/impossible.

```
use Statistics::Sequences;

# Conduct pseudo identification 5 x 5 runs:
my ($i, $hits, $stimulus, $response, @scores);
foreach ($i = 0; $i < 200; $i++) {
    $scores[$i] = 0;
    for (0 .. 24) {
        $stimulus = (qw/circ plus rect star wave/)[int(rand(5))];
        $response = (qw/circ plus rect star wave/)[int(rand(5))];
        $scores[$i]++ if $stimulus eq $response;
    }
}

my $seq = Statistics::Sequences->new();
$seq->load(@scores); # load the run scores to be dichotomised
$seq->cut(value => 5, equal => 0); # value is the expected number of matches (Np); ignoring values equal to this
$seq->test(stat => 'joins', tails => 1, ccorr => 1)->dump(text => 1, flag => 1);
# prints, e.g., Joins: expected = 79.00, observed = 67.00, Z = -1.91, 1p = 0.028109*
```
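To make the dichotomising step concrete, here is a minimal sketch of what cutting the scores around 5 amounts to, written without the module. Both `cut_scores` and `joincount` are hypothetical helpers, and the short score list is made up purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the cut: 1 for scores above the cut value,
# 0 for scores below it, with scores equal to the value dropped.
sub cut_scores {
    my ($value, @scores) = @_;
    return map { ($_ > $value) ? 1 : 0 } grep { $_ != $value } @scores;
}

# Hypothetical helper: number of alternations between adjacent elements.
sub joincount {
    my @seq = @_;
    return scalar grep { $seq[$_] != $seq[$_ - 1] } 1 .. $#seq;
}

my @scores = (4, 6, 5, 7, 3, 5, 5, 8);    # illustrative run scores only
my @dichot = cut_scores(5, @scores);
printf "%s => %d joins\n", join(' ', @dichot), joincount(@dichot);
# prints: 0 1 1 0 1 => 3 joins
```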

# REFERENCES

Wishart, J. & Hirshfeld, H. O. (1936). A theorem concerning the distribution of joins between line segments. Journal of the London Mathematical Society, 11, 227.

Statistics::Sequences::Runs : Analogous test.

Statistics::Sequences::Pot : Another concept of sequential structure.

# BUGS/LIMITATIONS

No computational bugs as yet identified. Hopefully this will change, given time.

# REVISION HISTORY

See CHANGES in installation dist for revisions.