NAME

Statistics::Autocorrelation - Coefficients for any lag, as correlogram, with significance tests

VERSION

Version 0.03

SYNOPSIS

 use Statistics::Autocorrelation 0.03;
 $acorr = Statistics::Autocorrelation->new();
 $coeff = $acorr->coefficient(data => \@data, lag => integer (from 1 to N-1), exact => 0, unbias => 1);
 # or load one or more data, optionally update, and test each discretely:
 $acorr->load(\@data1, \@data2);
 $coeff = $acorr->coeff(index => 0, lag => 1); # default lag => 0

DESCRIPTION

Calculates autocorrelation coefficients for a single series of numerical data, for any valid length of lag.

METHODS

new

 $acorr = Statistics::Autocorrelation->new();

Returns a new object of the class for accessing its methods. The class ISA Statistics::Data, so all the methods for loading, adding, saving, dumping, etc., data in that package are available here.

coefficient

 $coeff = $autocorr->coefficient(data => \@data, lag => integer (from 1 to N-1), exact => 0|1, unbias => 1|0, circular => 1|0);
 $coeff = $autocorr->coefficient(lag => 1); # using loaded data, and default args (exact = 0, unbias = 1, circular = 0)

Alias: coeff, acf

Returns the autocorrelation coefficient, the ratio of the autocovariance to variance of a sequence at any particular lag, ranging from -1 to +1, as in Chatfield (1975) and Kendall (1973). Specifically,

\[ \rho_k = \frac{\gamma_k}{\sigma^2_k} \]

where k is the lag (see below).

Data can be previously loaded or sent directly here (see Statistics::Data). There must be at least two elements in the data array. A croak will be heard if no data have been loaded or given here.
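As a rough illustration of this ratio, the following pure-Perl sketch computes the coefficient without using the module. It is not the module's implementation: the function name acf_sketch is hypothetical, the variance is taken as the lag-0 autocovariance over the whole sequence about the overall mean (the default convention described under exact), and the lag handling follows the description under lag.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of Statistics::Autocorrelation):
# lag-k autocovariance divided by the lag-0 autocovariance (the variance),
# both taken about the mean of the whole sequence.
sub acf_sketch {
    my ($data, $lag) = @_;
    my $n = scalar @{$data};
    die "Need at least two observations\n" if $n < 2;
    $lag = abs $lag;                 # a lag of -k gives the same value as +k
    return q{} if $lag >= $n;        # empty string for an impossible lag
    my $mean = sum(@{$data}) / $n;
    my @dev  = map { $_ - $mean } @{$data};
    my $ss   = sum(map { $_**2 } @dev);    # N times the variance
    die "Series has zero variance\n" if $ss == 0;
    my $sp = 0;                            # lagged cross-products, t = 1 .. N - k
    $sp += $dev[$_] * $dev[$_ + $lag] for 0 .. $n - $lag - 1;
    return $sp / $ss;                      # the 1/N factors cancel in the ratio
}

printf "lag %d: %.3f\n", $_, acf_sketch([1, 2, 3, 4, 5], $_) for 0 .. 2;
```

For the series 1 to 5, the deviations from the mean are -2, -1, 0, 1, 2, so the lag-1 value is 4 / 10 = 0.4 and the lag-2 value is -1 / 10 = -0.1.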

Options are:

lag

An integer to define how many indices ahead or behind to start correlating the data to itself, as in how many time-intervals separate one value from another. If lag is greater than or equal to number of observations, returns empty string. If the value of lag is less than zero, the calculation is made with its absolute value, given that

\[ \rho_{-k} = \rho_k \]

for all k (so that a coefficient for a lag of -k is equal in magnitude and sign to that for +k). If a value is not given for lag, it is set to the default value of 0.

exact

Boolean value, default = 0. In calculating the autocorrelation coefficient, the convention -- as in commercial statistics packages (e.g., SPSS/PASW), published examples of autocorrelation (e.g., nist.gov), and texts such as Chatfield (1975) and Box and Jenkins (1976) -- is to calculate (1) the sum of cross-products for the autocovariance (the numerator term in the autocorrelation coefficient) from the residuals for each observation x from trial t = 1 (index = 0) to N - k (where k is the lag), relative to the mean of the whole sequence:

\[ \gamma_k = \frac{1}{N} \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \]

rather than the means for each sub-sequence as lagged, and (2) the sum-of-squares for the variance in the denominator as that of the whole sequence:

\[ \sigma^2_k = \frac{1}{N} \sum_{t=1}^{N-k} (x_t - \bar{x})^2 \]

instead of using completely pairwise products. This convention assumes that the series is stationary (has no linear or curvilinear trend, no periodicity), and that the number of observations, N, in the sample is "reasonably large". By default, you get the autocorrelation coefficient under these assumptions, with the above formulations; but if you specify exact => 1, you get the coefficient as calculated by Kendall (1973) Eq. 3.35, where the sums use not the overall sample mean, but the mean of the first N - k elements and the mean of the last N - k elements (those from element k + 1 to element N):

\[ \bar{x}_k = \frac{1}{N-k} \sum_{t=1}^{N-k} x_t \quad \text{and} \quad \bar{x}'_k = \frac{1}{N-k} \sum_{t=1}^{N-k} x_{t+k} \]

Taking each observation relative to these means, the autocovariance in the numerator, and variance in the denominator, are calculated as follows to give the autocorrelation coefficient:

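\[ \rho_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x}_k)(x_{t+k} - \bar{x}'_k)}{\left[\sum_{t=1}^{N-k} (x_t - \bar{x}_k)^2\right]^{1/2} \left[\sum_{t=1}^{N-k} (x_{t+k} - \bar{x}'_k)^2\right]^{1/2}} \]
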
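The exact form above amounts to a product-moment correlation between the first N - k values and the last N - k values, each centred on its own mean. The following is a minimal pure-Perl sketch of that calculation; the function name acf_exact_sketch is hypothetical and the code is an illustration of the formula, not the module's own code.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: correlation between x_t and x_{t+k}, t = 1 .. N - k,
# with each sub-sequence centred on its own mean.
sub acf_exact_sketch {
    my ($data, $lag) = @_;
    my $n = scalar @{$data};
    my $m = $n - $lag;                      # number of lagged pairs
    die "Lag must leave at least two pairs\n" if $m < 2;
    my @head = @{$data}[0 .. $m - 1];       # x_t
    my @tail = @{$data}[$lag .. $n - 1];    # x_{t+k}
    my $mean_head = sum(@head) / $m;
    my $mean_tail = sum(@tail) / $m;
    my ($sp, $ss_head, $ss_tail) = (0, 0, 0);
    for my $t (0 .. $m - 1) {
        my $dh = $head[$t] - $mean_head;
        my $dt = $tail[$t] - $mean_tail;
        $sp      += $dh * $dt;
        $ss_head += $dh**2;
        $ss_tail += $dt**2;
    }
    # Undefined (division by zero) if either sub-sequence is constant
    return $sp / (sqrt($ss_head) * sqrt($ss_tail));
}

printf "exact lag-1 coefficient: %.4f\n", acf_exact_sketch([1, 2, 4, 3], 1);
```

For a strictly linear series such as 1 to 5, this form gives 1 at lag 1, whereas the whole-mean convention gives a smaller value for the same data; that difference is what the stationarity and large-N assumptions are meant to make negligible.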
unbias

Boolean, default = 1. In calculating the approximate autocovariance, it is conventional to divide the sum-product of residuals (as given above) by N, but some sources divide by N - lag for less biased estimation, so that

\[ \gamma_k = \frac{1}{N-k} \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \]

For the latter, set unbias => 0. This is only effective where circular => 0 and exact => 0.
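The effect of the two divisors can be seen in a small pure-Perl sketch. The function autocov_sketch and its third argument are hypothetical names used only for illustration; the sketch takes the divisor as an explicit string rather than mapping it onto a value of unbias.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: approximate autocovariance about the overall mean,
# dividing the sum of lagged cross-products by N or by N - k.
sub autocov_sketch {
    my ($data, $lag, $divisor) = @_;    # $divisor is 'N' or 'N-k'
    my $n    = scalar @{$data};
    my $mean = sum(@{$data}) / $n;
    my @dev  = map { $_ - $mean } @{$data};
    my $sp   = 0;
    $sp += $dev[$_] * $dev[$_ + $lag] for 0 .. $n - $lag - 1;
    my $denom = $divisor eq 'N-k' ? $n - $lag : $n;
    return $sp / $denom;
}

my @series = (1, 2, 3, 4, 5);
printf "divide by N:     %.2f\n", autocov_sketch(\@series, 1, 'N');
printf "divide by N - k: %.2f\n", autocov_sketch(\@series, 1, 'N-k');
```

With five observations and a lag of 1, the sum of cross-products is 4, so the two estimates are 4 / 5 and 4 / 4; the gap between them shrinks as N grows relative to the lag.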

circular

Boolean value, default = 0: For circularized lagging, set circular => 1.
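The documentation does not spell out what circularized lagging does. A common reading, assumed here, is that the series wraps around, so that the value k places after the last observation is taken from the start of the series and all N pairs enter the sum. The sketch below follows that assumption; acf_circular_sketch is a hypothetical name and the module's own behaviour may differ.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: autocorrelation with wrap-around indexing (assumed
# meaning of "circular"), using the overall mean and whole-sequence variance.
sub acf_circular_sketch {
    my ($data, $lag) = @_;
    my $n    = scalar @{$data};
    my $mean = sum(@{$data}) / $n;
    my @dev  = map { $_ - $mean } @{$data};
    my $ss   = sum(map { $_**2 } @dev);
    my $sp   = 0;
    # Index t + k wraps modulo N, so every observation has a partner
    $sp += $dev[$_] * $dev[($_ + $lag) % $n] for 0 .. $n - 1;
    return $sp / $ss;
}

printf "circular lag-1 coefficient: %.4f\n", acf_circular_sketch([1, 3, 2, 6], 1);
```

For the series 1, 3, 2, 6 the wrapped pair (6, 1) adds -6 to the non-circular sum of -3, giving -9 / 14 instead of -3 / 14.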

autocovariance

 $covar = $autocorr->autocovariance(data => \@data, lag => integer (from 1 to N-1), exact => 0|1, unbias => 1|0, circular => 1|0);
 $covar = $autocorr->autocovariance(lag => 1); # using loaded data, and default args (exact = 0, unbias = 1, circular = 0)

Alias: autocov, acvf

Returns the autocovariance; see coefficient for definition and options.

correlogram

 $href = $autocorr->correlogram(nlags => integer, exact => 1|0, unbias => 1|0, circular => 1|0); # assuming data are loaded
 $href = $autocorr->correlogram(); # use defaults, with loaded data
 $href = $autocorr->correlogram(data => \@data); # same as either of above, but give data here
 ($lags, $coeffs) = $autocorr->correlogram(); # with args as for either of the above 

Alias: coeff_list

Returns the autocorrelation coefficients for lags from 0 to a limit, or (by default) over all possible lags, from 0 to N - 1. If called in array context, returns two elements: an array-reference of the lags, and an array-reference of their respective coefficients. Otherwise, returns a hash-reference of the coefficients keyed by their respective lags. The limit is given by the argument nlags, which gives the number of lags to return, including the zero lag, as permitted by the data to be referenced. Options are exact, unbias and circular, as defined above for coefficient. The autocorrelation function being symmetric about lag zero, the correlogram is based only on positive lags.
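The two return shapes described above can be sketched in pure Perl with wantarray. This is an illustration only: correlogram_sketch and acf_sketch are hypothetical names, and the coefficients use the default whole-mean convention rather than calling the module.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: whole-mean autocorrelation coefficient at one lag.
sub acf_sketch {
    my ($data, $lag) = @_;
    my $n    = scalar @{$data};
    my $mean = sum(@{$data}) / $n;
    my @dev  = map { $_ - $mean } @{$data};
    my $sp   = 0;
    $sp += $dev[$_] * $dev[$_ + $lag] for 0 .. $n - $lag - 1;
    return $sp / sum(map { $_**2 } @dev);
}

# Hypothetical helper: coefficients for lags 0 .. nlags - 1 (all lags by default).
# List context: (\@lags, \@coeffs); scalar context: { lag => coefficient }.
sub correlogram_sketch {
    my ($data, $nlags) = @_;
    my $n = scalar @{$data};
    $nlags = $n if !defined $nlags || $nlags > $n;
    my @lags   = (0 .. $nlags - 1);
    my @coeffs = map { acf_sketch($data, $_) } @lags;
    return (\@lags, \@coeffs) if wantarray;
    my %by_lag;
    @by_lag{@lags} = @coeffs;
    return \%by_lag;
}

my $href = correlogram_sketch([1, 2, 3, 4, 5], 3);
printf "lag %d => %.2f\n", $_, $href->{$_} for sort { $a <=> $b } keys %{$href};
```

Returning parallel arrays in list context keeps the lags in order for plotting, while the hash-reference form suits lookups by lag.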

correlogram_chart

ctest_bartlett

 $bool = $acorr->ctest_bartlett(lag => integer, tails => 1|2); # assuming data are loaded, or see above for alternative and extra options
 ($crit, $coeff, $bool) = $acorr->ctest_bartlett(lag => integer, tails => 1|2);

Performs a 95% confidence test of the null hypothesis of no autocorrelation, assuming that the series was generated by a Gaussian white noise process. Following Bartlett (1946), it compares the value of a single correlation coefficient for a given lag with the critical values given tails => 2 (default) or 1:

\[ r_{k,.95} = \frac{s}{N^{1/2}} \]

where s is a constant equalling 1.96 for a two-tailed, or 1.645 for a one-tailed test. If the absolute value of the sample correlation coefficient falls beyond this critical value, the null hypothesis is rejected at the 95% level.

Returns, if called in array context, a list comprising the critical value, the sample coefficient, and a boolean as to whether the null hypothesis is rejected; otherwise, just the latter boolean.

Accepts all the options as given for coefficient. Note that the critical value is not calculated with respect to the particular value of lag - see ctest_anderson for this.
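The comparison described above can be sketched as follows, taking an already computed coefficient and the number of observations as input. The function name ctest_bartlett_sketch and its positional arguments are hypothetical; this is not the module's method, only the arithmetic of the critical value and the decision.

```perl
use strict;
use warnings;

# Hypothetical helper: Bartlett critical value s / sqrt(N) and the decision.
# List context: (critical value, coefficient, reject?); scalar context: reject?
sub ctest_bartlett_sketch {
    my ($coeff, $n, $tails) = @_;
    $tails = 2 unless defined $tails;
    my $s      = $tails == 1 ? 1.645 : 1.96;
    my $crit   = $s / sqrt($n);
    my $reject = abs($coeff) > $crit ? 1 : 0;
    return wantarray ? ($crit, $coeff, $reject) : $reject;
}

my ($crit, $coeff, $reject) = ctest_bartlett_sketch(0.25, 100);
printf "critical value %.3f, coefficient %.2f, reject null: %d\n",
    $crit, $coeff, $reject;
```

With N = 100 the two-tailed critical value is 1.96 / 10 = 0.196, so a coefficient of 0.25 leads to rejection while one of 0.1 does not.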

ctest_anderson

 $bool = $acorr->ctest_anderson(lag => integer, tails => 1|2); # assuming data are loaded, or see above for alternative and extra options
 ($crit, $coeff, $bool) = $acorr->ctest_anderson(lag => integer, tails => 1|2);

Performs a 95% confidence test of the null hypothesis of no autocorrelation, assuming that the series was generated by a Gaussian white noise process. Following Anderson (1941), it compares the value of a single correlation coefficient for a given lag with the critical values given tails => 2 (default) or 1:

\[ r_{k,.95}\ \text{(2-tailed)} = \frac{-1 \pm 1.96\,(N - k - 1)^{1/2}}{N - k} \]

\[ r_{k,.95}\ \text{(1-tailed)} = \frac{-1 + 1.645\,(N - k - 1)^{1/2}}{N - k} \]

If the sample correlation coefficient falls outside these bounds, the null hypothesis is rejected at the 95% level.

Returns, if called in array context, a list comprising the critical value, the sample coefficient, and a boolean as to whether the null hypothesis is rejected; otherwise, just the latter boolean.

Accepts all the options as given for coefficient. Note that the critical value is calculated with respect to the particular value of lag - unlike ctest_bartlett.
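The lag-dependent bounds can be worked through with a short pure-Perl sketch. The function names are hypothetical, and returning a (lower, upper) pair, with no lower bound for the one-tailed case, is a choice made for this illustration rather than the module's return format.

```perl
use strict;
use warnings;

# Hypothetical helper: Anderson bounds (-1 +/- s * sqrt(N - k - 1)) / (N - k).
# Returns (lower, upper); lower is undef for a one-tailed test.
sub anderson_bounds_sketch {
    my ($n, $lag, $tails) = @_;
    $tails = 2 unless defined $tails;
    my $m    = $n - $lag;
    my $root = sqrt($m - 1);
    return (undef, (-1 + 1.645 * $root) / $m) if $tails == 1;
    return ((-1 - 1.96 * $root) / $m, (-1 + 1.96 * $root) / $m);
}

# Hypothetical helper: 1 if the coefficient falls outside the bounds, else 0.
sub ctest_anderson_sketch {
    my ($coeff, $n, $lag, $tails) = @_;
    my ($lower, $upper) = anderson_bounds_sketch($n, $lag, $tails);
    return 1 if $coeff > $upper;
    return 1 if defined $lower && $coeff < $lower;
    return 0;
}

my ($lower, $upper) = anderson_bounds_sketch(27, 1);
printf "N = 27, lag 1: bounds %.4f to %.4f\n", $lower, $upper;
```

With N = 27 and a lag of 1, N - k = 26 and the square root is exactly 5, so the two-tailed bounds are -10.8 / 26 and 8.8 / 26. The bounds are not symmetric about zero, which is why the lower one is wider.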

ztest_bartlett

 $p_value = $acorr->ztest_bartlett(lag => integer, tails => 1|2); # assuming data are loaded, or see above for alternative and extra options
 ($z_value, $p_value) = $acorr->ztest_bartlett(lag => integer, tails => 1|2);

Returns the 2- or 1-tailed probability, given tails => 2 (default) or 1, respectively, for the deviation of the observed autocorrelation coefficient at the given lag from the expected value of zero, relative to the variance 1 / N, assuming that the series was generated by a Gaussian white noise process. If called in array context, returns both the actual Z-value and then the p-value. Other options, and methods of assigning the data to test, are as for coefficient.
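The Z-value described here is the coefficient divided by the square root of 1 / N. The sketch below shows that arithmetic in pure Perl under several assumptions: the function names are hypothetical, the normal tail probability comes from the Abramowitz and Stegun (26.2.17) polynomial approximation rather than whatever routine the module uses, and the one-tailed probability is taken in the direction of the observed coefficient.

```perl
use strict;
use warnings;

# Hypothetical helper: P(Z > z) for z >= 0 under the standard normal,
# by the Abramowitz & Stegun 26.2.17 approximation (abs. error below 1e-7).
sub upper_tail_sketch {
    my ($z) = @_;
    my $pi   = 4 * atan2(1, 1);
    my $t    = 1 / (1 + 0.2316419 * $z);
    my $poly = $t * (0.319381530 + $t * (-0.356563782 + $t * (1.781477937
             + $t * (-1.821255978 + $t * 1.330274429))));
    return exp(-($z**2) / 2) / sqrt(2 * $pi) * $poly;
}

# Hypothetical helper: Z = r_k / sqrt(1 / N) and its p-value.
# List context: (z, p); scalar context: p.
sub ztest_bartlett_sketch {
    my ($coeff, $n, $tails) = @_;
    $tails = 2 unless defined $tails;
    my $z = $coeff / sqrt(1 / $n);
    my $p = upper_tail_sketch(abs($z));
    $p *= 2 if $tails == 2;
    return wantarray ? ($z, $p) : $p;
}

my ($z, $p) = ztest_bartlett_sketch(0.196, 100);
printf "z = %.3f, two-tailed p = %.4f\n", $z, $p;
```

A coefficient of 0.196 from 100 observations gives Z = 1.96, which sits at the two-tailed 5% boundary, in agreement with the critical value used by ctest_bartlett.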

qtest, boxpierce

 $p_value = $acorr->qtest(nlags => integer); # assuming data are loaded, or see above for alternative and extra options
 ($q_value, $df, $p_value) = $acorr->qtest(nlags => integer);

Returns the Q statistic for testing whether a range of autocorrelation coefficients differs from zero, and so whether the series was produced by a random process (Box & Pierce, 1970). If called in array context, returns a list giving the value of Q, and, assuming a chi-square distribution, its degrees of freedom (= nlags) and p-value; returns the p-value only if called in scalar context. Other options, and methods of assigning the data to test, are as for coefficient. The range is (by default) over all possible lags from 1 to N - 1. The statistic is defined as follows:

\[ Q = N \sum_{k=1}^{M} \rho_k^2 \]

where M is the largest lag-value to test (= nlags).
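The statistic itself is simple to compute by hand, as this pure-Perl sketch suggests. The names qtest_sketch and acf_sketch are hypothetical, the coefficients use the default whole-mean convention, and the chi-square p-value is left out because it needs a distribution routine that core Perl does not provide.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: whole-mean autocorrelation coefficient at one lag.
sub acf_sketch {
    my ($data, $lag) = @_;
    my $n    = scalar @{$data};
    my $mean = sum(@{$data}) / $n;
    my @dev  = map { $_ - $mean } @{$data};
    my $sp   = 0;
    $sp += $dev[$_] * $dev[$_ + $lag] for 0 .. $n - $lag - 1;
    return $sp / sum(map { $_**2 } @dev);
}

# Hypothetical helper: Q = N * sum of squared coefficients for lags 1 .. M,
# returned with its degrees of freedom (M).
sub qtest_sketch {
    my ($data, $nlags) = @_;
    my $n = scalar @{$data};
    $nlags = $n - 1 unless defined $nlags;    # default: all lags 1 .. N - 1
    my $q = 0;
    $q += acf_sketch($data, $_)**2 for 1 .. $nlags;
    return ($n * $q, $nlags);
}

my ($q, $df) = qtest_sketch([1, 2, 3, 4, 5], 2);
printf "Q = %.2f on %d degrees of freedom\n", $q, $df;
```

For the series 1 to 5 with M = 2, the coefficients are 0.4 and -0.1, so Q = 5 * (0.16 + 0.01) = 0.85.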

REFERENCES

Anderson, R.L. (1941). Distribution of the serial correlation coefficients. Annals of Mathematical Statistics, 8, 1-13.

Bartlett, M.S. (1946). On the theoretical specification of sampling properties of autocorrelated time series. Journal of the Royal Statistical Society, 27.

Box, G.E., & Jenkins, G. (1976). Time series analysis: Forecasting and control. San Francisco, US: Holden-Day.

Box, G.E., & Pierce, D. (1970). Distribution of residual autocorrelations in ARIMA time series models. Journal of the American Statistical Association, 65, 1509-1526.

Chatfield, C. (1975). The analysis of time series: Theory and practice. London, UK: Chapman and Hall.

Kendall, M. G. (1973). Time-series. London, UK: Griffin.

SEE ALSO

Statistics::SerialCorrelation (at cpan). Returns a single autocorrelation coefficient which, with the present module, would be given by coefficient given lag => 1, circular => 1 (and the defaults exact => 0, unbias => 0).

AUTHOR

Roderick Garton, <rgarton at cpan.org>

BUGS/REQS

Report to bug-statistics-autocorrelation-0.02 at rt.cpan.org or http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-Autocorrelation-0.02.

SUPPORT

Find documentation for this module with the perldoc command:

    perldoc Statistics::Autocorrelation

LICENSE AND COPYRIGHT

Copyright 2011-2013 Roderick Garton.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.
