Module Version: 0.50 (latest release: Statistics-Regression-0.53)

NAME

  Regression.pm - weighted linear regression package (line+plane fitting)

SYNOPSIS

  use Statistics::Regression;

  # Create regression object
  my $reg = Statistics::Regression->new( 
    3, "sample regression", 
    [ "const", "someX", "someY" ] 
  );

  # Add data points
  $reg->include( 2.0, [ 1.0, 3.0, -1.0 ] );
  $reg->include( 1.0, [ 1.0, 5.0, 2.0 ] );
  $reg->include( 20.0, [ 1.0, 31.0, 0.0 ] );
  $reg->include( 15.0, [ 1.0, 11.0, 2.0 ] );

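Please note that you must provide the constant yourself: it is the leading 1.0 in each x vector above, and it is what lets the regression fit an intercept.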

  # Print the result
  $reg->print();

This prints the following:

  ****************************************************************
  Regression 'sample regression'
  ****************************************************************
  Name                         Theta          StdErr     T-stat
  [0='const']                 0.2950          6.0512       0.05
  [1='someX']                 0.6723          0.3278       2.05
  [2='someY']                 1.0688          2.7954       0.38

  R^2= 0.808, N= 4
  ****************************************************************

Alternatively, call the accessor methods and format the results yourself:

  my @theta  = $reg->theta();
  my @se     = $reg->standarderrors();
  my $rsq    = $reg->rsq();
  my $adjrsq = $reg->adjrsq();
  my $ybar   = $reg->ybar();  ## the average of the y vector
  my $sst    = $reg->sst();  ## the sum-squares-total
  my $sigmasq= $reg->sigmasq();  ## the variance of the residual
  my $k      = $reg->k();   ## the number of variables
  my $n      = $reg->n();   ## the number of observations
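The T-stat column in the printed report is consistent with each coefficient divided by its standard error (for example, 0.6723 / 0.3278 is about 2.05). A minimal sketch of that relationship, assuming you already hold the arrays returned by theta() and standarderrors(); the helper `t_stats` is hypothetical and not part of the module:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Regression): the t-statistic
# of each coefficient, computed as estimate / standard error.
sub t_stats {
    my ( $theta, $se ) = @_;    # array refs of equal length
    die "length mismatch\n" unless @$theta == @$se;
    return map { $theta->[$_] / $se->[$_] } 0 .. $#$theta;
}

# Values copied from the printed report above.
my @theta = ( 0.2950, 0.6723, 1.0688 );
my @se    = ( 6.0512, 0.3278, 2.7954 );
printf "%.2f\n", $_ for t_stats( \@theta, \@se );    # prints 0.05, 2.05, 0.38
```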

In addition, the module provides some other helper routines, including linearcombination_variance(). If you don't know what that routine computes, don't use it.
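For background only: assuming, as its name suggests, that linearcombination_variance() returns the estimated variance of a linear combination a'theta of the coefficients, the textbook quantity is the quadratic form a' V a, where V is the estimated coefficient covariance matrix. The sketch below computes that form with a hypothetical helper `lincomb_var`; it does not reproduce the module's own signature, which is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper: Var(a'theta) = a' V a, where V is the coefficient
# covariance matrix (array of array refs) and $w holds the weights a.
sub lincomb_var {
    my ( $w, $V ) = @_;
    my $var = 0;
    for my $i ( 0 .. $#$w ) {
        for my $j ( 0 .. $#$w ) {
            $var += $w->[$i] * $V->[$i][$j] * $w->[$j];
        }
    }
    return $var;
}

# Example: Var(theta0 - theta1) = V00 + V11 - 2*V01 = 4 + 9 - 2 = 11.
my @V = ( [ 4.0, 1.0 ], [ 1.0, 9.0 ] );
print lincomb_var( [ 1, -1 ], \@V ), "\n";    # prints 11
```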

BACKGROUND WARNING

You should understand OLS regression before using this package. You can get this background from an introductory college econometrics class or from most intermediate college statistics classes. Without it, this package will remain a mystery to you. There is no support for this package; please don't expect any.

DESCRIPTION

Regression.pm is a multivariate linear regression package. That is, it estimates the c coefficients for a line-fit of the type

  y= c(0)*x(0) + c(1)*x(1) + c(2)*x(2) + ... + c(k-1)*x(k-1)

given a data set of N observations, each with k independent x variables and one y variable. N must be greater than k, and preferably much greater. Any reasonable undergraduate statistics book will explain what a regression is. Most of the time, the user will provide a constant ('1') as x(0) for each observation so that the regression can fit an intercept.
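To make the model concrete: once the coefficients are estimated, a fitted value is the dot product of the coefficient vector with an observation's x vector, constant included. A minimal sketch using a hypothetical helper `predict` (not a module method) and the Theta values printed in the SYNOPSIS:

```perl
use strict;
use warnings;

# Hypothetical helper: yhat = c(0)*x(0) + c(1)*x(1) + ... + c(k-1)*x(k-1).
sub predict {
    my ( $c, $x ) = @_;
    die "need one coefficient per x variable\n" unless @$c == @$x;
    my $yhat = 0;
    $yhat += $c->[$_] * $x->[$_] for 0 .. $#$c;
    return $yhat;
}

# x(0) = 1 is the user-supplied constant that gives the fit an intercept.
my @c = ( 0.2950, 0.6723, 1.0688 );
printf "%.4f\n", predict( \@c, [ 1.0, 31.0, 0.0 ] );    # prints 21.1363 (observed y was 20.0)
```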

ALGORITHM

Original Algorithm (ALGOL-60):

        W. M. Gentleman, University of Waterloo, "Basic
        Description For Large, Sparse Or Weighted Linear Least
        Squares Problems (Algorithm AS 75)," Applied Statistics
        (1974) Vol 23; No. 3

Gentleman's algorithm is a standard method in statistical computing. Observations can be added one at a time, at any point, each with its own weight, and each insertion takes only O(k^2) time, where k is the number of independent variables. Storage is likewise O(k^2) and does not grow with the number of observations, so a practically unlimited number of observations can be processed.
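The insertion step can be sketched in Perl. This is a minimal, self-contained illustration of the square-root-free Givens update that AS 75 describes, not Regression.pm's own source; the subroutine names (new_state, gentleman_include, solve_theta) and the hash layout are hypothetical. It assumes positive weights and a full-rank design, so that every diagonal entry of D is nonzero before solve_theta is called.

```perl
use strict;
use warnings;

# Hypothetical state: d (diagonal of D), rbar (strict upper triangle of the
# unit upper-triangular R), thetabar, and the running error sum of squares.
sub new_state {
    my ($k) = @_;
    return {
        k        => $k,
        d        => [ (0) x $k ],
        rbar     => [ map { [ (0) x $k ] } 1 .. $k ],
        thetabar => [ (0) x $k ],
        sserr    => 0,
    };
}

# Fold one weighted observation (y, x) into the factorization in O(k^2)
# time; the observation itself is not stored.
sub gentleman_include {
    my ( $s, $y, $xref, $w ) = @_;
    $w //= 1;                  # default weight 1
    my $k = $s->{k};
    die "expected $k x values\n" unless @$xref == $k;
    my @x = @$xref;            # work on a copy; the caller's array is untouched
    for my $i ( 0 .. $k - 1 ) {
        return if $w == 0;     # observation fully absorbed
        my $xi = $x[$i];
        next if $xi == 0;      # nothing to rotate in this column
        my $di   = $s->{d}[$i];
        my $dpi  = $di + $w * $xi * $xi;
        my $cbar = $di / $dpi;
        my $sbar = $w * $xi / $dpi;
        $w = $cbar * $w;
        $s->{d}[$i] = $dpi;
        for my $j ( $i + 1 .. $k - 1 ) {
            my $xj = $x[$j];
            $x[$j] = $xj - $xi * $s->{rbar}[$i][$j];
            $s->{rbar}[$i][$j] = $cbar * $s->{rbar}[$i][$j] + $sbar * $xj;
        }
        my $yold = $y;
        $y = $yold - $xi * $s->{thetabar}[$i];
        $s->{thetabar}[$i] = $cbar * $s->{thetabar}[$i] + $sbar * $yold;
    }
    $s->{sserr} += $w * $y * $y;    # remaining weighted residual contribution
    return;
}

# Back-substitution: solve R theta = thetabar (R has 1's on its diagonal).
sub solve_theta {
    my ($s) = @_;
    my $k = $s->{k};
    my @theta = (0) x $k;
    for my $i ( reverse 0 .. $k - 1 ) {
        my $t = $s->{thetabar}[$i];
        $t -= $s->{rbar}[$i][$_] * $theta[$_] for $i + 1 .. $k - 1;
        $theta[$i] = $t;
    }
    return @theta;
}

# The four SYNOPSIS observations, each with weight 1.
my $s = new_state(3);
gentleman_include( $s, 2.0,  [ 1.0, 3.0,  -1.0 ] );
gentleman_include( $s, 1.0,  [ 1.0, 5.0,  2.0 ] );
gentleman_include( $s, 20.0, [ 1.0, 31.0, 0.0 ] );
gentleman_include( $s, 15.0, [ 1.0, 11.0, 2.0 ] );
printf "%.4f\n", $_ for solve_theta($s);    # 0.2950, 0.6723, 1.0688
```

Each call touches only the k-by-k triangle plus two length-k vectors, which is where the O(k^2) time and storage come from. With the four SYNOPSIS observations, the printed coefficients agree with the Theta column of the report above.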

Internal Data Structures

R (also written Rbar) is an upper-right triangular matrix, kept in normalized form with implicit 1's on the diagonal. D is a diagonal scaling matrix. Together they correspond to the standard regression quantities via

                X' X  = R' D R

A back-substitution routine (in thetacov) inverts the R matrix; the inverse is also upper-right triangular, with 1's on the diagonal. Call this matrix H, that is, H = R^(-1).

          (X' X)^(-1) = [ (R' D^(1/2)') (D^(1/2) R) ]^(-1)
                      = [ R^(-1) D^(-1/2) ] [ R^(-1) D^(-1/2) ]'
                      = H D^(-1) H'
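A minimal Perl sketch of that computation, using hypothetical subroutine names rather than the module's thetacov: back-substitute for H = R^(-1), then multiply out the last line above, which with H = R^(-1) is H D^(-1) H'. It reads only the strict upper triangle of R, matching the normalized storage with implicit 1's on the diagonal. Under the usual OLS assumptions, scaling the diagonal of the result by the residual variance sigma^2 = SSE/(N-k) and taking square roots would give coefficient standard errors; that step is not shown.

```perl
use strict;
use warnings;

# Invert a unit upper-triangular R by back-substitution. H = R^(-1) is
# unit upper-triangular too; rows are filled from the bottom up.
sub invert_unit_upper {
    my ($R) = @_;
    my $k = @$R;
    my @H = map { [ (0) x $k ] } 1 .. $k;
    for my $i ( reverse 0 .. $k - 1 ) {
        $H[$i][$i] = 1;
        for my $j ( $i + 1 .. $k - 1 ) {
            my $sum = 0;
            $sum += $R->[$i][$_] * $H[$_][$j] for $i + 1 .. $j;
            $H[$i][$j] = -$sum;
        }
    }
    return \@H;
}

# (X'X)^(-1) = H D^(-1) H', with D passed as its diagonal entries.
sub xtx_inverse {
    my ( $R, $d ) = @_;
    my $H = invert_unit_upper($R);
    my $k = @$d;
    my @V;
    for my $p ( 0 .. $k - 1 ) {
        for my $q ( 0 .. $k - 1 ) {
            my $sum = 0;
            $sum += $H->[$p][$_] * $H->[$q][$_] / $d->[$_] for 0 .. $k - 1;
            $V[$p][$q] = $sum;
        }
    }
    return \@V;
}

# Check: R = [[1,2],[0,1]], D = diag(4,1) give X'X = R'DR = [[4,8],[8,17]],
# whose inverse is [[4.25,-2],[-2,1]].
my $V = xtx_inverse( [ [ 1, 2 ], [ 0, 1 ] ], [ 4, 1 ] );
printf "%g %g\n%g %g\n", $V->[0][0], $V->[0][1], $V->[1][0], $V->[1][1];
```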

BUGS/PROBLEMS

None known.

Perl Problem

Unfortunately, Perl has no built-in, portable test for IEEE special values such as NaN. This makes it awkward to check whether an observation contains missing values (coded as 'NaN' in Regression.pm).
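One common workaround, offered as a sketch rather than as what Regression.pm does internally: on platforms with IEEE doubles, NaN is the only value that is not equal to itself (perlop states that NaN != NaN is true where NaNs are supported), so `$v != $v` flags it. The helper names `is_nan` and `has_missing` are hypothetical; an observation for which has_missing returns true would simply be skipped before inclusion.

```perl
use strict;
use warnings;

# NaN compares unequal to everything, itself included (assumes IEEE doubles).
sub is_nan { my ($v) = @_; return $v != $v; }

# Hypothetical helper: true if y or any x value is missing (NaN).
sub has_missing {
    my ( $y, $x ) = @_;
    return scalar grep { is_nan($_) } $y, @$x;
}

my $inf = 9**9**9;        # overflows to +Inf
my $nan = $inf - $inf;    # Inf - Inf is NaN
print has_missing( 2.0, [ 1.0, $nan, -1.0 ] ) ? "skip\n" : "include\n";    # prints "skip"
print has_missing( 2.0, [ 1.0, 3.0,  -1.0 ] ) ? "skip\n" : "include\n";    # prints "include"
```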

VERSION

0.50, 2007/04/04

AUTHOR

Gentleman invented this algorithm; Ivo Welch adapted it. Alan Miller (alan@dmsmelb.mel.dms.CSIRO.AU) pointed out nicer ways to compute the R^2. Ivan Tubert-Brohman helped wrap the module as a standard CPAN distribution.

LICENSE

This module is released for free public use under a GPL license.

(C) Ivo Welch, 2001, 2004, 2007.

RECENT CHANGES

2007/04/04: Added Coefficient Standard Errors