
NAME

Statistics::MVA::BayesianDiscrimination - Two-Sample Linear Discrimination Analysis with Posterior Probability Calculation.

VERSION

This document describes Statistics::MVA::BayesianDiscrimination version 0.0.2

DESCRIPTION

Discriminant analysis is a procedure for classifying a set of observations, each with k variables, into predefined classes so that the class of a new observation can be determined from the values of its k variables. Group membership is predicted from linear combinations of the variables. From the set of observations where group membership is known, the procedure constructs a set of linear functions, termed discriminant functions, such that:

    L = B[0] + B[1] * x_1 + B[2] * x_2 + ... + B[n] * x_n

where B[0] is a constant, B[1] to B[n] are the discriminant coefficients and the x's are the input variables. There is one discriminant function for each group; as this module currently analyses data for only two groups, it generates two such functions.
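As a minimal sketch of how one such function is evaluated for a single observation, the following uses a hypothetical helper, linear_score, which is not part of this module. The constant and coefficients are made-up values chosen only to keep the arithmetic easy to follow.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of Statistics::MVA::BayesianDiscrimination):
# evaluates L = B[0] + B[1]*x_1 + ... + B[n]*x_n for one observation.
sub linear_score {
    my ($constant, $coeffs, $obs) = @_;
    die "coefficient/variable count mismatch\n" unless @{$coeffs} == @{$obs};
    # The leading 0 keeps sum() defined even for an empty coefficient list.
    return $constant + sum( 0, map { $coeffs->[$_] * $obs->[$_] } 0 .. $#{$obs} );
}

# Made-up constant and coefficients, purely for illustration.
my $score = linear_score( 1.5, [ 0.5, -2, 0.25 ], [ 4, 1, 8 ] );
print "L = $score\n";    # prints "L = 3.5"
```

With two groups, the same calculation is simply carried out twice, once with each group's constant and coefficients.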

Before proceeding with the analysis you should: (1) Perform Bartlett's test to see if the covariance matrices of the data are homogeneous for the populations used (see Statistics::MVA::Bartlett). If they are not homogeneous you should use quadratic discrimination analysis. (2) Test for equality of the group means using Hotelling's T^2 (see Statistics::MVA::HotellingTwoSample) or MANOVA. If the groups do not differ significantly it is extremely unlikely that discrimination analysis will generate any useful discrimination rules. (3) Specify the prior probabilities. This module allows you to do this in several ways - see "priors".

This class automatically generates the discrimination coefficients as part of object construction. You can then either use the output method to access these values or use the discriminate method to apply the equations to a new observation. Both of these methods are context dependent - see "METHODS". See http://en.wikipedia.org/wiki/Linear_discriminant_analysis for further details.

SYNOPSIS

    # we have two groups of data each with 3 variables and 10 observations - example data from http://www.stat.psu.edu/online/courses/stat505/data/insect.txt
    my $data_X = [
        [qw/ 191 131 53/],
        [qw/ 185 134 50/],
        [qw/ 200 137 52/],
        [qw/ 173 127 50/],
        [qw/ 171 128 49/],
        [qw/ 160 118 47/],
        [qw/ 188 134 54/],
        [qw/ 186 129 51/],
        [qw/ 174 131 52/],
        [qw/ 163 115 47/],
    ];

    my $data_Y = [
        [qw/ 186 107 49/],
        [qw/ 211 122 49/],
        [qw/ 201 144 47/],
        [qw/ 242 131 54/],
        [qw/ 184 108 43/],
        [qw/ 211 118 51/],
        [qw/ 217 122 49/],
        [qw/ 223 127 51/],
        [qw/ 208 125 50/],
        [qw/ 199 124 46/],
    ];
    
    use Statistics::MVA::BayesianDiscrimination;

    # Pass the data as a list of the two LISTS-of-LISTS above (termed X and Y). The module by default assumes equal prior probabilities.
    my $bld = Statistics::MVA::BayesianDiscrimination->new($data_X,$data_Y);

    # Alternatively, pass the data and tell the module to calculate the prior probabilities as the ratio of observations for the two groups (e.g. P(X) = X_obs_num / Total_obs).
    #my $bld = Statistics::MVA::BayesianDiscrimination->new({priors => 1 },$data_X,$data_Y);

    # Alternatively, pass the data and directly specify the prior probabilities for X and Y as an anonymous array.
    #my $bld = Statistics::MVA::BayesianDiscrimination->new({priors => [ 0.25, 0.75 ] },$data_X,$data_Y);

    # Print values for coefficients to STDOUT.
    $bld->output;

    # Call in LIST context to obtain the values directly; the coefficients are returned as ARRAY references - see "output".
    my ($prior_x, $constant_x, $matrix_x, $prior_y, $constant_y, $matrix_y) = $bld->output;
   
    # Perform discrimination analysis for a specific observation and print the result to STDOUT.
    $bld->discriminate([qw/184 114 59/]);

    # Call in LIST context to obtain results directly - see "discriminate".
    my ($val_x, $p_x, $post_p_x, $val_y, $p_y, $post_p_y, $type) = $bld->discriminate([qw/194 124 49/]);

METHODS

new

Creates a new Statistics::MVA::BayesianDiscrimination object. It accepts two references to Lists-of-Lists of values corresponding to the two groups of data - termed X and Y. Within each List-of-Lists each nested array corresponds to a single observation, with one value per variable. It also accepts an optional HASH reference of options preceding these values. The constructor automatically generates the discrimination coefficients that are accessed using the output method.

    # Pass data as ARRAY references.
    my $bld = Statistics::MVA::BayesianDiscrimination->new($data_X,$data_Y);

    # Passing optional HASH reference of options.
    my $bld = Statistics::MVA::BayesianDiscrimination->new({priors => 1 },$data_X,$data_Y);
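Since each nested array is expected to hold one observation, it can be worth checking the shape of both data-sets before constructing the object. The following is a sketch only: variable_count is a hypothetical helper, not part of this module, and whether the constructor performs similar checks itself is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): returns the number of
# variables per observation, or dies if the List-of-Lists is empty or ragged.
sub variable_count {
    my ($name, $lol) = @_;
    die "$name must be a non-empty ARRAY reference\n"
        unless ref($lol) eq 'ARRAY' && @{$lol};
    my $k;
    for my $i ( 0 .. $#{$lol} ) {
        my $row = $lol->[$i];
        die "$name row $i is not an ARRAY reference\n" unless ref($row) eq 'ARRAY';
        $k = scalar @{$row} unless defined $k;    # first row sets the expected width
        die "$name row $i has " . scalar( @{$row} ) . " values, expected $k\n"
            unless @{$row} == $k;
    }
    return $k;
}

# Two observations per group, taken from the SYNOPSIS data.
my $data_X = [ [qw/191 131 53/], [qw/185 134 50/] ];
my $data_Y = [ [qw/186 107 49/], [qw/211 122 49/] ];

my $k_x = variable_count( X => $data_X );
my $k_y = variable_count( Y => $data_Y );
die "X and Y differ in variable count\n" unless $k_x == $k_y;
print "Both groups have $k_x variables per observation\n";
```

The two groups may contain different numbers of observations, but they need the same number of variables for the discriminant functions to be comparable.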
   

output

Context-dependent method for accessing results of discrimination analysis. In void context it prints the coefficients to STDOUT.

    $bld->output;

In LIST-context it returns a list of the relevant data accessed as follows:

    my ($prior_x, $constant_x, $matrix_x, $prior_y, $constant_y, $matrix_y) = $bld->output;

    print qq{\nPrior probability of X = $prior_x and Y = $prior_y.}; 
    print qq{\nConstants for discrimination function for X = $constant_x and Y = $constant_y.};
    print qq{\nCoefficients for discrimination function X = @{$matrix_x}.};
    print qq{\nCoefficients for discrimination function Y = @{$matrix_y}.};

discriminate

Method for classification of a new observation. Pass it an ARRAY reference of SCALAR values appropriate for the original data-sets passed to the constructor. In void context it prints a report to STDOUT:

    $bld->discriminate([qw/123 34 325/]);

In LIST-context it returns a list of the relevant data as follows:

    my ($val_x, $p_x, $post_p_x, $val_y, $p_y, $post_p_y, $type) = $bld->discriminate([qw/123 34 325/]);

    print qq{\nLinear score function for X = $val_x and Y = $val_y - the new observation is of type \x27$type\x27.};
    print qq{\nThe prior probability that the new observation is of type X = $p_x and the posterior probability = $post_p_x};
    print qq{\nThe prior probability that the new observation is of type Y = $p_y and the posterior probability = $post_p_y};
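For readers who want to see how a posterior probability can follow from two linear scores and two priors, here is a sketch based on the standard two-group form of Bayes' rule. It assumes that the scores passed in do not already include the log of the prior; whether this module's score values follow that convention is not confirmed here. The helper posteriors is hypothetical and not part of the module.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not part of the module). Assumes each score excludes
# the log prior and that both priors are greater than zero.
sub posteriors {
    my ($score_x, $prior_x, $score_y, $prior_y) = @_;
    my $log_x = $score_x + log($prior_x);
    my $log_y = $score_y + log($prior_y);
    # Subtract the larger value before exponentiating to avoid overflow.
    my $shift = max( $log_x, $log_y );
    my $w_x   = exp( $log_x - $shift );
    my $w_y   = exp( $log_y - $shift );
    my $total = $w_x + $w_y;
    return ( $w_x / $total, $w_y / $total );
}

# Equal scores and equal priors give equal posteriors.
my ($post_x, $post_y) = posteriors( 2.0, 0.5, 2.0, 0.5 );
printf "posterior X = %.3f, posterior Y = %.3f\n", $post_x, $post_y;
```

Under this formulation the observation is assigned to whichever group has the larger posterior, and when the two scores are equal the posteriors reduce to the priors.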

OPTIONS

priors

Pass within an anonymous HASH preceding the two data references during object construction:

    my $bld = Statistics::MVA::BayesianDiscrimination->new({priors => option_value },$data_X,$data_Y);

Passing '0' causes the module to assume equal prior probabilities for the two groups (prior_x = prior_y = 0.5). Passing '1' causes the module to generate priors from the relative sizes of the two data-sets, e.g. if X has 15 observations and Y has 27 observations then prior_x = 15 / (15 + 27). Alternatively you may specify the values to use by passing an anonymous ARRAY reference of length 2 where the first value is prior_x and the second is prior_y. There are currently no checks on priors passed directly, so ensure that prior_x + prior_y = 1 if you supply your own.

    # Use prior_x = prior_y = 0.5.
    my $bld = Statistics::MVA::BayesianDiscrimination->new({priors => 0 },$data_X,$data_Y);

    # Generate priors depending on ratios of observation numbers.
    my $bld = Statistics::MVA::BayesianDiscrimination->new({priors => 1 },$data_X,$data_Y);

    # Specify your own priors.
    my $bld = Statistics::MVA::BayesianDiscrimination->new({priors => [$prior_x, $prior_y] },$data_X,$data_Y);
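The three forms of the priors option described above can be sketched as a single function. This is an illustration of the documented behaviour, not the module's own code: resolve_priors is a hypothetical name, and the sum-to-one check is an addition that the module itself does not perform.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module) mirroring the documented
# meaning of the priors option. $n_x and $n_y are the observation counts.
sub resolve_priors {
    my ($option, $n_x, $n_y) = @_;

    # Explicit priors: [ prior_x, prior_y ].
    if ( ref($option) eq 'ARRAY' ) {
        die "priors must contain exactly two values\n" unless @{$option} == 2;
        my ($p_x, $p_y) = @{$option};
        # Extra safety check that the module does not make for you.
        die "priors must sum to 1\n" unless abs( $p_x + $p_y - 1 ) < 1e-9;
        return ( $p_x, $p_y );
    }

    # Option absent or 0: equal priors.
    return ( 0.5, 0.5 ) unless $option;

    # Option 1: priors proportional to the group sizes.
    my $total = $n_x + $n_y;
    return ( $n_x / $total, $n_y / $total );
}

my ($p_x, $p_y) = resolve_priors( 1, 15, 27 );
printf "prior_x = %.4f, prior_y = %.4f\n", $p_x, $p_y;
```

Running the same check on your own values before passing them to the constructor guards against priors that do not sum to 1.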

DEPENDENCIES

'Statistics::MVA' => '0.0.1', 'Carp' => '1.08', 'Math::Cephes' => '0.47', 'List::Util' => '1.19', 'Text::SimpleTable' => '2.0',

BUGS

Let me know.

AUTHOR

Daniel S. T. Hughes <dsth@cantab.net>

LICENCE AND COPYRIGHT

Copyright (c) 2010, Daniel S. T. Hughes <dsth@cantab.net>. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

DISCLAIMER OF WARRANTY

Because this software is licensed free of charge, there is no warranty for the software, to the extent permitted by applicable law. Except when otherwise stated in writing the copyright holders and/or other parties provide the software "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the quality and performance of the software is with you. Should the software prove defective, you assume the cost of all necessary servicing, repair, or correction.

In no event unless required by applicable law or agreed to in writing will any copyright holder, or any other party who may modify and/or redistribute the software as permitted by the above licence, be liable to you for damages, including any general, special, incidental, or consequential damages arising out of the use or inability to use the software (including but not limited to loss of data or data being rendered inaccurate or losses sustained by you or third parties or a failure of the software to operate with any other software), even if such holder or other party has been advised of the possibility of such damages.
