Microarray::DataMatrix - abstraction for matrices of microarray data
Note: This documentation is for developers only. Clients of concrete subclasses of this package should have no need to consult it, as the API for those subclasses should be fully documented as part of those subclasses.
dataMatrix provides an abstract superclass for a collection of abstract classes that deal with matrices. It is only useful and meaningful in the context of those other classes. dataMatrix itself provides protected methods for certain primitive operations that its subclasses can use, and public methods that require its immediate subclasses to share the same underlying structure for their data, such as the record of which rows and columns have not yet been filtered out.
The collection of classes is structured like this:
                      dataMatrix
                       /      \
                  ISA /        \ ISA
                     /          \
          smallDataMatrix    bigDataMatrix
                     \          /
              CanBeA  \        /  CanBeA
                       \      /
                  anySizeDataMatrix
                          |
                          | ISA
                          |
         -------------------------- - - - - -
         |                |                  |
   concreteClassA   concreteClassB     concreteClassX
anySizeDataMatrix provides an abstraction for a dataMatrix whose contents may or may not fit into memory. At construction time, an object dynamically inherits from either smallDataMatrix or bigDataMatrix, each of which knows how to handle a matrix of a particular size. anySizeDataMatrix is itself an abstract class, and is subclassed by concrete classes that handle a particular file type of data and know how to parse it, for example a pclFile. Because dataMatrix, smallDataMatrix, bigDataMatrix and anySizeDataMatrix were developed together as a collection, they are somewhat more intimate with each other than, say, a concrete subclass of anySizeDataMatrix would be with anySizeDataMatrix itself. While the subclasses do stick to the API and respect the privacy of attributes and methods, the API was developed simultaneously with the subclasses that use it, so it may not be the cleanest API in the world.
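The way this module chooses its superclass is not shown in this documentation. The following is a minimal sketch of one way to choose a superclass per object at construction time in Perl, and it is not taken from this distribution. The class names SmallMatrix, BigMatrix, AnySizeMatrix and PclExample, the numRows argument and the $MAX_ROWS_IN_MEMORY threshold are all hypothetical. The constructor builds (once per combination) a derived package whose @ISA lists the calling class first and the chosen size-specific class second, then blesses the object into it.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for smallDataMatrix and bigDataMatrix
package SmallMatrix;
sub storage { return 'memory' }

package BigMatrix;
sub storage { return 'disk' }

# Hypothetical stand-in for anySizeDataMatrix
package AnySizeMatrix;

# Hypothetical threshold above which the disk-backed class is chosen
our $MAX_ROWS_IN_MEMORY = 10_000;

sub new {
    my ($class, %args) = @_;
    my $parent = ($args{numRows} // 0) > $MAX_ROWS_IN_MEMORY ? 'BigMatrix' : 'SmallMatrix';

    # Derived package: the calling class is searched first, then the chosen parent
    my $derived = "${class}::With$parent";
    {
        no strict 'refs';
        @{"${derived}::ISA"} = ($class, $parent) unless @{"${derived}::ISA"};
    }
    return bless {%args}, $derived;
}

# Hypothetical concrete subclass, e.g. a parser for one file type
package PclExample;
our @ISA = ('AnySizeMatrix');

package main;

my $small = PclExample->new(numRows => 500);
my $big   = PclExample->new(numRows => 50_000);
print $small->storage, ' ', $big->storage, "\n";    # prints "memory disk"
```

Because Perl's default method resolution is depth-first, listing the calling class first means methods defined by the concrete class or by AnySizeMatrix take precedence over those of the size-specific parent.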
This collection of classes tries to follow the convention that every attribute key in the object's hash is prefixed with "$PACKAGE::". Private attribute names and private methods are preceded by two underscores; protected attributes and protected methods (which can be accessed by subclasses as well as by $PACKAGE itself) are preceded by a single underscore. Public attributes and methods (which can be accessed anywhere) have no leading underscores. In practice, all object attributes are (and should be) private. Where subclasses or clients need to access or manipulate them, protected and public methods, respectively, are provided for setting or getting the attribute values. Disobey this interface at your peril!
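A minimal sketch of the naming convention, using hypothetical classes BaseExample and SubExample (not part of this distribution). The attribute key is built from __PACKAGE__, so a subclass can reuse the same private attribute name without overwriting its superclass's value in the shared object hash.

```perl
use strict;
use warnings;

package BaseExample;

sub new { my $class = shift; return bless {}, $class }

# Protected setter (single underscore): for use by subclasses only.
# The attribute is private (two underscores) and its key is prefixed
# with the package name, giving "BaseExample::__errstr".
sub _setErrstr {
    my ($self, $error) = @_;
    $self->{__PACKAGE__ . '::__errstr'} = $error;
}

# Public getter (no underscore)
sub errstr { my $self = shift; return $self->{__PACKAGE__ . '::__errstr'} }

package SubExample;
our @ISA = ('BaseExample');

# Reuses the private name "__errstr", but the key is "SubExample::__errstr",
# so the superclass's attribute is left alone.
sub _setLocalErrstr {
    my ($self, $error) = @_;
    $self->{__PACKAGE__ . '::__errstr'} = $error;
}

package main;

my $obj = SubExample->new;
$obj->_setErrstr('file not found');
$obj->_setLocalErrstr('subclass detail');
print $obj->errstr, "\n";    # prints "file not found"
```

Writing the key as the interpolated string "$PACKAGE::__errstr" would not work as intended: Perl parses $PACKAGE::__errstr as a single package variable, so the key has to be built by concatenation.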
This protected method is used to set the autodump flag, which can be either 1 or 0. This should only be utilized by subclasses, not clients.
Usage:
$self->_setAutoDump(1);
This protected setter method receives a reference to a hash whose keys are the indexes of the valid columns in the matrix. This method MUST be called when the matrix is first read, to set up all the columns that are initially valid (in practice, the call occurs in the _init methods of bigDataMatrix and smallDataMatrix, or in methods called by them). The values of the hash will usually be undef, simply to save space, and nothing should expect them to be otherwise.
$self->_setValidColumns(\%validColumns);
This protected setter method receives a reference to a hash whose keys are the indexes of the valid rows in the matrix. This method is expected to be used only when the matrix is first read, to set up all the rows that are initially valid (in practice, the call occurs in the _init methods of bigDataMatrix and smallDataMatrix, or in methods called by them). The values of the hash will usually be undef, simply to save space, and nothing should expect them to be otherwise.
$self->_setValidRows(\%validRows);
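A short sketch of how the hashes passed to _setValidRows and _setValidColumns could be built so that every value is undef. The dimensions are hypothetical, and the sketch assumes indexes start at 0, which this documentation does not specify.

```perl
use strict;
use warnings;

# Hypothetical dimensions found when the file is first read (0-based indexes assumed)
my ($numRows, $numColumns) = (3, 4);

# Assigning an empty list to a hash slice creates every key with an undef value
my (%validRows, %validColumns);
@validRows{0 .. $numRows - 1}       = ();
@validColumns{0 .. $numColumns - 1} = ();

# In the real classes these would then be handed over from an _init method:
#   $self->_setValidRows(\%validRows);
#   $self->_setValidColumns(\%validColumns);

print scalar(keys %validColumns), "\n";    # prints 4
```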
This protected setter method accepts a scalar describing an error that has occurred, and stores it within the object.
$self->_setErrstr($error);
This protected mutator method makes a row invalid. The operation cannot be undone, because invalidation also deletes the row's data. Note that the row index MUST correspond to the index of that row in the original file, not its current position (i.e., if rows 1 and 2 were filtered out, row 3 is still called row 3 when being invalidated, not row 1).
Usage:
$self->_invalidateMatrixRow($row);
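The following sketch illustrates the two rules above under an assumed in-memory layout (rows keyed by their original file index, plus a hash of valid row indexes). The layout and the function invalidateRow are hypothetical; the real storage differs between smallDataMatrix and bigDataMatrix.

```perl
use strict;
use warnings;

# Hypothetical layout: data rows keyed by their ORIGINAL file index
my %data = (
    0 => [1.0, 2.0],
    1 => [3.0, 4.0],
    2 => [5.0, 6.0],
    3 => [7.0, 8.0],
);
my %validRows;
@validRows{keys %data} = ();

# Invalidation removes both the validity entry and the data,
# which is why it cannot be undone.
sub invalidateRow {
    my ($validRef, $dataRef, $row) = @_;
    delete $validRef->{$row};
    delete $dataRef->{$row};
}

invalidateRow(\%validRows, \%data, 1);
invalidateRow(\%validRows, \%data, 2);

# Row 3 keeps its original index even though two earlier rows are gone
print join(',', sort { $a <=> $b } keys %validRows), "\n";    # prints "0,3"
```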
This protected mutator method makes a column invalid. Note that the column index MUST correspond to the index of that column in the original file, not its current position (i.e., if columns 1 and 2 were filtered out, column 3 is still called column 3 when being invalidated, not column 1).
$self->_invalidateMatrixColumn($column);
This protected method returns a boolean to indicate whether autodumping is enabled.
if ($self->_autoDump){ # blah }
This protected accessor returns a reference to an array that contains the indexes of all the valid rows.
foreach my $row (@{$self->_validRowsArrayRef}){ # do something useful }
This protected accessor returns a reference to an array that contains the indexes of all the valid columns.
foreach my $column (@{$self->_validColumnsArrayRef}){ # do something useful }
This protected accessor returns a boolean to indicate whether a given row in the data matrix is still valid (i.e., has not been filtered out). The row index refers to the row's index in the original file used to construct the object.
if ($self->_matrixRowIsValid($row)){ # blah }
This protected accessor returns a boolean to indicate whether a given column in the data matrix is still valid (i.e., has not been filtered out). The column index refers to the column's index in the original file used to construct the object.
if ($self->_matrixColumnIsValid($column)){ # blah }
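Because the valid-index hashes hold undef values, a validity check has to use exists rather than a truth test, and hash keys are strings, so a list of indexes needs a numeric sort. The helpers below are hypothetical free-function sketches of the accessors above; the documentation does not say whether the real array references are sorted, so the sorting is an assumption.

```perl
use strict;
use warnings;

# Hypothetical valid-column hash after columns 1 and 2 were filtered out
my %validColumns = (0 => undef, 3 => undef, 10 => undef);

# A plain truth test on $validColumns{$column} would always be false,
# because the values are undef; exists checks for the key itself.
sub columnIsValid {
    my ($validRef, $column) = @_;
    return exists $validRef->{$column} ? 1 : 0;
}

# Sort numerically to get 0, 3, 10 rather than the string order 0, 10, 3
sub validIndexesArrayRef {
    my ($validRef) = @_;
    return [ sort { $a <=> $b } keys %$validRef ];
}

foreach my $column (@{ validIndexesArrayRef(\%validColumns) }) {
    print "column $column is valid\n" if columnIsValid(\%validColumns, $column);
}
```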
This protected method returns the number of columns to process between progress reports, when verbose reporting has been requested. If no value has been set, the default of 50 is returned.
my $numColumnsToReport = $self->_numColumnsToReport;
This protected method returns the number of rows to process between progress reports, when verbose reporting has been requested. If no value has been set, the default of 5000 is returned.
my $numRowsToReport = $self->_numRowsToReport;
This protected method returns the appropriate line ending for text or HTML reporting. It expects a string, either 'html' or 'text', and returns the corresponding line ending.
my $lineEnding = $self->_lineEnding("text");
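A sketch of how a row-by-row method might combine the reporting interval with the line ending. The actual strings returned by _lineEnding are not documented here, so "<br>\n" for HTML is an assumption, and lineEnding and processRows are hypothetical names. The messages are returned rather than printed so they can be inspected.

```perl
use strict;
use warnings;

# Assumed line endings; the module's real values are not documented here
my %LINE_ENDING = (text => "\n", html => "<br>\n");

sub lineEnding {
    my ($format) = @_;
    return $LINE_ENDING{$format};
}

# Emit a progress message every $numRowsToReport rows (5000 is the documented default)
sub processRows {
    my ($numRows, $numRowsToReport, $format) = @_;
    my $eol = lineEnding($format);
    my @messages;
    for my $processed (1 .. $numRows) {
        # ... filtering or transformation of one row would happen here ...
        push @messages, "$processed rows processed$eol"
            if $processed % $numRowsToReport == 0;
    }
    return @messages;
}

print processRows(12_000, 5000, 'text');    # two progress lines: 5000 and 10000
```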
This protected method returns a boolean to indicate whether a centering method is allowed. Allowed methods are 'mean' and 'median'.
if ($self->_centeringMethodIsAllowed($method)){ # blah }
This protected method returns a boolean to indicate whether a particular operator is allowed. For each allowed operator there is a corresponding method that applies it. Such operators are used when filtering rows by their values, e.g. > or <.
if ($self->_operatorIsAllowed($operator)){ # blah }
This protected method returns the name of the method that is used to compare two values, based on the operator that was passed in.
my $method = $self->_methodForOperator($operator);
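A minimal sketch of the operator-to-method dispatch described above, inside a hypothetical class OperatorExample. The operator set and the private comparison method names (__greaterThan and so on) are assumptions, since the documentation does not list them; valuesPassing is likewise a hypothetical public method.

```perl
use strict;
use warnings;

package OperatorExample;

# Hypothetical operator-to-method table
my %METHOD_FOR_OPERATOR = (
    '>'  => '__greaterThan',
    '>=' => '__greaterThanOrEqual',
    '<'  => '__lessThan',
    '<=' => '__lessThanOrEqual',
);

sub new { my $class = shift; return bless {}, $class }

# Private comparison methods, called as ($self, $value, $threshold)
sub __greaterThan        { return $_[1] >  $_[2] }
sub __greaterThanOrEqual { return $_[1] >= $_[2] }
sub __lessThan           { return $_[1] <  $_[2] }
sub __lessThanOrEqual    { return $_[1] <= $_[2] }

sub allowedOperators   { return sort keys %METHOD_FOR_OPERATOR }
sub _operatorIsAllowed { return exists $METHOD_FOR_OPERATOR{ $_[1] } }
sub _methodForOperator { return $METHOD_FOR_OPERATOR{ $_[1] } }

# Keep the values that satisfy "value OPERATOR threshold"
sub valuesPassing {
    my ($self, $operator, $threshold, @values) = @_;
    die "Operator '$operator' is not allowed\n" unless $self->_operatorIsAllowed($operator);
    my $method = $self->_methodForOperator($operator);    # a method name, called dynamically
    return grep { $self->$method($_, $threshold) } @values;
}

package main;

my $matrix = OperatorExample->new;
my @kept   = $matrix->valuesPassing('>=', 2.5, 1, 2.5, 4);    # (2.5, 4)
my @ops    = $matrix->allowedOperators;                        # ('<', '<=', '>', '>=')
```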
This method returns the average of the valid entries in a row, using either the mean or the median, depending on the requested method. The row is passed in as a reference to an array containing its values. If no mean or median can be calculated, the method returns undef. Only values at the valid column indexes of the passed-in array are used in the calculation.
my $average = $self->_rowAverage(\@row, "mean");
This method calculates either the mean or the median of a set of data. It receives the method ('mean' or 'median'), a reference to an array of all the data values, the sum of those values, and the number of data points. The array (which is not assumed to be sorted) is required to calculate the median; the sum is required to calculate the mean. The number of data points must be non-zero.
my $average = $self->_average("mean", \@data, $total, $numDatapoints);
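A sketch of the two averaging steps as free functions. The argument order of average follows the usage line above; rowAverage takes the valid column indexes as an explicit argument (the module reads them from the object) and skips undefined entries, which is an assumption about how missing values are treated.

```perl
use strict;
use warnings;

# Mean or median; $numDatapoints must be non-zero
sub average {
    my ($method, $dataRef, $total, $numDatapoints) = @_;
    return $total / $numDatapoints if $method eq 'mean';

    # Median: sort a copy numerically, since the input is not assumed sorted
    my @sorted = sort { $a <=> $b } @$dataRef;
    my $mid = int($numDatapoints / 2);
    return $numDatapoints % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

# Average of the defined values at the valid column indexes; undef if there are none
sub rowAverage {
    my ($rowRef, $method, $validColumnsRef) = @_;
    my @values = grep { defined } @{$rowRef}[@$validColumnsRef];
    return undef unless @values;
    my $total = 0;
    $total += $_ for @values;
    return average($method, \@values, $total, scalar @values);
}

my @row = (4, undef, 1, 100, 10);                          # column 3 is not valid
print rowAverage(\@row, 'median', [0, 1, 2, 4]), "\n";    # prints 4
```

With the same row, the mean of the valid values (4, 1 and 10) is 5.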
This protected method takes a reference to a row array and the row's average (either the mean or the median, depending on what was requested), and subtracts that value from every valid value in the row (i.e., the values at the valid column indexes).
$self->_centerRow(\@row, $average);
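A sketch of centering in place. As in the averaging sketch, the valid column indexes are passed explicitly here, and leaving undefined entries untouched is an assumption about missing values; centerRow is a hypothetical name.

```perl
use strict;
use warnings;

# Subtract $average from each defined value at a valid column index, in place
sub centerRow {
    my ($rowRef, $average, $validColumnsRef) = @_;
    foreach my $column (@$validColumnsRef) {
        $rowRef->[$column] -= $average if defined $rowRef->[$column];
    }
}

my @row = (4, undef, 1, 100, 10);
centerRow(\@row, 5, [0, 1, 2, 4]);
# @row is now (-1, undef, -4, 100, 5); column 3 is not valid, so it is unchanged
```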
This method expects references to three hashes: the sums of X, the sums of X squared, and the numbers of data points. In each hash, the keys are unique identifiers for the series of numbers whose means and standard deviations are to be calculated. It returns references to hashes that map the same identifiers to the standard deviations and to the means, in that order. It uses the n-1 (sample) version of the standard deviation. If a standard deviation cannot be calculated, it is stored as undef.
my ($stddevHashRef, $meansHashRef) = $self->_calculateMeansAndStdDeviations(\%sumOfX, \%sumX2, \%numDataPoints);
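The calculation from running sums uses the sample variance formula variance = (sum(x^2) - (sum(x))^2 / n) / (n - 1). The sketch below is a hypothetical free-function version that returns the standard deviations first, matching the usage line. Skipping series with no data points, and clamping small negative variances caused by rounding, are assumptions.

```perl
use strict;
use warnings;

# Per-series mean and sample (n-1) standard deviation from running sums
sub calculateMeansAndStdDeviations {
    my ($sumXRef, $sumX2Ref, $numRef) = @_;
    my (%stddev, %means);
    foreach my $id (keys %$numRef) {
        my $n = $numRef->{$id};
        next unless $n;                      # no data points: skipped (assumption)
        $means{$id} = $sumXRef->{$id} / $n;
        if ($n < 2) {
            $stddev{$id} = undef;            # n-1 would be zero
            next;
        }
        my $variance = ($sumX2Ref->{$id} - $sumXRef->{$id}**2 / $n) / ($n - 1);
        $variance = 0 if $variance < 0;      # guard against rounding error
        $stddev{$id} = sqrt $variance;
    }
    return (\%stddev, \%means);
}

# Series "a" is 2, 4, 6: sum 12, sum of squares 56, n 3, so mean 4 and sd 2
my ($sd, $mean) = calculateMeansAndStdDeviations(
    { a => 12, b => 5 }, { a => 56, b => 25 }, { a => 3, b => 1 },
);
```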
This method receives two hashes by reference, a hash of standard deviations and a hash of means (in that order, as in the usage below), together with a multiplier. It calculates the upper and lower bounds as the mean plus or minus that number of standard deviations, and returns them as hash references, upper bounds first. It also receives the line ending to use when reporting verbosely.
my ($upperHashRef, $lowerHashRef) = $self->_calculateBounds($stddevHashRef, $meansHashRef, $deviations, $lineEnding);
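A sketch of the bounds calculation, upper = mean + k * sd and lower = mean - k * sd, with arguments in the order of the usage line. Skipping identifiers whose standard deviation is undef is an assumption, and unlike the module (which only reports when verbose), this sketch always prints its note.

```perl
use strict;
use warnings;

sub calculateBounds {
    my ($stddevRef, $meansRef, $deviations, $lineEnding) = @_;
    my (%upper, %lower);
    foreach my $id (sort keys %$meansRef) {
        my $sd = $stddevRef->{$id};
        unless (defined $sd) {
            # Reported with the supplied line ending; always printed in this sketch
            print "No standard deviation for $id; skipping$lineEnding";
            next;
        }
        $upper{$id} = $meansRef->{$id} + $deviations * $sd;
        $lower{$id} = $meansRef->{$id} - $deviations * $sd;
    }
    return (\%upper, \%lower);
}

my ($upper, $lower) = calculateBounds({ a => 2, b => undef }, { a => 4, b => 5 }, 1.5, "\n");
# $upper->{a} is 7 and $lower->{a} is 1; "b" gets no bounds
```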
This protected utility method can be used by any subclass that expects its own subclasses to implement certain methods. Such a subclass can define stub methods that simply call this method, which gives a standard error message saying that class 'X' must override method Y.
$self->_giveOverrideMessage();
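A sketch of a stub-plus-message pattern. The class names and the parseFile stub are hypothetical, and dying (rather than warning) is an assumption about how the message is delivered. caller(1) describes the stub that called the helper; element 3 of its result is the stub's fully qualified name.

```perl
use strict;
use warnings;

package AbstractExample;

sub new { my $class = shift; return bless {}, $class }

sub _giveOverrideMessage {
    my $self = shift;
    my $stub = (caller(1))[3];           # e.g. "AbstractExample::parseFile"
    (my $method = $stub) =~ s/.*:://;    # keep only the method name
    die "The class '", ref($self), "' must override the method '$method'.\n";
}

# Stub that concrete subclasses are expected to override
sub parseFile { shift->_giveOverrideMessage }

package IncompleteExample;
our @ISA = ('AbstractExample');          # forgets to implement parseFile

package main;

eval { IncompleteExample->new->parseFile };
print $@;    # The class 'IncompleteExample' must override the method 'parseFile'.
```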
This public method returns a sorted array of all the allowed operators that may be used by methods (in subclasses) that employ the operators for whatever reason (their interface should indicate that they employ such operators).
my @operators = $matrix->allowedOperators;
This method accepts a positive integer indicating how many columns should be processed, during a filtering or transformation method carried out column by column, between progress reports. If a client has not set this value, it defaults to 50.
$matrix->setNumColumnsToReport(50);
This method accepts a positive integer indicating how many rows should be processed, during a filtering or transformation method carried out row by row, between progress reports. If a client has not set this value, it defaults to 5000.
$matrix->setNumRowsToReport(5000);
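A sketch of a public setter paired with a protected getter that falls back to the documented default of 5000. The class ReportingExample is hypothetical, and rejecting anything other than a positive integer is an assumption; the attribute key follows the "$PACKAGE::" convention described earlier.

```perl
use strict;
use warnings;

package ReportingExample;

my $DEFAULT_ROWS_TO_REPORT = 5000;    # documented default

sub new { my $class = shift; return bless {}, $class }

# Public setter: accept only positive integers (validation is an assumption)
sub setNumRowsToReport {
    my ($self, $num) = @_;
    die "numRowsToReport must be a positive integer\n"
        unless defined $num && $num =~ /\A[1-9][0-9]*\z/;
    $self->{__PACKAGE__ . '::__numRowsToReport'} = $num;
}

# Protected getter: fall back to the default when no value has been set
sub _numRowsToReport {
    my $self = shift;
    return $self->{__PACKAGE__ . '::__numRowsToReport'} // $DEFAULT_ROWS_TO_REPORT;
}

package main;

my $matrix = ReportingExample->new;
my $before = $matrix->_numRowsToReport;    # 5000
$matrix->setNumRowsToReport(1000);
my $after  = $matrix->_numRowsToReport;    # 1000
```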
Gavin Sherlock
sherlock@genome.stanford.edu