NAME

RandomJungle::File::DB - Low level access to the data in the RandomJungle DB file

VERSION

Version 0.05

SYNOPSIS

RandomJungle::File::DB provides low-level access to the data stored in a RandomJungle database file, which is created and populated by this module. See RandomJungle::Jungle and RandomJungle::Tree for higher-level methods.

        use RandomJungle::File::DB;

        my $rjdb = RandomJungle::File::DB->new( db_file => $filename ) || die $RandomJungle::File::DB::ERROR;

        # Load data files into the db (all params are optional)
        $rjdb->store_data( xml_file => $file1, oob_file => $file2, raw_file => $file3 ) || warn $rjdb->err_str;

        # Get the filenames for the data that was loaded
        my $db_file  = $rjdb->get_db_filename;
        my $xml_file = $rjdb->get_xml_filename;
        my $oob_file = $rjdb->get_oob_filename;
        my $raw_file = $rjdb->get_raw_filename;

        my $rj_params       = $rjdb->get_rj_params;       # href of input params used when RJ was run
        my $header_labels   = $rjdb->get_header_labels;   # aref (expected: FID IID PAT MAT)
        my $variable_labels = $rjdb->get_variable_labels; # aref (expected: SEX PHENOTYPE var1 ...)
        my $sample_labels   = $rjdb->get_sample_labels;   # aref, from the IID column of the RAW file
        my $tree_ids        = $rjdb->get_tree_ids;        # aref, sorted numerically

        # Returns the line (unsplit, unspliced) from the OOB file for a given sample;
        # specify either label => $sample_label or index => $sample_index (one is required)
        my $oob_line = $rjdb->get_oob_by_sample( label => $sample_label )
                or warn $rjdb->err_str;

        # Returns data for the sample specified by label => $label, where label is the IID from the RAW file
        my $sample_data = $rjdb->get_sample_data( label => $label ) || warn $rjdb->err_str;

        # Returns one href holding a record (not a RandomJungle::Tree object) for each tree ID given
        my $tree_data = $rjdb->get_tree_data( @tree_ids ); # may be big - use with caution

METHODS

new()

Creates and returns a new RandomJungle::File::DB object:

        my $rjdb = RandomJungle::File::DB->new( db_file => $filename );

The 'db_file' parameter is required. Sets $ERROR and returns undef on failure.
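The constructor contract (a required 'db_file' parameter; $ERROR set and undef returned on failure) can be sketched with a small stand-in package. My::DBSketch is a hypothetical name used only for illustration; this is not the module's actual implementation and it does not open a real DB file.

```perl
use strict;
use warnings;

package My::DBSketch {
    # Package-level error variable, mirroring $RandomJungle::File::DB::ERROR
    our $ERROR;

    sub new {
        my ( $class, %args ) = @_;

        # 'db_file' is required: record the problem in $ERROR and return undef
        unless ( defined $args{db_file} ) {
            $ERROR = "db_file parameter is required";
            return;
        }

        return bless { db_file => $args{db_file} }, $class;
    }
}

package main;

# 'rj.dbm' is a hypothetical filename
my $obj = My::DBSketch->new( db_file => 'rj.dbm' ) || die $My::DBSketch::ERROR;
```

Because the error is reported through a package variable rather than the object (which does not exist yet), callers read it by its fully qualified name.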

store_data()

This method loads data into the RJ::File::DB database. All parameters are optional, so files can be loaded in a single call or across multiple calls. The database holds one file of each type: calling this method again for a file type that has already been loaded overwrites the previously loaded data.

        $rjdb->store_data( xml_file => $file1, oob_file => $file2, raw_file => $file3 ) || die $rjdb->err_str;

Returns true on success. Sets err_str and returns false if an error occurred.
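The optional-parameter and overwrite behaviour described above can be illustrated with a hypothetical in-memory stand-in, My::StoreSketch. It records only the filename for each type, whereas the real method parses and stores the file contents; the package and its stored_file() accessor are assumptions made for illustration.

```perl
use strict;
use warnings;

package My::StoreSketch {
    # The three file types accepted by store_data(); each one is optional
    my @FILE_TYPES = qw( xml_file oob_file raw_file );

    sub new {
        my $class = shift;
        return bless { stored => {} }, $class;
    }

    sub store_data {
        my ( $self, %args ) = @_;

        foreach my $type (@FILE_TYPES) {
            next unless defined $args{$type};
            # A later call for the same type replaces whatever was stored before
            $self->{stored}{$type} = $args{$type};
        }

        return 1;
    }

    sub stored_file {
        my ( $self, $type ) = @_;
        return $self->{stored}{$type};
    }
}

package main;

my $db = My::StoreSketch->new;
$db->store_data( xml_file => 'run1.xml', oob_file => 'run1.oob' );   # first call
$db->store_data( xml_file => 'run2.xml' );                           # replaces the XML entry only
```

After the second call the XML entry refers to run2.xml, the OOB entry from the first call is untouched, and no RAW file has been loaded.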

get_db_filename()

Returns the name of the DB file specified in new():

        my $file = $rjdb->get_db_filename;

get_xml_filename()

Returns the name of the XML file specified in store_data():

        my $file = $rjdb->get_xml_filename;

get_rj_params()

Returns a href of the input parameters used when Random Jungle was run:

        my $href = $rjdb->get_rj_params; # $href->{$param_name} = $param_value;

get_tree_ids()

Returns an array ref of tree IDs (sorted numerically):

        my $aref = $rjdb->get_tree_ids;
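"Sorted numerically" matters because tree IDs compared as strings would place 10 before 2. The sketch below uses a hypothetical list of IDs and Perl's built-in sort to show the difference; it does not call the module.

```perl
use strict;
use warnings;

# Hypothetical tree IDs in arbitrary order
my @ids = ( 10, 2, 100, 1 );

# Default string comparison orders them as '1', '10', '100', '2'
my @string_sorted = sort @ids;

# Numeric comparison, the ordering get_tree_ids() documents: 1, 2, 10, 100
my @numeric_sorted = sort { $a <=> $b } @ids;

my $aref = \@numeric_sorted;    # same shape as the return value (an array ref)
print "@$aref\n";
```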

get_tree_data()

Returns a href containing a data record for each tree ID specified as an input param. The record for each tree is a data structure from the XML file, not a RandomJungle::Tree object. Invalid tree IDs are skipped. An empty href is returned if no valid IDs are provided.

        my $href = $rjdb->get_tree_data( @tree_ids ); # may be big - use with caution

Note: This method is not intended to be called directly. See RandomJungle::Jungle::get_tree_by_id().
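The skip-invalid-IDs behaviour can be sketched as a simple filter over a lookup table. Here %tree_records and select_tree_data() are hypothetical stand-ins (the real records are data structures parsed from the XML file); the sketch only illustrates the documented semantics: unknown IDs are skipped and an empty href comes back when nothing matches.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-tree records; real records are
# data structures parsed from the XML file
my %tree_records = (
    1 => 'record for tree 1',
    2 => 'record for tree 2',
);

# Returns a hash ref keyed by tree ID; IDs with no record are skipped
sub select_tree_data {
    my ( $records, @tree_ids ) = @_;
    my %selected;

    foreach my $id (@tree_ids) {
        next unless defined $id && exists $records->{$id};
        $selected{$id} = $records->{$id};
    }

    return \%selected;    # empty hash ref if no valid IDs were given
}

my $href = select_tree_data( \%tree_records, 1, 99 );    # 99 has no record, so it is skipped
print join( ',', sort keys %$href ), "\n";
```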

get_oob_filename()

Returns the name of the OOB file specified in store_data():

        my $file = $rjdb->get_oob_filename;

get_oob_by_sample()

Returns the line (unsplit, unspliced) from the OOB file for a given sample. Specify the sample with either label => $label or index => $index (one is required): label is the sample label (IID) from the RAW file, and index is the row number of the sample in the RAW file. Sets err_str and returns undef if neither parameter is specified or if the specified sample cannot be found.

        my $line = $rjdb->get_oob_by_sample( label => $sample_label )   # or index => $sample_index
                or warn $rjdb->err_str;
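The label-or-index lookup can be sketched with two parallel in-memory arrays, assuming the OOB lines are stored in the same row order as the RAW samples. The arrays and oob_line_for() are hypothetical stand-ins, and the sketch prefers label when both parameters are given; the module's own precedence is not documented here.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: sample labels (IID column of the RAW file) and the
# corresponding OOB lines, stored in the same row order
my @sample_labels = qw( S1 S2 S3 );
my @oob_lines     = ( 'oob line for S1', 'oob line for S2', 'oob line for S3' );

# Looks up an OOB line by label => $label or index => $index (one is required).
# Returns undef if neither is given or if the sample cannot be found.
sub oob_line_for {
    my (%args) = @_;
    my $index = $args{index};

    # A label, if given, is resolved to its row index (undef if not found)
    if ( defined $args{label} ) {
        ($index) = grep { $sample_labels[$_] eq $args{label} } 0 .. $#sample_labels;
    }

    return unless defined $index && $index =~ /^\d+$/ && $index <= $#oob_lines;
    return $oob_lines[$index];
}

print oob_line_for( label => 'S2' ), "\n";
```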

get_raw_filename()

Returns the name of the RAW file specified in store_data():

        my $file = $rjdb->get_raw_filename;

get_header_labels()

Returns a reference to an array that contains the header labels from the RAW file:

        my $aref = $rjdb->get_header_labels; # (expected:  FID IID PAT MAT)

get_variable_labels()

Returns a reference to an array that contains the variable labels from the RAW file:

        my $aref = $rjdb->get_variable_labels; # (expected:  SEX PHENOTYPE var1 ...)

get_sample_labels()

Returns a reference to an array that contains the sample labels from the IID column of the RAW file:

        my $aref = $rjdb->get_sample_labels;

get_sample_data()

Returns a hash ref containing data for the sample specified by label => $label, where label is the IID from the RAW file. Sets err_str and returns undef if label is not specified or is invalid.

        my $href = $rjdb->get_sample_data( label => $label ) || warn $rjdb->err_str;

$href has the following structure:

        SEX                 => $val,
        PHENOTYPE           => $val,
        orig_data           => $line,  # unsplit, unspliced
        index               => $i,     # index in aref from get_sample_labels(); can be used to index into the OOB matrix
        classification_data => $aref,  # can be passed to RandomJungle::Tree->classify_data
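To show how the documented keys fit together, here is a hypothetical record with that shape and the lookup its index field supports. Every value is a placeholder rather than real RAW data, and the label list stands in for the aref returned by get_sample_labels().

```perl
use strict;
use warnings;

# Hypothetical sample labels, standing in for the aref from get_sample_labels()
my $labels = [qw( S1 S2 S3 )];

# Hypothetical record with the documented keys, as get_sample_data( label => 'S2' )
# might return it; all values are placeholders
my $sample = {
    SEX                 => 1,
    PHENOTYPE           => 2,
    orig_data           => '(original RAW line for S2)',
    index               => 1,
    classification_data => [ 0, 1, 2 ],
};

# 'index' points back into the label list, so this recovers the sample's label
my $label = $labels->[ $sample->{index} ];
print "$label: ", scalar @{ $sample->{classification_data} }, " classification values\n";
```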

set_err()

Sets the error message (provided as a parameter) and creates a stack trace:

        $rjdb->set_err( 'Something went boom' );

err_str()

Returns the last error message that was set:

        my $msg = $rjdb->err_str;

err_trace()

Returns a backtrace for the last error that was encountered:

        my $trace = $rjdb->err_trace;
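The set_err() / err_str() / err_trace() trio can be sketched with Carp::longmess, which returns the message followed by a full stack backtrace. My::ErrSketch is a hypothetical package, and using Carp::longmess is an assumption about how such a trace could be captured, not a description of the module's own code.

```perl
use strict;
use warnings;
use Carp ();

package My::ErrSketch {
    sub new {
        my $class = shift;
        return bless { err_str => '', err_trace => '' }, $class;
    }

    # Stores the message plus a stack trace captured at the point of the call
    sub set_err {
        my ( $self, $msg ) = @_;
        $self->{err_str}   = $msg;
        $self->{err_trace} = Carp::longmess($msg);
        return;
    }

    sub err_str   { return $_[0]->{err_str} }
    sub err_trace { return $_[0]->{err_trace} }
}

package main;

my $obj = My::ErrSketch->new;
$obj->set_err('Something went boom');
print $obj->err_trace;
```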

INTERNAL METHODS

_sample_label_to_index()

Returns the sample index (row in the RAW file, used to index into the OOB file) for a given sample label (from the IID column in the RAW file). Returns undef if the parameter is undef or if the label is invalid.

        my $sample_index = $rjdb->_sample_label_to_index( $sample_label );
        warn "Invalid label" unless defined $sample_index;   # use defined(): a valid index may be 0
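A label-to-index lookup of this kind can be sketched with a hash built once from the sample labels, assuming indices are 0-based positions in the get_sample_labels() array (as get_sample_data() documents for its index field). The array and sample_label_to_index() are hypothetical stand-ins. Because the first sample maps to 0, the result must be tested with defined() rather than for truth.

```perl
use strict;
use warnings;

# Hypothetical sample labels in RAW-file row order (IID column)
my @sample_labels = qw( S1 S2 S3 );

# Build a label => row index lookup once
my %index_of;
@index_of{@sample_labels} = 0 .. $#sample_labels;

# Returns the row index for a label, or undef if the label is undef or unknown
sub sample_label_to_index {
    my ($label) = @_;
    return unless defined $label;
    return $index_of{$label};    # undef when the label is not present
}

# Test with defined(), not truthiness: the first sample's index is 0
my $idx = sample_label_to_index('S1');
warn "Invalid label" unless defined $idx;
```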

SEE ALSO

RandomJungle::Jungle, RandomJungle::Tree, RandomJungle::Tree::Node, RandomJungle::XML, RandomJungle::OOB, RandomJungle::RAW, RandomJungle::DB, RandomJungle::Classification_DB

AUTHOR

Robert R. Freimuth

COPYRIGHT

Copyright (c) 2011 Mayo Foundation for Medical Education and Research. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.