NAME

BioStudio::Basic - basic functions for the BioStudio synthetic biology framework

VERSION

Version 1.05

DESCRIPTION

Basic BioStudio functions

AUTHOR

Sarah Richardson <notadoctor@jhu.edu>.

Customization functions

configure_BioStudio()

This function loads the configuration file into a hash ref. You must pass it the path to the directory containing the configuration file; it will use Config::Auto to "magically" parse the file.

fetch_custom_features()

Pass the config hashref, receive a hashref of the custom features defined in the BioStudio configuration directory. Each key is a feature name, each value is a Bio::BioStudio::Feature object.

fetch_custom_markers()

Pass the config hashref, receive a hashref of the custom markers defined in the BioStudio configuration directory. Each key in the hashref is a marker name, each value in the hashref is a Bio::BioStudio::Marker object.

fetch_enzyme_lists()

Pass the config hashref, receive an array that contains the names of the enzyme lists in the BioStudio configuration directory.

Masking functions

make_mask()

Given a length, an array reference full of Bio::DB::SeqFeatures, and optionally an offset, returns a string of integers where each position corresponds to a base of sequence, and the integer represents the number of features that overlap that base. Because each position is a single character, a base covered by ten or more features cannot be represented correctly; treat this as a known bug :(
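The counting idea can be sketched in a few lines. This is only an illustration, not the module's code: sketch_make_mask is a hypothetical name, features are plain [start, end] pairs (1-based, inclusive) instead of Bio::DB::SeqFeature objects, and the offset is assumed to be subtracted from feature coordinates before they are placed on the mask.

```perl
use strict;
use warnings;

# Hypothetical stand-in for make_mask(); features are [start, end] pairs
# (1-based, inclusive), not Bio::DB::SeqFeature objects.
sub sketch_make_mask {
    my ($length, $features, $offset) = @_;
    $offset ||= 0;    # assumption: offset is subtracted from coordinates
    my @counts = (0) x $length;
    foreach my $feature (@{$features}) {
        my ($start, $end) = @{$feature};
        # Convert to 0-based mask indices and clip to the mask boundaries.
        my $first = $start - $offset - 1;
        my $last  = $end - $offset - 1;
        $first = 0           if $first < 0;
        $last  = $length - 1 if $last > $length - 1;
        $counts[$_]++ for $first .. $last;
    }
    # One character per base: a count above 9 would no longer fit.
    return join q{}, @counts;
}

my $mask = sketch_make_mask(16, [[4, 5], [9, 14], [9, 12], [9, 10]]);
print "$mask\n";    # 0001100033221100
```

The four example features were chosen so that the result matches the mask used in the mask_filter() description.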

mask_combine()

Takes two string masks (see make_mask()) and adds them. Returns the merged mask.
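A minimal sketch of position-by-position addition follows. The name sketch_mask_combine is hypothetical, and the requirement that both masks have the same length is an assumption of this sketch rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical stand-in for mask_combine(): add two masks digit by digit.
sub sketch_mask_combine {
    my ($mask_a, $mask_b) = @_;
    # Assumption: both masks describe the same stretch of sequence.
    die "masks differ in length\n" if length($mask_a) != length($mask_b);
    my @a = split //, $mask_a;
    my @b = split //, $mask_b;
    # A sum above 9 would occupy two characters and corrupt the mask.
    return join q{}, map { $a[$_] + $b[$_] } 0 .. $#a;
}

print sketch_mask_combine('0001100', '0111000'), "\n";    # 0112100
```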

mask_filter()

Takes a string mask (see make_mask()) and returns a listref of break coordinates; that is, where does feature sequence end and interfeature sequence begin, and where does interfeature sequence end and feature sequence begin? For example, if the mask is "0001100033221100", the resulting list would be [0 3 5 8 14 16], meaning that features exist from 4 to 5 and 9 to 14. Intergenic sequence coordinates can thus be pulled out by hashing the array,

  %inter = @{mask_filter($mask)};

where each key +1 is the left coordinate, and the value is the right coordinate.
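The break-coordinate idea and the hashing trick can be tried out with the sketch below. It is an illustration under assumptions, not the module's implementation: sketch_mask_filter is a hypothetical name, and the sketch does nothing special for a mask that begins or ends inside a feature, a case the description above does not cover.

```perl
use strict;
use warnings;

# Hypothetical stand-in for mask_filter(): record every position where the
# mask switches between feature (non-zero) and interfeature (zero) sequence.
sub sketch_mask_filter {
    my ($mask) = @_;
    my @bases  = split //, $mask;
    my @breaks = (0);
    foreach my $i (1 .. $#bases) {
        my $was_feature = $bases[$i - 1] > 0 ? 1 : 0;
        my $is_feature  = $bases[$i] > 0     ? 1 : 0;
        push @breaks, $i if $was_feature != $is_feature;
    }
    push @breaks, scalar @bases;
    return \@breaks;
}

my $breaks = sketch_mask_filter('0001100033221100');
print "@{$breaks}\n";    # 0 3 5 8 14 16

# Pair the breaks up: key + 1 is the left coordinate, value is the right.
my %inter = @{$breaks};
foreach my $key (sort { $a <=> $b } keys %inter) {
    printf "interfeature %d..%d\n", $key + 1, $inter{$key};
}
```

For the example mask this reports the interfeature ranges 1..3, 6..8, and 15..16, which are the gaps around the features at 4 to 5 and 9 to 14.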

Genome Repository functions

get_src_path()

Given a chromosome name and the config hashref, returns the absolute path to that chromosome in the BioStudio genome repository.

get_genome_list()

Given the config hashref, returns a list of all chromosomes in the BioStudio genome repository.

gather_versions()

Given a species, a target, and the config hashref, returns a hashref of all chromosomes in the species in the BioStudio genome repository that match the target. The target is an integer that represents a version.

If target is set to 0, we will return every wildtype version.

If target is set to -1, we will return every latest version.

For any other target (1, 3, 5) we will return that particular version.

rollback()

Given a chromosome name and the BioStudio config hashref, removes that chromosome from the BioStudio genome repository.

Editing and Markup functions

ORF_compile()

Given a reference to an array full of Bio::SeqFeature gene objects, returns a reference to a hash with gene ids as keys and concatenated 5' to 3' coding sequences as values.

get_feature_sequence()

For when you can't use the Bio::SeqFeature seq function. Given a Bio::DB::SeqFeature compliant feature and a sequence, returns the sequence that the coordinates of the feature indicate.
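A rough sketch of coordinate-based extraction is shown below. The name sketch_feature_sequence and the hash-based feature are hypothetical; coordinates are taken to be 1-based and inclusive, as is usual for BioPerl features, and strand is ignored here, which may differ from what the real function does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_feature_sequence(); the feature is a plain
# hash with 1-based, inclusive start and end. Strand is ignored.
sub sketch_feature_sequence {
    my ($feature, $sequence) = @_;
    my $length = $feature->{end} - $feature->{start} + 1;
    return substr $sequence, $feature->{start} - 1, $length;
}

my $feature = { start => 3, end => 6 };
print sketch_feature_sequence($feature, 'ACGTACGTAC'), "\n";    # GTAC
```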

flatten_subfeats()

Given a seqfeature, iterates through its subfeatures and adds all of their subfeatures to one big array. This is mainly needed when CDSes are hidden behind mRNAs in genes.
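The flattening can be pictured with nested hashes standing in for seqfeatures. Everything here is hypothetical: the name sketch_flatten, the "name" and "subfeatures" keys, and the choice to keep intermediate features (such as the mRNA) in the result.

```perl
use strict;
use warnings;

# Hypothetical stand-in for flatten_subfeats(); features are nested hashes
# with a "subfeatures" array reference.
sub sketch_flatten {
    my ($feature) = @_;
    my @flat;
    foreach my $sub (@{ $feature->{subfeatures} || [] }) {
        # Keep the subfeature itself, then everything nested below it.
        push @flat, $sub, sketch_flatten($sub);
    }
    return @flat;
}

# A gene whose CDSes sit one level down, behind an mRNA.
my $gene = {
    name        => 'gene1',
    subfeatures => [
        {
            name        => 'mRNA1',
            subfeatures => [ { name => 'CDS1' }, { name => 'CDS2' } ],
        },
    ],
};

print join(q{,}, map { $_->{name} } sketch_flatten($gene)), "\n";
# mRNA1,CDS1,CDS2
```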

gene_names()

Given a list of Bio::DB::SeqFeature gene objects and the BioStudio config hashref, returns a hash where each gene id is the key to a display-friendly string.

allowable_codon_changes()

Given two codons (a from, and a to) and a GeneDesign codon table hashref, this function generates every possible peptide pair that could contain the from codon and checks to see if the peptide sequence can be maintained when the from codon is replaced by the to codon. This function is of particular use when codons are being changed in genes that overlap one another.

check_new_sequence()

Best when used as a confirmation that your edits went as expected. Given a Bio::DB::SeqFeature compliant feature that has a "newseq" attribute, checks if the newseq and the actual sequence occupied by the feature are the same.

Takes a sequence as a string and a sequence id, and returns an 80-column FASTA-formatted sequence block as an array reference.
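The 80-column FASTA block can be sketched as follows. The name sketch_fasta_block is hypothetical, and the layout of the returned array (a header line first, one array element per line, no trailing newlines) is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical FASTA formatter: returns an array reference holding a header
# line followed by the sequence in lines of at most 80 characters.
sub sketch_fasta_block {
    my ($sequence, $id) = @_;
    my @lines = (">$id");
    # In list context the match returns every chunk of 1 to 80 characters.
    push @lines, $sequence =~ m/.{1,80}/g;
    return \@lines;
}

my $block = sketch_fasta_block('ACGT' x 50, 'chr01');
print "$_\n" for @{$block};
```

A 200-base sequence yields the header plus lines of 80, 80, and 40 bases.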

COPYRIGHT AND LICENSE

Copyright (c) 2011, BioStudio developers. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

* Neither the name of the Johns Hopkins nor the names of the developers may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.