Bio::Phylo::Forest::DBTree - Phylogenetic database as a tree object
    use Bio::Phylo::Forest::DBTree;

    # connect to the Green Genes tree
    my $file = 'gg_13_5_otus_99_annotated.db';
    my $dbtree = Bio::Phylo::Forest::DBTree->connect($file);

    # $dbtree can be used as a Bio::Phylo::Forest::Tree object,
    # and the node objects that are returned can be used as
    # Bio::Phylo::Forest::Node objects
    my $root = $dbtree->get_root;
This package provides the functionality to handle very large phylogenies (examples: the NCBI taxonomy, the Green Genes tree) as if they are Bio::Phylo tree objects, with all the possibilities for traversal, computation, serialization, and visualization, but stored in a SQLite database. These databases are single files, so that they can be easily shared. Some useful database files are available here: https://figshare.com/account/home#/projects/18808
To make new tree databases, a number of scripts are provided with the distribution of this package:
megatree-loader
    Loads a very large Newick tree into a database.

megatree-ncbi-loader
    Loads the NCBI taxonomy dump into a database.

megatree-phylotree-loader
    Loads a tree in the format of http://phylotree.org into a database.
As an example of interacting with a database tree, the script megatree-pruner can be used to extract subtrees from a database.
The following methods deal with the database as a whole: creating a new database, connecting to an existing one, persisting a tree in a database and extracting one as a mutable, in-memory object.
create()

Creates a SQLite database file in the provided location. Usage:

    use Bio::Phylo::Forest::DBTree;

    # second argument is optional
    Bio::Phylo::Forest::DBTree->create( $file, '/opt/local/bin/sqlite3' );
The first argument is the location where the database file will be created. The second argument is optional and gives the location of the sqlite3 executable that is used to create the database. By default, sqlite3 is looked up on the $PATH; if it is installed in a non-standard location, that location can be provided here. The database schema that is created corresponds to the following SQL statements:
    create table node(
        id int not null,
        parent int,
        left int,
        right int,
        name varchar(20),
        length float,
        height float,
        primary key(id)
    );
    create index parent_idx on node(parent);
    create index left_idx on node(left);
    create index right_idx on node(right);
    create index name_idx on node(name);
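To illustrate why the left and right columns are indexed, the following is a minimal sketch of a descendant lookup done with a single range query. It assumes that left and right hold nested-set style traversal indexes, so that every descendant's interval lies strictly inside its ancestor's; the schema alone does not confirm this numbering. The toy tree, its index values, and the helper descendant_names() are hypothetical. The sketch uses plain DBI with DBD::SQLite and an in-memory database rather than a real tree file.

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database; requires DBD::SQLite
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 } );

# Same columns as the schema above. "left" and "right" are quoted here
# because they are also SQL keywords.
$dbh->do(q{
    create table node(
        id int not null, parent int, "left" int, "right" int,
        name varchar(20), length float, height float,
        primary key(id)
    )
});

# Toy tree ((C,D)B,A)root; the left/right values are an ASSUMED
# nested-set numbering, chosen by hand for this illustration
my @rows = (
    # id, parent, left, right, name
    [ 1, undef, 1, 10, 'root' ],
    [ 2, 1,     2, 7,  'B' ],
    [ 3, 2,     3, 4,  'C' ],
    [ 4, 2,     5, 6,  'D' ],
    [ 5, 1,     8, 9,  'A' ],
);
my $insert = $dbh->prepare(
    'insert into node (id, parent, "left", "right", name) values (?, ?, ?, ?, ?)'
);
$insert->execute(@$_) for @rows;

# Hypothetical helper: names of all descendants of the named node,
# found by interval containment instead of recursive parent lookups
sub descendant_names {
    my ( $dbh, $name ) = @_;
    my $names = $dbh->selectcol_arrayref(
        q{
            select d.name from node a, node d
            where a.name = ?
              and d."left"  > a."left"
              and d."right" < a."right"
            order by d."left"
        },
        undef, $name
    );
    return @$names;
}

print join( ',', descendant_names( $dbh, 'B' ) ), "\n";
```

Under this assumption, a whole clade is retrieved by one indexed comparison per column, which is what makes subtree queries on very large trees feasible without walking the parent column node by node.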
connect()

Connects to a SQLite database file and returns the connection as a Bio::Phylo::Forest::DBTree object. Usage:

    use Bio::Phylo::Forest::DBTree;
    my $dbtree = Bio::Phylo::Forest::DBTree->connect($file);
The argument is a file name. If the file exists, a DBD::SQLite database handle to that file is returned. If the file does not exist, a new database is created in that location, and the handle to that newly created database is returned. The creation of the database is handled by the create() method (see above).
persist()

Persists a phylogenetic tree object (a subclass of Bio::Phylo::Forest::Tree) into a newly created database file. Usage:

    use Bio::Phylo::Forest::DBTree;
    my $dbtree = Bio::Phylo::Forest::DBTree->persist(
        -file => $file,
        -tree => $tree,
    );
This method first creates a database at the location specified by $file by calling the create() method. The $tree object is then traversed from root to tips and inserted into the newly created database. Finally, the handle to this database is returned, i.e. a Bio::Phylo::Forest::DBTree object.
extract()

Extracts a tree from a database. The returned tree is an in-memory object. Hence, this is an expensive operation that is best avoided as much as possible. Usage:

    my $tree = $dbtree->extract;
dbh()

Returns the underlying handle through which SQL statements can be executed directly on the database. This is a DBD::SQLite object. Usage:

    my $dbh = $dbtree->dbh;
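As a sketch of what running SQL directly on such a handle might look like, the example below counts the tips of a tree as the nodes whose id never appears in the parent column. It relies only on the id and parent columns of the schema shown earlier. The helper count_tips() is hypothetical, and an in-memory DBI handle with a three-node toy table stands in for the handle returned by dbh, so that the example runs without a tree database file.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical helper: count tips, i.e. nodes that are nobody's parent
sub count_tips {
    my ($dbh) = @_;
    my ($count) = $dbh->selectrow_array(q{
        select count(*) from node
        where id not in (select parent from node where parent is not null)
    });
    return $count;
}

# Stand-in for the handle returned by $dbtree->dbh: an in-memory
# database holding only the columns this query needs
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 } );
$dbh->do('create table node( id int not null, parent int, name varchar(20), primary key(id) )');

# Toy tree (A,B)root;
my $insert = $dbh->prepare('insert into node (id, parent, name) values (?, ?, ?)');
$insert->execute(@$_) for ( [ 1, undef, 'root' ], [ 2, 1, 'A' ], [ 3, 1, 'B' ] );

print count_tips($dbh), "\n";
```

With a real database, the same function would presumably be called as count_tips( $dbtree->dbh ), letting SQLite do the counting instead of visiting every node object from Perl.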
The following methods are implemented here to override methods of the same name in the Bio::Phylo hierarchy so that the tree database is accessed more efficiently than otherwise would be the case.
get_root()

Returns the root of the tree, i.e. a Bio::Phylo::Forest::DBTree::Result::Node object, which is a subclass of Bio::Phylo::Forest::Node. Usage:

    my $root = $dbtree->get_root;
get_id()

Returns a dummy ID, an integer. Usage:

    my $id = $dbtree->get_id;
get_by_name()

Returns the first node object that has the provided name. Usage:

    my $node = $dbtree->get_by_name( 'Homo sapiens' );
visit()

Given a code reference, visits all the nodes in the tree and executes the code on the focal node. Usage:

    $dbtree->visit(sub{
        my $node = shift;
        print $node->name, "\n";
    });