Bio::Community::TaxonomyUtils - Functions for manipulating taxonomic lineages
  use Bio::Community::TaxonomyUtils qw(split_lineage_string get_lineage_string);

  my $lineage     = 'Bacteria;WCHB1-60;unidentified';
  my $lineage_arr = split_lineage_string($lineage);
  $lineage        = get_lineage_string($lineage_arr);
  print "Lineage is: $lineage\n"; # Bacteria;WCHB1-60
This module implements functions to manipulate taxonomic lineages, represented as arrayrefs of taxon names or taxon objects.
Florent Angly email@example.com
User feedback is an integral part of the evolution of this and other Bioperl modules. Please direct usage questions or support issues to the mailing list, firstname.lastname@example.org, rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.
If you have found a bug, please report it on the BioPerl bug tracking system to help us keep track of the bugs and their resolution: https://redmine.open-bio.org/projects/bioperl/
Copyright 2011-2014 by Florent Angly <email@example.com>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.1 or, at your option, any later version of Perl 5 you may have available.
The rest of the documentation details each of the functions. Internal functions are usually preceded with an underscore (_).
Function: Split a lineage string, clean it and autodetect whitespace. The ';'
          separator is used to split lineages like 'Bacteria; Proteobacteria'
          into an arrayref of their individual components, e.g. ['Bacteria',
          'Proteobacteria']. The number and type of whitespace characters
          after the separator are saved for future use in
          get_lineage_string(), the reciprocal function. Also, optionally
          clean the arrayref using clean_lineage_arr().
Usage   : my $taxa_names = split_lineage_string($lineage_string);
Args    : * a lineage string
          * whether to clean taxonomy or not (default is to clean)
Returns : an arrayref of taxon names
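The splitting and whitespace autodetection described above can be sketched in plain Perl. This is an illustration only, not the module's code: the function name sketch_split_lineage and the variable $detected_ws are hypothetical, and the sketch assumes the whitespace after the first ';' is representative of the whole string.

```perl
use strict;
use warnings;

# Hypothetical state holding the whitespace seen after the separator,
# so that a later join could reuse it.
my $detected_ws = '';

# Hypothetical stand-in for the behaviour described above (no cleaning step).
sub sketch_split_lineage {
    my ($lineage) = @_;
    # Remember the run of whitespace that follows the first ';', if any
    if ( $lineage =~ m/;(\s*)/ ) {
        $detected_ws = $1;
    }
    # Split on ';' plus any following whitespace; -1 keeps trailing empty fields
    my @names = split /;\s*/, $lineage, -1;
    return \@names;
}

my $sketch_names = sketch_split_lineage('Bacteria; Proteobacteria');
print join( '|', @$sketch_names ), "\n";   # Bacteria|Proteobacteria
print "ws=[$detected_ws]\n";               # ws=[ ]
```

Keeping trailing empty fields (the -1 limit) leaves entries such as '' in place, so that a separate cleanup pass can decide whether to drop them.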
Function: Two-step cleanup:
          1/ At the beginning of the array, remove anything called 'Root'
          2/ Starting from the end of the array, remove ambiguous taxonomic
             information such as: '', 'No blast hit', 'unidentified',
             'uncultured', 'environmental', 'Other', 'g__', 's__', etc.
Usage   : $lineage_arr = clean_lineage_arr($lineage_arr);
Args    : A lineage arrayref (either taxon names or objects)
Returns : A lineage arrayref
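A minimal sketch of the two-step cleanup, for an arrayref of taxon names only. The function name sketch_clean_lineage is hypothetical, the list of ambiguous names is limited to the examples given above (the real list is longer, as the "etc." indicates), and exact, case-sensitive matching on a copy of the array is an assumption.

```perl
use strict;
use warnings;

# Illustrative list only: the names quoted in the description above.
my $ambiguous_re =
    qr/^(?:|No blast hit|unidentified|uncultured|environmental|Other|g__|s__)$/;

# Hypothetical stand-in for the cleanup described above.
sub sketch_clean_lineage {
    my ($lineage_arr) = @_;
    my @names = @$lineage_arr;    # work on a copy (an assumption)
    # Step 1: remove leading entries called 'Root'
    shift @names while @names && $names[0] eq 'Root';
    # Step 2: remove ambiguous entries, starting from the end
    pop @names while @names && $names[-1] =~ $ambiguous_re;
    return \@names;
}

my $cleaned = sketch_clean_lineage(
    [ 'Root', 'Bacteria', 'WCHB1-60', 'unidentified', '' ] );
print join( ';', @$cleaned ), "\n";    # Bacteria;WCHB1-60
```

Trimming only from the end means an ambiguous name in the middle of a lineage is kept, which preserves the rank positions of the taxa that follow it.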
Function: Take a taxon object and return its lineage as an arrayref of the
          taxon itself, preceded by its ancestor taxa.
Usage   : my $lineage_arr = get_taxon_lineage($taxon);
Args    : A taxon object
Returns : An arrayref of taxon names
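The ancestor walk described above can be illustrated with a small stand-in class. Everything here is hypothetical: the SketchTaxon package, its name and ancestor methods, and the function sketch_taxon_lineage are illustrative assumptions, not the module's code. The sketch collects the objects root-first and prints their names.

```perl
use strict;
use warnings;

# Hypothetical minimal taxon class, standing in for a real taxon object.
package SketchTaxon;
sub new      { my ( $class, %args ) = @_; return bless {%args}, $class }
sub name     { $_[0]{name} }
sub ancestor { $_[0]{ancestor} }

package main;

# Hypothetical stand-in: walk from the taxon up to the root.
sub sketch_taxon_lineage {
    my ($taxon) = @_;
    my @lineage;
    # Prepend each node so that ancestors end up before the taxon itself
    for ( my $node = $taxon ; defined $node ; $node = $node->ancestor ) {
        unshift @lineage, $node;
    }
    return \@lineage;
}

my $domain = SketchTaxon->new( name => 'Bacteria' );
my $phylum = SketchTaxon->new( name => 'Proteobacteria', ancestor => $domain );
print join( ';', map { $_->name } @{ sketch_taxon_lineage($phylum) } ), "\n";
# Bacteria;Proteobacteria
```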
Function: Take a lineage arrayref and return a full lineage string by joining
          the elements using the ';' separator. The opposite operation is
          split_lineage_string().
Usage   : my $lineage = get_lineage_string(['Bacteria', 'Proteobacteria']);
            or
          my $lineage = get_lineage_string([$taxon1, $taxon2]);
Args    : * Arrayref of taxon names or objects
          * Optional: whitespace string to include after separator (omit to
            autodetect)
Returns : A lineage string
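A minimal sketch of the joining step, under stated assumptions: the function name sketch_lineage_string is hypothetical, objects are assumed to expose a name method, and an omitted whitespace argument falls back to the empty string here, since this standalone sketch has no autodetected value to reuse.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for the joining behaviour described above.
sub sketch_lineage_string {
    my ( $lineage_arr, $ws ) = @_;
    $ws = '' unless defined $ws;    # assumption: no autodetected value here
    # Accept plain names or objects; the 'name' method is an assumption
    my @names = map { blessed($_) ? $_->name : $_ } @$lineage_arr;
    return join( ';' . $ws, @names );
}

print sketch_lineage_string( [ 'Bacteria', 'Proteobacteria' ] ), "\n";
# Bacteria;Proteobacteria
print sketch_lineage_string( [ 'Bacteria', 'Proteobacteria' ], ' ' ), "\n";
# Bacteria; Proteobacteria
```

Passing the whitespace explicitly makes the output format independent of whatever string was split last, which is useful when lineages from several sources are written to one file.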