




Bio::Community::TaxonomyUtils - Functions for manipulating taxonomic lineages


  use Bio::Community::TaxonomyUtils qw(split_lineage_string get_lineage_string);

  my $lineage = 'Bacteria;WCHB1-60;unidentified';
  my $lineage_arr = split_lineage_string($lineage);
  $lineage = get_lineage_string($lineage_arr);

  print "Lineage is: $lineage\n"; # Bacteria;WCHB1-60


This module implements functions to manipulate taxonomic lineages, represented as arrayrefs of taxon names or taxon objects.


Florent Angly <florent.angly@gmail.com>


User feedback is an integral part of the evolution of this and other BioPerl modules. Please direct usage questions or support issues to the mailing list, bioperl-l@bioperl.org, rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

If you have found a bug, please report it on the BioPerl bug tracking system to help us keep track of the bugs and their resolution: https://redmine.open-bio.org/projects/bioperl/


Copyright 2011, 2012, 2013 by the BioPerl Team <bioperl-l@bioperl.org>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.1 or, at your option, any later version of Perl 5 you may have available.


The rest of the documentation details each of the functions. Internal functions are usually preceded with a _ (underscore).


 Function: Split a lineage string, clean it and autodetect whitespace.
           The ';' separator is used to split lineages like 'Bacteria;
           Proteobacteria' into an arrayref of its individual components, e.g.
           ['Bacteria','Proteobacteria']. The number and type of spaces after
           the separator is saved for future use in get_lineage_string(), the
           reciprocal function. Also, optionally clean the arrayref using
           clean_lineage_arr().
 Usage   : my $taxa_names = split_lineage_string($lineage_string);
 Args    : * a lineage string
           * whether to clean taxonomy or not (default is to clean)
 Returns : an arrayref of taxon names
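
The splitting and whitespace detection described above can be illustrated with a minimal, self-contained sketch. It is not the module's implementation: sketch_split_lineage() is a hypothetical name, it skips the optional cleanup step, and it returns the detected whitespace to the caller instead of saving it internally as the module is described to do.

```perl
use strict;
use warnings;

# Hypothetical stand-in for split_lineage_string(); not the module's code.
# Returns an arrayref of taxon names and the whitespace found after the
# first ';' separator.
sub sketch_split_lineage {
    my ($lineage) = @_;
    # Capture the whitespace that follows the first separator, if any
    my ($ws) = $lineage =~ /;(\s*)/;
    $ws = '' if not defined $ws;
    # Split on ';' plus any following whitespace; the -1 limit keeps
    # trailing empty fields so that a later cleanup step can see them
    my @names = split /;\s*/, $lineage, -1;
    return (\@names, $ws);
}

my ($names, $ws) = sketch_split_lineage('Bacteria; Proteobacteria');
print join('|', @$names), "\n";   # Bacteria|Proteobacteria
print "whitespace: '$ws'\n";      # whitespace: ' '
```

Returning the whitespace explicitly keeps the sketch free of hidden state; a caller could pass it back when rebuilding the string.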


 Function: Two step cleanup:
           1/ At the beginning of the array, remove anything called 'Root'
           2/ Starting from the end of the array, remove ambiguous taxonomic
              information such as:
                '', 'No blast hit', 'unidentified', 'uncultured', 'environmental',
                'Other', 'g__', 's__', etc
 Usage   : $lineage_arr = clean_lineage_arr($lineage_arr);
 Args    : A lineage arrayref (either taxon names or objects)
 Returns : A lineage arrayref
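
The two cleanup steps above can be sketched for plain taxon names as follows. This is an illustration under stated assumptions, not the module's code: sketch_clean_lineage() is a hypothetical name, the list of ambiguous names contains only the examples quoted above (the real list is longer), case-insensitive matching is an assumption, and taxon objects are not handled.

```perl
use strict;
use warnings;

# Assumption: only the ambiguous names quoted in the documentation above,
# compared case-insensitively
my %ambiguous = map { lc($_) => 1 }
    ('', 'No blast hit', 'unidentified', 'uncultured', 'environmental',
     'Other', 'g__', 's__');

# Hypothetical stand-in for clean_lineage_arr(), for taxon names only
sub sketch_clean_lineage {
    my ($lineage_arr) = @_;
    my @names = @$lineage_arr;   # work on a copy, leave the input untouched
    # Step 1: remove 'Root' entries from the beginning
    shift @names while @names and lc($names[0]) eq 'root';
    # Step 2: remove ambiguous entries from the end, stopping at the first
    # informative name
    pop @names while @names and $ambiguous{ lc($names[-1]) };
    return \@names;
}

my $clean = sketch_clean_lineage(
    ['Root', 'Bacteria', 'WCHB1-60', 'unidentified', '']
);
print join(';', @$clean), "\n";   # Bacteria;WCHB1-60
```

Because step 2 works from the end and stops at the first informative name, an ambiguous name in the middle of a lineage is kept in this sketch.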


 Function: Take a taxon object and return its lineage as an arrayref of the
           taxon itself, preceded by its ancestor taxa.
 Usage   : my $lineage_arr = get_taxon_lineage($taxon);
 Args    : A taxon object
 Returns : An arrayref of taxon names
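
The ancestor walk described above can be sketched with a tiny stand-in class. Everything here is hypothetical: SketchTaxon, its name() and ancestor() accessors, and sketch_taxon_lineage() are illustrative and are not the taxon class or the code used by the module. The sketch returns the objects themselves and maps them to names only when printing.

```perl
use strict;
use warnings;

# Hypothetical minimal taxon class: a name and an optional ancestor
package SketchTaxon;
sub new      { my ($class, %args) = @_; return bless {%args}, $class }
sub name     { return $_[0]{name} }
sub ancestor { return $_[0]{ancestor} }

package main;

# Hypothetical stand-in for get_taxon_lineage()
sub sketch_taxon_lineage {
    my ($taxon) = @_;
    my @lineage;
    # Walk from the taxon up to the root, prepending each node so that
    # the most ancestral taxon ends up first
    for (my $node = $taxon; defined $node; $node = $node->ancestor) {
        unshift @lineage, $node;
    }
    return \@lineage;
}

my $bacteria = SketchTaxon->new(name => 'Bacteria');
my $proteo   = SketchTaxon->new(name => 'Proteobacteria', ancestor => $bacteria);
my $lineage  = sketch_taxon_lineage($proteo);
print join(';', map { $_->name } @$lineage), "\n";   # Bacteria;Proteobacteria
```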


 Function: Take a lineage arrayref and return a full lineage string by joining
           the elements using the ';' separator. The opposite operation is
           split_lineage_string().
 Usage   : my $lineage = get_lineage_string(['Bacteria', 'Proteobacteria']);
           my $lineage = get_lineage_string([$taxon1, $taxon2]);
 Args    : * Arrayref of taxon names or objects
           * Optional: whitespace string to include after separator (omit to autodetect)
 Returns : A lineage string
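
The joining step can be sketched for plain taxon names as below. sketch_lineage_string() is a hypothetical name, not the module's function. Where the real function autodetects the whitespace when the second argument is omitted, this sketch simply assumes an empty string, and it does not handle taxon objects.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_lineage_string(), for taxon names only
sub sketch_lineage_string {
    my ($lineage_arr, $ws) = @_;
    # Assumption: no whitespace after the separator when none is given
    $ws = '' if not defined $ws;
    return join(';' . $ws, @$lineage_arr);
}

print sketch_lineage_string(['Bacteria', 'Proteobacteria']), "\n";
# Bacteria;Proteobacteria
print sketch_lineage_string(['Bacteria', 'Proteobacteria'], ' '), "\n";
# Bacteria; Proteobacteria
```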