Bio::Phylo::NeXML::DOM - XML DOM support for Bio::Phylo
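    use Bio::Phylo::NeXML::DOM;
    use Bio::Phylo::IO qw( parse );
    # Create the DOM factory, selecting the XML::Twig bindings
    Bio::Phylo::NeXML::DOM->new( -format => 'twig' );
    # Parse a Nexus file into a Bio::Phylo::Project
    my $project = parse( -file => 'my.nex', -format => 'nexus' );
    # Get the project's contents as a NeXML DOM document
    my $nex_twig = $project->doc();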
This module adds to_dom methods to Bio::Phylo::NeXML::Writable classes. These methods return NeXML-valid objects for document object model (DOM) manipulation. DOM formats currently available are XML::Twig and XML::LibXML. For any Bio::Phylo::NeXML::Writable object, use to_dom in place of to_xml to create DOM nodes.
The doc() method is also added to the Bio::Phylo::Project class. It returns a NeXML document as a DOM object populated by the current contents of the project.
The NeXML parsing/writing capability of Bio::Phylo goes a long way towards wider adoption of this useful standard. Although Bio::Phylo can write NeXML-valid XML, the way in which it does this natively is somewhat hard-coded and therefore restricted, and it is essentially oriented toward text file output. As a result, there is a mismatch between the rich Bio::Phylo data structure and its own ability to manipulate and serialize that structure in sophisticated but interoperable ways. Finer manipulations of XML-represented data are possible through a variety of Perl packages that store and control XML according to a document object model (DOM). Many of these packages allow extremely flexible computation over large datasets stored in XML format, and support the programmatic use of XML-related facilities such as XPath and XSLT.
The purpose of Bio::Phylo::NeXML::DOM is to introduce integrated DOM object creation and manipulation to Bio::Phylo, both to make DOM computation in Bio::Phylo more convenient and to provide a platform for potentially more sophisticated Bio::Phylo modules to come.
Besides the notion that DOM capability should be optional for the user, there are two main design ideas. First, for each Bio::Phylo object that can be parsed/written as NeXML (i.e., for each Bio::Phylo::NeXML::Writable object), we provide an analogous method for creating a representative DOM object, or element. These elements can be aggregated in a DOM document object, whose native stringifying method can be used to generate valid NeXML.

Second, we allow flexibility and extensibility in the choice of the underlying DOM package, while maintaining a consistent DOM interface that is similar in semantic and syntactic style to the accessors and mutators that act on the Bio::Phylo objects themselves. This is achieved through the DOM::DocumentI and DOM::ElementI interfaces, which define a minimal subset of DOM accessors and mutators, along with their inputs and outputs. Concrete instances of these interface classes provide the bindings between the abstract methods and their counterparts in the desired DOM implementation. Currently, there are bindings for two popular packages, XML::Twig and XML::LibXML.
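The interface/binding split described above can be illustrated with a small, dependency-free Perl sketch. The package names My::ElementI and My::Element::Hash are hypothetical, the "binding" wraps a plain hash rather than a real DOM package, and this is not Bio::Phylo's own DOM::ElementI code. It only shows how abstract methods that die can be overridden by a concrete class that maps them onto some underlying representation.

```perl
use strict;
use warnings;

# Hypothetical interface class, loosely modeled on the DOM::ElementI idea:
# each method is abstract and dies unless a concrete binding overrides it.
package My::ElementI;

sub new {
    my ( $class, %args ) = @_;
    my $self = bless {}, $class;
    $self->_init(%args);
    return $self;
}
sub _init       { die ref( $_[0] ) . " does not implement _init\n" }
sub get_tagname { die ref( $_[0] ) . " does not implement get_tagname\n" }
sub set_tagname { die ref( $_[0] ) . " does not implement set_tagname\n" }

# Hypothetical concrete binding: maps the abstract methods onto a plain
# Perl hash, standing in for a real DOM package such as XML::Twig.
package My::Element::Hash;
use parent -norequire, 'My::ElementI';

sub _init {
    my ( $self, %args ) = @_;
    $self->{node} = { tag => $args{-tag} };
    return;
}
sub get_tagname { return $_[0]{node}{tag} }
sub set_tagname {
    my ( $self, $tag ) = @_;
    $self->{node}{tag} = $tag;
    return $self;    # allow chaining, like other mutators
}

package main;

my $elt = My::Element::Hash->new( -tag => 'otu' );
$elt->set_tagname('otus');
print $elt->get_tagname, "\n";    # prints "otus"

# The bare interface refuses to do anything on its own.
eval { My::ElementI->new( -tag => 'otu' ) };
print "interface error: $@";    # prints "interface error: My::ElementI does not implement _init"
```

Adding support for another DOM package in this pattern means writing one more subclass; calling code that only uses the interface methods does not change.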
Another priority was simplicity of use; in practice, most of the details remain under the hood. The Bio/Phylo/NeXML/DOM.pm file defines the to_dom() method for each Bio::Phylo::NeXML::Writable package, as well as the Bio::Phylo::NeXML::DOM package proper. The DOM object is a factory that is used to create Element and Document objects; it is an inside-out object that subclasses Bio::Phylo. To curb the proliferation of method arguments, a DOM factory instance (set by the latest invocation of Bio::Phylo::NeXML::DOM->new()) is maintained in a package global. This instance is used by default for object creation with DOM methods if a DOM factory object is not explicitly provided in the argument list.
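A minimal sketch of the "latest factory in a package global" idea follows. It assumes hypothetical names (My::DOMFactory, make_element) and uses plain hashes in place of real DOM elements; Bio::Phylo's actual factory is an inside-out object and differs in detail. The default format of 'twig' mirrors the documented constructor default.

```perl
use strict;
use warnings;

# Hypothetical factory illustrating the "latest instance is kept in a
# package global" design; this is not Bio::Phylo's actual code.
package My::DOMFactory;

our $DOM;    # the most recently constructed factory

sub new {
    my ( $class, %args ) = @_;
    my $self = bless { format => $args{-format} || 'twig' }, $class;
    $DOM = $self;    # each new() call replaces the package-wide default
    return $self;
}
sub get_dom    { return $DOM }
sub get_format { return $_[0]{format} }

# Stand-in for a real element: just records which factory built it.
sub create_element {
    my ( $self, %args ) = @_;
    return { format => $self->get_format, tag => $args{-tag} };
}

package main;

# Hypothetical helper: use an explicit -dom factory if given, else the global.
sub make_element {
    my (%args) = @_;
    my $dom = delete $args{-dom} || My::DOMFactory->get_dom
      or die "No DOM factory has been created yet\n";
    return $dom->create_element(%args);
}

my $libxml = My::DOMFactory->new( -format => 'libxml' );
my $twig   = My::DOMFactory->new;    # no -format: 'twig', and now the default

my $implicit = make_element( -tag => 'tree' );
my $explicit = make_element( -tag => 'tree', -dom => $libxml );
print $implicit->{format}, "\n";    # prints "twig"
print $explicit->{format}, "\n";    # prints "libxml"
```

Because every new() call overwrites the global, code that works with two formats at once should pass the factory explicitly rather than rely on the default.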
The underlying DOM implementation is set with the DOM factory constructor's single argument, -format. Even this can be left out; the default implementation is XML::Twig, which is already required by Bio::Phylo. Thus, for example, one can use the DOM to convert a Nexus file to a DOM representation as follows:
    use Bio::Phylo::NeXML::DOM;
    use Bio::Phylo::IO qw( parse );
    Bio::Phylo::NeXML::DOM->new();
    my $project = parse( -file => 'my.nex', -format => 'nexus' );
    my $nex_twig = $project->doc();
    # The end.
Underlying DOM packages are loaded at runtime as specified by the -format argument. Packages for unused formats do not need to be installed.
The minimal DOM interface specifies the following methods. Details can be obtained from the DOM::ElementI and DOM::DocumentI documentation.

Element:

    get_tagname() set_tagname()
    get_attributes() set_attributes() clear_attributes()
    get_text() set_text() clear_text()
    get_parent() get_children() get_first_child() get_last_child()
    get_next_sibling() get_prev_sibling() get_elements_by_tagname()
    set_child() prune_child()
    to_xml_string()

Document:

    get_encoding() set_encoding()
    get_root() set_root()
    get_element_by_id() get_elements_by_tagname()
    to_xml_string() to_xml_file()
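To make the semantics of a few of these methods concrete, here is a toy, dependency-free implementation of get_tagname, get_parent, get_children, set_attributes, set_child and to_xml_string for elements, plus set_root, get_root, get_element_by_id and to_xml_string for documents. The Toy::Element and Toy::Document packages are hypothetical stand-ins, not the XML::Twig or XML::LibXML bindings; among other simplifications, attribute values are not XML-escaped.

```perl
use strict;
use warnings;

# Hypothetical, dependency-free element supporting a few interface methods;
# real bindings delegate to XML::Twig or XML::LibXML instead.
package Toy::Element;

sub new {
    my ( $class, %args ) = @_;
    return bless {
        tag      => $args{-tag},
        attr     => { %{ $args{-attr} || {} } },    # copy caller's hash
        children => [],
        parent   => undef,
    }, $class;
}
sub get_tagname  { return $_[0]{tag} }
sub get_parent   { return $_[0]{parent} }
sub get_children { return [ @{ $_[0]{children} } ] }

sub set_attributes {
    my ( $self, %attr ) = @_;
    @{ $self->{attr} }{ keys %attr } = values %attr;
    return $self;
}
sub set_child {
    my ( $self, $child ) = @_;
    $child->{parent} = $self;    # strong cycle: acceptable for a short sketch
    push @{ $self->{children} }, $child;
    return $self;
}
sub to_xml_string {
    my ($self) = @_;
    my $attrs = '';
    for my $name ( sort keys %{ $self->{attr} } ) {
        $attrs .= sprintf ' %s="%s"', $name, $self->{attr}{$name};
    }
    my $inner = join '', map { $_->to_xml_string } @{ $self->{children} };
    return length $inner
      ? "<$self->{tag}$attrs>$inner</$self->{tag}>"
      : "<$self->{tag}$attrs/>";
}

package Toy::Document;

sub new {
    my ($class) = @_;
    return bless { root => undef }, $class;
}
sub get_root { return $_[0]{root} }
sub set_root {
    my ( $self, $root ) = @_;
    $self->{root} = $root;
    return $self;    # allow chaining
}

# Depth-first search for the element whose 'id' attribute matches.
sub get_element_by_id {
    my ( $self, $id ) = @_;
    my @stack = grep { defined } $self->{root};
    while ( my $elt = shift @stack ) {
        return $elt if defined $elt->{attr}{id} && $elt->{attr}{id} eq $id;
        unshift @stack, @{ $elt->{children} };
    }
    return;
}
sub to_xml_string {
    my ($self) = @_;
    return qq{<?xml version="1.0"?>\n} . $self->{root}->to_xml_string . "\n";
}

package main;

my $otus = Toy::Element->new( -tag => 'otus', -attr => { id => 'otus1' } );
my $otu  = Toy::Element->new( -tag => 'otu',  -attr => { id => 'otu1' } );
$otu->set_attributes( label => 'A' );
$otus->set_child($otu);

my $doc   = Toy::Document->new->set_root($otus);
my $found = $doc->get_element_by_id('otu1');
print $found->get_parent->get_tagname, "\n";    # prints "otus"
print $doc->to_xml_string;
```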
    Type    : Constructor
    Title   : new
    Usage   : $dom = Bio::Phylo::NeXML::DOM->new(-format=>$format)
    Function: Create a new DOM factory
    Returns : DOM object
    Args    : optional: -format => DOM format (defaults to 'twig')

    Type    : Factory method
    Title   : create_element
    Usage   : $elt = $dom->create_element()
    Function: Create a new XML DOM element
    Returns : DOM element
    Args    : Optional: -tag => $tag_name
              -attr => \%attr_hash

    Type    : Factory method
    Title   : parse_element
    Usage   : $elt = $dom->parse_element($text)
    Function: Create a new XML DOM element from XML text
    Returns : DOM element
    Args    : An XML String

    Type    : Creator
    Title   : create_document
    Usage   : $doc = $dom->create_document()
    Function: Create a new XML DOM document
    Returns : DOM document
    Args    : Package-specific args

    Type    : Factory method
    Title   : parse_document
    Usage   : $doc = $dom->parse_document($text)
    Function: Create a new XML DOM document from XML text
    Returns : DOM document
    Args    : An XML String

    Type    : Mutator
    Title   : set_format
    Usage   : $dom->set_format($format)
    Function: Set the format (underlying DOM package bindings) for this object
    Returns : format designator as string
    Args    : format designator as string

    Type    : Accessor
    Title   : get_format
    Usage   : $dom->get_format()
    Function: Get the format designator for this object
    Returns : format designator as string
    Args    : none

    Type    : Static accessor
    Title   : get_dom
    Usage   : __PACKAGE__->get_dom()
    Function: Get the singleton DOM object
    Returns : instance of this __PACKAGE__
    Args    : none
There is a mailing list at https://groups.google.com/forum/#!forum/bio-phylo for any user or developer questions and discussions.
If you use Bio::Phylo in published research, please cite it:
Rutger A Vos, Jason Caravas, Klaas Hartmann, Mark A Jensen and Chase Miller, 2011. Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12:63. http://dx.doi.org/10.1186/1471-2105-12-63
Mark A. Jensen (maj -at- fortinbras -dot- us), refactored by Rutger Vos
The Bio::Phylo::Annotation class is not yet DOMized.