XML::Simple::DTDReader - Simple XML file reading based on their DTDs
    use XML::Simple::DTDReader;

    my $ref = XMLin("data.xml");
Or the object-oriented way:
    require XML::Simple::DTDReader;

    my $xsd = XML::Simple::DTDReader->new;
    my $ref = $xsd->XMLin("data.xml");
XML::Simple::DTDReader aims to be an XML::Simple drop-in replacement, but with several aspects of the module controlled by the XML's DTD. Specifically, array folding and array forcing are inferred from the DTD.
Currently, only XMLin is supported; support for XMLout is planned for later releases.
XMLin
Parses XML-formatted data and returns a reference to a data structure which contains the same information in a more readily accessible form. (Skip down to "EXAMPLES" for sample code.) The XML must have a valid <!DOCTYPE> declaration.
XMLin() accepts an optional XML specifier, which can be one of the following:
A filename: if the filename contains no directory components, XMLin() will look for the file in the current directory. The filename '-' can be used to parse from STDIN. For example:
$ref = XMLin('/etc/params.xml');
If there is no XML specifier, XMLin() will check the script directory for a file with the same name as the script but with the extension '.xml'. For example:
$ref = XMLin();
A string containing XML (recognized by the presence of '<' and '>' characters) will be parsed directly. For example:
$ref = XMLin('<opt username="bob" password="flurp" />');
An IO::Handle object will be read to EOF and its contents parsed. For example:
    use IO::File;

    $fh = IO::File->new('/etc/params.xml');
    $ref = XMLin($fh);
Currently, none of XML::Simple's many options are supported. Support for ContentKey, ForceContent, KeepRoot, SearchPath, and ValueAttr is planned for future releases.
XML::Simple::DTDReader is able to deal with inline and external DTDs. Inline DTDs take the form:
    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE greeting [
      <!ELEMENT greeting (#PCDATA)>
    ]>
    <greeting>Hello, world!</greeting>
External DTDs are either system DTDs or public DTDs. System DTDs are of the form:
    <?xml version="1.0"?>
    <!DOCTYPE greeting SYSTEM "hello.dtd">
    <greeting>Hello, world!</greeting>
The path in the external system identifier hello.dtd is relative to the directory containing the XML file, or to the current working directory if the XML does not come from a file or the path to the file cannot be determined.
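The resolution rule described above can be sketched in a few lines of Perl. This is only an illustration of the rule as stated, not the module's actual code: resolve_system_dtd is a hypothetical helper, the paths are made up, and system identifiers that are URLs are not handled.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper (not part of XML::Simple::DTDReader): resolve a
# system identifier against the XML file's directory, falling back to
# the current working directory when no XML path is known.
sub resolve_system_dtd {
    my ($system_id, $xml_path) = @_;

    # An absolute identifier needs no base directory.
    return $system_id if File::Spec->file_name_is_absolute($system_id);

    my $base = defined $xml_path ? dirname($xml_path) : File::Spec->curdir;
    return File::Spec->catfile($base, $system_id);
}

# XML read from a file: the DTD is looked up next to that file.
# On a Unix-like system this prints /srv/docs/hello.dtd
print resolve_system_dtd("hello.dtd", "/srv/docs/greeting.xml"), "\n";

# XML from a string or handle: relative to the current directory.
print resolve_system_dtd("hello.dtd"), "\n";
```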
Public DTDs take the form:
    <?xml version="1.0"?>
    <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN"
        "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
    <svg>
      <path d="M202,702l1,-3l7,-3l3,1l3,7l-1,3l-7,4l-3,-1l-3,-8z" />
    </svg>
Two properties of the DTD are used by XML::Simple::DTDReader when determining the final structure of the data: repeated elements and ID attributes. In the DTD, specifications of the form element+ or element* will lead to the key element mapping to an anonymous array. This is perhaps best illustrated with an example:
    <?xml version="1.0" encoding="iso-8859-1"?>
    <!DOCTYPE data [
      <!ELEMENT data (stuff+)>
      <!ELEMENT stuff (name,other*)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT other (#PCDATA)>
    ]>
    <data>
      <stuff>
        <name>Moose</name>
        <other>Value</other>
      </stuff>
      <stuff>
        <name>Thingy</name>
        <other>Value</other>
        <other>Value2</other>
      </stuff>
    </data>
...will map to the data structure:
    {
        stuff => [
            {
                name  => "Moose",
                other => ["Value"],
            },
            {
                name  => "Thingy",
                other => ["Value", "Value2"],
            }
        ]
    }
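Because the DTD marks stuff and other as repeatable, code that consumes this structure can dereference them as arrays without first checking ref(). The following is a minimal sketch that walks a literal copy of the structure shown above; in real use $data would be the return value of XMLin(). The text does not say whether an other* element with zero occurrences yields an empty array or no key at all, so the sketch defaults to an empty list as an assumption.

```perl
use strict;
use warnings;

# Literal copy of the structure from the example; normally this would
# come from XMLin().
my $data = {
    stuff => [
        { name => "Moose",  other => ["Value"] },
        { name => "Thingy", other => ["Value", "Value2"] },
    ],
};

my @lines;
for my $stuff (@{ $data->{stuff} }) {
    # Assumption: fall back to an empty list if 'other' is missing.
    my @others = @{ $stuff->{other} // [] };
    push @lines, "$stuff->{name}: " . join(", ", @others);
}

print "$_\n" for @lines;
# Moose: Value
# Thingy: Value, Value2
```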
The other aspect of the DTD that affects the data structure is ID attributes. In XML, ID attributes are unique across a file, which is a more general case of Perl's restriction that keys be unique in a hash. Hence, the presence of an attribute of type ID will cause that layer of the data to be folded into a hash, with the value of the ID attribute as the key. This is, again, best illustrated by example:
    <?xml version="1.0" encoding="iso-8859-1"?>
    <!DOCTYPE data [
      <!ELEMENT data (stuff+)>
      <!ELEMENT stuff (name)>
      <!ATTLIST stuff attrib ID #REQUIRED>
      <!ELEMENT name (#PCDATA)>
    ]>
    <data>
      <stuff attrib="first">
        <name>Moose</name>
      </stuff>
      <stuff attrib="second">
        <name>Thingy</name>
      </stuff>
    </data>
...will lead to the data structure:
    {
        stuff => {
            first => {
                name   => "Moose",
                attrib => "first"
            },
            second => {
                name   => "Thingy",
                attrib => "second"
            }
        }
    }
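Folding on the ID attribute makes direct lookup by ID possible, at the cost of losing document order, since Perl hashes are unordered. A short sketch, again using a literal copy of the structure above rather than a real XMLin() call:

```perl
use strict;
use warnings;

# Literal copy of the folded structure; normally returned by XMLin().
my $data = {
    stuff => {
        first  => { name => "Moose",  attrib => "first"  },
        second => { name => "Thingy", attrib => "second" },
    },
};

# Direct lookup by the value of the ID attribute.
my $name = $data->{stuff}{second}{name};
print "second => $name\n";

# Hash keys carry no document order, so sort them for stable output.
my @ids = sort keys %{ $data->{stuff} };
for my $id (@ids) {
    printf "%s => %s\n", $id, $data->{stuff}{$id}{name};
}
```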
XML::Simple::DTDReader recognizes most ELEMENT types, with the exception of mixed data (#PCDATA intermixed with elements) or ANY data. Attempts to parse DTDs describing elements with these types will result in an error.
XML::Simple::DTDReader is stricter than XML::Simple in its parsing of documents; not only must the documents be compliant, they must also follow the DTD specified. XML::Simple::DTDReader will die with an appropriate message if it encounters a parsing or validation error.
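Since errors are reported by dying, callers that want to continue after a bad document need to trap the exception with eval. The sketch below shows one way to do that. try_parse is a hypothetical wrapper, and $fake_parser is a stand-in with an invented error message so that the example runs without the module installed; with the real module the callback would be something like sub { XML::Simple::DTDReader->new->XMLin($_[0]) }.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run a parser callback and turn a die into an
# (undef, message) return instead of ending the program.
sub try_parse {
    my ($parse, $source) = @_;
    my $ref = eval { $parse->($source) };
    return (undef, $@ || "unknown error") unless defined $ref;
    return ($ref, undef);
}

# Stand-in for the real parser; the message text is invented.
my $fake_parser = sub {
    my ($src) = @_;
    die "validation error: unexpected element\n"
        unless $src =~ /<greeting>/;
    return { parsed => 1 };
};

my ($ok,  $err1) = try_parse($fake_parser, "<greeting>Hello</greeting>");
my ($bad, $err2) = try_parse($fake_parser, "<other/>");

print "first document parsed\n" if defined $ok;
print "second document failed: $err2" if !defined $bad;
```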
See the t/ directory of the distribution for a number of example XML files and the Perl data structures they map to.
None currently known, but I'm sure there are several.
Alex Vandiver : alexmv@mit.edu
Copyright (C) 2003 Alex Vandiver. All rights reserved. This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install XML::Simple::DTDReader, copy and paste the appropriate command into your terminal.
cpanm
cpanm XML::Simple::DTDReader
CPAN shell
    perl -MCPAN -e shell
    install XML::Simple::DTDReader