Test::Formats::XML - Test::Formats specialization that tests XML content
    use Test::Formats::XML;

    # One schema file of each kind (the first match of each glob)
    our $schema  = (<schema/*.xsd>);
    our $relaxng = (<relaxng/*.rng>);
    our $sgmldtd = (<dtd/*.dtd>);

    # The documents to validate against each schema
    our @schema_tests  = <schema/*.xml>;
    our @relaxng_tests = <relaxng/*.xml>;
    our @sgmldtd_tests = <dtd/*.xml>;

    # One well-formedness test plus one test per document
    plan tests => (1 + @schema_tests + @relaxng_tests + @sgmldtd_tests);

    is_well_formed_xml($schema, "Test that the XML Schema parses");

    is_valid_against_xmlschema($schema, $_) for (@schema_tests);
    is_valid_against_relaxng($relaxng, $_) for (@relaxng_tests);
    is_valid_against_sgmldtd($sgmldtd, $_) for (@sgmldtd_tests);
Test::Formats::XML is a specialization module for Test::Formats that provides test functions for evaluating XML content against XML Schema, RelaxNG schemas and Document Type Definitions (DTDs).
This module is built on the framework provided by Test::Builder (see Test::Builder and Test::More), and works under the TAP-based Test::Harness system. It can be used directly as the only testing module a given suite uses, or it can be used in conjunction with other harness-friendly modules.
The module uses the XML::LibXML module from CPAN, and provides the user with simple-to-use wrappers around the various forms of validation provided by XML::LibXML::Schema, XML::LibXML::RelaxNG and XML::LibXML::Dtd.
This documentation only covers the functions specific to this module. However, all functionality provided by Test::Builder/Test::More is accessible here as well. See those modules for more information.
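All of the functions described in the next section take the same sequence of parameters, with the same meaning. These are, in call order, the schema, the document and the test name. They are described below, starting with the document.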
The document argument is the document being tested against the schema provided in the first argument. There are several ways to pass it:
If the user has pre-parsed the document, the resulting XML::LibXML::Document object can be passed in as the parameter. This can be useful if the test suite wishes to distinguish document well-formedness (the document is parseable without errors) from document validity (the parsed document conforms to a given schema).
If the parameter passed in appears to be an open filehandle, it is passed to the parse_fh() method of XML::LibXML in order to obtain a document object.
If the parameter is a scalar reference, it is assumed to be a reference to the document in memory. The de-referenced scalar is passed to the parse_string method of an XML::LibXML object to produce a document object.
Lastly, if the value is a (non-reference) scalar, regular expressions are used to see whether the content looks like an XML document. The check first looks for a DOCTYPE declaration or an XML document declaration (the initial <?xml ...?> line that most XML documents have). If neither of these is found, at least one XML tag must be found. If not even this is found, the string is presumed to be a filename and is passed to the parse_file method of XML::LibXML. If the string does look like XML content, it is passed to the parse_string method of that class.
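The four cases above can be sketched as a small classifier. This is only an illustration of the decision order described here: the function name classify_document_arg is hypothetical, and the regular expressions are guesses rather than the ones the module actually uses. It relies only on core modules.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed openhandle);

# Hypothetical helper: decide how a document argument would be handled.
# The patterns below are illustrative assumptions, not the module's own.
sub classify_document_arg {
    my ($arg) = @_;

    # 1. A pre-parsed document object is used directly.
    return 'object'
        if blessed($arg) && $arg->isa('XML::LibXML::Document');

    # 2. An open filehandle would go to parse_fh().
    return 'filehandle' if defined openhandle($arg);

    # 3. A scalar reference holds the document text in memory.
    return 'string-ref' if ref($arg) eq 'SCALAR';

    # 4. A plain string: look for a DOCTYPE or XML declaration first,
    #    then for at least one tag; otherwise treat it as a filename.
    return 'xml-text' if $arg =~ /<!DOCTYPE\b/ || $arg =~ /<\?xml\b/;
    return 'xml-text' if $arg =~ /<[A-Za-z_][^<>]*>/;
    return 'filename';
}

for my $sample ('<?xml version="1.0"?><a/>', '<a><b/></a>', 'docs/sample.xml') {
    print "$sample => ", classify_document_arg($sample), "\n";
}
```

A string that merely contains something tag-like would be classified as XML text under this scheme, which is one reason a scalar reference is the less ambiguous way to pass in-memory content.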
Any of the forms that have to read a file or parse a document themselves are wrapped in eval blocks to catch any fatal errors. If such an error occurs, the test reports a failure and the error is given as diagnostic information for the test.
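The eval-wrapping pattern can be sketched with core modules only. The helper name trap_fatal and the stand-in parser below are hypothetical; a real test function would call an XML::LibXML parsing method inside the eval instead.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical helper: run a parsing step and trap any fatal error.
# Returns (result, undef) on success or (undef, error text) on failure.
sub trap_fatal {
    my ($code) = @_;
    my $result = eval { $code->() };
    return (undef, $@ || 'unknown error') unless defined $result;
    return ($result, undef);
}

# Stand-in "parser" (an assumption for this sketch): dies when the
# input contains nothing that looks like a tag.
my $fake_parse = sub {
    my ($text) = @_;
    die "no root element found\n" unless $text =~ /<\w/;
    return { source => $text };
};

my ($doc, $err) = trap_fatal(sub { $fake_parse->('<a/>') });
ok(defined $doc, 'parseable input yields a document');

($doc, $err) = trap_fatal(sub { $fake_parse->('plain text') });
is($err, "no root element found\n", 'fatal error is captured as text');
```

In a test function built this way, a captured error would typically lead to a failing ok() followed by a diag() call carrying the error text, so the harness keeps running instead of aborting.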
For all of the test routines, the first argument represents the schema being used to validate the document (the second argument). The type of schema matters to the function being called: if you pass a DTD to the RelaxNG test, it will not automatically re-route you to the DTD test. The value of this argument may be any of the following:
The easiest form to deal with is a schema that the user has already compiled with the appropriate XML::LibXML::* class, passing in the resulting object. The object is then used directly. This also saves some processing time and overhead when you intend to use the same schema for a large number of tests.
If the argument is a filehandle, the contents are read and the resulting document parsed. None of the schema-related classes can (currently) take a filehandle directly, so this is offered to the user as a matter of convenience. If you are re-using the same file across multiple tests, you can use the seek command to move the filehandle back to the start of the file and re-use the existing filehandle as well.
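Rewinding a handle with seek can be shown with an in-memory file, which avoids depending on any file on disk. The schema text here is a placeholder and the slurp helper is hypothetical; the point is only that a handle read to its end yields the same content again once it has been rewound.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Placeholder schema text, opened as an in-memory file for the demo.
my $schema_text = "<grammar/>\n";
open my $fh, '<', \$schema_text
    or die "cannot open in-memory file: $!";

# Hypothetical helper: read everything from the current position.
sub slurp {
    my ($handle) = @_;
    local $/;                  # undefined separator reads to end of file
    return scalar <$handle>;
}

my $first = slurp($fh);        # leaves the handle at end of file

# Move back to the start so the same handle can be read again.
seek($fh, 0, SEEK_SET) or die "seek failed: $!";
my $second = slurp($fh);

print "contents match after rewinding\n" if $first eq $second;
```

The same seek call would be placed between two test calls that share one schema filehandle.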
If the argument is a scalar reference, it is presumed to contain the text of the schema and is passed to the parser as such.
If the argument is a (non-reference) scalar, it is treated as a string. It is first tested with some regular expressions to see if the content looks like a schema of the given type. If it does not look like the text of a schema, it is passed to the constructor method of the relevant schema-class as a location of the schema. The particular XML::LibXML::* class will try to read it and parse it into an object.
Any of the forms that have to read and/or parse the schema text are wrapped in eval blocks. If they fail for any reason, the test reports a failure and the text of the error is output as diagnostic information.
The checks used to match plain text data to one of the specific schema types are somewhat limited and are not guaranteed to work in every case. Generally, it is best to use the plain string parameter only for filenames. If you have the schema in string form, consider passing it as a scalar reference.
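As an illustration of that advice, the sketch below passes both the schema and the document as scalar references, so neither has to be guessed at. It assumes Test::Formats::XML and XML::LibXML are installed; the RelaxNG grammar and the document are made-up sample data, and the function and argument order are those shown in the synopsis and parameter descriptions.

```perl
use strict;
use warnings;
use Test::Formats::XML;

# Sample RelaxNG schema held in memory (illustrative data only):
# a single "greeting" element containing text.
my $rng = <<'RNG';
<element name="greeting" xmlns="http://relaxng.org/ns/structure/1.0">
  <text/>
</element>
RNG

# Sample document, also held in memory.
my $doc = '<greeting>hello</greeting>';

plan tests => 1;

# Scalar references mark both arguments as in-memory text, so no
# content sniffing is needed to tell text apart from a filename.
is_valid_against_relaxng(\$rng, \$doc, 'greeting validates against RelaxNG');
```

Plain strings remain the natural choice when the arguments are filenames, as in the synopsis.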
The test name is the only optional parameter of the three. If passed, it should be a string identifying the test. It is displayed in the TAP output stream, just as the name parameter to more familiar test functions (ok(), like(), etc.) is.
If $name is not given, Test::Formats::XML will attempt to create a reasonable test-name based on the type of the
The following test functions are provided. Each has one or more aliases to allow the user to choose the syntactic sugar that best fits their preferred linguistic view of test names:
The first set (is_valid_against_relaxng and its aliases) tests a document against a RelaxNG schema. For more on the RelaxNG syntax, see http://relaxng.org/.
This set (is_valid_against_sgmldtd and its aliases) tests a document against a DTD. The names are slightly misleading, as both SGML and XML DTDs are supported by XML::LibXML::Dtd. There are some minor syntactical differences between SGML DTDs and XML DTDs, but you can use whichever is best for your needs.
This pair (is_well_formed_xml and its alias) tests that an XML document is well-formed, which is to say that it parses without errors. This is not the same as validation. A passing test here says nothing about the validity of the XML content itself, only that all tags are properly closed, etc. Note that these functions do not take a schema argument, only the XML document and (optionally) the test name.
These tests are a convenience, as the same basic functionality can be found in other test-related modules on CPAN. However, as long as XML::LibXML is already being used, there is no harm in making things easier for the user by providing them here and cutting down on the list of dependencies.
All of the tests capture any fatal errors thrown by the underlying XML::LibXML classes used, and report them as diagnostic data to accompany a failed test report. See the diag method of Test::Builder for more information.
Please report any bugs or feature requests to bug-test-formats at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Test-Formats. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
The original idea for this stemmed from a blog post on http://use.perl.org by Curtis "Ovid" Poe. He proffered some sample code, based on recent work he'd done, that validated against a RelaxNG schema. I generalized it for all the validation types that XML::LibXML offers, and expanded the idea to cover more general cases of structured, formatted text.
Copyright (c) 2008 Randy J. Ray, all rights reserved.
This module and the code within are released under the terms of the Artistic License 2.0 (http://www.opensource.org/licenses/artistic-license-2.0.php). This code may be redistributed under either the Artistic License or the GNU Lesser General Public License (LGPL) version 2.1 (http://www.opensource.org/licenses/lgpl-license.php).
Randy J. Ray, <rjray at blackperl.com>