DATR2XML.pm - manipulate DATR .dtr, XML, HTML
    #! perl -w
    use DATR2XML;

    undef $DATR2XML::includeNodePath;
    $datr -> set_stylesheet('D:/DATR/XSLT/datr.xsl');

    $datr_eg1 = new DATR2XML('D:\DATR\perl\eg.dtr');
    $datr_eg2 = new DATR2XML('D:/DATR/perl/eg.dtr', "on");
    $datr_eg3 = new DATR2XML('http://somewhere/doc.dtr', "verbose");

    viewAll $datr_eg1;
    $datr_eg2 -> viewHeader;
    $datr_eg3 -> printHeader;

    printOpening $datr_eg3;
    printNodes $datr_eg3;
    printClosing $datr_eg3;
    printAll $datr_eg3;

    save $datr_eg3;

    DATR2XML::convert('D:\DATR\XSLT\eg_opening.dtr');
This module parses a DATR .dtr-formatted file into a Perl data structure. The .dtr format is as defined in Gerald Gazdar's 'DATR By Example', published on the DATR web pages at the University of Sussex <http://www.sussex.ac.uk/>.
Particular respect was paid to datanode31.html, though I confess the formal definitions found elsewhere on the site made no sense to me.
Process logging may be set to "off", "on" or "true", and "verbose".
If internet access is required, the following modules must be installed and on the @INC path:
    LWP::UserAgent
    HTTP::Request
If no internet access is required, these modules will not be called.
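The on-demand loading described above can be sketched as follows. This is a minimal illustration and not the module's own code: the helper names looks_like_uri and load_network_modules are hypothetical, and the URI test is an assumed simple scheme check.

```perl
use strict;
use warnings;

# Hypothetical helper: treat anything starting with http://, https:// or
# ftp:// as a URI. The module's real test may differ.
sub looks_like_uri {
    my ($location) = @_;
    return $location =~ m{^(?:https?|ftp)://}i ? 1 : 0;
}

# Hypothetical helper: load the network modules at run time, only when needed.
# Returns 1 on success, 0 if either module cannot be found on @INC.
sub load_network_modules {
    my $ok = eval {
        require LWP::UserAgent;
        require HTTP::Request;
        1;
    };
    return $ok ? 1 : 0;
}

for my $location ('D:/DATR/perl/eg.dtr', 'http://somewhere/doc.dtr') {
    if (looks_like_uri($location)) {
        my $status = load_network_modules() ? 'available' : 'not installed';
        print "$location: network modules $status\n";
    }
    else {
        print "$location: local file, network modules not loaded\n";
    }
}
```

Using require inside eval, rather than use at the top of the file, is what allows a script that only reads local files to run on a system without the LWP modules.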
The module issues the usual warnings if it cannot read or write a file.
The module exports nothing to the calling namespace.
The module does not fully support The DATR Standard Library RFC, Version 2.20. Specifically, it does not support the use of the proposed path cut operator as a full-stop within a path: all full stops are taken to signify the end of a clause.
* Support The DATR Standard Library RFC, Version 2.20.
* Change mechanism of _parseOpeningClosing to allow line-spanning of contents.
* Support interpolation of directives within body as specified by the style sheet.
* Fully support comment printing as specified by DATR XML DTD. Currently lumps all comments together.
These variables can adjust the output of the DTR parser: when they are undefined (using undef $DATR2XML::var) they prevent the DTR parser from outputting any element which has a default value, as defined in the DATR DTD; when they are defined with any value, they force XML output in full.
Set with any value to print comments, undef not to.
The DTD provides the default path as a null path, but this can be adjusted by setting $includeSentenceType to 1. This can be reset by calling undef upon the variable. See also include_sentence_type.
The DATR DTD provides the default type as ==, and this can be left if this variable is set, which is its default state. See also include_sentence_type.
The path to the required XSLT stylesheet. The default is http://www.leegoddard.com/DATR/XSLT/datr.xsl. See also the method and procedure set_stylesheet.
The SYSTEM location of (that is, the path to) the DATR DTD. The default is http://www.leegoddard.com/DATR/DTD/DATR1.0.dtd. See also the method and procedure set_dtd.
This is literally the root element as printed, and may contain references, such as to an XML schema.
Eg: $datr_root = '<DATR xmlns="x-schema:http://www.leegoddard.com/DATR/DTD/DATR1.0.xml">';
The default is simply the opening of the DATR element. See also set_schema.
Creates a new DATR2XML object from file, URI or DATR .dtr source.
Accepts: DATR source as scalar, array, scalar/array pointer, or path to a DATR file. If the source is a scalar or a pointer to a scalar, it is assumed to be just a list of node definitions, that is, the contents of the BODY slot.
Optionally accepts a second argument to set logging: see the manual entry for the logging method for details.
Returns: reference to object.
Object Structure: a hash with the following fields:
    LOCATION - the name of the file, if any
    HEADER - the file header (as defined in datrnode44.html#fileheader)
    OPENING - opening declarations/directives as defined in datrnode45.html#openingdeclarations
    BODY - node definitions, itself an array of hashes of the format defined in _parseNodes
    CLOSING - closing declarations/directives as defined in datrnode47.html#closingdeclarations
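The object layout above can be pictured with a hand-built hash. This is only a sketch: every value is invented for illustration, the exact contents of each slot are assumptions (in particular that OPENING and CLOSING hold plain strings and that PATH keeps its angle brackets), and the real constructor also blesses the hash into the class.

```perl
use strict;
use warnings;

# Illustrative only: a plain hash carrying the five documented fields.
my $object = {
    LOCATION => 'D:/DATR/perl/eg.dtr',
    HEADER   => { File => 'eg.dtr', Purpose => 'example' },
    OPENING  => [ '#vars $abc: a b c.' ],
    BODY     => [
        {
            NODE    => 'Verb',
            PATH    => '<syn cat>',
            TYPE    => '==',
            VALUE   => 'verb',
            COMMENT => [],
        },
    ],
    CLOSING  => [ '#hide Verb.' ],
};

# Walk the BODY slot: one hash per node-definition sentence.
for my $sentence (@{ $object->{BODY} }) {
    printf "%s: %s %s %s.\n",
        $sentence->{NODE}, $sentence->{PATH},
        $sentence->{TYPE}, $sentence->{VALUE};
}
```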
Sets or resets the type attribute of EQUATION elements.
Calling with an argument value of 1 includes the type attribute (default); calling with 0 forces the type attribute to be omitted.
Call without a value to stop comment printing; call with a value to restart comment printing. Default is to print comments.
Sets the path to the required XSLT stylesheet. See also location_xsl in the section Global Variables.
Sets the location of the DTD as used in the DOCTYPE SYSTEM declaration. See also location_dtd in the section Global Variables.
Sets the location of the XML Schema as used in the root element. If called with no argument value, removes all references to an XML Schema, setting $datr_root to the opening of the DATR root tag without attributes.
Calling with a value of 1 sets the Schema to the author's, located at http://www.leegoddard.com/DATR/DTD/DATR1.0.xml. See also datr_root in the section Global Variables.
Turns logging off or on, verbose or minimal.
Accepts: "true|on|minimal" or "verbose" or "off|none|silent"
Returns: none
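The accepted argument strings fall into three groups, which a caller-side sketch might normalise as follows. The helper normalise_log_level is hypothetical, and the handling of a missing or unrecognised argument (both treated as "off") is an assumption rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: map the documented argument strings onto three levels.
sub normalise_log_level {
    my ($arg) = @_;
    return 'off' unless defined $arg;    # assumption: no argument means off
    my $value = lc $arg;
    return 'verbose' if $value eq 'verbose';
    return 'minimal' if $value =~ /^(?:true|on|minimal)$/;
    return 'off';    # off|none|silent, and (by assumption) anything unknown
}

for my $arg (qw(on true verbose silent)) {
    my $level = normalise_log_level($arg);
    print "$arg -> $level\n";
}
```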
Provides a rough printout of all records
Accepts: object ref; Returns: none
Provides a rough printout of all nodes
Provides a rough view of the opening directives/definitions
Provides a rough view of the closing directives/definitions
Saves to local filesystem an XML printout of all records
Accepts: object ref; optional file path to save at or, for internal use, typeglob for Perl filehandle.
Returns: none
Notes: simply calls printAll, passing filehandle if necessary.
Convert one or more DATR files to XML.
Accepts: either a filepath with an extension, optionally with an additional destination filepath or directory, or, for batch operation, a directory location.
Returns: nothing; will die on errors
Notes: Does not accept URLs and does not process sub-directories. Minimizes logging during operation.
Provides an XML printout of all records
Accepts: object ref; optional file path to save at or, for internal use, typeglob for Perl filehandle
Returns: none
Provides a rough printout of all nodes
Accepts: object ref; optional file path or, for internal use, typeglob for Perl filehandle
Returns: none
Provides an XML printout of the opening/closing directives/definitions block element. Without passing a filepath or typeglob for filehandle, outputs to STDOUT. Just a wrapper for _printOpeningClosing.
Accepts: object ref; optionally a file path or, for internal use, typeglob for Perl filehandle
Returns: none
Provides an XML printout of all nodes. Basically writes the EQUATION element and calls _parsePath on each value of the object's {BODY} key.
Accepts: object ref
Returns: none
All private method subroutine names are prefixed with an underscore.
Load a dtr file from the local file system.
Accepts: object reference
Returns: an array of file contents
Load a dtr document from a URI
Parses a .dtr-format file header into the class record
Accepts: object ref
Returns: none
Struct: This method fills the hash held in $self->{HEADER} with whatever fields the .dtr file header contains that match a name/value pair delimited with a colon.
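The colon-delimited matching described above might look roughly like this. It is a sketch under stated assumptions, not the module's own routine: the helper parse_header_lines is hypothetical, header lines are assumed to be DATR comments introduced by '%', and the sample field names and values are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: collect "Name: value" pairs from header comment lines.
sub parse_header_lines {
    my @lines = @_;
    my %header;
    for my $line (@lines) {
        # Assumption: header lines are comments starting with '%'.
        # The name is everything up to the first colon; lines without a
        # colon do not match and are skipped.
        next unless $line =~ /^\s*%\s*([^:]+?)\s*:\s*(.*?)\s*$/;
        $header{$1} = $2;
    }
    return \%header;
}

my $header = parse_header_lines(
    '% File:     eg.dtr',
    '% Purpose:  illustrate header parsing',
    '% Author:   A. N. Other',
    '% a comment line without a colon',
);

for my $name (sort keys %$header) {
    print "$name = $header->{$name}\n";
}
```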
Extracts opening directives, those occurring before node definitions, and places them into the self-object's OPENING array.
Accepts: object ref, ref to DATR data
Returns: none
Extracts closing directives, those occurring after node definitions
Accepts: object ref; reference to array of DATR data
Returns: none
Notes: reverses @_ then applies same proc as _parseOpening, then reverses output
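The reverse, reuse, reverse idea in the note above can be sketched as follows. The helpers leading_directives and trailing_directives are hypothetical stand-ins for the private methods, and the rule used to recognise a directive (a line starting with '#') is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: collect directive lines from the front of a list,
# stopping at the first line that is neither blank nor a directive.
sub leading_directives {
    my @lines = @_;
    my @found;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;        # skip blank lines
        last unless $line =~ /^\s*#/;    # assumption: directives start with '#'
        push @found, $line;
    }
    return @found;
}

# Closing directives: reverse the input, reuse the same routine,
# then reverse the result to restore source order.
sub trailing_directives {
    my @lines   = @_;
    my @closing = reverse leading_directives(reverse @lines);
    return @closing;
}

my @source = (
    '#vars $abc: a b c.',
    'Verb: <syn cat> == verb.',
    '#hide Verb.',
    '#show <syn cat>.',
);

my @opening = leading_directives(@source);
my @closing = trailing_directives(@source);
print "opening: $_\n" for @opening;
print "closing: $_\n" for @closing;
```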
Parse a list of nodes to the class BODY record.
Accepts: an object ref and a reference to an array of DATR data
Returns: none
Struct: This method creates the array of hashes held in $self->{BODY} with the following fields:
    NODE - the name of the current node
    PATH - the (left-hand) path
    TYPE - the sentence-type signifier: = or ==
    VALUE - the (right-hand) value
    COMMENT - an array of comments, index reflecting source line number
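As a rough illustration of how one sentence maps onto these fields, the sketch below splits a single-equation sentence with a regular expression. It is not the module's parser: the helper parse_sentence is hypothetical, it handles only one path/value pair per node, it assumes node names start with a capital letter, and it assumes PATH keeps its angle brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: split "Node: <path> == value." into the documented
# fields. Returns a hash ref, or nothing if the text does not match.
sub parse_sentence {
    my ($sentence) = @_;
    return unless $sentence =~ /^\s*([A-Z]\w*)\s*:\s*(<[^>]*>)\s*(==|=)\s*(.*?)\s*\.\s*$/;
    return {
        NODE    => $1,
        PATH    => $2,
        TYPE    => $3,
        VALUE   => $4,
        COMMENT => [],
    };
}

for my $text ('Verb: <syn cat> == verb.', 'Word1: <> = Verb.') {
    my $record = parse_sentence($text) or next;
    printf "NODE=%s PATH=%s TYPE=%s VALUE=%s\n",
        @{$record}{qw(NODE PATH TYPE VALUE)};
}
```

The alternation tries == before = so that the two-character signifier is never read as a single = followed by a stray character.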
Decodes path attributes into an XML structure.
Accepts: a string of DATR path (as in $$hash{VALUE}); optionally a second argument, being the name of a node to build-out the sentence (cf. geraldg@cogs.susx.ac.uk, 06/07/00)
Returns: a string of XML structure
Notes: a bit of a hack, really.
Formats nodes for processing by removing comments/directives/linefeeds
Accepts: strings or array of DATR node/path/value sentences
Returns: one string of DATR node/path/value sentences, without linebreaks
Sets up a filehandle for output, whether STDOUT or not
Accepts: string of a filepath, or a filehandle, or a (ref to a) typeglob, or undef
Returns: a reference to a typeglob that is the filehandle
See also: "Passing Filehandles" in perlfaq7 Perl documentation
Note: Would it be better not to default to STDOUT but to default to a filename specified at object construction time?
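One way to realise this kind of dispatch is sketched below. The helper output_handle is hypothetical and simplified: it returns globs and glob references unchanged, does not handle IO::Handle objects, and treats any other defined value as a path to open for writing.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return something usable as an output filehandle.
sub output_handle {
    my ($target) = @_;
    return \*STDOUT unless defined $target;       # default: STDOUT
    return $target if ref($target) eq 'GLOB';     # \*FH or a lexical handle
    return $target if ref(\$target) eq 'GLOB';    # bare typeglob such as *STDOUT
    open(my $fh, '>', $target)
        or die "Cannot open $target for writing: $!";
    return $fh;
}

# Write to a temporary file by path.
my (undef, $path) = tempfile(UNLINK => 1);
my $fh = output_handle($path);
print {$fh} "<DATR>\n";
close($fh) or die "Cannot close $path: $!";

# With no argument the output goes to STDOUT.
my $out = output_handle();
print {$out} "written to STDOUT by default\n";
```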
Prints as XML the contents of opening/closing, as requested.
Author: Lee Goddard code@leegoddard.com, leego@cogs.susx.ac.uk
Copyright: © Lee Goddard, 09/06/00 and as above. All Rights Reserved.
License: The GNU General Public License applies: copies available from www.gnu.org/. You are free to distribute and modify this module under the same terms as those of Perl itself.