
DATR2XML.pm - manipulate DATR .dtr, XML, HTML

#! perl -w
use DATR2XML;
undef $DATR2XML::includeNodePath;
$datr_eg1 = new DATR2XML('D:\DATR\perl\eg.dtr');
$datr_eg1 -> set_stylesheet('D:/DATR/XSLT/datr.xsl');
$datr_eg2 = new DATR2XML('D:/DATR/perl/eg.dtr', "on");
$datr_eg3 = new DATR2XML('http://somewhere/doc.dtr', "verbose");
viewAll $datr_eg1;
$datr_eg2 -> viewHeader;
$datr_eg3 -> printHeader;
printOpening $datr_eg3;
printNodes $datr_eg3;
printClosing $datr_eg3;
printAll $datr_eg3;
save $datr_eg3;
DATR2XML::convert('D:\DATR\XSLT\eg_opening.dtr');

This module parses a DATR .dtr-formatted file into a Perl data structure. The format is as defined in Gerald Gazdar's 'DATR By Example', published on the DATR web pages at the University of Sussex <http://www.sussex.ac.uk/>.
Particular respect was paid to datanode31.html, though I confess the formal definitions found elsewhere on the site made no sense to me.

Process logging may be set to "off", to "on" (or "true"), or to "verbose".

If internet access is required, the following modules must be installed and on the @INC path:
LWP::UserAgent
HTTP::Request
If no internet access is required, these modules will not be called.
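
The on-demand loading described above can be sketched roughly as follows. This is an illustration only, not the module's own code: the helper names looks_like_uri and load_network_modules, and the list of URL schemes, are assumptions made for the example.

```perl
use strict;
use warnings;

# True when the location looks like a URL rather than a local path.
# The scheme list is an assumption made for this sketch.
sub looks_like_uri {
    my ($location) = @_;
    return $location =~ m{^(?:https?|ftp)://}i ? 1 : 0;
}

# Load the network modules at run time only.
# Returns 1 if both loaded, 0 if either is missing from @INC.
sub load_network_modules {
    for my $module (qw(LWP::UserAgent HTTP::Request)) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        eval { require $file; 1 } or return 0;
    }
    return 1;
}

for my $location ('D:/DATR/perl/eg.dtr', 'http://somewhere/doc.dtr') {
    if (looks_like_uri($location)) {
        print "$location: network modules ",
            (load_network_modules() ? "loaded" : "not installed"), "\n";
    }
    else {
        print "$location: local file, nothing extra loaded\n";
    }
}
```

Because require runs only inside load_network_modules, a script that reads local files alone never touches LWP::UserAgent or HTTP::Request.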

The module issues the usual warnings if it cannot read or write a file.

The module exports nothing to the calling namespace.

The module does not fully support The DATR Standard Library RFC, Version 2.20. Specifically, it does not support the use of the proposed path cut operator as a full-stop within a path: all full stops are taken to signify the end of a clause.

* Support The DATR Standard Library RFC, Version 2.20.
* Change the mechanism of _parseOpeningClosing to allow
contents to span lines.
* Support interpolation of directives within the body,
as specified by the style sheet.
* Fully support comment printing as specified by the DATR XML DTD.
Currently all comments are lumped together.

These variables adjust the output of the DTR parser. When they are undefined (using undef $DATR2XML::var), they prevent the parser from outputting any element that has a default value, as defined in the DATR DTD; when they are defined with any value, they force XML output in full.
Set to any value to print comments; undefine to suppress them.
The DTD provides the default path as a null path, but this can be adjusted by setting $includeSentenceType to 1. This can be reset by calling undef upon the variable. See also include_sentence_type.
The DATR DTD provides the default type as ==, and this can be left in the output if this variable is set, which is its default state. See also include_sentence_type.
The path to the required XSLT stylesheet. The default is http://www.leegoddard.com/DATR/XSLT/datr.xsl. See also the method and procedure set_stylesheet.
The SYSTEM location of (that is, the path to) the DATR DTD. The default is http://www.leegoddard.com/DATR/DTD/DATR1.0.dtd. See also the method and procedure set_dtd.
This is literally the root element as printed, and may contain references, such as to an XML schema.
E.g.:
$datr_root = '<DATR xmlns="x-schema:http://www.leegoddard.com/DATR/DTD/DATR1.0.xml">';
The default is simply the opening of the DATR element. See also set_schema.
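
As a rough illustration of how a defined or undefined global can switch default-valued output on and off, consider the sketch below. It is not taken from the module: the function equation_open_tag is hypothetical, and the variable is a local stand-in for $DATR2XML::includeSentenceType. Only the EQUATION element, its type attribute and the == default come from this document.

```perl
use strict;
use warnings;

# Stand-in for the module's global; defined by default, as described above.
our $includeSentenceType = 1;

# Hypothetical helper: build the opening tag of an EQUATION element.
# When the global is undefined, an attribute holding the DTD default (==)
# is left out; any non-default type is always written.
sub equation_open_tag {
    my ($type) = @_;
    $type = '==' unless defined $type;
    if (!defined $includeSentenceType && $type eq '==') {
        return '<EQUATION>';
    }
    return qq{<EQUATION type="$type">};
}

print equation_open_tag('=='), "\n";   # global defined: attribute written in full
undef $includeSentenceType;
print equation_open_tag('=='), "\n";   # global undefined: default omitted
print equation_open_tag('='),  "\n";   # non-default value is still written
```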

Creates a new DATR2XML object from file, URI or DATR .dtr source.
Accepts: DATR source as a scalar, an array, or a reference to either, or the path to a DATR file. If the source is a scalar or a reference to a scalar, it is assumed to be just a list of node definitions, that is, the contents of the BODY slot.
Optionally accepts a second argument to set logging: see the manual entry
for the logging method for details.
Returns: reference to object.
Object Structure: a hash with the following fields:
LOCATION - the name of the file, if any
HEADER - the file header (as defined in datrnode44.html#fileheader)
OPENING - opening declarations/directives as defined in datrnode45.html#openingdeclarations
BODY - node definitions, itself an array of hashes of the format defined in _parseNodes
CLOSING - closing declarations/directives as defined in datrnode47.html#closingdeclarations
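
For orientation, an object with the fields listed above might look something like the following. The field names come from this document; the sample values, the example_object and describe helpers, and the assumption that OPENING and CLOSING hold plain strings are invented for the sketch.

```perl
use strict;
use warnings;

# Build a hash shaped like the object described above (sample data only).
sub example_object {
    return {
        LOCATION => 'D:/DATR/perl/eg.dtr',
        HEADER   => { File => 'eg.dtr' },
        OPENING  => [ '#vars $abc: a b c.' ],
        BODY     => [
            {
                NODE    => 'Node',
                PATH    => '<>',
                TYPE    => '==',
                VALUE   => 'value',
                COMMENT => [],
            },
        ],
        CLOSING  => [ '#hide Node.' ],
    };
}

# Summarise an object of this shape in one line.
sub describe {
    my ($object) = @_;
    return sprintf '%s: %d opening, %d node(s), %d closing',
        $object->{LOCATION},
        scalar @{ $object->{OPENING} },
        scalar @{ $object->{BODY} },
        scalar @{ $object->{CLOSING} };
}

print describe(example_object()), "\n";
```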
Sets or resets the type attribute of EQUATION elements.
Calling with an argument value of 1 includes the type attribute (default); calling with 0 forces the type attribute to be omitted.
Call without a value to stop comment printing; call with a value to restart comment printing. Default is to print comments.
Sets the path to the required XSLT stylesheet. See also location_xsl in the section Global Variables.
Sets the location of the DTD as used in the DOCTYPE SYSTEM declaration. See also location_dtd in the section Global Variables.
Sets the location of the XML Schema as used in the root element. If called with no argument value, removes all references to an XML Schema, setting $datr_root to the opening of the DATR root tag without attributes.
Calling with a value of 1 sets the Schema to the author's, located at http://www.leegoddard.com/DATR/DTD/DATR1.0.xml. See also datr_root in the section Global Variables.
Turns logging off or on, verbose or minimal.
Accepts: "true|on|minimal" or "verbose" or "off|none|silent"
Returns: None
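
One way to fold the accepted logging strings into a single setting is sketched below. The numeric levels 0, 1 and 2 and the name logging_level are assumptions for the example; the module may well store the setting differently.

```perl
use strict;
use warnings;

# Map an accepted logging string to a numeric level (levels are assumed):
# 0 = off, 1 = minimal, 2 = verbose. Unrecognised input gives undef.
sub logging_level {
    my ($setting) = @_;
    my %level_for = (
        off     => 0, none => 0, silent  => 0,
        true    => 1, on   => 1, minimal => 1,
        verbose => 2,
    );
    return undef unless defined $setting;
    return $level_for{ lc $setting };
}

for my $setting (qw(off on verbose)) {
    print "$setting => ", logging_level($setting), "\n";
}
```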
Provides a rough printout of all records
Accepts: object ref;
Returns: none
Provides a rough printout of all nodes
Accepts: object ref;
Returns: none
Provides a rough view of the opening directives/definitions
Accepts: object ref;
Returns: none
Provides a rough view of the closing directives/definitions
Accepts: object ref;
Returns: none
Provides a rough printout of all nodes
Accepts: object ref;
Returns: none
Saves to local filesystem an XML printout of all records
Accepts: object ref;
optional file path to save at
or, for internal use, typeglob for PERL filehandle.
Returns: none
Notes: simply calls printAll, passing filehandle if necessary.
Convert one or more DATR files to XML.
Accepts: I<Either>:
a filepath with an extension,
optionally with an additional destination filepath or directory,
I<or,>
for batch operation, a directory location.
Returns: nothing, will die on errors
Notes: Does not accept URLs and does not process sub-directories.
Minimizes logging during operation.
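
The batch behaviour noted above (one directory, no sub-directories) could be approximated as in this sketch. The helper dtr_files_in is hypothetical, and matching the .dtr extension case-insensitively is an assumption.

```perl
use strict;
use warnings;

# List the .dtr files directly inside a directory, without descending
# into sub-directories. Returns the paths in sorted order.
sub dtr_files_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = sort grep { /\.dtr$/i && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return map { "$dir/$_" } @names;
}

# List whatever .dtr files sit in the current directory.
print "$_\n" for dtr_files_in('.');
```

Each path returned could then be handed to the single-file form of the conversion in turn.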
Provides an XML printout of all records
Accepts: object ref;
optional file path to save at.
or, for internal use, typeglob for PERL filehandle
Returns: none
Provides a rough printout of all nodes
Accepts: object ref;
optional file path
or, for internal use, typeglob for PERL filehandle
Returns: none
Provides an XML printout of the opening/closing directives/definitions block element. Without passing a filepath or typeglob for filehandle, outputs to STDOUT. Just a wrapper for _printOpeningClosing.
Accepts: object ref;
optionally a file path
or, for internal use, typeglob for PERL filehandle
Returns: none
Provides an XML printout of all nodes. Basically writes the EQUATION element and calls _parsePath on each value of the object's {BODY} key.
Accepts: object ref
Returns: none

All private method subroutine names are prefixed with an underscore.
Load a dtr file from the local file system.
Accepts: object reference
Returns: an array of file contents
Load a dtr document from a URI
Accepts: object reference
Returns: an array of file contents
Parses a .dtr-format file header into the class record
Accepts: object ref;
Returns: none
Struct: This method fills the hash held in $self->{HEADER}
with whatever fields the C<.dtr> file header contains that match
a name/value pair delimited with a colon.
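
A minimal sketch of this kind of colon-delimited header parsing follows. It assumes the header is a run of % comment lines such as "% File: eg.dtr"; that sample layout, the field names in it, and the function name parse_header_lines are assumptions, not details confirmed by this document.

```perl
use strict;
use warnings;

# Collect name/value pairs from comment lines of the form "% Name: value".
# Lines that do not match (rules of % signs, blank lines) are ignored.
sub parse_header_lines {
    my %header;
    for my $line (@_) {
        if ($line =~ /^\s*%\s*([^:%]+?)\s*:\s*(.*?)\s*$/) {
            $header{$1} = $2;
        }
    }
    return \%header;
}

my $header = parse_header_lines(
    '%%%%%%%%%%%%%%%%%%%%%%%%',
    '% File:     eg.dtr',
    '% Purpose:  a small example',
    '%%%%%%%%%%%%%%%%%%%%%%%%',
);
print "$_ => $header->{$_}\n" for sort keys %$header;
```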
Extracts opening directives, those occurring before node definitions, and places them into the self-object's OPENING array.
Accepts: object ref, ref to DATR data
Returns: none
Extracts closing directives, those occurring after node definitions
Accepts: object ref; reference to array of DATR data
Returns: none
Notes: reverses @_ then applies same proc as _parseOpening, then reverses output
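
The reverse, process, reverse trick in the note above can be shown with a small stand-alone sketch. It assumes that directives are lines beginning with # and that each one fits on a single line; both function names are invented for the example.

```perl
use strict;
use warnings;

# Collect directive lines from the front of the list, stopping at the
# first line that is neither blank nor a directive.
sub take_leading_directives {
    my @found;
    for my $line (@_) {
        next if $line =~ /^\s*$/;
        last unless $line =~ /^\s*#/;
        push @found, $line;
    }
    return @found;
}

# Closing directives: reverse the input, reuse the same routine,
# then reverse the result to restore source order.
sub take_trailing_directives {
    my @found = take_leading_directives(reverse @_);
    return reverse @found;
}

my @lines = (
    '#vars $x: a b.',
    'Node: <> == x.',
    '#hide Node.',
    '#show <>.',
);
print "opening: $_\n" for take_leading_directives(@lines);
print "closing: $_\n" for take_trailing_directives(@lines);
```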
Parse a list of nodes to the class BODY record.
Accepts: an object ref and a reference to an array
of DATR data
Returns: none
Struct: This method creates the array of hashes held in $self->{BODY}
with the following fields:
NODE - the name of the current node
PATH - the (left-hand) path
TYPE - the sentence-type signifier: = or ==
VALUE - the (right-hand) value
COMMENT - an array of comments, index reflecting source line number
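
To make the record layout concrete, here is a deliberately simplified sketch that turns one sentence into a hash with these fields. It handles only the plain form "Node: <path> == value." with a single equation per clause; the real parser presumably copes with much more. The function name parse_sentence is an assumption, as is keeping the angle brackets in PATH.

```perl
use strict;
use warnings;

# Parse a single "Node: <path> == value." sentence into a record.
# Returns a hash reference, or nothing if the sentence does not match.
sub parse_sentence {
    my ($sentence) = @_;
    return
        unless $sentence =~ /^\s*(\w+)\s*:\s*(<[^>]*>)\s*(==?)\s*(.*?)\s*\.\s*$/s;
    return {
        NODE    => $1,
        PATH    => $2,
        TYPE    => $3,
        VALUE   => $4,
        COMMENT => [],
    };
}

my $record = parse_sentence('Dog: <plural> == dog s.');
print "$_: $record->{$_}\n" for qw(NODE PATH TYPE VALUE);
```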
Decodes path attributes into an XML structure.
Accepts: a string of DATR path (as in $$hash{VALUE});
optionally a second argument, being the name of a node to
build-out the sentence (cf. geraldg@cogs.susx.ac.uk, 06/07/00)
Returns: a string of XML structure
Notes: a bit of a hack, really.
Formats nodes for processing by removing comments/directives/linefeeds
Accepts: strings or array of DATR node/path/value sentences
Returns: one string of DATR node/path/value sentences, without linebreaks
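
A rough equivalent of this clean-up step is sketched below. It assumes that comments run from % to the end of the line and that a directive occupies a single line starting with #; a % inside a quoted atom would be mishandled, so treat it as an illustration rather than the module's own logic. The name flatten_sentences is invented.

```perl
use strict;
use warnings;

# Strip comments and directive lines, then join what remains
# into one string without line breaks.
sub flatten_sentences {
    my @kept;
    for my $chunk (@_) {
        for my $line (split /\r?\n/, $chunk) {
            $line =~ s/%.*$//;            # drop a trailing comment
            next if $line =~ /^\s*#/;     # drop directive lines
            $line =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
            push @kept, $line if length $line;
        }
    }
    return join ' ', @kept;
}

print flatten_sentences(
    "Dog: <> == Noun % inherit from Noun\n  <root> == dog.\n",
    "#hide Noun.\n",
), "\n";
```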
Sets up a filehandle for output, whether STDOUT or not
Accepts: string of a filepath, or a filehandle, or a (ref to a) typeglob, or undef
Returns: a reference to a typeglob that is the filehandle
See also: "Passing Filehandles" in perlfaq7 Perl documentation
Note: Would it be better not to default to STDOUT but
to default to a filename specified at object construction time?
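
The argument handling described here might be sketched as follows, using the typeglob test from the Perl FAQ. The function name output_handle is hypothetical and the sketch does not cover handle objects such as IO::File.

```perl
use strict;
use warnings;

# Return a filehandle for output:
#   undef           -> STDOUT
#   glob reference  -> passed through unchanged
#   bare typeglob   -> a reference to it
#   anything else   -> treated as a file path and opened for writing
sub output_handle {
    my ($target) = @_;
    return \*STDOUT unless defined $target;
    return $target  if ref($target) eq 'GLOB';
    return \$target if ref(\$target) eq 'GLOB';
    open(my $fh, '>', $target) or die "Cannot write to $target: $!";
    return $fh;
}

my $fh = output_handle();          # no argument: falls back to STDOUT
print {$fh} "<DATR>\n";
```

Callers that pass a path are responsible for closing the handle they get back; STDOUT is left open.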
Prints as XML the contents of the opening/closing declarations, as requested.

Author: Lee Goddard code@leegoddard.com, leego@cogs.susx.ac.uk
Copyright: © Lee Goddard, 09/06/00 and as above. All Rights Reserved. License: The GNU General Public License applies: copies available from www.gnu.org/. You are free to distribute and modify this module under the same terms as those of Perl itself.