Module Version: 0.901

NAME

DATR2XML.pm - manipulate DATR .dtr, XML, HTML

SYNOPSIS

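        #! perl -w
        use DATR2XML;

        # Suppress output of node paths that have the default value
        undef $DATR2XML::includeNodePath;

        # Construct from a local path or a URI; the optional second
        # argument sets the logging level
        $datr_eg1 = new DATR2XML('D:\DATR\perl\eg.dtr');
        $datr_eg2 = new DATR2XML('D:/DATR/perl/eg.dtr', "on");
        $datr_eg3 = new DATR2XML('http://somewhere/doc.dtr', "verbose");

        # Set the stylesheet on an object that has been constructed
        $datr_eg1 -> set_stylesheet('D:/DATR/XSLT/datr.xsl');

        # Rough printouts
        viewAll $datr_eg1;
        $datr_eg2 -> viewHeader;

        # XML output, one block at a time
        $datr_eg3 -> printHeader;
        printOpening $datr_eg3;
        printNodes $datr_eg3;
        printClosing $datr_eg3;

        # XML output of everything at once
        printAll $datr_eg3;

        # Save the XML to the local filesystem
        save $datr_eg3;

        # Procedural conversion of a file
        DATR2XML::convert('D:\DATR\XSLT\eg_opening.dtr');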

DESCRIPTION

This module parses a DATR .dtr-formatted file into a Perl data structure. The .dtr format is as defined in Gerald Gazdar's 'DATR By Example', published on the DATR web-pages at the University of Sussex < http://www.sussex.ac.uk/ >.

Particular respect was paid to datanode31.html, though I confess the formal definitions found elsewhere on the site made no sense to me.

LOGGING

Process logging may be set to "off", "on" (or "true"), or "verbose".

REQUIRED MODULES

If internet access is required, the following modules must be installed and on the @INC path:

        LWP::UserAgent
        HTTP::Request

If no internet access is required, these modules will not be called.
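One way to get this behaviour is to load the network modules at run time, and only when the source location looks like a URI. The following is a minimal sketch, not the module's actual code: the helper names needs_network and load_network_modules, and the URI test itself, are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the location looks like an http(s) or ftp URI.
sub needs_network {
    my ($location) = @_;
    return $location =~ m{^(?:https?|ftp)://}i ? 1 : 0;
}

# Hypothetical helper: load the network modules only on demand.
# Returns 1 if they were loaded, 0 if they were not needed.
sub load_network_modules {
    my ($location) = @_;
    return 0 unless needs_network($location);
    # 'require' runs at run time, so local-only use never touches LWP.
    require LWP::UserAgent;
    require HTTP::Request;
    return 1;
}

print load_network_modules('D:/DATR/perl/eg.dtr') ? "network\n" : "local\n";
```

With this arrangement a script that only reads local files runs even where LWP is not installed.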

DIAGNOSTICS

The module issues the usual warnings if it cannot read from or write to a file.

EXPORTS

The module exports nothing to the calling namespace.

CAVEATS

The module does not fully support The DATR Standard Library RFC, Version 2.20. Specifically, it does not support the use of the proposed path cut operator as a full-stop within a path: all full stops are taken to signify the end of a clause.
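The effect of this limitation can be shown with a simplified sketch that treats every full stop as the end of a clause. The helper split_clauses is hypothetical and is not the module's parser. The second input, with a full stop inside the angle brackets, is an invented example and is not taken from the RFC.

```perl
use strict;
use warnings;

# Hypothetical helper: every full stop is taken to end a clause.
sub split_clauses {
    my ($source) = @_;
    my @clauses;
    for my $clause (split /\./, $source) {
        $clause =~ s/^\s+//;
        $clause =~ s/\s+$//;
        push @clauses, $clause if length $clause;
    }
    return @clauses;
}

# Two ordinary sentences split cleanly into two clauses.
print "$_\n" for split_clauses('Noun: <cat> == noun. Verb: <cat> == verb.');

# A full stop inside a path is also treated as a clause end,
# so this single sentence is wrongly cut in two.
print "$_\n" for split_clauses('Noun: <a.b> == noun.');
```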

TO DO

        * Support The DATR Standard Library RFC, Version 2.20
        * Change mechanism of _parseOpeningClosing to allow
          line-spanning of contents.
        * Support interpolation of directives within body
          as specified by the style sheet
        * Fully support comment printing as specified by DATR XML DTD.
          Currently lumps all comments together.

GLOBAL VARIABLES

These variables adjust the output of the DTR parser. When one is undefined (for example, with undef $DATR2XML::var), the parser omits any element that has a default value as defined in the DATR DTD; when it is defined with any value, XML output is forced in full.

$printComments

Set with any value to print comments, undef not to.

$includeNodePath

The DTD provides the default path as a null path, but this can be adjusted by setting $includeNodePath to 1. This can be reset by calling undef upon the variable. See also include_sentence_type.

$includeSentenceType

The DATR DTD provides the default type as ==. The type attribute is written out in full while this variable is set, which is its default state; undefine it to leave the default type to the DTD. See also include_sentence_type.
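The general idea behind these switches, writing an attribute in full when the variable is defined and leaving a default value to the DTD when it is not, can be sketched as follows. This is an illustration only: the variable $include_sentence_type stands in for $DATR2XML::includeSentenceType, and the helper equation_tag and the exact tag layout are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $DATR2XML::includeSentenceType.
our $include_sentence_type = 1;

# Hypothetical helper: build the opening tag of an EQUATION element.
sub equation_tag {
    my ($type) = @_;
    # '==' is the DTD default, so it can be omitted when the switch is undefined.
    if (!defined $include_sentence_type && $type eq '==') {
        return '<EQUATION>';
    }
    return qq{<EQUATION type="$type">};
}

print equation_tag('=='), "\n";    # switch defined: attribute written in full
undef $include_sentence_type;
print equation_tag('=='), "\n";    # switch undefined: default value omitted
print equation_tag('='), "\n";     # a non-default value is always written
$include_sentence_type = 1;        # restore the default state
```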

$location_xsl

The path to the required XSLT stylesheet. The default is http://www.leegoddard.com/DATR/XSLT/datr.xsl. See also the method and procedure set_stylesheet.

$location_dtd

The SYSTEM location of (that is, the path to) the DATR DTD. The default is http://www.leegoddard.com/DATR/DTD/DATR1.0.dtd. See also the method and procedure set_dtd.

$datr_root

This is literally the root element as printed, and may contain references, such as to an XML schema.

        Eg:
        $datr_root = '<DATR xmlns="x-schema:http://www.leegoddard.com/DATR/DTD/DATR1.0.xml">';

The default is simply the opening of the DATR element. See also set_schema.

PUBLIC METHODS

Constructor (new)

Creates a new DATR2XML object from file, URI or DATR .dtr source.

Accepts: DATR source as a scalar, an array, a reference to a scalar or array, or a path to a DATR file. If the source is a scalar or a reference to a scalar, it is assumed to be just a list of node definitions, that is, the content of the BODY slot.

Optionally accepts a second argument to set logging: see the manual entry for the logging method for details.

Returns: reference to object.

Object Structure: a hash with the following fields:

        LOCATION - the name of the file, if any

        HEADER   - the file header (as defined in datrnode44.html#fileheader)

        OPENING  - opening declarations/directives as defined in datrnode45.html#openingdeclarations

        BODY     - node definitions, itself an array of hashes of the format defined in _parseNodes

        CLOSING  - closing declarations/directives as defined in datrnode47.html#closingdeclarations
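As a rough picture of this structure, the following sketch builds such a hash by hand and walks its BODY array. All values are invented examples, and the assumption that CLOSING is an array like OPENING is mine; a real object is built by the constructor, not written out like this.

```perl
use strict;
use warnings;

# Illustrative object layout; every value here is an invented example.
my $datr = {
    LOCATION => 'eg.dtr',
    HEADER   => { File => 'eg.dtr' },
    OPENING  => [ '#vars $abc: a b c.' ],
    BODY     => [
        {
            NODE    => 'Noun',
            PATH    => '<cat>',
            TYPE    => '==',
            VALUE   => 'noun',
            COMMENT => [],
        },
    ],
    CLOSING  => [ '#hide Noun.' ],
};

# Walk the BODY array and print each sentence in DATR form.
for my $sentence (@{ $datr->{BODY} }) {
    print "$sentence->{NODE}: $sentence->{PATH} $sentence->{TYPE} $sentence->{VALUE}.\n";
}
```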

include_sentence_type

Sets or resets the type attribute of EQUATION elements.

Calling with an argument value of 1 includes the type attribute (default); calling with 0 forces the type attribute to be omitted.

print_comments

Call without a value to stop comment printing; call with a value to restart comment printing. Default is to print comments.

set_stylesheet

Sets the path to the required XSLT stylesheet. See also location_xsl in the section Global Variables.

set_dtd

Sets the location of the DTD as used in the DOCTYPE SYSTEM declaration. See also location_dtd in the section Global Variables.

set_schema

Sets the location of the XML Schema as used in the root element. If called with no argument value, removes all references to an XML Schema, setting $datr_root to the opening of the DATR root tag without attributes.

Calling with a value of 1 sets the Schema to the author's, located at http://www.leegoddard.com/DATR/DTD/DATR1.0.xml. See also datr_root in the section Global Variables.

logging

Turns logging off or on, verbose or minimal.

        Accepts:        "true|on|minimal" or "verbose" or "off|none|silent"
        Returns:        None
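The accepted strings fall into three groups, which could be folded onto three levels as in the sketch below. The helper normalise_log_level is hypothetical, and two details are assumptions rather than documented behaviour: an undefined argument is treated as "off", and an unknown string is fatal.

```perl
use strict;
use warnings;

# Hypothetical helper: map the accepted strings onto three levels.
sub normalise_log_level {
    my ($arg) = @_;
    return 'off'     if !defined $arg || $arg =~ /^(?:off|none|silent)$/i;
    return 'verbose' if $arg =~ /^verbose$/i;
    return 'on'      if $arg =~ /^(?:true|on|minimal)$/i;
    die "Unknown logging level: $arg\n";
}

print normalise_log_level('minimal'), "\n";
print normalise_log_level('silent'), "\n";
```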

viewAll

Provides a rough printout of all records

        Accepts:        object ref;
        Returns:        none

viewHeader

Provides a rough printout of the file header

        Accepts:        object ref;
        Returns:        none

viewOpening

Provides a rough view of the opening directives/definitions

        Accepts:        object ref;
        Returns:        none

viewClosing

Provides a rough view of the closing directives/definitions

        Accepts:        object ref;
        Returns:        none

viewNodes

Provides a rough printout of all nodes

        Accepts:        object ref;
        Returns:        none

save

Saves to local filesystem an XML printout of all records

        Accepts:        object ref;
                        optional file path to save at
                        or, for internal use, typeglob for PERL filehandle.
        Returns:        none
        Notes:          simply calls printAll, passing filehandle if necessary.

convert

Convert one or more DATR files to XML.

        Accepts:        Either:
                        a filepath with an extension,
                        optionally with an additional destination filepath or directory,
                        or,
                        for batch operation, a directory location.
        Returns:        nothing, will die on errors
        Notes:          Does not accept URLs and does not process sub-directories.
                        Minimizes logging during operation.
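For the batch case, a directory scan that skips sub-directories might look like the sketch below. It is not the module's code: the helper dtr_to_xml_pairs is hypothetical, and naming each destination by swapping the .dtr extension for .xml is an assumption.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: pair each .dtr file in $dir with a .xml destination.
# Sub-directories are skipped, in line with the note above.
sub dtr_to_xml_pairs {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open directory $dir: $!\n";
    my @names = readdir $dh;
    closedir $dh;

    my @pairs;
    for my $name (sort @names) {
        my $source = File::Spec->catfile($dir, $name);
        next unless -f $source && $name =~ /\.dtr$/i;
        (my $target = $source) =~ s/\.dtr$/.xml/i;
        push @pairs, [ $source, $target ];
    }
    return @pairs;
}

# List what would be converted in the current directory.
print "$_->[0] -> $_->[1]\n" for dtr_to_xml_pairs('.');
```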

printAll

Provides an XML printout of all records

        Accepts:        object ref;
                        optional file path to save at.
                        or, for internal use, typeglob for PERL filehandle
        Returns:        none

printHeader

Provides an XML printout of the file header

        Accepts:        object ref;
                        optional file path
                        or, for internal use, typeglob for PERL filehandle
        Returns:        none

printOpening; printClosing

Provides an XML printout of the opening/closing directives/definitions block element. Without passing a filepath or typeglob for filehandle, outputs to STDOUT. Just a wrapper for _printOpeningClosing.

        Accepts:        object ref;
                        optionally a file path
                        or, for internal use, typeglob for PERL filehandle
        Returns:        none

printNodes

Provides an XML printout of all nodes. Basically writes the EQUATION element and calls _parsePath on each value of the object's {BODY} key.

        Accepts:        object ref
        Returns:        none

PRIVATE METHODS

All private method subroutine names are prefixed with an underscore.

_loadFile (private method)

Load a dtr file from the local file system.

        Accepts:        object reference
        Returns:        an array of file contents

_loadURI (private method)

Load a dtr document from a URI

        Accepts:        object reference
        Returns:        an array of file contents

_parseHeader (private method)

Parses a .dtr-format file header into the class record

        Accepts:        object ref;
        Returns:        none
        Struct:         This method fills the hash held in $self->{HEADER}
                        with whatever fields the .dtr file header contains that match
                        a name/value pair delimited with a colon.
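A minimal sketch of this kind of header parsing follows. It is not the module's implementation: the helper parse_header is hypothetical, and it assumes that header lines are DATR comments beginning with '%' and that the header ends at the first line that is not a comment.

```perl
use strict;
use warnings;

# Hypothetical helper: collect "Name: value" pairs from header lines.
# Assumes header lines are DATR comments starting with '%'.
sub parse_header {
    my (@lines) = @_;
    my %header;
    for my $line (@lines) {
        last unless $line =~ /^\s*%/;    # header ends at the first non-comment line
        if ($line =~ /^\s*%\s*([^:]+?)\s*:\s*(.*?)\s*$/) {
            $header{$1} = $2;
        }
    }
    return \%header;
}

my $header = parse_header(
    '% File:    eg.dtr',
    '% Purpose: example lexicon',
    'Noun: <cat> == noun.',
);
print "$_ => $header->{$_}\n" for sort keys %$header;
```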

_parseOpening (private method)

Extracts opening directives, those occurring before node definitions, and places them into the self-object's OPENING array.

        Accepts:        object ref, ref to DATR data
        Returns:        none

_parseClosing (private method)

Extracts closing directives, those occurring after node definitions

        Accepts:        object ref; reference to array of DATR data
        Returns:        none
        Notes:          reverses @_ then applies same proc as _parseOpening, then reverses output
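The reversal trick described in the note can be sketched like this. The helpers leading_directives and trailing_directives are hypothetical, and the test for a directive (a line beginning with '#') is a simplifying assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: collect directive lines from the start of a list,
# stopping at the first line that is neither blank nor a '#' directive.
sub leading_directives {
    my (@lines) = @_;
    my @found;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;
        last unless $line =~ /^\s*#/;
        push @found, $line;
    }
    return @found;
}

# Closing directives: reverse the input, reuse the same scan, reverse back.
# Call this in list context.
sub trailing_directives {
    my (@lines) = @_;
    return reverse leading_directives(reverse @lines);
}

my @source = (
    '#vars $x: a.',
    'Noun: <cat> == noun.',
    '',
    '#hide Noun.',
    '#show <cat>.',
);
my @closing = trailing_directives(@source);
print "$_\n" for @closing;
```

Reusing one scan for both ends keeps the opening and closing logic identical, at the cost of two extra list reversals.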

_parseNodes (private method)

Parse a list of nodes to the class BODY record.

        Accepts:        an object ref and a reference to an array
                        of DATR data
        Returns:        none
        Struct:         This method creates the array of hashes held in $self->{BODY}
                        with the following fields:

                        NODE    - the name of the current node
                        PATH    - the (left-hand) path
                        TYPE    - the sentence-type signifier: = or ==
                        VALUE   - the (right-hand) value
                        COMMENT - an array of comments, index reflecting source line number
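To make the fields concrete, here is a deliberately simplified sketch that splits one sentence into such a hash. The helper parse_sentence is hypothetical and handles only sentences of the form Node: <path> == value. on a single line; real DATR allows more than this.

```perl
use strict;
use warnings;

# Hypothetical helper: split one sentence of the form
#   Node: <path> == value.
# into the fields listed above. Returns nothing if the sentence does not match.
sub parse_sentence {
    my ($sentence) = @_;
    return unless $sentence =~ /^\s*(\w+)\s*:\s*(<[^>]*>)\s*(==|=)\s*(.*?)\s*\.\s*$/;
    return {
        NODE    => $1,
        PATH    => $2,
        TYPE    => $3,
        VALUE   => $4,
        COMMENT => [],
    };
}

my $parsed = parse_sentence('Noun: <cat> == noun.');
print "$_ = $parsed->{$_}\n" for qw(NODE PATH TYPE VALUE);
```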

_parsePath (private pseudo-method)

Decodes path attributes into an XML structure.

        Accepts:        a string of DATR path (as in $$hash{VALUE});
                        optionally a second argument, being the name of a node to
                        build-out the sentence (cf. geraldg@cogs.susx.ac.uk, 06/07/00)
        Returns:        a string of XML structure
        Notes:          a bit of a hack, really.

_preFormatNodes (private method)

Formats nodes for processing by removing comments/directives/linefeeds

        Accepts:        strings or array of DATR node/path/value sentences
        Returns:        one string of DATR node/path/value sentences, without linebreaks
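A rough sketch of this kind of clean-up follows. The helper preformat_nodes is hypothetical, and it assumes that '%' starts a comment running to the end of the line and that a directive is a line beginning with '#'.

```perl
use strict;
use warnings;

# Hypothetical helper: strip comments and directive lines, then join
# what remains into one string without linebreaks.
sub preformat_nodes {
    my (@chunks) = @_;
    my @kept;
    for my $line (map { split /\n/, $_ } @chunks) {
        $line =~ s/%.*$//;           # drop a comment running to end of line
        next if $line =~ /^\s*#/;    # drop directive lines
        $line =~ s/^\s+//;           # trim leading whitespace
        $line =~ s/\s+$//;           # trim trailing whitespace
        push @kept, $line if length $line;
    }
    return join ' ', @kept;
}

print preformat_nodes("Noun: <cat> == noun. % a comment\n#hide Noun.\nVerb: <cat> == verb."), "\n";
```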

_setupOutput (private method)

Sets up a filehandle for output, whether STDOUT or not

        Accepts:        string of a filepath, or a filehandle, or a (ref to a) typeglob, or undef
        Returns:        a reference to a typeglob that is the filehandle
        See also:       "Passing Filehandles" in perlfaq7 Perl documentation
        Note:           Would it be better not to default to STDOUT but
                        to default to a filename specified at object construction time?
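The behaviour described here could be approximated as in the following sketch. The helper setup_output is hypothetical and only covers undef, a file path, a glob reference and a bare typeglob; it is not the module's own routine.

```perl
use strict;
use warnings;

# Hypothetical helper: turn undef, a file path, or an existing handle
# into a filehandle reference that can be printed to.
sub setup_output {
    my ($target) = @_;
    return \*STDOUT unless defined $target;          # default to STDOUT
    return $target  if ref($target)  eq 'GLOB';      # already a glob reference
    return \$target if ref(\$target) eq 'GLOB';      # bare typeglob, e.g. *STDERR
    open my $fh, '>', $target or die "Cannot write to $target: $!\n";
    return $fh;
}

# With no argument, output goes to STDOUT.
my $out = setup_output();
print {$out} "<DATR>\n";
```

Returning a reference in every case means the caller can always write print {$out} ... without caring where the output ends up.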

_printOpeningClosing (private pseudo-method)

Prints as XML the contents of opening/closing, as requested.

AUTHOR and COPYRIGHT

Author: Lee Goddard code@leegoddard.com, leego@cogs.susx.ac.uk

Copyright: © Lee Goddard, 09/06/00 and as above. All Rights Reserved. License: The GNU General Public License applies: copies available from www.gnu.org/. You are free to distribute and modify this module under the same terms as those of Perl itself.
