NAME
    XML::Records - Perlish record-oriented interface to XML

SYNOPSIS
      use XML::Records;
      my $p=XML::Records->new('data.lst');
      $p->set_records('credit','debit');
      my ($t,$r);
      while ( (($t,$r)=$p->get_record()) && $t) {
        my $amt=$r->{Amount};
        if ($t eq 'debit') {
          # handle a debit record using $amt
        }
      }
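
    The "&& $t" test in the loop above is needed because of Perl's context
    rules: a list assignment evaluated in scalar context yields the number
    of elements on its right-hand side, so (($t,$r)=...) is 2, and therefore
    true, even when get_record() returns (undef,undef). The sketch below
    runs with core Perl only; fake_get_record() is a hypothetical stand-in
    that mimics the documented return convention (a list in list context,
    the hash reference in scalar context). It is not part of XML::Records.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_record(): replays a fixed queue of
# (type, record) pairs, then returns (undef, undef) when the queue is empty.
my @queue = (['credit', {Amount => 10}], ['debit', {Amount => 4}]);

sub fake_get_record {
    my ($type, $rec) = @{ shift(@queue) || [undef, undef] };
    return wantarray ? ($type, $rec) : $rec;    # hash ref in scalar context
}

# At end of input the list assignment still counts two right-hand
# elements, so it is true; only the "&& $t" test stops the loop.
my ($t, $r);
my $balance = 0;
while ((($t, $r) = fake_get_record()) && $t) {
    $balance += $t eq 'debit' ? -$r->{Amount} : $r->{Amount};
}

# The pitfall shown directly: the count is 2 even with no record left.
my $count = (($t, $r) = fake_get_record());
print "balance=$balance count=$count\n";    # prints "balance=6 count=2"
```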

DESCRIPTION
    XML::Records provides a single interface for processing XML data on a
    stream-oriented, tree-oriented, or record-oriented basis. A subclass of
    XML::TokeParser, it adds methods to read "records" and tree fragments
    from XML documents.

    In many documents, the immediate children of the root element form a
    sequence of identically named and independent elements such as log
    entries, transactions, etc., each of which consists of "field" child
    elements or attributes. You can access each such "record" as a simple
    Perl hash.

    You can also read any element and its children into a lightweight tree
    implemented as a Perl hash, or feed the contents of any element and its
    children into a SAX handler (making it possible to process "records"
    with modules like XML::DOM or XML::XPath).

METHODS
    $parser=XML::Records->new(source, [options]);
        Creates a new parser object.

        *source* and *options* are the same as for XML::TokeParser. *source*
        is either a reference to a string containing the XML, the name of a
        file containing the XML, or an open IO::Handle or filehandle glob
        reference from which the XML can be read.

    $parser->set_records(name [,name]*);
        Specifies what XML element-type names enclose records. If a name is
        prefixed with '-' then the reader will treat a start-tag for that
        name as indicating the end of a record.

    ($type,$record)=$parser->get_record([{options}] [name [,name]*]);
        Retrieves the next record from the input, skipping through the XML
        input until it encounters a start tag for one of the elements that
        enclose records. If element names are given as arguments, they
        temporarily replace the set of record-enclosing elements for this
        call.

        If the first argument is a hash reference whose 'here' key is set to
        a non-zero value, non-comment tokens are not skipped: if the next
        such token is not a start tag for a record-enclosing element, the
        method pushes it back and returns (undef,undef).

        In list context the method returns the name of the record's
        enclosing element and a reference to a hash whose keys are the names
        of the record's child elements ("fields") and whose values are the
        fields' contents. In scalar context it returns just the hash
        reference. Both elements of the list are undef if no record can be
        found.

        If a field's content is plain text, its value will be that text.

        If a field's content contains another element (e.g. a <customer>
        record contains an <address> field that in turn contains other
        fields), its value will be a reference to another hash containing
        the "sub-record"'s fields.

        If a record includes repeated fields, the hash entry for that
        field's name will be a reference to an array of field values.

        Attributes of record or sub-record elements are treated as if they
        were fields. Attributes of field elements are ignored. Mixed content
        (fields with both non-whitespace text and sub-elements) will lead to
        unpredictable results.

        Records do not actually need to be immediately below the document
        root. If a <customers> document consists of a sequence of <customer>
        elements which in turn contain <address> elements that include
        further elements, then calling get_record with the record type set
        to "address" will return the contents of each <address> element.
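
        Because a field's value may be a plain string, a hash reference (a
        sub-record), or an array reference (a repeated field), code that
        reads records often normalizes values before using them. Going by
        the rules above, a field that occurs only once in a given record is
        a plain value rather than a one-element array. The sketch below
        builds by hand the hash those rules describe for a hypothetical
        <customer> record (exact whitespace handling is an assumption), and
        as_list() is a hypothetical helper, not part of XML::Records.

```perl
use strict;
use warnings;

# Hand-built record in the shape described above for:
#   <customer id="c1">
#     <name>Ann</name>
#     <phone>555-0100</phone>
#     <phone>555-0199</phone>
#     <address><city>Springfield</city><zip>12345</zip></address>
#   </customer>
my $record = {
    id      => 'c1',                                     # attribute of the record element
    name    => 'Ann',                                    # plain-text field
    phone   => [ '555-0100', '555-0199' ],               # repeated field
    address => { city => 'Springfield', zip => '12345' },  # sub-record
};

# Hypothetical helper: always returns a list, whether the field was
# missing, appeared once, or was repeated.
sub as_list {
    my ($value) = @_;
    return ()      unless defined $value;
    return @$value if ref $value eq 'ARRAY';
    return ($value);
}

my @phones = as_list($record->{phone});
my @names  = as_list($record->{name});
my ($addr) = as_list($record->{address});

print join(', ', @phones), "\n";               # prints "555-0100, 555-0199"
print "$names[0] lives in $addr->{city}\n";    # prints "Ann lives in Springfield"
```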

    $tree=$parser->get_simple_tree([{options}] [name [,name]*]);
        Returns a lightweight tree rooted at the next element whose name is
        listed in the arguments, or at the next start-tag token if no
        arguments are given, skipping over any intermediate tokens unless
        the 'here' option is set as in get_record().

        The return value is a hash reference to the root node of the tree.
        Each node is a hash with a 'type' key whose value is the node's
        type: 'e' for elements, 't' for text, and 'p' for processing
        instructions; and a 'content' key whose value is a reference to an
        array of the element's child nodes for element nodes, the string
        value for text nodes, and the data value for processing instruction
        nodes. Element nodes also have an 'attrib' key whose value is a
        reference to a hash of attribute names and values. Processing
        instructions also have a 'target' key whose value is the PI's
        target.
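
        As a sketch of working with this structure, the hypothetical helper
        tree_text() below concatenates the text of a node and all of its
        descendants, using only the keys described above ('type', 'content',
        'attrib', 'target'). The tree is built by hand so the example runs
        without a parser. The description above does not say which key
        holds an element's name, so the sketch does not rely on one.

```perl
use strict;
use warnings;

# Hand-built tree in the documented node layout, roughly equivalent to
#   <p class="x">Hello <b>world</b><?pi data?></p>
# (element names are omitted because their key is not documented).
my $tree = {
    type    => 'e',
    attrib  => { class => 'x' },
    content => [
        { type => 't', content => 'Hello ' },
        { type => 'e', attrib => {}, content => [ { type => 't', content => 'world' } ] },
        { type => 'p', target => 'pi', content => 'data' },
    ],
};

# Depth-first concatenation of text nodes; processing instructions
# (and any other non-element, non-text node) contribute nothing.
sub tree_text {
    my ($node) = @_;
    return $node->{content} if $node->{type} eq 't';
    return ''               unless $node->{type} eq 'e';
    return join '', map { tree_text($_) } @{ $node->{content} };
}

print tree_text($tree), "\n";    # prints "Hello world"
```

        With a real parser the tree would instead come from a call such as
        $parser->get_simple_tree('item').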

    $result=$parser->drive_SAX(handler, [{options}] [name [,name]*]);
        Skips to the next element whose name is listed in the arguments, or
        to the next element if no names are given, and generates PerlSAX
        events which are sent to the SAX handler object *handler* as if the
        element were an entire document. The return value is whatever the
        handler returned in response to the end_document event. If the
        'here' option is set, returns undef without generating any SAX
        events if the next non-comment token is not a start tag for a
        record-enclosing element. If the 'wrap' option is set to 0, does not
        generate start_document or end_document events and returns 1.
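
        A handler for drive_SAX() is any object implementing the PerlSAX
        event methods. The minimal sketch below counts elements, collects
        character data, and returns a summary from end_document, which is
        the value drive_SAX() would pass back. It assumes the PerlSAX
        convention of passing each method a hash reference (Name and
        Attributes for elements, Data for characters), and ElementCounter is
        a hypothetical package name. To run without a parser it calls the
        methods by hand in the order a parser would for
        <item><title>Perl</title></item>.

```perl
use strict;
use warnings;

package ElementCounter;    # hypothetical PerlSAX handler

sub new { my ($class) = @_; return bless({ count => 0, text => '' }, $class) }

sub start_document { my ($self) = @_; $self->{count} = 0; $self->{text} = '' }
sub start_element  { my ($self, $el) = @_; $self->{count}++ }
sub characters     { my ($self, $ch) = @_; $self->{text} .= $ch->{Data} }
sub end_element    { }

# Whatever end_document returns is what drive_SAX() returns.
sub end_document {
    my ($self) = @_;
    return +{ elements => $self->{count}, text => $self->{text} };
}

package main;

# Simulated event sequence for <item><title>Perl</title></item>.
my $h = ElementCounter->new;
$h->start_document({});
$h->start_element({ Name => 'item',  Attributes => {} });
$h->start_element({ Name => 'title', Attributes => {} });
$h->characters({ Data => 'Perl' });
$h->end_element({ Name => 'title' });
$h->end_element({ Name => 'item' });
my $result = $h->end_document({});
print "$result->{elements} elements, text '$result->{text}'\n";    # prints "2 elements, text 'Perl'"
```

        With a real parser, a call like
        $parser->drive_SAX(ElementCounter->new, 'item') would return that
        summary hash for the next <item> element; with {wrap=>0} it would
        skip start_document and end_document and return 1 instead.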

EXAMPLES
  Print a list of package names from a (rather out-of-date) list of XML modules:

     #!perl -w
     use strict;
     use XML::Records;
 
     my $p=XML::Records->new('modules.xml') or die "$!";
     $p->set_records('module');
     while (my $record=$p->get_record()) {
       my $pkg=$record->{package};
       if (ref $pkg eq 'ARRAY') {
         for my $subpkg (@$pkg) {
           print $subpkg->{name},"\n";
         }
       }
       else {
         print $pkg->{name},"\n";
       }
     }

  Extract the items that mention Perl from an RSS 0.91 file and print them as a new XML document:

     #!perl -w
     use strict;
     use XML::Records;
     use XML::Handler::YAWriter;

     my $r=XML::Records->new('messages.rss');
     $r->set_records('item');
     my $h=XML::Handler::YAWriter->new(AsString=>1);
     $h->start_document({});
     $h->start_element({Name=>'items'});
     while (my $t=$r->get_tag('item')) {
       $r->unget_token($t);
       $r->begin_saving();
       my $text=$r->get_text('/item');
       if ($text=~/perl/i) {
         $r->restore_saved();
         $r->drive_SAX($h,{wrap=>0,here=>1});
       }
     }
     $h->end_element({Name=>'items'});
     print $h->end_document({});

RATIONALE
    XML::RAX, which implements the proposed RAX standard for record-oriented
    XML access, does much of what XML::Records does but its interface is not
    very Perlish (due to the fact that RAX is a language-independent
    interface), it cannot cope with fields that have sub-structure (because
    RAX itself doesn't address the issue), and it doesn't allow mixing
    record-oriented and non-record-oriented operations.

    XML::Twig allows access to tree fragments, but only on a "push"
    (callback-driven) basis, and does not allow mixed tree- and token-level
    access.

PREREQUISITES
    XML::TokeParser (version 0.03 or higher), XML::Parser.

AUTHOR
    Eric Bohlman (ebohlman@earthlink.net, ebohlman@omsdev.com)

COPYRIGHT
    Copyright 2001 Eric Bohlman. All rights reserved.

    This program is free software; you can use/modify/redistribute it under
    the same terms as Perl itself.

SEE ALSO
      XML::TokeParser
      XML::RAX
      XML::Twig
      XML::Parser::PerlSAX
      perl(1).