NAME

YAX::Parser - fast pure Perl tree and stream parser

SYNOPSIS

 use YAX::Parser;

 my $xml_str = <<XML;
   <?xml version="1.0" ?>
   <doc>
     <content id="42"><![CDATA[
        This is a cdata section, so >>anything goes!<<
     ]]>
     </content>
     <!-- comments are nodes too -->
   </doc>
 XML

 # tree parse - the common case
 my $xml_doc = YAX::Parser->parse( $xml_str );
 # ... or read the input from a file
 $xml_doc = YAX::Parser->parse_file( $path );

 # shallow parse
 my @tokens = YAX::Parser->tokenize( $xml_str );

 # stream parse
 YAX::Parser->stream( $xml_str, $state, %handlers );
 YAX::Parser->stream_file( '/some/file.xml', $state, %handlers );

DESCRIPTION

This module implements a fast DOM and stream parser based on Robert D. Cameron's regular expression shallow parsing grammar and technique. By design, it does not implement the full W3C DOM API and takes a more pragmatic approach instead: every node in the tree is an object, except for attributes, which are stored as a hash reference.
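
To give a feel for regex-based shallow parsing, here is a much-simplified sketch. It is neither Cameron's grammar nor YAX::Parser's actual code: the pattern and the helper name `shallow_tokenize` are assumptions made for illustration only. It splits a string into markup and text tokens with a single alternation.

```perl
use strict;
use warnings;

# Simplified token pattern (an assumption, far less complete than
# Cameron's grammar). Order matters: the specific markup forms are
# tried before the generic tag form.
my $TOKEN = qr{
    <!--.*?-->              # comment
  | <!\[CDATA\[.*?\]\]>     # CDATA section
  | <\?.*?\?>               # processing instruction or XML declaration
  | <![^>]*>                # other declarations, without an internal subset
  | <[^>]*>                 # open, close, or empty-element tag
  | [^<]+                   # text run
}xs;

# Hypothetical helper: return the list of tokens found in $xml_str.
sub shallow_tokenize {
    my ($xml_str) = @_;
    return $xml_str =~ /($TOKEN)/g;
}

my @tokens = shallow_tokenize('<doc><!-- hi --><![CDATA[a > b]]>text</doc>');
print "$_\n" for @tokens;
```

The sketch does no well-formedness checking and would mis-split a tag whose attribute value contains `>`; a full shallow-parsing grammar handles such cases.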

We also borrow some ideas from browser implementations. In particular, nodes that have an id attribute are keyed on it in a table held by the document, so you can say:

 my $found = $xml_doc->get( $node_id );
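
The id table can be pictured as a hash from id value to node. The sketch below is an assumption about the general idea, not a description of YAX::Document's internals: nodes are plain hash references, and `index_ids` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical node shape: name, attributes hash reference, child list.
my $tree = {
    name       => 'doc',
    attributes => {},
    children   => [
        { name => 'content', attributes => { id => '42' }, children => [] },
        { name => 'note',    attributes => {},             children => [] },
    ],
};

# Walk the tree once and record every node that carries an id attribute.
sub index_ids {
    my ( $node, $table ) = @_;
    my $id = $node->{attributes}{id};
    $table->{$id} = $node if defined $id;
    index_ids( $_, $table ) for @{ $node->{children} };
    return $table;
}

my $ids   = index_ids( $tree, {} );
my $found = $ids->{42};          # hash lookup, no tree walk needed
print $found->{name}, "\n";      # prints "content"
```

Building the table once up front means later lookups by id do not need to traverse the tree.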

Parsing is usually done by calling class methods on YAX::Parser. When invoked as a tree parser, it returns an instance of YAX::Document:

 my $xml_doc = YAX::Parser->parse( $xml_str );

METHODS

See the "SYNOPSIS" for usage; here is the list of methods:

parse( $xml_str )

Parse $xml_str and return a YAX::Document object.

parse_file( $path )

Same as above, but reads the file at $path for the input.

stream( $xml_str, $state, %handlers )

Although not its main focus, YAX::Parser also provides for stream parsing. It tries to be a bit more sane than Expat, in that it allows you to specify a state holder which can be anything and is passed as the first argument to the handler functions. A typical case is to use a hash reference with a stack (for tracking nesting):

 my $state = { stack => [ ] };

All handler functions are optional. The full list is:

 my %handlers = (
     text => \&handle_text,          # called for text nodes
     elmt => \&handle_element_open,  # called for open tags
     elcl => \&handle_element_close, # called for tag close
     decl => \&handle_declaration,   # called for declarations
     proc => \&handle_proc_inst,     # called for processing instructions
     pass => \&handle_passthrough,   # called when no handlers match
 );

An element handler is passed the state, the tag name, and the attributes hash:

 sub handle_element_open {
     my ( $state, $name, %attributes ) = @_;
     if ( $name eq 'a' and $attributes{href} ) {
         ... 
     }
 }

Element close handlers take two arguments, the state and the tag name:

 sub handle_element_close {
     my ( $state, $name ) = @_;
     die "not well formed" unless pop @{ $state->{stack} } eq $name;
 }

All other handlers take the state and the entire matched token:

 sub handle_proc_inst {
     my ( $state, $token ) = @_;
     # only use $1 if the match succeeded; /s lets the body span lines
     return unless $token =~ /^<\?(.*?)\?>$/s;
     my $instr = $1;
     ...
 }

stream_file( $path, $state, %handlers )

Same as above, but reads the file at $path for the input.
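
The element handlers described for stream parsing can be combined into a small nesting check. In this sketch the open handler pushes each tag name so that the close handler's pop has something to compare against. To keep it runnable without YAX installed, the handlers are called by hand in the order a stream parser would presumably invoke them for `<p><a href="/x">link</a></p>`; that call order and the `links` key in the state are assumptions.

```perl
use strict;
use warnings;

my $state = { stack => [], links => [] };

sub handle_element_open {
    my ( $state, $name, %attributes ) = @_;
    push @{ $state->{stack} }, $name;    # remember nesting
    push @{ $state->{links} }, $attributes{href}
        if $name eq 'a' and $attributes{href};
}

sub handle_element_close {
    my ( $state, $name ) = @_;
    my $open = pop @{ $state->{stack} };
    die "not well formed: </$name>\n"
        unless defined $open and $open eq $name;
}

my %handlers = (
    elmt => \&handle_element_open,
    elcl => \&handle_element_close,
);

# With YAX installed, the call would be:
#   YAX::Parser->stream( $xml_str, $state, %handlers );
# Here the events are simulated by hand instead.
$handlers{elmt}->( $state, 'p' );
$handlers{elmt}->( $state, 'a', href => '/x' );
$handlers{elcl}->( $state, 'a' );
$handlers{elcl}->( $state, 'p' );

print "links: @{ $state->{links} }\n";             # prints "links: /x"
print "balanced\n" unless @{ $state->{stack} };    # prints "balanced"
```

Because the state is an ordinary reference passed to every handler, results such as the collected links can be read from it after parsing finishes, with no global variables involved.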

tokenize( $xml_str )

Useful for quick and dirty tokenizing of $xml_str. Returns a list of tokens.

SEE ALSO

YAX::Document, YAX::Node

LICENSE

This program is free software and may be modified and distributed under the same terms as Perl itself.

AUTHOR

 Richard Hundt