Barrie Slaymaker > XML-Filter-Essex-0.01 > XML::Handler::Essex

Module Version: 0.0001

NAME

XML::Handler::Essex - Essex handler object (including XML::Filter::Essex)

SYNOPSIS

    use XML::Handler::Essex;

    my $h = XML::Handler::Essex->new(
        Main => sub {
            while ( get_chars ) {
                put uc;
            }
        }
    );

DESCRIPTION

Defines (and exports, by default) get() and get_...() routines that allow an Essex handler and filter to pull events from the SAX stream.

Pulling is handled in one of two ways: on perls earlier than 5.8.0, which lack the needed multithreading support, the entire input document is buffered; on perl 5.8.0 and later, threading is used.
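A minimal sketch of the buffered strategy, assuming a simple event queue: the SAX callbacks only record events, and the main code pulls them afterwards. The class BufferingHandler, its get() method, and the [ type, data ] event representation are hypothetical illustrations, not this module's internals; no real parser is involved, so the callbacks are called by hand.

```perl
use strict;
use warnings;

# Hypothetical buffering handler: SAX callbacks only record events.
package BufferingHandler;

sub new {
    my ($class) = @_;
    my $self = { queue => [] };
    return bless $self, $class;
}

# Each callback appends a [ type, data ] pair and returns at once.
sub _record { my ( $self, $type, $data ) = @_; push @{ $self->{queue} }, [ $type, $data ]; return }
sub start_document { $_[0]->_record( start_document => $_[1] ) }
sub start_element  { $_[0]->_record( start_element  => $_[1] ) }
sub characters     { $_[0]->_record( characters     => $_[1] ) }
sub end_element    { $_[0]->_record( end_element    => $_[1] ) }
sub end_document   { $_[0]->_record( end_document   => $_[1] ) }

# Pull the next buffered event; die once the buffer is exhausted.
sub get {
    my ($self) = @_;
    die "read past end of document\n" unless @{ $self->{queue} };
    return shift @{ $self->{queue} };
}

package main;

# Drive the callbacks by hand, as a parser would for <doc>hi</doc>.
my $h = BufferingHandler->new;
$h->start_document( {} );
$h->start_element( { Name => "doc" } );
$h->characters( { Data => "hi" } );
$h->end_element( { Name => "doc" } );
$h->end_document( {} );

# Only after the whole document is buffered does the "main" code pull.
my @types;
push @types, $h->get->[0] while @{ $h->{queue} };
print "@types\n";
```

The cost of this approach is memory: nothing is pulled until the whole document has been recorded, which is why threads are preferred where available.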

Note that the event constructor functions (start_doc(), end_doc(), etc.) are not exported by this module as they are from XML::Generator::Essex and XML::Filter::Essex; handlers rarely need these.

Returns "1" by default; use result_value to change this.

Exported Functions

These are exported by default; use the "use XML::Essex ();" syntax to suppress exporting them. All of these act on $_ by default.

Miscellaneous

isa
    get until isa "start_elt" and $_->name eq "foo";
    $r = get until isa $r, "start_elt" and $_->name eq "foo";

Returns true if the parameter is of the indicated object type. Tests $_ unless more than one parameter is passed.

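Note the use of "and" instead of "&&": "and" has lower precedence than a paren-less function call, so isa() receives only "start_elt" as its argument. With "&&", the expression would parse as isa( "start_elt" && $_->name eq "foo" ). This is a typical Perl idiom.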
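The precedence difference can be seen with a toy stand-in for isa(). The sub isa_type below is hypothetical; it only records the arguments it was called with, so you can see what each form actually passes.

```perl
use strict;
use warnings;

# Toy stand-in for a paren-less isa(): a plain sub with no prototype that
# records the arguments it was called with.
my @seen;
sub isa_type { @seen = @_; return $_[0] eq "start_elt" }

my $name_is_foo = 0;    # pretend that $_->name eq "foo" is false

# "and" binds more loosely than the list operator, so this is
# isa_type( "start_elt" ) and $name_is_foo
my $with_and = ( isa_type "start_elt" and $name_is_foo );
my @args_and = @seen;

# "&&" binds more tightly, so this is
# isa_type( "start_elt" && $name_is_foo ), i.e. isa_type( 0 )
my $with_amp = ( isa_type "start_elt" && $name_is_foo );
my @args_amp = @seen;

print "and passed (@args_and); && passed (@args_amp)\n";
```

Both results are false here, but for different reasons: with "and" the type test succeeds and the name test fails, while with "&&" the type test never sees "start_elt" at all.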

path
    get_start_elt until path eq "/path/to/foo:bar";

Returns the path to the current element as a string.
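One plausible way to track such a path is a stack of open element names, joined with "/". The helpers enter_element, leave_element, and current_path are hypothetical and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical path tracking: a stack of open element names.
my @open_elements;

sub enter_element { push @open_elements, $_[0]; return }
sub leave_element { pop @open_elements; return }
sub current_path  { return "/" . join "/", @open_elements }

enter_element($_) for qw( path to foo:bar );
my $inside = current_path();    # "/path/to/foo:bar"
leave_element();
my $after = current_path();     # "/path/to"
print "$inside\n$after\n";
```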

type
    get until type eq "start_document";
    $r = get until type $r eq "start_document";

Returns the type name of the object: the class name with a leading XML::Essex:: stripped off. This is a wrapper around the event's type() method.

Dies if the parameter is not an object with a type() method.
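A sketch of the class-name stripping described above, assuming a hypothetical event class named XML::Essex::start_document (the real class names may differ) and a hypothetical type_of() wrapper that defaults to $_ and dies on non-objects.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Hypothetical event class, named so that stripping "XML::Essex::" leaves
# "start_document"; the real Essex class names may differ.
package XML::Essex::start_document;

sub new { my ($class) = @_; my $self = {}; return bless $self, $class }

# type(): the class name with a leading "XML::Essex::" removed.
sub type {
    my ($self) = @_;
    ( my $type = ref $self ) =~ s/^XML::Essex:://;
    return $type;
}

package main;

# Hypothetical wrapper in the spirit of type(): defaults to $_ and dies
# unless given an object that has a type() method.
sub type_of {
    my $obj = @_ ? $_[0] : $_;
    die "not an object with a type() method\n"
        unless blessed($obj) && $obj->can("type");
    return $obj->type;
}

$_ = XML::Essex::start_document->new;
print type_of(), "\n";    # start_document
```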

xeof

Returns TRUE if the last event read was an end_document event.

get

Gets an event or element from the incoming SAX input stream, puts it in $_, and returns it. Throws an exception when reading past the last event in a document. XML::Essex catches this exception, waits until the beginning of the next document, and re-enters the main routine.

    Code                     Action
    =======================  =======================================
    get;                     Get the next SAX event, whatever it is.
    get "node()";            Get the next SAX event, whatever it is.
    get "*";                 Get the next element, whatever its name.
    get "start-document::*"; Get the next start document event.
    get "end-document::*";   Get the next end document event.
    get "start-element::*";  Get the next start element event.
    get "end-element::*";    Get the next end element event.
    get "text()";            Get the next characters event.

Right now, only the expressions shown are supported; this limitation will be lifted. Unlike XPath's text() node test, which matches a whole text node, get "text()" may return several characters events in a row for a single run of text.

See the isa() and type() functions and methods (in XML::Essex::Object) for how to test what was just gotten.
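A simplified, buffered sketch of how the patterns in the table might select events, assuming events are [ type, data ] pairs in a plain array. The %matcher table and this get() are hypothetical; get "*" is left out because it returns a whole element rather than a single event. The example also shows two consecutive characters events, and uses an eval to catch the end-of-document exception in place of the real re-entry logic.

```perl
use strict;
use warnings;

# Hypothetical pattern table: each supported pattern maps to a test on an
# event's type. Events here are simple [ type, data ] pairs.
my %matcher = (
    "node()"            => sub { 1 },
    "start-document::*" => sub { $_[0][0] eq "start_document" },
    "end-document::*"   => sub { $_[0][0] eq "end_document" },
    "start-element::*"  => sub { $_[0][0] eq "start_element" },
    "end-element::*"    => sub { $_[0][0] eq "end_element" },
    "text()"            => sub { $_[0][0] eq "characters" },
);

# A pre-buffered document: <doc>Hello, world</doc>, delivered as two
# characters events, as a SAX parser is allowed to do.
my @events = (
    [ start_document => undef ],
    [ start_element  => "doc" ],
    [ characters     => "Hello, " ],
    [ characters     => "world" ],
    [ end_element    => "doc" ],
    [ end_document   => undef ],
);

# Hypothetical get(): discard events until one matches, put it in $_ and
# return it; die once the events run out.
sub get {
    my $pattern = @_ ? $_[0] : "node()";
    my $match   = $matcher{$pattern}
        or die "unsupported pattern '$pattern'\n";
    while (@events) {
        my $event = shift @events;
        if ( $match->($event) ) {
            $_ = $event;
            return $event;
        }
    }
    die "end of document\n";
}

# Collect every characters event; the eval stands in for the re-entry
# logic that catches the end-of-document exception.
my @texts;
eval {
    while (1) {
        get "text()";
        push @texts, $_->[1];
    }
};
my $err = $@;
print join( "|", @texts ), "\n";    # Hello, |world
print $err;                         # end of document
```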

skip

Skips one event. This is what happens to events that are not returned from get(). For a handler, skip() does nothing (the event is ignored); for a filter, the event is passed on to the handler.
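A sketch of the handler/filter difference, assuming hypothetical classes SketchHandler, SketchFilter, and Collector. A real Essex filter forwards SAX events to its downstream handler; the deliver() method here is only an illustration of "passing it on".

```perl
use strict;
use warnings;

# Hypothetical downstream handler that just collects what it receives.
package Collector;
sub new     { my ($class) = @_; my $self = { received => [] }; return bless $self, $class }
sub deliver { my ( $self, $event ) = @_; push @{ $self->{received} }, $event; return }

# Hypothetical handler: skip() ignores the event.
package SketchHandler;
sub new  { my ( $class, %args ) = @_; my $self = {%args}; return bless $self, $class }
sub skip { return }

# Hypothetical filter: skip() passes the event on to its handler.
package SketchFilter;
our @ISA = ('SketchHandler');
sub skip {
    my ( $self, $event ) = @_;
    $self->{Handler}->deliver($event);
    return;
}

package main;

my $downstream = Collector->new;
my $handler    = SketchHandler->new;
my $filter     = SketchFilter->new( Handler => $downstream );

$handler->skip("a comment event");    # dropped
$filter->skip("a comment event");     # forwarded
print scalar @{ $downstream->{received} }, "\n";    # 1
```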

next_event

Returns the event that the next call to get() will return. Dies if at xeof. Does not set $_.

NOTE: NOT YET IMPLEMENTED IN THREADED MODE.
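A buffered-mode sketch of the peek semantics, assuming a plain array of event type names. The helpers get_event, at_eof, and next_event below are hypothetical stand-ins for get(), xeof, and next_event(), not this module's functions.

```perl
use strict;
use warnings;

# Hypothetical buffered event queue (event type names only).
my @queue     = qw( start_document start_element end_element end_document );
my $last_read = "";

# Hypothetical get_event(): consume the next event and set $_.
sub get_event {
    die "read past end of document\n" unless @queue;
    $last_read = shift @queue;
    $_ = $last_read;
    return $last_read;
}

# Hypothetical at_eof(): was the last event read an end_document?
sub at_eof { return $last_read eq "end_document" }

# Hypothetical next_event(): peek without consuming and without
# touching $_; die if already at end of document.
sub next_event {
    die "next_event() called at end of document\n" if at_eof();
    return $queue[0];
}

$_ = "untouched";
my $peeked     = next_event();    # "start_document"
my $after_peek = $_;              # still "untouched"
my $got        = get_event();     # "start_document" again, now in $_

get_event() until at_eof();       # drain the rest of the document
my $peek_ok = eval { next_event(); 1 };    # undef: next_event() died
```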

on
    on(
        "start_document::*" => sub { warn "start of document reached" },
        "end_document::*"   => sub { warn "end of document reached"   },
    );

This declares that a rule should be in effect until the end of the document is reached. Each rule is a ( $pattern => $action ) pair where $pattern is an EventPath pattern (see XML::Filter::Dispatcher) and $action is a subroutine reference.

The Essex event object matched is passed in $_[1]. A reference to the current Essex handler is passed in $_[0]. This allows you to write libraries of functions that access the current Essex handler/filter/whatever.

Do not call get() in the actions; you'll confuse everything. That's a limitation that should be lifted one day.

For now, this must be called before the first get() for predictable results.

Rules remain in effect after the main() routine has exited to facilitate purely rule-based processing.
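A minimal sketch of rule dispatch in the spirit of on(), assuming a hypothetical RuleSketch class. Real patterns are EventPath expressions handled by XML::Filter::Dispatcher; here a pattern such as "start_document::*" is simply compared with the event's type after stripping the trailing "::*". As described above, each action receives the handler in $_[0] and the event in $_[1].

```perl
use strict;
use warnings;

package RuleSketch;

sub new { my ($class) = @_; my $self = { rules => [] }; return bless $self, $class }

# Register ( $pattern => $action ) pairs, in the style of on().
sub on {
    my ( $self, @pairs ) = @_;
    while ( my ( $pattern, $action ) = splice @pairs, 0, 2 ) {
        push @{ $self->{rules} }, [ $pattern, $action ];
    }
    return;
}

# Offer one event to every rule. The action receives the handler in
# $_[0] and the event in $_[1].
sub dispatch {
    my ( $self, $event ) = @_;
    for my $rule ( @{ $self->{rules} } ) {
        my ( $pattern, $action ) = @$rule;
        ( my $type = $pattern ) =~ s/::\*\z//;    # "start_document::*" -> "start_document"
        $action->( $self, $event ) if $event->{type} eq $type;
    }
    return;
}

package main;

my @log;
my $h = RuleSketch->new;
$h->on(
    "start_document::*" => sub { push @log, ref( $_[0] ) . " saw " . $_[1]{type} },
    "end_document::*"   => sub { push @log, ref( $_[0] ) . " saw " . $_[1]{type} },
);
$h->dispatch( { type => $_ } ) for qw( start_document start_element end_document );
print "$_\n" for @log;
```

Because the rules live on the handler object rather than in the main loop, they keep firing for as long as events are dispatched, which matches the note that rules outlive main().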

LIMITATIONS

COPYRIGHT

    Copyright 2002, R. Barrie Slaymaker, Jr., All Rights Reserved

LICENSE

You may use this module under the terms of the BSD, Artistic, or GPL licenses, any version.

AUTHOR

Barrie Slaymaker <barries@slaysys.com>
