NAME

XML::Traverse::ParseTree - iterators and getters for xml-access

SYNOPSIS

    my $xml = XML::Parser->new(Style => "Tree")->parse($xmlcont);
    my $h   = XML::Traverse::ParseTree->new();

    my $a1  = $h->get($xml,'document','section','entries');
    my $i   = $h->child_iterator($a1);
    while (my $e = $i->()) {
        ...
        $attr = $h->get($e,'another-child-element','@attribute-name');
        $text = $h->get($e,'#TEXT');
    }
    ...
    my $filter = sub { ... };
    my $i   = $h->child_iterator($xml,$filter);
    while (my $e = $i->()) {
        ...
    }
    ...
    my $i = $h->get($xml,'section[*]','sections[*]','#TEXT');
    my $i = $h->get($xml,'//sections');
    my $i = $h->get($xml,'section[2]','sections[3]','*');

DESCRIPTION

XML::Traverse::ParseTree supplies iterators and getters for accessing the contents of an XML document. The document must already have been parsed with XML::Parser (Tree style).
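
For orientation, the sketch below spells out the data structure that the Tree style of XML::Parser produces, as described in the XML::Parser documentation: each element is a tag/content pair, the content is an array reference whose first item is the attribute hash, and text nodes use the pseudo-tag 0. The sample document and the variable names are made up for illustration, and the literal is written out by hand so that the snippet runs without any modules installed.

```perl
use strict;
use warnings;

# Hand-written equivalent of what
#   XML::Parser->new(Style => "Tree")->parse(
#       '<doc id="d1">Hello <em>there</em></doc>')
# returns, following the XML::Parser documentation for the Tree style.
my $tree = [
    'doc', [
        { id => 'd1' },            # attributes of <doc>
        0,    'Hello ',            # text node: pseudo-tag 0, then the string
        'em', [ {}, 0, 'there' ],  # child element: tag, then its content
    ],
];

my ($root_name, $content) = @$tree;
my $attrs = $content->[0];

print "root element: $root_name\n";
print "id attribute: $attrs->{id}\n";
```

A structure of this shape is what the methods below expect to receive as the parse tree.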

METHODS

new()

Creates an instance of XML::Traverse::ParseTree. Currently, this instance has no intrinsic state. Although the methods could be called statically, this is not recommended. (Possible extension: support for different character encodings.)

get_element_name($current)

Returns the element name of the current element.

get_element_attrs($current)

Returns all attributes of the current element.

get_element_text($current)

Returns the text of the current element.

get($parse_tree,access_path [,access_path ...])

General-purpose access method. Depending on the access path elements, it returns an iterator ("iterator context") or a scalar value. The returned value may be an element (a position in the parse tree), an attribute value, all attributes of an element, or the contents of a text node.

The access path consists of one or more entries. Each entry specifies a hierarchy level. The last entry specifies whether an attribute value is requested (prefix @), the text (special value #TEXT), or an element (a position in the parse tree). Examples:

    $h->get($current,'@id') - returns the value of the attribute "id" of the current element
    $h->get($current,'a-child') - returns the first child element named "a-child"
    $h->get($current,'#TEXT') - returns the text node of the current element
    $h->get($current,'section[2]') - returns the second section child element
    $h->get($current,'section[*]') - returns an iterator over all child elements named section
    $h->get($current,'//section')  - returns an iterator over all child elements named section on all hierarchy levels AT and BELOW $current

More than one entry in the access path means more hierarchy levels, e.g.:

    $h->get($current,'document','sections','section','@id')

Returns the value of the attribute "id" of the element "section", which is a child of an element "sections", which in turn is a child of an element named "document"; the "document" element is a child of the current element. (XPath style: document/sections/section/@id)

    $h->get($current,'document','#TEXT')

Returns the text of the element "document" (including the text of all of its children), which is a child element of the current element. For example, given

    <current><document>abc<sub>child</sub>def</document></current>

only the concatenated text "abcchilddef" is returned.
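
The concatenation of nested text can be sketched as a small recursive walk over the Tree-style content array. This is only an illustration, assuming the parse tree follows the layout documented by XML::Parser; collect_text is a hypothetical helper, not a function of this module, and the module's own implementation may differ.

```perl
use strict;
use warnings;

# Content array of <document>abc<sub>child</sub>def</document> in
# XML::Parser Tree style: attribute hash first, then tag/content pairs.
my $document_content = [ {}, 0, 'abc', 'sub', [ {}, 0, 'child' ], 0, 'def' ];

# Hypothetical helper: concatenate all text at and below an element.
sub collect_text {
    my ($content) = @_;
    my $text = '';
    # Skip the attribute hash at index 0, then walk the pairs.
    for (my $i = 1; $i < @$content; $i += 2) {
        my ($tag, $value) = @{$content}[ $i, $i + 1 ];
        # Pseudo-tag 0 marks a text node; anything else is a child element.
        $text .= $tag eq '0' ? $value : collect_text($value);
    }
    return $text;
}

print collect_text($document_content), "\n";   # prints "abcchilddef"
```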

More (advanced) examples:

    $h->get($current,'sub1[*]','sub2[*]')

Returns an iterator over all sub2 elements which are child elements of all sub1 elements, which in turn are child elements of $current.

    $h->get($current,'sub1[*]','sub2[2]')

Returns an iterator over sub2 elements. Only those sub2 elements are returned that are in second position within their respective parents. Example:

    <xml>
        <sub1>
            <sub2 id="1"/>
            <sub2 id="2"/>
            <sub2 id="3"/>
        </sub1>
        <sub1>
            <sub2 id="4"/>
            <sub2 id="5"/>
        </sub1>
        <sub1>
            <sub2 id="6"/>
        </sub1>
    </xml>

With the above mentioned get:

    $h->get($current,'sub1[*]','sub2[2]')

an iterator is returned that delivers the elements with the ids 2 and 5.
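
The positional selection can be reproduced by hand on a Tree-style structure. The sketch below is not the module's implementation: children_named is a hypothetical helper, and the literal omits the whitespace text nodes that a real parse of the indented document above would contain.

```perl
use strict;
use warnings;

# Content of the <xml> element above in XML::Parser Tree style
# (whitespace text nodes left out for brevity).
my $xml_content = [
    {},
    'sub1', [ {}, 'sub2', [ { id => 1 } ], 'sub2', [ { id => 2 } ], 'sub2', [ { id => 3 } ] ],
    'sub1', [ {}, 'sub2', [ { id => 4 } ], 'sub2', [ { id => 5 } ] ],
    'sub1', [ {}, 'sub2', [ { id => 6 } ] ],
];

# Hypothetical helper: content arrays of all children with the given name.
sub children_named {
    my ($content, $name) = @_;
    my @found;
    for (my $i = 1; $i < @$content; $i += 2) {
        push @found, $content->[ $i + 1 ] if $content->[$i] eq $name;
    }
    return @found;
}

my @ids;
for my $sub1 (children_named($xml_content, 'sub1')) {
    my @sub2 = children_named($sub1, 'sub2');
    # Position 2 is index 1; parents with fewer than two sub2 children
    # contribute nothing.
    push @ids, $sub2[1][0]{id} if @sub2 >= 2;
}

print "@ids\n";   # prints "2 5"
```

The third sub1 element has only one sub2 child, so it contributes no element to the result.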

    $h->get($current,'sub1[*]','#TEXT')

returns an iterator which delivers the text content of all sub1 elements. Caution: if a sub1 element has no text at all, undef is returned. This undef cannot be distinguished from the undef that terminates the iteration.
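
The ambiguity is easy to reproduce with any closure-based iterator. The sketch below uses a hypothetical make_iterator over a plain list rather than the module itself; it only illustrates why a loop that tests the returned value ends early.

```perl
use strict;
use warnings;

# Hypothetical closure iterator: returns undef both for an empty item
# and once the list is exhausted.
sub make_iterator {
    my @items = @_;
    return sub { return shift @items };
}

my $next = make_iterator('first', undef, 'third');

my @seen;
while (defined(my $text = $next->())) {
    push @seen, $text;
}

# The loop stopped at the undef in the middle; 'third' was never reached.
print scalar(@seen), " item(s) seen: @seen\n";   # prints "1 item(s) seen: first"
```

One way around this, using only the calls documented above, might be to iterate over the elements with 'sub1[*]' and request '#TEXT' from each element inside the loop, so that a missing text does not end the iteration.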

    $h->get($current,'@*')

returns a hashref containing all attributes of the current element (no iterator!)

    $h->get($current,'sub1[3]','//sub2')

returns an iterator over all sub2 elements on all hierarchy levels below the third sub1 element.

child_iterator($current,[$name|$coderef])

Returns an iterator over child elements (one hierarchy level below $current). When neither $name nor $coderef is given, all child elements are iterated.

If a name (scalar) is given, only child elements with that name will be iterated.

If a coderef is given, only those child elements are iterated for which the given function evaluates to true. The respective element is passed as a parameter. Example:

    my $filter = sub {
        $pkg->get($_[0],'@class') eq "heading" ||
        defined $pkg->get($_[0],'@style')
    };
    $i = $pkg->child_iterator($current,$filter);

dfs_iterator($current,[$name|$filter])

Returns an iterator over the current element and its child elements on all hierarchy levels. The order is depth-first (more precisely: the current element first, then its children). For the meaning of $name and $filter, see child_iterator above.
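
The visiting order can be illustrated with a small closure-based iterator over a Tree-style structure. This is a sketch under the assumption that elements are handled as [name, content] pairs; make_dfs_iterator is a hypothetical name, and the module's internal representation and implementation may differ.

```perl
use strict;
use warnings;

# Tree-style parse of <a><b><c/></b><d/></a>, written out by hand.
my $tree = [ 'a', [ {}, 'b', [ {}, 'c', [ {} ] ], 'd', [ {} ] ] ];

# Hypothetical pre-order iterator: yields [name, content] pairs,
# the current element first, then its descendants.
sub make_dfs_iterator {
    my ($element) = @_;
    my @stack = ($element);
    return sub {
        my $current = pop @stack or return;
        my $content = $current->[1];
        my @children;
        for (my $i = 1; $i < @$content; $i += 2) {
            next if $content->[$i] eq '0';    # skip text nodes
            push @children, [ $content->[$i], $content->[ $i + 1 ] ];
        }
        # Push in reverse so that the first child is visited next.
        push @stack, reverse @children;
        return $current;
    };
}

my $next = make_dfs_iterator($tree);
my @order;
while (my $e = $next->()) {
    push @order, $e->[0];
}
print "@order\n";   # prints "a b c d"
```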

element_to_object($current)

Creates a hashref with the contents of the current element (experimental).

getter(access_path [,access_path ...])

Returns a curried version of get(). This is useful in cases where the same access path is used in different places. Example:

    *get_id = $pkg->getter('@id');

    $i = $pkg->child_iterator($xml);
    while (my $e = $i->()) {
        if (get_id($e) eq '45') {
            ...
        }
        ...
    }
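
The idea behind currying can be sketched with plain closures. In the snippet below, get_attr and attr_getter are hypothetical stand-ins for get() and getter(), operating on hand-written [name, content] pairs; they are not part of this module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get(): looks up one attribute of an element
# given as a [name, content] pair in Tree style.
sub get_attr {
    my ($element, $attr_name) = @_;
    return $element->[1][0]{$attr_name};
}

# Currying: fix the attribute name now, supply the element later.
sub attr_getter {
    my ($attr_name) = @_;
    return sub { return get_attr($_[0], $attr_name) };
}

my $get_id = attr_getter('id');

my @elements = (
    [ 'item', [ { id => '44' } ] ],
    [ 'item', [ { id => '45' } ] ],
);

my @matches = grep { $get_id->($_) eq '45' } @elements;
print scalar(@matches), " element(s) with id 45\n";   # prints "1 element(s) with id 45"
```

The sketch keeps the curried function in a lexical variable, whereas the example above assigns it to a glob so that it can be called as a named function.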

BUGS

None known.

SEE ALSO

  Concerning the concepts of iterators using closures/anonymous subs: 
  http://hop.perl.plover.com/

AUTHOR

  Martin Busik <martin.busik@busik.de>