

YAX::Query - Query the YAX DOM


 use YAX::Query;

 $q = YAX::Query->new( $node );
 $q->select( $expr );

 # method interface
 $q->children( $type );
 $q->child( $tag_name );
 $q->attribute( $name );
 $q->filter( \&code );


This module implements a tool for querying a YAX DOM tree. It supports an expression parser for simple querying of the DOM using an E4X-ish syntax, as well as a method interface.

Note that a YAX::Query object is a blessed array reference, and the nodes matching a query are stored in that array. All query methods therefore return the query object itself, and to access the results you simply iterate over it. For example, the following finds all text nodes that are children of `em' elements, which in turn are children of all `div' descendants:

 my $q = YAX::Query->new( $node );
 $q->select('..div.em.#text');
 for my $found ( @$q ) {
     # $found is a YAX::Text node
 }

Because the select method returns the query object itself, the following, which selects all `li' descendants that have a `foo' attribute equal to "bar", also works:

 for my $item ( @{ $q->select(q{..li.(@foo eq "bar")}) } ) {
     # $item is an `li' element whose `foo' attribute is "bar"
 }
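To make the blessed-array pattern concrete, here is a minimal sketch in plain Perl. MiniQuery, its children and filter methods, and the { name => ..., kids => [...] } node layout are hypothetical stand-ins, not YAX::Query's real internals; the point is only that each method rewrites @$self and returns $self, which is what makes both chaining and iterating over @$q work.

```perl
use strict;
use warnings;

# Hypothetical, stripped-down query class (NOT YAX::Query itself).
# The object is a blessed array reference holding the current result set.
package MiniQuery;

sub new {
    my ( $class, $node ) = @_;
    return bless [ $node ], $class;    # initial set: just the context node
}

# Replace the set with the children of every node in it.
# Assumes hypothetical nodes of the form { name => ..., kids => [...] }.
sub children {
    my $self = shift;
    @$self = map { @{ $_->{kids} || [] } } @$self;
    return $self;                      # return the object to allow chaining
}

# Keep only the nodes for which $code returns true; $code sees the node in $_.
sub filter {
    my ( $self, $code ) = @_;
    @$self = grep { $code->() } @$self;
    return $self;
}

package main;

my $tree = {
    name => 'ul',
    kids => [ { name => 'li', kids => [] }, { name => 'p', kids => [] } ],
};

my $q = MiniQuery->new($tree);
$q->children->filter( sub { $_->{name} eq 'li' } );    # chained calls
my @names = map { $_->{name} } @$q;                     # results live in @$q
print "@names\n";                                       # li
```

Since every method returns the same object, `$q->children->filter(...)` and two separate statements leave @$q in the same state.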


A query expression is a sequence of tokens separated by a literal `.' (dot). Each successive token applies an operation to the set produced by the previous token.

In the initial state, the set of nodes contains only the context node passed to the constructor: YAX::Query->new( $node ).
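The evaluation model can be sketched as a fold over the tokens: start with a one-element set and let each token map that set to a new one. This is a simplified illustration under stated assumptions (hypothetical hash nodes, only the `*' and element-name tokens, and a naive split on dots), not YAX::Query's parser.

```perl
use strict;
use warnings;

# Hypothetical node layout: { name => ..., kids => [...] }.
sub kids_of { return map { @{ $_->{kids} || [] } } @_ }

# Map the current set to a new set for one token. Only '*' (all element
# children) and a bare name (children with that name) are handled here.
sub apply_token {
    my ( $token, @set ) = @_;
    return kids_of(@set) if $token eq '*';
    return grep { $_->{name} eq $token } kids_of(@set);
}

sub run_query {
    my ( $node, $expr ) = @_;
    my @set = ($node);                       # initial state: the context node
    for my $token ( split /\./, $expr ) {    # naive split on literal dots
        @set = apply_token( $token, @set );  # each token transforms the set
    }
    return @set;
}

my $doc = {
    name => 'html',
    kids => [
        { name => 'body', kids => [
            { name => 'div', kids => [ { name => 'p' }, { name => 'span' } ] },
            { name => 'div', kids => [ { name => 'p' } ] },
        ] },
    ],
};

my @found = run_query( $doc, 'body.div.*' );
print join( ',', map { $_->{name} } @found ), "\n";    # p,span,p
```

A plain split on `.' cannot handle the `..' token or filters whose Perl code contains dots; a real lexer has to treat those cases specially.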

Filters are enclosed in `(' and `)' and contain ordinary Perl expressions, except that tokens of the form /\@(\w+)/ are replaced with $_->{$1}, where `$_' is the current node in the loop that applies the filter.
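The @name rewrite can be sketched with a substitution followed by a string eval that builds a code reference. compile_filter is a hypothetical helper, and elements are modelled as plain hashes keyed by attribute name, which matches the $_->{$1} form described above but is otherwise an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite @name -> $_->{name}, then compile to a sub.
sub compile_filter {
    my ($expr) = @_;
    ( my $perl = $expr ) =~ s/\@(\w+)/\$_->{$1}/g;
    my $code = eval "sub { $perl }";
    die "bad filter expression '$expr': $@" unless $code;
    return ( $code, $perl );
}

# Elements modelled as hashes whose keys are attribute names (assumption).
my @items = ( { foo => 'bar' }, { foo => 'baz' }, { id => 'x' } );

my ( $code, $perl ) = compile_filter('defined @foo && @foo eq "bar"');
print "compiled: $perl\n";    # compiled: defined $_->{foo} && $_->{foo} eq "bar"

# Apply the filter with $_ aliased to each node in turn.
my @matches = grep { $code->() } @items;
print scalar(@matches), " match\n";    # 1 match
```

A substitution this simple also rewrites an `@' that appears inside a string literal in the filter text (an e-mail address, for instance), so a sketch like this one would mangle such strings.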

The following is a list of valid tokens:


'..'

descendants of


'*'

all element children of


'element_name'

all elements named element_name


'@*'

all attributes of

NOTE: This adds the hash reference of the element itself, not a list of attribute values. Moreover, a node selector following this in the sequence is meaningless, since attributes cannot have children; an exception will be raised if this occurs.


'@attribute_name'

all attributes named attribute_name

NOTE: This adds a list of attribute values to the set. As above, node selectors following this are meaningless and will raise an exception.


'parent()'

parent nodes of the set


'#text'

all text children


'#processing-instruction'

all processing instruction children


'#cdata'

all CDATA children


all child nodes of


'#comment'

all comment children of

'.( $expr )'

Applies the filter $expr by compiling it into a Perl code reference. Expressions are Perl, except that tokens of the form /\@(\w+)/ are replaced with $_->{$1}, where `$_' is the current node in the loop that applies the filter.


the n-th element of the set
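The NOTEs on the attribute selectors above say that a node selector following an attribute selector raises an exception. One way an evaluator could enforce that is to remember when the set holds attribute values and refuse node selectors from then on. The sketch below assumes a hypothetical node layout ({ name, attrs, kids }), supports only element names and '@name', and is not YAX::Query's actual code.

```perl
use strict;
use warnings;

# Hypothetical node layout: { name => ..., attrs => {...}, kids => [...] }.
# Supported tokens: a bare element name (children with that name) and
# '@name' (values of that attribute).
sub evaluate {
    my ( $node, @tokens ) = @_;
    my @set       = ($node);
    my $have_attr = 0;    # true once the set holds attribute values
    for my $token (@tokens) {
        # Every supported token after '@name' is a node selector: refuse it.
        die "node selector '$token' after an attribute selector\n" if $have_attr;
        if ( $token =~ /^\@(\w+)$/ ) {
            my $name = $1;
            @set = map  { $_->{attrs}{$name} }
                   grep { exists $_->{attrs}{$name} } @set;
            $have_attr = 1;
        }
        else {
            @set = grep { $_->{name} eq $token }
                   map  { @{ $_->{kids} || [] } } @set;
        }
    }
    return @set;
}

my $doc = {
    name => 'ul',
    kids => [
        { name => 'li', attrs => { foo => 'bar' } },
        { name => 'li', attrs => {} },
    ],
};

my @vals = evaluate( $doc, 'li', '@foo' );
print "values: @vals\n";    # values: bar

my $ok = eval { evaluate( $doc, 'li', '@foo', 'b' ); 1 };
print $ok ? "no error\n" : "error: $@";
```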


new( $node )

Creates a query object whose initial set contains only the context node $node.


select( $expr )

Evaluates $expr and returns the query object itself. The results are simply the elements of the query object, which is a blessed array reference. This allows for chaining and piecemeal querying. The following shows some different ways of achieving the same thing:

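 # step by step
 my $q = YAX::Query->new( $node );
 $q->select('..div.*');         # get all children of all `div' descendants
 $q->filter( \&filter );        # filter the set obtained on the line above

 # chained, starting from a fresh query: same result as the two calls above
 my $q2 = YAX::Query->new( $node );
 $q2->select('..div.*')->filter( \&filter );

 # or the equivalent, using grep on the result set
 my $q3 = YAX::Query->new( $node );
 my @found = grep { filter( $_ ) } @{ $q3->select('..div.*') };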

parent()

Selects the parent nodes of the set. See `.parent()' above.

children( $type )

Selects child nodes of type $type (see YAX::Constants for valid types). The `#text', `#cdata', `#processing-instruction' and `#comment' selectors are implemented with children(...).

child( $name )

Selects elements named $name.

attribute( $name )

Selects attribute values named $name.


Selects the attributes hash for each element in the set.


Selects descendants for each element in the set.


filter( \&code )

Applies the passed code reference to each element in the set, adding the element to the resulting set if and only if the code reference returns a true value.
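As the children( $type ) entry explains, the `#'-selectors are routed through children(...). A minimal sketch of that dispatch follows, assuming hypothetical numeric type constants (borrowed from the W3C DOM nodeType numbering purely for illustration; the real names and values live in YAX::Constants) and a hypothetical { type, kids } node layout.

```perl
use strict;
use warnings;

# Hypothetical type constants (W3C DOM nodeType numbers, for illustration only).
use constant { ELEMENT => 1, TEXT => 3, CDATA => 4, PI => 7, COMMENT => 8 };

# Each '#' selector maps to a node type handled by children().
my %selector_type = (
    '#text'                   => TEXT,
    '#cdata'                  => CDATA,
    '#processing-instruction' => PI,
    '#comment'                => COMMENT,
);

# children($type, @set): children of every node in @set whose type is $type.
sub children {
    my ( $type, @set ) = @_;
    return grep { $_->{type} == $type } map { @{ $_->{kids} || [] } } @set;
}

my $em = {
    type => ELEMENT,
    kids => [
        { type => TEXT,    data => 'hello' },
        { type => COMMENT, data => 'note' },
        { type => TEXT,    data => 'world' },
    ],
};

my @texts = children( $selector_type{'#text'}, $em );
print join( ' ', map { $_->{data} } @texts ), "\n";    # hello world
```

A lookup table keeps the selector syntax separate from the traversal, so adding another `#'-selector only means adding one entry.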


Syntax errors in the expressions are currently not handled very well. If the expression doesn't parse, an exception is raised, but because of the simplicity of the lexer, the information required to inform the user of exactly what went wrong is unavailable.

Changing this would require a more complex parser, which would significantly impact performance. I'm reluctant to do that, since query expressions tend to be short enough to debug by inspection.

Result sets from a query are not "live". That is, if a node is removed from or added to the DOM tree after the query is performed, these changes will not be reflected in the query result set.
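A small plain-Perl illustration of snapshot semantics, using hashes in place of YAX nodes (an assumption made so the example is self-contained): the result array holds its own list of node references, so removing or adding tree nodes afterwards leaves it unchanged.

```perl
use strict;
use warnings;

# Plain hashes stand in for YAX nodes (assumption for illustration).
my $list = { name => 'ul', kids => [ map { +{ name => 'li', n => $_ } } 1 .. 3 ] };

# "Run a query": copy references to the matching nodes into a separate array.
my @result = grep { $_->{name} eq 'li' } @{ $list->{kids} };

shift @{ $list->{kids} };                             # remove the first <li>
push @{ $list->{kids} }, { name => 'li', n => 4 };    # add a new <li>

my $tree_ns   = join ' ', map { $_->{n} } @{ $list->{kids} };
my $result_ns = join ' ', map { $_->{n} } @result;
print "tree:   $tree_ns\n";      # tree:   2 3 4
print "result: $result_ns\n";    # result: 1 2 3
```

Because the array holds references to the same hashes, a change made to a node that is still in the tree would be visible through @result too; in this sketch only the membership of the set is frozen.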


See t/03-query.t in the test suite for an extensive list of examples.


 Richard Hundt


This program is free software and may be used and distributed under the same terms as Perl itself.
