MATSUNO★Tokuhiro > Web-Query-0.15 > Web::Query

Download:
Web-Query-0.15.tar.gz

Dependencies

Annotate this POD

Website

View/Report Bugs
Module Version: 0.15   Source   Latest Release: Web-Query-0.26

NAME ^

Web::Query - Yet another scraping library like jQuery

SYNOPSIS ^

    use Web::Query;

    wq('http://google.com/search?q=foobar')
          ->find('h2')
          ->each(sub {
                my $i = shift;
                printf("%d) %s\n", $i+1, $_->text
          });

DESCRIPTION ^

Web::Query is a yet another scraping framework, have a jQuery like interface.

Yes, I know ingy's pQuery. But it's just a alpha quality. It doesn't works. Web::Query built at top of the CPAN modules, HTML::TreeBuilder::XPath, LWP::UserAgent, and HTML::Selector::XPath.

So, this module uses HTML::Selector::XPath and only supports the CSS 3 selector supported by that module. Web::Query doesn't support jQuery's extended queries(yet?).

THIS LIBRARY IS UNDER DEVELOPMENT. ANY API MAY CHANGE WITHOUT NOTICE.

FUNCTIONS ^

wq($stuff)

This is a shortcut for Web::Query->new($stuff). This function is exported by default.

METHODS ^

my $q = Web::Query->new($stuff, \%options )

Create new instance of Web::Query. You can make the instance from URL(http, https, file scheme), HTML in string, URL in string, URI object, and instance of HTML::Element.

This method throw the exception on unknown $stuff.

This method returns undefined value on non-successful response with URL.

Currently, the only option valid option is indent, which will be used as the indentation string if the object is printed.

my $q = Web::Query->new_from_element($element: HTML::Element)

Create new instance of Web::Query from instance of HTML::Element.

my $q = Web::Query->new_from_html($html: Str)

Create new instance of Web::Query from HTML.

my $q = Web::Query->new_from_url($url: Str)

Create new instance of Web::Query from URL.

If the response is not success(It means /^20[0-9]$/), this method returns undefined value.

You can get a last result of response, use the $Web::Query::RESPONSE.

Here is a best practical code:

    my $url = 'http://example.com/';
    my $q = Web::Query->new_from_url($url)
        or die "Cannot get a resource from $url: " . Web::Query->last_response()->status_line;
my $q = Web::Query->new_from_file($file_name: Str)

Create new instance of Web::Query from file name.

my @html = $q->html();
my $html = $q->html();
$q->html('<p>foo</p>');

Get/Set the innerHTML.

$q->as_html();

Return the elements associated with the object as strings. If called in a scalar context, only return the string representation of the first element.

my @text = $q->text();
my $text = $q->text();
$q->text('text');

Get/Set the inner text.

my $attr = $q->attr($name);
$q->attr($name, $val);

Get/Set the attribute value in element.

$q = $q->find($selector)

This method find nodes by $selector from $q. $selector is a CSS3 selector.

$q->each(sub { my ($i, $elem) = @_; ... })

Visit each nodes. $i is a counter value, 0 origin. $elem is iteration item. $_ is localized by $elem.

$q->map(sub { my ($i, $elem) = @_; ... })

Creates a new array with the results of calling a provided function on every element.

$q->filter(sub { my ($i, $elem) = @_; ... })

Reduce the elements to those that pass the function's test.

$q->end()

Back to the before context like jQuery.

my $size = $q->size() : Int

Return the number of DOM elements matched by the Web::Query object.

my $parent = $q->parent() : Web::Query

Return the parent node from $q.

my $first = $q->first()

Return the first matching element.

This method constructs a new Web::Query object from the first matching element.

my $last = $q->last()

Return the last matching element.

This method constructs a new Web::Query object from the last matching element.

$q->remove()

Delete the elements associated with the object from the DOM.

    # remove all <blink> tags from the document
    $q->find('blink')->remove;
$q->replace_with( $replacement );

Replace the elements of the object with the provided replacement. The replacement can be a string, a Web::Query object or an anonymous function. The anonymous function is passed the index of the current node and the node itself (with is also localized as $_).

    my $q = wq( '<p><b>Abra</b><i>cada</i><u>bra</u></p>' );

    $q->find('b')->replace_with('<a>Ocus</a>);
        # <p><a>Ocus</a><i>cada</i><u>bra</u></p>

    $q->find('u')->replace_with($q->find('b'));
        # <p><i>cada</i><b>Abra</b></p>

    $q->find('i')->replace_with(sub{ 
        my $name = $_->text;
        return "<$name></$name>";
    });
        # <p><b>Abra</b><cada></cada><u>bra</u></p>

HOW DO I CUSTOMIZE USER AGENT? ^

You can specify your own instance of LWP::UserAgent.

    $Web::Query::UserAgent = LWP::UserAgent->new( agent => 'Mozilla/5.0' );

INCOMPATIBLE CHANGES ^

0.10

new_from_url() is no longer throws exception on bad response from HTTP server.

AUTHOR ^

Tokuhiro Matsuno <tokuhirom AAJKLFJEF@ GMAIL COM>

SEE ALSO ^

pQuery

LICENSE ^

Copyright (C) Tokuhiro Matsuno

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

syntax highlighting: