WWW::Curl::UserAgent - UserAgent based on libcurl
version 0.9.7
    use HTTP::Request;
    use WWW::Curl::UserAgent;

    my $ua = WWW::Curl::UserAgent->new(
        timeout         => 10000,
        connect_timeout => 1000,
    );

    $ua->add_request(
        request    => HTTP::Request->new( GET => 'http://search.cpan.org/' ),
        on_success => sub {
            my ( $request, $response ) = @_;
            if ( $response->is_success ) {
                print $response->content;
            }
            else {
                die $response->status_line;
            }
        },
        on_failure => sub {
            my ( $request, $error_msg, $error_desc ) = @_;
            die "$error_msg: $error_desc";
        },
    );
    $ua->perform;
WWW::Curl::UserAgent is a web user agent based on libcurl. It can be used easily with HTTP::Request and HTTP::Response objects and handler callbacks. For an easier interface there is also a method to map a single request to a response.
WWW::Curl is used for the power of libcurl, which handles, among other things, connection keep-alive, parallel requests and asynchronous callbacks. This package was written because WWW::Curl::Simple does not handle keep-alive correctly and does not support PUT, HEAD and other request methods such as DELETE.
There is also a simpler interface, request(), which just returns an HTTP::Response for a given HTTP::Request. The normal way to use this library is to add as many requests with callbacks as your code allows and then run perform. The callbacks are executed sequentially as the responses arrive, beginning with the first response received. The simple request() method does not support this, because no callbacks are defined.
This library is in production use on https://www.xing.com.
The following constructor methods are available:
This method constructs a new WWW::Curl::UserAgent object and returns it. Key/value pair arguments may be provided to set up the initial state. The default values should be based on the default values of libcurl. The following options correspond to attribute methods described below:
    KEY                 DEFAULT
    -----------         --------------------
    user_agent_string   www.curl.useragent/$VERSION
    connect_timeout     300
    timeout             0
    parallel_requests   5
    keep_alive          1
    followlocation      0
    max_redirects       -1
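As an illustration of the key/value arguments listed above, the following sketch constructs a user agent with a few defaults overridden and reads two of them back through the attribute methods. The concrete values are arbitrary, and it is an assumption here that connect_timeout uses milliseconds like timeout does.

```perl
use strict;
use warnings;
use WWW::Curl::UserAgent;

# Override some of the defaults from the table above.
# timeout is documented in milliseconds; connect_timeout is assumed
# to use the same unit.
my $ua = WWW::Curl::UserAgent->new(
    connect_timeout   => 500,
    timeout           => 5000,
    parallel_requests => 10,
    followlocation    => 1,
    max_redirects     => 3,
);

# The options double as attribute methods, so the values can be read back.
printf "timeout=%d ms, parallel=%d\n", $ua->timeout, $ua->parallel_requests;
```

Options that are not passed keep the defaults shown in the table.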
Get/set the timeout in milliseconds waiting for the response to be received. If the response is not received within the timeout the on_failure handler is called.
Get/set the maximum number of requests performed in parallel. libcurl itself may use fewer parallel requests than this number, but never more.
Get/set whether TCP connections should be reused with keep-alive. If keep-alive is disabled, the TCP connection is closed after the response has been received and the corresponding header "Connection: close" is set. If keep-alive is enabled (default), libcurl manages the connections.
Get/set whether curl should follow redirects. The headers of the redirect responses are discarded while redirecting, so only the final response is passed to the corresponding handler.
Get/set the maximum number of redirects. -1 (default) means unlimited redirects. 0 means no redirects at all. If the maximum number of redirects is reached, the on_failure handler is called.
Get/set the user agent submitted in each request.
Get the number of queued requests that have not been performed yet.
Immediately perform a single HTTP::Request. Optional parameters can be passed, which override the user agent's settings for this single request. Possible options are:
    connect_timeout
    timeout
    keep_alive
    followlocation
    max_redirects
Some examples for a request:
    my $request = HTTP::Request->new( GET => 'http://search.cpan.org/' );

    my $response = $ua->request($request);

    $response = $ua->request(
        $request,
        timeout    => 3000,
        keep_alive => 0,
    );
If an error occurs, e.g. a timeout, the corresponding HTTP::Response object will have the status code 500, the short error description as its message and a longer error description as its content. request() runs perform() internally, so queued requests will be performed, too.
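The error convention described above can be handled with the ordinary HTTP::Response accessors. The sketch below uses a hypothetical helper, summarize_response, which is not part of WWW::Curl::UserAgent, and feeds it a hand-built response that stands in for what request() is described to return on a timeout. The message and content strings are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper (not part of WWW::Curl::UserAgent): turn a response
# returned by request() into a one-line summary.
sub summarize_response {
    my ($response) = @_;

    return 'ok: ' . length( $response->content ) . ' bytes'
        if $response->is_success;

    # For libcurl-level errors the text above describes: status code 500,
    # short description as message, long description as content.
    return sprintf 'failed (%d %s): %s',
        $response->code, $response->message, $response->content;
}

# Stand-in for a response produced after a timeout; the strings are
# illustrative, not actual libcurl output.
my $simulated = HTTP::Response->new(
    500, 'Timeout was reached', [], 'Operation timed out'
);

print summarize_response($simulated), "\n";
```

Note that a status code of 500 alone does not tell a libcurl-level failure apart from a genuine server error, so the message and content are worth logging as well.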
Adds a request together with callback handlers that are called when its response arrives. The on_success callback is called for every response that was read successfully, even one containing an error status code. The on_failure handler is called when libcurl reports an error, e.g. a timeout or bad curl settings. The parameters request, on_success and on_failure are mandatory. Optional are timeout, connect_timeout, keep_alive, followlocation and max_redirects.
    $ua->add_request(
        request    => HTTP::Request->new( GET => 'http://search.cpan.org/' ),
        on_success => sub {
            my ( $request, $response, $easy ) = @_;
            print $request->as_string;
            print $response->as_string;
        },
        on_failure => sub {
            my ( $request, $err_msg, $err_desc, $easy ) = @_;
            # error handling
        }
    );
The callbacks receive as their last parameter the WWW::Curl::Easy object that was used to perform the request. It can be used to obtain information such as statistical data about the request.
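As a sketch of reading such statistics, the example below asks the WWW::Curl::Easy object for the total transfer time through its getinfo method. It assumes that WWW::Curl::Easy exports the CURLINFO_TOTAL_TIME constant, as it does for the other libcurl constants, and it needs network access to do anything useful.

```perl
use strict;
use warnings;
use HTTP::Request;
use WWW::Curl::Easy;    # exports the CURLINFO_* constants
use WWW::Curl::UserAgent;

my $ua = WWW::Curl::UserAgent->new;

$ua->add_request(
    request    => HTTP::Request->new( GET => 'http://search.cpan.org/' ),
    on_success => sub {
        my ( $request, $response, $easy ) = @_;

        # getinfo() wraps libcurl's curl_easy_getinfo; the total time
        # is reported in seconds.
        printf "%s: %s in %.3f s\n",
            $request->uri, $response->status_line,
            $easy->getinfo(CURLINFO_TOTAL_TIME);
    },
    on_failure => sub {
        my ( $request, $err_msg, $err_desc, $easy ) = @_;
        warn "$err_msg: $err_desc\n";
    },
);

$ua->perform;
```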
Chaining of add_request calls is a feature of this module. If you add a request within an on_success handler, it is executed immediately when the callback runs. This can be useful for reacting to a response right away:
    $ua->add_request(
        request    => HTTP::Request->new( POST => 'http://search.cpan.org/', [], $form ),
        on_failure => sub { die },
        on_success => sub {
            my ( $request, $response ) = @_;

            my $target_url = get_target_from($response);
            $ua->add_request(
                request    => HTTP::Request->new( GET => $target_url ),
                on_failure => sub { die },
                on_success => sub {
                    my ( $request, $response ) = @_;
                    # actually do sth.
                }
            );
        },
    );
    $ua->perform;    # executes both requests
To have more control over the handler, you can add a WWW::Curl::UserAgent::Handler yourself. The WWW::Curl::UserAgent::Request inside the handler requires every parameter that is passed to libcurl to be given explicitly, so that default values are not defined in two places. The WWW::Curl::UserAgent::Request also makes it possible to modify the WWW::Curl::Easy object before the request is performed.
    my $handler = WWW::Curl::UserAgent::Handler->new(
        on_success => sub {
            my ( $request, $response, $easy ) = @_;
            print $request->as_string;
            print $response->as_string;
        },
        on_failure => sub {
            my ( $request, $err_msg, $err_desc, $easy ) = @_;
            # error handling
        },
        request => WWW::Curl::UserAgent::Request->new(
            http_request    => HTTP::Request->new( GET => 'http://search.cpan.org/' ),
            connect_timeout => $ua->connect_timeout,
            timeout         => $ua->timeout,
            keep_alive      => $ua->keep_alive,
            followlocation  => $ua->followlocation,
            max_redirects   => $ua->max_redirects,
        ),
    );

    $handler->request->curl_easy->setopt( ... );

    $ua->add_handler($handler);
Perform all queued requests. This method returns after all responses have been received and all handlers have been processed.
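A minimal sketch of the queue-then-perform pattern: several requests are added, each callback records its outcome in a hash, and a single perform call runs them all. The second URL and the option values are placeholders chosen for illustration, and the example needs network access.

```perl
use strict;
use warnings;
use HTTP::Request;
use WWW::Curl::UserAgent;

my $ua = WWW::Curl::UserAgent->new(
    timeout           => 5000,
    parallel_requests => 2,
);

# Placeholder URLs; substitute your own.
my @urls = ( 'http://search.cpan.org/', 'http://www.cpan.org/' );
my %outcome;

for my $url (@urls) {
    $ua->add_request(
        request    => HTTP::Request->new( GET => $url ),
        on_success => sub {
            my ( $request, $response ) = @_;
            $outcome{$url} = $response->status_line;
        },
        on_failure => sub {
            my ( $request, $err_msg, $err_desc ) = @_;
            $outcome{$url} = "failed: $err_msg";
        },
    );
}

# Returns once every response has arrived and every callback has run.
$ua->perform;

print "$_ => $outcome{$_}\n" for sort keys %outcome;
```

Because each callback closes over its own $url, the results can be matched to their requests regardless of the order in which the responses arrive.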
A test with the tools/benchmark.pl script against a load-balanced web server, performing GET requests to a simple echo API on an Intel i5 M 520 running Fedora 19, gave the following results:
    500 requests (sequentially, 500 iterations):
    +-------------------------------+-----------+------+------+------------+------------+
    | User Agent                    | Wallclock | CPU  | CPU  | Requests   | Iterations |
    |                               | seconds   | usr  | sys  | per second | per second |
    +-------------------------------+-----------+------+------+------------+------------+
    | LWP::UserAgent 6.05           |        21 | 1.10 | 0.20 |       23.8 |      384.6 |
    | LWP::Parallel::UserAgent 2.61 |        20 | 1.13 | 0.22 |       25.0 |      370.4 |
    | WWW::Curl::Simple 0.100191    |        95 | 0.66 | 0.27 |        5.3 |      537.6 |
    | Mojo::UserAgent 4.83          |        10 | 1.19 | 0.08 |       50.0 |      393.7 |
    | WWW::Curl::UserAgent 0.9.6    |        10 | 0.55 | 0.06 |       50.0 |      819.7 |
    +-------------------------------+-----------+------+------+------------+------------+

    500 requests (5 in parallel, 100 iterations):
    +-------------------------------+-----------+--------+--------+------------+------------+
    | User Agent                    | Wallclock | CPU    | CPU    | Requests   | Iterations |
    |                               | seconds   | usr    | sys    | per second | per second |
    +-------------------------------+-----------+--------+--------+------------+------------+
    | LWP::Parallel::UserAgent 2.61 |        10 |   1.26 |   0.26 |       50.0 |       65.8 |
    | WWW::Curl::Simple 0.100191    |       815 | 270.16 | 191.76 |        0.6 |        0.2 |
    | Mojo::UserAgent 4.83          |         3 |   1.03 |   0.04 |      166.7 |       93.5 |
    | WWW::Curl::UserAgent 0.9.6    |         3 |   0.42 |   0.06 |      166.7 |      208.3 |
    +-------------------------------+-----------+--------+--------+------------+------------+
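The two derived columns are not explained by the authors. Judging from the numbers alone, "Requests per second" appears to be the request count divided by wallclock seconds, and "Iterations per second" the iteration count divided by CPU time (usr + sys). The sketch below reproduces the WWW::Curl::UserAgent row of the sequential run under that assumption.

```perl
use strict;
use warnings;

# Values copied from the WWW::Curl::UserAgent row of the sequential run.
my ( $requests, $iterations ) = ( 500, 500 );
my ( $wallclock, $usr, $sys ) = ( 10, 0.55, 0.06 );

# Assumed derivation, inferred from the table rather than documented:
# requests relate to elapsed time, iterations to consumed CPU time.
my $requests_per_second   = $requests / $wallclock;
my $iterations_per_second = $iterations / ( $usr + $sys );

printf "%.1f requests/s, %.1f iterations/s\n",
    $requests_per_second, $iterations_per_second;
```

Read this way, the requests column mostly reflects network and server latency, while the iterations column reflects how much CPU time each client library spends per iteration.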
See HTTP::Request and HTTP::Response for a description of the message objects dispatched and received. See HTTP::Request::Common and HTML::Form for other ways to build request objects.
See WWW::Curl for a description of the settings and options possible on libcurl.
Julian Knocke
Othello Maurer
This software is copyright (c) 2014 by XING AG.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
To install WWW::Curl::UserAgent, copy and paste the appropriate command into your terminal.
cpanm
cpanm WWW::Curl::UserAgent
CPAN shell
    perl -MCPAN -e shell
    install WWW::Curl::UserAgent
For more information on module installation, please visit the detailed CPAN module installation guide.