NAME

LWP::Parallel::UserAgent - A class for parallel User Agents

SYNOPSIS

  require LWP::Parallel::UserAgent;
  $ua = LWP::Parallel::UserAgent->new();
  ...

  $ua->redirect (0); # prevents automatic following of redirects
  $ua->max_hosts(5); # sets maximum number of locations accessed in parallel
  $ua->max_req  (5); # sets maximum number of parallel requests per host
  ...
  $ua->register ($request); # or
  $ua->register ($request, '/tmp/sss'); # or
  $ua->register ($request, \&callback, 4096);
  ...
  $ua->wait ( $timeout ); 
  ...
  sub callback { my($data, $response, $protocol) = @_; .... }

DESCRIPTION

This class implements a user agent that accesses web sources in parallel.

Using a LWP::Parallel::UserAgent as your user agent, you typically start by registering your requests, along with how you want the Agent to process the incoming results (see $ua->register).

Then you wait for the results by calling $ua->wait. This method returns only once all requests have returned an answer or the Agent has timed out. Individual callback functions can also indicate that the Agent should stop waiting for requests and return (see $ua->register).

See the file LWP::Parallel for a set of simple examples.
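The register-then-wait cycle described above can be sketched as follows. This is a minimal illustration, not the module's canonical usage: the helper name fetch_in_parallel is hypothetical, and the assumption that $ua->wait returns a hash reference of entry objects offering a response method is taken from the LWP::Parallel examples rather than from this page.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

# Hypothetical helper: fetch a list of URLs in parallel and return a
# hash reference mapping each URL to its "code message" status line.
sub fetch_in_parallel {
    my ($timeout, @urls) = @_;

    my $ua = LWP::Parallel::UserAgent->new();
    $ua->redirect(1);     # follow redirects (the documented default)
    $ua->max_hosts(5);    # at most 5 locations accessed in parallel
    $ua->max_req(5);      # at most 5 parallel requests per host

    for my $url (@urls) {
        my $request = HTTP::Request->new(GET => $url);
        # register returns undef on success and an error object otherwise
        if (my $error = $ua->register($request)) {
            warn "could not register $url\n";
        }
    }

    # Assumption: wait returns a hash reference of entry objects, each
    # offering a response method, as in the LWP::Parallel examples.
    my $entries = $ua->wait($timeout);

    my %status;
    for my $entry (values %$entries) {
        my $response = $entry->response;
        $status{ $response->request->uri } =
            $response->code . ' ' . $response->message;
    }
    return \%status;
}
```

A call such as fetch_in_parallel(10, 'http://www.example.com/') would register one GET request and wait up to 10 seconds for it.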

METHODS

The LWP::Parallel::UserAgent is a sub-class of LWP::UserAgent, but not all of its methods are available here. However, you can use its main methods, $ua->simple_request and $ua->request, in order to simulate singular access with this package. Of course, if a single request is all you need, then you should probably use LWP::UserAgent in the first place, since it will be faster than our emulation here.

For parallel access, you will need to use the new methods that come with LWP::Parallel::UserAgent, called $pua->register and $pua->wait. See below for more information on each method.

$ua = LWP::Parallel::UserAgent->new();

Constructor for the parallel UserAgent. Returns a reference to a LWP::Parallel::UserAgent object.

Optionally, you can give it an existing LWP::Parallel::UserAgent (or even an LWP::UserAgent) as a first argument, and it will "clone" a new one from it. (This just copies the behavior of LWP::UserAgent. I have never actually tried this, so let me know if it does not do what you want.)

$ua->initialize;

Takes no arguments and initializes the UserAgent. It is automatically called in LWP::Parallel::UserAgent::new, so usually there is no need to call this explicitly.

However, if you want to re-use the same UserAgent object for a number of "runs", you should call $ua->initialize after you have processed the results of the previous call to $ua->wait, but before registering any new requests.
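Re-using one agent over several runs might look like the sketch below. The helper name run_batches and its argument layout are hypothetical, and the shape of the value returned by $ua->wait (a hash reference of entry objects with a response method) is an assumption based on the LWP::Parallel examples.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::Parallel::UserAgent;

# Hypothetical helper: process several batches of URLs with one agent.
# $ua is an LWP::Parallel::UserAgent; each batch is an array ref of URLs.
# Returns the number of runs completed.
sub run_batches {
    my ($ua, $timeout, @batches) = @_;
    my $runs = 0;

    for my $batch (@batches) {
        for my $url (@$batch) {
            my $error = $ua->register(HTTP::Request->new(GET => $url));
            warn "could not register $url\n" if $error;
        }

        # Assumption: wait returns a hash reference of entry objects
        # with a response method, as in the LWP::Parallel examples.
        my $entries = $ua->wait($timeout);
        for my $entry (values %$entries) {
            my $response = $entry->response;
            print $response->code, ' ', $response->message, "\n";
        }

        # Results are processed; reset before registering the next batch.
        $ua->initialize;
        $runs++;
    }
    return $runs;
}
```

The call to initialize sits after the results of wait have been handled and before the next round of register calls, which is the order the text above asks for.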

$ua->redirect ( $ok )

Changes the default value for permitting Parallel::UserAgent to follow redirects and authentication-requests. The standard value is 'true'.

See $ua->register for how to change the behaviour for particular requests only.

$ua->nonblock ( $ok )

By default, LWP::Parallel will connect to a site using a blocking call. If you want to speed this step up, you can try the new non-blocking version of the connect call by setting $ua->nonblock to 'true'. The standard value is 'false' (although this might change in the future if non-blocking connects turn out to be stable enough).

$ua->duplicates ( $ok )

Changes the default value for permitting Parallel::UserAgent to ignore duplicate requests. The standard value is 'false'.

$ua->in_order ( $ok )

Changes the default value for restricting Parallel::UserAgent to connecting to the registered sites in the order they were registered. The default value, 'false', allows Parallel::UserAgent to make the connections in an apparently random order.

$ua->remember_failures ( $yes )

If set to one, enables Parallel::UserAgent to ignore requests or connections to sites that it failed to connect to earlier during this "run". If set to zero (the default), Parallel::UserAgent will try to connect to every single URL you registered, even if it constantly fails to connect to a particular site.

$ua->max_hosts ( $max )

Changes the maximum number of locations accessed in parallel. The default value is 7.

Note: Although it says 'host', it really means 'netloc/server'! That is, multiple servers on the same host (e.g. one server running on port 80, the other one on port 6060) will count as two 'hosts'.

$ua->max_req ( $max )

Changes the maximum number of requests issued per host in parallel. The default value is 5.

$ua->register ( $request [, $arg [, $size [, $redirect_ok]]] )

Registers the given request with the User Agent. In case of an error, an HTTP::Response object containing the HTML error message is returned. Otherwise (that is, on success) it returns undef.

The $request should be a reference to a HTTP::Request object with values defined for at least the method() and url() attributes.

$size specifies the number of bytes Parallel::UserAgent should try to read each time some new data arrives. Setting it to '0' or 'undef' will make Parallel::UserAgent use the default (8k).

Specifying $redirect_ok will alter the redirection behaviour for this particular request only. '1' or any other true value will force Parallel::UserAgent to follow redirects, even if the default is set to 'no_redirect' (see $ua->redirect). '0' or any other false value should do the reverse. See LWP::UserAgent for using an object's requests_redirectable list for fine-tuning this behavior.

If $arg is a scalar it is taken as a filename where the content of the response is stored.

If $arg is a reference to a subroutine, then this routine is called as chunks of the content are received. An optional $size argument is taken as a hint for an appropriate chunk size. The callback function is called with 3 arguments: the data received this time, a reference to the response object and a reference to the protocol object. The callback can use the predefined constants C_ENDCON, C_LASTCON and C_ENDALL as a return value in order to influence pending and active connections. C_ENDCON will end this connection immediately, whereas C_LASTCON will indicate that no further connections should be made. C_ENDALL will immediately end all requests and let the Parallel::UserAgent return from $pua->wait().

If $arg is omitted, then the content is stored in the response object itself.

If $arg is a LWP::Parallel::UserAgent::Entry object, then this request will be registered as a follow-up request to this particular entry. This will not create a new entry, but instead link the current response (i.e. the reason for re-registering) as $response->previous to the new response of this request. All other fields are either re-initialized ($request, $fullpath, $proxy) or left untouched ($arg, $size). (This should only be used internally.)

LWP::Parallel::UserAgent->request also allows the registration of follow-up requests to existing requests that required redirection or authentication. In order to do this, a Parallel::UserAgent::Entry object will be passed as the second argument to the call. Usually, this should not be used directly, but left to the internal $ua->handle_response method!
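As an illustration of the subroutine form of $arg, the sketch below stores incoming chunks in the response and ends a connection once a size limit is reached. Several details are assumptions rather than facts stated on this page: the :CALLBACK import tag used to bring in the C_* constants, the choice of returning undef to keep reading, and the names limited_callback, register_limited and $MAX_BYTES, which are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Request;
# Assumption: the C_* constants are exported via the :CALLBACK tag.
use LWP::Parallel::UserAgent qw(:CALLBACK);

my $MAX_BYTES = 64 * 1024;   # hypothetical per-response limit
my %received;                # bytes seen so far, keyed by response object

# Callback: not a method, so there is no $self in the argument list.
sub limited_callback {
    my ($data, $response, $protocol) = @_;

    $response->add_content($data);          # keep the chunk in the response
    $received{$response} += length $data;

    # End only this connection once the limit is reached.
    return C_ENDCON() if $received{$response} >= $MAX_BYTES;

    # Assumption: an undefined return value means "keep reading".
    return undef;
}

# Register a GET request for $url with the callback and a 4096-byte
# chunk size hint; returns whatever $ua->register returns.
sub register_limited {
    my ($ua, $url) = @_;
    return $ua->register(HTTP::Request->new(GET => $url),
                         \&limited_callback, 4096);
}
```

Returning C_LASTCON or C_ENDALL instead would stop new connections or end all requests, as described above.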

$ua->on_connect ( $request, $response, $entry )

This method should be overridden in an (otherwise empty) subclass in order to present customized messages for each connection attempted by the User Agent.

$ua->on_failure ( $request, $response, $entry )

This method should be overridden in an (otherwise empty) subclass in order to present customized messages for each connection or registration that failed.

$ua->on_return ( $request, $response, $entry )

This method should be overridden in an (otherwise empty) subclass in order to present customized messages for each request returned. If a callback function was registered with this request, this callback function is called before $pua->on_return.

Please note that while $pua->on_return is a method (which should be overridden in a subclass), a callback function is NOT a method, and does not have $self as its first parameter. (See more on callbacks below)

The purpose of $pua->on_return is mainly to provide messages when a request returns. However, you can also re-register follow-up requests in case you need them.

If you need specialized follow-up requests depending on the request that just returned, use a callback function instead (which can be different for each request registered). Otherwise you might end up writing a HUGE if..elsif..else.. branch in this global method.
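The three on_* hooks can be overridden together in one small subclass. The following is only a sketch: the package name VerboseUA is hypothetical, and the messages printed are arbitrary.

```perl
use strict;
use warnings;

package VerboseUA;    # hypothetical subclass name
use parent 'LWP::Parallel::UserAgent';

# Called for each connection the agent attempts.
sub on_connect {
    my ($self, $request, $response, $entry) = @_;
    print STDERR "Connecting to ", $request->uri, "\n";
    return;
}

# Called for each connection or registration that failed.
sub on_failure {
    my ($self, $request, $response, $entry) = @_;
    print STDERR "Failed: ", $request->uri, "\n";
    return;
}

# Called for each request that returned, after any registered callback.
sub on_return {
    my ($self, $request, $response, $entry) = @_;
    print STDERR "Returned: ", $request->uri,
                 " (", $response->code, ")\n";
    return;
}

package main;
```

An agent created with VerboseUA->new() is then used exactly like a plain LWP::Parallel::UserAgent: register requests, then call wait.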

$ua->discard_entry ( $entry )

Completely removes an entry from memory, in case its output is not needed. Use this in callbacks such as on_return or on_failure if you want to make sure an entry that you do not need does not occupy valuable main memory.

$ua->wait ( $timeout )

Waits for available sockets to write to or read from. Times out after $timeout seconds. Blocks if a $timeout of 0 is specified. If $timeout is omitted, the Agent's default timeout value is used.

$ua->handle_response($request, $arg [, $size])

Analyses results, handling redirects and security. This method may actually register several different, additional requests.

This method should not be called directly. Instead, indicate for each individual request registered with $ua->register() whether or not you want Parallel::UserAgent to handle redirects and security, or specify a default value for all requests in Parallel::UserAgent by using $ua->redirect().

DEPRECATED $ua->deprecated_simple_request($request, [$arg [, $size]])

This method simulated the behavior of LWP::UserAgent->simple_request. It was actually kinda overkill to use this method in Parallel::UserAgent, and it was mainly here for testing backward compatibility with the original LWP::UserAgent.

The name has been changed to deprecated_simple_request in case you need it, but because it is no longer compatible with the most recent version of libwww, it will no longer run by default.

The following description is taken directly from the corresponding libwww pod:

$ua->simple_request dispatches a single WWW request on behalf of a user, and returns the response received. The $request should be a reference to a HTTP::Request object with values defined for at least the method() and url() attributes.

If $arg is a scalar it is taken as a filename where the content of the response is stored.

If $arg is a reference to a subroutine, then this routine is called as chunks of the content are received. An optional $size argument is taken as a hint for an appropriate chunk size.

If $arg is omitted, then the content is stored in the response object itself.

DEPRECATED $ua->deprecated_request($request, $arg [, $size])

Previously called 'request' and included for compatibility testing with LWP::UserAgent. Everyday use was deprecated, and now you have to call it under the name deprecated_request if you want to use it (because an incompatibility was introduced with the newer versions of libwww).

Here is what LWP::UserAgent has to say about it:

Process a request, including redirects and security. This method may actually send several different simple requests.

The arguments are the same as for simple_request().

$ua->as_string

Returns text that describes the state of the UA. It should be useful for debugging, if it printed out anything important. But it does not (at least not yet). Try using LWP::Debug...

ADDITIONAL METHODS

$ua->use_alarm([$boolean])

This function is not in use anymore and will display a warning when called and warnings are enabled.

Callback functions

You can register a callback function. See LWP::UserAgent for details.

BUGS

Probably lots! This was meant only as an interim release until this functionality is incorporated into LWPng, the next generation libwww module (though it has been this way for over 2 years now!)

Needs a lot more documentation on how callbacks work!

SEE ALSO

LWP::UserAgent

COPYRIGHT

Copyright 1997-2004 Marc Langheinrich <marclang@cpan.org>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
