NAME

LWP::UserAgent::ProxyHopper::Base - base class for LWP::UserAgent based modules which want to proxy-hop their requests

SYNOPSIS

    package LWP::UserAgent::Prox;

    use base 'LWP::UserAgent';
    use base 'LWP::UserAgent::ProxyHopper::Base';

    package main;

    use strict;
    use warnings;

    my $ua = LWP::UserAgent::Prox->new( agent => 'fox', timeout => 2);

    $ua->proxify_load( debug => 1 );

    for ( 1..10 ) {
        my $response = $ua->proxify_get('http://www.privax.us/ip-test/');

        if ( $response->is_success ) {
            my $content = $response->content;
            if ( my ( $ip ) = $content
                =~ m|<p>.+?IP Address:\s*</strong>\s*(.+?)\s+|s
            ) {
                printf "\n\nSuccess!!! \n%s\n", $ip;
            }
            else {
                printf "Response is successful but seems like we got a wrong "
                        . "page... here is what we got:\n%s\n", $content;
            }
        }
        else {
            printf "\n[SCRIPT] Network error: %s\n", $response->status_line;
        }
    }

DESCRIPTION

The module is a base class for LWP::UserAgent based modules which want to proxy-hop their requests. In other words, each request can be sent out through a different proxy server. Originally, this module was meant to be released as LWP::UserAgent::ProxyHopper, but I figured it would be more useful as a base class.

WHAT'S IN IT?

Adding use base 'LWP::UserAgent::ProxyHopper::Base'; to your code should be enough to enable the extra functionality this base class provides. Your class should be a subclass of LWP::UserAgent, or at least properly support the proxy() method and one or more of LWP::UserAgent's request methods returning HTTP::Response objects.

HOW GOOD IS IT?

Don't get your hopes up too high... unless you can feed the module 100% working and fast proxies. Even though the module does some basic checks on whether the request succeeded and blacklists proxies that appear to be really bad, there is still quite a good chance that either (a) your request will time out after several tries or, worse, (b) your request will succeed but will not return what you expect, as some proxies tend to drop garbage on you. Depending on settings your mileage will vary; it's a speed-for-quality trade-off.

HOW IT WORKS

The module fetches a list of proxy servers (see the proxify_load() method). When one of the proxify_*() request methods is called, it takes a proxy from the list and tries to make your request through it. If the request succeeds, the module checks for a couple of "this is not what you wanted" proxies and retries the request with a different proxy if that is the case. If this check does not raise any suspicion, the result (an HTTP::Response object) is returned to you and the proxy that was used is put into a "working" list. If the request fails, the module does a basic check on the returned status code, decides whether to put the proxy into the "bad" list or the "real_bad" list, and then retries. The number of retries depends on the retries argument to the proxify_load() method.

When the original proxy list is exhausted, the module makes a new list out of the proxies it previously listed as "working"; if that fails, it uses the "bad" list, which might still contain working proxies. The "real_bad" list is never used. If both the "working" and "bad" lists have no proxies left, the module calls proxify_load() automatically with the same arguments you passed the last time; therefore your program can live long with just one call to proxify_load() during startup.
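The fallback order described above can be sketched in plain Perl. This is only an illustration under stated assumptions: the %state hash, its keys and the next_proxy() function are hypothetical stand-ins, not the module's actual internals, and the proxy addresses are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's internal lists; the real
# attribute names are not documented here and may differ.
my %state = (
    list     => [],                            # main list, already exhausted
    working  => [ 'http://10.0.0.1:8080/' ],
    bad      => [ 'http://10.0.0.2:3128/' ],
    real_bad => [ 'http://10.0.0.3:80/' ],
);

# Return the next proxy to try. When the main list is empty it is
# refilled from "working" first, then from "bad"; "real_bad" is never
# consulted. Returns undef when a fresh proxify_load() would be needed.
sub next_proxy {
    my ($s) = @_;

    for my $fallback (qw(working bad)) {
        last if @{ $s->{list} };

        # Move the whole fallback list into the main list.
        $s->{list} = [ splice @{ $s->{$fallback} } ];
    }

    return shift @{ $s->{list} };
}

my @order;
while ( defined( my $proxy = next_proxy( \%state ) ) ) {
    push @order, $proxy;
}

print "$_\n" for @order;
```

In this sketch the "working" proxy is handed out before the "bad" one, and the "real_bad" entry is left untouched. In the module itself, the point where next_proxy() returns undef is where the automatic proxify_load() call would happen.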

PROVIDED METHODS

All public methods are prefixed with proxify_; all private methods are prefixed with _proxify_.

proxify_load

    $your_ua->proxify_load; # plain defaults

    $your_ua->proxify_load(  # juicy override
        freeproxylists  => 1,
        plan_b          => 1,
        proxy4free      => 0,
        timeout         => 20,
        debug           => 0,
        retries         => 5,
        extra_proxies   => [],
        schemes         => [ 'http', 'ftp' ],
        get_list_args   => {
            freeproxylists  => [ type => 'anonymous' ],
            proxy4free      => [ [2,3] ],
        },
    );

Instructs the object to load up a list of proxies. You must call this method at least once before calling any of the proxify_* request methods. The return value is an arrayref of proxy addresses in the form "http://122.122.122.122:8080/". The method will croak() if the proxy list is still empty after trying to fetch proxy lists and after adding extra_proxies (see below). The method takes quite a few arguments, all of which are given in a key/value fashion. All of them are optional. Possible arguments are as follows:

freeproxylists

    $your_ua->proxify_load( freeproxylists => 1 );

Optional. The module uses WWW::FreeProxyLists::Com and WWW::Proxy4FreeCom modules to get the proxy list. If you set freeproxylists argument to a false value the module will not attempt to load any proxies from http://freeproxylists.com/ website. Defaults to: 1

proxy4free

    $your_ua->proxify_load( proxy4free => 0 );

Optional. The module uses WWW::FreeProxyLists::Com and WWW::Proxy4FreeCom modules to get the proxy list. If you set proxy4free argument to a false value (which is the default) the module will not attempt to load any proxies from http://www.proxy4free.com/ website. Defaults to: 0

plan_b

    $your_ua->proxify_load( plan_b => 1 );

Optional. When set to a true value, enables a "Plan B" mechanism: when plan_b and freeproxylists are both set to true values and the fetch from http://freeproxylists.com/ did not give us any proxies, the module will fetch a list from the http://www.proxy4free.com/ website regardless of whether proxy4free is set to a true value. In other words, this is a fallback in case http://freeproxylists.com/ is down while proxy4free is set to a false value to speed up the proxy list loading process. Defaults to: 1 (enabled)
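The interplay between freeproxylists, proxy4free and plan_b can be expressed as a small decision function. This is a sketch of the rule as documented above, not the module's own code; should_fetch_proxy4free() is a hypothetical name introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether the proxy4free source should be
# consulted, given the proxify_load() arguments and the number of
# proxies the freeproxylists fetch returned.
sub should_fetch_proxy4free {
    my ( $args, $freeproxylists_count ) = @_;

    # Explicitly enabled: always fetch.
    return 1 if $args->{proxy4free};

    # Plan B: the freeproxylists fetch was requested but came back empty.
    return 1
        if $args->{plan_b}
        && $args->{freeproxylists}
        && $freeproxylists_count == 0;

    return 0;
}

# The documented defaults.
my %args = ( freeproxylists => 1, plan_b => 1, proxy4free => 0 );

printf "25 proxies fetched -> use proxy4free? %d\n",
    should_fetch_proxy4free( \%args, 25 );
printf " 0 proxies fetched -> use proxy4free? %d\n",
    should_fetch_proxy4free( \%args, 0 );
```

With the default arguments, the second source is only consulted when the first one returns nothing, which is why leaving plan_b enabled costs nothing in the common case.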

timeout

    $your_ua->proxify_load( timeout => 20 );

Optional. Takes a positive integer value which will be passed to WWW::FreeProxyLists::Com and WWW::Proxy4FreeCom constructors as a timeout argument. In other words, this specifies the timeout for proxy list fetching. Defaults to: 20

retries

    $your_ua->proxify_load( retries => 5 );

Optional. Specifies how many times the module should retry the proxify_* requests if they do not look successful. Generally, setting the retries argument to a higher value will yield more reliable requests but will also slow down the request process. See the HOW IT WORKS section above to get an idea of when the module will retry the request. Defaults to: 5

extra_proxies

    $your_ua->proxify_load( extra_proxies => [] );

Optional. Takes an arrayref of proxy addresses in a format acceptable to LWP::UserAgent's proxy() method. These are extra proxies of your own for the module to use. You can set the freeproxylists and plan_b arguments to false values and put your own proxies into the extra_proxies arrayref, in which case the module will not even attempt to fetch any lists from proxy list sites (i.e. the loading will be much faster). Defaults to: [] (no extra proxies)

schemes

    $your_ua->proxify_load( schemes => [ 'http', 'ftp' ] );

    $your_ua->proxify_load( schemes => 'ftp' );

Optional. Specifies the first argument to pass to LWP::UserAgent's proxy() method (i.e. the schemes to proxy for). Note: any other schemes besides 'http' were not tested and might not even work with the proxy lists the module fetches by default. Defaults to: http

get_list_args

    $your_ua->proxify_load(
        get_list_args   => {
            freeproxylists  => [ type => 'anonymous' ],
            proxy4free      => [ [1,2] ],
        },
    );

Optional. Here you can specify arguments for the get_list() methods of the WWW::FreeProxyLists::Com and WWW::Proxy4FreeCom modules used under the hood. The get_list_args argument takes a hashref with two keys as a value. The keys must be freeproxylists and proxy4free, the values of which must be arrayrefs of arguments to give to the get_list() methods of the respective modules.

debug

    $your_ua->proxify_load( debug => 0 );

Optional. When set to a true value, makes the module carp() out some debugging info (including the time when processing any of the proxify_* request methods). Defaults to: 0

proxify_get

    my $response = $your_ua->proxify_get('http://something.com/');

Must be called after a successful call to the proxify_load() method. The method is the same as LWP::UserAgent's get() method except proxify_get() will switch proxies before attempting the request.

proxify_post

    my $response = $your_ua->proxify_post('http://something.com/');

Must be called after a successful call to the proxify_load() method. The method is the same as LWP::UserAgent's post() method except proxify_post() will switch proxies before attempting the request. Note: during my tests a lot of (almost all) proxies from http://www.freeproxylist.com/ did not permit POST requests. You might have better luck setting proxy4free to a true value, disabling the freeproxylists argument and setting a higher retries argument (see the proxify_load() method above).

proxify_request

    my $response = $your_ua->proxify_request( $req_obj );

Must be called after a successful call to the proxify_load() method. The method is the same as LWP::UserAgent's request() method except proxify_request() will switch proxies before attempting the request.

proxify_head

    my $response = $your_ua->proxify_head('http://something.com/');

Must be called after a successful call to the proxify_load() method. The method is the same as LWP::UserAgent's head() method except proxify_head() will switch proxies before attempting the request.

proxify_mirror

    my $response = $your_ua->proxify_mirror(
        'http://something.com/file.tar.gz',
        'here.tar.gz',
    );

Must be called after a successful call to the proxify_load() method. The method is the same as LWP::UserAgent's mirror() method except proxify_mirror() will switch proxies before attempting the request. Note: use this method with caution, as some proxies return an HTML document instead of the actual content you requested.
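One way to guard against the HTML-instead-of-content problem is to peek at the mirrored file before trusting it. The helper below is a rough heuristic sketch, not part of the module: looks_like_html() is a hypothetical name, and the check only makes sense when the file you asked for is not itself an HTML page. The demo uses temporary files in place of real downloads.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper, not provided by the module: read the first bytes
# of a file and guess whether it is an HTML page rather than the binary
# or archive that was requested.
sub looks_like_html {
    my ($path) = @_;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $read = read( $fh, my $head, 512 );
    close $fh;
    die "Cannot read $path: $!" unless defined $read;

    return $head =~ /\A\s*(?:<!DOCTYPE\s+html|<html\b)/i ? 1 : 0;
}

# Demo with two temporary files standing in for mirrored downloads.
my ( $html_fh, $html_file ) = tempfile( UNLINK => 1 );
print {$html_fh} "<!DOCTYPE html>\n<html><body>Access denied</body></html>\n";
close $html_fh;

my ( $gz_fh, $gz_file ) = tempfile( UNLINK => 1 );
binmode $gz_fh;
print {$gz_fh} "\x1f\x8b\x08\x00rest-of-archive";
close $gz_fh;

printf "%s\n", looks_like_html($html_file) ? 'HTML page' : 'looks fine';
printf "%s\n", looks_like_html($gz_file)   ? 'HTML page' : 'looks fine';
```

In real use you would call looks_like_html() on the second argument you passed to proxify_mirror() and discard the file or repeat the request when it returns true.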

proxify_simple_request

    my $response = $your_ua->proxify_simple_request('http://something.com/');

Must be called after a successful call to the proxify_load() method. The method is the same as LWP::UserAgent's simple_request() method except proxify_simple_request() will switch proxies before attempting the request.

proxify_list

    my $proxies_list_ref = $your_ua->proxify_list;

Must be called after a successful call to the proxify_load() method. Takes no arguments, returns an arrayref of proxies used internally for requests. This list will shrink as more requests are made (until it is depleted and reloaded; see the HOW IT WORKS section). Note: you can shift, push, etc. on this arrayref to dynamically set which proxies will be used. The proxy to be used on the next proxify_* request is the first element of this arrayref.
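Since the returned value is a reference to the list the module itself uses, changes have to be made in place. The sketch below uses a plain arrayref as a stand-in for the value returned by proxify_list, so it runs without the module; the proxy addresses are made up.

```perl
use strict;
use warnings;

# Stand-in for: my $proxies = $your_ua->proxify_list;
my $proxies = [ 'http://10.0.0.1:8080/', 'http://10.0.0.2:8080/' ];

# Put a preferred proxy at the front so it is the one used next.
unshift @$proxies, 'http://10.0.0.9:3128/';

# Drop a proxy you no longer trust. Assigning to @$proxies keeps the
# same underlying array, so whoever else holds the reference sees the
# change too.
@$proxies = grep { $_ ne 'http://10.0.0.2:8080/' } @$proxies;

print "next proxy: $proxies->[0]\n";
print "remaining:  ", scalar @$proxies, "\n";
```

Note that writing $proxies = [ ... ] instead would only rebind your local variable and leave the list used for requests unchanged.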

proxify_working_list

    my $proxies_working_list_ref = $your_ua->proxify_working_list;

Must be called after a successful call to the proxify_load() method. Takes no arguments, returns an arrayref of proxies listed as "working". See the HOW IT WORKS section above for details. Note: you can shift, push, etc. on this arrayref to dynamically change it.

proxify_bad_list

    my $proxies_bad_list_ref = $your_ua->proxify_bad_list;

Must be called after a successful call to the proxify_load() method. Takes no arguments, returns an arrayref of proxies listed as "bad". See the HOW IT WORKS section above for details. Note: you can shift, push, etc. on this arrayref to dynamically change it.

proxify_real_bad_list

    my $proxies_real_bad_list_ref = $your_ua->proxify_real_bad_list;

Must be called after a successful call to the proxify_load() method. Takes no arguments, returns an arrayref of proxies listed as "real bad". See the HOW IT WORKS section above for details.

proxify_schemes

    my $used_schemes = $your_ua->proxify_schemes;

    $your_ua->proxify_schemes( [ 'http', 'ftp' ] );

Returns the currently used value for the proxify_load() method's schemes argument. If called with an optional argument, uses it as the new value. See the proxify_load() method above for details. Note: the value will be reset on the next proxify_load() call, which can happen automatically if proxy lists are exhausted. See the HOW IT WORKS section for details.

proxify_retries

    my $used_retries = $your_ua->proxify_retries;

    $your_ua->proxify_retries( 10 );

Returns the currently used value for the proxify_load() method's retries argument. If called with an optional argument, uses it as the new value. See the proxify_load() method above for details. Note: the value will be reset on the next proxify_load() call, which can happen automatically if proxy lists are exhausted. See the HOW IT WORKS section for details.

proxify_debug

    my $used_debug = $your_ua->proxify_debug;

    $your_ua->proxify_debug( 1 );

Returns the currently used value for the proxify_load() method's debug argument. If called with an optional argument, uses it as the new value. See the proxify_load() method above for details. Note: the value will be reset on the next proxify_load() call, which can happen automatically if proxy lists are exhausted. See the HOW IT WORKS section for details.

proxify_current

    my $current_proxy = $your_ua->proxify_current;

Takes no arguments, returns the last proxy used in proxify_* request methods. Why is it called "current"? Because it changes several times during the calls to proxify_* request methods depending on the retries argument's setting (in the proxify_load() method).

AUTHOR

Zoffix Znet, <zoffix at cpan.org> (http://zoffix.com/, http://haslayout.net/, http://zofdesign.com/)

Thanks for reporting bugs and/or providing patches go to: lordnynex.

BUGS

Please report any bugs or feature requests to bug-lwp-useragent-proxyhopper-base at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=LWP-UserAgent-ProxyHopper-Base. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc LWP::UserAgent::ProxyHopper::Base

COPYRIGHT & LICENSE

Copyright 2008 Zoffix Znet, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.