NAME

POE::Component::WWW::DoctypeGrabber - non-blocking wrapper around WWW::DoctypeGrabber

SYNOPSIS

    use strict;
    use warnings;

    use POE qw(Component::WWW::DoctypeGrabber);

    my $poco = POE::Component::WWW::DoctypeGrabber->spawn;

    POE::Session->create(
        package_states => [ main => [qw(_start results)] ],
    );

    $poe_kernel->run;

    sub _start {
        $poco->grab( {
                page  => 'http://zoffix.com',
                event => 'results',
            }
        );
    }

    sub results {
        my $in_ref = $_[ARG0];

        if ( $in_ref->{error} ) {
            print "ERROR: $in_ref->{error}\n";
        }
        else {
            my $result = $in_ref->{result};

            print $result->{has_doctype}
                ? "$in_ref->{page} has $result->{doctype} doctype\n"
                : "$in_ref->{page} does not contain a doctype\n";

            print $result->{xml_prolog}
                ? "Contains XML prolog\n" : "Does not contain XML prolog\n";

            print "Doctype is preceded by $result->{non_white_space} non-whitespace characters\n";
            print "\n\n\n";
        }

        $poco->shutdown;
    }

DESCRIPTION

The module is a non-blocking wrapper around WWW::DoctypeGrabber, which grabs the doctype of a given webpage along with some related information.

CONSTRUCTOR

spawn

    my $poco = POE::Component::WWW::DoctypeGrabber->spawn;

    POE::Component::WWW::DoctypeGrabber->spawn(
        alias => 'grabber',
        obj_args => {
            raw => 1,
        },
        options => {
            debug => 1,
            trace => 1,
            # POE::Session arguments for the component
        },
        debug => 1, # output some debug info
    );

The spawn method returns a POE::Component::WWW::DoctypeGrabber object. It takes a few arguments, all of which are optional. The possible arguments are as follows:

alias

    ->spawn( alias => 'grabber' );

Optional. Specifies a POE Kernel alias for the component.

obj_args

    obj_args => {
        raw => 1,
    },

Optional. Takes a hashref as a value. The contents of this hashref are passed as arguments to WWW::DoctypeGrabber's constructor (new() method). See the documentation for WWW::DoctypeGrabber for more information.
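As an illustration of what handing the hashref to the constructor amounts to, here is a minimal sketch. It is not the component's actual source: My::StandInGrabber is a hypothetical stand-in class, used so the example runs without WWW::DoctypeGrabber installed, and %spawn_args is an assumed name for the arguments given to spawn().

```perl
use strict;
use warnings;

# Hypothetical stand-in for WWW::DoctypeGrabber, used only so this
# sketch runs on its own; the real class is not loaded here.
package My::StandInGrabber;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

package main;

# Arguments as they might be given to spawn() (assumed variable name).
my %spawn_args = ( alias => 'grabber', obj_args => { raw => 1 } );

# The obj_args hashref is flattened into the constructor's argument
# list; a missing obj_args becomes an empty list.
my $grabber = My::StandInGrabber->new( %{ $spawn_args{obj_args} || {} } );

print "raw is set to $grabber->{raw}\n";
```

In other words, each key/value pair in obj_args ends up as a named argument to new().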

options

    ->spawn(
        options => {
            trace => 1,
            default => 1,
        },
    );

Optional. A hashref of POE Session options to pass to the component's session.

debug

    ->spawn(
        debug => 1
    );

Optional. When set to a true value, turns on output of debug messages. Defaults to: 0.

METHODS

grab

    $poco->grab( {
            event       => 'event_for_output',
            page        => 'http://zoffix.com/',
            raw         => 1,
            _blah       => 'pooh!',
            session     => 'other',
        }
    );

Takes a hashref as an argument. The return value is not meaningful. See the description of the grab event for more information.

session_id

    my $poco_id = $poco->session_id;

Takes no arguments. Returns component's session ID.

shutdown

    $poco->shutdown;

Takes no arguments. Shuts down the component.

ACCEPTED EVENTS

grab

    $poe_kernel->post( grabber => grab => {
            event       => 'event_for_output',
            page        => 'http://zoffix.com',
            raw         => 1,
            _blah       => 'pooh!',
            session     => 'other',
        }
    );

Instructs the component to grab a doctype from a specified page. Takes a hashref as an argument; the possible keys/values of that hashref are as follows:

event

    { event => 'results_event', }

Mandatory. Specifies the name of the event to emit when results are ready. See the OUTPUT section for more information.

page

    { page => 'http://zoffix.com/' }

Mandatory. Specifies the page whose doctype to grab.

raw

    { raw => 1 }

Optional. If specified, WWW::DoctypeGrabber's raw() method will be called with the value you gave for the raw argument. Note that this setting persists and will affect any future "grabs".

session

    { session => 'other' }

    { session => $other_session_reference }

    { session => $other_session_ID }

Optional. Takes an alias, a reference or an ID of an alternative session to send the output to.

user defined

    {
        _user    => 'random',
        _another => 'more',
    }

Optional. Any keys starting with _ (underscore) will not affect the component and will be passed back in the result intact.
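One use for underscore-prefixed keys is telling results apart when several grabs are in flight. The sketch below only builds the argument hashrefs and does not talk to the component; build_grab_requests() and the _request_no key are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build one grab() argument hashref per page,
# tagging each with an underscore-prefixed key that the component is
# documented to pass back untouched.
sub build_grab_requests {
    my ( $event, @pages ) = @_;

    my @requests;
    for my $page (@pages) {
        push @requests, {
            page        => $page,
            event       => $event,
            _request_no => scalar(@requests) + 1,    # hypothetical key name
        };
    }
    return @requests;
}

my @requests = build_grab_requests(
    'results', 'http://zoffix.com/', 'http://google.ca/',
);

print "$_->{_request_no}: $_->{page}\n" for @requests;
```

Each hashref could then be handed to the grab() method or event, and the _request_no key would come back in the output alongside the result.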

shutdown

    $poe_kernel->post( grabber => 'shutdown' );

Takes no arguments. Tells the component to shut itself down.

OUTPUT

    $VAR1 = {
        'page' => 'google.ca',
        'result' => {
            'xml_prolog' => 0,
            'doctype' => '',
            'non_white_space' => 0,
            'has_doctype' => 0,
            'mime' => 'text/html; charset=UTF-8'
        },
        '_blah' => 'foos'
    };

    $VAR1 = {
        'page' => 'zoffix.com',
        'raw' => 1,
        'result' => '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">',
        '_blah' => 'foos'
    };

The handler for the event you specified in the event argument to the grab() method/event will receive its input in $_[ARG0] as a hashref. The possible keys/values of that hashref are as follows:

page

    { page => 'google.ca' }

The page key will contain whatever you specified for the page argument to the grab() event/method.

raw

    { raw => 1 }

The raw key will contain whatever you specified for the raw argument to the grab() event/method. If you didn't specify it, the key won't be present in the output.

error

    { error => 'Network error: timeout' }

If an error occurred, the error key will be present and will describe the reason for the failure.

result

    'result' => {
        'xml_prolog' => 0,
        'doctype' => '',
        'non_white_space' => 0,
        'has_doctype' => 0,
        'mime' => 'text/html'
    },


    'result' => '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">',

Depending on the setting of the raw argument, the result key will contain either a hashref of information or the doctype itself. See the description of the grab() method in WWW::DoctypeGrabber's documentation for an explanation of all the keys/values in the hashref.

user defined

    { '_blah' => 'foos' }

Any arguments beginning with _ (underscore) passed into the grab() event/method will be present intact in the result.
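Because the result key can hold either a hashref or a plain string, a handler that serves both raw and non-raw grabs has to check which one it received. The following is a minimal sketch of such a check, using only the keys described above; summarize_grab() is a hypothetical helper, and treating an undefined or empty doctype as "no doctype" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the component: condenses the hashref
# that arrives in $_[ARG0] into a one-line summary.
sub summarize_grab {
    my ($in_ref) = @_;
    my $page = $in_ref->{page};

    # A true error key means the grab failed.
    return "$page: ERROR: $in_ref->{error}" if $in_ref->{error};

    # result is a hashref normally and a plain string when raw was set.
    my $result  = $in_ref->{result};
    my $doctype = ref($result) eq 'HASH' ? $result->{doctype} : $result;

    # Assumption: an undefined or empty doctype means none was found.
    return "$page: no doctype" unless defined $doctype && length $doctype;

    # List the underscore-prefixed keys that were passed through.
    my @user_keys = sort grep { /^_/ } keys %$in_ref;
    my $extra = @user_keys ? ' [' . join( ', ', @user_keys ) . ']' : '';

    return "$page: $doctype$extra";
}

my $line = summarize_grab( {
    page   => 'zoffix.com',
    raw    => 1,
    result => '<!DOCTYPE html>',
    _blah  => 'foos',
} );

print "$line\n";
```

Inside a real event handler the same function could be called as summarize_grab( $_[ARG0] ).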

SEE ALSO

POE, WWW::DoctypeGrabber

REPOSITORY

Fork this module on GitHub: https://github.com/zoffixznet/POE-Component-Bundle-WebDevelopment

BUGS

To report bugs or request features, please use https://github.com/zoffixznet/POE-Component-Bundle-WebDevelopment/issues

If you can't access GitHub, you can email your request to bug-POE-Component-Bundle-WebDevelopment at rt.cpan.org

AUTHOR

Zoffix Znet <zoffix at cpan.org> (http://zoffix.com/, http://haslayout.net/)

LICENSE

You can use and distribute this module under the same terms as Perl itself. See the LICENSE file included in this distribution for complete details.