Abe Timmerman > WWW-CheckSite-0.018 > WWW::CheckSite::Validator

Module Version: 0.017

NAME

WWW::CheckSite::Validator - A spider that assesses 'kwalitee' for a site

SYNOPSIS

    use WWW::CheckSite::Validator;
    my $wcv = WWW::CheckSite::Validator->new(
        uri => 'http://www.test-smoke.org'
    );

    while ( my $info = $wcv->get_page ) {
        # handle the info
    }

DESCRIPTION

This is a subclass of WWW::CheckSite::Spider.

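WWW::CheckSite::Validator starts its work after the spider has fetched a page. It then checks the availability of the links, images and stylesheets on that page and, when requested, sends the page off for validation.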

METHODS

WWW::CheckSite::Validator->new( %args )

Extends WWW::CheckSite::Spider->new() to check whether Image::Info is available, so that a basic check can be done on the images.
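Checking for an optional module is usually done by trying to load it inside an eval. The following is a minimal sketch of that idiom, not the module's actual code; the helper name module_available() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::CheckSite): report whether an
# optional module can be loaded, without dying when it is missing.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Image::Info -> Image/Info.pm
    return eval { require $file; 1 } ? 1 : 0;
}

my $can_check_images = module_available('Image::Info');
print $can_check_images
    ? "Image::Info found: images can be sanity-checked\n"
    : "Image::Info missing: image checks are skipped\n";
```

With a check like this, the image test can be switched off quietly on systems where Image::Info is not installed.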

$wcs->process_page

This method overrides the WWW::CheckSite::Spider::process_page() method to check the availability of links, images and stylesheets. When validation is requested, it also sends the page to W3.ORG for validation.

On top of the standard information it returns more:

$wcs->check_links( $stats )

The check_links() method gets information about the links on this page. If a link has no return status yet, it will HEAD the uri and store the status in the cache for this link, so that each uri is requested only once.

NOTE: This method does not respect the exclusion rules, and it respects robot-rules only when strictrules is enabled!
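The "HEAD once, then reuse the cached status" idea can be sketched as follows. This is an illustration under stated assumptions: link_status(), %cache and $head are hypothetical names, and $head is a stand-in that fakes status codes instead of sending real HEAD requests.

```perl
use strict;
use warnings;

# Hypothetical sketch: none of these names come from the module itself.
# Look up the status for $uri, calling $head only on a cache miss.
sub link_status {
    my ($cache, $head, $uri) = @_;
    $cache->{$uri} = $head->($uri) unless defined $cache->{$uri};
    return $cache->{$uri};
}

my %cache;
my $requests = 0;
my $head = sub {
    my ($uri) = @_;
    $requests++;                              # count the fake requests
    return $uri =~ /missing/ ? 404 : 200;     # fake status codes
};

link_status( \%cache, $head, $_ ) for qw(
    http://example.com/
    http://example.com/missing
    http://example.com/
);
print "HEAD requests sent: $requests\n";
```

The third lookup repeats the first uri, so it is answered from the cache and no further request is counted.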

The structure for links:

$wcs->check_images( $stats )

The check_images() method gets information about the images on the page. The list comes from the images() method of the mechanize object. Each image uri is checked with a HEAD request only.

The structure for images:

$wcs->check_styles( $stats )

The check_styles() method checks the validity of stylesheets used in the page. We check for <link rel="stylesheet" type="text/css"> tags.
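The tag selection described above can be sketched like this. It is only an illustration: stylesheet_hrefs() is a hypothetical helper, the tags are modelled as plain hashes of attributes, and the case-insensitive comparison is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: each <link> tag is a plain hash of its attributes.
# Keep only tags with rel="stylesheet" and type="text/css" (compared
# case-insensitively, which is an assumption) and return their hrefs.
sub stylesheet_hrefs {
    my (@tags) = @_;
    return map  { $_->{href} }
           grep {
                  defined $_->{href}
               && lc( $_->{rel}  // '' ) eq 'stylesheet'
               && lc( $_->{type} // '' ) eq 'text/css'
           } @tags;
}

my @hrefs = stylesheet_hrefs(
    { rel => 'stylesheet', type => 'text/css', href => '/css/site.css' },
    { rel => 'icon',       href => '/favicon.ico' },
    { rel => 'Stylesheet', type => 'text/css', href => '/css/print.css' },
);
print "$_\n" for @hrefs;
```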

The structure for stylesheets:

$wcs->validate

The validate() method sends the uri or the page contents off to W3.ORG for validation.

$wcs->validate_by_none

The fallback do-not-validate method.

$wcs->validate_by_uri

Sends only the uri to W3.ORG and gets the validation result.

$wcs->validate_by_upload( $stats )

Create a temporary file (with File::Temp) from $agent->content, call the validator with that temporary file and save the result (as a boolean) in $stats->{validate}.
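The temporary-file step can be sketched as below. This is a sketch under assumptions, not the module's code: validate_content_via_tempfile() is a hypothetical name, and $validator is a stand-in callback that takes a file name and returns true or false, where the real method calls the W3.ORG validator.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sketch: write the content to a temporary file, hand the
# file name to a validator callback and keep the outcome as a boolean.
sub validate_content_via_tempfile {
    my ($content, $stats, $validator) = @_;

    # UNLINK => 1 removes the file when the program exits.
    my ($fh, $filename) = tempfile( UNLINK => 1 );
    print {$fh} $content;
    close $fh or die "Cannot close $filename: $!";

    $stats->{validate} = $validator->($filename) ? 1 : 0;
    return $stats->{validate};
}

my %stats;
my $fake_validator = sub {
    my ($file) = @_;
    return -s $file;    # pretend that any non-empty file is "valid"
};
validate_content_via_tempfile( '<html></html>', \%stats, $fake_validator );
print "validate: $stats{validate}\n";
```

The file is closed before the callback runs, so the content is fully flushed to disk when the validator reads it.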

$wcs->validate_by_xmllint( $stats )

Use the xmllint(1) program to validate the (X)HTML.

$wcs->validate_style( $ua )

Dispatch the validation to the right method.
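One common way to dispatch to "the right method" in Perl is to build the method name from a setting and fall back to the do-nothing method when it does not exist. The sketch below shows that idiom only; the class My::StyleChecker, the validate key and the return values are hypothetical, and the module may dispatch differently.

```perl
use strict;
use warnings;

package My::StyleChecker;    # hypothetical class, for illustration only

sub new { my ($class, %args) = @_; return bless {%args}, $class }

sub style_by_none { return 'skipped' }
sub style_by_uri  { return 'checked by uri' }

# Build the method name from the setting; unknown names fall back to
# style_by_none.
sub validate_style {
    my ($self, @args) = @_;
    my $how    = $self->{validate} || 'none';
    my $method = $self->can("style_by_$how") || $self->can('style_by_none');
    return $self->$method(@args);
}

package main;

for my $how (qw(uri none bogus)) {
    my $checker = My::StyleChecker->new( validate => $how );
    printf "%-5s => %s\n", $how, $checker->validate_style;
}
```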

$wcs->style_by_none

The fallback do-not-validate-stylesheet method.

$wcs->style_by_uri( $ua )

Sends only the uri to JIGSAW.W3.ORG and gets the validation result.

$wcs->style_by_upload( $ua )

Create a temporary file (with File::Temp) from $ua->content, call the validator with that temporary file and return the result.

$wcs->validate_image( $ua )

This is more of a basic consistency check that uses Image::Info::image_info().

$wcs->ct_can_validate( $ua )

Check if the content-type is "validatable".
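A content-type test of this kind might look like the sketch below. The function name looks_validatable() is hypothetical, and the list of accepted types (text/html and application/xhtml+xml) is an assumption, since the module's own list is not given here.

```perl
use strict;
use warnings;

# Hypothetical sketch: the accepted types are an assumption.
sub looks_validatable {
    my ($content_type) = @_;
    return 0 unless defined $content_type;

    # Drop parameters such as "; charset=utf-8" before comparing.
    my ($type) = $content_type =~ m{^\s*([^;\s]+)};
    return 0 unless defined $type;

    return lc($type) =~ m{^(?:text/html|application/xhtml\+xml)$} ? 1 : 0;
}

for my $ct ( 'text/html; charset=utf-8', 'application/xhtml+xml', 'image/png' ) {
    printf "%-28s %s\n", $ct, looks_validatable($ct) ? 'yes' : 'no';
}
```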

$wcs->set_action

Why?

SEE ALSO

WWW::CheckSite::Spider, WWW::CheckSite

AUTHOR

Abe Timmerman, <abeltje@cpan.org>

BUGS

Please report any bugs or feature requests to bug-WWW-CheckSite@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE

Copyright MMV Abe Timmerman, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.