
checksite - Check the contents of a website

$ checksite [options] -p <name> uri

--prefix|-p <name>        The prefix (dir) of this check [mandatory]
--dir|-d <dir>            The target directory

--[no]save                Save validation results
--load                    Load the validation results

--nohtml                  Skip (X)HTML validation
--html_validator <uri>    Base uri for the W3C (X)HTML validator
--html_upload             Validate (X)HTML by uploading
--html_uri                Validate (X)HTML by sending the uri
--xmllint                 Validate by using the xmllint program

--nocss                   Skip CSS validation
--css_validator <uri>     Base uri for the W3C CSS validator
--css_upload              Validate CSS by uploading
--css_uri                 Validate CSS by sending the uri

--disallow <path>         Add Disallow: rules to robots.txt (multiple)
--nostrictrules           Do not impose /robots.txt on the validator
                          for "local" url's
--lang|-l <lang>          Set language(s) for Accept-Language: header
--ua_class <Module>       Set a new UserAgent class
                          (child of WWW::Mechanize)
-v                        Increase verbosity (multiple)
--help|-h                 This message

See WWW::CheckSite::Manual for more information.
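
As an illustration of how the options above combine, the following is a minimal sketch of a wrapper around a link-only check (no HTML or CSS validation). The function name run_checksite, the DRY_RUN variable, the prefix "mysite" and the example.com URI are all assumptions made for this example; only the checksite options themselves come from the list above.

```shell
#!/bin/sh
# Sketch only: run_checksite, DRY_RUN, "mysite" and the example.com URI
# are hypothetical names and values, not part of checksite itself.

run_checksite() {
    # $1 = prefix name (mandatory for checksite), $2 = start URI
    set -- checksite --prefix "$1" --nohtml --nocss --lang en -v "$2"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"      # show the command line instead of running it
    else
        "$@"           # spider the site and generate the reports
    fi
}

# Preview the command line without contacting the site.
DRY_RUN=1 run_checksite mysite http://www.example.com/
```

Calling the function without DRY_RUN set would run the check itself. Skipping both validators keeps the sketch independent of any locally installed W3C validator.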

This program will spider the specified url and check the availability of the links, images and stylesheets on each page.
INCOMPATIBLE CHANGE AS OF 0.020: Pages and stylesheets are NO LONGER validated with the validators available at http://validator.w3.org and http://jigsaw.w3.org. These validators do not allow robots! The W3C-HTML validator is now widely available and easy to install, so I advise you to run your own. The W3C-CSS validator is more work, but I have managed to get it working as well with Jigsaw.
When all pages are checked, two reports in HTML-format are generated. The full.html report contains all the information for all pages, and the summ.html report contains only the pages with errors and their errors.
Each page fetched by the spider will have these metrics:
The HTTP-returncode and a verbal explanation of that code
The contents of the <title></title> tag.
The MIME type returned by the HTTP-server for the document.
A list of <a href=>, <area href=> and <frame src=> URIs found on the page, each with its HTTP-returncode. Each HTML tag is also checked for link text or an ALT/TITLE attribute.
The number of links found and the number of links that are ok.
A list of <img src=> and <input type=image> URIs found on the page, each with its HTTP-returncode and MIME type. Each HTML tag is also checked for the existence of the ALT attribute.
The number of images found and the number of images that are ok.
A list of <link rel=stylesheet type=text/css> URIs found on the page, each with its HTTP-returncode, MIME type and CSS-validation result.
The number of stylesheets found and the number of stylesheets that are ok.
The HTML-validation result.

checksite supports Config::Auto. This means that each of the following directories is searched for a file named checksiteconfig, checksite.config, checksiterc or .checksiterc:


Abe Timmerman, <abeltje@cpan.org>

Please report any bugs or feature requests to bug-WWW-CheckSite@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

Copyright MMV-MMVII Abe Timmerman, All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.