Abe Timmerman > WWW-CheckSite-0.019_52 > checksite


NAME

checksite - Check the contents of a website

SYNOPSIS

    $ checksite [options] -p <name> uri

OPTIONS

Results
  --prefix|-p <name>        The prefix (dir) of this check [mandatory]
  --dir|-d <dir>            The target directory
Persistence
  --[no]save                Save validation results
  --load                    Load the validation results
(X)HTML validation
  --nohtml                  Skip (X)HTML validation
  --html_validator <uri>    Base uri for the W3C (X)HTML validator
  --html_upload             Validate (X)HTML by uploading
  --html_uri                Validate (X)HTML by sending the uri
  --xmllint                 Validate by using the xmllint program
CSS validation
  --nocss                   Skip CSS validation
  --css_validator <uri>     Base uri for the W3C CSS validator
  --css_upload              Validate CSS by uploading
  --css_uri                 Validate CSS by sending the uri
Exclusion
  --disallow <path>         Add Disallow: rules to robots.txt (multiple)

  --nostrictrules           Do not impose /robots.txt on the validator
                            for "local" url's
General
  --lang|-l <lang>          Set language(s) for Accept-Language: header

  --ua_class <Module>       Set a new UserAgent class
                            (child of WWW::Mechanize)

  -v                        Increase verbosity (multiple)
  --help|-h                 This message

See WWW::CheckSite::Manual for more information.
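
As an illustration of how the options above combine, the following sketch assembles a command line and prints it instead of running it. Only the option names come from the list above; the prefix "mysite", the ./reports directory, the disallowed paths, the site URL and the local validator URI are all assumptions made up for this example.

```shell
#!/bin/sh
# Sketch only: assemble a checksite command line and print it, so the
# options can be reviewed before anything is spidered.
# Assumptions (not from the checksite docs): the prefix "mysite", the
# ./reports directory, the disallowed paths, and both example URLs.

SITE_URI="http://www.example.com/"
HTML_VALIDATOR="http://localhost/w3c-validator/check"   # hypothetical local install

# --prefix (-p) is mandatory; --disallow and -v may be repeated.
set -- checksite \
    --prefix mysite \
    --dir ./reports \
    --disallow /private \
    --disallow /cgi-bin \
    --lang en \
    --html_validator "$HTML_VALIDATOR" \
    --nocss \
    -v -v \
    "$SITE_URI"

# Print the command; replace the echo line with  "$@"  to actually run it.
CHECKSITE_CMD="$*"
echo "$CHECKSITE_CMD"
```

Printing the command first is a deliberate choice here: a spider run can take a while, so it may help to check the exclusions and validator settings before starting one.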

DESCRIPTION

This program will spider the specified url and check the availability of the links, images and stylesheets on each page.

INCOMPATIBLE CHANGE AS OF 0.020: Pages and stylesheets are NO LONGER validated with the public validators at http://validator.w3.org and http://jigsaw.w3.org, because these validators do not allow robots. The W3C HTML validator is now widely available and easy to install, so I advise you to run your own. The W3C CSS validator takes more work, but I have managed to get it running as well with Jigsaw.

When all pages are checked, two reports in HTML format are generated. The full.html report contains all the information for all pages; the summ.html report contains only the pages with errors, along with those errors.

Metrics for a spidered page

Each page fetched by the spider will have these metrics:

FILES

checksite supports Config::Auto. This means that each of the following directories is searched for a file named checksiteconfig, checksite.config, checksiterc or .checksiterc:

current directory
bin directory (where the script is installed)
$HOME
/etc/
/usr/local/etc/
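
To see which of these files are present on a given machine, a small shell sketch like the one below can help. It only reports files that exist; it does not read them. The helper name find_checksite_configs is hypothetical, and the order in which directories and file names are tried here is an assumption, since Config::Auto decides the real lookup order.

```shell
#!/bin/sh
# Sketch only: print every checksite config file found in the given
# directories. The search order used here is an assumption;
# Config::Auto decides the real order.

# Directory holding the installed script, if checksite is on PATH.
bin_dir=""
if checksite_path=$(command -v checksite 2>/dev/null); then
    bin_dir=$(dirname "$checksite_path")
fi

# find_checksite_configs DIR...  (hypothetical helper, not part of checksite)
find_checksite_configs() {
    for dir in "$@"; do
        [ -n "$dir" ] || continue      # skip empty entries, e.g. unknown bin dir
        for name in checksiteconfig checksite.config checksiterc .checksiterc; do
            [ -f "$dir/$name" ] && echo "$dir/$name"
        done
    done
    return 0
}

find_checksite_configs . "$bin_dir" "$HOME" /etc /usr/local/etc
```

If more than one file is listed, it is worth checking which one actually takes effect before relying on its settings.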

SEE ALSO

AUTHOR

Abe Timmerman, <abeltje@cpan.org>

BUGS

Please report any bugs or feature requests to bug-WWW-CheckSite@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE

Copyright MMV-MMVII Abe Timmerman, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.