
poacher - a simple web site validation robot, based on WWW::Robot

poacher [ -help | -verbose | -version ] [ -command program ] url

Poacher is a web robot that checks a web site for problems such as broken links and badly formed HTML. Page content is checked by a separate program, currently just weblint.
Poacher is provided as a sample application of the WWW::Robot module; it is just an expanded version of the sample code included in the documentation for WWW::Robot. Please let me know if you have any ideas for new features or improvements, for either Poacher or the Robot module.

Specifies a program which you would like invoked on every page for which the GET request is successful. For example, you could use this to invoke weblint on every page.
Provides your email address, which is noted in requests made by the robot. This is so people can contact you if your robot goes crazy.
Specifies that the robot should check that external URLs are accessible. Each one is effectively pinged with a HEAD request.
Specifies whether the robot should perform a depth-first traversal or a breadth-first traversal.
Display a short help message with a reminder of supported command-line options.
Display the version of Poacher.
Enable verbose reporting as Poacher runs.
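
The traversal and external-link options above can be illustrated with a small sketch. It is written in Python for brevity and is not Poacher's own code: the in-memory `site` map, the `traverse` and `head_request` names, and the rule that a link is "external" when its host differs from the start URL are all assumptions made for illustration. The HEAD request is only constructed, never sent.

```python
from collections import deque
from urllib.parse import urlparse
from urllib.request import Request


def traverse(site, start, order="depth"):
    """Return (pages in visit order, external URLs seen).

    `site` is a hypothetical map of page URL -> list of linked URLs,
    standing in for pages that a real robot would fetch with GET.
    """
    host = urlparse(start).netloc
    pending = deque([start])
    seen = {start}
    visited, external = [], []
    while pending:
        # Depth-first always takes the next URL from the right-hand end.
        # Breadth-first takes the oldest URL from the left-hand end.
        url = pending.pop() if order == "depth" else pending.popleft()
        visited.append(url)
        new_internal = []
        for link in site.get(url, []):
            if link in seen:
                continue
            seen.add(link)
            if urlparse(link).netloc == host:
                new_internal.append(link)
            else:
                # External pages are not traversed, only checked.
                external.append(link)
        if order == "depth":
            # Reverse so the first link on the page is visited first.
            pending.extend(reversed(new_internal))
        else:
            pending.extend(new_internal)
    return visited, external


def head_request(url):
    """Build (but do not send) a HEAD request for an external URL."""
    return Request(url, method="HEAD")
```

With the same link map, the two orders visit the same pages and differ only in sequence: depth-first follows a page's first link all the way down before returning to its siblings, while breadth-first visits every link on a page before going deeper.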

The following example shows how you can use Poacher to check a site, running weblint on every page seen:
% poacher -command 'weblint -s' http://www.foobar.com/
The -s switch to weblint enables short messages, where the filename is not listed, just the line number.
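
As a rough illustration of running a checker on each page, the sketch below saves a page body to a temporary file and appends that file's path to the command. This document does not say how Poacher actually hands the page to the program, so the temporary-file approach, the `run_checker` name, and the `.html` suffix are assumptions, and the sketch is in Python rather than Poacher's own Perl.

```python
import os
import shlex
import subprocess
import tempfile


def run_checker(command, page_body):
    """Run `command` on a temporary file holding `page_body`.

    Returns (exit status, captured standard output).
    """
    # Write the page to disk so a file-based checker can read it.
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as fh:
        fh.write(page_body)
        path = fh.name
    try:
        # Split the command as a shell would, then append the file path.
        argv = shlex.split(command) + [path]
        result = subprocess.run(argv, capture_output=True, text=True)
        return result.returncode, result.stdout
    finally:
        # Remove the temporary file whether or not the checker succeeded.
        os.unlink(path)
```

Under these assumptions, a call such as `run_checker('weblint -s', body)` would correspond to the `-command 'weblint -s'` invocation shown above, provided weblint is installed and on the search path.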

This is a brief list of some ideas related to Poacher:

The robot module which provides the web traversal engine.
The collection of modules which provides all the base functionality on which all this stuff is built. The modules are available in the libwww-perl5 distribution on CPAN. Kudos to Gisle Aas.
Johan Vromans' module for parsing command-line options, included in the Perl distribution.
A Perl script for checking the contents of a web page for syntax errors and other classes of problem.

Neil Bowers <neilb@cre.canon.co.uk>

Copyright (C) 1997, Canon Research Centre Europe.
This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.