NAME

poacher - a simple web site validation robot, based on WWW::Robot

SYNOPSIS

poacher [ -help | -verbose | -version ] [ -command program ] url

DESCRIPTION

Poacher is a web robot that checks a web site for problems such as broken links and badly formed HTML. Page content is checked by a separate program, currently just weblint.

Poacher is provided as a sample application of the WWW::Robot module; it is just an expanded version of the sample code included in the documentation for WWW::Robot. Please let me know if you have any ideas for new features or improvements, for either Poacher or the Robot module.

OPTIONS

-command program

Specifies a program to invoke on every page for which the GET request succeeds. For example, you could use this to run weblint on every page.
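
How Poacher hands each page to the program is not described here. As a rough illustration only, the Python sketch below assumes the page is saved to a temporary file whose path is appended to the command line; `run_checker` is a hypothetical name, not part of Poacher or WWW::Robot.

```python
import os
import shlex
import subprocess
import tempfile

def run_checker(command, page_content):
    """Save the page to a temporary file and run `command <file>` on it.

    Assumption: the checker takes the file path as its last argument.
    Returns the checker's exit status and its standard output.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "page.html")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(page_content)
        # shlex.split keeps options such as 'weblint -s' as separate arguments.
        result = subprocess.run(
            shlex.split(command) + [path],
            capture_output=True,
            text=True,
        )
    return result.returncode, result.stdout
```

With this shape, a command such as 'weblint -s' would be called once per successfully fetched page, and its output reported alongside the page's URL.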

-email address

Provides your email address, which is noted in requests made by the robot. This is so people can contact you if your robot goes crazy.

-external

Makes the robot check that external URLs are accessible. Each external URL is effectively pinged with a HEAD request.
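
A HEAD request asks the server for the response headers only, so a link can be checked without downloading the page. The following Python sketch shows the idea using the standard library; it is not Poacher's own code, and `head_request` and `is_accessible` are hypothetical names chosen for the example.

```python
import urllib.error
import urllib.request

def head_request(url):
    """Build a HEAD request: headers only, no response body."""
    return urllib.request.Request(url, method="HEAD")

def is_accessible(url, timeout=10):
    """Return True if the HEAD request completes without an error status."""
    try:
        with urllib.request.urlopen(head_request(url), timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # HTTPError (4xx/5xx) is a subclass of URLError; a malformed
        # URL raises ValueError; timeouts surface as OSError.
        return False
```

Some servers refuse HEAD requests even for pages that a GET would return, so a failure here is best treated as a hint rather than proof that the link is broken.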

-traversal [ depth | breadth ]

Specifies whether the robot should perform a depth-first traversal or a breadth-first traversal.
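
To illustrate the difference between the two orders, here is a minimal Python sketch. It is not the traversal code of Poacher or WWW::Robot; the link table `LINKS` and the function `traverse` are invented for the example. The only difference between the two modes is which end of the pending queue the next URL is taken from.

```python
from collections import deque

# Hypothetical site: each page maps to the links found on it.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a1", "/a2"],
    "/b": ["/b1"],
    "/a1": [],
    "/a2": [],
    "/b1": [],
}

def traverse(start, order="depth"):
    """Return pages in visit order; order is 'depth' or 'breadth'."""
    pending = deque([start])
    seen = {start}  # URLs already queued, so each page is visited once
    visited = []
    while pending:
        # Depth-first takes the newest URL, breadth-first the oldest.
        url = pending.pop() if order == "depth" else pending.popleft()
        visited.append(url)
        links = LINKS.get(url, [])
        # Reversed for depth-first so the first link on a page is followed first.
        for link in (reversed(links) if order == "depth" else links):
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return visited

print(traverse("/", "depth"))    # ['/', '/a', '/a1', '/a2', '/b', '/b1']
print(traverse("/", "breadth"))  # ['/', '/a', '/b', '/a1', '/a2', '/b1']
```

Depth-first follows one branch of the site to its end before returning to sibling pages, while breadth-first visits every page at one link distance from the start before moving further away.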

-help

Displays a short help message listing the supported command-line options.

-version

Displays the version of Poacher.

-verbose

Enables verbose reporting as Poacher runs.

EXAMPLE

The following example shows how you can use Poacher to check a site, running weblint on every page seen:

    % poacher -command 'weblint -s' http://www.foobar.com/

The -s switch to weblint enables short messages, which list only the line number and omit the filename.

TODO LIST

This is a brief list of some ideas related to Poacher:

SEE ALSO

WWW::Robot

The robot module which provides the web traversal engine.

libwww-perl5

The collection of modules which provides all the base functionality on which all this stuff is built. The modules are available in the libwww-perl5 distribution on CPAN. Kudos to Gisle Aas.

Getopt::Long

Johan Vromans' module for parsing command-line options, included in the Perl distribution.

Weblint

A Perl script for checking the contents of a web page for syntax errors and other classes of problem.

AUTHOR

Neil Bowers <neilb@cre.canon.co.uk>

COPYRIGHT

Copyright (C) 1997, Canon Research Centre Europe.

This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.