NAME

weblint++

SYNOPSIS

        weblint++ [-c] [-e] [-E] [-f] [-l] [-M] [-t] [-T] [-V]
                [-C [config] ] [-d file] [-m [md5 digest] ] [-R template]
                [-s file] [-u username -p password] [-v [level | 1] ]
                url

        weblint++ -h

NOTE: You will not see any output without the -v switch

DESCRIPTION

The weblint++ program fetches a web resource and runs the response through an HTML lint filter as well as other tests.

You can use this program interactively if you specify the -v switch, or use it in batch mode by observing the exit status.
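For batch use, a wrapper can run the program quietly and branch on its exit status. The following is a minimal sketch, assuming a POSIX shell. The name run_check is a hypothetical helper, not part of weblint++, and it accepts any command so that it can be tried without network access.

```shell
# run_check is a hypothetical helper: it runs the given command with its
# output discarded and prints a one-line verdict based on the exit status.
run_check() {
    "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "OK"
    else
        echo "PROBLEM (exit status $status)"
    fi
    return "$status"
}

# Intended use (needs weblint++ installed and network access):
#   run_check weblint++ -l http://www.example.com
```

Because the helper returns the original status, it can itself be used in a larger script with if or &&.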

OPTIONS

Command line switches cannot be grouped. You must specify them separately. If you do it correctly, things will work.

        -l -T -m        CORRECT

If you do it incorrectly, you get undefined behaviour.

        -ltm            WRONG BAD BAD INCORRECT NAUGHTY

-c

The -c switch checks IMG and A links if url returns a 'text/html' resource. Each problem link adds 1 to the exit status value. With the -v switch, -c reports the status of only the bad links. The status of all links is available to the -R template.

-C [ config ]

The -C switch loads configuration information from a file. If you do not specify a file then the program looks in the current directory for a file named .weblintrc. If it does not find that file, it looks in your home directory.

See the Configuration section for details on allowed directives.

This switch requires ConfigReader::Simple.

-d file

The -d switch performs a diff between the HTTP response message body and the specified file. The program exits if they differ, unless -e is present.

-e

When present, the -e switch prevents the program from exiting on errors from the -d or -m switches. This way the program can continue and eventually print a report with the -R switch.

-E

When present, the -E switch prevents the program from creating reports unless it has web problems to report.

-f file

Read the resources to check from file instead of from the command line.

This functionality is unimplemented.

-h

Print a help message and exit.

-l

If url returns a 'text/html' resource, run its contents through HTML::Lint. Each lint warning adds 1 to the exit status value. With the -v switch, it prints the results to standard output.

The test will be skipped if HTML::Lint cannot be loaded.

-m [md5 digest]

The -m switch by itself reports the MD5 digest (in hex) of the message body of the response from url. If you specify a digest, the program compares it to the digest of the response and exits if they do not match, unless -e is present.

The test will be skipped if Digest::MD5 cannot be loaded.

-M

Email the report (from -R). You should specify the mail headers in the template, including the To: header. The report will not be printed to standard output.

This functionality is unimplemented.

-p password

The -p switch specifies the Basic authentication password.

-R file

The -R switch specifies the report template file. Once the program fills in the template, it prints it to standard output unless you specified the -M switch to email the report instead.

The report uses Text::Template. If Text::Template cannot be loaded, the program skips the report unless Data::Dumper can dump the report data structure to STDOUT.

-s file

The -s switch specifies the file to save the HTTP message body to.

-t

The -t switch reports the download time of the resource, using Time::HiRes.

The test will be skipped if Time::HiRes cannot be loaded.

-T

The -T switch reports the total download size of the resource. For 'text/html' resources, this size includes the sizes of the IMG links.

The test will be skipped if HTTP::Size cannot be loaded.

-u username

The -u switch specifies the Basic authentication user name.

-v [level]

The -v switch turns on verbose reporting. The greater the value of level, the more verbose the reporting. If you do not specify -v, you will see no output, although you can observe the results from the exit status.

The -v switch implies -t.

-V

Print the version number and exit.

CONFIGURATION

You can load configuration information from a file with the -C switch. Configuration directives found in the file override those found on the command line. Some directives must have a value, some may take a value, and others set flags by their mere presence.

Configuration directives

VERBOSITY [ level ]

Same as the -v switch.

USERNAME username

Same as the -u switch.

PASSWORD password

Same as the -p switch.

Same as the -l switch.

DIFF file

Same as the -d switch.

DO_NOT_EXIT

Same as the -e switch.

Same as the -f switch.

LINT

Same as the -l switch.

MD5 [ md5 ]

Same as the -m switch.

MAIL_REPORT

Same as the -M switch.

MAIL_PROGRAM

The mail program to use to send mail, such as /usr/lib/sendmail or /usr/local/bin/qmail-inject. The program name must exist and must be executable. The template must contain all of the headers. If you do not specify this directive, then the program attempts to use Mail::Sendmail.

MAIL_TO

Sets the To address of the emailed report.

This directive is ignored unless the -M and -R switches are used.

MAIL_FROM

Sets the From address of the emailed report.

This directive is ignored unless the -M and -R switches are used.

MAIL_SUBJECT

Sets the subject line of the emailed report.

This directive is ignored unless the -M and -R switches are used.

REPORT_ON_ERROR_ONLY

Reports will only be made if there was an error. If no problems were found with the resource, then nothing will be printed to standard output or mailed.

Same as the -E switch.

REPORT template

Same as the -R switch.

SAVE_RESPONSE file

Same as the -s switch.

TIMER

Same as the -t switch.

DOWNLOAD_SIZE

Same as the -T switch.
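As an illustration of the directives above, the following sketch writes a small configuration file. It assumes the one-directive-per-line, whitespace-separated layout that ConfigReader::Simple reads. The file name example.weblintrc and the chosen values are placeholders, not defaults of the program.

```shell
# Write an illustrative configuration file in the current directory.
# The quoted 'EOF' keeps the shell from expanding anything in the body.
# Directives without a value (LINT, TIMER, REPORT_ON_ERROR_ONLY) act as
# flags by their mere presence, as described above.
cat > example.weblintrc <<'EOF'
VERBOSITY 1
LINT
TIMER
REPORT template.txt
REPORT_ON_ERROR_ONLY
EOF
```

Such a file would then be loaded with something like weblint++ -C example.weblintrc http://www.example.com.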

ORDER OF TESTS

The program performs the tests, and possibly exits based on errors, in this order:

        HTTP fetch
        Time download (-t switch)
        MD5 digest comparison (-m switch)
        File content comparison (-d switch)
        Download size check (-T switch)
        HTML Lint warnings (-l switch)
        Link check (-c switch)

REPORT TEMPLATES

The -R switch allows you to generate a report from your own template.

These variables are available:

$url

The value of url from the command line.

%options

A hash of all of the specified switches, and their values. A value of 1 indicates either the literal value is 1 or the switch was specified without a value.

$name

The program name, as reported in $0. You can also simply use $0.

$version

The program version number.

$request

The HTTP request, from HTTP::Request.

$response

The HTTP response, from HTTP::Response.

$response_code

The HTTP response status code, from HTTP::Response.

$response_success

True if the request was successful, from HTTP::Response.

$download_time

The download time of url.

$data

The message body of the HTTP response.

$type

The content-type of the HTTP response. Some tests only work for the 'text/html' MIME type.

$fetched_md5

The MD5 digest of the message body of the HTTP response. The -m switch compares its value, $options{m}, to this value.

This applies to the -m switch only, and is not set otherwise.

$md5_mismatch

True if the MD5 digest of the message body of the HTTP response does not match the value specified with the -m switch.

This applies to the -m switch only, and is not set otherwise.

$diff

The text differences between the message body of the HTTP response and the file specified with the -d switch.

This applies to the -d switch only, and is not set otherwise.

$total_download_size

The total download size of url, along with image file sizes it includes, as determined by HTTP::Size.

This applies to the -T switch only, and is not set otherwise.

%total_download_hash

The hash from HTTP::Size::get_sizes. See that module for details.

This applies to the -T switch only, and is not set otherwise.

$lint_error_count

The number of warnings reported by HTML::Lint.

This applies to the -l switch only, and is not set otherwise.

@lint_errors

The warnings reported by HTML::Lint.

This applies to the -l switch only, and is not set otherwise.

The links extracted from the message body of the HTTP response, reported by HTML::SimpleLinkExtor.

This applies to the -c switch only, and is not set otherwise.

The number of links extracted from the message body of the HTTP response, reported by HTML::SimpleLinkExtor.

This applies to the -c switch only, and is not set otherwise.

The unique links extracted from the message body of the HTTP response, reported by HTML::SimpleLinkExtor, as the keys to this hash. Their values are the HTTP response code for each link.

This applies to the -c switch only, and is not set otherwise.

The number of unique links extracted from the message body of the HTTP response, reported by HTML::SimpleLinkExtor.

This applies to the -c switch only, and is not set otherwise.

The number of unique links from the message body of the HTTP response which returned HTTP error statuses (4xx, 5xx).

This applies to the -c switch only, and is not set otherwise.

$errors

The total number of lint warnings and HTTP errors from link checking.

This applies to the -c and -l switches only, and is not set otherwise.

@error_messages

An array of error messages from all parts of the program, in the order that the program encountered them.
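A report template might look like the sketch below. It assumes Text::Template's default delimiters, where the text between { and } is evaluated as Perl, and it uses only variables listed above. The file name template.txt follows the examples in this document, and the layout is purely illustrative.

```shell
# Write an illustrative report template. The quoted 'EOF' stops the shell
# from expanding $url and friends, leaving them for Text::Template.
cat > template.txt <<'EOF'
Report for { $url }
Program: { $name } { $version }
HTTP status: { $response_code }
Messages:
{ join "\n", map { "  * $_" } @error_messages }
EOF
```

The template would then be passed to the program with the -R switch, for example weblint++ -R template.txt -l http://www.example.com.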

EXIT STATUSES

-1

The MD5 digest of the HTTP response message body did not match the digest specified with -m, if you specified one.

-2

The file specified with the -d switch does not exist.

-3

The HTTP response message body differed from the content of the file specified with -d.

< 0

The program encountered an HTTP error. The exit code is the negated HTTP response code. For example, if the HTTP response was 404 (Not Found), the exit status is -404.

> 0

HTML::Lint found HTML errors, or the link check found broken links. The exit status is the number of HTML errors (from -l) plus the number of broken links (from -c).

+0

Success. No HTTP errors, no MD5 digest mismatches, no file diffs, no HTML warnings.
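One caveat when reading these values from a shell: on POSIX systems only the low 8 bits of an exit status reach the parent process, so the negative statuses above do not appear literally in $?. The sketch below shows the arithmetic under that assumption. The name seen_by_shell is a hypothetical helper, not part of weblint++.

```shell
# seen_by_shell (hypothetical helper): given a documented exit value,
# print the 0-255 value that a POSIX shell would report in $?.
seen_by_shell() {
    echo $(( (($1 % 256) + 256) % 256 ))
}

seen_by_shell -1      # prints 255
seen_by_shell -3      # prints 253
seen_by_shell -404    # prints 108
seen_by_shell 7       # prints 7
```

Different documented values can therefore collide: -500 and 12 both appear as 12.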

EXAMPLES

Check for HTML errors

These commands interactively check HTML for errors. The -v switch prints results to the terminal and the -l switch loads HTML::Lint.

        # from the web
        weblint++ -v -l http://www.example.com

        # a local file with an absolute path
        weblint++ -v -l /usr/local/web/test.html

        # a local file with an absolute file: URI
        weblint++ -v -l file:/usr/local/web/test.html

        # a local file with a relative URI
        weblint++ -v -l test.html

        # a local file with a relative file: URI
        weblint++ -v -l file:test.html

This command checks for broken links. You can use the same form of the URIs in Check for HTML errors. The -v switch prints results to the terminal and the -c switch loads HTTP::SimpleLinkChecker.

        # from the web
        weblint++ -v -c http://www.example.com

Get the MD5 digest of a web resource

These commands check MD5 digests. You can use the same form of the URIs in Check for HTML errors. The -v switch prints results to the terminal and the -m switch loads Digest::MD5.

        # get MD5 digest
        weblint++ -v -m http://www.example.com

        # compare MD5 digest
        weblint++ -v -m9ec29ae8d1268b82acb8e3ab7ce0f5c6 http://www.example.com
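To reproduce the comparison outside the program, for example against a body saved with -s, the digest can be computed with a standard tool. This is a sketch that assumes the md5sum utility from GNU coreutils is installed. The name digest_matches and the file md5_demo.txt are hypothetical and exist only for the demonstration.

```shell
# digest_matches (hypothetical helper): succeed if the hex MD5 digest of
# the file named by $1 equals the expected digest given as $2.
digest_matches() {
    actual=$(md5sum < "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Self-contained demonstration with a known input: the MD5 digest of
# the five bytes "hello" is 5d41402abc4b2a76b9719d911017c592.
printf 'hello' > md5_demo.txt
if digest_matches md5_demo.txt 5d41402abc4b2a76b9719d911017c592; then
    echo "digest matches"
else
    echo "digest differs"
fi
```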

Get the file contents

This command checks for content differences. You can use the same form of the URIs in Check for HTML errors. The -v switch prints results to the terminal and the -d switch loads Text::Diff.

        weblint++ -v -d should_be/test.html http://www.example.com

Read a configuration file

        weblint++ -C .configrc http://www.example.com

Access a password protected website

This command accesses a password protected website with the Basic authentication username and password.

        weblint++ -v -u username -p password http://www.example.com

This command generates a report. You can use the same form of the URIs in Check for HTML errors. The -R switch loads Text::Template and fills in template.txt. The program prints the results to STDOUT.

        # print the report regardless of the results
        weblint++ -R template.txt -l http://www.example.com

The -E switch only prints reports if the program needs to report a problem with the resource. The program will not print a report if it did not find a problem with the resource. For example, you might use this as a cron job. If something needs your attention, the program prints the report to standard output which cron then mails to you. If everything is okay, you do not get mail.

        # print the result only if there were HTML errors
        weblint++ -E -R template.txt -l http://www.example.com

        # print the result only if there were HTML errors
        # or bad link problems
        weblint++ -E -R template.txt -l -c http://www.example.com

Save the HTTP response in a file

This command saves the HTTP message body. You can use the same form of the URIs in Check for HTML errors. The -s switch saves the results in saved.txt.

        weblint++ -s saved.txt http://www.example.com

Time the download

This command measures the download time of http://www.example.com. You can use the same form of the URIs in Check for HTML errors. The -v switch prints results to the terminal and the -t switch loads Time::HiRes.

        weblint++ -v -t http://www.example.com

Measure the total download size, including linked images

This command measures the total download size of http://www.example.com, including linked images. You can use the same form of the URIs in Check for HTML errors. The -v switch prints results to the terminal and the -T switch loads HTTP::Size.

        weblint++ -v -T http://www.example.com

Perform all tests

        # print to the terminal
        weblint++ -v -c -l -t -m -T -d test.html http://www.example.com

        # print to a template
        weblint++ -v -c -l -t -m -T -R template.txt http://www.example.com

BUGS

* to be determined

TO DO

* test various HTTP header things (cookies, etc)

* email templates on error

* implement -M

* implement -f

* allow global configuration files.

* reconsider exiting on errors from -d and -m

* exiting with negative error codes is probably not such a great idea. maybe -e should allow the exit rather than the other way around.

SOURCE AVAILABILITY

This source is part of a SourceForge project which always has the latest sources in CVS, as well as all of the previous releases.

        https://sourceforge.net/projects/brian-d-foy/

If, for some reason, I disappear from the world, one of the other members of the project can shepherd this module appropriately.

AUTHOR

brian d foy <bdfoy@cpan.org>

COPYRIGHT

Copyright 2002-2007, brian d foy. All rights reserved.

This program may be redistributed under the same terms as Perl itself.

SCRIPT CATEGORIES

Web

SEE ALSO

HTML::Lint, Text::Diff, HTTP::Request, HTTP::Response, Time::HiRes, Text::Template, HTTP::Size