NAME

W3C::LogValidator::HTMLValidator - [W3C Log Validator] Batch HTML validation (using the W3C Markup Validator)

SYNOPSIS

  use W3C::LogValidator::HTMLValidator;
  my %config = ("verbose" => 2);
  my $validator = W3C::LogValidator::HTMLValidator->new(\%config);
  $validator->uris('http://www.w3.org/Overview.html', 'http://www.yahoo.com/index.html');
  my %results = $validator->process_list;

DESCRIPTION

This module is part of the W3C::LogValidator suite, and checks HTML validity of a given document via the W3C HTML validator service.

API

Constructor

$val = W3C::LogValidator::HTMLValidator->new

Constructs a new W3C::LogValidator::HTMLValidator processor.

You may pass it a configuration hash reference (see "config_module" in W3C::LogValidator and W3C::LogValidator::Config).

  $validator = W3C::LogValidator::HTMLValidator->new(\%config);

Main processing method

$val->process_list

Processes a list of sorted URIs through the W3C Markup Validator.

The list can be set with uris. If $val was given a config hash when constructed, and if that hash has a "tmpfile" key, process_list will try to read this file as a hash of URIs and "hits" (popularity) with DB_File.

Returns a result hash. Keys for this hash are:

  name (string): the name of the module, i.e. "HTMLValidator"
  intro (string): introduction to the processing results
  thead (array): headers of the results table
  trows (array of arrays): rows of the results table
  outro (string): conclusion of the processing results
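
As an illustration of how a caller might consume this structure, the sketch below renders a result hash as tab-separated text. The sample data, the column headers and the format_results helper are all hypothetical; a real hash would come from process_list and its table contents may differ.

```perl
use strict;
use warnings;

# Hypothetical sample data in the documented shape; a real hash would
# come from $validator->process_list.
my %results = (
    name  => 'HTMLValidator',
    intro => 'Sample introduction.',
    thead => [ 'Rank', 'Hits', 'Errors', 'Address' ],
    trows => [
        [ 1, 120, 14, 'http://www.example.org/' ],
        [ 2, 45,  3,  'http://www.example.org/about.html' ],
    ],
    outro => 'Sample conclusion.',
);

# Hypothetical helper: render the result hash as tab-separated text.
sub format_results {
    my (%r) = @_;
    my @lines = ("[$r{name}]", $r{intro});
    push @lines, join("\t", @{ $r{thead} });          # table header
    push @lines, join("\t", @$_) for @{ $r{trows} };  # one line per row
    push @lines, $r{outro};
    return join("\n", @lines) . "\n";
}

print format_results(%results);
```

Keeping the rendering outside the processing module means the same hash could equally be turned into HTML or mail output.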

General methods

$val->uris

Returns a list of URIs to be processed (unless the configuration gives the location of the Berkeley DB file holding the hash of URIs/hits, see process_list). If an array is given as a parameter, also sets the list of URIs and returns it.

$val->trim_uris

Given a list of URIs of documents to process, returns a subset of this list containing the URIs of documents the module supposedly can handle. The decision is made based on file extensions (see auth_ext), content-type (see HEAD_check), and the setting for ExcludedAreas.
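
The extension-based part of this decision could look roughly like the sketch below. This is not the module's implementation: filter_by_extension is a hypothetical helper, and it assumes that the extension string lists entries with a leading dot and that URIs without an extension are kept for a later content-type check.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of W3C::LogValidator: keep URIs whose last
# path segment ends in an allowed extension, plus URIs with no extension.
sub filter_by_extension {
    my ($ext_string, @uris) = @_;
    my %allowed = map { lc($_) => 1 } split ' ', $ext_string;
    my @kept;
    for my $uri (@uris) {
        (my $path = $uri) =~ s/[?#].*//s;            # drop query string and fragment
        $path =~ s{^[a-z][a-z0-9+.-]*://[^/]*}{}i;   # drop scheme and host
        my ($last) = $path =~ m{([^/]*)$};           # last path segment
        if ($last =~ /(\.[^.]+)$/) {
            push @kept, $uri if $allowed{ lc($1) };  # known extension
        }
        else {
            push @kept, $uri;                        # no extension at all
        }
    }
    return @kept;
}

my @kept = filter_by_extension(
    '.html .htm .xhtml',                  # assumed extension string format
    'http://www.example.org/index.html',
    'http://www.example.org/logo.png',
    'http://www.example.org/docs/',
);
print "$_\n" for @kept;
```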

$val->HEAD_check

Checks, through an HTTP HEAD request, whether a document with no extension is actually an HTML/XML document. Returns 1 if the URI is of an expected content-type, 0 otherwise.
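
A check of this kind can be sketched with the core HTTP::Tiny module. The helper names head_check and is_expected_type and the list of accepted media types are assumptions made for illustration; the module's own HTTP client and list of types may differ.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical list of acceptable media types; the module's own list may differ.
my @EXPECTED = qw(text/html application/xhtml+xml);

# Return 1 if a Content-Type header value names an expected media type, else 0.
sub is_expected_type {
    my ($content_type) = @_;
    return 0 unless defined $content_type;
    my ($media) = $content_type =~ m{^\s*([^;\s]+)};   # strip parameters such as charset
    return 0 unless defined $media;
    return (grep { lc($media) eq $_ } @EXPECTED) ? 1 : 0;
}

# Send a HEAD request and test the Content-Type of the response.
sub head_check {
    my ($uri) = @_;
    my $res = HTTP::Tiny->new(timeout => 10)->head($uri);
    return 0 unless $res->{success};
    my $type = $res->{headers}{'content-type'};
    $type = $type->[0] if ref $type eq 'ARRAY';        # repeated header: use the first value
    return is_expected_type($type);
}

# The header test can be exercised without network access.
print is_expected_type('text/html; charset=utf-8'), "\n";
print is_expected_type('image/png'), "\n";
```

Separating the header test from the request keeps the decision logic usable offline; head_check itself needs network access.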

$val->auth_ext

Returns the file extensions (space-separated entries in a string) supported by the module. Public method accessing $self->{AUTH_EXT}, itself coming from either the AuthorizedExtensions configuration setting or a default value.

$val->valid

Sets / Returns whether the document being processed has been found to be valid or not. If an argument is given, sets the variable, otherwise returns the current variable.

$val->valid_err_num

Sets / Returns the number of validation errors for the document being processed. If an argument is given, sets the variable, otherwise returns the current variable.

$val->valid_success

Sets / Returns whether the module was able to process validation of the current document successfully (regardless of valid/invalid result). If an argument is given, sets the variable, otherwise returns the current variable.

$val->valid_head

Sets / Returns all HTTP headers returned by the markup validator when attempting to validate the current document. If an argument is given, sets the variable, otherwise returns the current variable.

$val->new_doc

Resets all validation variables to undef. In effect, prepares the processing module to handle a new document.
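
The combined set/get accessors and the reset described above follow a common Perl idiom, sketched below. My::DocState and its hash keys are hypothetical and only illustrate the pattern; they are not the module's internals, and this sketch also returns the value after setting it.

```perl
use strict;
use warnings;

package My::DocState;   # hypothetical class, for illustration only

sub new { return bless {}, shift }

# Combined getter/setter: with an argument, store it; return the current value.
sub valid {
    my $self = shift;
    $self->{VALID} = shift if @_;
    return $self->{VALID};
}

sub valid_err_num {
    my $self = shift;
    $self->{VALID_ERR_NUM} = shift if @_;
    return $self->{VALID_ERR_NUM};
}

# Reset per-document state before handling the next document.
sub new_doc {
    my $self = shift;
    $self->{$_} = undef for qw(VALID VALID_ERR_NUM);
    return;
}

package main;

my $state = My::DocState->new;
$state->valid(0);
$state->valid_err_num(12);
printf "valid=%s errors=%s\n", $state->valid, $state->valid_err_num;

$state->new_doc;
print defined($state->valid) ? "still set\n" : "reset\n";
```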

BUGS

Public bug-tracking interface at http://www.w3.org/Bugs/Public/

AUTHOR

Olivier Thereaux <ot@w3.org>

SEE ALSO

W3C::LogValidator, perl(1). Up-to-date complete info at http://www.w3.org/QA/Tools/LogValidator/