NAME

XML::Driver::HTML - SAX Driver for non-well-formed HTML.

SYNOPSIS

  use XML::Driver::HTML;

  $driver = new XML::Driver::HTML(
        'Handler' => $some_sax_filter_or_handler,
        'Source' => $some_PerlSAX_like_hash
        );

  $driver->parse();

or

  use XML::Driver::HTML;

  $driver = new XML::Driver::HTML();

  $driver->parse(
        'Handler' => $some_sax_filter_or_handler,
        'Source' => $some_PerlSAX_like_hash
        );

  $driver->parse(
        'Handler' => $some_other_sax_filter_or_handler,
        'Source' => $some_other_source
        );
  

DESCRIPTION

XML::Driver::HTML is a SAX Driver for HTML. The HTML input does not need to be well formed, because XML::Driver::HTML generates its SAX events by walking an HTML::TreeBuilder object. The simplest use is as a filter from HTML to XHTML, with XML::Handler::YAWriter as the SAX Handler.

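    use IO::File;
    use XML::Handler::YAWriter;
    use XML::Driver::HTML;

    # Write the resulting XHTML to STDOUT.
    my $ya = new XML::Handler::YAWriter(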
        'Output' => new IO::File ( ">-" ),
        'Pretty' => {
            'NoWhiteSpace'=>1,
            'NoComments'=>1,
            'AddHiddenNewline'=>1,
            'AddHiddenAttrTab'=>1,
            }
        );

    my $html = new XML::Driver::HTML(
        'Handler' => $ya,
        'Source' => { 'ByteStream' => new IO::File ( "<-" ) }
        );
    
    $html->parse();
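
Any object that implements the PerlSAX callbacks can take the place of XML::Handler::YAWriter. The following is a minimal sketch of such a handler, not part of this distribution: the package name My::CountingHandler is hypothetical, and the event hashes (Name, Attributes, Data) follow the PerlSAX convention documented in XML::Parser::PerlSAX, on the assumption that XML::Driver::HTML emits events of that shape. The events are simulated by hand here so that the sketch runs without a driver.

```perl
use strict;
use warnings;

package My::CountingHandler;    # hypothetical name, for illustration only

sub new {
    my $class = shift;
    return bless { elements => {}, text => '' }, $class;
}

# PerlSAX passes each callback a single hash reference.
sub start_document { }
sub end_document   { }

sub start_element {
    my ($self, $element) = @_;
    $self->{elements}{ $element->{Name} }++;    # count elements by name
}

sub end_element { }

sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};            # collect character data
}

package main;

my $handler = My::CountingHandler->new;

# Simulated events, standing in for what a driver would send.
$handler->start_document({});
$handler->start_element({ Name => 'p', Attributes => {} });
$handler->characters({ Data => 'Hello' });
$handler->end_element({ Name => 'p' });
$handler->end_document({});

printf "%s: %d\n", $_, $handler->{elements}{$_}
    for sort keys %{ $handler->{elements} };    # prints "p: 1"
```

In real use, such an object would presumably be passed as the 'Handler' option in place of $ya.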

METHODS

new

Creates a new XML::Driver::HTML object. Default options for parsing, described below, are passed as key-value pairs or as a single hash. Options may be changed directly in the object.

parse

Parses a document. Options, described below, are passed as key-value pairs or as a single hash. Options passed to parse() override the default options in the parser object for the duration of the parse.
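
The override behaviour described for parse() can be pictured as a hash merge in which per-call options win over the object's defaults, while the defaults themselves stay untouched. This is only a sketch of the idea, not the module's actual implementation; the helper effective_options is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: merge per-call options over object defaults
# without modifying the defaults.
sub effective_options {
    my ($defaults, @args) = @_;

    # Accept either a single hash reference or a key-value list.
    my %call = (@args == 1 && ref $args[0] eq 'HASH') ? %{ $args[0] } : @args;

    return { %$defaults, %call };    # keys from %call replace defaults
}

my %defaults = (
    Handler => 'default handler',
    Source  => { String => '<p>a</p>' },
);

my $opts = effective_options(\%defaults, Handler => 'other handler');

print "$opts->{Handler}\n";      # prints "other handler"
print "$defaults{Handler}\n";    # prints "default handler"
```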

OPTIONS

The following options are supported by XML::Driver::HTML:

Handler

Default SAX Handler to receive events

Source

Hash containing the input source for parsing. The `Source' hash may contain the following parameters:

ByteStream

The raw byte stream (file handle) containing the document.

String

A string containing the document.

SystemId

The system identifier (URL) of the document.

Encoding

A string describing the character encoding.

If more than one of `ByteStream', `String', or `SystemId' is specified, then preference is given first to `ByteStream', then `String', then `SystemId'.
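
The precedence rule above can be sketched as a simple ordered lookup. The helper pick_source is hypothetical and only illustrates the documented order; it is not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper: return the name of the Source key that takes
# precedence, or nothing if none of the three is present.
sub pick_source {
    my ($source) = @_;
    for my $key (qw(ByteStream String SystemId)) {
        return $key if defined $source->{$key};
    }
    return;
}

my $chosen = pick_source({ String => '<p>hi</p>', SystemId => 'page.html' });
print "$chosen\n";    # prints "String"
```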

NOTES

XML::Driver::HTML requires Perl 5.6 to convert from ISO-8859-1 to UTF-8.

BUGS

not yet implemented:

    Interpretation of SystemId as being a URI
    XHTML document type

other bugs:

    HTML::Parser and HTML::TreeBuilder bugs concerning DOCTYPE and CSS.
    Perl's handling of UTF-8 is incompatible between different versions,
    so you need exactly Perl 5.6.0, neither lower nor higher.

AUTHOR

  Michael Koehne, Kraehe@Copyleft.De
  (c) 2001 GNU General Public License

SEE ALSO

XML::Parser::PerlSAX and HTML::TreeBuilder