
HTML::TableExtractor - Run callbacks on the layout elements of HTML tables.

use HTML::TableExtractor;
$p = HTML::TableExtractor->new();
$p->parse($html, table => sub { ... }, tr => sub { ... });

Parses HTML looking for table-related elements (table, tr, td and th as of version 0.1).
Three callbacks can be registered for each element. These callbacks, described below, are executed whenever an element of a particular type is encountered.
o start_${tagname}: Called whenever $tagname is opened.
o ${tagname}: Called both immediately after start_${tagname} and immediately before end_${tagname}.
o end_${tagname}: Called whenever a closing $tagname is encountered.
use HTML::TableExtractor;
$p = HTML::TableExtractor->new();
$p->parse($html,
    start_table => sub {
        # Start callbacks receive the tag's attributes and its original text.
        my ($attr, $origtext) = @_;
        print "Table border is $attr->{border}\n";
    },
    tr => sub { print "Row opened or closed.\n" },
);
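
The firing order of the three callbacks can be sketched in plain Perl without the module installed. This is only an illustration of the ordering described above, not the module's actual code: fire() and @log are hypothetical names invented for this sketch, and the callbacks here take no arguments.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the dispatch step. It models only the
# ordering of callbacks, nothing else the real module does.
my @log;

sub fire {
    my ($callbacks, $event, $tag) = @_;

    # Opening tag: start_TAG first, then TAG.
    # Closing tag: TAG first, then end_TAG.
    my @names = $event eq 'open'
        ? ("start_$tag", $tag)
        : ($tag, "end_$tag");

    for my $name (@names) {
        my $cb = $callbacks->{$name} or next;   # skip unregistered callbacks
        $cb->();
    }
}

my %callbacks = (
    'start_tr' => sub { push @log, 'start_tr' },
    'tr'       => sub { push @log, 'tr' },
    'end_tr'   => sub { push @log, 'end_tr' },
);

# Simulate one <tr> ... </tr> pair.
fire(\%callbacks, 'open',  'tr');
fire(\%callbacks, 'close', 'tr');

print join(', ', @log), "\n";   # start_tr, tr, tr, end_tr
```

Because the plain ${tagname} callback fires on both the opening and the closing tag, a callback that must run once per element is better registered as start_${tagname} or end_${tagname}.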

Called whenever a particular start tag has been recognised. This module recognises these tags: <table>, <tr>, <td> and <th>.
This method will be called by the parser and is not intended to be called from an application.
Called whenever a particular end tag is encountered.
This method will be called by the parser and is not intended to be called from an application.
parse() is the only method you really need to call. Pass it the HTML followed by a callback for each tag type of interest. The callbacks are executed as described above.
o parse() should handle other data sources, such as streams and file handles.
HTML::Parser, HTML::TableContentParser

Simon Drabble <simon@thebigmachine.org>
(C) 2002 Simon Drabble
This software is released under the same terms as Perl.