HTML::TableExtractor - Do stuff with the layout of HTML tables.
    use HTML::TableExtractor;
    my $p = HTML::TableExtractor->new();
    $p->parse($html,
        table => sub { ... },
        tr    => sub { ... },
    );
Parses HTML looking for table-related elements (table, tr, td and th as of version 0.1).
Three callbacks can be registered for each element. These callbacks, described below, are executed whenever an element of a particular type is encountered.
o start_${tagname}: called whenever $tagname is opened.
o ${tagname}: called immediately after start_${tagname} and immediately before end_${tagname}, so it runs both when the tag opens and when it closes.
o end_${tagname}: called whenever a closing $tagname is encountered.
    use HTML::TableExtractor;

    # $html holds the document to be parsed.
    my $p = HTML::TableExtractor->new();
    $p->parse($html,
        start_table => sub {
            my ($attr, $origtext) = @_;
            # $attr is a hash reference of the tag's attributes
            print "Table border is $attr->{border}\n";
        },
        tr => sub { print "Row opened or closed.\n" },
    );
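The callback ordering described above can be illustrated without parsing any HTML. The sketch below is a hypothetical stand-in, not the module's own code: a `dispatch` routine (an illustrative name) receives an opening or closing tag event and fires whichever registered callbacks apply, in the documented order. It assumes, as the `tr` example suggests, that the plain ${tagname} callback runs on both the opening and the closing tag.

```perl
use strict;
use warnings;

# Hypothetical illustration only: HTML::TableExtractor performs its own
# dispatch internally. This sketch just mirrors the documented order.
my @log;
my %callbacks = (
    start_tr => sub { push @log, 'start_tr' },
    tr       => sub { push @log, 'tr' },
    end_tr   => sub { push @log, 'end_tr' },
);

# $event is 'open' or 'close'; $tag is a tag name such as 'tr'.
sub dispatch {
    my ($cb, $event, $tag) = @_;
    my @names = $event eq 'open'
        ? ("start_$tag", $tag)    # opening tag: start_X, then X
        : ($tag, "end_$tag");     # closing tag: X, then end_X
    for my $name (@names) {
        # Callbacks that were never registered are simply skipped.
        $cb->{$name}->() if exists $cb->{$name};
    }
}

dispatch(\%callbacks, 'open',  'tr');
dispatch(\%callbacks, 'close', 'tr');
print join(' ', @log), "\n";    # prints "start_tr tr tr end_tr"
```

Laid out this way, the asymmetry is easy to see: a single `tr` callback sees both edges of a row, which is why the example above prints "Row opened or closed."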
Called whenever one of the supported start tags is recognised. This module recognises these tags: <table>, <tr>, <td> and <th>.
This method will be called by the parser and is not intended to be called from an application.
Called whenever the end tag of one of the recognised elements is encountered.
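parse() is the only method most applications need to call. Pass it the HTML to parse, followed by a list of name => code-reference pairs, one for each callback you want (for example start_table or end_tr). Each callback is executed as described above.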
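As a usage sketch, the call below counts tables, rows and cells in a small document by registering only start_* callbacks. It assumes HTML::TableExtractor is installed and that each start_* callback fires once per opening tag, as described above; the `%count` hash and the sample markup are illustrative, not part of the module.

```perl
use strict;
use warnings;
use HTML::TableExtractor;

# Illustrative input: one table, one header row and two data rows.
my $html = <<'HTML';
<table border="1">
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td>Apples</td><td>3</td></tr>
  <tr><td>Pears</td><td>5</td></tr>
</table>
HTML

my %count = (table => 0, row => 0, cell => 0);

my $p = HTML::TableExtractor->new();
$p->parse($html,
    start_table => sub { $count{table}++ },
    start_tr    => sub { $count{row}++ },
    # Header and data cells are counted together.
    start_td    => sub { $count{cell}++ },
    start_th    => sub { $count{cell}++ },
);

print "tables=$count{table} rows=$count{row} cells=$count{cell}\n";
```

Registering only the start_* callbacks is enough for counting, because each element is opened exactly once; the plain ${tagname} callbacks would fire twice per element.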
o parse() should also accept other data sources, such as streamed input or a file handle.
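Until parse() accepts other sources, one workaround is to read the whole input into a string first. The helper below, `slurp_fh`, is a hypothetical name and not part of the module; it uses only core Perl, localising `$/` so that a single readline returns the entire input. The resulting string can then be passed to parse() as usual.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TableExtractor): read everything
# from an open filehandle and return it as one string.
sub slurp_fh {
    my ($fh) = @_;
    local $/;                  # undef record separator: slurp mode
    my $content = <$fh>;
    return defined $content ? $content : '';    # empty input yields ''
}

# Demonstrate with an in-memory filehandle; a real file opened with
# open(my $fh, '<', $path) works the same way.
my $source = "<table><tr><td>1</td></tr></table>\n";
open(my $fh, '<', \$source) or die "Cannot open in-memory handle: $!";
my $html = slurp_fh($fh);
close($fh);

print length($html), "\n";
# The string can now be handed to the parser, for example:
#   HTML::TableExtractor->new()->parse($html, start_td => sub { ... });
```

This keeps the whole document in memory, so it is a stopgap for modest inputs rather than true streaming.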
HTML::Parser, HTML::TableContentParser
Simon Drabble <simon@thebigmachine.org>
(C) 2002 Simon Drabble
This software is released under the same terms as perl.