HTML::TableContentParser - Do interesting things with the contents of tables.
    use HTML::TableContentParser;
    $p = HTML::TableContentParser->new();
    $tables = $p->parse($html);
This package extracts the contents of tables from a string containing HTML. Each table encountered is stored as a hash in the returned array. The hash holds whatever was discovered about the table (its id, name, border, cellspacing and other attributes) together with the data the table contains.
The format of each hash looks something like this:

    attributes           keys from the attributes of the <table> tag
    @{$table_headers}    array of table headers, in order found
    @{$table_rows}       rows discovered, in order (reached as @{$t->{rows}} in the example below)
If the table has a caption, it is provided as:

    caption              keys from the <caption> tag's attributes
      data               the text of the <caption>..</caption> element
then, for each table row:

    @{$table_data}       <td> cells found, in order (reached as @{$r->{cells}} in the example below)
    other attributes     the attributes given in <tr ...>

then, for each data cell:

    data                 what comes between <td> and </td>
    other attributes     the attributes given in <td ...>
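To make the layout above concrete, here is a hand-built Perl structure of that shape and a small walker over it. It does not call the module at all. The keys `rows`, `cells`, `caption` and `data` come from the description and example in this document; the `headers` key name is an assumption, as are all the sample values, and `summarise_table` is a hypothetical helper rather than part of the module.

```perl
use strict;
use warnings;

# Hand-built data in the documented shape. Attribute names sit as plain keys
# next to the structural keys. ASSUMPTION: headers are stored under 'headers'.
my $tables = [
    {
        id      => 'prices',                    # from <table id="prices" ...>
        border  => '1',                         # from <table ... border="1">
        caption => { align => 'top', data => 'Price list' },
        headers => [ { data => 'Item' }, { data => 'Cost' } ],
        rows    => [
            {
                class => 'odd',                 # from <tr class="odd">
                cells => [
                    { data => 'Tea' },
                    { align => 'right', data => '1.50' },   # <td align="right">
                ],
            },
        ],
    },
];

# Hypothetical helper: summarise one table as text
# (caption, header line, then one line per row).
sub summarise_table {
    my ($t) = @_;
    my @out;
    push @out, "Caption: $t->{caption}{data}" if $t->{caption};
    push @out, 'Headers: ' . join(' | ', map { $_->{data} } @{ $t->{headers} || [] });
    for my $r (@{ $t->{rows} || [] }) {
        # Guard with || [] in case a row carries no cells.
        push @out, 'Row: ' . join(' | ', map { $_->{data} } @{ $r->{cells} || [] });
    }
    return join("\n", @out) . "\n";
}

print summarise_table($tables->[0]);
```

Because attributes share the hash with the structural keys, `$tables->[0]{id}` and `$tables->[0]{rows}` are read the same way.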
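    use strict;
    use warnings;
    use HTML::TableContentParser;

    my $p = HTML::TableContentParser->new();
    my $html = read_html_from_somewhere();   # placeholder: supply your own HTML
    my $tables = $p->parse($html);

    # Print the data of every cell, one line per row, for every table found.
    for my $t (@$tables) {
        for my $r (@{$t->{rows}}) {
            print "Row: ";
            for my $c (@{$r->{cells}}) {
                print "[$c->{data}] ";
            }
            print "\n";
        }
    }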
start: called whenever a start tag is recognised. The parser calls this automatically; the application should not call it.
text: called whenever a piece of text content is encountered. The parser calls this automatically; the application should not call it.
end: called whenever an end tag is encountered. The parser calls this automatically; the application should not call it.
parse: called with the HTML to parse. This is the only method the application needs to call. It returns an arrayref containing one entry for each table encountered, in the format detailed above.
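A minimal sketch of calling `parse` on an inline string, assuming the module is installed from CPAN. The HTML, the `stock` id and the cell values are made up for illustration; reading `id` straight off the table hash follows the "attributes" entry in the format described above.

```perl
use strict;
use warnings;
use HTML::TableContentParser;

# A small, made-up document containing a single table.
my $html = <<'HTML';
<table id="stock" border="1">
<tr><td>Widget</td><td>42</td></tr>
</table>
HTML

my $p      = HTML::TableContentParser->new();
my $tables = $p->parse($html);    # arrayref, one hashref per <table>

# Table attributes are plain keys on each table hash.
print "Found ", scalar(@$tables), " table(s); first id: $tables->[0]{id}\n";

# Print each row's cell data, comma-separated.
for my $r (@{ $tables->[0]{rows} }) {
    print join(', ', map { $_->{data} } @{ $r->{cells} || [] }), "\n";
}
```

The `|| []` guard is a defensive choice: a row holding only header cells may have no `cells` entry, and dereferencing `undef` inside `map` would die under `use strict`.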
$DEBUG: not a method, but a class variable. Set it to 1 to send debugging output (essentially the structure and content of each table) to STDERR via warn().
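A short sketch of switching debugging on around a single parse. It assumes the class variable is the package variable `$HTML::TableContentParser::DEBUG`; the one-cell table is purely illustrative.

```perl
use strict;
use warnings;
use HTML::TableContentParser;

# Class (package) variable, not a method: set it before calling parse().
$HTML::TableContentParser::DEBUG = 1;

# Debugging lines are emitted with warn(), so they go to STDERR,
# while the return value is unaffected.
my $tables = HTML::TableContentParser->new->parse(
    '<table><tr><td>x</td></tr></table>'
);

# Switch debugging back off for any later parses.
$HTML::TableContentParser::DEBUG = 0;
```

Since the output goes to STDERR, it can be separated from normal program output with a shell redirect such as `2> debug.log`.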
Exports: nothing.
Simon Drabble <sdrabble@cpan.org>

(C) 2002 Simon Drabble
This software is released under the same terms as perl.
To install HTML::TableContentParser, copy and paste the appropriate command into your terminal.
cpanm

    cpanm HTML::TableContentParser
CPAN shell

    perl -MCPAN -e shell
    install HTML::TableContentParser

The second command is typed at the cpan> prompt that the first one opens.
For more information on module installation, please visit the detailed CPAN module installation guide.