NAME

HTML::TableExtractor - Do stuff with the layout of HTML tables.

SYNOPSIS

  use HTML::TableExtractor;
  $p = HTML::TableExtractor->new();
  $p->parse($html, table => sub { ... }, tr => sub { ... });

DESCRIPTION

Parses HTML looking for table-related elements (table, tr, td and th as of version 0.1).

Three callbacks can be registered for each element. These callbacks, described below, are executed whenever an element of a particular type is encountered.

  o  start_${tagname}  Called whenever $tagname is opened.
  o  ${tagname}        Called immediately after start_${tagname}, and
                       immediately before end_${tagname}.
  o  end_${tagname}    Called whenever a closing $tagname is encountered.
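
The ordering described above can be illustrated with a small standalone sketch. It does not use this module and is not its actual implementation: the dispatch() helper and the %callbacks hash are hypothetical names introduced only to show which callbacks would fire, and in what order, for an opening and a closing tag.

```perl
use strict;
use warnings;

# Hypothetical dispatcher: NOT the module's real code, only an
# illustration of the documented callback order.
sub dispatch {
    my ($event, $tag, $callbacks, @args) = @_;

    # Opening tag: start_${tagname} first, then ${tagname}.
    # Closing tag: ${tagname} first, then end_${tagname}.
    my @names = $event eq 'start' ? ("start_$tag", $tag)
                                  : ($tag, "end_$tag");

    for my $name (@names) {
        my $cb = $callbacks->{$name} or next;   # skip unregistered callbacks
        $cb->(@args);
    }
    return;
}

my @log;
my %callbacks = (
    'start_tr' => sub { push @log, 'start_tr' },
    'tr'       => sub { push @log, 'tr' },
    'end_tr'   => sub { push @log, 'end_tr' },
);

dispatch('start', 'tr', \%callbacks);   # simulates <tr>
dispatch('end',   'tr', \%callbacks);   # simulates </tr>

print join(', ', @log), "\n";   # prints: start_tr, tr, tr, end_tr
```

Under this reading, a plain ${tagname} callback fires twice per element, once on open and once on close, which agrees with the "Row opened or closed" message in the EXAMPLE section.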

EXAMPLE

  use HTML::TableExtractor;
  $p = HTML::TableExtractor->new();
  $p->parse($html,
      start_table => sub {
        my ($attr, $origtext) = @_;
        print "Table border is $attr->{border}\n";
      },
      tr => sub { print "Row opened or closed.\n" },
  );

METHODS

start($parser, $tag, $attr, $attrseq, $origtext);

Called whenever a particular start tag has been recognised. This module recognises these tags: <table>, <tr>, <td> & <th>.

This method will be called by the parser and is not intended to be called from an application.

end($parser, $tag, $origtext);

Called whenever a particular end tag is encountered.

This method will be called by the parser and is not intended to be called from an application.

$p->parse($html, tag_type => \&coderef, ...);

This is the only method an application needs to call. Pass it a callback for each tag type of interest; the callbacks are executed as described above.

EXPORTS

CAVEATS, BUGS, and TODO

o parse() should handle other data sources, such as streams and file handles.

SEE ALSO

HTML::Parser, HTML::TableContentParser

AUTHOR

Simon Drabble <simon@thebigmachine.org>

(C) 2002 Simon Drabble

This software is released under the same terms as perl.