package HTML::Parser::Simple::Reporter;

use strict;
use warnings;

use Carp ();
use HTML::Parser::Simple::Attributes;

use Moo;

extends 'HTML::Parser::Simple';

our $VERSION = '2.02';

# -----------------------------------------------

sub traverse
{
	my($self, $node, $output, $depth) = @_;
	$depth        ||= 0;
	my(@child)    = $node -> getAllChildren;
	my($metadata) = $node -> getNodeValue;
	my($content)  = $$metadata{content};
	my($name)     = $$metadata{name};

	# We ignore the root, which means we ignore the DOCTYPE.

	if ($name ne 'root')
	{
		my($s) = ('  ' x ($depth - 1) ) . "$name. Attributes: ";
		my($p) = HTML::Parser::Simple::Attributes -> new;
		my($a) = $p -> parse($$metadata{attributes});
		$s     .= $p -> hashref2string($a) . '. Content:';
		my($c) = '';

		for my $index (0 .. $#child + 1)
		{
			$c .= $index <= $#$content && defined($$content[$index]) ? $$content[$index] : '';
		}

		$c =~ s/^\s+//;
		$c =~ s/\s+$//;
		$s .= " $c" if (length $c);

		push @$output, $s;
	}

	for my $index (0 .. $#child)
	{
		$self -> traverse($child[$index], $output, $depth + 1);
	}

	# The POD documents $output as the return value, so return it explicitly.

	return $output;

} # End of traverse.

# -----------------------------------------------

sub traverse_file
{
	my($self, $input_file_name) = @_;
	$input_file_name  ||= $self -> input_file;

	$self -> input_file($input_file_name);
	$self -> log("Reading $input_file_name");

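	# Use a lexical filehandle and 3-argument open, so the file name is never parsed as a mode.

	open(my $fh, '<', $input_file_name) || Carp::croak "Can't open($input_file_name): $!";
	my($html);
	read($fh, $html, -s $fh);
	close $fh;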

	Carp::croak "Can't read($input_file_name): $!" if (! defined $html);

	$self -> log('Parsing');

	$self -> parse($html);

	$self -> log('Traversing');

	my($output) = [];

	$self -> traverse($self -> root, $output, 0);

	return $output;

} # End of traverse_file.

# -----------------------------------------------

1;

=head1 NAME

HTML::Parser::Simple::Reporter - A sub-class of HTML::Parser::Simple

=head1 Synopsis

	#!/usr/bin/env perl

	use strict;
	use warnings;

	use HTML::Parser::Simple::Reporter;

	# -------------------------

	# Method 1:

	my($p) = HTML::Parser::Simple::Reporter -> new(input_file => 'data/s.1.html');
	my($s) = $p -> traverse_file;

	print "$_\n" for @$s;

	# Method 2:

	$p = HTML::Parser::Simple::Reporter -> new;
	$s = $p -> traverse_file('data/s.1.html');

	print "$_\n" for @$s;

See scripts/traverse.file.pl.

=head1 Description

C<HTML::Parser::Simple::Reporter> is a pure Perl module.

It is a sub-class of L<HTML::Parser::Simple>.

Specifically, this module overrides the method L<HTML::Parser::Simple/traverse($node)> to demonstrate
a different way of formatting the output.

It parses HTML 4 files and generates a tree of nodes, with one node per HTML tag.

The data associated with each node is documented in the L<HTML::Parser::Simple/FAQ>.

See also L<HTML::Parser::Simple> and L<HTML::Parser::Simple::Attributes>.

=head1 Distributions

This module is available as a Unix-style distro (*.tgz).

See http://savage.net.au/Perl-modules.html for details.

See http://savage.net.au/Perl-modules/html/installing-a-module.html for
help on unpacking and installing.

=head1 Constructor and initialization

new(...) returns an object of type C<HTML::Parser::Simple::Reporter>.

This is the class constructor.

Usage: C<< HTML::Parser::Simple::Reporter -> new() >>.

This method takes a hash of options, passed either as a list of key-value pairs or as a hashref.

Call C<new()> as C<< new(option_1 => value_1, option_2 => value_2, ...) >>.

Available options (each one of which is also a method):

=over 4

=item o None specific to this class

=back

But since this class is a sub-class of L<HTML::Parser::Simple>, it shares all the options to
C<< new() >> documented in that class: L<HTML::Parser::Simple/Constructor and initialization>.

=head1 Methods

This module is a sub-class of L<HTML::Parser::Simple>, and inherits all its methods.

Further, it overrides the L<HTML::Parser::Simple/traverse($node)> method.

=head2 traverse($node, $output, $depth)

Returns $output as an arrayref of strings.

Traverses the tree built by calling L<HTML::Parser::Simple/parse($html)>.

Parameters:

=over 4

=item o $node

The node at which to start the traversal. This is normally $self -> root.

=item o $output

The arrayref in which output is stored. It is normally used like this:

	my($arrayref) = [];

	$p -> traverse($p -> root, $arrayref);

	print "$_\n" for @$arrayref;

=item o $depth

The depth of $node within the tree. This is normally set to 0.

In C<< traverse() >> it is used to indent the output.

If not specified, it defaults to 0.

=back

Lastly, note that this method ignores the root of the tree, and hence ignores the DOCTYPE, which is stored
as an attribute of the root.
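
The indentation and content rules used by C<< traverse() >> can be sketched without the parser.
The following is a minimal, self-contained illustration, not this module's code: the C<node()> and
C<report()> subs and the plain hashrefs are hypothetical stand-ins for the tree nodes built by
C<< parse($html) >>, and attribute formatting is omitted. It assumes each node stores its text as a
list of fragments, as the I<content> key does in C<< traverse() >>.

	#!/usr/bin/env perl

	use strict;
	use warnings;

	# Hypothetical stand-in for a tree node: a plain hashref.

	sub node
	{
		my($name, $content, @child) = @_;

		return {name => $name, content => $content, child => [@child]};
	}

	# Depth-first walk which skips the root and indents by depth.

	sub report
	{
		my($node, $output, $depth) = @_;
		$depth ||= 0;

		if ($$node{name} ne 'root')
		{
			# Join the node's own text fragments; text inside children is not included.

			my($text) = join('', grep{defined} @{$$node{content} });
			$text     =~ s/^\s+|\s+$//g;
			my($line) = ('  ' x ($depth - 1) ) . "$$node{name}. Content:";
			$line     .= " $text" if (length $text);

			push @$output, $line;
		}

		report($_, $output, $depth + 1) for @{$$node{child} };

		return $output;
	}

	# Roughly: <html><body><p>Hello, <b>bold</b>world</p></body></html>.

	my($tree) = node('root', [],
		node('html', [],
			node('body', ["\n", "\n"],
				node('p', ['Hello, ', 'world'], node('b', ['bold']) ) ) ) );

	print "$_\n" for @{report($tree, [])};

This sketch prints:

	html. Content:
	  body. Content:
	    p. Content: Hello, world
	      b. Content: bold

The root produces no line, each level below the first real tag adds 2 spaces of indent, and the
content of C<p> excludes the text of its child C<b>.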

=head2 traverse_file($input_file_name)

Returns an arrayref of formatted text generated from the nodes in the tree built by calling
L<HTML::Parser::Simple/parse($html)>.

Traverses the given file, or the file named in C<< new(input_file => $name) >>, or the file named in
C<< input_file($name) >>.

Basically it does this (recalling that this class sub-classes L<HTML::Parser::Simple>):

	# Read file and store contents in $html.

	$self -> parse($html);

	my($output) = [];

	$self -> traverse($self -> root, $output, 0);

	return $output;

However, since this class has overridden the L<HTML::Parser::Simple/traverse($node)> method, the output is
not written anywhere. Instead, it is stored in an arrayref, which is returned as the result of this method.

Note: The parameter passed to C<< traverse_file($input_file_name) >> takes precedence over the
I<input_file> parameter passed to C<< new() >>, and over the internal value set with
C<< input_file($in_file_name) >>.

Lastly, the parameter passed to C<< traverse_file($input_file_name) >> replaces that internal value,
whether it was set with the I<input_file> parameter passed to C<< new() >>
or with a call to C<< input_file($in_file_name) >>.

See the L</Synopsis> for sample code. See also scripts/traverse.file.pl.
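
The precedence rule above can be checked with a short script. This is a sketch which assumes the
module is installed and that the parser accepts the tiny document used here. The temporary file and
the variable names are illustrative only, and C<data/s.1.html> is a placeholder which is never opened.

	#!/usr/bin/env perl

	use strict;
	use warnings;

	use File::Temp;
	use HTML::Parser::Simple::Reporter;

	# Write a small HTML file to a temporary location.

	my($fh) = File::Temp -> new(SUFFIX => '.html');

	print $fh '<html><body><p>Text</p></body></html>';
	close $fh;

	# The name given to new() is overridden by the parameter to traverse_file().

	my($p)    = HTML::Parser::Simple::Reporter -> new(input_file => 'data/s.1.html');
	my($line) = $p -> traverse_file($fh -> filename);

	print "$_\n" for @$line;
	print 'input_file is now: ', $p -> input_file, "\n";

If the rule holds, the last line printed names the temporary file rather than C<data/s.1.html>.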

=head1 FAQ

See L<HTML::Parser::Simple/FAQ>.

=head1 Author

C<HTML::Parser::Simple> was written by Ron Savage I<E<lt>ron@savage.net.auE<gt>> in 2009.

Home page: L<http://savage.net.au/index.html>.

=head1 Copyright

Australian copyright (c) 2009 Ron Savage.

	All Programs of mine are 'OSI Certified Open Source Software';
	you can redistribute them and/or modify them under the terms of
	The Artistic License, a copy of which is available at:
	http://www.opensource.org/licenses/index.html

=cut