package HTML::Parser::Simple::Reporter;

use strict;
use warnings;

use Carp;

use HTML::Parser::Simple::Attributes;

use Moo;

extends 'HTML::Parser::Simple';

our $VERSION = '2.02';

# -----------------------------------------------

sub traverse
{
	my($self, $node, $output, $depth) = @_;
	$depth        ||= 0;
	my(@child)    = $node -> getAllChildren;
	my($metadata) = $node -> getNodeValue;
	my($content)  = $$metadata{content};
	my($name)     = $$metadata{name};

	# We ignore the root, which means we ignore the DOCTYPE.

	if ($name ne 'root')
	{
		my($s) = ('  ' x ($depth - 1) ) . "$name. Attributes: ";
		my($p) = HTML::Parser::Simple::Attributes -> new;
		my($a) = $p -> parse($$metadata{attributes});
		$s     .= $p -> hashref2string($a) . '. Content:';
		my($c) = '';

		# Content pieces sit before, between and after the children,
		# so there can be one more piece than there are children.

		for my $index (0 .. $#child + 1)
		{
			$c .= $index <= $#$content && defined($$content[$index]) ? $$content[$index] : '';
		}

		$c =~ s/^\s+//;
		$c =~ s/\s+$//;
		$s .= " $c" if (length $c);

		push @$output, $s;
	}

	for my $index (0 .. $#child)
	{
		$self -> traverse($child[$index], $output, $depth + 1);
	}

} # End of traverse.

# -----------------------------------------------

sub traverse_file
{
	my($self, $input_file_name) = @_;
	$input_file_name  ||= $self -> input_file;

	$self -> input_file($input_file_name);
	$self -> log("Reading $input_file_name");

	# Slurp the whole file into $html.

	open(my $fh, '<', $input_file_name) || Carp::croak "Can't open($input_file_name): $!";
	my($html);
	read($fh, $html, -s $fh);
	close $fh;

	Carp::croak "Can't read($input_file_name): $!" if (! defined $html);

	$self -> log('Parsing');

	$self -> parse($html);

	$self -> log('Traversing');

	my($output) = [];

	$self -> traverse($self -> root, $output, 0);

	return $output;

} # End of traverse_file.

# -----------------------------------------------

1;

=head1 NAME

HTML::Parser::Simple::Reporter - A sub-class of HTML::Parser::Simple

=head1 Synopsis

	#!/usr/bin/env perl

	use strict;
	use warnings;

	use HTML::Parser::Simple::Reporter;

	# -------------------------

	# Method 1:

	my($p) = HTML::Parser::Simple::Reporter -> new(input_file => 'data/s.1.html');
	my($s) = $p -> traverse_file;

	print "$_\n" for @$s;

	# Method 2:

	$p = HTML::Parser::Simple::Reporter -> new;
	$s = $p -> traverse_file('data/s.1.html');

	print "$_\n" for @$s;

See scripts/

=head1 Description

C<HTML::Parser::Simple::Reporter> is a pure Perl module.

It is a sub-class of L<HTML::Parser::Simple>.

Specifically, this module overrides the method L<HTML::Parser::Simple/traverse($node)>, to demonstrate
a different way of formatting the output.

It parses HTML V 4 files, and generates a tree of nodes, with 1 node per HTML tag.

The data associated with each node is documented in the L<HTML::Parser::Simple/FAQ>.

See also L<HTML::Parser::Simple> and L<HTML::Parser::Simple::Attributes>.
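
Judging by the C<traverse()> code in this file, each node's value is a hashref with at least the keys
I<name>, I<attributes> and I<content>, where I<content> is an arrayref of text pieces lying before, between
and after the node's children. The following is a minimal sketch of how those pieces are joined and trimmed.
The hashref is hand-built for illustration, and the helper name C<collect_content()> is hypothetical, not
part of this distro.

```perl
use strict;
use warnings;

# Hypothetical node value, hand-built for illustration. A real one comes
# from $node -> getNodeValue after parsing.
my $metadata =
{
	name       => 'p',
	attributes => ' class="intro"',
	content    => ['  Hello ', undef, 'world  '],
};
my $child_count = 2; # Assumed number of child nodes.

# Join the defined content pieces. There is one more slot than children.
sub collect_content
{
	my($metadata, $child_count) = @_;
	my $content = $$metadata{content};
	my $text    = '';

	for my $index (0 .. $child_count)
	{
		$text .= $$content[$index] if ($index <= $#$content && defined $$content[$index]);
	}

	# Trim leading and trailing whitespace, as traverse() does.
	$text =~ s/^\s+//;
	$text =~ s/\s+$//;

	return $text;
}

print collect_content($metadata, $child_count), "\n";
```

Undefined slots are skipped, so a node whose children are not separated by text still yields a clean string.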

=head1 Distributions

This module is available as a Unix-style distro (*.tgz).

See for details.

See for
help on unpacking and installing.

=head1 Constructor and initialization

new(...) returns an object of type C<HTML::Parser::Simple::Reporter>.

This is the class constructor.

Usage: C<< HTML::Parser::Simple::Reporter -> new() >>.

This method takes a hashref of options.

Call C<new()> as C<< new({option_1 => value_1, option_2 => value_2, ...}) >>.

Available options (each one of which is also a method):

=over 4

=item o None specific to this class

=back

But since this class is a sub-class of L<HTML::Parser::Simple>, it shares all the options to
C<< new() >> documented in that class: L<HTML::Parser::Simple/Constructor and initialization>.

=head1 Methods

This module is a sub-class of L<HTML::Parser::Simple>, and inherits all its methods.

Further, it overrides the L<HTML::Parser::Simple/traverse($node)> method.

=head2 traverse($node, $output, $depth)

Appends one formatted string per node to $output, which is an arrayref of strings.

Traverses the tree built by calling L<HTML::Parser::Simple/parse($html)>.


=over 4

=item o $node

The node at which to start the traversal. This is normally $self -> root.

=item o $output

The arrayref in which output is stored. It is normally used like this:

	my($arrayref) = [];

	$p -> traverse($p -> root, $arrayref);

	print "$_\n" for @$arrayref;

=item o $depth

The depth of $node within the tree. This is normally set to 0.

In C<< traverse() >> it is used to indent the output.

If not specified, it defaults to 0.

=back

Lastly note that this method ignores the root of the tree, and hence ignores the DOCTYPE which is stored
as an attribute of the root.
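
The shape of this traversal can be sketched without the parser. The example below is only an illustration:
it uses plain nested hashrefs with hypothetical I<name> and I<children> keys in place of real tree nodes,
and a hypothetical function C<walk()> in place of C<traverse()>. It shows the same three ideas: a
depth-first walk, output collected in a caller-supplied arrayref, and a root which is skipped so that its
children are printed with no indent.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed tree: nested hashrefs, not the
# tree objects which the real module accesses via getNodeValue and
# getAllChildren.
my $root =
{
	name     => 'root',
	children =>
	[
		{
			name     => 'html',
			children =>
			[
				{name => 'head', children => []},
				{name => 'body', children => [{name => 'p', children => []}]},
			],
		},
	],
};

sub walk
{
	my($node, $output, $depth) = @_;
	$depth ||= 0;

	# Skip the root, so its children start with no indent.
	if ($$node{name} ne 'root')
	{
		push @$output, ('  ' x ($depth - 1) ) . $$node{name};
	}

	walk($_, $output, $depth + 1) for @{$$node{children} };
}

my $output = [];

walk($root, $output);

print "$_\n" for @$output;
```

Passing the arrayref down the recursion, rather than returning and merging lists, keeps each call cheap and leaves the caller in control of where the output goes.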

=head2 traverse_file($input_file_name)

Returns an arrayref of formatted text generated from the nodes in the tree built by parsing the file's contents.

Traverses the given file, or the file named in C<< new(input_file => $name) >>, or the file named in
C<< input_file($name) >>.

Basically it does this (recalling that this class sub-classes L<HTML::Parser::Simple>):

	# Read file and store contents in $html.

	$self -> parse($html);

	my($output) = [];

	$self -> traverse($self -> root, $output, 0);

	return $output;

However, since this class has overridden the L<HTML::Parser::Simple/traverse($node)> method, the output is
not written anywhere, but rather is stored in an arrayref, and returned as the result of this method.

Note: The parameter passed in to C<< traverse_file($input_file_name) >>, takes precedence over the
I<input_file> parameter passed in to C<< new() >>, and over the internal value set with
C<< input_file($in_file_name) >>.

Lastly, the parameter passed in to C<< traverse_file($input_file_name) >> is used to update
the internal value set with the I<input_file> parameter passed in to C<< new() >>,
or set with a call to C<< input_file($in_file_name) >>.
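
The precedence rule can be shown in isolation. This is a sketch only: the class C<Local::Sketch>, the
method C<resolve_file_name()> and the file names are all hypothetical, and the class uses a plain blessed
hashref rather than the accessors this module inherits. It mimics just the first few lines of
C<traverse_file()>, without reading any file.

```perl
use strict;
use warnings;

package Local::Sketch; # Hypothetical class, not part of this distro.

sub new
{
	my($class, %arg) = @_;

	return bless {input_file => $arg{input_file} || ''}, $class;
}

# Combined getter/setter, mimicking the input_file() accessor.
sub input_file
{
	my($self, $name) = @_;
	$$self{input_file} = $name if (defined $name);

	return $$self{input_file};
}

# Resolve the file name the way traverse_file() does.
sub resolve_file_name
{
	my($self, $input_file_name) = @_;
	$input_file_name ||= $self -> input_file;

	# Remember the name actually used.
	$self -> input_file($input_file_name);

	return $input_file_name;
}

package main;

my $p = Local::Sketch -> new(input_file => 'data/a.html');

print $p -> resolve_file_name, "\n";                # No parameter: the stored name is used.
print $p -> resolve_file_name('data/b.html'), "\n"; # The parameter takes precedence.
print $p -> input_file, "\n";                       # The stored name has been updated.
```

Because the stored name is updated, a later call with no parameter re-uses the most recent file name.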

See the L</Synopsis> for sample code. See also scripts/

=head1 FAQ

See L<HTML::Parser::Simple/FAQ>.

=head1 Author

C<HTML::Parser::Simple> was written by Ron Savage in 2009.


=head1 Copyright

Australian copyright (c) 2009 Ron Savage.

	All Programs of mine are 'OSI Certified Open Source Software';
	you can redistribute them and/or modify them under the terms of
	The Artistic License, a copy of which is available at:
