NAME

HTML::Element - Class for objects that represent HTML elements

SYNOPSIS

 require HTML::Element;
 $a = new HTML::Element 'a', href => 'http://www.oslonett.no/';
 $a->pushContent("Oslonett AS");

 $tag = $a->tag;
 $tag = $a->starttag;
 $tag = $a->endtag;
 $ref = $a->attr('href');

 $links = $a->extractLinks();

 print $a->asHTML;

DESCRIPTION

Objects of the HTML::Element class represent HTML elements. Each object has attributes and content. The content is a sequence of text segments and other HTML::Element objects, so a tree of HTML::Element objects can represent the syntax tree of an HTML document.

The following methods are available:

new HTML::Element 'tag', 'attrname' => 'value',...

The object constructor. Takes a tag name as argument. Optionally allows you to specify initial attributes at object creation time.

->tag()

Returns the tag name for the element.

->starttag()

Returns the complete start tag for the element, including <> and attributes.

->endtag()

Returns the complete end tag.

->parent([$newparent])

Returns (optionally sets) the parent for this element.

->implicit([$bool])

Returns (optionally sets) the implicit attribute. This attribute is used to indicate that the element was not originally present in the source, but was inserted in order to conform to HTML structure.

->isInside('tag',...)

Returns true if this tag is contained inside one of the specified tags.

->pos()

Returns (and optionally sets) the current position.

->attr('attr', [$value])

Returns (and optionally sets) the value of some attribute.

->content()

Returns the content of this element. The content is represented as an array of text segments and references to other HTML::Element objects.
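The mixed content array can be walked by testing each item with ref: plain scalars are text segments, references are nested elements. The following is a minimal sketch of that idea only. It uses a hypothetical hash-based stand-in (the keys tag and content and the helper text_of are assumptions for illustration), not the module's own objects.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an element: a hash with a tag and a content array.
# Plain scalars are text segments; references are nested elements.
my $para = {
    tag     => 'p',
    content => [ 'Hello, ', { tag => 'b', content => ['world'] }, '!' ],
};

# Collect the text of a node, descending into nested elements.
sub text_of {
    my ($node) = @_;
    return $node unless ref $node;      # a text segment
    return join '', map { text_of($_) } @{ $node->{content} };
}

print text_of($para), "\n";   # prints "Hello, world!"
```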

->isEmpty()

Returns true if there is no content.

->insertElement($element, $implicit)

Inserts a new element at the current position and sets pos.

->pushContent($element)

Adds to the content of the element. The content should be a text segment (scalar) or a reference to an HTML::Element object.

->deleteContent()

Clears the content.

->delete()

Frees memory associated with the element and all its children. This is needed because Perl's reference counting cannot reclaim the tree on its own: the elements refer to each other through circular references.
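Why an explicit delete is needed can be seen in a small, self-contained sketch. The Node package below is a hypothetical stand-in, not HTML::Element itself; it only assumes that a parent refers to its children and each child refers back to its parent, as described above.

```perl
use strict;
use warnings;

package Node;    # hypothetical stand-in class, for illustration only

our $Destroyed = 0;    # counts how many nodes Perl has actually freed

sub new {
    my ($class, $tag) = @_;
    return bless { tag => $tag, parent => undef, content => [] }, $class;
}

sub add {
    my ($self, $child) = @_;
    $child->{parent} = $self;               # child refers to parent
    push @{ $self->{content} }, $child;     # parent refers to child: a cycle
    return $self;
}

sub delete {
    my ($self) = @_;
    for my $item (@{ $self->{content} }) {
        $item->delete if ref $item;         # break the cycles below us first
    }
    %$self = ();                            # drop every reference we hold
    return;
}

sub DESTROY { $Destroyed++ }

package main;

{
    my $p = Node->new('p');
    $p->add(Node->new('b'));
}   # $p goes out of scope, but the cycle keeps both nodes alive
print "freed without delete: $Node::Destroyed\n";   # prints 0

{
    my $p = Node->new('p');
    $p->add(Node->new('b'));
    $p->delete;
}   # cycle was broken, so both nodes are reclaimed
print "freed with delete: $Node::Destroyed\n";      # prints 2
```

The first tree is only released when the program exits, which is the kind of leak that calling delete on the root avoids.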

->traverse(\&callback, [$ignoretext])

Traverses the element and all its children. For each node visited, the callback routine is called with the node, a start flag and the depth as arguments. The start flag is 1 when we enter a node and 0 when we leave it. If the $ignoretext parameter is true, the callback is not called for text content.

If the return value from the callback is false then we will not traverse the children.
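The callback contract described above can be sketched in plain Perl. This is not the module's implementation: traverse_sketch and the hash-based nodes are hypothetical stand-ins, and the choice to make no "leave" call for an element whose children were skipped is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in nodes: { tag => ..., content => [...] }.
sub traverse_sketch {
    my ($node, $callback, $ignoretext, $depth) = @_;
    $depth ||= 0;
    if (ref $node) {
        # Enter the element (flag 1); a false return value skips its children.
        if ($callback->($node, 1, $depth)) {
            traverse_sketch($_, $callback, $ignoretext, $depth + 1)
                for @{ $node->{content} };
            # Leave the element (flag 0). Assumption: no leave call is
            # made for an element whose children were skipped.
            $callback->($node, 0, $depth);
        }
    }
    elsif (!$ignoretext) {
        $callback->($node, 1, $depth);    # text segment
    }
    return;
}

my $tree = {
    tag     => 'body',
    content => [
        { tag => 'h1',  content => ['Title'] },
        { tag => 'pre', content => [ { tag => 'b', content => ['skipped'] } ] },
    ],
};

my @log;
traverse_sketch($tree, sub {
    my ($node, $start, $depth) = @_;
    push @log, ('  ' x $depth) . ($start ? '<' : '</') . "$node->{tag}>";
    return $node->{tag} ne 'pre';    # false for <pre>: do not descend
}, 1);                               # true: do not call back for text

print "$_\n" for @log;
```

With these stand-ins the log shows body and h1 being entered and left, while pre is entered but its b child is never visited.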

->extractLinks([@wantedTypes])

Returns links found by traversing the element and all its children. The return value is a reference to an array. Each element of that array is itself a reference to an array with two values: the link value and a reference to the corresponding element.

You can restrict the extraction to certain types of links. For instance, if you only want to extract <a href="..."> and <img src="..."> links, you might code it like this:

  for (@{ $e->extractLinks(qw(a img)) }) {
      ($link, $linkelem) = @$_;
      ...
  }

->dump()

Prints the element and all its children to STDOUT. Mainly useful for debugging.

->asHTML()

Returns a string (the HTML document) that represents the element and its children.

COPYRIGHT

Copyright (c) 1995 Gisle Aas. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Gisle Aas <aas@oslonett.no>