NAME

HTML::RelExtor - Extract "rel" and "rev" information from LINK and A tags.

SYNOPSIS

  use HTML::RelExtor;

  my $parser = HTML::RelExtor->new();
  $parser->parse($html);

  for my $link ($parser->links) {
      print $link->href, "\n" if $link->has_rel('nofollow');
  }

  my($canonical) = grep $_->has_rev('canonical'), $parser->links;
  my $shorten_url;
  if ($canonical) {
      $shorten_url = $canonical->href;
  }

DESCRIPTION

HTML::RelExtor is an HTML parser module that extracts relationship information from A and LINK tags.

METHODS

new
  $parser = HTML::RelExtor->new();
  $parser = HTML::RelExtor->new(base => $base_uri);

Creates a new HTML::RelExtor object.

parse
  $parser->parse($html);

Parses HTML content. See HTML::Parser for other method signatures.

links
  my @links = $parser->links();
  my @links = $parser->links(rel => 'alternate');
  my @links = $parser->links(rev => 'canonical');

Returns a list of HTML::RelExtor::Link objects, one for each link that has a 'rel' or 'rev' attribute. When given a rel or rev parameter, returns only the links that have that rel or rev value.

  # These are equivalent
  @links = $parser->links(rel => 'alternate');
  @links = grep $_->has_rel('alternate'), $parser->links;
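The filtering described above can be sketched end to end as follows. The sample HTML and the example.com URLs are made up for illustration; only the new, parse, links, href and has_rel methods documented here are used.

```perl
use strict;
use warnings;
use HTML::RelExtor;

# Hypothetical sample document, invented for this sketch.
my $html = <<'HTML';
<html><head>
<link rel="alternate" type="application/rss+xml" href="http://example.com/feed.rss">
<link rel="stylesheet" href="http://example.com/style.css">
</head><body>
<a href="http://example.com/spam" rel="nofollow">untrusted</a>
</body></html>
HTML

my $parser = HTML::RelExtor->new();
$parser->parse($html);

# Ask the parser to filter by rel value ...
my @feeds = map { $_->href } $parser->links(rel => 'alternate');

# ... or filter the full list by hand, which the text above describes as equivalent.
my @same = map { $_->href } grep { $_->has_rel('alternate') } $parser->links;

print "$_\n" for @feeds;
```

Both @feeds and @same should hold only the feed URL, since the stylesheet and nofollow links carry other rel values.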

HTML::RelExtor::Link METHODS

href
  my $href = $link->href;

Returns the 'href' attribute of the link.

tag
  my $tag = $link->tag;

Returns the tag name of the link in lowercase, either 'a' or 'link'.

attr
  my $attr = $link->attr;

Returns a hash reference of attributes of the tag.

rel
  my @rel = $link->rel;

Returns a list of 'rel' attribute values. If a link is written as <a href="http://example.com/" rel="tag nofollow">blahblah</a>, the rel() method returns a list that contains tag and nofollow.
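As a minimal sketch of how a multi-valued rel attribute comes back as a list, the snippet below parses a single made-up A tag. The values are sorted before printing because this document does not state the order in which rel() returns them.

```perl
use strict;
use warnings;
use HTML::RelExtor;

# Hypothetical markup with two space-separated rel values.
my $html = '<a href="http://example.com/tags/perl" rel="tag nofollow">perl</a>';

my $parser = HTML::RelExtor->new();
$parser->parse($html);

my ($link) = $parser->links;

# Sort so that the output does not depend on the order rel() uses.
my @rel = sort $link->rel;
print join(', ', @rel), "\n";

# has_rel() checks for a single value without handling the list by hand.
print "this is a tag link\n" if $link->has_rel('tag');
```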

rev
  my @rev = $link->rev;

Returns a list of 'rev' attribute values.

has_rel
  if ($link->has_rel('nofollow')) { }

A handy shortcut method to find out whether a link has a specific relationship.

has_rev
  if ($link->has_rev('canonical')) { }

A handy shortcut method to find out whether a link has a specific reverse relationship.

text
  my $text = $link->text;

Returns the text inside the tag. This is only available for A tags; it returns undef when called on a LINK tag.
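The difference between A and LINK tags can be illustrated with a short sketch. The markup and URLs are hypothetical; the code relies only on the tag, href and text methods documented above, and guards against the undef that text returns for LINK tags.

```perl
use strict;
use warnings;
use HTML::RelExtor;

# Hypothetical document with one LINK tag and one A tag.
my $html = <<'HTML';
<link rev="canonical" href="http://example.com/s/1">
<a href="http://example.com/about" rel="me">About me</a>
HTML

my $parser = HTML::RelExtor->new();
$parser->parse($html);

for my $link ($parser->links) {
    my $text = $link->text;   # undef for LINK tags, per the description above
    printf "%-4s %s %s\n",
        $link->tag,
        $link->href,
        defined $text ? "[$text]" : '(no text)';
}
```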

EXAMPLES

Collect A links tagged with rel="friend" used in XFN (XHTML Friend Network).

  my $p = HTML::RelExtor->new();
  $p->parse($html);

  my @links = map { $_->href }
      grep { $_->tag eq 'a' && $_->has_rel('friend') } $p->links;

TODO

  • Accept callback parameter when creating a new instance.

AUTHOR

Tatsuhiko Miyagawa <miyagawa at bulknews.net>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

HTML::LinkExtor, HTML::Parser

http://www.w3.org/TR/REC-html40/struct/links.html

http://www.google.com/googleblog/2005/01/preventing-comment-spam.html

http://developers.technorati.com/wiki/RelTag

http://gmpg.org/xfn/11

http://shiflett.org/blog/2009/apr/save-the-internet-with-rev-canonical