Module Version: 0.00004

NAME

HTML::RobotsMETA - Parse HTML For Robots Exclusion META Markup

SYNOPSIS

  use HTML::RobotsMETA;
  my $p = HTML::RobotsMETA->new;
  my $r = $p->parse_rules($html);
  if ($r->can_follow) {
    # follow links here!
  } else {
    # can't follow...
  }

DESCRIPTION

HTML::RobotsMETA is a simple module built on HTML::Parser that extracts robots exclusion information from META tags. There's not much more to it ;)

DIRECTIVES

Currently HTML::RobotsMETA understands the following directives:

ALL
NONE
INDEX
NOINDEX
FOLLOW
NOFOLLOW
ARCHIVE
NOARCHIVE
SERVE
NOSERVE
NOIMAGEINDEX
NOIMAGECLICK
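To make the meaning of these directives concrete, here is a minimal sketch of how a robots META content string could map onto permission flags. The helper robots_flags and the flag names are hypothetical and are not part of HTML::RobotsMETA. The sketch assumes the usual convention that everything is permitted by default, that ALL is equivalent to INDEX,FOLLOW and NONE to NOINDEX,NOFOLLOW, that matching is case-insensitive, and that a later directive overrides an earlier one.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTML::RobotsMETA: it only illustrates
# how a robots META content string maps onto permission flags.
sub robots_flags {
    my ($content) = @_;

    # Everything is permitted until a directive says otherwise.
    my %flags = map { $_ => 1 } qw(index follow archive serve imageindex imageclick);

    for my $d (split /\s*,\s*/, lc($content // '')) {
        $d =~ s/^\s+|\s+$//g;    # trim stray whitespace around each directive
        if    ($d eq 'all')  { $flags{$_} = 1 for qw(index follow) }
        elsif ($d eq 'none') { $flags{$_} = 0 for qw(index follow) }
        elsif ($d =~ /^no(\w+)$/ && exists $flags{$1}) { $flags{$1} = 0 }  # NOxxx clears xxx
        elsif (exists $flags{$d}) { $flags{$d} = 1 }                        # xxx sets xxx
        # Unknown directives are silently ignored.
    }
    return \%flags;
}

my $f = robots_flags('NOINDEX, follow');
printf "index=%d follow=%d\n", $f->{index}, $f->{follow};   # prints "index=0 follow=1"
```

Letting later directives win keeps the logic to a single pass; a real parser might instead treat conflicting directives as the more restrictive of the two.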

METHODS

new

Creates a new HTML::RobotsMETA parser. Takes no arguments.

parse_rules

Parses an HTML string for robots META tags and returns an HTML::RobotsMETA::Rules object, which you can query later in conditionals (for example, with can_follow).
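A slightly fuller version of the SYNOPSIS, as a sketch: it parses a small, made-up HTML document whose robots META tag carries NOINDEX,NOFOLLOW and checks the result. Only new, parse_rules and can_follow are taken from this documentation; the sample HTML is an assumption, and the other predicates on the rules object are documented in HTML::RobotsMETA::Rules.

```perl
use strict;
use warnings;
use HTML::RobotsMETA;

# Made-up sample page: the robots META tag forbids indexing and following.
my $html = <<'HTML';
<html>
  <head>
    <meta name="robots" content="noindex,nofollow">
  </head>
  <body><a href="/next">next</a></body>
</html>
HTML

my $p     = HTML::RobotsMETA->new;
my $rules = $p->parse_rules($html);   # an HTML::RobotsMETA::Rules object

if ($rules->can_follow) {
    print "may follow links\n";
} else {
    print "must not follow links\n";  # expected here, since NOFOLLOW is set
}
```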

parser

Returns the HTML::Parser instance to use.

get_parser_callbacks

Returns callback specs to be used in the HTML::Parser constructor.
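The exact structure returned by get_parser_callbacks is not spelled out here, so the following is only a sketch of the general HTML::Parser pattern such callback specs plug into: a start_h handler paired with an argspec ("tagname, attr"). The handler and the @robots_content variable are hypothetical stand-ins and are not HTML::RobotsMETA's own callbacks.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical stand-in: collects the content of every <meta name="robots">.
# This is NOT HTML::RobotsMETA's own callback; it only shows the
# HTML::Parser callback-spec shape (handler + argspec).
my @robots_content;

my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [
        sub {
            my ($tagname, $attr) = @_;
            return unless $tagname eq 'meta';
            # HTML::Parser lowercases tag and attribute names by default,
            # but attribute values keep their original case.
            return unless lc($attr->{name} // '') eq 'robots';
            push @robots_content, $attr->{content} if defined $attr->{content};
        },
        'tagname, attr',
    ],
);

$p->parse('<html><head><meta name="ROBOTS" content="noindex,follow"></head></html>');
$p->eof;    # flush any buffered text

print "$_\n" for @robots_content;   # prints "noindex,follow"
```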

TODO

Tags that specify the crawler name (e.g. <META NAME="Googlebot">) are not handled yet.

There may also be more obscure directives that I'm not aware of.

AUTHOR

Copyright (c) 2007 Daisuke Maki <daisuke@endeworks.jp>

SEE ALSO

HTML::RobotsMETA::Rules, HTML::Parser

LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html
