
Module Version: 0.06

NAME

WWW::Blog::Identify - Identify blogging tools based on URL and content

SYNOPSIS

  use WWW::Blog::Identify "identify";
  
  my $flavor = identify( $url, $html );

FUNCTIONS

identify URL, HTML

Attempts to identify the blogging tool behind a page by examining its URL and content. Returns undef if all tests fail; otherwise returns a guess at the blog 'flavor'.

DESCRIPTION

This is a heuristic module for identifying weblogs based on their URL and content. It is a compilation of identifying patterns observed in the wild for a variety of blogging tools and providers worldwide. The README lists the tools and hosts it covers. Please email the author if you would like a blogging engine added to the detector.

The module first checks the URL for common blog hosts (BlogSpot, Userland, Persianblog, etc.) and returns immediately if it can find a match. Failing that, it will look through the blog HTML for distinctive markers (such as "powered by" images) or META generator tags. As a last resort, it will test to see if the page contains an RSS feed, or has the word 'blog' in it repeated at least five times.
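The order of checks described above can be sketched roughly as follows. This is an illustration only, not the module's actual code: the function name guess_flavor, the pattern tables, and the 'unknown-blog' return value are all hypothetical, and the real module's patterns and return values may differ.

```perl
use strict;
use warnings;

# Hypothetical pattern tables; the real lists live in the module source.
my @host_patterns = (
    [ qr/\.blogspot\.com\b/i,    'blogspot' ],
    [ qr/\.persianblog\.com\b/i, 'persianblog' ],
);
my @content_patterns = (
    [ qr/powered\s+by\s+movable\s+type/i, 'movabletype' ],
);

# Hypothetical stand-in for identify(): returns a flavor guess,
# or undef (in scalar context) when every test fails.
sub guess_flavor {
    my ( $url, $html ) = @_;
    $url  = '' unless defined $url;
    $html = '' unless defined $html;

    # 1. Known blog hosts in the URL: return on the first match.
    for my $rule (@host_patterns) {
        my ( $re, $flavor ) = @$rule;
        return $flavor if $url =~ $re;
    }

    # 2. Distinctive markers in the HTML, such as "powered by" text.
    for my $rule (@content_patterns) {
        my ( $re, $flavor ) = @$rule;
        return $flavor if $html =~ $re;
    }

    # 3. A META generator tag (simplified: name before content only).
    if ( $html =~ /<meta\s+name=["']generator["']\s+content=["']([^"']+)["']/i ) {
        return $1;
    }

    # 4. Last resort: an RSS feed link, or 'blog' at least five times.
    return 'unknown-blog' if $html =~ m{application/rss\+xml}i;
    my $count = () = $html =~ /blog/gi;
    return 'unknown-blog' if $count >= 5;

    return;    # all tests failed
}
```

Because the URL test runs first, a page on a recognized host is classified without its HTML being inspected at all.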

The philosophy of this module is to favor false negatives over false positives. If you are a blog tool author, you can vastly improve the detection rate simply by using a generator tag in your default template, like this:

<meta name="generator" content="myBlogTool 0.01" />
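To show why such a tag makes detection easy, here is a minimal sketch of pulling the generator string out of a page with regular expressions. The function name generator_of is hypothetical and not part of this module's interface; a real HTML parser would be more robust than this pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the content of the first
# <meta name="generator"> tag, or undef if there is none.
sub generator_of {
    my ($html) = @_;
    return unless defined $html;

    # Walk over every <meta ...> tag and inspect its attributes,
    # so that name and content may appear in either order.
    while ( $html =~ /<meta\b([^>]*)>/gi ) {
        my $attrs = $1;
        next unless $attrs =~ /\bname\s*=\s*(["'])generator\1/i;
        return $2 if $attrs =~ /\bcontent\s*=\s*(["'])(.*?)\1/i;
    }
    return;
}

my $tag = '<meta name="generator" content="myBlogTool 0.01" />';
print generator_of($tag), "\n";
```

With the example tag above, the helper yields the string myBlogTool 0.01, which a detector can then map to a tool name.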

This module is in active use on a large blog index, so I'll try to keep it reasonably up to date.

EXPORT

None by default. You can import 'identify' into your namespace on request.

AUTHOR

Maciej Ceglowski, <developer@ceglowski.com>

COPYRIGHT

(c) 2003 Maciej Ceglowski

This module is distributed under the same license as Perl itself.
