Module Version: 0.01

NAME

WWW::Bookmark::Crawler - Personal bookmark search engine

SYNOPSIS

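  use WWW::Bookmark::Crawler;

  # my_tokenizer is a user-supplied subroutine
  my $crawler = WWW::Bookmark::Crawler->new({
      SOURCE    => 'bookmarks.html',   # file to extract links from
      DBNAME    => 'mybookmark.db',
      PEEK      => 1,                  # debugging output on
      TOKENIZER => \&my_tokenizer,
  });

  $crawler->peek();      # debugging output on (same effect as PEEK => 1)
  $crawler->crawl();     # fetch the pages and build the index file

  $crawler->nopeek();    # debugging output off

  my @results = $crawler->query('Ars longa');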

DESCRIPTION

WWW::Bookmark::Crawler is a web spider and search engine for personal bookmarks. It first extracts the links from either a browser-generated bookmark file or a plain HTML file, then retrieves each page's content online and builds an index file. You can use this module to build a personal bookmark search engine.
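The link-extraction step described above can be pictured with a minimal sketch. It is not the module's own code: extract_links is a hypothetical helper, the regular expression only handles double-quoted href attributes, and a real crawler would more likely use a proper HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Bookmark::Crawler): pull
# URL/title pairs out of bookmark-style HTML with a simple regex.
sub extract_links {
    my ($html) = @_;
    my @links;
    # Case-insensitive, so both <a href=...> and <A HREF=...> match.
    while ($html =~ m{<a\s[^>]*?href\s*=\s*"([^"]+)"[^>]*>(.*?)</a>}gis) {
        push @links, { url => $1, title => $2 };
    }
    return @links;
}

# A small sample in the style of a browser-exported bookmark file.
my $html = <<'HTML';
<DL><p>
  <DT><A HREF="http://www.perl.org/" ADD_DATE="1041379200">Perl</A>
  <DT><A HREF="http://www.cpan.org/">CPAN</A>
</DL>
HTML

for my $link (extract_links($html)) {
    print "$link->{title} => $link->{url}\n";
}
```

Each extracted URL would then be fetched and its text added to the index.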

METHODS

new

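Parameters, as used in the SYNOPSIS and by the methods below:

SOURCE - the bookmark or HTML file to extract links from

DBNAME - the database file name

PEEK - turns on the debugging output (see peek)

TOKENIZER - a reference to a tokenizer subroutine

PROXY - the proxy server (see proxy)

TIMEOUT - the timeout value (see timeout)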
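The SYNOPSIS passes a code reference as TOKENIZER, but the calling convention the module expects is not documented here. The sketch below therefore rests on an assumption: a tokenizer takes one string and returns a list of normalized terms. The stopword list is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical tokenizer. The argument and return conventions are
# assumptions: one string in, a list of normalized terms out.
my %stopword = map { $_ => 1 } qw(a an and of the);

sub my_tokenizer {
    my ($text) = @_;
    # Lowercase, then split on anything that is not a letter or digit.
    my @terms = grep { $_ ne '' } split /[^\p{L}\p{N}]+/, lc $text;
    # Drop very common words that carry little meaning in a query.
    return grep { !$stopword{$_} } @terms;
}

print join(' ', my_tokenizer('The Art of Perl, and More')), "\n";
# prints "art perl more"
```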

crawl

Starts fetching the pages and building the index file.

query

Returns an array of hashes, each holding the URL and title of a page related to the given terms. The default tokenizer treats a space as intersection, so a query of several words matches only pages that contain all of them. The first time query is called in a script, it builds an in-memory inverted file from the index file.

No advanced information retrieval (IR) techniques are used.
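An inverted file with intersection queries, as described for query, can be sketched roughly as follows. This is an illustration of the general technique rather than the module's implementation; tokenize, build_index and search are hypothetical names, and the index here maps each term to the set of URLs containing it.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: lowercase and split on non-word characters.
sub tokenize {
    my ($text) = @_;
    return grep { $_ ne '' } split /\W+/, lc $text;
}

# Build an inverted file: term => { url => 1, ... }.
sub build_index {
    my (%docs) = @_;
    my %index;
    while (my ($url, $text) = each %docs) {
        $index{$_}{$url} = 1 for tokenize($text);
    }
    return \%index;
}

# Intersect the URL sets of all query terms.
sub search {
    my ($index, $query) = @_;
    my @terms = tokenize($query);
    return () unless @terms;
    my $first = shift @terms;
    my %hits  = %{ $index->{$first} || {} };
    for my $term (@terms) {
        my $postings = $index->{$term} || {};
        # Remove every URL that lacks this term.
        delete @hits{ grep { !$postings->{$_} } keys %hits };
    }
    my @urls = sort keys %hits;
    return @urls;
}

my $index = build_index(
    'http://example.com/a' => 'Ars longa, vita brevis',
    'http://example.com/b' => 'Ars gratia artis',
);
print join("\n", search($index, 'Ars longa')), "\n";
# prints "http://example.com/a"
```

Both sample pages contain "ars", but only the first also contains "longa", so the intersection leaves a single URL.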

peek

Turns on the debugging output. Has the same effect as passing PEEK to new.

nopeek

Turns off the debugging output.

proxy

Sets the proxy server. Has the same effect as passing PROXY to new.

timeout

Sets the TIMEOUT value. Has the same effect as passing TIMEOUT to new.

AUTHOR

xern <xern@cpan.org>

LICENSE

Released under The Artistic License.
