Apache::Wyrd::Site::IndexBot - Sample 'bot for forcing index builds
Sample Implementation:
    package BASENAME::IndexBot;
    use strict;
    use base qw(Apache::Wyrd::Site::MySQLIndexBot BASENAME::Wyrd);
    use BASENAME::Index;

    sub params {
        my ($self) = @_;
        my $params = {
            basefile        => $self->dbl->req->document_root . '/var/indexbot',
            server_hostname => $self->dbl->req->server->server_hostname,
            document_root   => $self->dbl->req->document_root,
            fastindex       => $self->_flags->fastindex || 0,
            purge           => $self->_flags->purge || 0,
            realclean       => $self->_flags->realclean || 0,
        };
        return $params;
    }

    sub _work {
        my ($self) = @_;
        my $index = BASENAME::Index->new;
        $index->delete_index if ($self->{'purge'});
        $self->index_site($index);
    }
Sample Usage:
    <BASENAME::IndexBot refresh="20" expire="40" flags="reverse, purge">
      <BASENAME::Template name="meta">$:meta</BASENAME::Template>
      <H1>Rebuilding the Index</H1>
      <H2>$:status</H2>
      $:view
    </BASENAME::IndexBot>
The IndexBot is an Apache::Wyrd::Bot object that forces a site to be completely indexed and purges any deleted documents that remain in the index. It does so by reading the names of existing files from the document root down, purging index entries for files that are no longer found in that file tree, and generating HTTP requests for all the pages that are found.
As these pages are "Indexable Pages", they update their own index pages when loaded by the server in answer to the HTTP request.
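The pass described above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the module's actual code: the helper names files_under and plan_index_run are hypothetical, every file on disk is treated as a page (a real site would likely filter by file type), and the HTTP requests themselves are left out.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: return the site-relative paths of all plain files
# found from $root down.
sub files_under {
    my ($root) = @_;
    my @found;
    File::Find::find(
        {
            no_chdir => 1,
            wanted   => sub {
                return unless -f $File::Find::name;
                # Strip the document root to get a site-relative path.
                (my $rel = $File::Find::name) =~ s/^\Q$root\E//;
                push @found, $rel;
            },
        },
        $root
    );
    my @sorted = sort @found;
    return @sorted;
}

# Hypothetical helper: given the paths currently in the index and the paths
# now on disk, decide which index entries to purge and which pages to request.
sub plan_index_run {
    my ($indexed, $on_disk) = @_;
    my %exists  = map { $_ => 1 } @$on_disk;
    my @purge   = grep { !$exists{$_} } @$indexed;   # indexed, but gone from disk
    my @request = @$on_disk;                         # every existing page is requested
    return (\@purge, \@request);
}
```

A caller would pass the list from files_under as the second argument to plan_index_run, purge the first returned list from the index, and issue one HTTP request per entry in the second.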
It should be used in a webmaster-protected section of the site for two reasons: 1. providing public access to the indexing bot invites a denial-of-service attack, since indexing is very resource-intensive, and 2. the Apache::Wyrd::Site::IndexBot "borrows" the webmaster's authorization cookie in order to be granted full access to the site.
Per Apache::Wyrd::Bot.
Per Apache::Wyrd::Bot, but now required.
purge - Clear the entire index beforehand. After a first-time build or a major change to a site, this tends to speed up the process by eliminating the need to detect and purge stale data.
fastindex - Only purge missing documents and index documents that have changed or have been added since the last build.
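The incremental behaviour described above can be pictured as a comparison of two snapshots. The sketch below is an assumption for illustration only: the helper name incremental_plan is hypothetical, and it detects change by comparing file modification times, which may not be how the module itself decides that a document has changed.

```perl
use strict;
use warnings;

# Hypothetical helper. $last_seen maps path => mtime recorded at the previous
# build; $current maps path => mtime on disk now.
sub incremental_plan {
    my ($last_seen, $current) = @_;

    # Entries recorded earlier that no longer exist on disk are purged.
    my @purge = grep { !exists $current->{$_} } keys %$last_seen;

    # Documents that are new, or newer than the recorded mtime, are reindexed.
    my @reindex = grep {
        !exists $last_seen->{$_} || $current->{$_} > $last_seen->{$_}
    } keys %$current;

    @purge   = sort @purge;
    @reindex = sort @reindex;
    return (\@purge, \@reindex);
}
```

Unchanged documents appear in neither list, which is where the time saving over a full rebuild would come from.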
reverse - Per Apache::Wyrd::Bot. Show the bot output log in reverse, with the newest events at the top.
(format: (returns) name (arguments after self))
_work
Per Apache::Wyrd::Bot. Each site must provide the Bot with a _work method that obtains a reference to the site's index and passes that index as the argument to the index_site method.
index_site
Performs the indexing.
Other bugs/caveats per Apache::Wyrd::Bot. Also reserves the methods index_site and purge_missing.
Barry King <wyrd@nospam.wyrdwright.com>
General-purpose HTML-embeddable perl object
Server-launched, monitored processes.
Construct and track a page of an integrated site
Copyright 2002-2007 Wyrdwright, Inc. and licensed under the GNU GPL.
See LICENSE under the documentation for Apache::Wyrd.