WWW::Wikevent::Bot
    use WWW::Wikevent::Bot;
    use HTML::TreeBuilder;
    use utf8;

    my $bot = WWW::Wikevent::Bot->new();
    $bot->name( 'HideoutBot' );
    $bot->url( 'http://www.hideoutchicago.com/schedule.html' );
    $bot->sample( 'sample.html' );
    $bot->encoding( 'utf8' );
    $bot->parser( sub {
        my ( $bot, $html ) = @_;

        # Use HTML::TreeBuilder and HTML::Element, or if you prefer
        # HTML::TokeParser, to parse the HTML down to whatever elements
        # contain events, then ...

        foreach my $container ( @event_containers ) {
            my $event = $bot->add_event();
            # build up the event using methods of WWW::Wikevent::Event
        }

        # Figure out the next page to scrape (not needed if you are
        # parsing by month) and set
        $bot->url( $next_page_to_scrape );
    });
    $bot->scrape();
    $bot->upload();
WWW::Wikevent::Bot is a package which will help you write scraper scripts that gather events from venue and artist websites for inclusion in the free-content events compendium, Wikevent.
The module takes care of the tedium of interaction with the website, and leaves to you the fun work of writing the scraper subroutine for the venue or artist you are interested in.
my $bot = WWW::Wikevent::Bot->new();

Creates a new bot object.
$bot->name( $bot_name );
The name of your bot.
This setting will be used to control where your bot will submit information about itself and the list of events it scrapes on each run.
my @events = $bot->events()
or
my $event_ref = $bot->events()
The list of events which this bot has scraped (so far).
$bot->sample( 'somepage.html' );
A local file containing a sample page to scrape while you are building and debugging your parser subroutine.
$bot->charset( 'utf8' );
The charset of the target site/page.
Sometimes the charset is detected incorrectly, or even set incorrectly in venue and artist webpages. This lets you override.
$bot->encoding( 'utf8' );

An alias for charset, if you prefer.
$bot->url( 'http://venue.com/schedule.html' );
The next URL to scrape.
Initially you should set this to the first page which your scraper bot should look at. Afterwards, if there are more pages to scrape, you'll set it again in your parser subroutine.

If the site you're scraping has calendar pages with elements of the date in the URL, you can put Date::Format placeholders in your URL string, as in:

$bot->url( 'http://venue.com/calendar.html?year=%Y&month=%L' );

... and your bot will scrape the number of months given by the months accessor, starting from the current month, whatever that is. You can of course override this behaviour by specifying a new URL to parse in the parser subroutine, but then you'll have to do all of the date calculation yourself.
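To picture how such a placeholder URL expands, here is a self-contained sketch using the core POSIX module. The module itself uses Date::Format (whose %L placeholder is the month number); %m from strftime stands in for it here, and the template URL is the hypothetical one from the example above.

```perl
use strict;
use warnings;
use POSIX qw(strftime mktime);

# Hypothetical template; the module would expand this with Date::Format.
my $template = 'http://venue.com/calendar.html?year=%Y&month=%m';

# Expand the template for the current month and the two that follow.
my @urls;
for my $offset ( 0 .. 2 ) {
    my @t = localtime();
    $t[3] = 1;              # first of the month: avoids end-of-month rollover
    $t[4] += $offset;       # bump the month field; mktime normalizes it
    my $time = mktime( @t[ 0 .. 5 ] );
    push @urls, strftime( $template, localtime($time) );
}
print "$_\n", for @urls;
```

Each pass through the loop yields one month's URL, which is the kind of value the bot would assign to url before fetching the next calendar page.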
$bot->months( $int ); my $int = $bot->months();
The number of months to scrape if url is a Date::Format specification. Defaults to 3.
my $dir = $bot->user_dir( $dir );
The directory to which your events will be dumped.
Normally this is set as a side-effect of setting the name accessor, however it can be optionally set to something else after setting name.
my $page = $bot->user_page( $page );
The page on which information about your bot is to be found.
my $page = $bot->shows_page( $page );
The page to which events scraped by your bot will be uploaded.
my $e = $bot->add_event();
Create a new event and return it.
This is a convenience method which creates a new event, adds it to the events list (see above), and returns a reference which you may manipulate as necessary.
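A parser subroutine built around add_event might look roughly like the following. This is a self-contained illustration only: the HTML sample, CSS class names, and the hash references are all invented here, and a real bot would parse with HTML::TreeBuilder or HTML::TokeParser and call $bot->add_event() to get a WWW::Wikevent::Event object rather than pushing plain hashes.

```perl
use strict;
use warnings;

# A tiny made-up schedule page; real pages need a real HTML parser.
my $html = <<'HTML';
<div class="event"><span class="date">2007-06-01</span>
<span class="act">Some Band</span></div>
<div class="event"><span class="date">2007-06-02</span>
<span class="act">Another Band</span></div>
HTML

my @events;
while ( $html =~ m{<div \s+ class="event"> \s*
                   <span \s+ class="date">([^<]+)</span> \s*
                   <span \s+ class="act">([^<]+)</span>}gx ) {
    # A real parser would do: my $event = $bot->add_event();
    # and then fill it in with the event object's accessors.
    push @events, { date => $1, name => $2 };
}

printf "%s: %s\n", $_->{date}, $_->{name} for @events;
```

The shape is the same as in the synopsis: loop over whatever containers hold events, create one event per container, and fill it in.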
my @events = $bot->parse( $html );
my $events_ref = $bot->parse( $html );
Run the user-supplied parser subroutine against the argument HTML and return any events found. This is used internally by scrape.
$bot->check_allowed();
Check the user page of this bot to see if it is currently allowed to run. This will be indicated by the text:
run = true
at the top of the page. If that text is present return true; otherwise die with an error. This method is called internally by upload, so you don't have to call it, but you do have to make sure that the above text appears on the bot's user page.
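The check itself amounts to looking for that flag in the fetched page text. Here is a sketch of that logic, not the module's actual implementation; the real method fetches the bot's user page from the wiki first, and allowed_to_run is an invented helper name.

```perl
use strict;
use warnings;

# Sketch of the kill-switch check: true only if a "run = true" line
# appears in the page text.
sub allowed_to_run {
    my ($page_text) = @_;
    return $page_text =~ /^\s*run\s*=\s*true\s*$/m;
}

die "HideoutBot is not allowed to run\n"
    unless allowed_to_run("run = true\n\nThis bot scrapes the Hideout.\n");
print "allowed\n";
```

This gives wiki editors a simple kill switch: changing or removing that line on the bot's user page stops the next upload.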
$bot->scrape_sample();
Runs the parser against the supplied sample HTML page.
$bot->scrape();
Starts scraping at the supplied url and continues as long as url changes.
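The control flow can be pictured like this. Everything below is a runnable stand-in: the %next_page table fakes what the parser would do (set a new url, or leave it alone to stop), and the URLs are invented.

```perl
use strict;
use warnings;

# Fake "parser results": each page maps to the url the parser would
# set after scraping it.  The second page leaves url unchanged.
my %next_page = (
    'http://venue.com/p1.html' => 'http://venue.com/p2.html',
    'http://venue.com/p2.html' => 'http://venue.com/p2.html',
);

my $url     = 'http://venue.com/p1.html';
my $scraped = 0;
while (1) {
    $scraped++;                    # scrape_page($url) would run here
    my $next = $next_page{$url};   # the parser may set a new url ...
    last if $next eq $url;         # ... no change means we are done
    $url = $next;
}
print "scraped $scraped page(s)\n";
```

In a real bot the parser subroutine plays the role of %next_page: calling $bot->url(...) with a fresh URL keeps the loop going, and leaving url alone ends the run.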
$bot->scrape_page( $url );
Scrapes a single page of HTML found at the given URL. This method is called internally by scrape.
$bot->dump();
Dumps the contents of events as text to standard out.
$bot->remember( $event );
Records an md5sum of the given event, so that it will not be repeated on later runs of dump_to_file.
$bot->load_remembered_events();
Loads in the md5sums of previously remembered events. This is called internally by new so it's unlikely that you will need to call it.
my $bool = $bot->is_new( $event );
Checks to see if the md5sum of an event is in our list of remembered events.
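Taken together, remember and is_new amount to keeping a set of digests. The following sketch uses the core Digest::MD5 module to show the idea; the real methods also persist the sums to the seen-file between runs, and the event string and helper subs here are invented for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my %seen;   # in the module this set is loaded from / saved to disk

sub remember { my ($event_text) = @_; $seen{ md5_hex($event_text) } = 1 }
sub is_new   { my ($event_text) = @_; return !$seen{ md5_hex($event_text) } }

my $event = "2007-06-01 Some Band at The Hideout";
print is_new($event) ? "new\n" : "seen\n";
remember($event);
print is_new($event) ? "new\n" : "seen\n";
```

Hashing the event text rather than storing it whole keeps the seen-file small while still catching exact repeats across runs.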
$bot->dump_to_file
Prints out the events in their final form to the appropriate .wiki file for upload to the bot's event page. This is called internally by upload but is also useful for the last stages of writing and debugging your bot.
$bot->upload();
This is the method which interacts with the Wikevent server. It first checks whether the bot is allowed to proceed, then writes out the bot's events, and then performs the upload.
Please submit bug reports to the CPAN bug tracker at http://rt.cpan.org/NoAuth/Bugs.html?Dist=www-wikevent-bot.
Discussion should take place on the Wiki, probably on the page http://wikevent.org/en/Wikevent:Perl library.
Mark Jaroski, original author and maintainer.
Copyright (c) 2004-2005 Mark Jaroski.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install WWW::Wikevent::Bot, copy and paste the appropriate command into your terminal.
cpanm
cpanm WWW::Wikevent::Bot
CPAN shell
perl -MCPAN -e shell
install WWW::Wikevent::Bot
For more information on module installation, please visit the detailed CPAN module installation guide.