WWW::Scraper::Sherlock - Scrapes search engines via Sherlock plugins.
    require WWW::Scraper;
    $search = new WWW::Scraper('Sherlock');
    $search->sherlockPlugin($pluginURI);
    # then proceed as with any normal WWW::Search module.
    $result = $search->next_result();

    # The result objects include additional methods specifically for Sherlock.
    $result->name();
    $result->url();
    $result->relevance();
    $result->price();
    $result->avail();
    $result->email();
    $result->detail();
    $result->banner();
    $result->browserResultType();

    # Attributes of the <SEARCH> and <BROWSER> blocks of the plugin
    # can be accessed via a hash in the object named 'sherlockSearchParam'.
    $search->{'sherlockSearchParam'}{'name'}              # name
    $search->{'sherlockSearchParam'}{'description'}       # description
    $search->{'sherlockSearchParam'}{'method'}            # method
    $search->{'sherlockSearchParam'}{'action'}            # action
    $search->{'sherlockSearchParam'}{'routeType'}         # routeType
    $search->{'sherlockSearchParam'}{'update'}            # update
    $search->{'sherlockSearchParam'}{'updateCheckDays'}   # updateCheckDays
Performs WWW::Scraper-style searches on search engines, given a Sherlock plugin to define the request and response (as defined in http://developer.apple.com/technotes/tn/tn1141.html and enhanced by http://www.mozilla.org/projects/search/technical.html).
The plugin is named by a URI, such as "file:yahoo.src" or "http://sherlock.mozdev.org/yahoo.src".
This version does not automatically update plugins; it ignores the 'update' and 'updateCheckDays' attributes of the <SEARCH> block.
Getchur plugins red-hot from http://sherlock.mozdev.org/source/browse/sherlock/www/.
Also ignored in this version are the <INTERPRET> attributes of 'skipLocal' (partially implemented), 'charset', 'resultEncoding', 'resultTranslationEncoding' and 'resultTranslation'.
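The attributes of a plugin's <SEARCH> block are what end up in the 'sherlockSearchParam' hash. The sketch below illustrates that relationship in plain core Perl. It is a simplified illustration, not WWW::Scraper::Sherlock's own parser. The plugin text, its search.example.com URLs, and the parse_search_attrs() helper are all hypothetical. The regex also assumes double-quoted attribute values with no '>' inside them, and it ignores the <BROWSER> block that the real module also reads.

```perl
use strict;
use warnings;

# Hypothetical, minimal Sherlock plugin text. The values are illustrative only;
# a real plugin also carries <INTERPRET> and other elements.
my $plugin = <<'SRC';
<SEARCH
    name="Example"
    description="Example search"
    method="GET"
    action="http://search.example.com/search"
    routeType="internet"
    update="http://search.example.com/example.src"
    updateCheckDays="3"
>
<INPUT NAME="q" USER>
</SEARCH>
SRC

# Hypothetical helper: collect the name="value" pairs of the opening
# <SEARCH ...> tag into a hash, roughly the shape of 'sherlockSearchParam'.
# Simplified sketch: assumes double-quoted values containing no '>'.
sub parse_search_attrs {
    my ($text) = @_;
    my %attr;
    if ( $text =~ /<SEARCH\b([^>]*)>/i ) {
        my $body = $1;
        while ( $body =~ /(\w+)\s*=\s*"([^"]*)"/g ) {
            $attr{$1} = $2;
        }
    }
    return \%attr;
}

# Print each <SEARCH> attribute, sorted by name.
my $param = parse_search_attrs($plugin);
for my $key ( sort keys %$param ) {
    print "$key = $param->{$key}\n";
}
```

The sketch reads 'update' and 'updateCheckDays' like any other attribute. As noted above, this version of the module stores those two values but does not act on them.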
    $search->sherlockPlugin($pluginURI, { 'option' => $value });
You may supply any of the options available to WWW::Scraper objects (which are, in turn, WWW::Search objects). Options may also be passed to a new Sherlock object via the sherlockPlugin() method, just as they would be passed to WWW::Search's native_query(). New Sherlock options include
sherlockPlugin()
next_result()
noUpdate - boolean, do not fetch an updated plugin, even if that is called for by updateCheckDays.
This sample is a complete script that runs Sherlock against Yahoo.com with the query "Greeting Cards" and prints the harvested fields of each result to STDOUT. Note that new WWW::Scraper('Sherlock') loads WWW::Scraper::Sherlock, so you don't have to.
    use WWW::Scraper;

    my $scraper = new WWW::Scraper('Sherlock');
    $scraper->sherlockPlugin('http://sherlock.mozdev.org/yahoo.src'); # or 'file:Sherlock/yahoo.src';
    $scraper->native_query('Greeting Cards', {'search_debug' => 1});

    while ( my $result = $scraper->next_result() ) {
        print "NAME: '".$result->name()."'\n";
        print "URL: '".$result->url()."'\n";
        print "RELEVANCE: '".$result->relevance()."'\n";
        print "PRICE: '".$result->price()."'\n";
        print "AVAIL: '".$result->avail()."'\n";
        print "EMAIL: '".$result->email()."'\n";
        print "DETAIL: '".$result->detail()."'\n";
    }
http://www.apple.com/sherlock/plugindev.html
http://developer.apple.com/technotes/tn/tn1141.html
http://www.mozilla.org/projects/search/technical.html
http://sherlock.mozdev.org/source/browse/sherlock/www/
WWW::Scraper::Sherlock is written and maintained by Glenn Wood, glenwood@alumni.caltech.com.
WWW::Scraper::Sherlock
Copyright (c) 2001 Glenn Wood. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.