NAME

WWW::Scraper::Sherlock - Scrapes search engines via Sherlock plugins.

SYNOPSIS

    require WWW::Scraper;
    $search = new WWW::Scraper('Sherlock');
    $search->sherlockPlugin($pluginURI);
    
    # Then proceed as with any normal WWW::Search module.
    $result = $search->next_result();
    
    # The result objects include additional methods specifically for Sherlock.
    $result->name();
    $result->url();
    $result->relevance();
    $result->price();
    $result->avail();
    $result->email();
    $result->detail();
    $result->banner();
    $result->browserResultType();    

    # Attributes of the <SEARCH> and <BROWSER> blocks of the plugin
    # can be accessed via a hash in the object named 'sherlockSearchParam'.
    $search->{'sherlockSearchParam'}{'name'};
    $search->{'sherlockSearchParam'}{'description'};
    $search->{'sherlockSearchParam'}{'method'};
    $search->{'sherlockSearchParam'}{'action'};
    $search->{'sherlockSearchParam'}{'routeType'};
    $search->{'sherlockSearchParam'}{'update'};
    $search->{'sherlockSearchParam'}{'updateCheckDays'};

DESCRIPTION

Performs WWW::Scraper-style searches on search engines, given a Sherlock plugin to define the request and response (as defined in http://developer.apple.com/technotes/tn/tn1141.html and enhanced by http://www.mozilla.org/projects/search/technical.html).

The plugin is named by a URI, such as "file:yahoo.src" or "http://sherlock.mozdev.org/yahoo.src".

This version does not automatically update plugins; it ignores the 'update' and 'updateCheckDays' attributes of the <SEARCH> block.
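Because the update attributes are not acted on, a script that cares about them has to inspect them itself. The following is a minimal sketch that prints the attributes from the 'sherlockSearchParam' hash shown in the SYNOPSIS. The helper describe_search_params() is hypothetical (it is not part of WWW::Scraper::Sherlock), and the sketch assumes the hash is populated once sherlockPlugin() has returned.

```perl
use strict;
use warnings;

# Attribute names listed in the SYNOPSIS for the 'sherlockSearchParam' hash.
my @SEARCH_ATTRS = qw(name description method action routeType update updateCheckDays);

# Hypothetical helper (not part of WWW::Scraper::Sherlock): format the
# attributes held in a 'sherlockSearchParam'-style hash reference,
# one line per attribute, marking attributes the plugin did not supply.
sub describe_search_params {
    my ($params) = @_;
    return map {
        my $value = defined $params->{$_} ? $params->{$_} : '(not set)';
        sprintf '%-16s %s', $_, $value;
    } @SEARCH_ATTRS;
}

# Live use: pass a plugin URI on the command line, e.g. file:yahoo.src
if ( my $plugin_uri = shift @ARGV ) {
    require WWW::Scraper;
    my $search = WWW::Scraper->new('Sherlock');
    $search->sherlockPlugin($plugin_uri);

    # Assumption: the hash is filled in by the time sherlockPlugin() returns.
    print "$_\n" for describe_search_params( $search->{'sherlockSearchParam'} );
}
```

Run without arguments, the script does nothing; given a plugin URI, it lists each attribute so that 'update' and 'updateCheckDays' can be checked by hand.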

Getchur plugins red-hot from http://sherlock.mozdev.org/source/browse/sherlock/www/.

This version also ignores the following <INTERPRET> attributes: 'skipLocal' (partially implemented), 'charset', 'resultEncoding', 'resultTranslationEncoding' and 'resultTranslation'.

OPTIONS

    $search->sherlockPlugin($pluginURI, { 'option' => $value });

You may supply any of the options available to WWW::Scraper objects (which are, in turn, WWW::Search objects). Options may also be passed to a new Sherlock object via the sherlockPlugin() method, just as they would be to WWW::Search's native_query(). Options specific to Sherlock include:

noUpdate - boolean, do not fetch an updated plugin, even if that is called for by updateCheckDays.
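A minimal sketch of passing noUpdate through sherlockPlugin(), following the call form shown above. The helper plugin_options() is hypothetical and only builds the options hash; whether noUpdate has any visible effect in this version is uncertain, since the DESCRIPTION states that plugins are not updated automatically.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Scraper::Sherlock): build the
# options hash reference for sherlockPlugin(), defaulting noUpdate to
# true. Caller-supplied options override the default.
sub plugin_options {
    my (%extra) = @_;
    return { 'noUpdate' => 1, %extra };
}

# Live use: pass a plugin URI on the command line, e.g. file:yahoo.src
if ( my $plugin_uri = shift @ARGV ) {
    require WWW::Scraper;
    my $search = WWW::Scraper->new('Sherlock');

    # Second argument is the options hash reference, as in the OPTIONS usage line.
    $search->sherlockPlugin( $plugin_uri, plugin_options() );
    print "Plugin loaded from $plugin_uri with noUpdate set\n";
}
```

Keeping the options in one helper makes it easy to add other WWW::Scraper options later without touching each call site.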

EXAMPLE

This sample is a complete script that runs Sherlock against Yahoo.com with the query "Greeting Cards" and prints the harvested fields to STDOUT. Note that new WWW::Scraper('Sherlock') loads WWW::Scraper::Sherlock, so you don't have to.

    use WWW::Scraper;
    
    my $scraper = new WWW::Scraper('Sherlock');
    $scraper->sherlockPlugin('http://sherlock.mozdev.org/yahoo.src'); # or 'file:Sherlock/yahoo.src';
   
    $scraper->native_query('Greeting Cards', {'search_debug' => 1});
   
    while ( my $result = $scraper->next_result() ) {
        print "NAME: '".$result->name()."'\n";
        print "URL: '".$result->url()."'\n";
        print "RELEVANCE: '".$result->relevance()."'\n";
        print "PRICE: '".$result->price()."'\n";
        print "AVAIL: '".$result->avail()."'\n";
        print "EMAIL: '".$result->email()."'\n";
        print "DETAIL: '".$result->detail()."'\n";
    }

SEE ALSO

Apple's Introduction to Sherlock plugin development

http://www.apple.com/sherlock/plugindev.html

Sherlock Specification Technote TN1141

http://developer.apple.com/technotes/tn/tn1141.html

Mozilla Enhancements

http://www.mozilla.org/projects/search/technical.html

Mozdev Plugins Library

http://sherlock.mozdev.org/source/browse/sherlock/www/

AUTHOR

WWW::Scraper::Sherlock is written and maintained by Glenn Wood, glenwood@alumni.caltech.com.

COPYRIGHT

Copyright (c) 2001 Glenn Wood. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.