Module Version: 0.01

NAME

Web::Scraper::Config - Run Web::Scraper From Config Files

SYNOPSIS

  ---
  scraper:
    - process:
      - td>ul>li
      - trailers[]
      - scraper:
        - process_first:
          - li>b
          - title
          - TEXT
        - process_first:
          - ul>li>a[href]
          - url
          - '@href'
        - process:
          - ul>li>ul>li>a
          - movies[]
          - __callback(process_movie)__


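  use Web::Scraper::Config;

  # $config is either the name of a config file (such as the YAML above)
  # or a hashref with the same structure
  my $scraper = Web::Scraper::Config->new(
    $config,
    {
      callbacks => {
        # referenced from the config as __callback(process_movie)__
        process_movie => sub {
          my $elem = shift;
          return {
            text => $elem->as_text,
            href => $elem->attr('href'),
          };
        },
      },
    }
  );
  my $result = $scraper->scrape($uri);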

DESCRIPTION

Web::Scraper::Config allows you to harness the power of Web::Scraper from a config file.

Config files can be written in any format that Config::Any understands, as long as the resulting data conforms to this module's rules.

METHODS

new

Creates a new Web::Scraper::Config instance.

The first argument is either a hashref that represents a config, or the name of a config file. The config file can be in any format that Config::Any understands, as long as it yields a hash that conforms to the Web::Scraper::Config rules.
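As an illustration of the hashref form, the SYNOPSIS config might be written directly in Perl as below. This is a sketch that assumes the usual YAML-to-Perl mapping (mappings become hashrefs, sequences become arrayrefs); the constructor call is left as a comment so the block runs without the module installed.

```perl
use strict;
use warnings;

# The SYNOPSIS config as a Perl data structure (assumed equivalent to
# what a YAML loader would return for that file).
my $config = {
    scraper => [
        {
            process => [
                'td>ul>li',
                'trailers[]',
                {
                    scraper => [
                        { process_first => [ 'li>b', 'title', 'TEXT' ] },
                        { process_first => [ 'ul>li>a[href]', 'url', '@href' ] },
                        { process => [ 'ul>li>ul>li>a', 'movies[]', '__callback(process_movie)__' ] },
                    ],
                },
            ],
        },
    ],
};

# With the module installed, this hashref could be passed in place of a filename:
#   my $scraper = Web::Scraper::Config->new($config, \%options);
```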

The second argument (options) is optional, and is currently only used to provide callbacks to be called from the scraper. When Web::Scraper::Config encounters an element in the form of:

  __callback(function_name)__

then that is replaced by the corresponding callback specified in the options hash.
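The substitution described above can be pictured with the following sketch. It is not the module's actual implementation: resolve_callbacks is a hypothetical helper, and the assumption that function names match \w+ is made here only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Web::Scraper::Config: walk a config
# structure and swap each "__callback(name)__" string for the matching
# code reference from the callbacks table.
sub resolve_callbacks {
    my ($node, $callbacks) = @_;

    if (ref $node eq 'HASH') {
        my %copy;
        $copy{$_} = resolve_callbacks($node->{$_}, $callbacks) for keys %$node;
        return \%copy;
    }
    if (ref $node eq 'ARRAY') {
        return [ map { resolve_callbacks($_, $callbacks) } @$node ];
    }
    # Assumption: callback names consist of word characters only.
    if (defined $node && !ref $node && $node =~ /^__callback\((\w+)\)__$/) {
        my $name = $1;
        my $code = $callbacks->{$name}
            or die "No callback named '$name' was supplied\n";
        return $code;
    }
    return $node;    # anything else is passed through unchanged
}

my $rule     = [ 'ul>li>ul>li>a', 'movies[]', '__callback(process_movie)__' ];
my $resolved = resolve_callbacks(
    $rule,
    { process_movie => sub { my $elem = shift; return "seen: $elem" } },
);
```

After the call, the third entry of $resolved is a code reference, while the selector and the key name are left as plain strings. A placeholder with no matching entry in the callbacks table makes this sketch die; how the real module handles that case is not stated here.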

scrape

Starts scraping. The semantics are exactly the same as those of Web::Scraper::scrape.

AUTHOR

Daisuke Maki <daisuke@endeworks.jp>

LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html
