Module Version: 0.02

NAME

Archive::Heritrix - Perl extension for processing Heritrix archive (.arc) files

SYNOPSIS

  use Archive::Heritrix;
  my $arc;

  # Open a single .arc.gz archive.
  $arc = Archive::Heritrix->new( file => 'a.arc.gz' );
  while ( my $rec = $arc->next_record() ) {
    # $rec is an HTTP::Response object
  }

  # Open a directory of .arc.gz archives; files are matched recursively by extension.
  $arc = Archive::Heritrix->new( directory => 'eg' );
  while ( my $rec = $arc->next_record() ) {
    # $rec is an HTTP::Response object
  }

DESCRIPTION

Processes Heritrix archive (.arc) files as a stream of HTTP::Response objects. Each call to next_record() returns the next archived response.

Heritrix is the archival-grade crawler used by the Internet Archive.
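
Because each record is an HTTP::Response object, the usual accessors such as code() and header() should be available on it. The following is a minimal sketch, not part of the module, of how a crawl might be summarized by status code and content type. The helper name summarize_records is hypothetical, and the sketch assumes only what the SYNOPSIS shows: a reader whose next_record() returns a false value once the archive is exhausted.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Archive::Heritrix): tally HTTP status
# codes and content types from any reader that offers next_record().
sub summarize_records {
    my ($arc) = @_;
    my ( %by_status, %by_type );

    while ( my $rec = $arc->next_record() ) {
        # code() and header() are standard HTTP::Response methods.
        $by_status{ $rec->code }++;

        my $type = $rec->header('Content-Type');
        $type = 'unknown' unless defined $type;
        $type =~ s/\s*;.*//s;    # drop parameters such as "; charset=utf-8"
        $by_type{ lc $type }++;
    }

    return ( \%by_status, \%by_type );
}

# Run against a real archive only when a path is given on the command line.
if (@ARGV) {
    require Archive::Heritrix;
    my $arc = Archive::Heritrix->new( file => $ARGV[0] );
    my ( $status, $types ) = summarize_records($arc);

    printf "%-6s %d\n",  $_, $status->{$_} for sort keys %$status;
    printf "%-30s %d\n", $_, $types->{$_}  for sort keys %$types;
}
```

Passing directory => 'eg' instead of file => ... to the constructor should produce the same kind of summary across every archive found under that directory, since both forms are read through next_record().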

SEE ALSO

  Heritrix project homepage, http://crawler.archive.org

AUTHOR

Allen Day, <allenday@ucla.edu>

COPYRIGHT AND LICENSE

Copyright (C) 2008 by Allen Day

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.5 or, at your option, any later version of Perl 5 you may have available.
