Archive::Heritrix - Perl extension for processing Heritrix archive (.arc) files
    use Archive::Heritrix;
    my $arc;

    # open a single .arc.gz archive
    $arc = Archive::Heritrix->new( file => 'a.arc.gz' );
    while ( my $rec = $arc->next_record() ) {
      # $rec is an HTTP::Response object
    }

    # open a directory of .arc.gz archives; matches recursively on file extension
    $arc = Archive::Heritrix->new( directory => 'eg' );
    while ( my $rec = $arc->next_record() ) {
      # $rec is an HTTP::Response object
    }
Processes Heritrix archive (.arc) files as a stream of HTTP::Response objects.
Heritrix is the archival-grade crawler used by the Internet Archive.
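Because each record arrives as an HTTP::Response object, the usual accessors of that class (such as code) can be applied to it directly. The following is a minimal sketch, not part of the module: count_by_status is a hypothetical helper name, and it assumes only what the synopsis shows, namely that next_record() returns one record per call and a false value once the archive is exhausted.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Archive::Heritrix):
# tally the records of an archive by HTTP status code.
# $arc is any object whose next_record() method returns
# HTTP::Response objects, and a false value at end of stream.
sub count_by_status {
    my ($arc) = @_;
    my %count;
    while ( my $rec = $arc->next_record() ) {
        # HTTP::Response's code() returns the numeric status, e.g. 200
        $count{ $rec->code }++;
    }
    return \%count;
}
```

Passing the helper the object returned by Archive::Heritrix->new( file => 'a.arc.gz' ) would yield a hash reference keyed by status code. Taking the archive object as an argument, rather than opening the file inside the helper, lets the same code serve both the single-file and the directory form of the constructor.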
Heritrix project homepage, http://crawler.archive.org
Allen Day, <allenday@ucla.edu>
Copyright (C) 2008 by Allen Day
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.5 or, at your option, any later version of Perl 5 you may have available.
To install Archive::Heritrix, copy and paste the appropriate command into your terminal.
cpanm
cpanm Archive::Heritrix
CPAN shell
perl -MCPAN -e shell
install Archive::Heritrix