package CPAN::Metrics;

=pod

=head1 NAME

CPAN::Metrics - Create and maintain a Perl::Metrics database for all of CPAN

=head1 SYNOPSIS

  # Do a CPAN::Metrics run
  my $metrics = CPAN::Metrics->new(
      remote  => 'http://mirrors.kernel.org/cpan/',
      local   => '/home/adam/.minicpan',
      extract => '/home/adam/.cpanmetrics',
      metrics => '/home/adam/.cpanmetrics/metrics.sqlite',
  )->run;

=head1 DESCRIPTION

C<CPAN::Metrics> is a combination of L<CPAN::Mini> and L<Perl::Metrics>.

It lets you pull out all of CPAN (for various definitions of "all") and
run L<Perl::Metrics> on it to generate massive amounts of metrics data
on the 16,000,000 lines of code in CPAN.

=head2 Resource Usage

While this module makes it relatively easy to write the B<code> to
"process all of CPAN", make no mistake: it is going to take a LOT of
computing resources to do it, especially the first time.

A single run should require 1-10 gigabytes of disk space, up to
several hundred megabytes of memory, and hours (or days) of CPU time.

The result will be a SQLite database containing somewhere between several
hundred thousand and several million rows of metrics data.

What you do with the metrics after B<that> is up to you.

=head1 METHODS

=cut

use 5.005;
use strict;
use base 'CPAN::Mini::Extract';
use Carp 'croak';
use Perl::Metrics ();

use vars qw{$VERSION};
BEGIN {
	$VERSION = '0.08';
}





#####################################################################
# Constructor

=pod

=head2 new

The C<new> constructor creates a new CPAN metrics processor.

Although it is created as an object, the constructor configures
L<Perl::Metrics> for the whole process, so you can probably only
create a single object within a single process. (I think)

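It takes the following parameters.

=over

=item minicpan arguments

Any parameters accepted by the parent L<CPAN::Mini::Extract> constructor,
such as C<remote>, C<local> and C<extract>, are passed through to it.

=item metrics

The path of the metrics database. This parameter is required, and the
constructor dies if it is not provided.

=back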

Returns a new C<CPAN::Metrics> object, or dies on error.
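
The default C<extract_filter> set by the constructor decides which files
are extracted for processing. The following is a minimal sketch of how it
classifies paths, assuming the filter is invoked with the candidate path
in C<$_> (as its bare regex matches suggest). The C<keep> helper and the
sample paths are hypothetical and exist only for this illustration.

  use strict;
  use warnings;

  # Copy of the default extract_filter from the constructor
  my $filter = sub {
      return 0 if /\:/;
      return 0 if /\binc\b/;
      return 1 if /\.pl$/;
      return 0 if /\bexamples?\b/;
      if ( /\bt\b/ ) {
          return 1 if /\.t$/;
      } else {
          return 1 if /\.pm$/;
      }
      return 0;
  };

  # Hypothetical helper: run the filter with the path in $_
  # (assumption: this mirrors how the parent class calls it)
  sub keep {
      local $_ = shift;
      return $filter->() ? 1 : 0;
  }

  # Hypothetical sample paths
  for my $path ( qw{
      lib/Foo/Bar.pm
      t/01_compile.t
      t/lib/Helper.pm
      inc/Module/Install.pm
      examples/Demo.pm
      examples/demo.pl
  } ) {
      printf "%-24s %s\n", $path, keep($path) ? 'keep' : 'skip';
  }

Note that the C<.pl> check runs before the C<examples> check, so scripts
in an examples directory are still extracted. Because the caller's
arguments are passed after these defaults, supplying your own
C<extract_filter> to C<new> will most likely replace this one.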

=cut

sub new {
	my $class = ref $_[0] ? ref shift : shift;

	# Call up to get the base object
	my $self = $class->SUPER::new(
		force          => 1,
		skip_perl      => 1,
		extract_check  => 1,
		path_filters   => [
			qr/\bAcme\b/i,
			qr/\bPDF\-API2\b/i,
			qr/\bPerl6\b/i,
			],
		# Remove some known troublemakers
		module_filters => [
			qr/^Acme::/i,
			qr/^Meta::/i,
			qr/\bPerl6\b/i,
			],
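		extract_filter => sub {
			# Skip paths containing a colon
			return 0 if /\:/;
			# Skip bundled inc directories
			return 0 if /\binc\b/;
			# Always keep scripts
			return 1 if /\.pl$/;
			# Skip example code
			return 0 if /\bexamples?\b/;
			# In test directories keep only test scripts,
			# elsewhere keep only modules
			if ( /\bt\b/ ) {
				return 1 if /\.t$/;
			} else {
				return 1 if /\.pm$/;
			}
			return 0;
		},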
		@_,
		);

	# Check and set the metrics database
	unless ( $self->{metrics} ) {
		croak("Metrics database param 'metrics' was not provided");
	}
	Perl::Metrics->import( $self->{metrics} );

	$self;
}

=pod

=head2 run

The C<run> method launches the CPAN metrics processor. It will
synchronize its L<minicpan> mirror from the remote server, expanding
any new archives and removing old ones. Once the mirror is updated, the
extraction directory is reindexed to update the metrics database, and
any processing needed to generate the resulting metrics is done.

And then (a B<very> long time later) it will stop. :)

Oh, and return true. Any errors will cause an exception (i.e. die).

=cut

sub run {
	my $self = shift;
	$self->SUPER::run( @_ );
	$self->process_index;
}

sub process_index {
	my $self = shift;

	# Process the extraction directory
	local $Perl::Metrics::TRACE = 1;
	$self->trace("Indexing and processing documents in $self->{extract}...\n");
	Perl::Metrics->process_index( $self->{extract} );

	return 1;
}

1;

=pod

=head1 TO DO

- Improve Perl::Metrics to add needed things

- Improve CPAN::Mini::Extract to add needed things

- Improve CPAN::Metrics to add needed things

- Get all three of the above to use accessors

- Possibly consider intentionally B<disabling> caching so that
we don't end up with a multi-multi-gigabyte parse cache.

=head1 SUPPORT

Bugs should be reported via the CPAN bug tracker at

L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=CPAN-Metrics>

For other issues, contact the author.

=head1 AUTHOR

Adam Kennedy E<lt>adamk@cpan.orgE<gt>, L<http://ali.as/>

=head1 COPYRIGHT

Copyright 2005 - 2008 Adam Kennedy.

This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the
LICENSE file included with this module.

=cut