
NAME

CPAN::Access::AdHoc - Retrieve stuff from an arbitrary CPAN repository

SYNOPSIS

 use CPAN::Access::AdHoc;
 
 my ( $module ) = @ARGV;
 my $cad = CPAN::Access::AdHoc->new();
 my $index = $cad->fetch_module_index();
 if ( $index->{$module} ) {
     print "$module is in $index->{$module}{distribution}\n";
 } else {
     print "$module is not indexed\n";
 }

DESCRIPTION

This class provides a lowish-level interface to an arbitrary CPAN repository. You can fetch anything, but there is particular support for the author and module indices, distributions, and their metadata.

What it does not provide is module installation, dependency resolution, or what-have-you. There are already plenty of tools for that.

The intent is that this should be a zero-configuration system, or at least a configuration-optional system.

Attributes can be specified explicitly either when the object is instantiated or afterwards. The default is from the global section of a Config::Tiny configuration file, CPAN-Access-AdHoc.ini, which is found in directory File::HomeDir->my_dist_config( 'CPAN-Access-AdHoc' ). The named sections are currently unused, though CPAN-Access-AdHoc reserves to itself all section names which contain no uppercase letters.

In addition, it is possible to take the default CPAN repository URL from the user's CPAN::Mini, cpanm, CPAN, or CPANPLUS configuration. They are accessed in this order by default, and the first available is used. Which of these are considered, and the order in which they are considered, is under the user's control via the default_cpan_source attribute/configuration item.

What actually happened here is that I got an RT ticket on one of my CPAN distributions, pointing out that the Free Software Foundation had moved, and I needed to update the copy of the GNU GPL that I distributed. Well, it's the same text for all my distributions, so I wanted a tool to tell me which ones had already been updated in CPAN.

A little later, I realized that a clobbered version of one of my author tests got shipped in a couple distributions, so I wrote another Perl script to see how far the rot had spread.

Then I found out about an interesting but somewhat heavyweight module, and wanted to know what I really needed to install to get it going. Yes, cpanm will do this, but I have not taken that step yet.

So I found myself writing mostly the same code for the third time, and decided there ought to be a better way. Hence this module.

METHODS

This class supports the following public methods:

Instantiator

new

This static method instantiates the object. You can specify attribute values by passing name/value argument pairs. Defaults are documented with the individual attributes.

If you do not specify an explicit cpan argument, and a default CPAN URL cannot be computed, an exception is thrown. See the cpan attribute documentation for a few more details.

Accessors/Mutators

config

When called with no arguments, this method acts as an accessor, and returns the current configuration as a Config::Tiny object.

When called with an argument, this method acts as a mutator. If the argument is a Config::Tiny object it becomes the new configuration. If the argument is undef, file CPAN-Access-AdHoc.ini in File::HomeDir->my_dist_config( 'CPAN-Access-AdHoc' ) is read for the configuration. If this file does not exist, the configuration is set to an empty Config::Tiny object.

cpan

When called with no arguments, this method acts as an accessor, and returns a URI object representing the URL of the CPAN repository being accessed.

When called with an argument, this method acts as a mutator. It sets the URL of the CPAN repository accessed by this object, and (for reasons of sanity) calls flush() to purge any data cached from the old repository. The argument can be either a string or an object that stringifies (such as a URI object). To be valid, the scheme must be supported by LWP::UserAgent (that is, LWP::Protocol::implementor() must return a true value), and must support a hierarchical name space. That means that schemes like file:, http:, and ftp: are accepted, but schemes like mailto: (non-hierarchical name space) and foobar: (not known to be supported by LWP::UserAgent) are not.

If the argument is undef, the default URL as computed from the sources in default_cpan_source is used. If no URL can be computed from any source, an exception is thrown.

default_cpan_source

When called with no arguments, this method acts as an accessor, and returns the current list of default CPAN sources as an array reference. This is incompatible with version 0.000_08 and before, where the return was a comma-delimited string.

When called with an argument, this method acts as a mutator, and sets the list of default CPAN sources. This list is either an array reference or a comma-delimited string, and consists of the names of zero or more CPAN::Access::AdHoc::Default::CPAN::* classes. With either mechanism the names of the classes may be passed without the common prefix, which will be added back if needed. See the documentation of these classes for more information.

If any element of the list does not represent an existing CPAN::Access::AdHoc::Default::CPAN::* class, an exception is thrown and the value of the attribute remains unmodified.

If the argument is undef, the default is restored.

The default is 'CPAN::Mini,cpanm,CPAN,CPANPLUS'.
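
The two accepted forms, and the optional common prefix, can be pictured with the following sketch. It is not the module's own code: expand_sources() is a hypothetical helper that only illustrates the normalization described above, and it does not check that the classes exist.

```perl
use strict;
use warnings;

# Common prefix of the classes named by default_cpan_source.
my $PREFIX = 'CPAN::Access::AdHoc::Default::CPAN::';

# Hypothetical helper, not part of CPAN::Access::AdHoc: expand a
# default_cpan_source value into a list of full class names.
sub expand_sources {
    my ( $value ) = @_;

    # Accept either an array reference or a comma-delimited string.
    my @names = ref( $value ) eq 'ARRAY'
        ? @{ $value }
        : split /\s*,\s*/, $value;

    # Add the common prefix back only where it was omitted.
    return map { index( $_, $PREFIX ) == 0 ? $_ : $PREFIX . $_ } @names;
}

print "$_\n" for expand_sources( 'CPAN::Mini,cpanm' );
print "$_\n" for expand_sources( [ 'CPAN', 'CPANPLUS' ] );
```

With the real object, either form would be passed straight to the default_cpan_source() mutator or to new().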

http_error_handler

When called with no arguments, this method acts as an accessor, and returns the current HTTP error handler.

When called with an argument, this method acts as a mutator, and sets the HTTP error handler. This must be a code reference.

When an HTTP error is encountered, the handler will be called and passed three arguments: the CPAN::Access::AdHoc object, the path relative to the base URL of the CPAN repository, and the HTTP::Response object. Whatever the handler returns will be returned by the method that called it.

If the argument is undef, the default is restored.

The default is \&CPAN::Access::AdHoc::DEFAULT_HTTP_ERROR_HANDLER, which throws an exception, giving the URL and the HTTP status line. If you do not want to code for every error you might encounter, handle the uninteresting errors with

 goto &CPAN::Access::AdHoc::DEFAULT_HTTP_ERROR_HANDLER;

This assumes that you have not modified @_.
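
As a minimal sketch of such a handler, the following treats a 404 as "no data" and hands everything else to the default. The name quiet_404 is hypothetical, and the choice of 404 as the interesting status is only an example; code() is the HTTP::Response status accessor.

```perl
use strict;
use warnings;

# Hypothetical handler: return nothing for "404 Not Found", and let the
# default handler throw its exception for every other HTTP error.
sub quiet_404 {
    # Copying the arguments leaves @_ untouched, which the goto needs.
    my ( $cad, $path, $resp ) = @_;

    $resp->code() == 404
        and return;

    # Reached only for other errors; CPAN::Access::AdHoc must be loaded.
    goto &CPAN::Access::AdHoc::DEFAULT_HTTP_ERROR_HANDLER;
}

# Intended use, assuming CPAN::Access::AdHoc has been loaded:
#   my $cad = CPAN::Access::AdHoc->new(
#       http_error_handler => \&quiet_404 );
```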

Functionality

These methods are what all the rest is in aid of.

corpus

This convenience method returns a list of the indexed distributions by the author with the given CPAN ID. This information is derived from the output of indexed_distributions(). The argument is converted to upper case before use.
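
The selection can be sketched as a filter over distribution names relative to authors/id/. This is an illustration, not the module's implementation: corpus_of() is a hypothetical stand-in, and the distribution names are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for corpus(): select one author's distributions
# from a list of names relative to authors/id/.
sub corpus_of {
    my ( $cpan_id, @distributions ) = @_;

    $cpan_id = uc $cpan_id;    # corpus() also upper-cases its argument

    # Build the author's directory, e.g. 'B/BA/BACH/'.
    my $prefix = join '/', substr( $cpan_id, 0, 1 ),
        substr( $cpan_id, 0, 2 ), $cpan_id, '';

    return grep { index( $_, $prefix ) == 0 } @distributions;
}

# Made-up names standing in for $cad->indexed_distributions().
my @all = (
    'B/BA/BACH/Johann-0.001.tar.bz2',
    'B/BA/BACH/PDQ-0.001.zip',
    'M/MO/MOZART/Wolfgang-1.000.tar.gz',
);

print "$_\n" for corpus_of( 'bach', @all );
```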

exists

This method returns true if the named file exists in the CPAN repository, and false otherwise. Its argument is the name of the file relative to the root of the repository.

This method should be faster than fetch(), because it does not actually retrieve the file.

fetch

This method fetches the named file from the CPAN repository. Its argument is the name of the file relative to the root of the repository.

If this method determines that there might be checksums for this file, it attempts to retrieve them, and if successful will compare the SHA256 checksum of the retrieved data to the retrieved value.
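
The comparison itself amounts to something like the sketch below, which uses the core Digest::SHA module. It is not the module's code: checksum_matches() is a hypothetical helper, the content is made up, and the 'sha256' key is an assumption about the shape of a CHECKSUMS entry.

```perl
use strict;
use warnings;
use Digest::SHA qw{ sha256_hex };

# Hypothetical check: compare the SHA256 digest of downloaded content
# with the value recorded in a CHECKSUMS entry. The 'sha256' key is an
# assumption about the shape of that entry.
sub checksum_matches {
    my ( $content, $entry ) = @_;

    defined $entry->{sha256}
        or return;    # nothing to compare against

    return sha256_hex( $content ) eq $entry->{sha256} ? 1 : 0;
}

# Made-up content, with an entry computed from it so that it matches.
my $content = "pretend this is a tarball\n";
my $entry   = { sha256 => sha256_hex( $content ) };

print checksum_matches( $content, $entry ) ? "ok\n" : "mismatch\n";
print checksum_matches( "tampered $content", $entry )
    ? "ok\n" : "mismatch\n";
```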

If the file is compressed in some way it will be decompressed.

If the fetched file is an archive of some sort, an object representing the archive will be returned. This object will be of one of the CPAN::Access::AdHoc::Archive::* classes, each of which wraps the corresponding Archive::* class and provides CPAN::Access::AdHoc with a consistent interface. These classes will be initialized with

 content => the literal content of the archive, as downloaded,
 encoding => the MIME encoding used to decode the archive,
 path => the path to the archive, relative to the base URL.

If the fetched file is not an archive, it is wrapped in a CPAN::Access::AdHoc::Archive::Null object and returned.

All other fetch functionality is implemented in terms of this method.

fetch_author_index

This method fetches the author index, authors/01mailrc.txt.gz. It is expanded and interpreted, and returned as a hash reference keyed by the authors' CPAN IDs. The data for each author is an anonymous hash with the following keys:

name => the name of the author;
address => the electronic mail address of the author.

The results of the first fetch are cached; subsequent calls are supplied from cache.

fetch_module_index

This method fetches the module index, modules/02packages.details.txt.gz. It is expanded and interpreted, and returned as a hash reference keyed by the module names. The data for each module is an anonymous hash with the following keys:

distribution => the name of the distribution that contains the module, relative to the authors/id/ directory;
version => the version of the module.

If called in list context, the first return is the index, and the second is another hash reference containing the metadata that appears at the top of the expanded index file.

If an HTTP error is encountered while fetching the index, normally an error is thrown. But if the http_error_handler returns nothing, an empty index (and empty index metadata) are returned.

The results of the first fetch are cached; subsequent calls are supplied from cache.
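
Because each value in the index is itself a hash, a lookup needs two subscripts, for example $index->{$module}{distribution}. The sketch below walks a structure of the documented shape to list the modules in one distribution. modules_in_distribution() is a hypothetical helper, and the sample data are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: list the modules that an index attributes to
# the given distribution, in sorted order.
sub modules_in_distribution {
    my ( $index, $dist ) = @_;
    my @found = grep { $index->{$_}{distribution} eq $dist }
        keys %{ $index };
    return sort @found;
}

# Made-up sample data in the documented shape. Real code would use
#   my $index = $cad->fetch_module_index();
my $index = {
    'Johann' => {
        distribution => 'B/BA/BACH/Johann-0.001.tar.bz2',
        version      => '0.001',
    },
    'Johann::Sebastian' => {
        distribution => 'B/BA/BACH/Johann-0.001.tar.bz2',
        version      => '0.001',
    },
    'PDQ' => {
        distribution => 'B/BA/BACH/PDQ-0.001.zip',
        version      => '0.001',
    },
};

print "$_\n"
    for modules_in_distribution( $index,
        'B/BA/BACH/Johann-0.001.tar.bz2' );
```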

fetch_distribution_archive

This method takes as its argument the name of a distribution file relative to the archive's authors/id/ directory, and returns the distribution as a CPAN::Access::AdHoc::Archive::* object.

Note that since this method is implemented in terms of fetch(), the archive object's path attribute will be set to its path relative to the base URL of the CPAN repository, not its path relative to the authors/id/ directory. So, for example,

 $arc = $cad->fetch_distribution_archive(
     'B/BA/BACH/PDQ-0.000_01.zip' );
 say $arc->path(); # authors/id/B/BA/BACH/PDQ-0.000_01.zip

For convenience, either the top or the top two directories can be omitted, since they can be reconstructed from the rest. So the above example can also be written as

 $arc = $cad->fetch_distribution_archive(
     'BACH/PDQ-0.000_01.zip' );
 say $arc->path(); # authors/id/B/BA/BACH/PDQ-0.000_01.zip
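
The reconstruction of the omitted directories can be sketched as follows. This is an illustration rather than the module's own code: expand_distribution_path() is a hypothetical helper, and it assumes that a CPAN ID is at least three characters long, so that any shorter leading component must be one of the two directories.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a distribution name to its full form
# relative to authors/id/. Assumes CPAN IDs have at least three
# characters.
sub expand_distribution_path {
    my ( $path ) = @_;
    my @parts = split m{/}, $path;

    # Discard any one- or two-character leading directories supplied.
    shift @parts while @parts > 1 && length( $parts[0] ) < 3;

    # Rebuild both directories from the CPAN ID.
    my $author = $parts[0];
    return join '/', substr( $author, 0, 1 ),
        substr( $author, 0, 2 ), @parts;
}

# All three calls print B/BA/BACH/PDQ-0.000_01.zip
print expand_distribution_path( 'BACH/PDQ-0.000_01.zip' ), "\n";
print expand_distribution_path( 'BA/BACH/PDQ-0.000_01.zip' ), "\n";
print expand_distribution_path( 'B/BA/BACH/PDQ-0.000_01.zip' ), "\n";
```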

fetch_distribution_checksums

 use YAML::Any;
 print Dump( $cad->fetch_distribution_checksums(
     'B/BA/BACH/' ) );
 print Dump( $cad->fetch_distribution_checksums(
     'BACH' ) );        # equivalent to previous
 print Dump( $cad->fetch_distribution_checksums(
     'B/BA/BACH/Johann-0.001.tar.bz2' ) );
 print Dump( $cad->fetch_distribution_checksums(
     'BACH/Johann-0.001.tar.bz2' ) );   # ditto

This method takes as its argument either a file name or a directory name relative to authors/id/. A directory is indicated by a trailing slash.

If the request is for the CHECKSUMS file, the return is a reference to a hash which contains the interpreted contents of the entire file.

If the argument is a file name other than CHECKSUMS, the return is a reference to the CHECKSUMS entry for that file, provided it exists.

If the argument is a directory name, it is treated like a request for the CHECKSUMS file in that directory.

If the CHECKSUMS file does not exist, an exception is raised. If the argument was a file name and the file has no entry in the CHECKSUMS file, nothing is returned.

For convenience, either the top or the top two directories can be omitted, since they can be reconstructed from the rest.

The result of the first fetch for a given directory is cached, and subsequent calls for the same directory are supplied from cache.

fetch_registered_module_index

This method fetches the registered module index, modules/03modlist.data.gz. It is interpreted, and returned as a hash reference keyed by module name.

If called in list context, the first return is the index, and the second is a hash reference containing the metadata that appears at the top of the expanded index file.

The results of the first fetch are cached; subsequent calls are supplied from cache.

flush

This method deletes all cached results, causing them to be re-fetched when needed.

indexed_distributions

This convenience method returns a list of all indexed distributions in ASCIIbetical order. This information is derived from the results of fetch_module_index(), and is cached.

Subclass Methods

The following methods exist for the benefit of subclasses, and should not be considered part of the public interface. I am willing to make this interface public on request, but until the request comes I will consider myself at liberty to modify it without notice.

__attr

This method returns a hash containing all attributes specific to the class that makes the call. This hash may be modified, and in fact must be to store new attribute values.

__cache

This method returns a hash containing all values cached by the object. This hash may be modified, and in fact must be to cache new values.

__create_accessor_mutators

 __PACKAGE__->__create_accessor_mutators( @attributes );

This static method creates accessor/mutator methods for the attributes named in its argument list. If a subroutine with the same name as an attribute exists at the time this method is called, that subroutine is assumed to be the accessor/mutator for that attribute.

The methods created by __create_accessor_mutators() have three hooks for behavior modification. For any attribute whatever, these are:

__attr__whatever__default
 my ( $self, $value ) = @_;
 my $code;
 not defined $value
     and $code = $self->can( '__attr__whatever__default' )
     and $value = $code->( $self );

This is called when the mutator is passed a new value of undef. Its only argument is the invocant. It must return a valid value of the attribute.

If a subclass overrides this, the subclass probably should not call $self->SUPER::__attr__whatever__default().

__attr__whatever__validate
 my ( $self, $value ) = @_;
 my $code;
 $code = $self->can( '__attr__whatever__validate' )
     and $value = $code->( $self, $value );

This method is called after __attr__whatever__default(), and validates the value. It rejects a value by throwing an exception. The preferred way to do this is by calling __wail().

If a subclass overrides this, the subclass must execute

 $value = $self->SUPER::__attr__whatever__validate( $value );

before it performs its own validation. The superclass method must return the internal format of the attribute's value, which the subclass must return after validating.

__attr__whatever__post_assignment
 $self->__attr__whatever__post_assignment();

This method is called after the new value has been assigned to the attribute.

If a subclass overrides this, it must call $self->SUPER::__attr__whatever__post_assignment(), and it should call it last thing before returning.

All these hooks are optional, but __create_accessor_mutators() will generate dummy __attr__whatever__validate() and __attr__whatever__post_assignment() methods for any attributes that do not have them at the time it is called.

__init

This method is called when a new object is instantiated. Its arguments are the invocant and a reference to a hash containing attribute names and values.

If a subclass adds attributes, it must override this method. The override must call $self->SUPER::__init( $args ) first thing. It must then set its own attributes from the $args hash reference, deleting them from the hash. The override returns nothing.

SEE ALSO

App::cpanlistchanges by Tatsuhiko Miyagawa lists Changes files -- by default the changes from the version you have installed to the most-current CPAN version.

CPAN::DistnameInfo by Graham Barr, which parses distribution name and version (among other things) from the name of a particular distribution archive. This was very helpful in some of my CPAN ad-hocery.

CPAN::Easy by Chris Weyl, which retrieves distributions and their meta information. As of this writing, it does not support version 2.0 of the meta spec.

CPAN::Index by Adam Kennedy, which accesses the CPAN indices, storing them in an SQLite database.

CPAN::Inject by Adam Kennedy, which injects tarballs into a .cpan/sources directory for a given CPAN ID.

CPAN::Meta by David Golden, which presents a unified interface for the various versions of the CPAN meta-data specification.

CPAN::Mini by Ricardo Signes, which lets you have your own personal CPAN, optionally with only the latest distributions.

CPAN::Mini::Devel by David Golden, which is CPAN::Mini with the addition of developer releases.

CPAN::Mini::Inject by Christian Walde, which injects distributions into a Mini-CPAN.

CPAN::PackageDetails by Brian D Foy, which reads and writes the CPAN modules/02packages.details.txt.gz file.

Parse::CPAN::Packages by Christian Walde, which parses the CPAN modules/02packages.details.txt.gz file.

Parse::CPAN::Packages::Fast by Slaven Rezic, non-OO code which parses the CPAN modules/02packages.details.txt.gz file.

SUPPORT

Support is by the author. Please file bug reports at http://rt.cpan.org, or in electronic mail to the author.

AUTHOR

Thomas R. Wyant, III wyant at cpan dot org

COPYRIGHT AND LICENSE

Copyright (C) 2012-2014 by Thomas R. Wyant, III

This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0. For more details, see the full text of the licenses in the directory LICENSES.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.