NAME

MyCPAN::App::DPAN::UserManual - How to manage a DPAN

DESCRIPTION

DPAN, short for D{ark|istributed|ecentralized} Perl Archive Network, helps you create your own Perl distributions repository that you can use with standard CPAN tools. You can put any distributions with any versions that you like in your own repository since you completely control it. These can be distributions from the real CPAN, private distributions that you create yourself, or older versions of distributions from BackPAN (the historical CPAN archive).

The DPAN tools take the distributions that you specify and organize them into a CPAN-like repository. The simplest process is to dump all of your distributions in a single directory and run the dpan command in that directory:

        % cd my_dist_directory
        % dpan

By default, dpan finds and organizes all of the distributions in the directory into the appropriate CPAN-like structure. Behind the scenes, MyCPAN::Indexer looks at each distribution and creates a report from it. You should see a directory that contains a report for each distribution:

        indexer_reports/
                error_reports/
                success_reports/

From the list of reports, dpan creates the right index files, including:

        authors/00whois.xml
        authors/01mailrc.txt.gz
        modules/02packages.details.txt.gz
        modules/03modlist.data.gz
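The modules/02packages.details.txt index is a short header of "Field: value" lines, a blank line, and then one line per package giving the package name, its version, and the distribution's path under authors/id/. The Perl sketch below builds that text from a hash. It is not dpan's own code: the packages_details name is hypothetical, the url and written_by header values are placeholders you pass in, and the column padding only imitates the layout of CPAN's own file. dpan writes the gzipped version; you could compress this text with IO::Compress::Gzip.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build the text of a 02packages.details.txt-style index.
# $packages maps a package name to [ version, path under authors/id/ ].
sub packages_details {
    my ( $packages, %opts ) = @_;

    # One line per package, sorted case-insensitively by package name
    my @lines = map {
        sprintf( '%-40s %8s  %s', $_, @{ $packages->{$_} } )
    } sort { lc($a) cmp lc($b) || $a cmp $b } keys %$packages;

    my @header = (
        'File:         02packages.details.txt',
        "URL:          $opts{url}",
        'Description:  Package names found in the repository',
        'Columns:      package name, version, path',
        'Intended-For: Automated fetch routines, namespace documentation.',
        "Written-By:   $opts{written_by}",
        'Line-Count:   ' . scalar(@lines),
        'Last-Updated: ' . strftime( '%a, %d %b %Y %H:%M:%S GMT', gmtime ),
    );

    # Header, one blank line, then the package lines
    return join( "\n", @header ) . "\n\n" . join( '', map { "$_\n" } @lines );
}

print packages_details(
    { 'Foo::Bar' => [ '1.23', 'D/DP/DPAN/Foo-Bar-1.23.tar.gz' ] },
    url        => 'http://dpan.example.com/modules/02packages.details.txt',
    written_by => 'a DPAN sketch',
);
```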

After everything is done, you can use your new repository as your CPAN source. You can use it as a local directory, serve it through a webserver, or put it behind an FTP server.

Deciding on which modules to add

So far, it's up to you to decide which distributions you want in your repository, but we'd like to create a tool that can take a single distribution and tell you everything else it needs.

For a more general solution, you can start with a MiniCPAN that either filters the distributions from the real CPAN or includes all of them.

You can keep a separate directory of your private distributions that dpan can merge for you.

Running dpan for the first time

Running dpan for the first time over a large repository can take quite a bit of time. Running it over all of a MiniCPAN, about 20,000 distributions taking up about 1 GB, can take a couple of hours. Fortunately, on subsequent runs dpan only needs to analyze the distributions it hasn't successfully analyzed yet, so the run should be much faster.
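The speed-up on later runs comes from skipping distributions that already have a success report (the -c output below shows a queue_class of MyCPAN::App::DPAN::SkipQueue). Here is a minimal Perl sketch of that idea, not the SkipQueue code itself. The dists_to_index name is hypothetical, and so is the report naming: it assumes each success report is named after the archive with a .yml suffix, which may not match dpan's real reports.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Find qw(find);
use File::Spec;

# Collect distribution archives under $dir, then keep only those that
# don't yet have a success report in $success_dir.
# Assumption (hypothetical): a report is named "<archive basename>.yml".
sub dists_to_index {
    my ( $dir, $success_dir ) = @_;

    my @dists;
    find(
        sub {
            push @dists, $File::Find::name
                if -f $_ && /\.(?:tar\.gz|tgz|tar\.bz2|zip)\z/;
        },
        $dir
    );

    return grep {
        !-e File::Spec->catfile( $success_dir, basename($_) . '.yml' )
    } sort @dists;
}
```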

To play with dpan, start with a directory that has only a couple of modules in it. Once you work out how everything works and set up everything to your satisfaction, you can run dpan against a full repository.

Once you have your DPAN, you can run minicpan_webserver from CPAN::Mini::Webserver. You'll have a basic website that allows you to search for modules and read documentation for just the distributions in your DPAN.

ADVANCED USE

Configuring dpan

dpan can take two different configuration files: one for its setup and one for Log::Log4perl:

        % dpan -f dpan.conf -l dpan.log4perl

See the LOGGING section for more details about the logging setup.

The dpan configuration directives are listed in the dpan documentation. The format is a simple, line-oriented list of key-value pairs:

        organize_dists 1
        retry_errors   0
        merge_dirs     my_local_modules/foo/bar
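To make the format concrete, here is a minimal Perl sketch that reads such text into a hash: the first word on each line is the directive and the rest of the line is its value, so a directive such as merge_dirs can hold several space-separated paths. dpan reads its configuration with its own code; the parse_config name is hypothetical, and skipping blank lines and # comments is an assumption of this sketch.

```perl
use strict;
use warnings;

# Parse dpan-style configuration text: the directive name, whitespace,
# then the value (which may itself contain spaces).
# Assumption: blank lines and lines starting with # are ignored.
sub parse_config {
    my ($text) = @_;
    my %config;
    for my $line ( split /\n/, $text ) {
        next if $line =~ /^\s*(?:#|$)/;
        my ( $key, $value ) = $line =~ /^\s*(\S+)\s*(.*?)\s*$/;
        $config{$key} = $value;    # a directive with no value gets ''
    }
    return \%config;
}

my $config = parse_config("organize_dists 1\nmerge_dirs /repo/foo/bar /repo/baz/quux\n");
my @merge_dirs = split ' ', $config->{merge_dirs};
```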

To see the configuration for any setup, you can use the -c switch:

        % dpan -c
        alarm   15
        author_map
        collator_class   MyCPAN::App::DPAN::Reporter::Minimal
        copy_bad_dists   0
        dispatcher_class   MyCPAN::Indexer::Dispatcher::Serial
        dpan_dir   /Users/brian/DEv/mycpan--app--dpan
        error_report_subdir   /Users/brian/DEv/mycpan--app--dpan/indexer_reports/error
        extra_reports_dir
        fresh_start   0
        i_ignore_errors_at_my_peril   0
        ignore_missing_dists   0
        ignore_packages   main MY MM DB bytes DynaLoader
        indexer_class   MyCPAN::App::DPAN::Indexer
        indexer_id   Joe Example <joe@example.com>
        interface_class   MyCPAN::Indexer::Interface::Text
        log_file_watch_time   30
        organize_dists   1
        parallel_jobs   1
        pause_full_name   DPAN user <CENSORED>
        pause_id   DPAN
        prefer_bin   0
        queue_class   MyCPAN::App::DPAN::SkipQueue
        relative_paths_in_report   1
        report_dir   /Users/brian/DEv/mycpan--app--dpan/indexer_reports
        reporter_class   MyCPAN::App::DPAN::Reporter::Minimal
        retry_errors   1
        skip_perl   0
        success_report_subdir   /Users/brian/DEv/mycpan--app--dpan/indexer_reports/success
        system_id   an unnamed system
        use_real_whois   0
        worker_class   MyCPAN::Indexer::Worker

There are some directives that you'll probably want to set right away because they are specific to your setup:

        dpan_dir   /path/to/my/dpan/repository
        indexer_id   Joe Example <joe@example.com>
        pause_full_name   DPAN user <CENSORED>
        pause_id   DPAN
        system_id   an unnamed system

You are probably safe with the remaining defaults which configure dpan for the most common situation.

Dealing with indexing failures

If there's an error, you'll see some error output and dpan will dump the error into a file for that distribution under indexer_reports/error_reports/.

There are two common reasons for an indexing failure: either the analysis could not complete in the allotted time (15 seconds by default) or dpan could not unpack the distribution. The next most frequent problem, although rare, comes from an unexpected distribution structure.

We're developing a set of reports that we can distribute separately so you don't have to do this by hand. If you run into problem distributions, let us know.

Time-outs

If you see the error "Alarm rang", the analysis timed out. You can allow more time by setting alarm, in seconds, in your configuration file:

        alarm 120

Some distributions can take an extremely long time (more than a couple of minutes) to unpack. The time depends on the transfer speed over your network (if you have to get the file over NFS, for instance), the size of the distribution, and the speed at which you can write files.
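The usual Perl way to enforce a time limit like alarm is to install a $SIG{ALRM} handler that dies and wrap the work in eval, as in this sketch. It illustrates the technique rather than reproducing MyCPAN::Indexer's code; the with_time_limit name is hypothetical, and reusing the "Alarm rang" message is only for familiarity.

```perl
use strict;
use warnings;

# Run $code, giving up after $seconds. Returns ( 1, $result ) on success
# or ( 0, $error ) if the code died or ran out of time.
sub with_time_limit {
    my ( $seconds, $code ) = @_;

    my $result;
    my $ok = eval {
        # Die inside the handler so the surrounding eval catches the timeout
        local $SIG{ALRM} = sub { die "Alarm rang\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    alarm 0;    # make sure no alarm stays pending if $code died
    return $ok ? ( 1, $result ) : ( 0, $@ );
}

my ( $ok, $error ) = with_time_limit( 120, sub { sleep 1; 'analyzed' } );
```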

Distribution unpacking

MyCPAN::Indexer relies on Archive::Extract to unpack distributions. Archive::Extract can try a pure Perl solution through Archive::Tar or use an external binary.
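A minimal Perl sketch of unpacking one distribution with Archive::Extract follows. Setting $Archive::Extract::PREFER_BIN tells Archive::Extract to try external programs such as tar before pure Perl modules such as Archive::Tar; that this corresponds to dpan's prefer_bin directive is an assumption. The unpack_dist name is hypothetical.

```perl
use strict;
use warnings;

# Unpack one distribution archive into $to. Returns the extraction
# path on success, or dies with Archive::Extract's error message.
sub unpack_dist {
    my ( $archive, $to, $prefer_bin ) = @_;

    require Archive::Extract;

    # Try external binaries first when $prefer_bin is true,
    # otherwise the pure Perl modules.
    no warnings 'once';
    local $Archive::Extract::PREFER_BIN = $prefer_bin ? 1 : 0;

    my $ae = Archive::Extract->new( archive => $archive )
        or die "Could not recognize the archive type of $archive\n";

    $ae->extract( to => $to ) or die $ae->error, "\n";

    return $ae->extract_path;
}

# Usage: my $path = unpack_dist( 'Foo-Bar-1.23.tar.gz', '/tmp/unpacked', 0 );
```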

Can't find modules

CPAN authors can do almost anything they like with their distributions, so MyCPAN::Indexer might have trouble indexing some of them. The most frequent problems come from errors unpacking archives.

Although MyCPAN::Indexer is constantly trying to improve its ability to analyze distributions, DPAN specifically disables MyCPAN::Indexer's preferred method of running the build file and inspecting blib/ to see what showed up. dpan tries not to run any code, so it sometimes can't guess what would have shown up in blib/. It does its best though.

Aside from improvements in MyCPAN::Indexer's ability to deal with odd situations, dpan has another way to handle these problematic distributions. You can configure the extra_reports_dir directive so the indexer can use prepared reports in addition to the reports that it generates. These extra reports can be ones that you create by hand with information that you know about the module, or reports that you get from a more in-depth index.

Adding your local distributions

You could just dump the private distributions you want to add into the DPAN directory, but you can also copy them in from other directories:

        merge_dirs /repo/foo/bar /repo/baz/quux

This is quite handy when you are using CPAN::Mini, which tries to remove files it doesn't think belong in the repository. After you update your MiniCPAN, dpan can copy these additional modules into your DPAN repository.
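A minimal Perl sketch of the merge step: copy the archives from each merge directory into the CPAN author directory for your pause_id (authors/id/D/DP/DPAN for the default DPAN), leaving alone any archive that is already there. This is an illustration, not dpan's implementation; the merge_dists name and the list of archive extensions are assumptions.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Spec;

# Copy distribution archives from each merge directory into the
# CPAN-style author directory for $pause_id, e.g. authors/id/D/DP/DPAN.
# Returns the paths of the archives it copied.
sub merge_dists {
    my ( $dpan_dir, $pause_id, @merge_dirs ) = @_;

    my $author_dir = File::Spec->catdir(
        $dpan_dir, 'authors', 'id',
        substr( $pause_id, 0, 1 ), substr( $pause_id, 0, 2 ), $pause_id,
    );
    make_path($author_dir);

    my @copied;
    for my $dir (@merge_dirs) {
        opendir my $dh, $dir or die "Can't open $dir: $!\n";
        for my $file ( sort readdir $dh ) {
            next unless $file =~ /\.(?:tar\.gz|tgz|tar\.bz2|zip)\z/;
            my $target = File::Spec->catfile( $author_dir, $file );
            next if -e $target;    # already in the repository
            copy( File::Spec->catfile( $dir, $file ), $target )
                or die "Can't copy $file: $!\n";
            push @copied, $target;
        }
        closedir $dh;
    }
    return @copied;
}

# Usage: merge_dists( '/path/to/dpan', 'DPAN', '/repo/foo/bar', '/repo/baz/quux' );
```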

USE CASES

dpan is usually only part of the process to manage your DPAN. The particular process depends on your needs, and there are several ways that you could manage it.

A small private repository

If you prefer a very small repository that contains only the distributions that your application uses, you have a bit of work to do. It's on our to-do list to automatically list and download all of the distributions that a particular application uses, but we're not that far yet.

        ... Magic happens ...

Once the magic has gotten you all of the distributions that you need, put them in a single directory and run dpan.

Tracking a MiniCPAN

You can base your DPAN on a MiniCPAN. There are a few steps to go through, so you might create a shell script to handle this for you.

Update your MiniCPAN:

        % minicpan

The ~/.minicpanrc configuration file should use your DPAN directory as the value for local:

        local: /path/to/your/dpan
        remote: some CPAN mirror

minicpan is going to try to clean up your directory, so your local modules might disappear. That's not a problem. There is currently a bug in CPAN::Mini that also cleans up source control files and other files that you might want to keep. We're working on that too. Here's a patch to CPAN::Mini:

        http://github.com/briandfoy/cpan-mini/commit/6cc882cc09b2987ce0f3a4f8087ea751feaa88f1

You can filter distributions from your MiniCPAN with the hooks that CPAN::Mini provides. See its documentation for the details.

Once you update with minicpan, your repository is stale. That's okay. It's time to run dpan:

        % dpan -f dpan.conf

If you have merge_dirs configured, dpan will pull in those distributions and put them into your DPAN under the fake CPAN author that you specified in pause_id ("DPAN" by default). If you don't want to merge this way, you can copy the distributions into your MiniCPAN with rsync or something else. Keep the originals outside the MiniCPAN so minicpan doesn't delete them.

Once complete, start up minicpan_webserver from CPAN::Mini::Webserver:

        % minicpan_webserver

minicpan_webserver uses the local value from ~/.minicpanrc.
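The shell script suggested for tracking a MiniCPAN could just as well be a small Perl driver. This sketch runs the two update steps in order and stops at the first failure; the update_dpan name, its dry-run flag, and the dpan.conf path are assumptions to adjust for your setup. Start minicpan_webserver separately, since it keeps running.

```perl
use strict;
use warnings;

# The update steps, in order. Adjust the config file name for your setup.
my @steps = (
    ['minicpan'],                    # refresh the MiniCPAN (reads ~/.minicpanrc)
    [ 'dpan', '-f', 'dpan.conf' ],   # re-index, merging any merge_dirs
);

# Run each step, stopping at the first failure. With $dry_run true,
# only return the commands that would run.
sub update_dpan {
    my ($dry_run) = @_;
    my @ran;
    for my $step (@steps) {
        push @ran, join ' ', @$step;
        next if $dry_run;
        system(@$step) == 0 or die "'@$step' failed with status $?\n";
    }
    return @ran;
}

# Usage, for example from cron: update_dpan(0);
```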

Keeping DPAN in source control

DPAN might be most useful when you keep it in source control. At the end of an indexing run, dpan can commit the changes to source control. There are some adjustments that you need to make, however.

First, you have to ensure that minicpan won't remove your source control directories. There's a patch for that:

        http://github.com/briandfoy/cpan-mini/commit/6cc882cc09b2987ce0f3a4f8087ea751feaa88f1

Next, configure a postflight_class for dpan. Use MyCPAN::App::DPAN::SVNPostFlight as an example. At the end of processing, dpan calls the run method in postflight_class. In the example, MyCPAN::App::DPAN::SVNPostFlight figures out what to add to or remove from your Subversion repository and commits the result. The example explains this more fully and is intended as a starting point for your own process.
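If you keep your DPAN in git instead of Subversion, a postflight class might look like this Perl sketch. Everything here is hypothetical: the class name, the use of git, and especially the arguments to run, which this sketch takes to be the DPAN directory. Check MyCPAN::App::DPAN::SVNPostFlight for what dpan actually passes before adapting it.

```perl
package My::DPAN::GitPostFlight;    # hypothetical class name
use strict;
use warnings;

# Build a git command that operates on the repository in $dir.
sub git_command {
    my ( $class, $dir, @args ) = @_;
    return [ 'git', '-C', $dir, @args ];
}

# Assumption: dpan calls run() as a class method; here it receives the
# DPAN directory directly.
sub run {
    my ( $class, $dir ) = @_;

    # Stage additions, modifications, and deletions in one go
    my $add = $class->git_command( $dir, 'add', '--all' );
    system(@$add) == 0 or die "'@$add' failed: $?\n";

    # 'git diff --cached --quiet' exits 0 when nothing is staged
    my $diff = $class->git_command( $dir, 'diff', '--cached', '--quiet' );
    return 1 if system(@$diff) == 0;

    my $commit = $class->git_command( $dir, 'commit', '-m', 'Update DPAN indexes' );
    system(@$commit) == 0 or die "'@$commit' failed: $?\n";
    return 1;
}

1;
```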

LOGGING

dpan uses Log::Log4perl, which you can configure any way that you like. Each component has its own logging category:

        Coordinator
        Queue
        Dispatcher
        Worker
        Reporter
        Collator
        PostFlight

For more details on the components, see MyCPAN::Indexer::Tutorial. There are some example Log4perl configurations in the MyCPAN::Indexer and MyCPAN::App::DPAN distributions.
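As an example, this Perl sketch holds a Log4perl configuration that sends WARN and above from every component to the screen while also showing DEBUG messages from Worker and Reporter. Save the text in $conf to a file such as dpan.log4perl to use it with dpan -l; init_logging is a hypothetical helper that loads it directly. A category given a level but no appender of its own passes its messages up to the root logger's Screen appender.

```perl
use strict;
use warnings;

# Root logger: WARN and above to the screen. Worker and Reporter
# lower their own threshold to DEBUG and reuse the root's appender.
my $conf = <<'CONF';
log4perl.rootLogger                = WARN, Screen
log4perl.logger.Worker             = DEBUG
log4perl.logger.Reporter           = DEBUG
log4perl.appender.Screen           = Log::Log4perl::Appender::Screen
log4perl.appender.Screen.stderr    = 1
log4perl.appender.Screen.layout    = Log::Log4perl::Layout::PatternLayout
log4perl.appender.Screen.layout.ConversionPattern = %d %c %p: %m%n
CONF

# Load the configuration text directly instead of from a file
sub init_logging {
    require Log::Log4perl;
    Log::Log4perl->init( \$conf );
}
```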

GETTING MORE HELP

If you have any other questions, don't hesitate to ask. If you need help setting up a DPAN, we can also arrange for private help.

SEE ALSO

MyCPAN::Indexer::Tutorial, dpan

SOURCE AVAILABILITY

This code is on GitHub:

        git://github.com/briandfoy/mycpan-app-dpan.git

AUTHOR

brian d foy, <bdfoy@cpan.org>

COPYRIGHT AND LICENSE

Copyright © 2010-2018, brian d foy <bdfoy@cpan.org>. All rights reserved.

You may redistribute this under the terms of the Artistic License 2.0.