#!perl

=head1 HOW TO MIRROR CPAN

As of this writing the single purpose for this distribution is to
provide a fast rsync client/server combination for the CPAN.

You find a list of potential rsync servers at
http://cpan.perl.org/SITES.html

Find current stats at http://mirror.cpan.org/

The first thing you should prepare is the CPAN tree itself on your
disk. The source where you take it from does not matter that much.
Take it from where you always took it.

The loop is something like this:

Short version:

    #!/bin/sh

    rrr-client --source rsync://cpan-rsync.perl.org::CPAN --target /home/rrrcpan/target --tmpdir /home/tmp &

Or if the short version is not sufficient for some reason, maybe you
want only the authors/ and modules/ tree:

    use File::Rsync::Mirror::Recent;
    my @rrr = map {
        File::Rsync::Mirror::Recent->new
                (
                 localroot                  => "/home/rrrcpan/target/$_", # your local path
                 remote                     => "cpan-rsync.perl.org::CPAN/$_/RECENT.recent", # your upstream
                 max_files_per_connection   => 863, # arbitrary
                 tempdir                    => "/home/ftp/tmp", # optional tempdir to hide temporary files
                 ttl                        => 10,
                 rsync_options              =>
                 {
                  compress          => 1,
                  links             => 1,
                  times             => 1,
                  checksum          => 0,
                  'omit-dir-times'  => 1, # not available before rsync 3.0.3
                  timeout           => 300, # do not allow rsync to hang forever
                 },
                 tmpdir                     => "/home/rrrcpan/target",
                 verbose                    => 1,
                 verboselog                 => "/var/log/rmirror-pause.log",
                )} "authors", "modules";
    die "directory $_ doesn't exist, giving up" for grep { ! -d $_->localroot } @rrr;
    while (){
        my $ttgo = time + 1200; # arbitrary
        for my $rrr (@rrr){
            $rrr->rmirror ( "skip-deletes" => 1 );
        }
        my $sleep = $ttgo - time;
        if ($sleep >= 1) {
            print STDERR "sleeping $sleep ... ";
            sleep $sleep;
        }
    }

=pod

Target directory is, of course, your choice, so replace
'/home/rrrcpan/target' with your favorite path.

The tmpdir is important and must reside on the same partition as the
target directory, so that rename(2) works.

You see the 'skip-deletes' and guess, this is mirroring without doing
deletes. You can do it with deletes or you can leave the deletion to
the other, the traditional rsync loop. Worry about that later.

You see the option 'max_files_per_connection' which I filled with 863
and you need to find your favorite number. The larger the number the
faster the whole download goes. As you already have a mirror, you
probably want to take 10 or 20 times times that much. CPAN's authors
directory is at the moment (2010-10) consisting of 180000 files. This
option chunks the downloads into piecemeal units and between these
units the process takes a peek at the most recent files to download
them with higher preference. That way we get the newest files
immediately in sync while we are mirroring the long tail of old files.

The localroot parameter contains your target directory. Because only
the authors/ and the modules/ directory are hosted by the PAUSE, you
need the loop over two directories. The other directories of CPAN are
currently not available for downloading with
File::Rsync::Mirror::Recent.

The timeslice in the while loop above needs to be large enough to let
the rsync server survive. If you choose a random rsync server and are
not an rsync server yourself please be modest and choose 1200. Choose
less if you're offering rsync yourself and have a fat pipe, and
especially if you know your upstream can manage it.

Set the key 'verbose' to 0 if you have no debugging demands. In this
case you want to omit the 'sleeping $sleep ...' noise further down the
script as well.

Set the key 'verboselog' to your favorite progress watcher file. This
option is underdeveloped and may later be replaced with something
better. Note that the program still sends error messages to STDERR.

You can leave everything else as it stands. Start experimenting and
let me know how it works out.

-- 
andreas koenig

=cut