NAME

AC::MrGamoo::FileList - get list of files

SYNOPSIS

    emacs /myperldir/Local/MrGamoo/FileList.pm
    copy. paste. edit.

    use lib '/myperldir';
    my $m = AC::MrGamoo::D->new(
        class_filelist    => 'Local::MrGamoo::FileList',
    );

IMPORTANT

You can fire up the system, get the servers talking to each other, and perform some limited tests without this file.

But you must provide this file in order to actually run map/reduce jobs.

DESCRIPTION

MrGamoo only runs map/reduce jobs. It is up to you to get the files onto the servers, keep track of where they are, and tell MrGamoo about them.

Some people keep the file meta-information in an SQL database. Some keep it in a yenta map. Some keep it in the filesystem.

When a new job starts, your get_file_list function will be called with the job config, and should return an arrayref of matching files along with meta-info.

Each element of the returned arrayref should be a hashref containing at least the following fields:

filename

the name of the file, relative to the basedir in your config file.

    filename    => 'www/2010/01/17/23/5943_prod_5x2N5qyerdeddsNi'

location

an arrayref of servers where this file is located. the locations should be the persistent-ids of the servers (see MySelf).

if the same file is replicated on multiple servers, mrgamoo can both intelligently determine which servers will process which files and recover from failures.

    location    => [ 'mrm@athena.example.com', 'mrm@zeus.example.com' ]

size

this should be the size of the file, in bytes. mrgamoo will consider the sizes of files in determining which servers will process which files.

    size        => 10843
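
Putting the three fields together, here is a minimal sketch of a filesystem-based file list. It assumes get_file_list is invoked as a plain function with the job config as its first argument, as described above. The package variables $BASEDIR and $SERVER_ID are hypothetical names invented for this example, not part of MrGamoo. Because it only walks the local disk, it would suit a single-server test setup at best; a real deployment would consult whatever catalog knows about every server.

```perl
package Local::MrGamoo::FileList;
use strict;
use warnings;
use File::Find ();
use File::Spec ();

# Hypothetical settings, invented for this sketch: adjust to your installation.
our $BASEDIR   = '/data/mrgamoo';            # assumed to match the basedir in your config file
our $SERVER_ID = 'mrm@athena.example.com';   # assumed persistent-id of this server

sub get_file_list {
    my $config = shift;    # the job config; not used by this sketch

    my @files;
    File::Find::find({
        no_chdir => 1,     # keep $File::Find::name usable as a full path
        wanted   => sub {
            return unless -f $File::Find::name;
            push @files, {
                # filename is relative to the basedir
                filename => File::Spec->abs2rel($File::Find::name, $BASEDIR),
                # only this server is known to hold the file
                location => [ $SERVER_ID ],
                # size in bytes; "_" reuses the stat from the -f test above
                size     => (-s _) || 0,
            };
        },
    }, $BASEDIR);

    # sort by filename so the result is stable between runs
    return [ sort { $a->{filename} cmp $b->{filename} } @files ];
}

1;
```

A real implementation would normally use the job config to select only the files the job asks for (by date range, for instance) and list every server holding a replica in location.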

BUGS

none. you write this yourself.

SEE ALSO

    AC::MrGamoo

AUTHOR

    You!