AC::MrGamoo::FileList - get list of files
    emacs /myperldir/Local/MrGamoo/FileList.pm
    copy. paste. edit.

    use lib '/myperldir';
    my $m = AC::MrGamoo::D->new(
        class_filelist => 'Local::MrGamoo::FileList',
    );
You can fire up the system, and get the servers talking to each other, and perform some limited tests without this file.
But you must provide this file in order to actually run map/reduce jobs.
MrGamoo only runs map/reduce jobs. It is up to you to get the files onto the servers, to keep track of where they are, and to tell MrGamoo.
Some people keep the file meta-information in a sql database. Some people keep the file meta-information in a yenta map. Some people keep the file meta-information in the filesystem.
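As an illustration of the filesystem approach, here is a minimal sketch that walks a directory and builds one record per file, using the filename, location and size fields described under get_file_list. The function name scan_basedir and its two arguments are hypothetical and not part of MrGamoo. It assumes every file found lives on the single server whose persistent-id you pass in.

```perl
use strict;
use warnings;
use File::Find ();
use File::Spec ();

# Hypothetical helper, not part of AC::MrGamoo: walk $basedir and
# return an arrayref of { filename, location, size } records.
sub scan_basedir {
    my ($basedir, $server_id) = @_;

    my @files;
    File::Find::find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            return unless -f $path;     # skip directories and other non-files
            push @files, {
                # path relative to the base directory
                filename => File::Spec->abs2rel($path, $basedir),
                # assumption: the file exists only on this one server
                location => [ $server_id ],
                # size in bytes
                size     => -s $path,
            };
        },
    }, $basedir);

    return \@files;
}
```

A real installation would probably cache this result or merge the lists from several servers, rather than rescan the disk for every job.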
When a new job starts, your get_file_list function will be called with the job config, and should return an arrayref of matching files along with meta-info.
get_file_list
Each element of the returned arrayref should be a hashref containing at least the following fields:
filename
the name of the file, relative to the basedir in your config file.
filename => 'www/2010/01/17/23/5943_prod_5x2N5qyerdeddsNi'
location
an arrayref of servers where this file is located. the locations should be the persistent-ids of the servers (see MySelf).
if the same file is replicated on multiple servers, mrgamoo will be able to intelligently determine which servers will process which files, and to recover from failures.
location => [ 'mrm@athena.example.com', 'mrm@zeus.example.com' ]
size
the size of the file, in bytes. mrgamoo will consider file sizes when determining which servers will process which files.
size => 10843
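Putting the three fields together, a minimal sketch of a FileList module might look like the following. It assumes get_file_list is called as a plain function with the job config hashref as its first argument. The in-memory catalog, the second catalog entry, and the prefix config key are all hypothetical and exist only for illustration. A real module would look the files up in whatever store you use.

```perl
package Local::MrGamoo::FileList;
use strict;
use warnings;

# Hypothetical in-memory catalog. In practice this data would come
# from a database, a yenta map, or the filesystem.
my @CATALOG = (
    {
        filename => 'www/2010/01/17/23/5943_prod_5x2N5qyerdeddsNi',
        location => [ 'mrm@athena.example.com', 'mrm@zeus.example.com' ],
        size     => 10843,
    },
    {
        filename => 'www/2010/01/18/00/0001_prod_example',   # made-up entry
        location => [ 'mrm@athena.example.com' ],
        size     => 2048,
    },
);

sub get_file_list {
    my ($config) = @_;

    # 'prefix' is an assumed config key, used only in this sketch.
    my $prefix = defined $config->{prefix} ? $config->{prefix} : '';

    # Keep the entries whose filename starts with the requested prefix.
    my @match = grep { index($_->{filename}, $prefix) == 0 } @CATALOG;

    return \@match;
}

1;
```

With this catalog, calling get_file_list({ prefix => 'www/2010/01/17/' }) returns an arrayref holding only the first entry, and an empty config returns both.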
BUGS
none. you write this yourself.
SEE ALSO
AC::MrGamoo
AUTHOR
You!