    use Log::Parallel::Writers;

    __PACKAGE__->register_writer();

    my $writer = get_writer('TSV',
        lazy_open_filename => $filename_or_program_to_open_when_there_is_output,
        fh                 => $filehandle,
        columns            => \@list_of_all_columns,
        sort_by            => \@list_of_columns_to_sort_by,
        host               => $hostname_where_fh_ends_up,
        filename           => $path_where_fh_ends_up,
        bucket             => $the_bucket_number_for_this_file,
        new_fields_cb      => \&code_to_handle_new_fields,
    );

    $writer->write($log);
    $writer->done();
    $writer->sort_arguments();
    $writer->post_sort_transform();
    $writer->metadata();    # return the metadata for the files written
    $writer->header();      # return the data format record

    $writer->host;          # accessor methods
    $writer->filename;
    $writer->fh;
    $writer->bucket;
A writer formats a $log record for output. Since the output from a single job may be streamed to multiple hosts and into multiple buckets, there may be multiple writer objects active at the same time.
The actual open is performed elsewhere because the output may be sent somewhere else before it ends up where the Writer thinks it is going. For example, it may need to be sorted.
This module, Log::Parallel::Writers, dispatches the call to get a writer to the named writer. The writer modules, such as Log::Parallel::Raw and Log::Parallel::TSV, must register themselves with Log::Parallel::Writers so that they can be found by name.
To create a writer, you need to subclass Log::Parallel::Writers::BaseClass and register yourself:
    our @ISA = qw(Log::Parallel::Writers::BaseClass);
    __PACKAGE__->register_writer();
Writers must override the write() method. They may also want to override other methods, such as header(), metadata(), sort_arguments(), and post_sort_transform().
With the exception of write(), all of these methods are defined in the base class and overriding them is optional.
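As a rough illustration of the subclassing pattern, here is a minimal sketch of a writer. So that the snippet runs on its own, it defines a tiny stand-in for Log::Parallel::Writers::BaseClass. That stand-in, the My::Writer::KeyValue package, the constructor, and the assumption that $log is a hash reference are all hypothetical and are not taken from the real module.

```perl
use strict;
use warnings;

# Stand-in for the real base class so this sketch runs without the module
# installed. The real Log::Parallel::Writers::BaseClass will differ.
package Log::Parallel::Writers::BaseClass {
    my %registry;    # hypothetical table of registered writer classes
    sub register_writer { my ($class) = @_; $registry{$class} = 1; return $class }
    sub is_registered   { my ($class) = @_; return exists $registry{$class} }
    sub new  { my ($class, %args) = @_; return bless {%args}, $class }
    sub fh   { $_[0]{fh} }
    sub done { return 1 }
}

# Hypothetical writer: one "key=value" pair per column, space separated.
package My::Writer::KeyValue {
    our @ISA = qw(Log::Parallel::Writers::BaseClass);
    __PACKAGE__->register_writer();

    # write() is the one method a writer has to provide itself.
    sub write {
        my ($self, $log) = @_;
        my $line = join(' ', map { "$_=" . ($log->{$_} // '') } @{ $self->{columns} });
        print { $self->fh } "$line\n";
    }
}

package main;

# Write into an in-memory filehandle to show the output format.
open(my $fh, '>', \my $buffer) or die "cannot open in-memory handle: $!";
my $writer = My::Writer::KeyValue->new(fh => $fh, columns => [qw(time url)]);
$writer->write({ time => 1200000000, url => '/index.html' });
$writer->done();
close($fh);
```

Only write() is defined in the subclass here; everything else is inherited, which mirrors the statement that overriding the other methods is optional.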
header() must return a header object for the log written. The header has all the information required to use the log file.
The header object is an anonymous hash. It must have the following keys:
The name must uniquely identify a particular format. For formats that don't have a predefined set of columns, the name should include an md5 of the column names.
The name of the parser as registered by a parser. See Log::Parallel::Parsers.
An ordered list of the column names.
An ordered list of column names by which the output file is sorted (if any).
sort_types is a hash that maps each sort_by column name to its unix sort(1) flags.
The hash may have additional keys, but it cannot contain anything that isn't uniquely determined by the name field: the header structures for two different headers with the same name field must be identical.
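To make the header requirements more concrete, the following sketch builds a header hash for a format whose columns are only known at run time. The key names name, columns, sort_by, and sort_types follow the descriptions above. The build_header() helper, the 'dynamic-tsv-' prefix, and the flag spelling 'n' are assumptions made for illustration. The key that holds the parser name is left out because its exact name is not given here.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: build a header hash from the column information.
sub build_header {
    my ($columns, $sort_by, $sort_types) = @_;
    return {
        # An md5 of the column names keeps the name unique per column set,
        # so two headers with the same name always describe the same layout.
        name       => 'dynamic-tsv-' . md5_hex(join("\t", @$columns)),
        columns    => [@$columns],
        sort_by    => [@$sort_by],
        sort_types => {%$sort_types},
        # The parser-name key is omitted: its key name is not known here.
    };
}

my $header = build_header(
    [qw(time host url)],
    [qw(time url)],
    { time => 'n', url => '' },    # assumed spelling: 'n' requests a numeric sort
);
print "$header->{name}\n";
```

Deriving the name from the column list is one way to satisfy the rule that headers sharing a name must be identical.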
The metadata is very simple: the full path to the file, the hostname, and a file header object (as returned by header()).
Return the filename (not including host) for the output file.
As per the
Which output bucket this file is in. Buckets are integers, starting from zero.
The number of entries in this file, usually the number of lines.
If the output needs to be sorted by the unix sort program, perhaps it needs to be in a temporary format so that sort can handle it.
If so, then the Writer should output the temporary format and post_sort_transform() should return a function that takes a line of input and provides one or more lines of output, transforming the Writer's output into the format it actually needs to be in.
The value returned by post_sort_transform() must be a string that is eval'ed to create the transformation function.
This is done in Log::Task::PostSort.
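A small sketch of the string-then-eval arrangement may help. It assumes a temporary format in which each line carries a sort key and a tab in front of the real record. That temporary format and the way the string is consumed below are assumptions for illustration, not the behaviour of Log::Task::PostSort.

```perl
use strict;
use warnings;

# Hypothetical post_sort_transform(): returns Perl source text, not a code ref.
sub post_sort_transform {
    return q{
        sub {
            my ($line) = @_;
            # Drop the leading sort key and its tab separator.
            $line =~ s/\A[^\t]*\t//;
            return ($line);
        }
    };
}

# What a consumer might do with the string: eval it to get a code reference.
my $transform = eval post_sort_transform();
die "transform did not compile: $@" unless ref($transform) eq 'CODE';

# Apply the transformation to one line of sorted temporary output.
my @out = $transform->("00042\thost1 GET /index.html\n");
print @out;
```

Returning source text rather than a code reference means the transformation can be handed to another process as plain data and compiled there.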
If the output needs to be sorted by the unix sort program, this method provides the arguments to unix sort so that it performs the correct sort.
Note that the merge-sort used to combine multiple buckets will do a numeric comparison before it does a string comparison, so the unix sort arguments should reflect a preference for numeric sorting.
This method returns a string. It does not include the filename.
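As an illustration, the sort arguments could be derived from the column list and the sort information along these lines. The build_sort_arguments() helper, the tab-separated layout, and the flag spelling 'n' are assumptions. Only the -t and -k options of unix sort are relied on.

```perl
use strict;
use warnings;

# Hypothetical helper: build unix sort arguments for tab-separated output.
# $columns is the full ordered column list; $sort_by and $sort_types
# describe the columns to sort on and their sort(1) flags.
sub build_sort_arguments {
    my ($columns, $sort_by, $sort_types) = @_;

    # Map each column name to its 1-based field position for sort's -k option.
    my %position;
    @position{@$columns} = (1 .. scalar @$columns);

    my @keys;
    for my $col (@$sort_by) {
        my $pos = $position{$col} or die "unknown sort column: $col";
        my $flags = $sort_types->{$col} // '';
        push @keys, "-k$pos,$pos$flags";
    }

    # A literal tab is the field separator. No filename is included.
    return join(' ', "-t '\t'", @keys);
}

my $args = build_sort_arguments(
    [qw(time host url)],
    [qw(time url)],
    { time => 'n' },    # numeric sort on time, plain string sort on url
);
print "$args\n";
```

Here the numeric column gets the n flag, in keeping with the note that the merge-sort compares numerically before it compares as strings.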
This package may be used and redistributed under the terms of either the Artistic 2.0 or LGPL 2.1 license.