NAME

List::RewriteElements - Create a new list by rewriting elements of a first list

SYNOPSIS

   use List::RewriteElements;

Constructor

Simplest case: Input from array, output to STDOUT.

    $lre = List::RewriteElements->new( {
        list        => \@source,
        body_rule   => sub {
                            my $record = shift;
                            $record .= q{additional field};
                       },
    } );

Input from file, output to STDOUT:

    $lre = List::RewriteElements->new( {
        file        => "/path/to/source/file",
        body_rule   => sub {
                            my $record = shift;
                            $record .= q{,additional field};
                       },
    } );

Provide a different rule for the first element in the list:

    $lre = List::RewriteElements->new( {
        file        => "/path/to/source/file",
        header_rule => sub {
                            my $record = shift;
                            $record .= q{,ADDITIONAL HEADER};
                       },
        body_rule   => sub {
                            my $record = shift;
                            $record .= q{,additional field};
                       },
    } );

Input from file, output to file:

    $lre = List::RewriteElements->new( {
        file        => "/path/to/source/file",
        body_rule   => sub {
                            my $record = shift;
                            $record .= q{additional field};
                       },
        output_file => "/path/to/output/file",
    } );

To name the output file, just provide a suffix to the filename:

    $lre = List::RewriteElements->new( {
        file            => "/path/to/source/file",
        body_rule       => sub {
                            my $record = shift;
                            $record .= q{additional field};
                           },
        output_suffix   => '.out',
    } );

Provide criteria to suppress output of the header or of individual records:

    $lre = List::RewriteElements->new( {
        file            => "/path/to/source/file",
        header_suppress => sub {
                            my $record = shift;
                            return if $record =~ /$somepattern/;
                        },
        body_suppress   => sub {
                            my $record = shift;
                            return if $record ne 'somestring';
                        },
        body_rule       => sub {
                            my $record = shift;
                            $record .= q{additional field};
                        },
    } );

Generate Output

    $lre->generate_output();

Report Output Information

    $path_to_output_file    = $lre->get_output_path();

    $output_file_basename   = $lre->get_output_basename();

    $output_row_count       = $lre->get_total_rows();

    $output_record_count    = $lre->get_total_records();

    $records_changed        = $lre->get_records_changed();

    $records_unchanged      = $lre->get_records_unchanged();

    $records_deleted        = $lre->get_records_deleted();

    $header_status          = $lre->get_header_status();

DESCRIPTION

It is common to receive a flat data file from someone else and to have to generate a new file in which each row or record in the incoming file must either (a) be transformed according to some rule before being printed to the new file; or (b) if it meets certain criteria, not be output to the new file at all.

List::RewriteElements enables you to write such rules and criteria, generate the file of transformed data records, and get back some basic statistics about the transformation.

List::RewriteElements is useful when the number of records in the incoming file may be large and you do not want to hold the entire list in memory. Similarly, the newly generated records are not held in memory but are immediately printed to STDOUT or to file.

On the other hand, if for some reason you already have an array of records in memory, you can use List::RewriteElements to apply rules and criteria to each element of the array and then print the transformed records (again, without holding the output in memory).
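
The record-at-a-time approach described above can be sketched in plain Perl. This is only an illustration of the general pattern, not the module's actual implementation: rewrite_stream(), its argument order, and its convention that the $keep callback returns true to retain a record are all assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of List::RewriteElements: read one record at
# a time, optionally skip it, transform it, and print it immediately.
sub rewrite_stream {
    my ($in, $out, $rule, $keep) = @_;
    my %stats = ( changed => 0, unchanged => 0, deleted => 0 );
    while ( my $record = <$in> ) {
        chomp $record;
        if ( $keep && !$keep->($record) ) {
            $stats{deleted}++;          # record omitted from the output
            next;
        }
        my $new = $rule->($record);
        $stats{ $new eq $record ? 'unchanged' : 'changed' }++;
        print {$out} "$new\n";          # nothing is accumulated in memory
    }
    return \%stats;
}

# In-memory filehandles stand in for real source and output files.
my $source = "1,alpha\n2,beta\n3,gamma\n";
my $output = '';
open my $in,  '<', \$source or die "Cannot read source: $!";
open my $out, '>', \$output or die "Cannot write output: $!";

my $stats = rewrite_stream(
    $in, $out,
    sub { my $r = shift; return $r =~ /^2,/ ? $r : "$r,extra" },  # rule
    sub { my $r = shift; return $r !~ /^3,/ },                    # keep?
);
close $in;
close $out;
print $output;
```

Only the three counters persist across iterations, so memory use does not grow with the number of records.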

SUBROUTINES

new()

Purpose: List::RewriteElements constructor.

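Arguments: Reference to a hash. The keys shown in the SYNOPSIS are list (reference to an array of source records) or file (path to the source file); body_rule and header_rule (references to subroutines that rewrite each record or the first element, respectively); body_suppress and header_suppress (references to subroutines supplying criteria for omitting a record or the header); and output_file or output_suffix (to send output to a file rather than STDOUT).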

Return Value: List::RewriteElements object.

generate_output()

Purpose: Generates the output specified by arguments to new(), i.e., creates an output file or prints to STDOUT with records transformed as per those arguments.

Arguments: None.

Return Value: Returns a true value upon success. On failure it croaks with an error message.

get_output_path()

Purpose: Get the full path to the newly created output file.

Arguments: None.

Return Value: String holding path to newly created output file.

Comment: Since use of the output_suffix attribute means that the full path to the output file will not be known until generate_output() has been called, get_output_path() will only give a meaningful result once generate_output() has been called. Otherwise, it will default to an empty string.

get_output_basename()

Purpose: Get only the basename of the newly created output file.

Arguments: None.

Return Value: String holding basename of newly created output file.

Comment: Since use of the output_suffix attribute means that the full path to the output file will not be known until generate_output() has been called, get_output_basename() will only give a meaningful result once generate_output() has been called. Otherwise, it will default to an empty string.

get_total_rows()

Purpose: Get the total number of rows in the newly created output file. This will include any header row.

Arguments: None.

Return Value: Nonnegative integer.

get_total_records()

Purpose: Get the total number of data records in the newly created output file. If a header row is present in that file, get_total_records() will return a value 1 less than that returned by get_total_rows().

Arguments: None.

Return Value: Nonnegative integer.

get_records_changed()

Purpose: Get the number of data records in the newly created output file that are altered versions of records in the incoming file. This value does not include changes in the header row.

Arguments: None.

Return Value: Nonnegative integer.

get_records_unchanged()

Purpose: Get the number of data records in the newly created output file that are unaltered versions of records in the incoming file. This value does not take the header row into account.

Arguments: None.

Return Value: Nonnegative integer.

get_records_deleted()

Purpose: Get the number of data records in the original source (file or list) that were omitted from the newly created output file due to application of a body_suppress criterion. This value does not include any suppression of a header row following application of a header_suppress criterion.

Arguments: None.

Return Value: Nonnegative integer.

get_header_status()

Purpose: Indicate whether any header row in the original source (file or list)

Arguments: None.

Return Value: Numerical flag: 1, 0, -1 or undef as described above.

FAQ

Can I simultaneously rewrite records and interact with the external environment?

Yes. If a header_rule, body_rule, header_suppress or body_suppress either (a) needs additional information from the external environment above and beyond that contained in the individual data record or (b) needs to cause a change in the external environment, you can write a closure and call that closure inside the rule.

Example:

    my @greeks = qw( alpha beta gamma );
    
    my $get_a_greek = sub {
        return (shift @greeks);
    };

    my $lre  = List::RewriteElements->new ( {
        list        => [ map {"$_\n"} (1..5) ],
        body_rule   => sub {
            my $record = shift;
            my $rv;
            chomp $record;
            if ($record eq '4') {
                $rv = $get_a_greek->();
            } else {
                $rv = (10 * $record);
            }
            return $rv;
        },
        body_suppress   => sub {
            my $record = shift;
            chomp $record;
            return if $record eq '5';
        },
    } );

    $lre->generate_output();

This will produce:

    10
    20
    30
    alpha
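
For case (b), where a rule keeps state outside the record, a closure factory is one option. In the sketch below, make_numbering_rule() is a hypothetical helper (not part of List::RewriteElements), and the returned code ref is called directly rather than through the module.

```perl
use strict;
use warnings;

# Hypothetical factory: returns a rule that closes over a private counter.
sub make_numbering_rule {
    my $seq = shift;                    # first sequence number to hand out
    return sub {
        my $record = shift;
        chomp $record;
        # Each call changes state that lives outside the record itself.
        return sprintf '%03d,%s', $seq++, $record;
    };
}

my $rule = make_numbering_rule(1);

# Called directly here for illustration; the same code ref could be
# supplied as the value of body_rule.
my @rewritten = map { $rule->($_) } ( "alpha\n", "beta\n", "gamma\n" );
print "$_\n" for @rewritten;
```

Because the counter lives in the closure, two rules built by separate calls to the factory number their records independently.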

Can I use List::RewriteElements with fixed-width data?

Yes. Suppose that you have this fixed-width data (adapted from Dave Cross' Data Munging with Perl):

    my @dataset = (
        q{00374Bloggs & Co       19991105100103+00015000},
        q{00375Smith Brothers    19991106001234-00004999},
        q{00376Camel Inc         19991107289736+00002999},
        q{00377Generic Code      19991108056789-00003999},
    );

Suppose further that you need to update certain records and that %revisions holds the data for updating:

    my %revisions = (
        376 => [ 'Camel Inc', 20061107, 388293, '+', 4999 ],
        377 => [ 'Generic Code', 20061108, 99821, '-',  6999 ],
    );

Write a body_rule subroutine which uses unpack, pack and sprintf as needed to update the records.

    my $lre  = List::RewriteElements->new ( {
        list        => \@dataset,
        body_rule   => sub {
            my $record = shift;
            my $template = 'A5A18A8A6AA8';
            my @rec  = unpack($template, $record);
            $rec[0] =~ s/^0+//;
            my ($acctno, %values, $result);
            $acctno = $rec[0];
            $values{$acctno} = [ @rec[1..$#rec] ];
            if ($revisions{$acctno}) {
                $values{$acctno} = $revisions{$acctno};
            }
            $result = sprintf  "%05d%-18s%8d%06d%1s%08d",
                ($acctno, @{$values{$acctno}});
            return $result;
        },
    } );
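
To see what this rule does to a single record, the same unpack and sprintf logic can be run on its own. The sketch below is a condensed variant for illustration: rewrite_fixed() is a hypothetical name, only one revision is included, and the sub is called directly instead of being passed to the constructor.

```perl
use strict;
use warnings;

my %revisions = (
    376 => [ 'Camel Inc', 20061107, 388293, '+', 4999 ],
);

# Condensed form of the body_rule logic, written as a named sub so it can
# be called without the module.
sub rewrite_fixed {
    my $record = shift;
    # Field widths: 5, 18, 8, 6, 1, 8. 'A' strips trailing spaces.
    my @rec = unpack 'A5A18A8A6AA8', $record;
    ( my $acctno = shift @rec ) =~ s/^0+//;     # drop leading zeros
    my $fields = $revisions{$acctno} || \@rec;  # prefer revised values
    return sprintf '%05d%-18s%8d%06d%1s%08d', $acctno, @{$fields};
}

my $unchanged = q{00374Bloggs & Co       19991105100103+00015000};
my $revised   = q{00376Camel Inc         19991107289736+00002999};
print rewrite_fixed($_), "\n" for $unchanged, $revised;
```

The first record has no entry in %revisions, so it is repacked into the same 46-character layout it arrived in. The second keeps its account number and name but takes its date, reference, sign and amount from %revisions.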

How does this differ from Tie::File?

Mark Jason Dominus' Tie::File module is one of my Fave 5 CPAN modules. It's excellent for modifying a file in place. But I frequently have to leave the source file unmodified and create a new file, which implies, at the very least, opening, printing to, and closing filehandles in addition to using Tie::File. List::RewriteElements hides all that. It also provides the statistical report methods.

Couldn't I do this with map and grep?

Quite possibly. But if your rules and criteria were complicated or long, the content of the map and grep {} blocks would be hard to read. You also wouldn't get the statistical report methods.
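
For comparison, here is roughly what the closure example from the first FAQ entry looks like with grep and map alone. It is a sketch, and it builds the whole result in @output (a name chosen for this example) before printing anything.

```perl
use strict;
use warnings;

my @greeks = qw( alpha beta gamma );

my @output =
    map  { $_ eq '4' ? shift(@greeks) : 10 * $_ }   # rewrite each record
    grep { $_ ne '5' }                              # drop suppressed records
    ( 1 .. 5 );

print "$_\n" for @output;
```

This stays readable only because the rule and the criterion are one-liners, and any counts of changed or deleted records would have to be tracked by hand.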

How Does It Work?

Why do you care? Why do you want to look inside the black box? If you really want to know, read the source!

PREREQUISITES

List::RewriteElements relies only on modules distributed with the Perl core as of 5.8.0. IO::Capture::Stdout is required for the test suite, but a copy is included in the distribution under the t/ directory.

BUGS

None known at this time. File bug reports at http://rt.cpan.org.

HISTORY

0.09 Mon Jan 22 22:35:56 EST 2007 - Update version number and release date only. Purpose: generate new round of tests by cpan testers, in the hope that it eliminates a FAIL report on v0.08 where failure was due solely to error on tester's box.

0.08 Mon Jan 1 08:54:01 EST 2007 - xdg to the rescue! Applied and extended patches supplied by David Golden for Win32. In constructor, value of $/ is supplied to the recsep option.

0.07 Sun Dec 31 11:13:04 EST 2006 - Switched to using File::Spec::catfile() to generate one path (rather than Cwd::realpath()). This was done in an attempt to respond to corion's FAIL reports (but I don't have a good Windows box, so I can't be certain of the results).

0.06 Sat Dec 16 11:31:38 EST 2006 - Created t/07_fixed_width.t and t/testlib/fixed.t to illustrate use of List::RewriteElements with fixed-width data.

0.05 Thu Dec 14 07:42:24 EST 2006 - Correction of POD formatting errors only; no change in functionality. CPAN upload.

0.04 Wed Dec 13 23:04:33 EST 2006 - More tests; fine-tuning of code and documentation. First CPAN upload.

0.03 Tue Dec 12 22:13:00 EST 2006 - Implementation of statistical methods; more tests.

0.02 Mon Dec 11 19:38:26 EST 2006 - Added tests to demonstrate use of closures to supply additional information to elements such as body_rule.

0.01 Sat Dec 9 22:29:51 2006 - original version; created by ExtUtils::ModuleMaker 0.47

ACKNOWLEDGEMENTS

Thanks to David Landgren for raising the question of use of List-RewriteElements with fixed-width data.

I then adapted an example from Dave Cross' Data Munging with Perl, Chapter 7.1, "Fixed-width Data," to provide a test demonstrating processing of fixed-width data.

AUTHOR

James E Keenan. CPAN ID: JKEENAN. jkeenan@cpan.org. http://search.cpan.org/~jkeenan/ or http://thenceforward.net/perl/modules/List-RewriteElements.

COPYRIGHT

Copyright 2006 James E Keenan (USA).

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

SEE ALSO

David Cross, Data Munging with Perl (Manning, 2001).
