NAME

Text::xSV::Slurp - Convert xSV data to common data shapes.

VERSION

Version 0.22

SYNOPSIS

Text::xSV::Slurp converts xSV (typically CSV) data to nested data structures of various shapes. It allows both column and row filtering using user-defined functions.

This brief example creates an array of hashes from a file, where each array record corresponds to a line of the file, and each line is represented as a hash of header-to-value pairs.

    use Text::xSV::Slurp 'xsv_slurp';
    
    my $aoh = xsv_slurp( 'foo.csv' );
    
    ## if foo.csv contains:
    ##
    ##   uid,name
    ##   342,tim
    ##   939,danboo
    ##
    ## then $aoh contains:
    ##
    ##   [
    ##     { uid => '342', name => 'tim' },
    ##     { uid => '939', name => 'danboo' },
    ##   ]

FUNCTIONS

xsv_slurp()

xsv_slurp() converts xSV (typically CSV) data to nested data structures of various shapes. It allows both column and row filtering using user-defined functions.

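Option summary:

file       - name of a file to be opened and parsed
handle     - file handle to be read and parsed
string     - string containing the xSV data to be parsed
shape      - target data shape: aoa, aoh (default), hoa or hoh
col_grep   - user-defined callback that selects which columns are kept
row_grep   - user-defined callback that selects which rows are kept
key        - header name(s) used to index the hoh shape, e.g. 'h1,h2'
on_store   - hoh handler applied to every value assignment
on_collide - hoh handler applied only when a key path collides
text_csv   - HASH reference of options passed to the Text::CSV constructor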

The file, handle and string options are mutually exclusive. Only one source parameter may be passed in each call to xsv_slurp(); otherwise a fatal exception is raised.

The source can also be provided implicitly, without the associated key, and the source type will be guessed by examining the first item in the option list. If the item is a reference type, it is treated as a handle source. If the item contains a newline or carriage return, it is treated as a string source. If the item passes none of the prior tests, it is treated as a file source.

   ## implicit handle source
   my $aoa = xsv_slurp( \*STDIN, shape => 'aoa' );

   ## implicit string source
   my $aoh = xsv_slurp( "h1,h2\n" . "d1,d2\n" );

   ## implicit file source
   my $aoh = xsv_slurp( 'foo.csv' );
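
The guessing order described above can be sketched in plain Perl. `guess_source_type` is a hypothetical helper written only for illustration, not a function exported by Text::xSV::Slurp; it simply applies the three documented tests in order.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented guessing order:
# reference => handle, contains \n or \r => string, otherwise => file.
sub guess_source_type {
    my ($item) = @_;
    return 'handle' if ref $item;
    return 'string' if $item =~ /[\r\n]/;
    return 'file';
}

print guess_source_type( \*STDIN ), "\n";             # handle
print guess_source_type( "h1,h2\n" . "d1,d2\n" ), "\n";  # string
print guess_source_type( 'foo.csv' ), "\n";           # file
```

Note that a file name that happens to contain a newline would be misread as a string source under this rule; passing the file key explicitly avoids the guess entirely.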

The shape parameter supports values of aoa, aoh, hoa or hoh. The default shape is aoh. Each shape affects certain parameters differently (see below).

The text_csv option can be used to control Text::CSV/Text::CSV_XS parsing. The given HASH reference is passed to the Text::CSV constructor. If the text_csv option is undefined, the default Text::CSV constructor is called. For example, to change the separator to a colon, you could do the following:

   my $aoh = xsv_slurp( file     => 'foo.csv',
                        text_csv => { sep_char => ':' } );

aoa

example input:

   h1,h2,h3
   l,m,n
   p,q,r

example data structure:

   [
      [ qw/ h1 h2 h3 / ],
      [ qw/ l  m  n  / ],
      [ qw/ p  q  r  / ],
   ]

shape specifics:

full example:

   ## - convert xSV example to an array of arrays
   ## - include only rows containing values matching /[nr]/
   ## - include only the first and last columns 

   my $aoa = xsv_slurp( string   => $xsv_data,
                        shape    => 'aoa',
                        col_grep => sub { return @{ shift() }[0,-1] },
                        row_grep => sub { return grep /[nr]/, @{ $_[0] } },
                      );

   ## $aoa contains:
   ##
   ##   [
   ##      [ 'l',  'n' ],
   ##      [ 'p',  'r' ],
   ##   ]
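
A dependency-free sketch can make the aoa filtering concrete. This is not the module's code: `sketch_aoa` is hypothetical, lines are split on bare commas (no quoting, unlike Text::CSV), col_grep is assumed to receive an ARRAY reference of column indexes, and row_grep is assumed to receive the row after column filtering.

```perl
use strict;
use warnings;

# Plain-Perl sketch of the aoa shape with col_grep/row_grep filtering.
# Assumptions (not from the module source): bare comma splitting with no
# quoting, col_grep gets an ARRAY ref of column indexes and returns the
# indexes to keep, row_grep gets the column-filtered row as an ARRAY ref.
sub sketch_aoa {
    my ( $text, %opt ) = @_;
    my @rows = map { [ split /,/, $_, -1 ] } split /\r?\n/, $text;
    return [] unless @rows;

    my @idx = 0 .. $#{ $rows[0] };
    @idx = $opt{col_grep}->( [@idx] ) if $opt{col_grep};

    my @aoa;
    for my $row (@rows) {
        my $kept = [ @{$row}[@idx] ];    # keep only the selected columns
        next if $opt{row_grep} && !$opt{row_grep}->($kept);
        push @aoa, $kept;
    }
    return \@aoa;
}

my $xsv_data = "h1,h2,h3\nl,m,n\np,q,r\n";
my $aoa = sketch_aoa( $xsv_data,
                      col_grep => sub { return @{ shift() }[0,-1] },
                      row_grep => sub { return grep /[nr]/, @{ $_[0] } },
                    );
# $aoa is [ [ 'l', 'n' ], [ 'p', 'r' ] ]
```

In this shape the header line is just another row, so in the sketch it is also offered to row_grep; it is dropped here because neither h1 nor h3 matches /[nr]/.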

aoh

example input:

   h1,h2,h3
   l,m,n
   p,q,r

example data structure:

   [
      { h1 => 'l', h2 => 'm', h3 => 'n' },
      { h1 => 'p', h2 => 'q', h3 => 'r' },
   ]

shape specifics:

full example:

   ## - convert xSV example to an array of hashes
   ## - include only rows containing values matching /n/
   ## - include only the h3 column 

   my $aoh = xsv_slurp( string   => $xsv_data,
                        shape    => 'aoh',
                        col_grep => sub { return 'h3' },
                        row_grep => sub { return grep /n/, values %{ $_[0] } },
                      );

   ## $aoh contains:
   ##
   ##   [
   ##      { h3 => 'n' },
   ##   ]
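
The same filtering for the aoh shape can be sketched with core Perl only. `sketch_aoh` is hypothetical; it assumes bare comma splitting, a col_grep callback that receives an ARRAY reference of header names and returns the names to keep, and a row_grep callback that receives the column-filtered row as a HASH reference.

```perl
use strict;
use warnings;

# Plain-Perl sketch of the aoh shape (not the module's implementation).
# Assumptions: bare comma splitting with no quoting, col_grep receives an
# ARRAY ref of header names and returns the names to keep, and row_grep
# receives the column-filtered row as a HASH ref.
sub sketch_aoh {
    my ( $text, %opt ) = @_;
    my ( $header_line, @lines ) = split /\r?\n/, $text;
    my @headers = split /,/, $header_line, -1;
    my @keep = $opt{col_grep} ? $opt{col_grep}->( [@headers] ) : @headers;

    my @aoh;
    for my $line (@lines) {
        my ( %row, %kept );
        @row{@headers} = split /,/, $line, -1;   # header => value
        @kept{@keep}   = @row{@keep};            # drop unwanted columns
        next if $opt{row_grep} && !$opt{row_grep}->( \%kept );
        push @aoh, \%kept;
    }
    return \@aoh;
}

my $xsv_data = "h1,h2,h3\nl,m,n\np,q,r\n";
my $aoh = sketch_aoh( $xsv_data,
                      col_grep => sub { return 'h3' },
                      row_grep => sub { return grep /n/, values %{ $_[0] } },
                    );
# $aoh is [ { h3 => 'n' } ]
```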

hoa

example input:

   h1,h2,h3
   l,m,n
   p,q,r

example data structure:

   {
      h1 => [ qw/ l p / ],
      h2 => [ qw/ m q / ],
      h3 => [ qw/ n r / ],
   }

shape specifics:

full example:

   ## - convert xSV example to a hash of arrays
   ## - include only rows containing values matching /n/
   ## - include only the h3 column 

   my $hoa = xsv_slurp( string   => $xsv_data,
                        shape    => 'hoa',
                        col_grep => sub { return 'h3' },
                        row_grep => sub { return grep /n/, values %{ $_[0] } },
                      );

   ## $hoa contains:
   ##
   ##   {
   ##      h3 => [ qw/ n / ],
   ##   }
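
For the hoa shape the filtered values are collected per column rather than per row. The following core-Perl sketch is hypothetical (`sketch_hoa` is not part of the module) and assumes bare comma splitting, a col_grep callback receiving an ARRAY reference of header names, and a row_grep callback receiving the filtered row as a HASH reference.

```perl
use strict;
use warnings;

# Plain-Perl sketch of the hoa shape (not the module's implementation):
# each kept header maps to an ARRAY of that column's values, in row order.
# Assumptions: bare comma splitting with no quoting, col_grep receives an
# ARRAY ref of header names, row_grep receives the filtered row as a HASH ref.
sub sketch_hoa {
    my ( $text, %opt ) = @_;
    my ( $header_line, @lines ) = split /\r?\n/, $text;
    my @headers = split /,/, $header_line, -1;
    my @keep = $opt{col_grep} ? $opt{col_grep}->( [@headers] ) : @headers;

    my %hoa;
    for my $line (@lines) {
        my ( %row, %kept );
        @row{@headers} = split /,/, $line, -1;
        @kept{@keep}   = @row{@keep};
        next if $opt{row_grep} && !$opt{row_grep}->( \%kept );
        push @{ $hoa{$_} }, $kept{$_} for @keep;   # autovivifies each ARRAY
    }
    return \%hoa;
}

my $xsv_data = "h1,h2,h3\nl,m,n\np,q,r\n";
my $hoa = sketch_hoa( $xsv_data,
                      col_grep => sub { return 'h3' },
                      row_grep => sub { return grep /n/, values %{ $_[0] } },
                    );
# $hoa is { h3 => [ 'n' ] }; the p,q,r row has no value matching /n/
```

Called without any callbacks, the sketch returns { h1 => [ 'l', 'p' ], h2 => [ 'm', 'q' ], h3 => [ 'n', 'r' ] }, the same shape as the unfiltered example data structure.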

hoh

example input:

   h1,h2,h3
   l,m,n
   p,q,r

example data structure (assuming a key of 'h2,h3'):

   {
      m => { n => { h1 => 'l' } },
      q => { r => { h1 => 'p' } },
   }

shape specifics:

full example:

   ## - convert xSV example to a hash of hashes
   ## - index using h1 values
   ## - include only rows containing values matching /n/
   ## - include only the h3 column 

   my $hoh = xsv_slurp( string   => $xsv_data,
                        shape    => 'hoh',
                        key      => 'h1',
                        col_grep => sub { return 'h3' },
                        row_grep => sub { return grep /n/, values %{ $_[0] } },
                      );

   ## $hoh contains:
   ##
   ##   {
   ##      l => { h3 => 'n' },
   ##   }
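
The nesting performed by key can be sketched without any CPAN modules. `sketch_hoh` is hypothetical and handles only key (no col_grep or row_grep); it assumes naive comma splitting and that the non-key columns form the innermost HASH, as the 'h2,h3' example data structure suggests. A repeated key path simply overwrites the earlier values, which is the default behaviour described in the next section.

```perl
use strict;
use warnings;

# Plain-Perl sketch of the hoh shape. Assumptions: naive comma splitting,
# `key` is a comma-separated list of header names, each key column adds one
# level of nesting, the remaining columns form the leaf HASH, and a repeated
# key path simply overwrites the earlier values.
sub sketch_hoh {
    my ( $text, %opt ) = @_;
    my ( $header_line, @lines ) = split /\r?\n/, $text;
    my @headers   = split /,/, $header_line, -1;
    my @key_cols  = split /,/, $opt{key};
    my %is_key    = map { ( $_ => 1 ) } @key_cols;
    my @leaf_cols = grep { !$is_key{$_} } @headers;

    my %hoh;
    for my $line (@lines) {
        my %row;
        @row{@headers} = split /,/, $line, -1;

        # Walk (and autovivify) one level per key column.
        my $node = \%hoh;
        $node = $node->{ $row{$_} } ||= {} for @key_cols;

        $node->{$_} = $row{$_} for @leaf_cols;
    }
    return \%hoh;
}

my $hoh = sketch_hoh( "h1,h2,h3\nl,m,n\np,q,r\n", key => 'h2,h3' );
# $hoh is { m => { n => { h1 => 'l' } }, q => { r => { h1 => 'p' } } }
```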

HoH storage handlers

Using the hoh shape can result in non-unique key combinations. The default action is to simply assign the values to the given slot as they are encountered, resulting in any prior values being lost.

For example, using h1,h2 as the indexing key with the default collision handler:

   $xsv_data = <<EOXSV;
   h1,h2,h3
   1,2,3
   1,2,5
   EOXSV

   $hoh = xsv_slurp( string => $xsv_data,
                     shape  => 'hoh',
                     key    => 'h1,h2'
                   );

would result in the initial value in the h3 column being lost. The resulting data structure would record only the value 5:

   {
      1 => { 2 => { h3 => 5 } },  ## 3 sir!
   }

Typically this is not very useful. The user probably wanted to aggregate the values in some way. This is where the on_store and on_collide handlers come in, allowing the caller to specify how these assignments should be handled.

The on_store handler is called for each assignment action, while the on_collide handler is only called when an actual collision occurs (i.e., the nested value path for the current line is the same as a prior line).

If instead we wanted to push the values onto an array, we could use the built-in push handler for the on_store event as follows:

   $hoh = xsv_slurp( string   => $xsv_data,
                     shape    => 'hoh',
                     key      => 'h1,h2',
                     on_store => 'push',
                   );

The resulting HoH, using the same data as above, would instead look like this:

   {
      1 => { 2 => { h3 => [3,5] } },  ## 3 sir!
   }

Or, if we wanted to sum the values, we could use the sum handler for the on_collide event:

   $hoh = xsv_slurp( string     => $xsv_data,
                     shape      => 'hoh',
                     key        => 'h1,h2',
                     on_collide => 'sum',
                   );

resulting in the summation of the values:

   {
      1 => { 2 => { h3 => 8 } },
   }
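
The difference between the two events can be illustrated with a hypothetical store routine. The (existing value, new value) calling convention, the name `store_value`, and giving on_store precedence when both handlers are supplied are all assumptions made for this sketch, not Text::xSV::Slurp's documented interface.

```perl
use strict;
use warnings;

# Hypothetical store routine illustrating when each handler fires.
# Handlers take (existing value, new value) and return the value to store;
# this convention, and on_store taking precedence, are assumptions.
sub store_value {
    my ( $slot, $field, $value, %opt ) = @_;
    my $collides = exists $slot->{$field};

    if ( $opt{on_store} ) {                      # runs on every assignment
        $slot->{$field} = $opt{on_store}->( $slot->{$field}, $value );
    }
    elsif ( $collides && $opt{on_collide} ) {    # runs only on collisions
        $slot->{$field} = $opt{on_collide}->( $slot->{$field}, $value );
    }
    else {
        $slot->{$field} = $value;                # default: last value wins
    }
}

my $push = sub { my ( $old, $new ) = @_; return [ @{ $old || [] }, $new ] };
my $sum  = sub { my ( $old, $new ) = @_; return $old + $new };

my ( %pushed, %summed, %plain );
for my $v ( 3, 5 ) {                             # the two h3 values
    store_value( \%pushed, 'h3', $v, on_store   => $push );
    store_value( \%summed, 'h3', $v, on_collide => $sum );
    store_value( \%plain,  'h3', $v );
}
# %pushed is ( h3 => [ 3, 5 ] ), %summed is ( h3 => 8 ), %plain is ( h3 => 5 )
```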

builtin on_store handlers

A number of built-in on_store handlers are provided and can be specified by name.

The example data structures below use the following data.

   h1,h2,h3
   1,2,3
   1,2,5

count

Count the number of times each key path occurs.

   { 1 => { 2 => { h3 => 2 } } }

frequency

Create a frequency count of values.

   { 1 => { 2 => { h3 => { 3 => 1, 5 => 1 } } } }

push

Push values onto an array *always*.

   { 1 => { 2 => { h3 => [ 3, 5 ] } } }

unshift

Unshift values onto an array *always*.

   { 1 => { 2 => { h3 => [ 5, 3 ] } } }
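
The four behaviours above can be sketched as small functions that take the previously stored value (if any) and the new value, and return what should be stored. The dispatch table `%on_store` and this calling convention are hypothetical; the module's real handler interface is not described here.

```perl
use strict;
use warnings;

# Sketch of the four named on_store behaviours as (old, new) -> stored
# functions. The table name and calling convention are hypothetical.
my %on_store = (
    count     => sub { my ( $old, $new ) = @_; return ( $old || 0 ) + 1 },
    frequency => sub {
        my ( $old, $new ) = @_;
        my %freq = %{ $old || {} };    # copy any existing tally
        $freq{$new}++;
        return \%freq;
    },
    push    => sub { my ( $old, $new ) = @_; return [ @{ $old || [] }, $new ] },
    unshift => sub { my ( $old, $new ) = @_; return [ $new, @{ $old || [] } ] },
);

my %result;
for my $name ( sort keys %on_store ) {
    my $stored;
    $stored = $on_store{$name}->( $stored, $_ ) for 3, 5;   # the h3 values
    $result{$name} = $stored;
}
# count => 2, frequency => { 3 => 1, 5 => 1 },
# push => [ 3, 5 ], unshift => [ 5, 3 ]
```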

builtin on_collide handlers

A number of built-in on_collide handlers are provided and can be specified by name.

The example data structures below use the following data.

   h1,h2,h3
   1,2,3
   1,2,5

sum

Sum the values.

   { 1 => { 2 => { h3 => 8 } } }

average

Average the values.

   { 1 => { 2 => { h3 => 4 } } }

push

Push values onto an array *only on colliding*.

   { 1 => { 2 => { h3 => [ 3, 5 ] } } }

unshift

Unshift values onto an array *only on colliding*.

   { 1 => { 2 => { h3 => [ 5, 3 ] } } }

die

Call Carp::confess if a collision occurs.

   Error: key collision in HoH construction (key-value path was: { 'h1' => '1' }, { 'h2' => '2' })

warn

Call Carp::cluck if a collision occurs.

   Warning: key collision in HoH construction (key-value path was: { 'h1' => '1' }, { 'h2' => '2' })
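
A comparable sketch of the on_collide behaviours follows. Here a handler is called only when the slot already holds a value, so the first value is stored unchanged. The handler signature, the `make_average` helper (which keeps a running count in a closure), the choice to keep the new value after warning, and the shortened error text are illustrative assumptions rather than the module's implementation.

```perl
use strict;
use warnings;
use Carp qw( confess cluck );

# Sketch of the named on_collide behaviours as (old, new) -> stored
# functions. Signatures, make_average and messages are assumptions.
sub make_average {
    my $n = 1;    # number of values folded into the running average
    return sub {
        my ( $old, $new ) = @_;
        my $avg = ( $old * $n + $new ) / ( $n + 1 );
        $n++;
        return $avg;
    };
}

my %on_collide = (
    sum     => sub { $_[0] + $_[1] },
    average => make_average(),
    push    => sub { [ ref $_[0] ? @{ $_[0] } : $_[0], $_[1] ] },
    unshift => sub { [ $_[1], ref $_[0] ? @{ $_[0] } : $_[0] ] },
    die     => sub { confess 'key collision in HoH construction' },
    warn    => sub { cluck 'key collision in HoH construction'; $_[1] },
);

# Store each value in one slot, calling the handler only on collision.
sub fold_values {
    my ( $handler, @values ) = @_;
    my $slot;
    for my $v (@values) {
        $slot = defined $slot ? $handler->( $slot, $v ) : $v;
    }
    return $slot;
}

my $sum       = fold_values( $on_collide{sum},     3, 5 );   # 8
my $avg       = fold_values( $on_collide{average}, 3, 5 );   # 4
my $pushed    = fold_values( $on_collide{push},    3, 5 );   # [ 3, 5 ]
my $unshifted = fold_values( $on_collide{unshift}, 3, 5 );   # [ 5, 3 ]
my $died      = !eval { fold_values( $on_collide{die}, 3, 5 ); 1 };  # true
```

Keeping the count inside a closure is one simple way to average more than two colliding values correctly; averaging only the old and new values would weight later rows more heavily.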

AUTHOR

Dan Boorstein, <dan at boorstein.net>

TODO

BUGS

Please report any bugs or feature requests to bug-text-xsv-slurp at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-xSV-Slurp. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Text::xSV::Slurp

You can also look for information at:

ACKNOWLEDGEMENTS

COPYRIGHT & LICENSE

Copyright 2009 Dan Boorstein.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.
