The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Data::Tabular::Dumper - Seamlessly dump tabular data to XML, CSV and XLS.

SYNOPSIS

    use Data::Tabular::Dumper;

    $date=strftime('%Y%m%d', localtime);

    my $dumper = Data::Tabular::Dumper->open(
                            XML => [ "$date.xml", "data" ],
                            CSV => [ "$date.csv", {} ],
                            Excel => [ "$date.xls" ]
                        );

    # $data is a 2-d or 3-d data structure
    $data = {
        '0-monday' => { hits=>30, misses=>5, GPA=>0.42 },
        '1-tuesday' => { hits=>17, misses=>3, GPA=>0.17 },
    };

    $dumper->dump( $data );


    ## If you want more control :
    $dumper->page_start( "My Page" );

    # what each field is called
    $dumper->fields([qw(uri hits bytes)]);

    # now output the data
    foreach my $day (@$month) {
        $dumper->write($day);
    }

    $dumper->page_end( "My Page" );
    # sane shutdown
    $dumper->close();

This would produce the following XML :

    <?xml version="1.0" encoding="iso-8859-1"?>
    <access>
      <My_Page>
        <page>
           <uri>/index.html</uri>
           <hits>4000</hits>
           <bytes>5123412</bytes>
        </page>
        <page>
          <uri>/something/index.html</uri>
          <hits>400</hits>
          <bytes>51234</bytes>
        </page>
      </My_Page>
      <!-- more page tags here -->
    </access>

DESCRIPTION

Data::Tabular::Dumper aims to make it easy to turn tabular data into as many file formats as possible. This is useful when you need to provide data that folks will then process further. Because you don't really know what format they want to use, you can provide as many as possible, and let them choose which they want.

Tabular data means data that has 2 dimensions, like a list of lists, a hash of lists, a list of hashes or a hash of hashes.

You may also dump 3 dimentional data; in this case, each of the top-level elements are called pages and each sub-element is independent.

While it might seem desirable to give an example for each data type, this would be onerous to maintain. Please look at the tests to see what a given data object yields.

2 DIMENSIONAL DATA

List of lists

Simplest type of data; each of the sub-lists is output as-is. For XML, the lowest elements are number 0, 1, etc.

Hash of lists

Each of the sub-lists is output prefixed with the key name. For XML, the lowest elements are number 0, 1, etc, with 0 being the key.

List of hashes

The bottom hashes keyed records, column names are hash keys, column values are hash values. Obviously, the list of column names has to be the same for all records, so all the keys in all the hashes are used. If a given hash doesn't have a key, it will be blank in the output at that position.

    [   {   camera=>"EOS 2000", price=>12000.00 },
        {   camera=>"FinePix 1300", price=>150 },
    ]

This corresponds to the following table:

    camera       price
    EOS 2000     12000.00
    FinePix 1300   150.00

Note that keys are asciibetically sorted.

Hash of hashes

Similar to List of hashes, except the first column is the key in the top hash. For XML the key is used instead of record, unless you are using master_key (see open). Keys are asciibetically sorted.

Example :

    {   monday => { honk => 42, bonk=>17 },
        wednesday => { honk => 12, blurf=>36 }
    }

CSV and Excel would look like:

    ,blurf,bonk,honk
    monday,,17,42
    wednesday,36,12

The XML would look like:

    <DATA>
      <monday>
        <bonk>17</bonk>
        <honk>42</honk>
      </monday>
      <wednesday>
        <blurf>36</blurf>
        <honk>12</honk>
      </wednesday>
    </DATA>

3 DIMENSIONAL DATA

List of 2D data

Each element in the top list is a page. Pages are named Page 1, Page 2 and so on. Each 2D element is treated seperately as above.

Hash of lists of lists

Hash of lists of hashes

Hash of hashes of hashes

Each value in the top hash is a page. Pages are named by their keys. Each 2D element is treated seperately as above, as if you were doing:

    foreach my $key ( sort keys %$HoX ) {
        $dumper->page_start( $key );
        $dumper->dump( $HoX->{$key} );
        $dumper->page_send( $key );
    }

Hash of hashes of lists

NOT SUPPORTED

FUNCTIONS

Dump( $data )

Calls dump as a package method. In other words, it does the following:

    Data::Tabular::Dumper->dump( $data );

Data::Tabular::Dumper METHODS

open(%writers)

Creates the Data::Tabular::Dumper object. %writers is a hash that contains the the package of the object (as keys) and the parameters for it's open() function (as values). As a convienience, the Data::Tabular::Dumper::* modules can be specified as XML, Excel or CSV. The example in the SYNOPSIS would create 3 objects, via the following calls :

    $obj0 = Data::Tabular::Dumper::XML->open( ["$date.xml","users", "user"] );
    $obj1 = Data::Tabular::Dumper::Excel->open( ["$date.xls"] );
    $obj2 = Data::Tabular::Dumper::CSV->open( ["$date.xls", {}] );

Note that you must load a given package first. Data::Tabular::Dumper-open> will not do so for you.

You may also create your own packages. See WRITER OBJECTS below.

There is one special key in %writers :

master_key

Sets the column name for the first column when dumping hash of lists, hash of hashes or the equivalent 3-D structures. The first column corresponds to the key names of the top hash.

close()

Does an orderly close of all the writers. Some of the writers need this to clean up data and write file footers properly. Note that DESTROY also calls close.

master_key( [$key] )

Sets the master_key, returning old value. If called without a parameter, returns current master_key.

dump( $data )

Analyses $data, then dumps each of it's component objects to the configured files.

Dump is not efficient. It must walk over the data 2 and sometimes 3 times. It may also modify your data, so watch out.

May also be called as a package method, in which case it returns a CSV representation of the data.

    print $fh Data::Tabular::Dumper->dump( $data );

page_start( $name )

Opens a new page in each file named $name. You must call fields() if you want it to have a header.

For XML, a page is an XML element that wraps all furthur data. The element's name is $name with all non-word characters converted to an underscore ($name =~ s/\W/_/g.)

page_end( $name )

Closes the current page. Please make sure $name is identical to what was passed to page_start.

fields($fieldref)

Sets the column headers to the values in the arrayref $fieldref. Calling this "fields" might be misdenomer. Field headers are often concidered a "special" row of data.

write($dataref)

Writes a row of data from the arrayref $dataref.

WRITER OBJECTS

An object must implement 4 methods for it to be useable by Data::Tabular::Dumper.

open($package, $p)

Create the object, opening any necessary files. $p is the data handed to Data::Tabular::Dumper->open.

close()

Do any necesssary cleaning up, like outputing a footer, closing files, etc.

fields($fieldref)

Define the names of the fields. $fieldref is an arrayref containing all the field headings.

write($dataref)

Write a row of data to the output. $dataref is an arrayref containing a row of data to be output.

page_start($name) =head2 page_end($name)

Start and end a new page in the output. If it is called from dump, all pages are started and ended with the same $name. If called from user code, all bets are off.

PREDEFINED WRITERS

Data::Tabular::Dumper::XML

Produces an XML file of the tabular data.

open($package, [$file_or_fh, $top, $record])

Opens the file $file_or_fh for writing if it is a scalar. Otherwise $file_or_fh is considered a filehandle. The top element is $top and defaults to DATA. Each record is a $record element and defaults to RECORD.

fields($fieldref)

Define the tag for each data value.

write($dataref)

Output a record. Each item in the arrayref $dataref becomes an element named by the corresponding name set in fields(). If there are more items in $dataref then fields, the last field name is duplicated. If there are no fields defined, elementes are named 0, 1, etc.

Example :

    $xml=Data::Tabular::Dumper::XML->open(['something.xml']);
    $xml->fields([qw(foo bar)]);
    $xml->write([0..5]);

Would produce the following XML :

    <?xml version="1.0" encoding="iso-8859-1"?>
    <DATA>
      <RECORD>
        <foo>0</foo>
        <bar>1</bar>
        <bar>2</bar>
        <bar>3</bar>
        <bar>4</bar>
        <bar>5</bar>
      </RECORD>
    </DATA>

Likewise,

    $xml=Data::Tabular::Dumper::XML->open(['something.xml']);
    $xml->dump( [ [ { up=>1, down=>-1, left=>0.5, right=>-0.5 } ] ] );
    $xml->close

Would produce the following XML :

    <?xml version="1.0" encoding="iso-8859-1"?>
    <DATA>
      <Page_1>
        <down>-1</down>
        <left>0.5</left>
        <right>-0.5</right>
        <up>1</up>
      </Page_1>
    </DATA>

Data::Tabular::Dumper::CSV

Produces an CSV file of the tabular data. Each new page is started a row with the page name on it and ending with a blank line.

open($package, [$file_or_fh, $CSVattribs])

Opens the file $file_or_fh for writing if it is a scalar. Otherwise $file_or_fh is considered a filehandle. Creates a Text::CSV_XS object using the attributes in the hashref $CSVattribs.

It should be noted that you probably want to set eol to \n, otherwise all the output will be on one line. See Text::CSV_XS for details.

Example :

    $xml=Data::Tabular::Dumper::CSV->open(['something.xml', 
                                          {eol=>"\n", binary=>1}]);
    $xml->fields([qw(foo bar)]);
    $xml->write("me,you", "other");

Would produce the following CSV :

    foo,bar
    "me,you",other

fields( $fieldref )

Outputs a row that contains the names of the fields. Basically, it's the same as write.

Data::Tabular::Dumper::Excel

Produces an Excel workbook of the tabular data. Each page is a new worksheet.

If you want a header on each worksheet, you must call fields() after each page is started. If you do not call page_start(), a default empty worksheet is used. Note that dump() handles all this for you.

open($package, [$file])

Creates the workbook $file.

fields($fieldref)

Creates a row in bold from the elements in the arrayref $fieldref.

BUGS

There are no test cases for all dump's edge cases, such as non-heterogeous lower data elements.

There is no verification of the Excel workbooks produced.

No support for RDBMSes. I'm not fully sure how this would work... each page would be a table? What about lists as the lowest data structure? We'd need a way to match data columns to table columns.

close should call page_end if there is one pending.

AUTHOR

Philip Gwyn <gwyn-at-cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2005-2009 by Philip Gwyn

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

Text::CSV_XS, Spreadsheet::WriteExcel, http://www.xml.org, perl.