NAME

Tie::CSV_File - ties a csv-file to an array of arrays

SYNOPSIS

  use Tie::CSV_File;

  tie my @data, 'Tie::CSV_File', 'xyz.dat';
  print "Data in 3rd line, 5th column: ", $data[2][4];
  untie @data;
  
  # or to read a tab-, whitespace- or (semi-)colon-separated file
  tie my @data, 'Tie::CSV_File', 'xyz.dat', TAB_SEPARATED;
  # or  use instead COLON_SEPARATED, SEMICOLON_SEPARATED, PIPE_SEPARATED,
  #         or even WHITESPACE_SEPARATED
  
  # or to read a user-defined format
  tie my @data, 'Tie::CSV_File', 'xyz.dat', sep_char     => '|',
                                            sep_re       => qr/\s*\|\s*/,
                                            quote_char   => undef,
                                            eol          => undef, # default
                                            escape_char  => undef,
                                            always_quote => 0;  # default
                                            
  $data[1][3] = 4;
  $data[-1][-1] = "last column in last line";
  
  $data[0] = [qw/Name Address Country Phone/];
  push @data, ["Gates", "Redmond",  "Washington", "0800-EVIL"];
  push @data, ["Linus", "Helsinki", "Finland",    "0800-LINUX"];

  my @headings = @{ shift @data };     # also removes the first line
  my @last_row = @{ pop   @data };     # also removes the last line

  @data = ( [1..3], [4..6], [7..9] );
  # With default parameters,
  # the following csv file is created:
  # 1,2,3
  # 4,5,6
  # 7,8,9
  

DESCRIPTION

Tie::CSV_File represents a regular csv file as a Perl array of arrays. The first dimension of the array represents the line number in the original file, the second dimension represents the column number. Both indices start at 0. You can also use the usual array idioms, e.g. $data[-1][-1] stands for the last field in the last line, and @{$data[1]} stands for the columns of the second line.

An empty field has the value '', while a nonexistent field has the value undef. E.g. for the file

  "first field",,
  "last field"
  
  "the above line is empty"
  

we can say

  $data[0][0] eq "first field"
  $data[0][1] eq ""
  !defined $data[0][2] 
  
  $data[1][0] eq "last field"
  
  @{$data[2]}  # is an empty list ()
  !defined $data[2][0]

  $data[3][0] eq "the above line is empty"

  !defined $data[$x][$y] # for every $x > 3, $y any 

Similarly, every row from 0 .. $#data exists (even if some of them have never been set explicitly). The same principle applies to the columns: every column between the first and the last defined one exists for each row. So, for this module, the defined function and the exists operator are equivalent.

Note that it is also possible to change the data.

  $data[0][0]   = "first line, first column";
  $data[3][7]   = "anywhere in the world";
  $data[-1][-1] = "last line, last column";
  
  $data[0] = ["Last name", "First name", "Address"];
  push @data, ["Schleicher", "Janek", "Germany"];
  my @header   = @{ shift @data };
  my @last_row = @{ pop   @data };

You can also assign the content of another whole array to the csv-tied array. The content of the other array is copied and overwrites the previous content. It's perhaps the easiest way to create a csv file :-)
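As a minimal sketch of such a whole-array assignment, the following writes three rows to a scratch file and reads the file back with plain Perl. The scratch file from File::Temp and the variable names are assumptions made for this illustration; only tie, list assignment and untie are taken from this documentation. According to the SYNOPSIS, the default options should produce the lines 1,2,3 / 4,5,6 / 7,8,9.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::CSV_File;

# Hypothetical scratch file, used only for this illustration.
my ($fh, $filename) = tempfile(SUFFIX => '.csv', UNLINK => 1);
close $fh;

my @source = ( [1 .. 3], [4 .. 6], [7 .. 9] );

tie my @data, 'Tie::CSV_File', $filename;
@data = @source;    # copies the rows; any previous content is overwritten
untie @data;

# Read the file back with plain Perl to inspect what was written.
open my $in, '<', $filename or die "Cannot open $filename: $!";
my $written = do { local $/; <$in> };
close $in;
print $written;
```

Note the parentheses around the rows: the tied array takes a list of array references, one per line of the file.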

Please note that deleting an array element has a slightly different meaning from the normal behaviour. Deleting an element sets the element to empty ("" or []), not to undef.

  delete $data[5];    # similar to $data[5] = [];
  delete $data[5][5]; # similar to $data[5][5] = "";

In fact, a file has no undefined values. A cell of the CSV file can only be empty (""). An undefined value signals that the line or the column doesn't exist. In particular, the lines ,,, and "","","","" are the same for Tie::CSV_File, and the second version could be changed without a warning to the first one (and vice versa if the always_quote option is set) when you write to the tied array.

Only a small part of the whole file is in memory, so this module also works for large files. Please see the Tie::File module for details, as I use it to read the lines of the file.

But it won't work with large fields, as all fields of one line are parsed, even if you only want to get one field.

CSV options for tying

Similar to Text::CSV_XS, you can add the following options:

  quote_char   {default: "}
  eol          {default: undef}
  sep_char     {default: ,}
  escape_char  {default: "}
  always_quote {default: 0}

Please read the documentation of Text::CSV_XS for details.

Note that the binary option isn't available.

In addition, to make it easier to work with files whose fields aren't separated consistently, e.g. sometimes by one whitespace character, sometimes by more, I added the sep_re option (defaults to undef).

If it is specified, sep_char is ignored when reading. Instead, something similar to a split at the separator is done to find the fields.

E.g., you can say

  tie my @data, 'Tie::CSV_File', 'xyz.dat', sep_re       => qr/\s+/,
                                            quote_char   => undef,
                                            eol          => undef, # default
                                            escape_char  => undef,
                                            always_quote => 0;     # default
                                        

to read something like

    PID TTY          TIME CMD
 1200 pts/0    00:00:00 bash
 1221 pts/0    00:00:01 nedit
 1224 pts/0    00:00:01 nedit
 1228 pts/0    00:00:06 nedit
 1318 pts/0    00:00:01 nedit
 1605 pts/0    00:00:00 ps

Note that the value of sep_re must be a regexp object, e.g. one generated with qr/.../. A simple string produces an error.
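The difference between a regexp object and a plain string, and the effect of splitting at such a separator, can be sketched in core Perl. This only illustrates the idea with Perl's own split; it is not the module's code, and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $sep_re = qr/\s+/;

# qr/.../ yields a reference blessed into the class "Regexp";
# a plain string such as '\s+' is not a reference at all.
print ref($sep_re), "\n";                                    # Regexp
print ref('\s+') eq '' ? "plain string\n" : "reference\n";   # plain string

# Splitting one line of the listing at every run of whitespace.
my $line   = '1200 pts/0    00:00:00 bash';
my @fields = split $sep_re, $line;
print join('|', @fields), "\n";    # 1200|pts/0|00:00:00|bash
```

Plain split returns an empty leading field when a line starts with the separator, as the indented lines of the ps listing do. How the module treats that case is not stated in this documentation.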

Note also that sep_char is used to write data. As the name suggests, sep_char should consist of only one character. You get a warning if you try something else.

If you specify both a sep_char and a sep_re, you'll also get a warning if sep_char itself isn't matched by sep_re.
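A quick way to check a sep_char / sep_re pair yourself before tying is sketched below. The helper sep_char_fits is hypothetical, and it assumes that "matched" means the whole sep_char is matched by sep_re; the exact test the module performs is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $sep_re matches the whole of $sep_char.
sub sep_char_fits {
    my ($sep_char, $sep_re) = @_;
    return $sep_char =~ /\A(?:$sep_re)\z/ ? 1 : 0;
}

print sep_char_fits('|', qr/\s*\|\s*/), "\n";   # 1
print sep_char_fits(' ', qr/\s+/),      "\n";   # 1
print sep_char_fits(',', qr/\s+/),      "\n";   # 0
```

A pair that fails this check would write data with a separator that the reading side cannot split at, which is presumably why the module warns about it.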

Predefined file types

Without any options you define a standard csv file. However, tab separated, colon separated and whitespace separated files are also commonly used, so they are predefined. That's why it's possible to say:

  tie my @data, 'Tie::CSV_File', 'xyz.dat', TAB_SEPARATED;
  tie my @data, 'Tie::CSV_File', 'xyz.dat', COLON_SEPARATED;
  tie my @data, 'Tie::CSV_File', 'xyz.dat', SEMICOLON_SEPARATED;
  tie my @data, 'Tie::CSV_File', 'xyz.dat', PIPE_SEPARATED;
  tie my @data, 'Tie::CSV_File', 'xyz.dat', WHITESPACE_SEPARATED;

SEPARATED is commonly misspelled as SEPERATED (with an E as the 4th letter instead of an A). In fact, up to version 0.11, this module contained this spelling mistake too. As this module tries to be friendly (and backward compatible), it also accepts the versions of the predefined file types that are misspelled in this way. Thanks a lot to Harald Fuchs who found this typo.

TAB_SEPARATED

It's defined with:

     sep_char     => "\t",
     quote_char   => undef,
     eol          => undef, # default
     escape_char  => undef,
     always_quote => 0     # default
     

Note that the data isn't allowed to contain any tab.

COLON_SEPARATED

It's defined with:

     sep_char     => ":",
     quote_char   => undef,
     eol          => undef, # default
     escape_char  => undef,
     always_quote => 0     # default

Note that the data isn't allowed to contain any colon.

SEMICOLON_SEPARATED

It's defined with:

     sep_char     => ";",
     quote_char   => undef,
     eol          => undef, # default
     escape_char  => undef,
     always_quote => 0     # default

Note that the data isn't allowed to contain any semicolon.

Although that looks very similar to CSV files, SEMICOLON_SEPARATED doesn't quote data and can't work properly with quoted data. If you want just a normal CSV file with semicolons instead of commas, just write

  tie my @data, 'Tie::CSV_File', 'xyz.dat', sep_char => ";";

PIPE_SEPARATED

It's defined with:

     sep_char     => "|",
     quote_char   => undef,
     eol          => undef, # default
     escape_char  => undef,
     always_quote => 0     # default

Note that the data isn't allowed to contain any pipe delimiter.

WHITESPACE_SEPARATED

It's defined with:

     sep_re       => qr/\s+/,
     sep_char     => ' ',
     quote_char   => undef,
     eol          => undef, # default
     escape_char  => undef,
     always_quote => 0     # default

Note that it reads by splitting at all whitespace sequences. In particular, it's not possible to define an empty field. Note also that when setting an element, all whitespace sequences are transformed to a single blank.
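The effect on a value that is written can be pictured with a one-line substitution. The helper squeeze_whitespace is hypothetical and only mimics the documented result; it is not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking the documented effect on a single value:
# every run of whitespace becomes one blank.
sub squeeze_whitespace {
    my ($value) = @_;
    (my $copy = $value) =~ s/\s+/ /g;
    return $copy;
}

print squeeze_whitespace("two\t\twords"), "\n";   # two words
print squeeze_whitespace("a  b \n c"),    "\n";   # a b c
```

Since reading splits at whitespace, a value stored with an embedded blank would presumably come back as two separate fields.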

Of course, you can overwrite some options. E.g., let's assume that you have a whitespace separated file, but you want to write a tab instead of a blank when changing the data. That can be done with:

   tie my @data, 'Tie::CSV_File', 'xyz.dat', WHITESPACE_SEPARATED, sep_char => "\t";
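One plausible way to picture this override, assuming the constant expands to a flat list of key/value pairs (the documentation does not confirm how the module merges options), is ordinary Perl hash construction, where a later key replaces an earlier one. The array below is a hypothetical stand-in for the constant.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the WHITESPACE_SEPARATED option list.
my @whitespace_separated = (
    sep_re       => qr/\s+/,
    sep_char     => ' ',
    quote_char   => undef,
    eol          => undef,
    escape_char  => undef,
    always_quote => 0,
);

# In a flat list turned into a hash, a later key replaces an earlier one.
my %options = (@whitespace_separated, sep_char => "\t");

print $options{sep_char} eq "\t" ? "tab\n" : "blank\n";   # tab
print scalar(keys %options), "\n";                        # 6
```

Under that assumption, the order matters: options that should win go after the predefined file type.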

Please suggest other useful file types that I could predefine.

EXPORT

By default these constants are exported:

  TAB_SEPARATED
  COLON_SEPARATED
  SEMICOLON_SEPARATED
  PIPE_SEPARATED
  WHITESPACE_SEPARATED

(Some misspelled versions of these file types are also exported; please see the documentation of the predefined file types for details.)

BUGS

This module is slow, even slower than necessary with object oriented features. I'll change it when implementing some more features.

The slowest part is perhaps when you shift, pop, splice, ... or assign another array to the tied array. I'll fix it in one of the next versions.

This module expects that, while the file is tied, nothing other than this module changes it. But the file isn't locked, so it's your job to take care of that.

Please inform me about every bug or missing feature of this module.

TODO

Implement efficient routines for shift, pop, splice, unshift, ... .

Avoid using Text::CSV_XS if it isn't installed.

Enable deferred writing, similar to Tie::File.

Allow (memory) options to be given when tying, like mode, memory, dw_size, similar to Tie::File.

Discuss differences to AnyData module.

Discuss differences to DBD::CSV module.

I'm open to many more ideas, please inform me about any missing features or occurring problems.

THANKS

Thanks a lot to Harald Fuchs, who found the typos in

  *_SEPARATED
       ^
      (there had been an E instead of an A)

SEE ALSO

Tie::File Text::CSV Text::CSV_XS AnyData DBD::CSV

AUTHOR

Janek Schleicher, <bigj@kamelfreund.de>

COPYRIGHT AND LICENSE

Copyright 2002 by Janek Schleicher

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.