Module Version: 1.03


Data::Describe - Perl extension for scanning/describing a text file or array.


  use Data::Describe;

  $dsp = Data::Describe->new;       # create an empty object
  my %arg = ( input_file_name => 'input.txt', # the same as 'ifn'
              skip_first_row  => 'Y',         # the same as 'sfr'
              input_field_sep => ',',         # the same as 'ifs'
              ofs=>'|',             # the same as 'output_field_sep'
              ofn=>'out.dat',       # the same as 'output_file_name'
              odf=>'out.def',       # the same as 'output_def_file'
            );
  $dsp = Data::Describe->new(%arg); # with arguments

  $dsp->skip_first_row;             # i.e., 1st row contains col names
  $dsp->set_sfr(1);                 # is the same as the above       

  $dsp->set_ifs('\t');              # set input field separator to tab
  $dsp->input_field_separator('|'); # set input field separator to '|'
  $dsp->set_ofs('|');               # set output field separator to |
  $dsp->output_field_separator('|');# set output field separator to | 

  $dsp->set_ifn('input.txt');       # set input file name
  $dsp->input_file_name('input.txt'); # set input file name
  $dsp->input_file_name($arf);      # it can be array ref

  $dsp->set_ofn('out.dat');         # set output file name
  $dsp->output_file_name('out.dat');# set output file name
  $dsp->output_file_name('Y');      # it can be array ref

  $dsp->set_odf('out.def');         # set output def file name
  $dsp->output_def_file('output.def');# set output definition file name
  $dsp->output_def_file('Y');       # default to "${in}.def"

  # every set method has a corresponding get method
  $rc      = $dsp->get_sfr;
  $rc      = $dsp->get_ifs; 
  $rc      = $dsp->input_field_separator; # the same as get_ifs

  $dsp->debug(5);                   # set debug level to 5
  $dsp->echoMSG('This message', 1); # tag the message as level 1
  my $crf = $dsp->get_def_arrayref;
  my $drf = $dsp->get_dat_arrayref;
  $dsp->output($crf, "", 'def');    # output def file to STDOUT
  $dsp->output($drf, 'out.dat', 'dat'); # output data to out.dat


This class contains a describe method that scans each record (or a specified number of records), and each field in those records, in an array or a file to collect information about its content. It creates a column definition array and a data array containing all the data without the column-name record.

The column definition array built by the module is an array of hashes. It contains these hash elements ('col', 'typ', 'max', 'min', 'dft', 'dec', 'req' and 'dsp') for each column. The array is subscripted as $ary[$col_seq]{$hash_ele}. The hash elements are:

  col - column name
  typ - column type, 'N' for numeric, 'C' for characters, 
        and 'D' for date
  max - maximum length of the records in the column
        (could use 'wid' to record the max length of the 
        column)
  min - minimum length of the records in the column
        (When 'wid' is used, no 'min' is needed.)
  dft - date format such as YYYY/MM/DD, MON/DD/YYYY, etc.
  dec - maximum decimal length of the records in the column
  req - whether there are null or zero-length records in the 
        column; only 'NOT NULL' is shown
  dsp - description of the column
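
To make the $ary[$col_seq]{$hash_ele} layout concrete, here is a minimal sketch of what a column definition array of this shape might look like for a two-column input. The variable name @def and all of the values are illustrative assumptions, not output captured from the module.

```perl
use strict;
use warnings;

# Hypothetical column definition array: one hash per column,
# addressed as $def[$col_seq]{$hash_ele}.
my @def = (
    { col => 'ID',   typ => 'N', max => 5,  min => 1, dec => 0,
      req => 'NOT NULL', dsp => 'record identifier' },
    { col => 'NAME', typ => 'C', max => 20, min => 0, dec => 0,
      req => '',         dsp => 'customer name' },
);

# Walk the columns in sequence and print a few attributes of each.
for my $i (0 .. $#def) {
    printf "%d: %-6s type=%s length=%d-%d %s\n",
        $i, $def[$i]{col}, $def[$i]{typ},
        $def[$i]{min}, $def[$i]{max}, $def[$i]{req};
}
```

The real array would normally be obtained from get_def_arrayref, in which case the elements are reached through the reference, e.g. $crf->[$col_seq]{col}.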

The array or file passed to the module may contain column names in its first row.


This class contains many methods to "set" and/or "get" parameters. Here is the list of methods:


How to create a describe object?

You can create a describe object as follows:

  $dsc = Data::Describe->new;   # an empty object

You can also define the object attributes in a hash and pass it to the constructor:

  %attr = ( 
     input_field_sep => ':',    # input field separator
     skip_first_row  => 1,      # 1st row has col names
  );
  $dsp = Data::Describe->new(%attr);

How is the column definition generated?

If the first row in the data array contains column names, the module uses those names to build the column definition array. The column type is determined by scanning all the records in the data array. If every value in a column contains only digits and decimal points, i.e., only [0-9.], the column is defined as numeric ('N'); otherwise, it is defined as character ('C'). For type 'C', the module checks whether the values are dates: if a field contains only digits and '/', it is considered a date field, and get_date_format is called to determine the date format.
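
The type rules above can be sketched roughly as follows. This is not the module's code: guess_type is a hypothetical helper written only to illustrate the rules, and skipping empty fields is an assumption, since the text does not say how empty values affect the type. The date-format step is left out.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Describe: classify one column
# from its values using the rules described in the text.
sub guess_type {
    # Assumption: empty or undefined fields do not influence the type.
    my @vals = grep { defined($_) && length($_) } @_;
    return 'C' unless @vals;    # assumption: nothing to inspect

    # Count values that contain anything other than digits and '.'.
    my $non_numeric = grep { !/^[0-9.]+$/ } @vals;
    return 'N' if $non_numeric == 0;

    # Count values that contain anything other than digits and '/'.
    my $non_date = grep { !m{^[0-9/]+$} } @vals;
    return $non_date == 0 ? 'D' : 'C';
}

print guess_type('12', '3.5', '400'), "\n";          # digits and dots only
print guess_type('2001/01/15', '1999/12/31'), "\n";  # digits and slashes only
print guess_type('abc', '42'), "\n";                 # mixed content
```

The three calls classify their columns as 'N', 'D' and 'C' respectively.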

If the first row does not contain column names, it will generate field names as "FLD###". The "###" is a sequential number starting with 1. If the minimum length of a column is zero, then the value in the column can be null; if the minimum length is greater than zero, then it is a required column.
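
As an illustration of the naming and required-column rules, the sketch below generates default field names and derives the 'req' flag from a minimum length. Both helpers are hypothetical, and reading "###" as a zero-padded, three-digit counter is an assumption; the module may format the number differently.

```perl
use strict;
use warnings;

# Hypothetical helper: build default column names for $n columns.
# Assumption: "###" means a zero-padded, three-digit counter from 1.
sub default_names {
    my ($n) = @_;
    return map { sprintf('FLD%03d', $_) } 1 .. $n;
}

# Hypothetical helper: derive the 'req' flag from the minimum length.
# A minimum length of zero means the column can be null.
sub req_flag {
    my ($min) = @_;
    return $min > 0 ? 'NOT NULL' : '';
}

print join(',', default_names(3)), "\n";
print "min=0 -> '", req_flag(0), "'\n";
print "min=2 -> '", req_flag(2), "'\n";
```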

The default indicator for the first row is false, i.e., the first row does not contain column names. You can indicate whether the first row in the data array is column names by using skip_first_row or set_sfr to set it.

  $dsp->skip_first_row('Y');      # first row contains column names
  $dsp->set_sfr('Y');             # the same as the above
  $dsp->set_sfr(1);               # the same as the above

To reverse it:

  $dsp->set_sfr('N');             # no column names in the first row
  $dsp->skip_first_row(0);        # the same as the above

Future Implementation

Although it seems a simple task, it requires a lot of thought to get it working in an object-oriented framework. Intended future implementation includes


Hanming Tu,


SEE ALSO (some of the docs that I check often)

perltoot(1), perlobj(1), perlbot(1), perlsub(1), perldata(1), perlmod(1), perlmodlib(1), perlref(1), perlreftut(1).
