
NAME

Data::Presenter - Reformat database reports

VERSION

This document refers to version 1.03 of Data::Presenter, which consists of Data::Presenter.pm and various packages subclassed thereunder, most notably Data::Presenter::Combo.pm and its subclasses Data::Presenter::Combo::Intersect.pm and Data::Presenter::Combo::Union.pm. This version was released February 10, 2008.

SYNOPSIS

    use Data::Presenter;
    use Data::Presenter::[Package1];  # example:  use Data::Presenter::Census

    our (@fields, %parameters, $index);
    $configfile = 'fields.XXX.data';
    do $configfile;

    $dp1 = Data::Presenter::[Package1]->new(
        $sourcefile, \@fields,\%parameters, $index
    );

    $data_count = $dp1->get_data_count();

    $dp1->print_data_count();

    $keysref = $dp1->get_keys();

    $seenref = $dp1->get_keys_seen();

    $dp1->print_to_screen();

    $dp1->print_to_file($outputfile);

    $dp1->print_with_delimiter($outputfile, $delimiter);

    $dp1->full_report($outputfile);

    $dp1->select_rows($column, $relation, \@choices);

    $sorted_data = $dp1->sort_by_column(\@columns_selected);

    $seen_hash_ref = $dp1->seen_one_column($column);

    $dp1->writeformat(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
    );

    $dp1->writeformat_plus_header(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        title       => $title,
    );

    %reprocessing_info = (
        lastname    => 17,
        firstname   => 15,
    );

    $dp1->writeformat_with_reprocessing(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        reprocess   => \%reprocessing_info,
    );

    $dp1->writeformat_deluxe(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        title       => $title,
        reprocess   => \%reprocessing_info,
    );

    $dp1->writedelimited(
        sorted      => $sorted_data,
        file        => $outputfile,
        delimiter   => $delimiter,
    );

    $dp1->writedelimited_plus_header(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        delimiter   => $delimiter,
    );

    @reprocessing_info = qw( instructor timeslot room );

    $dp1->writedelimited_with_reprocessing(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        delimiter   => $delimiter,
        reprocess   => \@reprocessing_info,
    );

    $dp1->writedelimited_deluxe(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        delimiter   => $delimiter,
        reprocess   => \@reprocessing_info,
    );

    $dp1->writeHTML(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => 'somename.html',
        title       => $title,
    );

Data::Presenter::Combo objects:

    use Data::Presenter;
    use Data::Presenter::[Package1];  # example:  use Data::Presenter::Census
    use Data::Presenter::[Package2];  # example:  use Data::Presenter::Medinsure

    our (@fields, %parameters, $index);
    $configfile = 'fields.XXX.data';
    do $configfile;

    $dp1 = Data::Presenter::[Package1]->new(
        $sourcefile, \@fields,\%parameters, $index
    );

    # different source file and configuration file

    $configfile = 'fields.YYY.data';
    do $configfile;

    $dp2 = Data::Presenter::[Package2]->new(
        $sourcefile, \@fields,\%parameters, $index);

    @objects = ($dp1, $dp2);
    $dpC = Data::Presenter::Combo::Intersect->new(\@objects);
    $dpC = Data::Presenter::Combo::Union->new(\@objects);

Notice of Changes of Interface

If you have not used Data::Presenter prior to version 1.0, skip this section.

writeformat()-Family of Methods Now Takes List of Key-Value Pairs

Since the last publicly available version of Data::Presenter (0.68), the interface to nine of its public methods has been changed. Previously, methods in the writeformat()-family of methods took a list of arguments which had to be provided in a very specific order. For example, writeformat_deluxe() took five arguments:

    $dp1->writeformat_deluxe(
        $sorted_data,
        \@columns_selected,
        $outputfile,
        $title,
        \%reprocessing_info
    );

As the number of elements in the list of arguments increases, it becomes more difficult to remember the order in which they must be passed. At a certain point it becomes easier to pass the arguments in the form of key-value pairs. As long as each pair is correctly specified, the order of the pairs no longer matters. writeformat_deluxe(), for example, now has this interface:

    $dp1->writeformat_deluxe(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        title       => $title,
        reprocess   => \%reprocessing_info,
    );

Please study the "SYNOPSIS" above to see how to revise your calls to methods with writeformat, writedelimited or writeHTML in their names.
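As a rough illustration of why key-value pairs are easier to work with, the sketch below shows how a method might unpack and check named arguments. The subroutine name write_report_sketch and its list of required keys are hypothetical; they are not taken from the Data::Presenter source.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a writeformat()-style method; not the module's code.
sub write_report_sketch {
    my %args = @_;
    # Every key listed here must be supplied by the caller.
    for my $required (qw( sorted columns file )) {
        die "Missing required argument '$required'\n"
            unless exists $args{$required};
    }
    return sprintf "%d columns to %s",
        scalar @{ $args{columns} }, $args{file};
}

# The order of the pairs does not matter.
my $first = write_report_sketch(
    sorted  => [],
    columns => [ qw( lastname cno ) ],
    file    => 'out.txt',
);
my $second = write_report_sketch(
    file    => 'out.txt',
    columns => [ qw( lastname cno ) ],
    sorted  => [],
);
print "$first\n";
```

Both calls produce the same result, and a call that omits a required key fails with a message naming the missing key.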

Change in Assignment of $index in Data::Presenter::[Package1]::_init()

Data::Presenter is used by writing and using a subclass in which a new object is created. Each such subclass must hold an _init() method and each such _init() method must accomplish certain tasks. One of these tasks is to store the value of $index (found in the configuration file) in the object being created. In versions 0.68 and earlier, the code which did this looked like this:

    $data{'index'} = [$index];

In other words, $index was not directly assigned to the hash holding the Data::Presenter::[Package1] object's data. Instead, a reference to a one-element array holding $index was passed.

This has now been simplified:

    $data{'index'} = $index;

In other words, simply assign $index; no reference is needed. See the sample packages included under the t/ directory in this distribution for a live presentation of this change.

PREREQUISITES

Data::Presenter requires Perl 5.6 or later. The module and its test suite require the following modules from CPAN:

List::Compare

By the same author as Data::Presenter: http://search.cpan.org/dist/List-Compare.

IO::Capture

Used only in the test suite to capture output printed to screen by Data::Presenter methods. By Mark Reynolds and Jon Morgan. http://search.cpan.org/dist/IO-Capture.

IO::Capture::Extended

Used only in the test suite to capture output printed to screen by Data::Presenter methods. By the same author as Data::Presenter. Has IO::Capture (above) as prerequisite. http://search.cpan.org/dist/IO-Capture-Extended.

Tie::File

Used only in the test suite to validate text printed to files by Data::Presenter methods. By Mark-Jason Dominus. Distributed with Perl since 5.7.3; otherwise, available from CPAN: http://search.cpan.org/dist/Tie-File.

Each of the prerequisites is pure Perl and should install with the cpan shell by typing 'y' at the prompts as needed.

DESCRIPTION

Data::Presenter is an object-oriented module useful for the reformatting of already formatted text files such as reports generated by database programs. If the data can be represented by a row-column matrix, where for each data entry (row):

  • there are one or more fields containing data values (columns); and

  • at least one of those fields can be used as an index to uniquely identify each entry,

then the data structure is suitable for manipulation by Data::Presenter. In Perl terms, if the data can be represented by a hash of arrays, it is suitable for reformatting with Data::Presenter.
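To make the phrase "hash of arrays" concrete, here is a minimal sketch of such a structure. The values are illustrative only, and the layout is an assumption about how one might picture the data, not a dump of an actual Data::Presenter object.

```perl
use strict;
use warnings;

# Each key is the unique index value; each value is a reference to an array
# holding one element per field (column). Sample values are illustrative.
my %data = (
    456791 => [ 'HERNANDEZ',  'HECTOR',    456791, '1963-07-16' ],
    456792 => [ 'VASQUEZ',    'ADALBERTO', 456792, '1973-10-02' ],
    906786 => [ 'WASHINGTON', 'ALBERT',    906786, '1953-03-31' ],
);

# Look up one entry (row) by its index and read one field (column).
my $lastname = $data{456792}[0];
print "Entry 456792 has last name $lastname\n";
```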

Data::Presenter can be used to output some fields (columns) from a database while excluding others (see "sort_by_column()" below). It can also be used to select certain entries (rows) from the database for output while excluding other entries (see "select_rows()" below).

In addition, if a user has two or more database reports, each of which has the same field serving as an index for the data, then it is possible to construct either a:

  • Data::Presenter::Combo::Intersect object which holds data for those entries found in common in all the source databases (the intersection of the entries in the source databases); or a

  • Data::Presenter::Combo::Union object which holds data for those entries found in any of the source databases (the union of the entries in the source databases).

Whichever flavor of Data::Presenter::Combo object the user creates, the module guarantees that each field (column) found in any of the source databases appears once and once only in the Combo object.
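The difference between the two flavors can be sketched in plain Perl. The hashes and field lists below are hypothetical miniatures of two sources; the code only illustrates the set logic and makes no claim about how the Combo classes are implemented.

```perl
use strict;
use warnings;

# Two hypothetical sources keyed by the same index field.
my %census    = ( 456791 => ['HERNANDEZ'], 456792 => ['VASQUEZ'] );
my %medinsure = ( 456792 => ['VASQUEZ'],   906786 => ['WASHINGTON'] );

# Count how many sources each index value appears in.
my %seen;
$seen{$_}++ for (keys %census, keys %medinsure);

my @union        = sort keys %seen;
my @intersection = sort grep { $seen{$_} == 2 } keys %seen;

# Field names from both sources, each kept once, in first-seen order.
my @census_fields    = qw( lastname firstname cno unit );
my @medinsure_fields = qw( lastname firstname cno stateid );
my %field_seen;
my @combined_fields = grep { !$field_seen{$_}++ }
    (@census_fields, @medinsure_fields);

print "Union:        @union\n";
print "Intersection: @intersection\n";
print "Fields:       @combined_fields\n";
```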

Data::Presenter is not a database module per se, nor is it an interface to databases in the manner of DBI. It cannot be used to enter data into a database, nor can it be used to modify or delete data. Data::Presenter operates on reports generated from databases and is designed for the user who:

  • may not have direct access to a given database;

  • receives reports from that database generated by another user; but

  • needs to manipulate and re-output that data in simple, useful ways such as text files, Perl formats and HTML tables.

Data::Presenter is most appropriate in situations where the user either has no access to (or chooses not to use) commercial desktop database programs such as Microsoft Access(r) or open source database programs such as MySQL(r). Data::Presenter's installation and preparation require moderate knowledge of Perl, but the actual running of Data::Presenter scripts can be delegated to someone with less knowledge of Perl.

DEFINITIONS AND EXAMPLES

Definitions

Administrator

The individual in a workplace responsible for the installation of Data::Presenter on the system or network, analysis of sources, preparation of Data::Presenter configuration files and preparation of Data::Presenter subclass packages other than Data::Presenter::Combo and its subclasses. (Cf. "Operator".)

Entry

A row in the source containing the values of the fields for one particular item.

Field

A column in the source containing a value for each entry.

Index

The column in the source whose values uniquely identify each entry in the source. Also referred to as ''unique ID.'' (In the current implementation of Data::Presenter, an index must be a strictly numerical value.)

Index Field

The column in the source containing a unique value ("index") for each entry.

Metadata

Entries in the Data::Presenter object's data structure which hold information prepared by the administrator about the data structure and output parameters.

In the current version of Data::Presenter, metadata is extracted from the variables @fields, %parameters and $index found in the configuration file fields.XXX.data. The metadata is first stored in package variables in the invoking Data::Presenter subclass package and then entered into the Data::Presenter object as hash entries keyed by 'fields', 'parameters' and 'index', respectively. (The word 'options' has also been reserved for future use as the key of a metadata entry in the object's data structure.)

Object's Current Data Structure

Non-metadata entries found in the Data::Presenter object at the point a particular selection, sorting or output method is called.

The object's current data structure may be thought of as the result of the following calculations:

            construct a Data::Presenter::[Package1] object

    less:   entries excluded by application of selection criteria found
                in select_rows()

    less:   metadata entries in object keyed by 'fields', 'parameters' or
                'index'

    result: object's current data structure
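The calculation above can be pictured with a toy hash. The list of reserved keys is an assumption based on the metadata names mentioned in this document, and the hash is a simplified stand-in for a real object.

```perl
use strict;
use warnings;

# Assumed reserved keys, based on the metadata names mentioned in this document.
my %reserved = map { $_ => 1 } qw( fields parameters index options );

# A toy object: two data entries plus metadata.
my %object = (
    456791     => [ 'HERNANDEZ', 'HECTOR',    456791 ],
    456792     => [ 'VASQUEZ',   'ADALBERTO', 456792 ],
    fields     => [ qw( lastname firstname cno ) ],
    parameters => {},
    index      => 2,
);

# The current data structure is whatever remains once metadata is set aside.
my @data_keys  = sort grep { !$reserved{$_} } keys %object;
my $data_count = scalar @data_keys;
print "Current data count: $data_count\n";
```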

Operator

The individual in a workplace responsible for running a Data::Presenter script, including:

  • selection of sources;

  • selection of particular entries and fields from the source for presentation in the output; and

  • selection of output formats and names of output files. (Cf. "Administrator".)

Source

A report, typically saved in the form of a text file, generated by a database program which presents data in a row-column format. The source may also contain other information such as page headers and footers and table headers and footers. Also referred to herein as ''source report,'' ''source file'' or ''database source report.''

Examples

Sample files are included in the archive file in which this documentation is found. Three source files, census.txt, medinsure.txt and hair.txt, are included, as are the corresponding Data::Presenter subclass packages (Census.pm, Medinsure.pm and Hair.pm) and configuration files (fields.census.data, fields.medinsure.data and fields.hair.data).

USAGE: Administrator

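This section addresses those aspects of the usage of Data::Presenter which must be implemented by the administrator: installation, analysis of source files, preparation of configuration files and preparation of Data::Presenter subclass packages.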

If Data::Presenter has already been properly configured by your administrator and you are simply concerned with using Data::Presenter to generate reports, you may skip ahead to "USAGE: Operator".

Installation

Data::Presenter installs in the same way as other Perl extensions available from CPAN: either automatically via the CPAN shell or manually with these commands:

    % gunzip Data-Presenter-1.03.tar.gz
    % tar xf Data-Presenter-1.03.tar
    % cd Data-Presenter-1.03
    % perl Makefile.PL
    % make
    % make test
    % make install

This will install the following directory tree in your ''site perl'' directory, i.e., a directory such as /usr/local/lib/perl5/site_perl/5.8.7/:

    Data/
        Presenter.pm
        Presenter/
            Combo.pm
            Combo/
                Intersect.pm
                Union.pm

Once the Administrator has installed Data::Presenter, she must then decide which location on the network will be used to hold Data::Presenter::[Package1] subclass packages, where [Package1] is a Data::Presenter subclass in which a new object will be created. That location could be the Data/Presenter/ directory listed above or it could be some other location which users can access in a Perl program via the use lib () pragma.

The Administrator must also decide on a location on the network which will be used to hold the Data::Presenter configuration files -- one for each data source to be used by Data::Presenter. By convention, each configuration file is named by some variation on the theme of fields.XXX.data.

Suppose, for instance, that /usr/share/datapresenter/ is the directory created to hold Data::Presenter-related files accessible to all users. Suppose, further, that in this business two database reports, census and medinsure, will be processed via Data::Presenter. The Administrator would then create a directory tree like this:

    /usr/share/datapresenter/
        Data/
            Presenter/
                Census.pm
                Medinsure.pm
        config/
            fields.census.data
            fields.medinsure.data

The Administrator could also create a directory called source/ to hold the source files to be processed with Data::Presenter, and she could also create a directory called results/ to hold files created via Data::Presenter -- but neither of these is strictly necessary.

Analysis of Source Files

Successful use of Data::Presenter assumes that the administrator is able to analyze a report generated from a database, distinguish key structural features of such a source report and write Perl code which will extract the most relevant information from the report. A complete discussion of these issues is beyond the scope of this documentation. What follows is a taste of the issues involved.

Structural features of a database report are likely to include the following: report headers, page headers, table headers, data entries reporting values of a variety of fields, page footers and report footers. Of these features, data entries and table headers are most important from the perspective of Data::Presenter. The data entries are the data which will actually be manipulated by Data::Presenter, while table headers will provide the administrator guidance when writing the configuration file fields.XXX.data. Report and page headers and footers are generally irrelevant and will be stripped out.

For example, let us suppose that a portion of a client census looks like this:

    CLIENTS - AUGUST 1, 2001 - C O N F I D E N T I A L        PAGE  1
    SHRED WHEN NEW LIST IS RECEIVED!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
     LAST NAME     FIRST NAM  C. NO  BIRTH

     HERNANDEZ     HECTOR     456791 1963-07-16
     VASQUEZ       ADALBERTO  456792 1973-10-02
     WASHINGTON    ALBERT     906786 1953-03-31

The first two lines are probably report or page headers and should be stripped out. The third line consists of table column names and may give clues as to how fields.census.data should be written. The fourth line is blank and should be stripped out. The next three lines constitute actual rows of data; these will be the focus of Data::Presenter.

A moderately experienced Perl programmer will look at this report and say, ''Each row of data can be stored in a Perl array. If each client's 'c. no' is unique, then it can be used as the key of an entry in a Perl hash where the entry's value is a reference to the array just mentioned. A hash of arrays -- I can use Data::Presenter!''

Our Perl programmer would then say, ''I'll open a filehandle to the source file and read the file line-by-line in a while loop. I'll write lines beginning with 'next if' to bypass the headers and the blank lines.'' For instance:

    next if (/^CLIENTS/);
    next if (/^SHRED/);
    next if (/^\s?LAST\sNAME/);
    next if (/^$/);

Our Perl hacker will then say, ''I could try to write regular expressions to handle the rows of data. But since the data appears to be strictly columnar, I'll probably be better off using the Perl unpack function. I'll use the column headers to suggest names for my variables.'' For instance:

    my ($lastname, $firstname, $cno, $datebirth) =
        unpack("x A14 x A10 x A6 x A10", $_);
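Putting the two fragments together, a complete read loop might look like the sketch below. To stay self-contained it reads from an in-memory string rather than census.txt, and the column layout built with sprintf is an assumption chosen to match the unpack template; a real source file should be measured column by column before a template is written.

```perl
use strict;
use warnings;

# Build a small in-memory report. The layout (one leading space, then
# 14-, 10-, 6- and 10-character columns separated by single spaces) is an
# assumption chosen to match the unpack template used below.
my $layout = " %-14s %-10s %-6s %-10s\n";
my $report = join '',
    "CLIENTS - AUGUST 1, 2001 - C O N F I D E N T I A L        PAGE  1\n",
    "SHRED WHEN NEW LIST IS RECEIVED!\n",
    sprintf($layout, 'LAST NAME', 'FIRST NAM', 'C. NO', 'BIRTH'),
    "\n",
    sprintf($layout, 'HERNANDEZ',  'HECTOR',    456791, '1963-07-16'),
    sprintf($layout, 'VASQUEZ',    'ADALBERTO', 456792, '1973-10-02'),
    sprintf($layout, 'WASHINGTON', 'ALBERT',    906786, '1953-03-31');

my %data;
open my $fh, '<', \$report or die "Cannot open in-memory report: $!";
while (<$fh>) {
    chomp;
    next if /^CLIENTS/;          # report header
    next if /^SHRED/;            # page header
    next if /^\s?LAST\sNAME/;    # table header
    next if /^$/;                # blank line
    my ($lastname, $firstname, $cno, $datebirth) =
        unpack("x A14 x A10 x A6 x A10", $_);
    # Key each row by its unique client number.
    $data{$cno} = [ $lastname, $firstname, $cno, $datebirth ];
}
close $fh;

print "$_: @{ $data{$_} }\n" for sort keys %data;
```

The 'A' format strips trailing spaces from each field, which is why the padded columns come back as clean strings.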

Having provided a taste of what to do with the rows of the data structure, we now turn to an analysis of the columns of the structure.

Preparation of Configuration File (fields.XXX.data)

For each data source, the administrator must prepare a configuration file, typically named as some variation on fields.XXX.data. fields.XXX.data consists of three Perl variables: @fields, %parameters and $index.

@fields

@fields has one element for each column (field) that appears in the data source. The elements of @fields must appear in exactly the same order as they appear in the data source. Each element should be a single Perl word, i.e., consist solely of letters, numerals or the underscore character (_).

In the sample configuration file fields.census.data included with this documentation, this variable reads:

    @fields = qw(
        lastname firstname cno unit ward dateadmission datebirth
    );

In another sample configuration file, fields.medinsure.data, this variable reads:

    @fields = qw(lastname firstname cno stateid medicare medicaid);

%parameters

%parameters is a bit trickier. There must be one entry in %parameters for each element in @fields. Hence, there is one entry in %parameters for each column (field) in the data source. However, the keys of %parameters are spelled $fields[0], $fields[1], and so on through the highest index number in @fields (which is 1 less than the number of elements in @fields). Using the example above, we can begin to construct %parameters as follows:

    %parameters = (
        $fields[0] =>
        $fields[1] =>
        $fields[2] =>
        $fields[3] =>
        $fields[4] =>
        $fields[5] =>
        $fields[6] =>
    );

The value for each entry in %parameters consists of an array of 4 elements specified as follows:

Element 0

A positive integer specifying the maximum number of characters which may be displayed in any output format for the given column (field). In the example above, we will specify that column 'lastname' ($fields[0]) may have a maximum of 14 characters.

    $fields[0]        => [14,

Element 1

An upper-case letter 'U' or 'D' (for 'Up' or 'Down') enclosed in single quotation marks indicating whether the given column should be sorted in ascending or descending order. In the example above, 'lastname' sorts in ascending order.

    $fields[0]        => [14, 'U',

Element 2

A lower-case letter 'a', 'n' or 's' enclosed in single quotation marks indicating whether the given column should be sorted alphabetically (case-insensitive), numerically or ASCII-betically (case-sensitive). In the example above, 'lastname' sorts in alphabetical order. (Data::Presenter per se does not yet have a facility for sorting in date or time order. If dates are entered as pure numerals in 'MMDD' order, they may be sorted numerically. If they are entered in the MySQL standard format 'YYYY-MM-DD', they may be sorted alphabetically.)

    $fields[0]        => [14, 'U', 'a',

Element 3

A string enclosed in single quotation marks to be used as a column header when the data is outputted in some table-like format such as a Perl format with a header or an HTML table. The administrator may choose to use exactly the same words here that were used in @fields, but a more natural language string is probably preferable. In the example above, the first column will carry the title 'Last Name' in any output.

    $fields[0]        => [14, 'U', 'a', 'Last Name'],

Using the same example as previously, we can now complete %parameters as:

    %parameters = (
        $fields[0]        => [14, 'U', 'a', 'Last Name'],
        $fields[1]        => [10, 'U', 'a', 'First Name'],
        $fields[2]        => [ 7, 'U', 'n', 'C No.'],
        $fields[3]        => [ 6, 'U', 'a', 'Unit'],
        $fields[4]        => [ 4, 'U', 'n', 'Ward'],
        $fields[5]        => [10, 'U', 'a', 'Date of Admission'],
        $fields[6]        => [10, 'U', 'a', 'Date of Birth'],
    );
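To show what the sort-order and sort-type codes in elements 1 and 2 mean in practice, the sketch below turns a pair of codes into a comparison subroutine. The helper make_comparator is hypothetical and is not taken from the module's internals; it only illustrates the three kinds of ordering described above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a comparison sub from a sort order ('U' or 'D')
# and a sort type ('a', 'n' or 's'). Not taken from the module's internals.
sub make_comparator {
    my ($order, $type) = @_;
    my %by_type = (
        'a' => sub { lc($_[0]) cmp lc($_[1]) },   # alphabetical, case-insensitive
        'n' => sub { $_[0] <=> $_[1] },           # numerical
        's' => sub { $_[0] cmp $_[1] },           # ASCII-betical, case-sensitive
    );
    my $cmp = $by_type{$type} or die "Unknown sort type '$type'\n";
    # 'D' reverses the comparison by swapping its arguments.
    return $order eq 'D' ? sub { $cmp->($_[1], $_[0]) } : $cmp;
}

my $alpha_up = make_comparator('U', 'a');
my $ascii_up = make_comparator('U', 's');
my $num_down = make_comparator('D', 'n');

my @names_alpha = sort { $alpha_up->($a, $b) } qw( vasquez Hernandez Washington );
my @names_ascii = sort { $ascii_up->($a, $b) } qw( vasquez Hernandez Washington );
my @wards_down  = sort { $num_down->($a, $b) } ( 9, 100, 25 );

print "@names_alpha\n@names_ascii\n@wards_down\n";
```

Note how the lower-case 'vasquez' sorts between the other two names alphabetically but after both of them ASCII-betically, because upper-case letters precede lower-case letters in ASCII.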

$index

$index is the simplest element of fields.XXX.data. It is the array index for the entry in @fields which describes the field in the data source whose values uniquely identify each entry in the source. If, in the example above, 'cno' is the index field for the data in census.txt, then $index is 2. (Remember that Perl starts counting array elements with 0.)
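An administrator may find it useful to check a configuration before handing it to operators. The sketch below repeats the census example inline and applies a few sanity checks derived from the rules in this section; the checks themselves are a hypothetical aid and are not part of Data::Presenter.

```perl
use strict;
use warnings;

our (@fields, %parameters, $index);

# Contents that would normally live in fields.census.data.
@fields = qw(
    lastname firstname cno unit ward dateadmission datebirth
);
%parameters = (
    $fields[0] => [14, 'U', 'a', 'Last Name'],
    $fields[1] => [10, 'U', 'a', 'First Name'],
    $fields[2] => [ 7, 'U', 'n', 'C No.'],
    $fields[3] => [ 6, 'U', 'a', 'Unit'],
    $fields[4] => [ 4, 'U', 'n', 'Ward'],
    $fields[5] => [10, 'U', 'a', 'Date of Admission'],
    $fields[6] => [10, 'U', 'a', 'Date of Birth'],
);
$index = 2;

# Hypothetical sanity checks based on the rules described in this section.
my @problems;
for my $field (@fields) {
    my $spec = $parameters{$field};
    if (ref($spec) ne 'ARRAY' or @{$spec} != 4) {
        push @problems, "$field: expected a 4-element array";
        next;
    }
    my ($width, $order, $type, $title) = @{$spec};
    push @problems, "$field: width must be a positive integer"
        unless $width =~ /^[1-9]\d*$/;
    push @problems, "$field: sort order must be 'U' or 'D'"
        unless $order =~ /^[UD]$/;
    push @problems, "$field: sort type must be 'a', 'n' or 's'"
        unless $type =~ /^[ans]$/;
}
push @problems, "\$index does not point at an element of \@fields"
    unless defined $fields[$index];

print @problems ? join("\n", @problems) . "\n"
                : "Configuration looks consistent; index field is $fields[$index]\n";
```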

Preparation of Data::Presenter Subclasses

Data::Presenter.pm, Data::Presenter::Combo.pm, Data::Presenter::Combo::Intersect.pm and Data::Presenter::Combo::Union.pm are ready to use ''as is.'' They require no further modification by the administrator. However, each report from which the operator draws data needs to have a package subclassed beneath Data::Presenter and written specifically for that report by the administrator.

Indeed, no object is ever constructed directly from Data::Presenter. All objects are constructed from subclasses of Data::Presenter.

Hence:

    $dp1 = Data::Presenter->new(                    # INCORRECT
        $source, \@fields, \%parameters, $index);

    $dp1 = Data::Presenter::[Package1]->new(        # CORRECT
        $source, \@fields, \%parameters, $index);

Data::Presenter::[Package1], however, does not contain a new() method. It inherits Data::Presenter's new() method -- which then turns around and delegates the task of populating the object with data to Data::Presenter::[Package1]'s _init() method!

This _init() method must be customized by the administrator to properly handle the specific features of each source file. This requires that the administrator be able to write a Perl script to 'clean up' the source file so that only lines containing meaningful data are written to the Data::Presenter object. (See "Analysis of Source Files" above.) With that in mind, a Data::Presenter::[Package1] package must always include the following methods:

  • _init()

    This method is called from within the constructor and is used to populate the hash which is blessed into the new object. It opens a filehandle to the source file and typically reads that source file line-by-line via a Perl while loop. Perl techniques and functions such as regular expressions, split and unpack are used to populate a hash of arrays and to strip out lines in the data source not needed in the object. Should the administrator need to ''munge'' any of the incoming data so that it appears in a uniform format (e.g., '2001-07-02' rather than '7/2/2001' or '07/02/2001'), the administrator should write appropriate code within _init() or in a separate module imported into the main package. Each element of each array used to store a data record must have a defined value. undef is not permitted; assign an empty string to the element instead. A reference to this hash of arrays is returned to the constructor, which blesses it into the object.

  • _extract_rows

    This method is called from within the Data::Presenter select_rows method. In much the same manner as _init(), it permits the administrator to ''munge'' operator-typed data to achieve a uniform format.
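The division of labor between the inherited new() and the subclass's _init() can be sketched with two toy packages. The package names My::Presenter and My::Presenter::Census are hypothetical stand-ins, and the argument list passed to _init() is an assumption modeled on the constructor call shown above; consult the sample packages in the t/ directory for the real interface.

```perl
use strict;
use warnings;

# Hypothetical base class standing in for Data::Presenter.
package My::Presenter;

sub new {
    my ($class, @args) = @_;
    # Delegate population of the data to the subclass's _init().
    my $dataref = $class->_init(@args);
    return bless $dataref, $class;
}

# Hypothetical subclass standing in for Data::Presenter::[Package1].
package My::Presenter::Census;
our @ISA = ('My::Presenter');

sub _init {
    my ($class, $source, $fieldsref, $paramsref, $index) = @_;
    my %data = (
        'fields'     => $fieldsref,
        'parameters' => $paramsref,
        'index'      => $index,
    );
    # A real _init() would read $source line by line; two rows are
    # hard-coded here to keep the sketch self-contained.
    $data{456791} = [ 'HERNANDEZ', 'HECTOR',    456791 ];
    $data{456792} = [ 'VASQUEZ',   'ADALBERTO', 456792 ];
    return \%data;
}

package main;

my @fields = qw( lastname firstname cno );
my $dp = My::Presenter::Census->new('census.txt', \@fields, {}, 2);
my $index_field = $fields[ $dp->{'index'} ];
print ref($dp), " object indexed on $index_field\n";
```

The subclass never defines new(); it only supplies the report-specific logic, which is the pattern the paragraphs above describe.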

The packages Data::Presenter::Census and Data::Presenter::Medinsure found in the t/ directory in this distribution provide examples of _init() and _extract_rows. Search for the lines of code which read:

    # DATA MUNGING STARTS HERE
    # DATA MUNGING ENDS HERE

Here is a simple example of data munging. In the sample configuration file fields.census.data, all elements of @fields are entered entirely in lower-case. Hence, it would be advisable to transform the operator-specified content of $column to all lower-case so that the program does not fail simply because an operator types an upper-case letter. See _extract_rows() in the Data::Presenter::Census package included with this documentation for an example.

Sample file Data::Presenter::Medinsure contains an example of a subroutine written to clean up repetitive coding within the data munging section. Search for sub _prepare_record.

USAGE: Operator

Once the administrator has installed Data::Presenter and completed the preparation of configuration files and Data::Presenter subclass packages, the administrator may turn over to the operator the job of selecting particular source files, output formats and particular entries and fields from within the source files.

Construction of a Data::Presenter Object

Declarations

Using the hospital census example included with this documentation, the operator would construct a Data::Presenter::Census object with the following code:

    use Data::Presenter;
    use lib ("/usr/share/datapresenter");
    use Data::Presenter::Census;

    our @fields = ();
    our %parameters = ();
    our $index = q{};

    my $sourcefile = 'census.txt';
    my $configdir  = "/usr/share/datapresenter/config";
    my $configfile = "$configdir/fields.census.data";

    do $configfile;

new()

    my $dp1 = Data::Presenter::Census->new(
        $sourcefile, \@fields, \%parameters, $index);

Methods to Report on the Data::Presenter Object Itself

get_data_count()

Returns the current number of data entries in the specified Data::Presenter object. This number does not include those elements in the object whose keys are reserved words. This method takes no arguments and returns one numerical scalar.

    my $data_count = $dp1->get_data_count();
    print 'Data count is now:  ', $data_count, "\n";

print_data_count()

Prints the current data count preceded by ''Current data count: ''. This number does not include those elements in the object whose keys are reserved words. This method takes no arguments and returns no values.

    $dp1->print_data_count();

get_keys()

Returns a reference to an array whose elements are an ASCII-betically sorted list of keys to the hash blessed into the Data::Presenter::[Package1] object. This list does not include those elements whose keys are reserved words. This method takes no arguments and returns only the array reference described.

    my $keysref = $dp1->get_keys();
    print "Current data points are:  @$keysref\n";

get_keys_seen()

Returns a reference to a hash whose elements are key-value pairs where the key is the key of an element blessed into the Data::Presenter::[Package1] object and the value is 1, indicating that the key has been seen (a 'seen-hash'). This list does not include those elements whose keys are reserved words. This method takes no arguments and returns only the hash reference described.

    my $seenref = $dp1->get_keys_seen();
    print "Current data points are:  ";
    print "$_ " foreach (sort keys %{$seenref});
    print "\n";

seen_one_column()

Takes as argument a single string which is the name of one of the fields listed in @fields in the configuration file. Returns a reference to a hash whose elements are keyed by the entries for that field in the data source and whose values are the number of times each entry was seen in the data.

For example, if the data consisted of this:

    HERNANDEZ     HECTOR     1963-08-01 456791
    VASQUEZ       ADALBERTO  1973-08-17 786792
    VASQUEZ       ALBERTO    1953-02-28 906786

where the left-most column was described in @fields as lastname, then:

    $seenref = $dp1->seen_one_column('lastname');

and $seenref would hold:

    {
        HERNANDEZ   => 1,
        VASQUEZ     => 2,
    }
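The counting behind such a seen-hash can be sketched in a few lines. The helper count_column_values is hypothetical and operates on a plain hash of arrays rather than on a Data::Presenter object.

```perl
use strict;
use warnings;

my @fields = qw( lastname firstname datebirth cno );
my %data = (
    456791 => [ 'HERNANDEZ', 'HECTOR',    '1963-08-01', 456791 ],
    786792 => [ 'VASQUEZ',   'ADALBERTO', '1973-08-17', 786792 ],
    906786 => [ 'VASQUEZ',   'ALBERTO',   '1953-02-28', 906786 ],
);

# Hypothetical helper: count how often each value occurs in one column.
sub count_column_values {
    my ($dataref, $fieldsref, $column) = @_;
    # Find the array position of the named column.
    my ($position) = grep { $fieldsref->[$_] eq $column } 0 .. $#{$fieldsref};
    die "Unknown column '$column'\n" unless defined $position;
    my %seen;
    $seen{ $_->[$position] }++ for values %{$dataref};
    return \%seen;
}

my $seenref = count_column_values(\%data, \@fields, 'lastname');
print "$_ => $seenref->{$_}\n" for sort keys %{$seenref};
```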

Data::Presenter Selection, Sorting and Output Methods

select_rows()

select_rows() enables the operator to establish criteria by which specific entries from the data can be selected for output. It does so not by creating a new object but by striking out entries in the object's current data structure which do not meet the selection criteria.

If the operator were using Perl as an interface to a true database program, selection of entries would most likely be handled by a module such as DBI and an SQL-like query. In that case, it would be possible to write complex selection queries which operate on more than one field at a time such as:

    select rows where 'datebirth' is before 01/01/1960
    AND 'lastname' equals 'Vasquez'
    # (NOTE:  This is generic code,
    #  not true Perl or Perl DBI code.)

Complex selection queries are not yet possible in Data::Presenter. However, you could accomplish much the same objective with a series of simple selection queries that operate on only one field at a time,

    select rows where 'datebirth' is before 01/01/1960

then

    select rows where 'lastname' equals 'Vasquez'

each of which narrows the selection criteria.

How do we accomplish this within Data::Presenter? For each selection query, the operator must define 3 variables: $column, $relation and @choices. These variables are passed to select_rows(), which in turn passes them to certain internal subroutines where their values are manipulated as follows.

  • $column

    $column must be an element of @fields found in the configuration file.

  • $relation

    $relation expresses the verb part of the selection query, i.e., relations such as equals, is less than, >=, after and so forth. In an attempt to add natural language flexibility to the selection query, Data::Presenter permits the operator to enter a wide variety of mathematical and English expressions here:

    • equality

          'eq', 'equals', 'is', 'is equal to', 'is a member of',
          'is part of', '=', '=='

    • non-equality

          'ne', 'is not', 'is not equal to', 'is not a member of',
          'is not part of', 'is less than or greater than',
          'is less than or more than', 'is greater than or less than',
          'is more than or less than', 'does not equal', 'not',
          'not equal to ', 'not equals', '!=', '! =', '!==', '! =='

    • less than

          '<', 'lt', 'is less than', 'is fewer than', 'before'

    • greater than

          '>', 'gt', 'is more than', 'is greater than', 'after'

    • less than or equal to

          '<=', 'le', 'is less than or equal to',
          'is fewer than or equal to', 'on or before', 'before or on'

    • greater than or equal to

          '>=', 'ge', 'is more than or equal to', 'is greater than or equal to',
          'on or after', 'after or on'

    As long as the operator selects a string from the category desired, Data::Presenter will convert it internally in an appropriate manner.

  • @choices

    If the relationship being tested is one of equality or non-equality, then the operator may enter more than one value here, any one of which may satisfy the selection criterion.

        my ($column, $relation, @choices);
    
        $column = 'lastname';
        $relation = 'is';
        @choices = ('Smith', 'Jones');
        $dp1->select_rows($column, $relation, \@choices);

    If, however, the relationship being tested is one of 'less than', 'greater than', etc., then the operator should enter only one value, as the value is establishing a limit above or below which the selection criterion will not be met.

        $column = 'datebirth';
        $relation = 'before';
        @choices = ('01/01/1970');
        $dp1->select_rows($column, $relation, \@choices);
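
How Data::Presenter converts these phrases internally is not shown here. One plausible approach, sketched below under that caveat, is a lookup table which maps each accepted phrase to a canonical operator, followed by a test of the value against @choices. The table %canonical (holding only a handful of the phrases listed above) and the subroutine satisfies() are hypothetical names, and the sketch compares strings only.

```perl
use strict;
use warnings;

# Hypothetical lookup table: a few of the accepted phrases, each mapped
# to a canonical operator name.
my %canonical = (
    'eq' => 'eq', 'equals' => 'eq', 'is' => 'eq', '==' => 'eq',
    'ne' => 'ne', 'is not' => 'ne', '!=' => 'ne',
    'lt' => 'lt', 'before' => 'lt', 'is less than' => 'lt',
    'gt' => 'gt', 'after'  => 'gt', 'is more than' => 'gt',
);

# Return true if $value satisfies $relation with respect to @$choices.
sub satisfies {
    my ($value, $relation, $choices) = @_;
    my $op = $canonical{$relation}
        or die "Unrecognized relation: '$relation'";

    # Equality and non-equality may be tested against several choices.
    my $matches = grep { $value eq $_ } @$choices;   # count of matches
    return $matches > 0  if $op eq 'eq';
    return $matches == 0 if $op eq 'ne';

    # Ordering relations use a single limit: the first choice.
    my $limit = $choices->[0];
    return $op eq 'lt' ? $value lt $limit : $value gt $limit;
}

my $name_result = satisfies('Smith', 'is', ['Smith', 'Jones'])
    ? 'kept' : 'dropped';
my $date_result = satisfies('1956-01-13', 'before', ['1970-01-01'])
    ? 'kept' : 'dropped';
print "Smith: $name_result\n1956-01-13: $date_result\n";
```

The sketch also shows why an ordering relation needs only one value in @choices: any further values would simply be ignored.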

sort_by_column()

sort_by_column() takes only 1 argument: a reference to an array consisting of the fields the operator wishes to present in the final output, listed in the order in which those fields should be sorted. All elements of this array must be elements in @fields. The index field must always be included as one of the columns selected, though it may be placed last if it is not intrinsically important in the final output. sort_by_column() returns a reference to a hash of appropriately sorted data which will be used as input to Data::Presenter methods such as writeformat(), writeformat_plus_header() and writeHTML().

To illustrate:

    my @columns_selected = ('lastname', 'firstname', 'datebirth', 'cno');
    $sorted_data = $dp1->sort_by_column(\@columns_selected);

Suppose that the operator fails to include the index column in @columns_selected. This risks having two or more identical data entries, only the last of which would appear in the final output. As a safety precaution, sort_by_column() throws a warning and places duplicate entries in a text file called dupes.txt.
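
The risk can be seen in a small standalone sketch. It assumes, purely for illustration, that sorted records are stored in a hash keyed by the joined values of the selected columns; the '!' separator and the sample records are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records: [lastname, firstname, cno]; 'cno' is the index.
my @records = (
    ['VASQUEZ',   'JORGE',  456787],
    ['VASQUEZ',   'JORGE',  456790],   # same name, different person
    ['HERNANDEZ', 'HECTOR', 456791],
);

# Key each record on the selected columns only, omitting the index.
my (%by_key, @dupes);
for my $rec (@records) {
    my $key = join '!', @{$rec}[0, 1];     # lastname plus firstname
    push @dupes, $key if exists $by_key{$key};
    $by_key{$key} = $rec;                  # a later record replaces an earlier one
}

printf "%d records in, %d keys out, duplicate keys: %s\n",
    scalar(@records), scalar(keys %by_key), join(', ', @dupes);
```

Appending the index value to each key would make every key unique, which is why the index field should always be among the columns selected.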

Note: If you want your output to report only selected entries from the source, and if you want to apply one of the complex Data::Presenter output methods which require application of sort_by_column(), call select_rows() before calling sort_by_column(). Otherwise your report may contain blank lines.

print_to_screen() prints to standard output (generally, the computer monitor) a semicolon-delimited display of all entries in the object's current data structure. It takes no arguments and returns no values.

    $dp1->print_to_screen();

A typical line of output will look something like:

    VASQUEZ;JORGE;456787;LAVER;0105;1986-01-17;1956-01-13;

print_to_file() prints to an operator-specified file a semicolon-delimited display of all entries in the object's current data structure. It takes 1 argument -- the user-specified output file -- and returns no values.

    $outputfile = 'census01.txt';
    $dp1->print_to_file($outputfile);

A typical line of output will look exactly like that produced by print_to_screen.

print_with_delimiter(), like print_to_file(), prints to an operator-specified file. print_with_delimiter() allows the operator to specify the character pattern which will be used to delimit display of all entries in the object's current data structure. It does not print the delimiter after the final field in a particular data record. It takes 2 arguments -- the user-specified output file and the character pattern to be used as delimiter -- and returns no values.

    $outputfile = 'delimited01.txt';
    $delimiter = '|||';
    $dp1->print_with_delimiter($outputfile, $delimiter);

The file created by print_with_delimiter() is designed to be used as an input to functions such as 'Convert text to tabs' or 'Convert text to table' found in commercial word processing programs. Such functions require delimiter characters in the input. A typical line of output will look something like:

    VASQUEZ|||JORGE|||456787|||LAVER|||0105|||1986-01-17|||1956-01-13
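
The difference between the two line shapes (a semicolon after every field versus a delimiter only between fields) can be reproduced with Perl's join. This is a standalone sketch using the sample record shown above, not the module's own code.

```perl
use strict;
use warnings;

my @record = qw(VASQUEZ JORGE 456787 LAVER 0105 1986-01-17 1956-01-13);

# Semicolon after every field, including the last: the shape shown
# for print_to_screen() and print_to_file().
my $semicolon_line = join '', map { "$_;" } @record;

# Delimiter only between fields: the shape shown for
# print_with_delimiter().
my $delimiter      = '|||';
my $delimited_line = join $delimiter, @record;

print "$semicolon_line\n$delimited_line\n";
```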

full_report()

full_report() prints to an operator-specified file each entry in the object's current data structure, sorted by the index and explicitly naming each field name/field value pair. It takes 1 argument -- the user-specified output file -- and returns no values.

    $outputfile = 'report01.txt';
    $dp1->full_report($outputfile);

The output for a given entry will look something like:

    456787
        lastname                VASQUEZ
        firstname               JORGE
        cno                     456787
        unit                    LAVER
        ward                    0105
        dateadmission           1986-01-17
        datebirth               1956-01-13
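
A report of this shape can be approximated with sprintf. The sketch below is an illustration only: the 24-character label width, the shortened field list and the second record are assumptions, not values taken from the module.

```perl
use strict;
use warnings;

my @fields  = qw(lastname firstname cno unit);
my %records = (
    456787 => ['VASQUEZ', 'JORGE',    456787, 'LAVER'],
    456788 => ['VASQUEZ', 'LEONARDO', 456788, 'LAVER'],
);

my $report = '';
for my $key (sort keys %records) {           # entries sorted by index
    $report .= "$key\n";
    for my $i (0 .. $#fields) {
        # Field name left-justified in 24 columns, then the field value.
        $report .= sprintf("    %-24s%s\n", $fields[$i], $records{$key}[$i]);
    }
}
print $report;
```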

writeformat()

writeformat() writes data via Perl's formline function -- the function which internally powers Perl formats -- to an operator-specified file. writeformat() takes a list of 3 key-value pairs:

    $dp1->writeformat(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
    );
  • sorted

    The value of sorted is a hash reference which is the return value of sort_by_column(). Hence, writeformat() can only be called once sort_by_column() has been called.

  • columns

    The value of columns is a reference to the array of fields in the data source selected for presentation in the output file. It is the same variable which is used as the argument to sort_by_column().

  • file

    The value of file is the name of a file arbitrarily selected by the operator to hold the output of writeformat().

Using the ''census'' example from above, the overall sequence of code needed to use writeformat() would be:

    $outputfile = 'format01.txt';
    @columns_selected = ('lastname', 'firstname', 'datebirth', 'cno');
    $sorted_data = $dp1->sort_by_column(\@columns_selected);

    $dp1->writeformat(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
    );

The result of the above call would be a file named format01.txt containing:

    HERNANDEZ     HECTOR     1963-08-01 456791
    VASQUEZ       ADALBERTO  1973-08-17 786792
    VASQUEZ       ALBERTO    1953-02-28 906786

The columnar appearance of the data is governed by choices made by the administrator within the configuration file (here, within fields.census.data). The choice of columns themselves is controlled by the operator via \@columns_selected.
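
For readers unfamiliar with formline, here is a minimal standalone sketch of how it builds one aligned line at a time in the format accumulator $^A. The column widths in @widths are assumptions chosen for this example; in Data::Presenter the widths come from the configuration file.

```perl
use strict;
use warnings;

# Build a picture line: '@' opens a field and each '<' widens it by one
# character, left-justified. The widths are hypothetical.
my @widths  = (14, 10, 10, 6);
my $picture = join(' ', map { '@' . '<' x ($_ - 1) } @widths) . "\n";

my @rows = (
    ['HERNANDEZ', 'HECTOR',    '1963-08-01', 456791],
    ['VASQUEZ',   'ADALBERTO', '1973-08-17', 786792],
);

my $output = '';
for my $row (@rows) {
    $^A = '';                       # clear the format accumulator
    formline($picture, @$row);      # append the formatted line to $^A
    $output .= $^A;
}
$^A = '';                           # leave the accumulator empty

print $output;
```

Each value is padded (or truncated) to the width of its field, which is what produces the columnar appearance.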

writeformat_plus_header()

writeformat_plus_header() writes data via Perl formats to an operator-specified file and writes a Perl format header to that file as well. writeformat_plus_header() takes a list of 4 key-value pairs. Three of these pairs are the same as in writeformat(); the fourth is:

  • title

            title       => $title,

    title holds text chosen by the operator.

The complete call to writeformat_plus_header() looks like this:

    $title = 'Hospital Census Report';
    @columns_selected = (
        'unit', 'ward', 'lastname', 'firstname',
        'datebirth', 'dateadmission', 'cno');
    $sorted_data = $dp1->sort_by_column(\@columns_selected);

    $dp1->writeformat_plus_header(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        title       => $title,
    );

and will produce a header and formatted data like this:

    Hospital Census Report

                                          Date       Date of
    Unit   Ward Last Name      First Name of Birth   Admission  C No.
    ------------------------------------------------------------------
    LAVER  0105 VASQUEZ        JORGE      1956-01-13 1986-01-17 456787
    LAVER  0107 VASQUEZ        LEONARDO   1970-15-23 1990-08-23 456788
    SAMSON 0209 VASQUEZ        JOAQUIN    1970-03-25 1990-11-14 456789

The wording of the column headers is governed by choices made by the administrator within the configuration file (here, within fields.census.data). If a particular word in a column header is too long to fit in the space allocated, it will be truncated.

writeformat_with_reprocessing()

writeformat_with_reprocessing() is an advanced application of Data::Presenter and the reader may wish to skip this section until other parts of the module have been mastered.

writeformat_with_reprocessing() permits a sophisticated administrator to activate ''last minute'' substitutions in the strings printed out from the format accumulator variable $^A. Suppose, for example, that a school administrator faced the problem of scheduling classes in different classrooms and in various time slots. Suppose further that, for ease of programming or data entry, the time slots were identified by chronologically sequential numbers and that instructors were identified by a unique ID built up from their first and last names. Applying an ordinary writeformat() to such data might show output like this:

    11 Arithmetic                   Jones        4044 4044_11
    11 Language Studies             WilsonT      4054 4054_11
    12 Bible Study                  Eliade       4068 4068_12
    12 Introduction to Computers    Knuth        4086 4086_12
    13 Psychology                   Adler        4077 4077_13
    13 Social Science               JonesT       4044 4044_13
    51 World History                Wells        4052 4052_51
    51 Music Appreciation           WilsonW      4044 4044_51

where 11 mapped to 'Monday, 9:00 am', 12 to 'Monday, 10:00 am', 51 to 'Friday, 9:00 am' and so forth and where the fields underlying this output were 'timeslot', 'classname', 'instructor', 'room' and 'sessionID'. While this presentation is useful, a client might wish to have the time slots and instructor IDs decoded for more readable output:

    Monday, 9:00     Arithmetic                 E Jones        4044 4044_11
    Monday, 9:00     Language Studies           T Wilson       4054 4054_11
    Monday, 10:00    Bible Study                M Eliade       4068 4068_12
    Monday, 10:00    Introduction to Computers  D Knuth        4086 4086_12
    Monday, 11:00    Psychology                 A Adler        4077 4077_13
    Monday, 11:00    Social Science             T Jones        4044 4044_13
    Friday, 9:00     World History              H Wells        4052 4052_51
    Friday, 9:00     Music Appreciation         W Wilson       4044 4044_51

Time slots coded with chronologically sequential numbers can be ordered to sort numerically in the %parameters established in the fields.[package1].data file corresponding to a particular Data::Presenter::[package1]. Their human-language equivalents, however, will not sort properly, as, for example, 'Friday' comes before 'Monday' in an alphabetical or ASCII-betical sort. Clearly, it would be desirable to establish the sorting order by relying on the chronologically sequential time slots and yet have the printed output reflect more human-readable days of the week and times. Analogously, for the instructor we might wish to display the first initial and last name in our printed output rather than his/her ID code.

The order in which data records appear in output is determined by sort_by_column() before writeformat() is called. How can we preserve this order in the final output?

Answer: After we have stored a given formed line in $^A, we reprocess that line by calling an internal subroutine defined in the invoking class, Data::Presenter::[package1]::_reprocessor(), which tells Perl to splice out certain portions of the formed line and substitute more human-readable copy. The information needed to make _reprocessor() work comes from two places.

First, from a hash passed by reference as an argument to writeformat_with_reprocessing(). writeformat_with_reprocessing() takes a list of four key-value pairs, the first three of which are the same as those passed to writeformat(). The fourth key-value pair to writeformat_with_reprocessing() is a reference to a hash whose keys are the names of the fields in the data records where we wish to make substitutions and whose corresponding values are the number of characters the field will be allocated after substitution. The call to writeformat_with_reprocessing() would therefore look like this:

    %reprocessing_info = (
        timeslot    => 17,
        instructor  => 15,
    );

    $dp1->writeformat_with_reprocessing(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        reprocess   => \%reprocessing_info,
    );

Second, writeformat_with_reprocessing() takes advantage of the fact that Data::Presenter's package global hash %reserved contains four keys -- fields, parameters, index and options -- only the first three of which are used in Data::Presenter's constructor or sorting methods. Early in the development of Data::Presenter the keyword options was deliberately left unused so as to be available for future use.

The sophisticated administrator can make use of the options key to store metadata in a variety of ways. In writing Data::Presenter::[package1]::_init(), the administrator prepares the way for last-minute reprocessing by creating an options key in the hash to be blessed into the Data::Presenter::[package1]() object. The value corresponding to the key options is itself a hash with two elements keyed by subs and sources. If $dp1 is the object and %data is the hash blessed into the object, then we are looking at these two elements:

    $data{options}{subs}
    $data{options}{sources}

The values corresponding to these two keys are references to yet more hashes. The hash which is the value for $data{options}{subs} has keys which are the names of subroutines, each of which is built up from the string reprocess_ concatenated with the name of the field to be reprocessed, e.g.

    $data{options}{subs} = {
        reprocess_timeslot      => 1,
        reprocess_instructor    => 1,
    };

These field-specific internal reprocessing subroutines may be defined by the administrator in Data::Presenter::[package1]() or they may be imported from some other module. writeformat_with_reprocessing() verifies that these subroutines are actually present in Data::Presenter::[package1]() regardless of where they were originally found.

What about $data{options}{sources}? This location stores all the original data from which substitutions are made. Example:

    $data{options}{sources} = {
        timeslot   => {
            11 => ['Monday', '9:00 am'  ],
            12 => ['Monday', '10:00 am' ],
            13 => ['Monday', '11:00 am' ],
            51 => ['Friday', '9:00 am'  ],
        },
        instructor => {
            'Jones'     => ['Jones',  'E' ],
            'WilsonT'   => ['Wilson', 'T' ],
            'Eliade'    => ['Eliade', 'M' ],
            'Knuth'     => ['Knuth',  'D' ],
            'Adler'     => ['Adler',  'A' ],
            'JonesT'    => ['Jones',  'T' ],
            'Wells'     => ['Wells',  'H' ],
            'WilsonW'   => ['Wilson', 'W' ],
        }
    };
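
To make the idea concrete, here is a standalone sketch of how field-specific subroutines might draw on such lookup data after the sort order has been fixed. The bodies of reprocess_timeslot() and reprocess_instructor(), the %sources hash and the use of sprintf in place of formline are all assumptions for illustration; the signatures which Data::Presenter actually expects are not shown here.

```perl
use strict;
use warnings;

# A fragment of the lookup data shown above.
my %sources = (
    timeslot   => { 11 => ['Monday', '9:00 am'], 51 => ['Friday', '9:00 am'] },
    instructor => { 'Jones' => ['Jones', 'E'], 'WilsonW' => ['Wilson', 'W'] },
);

# Hypothetical field-specific subroutines: 'reprocess_' plus field name.
sub reprocess_timeslot {
    my ($code) = @_;
    my ($day, $time) = @{ $sources{timeslot}{$code} };
    return "$day, $time";
}

sub reprocess_instructor {
    my ($id) = @_;
    my ($last, $initial) = @{ $sources{instructor}{$id} };
    return "$initial $last";
}

# The records are already in final order (sorted on the numeric time
# slot); substitution happens only as each output line is produced.
my @sorted = (
    [11, 'Arithmetic',         'Jones'  ],
    [51, 'Music Appreciation', 'WilsonW'],
);

my @lines;
for my $rec (@sorted) {
    my ($slot, $class, $instructor) = @$rec;
    push @lines, sprintf('%-17s %-20s %-15s',
        reprocess_timeslot($slot), $class, reprocess_instructor($instructor));
}
print "$_\n" for @lines;
```

The widths 17 and 15 echo the values in %reprocessing_info: the substituted text is longer than the code it replaces, so the field needs a new allocation.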

The point at which this data gets into the object is, of course, Data::Presenter::[package1]::_init(). What the administrator does at that point is limited only by his/her imagination. Data::Presenter seeks to bless a hash into its object. That hash must meet the following requirements:

  • With the exception of elements holding metadata, each element holds an array, each of whose elements must be a number or a string.

  • Three metadata elements keyed as follows must be present:

    • fields

    • parameters

    • index

    The fourth metadata element keyed by options is required only if some Data::Presenter method has been written which requires the information stored therein. writeformat_with_reprocessing() is the only such method currently present, but additional methods using the options key may be added in the future.

The author has used two different approaches to the problem of initializing Data::Presenter::[package1] objects.

  • In the first, more standard approach, the name of a source file can be passed to the constructor, which passes it on to the initializer, which then opens a filehandle to the file and processes with regular expressions, unpack, etc. to build an array for each data record. Keyed by a unique ID, a reference to this array then becomes the value of an element of the hash which, once metadata is added, is blessed into the Data::Presenter::[package1] object. The source for the metadata is the fields.[package1].data file and the @fields, %parameters and $index found therein.

  • A second approach asks: ''Instead of having _init() do data munging on a file, why not directly pass it a hash of arrays? Better still, why not pass it a hash of arrays which already has an 'options' key defined? And better still yet, why not pass it an object produced by some other Perl module and containing a blessed hash of arrays with an already defined options key?'' In this approach, Data::Presenter::[package1]::_init() does no data munging. It is mainly concerned with defining the three required metadata elements.
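
The requirements on the blessed hash can be summarized in a standalone sketch. The class name My::Presenter::Sketch and its constructor are hypothetical and stand in for a real Data::Presenter::[package1]; only the shape of the hash follows the description above.

```perl
use strict;
use warnings;

package My::Presenter::Sketch;    # hypothetical class name

sub new {
    my ($class, $records, $fields, $parameters, $index) = @_;
    my %data = %$records;              # index value => array ref of field values
    $data{fields}     = $fields;       # the three required
    $data{parameters} = $parameters;   # metadata elements
    $data{index}      = $index;
    return bless \%data, $class;
}

package main;

my @fields     = qw(lastname firstname cno);
my %parameters = (
    lastname  => [14, 'U', 'a', 'Last Name'],
    firstname => [10, 'U', 'a', 'First Name'],
    cno       => [ 7, 'U', 'n', 'C No.'],
);

my $obj = My::Presenter::Sketch->new(
    { 456787 => ['VASQUEZ', 'JORGE', 456787] },
    \@fields, \%parameters, 2,
);

# Every key which is not a reserved metadata key identifies a record.
my %reserved    = map { $_ => 1 } qw(fields parameters index options);
my @record_keys = grep { !$reserved{$_} } sort keys %$obj;
print "records: @record_keys\n";
```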

writeformat_deluxe()

writeformat_deluxe() is an advanced application of Data::Presenter and the reader may wish to skip this section until other parts of the module have been mastered.

writeformat_deluxe() enables the user to have both column headers (as in writeformat_plus_header()) and dynamic, 'just-in-time' reprocessing of data in selected fields (as in writeformat_with_reprocessing()). Call it just as you would writeformat_with_reprocessing(), but add a key-value pair keyed by title.

    %reprocessing_info = (
        timeslot    => 17,
        instructor  => 15,
    );

    $dp1->writeformat_deluxe(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        reprocess   => \%reprocessing_info,
        title       => $title,
    );

writedelimited()

The Data::Presenter::writeformat...() family of methods discussed above write data to plain-text files in columns aligned with whitespace via Perl's formline function -- the function which internally powers Perl formats. This is suitable if the ultimate consumer of the data is satisfied to read a plain-text file. However, in many business contexts data consumers are more accustomed to word processing files than to plain-text files. In particular, data consumers are accustomed to data presented in tables created by commercial word processing programs. Such programs generally have the capacity to take text in which individual lines consist of data separated by delimiter characters such as tabs or commas and transform that text into rows in a table where the delimiters signal the borders between table cells.

To that end, the author has created the Data::Presenter::writedelimited...() family of subroutines to print output to plain-text files intended for further processing within word processing programs. The simplest method in this family, writedelimited(), takes a list of three key-value pairs:

  • sorted

    The value keyed by sorted is a hash reference which is the return value of sort_by_column(). Hence, writedelimited() can only be called once sort_by_column() has been called.

  • file

    The value keyed by file is the name of a file arbitrarily selected by the operator to hold the output of writedelimited().

  • delimiter

    The value keyed by delimiter is the user-selected delimiter character or characters which will delineate fields within an individual record in the output file. Typically, this character will be a tab (\t), comma (,) or similar character that a word processing program's 'convert text to table' feature can use to establish columns.

Using the ''census'' example from above, the overall sequence of code needed to use writedelimited() would be:

    @columns_selected = ('lastname', 'firstname', 'datebirth', 'cno');
    $sorted_data = $dp1->sort_by_column(\@columns_selected);

    $dp1->writedelimited(
        sorted      => $sorted_data,
        file        => $outputfile,
        delimiter   => $delimiter,
    );

Note that, unlike writeformat(), writedelimited() does not require a reference to @columns_selected to be passed as an argument.

Depending on the number of characters in a text editor's tab-stop setting, the result of the above call might look like:

    HERNANDEZ    HECTOR    1963-08-01    456791
    VASQUEZ    ADALBERTO    1973-08-17    786792
    VASQUEZ    ALBERTO 1953-02-28    906786

This is obviously less readable than the output of writeformat() -- but since the output of writedelimited() is intended for further processing by a word processing program rather than for final use, this is not a major concern.

writedelimited_plus_header()

Just as writeformat_plus_header() extended writeformat() to include column headers, writedelimited_plus_header() extends writedelimited() to include column headers, separated by the same delimiter character as the data, in a plain-text file intended for further processing by a word processing program.

writedelimited_plus_header() takes a list of four key-value pairs: sorted, columns, file, and delimiter. The complete call to writedelimited_plus_header looks like this:

    @columns_selected = (
        'unit', 'ward', 'lastname', 'firstname',
        'datebirth', 'dateadmission', 'cno');
    $sorted_data = $dp1->sort_by_column(\@columns_selected);

    $dp1->writedelimited_plus_header(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        delimiter   => $delimiter,
    );

Note that, unlike writeformat_plus_header(), writedelimited_plus_header() does not take $title as an argument. It is felt that any title would be more likely to be supplied in the word-processing file which ultimately holds the data prepared by writedelimited_plus_header() and that its inclusion at this point might interfere with the workings of the word processing program's 'convert text to table' feature.

Depending on the number of characters in a text editor's tab-stop setting, the result of the above call might look like:

                Date    Date of
    Unit    Ward    Last Name    First Name    of Birth       Admission    C No.
    LAVER    0105    VASQUEZ JORGE    1956-01-13    1986-01-17     456787
    LAVER    0107    VASQUEZ LEONARDO    1970-15-23    1990-08-23   456788
    SAMSON    0209    VASQUEZ JOAQUIN 1970-03-25    1990-11-14     456789

Again, the readability of the delimited copy in the plain-text file here is not as important as how correctly the delimiter has been chosen in order to produce good results once the file is further processed by a word processing program.

Note that, unlike writeformat_plus_header(), writedelimited_plus_header() does not produce a hyphen line. The author feels that the separation of header and body within the table is here better handled within the word processing file which ultimately holds the data prepared by writedelimited_plus_header().

Note further that, unlike writeformat_plus_header(), writedelimited_plus_header() does not truncate the words in column headers. This is because the writedelimited...() family of methods does not impose a maximum width on output fields as does the writeformat...() family of methods. Hence, there is no need to truncate headers to fit within specified column widths. Column widths in the writedelimited...() family are ultimately determined by the word processing program which produces the final output.

writedelimited_with_reprocessing()

writedelimited_with_reprocessing() is an advanced application of Data::Presenter and the reader may wish to skip this section until other parts of the module have been mastered.

writedelimited_with_reprocessing(), like writeformat_with_reprocessing(), permits a sophisticated administrator to activate ''last minute'' substitutions in strings to be printed such that substitutions do not affect the pre-established sorting order. For a full discussion of the rationale for this feature, see the discussion of "writeformat_with_reprocessing()" above.

writedelimited_with_reprocessing() takes a list of five key-value pairs, four of which are the same arguments passed to writedelimited_plus_header(). The fifth key-value pair is a reference to an array holding a list of those columns selected for output upon which the user chooses to perform reprocessing.

    @reprocessing_info = qw( instructor timeslot room );

    $dp1->writedelimited_with_reprocessing(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        delimiter   => $delimiter,
        reprocess   => \@reprocessing_info,
    );

Taking the classroom scheduling problem presented above, writedelimited_with_reprocessing() would produce output looking something like this:

    Monday, 9:00    Arithmetic    E Jones 4044    4044_11
    Monday, 9:00    Language Studies    T Wilson    4054   4054_11
    Monday, 10:00    Bible Study    M Eliade    4068    4068_12
    Monday, 10:00    Introduction to Computers    D Knuth 4086   4086_12
    Monday, 11:00    Psychology    A Adler 4077    4077_13
    Monday, 11:00    Social Science    T Jones 4044    4044_13
    Friday, 9:00    World History    H Wells 4052    4052_51
    Friday, 9:00    Music Appreciation    W Wilson    4044   4044_51

Usage of writedelimited_with_reprocessing() requires that the administrator appropriately define Data::Presenter::[Package1]::_reprocess_delimit() and Data::Presenter::[Package1]::_init() subroutines in the invoking package, along with appropriate subroutines specific to each argument capable of being reprocessed. Again, see the discussion in "writeformat_with_reprocessing()".

writedelimited_deluxe()

writedelimited_deluxe() is an advanced application of Data::Presenter and the reader may wish to skip this section until other parts of the module have been mastered.

writedelimited_deluxe() completes the parallel structure between the writeformat...() and writedelimited...() families of Data::Presenter methods by enabling the user to have both column headers (as in writedelimited_plus_header()) and dynamic, 'just-in-time' reprocessing of data in selected fields (as in writedelimited_with_reprocessing()). Except for the name of the method called, the call to writedelimited_deluxe() is the same as for writedelimited_with_reprocessing():

    @reprocessing_info = qw( instructor timeslot );

    $dp1->writedelimited_deluxe(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $outputfile,
        delimiter   => $delimiter,
        reprocess   => \@reprocessing_info,
    );

Using the classroom scheduling example from above, the output from writedelimited_deluxe() might look like this:

    Timeslot    Group    Instructor    Room    GroupID
    Monday, 9:00    Arithmetic    E Jones 4044    4044_11
    Monday, 9:00    Language Studies    T Wilson    4054   4054_11
    Monday, 10:00    Bible Study    M Eliade    4068    4068_12
    Monday, 10:00    Introduction to Computers    D Knuth 4086   4086_12
    Monday, 11:00    Psychology    A Adler 4077    4077_13
    Monday, 11:00    Social Science    T Jones 4044    4044_13
    Friday, 9:00    World History    H Wells 4052    4052_51
    Friday, 9:00    Music Appreciation    W Wilson    4044   4044_51

As with writedelimited_with_reprocessing(), writedelimited_deluxe() requires careful preparation on the part of the administrator. See the discussion under "writeformat_with_reprocessing()" above.

writeHTML()

In its current formulation, writeHTML() works very much like writeformat_plus_header(). It writes data to an operator-specified HTML file and writes an appropriate header to that file as well. writeHTML() takes the same 4 arguments as writeformat_plus_header(): $sorted_data, \@columns_selected, $outputfile and $title. The body of the resulting HTML file is more similar to a Perl format than to an HTML table. (This may be upgraded to a true HTML table in a future release.)

    $dp1->writeHTML(
        sorted      => $sorted_data,
        columns     => \@columns_selected,
        file        => $HTMLoutputfile,  # must have .html extension
        title       => $title,
    );

Data::Presenter::Combo Objects

It is quite possible that we may have two or more different database reports which present data on the same underlying universe or population. If these reports share a common index field which can be used to uniquely identify each entry in the underlying population, then we would like to be able to combine these sources, manipulate the data and re-output them via the simple and complex Data::Presenter output methods described in the "Synopsis" above.

In other words, if we have already created

    my $dp1 = Data::Presenter::[Package1]->new(
        $sourcefile, \@fields,\%parameters, $index);
    my $dp2 = Data::Presenter::[Package2]->new(
        $sourcefile, \@fields,\%parameters, $index);
    ...
    my $dpx = Data::Presenter::[PackageX]->new(
        $sourcefile, \@fields,\%parameters, $index);

we would like to be able to define an array of the objects we have created and construct a new object combining them in an orderly manner:

    my @objects = ($dp1, $dp2, ... $dpx);
    my $dpC = Data::Presenter::[some subclass]->new(\@objects);

We would then like to be able to call all the Data::Presenter sorting, selecting and output methods discussed above on $dpC without having to re-specify $sourcefile, \@fields, \%parameters or $index.

Can we do this? Yes, we can. More precisely, we can create two new types of objects: one in which the data entries comprise those entries found in each of the original sources, and one in which the data entries comprise those found in any of the sources. In mathematical terms, we can create either a new object which represents the intersection of the sources or one which represents the union of the sources. We call these as follows:

    my $dpI = Data::Presenter::Combo::Intersect->new(\@objects);

and

    my $dpU = Data::Presenter::Combo::Union->new(\@objects);
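
The distinction between the two can be illustrated with plain hashes keyed by the shared index. This is a sketch of the set logic only, with hypothetical data; it does not show how the Combo classes merge the fields of matching entries.

```perl
use strict;
use warnings;

# Two hypothetical sources, each keyed by the shared index ('cno').
my %source1 = (
    456787 => ['VASQUEZ',   'JORGE' ],
    456791 => ['HERNANDEZ', 'HECTOR'],
);
my %source2 = (
    456787 => ['1986-01-17'],
    786792 => ['1990-08-23'],
);

# Count the number of sources in which each index value appears.
my %seen;
$seen{$_}++ for keys(%source1), keys(%source2);

my $source_count = 2;
my @union        = sort keys %seen;
my @intersection = grep { $seen{$_} == $source_count } @union;

print "intersection: @intersection\n";
print "union:        @union\n";
```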

Note the following:

  • For Combo objects, unlike all other Data::Presenter::[Package1] objects, we pass only one variable -- a reference to an array of Data::Presenter objects -- to the constructor instead of four.

  • Combo objects are always called from a subclass of Data::Presenter::Combo such as Data::Presenter::Combo::Intersect or Data::Presenter::Combo::Union. They are not called from Data::Presenter::Combo itself.

  • The regular Data::Presenter objects which are selected to make up a Data::Presenter::Combo object must share a field which serves as the index field for each object. This field must carry the same name in @fields in the fields.XXX.data configuration files corresponding to each of the objects, though that field does not have to appear in the same element position in @fields in each such file. Similarly, the parameters on the value side of %parameters for the index field must be specified identically in each configuration file. If these conditions are not met, a Data::Presenter::Combo object cannot be constructed and the program will die with an error message.

    Let us illustrate this point. Suppose that we have two configuration files, fields1.data and fields2.data, corresponding to two different Data::Presenter objects, $obj1 and $obj2. For fields1.data, we have:

        @fields = qw(lastname firstname cno);
    
        %parameters = (
            $fields[0]        => [14, 'U', 'a', 'Last Name'],
            $fields[1]        => [10, 'U', 'a', 'First Name'],
            $fields[2]        => [ 7, 'U', 'n', 'C No.'],
        );
    
        $index = 2;

    For fields2.data, we have:

        @fields = qw(cno dateadmission datebirth);
    
        %parameters = (
            $fields[0]        => [ 7, 'U', 'n', 'C No.'],
            $fields[1]        => [10, 'U', 'a', 'Date of Admission'],
            $fields[2]        => [10, 'U', 'a', 'Date of Birth'],
        );
    
        $index = 0;

    Can $obj1 and $obj2 be combined into a Data::Presenter::Combo object? Yes, they can. cno is named as the index field in each configuration file, and the values assigned to $parameters{$fields[$index]} in each are identical: [ 7, 'U', 'n', 'C No.'].

    Suppose, however, that we had a third configuration file, fields3.data, corresponding to yet another Data::Presenter object, $obj3. If the contents of fields3.data were:

        @fields = qw(cno dateadmission datebirth);
    
        %parameters = (
            $fields[0]        => [ 7, 'U', 'n', 'Serial No.'],
            $fields[1]        => [10, 'U', 'a', 'Date of Admission'],
            $fields[2]        => [10, 'U', 'a', 'Date of Birth'],
        );
    
        $index = 0;

    then $obj3 could not be combined with either $obj1 or $obj2 because the elements of $parameters{$fields[$index]} in $obj3 are not identical to those in the first two objects.
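
The compatibility conditions described above can be sketched as a small stand-alone check. This is only an illustration under stated assumptions: the %config1, %config2 and %config3 hashes bundle the @fields, %parameters and $index values from the three example configuration files, and the helpers same_parameters() and can_combine() are hypothetical names, not functions provided by Data::Presenter:

```perl
use strict;
use warnings;

# Hypothetical helper: true if two array refs hold the same elements
# in the same order (compared as strings).
sub same_parameters {
    my ( $p1, $p2 ) = @_;
    return 0 unless @{$p1} == @{$p2};
    for my $i ( 0 .. $#{$p1} ) {
        return 0 unless $p1->[$i] eq $p2->[$i];
    }
    return 1;
}

# Hypothetical check: the index fields must carry the same name and
# have identical parameter lists, though their positions may differ.
sub can_combine {
    my ( $c1, $c2 ) = @_;
    my $name1 = $c1->{fields}[ $c1->{index} ];
    my $name2 = $c2->{fields}[ $c2->{index} ];
    return 0 unless $name1 eq $name2;
    return same_parameters( $c1->{parameters}{$name1},
                            $c2->{parameters}{$name2} );
}

# The three example configurations, bundled into hashes for this sketch.
my %config1 = (
    fields     => [qw(lastname firstname cno)],
    index      => 2,
    parameters => {
        lastname  => [ 14, 'U', 'a', 'Last Name' ],
        firstname => [ 10, 'U', 'a', 'First Name' ],
        cno       => [ 7,  'U', 'n', 'C No.' ],
    },
);
my %config2 = (
    fields     => [qw(cno dateadmission datebirth)],
    index      => 0,
    parameters => {
        cno           => [ 7,  'U', 'n', 'C No.' ],
        dateadmission => [ 10, 'U', 'a', 'Date of Admission' ],
        datebirth     => [ 10, 'U', 'a', 'Date of Birth' ],
    },
);
my %config3 = (
    fields     => [qw(cno dateadmission datebirth)],
    index      => 0,
    parameters => {
        cno           => [ 7,  'U', 'n', 'Serial No.' ],
        dateadmission => [ 10, 'U', 'a', 'Date of Admission' ],
        datebirth     => [ 10, 'U', 'a', 'Date of Birth' ],
    },
);

print "1 and 2: ", ( can_combine( \%config1, \%config2 ) ? "yes" : "no" ), "\n";
print "1 and 3: ", ( can_combine( \%config1, \%config3 ) ? "yes" : "no" ), "\n";
```

The first pair passes because cno has the same parameter list in both configurations, even though it sits at a different position in @fields. The second pair fails because the fourth parameter differs ('C No.' versus 'Serial No.').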

Here are some things to consider in using Data::Presenter::Combo objects:

  • Q: What happens if $dp1 has entries not found in $dp2 (or vice versa)?

    A: It depends on whether you are interested in only those entries found in each of the data sources (the mathematical intersection of the sources) or those found in any of the sources (the mathematical union). Only those entries found in both $dp1 and $dp2 are included in a Data::Presenter::Combo::Intersect object. But if you are constructing a Data::Presenter::Combo::Union object, any entry found in either source file will be represented in the Union object. These properties would hold no matter how many sources you used as arguments.

  • Q: What happens if both $dp1 and $dp2 have fields named, for instance, 'lastname'?

    A: Left-to-right precedence determines which object's 'lastname' field is entered into $dpC. Assuming that $dp1 is listed first in @objects, all the fields in $dp1 will appear in $dpC. Only those fields in $dp2 not found in $dp1 will be added to $dpC. If, however, @objects were defined as ($dp2, $dp1), then $dp2's fields would have precedence over those of $dp1. If a third object, $dp3, built from yet another data source were added to @objects, only those of its fields not found in $dp1 or $dp2 would be included in the Combo object -- and so forth. This left-to-right precedence rule governs both the data entries in $dpC and its selection, sorting and output characteristics.
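
The left-to-right precedence rule can be sketched for a single entry as follows. The records and the helper merge_left_to_right() are hypothetical and exist only for this illustration; the sketch shows the rule as described above, not the module's actual implementation:

```perl
use strict;
use warnings;

# Hypothetical records for one entry (same index value) in three sources,
# listed in the order in which the sources would appear in @objects.
my @records = (
    { cno => 102, lastname => 'JONES', firstname => 'MARY' },
    { cno => 102, lastname => 'JONES-SMITH', dateadmission => '2001-04-17' },
    { cno => 102, firstname => 'M.', datebirth => '1970-01-01' },
);

# Merge records so that the earliest source to supply a field wins.
sub merge_left_to_right {
    my @sources = @_;
    my %merged;
    for my $rec (@sources) {
        for my $field ( keys %{$rec} ) {
            # Keep the first value seen; later sources only add new fields.
            $merged{$field} = $rec->{$field} unless exists $merged{$field};
        }
    }
    return \%merged;
}

my $combo = merge_left_to_right(@records);
print "$_ => $combo->{$_}\n" for sort keys %{$combo};
```

Here lastname and firstname come from the first record, while dateadmission and datebirth are contributed by the second and third. Passing reverse(@records) instead would give the third record's firstname and the second record's lastname precedence.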

BUGS

It was discovered that in versions 0.68 and earlier, sort_by_column() failed to sort data properly in descending order. This has been fixed. See Changes.

REFERENCES

The fundamental reference for this program is, of course, the Camel book: Larry Wall, Tom Christiansen, Jon Orwant. Programming Perl, 3rd ed. O'Reilly & Associates, 2000, http://www.oreilly.com/catalog/pperl3/.

A careful reading of the code will tell any competent Perl hacker that many tricks were taken from the Ram book: Tom Christiansen & Nathan Torkington. Perl Cookbook. O'Reilly & Associates, 1998, http://www.oreilly.com/catalog/cookbook/.

The object-oriented programming skills needed to develop this program were learned via extensive re-reading of Chapters 3, 6 and 7 of Damian Conway's Object Oriented Perl. Manning Publications, 2000, http://www.manning.com/Conway/index.html.

This program goes to great lengths to follow the principle of 'Repeated Code is a Mistake' http://www.perl.com/pub/a/2000/11/repair3.html -- a specific application of the general Perl principle of Laziness. The author grasped this principle best following a 2001 talk by Mark-Jason Dominus http://perl.plover.com/ to the New York Perlmongers http://ny.pm.org/.

Most of the code in the _init() subroutines was written before the author read Data Munging with Perl http://www.manning.com/cross/index.html by Dave Cross. Nonetheless, that is an excellent discussion of the problems involved in understanding the structure of data sources.

The discussion of bugs in this program benefitted from discussions on the Perl Seminar New York mailing list http://groups.yahoo.com/group/perlsemny, particularly with Martin Heinsdorf.

Correcting the bug involving sorting in descending order entailed rewriting much of the code. This rewrite was greatly assisted by brian d foy and Tanktalus in the Perlmonks thread "Building a sorting subroutine on the fly" (http://perlmonks.org/?node_id=512460).

AUTHOR

James E. Keenan (jkeenan@cpan.org).

Creation date: October 25, 2001. Last modification date: February 10, 2008. Copyright (c) 2001-5 James E. Keenan. United States. All rights reserved.

All data presented in this documentation or in the sample files in the archive accompanying this documentation are dummy copy. The data was entirely fabricated by the author for heuristic purposes. Any resemblance to any person, living or dead, is coincidental.

This is free software which you may distribute under the same terms as Perl itself.