NAME

Data::Reconciliation::Rule - Perl extension for data reconciliation

SYNOPSIS

   use Data::Reconciliation::Rule;

   my $r = Data::Reconciliation::Rule->new($table_1, $table_2);  # two Data::Table objects

   $r->identification(\@field_names_1,
                      \&canonical_1,
                      \@field_names_2,
                      \&canonical_2);

   $r->add_comparison(\@field_names_1,
                      \&canon_sub_1,
                      \@field_names_2,
                      \&canon_sub_2,
                      \&compare_sub,
                      $compare_sub_name,
                      \@constants);

   my $sigur = $r->signature($src_nb,  # {0, 1}
                             \@record);

   my @msgs  = $r->compare(\@record_1,
                           \@record_2);

DESCRIPTION

This package implements the rule class used by the Data::Reconciliation algorithm.

A Data::Reconciliation::Rule is composed of two parts, the identification part and the comparison part.

CONSTRUCTOR

new

The constructor takes the two sources to be reconciled as parameters. The sources must be of type Data::Table. The sources are needed to convert column names into column indices, and to check that the column names (or indices) passed to the methods actually exist.
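The name-to-index conversion described above can be pictured with a small stand-alone sketch. It does not use Data::Table and is not the module's own code; the helper names_to_indices and the sample header are hypothetical and only illustrate the kind of check the constructor makes possible.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Reconciliation::Rule):
# map column names to their positions in a header list and
# die if a name does not exist.
sub names_to_indices {
    my ($header, $names) = @_;

    # Build a name => position lookup from the header.
    my %index_of;
    @index_of{@$header} = (0 .. $#$header);

    my @indices;
    for my $name (@$names) {
        die "Unknown column '$name'\n" unless exists $index_of{$name};
        push @indices, $index_of{$name};
    }
    return @indices;
}

my @header = qw(id name amount);
my @idx    = names_to_indices(\@header, [qw(amount id)]);
print "@idx\n";
```

Rejecting unknown names when the rule is defined means a misspelled column is reported before any record is processed.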

METHODS

identification

The identification part provides the means for the reconciliation algorithm to build a signature for the records in the two sources to be reconciled. For each source, a list of column names must be provided, along with an optional function that builds a canonical form of the signature (typically by converting the values to uppercase, removing non-alphanumeric characters, and so on). If not defined, the function defaults to: sub { join '|', @_ }
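As an illustration, a canonical function could look like the sketch below. It assumes, as the default sub { join '|', @_ } suggests, that the function receives the values of the identification columns as a list and returns a single string. The name canonical_name and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical canonical function: receives the values of the
# identification columns and returns one signature string.
sub canonical_name {
    my @values = @_;
    for my $v (@values) {
        $v = '' unless defined $v;   # treat missing values as empty
        $v = uc $v;                  # ignore case differences
        $v =~ s/[^A-Z0-9]//g;        # drop non-alphanumeric characters
    }
    return join '|', @values;
}

# Two spellings of the same record yield the same signature.
print canonical_name('Smith, J.', 'ab-123'), "\n";
print canonical_name('SMITH J',   'AB 123'), "\n";
```

Both calls print SMITHJ|AB123, so records that differ only in case or punctuation end up with the same signature.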

add_comparison

The comparison part provides the means for the reconciliation algorithm to compare records and report differences. A single rule can specify multiple comparisons (for example, one per column).

For each data source, the list of column names to be used in the comparison must be specified. An optional subroutine to rework the field values and an optional compare function can also be specified. A compare function returns a message describing the difference, or undef when the values match. The default compare function is:

        sub (\@\@\@\@;\@$) {
            my $field_names_1  = shift;
            my $field_values_1 = shift;
            my $field_names_2  = shift;
            my $field_values_2 = shift;
            #my $constants     = shift;
            #my $func_name     = shift;
            
            my $value_1 = join '|', @$field_values_1;
            my $value_2 = join '|', @$field_values_2;

            if (isNumber($value_1) ?
                $value_1 <=> $value_2 :
                trim($value_1) cmp trim($value_2)) {
                return sprintf("SRC1.%s=[%s] <> SRC2.%s=[%s]",
                               join('.', @$field_names_1),
                               $value_1,
                               join('.', @$field_names_2),
                               $value_2);
            } else {
                return undef;
            }
        }

signature

The signature method is called by the Data::Reconciliation algorithm to compute values which are used to identify records to be compared in the two sources. It uses the values passed to the identification method.
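The idea can be sketched without the module: pick the identification columns out of a record, pass them to the canonical function, and use the result as a lookup key to pair records from the two sources. The helper make_signature, the column indices and the sample records are all hypothetical; the module's actual internals may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the signature computation: select the
# identification columns (by index) and apply the canonical function.
sub make_signature {
    my ($indices, $canonical, $record) = @_;
    $canonical ||= sub { join '|', @_ };   # documented default
    return $canonical->(@{$record}[@$indices]);
}

my $upper = sub { uc join '|', @_ };

# The same key columns sit at different positions in each source.
my @source_1 = (['1001', 'Alice', 250], ['1002', 'Bob', 300]);
my @source_2 = (['Bob', '1002', 300],   ['Carol', '1003', 125]);

# Index the second source by signature ...
my %seen_in_2;
$seen_in_2{ make_signature([1, 0], $upper, $_) } = $_ for @source_2;

# ... then look up each record of the first source.
for my $rec (@source_1) {
    my $sig = make_signature([0, 1], $upper, $rec);
    print exists $seen_in_2{$sig} ? "match: $sig\n"
                                  : "only in source 1: $sig\n";
}
```

Records whose signatures match are the ones that would then be handed to the comparison step.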

compare

The compare method is called by the Data::Reconciliation algorithm to compare the records identified by using the signature method. It uses the values passed to the add_comparison method.
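A custom compare function passed to add_comparison would presumably follow the same calling convention as the default one: references to the field names and values of each source, then the constants, and a message or undef as the return value. The sketch below compares a single numeric field within a tolerance. Treating the first constant as the tolerance is an assumption made for this example, not documented behaviour, and the name $compare_with_tolerance is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical compare sub: numeric comparison within a tolerance.
# Assumption: the first element of the constants list is the tolerance.
my $compare_with_tolerance = sub {
    my ($names_1, $values_1, $names_2, $values_2, $constants) = @_;
    my $tolerance = ($constants && @$constants) ? $constants->[0] : 0;

    my ($v1, $v2) = ($values_1->[0], $values_2->[0]);
    return undef if abs($v1 - $v2) <= $tolerance;   # close enough: no message

    return sprintf("SRC1.%s=[%s] <> SRC2.%s=[%s] (tolerance %s)",
                   join('.', @$names_1), $v1,
                   join('.', @$names_2), $v2,
                   $tolerance);
};

# Values within the tolerance produce no message.
my $msg = $compare_with_tolerance->(['amount'], [100.00],
                                    ['total'],  [100.004], [0.01]);
print defined $msg ? "$msg\n" : "no difference\n";

# Values outside the tolerance are reported.
$msg = $compare_with_tolerance->(['amount'], [100], ['total'], [101], [0.01]);
print "$msg\n";
```

Returning undef for equal values mirrors the default function, so only real differences produce a message.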

EXPORT

None.

AUTHORS

Martial.Chateauvieux@sfs.siemens.de, O.Capdevielle@cadextan.fr

SEE ALSO

Data::Reconciliation, Data::Table