NAME
Data::Reconciliation::Rule - Perl extension for data reconciliation
SYNOPSIS
use Data::Reconciliation::Rule;
my $r = new Data::Reconciliation::Rule(<Data::Table>, <Data::Table>);
$r->identification(\@field_names_1,
\&canonical_1,
\@field_names_2,
\&canonical_2);
$r->add_comparison(\@field_names_1,
\&canon_sub_1,
\@field_names_2,
\&canon_sub_2,
\&compare_sub,
$compare_sub_name,
\@constants);
my $sigur = $r->signature($src_nb, # {0, 1}
\@record);
my @msgs = $r->compare(\@record_1,
\@record_2);
DESCRIPTION
This package implements the rule class used by the Data::Reconciliation
algorithm.
A Data::Reconciliation::Rule
is composed of two parts, the identification part and the comparison part.
CONSTRUCTOR
new
-
The constructor takes the two sources to be reconciled as parameters. The sources must be of type Data::Table. They are needed to convert column names into column indices and to check that the column names (or indices) passed to the methods actually exist.
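The name-to-index conversion mentioned above can be pictured with a small stand-alone sketch. It does not use Data::Table or this module; the helper resolve_columns and the sample header are hypothetical and only illustrate the idea of resolving names and rejecting unknown ones.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): map column names to
# indices against a header list, dying on any name that does not exist.
sub resolve_columns {
    my ($header, $names) = @_;
    my %index_of;
    @index_of{@$header} = (0 .. $#$header);
    my @indices;
    for my $name (@$names) {
        die "Unknown column '$name'\n" unless exists $index_of{$name};
        push @indices, $index_of{$name};
    }
    return \@indices;
}

my @header  = qw(id name amount);
my $indices = resolve_columns(\@header, [qw(name id)]);
print join(',', @$indices), "\n";    # prints "1,0"
```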
METHODS
identification
-
The identification part provides the means for the reconciliation algorithm to build a signature for the records in the two sources to be reconciled. For each source, a list of column names must be provided, along with an optional function that builds a canonical form of the signature (typically by converting values to uppercase, removing non-alphanumeric characters, and so on). If no function is given, it defaults to
sub { join '|', @_ }
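As an illustration, a canonical function of the kind described above might look like the following sketch. It assumes, as the default suggests, that the function receives the identification field values in @_ and returns a single string. The name canonical_upper_alnum is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical canonical function: receives the identification field
# values as a list and returns one signature string.
sub canonical_upper_alnum {
    my @values = map { defined $_ ? $_ : '' } @_;
    for (@values) {
        $_ = uc $_;          # make matching case-insensitive
        s/[^A-Z0-9]//g;      # drop punctuation and whitespace
    }
    return join '|', @values;
}

print canonical_upper_alnum('Smith, J.', 'ab-123'), "\n";   # prints "SMITHJ|AB123"
```

A reference to such a function would be passed in the \&canonical_1 or \&canonical_2 position of the identification call shown in the SYNOPSIS.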
add_comparison
-
The comparison part provides the means for the reconciliation algorithm to compare records and report differences. A single rule can hold multiple comparisons (one per column, for example).
For each data source, the list of column names to be used in the comparison must be specified. An optional subroutine to rework the field values and an optional compare function can also be specified. The default compare function is:
sub (\@\@\@\@;\@$) {
    my $field_names_1  = shift;
    my $field_values_1 = shift;
    my $field_names_2  = shift;
    my $field_values_2 = shift;
    #my $constants = shift;
    #my $func_name = shift;

    my $value_1 = join '|', @$field_values_1;
    my $value_2 = join '|', @$field_values_2;

    if (isNumber($value_1) ? $value_1 <=> $value_2
                           : trim($value_1) cmp trim($value_2)) {
        return sprintf("SRC1.%s=[%s] <> SRC2.%s=[%s]",
                       join('.', @$field_names_1), $value_1,
                       join('.', @$field_names_2), $value_2);
    } else {
        return undef;
    }
}
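A custom compare function can follow the same calling convention as the default: it receives references to the field names and values of both sources and returns a message when they differ, or undef when they match. The sketch below is a hypothetical example that compares one numeric field with a tolerance. Reading the tolerance from the first element of the constants list is an assumption made for illustration, as is the name compare_with_tolerance.

```perl
use strict;
use warnings;

# Hypothetical compare sub: numeric comparison with a tolerance taken
# from the constants list (assumed usage). Argument order mirrors the
# default compare sub.
sub compare_with_tolerance {
    my ($names_1, $values_1, $names_2, $values_2, $constants) = @_;
    my $tolerance = ($constants && @$constants) ? $constants->[0] : 0;
    my ($v1, $v2) = ($values_1->[0], $values_2->[0]);

    # Values within the tolerance count as equal: no message.
    return undef if abs($v1 - $v2) <= $tolerance;

    return sprintf("SRC1.%s=[%s] <> SRC2.%s=[%s] (tolerance %s)",
                   join('.', @$names_1), $v1,
                   join('.', @$names_2), $v2, $tolerance);
}

my $msg = compare_with_tolerance(['amount'], [100], ['total'], [103], [1]);
print defined $msg ? "$msg\n" : "match\n";
```

Such a function would be passed as \&compare_sub to add_comparison, with the tolerance supplied through \@constants.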
signature
-
The signature method is called by the Data::Reconciliation algorithm to compute values which are used to identify records to be compared in the two sources. It uses the values passed to the identification method.
compare
-
The compare method is called by the Data::Reconciliation algorithm to compare the records identified by using the signature method. It uses the values passed to the add_comparison method.
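The way signatures and comparisons fit together can be sketched without the module itself. The code below is only an illustration of the general idea, not the Data::Reconciliation implementation: the $signature and $compare closures stand in for the two parts of a rule, the sample records are made up, and the handling of unmatched records is an assumption.

```perl
use strict;
use warnings;

# Stand-ins for the rule's two parts (hypothetical, not the module's code).
my $signature = sub { my ($record) = @_; return uc $record->[0] };
my $compare   = sub {
    my ($r1, $r2) = @_;
    return $r1->[1] == $r2->[1]
        ? ()
        : ("amount: [$r1->[1]] <> [$r2->[1]]");
};

my @source_1 = (['a1', 10], ['b2', 20], ['c3', 30]);
my @source_2 = (['A1', 10], ['B2', 25]);

# Index the second source by signature.
my %by_signature;
$by_signature{ $signature->($_) } = $_ for @source_2;

# Pair each record of the first source with its match, then compare.
my @report;
for my $record (@source_1) {
    my $sig   = $signature->($record);
    my $match = $by_signature{$sig};
    if (!$match) {
        push @report, "$sig: missing in source 2";
        next;
    }
    push @report, "$sig: $_" for $compare->($record, $match);
}
print "$_\n" for @report;
# prints:
#   B2: amount: [20] <> [25]
#   C3: missing in source 2
```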
EXPORT
None.
AUTHORS
Martial.Chateauvieux@sfs.siemens.de, O.Capdevielle@cadextan.fr