Data::Reconciliation - Perl extension for data reconciliation
    use Data::Table;
    use Data::Reconciliation;
    use Data::Reconciliation::Rule;

    my $src1 = Data::Table::fromCSV('test1.dat');
    my $src2 = Data::Table::fromCSV('test2.dat');

    my $rule = new Data::Reconciliation::Rule($src1, $src2);
    $rule->identification([<col_names>], \&canon_sub_1,
                          [<col_names>], \&canon_sub_2);
    $rule->add_comparison([<col_names>], \&canon_sub_3,
                          [<col_names>], \&canon_sub_4,
                          \&compare_sub, \@constants);

    my $r = new Data::Reconciliation($src1, $src2, -rules => [$rule]);

    $r->build_signatures(0);

    my ($dup_signs_1, $dup_signs_2) = $r->duplicate_signatures;
    ($dup_signs_1, $dup_signs_2) = $r->delete_dup_signatures;

    my ($widow_signs_1, $widow_signs_2) = $r->widow_signatures;
    ($widow_signs_1, $widow_signs_2) = $r->delete_wid_signatures;

    my @diffs = $r->reconciliate(0);

    package UserFunctions;

    # A comparison function returns undef when the values match,
    # and a string describing the difference otherwise.
    sub fun_1 (\@\@\@\@;\@$) {
        my $field_names_1  = shift;
        my $field_values_1 = shift;
        my $field_names_2  = shift;
        my $field_values_2 = shift;
        my $constants      = shift;
        my $func_name      = shift;

        my $ok = (...);    # user-defined test of the field values
        return undef if $ok;
        return "Not ok (comparing with $func_name)";
    }
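As a concrete illustration of the fun_1 calling convention above, here is a hypothetical comparison function (the name amounts_within_tolerance is invented for this sketch) that treats two amounts as equal when they differ by no more than a tolerance. It assumes the tolerance is passed as the first element of the constants array, which is presumably what \@constants in add_comparison is for; this is an assumption, not documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical comparison function following the fun_1 calling convention:
# (field names 1, field values 1, field names 2, field values 2,
#  constants, function name), all lists passed as array refs.
# Assumption: $constants->[0] is a numeric tolerance (0 if absent).
# Returns undef when every pair of values matches, else a description.
sub amounts_within_tolerance {
    my ($names_1, $values_1, $names_2, $values_2, $constants, $func_name) = @_;
    my $tolerance = defined $constants->[0] ? $constants->[0] : 0;

    for my $i (0 .. $#$values_1) {
        my ($v1, $v2) = ($values_1->[$i], $values_2->[$i]);
        next if abs($v1 - $v2) <= $tolerance;
        return "SRC1.$names_1->[$i]=[$v1] <> SRC2.$names_2->[$i]=[$v2]"
             . " (comparing with $func_name)";
    }
    return undef;
}

my $diff = amounts_within_tolerance(['amt'], [122.45], ['Amount'], [122.47],
                                    [0.01], 'amounts_within_tolerance');
print defined $diff ? "$diff\n" : "no difference\n";
```

Calling the function directly like this makes it easy to test in isolation before handing a reference to it to add_comparison.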
new
This creates a new Data::Reconciliation object. The first two parameters are the sources to be reconciled; they must be Data::Table objects.
The other parameters are optional named parameters.
-rules => [ <rule list> ]
Provides the reconciliation rules. Each rule must be a Data::Reconciliation::Rule object (see Data::Reconciliation::Rule). The default rule uses the first column for identification and compares the other columns one to one.
build_signatures
This method initialises a reconciliation process: it sets up the data needed to identify the records to be compared in the two sources. The rule number must be provided as a parameter.
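The internals of build_signatures are not documented here. The following core-Perl sketch shows one plausible reading of "signature", based on the identification call in the synopsis: the identification columns of each record are passed to a canonicalisation sub, and the resulting string maps to the indices of the records that produced it. The function build_signature_index and the plain array-of-arrays input are hypothetical stand-ins for the module and Data::Table.

```perl
use strict;
use warnings;

# Hypothetical illustration of a signature index: { signature => [record indices] }.
# $rows is an array of array refs, $id_cols the identification column indices,
# $canon an optional canonicalisation sub applied to those column values.
sub build_signature_index {
    my ($rows, $id_cols, $canon) = @_;
    $canon ||= sub { join '-', @_ };    # assumed default: join values with '-'
    my %index;
    for my $i (0 .. $#$rows) {
        my $sign = $canon->(@{ $rows->[$i] }[@$id_cols]);
        push @{ $index{$sign} }, $i;    # several records may share a signature
    }
    return \%index;
}

# Rows taken from the first table of the example below (dealnb, leg, amt, ccy).
my @rows = (
    ['1234', 0, '123,45',  'FRF'],
    ['1234', 1, '-123,45', 'FRF'],
    ['1237', 0, '121,50',  'FRF'],
    ['1237', 0, '50,121',  'CHF'],
);
my $index = build_signature_index(\@rows, [0, 1]);
print "$_ => [@{ $index->{$_} }]\n" for sort keys %$index;
```

Keeping a list of indices per signature, rather than a single index, is what makes the later duplicate check possible: here 1237-0 maps to two records.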
signatures
Returns two hash refs with the signatures as keys and array refs of record indices as values. These are the signatures built by the build_signatures method above.
duplicate_signatures
This method identifies, in each of the two sources, the signatures that are not unique. The rule number must be provided as a parameter. (The current reconciliation algorithm only works on sources with a 1..1 cardinality.)
Returns two hash refs containing duplicate signatures as keys and array refs containing record indices as values.
delete_dup_signatures
Deletes the duplicate signatures and returns two hash refs with the deleted signatures as keys and array refs of record indices as values. The duplicate keys are computed by calling the duplicate_signatures method.
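The duplicate logic described for duplicate_signatures and delete_dup_signatures can be sketched on a plain signature index ({ signature => [record indices] }). This is an illustration only; find_duplicates and delete_duplicates are hypothetical names, not part of the module.

```perl
use strict;
use warnings;

# A signature is a duplicate when it maps to more than one record.
sub find_duplicates {
    my ($index) = @_;
    my %dups;
    for my $sign (keys %$index) {
        $dups{$sign} = $index->{$sign} if @{ $index->{$sign} } > 1;
    }
    return \%dups;
}

# Remove the duplicates from the index and return what was removed.
sub delete_duplicates {
    my ($index) = @_;
    my $dups = find_duplicates($index);
    delete @{$index}{ keys %$dups };    # hash-slice delete
    return $dups;
}

my %index = ('1234-0' => [0], '1237-0' => [4, 5], '1235-0' => [2]);
my $deleted = delete_duplicates(\%index);
print "deleted: ", join(', ', sort keys %$deleted), "\n";
print "kept:    ", join(', ', sort keys %index), "\n";
```

After this step every remaining signature identifies exactly one record in its source, which is the 1..1 cardinality the reconciliation needs.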
widow_signatures
Returns two hash refs whose keys are the signatures found in one data source but missing from the other, and whose values are array refs of record indices.
delete_wid_signatures
Deletes the widow signatures and returns two hash refs with the deleted signatures as keys and array refs of record indices as values. The widow keys are computed by calling the widow_signatures method.
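Widow detection compares the two signature indexes: a widow is a signature present in one source but absent from the other. The sketch below uses hypothetical helpers (find_widows, delete_widows) on plain hashes of the form { signature => [record indices] }; it illustrates the idea and is not the module's code.

```perl
use strict;
use warnings;

# Signatures of one index that do not exist in the other index.
sub find_widows {
    my ($index_1, $index_2) = @_;
    my %widows_1 = map { ($_ => $index_1->{$_}) } grep { !exists $index_2->{$_} } keys %$index_1;
    my %widows_2 = map { ($_ => $index_2->{$_}) } grep { !exists $index_1->{$_} } keys %$index_2;
    return (\%widows_1, \%widows_2);
}

# Remove the widows from both indexes and return what was removed.
sub delete_widows {
    my ($index_1, $index_2) = @_;
    my ($w1, $w2) = find_widows($index_1, $index_2);
    delete @{$index_1}{ keys %$w1 };
    delete @{$index_2}{ keys %$w2 };
    return ($w1, $w2);
}

# Signatures from the example below, after duplicate removal.
my %src1 = ('1234-0' => [0], '1234-1' => [1], '1235-0' => [2], '1236-0' => [3]);
my %src2 = ('1234-0' => [0], '1234-1' => [1], '1235-0' => [2], '1236-0' => [3],
            '1239-0' => [4]);
my ($w1, $w2) = delete_widows(\%src1, \%src2);
print "widows in source 1: [", join('][', sort keys %$w1), "]\n";
print "widows in source 2: [", join('][', sort keys %$w2), "]\n";
```

Once both duplicates and widows are gone, the two indexes have exactly the same set of signatures, so every record can be paired with one counterpart.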
reconciliate
Returns a list of array refs, one per difference found. Each entry contains, in order: the signature, a reference to an array ref containing the record indices in the sources, a reference to the applied rule, and a string describing the difference, as returned by the (possibly user-defined) comparison function.
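A reconciliation pass over sources already cleaned of duplicates and widows can be sketched as follows. Everything here is hypothetical: the rule is a plain hash of comparisons (column index per source, optional canonicalisation subs, optional compare sub) standing in for a Data::Reconciliation::Rule, and compare subs take just two values instead of the six-argument fun_1 convention. The numeric-aware default_compare is an assumption, chosen so that '121.50' and 121.5 compare equal, consistent with the example output below reporting no amount difference for 1236-0.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Assumed default comparison: numeric when both values look like numbers,
# string equality otherwise. Returns undef when the values match.
sub default_compare {
    my ($v1, $v2) = @_;
    my $same = (looks_like_number($v1) && looks_like_number($v2))
             ? $v1 == $v2
             : $v1 eq $v2;
    return $same ? undef : "[$v1] <> [$v2]";
}

# Each signature maps to exactly one record index per source. Differences
# loosely mirror the documented shape: [signature, [i1, i2], rule, message].
sub reconcile {
    my ($rows_1, $rows_2, $index_1, $index_2, $rule) = @_;
    my $identity = sub { $_[0] };
    my @diffs;
    for my $sign (sort keys %$index_1) {
        my ($i1, $i2) = ($index_1->{$sign}[0], $index_2->{$sign}[0]);
        for my $cmp (@{ $rule->{comparisons} }) {
            my $v1 = ($cmp->{canon_1} || $identity)->($rows_1->[$i1][ $cmp->{col_1} ]);
            my $v2 = ($cmp->{canon_2} || $identity)->($rows_2->[$i2][ $cmp->{col_2} ]);
            my $msg = ($cmp->{compare} || \&default_compare)->($v1, $v2);
            push @diffs, [$sign, [$i1, $i2], $rule, $msg] if defined $msg;
        }
    }
    return @diffs;
}

# Two matched records per source, taken from the example below.
my @rows_1  = (['1235', 0, '122,45', 'FRF'], ['1236', 0, '121,50', 'FRF']);
my @rows_2  = (['1235-0', 122.47, 'FRF'], ['1236-0', 121.50, 'DEM']);
my %index_1 = ('1235-0' => [0], '1236-0' => [1]);
my %index_2 = ('1235-0' => [0], '1236-0' => [1]);
my $rule = { comparisons => [
    # amt vs Amount: turn the decimal comma of source 1 into a decimal point
    { col_1 => 2, col_2 => 1, canon_1 => sub { (my $v = shift) =~ tr/,/./; $v } },
    # ccy vs ccy, compared as-is
    { col_1 => 3, col_2 => 2 },
] };
my @diffs = reconcile(\@rows_1, \@rows_2, \%index_1, \%index_2, $rule);
print "$_->[0]: $_->[3]\n" for @diffs;
```

Because the compare sub returns undef for equal values, only genuine differences end up in the result list.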
For reconciliate to work properly, the duplicate and widow signatures must be removed first.
    #!/usr/local/bin/perl -w
    use strict;
    use lib qw(../lib);

    use Data::Table;
    use Data::Reconciliation;
    use Data::Reconciliation::Rule;

    my $file1 = new Data::Table ([['1234', 0, '123,45',  'FRF'],
                                  ['1234', 1, '-123,45', 'FRF'],
                                  ['1235', 0, '122,45',  'FRF'],
                                  ['1236', 0, '121,50',  'FRF'],
                                  ['1237', 0, '121,50',  'FRF'],
                                  ['1237', 0, '50,121',  'CHF']],
                                 ['dealnb', 'leg', 'amt', 'ccy']);

    my $file2 = new Data::Table ([['1234-0',  123.45, 'FRF'],
                                  ['1234-1', -123.45, 'FRF'],
                                  ['1235-0',  122.47, 'FRF'],
                                  ['1236-0',  121.50, 'DEM'],
                                  ['1239-0',  50.121, 'CHF']],
                                 ['external-key', 'Amount', 'ccy']);

    my $rule = new Data::Reconciliation::Rule($file1, $file2);

    # Table1 signature: dealnb and leg joined with '-'; Table2 uses external-key
    $rule->identification(['dealnb', 'leg'], sub { join '-', @_ },
                          ['external-key'], undef);

    # Amounts: turn the decimal comma of Table1 into a decimal point
    $rule->add_comparison(['amt'], sub { (my $v = shift) =~ tr/,/./; $v },
                          ['Amount'], undef, undef);
    $rule->add_comparison(['ccy'], undef, ['ccy'], undef, undef);

    my $r = new Data::Reconciliation($file1, $file2, -rules => [$rule]);

    $r->build_signatures(0);

    my ($dup_signs_from_1, $dup_signs_from_2) = $r->delete_dup_signatures;
    my ($widow_signs_1, $widow_signs_2) = $r->delete_wid_signatures;

    print "The following signatures in Table1 leads to multiple entries :\n\t[",
          join('][', sort keys %$dup_signs_from_1), "]\n"
        if keys %$dup_signs_from_1;
    print "The following signatures in Table2 leads to multiple entries :\n\t[",
          join('][', sort keys %$dup_signs_from_2), "]\n"
        if keys %$dup_signs_from_2;
    print "The following entries in Table1 have no correspondant in Table 2 :\n\t[",
          join('][', sort keys %$widow_signs_1), "]\n"
        if keys %$widow_signs_1;
    print "The following entries in Table2 have no correspondant in Table 1 :\n\t[",
          join('][', sort keys %$widow_signs_2), "]\n"
        if keys %$widow_signs_2;

    my @diffs = $r->reconciliate(0);
    print "The following entries were found to be different :\n\t",
          join("\n\t", map { $_->[0] . ': ' . $_->[3] } @diffs), "\n"
        if @diffs;
This prints:

    The following signatures in Table1 leads to multiple entries :
            [1237-0]
    The following entries in Table2 have no correspondant in Table 1 :
            [1239-0]
    The following entries were found to be different :
            1236-0: SRC1.ccy=[FRF] <> SRC2.ccy=[DEM]
            1235-0: SRC1.amt=[122.45] <> SRC2.Amount=[122.47]
Martial.Chateauvieux@sfs.siemens.de, O.Capdevielle@cadextan.fr
Data::Reconciliation, Data::Table