Charles Colbourn > Data-Sync-0.08 > Data::Sync



Module Version: 0.08


Data::Sync - A simple metadirectory/datapump module (advanced usage)


The basic functionality of Data::Sync is described in the main POD documentation. This document covers more complex or rarely used functionality. You may need to customise Data::Sync by subclassing it, you may want more granular control over data flow, or you may just want to use individual functions of the module as a sort of toolkit. If any of these apply, you're reading the right document.



 my $AoH = $sync->get();

This function triggers the read function defined in the Data::Sync constructor, and returns the resulting data as an array of hashes (see perlref if you are unfamiliar with how to access this kind of data structure).
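
As a brief illustration of the returned structure, the sketch below walks a hand-built array of hashes. The field names (uid, mail) and the data are invented for the example; in real use $AoH would come from $sync->get().

```perl
use strict;
use warnings;

# Hypothetical data standing in for the result of $sync->get()
my $AoH = [
    { uid => 'jsmith', mail => 'jsmith@example.com' },
    { uid => 'adoe',   mail => 'adoe@example.com'   },
];

# Each element of the array is a hash reference holding one record
my @lines;
for my $record (@$AoH) {
    for my $field (sort keys %$record) {
        push @lines, "$field: $record->{$field}";
    }
}
print "$_\n" for @lines;
```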


 my $result = $sync->put($AoH);

This function writes the content of $AoH (an array of hashes) to the target defined in the Data::Sync constructor, and returns the result (1 for success).


 my $hash = $sync->hashrecord($hashref,$arrayref);

This function is used internally by Data::Sync to detect changes in records (to minimise writes), but is also accessible for other uses. It takes a hashref of the db record, and an arrayref of the attributes to include in the hash. It returns a hex representation of the MD5 hash of the concatenated record values. You can also call it as a class method:

 my $hash = Data::Sync->hashrecord($hashref,$arrayref);

(i.e. it does not depend on values set in the object constructor).
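
To make the idea concrete, here is a minimal standalone sketch of hashing selected record values with Digest::MD5. It is not the module's own code: the function name sketch_hashrecord, the choice to join values with no separator, and the treatment of missing values as empty strings are all assumptions made for the example.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Illustrative only: the real hashrecord may join the values differently.
sub sketch_hashrecord {
    my ($record, $attribs) = @_;
    # Concatenate the selected values in the order given;
    # missing values count as empty strings in this sketch
    my $joined = join '', map { defined $record->{$_} ? $record->{$_} : '' } @$attribs;
    return md5_hex($joined);
}

my %record = ( uid => 'jsmith', mail => 'jsmith@example.com', phone => '555-0100' );

my $before = sketch_hashrecord(\%record, [qw(uid mail)]);

$record{phone} = '555-0199';    # not in the attribute list, so the digest is unchanged
my $same = sketch_hashrecord(\%record, [qw(uid mail)]);

$record{mail} = 'john.smith@example.com';    # a hashed attribute changed
my $after = sketch_hashrecord(\%record, [qw(uid mail)]);

print "before=$before\nafter=$after\n";
```

Only changes to the listed attributes alter the digest, which is what makes it usable for change detection.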


 my $remapped = $sync->remap($AoH);

Remaps the field names, as defined in $sync->mappings. Takes an array of hashes and returns an array of hashes with the hash keys renamed.
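
The renaming can be pictured with a small standalone sketch. The mapping table, the helper name sketch_remap and the decision to keep unmapped keys under their original names are assumptions for illustration, not a description of what Data::Sync does internally.

```perl
use strict;
use warnings;

# Hypothetical mapping of source field name => target field name
my %mappings = ( uid => 'username', mail => 'email' );

sub sketch_remap {
    my ($AoH, $map) = @_;
    my @out;
    for my $record (@$AoH) {
        my %renamed;
        for my $key (keys %$record) {
            # Keys without a mapping keep their name in this sketch
            my $newkey = exists $map->{$key} ? $map->{$key} : $key;
            $renamed{$newkey} = $record->{$key};
        }
        push @out, \%renamed;
    }
    return \@out;
}

my $remapped = sketch_remap(
    [ { uid => 'jsmith', mail => 'jsmith@example.com', cn => 'J Smith' } ],
    \%mappings,
);
print join(', ', sort keys %{ $remapped->[0] }), "\n";
```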


 my $transformed = $sync->runtransform($AoH);

Runs the transformations defined in $sync->transform. Takes array of hashes, returns array of hashes. You can use this function in conjunction with mappings as a utility function to perform recursive transformations of data:

 my $sync = Data::Sync->new();
 my $transformed = $sync->runtransform($AoH);

This will recurse through a data structure (an array of arrays, array of hashes, array of hashes of arrays of hashes etc) of arbitrary depth, performing the transform on all hash values where the key matches (including every value of hash values containing anonymous arrays).
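
The recursion described above can be sketched in plain Perl as follows. This is a hedged illustration rather than the module's implementation: sketch_transform and the %transforms table (key name => code reference) are hypothetical, and the sketch builds a new structure instead of modifying the input.

```perl
use strict;
use warnings;

# $code is the transform inherited from the enclosing hash key, if any
sub sketch_transform {
    my ($data, $transforms, $code) = @_;
    if (ref $data eq 'ARRAY') {
        # Array elements inherit the transform of the key they sit under
        my @out;
        push @out, sketch_transform($_, $transforms, $code) for @$data;
        return \@out;
    }
    if (ref $data eq 'HASH') {
        # Each value is processed with the transform registered for its key
        my %out;
        $out{$_} = sketch_transform($data->{$_}, $transforms, $transforms->{$_})
            for keys %$data;
        return \%out;
    }
    # Plain value: apply the transform if one matched, otherwise pass through
    return $code ? $code->($data) : $data;
}

my %transforms = ( name => sub { uc $_[0] } );
my $data = [
    { name => 'alice', groups => [ { name => 'staff' } ] },
    { name => [ 'bob', 'robert' ], id => 7 },
];
my $result = sketch_transform($data, \%transforms);
print $result->[0]{name}, "\n";
```

Here every value stored under a "name" key is upper-cased at any depth, including each element of an anonymous array held under that key.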


 my $transformed = $sync->makebuiltattributes($AoH);

Creates attributes in the data set as defined in buildattributes().


 my $hashed = $self->scanhashtable($AoH);

Check the dataset against the stored hashtable, to filter out unchanged records. Returns a dataset with unchanged records removed. Requires that hashattributes has been set in the target definition.
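
The filtering step can be illustrated with an in-memory table in place of the stored hashtable. Everything named here (sketch_scan, %stored, the uid index field) is an assumption for the example; the module itself keeps its table persistently and may compute digests differently.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# %stored stands in for the persistent table: index value => digest from the last run
my %stored = ( jsmith => md5_hex('jsmith' . 'jsmith@example.com') );

sub sketch_scan {
    my ($AoH, $index, $attribs, $table) = @_;
    my @changed;
    for my $record (@$AoH) {
        my $digest = md5_hex(join '', map { $record->{$_} } @$attribs);
        my $key    = $record->{$index};
        # Skip records whose digest matches the one recorded previously
        next if defined $table->{$key} && $table->{$key} eq $digest;
        $table->{$key} = $digest;    # remember the new state
        push @changed, $record;
    }
    return \@changed;
}

my $AoH = [
    { uid => 'jsmith', mail => 'jsmith@example.com' },    # unchanged since last run
    { uid => 'adoe',   mail => 'adoe@example.com'   },    # new record
];
my $delta = sketch_scan($AoH, 'uid', [qw(uid mail)], \%stored);
print scalar(@$delta), " changed record(s)\n";
```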


 my $hash = $self->hashrecord(\%record,[ATTRIB,ATTRIB]);

Returns a hexdigest MD5 hash of the listed attributes in the record.


 my @deletes = $self->getdeletes();

Returns a list of TARGETINDEX=>value entries for all entries that have been deleted. This is a function of hashing, so it will only detect deletes if the following steps are followed (the run method does this):

 1) read from source
 2) transform etc
 3) hash the entries

Note that you don't need to write for deletes to be detected - the hashing of entries is done by scanhashtable before the write to target function is called. This means that using get() and scanhashtable() you can set the deltas to the current state without performing a write.
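
The principle behind delete detection can be shown with a short standalone sketch: any index value recorded on a previous run but absent from the current read is treated as deleted. The data and variable names are hypothetical, and the sketch returns bare index values rather than the TARGETINDEX=>value entries that getdeletes() returns.

```perl
use strict;
use warnings;

# Index values recorded on the previous run (hypothetical data)
my %stored = map { $_ => 1 } qw(jsmith adoe bwayne);

# Index values seen in the current read from the source
my @current = qw(jsmith bwayne);

# Anything stored but not seen this time has been deleted at the source
my %seen    = map { $_ => 1 } @current;
my @deletes = sort { $a cmp $b } grep { !$seen{$_} } keys %stored;

print "deleted: @deletes\n";
```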

batch update mode

 $sync->source($handle, {
     batchsize => x,
     controls  => [$ldapcontrol],
 });

Batch update mode is only used in the standard (i.e. $sync->run based) usage of Data::Sync. It is primarily intended to handle asynchronous, persistent or paged LDAP searches, or SQL database queries where you want to see the updates as quickly as possible (without necessarily waiting for them all to complete). For details on how to construct persistent and asynchronous LDAP controls, see Net::LDAP::Control. In this mode Data::Sync reads a batch from the handle, performs the operation, reads the next batch from the handle, and so on. Note that with a DBI handle this still works against the entire record set matching your criteria, so the memory advantages are limited.
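
The read-a-batch, process, repeat cycle can be pictured with a plain array in place of a live handle. This is only a sketch of the looping pattern: @pending, $batchsize and the collected @batches are invented for the example, and the real module reads from the DBI or LDAP handle instead.

```perl
use strict;
use warnings;

my @pending   = (1 .. 7);    # stands in for records still waiting on the handle
my $batchsize = 3;

my @batches;
# splice removes up to $batchsize records from the front on each pass;
# the loop ends when nothing is left to read
while (my @batch = splice(@pending, 0, $batchsize)) {
    # A real run would remap, transform and write the batch here
    push @batches, [@batch];
}

print scalar(@batches), " batches processed\n";
```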

subclassing Data::Sync

For whatever reason, you may want to subclass Data::Sync - perhaps to implement RDBMS specific read or write functions, or to add new functionality. Data::Sync is (or should be) subclass safe. The main functions you are likely to want to overload are:


each will be passed the following parameters in sequential order:

 db/ldap handle
 anonymous hash of parameters (see 'source' and 'target' methods)

and called by get/put or run().

You might also want to overload ::run with your own variant. ::run calls the read, remap, transform and write methods sequentially. If you look at ::run, you will see that it also calls sourceToAoH: this is a vital call that converts a DBI or LDAP handle into an array-of-hashes record set before the records are remapped and transformed.

You may also wish to use a database other than SQLite to hash records, in which case you need to overload getdeletes() and scanhashtable(). See the code for more details.


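 # read, remap and transform the first source
 my $source1 = Data::Sync->new();
 $source1->source($dbhandle,"select * from sourcetable1");
 my $set1 = $source1->get();
 $set1 = $source1->remap($set1);
 $set1 = $source1->runtransform($set1);

 # do the same for the second source
 my $source2 = Data::Sync->new();
 $source2->source($dbhandle2,"select * from sourcetable2");
 my $set2 = $source2->get();
 $set2 = $source2->remap($set2);
 $set2 = $source2->runtransform($set2);

 # join the two record sets and write them to the target
 # (the target must first be defined with the 'target' method, see the main POD)
 my $target = Data::Sync->new();
 my @recordset = (@$set1,@$set2);
 $target->put(\@recordset);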

This would read the two defined sources, remap and transform the contents, join the two datasets together, and write the combined set to the target.
