
DBIx::Table::TestDataGenerator - Automatic test data creation, cross DBMS

Version 0.0.1

    use DBIx::Table::TestDataGenerator;

    my $generator = DBIx::Table::TestDataGenerator->new(
        dbh    => $dbi_database_handle,
        schema => $schema_name,
        table  => $target_table_name,
    );

    # simple usage:
    $generator->create_testdata(
        target_size => $target_size,
        num_random  => $num_random,
        seed        => $seed,
    );

    # extended usage handling a self-reference of the target table:
    $generator->create_testdata(
        target_size    => $target_size,
        num_random     => $num_random,
        seed           => $seed,
        max_tree_depth => $max_tree_depth,
        min_children   => $min_children,
        min_roots      => $min_roots,
    );

    # instantiation using a custom DBMS handling class
    # (separate variable name to avoid redeclaring $generator):
    my $custom_generator = DBIx::Table::TestDataGenerator->new(
        dbh                => $dbi_database_handle,
        schema             => $schema_name,
        table              => $target_table_name,
        custom_probe_class => $custom_probe_class_name,
    );

There is often a need to create test data in database tables, e.g. to test database client performance. Constraints on a table make it non-trivial to add records to it.
This module inspects the table's constraints and adds a desired number of records. Field values are copied either from the table itself (possibly incremented to satisfy uniqueness constraints) or from tables referenced by foreign key constraints. For a number of records the user can choose, the copied values are selected at random from the database; after that, values are selected at random from a cache, which reduces database traffic. The user can define a seed for the randomization in order to reproduce a test run. A nice side effect of constructing records this way is that the added data looks like real data, at least as real as the data initially present in the table.
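The "random first, cached afterwards" strategy described above can be illustrated with a small stand-alone sketch. It is not the module's implementation: the array @source_values stands in for values that would normally be read from the database, and the names $records_to_add, @cache and @picked are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a column of values that would normally be read
# from the target table or from a table referenced by a foreign key.
my @source_values = (10, 20, 30, 40, 50);

my ($records_to_add, $num_random, $seed) = (8, 3, 42);
srand($seed);    # fixed seed so that the run can be repeated

my (@cache, @picked);
for my $i (1 .. $records_to_add) {
    my $value;
    if ($i <= $num_random) {
        # Fresh random choice; in a real run this would query the database.
        $value = $source_values[ int(rand(scalar @source_values)) ];
        push @cache, $value;
    }
    else {
        # Later records only draw from the values cached so far.
        $value = $cache[ int(rand(scalar @cache)) ];
    }
    push @picked, $value;
}

print "cache:  @cache\n";
print "picked: @picked\n";
```

Only the first $num_random iterations touch the (simulated) data source; all later values come from the cache.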
A main goal of the module is to reduce configuration to the absolute minimum by automatically determining information about the target table, in particular its constraints. Another goal is to support as many DBMSs as possible. Currently Oracle, PostgreSQL and SQLite are supported and further DBMSs are in the works. One can add support for further databases or change the default behaviour by writing a class implementing the role defined in DBIx::Table::TestDataGenerator::TableProbe. NOTE: A major refactoring is on its way, see section FURTHER DEVELOPMENT.
The synopsis mentions an extended usage. This refers to the common case of a self-reference on a table, i.e. a one-column foreign key from a table to itself where the referenced column constitutes the primary key. Such a parent-child relationship defines a tree structure, and when generating test data it may be useful to have some control over the growth of this tree. One such case is when the parent-child relation represents a navigation tree and a client application processes this structure. In this case, one would like to have a meaningful, balanced tree since this corresponds to real-world examples. To control tree creation, the parameters max_tree_depth, min_children and min_roots are provided. Note that nodes are added in a depth-first manner.
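To make the tree terminology concrete, here is a minimal sketch that works on a hand-written set of (id, parent id) pairs instead of a database table. The hash %parent_of and the helpers $is_root and $depth are hypothetical and not part of the module. The root test follows the definition given for min_roots (parent id is null or equal to the id itself); counting a root as depth 1 is an assumption of this sketch.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical rows of a self-referencing table: id => parent id.
# undef stands for a NULL parent; row 5 references itself.
my %parent_of = (
    1 => undef,
    2 => 1,
    3 => 1,
    4 => 2,
    5 => 5,
);

# A record is a root if its parent id is NULL or equal to its own id.
my $is_root = sub {
    my ($id) = @_;
    my $parent = $parent_of{$id};
    return !defined $parent || $parent == $id;
};

# Depth of a node, counting a root as depth 1 (assumption of this sketch).
my $depth = sub {
    my ($id) = @_;
    my $d = 1;
    until ($is_root->($id)) {
        $id = $parent_of{$id};
        $d++;
    }
    return $d;
};

my @roots   = sort { $a <=> $b } grep { $is_root->($_) } keys %parent_of;
my $deepest = max(map { $depth->($_) } keys %parent_of);

print "roots: @roots\n";
print "maximum depth: $deepest\n";
```

In this sample data, records 1 and 5 are roots and record 4 is the deepest node.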

Arguments: dbh, schema and table, plus the optional custom_probe_class (see the synopsis).
Return value:
a new TestDataGenerator object
The constructor new creates a TestDataGenerator object. If the DBMS in question does not support the concept of a schema, the corresponding argument may be omitted. To support a DBMS not yet supported by DBIx::Table::TestDataGenerator, or to change the behaviour of the TableProbe class currently responsible for handling the DBMS, one may provide the optional custom_probe_class parameter. Its value is the name of a custom class implementing the TableProbe role.
dbh: Accessor for the DBI database handle.
schema: Accessor for the database schema name.
table: Accessor for the name of the target table.
custom_probe_class: Accessor for the name of a custom class implementing the TableProbe role.
create_testdata is the main method. It creates new records and adds them to the target table. If one of the arguments max_tree_depth, min_children or min_roots is provided, the other two must be provided as well.
Arguments:
target_size: The target number of rows to be reached.
num_random: The first $num_random records use fresh random choices for their values, taken from tables referenced by foreign key relations or from the target table itself. These values are stored in a cache and re-used for the remaining (target_size - $num_random) records. Note that even for the remaining records there is some randomness, since the combination of cached values coming from columns involved in different constraints is random.
seed: This value must be an integer. If it is provided, it seeds the random selections made by the Perl code as well as those made by the database (where supported, which is not the case for e.g. SQLite). Where the database cannot use the value directly, a value derived from it is used, e.g. for PostgreSQL, which accepts only floating-point numbers between 0 and 1. This allows for reproducible test runs.
max_tree_depth: In case of a self-reference, the maximum depth at which new records will be inserted. The minimum value for this parameter is 2.
min_children: In case of a self-reference, the minimum number of children each handled parent node will get. A possible exception is the last handled parent node if the execution stops before $min_children child nodes have been added to it.
min_roots: In case of a self-reference, the minimum number of root elements existing after completion of the call to create_testdata. A record is considered to be a root element if the corresponding parent id is null or equal to the child id.
Returns:
Nothing, only called for the side-effect of adding new records to the target table. (This may change, see the section FURTHER DEVELOPMENT.)
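The effect of the seed argument on the Perl side can be illustrated with the core functions srand and rand. The sketch below only shows why a fixed seed makes a run repeatable. The helper $to_unit_interval is a hypothetical way to derive a number between 0 and 1 from an integer seed and is not necessarily how the module derives the value it passes to a database.

```perl
use strict;
use warnings;

# Return $count pseudo-random integers below $limit for a given seed.
my $draw = sub {
    my ($seed, $count, $limit) = @_;
    srand($seed);
    return map { int(rand($limit)) } 1 .. $count;
};

my @first  = $draw->(42, 5, 100);
my @second = $draw->(42, 5, 100);

# Same seed, same sequence: this is what makes a test run repeatable.
print "first:  @first\n";
print "second: @second\n";

# Hypothetical mapping of an integer seed to a number in [0, 1), for a
# database that only accepts such values as its seed.
my $to_unit_interval = sub {
    my ($seed) = @_;
    return (abs($seed) % 1_000_000) / 1_000_000;
};

printf "derived database seed: %.6f\n", $to_unit_interval->(42);
```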

To install this module, run the following commands:
    perl Build.PL
    ./Build
    ./Build test
    ./Build install
When installing from CPAN, the install tests look for the environment variables TDG_DSN (connection string), TDG_USER (user), TDG_PWD (password) and TDG_SCHEMA (schema), which may be used to test the installation against an existing database. If TDG_DSN is found, the install will try to use this connection string, and the tests will fail if no valid database connection can be established. If TDG_DSN is not found, the installation creates an in-memory SQLite database, provided by the DBD::SQLite module, and tests against this database.
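The fallback logic described above might look roughly like the following sketch. It is not taken from the module's test suite: the helper $connection_settings and the keys of the returned hash are hypothetical, and treating an empty TDG_DSN like a missing one is an assumption. The DSN dbi:SQLite:dbname=:memory: is the usual DBD::SQLite connection string for an in-memory database.

```perl
use strict;
use warnings;

# Decide which database the tests should run against, based on a hash of
# environment variables (normally %ENV).
my $connection_settings = sub {
    my (%env) = @_;

    # Use the TDG_* variables if a connection string has been provided.
    if (defined $env{TDG_DSN} && length $env{TDG_DSN}) {
        return {
            dsn    => $env{TDG_DSN},
            user   => $env{TDG_USER},
            pwd    => $env{TDG_PWD},
            schema => $env{TDG_SCHEMA},
        };
    }

    # Otherwise fall back to an in-memory SQLite database.
    return {
        dsn    => 'dbi:SQLite:dbname=:memory:',
        user   => '',
        pwd    => '',
        schema => undef,
    };
};

my $settings = $connection_settings->(%ENV);
print "testing against: $settings->{dsn}\n";
```

The dsn, user and pwd values could then be handed to DBI->connect to obtain the database handle for the dbh argument.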




A big thank you to all Perl coders on the dbi-dev, DBIx-Class and perl-modules mailing lists and on PerlMonks who have patiently answered my questions and offered solutions, advice and encouragement. The Perl community is really outstanding.
Special thanks go to Tim Bunce (module name / advice on keeping the module extensible), Jonathan Leffler (module naming discussion / relation to existing modules / multiple suggestions for features), brian d foy (module naming discussion / mailing lists / encouragement) and the following Perl monks (see the threads for user jds17 for details): chromatic, erix, technojosh, kejohm, Khen1950fx, salva, tobyink (3 of 4 discussion threads!), Your Mother.
Martin J. Evans was the first developer to give me feedback and nice bug reports on Version 0.001, thanks a lot!

Jose Diaz Seng, <josediazseng at gmx.de>

Please report any bugs or feature requests to bug-dbix-table-testdatagenerator at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DBIx-Table-TestDataGenerator. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

You can find documentation for this module with the perldoc command.
    perldoc DBIx::Table::TestDataGenerator
You can also look for information at:
http://rt.cpan.org/NoAuth/Bugs.html?Dist=DBIx-Table-TestDataGenerator

Copyright 2012 Jose Diaz Seng.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.