DBIx::Table::TestDataGenerator - Automatic test data creation, cross DBMS
use DBIx::Table::TestDataGenerator; my $generator = DBIx::Table::TestDataGenerator->new( dsn => 'dbi:Pg:dbname=testdb', user => 'jose', password => '1234', table => 'employees', ); #simple usage: $generator->create_testdata( target_size => 1000000, num_random => 1000, ); #extended usage handling a self-reference of the target table: $generator->create_testdata( target_size => 1000000, num_random => 1000, max_tree_depth => 8, min_children => 2, min_roots => 20, ); #dump test data to a csv file instead $generator->create_testdata( target_size => 1000000, num_random => 1000, csv_dir => '/usr/local/tdg_dump', );
The purpose of this module is fuss-free adding of (bulk) test data to database tables and having test data resembling the data already present in the table.
The current module analyzes the structure of a target table and adds a desired number of records. The column values come either from the table itself (incremented if necessary to satisfy uniqueness constraints) or from tables referenced by foreign key constraints. These values are chosen randomly for a number of runs chosen by the user, and afterwards the values are chosen randomly from a cache containing the already chosen values, thus reducing database traffic for performance reasons. (Even when using the cache, there is some randomness involved since the choices from the different caches are random, too.)
A main goal of the module is to reduce configuration to the absolute minimum by automatically determining information about the target table, in particular its constraints and if a primary key column is an auto-increment column or not. Another goal is to support as many DBMSs as possible, which has been achieved by basing it on DBIx::Class modules.
In the synopsis, an extended usage has been mentioned. This refers to the common case of having a self-reference on a table, i.e. a one-column wide foreign key of a table to itself where the referenced column constitutes the primary key. Such a parent-child relationship defines a forest, i.e. a collection of trees. When generating test data it may be useful to have some control over the growth of the trees. One such case is when the parent-child relation represents a navigation tree and a client application processes this structure. In this case, one would like to have a meaningful, balanced tree structure since this corresponds to real-world examples. To control tree creation the parameters max_tree_depth, min_children and min_roots are provided. The tree nodes are being added in a depth-first, right-to-left manner.
You can find an example in the "examples" folder. It contains a script example.sql defining and filling some tables in your target database as well as a small program example.pl using the current module.
A new TestDataGenerator object
Read-only accessor for the DBI data source name.
Read-only accessor for the database user.
Read-only accessor for the database user's password.
Read-only accessor for the name of the target table.
Read-only accessor for path to a file containing the definition of the install test database.
Externally read-only accessor for the DBIx::Class schema created from the target database.
This is the main method, it creates and adds new records to the target table resp. dumps the data to a csv file, see below. In case one of the arguments max_tree_depth, min_children or min_roots has been provided, the other two must be provided as well.
The number of rows the target table should have after create_testdata has completed.
The first $num_random number of records use fresh random choices for their values taken from tables referenced by foreign key relations or the target table itself. These values are stored in a cache and re-used for the remaining (target_size - $num_random) records. Note that even for the remaining records there is some randomness since the combination of cached values coming from columns involved in different constraints is random.
In case of a self-reference, the maximum depth at which new records will be inserted, old records could of course be on any level. The minimum value for this parameter is 1, the level of the root nodes.
In case of a self-reference, the minimum number of children each handled parent node will get. The last handled parent node may not get this number of children, of course, if the target number of records is reached before.
In case of a self-reference, the minimum number of root elements to create. A record is considered to be a root element if the corresponding parent id is null or equal to the child id.
If true, new root nodes will have NULL as parent id, otherwise the value of their primary key column. Defaults to false. The value should be set according to the convention used in the target table.
Optional path to a csv file which will contain the test data, no data will be written to the target database in this case. If not defined, changes are applied to the target database.
Optional parameter defining if the database handle dbh should still be connected after having created the test data, defaults to false. (For some install tests, we need to set it to true since we are using an in-memory database which would otherwise not be accessible for the tests, but this may also be of use in other scenarios.)
Optional parameter defining a transaction size. In case transactions are not supported, setting the parameter has no effect. If transactions are supported, omitting the parameter resp. setting its value to 0 will result in one big transaction being used.
Nothing, only called for the side-effect of adding new records to the target table. (This may change, see the section "FURTHER DEVELOPMENT".)
Allows to disconnect the connection to the target database in case keep_connection_alive was set to true before when calling create_testdata.
To install this module, run the following commands:
perl Build.PL ./Build ./Build test ./Build install
A big thank you to all perl coders on the dbi-dev, DBIx-Class and perl-modules mailing lists and on PerlMonks who have patiently answered my questions and offered solutions, advice and encouragement, the Perl community is really outstanding.
Special thanks go to Tim Bunce (module name / advice on keeping the module extensible), Jonathan Leffler (module naming discussion / relation to existing modules / multiple suggestions for features), brian d foy (module naming discussion / mailing lists / encouragement) and the following Perl monks (see the threads for user jds17 for details): chromatic, erix, technojosh, kejohm, Khen1950fx, salva, tobyink (3 of 4 discussion threads!), Your Mother.
Martin J. Evans was the first developer giving me feedback and nice bug reports on Version 0.001, thanks a lot!
Please report any bugs or feature requests to
bug-dbix-table-testdatagenerator at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DBIx-Table-TestDataGenerator. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
You can also look for information at:
Jose Diaz Seng,
<josediazseng at gmx.de>
Copyright (C) 2012-2013, Jose Diaz Seng.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0. For more details, see the full text of the licenses in the directory LICENSES.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.