NAME

CDB_File::Generator - generate massive sorted CDB files simply.

SYNOPSIS

  use CDB_File::Generator;
  $gen = CDB_File::Generator->new("my.cdb");
  $gen->add("Fred", "Martha");
  $gen->add("Fred", "Olivia");
  $gen->add("Fred", "Jenny");
  $gen->add("Roger", "Joe");
  $gen->add("Roger", "Jenny");
  $gen = undef;   # destroying the generator finishes the CDB file
  use CDB_File;

DESCRIPTION

This class makes it easy to generate large sorted CDB files on the fly. The files can be much bigger than memory, but the speed will depend on the efficiency of your sort command. If you haven't got a sort command, it won't work at all.

METHODS

Generator::new $cdbfile [$cdbmaketemp [{$tmpname [$sorttmpname] | $tmpdir}]]

The new function creates a generator for a given filename, optionally specifying where it should put its temporary files.

$gen->add($key, $value)

Adds a value, stored under the given key, to the CDB being created.

$gen->DESTROY

This is not normally called by the user. It runs when the generator object goes away: when the block of the program that holds it is exited, or when the program completes. When it is run, it calls the finish method, which ends the CDB creation. See below.

finish

finish ends the CDB creation. First it closes the output temporary file, then it sorts it to another file, and finally it calls cdbmake to complete the creation job.

In the current implementation this uses sort -u and deletes repeats of the same key with the same value.

In order to increase database portability, by default all sorting is done in the 'C' locale, even if the current program is working in another locale. This is "the right thing" in many cases. Where you are dealing with real word keys it won't be the right thing. In this case, use the locale function to set the locale.
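The effect of the sorting step can be illustrated in pure Perl. This is only a sketch of what sort -u does in the 'C' locale, not the module's code, which runs the external sort program. The helper name sort_unique_bytewise and the sample records are made up for the illustration, and the records are simplified stand-ins for whatever the temporary file really contains.

```perl
use strict;
use warnings;

# Hypothetical helper: mimics the effect of running "sort -u" in the
# 'C' locale on a list of lines. The module itself shells out to sort.
sub sort_unique_bytewise {
    my @lines = @_;
    my %seen;
    # Drop exact repeats first, then order the survivors with cmp.
    # Without "use locale", cmp ignores the current locale, which is
    # what makes the result the same on every system.
    return sort { $a cmp $b } grep { !$seen{$_}++ } @lines;
}

my @records = (
    "Roger Jenny",
    "Fred Martha",
    "Fred Olivia",
    "Fred Martha",    # exact repeat: same key, same value
    "Roger Joe",
);

print "$_\n" for sort_unique_bytewise(@records);
```

Only the exact repeat is dropped. "Fred Olivia" survives because it has the same key as "Fred Martha" but a different value.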

$gen->abort

If you decide not to create the CDB file you were creating, you have to call this method. Otherwise, it will be created as your program exits (or possibly earlier).

gen_cdb_input($key,$value)

This is a little utility function which formats a cdbmake input line.
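For reference, cdbmake reads records of the form +klen,dlen:key->data, one per line, with an empty line marking the end of the input. A minimal sketch of such a formatter follows. The name format_cdbmake_line is hypothetical, and the real gen_cdb_input may differ in details.

```perl
use strict;
use warnings;

# Hypothetical stand-in for gen_cdb_input; not the module's own code.
sub format_cdbmake_line {
    my ($key, $value) = @_;
    # cdbmake expects byte counts. length() counts characters, so this
    # sketch assumes both arguments are already byte strings.
    return sprintf "+%d,%d:%s->%s\n",
        length($key), length($value), $key, $value;
}

print format_cdbmake_line("Fred", "Martha");   # prints "+4,6:Fred->Martha"
print format_cdbmake_line("Roger", "Joe");     # prints "+5,3:Roger->Joe"
print "\n";                                    # empty line ends the input
```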

BUGS

We use the external programs sort and cdbmake. These almost certainly improve our performance on large databases (and those are all we care about), but they make portability difficult. Possibly system-independent alternatives should be written and used where needed.

We should write out to the sort file with some encoding that gets rid of newlines, and then read it back, decoding it to feed to cdbmake.
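One possible encoding, given here only as a sketch of the idea and not as anything the module does today, is backslash escaping. The helper names encode_record and decode_record are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers; the module does not currently provide these.
sub encode_record {
    my ($text) = @_;
    $text =~ s/\\/\\\\/g;   # double every backslash first
    $text =~ s/\n/\\n/g;    # then turn each newline into backslash + n
    return $text;
}

sub decode_record {
    my ($text) = @_;
    # Read escapes left to right, so a doubled backslash is never
    # mistaken for the start of an encoded newline.
    $text =~ s/\\(.)/$1 eq 'n' ? "\n" : $1/ge;
    return $text;
}

my $raw     = "first line\nsecond line with a literal \\n";
my $encoded = encode_record($raw);
print "round trip ok\n" if decode_record($encoded) eq $raw;
```

The encoded form never contains a newline, so each record occupies exactly one line of the sort file. Note that escaping changes the bytes being compared, so encoded records may sort in a different order from the raw ones.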