NAME

FlatFile::DataStore - Perl module that implements a flatfile datastore.

SYNOPSIS

 use FlatFile::DataStore;

 # new datastore object

 my $dir  = "/my/datastore/directory";
 my $name = "dsname";
 my $ds   = FlatFile::DataStore->new( { dir => $dir, name => $name } );

 # create a record

 my $record_data = "This is a test record.";
 my $user_data   = "Test1";
 my $record = $ds->create( {
     data => \$record_data,
     user => $user_data,
     } );
 my $record_number = $record->keynum;

 # retrieve it

 $record = $ds->retrieve( $record_number );

 # update it

 $record->data( "Updating the test record." );
 $record = $ds->update( $record );

 # delete it

 $record = $ds->delete( $record );

 # get its history

 my @records = $ds->history( $record_number );

DESCRIPTION

FlatFile::DataStore implements a simple flatfile datastore. When you create (store) a new record, it is appended to the flatfile. When you update an existing record, the existing entry in the flatfile is flagged as updated, and the updated record is appended to the flatfile. When you delete a record, the existing entry is flagged as deleted, and a "delete record" is appended to the flatfile.

The result is that all versions of a record are retained in the datastore, and running a history will return all of them. Another result is that each record in the datastore represents a transaction: create, update, or delete.
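
The append-and-flag behavior described above can be sketched with a plain in-memory list. This is only an illustration of the idea, not the module's implementation: the subroutine names (create_rec, update_rec, delete_rec, history_of), the entry layout, and the choice to copy the old data into the "delete record" are assumptions made for the example. The indicator characters are the ones used throughout this documentation.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the flatfile: one entry per transaction.
my @log;       # entries: { keynum, ind, data, prev }
my %current;   # keynum => index in @log of the record's current entry

sub append_entry {
    my( $keynum, $ind, $data ) = @_;
    push @log, {
        keynum => $keynum,
        ind    => $ind,
        data   => $data,
        prev   => $current{ $keynum },   # undef for a newly created record
    };
    return $current{ $keynum } = $#log;
}

sub create_rec {
    my( $keynum, $data ) = @_;
    return append_entry( $keynum, '+', $data );
}

sub update_rec {
    my( $keynum, $data ) = @_;
    $log[ $current{ $keynum } ]{ind} = '#';   # flag existing entry as updated
    return append_entry( $keynum, '=', $data );
}

sub delete_rec {
    my( $keynum ) = @_;
    my $old = $log[ $current{ $keynum } ];
    $old->{ind} = '*';                        # flag existing entry as deleted
    return append_entry( $keynum, '-', $old->{data} );   # the "delete record"
}

# Walk the prev links: current entry first, original entry last.
sub history_of {
    my( $keynum ) = @_;
    my @hist;
    for( my $i = $current{ $keynum }; defined $i; $i = $log[ $i ]{prev} ) {
        push @hist, $log[ $i ];
    }
    return @hist;
}

create_rec( 1, 'first version' );
update_rec( 1, 'second version' );
delete_rec( 1 );
print join( ' ', map { "$_->{ind}:$_->{data}" } history_of( 1 ) ), "\n";
```

Nothing is ever removed from @log, which is why every version of a record stays available and why each entry corresponds to exactly one transaction.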

Methods support the following actions:

 - create
 - retrieve
 - update
 - delete
 - history

Additionally, FlatFile::DataStore::Utils provides the methods

 - validate
 - migrate

and others.

See FlatFile::DataStore::Tiehash for a tied interface.

VERSION

FlatFile::DataStore version 1.03

CLASS METHODS

FlatFile::DataStore->new();

Constructs a new FlatFile::DataStore object.

Accepts hash ref giving values for dir and name.

 my $ds = FlatFile::DataStore->new(
     { dir  => $dir,
       name => $name,
     } );

To initialize a new datastore, edit the "$dir/$name.uri" file and enter a configuration URI (as the only line in the file), or pass the URI as the value of the uri parameter, e.g.,

 my $ds = FlatFile::DataStore->new(
     { dir  => $dir,
       name => $name,
       uri  => join( ";" =>
           "http://example.com?name=$name",
           "desc=My%20Data%20Store",
           "defaults=medium",
           "user=8-%20-%7E",
           "recsep=%0A",
           ),
     } );

(See URI Configuration below.)

Also accepts a userdata parameter, which sets the default user data for this instance, e.g.,

 my $ds = FlatFile::DataStore->new(
     { dir  => $dir,
       name => $name,
       userdata => ':',
     } );

Returns a reference to the FlatFile::DataStore object.

OBJECT METHODS, Record Processing (CRUD)

create( $record )

 or create( { data => \$record_data, user => $user_data } )
 or create( { record => $record[, data => \$record_data][, user => $user_data] } )

Creates a record. If the parameter is a record object, the record data and user data will be gotten from it. Otherwise, if the parameter is a hash reference, the expected keys are:

 - record => FlatFile::DataStore::Record object
 - data => string or scalar reference
 - user => string

If no record is passed, both 'data' and 'user' are required. Otherwise, if a record is passed, the record data and user data will be gotten from it unless one or both are explicitly provided.

Returns a FlatFile::DataStore::Record object.

Note: the record data (but not the user data) is stored in the FF::DS::Record object as a scalar reference. This is done for efficiency in the cases where the record data may be very large. Likewise, the data parm passed to create() may be a scalar reference.

retrieve( $num[, $pos] )

Retrieves a record. The parm $num may be one of

 - a key number, i.e., record sequence number
 - a file number

The parm $pos is required if $num is a file number.

Here's why: When $num is a record key sequence number (key number), a preamble is retrieved from the datastore key file. In that preamble is the file number and seek position where the record data may be gotten. Otherwise, when $num is a file number, the application (you) must supply the seek position into that file. Working from an array of record history is the most likely time you would do this.

Returns a FlatFile::DataStore::Record object.

retrieve_preamble( $keynum )

Retrieves a preamble. The parm $keynum is a key number, i.e., record sequence number.

Returns a FlatFile::DataStore::Preamble object.

This method allows getting information about the record, e.g., if it's deleted, what's in the user data, etc., without the overhead of retrieving the full record data.

locate_record_data( $num[, $pos] )

Rather than retrieving a record, this subroutine positions you at the record data in the data file. This might be handy if, for example, the record data is text, and you just want part of it. You can scan the data and get what you want without having to read the entire record. Or the data might be XML and you could parse it using SAX without reading it all into memory.

The parm $num may be one of

 - a key number, i.e., record sequence number
 - a file number

The parm $pos is required if $num is a file number. See retrieve() above for why.

Returns a list containing the file handle (which is already locked for reading in binmode), the seek position, and the record length.

You will be positioned at the seek position, so you could begin reading data, e.g., via <$fh>:

    my( $fh, $pos, $len ) = $ds->locate_record_data( $keynum );
    my $got;
    while( <$fh> ) {
        last if ($got += length) > $len;  # in case we read the recsep
        # [do something with $_ ...]
        last if $got == $len;
    }
    close $fh;

The above loop assumes you know each line of the data ends in a newline. Also keep in mind that the file is opened in binmode, so you will be reading bytes (octets), not necessarily characters. Decoding these octets is up to you.

XXX: The file is opened in binmode. Does that make the example above wrong on non-Unix OSes?

update( $record )

 or update( { string => $preamble_string, data => \$record_data, user => $user_data } )
 or update( { preamble => $preamble_obj, data => \$record_data, user => $user_data } )
 or update( { record => $record_obj
    [, preamble => $preamble_obj]
    [, string   => $preamble_string]
    [, data     => \$record_data]
    [, user     => $user_data] } )

Updates a record. If the parameter is a record object, the preamble, record data, and user data will be gotten from it. Otherwise, if the parameter is a hash reference, the expected keys are:

 - record   => FlatFile::DataStore::Record object
 - preamble => FlatFile::DataStore::Preamble object
 - string   => a preamble string (the string attribute of a preamble object)
 - data     => string or scalar reference
 - user     => string

If no record is passed, 'preamble' (or 'string'), 'data', and 'user' are required. Otherwise, if a record is passed, the preamble, record data and user data will be gotten from it unless any of them are explicitly provided.

Returns a FlatFile::DataStore::Record object.

delete( $record )

 or delete( { string => $preamble_string, data => \$record_data, user => $user_data } )
 or delete( { preamble => $preamble_obj, data => \$record_data, user => $user_data } )
 or delete( { record => $record_obj
    [, preamble => $preamble_obj]
    [, string   => $preamble_string]
    [, data     => \$record_data]
    [, user     => $user_data] } )

Deletes a record. The parameters are the same as for update().

Returns a FlatFile::DataStore::Record object.

exists()

Tests if a datastore exists. Currently, a datastore "exists" if there is a .uri file -- whether the file is valid or not.

May be called on a datastore object, e.g.,

    $ds->exists()

Or may be called as a class method, e.g.,

    FlatFile::DataStore->exists({
        name => 'example',
        dir  => '/dbs/example',
        })

If called as a class method, you must pass a hashref that provides values for 'name' and 'dir'.

history( $keynum )

Retrieves a record's history. The parm $keynum is always a key number, i.e., a record sequence number.

Returns an array of FlatFile::DataStore::Record objects.

The first element of this array is the current record. The last element is the original record. That is, the array is in reverse chronological order.

OBJECT METHODS, Accessors

In the specifications below, square brackets ([]) denote optional parameters, not anonymous arrays, e.g., [$omap] indicates that $omap is optional, instead of implying that you need to pass it inside an array.

$ds->specs( [$omap] )

Sets and returns the specs attribute value if $omap is given, otherwise just returns the value.

An 'omap' is an ordered hash as defined in

    http://yaml.org/type/omap.html

and implemented here using Data::Omap. That is, it's an array of single-key hashes. This ordered hash contains the specifications for constructing and parsing a record preamble as defined in the name.uri file.

In list context, the value returned is a list of hashrefs. In scalar context, the value returned is an arrayref containing the list of hashrefs.

$ds->dir( [$dir] )

Sets and returns the dir attribute value if $dir is given, otherwise just returns the value.

If $dir is given and is a null string, the dir object attribute is removed from the object. If $dir is not null, the directory must already exist. In other words, this module will not create the directory where the database is to be stored.

Preamble accessors (from the uri)

The following methods set and return their respective attribute values if $value is given. Otherwise, they just return the value.

 $ds->indicator( [$value] );  # length-characters
 $ds->transind(  [$value] );  # length-characters
 $ds->date(      [$value] );  # length-format
 $ds->transnum(  [$value] );  # length-base
 $ds->keynum(    [$value] );  # length-base
 $ds->reclen(    [$value] );  # length-base
 $ds->thisfnum(  [$value] );  # length-base
 $ds->thisseek(  [$value] );  # length-base
 $ds->prevfnum(  [$value] );  # length-base
 $ds->prevseek(  [$value] );  # length-base
 $ds->nextfnum(  [$value] );  # length-base
 $ds->nextseek(  [$value] );  # length-base
 $ds->user(      [$value] );  # length-characters

Other accessors

 $ds->name(        [$value] ); # from uri, name of datastore
 $ds->desc(        [$value] ); # from uri, description of datastore
 $ds->recsep(      [$value] ); # from uri, character(s)
 $ds->uri(         [$value] ); # full uri as is
 $ds->preamblelen( [$value] ); # length of preamble string
 $ds->toclen(      [$value] ); # length of toc entry
 $ds->keylen(      [$value] ); # length of stored keynum
 $ds->keybase(     [$value] ); # base   of stored keynum
 $ds->translen(    [$value] ); # length of stored transaction number
 $ds->transbase(   [$value] ); # base   of stored transaction number
 $ds->fnumlen(     [$value] ); # length of stored file number
 $ds->fnumbase(    [$value] ); # base   of stored file number
 $ds->userlen(     [$value] ); # length from uri
 $ds->dateformat(  [$value] ); # format from uri
 $ds->regx(        [$value] ); # capturing regx for preamble string
 $ds->datamax(     [$value] ); # maximum bytes in a data file
 $ds->crud(        [$value] ); # hash ref, e.g.,

     {
        create => '+',
        oldupd => '#',
        update => '=',
        olddel => '*',
        delete => '-',
        '+' => 'create',
        '#' => 'oldupd',
        '=' => 'update',
        '*' => 'olddel',
        '-' => 'delete',
     }

 (logical actions <=> symbolic indicators)

Accessors for optional attributes

 $ds->dirmax(   [$value] );  # maximum files in a directory
 $ds->dirlev(   [$value] );  # number of directory levels
 $ds->tocmax(   [$value] );  # maximum toc entries
 $ds->keymax(   [$value] );  # maximum key entries
 $ds->userdata( [$value] );  # default user data

If dirmax is not given, files will keep being added to the same directories without limit.

If dirlev is not given, the toc, key, and data files will reside in the top-level directory. If dirmax is given, dirlev defaults to 1.

If tocmax is not given, there will be only one toc file, which will grow indefinitely.

If keymax is not given, there will be only one key file, which will grow indefinitely.

If userdata is not given, it will default to a null string (padded with spaces) unless supplied another way.

OBJECT METHODS, Other

howmany( [$regx] )

Returns count of records whose indicators match regx, e.g.,

    $self->howmany( qr/create|update/ );
    $self->howmany( qr/delete/ );
    $self->howmany( qr/oldupd|olddel/ );

If no regx, howmany() returns numrecs from the toc file, which should give the same number as qr/create|update/.

lastkeynum()

Returns the last key number used, i.e., the sequence number of the last record added to the datastore, as an integer.

nextkeynum()

Returns lastkeynum()+1 (a convenience method). This could be useful for adding a new record to a hash tied to a datastore, e.g.,

    $h{ $ds->nextkeynum } = "New record data.";

(but also note that there is a "null key" convention for this -- see FlatFile::DataStore::Tiehash)

URI Configuration

It may seem odd to use a URI as a configuration file. I needed some configuration approach and wanted to stay as lightweight as possible. The specs for a URI are fairly well-known, and it allows for everything we need, so I chose that approach.

The examples all show a URL, because I thought it would be a nice touch to be able to visit the URL and have the page tell you things about the datastore. This is what the utils/flatfile-datastore.cgi program is intended to do, but it is in a very young/rough state so far.

Following are the URI configuration parameters. The order of the preamble parameters does matter: that's the order those fields will appear in each record preamble. Otherwise the order of the URI parameters doesn't matter.

Parameter values should be percent-encoded (uri escaped). Use %20 for space (don't be tempted to use '+'). Use URI::Escape::uri_escape, if desired, e.g.,

    my $name = 'example';
    my $dir  = '/example/dir';

    use URI::Escape;
    my $datastore = FlatFile::DataStore::->new( {
        name => $name,
        dir  => $dir,
        uri  => join( ';' =>
            "http://example.com?name=$name",
            "desc=" . uri_escape( 'My DataStore' ),
            "defaults=medium",
            "user=" . uri_escape( '8- -~' ),
            "recsep=%0A",
        ) }
    );
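
To make the ordering and percent-encoding rules concrete, here is a rough sketch of how such a configuration URI could be taken apart into an ordered list of single-key hashes (the omap shape described under specs()). The subroutine name parse_config_uri is hypothetical, and the module's own parsing may well differ; the sketch only assumes that parameters follow the '?' and are separated by ';'.

```perl
use strict;
use warnings;

# Split a configuration URI into an ordered list of { name => value } pairs.
sub parse_config_uri {
    my( $uri ) = @_;
    my( undef, $query ) = split /\?/, $uri, 2;
    die "no parameters in uri: $uri" unless defined $query;

    my @omap;
    for my $pair ( split /;/, $query ) {
        # limit of 2: values such as "1-+#=*-" contain '=' themselves
        my( $name, $value ) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;   # percent-decode
        push @omap, { $name => $value };
    }
    return \@omap;
}

my $omap = parse_config_uri(
    'http://example.com?name=example;defaults=medium;user=8-%20-~;recsep=%0A'
);
for my $pair ( @$omap ) {
    my( $name ) = keys %$pair;
    printf "%-8s => %s\n", $name, join ' ', map { sprintf '%02X', ord } split //, $pair->{ $name };
}
```

An array of pairs is used rather than a plain hash because the order of the preamble parameters is significant and a hash would lose it. The loop prints each value as hex bytes so that the decoded space and newline are visible.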

Preamble parameters

All of the preamble parameters are required, with one qualification: four of them (prevfnum, prevseek, nextfnum, and nextseek) are technically optional.

Leaving those four out means that you're giving up keeping the linked list of record history, so don't do that unless you have a good reason.

indicator

The indicator parameter specifies the single-character record indicators that appear in each record preamble. This parameter has the following form: indicator=length-5CharacterString, e.g.,

    indicator=1-+#=*-

The length is always 1. The five characters represent the five states of a record in the datastore (in this order):

    create(+): the record has not changed since being added
    oldupd(#): the record was updated, and this entry is an old version
    update(=): this entry is the updated version of a record
    olddel(*): the record was deleted, and this entry is the old version
    delete(-): the record is deleted, and this entry is the "delete record"

(The reason for a "delete record" is for storing information about the delete process, such as when it was deleted and by whom.)

The five characters shown in the example are the ones used by all examples in the documentation. You're free to use your own characters, but the length must always be 1.
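
As an illustration of the format, the following sketch turns an indicator value into a two-way lookup like the one returned by the crud() accessor. The subroutine name parse_indicator and its error messages are made up for this example; it is not the module's own code.

```perl
use strict;
use warnings;

# Build a two-way map (action <=> character) from "length-5CharacterString".
sub parse_indicator {
    my( $value ) = @_;
    my( $len, $chars ) = split /-/, $value, 2;   # limit 2: '-' may be a character
    die "indicator length must be 1\n"
        unless defined $len && $len eq '1';

    my @chars = split //, defined $chars ? $chars : '';
    die "indicator needs exactly five characters\n" unless @chars == 5;

    my @names = qw( create oldupd update olddel delete );
    my %crud;
    @crud{ @names } = @chars;   # logical action     => symbolic indicator
    @crud{ @chars } = @names;   # symbolic indicator => logical action
    die "indicator characters must be distinct\n" unless keys %crud == 10;

    return \%crud;
}

my $crud = parse_indicator( '1-+#=*-' );
print "update is stored as '$crud->{update}'\n";
print "'*' means $crud->{'*'}\n";
```

The distinctness check is an assumption of the sketch: with repeated characters the reverse lookup would be ambiguous.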

transind

The transind parameter describes the single-character transaction indicators that appear in each record preamble. This parameter has the same format and meanings as the indicator parameter, e.g.,

    transind=1-+#=*-

(Note that only three of these are used, but all five must be given and must match the indicator parameter.)

The three characters that are used are create(+), update(=), and delete(-). While the record indicators will change, e.g., from create to oldupd, or from update to olddel, etc., the transaction indicators never change from their original values. So a transaction that created a record will always have the create value, and the same for update and delete.

date

The date parameter specifies how the transaction date is stored in the preamble. It has the form: date=length-format, e.g.,

    date=8-yyyymmdd
    date=14-yyyymmddtttttt
    date=4-yymd
    date=7-yymdttt

The examples show the four choices for length: 4, 7, 8, or 14. When the length is 8, the format must contain 'yyyy', 'mm', and 'dd' in some order. When the length is 14, add 'tttttt' (hhmmss) in there somewhere.

When the length is 4, the format must contain 'yy', 'm', and 'd' in some order. When the length is 7, add 'ttt' (hms) in there somewhere, e.g.

    date=8-mmddyyyy,        date=8-ddmmyyyy,        etc.
    date=14-mmddyyyytttttt, date=14-ttttttddmmyyyy, etc.
    date=4-mdyy,            date=4-dmyy,            etc.
    date=7-mdyyttt,         date=7-tttdmyy,         etc.

When the length is 8 (or 14), the year, month, and day (and hours, minutes, seconds) are stored as decimal numbers, e.g., '20100615' for June 15, 2010 (or '20101224114208' for Dec 24, 2010 11:42:08).

When the length is 4 (or 7), they are stored as base62 numbers, e.g. 'WQ6F' (yymd) for June 15, 2010, or 'WQCOBg8' (yymdttt) for Dec 24, 2010 11:42:08.
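
The base62 forms can be reproduced with a few lines of Perl. This sketch assumes the digit alphabet 0-9, A-Z, a-z (which is consistent with the examples above) and handles only the yymd and yymdttt orderings; to_base62 and encode_date are names invented for the example.

```perl
use strict;
use warnings;

# Assumed base62 alphabet: 0-9, then A-Z, then a-z.
my @digits = ( 0 .. 9, 'A' .. 'Z', 'a' .. 'z' );

# Encode a non-negative integer as base62, left-padded with '0' to $width.
sub to_base62 {
    my( $n, $width ) = @_;
    my $s = '';
    do {
        $s = $digits[ $n % 62 ] . $s;
        $n = int( $n / 62 );
    } while $n > 0;
    $s = '0' . $s while length( $s ) < $width;
    return $s;
}

# yymd, or yymdttt when hours, minutes, and seconds are also given.
sub encode_date {
    my( $year, $month, $day, @hms ) = @_;
    return join '', to_base62( $year, 2 ),
                    map { to_base62( $_, 1 ) } $month, $day, @hms;
}

print encode_date( 2010, 6, 15 ), "\n";
print encode_date( 2010, 12, 24, 11, 42, 8 ), "\n";
```

Each of month, day, hour, minute, and second is below 62, so each fits in a single base62 digit, and two digits cover years up to 3843.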

transnum

The transnum parameter specifies how the transaction number is stored in the preamble. It has the form: transnum=length-base, e.g.,

    transnum=4-62

The example says the number is stored as a four-digit base62 integer. The highest transaction number this allows is 'zzzz' base62 which is 14,776,335 decimal. Therefore, the datastore will accommodate up to that many transactions (creates, updates, deletes).
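
The limit implied by any length-base value is simply base ** length - 1. A small sketch (max_value is a name made up for the example) that can be used to check a planned configuration:

```perl
use strict;
use warnings;

# Highest number representable by a "length-base" spec, e.g. "4-62".
sub max_value {
    my( $spec ) = @_;
    my( $length, $base ) = $spec =~ /^(\d+)-(\d+)$/
        or die "bad length-base spec: $spec\n";
    return $base ** $length - 1;
}

for my $spec ( '4-62', '2-36', '5-62' ) {
    printf "%-5s allows values up to %d\n", $spec, max_value( $spec );
}
```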

keynum

The keynum parameter specifies how the record sequence number is stored in the preamble. It has the form: keynum=length-base, e.g.,

    keynum=4-62

As with the transnum example above, the keynum would be stored as a four-digit base62 integer, and the highest record sequence number allowed would be 14,776,335 ('zzzz' base62). Therefore, the datastore could not store more than this many records.

reclen

The reclen parameter specifies how the record length is stored in the preamble. It has the form: reclen=length-base, e.g.,

    reclen=4-62

This example allows records to be up to 14,776,335 bytes long.

thisfnum

The thisfnum parameter specifies how the file numbers are stored in the preamble. There are three file number parameters, thisfnum, prevfnum, and nextfnum. They must match each other in length and base. The parameter has the form: thisfnum=length-base, e.g.,

    thisfnum=2-36

There is an extra constraint imposed on the file number parameters: they may not use a number base higher than 36. The reason is that the file number appears in file names, and base36 numbers match [0-9A-Z]. By limiting to base36, file names will therefore never differ only by case, e.g., there may be a file named example.Z.data, but never one named example.z.data.

The above example states that the file numbers will be stored as two-digit base36 integers. The highest file number is 'ZZ' base36, which is 1,295 decimal. Therefore, the datastore will allow up to that many data files before filling up. (If a datastore "fills up", it must be migrated to a newly configured datastore that has bigger numbers where needed.)

In a preamble, thisfnum is the number of the datafile where the record is stored. This number combined with the thisseek value and the reclen value gives the precise location of the record data.
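
To show what "file, seek position, and length" amounts to in practice, here is a minimal sketch that reads an exact span of bytes from a file. It uses a temporary file in place of a real data file and ignores preambles and file naming entirely; read_at is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Stand-in for a data file: two "records", each followed by a recsep of "\n".
my( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} "first record\n", "second record\n";
close $out or die "can't close $path: $!";

# Read exactly $len bytes starting at byte offset $seek.
sub read_at {
    my( $file, $seek, $len ) = @_;
    open my $in, '<', $file or die "can't open $file: $!";
    binmode $in;
    seek $in, $seek, 0 or die "can't seek to $seek: $!";
    my $got = read $in, my $buf, $len;
    die "short read at $seek\n" unless defined $got && $got == $len;
    close $in;
    return $buf;
}

# The second record starts after the 13 bytes of "first record\n".
print read_at( $path, 13, 13 ), "\n";
```

Because the position and length are known up front, nothing before the record has to be scanned, which is what makes retrieval by key number cheap.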

thisseek

The thisseek parameter specifies how the seek positions are stored in the preamble. There are three seek parameters, thisseek, prevseek, and nextseek. They must match each other in length and base. The parameter has the form: thisseek=length-base, e.g.,

    thisseek=5-62

This example states that the seek positions will be stored as five-digit base62 integers. So the highest seek position is 'zzzzz' base62, which is 916,132,831 decimal. Therefore, each of the datastore's data files may contain up to that many bytes (record data plus preambles).

Incidentally, no record (plus its preamble) may be longer than this, because it just wouldn't fit in a data file.

Also, the size of each data file may be further limited using the datamax parameter (see below). For example, a seek value of 4-62 would allow datafiles up to 14,776,335 bytes long. If you want bigger files, but don't want them bigger than 500 Meg, you can give thisseek=5-62 and datamax=500M.

prevfnum (optional)

The prevfnum parameter specifies how the "previous" file numbers are stored in the preamble. The value of this parameter must exactly match thisfnum (see thisfnum above for more details). It has the form: prevfnum=length-base, e.g.,

    prevfnum=2-36

In a preamble, the prevfnum is the number of the datafile where the previous version of the record is stored. This number combined with the prevseek value gives the beginning location of the previous record's data.

This is the first of the four "optional" preamble parameters. If you don't provide this one, don't provide the other three either. If you leave these off, you will not be able to get a record's history of changes, and you will not be able to migrate any history to a new datastore.

So why would you not provide these? You might have a datastore that has very transient data, e.g., indexes, and you don't care about change history. By not including these four optional parameters, when the module updates a record, it will not perform the extra bit of IO to update a previous record's nextfnum and nextseek values. And the preambles will be a little bit shorter.

prevseek (optional)

The prevseek parameter specifies how the "previous" seek positions are stored in the preamble. The value of this parameter must exactly match thisseek (see thisseek above for more details). It has the form prevseek=length-base, e.g.,

    prevseek=5-62

nextfnum (optional)

The nextfnum parameter specifies how the "next" file numbers are stored in the preamble. The value of this parameter must exactly match thisfnum (see thisfnum above for more details). It has the form: nextfnum=length-base, e.g.,

    nextfnum=2-36

In a preamble, the nextfnum is the number of the datafile where the next version of the record is stored. This number combined with the nextseek value gives the beginning location of the next version of the record's data.

nextseek (optional)

The nextseek parameter specifies how the "next" seek positions are stored in the preamble. The value of this parameter must exactly match thisseek (see thisseek above for more details). It has the form nextseek=length-base, e.g.,

    nextseek=5-62

You would have a nextfnum and nextseek in a preamble when it's a previous version of a record whose current version appears later in the datastore. While thisfnum and thisseek are critical for all record retrievals, prevfnum, prevseek, nextfnum, and nextseek are only needed for getting a record's history. They are also used during a migration to help validate that all the data (and transactions) were migrated intact.

user

The user parameter specifies the length and character class for extra user data stored in the preamble. It has the form: user=length-CharacterClass, e.g.,

    user=8-%20-~  (must match /[ -~]+ */ and not be longer than 8)
    user=10-0-9   (must match /[0-9]+ */ and not be longer than 10)
    user=1-:      (must be literally ':')

When a record is created, the application supplies a value to store as "user" data. This might be a userid, an md5 digest, multiple fixed-length fields -- whatever is needed or wanted.

This field is required but may be preassigned using the userdata parameter (see below). If no user data is provided or preassigned, it will default to a null string (which will be padded with spaces).

When this data is stored in the preamble, it is padded on the right with spaces.
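
The length, character class, and padding rules can be sketched as follows. The subroutine format_user and its argument order are assumptions for the example, and the sketch only covers non-empty values; how the module treats the null-string default is not shown here.

```perl
use strict;
use warnings;

# Check $value against a user spec (length and character class),
# then pad it on the right with spaces to the full length.
sub format_user {
    my( $value, $len, $class ) = @_;
    die "user data longer than $len\n" if length( $value ) > $len;
    die "user data does not match [$class]\n"
        unless $value =~ /\A[$class]+ *\z/;
    return sprintf "%-${len}s", $value;
}

# user=8-%20-~ after percent-decoding: length 8, class ' -~'
printf "[%s]\n", format_user( 'Test1', 8, ' -~' );

# user=10-0-9: length 10, class '0-9'
printf "[%s]\n", format_user( '12345', 10, '0-9' );
```

Padding to a fixed width keeps every preamble the same length, which is what lets the fields be picked out by position.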

Preamble defaults

All of the preamble parameters -- except user -- may be set using one of the defaults provided, e.g.,

    http://example.com?name=example;defaults=medium;user=8-%20-~
    http://example.com?name=example;defaults=large;user=10-0-9

Note that these are in a default order also. And the user parameter is still part of the preamble, so you can make it appear first if you want, e.g.,

    http://example.com?name=example;user=8-%20-~;defaults=medium
    http://example.com?name=example;user=10-0-9;defaults=large

The _nohist versions leave out the optional preamble parameters -- the above caveat about record history still applies.

Finally, if none of these suits, they may still be good starting points for defining your own preambles.

xsmall, xsmall_nohist

When the URI contains defaults=xsmall, the following values are set:

    indicator=1-+#=*-
    transind=1-+#=*-
    date=7-yymdttt
    transnum=2-62   3,843 transactions
    keynum=2-62     3,843 records
    reclen=2-62     3,843 bytes/record
    thisfnum=1-36   35 data files
    thisseek=4-62   14,776,335 bytes/file
    prevfnum=1-36
    prevseek=4-62
    nextfnum=1-36
    nextseek=4-62

The last four are not set for defaults=xsmall_nohist.

Rough estimates: 3800 records (or transactions), no larger than 3800 bytes each; 517 Megs total (35 * 14.7M).

small, small_nohist

For defaults=small:

    indicator=1-+#=*-
    transind=1-+#=*-
    date=7-yymdttt
    transnum=3-62   238,327 transactions
    keynum=3-62     238,327 records
    reclen=3-62     238,327 bytes/record
    thisfnum=1-36   35 data files
    thisseek=5-62   916,132,831 bytes/file
    prevfnum=1-36
    prevseek=5-62
    nextfnum=1-36
    nextseek=5-62

The last four are not set for defaults=small_nohist.

Rough estimates: 238K records (or transactions), no larger than 238K bytes each; 32 Gigs total (35 * 916M).

medium, medium_nohist

For defaults=medium:

    indicator=1-+#=*-
    transind=1-+#=*-
    date=7-yymdttt
    transnum=4-62   14,776,335 transactions
    keynum=4-62     14,776,335 records
    reclen=4-62     14,776,335 bytes/record
    thisfnum=2-36   1,295 data files
    thisseek=5-62   916,132,831 bytes/file
    prevfnum=2-36
    prevseek=5-62
    nextfnum=2-36
    nextseek=5-62

The last four are not set for defaults=medium_nohist.

Rough estimates: 14.7M records (or transactions), no larger than 14.7M bytes each; 1 Terabyte total (1,295 * 916M).

large, large_nohist

For defaults=large:

    datamax=1.9G    1,900,000,000 bytes/file
    dirmax=300
    keymax=100_000
    indicator=1-+#=*-
    transind=1-+#=*-
    date=7-yymdttt
    transnum=5-62   916,132,831 transactions
    keynum=5-62     916,132,831 records
    reclen=5-62     916,132,831 bytes/record
    thisfnum=3-36   46,655 data files
    thisseek=6-62   56G per file (but see datamax)
    prevfnum=3-36
    prevseek=6-62
    nextfnum=3-36
    nextseek=6-62

The last four are not set for defaults=large_nohist.

Rough estimates: 916M records/transactions, no larger than 916M bytes each; 88 Terabytes total (46,655 * 1.9G).

xlarge, xlarge_nohist

For defaults=xlarge:

    datamax=1.9G    1,900,000,000 bytes/file
    dirmax=300
    dirlev=2
    keymax=100_000
    tocmax=100_000
    indicator=1-+#=*-
    transind=1-+#=*-
    date=7-yymdttt
    transnum=6-62   56B transactions
    keynum=6-62     56B records
    reclen=6-62     56G per record (limited to 1.9G by datamax)
    thisfnum=4-36   1,679,615 data files
    thisseek=6-62   56G per file (but see datamax)
    prevfnum=4-36
    prevseek=6-62
    nextfnum=4-36
    nextseek=6-62

The last four are not set for defaults=xlarge_nohist.

Rough estimates: 56B records/transactions, no larger than 1.9G bytes each; 3 Petabytes total (1,679,615 * 1.9G).

Other required parameters

name

The name parameter identifies the datastore by name. This name should be short and uncomplicated, because it is used as the root for the datastore's files.

recsep

The recsep parameter gives the ASCII character(s) that will make up the record separator. The "flatfile" strategy suggests that these characters ought to match what your OS considers to be a "newline". But in fact, you could use any string of ASCII characters.

    recsep=%0A       (LF)
    recsep=%0D%0A    (CR+LF)
    recsep=%0D       (CR)

    recsep=%0A---%0A (HR -- sort of)

(But keep in mind that the recsep is also used for the key files and toc files. So a simpler recsep is probably best.)

Also, if you develop your data on unix with recsep=%0A and then copy it to a windows machine, the module will continue to use the configured recsep, i.e., it is not tied to the OS.

Other optional parameters

desc

The desc parameter provides a means to give a short description (or perhaps a longer name) for the datastore.

datamax

The datamax parameter gives the maximum number of bytes a data file may contain. If you don't provide a datamax, it will be computed from the thisseek value (see thisseek above for more details).

The datamax value is simply a number, e.g.,

    datamax=1000000000   (1 Gig)

To make things easier to read, you can add underscores, e.g.,

    datamax=1_000_000_000   (1 Gig)

You can also shorten the number with an 'M' for megabytes (10**6) or a 'G' for gigabytes (10**9), e.g.,

    datamax=1000M  (1 Gig)
    datamax=1G     (1 Gig)

Finally, with 'M' or 'G', you can use fractions, e.g.

    datamax=.5M  (500_000)
    datamax=1.9G (1_900_000_000)
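
The notations above can be normalized to a plain byte count with a short routine. This is a sketch under the stated rules (underscores ignored, 'M' is 10**6, 'G' is 10**9); parse_size is a made-up name, and unlike the description above it does not insist that fractions appear only with 'M' or 'G'.

```perl
use strict;
use warnings;

# Convert values such as "1_000_000_000", "1000M", ".5M", or "1.9G" to a number.
sub parse_size {
    my( $value ) = @_;
    ( my $v = $value ) =~ tr/_//d;                  # drop readability underscores
    my( $num, $unit ) = $v =~ /^(\d*\.?\d+)([MG]?)$/
        or die "bad size value: $value\n";
    my %mult = ( '' => 1, M => 10**6, G => 10**9 );
    return sprintf '%.0f', $num * $mult{ $unit };   # round away float noise
}

for my $value ( '1000000000', '1_000_000_000', '1000M', '1G', '.5M', '1.9G' ) {
    printf "%-14s => %s\n", $value, parse_size( $value );
}
```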

keymax

The keymax parameter gives the number of record keys that may be stored in a key file. This simply limits the size of the key files, e.g.,

    keymax=10_000

The maximum bytes would be:

    keymax * (preamble length + recsep length)

The numeric value may use underscores and 'M' or 'G' as described above for datamax.

tocmax

The tocmax parameter gives the number of data file entries that may be stored in a toc (table of contents) file. This simply limits the size of the toc files, e.g.,

    tocmax=10_000

Each (fairly short) line in a toc file describes a single data file, so you would need a tocmax only in the extreme case of a datastore with thousands or millions of data files.

The numeric value may use underscores and 'M' or 'G' as described above for datamax.

dirmax

The dirmax parameter gives the number of files (and directories) that may be stored in a datastore directory, e.g.,

    dirmax=300

This allows a large number of data files (and key/toc files) to be created without there being too many files in a single directory.

(The numeric value may use underscores and 'M' or 'G' as described above for datamax.)

If you specify dirmax without dirlev (see below), dirlev will default to 1.

Without dirmax and dirlev, a datastore's data files (and key/toc files) will reside in the same directory as the uri file, and the module will not limit how many you may create (though the size of your filesystem might).

With dirmax and dirlev, these files will reside in subdirectories.

Giving a value for dirmax will also limit the number of data files (and key/toc files) a datastore may have, by this formula:

 max files = dirmax ** (dirlev + 1)

So dirmax=300 and dirlev=1 would result in a limit of 90,000 data files. If you go to dirlev=2, the limit becomes 27,000,000, which is why you're unlikely to need a dirlev greater than 2.
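
The formula is easy to check for other combinations. A tiny sketch (max_files is a name chosen for the example):

```perl
use strict;
use warnings;

# Limit on data files (and key/toc files) implied by dirmax and dirlev.
sub max_files {
    my( $dirmax, $dirlev ) = @_;
    return $dirmax ** ( $dirlev + 1 );
}

for my $dirlev ( 1, 2 ) {
    printf "dirmax=300, dirlev=%d: %d files\n", $dirlev, max_files( 300, $dirlev );
}
```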

dirlev

The dirlev parameter gives the number of levels of directories that a datastore may use, e.g.,

    dirlev=1

You can give a dirlev without a dirmax, which would store the data files (and key/toc files) in subdirectories, but wouldn't limit how many files may be in each directory.

userdata

The userdata parameter is similar to the userdata parameter in the call to new(). It specifies the default value to use if the application does not provide a value when creating, updating, or deleting a record.

Those provided values will override the value given in the call to new(), which will override the value given here in the uri.

If you don't specify a default value here or in the call to new(), the value defaults to a null string (which would be padded with spaces).

    userdata=:

The example is contrived for a hypothetical datastore that doesn't need this field. Since the field is required, the above setting will always store a colon (and the user parameter might be user=1-:).
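
The order of precedence described above (per-call value, then new(), then the uri, then a null string) could be expressed like this. The subroutine name effective_userdata is hypothetical, and treating "not provided" as undef is an assumption of the sketch.

```perl
use strict;
use warnings;

# Pick the first user data value that was actually provided.
sub effective_userdata {
    my( $per_call, $from_new, $from_uri ) = @_;
    for my $value ( $per_call, $from_new, $from_uri ) {
        return $value if defined $value;
    }
    return '';   # null string; padded with spaces when stored
}

printf "[%s]\n", effective_userdata( 'Test1', ':', ':' );   # per-call value wins
printf "[%s]\n", effective_userdata( undef,   'x', ':' );   # value from new()
printf "[%s]\n", effective_userdata( undef, undef, ':' );   # value from the uri
printf "[%s]\n", effective_userdata();                      # nothing supplied
```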

CAVEATS

This module is still in an experimental state. The tests are sparse. When I start using it in production, I'll bump the version to 1.00.

Until then (afterwards, too) please use with care.

AUTHOR

Brad Baxter, <bbaxter@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2011 by Brad Baxter

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.