NAME

DBIx::FileStore - Module to store files in a DBI backend

VERSION

Version 0.33

SYNOPSIS

Ever wanted to store files in a database?

This code helps you do that.

All the fdb tools in script/ use this library to get at file names and contents in the database.

To get started, see the README file (which includes a QUICKSTART guide) from the DBIx-FileStore distribution.

This document details the DBIx::FileStore module implementation.

FILENAME NOTES

The name of the file in the filestore cannot contain spaces.

The maximum length of the name of a file in the filestore is 75 characters.

You can store files under any name you wish in the filestore. The name need not correspond to the original name on the filesystem.

All filenames in the filestore are in one flat address space. You can use / in filenames, but it does not represent an actual directory. (Although fdbls has some support for viewing files in the filestore as if they were in folders. See the docs on 'fdbls' for details.)

METHODS

new DBIx::FileStore()

my $filestore = new DBIx::FileStore();

Returns a new DBIx::FileStore object.

build_config_hash()

my $hash = $fs->build_config_hash()

Reads $fs->config_file (or /etc/fdbrc.conf or ~/.fdbrc) as the config file.

build_dbh()

Returns a database handle (dbh) for use as $self->dbh.

build_value_sub( 'member_name', 'default_value' )

Uses $self->config_hash to return the config value for member_name, or default_value if no value is configured.

get_all_filenames()

my $fileinfo_ref = $filestore->get_all_filenames()

Returns a list of references to data about all the files in the filestore.

Each row consists of the following columns: name, c_len, c_md5, lasttime_as_int
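
As a rough sketch of how a caller might consume rows in this format, the following assumes the result is an array reference of array references, one per file, with the columns in the order listed above. That structure is an assumption, and the sample rows and the helper total_bytes() are made up for illustration; check the actual return value before relying on this.

```perl
use strict;
use warnings;

# Hypothetical sample rows: [ name, c_len, c_md5, lasttime_as_int ].
# The exact return structure is an assumption; real rows come from the filestore.
my $fileinfo_ref = [
    [ 'docs/readme.txt', 1200,   'placeholder-md5-1', 1500000000 ],
    [ 'images/logo.png', 524288, 'placeholder-md5-2', 1500000100 ],
];

# Sum the content lengths (second column) of all listed files.
sub total_bytes {
    my ($rows) = @_;
    my $total = 0;
    $total += $_->[1] for @$rows;
    return $total;
}

for my $row (@$fileinfo_ref) {
    my ( $name, $c_len, $c_md5, $lasttime ) = @$row;
    printf "%-20s %8d bytes\n", $name, $c_len;
}
printf "total: %d bytes\n", total_bytes($fileinfo_ref);
```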

get_filenames_matching_prefix( $prefix )

my $fileinfo_ref = $filestore->get_filenames_matching_prefix( $prefix );

Returns data about the files in the filestore whose names match the prefix $prefix, as a list of references in the same format as get_all_filenames().
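
To illustrate what prefix matching means in a flat namespace, here is a minimal sketch that filters a list of names by a plain string prefix. It assumes "matches the prefix" means "starts with"; the real method queries the database, and names_with_prefix() is a hypothetical helper that only mimics the result.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the names that start with $prefix.
# The real method runs a database query; this only mimics the outcome.
sub names_with_prefix {
    my ( $prefix, @names ) = @_;
    return grep { index( $_, $prefix ) == 0 } @names;
}

my @names   = ( 'logs/2017/jan.txt', 'logs/2017/feb.txt', 'img/logo.png' );
my @matches = names_with_prefix( 'logs/', @names );
print "$_\n" for @matches;
```

Because '/' has no special meaning in the filestore, a prefix such as 'logs/' behaves like a folder name only by convention.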

read_from_db( $filesystem_name, $storage_name);

my $bytecount = $filestore->read_from_db( "filesystemname.txt", "filestorename.txt" );

Copies the file 'filestorename.txt' from the filestore to the file 'filesystemname.txt' on the local filesystem.

rename_file( $from, $to );

my $ok = $self->rename_file( $from, $to );

Renames the file in the database from $from to $to. Returns 1.

delete_file( $fdbname );

my $ok = $self->delete_file( $fdbname );

Removes the data named $fdbname from the filestore.

copy_blocks_from_db_to_filehandle()

my $bytecount = $filestore->copy_blocks_from_db_to_filehandle( $fdbname, $fh );

Copies the blocks stored in the fdb under the name $fdbname to the filehandle $fh.

_read_blocks_from_db( $callback_function, $fdbname );

my $bytecount = $filestore->_read_blocks_from_db( $callback_function, $fdbname );

** Intended for internal use by this module. **

Fetches the blocks from the database for the file stored under $fdbname, and calls the $callback_function on the data from each one after it is read.

It also confirms that the base64 md5 checksums for each block and for the file contents as a whole are correct. Dies with an error if a checksum doesn't match.

If uselocks is set, locks the relevant tables while data is extracted.
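
The callback-plus-verification flow described above can be sketched without a database. In this illustration the blocks are plain in-memory hashes, and make_blocks() and read_blocks() are hypothetical names; it is not the module's implementation, only one way to check a per-block checksum and a whole-content checksum while streaming data to a callback.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

# Hypothetical stand-in for rows fetched from the database:
# each entry holds a block's data and its recorded base64 md5.
sub make_blocks {
    return map { +{ data => $_, b_md5 => md5_base64($_) } } @_;
}

# Verify each block's checksum, pass its data to $callback, and finally
# verify the checksum of the whole content. Dies on any mismatch.
sub read_blocks {
    my ( $callback, $c_md5, @blocks ) = @_;
    my $whole = Digest::MD5->new;
    my $bytes = 0;
    for my $block (@blocks) {
        die "block checksum mismatch\n"
            unless md5_base64( $block->{data} ) eq $block->{b_md5};
        $whole->add( $block->{data} );
        $callback->( $block->{data} );
        $bytes += length $block->{data};
    }
    die "content checksum mismatch\n" unless $whole->b64digest eq $c_md5;
    return $bytes;
}

my @blocks = make_blocks( 'hello ', 'world' );
my $out    = '';
my $count  = read_blocks( sub { $out .= $_[0] }, md5_base64('hello world'), @blocks );
print "$count bytes: $out\n";
```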

write_to_db( $localpathname, $filestorename );

my $bytecount = $self->write_to_db( $localpathname, $filestorename );

Copies the file $localpathname from the filesystem to the name $filestorename in the filestore.

Locks the relevant tables while the data is written. Locking should probably be configurable by the caller.

Returns the number of bytes written. Dies with a message if the source file could not be read.

Note that it currently reads the file twice: once to compute the md5 checksum before inserting it, and a second time to insert the blocks.
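
The two-pass approach can be sketched as follows. This is an illustration under stated assumptions, not the module's code: an in-memory filehandle stands in for the local file, the 256-byte chunk size is arbitrary, and the second pass only counts bytes where a real implementation would insert blocks.

```perl
use strict;
use warnings;
use Digest::MD5;

# In-memory filehandle standing in for a local file (illustration only).
my $content = 'x' x 1000;
open( my $fh, '<', \$content ) or die "cannot open: $!";
binmode $fh;

my $buf;

# Pass 1: checksum the whole content.
my $md5 = Digest::MD5->new;
while ( read( $fh, $buf, 256 ) ) {
    $md5->add($buf);
}
my $c_md5 = $md5->b64digest;

# Rewind so the second pass starts from the beginning.
seek( $fh, 0, 0 ) or die "cannot seek: $!";

# Pass 2: read the data again; block inserts would happen here.
my $bytes = 0;
while ( my $n = read( $fh, $buf, 256 ) ) {
    $bytes += $n;
}
close $fh;
print "md5 $c_md5, $bytes bytes\n";
```

Computing the checksum first means the whole-file md5 is known before the first block is written, at the cost of a second read.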

write_from_filehandle_to_db ($fh, $fdbname)

Reads blocks of the appropriate block size from $fh and writes them into the fdb under the name $fdbname. Returns the number of bytes written into the filestore.
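
A minimal sketch of reading fixed-size blocks from a filehandle is shown below. The helper copy_in_blocks() and its $store callback are hypothetical, and the 4-byte block size is chosen only to keep the example small; the module itself writes each block to the database.

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh in chunks of at most $block_size bytes
# and hand each chunk, with its block number, to the $store callback.
# Returns the total number of bytes read.
sub copy_in_blocks {
    my ( $fh, $block_size, $store ) = @_;
    my ( $bytes, $b_num ) = ( 0, 0 );
    while (1) {
        my $n = read( $fh, my $buf, $block_size );
        die "read failed: $!" unless defined $n;
        last if $n == 0;    # end of input
        $store->( $b_num++, $buf );
        $bytes += $n;
    }
    return $bytes;
}

my $data = 'abcdefghij';
open( my $in, '<', \$data ) or die "cannot open: $!";
my @stored;
my $written = copy_in_blocks( $in, 4, sub { push @stored, $_[1] } );
close $in;
print "$written bytes in ", scalar(@stored), " blocks\n";
```

The last block is shorter than the block size whenever the input length is not an exact multiple of it.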

FUNCTIONS

name_ok( $fdbname )

my $filename_ok = DBIx::FileStore::name_ok( $fdbname )

Checks that the name $fdbname is acceptable for use as a name in the filestore. The name must not contain spaces or be longer than 75 characters.
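
The two documented rules can be expressed in a few lines. The function below, sketch_name_ok(), is a hypothetical re-implementation for illustration; it additionally rejects empty names and any whitespace, which are assumptions, and the module's own name_ok() may apply different or further checks.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the documented rules.
# Rejecting empty names and all whitespace (not just spaces) is an assumption.
sub sketch_name_ok {
    my ($name) = @_;
    return 0 unless defined $name && length $name;
    return 0 if $name =~ /\s/;         # no spaces
    return 0 if length($name) > 75;    # at most 75 characters
    return 1;
}

print sketch_name_ok('reports/2017/q1.pdf') ? "ok\n" : "rejected\n";
print sketch_name_ok('my file.txt')         ? "ok\n" : "rejected\n";
```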

IMPLEMENTATION

The data is stored in the database using two tables: 'files' and 'fileblocks'. All meta-data is stored in the 'files' table, and the file contents are stored in the 'fileblocks' table.

fileblocks table

The fileblocks table has only three fields:

name

The name of the block, exactly as used in the files table. Always looks like "filename.txt <BLOCKNUMBER>", for example "filestorename.txt 00000".

block

The contents of the named block. The block size is currently 512K. Blocks must not be larger than the MySQL buffers can handle (in particular, max_allowed_packet).

lasttime

The timestamp of when this block was inserted into the DB or updated.
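
To make the block naming scheme of the name field concrete, here is a small sketch that builds and parses such names. It assumes a single space separator and a zero-padded block number of at least five digits, as in the example "filestorename.txt 00000"; block_name() and parse_block_name() are hypothetical helpers, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: build a block name from a file name and block number.
sub block_name {
    my ( $fdbname, $b_num ) = @_;
    return sprintf '%s %05d', $fdbname, $b_num;
}

# Hypothetical helper: split a block name back into file name and number.
# Returns an empty list if the name does not have the expected shape.
sub parse_block_name {
    my ($name) = @_;
    my ( $fdbname, $digits ) = $name =~ /\A(\S+) (\d{5,})\z/
        or return;
    return ( $fdbname, $digits + 0 );
}

print block_name( 'filestorename.txt', 0 ), "\n";
my ( $base, $num ) = parse_block_name('filestorename.txt 00012');
print "$base is block $num\n";
```

Since file names cannot contain spaces, the space before the block number is an unambiguous separator.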

files table

The files table has several fields. There is one row in the files table for each row in the fileblocks table, not one per file (see IMPLEMENTATION CAVEATS, below). The fields in the files table are:

c_len

Content length. The content length of the complete file (sum of length of all the file's blocks).

b_num

Block number. The number of the block this row represents. The b_num is repeated as a five (or more) digit number at the end of the name field (see above). We denormalize the data like this so we can quickly and easily find blocks by name or block number.

b_md5

Block md5. The md5 checksum for the block (b is for 'block') represented by this row. We use base64 encoding (which uses 0-9, a-z, A-Z, and a few other characters) to represent md5 checksums because it is a little shorter than the hex representation (22 vs. 32 characters).

c_md5

Content md5. The base64 md5 checksum for the whole file (c is for 'content') represented by this row.

lasttime

The timestamp of when this row was inserted into the DB or updated.

See the file 'table-definitions.sql' for more details about the db schema used.
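
The length difference between the two md5 representations mentioned for b_md5 and c_md5 can be checked with the core Digest::MD5 module. This is only an illustration; the sample string is arbitrary.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex md5_base64);

my $block = 'example block contents';
my $hex   = md5_hex($block);       # 32 hexadecimal characters
my $b64   = md5_base64($block);    # 22 base64 characters, without padding

printf "hex:    %s (%d chars)\n", $hex, length($hex);
printf "base64: %s (%d chars)\n", $b64, length($b64);
```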

IMPLEMENTATION CAVEATS

DBIx::FileStore is what I would consider production-grade code, but the overall wisdom of storing files in blobs in a MySQL database may be questionable (for good reason).

That said, if you have a good reason to do so and understand the repercussions of storing files in your MySQL database, this toolkit offers a stable and flexible backend for binary data storage, and it works quite nicely.

If we were to redesign the system, in particular we might reconsider having one row in the 'files' table for each block stored in the 'fileblocks' table. Perhaps instead, we'd have one entry in the 'files' table per file.

In concrete terms, though, the storage overhead of doing it this way (which only affects files larger than the block size, which defaults to 512K) is about 100 bytes per block. Assuming files larger than 512K, and with a conservative average block size of 256K, the extra storage overhead is still only about 0.039%.
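
The overhead figure can be reproduced with a short calculation. The sketch below takes the roughly 100 bytes per block from the text and evaluates it for a 256K average block under both readings of "K" (1000 and 1024 bytes); overhead_percent() is a hypothetical helper.

```perl
use strict;
use warnings;

my $overhead_bytes = 100;    # approximate per-block overhead, from the text above

# Overhead as a percentage of the average block size.
sub overhead_percent {
    my ($avg_block_bytes) = @_;
    return 100 * $overhead_bytes / $avg_block_bytes;
}

printf "256,000-byte blocks: %.3f%%\n", overhead_percent(256_000);
printf "262,144-byte blocks: %.3f%%\n", overhead_percent( 256 * 1024 );
```

Either way the overhead stays below 0.04% of the stored data.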

AUTHOR

Josh Rabinowitz

SUPPORT

You should probably read the documentation for the various filestore command-line tools:

fdbcat, fdbget, fdbls, fdbmv, fdbput, fdbrm, fdbslurp, fdbstat, and fdbtidy.

LICENSE AND COPYRIGHT

Copyright 2010-2017 Josh Rabinowitz.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.