DBIx::FileStore - Module to store files in a DBI backend
Ever wanted to store files in a database? Yeah, it's probably a bad idea, but maybe you want to do it anyway.
This code helps you do that.
All the fdb tools in script/ use this library to get at file names and contents in the database.
To get started, see the README file (which includes a QUICKSTART guide) from the DBIx-FileStore distribution.
This document details the DBIx::FileStore module implementation.
The name of the file in the filestore cannot contain spaces.
The maximum length of the name of a file in the filestore is 75 characters.
You can store files under any name you wish in the filestore. The name need not correspond to the original name on the filesystem.
All filenames in the filestore are in one flat address space. You can use / in filenames, but it does not represent an actual directory. However, fdbls has some support for viewing files in the filestore as if they were in folders; see the docs on 'fdbls' for details.
my $filestore = DBIx::FileStore->new();
Returns a new DBIx::FileStore object.
my $fileinfo_ref = $filestore->get_all_filenames()
Returns a list of references to data about all the files in the filestore.
Each row consists of the following columns: name, c_len, c_md5, lasttime_as_int
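As a rough illustration of working with such rows, the sketch below formats them as a simple listing. It does not call the module; the sample rows are made up, and it assumes each row is an array reference holding the columns in the order listed above, with lasttime_as_int being seconds since the epoch. The helper name format_row and the checksum strings are placeholders.

```perl
use strict;
use warnings;

# Hypothetical sample data. Assumption: each row is an array reference
# holding name, c_len, c_md5, lasttime_as_int, in that order.
my @rows = (
    [ 'docs/readme.txt', 1204,   'rL0Y20zC+Fzt72VPzMSk2A', 1262304000 ],
    [ 'images/logo.png', 734003, '1B2M2Y8AsgTpgAmY7PhCfg', 1262390400 ],
);

# Format one row as a fixed-width listing line.
# Assumption: lasttime_as_int is seconds since the epoch.
sub format_row {
    my ($row) = @_;
    my ( $name, $c_len, $c_md5, $lasttime ) = @$row;
    return sprintf( "%-20s %10d %s %s",
        $name, $c_len, $c_md5, scalar gmtime($lasttime) );
}

print format_row($_), "\n" for @rows;
```

With a real filestore, the rows would come from get_all_filenames() rather than from a literal list.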
my $fileinfo_ref = $filestore->get_filenames_matching_prefix( $prefix );
Returns a list of references to data about the files in the filestore whose names start with the prefix $prefix.
The rows are in the same format as those returned by get_all_filenames().
my $bytecount = $filestore->read_from_db( "filesystemname.txt", "filestorename.txt" );
Copies the file 'filestorename.txt' from the filestore to the file filesystemname.txt on the local filesystem.
my $bytecount = $filestore->print_blocks_from_db_to_filehandle( $fh, $fdbname );
Prints the file stored under $fdbname in the filestore to the filehandle $fh.
my $bytecount = $filestore->_read_blocks_from_db( $callback_function, $fdbname );
** Intended for internal use by this module. **
Fetches the blocks from the database for the file stored under $fdbname, and calls the $callback_function on the data from each one after it is read.
Locks the relevant tables while data is extracted. Locking should probably be configurable by the caller, or at least finer-grained.
It also confirms that the base64 md5 checksums for each block and for the file contents as a whole are correct. Dies with an error if a checksum doesn't match.
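The checksum-and-callback pattern described here can be sketched without a database. The following is not the module's own code: verify_and_emit_blocks is a hypothetical helper, and the blocks are passed in as an in-memory list of [ data, expected block md5 ] pairs instead of being fetched from tables. It uses Digest::MD5 from the Perl core.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

# Hypothetical helper: verify each block's base64 md5, pass the block to
# the callback, then verify the base64 md5 of the whole contents.
# Returns the number of bytes processed; dies on any mismatch.
sub verify_and_emit_blocks {
    my ( $callback, $blocks, $expected_c_md5 ) = @_;
    my $whole = Digest::MD5->new;
    my $bytes = 0;
    for my $i ( 0 .. $#$blocks ) {
        my ( $data, $expected_b_md5 ) = @{ $blocks->[$i] };
        die "block $i: md5 mismatch\n"
            unless md5_base64($data) eq $expected_b_md5;
        $whole->add($data);
        $callback->($data);
        $bytes += length $data;
    }
    die "file contents: md5 mismatch\n"
        unless $whole->b64digest eq $expected_c_md5;
    return $bytes;
}

# Build two in-memory blocks with correct checksums, then verify them.
my @chunks = ( "hello ", "world\n" );
my @blocks = map { [ $_, md5_base64($_) ] } @chunks;
my $c_md5  = md5_base64( join '', @chunks );

my $out = '';
my $n = verify_and_emit_blocks( sub { $out .= $_[0] }, \@blocks, $c_md5 );
print "verified $n bytes\n";
```

Passing a callback that prints to a filehandle instead of appending to a string gives behavior along the lines of print_blocks_from_db_to_filehandle().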
my $bytecount = $self->write_to_db( $localpathname, $filestorename );
Copies the file $localpathname from the filesystem to the name $filestorename in the filestore.
Locks the relevant tables while data is written. Locking should probably be configurable by the caller.
Returns the number of bytes written. Dies with a message if the source file could not be read.
Note that it currently reads the file twice: once to compute the md5 checksum before inserting it, and a second time to insert the blocks.
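The two-pass approach can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: file_md5_base64 and each_block are hypothetical helpers, the $store_block callback stands in for the database insert, and a temporary file stands in for the source file. Only core modules are used.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

my $BLOCK_SIZE = 512 * 1024;    # block size stated in this document

# Pass 1: base64 md5 checksum of the whole file.
sub file_md5_base64 {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't read $path: $!\n";
    binmode $fh;
    my $md5 = Digest::MD5->new->addfile($fh)->b64digest;
    close $fh;
    return $md5;
}

# Pass 2: hand each block to $store_block, a stand-in for the DB insert.
# Returns the total number of bytes read.
sub each_block {
    my ( $path, $store_block ) = @_;
    open my $fh, '<', $path or die "can't read $path: $!\n";
    binmode $fh;
    my ( $bytes, $b_num ) = ( 0, 0 );
    while ( my $len = read( $fh, my $block, $BLOCK_SIZE ) ) {
        $store_block->( $b_num++, $block );
        $bytes += $len;
    }
    close $fh;
    return $bytes;
}

# Demo input: a temporary file slightly larger than one block.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
binmode $tmp;
print {$tmp} 'x' x ( $BLOCK_SIZE + 10 );
close $tmp;

my $c_md5 = file_md5_base64($path);
my @sizes;
my $bytes = each_block( $path, sub { push @sizes, length $_[1] } );
print "c_md5=$c_md5 bytes=$bytes blocks=", scalar(@sizes), "\n";
```

Knowing the whole-file checksum before the first insert is what allows it to be stored alongside every block, at the cost of the second read.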
my $ok = $self->rename_file( $from, $to );
Renames the file in the database from $from to $to. Returns 1.
my $ok = $self->delete_file( $filename );
Removes data named $filename from the filestore.
my $filename_ok = DBIx::FileStore::name_ok( $fdbname )
Checks that the name $fdbname is acceptable for use as a name in the filestore. The name must not contain spaces or be longer than 75 characters.
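The documented rule can be restated in a few lines of Perl. This is a sketch of the rule as described, not the module's own name_ok(): looks_like_ok_name is a hypothetical helper, and rejecting undefined or empty names is an added assumption.

```perl
use strict;
use warnings;

my $MAX_NAME_LENGTH = 75;    # maximum name length stated in this document

# Hypothetical restatement of the documented rule; not the module's code.
sub looks_like_ok_name {
    my ($name) = @_;
    # Assumption: undefined or empty names are rejected as well.
    return 0 unless defined $name && length $name;
    return 0 if $name =~ / /;                        # no spaces
    return 0 if length($name) > $MAX_NAME_LENGTH;    # at most 75 characters
    return 1;
}

for my $name ( 'dir/file.txt', 'has space.txt', 'a' x 76 ) {
    printf "%-15.15s %s\n", $name,
        looks_like_ok_name($name) ? 'ok' : 'rejected';
}
```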
The data is stored in the database using two tables: 'files' and 'fileblocks'. All meta-data is stored in the 'files' table, and the file contents are stored in the 'fileblocks' table.
The fileblocks table has only three fields:
The name of the block. Always looks like "filename.txt <BLOCKNUMBER>", for example "filestorename.txt 00000".
The contents of the named block. Each block is currently set to be 512K. Care must be taken to use blocks that are not larger than mysql buffers can handle (in particular, max_allowed_packet).
The timestamp of when this block was inserted into the DB or updated.
The files table has several fields. There is one row in the files table for each row in the fileblocks table, not one row per file (see IMPLEMENTATION CAVEATS, below). The fields in the files table are:
The name of the block, exactly as used in the fileblocks table. Always looks like "filename.txt <BLOCKNUMBER>", for example "filestorename.txt 00000".
Content length. The content length of the complete file (sum of length of all the file's blocks).
Block number. The number of the block this row represents. The b_num is repeated as a five (or more) digit number at the end of the name field (see above). We denormalize the data like this so we can quickly find blocks by name or block number.
Block md5. The md5 checksum for the block (b is for 'block') represented by this row. We use base64 encoding (which uses 0-9, a-z, A-Z, and a few other characters) to represent md5's because it's a little shorter than the hex representation. (22 vs. 32 characters)
Content md5. The base64 md5 checksum for the whole file (c is for 'content') represented by this row.
The timestamp of when this row was inserted into the DB or updated.
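To make the per-block layout concrete, the sketch below builds the metadata rows that a file split into blocks would produce under the scheme described above. It is illustrative only: block_rows is a hypothetical helper, the key b_md5 is inferred from the naming of the other fields, the timestamp field is omitted, and a tiny block size is used so the output stays short.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64 md5_hex);

# Hypothetical helper: one metadata row (as a hash reference) per block.
sub block_rows {
    my ( $fdbname, $content, $block_size ) = @_;
    my $c_md5 = md5_base64($content);    # checksum of the whole file
    my $c_len = length $content;         # length of the whole file
    my @rows;
    my $b_num = 0;
    for ( my $off = 0; $off < $c_len; $off += $block_size ) {
        my $block = substr( $content, $off, $block_size );
        push @rows, {
            name  => sprintf( "%s %05d", $fdbname, $b_num ),
            c_len => $c_len,
            b_num => $b_num,
            b_md5 => md5_base64($block),    # checksum of this block only
            c_md5 => $c_md5,
        };
        $b_num++;
    }
    return @rows;
}

# Ten bytes split into 4-byte blocks gives three rows.
my @rows = block_rows( 'filestorename.txt', 'abcdefghij', 4 );
printf "%s  b_num=%d  b_md5=%s\n", @{$_}{qw(name b_num b_md5)} for @rows;

# Compare the lengths of the base64 and hex md5 representations.
printf "base64 md5: %d chars, hex md5: %d chars\n",
    length( md5_base64('abc') ), length( md5_hex('abc') );
```

Every row repeats c_len and c_md5, which is the denormalization discussed in the caveats that follow.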
DBIx::FileStore is what I would consider production-grade code, but the overall wisdom of storing files in blobs in a mysql database may be questionable (for good reason).
That having been said, if you have a good reason to do so, as long as you understand the repercussions of storing files in your mysql database, then this toolkit offers a stable and flexible backend for binary data storage, and it works quite nicely.
If we were to redesign the system, in particular we might reconsider having one row in the 'files' table for each block stored in the 'fileblocks' table. Perhaps instead, we'd have one entry in the 'files' table per file.
In concrete terms, though, the storage overhead of doing it this way (which only affects files larger than the block size, which defaults to 512K) is about 100 bytes per block. Assuming files larger than 512K, and with a conservative average block size of 256K, the extra storage overhead of doing it this way is still only about 0.04% (100 bytes out of 262,144).
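The estimate can be checked with a couple of lines of arithmetic, using the approximate figures quoted above (100 bytes of overhead per block, 256K average block size). The helper name overhead_percent is hypothetical.

```perl
use strict;
use warnings;

# Per-block metadata overhead as a percentage of the block size.
sub overhead_percent {
    my ( $overhead_bytes, $avg_block_size ) = @_;
    return 100 * $overhead_bytes / $avg_block_size;
}

# Approximate figures quoted in the text: 100 bytes per block, 256K blocks.
my $percent = overhead_percent( 100, 256 * 1024 );
printf "overhead: %.3f%%\n", $percent;    # prints "overhead: 0.038%"
```

With full 512K blocks the figure is halved.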
You should probably read the documentation for the various filestore command-line tools:
fdbcat, fdbget, fdbls, fdbmv, fdbput, fdbrm, fdbstat, and fdbtidy. fdbslurp (which is the opposite of fdbcat) was not completed.
You can also read the documentation at:
Copyright 2010 Josh Rabinowitz.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.