version 0.09 Directory::Transactional - ACID transactions on a set of files with journalling/recovery using flock or File::NFSLock
    use Directory::Transactional;

    my $d = Directory::Transactional->new( root => $path );

    $d->txn_do(sub {
        my $fh = $d->openw("path/to/file");
        $fh->print("I AR MODIFY");
        close $fh;
    });
This module provides lock based transactions over a set of files with full support for nested transactions.
There are a few limitations to what this module can do.
Following these guidelines will prevent unpleasant encounters:
No attempt is made to sanitize paths reaching outside of the root.
All paths are assumed to be relative and within the root.
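Because the module does not sanitize paths itself, a caller may want to check them before passing them in. The following is a minimal sketch of such a check; `is_safe_relative_path` is a hypothetical helper, not part of Directory::Transactional, and it only inspects the path lexically (it does not resolve symbolic links).

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: true only for relative paths that stay inside the
# root once "." and ".." segments are resolved lexically.
sub is_safe_relative_path {
    my ($path) = @_;
    return 0 unless defined $path;
    return 0 if File::Spec->file_name_is_absolute($path);

    my $depth = 0;
    for my $segment ( File::Spec->splitdir($path) ) {
        next if $segment eq '' or $segment eq File::Spec->curdir;
        if ( $segment eq File::Spec->updir ) {
            return 0 if --$depth < 0;    # would climb above the root
        }
        else {
            $depth++;
        }
    }
    return 1;
}
```

A caller could reject any path for which this returns false before handing it to the transactional object.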
Stick with plain files, with a link count of 1, or you will not get what you expect.
For instance, a rename will first copy the source file to the txn work dir, and then when committing rename that file to the target dir and unlink the original.
While seemingly more work, this is the only way to ensure that modifications to the file both before and after the rename are consistent.
Modifications to directories are likewise not supported, but support may be added in the future.
If you don't need transactions, use a global lock file and don't use this module.
If you do, then make sure even your read access goes through this object with an active transaction, or you may risk reading uncommitted data, or conflicting with the transaction commit code.
If you stick to modifying the files through the API then you shouldn't have issues with locking, but try not to reuse paths; always ask for them again to ensure that the right "real" path is returned even if the transaction stack, or anything else, has changed.
If you fork in the middle of the transaction, both the parent and the child have write locks, and both the parent and the child will try to commit or rollback when resources are being cleaned up.
Either create the Directory::Transactional instance within the child process, or use POSIX::_exit and do not open or close any transactions in the child.
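The effect of leaving a child with `POSIX::_exit` can be sketched with a stand-in object. `My::Guard` below is a hypothetical class that only logs from its destructor; it is not Directory::Transactional, and `run_child_safely` is an illustrative name.

```perl
use strict;
use warnings;
use POSIX ();

{
    # Stand-in for an object that commits or rolls back in its destructor.
    package My::Guard;
    sub new { my ( $class, $log ) = @_; return bless { log => $log }, $class }

    sub DESTROY {
        my ($self) = @_;
        open my $fh, '>>', $self->{log} or return;
        print {$fh} "cleanup in pid $$\n";
        close $fh;
    }
}

sub run_child_safely {
    my ($log) = @_;
    my $guard = My::Guard->new($log);

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: leave without running destructors, so the inherited
        # guard is never cleaned up in this process.
        POSIX::_exit(0);
    }

    waitpid( $pid, 0 );
    return $? >> 8;    # $guard is destroyed on return, in the parent only
}
```

With a plain `exit` in the child, the destructor would run in both processes, which is the double cleanup the caveat warns about.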
nfs mode is not compatible with flock mode. If you enable nfs, enable it in all processes working on the same directory.
Conversely, under flock mode global_lock is compatible with fine grained locking.
ACID stands for atomicity, consistency, isolation and durability.
Transactions are atomic (using locks), consistent (a recovery mode is able to restore the state of the directory if a process crashed while committing a transaction), isolated (each transaction works in its own temporary directory), and durable (once txn_commit returns a software crash will not cause the transaction to roll back).
This section describes the way the ACID guarantees are met:
When the object is being constructed a nonblocking attempt to get an exclusive lock on the global shared lock file using File::NFSLock or flock is made.
If this lock is successful, this means that this object is the only active instance, and no other instance can access the directory for now.
The work directory's state is inspected, any partially committed transactions are rolled back, and all work files are cleaned up, producing a consistent state.
At this point the exclusive lock is dropped, and a shared lock on the same file is taken, which will be retained for the lifetime of the object.
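The startup sequence described above can be sketched with plain flock calls. This is an illustration under assumptions, not the module's code: `startup_lock`, `$lock_file` and the `$recover` callback are hypothetical names, and only the flock variant is shown.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

sub startup_lock {
    my ( $lock_file, $recover ) = @_;

    open my $fh, '>>', $lock_file or die "open $lock_file: $!";

    # Nonblocking attempt at the exclusive lock: success means no other
    # instance holds even a shared lock, so recovery can run undisturbed.
    if ( flock( $fh, LOCK_EX | LOCK_NB ) ) {
        $recover->();
    }

    # Take the shared lock (replacing the exclusive one if we had it).
    flock( $fh, LOCK_SH ) or die "flock $lock_file: $!";

    return $fh;    # the shared lock lasts as long as this handle stays open
}
```

The caller keeps the returned handle for the lifetime of its object; closing it releases the shared lock.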
Each transaction (root or nested) gets its own work directory, which is an overlay of its parent.
All write operations are performed in the work directory, while read operations walk up the tree.
Aborting a transaction consists of simply removing its work directory.
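The overlay lookup can be pictured as a search through a list of directories. The sketch below assumes `@layers` is ordered from the innermost transaction's work directory out to the root; `resolve_for_read` and `resolve_for_write` are hypothetical names, and the sketch ignores files deleted inside a transaction, which a real implementation has to track.

```perl
use strict;
use warnings;
use File::Spec;

# Reads walk up the tree: the first layer that contains the file wins.
sub resolve_for_read {
    my ( $path, @layers ) = @_;
    for my $dir (@layers) {
        my $candidate = File::Spec->catfile( $dir, $path );
        return $candidate if -e $candidate;
    }
    return;    # not present in any layer
}

# Writes always go to the innermost work directory.
sub resolve_for_write {
    my ( $path, @layers ) = @_;
    return File::Spec->catfile( $layers[0], $path );
}
```

Under this model, aborting the innermost transaction amounts to deleting the first directory in `@layers`.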
Committing a nested transaction involves overwriting its parent's work directory with all the changes in the child transaction's work directory.
Committing a root transaction to the root directory involves moving aside every file from the root to a backup directory, then applying the changes in the work directory to the root, moving the backup directory into the work directory, and then cleaning up the work directory and the moved backup directory.
If at any point in the root transaction commit work is interrupted, the backup directory acts like a journal entry. Recovery will roll back this transaction by restoring all the renamed backup files. Moving the backup directory into the work directory signifies that the transaction has committed successfully, and recovery will clean these files up normally.
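A much simplified sketch of this journal idea follows. It assumes a flat directory and a transaction that only overwrites files already present in the root; created and deleted files need extra bookkeeping that is omitted. All names (`commit_root`, `recover`, the `committed` subdirectory, the `$crash_before_mark` switch used to simulate a crash) are hypothetical and do not reflect the module's actual layout.

```perl
use strict;
use warnings;
use File::Path qw(remove_tree);
use File::Spec;

sub files_in {    # sorted basenames of the plain files in a directory
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @names = sort grep { -f File::Spec->catfile( $dir, $_ ) } readdir $dh;
    closedir $dh;
    return @names;
}

sub move_all {    # rename every plain file from $from into $to
    my ( $from, $to ) = @_;
    for my $name ( files_in($from) ) {
        rename File::Spec->catfile( $from, $name ),
            File::Spec->catfile( $to, $name )
            or die "rename $name: $!";
    }
}

sub commit_root {
    my ( $root, $work, $backup, $crash_before_mark ) = @_;
    mkdir $backup or die "mkdir $backup: $!";

    # 1. Move aside the root files that are about to be replaced.
    for my $name ( files_in($work) ) {
        rename File::Spec->catfile( $root, $name ),
            File::Spec->catfile( $backup, $name )
            or die "rename $name: $!";
    }

    # 2. Apply the changes from the work directory.
    move_all( $work, $root );
    return 0 if $crash_before_mark;    # simulated crash

    # 3. Commit point: the backup moves into the work directory.
    rename $backup, File::Spec->catdir( $work, 'committed' )
        or die "rename $backup: $!";

    # 4. Clean up the work directory, including the moved backup.
    remove_tree($work);
    return 1;
}

# A backup directory still in place means the commit never finished,
# so the originals are restored.
sub recover {
    my ( $root, $work, $backup ) = @_;
    move_all( $backup, $root ) if -d $backup;
    for my $dir ( $backup, $work ) {
        remove_tree($dir) if -d $dir;
    }
}
```

The single rename in step 3 is what separates "roll back on recovery" from "clean up on recovery" in this sketch.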
If crash_detection is enabled (the default), then when reading any file from the root directory (shared global state) the system will first check for crashed commits.
Crashed commits are detected by means of lock files. If the backup directory is locked, that means its committing process is still alive, but if a directory exists without a lock then that process has crashed. A global dirty flag is maintained to avoid needing to check all the backup directories each time.
If the commit is still running then it can be assumed that the process committing it still has all of its exclusive locks, so reading from the root directory is safe.
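The liveness check can be sketched with a nonblocking flock. The naming of the lock file (`$backup_dir.lock`) and the function name `commit_has_crashed` are assumptions made for this example, and the global dirty flag is left out.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

sub commit_has_crashed {
    my ($backup_dir) = @_;
    return 0 unless -d $backup_dir;    # no commit in flight

    open my $fh, '>>', "$backup_dir.lock" or die "open: $!";

    # If the lock can be taken, no live process is holding it, so the
    # process that created the backup directory must have died.
    my $got_lock = flock( $fh, LOCK_EX | LOCK_NB );
    close $fh;                         # releases the probe lock again

    return $got_lock ? 1 : 0;
}
```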
This module does not implement deadlock detection. Unfortunately, maintaining a lock table is a delicate and difficult task, so I doubt I will ever implement it.
The good news is that certain operating systems (like HP-UX) may implement deadlock detection in the kernel, and return EDEADLK instead of just blocking forever.
If you are not so lucky, specify a timeout or make sure you always take locks in the same order.
The global_lock flag can also be used to prevent deadlocks entirely, at the cost of concurrency. This provides fully serializable level transaction isolation with no possibility of serialization failures due to deadlocks.
There is no pessimistic locking mode (read-modify-write optimized) since all paths leading to a file are locked for reading. This mode, if implemented, would be semantically identical to global_lock but far less efficient.
In the future fcntl based locking may be implemented in addition to flock. EDEADLK seems to be more widely supported when using fcntl.
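Where the kernel offers no deadlock detection, a time limit on lock acquisition is one way out. The following polling loop is only a sketch of that idea, not how the module implements its timeout option; `lock_with_timeout` is a hypothetical name and the timeout is taken to be in seconds for the purposes of the example.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Time::HiRes ();

# Tries to take the lock without blocking until the deadline passes.
# Returns 1 on success and 0 if the time limit ran out.
sub lock_with_timeout {
    my ( $fh, $mode, $timeout ) = @_;
    my $deadline = Time::HiRes::time() + $timeout;

    until ( flock( $fh, $mode | LOCK_NB ) ) {
        return 0 if Time::HiRes::time() >= $deadline;
        Time::HiRes::sleep(0.05);    # back off briefly before retrying
    }
    return 1;
}
```

A caller that gets 0 back can roll back its transaction and retry, which breaks the deadlock from its side.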
If you perform any operation outside of a transaction and auto_commit is enabled, a transaction will be created for you.
For operations like rename or readdir, which do not return a resource, the transaction is committed immediately.
Operations like open or file_stream, on the other hand, create a transaction that will be alive as long as the return value is alive.
This means that you should not leak filehandles when relying on autocommit.
Opening a new transaction when an automatic one is already opened is an error.
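Why a leaked filehandle matters can be shown with a toy model in which the returned resource owns a guard object. This illustrates the lifetime rule only and makes no claim about how the module tracks resources; `My::AutoTxn`, `open_with_auto_txn` and `@EVENTS` are hypothetical.

```perl
use strict;
use warnings;

our @EVENTS;    # records begin/commit for the illustration

{
    # Toy automatic transaction: begins on creation, commits on destruction.
    package My::AutoTxn;
    sub new     { my ($class) = @_; push @main::EVENTS, 'begin'; return bless {}, $class }
    sub DESTROY { push @main::EVENTS, 'commit' }
}

# Returns a resource that keeps its automatic transaction alive.
sub open_with_auto_txn {
    my $txn = My::AutoTxn->new;
    return { data => "file contents", txn => $txn };
}
```

As long as any reference to the returned resource survives, the toy transaction stays open; it commits only when the last reference goes away.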
Note that this resource tracking comes with an overhead, especially on Perl 5.8, so even if you are only performing read operations it is recommended that you operate within the scope of a real transaction.
One filehandle is required for every lock when using fine grained locking.
For large transactions it is recommended you set global_lock, which is like taking an exclusive lock on the root directory.
global_lock also performs better, but causes long wait times if multiple processes are accessing the same database but not the same data. For web applications global_lock should probably be off for better concurrency.
The root attribute is the managed directory in which transactional semantics will be maintained.
This can be either a string path or a Path::Class::Dir.
The work directory attribute is named with a leading underscore to prevent thoughtless modification (if two workers access the same directory simultaneously but with different work dirs, they will conflict and not even know it).
The default work directory is placed under root, and is named .txn_work_dir.
The work dir's parent must be writable, because a lock file needs to be created next to it (the work dir name with .lock appended).
If the nfs flag is true (it defaults to false), File::NFSLock will be used for all locks instead of flock.
Note that on my machine the stress test reliably FAILS with File::NFSLock, due to a race condition (exclusive write lock granted to two writers simultaneously), even on a local filesystem. If you specify the nfs flag, make sure your link system call is truly atomic.
If global_lock is true, then instead of using fine grained locking, a global write lock is obtained on the first call to txn_begin and will be kept for as long as there is a running transaction.
This is useful for avoiding deadlocks (there is no deadlock detection code in the fine grained locking).
This flag is automatically set if nfs is set.
If timeout is set, it will be used to specify a time limit for blocking calls to lock.
If you are experiencing deadlocks it is recommended to set this or global_lock.
If auto_commit is true (the default), any operation not performed within a transaction will cause a transaction to be automatically created and committed.
Transactions automatically created for operations which return things like filehandles will stay alive for as long as the returned resource does.
If crash_detection is true (the default), all read operations accessing global state (the root directory) will first ensure that the global directory is not dirty.
If the perl process crashes while committing the transaction but other concurrent processes are still alive, the directory is left in an inconsistent state, but all the locks are dropped. When crash_detection is enabled ACID semantics are still guaranteed, at the cost of locking and calling stat on a file for each read operation on the global directory.
If you disable this then you are only protected from system crashes (recovery will be run on the next instantiation of Directory::Transactional) or soft crashes where the crashing process has a chance to run all its destructors properly.
Executes $code within a transaction in an eval block.
If any error is thrown the transaction will be rolled back. Otherwise the transaction is committed.
%callbacks can contain entries for commit and rollback, which are called when the appropriate action is taken.
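A hedged usage sketch of the callbacks follows. It assumes the callbacks are passed as key/value pairs after the code reference, as the description suggests, and it uses a temporary directory as the root. The body only runs when Directory::Transactional is installed, and the second call is wrapped in eval in case the error is rethrown after the rollback.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my @events;

# Skip quietly if the module is not available on this system.
if ( eval { require Directory::Transactional; 1 } ) {
    my $d = Directory::Transactional->new( root => tempdir( CLEANUP => 1 ) );

    # Successful transaction: the commit callback is expected to fire.
    $d->txn_do(
        sub {
            my $fh = $d->openw("greeting.txt");
            $fh->print("hello\n");
            close $fh;
        },
        commit   => sub { push @events, "committed" },
        rollback => sub { push @events, "rolled back" },
    );

    # Failing transaction: the rollback callback is expected to fire.
    eval {
        $d->txn_do(
            sub { die "something went wrong\n" },
            commit   => sub { push @events, "committed" },
            rollback => sub { push @events, "rolled back" },
        );
    };
}
```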
Begin a new transaction. Can be called even if there is already a running transaction (nested transactions are supported).
Commit the current transaction. If it is a nested transaction, it will commit to the parent transaction's work directory.
Discard the current transaction, throwing away all changes since the last call to txn_begin.
Lock the resource at $path for writing or reading.
By default the ancestors of $path will be locked for reading too (from outermost to innermost).
The only way to unlock a resource is by committing the root transaction, or aborting the transaction in which the resource was locked.
$path does not have to be a real file in the root directory; it is possible to use symbolic names in order to avoid deadlocks.
Note that these methods are no-ops if global_lock is set.
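The ancestor ordering can be made concrete with a small path computation. This sketch only derives the order in which locks would be requested and takes no locks itself; `ancestors` and `lock_plan` are hypothetical helpers, and whether the root directory itself is also locked is not covered here.

```perl
use strict;
use warnings;

# Ancestors of a relative path, outermost first: "a/b/c" gives "a", "a/b".
sub ancestors {
    my ($path) = @_;
    my @parts = grep { length($_) } split m{/}, $path;
    pop @parts;    # drop the final component, which is the path itself

    my @ancestors;
    my $prefix = '';
    for my $part (@parts) {
        $prefix = length($prefix) ? "$prefix/$part" : $part;
        push @ancestors, $prefix;
    }
    return @ancestors;
}

# Read locks on the ancestors first, then the requested lock on the path.
sub lock_plan {
    my ( $path, $mode ) = @_;
    return ( ( map { [ $_, 'read' ] } ancestors($path) ), [ $path, $mode ] );
}
```

Because every process walks from the outermost directory inwards, two processes locking paths under the same directory request the shared ancestors in the same order.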
Open a file for reading, writing (clobbers) or appending, or with a custom mode for three arg open.
Using openw or openr is recommended if that's all you need, because it will not copy the file into the transaction work dir first.
Runs "stat" in File::stat on the physical path.
Runs CORE::stat on the physical path.
Whether a file exists or has been deleted in the current transaction.
Runs the -f file test on the right physical path.
Runs the -d file test on the right physical path.
Deletes the file in the current transaction.
Renames the file in the current transaction.
Note that while this is a real rename call in the txn work dir that is done on a copy, when committing to the top level directory the original will be unlinked and the new file from the txn work dir will be moved into place at the target path.
Hard links will NOT be retained.
Merges the overlays of all the transactions and returns unsorted basenames.
A path of "" can be used to list the root directory.
A DWIM version of readdir that returns paths relative to root, filters out . and .. and sorts the output.
Creates a Directory::Transactional::Stream for a recursive file listing.
The dir option can be used to specify a directory, defaulting to root.
These are documented so that they may provide insight into the inner workings of the module, but should not be considered part of the API.
Merges one directory over another.
Runs the directory state recovery code.
See "TRANSACTIONAL PROTOCOL"
Called to recover when the directory is already instantiated, by check_dirty if a dirty state was found.
Check for transactions that crashed in mid commit.
Called just before starting a commit.
Copies $path as necessary from a parent transaction or the root directory in order to facilitate local work.
Does not support hard or symbolic links (yet).