Name

File::Replace - Perl extension for replacing files by renaming a temp file over the original

Synopsis

This module provides three interfaces:

 use File::Replace 'replace2';
 
 my ($infh,$outfh) = replace2($filename);
 while (<$infh>) {
     # write whatever you like to $outfh here
     print $outfh "X: $_";
 }
 close $infh;   # closing both handles will
 close $outfh;  # trigger the replace

Or the more magical single filehandle, in which print, printf, and syswrite go to the output file; binmode to both; fileno only reports open/closed status; and the other I/O functions go to the input file:

 use File::Replace 'replace';
 
 my $fh = replace($filename);
 while (<$fh>) {
     # can read _and_ write from/to $fh
     print $fh "Y: $_";
 }
 close $fh;

Or the object-oriented interface:

 use File::Replace;
 
 my $repl = File::Replace->new($filename);
 my $infh = $repl->in_fh;
 while (<$infh>) {
     print {$repl->out_fh} "Z: $_";
 }
 $repl->finish;

Description

This module implements and hides the following pattern for you:

  1. Open a temporary file for output

  2. While reading from the original file, write output to the temporary file

  3. rename the temporary file over the original file

In many cases, in particular on many UNIX filesystems, the rename operation is atomic*. This means that in such cases, the original filename will always exist, and will always point to either the new or the old version of the file, so a user attempting to open and read the file will always be able to do so, and never see an unfinished version of the file while it is being written.

* Warning: Unfortunately, whether or not a rename will actually be atomic in your specific circumstances is not always an easy question to answer, as it depends on exact details of the operating system and file system. Consult your system's documentation and search the Internet for "atomic rename" for more details. This module's job is to perform the rename, and it can make no guarantees as to whether it will be atomic or not.
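For comparison, the three steps above can be written by hand using only core modules. The following is a minimal sketch of the general pattern, not this module's actual implementation; the helper name rewrite_file and its callback argument are hypothetical, and cleanup of the temporary file after an error is omitted for brevity.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp ();

# Hypothetical helper, not part of File::Replace: pass every line of
# $filename through $transform and rename the result over the original.
sub rewrite_file {
    my ($filename, $transform) = @_;
    open my $in, '<', $filename or die "Can't open $filename: $!";

    # 1. Open a temporary file in the same directory, so that the
    #    rename below does not have to cross filesystems.
    my $tmp = File::Temp->new(
        DIR => dirname($filename), SUFFIX => '.tmp', UNLINK => 0 );

    # 2. Read from the original, write to the temporary file.
    while ( my $line = <$in> ) {
        print {$tmp} $transform->($line) or die "Can't write: $!";
    }
    close $in;
    close $tmp or die "Can't close temporary file: $!";

    # 3. Rename the temporary file over the original.
    rename $tmp->filename, $filename
        or die "Can't rename to $filename: $!";
    return 1;
}
```

Note that this sketch does not copy the original file's permissions, so the result keeps whatever permissions File::Temp gave the temporary file.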

Version

This documentation describes version 0.06 of this module.

Constructors and Overview

The functions File::Replace->new(), replace(), and replace2() take exactly the same arguments, and differ only in their return values - replace and replace2 wrap the functionality of File::Replace inside tied filehandles. Note that replace() and replace2() are normal functions and not methods, so don't attempt to call them as methods. If you don't want to import them, you can always call them by their full names, for example File::Replace::replace().

 File::Replace->new( $filename );
 File::Replace->new( $filename, $layers );
 File::Replace->new( $filename, option => 'value', ... );
 File::Replace->new( $filename, $layers, option => 'value', ... );
 # replace(...) and replace2(...) take the same arguments

The constructors will open the input file and the temporary output file (the latter via File::Temp), and will die in case of errors. The options are described in "Options". It is strongly recommended that you enable warnings (use warnings;), since this module will then issue warnings which may be of interest to you.

File::Replace->new

 use File::Replace;
 my $replace_object = File::Replace->new($filename, ...);

Returns a new File::Replace object. The central methods provided are ->in_fh and ->out_fh, which return the input filehandle to read from and the output filehandle to write to, respectively, and ->finish, which causes the files to be closed and the replace operation to be performed. There is also ->cancel, which just discards the temporary output file without touching the input file. Additional helper methods are mentioned below.

finish will die on errors, while cancel will only return a false value on errors. This module will try to clean up after itself (remove temporary files) as best it can, even when things go wrong.

Please don't re-open the in_fh and out_fh handles, as this may lead to confusion.

The method ->is_open will return a false value if the replace operation has been finished or canceled, or a true value if it is still active. The method ->filename returns the filename passed to the constructor. The method ->options returns the options this object has set (including defaults): in list context as a list of key/value pairs, and in scalar context as a hashref.
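As an illustration of how finish, cancel, is_open, and filename fit together, here is a minimal sketch that only uses the methods described above. The helper name uppercase_file is hypothetical, and whether the object is still open after a failed finish is not specified here, which is why the sketch checks is_open before canceling.

```perl
use strict;
use warnings;
use File::Replace;

# Hypothetical helper: uppercase every line of $filename in place.
sub uppercase_file {
    my ($filename) = @_;
    my $repl = File::Replace->new($filename);
    my $ok = eval {
        my ($in, $out) = ($repl->in_fh, $repl->out_fh);
        while ( my $line = <$in> ) {
            print {$out} uc $line or die "write failed: $!";
        }
        $repl->finish;    # dies on errors
        1;
    };
    if ( !$ok ) {
        my $err = $@ || 'unknown error';
        # Discard the temporary file; the original stays untouched.
        $repl->cancel if $repl->is_open;
        die "Could not replace " . $repl->filename . ": $err";
    }
    return 1;
}
```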

replace

 use File::Replace 'replace';
 my $magic_handle = replace($filename, ...);

Returns a single, "magical" tied filehandle. The operations print, printf, and syswrite are passed through to the output filehandle, binmode operates on both the input and output handle, and fileno only reports -1 if the File::Replace object is still active or undef if the replace operation has finished or been canceled. All other I/O functions, such as <$handle>, readline, sysread, seek, tell, eof, etc. are passed through to the input handle. You can still access these operations on the output handle via e.g. eof( tied(*$handle)->out_fh ) or tied(*$handle)->out_fh->tell(). The replace operation (finish) is performed when you close the handle, which means that close may die instead of just returning a false value.

Re-opening the handle causes a new underlying File::Replace object to be created. You should explicitly close the filehandle first so that the previous replace operation is performed (or cancel that operation). The "mode" argument (or filename in the case of a two-argument open) may not contain a read/write indicator (<, >, etc.), only PerlIO layers.

You can access the underlying File::Replace object via tied(*$handle)->replace. You can also access the original, untied filehandles via tied(*$handle)->in_fh and tied(*$handle)->out_fh, but please don't close or re-open these handles as this may lead to confusion.

replace2

 use File::Replace 'replace2';
 # list context: input and output handle
 my ($input_handle, $output_handle) = replace2($filename, ...);
 # or, in scalar context: output handle only
 my $output_handle = replace2($filename, ...);

In list context, returns a list of two tied filehandles: the first is the input filehandle and the second is the output filehandle. The replace operation (finish) is performed when both handles have been closed. In scalar context, it returns only the output filehandle, and the replace operation is performed when this handle is closed. In both cases, this means that close may die instead of just returning a false value.
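The scalar-context form is convenient when the new contents do not depend on the old ones. A minimal sketch follows; the helper name write_whole_file is hypothetical, and whether readers really never see a partial file depends on the rename being atomic on your system.

```perl
use strict;
use warnings;
use File::Replace 'replace2';

# Hypothetical helper: replace the contents of $filename with $content.
sub write_whole_file {
    my ($filename, $content) = @_;
    my $out = replace2($filename);    # scalar context: output handle only
    print {$out} $content;
    close $out;                       # performs the replace; may die
    return 1;
}
```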

You cannot re-open these tied filehandles.

You can access the underlying File::Replace object via tied(*$handle)->replace on both the input and output handle. You can also access the original, untied filehandles via tied(*$handle)->in_fh and tied(*$handle)->out_fh, but please don't close or re-open these handles as this may lead to confusion.

Options

Filename

A filename. The temporary output file will be created in the same directory as this file. Its name will be based on the original filename, but prefixed with a dot (.) and suffixed with a random string and an extension of .tmp. If the input file does not exist (ENOENT), then the behavior will depend on the "create" option.

layers

This option can either be specified as the second argument to the constructors, or as the layers => '...' option in the options hash, but not both. It is a list of PerlIO layers such as ":utf8", ":raw:crlf", or ":encoding(UTF-16)". Note that the default layers differ based on operating system; see "open" in perlfunc.

create

This option configures the behavior of the module when the input file does not exist (ENOENT). There are three modes, which you specify as one of the following strings. If you need more precise control of the input file, see the "in_fh" option - note that create is ignored when you use that option.

"later" (default when create omitted)

Instead of the input file, /dev/null or its equivalent is opened. This means that while the output file is being written, the input file name will not exist, and it will only come into existence when the rename operation is performed.

"now"

If the input file does not exist, it is immediately created and opened. There is currently a potential race condition: if the file is created by another process before this module can create it, then the behavior is undefined - the file may be emptied of its contents, or you may be able to read its contents. This behavior may be fixed and specified in a future version. The race condition is discussed some more in "Concurrency and File Locking".

Currently, this option is implemented by opening the file with a mode of +>, meaning that it is created (clobbered) and opened in read-write mode. However, that should be considered an implementation detail that is subject to change. Do not attempt to take advantage of the read-write mode by writing to the input file - that contradicts the purpose of this module anyway. Instead, the input file will exist and remain empty until the replace operation.

"off" (or "no")

Attempting to open a nonexistent input file will cause the constructor to die.

The above values were introduced in version 0.06. Using any value other than those listed above will trigger a mandatory deprecation warning. For backwards compatibility, any other true value is treated as the equivalent of now, and any other false value as the equivalent of later. The deprecation warning will become a fatal error in a future version, to allow new values to be added in the future.

The devnull option has been deprecated as of version 0.06. Its functionality has been merged into the create option. If you use it, then the module will operate in a compatibility mode, but also issue a mandatory deprecation warning, informing you what create setting to use instead. The devnull option will be entirely removed in a future version.
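The difference between the "off" and "later" settings of create can be observed with a short script like the following sketch. The file name is hypothetical, and the script places it in a fresh temporary directory so that it is guaranteed not to exist beforehand.

```perl
use strict;
use warnings;
use File::Temp ();
use File::Replace;

my $dir      = File::Temp->newdir;          # scratch directory
my $filename = "$dir/example-output.txt";   # hypothetical; does not exist yet

# "off": a missing input file is an error, so the constructor dies.
my $strict = eval { File::Replace->new($filename, create => 'off') };
print "create => 'off' refused: $@" unless $strict;

# "later": the file name only appears once the replace is performed.
my $repl = File::Replace->new($filename, create => 'later');
print { $repl->out_fh } "first contents\n";
print "exists before finish: ", ( -e $filename ? "yes" : "no" ), "\n";
$repl->finish;
print "exists after finish: ", ( -e $filename ? "yes" : "no" ), "\n";
```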

in_fh

This option allows you to pass an existing input filehandle to this module, instead of having the constructors open the input file for you. Use this option if you need more precise control over how the input file is opened, e.g. if you want to use sysopen to open it. The handle must be open, which will be checked by calling fileno on the handle. The module makes no attempt to check that the filename you pass to the module matches the filehandle. The module will attempt to stat the handle to get its permissions, except when you have specified the "perms" option or disabled the "chmod" option. The "create" option is ignored when you use this option.
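A minimal sketch of passing a handle opened with sysopen is shown below. The helper name number_lines is hypothetical, and the sketch assumes that ->in_fh hands back the handle that was passed in.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use File::Replace;

# Hypothetical helper: prefix every line of $filename with its line
# number, opening the input ourselves instead of via the constructor.
sub number_lines {
    my ($filename) = @_;
    sysopen( my $in, $filename, O_RDONLY )
        or die "Can't sysopen $filename: $!";
    my $repl = File::Replace->new($filename, in_fh => $in);
    my ($infh, $outfh) = ($repl->in_fh, $repl->out_fh);
    my $n = 0;
    while ( my $line = <$infh> ) {
        $n++;
        print {$outfh} "$n: $line";
    }
    $repl->finish;
    return 1;
}
```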

perms

 perms => 0640       # ok
 perms => oct("640") # ok
 perms => "0640"     # WRONG!

Normally, just before the rename is performed, File::Replace will chmod the temporary file to those permissions that the original file had when it was opened, or, if the original file did not yet exist, default permissions based on the current umask. Setting this option to an octal value (a number, not a string!) will override those permissions. See also "chmod", which can be used to disable the chmod operation.
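The reason a string such as "0640" is wrong is that Perl converts it to the decimal number 640, not to the octal value 0640. The following sketch shows the difference using only core Perl; the helper mode_from_string is hypothetical and is meant for modes that arrive as strings, for example from a configuration file.

```perl
use strict;
use warnings;

my $literal = 0640;          # octal literal
my $parsed  = oct("640");    # string parsed as octal: same value
my $string  = "0640";        # a string; numifies as DECIMAL 640

printf "literal: %04o\n", $literal;       # prints "literal: 0640"
printf "parsed:  %04o\n", $parsed;        # prints "parsed:  0640"
printf "string:  %04o\n", $string + 0;    # prints "string:  1200"

# Hypothetical helper: turn a mode given as a string into a number
# that is suitable for the perms option.
sub mode_from_string {
    my ($str) = @_;
    die "Not an octal mode: $str" unless $str =~ /\A[0-7]{3,4}\z/;
    return oct($str);
}
```

With this helper, perms => mode_from_string("0640") would pass the same value as perms => 0640.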

chmod

This option is enabled by default, unless you set $File::Replace::DISABLE_CHMOD to a true value. When you disable this option, the chmod operation that is normally performed just before the rename will not be attempted. This is mostly intended for systems where you know the chmod will fail. See also "perms", which allows you to define what permissions will be used.

Note that the temporary files created with File::Temp will have 0600 permissions if left unchanged (except of course on systems that don't support this kind of restrictive permissions).

autocancel

If the File::Replace object is destroyed (e.g. when it goes out of scope), and the replace operation has not been performed yet, normally it will cancel the replace operation and issue a warning. Enabling this option makes that implicit canceling explicit, silencing the warning.

This option cannot be used together with autofinish.
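One situation where autocancel is handy is a function with early returns. The following is a minimal sketch; the helper name tidy_file and the "DO NOT EDIT" marker are hypothetical.

```perl
use strict;
use warnings;
use File::Replace;

# Hypothetical helper: strip trailing spaces and tabs from every line,
# but leave the file untouched if it contains a "DO NOT EDIT" marker.
sub tidy_file {
    my ($filename) = @_;
    my $repl = File::Replace->new($filename, autocancel => 1);
    my ($in, $out) = ($repl->in_fh, $repl->out_fh);
    while ( my $line = <$in> ) {
        # Early return: $repl goes out of scope here, and the replace
        # operation is canceled without a warning.
        return 0 if $line =~ /DO NOT EDIT/;
        $line =~ s/[ \t]+$//;
        print {$out} $line;
    }
    $repl->finish;
    return 1;
}
```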

autofinish

When set, causes the finish operation to be attempted when the object is destroyed (e.g. when it goes out of scope).

However, using this option is actually not recommended unless you know what you are doing. This is because the replace operation will also be attempted when your script is dying, in which case the output file may be incomplete, and you may not want the original file to be replaced. A second reason is that the replace operation may be attempted during global destruction, and it is not a good idea to rely on this always going well. In general it is better to finish the replace operation explicitly.

This option cannot be used together with autocancel.

debug

If set to a true value, this option enables some debug output for new, finish, and cancel. You may also set this to a filehandle, and debug output will be sent there.

Notes and Caveats

Concurrency and File Locking

This module is very well suited for situations where a file has one writer and one or more readers.

Among other things, this is reflected in the case of a nonexistent file, where the "create" settings now and later (the default) are currently implemented as a two-step process, meaning there is the potential of the input file being created in the short period of time between the first and second open attempts, which this module currently will not notice.

Having multiple writers is possible, but care must be taken to ensure proper coordination of the writers!

For example, a simple flock of the input file is not enough: if there are multiple processes, remember that each process will replace the original input file with a new and different file! One possible solution would be a separate lock file that does not change and is only used for flocking. There are other possible methods, but those are currently beyond the scope of this documentation.
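The separate lock file approach could be sketched as follows, using only core modules. The helper name with_lock and the lock file name ("$filename.lock") are assumptions of this sketch, and it only helps if every writer uses the same convention.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: run $code while holding an exclusive lock on a
# separate lock file. The lock file is never renamed or removed, so
# all cooperating writers lock the same file.
sub with_lock {
    my ($filename, $code) = @_;
    my $lockfile = "$filename.lock";
    open my $lock, '>>', $lockfile or die "Can't open $lockfile: $!";
    flock( $lock, LOCK_EX )        or die "Can't lock $lockfile: $!";
    # The lock is released when $lock goes out of scope and is closed,
    # which also happens if $code dies.
    return $code->();
}
```

Each writer would then perform its entire read, write, and finish (or close) sequence inside the code reference passed to with_lock.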

(For the sake of completeness, note that you cannot flock the tied handles, only the underlying filehandles.)

Author, Copyright, and License

Copyright (c) 2017 Hauke Daempfling (haukex@zero-g.net) at the Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Berlin, Germany, http://www.igb-berlin.de/

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.