The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Cache::Repository - Generic repository of files

SYNOPSIS

  my $rep = Cache::Repository->new(
      style => 'Filesys',
      # options for the F::R driver
    );
  $rep->add_files(tag => 'groupname',
                  files => \@filenames,
                  basedir => '/tmp',
                  move => 1,
                 );
  $rep->add_filehandle(tag => 'anothergroup',
                       filename => 'blah',
                       filehandle => $fh,
                       mode => 0755);
  $rep->set_meta(tag => 'groupname',
                 meta => {
                     title => 'blah',
                     author => 'foo',
                 });

  $rep->retrieve(tag => 'groupname', dest => '/newdir');
  my $data = $rep->get_meta(tag => 'groupname');

DESCRIPTION

This module is intended to serve as a repository for files, whether those files are local or remote. Different drivers can work independantly to provide differing backing stores. For example, one driver can use a locally-mounted filesystem (even if that is a network filesystem), another could use FTP or HTTP, another could use gmail, and another could use a relational database such as MySQL or DB2.

Drivers may choose to compress the repository, unless explicitly told otherwise.

Keeping this in mind, the API presented here cannot expose things that are not generic to other possible implementations. That said, some possible implementations may not allow adding ("sending" to a web server) - it is expected that they will either throw an exception, or take extra params for FTP'ing to the server.

FUNCTIONS

new

Cache::Repository constructor. The constructor will load the driver and return an object of the driver package. All other parameters will be passed to the driver for initialisation.

    my $r = Cache::Repository->new(
                                  style => 'Filesys',
                                  # ...
                                 );

It is up to the underlying driver to determine if the repository created by this is persistant for other processes (e.g., meta-data or even data stored in RAM wouldn't be persistant), or to handle locking issues should multiple processes be accessing the repository simultaneously.

Parameters:

style

This is the name of the driver. The driver is expected to be Cache::Compress::style, e.g., Cache::Compress::Filesys

(others)

As required by the underlying driver.

Suggested parameters for drivers:

clear

If true, clear the repository (if it exists) to start anew. Existing files and meta information will all be removed.

compress

If true, the driver should compress the files and/or meta information if it is able to, and if it is capable of doing so (drivers do not need to implement this.) True values may include:

Z or compress

Compress with the standard compress format.

gz or gzip

Compress with gzip-compatable format.

zip

Compress with InfoZip-compatable format.

bz or bzip2

Compress with bzip2-compatable format.

any other truth value

Compress with any format the driver wishes.

If the chosen compression format cannot be acheived, the driver may choose another format, or choose to not compress.

If false, the driver should not compress the files, even if it can.

If unset, the driver may compress or not, as the driver desires. Usually this is the best option for the user since usually whether the repository is compressed or not should not be important. Also, the format of the compression is unimportant.

Returns: The Cache::Repository-derived object, or undef if the driver failed to initialise.

Alternately, you can instantiate the driver directly, e.g.,

    my $r = Cache::Repository::Filesys->new(%options);
clear_tag

Clears a tag completely from the repository. This includes files and meta information.

Parameters:

tag

The tag to be cleared.

Adds a symlink to the repository. Note that on systems that do not understand symlinks, this may not actually work. Even if the storage allows it, retrieving a symlink may not do what is expected.

Parameters:

tag

Mandatory identifier for the group of files. If the tag already exists, any files will be added to the tag by default.

filename

Filename that is the symlink

target

The target that the symlink points at. The target need not actually exist - dangling symlinks should work fine.

add_files

Adds files to the repository.

Parameters:

tag

Mandatory identifier for the group of files. If the tag already exists, any files will be added to the tag by default.

files

This can be either a single filename, or an array ref of filenames to add. The filenames may include paths, but may not include the equivalent of File::Spec->updir in any file. This is largely to keep files from going out of the "current" directory and into parent or sibling directories.

basedir

Where to look for files listed in the files parameter. Default is the current working directory.

filename_conversion

This is a multi-pronged tool which is intended to allow the user to rename files on the way in to the repository. The default is to leave the filenames unchanged.

This option may be:

a single CODE ref

In this case, the code ref should modify $_ to become the new file name. Usually this will be something like:

    filename_conversion => sub { s!(path)/([^/.]*)!$2/$1! }
a single or array ref of filenames

This works just like the files option. If the list of files passed in to this parameter is not of the same length as the list of files, then an exception is thrown. If a given filename is undef, the filename is left unchanged. For example:

    files => qw(blah foo bar/baz),
    filename_conversion => (undef, qw(floo/foo bar/blah))

This will read from a file named blah, and put it into the repository without modifying the name. It will read foo from the current directory (or the basedir if specified) and put it into the repository in the floo directory. And it will read bar/baz and put it in the repository as bar/blah.

move

If set to true, will remove the file after placing it in the repository. Can also be used for optimisation for a filesystem repository on the same partition.

Returns: true if all files were added succesfully, false otherwise.

add_filehandle

Adds a file to the repository.

Parameters:

tag

Mandatory identifier for the group of files. If the tag already exists, any files will be added to the tag by default.

filehandle

You can pipe your data directly into the repository. This filehandle can be any perl-ish filehandle object: a GLOB, an IO::Handle (including an IO::String), or anything else that works like a file handle to be read from. Note that perl can open from a string reference in v5.8, so that is viable as well.

The filehandle will be read from, and the data written directly to the repository, and should be done in a loop such that the entire file need not be brought into memory. For example, during an FTP transfer, the filehandle will be read so that it can be put directly to the server.

The filename that is used is the filename parameter.

Note that only one filehandle can be added at a time.

filename

The filename for the filehandle. Again, this filename may include subdirectories, but cannot be an absolute path nor include the updir string.

mode

Attributes for the file. Normally these would be read directly from the input file, but cannot be read from a filehandle, so this will need to be provided.

owner

The UID for the owner of the file. Note that without root authority, this may fail. Default is the file's owner, or the current user if the source is a filehandle.

group

The GID for the owner of the file. Note that without root authority, this may fail. Default is the file's owner, or the current group if the source is a filehandle.

Returns: true if the repository was successfully added.

retrieve

Retrieves all the files associated with the given tag to the location specified.

Parameters:

tag

Required. The tag to retrieve.

basedir

The location to place the file(s). Note that any files that were placed into the repository with subdirectories will be placed in a subdirectory relative to this basedir.

files

The list of files to be retrieved. Defaults to all files. This parameter may be a simple scalar, or an array ref, e.g.,

    files => 'foo.txt'

or

    files => [ 'foo.txt' ]

are both the same.

Returns undef if the tag doesn't exist, or 0 for any other retrieval error.

retrieve_as_hash

Retrieves all the files associated with the given tag into memory. The hash (or hash-ref in scalar context) is returned. To use a specific hash, pass in a ref to it.

Keys to the hash are the filenames. The values are hashes with keys of: content (the file contents), mode (file mode), owner (UID for the file), and group (GID for the file) if the filename is a real file, and a key of target if the file is a symlink.

Parameters:

tag

Required. The tag to retrieve.

hash

If this parameter is specified, this hash ref will be used instead of creating a new hash ref. For example:

    my %files;
    my $ref = $rep->retrieve(tag => 'groupname', hash => \%files);
    # \%files == $ref
files

The list of files to be retrieved. Defaults to all files. This parameter may be a simple scalar, or an array ref, e.g.,

    files => 'foo.txt'

or

    files => [ 'foo.txt' ]

are both the same.

Returns undef if the retrieval failed.

retrieve_with_callback

Retrieves each file associated with the given tag by calling back to the specified function.

Parameters:

tag

Required. The tag to retrieve.

callback

This parameter specifies a code ref which will be called for each file. The code ref will be given the following parameters on each call. The code may be called more than once per file if the file is being retrieved in chunks.

filename

The name of the current file.

data

The contents of the file, or the current chunk of contents of the file. May be empty if the previous call happened to contain the end of the file.

owner
group
mode

The owner, group, and mode of the file.

start

True if this is the first call for this file.

end

True if this is the last call for this file. Note that start and end may both be set to true if data contains the entire file. Also note that data may be empty if the previous chunk turned out to be the end of the file.

target

The symlink target if the current file is a symlink. Note that if the storage supports this, but the current filesystem does no, it is up to the callback routine to figure out what to do.

error

If an error happened during retrieval, this will be the driver-defined error (string or number).

If the callback returns true, processing will continue, false will abort the rest of the retrieval.

files

The list of files to be retrieved. Defaults to all files. This parameter may be a simple scalar, or an array ref, e.g.,

    files => 'foo.txt'

or

    files => [ 'foo.txt' ]

are both the same.

Returns undef if the tag doesn't exist.

set_meta

Sets some meta-information for the files. For example, storing sizes, MD5s, or other information that your application needs about this group, other than the files themselves.

Parameters:

tag

Mandatory identifier for the group of files.

meta

A hash ref of meta information. This information will be added to the existing meta information. Key collisions will replace. This can be thought of as:

  %meta = (%old_meta, %new_meta);
reset

If this flag is true, the given meta information will replace existing meta information as in:

  %meta = %new_meta;

completely discarding the old meta information.

Returns: true if successful.

Default implementation stores meta information in memory - this is fine for single-process repositories that don't need to be persistant across invocations, but a generic persistant implementation cannot be written outside of the driver.

get_meta

Retrieves the meta information for a tag.

Parameters:

tag

Mandatory identifier for the group of files.

Returns: the meta information in hash-ref form.

get_size

Returns the total space requirements of the tag. Each file is rounded up to the next K before adding together. If a file list is given, only those files are counted.

Symlinks are counted as 1K.

Parameters:

tag

Mandatory identifier for the group of files.

files

Either a single file, or an array ref of files, which is the file or files that should be considered part of the size total.

list_files

Returns a list of the files that are currently stored for the given tag. Note that if compression is on, the filenames must be the uncompressed names. If compression is using an archive format (such as zip), this may be a slow operation unless the driver stores the file list externally to the archive, or each file is archived/compressed to a separate compressed archive.

Parameters:

tag

Mandatory identifier for the group of files.

Returns: An array in array context, array ref in scalar context.

list_tags

Returns a list of all in-use tags. Note that if there is more than one process running from the same repository that by the time you get to use the list, it may have changed (tags being added or removed).

Parameters: None.

Returns: An array in array context, array ref in scalar context. The order of tags returned is indetertiminate (may be in insert order, may be alphabetical, may be pseudo-random). If a sort order is desired, it is up to the caller to use meta-information on each tag on which to base a sort, and to call sort itself.

_is_filename_ok

Checks a filename to see if it is "safe". We are primarily concerned with filenames that go up the tree to above the current directory, whether that is absolute path or using the File::Spec->updir.

Returns true if the filename is ok. Primarily used by the drivers.

AUTHOR

Darin McBride - dmcbride@cpan.org

COPYRIGHT

Copyright 2006 Darin McBride.

You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.

BUGS

See TODO file.

SEE ALSO

http://www.perlmonks.org/index.pl?node_id=589209