NAME

File::CleanupTask - Delete/Backup files on a task-based configuration

VERSION

Version 0.07

SYNOPSIS

    use File::CleanupTask;

    my $cleanup = File::CleanupTask->new({
        conf => "/path/to/tasks_file.tasks",
        taskname => "TASK_LABEL_IN_TASKFILE",
    });

    $cleanup->run();

Once run() is called, the cleanup operation 'TASK_LABEL_IN_TASKFILE' specified in tasks_file.tasks is performed.

CONFIGURATION FORMAT

A .tasks file is a text file in which one or more cleanup tasks are specified. Each task has a label and a list of options specified as shown in the following example:

    [TASK_LABEL_IN_TASKFILE]
    path                    = '/home/savio/results/'
    backup_path             = '/home/savio/old_results/'
    backup_gzip             = 1
    max_days                = 3
    recursive               = 1
    prune_empty_directories = 1
    keep_if_linked_in       = '/home/savio/results/'

    [ANOTHER_LABEL]
    path = 'C:\\this\\is\\a\\windows\\path'
        ...

In this case, [TASK_LABEL_IN_TASKFILE] is the name of the cleanup task to be executed.

The following options can be specified under a task label:

path

The path to the directory containing the files to be deleted or backed up. Note that on MS Windows the backslashes in a path name must be escaped, and apostrophes are strictly required when specifying a path name (see the example above).

backup_path

If specified, files are moved into the specified directory instead of being deleted. If backup_path doesn't exist, it will be created. Symlinks are not backed up. The files are backed up at the top level of backup_path in a .gz (or .tgz, depending on backup_gzip) archive, which preserves the pathnames of the archived files.

backup_gzip

If set to "1", will gzip the files saved in backup_path. The resulting archive will preserve the pathname of the original file, and will be relative to 'path'.

For example, given the following configuration:

   [LABEL]
   path = /path/to/cleanup/
   backup_path = /path/to/backup/
   backup_gzip = 1

If /path/to/cleanup/my/target/file.txt is encountered, and it's old, it will be backed up in /path/to/backup/file.txt.gz. Uncompressing file.txt.gz using /path/to/backup as current working directory will result in:

   /path/to/backup/path/to/cleanup/my/target/file.txt

max_days

The maximum number of days for which files in the cleanup directory are kept. If a file is older than the specified number of days, it is queued for deletion.

For example, max_days = 3 will delete files older than 3 days from the cleanup directory.

max_days defaults to 0 if it isn't specified, meaning that all the files are to be deleted.
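The age test described above can be pictured as follows. This is a minimal sketch, not the module's actual implementation: the helper name is_older_than is hypothetical, and it is an assumption that age is measured from the file's last modification time.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::CleanupTask): returns 1 if the
# file's last modification lies more than $max_days days in the past.
sub is_older_than {
    my ($file, $max_days) = @_;
    my $mtime = (stat $file)[9];          # modification time, epoch seconds
    return 0 unless defined $mtime;       # stat failed (e.g. missing file)
    my $age_in_days = (time - $mtime) / 86_400;
    return $age_in_days > $max_days ? 1 : 0;
}
```

With max_days = 3, a file last modified five days ago would be reported as old by this helper, while one modified yesterday would not.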

recursive

If set to 0, only files directly within "path" can be deleted/backed up. If set to 1, files located at any level within "path" can be deleted/backed up.

prune_empty_directories

If set to 1, empty directories will be deleted regardless of their age.

keep_if_linked_in

The pathname of a directory that may contain symlinks. If specified, it prevents the deletion of files and directories within path that are symlinked from this directory, regardless of their age.

This option will be ignored in MS Windows or in other operating systems that don't support symlinks.
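One way to picture keep_if_linked_in is to collect the canonical targets of every symlink found in that directory and treat them as protected. The sketch below assumes exactly that; the function name linked_targets is hypothetical, and the module may resolve or compare paths differently.

```perl
use strict;
use warnings;
use Cwd qw(realpath);

# Hypothetical helper: returns a hashref whose keys are the canonical
# targets of the symlinks found directly inside $link_dir.
sub linked_targets {
    my ($link_dir) = @_;
    my %keep;
    opendir(my $dh, $link_dir) or die "Cannot open '$link_dir': $!";
    while (defined(my $entry = readdir $dh)) {
        my $candidate = "$link_dir/$entry";
        next unless -l $candidate;              # only symlinks matter here
        my $target = realpath($candidate);      # canonical path of the target
        $keep{$target} = 1 if defined $target;  # skip unresolvable links
    }
    closedir $dh;
    return \%keep;
}
```

A cleanup pass could then skip any file or directory whose canonical path appears as a key in the returned hash.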

do_not_delete

A regular expression that defines a pattern to look for. Any pathname matching this pattern will not be erased, regardless of its age. The regular expression applies to the full pathname of the file or directory.
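Because the expression is applied to the full pathname, it can protect a whole subtree as well as individual files. The following sketch illustrates the idea under that assumption; is_protected and the example pattern are hypothetical, and any modifiers the module applies to the expression are not shown.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the full pathname matches the
# do_not_delete regular expression, 0 otherwise.
sub is_protected {
    my ($path, $do_not_delete) = @_;
    return 0 unless defined $do_not_delete && length $do_not_delete;
    return $path =~ /$do_not_delete/ ? 1 : 0;
}

# Example pattern (an assumption): protect ".keep" files and anything
# located under an "archive" directory.
my $do_not_delete = '\.keep$|/archive/';
for my $path ('/home/savio/results/run1.log',
              '/home/savio/results/archive/run0.log',
              '/home/savio/results/.keep') {
    printf "%-40s %s\n", $path,
        is_protected($path, $do_not_delete) ? 'kept' : 'eligible';
}
```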

delete_all_or_nothing_in

If set to 1, each immediate subfolder of path will be deleted only if all the files in it are deleted.

pattern

If specified, will apply any potential delete or backup action to the files that match the pattern. Any other file will be left untouched.

If set to 1, the symlinks inside 'path' will be deleted only if their target is also going to be deleted. This option is disabled by default, which means that the targets of symlinks within the path are not examined during deletion/backup; the symlinks are simply treated as regular files.

This option will be ignored in MS Windows or in other operating systems that don't support symlinks.

METHODS

new

Create and configure a new File::CleanupTask object.

The object must be initialised as follows:

    my $cleanup = File::CleanupTask->new({
        conf => "/path/to/tasks_file.tasks",
        taskname => 'TASK_LABEL_IN_TASKFILE',
    });

command_line_run

Processes the arguments specified on the command line, creates a new File::CleanupTask object, and then calls run.

Options include dryrun, verbose, task and conf.

dryrun: just build and show the plan; nothing will be executed or deleted.

verbose: produce more verbose output.

task: optional; results in the execution of the specified task.

conf: the path to the .tasks configuration file.
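For illustration, the four options could be parsed with the core Getopt::Long module as sketched below. This is not the module's own parser: the function name parse_cleanup_args, the exact option specifications, and the decision to treat conf as mandatory are all assumptions.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical parser for the four options named above; the option
# specification used by the module itself may differ.
sub parse_cleanup_args {
    my @args = @_;
    my %opt = (dryrun => 0, verbose => 0);
    GetOptionsFromArray(
        \@args,
        'dryrun'  => \$opt{dryrun},     # boolean flag
        'verbose' => \$opt{verbose},    # boolean flag
        'task=s'  => \$opt{task},       # optional task label
        'conf=s'  => \$opt{conf},       # path to the .tasks file
    ) or die "Invalid command line options\n";
    die "Missing required option --conf\n" unless defined $opt{conf};
    return \%opt;
}
```

A script could call parse_cleanup_args(@ARGV) and pass the resulting conf and task values on to the constructor shown under new.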

run

Performs the cleanup.

run_one_task

Run a single cleanup task given its configuration and name. The name is used as a label for possible output and is an optional parameter of this method.

This will scan all files and directories in path in a depth-first fashion. When a file or directory is encountered, a target action is performed based on its state (file or directory, symlinked, old, empty directory...).
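A depth-first scan of this kind can be sketched with the core File::Find module. The code below only classifies what it visits and takes no action; classify_tree is a hypothetical name, and the module's own traversal and its set of states are likely richer than this.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical walker: visits everything below $root depth first and
# records each entry as a symlink, a directory or a plain file.
sub classify_tree {
    my ($root) = @_;
    my @seen;
    find({
        no_chdir => 1,                          # keep full pathnames in $_
        wanted   => sub {
            my $path = $File::Find::name;
            return if $path eq $root;           # skip the root itself
            my $kind = -l $path ? 'symlink'
                     : -d $path ? 'directory'
                     :            'file';
            push @seen, [ $path, $kind ];
        },
    }, $root);
    return \@seen;                              # [ [ path, kind ], ... ]
}
```

Each returned pair could then be combined with checks such as age or emptiness to decide on an action for that path.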

verbose, dryrun

Accessors that will tell you if running in dryrun or verbose mode.

_build_delete_once_empty

Builds a delete_once_empty list of pathnames, each of which should be deleted only if all its files are also deleted.

_build_never_delete

Builds a never_delete list of pathnames that shouldn't be deleted under any condition.

_never_delete_add_path

Adds a path to the given never_delete list.

_delete_once_empty_contains

Checks if the given path is contained in the delete_once_empty list.

_delete_once_empty_add_path

Adds a path to the given delete_once_empty list.

_never_delete_contains

Checks if the given path is contained in the never_delete list.

_path_check

Checks the given path and returns its absolute representation.

_build_plan

Plans the actions to be executed on the files in the target path according to:

 - options in the configuration
 - the target files
 - the never_delete

All files in the never_delete list can't be deleted.

_plan_add_actions

Given a path to a file and the task configuration options, augment the plan with actions to take on that file.

Returns an array reference containing the actions added to the plan for that file.

These actions are meant to be performed in reverse sequence on the given file. An empty array reference is returned if no action is to be performed on the given file.

A returned action can be one of: delete, backup.

Resulting actions are decided according to one or more of the following:

 - options in the configuration
 - the target files
 - the never_delete

This method works under the assumption that the specified file or directory exists and the user has full permissions on it.

_plan_add_action

Adds the given action to the plan.

_is_folder_empty

Returns 1 if the given folder is empty.
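An emptiness check of this kind can be written with opendir and readdir, as in the following sketch. The name folder_is_empty is hypothetical, and returning 0 for a non-empty folder and dying on an unreadable one are assumptions; the text above only states the return value for the empty case.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an emptiness check: returns 1 if $dir contains
# no entries other than '.' and '..', and 0 otherwise.
sub folder_is_empty {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    while (defined(my $entry = readdir $dh)) {
        next if $entry eq '.' || $entry eq '..';
        closedir $dh;
        return 0;                    # found a real entry: not empty
    }
    closedir $dh;
    return 1;
}
```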

_execute_plan

Executes a plan based on the given task options. The blacklist is passed to make sure once again that no unwanted files or directories are deleted.

_refine_plan

Takes into account symlinks in the current plan.

The refinement is done in the following way:

1) Go through the plan, and look for symlink targets.

2) Mark each symlink with the action of its target, if the target is in the cleanup directory: keep the symlink if its target is kept, delete it otherwise (broken symlink, symlink pointing outside the cleanup directory, target being backed up...). While deciding this, build a hashref of { symlink_parent (canonical) => symlink_path (non_canonical) }.

3) Add the symlink to the plan in the correct position. To do this, build another 'refined' plan:

 - go through the pathnames in the plan (parents are visited first), popping each item.
 - if the parent of a marked symlink is found, do the following:
    * mark it as 'delete' if the symlink is going to be deleted, or as 'nothing' if the symlink is not going to be deleted.
    * push the parent onto the refined plan.
    * push the symlink onto the refined plan.

4) Fix the plan so that it has a consistent state (bubble up states between pairs of directories).

Return the refined plan.

Get the parent path of a given path. This method only accesses the disk if the f_path is found to have no parent directory (i.e., just the relative file name has been specified). In this case, we check that the current working directory contains the given file. If yes, we return the current working directory as the parent of the specified file. If not, we return undef.
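The lookup described above can be sketched with File::Basename and Cwd. The name parent_of is hypothetical, and treating a dirname of "." as the "no parent directory" case is an assumption about how the check is made.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper following the description above.
sub parent_of {
    my ($f_path) = @_;
    my $dir = dirname($f_path);
    # A directory part is present: answer without touching the disk.
    return $dir unless $dir eq '.';
    # Bare file name: the parent is the current working directory,
    # but only if the file really is there.
    my $cwd       = getcwd();
    my $candidate = File::Spec->catfile($cwd, $f_path);
    return -e $candidate ? $cwd : undef;
}
```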

Given a path to a symlink and a hash reference, keep the symlink target as a key of the hash reference (canonical path), and the path to the symlink (non canonical) as the corresponding value. Because multiple symlinks can point to the same target, the value stored under each key is an arrayref of symlink paths.

Returns true on success, or false if a path to something other than a symlink is passed to this method.
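The data structure described above, a hash from canonical target to an arrayref of symlink paths, can be built as in this sketch. The name remember_symlink is hypothetical, and returning false for a symlink whose target cannot be resolved is an assumption not stated in the text.

```perl
use strict;
use warnings;
use Cwd qw(realpath);

# Hypothetical helper: records $symlink under its canonical target in
# the hashref $targets. Returns 1 on success, 0 otherwise.
sub remember_symlink {
    my ($symlink, $targets) = @_;
    return 0 unless -l $symlink;            # not a symlink: refuse
    my $canonical = realpath($symlink);     # canonical path of the target
    return 0 unless defined $canonical;     # target could not be resolved
    push @{ $targets->{$canonical} }, $symlink;
    return 1;
}
```

After two symlinks to the same file have been recorded, the hash holds a single key whose value lists both symlink paths.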

_fix_pattern

Refine a pattern passed from the configuration.

Currently applies the following transformation:

 - Remove any "/" in case the user has specified a pattern in the form of /pattern/.
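A transformation of this kind might look like the sketch below. The name strip_pattern_slashes is hypothetical, and it is an assumption that only a matching pair of leading and trailing slashes is removed, so that a pattern such as /abs/path is left alone.

```perl
use strict;
use warnings;

# Hypothetical equivalent of the transformation described above: if the
# pattern is written as /pattern/, drop the enclosing slashes.
sub strip_pattern_slashes {
    my ($pattern) = @_;
    return $pattern unless defined $pattern;
    $pattern =~ s{\A/(.*)/\z}{$1}s;     # only when both slashes are present
    return $pattern;
}
```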

AUTHOR

Savio Dimatteo, <savio at lokku.com>

BUGS

Please report any bugs or feature requests to bug-file-cleanuptask at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=File-CleanupTask. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc File::CleanupTask

ACKNOWLEDGEMENTS

Thanks Alex for devising the original format of a .tasks file and offering me the opportunity to publish this work on CPAN.

Thanks Mike for your feedback about canonical paths detection.

Thanks David for reviewing the code.

Thanks #london.pm for helping me choose the name of this module.

LICENSE AND COPYRIGHT

Copyright 2012 Savio Dimatteo.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.