NAME

    HPCI::Stage;

SYNOPSIS

Role for building a cluster-specific driver for a stage. A stage is the object that describes a single job to be executed within the context of a group. It will be a member of a group object, which will be responsible for ensuring that the stage is executed after any prerequisite stages, and before any dependent stages. Most of the activity of a stage will be cluster-specific.

This role defines the generic, cluster-agnostic interface for a stage, hiding the details of the specific cluster implementation. A driver which consumes this role must provide methods for this interface, translating it into the implementation that best fits the type of cluster that it supports. The driver can also provide methods for accessing facilities of its cluster type that don't fit into the generic interface (but, of course, any program that uses these methods will be less able to move to a different type of cluster).

An (internally defined) cluster stage is defined with:

    package HPCD::$cluster::Stage;
    use Moose;

    # define the required attributes/methods here

    with 'HPCI::Stage';

    # define the additional attributes/methods here

DESCRIPTION

This role provides the generic interface for a stage object, which can be used to control a single stage. A stage is the unit of job execution that a group can schedule and assign to a node within a cluster.

A cluster-specific stage object definition can consume this role to ensure that it provides the interface needed by an HPCI program using that type of cluster to run a group of stages.

New objects of the cluster-specific type are not created by user code. Instead, the user works through an HPCI group object, which ensures that any stage objects included in the group are of a form compatible with the cluster-specific group object, and which registers each stage so that it can be managed.

In fact, user code will generally not need to access the stage object itself at all. References to it will usually be made by passing its name to methods of the group object, rather than handled directly.

ATTRIBUTES

cluster (internally provided)

The type of cluster that will be used to execute the group of stages. This value is passed on by the $group->stage method when it creates a new stage. Since it also uses that value to select the type of stage object that is created, it is somewhat redundant.

name

The name of this stage within the group. The name must be unique within the group; most group interactions use the name rather than a reference to the stage object to refer to the stage.

command

The command to be executed on the cluster. This command will quite possibly be wrapped in a cluster-specific manner to pass the command and environment info to the target cluster node.

The command can either be provided as an explicit parameter when the stage is created, or by using one of the set_*_cmd methods described below after the stage object is created.

This is the only time that you need the stage object itself rather than just referring to it by its name. However, even here, unless you need to separate the creation of the stage object from the point where you specify the command, you can do something like:

    my $group = HPCI->group( ...);

    ...

    $group->stage( name => 'thisstage', ... )
        ->set_perl_cmd(
            $program,
            f    => undef,          # a       flag argument:  -f
            flag => undef,          # another flag argument:  --flag
            v    => $value,         # a       value argument: -v $value
            val  => $value,         # another value argument: --val $value
            l    => [ qw(a b c) ],  # a       list argument:  -l a -l b -l c
            list => [ qw(a b c) ],  # another list argument:  --list a --list b --list c
            '--' => [ qw(f1 f2) ],  # non-keyed argument(s):  f1 f2
        );
    # set the command to: perl $program -f --flag -v $value --val $value \
    #    -l a -l b -l c --list a --list b --list c f1 f2

As an alternative to the command attribute, you can specify the code attribute described below. One of the two must be provided before the group tries to execute this stage, and only one of them can be provided (not both).

code

The code attribute is an alternative to the command attribute described above. Its value is a code reference, which will be called directly to carry out the activity of the stage. Use this for a stage activity simple enough to code as a Perl routine, where the overhead of creating a new job on a cluster node is not justified. You may not specify both code and command for the same stage, but you must specify one of them before the stage is ready to execute.

The code in the reference is called with no arguments when the stage is ready to be executed. The return value is treated as a boolean to indicate whether the stage succeeded: a false value for success, or a true value for failure (with the value usually being a text message describing the cause of the failure). This mimics the exit_status that is returned from a separately executed program (except for being able to provide a more easily understood failure code than a simple integer) - however, it does mean that some common purposes for this attribute will need some wrapping code. For example, any code that is just making a system call (e.g. unlink, link, rename) will need to invert the boolean result, and expand the failure result to provide the errno value, such as:

    stage(
        ...
        code => sub {
             unlink($tmpfile) ? 0 : "$!"
        },
        ...
    )
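
The inversion described above can be factored into a small wrapper. The following is a minimal sketch and not part of HPCI: the helper name stage_code_for is hypothetical, and it assumes the wrapped operation sets $! on failure, as Perl's built-in file operations do.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not an HPCI function): wrap a Perl-style call that
# returns true on success so that it follows the stage convention of
# false on success and a message on failure.
sub stage_code_for {
    my ($op) = @_;
    return sub { $op->() ? 0 : ( "$!" || 'unknown error' ) };
}

# Stand-alone demonstration with a throwaway file.
my ( $fh, $tmpfile ) = tempfile();
close $fh;

my $remove = stage_code_for( sub { unlink $tmpfile } );
my $first  = $remove->();    # file exists, so unlink succeeds
my $second = $remove->();    # file is already gone, so unlink fails

print 'first:  ', ( $first  ? "failed: $first"  : 'ok' ), "\n";
print 'second: ', ( $second ? "failed: $second" : 'ok' ), "\n";
```

The returned closure could then be passed as the code attribute of a stage.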

resources_required

A hash that maps resource names to the amount of that resource that is needed.

TODO: Describe the generic resource names, the value types that can be used, and the non-default cluster-specific names that can be used instead.

retry_resources_required

A hash that maps resource names to a list of amounts of that resource that might be required.

If the cluster-specific interface detects that a run fails because not enough of a particular resource was requested, then the next larger amount in the retry_resources_required list for that resource type is used and the stage is retried.

At present, this is only used for SGE clusters, and only for the mem (or SGE-specific h_vmem) resource. SGE provides a default value of:

    retry_resources_required => { h_vmem => [qw(2G 4G 8G 16G 32G)] }

When you are first running a job and don't know how much memory it will require, you can specify a low value. When it completes, you can check the log to see how much memory it ended up needing, and use that as the starting parameter for future runs. That initial run will possibly have had to run part way through, fail, and then retry a number of times. The alternative, providing a large memory allocation the first time, takes resources from your cluster, especially if you never get around to changing the initial setting. (People are more likely to remember to make such a change when it saves them time by avoiding retries than when it merely reserves more space than is actually used.)
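
The escalation through the list can be pictured with a short sketch. This is an illustration under stated assumptions, not the SGE driver's code: to_bytes and next_amount are hypothetical helpers, and the size suffixes are assumed to be binary multiples.

```perl
use strict;
use warnings;

# Convert a size such as '512M' or '4G' to bytes (binary multiples assumed).
sub to_bytes {
    my ($size) = @_;
    my %mult = ( K => 1024, M => 1024**2, G => 1024**3, T => 1024**4 );
    my ( $num, $unit ) = $size =~ /^(\d+(?:\.\d+)?)([KMGT]?)$/i
        or die "unrecognised size '$size'\n";
    return $num * ( $unit ? $mult{ uc $unit } : 1 );
}

# Hypothetical helper: pick the first list entry larger than the amount
# that just failed; returns undef when the list is exhausted.
sub next_amount {
    my ( $failed, @ladder ) = @_;
    for my $candidate (@ladder) {
        return $candidate if to_bytes($candidate) > to_bytes($failed);
    }
    return undef;
}

# Simulate a stage that starts at 1G and runs out of memory every time.
my @ladder = qw(2G 4G 8G 16G 32G);
my $amount = '1G';
my @tried  = ($amount);
while ( defined( $amount = next_amount( $amount, @ladder ) ) ) {
    push @tried, $amount;
}
print "amounts tried if every run fails: @tried\n";
```

Starting low costs a few failed partial runs; once the log shows the real requirement, that value can become the initial request.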

native_args_string

The native program mechanism used by the driver to submit a stage for execution can have many arguments. Some of those will be provided automatically from the generic HPCI attributes, but there may be extra capabilities that do not match the standard HPCI definition.

Using these extra capabilities can lead to non-portable programs, unless you are careful. HPCI provides two ways to be careful but also allows you to be careless.

If you (carelessly) provide this attribute directly, then it will be used for any type of cluster that your program is run on - using such cluster-specific parameters can have dangerous or obscure effects when applied to a different cluster type.

You can be careful, by providing this attribute only within an attribute set included in the cluster_specific attribute. The alternate way to be careful is to provide this attribute indirectly, using the cluster-specific name which the driver specifies as the native_args_string_name attribute.

The value provided to native_args_string will be passed to the native submission mechanism whenever that makes sense. A driver that uses a submission mechanism that does not provide for any non-standard HPCI capabilities may choose to ignore this attribute.

So, for example, if you wished to provide extra args for the qsub command when running on an SGE cluster you could code that as either:

    $group->stage(
        name => 'stage_name',
        ...
        cluster_specific => {
            SGE => { native_args_string => '-q myqueue -m beas' },
            ... # put args strings for other cluster types here
        },
        ...
    );

or as:

    $group->stage(
        name => 'stage_name',
        ...
        extra_sge_args_string => '-q myqueue -m beas',
        ... # put args strings for other cluster types here
            # using the appropriate cluster-specific attribute name
        ...
    );
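
One way to picture the first form is that attributes given under cluster_specific only take effect for the matching cluster type. The following is a minimal sketch of that selection and not HPCI's implementation: effective_attributes is a hypothetical helper, the Slurm entry and the 'other' cluster name are illustrative, and letting a cluster-specific value override a generic one is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten the stage arguments for one cluster type by
# merging in only the attribute set that matches that cluster.
sub effective_attributes {
    my ( $cluster, %attrs ) = @_;
    my $specific = delete $attrs{cluster_specific};
    $specific ||= {};
    return ( %attrs, %{ $specific->{$cluster} || {} } );
}

my %stage_args = (
    name             => 'stage_name',
    cluster_specific => {
        SGE   => { native_args_string => '-q myqueue -m beas' },
        Slurm => { native_args_string => '--partition=myqueue' },
    },
);

for my $cluster (qw(SGE Slurm other)) {
    my %attrs = effective_attributes( $cluster, %stage_args );
    printf "%-5s => %s\n", $cluster, $attrs{native_args_string} // '(none)';
}
```

A cluster type with no matching entry simply receives no native arguments, which is what keeps this form portable.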

native_args_string_name

This attribute is one way in which a driver can help you to use native arguments in a portable way. Drivers that actually use the native_args_string attribute will specify a value for this attribute internally - it is never provided directly as a Stage attribute. If this attribute is provided by the driver, then only the value of the attribute it names will be used for the source text of native_args_string.

_native_args_string_parsing_info

This internal attribute is provided by the driver. It specifies how the native_args_string is parsed, controlling how it is separated into parameters and values, which parameters are normally provided by alternate HPCI mechanisms, how the parameters and values are displayed in the log (if it is different from how they are inserted into the command line for execution), etc.

Only people writing drivers need to worry about this attribute.

native_args* support not implemented yet

The attributes related to native_args_string processing are not yet ready for use; they are being implemented by copying and generalizing the code from the SGE and Slurm drivers, and that work is still in progress.

group

The group that this stage belongs to is provided automatically when the stage is created. You don't need to initialize it, and since you usually will not be working with Stage objects directly, you won't have much need to use it either.

storage_class (internal)

The name of the key used to select a storage class from the storage_classes attribute for files that do not have an explicit class given.

Defaults to the value of the group's storage_class attribute (which defaults to 'default'). This value is provided automatically by the group's stage method if no explicit storage_class value is provided for the stage.

files

A hash that can contain lists of files.

Throughout this hash, there are filenames contained within hash elements that describe the processing required for that file. Whenever a filename is needed, it can be a string containing a pathname, an HPCI::File object (or subclass), or a HashRef. Often, it will be the string form, which will be converted to an object internally.

The top level of the hash has keys 'in' (for input), 'out' (for output), 'skipstage', 'rename', and 'delete'. (The same file might be listed under multiple keys.)

The values for these keys are:

'in'

a hashref with possible keys:

    'req' (for required input files)
    'opt' (for optional input files)

The value for either of these can be either a filename or a list of filenames.

'out'

a hashref with possible keys:

    'req' (for required output files)
    'opt' (for optional output files)

The value for either of these can be either a filename or a list of filenames.

'skipstage'

either:

  • an arrayref

  • a hashref with the keys 'pre' (for pre-requisites) and 'lists'

The arrayref (either the arrayref value of 'skipstage' or the arrayref value for the 'lists' hash element) can contain either a list of files, or a hashref with keys 'pre' and 'files'.

The 'pre' value (if present) at the top level is a list of files which are pre-requisites for all of the lists. If a list has its own 'pre' list, those files are only pre-requisites for the files in that list.

'rename'
    a list of pairs of filenames

The file named as the first element in each pair (if it exists) is renamed to the second filename in the pair. It is not considered an error if the first file in a pair does not exist - if you want to ensure that a file exists, include it as an 'out'->'req' file as well.

'delete'

can be either:

    a scalar filename
    a list of filenames

These will be removed if the stage completes successfully. It is not considered an error if any of these files does not exist - include them in the 'out'->'req' files list if you wish to ensure that they exist.

The contents are used at various times:

the stage is ready to be executed
  • if a 'skipstage' key is present then checking is done to decide whether the stage needs to be executed or can be skipped (treating it as a successful completion)

    the main content of this key is a list of lists of filenames (the target files) - if any of these lists has all of its files existing, then the stage can be skipped

    if there is a top level and/or a list level 'pre' list, then all of the files in the pre list(s) must also exist and be older than the target files (the files in the top level 'pre' list are checked against all of the target lists, the files in a target level 'pre' list are only checked against that target).

    skipstage checking is always done by the parent process, in hopes of avoiding the need to create the stage.

  • all 'in'->'req' files must exist; if any is missing, the stage is aborted. If the files exist, then the child stage will be set up (if needed) to download those files from the long-term storage.

  • all 'in'->'opt' files are checked by the parent. If any exist, then the child stage will be set up (if needed) to download them from the long-term storage.

the stage has completed execution
  • all 'out'->'req' files must exist and they must have been updated during the execution of the stage (otherwise the stage is treated as failing)

  • any 'out'->'opt' files which exist must have been updated during the execution of the stage (otherwise the stage is treated as failing)

  • clusters that require special treatment of files can take copying actions to collect any 'out' files that have been updated

  • if the stage completed successfully, any files listed as 'delete' are removed
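
The 'skipstage' rule described in the list above can be sketched as follows. This is a simplified illustration, not HPCI's code: can_skip and list_is_current are hypothetical helpers, each target list is assumed to be already normalized to a hashref with 'files' and an optional 'pre', and a prerequisite with the same modification time as a target is treated as not older.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# True if all targets exist and every prerequisite exists and is strictly
# older (by modification time) than every target.
sub list_is_current {
    my ( $pre, $targets ) = @_;
    return 0 unless @$targets;
    return 0 if grep { !-e $_ } @$pre, @$targets;
    for my $target (@$targets) {
        my $target_mtime = ( stat $target )[9];
        for my $prereq (@$pre) {
            return 0 if ( stat $prereq )[9] >= $target_mtime;
        }
    }
    return 1;
}

# Hypothetical helper: the stage can be skipped if any one target list is
# current with respect to the top-level 'pre' files plus its own.
sub can_skip {
    my ( $top_pre, @lists ) = @_;
    for my $list (@lists) {
        my @pre = ( @$top_pre, @{ $list->{pre} || [] } );
        return 1 if list_is_current( \@pre, $list->{files} );
    }
    return 0;
}

# Demonstration with throwaway files and explicit modification times.
my $dir = tempdir( CLEANUP => 1 );

sub touch {
    my ( $name, $mtime ) = @_;
    my $path = "$dir/$name";
    open my $fh, '>', $path or die "cannot create $path: $!";
    close $fh;
    utime $mtime, $mtime, $path or die "cannot set time on $path: $!";
    return $path;
}

my $input  = touch( 'input.txt',  1_000_000 );
my $output = touch( 'output.txt', 2_000_000 );
my $skip_before = can_skip( [$input], { files => [$output] } );

touch( 'input.txt', 3_000_000 );    # input changed after output was built
my $skip_after = can_skip( [$input], { files => [$output] } );

print "skip with fresh output: $skip_before\n";
print "skip with stale output: $skip_after\n";
```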

state, is_ready, is_blocked, is_pass, is_fail

These states are mostly for internal use during execution. (There are actually more states than are listed here but they are only meaningful during execution when the calling program doesn't have access to the state, and they are subject to change without notice.)

However, before execute is called, a stage can be either ready or blocked (depending upon whether any other stage has been noted as a pre-requisite).

After execution completes, a stage will be in either pass or fail state (and the is_complete attribute will always be true). This is probably the only significant setting to check from user code.

The state attribute is a string value, but it can be tested using the is_STATE methods, each of which returns true if the stage is in the specified state.
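
The relationship between the state string and the is_STATE methods can be illustrated with a small stand-alone class. This is only a sketch: Demo::Stage is a hypothetical stand-in rather than an HPCI class, and HPCI's own implementation may differ.

```perl
use strict;
use warnings;

package Demo::Stage;    # hypothetical stand-in, not an HPCI class

sub new {
    my ( $class, %args ) = @_;
    return bless { state => $args{state} // 'ready' }, $class;
}

sub state { return $_[0]{state} }

# Generate is_ready, is_blocked, is_pass and is_fail from the state names;
# each predicate is true only when the state string matches its name.
for my $name (qw(ready blocked pass fail)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::is_$name" } = sub { $_[0]{state} eq $name };
}

package main;

my $stage = Demo::Stage->new( state => 'pass' );
print 'state: ', $stage->state, "\n";
print "stage passed\n" if $stage->is_pass;
print "stage failed\n" if $stage->is_fail;
```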

verify_completion_state

If the user wishes to verify whether a stage completed successfully, failed, or should be retried, this attribute can be given a coderef to code that checks the result.

The code will be called with the arguments:

  $coderef->( $stats, $stdout, $stderr, $state )
    $stats  is a hashref containing the accounting info and status of the job
    $stdout is the pathname of the stdout file
    $stderr is the pathname of the stderr file
    $state  is the state determined by HPCI
                   (only pass/fail - retry has not been decided yet)

The code can use these (and knowledge provided by the user about the stage operation) to test whether the stage succeeded.

To select a specific state, the return value from the code should be a string, one of:

    pass
    fail
    retry

If your function does not wish to select a state, it can return undef, and the same state that would have been chosen without the call to this function will be used unchanged.

If the value returned is 'fail', other standard HPCI mechanisms may still apply that might change the choice to 'retry'.

If the value returned is 'retry', it will either retry the stage or be changed to fail state, depending upon whether the limit for the choose_retries attribute has been reached.

This code attribute could be used for:

- running a program that always gives an error status, but you might wish to treat it as successful

- running a program that can give a zero status but not actually succeed (but perhaps a retry will succeed)

An example might look like:

    $group->stage(
        ...
        verify_completion_status => sub {
            my( $stats, $stdout, $stderr, $state ) = @_;
            # retry if output file is zero length
            return 'retry' if $state eq 'pass' && ! -s $my_output_file;
            # otherwise, normal processing
            return;
        },
        ...
        );
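
The other use mentioned above, a program that always exits with an error status but should still count as successful, might be handled by a callback like the one below. This is a sketch under assumptions: the function name verify_by_marker and the marker line "ALL DONE" are hypothetical, and the callback is exercised stand-alone here rather than through HPCI.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical callback: treat a failed run as passed when its stdout file
# contains the (assumed) marker line "ALL DONE". Returning undef leaves the
# state chosen by HPCI unchanged.
sub verify_by_marker {
    my ( $stats, $stdout, $stderr, $state ) = @_;
    return undef unless $state eq 'fail';         # only reconsider failures
    open my $fh, '<', $stdout or return undef;    # unreadable: no opinion
    while ( my $line = <$fh> ) {
        return 'pass' if $line =~ /^ALL DONE$/;
    }
    return undef;
}

# Stand-alone demonstration with a throwaway stdout file.
my ( $out_fh, $stdout_path ) = tempfile( UNLINK => 1 );
print {$out_fh} "step 1 ok\nALL DONE\n";
close $out_fh;

my $verdict = verify_by_marker( {}, $stdout_path, '/dev/null', 'fail' );
print 'verdict: ', $verdict // 'unchanged', "\n";
```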

should_choose_retry, choose_retries

If a cluster has a tendency to fail randomly, it may be desired to retry a stage that fails rather than simply treating it as an unrecoverable failure.

This pair of attributes specifies two aspects of making such a choice.

First, the should_choose_retry attribute can be given a code reference to code that returns true if a retry should be made. This code is only called if the stage run has failed.

The sub is called with two arguments: the status hash containing the known status results from the most recent attempt to run the stage, and a string with the name of the stderr output file. It returns a true value to indicate that a retry should be attempted. If this attribute is not explicitly provided, it defaults to code that always returns 0, so no retry is attempted.

Second, the choose_retries attribute specifies the number of times that the stage should be retried at the request of the should_choose_retry code (or at the request of the verify_completion_status code). The default is 1.

force_retries

If a cluster has a tendency to fail randomly, it may be desired to retry a stage that fails rather than simply treating it as an unrecoverable failure.

This attribute specifies the number of times that the stage should be retried until it passes. The default is to not retry (unless the underlying driver retries for a specific reason).

There are three types of retry selections. First, the cluster-specific driver can provide automatic retries for conditions appropriate to that cluster type. Second, the should_choose_retry attribute can cause a retry if the user-provided code chooses. Finally, the force_retries attribute can cause a retry. Each of these three is independent: any of them can cause a retry regardless of what the others would select. A stage is only deemed to have failed completely if none of the methods selects a retry. The counter for each type of retry test is only adjusted if that test was the one that caused a retry, so there can be as many retries as the total of the number allowed by each type.
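
The independent counters can be sketched as follows. This is an illustration of the bookkeeping described above and not HPCI's code: decide_after_failure is a hypothetical helper, the type names driver, chosen and forced are labels invented for the example, and the order in which the three tests are consulted is an assumption.

```perl
use strict;
use warnings;

# Hypothetical bookkeeping: each retry mechanism has its own limit and its
# own counter; only the mechanism that triggers a retry is charged for it.
sub decide_after_failure {
    my ( $limits, $used, $wants ) = @_;
    for my $type (qw(driver chosen forced)) {    # checking order is assumed
        next unless $wants->{$type};
        next if ( $used->{$type} // 0 ) >= ( $limits->{$type} // 0 );
        $used->{$type}++;
        return 'retry';
    }
    return 'fail';
}

# Simulate a stage that fails every time: the user's should_choose_retry
# code says yes, force_retries applies, and the driver has no reason of its
# own to retry.
my %limits  = ( driver => 0, chosen => 1, forced => 2 );
my %used;
my $retries = 0;
while ( decide_after_failure( \%limits, \%used, { chosen => 1, forced => 1 } ) eq 'retry' ) {
    $retries++;
}
print "retries before final failure: $retries\n";
```

With a limit of 1 for chosen retries and 2 for forced retries, the stage is retried three times in total before it is finally treated as failed.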

failure_action ('abort_group', 'abort_deps'*, or 'ignore')

Specifies the action to take if (the final retry of) this stage fails (terminates with non-zero status).

There are three string values that it can have:

- abort_deps (default)

If the stage fails, then any stages which depend upon it (recursively) are not run. The group continues executing stages which are not dependent upon this stage.

- abort_group

If the stage fails, then no other stages are started. The group simply waits until previously started stages complete and then returns.

- ignore

Execution continues unchanged; any dependent stages will be run when they are no longer blocked.
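
The recursive effect of abort_deps can be illustrated with a small graph walk. This is a sketch only, not how HPCI tracks dependencies internally: stages_blocked_by is a hypothetical helper and the stage names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: given a map from each stage to the stages that
# depend on it directly, list every stage that must not run after a failure.
sub stages_blocked_by {
    my ( $failed, $dependents ) = @_;
    my %blocked;
    my @queue = @{ $dependents->{$failed} || [] };
    while (@queue) {
        my $stage = shift @queue;
        next if $blocked{$stage}++;    # already handled
        push @queue, @{ $dependents->{$stage} || [] };
    }
    my @names = sort keys %blocked;
    return @names;
}

my %dependents = (
    align    => ['sort_bam'],
    sort_bam => [ 'call', 'stats' ],
    qc       => ['report'],
);

my @skipped = stages_blocked_by( 'align', \%dependents );
print "not run after 'align' fails: @skipped\n";
```

Stages outside the failed stage's dependency chain (qc and report here) are unaffected, which is the difference between abort_deps and abort_group.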

METHODS

command setting methods

The following command setting methods can be used as an alternative to providing the command as a string.

They each have the same parameter list:

    $stage->set_FOO_cmd( $command,
        $key1, $value1,
        $key2, $value2,
        ...
    );

There are three types of keys - keys with a single letter, longer keys, and '--'. Single letter keys will be prefixed with a single dash; longer keys will be prefixed with a double dash. The special key '--' (two dashes) contains the non-option command line arguments.

The values can either be a scalar or an arrayref. The way array values are expanded depends upon the choice of command setting method used.

The argparse method (used for Python and R) sets lists by putting all the list values after a single option word:

    --infiles file1 file2 file3

The gnuparse method (used for Perl) uses the standard Unix and GNU command way of setting lists by repeating the option word:

    --infiles file1 --infiles file2 --infiles file3

In the case of the interpreter methods (set_python_cmd, set_perl_cmd, set_R_cmd), they simply prepend the appropriate interpreter to the provided script name, and invoke the right parsing method.
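
The key and list rules described above can be sketched as a stand-alone function. This is an illustration and not HPCI's implementation: expand_args is a hypothetical helper, and placing the '--' arguments last is an assumption based on the set_perl_cmd example earlier in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: turn key/value pairs into a flat argument list in
# either 'gnu' style (repeat the option) or 'argparse' style (option once).
sub expand_args {
    my ( $style, @pairs ) = @_;
    my ( @opts, @rest );
    while (@pairs) {
        my ( $key, $value ) = splice @pairs, 0, 2;
        if ( $key eq '--' ) {
            # non-option arguments are collected and emitted last
            push @rest, ref($value) ? @$value : $value;
            next;
        }
        my $opt = ( length($key) == 1 ? '-' : '--' ) . $key;
        if ( !defined $value ) {
            push @opts, $opt;    # bare flag
        }
        elsif ( ref($value) eq 'ARRAY' ) {
            if ( $style eq 'gnu' ) {
                push @opts, map { ( $opt, $_ ) } @$value;
            }
            else {
                push @opts, $opt, @$value;
            }
        }
        else {
            push @opts, $opt, $value;
        }
    }
    return ( @opts, @rest );
}

my @pairs = ( infiles => [qw(file1 file2 file3)], v => 1, '--' => ['out.txt'] );
my @gnu      = expand_args( 'gnu',      @pairs );
my @argparse = expand_args( 'argparse', @pairs );
print "gnu:      @gnu\n";
print "argparse: @argparse\n";
```

The only difference between the two styles is how an arrayref value is expanded; scalar values, flags and the '--' arguments are treated the same way in both.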

set_argparse_cmd

Set the command attribute to execute a command taking Python's argparse-style arguments.

set_gnu_cmd

Set the command attribute to execute a command taking GNU getopt_long-style arguments.

set_perl_cmd

Wrapper around set_gnu_cmd that prefixes the Perl interpreter.

set_python_cmd

Wrapper around set_argparse_cmd that prefixes the Python interpreter.

set_r_cmd

Wrapper around set_argparse_cmd that prefixes the R interpreter.

AUTHOR

Christopher Lalansingh - Boutros Lab

Andre Masella - Boutros Lab

John Macdonald - Boutros Lab

ACKNOWLEDGEMENTS

Paul Boutros, PhD, PI - Boutros Lab

The Ontario Institute for Cancer Research