
HPC::Runner::Command::submit_jobs::Utils::Scheduler

Command Line Options

#TODO Move this over to docs

config

Config file to pass on the command line as --config /path/to/file. It should be YAML or another config format supported by Config::Any. This is optional; parameters can also be passed straight to the command line.

example.yml

    ---
    infile: "/path/to/commands/testcommand.in"
    outdir: "path/to/testdir"
    module:
        - "R2"
        - "shared"

infile

Infile of commands, separated by newlines. The usual bash convention of escaping a newline with a backslash is also supported.

example.in

    cmd1
    #Multiline command
    cmd2 --input --input \
    --someotherinput
    wait
    #Wait tells slurm to make sure previous commands have exited with exit status 0.
    cmd3  ##very heavy job
    newnode
    #cmd3 is a very heavy job, so let's start the next job on a new node

jobname

Specify a job name, and jobs will be 001_jobname, 002_jobname, 003_jobname

This is kept separate from Base because submit_jobs and execute_job handle jobnames differently.

max_array_size

The maximum job array size. Each job is divided into one or more batches of at most max_array_size commands (see index_in_batch below).

use_batches

The default is to submit using job arrays.

If use_batches is specified, each job will be submitted individually instead.

Example:

    #HPC jobname=gzip
    #HPC commands_per_node=1
    gzip 1
    gzip 2
    gzip 3

Batches:

    sbatch 001_gzip.sh
    sbatch 002_gzip.sh
    sbatch 003_gzip.sh

Arrays:

    sbatch --array=1-3 gzip.sh

afterok

The afterok switch in slurm. --afterok 123 tells slurm to start this job only after job 123 has completed successfully.
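
A hypothetical invocation, again assuming the hpcrunner.pl command line interface:

    hpcrunner.pl submit_jobs --infile example.in --afterok 123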

no_submit_to_slurm

Boolean value controlling whether or not to submit to slurm. If you are looking to debug your files, or this script, you will want to set this to zero. Don't submit to slurm with --no_submit_to_slurm from the command line, or with $self->no_submit_to_slurm(0); within your code.

template_file

The actual template file.

One is generated for you here, but you can always supply your own with --template_file /path/to/template

serial

Option to run all jobs serially, one after the other, with no parallelism. The default is to use 4 procs.

use_custom

Supply your own command instead of mcerunner/threadsrunner/etc.

Internal Attributes

scheduler_ids

Our current scheduler job dependencies

job_stats

Object describing the number of jobs, number of batches per job, etc

deps

Call as

    #HPC deps=job01,job02
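
A fuller sketch of an infile using deps, following the #HPC directive syntax shown above (the jobnames and commands are hypothetical):

    #HPC jobname=job01
    gzip sample1

    #HPC jobname=job02
    gzip sample2

    #HPC jobname=job03
    #HPC deps=job01,job02
    process_samples sample1 sample2

Here job03 will not be scheduled until job01 and job02 have completed successfully.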

current_job

Keep track of our currently running job

current_batch

Keep track of our current batch

template

Template object for writing the slurm batch submission script

cmd_counter

Keep track of the number of commands; when we get to more than commands_per_node, restart the counter so we submit to a new node. This is the number of commands within a batch. Each new batch resets it.

batch_counter

Keep track of how many batches we have submitted to slurm

array_counter

Keep track of how many job arrays we have submitted to slurm

job_counter

Keep track of how many jobs we have submitted to slurm

batch

List of commands to submit to slurm

jobs

Contains all of our info for jobs

    {
        job03 => {
            deps => ['job01', 'job02'],
            schedulerIds => ['123.hpc.inst.edu'],
            submitted => 1/0,
            batch => 'String of whole commands',
            cmds => [
                'cmd1',
                'cmd2',
            ]
        },
        schedule => ['job01', 'job02', 'job03']
    }

graph_job_deps

Hashref of job deps to pass to Algorithm::Dependency

Job03 depends on job01 and job02

    { 'job03' => ['job01', 'job02'] }
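
A minimal sketch of resolving a schedule from that hashref with Algorithm::Dependency's hash-of-arrays source (the job names are hypothetical, and this is an illustration rather than the module's exact code):

    use Algorithm::Dependency::Ordered;
    use Algorithm::Dependency::Source::HoA;

    #job03 depends on job01 and job02, which have no deps of their own
    my $graph_job_deps = {
        job01 => [],
        job02 => [],
        job03 => [ 'job01', 'job02' ],
    };

    my $source = Algorithm::Dependency::Source::HoA->new($graph_job_deps);
    my $dep    = Algorithm::Dependency::Ordered->new( source => $source );

    #schedule_all returns the jobs in dependency order
    my $schedule = $dep->schedule_all;    # [ 'job01', 'job02', 'job03' ]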

Subroutines

Workflow

There are a lot of things happening here

    parse_file_slurm
    #we also resolve the dependency tree and write out the batch files in here
    schedule_jobs
    iterate_schedule

    for $job (@scheduled_jobs)
        (set current_job)
        process_jobs
        if !use_batches
            submit_job #submit the whole job if using job arrays - which is the default
        pre_process_batch
            (current_job, current_batch)
            scheduler_ids_by_batch
            if use_batches
                submit_job
            else
                run scontrol to update our jobs by job array id

run

check_jobname

Check to see if the user has chosen the default jobname, 'job'.

check_add_to_jobs

Make sure each jobname has an entry. We set the defaults as the global configuration.

increase_jobname

Increment the jobname: job_001, job_002, etc. Used for graph_job_deps.

check_files

Check to make sure the outdir exists. If it doesn't exist, the entire path will be created.

iterate_schedule

Iterate over the schedule generated by schedule_jobs

iterate_deps

Check to see if we are actually submitting.

Make sure each dep has already been submitted.

Return the deps' schedulerIds.

post_process_jobs

process_jobs

pre_process_batch

Go through the batch, add it, and see if we have any tags

scheduler_ids_by_batch

When defining job tags there is an extra level of dependency

scheduler_ids_by_array

index_in_batch

Using job arrays, each job is divided into one or more batches of size self->max_array_size.

    max_array_size = 10
    001_job.sh --array=1-10
    002_job.sh --array=11-20

    self->jobs->{a_job}->all_batch_indexes

    job001 => [
        {batch_index_start => 1, batch_index_end => 10 },
        {batch_index_start => 11, batch_index_end => 20}
    ]

The index argument is zero-indexed, while our counters (job_counter, batch_counter) are 1-indexed.
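
A minimal sketch of how commands could be chunked into batch indexes, assuming a simple loop over max_array_size (an illustration, not the module's actual implementation):

    #split 25 commands into chunks of max_array_size = 10
    my $max_array_size = 10;
    my $n_cmds         = 25;

    my @all_batch_indexes;
    for ( my $start = 1; $start <= $n_cmds; $start += $max_array_size ) {
        my $end = $start + $max_array_size - 1;
        $end = $n_cmds if $end > $n_cmds;
        push @all_batch_indexes,
            { batch_index_start => $start, batch_index_end => $end };
    }

    #yields 1-10, 11-20, 21-25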

work

Process the batch, submit it to the scheduler (slurm/pbs/etc), and take care of the counters.

process_batch()

Create the slurm submission script from the slurm template. Write out the template, the submission job, and the infile for the parallel runner.

process_template

process_batch_command

Split this off from the main command.

create_version_str

If there is a version, add it.

submit_to_scheduler

Submit the job to the scheduler.

Inputs: self, submit_command (sbatch, qsub, etc)

Returns: exitcode, stdout, stderr

This subroutine came almost 100% from the following perlmonks discussion. All I did was add in some logging.

http://www.perlmonks.org/?node_id=151886
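
A minimal sketch of that approach, using IPC::Open3 to capture the exit code, stdout, and stderr of a submit command (the helper name is hypothetical):

    use IPC::Open3;
    use Symbol 'gensym';

    sub submit_and_capture {
        my ($submit_command) = @_;

        #open3 will not autovivify the error handle, so create one with gensym
        my $err = gensym;
        my $pid = open3( my $in, my $out, $err, $submit_command );

        #slurp stdout and stderr in full - fine for short scheduler output
        my $stdout = do { local $/; <$out> };
        my $stderr = do { local $/; <$err> };

        waitpid( $pid, 0 );
        my $exitcode = $? >> 8;

        return ( $exitcode, $stdout, $stderr );
    }

    my ( $exitcode, $stdout, $stderr ) =
        submit_and_capture('sbatch 001_gzip.sh');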