HPC::Runner::Command::submit_jobs::Utils::Scheduler
Command Line Options
#TODO Move this over to docs
config
Config file to pass on the command line as --config /path/to/file. It should be YAML or another config format supported by Config::Any. This is optional; parameters can also be passed directly on the command line.
example.yml
---
infile: "/path/to/commands/testcommand.in"
outdir: "path/to/testdir"
module:
- "R2"
- "shared"
infile
Infile of commands separated by newlines. The usual bash convention of escaping a newline is also supported.
example.in
cmd1
#Multiline command
cmd2 --input --input \
--someotherinput
wait
#Wait tells slurm to make sure previous commands have exited with exit status 0.
cmd3 ##very heavy job
newnode
#cmd3 is a very heavy job so lets start the next job on a new node
jobname
Specify a job name, and jobs will be named 001_jobname, 002_jobname, 003_jobname.
This is kept separate from Base because submit_jobs and execute_job handle it differently.
max_array_size
The maximum number of tasks in a single job array. Jobs with more commands than this are split across multiple arrays (see index_in_batch below).
use_batches
The default is to submit using job arrays. If --use_batches is specified, each job is submitted individually.
Example:

#HPC jobname=gzip
#HPC commands_per_node=1
gzip 1
gzip 2
gzip 3

Batches:

sbatch 001_gzip.sh
sbatch 002_gzip.sh
sbatch 003_gzip.sh

Arrays:

sbatch --array=1-3 gzip.sh
afterok
The afterok switch in slurm. --afterok 123 tells slurm to start this job only after job 123 has completed successfully.
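For reference, the underlying slurm flag is sbatch's dependency option; a CLI fragment (the job id 123 is illustrative):

```shell
# Start job.sh only after job 123 has exited with status 0
sbatch --dependency=afterok:123 job.sh
```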
no_submit_to_slurm
Boolean value for whether or not to submit to slurm. If you are looking to debug your files, or this script, you will want to set this to zero: don't submit to slurm with --no_submit_to_slurm on the command line, or $self->no_submit_to_slurm(0); within your code.
template_file
The actual template file. One is generated here for you, but you can always supply your own with --template_file /path/to/template.
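A hedged sketch of what a custom template might look like, assuming a Template Toolkit-style template; the variable names here (JOBNAME, OUT, AFTEROK, COMMAND) are illustrative, not the module's actual template variables:

```
#!/bin/bash
#SBATCH --job-name=[% JOBNAME %]
#SBATCH --output=[% OUT %]
[% IF AFTEROK %]#SBATCH --dependency=afterok:[% AFTEROK %]
[% END %]
[% COMMAND %]
```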
serial
Option to run all jobs serially, one after the other, with no parallelism. The default is to use 4 procs.
use_custom
Supply your own command instead of mcerunner/threadsrunner/etc
Internal Attributes
scheduler_ids
Our current scheduler job dependencies
job_stats
Object describing the number of jobs, number of batches per job, etc
deps
Call as
#HPC deps=job01,job02
current_job
Keep track of our currently running job
current_batch
Keep track of our current batch
template
Template object for writing the slurm batch submission script
cmd_counter
Keep track of the number of commands. When we get to more than commands_per_node we restart the count, so we submit to a new node. This is the number of commands within a batch; each new batch resets it.
batch_counter
Keep track of how many batches we have submitted to slurm
array_counter
Keep track of how many job arrays we have submitted to slurm
job_counter
Keep track of how many jobs we have submitted to slurm
batch
List of commands to submit to slurm
jobs
Contains all of our info for jobs
{
    job03 => {
        deps         => ['job01', 'job02'],
        schedulerIds => ['123.hpc.inst.edu'],
        submitted    => 1/0,
        batch        => 'String of whole commands',
        cmds         => [
            'cmd1',
            'cmd2',
        ],
    },
    schedule => ['job01', 'job02', 'job03'],
}
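The structure above can be exercised with a short pure-Perl sketch. The values are copied from the example; the readiness check mirrors the "make sure each dep has already been submitted" step described under iterate_deps:

```perl
use strict;
use warnings;

# The jobs structure, as documented above
my $jobs = {
    job01 => { deps => [], submitted => 1, cmds => ['cmd1'] },
    job02 => { deps => [], submitted => 1, cmds => ['cmd2'] },
    job03 => {
        deps         => [ 'job01', 'job02' ],
        schedulerIds => ['123.hpc.inst.edu'],
        submitted    => 0,
        cmds         => [ 'cmd1', 'cmd2' ],
    },
    schedule => [ 'job01', 'job02', 'job03' ],
};

# Walk the schedule; a job is ready once all of its deps are submitted
for my $name ( @{ $jobs->{schedule} } ) {
    my @unmet = grep { !$jobs->{$_}{submitted} } @{ $jobs->{$name}{deps} };
    print "$name ready\n" unless @unmet;
}
```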
graph_job_deps
Hashref of job deps to pass to Algorithm::Dependency.
job03 depends on job01 and job02:
{ 'job03' => ['job01', 'job02'] }
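The module resolves this hashref with Algorithm::Dependency; as a dependency-free illustration of what that resolution does, here is a small pure-Perl sketch (resolve_order is my stand-in, not the module's code, and it assumes the graph has no cycles):

```perl
use strict;
use warnings;

# Hashref of job deps: job03 depends on job01 and job02
my $graph_job_deps = {
    job01 => [],
    job02 => ['job01'],
    job03 => [ 'job01', 'job02' ],
};

# Hypothetical stand-in for Algorithm::Dependency: repeatedly pick
# jobs whose deps have all been scheduled already.
sub resolve_order {
    my ($deps) = @_;
    my ( @order, %done );
    while ( keys %done < keys %$deps ) {
        for my $job ( sort keys %$deps ) {
            next if $done{$job};
            next if grep { !$done{$_} } @{ $deps->{$job} };
            push @order, $job;
            $done{$job} = 1;
        }
    }
    return @order;
}

my @order = resolve_order($graph_job_deps);
print join( ',', @order ), "\n";    # job01,job02,job03
```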
Subroutines
Workflow
There are a lot of things happening here:

parse_file_slurm  # also resolves the dependency tree and writes out the batch files
schedule_jobs
iterate_schedule
    for $job (@scheduled_jobs)
        (set current_job)
        process_jobs
        if !use_batches
            submit_job  # submit the whole job if using job arrays - the default
        pre_process_batch
            (current_job, current_batch)
            scheduler_ids_by_batch
            if use_batches
                submit_job
            else
                run scontrol to update our jobs by job array id
run
check_jobname
Check to see if the user has chosen the default jobname, 'job'
check_add_to_jobs
Make sure each jobname has an entry. We set the defaults from the global configuration.
increase_jobname
Increment the jobname: job_001, job_002, etc. Used for graph_job_deps
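A minimal sketch of the numbering scheme; next_jobname is a hypothetical helper, not the module's actual subroutine:

```perl
use strict;
use warnings;

# Given 'job' or 'job_001', return the next zero-padded jobname
sub next_jobname {
    my ($jobname) = @_;
    my ( $base, $n ) = $jobname =~ /^(.*)_(\d+)$/;
    ( $base, $n ) = ( $jobname, 0 ) unless defined $n;
    return sprintf "%s_%03d", $base, $n + 1;
}

print next_jobname('job'),     "\n";    # job_001
print next_jobname('job_001'), "\n";    # job_002
```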
check_files
Check to make sure the outdir exists. If it doesn't, the entire path will be created.
iterate_schedule
Iterate over the schedule generated by schedule_jobs
iterate_deps
Check to see if we are actually submitting
Make sure each dep has already been submitted
Return job schedulerIds
post_process_jobs
process_jobs
pre_process_batch
Go through the batch, add it, and see if we have any tags
scheduler_ids_by_batch
When defining job tags there is an extra level of dependency
scheduler_ids_by_array
index_in_batch
Using job arrays, each job is divided into one or more batches of size self->max_array_size.

max_array_size = 10

001_job.sh --array=1-10
002_job.sh --array=11-20

self->jobs->{a_job}->all_batch_indexes

job001 => [
    { batch_index_start => 1,  batch_index_end => 10 },
    { batch_index_start => 11, batch_index_end => 20 },
]
The index argument is zero indexed, and our counters (job_counter, batch_counter) are 1 indexed
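The splitting described above can be sketched in a few lines of Perl; all_batch_indexes here is my reconstruction of the documented structure, not the module's actual code:

```perl
use strict;
use warnings;

# Split a job's commands into array batches of at most
# $max_array_size tasks each, using 1-indexed task ids.
sub all_batch_indexes {
    my ( $n_cmds, $max_array_size ) = @_;
    my @batches;
    my $start = 1;
    while ( $start <= $n_cmds ) {
        my $end = $start + $max_array_size - 1;
        $end = $n_cmds if $end > $n_cmds;
        push @batches,
            { batch_index_start => $start, batch_index_end => $end };
        $start = $end + 1;
    }
    return \@batches;
}

# 20 commands with max_array_size = 10 gives the two batches
# shown above: 1-10 and 11-20
my $indexes = all_batch_indexes( 20, 10 );
```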
work
Process the batch. Submit to the scheduler (slurm/pbs/etc). Take care of the counters.
process_batch()
Create the slurm submission script from the slurm template. Write out the template, submission job, and infile for the parallel runner.
process_template
process_batch_command
Split this off from the main command.
create_version_str
If there is a version, add it.
submit_to_scheduler
Submit the job to the scheduler.
Inputs: self, submit_command (sbatch, qsub, etc)
Returns: exitcode, stdout, stderr
This subroutine was taken just about 100% from the following perlmonks discussion. All that I did was add in some logging.
http://www.perlmonks.org/?node_id=151886
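A hedged sketch of capturing (exitcode, stdout, stderr) with core IPC::Open3, in the spirit of the perlmonks approach referenced above; run_command is my illustrative helper, and real code should stream the handles (e.g. with select or IPC::Run) to avoid deadlock on large output:

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol 'gensym';

# Run a command, returning its exit code, stdout, and stderr
sub run_command {
    my (@cmd) = @_;
    my $err = gensym;    # open3 needs a pre-made glob for stderr
    my $pid = open3( my $in, my $out, $err, @cmd );
    close $in;
    my $stdout = do { local $/; <$out> // '' };
    my $stderr = do { local $/; <$err> // '' };
    waitpid $pid, 0;
    return ( $? >> 8, $stdout, $stderr );
}

# In the real module the command would be e.g. ('sbatch', $job_file);
# here we run the perl interpreter itself so the sketch is self-contained
my ( $exit, $stdout, $stderr ) = run_command( $^X, '-e', 'print "ok"' );
print "exit=$exit stdout=$stdout\n";    # exit=0 stdout=ok
```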