Array Jobs on Fox
To run many instances of the same job, use the --array switch to sbatch. This is useful if you have many datasets that you want to process in the same way:
sbatch --array=from-to [other sbatch switches] YourScript
You can also put the --array switch in an #SBATCH line inside the script. from and to are the first and last task numbers. Each instance of YourScript can use the environment variable $SLURM_ARRAY_TASK_ID to select which dataset to use, etc. (The queue system calls the instances "array tasks".) For instance:
sbatch --array=1-100 MyScript
will run 100 instances of MyScript, each with a different value of the environment variable $SLURM_ARRAY_TASK_ID: 1, 2, ..., 100.
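When the datasets are not simply numbered 1 to N, one common pattern is to list their names in a text file, one per line, and let each task pick the line whose number equals $SLURM_ARRAY_TASK_ID. The sketch below assumes a hypothetical list file (datasets.txt) and a hypothetical helper function (dataset_for_task); neither is part of the Fox setup.

```shell
#!/bin/bash
# Hypothetical helper: print line number TASK_ID of the file LISTFILE.
# Usage: dataset_for_task LISTFILE TASK_ID
dataset_for_task() {
    local listfile=$1 task_id=$2
    sed -n "${task_id}p" "$listfile"
}

# In a job script submitted with --array=1-N, where the hypothetical file
# datasets.txt holds N dataset names, one per line:
#   DATASET=$(dataset_for_task datasets.txt "$SLURM_ARRAY_TASK_ID")
#   YourProgram "$DATASET" > "result.$SLURM_ARRAY_TASK_ID"
```

With this approach the array range should match the number of lines in the list, so that task k processes the k-th entry and no task gets an empty name.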
Task IDs can be specified in other ways than from-to: as a single number, a range (from-to), a range with a step size (from-to:step), or a comma-separated list of these. Finally, adding %max at the end of the specification limits how many tasks are allowed to run at the same time. A few examples:
Specification     Resulting SLURM_ARRAY_TASK_IDs
1,4,42            # 1, 4, 42
1-5               # 1, 2, 3, 4, 5
0-10:2            # 0, 2, 4, 6, 8, 10
32,56,100-200     # 32, 56, 100, 101, 102, ..., 200
1-200%10          # 1, 2, ..., 200, but at most 10 running at the same time
Note: spaces, decimal numbers, and negative numbers are not allowed in the specification.
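To make the expansion rules in the table concrete, here is a small bash function that turns a specification into the list of task IDs it produces. It is only an illustration written for this page (the name expand_array_spec is hypothetical); it is not Slurm's own parser and does no validation of malformed input.

```shell
#!/bin/bash
# Illustration only: expand an array specification such as "32,56,100-200"
# or "0-10:2%3" into the task IDs it produces. Not Slurm's own parser.
expand_array_spec() {
    local spec=${1%%\%*}            # drop an optional %max suffix (it limits concurrency, not IDs)
    local -a parts ids=()
    local part range step from to i
    IFS=, read -r -a parts <<< "$spec"   # split the comma-separated list
    for part in "${parts[@]}"; do
        range=${part%%:*}           # "from-to" or a single number
        step=1
        if [[ $part == *:* ]]; then
            step=${part#*:}         # step size after the colon
        fi
        from=${range%%-*}
        to=${range#*-}              # equals $from when there is no "-"
        for (( i = from; i <= to; i += step )); do
            ids+=("$i")
        done
    done
    echo "${ids[*]}"                # space-separated task IDs
}

expand_array_spec "0-10:2%3"        # prints: 0 2 4 6 8 10
```

The %max part is discarded because it does not change which task IDs exist; it only tells the queue system how many of them may run at once.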
The instances of an array job are independent: each has its own $SCRATCH, and they are treated like separate jobs.
To cancel all tasks of an array job, cancel the job ID that is returned by sbatch.
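Capturing the job ID at submission time makes it easy to cancel the array later. The sketch below uses sbatch's --parsable option, which prints only the job ID (followed by ";cluster" on multi-cluster systems), and scancel's jobid_taskid form, which cancels a single task. The helper name submit_array is hypothetical, and MyScript is the job script from the examples on this page.

```shell
#!/bin/bash
# Hypothetical helper: submit an array job and print only its job ID.
# Usage: submit_array RANGE SCRIPT
submit_array() {
    local out
    out=$(sbatch --parsable --array="$1" "$2") || return
    echo "${out%%;*}"               # drop a ";cluster" suffix if present
}

# Typical use:
#   jobid=$(submit_array 1-100 MyScript)
#   scancel "${jobid}_7"            # cancel only array task 7
#   scancel "$jobid"                # cancel all tasks of the array
```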
A small but complete example (for a normal job):
#!/bin/bash
#SBATCH --account=YourProject
#SBATCH --time=1:0:0
#SBATCH --mem-per-cpu=4G
#SBATCH --array=1-200
set -o errexit # exit on errors
set -o nounset # treat unset variables as errors
module --quiet purge # clear any inherited modules
# Each array task derives its own input and output file names from its task ID
DATASET=dataset.$SLURM_ARRAY_TASK_ID
OUTFILE=result.$SLURM_ARRAY_TASK_ID
YourProgram "$DATASET" > "$OUTFILE"
Submit the script with sbatch <script_file_name>. This job will process the datasets dataset.1, dataset.2, ..., dataset.200 and put the results in result.1, result.2, ..., result.200.
CC Attribution: This page is maintained by the University of 探花精选 IT FFU-BT group. It has either been modified from, or is a derivative of, "" by NRIS under . Changes: Removed download link.