Cook Book: Job Arrays

Job arrays let you create many jobs with a single qsub command. You create them by calling qsub with the -t parameter, which accepts comma-delimited lists of ids, ranges of ids, or a mix of both.

Simple Job Arrays

The following simple job script creates 10 identical jobs, each of which prints its id in the array. The ids are 1, 2, …, 10.
#!/bin/bash
#PBS -t 1-10
#PBS -N array_test
#PBS -d /home/<USER>/tmp
#PBS -o testjob.out
#PBS -e testjob.err
#PBS -M <EMAIL ADDRESS>
#PBS -l walltime=00:01:00
#PBS -l nodes=1:ppn=1
#PBS -l mem=10mb

echo ${PBS_ARRAYID}

We can submit this with:
$ qsub job_script.sh

You can also control which job ids are generated by passing the -t parameter on the command line. For example, the following script is the same as the one above, except that it contains no #PBS -t line (and no #PBS -N line).
#!/bin/bash
#PBS -d /home/<USER>/tmp
#PBS -o testjob.out
#PBS -e testjob.err
#PBS -M <EMAIL ADDRESS>
#PBS -l walltime=00:01:00
#PBS -l nodes=1:ppn=1
#PBS -l mem=10mb

echo ${PBS_ARRAYID}

We can now submit this with:
$ qsub -t 10,33-44 job_script.sh
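
To see which ids a given -t value would produce before submitting anything, a small helper can expand the specification. The following is only a sketch: expand_array_spec is a hypothetical function, not part of qsub, and it assumes the specification consists solely of comma-separated integers and simple a-b ranges.

```shell
#!/bin/bash
# Hypothetical helper (not part of qsub): expand a -t style specification
# such as "10,33-44" into the individual array ids, one per line.
# Assumes only comma-separated integers and simple a-b ranges.
expand_array_spec() {
    local spec=$1 part
    local -a parts
    # Split the specification at commas.
    IFS=',' read -ra parts <<< "$spec"
    for part in "${parts[@]}"; do
        if [[ $part == *-* ]]; then
            # A range "a-b": print every id from a to b.
            seq "${part%-*}" "${part#*-}"
        else
            # A single id.
            echo "$part"
        fi
    done
}

expand_array_spec 10,33-44
```

For the example above, this prints 10 followed by 33 through 44, that is, 13 ids in total.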

Job Arrays And Strongly Differing Jobs

While it is sometimes easy to derive the piece of work to do from a numeric id (e.g., the id of a matrix tile), in other cases it is harder. One good approach is to first generate one job script file per id, each containing the work to do for that id. Such scripts can be named job_${JOB_ID}.sh, for example.
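
For the easy case, the matrix tile example might look roughly like the sketch below. The layout is an assumption made purely for illustration: ids start at 1, tiles are numbered row by row, and there are COLS tiles per row. Both COLS and tile_for_id are hypothetical names.

```shell
#!/bin/bash
# Assumed layout: tiles are numbered row by row, ids start at 1.
COLS=4  # hypothetical number of tiles per row

# Print the zero-based "row col" pair that belongs to an array id.
tile_for_id() {
    local id=$1
    local row=$(( (id - 1) / COLS ))
    local col=$(( (id - 1) % COLS ))
    echo "${row} ${col}"
}

# Inside a job, PBS_ARRAYID is set by the scheduler.
# It defaults to 1 here so the script can be tried outside the queue.
read -r ROW COL <<< "$(tile_for_id "${PBS_ARRAYID:-1}")"
echo "Processing tile row=${ROW} col=${COL}"
```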

In our example, we create a script gen.sh which contains the following:
#!/bin/bash

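# Running counter used to number the generated job scripts.
JOB=0

for i in $(seq 10); do
    for animal in dog cat cow; do
        JOB=$((JOB + 1))
        # Write a two-line script: the shebang, then the actual work.
        echo '#!/bin/bash' > "job_${JOB}.sh"
        echo "echo 'The ${i}th ${animal} says Hello.'" >> "job_${JOB}.sh"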
    done
done

Executing this script gives us 30 files (10 numbers times 3 animals) called job_1.sh through job_30.sh.
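
Before submitting, it can be worth checking that the generated files are numbered without gaps, since each array id will look for exactly one file. The following is a minimal sketch; check_job_files is a hypothetical helper, and the file naming job_<N>.sh is taken from gen.sh above.

```shell
#!/bin/bash
# Hypothetical helper: report every job_<N>.sh that is missing from the
# range 1..LAST in directory DIR (DIR defaults to the current directory).
# Returns 0 if all files exist and 1 otherwise.
check_job_files() {
    local last=$1 dir=${2:-.} n missing=0
    for (( n = 1; n <= last; n++ )); do
        if [[ ! -f "${dir}/job_${n}.sh" ]]; then
            echo "missing: job_${n}.sh" >&2
            missing=$(( missing + 1 ))
        fi
    done
    return $(( missing > 0 ))
}
```

Calling check_job_files 30 in the directory where gen.sh was run should print nothing and succeed if every script needed for the ids 1 to 30 is present.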

We then write a wrapper script to call these files:
#!/bin/bash
#PBS -d /home/<USER>/tmp
#PBS -o testjob.out
#PBS -e testjob.err
#PBS -M <EMAIL ADDRESS>
#PBS -l walltime=00:01:00
#PBS -l nodes=1:ppn=1
#PBS -l mem=10mb

date
hostname
echo "PBS_ARRAYID=${PBS_ARRAYID}"
bash job_${PBS_ARRAYID}.sh
date

Printing the hostname is a good idea because it tells you which of your jobs ran on which node. If your jobs are multithreaded, this lets you infer whether too many of them were executed concurrently on the same node. Printing the date before and after the work shows how long your job ran.
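
If you would rather have the run time as a single number than subtract two dates by hand, the same idea can be wrapped in a small function. This is only a sketch: run_timed is a hypothetical helper, and the output format is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: run a command, then print the host it ran on,
# the elapsed wall-clock time in seconds, and the command's exit status.
run_timed() {
    local start end status=0
    start=$(date +%s)
    echo "host=$(hostname) start=$(date)"
    # Run the command; remember its exit status without aborting.
    "$@" || status=$?
    end=$(date +%s)
    echo "elapsed_seconds=$(( end - start )) exit_status=${status}"
    return "$status"
}
```

In the wrapper script, the work line could then be written as run_timed bash job_${PBS_ARRAYID}.sh.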

The script above can now be called with different settings for -t:
# Launch jobs 1-10.
$ qsub -t 1-10 job_script.sh
# Launch all jobs.
$ qsub -t 1-30 job_script.sh
# Launch selected jobs.
$ qsub -t 1,3,5,13,19 job_script.sh
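
Because the ids are chosen freely at submission time, the wrapper may be started with an id for which no job file exists, or without -t at all. A defensive variant of the call might look like the sketch below; run_array_job and its return codes are hypothetical and not part of PBS.

```shell
#!/bin/bash
# Hypothetical helper: run job_<id>.sh from the current directory,
# failing with a clear message if the id or the script is missing.
run_array_job() {
    local id=${1:-}
    if [[ -z $id ]]; then
        echo "PBS_ARRAYID is not set; submit with qsub -t" >&2
        return 2
    fi
    local script="job_${id}.sh"
    if [[ ! -f $script ]]; then
        echo "no such job script: ${script}" >&2
        return 3
    fi
    bash "$script"
}
```

In the wrapper script, the line bash job_${PBS_ARRAYID}.sh could then be replaced by run_array_job "${PBS_ARRAYID:-}".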