
Running OpenMP Jobs


Entire node dedicated to a single job

The coupling between SLURM and the parallel environment is much looser for OpenMP than it is for MPI.

OpenMP allows parallelism only within a node, so the best option is to specify a single node in the script, set the appropriate number of parallel threads via OMP_NUM_THREADS, and set the CPUs per task equal to the number of cores on the target node. Because the coupling is loose, SLURM does not strictly enforce the CPU bindings, so in node-exclusive mode, as operated on hydra, you can simply request one node and then take the number of cores on the node the job runs on from SLURM_JOB_CPUS_PER_NODE.

       
#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --job-name=openmpstuff
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=compute-20
#SBATCH --nodes=1
#SBATCH --account=ITTEST
#SBATCH --mail-type=ALL
#SBATCH --mail-user=a.turner@lboro.ac.uk

# Node-exclusive job: use every core SLURM reports on this node for the OpenMP threads
export OMP_NUM_THREADS=$SLURM_JOB_CPUS_PER_NODE

program.exe
      

The above example runs on the 20-core section of nodes, but apart from the partition definition the script is generic; the partition could instead be specified on the sbatch command line.
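
For example, assuming the script above is saved as openmp_job.sh (an illustrative name), the partition can be chosen at submission time instead:

sbatch --partition=compute-20 openmp_job.sh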

A variation on the above is if you want to run some sort of ensemble in which there is random variation involved. In this case you can specify more than one node and then use mpirun to start an identical copy of the program on each node. However, you need to be careful that the programs only write data to locations that are per-node, otherwise results from one node's program will be overwritten by another's (a sketch of one way to arrange this follows the example below). In this case, setting cpus-per-task to a specific number and submitting to a specific partition is wise to avoid too many MPI processes being spawned, as you want only one MPI process per node.

#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --job-name=openmpstuff
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=compute-12
#SBATCH --nodes=2
#SBATCH --cpus-per-task=12
#SBATCH --account=ITTEST
#SBATCH --mail-type=ALL
#SBATCH --mail-user=a.turner@lboro.ac.uk

# With two nodes SLURM_JOB_CPUS_PER_NODE takes the form 12(x2), which is not a valid
# thread count, so use the per-task CPU count instead
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

mpirun program.exe
      

The above example runs on the 12-core section of nodes. Note that here the number of CPUs per task and the partition must match.
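
As a sketch of the per-node output point above, mpirun can launch a small wrapper rather than the program itself, so that each node's copy runs in its own directory. The wrapper name, the results directory and program.exe are placeholders rather than a site convention, and the path is relative to the job's working directory on the shared filesystem:

#!/bin/bash
# pernode_wrapper.sh (hypothetical): started once per node via "mpirun pernode_wrapper.sh"
# Each copy writes into a directory named after the node it runs on, so one node's
# results cannot be overwritten by another's.
OUTDIR="results/$(hostname -s)"
mkdir -p "$OUTDIR"
cd "$OUTDIR"
program.exe

In the job script above, mpirun program.exe would then become mpirun ./pernode_wrapper.sh (after making the wrapper executable with chmod +x).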

Running when the number of cores required is less than a node

Use an allocation (see Allocations) and then submit jobs to it with sbatch.

The following example assumes that four-core jobs are to be run on any of the nodes and that you want the allocation to last for 10 hours:

salloc --account=someaccountname --nodes=1 --cpus-per-task=4 --partition=compute --time=10:00:00

Use a job script like the following (modified from above):

    
#!/bin/bash
#SBATCH --time=10:00:00
#SBATCH --job-name=openmpstuff
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --nodes=1
#SBATCH --account=ITTEST
#SBATCH --mail-type=ALL
#SBATCH --mail-user=a.turner@lboro.ac.uk

# Threads per job = cores on the node divided by the number of tasks SLURM will place on it
export OMP_NUM_THREADS=`expr $SLURM_JOB_CPUS_PER_NODE / $SLURM_TASKS_PER_NODE`

program.exe $*
      

In the above, SLURM works out the number of tasks per node from the number of CPUs per task you specified for the allocation and the total number of cores available on the node. Dividing the cores on the node by that number of tasks gets back to the number four, and the result is assigned to OMP_NUM_THREADS to control the number of threads spawned.
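
As a worked example of that calculation, assuming the 4-core allocation above lands on one of the 12-core nodes:

# SLURM_JOB_CPUS_PER_NODE = 12             (cores on the node)
# SLURM_TASKS_PER_NODE    = 12 / 4 = 3     (tasks that fit, given --cpus-per-task=4)
# OMP_NUM_THREADS         = 12 / 3 = 4     (back to the four cores per job requested)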

Use sbatch --jobid=number script.sh args, where number is the job id of the allocation created with salloc, script.sh is the name given to the script above, and args are arguments to be passed to program.exe.
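
For example, if salloc reported a job id of 123456 (illustrative) and the script above were saved as openmp_job.sh, a run could be submitted into the allocation with the command below, where inputfile.dat stands for whatever arguments program.exe expects:

sbatch --jobid=123456 openmp_job.sh inputfile.dat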

If you wished to run roughly 5-core jobs you could do so, but on a 12-core node SLURM_TASKS_PER_NODE would be set to 12/5, rounded down (i.e. 2). The calculation for the number of cores per job would then be 12/2, or 6. This is more than the 5 asked for, but it ensures no cores are wasted and should normally allow the program to complete sooner. If exactly 5 are required then OMP_NUM_THREADS can simply be set to 5 directly in the job script, as sketched below. However, check the maths to determine whether you will have wasted cores and try to avoid this.
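
For instance, to force exactly 5 threads, the calculated export in the script above would be replaced with a fixed value (the waste check below assumes the 12-core nodes):

# Force exactly 5 threads per job rather than the calculated value.
# On a 12-core node SLURM packs 2 such jobs (12/5 rounded down), leaving 12 - 2*5 = 2 cores idle.
export OMP_NUM_THREADS=5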

If you have a limited number of jobs to run you can simply use sbatch to submit a suitable number, but take care not to over- or under-allocate nodes. Doing this by hand generally requires picking an explicit compute partition to avoid firing too many jobs into an allocation. Alternatively you can use array jobs (see Array Jobs).

You should ensure that an allocation is deleted after it is finished. See the information on Allocations.
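
For an allocation created with salloc as above, one way to remove it when you have finished with it is to cancel it by its job id (the number here is illustrative):

scancel 123456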