Batch Cluster Usage

When you log into the cluster, you are connecting to the cluster's head node. For computational work, users should access the compute nodes through the batch scheduling system.

The batch scheduling system, Slurm, allows users to submit job requests using the sbatch command. The jobs run when resources become available. All output is captured to a file by the batch system, and users can request e-mail notifications when jobs start or end.

To submit a batch job, a user must create a short text file called a batch job script. The batch job script contains instructions for running the batch job and the specific commands the batch job should execute. Batch jobs run in a new user session, so the batch job script should include any commands needed to set up the session and to navigate to the location of data files, etc.

Below is a sample batch job script for a job that will use a single CPU core.

#!/bin/bash
# To be submitted with: sbatch slurm_job.txt
#SBATCH --time=1:00:00
#SBATCH --nodes=1 --ntasks-per-node=1
#SBATCH --partition=batch
#SBATCH --mail-type=BEGIN,END
#SBATCH --job-name=hello
# Batch job setup complete

# Load a program module, for example Python
module load anaconda-python3

# Change to the execution directory, for example
cd $HOME/Desktop

# Program execution line
python helloWorld.py

To submit a batch job script, execute:

sbatch slurm_job.txt

The output of this will include a unique job identifier in the form of a job number.

The batch job script is a Linux shell script, so the first line specifies the shell interpreter to use when running the script. 

Lines starting with #SBATCH are instructions to the batch scheduling system.

MPI Parallel Execution

Parallel program execution in Slurm is launched with the srun command, which replaces mpiexec. Here is an example of an execution line that can be used in a job script:

srun --ntasks=28  ./a.out
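
For illustration, a complete MPI job script might look like the following sketch. The MPI module name (openmpi) and the working directory are assumptions; substitute the names used on your cluster.

#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --nodes=1 --ntasks-per-node=28
#SBATCH --partition=batch
#SBATCH --job-name=mpi_test

# Load an MPI module (the module name is an assumption; check "module avail")
module load openmpi

# Change to the directory containing the executable (hypothetical path)
cd $HOME/mpi_project

# srun launches one MPI rank per requested task
srun --ntasks=28 ./a.out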

GPU and Big Memory Node Access

The GPU nodes and the big memory nodes are in separate partitions (formerly queues).

A GPU node can be accessed by giving the --partition=gpu option:

#SBATCH --partition=gpu

A big memory node can be accessed by giving the --partition=bigmem option:

#SBATCH --partition=bigmem
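
For illustration, a minimal GPU job script might look like the sketch below; a big memory job would be identical except with --partition=bigmem. The --gres request, module, and script name are assumptions; whether an explicit GPU request is needed depends on the cluster configuration.

#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --nodes=1 --ntasks-per-node=1
#SBATCH --partition=gpu
#SBATCH --job-name=gpu_test
#SBATCH --gres=gpu:1    # assumption: some sites allocate GPUs without this line

module load anaconda-python3

# Hypothetical GPU-enabled script
python gpu_script.py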

Job Environment and Environment Variables

Environment variables are passed to your job by default in Slurm. The sbatch command can be run with one of these options to override the default behavior:

sbatch --export=NONE          (export none of the submission environment)
sbatch --export=MYVAR=2       (export only the listed variable)
sbatch --export=ALL,MYVAR=2   (export the full environment plus MYVAR=2)
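
As a sketch of how this works in practice, the variable name MYVAR and the echo line below are purely illustrative: the variable is set at submission time and is visible inside the job.

sbatch --export=ALL,MYVAR=2 slurm_job.txt

and inside slurm_job.txt:

#!/bin/bash
#SBATCH --time=0:05:00
#SBATCH --partition=batch
#SBATCH --job-name=env_test

# MYVAR was set on the sbatch command line via --export
echo "MYVAR is set to: $MYVAR"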

Job Monitoring and Status Check

Upon submission the scheduler reports back a job ID, for example: Submitted batch job 35.

The job's progress in the queue can be monitored with the squeue command (see below). If cancellation of the job is required,

scancel <jobid>

will do the trick.

To check the status of a job, the squeue command can be used:

squeue --job <jobid>

The displayed information includes the status of the job.

Codes provided by squeue to indicate job status:

Code  State        Meaning
CA    Canceled     Job was canceled
CD    Completed    Job completed
CF    Configuring  Job resources are being configured
CG    Completing   Job is completing
F     Failed       Job terminated with a non-zero exit code
NF    Node Fail    Job terminated due to failure of node(s)
PD    Pending      Job is waiting for compute node(s)
R     Running      Job is running on compute node(s)

The squeue command also shows the length of time the job has been running.
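
For illustration only, squeue output for a running job typically resembles the lines below; the user name and node name are placeholders, and the exact columns depend on the site configuration.

squeue --job 35
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     35     batch    hello    user1  R       0:42      1 node001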

Details on a specific job can be seen using the command

scontrol show job <jobid>

where <jobid> is the job number reported when the job was submitted (also shown in the squeue output).
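
For illustration, an abbreviated excerpt of the output might look like the following; the values are placeholders and the full listing contains many more fields.

scontrol show job 35
JobId=35 JobName=hello
   UserId=user1(1001) GroupId=users(100)
   Partition=batch TimeLimit=01:00:00
   JobState=RUNNING Reason=None
   NumNodes=1 NumCPUs=1
   ...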

Common batch job options

The following options can be given either on #SBATCH lines in a job script or on the sbatch command line.

--job-name=<name>  or  -J <name>
Name of the batch job:
#SBATCH --job-name=<name>
The job name is used to name output files and is also displayed when using squeue to query the job status.

--output=<file path>
Combined output file:
#SBATCH --output=<combined out and err file path>
Slurm collects all job output in a single file rather than having separate files for standard output and error.

--mail-type=<events>
Mail options:
#SBATCH --mail-type=<events>
Send e-mail to the user when the job enters a specific state. Valid events include NONE, BEGIN, END, FAIL, and REQUEUE, and multiple events may be requested in a comma-separated list, for example:
--mail-type=BEGIN,END,FAIL

--mail-user=<email address>
Job status e-mail address:
#SBATCH --mail-user=<email address>
List of additional e-mail addresses for messages. Note that e-mail is always sent to your uniqueID@miamioh.edu address, so it does not need to be specified.

--time=<hh:mm:ss>
--nodes=<count>  or  -N <count>
--ntasks-per-node=<count>
Resource specifications. There are three main types of resources: nodes, CPU cores per node, and wall-clock time. Multiple #SBATCH lines can be used to request these separately.
#SBATCH --time=100:00:00
This requests 100 hours of run time for the job.
#SBATCH --nodes=2 --ntasks-per-node=24
This requests 2 physical nodes and all 24 processor cores on each node.
Note that if you do not specify the number of processor cores, it defaults to one core. The default time limit is 1 hour.

--partition=<partition>  or  -p <partition>
Destination: which batch partition (formerly queue) to use.
#SBATCH -p batch
This sends the job to the default batch partition on the Redhawk cluster.

--export=<variable(s)>
Exports an environment variable to the job:
#SBATCH --export=VARIABLE
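
Putting several of these options together, a batch job script might look like the following sketch. The job name, output file, directory, and program line are placeholders to be replaced with your own values; %j in the output file name expands to the job number.

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --output=myjob_%j.out
#SBATCH --time=12:00:00
#SBATCH --nodes=1 --ntasks-per-node=4
#SBATCH --partition=batch
#SBATCH --mail-type=BEGIN,END,FAIL

module load anaconda-python3

# Hypothetical project directory and program
cd $HOME/myproject
python analysis.py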