How to Use the Cluster (Qsub script)

To submit a job to the cluster, create a submission script. For the full qsub documentation, see the Torque website.

The script consists of #PBS lines that tell the cluster software how to run your job, followed by the command that runs your program. A full example script is available on the cluster at /io1/home1/ghall/examples/job-script.qsub, which is readable by all cluster users. Any line that starts with # and is not immediately followed by PBS is a comment. To view the qsub options while logged in to the cluster, type man qsub on the command line.
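
As a rough illustration of the layout described above, the sketch below writes a tiny job script to a temporary file and counts its #PBS lines and its ordinary comment lines. The job name demo-job, the file name demo.qsub, and the echo command are placeholders chosen for this example; they are not taken from the cluster's example script.

```shell
#!/bin/sh
# Write a small job script to a temporary directory (placeholder contents).
workdir=$(mktemp -d)
script="$workdir/demo.qsub"

cat > "$script" <<'EOF'
#PBS -N demo-job
#PBS -l nodes=1:ppn=1
# This line is an ordinary comment and is ignored.
cd $PBS_O_WORKDIR
echo "hello from the cluster"
EOF

# Lines starting with "#PBS" are directives; other "#" lines are comments.
directives=$(grep -c '^#PBS' "$script")
all_hash=$(grep -c '^#' "$script")
comments=$((all_hash - directives))
echo "directives: $directives, comments: $comments"
```

Counting the two kinds of lines this way is only a quick sanity check on a script before submitting it.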

PBS lines you could add to your script:

NAME: You can name your job so that it is easier to find later, for example with #PBS -N hpl-576.

RESOURCES: Pick ONE of the following #PBS -l lines (note that this is a lowercase letter L, not the digit one). Either line can be modified to change the number of nodes and processors requested for your job:

  • request resources of 24 nodes with 24 processors per node: #PBS -l nodes=24:ppn=24
  • request resources of 5 nodes with 20 processors per node on gpu nodes: #PBS -l nodes=5:ppn=20:gpu

Please request only the resources (nodes and processors) your job needs, as they will be reserved for you. If you ask for more than you need, you may prevent other jobs from running, or you could prevent your own job from running if the resources are not available.
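
To get a feel for how much a request reserves, multiply the number of nodes by the processors per node. The short sketch below does this for the nodes=24:ppn=24 line shown earlier; the variable names are only illustrative.

```shell
#!/bin/sh
# Values taken from the first example line: #PBS -l nodes=24:ppn=24
nodes=24
ppn=24

# Total processors reserved = nodes x processors per node.
total=$((nodes * ppn))
echo "#PBS -l nodes=${nodes}:ppn=${ppn} reserves ${total} processors"
```

Changing nodes and ppn to the values you plan to request shows how many processors will be held for your job while it runs.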

OUTPUT: You also have choices about how your job’s output is handled.
One option is to join stderr and stdout into one file and give it a name (if not specified, it will be the job name followed by .o and the job number, e.g., hpl-576.o76):
#PBS -j oe
#PBS -o logs/combined.log

Or, if you don’t combine stderr and stdout, you can specify the names of each of your output files:
#PBS -e logs/error.log
#PBS -o logs/output.log

Other options with output:

  • #PBS -j oe, which gives one output file per run, written to the directory where qsub was run.
  • #PBS -k eo, which keeps the output and error files on the execution host. The files are named from the job name and the job's sequence number (job_name.osequencenumber and job_name.esequencenumber) and end up in the root of your home directory.
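
The -o and -e paths shown earlier in this section point into a logs subdirectory. Assuming the scheduler does not create missing directories for you (this is an assumption about the cluster's configuration, so check locally), a minimal precaution is to create the directory in the submission directory before calling qsub:

```shell
#!/bin/sh
# Create the directory named in the -o / -e paths before submitting.
# "logs" matches the paths used in the examples; adjust if yours differ.
mkdir -p logs
ls -d logs
```

mkdir -p does nothing if the directory already exists, so it is safe to run before every submission.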

QUEUE: When you submit your job, it goes into a queue. The default is the batch queue, so if you are not using the GPU nodes, you do not need to specify a queue. To use the GPU nodes, or to name the batch queue explicitly, add one of the following lines to your script:

  • Choose to use some of the 5 GPU nodes: #PBS -q gpu
  • Choose to use some of the 24 regular nodes: #PBS -q batch
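
If you are unsure which queues exist, qstat -q prints a summary of them. The sketch below is guarded so that it only calls qstat when the Torque client tools are on your PATH; the variable queues_checked is just a marker used in this example.

```shell
#!/bin/sh
# List the queues known to the scheduler, if the Torque client tools are available.
if command -v qstat >/dev/null 2>&1; then
    qstat -q
    queues_checked=yes
else
    echo "qstat not found; run this on the cluster login node"
    queues_checked=no
fi
```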

DIRECTORY: The following line makes your job run in the directory from which you submitted it. Output files are also saved to this directory (or to a path relative to it, if you gave a relative path when naming the output files):
cd $PBS_O_WORKDIR
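
PBS_O_WORKDIR is set by the scheduler when the job runs, so it is normally empty if you test the script by hand. One possible variation, sketched below, falls back to the current directory in that case and stops if the directory cannot be entered. The variable name run_dir is hypothetical.

```shell
#!/bin/sh
# PBS_O_WORKDIR is set by the scheduler to the directory where qsub was run.
# Outside a job it is unset, so fall back to the current directory.
run_dir="${PBS_O_WORKDIR:-$PWD}"
cd "$run_dir" || exit 1
echo "running in: $(pwd)"
```

Quoting the variable also keeps the cd working when the directory name contains spaces.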

RUN: Next, give the command you want to run. This line could be a call to mpiexec if you are running an MPI application (not relevant for most users), a call to a script, or the command you would use to start your program from the command line. An example:
mpiexec ./xhpl

Example

A full example script follows that runs a Java program named “FileExample”. As I’m submitting to the batch queue, I don’t specify the queue:

#PBS -N hpl-example1
#PBS -l nodes=10:ppn=4
#PBS -e logs/error1.log
#PBS -o logs/output1.log
cd $PBS_O_WORKDIR
java FileExample

To run this script, type “qsub” followed by the name of the script file, e.g., qsub example.qsub if the above file is named “example.qsub”.
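
On success, qsub prints the identifier of the new job, and qstat can then be used to check on it. The sketch below captures that identifier and lists your jobs. It assumes the script is named example.qsub as in the text above, and it does nothing if qsub or the file is missing; the variables job_id and submitted are only illustrative.

```shell
#!/bin/sh
# Submit the script and keep the job identifier that qsub prints on success.
if command -v qsub >/dev/null 2>&1 && [ -f example.qsub ]; then
    job_id=$(qsub example.qsub)
    echo "submitted as $job_id"
    # Show the state of your own jobs (Q = queued, R = running).
    qstat -u "$USER"
    submitted=yes
else
    echo "qsub or example.qsub not found; nothing submitted"
    submitted=no
fi
```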