How To Use the Cluster (Basics)

Getting your code ready to run

  1. You will need to have a version of your code that
    1. Runs without the visualization (no GUI or display needed)
    2. Has a clear end-point (i.e. will not run forever)
    3. Outputs any information you want to know about it as it runs to a file
  2. You will need to copy your code to the cluster using either a command-line tool such as scp, or a GUI such as WinSCP.  See the Linux SSH & SCP Page if you need help with this step.
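As an illustration of the three requirements in step 1, here is a minimal Python sketch of a cluster-ready program. It is only an example under assumptions: the function name run_simulation, the file name results.csv, and the update rule inside the loop are all hypothetical placeholders for your own code.

```python
from pathlib import Path


def run_simulation(max_steps, log_path):
    """Run a fixed number of steps, writing progress to a file instead of a display."""
    state = 0.0
    log_path = Path(log_path)
    with log_path.open("w") as log:
        log.write("step,state\n")
        # Bounded loop: the program has a clear end-point and cannot run forever.
        for step in range(1, max_steps + 1):
            state += 1.0 / step  # hypothetical placeholder for the real computation
            log.write(f"{step},{state:.6f}\n")
            log.flush()  # make progress visible in the file while the job runs
    return state


if __name__ == "__main__":
    # Hypothetical defaults; replace with your own step count and output file.
    run_simulation(1000, "results.csv")
```

Because everything of interest goes to the output file, you can inspect that file while the job is running and copy it back to your own computer afterwards.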

Running Your Code

To use ocha, you must run your code by submitting it to the job management system with “qsub”. This system ensures that jobs are evenly distributed across the cluster, and also gives a quick view of analytics for currently running jobs. The only exception to this rule is MATLAB: MATLAB jobs are submitted via the MATLAB interface on the user’s own machine.

Connect to the server over SSH, using either the command line or an SSH client, to run your jobs.

If you have used non-Torque versions of qsub in the past, your qsub scripts may not work correctly on our cluster. See below for how to create a script that will submit your jobs successfully.

Creating a qsub script

For information on creating a qsub script, please see the How to Use the Cluster (Qsub script) page.

Checking Status of Jobs

How can you tell when the code has stopped running?  Use the command “qstat -u <username>” to list all of your currently running or queued jobs.  If nothing is listed, everything has finished.  A status of “r” next to a job means it is running.  A status of “Eqw” means the job did not run because of an error; check the command you used to submit it.
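If you want to check on jobs from a script rather than by eye, the rules above could be sketched in Python as follows. This is an illustration under assumptions: the helper names are hypothetical, only the two status codes mentioned on this page are recognized, and query_jobs only works on a machine where qstat is installed.

```python
import subprocess

# Status codes described on this page; any other code is reported as unknown.
STATE_MEANINGS = {
    "r": "running",
    "Eqw": "error: the job did not run, check the submit command",
}


def describe_state(code):
    """Translate a qstat status code into a short description."""
    return STATE_MEANINGS.get(code, f"unknown state {code!r}")


def all_jobs_finished(qstat_output):
    """Empty qstat output means nothing is running or queued."""
    return qstat_output.strip() == ""


def query_jobs(username):
    """Run `qstat -u <username>` and return its raw text output."""
    result = subprocess.run(
        ["qstat", "-u", username], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, all_jobs_finished(query_jobs("<username>")) would tell you whether any of your jobs are still listed.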

Deleting a Job

If your job is in an error state, or you otherwise need to stop it from running, you can delete it with “qdel <job-id>” and that job will be killed. The job id is reported when you submit the job, or you can find it by running qstat.

Getting files back to your own computer

Once your job is done, you will probably want a copy of your results files on your own computer for analysis.  You can either use a graphical scp program, or you can do it on the command line.
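For the command-line route, the scp call has the form "scp -r <user>@<host>:<remote path> <local directory>", run from your own computer. A minimal Python sketch that assembles and runs that command is shown below; the function names are hypothetical, and the user, host, and paths are placeholders you must supply yourself.

```python
import subprocess


def build_scp_command(user, host, remote_path, local_dir="."):
    """Build an scp command that copies remote_path from the cluster to local_dir."""
    # -r copies directories recursively; it also works for a single file.
    return ["scp", "-r", f"{user}@{host}:{remote_path}", local_dir]


def fetch_results(user, host, remote_path, local_dir="."):
    """Run scp on your own computer (not on the cluster) to fetch the results."""
    subprocess.run(build_scp_command(user, host, remote_path, local_dir), check=True)
```

Building the command as a list, rather than a single string, avoids quoting problems when a path contains spaces.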