Introduction:
This page gives a quick overview of the computational facilities available to users in BETA and LCI, and explains how to use them with the Sun Grid Engine (SGE) scheduling software. An extensive overview of all the features of SGE can be found at the Sun website.
Available clusters:
- beta cluster: 5 machines, two 2GHz CPUs each, 4GB memory, running Linux. Available to members of the beta lab. This cluster should probably be used only via SGE, whose share-based scheduling actually works, as opposed to the current first-come-first-serve scheme.
- icics cluster: 13 machines, two 3GHz CPUs each, 2GB memory, running Linux. Available to all members of the department. Some people run jobs locally on these machines; SGE can still be used on top of that (it dispatches jobs based on load), but there is no guarantee of getting 100% CPU time on the nodes your jobs run on.
- arrow cluster: 50 machines, two 3.2GHz CPUs each, 2GB memory, running Linux. This cluster is tied to Kevin Leyton-Brown's CFI grant for research on empirical hardness models, but it is also available to other users in the department when it is idle.
Details about the machines, their configuration, and their names are available via Ganglia.
The Arrow cluster:
Jobs running on the arrow cluster belong to one of five priority classes. Jobs are scheduled (selected to be run) preemptively by priority class, and then evenly among users within a priority class. (Note that scheduling among users is done on the basis of CPU usage, not on the basis of the number of jobs submitted. Thus a user who submits many fast jobs will be scheduled more often than a user at the same priority class who submits many slow jobs.) Because of the preemptive scheduling, users submitting to a lower priority class may see high latency (it may take days or weeks before a queued job is scheduled). On the other hand, these lower-priority jobs will be allocated all 100 CPUs when no higher-priority jobs are waiting. All users should feel free to submit as many jobs as they like (but please use a few big array jobs rather than many single jobs), as doing so will not interfere with the cluster's ability to serve its primary purpose.
Priority classes are:
- Urgent: intended for very occasional use, mostly around paper deadlines. The cluster is limited to 10 such jobs at any time.
- eh (for empirical hardness): jobs which pertain to the particular project mentioned in the CFI grant under which the cluster was funded.
- ea (for empirical algorithmics): studies on the empirical properties of algorithms (i.e., "the 'E' in BETA"), but not part of the project described above.
- general: jobs which do not fall into one of the above categories. We ask that these jobs be relatively short in duration (although there can be arbitrarily many of them): since new jobs can only be scheduled when a processor becomes idle, excessively long low-priority jobs can lead to starvation of high-priority jobs.
- low: jobs of particularly low priority. This priority class can be used to submit a huge number of jobs that will give way to any other jobs in the queue, if there are any.
In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Frank Hutter, Lin Xu, or Kevin Leyton-Brown.
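How a priority class is selected at submission time depends on the local SGE configuration; one common setup maps such classes to SGE projects, in which case (this mapping is an assumption here, please check with the administrators) a submission to the 'eh' class might look like
qsub -P eh -cwd -o <outfiledir> -e <errorfiledir> myjob.sh
where myjob.sh is your job script and -P names the project corresponding to the priority class.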
How to submit jobs:
- For the arrow cluster, add the line
source /cs/beta/lib/pkg/sge-6.0u7_1/default/common/settings.csh
to your configuration file (e.g., ~/csh_init/.cshrc). For the beta cluster, the appropriate line to add is
source /cs/beta/lib/pkg/sge/beta_grid/common/settings.csh
but we may get rid of this split configuration completely and merge everything into one. Currently, you work solely with the one cluster indicated by this line in your configuration file; there is no easy way to switch back and forth (but this will hopefully change).
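To verify which cluster your shell is currently pointed at, you can, for example, inspect the environment that the settings file sets up (a minimal check, assuming a standard SGE installation):
echo $SGE_ROOT   # prints the SGE installation directory, indicating the active cluster
qhost            # lists the execution hosts of that cluster, with load and memory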
- ssh onto a submit host. For beta we are not sure which machines are the submit hosts; for arrow it is samos (which is now another name for arrow).
- A job is submitted to the cluster in the form of a shell (.sh) script.
- You can submit either single jobs or array jobs. An example of a single job would be the one-line script
echo 'Hello world.'
If this is the content of the file helloworld.sh, you can submit a job by typing
qsub -cwd -o <outfiledir> -e <errorfiledir> helloworld.sh
on the command line. This will create a new job with an automatically assigned job number <jobnumber>; the job is queued and eventually run on a machine in the cluster. When the job runs, it writes its output (stdout) to the file <outfiledir>/helloworld.sh.o<jobnumber> and its error output (stderr) to the file <errorfiledir>/helloworld.sh.e<jobnumber>. In the above case, "Hello world." will be written to the output file and the error file will be empty. If you don't want output files (for array jobs you can easily end up with thousands of them), specify /dev/null as <outfiledir>, and similarly for error files.
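Putting the pieces together, an end-to-end submission might look as follows (the directory names here are hypothetical; adapt them to your own setup):
mkdir -p ~/sge-out ~/sge-err                 # hypothetical directories for stdout/stderr
echo "echo 'Hello world.'" > helloworld.sh   # create the one-line job script
qsub -cwd -o ~/sge-out -e ~/sge-err helloworld.sh
# ... wait until the job has been scheduled and has run ...
cat ~/sge-out/helloworld.sh.o*               # should print: Hello world.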
Tips on Submitting Jobs:
- Short Jobs Please!: To the extent possible, submit short jobs. We have not configured SGE as load-balancing software, because allowing jobs to be paused or migrated can affect the accuracy of process timing; once a job is running, it gets 100% of a CPU until it is done. Thus, if you submit many long jobs, you will block the cluster for other users. Due to the share-based scheduling we have set up, your overall share of computational time will not be larger if you submit larger jobs (e.g., shell scripts that invoke more runs of your program). However, while longer jobs will not increase your throughput, they will increase latency for other users. On the arrow cluster, jobs that run longer than 25 hours are automatically killed, but please try to keep jobs to minutes or hours; this makes it easiest for SGE to share resources fairly.
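If you know roughly how long a job will take, you can also declare a hard runtime limit at submission time, so that a runaway job is killed early rather than after 25 hours. This is a sketch assuming the standard SGE resource h_rt is enabled on the cluster:
qsub -cwd -o /dev/null -e /dev/null -l h_rt=1:00:00 myjob.sh   # job is killed after at most one hour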
- CPLEX and MATLAB: if your job uses CPLEX or MATLAB, add
-l cplex=1
or
-l matlab=1
to your qsub command. This will ensure that we don't run more jobs than there are licenses.
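For example (the script name here is hypothetical):
qsub -cwd -o /dev/null -e /dev/null -l matlab=1 run-matlab-experiment.sh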
- Array jobs are useful for submitting many similar jobs. In general, you should try to submit a single array job for each big run you're going to do, rather than, e.g., invoking qsub in a loop. This makes it easier to pause or delete your jobs, and also imposes less overhead on the scheduler.
An example of an array job is the one-line script
echo 'Hello world, number ' $SGE_TASK_ID
If this is the content of the file many-helloworld.sh, you can submit an array job by typing
qsub -cwd -o <outfiledir> -e <errorfiledir> -t 1-100 many-helloworld.sh
on the command line, where the range 1-100 is chosen arbitrarily here. This will create a new array job with an automatically assigned job number <jobnumber> and 100 entries, all of which are queued. Each entry of the array job will eventually run on a machine in the cluster; the <i>th entry is called <jobnumber>.<i>. Sun Grid Engine treats every entry of an array job as a single job, and when the <i>th entry runs, it assigns <i> to the environment variable $SGE_TASK_ID. You may use this variable to do arbitrarily complex things in your shell script; an easy option is to index a file of commands and execute its <i>th line in the <i>th entry, as in the sketch below.
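For instance, the following script (with a hypothetical file commands.txt containing one shell command per line) runs the <i>th command in the <i>th array entry:
#!/bin/sh
# run-line.sh: execute the line of commands.txt selected by this array entry.
# SGE sets $SGE_TASK_ID to the index <i> of the entry.
cmd=`sed -n "${SGE_TASK_ID}p" commands.txt`   # extract the <i>th line
eval "$cmd"                                   # execute it
This would be submitted with, e.g., qsub -cwd -o /dev/null -e /dev/null -t 1-100 run-line.sh, where the range 1-100 should match the number of lines in commands.txt.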
How to monitor, control, and delete jobs:
- The command qstat is used to check the status of the queue. It lists all running and pending jobs. There is a separate entry for each running entry of an array job, whereas the pending entries of an array job are listed on a single line. qstat -f gives detailed information for each cluster node; qstat -ext gives more detailed information for each job. Try man qstat for more options.
- The command qmon can be used to get a graphical interface to monitor and control jobs. It's not great, though.
- The command qdel can be used to delete (your own) jobs (syntax: qdel <jobnumber>). You can also delete single entries of array jobs (syntax: qdel <jobnumber>.<i>).
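A typical monitoring and clean-up session might look like this (the job numbers are hypothetical):
qstat                 # list all running and pending jobs
qstat -u <username>   # restrict the listing to a single user's jobs
qdel 4711             # delete job 4711 entirely
qdel 4711.17          # delete only entry 17 of array job 4711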
--
FrankHutter and Lin Xu - 02 May 2006