Introduction:
This page gives a quick overview of the computational facilities available to users in BETA and LCI, and explains how to use them with the Sun Grid Engine (SGE) scheduling software. An extensive overview of all the features of SGE can be found at the Sun website.
Available clusters:
- beta cluster: 5 machines, two 2GHz CPUs each, 4GB memory, running Linux. Available to members of the beta lab. This cluster should probably be used only via SGE, whose share-based scheduling actually works, as opposed to the current first-come-first-serve scheme.
- icics cluster: 13 machines, two 3GHz CPUs each, 2GB memory, running Linux. Available to all members of the department. Some people run jobs locally on these machines; SGE can still be used on top of that (it dispatches jobs based on load), but there is no guarantee of getting 100% CPU time on the nodes your jobs run on.
- arrow cluster: 50 machines, two 3.2GHz CPUs each, 2GB memory, running Linux. This cluster is tied to Kevin Leyton-Brown's CFI grant for research on empirical hardness models, but it is also available to other users in the department when it is idle.
Details about the machines, their configuration, and their names are available via Ganglia.
The Arrow cluster:
Jobs running on the arrow cluster belong to one of five priority classes. Jobs are scheduled (selected to be run) preemptively by priority class, and then evenly among users within a priority class. (Note that scheduling among users is done on the basis of CPU usage, not on the basis of the number of jobs submitted. Thus a user who submits many fast jobs will be scheduled more often than a user at the same priority class who submits many slow jobs.) Because of the preemptive scheduling, users submitting to a lower priority class may see high latency (it may take days or weeks before a queued job is scheduled). On the other hand, these lower-priority jobs will be allocated all 100 CPUs when no higher-priority jobs are waiting. All users should feel free to submit as many jobs as they like (but please use a few big array jobs rather than many single jobs), as doing so will not interfere with the cluster's ability to serve its primary purpose.
Priority classes are:
- Urgent: intended for very occasional use, mostly around paper deadlines. The cluster is limited to 10 such jobs at any time.
- eh (for empirical hardness): jobs which pertain to the particular project mentioned in the CFI grant under which the cluster was funded.
- ea (for empirical algorithmics): studies on the empirical properties of algorithms (i.e., "the 'E' in BETA"), but not part of the project described above.
- general: jobs which do not fall into one of the above categories. We ask that these jobs be relatively short in duration (although there can be arbitrarily many of them): since new jobs can only be scheduled when a processor becomes idle, excessively long low-priority jobs can lead to starvation of high-priority jobs.
- low: jobs of particularly low priority. This priority class can be used to submit a huge number of jobs that will give way to any other jobs in the queue, if there are any.
In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Frank Hutter, Lin Xu, or Kevin Leyton-Brown.
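How a priority class is selected at submission time depends on the local SGE configuration; one common setup maps such classes to SGE projects, in which case (this mapping is an assumption here, please check with the administrators) a submission to the 'eh' class might look like
qsub -P eh -cwd -o <outfiledir> -e <errorfiledir> myjob.sh
where myjob.sh is your job script and -P names the project corresponding to the priority class.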
How to submit jobs:
- For the arrow cluster, add the line
source /cs/beta/lib/pkg/sge-6.0u7_1/default/common/settings.csh
to your configuration file (e.g., ~/csh_init/.cshrc). For the beta cluster, the appropriate line to add is
source /cs/beta/lib/pkg/sge/beta_grid/common/settings.csh
but we may get rid of this split configuration completely and merge everything into one. Currently, you work solely with the one cluster indicated by this line in your configuration file; there is no easy way to switch back and forth (but this will hopefully change).
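To verify which cluster your shell is currently pointed at, you can, for example, inspect the environment that the settings file sets up (a minimal check, assuming a standard SGE installation):
echo $SGE_ROOT   # prints the SGE installation directory, indicating the active cluster
qhost            # lists the execution hosts of that cluster, with load and memory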
- ssh onto a submit host. For beta we are not sure which machines are the submit hosts; for arrow it is samos (which is now another name for arrow).
- A job is submitted to the cluster in the form of a shell (.sh) script.
- You can submit either single jobs or array jobs. An example of a single job would be the one-line script
echo 'Hello world.'
If this is the content of the file helloworld.sh, you can submit a job by typing
qsub -cwd -o <outfiledir> -e <errorfiledir> helloworld.sh
on the command line. This will create a new job with an automatically assigned job number <jobnumber>; the job is queued and eventually run on a machine in the cluster. When the job runs, it writes its output (stdout) to the file <outfiledir>/helloworld.sh.o<jobnumber> and its error output (stderr) to the file <errorfiledir>/helloworld.sh.e<jobnumber>. In the above case, "Hello world." will be written to the output file and the error file will be empty. If you don't want output files (for array jobs you can easily end up with thousands of them), specify /dev/null as <outfiledir>, and similarly for error files.
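Putting the pieces together, an end-to-end submission might look as follows (the directory names here are hypothetical; adapt them to your own setup):
mkdir -p ~/sge-out ~/sge-err                 # hypothetical directories for stdout/stderr
echo "echo 'Hello world.'" > helloworld.sh   # create the one-line job script
qsub -cwd -o ~/sge-out -e ~/sge-err helloworld.sh
# ... wait until the job has been scheduled and has run ...
cat ~/sge-out/helloworld.sh.o*               # should print: Hello world.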
Tips on Submitting Jobs:
- Short Jobs Please!: To the extent possible, submit short jobs. We have not configured SGE as load-balancing software, because allowing jobs to be paused or migrated can affect the accuracy of process timing; once a job is running, it gets 100% of a CPU until it is done. Thus, if you submit many long jobs, you will block the cluster for other users. Due to the share-based scheduling we have set up, your overall share of computational time will not be larger if you submit larger jobs (e.g., shell scripts that invoke more runs of your program). However, while longer jobs will not increase your throughput, they will increase latency for other users. On the arrow cluster, jobs that run longer than 25 hours are automatically killed, but please try to keep jobs to minutes or hours; this makes it easiest for SGE to share resources fairly.
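If you know roughly how long a job will take, you can also declare a hard runtime limit at submission time, so that a runaway job is killed early rather than after 25 hours. This is a sketch assuming the standard SGE resource h_rt is enabled on the cluster:
qsub -cwd -o /dev/null -e /dev/null -l h_rt=1:00:00 myjob.sh   # job is killed after at most one hour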
- CPLEX and MATLAB: if your job uses CPLEX or MATLAB, add
-l cplex=1
or
-l matlab=1
to your qsub command. This will ensure that we don't run more jobs than there are licenses.
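For example (the script name here is hypothetical):
qsub -cwd -o /dev/null -e /dev/null -l matlab=1 run-matlab-experiment.sh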
- Array jobs are useful for submitting many similar jobs. In general, you should try to submit a single array job for each big run you're going to do, rather than, e.g., invoking qsub in a loop. This makes it easier to pause or delete your jobs, and also imposes less overhead on the scheduler.
An example of an array job is the one-line script
echo 'Hello world, number ' $SGE_TASK_ID
If this is the content of the file many-helloworld.sh, you can submit an array job by typing
qsub -cwd -o <outfiledir> -e <errorfiledir> -t 1-100 many-helloworld.sh
on the command line, where the range 1-100 is chosen arbitrarily here. This will create a new array job with an automatically assigned job number <jobnumber> and 100 entries, all of which are queued. Each entry of the array job will eventually run on a machine in the cluster; the <i>th entry is called <jobnumber>.<i>. Sun Grid Engine treats every entry of an array job as a single job, and when the <i>th entry runs, it assigns <i> to the environment variable $SGE_TASK_ID. You may use this variable to do arbitrarily complex things in your shell script; an easy option is to index a file of commands and execute its <i>th line in the <i>th entry, as in the sketch below.
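For instance, the following script (with a hypothetical file commands.txt containing one shell command per line) runs the <i>th command in the <i>th array entry:
#!/bin/sh
# run-line.sh: execute the line of commands.txt selected by this array entry.
# SGE sets $SGE_TASK_ID to the index <i> of the entry.
cmd=`sed -n "${SGE_TASK_ID}p" commands.txt`   # extract the <i>th line
eval "$cmd"                                   # execute it
This would be submitted with, e.g., qsub -cwd -o /dev/null -e /dev/null -t 1-100 run-line.sh, where the range 1-100 should match the number of lines in commands.txt.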
How to monitor, control, and delete jobs:
- The command qstat is used to check the status of the queue. It lists all running and pending jobs. There is a separate entry for each running entry of an array job, whereas the pending entries of an array job are listed on a single line. qstat -f gives detailed information for each cluster node; qstat -ext gives more detailed information for each job. Try man qstat for more options.
- The command qmon can be used to get a graphical interface to monitor and control jobs. It's not great, though.
- The command qdel can be used to delete (your own) jobs (syntax: qdel <jobnumber>). You can also delete single entries of array jobs (syntax: qdel <jobnumber>.<i>).
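A typical monitoring and clean-up session might look like this (the job numbers are hypothetical):
qstat                 # list all running and pending jobs
qstat -u <username>   # restrict the listing to a single user's jobs
qdel 4711             # delete job 4711 entirely
qdel 4711.17          # delete only entry 17 of array job 4711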
--
FrankHutter and Lin Xu - 02 May 2006