| Sun Grid Engine - quick user guide
This page is part of the EmpiricalAlgorithmics web. |
|
Jobs running on the arrow cluster belong to one of four priority classes. Jobs are scheduled (selected to be run) preemptively by priority class, and then evenly among users within a priority class. (Note that scheduling among users is based on CPU usage, not on the number of jobs submitted. Thus a user who submits many fast jobs will be scheduled more often than a user in the same priority class who submits many slow jobs.) Because of the preemptive scheduling, users submitting to a lower priority class may see high latency: it may take days or weeks before a queued job is scheduled. On the other hand, these lower-priority jobs will be allocated all 100 CPUs when no higher-priority jobs are waiting. All users should feel free to submit as many jobs as they like (preferably as a few big array jobs rather than many single jobs), as doing so will not interfere with the cluster's ability to serve its primary purpose. |
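The selection rule described above can be illustrated with a small sketch. This is a simplified model, not the actual Grid Engine scheduler: the `QueuedJob` type, the numeric encoding of priority classes (lower number = higher class) and the `cpu_used` table are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueuedJob:
    user: str
    priority: int  # assumed encoding: lower number = higher priority class


def pick_next_job(queue, cpu_used):
    """Pick the next job to start under the simplified model.

    queue: list of QueuedJob in submission order
    cpu_used: dict mapping user name -> CPU seconds consumed so far
    """
    if not queue:
        return None
    # 1. Only jobs in the highest waiting priority class are eligible.
    top = min(job.priority for job in queue)
    eligible = [job for job in queue if job.priority == top]
    # 2. Among those, favour the user with the least CPU usage so far.
    #    min() returns the first minimum, so ties go to the earliest submission.
    return min(eligible, key=lambda job: cpu_used.get(job.user, 0.0))


queue = [
    QueuedJob("alice", priority=2),
    QueuedJob("bob", priority=2),
    QueuedJob("carol", priority=3),
]
cpu_used = {"alice": 5000.0, "bob": 120.0, "carol": 0.0}
print(pick_next_job(queue, cpu_used).user)  # bob
```

In this toy example carol has used no CPU time at all, but her job is in a lower class, so it waits until no higher-class job is queued. Between alice and bob, bob goes first because he has consumed less CPU time, regardless of how many jobs each has submitted.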
|
> > | Note: much of the information on priority classes is out of date. The section will be updated soon. |
| Priority classes are:
- Urgent: intended for very occasional use, mostly around paper deadlines. The cluster is limited to 10 such jobs at any time. Please use the urgent consumable (see 'priority classes' below).
- eh (for empirical hardness): jobs which pertain to the particular project mentioned in the CFI grant under which the cluster was funded.
|
|
- kpm: jobs of Kevin Murphy's students.
- kpmUrgent: urgent jobs of Kevin Murphy's students. If you use this priority class, please use the kpmUrgent consumable (i.e., add -l kpmUrgent=1 to your SGE command; see 'priority classes' below); if you don't, you will block the cluster for everyone else.
- klb: jobs of Kevin Leyton-Brown's students.
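As a minimal sketch of how the `-l kpmUrgent=1` resource request mentioned above fits into a submission, the helper below assembles a `qsub` argument list. The function name `build_qsub_command` and the script name `run_experiment.sh` are hypothetical, and the sketch only adds the consumable; it does not show how the priority class itself is selected.

```python
import shlex


def build_qsub_command(script, consumables=None):
    """Return a qsub argument list requesting the given consumables.

    consumables: dict mapping consumable name -> amount, e.g. {"kpmUrgent": 1}
    """
    cmd = ["qsub"]
    for name, amount in (consumables or {}).items():
        # Each consumable becomes one "-l name=amount" resource request.
        cmd += ["-l", f"{name}={amount}"]
    cmd.append(script)
    return cmd


# run_experiment.sh is a placeholder for your own job script.
cmd = build_qsub_command("run_experiment.sh", {"kpmUrgent": 1})
print(shlex.join(cmd))  # qsub -l kpmUrgent=1 run_experiment.sh
```

Building the command as a list (rather than a single string) means it can be passed directly to `subprocess.run` on a submit host without shell-quoting concerns.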
|
|
< < | In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Kevin Leyton-Brown . When he grants you access, he, Frank Hutter or Lin Xu can add you as a user. |
> > | In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Kevin Leyton-Brown. Once he grants you access, he or Steve Ramage can add you as a user. |
|
How do I submit jobs? |
| Thus, if you submit many long jobs, you will block the cluster for other users.
Due to the share-based scheduling we've set up, your overall share of computational time will not be larger if you submit larger jobs (e.g., shell scripts that invoke more runs of your program). However, while longer jobs will not increase your throughput, they will increase latency for other users.
- Not ridiculously short jobs though!: jobs that are extremely short (e.g., 0.1 seconds) might take longer to be dispatched than they take to run, again hurting performance if a gigantic number of such jobs are in the queue. An ideal job length is on the order of a minute (i.e., if your jobs are extremely short, it's best to group them to achieve this sort of length). With one-minute jobs on 100 CPUs, a slot frees up roughly every 0.6 seconds, so higher-priority jobs would be able to access the cluster within about a second on average, while job dispatching would not overwhelm the submit host.
|
|
< < |
- Like it or not, you can't submit ridiculously long jobs: Please keep unrestricted jobs in the range of a few minutes or maximally around an hour. Once a job runs on the cluster it will run until completion (except jobs longer than 25 hours which are automatically killed); this means that if you submit unrestricted long jobs you will completely block the cluster for all other users. So please try to split jobs up into array jobs (see the instructions on array jobs) where each single task takes just a few minutes. If you have to submit longer jobs, please use a consumable, such as max20 (see below).
|
> > |
- As of Feb 2014, the arrow cluster is not as heavily contended as it once was. Usage is normally coordinated by informal agreement based on group priorities. Arrow will also no longer try to kill your long jobs.
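The job-length guidance in this section comes down to a little arithmetic, sketched below. It assumes all 100 CPUs are busy with jobs of equal length whose finish times are evenly staggered; the function names are made up for this illustration.

```python
import math

NUM_CPUS = 100  # cluster size stated in this guide


def runs_per_job(seconds_per_run, target_seconds=60.0):
    """How many short runs to group into one job to reach the target length."""
    return max(1, math.ceil(target_seconds / seconds_per_run))


def mean_seconds_between_free_slots(job_seconds, num_cpus=NUM_CPUS):
    """Average gap between job completions when every CPU runs equal-length jobs."""
    return job_seconds / num_cpus


print(runs_per_job(0.1))                      # 600
print(mean_seconds_between_free_slots(60.0))  # 0.6
print(mean_seconds_between_free_slots(3600))  # 36.0
```

Under these assumptions, runs of 0.1 seconds would be grouped 600 to a job to reach about a minute. One-minute jobs free a CPU every 0.6 seconds on average, whereas one-hour jobs free one only every 36 seconds, which is the latency a newly submitted higher-priority job would see.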
|
|
Array jobs |