Difference: SunGridEngine (8 vs. 9)

Revision 9 - 2006-10-12 - KevinLeytonBrown

Line: 1 to 1
 

SunGridEngine - quick user guide

Changed:
<
<

Introduction:

>
>

Introduction

  This page gives a quick overview of computational facilities available to users in BETA and LCI, and explains how to use them with the SunGridEngine scheduling software.
An extensive overview of all the features of SGE can be found at the Sun website.
Changed:
<
<

Available clusters:

>
>

Available clusters

 
  • beta cluster: 5 machines, two 2GHz CPUs each, 4GB memory, running Linux. Available for members of the beta lab.
    This should probably be used only via SGE (with a share-based scheduling system that actually works, as opposed to the current first-come, first-served scheme).
Line: 17 to 19
  Details about the machines, their configuration, and their names: Ganglia
Changed:
<
<

The Arrow cluster:

>
>

The Arrow cluster

  Jobs running on the arrow cluster belong to one of four priority classes. Jobs are scheduled (selected to be run) preemptively by priority class, and then evenly among users within a priority class. (Note that scheduling among users is based on CPU usage, not on the number of jobs submitted. Thus a user who submits many fast jobs will be scheduled more often than a user in the same priority class who submits many slow jobs.) Because of the preemptive scheduling, users submitting to a lower priority class may see high latency: it may take days or weeks before a queued job is scheduled. On the other hand, lower-priority jobs will be allocated all 100 CPUs when no higher-priority jobs are waiting. All users should feel free to submit as many jobs as they like, as doing so will not interfere with the cluster's ability to serve its primary purpose; however, please prefer a few large array jobs to many single jobs.
Line: 30 to 32
 In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Frank Hutter, Lin Xu or Kevin Leyton-Brown.
Changed:
<
<

How to submit jobs:

>
>

How to submit jobs

 
  • For the arrow cluster, add the line

Line: 87 to 89
  on the command line, where the range 1-100 is chosen arbitrarily here. This will create and queue a new array job with an automatically assigned job number <jobnumber> and 100 entries. Each entry of the array job will eventually run on a machine in the cluster; the <i>th entry will be called <jobnumber>.<i>. SunGridEngine treats every entry of an array job as a single job, and when the <i>th entry is run, it assigns <i> to the variable $SGE_TASK_ID. You may use this variable to do arbitrarily complex things in your shell script; an easy option is to index a file and execute the <i>th line with the <i>th job.
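  As a rough illustration of the "index a file" option, a job script might look like the sketch below. The script name run_task.sh, the file name commands.txt, the COMMANDS_FILE variable and the nth_line helper are all made up for this example; only $SGE_TASK_ID comes from SGE itself. The script does nothing unless $SGE_TASK_ID is a number, so it is harmless to run outside an array job.

```shell
#!/bin/sh
# Hypothetical job script (run_task.sh): each array task runs one line
# of a command file, selected by its task index.

# Print line number $2 of file $1 (prints nothing if the file is shorter).
nth_line() {
    sed -n "${2}p" "$1"
}

# COMMANDS_FILE is an assumed name, not an SGE variable: one shell
# command per line, line i belongs to array task i.
COMMANDS_FILE="${COMMANDS_FILE:-commands.txt}"

# Only act when SGE_TASK_ID is a plain number and the file exists.
case "${SGE_TASK_ID:-}" in
    ''|*[!0-9]*)
        ;;  # not running as an array task: do nothing
    *)
        if [ -f "$COMMANDS_FILE" ]; then
            cmd=$(nth_line "$COMMANDS_FILE" "$SGE_TASK_ID")
            echo "Task $SGE_TASK_ID running: $cmd"
            eval "$cmd"
        fi
        ;;
esac
```

  With a commands.txt of 100 lines, submitting this script as an array job over the range 1-100 would run each line once. Keeping the commands in a separate file means the same script can be reused for different experiments.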
Changed:
<
<

How to monitor, control, and delete jobs:

>
>

How to monitor, control, and delete jobs

 
  • The command qstat is used to check the status of the queue. It lists all running and pending jobs. Each running entry of an array job is listed on its own line, whereas the pending parts of an array job are listed together in one line. qstat -f gives detailed information for each cluster node, and qstat -ext gives more detailed information for each job. Try man qstat for more options.
  • The command qmon can be used to get a graphical interface to monitor and control jobs. It's not great, though.
Line: 97 to 99
  Right here:
Changed:
<
<
>
>
 

Administration

 