Line: 1 to 1
Sun Grid Engine - quick user guide
This page is part of the EmpiricalAlgorithmics web.
Line: 33 to 33
How do I submit jobs?
source /cs/sungridengine/arrow/common/settings.sh
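If you use bash, a guarded version of this line can go in your ~/.bashrc, so that logging into a machine where the SGE installation is not mounted does not produce errors. A minimal sketch; the function name load_sge_settings is hypothetical, and its default path is the arrow settings file above:

```bash
# Hypothetical ~/.bashrc helper: source the SGE environment only if the
# settings file exists and is readable on this machine. Returns 1 otherwise.
load_sge_settings() {
  local settings=${1:-/cs/sungridengine/arrow/common/settings.sh}
  if [ -r "$settings" ]; then
    . "$settings"
  else
    return 1
  fi
}

load_sge_settings || true   # ignore failure on machines without SGE
```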
Line: 1 to 1
Line: 12 to 12
Available clusters
Added:
> > Note: As of May-07-2014, no one seems to use the beta or ICICS cluster, at least in the BETA lab. The instructions for running on these clusters have been removed from this page; see previous revisions of this document for them.
Note: As of Jan-05-2011, samos & ganglia no longer appear to be available, and the submit host for arrow is arrowhead.cs.ubc.ca.
This cluster is tied to Kevin Leyton-Brown's CFI grant for research on empirical hardness models, but is also available to other users in the department when it is idle.
Details about the machines, their configuration, and their names: Ganglia
Line: 29 to 28
Jobs running on the arrow cluster belong to one of four priority classes. Jobs are scheduled (selected to be run) preemptively by priority class, and then evenly among users within a priority class. (Note that scheduling among users is based on CPU usage, not on the number of jobs submitted. Thus a user who submits many fast jobs will be scheduled more often than a user in the same priority class who submits many slow jobs.) Because of the preemptive scheduling, users submitting to a lower priority class may see high latency (it may take days or weeks before a queued job is scheduled). On the other hand, these lower-priority jobs will be allocated all 100 CPUs when no higher-priority jobs are waiting. All users should feel free to submit as many jobs as they like (preferably as a few large array jobs rather than many single jobs), as doing so will not interfere with the cluster's ability to serve its primary purpose.
Changed:
< < Note: much of the information on priority classes is out of date. This section will be updated soon.
Priority classes are:
> > To request access, please contact Kevin Leyton-Brown.
How do I submit jobs?
Changed:
< < source /cs/sungridengine/arrow/common/settings.csh
> > source /cs/sungridengine/arrow/common/settings.sh
Deleted:
< < to your configuration file (e.g. ~/csh_init/cshrc), or add
source /cs/beta/lib/pkg/sge-6.0u7_1/default/common/settings.csh
to access the old arrow cluster. For the beta cluster, simply type the following command in a shell:
source /cs/sungridengine/beta-icics/common/settings.sh
Added:
> > #!/bin/bash
echo 'Hello world.'
Added:
> > sleep 30
echo 'Good bye.'
If this is the content of the file helloworld.sh, you can submit a job by typing:
qsub -cwd -S /bin/bash -q all.q -P eh -o ./ ./helloworld.sh
Changed:
< < If this is the content of the file helloworld.sh, you can submit a job by typing
> > Your output should be similar to:
Changed:
< < qsub -cwd -o
> > Your job 359475 ("helloworld.sh") has been submitted
Changed:
< < on the command line. This will create a new job with an automatically assigned job number <jobnumber> that is queued and eventually run on a machine in the cluster.
> > This will create a new job with an automatically assigned job number <jobnumber> that is queued and eventually run on a machine in the cluster.
When the job runs, it will write output (stdout) to the file <outfiledir>/helloworld.sh.o<jobnumber>. It will also create a file <errorfiledir>/helloworld.sh.e<jobnumber> and write stderr to that file. In the above case, "Hello world." will be written to the outfile and the errorfile will be empty. If you don't want output files (for array jobs you can easily end up with thousands), specify /dev/null as <
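For array jobs, SGE starts one task per index given with -t and sets the environment variable SGE_TASK_ID to that task's index. A common pattern is to let each task read its parameters from one line of a file. A minimal sketch under that assumption; the function name param_for_task, the script run_task.sh, the file params.txt and the program my_solver are all hypothetical:

```bash
# Print the line of a parameter file that belongs to this array task.
# SGE sets SGE_TASK_ID to the task index for array jobs (e.g. qsub -t 1-100)
# and to "undefined" for ordinary jobs; anything non-numeric is rejected.
param_for_task() {
  local file=$1
  local id=${SGE_TASK_ID:-undefined}
  if ! [[ $id =~ ^[0-9]+$ ]]; then
    echo "param_for_task: not running as an array task (SGE_TASK_ID=$id)" >&2
    return 1
  fi
  sed -n "${id}p" "$file"
}

# Inside a hypothetical job script run_task.sh, submitted with e.g.
#   qsub -cwd -S /bin/bash -t 1-100 -o /dev/null -e /dev/null ./run_task.sh
# each task could then run:
#   ./my_solver $(param_for_task params.txt)
```

When stdout and stderr go to /dev/null, each task has to save its own results, for example to a file whose name includes $SGE_TASK_ID.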
Added:
> > You can verify that your job is running by typing
qstat -u "*"
Note: the quotes around the asterisk are intentional; they stop the shell from expanding the *. You should see something like:
qstat -u "*"
job-ID  prior    name        user      state  submit/start at      queue                    slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
359476  0.56000  helloworld  seramage  r      05/07/2014 17:59:30  all.q@arrow06.cs.ubc.ca  1
If the state is r, the job is running. If many jobs are waiting, yours may stay in state qw (queued, waiting) for a while first.
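To check a job's state from a script, you can pick the state column out of the qstat output. A minimal sketch that assumes the column layout shown above (job-ID in the first column, state in the fifth); the function name job_state is hypothetical:

```bash
# Read qstat output on stdin and print the state (e.g. r or qw) of the given
# job ID. Prints nothing if the job is not listed (e.g. it has already finished).
job_state() {
  awk -v id="$1" '$1 == id { print $5; exit }'
}

# Example:
#   qstat -u "$USER" | job_state 359476
```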
Line: 1 to 1
Line: 39 to 39
Changed:
< < In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Kevin Leyton-Brown.
> > In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Kevin Leyton-Brown.
How do I submit jobs?
Line: 1 to 1
Line: 29 to 29
Jobs running on the arrow cluster belong to one of four priority classes. Jobs are scheduled (selected to be run) preemptively by priority class, and then evenly among users within a priority class. (Note that scheduling among users is based on CPU usage, not on the number of jobs submitted. Thus a user who submits many fast jobs will be scheduled more often than a user in the same priority class who submits many slow jobs.) Because of the preemptive scheduling, users submitting to a lower priority class may see high latency (it may take days or weeks before a queued job is scheduled). On the other hand, these lower-priority jobs will be allocated all 100 CPUs when no higher-priority jobs are waiting. All users should feel free to submit as many jobs as they like (preferably as a few large array jobs rather than many single jobs), as doing so will not interfere with the cluster's ability to serve its primary purpose.
Added:
> > Note: much of the information on priority classes is out of date. This section will be updated soon.
Priority classes are:
Line: 38 to 39
Changed:
< < In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Kevin Leyton-Brown.
> > In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Kevin Leyton-Brown.
How do I submit jobs?
Line: 114 to 115
Thus, if you submit many long jobs, you will block the cluster for other users.
Due to the share-based scheduling we've set up, your overall share of computational time will not be larger if you submit larger jobs (e.g., shell scripts that invoke more runs of your program). However, while longer jobs will not increase your throughput, they will increase latency for other users.
Array jobs
Line: 1 to 1
Line: 12 to 12
Available clusters
Added:
> > Note: As of Jan-05-2011, samos & ganglia no longer appear to be available, and the submit host for arrow is arrowhead.cs.ubc.ca.
Line: 1 to 1
Line: 167 to 167
If your job may require significant amounts of memory, please use the consumable memheavy. You use this just like the Matlab or CPLEX consumables, i.e., -l memheavy=1.
Only one job using this consumable will be scheduled on each machine.
Added:
> > Multi-core jobs (Arrow Cluster)
If your job parallelizes such that it will use both CPUs on a single node and potentially impair the performance of other jobs assigned to the same host, consider using the parallel environment fillup. This will assign 2 (i.e., both) slots on the same machine to your job. To use it, specify -pe fillup 2 in your submit command.
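When a job runs under a parallel environment such as fillup, SGE sets the environment variable NSLOTS to the number of slots it granted. Passing that number on to your program keeps it from using more cores than were reserved. A minimal sketch; the function name worker_threads and the program my_parallel_program are hypothetical, and OMP_NUM_THREADS only matters if your program uses OpenMP:

```bash
# Number of worker threads to use: the slots granted by SGE (2 with
# "qsub -pe fillup 2"), or 1 when NSLOTS is unset (e.g. testing outside SGE).
worker_threads() {
  echo "${NSLOTS:-1}"
}

# In a job script submitted with, e.g.,
#   qsub -cwd -S /bin/bash -pe fillup 2 ./multicore_job.sh
# you might then write:
#   export OMP_NUM_THREADS=$(worker_threads)
#   ./my_parallel_program --threads "$(worker_threads)"   # hypothetical program
```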
Manually limiting the number of your jobs (Arrow Cluster)
By using one of the consumables max10, max20, max30, or max40, you can manually limit the number of jobs you run at once. This is useful if you have long jobs and are worried about blocking the whole cluster with them. As usual for consumables, use e.g. -l max10=1. Note that you will compete with other users requesting the same consumable (only 10 max10 jobs can run at once, regardless of who submits them).
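The maxK consumables can be combined with an array job, so that a single submission stays within a chosen limit. A minimal sketch, assuming each array task counts as one job against the consumable; the function name submit_limited_array is hypothetical:

```bash
# Submit tasks 1..n of SCRIPT as one array job that requests one unit of the
# maxK consumable, so at most K such jobs (yours and other users') run at once.
# Only max10, max20, max30 and max40 exist on arrow.
submit_limited_array() {
  local script=$1 n=$2 k=$3
  case $k in
    10|20|30|40) ;;
    *) echo "submit_limited_array: no max$k consumable (use 10, 20, 30 or 40)" >&2
       return 1 ;;
  esac
  qsub -cwd -S /bin/bash -t "1-$n" -l "max$k=1" "$script"
}

# Example: 200 tasks, at most 10 running at any time.
#   submit_limited_array ./run_task.sh 200 10
```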
Line: 1 to 1
Line: 1 to 1
Line: 49 to 49
to access the old arrow cluster. For the beta cluster, simply type the following command in a shell:
Changed:
< < use sge
> > source /cs/sungridengine/beta-icics/common/settings.sh
Line: 1 to 1
Line: 39 to 39
How do I submit jobs?
source /cs/beta/lib/pkg/sge-6.0u7_1/default/common/settings.csh
Changed:
< < to your configuration file (e.g. ~/csh_init/.cshrc). For the beta cluster, simply type the following command in a shell:
> > to access the old arrow cluster. For the beta cluster, simply type the following command in a shell:
use sge
Line: 1 to 1
Line: 33 to 33
In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Kevin Leyton-Brown.
Line: 1 to 1
Line: 162 to 162
If your job may require significant amounts of memory, please use the consumable memheavy. You use this just like the Matlab or CPLEX consumables, i.e., -l memheavy=1.
Only one job using this consumable will be scheduled on each machine.
Changed:
< < Manually limiting the number of your jobs
> > Manually limiting the number of your jobs (Arrow Cluster)
By using one of the consumables max10, max20, max30, or max40, you can manually limit the number of jobs you run at once. This is useful if you have long jobs and are worried about blocking the whole cluster with them. As usual for consumables, use e.g. -l max10=1. Note that you will compete with other users requesting the same consumable (only 10 max10 jobs can run at once, regardless of who submits them).
Changed:
< < NEW: Here is probably a better way to manually limit the number of jobs you are running. Use one of the following options when running qsub:
> > Manually limiting the number of your jobs (ICICS/BETA Clusters)
Here is another way to manually limit the number of jobs you are running, since the maxK consumables are not currently implemented on the ICICS and BETA clusters. Use one of the following options when running qsub:
Changed:
< < The [-h] option will hold your array of jobs in the queue until there are NO other jobs running on the SGE. The [-hold_jid job_identifier_list] option will hold your array of jobs in the queue until a SPECIFIC job (or jobs), defined by the job_identifier_list, is finished. So, to manually limit the number of jobs you are running (let's say 10 jobs) but NOT limit yourself to the max10 consumable, split your array into blocks of 10 and submit them one at a time with the [-hold_jid jobID] option, where jobID is the ID of the previous block of 10. This will ensure that you never use more than 10 machines at a time.
> > The [-h] option will hold your array of jobs in the queue until there are NO other jobs running on the SGE. This isn't really recommended unless you want to be REALLY nice and make sure no one else is using the cluster when you do.
Changed:
< < Note by Frank: not sure who recommended this, but it certainly doesn't work for everyone; the queue simply doesn't empty very often, so jobs with holds on them tend to never run. My advice is to stay with the maxK consumables above.
> > The best option is to use [-hold_jid job_identifier_list], which will hold your array of jobs in the queue until a SPECIFIC job (or jobs), defined by the job_identifier_list, is finished. So, to manually limit the number of jobs you are running (let's say 10 jobs), split your array into blocks of 10 and submit them one at a time with the [-hold_jid jobID] option, where jobID is the ID of the previous block of 10. This will ensure that you never use more than 10 machines at a time, and it will also let you use any 10 machines.
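The block-by-block submission described above can be scripted. A minimal sketch, assuming qsub's -terse option (which prints only the job ID, for array jobs in the form JOBID.FIRST-LAST:STEP) and -t for array jobs; the function name submit_in_blocks is hypothetical. Each block waits until the whole previous block has finished, so one slow task delays the next block:

```bash
# Submit tasks 1..total of SCRIPT as a chain of array jobs with at most
# block_size tasks each. Every array job after the first is held
# (-hold_jid) until the previous one has finished. Prints each block's job ID.
submit_in_blocks() {
  local script=$1 total=$2 block_size=$3
  local first=1 last out prev_id=""
  while (( first <= total )); do
    last=$(( first + block_size - 1 ))
    if (( last > total )); then last=$total; fi
    local hold=()
    if [[ -n $prev_id ]]; then hold=(-hold_jid "$prev_id"); fi
    out=$(qsub -terse -cwd -S /bin/bash -t "${first}-${last}" "${hold[@]}" "$script") || return 1
    prev_id=${out%%.*}   # strip ".FIRST-LAST:STEP", keeping only the job ID
    echo "$prev_id"
    first=$(( last + 1 ))
  done
}

# Example: 100 tasks, never more than 10 running at once.
#   submit_in_blocks ./run_task.sh 100 10
```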
What is the configuration of the machines, and how busy are they?
See http://samos.cs.ubc.ca/ganglia/
Line: 1 to 1
Line: 165 to 165
Manually limiting the number of your jobs
By using one of the consumables max10, max20, max30, or max40, you can manually limit the number of jobs you run at once. This is useful if you have long jobs and are worried about blocking the whole cluster with them. As usual for consumables, use e.g. -l max10=1. Note that you will compete with other users requesting the same consumable (only 10 max10 jobs can run at once, regardless of who submits them).
Changed:
< <
> > NEW
Here is probably a better way to manually limit the number of jobs you are running. Use one of the following options when running qsub:
Added:
> > Note by Frank: not sure who recommended this, but it certainly doesn't work for everyone; the queue simply doesn't empty very often, so jobs with holds on them tend to never run. My advice is to stay with the maxK consumables above.
What is the configuration of the machines, and how busy are they?
See http://samos.cs.ubc.ca/ganglia/
Line: 1 to 1
Line: 196 to 196
Administration
For details on how to administer the cluster (requires admin access), see SunGridEngineAdmin.
Added:
> > For software installed on arrow (or software you need installed there), see ArrowSoftware.
-- FrankHutter and Lin Xu - 02 May 2006
Line: 1 to 1
Line: 32 to 32
In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Kevin Leyton-Brown.
Line: 1 to 1
Line: 170 to 170
Changed:
< < The [-h] option will hold your array of jobs in the queue until you have NO other jobs running on the SGE. The [-hold_jid job_identifier_list] option will hold your array of jobs in the queue until a SPECIFIC job (or jobs), defined by the job_identifier_list, is finished. So, to manually limit the number of jobs you are running (let's say 10 jobs) but NOT limit yourself to the max10 consumable, split your array into blocks of 10 and submit them with the [-h] option. This will ensure that you never use more than 10 machines at a time.
> > The [-h] option will hold your array of jobs in the queue until there are NO other jobs running on the SGE. The [-hold_jid job_identifier_list] option will hold your array of jobs in the queue until a SPECIFIC job (or jobs), defined by the job_identifier_list, is finished. So, to manually limit the number of jobs you are running (let's say 10 jobs) but NOT limit yourself to the max10 consumable, split your array into blocks of 10 and submit them one at a time with the [-hold_jid jobID] option, where jobID is the ID of the previous block of 10. This will ensure that you never use more than 10 machines at a time.
What is the configuration of the machines, and how busy are they?
See http://samos.cs.ubc.ca/ganglia/
Line: 1 to 1
Line: 165 to 165
Manually limiting the number of your jobs
By using one of the consumables max10, max20, max30, or max40, you can manually limit the number of jobs you run at once. This is useful if you have long jobs and are worried about blocking the whole cluster with them. As usual for consumables, use e.g. -l max10=1. Note that you will compete with other users requesting the same consumable (only 10 max10 jobs can run at once, regardless of who submits them).
Added:
> > Here is probably a better way to manually limit the number of jobs you are running. Use one of the following options when running qsub:
What is the configuration of the machines, and how busy are they?
See http://samos.cs.ubc.ca/ganglia/
Line: 1 to 1
Line: 46 to 46
use sge
Line: 1 to 1
Line: 74 to 74
qsub -cwd -o /dev/null -e /dev/null -q icics.q script.sh
Added:
> > I hate shell scripts! Can I use Perl instead? Or some other scripting language?
What is an appropriate job size?
Line: 1 to 1
Line: 81 to 81
Thus, if you submit many long jobs, you will block the cluster for other users.
Due to the share-based scheduling we've set up, your overall share of computational time will not be larger if you submit larger jobs (e.g., shell scripts that invoke more runs of your program). However, while longer jobs will not increase your throughput, they will increase latency for other users.
Array jobs
Line: 1 to 1
Line: 33 to 33
Changed:
< < In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Frank Hutter, Lin Xu or Kevin Leyton-Brown.
> > In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Kevin Leyton-Brown.
How do I submit jobs?
Line: 1 to 1
Line: 145 to 145
What is the configuration of the machines, and how busy are they?
See http://samos.cs.ubc.ca/ganglia/
Added:
> > Common Problems
Where do Arrow jobs run?
Right here:
Line: 1 to 1
Line: 57 to 57
on the command line. This will create a new job with an automatically assigned job number <jobnumber> that is queued and eventually run on a machine in the cluster.
When the job runs, it will write output (stdout) to the file <outfiledir>/helloworld.sh.o<jobnumber>. It will also create a file <errorfiledir>/helloworld.sh.e<jobnumber> and write stderr to that file. In the above case, "Hello world." will be written to the outfile and the errorfile will be empty. If you don't want output files (for array jobs you can easily end up with thousands), specify /dev/null as <
Changed:
< < If you just want to see if the cluster works for you, use the following command. When the job finishes, an output file will be written to the current directory.
> >
qsub -cwd -o . -e . ~kevinlb/World/helloWorld.sh
What is an appropriate job size?
Line: 1 to 1
Line: 40 to 40
source /cs/beta/lib/pkg/sge-6.0u7_1/default/common/settings.csh
Changed:
< < to your configuration file (e.g. ~/csh_init/.cshrc). For the beta cluster, the appropriate line to add is
> > to your configuration file (e.g. ~/csh_init/.cshrc). For the beta cluster, simply type the following command in a shell:
Changed:
< < source /cs/beta/lib/pkg/sge/beta_grid/common/settings.csh
> > use sge
Deleted:
< < but we may completely get rid of this configuration and have it all in one. Currently, you work solely with the one cluster that is indicated by this line in the configuration file; there is no easy way to go back and forth (but this will hopefully change).
Line: 1 to 1
Line: 142 to 142
Arrow users: CFI Blurbs
Changed:
< < The Arrow cluster was funded under a CFI grant. This grant requires us to complete annual reports explaining how this infrastructure is being used. If you use the cluster for a project, large or small, please enter a bullet item here that gives a short description of your project and the role the cluster played.
> > The Arrow cluster was funded under a CFI grant. This grant requires us to complete annual reports explaining how this infrastructure is being used. This information is summarized on the page ArrowClusterCFISummaries.
Administration
Line: 1 to 1
Line: 110 to 110
CPLEX and MATLAB license management
Changed:
< < If your job uses CPLEX or Matlab, add -l cplex=1 or -l matlab=1 to your qsub command. This will ensure that we don't run more jobs than there are licenses. Instructions on how to use CPLEX are found here. We're not exactly sure of the best way to run MATLAB without invoking X-windows, but the best we've been able to do is
> > If your job uses CPLEX or Matlab, add -l cplex=1 or -l matlab=1 to your qsub command. This will ensure that we don't run more jobs than there are licenses. Instructions on how to use CPLEX are found here. We're not exactly sure of the best way to run MATLAB without invoking X-windows, but the best we've been able to do is
matlab -nojvm -nodisplay -nosplash < inputfile.m
Added:
> > Another good approach is to use the Matlab compiler: compiled Matlab code does not require licenses to run.
Right now the cluster allows 22 simultaneous CPLEX processes and (separately) 20 simultaneous Matlab processes. Using the -l syntax above ensures that no more than these numbers of processes run. If you don't use it, you risk starting more processes than there are licenses, which makes some of your jobs fail and forces you to rerun them. The department as a whole has 100 Matlab licenses, many of which are in use at any time; thus, a Matlab job run without the -l syntax has a high chance of failure. If Matlab usage in the department is high, you may also run out of licenses even if you do use the -l flag, as it doesn't actually reserve the licenses for you. We have 22 CPLEX licenses, which should only be used on this cluster; thus you again need to use -l, but you shouldn't have to worry about users elsewhere in the department. To check the number of Matlab licenses available you can type:
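Instead of typing -l matlab=1 on every qsub call, the request can live in the job script itself, since qsub treats lines beginning with #$ as additional command-line options. A minimal sketch of such a job script; the file name matlab_job.sh and the function run_matlab_batch are hypothetical:

```bash
#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -l matlab=1
# Hypothetical job script matlab_job.sh. The "#$" lines above are read by
# qsub as if given on the command line, so "qsub ./matlab_job.sh" already
# requests one unit of the matlab consumable.

# Run a Matlab script non-interactively, without Java, a display or a splash
# screen, feeding the .m file on stdin as described above.
run_matlab_batch() {
  matlab -nojvm -nodisplay -nosplash < "$1"
}

# In the real job script you would then call, e.g.:
#   run_matlab_batch inputfile.m
```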
Line: 124 to 125
If you want to manually limit the number of processes that can run simultaneously to a different number, you can use -l max10=1, -l max20=1, -l max30=1 or -l max40=1. This will allow you to run 10, 20, 30 or 40 jobs at once. (Note that if several users running jobs at the same time use the same flag, the limit applies to all of their jobs combined.)
Memory-intensive jobs
Changed:
< < If your job may require significant amounts of memory, please use the consumable memheavy. You use this just like the Matlab or CPLEX consumables, i.e., -l memheavy=1.
> > If your job may require significant amounts of memory, please use the consumable memheavy. You use this just like the Matlab or CPLEX consumables, i.e., -l memheavy=1.
Only one job using this consumable will be scheduled on each machine.
Manually limiting the number of your jobs
Line: 1 to 1
Line: 123 to 123
If you want to manually limit the number of processes that can run simultaneously to a different number, you can use -l max10=1, -l max20=1, -l max30=1 or -l max40=1. This will allow you to run 10, 20, 30 or 40 jobs at once. (Note that if several users running jobs at the same time use the same flag, the limit applies to all of their jobs combined.)
Added:
> > Memory-intensive jobs
If your job may require significant amounts of memory, please use the consumable memheavy. You use this just like the Matlab or CPLEX consumables, i.e., -l memheavy=1. Only one job using this consumable will be scheduled on each machine.
Manually limiting the number of your jobs
By using one of the consumables max10, max20, max30, or max40, you can manually limit the number of jobs you run at once. This is useful if you have long jobs and are worried about blocking the whole cluster with them. As usual for consumables, use e.g. -l max10=1. Note that you will compete with other users requesting the same consumable (only 10 max10 jobs can run at once, regardless of who submits them).
What is the configuration of the machines, and how busy are they?
See http://samos.cs.ubc.ca/ganglia/
Line: 1 to 1
Line: 123 to 123
If you want to manually limit the number of processes that can run simultaneously to a different number, you can use -l max10=1, -l max20=1, -l max30=1 or -l max40=1. This will allow you to run 10, 20, 30 or 40 jobs at once. (Note that if several users running jobs at the same time use the same flag, the limit applies to all of their jobs combined.)
Added:
> > What is the configuration of the machines, and how busy are they?
See http://samos.cs.ubc.ca/ganglia/
Where do Arrow jobs run?
Right here:
Line: 1 to 1
Line: 24 to 24
Jobs running on the arrow cluster belong to one of four priority classes. Jobs are scheduled (selected to be run) preemptively by priority class, and then evenly among users within a priority class. (Note that scheduling among users is based on CPU usage, not on the number of jobs submitted. Thus a user who submits many fast jobs will be scheduled more often than a user in the same priority class who submits many slow jobs.) Because of the preemptive scheduling, users submitting to a lower priority class may see high latency (it may take days or weeks before a queued job is scheduled). On the other hand, these lower-priority jobs will be allocated all 100 CPUs when no higher-priority jobs are waiting. All users should feel free to submit as many jobs as they like (preferably as a few large array jobs rather than many single jobs), as doing so will not interfere with the cluster's ability to serve its primary purpose.
Priority classes are:
In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Frank Hutter, Lin Xu or Kevin Leyton-Brown.
Line: 92 to 94
Priority Classes
Changed:
< < If you use the qsub syntax above on arrow, your job will be assigned to the default priority class associated with your user account. This cannot be the priority class Urgent. To use a priority class other than your default for a job, use the following syntax (note the capital P):
> > If you use the qsub syntax above on arrow, your job will be assigned to the default priority class associated with your user account. This cannot be the priority class Urgent or the class kpmUrgent. To use a priority class other than your default for a job, use the following syntax (note the capital P):
qsub -cwd -o <outfiledir> -e <errorfiledir> -P <priorityclass> helloworld.sh
Line: 100 to 102
qsub -cwd -o <outfiledir> -e <errorfiledir> -P Urgent -l urgent=1 helloworld.sh
Added:
> > To submit in the kpmUrgent class, please use:
qsub -cwd -o <outfiledir> -e <errorfiledir> -P kpmUrgent -l kpmUrgent=1 helloworld.sh
As mentioned above, the total number of urgent jobs is limited to 10. This limit applies even if those are the only jobs on the cluster; in that case (which will probably never happen, because many users will have jobs waiting) you can still fill the rest of the cluster with "normal" jobs.
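Because the Urgent and kpmUrgent classes each need a matching -l flag in addition to -P, a small wrapper can keep the two from getting out of sync. A minimal sketch with a hypothetical function name; unlike the commands above, it relies on -cwd to put the output files in the working directory instead of passing explicit -o/-e directories:

```bash
# Submit SCRIPT in the given priority class (-P). Urgent and kpmUrgent also
# get the consumable they require; other classes (e.g. eh) get none.
submit_in_class() {
  local class=$1 script=$2
  local extra=()
  case $class in
    Urgent)    extra=(-l urgent=1) ;;
    kpmUrgent) extra=(-l kpmUrgent=1) ;;
  esac
  qsub -cwd -S /bin/bash -P "$class" "${extra[@]}" "$script"
}

# Example:
#   submit_in_class Urgent ./helloworld.sh
```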
CPLEX and MATLAB license management
Line: 1 to 1
Line: 56 to 56 | ||||||||
on the command line. This will create a new job with an automatically assigned job number <jobnumber> that is queued and eventually run on a machine in the cluster.
When the job runs, it will write output (stdout) to the file <outfiledir>/helloworld.sh.o<jobnumber>. It will also create a file <errorfiledir>/helloworld.sh.e<jobnumber> and write stderr to that file. In the above case, "Hello world." will be written to the outfile and the errorfile will be empty. If you don't want output files (for array jobs you can easily end up with thousands), specify /dev/null as <
Added:
> > If you just want to check that the cluster works for you, use the following command. When the job finishes, an output file will be written to the current directory.
qsub -cwd -o . -e . ~kevinlb/World/helloWorld.sh
What is an appropriate job size?
Line: 104 to 109
matlab -nojvm -nodisplay -nosplash < inputfile.m
Changed:
< < Right now the cluster allows 22 simultaneous CPLEX processes and (separately) 10 simultaneous Matlab processes. Using the -l syntax above ensures that no more than this number of processes run. If you don't use the syntax, you run the risk of invoking more processes than there are licenses, causing some of your jobs to fail and forcing you to rerun them. The department as a whole has 100 Matlab licenses, many of which are in use at any time; thus, a Matlab job run without the -l syntax has a high chance of failing. We have 22 CPLEX licenses, but they should only be used on this cluster; thus you again need to use -l, but you shouldn't have to worry about users elsewhere in the department. To check the number of Matlab licenses available, type:
> > Right now the cluster allows 22 simultaneous CPLEX processes and (separately) 20 simultaneous Matlab processes. Using the -l syntax above ensures that no more than this number of processes run. If you don't use the syntax, you run the risk of invoking more processes than there are licenses, causing some of your jobs to fail and forcing you to rerun them. The department as a whole has 100 Matlab licenses, many of which are in use at any time; thus, a Matlab job run without the -l syntax has a high chance of failing. If there is heavy Matlab usage in the department, you may also run out of licenses even if you do use the -l flag, since it does not actually reserve the licenses for you. We have 22 CPLEX licenses, but they should only be used on this cluster; thus you again need to use -l, but you shouldn't have to worry about users elsewhere in the department. To check the number of Matlab licenses available, type:
/cs/local/generic/lib/pkg/matlab-7.2/etc/lmstat -a
To change the number of available Matlab licenses when using -l matlab=1, an administrator has to change the cluster configuration; details on how to do this are on the administration page.
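If you want a script to check this before submitting, the lmstat output can be filtered. This is a rough sketch, assuming bash and awk, and assuming lmstat prints the usual FlexLM summary line of the form "Users of FEATURE: (Total of N licenses issued; Total of M licenses in use)". The feature name MATLAB and the helper name free_licenses are assumptions; check the real output on our machines first:

```bash
#!/bin/bash
# free_licenses FEATURE: read "lmstat -a" output on stdin and print how many
# licenses of FEATURE are free right now (hypothetical helper). Assumes a
# summary line such as:
#   Users of MATLAB:  (Total of 100 licenses issued;  Total of 37 licenses in use)
# Exits with status 1 if no summary line for FEATURE is found.
free_licenses() {
  local feature="$1"
  awk -v f="${feature}:" '
    $1 == "Users" && $2 == "of" && $3 == f {
      for (i = 1; i <= NF; i++) {
        if ($i == "issued;") issued = $(i - 2)  # "... of N licenses issued;"
        if ($i == "use)")    used   = $(i - 3)  # "... of M licenses in use)"
      }
      print issued - used
      found = 1
      exit
    }
    END { if (!found) exit 1 }
  '
}
```

Usage: `/cs/local/generic/lib/pkg/matlab-7.2/etc/lmstat -a | free_licenses MATLAB`. The count only describes the moment you check; by the time a queued job starts, other users may have taken the licenses.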
Added:
> > If you want to manually limit the number of processes that can run simultaneously to a different number, you can use -l max10=1, -l max20=1, -l max30=1 or -l max40=1. This allows 10, 20, 30 or 40 jobs to run at once. (Note that if several users running simultaneous jobs use the same flag, the limit applies to all of their jobs combined.)
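When a job needs several of these resources at once (say a Matlab license and a cap of 20 simultaneous jobs), the -l requests can simply be listed together on one qsub line. A minimal bash sketch that builds them; the helper name qsub_resource_opts is hypothetical, and the resource names are only the ones listed above:

```bash
#!/bin/bash
# qsub_resource_opts NAME...: print "-l NAME=1" for each license or
# concurrency resource named (hypothetical helper, not part of SGE).
qsub_resource_opts() {
  local opts=() arg
  for arg in "$@"; do
    case "$arg" in
      cplex|matlab|max10|max20|max30|max40)
        opts+=("-l" "${arg}=1") ;;
      *)
        printf 'unknown resource: %s\n' "$arg" >&2
        return 1 ;;
    esac
  done
  # Join the collected words with single spaces.
  printf '%s\n' "${opts[*]}"
}

# Example (not run here): a Matlab job that also counts towards a limit of
# 20 simultaneous jobs:
#   qsub -cwd -o <outfiledir> -e <errorfiledir> $(qsub_resource_opts matlab max20) helloworld.sh
```

Rejecting unknown names is a deliberate choice: a typo in a resource name would otherwise only show up when qsub refuses the job.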
Where do Arrow jobs run?
Right here:
Line: 1 to 1
Sun Grid Engine - quick user guide
Line: 104 to 104
matlab -nojvm -nodisplay -nosplash < inputfile.m
Changed:
< < Right now the cluster allows 22 simultaneous CPLEX processes and (separately) 10 simultaneous Matlab processes. Using the -l syntax above ensures that no more than this number of processes run. If you don't use the syntax, you run the risk of invoking more processes than there are licenses, causing some of your jobs to fail and forcing you to rerun them. The department as a whole has 100 CPLEX licenses, many of which are in use at any time; thus, a Matlab job run without the -l syntax has a high chance of failing. We have 22 CPLEX licenses, but they should only be used on this cluster; thus you again need to use -l, but you shouldn't have to worry about users elsewhere in the department. To check the number of Matlab licenses available, type:
> > Right now the cluster allows 22 simultaneous CPLEX processes and (separately) 10 simultaneous Matlab processes. Using the -l syntax above ensures that no more than this number of processes run. If you don't use the syntax, you run the risk of invoking more processes than there are licenses, causing some of your jobs to fail and forcing you to rerun them. The department as a whole has 100 Matlab licenses, many of which are in use at any time; thus, a Matlab job run without the -l syntax has a high chance of failing. We have 22 CPLEX licenses, but they should only be used on this cluster; thus you again need to use -l, but you shouldn't have to worry about users elsewhere in the department. To check the number of Matlab licenses available, type:
/cs/local/generic/lib/pkg/matlab-7.2/etc/lmstat -a
Line: 1 to 1
Sun Grid Engine - quick user guide
Line: 32 to 32
In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Frank Hutter, Lin Xu or Kevin Leyton-Brown.
Changed:
< < How to submit jobs
> > How do I submit jobs?
Line: 56 to 56
on the command line. This will create a new job with an automatically assigned job number <jobnumber> that is queued and eventually run on a machine in the cluster.
When the job runs, it will write output (stdout) to the file <outfiledir>/helloworld.sh.o<jobnumber>. It will also create a file <errorfiledir>/helloworld.sh.e<jobnumber> and write stderr to that file. In the above case, "Hello world." will be written to the outfile and the errorfile will be empty. If you don't want output files (for array jobs you can easily end up with thousands), specify /dev/null as <
Changed:
< < Tips on Submitting Jobs
> > What is an appropriate job size?
Changed:
< < Due to the share-based scheduling we've set up, your overall share of computational time will not be larger if you submit larger jobs (e.g., shell scripts that invoke more runs of your program). However, while longer jobs will not increase your throughput, they will increase latency for other users. On the arrow cluster, jobs that run longer than 25 hours are automatically killed, but you should aim to keep jobs to minutes or hours. This makes it easiest for SGE to share resources fairly.
> > Due to the share-based scheduling we've set up, your overall share of computational time will not be larger if you submit larger jobs (e.g., shell scripts that invoke more runs of your program). However, while longer jobs will not increase your throughput, they will increase latency for other users.
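If each individual run of your program takes only seconds, one way to get reasonably sized jobs is to let every array task run a chunk of consecutive commands. The following bash sketch assumes a hypothetical file with one command per line and a chunk size CHUNK that you choose so that one task runs for minutes to hours; all function names are made up for illustration:

```bash
#!/bin/bash
# Sketch: group many short runs into fewer, longer array tasks.
# The command file and CHUNK are hypothetical; pick CHUNK so that one task
# runs for minutes to hours.

# chunk_range TASK CHUNK: first and last line handled by array task TASK.
chunk_range() {
  local task="$1" chunk="$2"
  echo "$(( (task - 1) * chunk + 1 )) $(( task * chunk ))"
}

# num_tasks N CHUNK: array tasks needed to cover N lines (rounded up).
num_tasks() {
  local n="$1" chunk="$2"
  echo $(( (n + chunk - 1) / chunk ))
}

# run_chunk FILE TASK CHUNK: run this task's lines of FILE one after another.
run_chunk() {
  local file="$1" task="$2" chunk="$3" first last cmd
  read -r first last <<< "$(chunk_range "$task" "$chunk")"
  sed -n "${first},${last}p" "$file" | while IFS= read -r cmd; do
    # </dev/null stops a command from swallowing the remaining lines.
    bash -c "$cmd" < /dev/null
  done
}
```

For example, 950 commands with CHUNK=10 give `num_tasks 950 10` = 95, so you would submit with -t 1-95 and call `run_chunk commands.txt "$SGE_TASK_ID" 10` inside the job script.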
Changed:
> > Array jobs
Changed:
> > Array jobs are useful for submitting many similar jobs. In general, you should try to submit a single array job for each big run you're going to do, rather than, e.g., invoking qsub in a loop. This makes it easier to pause or delete your jobs, and also imposes less overhead on the scheduler. (In particular, the scheduler creates a directory for every job, so if you submit 10,000 jobs, 10,000 directories will be created before your job can start running. An array job containing 10,000 elements creates only a single directory.)
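As a sketch of the single-array-job approach: write one command per line into a file and compute the -t range from the line count, so that one qsub call replaces a loop of calls. This assumes bash; commands.txt, myjob.sh and the helper array_range are hypothetical names:

```bash
#!/bin/bash
# Sketch: one array job instead of a qsub loop. Rather than
#   while read -r c; do qsub ... ; done < commands.txt    # one job per command
# put one command per line in commands.txt and submit a single array job.

# array_range FILE: print the -t range "1-N" for a file with N lines.
array_range() {
  local file="$1" n
  n=$(wc -l < "$file")
  n=$(( n ))  # some wc implementations pad the count with spaces
  if [ "$n" -lt 1 ]; then
    echo "no commands in $file" >&2
    return 1
  fi
  echo "1-$n"
}

# Example (not run here); myjob.sh would pick its line using $SGE_TASK_ID:
#   qsub -cwd -o /dev/null -e /dev/null -t "$(array_range commands.txt)" myjob.sh commands.txt
```

Note that wc -l counts newline characters, so make sure the last command in the file ends with a newline.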
An example of an array job is the one-line script
echo 'Hello world, number ' $SGE_TASK_ID
Line: 91 to 79
on the command line, where the range 1-100 is chosen arbitrarily here. This will create a new array job with an automatically assigned job number <jobnumber> and 100 entries, which is then queued. Each entry of the array job will eventually run on a machine in the cluster; the <i>th entry will be called <jobnumber>.<i>. Sun Grid Engine treats every entry of an array job as a single job, and when it runs the <i>th entry it sets the variable $SGE_TASK_ID to <i>. You may use this variable to do arbitrarily complex things in your shell script; an easy option is to keep your commands in a file and have the <i>th job execute the <i>th line.
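The "index a file" option can look like the following bash job script. It is a sketch under a few assumptions: the command file (commands.txt here) holds one shell command per line, the script name run_line.sh and the function run_task_line are hypothetical, and the #$ lines use SGE's embedded-option syntax (#$ -S /bin/bash asks SGE to run the script with bash rather than the queue's default shell; #$ -cwd means the same as -cwd on the command line):

```bash
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
# run_line.sh: array task i executes line i of the command file given as $1.
# Submit with, e.g.:  qsub -o . -e . -t 1-100 run_line.sh commands.txt

# run_task_line FILE [ID]: run line ID of FILE (ID defaults to $SGE_TASK_ID).
run_task_line() {
  local file="$1" id="${2:-${SGE_TASK_ID:-}}" cmd
  # SGE sets SGE_TASK_ID to "undefined" for non-array jobs, so require a number.
  case "$id" in
    ''|*[!0-9]*)
      echo "no valid task id (was the job submitted with -t?)" >&2
      return 2 ;;
  esac
  cmd=$(sed -n "${id}p" "$file")
  if [ -z "$cmd" ]; then
    echo "no command on line $id of $file" >&2
    return 1
  fi
  bash -c "$cmd"
}

# Only do work when started as an array task with a command file argument.
if [ -n "${SGE_TASK_ID:-}" ] && [ -n "${1:-}" ]; then
  run_task_line "$1"
fi
```

The script's exit status is that of the command it ran, so failed tasks are easy to spot in the accounting output.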
How to monitor, control, and delete jobs
Added:
> > Priority Classes
If you use the qsub syntax above on arrow, your job will be assigned to the default priority class associated with your user account. This cannot be the priority class Urgent. To use a priority class other than your default for a job, use the following syntax (note the capital P):
qsub -cwd -o <outfiledir> -e <errorfiledir> -P <priorityclass> helloworld.sh
To submit in the urgent class, please use:
qsub -cwd -o <outfiledir> -e <errorfiledir> -P Urgent -l urgent=1 helloworld.sh
As mentioned above, the total number of urgent jobs is limited to 10. This limit applies even if urgent jobs are the only jobs on the cluster; in that case (which will probably never happen, because many users will have jobs waiting) you can still fill the rest of the cluster with "normal" jobs.
CPLEX and MATLAB license management
If your job uses CPLEX or Matlab, add -l cplex=1 or -l matlab=1 to your qsub command. This will ensure that we don't run more jobs than there are licenses. Instructions on how to use CPLEX are found here. We're not exactly sure of the best way to run MATLAB without invoking X Windows, but the best we've been able to do is
matlab -nojvm -nodisplay -nosplash < inputfile.m
Right now the cluster allows 22 simultaneous CPLEX processes and (separately) 10 simultaneous Matlab processes. Using the -l syntax above ensures that no more than this number of processes run. If you don't use the syntax, you run the risk of invoking more processes than there are licenses, causing some of your jobs to fail and forcing you to rerun them. The department as a whole has 100 CPLEX licenses, many of which are in use at any time; thus, a Matlab job run without the -l syntax has a high chance of failing. We have 22 CPLEX licenses, but they should only be used on this cluster; thus you again need to use -l, but you shouldn't have to worry about users elsewhere in the department. To check the number of Matlab licenses available, type:
/cs/local/generic/lib/pkg/matlab-7.2/etc/lmstat -a
To change the number of available Matlab licenses when using -l matlab=1, an administrator has to change the cluster configuration; details on how to do this are on the administration page.
Where do Arrow jobs run?
Right here:
Line: 1 to 1
Sun Grid Engine - quick user guide
Line: 75 to 75
As mentioned above, the total number of urgent jobs is limited to 10. This limit applies even if urgent jobs are the only jobs on the cluster; in that case (which will probably never happen, because many users will have jobs waiting) you can still fill the rest of the cluster with "normal" jobs.
Line: 1 to 1
SunGridEngine - quick user guide
Changed:
< < Introduction:
> > Introduction
This page gives a quick overview of computational facilities available to users in BETA and LCI, and explains how to use them with the SunGridEngine scheduling software. An extensive overview of all the features of SGE can be found at the Sun website.
Changed:
< < Available clusters:
> > Available clusters
Line: 17 to 19
Details about the machines, their configuration, and their names: Ganglia
Changed:
< < The Arrow cluster:
> > The Arrow cluster
Jobs running on the arrow cluster belong to one of four priority classes. Jobs are scheduled (selected to be run) preemptively by priority class, and then evenly among users within a priority class. (Note that scheduling among users is based on CPU usage, not on the number of jobs submitted. Thus a user who submits many fast jobs will be scheduled more often than a user in the same priority class who submits many slow jobs.) Because of the preemptive scheduling, users submitting to a lower priority class may see high latency (it may take days or weeks before a queued job is scheduled). On the other hand, these lower-priority jobs will be allocated all 100 CPUs when no higher-priority jobs are waiting. All users should feel free to submit as many jobs as they like (though a few large array jobs are preferable to many single jobs), as doing so will not interfere with the cluster's ability to serve its primary purpose.
Line: 30 to 32
In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Frank Hutter, Lin Xu or Kevin Leyton-Brown.
Changed:
< < How to submit jobs:
> > How to submit jobs
Line: 87 to 89
on the command line, where the range 1-100 is chosen arbitrarily here. This will create a new array job with an automatically assigned job number <jobnumber> and 100 entries, which is then queued. Each entry of the array job will eventually run on a machine in the cluster; the <i>th entry will be called <jobnumber>.<i>. Sun Grid Engine treats every entry of an array job as a single job, and when it runs the <i>th entry it sets the variable $SGE_TASK_ID to <i>. You may use this variable to do arbitrarily complex things in your shell script; an easy option is to keep your commands in a file and have the <i>th job execute the <i>th line.
Changed:
< < How to monitor, control, and delete jobs:
> > How to monitor, control, and delete jobs
Line: 97 to 99
Right here:
Administration
Line: 1 to 1
SunGridEngine - quick user guide
Introduction:
Line: 99 to 99
Added:
> > Administration
For details on how to administer the cluster (requires admin access), see SunGridEngineAdmin.
-- FrankHutter and Lin Xu - 02 May 2006
Line: 1 to 1
SunGridEngine - quick user guide
Introduction:
Line: 93 to 93
Added:
> > Where do Arrow jobs run?
Right here:
-- FrankHutter and Lin Xu - 02 May 2006
Line: 1 to 1
SunGridEngine - quick user guide
Introduction:
Line: 73 to 73
As mentioned above, the total number of urgent jobs is limited to 10. This limit applies even if urgent jobs are the only jobs on the cluster; in that case (which will probably never happen, because many users will have jobs waiting) you can still fill the rest of the cluster with "normal" jobs.
Line: 1 to 1
SunGridEngine - quick user guide
Introduction:
Line: 26 to 26
In order to submit in any priority class (even 'general'), access for that class must be explicitly granted to your user account. To request access, please contact Frank Hutter, Lin Xu or Kevin Leyton-Brown.
Line: 1 to 1
SunGridEngine - quick user guide
Introduction:
Line: 19 to 19
The Arrow cluster:
Changed:
< < Jobs running on the arrow cluster belong to one of four priority classes. Jobs are scheduled (selected to be run) preemptively by priority class, and then evenly among users within a priority class. (Note that scheduling among users is based on CPU usage, not on the number of jobs submitted. Thus a user who submits many fast jobs will be scheduled more often than a user in the same priority class who submits many slow jobs.) Because of the preemptive scheduling, users submitting to a lower priority class may see high latency (it may take hours or days before a queued job is scheduled). On the other hand, these lower-priority jobs will be allocated all 100 CPUs when no higher-priority jobs are waiting. All users should feel free to submit as many jobs as they like, as doing so will not interfere with the cluster's ability to serve its primary purpose.
> > Jobs running on the arrow cluster belong to one of four priority classes. Jobs are scheduled (selected to be run) preemptively by priority class, and then evenly among users within a priority class. (Note that scheduling among users is based on CPU usage, not on the number of jobs submitted. Thus a user who submits many fast jobs will be scheduled more often than a user in the same priority class who submits many slow jobs.) Because of the preemptive scheduling, users submitting to a lower priority class may see high latency (it may take days or weeks before a queued job is scheduled). On the other hand, these lower-priority jobs will be allocated all 100 CPUs when no higher-priority jobs are waiting. All users should feel free to submit as many jobs as they like (though a few large array jobs are preferable to many single jobs), as doing so will not interfere with the cluster's ability to serve its primary purpose.
Changed:
< < The priority classes are:
> > Priority classes are:
Line: 52 to 52
qsub -cwd -o
Changed:
< < When the job runs, it will write output (stdout) to the file <outfiledir>/helloworld.sh.o<jobnumber>. It will also create a file <errorfiledir>/helloworld.sh.e<jobnumber> and write stderr to that file. In the above case, "Hello world." will be written to the outfile and the errorfile will be empty. If you don't want output files (you can easily end up with thousands), specify /dev/null as
> > When the job runs, it will write output (stdout) to the file <outfiledir>/helloworld.sh.o<jobnumber>. It will also create a file <errorfiledir>/helloworld.sh.e<jobnumber> and write stderr to that file. In the above case, "Hello world." will be written to the outfile and the errorfile will be empty. If you don't want output files (for array jobs you can easily end up with thousands), specify /dev/null as <
Tips on Submitting Jobs
Line: 63 to 63
Due to the share-based scheduling we've set up, your overall share of computational time will not be larger if you submit larger jobs (e.g., shell scripts that invoke more runs of your program). However, while longer jobs will not increase your throughput, they will increase latency for other users. On the arrow cluster, jobs that run longer than 25 hours are automatically killed, but you should aim to keep jobs to minutes or hours. This makes it easiest for SGE to share resources fairly.
qsub -cwd -o <outfiledir> -e <errorfiledir> -P <priorityclass> helloworld.sh
Line: 1 to 1
SunGridEngine - quick user guide
Changed:
< < Introduction:
> > Introduction:
Changed:
< < This page gives a quick overview of the available computational facilities and how to use them with the SunGridEngine scheduling software.
> > This page gives a quick overview of computational facilities available to users in BETA and LCI, and explains how to use them with the SunGridEngine scheduling software.
An extensive overview of all the features of SGE can be found at the Sun website.
Changed:
< < General policies:
> > Available clusters:
This should probably be used only via SGE (with a share-based scheduling system that will actually work, as opposed to the current first-come, first-served scheme).
Some people run jobs locally on these machines. We could still use SGE on top of that (it dispatches jobs based on load), but there is no guarantee of getting 100% CPU time on the nodes you're running on.
Details about the machines, their configuration, and their names: Ganglia
Changed:
< < How to submit jobs:
> > The Arrow cluster:
Jobs running on the arrow cluster belong to one of four priority classes. Jobs are scheduled (selected to be run) preemptively by priority class, and then evenly among users within a priority class. (Note that scheduling among users is based on CPU usage, not on the number of jobs submitted. Thus a user who submits many fast jobs will be scheduled more often than a user in the same priority class who submits many slow jobs.) Because of the preemptive scheduling, users submitting to a lower priority class may see high latency (it may take hours or days before a queued job is scheduled). On the other hand, these lower-priority jobs will be allocated all 100 CPUs when no higher-priority jobs are waiting. All users should feel free to submit as many jobs as they like, as doing so will not interfere with the cluster's ability to serve its primary purpose. The priority classes are:
How to submit jobs:
Line: 38 to 41
source /cs/beta/lib/pkg/sge/beta_grid/common/settings.csh
but we may get rid of this configuration entirely and merge everything into one. Currently, you work only with the cluster indicated by this line in the configuration file; there is no easy way to switch back and forth (though this will hopefully change).
Line: 49 to 52
qsub -cwd -o
Changed:
< < When the job runs, it will write output (stdout) to the file <outfiledir>/helloworld.sh.o<jobnumber>. It will also create a file <errorfiledir>/helloworld.sh.e<jobnumber> and write stderr to that file. In the above case, "Hello world." will be written to the outfile and the errorfile will be empty.
> > When the job runs, it will write output (stdout) to the file <outfiledir>/helloworld.sh.o<jobnumber>. It will also create a file <errorfiledir>/helloworld.sh.e<jobnumber> and write stderr to that file. In the above case, "Hello world." will be written to the outfile and the errorfile will be empty. If you don't want output files (you can easily end up with thousands), specify /dev/null as
Tips on Submitting Jobs
An example of an array job is the one-line script
echo 'Hello world, number ' $SGE_TASK_ID
Line: 61 to 86
on the command line, where the range 1-100 is chosen arbitrarily here. This will create a new array job with an automatically assigned job number <jobnumber> and 100 entries, which is then queued. Each entry of the array job will eventually run on a machine in the cluster; the <i>th entry will be called <jobnumber>.<i>. Sun Grid Engine treats every entry of an array job as a single job, and when it runs the <i>th entry it sets the variable $SGE_TASK_ID to <i>. You may use this variable to do arbitrarily complex things in your shell script; an easy option is to keep your commands in a file and have the <i>th job execute the <i>th line.
Changed:
< < How to monitor, control, and delete jobs:
> > How to monitor, control, and delete jobs:
Deleted:
< < Urgent jobs on arrow
If you have an urgent deadline and would like to use some CPUs with comparatively low latency, please contact Kevin Leyton-Brown, Frank Hutter, or Lin Xu to be temporarily added as an urgent user.
Once you are added as a temporary urgent user, you can submit jobs using
qsub -P Urgent -l urgent=1 instead of plain qsub. As mentioned above, the total number of urgent jobs is limited to 10. This limit applies even if urgent jobs are the only jobs on the cluster; in that case (which will probably never happen, because many users will have jobs waiting) you can still fill the rest of the cluster with "normal" jobs.
-- FrankHutter and Lin Xu - 02 May 2006
Line: 1 to 1
SunGridEngine - quick user guide
Introduction:
Changed:
< < This page gives a quick overview of the available computational facilities and how to use them with the SunGridEngine scheduling software.
> > This page gives a quick overview of the available computational facilities and how to use them with the SunGridEngine scheduling software. An extensive overview of all the features of SGE can be found at the Sun website.
General policies:
Line: 22 to 23
Changed:
< < This cluster is not appropriate for jobs that require low latency. For special requests, such as very near paper deadlines, we set up a temporary urgent passing lane jobs in which will be run first. To prevent urgent jobs from blocking the whole cluster, their total number is limited to 10.
> > This cluster is not appropriate for jobs that require low latency. For special requests, such as imminent paper deadlines, we set up a temporary urgent passing lane whose jobs are run first. To prevent urgent jobs from blocking the whole cluster, their total number is limited to 10. For details on this urgent lane, see UrgentJobsOnArrow. For comments on this usage policy, please contact Kevin Leyton-Brown.
Details about the machines, their configuration, and their names: Ganglia
Line: 38 to 39
but we may get rid of this configuration entirely and merge everything into one. Currently, you work only with the cluster indicated by this line in the configuration file; there is no easy way to switch back and forth (though this will hopefully change).
echo 'Hello world.'
Changed:
< < If this is the content of the file helloworld.sh, you submit it by typing
> > If this is the content of the file helloworld.sh, you can submit a job by typing
qsub -cwd -o <outfiledir> -e <errorfiledir> helloworld.sh
Changed:
< < on the command line, where
> > on the command line. This will create a new job with an automatically assigned job number <jobnumber> that is queued and eventually run on a machine in the cluster.
When the job runs, it will write output (stdout) to the file <outfiledir>/helloworld.sh.o<jobnumber>. It will also create a file <errorfiledir>/helloworld.sh.e<jobnumber> and write stderr to that file. In the above case, "Hello world." will be written to the outfile and the errorfile will be empty.
qsub -P Urgent -l urgent=1 instead of plain qsub. As mentioned above, the total number of urgent jobs is limited to 10. This limit applies even if urgent jobs are the only jobs on the cluster; in that case (which will probably never happen, because many users will have jobs waiting) you can still fill the rest of the cluster with "normal" jobs.
-- FrankHutter and Lin Xu - 02 May 2006
Line: 1 to 1
Added:
> > SunGridEngine - quick user guide
Introduction:
This page gives a quick overview of the available computational facilities and how to use them with the SunGridEngine scheduling software.
General policies: