> > | Go to User Configuration, click on Userset, select the appropriate Userset, click on modify and enter the username. Currently, the only user set maintained is eh; we might make others in the future.
|
META TOPICPARENT |
name="SunGridEngine" |
Sun Grid Engine Administration | |
Go to Project Configuration and click on Add. Enter the name of the project and choose user sets or users who are eligible to submit jobs to this project by clicking on the buttons below. E.g., say you want to add a user set: click on the left button, and in the new window that pops up, choose the applicable user sets. | |
> > | Parallel environments
A parallel environment defines a schema for how multiple-CPU jobs are to be run. Run a job in a parallel environment by adding "-pe <pe_name> <slots>" to the qsub command. For example, add "-pe fillup 2" to run a job which reserves 2 slots on the same host for job execution. The "fillup" environment was created so that multi-threaded, CPU-intensive jobs do not "clobber" other jobs placed on the same host. More complex parallel environments are likely required for MPI and similar jobs, but no such environments have been configured yet.
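As a rough sketch of the qsub usage described above, the helper below only prints the command it would run, so it can be tried away from the cluster. The function name pe_qsub_cmd and the script name my_job.sh are made up for illustration; only "-pe fillup 2" comes from this page.

```shell
#!/bin/sh
# Build (but do not run) a qsub command line for a parallel-environment job.
pe_qsub_cmd() {
    pe_name=$1   # parallel environment name, e.g. fillup
    slots=$2     # number of slots to reserve
    script=$3    # job script to submit (hypothetical name in the example)
    printf 'qsub -pe %s %s %s\n' "$pe_name" "$slots" "$script"
}

# Print the command for a 2-slot job in the fillup environment.
pe_qsub_cmd fillup 2 my_job.sh
```

Once the printed line looks right, it can be pasted into a shell on a submit host.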
Parallel environments can be created using "qconf -ap <pe_name>", modified with "qconf -mp <pe_name>", and listed with "qconf -spl". A new parallel environment must have its name added to the "pe_list" variable of some queue ("qconf -mq <queue_name>") before being usable. More information is found at http://wikis.sun.com/display/gridengine62u2/Managing+Parallel+Environments | | Consumables
Changing existing consumables | |
Let's keep a log of administration changes made to the cluster, to help us undo bad changes if they occur.
- kevinlb, 4/11/07: added the flag "batch" to the arrowtest.q queue, in order to allow batch jobs to be submitted. Now as far as I can tell the test queue works.
| |
> > |
- cnell, 3/31/10: added "fillup" parallel environment to all.q in order to allow jobs which reserve a whole machine.
| |
-- FrankHutter - 11 Oct 2006 |
|
META TOPICPARENT |
name="SunGridEngine" |
Sun Grid Engine Administration | |
> > | This page is part of the EmpiricalAlgorithmics web. | |
For general information about using Sun Grid Engine, see SunGridEngine. |
|
META TOPICPARENT |
name="SunGridEngine" |
Sun Grid Engine Administration | |
For general information about using Sun Grid Engine, see SunGridEngine. | |
< < | Checking the queque | > > | Checking the queue | | qstat gives basic information about the jobs in the queue
qstat -ext gives a bit of extra information, such as a job's project
qstat -j <jobnumber> gives you very detailed information about that job |
|
META TOPICPARENT |
name="SunGridEngine" |
Sun Grid Engine Administration | |
If you want to create a new consumable, you do want to go to "complex configuration". Give it a name and nickname, make it "int" and "<=", consumable, requestable and unforced, default=0, urgent=0. Then you'll need to set the number of available units through host configuration as above. However, the new consumable won't yet appear as a consumable in the "consumables/fixed attributes" pane to the right when you click on "global" in "execution host". How do you get it there? This is possibly the awesomest interface feat in SGE yet. Click on "modify" (with "global" selected). Click the "Consumable/Fixed Attribute" tab. There's the list of consumables--how do you get a new one to appear? Just click the "name" header (that's right!). You can figure it out from there. | |
> > | Creating for each machine
Frank created the memheavy consumable that limits memory-intensive jobs to one per machine. Consumables like this can be implemented by giving each single machine a single consumable. I.e., in "Host configuration", choose "Execution host", and then choose a single machine, such as arrow01.cs.ubc.ca, instead of global. As above, click on modify and put in your consumable and value 1. Unfortunately, this has to be done for each single machine in turn. When new machines come in, don't forget to give them such a consumable, too. | | Maximum array job instances and tasks
These variables (max_aj_instances and max_aj_tasks) are in cluster configuration. Max_aj_instances used to be 20,000; KLB changed it to 100,000 on 10/13/06 because we seemed to have hit the maximum. (This made the cluster essentially unresponsive for about an hour afterwards; I'm not sure it was a good idea...) Max_aj_tasks is 1,000,000. |
|
META TOPICPARENT |
name="SunGridEngine" |
Sun Grid Engine Administration | |
These variables (max_aj_instances and max_aj_tasks) are in cluster configuration. Max_aj_instances used to be 20,000; KLB changed it to 100,000 on 10/13/06 because we seemed to have hit the maximum. (This made the cluster essentially unresponsive for about an hour afterwards; I'm not sure it was a good idea...) Max_aj_tasks is 1,000,000. | |
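To check these two limits from the command line rather than in qmon, something like the following may help. It is only a sketch: it assumes that "qconf -sconf" prints the cluster configuration as one "name value" pair per line. The helper name show_aj_limits is invented, and the max_u_jobs line in the demonstration is a made-up extra entry included to show the filtering.

```shell
#!/bin/sh
# Keep only the array-job limits from cluster configuration text on stdin.
show_aj_limits() {
    grep -E '^(max_aj_instances|max_aj_tasks)[[:space:]]'
}

# On the cluster this would be used as:  qconf -sconf | show_aj_limits
# Offline demonstration using the values quoted above:
printf '%s\n' 'max_aj_instances 100000' 'max_aj_tasks 1000000' 'max_u_jobs 0' \
    | show_aj_limits
```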
> > | Test queue
There is a queue for testing purposes set up to run only on arrow01, which is not part of the regular queue. To use this queue, add -q arrowtest.q and -P eh2 to the qsub command. Your jobs should dispatch immediately, as the queue is usually empty. There may be other jobs running on arrow01; however, it's OK to overload this (and only this) machine.
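A small sketch of a test-queue submission, again printing the command rather than running it. The helper name test_queue_cmd and the script name smoke_test.sh are placeholders; the -q and -P values are the ones given above.

```shell
#!/bin/sh
# Build (but do not run) a qsub command that targets the test queue.
test_queue_cmd() {
    script=$1   # job script to submit (hypothetical name in the example)
    printf 'qsub -q arrowtest.q -P eh2 %s\n' "$script"
}

test_queue_cmd smoke_test.sh
```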
Change log
Let's keep a log of administration changes made to the cluster, to help us undo bad changes if they occur.
- kevinlb, 4/11/07: added the flag "batch" to the arrowtest.q queue, in order to allow batch jobs to be submitted. Now as far as I can tell the test queue works.
| | -- FrankHutter - 11 Oct 2006 |
|
META TOPICPARENT |
name="SunGridEngine" |
Sun Grid Engine Administration | |
- EH-Models
- EmpiricalAlgorithmics
- KLB
| |
> > | | |
The other usersets are different: a user who belongs to one of the above sets can also belong to them. It is important to add every user to the userset called arrow--if you do not, the user will be able to submit jobs but they may not ever run. Users need to belong to urgentusers to submit urgent jobs. | |
> > | Adding a new user set
Go to User Configuration, click on Userset, make sure Department is chosen in the lower left, and click on Add. Then add people as described above (if you want to transfer people from other user sets you have to delete them from those first and then add them to the new one). You can then associate the new user set with projects users are eligible to submit to.
Adding a new project
Go to Project Configuration and click on Add. Enter the name of the project and choose user sets or users who are eligible to submit jobs to this project by clicking on the buttons below. E.g., say you want to add a user set: click on the left button, and in the new window that pops up, choose the applicable user sets. | |
Consumables |
|
META TOPICPARENT |
name="SunGridEngine" |
Sun Grid Engine Administration | |
Consumables | |
> > | Changing existing consumables | | What if you want to change the number of available matlab licenses, urgent queues or CPLEX instances? You would be tempted to go into "complex configuration" and change the value "default" on the consumable's definition. However, this doesn't work. (I think all this does is determine how many units of the consumable get used by requests to use the consumable that don't specify a number of units.) Instead, go to "Host configuration", then choose the "execution host" tab and select the host "global". Then under "consumables/fixed attributes" you'll see the consumables: matlab, cplex, urgent. Change the totals here!
How to find out how many available matlab licenses there are? In UNIX, type: | |
There are 22 CPLEX licenses bought as part of the CFI grant that purchased the cluster. Unless the department buys more someday, that's it... | |
< < | If you want to create a new consumable, you do want to go to "complex configuration". Give it a name and nickname, make it "int" and "<=", consumable, requestable and unforced, default=0. Then set the number of available units through host configuration as above. | > > | Creating new consumables
If you want to create a new consumable, you do want to go to "complex configuration". Give it a name and nickname, make it "int" and "<=", consumable, requestable and unforced, default=0, urgent=0. Then you'll need to set the number of available units through host configuration as above. However, the new consumable won't yet appear as a consumable in the "consumables/fixed attributes" pane to the right when you click on "global" in "execution host". How do you get it there? This is possibly the awesomest interface feat in SGE yet. Click on "modify" (with "global" selected). Click the "Consumable/Fixed Attribute" tab. There's the list of consumables--how do you get a new one to appear? Just click the "name" header (that's right!). You can figure it out from there. | |
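For reference, the same settings can be written as one line of the complex table that "qconf -mc" opens in an editor. This is a sketch under the assumption that the columns are name, shortcut, type, relop, requestable, consumable, default and urgency, in that order; the helper name complex_line and the shortcut "mh" are invented for the example.

```shell
#!/bin/sh
# Print a complex-table line for an integer consumable with the settings
# described above: type INT, relop <=, requestable YES (not FORCED),
# consumable YES, default 0, urgency 0.
complex_line() {
    name=$1       # consumable name
    shortcut=$2   # the "nickname" from the qmon dialog
    printf '%s %s INT <= YES YES 0 0\n' "$name" "$shortcut"
}

complex_line memheavy mh
```

Adding the line there only defines the consumable; the number of available units still has to be set on a host, as described above.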
Maximum array job instances and tasks |
|
META TOPICPARENT |
name="SunGridEngine" |
Sun Grid Engine Administration | |
Consumables | |
< < | What if you want to change the number of available matlab licenses, urgent queues or CPLEX instances? You would be tempted to go into "complex configuration" and change the value "default" on the consumable's definition. However, this would break everything, so don't do it! Instead, go to "Host configuration", then choose the "execution host" tab and select the host "global". Then under "consumables/fixed attributes" you'll see the consumables: matlab, cplex, urgent. Change the totals here! | > > | What if you want to change the number of available matlab licenses, urgent queues or CPLEX instances? You would be tempted to go into "complex configuration" and change the value "default" on the consumable's definition. However, this doesn't work. (I think all this does is determine how many units of the consumable get used by requests to use the consumable that don't specify a number of units.) Instead, go to "Host configuration", then choose the "execution host" tab and select the host "global". Then under "consumables/fixed attributes" you'll see the consumables: matlab, cplex, urgent. Change the totals here! | |
How to find out how many available matlab licenses there are? In UNIX, type:
| |
The 'matlab-7.2' part may change as new versions of matlab become available... | |
> > | There are 22 CPLEX licenses bought as part of the CFI grant that purchased the cluster. Unless the department buys more someday, that's it...
If you want to create a new consumable, you do want to go to "complex configuration". Give it a name and nickname, make it "int" and "<=", consumable, requestable and unforced, default=0. Then set the number of available units through host configuration as above. | | Maximum array job instances and tasks | |
< < | These variables (max_aj_instances and max_aj_tasks) are in cluster configuration. Max_aj_instances used to be 20,000; KLB changed it to 100,000 on 10/13/06 because we seemed to have hit the maximum. Max_aj_tasks is 1,000,000. | > > | These variables (max_aj_instances and max_aj_tasks) are in cluster configuration. Max_aj_instances used to be 20,000; KLB changed it to 100,000 on 10/13/06 because we seemed to have hit the maximum. (This made the cluster essentially unresponsive for about an hour afterwards; I'm not sure it was a good idea...) Max_aj_tasks is 1,000,000. | |
-- FrankHutter - 11 Oct 2006 |
|
META TOPICPARENT |
name="SunGridEngine" |
Sun Grid Engine Administration | | This is the policy that is used to determine how competing jobs are scheduled when they fall within the same project.
To change the share tree policy, go to Policy Configuration and click on Share Tree Policy. Right now, for each project (eh, ea, Urgent, etc), there is a node in the graph with a leaf called default. (If you don't see the leaf, double-click on the node to open it up.) Under this default leaf, SGE automatically adds all users in that project--they're listed inside. What this does is to ensure that all users in the project get the same priority, so that SGE will ensure that each user gets the same amount of CPU time (not the same number of jobs) within the same time window. Of course, the share tree policy doesn't have to share resources evenly. You can add another leaf named after a specific user to give them extra shares (they're proportional to the entry for Shares). | |
< < | Adding a new user: | > > | Adding a new user | |
Go to User Configuration, click on Userset, select the appropriate Userset, click on modify and enter the username. The following usersets are mutually exclusive (a user only needs to be added to one, and should be added to the one which is highest on the list to which he belongs): | | The other usersets are different: a user who belongs to one of the above sets can also belong to them. It is important to add every user to the userset called arrow--if you do not, the user will be able to submit jobs but they may not ever run. Users need to belong to urgentusers to submit urgent jobs. | |
> > | Consumables
What if you want to change the number of available matlab licenses, urgent queues or CPLEX instances? You would be tempted to go into "complex configuration" and change the value "default" on the consumable's definition. However, this would break everything, so don't do it! Instead, go to "Host configuration", then choose the "execution host" tab and select the host "global". Then under "consumables/fixed attributes" you'll see the consumables: matlab, cplex, urgent. Change the totals here!
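The current totals can also be read outside qmon. The sketch below assumes that "qconf -se global" prints a single "complex_values" line with comma-separated name=value pairs, and it does not handle lines wrapped with a trailing backslash. The helper name consumable_total is invented, and the matlab and urgent numbers in the demonstration are placeholders; only the 22 CPLEX licenses come from this page.

```shell
#!/bin/sh
# Read execution-host configuration text on stdin and print the total
# configured for one consumable.
consumable_total() {
    name=$1
    sed -n 's/^complex_values[[:space:]]*//p' \
        | tr ',' '\n' \
        | sed -n "s/^${name}=//p"
}

# On the cluster this would be used as:  qconf -se global | consumable_total cplex
# Offline demonstration with placeholder totals for matlab and urgent:
printf '%s\n' 'hostname global' 'complex_values matlab=5,cplex=22,urgent=3' \
    | consumable_total cplex
```

To change a total without qmon, "qconf -me global" should open the same host configuration in an editor.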
How to find out how many available matlab licenses there are? In UNIX, type:
/cs/local/generic/lib/pkg/matlab-7.2/etc/lmstat -a
The 'matlab-7.2' part may change as new versions of matlab become available...
Maximum array job instances and tasks
These variables (max_aj_instances and max_aj_tasks) are in cluster configuration. Max_aj_instances used to be 20,000; KLB changed it to 100,000 on 10/13/06 because we seemed to have hit the maximum. Max_aj_tasks is 1,000,000. | | -- FrankHutter - 11 Oct 2006 |
|
META TOPICPARENT |
name="SunGridEngine" |
| |
< < | | > > | Sun Grid Engine Administration
For general information about using Sun Grid Engine, see SunGridEngine. | |
Checking the queue
qstat gives basic information about the jobs in the queue
qstat -ext gives a bit of extra information, such as a job's project
qstat -j <jobnumber> gives you very detailed information about that job | |
> > | Is there a way to change the project of a running job? That could be useful someday...
Override Policy
This is the primary mechanism that is used to determine which jobs are dispatched. Go to Policy Configuration and then Override Policy. Choose "project" from the dropdown, and you'll see all the SGE project names with the number of override tickets they get. They should always be multiples of 10,000: this ensures that override tickets trump share tree tickets (of which there are 9,000, as set on the main policy configuration page). Higher priority projects preempt lower priority ones--as long as their tickets are multiples of 10,000, no jobs will be run from a lower-priority project while pending jobs from a higher-priority project exist. You can modify the number of tickets a project is given here, but you can't create a new one. To do that, go to "Project Configuration" from the main qmon dialog. | | Share Tree Policy | |
< < | To change the shrae tree policy, go to Policy Configuration and click on Share Tree Policy.
Right now, for each project (eh, ea, Urgent, etc), there is a node in the graph with a leaf called default.
When you add this default leaf, SGE automatically adds all users in that project.
You can add another leaf with a user to give them extra shares (they're proportional to the entry for Shares) - this will only work within the same project. Higher priority projects always trump lower priority projects. | > > |
This is the policy that is used to determine how competing jobs are scheduled when they fall within the same project.
To change the share tree policy, go to Policy Configuration and click on Share Tree Policy. Right now, for each project (eh, ea, Urgent, etc), there is a node in the graph with a leaf called default. (If you don't see the leaf, double-click on the node to open it up.) Under this default leaf, SGE automatically adds all users in that project--they're listed inside. What this does is to ensure that all users in the project get the same priority, so that SGE will ensure that each user gets the same amount of CPU time (not the same number of jobs) within the same time window. Of course, the share tree policy doesn't have to share resources evenly. You can add another leaf named after a specific user to give them extra shares (they're proportional to the entry for Shares). | |
Adding a new user: | |
< < | Go to User Configuration, click on Userset, choose the appropriate "Department", click on modify and enter the username. Also add the user to the arrow "Access list". | > > |
Go to User Configuration, click on Userset, select the appropriate Userset, click on modify and enter the username. The following usersets are mutually exclusive (a user only needs to be added to one, and should be added to the one which is highest on the list to which he belongs):
- EH-Models
- EmpiricalAlgorithmics
- KLB
- General
The other usersets are different: a user who belongs to one of the above sets can also belong to them. It is important to add every user to the userset called arrow--if you do not, the user will be able to submit jobs but they may not ever run. Users need to belong to urgentusers to submit urgent jobs. | |
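The two userset additions can probably also be done with "qconf -au <user> <userset>". The sketch below prints the commands for review instead of running them; the helper name add_user_cmds and the username jdoe are placeholders, and the four department usersets are the ones listed above.

```shell
#!/bin/sh
# Print the qconf commands that add a user to one department userset
# and to the arrow userset. Nothing is executed.
add_user_cmds() {
    user=$1
    dept=$2
    case $dept in
        EH-Models|EmpiricalAlgorithmics|KLB|General) ;;
        *) echo "unknown department userset: $dept" >&2; return 1 ;;
    esac
    printf 'qconf -au %s %s\n' "$user" "$dept"
    printf 'qconf -au %s arrow\n' "$user"
}

add_user_cmds jdoe General
```

A user who should submit urgent jobs would need one more line of the same form for urgentusers.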
-- FrankHutter - 11 Oct 2006 |
|
> > |
META TOPICPARENT |
name="SunGridEngine" |
Checking the queue
qstat gives basic information about the jobs in the queue
qstat -ext gives a bit of extra information, such as a job's project
qstat -j <jobnumber> gives you very detailed information about that job
Share Tree Policy
To change the share tree policy, go to Policy Configuration and click on Share Tree Policy.
Right now, for each project (eh, ea, Urgent, etc), there is a node in the graph with a leaf called default.
When you add this default leaf, SGE automatically adds all users in that project.
You can add another leaf with a user to give them extra shares (they're proportional to the entry for Shares) - this will only work within the same project. Higher priority projects always trump lower priority projects.
Adding a new user:
Go to User Configuration, click on Userset, choose the appropriate "Department", click on modify and enter the username. Also add the user to the arrow "Access list".
-- FrankHutter - 11 Oct 2006 |
|