This page is part of the EmpiricalAlgorithmics web.
For general information about using Sun Grid Engine, see SunGridEngine.
Is there a way to change the project of a running job? That could be useful someday...
This is the primary mechanism that is used to determine which jobs are dispatched. Go to Policy Configuration and then Override Policy. Choose "project" from the dropdown, and you'll see all the SGE project names with the number of override tickets they get. They should always be multiples of 10,000: this ensures that override tickets trump share tree tickets (of which there are 9,000, as set on the main policy configuration page). Higher priority projects preempt lower priority ones--as long as their tickets are multiples of 10,000, no jobs will be run from a lower-priority project while pending jobs from a higher-priority project exist. You can modify the number of tickets a project is given here, but you can't create a new one. To do that, go to "Project Configuration" from the main qmon dialog.
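The arithmetic behind the "multiples of 10,000" rule can be checked with a small sketch. It assumes a simplified model in which a job's total tickets are its project's override tickets plus at most the whole share tree pool of 9,000; the function names are made up for illustration and are not part of SGE.

```python
SHARE_TREE_TICKETS = 9_000  # total share tree tickets, as set on the main policy page


def ticket_range(override_tickets, share_tree_pool=SHARE_TREE_TICKETS):
    """Return (min, max) total tickets a job can hold in this simplified model.

    Assumption: total = override tickets + a share tree portion that is
    somewhere between 0 and the whole share tree pool.
    """
    return override_tickets, override_tickets + share_tree_pool


def always_outranks(high_override, low_override, share_tree_pool=SHARE_TREE_TICKETS):
    """True if every job in the higher project beats every job in the lower one."""
    high_min, _ = ticket_range(high_override, share_tree_pool)
    _, low_max = ticket_range(low_override, share_tree_pool)
    return high_min > low_max


# Projects one step (10,000 tickets) apart: 20,000 > 10,000 + 9,000.
print(always_outranks(20_000, 10_000))
# A gap smaller than the share tree pool is not enough: 15,000 < 19,000.
print(always_outranks(15_000, 10_000))
```

Under this model, any gap larger than 9,000 would work; multiples of 10,000 are simply the convention that makes the gap easy to see.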
This is the policy that is used to determine how competing jobs are scheduled when they fall within the same project. To change the share tree policy, go to Policy Configuration and click on Share Tree Policy. Right now, for each project (eh, ea, Urgent, etc.), there is a node in the graph with a leaf called default. (If you don't see the leaf, double-click on the node to open it up.) Under this default leaf, SGE automatically adds all users in that project--they're listed inside. This gives all users in the project the same priority, so each user gets the same amount of CPU time (not the same number of jobs) within the same time window. Of course, the share tree policy doesn't have to share resources evenly. You can add another leaf named after a specific user to give them extra shares (the allocation is proportional to the entry for Shares).
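As a rough illustration of proportional shares, the sketch below turns a set of Shares entries into target CPU fractions. It assumes that sibling leaves under one project node split CPU time in proportion to their Shares values; the user names and share values are hypothetical.

```python
def cpu_fractions(shares):
    """Map each leaf name to its target fraction of the project's CPU time.

    Assumption: leaves under the same node are served in proportion to
    their Shares entries.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total shares must be positive")
    return {leaf: value / total for leaf, value in shares.items()}


# Two users at an equal share level and one user with a dedicated leaf
# carrying twice as many shares (all names and values are made up).
example = {"user_a": 100, "user_b": 100, "user_c": 200}
for leaf, fraction in cpu_fractions(example).items():
    print(f"{leaf}: {fraction:.0%}")
```

These fractions are long-run targets within the scheduling window, not guarantees about how many jobs run at any given moment.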
Go to User Configuration, click on Userset, select the appropriate Userset, click on modify and enter the username. The following usersets are mutually exclusive (a user only needs to be added to one, and should be added to the one which is highest on the list to which he belongs):
Go to User Configuration, click on Userset, make sure Department is chosen in the lower left, and click on Add. Then add people as described above (if you want to transfer people from other user sets you have to delete them from those first and then add them to the new one). You can then associate the new user set with projects users are eligible to submit to.
Go to Project Configuration and click on Add. Enter the name of the project and choose user sets or users who are eligible to submit jobs to this project by clicking on the buttons below. E.g., say you want to add a user set: click on the left button, and in the new window that pops up, choose the applicable user sets.
A parallel environment defines a schema for how multiple-CPU jobs are to be run. Run a job in a parallel environment by adding "-pe
Parallel environments can be created using "qconf -ap
What if you want to change the number of available matlab licenses, urgent queues or CPLEX instances? You would be tempted to go into "complex configuration" and change the value "default" on the consumable's definition. However, this doesn't work. (I think all this does is determine how many units of the consumable get used by requests to use the consumable that don't specify a number of units.) Instead, go to "Host configuration", then choose the "execution host" tab and select the host "global". Then under "consumables/fixed attributes" you'll see the consumables: matlab, cplex, urgent. Change the totals here!
How do you find out how many matlab licenses are available? In UNIX, type:
/cs/local/generic/lib/pkg/matlab-7.2/etc/lmstat -a
The 'matlab-7.2' part may change as new versions of matlab become available...
There are 22 CPLEX licenses bought as part of the CFI grant that purchased the cluster. Unless the department buys more someday, that's it...
If you want to create a new consumable, you do want to go to "complex configuration". Give it a name and nickname, make it "int" and "<=", consumable, requestable and unforced, default=0, urgent=0. Then you'll need to set the number of available units through host configuration as above. However, the new consumable won't yet appear as a consumable in the "consumables/fixed attributes" pane to the right when you click on "global" in "execution host". How do you get it there? This is possibly the awesomest interface feat in SGE yet. Click on "modify" (with "global" selected). Click the "Consumable/Fixed Attribute" tab. There's the list of consumables--how do you get a new one to appear? Just click the "name" header (that's right!). You can figure it out from there.
Frank created the memheavy consumable that limits memory-intensive jobs to one per machine. Consumables like this can be implemented by giving each machine a single unit of the consumable. I.e., in "Host configuration", choose "Execution host", and then choose a single machine, such as arrow01.cs.ubc.ca, instead of global. As above, click on modify and put in your consumable and value 1. Unfortunately, this has to be done for each machine in turn. When new machines come in, don't forget to give them such a consumable, too.
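Since the per-host step is repetitive, one possible shortcut is to generate a command line per host instead of clicking through qmon. This is a sketch of an alternative, not the procedure described above: it assumes the installed SGE version accepts "qconf -aattr exechost complex_values NAME=VALUE HOST", which has not been verified on this cluster, and arrow02.cs.ubc.ca is a hypothetical host name. The script only prints the commands; it does not run them.

```python
def memheavy_commands(hosts, consumable="memheavy", value=1):
    """Build one qconf command string per execution host.

    Assumption: 'qconf -aattr exechost complex_values' is available in the
    installed SGE version. Nothing is executed here.
    """
    return [
        f"qconf -aattr exechost complex_values {consumable}={value} {host}"
        for host in hosts
    ]


# arrow01 appears on this page; arrow02 is a made-up example host.
for command in memheavy_commands(["arrow01.cs.ubc.ca", "arrow02.cs.ubc.ca"]):
    print(command)
```

Review the printed commands before pasting them into a shell, and check the result for one host in qmon before applying it to the rest.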
These variables (max_aj_instances and max_aj_tasks) are in cluster configuration. Max_aj_instances used to be 20,000; KLB changed it to 100,000 on 10/13/06 because we seemed to have hit the maximum. (This made the cluster essentially unresponsive for about an hour afterwards; I'm not sure it was a good idea...) Max_aj_tasks is 1,000,000.
There is a queue for testing purposes set up to run only on arrow01, which is not part of the regular queue. To use this queue, add the options -q arrowtest.q and -P eh2 to the qsub command. Your jobs should dispatch immediately, as the queue is usually empty. There may be other jobs running on arrow01; however, it's OK to overload this (and only this) machine.
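For scripted submissions, the two options can be assembled into a qsub command line as in the sketch below. The helper name and the script name my_job.sh are hypothetical; only -q arrowtest.q and -P eh2 come from this page. The command is printed rather than executed.

```python
import shlex


def arrowtest_qsub_argv(script, extra_args=()):
    """Return the argument list for submitting a script to the test queue."""
    return ["qsub", "-q", "arrowtest.q", "-P", "eh2", *extra_args, script]


# my_job.sh is a placeholder for your own job script.
print(shlex.join(arrowtest_qsub_argv("my_job.sh")))
```

To actually submit, the list could be passed to subprocess.run on a machine where qsub is on the PATH.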
Let's keep a log of administration changes made to the cluster, to help us undo bad changes if they occur.
-- FrankHutter - 11 Oct 2006