WestGrid - quick user guide

This page is part of the EmpiricalAlgorithmics web.

Introduction

WestGrid operates high performance computing (HPC), collaboration and visualization infrastructure across western Canada. It encompasses 14 partner institutions across four provinces.
An extensive overview of WestGrid can be found at the WestGrid website. You can also read the QuickStart Guide for New Users at http://www.westgrid.ca/support/quickstart/new_users

How to get a WestGrid account?

When you apply, you choose one of two options:

  1. Lead researcher: Select this option if you are the leader of a project (research group). You will be asked to enter some information about the nature of your research project before you apply for your user account. You will be given a Project ID Number that collaborators in your group can cite when applying for their own user accounts. Note that only faculty members can be project leaders.
  2. Join an existing project (research group): This option is for researchers who are supervised by a lead researcher (for example, a student working for a professor). To join an existing group, obtain the Project ID Number from the project's leader and enter it on the form. The project leader will be asked to verify the information you submit. You can look up project ID numbers at https://rsg.nic.ualberta.ca/project_lookup.php.

To apply for an account, proceed to the Account Request page https://rsg.nic.ualberta.ca/.

What will you get in the next few days?

After you submit your application, you will get a few e-mails from WestGrid.

  1. WestGrid Account Application Received: WestGrid Account Management received your application.
  2. Asking Permission from Project Leader: If you are applying to join an existing project (usually the case for students), WestGrid will send an e-mail to the project leader asking for confirmation.
  3. WestGrid Application Accepted: Your application for a WestGrid account has been approved.
  4. WestGrid account created: WestGrid has set up an account for you on silo.westgrid.ca and hopper.westgrid.ca. These are storage servers for medium- and long-term data storage. For more information about using Silo and Hopper, please visit http://westgrid.ca/support/quickstart/silo. Note: The shell on Silo is restricted; it can only be used for managing and downloading files. You cannot run programs or scripts on Silo.
  5. Welcome to [cluster name]: Your account on [cluster name] has been activated. In my case, the cluster is glacier.westgrid.ca. Note: the storage servers and the cluster use different file systems, so you cannot directly access files on a storage server from the cluster. Use gcp to copy files between WestGrid machines.

How to transfer my files to/between WestGrid?

Assume my host machine is okanagan.cs.ubc.ca, my WestGrid storage server is silo.westgrid.ca and my WestGrid cluster is glacier.westgrid.ca. The file I want to transfer is test.txt.

  • Transfer files between WestGrid machines (from glacier.westgrid.ca to silo.westgrid.ca)
        gcp test.txt username@silo.westgrid.ca:~/
       
  • Transfer files from your local machine to WestGrid (from okanagan.cs.ubc.ca to glacier.westgrid.ca)
        okanagan:> scp test.txt username@glacier.westgrid.ca:~/
        username@glacier.westgrid.ca's password: password
       
  • If you want to write a script that transfers many files from your local machine to WestGrid, having to enter a password each time is a problem. The solution is key-based authentication: first log in on okanagan.cs.ubc.ca as user username and generate a pair of authentication keys. Do not enter a passphrase:
        okanagan:~> ssh-keygen -t rsa
        Generating public/private rsa key pair.
        Enter file in which to save the key (/ubc/cs/home/username/.ssh/id_rsa): 
        Enter passphrase (empty for no passphrase): 
        Enter same passphrase again:  
        Your identification has been saved in /ubc/cs/home/username/.ssh/id_rsa.
        Your public key has been saved in /ubc/cs/home/username/.ssh/id_rsa.pub.
        The key fingerprint is:
        0e:97:88:0f:86:70:39:8f:44:13:e3:f4:5f:79:32:cd username@okanagan
        
    Go to ~/.ssh and transfer id_rsa.pub to the .ssh directory in your home directory on glacier.westgrid.ca:
        scp id_rsa.pub username@glacier.westgrid.ca:~/.ssh/
        
    Log in to glacier.westgrid.ca, go to ~/.ssh, and append id_rsa.pub to authorized_keys2:
        cat id_rsa.pub >> authorized_keys2 
        
    Now, try to use scp to transfer files (no password required).
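
Once key-based login works, a transfer script along the following lines becomes possible. This is only a minimal sketch: the function name transfer_files and the variables REMOTE and DRY_RUN are hypothetical names chosen for this example, and the destination simply reuses the glacier.westgrid.ca account from above. The option -o BatchMode=yes asks scp to fail rather than prompt for a password, which is usually what you want in an unattended script.

```shell
#!/bin/sh
# Sketch: copy several local files to a WestGrid machine with key-based scp.
# REMOTE and DRY_RUN are hypothetical names used only in this example.
REMOTE="${REMOTE:-username@glacier.westgrid.ca:~/}"

transfer_files() {
    # Copy every argument to $REMOTE; return non-zero if any copy failed.
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "skipping $f: not a regular file" >&2
            status=1
            continue
        fi
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: only print the command that would be executed.
            echo "scp -o BatchMode=yes $f $REMOTE"
        else
            # BatchMode=yes makes scp fail instead of asking for a password.
            scp -o BatchMode=yes "$f" "$REMOTE" || status=1
        fi
    done
    return "$status"
}
```

After loading these definitions, set DRY_RUN=1 and call transfer_files *.txt to preview the commands; unset DRY_RUN to perform the actual copies.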

Running Jobs

A great majority of the computational work on WestGrid systems is carried out through non-interactive batch processing. Job scripts containing commands to be executed are submitted from a login server to a batch job handling system, which queues the requests, allocates processors and starts and manages the jobs. The system software that handles your batch jobs consists of two pieces: a resource manager (TORQUE) and a scheduler (Moab). This system is fairly similar to our SunGridEngine. For detailed information, please visit http://westgrid.ca/support/running_jobs.

A batch job script is a text file of commands for the UNIX shell to interpret, similar to what you could execute by typing directly at a keyboard. The job is submitted to a queue using the qsub command. How long a job waits in the queue depends on factors such as system load and the priority assigned to the job. When appropriate resources become available, the job is started on one or more assigned processors. A job will be terminated if it exceeds its allotted time limit or, on some systems, if it exceeds memory limits. By default, the standard output and error streams from the job are directed to files in the directory from which the job was submitted. For detailed information on how to write a job script, please visit http://westgrid.ca/support/running_jobs#directives
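
As an illustration, a small TORQUE job script might look like the sketch below. The job name and the resource requests (10 minutes of wall time, one processor on one node) are assumptions made for this example, not WestGrid defaults; check the directives page above for the limits that apply on your cluster. The fallbacks for PBS_O_WORKDIR and PBS_JOBID are there only so that the script can also be run by hand.

```shell
#!/bin/bash
# Sketch of a TORQUE job script; all resource values are example assumptions.
#PBS -N test_job
#PBS -l walltime=00:10:00
#PBS -l nodes=1:ppn=1

# TORQUE sets PBS_O_WORKDIR to the directory the job was submitted from.
# Fall back to the current directory when the script is run by hand.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

# PBS_JOBID is set by the batch system; "interactive" is a placeholder.
job_id="${PBS_JOBID:-interactive}"
echo "Job $job_id started on $(hostname) in $workdir"

# Replace the next line with the program you actually want to run.
echo "Hello from WestGrid"
```

If the script is saved as test_job.pbs (a hypothetical file name), it would be submitted with qsub test_job.pbs.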

A few useful commands:

  • qstat: Check the status of jobs and queues on the cluster
  • qsub: Submit jobs to the queue (you can also submit an array job, for example qsub -t 1-100)
  • qdel: Delete your own jobs if something goes wrong
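
For array jobs submitted with qsub -t, TORQUE sets the environment variable PBS_ARRAYID to the index of each task. The sketch below shows one possible way to use that index to pick an input for the task. The file name inputs.txt, the variable INPUT_LIST and the helper nth_line are hypothetical names for this example; it assumes a text file that lists one input per line.

```shell
#!/bin/bash
# Sketch of an array job script; resource values and names are assumptions.
#PBS -l walltime=00:10:00

# Hypothetical list of inputs, one path per line.
INPUT_LIST="${INPUT_LIST:-inputs.txt}"

# Print line number $1 of file $2.
nth_line() {
    sed -n "${1}p" "$2"
}

# PBS_ARRAYID is set by TORQUE for array jobs; default to 1 when run by hand.
task_id="${PBS_ARRAYID:-1}"

if [ -f "$INPUT_LIST" ]; then
    input=$(nth_line "$task_id" "$INPUT_LIST")
    # Replace this line with the command that processes "$input".
    echo "Task $task_id processes $input"
fi
```

With 100 lines in inputs.txt, submitting this script with qsub -t 1-100 would start one task per line.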

For notes specific to glacier.westgrid.ca, see http://guide.westgrid.ca/guide-pages/jobs.html.

Topic revision: r1 - 2009-09-29 - xulin730
 
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.