Big Cluster Project 11 April 2006

(Note: this page is under construction and hasn't been announced yet, so is in massive flux.)

Come help us make a big parallel cluster on 11 April 2006 from 3:15 in the 7th floor lounge!

  • If you are in CS 521, you'll get five points credit for participating (details below).
  • It's after instruction ends but before exams start, so you have no excuse.
  • It'll be right before Tuesday Tea.
  • Brett has a budget from Google for pizza that he's eager to spend on this event.
  • It'll be fun!

We want to get a lot of computers, where "a lot" is on the order of 40.

We want to run a cool parallel program on the cluster. If you have a suggestion, add it to the list below. The suggestions we have so far:

  • Finding large prime numbers
  • LINPACK
Something graphical -- something interesting to watch -- would be more fun than something purely textual like prime numbers.

CS 521 credit

To get CS 521 credit, you need to do the following:
  • Burn a bootable BCCD CD (see below).
  • Boot your laptop from that CD at home to make sure that the burn was correct, that your system will boot, etc.
    • Note what kind of network card you have so that you can boot faster on the spot.
  • Bring your computer(s) and the CD to the 7th floor lounge by 3:15. Laptops are preferable, since we might be limited in the number of power outlets we will have.
    • If you don't have a laptop, you can bring in your desktop; we will have at least one monitor that we can swap around.
    • If you don't have a laptop, you can borrow one for four hours (not renewable) from Koerner Library or Woodward Library, Circulation Desk, entrance level.
  • Be fully prepared to boot your own computer, and to help out anyone who showed up who is not in 521 (i.e. who has even less experience than you have).

Making a bootable CD

There are a number of different distributions of MPI/LAM, but we will use BCCD. It is specifically designed for educational use, and has a PPC version so that Mac users can play too.

Windows/x86 users, use http://bccd.cs.uni.edu/BCCD-Images/bccd-2.2.1c.iso. PPC users, use http://bccd.cs.uni.edu/BCCD-Images/BCCD-2.2-BETA-IMAGES/bccd-ppc-2006-02-21T04-0500.iso.

It is important to burn the CD in "raw" format. (A raw image contains special error detection and correction codes that the CDR software will generate as it is writing the ISO image to the CD.) You cannot simply add the image file to the CD layout as you would when recording normal files, such as MP3s, off your hard drive.

Linux

Making a bootable CD on Linux is nowhere near as easy as it ought to be. It's probably not something that you do all the time, so you might have challenges with media, drivers, kernels, applications, or all of the above.

On Linux, there are some unspecified issues with cdrecord for kernels newer than 2.5; there is some documentation on the Web that says they have been fixed in 2.6.11, but DuckySherwood had problems even with 2.6.11.

If you have trouble burning from Linux, you can burn a CD from Windows and then boot that from Linux. Note that if everything is working, you should not need to burn anything except an iso file (Windows) or an iso.img file (Linux) to your CD.

  • Note that if everything is working, you should be able to stick the CD in your drive and restart; you should not need to hold down magic key combinations.
  • If you do have trouble, you might need to change the boot device order in your BIOS.
  • If you have trouble, see also the note in the Mac section below about "raw" mode.
  • Ducky successfully burned BCCD from Windows and booted on Linux (Kanotix distro, 2.6.11 kernel) using just what Windows offered (RecordNow, which appears to be distributed with IBM ThinkPads).
  • Ducky found that she had trouble booting from the CD when she had previously booted into Linux from her hard drive -- she had to boot into Windows, then reboot with the CD in the tray.
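Before burning, it can help to confirm that the downloaded file really is an ISO image rather than a truncated download or a saved error page. The following is a minimal sketch, not part of BCCD: the function name looks_like_iso is hypothetical, and it assumes the standard ISO 9660 layout, in which the signature CD001 sits at byte offset 32769. It says nothing about whether the burn itself succeeded.

```shell
# Hypothetical helper (not part of BCCD): check a file for the
# ISO 9660 signature "CD001" at byte offset 32769.
looks_like_iso() {
    # Read 5 bytes starting at offset 32769; errors (e.g. a file that
    # is too short) are discarded and simply yield an empty string.
    sig=$(dd if="$1" bs=1 skip=32769 count=5 2>/dev/null)
    [ "$sig" = "CD001" ]
}

# Example use:
#   looks_like_iso bccd-2.2.1c.iso && echo "looks like an ISO image"
```

A passing check only means the file has the expected signature; booting from the CD at home is still the real test.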

Windows

Windows should be easy. Just burn http://bccd.cs.uni.edu/BCCD-Images/bccd-2.2.1c.iso (in raw format) and reboot. You might need to change the boot device order in your BIOS.

Mac

Burn http://bccd.cs.uni.edu/BCCD-Images/BCCD-2.2-BETA-IMAGES/bccd-ppc-2006-02-21T04-0500.iso to disk in raw format.

Here are instructions from mkLinux on how to burn a raw image:

  1. Download a BCCD image
  2. Open Disk Copy, which can be found in /Applications/Utilities
  3. Choose "Burn Image..." from the "Image" menu.
  4. Locate the image you wish to burn (e.g. maindisk.img).
  5. Insert a blank CD-R or CD-RW.
  6. Click the "Burn" button!

This is slightly different from the UI Ducky found with version 10.3 (Panther) of OS X:

  1. Download a BCCD image
  2. Open Disk Utility from /Applications/Utilities
  3. Choose "Burn..." from the "Images" menu
  4. Select the .iso BCCD image from the file picker
  5. When it prompts you for a blank CD, insert one
  6. Click the "Burn" button!

Note that when you boot from the CD on your Mac, you must hold down the "c" key to make it boot from the CD.

Ducky had trouble rebooting into OS X immediately after booting into BCCD. She found that Control-Command-Power after a non-starting boot worked.



On the Day Of

On the day of the event, several things need to happen.

Equipment

We need some extra equipment:
  • Switches -- Alan Wagner will bring those
  • Spare CAT5 cables (from where? @@@)
  • A drive somewhere with the code to run (source, since we'll have Macs on the network as well)

Boot

DHCP Server: One machine needs to boot as a DHCP server. That must be an x86 machine, not a Mac.

DHCP server, follow this sequence:

  • Boot up to the BCCD splash screen
  • Hit F3, then type framebuffer_mode_number startdhcp (framebuffer_mode_number just refers to what screen resolution to use; 4 is 1024x768.)
  • Enter the password we decide on the day of the event.
  • Follow directions for trivial-net-setup. Hit Enter to select the highlighted answer and the arrow keys to change the selection.
    • Say NO when it asks if it should autoconfigure with DHCP and YES/OK for everything else.
    • When it asks for IP addresses, configure as in the examples. You can just type in the addresses they use in the dialogs, which are
      • IP address 192.168.1.1
      • netmask 255.255.255.0
      • router address 192.168.1.254
      • DNS server 192.168.1.1
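The example addresses all sit in one 192.168.1.x network under the 255.255.255.0 netmask. If you end up typing different addresses, a quick check like the following can confirm that two addresses still share a network. This is a sketch only: ip_to_int and same_subnet are hypothetical helper names, not BCCD commands, and the code assumes plain dotted-quad addresses without leading zeros.

```shell
# Hypothetical helpers (not part of BCCD) for sanity-checking addresses.

# Convert a dotted-quad address a.b.c.d to a single integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if addresses $1 and $2 are on the same network under netmask $3.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Example use:
#   same_subnet 192.168.1.1 192.168.1.254 255.255.255.0 && echo "same network"
```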

All the other machines need to be clients. On the Mac, it will boot you straight through the boot sequence. On x86 machines:

  • Boot up to the BCCD splash screen.
  • Hit F3
  • Type framebuffer_mode_number nodemode (framebuffer_mode_number just refers to what screen resolution to use; 4 is 1024x768.)
@@@ check nodemode@@@

Everybody then:

  • Enter the password we give you on the day of the event.
  • Follow directions for trivial-net-setup; say YES/OK to everything.
  • When you get to the option of logging in:
    • Log in as root, using the password listed at the login prompt
    • Change the password. If you are helping the owner, let the owner set the password.
    • df to get a list of the mounted partitions
    • umount partition for each of your partitions (e.g. umount /mnt/rw/discs/disc0/part3/home/fred)
    • exit
  • Sign in as bccd, with the password given earlier.
  • Answer yes when it asks if you want to run a heartbeat.
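The df and umount steps above can be sketched as a small filter. This is an illustration under assumptions: disc_mounts is a hypothetical helper, it assumes awk is available on the BCCD image, and it assumes your hard disk partitions are mounted under /mnt/rw/discs as in the example path above. Check the df output by eye before unmounting anything.

```shell
# Hypothetical helper (not part of BCCD): read `df` output on stdin and
# print the mount points that live under /mnt/rw/discs.
# The /mnt/rw/discs prefix is taken from the example above and is an
# assumption about where BCCD mounts your disk partitions.
disc_mounts() {
    # Skip the header line; the mount point is the last field.
    awk 'NR > 1 && $NF ~ /^\/mnt\/rw\/discs\// { print $NF }'
}

# Example use (as root):
#   df | disc_mounts | while read -r mp; do umount "$mp"; done
```

Mount points containing spaces would not survive this simple field split.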

Switch to LAM

Everybody continue on to switching to LAM:
  • edit ~/.bashrc (vi, emacs, and pico should all be available -- check@@@)
    • edit the PATH line so that the line reads export PATH=/lam-mpi/bin:$PATH
    • write file and quit
  • source ~/.bashrc (or log out and log back in)
  • For each node, rebuild the library cache:
    • su - root (using the root password given)
    • ldconfig -v | less
    • exit (back to bccd)
  • bccd-allowall (Answer yes.)
  • bccd-snarfhosts
  • recon -v ~/machines
  • lamboot -v ~/machines @@@ not sure if we want people to do this here or make it part of the makefile

Windows/x86 users have the option of running

  • startx
Mac users, don't run startx: your trackpad might not work, and you might get stuck.

Compile the target code

Everybody needs to compile the target code.
  • The DHCP server machine needs to run bccd-syncdir source_dir_name ~/machines, where source_dir_name is the directory with the code. Note that there might need to be two versions (one for x86 and one for PPC).
  • Everyone: the code will show up in a directory named something like /tmp/6g2w98s.
  • cp -r dirname/cs521.arch ~bccd/cs521 (where dirname is the tmp dir name and arch is either x86 or ppc, e.g. cp -r /tmp/6g2w98s/cs521.x86 ~bccd/cs521)
  • cd ~bccd/cs521
  • make
  • wait for instructions, eat pizza, etc.
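Picking the right arch suffix for the cp step can be sketched as follows. The helper name bccd_arch is hypothetical, and the mapping assumes that uname -m reports something like i686 on the x86 image and ppc on the PPC image; check what your machine actually prints.

```shell
# Hypothetical helper (not part of BCCD): map a `uname -m` string to the
# arch suffix used in the cs521.x86 / cs521.ppc directory names.
bccd_arch() {
    case "$1" in
        i?86)  echo x86 ;;                # i386, i486, i586, i686
        ppc*)  echo ppc ;;                # ppc, ppc64
        *)     echo unknown; return 1 ;;  # anything else: ask for help
    esac
}

# Example use, with /tmp/6g2w98s standing in for the real tmp dir name:
#   arch=$(bccd_arch "$(uname -m)") && cp -r /tmp/6g2w98s/cs521."$arch" ~bccd/cs521
```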

As more machines come online, you might need to refresh your system state:

  • bccd-allowall
  • bccd-snarfhosts
  • recon -v ~/machines
There is some order dependency that Ducky hasn't quite figured out yet; keep doing those and eventually it will all get settled out.
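The "keep doing those" advice can be sketched as a loop. Everything here beyond the three commands themselves is an assumption: refresh_until_stable and bccd_refresh are hypothetical names, and "settled" is taken to mean that ~/machines has the same number of lines after two consecutive rounds, which may not match what actually needs to settle. bccd-allowall will still prompt you each round.

```shell
# Hypothetical helper (not part of BCCD): rerun a refresh command until the
# machines file has the same line count twice in a row, or give up after
# max_rounds rounds.
# Usage: refresh_until_stable machines_file max_rounds command [args...]
refresh_until_stable() {
    machines_file=$1
    max_rounds=$2
    shift 2
    prev=-1
    round=0
    while [ "$round" -lt "$max_rounds" ]; do
        "$@"                                          # run the refresh command
        count=$(wc -l < "$machines_file" | tr -d ' ')
        [ "$count" -eq "$prev" ] && return 0          # unchanged: call it settled
        prev=$count
        round=$((round + 1))
    done
    return 1                                          # never settled
}

# The three refresh steps from the list above, bundled into one function.
bccd_refresh() {
    bccd-allowall && bccd-snarfhosts && recon -v ~/machines
}

# Example use:
#   refresh_until_stable ~/machines 10 bccd_refresh
```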

Mac users, if you have trouble rebooting into OS X immediately after booting into BCCD, try holding Control-Command-Power after a non-starting boot.



Setup needs

There are a few one-off things to do ahead of time:
  • Instructions on what everyone needs to bring -- Ducky
  • Instructions for what to do on the day of -- Ducky
  • Locate switches -- Alan
  • Locate CAT5 cables -- Ducky
  • Reserve a room with enough tables and space
    • Estimate head/node count, adjust room if needed
  • Post information to class -- Ducky w/input from Alan
  • Post information more widely -- Ducky w/input from Alan
  • Figure out what code to run -- class, Brett
  • Make sure code compiles on Macs and Linux both with near-forehead install (meaning two directories) -- Ducky

On the day of:

  • Switches -- Alan
  • Extra CAT5 cables (?)
  • Pizza -- Google via Brett
  • Printouts of instructions -- Ducky (maybe a projector with instructions as well?)
  • DHCP server boot
  • Put code in ~bccd/cs521
  • Tags for computers and cables to indicate ownership
  • Duct tape (?) to tape cables down onto the floor (avoid if at all possible)
  • Power strips


People who are known to be coming:
  • Ducky (1 x86, 1 PPC)
  • Alan Wagner (1 x86?)
  • Brett (PPC)
  • Andrew C (x86)
  • Nels (x86)
  • Allan R (way old x86) not in class
  • Karen Parker (PPC)
  • Ivan (x86)
  • Ying (x86)
  • Peng Li (x86 laptop, 2.0 GHz Centrino, 1 GB)
  • Tristram (PPC)
  • Andrei (x86)
  • LeCuong (x86)


Topic revision: r16 - 2006-04-05 - LeCuong
 