Booting DHCP for the Big Cluster
To be the DHCP server, you must be on an x86 computer.
To become the DHCP server, follow this sequence:
- Boot up to the BCCD splash screen
- Hit F3, then type framebuffer_mode_number startdhcp, where framebuffer_mode_number selects the screen resolution (4 is 1024x768), e.g. 4 startdhcp.
- Enter the password, which we will decide on the day of the event.
- Follow directions for trivial-net-setup. Hit Enter to select the highlighted answer and the arrow keys to change the selection.
- Say NO when it asks if it should autoconfigure with DHCP and YES/OK for everything else.
- When it asks for IP addresses, configure as in the examples. You can just type in the addresses they use in the dialogs, which are:
- IP address 192.168.1.1
- netmask 255.255.255.0
- router address 192.168.1.254
- DNS server 192.168.1.1
- When you get to the option of logging in:
- Log in as root, using the password listed at the login prompt
- Change the password. If you are helping the owner, let the owner set the password.
- Copy the example code from wherever you have it to ~bccd/src
- chown -R bccd ~bccd/src (so that the bccd user owns the copied files)
- df to get a list of the mounted partitions
- umount partition for each of your local drive partitions (e.g. umount /mnt/rw/discs/disc0/part3/home/fred). Macs don't seem to mount all your local drives.
- exit
- Sign in as bccd, with the password given earlier.
- Answer yes when it asks if you want to run a heartbeat.
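The df and umount steps above can be sketched as a small shell helper. This is only an illustration: the function name list_local_mounts is hypothetical, and the /mnt/rw/discs prefix is an assumption taken from the example path in the list, so check what df actually reports on your machine first.

```shell
# Hypothetical helper: print the mount points of local drive partitions.
# Assumption: local drives are mounted under /mnt/rw/discs, as in the
# example path above. Pass a different prefix as $1 if df shows otherwise.
list_local_mounts() {
    # Reads `df -P` output on stdin. Line 1 is the header; the mount point
    # is the last field (mount points containing spaces are not handled).
    awk -v prefix="${1:-/mnt/rw/discs}" \
        'NR > 1 && index($NF, prefix) == 1 { print $NF }'
}

# Usage (as root): review the list first, then unmount each entry.
#   df -P | list_local_mounts
#   df -P | list_local_mounts | while read -r m; do umount "$m"; done
```

Printing the list before unmounting makes it easy to confirm that only local drive partitions are affected.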
Switch to LAM:
- edit ~/.bashrc
- edit the PATH line so that the line reads export PATH=/lam-mpi/bin:$PATH
- write file and quit
- source ~/.bashrc (or log out and log back in)
- For each node, rebuild the library cache:
- su - root (using the root password given)
- ldconfig -v | less
- exit (back to bccd)
- bccd-allowall (Answer yes.)
- bccd-snarfhosts
- recon -v ~/machines (This might take a few tries to work; the reason is not known.)
- lamboot -v ~/machines
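As a variant of the PATH edit in the list above, the following sketch prepends /lam-mpi/bin only when it is not already on the PATH, so that sourcing ~/.bashrc repeatedly does not add duplicate entries. The function name prepend_path is hypothetical; the plain export line from the list works just as well.

```shell
# Hypothetical helper for ~/.bashrc: prepend a directory to PATH only if
# it is not already present somewhere in PATH.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;     # otherwise put it at the front
    esac
}

prepend_path /lam-mpi/bin
export PATH
```

Note that if /lam-mpi/bin is already on the PATH but behind another MPI installation, this sketch leaves the order unchanged; in that case use the plain export line instead.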
You have the option of
Everybody needs to compile the target code.
- bccd-syncdir ~bccd/src ~/machines
- cp -r dirname/cs521.arch ~bccd/cs521 (where dirname is the tmp dir name and arch is either x86 or ppc, e.g. cp -r /tmp/6g2w98s/cs521.x86 ~bccd/cs521)
- cd ~bccd/cs521
- make
- run the program
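Picking the right cs521.arch directory in the cp step above could be sketched as follows. The function name cs521_arch is hypothetical, and the mapping from uname -m output to x86 or ppc is an assumption about what the machines at the event will report; check it against your own hardware.

```shell
# Hypothetical helper: map a machine name (default: `uname -m`) to the
# suffix of the cs521.<arch> directories, which is either x86 or ppc.
cs521_arch() {
    machine=${1:-$(uname -m)}
    case "$machine" in
        i?86|x86_64) echo x86 ;;            # assumed x86 machine names
        ppc*|powerpc*) echo ppc ;;          # assumed PowerPC machine names
        *) echo "unknown architecture: $machine" >&2; return 1 ;;
    esac
}

# Usage, where dirname is the tmp dir name as in the list above:
#   cp -r "$dirname/cs521.$(cs521_arch)" ~bccd/cs521
#   cd ~bccd/cs521 && make
```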
As more machines come online, you might need to refresh your system state:
- bccd-allowall
- bccd-snarfhosts
- recon -v ~/machines
There is some order dependency that Ducky hasn't quite figured out yet; keep repeating those commands and eventually everything will settle.
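Repeating the refresh commands by hand can be wrapped in a small retry loop. This is a minimal sketch, not part of BCCD: the names retry, refresh_once, and RETRY_DELAY are hypothetical, the count of 5 is arbitrary, and it assumes recon exits with a non-zero status when it fails.

```shell
# Hypothetical helper: run a command up to $1 times, pausing between tries.
retry() {
    attempts=$1; shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $i of $attempts failed: $*" >&2
        # Pause before the next try (RETRY_DELAY seconds, default 2).
        [ "$i" -lt "$attempts" ] && sleep "${RETRY_DELAY:-2}"
        i=$((i + 1))
    done
    return 1
}

# One pass of the refresh sequence listed above. The exit status is that
# of recon, which is assumed to be non-zero on failure.
refresh_once() {
    bccd-allowall
    bccd-snarfhosts
    recon -v ~/machines
}

# Usage: retry 5 refresh_once
```

Since bccd-allowall asks for confirmation, each pass still needs someone at the keyboard to answer yes.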