Reinstalling a Node on a Scyld Beowulf Cluster

This writeup describes how to return a node to the cluster after its hard disk has been wiped by a hardware failure.

I was prompted to write these instructions because one of the nodes in our cluster failed. After the hardware was replaced, I tried to put the node back into the cluster but could not. Following the instructions did not help, and a message I posted to the Scyld Beowulf mailing list got no response.

To add the node back to the cluster, I used BeoSetup to register its new MAC address as node 0. I then partitioned the disk with beofdisk, installed the boot image with beoboot-install, and restarted the node. Here’s the output:

# beofdisk -w -n 0

Disk /dev/hda: 4865 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/hda1   *      0+      0       1-      8001   89  Unknown
/dev/hda2          1     516     516    4144770   82  Linux swap
/dev/hda3        517    4864    4348   34925310   83  Linux
/dev/hda4          0       -       0          0    0  Empty
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/hda1   *        63     16064      16002  89  Unknown
/dev/hda2         16065   8305604    8289540  82  Linux swap
/dev/hda3       8305605  78156224   69850620  83  Linux
/dev/hda4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1) to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
The partition table on node 0 has been modified.
You must reboot each affected node for changes to take effect.

# beoboot-install 0 /dev/hda
Creating boot images...
Installing beoboot on partition 1 of /dev/hda.
mke2fs 1.32 (09-Nov-2002)
/dev/hda1: 11/2000 files (0.0% non-contiguous), 268/8001 blocks
Done

rcp: /boot/boot.b: No such file or directory
Failed to copy boot.b to node 0:/tmp/.beoboot-install.mnt

After the reboot, the node showed an ERROR state in the BeoSetup window. Here’s the log:

node_up: Initializing cluster node 0 at Wed Mar 9 15:44:55 EST 2005.
node_up: Setting system clock from the master.
node_up: Configuring loopback interface.
node_up: Loading device support modules for kernel version 2.4.27-294r0048.Scyldsmp.
setup_fs: Configuring node filesystems using /etc/beowulf/fstab...
setup_fs: Checking /dev/hda2 (type=swap)...
chkswap: /dev/hda2: Unable to find swap-space signature
setup_fs: FSCK failure. (OK for RAM disks)
setup_fs: Mounting /dev/hda2 on swap (type=swap; options=defaults)
swapon: /dev/hda2: Invalid argument
setup_fs: Failed to mount /dev/hda2 on swap (fatal).
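The chkswap message means /dev/hda2 carries no swap signature. Below is a minimal sketch for checking a partition before rebooting, assuming x86 4 KiB pages and the version-1 signature `SWAPSPACE2` that mkswap writes into the last 10 bytes of the first page. The function name `has_swap_signature` is hypothetical; it reads the page from standard input so the data can come from a node through bpsh.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the 4 KiB page on stdin ends with the
# Linux swap (version 1) signature "SWAPSPACE2". Assumes 4 KiB pages (x86).
has_swap_signature() {
    local sig
    # Keep the last 10 bytes of the first page; drop NUL bytes so that
    # bash does not warn about them in the command substitution.
    sig=$(head -c 4096 | tail -c 10 | tr -d '\000')
    [ "$sig" = "SWAPSPACE2" ]
}
```

For example, after running mkswap on node 0, `bpsh 0 dd if=/dev/hda2 bs=4096 count=1 2>/dev/null | has_swap_signature && echo "swap signature present"` reads the first page on the node; bpsh forwards the remote dd output back to the master, where the check runs.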

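The log shows the cause. beoboot-install formats only partition 1 (the mke2fs line in its output is for /dev/hda1), so the new swap partition /dev/hda2 has no swap signature, and setup_fs treats the failed swapon as fatal; the data partition /dev/hda3 has no filesystem yet either. To solve this problem, you need two extra steps before rebooting the node: after running beoboot-install, use bpsh to run mke2fs -j on the data partitions and mkswap on the swap partition, for example:

# bpsh 0 mke2fs -j /dev/hda3
# bpsh 0 mkswap /dev/hda2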
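The whole recovery sequence can be collected into one shell function. This is a minimal sketch, not a Scyld-provided tool: the names `reinstall_node` and `run`, the `DRY_RUN`, `DISK`, `SWAP_PART` and `DATA_PARTS` variables, and the default layout (swap on /dev/hda2, one ext3 data partition on /dev/hda3, as in the partition table above) are assumptions to check against your /etc/beowulf/fstab. The node must already be registered in BeoSetup, and the reboot is left to you.

```shell
#!/bin/bash
# Sketch of the node recovery sequence (hypothetical helper, not a Scyld tool).
# Assumed layout, overridable via environment: DISK=/dev/hda,
# SWAP_PART=${DISK}2, DATA_PARTS="${DISK}3" (space-separated list).

# Echo a command, then run it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# reinstall_node NODE: prepare the disk of an already-registered node.
# Stops at the first command that returns a non-zero status.
reinstall_node() {
    if [ -z "$1" ]; then
        echo "usage: reinstall_node NODE" >&2
        return 2
    fi
    local node=$1
    local disk=${DISK:-/dev/hda}
    local swap=${SWAP_PART:-${disk}2}
    local data=${DATA_PARTS:-${disk}3}
    local part

    # Write the default partition table to the node's disk.
    run beofdisk -w -n "$node" || return 1
    # Install beoboot; this formats only partition 1 of the disk.
    run beoboot-install "$node" "$disk" || return 1
    # Create ext3 (journaled ext2) filesystems on the data partitions.
    for part in $data; do   # unquoted on purpose: split the list
        run bpsh "$node" mke2fs -j "$part" || return 1
    done
    # Write the swap signature that chkswap and swapon look for at boot.
    run bpsh "$node" mkswap "$swap" || return 1
    echo "Node $node is ready; reboot it to apply the changes."
}
```

For example, `DRY_RUN=1 reinstall_node 0` only prints the four commands for node 0; without DRY_RUN it executes them. Because the function stops at the first failure, if beoboot-install exits with an error over the missing /boot/boot.b seen above, the two bpsh steps have to be run by hand.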
