NetApp Storage Migration Lessons

One of my latest projects is to consolidate six old NetApp filers and migrate a total of 30 TB of data to a new NetApp filer cluster, a FAS 3240C. The project started several months ago and is almost complete; only one of the six filers is left to migrate.

I have done several storage migrations in the past, and there are always new lessons to learn about the technology, the migration strategy and processes, and the people involved in the project. Here are some of the lessons I learned:

  1. As expected, computer technology moves fast, and storage technology is no exception. IT professionals need to keep pace or our skills become irrelevant. On this project I learned about storage virtualization, NetApp Flash Cache, and SnapMirror using smtape, among many other new features.
  2. Migration strategy, planning, and preparation take more time than the actual migration itself. For instance, one filer took only an hour and a half to migrate, but the preparation (snapmirroring the volumes, re-creating the NFS and CIFS shares, updating users' login scripts, changing several applications, and other pre-work) was done several days before the cutover. The actual migration is really just catching up with the latest changes to the files (i.e., a snapmirror update) and flipping the switch; a rough sketch of the commands follows this list.
  3. As with many other big IT projects, the people are always the challenging part. The key is to engage the stakeholders (business users, application owners, technical folks) early in the project. Communicate the changes that are coming and how their applications and access to their data will be affected, and give them time to understand and adapt their applications. Explain the benefits of the new technology and communicate the project status often.
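
To make item 2 concrete, here is a minimal sketch of the volume SnapMirror steps I mean, assuming Data ONTAP 7-mode and made-up names (oldfiler, newfiler, vol1); the SnapMirror licenses and access settings have to be in place on both filers first. The destination volume is restricted and the baseline transfer runs days ahead of time, so the cutover itself is just the final update and break:

newfiler> vol restrict vol1
newfiler> snapmirror initialize -S oldfiler:vol1 newfiler:vol1
newfiler> snapmirror status
newfiler> snapmirror update newfiler:vol1
newfiler> snapmirror break newfiler:vol1

The snapmirror status command is only there to watch the baseline transfer progress. Once the break completes, the destination volume is writable and the NFS and CIFS shares can be repointed to the new filer.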

Performing maintenance tasks on VMware hosts

There are times when you need to perform hardware maintenance (such as adding a new Network Interface Card [NIC]) on a VMware host, or the host simply disconnects from vCenter.  In those cases the host has to be shut down or rebooted.  To minimize the impact on the VMs, here's the procedure I use:

  1. Run the vSphere Client on your workstation.  Do not use the vSphere Client on one of the servers, because that server might itself be a virtual machine (VM) that is about to go down.
  2. Using the vSphere Client, connect directly to the VMware host, *not* to the vCenter server.
  3. Log in as user root.
  4. Shut down all the VMs by right-clicking each VM and selecting Power, then Shutdown Guest.  This is faster than logging in to each machine over RDP and shutting it down.  VMware Tools has to be up to date, though, or the Shutdown Guest option will be grayed out; in that case, log in to the VM and shut it down from inside.  Performing “Power Off” on the VM should be the last resort.  (Steps 4 through 8 can also be done from the host’s console; see the sketch after this list.)
  5. Once all the VMs are powered down, right-click the VMware host and select Enter Maintenance Mode.
  6. Go to the console of the VMware host, and press Alt-F11 to get the login prompt.
  7. Log in as root.
  8. Issue the command “shutdown -h now” to power down the host.  If you just want to reboot, issue the command “shutdown -r now”.
  9. Wait until the machine is powered off.
  10. Perform maintenance.
  11. Power on the VMware host and watch the screen for any problems.  VMware’s equivalent of the blue screen is the purple screen; if you see a purple screen, something is seriously wrong.
  12. When the VMware host is fully booted, go back to your workstation and connect to the VMware host using the vSphere Client.
  13. Right-click the VMware host and select “Exit Maintenance Mode”.
  14. Power on all the VMs.
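
For the console-only route mentioned in step 4, here is a minimal sketch, assuming an ESXi host where the vim-cmd utility is available (on classic ESX the older vmware-cmd tool plays a similar role); the VM ID 12 below is just an example taken from the getallvms listing:

# vim-cmd vmsvc/getallvms
# vim-cmd vmsvc/power.shutdown 12
# vim-cmd vmsvc/power.getstate 12
# vim-cmd hostsvc/maintenance_mode_enter
# shutdown -h now

getallvms lists every registered VM with its numeric Vmid, power.shutdown asks that guest OS to shut down cleanly (it needs VMware Tools, just like the GUI option), power.getstate lets you poll until the VM reports it is powered off, and maintenance_mode_enter replaces the right-click in step 5.  After the maintenance, vim-cmd hostsvc/maintenance_mode_exit and vim-cmd vmsvc/power.on reverse the process.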

If there are multiple VMware hosts and vMotion is licensed and enabled (i.e., an Enterprise license), you can vMotion the VMs to the other hosts and perform the maintenance without shutting them down.  When the host comes back, vMotion the VMs back to it and repeat the maintenance on the other hosts.

Reinstalling a Node on a Scyld Beowulf cluster

This writeup describes how to restore a node to the cluster after its hard disk has been wiped out by a hardware failure.

I was prompted to write these instructions because one of the nodes in our cluster failed. After the hardware was replaced, I tried to put the node back into the cluster but could not. I followed the documented instructions to no avail, and a message I posted to the Scyld Beowulf mailing list got no response.

Here is what I did to add the node back to the cluster. Using beosetup, the new MAC address was registered as node 0. I partitioned the disk with the beofdisk tool, ran beoboot-install, and then restarted the node. Here’s the output:

# beofdisk -w -n 0

Disk /dev/hda: 4865 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start End #cyls #blocks Id System
/dev/hda1 * 0+ 0 1- 8001 89 Unknown
/dev/hda2 1 516 516 4144770 82 Linux swap
/dev/hda3 517 4864 4348 34925310 83 Linux
/dev/hda4 0 - 0 0 0 Empty
New situation:
Units = sectors of 512 bytes, counting from 0

Device Boot Start End #sectors Id System
/dev/hda1 * 63 16064 16002 89 Unknown
/dev/hda2 16065 8305604 8289540 82 Linux swap
/dev/hda3 8305605 78156224 69850620 83 Linux
/dev/hda4 0 - 0 0 Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd (1) to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
The partition table on node 0 has been modified.
You must reboot each affected node for changes to take effect.

# beoboot-install 0 /dev/hda
Creating boot images...
Installing beoboot on partition 1 of /dev/hda.
mke2fs 1.32 (09-Nov-2002)
/dev/hda1: 11/2000 files (0.0% non-contiguous), 268/8001 blocks
Done

rcp: /boot/boot.b: No such file or directory
Failed to copy boot.b to node 0:/tmp/.beoboot-install.mnt

After rebooting, the node came up in an ERROR state in the BeoSetup window. Here’s the log:

node_up: Initializing cluster node 0 at Wed Mar 9 15:44:55 EST 2005.
node_up: Setting system clock from the master.
node_up: Configuring loopback interface.
node_up: Loading device support modules for kernel version 2.4.27-294r0048.Scyldsmp.
setup_fs: Configuring node filesystems using /etc/beowulf/fstab...
setup_fs: Checking /dev/hda2 (type=swap)...
chkswap: /dev/hda2: Unable to find swap-space signature
setup_fs: FSCK failure. (OK for RAM disks)
setup_fs: Mounting /dev/hda2 on swap (type=swap; options=defaults)
swapon: /dev/hda2: Invalid argument
setup_fs: Failed to mount /dev/hda2 on swap (fatal).

So, to solve this problem, you have to do two extra steps before rebooting the node. After executing beoboot-install, run mke2fs -j on the data partitions and mkswap on the swap partition via bpsh, for example:

# bpsh 0 mke2fs -j /dev/hda3
# bpsh 0 mkswap /dev/hda2
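
After the node reboots and rejoins the cluster, a quick sanity check (my own habit, not something from the Scyld docs) is to run the standard tools on the node through bpsh to confirm the filesystem and swap are actually in use:

# bpsh 0 df -h
# bpsh 0 cat /proc/swaps

If /dev/hda3 shows up mounted and /dev/hda2 is listed as active swap, the setup_fs and swapon errors shown above should be gone.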