FC2 DRBD Root Failover HOWTO
From OpenSSI
This HOWTO will briefly describe the basic steps required to set up a 2-node cluster featuring DRBD root FS failover. I have run into quite a few caveats during the process that would definitely be of interest to anyone else trying to achieve this. I'm a newbie at both clustering and DRBD, and I have learned much in the past 4 sleepless nights. Perhaps, with this guide, you can avoid many of my pitfalls.
I. Hardware Requirements
Not that you have to use this setup, but here's what I had lying around:
- 2 computers, each with 2 NICs and a hard drive, with Post-It notes to label them "Node 1" and "Node 2"
II. Software Requirements
Please carefully note the version numbers for the following software packages. The latest DRBD-SSI package will NOT compile on Fedora Core 2.
- Fedora Core 2
- openssi-fc2-1.2.2.i686.tar (The OpenSSI 1.2.2 package for Fedora Core)
- drbd-ssi-1.2.2-20050712.tar.bz2 (This can be found in the contrib section)
- kernel-ssi-source-2.4.22-1.2199.nptl_ssi_9.i686.rpm (Kernel source for building DRBD module)
III. Install Fedora Core 2 on Node 1
You only have to install FC2 on one box. Node 2 will retrieve the image from the first node if we're lucky! Create separate partitions for boot and root, and an extra partition for the DRBD index. I used the following partition structure:
- /boot => /dev/hda1 (100MB)
- swap => /dev/hda2 (1GB)
- / => /dev/hda3 (20GB)
- [Extended] => /dev/hda4 (Automatically created by Disk Druid)
- /dev/drbd-index => /dev/hda5 (512MB, but only 128MB is required according to documentation)
Give your box a hostname, configure both NICs with static IP addresses, and disable the firewall for now. For example, I used:
- Hostname: node1
- eth0: 192.168.2.201 (external network)
- eth1: 10.10.10.1 (cluster interconnect)
When installation is complete, reboot your machine.
IV. Prepare Software and Install OpenSSI
Copy all the packages listed under "II. Software Requirements" to /usr/src, then run:
    su -
    cd /usr/src
    tar xvf openssi-fc2-1.2.2.i686.tar
    cd openssi-fc2-1.2.2.i686
    ./install
Enable verbose prompts and carefully read each option. Two important notes:
- Make sure you choose the correct cluster interconnect interface. This is the NIC that will be connected to Node 2.
- Make sure to enable root failover.
Reboot when installation is complete to load your shiny new SSI-enhanced kernel.
V. Prepare Linux Kernel Source
We need the Linux kernel source because the DRBD module has to be built against it. So, to install and prepare the kernel source that matches the kernel you are now running:
    su -
    cd /usr/src
    rpm -Uvh kernel-ssi-source-2.4.22-1.2199.nptl_ssi_9.i686.rpm
    cd linux-2.4-ssi
    make mrproper
    cp /boot/config-2.4.22-1.2199.nptl_ssi_9smp .config
    make oldconfig
    vi Makefile (Change line 4 from "EXTRAVERSION = -1.2199.nptl_ssi_9custom" to "EXTRAVERSION = -1.2199.nptl_ssi_9smp")
    make dep
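The EXTRAVERSION edit matters because the module is built for whatever release string the Makefile produces, and that string has to equal the output of "uname -r" on the running kernel. Here is a minimal sketch of a check you could run before building. The function names makefile_release and release_matches are made up for this example, and it assumes the usual 2.4-style Makefile with VERSION, PATCHLEVEL, SUBLEVEL and EXTRAVERSION lines at the top.

```shell
# Print VERSION.PATCHLEVEL.SUBLEVEL followed by EXTRAVERSION, as read from
# a 2.4-style top-level kernel Makefile. (Hypothetical helper name.)
makefile_release() {
    awk -F '=' '
        /^VERSION[ \t]*=/      { gsub(/[ \t]/, "", $2); v = $2 }
        /^PATCHLEVEL[ \t]*=/   { gsub(/[ \t]/, "", $2); p = $2 }
        /^SUBLEVEL[ \t]*=/     { gsub(/[ \t]/, "", $2); s = $2 }
        /^EXTRAVERSION[ \t]*=/ { gsub(/[ \t]/, "", $2); e = $2 }
        END { print v "." p "." s e }
    ' "$1"
}

# Return 0 if the Makefile release equals the given release string.
# The second argument defaults to the running kernel (uname -r).
release_matches() {
    want=${2:-$(uname -r)}
    [ "$(makefile_release "$1")" = "$want" ]
}
```

For example, "release_matches /usr/src/linux-2.4-ssi/Makefile && echo OK" should print OK once line 4 has been changed as described above.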
Now your kernel source is prepared, and we are ready to build the DRBD module.
VI. Build the DRBD Module
Moving right along from above:
    cd /usr/src
    tar jxvf drbd-ssi-1.2.2-20050712.tar.bz2
    cd drbd-ssi-1.2.2-20050712
    cp mkinitrd /sbin (Confirm overwrite when prompted)
    cp SSIfailover /etc/init.d (Confirm overwrite when prompted)
    cp rc.sysrecover /etc/rc.d (Confirm overwrite when prompted)
    cd drbd
    make clean all (Should return "Build successful.")
    make install
VII. Edit /etc/drbd.conf
I made the following changes:
- Change the "resource r0 {" line to read "resource root {"
- Change "on amd {" to "on node1 {"
- Change "disk /dev/hde5;" to "disk /dev/hda3;"
- Change "address 192.168.22.11:7788;" to "address 10.10.10.1:7788;" (or your internal IP)
- Change "meta-disk internal;" to "meta-disk /dev/hda5[0];"
- Change "on alf {" to "on node2 {"
- Repeat the same changes in the node2 section, but use Node 2's own cluster interconnect IP in its "address" line
- Delete the remainder of the file, which is additional device definitions we won't be using
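It is easy to miss one of these edits, so here is a rough sketch of a sanity check. check_drbd_conf is a hypothetical helper name, and the strings it looks for are just the example values used in this HOWTO (adjust them to your own disks and IPs). It only confirms that the lines are present somewhere in the file; it does not validate the DRBD syntax.

```shell
# Check that a drbd.conf contains the lines edited in this section.
# (Hypothetical helper; expected strings are this HOWTO's example values.)
check_drbd_conf() {
    conf=${1:-/etc/drbd.conf}
    rc=0
    for want in 'resource root {' 'on node1 {' 'on node2 {' \
                'disk /dev/hda3' 'address 10.10.10.1:7788;' \
                'meta-disk /dev/hda5[0];'; do
        # -F matches fixed strings, so the brackets in "[0]" are literal
        if ! grep -qF -- "$want" "$conf"; then
            echo "missing: $want" >&2
            rc=1
        fi
    done
    return $rc
}
```

Running "check_drbd_conf" with no argument checks /etc/drbd.conf and lists anything it could not find.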
VIII. Prepare the DRBD Partition
When I installed FC2, I formatted the extra partition as ext3 with the label "drbd-index". That won't work for DRBD, so the partition now has to be prepared to store DRBD metadata:
    vi /etc/fstab (Remove the line that reads "LABEL=/drbd-index /drbd-index ext3 defaults,node=1 1 2")
    umount /drbd-index
    rmdir /drbd-index
    dd if=/dev/zero of=/dev/hda5 (This may take a moment)
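If you want to confirm that the old ext3 data is really gone, something like the following sketch should do. is_zeroed is a made-up helper name; it reads the whole file or device, so on a 512MB partition it takes about as long as the dd did.

```shell
# Succeed if the given file or block device contains only zero bytes.
# (Hypothetical helper name; an empty file also counts as zeroed.)
is_zeroed() {
    # Delete every NUL byte; anything left over is non-zero data.
    # The trailing tr strips padding that some wc versions print.
    [ "$(tr -d '\000' < "$1" | wc -c | tr -d ' ')" -eq 0 ]
}
```

For example: is_zeroed /dev/hda5 && echo "hda5 is clean"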
IX. Preparing DRBD Boot Parameters
As root, continuing from above:
    modprobe drbd
    mkinitrd --drbd --cfs -f /boot/initrd-`uname -r`.img `uname -r` (Don't forget the .img!)
    vi /etc/grub.conf (Add "drbd" to the line that reads "kernel /vmlinuz-2.4.22-1.2199.nptl_ssi_9smp ro root=UUID=1328dbae-dff5-43e3-a2a7-7b417f7b2af9 panic=15" to make it "kernel /vmlinuz-2.4.22-1.2199.nptl_ssi_9smp ro drbd root=UUID=1328dbae-dff5-43e3-a2a7-7b417f7b2af9 panic=15")
    ssi-ksync
    vi /etc/fstab (Replace the "UUID=..." part with "/dev/drbd/0", i.e. change "UUID=1328dbae-dff5-43e3-a2a7-7b417f7b2af9 / ext3 chard,defaults,node=1 1 1" to "/dev/drbd/0 / ext3 chard,defaults,node=1 1 1")
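If you would rather not make the two vi edits by hand, they can be sketched as sed filters. This is only an illustration: add_drbd_param and use_drbd_root are hypothetical names, both read stdin and write stdout so nothing is modified in place, and the first one assumes "root=" follows directly after "ro" on the kernel line, as it does in the example above.

```shell
# Insert "drbd" between "ro" and "root=" on grub.conf kernel lines.
# Lines that already read "ro drbd root=" no longer match, so running
# the filter twice is harmless. (Hypothetical helper name.)
add_drbd_param() {
    sed '/^[[:space:]]*kernel /s/ ro \(root=\)/ ro drbd \1/'
}

# Point the fstab entry for "/" at /dev/drbd/0 instead of its UUID.
# Only a line starting with UUID=... whose mount point is exactly "/"
# is changed. (Hypothetical helper name.)
use_drbd_root() {
    sed 's|^UUID=[^[:space:]]*\([[:space:]][[:space:]]*/[[:space:]]\)|/dev/drbd/0\1|'
}
```

A cautious way to use them is to write to a scratch file first, for example "add_drbd_param < /etc/grub.conf > /tmp/grub.conf.new", compare the result with diff, and only then copy it over the original.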
X. Finalizing the DRBD Root Failover
Now reboot. You will see some messages about DRBD. Let it continue to boot until a counter appears on the screen. The counter shows how many seconds DRBD has been waiting for a peer node to appear, and it waits up to 120 seconds before continuing. Type "yes" to abort the wait this time. Your machine will then attempt to mount the root filesystem but will fail because the node is not yet a DRBD primary. Then it will read "Press Enter within 10 seconds for shell." So hit Enter. Within 10 seconds. :) When the shell appears, type:
    drbdadm -s /bin/drbdsetup -- --do-what-I-say primary root
    exit
Now the root partition should be picked up and booted. You are now running with the root as a DRBD primary. This procedure is only necessary the first time you boot the device under DRBD.
When done booting up, log in and type:
cat /proc/drbd
This command will show you the status of your DRBD system, including whether the other nodes have synced up properly to the main image yet.
XI. Adding & Configuring Node 2
If Node 2 does not support PXE booting (booting from the network), then you will need to prepare an Etherboot CD for it. Go to http://rom-o-matic.net to produce an ISO which you can burn to CD. To determine which card you have, try lspci -v and then find your card's code in the list.
Anyhow, turn Node 2 on and boot from the Etherboot CD so that it is trying to boot from the network. During this time, go to Node 1 and:
- Execute "ssi-addnode" to add Node 2
- The node name will be "node2"
- Be sure to enable root failover
Once this is complete, wait for Node 2 to join the cluster. You can reboot Node 2 if you're extremely impatient. Once Node 2 joins, you will need to partition the hard drive of Node 2 to match that of Node 1. Use "fdisk /dev/hda" to edit and make the partition tables match. When you are done:
- Execute "ssi-chnode" to change Node 2
- Set a new boot device of "/dev/hda1"
The boot device of Node 2 will be updated. Once it is complete, do the following on Node 2:
    onnode 2 mkswap /dev/hda2
    grub --device-map=/boot/grub/device.map
You will now be in the grub editor. Type:
    root (hd0,0)
    setup (hd0)
    quit
This ensures that the boot loader is installed on hda of Node 2, so that the node can boot from its own disk. The failover node will not boot from the network anymore. It will boot from its own hard drive, and depending on whether it finds a root node on the cluster, it will either join the root node or become the root node.
Eject the CD in Node 2 so that it will boot from the hard drive. Reboot, and when the node boots and joins the cluster, it will immediately begin syncing to Node 1. You can "cat /proc/drbd" to determine the progress of the syncing. DRBD is ready for failover once all DRBD devices show "ld:Consistent".
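Rather than eyeballing /proc/drbd, the "all devices show ld:Consistent" test can be sketched as a small function. drbd_ready is a hypothetical name, and the sketch assumes that each device's status line starts with its minor number followed by "cs:" (for example " 0: cs:Connected ..."). Check that against what your own /proc/drbd actually prints.

```shell
# Succeed only when every DRBD device line reports ld:Consistent.
# (Hypothetical helper; assumes device lines look like " 0: cs:... ld:...".)
drbd_ready() {
    status=${1:-/proc/drbd}
    awk '
        /^[ \t]*[0-9]+:[ \t]*cs:/ {
            devices++
            if ($0 !~ /ld:Consistent/) bad++
        }
        # Fail if no device lines were found at all, or any was not consistent
        END { exit (devices > 0 && bad == 0) ? 0 : 1 }
    ' "$status"
}
```

To wait for the sync to finish you could loop on it: until drbd_ready; do sleep 10; done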
XII. Finishing Up
Failover can only occur once the disks are completely synced. When both disks show as "consistent", make sure that the SSIfailover service is started at boot and that it is currently running:
    chkconfig SSIfailover on
    service SSIfailover start
Final Thoughts
I personally got a lot of lost packets and network flakiness when using DRBD. It may be normal and may require adjustments so that the network settings are more forgiving, or maybe my hardware just sucks. I don't have Gigabit ethernet. Regardless, I have chosen instead to use rsync to sync up the drive in Node 2 at scheduled nighttime intervals, which will hopefully keep performance high and the system as stable as possible, while still allowing failover.

