VMWareWalkthrough
OpenSSI in VMWare Walkthrough
This guide was created by Steven Van Acker (deepstar@singularity.be) and may be freely used. If you make corrections, please be kind enough to drop me a line.
Goal
The goal is to set up two machines in failover so that:
* One machine takes over when the other fails
* Maintenance is as easy as possible
* The setup is transparent, i.e. programs and users think they are working on a single machine
Links
http://openssi.org/cgi-bin/view?page=docs2/1.9/debian/INSTALL.html
First of All: Be Careful!
VMWare uses a randomly generated MAC address for each virtual machine unless you modify the default configuration, whereas OpenSSI identifies nodes by known MAC addresses. If you don't set up a FIXED MAC address, you will lose all your effort.
To set up a fixed MAC address, edit the file "name_of_your_virtual_machine.vmx" with a text editor. You will find something like:
ethernet0.present = "TRUE"
ethernet0.addressType = "generated"
ethernet0.generatedAddress = "00:0c:29:52:e2:54"
ethernet0.generatedAddressOffset = "10"
Then change addressType to "static", rename the generatedAddress key to address, and delete the generatedAddressOffset line.
Note: the MAC address must follow the pattern 00:50:56:XX:XX:XX. VMWare reserves the range 00:50:56:00:00:00 through 00:50:56:3F:FF:FF for manually assigned addresses.
A sample from my configuration:
ethernet0.present = "TRUE"
ethernet0.addressType = "static"
ethernet0.address = "00:50:56:00:00:01"
ethernet0.connectionType = "hostonly"
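The edit described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names are made up for this page, the range check follows VMWare's documented 00:50:56:00:00:00 to 00:50:56:3F:FF:FF window rather than anything OpenSSI requires, and the rewritten configuration is printed to stdout instead of being written back to the .vmx file.

```shell
#!/bin/sh
# Hypothetical helpers (not part of VMWare or OpenSSI).

is_vmware_static_mac() {
    # Succeeds when $1 is 00:50:56:XX:YY:ZZ with XX between 00 and 3F.
    printf '%s\n' "$1" | grep -Eqi '^00:50:56:[0-3][0-9a-f](:[0-9a-f]{2}){2}$'
}

make_mac_static() {
    # $1 = path to the .vmx file, $2 = MAC address for ethernet0.
    # Prints the rewritten configuration; the file itself is left untouched.
    is_vmware_static_mac "$2" || { echo "bad MAC: $2" >&2; return 1; }
    # Drop the old addressType, generatedAddress(Offset) and address lines.
    grep -v -e '^ethernet0\.addressType' \
            -e '^ethernet0\.generatedAddress' \
            -e '^ethernet0\.address ' "$1"
    printf 'ethernet0.addressType = "static"\n'
    printf 'ethernet0.address = "%s"\n' "$2"
}
```

With the virtual machine powered off, the output could be reviewed and then copied over the original file, for example with make_mac_static node1.vmx 00:50:56:00:00:01 > node1.vmx.new (node1.vmx is a placeholder name).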
Procedure
The procedure followed is the same as the one described in the OpenSSI Debian install page linked above. A 2.6 kernel will be used. Installation takes place on a virtual machine inside VMWare.
Creating the virtual machine
We create a typical virtual machine in VMWare, and specify Other Linux 2.6.x kernel as the guest operating system.
The virtual machine name will be OpenSSI In Progress.
We use bridged networking so we can install Debian over the network. The virtual hard disk is 4GB in size. The profile is now created.
In VM settings, we disable:
* floppy drive
* sound
Do NOT disable USB, as you will no longer have a keyboard when you reboot.
The CDROM drive is mapped to a CD image: /home/deepstar/debian-31r0a-i386-netinst.iso
We add another network card to the virtual machine, with network type Host-only.
The machine is ready to be powered on, so we do just that.
SNAPSHOT: 0001 Machine hardware is configured
Installing Debian
We turn on the machine. It should boot from the CD. When the Debian logo is displayed, we press F3 (make sure the VMWare window has focus so that keypresses go to the virtual machine) and type linux26.
Debian boots.
* In the Choose language screen, we choose English.
* In the Choose Country screen, we choose United States.
* In the Select a keyboard layout screen, we choose American English.
* The hardware is detected, drivers and components are loaded, and the network is set up.
* In the Configure the network screen, we choose eth0 as primary network interface.
* In the Configure the network screen, we choose the hostname host1.
* In the Configure the network screen, we choose the domain name kulnet-l.
* Disks and other hardware are detected and the partitioner is started.
* In the Partition disks screen, we choose Manually edit partition table.
* We make a 512MB swap partition, a 256MB /boot partition (ext3), and the rest is allocated to / (ext3).
* The Debian base system is installed.
* When asked Install the GRUB boot loader to the master boot record?, we answer Yes.
The virtual machine is rebooted.
SNAPSHOT: 0002 Debian is installed
Debian Configuration
* In the Time zone configuration screen, we select No when asked if the hardware clock is set to GMT.
* In the Time zone configuration screen, we select other and then Europe and Brussels as timezone.
* A root password is entered and verified.
* A regular user is created.
* In the Apt configuration screen, we select No when asked to scan for another CD.
* In the Apt configuration screen, we select Yes when asked to add another apt source. We select HTTP, then Belgium and then ftp.kulnet.kuleuven.ac.be. No HTTP proxy information is needed.
* In the Debian software selection screen, we select nothing (not even manual package selection).
* Some packages are installed...
* In the Configuring Exim v4 (exim4-config) screen, we select local delivery only; not on a network. Root and postmaster mail recipient is set to our previously created user.
* Configuration is done.
SNAPSHOT: 0003 Debian configuration is done
Some useful packages
To work better, we install some handy packages like:
* vim
OpenSSI installation
We now follow the OpenSSI installation instructions from http://openssi.org/cgi-bin/view?page=docs2/1.9/debian/INSTALL.html
2.1. Add the following entries to /etc/apt/sources.list in addition to entries used for Debian installation.
deb http://deb.openssi.org/v2 ./
deb-src http://deb.openssi.org/v2 ./
In order to do this with copy-paste, we ssh into the virtual machine on IP 192.168.2.114 and edit the file that way.
2.2. Add the following entries to /etc/apt/preferences
Package: *
Pin: origin deb.openssi.org
Pin-Priority: 1001
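Steps 2.1 and 2.2 can also be scripted instead of edited by hand. The following is a rough sketch, not part of the OpenSSI instructions: append_once and setup_openssi_apt are names invented here, and the script only appends entries that are not already present so that it can be run more than once.

```shell
#!/bin/sh
# Hypothetical helpers for steps 2.1 and 2.2; run as root on the first node.

append_once() {
    # $1 = file, $2 = exact line that must be present in it.
    touch "$1" || return 1
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

setup_openssi_apt() {
    # $1 = sources.list path, $2 = preferences path.
    append_once "$1" 'deb http://deb.openssi.org/v2 ./'
    append_once "$1" 'deb-src http://deb.openssi.org/v2 ./'
    # The pin stanza is three lines that belong together, so add it as a unit.
    if ! grep -qF 'Pin: origin deb.openssi.org' "$2" 2>/dev/null; then
        printf 'Package: *\nPin: origin deb.openssi.org\nPin-Priority: 1001\n' >> "$2"
    fi
}
```

On the node itself this would be called as setup_openssi_apt /etc/apt/sources.list /etc/apt/preferences.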
2.3. Configure the HTTP proxy. In the bash shell, you can export the environment variable ``http_proxy'', setting its value to your local proxy server.
Because we don't use an HTTP proxy, we skip step 2.3.
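For readers who do sit behind a proxy, the step would look roughly like this. The proxy host and port are placeholders and do not come from this setup.

```shell
# Hypothetical proxy address; replace it with your own proxy server.
http_proxy="http://proxy.example.com:3128/"
export http_proxy
# apt-get started from this shell will now fetch packages through the proxy.
```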
SNAPSHOT: 0004 Steps 1 2 3 of OpenSSI install
2.4. Execute:
# apt-get update
# apt-get dist-upgrade
As part of the dist-upgrade, some utilities will be downgraded, since OpenSSI needs modified versions of them.
This is the output:
host1:~# apt-get update
<some output omitted for brevity>
Fetched 85.8kB in 0s (162kB/s)
Reading Package Lists... Done
host1:~# apt-get dist-upgrade
Reading Package Lists... Done
Building Dependency Tree... Done
Calculating Upgrade... Done
The following NEW packages will be installed:
libcluster sipcalc
The following packages will be upgraded:
nfs-common procps
The following packages will be DOWNGRADED:
bsdutils dpkg dpkg-dev dselect e2fslibs e2fsprogs initrd-tools initscripts libblkid1 libcomerr2 libss2 libuuid1 logrotate mount portmap strace sysv-rc sysvinit util-linux
2 upgraded, 2 newly installed, 19 downgraded, 0 to remove and 0 not upgraded.
Need to get 3548kB of archives.
After unpacking 1528kB disk space will be freed.
Do you want to continue? [Y/n] Y
<some output omitted for brevity>
Setting up initrd-tools (0.1.74.ssi6) ...
Installing new version of config file /etc/mkinitrd/mkinitrd.conf ...
host1:~#
If you get a few connection reset errors while downloading packages, restarting the command to download the rest of the packages works, so no problem here.
SNAPSHOT: 0005 After step 4 of OpenSSI install
2.5. Add necessary drivers list to ``/etc/mkinitrd/modules''. Most important drivers are network drivers, depending upon the network cards used in the participating nodes in the cluster (Ex: e100, eepro100 etc.), which would be used while booting cluster nodes.
We need the following drivers in the initrd:
* pcnet32
* ext3
* mptscsih
The /etc/mkinitrd/modules file now looks like this:
host1:~# cat /etc/mkinitrd/modules
# /etc/mkinitrd/modules: Kernel modules to load for initrd.
#
# This file should contain the names of kernel modules and their arguments
# (if any) that are needed to mount the root file system, one per line.
# Comments begin with a `#', and everything on the line after them are ignored.
#
# You must run mkinitrd(8) to effect this change.
#
# Examples:
#
# ext2
# wd io=0x300
pcnet32
ext3
mptscsih
host1:~#
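Since a missing driver only shows up later as a node that fails to boot, a quick check of the modules file may be worthwhile. The helper below is a sketch with an invented name; it prints every requested module that has no uncommented entry in the file.

```shell
#!/bin/sh
# Hypothetical helper: list required modules that the modules file lacks.

missing_modules() {
    # $1 = modules file; remaining arguments = module names that must be listed.
    file=$1
    shift
    for mod in "$@"; do
        # A module counts as present when an uncommented line starts with its name.
        grep -Eq "^[[:space:]]*${mod}([[:space:]]|\$)" "$file" || printf '%s\n' "$mod"
    done
}
```

For this setup the call would be missing_modules /etc/mkinitrd/modules pcnet32 ext3 mptscsih, which should print nothing when all three drivers are listed.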
Before we can make the initrd image, we need to load the loop module, and give eth1 its IP-address (172.16.0.200).
host1:~# modprobe loop
host1:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet dhcp
auto eth1
iface eth1 inet static
    address 172.16.0.200
    netmask 255.255.255.0
    broadcast 172.16.0.255
host1:~# /etc/init.d/networking restart
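The OpenSSI installer only lists interfaces that already have an address, so it helps to confirm what was configured for eth1. The function below is an illustrative sketch (its name is invented here) that reads the static address of an interface from an interfaces(5) style file.

```shell
#!/bin/sh
# Hypothetical helper: print the static address configured for an interface.

iface_static_address() {
    # $1 = interfaces file, $2 = interface name.
    # Prints the "address" value from the matching "iface ... inet static" stanza.
    awk -v want="$2" '
        $1 == "iface" { in_stanza = ($2 == want && $4 == "static") }
        in_stanza && $1 == "address" { print $2; exit }
    ' "$1"
}
```

On the first node, iface_static_address /etc/network/interfaces eth1 should print 172.16.0.200 with the configuration shown above. An interface configured through DHCP produces no output.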
More information: http://sourceforge.net/mailarchive/message.php?msg_id=11649465
SNAPSHOT: 0006 After step 5 of the OpenSSI install
2.6. Execute:
# apt-get install openssi
This installs OpenSSI and creates the first node (init node) of the cluster. While creating the first node with ``ssi-create'' as part of the installation, it asks a few questions related to the cluster setup, which are listed below, and expects the installer to answer them. Please see the 'known problems' at the end of this document for how to treat a few error messages that may appear.
* When asked On what network interfaces should the DHCP server listen?, we enter eth1
* A few notices are displayed, one of them telling us to configure the dhcpd server
* The configuration continues by asking Do you want verbose prompts (y/n) [y]:, and we enter y
2.6.1. Enter a node number between 1 and 125. Every node in the cluster must have a unique node number. The first node is usually 1, although you might want to choose another number for a reason such as where the machine is physically located.
We select 1
2.6.2. Select a Network Interface Card (``NIC'') for the cluster interconnect. It must already be configured with an IP address and netmask before it will appear in the list. If the desired card has not been configured, do so in another terminal then select (R)escan. The NIC should be connected to a private network for better security and performance. It should also be capable of network booting, in case anything ever happens to the boot partition on the local hard drive. To be network boot capable, the NIC must have a chipset supported by PXE or Etherboot.
We select eth1
2.6.3. Select (P)XE or (E)therboot as the network boot protocol for this node. PXE is an Intel standard for network booting, and many professional grade NICs have a PXE implementation pre-installed on them. You can probably enable PXE with your BIOS configuration tool. If you do not have a NIC with PXE, you can use the open-source project Etherboot, which lets you generate a floppy or ROM image for a variety of different NICs.
We select PXE
2.6.4. OpenSSI includes an integrated version of Linux Virtual Server (``LVS''), which lets you configure a Cluster Virtual IP (``CVIP'') address that automatically load balances TCP connections across various nodes. This CVIP is highly available and can be configured to move to another node in the event of a failure. For more information, please see README.CVIP.
2.6.5. Enter a clustername. It should resolve to your CVIP address, either in DNS or the cluster's /etc/hosts file, if you choose to configure a CVIP. This is required if you want to run NFS server. For more information, please see README.nfs-server. The current hostname will automatically become the nodename for this node.
We enter testcluster
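The installer's verbose prompt (quoted in the log further down) spells out the naming rule: 2 to 24 characters, starting with a letter, ending with a letter or digit, with only letters, digits and hyphens in between. A small check along those lines could look like the sketch below. The function name is invented for this page, and the installer performs its own validation anyway.

```shell
#!/bin/sh
# Hypothetical helper: check a cluster name against the rule the installer prints.

is_valid_clustername() {
    # 2 to 24 characters: a letter first, a letter or digit last,
    # and only letters, digits or hyphens in between.
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z][A-Za-z0-9-]{0,22}[A-Za-z0-9]$'
}
```

is_valid_clustername testcluster succeeds, while names such as 1cluster or test- are rejected.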
2.6.6. Select whether you want to enable root filesystem failover. The root must be installed on (or copied to) shared disk hardware, in order to answer yes to this question. If you do answer yes, then each time you add a new node, you will be asked if the node is physically attached to the root filesystem and if it should be configured as a root failover node (see openssi-config-node later). You can learn more about filesystem failover in README.hardmounts.
We select y
2.6.7. A simple mechanism for synchronizing time across the cluster will be installed. Any time a node boots, it will synchronize its system clock with the initnode (the node where init is running). You can also run the ssi-timesync command at any time to force all nodes to synchronize with the initnode. This timesync mechanism synchronizes nodes to within a second or two of each other. If you need a higher degree of synchronization, you can configure Network Time Protocol (``NTP'') across the cluster. Instructions for how to do this are available in README.ntp.
We use the default.
2.6.8. Automatic process load balancing will be installed as part of OpenSSI. To enable load balancing for a program, list its name in the file ``/cluster/etc/loadlevellist''. The program name ``bash-ll'' is listed in ``/cluster/etc/loadlevellist'' by default, but the ``bash-ll'' program itself is not delivered. So to enable load balancing for every program that runs from the bash shell, create a hard link as shown below and execute programs from the shell ``/bin/bash-ll''.
# ln /bin/bash /bin/bash-ll
We don't need that yet.
2.6.9. If you want to run X Windows, please see README.X-Windows.
We don't want to run X Windows on this cluster.
The output of the above looks like this (not including the GUI questions):
host1:~# apt-get install openssi
<some output omitted for brevity>
Welcome to OpenSSI clustering!
Let's configure the first node in your cluster.
This configuration tool can either use verbose prompts that
give you extra information about what it is doing, or it can
just ask the necessary questions. Experienced OpenSSI users
might prefer the latter option.
Even if you choose not to use verbose prompts, you can
select '?' at any prompt to get extra information.
Do you want verbose prompts (y/n) [y]:
Every node in the cluster must have a unique node number.
The first node is usually 1, although you might want to
choose another number for a reason such as where the machine
is physically located.
Enter a node number (1-125) or (?) [1]:
Select a network interface for the cluster interconnect.
The network interface must already be configured with an
IP address and netmask before it will appear in the list
below. If the desired card has not been configured, do so
in another terminal then select (R)escan.
The interface should be connected to a private network for
better security and performance. It should also be capable
of network booting, in case anything ever happens to the
boot partition on the local hard drive. To be network boot
capable, the interface must have a chipset supported by PXE
or Etherboot.
   Name  IP address     Netmask        Hardware address
   ----  ----------     -------        ----------------
1) eth0  192.168.2.114  255.255.255.0  00:0C:29:8A:D9:25
2) eth1  172.16.0.200   255.255.255.0  00:0C:29:8A:D9:2F
Select (1-2), (R)escan or (?) [1]: 2
Select (P)XE or (E)therboot as the network boot protocol for
this node. PXE is an Intel standard for network booting, and
many professional grade NICs have a PXE implementation pre-
installed on them. You can probably enable PXE with your BIOS
configuration tool. If you do not have a NIC with PXE, you can
make use of open-source project Etherboot, which lets you generate
a floppy or ROM image for a variety of different NICs.
Select (P)XE, (E)therboot or (?) [E]: P
OpenSSI includes an integrated version of Linux Virtual
Server (LVS), which lets you to configure a Cluster Virtual
IP (CVIP) address that automatically load balances TCP
connections across various nodes. This CVIP is highly
available and can be configured to move to another node in
the event of a failure. For more information, please see
/usr/share/doc/openssi/README.ipvs.
Press Enter to acknowledge:
Enter a name for this Cluster.
Cluster name follows the naming convention derived from RFC 952.
It is a text string with at least 2 characters and a maximum
of 24 characters. It must begin with an alpha character, and
end with with an alpha-numeric character, with zero or more
intervening alpha-numeric and '-' (hyphen) characters.
The clustername should resolve to your CVIP address,
either in DNS or the cluster's /etc/hosts file, if you choose
to configure a CVIP. This is required if you want to run NFS
server. For more information, please see
/usr/share/doc/openssi/README.nfs-server.
The current hostname will automatically become the nodename
for this node.
Enter a clustername or (?): testcluster
Select whether you want to enable root filesystem failover.
The root must be installed on (or copied to) shared disk
hardware, in order to answer yes to this question. If you
do answer yes, then each time you add a new node, you will
be asked if the node is physically attached to the root
filesystem and if it should be configured as a root failover
node. You can learn more about filesystem failover in
/usr/share/doc/openssi/README.hardmounts.
Do you want to enable root failover (y/n/?) [n]: y
A simple mechanism for synchronizing time across the
cluster will be installed. Any time a node boots, it will
synchronize its system clock with the initnode (the node
where init is running). You can also run the ssi-timesync
command at any time to force all nodes to synchronize with
the initnode.
This timesync mechanism synchronizes nodes to within a
second or two of each other. If you need a higher degree of
synchronization, you can configure Network Time Protocol
(NTP) across the cluster. Instructions for how to do this
are available in
/usr/share/doc/openssi/README.ntp.
Press Enter to acknowledge:
Automatic process load-leveling will be installed as part
of OpenSSI. By default, only programs launched from the
bash-ll shell will be load-leveled. The bash-ll shell is
identical to bash, except for having load-leveled enabled.
To enable load-leveling for a program without launching it
from bash-ll, add its program name to
/etc/sysconfig/loadlevellist and run
'/etc/init.d/loadlevel restart'. For more information,
please see /usr/share/doc/openssi/README-mosixll.
Please keep in mind that a few programs do not like being
automatically load-leveled. In particular, it is not a good
idea to make bash-ll your default login shell. Before
reporting problems with running an application on OpenSSI,
first check to see how it runs without load-leveling.
Press Enter to acknowledge:
If you want to run X Windows, please see
/usr/share/doc/openssi/README.X-Windows.
Press Enter to acknowledge:
The following configuration has been entered:
Node number: 1
NIC for interconnect: eth1
IP address: 172.16.0.200
Network hardware addr: 00:0C:29:8A:D9:2F
Network boot protocol: PXE
Local boot device: /dev/sda2
Clustername: testcluster
Root failover: Yes
Potential initnode: Yes
NFS support: yes
(W)rite new configuration or (R)econfigure [W]: W
Making /cluster/nodetemplate for ssi-addnode
Converting /etc/network into a CDSL .....
Converting /etc/nodename into a CDSL .....
Converting /var/run into a CDSL .....
Making /var/run/utmp clusterwide
Converting /var/log into a CDSL .....
Making /var/log/wtmp clusterwide
Making /var/log/lastlog clusterwide
Converting /var/lock into a CDSL .....
Converting /var/lib/urandom into a CDSL .....
Creating /cluster/node1
Making symlink /cluster/node{nodenum} for compatibility
Saving /etc/securetty as /etc/securetty.ssisave
Generating SSI-enhanced /etc/inittab
Saving base /etc/inittab as /etc/inittab.ssisave
Fixing /etc/fstab file...
Initialization of node 1 completed.
For adding other nodes to your OpenSSI cluster, please use
the ssi-addnode command.
For more information about your new cluster, please read
/usr/share/doc/openssi/Introduction-to-SSI.
<some output omitted for brevity>
Setting up openssi (1.9.1-0) ...
update-rc.d: /etc/init.d/ipvsadm exists during rc.d purge (continuing)
Removing any system startup links for /etc/init.d/ipvsadm ...
/etc/rc0.d/K20ipvsadm
/etc/rc1.d/K20ipvsadm
/etc/rc2.d/S20ipvsadm
/etc/rc3.d/S20ipvsadm
/etc/rc4.d/S20ipvsadm
/etc/rc5.d/S20ipvsadm
/etc/rc6.d/K20ipvsadm
Removing any system startup links for /etc/init.d/drbd ...
host1:~#
SNAPSHOT: 0007 After step 6 of OpenSSI install
Note: The user should make sure that ipvsadm is not run during bootup. The easy way to do this is to use ``dpkg-reconfigure ipvsadm'' and select 'No' for 'Do you want to automatically load IPVS rules on boot?' and 'None' for ``Select a daemon method.''.
When trying this, the script complains that IPVS is not supported by the kernel.
Reboot the first cluster node.
We do just that.
Preparing for the second node
We continue with the OpenSSI installation guide.
3.1. If the selected NIC does not support PXE booting, download an appropriate Etherboot image from the following URL:
http://rom-o-matic.net/5.2.4/
Choose the appropriate chipset. Under Configure it is recommended that ASK_BOOT be set to 0. Floppy Bootable ROM Image is the easiest format to use. Just follow the instructions for writing it to a floppy.
We will use PXE, so no Etherboot image is needed.
3.2. If the node requires a network driver not already mentioned in the file ``/etc/mkinitrd/modules'' (on the init node), add the driver name to that file. Then rebuild the ramdisk to include the driver and update the network boot images.
# mkinitrd -o <init RD image file> <kernel-version>
# ssi-ksync
NOTE: The initrd and the OpenSSI kernel are installed in ``/boot'' on the init node during the OpenSSI installation.
For PXE boot, do the following steps manually.
# apt-get install syslinux
# cp /usr/lib/syslinux/pxelinux.0 /tftpboot
It has been observed that tftpd-hpa and atftpd work fine with Etherboot and PXE, so installing one of them is recommended. The entry for ``tftp'' in /etc/inetd.conf should use ``/tftpboot'' as its root directory. Please check this and correct it if necessary. tftpd-hpa does not use this directory by default, so edit /etc/inetd.conf manually. If you install atftpd, reconfigure it with ``dpkg-reconfigure atftpd'' so that it refers to ``/tftpboot'' on the init node. This is the directory where the kernel and initrd images are made available for other nodes to boot over the network. An example entry is shown below.
Ex: tftp dgram udp wait nobody /usr/sbin/tcpd /usr/sbin/in.tftpd -s /tftpboot
If tftpd-hpa is used, change 'nobody' to 'root' in the above example. With the latest tftpd-hpa, enable tftpd-hpa in the configuration file /etc/default/tftpd-hpa and change the default path to /tftpboot.
All drivers should already be present in the initrd.
We install syslinux and copy the PXE image:
host1:~# apt-get install syslinux
Reading Package Lists... Done
Building Dependency Tree... Done
The following NEW packages will be installed:
syslinux
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 199kB of archives.
After unpacking 537kB of additional disk space will be used.
Get:1 http://ftp.kulnet.kuleuven.ac.be stable/main syslinux 2.11-0.1 [199kB]
Fetched 199kB in 0s (899kB/s)
Selecting previously deselected package syslinux.
(Reading database ... 23138 files and directories currently installed.)
Unpacking syslinux (from .../syslinux_2.11-0.1_i386.deb) ...
Setting up syslinux (2.11-0.1) ...
host1:~# cp /usr/lib/syslinux/pxelinux.0 /tftpboot/
host1:~#
We now install tftpd-hpa as suggested.
host1:~# apt-get install tftpd-hpa
Reading Package Lists... Done
Building Dependency Tree... Done
The following packages will be REMOVED:
tftpd
The following NEW packages will be installed:
tftpd-hpa
0 upgraded, 1 newly installed, 1 to remove and 0 not upgraded.
Need to get 30.9kB of archives.
After unpacking 102kB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://ftp.kulnet.kuleuven.ac.be stable/main tftpd-hpa 0.40-4.1 [30.9kB]
Fetched 30.9kB in 0s (619kB/s)
Preconfiguring packages ...
dpkg: tftpd: dependency problems, but removing anyway as you request:
openssi depends on tftpd.
(Reading database ... 23199 files and directories currently installed.)
Removing tftpd ...
Selecting previously deselected package tftpd-hpa.
(Reading database ... 23191 files and directories currently installed.)
Unpacking tftpd-hpa (from .../tftpd-hpa_0.40-4.1_i386.deb) ...
Setting up tftpd-hpa (0.40-4.1) ...
--------- IMPORTANT INFORMATION FOR XINETD USERS ----------
The following line will be added to your /etc/inetd.conf file:
tftp dgram udp wait root /usr/sbin/in.tftpd /usr/sbin/in.tftpd -s /var/lib/tftpboot
If you are indeed using xinetd, you will have to convert the above into /etc/xinetd.conf format, and add it manually. See /usr/share/doc/xinetd/README.Debian for more information.
-----------------------------------------------------------
invoke-rc.d: WARNING: Service tftpd-hpa has no entry in rc.nodeinfo
invoke-rc.d: Starting only on initnode
tftpd-hpa disabled in /etc/default/tftpd-hpa
host1:~#
In the Configuring tftpd-hpa screen, we select Yes when asked if the server should be started by inetd.
In inetd.conf, we change the tftp line to:
tftp dgram udp wait root /usr/sbin/tcpd /usr/sbin/in.tftpd -s /tftpboot
The file /etc/default/tftpd-hpa contains values used by the /etc/init.d/tftpd-hpa script and should not be changed here, because we start the server through inetd. The installation instructions are a bit unclear on this point.
We restart inetd with the inetd.real init.d script. The standard /etc/init.d/inetd is empty!
host1:~# /etc/init.d/inetd.real restart
Restarting internet superserver: inetd.
host1:~#
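Because the package installs its entry with /var/lib/tftpboot as the root directory, it is easy to end up serving the wrong directory. The sketch below, with an invented function name, prints the directory that the active tftp line in an inetd.conf style file passes to -s.

```shell
#!/bin/sh
# Hypothetical helper: show which directory the tftp entry serves.

tftp_root() {
    # $1 = inetd.conf path. Prints the argument that follows "-s" on the
    # first uncommented tftp line, or nothing if there is no such line.
    awk '$1 == "tftp" {
        for (i = 2; i < NF; i++)
            if ($i == "-s") { print $(i + 1); exit }
    }' "$1"
}
```

On the first node, tftp_root /etc/inetd.conf should print /tftpboot after the edit described above.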
SNAPSHOT: 0008 TFTP configured and ready to serve the second node
Installing the second node
Before continuing with the OpenSSI install on the second node, we first have to create the second virtual machine. We use the same options as for the first node, but this time we disconnect the CDROM drive as well.
SNAPSHOT: 1000 Node 2 hardware is configured
Now the OpenSSI install guide continues:
3.3. Connect the selected NIC to the cluster interconnect, insert an Etherboot floppy (if needed), and boot the computer. It should display the hardware address of the NIC it is attempting to boot with, then hang while it waits for a DHCP server to answer its request.
When the second node is booted, it starts looking for a DHCP server on both attached networks (the bridged network and the host-only interconnect network). This should not succeed. The virtual machine will report No bootable CD, floppy or hard disk was detected... and Operating System not found.
This is because the DHCP server on the first node is not yet configured. We continue the install:
3.4. On the first node (or any node already in the cluster), execute ``ssi-addnode''. It will ask you a few questions about how you want to configure your new node; they are as follows.
When asked if we want verbose prompts, we answer y
3.4.1. Enter a unique node number between 1 and 125.
We enter 2
3.4.2. Enter MAC address of the new node to be added in the cluster.
The MAC-addresses that previously requested an IP-address through DHCP are listed here (probably fetched from the logs). We pick the one of node2.
3.4.3. Enter a static IP address for the NIC. It must be unique and it must be on the same subnet as the cluster interconnect NICs for the other nodes.
The script will ask if it can scan for available IPs. We don't want to configure some random IP, but one that we pick ourselves (to keep things organised). We pick 172.16.0.201
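When picking the address by hand, the only hard requirement from step 3.4.3 is that it lies on the interconnect subnet. A quick arithmetic check is sketched below. The function names are invented here, and octets are assumed to be plain decimal numbers without leading zeros.

```shell
#!/bin/sh
# Hypothetical helpers: check that two IPv4 addresses share a subnet.

ip_to_int() {
    # Convert a dotted-quad IPv4 address to an integer.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

same_subnet() {
    # $1 and $2 = IPv4 addresses, $3 = netmask.
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

same_subnet 172.16.0.200 172.16.0.201 255.255.255.0 succeeds, so 172.16.0.201 is a valid choice next to the first node's interconnect address.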
3.4.4. Select (P)XE or (E)therboot as the network boot protocol for this node. PXE is an Intel standard for network booting, and many professional grade NICs have a PXE implementation pre-installed on them. You can probably enable PXE with your BIOS configuration tool. If you do not have a NIC with PXE, you can use the open-source project Etherboot, which lets you generate a floppy or ROM image for a variety of different NICs.
We select PXE
3.4.5. Enter a nodename. It should be unique in the cluster and it should resolve to one of this node's IP addresses. The nodename can resolve to either the IP address you configured above for the interconnect, or to one of the external IP addresses that you might configure below. The nodename can resolve to the IP address either in DNS or in the cluster's /etc/hosts file. The nodename is stored in /etc/nodename, which is a context-dependent symlink (``CDSL''). In this case, the context is node number, which means each node you add will have its own view of /etc/nodename containing its own hostname. To learn more about CDSLs, please see the document entitled cdsl.
We enter host2
3.4.6. If you enabled root failover during the first node's installation, you will be asked if this node should be a root failover node. This node must have access to the root filesystem on a shared disk in order to answer yes. If you answer yes, then this node can boot first as a root node, so you should configure it with a local boot device. This is done after this node joins the cluster and is described in step .
We want root failover, so y
3.4.7. Save the configuration.
The script displays a summary of the configuration, and asks to save the configuration. We do that.
The full log follows:
host1:~# ssi-addnode
This configuration tool can either use verbose prompts that
give you extra information about what it is doing, or it can
just ask the necessary questions. Experienced OpenSSI users
might prefer the latter option.
Even if you choose not to use verbose prompts, you can
select '?' at any prompt to get extra information.
Do you want verbose prompts (y/n) [y]: y
Every node in the cluster must have a unique node number.
The first node is usually 1, although you might want to
choose another number for a reason such as where the machine
is physically located.
Enter a node number (2-125) or (?) [2]: 2
Select the hardware address of the new node's cluster
interconnect NIC.
The list shows all unknown hardware addresses that have
recently probed the cluster's DHCP server. Choose the one
that is displayed on the console of the new node when it
attempts to network boot. If the desired NIC is not listed,
make sure the node has attempted to network boot on the
cluster interconnect, then select (R)escan.
   Hardware address   Time last probed
   ----------------   ----------------
1) 00:0C:29:04:FA:D6  Sep 15 17:33:25
Select (1), (r)escan, (q)uit or (?) [1]: 1
Enter an IP address for the NIC.
The IP address must be unique and it must be on the same
subnet as the cluster interconnect NICs for the other
nodes. If you want, this program can scan the subnet for
IP addresses that seem to be available. It does this by
pinging every address in the subnet, which can take awhile
for larger subnets. You can safely skip this feature if you
know one or more available IP addresses.
Do you want to scan for available IP addresses (y/n/?) [n]:
Enter an IP address or (?): 172.16.0.201
Select (P)XE or (E)therboot as the network boot protocol for
this node. PXE is an Intel standard for network booting, and
many professional grade NICs have a PXE implementation pre-
installed on them. You can probably enable PXE with your BIOS
configuration tool. If you do not have a NIC with PXE, you can
make use of open-source project Etherboot, which lets you generate
a floppy or ROM image for a variety of different NICs.
Select (P)XE, (E)therboot or (?) [E]: P
The nodename should be unique in the cluster and it should
resolve to one of this node's IP addresses, so that client
NFS can work correctly. The nodename can resolve to either
the IP address you configured above for the interconnect,
or to one of external IP addresses that you might configure
below. The nodename can resolve to the IP address either in
DNS or in the cluster's /etc/hosts file.
The nodename is stored in /etc/nodename, which is a context-
dependent symlink (CDSL). In this case, the context is node
number, which means each node you add will have it's own view
of /etc/nodename containing its own hostname. To learn more
about CDSLs, please see /usr/share/doc/openssi/cdsl.
Enter a nodename or (?): host2
Do you want this node to be a root failover node? It _must_
have access to the root filesystem on a shared disk in order
to answer yes. If you answer yes, then this node can boot
first as a root node, so you should configure it with a local
boot device. This is done after this node joins the cluster
and is described in the installation instructions.
Enable root filesystem failover to this node (y/n/?) [n]: y
Please make sure that the interface with MAC address 00:0C:29:04:FA:D6
is not configured via /etc/network/interfaces
Press Enter to acknowledge:
The following configuration has been entered:
Node number: 2
IP address: 172.16.0.201
Network hardware addr: 00:0C:29:04:FA:D6
Network boot protocol: PXE
Local boot device: none
Nodename: host2
Potential initnode: Yes
(W)rite new configuration, (R)econfigure, or (Q)uit without writing [W]:
Creating /cluster/node2
Node 2 is a master for failover
Remember to configure a boot device
The configuration changes have been saved.
Rebuilding the boot materials
/tmp/initrd.VD4r6q: 70.8%
Synchronizing network boot images: succeeded
Stopping DHCP server: dhcpd3.
Starting DHCP server: dhcpd3.
Synchronizing local boot devices
syncing /dev/sda2 on node 1: succeeded
All new nodes are allowed to join the cluster. If you wish to setup a
local boot device for a node, wait until it's fully up, create a Linux
filesystem on one of its local disks using fdisk and mkfs, then run
ssi-chnode to configure the filesystem as a local boot device.
host1:~#
Rebooting the first node
If everything still works after a reboot, we know that we can reboot the first node in case of trouble and be sure it will come back up.
Before rebooting, make sure inetd.real starts at boot with:
host1:~# update-rc.d inetd.real defaults
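Before rebooting, it can be worth checking that the update-rc.d command above really created a start link. This is a minimal sketch, assuming the classic Debian SysV layout where "defaults" creates S??name links in /etc/rc2.d (the default runlevel); the function name has_start_link is made up for this guide.

```shell
#!/bin/sh
# has_start_link: succeed if a start link (S<two digits><name>) for the
# given init script exists in a runlevel directory.
# $1 = init script name, $2 = runlevel directory (assumed default:
# /etc/rc2.d, the default runlevel on Debian).
has_start_link() {
    name=$1
    rcdir=${2:-/etc/rc2.d}
    for link in "$rcdir"/S[0-9][0-9]"$name"; do
        # If the glob matched nothing, $link is the literal pattern
        # and both tests fail.
        if [ -e "$link" ] || [ -L "$link" ]; then
            return 0
        fi
    done
    return 1
}

# Usage on node 1 (not run here):
#   has_start_link inetd.real || echo "inetd.real will not start at boot"
```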
Now reboot and the first node should come up nicely.
SNAPSHOT: 0009 Configuration for node 2 ready
Booting the second node
We are now ready to boot the second node into the cluster. Again, the network cards try to get a DHCP address to boot over the network. The first network card shouldn't get an address (since it's on the bridged network). The second card however (on the interconnect network) should get an IP address and should boot over the network with TFTP.
The second node boots up and automagically joins the cluster. Node 1 will report something like:
nm_add_node: Node 2 added
INIT: +++ nodeup completed on node 2
3.5. The program will now do all the work to admit the node into the cluster. Wait for the new node to join. A ``nodeup'' message on the first node's console will indicate this. You can confirm its membership with the cluster command:
# cluster -v
If the new node is hung searching for the DHCP server, try manually restarting the ``dhcpd'' on the cluster's init node where DHCP server is running:
# invoke-rc.d dhcp restart
NOTE: It has been observed that some time tftp server will not respond to request once it already responded to client's request. So restart the inetd on the init node if client could get IP address, but could not continue booting.
# invoke-rc.d inetd restart
If New node is still hung, try rebooting the node. It would come up.
Let's check if the cluster works now:
host1:~# cluster -v
1: UP
2: UP
host1:~#
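The same check can be scripted, for example before taking a snapshot. This is a minimal sketch that assumes cluster -v prints one "<node>: <state>" line per node, as in the output above; the function name all_nodes_up is made up for this guide.

```shell
#!/bin/sh
# all_nodes_up: read "cluster -v" style output on stdin and succeed only
# if at least one node is listed and every listed node is UP.
# Assumes lines of the form "<node number>: <STATE>".
all_nodes_up() {
    awk -F': *' '
        /^[0-9]+:/ {
            seen++
            state = $2
            sub(/[ \t\r]+$/, "", state)   # ignore trailing whitespace
            if (state != "UP") bad++
        }
        END { exit !(seen > 0 && bad == 0) }
    '
}

# Usage on the cluster (not run here):
#   cluster -v | all_nodes_up && echo "all nodes are up"
```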
Setting up the boot information on the second node
SNAPSHOT: 0009 Configuration for node 2 ready
SNAPSHOT: 1000 Node 2 hardware is configured
3.6. The following steps tell you how to configure the new node's hardware, including its swap space, local boot device (optional unless configured for root failover), and external NICs. Sorry for the complexity of some of the steps. The Debian installer automates most of it for you during the installation of the first node, but it's not much help when adding new nodes.
3.6.1. Configure the new node with one or more swap devices using fdisk (or a similar tool) and mkswap:
# onnode <node_number> fdisk /dev/hda (device name)
partition disk
To be consistent with the first node, we partition the second node in the same way (with sfdisk):
/dev/sda1 512MB swap
/dev/sda2 256MB /boot
/dev/sda3 rest /
host1:~# sfdisk -d /dev/sda | onnode 2 sfdisk /dev/sda
Checking that no-one is using this disk right now ...
OK
Disk /dev/sda: 522 cylinders, 255 heads, 63 sectors/track
sfdisk: ERROR: sector 0 does not have an msdos signature
/dev/sda: unrecognized partition
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0
   Device Boot    Start       End   #sectors  Id  System
/dev/sda1            63    996029     995967  82  Linux swap
/dev/sda2        996030   1494044     498015  83  Linux
/dev/sda3       1494045   8385929    6891885  83  Linux
/dev/sda4             0         -          0   0  Empty
Warning: no primary partition is marked bootable (active)
This does not matter for LILO, but the DOS MBR will not boot this disk.
Successfully wrote the new partition table
Re-reading the partition table ...
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
host1:~# fdisk -l /dev/sda
Disk /dev/sda: 4294 MB, 4294967296 bytes
255 heads, 63 sectors/track, 522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1        62    497983+  82  Linux swap
/dev/sda2            63        93    249007+  83  Linux
/dev/sda3            94       522   3445942+  83  Linux
host1:~# onnode 2 fdisk -l /dev/sda
Disk /dev/sda: 4294 MB, 4294967296 bytes
255 heads, 63 sectors/track, 522 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1        62    497983+  82  Linux swap
/dev/sda2            63        93    249007+  83  Linux
/dev/sda3            94       522   3445942+  83  Linux
host1:~#
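As a sanity check, the sector counts in the sfdisk output above can be converted to sizes. This is a small sketch using only the numbers printed above; the function name sectors_to_mib is made up for this guide.

```shell
#!/bin/sh
# sectors_to_mib: convert a count of 512-byte sectors to whole MiB,
# rounded down. 2048 sectors of 512 bytes make one MiB.
sectors_to_mib() {
    echo $(( $1 / 2048 ))
}

# Sector counts copied from the "#sectors" column of the sfdisk output.
for entry in sda1:995967 sda2:498015 sda3:6891885; do
    name=${entry%%:*}
    sectors=${entry#*:}
    echo "/dev/$name: $(sectors_to_mib "$sectors") MiB"
done
```

This reports 486 MiB for the swap partition, 243 MiB for /dev/sda2 and 3365 MiB for /dev/sda3, so the partitions are a little smaller than their nominal 512MB and 256MB sizes.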
We continue with the swapspace
# onnode <node_number> mkswap /dev/hda3 (device partition name)
host1:~# onnode 2 mkswap /dev/sda1
Setting up swapspace version 1, size = 509927 kB
host1:~#
next:
Add the device name(s) to the file /etc/fstab, as documented in README.fstab.
We edit /etc/fstab and change
/dev/sda1 none swap sw,node=1 0 0
into
/dev/sda1 none swap sw,node=* 0 0
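The same edit can be made without an editor. This is a minimal sketch, assuming the swap entry looks like the line above (a "node=1" option on a line whose type field is swap); the function name swap_on_all_nodes is hypothetical. It only filters text, so the result should be reviewed before it replaces /etc/fstab.

```shell
#!/bin/sh
# swap_on_all_nodes: filter fstab text from stdin to stdout, changing
# "node=1" to "node=*" on swap entries only. Comment lines and all
# other entries pass through unchanged.
swap_on_all_nodes() {
    sed -e '/^[[:space:]]*#/b' \
        -e '/[[:space:]]swap[[:space:]]/s/node=1\([,[:space:]]\)/node=*\1/'
}

# Usage on node 1 (not run here):
#   swap_on_all_nodes < /etc/fstab > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new    # review before copying it back
```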
Either reboot the node or manually activate the swap device(s) with the swapon command:
# onnode <node_number> swapon <swap_device>
We now reboot to see if everything still works. It doesn't. The virtual machine of node 2 hangs when trying to boot. I went into the virtual BIOS (grab input with Ctrl+G and press F2) and changed the boot order so that the second network card (the interconnect one) is tried first. Node 2 now boots, but the kernel panics... (What happened is that, after I made the partitions, made the swap and changed /etc/fstab on host1, I accidentally reverted node 2 to its previous state. This means it didn't know about the swap partition when it rebooted. Maybe that explains things?)
Reverting to the previous states (0009 and 1000) and doing the next steps again solved the problem. Both node1 and node2 boot up and the cluster works.
SNAPSHOT: 0010 partitions on node 2 made, swap, fstab edited
SNAPSHOT: 1001 partitions created, swap made, boot from network in bios
3.6.2. If you have enabled root failover you MUST configure a local boot device on the new node. Otherwise, configuring a local boot device is optional. If you are going to configure a local boot device, it is highly recommended that the boot device have the same name as the first node's boot device. Remember that we assumed at the beginning of these instructions that the first node's boot device is located on the first partition of the first drive (e.g., /dev/hda1 or /dev/sda1). Assuming you have already created a suitable partition with fdisk, format your boot device with an ordinary Linux filesystem, such as ext3:
# onnode <node_number> mkfs.ext3 /dev/hda1
For us, /boot/ is /dev/sda2
host1:~# onnode 2 mkfs.ext3 /dev/sda2
mke2fs 1.35 (28-Feb-2004)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
62496 inodes, 249004 blocks
12450 blocks (5.00%) reserved for the super user
First data block=1
31 block groups
8192 blocks per group, 8192 fragments per group
2016 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 27 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
host1:~#
Now run ssi-chnode anywhere in the cluster (no need to use onnode with this command). Select the new node, enter its local boot device name, and ssi-chnode will copy over the necessary files.
host1:~# ssi-chnode
Select a node number (1,2) [1]: 2
Select (P)XE, (E)therboot or (?) [P]:
Enter a new boot device: /dev/sda2
The following configuration has been entered:
Node number: 2
IP address: 172.16.0.201
Network hardware addr: 00:0C:29:04:FA:D6
Network boot protocol: PXE
Local boot device: /dev/sda2
Potential initnode: Yes
(W)rite new configuration, (R)econfigure, or (Q)uit without writing [W]:
The configuration changes have been saved.
Do you wish to configure another node (y/n) [n]:
Rebuilding the boot materials
/tmp/initrd.hn5kBG: 70.8%
Synchronizing network boot images: succeeded
Stopping DHCP server: dhcpd3.
Starting DHCP server: dhcpd3.
Synchronizing local boot devices
syncing /dev/sda2 on node 1: succeeded
syncing /dev/sda2 on node 2: succeeded
Node 2 has been updated.
host1:~#
Finally, you need to manually install a GRUB boot block on the new node:
# onnode <node_number> grub --device-map=/boot/grub/device.map
grub> root (hd0,0)
grub> setup (hd0)
grub> quit
In our case, the GRUB files are on /dev/sda2 (our /boot), which is the second partition on the first hard disk. GRUB starts counting from 0, so we need (hd0,1):
GNU GRUB version 0.95 (640K lower / 3072K upper memory)
[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename. ]
grub> root (hd0,1)
Filesystem type is ext2fs, partition type 0x83
grub> set
Possible commands are: setkey setup
grub> setup (hd0)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... yes
Checking if "/grub/stage2" exists... yes
Checking if "/grub/e2fs_stage1_5" exists... yes
Running "embed /grub/e2fs_stage1_5 (hd0)"... 16 sectors are embedded.
succeeded
Running "install /grub/stage1 (hd0) (hd0)1+16 p (hd0,1)/grub/stage2 /grub/menu.lst"... succeeded
Done.
grub>
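The counting rule behind root (hd0,1) can be written down as a small helper. This is a sketch under a simplifying assumption: disks are named sdX or hdX and the letter order matches the BIOS disk order, which holds for the single-disk virtual machines in this guide. On other systems the real mapping is whatever /boot/grub/device.map says. The function name grub_device is made up for this guide.

```shell
#!/bin/sh
# grub_device: translate a Linux partition name such as /dev/sda2 into
# GRUB (legacy) notation such as (hd0,1). Both the disk number and the
# partition number start at 0 in GRUB.
grub_device() {
    dev=${1#/dev/}
    case $dev in
        [hs]d[a-z][1-9]*) ;;
        *) echo "unsupported device: $1" >&2; return 1 ;;
    esac
    part=${dev#???}                       # digits after "sda"
    case $part in
        *[!0-9]*) echo "unsupported device: $1" >&2; return 1 ;;
    esac
    letter=$(printf '%s' "$dev" | cut -c3)
    # Position of the drive letter in the alphabet, starting at 0.
    disk=$(echo abcdefghijklmnopqrstuvwxyz |
           awk -v l="$letter" '{ print index($0, l) - 1 }')
    echo "(hd${disk},$((part - 1)))"
}

# Usage:
#   grub_device /dev/sda2
```

For /dev/sda2 this prints (hd0,1), the value used in the GRUB session above.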
Now we can reboot. Make sure to enter the BIOS on node 2 and change the boot order so it boots from the hard disk first!
3.7. Repeat above steps at any time to add other nodes to the cluster.
We have no more nodes at this time.
3.8. Enjoy your new OpenSSI cluster!!!
To learn more about OpenSSI, please read Introduction-to-SSI.
One of the first things you can try is running the demo Bruce, Scott and I have done at recent trade shows. It illustrates some of the features of OpenSSI clusters. You can find it here, along with older demos:
http://OpenSSI.org/#demos
Recently, a scalable LTSP server has been tested on an OpenSSI cluster. To learn more about how to set this up, please see README.ltsp.
If you have questions or comments that are not addressed on the website, do not hesitate to send a message to the user's discussion forum:
ssic-linux-users@lists.sf.net
That's it ! The cluster works now. The next step will be to introduce DRBD on it.
SNAPSHOT: 0011 Node 2 should boot of its own harddisk now
SNAPSHOT: 1002 Grub installed and rebooted
Common problems and solutions
If it doesn't work...
Make sure you read the instructions correctly :) I copy pasted the output of the commands so the commands appear at least twice in this guide, to avoid mistakes.
Keyboard doesn't do anything after reboot
If you are in VMWare, did you disable USB support? If so, try enabling it again: VMWare seems to use a USB keyboard.
After installation of the first node, when you reboot it, the second node can no longer find the TFTP server
This is because the first node didn't start inetd. Fix this with :
host1:~# update-rc.d inetd.real defaults
The correct and permanent fix is to add tftpd to xinetd, or to run 'apt-get remove xinetd'.
Kernel on node 2 panics with a message "Kernel panic - not syncing: Lost CLMS master while trying to join the cluster!"
When I first tried to boot node 2, its kernel panicked. I didn't know why. I reverted my virtual machines back to the previous stable state and tried again: it worked. But! After rebooting the first node, it failed again... It appears to be a mixed problem. You need to make inetd.real start at boot as described in the previous solution and then reboot. If you start it manually without rebooting, the second node panics for some reason.
So the solution:
host1:~# update-rc.d inetd.real defaults
Then reboot the first node, and try to boot the second node again.
node 1 hangs on boot
If you see this:
...
Configuring cluster
Running pre-root cluster initialization
Searching for an existing root node...
Then it's normal that node 1 is hanging. Just wait a bit longer :)

