Debian
Some hints and tips for OpenSSI Debian.
hwclock hangs on recent Dell computers
Problem: when installing OpenSSI 1.9 Debian on recent Dell machines (e.g. Optiplex GX280, Precision 3x0, and possibly PowerEdge x8xx), hwclock may hang.
Reason: hwclock hangs while trying to read the RTC.
Workaround: this is fixed in Debian sarge (by a timeout on the read), but OpenSSI has an older version of util-linux that doesn't include the fix. As a quick and dirty workaround, you can do this after installing OpenSSI and before rebooting:
dpkg-divert --add --divert /sbin/hwclock.real --rename /sbin/hwclock
cat <<EOF > /sbin/hwclock
#! /bin/sh
exec /sbin/hwclock.real --directisa "\$@"
EOF
chmod a=rx /sbin/hwclock
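The wrapper pattern used in this workaround can be tried safely outside /sbin. The following is only an illustrative sketch: make_wrapper is a hypothetical helper name, not part of OpenSSI or util-linux. It generates the same kind of script for an arbitrary program and option. Note that \$@ is escaped so that it is written literally into the generated script instead of being expanded while the here-document is read.

```shell
# make_wrapper WRAPPER REAL OPTION (hypothetical helper): write a script at
# WRAPPER that runs REAL with OPTION placed before the caller's arguments.
make_wrapper() {
    cat <<EOF > "$1"
#! /bin/sh
exec "$2" "$3" "\$@"
EOF
    chmod a=rx "$1"
}
```

For instance, make_wrapper /tmp/hwclock /sbin/hwclock.real --directisa would produce a script equivalent to the one above, but in /tmp (an example path only).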
RPC-based network services won't start
Problem: all RPC-based network services (portmap, nis, nfs...) refuse to start or hang at startup.
Reason: the clustername probably cannot be resolved.
Solution: be sure you set the clustername to a valid name. It should resolve to your CVIP address, either in DNS or in the cluster's /etc/hosts file, if you choose to configure a CVIP.
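As a quick check of the hosts-file case, the sketch below looks a name up in a hosts-format file. It assumes the clustername is listed in /etc/hosts rather than in DNS. hosts_lookup is a hypothetical helper, and any name or address you pass to it is your own, not something defined by OpenSSI.

```shell
# hosts_lookup NAME [FILE] (hypothetical helper): print the first address
# that FILE (default /etc/hosts) maps NAME to; fail if there is no entry.
hosts_lookup() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                      # ignore comments
        { for (i = 2; i <= NF; i++)             # field 1 is the address
              if ($i == name) { print $1; found = 1; exit } }
        END { exit !found }
    ' "${2:-/etc/hosts}"
}
```

If hosts_lookup prints nothing for your clustername, or prints an address other than your CVIP, the hosts entry is worth reviewing. This does not test DNS resolution.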
TFTP boot sometimes hangs
Problem: TFTP booting sometimes hangs after the DHCP address assignment.
Reason: the tftpd daemon provided with OpenSSI, which is launched by xinetd, does not seem to be very reliable.
Solution: replace tftpd with atftpd, and configure it to run stand-alone.
apt-get install atftpd
cat <<EOF > /etc/default/atftpd
USE_INETD=false
OPTIONS="--daemon --port 69 --retry-timeout 2 --no-multicast --maxthread 100 --verbose=5 /tftpboot"
EOF
File not found at PXE TFTP boot
Problem: PXE nodes display a 'File not found' error message at boot.
Reason: not all of the files required for PXE boot are installed by default.
Solution: install pxelinux.0 from the syslinux package.
apt-get install syslinux
cp /usr/lib/syslinux/pxelinux.0 /tftpboot
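To see which files are absent before a node reports 'File not found', a small check along these lines may help. It is a sketch only: missing_pxe_files is a hypothetical helper, and it looks for just the two PXE files named on this page (pxelinux.0 and pxelinux.cfg/default), so an empty result does not prove the TFTP root is complete.

```shell
# missing_pxe_files [ROOT] (hypothetical helper): list the PXE files named
# on this page that are absent from the TFTP root (default /tftpboot).
missing_pxe_files() {
    root=${1:-/tftpboot}
    for f in pxelinux.0 pxelinux.cfg/default; do
        [ -f "$root/$f" ] || echo "$root/$f"
    done
}
```

Run with no argument, it checks /tftpboot and prints one path per missing file.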
Apache2 runs in a localview context
Problem: when launched either at boot or in the recommended Debian way (with 'invoke-rc.d apache2 start'), apache2 runs in a localview context. This means that each thread can only see the processes running on the node it was launched from. This is a problem for web applications such as OpenSSI webView, which aims to list all the processes across the cluster. apache1.3 is not affected, as it runs in a 'defaultview' context.
Reason: the apache2 init.d startup script uses 'apache2ctl' to control the process, whereas the apache1.3 script calls the start-stop-daemon wrapper. This wrapper has been modified to reset the procview to default once the threads have been launched.
Workaround: to make apache2 run in a defaultview context, restart it on the selected nodes. Depending on the settings in rc.nodeinfo, use either 'onall' or 'onnode':
onall /etc/init.d/apache2 restart
or
onnode x /etc/init.d/apache2 restart
Node hangs at boot
Problem: the node hangs at boot with 'ERROR: Could not find the NIC used to add this node to the cluster'.
Reason: most probably, the module required for the network card is missing from the initrd image.
Solution: rebuild the initrd image as detailed in the installation guide: add the required network driver to /etc/mkinitrd/modules, then rebuild the ramdisk to include the driver and update the network boot images.
# mkinitrd -o <initrd image file> <kernel-version>
# ssi-ksync
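The first step, adding the driver to /etc/mkinitrd/modules, could be scripted roughly as follows. This is a sketch under assumptions: add_initrd_module is a hypothetical helper, it expects one bare module name per line, and it compares whole lines only, so an existing entry that carries module options would not be recognised as a duplicate.

```shell
# add_initrd_module MODULE [FILE] (hypothetical helper): append MODULE to
# FILE (default /etc/mkinitrd/modules) unless that exact line already exists.
add_initrd_module() {
    file=${2:-/etc/mkinitrd/modules}
    grep -qxF "$1" "$file" 2>/dev/null || echo "$1" >> "$file"
}
```

For example, add_initrd_module e1000 would add the Intel gigabit driver; e1000 is only an example, so substitute the module your NIC actually needs, then rebuild the initrd as shown above.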
Alternative reason: a clock change can also trigger this error, and it caught me over and over again. On my cluster I accepted a default time at installation (around 8pm), installed a few nodes, and then installed ntpdate to get the right time, which set the clock back to 5pm. Nodes I tried to add after that failed with this error. I did not make the connection to the time change until I read the mkinitrd script and looked inside the initrd: /etc/boottab had not been updated. The mkinitrd script (run by ssi-ksync) did not overwrite /tftpboot/initrd and /tftpboot/combined, because their 8pm timestamps made them look more up to date than the current time of 5pm.
Solution:
# rm -rf /tftpboot/initrd /tftpboot/combined
# ssi-ksync
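To confirm that stale timestamps are the cause before deleting anything, one option is to list files whose modification time lies in the future. The sketch below is an assumption-laden illustration, not part of ssi-ksync: future_files is a hypothetical helper that compares against a freshly created temporary file, which carries the current system time.

```shell
# future_files [DIR] (hypothetical helper): list regular files under DIR
# (default /tftpboot) modified later than the current system time.
future_files() {
    ref=$(mktemp) || return 1       # new file, stamped with "now"
    find "${1:-/tftpboot}" -type f -newer "$ref"
    rm -f "$ref"
}
```

If /tftpboot/initrd or /tftpboot/combined show up in the output, their timestamps are ahead of the clock, which matches the situation described above.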
Node hangs at boot, variant 2
Problem: the node hangs at boot at a 'Boot:' prompt after downloading files over TFTP.
Reason: the option 'prompt 1' is set in /tftpboot/pxelinux.cfg/default.
Solution: change this option to 'prompt 0' and the node should boot immediately.
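The edit can be made by hand or scripted. A minimal sketch, assuming the option is written in lowercase on a line of its own as quoted above; disable_pxe_prompt is a hypothetical helper name, and it writes a temporary .new file next to the configuration before replacing it.

```shell
# disable_pxe_prompt [FILE] (hypothetical helper): rewrite "prompt 1" as
# "prompt 0" in FILE (default /tftpboot/pxelinux.cfg/default).
disable_pxe_prompt() {
    cfg=${1:-/tftpboot/pxelinux.cfg/default}
    sed 's/^\([[:space:]]*\)prompt[[:space:]][[:space:]]*1[[:space:]]*$/\1prompt 0/' \
        "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"
}
```

All other lines, such as the timeout or default entries, are copied through unchanged.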
mkinitrd can't create initrd image
Problem: mkinitrd stops with an error message: 'All of your loopback devices are in use!'
Reason: mkinitrd needs to loopback-mount the initrd image, so it needs kernel loopback device support (BLK_DEV_LOOP).
Solution: if you use the kernel provided by OpenSSI Debian, just load the loop module:
# modprobe loop
If you use a custom kernel, rebuild it and be sure to include loopback support, either as a module or built in.
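For a custom kernel, it may help to check how the option is set before rebuilding. The sketch below reads a kernel configuration file; it assumes the configuration was saved as /boot/config-<kernel version>, which may not be true for a hand-built kernel, in which case pass the path to your .config instead. loop_support is a hypothetical helper name.

```shell
# loop_support [CONFIG] (hypothetical helper): report how CONFIG_BLK_DEV_LOOP
# is set in a kernel config file (assumed default: /boot/config-$(uname -r)).
loop_support() {
    cfg=${1:-/boot/config-$(uname -r)}
    case $(grep '^CONFIG_BLK_DEV_LOOP=' "$cfg" 2>/dev/null) in
        *=y) echo built-in ;;
        *=m) echo module ;;      # load it with: modprobe loop
        *)   echo missing ;;     # not set, or config file not found
    esac
}
```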
mkinitrd creates an empty initrd image
Problem: mkinitrd exits with non-zero exit code at initrd creation, and the resulting initrd image is empty.
Reason: readlink fails in the getroot function (line 443 in /usr/sbin/mkinitrd), because it doesn't use the kernel symlink resolution mechanism, and thus can't resolve CDSLs in /dev tree.
Workaround: edit /usr/sbin/mkinitrd and, at line 443, replace these lines:
device=$(readlink -f "$1")
eval "$(stat -c 'major=$((0x%t)); minor=$((0x%T))' "$device")"
with this:
device=${device:-$1}
eval "$(stat -c 'major=$((0x%t)); minor=$((0x%T))' "$device")"
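To see what the stat line computes, it can be run on its own against a device node. The sketch below wraps the same format string in a hypothetical helper, devnums, and assumes a stat that supports -c with %t and %T (as GNU coreutils stat does). It only illustrates the major/minor extraction and says nothing about how CDSLs are resolved.

```shell
# devnums PATH (hypothetical helper): print "major:minor" for the device
# node at PATH, using the same stat format string as mkinitrd.
devnums() {
    # stat prints the numbers in hex; $((0x..)) converts them to decimal
    eval "$(stat -c 'major=$((0x%t)); minor=$((0x%T))' "$1")"
    echo "$major:$minor"
}
```

Running devnums on the root device of a node, before and after the change, is one way to check that the patched function still yields sensible numbers.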

