High Availability cluster – DRBD + Apache

A short howto for configuring a two-node, highly available server cluster. This will be an active-standby configuration whereby a local filesystem is mirrored to the standby server in real time (by DRBD).

The article below shows how to configure the underlying components (DRBD, a filesystem on top of it, and DRBDLinks), with a simple integration into the system init scripts and manual failover.

Failover to the other node can be automated (e.g. using Heartbeat / Pacemaker) or performed manually by the administrator; automating it is beyond the scope of this relatively short article.

Hardware Overview

Two Dell SC1435 servers. There’s nothing overly special about these – they have two network ports each, and each has dual mirrored SATA hard disks.

The first server has dual Western Digital 1Tb disks. The second server has dual 1Tb Seagate disks (mixing manufacturers avoids a flaw in one disk model becoming a single point of failure across both servers).

The second network card (eth2) in each server is connected to the other through a cross-over ethernet cable and has a 10.0.0.x network address; this is to reduce external dependencies (switches etc!).
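
For reference, on node01 the dedicated replication link might be configured in /etc/network/interfaces along these lines (the /24 netmask is an assumption – node02 gets 10.0.0.2) :

# /etc/network/interfaces (node01) - dedicated link to the other node
auto eth2
iface eth2 inet static
    address 10.0.0.1
    netmask 255.255.255.0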

192.168.0.6 is to be used as a ‘shared’ IP address which could be in service on either server node (but not both).

Linux Installation

Debian Wheezy (7).

Additional packages :

  • drbd8-utils
  • drbdlinks
  • apache2
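
On Wheezy these all come straight from the standard repositories :

apt-get update
apt-get install drbd8-utils drbdlinks apache2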

Disk Partitioning / RAID configuration

We want :

  • /dev/md0 to be mounted as ‘/’ and independent on each server. Comprised of /dev/sda1 and /dev/sdb1; approximately 100Gb in size (see the /proc/mdstat output below).
  • /dev/md1 (/dev/sda2 and /dev/sdb2, approx 800Gb) will be replicated from node01 to node02.
  • /dev/drbd1 (see below) will be created on top of /dev/md1 on each server. This will be the mounted block device (i.e. it has the /srv filesystem on it).

Some of this partitioning / filesystem configuration can be done within the Debian Installer at installation time –

  1. Partition the disks using fdisk or equivalent, creating two partitions on each (the first, smaller one is intended for the root filesystem, while the second will become md1 / drbd1).
  2. Configure software RAID on the server so both disks are mirrors of each other (i.e. /dev/md0 is made of /dev/sda1 and /dev/sdb1 in RAID1, and /dev/md1 is made of /dev/sda2 and /dev/sdb2 in RAID1).
  3. Leave /dev/md1 unmounted (“Not in use”) in the Debian installer.
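
If you'd rather build the arrays by hand (outside the installer), the equivalent mdadm commands look roughly like this – the device names are assumptions, so double-check them against your own partitioning :

# mirror the root partitions, and the partitions destined for DRBD
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2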

After installation, check the contents of /proc/mdstat on both nodes – note that the number of blocks (i.e. the volume size) for what will be the DRBD replicated volume needs to be IDENTICAL on each server (else DRBD will moan – “The peer’s disk size is too small!”).

Personalities : [raid1]
md1 : active raid1 sda2[0] sdb2[1]
      878974784 blocks super 1.2 [2/2] [UU]
      bitmap: 1/7 pages [4KB], 65536KB chunk

md0 : active raid1 sda1[0] sdb1[1]
      97589248 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

If the device (i.e. md1) you wish to mirror over DRBD is not the same size on both servers, you’ll need to resize the larger to be the same as the other – using something like :

# remove the write-intent bitmap before shrinking the array
mdadm --grow /dev/md1 --bitmap none
# shrink the larger array to match the smaller node (the size is in 1K blocks)
mdadm --grow /dev/md1 --size=878974784
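
Once both nodes report the same size, the write-intent bitmap can be put back (optional, but it speeds up resyncs after an unclean shutdown) :

mdadm --grow /dev/md1 --bitmap internal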

DRBD Configuration

On each node, create /etc/drbd.d/srv.res which should look something like :

resource srv {
  on node01 {
    device    /dev/drbd1;
    disk      /dev/md1;
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on node02 {
    device    /dev/drbd1;
    disk      /dev/md1;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}
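
Note that the names after ‘on’ must match each machine’s hostname as reported by uname -n, otherwise drbdadm will refuse to touch the resource. A quick sanity check on each node :

uname -n            # should match one of the 'on <name>' entries above
drbdadm dump srv    # parses the configuration and prints the resource back out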

It’s probably a good idea to increase the default synchronisation rate from 1M/s to something more realistic like 80M/s – so, edit /etc/drbd.d/global_common.conf and add ‘rate 80M’ into the syncer { … } section, so it looks a bit like :

 ..... other stuff ....
    syncer {
        rate 80M;
    }
    .... other stuff ...
    startup {
        wfc-timeout 10;
        degr-wfc-timeout 10;
        become-primary-on node01;
    }
    .... other stuff ....

(The ‘startup’ items above control DRBD’s behaviour on system bootup – to stop it hanging and waiting for input and also to make it favour/fall back to ‘node01’).

Next, initialise DRBD on the primary node (node01) using the commands below (if the DRBD metadata hasn’t been created yet, run ‘drbdadm create-md srv’ on both nodes first) :

  • drbdadm up srv
  • drbdadm -- --overwrite-data-of-peer primary srv

And on the secondary node, just do :

  • drbdadm up srv

You should now see the underlying volume syncing –

root@node01:/# cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
srcversion: F937DCB2E5D83C6CCE4A6C9

1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:39998396 nr:0 dw:0 dr:40008188 al:0 bm:2440 lo:1 pe:182 ua:128 ap:0 ep:1 wo:f oos:838961108
[>....................] sync'ed:  4.6% (819296/858344)M
finish: 2:29:09 speed: 93,728 (22,104) K/sec

So, in our case, it’s going to take about 2.5 hours to mirror an ~800Gb volume.
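
If the initial sync is crawling along, a syncer rate change in global_common.conf can (as far as I’m aware) be applied to a running resource without restarting anything :

drbdadm adjust srv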

While DRBD is syncing, the device is usable on the primary node (albeit perhaps a little slower than normal).

So –

  1. Create the filesystem – mkfs.ext4 -L SRV-DRBD /dev/drbd1
  2. Mount filesystem – mount /dev/drbd1 /srv
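
Don’t be tempted to add /dev/drbd1 to /etc/fstab – only the node that is currently DRBD Primary can mount it, so the standby would complain at boot. Mounting is handled manually (or by the control script later in this article). To confirm everything is in place on the primary :

df -h /srv
cat /proc/drbd    # should show ro:Primary/Secondary on the active node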

DRBDLinks

A common problem with active-standby clusters is that software updates on the standby node become more difficult, because :

  1. The service can’t run on the standby node, as its data files are in use on the active node and aren’t accessible on the standby,
  2. The standby node needs access to the configuration used on the active node (e.g. the shared IP address), and
  3. The configuration files need to be kept in sync between the nodes over time.

Because of #2, it’s common to store the configuration files on the shared/replicated volume. However, this then breaks any system init-scripts on the standby node (as they won’t be able to access said configuration files).

One solution to this is the ‘drbdlinks’ program.

Once installed, edit /etc/drbdlinks.conf – mine looks a bit like the following :

.... some stuff ...
mountpoint('/srv')
link('/etc/apache2', '/srv/etc-apache2')
link('/var/www', '/srv/var-www')
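
As far as I know drbdlinks only manages the symlinks – it won’t copy anything onto the shared volume for you – so the shared copies need creating on the primary node (with /srv mounted) before the first ‘drbdlinks start’ :

cp -a /etc/apache2 /srv/etc-apache2
cp -a /var/www /srv/var-www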

Now, we can use /usr/sbin/drbdlinks start to get :

root@node01:/srv# ls -ald /etc/apache2*
lrwxrwxrwx 1 root root   16 Dec 30 14:36 /etc/apache2 -> /srv/etc-apache2
drwxr-xr-x 7 root root 4096 Dec 30 12:48 /etc/apache2.drbdlinks

(In English: drbdlinks detects that the DRBD volume is mounted on /srv, and if so it “fixes up” the configuration directories to use the copies stored on the shared volume, via symlinks. Hence /etc/apache2 now points to /srv/etc-apache2.)

If drbdlinks is stopped (/usr/sbin/drbdlinks stop), then the symlinks are removed, and the affected directories renamed back to their original names – i.e. /etc/apache2.drbdlinks becomes /etc/apache2. This now allows ‘apache2’ to be upgraded through apt-get/dpkg as normal on the standby node without any awkward dependencies coming into play.

Next Steps

  1. Use a cluster stack such as Pacemaker with Corosync (or Heartbeat) to set up service monitoring with automatic failover, or
  2. Set up a manual failover routine and handle monitoring through e.g. monit. Modify the system startup scripts on both nodes (disabling auto-startup for relevant services) and configure a manual migration process (like the example below).

Manual control script

In my case, I’ve created the following script (/srv/service.sh) to live on the shared/mirrored volume, and modified /etc/rc.local so it tries to mount the DRBD filesystem and execute the service script on the primary (node01) server.

#!/bin/bash

# Start/stop necessary services. 
# Note, service startup order normally matters.
# This could be called from /etc/rc.local on the primary node.

if [ "$#" -eq 0 ]; then
    echo "Usage: $0 start|stop " >/dev/stderr
    exit 1
fi

set -e
set -x

if [ "$1" = 'start' ]; then
    /usr/sbin/drbdlinks start
    # bring up our shared IP address (this will be on the active cluster node)
    ifconfig eth0:1 192.168.0.6 netmask 255.255.255.0 up
    for thing in postgresql apache2
    do
        [ -x /etc/init.d/$thing ] && /etc/init.d/$thing start
    done
fi

if [ "$1" = 'stop' ]; then
    for thing in apache2 postgresql
    do
        [ -x /etc/init.d/$thing ] && /etc/init.d/$thing stop
    done

    ifconfig eth0:1 down || true
    /usr/sbin/drbdlinks stop
fi
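
For completeness, the /etc/rc.local addition on node01 mentioned above might look a bit like this (a sketch – error handling is deliberately minimal) :

# in /etc/rc.local on node01, before the final 'exit 0'
# mount the replicated volume and start services if this node is DRBD Primary
if mount /dev/drbd1 /srv ; then
    /srv/service.sh start
fi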

Issues

The operating systems need to be kept in sync (packages, package versions etc) to ensure that if a failover does take place then the standby node is capable of running all services. Using something like Ansible may be a good idea at this point.

Having a manual failover process tends to simplify things (split brain is no longer an issue, hardware fencing/STONITH aren’t required etc) at the cost of failover not being automatic/seamless.
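
For reference, a manual failover from node01 to node02 with the pieces above boils down to something like the following (a sketch – it assumes nothing is still holding /srv open on the old active node) :

# on the currently active node (e.g. node01):
/srv/service.sh stop        # stops services, drops the shared IP, stops drbdlinks
umount /srv
drbdadm secondary srv

# on the node taking over (e.g. node02):
drbdadm primary srv
mount /dev/drbd1 /srv
/srv/service.sh start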
