
[Read also: HA Cluster with DRBD file sync, which adds file synchronization between the cluster nodes]
[UPDATED on March 7, 2017: tested the configuration also with Ubuntu 16.04 LTS]
This post shows how to configure a basic High Availability cluster in Ubuntu using Corosync (cluster manager) and Pacemaker (cluster resource manager), both available in the Ubuntu repositories (tested on Ubuntu 14.04 and 16.04 LTS). More information regarding Linux HA can be found here.
The goal of this post is to set up a freeradius service in HA. To do this we use two Ubuntu 14.04 or 16.04 LTS Server nodes, announcing a single virtual IP from the active cluster node. Notice that in this scenario each freeradius cluster instance is a standalone instance; I don’t cover application replication/synchronization between the nodes (rsync or shared disk via DRBD). Maybe I can do a new post in the future 🙂 [I did the post]
Convention:
- PRIMARY – the name of the primary node
- PRIMARY_IP – the IP address of the primary node
- SECONDARY – the name of the secondary node
- SECONDARY_IP – the IP address of the secondary node
- VIP – the IP announced from the master node of the cluster
First of all we install the needed packages
PRIMARY/SECONDARY# apt-get install pacemaker
PRIMARY# apt-get install haveged
and then we can start configuring Corosync, generating on the PRIMARY node the key to be shared between the cluster nodes (using the haveged package to provide entropy).
PRIMARY# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
[...]
Press keys on your keyboard to generate entropy (bits = 1000).
Writing corosync key to /etc/corosync/authkey.
Now we can remove the haveged package and copy the shared key from the PRIMARY to the SECONDARY node
PRIMARY# apt-get remove --purge haveged
PRIMARY# apt-get autoremove
PRIMARY# apt-get clean
PRIMARY# scp /etc/corosync/authkey user@SECONDARY:/tmp
SECONDARY# mv /tmp/authkey /etc/corosync
SECONDARY# chown root:root /etc/corosync/authkey
SECONDARY# chmod 400 /etc/corosync/authkey
We are now ready to configure both cluster nodes, telling Corosync about the cluster members, the binding IPs and so on. To do this edit /etc/corosync/corosync.conf on the PRIMARY and SECONDARY nodes and add a new section (nodelist) at the end of the file, as follows.
[Ubuntu 16.04] don’t add the “name: …” line in the nodelist section: the Corosync version shipped with 16.04 doesn’t support this directive and your cluster will not start. By default the node names are taken from the hostname (uname -n).
file: /etc/corosync/corosync.conf
[...]
totem {
    [...]
    interface {
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: <PRIMARY_IP or SECONDARY_IP based on the node>
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
[... end of file ...]
nodelist {
    node {
        ring0_addr: <PRIMARY_IP>
        name: primary      # node name (eg. primary) - DON'T ADD THIS LINE IN 16.04
        nodeid: 1          # node numeric ID (eg. 1)
    }
    node {
        ring0_addr: <SECONDARY_IP>
        name: secondary    # node name (eg. secondary) - DON'T ADD THIS LINE IN 16.04
        nodeid: 2          # node numeric ID (eg. 2)
    }
}
Now we configure corosync to use the Cluster Resource Manager Pacemaker. To do this create the new file /etc/corosync/service.d/pcmk with the following content
[Ubuntu 16.04] First create the /etc/corosync/service.d/ directory with the command # mkdir /etc/corosync/service.d/
file: /etc/corosync/service.d/pcmk
service {
name: pacemaker
ver: 1
}
Then enable corosync by setting the START parameter to yes
file: /etc/default/corosync
START=yes
Corosync is ready to be started. The start and verify commands follow:
PRIMARY/SECONDARY# service corosync start
[...]
PRIMARY/SECONDARY# service corosync status
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since [...]
[...]
PRIMARY/SECONDARY# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(<PRIMARY_IP>)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.740229595.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.740229595.ip (str) = r(0) ip(<SECONDARY_IP>)
runtime.totem.pg.mrp.srp.members.740229595.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.740229595.status (str) = joined
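As an additional check (not part of the original steps), corosync-cfgtool shows the ring status on each node; each ring should be reported as active with no faults:
PRIMARY/SECONDARY# corosync-cfgtool -s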
Now it’s time to configure pacemaker, our Cluster Resource Manager.
We enable pacemaker at boot time, setting the service priority to 20 (corosync has 19), then we start the service
PRIMARY/SECONDARY# update-rc.d pacemaker defaults 20 01
PRIMARY/SECONDARY# service pacemaker start
[...]
PRIMARY/SECONDARY# service pacemaker status
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/lib/systemd/system/pacemaker.service; enabled; vendor preset: enabled)
   Active: active (running) since [...]
[...]
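[Ubuntu 16.04] since corosync and pacemaker ship native systemd units, you can alternatively enable them at boot with systemctl instead of the update-rc.d step above (a possible alternative, not in the original 14.04 procedure):
PRIMARY/SECONDARY# systemctl enable corosync
PRIMARY/SECONDARY# systemctl enable pacemaker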
All the services are (hopefully) in the right state and we can check with the crm utility.
[Ubuntu 14.04] the node names will be the ones defined in the file /etc/corosync/corosync.conf
[Ubuntu 16.04] the node names will be taken from the hostname (uname -n)
PRIMARY/SECONDARY# crm status
Last updated: [...]
Last change: [...] via crm_node on primary
Stack: corosync
Current DC: primary (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
0 Resources configured
Online: [ primary secondary ]
We see both nodes (primary and secondary) online, with the numeric ID of the current node shown in the Current DC line.
Now that the cluster infrastructure is ok we do some fine tuning:
- stonith disabled: we avoid automatic fencing (killing) of cluster nodes, which is useless in a two-node cluster;
- quorum policy ignored: in a two-node cluster we want the cluster up & running even with a single node.
PRIMARY# crm configure property stonith-enabled=false
PRIMARY# crm configure property no-quorum-policy=ignore
PRIMARY/SECONDARY# crm configure show
node $id="1" primary
node $id="2" secondary
property $id="cib-bootstrap-options" \
dc-version="1.1.10-42f2063" \
cluster-infrastructure="corosync" \
stonith-enabled="false" \
no-quorum-policy="ignore"
We are ready to add resources (Resource Agents) to pacemaker and, as we said before, we will add an IP address (VIP) and the freeradius system service (we need to install it first)
PRIMARY/SECONDARY# apt-get install freeradius
A Resource Agent is “a standardized interface for a cluster resource. It translates a standard set of operations into steps specific to the resource or application, and interprets their results as success or failure.” (have a look here for more information).
We can use two kinds of Resource Agents:
- LSB: those found in the /etc/init.d/ directory and provided by the OS; freeradius will be one of these;
- OCF: an extension of the LSB resources; specific agents that can also be downloaded and installed from the web. The VIP will be one of these.
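If you want to browse the Resource Agents available on your nodes (and the parameters they accept), the crm shell can list them; these commands are optional and just for exploration:
PRIMARY# crm ra list lsb                      # LSB agents, i.e. the scripts in /etc/init.d/
PRIMARY# crm ra list ocf heartbeat            # OCF agents from the heartbeat provider
PRIMARY# crm ra info ocf:heartbeat:IPaddr2    # parameters of the agent used below for the VIP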
First we configure the VIP, which is an OCF resource called IPaddr2 (bound to the eth0 interface)
PRIMARY# crm configure primitive vip1 ocf:heartbeat:IPaddr2 params ip="<VIP>" nic="eth0" op monitor interval="10s"
PRIMARY# crm configure show
node $id="1" primary
node $id="2" secondary
primitive vip1 ocf:heartbeat:IPaddr2 \
    params ip="<VIP>" nic="eth0" \
    op monitor interval="10s" \
    meta target-role="Started"
[...]
PRIMARY# crm status
Last updated: [...]
Last change: [...] via cibadmin on primary
Stack: corosync
Current DC: primary (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
1 Resources configured
Online: [ primary secondary ]
 vip1   (ocf::heartbeat:IPaddr2):   Started primary
PRIMARY#
The VIP (resource vip1) is started on the primary node and we can check this directly from the nodes
PRIMARY# ip addr show
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    [...]
    inet <PRIMARY_IP> brd <PRIMARY_BROADCAST> scope global eth0
       valid_lft forever preferred_lft forever
    inet <VIP>/32 brd <VIP_BROADCAST> scope global eth0
       valid_lft forever preferred_lft forever
[...]
SECONDARY# ip addr show
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    [...]
    inet <SECONDARY_IP> brd <SECONDARY_BROADCAST> scope global eth0
       valid_lft forever preferred_lft forever
[...]
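As an extra check (not part of the original steps) you can verify from another host on the same subnet that the VIP answers and is served by the active node:
OTHERHOST# ping -c 3 <VIP>
OTHERHOST# arping -c 3 -I eth0 <VIP>    # shows which MAC answers for the VIP (the arping tool may need to be installed)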
On the network side we are now ok, let's proceed with the freeradius clustering. We add the LSB resource on the PRIMARY node
PRIMARY# crm configure primitive freeradius lsb:freeradius \
op monitor interval="5s" timeout="15s" \
op start interval="0" timeout="15s" \
op stop interval="0" timeout="15s" \
meta target-role="Started"
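If the freeradius resource fails to start, a quick sanity check (not part of the original procedure) is to verify that the init script behaves as an LSB-compliant script, i.e. that start/stop/status return the standard exit codes:
PRIMARY# /etc/init.d/freeradius status; echo $?    # expect 0 while the daemon is running
PRIMARY# /etc/init.d/freeradius stop
PRIMARY# /etc/init.d/freeradius status; echo $?    # expect 3 while the daemon is stopped
PRIMARY# /etc/init.d/freeradius start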
We have two resources configured (vip1 and freeradius) and the cluster could start each resource on a different node. So we clone the freeradius resource, allowing the freeradius service to be active on both nodes at the same time (in this particular case it is the right choice and makes cluster switches faster)
PRIMARY# crm configure clone freeradius-clone freeradius
PRIMARY# crm_mon
Online: [ primary secondary ]
 vip1   (ocf::heartbeat:IPaddr2):   Started primary
 Clone Set: freeradius-clone [freeradius]
     Started: [ primary secondary ]
Last tuning.
We define resource colocation, telling the cluster that one resource depends on the location of another resource. This configuration ensures that all the resources involved run on the master cluster node at the same time.
PRIMARY# crm configure colocation vip1-freeradius inf: vip1 freeradius-clone
Now we have the cluster up&running, enjoy!
Thank you, this is a good read. I tested this with 2 Ubuntu 16.04 servers. I was able to down the interface on the primary and it failed over to the other node with no problem. How do I restore the VIP on the primary after recovery?
Hi Jon,
you can test resource failover by setting a node in standby.
When a node goes into standby it exits from the cluster (just like a node failure) and all the resources go to the other node.
Then, after resource migration is complete, you can put the node back into the cluster (set it online).
You can do this on both nodes to test the failover.
On the active physical node
# crm node standby
# crm_mon
[ you will see the node in standby status with resources migrated to the other node ]
After resource migration
# crm node online
# crm_mon
[ you will see the node in online status and resources active on the other node ]
If you want to migrate resources from one node to the other, check this:
https://unix.stackexchange.com/questions/170986/pacemaker-migrate-resource-without-adding-a-prefer-line-in-config
This is how to migrate resources; look at the last answer, it's important to use the unmigrate command to avoid side effects.
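As a sketch based on that answer, using the resource and node names of this post:
# crm resource migrate vip1 primary
[ moves vip1, and through the colocation also freeradius, to the primary node ]
# crm resource unmigrate vip1
[ once the resource is running there, removes the temporary location preference created by migrate ]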
thank you very much for the insight!
Is it possible to load balance traffic to a virtual IP?
For example: I want to set up a SIP load balancer, instead of failing over to the standby node I would like to distribute to several nodes. Any protips appreciated
> Is it possible to load balance traffic to a virtual IP?
of course yes.
You need to configure a specific load balancing service like HAProxy http://www.haproxy.org/
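As a rough idea of what the HAProxy side could look like, here is a minimal haproxy.cfg sketch (my own example, not taken from the tutorial below; names and IPs are hypothetical, and it assumes SIP carried over TCP, since UDP-based SIP is usually handled by a dedicated SIP proxy):
file: /etc/haproxy/haproxy.cfg (sketch)
defaults
    mode tcp
    timeout connect 5s
    timeout client 1m
    timeout server 1m

frontend sip_in
    bind 10.0.0.10:5060          # the virtual IP managed by Pacemaker
    default_backend sip_nodes

backend sip_nodes
    balance roundrobin
    server sip01 10.0.0.11:5060 check
    server sip02 10.0.0.12:5060 check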
I found a tutorial that you can also use:
Set up HAProxy with Pacemaker/Corosync on Ubuntu 16.04 (gist: haproxy_corosync_pacemaker_ubuntu.md on GitHub)
A work-in-progress document that roughly describes an HAProxy cluster setup on Ubuntu 16.04 based on an example configuration with three nodes, haproxy01-test (10.0.0.11), haproxy02-test (10.0.0.12) and haproxy03-test (10.0.0.13), sharing the virtual IP 10.0.0.10 on the network 10.0.0.0/24 (replace the addresses with the ones used in your environment, and add or remove nodes in the nodelist as needed). It covers installing and configuring Pacemaker and Corosync (generating the key on the primary node and copying it to the secondary nodes, setting bindnetaddr to the corresponding network address and listing every node's IP in the nodelist), verifying the three-node cluster with sudo crm status, and keeping the haproxy configuration in sync on every node (see the gist for the complete commands and configuration files).
oh wow that’s awesome! Thanks man!
Great post. I am facing a few problems, here is my status:
service corosync staatus
Usage: /etc/init.d/corosync {start|stop|restart|force-reload}
root@www:~# service corosync status
* corosync is running
it just shows this, and corosync-cmapctl shows only one member:
corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(x.x.x.x)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
(My SSH port is 22 on one node and 2222 on the other)