Configure basic Linux High Availability Cluster in Ubuntu with Corosync

Jellyfish Cluster – photo by robin on flickr

[Read also: HA Cluster with DRBD file sync which adds file sync configuration between cluster nodes]

[UPDATED on March 7, 2017: tested the configuration also with Ubuntu 16.04 LTS]

This post shows how to configure a basic High Availability cluster in Ubuntu using Corosync (cluster manager) and Pacemaker (cluster resource manager), software available in the Ubuntu repositories (tested on Ubuntu 14.04 and 16.04 LTS). More information regarding Linux HA can be found here.

The goal of this post is to set up a freeradius service in HA. To do this we use two Ubuntu 14.04 or 16.04 LTS Server nodes, announcing a single virtual IP from the active cluster node. Notice that in this scenario each freeradius cluster instance is a standalone instance; I don’t cover application replication/synchronization between the nodes (rsync or shared disk via DRBD). Maybe I can do a new post in the future 🙂 [I did the post]

Convention:

  • PRIMARY – the name of the primary node
  • PRIMARY_IP – the IP address of the primary node
  • SECONDARY – the name of the secondary node
  • SECONDARY_IP – the IP address of the secondary node
  • VIP – the IP announced from the master node of the cluster

First of all we install the needed packages

PRIMARY/SECONDARY# apt-get install pacemaker
PRIMARY# apt-get install haveged

and then we can start configuring Corosync, generating on the PRIMARY node the key to be shared between the cluster nodes (using the haveged package).

PRIMARY# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
[...]
Press keys on your keyboard to generate entropy (bits = 1000).
Writing corosync key to /etc/corosync/authkey.

Now we can remove the haveged package and copy the shared key from the PRIMARY to the SECONDARY node

PRIMARY# apt-get remove --purge haveged
PRIMARY# apt-get autoremove
PRIMARY# apt-get clean
PRIMARY# scp /etc/corosync/authkey user@SECONDARY:/tmp
SECONDARY# mv /tmp/authkey /etc/corosync
SECONDARY# chown root:root /etc/corosync/authkey
SECONDARY# chmod 400 /etc/corosync/authkey
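
If you want to be sure the key arrived intact, an optional sanity check is to compare checksums and permissions on the two nodes:

PRIMARY# md5sum /etc/corosync/authkey
SECONDARY# md5sum /etc/corosync/authkey
PRIMARY/SECONDARY# ls -l /etc/corosync/authkey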

We are now ready to configure both cluster nodes, telling Corosync about the cluster members, binding IPs and other settings. To do this edit /etc/corosync/corosync.conf and add a new section (nodelist) at the end of the file on both the PRIMARY and SECONDARY nodes, as follows.

[Ubuntu 16.04] don’t add the “name: …” line in the nodelist section: the corosync version installed in 16.04 does not support this directive and your cluster will not start. By default the node names are taken from the host name (uname -n).
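
If you are on 16.04 and want the node names used in the examples below (primary and secondary), you can set the host names before starting the cluster services; a minimal sketch, the names being just the ones used in this post:

PRIMARY# hostnamectl set-hostname primary
SECONDARY# hostnamectl set-hostname secondary
PRIMARY/SECONDARY# uname -n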

file: /etc/corosync/corosync.conf
[...]
totem {
[...]
interface {
 # The following values need to be set based on your environment 
 ringnumber: 0
 bindnetaddr: <PRIMARY_IP or SECONDARY_IP based on the node>
 mcastaddr: 226.94.1.1
 mcastport: 5405
 }
}
[... end of file ...]

nodelist {
 node {
  ring0_addr: <PRIMARY_IP>
  name: primary    # node name (e.g. primary); DON'T ADD THIS LINE IN 16.04
  nodeid: 1        # node numeric ID (e.g. 1)
 }
 node {
  ring0_addr: <SECONDARY_IP>
  name: secondary  # node name (e.g. secondary); DON'T ADD THIS LINE IN 16.04
  nodeid: 2        # node numeric ID (e.g. 2)
 }
}

Now we configure Corosync to use the Cluster Resource Manager Pacemaker. To do this create the new file /etc/corosync/service.d/pcmk with the following content

[Ubuntu 16.04] First create the /etc/corosync/service.d/ directory with the command # mkdir /etc/corosync/service.d/

file: /etc/corosync/service.d/pcmk
service {
 name: pacemaker
 ver: 1
}

Then enable Corosync by setting the START parameter to yes

file: /etc/default/corosync
START=yes

Corosync is ready to be started. Start it and verify with the following commands

PRIMARY/SECONDARY# service corosync start
[...]
PRIMARY/SECONDARY# service corosync status
● corosync.service - Corosync Cluster Engine
 Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
 Active: active (running) since [...]
[...]
PRIMARY/SECONDARY# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(<PRIMARY_IP>) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.740229595.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.740229595.ip (str) = r(0) ip(<SECONDARY_IP>)
runtime.totem.pg.mrp.srp.members.740229595.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.740229595.status (str) = joined
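
Another optional check is the totem ring status: on each node corosync-cfgtool should report ring 0 as active with no faults.

PRIMARY/SECONDARY# corosync-cfgtool -s
Printing ring status.
[...]
 status = ring 0 active with no faults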

Now it’s time to configure pacemaker, our Cluster Resource Manager.
We enable pacemaker at boot time, setting the service start priority to 20 (corosync has 19), then we start the service

PRIMARY/SECONDARY# update-rc.d pacemaker defaults 20 01
PRIMARY/SECONDARY# service pacemaker start
[...]
PRIMARY/SECONDARY# service pacemaker status
● pacemaker.service - Pacemaker High Availability Cluster Manager
 Loaded: loaded (/lib/systemd/system/pacemaker.service; enabled; vendor preset: enabled)
 Active: active (running) since [...]
[...]

All the services are (hopefully) in the right state, and we can check with the crm utility.

[Ubuntu 14.04] the node names will be the ones defined in the file /etc/corosync/corosync.conf

[Ubuntu 16.04] the node names will be taken from the host names (uname -n)

PRIMARY/SECONDARY# crm status
Last updated: [...]
Last change: [...] via crm_node on primary
Stack: corosync
Current DC: primary (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
0 Resources configured

Online: [ primary secondary ]

We see both nodes (primary and secondary) online, with the numeric ID of the current node shown next to the current DC.

Now that the cluster infrastructure is ok we do some fine tuning:

  • stonith disable: we disable automatic fencing (forced power-off) of failed nodes, which in a two-node cluster like this is of little use;
  • quorum policy disable: in a two-node cluster we want the cluster up and running even with a single node.
PRIMARY# crm configure property stonith-enabled=false
PRIMARY# crm configure property no-quorum-policy=ignore
PRIMARY/SECONDARY# crm configure show
node $id="1" primary
node $id="2" secondary
property $id="cib-bootstrap-options" \
 dc-version="1.1.10-42f2063" \
 cluster-infrastructure="corosync" \
 stonith-enabled="false" \
 no-quorum-policy="ignore"

We are ready to add resources (Resource Agents) to Pacemaker and, as we said before, we will add an IP address (VIP) and the freeradius system service (which we need to install first)

PRIMARY/SECONDARY# apt-get install freeradius

A Resource Agent is “a standardized interface for a cluster resource. It translates a standard set of operations into steps specific to the resource or application, and interprets their results as success or failure.” (have a look here for more information).

We can use two kinds of Resource Agents:

  • LSB: those found in the /etc/init.d/ directory and provided by the OS. freeradius will be one of these;
  • OCF: specific resources that can also be downloaded and installed from the web; an extension of the LSB resources. VIP will be one of these (see below for how to list the available agents).
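
If you want to browse what is available on your nodes before configuring anything, the crm shell can list the known agents and their parameters (output omitted); a quick look:

PRIMARY# crm ra classes                      # resource agent classes (lsb, ocf, service, ...)
PRIMARY# crm ra list lsb                     # LSB scripts from /etc/init.d/ (freeradius appears here once installed)
PRIMARY# crm ra list ocf heartbeat           # OCF agents from the heartbeat provider (IPaddr2 is one of them)
PRIMARY# crm ra info ocf:heartbeat:IPaddr2   # parameters accepted by the IPaddr2 agent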

First we configure the VIP, which is an OCF resource called IPaddr2 (bound to the eth0 interface)

PRIMARY# crm configure primitive vip1 ocf:heartbeat:IPaddr2 params ip="<VIP>" nic="eth0" op monitor interval="10s"
PRIMARY# crm configure show
node $id="1" primary
node $id="2" secondary
primitive vip1 ocf:heartbeat:IPaddr2 \
 params ip="<VIP>" nic="eth0" \
 op monitor interval="10s" \
 meta target-role="Started"
[...]
PRIMARY# crm status
Last updated: [...]
Last change: [...] via cibadmin on primary
Stack: corosync
Current DC: primary (1) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
1 Resources configured

Online: [ primary secondary ]

vip1 (ocf::heartbeat:IPaddr2): Started primary
PRIMARY#

The VIP (resource vip1) is started on the primary node and we can check this directly from the nodes

PRIMARY# ip addr show 
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
 [...]
 inet <PRIMARY_IP> brd <PRIMARY_BROADCAST> scope global eth0
 valid_lft forever preferred_lft forever
 inet <VIP>/32 brd <VIP_BROADCAST> scope global eth0
 valid_lft forever preferred_lft forever
 [...]

SECONDARY# ip addr show
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
 [...]
 inet <SECONDARY_IP> brd <SECONDARY_BROADCAST> scope global eth0
 valid_lft forever preferred_lft forever
 [...]
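
From any other host on the same subnet (shown here as CLIENT) you can also verify that the VIP answers and that it is currently served by the primary node; a trivial check:

CLIENT# ping -c 3 <VIP>
CLIENT# ip neigh show | grep <VIP>     # the MAC address shown should be the primary node's eth0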

On the network side we are now OK; let’s proceed with freeradius clustering. We add the LSB resource on the PRIMARY node

PRIMARY# crm configure primitive freeradius lsb:freeradius \
 op monitor interval="5s" timeout="15s" \
 op start interval="0" timeout="15s" \
 op stop interval="0" timeout="15s" \
 meta target-role="Started"

We have two resources configured (vip1 and freeradius) and the cluster could start each resource on a different node. So we clone the freeradius resource, allowing the freeradius service to be active on both nodes at the same time (in this particular case this is the right choice and it makes cluster switches faster)

PRIMARY# crm configure clone freeradius-clone freeradius
PRIMARY# crm_mon
Online: [ primary secondary ]

vip1 (ocf::heartbeat:IPaddr2): Started primary
 Clone Set: freeradius-clone [freeradius]
 Started: [ primary secondary ]

One last tuning step.
We define resource colocation, telling the cluster that one resource depends on the location of another resource. This configuration ensures that all the resources involved run on the master cluster node at the same time.

PRIMARY# crm configure colocation vip1-freeradius inf: vip1 freeradius-clone
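
To verify the whole setup end to end you can point a RADIUS client at the VIP; a minimal sketch, assuming you have defined a test client (shared secret testing123) and a test user (bob/password) in the freeradius configuration of both nodes (these names are just placeholders):

CLIENT# radtest bob password <VIP> 0 testing123

If you then put the primary node in standby (crm node standby) and repeat the request, it should be answered through the same VIP by the secondary node.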

Now we have the cluster up and running, enjoy!

7 thoughts on “Configure basic Linux High Availability Cluster in Ubuntu with Corosync”

  1. Thank you, this is a good read. I tested this with 2 Ubuntu 16.04 servers. I was able to down the interface on the primary and it failed over to the other node no problem. How do I restore the VIP on the primary after recovery?


    1. Hi Jon,
      you can test resource failover by setting a node in standby.
      When a node goes into standby it exits the cluster (this is like a node failure) and all the resources move to the other node.
      Then, after resource migration is complete, you can put the node back into the cluster (set it online).
      You can do this on both nodes to test the failover.

      Into the active physical node
      # crm node standby
      # crm_mon
      [ you will see the node in standby status with the resources migrated to the other node ]

      After resource migration
      # crm node online
      # crm_mon
      [ you will see the node in online status and the resources active on the other node ]

      If you want to migrate resources from one node to the other, check this
      https://unix.stackexchange.com/questions/170986/pacemaker-migrate-resource-without-adding-a-prefer-line-in-config

      This is how to migrate resources; look at the last answer, it’s important to use the unmigrate command to avoid side effects
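
      As a minimal sketch of that approach, with the resource name used in this post:

      Move vip1 to the other node
      # crm resource migrate vip1 secondary

      When the move is complete, remove the constraint created by migrate (otherwise vip1 stays pinned to that node)
      # crm resource unmigrate vip1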


      1. thank you very much for the insight!
        Is it possible to load balance traffic to a virtual IP?

        For example: I want to set up a SIP load balancer, instead of failing over to the standby node I would like to distribute to several nodes. Any protips appreciated


      2. > Is it possible to load balance traffic to a virtual IP?
        of course yes.
        You need to configure a specific load balancing service like HAProxy http://www.haproxy.org/

        I found a tutorial that you can also use

        Set up HAProxy with Pacemaker/Corosync on Ubuntu 16.04

        This Document roughly describes a HAProxy Cluster Setup on Ubuntu 16.04 based on an example Configuration with 3 Nodes

        This Document is still work in Progress; the Following Stuff still needs to be done:

        • Explain the crm configure steps
        • explain Miscellaneous CRM Commands for Cluster Management
        • Add all the external resources used.
        • Add a simple HAProxy Configuration for testing purposes

        Example Installation

        This example Installation consists of three Nodes with the following names and IP Addresses:

        • haproxy01-test 10.0.0.11

        • haproxy02-test 10.0.0.12

        • haproxy03-test 10.0.0.13

        • VIRTUAL IP 10.0.0.10

        The Network they are on is: 10.0.0.0/24

        If you would like to apply the Steps shown here to another environment, you need to replace all Network Addresses with the ones used in your Environment.

        Prerequisites

        The Following Prerequisites must be met for this to work:

        • All Nodes must have a valid Network Configuration and must be on the same Network.
        • All Nodes must be able to download and install Standard Ubuntu Packages.
        • Root Access to every Node is needed.

        Installation and Configuration of Pacemaker

        This must be run on every Node

        # Upgrade Ubuntu Installation
        sudo apt update
        sudo apt upgrade -y
        # Install pacemaker and haproxy Packages
        sudo apt install pacemaker haproxy -y
        systemctl stop corosync
        systemctl stop haproxy
        systemctl disable haproxy

        This must be run on the primary Node only (i.e haproxy01-test 10.0.0.11):

        # Installation of haveged package to generate better random numbers for Key Generation
        sudo apt install haveged -y
        # Corosync Key generation:
        sudo corosync-keygen
        # Removal of the no longer needed haveged package
        sudo apt remove haveged -y

        Now we need to Copy the generated Key from the primary node over to the secondary nodes:

        scp /etc/corosync/authkey USER@10.0.0.12:/tmp/corosync-authkey
        scp /etc/corosync/authkey USER@10.0.0.13:/tmp/corosync-authkey

        This must be run on the two secondary Nodes (i.e. haproxy02-test 10.0.0.12 and haproxy03-test 10.0.0.13):

        sudo mv /tmp/corosync-authkey /etc/corosync/authkey
        sudo chown root: /etc/corosync/authkey
        sudo chmod 400 /etc/corosync/authkey

        After this you need to create the Following minimal Corosync Configuration File (/etc/corosync/corosync.conf) on every Node:

        totem {
          version: 2
          cluster_name: haproxy-prod
          transport: udpu
        
          interface {
            ringnumber: 0
            bindnetaddr: 10.0.0.0
            broadcast: yes
            mcastport: 5407
          }
        }
        
        nodelist {
          node {
            ring0_addr: 10.0.0.11
          }
          node {
            ring0_addr: 10.0.0.12
          }
          node {
            ring0_addr: 10.0.0.13
          }
        }
        
        quorum {
          provider: corosync_votequorum
        }
        
        logging {
          to_logfile: yes
          logfile: /var/log/corosync/corosync.log
          to_syslog: yes
          timestamp: on
        }
        
        service {
          name: pacemaker
          ver: 1
        }
        

        Inside the interface portion you can find the bindnetaddr value, which must be set to the corresponding Network Address

        Inside the nodelist every node is represented by its IP Address; if you happen to have fewer or more than three nodes, you must adjust them here.

        This must also be run on every Node:

        # Enable and restart Corosync Service
        sudo systemctl restart corosync.service
        sudo systemctl enable corosync.service
        # Enable and restart Pacemaker Service
        update-rc.d pacemaker defaults 20 01
        sudo systemctl restart pacemaker.service
        sudo systemctl enable pacemaker.service

        To make sure corosync is up and running, run the command sudo crm status. The Output should tell you that the Stack in use is corosync and that there are three Nodes configured; it should look like this:

        crm status:
        Last updated: Fri Oct 16 14:38:36 2015
        Last change: Fri Oct 16 14:36:01 2015 via crmd on primary
        Stack: corosync
        Current DC: primary (1) - partition with quorum
        Version: 1.1.10-42f2063
        3 Nodes configured
        0 Resources configured
        
        
        Online: [ primary secondary ]
        

        The following Steps can be run on any (one) Node, because right now corosync should keep the Cluster Configuration in Sync:

        sudo crm configure property stonith-enabled=false
        sudo crm configure property no-quorum-policy=ignore
        sudo crm configure primitive VIP ocf:heartbeat:IPaddr2 \
        params ip="10.0.0.10" cidr_netmask="24" nic="ens160" \
        op monitor interval="10s" \
        meta migration-threshold="10"
        sudo crm configure primitive res_haproxy lsb:haproxy \
        op start timeout="30s" interval="0" \
        op stop timeout="30s" interval="0" \
        op monitor interval="10s" timeout="60s" \
        meta migration-threshold="10"
        sudo crm configure group grp_balancing VIP res_haproxy

        The last Thing you need to do is to keep your haproxy Configuration in sync on every node.
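
        One simple manual way to do that, as a sketch with the node addresses of this example (a config management tool or csync2 would also work):

        # Copy the edited config from haproxy01-test to the other nodes
        scp /etc/haproxy/haproxy.cfg USER@10.0.0.12:/tmp/haproxy.cfg
        scp /etc/haproxy/haproxy.cfg USER@10.0.0.13:/tmp/haproxy.cfg
        # On each of the other nodes, put it in place
        sudo mv /tmp/haproxy.cfg /etc/haproxy/haproxy.cfg
        # On any node, restart the clustered haproxy resource so the new configuration is loaded
        sudo crm resource restart res_haproxy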


  2. Great post
    I am facing a few problems,
    here is my status
    service corosync staatus
    Usage: /etc/init.d/corosync {start|stop|restart|force-reload}
    root@www:~# service corosync status
    * corosync is running

    it only shows this

    corosync-cmapctl | grep members
    runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
    runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(x.x.x.x)
    runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
    runtime.totem.pg.mrp.srp.members.2.status (str) = joined

    (my SSH port is 22 on one node and 2222 on the other)

