Environment
Product | Version
Redis Sentinel | 3.0.1
HAProxy | 1.5.12
Purpose
Users can easily set up and configure Redis Sentinel to allow automatic failover between the master and slave Redis replica instances for high availability, with the use of an HAProxy server.
Redis Sentinel is designed to monitor and manage automatic failover of Redis instances by performing the following tasks, according to the Redis Sentinel documentation (http://redis.io/topics/sentinel):
- Monitoring: Sentinel constantly checks if your master and slave instances are working as expected
- Notification: Sentinel can notify the system administrator, or another computer program, via an API, that something is wrong with one of the monitored Redis instances
- Automatic failover: If a master is not working as expected, Sentinel can start a failover process where a slave is promoted to master. The other additional slaves are reconfigured to use the new master, and the applications using the Redis server are informed of the new address to use when connecting.
- Configuration provider: Sentinel acts as a source of authority for clients' service discovery. Clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address.
We can use Sentinel to monitor the Redis master and handle automatic failover by promoting the slave to master when the master goes down.
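For example, once Sentinel is running (as configured later in this article, with the master registered under the name master01), a client or administrator can ask it for the current master's address directly; the reply below is illustrative:

> redis-cli -p 26379 sentinel get-master-addr-by-name master01
1) "::1"
2) "6379"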
HAProxy 1.5+ comes with a built-in TCP health check feature that can be used with Redis to support automatic failover. To avoid having to change the Redis IP/port in the front-end client application after each failover, set up HAProxy with a TCP health check that tests whether a Redis instance is a master or a slave. This setup ensures it only proxies connections to the active master instance.
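You can reproduce what this health check does by hand; a minimal sketch using redis-cli against one of the demo instances (a master answers role:master, a slave answers role:slave):

> redis-cli -p 6379 ping
PONG
> redis-cli -p 6379 info replication | grep role
role:master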
For this simple demo, we are using Redis 3.0.1 and HAProxy 1.5.12. HAProxy acts as the front-end load balancer for a tcServer/Tomcat7 server, which uses the backend Redis servers for caching and automatic failover of users' HTTP sessions. Everything runs on a single Ubuntu 14.04 host.
Layout diagram for Redis Sentinel and HAProxy for TCP load balancing between Redis servers:
<Clients>---6378---<HAProxy>---6379/6380---<Redis_6379 | Redis_6380>---<Sentinel_26379>
- HAProxy - A front-end TCP load balancer that proxies incoming client application connections on port 6378 to the backend Redis master server.
- Redis_6379 - Master server on port 6379
- Redis_6380 - Slave server on port 6380
- Sentinel - Sentinel running on port 26379 to monitor the Master and Slave servers and promote the Slave to the Master role when the Master is down.
If you do not have Redis Master/Slave or HAProxy servers installed yet, see:
- How to setup Redis Master and Slave replication KB article
- Web Server Load-Balancing with HAProxy on Ubuntu 14.04
Procedure
Setup and deploy HAProxy load balancer for Redis HA
1) Update the /etc/haproxy/haproxy.cfg file with the following parameter values. It is recommended to keep the comments (lines starting with #) as inline documentation.
# Specifies the TCP timeouts for use by the frontend ft_redis:
# the max time to wait for a connection attempt to a server to succeed,
# and how long the server and client sides may go without acknowledging or sending data.
defaults REDIS
    mode tcp
    timeout connect 3s
    timeout server 6s
    timeout client 6s

# Specifies the listening socket for accepting client connections, using the
# default REDIS TCP timeouts and the backend bk_redis TCP health check.
frontend ft_redis
    bind *:6378 name redis
    default_backend bk_redis

# Specifies the backend Redis proxy server TCP health check settings.
# This ensures incoming connections are only forwarded to a reachable master.
backend bk_redis
    option tcp-check
    tcp-check connect
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis_6379 localhost:6379 check inter 1s
    server redis_6380 localhost:6380 check inter 1s
2) Start up the HAProxy server:
/etc/init.d/haproxy start
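To quickly verify the proxy, you can ping Redis through HAProxy on port 6378 (assuming a Redis master is already running and passing the health check):

> redis-cli -p 6378 ping
PONG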
Setup and deploy Redis Sentinel for automatic failover between the Redis Master and Slave servers
1) Create a /etc/redis/sentinel_26379.conf file with the following lines:
# Specifies the Sentinel port, log file, and work directory, and runs it as a daemon service.
port 26379
daemonize yes
pidfile "/var/run/sentinel_26379.pid"
logfile "/var/log/sentinel_26379.log"
dir /var/lib/redis/sentinel_26379

# Monitor the redis_6379 master running on port 6379 and consider it in the ODOWN state
# when at least 1 sentinel (the quorum) agrees, in order to start a failover.
# Sentinel will auto-discover the slaves and rewrite this configuration file to add them,
# so you do not need to specify the slaves. Also note that the configuration file is
# rewritten when a slave is promoted to the master state.
sentinel monitor master01 localhost 6379 1

# Specifies the password to use to authenticate with the master or slaves if required.
# For Redis instances mixed with 'auth' and 'nonauth', you need to ensure the same
# password is set in all the instances. The demo Redis servers require no password.
#sentinel auth-pass master01 mypass

# Specifies the number of milliseconds after which the master, slave or sentinel should
# be considered down and unreachable, entering the SDOWN state.
sentinel down-after-milliseconds master01 3000

# Specifies the failover timeout in milliseconds, which is used in the following ways:
# * The time needed to restart a failover after a previous failover was already attempted
#   on the same master.
# * The time needed for a slave replicating from a wrong master, per the current
#   configuration, to be forced to replicate with the right master.
# * The time needed to cancel a failover that is already in progress but did not produce
#   any configuration change (the promoted slave has not yet acknowledged SLAVEOF NO ONE).
# * The maximum time a failover in progress waits for all the slaves to be reconfigured
#   as slaves of the new master.
sentinel failover-timeout master01 6000

# Specifies how many slaves we can reconfigure to point to the new master at the same
# time during the failover. Use a low number if the slaves are used to serve queries,
# in order to avoid all the slaves becoming unreachable while they resynchronize with
# the master.
sentinel parallel-syncs master01 1
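Note that a single Sentinel with a quorum of 1, as used in this demo, provides no protection against the Sentinel itself failing. In production you would typically run three or more Sentinels on separate hosts and require a majority to agree; a hypothetical monitor line for such a setup might be:

# With 3 sentinels, require 2 to agree before declaring the master ODOWN.
sentinel monitor master01 localhost 6379 2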
2) Create an executable Sentinel service script at /etc/init.d/sentinel_26379:

#!/bin/bash
# Start/Stop/Restart script for Redis Sentinel
NAME=`basename ${0}`
EXEC=/usr/local/bin/redis-server
PIDFILE="/var/run/${NAME}.pid"
CONF="/etc/redis/${NAME}.conf"
PID=`cat $PIDFILE 2> /dev/null`

case "$1" in
    start)
        echo "Starting $NAME ..."
        touch $PIDFILE
        exec $EXEC $CONF --sentinel --pidfile $PIDFILE
        ;;
    stop)
        echo "Stopping $NAME PID: $PID ..."
        kill $PID
        ;;
    restart)
        echo "Restarting $NAME ..."
        $0 stop
        sleep 2
        $0 start
        ;;
    *)
        echo "Usage $0 {start|stop|restart}"
        ;;
esac
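Make the script executable so it can be run as a service:

chmod +x /etc/init.d/sentinel_26379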
3) Create a Sentinel work directory and start up the Sentinel, Redis Master and Slave services:

mkdir /var/lib/redis/sentinel_26379
/etc/init.d/sentinel_26379 start
/etc/init.d/redis_6379 start
/etc/init.d/redis_6380 start
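To confirm the Sentinel started and discovered the master, you can query its INFO output (shown here abbreviated and illustrative):

> redis-cli -p 26379 info sentinel
# Sentinel
sentinel_masters:1
...
master0:name=master01,status=ok,address=::1:6379,slaves=1,sentinels=1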
Testing HAProxy and Redis automatic failover to make sure they work as expected
Run tail -f /var/log/sentinel_26379.log on a separate console to monitor the sentinel log messages. Check the redis_6379 replication info:
> redis-cli -p 6379 info replication
# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
The server is a master without a slave.
The sentinel_26379.log shows the redis_6379 master is up and running; the -sdown and -odown events indicate it has exited the subjectively and objectively down states:
...
26956:X 11 May 23:05:23.632 # -sdown master master01 ::1 6379
26956:X 11 May 23:05:23.632 # -odown master master01 ::1 6379
Check the redis_6380 replication info:

> redis-cli -p 6380 info replication
# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:42577
slave_priority:100
slave_read_only:1
connected_slaves:0
...
The server is a read-only slave, and its master is at host IP 127.0.0.1 and port 6379.
The sentinel_26379.log shows it detected the slave redis_6380 and rewrote the sentinel configuration file to add it to the redis_6379 master:
26956:X 11 May 23:07:34.172 * +slave slave [::1]:6380 ::1 6380 @ master01 ::1 6379
26956:X 11 May 23:07:44.244 * +fix-slave-config slave [::1]:6380 ::1 6380 @ master01 ::1 6379
If it does not start up as a slave server, make sure its configuration file has this directive:
slaveof localhost 6379
and restart it. Check the redis_6379 replication info:
> redis-cli -p 6379 info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=60477,lag=1
master_repl_offset:60477
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:60476
Now, we have a master server with one running slave connected.
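Alternatively, instead of editing the configuration file and restarting, the replication role can be set at runtime (note that Sentinel will rewrite the on-disk configuration during failovers anyway):

> redis-cli -p 6380 slaveof localhost 6379
OK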
Let us test the automatic failover by simulating a crash on the master server and letting the sentinel automatically promote the slave to the master role.
Crash the redis_6379 Master:
> redis-cli -p 6379 debug segfault
Error: Server closed the connection
Check the redis_6380 slave replication info:
> redis-cli -p 6380 info replication
# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
The server has been converted to the master role, with no slave connected.
The sentinel_26379.log shows it detected that the master redis_6379 is down and elected itself (based on a quorum of 1) to perform the failover, promoting the slave redis_6380 server to the master role by making it a slave of no one. It also rewrote the sentinel configuration to record the failed master as a slave and the slave redis_6380 as the new master.
26956:X 11 May 23:09:52.566 # +sdown master master01 ::1 6379
26956:X 11 May 23:09:52.566 # +odown master master01 ::1 6379 #quorum 1/1
26956:X 11 May 23:09:52.566 # +new-epoch 5
26956:X 11 May 23:09:52.566 # +try-failover master master01 ::1 6379
26956:X 11 May 23:09:52.571 # +vote-for-leader 49c1cc2d0492c9bf60ad0dfc8258ee01986fae4b 5
26956:X 11 May 23:09:52.571 # +elected-leader master master01 ::1 6379
26956:X 11 May 23:09:52.571 # +failover-state-select-slave master master01 ::1 6379
26956:X 11 May 23:09:52.633 # +selected-slave slave [::1]:6380 ::1 6380 @ master01 ::1 6379
26956:X 11 May 23:09:52.634 * +failover-state-send-slaveof-noone slave [::1]:6380 ::1 6380 @ master01 ::1 6379
26956:X 11 May 23:09:52.717 * +failover-state-wait-promotion slave [::1]:6380 ::1 6380 @ master01 ::1 6379
26956:X 11 May 23:09:53.621 # +promoted-slave slave [::1]:6380 ::1 6380 @ master01 ::1 6379
26956:X 11 May 23:09:53.622 # +failover-state-reconf-slaves master master01 ::1 6379
26956:X 11 May 23:09:53.692 # +failover-end master master01 ::1 6379
26956:X 11 May 23:09:53.692 # +switch-master master01 ::1 6379 ::1 6380
26956:X 11 May 23:09:53.693 * +slave slave [::1]:6379 ::1 6379 @ master01 ::1 6380
26956:X 11 May 23:09:56.726 # +sdown slave [::1]:6379 ::1 6379 @ master01 ::1 6380
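At this point, asking the Sentinel for the master address should return the newly promoted redis_6380, matching the +switch-master event above (output illustrative):

> redis-cli -p 26379 sentinel get-master-addr-by-name master01
1) "::1"
2) "6380"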
When we fix and start up the crashed redis_6379 server, it should become a slave and should automatically be added to the redis_6380 master.
/etc/init.d/redis_6379 start
Wait a few seconds before checking the server replication info:
> redis-cli -p 6379 info replication
# Replication
role:slave
master_host:::1
master_port:6380
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:4122
slave_priority:100
slave_read_only:1
connected_slaves:0
...
The sentinel log shows it detected and converted the server into a slave of the redis_6380 master server.
26956:X 11 May 23:23:59.963 # -sdown slave [::1]:6379 ::1 6379 @ master01 ::1 6380
26956:X 11 May 23:24:09.944 * +convert-to-slave slave [::1]:6379 ::1 6379 @ master01 ::1 6380
Check redis_6380 replication info:
> redis-cli -p 6380 info replication
# Replication
role:master
connected_slaves:1
slave0:ip=::1,port=6379,state=online,offset=61410,lag=1
master_repl_offset:61410
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:61409
The new master server now has one slave (redis_6379) connected. We can also connect to the Sentinel on port 26379 to monitor and manage the Redis master/slave servers. For example, list all monitored masters and their state:
> redis-cli -p 26379 sentinel masters
1)  1) "name"
    2) "master01"
    3) "ip"
    4) "::1"
    5) "port"
    6) "6380"
    7) "runid"
    8) "d43ff591a1fe556fcd345eb462588c6570c47df5"
    9) "flags"
   10) "master"
   11) "pending-commands"
   12) "0"
   13) "last-ping-sent"
   14) "0"
   15) "last-ok-ping-reply"
   16) "278"
   17) "last-ping-reply"
   18) "278"
   19) "down-after-milliseconds"
   20) "3000"
   21) "info-refresh"
   22) "3451"
   23) "role-reported"
   24) "master"
   25) "role-reported-time"
   26) "1159212"
   27) "config-epoch"
   28) "5"
   29) "num-slaves"
   30) "1"
   31) "num-other-sentinels"
   32) "0"
   33) "quorum"
   34) "1"
   35) "failover-timeout"
   36) "9000"
   37) "parallel-syncs"
   38) "1"
For other sentinel commands, see http://redis.io/topics/sentinel-clients
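For example, a failover can also be forced by hand without crashing anything, which is useful for testing the setup (using the master01 name from our configuration):

> redis-cli -p 26379 sentinel failover master01
OK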
Perform one more test to make sure our HAProxy, Sentinel and Redis HA setup with automatic failover is working correctly.
On a separate console, use a redis-cli client to connect to the HAProxy server on port 6378, which proxies to the master Redis, and print its replication information every 2 seconds:
> while true; do redis-cli -p 6378 info replication; redis-cli -p 6378 info server | grep tcp_port; sleep 2; clear; done
# Replication
role:master
connected_slaves:1
slave0:ip=::1,port=6380,state=online,offset=1741860,lag=0
master_repl_offset:1741860
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:693285
repl_backlog_histlen:1048576
tcp_port:6380
Notice we are getting the master redis_6380 replication info through HAProxy. Let's crash this master and see if auto-failover works and we can still connect to the new master. On another console, crash the master:
> redis-cli -p 6380 debug segfault
Error: Server closed the connection
Back on the other console, where the redis-cli client is connecting through HAProxy:
Error: Server closed the connection
It shows the master on port 6380 is down. After a couple of seconds, the automatic failover is completed, and we should be able to connect to the newly promoted master server running on port 6379 and get its replication info.
# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
tcp_port:6379
Now, restart the crashed redis_6380:
/etc/init.d/redis_6380 start
It will automatically be added to the new master as a slave.
# Replication
role:master
connected_slaves:1
slave0:ip=::1,port=6380,state=online,offset=1282,lag=1
master_repl_offset:1405
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:1404
tcp_port:6379
We have successfully set up HAProxy with Redis Master/Slave replicas for HA, using Sentinel for automatic failover.