Environment and Redis source version
OS: Ubuntu 22.04.3 LTS
Redis: 7.2.3
Prepare three VMs as follows:
| No. | Host name | IP | Node roles |
|---|---|---|---|
| #1 | redis-server1 | 172.25.254.131 | redis (master), sentinel |
| #2 | redis-server2 | 172.25.254.132 | redis (slave), sentinel |
| #3 | redis-server3 | 172.25.254.133 | redis (slave), sentinel |
Install the compiler toolchain
```
sudo apt update
sudo apt install build-essential
# To manage the service with systemd, first install libsystemd-dev (Debian/Ubuntu) or systemd-devel (CentOS)
sudo apt install libsystemd-dev
# or
# sudo yum -y install systemd-devel
```
From the README in the Redis source directory:
To build with systemd support, you'll need systemd development libraries (such
as libsystemd-dev on Debian/Ubuntu or systemd-devel on CentOS) and run:
% make USE_SYSTEMD=yes
Download, build and install Redis from source
```
# Download the source
curl -O https://download.redis.io/redis-stable.tar.gz
# or
wget https://download.redis.io/redis-stable.tar.gz
# Extract
tar -xzvf redis-stable.tar.gz
cd redis-stable
# Build and install; drop USE_SYSTEMD=yes if you do not need systemd support
make USE_SYSTEMD=yes
sudo make PREFIX=/usr/local/redis-server install
# Copy the redis and sentinel config templates into the install directory
sudo cp redis.conf sentinel.conf /usr/local/redis-server/
# Create the log directory
sudo mkdir /usr/local/redis-server/logs
```
Create the Redis service account
Create a system account and set ownership of the install directory
```
sudo adduser --system --group --no-create-home redis
sudo chown -R redis:redis /usr/local/redis-server
```
Replication configuration (master / slave)
On each server, edit redis.conf, using the settings below as a reference.
Master server #1 (172.25.254.131)
```
sudo vim /usr/local/redis-server/redis.conf
```
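```
# Listen address. In development/test environments you can use 0.0.0.0 or comment this line out.
# If the server is exposed to the internet or an untrusted network, bind to a specific IP for better security.
bind 172.25.254.131
# Working directory where the RDB and AOF files are stored; Redis needs read/write access to it.
dir /usr/local/redis-server/
# Password a slave uses to authenticate to the master for replication.
# Because any node can become master after a failover, use the same password on the master and all slaves.
masterauth mypass
# Password clients must supply (AUTH)
requirepass mypass
# Log file location
logfile "/usr/local/redis-server/logs/redis.log"
# Redis periodically dumps an RDB snapshot, so writes made since the last snapshot can be lost on failure.
# If that data loss is unacceptable, enable AOF.
appendonly yes
# AOF fsync policy; only takes effect when appendonly is yes.
# Three values are available (no, always, everysec):
# no: the operating system decides when to flush; the fastest of the three, but the gap between flushes can be long
# always: fsync after every write command; very costly for performance
# everysec: fsync once per second; a good balance between durability and performance, and the usual choice
# If unsure, use "everysec".
appendfsync everysec
```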
Slave server #2 (172.25.254.132)
```
sudo vim /usr/local/redis-server/redis.conf
```
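```
# Listen address. In development/test environments you can use 0.0.0.0 or comment this line out.
# If the server is exposed to the internet or an untrusted network, bind to a specific IP for better security.
bind 172.25.254.132
# Working directory where the RDB and AOF files are stored; Redis needs read/write access to it.
dir /usr/local/redis-server/
# Password a slave uses to authenticate to the master for replication.
# Because any node can become master after a failover, use the same password on the master and all slaves.
masterauth mypass
# Password clients must supply (AUTH)
requirepass mypass
# Master IP/port (comment this out on the master node)
replicaof 172.25.254.131 6379
# Log file location
logfile "/usr/local/redis-server/logs/redis.log"
# Redis periodically dumps an RDB snapshot, so writes made since the last snapshot can be lost on failure.
# If that data loss is unacceptable, enable AOF.
appendonly yes
# AOF fsync policy; only takes effect when appendonly is yes.
# Three values are available (no, always, everysec):
# no: the operating system decides when to flush; the fastest of the three, but the gap between flushes can be long
# always: fsync after every write command; very costly for performance
# everysec: fsync once per second; a good balance between durability and performance, and the usual choice
# If unsure, use "everysec".
appendfsync everysec
```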
Slave server #3 (172.25.254.133)
```
sudo vim /usr/local/redis-server/redis.conf
```
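```
# Listen address. In development/test environments you can use 0.0.0.0 or comment this line out.
# If the server is exposed to the internet or an untrusted network, bind to a specific IP for better security.
bind 172.25.254.133
# Working directory where the RDB and AOF files are stored; Redis needs read/write access to it.
dir /usr/local/redis-server/
# Master IP/port (comment this out on the master node)
replicaof 172.25.254.131 6379
# Password a slave uses to authenticate to the master for replication.
# Because any node can become master after a failover, use the same password on the master and all slaves.
masterauth mypass
# Password clients must supply (AUTH)
requirepass mypass
# Log file location
logfile "/usr/local/redis-server/logs/redis.log"
# Redis periodically dumps an RDB snapshot, so writes made since the last snapshot can be lost on failure.
# If that data loss is unacceptable, enable AOF.
appendonly yes
# AOF fsync policy; only takes effect when appendonly is yes.
# Three values are available (no, always, everysec):
# no: the operating system decides when to flush; the fastest of the three, but the gap between flushes can be long
# always: fsync after every write command; very costly for performance
# everysec: fsync once per second; a good balance between durability and performance, and the usual choice
# If unsure, use "everysec".
appendfsync everysec
```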
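The three redis.conf variants above are identical except for the bind address and the replicaof line, which only the slaves carry. As a hedged sketch, the node-specific settings could be generated from one function and then reviewed before being merged into each server's redis.conf. The helper name gen_redis_conf is hypothetical (not part of Redis), and the sketch assumes every node listens on port 6379 and shares the same password.

```bash
# Hypothetical helper (not a Redis tool): print the settings used in this guide
# for one node. Assumes port 6379 on every node and one shared password.
# Usage: gen_redis_conf <node-ip> <master-ip> <password>
gen_redis_conf() {
  node_ip="$1"
  master_ip="$2"
  password="$3"
  echo "bind ${node_ip}"
  echo "dir /usr/local/redis-server/"
  # masterauth is set on every node, because after a failover any node may become a slave
  echo "masterauth ${password}"
  echo "requirepass ${password}"
  # only slaves get a replicaof line; the master must not have one
  if [ "$node_ip" != "$master_ip" ]; then
    echo "replicaof ${master_ip} 6379"
  fi
  echo 'logfile "/usr/local/redis-server/logs/redis.log"'
  echo "appendonly yes"
  echo "appendfsync everysec"
}
```

For example, gen_redis_conf 172.25.254.132 172.25.254.131 mypass prints the settings for server #2. Review the output rather than appending it blindly, because the template redis.conf already contains some of these directives (such as bind and appendonly).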
Create the systemd unit file
```
sudo vim /etc/systemd/system/redis-server.service
```
```
[Unit]
Description=Redis data structure server
Documentation=https://redis.io/documentation
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/usr/local/redis-server/bin/redis-server /usr/local/redis-server/redis.conf --supervised systemd --daemonize no
LimitNOFILE=10032
NoNewPrivileges=yes
Type=notify
TimeoutStartSec=infinity
TimeoutStopSec=infinity
UMask=0077
User=redis
Group=redis
[Install]
WantedBy=multi-user.target
```
Start the Redis instance on each server
```
sudo systemctl start redis-server
```
After it starts, check its status:
```
systemctl status redis-server
```
```
● redis-server.service - Redis data structure server
Loaded: loaded (/etc/systemd/system/redis-server.service; disabled; vendor preset: enabled)
Active: active (running) since Fri 2023-12-22 05:19:08 UTC; 6s ago
Docs: https://redis.io/documentation
Main PID: 48121 (redis-server)
Status: "Ready to accept connections"
Tasks: 6 (limit: 2178)
Memory: 2.3M
CPU: 19ms
CGroup: /system.slice/redis-server.service
└─48121 "/usr/local/redis-server/bin/redis-server *:6379" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" >
Dec 22 05:19:08 kafka-server1 systemd[1]: Starting Redis data structure server...
Dec 22 05:19:08 kafka-server1 systemd[1]: Started Redis data structure server.
```
Verify replication (master/slave sync status)
First check the replication status on the master:
```
cd /usr/local/redis-server/bin
./redis-cli -h 172.25.254.131
```
```
172.25.254.131:6379> auth mypass
OK
172.25.254.131:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.25.254.133,port=6379,state=online,offset=95561,lag=0
slave1:ip=172.25.254.132,port=6379,state=online,offset=95561,lag=0
master_failover_state:no-failover
master_replid:e396c5f17b331fc17d89f9c03e27e8a4548214f9
master_replid2:b950d0774916c8901184640d4652864788e2a51e
master_repl_offset:95561
second_repl_offset:94277
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:94277
repl_backlog_histlen:1285
```
Then check the replication status on the slaves:
```
# Check both slave nodes
./redis-cli -h 172.25.254.132
# and
./redis-cli -h 172.25.254.133
```
```
172.25.254.132:6379> auth mypass
OK
172.25.254.132:6379> info replication
# Replication
role:slave
master_host:172.25.254.131
master_port:6379
master_link_status:up
master_last_io_seconds_ago:3
master_sync_in_progress:0
slave_read_repl_offset:95603
slave_repl_offset:95603
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:e396c5f17b331fc17d89f9c03e27e8a4548214f9
master_replid2:b950d0774916c8901184640d4652864788e2a51e
master_repl_offset:95603
second_repl_offset:94277
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:93577
repl_backlog_histlen:2027
```
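The checks above are done by eye. Here is a minimal sketch of the master-side check as a script. It assumes the INFO replication text is piped in from redis-cli and looks only at the role, connected_slaves and slaveN:...state=online lines shown in the master output. The function name check_master_replication is hypothetical.

```bash
# Hypothetical helper: read INFO replication output of a master on stdin and
# succeed only if the node is a master, reports the expected number of
# connected slaves, and every slaveN line says state=online.
# Usage: ... info replication | check_master_replication <expected-slaves>
check_master_replication() {
  awk -v want="$1" '
    { sub(/\r$/, "") }                                  # INFO replies use CRLF line endings
    /^role:/                          { role = substr($0, 6) }
    /^connected_slaves:/              { connected = substr($0, 18) + 0 }
    /^slave[0-9]+:/ && /state=online/ { online++ }
    END { exit !(role == "master" && connected == want && online == want) }
  '
}
```

For example, ./redis-cli -h 172.25.254.131 -a mypass --no-auth-warning info replication | check_master_replication 2 && echo "replication OK" should print "replication OK" for the master output shown above.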
Enable start on boot:
```
sudo systemctl enable redis-server
```
Sentinel configuration
On each server, edit the sentinel.conf file:
```
sudo vim /usr/local/redis-server/sentinel.conf
```
```
# PID file location
pidfile "/usr/local/redis-server/logs/redis-sentinel.pid"
# Log file location
logfile "/usr/local/redis-server/logs/sentinel.log"
# Master to monitor, plus the quorum: the number of Sentinels that must agree the master
# is unreachable before it is marked O_DOWN (objectively down), which can trigger a failover
# sentinel monitor <master-name> <ip> <port> <quorum>
sentinel monitor mymaster 172.25.254.131 6379 2
# Master password
sentinel auth-pass mymaster mypass
# If the master does not respond for longer than this (ms), this Sentinel marks it S_DOWN (subjectively down)
sentinel down-after-milliseconds mymaster 6000
# Failover timeout (ms)
sentinel failover-timeout mymaster 180000
```
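Two thresholds are involved in a failover, and the quorum only covers the first one. The quorum decides when the master is marked O_DOWN. The failover itself also needs a Sentinel leader, and that leader must win votes from a majority of all known Sentinels, or from at least quorum Sentinels if the quorum is larger than that majority. Below is a small arithmetic sketch; sentinel_thresholds is a hypothetical helper, not a Redis tool.

```bash
# Hypothetical helper: show the failure-detection and leader-election thresholds.
# Usage: sentinel_thresholds <number-of-sentinels> <quorum>
sentinel_thresholds() {
  local sentinels="$1" quorum="$2" majority needed
  # strict majority of all Sentinels, e.g. 2 of 3
  majority=$(( sentinels / 2 + 1 ))
  # the leader needs a majority, and at least quorum votes if quorum is larger
  if [ "$quorum" -gt "$majority" ]; then
    needed="$quorum"
  else
    needed="$majority"
  fi
  # how many Sentinels can be down while a failover is still possible
  echo "odown_needs=${quorum} failover_votes_needed=${needed} tolerated_sentinel_failures=$(( sentinels - needed ))"
}
```

For this setup, sentinel_thresholds 3 2 prints odown_needs=2 failover_votes_needed=2 tolerated_sentinel_failures=1, so a failover should still be possible with one of the three Sentinels down.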
Create the systemd unit file
```
sudo vim /etc/systemd/system/redis-sentinel.service
```
```
[Unit]
Description=Redis sentinel
Documentation=https://redis.io/docs/management/sentinel/
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/usr/local/redis-server/bin/redis-sentinel /usr/local/redis-server/sentinel.conf --supervised systemd --daemonize no
LimitNOFILE=10032
NoNewPrivileges=yes
Type=notify
TimeoutStartSec=infinity
TimeoutStopSec=infinity
UMask=0077
User=redis
Group=redis
[Install]
WantedBy=multi-user.target
```
Start the Sentinel instance on each server
```
sudo systemctl start redis-sentinel
```
After it starts, check its status:
```
systemctl status redis-sentinel
```
```
● redis-sentinel.service - Redis sentinel
Loaded: loaded (/etc/systemd/system/redis-sentinel.service; disabled; vendor preset: enabled)
Active: active (running) since Fri 2023-12-22 07:47:47 UTC; 16min ago
Docs: https://redis.io/docs/management/sentinel/
Main PID: 85260 (redis-sentinel)
Status: "Ready to accept connections"
Tasks: 5 (limit: 2178)
Memory: 2.1M
CPU: 3.497s
CGroup: /system.slice/redis-sentinel.service
└─85260 "/usr/local/redis-server/bin/redis-sentinel *:26379 [sentinel]" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ">
Dec 22 07:47:47 kafka-server3 systemd[1]: Starting Redis sentinel...
Dec 22 07:47:47 kafka-server3 systemd[1]: Started Redis sentinel.
```
Check the Sentinel status
```
cd /usr/local/redis-server/bin
./redis-cli -p 26379 info sentinel
```
```
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.25.254.131:6379,slaves=2,sentinels=3
```
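The master0 line is the one that matters here. status=ok, slaves=2 and sentinels=3 mean that this Sentinel sees a healthy master, both slaves, and the two other Sentinels plus itself. Below is a hedged sketch that turns the check into a script. It assumes the INFO sentinel text is piped in from redis-cli and that only one master is monitored (master0). The function name check_sentinel_view is hypothetical.

```bash
# Hypothetical helper: parse the master0 line of INFO sentinel from stdin and
# succeed only if status=ok and the slave/Sentinel counts match.
# Usage: ... info sentinel | check_sentinel_view <expected-slaves> <expected-sentinels>
check_sentinel_view() {
  awk -v want_slaves="$1" -v want_sentinels="$2" '
    { sub(/\r$/, "") }                         # INFO replies use CRLF line endings
    /^master0:/ {
      # the rest of the line is comma-separated key=value pairs
      n = split(substr($0, 9), kv, ",")
      for (i = 1; i <= n; i++) {
        split(kv[i], pair, "=")
        f[pair[1]] = pair[2]
      }
    }
    END { exit !(f["status"] == "ok" && f["slaves"] == want_slaves && f["sentinels"] == want_sentinels) }
  '
}
```

For example, ./redis-cli -p 26379 info sentinel | check_sentinel_view 2 3 && echo "sentinel view OK" should succeed on each of the three servers with the output above.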
Enable start on boot:
```
sudo systemctl enable redis-sentinel
```
Failover test
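At this point you can test failover, either by killing the master process directly or by simulating a hang with the DEBUG SLEEP command on the master. DEBUG SLEEP 10 blocks the server for 10 seconds, which is longer than the 6000 ms down-after-milliseconds, so the Sentinels will treat the master as down. Note that the session below connects to 127.0.0.1. That only works if the master's bind line also includes 127.0.0.1 (for example "bind 172.25.254.131 127.0.0.1"). With "bind 172.25.254.131" alone, connect with "-h 172.25.254.131" and set enable-debug-command to "yes", because "local" only permits DEBUG from local connections.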
```
./bin/redis-cli
127.0.0.1:6379> auth mypass
OK
# DEBUG requires enable-debug-command to be set to "local" or "yes" in redis.conf
127.0.0.1:6379> debug sleep 10
OK
(10.01s)
```
Check the Sentinel log:
```
122668:X 25 Dec 2023 01:30:57.033 # +sdown master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:57.088 # +odown master mymaster 172.25.254.131 6379 #quorum 2/2
122668:X 25 Dec 2023 01:30:57.088 # +new-epoch 11
122668:X 25 Dec 2023 01:30:57.088 # +try-failover master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:57.093 * Sentinel new configuration saved on disk
122668:X 25 Dec 2023 01:30:57.093 # +vote-for-leader 8a8d49f48649f665877ae3821c411d2511f1e084 11
122668:X 25 Dec 2023 01:30:57.100 * 79aafc906bc392865fbb1c6f1c9d4f38d8996332 voted for 8a8d49f48649f665877ae3821c411d2511f1e084 11
122668:X 25 Dec 2023 01:30:57.100 * d5e27cf5588cc89870ba1454872b6eedf8f4cae7 voted for 8a8d49f48649f665877ae3821c411d2511f1e084 11
122668:X 25 Dec 2023 01:30:57.155 # +elected-leader master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:57.155 # +failover-state-select-slave master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:57.215 # +selected-slave slave 172.25.254.133:6379 172.25.254.133 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:57.215 * +failover-state-send-slaveof-noone slave 172.25.254.133:6379 172.25.254.133 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:57.272 * +failover-state-wait-promotion slave 172.25.254.133:6379 172.25.254.133 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:58.159 * Sentinel new configuration saved on disk
122668:X 25 Dec 2023 01:30:58.159 # +promoted-slave slave 172.25.254.133:6379 172.25.254.133 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:58.159 # +failover-state-reconf-slaves master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:58.258 * +slave-reconf-sent slave 172.25.254.132:6379 172.25.254.132 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:59.177 * +slave-reconf-inprog slave 172.25.254.132:6379 172.25.254.132 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:59.177 * +slave-reconf-done slave 172.25.254.132:6379 172.25.254.132 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:59.231 # -odown master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:59.231 # +failover-end master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:59.231 # +switch-master mymaster 172.25.254.131 6379 172.25.254.133 6379
122668:X 25 Dec 2023 01:30:59.231 * +slave slave 172.25.254.132:6379 172.25.254.132 6379 @ mymaster 172.25.254.133 6379
122668:X 25 Dec 2023 01:30:59.231 * +slave slave 172.25.254.131:6379 172.25.254.131 6379 @ mymaster 172.25.254.133 6379
122668:X 25 Dec 2023 01:30:59.233 * Sentinel new configuration saved on disk
122668:X 25 Dec 2023 01:31:10.692 * +convert-to-slave slave 172.25.254.131:6379 172.25.254.131 6379 @ mymaster 172.25.254.133 6379
```
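Broadly, the failover proceeds through the following steps:
- A Sentinel that detects the master is down raises a +sdown (subjectively down) event.
- Once enough other Sentinels (the quorum) agree, the state is upgraded to +odown (objectively down).
- The Sentinels elect a leader (+vote-for-leader, +elected-leader).
- The leader performs the failover: it selects a slave (+selected-slave), promotes it with SLAVEOF NO ONE (+promoted-slave), reconfigures the remaining slaves to replicate from it (+slave-reconf-*), and announces the new master (+switch-master). When the old master comes back, it is converted into a slave of the new master (+convert-to-slave).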
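These steps map directly onto events in the Sentinel log. Below is a hedged sketch that pulls out just the milestone events. It assumes the log line layout shown above: pid:role, day, month, year, time, level marker, event, arguments. The helper name failover_timeline is hypothetical, and the event list is limited to the ones discussed here.

```bash
# Hypothetical helper: print "<time> <event>" for the milestone events of a
# Sentinel failover, read from a Sentinel log on stdin.
# Field layout assumed: $1=pid:role $2..$4=date $5=time $6=level $7=event ...
failover_timeline() {
  awk '$7 ~ /^[+](sdown|odown|elected-leader|selected-slave|promoted-slave|switch-master|convert-to-slave)$/ {
    if ($7 == "+switch-master")
      # +switch-master <name> <old-ip> <old-port> <new-ip> <new-port>
      printf "%s %s %s:%s -> %s:%s\n", $5, $7, $9, $10, $11, $12
    else
      print $5, $7
  }'
}
```

For example, failover_timeline < /usr/local/redis-server/logs/sentinel.log should, for the log above, start with "01:30:57.033 +sdown" and include "01:30:59.231 +switch-master 172.25.254.131:6379 -> 172.25.254.133:6379".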
Query the new master
```
./redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
1) "172.25.254.133"
2) "6379"
```
The master IP has changed from 172.25.254.131 to 172.25.254.133.
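Because the master can move, clients and scripts should ask Sentinel for the current master instead of hard-coding 172.25.254.131. Below is a minimal sketch that converts the reply into host:port. It assumes redis-cli is called with --raw, so the two array elements come back on separate lines without the 1) / 2) prefixes. The function name master_addr_from_reply is hypothetical.

```bash
# Hypothetical helper: read the raw two-line reply of
# SENTINEL GET-MASTER-ADDR-BY-NAME (ip, then port) from stdin and print "ip:port".
# Returns non-zero if either line is missing or empty.
master_addr_from_reply() {
  local ip port
  IFS= read -r ip || return 1
  IFS= read -r port || return 1
  [ -n "$ip" ] && [ -n "$port" ] || return 1
  printf '%s:%s\n' "$ip" "$port"
}
```

For example, addr="$(./redis-cli -p 26379 --raw sentinel get-master-addr-by-name mymaster | master_addr_from_reply)" would set addr to 172.25.254.133:6379 after the failover above.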