Installing Redis from Source and Configuring Sentinel

Posted by Geuni's Blog on December 25, 2023

Environment and Redis Source Version

OS: Ubuntu 22.04.3 LTS

Redis: 7.2.3

Prepare three VMs as follows:

No.  Host name      IP              Node roles
#1   redis-server1  172.25.254.131  redis (master), sentinel
#2   redis-server2  172.25.254.132  redis (slave), sentinel
#3   redis-server3  172.25.254.133  redis (slave), sentinel

Installing the Build Tools

sudo apt update
sudo apt install build-essential
# To manage the service with systemd, first install libsystemd-dev (Debian/Ubuntu) or systemd-devel (CentOS)
sudo apt install libsystemd-dev
# or
# sudo yum -y install systemd-devel

For reference, the README in the Redis source tree says:

To build with systemd support, you’ll need systemd development libraries (such as libsystemd-dev on Debian/Ubuntu or systemd-devel on CentOS) and run:

% make USE_SYSTEMD=yes

Downloading, Building, and Installing the Redis Source

# Download the source. Note that redis-stable tracks the latest stable release, which may be newer than 7.2.3
curl -O https://download.redis.io/redis-stable.tar.gz
# or
# wget https://download.redis.io/redis-stable.tar.gz

# Extract
tar -xzvf redis-stable.tar.gz

cd redis-stable
# Build and install; drop USE_SYSTEMD=yes if you do not need systemd support
make USE_SYSTEMD=yes
sudo make PREFIX=/usr/local/redis-server install

# Copy the redis and sentinel configuration templates into the installation directory
sudo cp redis.conf sentinel.conf /usr/local/redis-server/

# Create the log directory
sudo mkdir -p /usr/local/redis-server/logs
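
As a quick sanity check after the install step, the sketch below reports whether the binaries used later in this post exist under the install prefix. The function name check_redis_install is made up for illustration; the default prefix is the one used above.

```shell
# Hypothetical helper: report whether the binaries used in this post exist under PREFIX/bin.
# $1 = install prefix (defaults to the prefix used in this post).
check_redis_install() (
  prefix="${1:-/usr/local/redis-server}"
  status=0
  for name in redis-server redis-cli redis-sentinel; do
    if [ -x "$prefix/bin/$name" ]; then
      echo "ok       $prefix/bin/$name"
    else
      echo "missing  $prefix/bin/$name"
      status=1
    fi
  done
  # The function body runs in a subshell, so exit only sets the function's return status.
  exit "$status"
)

# Example (run on a server where the install step has completed):
#   check_redis_install /usr/local/redis-server && /usr/local/redis-server/bin/redis-server --version
```

The function returns non-zero if any binary is missing, so it can gate the later steps in a provisioning script.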

Creating the Redis Service Account

Create a system account and set ownership:

sudo adduser --system --group --no-create-home redis
sudo chown -R redis:redis /usr/local/redis-server

Replication Configuration (Master / Slave)

Edit redis.conf on each server, using the settings below as a reference.

Master server #1 (172.25.254.131)

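sudo vim /usr/local/redis-server/redis.conf

# Listen address. In a development or test environment you can use 0.0.0.0 or comment this line out.
# If the server is exposed to the public internet or an untrusted network, bind to a specific IP for safety.
bind 172.25.254.131

# Working directory where the RDB and AOF files are stored. Redis needs read/write permission on it.
dir /usr/local/redis-server/

# Password a slave uses when connecting to the master for data synchronization.
# Because roles can change after a failover, use the same password on the master and the slaves.
masterauth mypass

# Password clients must supply
requirepass mypass

# Log file location
logfile "/usr/local/redis-server/logs/redis.log"

# Redis dumps RDB snapshots periodically, so data not yet saved may be lost when a failure occurs.
# If that data loss is unacceptable, enable AOF.
appendonly yes

# AOF fsync policy; takes effect only when appendonly is yes.
# Three values are available (no, always, everysec):
# no: the operating system decides when to flush. Fastest of the three, but the interval between flushes can be long.
# always: fsync after every write command. Safest, but with a heavy performance cost.
# everysec: fsync once per second. A good balance between durability and performance, and the usual choice.
# If unsure, use "everysec".
appendfsync everysec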

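Slave server #2 (172.25.254.132)

sudo vim /usr/local/redis-server/redis.conf

# Listen address. In a development or test environment you can use 0.0.0.0 or comment this line out.
# If the server is exposed to the public internet or an untrusted network, bind to a specific IP for safety.
bind 172.25.254.132

# Working directory where the RDB and AOF files are stored. Redis needs read/write permission on it.
dir /usr/local/redis-server/

# Password a slave uses when connecting to the master for data synchronization.
# Because roles can change after a failover, use the same password on the master and the slaves.
masterauth mypass

# Password clients must supply
requirepass mypass

# Master IP/port (leave this line commented out on the master node)
replicaof 172.25.254.131 6379

# Log file location
logfile "/usr/local/redis-server/logs/redis.log"

# Redis dumps RDB snapshots periodically, so data not yet saved may be lost when a failure occurs.
# If that data loss is unacceptable, enable AOF.
appendonly yes

# AOF fsync policy; takes effect only when appendonly is yes.
# Three values are available (no, always, everysec):
# no: the operating system decides when to flush. Fastest of the three, but the interval between flushes can be long.
# always: fsync after every write command. Safest, but with a heavy performance cost.
# everysec: fsync once per second. A good balance between durability and performance, and the usual choice.
# If unsure, use "everysec".
appendfsync everysec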

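Slave server #3 (172.25.254.133)

sudo vim /usr/local/redis-server/redis.conf

# Listen address. In a development or test environment you can use 0.0.0.0 or comment this line out.
# If the server is exposed to the public internet or an untrusted network, bind to a specific IP for safety.
bind 172.25.254.133

# Working directory where the RDB and AOF files are stored. Redis needs read/write permission on it.
dir /usr/local/redis-server/

# Master IP/port (leave this line commented out on the master node)
replicaof 172.25.254.131 6379

# Password a slave uses when connecting to the master for data synchronization.
# Because roles can change after a failover, use the same password on the master and the slaves.
masterauth mypass

# Password clients must supply
requirepass mypass

# Log file location
logfile "/usr/local/redis-server/logs/redis.log"

# Redis dumps RDB snapshots periodically, so data not yet saved may be lost when a failure occurs.
# If that data loss is unacceptable, enable AOF.
appendonly yes

# AOF fsync policy; takes effect only when appendonly is yes.
# Three values are available (no, always, everysec):
# no: the operating system decides when to flush. Fastest of the three, but the interval between flushes can be long.
# always: fsync after every write command. Safest, but with a heavy performance cost.
# everysec: fsync once per second. A good balance between durability and performance, and the usual choice.
# If unsure, use "everysec".
appendfsync everysec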
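
The three files above differ only in the bind address and in whether replicaof is set. As a rough sketch of that difference, the snippet below prints just the node-specific lines for each server. The function name node_overrides is hypothetical, and port 6379 is assumed as in the rest of this post.

```shell
# Hypothetical helper: print the redis.conf lines that differ between nodes.
# $1 = this node's IP, $2 = the initial master's IP (port 6379 assumed, as in this post).
node_overrides() (
  node_ip="$1"
  master_ip="$2"
  echo "bind $node_ip"
  if [ "$node_ip" != "$master_ip" ]; then
    # Only slaves get a replicaof line; on the master it stays commented out.
    echo "replicaof $master_ip 6379"
  fi
)

for ip in 172.25.254.131 172.25.254.132 172.25.254.133; do
  echo "# --- $ip ---"
  node_overrides "$ip" 172.25.254.131
done
```

Everything else (passwords, log file, AOF settings) is identical on all three nodes, which is what lets any of them take over as master after a failover.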

Creating the systemd Unit File

sudo vim /etc/systemd/system/redis-server.service

[Unit]
Description=Redis data structure server
Documentation=https://redis.io/documentation
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/local/redis-server/bin/redis-server /usr/local/redis-server/redis.conf --supervised systemd --daemonize no
LimitNOFILE=10032
NoNewPrivileges=yes
Type=notify
TimeoutStartSec=infinity
TimeoutStopSec=infinity
UMask=0077
User=redis
Group=redis

[Install]
WantedBy=multi-user.target

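Reload systemd so that it picks up the new unit file, then start the Redis instance on each server:

sudo systemctl daemon-reload
sudo systemctl start redis-server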

Check the status after startup:

systemctl status redis-server

● redis-server.service - Redis data structure server
     Loaded: loaded (/etc/systemd/system/redis-server.service; disabled; vendor preset: enabled)
     Active: active (running) since Fri 2023-12-22 05:19:08 UTC; 6s ago
       Docs: https://redis.io/documentation
   Main PID: 48121 (redis-server)
     Status: "Ready to accept connections"
      Tasks: 6 (limit: 2178)
     Memory: 2.3M
        CPU: 19ms
     CGroup: /system.slice/redis-server.service
             └─48121 "/usr/local/redis-server/bin/redis-server *:6379" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" >

Dec 22 05:19:08 kafka-server1 systemd[1]: Starting Redis data structure server...
Dec 22 05:19:08 kafka-server1 systemd[1]: Started Redis data structure server.

Verifying Replication (Master/Slave Sync Status)

First check the replication status on the master:

cd /usr/local/redis-server/bin
./redis-cli -h 172.25.254.131

172.25.254.131:6379> auth mypass
OK
172.25.254.131:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.25.254.133,port=6379,state=online,offset=95561,lag=0
slave1:ip=172.25.254.132,port=6379,state=online,offset=95561,lag=0
master_failover_state:no-failover
master_replid:e396c5f17b331fc17d89f9c03e27e8a4548214f9
master_replid2:b950d0774916c8901184640d4652864788e2a51e
master_repl_offset:95561
second_repl_offset:94277
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:94277
repl_backlog_histlen:1285

Then check the replication status on the slaves:

# Check both slave nodes
./redis-cli -h 172.25.254.132

172.25.254.132:6379> auth mypass
OK
172.25.254.132:6379> info replication
# Replication
role:slave
master_host:172.25.254.131
master_port:6379
master_link_status:up
master_last_io_seconds_ago:3
master_sync_in_progress:0
slave_read_repl_offset:95603
slave_repl_offset:95603
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:e396c5f17b331fc17d89f9c03e27e8a4548214f9
master_replid2:b950d0774916c8901184640d4652864788e2a51e
master_repl_offset:95603
second_repl_offset:94277
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:93577
repl_backlog_histlen:2027
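
To run the same check on all three nodes without reading the full output each time, the fields that matter (role, connected_slaves, master_link_status) can be pulled out with a small filter. This is only a sketch: replication_summary is a made-up name, and it reads INFO replication text from standard input.

```shell
# Hypothetical helper: read `INFO replication` text on stdin and print a one-line summary.
# redis-cli prints INFO with CRLF line endings, so carriage returns are stripped first.
replication_summary() {
  tr -d '\r' | awk -F: '
    $1 == "role"               { role = $2 }
    $1 == "connected_slaves"   { replicas = $2 }
    $1 == "master_link_status" { link = $2 }
    END {
      if (role == "master") print "master replicas=" replicas
      else if (role == "slave") print "slave link=" link
      else print "unknown"
    }'
}

# Example (needs a running server; REDISCLI_AUTH supplies the password to redis-cli):
#   REDISCLI_AUTH=mypass ./redis-cli -h 172.25.254.131 info replication | replication_summary
```

A healthy setup should report the master with two connected slaves and each slave with its link up, matching the outputs above.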

Enable start on boot:

sudo systemctl enable redis-server

Sentinel Configuration

Edit sentinel.conf on each server:

sudo vim /usr/local/redis-server/sentinel.conf

# PID file location
pidfile "/usr/local/redis-server/logs/redis-sentinel.pid"

# Log file location
logfile "/usr/local/redis-server/logs/sentinel.log"

# The master to monitor, and the quorum: the number of Sentinels that must agree
# the master is unreachable before it is marked O_DOWN and a failover is attempted
# sentinel monitor <master-name> <ip> <port> <quorum>
sentinel monitor mymaster 172.25.254.131 6379 2

# Master password
sentinel auth-pass mymaster mypass

# If the master stays unreachable for this many milliseconds, this Sentinel marks it S_DOWN
sentinel down-after-milliseconds mymaster 6000

# Failover timeout
sentinel failover-timeout mymaster 180000
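
One detail about the quorum value is worth spelling out. The quorum only decides when the master is flagged O_DOWN; the failover itself still has to be authorized by a majority of all Sentinels. With three Sentinels that majority is also 2, so the two numbers coincide in this setup. The arithmetic, as a small sketch (sentinel_majority is a hypothetical name):

```shell
# Hypothetical helper: smallest number of Sentinels that forms a majority of n.
sentinel_majority() {
  echo $(( $1 / 2 + 1 ))
}

for n in 3 5; do
  echo "sentinels=$n majority=$(sentinel_majority "$n")"
done
# prints:
#   sentinels=3 majority=2
#   sentinels=5 majority=3
```

This is also why three Sentinels is the practical minimum: with only two, losing one leaves no majority to authorize a failover.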

Creating the systemd Unit File

sudo vim /etc/systemd/system/redis-sentinel.service

[Unit]
Description=Redis sentinel
Documentation=https://redis.io/docs/management/sentinel/
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/local/redis-server/bin/redis-sentinel /usr/local/redis-server/sentinel.conf --supervised systemd --daemonize no
LimitNOFILE=10032
NoNewPrivileges=yes
Type=notify
TimeoutStartSec=infinity
TimeoutStopSec=infinity
UMask=0077
User=redis
Group=redis

[Install]
WantedBy=multi-user.target

Reload systemd, then start the Sentinel instance on each server:

sudo systemctl daemon-reload
sudo systemctl start redis-sentinel

Check the status after startup:

systemctl status redis-sentinel

● redis-sentinel.service - Redis sentinel
     Loaded: loaded (/etc/systemd/system/redis-sentinel.service; disabled; vendor preset: enabled)
     Active: active (running) since Fri 2023-12-22 07:47:47 UTC; 16min ago
       Docs: https://redis.io/docs/management/sentinel/
   Main PID: 85260 (redis-sentinel)
     Status: "Ready to accept connections"
      Tasks: 5 (limit: 2178)
     Memory: 2.1M
        CPU: 3.497s
     CGroup: /system.slice/redis-sentinel.service
             └─85260 "/usr/local/redis-server/bin/redis-sentinel *:26379 [sentinel]" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ">

Dec 22 07:47:47 kafka-server3 systemd[1]: Starting Redis sentinel...
Dec 22 07:47:47 kafka-server3 systemd[1]: Started Redis sentinel.

Viewing Sentinel Status

cd /usr/local/redis-server/bin
./redis-cli -p 26379 info sentinel

# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.25.254.131:6379,slaves=2,sentinels=3
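
The master0 line is the one to watch: status=ok, slaves=2, and sentinels=3 mean this Sentinel has discovered both slaves and the two other Sentinels. A minimal sketch of checking that automatically follows; sentinel_view_ok is a made-up name, and it reads INFO sentinel text from standard input.

```shell
# Hypothetical helper: read `INFO sentinel` text on stdin and check one master line.
# $1 = master name, $2 = expected number of slaves, $3 = expected number of Sentinels.
# Returns 0 when a matching line with status=ok is found, non-zero otherwise.
sentinel_view_ok() {
  tr -d '\r' | grep -q "name=$1,status=ok,.*slaves=$2,sentinels=$3\$"
}

# Example (needs a running Sentinel):
#   ./redis-cli -p 26379 info sentinel | sentinel_view_ok mymaster 2 3 && echo "sentinel view looks complete"
```

Running it against each of the three Sentinels is a cheap way to confirm that they all see the same topology before testing a failover.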

Enable start on boot:

sudo systemctl enable redis-sentinel

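Failover Test

At this point you can test failover. Either kill the master process directly, or simulate a failure with the DEBUG SLEEP command. The sleep has to last longer than down-after-milliseconds (6000 ms here), which is why the example sleeps for 10 seconds.

./bin/redis-cli
127.0.0.1:6379> auth mypass
OK
# The DEBUG command requires enable-debug-command to be set to "local" or "yes" in redis.conf
127.0.0.1:6379> debug sleep 10
OK
(10.01s)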

Check the log:

122668:X 25 Dec 2023 01:30:57.033 # +sdown master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:57.088 # +odown master mymaster 172.25.254.131 6379 #quorum 2/2
122668:X 25 Dec 2023 01:30:57.088 # +new-epoch 11
122668:X 25 Dec 2023 01:30:57.088 # +try-failover master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:57.093 * Sentinel new configuration saved on disk
122668:X 25 Dec 2023 01:30:57.093 # +vote-for-leader 8a8d49f48649f665877ae3821c411d2511f1e084 11
122668:X 25 Dec 2023 01:30:57.100 * 79aafc906bc392865fbb1c6f1c9d4f38d8996332 voted for 8a8d49f48649f665877ae3821c411d2511f1e084 11
122668:X 25 Dec 2023 01:30:57.100 * d5e27cf5588cc89870ba1454872b6eedf8f4cae7 voted for 8a8d49f48649f665877ae3821c411d2511f1e084 11
122668:X 25 Dec 2023 01:30:57.155 # +elected-leader master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:57.155 # +failover-state-select-slave master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:57.215 # +selected-slave slave 172.25.254.133:6379 172.25.254.133 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:57.215 * +failover-state-send-slaveof-noone slave 172.25.254.133:6379 172.25.254.133 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:57.272 * +failover-state-wait-promotion slave 172.25.254.133:6379 172.25.254.133 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:58.159 * Sentinel new configuration saved on disk
122668:X 25 Dec 2023 01:30:58.159 # +promoted-slave slave 172.25.254.133:6379 172.25.254.133 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:58.159 # +failover-state-reconf-slaves master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:58.258 * +slave-reconf-sent slave 172.25.254.132:6379 172.25.254.132 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:59.177 * +slave-reconf-inprog slave 172.25.254.132:6379 172.25.254.132 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:59.177 * +slave-reconf-done slave 172.25.254.132:6379 172.25.254.132 6379 @ mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:59.231 # -odown master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:59.231 # +failover-end master mymaster 172.25.254.131 6379
122668:X 25 Dec 2023 01:30:59.231 # +switch-master mymaster 172.25.254.131 6379 172.25.254.133 6379
122668:X 25 Dec 2023 01:30:59.231 * +slave slave 172.25.254.132:6379 172.25.254.132 6379 @ mymaster 172.25.254.133 6379
122668:X 25 Dec 2023 01:30:59.231 * +slave slave 172.25.254.131:6379 172.25.254.131 6379 @ mymaster 172.25.254.133 6379
122668:X 25 Dec 2023 01:30:59.233 * Sentinel new configuration saved on disk
122668:X 25 Dec 2023 01:31:10.692 * +convert-to-slave slave 172.25.254.131:6379 172.25.254.131 6379 @ mymaster 172.25.254.133 6379

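Broadly, the failover proceeds through the following steps:

  1. After the master is detected as down, a +sdown (subjectively down) event is raised
  2. From the +sdown state, once enough other Sentinels agree, the state is upgraded to +odown (objectively down)
  3. A Sentinel leader is elected
  4. The leader carries out the failover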
Querying the New Master

./redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
1) "172.25.254.133"
2) "6379"

The master IP has changed from 172.25.254.131 to 172.25.254.133.
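
Scripts that need to write to Redis can use the same query to find the current master instead of hard-coding an IP. The sketch below tries each Sentinel in turn and prints the first answer as host:port. The function name current_master and the REDIS_CLI override variable are assumptions made for this example; the Sentinel addresses, port 26379, and the master name are the ones used in this post.

```shell
# Hypothetical helper: ask each Sentinel in turn for the current master of "mymaster"
# and print it as host:port. REDIS_CLI can be overridden; the default is this post's install path.
REDIS_CLI="${REDIS_CLI:-/usr/local/redis-server/bin/redis-cli}"

current_master() (
  for sentinel in 172.25.254.131 172.25.254.132 172.25.254.133; do
    # Skip Sentinels that cannot be reached.
    reply=$("$REDIS_CLI" -h "$sentinel" -p 26379 sentinel get-master-addr-by-name mymaster 2>/dev/null) || continue
    # When its output is not a terminal, redis-cli prints the two reply elements one per line.
    set -- $reply
    if [ "$#" -eq 2 ]; then
      echo "$1:$2"
      exit 0
    fi
  done
  # No Sentinel returned a usable answer.
  exit 1
)

# Example (needs at least one reachable Sentinel):
#   current_master
```

Asking more than one Sentinel matters because the node a script happens to try first may be the one that just went down.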