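In a replication environment, the MySQL Orchestrator tool makes managing a cluster of MySQL servers very efficient. It ensures a smooth transition during an ad hoc failover or a planned/graceful switchover.
Several configuration parameters play a crucial role in controlling and influencing failover behavior. In this blog post, we'll explore some of these key options and how they affect the overall failover process.
Let's discuss these settings one by one with some examples.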
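FailMasterPromotionIfSQLThreadNotUpToDate is disabled by default. When it is set to "true", a master failover (promotion) is aborted if the candidate master has not yet consumed all of its relay log events.
If the setting stays "false" and the current master goes down while all replicas are lagging, one of the replicas is still elected as the new master, which can ultimately lead to data loss on the new master. Later, when the old master is re-added as a replica, this can cause duplicate entry errors.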
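To make the condition concrete, here is a minimal Python sketch of the kind of check this option implies: the SQL thread counts as up to date only when the executed coordinates match the coordinates already read from the master. The function names and the choice of SHOW SLAVE STATUS fields are assumptions made for illustration, and the sample values are illustrative; this is not Orchestrator's actual code.

```python
def sql_thread_up_to_date(status):
    """Return True when everything fetched from the master has been applied.

    `status` is a dict of SHOW SLAVE STATUS fields. Comparing the read
    coordinates with the executed coordinates is an assumed approximation
    of "all relay log events consumed".
    """
    read = (status["Master_Log_File"], int(status["Read_Master_Log_Pos"]))
    executed = (status["Relay_Master_Log_File"], int(status["Exec_Master_Log_Pos"]))
    return read == executed


def should_abort_promotion(fail_if_not_up_to_date, status):
    """Abort only when the option is on and the candidate has unapplied events."""
    return fail_if_not_up_to_date and not sql_thread_up_to_date(status)


# Illustrative status of a replica that is still behind.
lagging = {
    "Master_Log_File": "mysql-bin.000005",
    "Read_Master_Log_Pos": "355883980",
    "Relay_Master_Log_File": "mysql-bin.000003",
    "Exec_Master_Log_Pos": "453709829",
}

print(should_abort_promotion(True, lagging))   # True: option on, replica behind
print(should_abort_promotion(False, lagging))  # False: option off, promotion proceeds
```

With the option off, the second call shows why data loss becomes possible: the promotion goes ahead even though events are still waiting in the relay log.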
Consider that "FailMasterPromotionIfSQLThreadNotUpToDate" is enabled in the Orchestrator configuration file "orchestrator.conf.json":
"FailMasterPromotionIfSQLThreadNotUpToDate": true
This is the topology managed by Orchestrator:
Anils-MacBook-Pro.local:22637 [0s,ok,8.0.36,rw,ROW,>>,GTID]
+ Anils-MacBook-Pro.local:22638 [0s,ok,8.0.36,ro,ROW,>>,GTID]
+ Anils-MacBook-Pro.local:22639 [0s,ok,8.0.36,ro,ROW,>>,GTID]
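When scripting around this output, it can help to turn each topology line into structured data. The sketch below is a hypothetical parser: the dictionary keys are names chosen here for illustration, and only the first four bracketed fields (lag, status, version, read/write mode) are interpreted; the rest are passed through untouched.

```python
import re

# host:port followed by a bracketed, comma-separated field list; replicas
# carry a leading "+" or "-" marker in the topology output.
LINE_RE = re.compile(
    r"^\s*(?P<marker>[+-]\s+)?(?P<host>[^\s:]+):(?P<port>\d+)\s+\[(?P<fields>[^\]]*)\]\s*$"
)


def parse_topology_line(line):
    """Parse one topology line into a dict (key names are assumptions)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("unrecognized topology line: %r" % line)
    fields = match.group("fields").split(",")
    if len(fields) < 4:
        raise ValueError("too few fields in: %r" % line)
    lag_text = fields[0]
    has_lag = lag_text.endswith("s") and lag_text[:-1].isdigit()
    return {
        "host": match.group("host"),
        "port": int(match.group("port")),
        "is_replica": match.group("marker") is not None,
        "lag_seconds": int(lag_text[:-1]) if has_lag else None,
        "status": fields[1],
        "version": fields[2],
        "writable": fields[3] == "rw",
        "extra": fields[4:],
    }


master = parse_topology_line("Anils-MacBook-Pro.local:22637 [0s,ok,8.0.36,rw,ROW,>>,GTID]")
print(master["port"], master["writable"], master["lag_seconds"])
```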
Below, we run some workload via sysbench, which helps build up replication lag for our testing purposes.
sysbench \
--db-driver=mysql \
--mysql-user=sbtest_user \
--mysql-password=Sbtest@2022 \
--mysql-db=sbtest \
--mysql-host=127.0.0.1 \
--mysql-port=22637 \
--tables=15 \
--table-size=3000000 \
--create_secondary=off \
--threads=100 \
--time=0 \
--events=0 \
--report-interval=1 /opt/homebrew/Cellar/sysbench/1.0.20_7/share/sysbench/oltp_read_write.lua run
Output:
[ 256s ] thds: 100 tps: 1902.77 qps: 38008.31 (r/w/o: 26611.72/7591.05/3805.54) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 257s ] thds: 100 tps: 1960.71 qps: 38960.02 (r/w/o: 27292.45/7746.15/3921.42) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 258s ] thds: 100 tps: 1803.48 qps: 35773.52 (r/w/o: 24991.65/7174.91/3606.96) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
After some time, the replication lag on the replicas started to grow, and at that point we stopped the master [127.0.0.1:22637].
slave1 [localhost:22638] {msandbox} ((none)) > show slave status\G;
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 22637
Connect_Retry: 60
Master_Log_File: mysql-bin.000012
Read_Master_Log_Pos: 941353804
Relay_Log_File: mysql-relay.000034
Relay_Log_Pos: 318047296
Relay_Master_Log_File: mysql-bin.000012
Slave_IO_Running: No
Slave_SQL_Running: Yes
…
Seconds_Behind_Master: 215
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 2003
Last_IO_Error: Error reconnecting to source 'rsandbox@127.0.0.1:22637'. This was attempt 1/86400, with a delay of 60 seconds between attempts. Message: Can't connect to MySQL server on '127.0.0.1:22637' (61)
slave2 [localhost:22639] {msandbox} ((none)) > show slave status\G;
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 22637
Connect_Retry: 60
Master_Log_File: mysql-bin.000012
Read_Master_Log_Pos: 941353804
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 302890408
Relay_Master_Log_File: mysql-bin.000012
Slave_IO_Running: No
Slave_SQL_Running: Yes
…
Seconds_Behind_Master: 215
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 2003
Last_IO_Error: Error reconnecting to source 'rsandbox@127.0.0.1:22637'. This was attempt 1/86400, with a delay of 60 seconds between attempts. Message: Can't connect to MySQL server on '127.0.0.1:22637' (61)
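The state worth noticing in both outputs is "IO thread stopped, SQL thread still running, lag above zero". As a rough sketch, the vertical output can be parsed into a dictionary and checked programmatically. The helper name `parse_vertical_status` is hypothetical, and the sample text is a shortened copy of the fields shown above.

```python
def parse_vertical_status(text):
    # Parse vertical (backslash-G) client output into a field -> value dict.
    row = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip row separators, elisions, and anything without "key: value".
        if stripped.startswith("*") or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        row[key.strip()] = value.strip()
    return row


sample = """
*************************** 1. row ***************************
Master_Log_File: mysql-bin.000012
Read_Master_Log_Pos: 941353804
Slave_IO_Running: No
Slave_SQL_Running: Yes
Seconds_Behind_Master: 215
Last_IO_Errno: 2003
"""

status = parse_vertical_status(sample)
io_down = status["Slave_IO_Running"] == "No"
still_applying = status["Slave_SQL_Running"] == "Yes"
print(io_down, still_applying, int(status["Seconds_Behind_Master"]))  # True True 215
```

Because the value is split on the first colon only, long messages such as Last_IO_Error keep their embedded colons intact.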
As a result, the master promotion failed because the SQL thread was not up to date.
2025-06-04 23:42:21 ERROR RecoverDeadMaster: failed promotion. FailMasterPromotionIfSQLThreadNotUpToDate is set and promoted replica Anils-MacBook-Pro.local:22638 's sql thread is not up to date (relay logs still unapplied). Aborting promotion
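Now, if "FailMasterPromotionIfSQLThreadNotUpToDate" is false (the default), the failover proceeds even if the replicas are suffering from replication lag.
In the same scenario as above, with "FailMasterPromotionIfSQLThreadNotUpToDate": false, the master promotion completed successfully.
cat /tmp/recovery.log: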
20250604 23:51:19: Detected AllMasterReplicasNotReplicating on Anils-MacBook-Pro.local:22637. Affected replicas: 2
20250604 23:52:41: Detected DeadMaster on Anils-MacBook-Pro.local:22637. Affected replicas: 2
20250604 23:52:56: Will recover from DeadMaster on Anils-MacBook-Pro.local:22637
20250604 23:53:07: Recovered from DeadMaster on Anils-MacBook-Pro.local:22637. Failed: Anils-MacBook-Pro.local:22637; Promoted: Anils-MacBook-Pro.local:22638
20250604 23:53:07: (for all types) Recovered from DeadMaster on Anils-MacBook-Pro.local:22637. Failed: Anils-MacBook-Pro.local:22637; Successor: Anils-MacBook-Pro.local:22638
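The timestamps in this log also tell us how long the recovery took. The following sketch, which assumes the "YYYYMMDD HH:MM:SS: message" layout seen above and uses helper names invented for this example, measures the time between detection and recovery.

```python
from datetime import datetime


def parse_recovery_line(line):
    """Split a recovery.log line into (timestamp, message)."""
    # The first 17 characters hold the timestamp, followed by ": ".
    stamp, message = line[:17], line[19:]
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S"), message


def seconds_between(lines, start_prefix, end_prefix):
    """Seconds from the first line starting with start_prefix to the first with end_prefix."""
    start = end = None
    for line in lines:
        when, message = parse_recovery_line(line)
        if start is None and message.startswith(start_prefix):
            start = when
        if end is None and message.startswith(end_prefix):
            end = when
    if start is None or end is None:
        raise ValueError("markers not found in log")
    return (end - start).total_seconds()


log = [
    "20250604 23:52:41: Detected DeadMaster on Anils-MacBook-Pro.local:22637. Affected replicas: 2",
    "20250604 23:52:56: Will recover from DeadMaster on Anils-MacBook-Pro.local:22637",
    "20250604 23:53:07: Recovered from DeadMaster on Anils-MacBook-Pro.local:22637. "
    "Failed: Anils-MacBook-Pro.local:22637; Promoted: Anils-MacBook-Pro.local:22638",
]

print(seconds_between(log, "Detected DeadMaster", "Recovered from DeadMaster"))  # 26.0
```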
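DelayMasterPromotionIfSQLThreadNotUpToDate takes the opposite approach to the option discussed above. Instead of aborting the master failover, it delays the failover until the candidate master has consumed all of its relay logs. When this parameter is "true", the orchestrator process waits for the SQL thread to catch up before promoting the new master.
Consider that "DelayMasterPromotionIfSQLThreadNotUpToDate" is enabled in the Orchestrator configuration file "orchestrator.conf.json":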
"DelayMasterPromotionIfSQLThreadNotUpToDate": true
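There was some workload running on the master [127.0.0.1:22637], and after a few seconds replication lag started to appear. We then stopped the master.
In the log file /tmp/recovery.log, we can see that the initial failover process was started.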
20250605 19:10:04: Detected UnreachableMasterWithLaggingReplicas on Anils-MacBook-Pro.local:22637. Affected replicas: 2
20250605 19:10:06: Detected DeadMaster on Anils-MacBook-Pro.local:22637. Affected replicas: 2
20250605 19:10:06: Will recover from DeadMaster on Anils-MacBook-Pro.local:22637
20250605 19:10:16: Will recover from DeadMaster on Anils-MacBook-Pro.local:22637
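However, we can observe that, because of the replication lag on the candidate master [Anils-MacBook-Pro.local:22638], the promotion was put on hold so that the lag could be recovered before the failover.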
2025-06-05 19:10:27 ERROR DelayMasterPromotionIfSQLThreadNotUpToDate error: 2025-06-05 19:10:27 ERROR WaitForSQLThreadUpToDate stale coordinates timeout on Anils-MacBook-Pro.local:22638 after duration 10s
...
2025-06-05 19:10:27 DEBUG WaitForSQLThreadUpToDate waiting on Anils-MacBook-Pro.local:22638
2025-06-05 19:10:28 DEBUG WaitForSQLThreadUpToDate waiting on Anils-MacBook-Pro.local:22638
2025-06-05 19:10:28 DEBUG WaitForSQLThreadUpToDate waiting on Anils-MacBook-Pro.local:22638
2025-06-05 19:10:29 DEBUG WaitForSQLThreadUpToDate waiting on Anils-MacBook-Pro.local:22638
2025-06-05 19:10:29 DEBUG WaitForSQLThreadUpToDate waiting on Anils-MacBook-Pro.local:22638
2025-06-05 19:10:30 DEBUG WaitForSQLThreadUpToDate waiting on Anils-MacBook-Pro.local:22638
...
2025-06-05 19:10:37 INFO topology_recovery: DelayMasterPromotionIfSQLThreadNotUpToDate error: 2025-06-05 19:10:37 ERROR WaitForSQLThreadUpToDate stale coordinates timeout on Anils-MacBook-Pro.local:22638 after duration 10s
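The log lines above suggest a polling loop that gives up when the executed coordinates stop moving for a while ("stale coordinates timeout ... after duration 10s"). The Python sketch below models that idea under stated assumptions: `fetch_coordinates` is a hypothetical callback returning comparable read and executed positions, and the 10-second default simply mirrors the duration in the log. It is not Orchestrator's implementation.

```python
import time


def wait_for_sql_thread(fetch_coordinates, stale_timeout=10.0, poll_interval=0.5,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll until the executed position reaches the read position.

    Returns True once caught up, or False when the executed position has
    not advanced for `stale_timeout` seconds (stale coordinates).
    """
    last_exec = None
    last_progress = clock()
    while True:
        read_pos, exec_pos = fetch_coordinates()
        if exec_pos >= read_pos:
            return True
        now = clock()
        if exec_pos != last_exec:
            # The SQL thread applied something since the last poll.
            last_exec = exec_pos
            last_progress = now
        elif now - last_progress >= stale_timeout:
            return False
        sleep(poll_interval)


# Simulated replica that catches up over three polls (no real sleeping).
samples = iter([(100, 40), (100, 70), (100, 100)])
print(wait_for_sql_thread(lambda: next(samples), sleep=lambda s: None))  # True
```

Injecting the clock and sleep functions keeps the loop testable without a live replica; in real use the callback would read the coordinates from the candidate master.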
slave1 [localhost:22638] {msandbox} ((none)) > show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 22637
Connect_Retry: 60
Master_Log_File: mysql-bin.000017
Read_Master_Log_Pos: 981203536
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 179833778
Relay_Master_Log_File: mysql-bin.000017
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
Seconds_Behind_Master: 227
slave1 [localhost:22638] {msandbox} ((none)) > show slave status \G;
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 22637
Connect_Retry: 60
Master_Log_File: mysql-bin.000017
Read_Master_Log_Pos: 1017383839
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 237223256
Relay_Master_Log_File: mysql-bin.000017
Slave_IO_Running: No
Slave_SQL_Running: No
…
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 2003
Last_IO_Error: Error reconnecting to source 'rsandbox@127.0.0.1:22637'. This was attempt 1/86400, with a delay of 60 seconds between attempts. Message: Can't connect to MySQL server on '127.0.0.1:22637' (61)
slave2 [localhost:22639] {root} ((none)) > show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 22637
Connect_Retry: 60
Master_Log_File: mysql-bin.000017
Read_Master_Log_Pos: 907360640
Relay_Log_File: mysql-relay.000004
Relay_Log_Pos: 167191740
Relay_Master_Log_File: mysql-bin.000017
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
Seconds_Behind_Master: 210
slave2 [localhost:22639] {root} ((none)) > show slave status \G;
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 22637
Connect_Retry: 60
Master_Log_File: mysql-bin.000017
Read_Master_Log_Pos: 1017383839
Relay_Log_File: mysql-relay.000004
Relay_Log_Pos: 237135927
Relay_Master_Log_File: mysql-bin.000017
Slave_IO_Running: No
Slave_SQL_Running: No
…
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 2003
Last_IO_Error: Error reconnecting to source 'rsandbox@127.0.0.1:22637'. This was attempt 1/86400, with a delay of 60 seconds between attempts. Message: Can't connect to MySQL server on '127.0.0.1:22637' (61)
There is one more observation here. If we attempt a graceful switchover while all replicas are lagging, we get the message below.
shell> orchestrator-client -c graceful-master-takeover -alias testcluster -d Anils-MacBook-Pro.local:22638
Output:
Desginated instance Anils-MacBook-Pro.local:22638 seems to be lagging too much for this operation. Aborting.
This happens because of the condition below, which requires the replication lag to be less than or equal to the defined "ReasonableMaintenanceReplicationLagSeconds": 20.
if !designatedInstance.HasReasonableMaintenanceReplicationLag() {
    return nil, nil, fmt.Errorf("Desginated instance %+v seems to be lagging too much for this operation. Aborting.", designatedInstance.Key)
}

func (this *Instance) HasReasonableMaintenanceReplicationLag() bool {
    // replicas with SQLDelay are a special case
    if this.SQLDelay > 0 {
        return math.AbsInt64(this.SecondsBehindMaster.Int64-int64(this.SQLDelay)) <= int64(config.Config.ReasonableMaintenanceReplicationLagSeconds)
    }
    return this.SecondsBehindMaster.Int64 <= int64(config.Config.ReasonableMaintenanceReplicationLagSeconds)
}
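To see how the condition above plays out with concrete numbers, here is a small Python rendering of the same comparison. The function and parameter names are chosen for this example; the threshold of 20 is the value quoted above.

```python
def has_reasonable_maintenance_lag(seconds_behind_master, sql_delay=0,
                                   reasonable_lag_seconds=20):
    """Python rendering of the lag check quoted above."""
    if sql_delay > 0:
        # Intentionally delayed replicas are judged relative to their delay.
        return abs(seconds_behind_master - sql_delay) <= reasonable_lag_seconds
    return seconds_behind_master <= reasonable_lag_seconds


print(has_reasonable_maintenance_lag(215))                   # False: takeover refused
print(has_reasonable_maintenance_lag(1))                     # True: takeover allowed
print(has_reasonable_maintenance_lag(3605, sql_delay=3600))  # True: 5s off the intended delay
```

A replica that is 215 seconds behind fails the check, which matches the "lagging too much" message shown earlier.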
Here, the Orchestrator service logs show that the failover process is now waiting for all relay logs to be applied.
2025-06-05 20:15:01 INFO CommandRun successful. exit status 0
2025-06-05 20:15:01 INFO topology_recovery: Completed PreGracefulTakeoverProcesses hook 1 of 1 in 5.463653s
2025-06-05 20:15:01 INFO topology_recovery: done running PreGracefulTakeoverProcesses hooks
2025-06-05 20:15:01 INFO GracefulMasterTakeover: Will set Anils-MacBook-Pro.local:22637 as read_only
2025-06-05 20:15:01 INFO instance Anils-MacBook-Pro.local:22637 read_only: true
2025-06-05 20:15:01 INFO auditType:read-only instance:Anils-MacBook-Pro.local:22637 cluster:Anils-MacBook-Pro.local:22637 message:set as true
2025-06-05 20:15:01 INFO GracefulMasterTakeover: Will wait for Anils-MacBook-Pro.local:22638 to reach master coordinates mysql-bin.000021:221642748
2025-06-05 20:19:53 INFO topology_recovery: DelayMasterPromotionIfSQLThreadNotUpToDate: waiting for SQL thread on Anils-MacBook-Pro.local:22638
2025-06-05 20:19:53 DEBUG WaitForSQLThreadUpToDate waiting on Anils-MacBook-Pro.local:22638
2025-06-05 20:19:53 DEBUG WaitForSQLThreadUpToDate waiting on Anils-MacBook-Pro.local:22638
2025-06-05 20:19:54 DEBUG WaitForSQLThreadUpToDate waiting on Anils-MacBook-Pro.local:22638
2025-06-05 20:19:54 DEBUG WaitForSQLThreadUpToDate waiting on Anils-MacBook-Pro.local:22638
2025-06-05 20:19:55 DEBUG WaitForSQLThreadUpToDate waiting on Anils-MacBook-Pro.local:22638
2025-06-05 20:19:55 DEBUG WaitForSQLThreadUpToDate waiting on Anils-MacBook-Pr
..
2025-06-05 20:26:37 INFO topology_recovery: DelayMasterPromotionIfSQLThreadNotUpToDate: SQL thread caught up on Anils-MacBook-Pro.local:22638
2025-06-05 20:26:37 INFO topology_recovery: RecoverDeadMaster: found no reason to override promotion of Anils-MacBook-Pro.local:22638
2025-06-05 20:26:37 INFO topology_recovery: RecoverDeadMaster: successfully promoted Anils-MacBook-Pro.local:22638
2025-06-05 20:26:37 INFO topology_recovery: - RecoverDeadMaster: promoted server coordinates: mysql-bin.000017:221167478
2025-06-05 20:26:37 INFO topology_recovery: - RecoverDeadMaster: will apply MySQL changes to promoted master
2025-06-05 20:26:37 INFO Will reset replica on Anils-MacBook-Pro.local:22638
Once the lag is resolved, the takeover process completes successfully.
20250605 20:19:53: Will recover from DeadMaster on Anils-MacBook-Pro.local:22637
20250605 20:26:39: Recovered from DeadMaster on Anils-MacBook-Pro.local:22637. Failed: Anils-MacBook-Pro.local:22637; Promoted: Anils-MacBook-Pro.local:22638
20250605 20:26:39: (for all types) Recovered from DeadMaster on Anils-MacBook-Pro.local:22637. Failed: Anils-MacBook-Pro.local:22637; Successor: Anils-MacBook-Pro.local:22638
20250605 20:26:39: Planned takeover complete
shell> orchestrator-client -c topology -a testcluster
Output:
Anils-MacBook-Pro.local:22638 [0s,ok,8.0.36,rw,ROW,>>,GTID]
- Anils-MacBook-Pro.local:22637 [null,nonreplicating,8.0.36,ro,ROW,>>,GTID]
+ Anils-MacBook-Pro.local:22639 [0s,ok,8.0.36,ro,ROW,>>,GTID]
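FailMasterPromotionOnLagMinutes ensures that the master promotion is aborted when the replica's lag is >= the configured number of minutes. To use this flag, we must also configure "ReplicationLagQuery" and a heartbeat mechanism such as "pt-heartbeat" to evaluate the replication lag correctly.
Let's see how it works.
We set the value below in the Orchestrator configuration "orchestrator.conf.json" so that the master promotion fails if the lag exceeds roughly 1 minute.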
"FailMasterPromotionOnLagMinutes": 1,
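As discussed above, enabling this option depends on setting "ReplicationLagQuery", which obtains the replication lag from the heartbeat mechanism rather than relying on the Seconds_Behind_Master status. Without it, Orchestrator refuses to start: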
2025-06-06 08:54:47 INFO starting orchestrator, version: 3.2.6, git commit: 89f3bdd33931d5e234890787a24cc035fa106b32
2025-06-06 08:54:47 INFO Read config: /Users/aniljoshi/orchestrator/conf/orchestrator.conf.json
2025-06-06 08:54:47 FATAL nonzero FailMasterPromotionOnLagMinutes requires ReplicationLagQuery to be set
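The dependency between the two settings, and the ">= configured minutes" rule, can be sketched in a few lines of Python. Both function names are hypothetical, the query string is a placeholder, and the threshold comparison follows the description in this post rather than Orchestrator's source.

```python
import json


def validate_config(conf):
    """Reject a nonzero lag threshold without a lag query, as in the FATAL line above."""
    if conf.get("FailMasterPromotionOnLagMinutes", 0) > 0 and not conf.get("ReplicationLagQuery"):
        raise ValueError(
            "nonzero FailMasterPromotionOnLagMinutes requires ReplicationLagQuery to be set"
        )


def promotion_blocked_by_lag(conf, lag_seconds):
    """True when the measured lag is at or above the configured number of minutes."""
    minutes = conf.get("FailMasterPromotionOnLagMinutes", 0)
    return minutes > 0 and lag_seconds >= minutes * 60


# Placeholder query text; a real configuration would hold the full heartbeat query.
conf = json.loads('{"FailMasterPromotionOnLagMinutes": 1, "ReplicationLagQuery": "SELECT ..."}')
validate_config(conf)

print(promotion_blocked_by_lag(conf, 53))   # False: under one minute
print(promotion_blocked_by_lag(conf, 696))  # True: well over one minute
```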
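By default, Orchestrator uses the replica status "Seconds_Behind_Master" to monitor replication lag. However, when replication is already broken and the master is also down, "Seconds_Behind_Master" is "NULL", which means the accurate details needed for the decision cannot be obtained.
Therefore, we use pt-heartbeat as the source of replication lag. pt-heartbeat is a replication delay monitoring tool that measures lag by looking at actual replicated data. It provides the "absolute" lag from the master as well as sub-second resolution.
Below is the "ReplicationLagQuery" configuration, which we define in the Orchestrator configuration file.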
"ReplicationLagQuery": "SELECT CAST((UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(ts)) AS unsigned INTEGER) AS 'delay' FROM percona.heartbeat ORDER BY ts DESC LIMIT 1",
We also need a separate pt-heartbeat process running on the source/replica instances.
shell> pt-heartbeat --check-read-only --read-only-interval=1 --fail-successive-errors 5 --interval=0.1 --create-table --create-table-engine=InnoDB --database=percona --table=heartbeat --host=127.0.0.1 --user=heartbeat --password=Heartbeat@1234 --port=22637 --update &
shell> pt-heartbeat --check-read-only --read-only-interval=1 --fail-successive-errors 5 --interval=0.1 --create-table --create-table-engine=InnoDB --database=percona --table=heartbeat --host=127.0.0.1 --user=heartbeat --password=Heartbeat@1234 --port=22638 --update &
shell> pt-heartbeat --check-read-only --read-only-interval=1 --fail-successive-errors 5 --interval=0.1 --create-table --create-table-engine=InnoDB --database=percona --table=heartbeat --host=127.0.0.1 --user=heartbeat --password=Heartbeat@1234 --port=22639 --update &
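Reference: https://docs.percona.com/percona-toolkit/pt-heartbeat.html
The lag is calculated on the replica node as the difference between the current system time and the replicated timestamp value in the heartbeat table. Basically, on the master node, pt-heartbeat updates the heartbeat table at every interval (0.1 seconds with the --interval=0.1 used above; one second by default) with the server ID and the current timestamp. These updates are replicated to the replica nodes through asynchronous replication.
For example,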
slave1 [localhost:22638] {msandbox} (percona) > select * from percona.heartbeat;
+----------------------------+-----------+------------------+----------+-----------------------+---------------------+
| ts | server_id | file | position | relay_source_log_file | exec_source_log_pos |
+----------------------------+-----------+------------------+----------+-----------------------+---------------------+
| 2025-06-06T10:34:56.410320 | 100 | mysql-bin.000023 | 1654366 | NULL | NULL |
+----------------------------+-----------+------------------+----------+-----------------------+---------------------+
2 rows in set (0.00 sec)
slave2 [localhost:22639] {root} (percona) > SELECT CAST((UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(ts)) AS signed INTEGER) AS 'delay' FROM percona.heartbeat ORDER BY ts
DESC LIMIT 1;
+-------+
| delay |
+-------+
| 53 |
+-------+
1 row in set (0.01 sec)
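The arithmetic behind that query is just "current time minus replicated heartbeat timestamp". Here is a minimal Python sketch of it; the function name is made up for this example, the `now` value is chosen purely for illustration, and the fraction is truncated, which may not match MySQL's CAST behavior exactly.

```python
from datetime import datetime


def heartbeat_lag_seconds(heartbeat_ts, now):
    """Whole seconds between `now` and the replicated heartbeat timestamp."""
    ts = datetime.fromisoformat(heartbeat_ts)
    # Truncate the fractional part; this is a simplification.
    return int((now - ts).total_seconds())


# Heartbeat row value from the table above, with an illustrative "current" time.
now = datetime(2025, 6, 6, 10, 35, 50)
print(heartbeat_lag_seconds("2025-06-06T10:34:56.410320", now))  # 53
```

Because the timestamp is written on the master and only read on the replica, the result stays meaningful even when the replication threads are stopped, as long as the clocks on both hosts are in sync.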
Let's walk through a quick scenario to see the behavior with "FailMasterPromotionOnLagMinutes" enabled.
We run some workload in the background, which causes replication lag.
slave1 [localhost:22638] {root} ((none)) > show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 22637
Connect_Retry: 60
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 355883980
Relay_Log_File: mysql-relay.000003
Relay_Log_Pos: 452576487
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 453709829
Relay_Log_Space: 2503386167
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 696
Then we attempt a graceful master takeover, but it fails because of the replication lag.
shell> orchestrator-client -c graceful-master-takeover -alias testcluster -d Anils-MacBook-Pro.local:22638
Desginated instance Anils-MacBook-Pro.local:22638 seems to be lagging too much for this operation. Aborting.
However, once the replication lag is < 1 minute (the condition we specified for [FailMasterPromotionOnLagMinutes]), the failover process runs smoothly.
slave1 [localhost:22638] {root} ((none)) > show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 22637
Connect_Retry: 60
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 605457002
Relay_Log_File: mysql-relay.000008
Relay_Log_Pos: 605455949
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 605455733
Relay_Log_Space: 605457511
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 1
As a result, the master role is taken over by "Anils-MacBook-Pro.local:22638".
shell> orchestrator-client -c graceful-master-takeover -alias testcluster -d Anils-MacBook-Pro.local:22638
Anils-MacBook-Pro.local:22638
The failover log "/tmp/recovery.log":
20250607 22:19:53: Recovered from DeadMaster on Anils-MacBook-Pro.local:22637. Failed: Anils-MacBook-Pro.local:22637; Promoted: Anils-MacBook-Pro.local:22638
20250607 22:19:54: (for all types) Recovered from DeadMaster on Anils-MacBook-Pro.local:22637. Failed: Anils-MacBook-Pro.local:22637; Successor: Anils-MacBook-Pro.local:22638
20250607 22:19:54: Planned takeover complete
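The options above are meant to give fine-grained control over the MySQL Orchestrator failover process, especially when the replicas are experiencing replication lag. In essence, we can either wait for the lag to be resolved before the failover goes ahead, or fail over immediately despite the lag. In addition, settings such as [FailMasterPromotionOnLagMinutes, FailMasterPromotionIfSQLThreadNotUpToDate] make the failover fail when there is lag, which prioritizes data consistency.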
This article is a translation of a foreign-language original.
In case of infringement, please contact cloudcommunity@tencent.com for removal.