爱可生开源社区出品,原创内容未经授权不得随意使用,转载请联系小编并注明来源。
本文约 5000 字,预计阅读需要 15 分钟。
Percona XtraDB Cluster[1] (简称:PXC)在 8.0.41 中移除了 wsrep_group_commit_queue
组件相关的代码,因为有多个导致死锁的 bug 都是因为它引起。
我已经收了多个 Crash/Hang 的工单,经分析判断,均由这个组件引起。今天我会分享其中一个。
客户描述说他们有两套 PXC 集群分布在不同的机房。两个集群通过 MySQL 原生的复制同步数据,应用会就近访问本地机房的节点读写数据。架构图如下:
他们发现 PXC 中的多个节点经常在业务繁忙的时候 Hang 住,不再处理写入 SQL,并于 600 秒后宕机。
MySQL 的 error.log
如下:
2025-07-07T05:10:25.772284Z 0 [ERROR] [MY-012872] [InnoDB] [FATAL] Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2025-07-07T05:10:25.774138Z 0 [ERROR] [MY-013183] [InnoDB] Assertion failure: srv0srv.cc:2107:ib::fatal triggered thread 140395501549312
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2025-07-07T05:10:25Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=3a37a99599e0a71e11a86a930d4974a17cf35a5b
Server Version: 8.0.33-25.1 Percona XtraDB Cluster binary (GPL) 8.0.33, Revision 0c56202, WSREP version 26.1.4.3, wsrep_26.1.4.3
这是一个典型的 Crash。
当 InnoDB 的 monitor thread 发现 Semaphore wait 已经超过了 600,它会自杀去重启数据库。
我们检查 Crash 之前的日志可以看到 InnoDB output 信息. 在 SEMAPHORES 部分,我们看到有两个线程在等待 RW-latch. 这个 latch 的持有者是 pthread ID 140395374548736。
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 121
--Thread 140395688941312 has waited at buf0flu.cc line 1376 for 1190 seconds the semaphore:
SX-lock on RW-latch at 0x7fb06fd8cd58 created in file buf0buf.cc line 815
a writer (thread id 140395374548736) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/storage/innobase/include/trx0undo.ic line 126
--Thread 140395451193088 has waited at trx0rseg.ic line 51 for 1152 seconds the semaphore:
X-lock on RW-latch at 0x7fb06fd34f78 created in file buf0buf.cc line 815
a writer (thread id 140395374548736) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/storage/innobase/include/trx0rseg.ic line 51
幸运的是,我们可以搜索到这个 thread,已经在 COMMIT 状态持续了 1197 秒。
---TRANSACTION 2998, ACTIVE (PREPARED) 1197 sec committing
4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 2
MySQL thread id 43, OS thread handle 140395374548736, query id 330 innobase_commit_low (191)
# at 40289
#250706 21:50:27 server id 30001 end_log_pos 40373 CRC32 0x9ddcce0c GTID last_committed=146 sequence_number=147 rbr_only=no original_committed_timestamp=1751863827165363 immediate_commit_timestamp=1751863827756834 transaction_length=221
# original_commit_timestamp=1751863827165363 (2025-07-06 21:50:27.165363 PDT)
# immediate_commit_timestamp=1751863827756834 (2025-07-06 21:50:27.756834 PDT)
/*!80001 SET @@session.original_commit_timestamp=1751863827165363*//*!*/;
/*!80014 SET @@session.original_server_version=80033*//*!*/;
/*!80014 SET @@session.immediate_server_version=80033*//*!*/;
SET @@SESSION.GTID_NEXT= '00030001-1111-1111-1111-111111111111:128'/*!*/;
# at 40373
#250706 21:50:27 server id 30001 end_log_pos 40441 CRC32 0x53b220b7 Query thread_id=42 exec_time=0 error_code=0
SETTIMESTAMP=1751863827/*!*/;
SET @@session.sql_mode=1168113696/*!*/;
BEGIN
/*!*/;
# at 40441
#250706 21:50:27 server id 30001 end_log_pos 40510 CRC32 0x32b78ce8 Query thread_id=42 exec_time=0 error_code=0
SETTIMESTAMP=1751863827/*!*/;
COMMIT
客户在后续的更新中指出,每次 Crash 前都能看到一个空的 binlog event(在 BEGIN 和 COMMIT 之间不包含任何数据)。
先阶段,我们还无法确定为什么 COMMIT 会卡住,想要进一步分析我们希望客户提供 corefile 文件。
在 GDB 中我们可以通过 thread find 找到对应的线程。
(gdb) p/x 140395374548736
$1 = 0x7fb0586d9700
(gdb) thread find 0x7fb0586d9700
Thread 49 has target id 'Thread 0x7fb0586d9700 (LWP 10982)'
(gdb) thread 49
[Switching to thread 49 (Thread 0x7fb0586d9700 (LWP 10982))]
#0 0x00007fb0b324c48c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
后文中以
thd1
代称。
(gdb) set filename-display basename
(gdb) bt
#0 0x00007fb0b324c48c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000012578f0 in native_cond_wait (mutex=<optimized out>, cond=<optimized out>) at thr_cond.h:161
#2 my_cond_wait (mp=<optimized out>, cond=<optimized out>) at thr_cond.h:161
#3 inline_mysql_cond_wait (that=<optimized out>, mutex=<optimized out>,
src_file=0x33f0598 "/mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/sql/wsrep_binlog.cc", src_line=468)
at mysql_cond.h:198
#4 wsrep_wait_for_turn_in_group_commit (thd=0x7fb08800d430) at wsrep_binlog.cc:468
#5 0x000000000242b5a8 in trx_sys_update_wsrep_checkpoint (xid=0x7fb078016680, sys_header=0x0, mtr=0x7fb0586d6c90, recovery=<optimized out>)
at current_thd.h:33
#6 0x0000000002438303 in trx_write_serialisation_history (trx=0x7fb0948d0558, mtr=0x7fb0586d6c90) at trx0trx.cc:1746
#7 0x000000000243b5aa in trx_commit_low (trx=trx@entry=0x7fb0948d0558, mtr=0x7fb0586d6c90) at trx0trx.cc:2287
#8 0x000000000243bc94 in trx_commit (trx=0x7fb0948d0558) at trx0trx.cc:2364
#9 0x000000000243bfcc in trx_commit_for_mysql (trx=trx@entry=0x7fb0948d0558) at trx0trx.cc:2616
#10 0x000000000224a672 in innobase_commit_low (trx=trx@entry=0x7fb0948d0558) at ha_innodb.cc:6266
#11 0x0000000002279eb9 in innobase_commit (hton=<optimized out>, thd=0x7fb08800d430, commit_trx=<optimized out>) at ha_innodb.cc:6543
#12 0x0000000000d9379e in ha_commit_low (thd=0x7fb08800d430, all=true, run_after_commit=<optimized out>) at handler.cc:2045
#13 0x00000000011f4b4d in trx_coordinator::commit_in_engines (run_after_commit=<optimized out>, all=<optimized out>, thd=<optimized out>) at tc_log.cc:148
#14 trx_coordinator::commit_in_engines (thd=<optimized out>, all=<optimized out>, run_after_commit=<optimized out>) at tc_log.cc:138
#15 0x0000000001e0e2e8 in (anonymous namespace)::finish_transaction_in_engines (thd=thd@entry=0x7fb08800d430, all=<optimized out>, run_after_commit=false)
at binlog.cc:12576
#16 0x0000000001e18276 in MYSQL_BIN_LOG::process_commit_stage_queue (this=0x43d5b00 <mysql_bin_log>, thd=0x7fb08003dcd0, first=0x7fb08003dcd0)
at binlog.cc:9096
#17 0x0000000001e2e80b in MYSQL_BIN_LOG::ordered_commit (this=0x43d5b00 <mysql_bin_log>, thd=0x7fb08003dcd0, all=<optimized out>,
skip_commit=<optimized out>) at binlog.cc:9650
#18 0x0000000001e30cf6 in MYSQL_BIN_LOG::commit (this=0x43d5b00 <mysql_bin_log>, thd=0x7fb08003dcd0, all=<optimized out>) at binlog.cc:8841
#19 0x0000000000d952e1 in ha_commit_trans (thd=thd@entry=0x7fb08003dcd0, all=all@entry=true, ignore_global_read_lock=ignore_global_read_lock@entry=false)
at handler.cc:1878
#20 0x00000000011f809b in trans_commit (thd=thd@entry=0x7fb08003dcd0, ignore_global_read_lock=ignore_global_read_lock@entry=false) at transaction.cc:271
#21 0x0000000001e39b8b in Xid_log_event::do_commit (this=0x7fb07b4786d0, thd_arg=0x7fb08003dcd0) at log_event.cc:6272
#22 0x0000000001e39cc7 in Xid_apply_log_event::do_apply_event_worker (this=0x7fb07b478740, w=0x7fb07aecfd00) at log_event.cc:6367
#23 0x0000000001eee06f in slave_worker_exec_job_group (worker=worker@entry=0x7fb07aecfd00, rli=rli@entry=0x7fb07c691ac0) at rpl_rli_pdb.cc:2517
#24 0x0000000001ef609b in handle_slave_worker (arg=arg@entry=0x7fb07aecfd00) at rpl_replica.cc:6289
#25 0x000000000264d5b5 in pfs_spawn_thread (arg=0x7fb07ae28e20) at pfs.cc:3043
#26 0x00007fb0b32461ca in start_thread () from /lib64/libpthread.so.0
#27 0x00007fb0b13bd8d3 in clone () from /lib64/libc.so.6
在 frame 4 wsrep_wait_for_turn_in_group_commit
中,我们看到在 468 行,它在等待 COND_wsrep_group_commit
这个条件变量。
File: sql/wsrep_binlog.cc
457: while (true) {
458: if (thd == wsrep_group_commit_queue.front()) {
459: WSREP_DEBUG("Thread with id (%d) granted turn to proceed",
460: thd->thread_id());
461: break;
462: } else {
463: WSREP_DEBUG(
464: "Thread with id (%d) waiting for its turns in wsrep group"
465: " commit queue - waiting for (%d)",
466: thd->thread_id(),
467: wsrep_group_commit_queue.front()->thread_id());
468: mysql_cond_wait(&COND_wsrep_group_commit, &LOCK_wsrep_group_commit);
469: }
470: }
因为 thd1
在等待 COND_wsrep_group_commit
这个条件变量, 那么 thd1
必定不是 wsrep_group_commit_queue.front()
。
我们看一下 wsrep_group_commit_queue
的定义。
static std::queue<THD *> wsrep_group_commit_queue;
wsrep_group_commit_queue
是一个 thd 组成的先入先出的队列。
std::queue[2]
The std::queue class template is a container adaptor that gives the functionality of a queue - specifically, a FIFO (first-in, first-out) data structure.
这段代码的逻辑是,在一个 while 循环中,如果当前 thd
是 wsrep_group_commit_queue
中的第一个元素那么就跳出循环;否则就要等待 COND_wsrep_group_commit
这个条件变量。
通过查找 COND_wsrep_group_commit
,我们可以找到函数 wsrep_unregister_from_group_commit
。
void wsrep_unregister_from_group_commit(THD *thd) {
DBUG_TRACE;
if (wsrep_emulate_bin_log || thd == NULL) {
/* Binlog is off, no need to maintain group commit queue */
/* thd can be NULL if the transaction is being committed
during recovery using XID. */
return;
}
MUTEX_LOCK(guard, &LOCK_wsrep_group_commit);
if (!thd->wsrep_enforce_group_commit) {
/* Said handler was not register for wsrep group commit */
return;
}
thd->wsrep_enforce_group_commit = false;
assert(wsrep_group_commit_queue.front() == thd);
wsrep_group_commit_queue.pop();
WSREP_DEBUG(
"Un-Registering thread with id (%d) from wsrep group commit"
" queue",
thd->thread_id());
mysql_cond_broadcast(&COND_wsrep_group_commit);
return;
}
这个函数会通过 pop()
移除将队列中第一个元素,然后调用 COND_wsrep_group_commit
唤醒所有等待的线程。
我们通常通过 GDB 添加断点来查看调用逻辑,但刚好我们分析的两个函数都调用 WSREP_DEBUG 来输出 debug 日志。今天我们就通过设置 wsrep_debug 来查看。
mysql> set global wsrep_debug = "server";
mysql > update t set d = 1 where id = 6;
Query OK, 1 row affected (0.09 sec)
Rows matched: 1 Changed: 1 Warnings: 0
2025-07-14T04:07:04.388360Z 20 [Note] [MY-000000] [WSREP] wsrep_before_prepare: 1
2025-07-14T04:07:04.389203Z 20 [Note] [MY-000000] [WSREP] wsrep: replicating and certifying write set(-1)
2025-07-14T04:07:04.413971Z 20 [Note] [MY-000000] [WSREP] wsrep: write set replicated and certified (781)
2025-07-14T04:07:04.414795Z 20 [Note] [MY-000000] [WSREP] wsrep_after_prepare: 1
2025-07-14T04:07:04.415519Z 20 [Note] [MY-000000] [WSREP] wsrep_before_commit: 1, 781
2025-07-14T04:07:04.417004Z 20 [Note] [MY-000000] [WSREP] wsrep_ordered_commit: 1
2025-07-14T04:07:04.417208Z 20 [Note] [MY-000000] [WSREP] Registering thread with id (20) in wsrep group commit queue
2025-07-14T04:07:04.447810Z 20 [Note] [MY-000000] [WSREP] Updating WSREPXid: 8322eb1e-5acf-11f0-8691-e63683345e0b:781
2025-07-14T04:07:04.448029Z 20 [Note] [MY-000000] [WSREP] Thread with id (20) granted turn to proceed
2025-07-14T04:07:04.448153Z 20 [Note] [MY-000000] [WSREP] Un-Registering thread with id (20) from wsrep group commit queue
2025-07-14T04:07:04.449338Z 20 [Note] [MY-000000] [WSREP] wsrep_after_commit: 1, 1, 781, 1
wsrep group commit
队列中wsrep_group_commit_queue
中的第一个元素(gdb) p wsrep_group_commit_queue
$2 = std::queue wrapping: std::deque with 4 elements = {0x7fb08c0298c0, 0x7fb08800d430, 0x7fb08414faf0,
0x7fb09021c3f0}
队列中包含了四个 thd 分别为:
0x7fb08c0298c0
0x7fb08800d430
0x7fb08414faf0
0x7fb09021c3f0
队列中的第一个 thd
是:
(gdb) p ((THD*)0x7fb08c0298c0)->m_thread_id
$3 = 42
(gdb) p ((THD*)0x7fb08c0298c0)->real_id
$4 = 140395373491968
(gdb) p/x ((THD*)0x7fb08c0298c0)->real_id
$5 = 0x7fb0585d7700
(gdb) thread find 0x7fb0585d7700
Thread 50 has target id 'Thread 0x7fb0585d7700 (LWP 10983)'
后文中以
thd2
代称。
对于当前出现的两个 thd
做一个总结。
number | THD pointer | Pthread ID (hex) | MySQL connection id | GDB internal number |
---|---|---|---|---|
1 | 0x7fb08800d430 | 0x7fb0586d9700 | 43 | 49 |
2 | 0x7fb08c0298c0 | 0x7fb0585d7700 | 42 | 50 |
thd2
在做什么呢?根据 frame 4 Commit_stage_manager::enroll_for
,我们可以发现由于 thd2
不是 group commit 中的 leader
,它通过条件变量 m_stage_cond_binlog
等待 leader
去提交事务。
(gdb) set filename-display basename
(gdb) bt
#0 0x00007fb0b324c48c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000001e60040 in native_cond_wait (
mutex=0x43d6cb0 <Commit_stage_manager::get_instance()::shared_instance+336>,
cond=0x43d6c40 <Commit_stage_manager::get_instance()::shared_instance+224>) at thr_cond.h:161
#2 my_cond_wait (mp=0x43d6cb0 <Commit_stage_manager::get_instance()::shared_instance+336>,
cond=0x43d6c40 <Commit_stage_manager::get_instance()::shared_instance+224>) at thr_cond.h:161
#3 inline_mysql_cond_wait (
src_file=0x34dec40 "/mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/sql/rpl_commit_stage_manager.cc", src_line=417,
mutex=0x43d6cb0 <Commit_stage_manager::get_instance()::shared_instance+336>,
that=0x43d6c40 <Commit_stage_manager::get_instance()::shared_instance+224>) at mysql_cond.h:198
#4 Commit_stage_manager::enroll_for (
this=0x43d6b60 <Commit_stage_manager::get_instance()::shared_instance>,
stage=stage@entry=Commit_stage_manager::BINLOG_FLUSH_STAGE, thd=thd@entry=0x7fb08c0298c0,
stage_mutex=stage_mutex@entry=0x0, enter_mutex=enter_mutex@entry=0x43d5b08 <mysql_bin_log+8>)
at rpl_commit_stage_manager.cc:417
#5 0x0000000001e18660 in MYSQL_BIN_LOG::change_stage (this=this@entry=0x43d5b00 <mysql_bin_log>,
thd=thd@entry=0x7fb08c0298c0, stage=stage@entry=Commit_stage_manager::BINLOG_FLUSH_STAGE,
queue=queue@entry=0x7fb08c0298c0, leave_mutex=leave_mutex@entry=0x0,
enter_mutex=enter_mutex@entry=0x43d5b08 <mysql_bin_log+8>) at binlog.cc:9173
#6 0x0000000001e2e34f in MYSQL_BIN_LOG::ordered_commit (this=0x43d5b00 <mysql_bin_log>,
thd=0x7fb08c0298c0, all=<optimized out>, skip_commit=<optimized out>) at binlog.cc:9475
#7 0x0000000001e30cf6 in MYSQL_BIN_LOG::commit (this=0x43d5b00 <mysql_bin_log>, thd=0x7fb08c0298c0,
all=<optimized out>) at binlog.cc:8841
#8 0x0000000001e3168d in MYSQL_BIN_LOG::gtid_end_transaction (this=<optimized out>,
thd=thd@entry=0x7fb08c0298c0) at binlog.cc:2011
#9 0x0000000001e39ba9 in Xid_log_event::do_commit (this=0x7fb07ac916a0, thd_arg=0x7fb08c0298c0)
at log_event.cc:6278
#10 0x0000000001e39cc7 in Xid_apply_log_event::do_apply_event_worker (this=0x7fb07ac91710, w=0x7fb07ae33360) at log_event.cc:6367
#11 0x0000000001eee06f in slave_worker_exec_job_group (worker=worker@entry=0x7fb07ae33360, rli=rli@entry=0x7fb07c691ac0) at rpl_rli_pdb.cc:2517
#12 0x0000000001ef609b in handle_slave_worker (arg=arg@entry=0x7fb07ae33360) at rpl_replica.cc:6289
#13 0x000000000264d5b5 in pfs_spawn_thread (arg=0x7fb07af21190) at pfs.cc:3043
#14 0x00007fb0b32461ca in start_thread () from /lib64/libpthread.so.0
#15 0x00007fb0b13bd8d3 in clone () from /lib64/libc.so.6
File: sql/rpl_commit_stage_manager.cc
396: /*
397: If the queue was not empty, we're a follower and wait for the
398: leader to process the queue. If we were holding a mutex, we have
399: to release it before going to sleep.
400: */
401: if (!leader) {
402: CONDITIONAL_SYNC_POINT_FOR_TIMESTAMP("before_follower_wait");
403: mysql_mutex_lock(&m_lock_done);
404: #ifndef NDEBUG
405: /*
406: Leader can be awaiting all-clear to preempt follower's execution.
407: With setting the status the follower ensures it won't execute anything
408: including thread-specific code.
409: */
410: thd->get_transaction()->m_flags.ready_preempt = true;
411: if (leader_await_preempt_status) mysql_cond_signal(&m_cond_preempt);
412: #endif
413: while (thd->tx_commit_pending) {
414: if (stage == COMMIT_ORDER_FLUSH_STAGE) {
415: mysql_cond_wait(&m_stage_cond_commit_order, &m_lock_done);
416: } else {
417: mysql_cond_wait(&m_stage_cond_binlog, &m_lock_done);
418: }
419: }
leader
?我们需要寻找那些运行在 MYSQL_BIN_LOG::ordered_commit
中却不在 Commit_stage_manager::enroll_for
中的线程。
我们借助 pt-pmp
中的 aggregate_stacktrace
分组聚合 stracktrace
然后检索得到的结果为:
shell> gdb ~/opt/mysql/pxc8.0.33/bin/mysqld core.2035 --batch -ex "thread apply all bt" &> threads.log
shell> source /usr/bin/pt-pmp
shell> cat threads.log | aggregate_stacktrace > aggregate.log
shell> grep MYSQL_BIN_LOG::ordered_commit aggregate.log | grep -v enroll | tr ",""\n"
1 pthread_cond_wait
native_cond_wait(thr_cond.h:161)
my_cond_wait(thr_cond.h:161)
inline_mysql_cond_wait(mysql_cond.h:198)
wsrep_wait_for_turn_in_group_commit(wsrep_binlog.cc:468)
trx_sys_update_wsrep_checkpoint(current_thd.h:33)
trx_write_serialisation_history(trx0trx.cc:1746)
trx_commit_low(trx0trx.cc:2287)
trx_commit(trx0trx.cc:2364)
trx_commit_for_mysql(trx0trx.cc:2616)
innobase_commit_low(ha_innodb.cc:6266)
innobase_commit(ha_innodb.cc:6543)
ha_commit_low(handler.cc:2045)
trx_coordinator::commit_in_engines(tc_log.cc:148)
trx_coordinator::commit_in_engines(tc_log.cc:138)
(anonymous(binlog.cc:12576)
MYSQL_BIN_LOG::process_commit_stage_queue(binlog.cc:9096)
MYSQL_BIN_LOG::ordered_commit(binlog.cc:9650)
MYSQL_BIN_LOG::commit(binlog.cc:8841)
ha_commit_trans(handler.cc:1878)
trans_commit(transaction.cc:271)
Xid_log_event::do_commit(log_event.cc:6272)
Xid_apply_log_event::do_apply_event_worker(log_event.cc:6367)
slave_worker_exec_job_group(rpl_rli_pdb.cc:2517)
handle_slave_worker(rpl_replica.cc:6289)
pfs_spawn_thread(pfs.cc:3043)
start_thread(libpthread.so.0)
clone(libc.so.6)
通过检索 stracktrace
,我们可以确认 leader
线程竟然是 thd1
。
thd1
通过 COND_wsrep_group_commit
等待 thd2
。thd2
通过 m_stage_cond_binlog
等待 thd1
。形成了一个环形死锁.
thd1
当我们仔细检查 backtrace
时,发现一个奇怪的现象:
0x7fb08800d430
0x7fb08003dcd0
#15 0x0000000001e0e2e8 in (anonymous namespace)::finish_transaction_in_engines (thd=thd@entry=0x7fb08800d430, all=<optimized out>, run_after_commit=false)
at binlog.cc:12576
#16 0x0000000001e18276 in MYSQL_BIN_LOG::process_commit_stage_queue (this=0x43d5b00 <mysql_bin_log>, thd=0x7fb08003dcd0, first=0x7fb08003dcd0)
at binlog.cc:9096
简化后的 process_commit_stage_queue
代码为:
for (THD *head = first; head; head = head->next_to_commit) {
Thd_backup_and_restore switch_thd(thd, head);
...
::finish_transaction_in_engines(head, all, false);
...
}
我们通过 next_to_commit
这个指针可以遍历 group commit 中所有的 thd
。
(gdb) p first
$6 = (THD *) 0x7fb08003dcd0
后文中以
thd3
代称。
(gdb) p first->next_to_commit
$7 = (THD *) 0x7fb08c0298c0
(gdb) p first->next_to_commit->next_to_commit
$8 = (THD *) 0x7fb08800d430
(gdb) p first->next_to_commit->next_to_commit->next_to_commit
$9 = (THD *) 0x0
我们看到有三个 thd
并且 thd2
也在组中。
我们仔细观察 Thd_backup_and_restore
class,可以发现它对 thd
做了一些操作。
/**
Try to attach the POSIX thread to a session.
@param[in] backup_thd The thd to restore to when object is destructed.
@param[in] new_thd The thd to attach to.
*/
Thd_backup_and_restore(THD *backup_thd, THD *new_thd)
: m_backup_thd(backup_thd),
m_new_thd(new_thd),
m_new_thd_old_real_id(new_thd->real_id),
m_new_thd_old_thread_stack(new_thd->thread_stack) {
assert(m_backup_thd != nullptr && m_new_thd != nullptr);
// Reset the state of the current thd.
m_backup_thd->restore_globals();
m_new_thd->thread_stack = m_backup_thd->thread_stack;
m_new_thd->store_globals();
THD::store_globals
这里切换了 thd
对应的 Pthread ID。
real_id = my_thread_self();
我们对出现过的 thd
重新做一次梳理。
number | THD pointer | Pthread ID (hex) | MySQL connection id | GDB internal number |
---|---|---|---|---|
1 | 0x7fb08800d430 | 0x7fb0586d9700 | 43 | 49 |
2 | 0x7fb08c0298c0 | 0x7fb0585d7700 | 42 | 50 |
3 | 0x7fb08003dcd0 | 0x0 | 41 | 50 |
4 | 0x7fb08800d430 | 0x7fb0584d5700 | 43 | 51 |
我们真正的 leader
线程其实是 thd3
。
leader
线程通过 finish_transaction_in_engines
去处理每一个 thd
。
顺序为:
thd3 -> thd2 -> thd1
分析真个函数可能会太过于耗费时间,我们更关心的是它和 wsrep_group_commit_queue
之间的关系。
我们可以在 finish_transaction_in_engines
添加一个断点并且执行一个 DML query。
2025-07-14T10:12:45.798369Z 13 [Note] [MY-000000] [WSREP] Registering thread with id (13) in wsrep group commit queue
可以看到注册 thd
到 wsrep_group_commit_queue
。
Thread 45 "connection" hit Breakpoint 3, (anonymous namespace)::finish_transaction_in_engines (
thd=thd@entry=0x7ff743326ca0, all=false, run_after_commit=false)
at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/unique_ptr.h:421
421 /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/unique_ptr.h: No such file or directory.
(gdb) finish
Run till exit from #0 (anonymous namespace)::finish_transaction_in_engines (
thd=thd@entry=0x7ff743326ca0, all=false, run_after_commit=false)
at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/unique_ptr.h:421
MYSQL_BIN_LOG::process_commit_stage_queue (this=0x43d5b00 <mysql_bin_log>, thd=0x7ff743326ca0,
first=0x7ff743326ca0)
at /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/sql/binlog.cc:9099
暂停到 finish_transaction_in_engines
执行结束时。
2025-07-14T10:12:51.423272Z 13 [Note] [MY-000000] [WSREP] Updating WSREPXid: 8322eb1e-5acf-11f0-8691-e63683345e0b:822
2025-07-14T10:12:51.423625Z 13 [Note] [MY-000000] [WSREP] Thread with id (13) granted turn to proceed
2025-07-14T10:12:51.423694Z 13 [Note] [MY-000000] [WSREP] Un-Registering thread with id (13) from wsrep group commit queue
可以看到 wsrep_group_commit_queue
移除了 thd
。
通过测试我们发现:
finish_transaction_in_engines
调用 wsrep_unregister_from_group_commit
去移除 thd
thd3
将 wsrep_group_commit_queue
中的 thd3
, thd2
移除thd1
通过 COND_wsrep_group_commit
等待 thd2
, 因为thd2
依然在 wsrep_group_commit_queue
中thd2
通过 m_stage_cond_binlog
等待 thd3
那么显然 thd3
并没有成功的将 thd2
移除。
thd2
依然在 wsrep_group_commit_queue
中?当我们将目光投降 thd2
时,我们发现它的 MySQL connection ID 刚好是客户指出的空 binlog event 对应的线程!
(gdb) p ((THD*) 0x7fb08c0298c0)->m_thread_id
$9 = 42
#250706 21:50:27 server id 30001 end_log_pos 40510 CRC32 0x32b78ce8 Query thread_id=42 exec_time=0 error_code=0
node1 [localhost:10001] {msandbox} (test) > update t set d = d where id = 6;
Query OK, 0 rows affected (0.01 sec)
Rows matched: 1 Changed: 0 Warnings: 0
想到 thd2
是 replication
的 SQL thread,我们做了以中尝试。
在 PXC1 中先修改一条数据,然后在 PXC1 的 Primary 中修改同样的结果并让 PXC1 应用。
node1 [localhost:10001] {msandbox} (test) > update t set d = 1 where id = 1;
Query OK, 1 row affected (0.03 sec)
Rows matched: 1 Changed: 1 Warnings: 0
master [localhost:30001] {msandbox} (test) > update t set d = 1 where id = 1;
Query OK, 1 row affected (0.03 sec)
Rows matched: 1 Changed: 1 Warnings: 0
我们得到了一个空的 binlog event。
# at 2011
#250714 5:06:54 server id 30001 end_log_pos 2095 CRC32 0xc2571d3f GTID last_committed=7 sequence_number=8 rbr_only=no original_committed_timestamp=1752494814375749 immediate_commit_timestamp=1752494814431609 transaction_length=221
# original_commit_timestamp=1752494814375749 (2025-07-14 05:06:54.375749 PDT)
# immediate_commit_timestamp=1752494814431609 (2025-07-14 05:06:54.431609 PDT)
/*!80001 SET @@session.original_commit_timestamp=1752494814375749*//*!*/;
/*!80014 SET @@session.original_server_version=80033*//*!*/;
/*!80014 SET @@session.immediate_server_version=80033*//*!*/;
SET @@SESSION.GTID_NEXT= '00030001-1111-1111-1111-111111111111:6'/*!*/;
# at 2095
#250714 5:06:54 server id 30001 end_log_pos 2163 CRC32 0x4a03e19e Query thread_id=11 exec_time=0 error_code=0
SET TIMESTAMP=1752494814/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
BEGIN
/*!*/;
# at 2163
#250714 5:06:54 server id 30001 end_log_pos 2232 CRC32 0x71e9ad2e Query thread_id=11 exec_time=0 error_code=0
SET TIMESTAMP=1752494814/*!*/;
COMMIT
/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
b wsrep_register_for_group_commit
commands
silent
printf"Id %d [ Xid %d ] register\n", thd->m_thread_id, thd->m_wsrep_client_state.transaction_.ws_meta_.gtid_.seqno_.seqno_
cont
end
b wsrep_unregister_from_group_commit
commands
silent
printf"Id %d [ Xid %d ] leaves\n", thd->m_thread_id, thd->m_wsrep_client_state.transaction_.ws_meta_.gtid_.seqno_.seqno_
cont
end
b wsrep_wait_for_turn_in_group_commit
commands
silent
printf"Id %d [ Xid %d ] wait\n", thd->m_thread_id, thd->m_wsrep_client_state.transaction_.ws_meta_.gtid_.seqno_.seqno_
cont
end
b innobase_commit
commands
silent
printf"Id %d [ Xid %d ] innobase_commit\n", thd->m_thread_id, thd->m_wsrep_client_state.transaction_.ws_meta_.gtid_.seqno_.seqno_
cont
end
b ha_commit_low
commands
silent
printf"Id %d [ Xid %d ] enters ha_commit_low\n", thd->m_thread_id, thd->m_wsrep_client_state.transaction_.ws_meta_.gtid_.seqno_.seqno_
cont
end
b sql/handler.cc:2012
commands
silent
printf"Id %d [ Xid %d ] ha_list is not null \n", thd->m_thread_id, thd->m_wsrep_client_state.transaction_.ws_meta_.gtid_.seqno_.seqno_
cont
end
Id 21 [ Xid 4302 ] register
Id 15 [ Xid 4303 ] register
Id 16 [ Xid 4304 ] register
Id 17 [ Xid 4305 ] register
Id 21 [ Xid 4302 ] enters ha_commit_low
Id 18 [ Xid 4306 ] register
Id 21 [ Xid 4302 ] ha_list is not null
Id 21 [ Xid 4302 ] innobase_commit
Id 21 [ Xid 4302 ] wait
Id 21 [ Xid 4302 ] leaves
Id 15 [ Xid 4303 ] enters ha_commit_low
Id 16 [ Xid 4304 ] enters ha_commit_low
Id 16 [ Xid 4304 ] ha_list is not null
Id 16 [ Xid 4304 ] innobase_commit
Id 16 [ Xid 4304 ] wait
Id 22 [ Xid 4307 ] register
通过添加断点打印日志我们发现:
有问题的 thd (Id 15) 没有调用 ht->commit
(对于 InnoDB 表是 innobase_commit
) 导致 thd 还留在了 wsrep_group_commit_queue
中。
dbdeployer deploy --topology=pxc replication --base-port=10000 --base-server-id=10000 pxc8.0.33
dbdeployer deploy --gtid replication --base-port=30000 --base-server-id=30000 pxc8.0.33
node1_port=10001 # the port of the node 1 in the PXC
master_port=30001 # the port of the master node in the replication
alias node1_cmd="mysql --prompt='(node1)[db: \d] ' -A -umsandbox -pmsandbox -h127.0.0.1 -P${node1_port}"
alias master_cmd="mysql --prompt='(master)[db: \d] ' -A -umsandbox -pmsandbox -h127.0.0.1 -P${master_port}"
cat << EOF | node1_cmd
CHANGE MASTER TO master_host="127.0.0.1",
MASTER_PORT=${master_port},
MASTER_USER="rsandbox",
MASTER_PASSWORD="rsandbox",
MASTER_AUTO_POSITION = 1;
start slave;
EOF
master_cmd -BNe "create table test.t (id int auto_increment, d int, primary key (id));insert into test.t (id,d) values (1,0);"
for i in {1..1000};
do
node1_cmd -BNe "update test.t set d = d+1 where id = 1;" &> /dev/null
master_cmd -BNe "update test.t set d = d+1 where id = 1;" &> /dev/null
done
sysbench oltp_insert --threads=10 --time=600 --tables=1 --table-size=1000 \
--db-driver=mysql --mysql-db=test --mysql-storage-engine=innodb --report-interval=10 \
--mysql-user=msandbox --mysql-password=msandbox --mysql-host=127.0.0.1 --mysql-port=${node1_port} \
prepare
sysbench oltp_insert --threads=10 --time=600 --tables=1 --table-size=1000 \
--db-driver=mysql --mysql-db=test --mysql-storage-engine=innodb --report-interval=10 \
--mysql-user=msandbox --mysql-password=msandbox --mysql-host=127.0.0.1 --mysql-port=${node1_port} \
run
Initializing worker threads...
Threads started!
[ 10s ] thds: 10 tps: 42.77 qps: 42.77 (r/w/o: 0.00/42.77/0.00) lat (ms,95%): 623.33 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 10 tps: 37.00 qps: 37.00 (r/w/o: 0.00/37.00/0.00) lat (ms,95%): 549.52 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 10 tps: 23.40 qps: 23.40 (r/w/o: 0.00/23.40/0.00) lat (ms,95%): 634.66 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 10 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 10 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
SEMAPHORES
--Thread 140617271166720 has waited at trx0rseg.ic line 51 for 13 seconds the semaphore:
X-lock on RW-latch at 0x7fe40fd3cdf8 created in file buf0buf.cc line 815
a writer (thread id 140617197684480) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/storage/innobase/include/trx0rseg.ic line 51
--Thread 140617724176128 has waited at buf0flu.cc line 1376 for 10 seconds the semaphore:
SX-lock on RW-latch at 0x7fe40fd3d0d8 created in file buf0buf.cc line 815
a writer (thread id 140617197684480) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/storage/innobase/include/trx0undo.ic line 126
TRANSACTIONS
---TRANSACTION 4417, ACTIVE (PREPARED) 14 sec committing
mysql tables in use 1, locked 1
1 lock struct(s), heap size 1128, 0 row lock(s), undo log entries 1
MySQL thread id 70, OS thread handle 140617197684480, query id 1411 localhost 127.0.0.1 msandbox innobase_commit_low (1218)
INSERT INTO sbtest1 (id, k, c, pad) VALUES (0, 504, '57904754910-88624042600-21941801151-17844023199-16047046883-99245408053-22028087049-26597707866-59381249459-19610307722', '33663561666-47879788257-03701217268-04391412711-14170077666')
--------
当我们复现了 bug 后,再回看客户给出的描述就会发现,客户说的情况完全匹配了我们的复现条件。
其中
最后想说的话:所以,在复现一些 bug 时,理解数据库的使用场景会很好的帮助我们复现问题。
参考资料
[1]
PXC: https://www.percona.com/resources/datasheets/percona-xtradb-cluster
[2]
std::queue: https://en.cppreference.com/w/cpp/container/queue.html
本文关键字:#MySQL #PXC