首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >专栏 >PXC 现场频繁死锁,某组件惨遭官方删除

PXC 现场频繁死锁,某组件惨遭官方删除

作者头像
爱可生开源社区
发布2025-07-17 16:50:16
发布2025-07-17 16:50:16
19300
代码可运行
举报
运行总次数:0
代码可运行
作者:马金友, 一名给 MySQL 找 bug 的初级 DBA。

爱可生开源社区出品,原创内容未经授权不得随意使用,转载请联系小编并注明来源。

本文约 5000 字,预计阅读需要 15 分钟。

Percona XtraDB Cluster[1] (简称:PXC)在 8.0.41 中移除了 wsrep_group_commit_queue 组件相关的代码,因为有多个导致死锁的 bug 都是因为它引起。

我已经收了多个 Crash/Hang 的工单,经分析判断,均由这个组件引起。今天我会分享其中一个。

现象描述

客户描述说他们有两套 PXC 集群分布在不同的机房。两个集群通过 MySQL 原生的复制同步数据,应用会就近访问本地机房的节点读写数据。架构图如下:

他们发现 PXC 中的多个节点经常在业务繁忙的时候 Hang 住,不再处理写入 SQL,并于 600 秒后宕机。

错误日志分析

MySQL 的 error.log 如下:

代码语言:javascript
代码运行次数:0
运行
复制
2025-07-07T05:10:25.772284Z 0 [ERROR] [MY-012872] [InnoDB] [FATAL] Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2025-07-07T05:10:25.774138Z 0 [ERROR] [MY-013183] [InnoDB] Assertion failure: srv0srv.cc:2107:ib::fatal triggered thread 140395501549312
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2025-07-07T05:10:25Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=3a37a99599e0a71e11a86a930d4974a17cf35a5b
Server Version: 8.0.33-25.1 Percona XtraDB Cluster binary (GPL) 8.0.33, Revision 0c56202, WSREP version 26.1.4.3, wsrep_26.1.4.3

这是一个典型的 Crash。

当 InnoDB 的 monitor thread 发现 Semaphore wait 已经超过了 600,它会自杀去重启数据库。

我们检查 Crash 之前的日志可以看到 InnoDB output 信息. 在 SEMAPHORES 部分,我们看到有两个线程在等待 RW-latch. 这个 latch 的持有者是 pthread ID 140395374548736。

代码语言:javascript
代码运行次数:0
运行
复制
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 121
--Thread 140395688941312 has waited at buf0flu.cc line 1376 for 1190 seconds the semaphore:
SX-lock on RW-latch at 0x7fb06fd8cd58 created in file buf0buf.cc line 815
a writer (thread id 140395374548736) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/storage/innobase/include/trx0undo.ic line 126
--Thread 140395451193088 has waited at trx0rseg.ic line 51 for 1152 seconds the semaphore:
X-lock on RW-latch at 0x7fb06fd34f78 created in file buf0buf.cc line 815
a writer (thread id 140395374548736) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/storage/innobase/include/trx0rseg.ic line 51

幸运的是,我们可以搜索到这个 thread,已经在 COMMIT 状态持续了 1197 秒。

代码语言:javascript
代码运行次数:0
运行
复制
---TRANSACTION 2998, ACTIVE (PREPARED) 1197 sec committing
4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 2
MySQL thread id 43, OS thread handle 140395374548736, query id 330 innobase_commit_low (191)

binary log

代码语言:javascript
代码运行次数:0
运行
复制
# at 40289
#250706 21:50:27 server id 30001  end_log_pos 40373 CRC32 0x9ddcce0c    GTID    last_committed=146      sequence_number=147     rbr_only=no     original_committed_timestamp=1751863827165363   immediate_commit_timestamp=1751863827756834     transaction_length=221
# original_commit_timestamp=1751863827165363 (2025-07-06 21:50:27.165363 PDT)
# immediate_commit_timestamp=1751863827756834 (2025-07-06 21:50:27.756834 PDT)
/*!80001 SET @@session.original_commit_timestamp=1751863827165363*//*!*/;
/*!80014 SET @@session.original_server_version=80033*//*!*/;
/*!80014 SET @@session.immediate_server_version=80033*//*!*/;
SET @@SESSION.GTID_NEXT= '00030001-1111-1111-1111-111111111111:128'/*!*/;
# at 40373
#250706 21:50:27 server id 30001  end_log_pos 40441 CRC32 0x53b220b7    Query   thread_id=42    exec_time=0     error_code=0
SETTIMESTAMP=1751863827/*!*/;
SET @@session.sql_mode=1168113696/*!*/;
BEGIN
/*!*/;
# at 40441
#250706 21:50:27 server id 30001  end_log_pos 40510 CRC32 0x32b78ce8    Query   thread_id=42    exec_time=0     error_code=0
SETTIMESTAMP=1751863827/*!*/;
COMMIT

客户在后续的更新中指出,每次 Crash 前都能看到一个空的 binlog event(在 BEGIN 和 COMMIT 之间不包含任何数据)。

先阶段,我们还无法确定为什么 COMMIT 会卡住,想要进一步分析我们希望客户提供 corefile 文件。

分析 corefile

pthread ID 140395374548736

在 GDB 中我们可以通过 thread find 找到对应的线程。

代码语言:javascript
代码运行次数:0
运行
复制
(gdb) p/x 140395374548736
$1 = 0x7fb0586d9700
(gdb) thread find 0x7fb0586d9700
Thread 49 has target id 'Thread 0x7fb0586d9700 (LWP 10982)'
(gdb) thread 49
[Switching to thread 49 (Thread 0x7fb0586d9700 (LWP 10982))]
#0  0x00007fb0b324c48c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

后文中以 thd1 代称。

backtrace

代码语言:javascript
代码运行次数:0
运行
复制
(gdb) set filename-display basename
(gdb) bt
#0  0x00007fb0b324c48c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000012578f0 in native_cond_wait (mutex=<optimized out>, cond=<optimized out>) at thr_cond.h:161
#2  my_cond_wait (mp=<optimized out>, cond=<optimized out>) at thr_cond.h:161
#3  inline_mysql_cond_wait (that=<optimized out>, mutex=<optimized out>,
    src_file=0x33f0598 "/mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/sql/wsrep_binlog.cc", src_line=468)
    at mysql_cond.h:198
#4  wsrep_wait_for_turn_in_group_commit (thd=0x7fb08800d430) at wsrep_binlog.cc:468
#5  0x000000000242b5a8 in trx_sys_update_wsrep_checkpoint (xid=0x7fb078016680, sys_header=0x0, mtr=0x7fb0586d6c90, recovery=<optimized out>)
    at current_thd.h:33
#6  0x0000000002438303 in trx_write_serialisation_history (trx=0x7fb0948d0558, mtr=0x7fb0586d6c90) at trx0trx.cc:1746
#7  0x000000000243b5aa in trx_commit_low (trx=trx@entry=0x7fb0948d0558, mtr=0x7fb0586d6c90) at trx0trx.cc:2287
#8  0x000000000243bc94 in trx_commit (trx=0x7fb0948d0558) at trx0trx.cc:2364
#9  0x000000000243bfcc in trx_commit_for_mysql (trx=trx@entry=0x7fb0948d0558) at trx0trx.cc:2616
#10 0x000000000224a672 in innobase_commit_low (trx=trx@entry=0x7fb0948d0558) at ha_innodb.cc:6266
#11 0x0000000002279eb9 in innobase_commit (hton=<optimized out>, thd=0x7fb08800d430, commit_trx=<optimized out>) at ha_innodb.cc:6543
#12 0x0000000000d9379e in ha_commit_low (thd=0x7fb08800d430, all=true, run_after_commit=<optimized out>) at handler.cc:2045
#13 0x00000000011f4b4d in trx_coordinator::commit_in_engines (run_after_commit=<optimized out>, all=<optimized out>, thd=<optimized out>) at tc_log.cc:148
#14 trx_coordinator::commit_in_engines (thd=<optimized out>, all=<optimized out>, run_after_commit=<optimized out>) at tc_log.cc:138
#15 0x0000000001e0e2e8 in (anonymous namespace)::finish_transaction_in_engines (thd=thd@entry=0x7fb08800d430, all=<optimized out>, run_after_commit=false)
    at binlog.cc:12576
#16 0x0000000001e18276 in MYSQL_BIN_LOG::process_commit_stage_queue (this=0x43d5b00 <mysql_bin_log>, thd=0x7fb08003dcd0, first=0x7fb08003dcd0)
    at binlog.cc:9096
#17 0x0000000001e2e80b in MYSQL_BIN_LOG::ordered_commit (this=0x43d5b00 <mysql_bin_log>, thd=0x7fb08003dcd0, all=<optimized out>,
    skip_commit=<optimized out>) at binlog.cc:9650
#18 0x0000000001e30cf6 in MYSQL_BIN_LOG::commit (this=0x43d5b00 <mysql_bin_log>, thd=0x7fb08003dcd0, all=<optimized out>) at binlog.cc:8841
#19 0x0000000000d952e1 in ha_commit_trans (thd=thd@entry=0x7fb08003dcd0, all=all@entry=true, ignore_global_read_lock=ignore_global_read_lock@entry=false)
    at handler.cc:1878
#20 0x00000000011f809b in trans_commit (thd=thd@entry=0x7fb08003dcd0, ignore_global_read_lock=ignore_global_read_lock@entry=false) at transaction.cc:271
#21 0x0000000001e39b8b in Xid_log_event::do_commit (this=0x7fb07b4786d0, thd_arg=0x7fb08003dcd0) at log_event.cc:6272
#22 0x0000000001e39cc7 in Xid_apply_log_event::do_apply_event_worker (this=0x7fb07b478740, w=0x7fb07aecfd00) at log_event.cc:6367
#23 0x0000000001eee06f in slave_worker_exec_job_group (worker=worker@entry=0x7fb07aecfd00, rli=rli@entry=0x7fb07c691ac0) at rpl_rli_pdb.cc:2517
#24 0x0000000001ef609b in handle_slave_worker (arg=arg@entry=0x7fb07aecfd00) at rpl_replica.cc:6289
#25 0x000000000264d5b5 in pfs_spawn_thread (arg=0x7fb07ae28e20) at pfs.cc:3043
#26 0x00007fb0b32461ca in start_thread () from /lib64/libpthread.so.0
#27 0x00007fb0b13bd8d3 in clone () from /lib64/libc.so.6

条件变量 COND_wsrep_group_commit

等待

在 frame 4 wsrep_wait_for_turn_in_group_commit中,我们看到在 468 行,它在等待 COND_wsrep_group_commit 这个条件变量。

代码语言:javascript
代码运行次数:0
运行
复制
File: sql/wsrep_binlog.cc
457:   while (true) {
458:     if (thd == wsrep_group_commit_queue.front()) {
459:       WSREP_DEBUG("Thread with id (%d) granted turn to proceed",
460:                   thd->thread_id());
461:       break;
462:     } else {
463:       WSREP_DEBUG(
464:           "Thread with id (%d) waiting for its turns in wsrep group"
465:           " commit queue - waiting for (%d)",
466:           thd->thread_id(),
467:           wsrep_group_commit_queue.front()->thread_id());
468:       mysql_cond_wait(&COND_wsrep_group_commit, &LOCK_wsrep_group_commit);
469:     }
470:   }

因为 thd1 在等待 COND_wsrep_group_commit 这个条件变量, 那么 thd1 必定不是 wsrep_group_commit_queue.front()

我们看一下 wsrep_group_commit_queue 的定义。

代码语言:javascript
代码运行次数:0
运行
复制
static std::queue<THD *> wsrep_group_commit_queue;

wsrep_group_commit_queue 是一个 thd 组成的先入先出的队列。

std::queue[2]

The std::queue class template is a container adaptor that gives the functionality of a queue - specifically, a FIFO (first-in, first-out) data structure.

这段代码的逻辑是,在一个 while 循环中,如果当前 thdwsrep_group_commit_queue 中的第一个元素那么就跳出循环;否则就要等待 COND_wsrep_group_commit 这个条件变量。

唤醒

通过查找 COND_wsrep_group_commit,我们可以找到函数 wsrep_unregister_from_group_commit

代码语言:javascript
代码运行次数:0
运行
复制
void wsrep_unregister_from_group_commit(THD *thd) {
  DBUG_TRACE;
if (wsrep_emulate_bin_log || thd == NULL) {
    /* Binlog is off, no need to maintain group commit queue */
    /* thd can be NULL if the transaction is being committed
       during recovery using XID. */
    return;
  }

  MUTEX_LOCK(guard, &LOCK_wsrep_group_commit);

if (!thd->wsrep_enforce_group_commit) {
    /* Said handler was not register for wsrep group commit */
    return;
  }

  thd->wsrep_enforce_group_commit = false;
  assert(wsrep_group_commit_queue.front() == thd);
  wsrep_group_commit_queue.pop();
  WSREP_DEBUG(
      "Un-Registering thread with id (%d) from wsrep group commit"
      " queue",
      thd->thread_id());
  mysql_cond_broadcast(&COND_wsrep_group_commit);
return;
}

这个函数会通过 pop() 移除将队列中第一个元素,然后调用 COND_wsrep_group_commit 唤醒所有等待的线程。

wsrep_group_commit_queue

我们通常通过 GDB 添加断点来查看调用逻辑,但刚好我们分析的两个函数都调用 WSREP_DEBUG 来输出 debug 日志。今天我们就通过设置 wsrep_debug 来查看。

  • 打开 debug
代码语言:javascript
代码运行次数:0
运行
复制
mysql> set global wsrep_debug = "server";
  • 执行 DML
代码语言:javascript
代码运行次数:0
运行
复制
mysql > update t set d = 1 where id = 6;
Query OK, 1 row affected (0.09 sec)
Rows matched: 1  Changed: 1  Warnings: 0
  • 检查错误日志
代码语言:javascript
代码运行次数:0
运行
复制
2025-07-14T04:07:04.388360Z 20 [Note] [MY-000000] [WSREP] wsrep_before_prepare: 1
2025-07-14T04:07:04.389203Z 20 [Note] [MY-000000] [WSREP] wsrep: replicating and certifying write set(-1)
2025-07-14T04:07:04.413971Z 20 [Note] [MY-000000] [WSREP] wsrep: write set replicated and certified (781)
2025-07-14T04:07:04.414795Z 20 [Note] [MY-000000] [WSREP] wsrep_after_prepare: 1
2025-07-14T04:07:04.415519Z 20 [Note] [MY-000000] [WSREP] wsrep_before_commit: 1, 781
2025-07-14T04:07:04.417004Z 20 [Note] [MY-000000] [WSREP] wsrep_ordered_commit: 1
2025-07-14T04:07:04.417208Z 20 [Note] [MY-000000] [WSREP] Registering thread with id (20) in wsrep group commit queue
2025-07-14T04:07:04.447810Z 20 [Note] [MY-000000] [WSREP] Updating WSREPXid: 8322eb1e-5acf-11f0-8691-e63683345e0b:781
2025-07-14T04:07:04.448029Z 20 [Note] [MY-000000] [WSREP] Thread with id (20) granted turn to proceed
2025-07-14T04:07:04.448153Z 20 [Note] [MY-000000] [WSREP] Un-Registering thread with id (20) from wsrep group commit queue
2025-07-14T04:07:04.449338Z 20 [Note] [MY-000000] [WSREP] wsrep_after_commit: 1, 1, 781, 1
  1. 把 thd 加到 wsrep group commit 队列中
  2. 处理 thd
  3. 从 wsrep group commit 队列中移除 thd

问题 1: 谁是 wsrep_group_commit_queue 中的第一个元素

  • 查看队列中的元素
代码语言:javascript
代码运行次数:0
运行
复制
(gdb) p wsrep_group_commit_queue
$2 = std::queue wrapping: std::deque with 4 elements = {0x7fb08c0298c0, 0x7fb08800d430, 0x7fb08414faf0,
  0x7fb09021c3f0}

队列中包含了四个 thd 分别为:

  1. 0x7fb08c0298c0
  2. 0x7fb08800d430
  3. 0x7fb08414faf0
  4. 0x7fb09021c3f0

队列中的第一个 thd 是:

代码语言:javascript
代码运行次数:0
运行
复制
(gdb) p ((THD*)0x7fb08c0298c0)->m_thread_id
$3 = 42
(gdb) p ((THD*)0x7fb08c0298c0)->real_id
$4 = 140395373491968
(gdb) p/x  ((THD*)0x7fb08c0298c0)->real_id
$5 = 0x7fb0585d7700
(gdb) thread find 0x7fb0585d7700
Thread 50 has target id 'Thread 0x7fb0585d7700 (LWP 10983)'

后文中以 thd2 代称。

对于当前出现的两个 thd 做一个总结。

number

THD pointer

Pthread ID (hex)

MySQL connection id

GDB internal number

1

0x7fb08800d430

0x7fb0586d9700

43

49

2

0x7fb08c0298c0

0x7fb0585d7700

42

50

问题 2: thd2 在做什么呢?

根据 frame 4 Commit_stage_manager::enroll_for,我们可以发现由于 thd2 不是 group commit 中的 leader,它通过条件变量 m_stage_cond_binlog 等待 leader 去提交事务。

代码语言:javascript
代码运行次数:0
运行
复制
(gdb) set filename-display basename
(gdb) bt
#0  0x00007fb0b324c48c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000001e60040 in native_cond_wait (
    mutex=0x43d6cb0 <Commit_stage_manager::get_instance()::shared_instance+336>,
    cond=0x43d6c40 <Commit_stage_manager::get_instance()::shared_instance+224>) at thr_cond.h:161
#2  my_cond_wait (mp=0x43d6cb0 <Commit_stage_manager::get_instance()::shared_instance+336>,
    cond=0x43d6c40 <Commit_stage_manager::get_instance()::shared_instance+224>) at thr_cond.h:161
#3  inline_mysql_cond_wait (
    src_file=0x34dec40 "/mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/sql/rpl_commit_stage_manager.cc", src_line=417,
    mutex=0x43d6cb0 <Commit_stage_manager::get_instance()::shared_instance+336>,
    that=0x43d6c40 <Commit_stage_manager::get_instance()::shared_instance+224>) at mysql_cond.h:198
#4  Commit_stage_manager::enroll_for (
    this=0x43d6b60 <Commit_stage_manager::get_instance()::shared_instance>,
    stage=stage@entry=Commit_stage_manager::BINLOG_FLUSH_STAGE, thd=thd@entry=0x7fb08c0298c0,
    stage_mutex=stage_mutex@entry=0x0, enter_mutex=enter_mutex@entry=0x43d5b08 <mysql_bin_log+8>)
    at rpl_commit_stage_manager.cc:417
#5  0x0000000001e18660 in MYSQL_BIN_LOG::change_stage (this=this@entry=0x43d5b00 <mysql_bin_log>,
    thd=thd@entry=0x7fb08c0298c0, stage=stage@entry=Commit_stage_manager::BINLOG_FLUSH_STAGE,
    queue=queue@entry=0x7fb08c0298c0, leave_mutex=leave_mutex@entry=0x0,
    enter_mutex=enter_mutex@entry=0x43d5b08 <mysql_bin_log+8>) at binlog.cc:9173
#6  0x0000000001e2e34f in MYSQL_BIN_LOG::ordered_commit (this=0x43d5b00 <mysql_bin_log>,
    thd=0x7fb08c0298c0, all=<optimized out>, skip_commit=<optimized out>) at binlog.cc:9475
#7  0x0000000001e30cf6 in MYSQL_BIN_LOG::commit (this=0x43d5b00 <mysql_bin_log>, thd=0x7fb08c0298c0,
    all=<optimized out>) at binlog.cc:8841
#8  0x0000000001e3168d in MYSQL_BIN_LOG::gtid_end_transaction (this=<optimized out>,
    thd=thd@entry=0x7fb08c0298c0) at binlog.cc:2011
#9  0x0000000001e39ba9 in Xid_log_event::do_commit (this=0x7fb07ac916a0, thd_arg=0x7fb08c0298c0)
    at log_event.cc:6278
#10 0x0000000001e39cc7 in Xid_apply_log_event::do_apply_event_worker (this=0x7fb07ac91710, w=0x7fb07ae33360) at log_event.cc:6367
#11 0x0000000001eee06f in slave_worker_exec_job_group (worker=worker@entry=0x7fb07ae33360, rli=rli@entry=0x7fb07c691ac0) at rpl_rli_pdb.cc:2517
#12 0x0000000001ef609b in handle_slave_worker (arg=arg@entry=0x7fb07ae33360) at rpl_replica.cc:6289
#13 0x000000000264d5b5 in pfs_spawn_thread (arg=0x7fb07af21190) at pfs.cc:3043
#14 0x00007fb0b32461ca in start_thread () from /lib64/libpthread.so.0
#15 0x00007fb0b13bd8d3 in clone () from /lib64/libc.so.6
代码语言:javascript
代码运行次数:0
运行
复制
File: sql/rpl_commit_stage_manager.cc
396:   /*
397:     If the queue was not empty, we're a follower and wait for the
398:     leader to process the queue. If we were holding a mutex, we have
399:     to release it before going to sleep.
400:   */
401:   if (!leader) {
402:     CONDITIONAL_SYNC_POINT_FOR_TIMESTAMP("before_follower_wait");
403:     mysql_mutex_lock(&m_lock_done);
404: #ifndef NDEBUG
405:     /*
406:       Leader can be awaiting all-clear to preempt follower's execution.
407:       With setting the status the follower ensures it won't execute anything
408:       including thread-specific code.
409:     */
410:     thd->get_transaction()->m_flags.ready_preempt = true;
411:     if (leader_await_preempt_status) mysql_cond_signal(&m_cond_preempt);
412: #endif
413:     while (thd->tx_commit_pending) {
414:       if (stage == COMMIT_ORDER_FLUSH_STAGE) {
415:         mysql_cond_wait(&m_stage_cond_commit_order, &m_lock_done);
416:       } else {
417:         mysql_cond_wait(&m_stage_cond_binlog, &m_lock_done);
418:       }
419:     }

问题 3: 谁是 leader

我们需要寻找那些运行在 MYSQL_BIN_LOG::ordered_commit 中却不在 Commit_stage_manager::enroll_for 中的线程。

我们借助 pt-pmp 中的 aggregate_stacktrace 分组聚合 stracktrace 然后检索得到的结果为:

代码语言:javascript
代码运行次数:0
运行
复制
shell> gdb ~/opt/mysql/pxc8.0.33/bin/mysqld core.2035 --batch -ex "thread apply all bt" &> threads.log
shell> source /usr/bin/pt-pmp
shell> cat threads.log | aggregate_stacktrace  > aggregate.log
shell> grep MYSQL_BIN_LOG::ordered_commit aggregate.log | grep -v enroll | tr ",""\n"
            1 pthread_cond_wait
native_cond_wait(thr_cond.h:161)
my_cond_wait(thr_cond.h:161)
inline_mysql_cond_wait(mysql_cond.h:198)
wsrep_wait_for_turn_in_group_commit(wsrep_binlog.cc:468)
trx_sys_update_wsrep_checkpoint(current_thd.h:33)
trx_write_serialisation_history(trx0trx.cc:1746)
trx_commit_low(trx0trx.cc:2287)
trx_commit(trx0trx.cc:2364)
trx_commit_for_mysql(trx0trx.cc:2616)
innobase_commit_low(ha_innodb.cc:6266)
innobase_commit(ha_innodb.cc:6543)
ha_commit_low(handler.cc:2045)
trx_coordinator::commit_in_engines(tc_log.cc:148)
trx_coordinator::commit_in_engines(tc_log.cc:138)
(anonymous(binlog.cc:12576)
MYSQL_BIN_LOG::process_commit_stage_queue(binlog.cc:9096)
MYSQL_BIN_LOG::ordered_commit(binlog.cc:9650)
MYSQL_BIN_LOG::commit(binlog.cc:8841)
ha_commit_trans(handler.cc:1878)
trans_commit(transaction.cc:271)
Xid_log_event::do_commit(log_event.cc:6272)
Xid_apply_log_event::do_apply_event_worker(log_event.cc:6367)
slave_worker_exec_job_group(rpl_rli_pdb.cc:2517)
handle_slave_worker(rpl_replica.cc:6289)
pfs_spawn_thread(pfs.cc:3043)
start_thread(libpthread.so.0)
clone(libc.so.6)

通过检索 stracktrace,我们可以确认 leader 线程竟然是 thd1

对于当前问题的总结

  • thd1 通过 COND_wsrep_group_commit 等待 thd2
  • thd2 通过 m_stage_cond_binlog 等待 thd1

形成了一个环形死锁.

重回 thd1

当我们仔细检查 backtrace 时,发现一个奇怪的现象:

  • 在 frame15 中 thd 是 0x7fb08800d430
  • 在 frame16 中 thd 是 0x7fb08003dcd0
代码语言:javascript
代码运行次数:0
运行
复制
#15 0x0000000001e0e2e8 in (anonymous namespace)::finish_transaction_in_engines (thd=thd@entry=0x7fb08800d430, all=<optimized out>, run_after_commit=false)
    at binlog.cc:12576
#16 0x0000000001e18276 in MYSQL_BIN_LOG::process_commit_stage_queue (this=0x43d5b00 <mysql_bin_log>, thd=0x7fb08003dcd0, first=0x7fb08003dcd0)
    at binlog.cc:9096

简化后的 process_commit_stage_queue 代码为:

代码语言:javascript
代码运行次数:0
运行
复制
  for (THD *head = first; head; head = head->next_to_commit) {
    Thd_backup_and_restore switch_thd(thd, head);
...
    ::finish_transaction_in_engines(head, all, false);

...
}

我们通过 next_to_commit 这个指针可以遍历 group commit 中所有的 thd

代码语言:javascript
代码运行次数:0
运行
复制
(gdb) p first
$6 = (THD *) 0x7fb08003dcd0

后文中以 thd3 代称。

代码语言:javascript
代码运行次数:0
运行
复制
(gdb) p first->next_to_commit
$7 = (THD *) 0x7fb08c0298c0
(gdb) p first->next_to_commit->next_to_commit
$8 = (THD *) 0x7fb08800d430
(gdb) p first->next_to_commit->next_to_commit->next_to_commit
$9 = (THD *) 0x0

我们看到有三个 thd 并且 thd2 也在组中。

我们仔细观察 Thd_backup_and_restore class,可以发现它对 thd 做了一些操作。

代码语言:javascript
代码运行次数:0
运行
复制
  /**
    Try to attach the POSIX thread to a session.

    @param[in] backup_thd    The thd to restore to when object is destructed.
    @param[in] new_thd       The thd to attach to.
   */

  Thd_backup_and_restore(THD *backup_thd, THD *new_thd)
      : m_backup_thd(backup_thd),
        m_new_thd(new_thd),
        m_new_thd_old_real_id(new_thd->real_id),
        m_new_thd_old_thread_stack(new_thd->thread_stack) {
    assert(m_backup_thd != nullptr && m_new_thd != nullptr);
    // Reset the state of the current thd.
    m_backup_thd->restore_globals();

    m_new_thd->thread_stack = m_backup_thd->thread_stack;
    m_new_thd->store_globals();

THD::store_globals 这里切换了 thd 对应的 Pthread ID。

代码语言:javascript
代码运行次数:0
运行
复制
  real_id = my_thread_self();

我们对出现过的 thd 重新做一次梳理。

number

THD pointer

Pthread ID (hex)

MySQL connection id

GDB internal number

1

0x7fb08800d430

0x7fb0586d9700

43

49

2

0x7fb08c0298c0

0x7fb0585d7700

42

50

3

0x7fb08003dcd0

0x0

41

50

4

0x7fb08800d430

0x7fb0584d5700

43

51

我们真正的 leader 线程其实是 thd3

leader 线程通过 finish_transaction_in_engines 去处理每一个 thd

顺序为:

代码语言:javascript
代码运行次数:0
运行
复制
thd3 -> thd2 -> thd1

问题 4: finish_transaction_in_engines 做了些什么?

分析真个函数可能会太过于耗费时间,我们更关心的是它和 wsrep_group_commit_queue 之间的关系。

我们可以在 finish_transaction_in_engines 添加一个断点并且执行一个 DML query。

  1. error log
代码语言:javascript
代码运行次数:0
运行
复制
2025-07-14T10:12:45.798369Z 13 [Note] [MY-000000] [WSREP] Registering thread with id (13) in wsrep group commit queue

可以看到注册 thdwsrep_group_commit_queue

  1. breakpoint
代码语言:javascript
代码运行次数:0
运行
复制
Thread 45 "connection" hit Breakpoint 3, (anonymous namespace)::finish_transaction_in_engines (
    thd=thd@entry=0x7ff743326ca0, all=false, run_after_commit=false)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/unique_ptr.h:421
421     /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/unique_ptr.h: No such file or directory.
(gdb) finish
Run till exit from #0  (anonymous namespace)::finish_transaction_in_engines (
    thd=thd@entry=0x7ff743326ca0, all=false, run_after_commit=false)
    at /opt/rh/devtoolset-11/root/usr/include/c++/11/bits/unique_ptr.h:421
MYSQL_BIN_LOG::process_commit_stage_queue (this=0x43d5b00 <mysql_bin_log>, thd=0x7ff743326ca0,
    first=0x7ff743326ca0)
    at /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/sql/binlog.cc:9099

暂停到 finish_transaction_in_engines 执行结束时。

  1. error log
代码语言:javascript
代码运行次数:0
运行
复制
2025-07-14T10:12:51.423272Z 13 [Note] [MY-000000] [WSREP] Updating WSREPXid: 8322eb1e-5acf-11f0-8691-e63683345e0b:822
2025-07-14T10:12:51.423625Z 13 [Note] [MY-000000] [WSREP] Thread with id (13) granted turn to proceed
2025-07-14T10:12:51.423694Z 13 [Note] [MY-000000] [WSREP] Un-Registering thread with id (13) from wsrep group commit queue

可以看到 wsrep_group_commit_queue 移除了 thd

通过测试我们发现:

  • finish_transaction_in_engines 调用 wsrep_unregister_from_group_commit 去移除 thd

对于当前问题的总结 2

  • thd3wsrep_group_commit_queue 中的 thd3, thd2 移除
  • thd1 通过 COND_wsrep_group_commit 等待 thd2, 因为thd2 依然在 wsrep_group_commit_queue
  • thd2 通过 m_stage_cond_binlog 等待 thd3

那么显然 thd3 并没有成功的将 thd2 移除。

问题 5: 为什么 thd2 依然在 wsrep_group_commit_queue 中?

当我们将目光投降 thd2 时,我们发现它的 MySQL connection ID 刚好是客户指出的空 binlog event 对应的线程!

空 binlog event

  • thd2
代码语言:javascript
代码运行次数:0
运行
复制
(gdb) p ((THD*) 0x7fb08c0298c0)->m_thread_id
$9 = 42
  • binlog
代码语言:javascript
代码运行次数:0
运行
复制
#250706 21:50:27 server id 30001  end_log_pos 40510 CRC32 0x32b78ce8    Query   thread_id=42    exec_time=0     error_code=0
  • 通常 UPDATE 更改了数据,但是存储引擎层已经是当前数据库,MySQL是不会真正修改数据的,例如:
代码语言:javascript
代码运行次数:0
运行
复制
node1 [localhost:10001] {msandbox} (test) > update t set d = d where id = 6;
Query OK, 0 rows affected (0.01 sec)
Rows matched: 1  Changed: 0  Warnings: 0

想到 thd2replication 的 SQL thread,我们做了以中尝试。

在 PXC1 中先修改一条数据,然后在 PXC1 的 Primary 中修改同样的结果并让 PXC1 应用。

  • PXC1
代码语言:javascript
代码运行次数:0
运行
复制
node1 [localhost:10001] {msandbox} (test) > update t set d = 1 where id = 1;
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0
  • master
代码语言:javascript
代码运行次数:0
运行
复制
master [localhost:30001] {msandbox} (test) > update t set d = 1 where id = 1;
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0

我们得到了一个空的 binlog event。

代码语言:javascript
代码运行次数:0
运行
复制
# at 2011
#250714  5:06:54 server id 30001  end_log_pos 2095 CRC32 0xc2571d3f     GTID    last_committed=7        sequence_number=8       rbr_only=no original_committed_timestamp=1752494814375749   immediate_commit_timestamp=1752494814431609     transaction_length=221
# original_commit_timestamp=1752494814375749 (2025-07-14 05:06:54.375749 PDT)
# immediate_commit_timestamp=1752494814431609 (2025-07-14 05:06:54.431609 PDT)
/*!80001 SET @@session.original_commit_timestamp=1752494814375749*//*!*/;
/*!80014 SET @@session.original_server_version=80033*//*!*/;
/*!80014 SET @@session.immediate_server_version=80033*//*!*/;
SET @@SESSION.GTID_NEXT= '00030001-1111-1111-1111-111111111111:6'/*!*/;
# at 2095
#250714  5:06:54 server id 30001  end_log_pos 2163 CRC32 0x4a03e19e     Query   thread_id=11    exec_time=0     error_code=0
SET TIMESTAMP=1752494814/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
BEGIN
/*!*/;
# at 2163
#250714  5:06:54 server id 30001  end_log_pos 2232 CRC32 0x71e9ad2e     Query   thread_id=11    exec_time=0     error_code=0
SET TIMESTAMP=1752494814/*!*/;
COMMIT
/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

通过 GDB 添加日志

代码语言:javascript
代码运行次数:0
运行
复制
b wsrep_register_for_group_commit
commands
silent
printf"Id %d [ Xid %d ] register\n", thd->m_thread_id, thd->m_wsrep_client_state.transaction_.ws_meta_.gtid_.seqno_.seqno_
cont
end

b wsrep_unregister_from_group_commit
commands
silent
printf"Id %d [ Xid %d ] leaves\n", thd->m_thread_id, thd->m_wsrep_client_state.transaction_.ws_meta_.gtid_.seqno_.seqno_
cont
end

b wsrep_wait_for_turn_in_group_commit
commands
silent
printf"Id %d [ Xid %d ] wait\n", thd->m_thread_id, thd->m_wsrep_client_state.transaction_.ws_meta_.gtid_.seqno_.seqno_
cont
end


b innobase_commit
commands
silent
printf"Id %d [ Xid %d ] innobase_commit\n", thd->m_thread_id, thd->m_wsrep_client_state.transaction_.ws_meta_.gtid_.seqno_.seqno_
cont
end


b ha_commit_low
commands
silent
printf"Id %d [ Xid %d ] enters ha_commit_low\n", thd->m_thread_id, thd->m_wsrep_client_state.transaction_.ws_meta_.gtid_.seqno_.seqno_
cont
end

b sql/handler.cc:2012
commands
silent
printf"Id %d [ Xid %d ] ha_list is not null \n", thd->m_thread_id, thd->m_wsrep_client_state.transaction_.ws_meta_.gtid_.seqno_.seqno_
cont
end
  • log
代码语言:javascript
代码运行次数:0
运行
复制
Id 21 [ Xid 4302 ] register
Id 15 [ Xid 4303 ] register
Id 16 [ Xid 4304 ] register
Id 17 [ Xid 4305 ] register
Id 21 [ Xid 4302 ] enters ha_commit_low
Id 18 [ Xid 4306 ] register
Id 21 [ Xid 4302 ] ha_list is not null
Id 21 [ Xid 4302 ] innobase_commit
Id 21 [ Xid 4302 ] wait
Id 21 [ Xid 4302 ] leaves
Id 15 [ Xid 4303 ] enters ha_commit_low
Id 16 [ Xid 4304 ] enters ha_commit_low
Id 16 [ Xid 4304 ] ha_list is not null
Id 16 [ Xid 4304 ] innobase_commit
Id 16 [ Xid 4304 ] wait
Id 22 [ Xid 4307 ] register

通过添加断点打印日志我们发现:

有问题的 thd (Id 15) 没有调用 ht->commit (对于 InnoDB 表是 innobase_commit) 导致 thd 还留在了 wsrep_group_commit_queue 中。

reproduce

  • 部署一个 PXC 一个 Replication 集群。
代码语言:javascript
代码运行次数:0
运行
复制
dbdeployer deploy --topology=pxc replication --base-port=10000 --base-server-id=10000 pxc8.0.33
dbdeployer deploy --gtid replication --base-port=30000 --base-server-id=30000 pxc8.0.33
  • 将 PXC 作为 Replication 集群的一个 Replica。
代码语言:javascript
代码运行次数:0
运行
复制
node1_port=10001   # the port of the node 1 in the PXC
master_port=30001  # the port of the master node in the replication

alias node1_cmd="mysql --prompt='(node1)[db: \d] ' -A -umsandbox -pmsandbox -h127.0.0.1 -P${node1_port}"
alias master_cmd="mysql --prompt='(master)[db: \d] ' -A -umsandbox -pmsandbox -h127.0.0.1 -P${master_port}"

cat << EOF | node1_cmd
CHANGE MASTER TO master_host="127.0.0.1",
MASTER_PORT=${master_port},
MASTER_USER="rsandbox",
MASTER_PASSWORD="rsandbox",
MASTER_AUTO_POSITION = 1;
start slave;
EOF
  • 创建数据
代码语言:javascript
代码运行次数:0
运行
复制
master_cmd -BNe "create table test.t (id int auto_increment, d int, primary key (id));insert into test.t (id,d) values (1,0);"
  • 创建空事务
代码语言:javascript
代码运行次数:0
运行
复制
for i in {1..1000};
do
    node1_cmd  -BNe "update test.t set d = d+1 where id = 1;" &> /dev/null
    master_cmd -BNe "update test.t set d = d+1 where id = 1;" &> /dev/null
done 
  • 在 PXC1 集群中运行 sysbench
代码语言:javascript
代码运行次数:0
运行
复制
sysbench oltp_insert --threads=10 --time=600 --tables=1 --table-size=1000 \
--db-driver=mysql --mysql-db=test --mysql-storage-engine=innodb --report-interval=10 \
--mysql-user=msandbox --mysql-password=msandbox --mysql-host=127.0.0.1 --mysql-port=${node1_port} \
prepare
sysbench oltp_insert --threads=10 --time=600 --tables=1 --table-size=1000 \
--db-driver=mysql --mysql-db=test --mysql-storage-engine=innodb --report-interval=10 \
--mysql-user=msandbox --mysql-password=msandbox --mysql-host=127.0.0.1 --mysql-port=${node1_port} \
run
  • sysbench log 显示 tps 跌倒了 0
代码语言:javascript
代码运行次数:0
运行
复制
Initializing worker threads...

Threads started!

[ 10s ] thds: 10 tps: 42.77 qps: 42.77 (r/w/o: 0.00/42.77/0.00) lat (ms,95%): 623.33 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 10 tps: 37.00 qps: 37.00 (r/w/o: 0.00/37.00/0.00) lat (ms,95%): 549.52 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 10 tps: 23.40 qps: 23.40 (r/w/o: 0.00/23.40/0.00) lat (ms,95%): 634.66 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 10 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 10 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
  • innodb status 显示

SEMAPHORES

代码语言:javascript
代码运行次数:0
运行
复制
--Thread 140617271166720 has waited at trx0rseg.ic line 51 for 13 seconds the semaphore:
X-lock on RW-latch at 0x7fe40fd3cdf8 created in file buf0buf.cc line 815
a writer (thread id 140617197684480) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/storage/innobase/include/trx0rseg.ic line 51
--Thread 140617724176128 has waited at buf0flu.cc line 1376 for 10 seconds the semaphore:
SX-lock on RW-latch at 0x7fe40fd3d0d8 created in file buf0buf.cc line 815
a writer (thread id 140617197684480) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /mnt/jenkins/workspace/pxc80-autobuild-RELEASE/test/Percona-XtraDB-Cluster-8.0.33-25/storage/innobase/include/trx0undo.ic line 126

TRANSACTIONS

代码语言:javascript
代码运行次数:0
运行
复制
---TRANSACTION 4417, ACTIVE (PREPARED) 14 sec committing
mysql tables in use 1, locked 1
1 lock struct(s), heap size 1128, 0 row lock(s), undo log entries 1
MySQL thread id 70, OS thread handle 140617197684480, query id 1411 localhost 127.0.0.1 msandbox innobase_commit_low (1218)
INSERT INTO sbtest1 (id, k, c, pad) VALUES (0, 504, '57904754910-88624042600-21941801151-17844023199-16047046883-99245408053-22028087049-26597707866-59381249459-19610307722', '33663561666-47879788257-03701217268-04391412711-14170077666')
--------

复现条件

  1. SQL thread 更新一条数据,但更改后的值和当前表中值相同。
  2. 当前组提交中在 SQL thread 后面至少还有一个 thd。

回顾

  • 两个集群通过 MySQL 原生的 复制 同步数据
  • 应用会就近访问 本地机房的节点 读写数据
  • PXC 中多个节点经常在业务繁忙的时候 Hang 住

当我们复现了 bug 后,再回看客户给出的描述就会发现,客户说的情况完全匹配了我们的复现条件。

其中

  • 复制 关键词对应的是 复现条件中 SQL thread
  • 本地机房 关键词对应的是 更改了同一条数据 导致值可能相同。
  • 业务繁忙 关键词对应的是 组提交中后面还有 thd

最后想说的话:所以,在复现一些 bug 时,理解数据库的使用场景会很好的帮助我们复现问题。

相关记录

  • https://perconadev.atlassian.net/browse/PXC-4390
  • https://perconadev.atlassian.net/browse/PXC-4318

参考资料

[1]

PXC: https://www.percona.com/resources/datasheets/percona-xtradb-cluster

[2]

std::queue: https://en.cppreference.com/w/cpp/container/queue.html

本文关键字:#MySQL #PXC

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2025-07-16,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 爱可生开源社区 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 现象描述
  • 错误日志分析
    • binary log
  • 分析 corefile
    • pthread ID 140395374548736
    • backtrace
    • 条件变量 COND_wsrep_group_commit
      • 等待
    • 唤醒
    • wsrep_group_commit_queue
  • 问题 1: 谁是 wsrep_group_commit_queue 中的第一个元素
  • 问题 2: thd2 在做什么呢?
  • 问题 3: 谁是 leader?
    • 对于当前问题的总结
    • 重回 thd1
  • 问题 4: finish_transaction_in_engines 做了些什么?
    • 对于当前问题的总结 2
  • 问题 5: 为什么 thd2 依然在 wsrep_group_commit_queue 中?
    • 空 binlog event
    • 通过 GDB 添加日志
  • reproduce
    • 复现条件
  • 回顾
  • 相关记录
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档