我们经常会碰到这样的情况,某个事务执行完了未提交,后续再来一个DDL和DML操作,导致后面的session要么处于waiting for metadata lock,要么是锁等待超时。这时我们往往只能找到这个未提交的事务的事务id和session id,但是一般都处于sleep状态,不好分析事务内容到底是什么,所以通常都是粗鲁地kill这个session后解决问题,但是应用层的研发人员往往找不到到底是哪个事务引起的,后面再出现问题时还要重复kill。
对于一个执行完但未提交的事务,无法在show processlist的输出中找到该信息:
-- session 1
mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)
mysql> use test;
Database changed
mysql> create table t1(a int);
Query OK, 0 rows affected (0.45 sec)
mysql> insert into t1 values (1);
Query OK, 1 row affected (0.00 sec)
mysql> select connection_id() from dual;
+-----------------+
| connection_id() |
+-----------------+
| 6 |
+-----------------+
1 row in set (0.00 sec)
-- session 2
mysql> show processlist;
+----+------+-----------+------+---------+------+-------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+------+---------+------+-------+------------------+
| 6 | root | localhost | test | Sleep | 44 | | NULL |
| 7 | root | localhost | NULL | Query | 0 | init | show processlist |
+----+------+-----------+------+---------+------+-------+------------------+
2 rows in set (0.00 sec)
可以看到,未提交事务的Command为Sleep,State为空,Info为NULL。
同样,information_schema.innodb_trx.trx_query也为NULL,无法提供未提交事务的SQL语句:
mysql> select * from information_schema.innodb_trx\G
*************************** 1. row ***************************
trx_id: 4632
trx_state: RUNNING
trx_started: 2020-03-28 07:18:32
trx_requested_lock_id: NULL
trx_wait_started: NULL
trx_weight: 2
trx_mysql_thread_id: 6
trx_query: NULL
trx_operation_state: NULL
trx_tables_in_use: 0
trx_tables_locked: 0
trx_lock_structs: 1
trx_lock_memory_bytes: 376
trx_rows_locked: 0
trx_rows_modified: 1
trx_concurrency_tickets: 0
trx_isolation_level: REPEATABLE READ
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 10000
trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.00 sec)
通过查看performance_schema.events_statements_current表可看到每一个session正在执行的sql,哪怕它依旧执行完成了,只是没有提交:
mysql> select * from performance_schema.events_statements_current where sql_text not like 'select * from performance_schema.events_statements_current%'\G
*************************** 1. row ***************************
THREAD_ID: 25
EVENT_ID: 9
END_EVENT_ID: 9
EVENT_NAME: statement/sql/select
SOURCE: mysqld.cc:937
TIMER_START: 1897350780117784000
TIMER_END: 1897350780310674000
TIMER_WAIT: 192890000
LOCK_TIME: 0
SQL_TEXT: select connection_id() from dual
DIGEST: e5d97521478adc05b74560b51be5d6f7
DIGEST_TEXT: SELECT `connection_id` ( ) FROM DUAL
CURRENT_SCHEMA: test
OBJECT_TYPE: NULL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: NULL
MYSQL_ERRNO: 0
RETURNED_SQLSTATE: NULL
MESSAGE_TEXT: NULL
ERRORS: 0
WARNINGS: 0
ROWS_AFFECTED: 0
ROWS_SENT: 1
ROWS_EXAMINED: 0
CREATED_TMP_DISK_TABLES: 0
CREATED_TMP_TABLES: 0
SELECT_FULL_JOIN: 0
SELECT_FULL_RANGE_JOIN: 0
SELECT_RANGE: 0
SELECT_RANGE_CHECK: 0
SELECT_SCAN: 0
SORT_MERGE_PASSES: 0
SORT_RANGE: 0
SORT_ROWS: 0
SORT_SCAN: 0
NO_INDEX_USED: 0
NO_GOOD_INDEX_USED: 0
NESTING_EVENT_ID: NULL
NESTING_EVENT_TYPE: NULL
1 row in set (0.00 sec)
该方案有一个缺陷:一个事务可能有一组sql组成,这个方法只能看到事务最后执行的是什么SQL,无法看到全部。可以通过performance_schema.threads表来关联,将information_schema.processlist和performance_schema.events_statements_current一一对应起来:
mysql> select t1.id, t2.thread_id, t3.sql_text
-> from information_schema.processlist t1,
-> performance_schema.threads t2,
-> performance_schema.events_statements_current t3
-> where t1.id=6
-> and t1.id=t2.processlist_id
-> and t2.thread_id = t3.thread_id\G
*************************** 1. row ***************************
id: 6
thread_id: 25
sql_text: select connection_id() from dual
1 row in set (0.00 sec)