生产环境,假如不小心有个Waiting for global read lock 锁出现(例如flink cdc全量抽取数据),很容易造成生产事故。
对于这种情况,我们首先需要在流程上把控:
1、不要对外开授权过大的账号
2、没有写入需求的业务,原则上只允许去连接从库
有个流程后,还需要有个巡检和自动化处理机制,作为兜底。
Waiting for global read lock 的案例:
会话1:
[test]> flush table with read lock;
Query OK, 0 rows affected (0.20 sec)
[test]> select * from tb2;
+------+--------------+
| a | b |
+------+--------------+
| 11 | GⅩ10000249 |
| 22 | GX10000249 |
| 33 | X |
| 44 | x |
+------+--------------+
4 rows in set (0.00 sec)
会话2:
[test]> select * from tb2 limit 3; -- 查询可以正常输出结果
+------+--------------+
| a | b |
+------+--------------+
| 11 | GⅩ10000249 |
| 22 | GX10000249 |
| 33 | X |
+------+--------------+
3 rows in set (0.01 sec)
[test]> delete from tb2 ; -- 可以看到删除操作被阻塞
会话3:
[test]> show full processlist;
+-----+------+----------------------+------+---------+------+------------------------------+-----------------------+---------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Time_ms | Rows_sent | Rows_examined |
+-----+------+----------------------+------+---------+------+------------------------------+-----------------------+---------+-----------+---------------+
| 135 | dts | 192.168.31.181:52902 | test | Query | 517 | Waiting for global read lock | delete from tb2 | 517124 | 3 | 3 |
| 136 | root | localhost | test | Sleep | 27 | | NULL | 27186 | 4 | 4 |
| 139 | root | localhost | test | Query | 0 | executing | show full processlist | 0 | 1 | 1 |
+-----+------+----------------------+------+---------+------+------------------------------+-----------------------+---------+-----------+---------------+
3 rows in set (0.00 sec)
从上面的会话3上,很难判断出是哪个会话触发的。
这种情况下,我们可以去查询 performance_schema.events_statements_history 里面的数据。
select
a.PROCESSLIST_ID,a.PROCESSLIST_USER,a.PROCESSLIST_HOST,a.PROCESSLIST_DB,a.PROCESSLIST_COMMAND,a.PROCESSLIST_TIME,a.PROCESSLIST_STATE,a.PROCESSLIST_INFO,a.THREAD_OS_ID,
b.SQL_TEXT,b.DIGEST_TEXT
from performance_schema.threads a inner join performance_schema.events_statements_history b
on a.THREAD_ID = b.THREAD_ID
where b.SQL_TEXT like 'FLUSH TABLE WITH READ LOCK' \G
结果如下:
*************************** 1. row ***************************
PROCESSLIST_ID: 142
PROCESSLIST_USER: root
PROCESSLIST_HOST: localhost
PROCESSLIST_DB: test
PROCESSLIST_COMMAND: Sleep
PROCESSLIST_TIME: 168
PROCESSLIST_STATE: NULL
PROCESSLIST_INFO: NULL
THREAD_OS_ID: 98450
SQL_TEXT: flush table with read lock
DIGEST_TEXT: FLUSH TABLE WITH READ LOCK
1 row in set (0.00 sec)
使用kill命令,将上面的142会话杀掉即可(如果有多个,就执行多次的kill操作)
如下图我在多个会话窗口中都执行了ftwrl命令,都可以被识别到。
另外,如果是其他类型锁语句,可以改下条件:
例如:对于 lock table sbtest1 for read 这种锁表语句,
使用SQL:
select
a.PROCESSLIST_ID,a.PROCESSLIST_USER,a.PROCESSLIST_HOST,a.PROCESSLIST_DB,a.PROCESSLIST_COMMAND,a.PROCESSLIST_TIME,a.PROCESSLIST_STATE,a.PROCESSLIST_INFO,a.THREAD_OS_ID,
b.SQL_TEXT,b.DIGEST_TEXT
from performance_schema.threads a inner join performance_schema.events_statements_history b
on a.THREAD_ID = b.THREAD_ID
where b.SQL_TEXT like 'LOCK TABLE%' \G
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。