I have a MySQL Cluster with 4 API nodes, 2 management nodes, and 4 data nodes. Today I ran into problems connecting to the database: every query hung in the "Opening tables" state. After checking the logs, I found these errors:
API node errors:
2015-08-20 19:44:14 15540 [Note] NDB Schema dist: Data node: 5 failed, subscriber bitmask 00
2015-08-20 19:44:14 15540 [Note] NDB Schema dist: Data node: 6 failed, subscriber bitmask 00
2015-08-20 19:44:14 15540 [Note] NDB Schema dist: Data node: 7 failed, subscriber bitmask 00
2015-08-20 19:44:14 15540 [Note] NDB Schema dist: Data node: 8 failed, subscriber bitmask 00
2015-08-20 19:44:14 15540 [Note] NDB Schema dist: cluster failure at epoch 3313124/17.
2015-08-20 19:44:14 15540 [Note] NDB Binlog: ndb tables initially read only on reconnect.
2015-08-20 19:44:14 15540 [ERROR] /opt/mysql/server-5.6/bin/mysqld: Got temporary error 4028 'Node failure caused abort of transaction' from NDBCLUSTER
2015-08-20 19:44:14 15540 [ERROR] /opt/mysql/server-5.6/bin/mysqld: Sort aborted: Got temporary error 4028 'Node failure caused abort of transaction' from NDBCLUSTER
2015-08-20 19:44:14 15540 [ERROR] Got error 4010 when reading table './database_name/table'
2015-08-20 19:44:14 15540 [Note] NDB Binlog: cluster failure for ./database_name/table_name at epoch 3313124/17.
mysql> show processlist;
Id  User         Host  db    Command  Time  State                            Info
1   system user        NULL  Daemon   1497  Waiting for ndbcluster to start  NULL
Data node errors:
2015-08-20 19:44:14 [ndbd] ERROR -- c_gcp_list.seize() failed: gci: 14229759227592721 nodes: 0000000000000000000000000000040000000000000000000000000000001a00
2015-08-20 19:44:14 [ndbd] WARNING -- ACK wo/ gcp record (gci: 3313124/17) ref: 0fa2000b from: 0fa2000b
2015-08-20 19:44:14 [ndbd] WARNING -- ACK wo/ gcp record (gci: 3313124/17) ref: 0fa2000c from: 0fa2000c
2015-08-20 19:44:14 [ndbd] WARNING -- ACK wo/ gcp record (gci: 3313124/17) ref: 0fa2008a from: 0fa2008a
Management node errors:
2015-08-20 19:44:14 [MgmtSrvr] INFO -- Node 5: Disconnecting lagging nodes '0000000000000000000000000000000000000000000000000000000000000200',
2015-08-20 19:44:14 [MgmtSrvr] WARNING -- Node 5: Disconnecting node 9 because it has exceeded MaxBufferedEpochs (100 > 100), epoch 3313119/4
Data node configuration:
https://gist.github.com/sdemircan/730fa49fcc14b4376c42
API node configuration:
https://gist.github.com/sdemircan/f9d230d32700b86564fd
Management node configuration:
https://gist.github.com/sdemircan/d6fbd54799daaae01bf2
API node log:
https://gist.github.com/sdemircan/2d62b1c92176de9de9d3
Data node log:
https://gist.github.com/sdemircan/d0c97b82457a9c33deaa
Data node log:
https://gist.github.com/sdemircan/3faa1e41367bc7655210
Management node log:
https://gist.github.com/sdemircan/a026ac57757fafdafaa9
What could cause MaxBufferedEpochs to be exceeded?
Posted on 2015-10-12 05:20:12
You probably had a large transaction: a query that retrieved many rows and pulled a lot of data, saturating node 9's network connection.
The API node's NDB tables are read-only on reconnect, so the query must have been executed before the reconnect:
NDB Binlog: ndb tables initially read only on reconnect
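If a runaway transaction like the one described above is suspected, the `ndbinfo` schema (available from MySQL Cluster 7.1) can be queried from any API node to see what is in flight. This is only a sketch: the column names follow the `ndbinfo.cluster_transactions` documentation, and the `LIMIT` is an arbitrary choice.

```sql
-- List transactions currently running across the cluster, the ones with
-- the most outstanding operations first, to spot an unusually large
-- transaction before it can make a subscriber lag behind.
SELECT node_id, transid, state, count_operations, outstanding_operations
FROM ndbinfo.cluster_transactions
ORDER BY outstanding_operations DESC
LIMIT 10;
```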
https://dba.stackexchange.com/questions/112041
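If the slow consumer cannot be sped up, one workaround (a mitigation, not a root-cause fix) is to raise `MaxBufferedEpochs` in the `[ndbd default]` section of the management node's `config.ini`, followed by a rolling restart. 100 is the documented default; the 200 below is only an illustrative value, since larger values consume more buffer memory on the data nodes.

```ini
[ndbd default]
# Default is 100 epochs. Raising it gives a lagging subscriber (like
# node 9 here) more headroom before it is disconnected, at the cost of
# buffering more unacknowledged epochs in memory.
MaxBufferedEpochs=200
```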