我们知道对于Oracle的表连接,根据SQL连接条件主要支持如下三种连接方法(算法):
- 嵌套循环连接(Nested Loops Joins)
- 哈希连接(Hash Joins)
- 排序合并连接(Sort Merge Joins)
对于MySQL而言,支持的连接算法主要包括如下两种:
- 嵌套循环连接(Nested Loops Joins)
- 哈希连接(Hash Joins)
对于嵌套循环连接(Nested Loops Joins),又可以分为:
- 简单嵌套循环连接(Simple Nested-Loop Join Algorithm)
- 块嵌套循环连接(Block Nested-Loop Join Algorithm,BNL)
- 批量键值访问连接(Batched Key Access Joins,BKA)
对于进行嵌套循环连接的两个表,可以分别称为外部表(驱动表)和内部表。
进行简单嵌套循环连接(Simple Nested-Loop Join Algorithm)时候,会读取外部表(驱动表)中的一条记录,然后根据连接条件扫描内部表,反复循环,直到遍历完驱动表所有满足谓词条件的记录。
例:对于如下t1、t2、t3三个表的连接来说,
Table Join Type
t1 range
t2 ref
t3 ALL
简单嵌套循环连接算法的伪代码如下:
for each row in t1 matching range {
for each row in t2 matching reference key {
for each row in t3 {
if row satisfies join conditions, send to client
}
}
}
简单嵌套循环连接需要反复读取多次内部表(扫描内部表次数相当于驱动表中所有满足谓词条件的记录)。
块嵌套循环连接对这种连接算法进行了优化,在读取驱动表(外部表)时,一次性缓存多条驱动表的记录到 Join Buffer,然后拿Join Buffer中的记录批量与内层循环读取的记录进行匹配。BNL一般用于内连接。
通过块嵌套循环连接可以大大降低对内部表的扫描次数。
对于前面例中t1、t2、t3三个表的连接来说,
Table Join Type
t1 range
t2 ref
t3 ALL
块嵌套循环连接算法的伪代码如下:
for each row in t1 matching range {
for each row in t2 matching reference key {
store used columns from t1, t2 in join buffer
if buffer is full {
for each row in t3 {
for each t1, t2 combination in join buffer {
if row satisfies join conditions, send to client
}
}
empty join buffer
}
}
}
if buffer is not empty {
for each row in t3 {
for each t1, t2 combination in join buffer {
if row satisfies join conditions, send to client
}
}
}
参考: Block Nested-Loop Join Algorithm https://dev.mysql.com/doc/refman/8.0/en/nested-loop-joins.html
每个进程的连接缓冲区(join buffer)的大小由系统环境变量join_buffer_size控制。
Command-Line Format | –join-buffer-size=# |
---|---|
System Variable | join_buffer_size |
Scope | Global, Session |
Dynamic | Yes |
SET_VAR Hint Applies | Yes |
Type | Integer |
Default Value | 262144 |
Minimum Value | 128 |
Maximum Value (Windows) | 4294967168 |
Maximum Value (Other, 64-bit platforms) | 18446744073709551488 |
Maximum Value (Other, 32-bit platforms) | 4294967168 |
Unit | bytes |
Block Size | 128 |
join_buffer_size是用于控制普通索引扫描、范围索引扫描和不使用索引的连接(全表扫描)的缓冲区的最小大小。 MySQL 8.0.18及更高版本中,join_buffer_size变量还用于控制哈希连接使用的内存量。
使用块嵌套循环(BNL)时,较大的连接缓冲区意味着可以将驱动表(外部表)的所有行都存储在连接缓冲区中; 使用块嵌套循环(BNL)时,较大的连接缓冲区意味着对连接操作的右侧表进行的顺序访问就越多。 因此,增加join_buffer_size的大小在某些情况下可以显着提高性能。
但是,增加join_buffer_siz意味着增大进程的内存缓冲区大小,如果全局设置的比较大,可能导致内存分配时间时间长,进而导致性能大幅下降。所以建议全局设置保持较小,仅在执行大型连接的会话中将会话级别的值设置为较大值(或者使用/*+ SET_VAR(join_buffer_size= XX) */提示针对个别SQL设置较大值)。
参考: join_buffer_size https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_join_buffer_size
随着MySQL数据库的演进,MySQL对块嵌套循环(BNL)连接算法进行了扩展,扩展后的块嵌套循环(BNL)连接算法,不仅可以用于内连接,还可以用于外连接、半连接和嵌套外连接。
批量键值访问连接(Batched Key Access Joins,BKA)和BNL类似,将驱动表(外部表)的行/结果集存入连接缓冲区(join buffer),然后根据buffer中的数据批量地与内表的数据进行匹配,进而减少内层循环的扫描次数。
批量键值访问连接(BKA)时,可以通过索引访问内部表(第二个表)。 BKA可以用于内连接(inner join)、外连接(outer join)、半连接(semijoin )以及嵌套外连接(nested outer joins)。
批量键值访问连接(Batched Key Access Joins,BKA)的流程可以简要地概括为以下几个步骤:
参考: Batched Key Access Joins https://dev.mysql.com/doc/refman/8.0/en/bnl-bka-optimization.html#bka-optimization
Block Nested-Loop Join Algorithm https://dev.mysql.com/doc/refman/8.0/en/nested-loop-joins.html
MySQl MRR(Multi-Range Read)优化特性基本过程如下:
- 进行范围扫描(Range Scans)的时候,MySQL首先只扫描索引获得索引元组(index tuples),并收集相关行的键值(主键Row Id)。
- 根据键值(Row Id) 对索引元组(index tuples)排序,将排序结果存储到每个会话的内存缓存中(read_rnd_buffer_size 定义大小,默认256K)。
- 根据键值(primary key)顺序从基表中返回数据(回表)
通过MRR可以减少随机磁盘读的次数,实现对基本表数据的更有序的扫描。
参考: 8.2.1.11 Multi-Range Read Optimization https://dev.mysql.com/doc/refman/8.0/en/mrr-optimization.html
运行SQL时,可以使用EXPLAIN来查看MySQL优化器执行查询的计划,当一个表在查询执行计划中出现 “Using join buffer (Batched Key Access)” 这个提示,且该表的 type 列的值为 ref 或 eq_ref 时,就意味着该表使用了 BKA 算法。 BKA 算法可以有效地优化大表关联查询的性能,减少磁盘 I/O 和内存占用,提高查询速度。
MySQL 8.0.18以后的版本中,MySQL可以用哈希连接算法(hash join algorithm)来进行表连接操作。 哈希连接通常要比嵌套循环连接更有效,特别是如果内存可以容纳其中一个表的情况下更加高效。
哈希连接算法(hash join algorithm)将连接操作分为两个阶段:构建哈希表和扫描哈希表。 在构建哈希表阶段,MySQL将连接操作的第一个表插入到哈希表中,其中哈希表的键是连接操作的连接列。 在扫描哈希表阶段,MySQL将连接操作的第二个表的每一行与哈希表中的相应行进行比较,如果它们的连接列匹配,则将它们作为连接操作的结果返回。
例如以下查询作为示例:
SELECT *
FROM t1
JOIN t2 ON t1.column1 = t2.column2;
在执行此查询时,MySQL将使用Hash Join算法来执行连接操作。
例:
--创建测试表
mysql> CREATE TABLE t1 ( id INT PRIMARY KEY, column1 INT);
Query OK, 0 rows affected (0.84 sec)
mysql> CREATE TABLE t2 ( id INT PRIMARY KEY, column2 INT);
Query OK, 0 rows affected (0.36 sec)
--插入示例数据
mysql> INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50);
Query OK, 5 rows affected (0.09 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> INSERT INTO t2 VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 60);
Query OK, 5 rows affected (0.04 sec)
Records: 5 Duplicates: 0 Warnings: 0
--查看执行计划
mysql> explain format=tree
-> SELECT *
-> FROM t1
-> JOIN t2 ON t1.column1 = t2.column2;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Inner hash join (t2.column2 = t1.column1) (cost=3.50 rows=5)
-> Table scan on t2 (cost=0.07 rows=5)
-> Hash
-> Table scan on t1 (cost=0.75 rows=5)
|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> explain
-> SELECT *
-> FROM t1
-> JOIN t2 ON t1.column1 = t2.column2;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | NULL |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 20.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql>
具体来说,MySQL将按照以下步骤执行Hash Join:
通过使用Hash Join算法,MySQL可以在内存中快速查找匹配的行,从而提高连接操作的性能。但是,如果t1非常大,那么构建哈希表可能会消耗大量的内存,从而导致性能下降。因此,在使用Hash Join算法时,需要根据实际情况评估内存使用情况,并根据需要调整MySQL的配置参数。
参考: 8.2.1.4 Hash Join Optimization https://dev.mysql.com/doc/refman/8.0/en/hash-joins.html
SQL查询连接算法的使用和选择,根据MySQL的版本演进也不断发生改变。
参考: 【MySQL】控制MySQL优化器行为方法之optimizer_switch系统变量 Hash join in MySQL 8 https://dev.mysql.com/blog-archive/hash-join-in-mysql-8/
Choose the best answer. Which condition is true about the use of the hash join algorithm?
A) At least one of the tables in the join must have a hash index.
B) No index can be used for the join.
C) The query must access no more than two tables.
D) The smallest of the tables in the join must fit in memory as set by join_buffer_size.
参考答案:B
1.无法使用索引的等值连接(equi-joins )会使用散列连接(hash join algorithm)
参考: https://dev.mysql.com/doc/refman/8.0/en/hash-joins.html
Beginning with MySQL 8.0.18, MySQL employs a hash join for any query for which each join has an equi-join condition, and in which there are no indexes that can be applied to any join conditions
2.多表连接也可以使用哈希连接算法。
3.哈希连接算法使用join_buffer_size系统变量控制可以使用的内存量,但不要求连接中最小的表必须适合内存。
Choose two. Which two methods can be used to determine whether a query uses the hash join algorithm?
A) EXPLAIN FORMAT=JSON
B) EXPLAIN FORMAT=TRADITIONAL
C) EXPLAIN FORMAT=TREE
D) EXPLAIN without any formatting argument
E) EXPLAIN ANALYZE
参考答案:C E
但是根据如下的结果可以看到,EXPLAIN 的任何一个选项都可以看出执行计划是否使用了Hash Join。 如:
A) "using_join_buffer": "hash join"
B)Using where; Using join buffer (hash join)
C) Inner hash join
D) Using where; Using join buffer (hash join)
E) Inner hash join
Explain各选项的结果如下:
mysql> EXPLAIN FORMAT=JSON
-> SELECT *
-> FROM t1
-> JOIN t2 ON t1.column1 = t2.column2;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| EXPLAIN |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "3.50"
},
"nested_loop": [
{
"table": {
"table_name": "t1",
"access_type": "ALL",
"rows_examined_per_scan": 5,
"rows_produced_per_join": 5,
"filtered": "100.00",
"cost_info": {
"read_cost": "0.25",
"eval_cost": "0.50",
"prefix_cost": "0.75",
"data_read_per_join": "80"
},
"used_columns": [
"id",
"column1"
]
}
},
{
"table": {
"table_name": "t2",
"access_type": "ALL",
"rows_examined_per_scan": 5,
"rows_produced_per_join": 5,
"filtered": "20.00",
"using_join_buffer": "hash join",
"cost_info": {
"read_cost": "0.25",
"eval_cost": "0.50",
"prefix_cost": "3.50",
"data_read_per_join": "80"
},
"used_columns": [
"id",
"column2"
],
"attached_condition": "(`testdb`.`t2`.`column2` = `testdb`.`t1`.`column1`)"
}
}
]
}
} |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 row in set, 1 warning (0.07 sec)
mysql>
mysql> EXPLAIN FORMAT=TRADITIONAL
-> SELECT *
-> FROM t1
-> JOIN t2 ON t1.column1 = t2.column2;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | NULL |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 20.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql>
mysql> EXPLAIN FORMAT=TREE
-> SELECT *
-> FROM t1
-> JOIN t2 ON t1.column1 = t2.column2;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Inner hash join (t2.column2 = t1.column1) (cost=3.50 rows=5)
-> Table scan on t2 (cost=0.07 rows=5)
-> Hash
-> Table scan on t1 (cost=0.75 rows=5)
|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql>
mysql> EXPLAIN
-> SELECT *
-> FROM t1
-> JOIN t2 ON t1.column1 = t2.column2;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 100.00 | NULL |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 20.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql>
mysql> EXPLAIN ANALYZE
-> SELECT *
-> FROM t1
-> JOIN t2 ON t1.column1 = t2.column2;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Inner hash join (t2.column2 = t1.column1) (cost=3.50 rows=5) (actual time=0.103..0.108 rows=4 loops=1)
-> Table scan on t2 (cost=0.07 rows=5) (actual time=0.014..0.017 rows=5 loops=1)
-> Hash
-> Table scan on t1 (cost=0.75 rows=5) (actual time=0.047..0.053 rows=5 loops=1)
|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql>
扫码关注腾讯云开发者
领取腾讯云代金券
Copyright © 2013 - 2025 Tencent Cloud. All Rights Reserved. 腾讯云 版权所有
深圳市腾讯计算机系统有限公司 ICP备案/许可证号:粤B2-20090059 深公网安备号 44030502008569
腾讯云计算(北京)有限责任公司 京ICP证150476号 | 京ICP备11018762号 | 京公网安备号11010802020287
Copyright © 2013 - 2025 Tencent Cloud.
All Rights Reserved. 腾讯云 版权所有