原文地址: http://mysql.taobao.org/monthly/2017/12/09/
MySQL · 最佳实践 · 如何索引JSON字段
MySQL从5.7.8起开始支持JSON字段,这极大的丰富了MySQL的数据类型。也方便了广大开发人员。但MySQL并没有提供对JSON对象中的字段进行索引的功能,至少没有直接对其字段进行索引的方法。本文将介绍利用MySQL 5.7中的虚拟字段的功能来对JSON对象中的字段进行索引。
我们将基于下面的JSON对象进行演示
{
"id": 1,
"name": "Sally",
"games_played":{
"Battlefield": {
"weapon": "sniper rifle",
"rank": "Sergeant V",
"level": 20
},
"Crazy Tennis": {
"won": 4,
"lost": 1
},
"Puzzler": {
"time": 7
} } }
表的基本结构
CREATE TABLE `players` (
`id` INT UNSIGNED NOT NULL, `player_and_games` JSON NOT NULL, PRIMARY KEY (`id`)
);
如果只是基于上面的表的结构我们是无法对JSON字段中的Key进行索引的。接下来我们演示如何借助虚拟字段对其进行索引
虚拟列语法如下
<type> [ GENERATED ALWAYS ] AS ( <expression> ) [ VIRTUAL|STORED ][ UNIQUE [KEY] ] [ [PRIMARY] KEY ] [ NOT NULL ] [ COMMENT <text> ]
在MySQL 5.7中,支持两种Generated Column,即Virtual Generated Column和Stored Generated Column,前者只将Generated Column保存在数据字典中(表的元数据),并不会将这一列数据持久化到磁盘上;后者会将Generated Column持久化到磁盘上,而不是每次读取的时候计算所得。很明显,后者存放了可以通过已有数据计算而得的数据,需要更多的磁盘空间,与Virtual Column相比并没有优势,因此,MySQL 5.7中,不指定Generated Column的类型,默认是Virtual Column。
如果需要Stored Generated Golumn的话,可能在Virtual Generated Column上建立索引更加合适,一般情况下,都使用Virtual Generated Column,这也是MySQL默认的方式
加完虚拟列的建表语句如下:
CREATE TABLE `players` (
`id` INT UNSIGNED NOT NULL, `player_and_games` JSON NOT NULL, `names_virtual` VARCHAR(20) GENERATED ALWAYS AS (`player_and_games` ->> '$.name') NOT NULL,
PRIMARY KEY (`id`)
);
Note: 利用操作符-» 来引用JSON字段中的KEY。在本例中字段names_virtual为虚拟字段,我把它定义成不可以为空。在实际的工作中,一定要集合具体的情况来定。因为JSON本身是一种弱结构的数据对象。也就是说的它的结构不是固定不变的。
我们插入数据
INSERT INTO `players` (`id`, `player_and_games`) VALUES (1, '{
"id": 1,
"name": "Sally",
"games_played":{
"Battlefield": {
"weapon": "sniper rifle",
"rank": "Sergeant V",
"level": 20
},
"Crazy Tennis": {
"won": 4,
"lost": 1
},
"Puzzler": {
"time": 7
}
}
}');...
查看表里的数据
SELECT * FROM `players`;
+----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| id | player_and_games | names_virtual |
+----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
| 1 | {"id": 1, "name": "Sally", "games_played": {"Puzzler": {"time": 7}, "Battlefield": {"rank": "Sergeant V", "level": 20, "weapon": "sniper rifle"}, "Crazy Tennis": {"won": 4, "lost": 1}}} | Sally |
| 2 | {"id": 2, "name": "Thom", "games_played": {"Puzzler": {"time": 25}, "Battlefield": {"rank": "Major General VIII", "level": 127, "weapon": "carbine"}, "Crazy Tennis": {"won": 10, "lost": 30}}} | Thom |
| 3 | {"id": 3, "name": "Ali", "games_played": {"Puzzler": {"time": 12}, "Battlefield": {"rank": "First Sergeant II", "level": 37, "weapon": "machine gun"}, "Crazy Tennis": {"won": 30, "lost": 21}}} | Ali |
| 4 | {"id": 4, "name": "Alfred", "games_played": {"Puzzler": {"time": 10}, "Battlefield": {"rank": "Chief Warrant Officer Five III", "level": 73, "weapon": "pistol"}, "Crazy Tennis": {"won": 47, "lost": 2}}} | Alfred |
| 5 | {"id": 5, "name": "Phil", "games_played": {"Puzzler": {"time": 7}, "Battlefield": {"rank": "Lt. Colonel III", "level": 98, "weapon": "assault rifle"}, "Crazy Tennis": {"won": 130, "lost": 75}}} | Phil |
| 6 | {"id": 6, "name": "Henry", "games_played": {"Puzzler": {"time": 17}, "Battlefield": {"rank": "Captain II", "level": 87, "weapon": "assault rifle"}, "Crazy Tennis": {"won": 68, "lost": 149}}} | Henry |
+----+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+
查看表Players的字段
SHOW COLUMNS FROM `players`;
+------------------+------------------+------+-----+---------+-------------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------------+------+-----+---------+-------------------+
| id | int(10) unsigned | NO | PRI | NULL | |
| player_and_games | json | NO | | NULL | |
| names_virtual | varchar(20) | NO | | NULL | VIRTUAL GENERATED |
+------------------+------------------+------+-----+---------+-------------------+
我们看到虚拟字段names_virtual的类型是VIRTUAL GENERATED。MySQL只是在数据字典里保存该字段元数据,并没有真正的存储该字段的值。这样表的大小并没有增加。我们可以利用索引把这个字段上的值进行物理存储。
再添加索引之前,让我们先看下面查询的执行计划
EXPLAIN SELECT * FROM `players` WHERE `names_virtual` = "Sally"\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE table: players
partitions: NULL
type: ALLpossible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 6
filtered: 16.67
Extra: Using where
添加索引
CREATE INDEX `names_idx` ON `players`(`names_virtual`);
再执行上面的查询语句,我们将得到不一样的执行计划
EXPLAIN SELECT * FROM `players` WHERE `names_virtual` = "Sally"\G
*************************** 1. row *************************** id: 1
select_type: SIMPLE table: players
partitions: NULL type: refpossible_keys: names_idx
key: names_idx key_len: 22 ref: const rows: 1 filtered: 100.00 Extra: NULL
如我们所见,最新的执行计划走了新建的索引。
本文介绍了如何在MySQL 5.7中保存JSON文档。为了高效的检索JSON中内容,我们可以利用5.7的虚拟字段来对JSON的不同的KEY来建索引。极大的提高检索的速度。