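I have a table with five relevant fields: `id`, `source`, `iid`, `track_hash`, and `alias`. I want to group all entries by a common `track_hash` and then, for every row, save into its `alias` field the `id` of the row in its group that has the lowest `source` (breaking ties in favor of the highest `iid`). To do this, I wrote the following query: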
    WITH best AS (
        SELECT id AS bid, track_hash
        FROM (
            SELECT id, track_hash,
                   RANK() OVER (
                       PARTITION BY track_hash
                       ORDER BY source ASC, iid DESC
                   ) rank
            FROM albums
        )
        WHERE rank = 1
    )
    SELECT bid, a.*
    FROM albums a
    INNER JOIN best ON a.track_hash = best.track_hash
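This takes a perfectly reasonable 2 seconds on 24k rows. Now, I don't want to just see this `id`, I want to actually save it. For that, I used the following, very similar query: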
    WITH best AS (
        SELECT id AS bid, track_hash
        FROM (
            SELECT id, track_hash,
                   RANK() OVER (
                       PARTITION BY track_hash
                       ORDER BY source ASC, iid DESC
                   ) rank
            FROM albums
        )
        WHERE rank = 1
    )
    UPDATE albums
    SET alias = bid
    FROM albums a
    INNER JOIN best ON a.track_hash = best.track_hash
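However, this one takes anywhere from 1 to 10 minutes, and I really don't understand why. Doesn't the engine just have to match each row with its `best.id`/`alias`? Isn't that exactly what I'm doing with the update? Why is this happening, and what am I doing wrong?

The query plan looks like this: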
MATERIALIZE 1
CO-ROUTINE 4
SCAN TABLE albums USING INDEX track_hash_idx
USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
SCAN SUBQUERY 4
SCAN TABLE albums USING COVERING INDEX track_hash_idx
SEARCH SUBQUERY 1 USING AUTOMATIC PARTIAL COVERING INDEX (rank=?)
SEARCH TABLE albums AS a USING COVERING INDEX track_hash_idx (track_hash=?)
Posted on 2021-08-05 11:41:04
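You don't need to join to `albums` again. The `UPDATE ... FROM` syntax already provides an implicit join from `albums` to `best`. In your statement, the `albums a` in the FROM clause is a second, independent instance of the table, with no condition tying it to the row being updated, so every target row is paired with the entire join result. That is most likely where the extra time goes. Use this instead: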
    -- preceded by the same WITH best AS (...) clause as in the question
    UPDATE albums AS a
    SET alias = b.bid
    FROM best AS b
    WHERE a.track_hash = b.track_hash
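
To try the corrected statement end to end, here is a minimal sketch using Python's built-in `sqlite3` module. The four demo rows and the helper names `build_demo_db` and `set_aliases` are made up for illustration; only the table layout and the SQL come from the question. It assumes the bundled SQLite supports window functions (3.25+), and it only uses `UPDATE ... FROM` when the library is 3.33.0 or newer, falling back to a correlated subquery otherwise.

```python
import sqlite3

# Same ranking CTE as in the question; the window alias is renamed to
# "rnk" here only to avoid confusion with the RANK() function itself.
BEST_CTE = """
WITH best AS (
    SELECT id AS bid, track_hash
    FROM (
        SELECT id, track_hash,
               RANK() OVER (
                   PARTITION BY track_hash
                   ORDER BY source ASC, iid DESC
               ) AS rnk
        FROM albums
    )
    WHERE rnk = 1
)
"""


def build_demo_db():
    """Create an in-memory albums table with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE albums ("
        "id INTEGER PRIMARY KEY, source INTEGER, iid INTEGER, "
        "track_hash TEXT, alias INTEGER)"
    )
    rows = [
        (1, 2, 10, "h1"),
        (2, 1, 5, "h1"),
        (3, 1, 7, "h1"),  # lowest source, then highest iid: best row of h1
        (4, 3, 1, "h2"),  # only row of h2, so it is its own best row
    ]
    conn.executemany(
        "INSERT INTO albums (id, source, iid, track_hash) VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def set_aliases(conn):
    """Store the id of each group's best row in every row's alias column."""
    if sqlite3.sqlite_version_info >= (3, 33, 0):
        # UPDATE ... FROM: the target table is joined implicitly,
        # so albums is NOT listed again in the FROM clause.
        conn.execute(BEST_CTE + """
            UPDATE albums AS a
            SET alias = b.bid
            FROM best AS b
            WHERE a.track_hash = b.track_hash
        """)
    else:
        # Older SQLite without UPDATE ... FROM: correlated subquery instead.
        conn.execute(BEST_CTE + """
            UPDATE albums
            SET alias = (
                SELECT bid FROM best
                WHERE best.track_hash = albums.track_hash
            )
        """)
    conn.commit()


if __name__ == "__main__":
    conn = build_demo_db()
    set_aliases(conn)
    for row in conn.execute("SELECT id, track_hash, alias FROM albums ORDER BY id"):
        print(row)
```

Note that `RANK()` can return more than one row with rank 1 when two rows in a group share both `source` and `iid`. In that case SQLite picks one of the matching rows for the update, so `ROW_NUMBER()` may be a safer choice if such ties can occur in your data.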
https://stackoverflow.com/questions/68672551