我想在我的应用程序中加快搜索结果,但是,无论我使用什么方法,我都会得到相同的结果。因为它是Django应用程序,所以我将提供ORM命令和生成的SQL代码(使用PostgreSQL)。
首先,我在数据库上启用了GIN索引和trigram操作:
其次,我创建了包含两个varchar列的表: first_name和last_name (加上一个id字段作为主键)。
from django.db import models
class Author(models.Model):
first_name = models.CharField(max_length=100)
last_name = models.CharField(max_length=100)
我还在数据库中填充了952条示例记录,这样就不会出现这样的情况: Postgres由于数据集太小而避免使用索引。
接下来,我对非索引数据运行以下查询。
简单的类似查询:
In [50]: print(Author.objects.filter(last_name__icontains='ari').query)
SELECT "reviews_author"."id", "reviews_author"."first_name", "reviews_author"."last_name" FROM "reviews_author" WHERE UPPER("reviews_author"."last_name"::text) LIKE UPPER(%ari%)
In [51]: print(Author.objects.filter(last_name__icontains='ari').explain(analyze=T
...: rue))
Seq Scan on reviews_author (cost=0.00..24.28 rows=38 width=16) (actual time=0.011..0.242 rows=56 loops=1)
Filter: (upper((last_name)::text) ~~ '%ARI%'::text)
Rows Removed by Filter: 896
Planning Time: 0.042 ms
Execution Time: 0.249 ms
Trigram类似:
In [55]: print(Author.objects.filter(last_name__trigram_similar='ari').query)
SELECT "reviews_author"."id", "reviews_author"."first_name", "reviews_author"."last_name" FROM "reviews_author" WHERE "reviews_author"."last_name" % ari
In [56]: print(Author.objects.filter(last_name__trigram_similar='ari').explain(ana
...: lyze=True))
Seq Scan on reviews_author (cost=0.00..21.90 rows=1 width=16) (actual time=0.582..0.582 rows=0 loops=1)
Filter: ((last_name)::text % 'ari'::text)
Rows Removed by Filter: 952
Planning Time: 0.033 ms
Execution Time: 0.591 ms
还有一个带有排序结果的更复杂的查询:
In [58]: print(Author.objects.annotate(similar=TrigramSimilarity('last_name', 'ari
...: ')).filter(similar__gt=0).order_by('-similar').query)
SELECT "reviews_author"."id", "reviews_author"."first_name", "reviews_author"."last_name", SIMILARITY("reviews_author"."last_name", ari) AS "similar" FROM "reviews_author" WHERE SIMILARITY("reviews_author"."last_name", ari) > 0.0 ORDER BY "similar" DESC
In [59]: print(Author.objects.annotate(similar=TrigramSimilarity('last_name', 'ari
...: ')).filter(similar__gt=0).order_by('-similar').explain(analyze=True))
Sort (cost=38.24..39.03 rows=317 width=20) (actual time=0.680..0.683 rows=84 loops=1)
Sort Key: (similarity((last_name)::text, 'ari'::text)) DESC
Sort Method: quicksort Memory: 31kB
-> Seq Scan on reviews_author (cost=0.00..25.07 rows=317 width=20) (actual time=0.021..0.657 rows=84 loops=1)
Filter: (similarity((last_name)::text, 'ari'::text) > '0'::double precision)
Rows Removed by Filter: 868
Planning Time: 0.062 ms
Execution Time: 0.693 ms
下一步是创建一个索引:
class Author(models.Model):
first_name = models.CharField(max_length=100)
last_name = models.CharField(max_length=100)
class Meta:
indexes = [GinIndex(fields=['last_name'])]
这导致了以下SQL迁移:
./manage.py sqlmigrate reviews 0004
BEGIN;
--
-- Alter field score on review
--
--
-- Create index reviews_aut_last_na_a89a84_gin on field(s) last_name of model author
--
CREATE INDEX "reviews_aut_last_na_a89a84_gin" ON "reviews_author" USING gin ("last_name");
COMMIT;
现在我运行同样的命令。
比如:
In [60]: print(Author.objects.filter(last_name__icontains='ari').query)
SELECT "reviews_author"."id", "reviews_author"."first_name", "reviews_author"."last_name" FROM "reviews_author" WHERE UPPER("reviews_author"."last_name"::text) LIKE UPPER(%ari%)
In [61]: print(Author.objects.filter(last_name__icontains='ari').explain(analyze=T
...: rue))
Seq Scan on reviews_author (cost=0.00..24.28 rows=38 width=16) (actual time=0.009..0.237 rows=56 loops=1)
Filter: (upper((last_name)::text) ~~ '%ARI%'::text)
Rows Removed by Filter: 896
Planning Time: 0.089 ms
Execution Time: 0.244 ms
Trigram类似:
In [62]: print(Author.objects.filter(last_name__trigram_similar='ari').query)
SELECT "reviews_author"."id", "reviews_author"."first_name", "reviews_author"."last_name" FROM "reviews_author" WHERE "reviews_author"."last_name" % ari
In [63]: print(Author.objects.filter(last_name__trigram_similar='ari').explain(ana
...: lyze=True))
Seq Scan on reviews_author (cost=0.00..21.90 rows=1 width=16) (actual time=0.740..0.740 rows=0 loops=1)
Filter: ((last_name)::text % 'ari'::text)
Rows Removed by Filter: 952
Planning Time: 0.056 ms
Execution Time: 0.750 ms
还有更复杂的查询:
In [64]: print(Author.objects.annotate(similar=TrigramSimilarity('last_name', 'ari
...: ')).filter(similar__gt=0).order_by('-similar').query)
SELECT "reviews_author"."id", "reviews_author"."first_name", "reviews_author"."last_name", SIMILARITY("reviews_author"."last_name", ari) AS "similar" FROM "reviews_author" WHERE SIMILARITY("reviews_author"."last_name", ari) > 0.0 ORDER BY "similar" DESC
In [65]: print(Author.objects.annotate(similar=TrigramSimilarity('last_name', 'ari
...: ')).filter(similar__gt=0).order_by('-similar').explain(analyze=True))
Sort (cost=38.24..39.03 rows=317 width=20) (actual time=0.659..0.662 rows=84 loops=1)
Sort Key: (similarity((last_name)::text, 'ari'::text)) DESC
Sort Method: quicksort Memory: 31kB
-> Seq Scan on reviews_author (cost=0.00..25.07 rows=317 width=20) (actual time=0.024..0.643 rows=84 loops=1)
Filter: (similarity((last_name)::text, 'ari'::text) > '0'::double precision)
Rows Removed by Filter: 868
Planning Time: 0.052 ms
Execution Time: 0.674 ms
执行时间的变化似乎微不足道。在最后一次查询的情况下,扫描需要0.643个单元,而在前一个情况下则是0.657个。时间也相差0.02毫秒(第二个查询运行速度甚至慢一点)。是否有一些选项,我错过了,应该启用,以帮助性能?数据集太简单了吗?
我用的医生:
编辑--我添加了一些houndred记录(现在有将近259,000条记录)--并再次运行测试。首先,没有索引:
In [59]: print(Author.objects.filter(last_name__icontains='bar').explain(analyze=True))
Seq Scan on reviews_author (cost=0.00..5433.28 rows=10358 width=16) (actual time=0.018..58.630 rows=846 loops=1)
Filter: (upper((last_name)::text) ~~ '%BAR%'::text)
Rows Removed by Filter: 258106
Planning Time: 0.046 ms
Execution Time: 58.662 ms
In [60]: print(Author.objects.filter(last_name__trigram_similar='bar').explain(analyze=True))
Gather (cost=1000.00..4478.96 rows=259 width=16) (actual time=0.555..80.710 rows=698 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Parallel Seq Scan on reviews_author (cost=0.00..3453.06 rows=152 width=16) (actual time=0.503..78.743 rows=349 loops=2)
Filter: ((last_name)::text % 'bar'::text)
Rows Removed by Filter: 129127
Planning Time: 0.039 ms
Execution Time: 80.740 ms
In [61]: print(Author.objects.annotate(similar=TrigramSimilarity('last_name', 'bar')).filter(similar__gt=0).order_by('-similar').explain(analyze=True))
Sort (cost=12725.93..12941.72 rows=86317 width=20) (actual time=168.214..168.876 rows=14235 loops=1)
Sort Key: (similarity((last_name)::text, 'bar'::text)) DESC
Sort Method: quicksort Memory: 1485kB
-> Seq Scan on reviews_author (cost=0.00..5649.07 rows=86317 width=20) (actual time=0.022..165.806 rows=14235 loops=1)
Filter: (similarity((last_name)::text, 'bar'::text) > '0'::double precision)
Rows Removed by Filter: 244717
Planning Time: 0.052 ms
Execution Time: 169.319 ms
有了它:
In [62]: print(Author.objects.filter(last_name__icontains='bar').explain(analyze=True))
Seq Scan on reviews_author (cost=0.00..5433.28 rows=10358 width=16) (actual time=0.015..59.366 rows=846 loops=1)
Filter: (upper((last_name)::text) ~~ '%BAR%'::text)
Rows Removed by Filter: 258106
Planning Time: 0.072 ms
Execution Time: 59.395 ms
In [63]: print(Author.objects.filter(last_name__trigram_similar='bar').explain(analyze=True))
Gather (cost=1000.00..4478.96 rows=259 width=16) (actual time=0.545..80.337 rows=698 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Parallel Seq Scan on reviews_author (cost=0.00..3453.06 rows=152 width=16) (actual time=0.292..78.502 rows=349 loops=2)
Filter: ((last_name)::text % 'bar'::text)
Rows Removed by Filter: 129127
Planning Time: 0.035 ms
Execution Time: 80.369 ms
In [64]: print(Author.objects.annotate(similar=TrigramSimilarity('last_name', 'bar')).filter(similar__gt=0).order_by('-similar').explain(analyze=True))
Sort (cost=12725.93..12941.72 rows=86317 width=20) (actual time=168.191..168.890 rows=14235 loops=1)
Sort Key: (similarity((last_name)::text, 'bar'::text)) DESC
Sort Method: quicksort Memory: 1485kB
-> Seq Scan on reviews_author (cost=0.00..5649.07 rows=86317 width=20) (actual time=0.029..165.743 rows=14235 loops=1)
Filter: (similarity((last_name)::text, 'bar'::text) > '0'::double precision)
Rows Removed by Filter: 244717
Planning Time: 0.054 ms
Execution Time: 169.340 ms
现在仍然是非常相似的时期,而且它似乎在避免使用gin索引。
发布于 2020-12-01 15:02:24
CREATE INDEX "reviews_aut_last_na_a89a84_gin" ON "reviews_author" USING gin ("last_name");
这并没有创建一个trigram索引。它在整个字符串上创建了一个GIN索引,使用了来自btree_gin的操作符(您似乎并不是为了任何好的目的使用这些操作符)。要建立一个trigram索引,需要如下所示:
CREATE INDEX "reviews_aut_last_na_a89a84_gin" ON "reviews_author" USING gin ("last_name" gin_trgm_ops);
但我不知道如何让django这么做,我不是Django用户。
https://stackoverflow.com/questions/65088405
复制相似问题