我有一个小型的PostgreSQL数据库(~3,000行)。
我正在尝试在其中一个文本字段('body')上设置一个全文搜索。
问题是任何查询都非常慢(35+秒!)。
我想问题是因为DB选择了顺序扫描模式.
这是我的疑问:
SELECT
ts_rank_cd(to_tsvector('italian', body), query),
ts_headline('italian', body, to_tsquery('torino')),
title,
location,
id_author
FROM
fulltextsearch.documents, to_tsquery('torino') as query
WHERE
(body_tsvector @@ query)
OFFSET
0以下是解释分析:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1129.81 rows=19 width=468) (actual time=74.059..13630.114 rows=863 loops=1)
-> Nested Loop (cost=0.00..1129.81 rows=19 width=468) (actual time=74.056..13629.342 rows=863 loops=1)
Join Filter: (documents.body_tsvector @@ query.query)
-> Function Scan on to_tsquery query (cost=0.00..0.01 rows=1 width=32) (actual time=4.606..4.608 rows=1 loops=1)
-> Seq Scan on documents (cost=0.00..1082.09 rows=3809 width=591) (actual time=0.045..48.072 rows=3809 loops=1)
Total runtime: 13630.720 ms这是我的桌子:
mydb=# \d+ fulltextsearch.documents;
Table "fulltextsearch.documents"
Column | Type | Modifiers | Storage | Description
---------------+-------------------+-----------------------------------------------------------------------+----------+-------------
id | integer | not null default nextval('fulltextsearch.documents_id_seq'::regclass) | plain |
id_author | integer | | plain |
body | character varying | | extended |
title | character varying | | extended |
location | character varying | | extended |
date_creation | date | | plain |
body_tsvector | tsvector | | extended |
Indexes:
"fulltextsearch_documents_tsvector_idx" gin (to_tsvector('italian'::regconfig, COALESCE(body, ''::character varying)::text))
"id_idx" btree (id)
Triggers:
body_tsvectorupdate BEFORE INSERT OR UPDATE ON fulltextsearch.documents FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('body_tsvector', 'pg_catalog.italian', 'body')
Has OIDs: no我肯定我错过了一些明显的东西..。
有什么线索吗?
。
。
。
======================================================================= ===更新
感谢您的建议,我提出了以下(更好的)查询:
SELECT
ts_rank(body_tsvector, query),
ts_headline('italian', body, query),
title,
location
FROM
fulltextsearch.documents, to_tsquery('italian', 'torino') as query
WHERE
to_tsvector('italian', coalesce(body,'')) @@ query这是相当好,但总是非常慢(13+秒.)。
我注意到注释掉"ts_headline()“行查询的速度非常快。
这是解释分析,它最终使用了索引,但对我没有多大帮助……
EXPLAIN ANALYZE SELECT
clock_timestamp() - statement_timestamp() as elapsed_time,
ts_rank(body_tsvector, query),
ts_headline('italian', body, query),
title,
location
FROM
fulltextsearch.documents, to_tsquery('italian', 'torino') as query
WHERE
to_tsvector('italian', coalesce(body,'')) @@ query
Nested Loop (cost=16.15..85.04 rows=19 width=605) (actual time=102.290..13392.161 rows=863 loops=1)
-> Function Scan on query (cost=0.00..0.01 rows=1 width=32) (actual time=0.008..0.009 rows=1 loops=1)
-> Bitmap Heap Scan on documents (cost=16.15..84.65 rows=19 width=573) (actual time=0.381..4.236 rows=863 loops=1)
Recheck Cond: (to_tsvector('italian'::regconfig, (COALESCE(body, ''::character varying))::text) @@ query.query)
-> Bitmap Index Scan on fulltextsearch_documents_tsvector_idx (cost=0.00..16.15 rows=19 width=0) (actual time=0.312..0.312 rows=863 loops=1)
Index Cond: (to_tsvector('italian'::regconfig, (COALESCE(body, ''::character varying))::text) @@ query.query)
Total runtime: 13392.717 ms发布于 2013-04-18 16:12:40
你错过了两件(相当明显的)事情:
1在'italian'中设置了to_tsvector(),但没有在to_tsquery()中指定
保持两者一致。
2你已经索引了COALESCE(body, ...),但这不是你要搜索的对象。
规划师不是魔术-你只能使用索引,如果这是你要搜索的对象。
发布于 2013-04-19 14:31:13
最后,在您的回答和评论的帮助下,通过一些googling,我在完整结果集(我感兴趣的结果页)的子集上运行了ts_headline() (我想这是一个非常重的函数)来解决问题:
SELECT
id,
ts_headline('italian', body, to_tsquery('italian', 'torino')) as headline,
rank,
title,
location
FROM (
SELECT
id,
body,
title,
location,
ts_rank(body_tsvector, query) as rank
FROM
fulltextsearch.documents, to_tsquery('italian', 'torino') as query
WHERE
to_tsvector('italian', coalesce(body,'')) @@ query
LIMIT 10
OFFSET 0
) as s发布于 2016-09-15 13:40:34
我通过预先计算ts_rank_cd并将其存储在一个表中来解决这个问题,以便在语料库中找到流行术语(高发生率)。搜索将查看此表,以获得查询术语的排序文档级别。如果不存在(对于不太流行的术语),它将默认动态创建ts_rank_cd。
请看一下这篇文章。
https://stackoverflow.com/questions/16086398
复制相似问题