PostgreSQL partial index use caveat in prepared statement <= ver 9.1

2 minute read

背景

最近开发的同事反映的一个问题.

环境是这样的,

主机A跨广域网抽取主机B上的PostgreSQL的数据.

一共抽取两个表, 假设为T1和 T2.

其中T1表 60GB左右, T2表 30GB左右.

抽取的规则是按照时间分段抽取, 例如 :

select * from t1 where create_time>=$1::timestamp with time zone and create_time <$2::timestamp with time zone order by create_time limit $3 offset $4;  

以及

select * from t2 where create_time>=$1::timestamp with time zone and create_time <$2::timestamp with time zone order by create_time limit $3 offset $4;  

在t1和t2的表中, create_time上都有索引.

但是t1 表抽取每次耗时约2秒, 而t2表每次耗时需要600秒. 这样分段抽取的话, 总的时间就非常的长.

从时间上来看, t1表应该是走了索引, 而t2表应该是全表扫描.

但是奇怪的是, 在psql或者pgadmin客户端中执行同样的SQL, t2 表是很快的.

为了更清晰的知晓执行计划, 最好是在数据库上启用auto_explain的插件, 但是需要重启数据库, 这个操作也只是为了印证. 对解决问题没有用, 所以这里就不重启数据库来加载auto_explain了.

那么到底有多少种原因会造成有索引确不走索引的情况呢?

1. 客户端设置了连接参数, 如enable_indexscan=off. 经过检查并没有这个参数.

2. t2 表的统计信息未更新. 但是如果是这个问题, 在psql或者pgadmin中都会很慢. 所以排除.

并且如果是这个问题执行analyze后就可以排除.

3. 禁用用户的全表扫描都无效, alter user digoal set enable_seqscan=off;

t2表的抽取依旧很慢, t1表依旧很快. (这个操作只对重新连接的会话生效, 长连接无效.)

一开始没有仔细看索引的信息, 只是关心有没有建索引. 原来T2表建的是partial index. 如下 :

"idx_t2_create_time" btree (create_time) WHERE create_time > '2011-04-15 00:00:00+08'::timestamp with time zone  

所以在执行prepared statement的时候, optimalize后才取传递的参数值, 所以不能使用这个索引.

而T1表是全量索引, 所以自然优先选择索引. 就出现了本文出现的问题.

例如 :

digoal=> prepare p_t2 (timestamp with time zone, timestamp with time zone, bigint, bigint) as SELECT * from digoal.t2 where create_time>=$1 and create_time<$2 order by create_time limit $3 offset $4;  
PREPARE  
digoal=> explain execute p_t2('2012-11-10', '2012-11-11', 10, 10);  
                                               QUERY PLAN                                                  
---------------------------------------------------------------------------------------------------------  
 Limit  (cost=10006153796.86..10006153975.12 rows=71304 width=161)  
   ->  Sort  (cost=10006153618.60..10006155401.21 rows=713042 width=161)  
         Sort Key: create_time  
         ->  Seq Scan on t2  (cost=10000000000.00..10005958957.99 rows=713042 width=161)  
               Filter: ((create_time >= $1) AND (create_time < $2))  
(5 rows)  

T1 的话执行prepared statement就没有这个问题 :

digoal=> prepare p_t1(timestamp with time zone, timestamp with time zone, bigint, bigint) as SELECT * from digoal.t1 where create_time>=$1 and create_time<$2 order by create_time limit $3 offset $4;  
PREPARE  
digoal=> explain execute p_t1('2012-11-10', '2012-11-11', 10, 10);  
                                                         QUERY PLAN                                                           
----------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=11730.96..23461.92 rows=158179 width=152)  
   ->  Index Scan using idx_t1 on t1  (cost=0.00..117309.44 rows=1581788 width=152)  
         Index Cond: ((create_time >= $1) AND (create_time < $2))  
(3 rows)  

注意PostgreSQL 9.2优化了这个问题, 有兴趣的朋友可参考 :

http://blog.163.com/digoal@126/blog/static/16387704020124219333824/

9.2 测试如下 :

digoal=> create table t1 (id int, create_time timestamp);  
CREATE TABLE  
digoal=> insert into t1 select generate_series(1,1000000),clock_timestamp();  
INSERT 0 1000000  
digoal=> create index idx_t1 on t1(create_time) where create_time>'2012-11-15 14:45:00';  
CREATE INDEX  
digoal=> explain select * from t1 where create_time>='2012-11-15 14:45:01' and create_time<'2012-11-15 14:45:02' order by create_time limit 10 offset 10;  
                                                                            QUERY PLAN                                                
                                
------------------------------------------------------------------------------------------------------------------------------------  
------------------------------  
 Limit  (cost=7.07..14.13 rows=10 width=12)  
   ->  Index Scan using idx_t1 on t1  (cost=0.00..3533.52 rows=5000 width=12)  
         Index Cond: ((create_time >= '2012-11-15 14:45:01'::timestamp without time zone) AND (create_time < '2012-11-15 14:45:02'::  
timestamp without time zone))  
(3 rows)  
digoal=> prepare p_t1(timestamp, timestamp, bigint, bigint) as select * from t1 where create_time>=$1 and create_time<$2 order by create_time limit $3 offset $4;   
PREPARE  
digoal=> explain execute  p_t1('2012-11-15 14:45:01', '2012-11-15 14:45:02', 10, 10);  
                                                                            QUERY PLAN                                                
                                
------------------------------------------------------------------------------------------------------------------------------------  
------------------------------  
 Limit  (cost=7.07..14.13 rows=10 width=12)  
   ->  Index Scan using idx_t1 on t1  (cost=0.00..3533.52 rows=5000 width=12)  
         Index Cond: ((create_time >= '2012-11-15 14:45:01'::timestamp without time zone) AND (create_time < '2012-11-15 14:45:02'::  
timestamp without time zone))  
(3 rows)  

PostgreSQL 9.1以及以下版本都有 这个问题, 测试如下 :

pg91@db-172-16-3-150-> psql  
psql (9.1.3)  
Type "help" for help.  
  
postgres=# create table t1 (id int, create_time timestamp);  
CREATE TABLE  
postgres=#  insert into t1 select generate_series(1,1000000),clock_timestamp();  
INSERT 0 1000000  
postgres=# create index idx_t1 on t1(create_time) where create_time>'2012-11-15 14:45:00';  
CREATE INDEX  
postgres=# explain select * from t1 where create_time>='2012-11-15 14:45:01' and create_time<'2012-11-15 14:45:02' order by create_time limit 10 offset 10;  
                                                                            QUERY PLAN                                                
                                
------------------------------------------------------------------------------------------------------------------------------------  
------------------------------  
 Limit  (cost=7.07..14.13 rows=10 width=12)  
   ->  Index Scan using idx_t1 on t1  (cost=0.00..3533.28 rows=5000 width=12)  
         Index Cond: ((create_time >= '2012-11-15 14:45:01'::timestamp without time zone) AND (create_time < '2012-11-15 14:45:02'::  
timestamp without time zone))  
(3 rows)  
  
postgres=#  prepare p_t1(timestamp, timestamp, bigint, bigint) as select * from t1 where create_time>=$1 and create_time<$2 order by create_time limit $3 offset $4;   
PREPARE  
postgres=# explain execute  p_t1('2012-11-15 14:45:01', '2012-11-15 14:45:02', 10, 10);  
                              QUERY PLAN                                
----------------------------------------------------------------------  
 Limit  (cost=20714.44..20715.69 rows=500 width=12)  
   ->  Sort  (cost=20713.19..20725.69 rows=5000 width=12)  
         Sort Key: create_time  
         ->  Seq Scan on t1  (cost=0.00..20406.00 rows=5000 width=12)  
               Filter: ((create_time >= $1) AND (create_time < $2))  
(5 rows)  

Flag Counter

digoal’s 大量PostgreSQL文章入口