Greenplum 类型一致性使用注意 - 索引条件、JOIN的类型一致性限制
背景
在查询时,有很多用户会犯浑,发现建立了索引,但是查询偏偏不走索引。
怎么不走索引啊?
这里做容易混淆的是类型的差异,例如字段类型为字符串,但是输入的是INT类型,这样就可能不走索引。(除非创建了自动的CAST,自动进行类型转换)
查询的输入类型与索引的类型一致是非常有必要的。
例子
1、建表(使用变长、定长字符串类型),写入测试数据
create table tbl3(id int, c1 text, c2 varchar(64), c3 char(64));
postgres=# insert into tbl3 select id, md5, md5, md5 from (select id, md5(random()::text) as md5 from generate_series(1,1000000) t(id)) t;
INSERT 0 1000000
2、创建索引
postgres=# create index idx_tbl3_1 on tbl3 (c1);
CREATE INDEX
postgres=# create index idx_tbl3_2 on tbl3 (c2);
CREATE INDEX
postgres=# create index idx_tbl3_3 on tbl3 (c3);
CREATE INDEX
3、结构如下
postgres=# \d+ tbl3
Table "public.tbl3"
Column | Type | Modifiers | Storage | Description
--------+-----------------------+-----------+----------+-------------
id | integer | | plain |
c1 | text | | extended |
c2 | character varying(64) | | extended |
c3 | character(64) | | extended |
Indexes:
"idx_tbl3_1" btree (c1)
"idx_tbl3_2" btree (c2)
"idx_tbl3_3" btree (c3)
Has OIDs: no
Distributed by: (id)
4、走索引,强制转换为bpchar。
postgres=# explain analyze select * from tbl3 where c3='abc';
QUERY PLAN
------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..201.23 rows=1 width=135)
Rows out: 0 rows at destination with 3.081 ms to end, start offset by 0.294 ms.
-> Index Scan using idx_tbl3_3 on tbl3 (cost=0.00..201.23 rows=1 width=135)
Index Cond: c3 = 'abc'::bpchar
Rows out: 0 rows (seg0) with 0.138 ms to end, start offset by 3.211 ms.
Slice statistics:
(slice0) Executor memory: 147K bytes.
(slice1) Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 3.484 ms
(11 rows)
5、不走索引,因为输入的为text变长,与char定长不一致。
postgres=# explain analyze select * from tbl3 where c3='abc'::text;
QUERY PLAN
-------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..20032.60 rows=1001 width=135)
Rows out: 0 rows at destination with 305 ms to end, start offset by 0.335 ms.
-> Seq Scan on tbl3 (cost=0.00..20032.60 rows=334 width=135)
Filter: c3::text = 'abc'::text
Rows out: 0 rows (seg0) with 302 ms to end, start offset by 4.023 ms.
Slice statistics:
(slice0) Executor memory: 147K bytes.
(slice1) Executor memory: 195K bytes avg x 3 workers, 195K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 305.985 ms
(11 rows)
6、走索引,输入的类型为变长类型,字段也是变长类型,与TEXT兼容。
postgres=# explain analyze select * from tbl3 where c2='abc'::text;
QUERY PLAN
------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..200.86 rows=1 width=135)
Rows out: 0 rows at destination with 2.744 ms to end, start offset by 0.275 ms.
-> Index Scan using idx_tbl3_2 on tbl3 (cost=0.00..200.86 rows=1 width=135)
Index Cond: c2::text = 'abc'::text
Rows out: 0 rows (seg0) with 0.031 ms to end, start offset by 2.868 ms.
Slice statistics:
(slice0) Executor memory: 147K bytes.
(slice1) Executor memory: 145K bytes avg x 3 workers, 145K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 3.120 ms
(11 rows)
7、不走索引,输入的类型为定长类型,但是字段为变长类型。
postgres=# explain analyze select * from tbl3 where c2='abc'::bpchar;
QUERY PLAN
-----------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..17531.66 rows=1 width=135)
Rows out: 0 rows at destination with 60 ms to end, start offset by 0.276 ms.
-> Seq Scan on tbl3 (cost=0.00..17531.66 rows=1 width=135)
Filter: c2::bpchar = 'abc'::bpchar
Rows out: 0 rows (seg0) with 57 ms to end, start offset by 2.864 ms.
Slice statistics:
(slice0) Executor memory: 147K bytes.
(slice1) Executor memory: 131K bytes avg x 3 workers, 131K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 60.320 ms
(11 rows)
8、内表不走索引,因为JOIN字段类型不匹配。
postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c3) and t1.c1='abc';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Aggregate (cost=25235.81..25235.82 rows=1 width=8)
-> Gather Motion 3:1 (slice2; segments: 3) (cost=25235.74..25235.79 rows=1 width=8)
-> Aggregate (cost=25235.74..25235.75 rows=1 width=8)
-> Nested Loop (cost=0.00..25235.66 rows=11 width=0)
-> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..200.88 rows=1 width=33)
-> Index Scan using idx_tbl3_1 on tbl3 t1 (cost=0.00..200.86 rows=1 width=33)
Index Cond: c1 = 'abc'::text AND 'abc'::text = c1
-> Seq Scan on tbl3 t2 (cost=0.00..25034.46 rows=11 width=65)
Filter: 'abc'::text = c3::text AND c3::text = 'abc'::text
(9 rows)
9、内表走索引,因为JOIN字段类型都是变长,匹配。
postgres=# explain select count(*) from tbl3 t1 join tbl3 t2 on (t1.c1=t2.c2) and t1.c1='abc';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=401.86..401.87 rows=1 width=8)
-> Gather Motion 3:1 (slice2; segments: 3) (cost=401.80..401.85 rows=1 width=8)
-> Aggregate (cost=401.80..401.81 rows=1 width=8)
-> Nested Loop (cost=200.88..401.79 rows=2 width=0)
-> Index Scan using idx_tbl3_1 on tbl3 t1 (cost=0.00..200.86 rows=1 width=33)
Index Cond: c1 = 'abc'::text AND 'abc'::text = c1
-> Materialize (cost=200.88..200.89 rows=1 width=33)
-> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..200.88 rows=1 width=33)
-> Index Scan using idx_tbl3_2 on tbl3 t2 (cost=0.00..200.86 rows=1 width=33)
Index Cond: 'abc'::text = c2::text AND c2::text = 'abc'::text
(10 rows)
分析
实际上就是要求索引的表达式与条件的表达式一致。
例子,使用表达式,强制转换为变长类型。
postgres=# create index idx_tbl3_4 on tbl3 ((c3::text));
CREATE INDEX
查询时,只要表达式的类型与条件类型匹配即可,走索引,同时会使用Implicit转换。
postgres=# explain analyze select * from tbl3 where (c3::text)='abc';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=108.60..52793.25 rows=1001 width=135)
Rows out: 0 rows at destination with 3.796 ms to end, start offset by 0.309 ms.
-> Bitmap Heap Scan on tbl3 (cost=108.60..52793.25 rows=334 width=135)
Recheck Cond: c3::text = 'abc'::text
Rows out: 0 rows (seg0) with 0.163 ms to end, start offset by 3.095 ms.
-> Bitmap Index Scan on idx_tbl3_4 (cost=0.00..108.35 rows=334 width=0)
Index Cond: c3::text = 'abc'::text
Bitmaps out: Avg 1.0 x 3 workers. Max 1 (seg0) with 0.135 ms to end, start offset by 3.119 ms.
Slice statistics:
(slice0) Executor memory: 155K bytes.
(slice1) Executor memory: 321K bytes avg x 3 workers, 321K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_seqscan=off
Total runtime: 4.219 ms
(15 rows)
postgres=# explain analyze select * from tbl3 where (c3::text) = '123'::int;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=108.60..52793.25 rows=1001 width=135)
Rows out: 0 rows at destination with 2.926 ms to end, start offset by 0.298 ms.
-> Bitmap Heap Scan on tbl3 (cost=108.60..52793.25 rows=334 width=135)
Recheck Cond: c3::text = '123'::text
Rows out: 0 rows (seg0) with 0.110 ms to end, start offset by 3.057 ms.
-> Bitmap Index Scan on idx_tbl3_4 (cost=0.00..108.35 rows=334 width=0)
Index Cond: c3::text = '123'::text
Bitmaps out: Avg 1.0 x 3 workers. Max 1 (seg0) with 0.064 ms to end, start offset by 3.095 ms.
Slice statistics:
(slice0) Executor memory: 155K bytes.
(slice1) Executor memory: 289K bytes avg x 3 workers, 289K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_seqscan=off
Total runtime: 3.327 ms
(15 rows)
postgres=# \dC
List of casts
Source type | Target type | Function | Implicit?
-----------------------------+-----------------------------+--------------------+---------------
abstime | date | date | in assignment
abstime | integer | (binary coercible) | no
abstime | timestamp without time zone | timestamp | yes
abstime | timestamp with time zone | timestamptz | yes
abstime | time without time zone | time | in assignment
bigint | bit | bit | no
bigint | character | text | in assignment
bigint | character varying | text | in assignment
bigint | double precision | float8 | yes
bigint | integer | int4 | in assignment
bigint | numeric | numeric | yes
bigint | oid | oid | yes
bigint | real | float4 | yes
...............
小结
要让索引起作用,最起码应该关注一下几点。
1、注意变长字符串和定长字符串,属于两种数据类型,不要混淆。
2、JOIN时操作符两边的字段类型一致。
3、查询时,WHERE条件与索引表达式一致。
4、查询时,WHERE条件与索引字段的类型一致。
5、查询时,WHERE条件的collate与索引的collate一致。(本地化相关)
6、操作符本身支持对应的索引访问方法。例如大多数的=操作符,支持b-tree的索引访问方法。
7、PG是CBO优化,因此索引访问的成本更低时,才会选择索引扫描。(设置了特定的优化器开关,或者使用了HINT除外)。