PostgreSQL 空间类型统计信息(Statistics, Cardinality, Selectivity, Estimate)不准确导致SQL执行计划不准(包含、相交查询)的优化实践

6 minute read

背景

PostgreSQL 通过几何类型以及PostGIS插件,支持丰富的空间类型,空间数据类型包括KNN检索,空间包含,空间相交,方位搜索,空间计算等操作。

为了得到正确的执行计划,在数据库优化器中,评估满足某个操作符条件的行数,是一个重要的CBO依据。

而评估选择性又依赖统计信息,同时依赖每种操作符的选择性算法。

一些优化器相关文档:

《PostgreSQL 自定义函数表达式选择性评估算法 - Statistics, Cardinality, Selectivity, Estimate》

《PostgreSQL 多值列的选择性 - Statistics, Cardinality, Selectivity, Estimate》

《PostgreSQL 11 preview - 表达式索引柱状图buckets\STATISTICS\default_statistics_target可设置》

《PostgreSQL 11 preview - 优化器 增强 汇总》

《PostgreSQL 11 preview - 强制auto prepared statment开关(自动化plan cache)(类似Oracle cursor_sharing force)》

《[未完待续] PostgreSQL PRO 特性 - AQO(机器学习执行计划优化器)》

《PostgreSQL pg_stat_reset清除track_counts的隐患》

《PostgreSQL 统计信息pg_statistic格式及导入导出dump_stat - 兼容Oracle》

《优化器里的概率学 - 性能抖动原理分析》

《懒人推动社会进步 - 多列聚合, gin与数据分布(选择性)》

《数据库优化器原理(含动态规划、机器学习建模优化器aqo) - 如何治疗选择综合症》

《PostgreSQL 10.0 preview 功能增强 - SQL执行剩余时间 - 垃圾回收过程可视pg_stat_progress_vacuum》

《PostgreSQL 10.0 preview 性能增强 - hash,nestloop join优化(聪明的优化器是这样的)》

《PostgreSQL 10.0 preview 功能增强 - 动态视图pg_stat_activity新增数据库管理进程信息》

《PostgreSQL 10.0 preview 功能增强 - 流复制统计视图pg_stat_replication增强, 新增时间维度延迟评估》

《PostgreSQL 10.0 preview 优化器改进 - 不完整索引支持复合排序》

《聊一下PostgreSQL优化器 - in里面有重复值时PostgreSQL如何处理?》

《官人要杯咖啡吗? - PostgreSQL实时监测PLAN tree的执行进度 - pg_query_state》

《PostgreSQL pg_stat_ pg_statio_ 统计信息(scan,read,fetch,hit)源码解读》

《PostgreSQL 9.6 并行计算 优化器算法浅析 - 以及如何强制并行度》

《PostgreSQL 9.6 并行计算 优化器算法浅析》

《关键时刻HINT出彩 - PG优化器的参数优化、执行计划固化CASE》

《Greenplum ORCA 优化器的编译安装与使用》

《PostgreSQL prepared statement和simple query的profile及性能差异》

《PostgreSQL 优化器逻辑推理能力 源码解析》

《PostgreSQL 操作符与优化器详解 - 包含(选择性、JOIN方法、等效)等内容》

《PostgreSQL pg_stat_replication sent_location, write_location, flush_location, replay_location的差别》

《为什么PostgreSQL启动后有个UDP监听localhost - pgstat统计信息进程间通信》

《pg_stat_statements fwrite not save》

《PostgreSQL 9.5 new feature - use FSM fast evaluate bloat and other table level statics(pgstattuple_approx)》

《PostgreSQL aggregate function 2 : Aggregate Functions for Statistics》

《优化器成本因子校对(disk,ssd,memory IO开销精算) - PostgreSQL real seq_page_cost & random_page_cost in disks,ssd,memory》

《优化器成本因子校对 - PostgreSQL explain cost constants alignment to timestamp》

《PostgreSQL pg_stats used to estimate top N freps values and explain rows》

[《执行计划选择算法 与 绑定变量 - PostgreSQL prepared statement: SPI_prepare, prepare execute COMMAND, PL/pgsql STYLE: custom & generic plan cache》](../201212/20121224_01.md)

《执行计划维度统计\判断执行计划翻转\统计每种执行计划的统计信息 use pg_stat_plans in PostgreSQL 9.0, 9.1 and 9.2》

《PostgreSQL 9.2 improve prepared statements plan’s selectivity》

《PostgreSQL 9.2 add array elements statistics》

《WHY prepared Statement running slower in some situation CASE》

《PostgreSQL 优化器行评估算法》

那么,空间类型有合理的统计信息吗?

“目前空间数据类型,还没有柱状图,高频词等内容”,所以给操作符提供的选择性依据非常有限。

因此目前存在一个这样的问题,选择性函数内使用百分比硬编码,在代码中写死了选择性比例,那么空间包含查询(不管范围多大,评估出来的记录数是一样的),所以基本上都会选择空间索引扫描( 然而实际上当真实空间包含查询包含了大量记录时,全表扫描性能更好 )。

例子

1、建表(pos存储point数据),写入1000万记录数据,创建空间索引

create extension postgis;  
create table test (id int , info text, pos geometry);  
insert into test select generate_series(1,10000000), 'test', st_setsrid(st_makepoint(110+random()*20-10, 70+random()*10-5), 4326);  
create index idx_test_1 on test using gist(pos);  

2、收集统计信息

vacuum analyze test;  

你会发现空间类型的统计信息非常有限(没有高频词,柱状图,相关性等内容):

postgres=# select * from pg_stats where tablename='test' and attname='pos';  
 schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram   
------------+-----------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+----------------------  
 public     | test      | pos     | f         |         0 |        32 |         -1 |                  |                   |                  |             |                   |                        |   
(1 row)  

3、查看执行计划,查看评估出来的记录数。

空间类型选择性使用如下硬编码0.001。

src/backend/utils/adt/geo_selfuncs.c

/*
 *      contsel -- How likely is a box to contain (be contained by) a given box?
 *
 * This is a tighter constraint than "overlap", so produce a smaller
 * estimate than areasel does.
 */

Datum
contsel(PG_FUNCTION_ARGS)
{
        PG_RETURN_FLOAT8(0.001);
}

不管输入什么空间条件,评估得到的记录数都是千分之一:10000条

条件1:
postgres=# explain select * from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  
                                                  QUERY PLAN                                                                                                                    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=41)  
   Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
   Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0)  
         Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
(5 rows)  
  
条件2:
postgres=# explain select * from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                                 QUERY PLAN                                                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=41)  
   Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
   Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0)  
         Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
(5 rows)  

4、那么实际上的记录数是多少呢?

条件1:0条。

postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  
                               QUERY PLAN    
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12843.26..12843.27 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1)  
   ->  Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=0) (actual time=0.012..0.012 rows=0 loops=1)  
         Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
         Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
         ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0) (actual time=0.010..0.010 rows=0 loops=1)  
               Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~   
pos)  
 Planning time: 0.170 ms  
 Execution time: 0.061 ms  
(8 rows)  

条件2:5001107条。

postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                QUERY PLAN    
----------------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=12843.26..12843.27 rows=1 width=8) (actual time=5057.822..5057.822 rows=1 loops=1)  
   ->  Bitmap Heap Scan on test  (cost=149.95..12834.93 rows=3333 width=0) (actual time=507.744..4537.564 rows=5001107 loops=1)  
         Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
         Rows Removed by Index Recheck: 1981986  
         Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry,  
 pos)  
         Heap Blocks: exact=50167 lossy=33167  
         ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0) (actual time=499.627..499.627 rows=5001107 loops=1)  
               Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~   
pos)  
 Planning time: 0.179 ms  
 Execution time: 5057.882 ms  
(10 rows)  

5、空间类型的选择性不准确,导致的问题,不管什么情况都走空间索引。

5.1 大范围搜索,使用索引反而更慢,如下。

postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                        QUERY PLAN  
-------------------------------------------------------------------------------------------------------  
 Finalize Aggregate  (cost=10539.13..10539.14 rows=1 width=8) (actual time=1079.698..1079.699 rows=1 loops=1)  
   ->  Gather  (cost=10539.10..10539.11 rows=8 width=8) (actual time=1079.655..1079.693 rows=9 loops=1)  
         Workers Planned: 8  
         Workers Launched: 8  
         ->  Partial Aggregate  (cost=10539.10..10539.11 rows=1 width=8) (actual time=1049.984..1049.984 rows=1 loops=9)  
               ->  Parallel Bitmap Heap Scan on test  (cost=149.95..10538.06 rows=417 width=0) (actual time=483.733..991.774 rows=555679 loops=9)  
                     Recheck Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geo  metry ~ pos)  
                     Rows Removed by Index Recheck: 220221  
                     Filter: _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  
                     Heap Blocks: exact=5908 lossy=3936  
                     ->  Bitmap Index Scan on idx_test_1  (cost=0.00..149.12 rows=10000 width=0) (actual time=502.790..502.790 rows=5001107 loops=1)  
                           Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
 Planning time: 0.171 ms  
 Execution time: 1087.890 ms  
(14 rows)  
  
  
postgres=# explain analyze select count(*) from test where st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                                  QUERY PLAN           
-------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=411456.73..411456.74 rows=1 width=8) (actual time=674.853..674.853 rows=1 loops=1)  
   ->  Gather  (cost=411456.70..411456.71 rows=8 width=8) (actual time=674.793..674.848 rows=9 loops=1)  
         Workers Planned: 8  
         Workers Launched: 8  
         ->  Partial Aggregate  (cost=411456.70..411456.71 rows=1 width=8) (actual time=644.627..644.627 rows=1 loops=9)  
               ->  Parallel Seq Scan on test  (cost=0.00..411455.65 rows=417 width=0) (actual time=0.045..586.545 rows=555679 loops=9)  
                     Filter: (('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry   ~ pos) AND _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos)  )  
                     Rows Removed by Filter: 555433  
 Planning time: 0.130 ms  
 Execution time: 683.011 ms  
(10 rows)  

优化方法

首先我们知道空间包含,实际上是通过类似r-tree来从大到小进行收敛的,索引结构如下:

《PostgreSQL 黑科技 - 空间聚集存储, 内窥GIN, GiST, SP-GiST索引》

《通过空间思想理解GiST索引的构造》

因此,对于“平面或3D空间点”类型,我们可以利用x,y,z来收集BOUND BOX的边界统计信息,使用表达式索引即可得到。

下面的方法适合POINT geometry类型。

1、对point类型,创建x,y表达式索引

create index idx_test_2 on test (st_x(pos));  
create index idx_test_3 on test (st_y(pos));  

2、收集统计信息

vacuum analyze test;  

3、现在,表达式索引的统计信息有了(边界,柱状图,高频词等).

postgres=# select * from pg_stats where tablename='idx_test_2' ;  
 schemaname | tablename  | attname | inherited | null_frac | avg_width | n_distinct |  most_common_vals  | most_common_freqs |                                                                                                                 
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                        histogram_bounds                                                                                                                                                                                                       
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                               | correlation | most_common_elems | most_common_elem_freqs | e  
lem_count_histogram   
------------+------------+---------+-----------+-----------+-----------+------------+--------------------+-------------------+---------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+------------------------+--  
--------------------  
 public     | idx_test_2 | st_x    | f         |         0 |         8 |  -0.978292 | {100.210125753656} | {6.66667e-05}     | {100.001683048904,100.198292508721,100.394482128322,100.614011939615,100.833150474355,101.039264379069,101.233  
114106581,101.457025809214,101.663421457633,101.865276163444,102.060301089659,102.248072810471,102.453578868881,102.656150059775,102.860928429291,103.057968365029,103.260449869558,103.457426037639,103.632223745808,103.85178135708,104.049  
411769956,104.26340050064,104.474113518372,104.658453594893,104.85337683931,105.054327072576,105.277718435973,105.484521845356,105.703673372045,105.907790735364,106.115896645933,106.314396839589,106.519666947424,106.724081514403,106.9380  
68913296,107.144253859296,107.326071513817,107.518919138238,107.721077715978,107.916374253109,108.123016441241,108.360300129279,108.551258808002,108.714984534308,108.903268156573,109.091228945181,109.277053307742,109.48098176159,109.6832  
09387586,109.853478074074,110.044501451775,110.241627292708,110.45505293645,110.652585113421,110.851763477549,111.036027139053,111.231312695891,111.423934968188,111.620921371505,111.847442938015,112.031054906547,112.234003236517,112.4332  
3944509,112.63793184422,112.834744984284,113.041293732822,113.23037719354,113.432592023164,113.619993180037,113.809020379558,114.014157289639,114.208164261654,114.409972019494,114.612494371831,114.827286116779,115.010247323662,115.208249  
371499,115.397510435432,115.607831152156,115.814645215869,116.007154844701,116.198594048619,116.39090036042,116.600214680657,116.819634130225,117.008039858192,117.199446037412,117.389995325357,117.60263632983,117.81475706026,118.02150100  
4696,118.219726895913,118.425922412425,118.623819025233,118.801201758906,118.997462140396,119.18357164599,119.394188923761,119.607428628951,119.789487998933,119.999485854059} |   0.0060246 |                   |                        |   
(1 row)  
  
postgres=# select * from pg_stats where tablename='idx_test_3' ;  
 schemaname | tablename  | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |                                                                                                                   
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                   histogram_bounds                                                                                                                                                                                                            
                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                               
                                                                                                                                                                        | correlation | most_common_elems | most_common_elem_freqs | elem_cou  
nt_histogram   
------------+------------+---------+-----------+-----------+-----------+------------+------------------+-------------------+-----------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+------------------------+---------  
-------------  
 public     | idx_test_3 | st_y    | f         |         0 |         8 |         -1 |                  |                   | {65.0000545242801,65.0990079157054,65.2072919439524,65.301208673045,65.3929488640279,65.4877291806042,65.5846159  
020439,65.6880339607596,65.7856895681471,65.8818093780428,65.9888388821855,66.0896951518953,66.1984446132556,66.304198782891,66.3969474285841,66.496685273014,66.6056828852743,66.7012783605605,66.7941406555474,66.8967378046364,66.98895718  
90235,67.0855156239122,67.191197341308,67.2994636883959,67.3969272850081,67.4947946099564,67.5993846775964,67.7099830284715,67.8120812214911,67.9088703868911,67.9981751134619,68.099527056329,68.1973828841001,68.2960116351023,68.398535256  
274,68.4888928988948,68.5882083000615,68.6907122470438,68.800939405337,68.9010692993179,69.0081828692928,69.11032628268,69.19522870332,69.2957087606192,69.3950105085969,69.5086302934214,69.6040614042431,69.7045981185511,69.8040427314118,  
69.906362099573,70.0155263161287,70.1146272942424,70.2022811910138,70.2994426619262,70.4040612280369,70.5112449126318,70.6194959674031,70.722281485796,70.822925157845,70.918908463791,71.0138623649254,71.1134587181732,71.2168908445165,71.  
3034919975325,71.4052765490487,71.5016098646447,71.6014923620969,71.6971560986713,71.8042277451605,71.8970008520409,71.9877171749249,72.0928315026686,72.1963384188712,72.2967635607347,72.3947894759476,72.4977402808145,72.5967703945935,72  
.6985206454992,72.7883456554264,72.885032473132,72.9875696543604,73.0857636081055,73.1845043040812,73.280454329215,73.3783990284428,73.4819683339447,73.5874625109136,73.6941803013906,73.7924668611959,73.8872256595641,73.9888873603195,74.  
0806347271428,74.1819418873638,74.2877806117758,74.3848745618016,74.4918680656701,74.5988712925464,74.7014277381822,74.8038458405063,74.9038722459227,74.9999207118526} | -0.00307583 |                   |                        |   
(1 row)  

现在我们改一下SQL,把boundbox放进SQL

select count(*) from test   
where   
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  

改成:

select count(*) from test   
where   
st_x(pos)   
  between  
    st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_y(pos)   
  between  
    st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  

执行计划如下:

在大范围输入时,通过表达式索引条件评估得到的行数比空间编码大于空间评估(硬编码 千分之一)得到的记录数,所以依旧使用了空间索引。

postgres=# explain select count(*) from test   
where   
st_x(pos)   
  between  
    st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_y(pos)   
  between  
    st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
  and  
    st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326)))  
and  
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(120,100)), 4326) , pos);  
                       QUERY PLAN                                                                                                                                        
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12380.60..12380.61 rows=1 width=8)  
   ->  Index Scan using idx_test_1 on test  (cost=0.42..12376.42 rows=1674 width=0)  
         Index Cond: ('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos)  
         Filter: ((st_x(pos) >= '110'::double precision) AND (st_x(pos) <= '120'::double precision) AND (st_y(pos) >= '1'::double precision) AND (st_y(pos) <= '100'::double precision) AND _st_contains('0103000020E610000001000000050000000  000000000805B40000000000000F03F0000000000805B4000000000000059400000000000005E4000000000000059400000000000005E40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos))  
(4 rows)  

在小范围输入时,评估得到的行数比空间编码更少(千分之一),所以使用了非空间索引。

explain select count(*) from test   
where   
st_x(pos)   
  between  
    st_xmin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
  and  
    st_xmax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
and  
st_y(pos)   
  between  
    st_ymin(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
  and  
    st_ymax(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326)))  
and  
st_contains(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) , pos);  
  
   QUERY PLAN                                
----------------------------------------------------------------------------------------------
 Aggregate  (cost=2.92..2.93 rows=1 width=8)  
   ->  Index Scan using idx_test_3 on test  (cost=0.43..2.92 rows=1 width=0)  
         Index Cond: ((st_y(pos) >= '1'::double precision) AND (st_y(pos) <= '2'::double precision))  
         Filter: (('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry ~ pos) AND   
(st_x(pos) >= '110'::double precision) AND (st_x(pos) <= '112'::double precision) AND _st_contains('0103000020E610000001000000050000000000000000805B40000000000000F03F0000000000805B4000000000000000400000000000005C4000000000000000400000000  
000005C40000000000000F03F0000000000805B40000000000000F03F'::geometry, pos))  
(4 rows)  

一些函数

查看geometry的bound box

postgres=# select st_astext(box2d(st_setsrid(st_makebox2d(st_makepoint(110,1), st_makepoint(112,2)), 4326) ));  
                st_astext                   
------------------------------------------  
 POLYGON((110 1,110 2,112 2,112 1,110 1))  
(1 row)  
postgres=# select st_astext(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
                                                                       st_astext                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------  
 POLYGON((220186.995121892 150406,220186.995121892 150506.126829327,220288.248780547 150506.126829327,220288.248780547 150406,220186.995121892 150406))  
(1 row)  

2、查看BOUND BOX的边界点x,y.

postgres=# select st_xmin(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
     st_xmin        
------------------  
 220186.995121892  
(1 row)  
  
postgres=# select st_xmax(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
     st_xmax        
------------------  
 220288.248780547  
(1 row)  
  
postgres=# select st_ymin(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
 st_ymin   
---------  
  150406  
(1 row)  
  
postgres=# select st_ymax(box2d(ST_GeomFromEWKT('CIRCULARSTRING(220268 150415 1,220227 150505 2,220227 150406 3)')));  
     st_ymax        
------------------  
 150506.126829327  
(1 row)  

小结

目前空间类型的bound box没有统计信息,所以空间查询时,选择性使用了硬编码写死,固定了选择性百分比,乘以pg_class.reltuples即可得到评估记录数,但是不准确。

使得PostgreSQL的空间搜索(包含搜索),不管范围多大,都使用索引,而实际上大范围可能使用全表扫描性能更好。

本文增加boundbox的条件,使用xyz边界条件,使得行数评估更加的准确,在适当的时候选择适合的索引。

当然,最好的方法,还是PG内核层面优化空间类型的统计信息,以及选择性代码的支持。

参考

建议,PostgreSQL内核层面改进空间类型的统计信息,点类型,分别统计X,Y轴的统计信息。

《PostgreSQL 自定义函数表达式选择性评估算法 - Statistics, Cardinality, Selectivity, Estimate》

Flag Counter

digoal’s 大量PostgreSQL文章入口