PostgreSQL 并行写入堆表，如何保证时序线性存储 - BRIN索引优化

5 minute read

背景

BRIN索引的原理很简单，每个数据块（或者每一段连续的数据块）记录了其中的元数据（最大值，最小值，平均值，COUNT，AVG，NULL值个数等），当每个数据块存储的数据范围错开（边界清晰）时，块级索引的效率就非常高。

例如测试表

create table test(id int, info text);  
  
-- 对ID创建BRIN索引  
create index idx_test_id on test using brin(id);  

ID边界清晰的例子：

数据块1：ID值范围 1-100  
  
数据块2：ID值范围 101-200  
  
数据块3：ID值范围 301-300  
  
......

ID边界模糊的例子：

数据块1：ID值范围 1-30 , 60-100  
  
数据块2：ID值范围 31-49, 101-130  
  
数据块3：ID值范围 49-59, 130-159, 160-200  
  
......

边界清晰，指不同的数据块没有交集，或者少量交集，而边界模糊，指数据块的交集很多，数据存储散乱。

为什么边界模糊性能不好呢？

例如

select * from test where id between 1 and 100;

这条SQL，在边界清晰时，搜索1号数据块即可。

而在边界模糊时，需要搜索1,2,3个数据块。

在实际业务中，我们可能会使用自增序列，作为时序字段，当串行（单进程）写入时，序列值和行号（堆表物理存储）线性相关，而当并行写入时，情况可能会变得糟糕（仅仅当使用了大步调时）。

怎么优化呢？

自增序列，并行写入的问题

1、串行写入

create table test(id serial8, info text);  
  
insert into test (info) select 'test' from generate_series(1,1000000);  

线性相关性 = 1

postgres=# analyze test;  
ANALYZE  
  
postgres=# select correlation from pg_stats where tablename='test' and attname='id';  
 correlation   
-------------  
           1  
(1 row)  

BRIN性能

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where id between 1000 and 10000;  
                                                        QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.test  (cost=6.85..5761.69 rows=9264 width=13) (actual time=0.134..2.703 rows=9001 loops=1)  
   Output: id, info  
   Recheck Cond: ((test.id >= 1000) AND (test.id <= 10000))  
   Rows Removed by Index Recheck: 14679  
   Heap Blocks: lossy=128  
   Buffers: shared hit=130   
   ->  Bitmap Index Scan on idx_test_id  (cost=0.00..4.53 rows=23256 width=0) (actual time=0.026..0.026 rows=1280 loops=1)  
         Index Cond: ((test.id >= 1000) AND (test.id <= 10000))  
         Buffers: shared hit=2  
 Planning time: 0.085 ms  
 Execution time: 3.188 ms  
(11 rows)  

2、并行写入

postgres=# drop table test;  
DROP TABLE  
postgres=# create table test(id serial8, info text);  
CREATE TABLE  
  
vi test.sql  
insert into test (info) values ('test');  
  
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 400 -j 400 -t 2500  

线性相关性，略差，但依旧很好

postgres=# create index idx_test_id on test using brin(id);  
CREATE INDEX  
postgres=# analyze test;  
ANALYZE  
postgres=# select correlation from pg_stats where tablename='test' and attname='id';  
 correlation   
-------------  
    0.998394  
(1 row)  
  
-- 偏差  
  
postgres=# select *,id-rn from (select ctid,id,row_number() over(order by ctid) rn from test) t where id<>rn;  
    ctid    |   id    |   rn    | ?column?   
------------+---------+---------+----------  
 (0,3)      |       4 |       3 |        1  
 (0,4)      |       3 |       4 |       -1  
 (0,28)     |      29 |      28 |        1  
 (0,29)     |      28 |      29 |       -1  
 (0,39)     |      40 |      39 |        1  
 (0,40)     |      39 |      40 |       -1  
 (0,66)     |      67 |      66 |        1  
 (0,67)     |      66 |      67 |       -1  
 (0,110)    |     111 |     110 |        1  
 (0,111)    |     110 |     111 |       -1  
 (1,1)      |     189 |     186 |        3  
 (1,2)      |     190 |     187 |        3  
 (1,3)      |     191 |     188 |        3  
 (1,4)      |     192 |     189 |        3  
 (1,5)      |     193 |     190 |        3  
 (1,6)      |     194 |     191 |        3  
 (1,7)      |     195 |     192 |        3  
 (1,8)      |     196 |     193 |        3  
 (1,9)      |     197 |     194 |        3  
 (1,10)     |     198 |     195 |        3  
 (1,11)     |     199 |     196 |        3  
 (1,12)     |     200 |     197 |        3  
 (1,13)     |     201 |     198 |        3  
 (1,14)     |     202 |     199 |        3  
 (1,15)     |     203 |     200 |        3  
 (1,16)     |     204 |     201 |        3  
 (1,17)     |     205 |     202 |        3  
 (1,18)     |     206 |     203 |        3  
 (1,19)     |     207 |     204 |        3  
 (1,20)     |     209 |     205 |        4  
 (1,21)     |     210 |     206 |        4  
 (1,22)     |     212 |     207 |        5  
 (1,23)     |     220 |     208 |       12  
 (1,24)     |     223 |     209 |       14  
 (1,25)     |     226 |     210 |       16  
 (1,26)     |     229 |     211 |       18  

BRIN性能

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where id between 1000 and 10000;  
                                                        QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.test  (cost=6.83..5952.29 rows=9204 width=13) (actual time=0.073..2.687 rows=9001 loops=1)  
   Output: id, info  
   Recheck Cond: ((test.id >= 1000) AND (test.id <= 10000))  
   Rows Removed by Index Recheck: 14679  
   Heap Blocks: lossy=128  
   Buffers: shared hit=130  
   ->  Bitmap Index Scan on idx_test_id  (cost=0.00..4.53 rows=22764 width=0) (actual time=0.033..0.033 rows=1280 loops=1)  
         Index Cond: ((test.id >= 1000) AND (test.id <= 10000))  
         Buffers: shared hit=2  
 Planning time: 0.086 ms  
 Execution time: 3.185 ms  
(11 rows)  

3、并行写入，放大序列步调，散列问题出现

postgres=# drop table test;  
DROP TABLE  
postgres=# create table test(id serial8, info text);  
CREATE TABLE  
postgres=# alter sequence test_id_seq cache 10000;  
ALTER SEQUENCE  
  
vi test.sql  
insert into test (info) values ('test');  
  
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 400 -j 400 -t 2500  

线性相关性，直线下降

postgres=# create index idx_test_id on test using brin(id);  
CREATE INDEX  
postgres=# analyze test;  
ANALYZE  
postgres=# select correlation from pg_stats where tablename='test' and attname='id';  
 correlation   
-------------  
    0.571033  
(1 row)  
  
-- 偏差非常明显  
  
postgres=# select *,id-rn from (select ctid,id,row_number() over(order by ctid) rn from test) t where id<>rn;  
    ctid    |   id    |   rn    | ?column?   
------------+---------+---------+----------  
 (0,6)      |   10001 |       6 |     9995  
 (0,7)      |       6 |       7 |       -1  
 (0,8)      |       7 |       8 |       -1  
 (0,9)      |   10002 |       9 |     9993  
 (0,10)     |   20001 |      10 |    19991  
 (0,11)     |       8 |      11 |       -3  
 (0,12)     |   10003 |      12 |     9991  
 (0,13)     |       9 |      13 |       -4  
 (0,14)     |   20002 |      14 |    19988  
 (0,15)     |   10004 |      15 |     9989  
 (0,16)     |      10 |      16 |       -6  
 (0,17)     |   20003 |      17 |    19986  
 (0,18)     |   10005 |      18 |     9987  
 (0,19)     |      11 |      19 |       -8  
 (0,20)     |   20004 |      20 |    19984  
 (0,21)     |   10006 |      21 |     9985  
 (0,22)     |      12 |      22 |      -10  
 (0,23)     |   20005 |      23 |    19982  
 (0,24)     |   10007 |      24 |     9983  
 (0,25)     |      13 |      25 |      -12  
 (0,26)     |   20006 |      26 |    19980  
 (0,27)     |      14 |      27 |      -13  
 (0,28)     |   20007 |      28 |    19979  
 (0,29)     |   10008 |      29 |     9979  
 (0,30)     |      15 |      30 |      -15  
 (0,31)     |   20008 |      31 |    19977  
 (0,32)     |   10009 |      32 |     9977  
 (0,33)     |   20009 |      33 |    19976  
 (0,34)     |      16 |      34 |      -18  
 (0,35)     |   30001 |      35 |    29966  
 (0,36)     |   20010 |      36 |    19974  
 (0,37)     |   10010 |      37 |     9973  
 (0,38)     |      17 |      38 |      -21  
 (0,39)     |   20011 |      39 |    19972  
 (0,40)     |   20012 |      40 |    19972  
 (0,41)     |   10011 |      41 |     9970  
 (0,42)     |      18 |      42 |      -24  
 (0,43)     |   30002 |      43 |    29959  
 (0,44)     |   20013 |      44 |    19969  
 (0,45)     |      19 |      45 |      -26  
 (0,46)     |   10012 |      46 |     9966  
 (0,47)     |   30003 |      47 |    29956  
 (0,48)     |   20014 |      48 |    19966  
 (0,49)     |      20 |      49 |      -29  
 (0,50)     |   30004 |      50 |    29954  
 (0,51)     |   10013 |      51 |     9962  
 (0,52)     |   20015 |      52 |    19963  
 (0,53)     |      21 |      53 |      -32  
 (0,54)     |   30005 |      54 |    29951  
 (0,55)     |   10014 |      55 |     9959  
 (0,56)     |   20016 |      56 |    19960  
 (0,57)     |      22 |      57 |      -35  
 (0,58)     |   20017 |      58 |    19959  

BRIN性能

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where id between 1000 and 10000;  
                                                         QUERY PLAN                                                           
----------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.test  (cost=5.11..6244.85 rows=2222 width=13) (actual time=7.104..43.755 rows=1501 loops=1)  
   Output: id, info  
   Recheck Cond: ((test.id >= 1000) AND (test.id <= 10000))  
   Rows Removed by Index Recheck: 448419  
   Heap Blocks: lossy=2432  
   Buffers: shared hit=2440  -- 散列，BRIN索引混淆了，导致扫描了大量的数据块。  
   ->  Bitmap Index Scan on idx_test_id  (cost=0.00..4.56 rows=38916 width=0) (actual time=0.071..0.071 rows=24320 loops=1)  
         Index Cond: ((test.id >= 1000) AND (test.id <= 10000))  
         Buffers: shared hit=8  
 Planning time: 0.128 ms  
 Execution time: 43.867 ms  
(11 rows)  

并发时序线性优化

1、使用单步序列。单步序列不会因为并发增加而导致离散度增加。

2、使用系统时间字段，系统时间与单步序列效果一样，即使并发写入，也不会导致离散度增加。

使用连续自增值的字段，创建BRIN索引就是很靠谱的。

digoal’s 大量PostgreSQL文章入口

Twitter Facebook Google+ LinkedIn

Digoal.zhou

PostgreSQL 并行写入堆表，如何保证时序线性存储 - BRIN索引优化

背景

自增序列，并行写入的问题

并发时序线性优化

digoal’s 大量PostgreSQL文章入口

You May Also Enjoy

PostgreSQL(PPAS 兼容Oracle) 从零开始入门手册 - 珍藏版

PostgreSQL pipelinedb 流计算插件 - IoT应用 - 实时轨迹聚合

PostgreSQL plpgsql 存储过程、函数 - 状态、异常变量打印、异常捕获… - GET [STACKED] DIAGNOSTICS

PostgreSQL datediff 日期间隔（单位转换）兼容SQL用法