PostgreSQL 单表并行bulkload的extend file lock 冲突问题解决 - 数据块预分配

4 minute read

背景

PostgreSQL的数据文件是一个一个BLOCK extend的，当并行的会话插入同一个表，并且都遇到free space map反馈无BLOCK可插时，需要extend数据文件，有两种做法：

1、老的版本，只有一个会话可以EXTEND FILE，其他会话等它EXTEND结束后继续EXTEND。

《PostgreSQL bulk COPY load Bottleneck by extend lock waiting》

2、较新版本的PG，同样只有一个会话可以EXTEND FILE，但是如果其他会话也需要扩展同一个表的FILE时，会一次性扩展多个BLOCK，降低冲突。

[《parallel blocking

waiting by slow BLOCK extend relation , ExclusiveLock on extension of relation》](../201505/20150511_01.md)

由于EXTEND FILE改变了FILE的INODE信息，需要FSYNC操作。所以扩展数据块是比较耗时的操作。

前面两篇文章介绍了扩展数据块的冲突带来的性能问题。同时提到了内核层面的解决方法，在内核层面没有优化时，有什么方法呢？

预扩展数据块

当我们已知某个表将要导入大量数据时，可以先扩展好数据块，等高并发导入时，就不会有extend file的冲突了。

非预加载

1、创建预加载表

postgres=# create unlogged table tbl_pre(id int, info text, crt_time timestamp);  
CREATE TABLE  

2、创建数据源表

postgres=# create unlogged table tbl_content(id int, info text, crt_time timestamp);  
CREATE TABLE  

3、写入1000万记录

postgres=# insert into tbl_content select generate_series(1,10000000), repeat(md5(random()::text), 10), now();  
INSERT 0 10000000  

4、当不使用预加载时，写入1000万数据，耗时12秒。

postgres=# insert into tbl_pre select * from tbl_content ;  
INSERT 0 10000000  
Time: 11921.491 ms (00:11.921)  

预加载

1、往预加载表写入一条无效记录

postgres=# insert into tbl_pre values (-1);  
INSERT 0 1  
Time: 0.358 ms  

2、删除无效记录以外的其他记录。

postgres=# delete from tbl_pre where id<>-1;  
DELETE 10000000  
Time: 3417.401 ms (00:03.417)  

3、垃圾回收，由于无效记录在FILE的末尾BLOCK，所以这个表垃圾回收后，所有的BLOCK都会被保留，用于下次写入时，就不需要EXTEND BLOCK了。

postgres=# vacuum tbl_pre;  
VACUUM  
Time: 1221.166 ms (00:01.221)  

4、往预加载表写入数据，现在耗时5秒。降低了7秒。

postgres=# insert into tbl_pre select * from tbl_content ;  
INSERT 0 10000000  
Time: 4941.610 ms (00:04.942)  

如果是并行加载，效果会更加明显。

并行加载，非预分配模式

1、清除预分配表，清空到0个BLOCK，接下来的INSERT将需要EXTEND BLOCK。

postgres=# truncate tbl_pre ;  
TRUNCATE TABLE  

2、新建一个源数据表，写入100万记录

postgres=# create unlogged table tbl_content1(id int, info text, crt_time timestamp);  
CREATE TABLE  
postgres=# insert into tbl_content1 select generate_series(1,1000000), repeat(md5(random()::text), 10), now();  
INSERT 0 1000000  

3、新建一个脚本，往预分配表写入100万记录

vi test.sql  
  
insert into tbl_pre select * from tbl_content1;  

4、开启56个并发，每个并发往预分配表写入100万记录，一共写入5.6千万记录。耗时41.5秒。

pgbench -M prepared -n -r  -f ./test.sql -c 56 -j 56 -t 1  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 56  
number of threads: 56  
number of transactions per client: 1  
number of transactions actually processed: 56/56  
latency average = 42696.543 ms  
tps = 1.311582 (including connections establishing)  
tps = 1.311711 (excluding connections establishing)  
statement latencies in milliseconds:  
     41534.443  insert into tbl_pre select * from tbl_content1;  

等待严重

iotop  
  
Total DISK READ :      15.43 K/s | Total DISK WRITE :     402.79 M/s  
Actual DISK READ:      15.43 K/s | Actual DISK WRITE:     590.21 M/s  
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                                         
be/4 root        0.00 B/s    0.00 B/s  0.00 % 49.34 % [kworker/u112:1]  
be/4 digoal      3.86 K/s   10.88 M/s  0.00 %  8.22 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      3.86 K/s   15.46 M/s  0.00 %  4.72 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      3.86 K/s   15.46 M/s  0.00 %  0.67 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      3.86 K/s   12.69 M/s  0.00 %  0.57 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s   15.46 M/s  0.00 %  0.23 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.22 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.22 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    6.48 M/s  0.00 %  0.21 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.20 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.20 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.20 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.20 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.20 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.20 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.20 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.20 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.19 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s   10.28 M/s  0.00 %  0.19 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.19 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.18 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.18 % postgres: postgres postgres [local] INSERT waiting  
be/4 digoal      0.00 B/s    0.00 B/s  0.00 %  0.18 % postgres: postgres postgres [local] INSERT waiting  

5、导入时，查看等待事件，几乎全部都是extend file的等待

postgres=# select backend_type,wait_event_type,wait_event,count(*) from pg_stat_activity group by 1,2,3;  
         backend_type         | wait_event_type |     wait_event      | count   
------------------------------+-----------------+---------------------+-------  
 logical replication launcher | Activity        | LogicalLauncherMain |     1  
 walwriter                    | Activity        | WalWriterMain       |     1  
 autovacuum launcher          | Activity        | AutoVacuumMain      |     1  
 background writer            | Activity        | BgWriterMain        |     1  
 client backend               | IO              | DataFileExtend      |     1  
 checkpointer                 | Activity        | CheckpointerMain    |     1  
 client backend               | Lock            | extend              |    53  
 client backend               |                 |                     |     3  
(8 rows)  

并行加载，预分配模式

1、往预加载表，末尾追加一条无效记录。

postgres=# insert into tbl_pre values (-1);  
INSERT 0 1  

2、删除无效记录以外的所有记录。

postgres=# delete from tbl_pre where id<>-1;  
DELETE 56000000  

3、垃圾回收，由于无效记录在FILE的末尾BLOCK，所以这个表垃圾回收后，所有的BLOCK都会被保留，用于下次写入时，就不需要EXTEND BLOCK了。

postgres=# vacuum tbl_pre;

4、并行加载压测，现在耗时1.5秒。相比未预分配BLOCK时，节约了40秒时间。

pgbench -M prepared -n -r  -f ./test.sql -c 56 -j 56 -t 1  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 56  
number of threads: 56  
number of transactions per client: 1  
number of transactions actually processed: 56/56  
latency average = 1613.094 ms  
tps = 34.715898 (including connections establishing)  
tps = 34.835685 (excluding connections establishing)  
statement latencies in milliseconds:  
      1529.876  insert into tbl_pre select * from tbl_content1;  

postgres=# select count(*) from tbl_pre;  
  count     
----------  
 56000001  
(1 row)  

小结

当需要大批量，高并发导入数据到单个表里面时，extend file的冲突会成为写入的瓶颈。

建议可取的方法如下：

1、像本文一样，提前预分配好空间，避免并行加载时，extend file。特别适合每天都有定时的导入任务的情况，因为可以计划。(单个并发导入1000万数据从12秒降低到了5秒。56个并发导入5.6千万数据，从41.5秒降低到了1.5秒)

2、拆成分区表，并行的往分区表写入，这样可以降低EXTEND FILE的冲突，因为有多个分区的话，就对应有多个FILE。

方法1解决了EXTEND FILE的锁冲突问题，但是当表上面有索引时，索引的LOCK问题依旧存在。而方法2，解决了两个问题，但是引入了另一个问题，如果要查询全局数据，在走索引扫描时，性能可能不如单表（因为多个分区，导致索引也是分区索引）。

参考

《PostgreSQL bulk COPY load Bottleneck by extend lock waiting》

[《parallel blocking

waiting by slow BLOCK extend relation , ExclusiveLock on extension of relation》](../201505/20150511_01.md)

digoal’s 大量PostgreSQL文章入口

Twitter Facebook Google+ LinkedIn

Digoal.zhou

PostgreSQL 单表并行bulkload的extend file lock 冲突问题解决 - 数据块预分配

背景

预扩展数据块

非预加载

预加载

并行加载，非预分配模式

并行加载，预分配模式

小结

参考

digoal’s 大量PostgreSQL文章入口

You May Also Enjoy

PostgreSQL(PPAS 兼容Oracle) 从零开始入门手册 - 珍藏版

PostgreSQL pipelinedb 流计算插件 - IoT应用 - 实时轨迹聚合

PostgreSQL plpgsql 存储过程、函数 - 状态、异常变量打印、异常捕获… - GET [STACKED] DIAGNOSTICS

PostgreSQL datediff 日期间隔（单位转换）兼容SQL用法