fast random data query & delete use ctid in postgresql
背景
朋友公司有一个需求, 需要生成一笔随机的唯一数据, 随机取出来使用, 用完下次不能重复使用.
使用PostgreSQL来实现的话, 有以下方法, 性能OK.
创建测试表, 存放一堆唯一值.
postgres=# create table tbl (id int);
CREATE TABLE
唯一值随机插入, 取数据时按照数据块倒序取出, 这么做的好处是vacuum时可以直接回收这部分空间.
postgres=# select generate_series(1,10) order by random();
generate_series
-----------------
1
9
4
7
3
6
8
2
10
5
(10 rows)
postgres=# \timing
Timing is on.
随机的插入数据
postgres=# insert into tbl select generate_series(1,10000000) order by random();
INSERT 0 10000000
Time: 42204.425 ms
从数据来看 , 已经随机插入了.
postgres=# select * from tbl limit 10;
id
---------
9318426
4366165
4661718
8491396
9413591
9845650
8830805
999712
7944907
2487468
(10 rows)
在ctid上创建索引, 取数据时使用这个索引, 倒序从最后的数据块开始取数据.
postgres=# create index idx_tbl_ctid on tbl(ctid);
CREATE INDEX
Time: 18824.496 ms
例如:
postgres=# select ctid,* from tbl order by ctid desc limit 5;
ctid | id
------------+---------
(11001,91) | 1121211
(11001,90) | 1157632
(11001,89) | 3024993
(11001,88) | 6147265
(11001,87) | 3057019
(5 rows)
Time: 0.675 ms
为了防止多个进程重复取数据, 使用这种方法.
postgres=# with t as(select ctid from tbl order by ctid desc limit 5) delete from tbl where ctid in (select ctid from t) returning *;
id
---------
3057019
6147265
3024993
1157632
1121211
(5 rows)
DELETE 5
Time: 1.136 ms
测试并行取数据.
测试方法, 将数据插入另一张表
postgres=# with t as(select ctid from tbl order by ctid desc limit 5),t1 as (delete from tbl where ctid in (select ctid from t) returning *) insert into test select * from t1 ;
INSERT 0 5
Time: 1.225 ms
postgres=# select * from test;
id
---------
9879376
8194987
4095242
4486826
5762760
(5 rows)
Time: 0.400 ms
使用pgbench 测试, 16个并行取数据进程, 每次取5条.
postgres@localhost-> vi test.sql
with t as(select ctid from tbl order by ctid desc limit 5),t1 as (delete from tbl where ctid in (select ctid from t) returning *) insert into test select * from t1;
测试完成后, 查询test表, 看看有没有重复数据就知道这种方法是否靠谱了.
性能见下 :
postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 16 -j 4 -T 30
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 16
number of threads: 4
duration: 30 s
number of transactions actually processed: 225225
tps = 7505.450512 (including connections establishing)
tps = 7508.798021 (excluding connections establishing)
statement latencies in milliseconds:
2.128700 with t as(select ctid from tbl order by ctid desc limit 5),t1 as (delete from tbl where ctid in (select ctid from t) returning *) insert into test select * from t1;
经查没有重复数据, 方法靠谱
postgres=# select count(*),count(distinct id) from test;
count | count
--------+--------
127625 | 127625
(1 row)
使用这种方式, 垃圾回收将回收文件.
postgres=# select relpages from pg_class where relname='tbl';
relpages
----------
10442
(1 row)
postgres=# \q
postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 16 -j 4 -T 30
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 16
number of threads: 4
duration: 30 s
number of transactions actually processed: 221771
tps = 7390.878883 (including connections establishing)
tps = 7393.894430 (excluding connections establishing)
statement latencies in milliseconds:
2.161958 with t as(select ctid from tbl order by ctid desc limit 5) delete from tbl where ctid in (select ctid from t) returning *;
postgres=# vacuum analyze tbl;
VACUUM
postgres=# select relpages from pg_class where relname='tbl';
relpages
----------
10301
(1 row)
使用多个表的话, 性能还有提升.
如下 :
postgres=# create table tbl0(id int);
CREATE TABLE
postgres=# create table tbl1(id int);
CREATE TABLE
postgres=# create table tbl2(id int);
CREATE TABLE
postgres=# create table tbl3(id int);
CREATE TABLE
postgres=# create table tbl4(id int);
CREATE TABLE
postgres=# create table tbl5(id int);
CREATE TABLE
postgres=# create table tbl6(id int);
CREATE TABLE
postgres=# create table tbl7(id int);
CREATE TABLE
postgres=# create table tbl8(id int);
CREATE TABLE
postgres=# create table tbl9(id int);
CREATE TABLE
postgres=# create table tbl10(id int);
CREATE TABLE
postgres=# create table tbl11(id int);
CREATE TABLE
postgres=# create table tbl12(id int);
CREATE TABLE
postgres=# create table tbl13(id int);
CREATE TABLE
postgres=# create table tbl14(id int);
CREATE TABLE
postgres=# create table tbl15(id int);
CREATE TABLE
insert into tbl0 select generate_series(1,1000000) order by random();
insert into tbl1 select generate_series(1000001,2000000) order by random();
insert into tbl2 select generate_series(2000001,3000000) order by random();
insert into tbl3 select generate_series(3000001,4000000) order by random();
insert into tbl4 select generate_series(4000001,5000000) order by random();
insert into tbl5 select generate_series(5000001,6000000) order by random();
insert into tbl6 select generate_series(6000001,7000000) order by random();
insert into tbl7 select generate_series(7000001,8000000) order by random();
insert into tbl8 select generate_series(8000001,9000000) order by random();
insert into tbl9 select generate_series(9000001,10000000) order by random();
insert into tbl10 select generate_series(10000001,11000000) order by random();
insert into tbl11 select generate_series(11000001,12000000) order by random();
insert into tbl12 select generate_series(12000001,13000000) order by random();
insert into tbl13 select generate_series(13000001,14000000) order by random();
insert into tbl14 select generate_series(14000001,15000000) order by random();
insert into tbl15 select generate_series(15000001,16000000) order by random();
create index idx_tbl0_ctid on tbl0(ctid desc);
create index idx_tbl1_ctid on tbl1(ctid desc);
create index idx_tbl2_ctid on tbl2(ctid desc);
create index idx_tbl3_ctid on tbl3(ctid desc);
create index idx_tbl4_ctid on tbl4(ctid desc);
create index idx_tbl5_ctid on tbl5(ctid desc);
create index idx_tbl6_ctid on tbl6(ctid desc);
create index idx_tbl7_ctid on tbl7(ctid desc);
create index idx_tbl8_ctid on tbl8(ctid desc);
create index idx_tbl9_ctid on tbl9(ctid desc);
create index idx_tbl10_ctid on tbl10(ctid desc);
create index idx_tbl11_ctid on tbl11(ctid desc);
create index idx_tbl12_ctid on tbl12(ctid desc);
create index idx_tbl13_ctid on tbl13(ctid desc);
create index idx_tbl14_ctid on tbl14(ctid desc);
create index idx_tbl15_ctid on tbl15(ctid desc);
postgres@localhost-> ll tbl*.sql
-rw-rw-r-- 1 postgres postgres 125 Jan 30 03:28 tbl0.sql
-rw-rw-r-- 1 postgres postgres 127 Jan 30 03:32 tbl10.sql
-rw-rw-r-- 1 postgres postgres 127 Jan 30 03:32 tbl11.sql
-rw-rw-r-- 1 postgres postgres 127 Jan 30 03:32 tbl12.sql
-rw-rw-r-- 1 postgres postgres 127 Jan 30 03:32 tbl13.sql
-rw-rw-r-- 1 postgres postgres 127 Jan 30 03:33 tbl14.sql
-rw-rw-r-- 1 postgres postgres 127 Jan 30 03:33 tbl15.sql
-rw-rw-r-- 1 postgres postgres 125 Jan 30 03:29 tbl1.sql
-rw-rw-r-- 1 postgres postgres 125 Jan 30 03:29 tbl2.sql
-rw-rw-r-- 1 postgres postgres 125 Jan 30 03:32 tbl3.sql
-rw-rw-r-- 1 postgres postgres 125 Jan 30 03:32 tbl4.sql
-rw-rw-r-- 1 postgres postgres 125 Jan 30 03:32 tbl5.sql
-rw-rw-r-- 1 postgres postgres 125 Jan 30 03:32 tbl6.sql
-rw-rw-r-- 1 postgres postgres 125 Jan 30 03:32 tbl7.sql
-rw-rw-r-- 1 postgres postgres 125 Jan 30 03:32 tbl8.sql
-rw-rw-r-- 1 postgres postgres 125 Jan 30 03:32 tbl9.sql
postgres@localhost-> cat tbl*.sql
with t as(select ctid from tbl0 order by ctid desc limit 5) delete from tbl0 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl10 order by ctid desc limit 5) delete from tbl10 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl11 order by ctid desc limit 5) delete from tbl11 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl12 order by ctid desc limit 5) delete from tbl12 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl13 order by ctid desc limit 5) delete from tbl13 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl14 order by ctid desc limit 5) delete from tbl14 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl15 order by ctid desc limit 5) delete from tbl15 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl1 order by ctid desc limit 5) delete from tbl1 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl2 order by ctid desc limit 5) delete from tbl2 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl3 order by ctid desc limit 5) delete from tbl3 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl4 order by ctid desc limit 5) delete from tbl4 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl5 order by ctid desc limit 5) delete from tbl5 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl6 order by ctid desc limit 5) delete from tbl6 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl7 order by ctid desc limit 5) delete from tbl7 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl8 order by ctid desc limit 5) delete from tbl8 where ctid in (select ctid from t) returning *;
with t as(select ctid from tbl9 order by ctid desc limit 5) delete from tbl9 where ctid in (select ctid from t) returning *;
postgres@localhost-> cat bench.sh
#!/bin/bash
pgbench -M prepared -n -r -f ./tbl0.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl1.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl2.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl3.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl4.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl5.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl6.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl7.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl8.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl9.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl10.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl11.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl12.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl13.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl14.sql -c 1 -j 1 -T 30 &
pgbench -M prepared -n -r -f ./tbl15.sql -c 1 -j 1 -T 30 &
postgres@localhost-> . ./bench.sh
postgres@localhost-> jobs
[1] Running pgbench -M prepared -n -r -f ./tbl0.sql -c 1 -j 1 -T 30 &
[2] Running pgbench -M prepared -n -r -f ./tbl1.sql -c 1 -j 1 -T 30 &
[3] Running pgbench -M prepared -n -r -f ./tbl2.sql -c 1 -j 1 -T 30 &
[4] Running pgbench -M prepared -n -r -f ./tbl3.sql -c 1 -j 1 -T 30 &
[5] Running pgbench -M prepared -n -r -f ./tbl4.sql -c 1 -j 1 -T 30 &
[6] Running pgbench -M prepared -n -r -f ./tbl5.sql -c 1 -j 1 -T 30 &
[7] Running pgbench -M prepared -n -r -f ./tbl6.sql -c 1 -j 1 -T 30 &
[8] Running pgbench -M prepared -n -r -f ./tbl7.sql -c 1 -j 1 -T 30 &
[9] Running pgbench -M prepared -n -r -f ./tbl8.sql -c 1 -j 1 -T 30 &
[10] Running pgbench -M prepared -n -r -f ./tbl9.sql -c 1 -j 1 -T 30 &
[11] Running pgbench -M prepared -n -r -f ./tbl10.sql -c 1 -j 1 -T 30 &
[12] Running pgbench -M prepared -n -r -f ./tbl11.sql -c 1 -j 1 -T 30 &
[13] Running pgbench -M prepared -n -r -f ./tbl12.sql -c 1 -j 1 -T 30 &
[14] Running pgbench -M prepared -n -r -f ./tbl13.sql -c 1 -j 1 -T 30 &
[15]- Running pgbench -M prepared -n -r -f ./tbl14.sql -c 1 -j 1 -T 30 &
[16]+ Running pgbench -M prepared -n -r -f ./tbl15.sql -c 1 -j 1 -T 30 &
postgres@localhost-> transaction type: Custom query
transaction type: Custom query
scaling factor: 1
scaling factor: 1
query mode: prepared
query mode: prepared
number of clients: 1
number of clients: 1
number of threads: 1
number of threads: 1
duration: 30 s
duration: 30 s
number of transactions actually processed: 18323
number of transactions actually processed: 18744
tps = 610.744476 (including connections establishing)
tps = 624.776154 (including connections establishing)
tps = 610.882388 (excluding connections establishing)
tps = 624.974076 (excluding connections establishing)
statement latencies in milliseconds:
statement latencies in milliseconds:
1.635022 with t as(select ctid from tbl5 order by ctid desc limit 5) delete from tbl5 where ctid in (select ctid from t) returning *;
1.598122 with t as(select ctid from tbl0 order by ctid desc limit 5) delete from tbl0 where ctid in (select ctid from t) returning *;
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 18660
tps = 621.969606 (including connections establishing)
tps = 622.153610 (excluding connections establishing)
statement latencies in milliseconds:
1.605455 with t as(select ctid from tbl7 order by ctid desc limit 5) delete from tbl7 where ctid in (select ctid from t) returning *;
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 18625
tps = 620.807135 (including connections establishing)
tps = 621.057183 (excluding connections establishing)
statement latencies in milliseconds:
1.608170 with t as(select ctid from tbl3 order by ctid desc limit 5) delete from tbl3 where ctid in (select ctid from t) returning *;
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 30 s
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of transactions actually processed: 18679
tps = 622.613140 (including connections establishing)
tps = 622.763449 (excluding connections establishing)
statement latencies in milliseconds:
1.603809 with t as(select ctid from tbl4 order by ctid desc limit 5) delete from tbl4 where ctid in (select ctid from t) returning *;
duration: 30 s
number of transactions actually processed: 18480
tps = 615.974889 (including connections establishing)
tps = 616.107203 (excluding connections establishing)
statement latencies in milliseconds:
1.621196 with t as(select ctid from tbl11 order by ctid desc limit 5) delete from tbl11 where ctid in (select ctid from t) returning *;
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 18521
tps = 617.330182 (including connections establishing)
tps = 617.443250 (excluding connections establishing)
statement latencies in milliseconds:
1.617626 with t as(select ctid from tbl10 order by ctid desc limit 5) delete from tbl10 where ctid in (select ctid from t) returning *;
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 18366
tps = 612.146172 (including connections establishing)
tps = 612.431254 (excluding connections establishing)
statement latencies in milliseconds:
1.631034 with t as(select ctid from tbl9 order by ctid desc limit 5) delete from tbl9 where ctid in (select ctid from t) returning *;
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 18738
tps = 624.574018 (including connections establishing)
tps = 624.715406 (excluding connections establishing)
statement latencies in milliseconds:
1.598772 with t as(select ctid from tbl6 order by ctid desc limit 5) delete from tbl6 where ctid in (select ctid from t) returning *;
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 18448
tps = 614.888405 (including connections establishing)
tps = 615.075231 (excluding connections establishing)
statement latencies in milliseconds:
1.623808 with t as(select ctid from tbl12 order by ctid desc limit 5) delete from tbl12 where ctid in (select ctid from t) returning *;
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 18875
tps = 629.134497 (including connections establishing)
tps = 629.315227 (excluding connections establishing)
statement latencies in milliseconds:
1.587065 with t as(select ctid from tbl2 order by ctid desc limit 5) delete from tbl2 where ctid in (select ctid from t) returning *;
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 18449
tps = 614.934670 (including connections establishing)
tps = 615.066184 (excluding connections establishing)
statement latencies in milliseconds:
1.623882 with t as(select ctid from tbl8 order by ctid desc limit 5) delete from tbl8 where ctid in (select ctid from t) returning *;
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 18633
tps = 621.077579 (including connections establishing)
tps = 621.244771 (excluding connections establishing)
statement latencies in milliseconds:
1.607780 with t as(select ctid from tbl13 order by ctid desc limit 5) delete from tbl13 where ctid in (select ctid from t) returning *;
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 18573
tps = 619.008531 (including connections establishing)
tps = 619.148479 (excluding connections establishing)
statement latencies in milliseconds:
1.613143 with t as(select ctid from tbl1 order by ctid desc limit 5) delete from tbl1 where ctid in (select ctid from t) returning *;
transaction type: Custom query
scaling factor: 1
query mode: prepared
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 18717
tps = 623.839093 (including connections establishing)
tps = 623.982969 (excluding connections establishing)
statement latencies in milliseconds:
1.600710 with t as(select ctid from tbl14 order by ctid desc limit 5) delete from tbl14 where ctid in (select ctid from t) returning *;
number of clients: 1
number of threads: 1
duration: 30 s
number of transactions actually processed: 18710
tps = 623.627170 (including connections establishing)
tps = 623.762851 (excluding connections establishing)
statement latencies in milliseconds:
1.601275 with t as(select ctid from tbl15 order by ctid desc limit 5) delete from tbl15 where ctid in (select ctid from t) returning *;
TPS可以达到将近1W.
postgres=# select 623*16;
?column?
----------
9968
(1 row)
为了达到更好的吞吐量, 建议程序一次多取一些, 例如一次取20条, 用完再来取, 那么可以减少和数据库的交互次数, 提高整体性能.
PS:
PostgreSQL 支持ctid扫描方法,因此本例可以不需要对ctid建立索引,方法例子请参考
《块级(ctid)扫描在IoT(物联网)极限写和消费读并存场景的应用》
[《在PostgreSQL中实现update | delete limit - CTID扫描实践》](../201608/20160827_01.md) |