Greenplum 海量数据,大宽表 行存 VS 列存
背景
GPDB支持行存列存,而且相关的选择原理我们之前的文章也有很详细的分析,同时在性能方面也做过较多的评测对比。
但是在大宽表上,相差到底有多悬殊我们之前没有对其进行过对比。主要的差异来自deform和QUERY扫描的存储空间(理论上,数据量越大,同时访问的列越少,列存的优势越明显。或者是访问到的列越靠后,优势越明显)。
一些原理参考
《PostgreSQL VOPS 向量计算 + DBLINK异步并行 - 单实例 10亿 聚合计算跑进2秒》
《PostgreSQL 向量化执行插件(瓦片式实现-vops) 10x提速OLAP》
一些选择参考本文末尾的链接文章。
测试
1、创建一个函数,用于创建1000个列的表(每列类型INT8)
create or replace function crt_tbl(name, text, text) returns void as $$
declare
sql text;
begin
sql := 'create table '||$1||'(';
for i in 1..1000 loop
sql := sql||'c'||i||' int8 default random()*10000,';
end loop;
sql := rtrim(sql, ',');
sql := sql||') with (APPENDONLY=true, ORIENTATION='||$2||', COMPRESSTYPE='||$3||')';
execute sql;
end;
$$ language plpgsql strict;
2、创建行存表
select crt_tbl('tbl_1000_row_nonc', 'row', 'none');
select crt_tbl('tbl_1000_row_c', 'row', 'zlib');
3、创建列存表
select crt_tbl('tbl_1000_column_nonc', 'column', 'none');
select crt_tbl('tbl_1000_column_c', 'column', 'zlib');
4、写入100万行测试数据
postgres=# insert into tbl_1000_row_nonc (c1) select generate_series(1,1000000);
INSERT 0 1000000
Time: 221003.467 ms
postgres=# insert into tbl_1000_row_c select * from tbl_1000_row_nonc;
INSERT 0 1000000
Time: 12298.931 ms
postgres=# insert into tbl_1000_column_nonc select * from tbl_1000_row_nonc;
INSERT 0 1000000
Time: 23332.039 ms
postgres=# insert into tbl_1000_column_c select * from tbl_1000_row_nonc;
INSERT 0 1000000
Time: 17017.119 ms
5、空间对比
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_row_nonc'));
pg_size_pretty
----------------
7653 MB
(1 row)
Time: 282.180 ms
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_column_nonc'));
pg_size_pretty
----------------
7647 MB
(1 row)
Time: 55.315 ms
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_row_c'));
pg_size_pretty
----------------
2522 MB
(1 row)
Time: 56.017 ms
postgres=# select pg_size_pretty(pg_total_relation_size('tbl_1000_column_c'));
pg_size_pretty
----------------
2520 MB
(1 row)
Time: 55.557 ms
6、第一列查询对比
postgres=# explain analyze select c1,count(*) from tbl_1000_row_nonc group by c1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice1; segments: 48) (cost=259714.00..272214.00 rows=1000000 width=16)
Rows out: 1000000 rows at destination with 352 ms to end, start offset by 1.483 ms.
-> HashAggregate (cost=259714.00..272214.00 rows=20834 width=16)
Group By: c1
Rows out: Avg 20833.3 rows x 48 workers. Max 20854 rows (seg42) with 0.005 ms to first row, 101 ms to end, start offset by 63 ms.
-> Append-only Scan on tbl_1000_row_nonc (cost=0.00..254714.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 19 ms to end, start offset by 82 ms.
Slice statistics:
(slice0) Executor memory: 405K bytes.
(slice1) Executor memory: 500K bytes avg x 48 workers, 500K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 353.793 ms
(15 rows)
postgres=# explain analyze select c1,count(*) from tbl_1000_row_c group by c1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice1; segments: 48) (cost=95537.00..108037.00 rows=1000000 width=16)
Rows out: 1000000 rows at destination with 1128 ms to end, start offset by 1.360 ms.
-> HashAggregate (cost=95537.00..108037.00 rows=20834 width=16)
Group By: c1
Rows out: Avg 20833.3 rows x 48 workers. Max 20854 rows (seg42) with 0.003 ms to first row, 699 ms to end, start offset by 42 ms.
-> Append-only Scan on tbl_1000_row_c (cost=0.00..90537.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 13 ms to end, start offset by 42 ms.
Slice statistics:
(slice0) Executor memory: 405K bytes.
(slice1) Executor memory: 500K bytes avg x 48 workers, 500K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 1130.006 ms
(15 rows)
postgres=# explain analyze select c1,count(*) from tbl_1000_column_nonc group by c1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice1; segments: 48) (cost=259518.00..272018.00 rows=1000000 width=16)
Rows out: 1000000 rows at destination with 245 ms to end, start offset by 1.365 ms.
-> HashAggregate (cost=259518.00..272018.00 rows=20834 width=16)
Group By: c1
Rows out: Avg 20833.3 rows x 48 workers. Max 20854 rows (seg42) with 0.005 ms to first row, 9.968 ms to end, start offset by 60 ms.
-> Append-only Columnar Scan on tbl_1000_column_nonc (cost=0.00..254518.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 16 ms to end, start offset by 60 ms.
Slice statistics:
(slice0) Executor memory: 405K bytes.
(slice1) Executor memory: 821K bytes avg x 48 workers, 821K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 246.967 ms
(15 rows)
postgres=# explain analyze select c1,count(*) from tbl_1000_column_c group by c1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice1; segments: 48) (cost=259518.00..272018.00 rows=1000000 width=16)
Rows out: 1000000 rows at destination with 249 ms to end, start offset by 1.450 ms.
-> HashAggregate (cost=259518.00..272018.00 rows=20834 width=16)
Group By: c1
Rows out: Avg 20833.3 rows x 48 workers. Max 20854 rows (seg42) with 0.004 ms to first row, 8.861 ms to end, start offset by 46 ms.
-> Append-only Columnar Scan on tbl_1000_column_c (cost=0.00..254518.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 16 ms to end, start offset by 45 ms.
Slice statistics:
(slice0) Executor memory: 405K bytes.
(slice1) Executor memory: 853K bytes avg x 48 workers, 853K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 250.583 ms
(15 rows)
7、最后一列查询对比
postgres=# explain analyze select c1000,count(*) from tbl_1000_row_nonc group by c1000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=260048.04..260167.34 rows=9544 width=16)
Rows out: 10001 rows at destination with 296 ms to end, start offset by 1.806 ms.
-> HashAggregate (cost=260048.04..260167.34 rows=199 width=16)
Group By: tbl_1000_row_nonc.c1000
Rows out: Avg 208.4 rows x 48 workers. Max 223 rows (seg17) with 0.002 ms to first row, 133 ms to end, start offset by 21 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=259714.00..259904.88 rows=199 width=16)
Hash Key: tbl_1000_row_nonc.c1000
Rows out: Avg 8749.4 rows x 48 workers at destination. Max 9360 rows (seg46) with 141 ms to end, start offset by 21 ms.
-> HashAggregate (cost=259714.00..259714.00 rows=199 width=16)
Group By: tbl_1000_row_nonc.c1000
Rows out: Avg 8749.4 rows x 48 workers. Max 8814 rows (seg18) with 0.003 ms to first row, 209 ms to end, start offset by 63 ms.
-> Append-only Scan on tbl_1000_row_nonc (cost=0.00..254714.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 16 ms to end, start offset by 64 ms.
Slice statistics:
(slice0) Executor memory: 417K bytes.
(slice1) Executor memory: 1818K bytes avg x 48 workers, 1818K bytes max (seg0).
(slice2) Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 298.222 ms
(22 rows)
postgres=# explain analyze select c1000,count(*) from tbl_1000_row_c group by c1000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=95886.65..96011.53 rows=9990 width=16)
Rows out: 10001 rows at destination with 1244 ms to end, start offset by 1.760 ms.
-> HashAggregate (cost=95886.65..96011.53 rows=209 width=16)
Group By: tbl_1000_row_c.c1000
Rows out: Avg 208.4 rows x 48 workers. Max 223 rows (seg17) with 0.002 ms to first row, 432 ms to end, start offset by 32 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=95537.00..95736.80 rows=209 width=16)
Hash Key: tbl_1000_row_c.c1000
Rows out: Avg 8749.4 rows x 48 workers at destination. Max 9360 rows (seg46) with 755 ms to end, start offset by 57 ms.
-> HashAggregate (cost=95537.00..95537.00 rows=209 width=16)
Group By: tbl_1000_row_c.c1000
Rows out: Avg 8749.4 rows x 48 workers. Max 8814 rows (seg18) with 0.004 ms to first row, 1085 ms to end, start offset by 63 ms.
-> Append-only Scan on tbl_1000_row_c (cost=0.00..90537.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 22 ms to end, start offset by 59 ms.
Slice statistics:
(slice0) Executor memory: 417K bytes.
(slice1) Executor memory: 1818K bytes avg x 48 workers, 1818K bytes max (seg0).
(slice2) Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 1246.276 ms
(22 rows)
postgres=# explain analyze select c1000,count(*) from tbl_1000_column_c group by c1000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=259870.80..259996.80 rows=10080 width=16)
Rows out: 10001 rows at destination with 78 ms to end, start offset by 1.783 ms.
-> HashAggregate (cost=259870.80..259996.80 rows=210 width=16)
Group By: tbl_1000_column_c.c1000
Rows out: Avg 208.4 rows x 48 workers. Max 223 rows (seg17) with 0.004 ms to first row, 24 ms to end, start offset by 26 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=259518.00..259719.60 rows=210 width=16)
Hash Key: tbl_1000_column_c.c1000
Rows out: Avg 8749.4 rows x 48 workers at destination. Max 9360 rows (seg46) with 31 ms to end, start offset by 23 ms.
-> HashAggregate (cost=259518.00..259518.00 rows=210 width=16)
Group By: tbl_1000_column_c.c1000
Rows out: Avg 8749.4 rows x 48 workers. Max 8814 rows (seg18) with 0.004 ms to first row, 5.962 ms to end, start offset by 42 ms.
-> Append-only Columnar Scan on tbl_1000_column_c (cost=0.00..254518.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 28 ms to end, start offset by 30 ms.
Slice statistics:
(slice0) Executor memory: 417K bytes.
(slice1) Executor memory: 1787K bytes avg x 48 workers, 1787K bytes max (seg0).
(slice2) Executor memory: 434K bytes avg x 48 workers, 434K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 80.152 ms
(22 rows)
postgres=# explain analyze select c1000,count(*) from tbl_1000_column_nonc group by c1000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 48:1 (slice2; segments: 48) (cost=259856.48..259977.37 rows=9671 width=16)
Rows out: 10001 rows at destination with 77 ms to end, start offset by 2.019 ms.
-> HashAggregate (cost=259856.48..259977.37 rows=202 width=16)
Group By: tbl_1000_column_nonc.c1000
Rows out: Avg 208.4 rows x 48 workers. Max 223 rows (seg17) with 0.003 ms to first row, 26 ms to end, start offset by 35 ms.
-> Redistribute Motion 48:48 (slice1; segments: 48) (cost=259518.00..259711.42 rows=202 width=16)
Hash Key: tbl_1000_column_nonc.c1000
Rows out: Avg 8749.4 rows x 48 workers at destination. Max 9360 rows (seg46) with 20 ms to end, start offset by 30 ms.
-> HashAggregate (cost=259518.00..259518.00 rows=202 width=16)
Group By: tbl_1000_column_nonc.c1000
Rows out: Avg 8749.4 rows x 48 workers. Max 8814 rows (seg18) with 0.006 ms to first row, 5.476 ms to end, start offset by 25 ms.
-> Append-only Columnar Scan on tbl_1000_column_nonc (cost=0.00..254518.00 rows=20834 width=8)
Rows out: 0 rows (seg0) with 27 ms to end, start offset by 25 ms.
Slice statistics:
(slice0) Executor memory: 417K bytes.
(slice1) Executor memory: 1755K bytes avg x 48 workers, 1755K bytes max (seg0).
(slice2) Executor memory: 1330K bytes avg x 48 workers, 1330K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: enable_bitmapscan=off; enable_seqscan=off; optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 78.976 ms
(22 rows)
9、更新对比
postgres=# update tbl_1000_row_nonc set c1000=c1000+1;
UPDATE 1000000
Time: 7085.645 ms
postgres=# update tbl_1000_row_c set c1000=c1000+1;
UPDATE 1000000
Time: 6734.279 ms
postgres=# update tbl_1000_column_nonc set c1000=c1000+1;
UPDATE 1000000
Time: 13514.749 ms
postgres=# update tbl_1000_column_c set c1000=c1000+1;
UPDATE 1000000
Time: 10629.104 ms
小结
表|空间|查询第一列|查询最后一列|全表更新
—|—|—|—|—
AO行存不压缩 | 7653 MB | 353.793 ms | 298.222 ms | 7085.645 ms
AO行存压缩 | 2522 MB | 1130.006 ms | 1246.276 ms | 6734.279 ms
AO列存不压缩 | 7647 MB | 246.967 ms | 80.152 ms | 13514.749 ms
AO列存压缩 | 2520 MB | 250.583 ms | 78.976 ms | 10629.104 ms
对于大宽表,查询性能显然列存储要好很多(不管是第一列还是最后一列,性能一致),写入和更新性能列存储略差因为需要操作更多的数据文件。
列存第一列和最后一列性能差异是返回记录数导致的,除了第一列,其他列的取值空间都是10000,第一列是100万。
有兴趣可以再测试一些其他参数,比如压缩级别,块大小,HEAP TABLE:
where storage_parameter is:
APPENDONLY={TRUE|FALSE}
BLOCKSIZE={8192-2097152}
ORIENTATION={COLUMN|ROW}
COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
COMPRESSLEVEL={0-9}
CHECKSUM={TRUE|FALSE}
FILLFACTOR={10-100}
OIDS[=TRUE|FALSE]
参考
《Greenplum 行存、列存,堆表、AO表性能对比 - 阿里云HDB for PostgreSQL最佳实践》
《Greenplum 行存、列存,堆表、AO表的原理和选择》
《Greenplum 最佳实践 - 行存与列存的选择以及转换方法》
《PostgreSQL VOPS 向量计算 + DBLINK异步并行 - 单实例 10亿 聚合计算跑进2秒》
《PostgreSQL 向量化执行插件(瓦片式实现-vops) 10x提速OLAP》