PostgreSQL sharding : citus 系列6 - count(distinct xx) 加速 (use 估值插件 hll hyperloglog)
背景
在分布式数据库中,计算count(distinct xxx),需要对distinct 的字段,
1、去重,
2、重分布去重后的数据,(这一步,如果distinct值特别多,那么就会比较耗时)
3、然后再去重,
4、最后count (xxx),
5、求所有节点的count SUM。
例如,以下是Greenplum的执行计划例子
postgres=# explain analyze select count(distinct c_acctbal) from customer;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=182242.41..182242.42 rows=1 width=8)
Rows out: 1 rows with 0.006 ms to first row, 69 ms to end, start offset by 23 ms.
-> Gather Motion 16:1 (slice2; segments: 16) (cost=53392.85..173982.82 rows=660767 width=8)
Rows out: 818834 rows at destination with 3.416 ms to first row, 447 ms to end, start offset by 23 ms.
-> HashAggregate (cost=53392.85..61652.43 rows=41298 width=8)
Group By: customer.c_acctbal
Rows out: Avg 51177.1 rows x 16 workers. Max 51362 rows (seg3) with 0.004 ms to first row, 33 ms to end, start offset by 25 ms.
-> Redistribute Motion 16:16 (slice1; segments: 16) (cost=30266.00..43481.34 rows=41298 width=8)
Hash Key: customer.c_acctbal
Rows out: Avg 89865.6 rows x 16 workers at destination. Max 90305 rows (seg3) with 18 ms to first row, 120 ms to end, start offset by 25 ms.
-> HashAggregate (cost=30266.00..30266.00 rows=41298 width=8)
Group By: customer.c_acctbal
Rows out: Avg 89865.6 rows x 16 workers. Max 89929 rows (seg2) with 0.007 ms to first row, 33 ms to end, start offset by 26 ms.
-> Append-only Columnar Scan on customer (cost=0.00..22766.00 rows=93750 width=8)
Rows out: Avg 93750.0 rows x 16 workers. Max 93751 rows (seg4) with 20 ms to first row, 30 ms to end, start offset by 26 ms.
Slice statistics:
(slice0) Executor memory: 387K bytes.
(slice1) Executor memory: 6527K bytes avg x 16 workers, 6527K bytes max (seg0).
(slice2) Executor memory: 371K bytes avg x 16 workers, 371K bytes max (seg0).
Statement statistics:
Memory used: 1280000K bytes
Optimizer status: legacy query optimizer
Total runtime: 723.143 ms
(23 rows)
以下是citus的例子
postgres=# explain analyze select count(distinct bid) from pgbench_accounts ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=0.00..0.00 rows=0 width=0) (actual time=31.748..31.749 rows=1 loops=1)
-> Custom Scan (Citus Real-Time) (cost=0.00..0.00 rows=0 width=0) (actual time=31.382..31.510 rows=1280 loops=1)
Task Count: 128
Tasks Shown: One of 128
-> Task
Node: host=172.24.211.224 port=1921 dbname=postgres
-> HashAggregate (cost=231.85..231.95 rows=10 width=4) (actual time=3.700..3.702 rows=10 loops=1)
Group Key: bid
-> Seq Scan on pgbench_accounts_106812 pgbench_accounts (cost=0.00..212.48 rows=7748 width=4) (actual time=0.017..2.180 rows=7748 loops=1)
Planning time: 0.445 ms
Execution time: 3.781 ms
Planning time: 1.399 ms
Execution time: 32.159 ms
(13 rows)
对于可估值计算的场景,即不需要精确distinct值的场景,PostgreSQL提供了一个名为hll的插件,可以用来估算distinct元素个数。
citus 结合hll,可以实现超高速的count(distinct xxx),即使distinct值非常非常多,也不慢。
SET citus.count_distinct_error_rate to 0.005;
0.005表示失真度
hll加速citus count(distinct xxx)使用举例
部署
1、所有节点(coordinator 与 worker节点),安装hll软件
yum install -y gcc-c++
cd ~/
git clone https://github.com/citusdata/postgresql-hll
cd postgresql-hll
. /var/lib/pgsql/.bash_profile
USE_PGXS=1 make
USE_PGXS=1 make install
2、所有节点(coordinator 与 worker节点),在需要用到HLL的DB中增加插件
su - postgres -c "psql -d postgres -c 'create extension hll;'"
su - postgres -c "psql -d newdb -c 'create extension hll;'"
使用举例
1、创建测试表,128 shard
create table test (id int primary key, a int, b int, c int);
set citus.shard_count =128;
select create_distributed_table('test', 'id');
2、写入10亿测试数据,a字段10唯一值,b字段100唯一值,c字段100万唯一值
insert into test select id, random()*9, random()*99, random()*999999 from generate_series(1,1000000000) t(id);
3、(coordinator节点)设置全局或当前会话级参数,指定失真度,越小失真度越小
SET citus.count_distinct_error_rate to 0.005;
newdb=# explain select count(distinct bid) from pgbench_accounts group by bid;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=0.00..0.00 rows=0 width=0)
Group Key: remote_scan.worker_column_2
-> Custom Scan (Citus Real-Time) (cost=0.00..0.00 rows=0 width=0)
Task Count: 128
Tasks Shown: One of 128
-> Task
Node: host=172.24.211.224 port=8001 dbname=newdb
-> GroupAggregate (cost=97272.79..105102.29 rows=1000 width=36)
Group Key: bid
-> Sort (cost=97272.79..99227.04 rows=781700 width=4)
Sort Key: bid
-> Seq Scan on pgbench_accounts_102008 pgbench_accounts (cost=0.00..20759.00 rows=781700 width=4)
(12 rows)
4、对比是否使用HLL加速(少量唯一值,HLL没有性能提升,因为本身就不存在瓶颈)
4.1、未使用hll
newdb=# set citus.count_distinct_error_rate to 0;
newdb=# select count(distinct bid) from pgbench_accounts;
count
-------
1000
(1 row)
Time: 423.364 ms
postgres=# set citus.count_distinct_error_rate to 0;
postgres=# select count(distinct a) from test;
count
-------
10
(1 row)
Time: 2392.709 ms (00:02.393)
4.2、使用hll
newdb=# set citus.count_distinct_error_rate to 0.005;
newdb=# select count(distinct bid) from pgbench_accounts;
count
-------
1000
(1 row)
Time: 444.287 ms
postgres=# set citus.count_distinct_error_rate to 0.005;
postgres=# select count(distinct a) from test;
count
-------
10
(1 row)
Time: 2375.473 ms (00:02.375)
5、对比是否使用HLL加速(大量唯一值,HLL性能提升显著)
5.1、未使用hll
postgres=# set citus.count_distinct_error_rate to 0;
count
----------
10000000
(1 row)
Time: 5826241.205 ms (01:37:06.241)
128个节点,每个节点最多发送10亿/128条数据给coordinator,慢是可以理解的。另一方面,coordinator可以边接收边去重(postgresql 11增加了parallel gather, merge sort等能力,citus coordinator可以借鉴),没必要等所有数据都收完再去重。
5.2、使用hll
postgres=# set citus.count_distinct_error_rate to 0.005;
postgres=# select count(distinct (a,c)) from test;
count
---------
9999995
(1 row)
Time: 4468.749 ms (00:04.469)
6、设置不同的精度参数,性能对比
newdb=# set citus.count_distinct_error_rate to 0.1;
newdb=# select count(distinct (aid,bid)) from pgbench_accounts ;
count
----------
94778491
(1 row)
Time: 545.301 ms
newdb=# set citus.count_distinct_error_rate to 0.01;
newdb=# select count(distinct (aid,bid)) from pgbench_accounts ;
count
-----------
100293937
(1 row)
Time: 554.333 ms
-- 推荐设置0.005
newdb=# set citus.count_distinct_error_rate to 0.005;
newdb=# select count(distinct (aid,bid)) from pgbench_accounts ;
count
-----------
100136086
(1 row)
Time: 1053.070 ms (00:01.053)
newdb=# set citus.count_distinct_error_rate to 0.001;
newdb=# select count(distinct (aid,bid)) from pgbench_accounts ;
count
-----------
100422107
(1 row)
Time: 9287.934 ms (00:09.288)
小结
hll是应用广泛的PostgreSQL估值插件。
使用hll,大幅提升了citus count(disinct xxx)的性能(特别当distinct结果集很大时,hll大幅降低了重分布开销,性能提升非常明显(本例1000万唯一值,耗时5826秒降低到了4秒))。
唯一值精度可通过参数citus.count_distinct_error_rate
进行设置。
参考
《Greenplum 最佳实践 - 估值插件hll的使用(以及hll分式聚合函数优化)》
《PostgreSQL hll (HyperLogLog) extension for “State of The Art Cardinality Estimation Algorithm” - 3》
《PostgreSQL hll (HyperLogLog) extension for “State of The Art Cardinality Estimation Algorithm” - 2》
《PostgreSQL hll (HyperLogLog) extension for “State of The Art Cardinality Estimation Algorithm” - 1》
《PostgreSQL count-min sketch top-n 概率计算插件 cms_topn (结合窗口实现同比、环比、滑窗分析等) - 流计算核心功能之一》
https://github.com/citusdata/postgresql-hll
https://github.com/citusdata/postgresql-topn
https://docs.citusdata.com/en/v7.5/develop/reference_sql.html