PostgreSQL 11 preview - 分区表智能并行聚合、分组计算(已类似MPP架构,性能暴增)
背景
PostgreSQL 并行计算开始在细节方面进行打磨,例如11已添加了JOIN的分区并行,当两个分区表的分区定义一致时,在分区字段上JOIN就可以用到分区与分区之间直接并行JOIN,而不需要将数据APPEND后在JOIN。
《PostgreSQL 11 preview - 分区表智能并行JOIN (已类似MPP架构,性能暴增)》
现在又一个PATCH要提交了,还是和分区表有关,这次是以分区为单位的聚合、分组计算。
实际上也蛮好理解的,分两种情况,
1、一种是多阶段聚合,原理如下:(常用在MPP数据库,目前PG的单表多阶段并行聚合也是这么做的)
《PostgreSQL 10 自定义并行计算聚合函数的原理与实践 - (含array_agg合并多个数组为单个一元数组的例子)》
2、第二种是智能分区并行聚合,这种情况与第一种不矛盾,但是在非多阶段聚合时也可能能起到优化作用,比如说根据分区字段进行分组聚合,那么分区与分区之间实际上是没有计算集合的重叠的,各自算各自的。
Hi all,
Declarative partitioning is supported in PostgreSQL 10 and work is already
in
progress to support partition-wise joins. Here is a proposal for
partition-wise
aggregation/grouping. Our initial performance measurement has shown 7 times
performance when partitions are on foreign servers and approximately 15%
when
partitions are local.
Partition-wise aggregation/grouping computes aggregates for each partition
separately. If the group clause contains the partition key, all the rows
belonging to a given group come from one partition, thus allowing aggregates
to be computed completely for each partition. Otherwise, partial aggregates
computed for each partition are combined across the partitions to produce
the
final aggregates. This technique improves performance because:
i. When partitions are located on foreign server, we can push down the
aggregate to the foreign server.
ii. If hash table for each partition fits in memory, but that for the whole
relation does not, each partition-wise aggregate can use an in-memory hash
table.
iii. Aggregation at the level of partitions can exploit properties of
partitions like indexes, their storage etc.
Attached an experimental patch for the same based on the partition-wise join
patches posted in [1].
This patch currently implements partition-wise aggregation when group clause
contains the partitioning key. A query below, involving a partitioned table
with 3 partitions containing 1M rows each, producing total 30 groups showed
15% improvement over non-partition-wise aggregation. Same query showed 7
times
improvement when the partitions were located on the foreign servers.
Patch needs a lot of improvement including:
1. Support for partial partition-wise aggregation
2. Estimating number of groups for every partition
3. Estimating cost of partition-wise aggregation based on sample partitions
similar to partition-wise join
and much more.
In order to support partial aggregation on foreign partitions, we need
support
to fetch partially aggregated results from the foreign server. That can be
handled as a separate follow-on patch.
Though is lot of work to be done, I would like to get suggestions/opinions
from
hackers.
I would like to thank Ashutosh Bapat for providing a draft patch and helping
me off-list on this feature while he is busy working on partition-wise join
feature.
[1]
https://www.postgresql.org/message-id/CAFjFpRcbY2QN3cfeMTzVEoyF5Lfku-ijyNR%3DPbXj1e%3D9a%3DqMoQ%40mail.gmail.com
Thanks
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
这里有两个例子:
开启分区智能聚合,各个分区聚合后再APPEND结果。
Here is the sample plan:
postgres=# set enable_partition_wise_agg to true;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------
Append (cost=5100.00..61518.90 rows=30 width=12) (actual
time=324.837..944.804 rows=30 loops=1)
-> Foreign Scan (cost=5100.00..20506.30 rows=10 width=12) (actual
time=324.837..324.838 rows=10 loops=1)
Relations: Aggregate on (public.fplt1_p1 plt1)
-> Foreign Scan (cost=5100.00..20506.30 rows=10 width=12) (actual
time=309.954..309.956 rows=10 loops=1)
Relations: Aggregate on (public.fplt1_p2 plt1)
-> Foreign Scan (cost=5100.00..20506.30 rows=10 width=12) (actual
time=310.002..310.004 rows=10 loops=1)
Relations: Aggregate on (public.fplt1_p3 plt1)
Planning time: 0.370 ms
Execution time: 945.384 ms
(9 rows)
关闭分区智能聚合,需要将数据汇聚后,再进行聚合操作。
postgres=# set enable_partition_wise_agg to false;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=121518.01..121518.31 rows=30 width=12) (actual
time=6498.452..6498.459 rows=30 loops=1)
Group Key: plt1.a
-> Append (cost=0.00..106518.00 rows=3000001 width=4) (actual
time=0.595..5769.592 rows=3000000 loops=1)
-> Seq Scan on plt1 (cost=0.00..0.00 rows=1 width=4) (actual
time=0.007..0.007 rows=0 loops=1)
-> Foreign Scan on fplt1_p1 (cost=100.00..35506.00 rows=1000000
width=4) (actual time=0.587..1844.506 rows=1000000 loops=1)
-> Foreign Scan on fplt1_p2 (cost=100.00..35506.00 rows=1000000
width=4) (actual time=0.384..1839.633 rows=1000000 loops=1)
-> Foreign Scan on fplt1_p3 (cost=100.00..35506.00 rows=1000000
width=4) (actual time=0.402..1876.505 rows=1000000 loops=1)
Planning time: 0.251 ms
Execution time: 6499.018 ms
(9 rows)
场景
1、枚举、哈希分区表:
因为在不同分区中,分区字段值非常鲜明分离。可以按分区字段聚合,按分区字段GROUP BY。
2、范围分区表:
某些按分区字段GROUP BY(看GROUP BY的表达式是否有优化空间)
3、继承表,根据子表约束,选择是否可以使用分区智能聚合、GROUP。
参考
https://www.postgresql.org/message-id/flat/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com#CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
https://commitfest.postgresql.org/17/1250/