PostgreSQL 10.0 preview 性能增强 - hashed aggregation with grouping sets(多维分析)更快,更省内存

1 minute read

背景

grouping sets 是多维分析语法,PostgreSQL 从9.5开始支持这种语法,常被用于OLAP系统,数据透视等应用场景。

《PostgreSQL 9.5 new feature - Support GROUPING SETS, CUBE and ROLLUP.》

由于多维分析的一个QUERY涉及多个GROUP,所以如果使用hash agg的话,需要多个HASH table,并行计算. 9.5, 9.6的时候,还不支持一个QUERY使用多个HASH TABLE并行计算。

10.0 扩展了聚合NODE,支持hashAggregate并行开多个hashtable,以及MixedAggregate策略用于sort grouping时哈希表的数据倒腾。

使用时对用户完全透明,同时优化器在使用hash agg, multi hashtable,时,会尽量的减少重复SORT。

总而言之,grouping set多维分析会更快(即使包含排序),更省内存。

Support hashed aggregation with grouping sets.    
    
This extends the Aggregate node with two new features:     
HashAggregate can now run multiple hashtables concurrently,     
and a new strategy MixedAggregate populates hashtables while doing sorted grouping.    
    
The planner will now attempt to save as many sorts as possible when    
planning grouping sets queries, while not exceeding work_mem for the    
estimated combined sizes of all hashtables used.  No SQL-level changes    
are required.  There should be no user-visible impact other than the    
new EXPLAIN output and possible changes to result ordering when ORDER    
BY was not used (which affected a few regression tests).  The    
enable_hashagg option is respected.    
    
Author: Andrew Gierth    
Reviewers: Mark Dilger, Andres Freund    
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk    
    

例子

+explain (costs off) select a, b, grouping(a,b), sum(v), count(*), max(v)        
+  from gstest1 group by grouping sets ((a),(b)) order by 3,1,2;        
+                                               QUERY PLAN                                                       
+--------------------------------------------------------------------------------------------------------        
+ Sort        
+   Sort Key: (GROUPING("*VALUES*".column1, "*VALUES*".column2)), "*VALUES*".column1, "*VALUES*".column2        
+   ->  HashAggregate        
+         Hash Key: "*VALUES*".column1        
+         Hash Key: "*VALUES*".column2        
+         ->  Values Scan on "*VALUES*"        
+(6 rows)       

这个patch的讨论,详见邮件组,本文末尾URL。

PostgreSQL社区的作风非常严谨,一个patch可能在邮件组中讨论几个月甚至几年,根据大家的意见反复的修正,patch合并到master已经非常成熟,所以PostgreSQL的稳定性也是远近闻名的。

参考

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b5635948ab165b6070e7d05d111f966e07570d81

Flag Counter

digoal’s 大量PostgreSQL文章入口