PostgreSQL 9.5 new feature - Support GROUPING SETS, CUBE and ROLLUP.

3 minute read

背景

PostgreSQL 9.5 已实现分组集合特性

+T431   Extended grouping capabilities          YES   
+T432   Nested and concatenated GROUPING SETS           YES   
+T433   Multiargument GROUPING function         YES   

用法参考

1. http://www.postgresql.org/docs/devel/static/queries-table-expressions.html#QUERIES-GROUPING-SETS

2. http://www.postgresql.org/docs/devel/static/functions-aggregate.html

回归测试输出

src/test/regress/expected/groupingsets.out

分组集合实现的是将多个分组查询的结果union all的效果。

例如:

=> SELECT * FROM items_sold;  
 brand | size | sales  
-------+------+-------  
 Foo   | L    |  10  
 Foo   | M    |  20  
 Bar   | M    |  15  
 Bar   | L    |  5  
(4 rows)  
  
=> SELECT brand, size, sum(sales) FROM items_sold GROUP BY GROUPING SETS ((brand), (size), ());  
 brand | size | sum  
-------+------+-----  
 Foo   |      |  30  
 Bar   |      |  20  
       | L    |  15  
       | M    |  35  
       |      |  50  
(5 rows)  

其中GROUP BY GROUPING SETS ((brand), (size), ());

相当于以下三个group by的union all:

group by brand  
group by size  
group by ()  

未在分组的列以NULL代替。

分组集合除了可以用GROUPING SETS来指定,另外还提供了两个特殊的写法rollup和cube.

ROLLUP ( e1, e2, e3, ... )  

代表递减分组,一般用于异构结构的分组如国家,省份,城市,乡镇这样的结构查询。

逐级分组汇总结果,它相当于如下写法:

GROUPING SETS (  
    ( e1, e2, e3, ... ),  
    ...  
    ( e1, e2 )  
    ( e1 )  
    ( )  -- 注意包含全集  
)  

还有一种写法是CUBE

CUBE ( a, b, c )  

cube是任意组合,相当于:

GROUPING SETS (  
    ( a, b, c ),  
    ( a, b    ),  
    ( a,    c ),  
    ( a       ),  
    (    b, c ),  
    (    b    ),  
    (       c ),  
    (         )    -- 注意包含全集  
)  

在cube和rollup中使用括号可以将多个表达式作为单个表达式来处理:

ROLLUP ( a, (b,c), d )  

递减,相当于

GROUPING SETS (  
    ( a, b, c, d ),  
    ( a, b, c    ),  
    ( a          ),  
    (            )  
)  
CUBE ( (a,b), (c,d) )  

相当于:

GROUPING SETS (  
    ( a, b, c, d ),  
    ( a, b       ),  
    (       c, d ),  
    (            )  
)  

同时cube,rollup,grouping sets还可以混合使用:

GROUP BY a, CUBE(b,c), GROUPING SETS ((d), (e))  

相当于:

GROUP BY GROUPING SETS (  
  (a,b,c,d), (a,b,c,e),  
  (a,b,d),   (a,b,e),  
  (a,c,d),   (a,c,e),  
  (a,d),     (a,e)  
)  

既然分组聚合相当于多个group by 的union all,那么我们如何区分当前输出的记录是哪个分组group by的输出呢?

用grouping(cols)可以表示未参与聚合的表达式的比特位,并转换为INT输出。

例如:

=> SELECT * FROM items_sold;  
 make  | model | sales  
-------+-------+-------  
 Foo   | GT    |  10  
 Foo   | Tour  |  20  
 Bar   | City  |  15  
 Bar   | Sport |  5  
(4 rows)  

grouping()中必须包含group by后面的任意或所有列。

=> SELECT make, model, GROUPING(make,model), sum(sales) FROM items_sold GROUP BY ROLLUP(make,model);  
 make  | model | grouping | sum  
-------+-------+----------+-----  
 Foo   | GT    |        0 | 10  
 Foo   | Tour  |        0 | 20  
 Bar   | City  |        0 | 15  
 Bar   | Sport |        0 | 5  
 Foo   |       |        1 | 30  
 Bar   |       |        1 | 20  
       |       |        3 | 50  
(7 rows)  

grouping()中必须包含group by后面的任意或所有列,不能包含未参与聚合的列。

grouping()中的每个表达式用1个比特位表示。

postgres=# create table tbl(c1 int,c2 int,c3 int, c4 int, c5 int);  
CREATE TABLE  
postgres=# select c1,count(*),grouping(c1,c2,c3) from tbl group by cube(c1,c2,c3);  
 c1 | count | grouping   
----+-------+----------  
    |     0 |        7   -- b'111'::int = 7  
(1 row)  
  
postgres=# select c1,count(*),grouping(c1,c2) from tbl group by cube(c1,c2,c3);  
 c1 | count | grouping   
----+-------+----------  
    |     0 |        3  b'11'::int = 3  
(1 row)  
  
postgres=# select c1,count(*),grouping(c2) from tbl group by cube(c1,c2,c3);  
 c1 | count | grouping   
----+-------+----------  
    |     0 |        1  b'1'::int = 1  
(1 row)  
  
postgres=# insert into tbl values (1,2,3,4,5);  
INSERT 0 1  
postgres=# insert into tbl values (1,2,3,4,6);  
INSERT 0 1  
postgres=# insert into tbl values (2,3,4,5,6);  
INSERT 0 1  
postgres=# select c1,count(*),grouping(c2) from tbl group by cube(c1,c2,c3);  
 c1 | count | grouping   
----+-------+----------  
  1 |     2 |        0  
  1 |     2 |        0  
  1 |     2 |        1  
  2 |     1 |        0  
  2 |     1 |        0  
  2 |     1 |        1  
    |     3 |        1  
  1 |     2 |        1  
    |     2 |        1  
  2 |     1 |        1  
    |     1 |        1  
    |     2 |        0  
    |     2 |        0  
    |     1 |        0  
    |     1 |        0  
(15 rows)  

grouping中包含未参与聚合的列将报错:

postgres=# select c1,count(*),grouping(c4) from tbl group by cube(c1,c2,c3);  
ERROR:  arguments to GROUPING must be grouping expressions of the associated query level  
LINE 1: select c1,count(*),grouping(c4) from tbl group by cube(c1,c2...  
                                    ^  
postgres=# select c1,count(*),grouping(c1,c2,c3,c4) from tbl group by cube(c1,c2,c3);  
ERROR:  arguments to GROUPING must be grouping expressions of the associated query level  
LINE 1: select c1,count(*),grouping(c1,c2,c3,c4) from tbl group by c...  
                                             ^  
postgres=# select c1,count(*),grouping(c1,c2,c3) from tbl group by cube(c1,c2,c3);  
 c1 | count | grouping   
----+-------+----------  
  1 |     2 |        0  
  1 |     2 |        1  
  1 |     2 |        3  
  2 |     1 |        0  
  2 |     1 |        1  
  2 |     1 |        3  
    |     3 |        7  
  1 |     2 |        2  
    |     2 |        6  
  2 |     1 |        2  
    |     1 |        6  
    |     2 |        4  
    |     2 |        5  
    |     1 |        4  
    |     1 |        5  
(15 rows)  

参考

1. http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f3d3118532175541a9a96ed78881a3b04a057128

2. http://www.postgresql.org/docs/devel/static/queries-table-expressions.html#QUERIES-GROUPING-SETS

3. http://www.postgresql.org/docs/devel/static/functions-aggregate.html

Flag Counter

digoal’s 大量PostgreSQL文章入口