PostgreSQL aggregate function 1 : General-Purpose Aggregate Functions
背景
PostgreSQL支持较多的聚合函数, 以PostgreSQL 9.4为例, 支持例如一般性的聚合, 统计学科的聚合, 排序集聚合, 假象集聚合等.
本文将对一般性聚合函数举例说明其功能和用法.
以下图表参考 :
http://www.postgresql.org/docs/9.4/static/functions-aggregate.html
Function | Argument Type(s) | Return Type | Description |
---|---|---|---|
array_agg(expression) | any | array of the argument type | input values, including nulls, concatenated into an array |
avg(expression) | smallint, int, bigint, real, double precision, numeric, or interval | numeric for any integer-type argument, double precision for a floating-point argument, otherwise the same as the argument data type | the average (arithmetic mean) of all input values |
bit_and(expression) | smallint, int, bigint, or bit | same as argument data type | the bitwise AND of all non-null input values, or null if none |
bit_or(expression) | smallint, int, bigint, or bit | same as argument data type | the bitwise OR of all non-null input values, or null if none |
bool_and(expression) | bool | bool | true if all input values are true, otherwise false |
bool_or(expression) | bool | bool | true if at least one input value is true, otherwise false |
count(*) | - | bigint | number of input rows |
count(expression) | any | bigint | number of input rows for which the value of expression is not null |
every(expression) | bool | bool | equivalent to bool_and |
json_agg(expression) | any | json | aggregates values as a JSON array |
json_object_agg(name,value) | (any, any) | json | aggregates name/value pairs as a JSON object |
max(expression) | any array, numeric, string, or date/time type | same as argument type | maximum value of expression across all input values |
min(expression) | any array, numeric, string, or date/time type | same as argument type | minimum value of expression across all input values |
string_agg(expression,delimiter) | (text, text) or (bytea, bytea) | same as argument types | input values concatenated into a string, separated by delimiter |
sum(expression) | smallint, int, bigint, real, double precision, numeric, interval, or money | bigint for smallint or int arguments, numeric for bigint arguments, otherwise the same as the argument data type | sum of expression across all input values |
xmlagg(expression) | xml | xml | concatenation of XML values (see alsoSection 9.14.1.7) |
上图中所有聚合函数, 当没有行输入时, 除了count返回0, 其他都返回null.
使用sum, array_agg时, 当没有行输入, 返回NULL可能有点别扭, 那么你可以使用coalesce来替代NULL, 如coalesce(sum(x), 0)
coalesce(array_agg(x), ‘{}’::int[])
例子 :
聚合后得到数组, null将计入数组元素
postgres=# select array_agg(id) from (values(null),(1),(2)) as t(id);
array_agg
------------
{NULL,1,2}
(1 row)
算平均值时不计算null
postgres=# select avg(id) from (values(null),(1),(2)) as t(id);
avg
--------------------
1.5000000000000000
(1 row)
算bit与 | 或 时也不计算NULL |
postgres=# select bit_and(id) from (values(null),(1),(2)) as t(id);
bit_and
---------
0
(1 row)
postgres=# select bit_or(id) from (values(null),(1),(2)) as t(id);
bit_or
--------
3
(1 row)
算布尔逻辑时也不计算NULL
postgres=# select bool_and(id) from (values(null),(true),(false)) as t(id);
bool_and
----------
f
(1 row)
every是bool_and的别名, 实际上是SQL标准中定义的.
postgres=# select every(id) from (values(null),(true),(false)) as t(id);
every
-------
f
(1 row)
SQL标准中还定义了any和some为bool_or的别名, 但是因为any和some还可以被解释为子查询, 所以在PostgreSQL中any和some的布尔逻辑聚合不可用.
postgres=# select any(id) from (values(null),(true),(false)) as t(id);
ERROR: syntax error at or near "any"
LINE 1: select any(id) from (values(null),(true),(false)) as t(id);
^
postgres=# select some(id) from (values(null),(true),(false)) as t(id);
ERROR: syntax error at or near "some"
LINE 1: select some(id) from (values(null),(true),(false)) as t(id);
^
bool_or的例子
postgres=# select bool_or(id) from (values(null),(true),(false)) as t(id);
bool_or
---------
t
(1 row)
计算非空的表达式个数, count带表达式时, 不计算null
postgres=# select count(id) from (values(null),(1),(2)) as t(id);
count
-------
2
(1 row)
计算表达式(含空值)的个数, count()计算null, 注意count()是一个独立的聚合函数. 请和count(express)区分开来.
postgres=# select count(*) from (values(null),(1),(2)) as t(id);
count
-------
3
(1 row)
postgres=# select count(*) from (values(null),(null),(1),(2)) as t(id);
count
-------
4
(1 row)
聚合后得到json, 不带key的json聚合
postgres=# select json_agg(id) from (values(null),(true),(false)) as t(id);
json_agg
---------------------
[null, true, false]
(1 row)
聚合后得到json, 带key的json聚合, 注意key不能为null, 否则报错.
postgres=# select json_object_agg(c1,c2) from (values('a',null),('b',true),('c',false)) as t(c1,c2);
json_object_agg
-----------------------------------------
{ "a" : null, "b" : true, "c" : false }
(1 row)
postgres=# select json_object_agg(c1,c2) from (values(null,null),('b',true),('c',false)) as t(c1,c2);
ERROR: 22023: field name must not be null
LOCATION: json_object_agg_transfn, json.c:1959
计算最大最小值, max, min都不计算null
postgres=# select max(id) from (values(null),(1),(2)) as t(id);
max
-----
2
(1 row)
postgres=# select min(id) from (values(null),(1),(2)) as t(id);
min
-----
1
(1 row)
聚合后得到字符串, 字符串聚合
postgres=# select string_agg(c1,'***') from (values('a',null),('b',true),('c',false)) as t(c1,c2);
string_agg
------------
a***b***c
(1 row)
postgres=# select string_agg(id,'***') from (values(null),('digoal'),('zhou')) as t(id);
string_agg
---------------
digoal***zhou
(1 row)
计算总和, sum不计算null, 当所有行都是null时, 即没有任何行输入, 返回null.
postgres=# select sum(id) from (values(null),(1),(2)) as t(id);
sum
-----
3
(1 row)
postgres=# select sum(id::int) from (values(null),(null),(null)) as t(id);
sum
-----
(1 row)
聚合后得到xml
postgres=# select xmlagg(id::xml) from (values(null),('<foo>digoal</foo>'),('<bar/>')) as t(id);
xmlagg
-------------------------
<foo>digoal</foo><bar/>
(1 row)
某些聚合函数得到的结果可能和行的输入顺序有关, 例如array_agg, json_agg, json_object_agg, string_agg, and xmlagg, 以及某些自定义聚合函数. 如何来实现呢?
支持聚合函数中使用order by的PostgreSQL版本可以用如下语法 :
postgres=# select string_agg(id,'***' order by id) from (values(null),('digoal'),('zhou')) as t(id);
string_agg
---------------
digoal***zhou
(1 row)
postgres=# select string_agg(id,'***' order by id desc) from (values(null),('digoal'),('zhou')) as t(id);
string_agg
---------------
zhou***digoal
(1 row)
不支持聚合函数中使用order by的PostgreSQL版本, 可以用如下语法 :
SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
postgres=# select string_agg(id,'***') from (select id from (values(null),('digoal'),('zhou')) as t(id) order by id desc) t;
string_agg
---------------
zhou***digoal
(1 row)
参考
1. http://www.postgresql.org/docs/9.4/static/functions-aggregate.html
2. http://www.postgresql.org/docs/9.4/static/functions-xml.html
3. src/backend/utils/adt
这些函数的代码在src/backend/utils/adt这里可以查询到, 对应各自的类型.