PostgreSQL 11 preview - SQL:2011 window frame clause全面支持 及 窗口、帧用法和业务场景介绍

19 minute read

背景

PostgreSQL 在2009年发布的8.4版本中,就已经支持了window语法,frame clause有部分未支持。

PostgreSQL 11将全面覆盖SQL:2011的窗口语法。

提交的PATCH介绍如下:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a459cec96d3856f476c2db298c6b52f592894e8

Support all SQL:2011 options for window frame clauses.    
    
This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING"    
frame boundaries in window functions.  We'd punted on that back in the    
original patch to add window functions, because it was not clear how to    
do it in a reasonably data-type-extensible fashion.  That problem is    
resolved here by adding the ability for btree operator classes to provide    
an "in_range" support function that defines how to add or subtract the    
RANGE offset value.  Factoring it this way also allows the operator class    
to avoid overflow problems near the ends of the datatype's range, if it    
wishes to expend effort on that.  (In the committed patch, the integer    
opclasses handle that issue, but it did not seem worth the trouble to    
avoid overflow failures for datetime types.)    
    
The patch includes in_range support for the integer_ops opfamily    
(int2/int4/int8) as well as the standard datetime types.  Support for    
other numeric types has been requested, but that seems like suitable    
material for a follow-on patch.    
    
In addition, the patch adds GROUPS mode which counts the offset in    
ORDER-BY peer groups rather than rows, and it adds the frame_exclusion    
options specified by SQL:2011.  As far as I can see, we are now fully    
up to spec on window framing options.    
    
Existing behaviors remain unchanged, except that I changed the errcode    
for a couple of existing error reports to meet the SQL spec's expectation    
that negative "offset" values should be reported as SQLSTATE 22013.    
    
Internally and in relevant parts of the documentation, we now consistently    
use the terminology "offset PRECEDING/FOLLOWING" rather than "value    
PRECEDING/FOLLOWING", since the term "value" is confusingly vague.    
    
Oliver Ford, reviewed and whacked around some by me    
    
Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com    

部署PG 11并测试新增的功能

1、下载PG 11

https://www.postgresql.org/ftp/snapshot/dev/

2、安装、初始化、启动数据库(略)

3、测试新增功能用到的SQL如下(回归测试SQL)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blobdiff;f=src/test/regress/expected/window.out;h=b675487729b7a51ac2f668fc4668c9d6817d2856;hp=19f909f3d105087c2babcfa62ff3a77d442a3b03;hb=0a459cec96d3856f476c2db298c6b52f592894e8;hpb=23209457314f6fd89fcd251a8173b0129aaa95a2

我们也可以自己测试,新增一张测试表

create table t(    
  sid int,     -- 学号    
  sub int2,    -- 科目    
  des text,    -- 科目描述    
  score float4    -- 分数    
);    

写入测试数据

insert into t values     
  (1, 1, '语文', 81),    
  (2, 1, '语文', 71),    
  (3, 1, '语文', 99),    
  (4, 1, '语文', 100),    
  (5, 1, '语文', 32),    
  (6, 1, '语文', 89),    
  (7, 1, '语文', 90.5),    
  (1, 2, '数学', 81),    
  (2, 2, '数学', 96),    
  (3, 2, '数学', 78),    
  (4, 2, '数学', 90),    
  (5, 2, '数学', 12),    
  (6, 2, '数学', 97),    
  (7, 2, '数学', 99.5),    
  (1, 3, '英语', 100),    
  (2, 3, '英语', 95),    
  (3, 3, '英语', 95),    
  (4, 3, '英语', 95),    
  (5, 3, '英语', 56),    
  (6, 3, '英语', 87),    
  (7, 3, '英语', 93),    
  (1, 4, '物理', 60),    
  (2, 4, '物理', 60),    
  (3, 4, '物理', 65),    
  (4, 4, '物理', 65),    
  (5, 4, '物理', 65),    
  (6, 4, '物理', 71),    
  (7, 4, '物理', 71)    
;    

发现一例BUG,已上报。(5小时后,社区修复了这个BUG)。当使用range between and时,如果同时使用了partition by, range好像没有起作用(而是使用了整个window)。

正常情况下,应该是在window的分组内,range会控制对应的frame。

以下为BUG版的输出。

select     
  *,     
  sum(score) over w1,     
  avg(score) over w1,     
  first_value(score) over w1,     
  last_value(score) over w1     
from t     
  window w1 as (partition by sub order by score::int range between 1 preceding and 6 following)     
order by sub, score;    
    
    
 sid | sub | des  | score | sum |         avg         | first_value | last_value     
-----+-----+------+-------+-----+---------------------+-------------+------------    
   5 |   1 | 语文 |    32 | 562 | 80.2857142857142857 |          32 |        100    
   2 |   1 | 语文 |    71 | 562 | 80.2857142857142857 |          32 |        100    
   1 |   1 | 语文 |    81 | 562 | 80.2857142857142857 |          32 |        100    
   6 |   1 | 语文 |    89 | 562 | 80.2857142857142857 |          32 |        100    
   7 |   1 | 语文 |    90 | 562 | 80.2857142857142857 |          32 |        100    
   3 |   1 | 语文 |    99 | 562 | 80.2857142857142857 |          32 |        100    
   4 |   1 | 语文 |   100 | 562 | 80.2857142857142857 |          32 |        100    
   5 |   2 | 数学 |    12 | 554 | 79.1428571428571429 |          12 |        100    
   3 |   2 | 数学 |    78 | 554 | 79.1428571428571429 |          12 |        100    
   1 |   2 | 数学 |    81 | 554 | 79.1428571428571429 |          12 |        100    
   4 |   2 | 数学 |    90 | 554 | 79.1428571428571429 |          12 |        100    
   2 |   2 | 数学 |    96 | 554 | 79.1428571428571429 |          12 |        100    
   6 |   2 | 数学 |    97 | 554 | 79.1428571428571429 |          12 |        100    
   7 |   2 | 数学 |   100 | 554 | 79.1428571428571429 |          12 |        100    
   5 |   3 | 英语 |    56 | 621 | 88.7142857142857143 |          56 |        100    
   6 |   3 | 英语 |    87 | 621 | 88.7142857142857143 |          56 |        100    
   7 |   3 | 英语 |    93 | 621 | 88.7142857142857143 |          56 |        100    
   3 |   3 | 英语 |    95 | 621 | 88.7142857142857143 |          56 |        100    
   4 |   3 | 英语 |    95 | 621 | 88.7142857142857143 |          56 |        100    
   2 |   3 | 英语 |    95 | 621 | 88.7142857142857143 |          56 |        100    
   1 |   3 | 英语 |   100 | 621 | 88.7142857142857143 |          56 |        100    
   1 |   4 | 物理 |    60 | 457 | 65.2857142857142857 |          60 |         71    
   2 |   4 | 物理 |    60 | 457 | 65.2857142857142857 |          60 |         71    
   3 |   4 | 物理 |    65 | 457 | 65.2857142857142857 |          60 |         71    
   4 |   4 | 物理 |    65 | 457 | 65.2857142857142857 |          60 |         71    
   5 |   4 | 物理 |    65 | 457 | 65.2857142857142857 |          60 |         71    
   6 |   4 | 物理 |    71 | 457 | 65.2857142857142857 |          60 |         71    
   7 |   4 | 物理 |    71 | 457 | 65.2857142857142857 |          60 |         71    
(28 rows)    

正确结果如下:

 sid | sub | des  | score | sum |         avg          | first_value | last_value   
-----+-----+------+-------+-----+----------------------+-------------+------------  
   5 |   1 | 语文 |    32 |  32 |  32.0000000000000000 |          32 |         32  
   2 |   1 | 语文 |    71 |  71 |  71.0000000000000000 |          71 |         71  
   1 |   1 | 语文 |    81 |  81 |  81.0000000000000000 |          81 |         81  
   6 |   1 | 语文 |    89 | 179 |  89.5000000000000000 |          89 |         90  -- 帧框到了多条记录  
   7 |   1 | 语文 |    90 | 179 |  89.5000000000000000 |          89 |         90  
   3 |   1 | 语文 |    99 | 199 |  99.5000000000000000 |          99 |        100  
   4 |   1 | 语文 |   100 | 199 |  99.5000000000000000 |          99 |        100  
   5 |   2 | 数学 |    12 |  12 |  12.0000000000000000 |          12 |         12  
   3 |   2 | 数学 |    78 | 159 |  79.5000000000000000 |          78 |         81  
   1 |   2 | 数学 |    81 |  81 |  81.0000000000000000 |          81 |         81  
   4 |   2 | 数学 |    90 | 186 |  93.0000000000000000 |          90 |         96  
   2 |   2 | 数学 |    96 | 293 |  97.6666666666666667 |          96 |        100  
   6 |   2 | 数学 |    97 | 293 |  97.6666666666666667 |          96 |        100  
   7 |   2 | 数学 |   100 | 100 | 100.0000000000000000 |         100 |        100  
   5 |   3 | 英语 |    56 |  56 |  56.0000000000000000 |          56 |         56  
   6 |   3 | 英语 |    87 | 180 |  90.0000000000000000 |          87 |         93  
   7 |   3 | 英语 |    93 | 378 |  94.5000000000000000 |          93 |         95  
   3 |   3 | 英语 |    95 | 385 |  96.2500000000000000 |          95 |        100  
   4 |   3 | 英语 |    95 | 385 |  96.2500000000000000 |          95 |        100  
   2 |   3 | 英语 |    95 | 385 |  96.2500000000000000 |          95 |        100  
   1 |   3 | 英语 |   100 | 100 | 100.0000000000000000 |         100 |        100  
   1 |   4 | 物理 |    60 | 315 |  63.0000000000000000 |          60 |         65 -- range以peer group最大边界为帧范围  
   2 |   4 | 物理 |    60 | 315 |  63.0000000000000000 |          60 |         65 -- (60*2+65*3)/5 = 63  
   3 |   4 | 物理 |    65 | 337 |  67.4000000000000000 |          65 |         71  
   4 |   4 | 物理 |    65 | 337 |  67.4000000000000000 |          65 |         71  
   5 |   4 | 物理 |    65 | 337 |  67.4000000000000000 |          65 |         71  
   6 |   4 | 物理 |    71 | 142 |  71.0000000000000000 |          71 |         71  
   7 |   4 | 物理 |    71 | 142 |  71.0000000000000000 |          71 |         71  
(28 rows)  

当不使用partition by时,range控制的frame范围起作用了。

select     
  *,     
  sum(score) over w1,     
  avg(score) over w1,     
  first_value(score) over w1,     
  last_value(score) over w1     
from t     
  window w1 as (order by score::int range between 1 preceding and 6 following)     
order by score;    
    
 sid | sub | des  | score | sum |         avg         | first_value | last_value     
-----+-----+------+-------+-----+---------------------+-------------+------------    
   5 |   2 | 数学 |    12 |  12 | 12.0000000000000000 |          12 |         12    
   5 |   1 | 语文 |    32 |  32 | 32.0000000000000000 |          32 |         32    
   5 |   3 | 英语 |    56 | 176 | 58.6666666666666667 |          56 |         60    
   1 |   4 | 物理 |    60 | 315 | 63.0000000000000000 |          60 |         65    
   2 |   4 | 物理 |    60 | 315 | 63.0000000000000000 |          60 |         65    
   3 |   4 | 物理 |    65 | 408 | 68.0000000000000000 |          65 |         71    
   4 |   4 | 物理 |    65 | 408 | 68.0000000000000000 |          65 |         71    
   5 |   4 | 物理 |    65 | 408 | 68.0000000000000000 |          65 |         71    
   6 |   4 | 物理 |    71 | 213 | 71.0000000000000000 |          71 |         71    
   7 |   4 | 物理 |    71 | 213 | 71.0000000000000000 |          71 |         71    
   2 |   1 | 语文 |    71 | 213 | 71.0000000000000000 |          71 |         71    
   3 |   2 | 数学 |    78 | 240 | 80.0000000000000000 |          78 |         81    
   1 |   1 | 语文 |    81 | 249 | 83.0000000000000000 |          81 |         87    
   1 |   2 | 数学 |    81 | 249 | 83.0000000000000000 |          81 |         87    
   6 |   3 | 英语 |    87 | 449 | 89.8000000000000000 |          87 |         93    
   6 |   1 | 语文 |    89 | 647 | 92.4285714285714286 |          89 |         95    
   7 |   1 | 语文 |    90 | 743 | 92.8750000000000000 |          89 |         96    
   4 |   2 | 数学 |    90 | 743 | 92.8750000000000000 |          89 |         96    
   7 |   3 | 英语 |    93 | 670 | 95.7142857142857143 |          93 |         99    
   3 |   3 | 英语 |    95 | 877 | 97.4444444444444444 |          95 |        100    
   2 |   3 | 英语 |    95 | 877 | 97.4444444444444444 |          95 |        100    
   4 |   3 | 英语 |    95 | 877 | 97.4444444444444444 |          95 |        100    
   2 |   2 | 数学 |    96 | 877 | 97.4444444444444444 |          95 |        100    
   6 |   2 | 数学 |    97 | 592 | 98.6666666666666667 |          96 |        100    
   3 |   1 | 语文 |    99 | 399 | 99.7500000000000000 |          99 |        100    
   7 |   2 | 数学 |   100 | 399 | 99.7500000000000000 |          99 |        100    
   4 |   1 | 语文 |   100 | 399 | 99.7500000000000000 |          99 |        100    
   1 |   3 | 英语 |   100 | 399 | 99.7500000000000000 |          99 |        100    
(28 rows)    

社区力量,用户福音,5小时修复BUG

社区响应用户提交BUG,修复BUG的速度很给力,PostgreSQL社区给予用户强大的信心。

如Bruce momjian说的,一个用户提交的BUG一般在12小时内社区会响应。

附上提交BUG的方法:

https://www.postgresql.org/support/

https://www.postgresql.org/account/submitbug/

pic

提交BUG:  
From:	PG Bug reporting form <noreply(at)postgresql(dot)org>  
To:	pgsql-bugs(at)lists(dot)postgresql(dot)org  
Cc:	digoal(at)126(dot)com  
Subject:	BUG #15082: PostgreSQL 11 dev bug with window range partition by QUERY  
Date:	2018-02-23 15:16:39  
  
修复:  
From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>  
To:	digoal(at)126(dot)com  
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org  
Subject:	Re: BUG #15082: PostgreSQL 11 dev bug with window range partition by QUERY  
Date:	2018-02-23 20:12:31  
Message-ID:	22042.1519416751@sss.pgh.pa.us (view raw or download thread mbox)  
=?utf-8?q?PG_Bug_reporting_form?= <noreply(at)postgresql(dot)org> writes:  
> when i use frame clause(range), it's only right when don't use partition by,  
> and use the whole rows as one partition.  
> is it a bug?  
  
Yup.  Fixed, thanks.  
  
			regards, tom lane  

patch如下

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fe802c8185e9a53158b6797d0f6fd8bfbb01af1

  
author	Tom Lane <tgl@sss.pgh.pa.us>	  
Sat, 24 Feb 2018 04:11:40 +0800 (15:11 -0500)  
committer	Tom Lane <tgl@sss.pgh.pa.us>	  
Sat, 24 Feb 2018 04:11:40 +0800 (15:11 -0500)  
commit	9fe802c8185e9a53158b6797d0f6fd8bfbb01af1  
tree	493273a7076eb287b6b00803346f93e47a3a7e55	tree | snapshot  
parent	8af87f411c151537b6e3315c2a191110c3fec494	commit | diff  
Fix brown-paper-bag bug in commit 0a459cec96d3856f476c2db298c6b52f592894e8.  
  
RANGE_OFFSET comparisons need to examine the first ORDER BY column,  
which isn't necessarily the first column in the incoming tuples.  
No idea how this slipped through initial testing.  
  
Per bug #15082 from Zhou Digoal.  
  
Discussion: https://postgr.es/m/151939899974.1461.9411971793110285476@wrigleys.postgresql.org  

建议在提交BUG的时候,尽量的描述详细,能复现的问题写清楚复现方法,能引用的commit尽量引用,减少committer排查问题的时间。

下面详细介绍窗口查询的语法。

窗口查询语法介绍

https://www.postgresql.org/docs/devel/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

https://www.postgresql.org/docs/devel/static/sql-select.html

大括号表示必选,中括号可选。

1、仅window聚合函数,允许使用FILTER子句,表示只处理filter中WHERE TRUE的记录。

function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER window_name    
function_name ([expression [, expression ... ]]) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )    
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER window_name    
function_name ( * ) [ FILTER ( WHERE filter_clause ) ] OVER ( window_definition )    

2、窗口定义,where window_definition has the syntax

[ existing_window_name ]    
[ PARTITION BY expression [, ...] ]    
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]    
[ frame_clause ]    

3、帧定义,The optional frame_clause can be one of

rows 表示前后多少行。

groups 表示前后多少组。

例如1,1,1,2,2,3,3,3,4,5,6的顺序值。当前记录=3的任意一条时,前后1组表示2,2,3,3,3,4;当前记录=2的任意一条时,前1组以及current row表示1,1,1,2,2。

range, groups的边界都是以peer的最外围作为边界,例如字段值 1,1,1,2,3,3,边界是1或者3时,包含所有的1以及所有的3。

如果order by是一个表达式,那么边界以表达式的值来计算,如果多行表达式的值一致,那么这些行就是一组PEER。

groups,相同表达式或列值作为一个peer簇,边界输入为前后N个簇。current_row表示包含当前行所在簇的最大范围。

range,当前值的差值作为判断边界的条件。current_row表示包含当前行所在簇的最大范围。

目前range的offset仅支持int2,int4,int8,interval类型,对应类型int,date,time,timestamp,timestamptz,即类似数据库range类型的描述,order by表达式的类型决定了offset用到的类型

{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]    
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]    

4、帧的开始和结束定义,where frame_start and frame_end can be one of

UNBOUNDED PRECEDING    
offset PRECEDING    
CURRENT ROW    
offset FOLLOWING    
UNBOUNDED FOLLOWING    

默认为 between UNBOUNDED PRECEDING and CURRENT ROW。

注意:如果window中未定义order by,那么整个window就是一个group,或者整个window就是一个frame。

5、帧的不包含定义,and frame_exclusion can be one of

EXCLUDE CURRENT ROW,不包含当前行    
EXCLUDE GROUP,不包含当前行的所有peer行(当前行也不包含),即order by表达式得到的值等值的行s。    
EXCLUDE TIES,不包含当前行的所有peer行,但是包含当前行。    
EXCLUDE NO OTHERS,不排除任何行。默认值。    

窗口查询用法和业务场景举例

窗口查询SQL用法

1、查询与每门成绩第一名相差几分

select     
  *,     
  first_value(score) over w1 - score    
from t     
  window w1 as (partition by sub order by score desc)     
order by sub,score desc;    
    
 sid | sub | des  | score | ?column?     
-----+-----+------+-------+----------    
   4 |   1 | 语文 |   100 |        0    
   3 |   1 | 语文 |    99 |        1    
   7 |   1 | 语文 |    90 |       10    
   6 |   1 | 语文 |    89 |       11    
   1 |   1 | 语文 |    81 |       19    
   2 |   1 | 语文 |    71 |       29    
   5 |   1 | 语文 |    32 |       68    
   7 |   2 | 数学 |   100 |        0    
   6 |   2 | 数学 |    97 |        3    
   2 |   2 | 数学 |    96 |        4    
   4 |   2 | 数学 |    90 |       10    
   1 |   2 | 数学 |    81 |       19    
   3 |   2 | 数学 |    78 |       22    
   5 |   2 | 数学 |    12 |       88    
   1 |   3 | 英语 |   100 |        0    
   2 |   3 | 英语 |    95 |        5    
   3 |   3 | 英语 |    95 |        5    
   4 |   3 | 英语 |    95 |        5    
   7 |   3 | 英语 |    93 |        7    
   6 |   3 | 英语 |    87 |       13    
   5 |   3 | 英语 |    56 |       44    
   7 |   4 | 物理 |    71 |        0    
   6 |   4 | 物理 |    71 |        0    
   3 |   4 | 物理 |    65 |        6    
   4 |   4 | 物理 |    65 |        6    
   5 |   4 | 物理 |    65 |        6    
   1 |   4 | 物理 |    60 |       11    
   2 |   4 | 物理 |    60 |       11    
(28 rows)    

2、查询每门成绩排行,并列时,消耗计数

select     
  *,     
  rank() over w1     
from t     
  window w1 as (partition by sub order by score desc)     
order by sub,score desc;    
    
 sid | sub | des  | score | rank     
-----+-----+------+-------+------    
   4 |   1 | 语文 |   100 |    1    
   3 |   1 | 语文 |    99 |    2    
   7 |   1 | 语文 |    90 |    3    
   6 |   1 | 语文 |    89 |    4    
   1 |   1 | 语文 |    81 |    5    
   2 |   1 | 语文 |    71 |    6    
   5 |   1 | 语文 |    32 |    7    
   7 |   2 | 数学 |   100 |    1    
   6 |   2 | 数学 |    97 |    2    
   2 |   2 | 数学 |    96 |    3    
   4 |   2 | 数学 |    90 |    4    
   1 |   2 | 数学 |    81 |    5    
   3 |   2 | 数学 |    78 |    6    
   5 |   2 | 数学 |    12 |    7    
   1 |   3 | 英语 |   100 |    1    
   2 |   3 | 英语 |    95 |    2    
   3 |   3 | 英语 |    95 |    2    
   4 |   3 | 英语 |    95 |    2    
   7 |   3 | 英语 |    93 |    5    
   6 |   3 | 英语 |    87 |    6    
   5 |   3 | 英语 |    56 |    7    
   7 |   4 | 物理 |    71 |    1    
   6 |   4 | 物理 |    71 |    1    
   3 |   4 | 物理 |    65 |    3    
   4 |   4 | 物理 |    65 |    3    
   5 |   4 | 物理 |    65 |    3    
   1 |   4 | 物理 |    60 |    6    
   2 |   4 | 物理 |    60 |    6    
(28 rows)    

3、查询与总分前一名的同学的单门分差

select     
  *,     
  coalesce(lag(score) over w1, score) - score    
from     
(    
select t.*,t1.rn from t join    
(    
  select sid, row_number() over(order by sum desc) as rn from (select sid,sum(score) from t group by sid) t    
) t1     
using (sid)    
) t     
  window w1 as (partition by sub order by rn)     
order by sub,rn;    
    
    
 sid | sub | des  | score | rn | ?column?     
-----+-----+------+-------+----+----------    
   7 |   1 | 语文 |    90 |  1 |        0    
   4 |   1 | 语文 |   100 |  2 |      -10    
   6 |   1 | 语文 |    89 |  3 |       11    
   3 |   1 | 语文 |    99 |  4 |      -10    
   2 |   1 | 语文 |    71 |  5 |       28    
   1 |   1 | 语文 |    81 |  6 |      -10    
   5 |   1 | 语文 |    32 |  7 |       49    
   7 |   2 | 数学 |   100 |  1 |        0    
   4 |   2 | 数学 |    90 |  2 |       10    
   6 |   2 | 数学 |    97 |  3 |       -7    
   3 |   2 | 数学 |    78 |  4 |       19    
   2 |   2 | 数学 |    96 |  5 |      -18    
   1 |   2 | 数学 |    81 |  6 |       15    
   5 |   2 | 数学 |    12 |  7 |       69    
   7 |   3 | 英语 |    93 |  1 |        0    
   4 |   3 | 英语 |    95 |  2 |       -2    
   6 |   3 | 英语 |    87 |  3 |        8    
   3 |   3 | 英语 |    95 |  4 |       -8    
   2 |   3 | 英语 |    95 |  5 |        0    
   1 |   3 | 英语 |   100 |  6 |       -5    
   5 |   3 | 英语 |    56 |  7 |       44    
   7 |   4 | 物理 |    71 |  1 |        0    
   4 |   4 | 物理 |    65 |  2 |        6    
   6 |   4 | 物理 |    71 |  3 |       -6    
   3 |   4 | 物理 |    65 |  4 |        6    
   2 |   4 | 物理 |    60 |  5 |        5    
   1 |   4 | 物理 |    60 |  6 |        0    
   5 |   4 | 物理 |    65 |  7 |       -5    
(28 rows)    

4、查询每3个相邻分组的分数平均值,SUM值,COUNT值。

需要使用group的帧语法,order by中每簇相同表达式的值,或者列的值,表示一个peer group。

select     
  *,     
  avg(score) over w1,    
  sum(score) over w1,    
  count(score) over w1,    
  first_value(score) over w1,    
  last_value(score) over w1    
from t     
  window w1 as (partition by sub order by score desc groups between 1 preceding and 1 following)     
order by sub,score desc;    
    
    
 sid | sub | des  | score |         avg         | sum | count | first_value | last_value     
-----+-----+------+-------+---------------------+-----+-------+-------------+------------    
   4 |   1 | 语文 |   100 | 99.5000000000000000 | 199 |     2 |         100 |         99    
   3 |   1 | 语文 |    99 | 96.3333333333333333 | 289 |     3 |         100 |         90    
   7 |   1 | 语文 |    90 | 92.6666666666666667 | 278 |     3 |          99 |         89    
   6 |   1 | 语文 |    89 | 86.6666666666666667 | 260 |     3 |          90 |         81    
   1 |   1 | 语文 |    81 | 80.3333333333333333 | 241 |     3 |          89 |         71    
   2 |   1 | 语文 |    71 | 61.3333333333333333 | 184 |     3 |          81 |         32    
   5 |   1 | 语文 |    32 | 51.5000000000000000 | 103 |     2 |          71 |         32    
   7 |   2 | 数学 |   100 | 98.5000000000000000 | 197 |     2 |         100 |         97    
   6 |   2 | 数学 |    97 | 97.6666666666666667 | 293 |     3 |         100 |         96    
   2 |   2 | 数学 |    96 | 94.3333333333333333 | 283 |     3 |          97 |         90    
   4 |   2 | 数学 |    90 | 89.0000000000000000 | 267 |     3 |          96 |         81    
   1 |   2 | 数学 |    81 | 83.0000000000000000 | 249 |     3 |          90 |         78    
   3 |   2 | 数学 |    78 | 57.0000000000000000 | 171 |     3 |          81 |         12    
   5 |   2 | 数学 |    12 | 45.0000000000000000 |  90 |     2 |          78 |         12    
   1 |   3 | 英语 |   100 | 96.2500000000000000 | 385 |     4 |         100 |         95    
   2 |   3 | 英语 |    95 | 95.6000000000000000 | 478 |     5 |         100 |         93    
   3 |   3 | 英语 |    95 | 95.6000000000000000 | 478 |     5 |         100 |         93    
   4 |   3 | 英语 |    95 | 95.6000000000000000 | 478 |     5 |         100 |         93    
   7 |   3 | 英语 |    93 | 93.0000000000000000 | 465 |     5 |          95 |         87    
   6 |   3 | 英语 |    87 | 78.6666666666666667 | 236 |     3 |          93 |         56    
   5 |   3 | 英语 |    56 | 71.5000000000000000 | 143 |     2 |          87 |         56    
   7 |   4 | 物理 |    71 | 67.4000000000000000 | 337 |     5 |          71 |         65  -- 物理的分数分布的边界最好理解    
   6 |   4 | 物理 |    71 | 67.4000000000000000 | 337 |     5 |          71 |         65    
   3 |   4 | 物理 |    65 | 65.2857142857142857 | 457 |     7 |          71 |         60    
   4 |   4 | 物理 |    65 | 65.2857142857142857 | 457 |     7 |          71 |         60    
   5 |   4 | 物理 |    65 | 65.2857142857142857 | 457 |     7 |          71 |         60    
   1 |   4 | 物理 |    60 | 63.0000000000000000 | 315 |     5 |          65 |         60    
   2 |   4 | 物理 |    60 | 63.0000000000000000 | 315 |     5 |          65 |         60    
(28 rows)    

业务场景举例

1、数据去重

c1,c2字段重复时,保留crt_time最大的一条,其余删除。

delete from test1 where ctid = any(array(    
  select ctid from     
    (select row_number() over (partition by c1,c2 order by crt_time desc) as rn, ctid from test1) t     
  where t.rn<>1    
));      

2、辐射统计

每个学生,与它前后相差10分的学生,作为一个帧,计算他们的总分,平均值,学生数,最高分和最低分。

select     
  *,     
  sum(score) over w1,     
  avg(score) over w1,     
  count(*) over w1,    
  first_value(score) over w1,     
  last_value(score) over w1     
from t     
  window w1 as (partition by sub order by score::int range between 10 preceding and 10 following)     
order by sub, score;    

3、滑窗分析

估值滑窗(最近7天UV)    
SELECT   
  date,   
  # hll_union_agg(users) OVER seven_days   
FROM daily_uniques   
  WINDOW seven_days AS (ORDER BY date ASC ROWS 6 PRECEDING);    
    
日UV     
select   
  count(distinct uid)   
from t where dt='2017-11-11';   
  
select # hll_uid from t where dt='2017-11-11';     
    
滑动分析:最近N天UV     
SELECT date,   
  # hll_union_agg(users) OVER seven_days   
FROM daily_uniques   
  WINDOW seven_days AS (ORDER BY date ASC ROWS 6 PRECEDING);     
    
每日流失UV     
SELECT   
  date,   
  (# hll_union_agg(users) OVER two_days) - # users AS lost_uniques   
FROM daily_uniques   
  WINDOW two_days AS (ORDER BY date ASC ROWS 1 PRECEDING);    
  
每日新增UV  
SELECT   
  date,   
  (# hll_union_agg(users) OVER two_days) -   
    (# lag(users) over (ORDER BY date ASC)) AS new_uniques   
FROM daily_uniques   
  WINDOW two_days AS (ORDER BY date ASC ROWS 1 PRECEDING);  

小结

1、本文介绍了PostgreSQL窗口查询的用法。

2、PostgreSQL 11完全兼容SQL:2011的window frame clause标准。

3、目前range offset支持了numeric, float4, float8, int2, int4, int8, datetime, interval等类型,可以进行花式滑动分析。

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8b29e88cdce17705f0b2c43e50219ce1d7d2f603

参考

https://www.postgresql.org/docs/devel/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

《PostgreSQL、Greenplum 滑动窗口 分析SQL 实践》

《车联网案例,轨迹清洗 - 阿里云RDS PostgreSQL最佳实践 - 窗口函数》

《PostgreSQL 海量时序数据(任意滑动窗口实时统计分析) - 传感器、人群、物体等对象跟踪》

《PostgreSQL 聚合表达式 FILTER , order , within group, over window 用法》

《用PostgreSQL描绘人生的高潮、尿点、低谷 - 窗口/帧 or 斜率/导数/曲率/微积分?》

《时序数据合并场景加速分析和实现 - 复合索引,窗口分组查询加速,变态递归加速》

《PostgreSQL 数据去重方法大全》

Flag Counter

digoal’s 大量PostgreSQL文章入口