Performance tuning case: using a subquery, cursor, trigger, or recursive query to replace (group by and order by) and reduce the number of blocks scanned
Background
A fairly classic SQL query looks like this:
select a.skyid, a.giftpackageid, a.giftpackagename, a.intimestamp
from tbl_anc_player_win_log a
group by a.skyid, a.giftpackageid, a.giftpackagename, a.intimestamp
order by a.intimestamp desc LIMIT 10;
Obviously it runs as a full table scan.
There are several ways to optimize it, listed below; the larger the table, the more pronounced the gain. The tests here use 100,000 rows:
1. Use a subquery to reduce the number of rows fed to group by.
2. Use a cursor together with a temporary table.
3. Create triggers on tbl_anc_player_win_log and maintain a second table holding the unique (a.skyid, a.giftpackageid, a.giftpackagename, a.intimestamp) combinations.
Each approach is covered below.
Test table:
create table user_download_log (user_id int not null, listid int not null, apkid int not null, get_time timestamp(0) not null, otherinfo text);
create index idx_user_download_log_time on user_download_log (get_time);
Test data:
insert into user_download_log select generate_series(0,100000),generate_series(0,100000),generate_series(0,100000),generate_series(clock_timestamp(),clock_timestamp()+interval '100000 min',interval '1 min'), 'this is test';
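Note that this puts three set-returning functions in the select list; they all produce 100,001 rows, so they advance in lockstep. An equivalent form driven by a single series (a sketch, not from the original article) may be easier to read:

insert into user_download_log (user_id, listid, apkid, get_time, otherinfo)
select i, i, i, now() + i * interval '1 min', 'this is test'
from generate_series(0,100000) as i;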
1. Use a subquery to reduce the number of rows fed to group by.
Looking at the original SQL: it groups over the entire table, yet only 10 rows, ordered by time, are actually needed.
Given the data's characteristics, we can first pull a slice of the newest rows ordered by the time column (how many rows to pull depends on roughly how many are needed to yield 10 unique (a.skyid, a.giftpackageid, a.giftpackagename, a.intimestamp) combinations).
For example, suppose fetching 1000 rows is enough to yield 10 unique (a.skyid, a.giftpackageid, a.giftpackagename, a.intimestamp) combinations.
The SQL is:
digoal=> select * from (select user_id,listid,apkid,get_time from user_download_log order by get_time desc limit 1000) as t group by user_id,listid,apkid,get_time order by get_time desc limit 10;
user_id | listid | apkid | get_time
---------+--------+--------+---------------------
100000 | 100000 | 100000 | 2012-11-22 23:40:06
99999 | 99999 | 99999 | 2012-11-22 23:39:06
99998 | 99998 | 99998 | 2012-11-22 23:38:06
99997 | 99997 | 99997 | 2012-11-22 23:37:06
99996 | 99996 | 99996 | 2012-11-22 23:36:06
99995 | 99995 | 99995 | 2012-11-22 23:35:06
99994 | 99994 | 99994 | 2012-11-22 23:34:06
99993 | 99993 | 99993 | 2012-11-22 23:33:06
99992 | 99992 | 99992 | 2012-11-22 23:32:06
99991 | 99991 | 99991 | 2012-11-22 23:31:06
(10 rows)
Time: 2.010 ms
The execution plan (the backward index scan reads only the 1000 newest index entries rather than the whole table):
digoal=> explain select * from (select user_id,listid,apkid,get_time from user_download_log order by get_time desc limit 1000) as t group by user_id,listid,apkid,get_time order by get_time desc limit 10;
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=86.02..86.05 rows=10 width=20)
   ->  Sort  (cost=86.02..88.52 rows=1000 width=20)
         Sort Key: user_download_log.get_time
         ->  HashAggregate  (cost=54.41..64.41 rows=1000 width=20)
               ->  Limit  (cost=0.00..34.41 rows=1000 width=20)
                     ->  Index Scan Backward using idx_user_download_log_time on user_download_log  (cost=0.00..3441.38 rows=100001 width=20)
(6 rows)
Time: 0.765 ms
The original SQL:
digoal=> select user_id,listid,apkid,get_time from user_download_log group by user_id,listid,apkid,get_time order by get_time desc limit 10;
user_id | listid | apkid | get_time
---------+--------+--------+---------------------
100000 | 100000 | 100000 | 2012-11-22 23:40:06
99999 | 99999 | 99999 | 2012-11-22 23:39:06
99998 | 99998 | 99998 | 2012-11-22 23:38:06
99997 | 99997 | 99997 | 2012-11-22 23:37:06
99996 | 99996 | 99996 | 2012-11-22 23:36:06
99995 | 99995 | 99995 | 2012-11-22 23:35:06
99994 | 99994 | 99994 | 2012-11-22 23:34:06
99993 | 99993 | 99993 | 2012-11-22 23:33:06
99992 | 99992 | 99992 | 2012-11-22 23:32:06
99991 | 99991 | 99991 | 2012-11-22 23:31:06
(10 rows)
Time: 97.731 ms
Execution plan of the original SQL (a sequential scan followed by a sort of all 100,001 rows):
digoal=> explain select user_id,listid,apkid,get_time from user_download_log group by user_id,listid,apkid,get_time order by get_time desc limit 10;
                                          QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit  (cost=12189.92..12190.05 rows=10 width=20)
   ->  Group  (cost=12189.92..13439.93 rows=100001 width=20)
         ->  Sort  (cost=12189.92..12439.92 rows=100001 width=20)
               Sort Key: get_time, user_id, listid, apkid
               ->  Seq Scan on user_download_log  (cost=0.00..1834.01 rows=100001 width=20)
(5 rows)
Time: 0.742 ms
This optimization makes the query about 49x faster than the original SQL (2.010 ms vs 97.731 ms).
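The tricky part of this method is choosing the inner LIMIT (see also the summary below). One way to sanity-check a candidate value on real data is to count how many distinct combinations it actually yields; a sketch against the test table:

-- how many unique combinations do the 1000 newest rows contain?
select count(distinct (user_id, listid, apkid, get_time)) as uniq
from (select user_id, listid, apkid, get_time
      from user_download_log
      order by get_time desc limit 1000) t;
-- if uniq >= 10, an inner LIMIT of 1000 is sufficient for this data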
2. Use a cursor, plus a temporary table to store the rows found so far.
The function:
create or replace function get_user_download_log(i_limit int) returns setof record as $$
declare
  v_result record;
  v_query refcursor;
  v_limit int := 0;
begin
  -- scratch table holding the unique combinations found so far
  CREATE TEMPORARY TABLE IF NOT EXISTS tmp_user_download_log (user_id int, listid int, apkid int, get_time timestamp(0));
  truncate table tmp_user_download_log;
  open v_query for select user_id,listid,apkid,get_time from user_download_log order by get_time desc;
  loop
    fetch v_query into v_result;
    exit when not found;  -- stop at the end of the cursor
    -- skip combinations we have already collected
    perform 1 from tmp_user_download_log where user_id=v_result.user_id and listid=v_result.listid and apkid=v_result.apkid and get_time=v_result.get_time;
    if not found then
      exit when v_limit >= i_limit;  -- collected i_limit unique rows, done
      insert into tmp_user_download_log(user_id, listid, apkid, get_time)
        values(v_result.user_id, v_result.listid, v_result.apkid, v_result.get_time);
      v_limit := v_limit + 1;
    end if;
  end loop;
  close v_query;
  return query select * from tmp_user_download_log;
end;
$$ language plpgsql;
The execution plan when fetching through this function:
digoal=> explain analyze select * from get_user_download_log(10) as (c1 int, c2 int, c3 int, c4 timestamp(0));
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Function Scan on get_user_download_log (cost=0.25..10.25 rows=1000 width=20) (actual time=0.948..0.951 rows=20 loops=1)
Total runtime: 0.978 ms
(2 rows)
Execution time and result:
digoal=> select * from get_user_download_log(10) as (c1 int, c2 int, c3 int, c4 timestamp(0));
c1 | c2 | c3 | c4
--------+--------+--------+---------------------
100000 | 100000 | 100000 | 2012-11-22 23:28:28
99999 | 99999 | 99999 | 2012-11-22 23:27:28
99998 | 99998 | 99998 | 2012-11-22 23:26:28
99997 | 99997 | 99997 | 2012-11-22 23:25:28
99996 | 99996 | 99996 | 2012-11-22 23:24:28
99995 | 99995 | 99995 | 2012-11-22 23:23:28
99994 | 99994 | 99994 | 2012-11-22 23:22:28
99993 | 99993 | 99993 | 2012-11-22 23:21:28
99992 | 99992 | 99992 | 2012-11-22 23:20:28
99991 | 99991 | 99991 | 2012-11-22 23:19:28
(10 rows)
Time: 1.818 ms
This optimization makes the query about 54x faster than the original SQL (1.818 ms vs 97.731 ms).
3. Use triggers, plus a new table that stores the unique (user_id, listid, apkid, get_time) combinations.
The new table:
create table user_download_uk (user_id int, listid int, apkid int, get_time timestamp(0), primary key (user_id,listid,apkid,get_time));
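If the log table already contains data when this scheme is introduced, the lookup table needs a one-time backfill before the triggers take over (a sketch; in the test below the log table is truncated and reloaded instead, so no backfill is needed there):

insert into user_download_uk (user_id, listid, apkid, get_time)
select distinct user_id, listid, apkid, get_time
from user_download_log;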
The insert trigger function:
create or replace function tg_user_download_log_insert() returns trigger as $$
declare
begin
perform 1 from user_download_uk where user_id=NEW.user_id and listid=NEW.listid and apkid=NEW.apkid and get_time=NEW.get_time;
if found then
return null;
else
insert into user_download_uk(user_id,listid,apkid,get_time) values(NEW.user_id,NEW.listid,NEW.apkid,NEW.get_time);
end if;
return null;
end;
$$ language plpgsql;
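Note that this perform-then-insert pattern has a race window under concurrent inserts: two sessions can both miss the lookup and then collide on the primary key. A variant that relies on the primary key itself (a sketch, not part of the original article):

create or replace function tg_user_download_log_insert() returns trigger as $$
begin
  insert into user_download_uk(user_id,listid,apkid,get_time)
    values(NEW.user_id,NEW.listid,NEW.apkid,NEW.get_time);
  return null;
exception when unique_violation then
  return null;  -- another row or session already recorded this combination
end;
$$ language plpgsql;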
The delete trigger function:
create or replace function tg_user_download_log_delete() returns trigger as $$
declare
begin
delete from user_download_uk where user_id=OLD.user_id and listid=OLD.listid and apkid=OLD.apkid and get_time=OLD.get_time;
return null;
end;
$$ language plpgsql;
The update trigger function:
create or replace function tg_user_download_log_update() returns trigger as $$
declare
begin
perform 1 from user_download_uk where user_id=OLD.user_id and listid=OLD.listid and apkid=OLD.apkid and get_time=OLD.get_time;
if found then
update user_download_uk set user_id=NEW.user_id, listid=NEW.listid, apkid=NEW.apkid, get_time=NEW.get_time
where user_id=OLD.user_id and listid=OLD.listid and apkid=OLD.apkid and get_time=OLD.get_time;
else
insert into user_download_uk(user_id,listid,apkid,get_time) values(NEW.user_id,NEW.listid,NEW.apkid,NEW.get_time);
end if;
return null;
end;
$$ language plpgsql;
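These trigger functions also assume each combination appears at most once in the log: if duplicate rows exist, deleting one of them removes the combination from user_download_uk even though another log row still carries it. A duplicate-safe design would keep a reference count; a sketch of the delete side (the user_download_uk2 table and its cnt column are assumptions, not part of the original design; the insert and update triggers would adjust cnt analogously):

create table user_download_uk2 (user_id int, listid int, apkid int, get_time timestamp(0), cnt bigint not null default 1, primary key (user_id,listid,apkid,get_time));

create or replace function tg_user_download_log_delete2() returns trigger as $$
begin
  -- decrement the count; drop the row only when no log rows reference it anymore
  update user_download_uk2 set cnt = cnt - 1
    where user_id=OLD.user_id and listid=OLD.listid and apkid=OLD.apkid and get_time=OLD.get_time;
  delete from user_download_uk2
    where user_id=OLD.user_id and listid=OLD.listid and apkid=OLD.apkid and get_time=OLD.get_time and cnt <= 0;
  return null;
end;
$$ language plpgsql;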
Create the three triggers on user_download_log:
digoal=> create trigger tg_user_download_log_update after update on user_download_log for each row execute procedure tg_user_download_log_update();
CREATE TRIGGER
Time: 1.531 ms
digoal=> create trigger tg_user_download_log_insert after insert on user_download_log for each row execute procedure tg_user_download_log_insert();
CREATE TRIGGER
Time: 1.527 ms
digoal=> create trigger tg_user_download_log_delete after delete on user_download_log for each row execute procedure tg_user_download_log_delete();
CREATE TRIGGER
Time: 1.484 ms
Clear the data, then re-insert:
digoal=> truncate user_download_log ;
TRUNCATE TABLE
Time: 6.216 ms
Create an index on the new table's order by column (the primary key index leads with user_id, so it cannot serve an order by get_time):
digoal=> create index idx_user_download_uk on user_download_uk (get_time);
CREATE INDEX
Time: 64.599 ms
Insert the data:
digoal=> insert into user_download_log select generate_series(0,100000),generate_series(0,100000),generate_series(0,100000),generate_series(clock_timestamp(),clock_timestamp()+interval '100000 min',interval '1 min'), 'this is test';
INSERT 0 100001
Time: 4911.241 ms
Insert speed clearly drops. That is the downside of this approach: whether to use it depends on the table's ratio of DML (write) to DQL (query) traffic.
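To quantify the write penalty, the triggers can be disabled temporarily and the same insert re-timed for comparison (a sketch; ALTER TABLE ... DISABLE TRIGGER USER disables all user triggers on the table and requires table ownership):

alter table user_download_log disable trigger user;
-- rerun the test insert here and compare the timing
alter table user_download_log enable trigger user;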
digoal=> select * from user_download_uk order by get_time desc limit 10;
user_id | listid | apkid | get_time
---------+--------+--------+---------------------
100000 | 100000 | 100000 | 2012-11-23 01:20:11
99999 | 99999 | 99999 | 2012-11-23 01:19:11
99998 | 99998 | 99998 | 2012-11-23 01:18:11
99997 | 99997 | 99997 | 2012-11-23 01:17:11
99996 | 99996 | 99996 | 2012-11-23 01:16:11
99995 | 99995 | 99995 | 2012-11-23 01:15:11
99994 | 99994 | 99994 | 2012-11-23 01:14:11
99993 | 99993 | 99993 | 2012-11-23 01:13:11
99992 | 99992 | 99992 | 2012-11-23 01:12:11
99991 | 99991 | 99991 | 2012-11-23 01:11:11
(10 rows)
Time: 0.360 ms
This optimization makes the query about 271x faster than the original SQL (0.360 ms vs 97.731 ms).
Summary
1. Use a subquery to reduce the number of rows fed to group by.
This makes the query about 49x faster than the original SQL.
Downside: it is hard to estimate how many rows the subquery must be limited to in order to yield 10 unique (a.skyid, a.giftpackageid, a.giftpackagename, a.intimestamp) combinations. Estimate too few and you get fewer than 10 final rows; estimate too many and you waste work.
2. Use a cursor and a temporary table.
This makes the query about 54x faster than the original SQL.
This is a reasonable middle ground, but the temporary table needs to be managed carefully.
3. Create triggers on tbl_anc_player_win_log and maintain a second table holding the unique (a.skyid, a.giftpackageid, a.giftpackagename, a.intimestamp) combinations.
This makes the query about 271x faster than the original SQL.
This gives the biggest speedup, but its impact on DML performance is also significant; weigh the trade-off carefully.
Addendum 1
1. The cursor example still has room for optimization: drop the temporary table and use an array to store the rows seen so far, as follows.
A composite return type must be created first, because an array of record (v_result_array record[];) cannot be declared inside a function.
Defining the return type has the extra benefit of making the call simpler.
digoal=> create type typ_user_download_log as (user_id int, listid int, apkid int, get_time timestamp(0));
CREATE TYPE
The function:
create or replace function get_user_download_log(i_limit int) returns setof typ_user_download_log as $$
declare
  v_result typ_user_download_log;
  v_query refcursor;
  v_limit int := 0;
  -- start with an empty array so that "= ANY" yields false rather than NULL
  v_result_array typ_user_download_log[] := '{}';
begin
  open v_query for select user_id,listid,apkid,get_time from user_download_log order by get_time desc;
  loop
    fetch v_query into v_result;
    exit when not found;  -- stop at the end of the cursor
    if v_result = ANY(v_result_array) then
      continue;  -- duplicate combination, skip it
    end if;
    exit when v_limit >= i_limit;  -- collected i_limit unique rows, done
    v_result_array := array_append(v_result_array, v_result);
    return next v_result;
    v_limit := v_limit + 1;
  end loop;
  close v_query;
  return;
end;
$$ language plpgsql;
Insert a few duplicate records to see whether they are filtered correctly.
digoal=> select ctid,* from user_download_log order by get_time desc limit 10;
ctid | user_id | listid | apkid | get_time | otherinfo
----------+---------+--------+--------+---------------------+--------------
(833,41) | 100000 | 100000 | 100000 | 2012-11-23 01:20:11 | this is test
(833,40) | 99999 | 99999 | 99999 | 2012-11-23 01:19:11 | this is test
(833,39) | 99998 | 99998 | 99998 | 2012-11-23 01:18:11 | this is test
(833,38) | 99997 | 99997 | 99997 | 2012-11-23 01:17:11 | this is test
(833,37) | 99996 | 99996 | 99996 | 2012-11-23 01:16:11 | this is test
(833,36) | 99995 | 99995 | 99995 | 2012-11-23 01:15:11 | this is test
(833,35) | 99994 | 99994 | 99994 | 2012-11-23 01:14:11 | this is test
(833,34) | 99993 | 99993 | 99993 | 2012-11-23 01:13:11 | this is test
(833,33) | 99992 | 99992 | 99992 | 2012-11-23 01:12:11 | this is test
(833,32) | 99991 | 99991 | 99991 | 2012-11-23 01:11:11 | this is test
(10 rows)
Time: 0.712 ms
Insert duplicates of the 10 newest rows:
digoal=> insert into user_download_log select * from user_download_log order by get_time desc limit 10;
INSERT 0 10
Time: 2.322 ms
digoal=> select ctid,* from user_download_log order by get_time desc limit 10;
ctid | user_id | listid | apkid | get_time | otherinfo
----------+---------+--------+--------+---------------------+--------------
(833,41) | 100000 | 100000 | 100000 | 2012-11-23 01:20:11 | this is test
(833,43) | 100000 | 100000 | 100000 | 2012-11-23 01:20:11 | this is test
(833,40) | 99999 | 99999 | 99999 | 2012-11-23 01:19:11 | this is test
(833,44) | 99999 | 99999 | 99999 | 2012-11-23 01:19:11 | this is test
(833,39) | 99998 | 99998 | 99998 | 2012-11-23 01:18:11 | this is test
(833,45) | 99998 | 99998 | 99998 | 2012-11-23 01:18:11 | this is test
(833,38) | 99997 | 99997 | 99997 | 2012-11-23 01:17:11 | this is test
(833,46) | 99997 | 99997 | 99997 | 2012-11-23 01:17:11 | this is test
(833,37) | 99996 | 99996 | 99996 | 2012-11-23 01:16:11 | this is test
(833,47) | 99996 | 99996 | 99996 | 2012-11-23 01:16:11 | this is test
(10 rows)
Time: 0.649 ms
Query again; the result is correct:
digoal=> select * from get_user_download_log(10);
user_id | listid | apkid | get_time
---------+--------+--------+---------------------
100000 | 100000 | 100000 | 2012-11-23 01:20:11
99999 | 99999 | 99999 | 2012-11-23 01:19:11
99998 | 99998 | 99998 | 2012-11-23 01:18:11
99997 | 99997 | 99997 | 2012-11-23 01:17:11
99996 | 99996 | 99996 | 2012-11-23 01:16:11
99995 | 99995 | 99995 | 2012-11-23 01:15:11
99994 | 99994 | 99994 | 2012-11-23 01:14:11
99993 | 99993 | 99993 | 2012-11-23 01:13:11
99992 | 99992 | 99992 | 2012-11-23 01:12:11
99991 | 99991 | 99991 | 2012-11-23 01:11:11
(10 rows)
Time: 0.809 ms
This optimization makes the query about 121x faster than the original SQL (0.809 ms vs 97.731 ms).
Addendum 2
The following method is an extension of the above: a recursive query achieves the same optimization goal. Note, however, that the test below groups by a single column, whereas the examples above group by several; a multi-column sketch follows the explanation at the end.
Test table:
CREATE TABLE test (
username TEXT,
some_ts timestamptz,
random_value INT4
);
Test data:
INSERT INTO test (username, some_ts, random_value)
SELECT
'user #' || cast(floor(random() * 10) as int4),
now() - '1 year'::INTERVAL * random(),
cast(random() * 100000000 as INT4)
FROM
generate_series(1,2000000);
The optimization relies on this index (the leading column drives the username skip scan, and the second column lets each user's 5 newest rows be read in index order):
CREATE INDEX i on test (username, some_ts);
Analyze the table:
analyze test;
Data distribution:
SELECT
username,
count(*)
FROM test
group by username
order by username;
username │ count
──────────┼────────
user #0 │ 199871
user #1 │ 199939
user #2 │ 200388
user #3 │ 199849
user #4 │ 200329
user #5 │ 199504
user #6 │ 199903
user #7 │ 200799
user #8 │ 199487
user #9 │ 199931
(10 rows)
The unoptimized SQL (a window function over the whole table):
select username,some_ts,random_value from (select row_number() over (partition by username order by some_ts desc) as rownum , * from test) as t where t.rownum<6;
Result and execution time:
username | some_ts | random_value
----------+-------------------------------+--------------
user #0 | 2012-10-08 10:26:38.561924+08 | 44572919
user #0 | 2012-10-08 10:26:28.625924+08 | 5466578
user #0 | 2012-10-08 10:21:18.277124+08 | 32176884
user #0 | 2012-10-08 10:16:49.227524+08 | 81763617
user #0 | 2012-10-08 10:15:49.611524+08 | 9824604
user #1 | 2012-10-08 10:25:37.045124+08 | 51284408
user #1 | 2012-10-08 10:24:28.184324+08 | 42880507
user #1 | 2012-10-08 10:24:10.040324+08 | 76807969
user #1 | 2012-10-08 10:24:00.536324+08 | 31799228
user #1 | 2012-10-08 10:23:40.577924+08 | 62644003
user #2 | 2012-10-08 10:19:43.064324+08 | 96476095
user #2 | 2012-10-08 10:19:08.849924+08 | 17594572
user #2 | 2012-10-08 10:17:45.992324+08 | 41356858
user #2 | 2012-10-08 10:16:01.361924+08 | 9022134
user #2 | 2012-10-08 10:15:53.585924+08 | 38457579
user #3 | 2012-10-08 10:25:55.707524+08 | 51972834
user #3 | 2012-10-08 10:24:01.918724+08 | 51456774
user #3 | 2012-10-08 10:23:23.470724+08 | 76070209
user #3 | 2012-10-08 10:22:59.451524+08 | 75154506
user #3 | 2012-10-08 10:22:27.829124+08 | 13303013
user #4 | 2012-10-08 10:26:07.198724+08 | 99513201
user #4 | 2012-10-08 10:23:50.859524+08 | 55677602
user #4 | 2012-10-08 10:18:31.265924+08 | 19827206
user #4 | 2012-10-08 10:18:12.344324+08 | 66195606
user #4 | 2012-10-08 10:17:29.230724+08 | 68841533
user #5 | 2012-10-08 10:26:51.003524+08 | 9603697
user #5 | 2012-10-08 10:26:43.659524+08 | 68753345
user #5 | 2012-10-08 10:25:34.712324+08 | 47807516
user #5 | 2012-10-08 10:24:33.454724+08 | 64949093
user #5 | 2012-10-08 10:18:59.605124+08 | 20250684
user #6 | 2012-10-08 10:26:42.536324+08 | 75469114
user #6 | 2012-10-08 10:21:20.782724+08 | 4394586
user #6 | 2012-10-08 10:21:13.352324+08 | 40429474
user #6 | 2012-10-08 10:16:06.632324+08 | 71651030
user #6 | 2012-10-08 10:14:39.454724+08 | 39326466
user #7 | 2012-10-08 10:26:29.576324+08 | 33673002
user #7 | 2012-10-08 10:21:39.877124+08 | 76655284
user #7 | 2012-10-08 10:20:26.437124+08 | 47873719
user #7 | 2012-10-08 10:20:01.813124+08 | 99362289
user #7 | 2012-10-08 10:19:37.880324+08 | 31506556
user #8 | 2012-10-08 10:25:06.545924+08 | 22525296
user #8 | 2012-10-08 10:24:34.145924+08 | 24814269
user #8 | 2012-10-08 10:24:07.016324+08 | 91073517
user #8 | 2012-10-08 10:23:42.565124+08 | 11352497
user #8 | 2012-10-08 10:21:52.318724+08 | 71172705
user #9 | 2012-10-08 10:26:07.544324+08 | 35362281
user #9 | 2012-10-08 10:26:01.928324+08 | 57774609
user #9 | 2012-10-08 10:25:20.888324+08 | 13287142
user #9 | 2012-10-08 10:25:11.643524+08 | 53517373
user #9 | 2012-10-08 10:21:26.571524+08 | 74587255
(50 rows)
Time: 11329.845 ms
Optimize with a recursive query; the key point is that every step of the recursion is satisfied by an index scan:
WITH RECURSIVE skip AS (
( SELECT t.username FROM test as t ORDER BY t.username limit 1 )
union all
(
SELECT
(
SELECT min( t2.username )
FROM test t2
WHERE t2.username > s.username
)
FROM skip s
WHERE s.username IS NOT NULL
)
),
with_data as (
SELECT array(
SELECT t
FROM test t
WHERE t.username = s.username
ORDER BY t.some_ts desc LIMIT 5
) as rows
FROM skip s
WHERE s.username IS NOT NULL
)
SELECT (unnest( rows )).* FROM with_data;
Result and execution time:
username | some_ts | random_value
----------+-------------------------------+--------------
user #0 | 2012-10-08 10:26:38.561924+08 | 44572919
user #0 | 2012-10-08 10:26:28.625924+08 | 5466578
user #0 | 2012-10-08 10:21:18.277124+08 | 32176884
user #0 | 2012-10-08 10:16:49.227524+08 | 81763617
user #0 | 2012-10-08 10:15:49.611524+08 | 9824604
user #1 | 2012-10-08 10:25:37.045124+08 | 51284408
user #1 | 2012-10-08 10:24:28.184324+08 | 42880507
user #1 | 2012-10-08 10:24:10.040324+08 | 76807969
user #1 | 2012-10-08 10:24:00.536324+08 | 31799228
user #1 | 2012-10-08 10:23:40.577924+08 | 62644003
user #2 | 2012-10-08 10:19:43.064324+08 | 96476095
user #2 | 2012-10-08 10:19:08.849924+08 | 17594572
user #2 | 2012-10-08 10:17:45.992324+08 | 41356858
user #2 | 2012-10-08 10:16:01.361924+08 | 9022134
user #2 | 2012-10-08 10:15:53.585924+08 | 38457579
user #3 | 2012-10-08 10:25:55.707524+08 | 51972834
user #3 | 2012-10-08 10:24:01.918724+08 | 51456774
user #3 | 2012-10-08 10:23:23.470724+08 | 76070209
user #3 | 2012-10-08 10:22:59.451524+08 | 75154506
user #3 | 2012-10-08 10:22:27.829124+08 | 13303013
user #4 | 2012-10-08 10:26:07.198724+08 | 99513201
user #4 | 2012-10-08 10:23:50.859524+08 | 55677602
user #4 | 2012-10-08 10:18:31.265924+08 | 19827206
user #4 | 2012-10-08 10:18:12.344324+08 | 66195606
user #4 | 2012-10-08 10:17:29.230724+08 | 68841533
user #5 | 2012-10-08 10:26:51.003524+08 | 9603697
user #5 | 2012-10-08 10:26:43.659524+08 | 68753345
user #5 | 2012-10-08 10:25:34.712324+08 | 47807516
user #5 | 2012-10-08 10:24:33.454724+08 | 64949093
user #5 | 2012-10-08 10:18:59.605124+08 | 20250684
user #6 | 2012-10-08 10:26:42.536324+08 | 75469114
user #6 | 2012-10-08 10:21:20.782724+08 | 4394586
user #6 | 2012-10-08 10:21:13.352324+08 | 40429474
user #6 | 2012-10-08 10:16:06.632324+08 | 71651030
user #6 | 2012-10-08 10:14:39.454724+08 | 39326466
user #7 | 2012-10-08 10:26:29.576324+08 | 33673002
user #7 | 2012-10-08 10:21:39.877124+08 | 76655284
user #7 | 2012-10-08 10:20:26.437124+08 | 47873719
user #7 | 2012-10-08 10:20:01.813124+08 | 99362289
user #7 | 2012-10-08 10:19:37.880324+08 | 31506556
user #8 | 2012-10-08 10:25:06.545924+08 | 22525296
user #8 | 2012-10-08 10:24:34.145924+08 | 24814269
user #8 | 2012-10-08 10:24:07.016324+08 | 91073517
user #8 | 2012-10-08 10:23:42.565124+08 | 11352497
user #8 | 2012-10-08 10:21:52.318724+08 | 71172705
user #9 | 2012-10-08 10:26:07.544324+08 | 35362281
user #9 | 2012-10-08 10:26:01.928324+08 | 57774609
user #9 | 2012-10-08 10:25:20.888324+08 | 13287142
user #9 | 2012-10-08 10:25:11.643524+08 | 53517373
user #9 | 2012-10-08 10:21:26.571524+08 | 74587255
(50 rows)
Time: 1.924 ms
How the optimization works
(The recursive query is roughly 5900x faster here: 1.924 ms versus 11329.845 ms.)
The recursive CTE "skip" generates the list of unique usernames, and nothing else:
Its non-recursive branch fetches the first (smallest) username.
Its recursive branch is evaluated repeatedly, each evaluation fetching the next-larger username via min(t2.username) with WHERE t2.username > s.username, which is satisfied by an index scan.
The only issue is that "skip" yields 11 rows: the final one contains NULL, since no username is larger than the last. That row is filtered out later.
The CTE "with_data" fetches the actual data for every username:
It builds an array of rows per username (an array, because we cannot generate more rows here than "skip" produced); each array holds the 5 newest rows for the given user. The username does not have to be selected separately, because it is part of each row packed into the array.
Its WHERE s.username IS NOT NULL clause removes the extra NULL row mentioned above.
The final SELECT produces the result set. It reads 10 rows from with_data, each containing an array of 5 elements, each element being a row of the original test table. Then:
unnest(rows) expands this into 50 rows, each holding one composite value, i.e. a row representation;
the (row_variable).* syntax unpacks each composite value into its columns.
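The recursive example above groups by a single column, but the same skip-scan idea extends to multiple columns using row-wise comparison. A sketch against the earlier user_download_log table (assumptions: PostgreSQL 9.3 or later for LATERAL, and a composite index on all four columns; ordering the final result by get_time desc would still require sorting the distinct set afterwards):

create index idx_user_download_log_uk on user_download_log (user_id, listid, apkid, get_time);

WITH RECURSIVE skip AS (
  ( SELECT user_id, listid, apkid, get_time
    FROM user_download_log
    ORDER BY user_id, listid, apkid, get_time
    LIMIT 1 )
  UNION ALL
  ( SELECT n.user_id, n.listid, n.apkid, n.get_time
    FROM skip s,
    LATERAL ( SELECT u.user_id, u.listid, u.apkid, u.get_time
              FROM user_download_log u
              -- row-wise comparison: the next distinct combination in index order
              WHERE (u.user_id, u.listid, u.apkid, u.get_time)
                  > (s.user_id, s.listid, s.apkid, s.get_time)
              ORDER BY u.user_id, u.listid, u.apkid, u.get_time
              LIMIT 1 ) n )
)
SELECT * FROM skip;

The recursion terminates naturally: when no larger combination exists, the LATERAL subquery returns no row, so no NULL row needs to be filtered out.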
The recursive optimization is taken from:
http://www.depesz.com/2012/10/05/getting-top-n-rows-per-group/