PostgreSQL 递归应用实践 - 非“传销”的高并发实时藤、树状佣金分配体系

7 minute read

背景

早在十年前,PostgreSQL 8点几的版本就支持了递归查询,递归查询的应用非常的广泛,如下:

《PostgreSQL 递归妙用案例 - 分组数据去重与打散》

《PostgreSQL Oracle 兼容性之 - INDEX SKIP SCAN (递归查询变态优化) 非驱动列索引扫描优化》

《PostgreSQL SELECT 的高级用法(CTE, LATERAL, ORDINALITY, WINDOW, SKIP LOCKED, DISTINCT, GROUPING SETS, …)》

《PostgreSQL Oracle 兼容性之 - connect by 高级选项 CONNECT_BY_ISLEAF、SYS_CONNECT_BY_PATH、CONNECT_BY_ISCYCLE、LEVEL》

《PostgrSQL 递归SQL的几个应用 - 极客与正常人的思维》

《快速入门PostgreSQL应用开发与管理 - 4 高级SQL用法》

《PostgreSQL 递归查询CASE - 树型路径分组输出》

《用PostgreSQL找回618秒逝去的青春 - 递归收敛优化》

《distinct xx和count(distinct xx)的变态递归优化方法 - 索引收敛(skip scan)扫描》

《时序数据合并场景加速分析和实现 - 复合索引,窗口分组查询加速,变态递归加速》

《PostgreSQL 使用递归SQL 找出数据库对象之间的依赖关系》

《PostgreSQL 递归死循环案例及解法》

《PostgreSQL 递归查询一例 - 资金累加链》

《PostgreSQL Oracle 兼容性之 - WITH 递归 ( connect by )》

《递归优化CASE - group by & distinct tuning case : use WITH RECURSIVE and min() function》

《递归优化CASE - performance tuning case :use cursor\trigger\recursive replace (group by and order by) REDUCE needed blockes scan》

《PostgreSQL 树状数据存储与查询(非递归) - Use ltree extension deal tree-like data type》

《PostgreSQL Oracle 兼容性之 - connect by 高级选项 CONNECT_BY_ISLEAF、SYS_CONNECT_BY_PATH、CONNECT_BY_ISCYCLE、LEVEL》

《PostgreSQL Oracle 兼容性之 - connect by》

本文要介绍一个分佣场景:

虽然都说要远离“传销”,但是“传销”的分佣模式是被很多场景所使用的,例如通过发展下线得到更多的佣金。

A赚的钱,必须要分给他的直线上游,同时可能还要分给他的顶级上游。

如果将层级关系使用关系数据库存储,那么实际上分佣最关键的是通过关系找到某个用户的上游,以及最上游。可以通过递归查询得到。

以1亿用户为例,最大50个层级的关系网。

设计1

1、表结构设计

create table tbl (    
  uid int8 primary key,  -- 用户ID    
  pid int8               -- 直接上游ID,如果一个用户是ROOT用户,则PID为 null   
);    
    
create index idx_tbl_1 on tbl (pid);    

2、创建一个函数,按规则返回它的上游

create or replace function gen_pid(int8) returns int8 as $$    
  -- 生成它的上游ID,200万以内的ID为根ID。其他都取比它小200万对应的那个ID,形成一颗50级的树。    
  select case when $1<=2000000 then null else $1-2000000 end;    
$$ language sql strict;    

3、写入1亿数据,形成深度为50的树。

insert into tbl select id, gen_pid(id) from generate_series(1,100000000) t(id) on conflict do nothing;    

递归查询

使用递归查询语法:

当一个用户获得一笔收入时,需要将他的收入,分配一部分佣金给他的直接上级,以及总的上级。输入UID,查找根、直接上级

with recursive tmp as (select * from tbl where uid=94499137    
union all    
select tbl.* from tbl join tmp on (tmp.pid=tbl.uid))    
select uid,pid from tmp where pid is null or uid=94499137;    
    
   uid    |   pid    
----------+----------    
 94499137 | 92499137            -- 直接上级    
   499137 |                     -- 根的PID=NULL    
(2 rows)    

不加限制,则返回的是以输入UID为末端,层层往上,输出整颗树的数据。

postgres=# with recursive tmp as (select * from tbl where uid=94499137    
union all    
select tbl.* from tbl join tmp on (tmp.pid=tbl.uid))    
select uid,pid from tmp;    
   uid    |   pid    
----------+----------    
 94499137 | 92499137    
 92499137 | 90499137    
 90499137 | 88499137    
 88499137 | 86499137    
 86499137 | 84499137    
 84499137 | 82499137    
 82499137 | 80499137    
 80499137 | 78499137    
 78499137 | 76499137    
 76499137 | 74499137    
 74499137 | 72499137    
 72499137 | 70499137    
 70499137 | 68499137    
 68499137 | 66499137    
 66499137 | 64499137    
 64499137 | 62499137    
 62499137 | 60499137    
 60499137 | 58499137    
 58499137 | 56499137    
 56499137 | 54499137    
 54499137 | 52499137    
 52499137 | 50499137    
 50499137 | 48499137    
 48499137 | 46499137    
 46499137 | 44499137    
 44499137 | 42499137    
 42499137 | 40499137    
 40499137 | 38499137    
 38499137 | 36499137    
 36499137 | 34499137    
 34499137 | 32499137    
 32499137 | 30499137    
 30499137 | 28499137    
 28499137 | 26499137    
 26499137 | 24499137    
 24499137 | 22499137    
 22499137 | 20499137    
 20499137 | 18499137    
 18499137 | 16499137    
 16499137 | 14499137    
 14499137 | 12499137    
 12499137 | 10499137    
 10499137 |  8499137    
  8499137 |  6499137    
  6499137 |  4499137    
  4499137 |  2499137    
  2499137 |   499137    
   499137 |    
(48 rows)    

性能压测

随机输入用户ID,返回它的直接上级,以及他的总上级ID。

vi test.sql    
    
\set uid random(1,100000000)    
with recursive tmp as (select * from tbl where uid=:uid    
union all    
select tbl.* from tbl join tmp on (tmp.pid=tbl.uid))    
select uid,pid from tmp where pid is null or uid=:uid;    
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 56 -j 56 -T 120    

性能如下

transaction type: ./test.sql    
scaling factor: 1    
query mode: prepared    
number of clients: 56    
number of threads: 56    
duration: 120 s    
number of transactions actually processed: 8933930    
latency average = 0.752 ms    
latency stddev = 0.406 ms    
tps = 74448.004668 (including connections establishing)    
tps = 74458.612227 (excluding connections establishing)    
statement latencies in milliseconds:    
         0.002  \set uid random(1,100000000)    
         0.751  with recursive tmp as (select * from tbl where uid=:uid    

7.4万QPS

设计2

增加一个字段,表示这个节点是否享有分佣金的权利。(1表示有,0表示没有)。

同时,只有第一个享有分佣金权利的上级节点,以及根节点,享有分佣金的机会。 那么结构如何设计、SQL如何写呢?

1、结构

create table tbl(  
  uid int8 primary key,  -- 用户ID  
  pid int8,  -- 上级用户ID  
  status int2  -- 是否享有分佣权利(0表示没有,1表示有)  
);  

2、生成测试数据

insert into tbl select id, gen_pid(id), floor(random()*1.99) from generate_series(1,100000000) t(id) on conflict do nothing;   

3、查询SQL

with recursive tmp as (  
select uid,pid,status,'0' as p_status from tbl where uid=94499137    
union all    
select tbl.uid,tbl.pid,tbl.status,tbl.status||tmp.p_status as p_status from tbl join tmp on (tmp.pid=tbl.uid)  
)    
select uid,pid,status,p_status from tmp   
where pid is null -- 根节点 如果根也要判断是否有分金权限则  (pid is null and status=1)  
or   
(uid <> 94499137 and status=1 and substring(p_status,'\d(.*)') !~ '1');    
   uid    |   pid    | status |                     p_status                       
----------+----------+--------+--------------------------------------------------  
 88499137 | 86499137 |      1 | 1000  
   499137 |          |      0 | 010000001001000110010101010001101000001001011000  
(2 rows)  

4、全部路径查出来的例子

postgres=# with recursive tmp as (  
select uid,pid,status,'0' as p_status from tbl where uid=94499137    
union all    
select tbl.uid,tbl.pid,tbl.status,tbl.status||tmp.p_status as p_status from tbl join tmp on (tmp.pid=tbl.uid)  
)    
select uid,pid,status,p_status from tmp;    
   uid    |   pid    | status |                     p_status                       
----------+----------+--------+--------------------------------------------------  
 94499137 | 92499137 |      0 | 0  
 92499137 | 90499137 |      0 | 00  
 90499137 | 88499137 |      0 | 000  
 88499137 | 86499137 |      1 | 1000        -- 符合条件,输出第一个拥有分金权利的上级。   
 86499137 | 84499137 |      1 | 11000  
 84499137 | 82499137 |      0 | 011000  
 82499137 | 80499137 |      1 | 1011000  
 80499137 | 78499137 |      0 | 01011000  
 78499137 | 76499137 |      0 | 001011000  
 76499137 | 74499137 |      1 | 1001011000  
 74499137 | 72499137 |      0 | 01001011000  
 72499137 | 70499137 |      0 | 001001011000  
 70499137 | 68499137 |      0 | 0001001011000  
 68499137 | 66499137 |      0 | 00001001011000  
 66499137 | 64499137 |      0 | 000001001011000  
 64499137 | 62499137 |      1 | 1000001001011000  
 62499137 | 60499137 |      0 | 01000001001011000  
 60499137 | 58499137 |      1 | 101000001001011000  
 58499137 | 56499137 |      1 | 1101000001001011000  
 56499137 | 54499137 |      0 | 01101000001001011000  
 54499137 | 52499137 |      0 | 001101000001001011000  
 52499137 | 50499137 |      0 | 0001101000001001011000  
 50499137 | 48499137 |      1 | 10001101000001001011000  
 48499137 | 46499137 |      0 | 010001101000001001011000  
 46499137 | 44499137 |      1 | 1010001101000001001011000  
 44499137 | 42499137 |      0 | 01010001101000001001011000  
 42499137 | 40499137 |      1 | 101010001101000001001011000  
 40499137 | 38499137 |      0 | 0101010001101000001001011000  
 38499137 | 36499137 |      1 | 10101010001101000001001011000  
 36499137 | 34499137 |      0 | 010101010001101000001001011000  
 34499137 | 32499137 |      0 | 0010101010001101000001001011000  
 32499137 | 30499137 |      1 | 10010101010001101000001001011000  
 30499137 | 28499137 |      1 | 110010101010001101000001001011000  
 28499137 | 26499137 |      0 | 0110010101010001101000001001011000  
 26499137 | 24499137 |      0 | 00110010101010001101000001001011000  
 24499137 | 22499137 |      0 | 000110010101010001101000001001011000  
 22499137 | 20499137 |      1 | 1000110010101010001101000001001011000  
 20499137 | 18499137 |      0 | 01000110010101010001101000001001011000  
 18499137 | 16499137 |      0 | 001000110010101010001101000001001011000  
 16499137 | 14499137 |      1 | 1001000110010101010001101000001001011000  
 14499137 | 12499137 |      0 | 01001000110010101010001101000001001011000  
 12499137 | 10499137 |      0 | 001001000110010101010001101000001001011000  
 10499137 |  8499137 |      0 | 0001001000110010101010001101000001001011000  
  8499137 |  6499137 |      0 | 00001001000110010101010001101000001001011000  
  6499137 |  4499137 |      0 | 000001001000110010101010001101000001001011000  
  4499137 |  2499137 |      0 | 0000001001000110010101010001101000001001011000  
  2499137 |   499137 |      1 | 10000001001000110010101010001101000001001011000  
   499137 |          |      0 | 010000001001000110010101010001101000001001011000  
(48 rows)  

5、压测

随机输入用户ID,返回它的拥有分金权利的第一个上级,以及他的总上级ID。

vi test.sql    
    
\set uid random(1,100000000)    
with recursive tmp as (  
select uid,pid,status,'0' as p_status from tbl where uid=:uid    
union all    
select tbl.uid,tbl.pid,tbl.status,tbl.status||tmp.p_status as p_status from tbl join tmp on (tmp.pid=tbl.uid)  
)    
select uid,pid,status,p_status from tmp   
where pid is null   
or   
(uid <> :uid and status=1 and substring(p_status,'\d(.*)') !~ '1');     
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 56 -j 56 -T 120    

性能如下

transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 56  
number of threads: 56  
duration: 120 s  
number of transactions actually processed: 9917080  
latency average = 0.678 ms  
latency stddev = 0.372 ms  
tps = 82640.572124 (including connections establishing)  
tps = 82652.202185 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set uid random(1,100000000)    
         0.676  with recursive tmp as (  

QPS: 82640

测试场景3

一个分支节点可能有多个子节点,或者说一个老板有多个直接下属,但是数据结构上没有闭环。

按照设计2,再新增一些子节点,挂到某些节点下面,满足这种场景的测试需求。

1、加子节点数据

insert into tbl select id, random()*1000000+1, floor(random()*1.99) from generate_series(100000001,110000000) t(id) on conflict do nothing;   
insert into tbl select id, 98000000+random()*1000000+1, floor(random()*1.99) from generate_series(110000001,120000000) t(id) on conflict do nothing;   
insert into tbl select id, 96000000+random()*1000000+1, floor(random()*1.99) from generate_series(120000001,140000000) t(id) on conflict do nothing;   

查看一个老板有多个下属的情况

postgres=# select * from tbl where pid=100;  
    uid    | pid | status   
-----------+-----+--------  
 102744852 | 100 |      0  
   2000100 | 100 |      0  
 102528366 | 100 |      0  
 101494941 | 100 |      1  
 103618372 | 100 |      1  
 102937592 | 100 |      0  
 108272885 | 100 |      0  
 106060898 | 100 |      0  
 105693863 | 100 |      1  
 108090257 | 100 |      1  
 109855280 | 100 |      0  
 109603022 | 100 |      1  
(12 rows)  

2、压测

随机输入用户ID,返回它的拥有分金权利的第一个上级,以及他的总上级ID。

vi test.sql    
    
\set uid random(1,140000000)    
with recursive tmp as (  
select uid,pid,status,'0' as p_status from tbl where uid=:uid    
union all    
select tbl.uid,tbl.pid,tbl.status,tbl.status||tmp.p_status as p_status from tbl join tmp on (tmp.pid=tbl.uid)  
)    
select uid,pid,status,p_status from tmp   
where pid is null   
or   
(uid <> :uid and status=1 and substring(p_status,'\d(.*)') !~ '1');     
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 56 -j 56 -T 120    

性能如下

transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 56  
number of threads: 56  
duration: 120 s  
number of transactions actually processed: 13363407  
latency average = 0.503 ms  
latency stddev = 0.383 ms  
tps = 111342.309347 (including connections establishing)  
tps = 111359.640535 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set uid random(1,140000000)    
         0.502  with recursive tmp as (  

QPS: 111342

设计3

使用物化视图存储每个节点的所有上级节点。

小结

本文介绍的分佣场景,与“传销”模式类似。虽然都说要远离“传销”,但是“传销”的分佣模式是被很多场景所使用的,例如通过发展下线得到更多的佣金。

A赚的钱,必须要分给他的直线上游,同时可能还要分给他的顶级上游。

如果将层级关系使用关系数据库存储,那么实际上分佣最关键的是通过关系找到某个用户的上游,以及最上游。可以通过递归查询得到。

以1亿用户为例,最大50个层级的关系网,这样一个量级下面,查询速度可以达到7.4万QPS,瓶颈响应时间0.75毫秒。

相比传统的,直接将层级全部存储到一行中,递归有几个好处:

1、维护方便,更换关系时,只需要修改几条相关的记录。

2、查询性能可以满足。

而直接将所有层级都放到每一条记录中,最严重的问题就是更改关系时,需要动用的记录数可能非常非常多。动一个人的记录,就需要修改所有涉及到这个人的路径的所有记录,更新量是非常巨大的。

Flag Counter

digoal’s 大量PostgreSQL文章入口