PostgreSQL 9.4 pending patch : UNNEST with multiple args, and TABLE with multiple funcs

5 minute read

背景

PostgreSQL 9.4已经支持unnest(array, array, ….)多参数. 不需要使用WITH ORDINALITY

http://www.postgresql.org/docs/9.4/static/functions-array.html

  • unnest(anyarray) setof anyelement expand an array to a set of rows unnest(ARRAY[1,2])
    unnest(anyarray, anyarray [, …]) setof anyelement, anyelement [, …] expand multiple arrays (possibly of different types) to a set of rows. This is only allowed in the FROM clause; see Section 7.2.1.4 unnest(ARRAY[1,2],ARRAY[‘foo’,’bar’,’baz’])

使用WITH ORDINALITY会新增一个类似行号的值, 即一一匹配的行号.

PostgreSQL 的一个补丁, 支持多个数组输入, 一对一的输出行列转换后的结果.

http://www.postgresql.org/message-id/flat/48bb41eca62e428687cc9b8241661427@news-out.riddles.org.uk#48bb41eca62e428687cc9b8241661427@news-out.riddles.org.uk

例如pg_stats的most_common_vals和most_common_freqs是两个数组, 在没有这个补丁时, 并排输出需要使用窗口函数进行关联后输出, 或者使用函数来输出.

例如 http://blog.163.com/digoal@126/blog/static/1638770402013710105353862/ 文中用到的关联.

digoal=# select * from    
(select row_number() over(partition by r) as rn,ele from (select unnest(most_common_elems::text::int[]) ele,2 as r from pg_stats where tablename='test_2' and attname='appid') t) t1  
join  
(select row_number() over(partition by r) as rn,freq from (select unnest(most_common_elem_freqs) freq,2 as r from pg_stats where tablename='test_2' and attname='appid') t) t2  
on (t1.rn=t2.rn) order by t2.freq desc limit 20;  
 rn  | ele | rn  |    freq      
-----+-----+-----+------------  
   3 |   2 |   3 |   0.803167  
   9 |   8 |   9 |   0.802233  
   7 |   6 |   7 |   0.802133  
   8 |   7 |   8 |   0.802067  
   6 |   5 |   6 |   0.801633  
   4 |   3 |   4 |   0.801367  
   5 |   4 |   5 |     0.8006  
  10 |   9 |  10 |   0.799933  
   2 |   1 |   2 |   0.798167  
   1 |   0 |   1 |     0.5835  
  11 |  10 |  11 |     0.5822  
  92 |  91 |  92 | 0.00263333  
 320 | 319 | 320 | 0.00256667  
  15 |  14 |  15 | 0.00246667  
 769 | 768 | 769 | 0.00243333  
 961 | 960 | 961 | 0.00243333  
  42 |  41 |  42 | 0.00236667  
 503 | 502 | 503 | 0.00236667  
 651 | 650 | 651 | 0.00236667  
 664 | 663 | 664 | 0.00236667  
(20 rows)  

使用本补丁后, 这个SQL可以改成

digoal=# select u.* from pg_stats t,unnest(most_common_elems::text::int[],most_common_elem_freqs) WITH ORDINALITY as u(a,b,r) where t.tablename='test_2' and t.attname='appid' and u.a is not null order by u.b desc limit 20;  
  a  |     b      |  r    
-----+------------+-----  
   2 |   0.803167 |   3  
   8 |   0.802233 |   9  
   6 |   0.802133 |   7  
   7 |   0.802067 |   8  
   5 |   0.801633 |   6  
   3 |   0.801367 |   4  
   4 |     0.8006 |   5  
   9 |   0.799933 |  10  
   1 |   0.798167 |   2  
   0 |     0.5835 |   1  
  10 |     0.5822 |  11  
  91 | 0.00263333 |  92  
 319 | 0.00256667 | 320  
  14 | 0.00246667 |  15  
 768 | 0.00243333 | 769  
 960 | 0.00243333 | 961  
  41 | 0.00236667 |  42  
 502 | 0.00236667 | 503  
 650 | 0.00236667 | 651  
 663 | 0.00236667 | 664  
(20 rows)  

使用后大大简化了SQL写法.

补丁安装 :

[root@db-172-16-3-39 soft_bak]# tar -zxvf postgresql-5e3e8e4.tar.gz  
[root@db-172-16-3-39 soft_bak]# cd postgresql-5e3e8e4  
[root@db-172-16-3-39 postgresql-5e3e8e4]# wget http://www.postgresql.org/message-id/attachment/29844/table-functions.patch  
[root@db-172-16-3-39 postgresql-5e3e8e4]# patch -p1 < ./table-functions.patch   
patching file src/backend/access/common/tupdesc.c  
patching file src/backend/commands/explain.c  
patching file src/backend/executor/nodeFunctionscan.c  
patching file src/backend/nodes/copyfuncs.c  
patching file src/backend/nodes/equalfuncs.c  
patching file src/backend/nodes/makefuncs.c  
patching file src/backend/nodes/nodeFuncs.c  
patching file src/backend/nodes/outfuncs.c  
patching file src/backend/nodes/readfuncs.c  
patching file src/backend/optimizer/path/allpaths.c  
patching file src/backend/optimizer/path/costsize.c  
patching file src/backend/optimizer/path/pathkeys.c  
patching file src/backend/optimizer/plan/createplan.c  
patching file src/backend/optimizer/plan/initsplan.c  
patching file src/backend/optimizer/plan/planner.c  
patching file src/backend/optimizer/plan/setrefs.c  
patching file src/backend/optimizer/plan/subselect.c  
patching file src/backend/optimizer/prep/prepjointree.c  
patching file src/backend/optimizer/util/clauses.c  
patching file src/backend/optimizer/util/pathnode.c  
patching file src/backend/parser/gram.y  
patching file src/backend/parser/parse_clause.c  
patching file src/backend/parser/parse_relation.c  
patching file src/backend/rewrite/rewriteHandler.c  
patching file src/backend/utils/adt/ruleutils.c  
patching file src/include/access/tupdesc.h  
patching file src/include/nodes/execnodes.h  
patching file src/include/nodes/parsenodes.h  
patching file src/include/nodes/plannodes.h  
patching file src/include/optimizer/pathnode.h  
patching file src/include/optimizer/paths.h  
patching file src/include/parser/parse_relation.h  
[root@db-172-16-3-39 postgresql-5e3e8e4]# ./configure --prefix=/home/pg94/pgsql9.4devel --with-pgport=2999 --with-perl --with-tcl --with-python --with-openssl --with-pam --without-ldap --with-libxml --with-libxslt --enable-thread-safety --with-wal-blocksize=16 && gmake && gmake install  
[root@db-172-16-3-39 postgresql-5e3e8e4]# cd contrib/  
[root@db-172-16-3-39 contrib]# gmake && gmake install  
  
[root@db-172-16-3-39 postgresql-5e3e8e4]# su - pg94  
pg94@db-172-16-3-39-> initdb -D $PGDATA -E UTF8 --locale=C -W -U postgres  
pg_ctl start  

补丁实现了table的多srf函数支持

digoal=# select * from table(generate_series(1,10),generate_series(2,5));  
 ?column? | ?column?   
----------+----------  
        1 |        2  
        2 |        3  
        3 |        4  
        4 |        5  
        5 |           
        6 |           
        7 |           
        8 |           
        9 |           
       10 |           
(10 rows)  

补丁的实施原理,

unnest(a,b,c)  

SQL解析时转换成

TABLE(unnest(a),unnest(b),unnest(c))  

所以这个补丁实际上并不是改了unnest函数, 而是改了sql parser.

因此unnest(anyarray, anyarray,…) 这个函数是不存在的, 也不能写在select clause中.

所以不能用以下写法

digoal=# select unnest(most_common_elems::text::int[],most_common_elem_freqs) from pg_stats t where t.tablename='test_2' and t.attname='appid' limit 20;  
ERROR:  function unnest(integer[], real[]) does not exist  
LINE 1: select unnest(most_common_elems::text::int[],most_common_ele...  
               ^  
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.  

本补丁的详细介绍

The spec syntax for table function calls, <table function derived table>  
in <table reference>, looks like TABLE(func(args...)) AS ...  
  
This patch implements that, plus an extension: it allows multiple  
functions, TABLE(func1(...), func2(...), func3(...)) [WITH ORDINALITY]  
and defines this as meaning that the functions are to be evaluated in  
parallel.  
  
This is implemented by changing RangeFunction, function RTEs, and the  
FunctionScan node to take lists of function calls rather than a single  
function. The calling convention for SRFs is completely unchanged; each  
function returns its own rows (or a tuplestore in materialize mode) just  
as before, and FunctionScan combines the results into a single output  
tuple (keeping track of which functions are exhausted in order to  
correctly fill in nulls on a backwards scan).  
  
Then, a hack in the parser converts unnest(...) appearing as a  
func_table (and only there) into a list of unnest() calls, one for each  
parameter.  So  
  
   select ... from unnest(a,b,c)  
  
is converted to  
  
   select ... from TABLE(unnest(a),unnest(b),unnest(c))  
  
and if unnest appears as part of an existing list inside TABLE(), it's  
expanded to multiple entries there too.  
  
This parser hackery is of course somewhat ugly. But given the objective  
of implementing the spec's unnest syntax, it seems to be the least ugly  
of the possible approaches. (The hard part of doing it any other way  
would be generating the description of the result type; composite array  
parameters expand into multiple result columns.)  
  
Overall, it's my intention here to remove as many as feasible of the old  
reasons why one might use an SRF in the select list. This should also  
address the points that Josh brought up in discussion of ORDINALITY  
regarding use of SRF-in-select to unnest multiple arrays.  
  
(As a side issue, this patch also sets up pathkeys for ordinality along  
the lines of a patch I suggested to Greg a while back in response to  
his.)  
  
Current patch status:  
  
This is a first working cut: no docs, no tests, not enough comments, the  
deparse logic probably needs more work (it deparses correctly but the  
formatting may be suboptimal). However all the functionality is believed  
to be in place.  

小结

1. 这个补丁比较适合并列数组的查询需求.

例如本例中用到的pg_stats中的频繁出现的值和频繁出现的值对应的出现频率这2个字段就是一一对应的数组.

参考

1. http://www.postgresql.org/message-id/flat/48bb41eca62e428687cc9b8241661427@news-out.riddles.org.uk#48bb41eca62e428687cc9b8241661427@news-out.riddles.org.uk

2. http://blog.163.com/digoal@126/blog/static/16387704020136301599959/

3. SRFs(setof return functions) 返回多行的函数.

4. http://blog.163.com/digoal@126/blog/static/1638770402013710105353862/

Flag Counter

digoal’s 大量PostgreSQL文章入口