PostgreSQL 10.0 preview 功能增强 - 后台运行(pg_background)
背景
当用户在管理数据库时,如果要在交互式界面跑一些QUERY,但是不知道QUERY要运行多久,担心网络问题或者其他问题导致终端断开,QUERY执行情况不明的话。就需要后台运行这个功能了。
后台运行在LINUX中也很常见,比如
nohup ls -la / >/tmp/result 2>&1 &
这样的话,即使断开会话也没关系,这条命令会在后台运行,并将标准输出存入/tmp/result,标准错误也重定向到标准输出。
对于PostgreSQL数据库,在10.0的这个特性出来之前,用户可以使用dblink的异步调用,达到类似的目的,但是不能断开会话,注意了。
dblink异步调用
使用dblink异步调用,可以达到后台运行的目的,但是别忘了,dblink连接是当前会话建立的,当前会话退出,连接也会退出。
postgres=# create extension dblink;
CREATE EXTENSION
创建测试表
postgres=# create table t(id int);
CREATE TABLE
建立连接
postgres=# select dblink_connect('digoal','host=127.0.0.1 port=1921 user=postgres dbname=postgres');
dblink_connect
----------------
OK
(1 row)
开始事务(你也可以使用autocommit)
postgres=# select * from dblink_send_query('digoal', 'begin');
dblink_send_query
-------------------
1
(1 row
获取异步调用结果
postgres=# select * from dblink_get_result('digoal') as t(res text);
res
-------
BEGIN
(1 row)
获取异步调用结果为0时,才能对这个连接进行下一次异步调用。
postgres=# select * from dblink_get_result('digoal') as t(res text);
res
-----
(0 rows)
异步插入数据
postgres=# select * from dblink_send_query('digoal', 'insert into t values (1)');
dblink_send_query
-------------------
1
(1 row)
获取异步调用结果
postgres=# select * from dblink_get_result('digoal') as t(res text);
res
------------
INSERT 0 1
(1 row)
postgres=# select * from dblink_get_result('digoal') as t(res text);
res
-----
(0 rows)
查看数据是否插入,因为异步事务没有提交,所以看不到数据
postgres=# select * from t;
id
----
(0 rows)
提交异步事务
postgres=# select * from dblink_send_query('digoal', 'commit');
dblink_send_query
-------------------
1
(1 row)
查看数据,有了
postgres=# select * from t;
id
----
1
(1 row)
断开本地会话,异步会话也会断开,未提交的异步事务自动回滚。
postgres=# select dblink_connect('digoal','host=127.0.0.1 port=1921 user=postgres dbname=postgres');
-[ RECORD 1 ]--+---
dblink_connect | OK
postgres=# select * from dblink_send_query('digoal', 'begin');
-[ RECORD 1 ]-----+--
dblink_send_query | 1
postgres=# select * from dblink_get_result('digoal') as t(res text);
-[ RECORD 1 ]
res | BEGIN
postgres=# select * from dblink_get_result('digoal') as t(res text);
(0 rows)
postgres=# select * from dblink_send_query('digoal', 'insert into t values (2)');
-[ RECORD 1 ]-----+--
dblink_send_query | 1
退出当前会话
postgres=# \q
重新连接,异步会话已断开,并回滚。
postgres=# select * from t;
id
----
1
(1 row)
使用dblink异步接口,可以完成一些后台运行的功能,但是比较繁琐,也不完美(比如当前会话不能退出)
https://www.postgresql.org/docs/9.6/static/contrib-dblink-get-result.html
PostgreSQL 10.0 新增了background session的功能,这个功能可以对标类似Oracle的自治事务的功能。(是plsql函数或存储过程迁移到PostgreSQL plpgsql的有利兼容性,此前需要使用dblink模拟自治事务)
基于background session,开发了一个后台运行的管理接口。可以方便的执行后台事务了。
PostgreSQL 10.0 background session(自治事务)功能
参考
《PostgreSQL 10.0 preview 功能增强 - 匿名、自治事务(Oracle 兼容性)》
PostgreSQL 10.0 后台运行接口功能
一开始的设计比较简单,提供了三个API函数
• pg_background_launch : 开启后台work进程与会话,执行用户提供的SQL,返回后台会话的PID
• pg_background_result : 根据提供的PID,返回这个后台会话执行SQL的结果
• pg_background_detach : 根据提供的PID,返回这个后台会话执行SQL的结果,同时关闭这个后台进程。
最开始的讨论细节如下
Hi All,
I would like to take over pg_background patch and repost for
discussion and review.
Initially Robert Haas has share this for parallelism demonstration[1]
and abandoned later with
summary of open issue[2] with this pg_background patch need to be
fixed, most of them seems to be
addressed in core except handling of type exists without binary
send/recv functions and documentation.
I have added handling for types that don't have binary send/recv
functions in the attach patch and will
work on documentation at the end.
One concern with this patch is code duplication with
exec_simple_query(), we could
consider Jim Nasby’s patch[3] to overcome this, but certainly we
will end up by complicating
exec_simple_query() to make pg_background happy.
As discussed previously[1] pg_background is a contrib module that lets
you launch arbitrary command in a background worker.
• VACUUM in background
• Autonomous transaction implementation better than dblink way (i.e.
no separate authentication required).
• Allows to perform task like CREATE INDEX CONCURRENTLY from a
procedural language.
This module comes with following SQL APIs:
• pg_background_launch : This API takes SQL command, which user wants
to execute, and size of queue buffer.
This function returns the process id of background worker.
• pg_background_result : This API takes the process id as input
parameter and returns the result of command
executed thought the background worker.
• pg_background_detach : This API takes the process id and detach the
background process which is waiting for
user to read its results.
Here's an example of running vacuum and then fetching the results.
Notice that the
notices from the original session are propagated to our session; if an
error had occurred,
it would be re-thrown locally when we try to read the results.
postgres=# create table foo (a int);
CREATE TABLE
postgres=# insert into foo values(generate_series(1,5));
INSERT 0 5
postgres=# select pg_background_launch('vacuum verbose foo');
pg_background_launch
----------------------
65427
(1 row)
postgres=# select * from pg_background_result(65427) as (x text);
INFO: vacuuming "public.foo"
INFO: "foo": found 0 removable, 5 nonremovable row versions in 1 out of 1 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
x
--------
VACUUM
(1 row)
Thanks to Vibhor kumar, Rushabh Lathia and Robert Haas for feedback.
Please let me know your thoughts, and thanks for reading.
[1]. https://www.postgresql.org/message-id/CA%2BTgmoam66dTzCP8N2cRcS6S6dBMFX%2BJMba%2BmDf68H%3DKAkNjPQ%40mail.gmail.com
[2]. https://www.postgresql.org/message-id/CA%2BTgmobPiT_3Qgjeh3_v%2B8Cq2nMczkPyAYernF_7_W9a-6T1PA%40mail.gmail.com
[3]. https://www.postgresql.org/message-id/54541779.1010906%40BlueTreble.com
Regards,
Amul
社区讨论后,这个架构改成了这样的,架构更优雅一些。
• pg_background_launch : 这个接口只是用来fork一个后台进程,并返回PID
• pg_background_run : 根据提供的PID,让这个后台进程执行提供的SQL。
• pg_background_result : 根据提供的PID,获取执行SQL的结果。
• pg_background_detach : 关闭后台进程与会话。
讨论细节如下
Hi all,
As we have discussed previously, we need to rework this patch as a client of
Peter Eisentraut's background sessions code[1].
Attaching trial version patch to discussed possible design and api.
We could have following APIs :
• pg_background_launch : This function start and stores new background
session, and returns the process id of background worker.
• pg_background_run : This API takes the process id and SQL command as
input parameter. Using this process id, stored worker's session is
retrieved and give SQL command is executed under it.
• pg_background_result : This API takes the process id as input
parameter and returns the result of command executed thought the
background worker session. Same as it was before but now result can
be fetch in LIFO order i.e. result of last executed query using
pg_background_run will be fetched first.
• pg_background_detach : This API takes the process id and detach the
background process. Stored worker's session is not dropped until this
called.
• TBC : API to discard result of last query or discard altogether?
• TBC : How about having one more api to see all existing sessions ?
Kindly share your thoughts/suggestions. Note that attach patch is WIP
version, code, comments & behaviour could be vague.
------------------
Quick demo:
------------------
Apply attach patch to the top of Peter Eisentraut's
0001-Add-background-sessions.patch[1]
postgres=# select pg_background_launch();
pg_background_launch
----------------------
21004
(1 row)
postgres=# select pg_background_run(21004, 'vacuum verbose foo');
pg_background_run
-------------------
(1 row)
postgres=# select * from pg_background_result(21004) as (x text);
INFO: vacuuming "public.foo"
INFO: "foo": found 0 removable, 5 nonremovable row versions in 1 out of 1 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
x
--------
VACUUM
(1 row)
postgres=# select pg_background_run(21004, 'select * from foo');
pg_background_run
-------------------
(1 row)
postgres=# select * from pg_background_result(21004) as (x int);
x
---
1
2
3
4
5
(5 rows)
postgres=# select pg_background_detach(21004);
pg_background_detach
----------------------
(1 row)
References :
[1] https://www.postgresql.org/message-id/e1c2d331-ee6a-432d-e9f5-dcf85cffaf29%402ndquadrant.com.
Regards,
Amul Sul
后面的讨论又改成了这样,还是朝着优雅的方向在改进
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, failed
I’ll summarize here my thoughts as a reviewer on the current state of the pg_background:
1. Current version of a code [1] is fine, from my point of view. I have no suggestions on improving it. There is no documentation, but code is commented.
2. Patch is dependent on background sessions from the same commitfest.
3. There can exist more features, but for v1 there is surely enough features.
4. There is some controversy on where implemented feature shall be: in separate extension (as in this patch), in db_link, in some PL API, in FDW or somewhere else.
I think that new extension is an appropriate place for the feature. But I’m not certain.
Summarizing these points, appropriate statuses of the patch are ‘Ready for committer’ or ‘Rejected’.
Between these two I choose ‘Ready for committer’, I think patch is committable (after bg sessions).
Best regards, Andrey Borodin.
The new status of this patch is: Ready for Committer
这个patch在commit前,还有一些变数,比如可能将这个功能合并到dblink里面。而不是新开一个extension插件.
这个patch的讨论,详见邮件组,本文末尾URL。
PostgreSQL社区的作风非常严谨,一个patch可能在邮件组中讨论几个月甚至几年,根据大家的意见反复的修正,patch合并到master已经非常成熟,所以PostgreSQL的稳定性也是远近闻名的。
参考
https://commitfest.postgresql.org/13/893/
https://www.postgresql.org/message-id/flat/CAAJ_b97FRO+Y_-SOgXGj-WPwtuWrmifgfgPvbXMAvUKQykvNvA@mail.gmail.com#CAAJ_b97FRO+Y_-SOgXGj-WPwtuWrmifgfgPvbXMAvUKQykvNvA@mail.gmail.com