PostgreSQL 函数调试、诊断、优化 & auto_explain

4 minute read

背景

PostgreSQL的服务端编程能力比较强,不仅支持像java, R, python这种流行的编程语言,另外还内置了一个与Oracle plsql功能几乎一致的plpgsql语言。

对于开发人员来说,有了语言还不够的,调试是非常常见的需求。

所以PostgreSQL的函数语言怎么调试呢,使用pldebugger即可。

pldebugger的安装

https://git.postgresql.org/gitweb/?p=pldebugger.git;a=summary

plpgsql调试方法

https://www.pgadmin.org/docs/1.22/debugger.html

除了函数的调试,还有函数内SQL的执行计划如何观测。

auto_explain

auto_explain 一个比较给力的模块.

用于向日志输出超出定义阈值执行时间的SQL的执行计划。更强大的是可以选择是否输出内嵌SQL的执行计划(如函数中的SQL).

auto_explain的输出更详细的解释可以参考EXPLAIN的解释。或者man EXPLAIN.

SYNOPSIS  
       EXPLAIN [ ( option [, ...] ) ] statement  
       EXPLAIN [ ANALYZE ] [ VERBOSE ] statement  
  
       where option can be one of:  
  
           ANALYZE [ boolean ]  
           VERBOSE [ boolean ]  
           COSTS [ boolean ]  
           BUFFERS [ boolean ]  
           FORMAT { TEXT | XML | JSON | YAML }  

开启方式有两种 :

1. 通过配置文件postgresql.conf

需要重启数据库。

2. 通过LOAD ‘auto_explain’;

这个只在当前SESSION生效,不需要重启数据库, 需要超级用户权限。

其实还可以设置local_preload_libraries,session_preload_libraries参数来启用。

例子

1. 修改配置文件:

shared_preload_libraries = 'auto_explain'  

新增如下配置:

auto_explain.log_min_duration = 0  # 为了方便查看,这里把时间设置为0,所有SQL都会被auto_explain捕获输出.实际使用的时候适当调大。如 100ms  
auto_explain.log_analyze = true  

# 以下可选  
auto_explain.log_verbose = true  
auto_explain.log_buffers = true  
auto_explain.log_nested_statements = true  

然后重启数据库.重启时会输出一个LOG

postgres@db5-> LOG:  loaded library "$libdir/auto_explain.so"  

下面来执行几条SQL

postgres@db5-> psql -h 127.0.0.1 digoal digoal  
psql (9.1beta1)  
Type "help" for help.  
  
digoal=> select * from tbl_user_info limit 2;  
 id | firstname | lastname |   corp     
----+-----------+----------+----------  
  1 | zhou1     | digoal1  | sky-mobi  
  2 | zhou2     | digoal2  | sky-mobi  
(2 rows)  
  
digoal=> select count(*) from tbl_test;  
  count    
---------  
 1000100  
(1 row)  

日志输出

2011-06-08 15:19:14.390 CST,"digoal","digoal",13789,"127.0.0.1:59549",4def2270.35dd,1,"SELECT",2011-06-08 15:19:12 CST,2/18,0,LOG,00  
000,"duration: 0.040 ms  plan:  
Query Text: select * from tbl_user_info limit 2;  
Limit  (cost=0.00..0.04 rows=2 width=31) (actual time=0.020..0.023 rows=2 loops=1)  
  Output: id, firstname, lastname, corp  
  Buffers: shared hit=1  
  ->  Seq Scan on public.tbl_user_info  (cost=0.00..183.00 rows=10000 width=31) (actual time=0.014..0.015 rows=2 loops=1)  
        Output: id, firstname, lastname, corp  
        Buffers: shared hit=1",,,,,,,,,"psql"  
2011-06-08 15:19:17.902 CST,"digoal","digoal",13789,"127.0.0.1:59549",4def2270.35dd,2,"SELECT",2011-06-08 15:19:12 CST,2/19,0,LOG,00  
000,"duration: 1232.064 ms  plan:  
Query Text: select count(*) from tbl_test;  
Aggregate  (cost=16927.25..16927.26 rows=1 width=0) (actual time=1232.044..1232.045 rows=1 loops=1)  
  Output: count(*)  
  Buffers: shared hit=4426  
  ->  Seq Scan on public.tbl_test  (cost=0.00..14427.00 rows=1000100 width=0) (actual time=0.015..626.872 rows=1000100 loops=1)  
        Output: id  
        Buffers: shared hit=4426",,,,,,,,,"psql"  

2. 通过LOAD ‘auto_explain’ ;

这个只在当前SESSION生效,不需要重启数据库, 需要超级用户权限。

首先先恢复postgresql.conf的配置,去除前面的配置.然后重启数据库.

普通用户不允许加载auto_explain模块.(普通用户只允许加载$libdir/plugins目录下的模块,但是auto_explain即使拷贝到这个目录也不行)

digoal=> load 'auto_explain';  
ERROR:  access to library "auto_explain" is not allowed  
digoal=> \c digoal postgres  
You are now connected to database "digoal" as user "postgres".  
digoal=# load 'auto_explain';  
LOAD  
digoal=# set auto_explain.log_min_duration=0;  
SET  
digoal=# select * from tbl_user_info limit 2;  
 id | firstname | lastname |   corp     
----+-----------+----------+----------  
  1 | zhou1     | digoal1  | sky-mobi  
  2 | zhou2     | digoal2  | sky-mobi  
(2 rows)  

查看日志

2011-06-08 15:25:33.361 CST,"postgres","digoal",13968,"127.0.0.1:15445",4def23d5.3690,1,"SELECT",2011-06-08 15:25:09 CST,2/9,0,LOG,0  
0000,"duration: 0.048 ms  plan:  
Query Text: select * from tbl_user_info limit 2;  
Limit  (cost=0.00..0.04 rows=2 width=31)  
  ->  Seq Scan on tbl_user_info  (cost=0.00..183.00 rows=10000 width=31)",,,,,,,,,"psql"  

连接到digoal用户,这个LOAD的模块消失(后面会有解释).

digoal=# load 'auto_explain';  
LOAD  
digoal=# set session auto_explain.log_min_duration=0;  
SET  
digoal=# \c digoal digoal  
You are now connected to database "digoal" as user "digoal".  
digoal=> select * from tbl_user_info limit 3;  
 id | firstname | lastname |   corp     
----+-----------+----------+----------  
  1 | zhou1     | digoal1  | sky-mobi  
  2 | zhou2     | digoal2  | sky-mobi  
  3 | zhou3     | digoal3  | sky-mobi  
(3 rows)  

查看日志,没有auto_explain的输出.即使这里使用了session来定义这个参数的有效范围。

原因是session重新分配了,因为\c digoal digoal 相当于重新连接,backend也重新生成,如下.

digoal=> select * from pg_stat_activity;  
 datid | datname | procpid | usesysid | usename | application_name | client_addr | client_hostname | client_port |         backend_s  
tart         |          xact_start           |          query_start          | waiting |          current_query            
-------+---------+---------+----------+---------+------------------+-------------+-----------------+-------------+------------------  
-------------+-------------------------------+-------------------------------+---------+---------------------------------  
 16430 | digoal  |   14155 |    16423 | digoal  | psql             | 127.0.0.1   |                 |        5959 | 2011-06-08 15:31:  
32.794906+08 | 2011-06-08 15:31:40.616346+08 | 2011-06-08 15:31:40.616346+08 | f       | select * from pg_stat_activity;  
(1 row)  
  
digoal=> \c digoal postgres  
You are now connected to database "digoal" as user "postgres".  
digoal=# select * from pg_stat_activity;  
 datid | datname | procpid | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_  
start         |          xact_start           |          query_start          | waiting |          current_query            
-------+---------+---------+----------+----------+------------------+-------------+-----------------+-------------+-----------------  
--------------+-------------------------------+-------------------------------+---------+---------------------------------  
 16430 | digoal  |   14161 |       10 | postgres | psql             | 127.0.0.1   |                 |        5961 | 2011-06-08 15:31  
:43.937297+08 | 2011-06-08 15:31:45.178005+08 | 2011-06-08 15:31:45.178005+08 | f       | select * from pg_stat_activity;  
(1 row)  

那么如何避免这样的情况发生呢? 答案是使用set role .

digoal=# load 'auto_explain';  
LOAD  
digoal=# set session auto_explain.log_min_duration=0;  
SET  
digoal=# set role digoal;  
SET  
digoal=> select * from tbl_user_info limit 3;  
 id | firstname | lastname |   corp     
----+-----------+----------+----------  
  1 | zhou1     | digoal1  | sky-mobi  
  2 | zhou2     | digoal2  | sky-mobi  
  3 | zhou3     | digoal3  | sky-mobi  
(3 rows) 

再看日志,已经有了 :

2011-06-08 15:29:30.011 CST,"postgres","digoal",14062,"127.0.0.1:5939",4def24bc.36ee,1,"SELECT",2011-06-08 15:29:00 CST,2/29,0,LOG,0  
0000,"duration: 0.031 ms  plan:  
Query Text: select * from tbl_user_info limit 3;  
Limit  (cost=0.00..0.05 rows=3 width=31)  
  ->  Seq Scan on tbl_user_info  (cost=0.00..183.00 rows=10000 width=31)",,,,,,,,,"psql"  

如果你不想看日志这么麻烦,想在client直接显示,也很方便,设置client_min_messages=’log’就可以看到auto explain的输出了。

set client_min_messages='log';
set auto_explain.log_min_duration = 0;
set auto_explain.log_analyze = true;
set auto_explain.log_verbose = true;
set auto_explain.log_buffers = true;
set auto_explain.log_nested_statements = true;

postgres=# do language plpgsql $$        
declare
begin
perform 1 from pg_class where oid=1;
end;
$$;

LOG:  duration: 0.008 ms  plan:
Query Text: SELECT 1 from pg_class where oid=1
Index Only Scan using pg_class_oid_index on pg_catalog.pg_class  (cost=0.27..1.29 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=1)
  Output: 1
  Index Cond: (pg_class.oid = '1'::oid)
  Heap Fetches: 0
  Buffers: shared hit=2
DO

另一个例子

load 'auto_explain';
set auto_explain.log_analyze =on;
set auto_explain.log_buffers =on;
set auto_explain.log_min_duration =0;
set auto_explain.log_nested_statements =on;
set auto_explain.log_timing =on;
set auto_explain.log_verbose =on;
set client_min_messages ='log';


postgres=# do language plpgsql $$
declare
  arr text;
begin
  select string_agg((random()*100000)::int::text, ',') into arr from generate_series(1,10);    
  execute 'select * from a where id in ('||arr||')';
end;
$$ ;


LOG:  duration: 0.024 ms  plan:
Query Text: select string_agg((random()*100000)::int::text, ',')          from generate_series(1,10)
Aggregate  (cost=25.01..25.02 rows=1 width=32) (actual time=0.021..0.021 rows=1 loops=1)
  Output: string_agg((((random() * '100000'::double precision))::integer)::text, ','::text)
  ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=0) (actual time=0.008..0.009 rows=10 loops=1)
        Output: generate_series
        Function Call: generate_series(1, 10)

LOG:  duration: 0.104 ms  plan:
Query Text: select * from a where id in (23707,28634,42724,79820,82297,8859,19271,54725,5089,42662)
Index Scan using a_pkey on public.a  (cost=0.43..17.72 rows=10 width=45) (actual time=0.036..0.100 rows=10 loops=1)
  Output: id, info, crt_time
  Index Cond: (a.id = ANY ('{23707,28634,42724,79820,82297,8859,19271,54725,5089,42662}'::integer[]))
  Buffers: shared hit=40

DO
Time: 1.126 ms

参考

1. man LOAD

2. man EXPLAIN

3. http://www.postgresql.org/docs/9.6/static/auto-explain.html

Flag Counter

digoal’s 大量PostgreSQL文章入口