执行计划维度统计判断执行计划翻转统计每种执行计划的统计信息 use pg_stat_plans in PostgreSQL 9.0, 9.1 and 9.2

11 minute read

背景

pg_stat_plans 类似于auto_explain和pg_stat_statements的结合. 或者说它是从pg_stat_statements扩展而来.

pg_stat_statements一般被用来跟踪数据库中执行的SQL语句的次数, 共计执行时间, 块扫描的信息等等. 如下 :

       Column        |       Type       | Modifiers                                                                                   
---------------------+------------------+-----------                                                                                  
 userid              | oid              |   
 dbid                | oid              |   
 query               | text             |   
 calls               | bigint           |   
 total_time          | double precision |   
 rows                | bigint           |   
 shared_blks_hit     | bigint           |   
 shared_blks_read    | bigint           |   
 shared_blks_dirtied | bigint           |   
 shared_blks_written | bigint           |   
 local_blks_hit      | bigint           |   
 local_blks_read     | bigint           |   
 local_blks_dirtied  | bigint           |   
 local_blks_written  | bigint           |   
 temp_blks_read      | bigint           |   
 temp_blks_written   | bigint           |   
 blk_read_time       | double precision |   
 blk_write_time      | double precision |   

而pg_stat_plans呢? 除了记录调用次数, 共计执行时间以外, 还需要记录planid(从query tree hash而来), 如下 :

       Column        |       Type       | Modifiers   
---------------------+------------------+-----------  
 userid              | oid              |   
 dbid                | oid              |   
 planid              | oid              |   
 query               | text             |   
 had_our_search_path | boolean          |   
 query_valid         | boolean          |   
 calls               | bigint           |   
 total_time          | double precision |   
 rows                | bigint           |   
 shared_blks_hit     | bigint           |   
 shared_blks_read    | bigint           |   
 shared_blks_written | bigint           |   
 local_blks_hit      | bigint           |   
 local_blks_read     | bigint           |   
 local_blks_written  | bigint           |   
 temp_blks_read      | bigint           |   
 temp_blks_written   | bigint           |   
 last_startup_cost   | double precision |   
 last_total_cost     | double precision |   

换句话说, pg_stat_plans并没有真正记录explain的内容, 而是换算成了hash值, 这可以省去不少存储空间, 因为这些都是消耗内存的.

pg_stat_plans需要的共享内存在数据库启动时就被分配, 约等于pg_stat_plans.max * plans_query_size bytes;

安装

1. 下载

https://github.com/2ndQuadrant/pg_stat_plans.git

2. 下载后解压到 $PG_SRC/contrib/

3. 编译安装需要用到pg_config, 所以需要把它放到PATH中.

su - root  
. /home/postgres/.bash_profile  
cd $PG_SRC/contrib/pg_stat_plans/  
gmake clean  
gmake  
gmake install  

4. 9.1和9.2 使用create extension 安装.

9.0则使用pg_stat_plans.sql来安装.

5. 安装完后需要修改postgresql.conf重启数据库.

简单的配置如下 :

# postgresql.conf  
shared_preload_libraries = 'pg_stat_plans'  
# Optionally:  
pg_stat_plans.max = 10000  
pg_stat_plans.track = all  

如果plans_query_size bytes=2048的话, 需要大约20MB的共享内存.(如果track_activity_query_size没有设置的话, query_size使用PostgreSQL的全局参数值)

其他可用到的参数 :

pg_stat_plans adds the following configuration parameters:  
  
pg_stat_plans.max (integer)  
pg_stat_plans.max is the maximum number of plans tracked by the module   
(i.e., the maximum number of rows in the pg_stat_plans view).   
If more distinct plans than that are observed, information about the least-executed statements is discarded.   
The default value is 1000. This parameter can only be set at server start.  
  
pg_stat_plans.track (enum)  
pg_stat_plans.track controls which statements' plans are counted by the module.   
Specify top to track top-level statements (those issued directly by clients), all to also track nested statements   
(such as statements invoked within functions), or none to disable plan statistics collection.   
The default value is top. Only superusers can change this setting.  
  
pg_stat_plans.save (boolean)  
pg_stat_plans.save specifies whether to save plan statistics across server shutdowns.   
If it is off then statistics are not saved at shutdown nor reloaded at server start.   
The default value is on. This parameter can only be set in the postgresql.conf file or on the server command line.  
  
pg_stat_plans.planid_notice (boolean)  
Raise notice of a plan's id after its execution. Useful for verifying explain output on an ad-hoc basis.  
  
pg_stat_plans.explain_format (enum)  
pg_stat_plans.explain_format selects the EXPLAIN output format to be used   
(i.e the format that will be returned by pg_stat_plans_explain()).   
The allowed values are text, xml, json, and yaml. The default is text.  
  
pg_stat_plans.verbose (boolean)  
pg_stat_plans.verbose specifies if explain output should be verbose   
(that is, equivalent to specifying VERBOSE with SQL EXPLAIN). The default is off.  
  
pg_stat_plans.plans_query_size (integer)  
Controls the length in bytes of the stored SQL query string.   
Because truncating the stored strings prevents subsequently explaining the entry, it may be necessary to increase this value.   
The default is 2048.  

配置好后重启数据库.

测试

我这里使用PostgreSQL 9.2beta4进行测试.

1. 在PSQL中查看是否安装好了pg_stat_plans.

postgres=# \dx  
                       List of installed extensions  
     Name      | Version |   Schema   |            Description              
---------------+---------+------------+-----------------------------------  
 pg_stat_plans | 0.1     | public     | Monitors execution costs of plans  

2. 可以看到新增了3个函数, 2个视图.

-[ RECORD 7 ]-------+---------------------------------------------------------------------------------------------------------------  
------------------------------------------------------------------------------------------------------------------------------------  
------------------------------------------------------------------------------------------------------------------------------------  
--------------------------------------------------------------------------------------------------------------------------  
Schema              | public  
Name                | pg_stat_plans_explain  
Result data type    | text  
Argument data types | planid oid, userid oid DEFAULT NULL::oid, dbid oid DEFAULT NULL::oid, encodingid oid DEFAULT NULL::oid  
Type                | normal  
-[ RECORD 8 ]-------+---------------------------------------------------------------------------------------------------------------  
------------------------------------------------------------------------------------------------------------------------------------  
------------------------------------------------------------------------------------------------------------------------------------  
--------------------------------------------------------------------------------------------------------------------------  
Schema              | public  
Name                | pg_stat_plans_pprint  
Result data type    | text  
Argument data types | sqltext text  
Type                | normal  
-[ RECORD 9 ]-------+---------------------------------------------------------------------------------------------------------------  
------------------------------------------------------------------------------------------------------------------------------------  
------------------------------------------------------------------------------------------------------------------------------------  
--------------------------------------------------------------------------------------------------------------------------  
Schema              | public  
Name                | pg_stat_plans_reset  
Result data type    | void  
Argument data types |   
Type                | normal  
  
postgres=#  \dv  
List of relations  
-[ RECORD 1 ]-----------------  
Schema | public  
Name   | pg_stat_plans  
Type   | view  
Owner  | postgres  
-[ RECORD 2 ]-----------------  
Schema | public  
Name   | pg_stat_plans_queries  
Type   | view  
Owner  | postgres  

3. 创建测试表, 插入测试数据.

分别测试一个SQL的两种执行计划. 那么pg_stat_plans将跟踪到这个改变, 并在pg_log中记录WARNING告警.

postgres=# drop table test;  
DROP TABLE  
postgres=# create table test (id int);  
CREATE TABLE  
postgres=# select pg_stat_plans_reset();  
 pg_stat_plans_reset   
---------------------  
 (1 row)  
postgres=# insert into test select generate_series(1,100000);  
INSERT 0 100000  
postgres=# select * from test where id=1;  
 id   
----  
  1  
(1 row)  

创建索引, 执行同一条SQL, 执行计划将发生变化. 以此来观察pg_stat_plans的变化.

postgres=# create index idx_test on test(id);  
CREATE INDEX  
postgres=# select * from test where id=1;  
 id   
----  
  1  
(1 row)  

查询pg_stat_plans, 可以看出planid=2758236521的记录发生了变化, query_valid=t变成了query_valid=f ;

postgres=# select planid,query,query_valid,pg_stat_plans_explain(planid, userid, dbid) from pg_stat_plans where dbid=12788 and query ~ 'test';  
-[ RECORD 1 ]---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
planid                | 2758236521  
query                 | select * from test where id=1;  
query_valid           | f  
pg_stat_plans_explain | ***** Existing entry's planid (2758236521) and explain of original SQL query string planid (1252991393) differ *****  
                      | Index Only Scan using idx_test on test  (cost=0.00..2.30 rows=1 width=4)  
                      |   Index Cond: (id = 1)  
-[ RECORD 5 ]---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
planid                | 1252991393  
query                 | select * from test where id=1;  
query_valid           | t  
pg_stat_plans_explain | Index Only Scan using idx_test on test  (cost=0.00..2.30 rows=1 width=4)  
                      |   Index Cond: (id = 1)  

从这个信息来看, planid=2758236521对应的SQL, 当前的执行计划(通过pg_stat_plans_explain(planid, userid, dbid)得到)与执行时的执行计划不一致, 也就是发生了改变, 所以”Existing entry’s planid (2758236521) and explain of original SQL query string planid (1252991393) differ”.

并且query_valid修改成f. 同时在pg_stat_plans中新增了一条planid=1252991393的执行计划. 与当前执行计划一致.

柱状图 :

postgres=# select * from pg_stat_plans_queries where normalize_query ~ 'test';  
-[ RECORD 1 ]-------+----------------------------------------------  
userid              | 10  
dbid                | 12788  
plan_ids            | {2758236521,1252991393}  
calls_histogram     | {1,5}  
avg_time_histogram  | {21.224,0.016}  
normalize_query     | select * from test where id=?;  
calls               | 6  
total_time          | 21.304  
time_variance       | 224.889632  
time_stddev         | 14.9963206154043  
rows                | 2  
shared_blks_hit     | 444  
shared_blks_read    | 3  
shared_blks_written | 0  
local_blks_hit      | 0  
local_blks_read     | 0  
local_blks_written  | 0  
temp_blks_read      | 0  
temp_blks_written   | 0  

解释 :

plan_ids 存储planid的数组

calls_histogram 存储的是对应的调用次数的数组

avg_time_histogram 存储的是对应的平均执行时间的数组.

从这里查询到的信息, 使用planid=2758236521这个执行计划的SQL, 调用了1次, 平均执行时间21.224毫秒.

使用planid=2758236521这个执行计划的SQL, 调用了5次, 平均执行时间0.016毫秒.

4. 接下来观察一下当执行计划与原先一致时, query_valid再变回去t值.

删除索引 :

postgres=# drop index idx_test;  
DROP INDEX  

重新查看pg_stat_plans_explain :

postgres=# select planid,query,query_valid,pg_stat_plans_explain(planid, userid, dbid) from pg_stat_plans where dbid=12788 and query ~ 'test';  
NOTICE:  updated pg_stat_plans query string of entry 2758236521  
CONTEXT:  SQL statement "EXPLAIN select * from test where id=1;"  
WARNING:  Existing pg_stat_plans entry planid (1252991393) differs from new plan for query (2758236521).  
-[ RECORD 1 ]---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
planid                | 2758236521  
query                 | select * from test where id=1;  
query_valid           | f  
pg_stat_plans_explain | Seq Scan on test  (cost=0.00..1693.00 rows=1 width=4)  
                      |   Filter: (id = 1)  
-[ RECORD 6 ]---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
planid                | 1252991393  
query                 | select * from test where id=1;  
query_valid           | t  
pg_stat_plans_explain | ***** Existing entry's planid (1252991393) and explain of original SQL query string planid (2758236521) differ *****  
                      | Seq Scan on test  (cost=0.00..1693.00 rows=1 width=4)  
                      |   Filter: (id = 1)  

显然这里发生了调换, planid=2758236521的执行计划变回去了, 而planid=1252991393的SQL的执行计划则与之前index only scan不同.

所以这里发生了两个更新记录的操作. 都是修改query_valid.

小结

1. pg_stat_plans 一般可以用来发现数据库中执行计划发生变化的SQL。

第一, 从pg_stat_plans视图的query_valid字段值为false可以发现当前的SQL执行计划与执行这个SQL时用到的执行计划不一致.

第二, 从PostgreSQL log中的WARNING也能发现这个不一致.

2012-10-22 17:08:12.118 CST,"postgres","postgres",6720,"[local]",50850932.1a40,2,"SELECT",2012-10-22 16:52:02 CST,2/426,0,WARNING,55000,"Existing pg_stat_plans entry planid (2758236521) differs from new plan for query (1252991393).",,,,,,,,,"psql"  

2. pg_stat_plans_explain(planid, userid, dbid) 函数将调用PostgreSQL中的explain得到执行计划(query tree). 调用pg_stat_plans_explain除了得到当前的执行计划外, 还需要对query tree的hash值与planid进行匹配, 如果不匹配则修改query_valid字段中存储的值为false. 所以不调用pg_stat_plans_explain()的话不会对query_valid的内容进行更改.

3. planId |= hash_any(jstate.jumble, jstate.jumble_len) , 不同的执行计划可能得到同样的hash值, 也就是planid. 尽管这种概率较小.

【参考】
1. https://github.com/2ndQuadrant/pg_stat_plans

2. http://blog.163.com/digoal@126/blog/static/163877040201151534631313/

3. http://blog.163.com/digoal@126/blog/static/163877040201221382150858/

4. http://blog.ioguix.net/

5. src/backend/access/hash/hashfunc.c

/*  
* hash_any() -- hash a variable-length key into a 32-bit value  
*      k       : the key (the unaligned variable-length array of bytes)  
*      len     : the length of the key, counting by bytes  
*  
* Returns a uint32 value.  Every bit of the key affects every bit of  
* the return value.  Every 1-bit and 2-bit delta achieves avalanche.  
* About 6*len+35 instructions. The best hash table sizes are powers  
* of 2.  There is no need to do mod a prime (mod is sooo slow!).  
* If you need less than 32 bits, use a bitmask.  
*  
* Note: we could easily change this function to return a 64-bit hash value  
* by using the final values of both b and c.  b is perhaps a little less  
* well mixed than c, however.  
*/  
Datum  
hash_any(register const unsigned char *k, register int keylen)  
{  
   register uint32 a,  
               b,  
               c,  
               len;  
 
   /* Set up the internal state */  
   len = keylen;  
   a = b = c = 0x9e3779b9 + len + 3923095;  
 
   /* If the source pointer is word-aligned, we use word-wide fetches */  
   if (((intptr_t) k & UINT32_ALIGN_MASK) == 0)  
   {  
       /* Code path for aligned source data */  
       register const uint32 *ka = (const uint32 *) k;  
 
       /* handle most of the key */  
       while (len >= 12)  
       {  
           a += ka[0];  
           b += ka[1];  
           c += ka[2];  
           mix(a, b, c);  
           ka += 3;  
           len -= 12;  
       }  
 
       /* handle the last 11 bytes */  
       k = (const unsigned char *) ka;  
#ifdef WORDS_BIGENDIAN  
       switch (len)  
       {  
           case 11:  
               c += ((uint32) k[10] << 8);  
               /* fall through */  
           case 10:  
               c += ((uint32) k[9] << 16);  
               /* fall through */  
           case 9:  
               c += ((uint32) k[8] << 24);  
               /* the lowest byte of c is reserved for the length */  
               /* fall through */  
           case 8:  
               b += ka[1];  
               a += ka[0];  
               break;  
           case 7:  
               b += ((uint32) k[6] << 8);  
               /* fall through */  
           case 6:  
               b += ((uint32) k[5] << 16);  
               /* fall through */  
           case 5:  
               b += ((uint32) k[4] << 24);  
               /* fall through */  
           case 4:  
               a += ka[0];  
               break;  
           case 3:  
               a += ((uint32) k[2] << 8);  
               /* fall through */  
           case 2:  
               a += ((uint32) k[1] << 16);  
               /* fall through */  
           case 1:  
               a += ((uint32) k[0] << 24);  
               /* case 0: nothing left to add */  
       }  
#else                           /* !WORDS_BIGENDIAN */  
       switch (len)  
       {  
           case 11:  
               c += ((uint32) k[10] << 24);  
               /* fall through */  
           case 10:  
               c += ((uint32) k[9] << 16);  
               /* fall through */  
           case 9:  
               c += ((uint32) k[8] << 8);  
               /* the lowest byte of c is reserved for the length */  
               /* fall through */  
           case 8:  
               b += ka[1];  
               a += ka[0];  
               break;  
           case 7:  
               b += ((uint32) k[6] << 16);  
               /* fall through */  
           case 6:  
               b += ((uint32) k[5] << 8);  
               /* fall through */  
           case 5:  
               b += k[4];  
               /* fall through */  
           case 4:  
               a += ka[0];  
               break;  
           case 3:  
               a += ((uint32) k[2] << 16);  
               /* fall through */  
           case 2:  
               a += ((uint32) k[1] << 8);  
               /* fall through */  
           case 1:  
               a += k[0];  
               /* case 0: nothing left to add */  
       }  
#endif   /* WORDS_BIGENDIAN */  
   }  
   else  
   {  
       /* Code path for non-aligned source data */  
 
       /* handle most of the key */  
       while (len >= 12)  
       {  
#ifdef WORDS_BIGENDIAN  
           a += (k[3] + ((uint32) k[2] << 8) + ((uint32) k[1] << 16) + ((uint32) k[0] << 24));  
           b += (k[7] + ((uint32) k[6] << 8) + ((uint32) k[5] << 16) + ((uint32) k[4] << 24));  
           c += (k[11] + ((uint32) k[10] << 8) + ((uint32) k[9] << 16) + ((uint32) k[8] << 24));  
#else                           /* !WORDS_BIGENDIAN */  
           a += (k[0] + ((uint32) k[1] << 8) + ((uint32) k[2] << 16) + ((uint32) k[3] << 24));  
           b += (k[4] + ((uint32) k[5] << 8) + ((uint32) k[6] << 16) + ((uint32) k[7] << 24));  
           c += (k[8] + ((uint32) k[9] << 8) + ((uint32) k[10] << 16) + ((uint32) k[11] << 24));  
#endif   /* WORDS_BIGENDIAN */  
           mix(a, b, c);  
           k += 12;  
           len -= 12;  
       }  
 
       /* handle the last 11 bytes */  
#ifdef WORDS_BIGENDIAN  
       switch (len)            /* all the case statements fall through */  
       {  
           case 11:  
               c += ((uint32) k[10] << 8);  
           case 10:  
               c += ((uint32) k[9] << 16);  
           case 9:  
               c += ((uint32) k[8] << 24);  
               /* the lowest byte of c is reserved for the length */  
           case 8:  
               b += k[7];  
           case 7:  
               b += ((uint32) k[6] << 8);  
           case 6:  
               b += ((uint32) k[5] << 16);  
           case 5:  
               b += ((uint32) k[4] << 24);  
           case 4:  
               a += k[3];  
           case 3:  
               a += ((uint32) k[2] << 8);  
           case 2:  
               a += ((uint32) k[1] << 16);  
           case 1:  
               a += ((uint32) k[0] << 24);  
               /* case 0: nothing left to add */  
       }  
#else                           /* !WORDS_BIGENDIAN */  
       switch (len)            /* all the case statements fall through */  
       {  
           case 11:  
               c += ((uint32) k[10] << 24);  
           case 10:  
               c += ((uint32) k[9] << 16);  
           case 9:  
               c += ((uint32) k[8] << 8);  
               /* the lowest byte of c is reserved for the length */  
           case 8:  
               b += ((uint32) k[7] << 24);  
           case 7:  
               b += ((uint32) k[6] << 16);  
           case 6:  
               b += ((uint32) k[5] << 8);  
           case 5:  
               b += k[4];  
           case 4:  
               a += ((uint32) k[3] << 24);  
           case 3:  
               a += ((uint32) k[2] << 16);  
           case 2:  
               a += ((uint32) k[1] << 8);  
           case 1:  
               a += k[0];  
               /* case 0: nothing left to add */  
       }  
#endif   /* WORDS_BIGENDIAN */  
   }  
 
   final(a, b, c);  
 
   /* report the result */  
   return UInt32GetDatum(c);  
}  

digoal’s 大量PostgreSQL文章入口

Twitter Facebook Google+ LinkedIn

Digoal.zhou

执行计划维度统计判断执行计划翻转统计每种执行计划的统计信息 use pg_stat_plans in PostgreSQL 9.0, 9.1 and 9.2

背景

安装

测试

小结

digoal’s 大量PostgreSQL文章入口

You May Also Enjoy

PostgreSQL(PPAS 兼容Oracle) 从零开始入门手册 - 珍藏版

PostgreSQL pipelinedb 流计算插件 - IoT应用 - 实时轨迹聚合

PostgreSQL plpgsql 存储过程、函数 - 状态、异常变量打印、异常捕获… - GET [STACKED] DIAGNOSTICS

PostgreSQL datediff 日期间隔（单位转换）兼容SQL用法