PostgreSQL pg_stat_ pg_statio_ 统计信息(scan,read,fetch,hit)源码解读
背景
PostgreSQL 几个统计信息的解释难以理解,所以本文花一些时间从源码的角度来解释一下。
让大家对这几个容易误解的统计值有更好的理解。
比较难理解的几个统计值为:
pg_stat_all_indexes 的
idx_scan
idx_tup_read
idx_tup_fetch
pg_statio_all_indexes 的
idx_blks_read
idx_blks_hit
pg_stat_all_tables 的
seq_scan
seq_tup_read
idx_scan
idx_tup_fetch
pg_statio_all_tables 的
heap_blks_read
heap_blks_hit
idx_blks_read
idx_blks_hit
toast_blks_read heap表对应的toast表统计字段
toast_blks_hit
tidx_blks_read heap表的索引对应的toast索引统计字段
tidx_blks_hit
统计信息来源
首先需要获得以上统计字段的信息来源:
postgres=# \d+ pg_stat_all_indexes
View "pg_catalog.pg_stat_all_indexes"
Column | Type | Modifiers | Storage | Description
---------------+--------+-----------+---------+-------------
relid | oid | | plain |
indexrelid | oid | | plain |
schemaname | name | | plain |
relname | name | | plain |
indexrelname | name | | plain |
idx_scan | bigint | | plain | -- pg_stat_get_numscans(i.oid)
idx_tup_read | bigint | | plain | -- pg_stat_get_tuples_returned(i.oid)
idx_tup_fetch | bigint | | plain | -- pg_stat_get_tuples_fetched(i.oid)
View definition:
SELECT c.oid AS relid,
i.oid AS indexrelid,
n.nspname AS schemaname,
c.relname,
i.relname AS indexrelname,
pg_stat_get_numscans(i.oid) AS idx_scan,
pg_stat_get_tuples_returned(i.oid) AS idx_tup_read,
pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch
FROM pg_class c
JOIN pg_index x ON c.oid = x.indrelid
JOIN pg_class i ON i.oid = x.indexrelid
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"]);
postgres=# \d+ pg_statio_all_indexes
View "pg_catalog.pg_statio_all_indexes"
Column | Type | Modifiers | Storage | Description
---------------+--------+-----------+---------+-------------
relid | oid | | plain |
indexrelid | oid | | plain |
schemaname | name | | plain |
relname | name | | plain |
indexrelname | name | | plain |
idx_blks_read | bigint | | plain | -- pg_stat_get_blocks_fetched(i.oid) - pg_stat_get_blocks_hit(i.oid)
idx_blks_hit | bigint | | plain | -- pg_stat_get_blocks_hit(i.oid)
View definition:
SELECT c.oid AS relid,
i.oid AS indexrelid,
n.nspname AS schemaname,
c.relname,
i.relname AS indexrelname,
pg_stat_get_blocks_fetched(i.oid) - pg_stat_get_blocks_hit(i.oid) AS idx_blks_read,
pg_stat_get_blocks_hit(i.oid) AS idx_blks_hit
FROM pg_class c
JOIN pg_index x ON c.oid = x.indrelid
JOIN pg_class i ON i.oid = x.indexrelid
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"]);
postgres=# \d+ pg_stat_all_tables
View "pg_catalog.pg_stat_all_tables"
Column | Type | Modifiers | Storage | Description
---------------------+--------------------------+-----------+---------+-------------
relid | oid | | plain |
schemaname | name | | plain |
relname | name | | plain |
seq_scan | bigint | | plain | -- pg_stat_get_numscans(c.oid)
seq_tup_read | bigint | | plain | -- pg_stat_get_tuples_returned(c.oid)
idx_scan | bigint | | plain | -- sum(pg_stat_get_numscans(i.indexrelid))
idx_tup_fetch | bigint | | plain | -- sum(pg_stat_get_tuples_fetched(i.indexrelid)) + pg_stat_get_tuples_fetched(c.oid)
n_tup_ins | bigint | | plain |
n_tup_upd | bigint | | plain |
n_tup_del | bigint | | plain |
n_tup_hot_upd | bigint | | plain |
n_live_tup | bigint | | plain |
n_dead_tup | bigint | | plain |
n_mod_since_analyze | bigint | | plain |
last_vacuum | timestamp with time zone | | plain |
last_autovacuum | timestamp with time zone | | plain |
last_analyze | timestamp with time zone | | plain |
last_autoanalyze | timestamp with time zone | | plain |
vacuum_count | bigint | | plain |
autovacuum_count | bigint | | plain |
analyze_count | bigint | | plain |
autoanalyze_count | bigint | | plain |
View definition:
SELECT c.oid AS relid,
n.nspname AS schemaname,
c.relname,
pg_stat_get_numscans(c.oid) AS seq_scan,
pg_stat_get_tuples_returned(c.oid) AS seq_tup_read,
sum(pg_stat_get_numscans(i.indexrelid))::bigint AS idx_scan,
sum(pg_stat_get_tuples_fetched(i.indexrelid))::bigint + pg_stat_get_tuples_fetched(c.oid) AS idx_tup_fetch,
pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins,
pg_stat_get_tuples_updated(c.oid) AS n_tup_upd,
pg_stat_get_tuples_deleted(c.oid) AS n_tup_del,
pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd,
pg_stat_get_live_tuples(c.oid) AS n_live_tup,
pg_stat_get_dead_tuples(c.oid) AS n_dead_tup,
pg_stat_get_mod_since_analyze(c.oid) AS n_mod_since_analyze,
pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum,
pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum,
pg_stat_get_last_analyze_time(c.oid) AS last_analyze,
pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze,
pg_stat_get_vacuum_count(c.oid) AS vacuum_count,
pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count,
pg_stat_get_analyze_count(c.oid) AS analyze_count,
pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count
FROM pg_class c
LEFT JOIN pg_index i ON c.oid = i.indrelid
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])
GROUP BY c.oid, n.nspname, c.relname;
postgres=# \d+ pg_statio_all_tables
View "pg_catalog.pg_statio_all_tables"
Column | Type | Modifiers | Storage | Description
-----------------+--------+-----------+---------+-------------
relid | oid | | plain |
schemaname | name | | plain |
relname | name | | plain |
heap_blks_read | bigint | | plain | -- pg_stat_get_blocks_fetched(c.oid) - pg_stat_get_blocks_hit(c.oid)
heap_blks_hit | bigint | | plain | -- pg_stat_get_blocks_hit(c.oid)
idx_blks_read | bigint | | plain | -- sum(pg_stat_get_blocks_fetched(i.indexrelid) - pg_stat_get_blocks_hit(i.indexrelid))
idx_blks_hit | bigint | | plain | -- sum(pg_stat_get_blocks_hit(i.indexrelid))
toast_blks_read | bigint | | plain | -- pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)
toast_blks_hit | bigint | | plain | -- pg_stat_get_blocks_hit(t.oid)
tidx_blks_read | bigint | | plain | -- sum(pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))
tidx_blks_hit | bigint | | plain | -- sum(pg_stat_get_blocks_hit(x.indexrelid))
View definition:
SELECT c.oid AS relid,
n.nspname AS schemaname,
c.relname,
pg_stat_get_blocks_fetched(c.oid) - pg_stat_get_blocks_hit(c.oid) AS heap_blks_read,
pg_stat_get_blocks_hit(c.oid) AS heap_blks_hit,
sum(pg_stat_get_blocks_fetched(i.indexrelid) - pg_stat_get_blocks_hit(i.indexrelid))::bigint AS idx_blks_read,
sum(pg_stat_get_blocks_hit(i.indexrelid))::bigint AS idx_blks_hit,
pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit,
sum(pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))::bigint AS tidx_blks_read,
sum(pg_stat_get_blocks_hit(x.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class c
LEFT JOIN pg_index i ON c.oid = i.indrelid
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
LEFT JOIN pg_index x ON t.oid = x.indrelid
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])
GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
以上几个统计字段的统计信息来自如下函数:
idx_scan | bigint | | plain | -- pg_stat_get_numscans(i.oid)
idx_tup_read | bigint | | plain | -- pg_stat_get_tuples_returned(i.oid)
idx_tup_fetch | bigint | | plain | -- pg_stat_get_tuples_fetched(i.oid)
idx_blks_read | bigint | | plain | -- pg_stat_get_blocks_fetched(i.oid) - pg_stat_get_blocks_hit(i.oid)
idx_blks_hit | bigint | | plain | -- pg_stat_get_blocks_hit(i.oid)
seq_scan | bigint | | plain | -- pg_stat_get_numscans(c.oid)
seq_tup_read | bigint | | plain | -- pg_stat_get_tuples_returned(c.oid)
idx_scan | bigint | | plain | -- sum(pg_stat_get_numscans(i.indexrelid))
idx_tup_fetch | bigint | | plain | -- sum(pg_stat_get_tuples_fetched(i.indexrelid)) + pg_stat_get_tuples_fetched(c.oid)
heap_blks_read | bigint | | plain | -- pg_stat_get_blocks_fetched(c.oid) - pg_stat_get_blocks_hit(c.oid)
heap_blks_hit | bigint | | plain | -- pg_stat_get_blocks_hit(c.oid)
idx_blks_read | bigint | | plain | -- sum(pg_stat_get_blocks_fetched(i.indexrelid) - pg_stat_get_blocks_hit(i.indexrelid))
idx_blks_hit | bigint | | plain | -- sum(pg_stat_get_blocks_hit(i.indexrelid))
toast_blks_read | bigint | | plain | -- pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)
toast_blks_hit | bigint | | plain | -- pg_stat_get_blocks_hit(t.oid)
tidx_blks_read | bigint | | plain | -- sum(pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))
tidx_blks_hit | bigint | | plain | -- sum(pg_stat_get_blocks_hit(x.indexrelid))
这些SQL函数对应的C函数如下:
postgres=# \df+ pg_stat_get_numscans
| Source code | Description
+----------------------+--------------------------------------------------
| pg_stat_get_numscans | statistics: number of scans done for table/index
postgres=# \df+ pg_stat_get_tuples_returned
| Source code | Description
+-----------------------------+----------------------------------------------
| pg_stat_get_tuples_returned | statistics: number of tuples read by seqscan
postgres=# \df+ pg_stat_get_tuples_fetched
| Source code | Description
+----------------------------+-------------------------------------------------
| pg_stat_get_tuples_fetched | statistics: number of tuples fetched by idxscan
postgres=# \df+ pg_stat_get_blocks_fetched
| Source code | Description
+----------------------------+--------------------------------------
| pg_stat_get_blocks_fetched | statistics: number of blocks fetched
postgres=# \df+ pg_stat_get_blocks_hit
| Source code | Description
+------------------------+---------------------------------------------
| pg_stat_get_blocks_hit | statistics: number of blocks found in cache
源码分析
以上5个C函数的源码分析
pg_stat_get_numscans
result = (int64) (tabentry->numscans);
pg_stat_get_tuples_returned
result = (int64) (tabentry->tuples_returned);
pg_stat_get_tuples_fetched
result = (int64) (tabentry->tuples_fetched);
pg_stat_get_blocks_fetched
result = (int64) (tabentry->blocks_fetched);
pg_stat_get_blocks_hit
result = (int64) (tabentry->blocks_hit);
他们都是返回以下数据结构中的某项计数器的值:
/* ----------
* PgStat_StatTabEntry The collector's data per table (or index)
* ----------
*/
typedef struct PgStat_StatTabEntry
{
Oid tableid;
PgStat_Counter numscans;
PgStat_Counter tuples_returned;
PgStat_Counter tuples_fetched;
PgStat_Counter tuples_inserted;
PgStat_Counter tuples_updated;
PgStat_Counter tuples_deleted;
PgStat_Counter tuples_hot_updated;
PgStat_Counter n_live_tuples;
PgStat_Counter n_dead_tuples;
PgStat_Counter changes_since_analyze;
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
TimestampTz vacuum_timestamp; /* user initiated vacuum */
PgStat_Counter vacuum_count;
TimestampTz autovac_vacuum_timestamp; /* autovacuum initiated */
PgStat_Counter autovac_vacuum_count;
TimestampTz analyze_timestamp; /* user initiated */
PgStat_Counter analyze_count;
TimestampTz autovac_analyze_timestamp; /* autovacuum initiated */
PgStat_Counter autovac_analyze_count;
} PgStat_StatTabEntry;
这些值是由pgstats进程来进行统计的
PgstatCollectorMain(int argc, char *argv[])
{
...
for (;;)
{
/* Clear any already-pending wakeups */
ResetLatch(&pgStatLatch);
...
/*
* O.K. - we accept this message. Process it.
*/
switch (msg.msg_hdr.m_type)
{
...
case PGSTAT_MTYPE_TABSTAT:
pgstat_recv_tabstat((PgStat_MsgTabstat *) &msg, len);
break;
...
/* ----------
* pgstat_recv_tabstat() -
*
* Count what the backend has done.
* ----------
*/
static void
pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
{
...
tabentry->numscans += tabmsg->t_counts.t_numscans;
tabentry->tuples_returned += tabmsg->t_counts.t_tuples_returned;
tabentry->tuples_fetched += tabmsg->t_counts.t_tuples_fetched;
tabentry->tuples_inserted += tabmsg->t_counts.t_tuples_inserted;
tabentry->tuples_updated += tabmsg->t_counts.t_tuples_updated;
tabentry->tuples_deleted += tabmsg->t_counts.t_tuples_deleted;
tabentry->tuples_hot_updated += tabmsg->t_counts.t_tuples_hot_updated;
tabentry->n_live_tuples += tabmsg->t_counts.t_delta_live_tuples;
tabentry->n_dead_tuples += tabmsg->t_counts.t_delta_dead_tuples;
tabentry->changes_since_analyze += tabmsg->t_counts.t_changed_tuples;
tabentry->blocks_fetched += tabmsg->t_counts.t_blocks_fetched;
tabentry->blocks_hit += tabmsg->t_counts.t_blocks_hit;
...
tabmsg->t_counts.? 则是通过如下macro进行改写的:
/* nontransactional event counts are simple enough to inline */
#define pgstat_count_heap_scan(rel) \
do { \
if ((rel)->pgstat_info != NULL) \
(rel)->pgstat_info->t_counts.t_numscans++; \
} while (0)
#define pgstat_count_heap_getnext(rel) \
do { \
if ((rel)->pgstat_info != NULL) \
(rel)->pgstat_info->t_counts.t_tuples_returned++; \
} while (0)
#define pgstat_count_heap_fetch(rel) \
do { \
if ((rel)->pgstat_info != NULL) \
(rel)->pgstat_info->t_counts.t_tuples_fetched++; \
} while (0)
#define pgstat_count_index_scan(rel) \
do { \
if ((rel)->pgstat_info != NULL) \
(rel)->pgstat_info->t_counts.t_numscans++; \
} while (0)
#define pgstat_count_index_tuples(rel, n) \
do { \
if ((rel)->pgstat_info != NULL) \
(rel)->pgstat_info->t_counts.t_tuples_returned += (n); \
} while (0)
#define pgstat_count_buffer_read(rel) \
do { \
if ((rel)->pgstat_info != NULL) \
(rel)->pgstat_info->t_counts.t_blocks_fetched++; \
} while (0)
#define pgstat_count_buffer_hit(rel) \
do { \
if ((rel)->pgstat_info != NULL) \
(rel)->pgstat_info->t_counts.t_blocks_hit++; \
} while (0)
取出每个macro对应的调用以及解释如下:
pgstat_count_index_scan(idx_oid)
统计指定索引的扫描次数,和扫描多少条记录无关,和node有个,一般看执行计划中,同一个索引如果在NODE中出现一次,那么就算扫描一次。
pgstat_count_index_tuples(rel, n)
统计从指定索引获取tid的条数
pgstat_count_heap_fetch
统计从使用指定索引的tid获取HEAP tuple的条数
pgstat_count_buffer_read
统计所有的buffer read次数, 包括在postgresql shared buffers中命中的,或者未在postgresql shared buffers中命中的。
pgstat_count_buffer_hit(rel)
统计在shared buffers中命中的buffer read次数。
pgstat_count_heap_scan(rel)
统计指定表的,全表扫描的次数,和返回的记录数无关,只和执行计划中的node相关,如果在plan中某个表只有一个seq scan的次数,则这条SQL执行一次时算一次。
pgstat_count_heap_getnext(rel)
指从全表扫描节点中扫描的记录数
梳理一下最初的统计字段和这些macro的对应关系:
pg_stat_all_indexes 的
idx_scan 对应 pgstat_count_index_scan(idx_oid) 产生的计数的累加
idx_tup_read 对应 pgstat_count_index_tuples(idx_oid, n) 产生的计数的累加
idx_tup_fetch 对应 pgstat_count_heap_fetch(idx_oid) 产生的计数的累加
pg_statio_all_indexes 的
idx_blks_read 对应 pgstat_count_buffer_read(idx_oid) 产生的计数的累加
idx_blks_hit 对应 pgstat_count_buffer_hit(idx_oid) 产生的计数的累加
pg_stat_all_tables 的
seq_scan 对应 pgstat_count_heap_scan(tbl_oid) 产生的计数的累加
seq_tup_read 对应 pgstat_count_heap_getnext(tbl_oid) 产生的计数的累加
idx_scan 对应 sum( 索引 pgstat_count_index_scan(tbl_oid) ) 产生的计数的累加
idx_tup_fetch 对应 sum( 索引 pgstat_count_index_tuples(idx_oid, n) ) + pgstat_count_heap_fetch(tbl_oid) 产生的计数的累加
pg_statio_all_tables 的
heap_blks_read 对应 pgstat_count_buffer_read(tbl_oid) - pgstat_count_buffer_hit(tbl_oid) 产生的计数的累加
heap_blks_hit 对应 pgstat_count_buffer_hit(tbl_oid) 产生的计数的累加
idx_blks_read 对应 sum( 索引 pgstat_count_buffer_read(idx_oid) - pgstat_count_buffer_hit(idx_oid) ) 产生的计数的累加
idx_blks_hit 对应 sum( pgstat_count_buffer_hit(idx_oid) ) 产生的计数的累加
toast_blks_read 对应 pgstat_count_buffer_read(toast_oid) - pgstat_count_buffer_hit(toast_oid) 产生的计数的累加
toast_blks_hit 对应 pgstat_count_buffer_hit(toast_oid) 产生的计数的累加
tidx_blks_read 对应 sum( 索引 pgstat_count_buffer_read(toast_idx_oid) - pgstat_count_buffer_hit(toast_idx_oid) ) 产生的计数的累加
tidx_blks_hit 对应 sum( pgstat_count_buffer_hit(toast_idx_oid) ) 产生的计数的累加
小结
pg_stat_all_indexes 的
idx_scan 索引被扫描了多少次,和扫描多少条记录无关,和node有关,一般看执行计划,同一个索引如果在NODE中出现一次,那么这条SQL执行时这个索引就算扫描一次。
idx_tup_read 从指定索引获取tid的条(tid是指向heap表数据块上某条记录的一个数据结构, 如(1,10)表示1号数据块的第10条记录)
idx_tup_fetch 从索引的tid获取HEAP tuple的记数(每扫一条tuple计1)
pg_statio_all_indexes 的
idx_blks_read 该索引的buffer read记数(每读一个block记1次), 包括在postgresql shared buffers中命中的,或者未在postgresql shared buffers中命中的。
idx_blks_hit 在postgresql shared buffers中命中的buffer read记数。
pg_stat_all_tables 的
seq_scan 全表扫描的次数
seq_tup_read 使用全表扫描扫过的tuple条数
idx_scan 该表的所有索引扫描次数总和
idx_tup_fetch 该表的所有 从索引的tid获取HEAP tuple的条数
pg_statio_all_tables 的
heap_blks_read 该表的buffer read记数, 包括在postgresql shared buffers中命中的,或者未在postgresql shared buffers中命中的。
heap_blks_hit 在postgresql shared buffers中命中的buffer read记数。
idx_blks_read 该表所有索引 所有buffer read记数(含命中和未命中)
idx_blks_hit 该表所有索引 在shared buffer命中的buffer read记数
toast_blks_read heap表对应的toast表统计字段
toast_blks_hit
tidx_blks_read heap表的索引对应的toast索引统计字段
tidx_blks_hit
希望能帮助到大家,还不理解就要打屁屁了。