PostgreSQL Improve fsm & vm output

4 minute read

背景

以前写过一篇关于使用pageinspect和pgstattuple来精确计算数据库对象膨胀量的文章.

《PostgreSQL 如何精确计算表膨胀(fsm,数据块layout讲解) - PostgreSQL table exactly bloat monitor use freespace map data》

用到的就是freespacemap的记录的字节流信息(256份). 有兴趣的朋友可以参看上文.

本文要讲的是pg_freespacemap和pgstattuple模块的新增功能

这个补丁涉及两个模块, pg_freespacemap和pgstattuple.

补丁过程如下 :

cd postgresql-9.3beta2  
wget http://www.postgresql.org/message-id/attachment/29690/pgstattuple_vm_v2.patch  
wget http://www.postgresql.org/message-id/attachment/29689/pg_freespace_vm_v3.patch  
[root@db-172-16-3-33 postgresql-9.3beta2]# patch -p1 < ./pg_freespace_vm_v3.patch   
patching file contrib/pg_freespacemap/Makefile  
patching file contrib/pg_freespacemap/expected/pg_freespacemap.out  
patching file contrib/pg_freespacemap/pg_freespacemap--1.0--1.1.sql  
patching file contrib/pg_freespacemap/pg_freespacemap--1.0.sql  
patching file contrib/pg_freespacemap/pg_freespacemap--1.1.sql  
patching file contrib/pg_freespacemap/pg_freespacemap.c  
patching file contrib/pg_freespacemap/pg_freespacemap.control  
patching file contrib/pg_freespacemap/sql/pg_freespacemap.sql  
[root@db-172-16-3-33 postgresql-9.3beta2]# patch -p1 < ./pgstattuple_vm_v2.patch   
patching file contrib/pgstattuple/Makefile  
patching file contrib/pgstattuple/expected/pgstattuple.out  
patching file contrib/pgstattuple/pgstattuple--1.0--1.1.sql  
patching file contrib/pgstattuple/pgstattuple--1.0--1.2.sql  
patching file contrib/pgstattuple/pgstattuple--1.1--1.2.sql  
patching file contrib/pgstattuple/pgstattuple--1.1.sql  
patching file contrib/pgstattuple/pgstattuple--1.2.sql  
patching file contrib/pgstattuple/pgstattuple.c  
patching file contrib/pgstattuple/pgstattuple.control  
patching file contrib/pgstattuple/sql/pgstattuple.sql  
  
[root@db-172-16-3-33 postgresql-9.3beta2]# gmake world  
[root@db-172-16-3-33 postgresql-9.3beta2]# gmake install-world  
  
pg93@db-172-16-3-33-> pg_ctl start  
digoal=# create extension pg_freespacemap;  
CREATE EXTENSION  
digoal=# create extension pgstattuple;  
CREATE EXTENSION  

下面测试一下打完补丁后的函数.

一. pg_freespacemap的更改体现在函数层面如下 :

+-- complain if script is sourced in psql, rather than via CREATE EXTENSION  
+\echo Use "CREATE EXTENSION pg_freespacemap" to load this file. \quit  
+  
+-- Register the C function.  
+CREATE FUNCTION pg_freespace(regclass, bigint)  
+RETURNS int2  
+AS 'MODULE_PATHNAME', 'pg_freespace'  
+LANGUAGE C STRICT;  
+  
+CREATE FUNCTION pg_is_all_visible(regclass, bigint)  
+RETURNS bool  
+AS 'MODULE_PATHNAME', 'pg_is_all_visible'  
+LANGUAGE C STRICT;  
+  
+-- pg_freespace shows the recorded space avail at each block in a relation  
+CREATE FUNCTION  
+  pg_freespace(rel regclass, blkno OUT bigint, avail OUT int2)  
+RETURNS SETOF RECORD  
+AS $$  
+  SELECT blkno, pg_freespace($1, blkno) AS avail  
+  FROM generate_series(0, pg_relation_size($1) / current_setting('block_size')::bigint - 1) AS blkno;  
+$$  
+LANGUAGE SQL;  
+  
+CREATE FUNCTION  
+  pg_freespace_with_vminfo(rel regclass, blkno OUT bigint, avail OUT int2, is_all_visible OUT boolean)  
+RETURNS SETOF RECORD  
+AS $$  
+  SELECT blkno, pg_freespace($1, blkno) AS avail, pg_is_all_visible($1, blkno) as is_all_visible  
+  FROM generate_series(0, pg_relation_size($1) / current_setting('block_size')::bigint - 1) AS blkno;  
+$$  
+LANGUAGE SQL;  

测试 :

查看数据对对象指定的数据块中的记录是否所有记录对所有事务可见.

创建测试表 :

digoal=# create table t1(id int, info text);  
CREATE TABLE  
digoal=# insert into t1 select generate_series(1,1000),md5(random()::text);  
INSERT 0 1000  

此时查询visible的话, 所有的数据块都是false的, 因为还没有建立vm文件.

digoal=# select * from pg_is_all_visible('t1'::regclass, 1);  
 pg_is_all_visible   
-------------------  
 f  
(1 row)  
digoal=# select pg_relation_filepath('t1'::regclass);  
 pg_relation_filepath   
----------------------  
 base/16384/16410  
(1 row)  
digoal=# select * from pg_stat_file('base/16384/16410_vm');  
ERROR:  could not stat file "base/16384/16410_vm": No such file or directory  

但是注意fsm文件已经有了.

digoal=# select * from pg_stat_file('base/16384/16410_fsm');  
 size  |         access         |      modification      |         change         | creation | isdir   
-------+------------------------+------------------------+------------------------+----------+-------  
 24576 | 2013-07-19 15:52:40+08 | 2013-07-19 15:52:40+08 | 2013-07-19 15:52:40+08 |          | f  
(1 row)  
digoal=# vacuum analyze t1;  
VACUUM  

vacuum后, vm文件将被建立.

digoal=# select * from pg_stat_file('base/16384/16410_vm');  
 size |         access         |      modification      |         change         | creation | isdir   
------+------------------------+------------------------+------------------------+----------+-------  
 8192 | 2013-07-19 15:53:34+08 | 2013-07-19 15:53:34+08 | 2013-07-19 15:53:34+08 |          | f  
(1 row)  

查看数据库对象的数据块范围

digoal=# select min(ctid),max(ctid) from t1;  
  min  |  max     
-------+--------  
 (0,1) | (8,40)  
(1 row)  

查看指定的BLOCK_ID的可见性, 现在vm中记录了的块信息为0-8. 都有了.

digoal=# select * from pg_is_all_visible('t1'::regclass, 8);  
 pg_is_all_visible   
-------------------  
 t  
(1 row)  
digoal=# select * from pg_is_all_visible('t1'::regclass, 0);  
 pg_is_all_visible   
-------------------  
 t  
(1 row)  

删除一条记录, 然后查看它的可见性.

digoal=# delete from t1 where ctid='(0,1)';  
DELETE 1  

此时block 0为false, 因为有一条脏数据是需要vacuum的. 这个块信息将从vm信息中抹除.

digoal=# select * from pg_is_all_visible('t1'::regclass, 0);  
 pg_is_all_visible   
-------------------  
 f  
(1 row)  

vacuum后, 这个block_id又变得可见了(换言之又在vm这种了.).

digoal=# vacuum analyze t1;  
VACUUM  
digoal=# select * from pg_is_all_visible('t1'::regclass, 0);  
 pg_is_all_visible   
-------------------  
 t  
(1 row)  

vm文件的设计初衷也包含了减轻vacuum负担. 因为在vm中记录的数据块vacuum进程可以不用理会(除了某些特殊情况, 例如prevent xid wrap).

pg_freespace函数则可以用来查询数据块包含多少剩余空间.

digoal=# select * from pg_freespace('t1');  
 blkno | avail   
-------+-------  
     0 |    64  
     1 |     0  
     2 |     0  
     3 |     0  
     4 |     0  
     5 |     0  
     6 |     0  
     7 |     0  
     8 |  5440  
(9 rows)  

以上结果表明0,8号数据块的剩余空间是多少字节.

二. pgstattuple的更改体现在函数层面如下 :

+-- complain if script is sourced in psql, rather than via CREATE EXTENSION  
+\echo Use "CREATE EXTENSION pgstattuple" to load this file. \quit  
+  
+CREATE FUNCTION pgstattuple(IN relname text,  
+    OUT table_len BIGINT,              -- physical table length in bytes  
+    OUT tuple_count BIGINT,            -- number of live tuples  
+    OUT tuple_len BIGINT,              -- total tuples length in bytes  
+    OUT tuple_percent FLOAT8,          -- live tuples in %  
+    OUT dead_tuple_count BIGINT,       -- number of dead tuples  
+    OUT dead_tuple_len BIGINT,         -- total dead tuples length in bytes  
+    OUT dead_tuple_percent FLOAT8,     -- dead tuples in %  
+    OUT free_space BIGINT,             -- free space in bytes  
+    OUT free_percent FLOAT8,           -- free space in %  
+    OUT all_visible_percent FLOAT8)            -- all visible blocks in %  
+AS 'MODULE_PATHNAME', 'pgstattuple'  
+LANGUAGE C STRICT;  
+  
+CREATE FUNCTION pgstattuple(IN reloid oid,  
+    OUT table_len BIGINT,              -- physical table length in bytes  
+    OUT tuple_count BIGINT,            -- number of live tuples  
+    OUT tuple_len BIGINT,              -- total tuples length in bytes  
+    OUT tuple_percent FLOAT8,          -- live tuples in %  
+    OUT dead_tuple_count BIGINT,       -- number of dead tuples  
+    OUT dead_tuple_len BIGINT,         -- total dead tuples length in bytes  
+    OUT dead_tuple_percent FLOAT8,     -- dead tuples in %  
+    OUT free_space BIGINT,             -- free space in bytes  
+    OUT free_percent FLOAT8,           -- free space in %  
+    OUT all_visible_percent FLOAT8)            -- all visible blocks in %  
+AS 'MODULE_PATHNAME', 'pgstattuplebyid'  
+LANGUAGE C STRICT;  
+  
+CREATE FUNCTION pgstatindex(IN relname text,  
+    OUT version INT,  
+    OUT tree_level INT,  
+    OUT index_size BIGINT,  
+    OUT root_block_no BIGINT,  
+    OUT internal_pages BIGINT,  
+    OUT leaf_pages BIGINT,  
+    OUT empty_pages BIGINT,  
+    OUT deleted_pages BIGINT,  
+    OUT avg_leaf_density FLOAT8,  
+    OUT leaf_fragmentation FLOAT8)  
+AS 'MODULE_PATHNAME', 'pgstatindex'  
+LANGUAGE C STRICT;  
+  
+CREATE FUNCTION pg_relpages(IN relname text)  
+RETURNS BIGINT  
+AS 'MODULE_PATHNAME', 'pg_relpages'  
+LANGUAGE C STRICT;  
+  
+CREATE FUNCTION pgstatginindex(IN relname regclass,  
+    OUT version INT4,  
+    OUT pending_pages INT4,  
+    OUT pending_tuples BIGINT)  
+AS 'MODULE_PATHNAME', 'pgstatginindex'  
+LANGUAGE C STRICT;  

测试 :

digoal=# select * from pgstattuple('t1');  
-[ RECORD 1 ]-------+------  
table_len           | 73728  
tuple_count         | 999  
tuple_len           | 60939  
tuple_percent       | 82.65  
dead_tuple_count    | 0  
dead_tuple_len      | 0  
dead_tuple_percent  | 0  
free_space          | 5540  
free_percent        | 7.51  
all_visible_percent | 100  

增加了all_visible_percent这个输出, 代表百分之多少的块是在VM中的, 越多的话vacuum这个表将越快. 因为可以忽略掉很多数据块的检测.

接下来执行一个删除动作. 将把0号块从vm中抹除.

digoal=# delete from t1 where ctid='(0,2)';  
DELETE 1  
digoal=# select * from pgstattuple('t1');  
-[ RECORD 1 ]-------+------  
table_len           | 73728  
tuple_count         | 998  
tuple_len           | 60878  
tuple_percent       | 82.57  
dead_tuple_count    | 1  
dead_tuple_len      | 61  
dead_tuple_percent  | 0.08  
free_space          | 5540  
free_percent        | 7.51  
all_visible_percent | 88.89  

9个块中有8个块是all visible的. 所以all_visible_percent是88.89;

digoal=# select 8/9.0;  
-[ RECORD 1 ]--------------------  
?column? | 0.88888888888888888889  

参考

1. http://www.postgresql.org/message-id/flat/20130614.174415.66698858.horiguchi.kyotaro@lab.ntt.co.jp#20130614.174415.66698858.horiguchi.kyotaro@lab.ntt.co.jp

2. 《PostgreSQL 如何精确计算表膨胀(fsm,数据块layout讲解) - PostgreSQL table exactly bloat monitor use freespace map data》

Flag Counter

digoal’s 大量PostgreSQL文章入口