PostgreSQL 10.0 preview 功能增强 - WAL一致性校验
背景
10.0 新增了一个DEBUG参数,用于检测recovery过程中,由于wal replay BUG或者备库的物理数据块异常导致的wal replay回放出来的块不正确的问题。
当产生脏页时,在wal记录中,可能有两种信息:
1. 只记录了数据变更的部分。
2. FULL PAGE,记录了整个数据块。(发生时机:当开启了full page write参数,checkpoint后第一次变更的块)
在PostgreSQL进入恢复过程(或者standby)时,PostgreSQL startup进程会从WAL日志中读取wal record与数据文件的块进行回放组合,生成变更后的块。如果WAL是full page,则直接使用FULL PAGE。回放后的块覆盖老的数据块,实现恢复的目的。
但是有可能因为各种原因,导致回放后的数据块是不对的,比如前面提到的原因:(由于wal replay BUG或者备库的物理数据块异常导致的wal replay回放出来的块不正确)。
PostgreSQL 10.0新增的wal_consistency_checking参数,可以用于发现这种问题。
为什么PostgreSQL 10.0要加这个参数呢?
因为PostgreSQL的扩展功能极强,已经支持了wal generic record,也就是说,用户可以自定义往wal中写数据,开放这样的功能,有利于开发者调试自己扩展的wal generic writer的正确性。
Add WAL consistency checking facility.
author Robert Haas <rhaas@postgresql.org>
Thu, 9 Feb 2017 04:45:30 +0800 (15:45 -0500)
committer Robert Haas <rhaas@postgresql.org>
Thu, 9 Feb 2017 04:45:30 +0800 (15:45 -0500)
When the new GUC wal_consistency_checking is set to a non-empty value,
it triggers recording of additional full-page images, which are
compared on the standby against the results of applying the WAL record
(without regard to those full-page images). Allowable differences
such as hints are masked out, and the resulting pages are compared;
any difference results in a FATAL error on the standby.
Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki
Linnakangas. Extensively reviewed and revised by Michael Paquier and
by me, with additional reviews and comments from Amit Kapila, Álvaro
Herrera, Simon Riggs, and Peter Eisentraut.
wal_consistency_checking用法
wal_consistency_checking 参数可以设置为如下值:
all, heap, heap2, btree, hash, gin, gist, sequence, spgist, brin, and generic.
参数内容表示当主库产生WAL对应的resource manger record时,自动将当时脏页的FULL PAGE写入WAL中。在startup进程回放日志时,会比较 “这个FULL PAGE” 与 “wal partial record+data page replay出来的PAGE” 是否一致,如果不一致,说明WAL回放有问题。startup 进程将会fatal,停止恢复。
对于正常的差异(例如hint bit)是不会报错的。
《为什么PostgreSQL查询语句也可能产生 xlog, 并且可能对buffer有write操作 ? hint bits》
wal_consistency_checking (string)
This parameter is intended to be used to check for bugs in the WAL redo routines.
When enabled, full-page images of any buffers modified in conjunction with the WAL record are added to the record.
If the record is subsequently replayed, the system will first apply each record and then test whether the buffers modified by the record match the stored images.
In certain cases (such as hint bits), minor variations are acceptable, and will be ignored.
Any unexpected differences will result in a fatal error, terminating recovery.
The default value of this setting is the empty string, which disables the feature.
It can be set to all to check all records, or to a comma-separated list of resource managers to check only records originating from those resource managers.
Currently, the supported resource managers are heap, heap2, btree, hash, gin, gist, sequence, spgist, brin, and generic. Only superusers can change this setting.
这个patch的讨论,详见邮件组,本文末尾URL。
PostgreSQL社区的作风非常严谨,一个patch可能在邮件组中讨论几个月甚至几年,根据大家的意见反复的修正,patch合并到master已经非常成熟,所以PostgreSQL的稳定性也是远近闻名的。
参考
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a507b86900f695aacc8d52b7d2cfcb65f58862a2
https://www.postgresql.org/docs/devel/static/runtime-config-developer.html