PostgreSQL WAL replay 加速(datapage preload) - 恢复加速, 备库延迟优化

1 minute read


PostgreSQL 数据库恢复时,读取wal,如果当前wal page不是full page,则从这笔wal record对应的data file中拿到datapage,与wal record合并,覆盖对应data page。持续读取wal 实现恢复的目的。

需要注意wal是顺序读写,而data file可能是离散读写(大部分oltp业务都是如此),WAL的目的就是要将离散的DATA FILE写变成顺序的IO。

那么问题来了,恢复时,data file就变成了离散的读操作。

在主库WAL产生量非常巨大时,standby recovery(replay)将会导致与主库的延迟,通常wal write不会有大的延迟(因为WAL是顺序写),replay的延迟主要是recovery时data file的离散读导致。


DBAs struggling with replication lag is nothing new. A large volume of data or write IO comes into the system and the followers struggle to keep up. pg_prefaulter was written to eliminate replication lag on followers and also improves database startup times.

If your database is under 24/7 write workload, has periodic replication lag that is unacceptable, or want to reduce the startup time of PostgreSQL, pg_prefaulter will help all three of these scenarios.

At Joyent we use PostgreSQL as the metadata tier for our object storage system, Manta. This talk chronicles how we identified our source of replication lag and why we found it necessary to write pgprefaulter. pgprefaulter is a sidecar process for PostgreSQL written in Go that pre-fetches pages from disk and loads them into the operating system’s filesystem cache before PostgreSQL requests them during the startup and application of WAL records.

Additionally, this talk also discusses:

the design considerations that went into writing pg_prefaulter
the various forms of “replication lag” in PostgreSQL (WAL receive lag, WAL apply lag, and checkpoint lag)
pathologies that came from deploying pg_prefaulter
why we now consider pg_prefaulter mission critical software for our production databases
tips for deploying pg_prefaulter


在备库接收到WAL后,解析WAL,并提前将需要用到的DATA FILE PAGE加载到OS PAGE CACHE中,在postgresql startup process replay wal时,读取需要的data page时,从os cache读取,从而降低replay时因为读取data page带来的IO等待。



pg_prefaulter: Scaling WAL Performance (application/pdf - 2.6 MB)

Flag Counter

digoal’s 大量PostgreSQL文章入口