PostgreSQL Failover slot - 支持将slot信息发送给物理备库

2 minute read

背景

PostgreSQL主库创建的slot(物理或逻辑的都算),在备库是否存在?

目前的版本,PG在主库创建的SLOT,备库是没有的(不会通过流复制协议复制到备库)

(但是,2ND提出的failover slot,就是来解决这个问题的,在主库创建的slot,会通过流复制协议,发送给物理备库。)

测试

主库

1、创建slot

postgres=# select pg_create_physical_replication_slot('a');  
 pg_create_physical_replication_slot   
-------------------------------------  
 (a,)  
(1 row)  

2、查看SLOT信息

postgres=# select * from pg_replication_slots;  
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn   
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------  
 a         |        | physical  |        |          | f         | f      |            |      |              |             |   
(1 row)  

3、查看slot在PGDATA中的信息

cd $PGDATA  
  
cd pg_replslot/  
ll  
total 4.0K  
drwx------ 2 postgres postgres 4.0K May 16 10:46 a  

备库

备库,pg_replslot这个目录是空的。

postgres=# select * from pg_replication_slots;  
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn   
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------  
(0 rows)  

主库创建了SLOT后,如果将备库recovery.conf配置如下

primary_slot_name = 'a'  

到主库查看,可以看到,slot激活,并开始记录restart_lsn

postgres=# select * from pg_replication_slots;  
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn  | confirmed_flush_lsn   
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+--------------+---------------------  
 a         |        | physical  |        |          | f         | t      |       3670 |      |              | 118/EC000060 |   
(1 row)  

为什么主库的slot信息没有同步给备库呢?

1、首先,目前slot的信息没有记录WAL,不会传递给备库。

2、另一方面,备库也可以创建slot,备库创建的SLOT与主库创建的SLOT毫无关系,自己记录自己的restart_lsn。

postgres=# select pg_create_physical_replication_slot('a');  
 pg_create_physical_replication_slot   
-------------------------------------  
 (a,)  
(1 row)  
  
postgres=# select * from pg_replication_slots;  
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn   
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------  
 a         |        | physical  |        |          | f         | f      |            |      |              |             |   
(1 row)  

slot信息的主备同步有什么应用场景呢?

1、HA后,使用slot的应用,不受影响。

例如,逻辑订阅、使用slot的逻辑备库

2nd提出了一个概念slot failover。

Failover slots

当使用pg_create_logical_replication_slot创建slot时,如果指定failover参数为true,则表示创建的是failover slot,那么这个slot的信息会同步给备库。同时备库上也可以看到这个slot的信息。

对于主库的逻辑订阅者,当上游发生了HA时,逻辑订阅者可以连到新的主库,并继续接收变更,不会导致逻辑订阅的中断。

Failover slots address these issues by synchronizing slot creation, deletion and position updates to replica servers. This is done through the WAL stream like everything else. If a slot is created as a failover slot using the new failover boolean option to pg_create_logical_replication_slot then its creation is logged in WAL. So are subsequent position updates when the client tells the server it can free no-longer-needed resources.

If the master fails and a standby is promoted, logical decoding clients can just reconnect to the standby and carry on as if nothing had happened. If a DNS update or IP swap is done along with the promotion the logical decoding client might not even notice it’s anything more than a master restart.

2nd的设计中,physical和logical 都可以创建failover slot,

A physical failover slot can be created, just like a logical failover slot.

physical slot

https://www.postgresql.org/docs/devel/static/warm-standby.html#STREAMING-REPLICATION-SLOTS-MANIPULATION

1、防止WAL被删,SLOT restart_lsn之后的WAL文件都不会删除。(wal_keep_segments则是一个人为设定的边界,slot是自动设定的边界(无上限),所以使用slot并且下游断开后,可能导致数据库的WAL堆积爆满)

2、防止备库出现recovery conflict,当备库配置了primary_slot_name,那么主库会保留备库需要查询的数据版本(注意,这样可能引起主库膨胀,vacuum worker频繁的扫描table,CPU升高,原理如下:)。

《PostgreSQL物理”备库”的哪些操作或配置,可能影响”主库”的性能、垃圾回收、IO波动》

logical slot

用于逻辑订阅

注意

2nd的failover slot设计中,仅仅适用于物理流复制的主备节点。逻辑复制的主备节点,不支持failover slot。

意思是说主库创建的failover slot,这个slot的信息可以复制到物理备库,但是不会复制到它的逻辑备库。

Failover slots do not aid logical replication solutions in supporting failover to a logical replica. They exist to allow logical replication to follow a physical failover.

Supporting failover to a logical replica is a completely unrelated matter. There are a number of limitations in PostgreSQL core that are relevant to it, like the currently missing support for logical decoding of sequence position advances. Failover slots will neither help nor hinder there. What they do is provide a way to integrate logical replication into HA solutions now, into existing mature and established infrastructure patterns.

小结

1、目前的PG版本(包括PG 11),暂时不支持failover slot,也就是说slot在主库,备库是独立的。当主备发送HA,使用slot的应用,无法漂移到新的主库。

2、2nd提出了failover slot的概念,将来物理备库可以接收上游主库创建的failover slot的信息,实现SLOT在物理主备之间的漂移 。

参考

https://www.postgresql.org/docs/devel/static/functions-admin.html

https://wiki.postgresql.org/wiki/Failover_slots

https://blog.2ndquadrant.com/failover-slots-postgresql/

《PostgreSQL 10.0 preview 功能增强 - 备库支持逻辑订阅,订阅支持主备漂移了》

《PostgreSQL物理”备库”的哪些操作或配置,可能影响”主库”的性能、垃圾回收、IO波动》

Flag Counter

digoal’s 大量PostgreSQL文章入口