PostgreSQL 逻辑复制插件 UDR, 可以愉快的玩类似MySQL的binlog复制了

5 minute read

背景

PostgreSQL 的流复制素来以高效,实时,稳定著称;

为企业解决了很多问题,例如容灾,备份,HA,读写分离等等。

但是流复制有一个无法克服的弊端,下游节点只能做到只读,并且只能复制整个集群(使用walbouncer可以做到基于表空间或库级别的物理流复制)。

http://www.cybertec.at/en/products/walbouncer-enterprise-grade-partial-replication/

如果用户确实有表级或行级的复制需求,我们不得不使用其他手段来实施,例如londiste3, dblink, trigger, bucardo, slony-I等。

这些插件或工具是基于触发器的,所以对上游节点的性能影响比较大,而且复制效率一般般。

PostgreSQL社区一直在努力将逻辑复制加入到PG的内核中,同样使用的是XLOG,从XLOG中解出row,在下游节点回放。有点类似于MySQL的binlog复制方案。

在逻辑复制加入PostgreSQL内核代码前(预计9.6的版本可能会加入),用户可以使用2nd提供的bdr插件来实现逻辑复制。

如果做单向的复制,使用9.4或以上的PostgreSQL版本即可,而如果要使用双向复制(多主),则需要使用2nd提供的PostgreSQL版本。

地址:

https://github.com/2ndQuadrant/bdr

本文以单向复制为例,即UDR,讲解一下这个插件的使用。

插件部署

下载插件,我们需要的是bdr-plugin的稳定分支。

# git clone -b bdr-plugin/REL0_9_STABLE git://git.postgresql.org/git/2ndquadrant_bdr.git bdr-plugin  

安装UDR插件

# export PATH=/opt/pgsql/bin:$PATH  
# cd bdr-plugin  
# ./autogen.sh  
# ./configure BUILDING_UDR=1  
# make; make install  

修改BUG

# cd /opt/pgsql/share/extension  
[root@digoal extension]# cat bdr.control |grep default_version  
default_version = '0.9.2.0'  
  
# vi bdr--0.9.2.0.sql   
-- 注释掉这行,应该是bdr的BUG,这个函数依赖的C函数在2nd改版的postgresql下面。  
-- CREATE OR REPLACE FUNCTION bdr.bdr_internal_sequence_reset_cache(seq regclass)  
-- RETURNS void LANGUAGE c AS 'MODULE_PATHNAME' STRICT;  

配置

配置上游节点

$ vi postgresql.conf  
listen_addresses='0.0.0.0'  
port=1921  
max_connections=100  
unix_socket_directories='.'  
ssl=on  
ssl_ciphers='EXPORT40'  
shared_buffers=512MB  
huge_pages=try  
max_prepared_transactions=0  
max_stack_depth=100kB  
dynamic_shared_memory_type=posix  
max_files_per_process=500  
shared_preload_libraries='bdr'  
max_worker_processes=8  
wal_level=logical  
fsync=off  
synchronous_commit=off  
wal_sync_method=open_datasync  
full_page_writes=off  
wal_log_hints=off  
wal_buffers=16MB  
wal_writer_delay=10ms  
checkpoint_segments=8  
archive_mode=off  
archive_command='/bin/date'  
max_wal_senders=10  
max_replication_slots=10  
hot_standby=on  
wal_receiver_status_interval=1s  
hot_standby_feedback=off  
enable_bitmapscan=on  
enable_hashagg=on  
enable_hashjoin=on  
enable_indexscan=on  
enable_material=on  
enable_mergejoin=on  
enable_nestloop=on  
enable_seqscan=on  
enable_sort=on  
enable_tidscan=on  
log_destination='csvlog'  
logging_collector=on  
log_directory='pg_log'  
log_truncate_on_rotation=on  
log_rotation_size=10MB  
log_checkpoints=on  
log_connections=on  
log_disconnections=on  
log_duration=off  
log_error_verbosity=verbose  
log_line_prefix='%i  
log_statement='none'  
log_timezone='PRC'  
autovacuum=on  
log_autovacuum_min_duration=0  
autovacuum_vacuum_scale_factor=0.0002  
autovacuum_analyze_scale_factor=0.0001  
datestyle='iso,  
timezone='PRC'  
lc_messages='C'  
lc_monetary='C'  
lc_numeric='C'  
lc_time='C'  
default_text_search_config='pg_catalog.english'  
bdr.conflict_logging_include_tuples=true  
bdr.log_conflicts_to_table=true  
bdr.temp_dump_directory='pg_bdr_temp_dump_dir'  
  
$ vi pg_hba.conf  
# "local" is for Unix domain socket connections only  
local   all             all                                     trust  
# IPv4 local connections:  
host    all             all             127.0.0.1/32            trust  
# IPv6 local connections:  
#host    all             all             ::1/128                 trust  
# Allow replication connections from localhost, by a user with the  
# replication privilege.  
local   replication     postgres                                trust  
host    replication     postgres 127.0.0.1/32            trust  

配置下游节点

1. 只有postgresql.conf中配置的监听端口不一样,其他一样。
2. 创建逻辑备份目录。在下游节点初始化订阅时,需要用来存储从上游节点dump的整个被订阅的数据库的数据,所以这个目录的空间要足够大。

mkdir $PGDATA/pg_bdr_temp_dump_dir  

开始使用

启动数据库

假设我的上游节点是1921端口,下游节点是1922端口。

pg_ctl start -D /data01/pgdata_1921  
pg_ctl start -D /data01/pgdata_1922  

在上游节点,我有一个数据库为up,我需要将这个数据库复制到下游节点的数据库down中。

创建上游数据库,并且在up库创建bdr扩展。

postgres@digoal-> psql -h 127.0.0.1 -p 1921  
psql (9.4.4)  
Type "help" for help.  
postgres=# create database up;  
CREATE DATABASE  
postgres=# \c up  
You are now connected to database "up" as user "postgres".  
up=# create table tb(id int,info text);  
CREATE TABLE  
up=# insert into tb select generate_series(1,100);  
INSERT 0 100  
up=# create extension btree_gist;  
CREATE EXTENSION  
up=# create extension bdr;  
CREATE EXTENSION  

创建测试表,测试数据类型,测试函数,测试视图

postgres=# \c up  
You are now connected to database "up" as user "postgres".  
up=# create table t1(id int primary key,info text);  
CREATE TABLE  
up=# create or replace function f1() returns void as $$  
  declare  
  begin  
     raise notice '%', now();  
   end;  
   $$ language plpgsql;  
CREATE FUNCTION  
up=# create view v1 as select count(*) as cnt from t1;  
CREATE VIEW  
up=# create type dt as (c1 int,c2 int,c3 int);  
CREATE TYPE  
up=# insert into t1 select generate_series(1,100);  
INSERT 0 100  
up=# create table t2(id int,c1 dt);  
CREATE TABLE  
up=# insert into t2 values (1,'(1,1,1)');  
INSERT 0 1  
up=# insert into t2 values (2,'(1,1,1)');  
INSERT 0 1  
up=# insert into t2 values (2,'(1,1,1)');  
INSERT 0 1  
up=# insert into t2 values (2,'(1,1,1)');  
INSERT 0 1  
up=# insert into t2 values (2,'(1,1,1)');  
INSERT 0 1  

创建bdr扩展后,新建的表会自动添加TRUNCATE触发器

up=# \d t1  
      Table "public.t1"  
 Column |  Type   | Modifiers   
--------+---------+-----------  
 id     | integer | not null  
 info   | text    |   
Indexes:  
    "t1_pkey" PRIMARY KEY, btree (id)  
Triggers:  
    truncate_trigger AFTER TRUNCATE ON t1 FOR EACH STATEMENT EXECUTE PROCEDURE bdr.queue_truncate()  

创建下游节点的数据库down,同时也在这个数据库中创建bdr扩展。

postgres@digoal-> psql -h 127.0.0.1 -p 1922  
psql (9.4.4)  
Type "help" for help.  
postgres=# create database down;  
CREATE DATABASE  
postgres=# \c down  
You are now connected to database "down" as user "postgres".  
down=# create extension btree_gist;  
CREATE EXTENSION  
down=# create extension bdr;  
CREATE EXTENSION  
down=# create database up;  -- 务必创建哦  

为什么在下游节点还需要创建一个up库(虽然我们不是将数据订阅到up库),但是没有这个库,还原会报错,例如:

这显然是个BUG。

Dumping remote database "hostaddr=127.0.0.1 port=1921 dbname=up user=postgres fallback_application_name='bdr (6203675445083668497,1,16385,): init_replica dump'" with 1 concurrent workers to "pg_bdr_temp_dump_dir/postgres-bdr-000C837A-1.8393"  
Restoring dump to local DB "hostaddr=127.0.0.1 port=1922 dbname=down user=postgres fallback_application_name='bdr (6203675445083668497,1,16385,): init_replica restore' options='-c bdr.do_not_replicate=on -c bdr.permit_unsafe_ddl_commands=on -c bdr.skip_ddl_replication=on -c bdr.skip_ddl_locking=on'" with 1 concurrent workers from "pg_bdr_temp_dump_dir/postgres-bdr-000C837A-1.8393"  
pg_restore: [archiver (db)] Error while PROCESSING TOC:  
pg_restore: [archiver (db)] Error from TOC entry 3265; 1262 16385 SECURITY LABEL up postgres  
pg_restore: [archiver (db)] could not execute query: ERROR:  database "up" does not exist  
    Command was: SECURITY LABEL FOR bdr ON DATABASE up IS '{ "bdr" : true }';  
  
pg_restore to hostaddr=127.0.0.1 port=1922 dbname=down user=postgres fallback_application_name='bdr (6203675445083668497,1,16385,): init_replica restore' options='-c bdr.do_not_replicate=on -c bdr.permit_unsafe_ddl_commands=on -c bdr.skip_ddl_replication=on -c bdr.skip_ddl_locking=on' failed, aborting  

在上游节点开启一个更新的压力测试,以测试在上游节点有DML操作时可以复制数据。相互不干扰。

postgres@digoal-> vi t.sql  
\setrandom id 1 100  
update t1 set info=now()::text where id=:id;  
  
postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./t.sql -c 8 -j 8 -T 10000 up  
progress: 1.0 s, 30818.6 tps, lat 0.243 ms stddev 0.414  
progress: 2.0 s, 32295.2 tps, lat 0.246 ms stddev 0.328  
......  

在下游节点定于上游节点的up数据库。

postgres@digoal-> psql -h 127.0.0.1 -p 1922 down  
psql (9.4.4)  
Type "help" for help.  
down=# select bdr.bdr_subscribe(local_node_name:='down_1922', subscribe_to_dsn:='hostaddr=127.0.0.1 port=1921 dbname=up user=postgres', node_local_dsn:='hostaddr=127.0.0.1 port=1922 dbname=down user=postgres');  
 bdr_subscribe   
---------------  
   
(1 row)  

查看订阅状态

down=# select * from bdr.bdr_nodes;  
     node_sysid      | node_timeline | node_dboid | node_status |      node_name       |                     node_local_dsn                     |                  node_init_from_dsn                    
---------------------+---------------+------------+-------------+----------------------+--------------------------------------------------------+------------------------------------------------------  
 6177143025216388117 |            10 |      70522 | r           | down_1922            |                                                        |   
 6177143025216388117 |            10 |      81996 | i           | down_1922-subscriber | hostaddr=127.0.0.1 port=1922 dbname=down user=postgres | hostaddr=127.0.0.1 port=1921 dbname=up user=postgres  
(2 rows)  
i表示正在初始化.  

在上游节点可以查看到对应的slot已经被创建了

postgres=# select * from pg_replication_slots ;  
                slot_name                 | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn   
------------------------------------------+--------+-----------+--------+----------+--------+------+--------------+-------------  
 bdr_70522_6177143025216388117_10_83065__ | bdr    | logical   |  70522 | up       | t      |      |    661974801 | 2B/22A50A70  
(1 row)  
  
postgres=# select * from pg_stat_replication ;  
 pid  | usesysid | usename  |             application_name             | client_addr | client_hostname | client_port |         backend_start         | backend_xmin |  state  | sent_location | write_location | flush_location | replay_loca  
tion | sync_priority | sync_state   
------+----------+----------+------------------------------------------+-------------+-----------------+-------------+-------------------------------+--------------+---------+---------------+----------------+----------------+------------  
-----+---------------+------------  
 6170 |       10 | postgres | bdr (6177143025216388117,10,83065,):init | 127.0.0.1   |                 |       50711 | 2015-10-09 23:09:54.935301+08 |              | startup | 0/0           |                |                |              
     |             0 | async  
(1 row)  

一会,我们可以看到数据已经拷贝到下游节点的down库了。

down=# \dt  
        List of relations  
 Schema | Name | Type  |  Owner     
--------+------+-------+----------  
 public | t1   | table | postgres  
 public | t2   | table | postgres  
 public | tb   | table | postgres  
(3 rows)  
  
down=# \dv  
        List of relations  
 Schema | Name | Type |  Owner     
--------+------+------+----------  
 public | v1   | view | postgres  
(1 row)  
  
down=# \df f1  
                        List of functions  
 Schema | Name | Result data type | Argument data types |  Type    
--------+------+------------------+---------------------+--------  
 public | f1   | void             |                     | normal  
(1 row)  
  
down=# \dT   
          List of data types  
 Schema |     Name      | Description   
--------+---------------+-------------  
 public | dt            |   

测试DDL,上游节点加了event,禁止直接使用DDL,所以我们需要通过函数来执行DDL。

postgres@digoal-> psql -h 127.0.0.1 -p 1921  
psql (9.4.4)  
Type "help" for help.  
postgres=# \c up  
up=# select * from bdr.bdr_replicate_ddl_command('create table public.new(id int)');  -- 注意必须制定schema  
 bdr_replicate_ddl_command   
---------------------------  
   
(1 row)  

bdr对应的表,管理函数。查看:

http://bdr-project.org/docs/0.9.0/functions.html

http://bdr-project.org/docs/0.9.0/catalogs-views.html

删除订阅的方法

1. 在上游节点删除slot,处理DDL队列,例如truncate bdr_queued_commands, bdr_queued_drops队列中的数据。

2. 在下游节点,注释shared_preload_library,重启数据库,去bdr_supervisordb 库删除下游节点的订阅库信息。

3. 在下游节点,改回shared_preload_library,重启数据库,删除原订阅库的bdr extension。

解决下游节点异常,例如无法添加订阅,可能由于之前没有正确的删除订阅。

注释shared_preload,重启数据库,去bdr_supervisordb 库修复。然后解除注释,重启数据库。

postgres=# \l  
                                List of databases  
       Name       |  Owner   | Encoding | Collate | Ctype |   Access privileges     
------------------+----------+----------+---------+-------+-----------------------  
 bdr_supervisordb | postgres | UTF8     | C       | C     |   
 postgres         | postgres | UTF8     | C       | C     |   
 template0        | postgres | UTF8     | C       | C     | =c/postgres          +  
                  |          |          |         |       | postgres=CTc/postgres  
 template1        | postgres | UTF8     | C       | C     | =c/postgres          +  
                  |          |          |         |       | postgres=CTc/postgres  
 up               | postgres | UTF8     | C       | C     |   
(5 rows)  

解决订阅异常,

2015-10-09 23:41:59.756 CST,,,7785,,5617e047.1e69,1,,2015-10-09 23:41:59 CST,3/0,0,ERROR,55000,"previous init failed, manual cleanup is required","Found bdr.bdr_nodes entry for bdr (6203671810517003626,1,16385,) with state=i in remote bdr.bdr_nodes","Remove all replication identifiers and slots corresponding to this node from the init target node then drop and recreate this database and try again",,,,,,"bdr_init_replica, bdr_init_replica.c:899","bdr (6203671810517003626,1,16385,): perdb"  

链接到上游节点,删除slot

删除下游节点数据库,重新创建下游节点数据库,重新订阅。

小结

1. 如果上游节点产生XLOG非常频繁,下游节点初始化订阅的时间会非常漫长。

2. 如果上游节点产生XLOG非常频繁,下游节点的延迟可能会很大。

3. 由于udr和bdr是在一个插件中共用的,只是编译参数使用udr的编译选项,包含了BDR的功能后使用起来显得非常混乱。

4. 文档中存在一些BUG,例如删除UDR订阅的函数bdr.bdr_unsubscribe(local_node_name)不存在。

PostgreSQL的udr插件还有很多可以改进的地方,包括管理方面的,性能方面的。PGSQL又将多一个杀手锏。

如果担心目前还不够完善,在加入PG内核前,我们用它来做跨数据库大版本的增量迁移,跨硬件架构,或者不同数据块大小的增量数据迁移,是个不错的选择。

参考

1. http://2ndquadrant.com/en/resources/bdr/

2. http://bdr-project.org/docs/0.9.0/index.html

3. https://github.com/2ndQuadrant/bdr/tree/bdr-plugin/REL0_9_STABLE

4. http://www.postgresql.org/docs/9.5/static/test-decoding.html

5. http://www.postgresql.org/docs/9.5/static/functions-admin.html#FUNCTIONS-REPLICATION

6. http://www.postgresql.org/docs/9.5/static/warm-standby.html#STREAMING-REPLICATION-SLOTS

7. http://www.postgresql.org/docs/9.5/static/protocol-replication.html

8. http://www.cybertec.at/en/products/walbouncer-enterprise-grade-partial-replication/

9. http://blog.163.com/digoal@126/blog/static/1638770402014101715715991

Flag Counter

digoal’s 大量PostgreSQL文章入口