PostgreSQL 最佳实践 - pg_rman 以standby为源的备份浅析

4 minute read

背景

为了降低备份对数据库的性能影响,我们在有standby的情况下,可以选择从standby备份PostgreSQL。

pg_rman是一个备份工具,同样支持从standby备份数据库,但是有一些使用的注意事项。

例如需要连接主库,需要能读取归档目录,需要告诉主库和备库的连接地址,需要备库的$PGDATA等等。

为什么既要连接主库,还要能力连接备库。

pg_rman需要连接到主库执行pg_start_backup,pg_stop_backup。

同时需要连接到standby数据库查看pg_last_xlog_replay_location(),检查是否已恢复到pg_start_backup返回的LSN,然后才能开始备份。

归档如何备份?

因为PostgreSQL 9.5以前的版本,备库是不会产生归档的,所以如果从备库备份,必须确保pg_rman能访问备库postgresql.conf中配置的archive_command所指定的归档目录,或在pg_rman.ini中配置的ARCLOG_PATH。

并且确保ARCLOG_PATH所在的位置正是主库的归档写入位置。

通常这需要一个主备共享的归档位置,例如分布式文件系统,或主备都能访问的NFS。

如果你使用的是9.5或者更新的版本,那么可以配置备库的 archive_mode=always,让备库也产生归档

archive_mode (enum)
When archive_mode is enabled, completed WAL segments are sent to archive storage by setting archive_command. 
In addition to off, to disable, there are two modes: 
    on, and always. 

During normal operation, there is no difference between the two modes, but when set to always the WAL archiver is enabled also during archive recovery or standby mode. 
In always mode, all files restored from the archive or streamed with streaming replication will be archived (again). 
See Section 25.2.9 for details.

代码如下

/* Archive modes */
typedef enum ArchiveMode
{
        ARCHIVE_MODE_OFF = 0,           /* disabled */
        ARCHIVE_MODE_ON,                        /* enabled while server is running normally */
        ARCHIVE_MODE_ALWAYS                     /* enabled always (even during recovery) */
} ArchiveMode;
-------------------------
/* Is WAL archiving enabled always (even during recovery)? */
#define XLogArchivingAlways() \
        (XLogArchiveMode == ARCHIVE_MODE_ALWAYS && wal_level >= WAL_LEVEL_ARCHIVE)
-------------------------
/*
 * Archiver is allowed to start up at the current postmaster state?
 *
 * If WAL archiving is enabled always, we are allowed to start archiver
 * even during recovery.
 */
#define PgArchStartupAllowed()  \
        ((XLogArchivingActive() && pmState == PM_RUN) ||        \
         (XLogArchivingAlways() &&      \
          (pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY)))
-------------------------
                        if (PgArchStartupAllowed() && PgArchPID == 0)
                                PgArchPID = pgarch_start();   // 启动archiver

另外,备份是从standby的$PGDATA拷贝文件的,所以你需要告诉pg_rman standby的$PGDATA位置。

配置备库

postgresql.conf

listen_addresses = '0.0.0.0'            # what IP address(es) to listen on;  
port = 1922                             # (change requires restart)  
max_connections = 100                   # (change requires restart)  
superuser_reserved_connections = 3      # (change requires restart)  
unix_socket_directories = '.'   # comma-separated list of directories  
shared_buffers = 8GB                    # min 128kB  
maintenance_work_mem = 512MB            # min 1MB  
autovacuum_work_mem = 1GB               # min 1MB, or -1 to use maintenance_work_mem  
vacuum_cost_delay = 0                   # 0-100 milliseconds  
bgwriter_delay = 10ms                   # 10-10000ms between rounds  
wal_level = hot_standby                 # minimal, archive, hot_standby, or logical  
fsync = on                              # turns forced synchronization on or off  
synchronous_commit = off                # synchronization level;  
full_page_writes = off                  # recover from partial page writes  
wal_buffers = 16MB                      # min 32kB, -1 sets based on shared_buffers  
commit_delay = 0                        # range 0-100000, in microseconds  
commit_siblings = 15                    # range 1-1000  
checkpoint_timeout = 55min              # range 30s-1h  
max_wal_size = 32GB  
checkpoint_completion_target = 0.9      # checkpoint target duration, 0.0 - 1.0  
archive_mode = always               # enables archiving; off, on, or always  
archive_command = 'cp %p /data04/digoal/arc_log1922/%f'         # command to use to archive a logfile segment  
max_wal_senders = 10            # max number of walsender processes  
hot_standby = on                        # "on" allows queries during recovery  
hot_standby_feedback = off               # send info from standby to prevent  
log_destination = 'csvlog'              # Valid values are combinations of  
logging_collector = on          # Enable capturing of stderr and csvlog  
log_truncate_on_rotation = on           # If on, an existing log file with the  
log_error_verbosity = verbose  # terse, default, or verbose messages  
log_timezone = 'PRC'  
autovacuum = on                 # Enable autovacuum subprocess?  'on'  
autovacuum_max_workers = 8              # max number of autovacuum subprocesses  
autovacuum_naptime = 30s         # time between autovacuum runs  
autovacuum_vacuum_threshold = 50        # min number of row updates before  
autovacuum_analyze_threshold = 50       # min number of row updates before  
autovacuum_vacuum_scale_factor = 0.002  # fraction of table size before vacuum  
autovacuum_analyze_scale_factor = 0.001 # fraction of table size before analyze  
autovacuum_vacuum_cost_delay = 0        # default vacuum cost delay for  
datestyle = 'iso, mdy'  
timezone = 'PRC'  
client_encoding = UTF8          # actually, defaults to database  
lc_messages = 'C'                       # locale for system error message  
lc_monetary = 'C'                       # locale for monetary formatting  
lc_numeric = 'C'                        # locale for number formatting  
lc_time = 'C'                           # locale for time formatting  
default_text_search_config = 'pg_catalog.english'  

recovery.conf

recovery_target_timeline = 'latest'  
standby_mode = on  
primary_conninfo = 'host=127.0.0.1 port=1921 user=postgres'             # e.g. 'host=localhost port=5432'  

初始化pg_rman

mkdir -p /data05/digoal/pgstdbak  

指定备库$PGDATA以及备份目录

pg_rman init -D /data04/digoal/pg_root_1922 -B /data05/digoal/pgstdbak   
  
INFO: ARCLOG_PATH is set to '/data04/digoal/arc_log1922'  
INFO: SRVLOG_PATH is set to '/data04/digoal/pg_root_1922/pg_log'  

如果ARCLOG_PATH目录不是主库的归档目录,需要修改一下。(指PostgreSQL < 9.5的版本)

方法举例

1. 修改PostgreSQL内核,让备库支持归档(指PostgreSQL < 9.5的版本)

2. 主备使用分布式文件系统或共享的NFS作为归档目录(指PostgreSQL < 9.5的版本)

3. 主备分别开启nfs server服务,并且相互mount对方的归档目录(指PostgreSQL < 9.5的版本)

4. 改造pg_rman, 让它可以从pg_xlog目录备份需要的redo文件(可能需要流式备份,否则频繁产生redo的大库xlog容易被删除或覆盖)(指PostgreSQL < 9.5的版本)

那么pg_rman只要能访问到真正的归档位置即可。 (指PostgreSQL < 9.5的版本)

digoal@iZ28tqoemgtZ-> cd /data05/digoal/pgstdbak/  
digoal@iZ28tqoemgtZ-> ll  
total 16K  
drwx------ 4 digoal digoal 4.0K Aug 29 14:13 backup  
-rw-rw-r-- 1 digoal digoal   91 Aug 29 14:13 pg_rman.ini  
-rw-rw-r-- 1 digoal digoal   40 Aug 29 14:13 system_identifier  
drwx------ 2 digoal digoal 4.0K Aug 29 14:13 timeline_history  
  
digoal@iZ28tqoemgtZ-> cat pg_rman.ini   
ARCLOG_PATH='/data04/digoal/arc_log'   # 这个应该改成真实的归档目录, 而不是standby配置的(指PostgreSQL < 9.5的版本) 
SRVLOG_PATH='/data04/digoal/pg_root_1922/pg_log'  
  
digoal@iZ28tqoemgtZ-> cat system_identifier   
SYSTEM_IDENTIFIER='6318621837015461309'  

全量备份举例

export PGPASSWORD=postgres         # 超级用户密码  

pg_rman backup \    
-b full \                          # 全量备份   
-B /data05/digoal/pgstdbak \       # 备份目录  
-D /data04/digoal/pg_root_1922 \   # 备库的$PGDATA  
-s \                               # 备份pg_log  
-Z \                               # 压缩  
--keep-data-generations=3 \        # 保留3个全量备份,删除不需要的全量备份  
--keep-data-days=10 \              # 保证能恢复到10天内的任意时间点,删除不需要的  
--keep-arclog-files=30 \           # 保留最近30个归档文件  
--keep-arclog-days=20 \            # 保留20天内的归档文件  
--keep-srvlog-files=10 \           # 保留最近10个日志文件(pg_log)  
--keep-srvlog-days=20 \            # 保留20天内的日志文件(pg_log)  
--standby-host=127.0.0.1 \         # 如何连接standby  
--standby-port=1922 \              # 如何连接standby  
-h 127.0.0.1 \                     # 如何连接primary  
-p 1921 \                          # 如何连接primary  
-U postgres \                      # 如何连接primary, standby(超级用户)  
-d postgres                        # 如何连接primary, standby(database name)  
  

INFO: copying database files  
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived  
INFO: copying archived WAL files  
INFO: copying server log files  
INFO: backup complete  
HINT: Please execute 'pg_rman validate' to verify the files are correctly copied.  
INFO: start deleting old archived WAL files from ARCLOG_PATH (keep files = 30, keep days = 20)  
INFO: the threshold timestamp calculated by keep days is "2016-08-09 00:00:00"  
INFO: start deleting old server files from SRVLOG_PATH (keep files = 10, keep days = 20)  
INFO: the threshold timestamp calculated by keep days is "2016-08-09 00:00:00"  
INFO: start deleting old backup (keep generations = 3 AND keep after = 2016-08-19 00:00:00)  
INFO: does not include the backup just taken  
  

digoal@iZ28tqoemgtZ-> pg_rman validate -B /data05/digoal/pgstdbak  
INFO: validate: "2016-08-29 14:18:26" backup, archive log files and server log files by CRC  
INFO: backup "2016-08-29 14:18:26" is valid  


digoal@iZ28tqoemgtZ-> pg_rman show -B /data05/digoal/pgstdbak
==========================================================
 StartTime           Mode  Duration    Size   TLI  Status 
==========================================================
2016-08-29 15:05:32  FULL        0m   178MB     1  OK
2016-08-29 15:04:07  FULL        0m   178MB     1  OK
2016-08-29 15:02:48  FULL        0m   178MB     1  OK
2016-08-29 14:41:05  INCR        0m    275B     1  OK
2016-08-29 14:38:12  FULL        0m   178MB     1  OK
2016-08-29 14:18:26  FULL        0m   178MB     1  OK

Flag Counter

digoal’s 大量PostgreSQL文章入口