PostgreSQL 最佳实践 - pg_rman 以standby为源的备份浅析
背景
为了降低备份对数据库的性能影响,我们在有standby的情况下,可以选择从standby备份PostgreSQL。
pg_rman是一个备份工具,同样支持从standby备份数据库,但是有一些使用的注意事项。
例如需要连接主库,需要能读取归档目录,需要告诉主库和备库的连接地址,需要备库的$PGDATA等等。
为什么既要连接主库,还要能力连接备库。
pg_rman需要连接到主库执行pg_start_backup,pg_stop_backup。
同时需要连接到standby数据库查看pg_last_xlog_replay_location(),检查是否已恢复到pg_start_backup返回的LSN,然后才能开始备份。
归档如何备份?
因为PostgreSQL 9.5以前的版本,备库是不会产生归档的,所以如果从备库备份,必须确保pg_rman能访问备库postgresql.conf中配置的archive_command所指定的归档目录,或在pg_rman.ini中配置的ARCLOG_PATH。
并且确保ARCLOG_PATH所在的位置正是主库的归档写入位置。
通常这需要一个主备共享的归档位置,例如分布式文件系统,或主备都能访问的NFS。
如果你使用的是9.5或者更新的版本,那么可以配置备库的 archive_mode=always,让备库也产生归档
archive_mode (enum)
When archive_mode is enabled, completed WAL segments are sent to archive storage by setting archive_command.
In addition to off, to disable, there are two modes:
on, and always.
During normal operation, there is no difference between the two modes, but when set to always the WAL archiver is enabled also during archive recovery or standby mode.
In always mode, all files restored from the archive or streamed with streaming replication will be archived (again).
See Section 25.2.9 for details.
代码如下
/* Archive modes */
typedef enum ArchiveMode
{
ARCHIVE_MODE_OFF = 0, /* disabled */
ARCHIVE_MODE_ON, /* enabled while server is running normally */
ARCHIVE_MODE_ALWAYS /* enabled always (even during recovery) */
} ArchiveMode;
-------------------------
/* Is WAL archiving enabled always (even during recovery)? */
#define XLogArchivingAlways() \
(XLogArchiveMode == ARCHIVE_MODE_ALWAYS && wal_level >= WAL_LEVEL_ARCHIVE)
-------------------------
/*
* Archiver is allowed to start up at the current postmaster state?
*
* If WAL archiving is enabled always, we are allowed to start archiver
* even during recovery.
*/
#define PgArchStartupAllowed() \
((XLogArchivingActive() && pmState == PM_RUN) || \
(XLogArchivingAlways() && \
(pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY)))
-------------------------
if (PgArchStartupAllowed() && PgArchPID == 0)
PgArchPID = pgarch_start(); // 启动archiver
另外,备份是从standby的$PGDATA拷贝文件的,所以你需要告诉pg_rman standby的$PGDATA位置。
配置备库
postgresql.conf
listen_addresses = '0.0.0.0' # what IP address(es) to listen on;
port = 1922 # (change requires restart)
max_connections = 100 # (change requires restart)
superuser_reserved_connections = 3 # (change requires restart)
unix_socket_directories = '.' # comma-separated list of directories
shared_buffers = 8GB # min 128kB
maintenance_work_mem = 512MB # min 1MB
autovacuum_work_mem = 1GB # min 1MB, or -1 to use maintenance_work_mem
vacuum_cost_delay = 0 # 0-100 milliseconds
bgwriter_delay = 10ms # 10-10000ms between rounds
wal_level = hot_standby # minimal, archive, hot_standby, or logical
fsync = on # turns forced synchronization on or off
synchronous_commit = off # synchronization level;
full_page_writes = off # recover from partial page writes
wal_buffers = 16MB # min 32kB, -1 sets based on shared_buffers
commit_delay = 0 # range 0-100000, in microseconds
commit_siblings = 15 # range 1-1000
checkpoint_timeout = 55min # range 30s-1h
max_wal_size = 32GB
checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0
archive_mode = always # enables archiving; off, on, or always
archive_command = 'cp %p /data04/digoal/arc_log1922/%f' # command to use to archive a logfile segment
max_wal_senders = 10 # max number of walsender processes
hot_standby = on # "on" allows queries during recovery
hot_standby_feedback = off # send info from standby to prevent
log_destination = 'csvlog' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_truncate_on_rotation = on # If on, an existing log file with the
log_error_verbosity = verbose # terse, default, or verbose messages
log_timezone = 'PRC'
autovacuum = on # Enable autovacuum subprocess? 'on'
autovacuum_max_workers = 8 # max number of autovacuum subprocesses
autovacuum_naptime = 30s # time between autovacuum runs
autovacuum_vacuum_threshold = 50 # min number of row updates before
autovacuum_analyze_threshold = 50 # min number of row updates before
autovacuum_vacuum_scale_factor = 0.002 # fraction of table size before vacuum
autovacuum_analyze_scale_factor = 0.001 # fraction of table size before analyze
autovacuum_vacuum_cost_delay = 0 # default vacuum cost delay for
datestyle = 'iso, mdy'
timezone = 'PRC'
client_encoding = UTF8 # actually, defaults to database
lc_messages = 'C' # locale for system error message
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
recovery.conf
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=127.0.0.1 port=1921 user=postgres' # e.g. 'host=localhost port=5432'
初始化pg_rman
mkdir -p /data05/digoal/pgstdbak
指定备库$PGDATA以及备份目录
pg_rman init -D /data04/digoal/pg_root_1922 -B /data05/digoal/pgstdbak
INFO: ARCLOG_PATH is set to '/data04/digoal/arc_log1922'
INFO: SRVLOG_PATH is set to '/data04/digoal/pg_root_1922/pg_log'
如果ARCLOG_PATH目录不是主库的归档目录,需要修改一下。(指PostgreSQL < 9.5的版本)
方法举例
1. 修改PostgreSQL内核,让备库支持归档(指PostgreSQL < 9.5的版本)
2. 主备使用分布式文件系统或共享的NFS作为归档目录(指PostgreSQL < 9.5的版本)
3. 主备分别开启nfs server服务,并且相互mount对方的归档目录(指PostgreSQL < 9.5的版本)
4. 改造pg_rman, 让它可以从pg_xlog目录备份需要的redo文件(可能需要流式备份,否则频繁产生redo的大库xlog容易被删除或覆盖)(指PostgreSQL < 9.5的版本)
那么pg_rman只要能访问到真正的归档位置即可。 (指PostgreSQL < 9.5的版本)
digoal@iZ28tqoemgtZ-> cd /data05/digoal/pgstdbak/
digoal@iZ28tqoemgtZ-> ll
total 16K
drwx------ 4 digoal digoal 4.0K Aug 29 14:13 backup
-rw-rw-r-- 1 digoal digoal 91 Aug 29 14:13 pg_rman.ini
-rw-rw-r-- 1 digoal digoal 40 Aug 29 14:13 system_identifier
drwx------ 2 digoal digoal 4.0K Aug 29 14:13 timeline_history
digoal@iZ28tqoemgtZ-> cat pg_rman.ini
ARCLOG_PATH='/data04/digoal/arc_log' # 这个应该改成真实的归档目录, 而不是standby配置的(指PostgreSQL < 9.5的版本)
SRVLOG_PATH='/data04/digoal/pg_root_1922/pg_log'
digoal@iZ28tqoemgtZ-> cat system_identifier
SYSTEM_IDENTIFIER='6318621837015461309'
全量备份举例
export PGPASSWORD=postgres # 超级用户密码
pg_rman backup \
-b full \ # 全量备份
-B /data05/digoal/pgstdbak \ # 备份目录
-D /data04/digoal/pg_root_1922 \ # 备库的$PGDATA
-s \ # 备份pg_log
-Z \ # 压缩
--keep-data-generations=3 \ # 保留3个全量备份,删除不需要的全量备份
--keep-data-days=10 \ # 保证能恢复到10天内的任意时间点,删除不需要的
--keep-arclog-files=30 \ # 保留最近30个归档文件
--keep-arclog-days=20 \ # 保留20天内的归档文件
--keep-srvlog-files=10 \ # 保留最近10个日志文件(pg_log)
--keep-srvlog-days=20 \ # 保留20天内的日志文件(pg_log)
--standby-host=127.0.0.1 \ # 如何连接standby
--standby-port=1922 \ # 如何连接standby
-h 127.0.0.1 \ # 如何连接primary
-p 1921 \ # 如何连接primary
-U postgres \ # 如何连接primary, standby(超级用户)
-d postgres # 如何连接primary, standby(database name)
INFO: copying database files
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
INFO: copying archived WAL files
INFO: copying server log files
INFO: backup complete
HINT: Please execute 'pg_rman validate' to verify the files are correctly copied.
INFO: start deleting old archived WAL files from ARCLOG_PATH (keep files = 30, keep days = 20)
INFO: the threshold timestamp calculated by keep days is "2016-08-09 00:00:00"
INFO: start deleting old server files from SRVLOG_PATH (keep files = 10, keep days = 20)
INFO: the threshold timestamp calculated by keep days is "2016-08-09 00:00:00"
INFO: start deleting old backup (keep generations = 3 AND keep after = 2016-08-19 00:00:00)
INFO: does not include the backup just taken
digoal@iZ28tqoemgtZ-> pg_rman validate -B /data05/digoal/pgstdbak
INFO: validate: "2016-08-29 14:18:26" backup, archive log files and server log files by CRC
INFO: backup "2016-08-29 14:18:26" is valid
digoal@iZ28tqoemgtZ-> pg_rman show -B /data05/digoal/pgstdbak
==========================================================
StartTime Mode Duration Size TLI Status
==========================================================
2016-08-29 15:05:32 FULL 0m 178MB 1 OK
2016-08-29 15:04:07 FULL 0m 178MB 1 OK
2016-08-29 15:02:48 FULL 0m 178MB 1 OK
2016-08-29 14:41:05 INCR 0m 275B 1 OK
2016-08-29 14:38:12 FULL 0m 178MB 1 OK
2016-08-29 14:18:26 FULL 0m 178MB 1 OK