PostgreSQL 最佳实践 - pg_rman 数据库恢复示例 与 软件限制解说

8 minute read

背景

pg_rman备份已经讲完了,接下来讲一下数据恢复。

由于pg_rman使用了物理备份,所以恢复时,与普通物理备份的恢复原理是一样的。

需要将数据文件恢复,同时需要提供recovery.conf,在recovery.conf中指定需要恢复到哪个位置,以及如何获取XLOG归档文件等配置。

数据库恢复

pg_rman数据恢复时的两个必要要素
1. 新的$PGDATA
2. 备份目录

命令的选项也很简单,甚至可以不指定任何option

Restore options:  
The parameters which are started with –recovery are same as parameters in recovery.conf. See also “Recovery Configuration” for details.  

接下来的几个配置,与recovery.conf的意思对齐。

如果不指定时间线,则使用$PGDATA/global/pg_control,如果没有$PGDATA/global/pg_control,则使用最新的全量备份集的时间线。

--recovery-target-timeline TIMELINE  
Specifies recovering into a particular timeline. If not specified, the current timeline from ($PGDATA/global/pg_control) is used.  

如果不指定,则恢复到最新时间

--recovery-target-time TIMESTAMP  
This parameter specifies the time stamp up to which recovery will proceed. If not specified, continue recovery to the latest time.  

如果不指定,则恢复到最新xid

--recovery-target-xid XID  
This parameter specifies the transaction ID up to which recovery will proceed. If not specified, continue recovery to the latest xid.  

如果不指定,则默认使用true,即恢复到包含恢复目标XID的commit record为止,或者第一笔commit record ts>指定ts的 commit redo record为止;
如果是false则不apply恢复目标XID的commit record,或者不apply第一笔commit record ts>=指定ts的 commit redo record。

--recovery-target-inclusive  
Specifies whether we stop just after the specified recovery target (true), or just before the recovery target (false). Default is true.  

是否使用硬链接复制archive log,而不需要拷贝文件

The following parameter determines the behavior of restore.  
--hard-copy  
The archive WAL are copied to archive WAL storage area. If not specified, pg_rman makes symbolic link to archive WAL where are in the backup catalog directory.  

例子

恢复时,需要注意,你可以选择原地恢复(覆盖式),或者使用新的$PGDATA作为恢复目标。

但是请注意,不管是哪种恢复方式,如果在本机恢复的话,pg_rman可能会覆盖原有的数据文件,arch, pg_xlog目录中的文件,所以,如果你要保留原数据,建议先将原数据目录重命名。

1. 原地恢复

2. 使用新的$PGDATA恢复

在本机恢复的例子
1. 停库

pg_ctl stop -m fast -D /data04/digoal/pg_root_1922  

2. 重命名原数据相关目录

  $PGDATA  
mv /data04/digoal/pg_root_1922 /data04/digoal/old_pg_root_1922  

  PG_XLOG  
mv /data05/digoal/pg_xlog_1922 /data05/digoal/old_pg_xlog_1922  

  表空间  
mv /data02/digoal/tbs1_1922 /data02/digoal/old_tbs1_1922  

  归档目录,除了要重命名,还需要新建一个原目录  
mv /data04/digoal/arc_log1922 /data04/digoal/old_arc_log1922  
mkdir /data04/digoal/arc_log1922  

...  

3. pg_rman restore

pg_rman restore -B /data05/digoal/pgstdbak -D /data04/digoal/pg_restore_root   
  
WARNING: could not open pg_controldata file "/data04/digoal/pg_restore_root/global/pg_control": No such file or directory  
WARNING: could not open pg_controldata file "/data04/digoal/pg_restore_root/global/pg_control": No such file or directory  
INFO: the recovery target timeline ID is not given  
INFO: use timeline ID of latest full backup as recovery target: 1  
INFO: calculating timeline branches to be used to recovery target point  
INFO: searching latest full backup which can be used as restore start point  
INFO: found the full backup can be used as base in recovery: "2016-08-29 15:05:32"  
INFO: copying online WAL files and server log files  
INFO: clearing restore destination  
INFO: validate: "2016-08-29 15:05:32" backup, archive log files and server log files by SIZE  
INFO: backup "2016-08-29 15:05:32" is valid  
INFO: restoring database files from the full mode backup "2016-08-29 15:05:32"  
INFO: searching incremental backup to be restored  
INFO: validate: "2016-08-29 15:13:10" backup, archive log files and server log files by SIZE  
INFO: backup "2016-08-29 15:13:10" is valid  
INFO: restoring database files from the incremental mode backup "2016-08-29 15:13:10"  
INFO: searching backup which contained archived WAL files to be restored  
INFO: backup "2016-08-29 15:13:10" is valid  
INFO: restoring WAL files from backup "2016-08-29 15:13:10"  
INFO: restoring online WAL files and server log files  
INFO: generating recovery.conf  
INFO: restore complete  
HINT: Recovery will start automatically when the PostgreSQL server is started.  

4. 修改postgresql.conf, recovery.conf
根据实际情况调整,本例不修改

cd /data04/digoal/pg_restore_root/  

digoal@iZ28tqoemgtZ-> ll  
total 124K  
-rw------- 1 digoal digoal  193 Aug 29 17:05 backup_label.old  
drwx------ 7 digoal digoal 4.0K Aug 29 17:05 base  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 global  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_clog  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_commit_ts  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_dynshmem  
-rw------- 1 digoal digoal 4.4K Aug 29 17:05 pg_hba.conf  
-rw------- 1 digoal digoal 1.6K Aug 29 17:05 pg_ident.conf  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_log  
drwx------ 4 digoal digoal 4.0K Aug 29 17:05 pg_logical  
drwx------ 4 digoal digoal 4.0K Aug 29 17:05 pg_multixact  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_notify  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_replslot  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_serial  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_snapshots  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_stat  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_stat_tmp  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_subtrans  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_tblspc  
drwx------ 2 digoal digoal 4.0K Aug 29 17:05 pg_twophase  
-rw------- 1 digoal digoal    4 Aug 29 17:05 PG_VERSION  
lrwxrwxrwx 1 digoal digoal   27 Aug 29 17:05 pg_xlog -> /data05/digoal/pg_xlog_1922  
-rw------- 1 digoal digoal   88 Aug 29 17:05 postgresql.auto.conf  
-rw------- 1 digoal digoal  22K Aug 29 17:05 postgresql.conf  
-rw------- 1 digoal digoal   44 Aug 29 17:05 postmaster.opts  
-rw-r--r-- 1 digoal digoal  130 Aug 29 17:06 recovery.conf  
  
digoal@iZ28tqoemgtZ-> cd pg_tblspc/  
digoal@iZ28tqoemgtZ-> ll  
total 0  
lrwxrwxrwx 1 digoal digoal 24 Aug 29 17:05 16719 -> /data02/digoal/tbs1_1922  
  
cd ..  
vi postgresql.conf  
archive_command = 'cp %p /data04/digoal/arc_log1922/%f'  
  
vi recovery.conf  
# recovery.conf generated by pg_rman 1.3.2  
restore_command = 'cp /data04/digoal/arc_log1922/%f %p'  
recovery_target_timeline = '1'  

5. 如果备份集的时间线发生了变化,需要先手工拷贝到归档目录,再执行restore

digoal@iZ28tqoemgtZ-> pg_rman restore -B /data05/digoal/pgstdbak -D /data04/digoal/pg_restore_root   

WARNING: could not open pg_controldata file "/data04/digoal/pg_restore_root/global/pg_control": No such file or directory  
WARNING: could not open pg_controldata file "/data04/digoal/pg_restore_root/global/pg_control": No such file or directory  
INFO: the recovery target timeline ID is not given  
INFO: use timeline ID of latest full backup as recovery target: 2  
INFO: calculating timeline branches to be used to recovery target point  
ERROR: could not open destination file "/data04/digoal/arc_log1922/00000002.history": No such file or directory  
  
digoal@iZ28tqoemgtZ-> mkdir /data04/digoal/arc_log1922  

digoal@iZ28tqoemgtZ-> cp /data05/digoal/pgstdbak/timeline_history/00000002.history /data04/digoal/arc_log1922/  
  
digoal@iZ28tqoemgtZ-> pg_rman restore -B /data05/digoal/pgstdbak -D /data04/digoal/pg_restore_root   

WARNING: could not open pg_controldata file "/data04/digoal/pg_restore_root/global/pg_control": No such file or directory  
WARNING: could not open pg_controldata file "/data04/digoal/pg_restore_root/global/pg_control": No such file or directory  
INFO: the recovery target timeline ID is not given  
INFO: use timeline ID of latest full backup as recovery target: 2  
INFO: calculating timeline branches to be used to recovery target point  
INFO: searching latest full backup which can be used as restore start point  
INFO: found the full backup can be used as base in recovery: "2016-08-29 17:14:20"  
INFO: copying online WAL files and server log files  
INFO: clearing restore destination  
INFO: validate: "2016-08-29 17:14:20" backup and archive log files by SIZE  
INFO: backup "2016-08-29 17:14:20" is valid  
INFO: restoring database files from the full mode backup "2016-08-29 17:14:20"  
INFO: searching incremental backup to be restored  
INFO: searching backup which contained archived WAL files to be restored  
INFO: backup "2016-08-29 17:14:20" is valid  
INFO: restoring WAL files from backup "2016-08-29 17:14:20"  
INFO: restoring online WAL files and server log files  
INFO: generating recovery.conf  
INFO: restore complete  
HINT: Recovery will start automatically when the PostgreSQL server is started.  
  
digoal@iZ28tqoemgtZ-> cd /data04/digoal/arc_log1922  
digoal@iZ28tqoemgtZ-> ll  
total 16K  
lrwxrwxrwx 1 digoal digoal 71 Aug 29 17:18 00000002000000470000000C -> /data05/digoal/pgstdbak/20160829/171420/arclog/00000002000000470000000C  
lrwxrwxrwx 1 digoal digoal 71 Aug 29 17:18 00000002000000470000000D -> /data05/digoal/pgstdbak/20160829/171420/arclog/00000002000000470000000D  
lrwxrwxrwx 1 digoal digoal 87 Aug 29 17:18 00000002000000470000000D.00000028.backup -> /data05/digoal/pgstdbak/20160829/171420/arclog/00000002000000470000000D.00000028.backup  
-rw------- 1 digoal digoal 42 Aug 29 17:18 00000002.history  

6. 启动恢复目标数据库

digoal@iZ28tqoemgtZ-> pg_ctl start -D /data04/digoal/pg_restore_root  
server starting  
digoal@iZ28tqoemgtZ-> LOG:  00000: redirecting log output to logging collector process  
HINT:  Future log output will appear in directory "pg_log".  
LOCATION:  SysLogger_Start, syslogger.c:622  
  
digoal@iZ28tqoemgtZ-> psql -h 127.0.0.1 -p 1922  
psql (9.5.3)  
Type "help" for help.  
  
postgres=# select pg_is_in_recovery();  
 pg_is_in_recovery   
-------------------  
 f  
(1 row)  
  
postgres=# \l+  
                                                                   List of databases  
        Name        |  Owner   | Encoding | Collate | Ctype |   Access privileges   |  Size   | Tablespace |                Description                   
--------------------+----------+----------+---------+-------+-----------------------+---------+------------+--------------------------------------------  
 contrib_regression | postgres | UTF8     | C       | C     |                       | 7137 kB | pg_default |   
 db1                | postgres | UTF8     | C       | C     |                       | 111 MB  | tbs1       |   
 postgres           | postgres | UTF8     | C       | C     |                       | 797 MB  | pg_default | default administrative connection database  
 template0          | postgres | UTF8     | C       | C     | =c/postgres          +| 7137 kB | pg_default | unmodifiable empty database  
                    |          |          |         |       | postgres=CTc/postgres |         |            |   
 template1          | postgres | UTF8     | C       | C     | =c/postgres          +| 7137 kB | pg_default | default template for new databases  
                    |          |          |         |       | postgres=CTc/postgres |         |            |   
(5 rows)  

注意事项

1. 小心覆盖原有的部分

2. 建议先将原有的目录重命名,或者在其他机器恢复,

3. 软链接的处理,会恢复到目标,并且重新建立软链接,所以目录结构必须与备份时保持一致。

4. 用户可以指定$PGDATA,恢复到新的目标目录,但是arch_log, 表空间, pg_xlog目录无法指定新的位置,所以原地还原时,必须注意这些目录可能被覆盖,先重命名是比较好的手段。

pg_rman 可以优化的地方

1. 检查哪些要恢复的块与目标块的CRC是否一致,如果一致,不需要拷贝,减少WRITE。

pg_rman 软件限制

pg_rman的使用限制

pg_rman has the following restrictions.

1. Requires to read database cluster directory and write backup catalog directory.
For example, you need to mount the disk where backup catalog is placed with NFS from database server.
实际上不是必须的,如果没有指定原来的$PGDATA,则使用备份集的元数据。

2. Block sizes of pg_rman and server should be matched. BLCKSZ and XLOG_BLCKSZ also should be matched.
编译pg_rman时,最好使用启动数据集的集群软件的pg_config。 确保块大小一致。
因为需要做块的校验。 读取LSN等,都与块大小有关。

3. If there are some unreadable files/directories in database cluster directory, WAL directory or archived WAL directory, the backup or restore would be failed.

4. When taking an incremental backup, pg_rman check the timeline ID of the target database whether it is the same with the one of the full backup in backup list.
But, pg_rman does not check whether the data itself is same with the full backup in backup list.
So, you can take an incremental backup over the full backup against the database which has the same timeline ID but has different data.

从standby备份时的软件限制

Getting backup from standby-site, pg_rman has the follow restrictions too.

1. The environment of replication should be built right, or the backup will not finish.

2. You can’t get backups on master and standby at the same time.
因为pg_rman使用的是exclusive bakcup,(pg_start_backup),所以同一时间,只能跑一次pg_start_backup。
pg_basebackup则使用的是shared backup,可以跑多个。

3. You can’t get backups on multi standbys at the same time too.
道理同上

4. Basically, the backup from standby-site is used for restoring on MASTER.
pg_rman doesn’t treat the backup as restoring on standby automatically.

5. If you want to restore the backup on STANDBY, you have to manage archive logs with your self.
因为备库不归档,所以从standby备份时,需要解决归档备份的问题。
我在前面的文档中已经提及,包括解决思路。

如果使用快照备份,有哪些软件限制

When using storage snapshot, pg_rman has the following restrictions too.

1. If your snapshot does not have any file update time, incremental backup is same with full backup.

2. Because pg_rman judges performing full backup or incremental backup by update time for files.
If files don’t have update time because of storage snapshot specification, pg_rman performs full backup every time.

3. You can’t backup for one side works storage with split mirror snapshot.

4. Before you execute pg_rman, you should perform storage “RESYNC”.

5. After pg_rman performs backup with split mirror snapshot, storeage will be “SPLITTED”(works on one side).
pg_rman perform SPLIT command for getting snapshot, but doesn’t perform RESYNC command.
6. You cant’t get snapshot from different vendor storages in a time.

7. You cant’t use some vendor storages which have different commands for getting snapshot.

8. The script and commands for getting storage snapshot should be executable.

9. It’s expected to have authority of root for getting snapshot or mounting volumes.
So a user, performs pg_rman, is granted to execute any commands in the script.

10. If you use LVM(Logical Volume Manager), it’s needed root authority for mount, umount, lvcreate, lvremove, lvscan commands.
You should granted to these commands with sudo command to non-password executable.

Flag Counter

digoal’s 大量PostgreSQL文章入口