PostgreSQL 最佳实践 - 块级增量备份(ZFS篇)验证 - recovery test script for zfs snapshot clone + postgresql stream replication + archive

2 minute read

背景

前面我写过一篇关于使用ZFS的快照特性和PostgreSQL流复制来完成数据库增量PITR备份的文章 :

http://blog.163.com/digoal@126/blog/static/163877040201451894734122/

本文将基于这个备份机, 写一个自动的恢复测试脚本.

(假设快照每天一个, 并规则命名)

2个脚本如下 :

# mkdir /root/test_recovery  
  
[root@db-pitr test_recovery]# cat recovery_entry.sh  
#!/bin/bash  
/root/test_recovery/recovery_test.sh pg000001 >>/root/test_recovery/pg000001.log 2>&1  
sleep 10  
/root/test_recovery/recovery_test.sh pg000002 >>/root/test_recovery/pg000002.log 2>&1  
  
  
  
[root@db-pitr test_recovery]# cat recovery_test.sh   
#!/bin/bash  
  
if [ $# -ne 1 ]; then  
  echo "for exp: ./sh pg000001"  
  exit 1  
fi  
  
UN="$1"  
date +%F%T  
echo $UN  
  
. /etc/profile  
. /home/${UN}/.bash_profile  
  
DATE=`date -d '-1 day' +%Y%m%d`  
zfs clone -o mountpoint=/test_recovery zp1/${UN}@$DATE zp1/test_recovery  
  
rm -f /test_recovery/pg_root/postgresql.conf  
rm -f /test_recovery/pg_root/recovery.conf  
rm -f /test_recovery/pg_root/postmaster.pid  
rm -f /test_recovery/pg_root/pg_xlog/*  
cp -p /root/test_recovery/${UN}_postgresql.conf /test_recovery/pg_root/postgresql.conf  
cp -p /root/test_recovery/${UN}_recovery.conf /test_recovery/pg_root/recovery.conf  
  
su - ${UN} -c "postgres -D /test_recovery/pg_root &"  
  
for ((i=1;i<180;i++)) do  
  echo $i  
  sleep 20  
  /pg_home/${UN}/pgsql/bin/psql -h 127.0.0.1 -p 11111 postgres postgres -c "select now(),* from pg_database;"  
  ret=$?  
  if [ $ret -eq 0 ]; then  
    break  
  fi  
done  
  
su - ${UN} -c "pg_ctl stop -w -t 6000 -m immediate -D /test_recovery/pg_root"  
sleep 60  
  
zfs destroy zp1/test_recovery  

并且为了防止恢复测试时, 一些参数冲突, 我们需要事先准备好恢复文件和配置文件, 如下 :

[root@db-pitr test_recovery]# pwd  
/root/test_recovery  
[root@db-pitr test_recovery]# ll  
total 164  
-rw-r--r-- 1 root     root      3267 Dec 10 10:54 pg000001.log  
-rw------- 1 pg000001 pg000001 20415 Dec 10 10:38 pg000001_postgresql.conf  
-rw-r--r-- 1 pg000001 pg000001  4793 Dec 10 10:39 pg000001_recovery.conf  
-rw-r--r-- 1 root     root      4520 Dec 10 10:58 pg000002.log  
-rw------- 1 pg000002 pg000002 19767 Dec 10 10:41 pg000002_postgresql.conf  
-rw-r--r-- 1 pg000002 pg000002  4795 Dec 10 10:41 pg000002_recovery.conf  
-r-x------ 1 root     root       478 Dec 10 10:50 recovery_entry.sh  
-r-x------ 1 root     root       985 Dec 10 10:32 recovery_test.sh  

与本机standby的配置相比, 主要修改几个地方, 防止冲突, max_connections是为了防止不可启动hot standby :

例如 :

pg000001_postgresql.conf  
port = 11111  
max_connections = 10000  
shared_buffers = 4096MB  
archive_mode = off  
hot_standby = on  
log_directory = 'pg_log'  
  
pg000001_recovery.conf  
restore_command = 'cp /pg_arch/pg000001/arch/*/%f %p'  
standby_mode = on  
#primary_conninfo  

测试 :

# crontab -e  
52 10 10 6 * /root/test_recovery/recovery_entry.sh  
52 10 10 12 * /root/test_recovery/recovery_entry.sh  

每年6月10日10点52, 12月10日10点52自动执行.

执行恢复脚本的日志输出举例 :

# cat pg000001.log   
2014-12-1010:52:02  
pg000001  
rm: cannot remove `/test_recovery/pg_root/pg_xlog/archive_status': Is a directory  
1  
LOG:  00000: loaded library "pg_stat_statements"  
LOCATION:  load_libraries, miscinit.c:1296  
LOG:  00000: redirecting log output to logging collector process  
HINT:  Future log output will appear in directory "pg_log".  
LOCATION:  SysLogger_Start, syslogger.c:649  
             now              |    datname     | datdba | encoding | datcollate | datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | datminmxid | dattablespace |                                                                datacl                                                                  
------------------------------+----------------+--------+----------+------------+----------+---------------+--------------+--------------+---------------+--------------+------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------  
 2014-12-10 10:54:11.77975+08 | template0      |     10 |        6 | C          | C        | t             | f            |           -1 |         12809 |         1674 |          1 |          1663 | {=c/postgres,postgres=CTc/postgres}  
 2014-12-10 10:54:11.77975+08 | postgres       |     10 |        6 | C          | C        | f             | t            |           -1 |         12809 |         1674 |          1 |          1663 |   
 2014-12-10 10:54:11.77975+08 | digoal |  16396 |        6 | C          | C        | f             | t            |           -1 |         12809 |         1674 |          1 |          1663 |   
(3 rows)  
  
waiting for server to shut down.... done  
server stopped  

参考

1. http://blog.163.com/digoal@126/blog/static/163877040201451894734122/

Flag Counter

digoal’s 大量PostgreSQL文章入口