zfs 快照增量恢复 vs PostgreSQL xlog增量恢复 速度

4 minute read

背景

测试一下哪个恢复速度快。

测试环境同

《zfs 快照增量大小 vs PostgreSQL产生的XLOG大小》

开启归档

开启full page write

postgres=# show archive_mode ;  
 archive_mode   
--------------  
 on  
(1 row)  
  
postgres=# show archive_command ;  
                           archive_command                              
----------------------------------------------------------------------  
 cp %p /u02/dege.zzz/archive/%f  
(1 row)  
  
postgres=# show full_page_writes ;  
 full_page_writes   
------------------  
 on  
(1 row)  

测试1

大范围测试

创建检查点

postgres=# checkpoint;  
CHECKPOINT  
  
start backup  
postgres=# select pg_start_backup('a');  
 pg_start_backup   
-----------------  
 5AA/7F000028  
(1 row)  

创建快照1

#zfs snapshot zp1/data01@1  
  
stop backup  
postgres=# select pg_stop_backup();  
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived  
 pg_stop_backup   
----------------  
 5AA/7F000130  
(1 row)  

压力测试600秒

pgbench -M prepared -n -r -P 1 -c 48 -j 48 -T 600  

创建检查点

postgres=# checkpoint;  
CHECKPOINT  
  
start backup  
postgres=# select pg_start_backup('a');  
 pg_start_backup   
-----------------  
 5BC/98000060  
(1 row)  

创建快照2

#zfs snapshot zp1/data01@2  
  
stop backup  
postgres=# select pg_stop_backup();  
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived  
 pg_stop_backup   
----------------  
 5BC/98000168  
(1 row)  

生成XID

postgres=# select txid_current();  
 txid_current   
--------------  
    233178486  
(1 row)  

切换日志

postgres=# select pg_switch_xlog();  
 pg_switch_xlog   
----------------  
 5BC/990000D8  
(1 row)  

关闭数据库

pg_ctl stop -m fast  
waiting for server to shut down........ done  
server stopped  

备份快照增量

#zfs send -i zp1/data01@1 zp1/data01@2 > /u02/dege.zzz/zfs_back/1_2  

回滚到快照1

#zfs rollback -r zp1/data01@1  
#zfs list -t snapshot  
zp1/data01@1             16K      -  1.01T  -  

使用ZFS增量恢复

计时开始

使用zfs快照恢复

计时结束

#date +%F%T; cat /u02/dege.zzz/zfs_back/1_2 | zfs receive -d zp1/data01; date +%F%T;  
2016-01-0415:49:57  
2016-01-0415:58:04  
  
#zfs list -t snapshot  
zp1/data01@1            480K      -  1.01T  -  
zp1/data01@2               0      -  1.01T  -  

计时开始

关闭归档

vi postgresql.conf  
archive_mode = off  

创建recovery.conf

恢复到指定XID

vi recovery.conf  
restore_command = 'cp /u02/dege.zzz/archive/%f %p'  
recovery_target_xid = '233178486'  

计时结束

pg_ctl start  
2016-01-04 16:11:05.366 CST,,,32353,,568a2919.7e61,2,,2016-01-04 16:11:05 CST,,0,LOG,00000,"starting point-in-time recovery to XID 233178486",,,,,,,,"StartupXLOG, xlog.c:6011",""  
2016-01-04 16:11:05.401 CST,,,32353,,568a2919.7e61,3,,2016-01-04 16:11:05 CST,,0,LOG,00000,"restored log file ""00000002000005BC00000098"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
2016-01-04 16:11:05.408 CST,,,32353,,568a2919.7e61,4,,2016-01-04 16:11:05 CST,1/0,0,LOG,00000,"redo starts at 5BC/98000060",,,,,,,,"StartupXLOG, xlog.c:6695",""  
2016-01-04 16:11:05.409 CST,,,32353,,568a2919.7e61,5,,2016-01-04 16:11:05 CST,1/0,0,LOG,00000,"consistent recovery state reached at 5BC/98000168",,,,,,,,"CheckRecoveryConsistency, xlog.c:7487",""  
2016-01-04 16:11:05.409 CST,,,32351,,568a2919.7e5f,2,,2016-01-04 16:11:05 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,"sigusr1_handler, postmaster.c:4929",""  
2016-01-04 16:11:05.441 CST,,,32353,,568a2919.7e61,6,,2016-01-04 16:11:05 CST,1/0,0,LOG,00000,"restored log file ""00000002000005BC00000099"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
2016-01-04 16:11:05.445 CST,,,32353,,568a2919.7e61,7,,2016-01-04 16:11:05 CST,1/0,0,LOG,00000,"recovery stopping after commit of transaction 233178486, time 2016-01-04 15:33:50.559409+08",,,,,,,,"recoveryStopsAfter, xlog.c:5575",""  
2016-01-04 16:11:05.445 CST,,,32353,,568a2919.7e61,8,,2016-01-04 16:11:05 CST,1/0,0,LOG,00000,"recovery has paused",,"Execute pg_xlog_replay_resume() to continue.",,,,,,"recoveryPausesHere, xlog.c:5621",""  

使用zfs快照增量恢复,总耗时 8分7秒。

使用XLOG增量恢复

停库

pg_ctl stop -m fast  

回滚到快照1

#zfs rollback -r zp1/data01@1  

计时开始

关闭归档

vi postgresql.conf  
archive_mode = off  

创建recovery.conf

恢复到指定XID

vi recovery.conf  
restore_command = 'cp /u02/dege.zzz/archive/%f %p'  
recovery_target_xid = '233178486'  

计时结束

pg_ctl start  
  
2016-01-04 16:15:15.511 CST,,,32671,,568a2a13.7f9f,2,,2016-01-04 16:15:15 CST,,0,LOG,00000,"starting point-in-time recovery to XID 233178486",,,,,,,,"StartupXLOG, xlog.c:6011",""  
2016-01-04 16:15:15.553 CST,,,32671,,568a2a13.7f9f,3,,2016-01-04 16:15:15 CST,,0,LOG,00000,"restored log file ""00000002000005AA0000007F"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
2016-01-04 16:15:15.559 CST,,,32671,,568a2a13.7f9f,4,,2016-01-04 16:15:15 CST,1/0,0,LOG,00000,"redo starts at 5AA/7F000028",,,,,,,,"StartupXLOG, xlog.c:6695",""  
2016-01-04 16:15:15.559 CST,,,32671,,568a2a13.7f9f,5,,2016-01-04 16:15:15 CST,1/0,0,LOG,00000,"consistent recovery state reached at 5AA/7F000130",,,,,,,,"CheckRecoveryConsistency, xlog.c:7487",""  
2016-01-04 16:15:15.559 CST,,,32669,,568a2a13.7f9d,2,,2016-01-04 16:15:15 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,"sigusr1_handler, postmaster.c:4929",""  
2016-01-04 16:15:15.596 CST,,,32671,,568a2a13.7f9f,6,,2016-01-04 16:15:15 CST,1/0,0,LOG,00000,"restored log file ""00000002000005AA00000080"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
2016-01-04 16:15:15.675 CST,,,32671,,568a2a13.7f9f,7,,2016-01-04 16:15:15 CST,1/0,0,LOG,00000,"restored log file ""00000002000005AA00000081"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
......  
2016-01-04 16:26:27.890 CST,,,32671,,568a2a13.7f9f,4639,,2016-01-04 16:15:15 CST,1/0,0,LOG,00000,"restored log file ""00000002000005BC00000099"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
2016-01-04 16:26:27.891 CST,,,32671,,568a2a13.7f9f,4640,,2016-01-04 16:15:15 CST,1/0,0,LOG,00000,"recovery stopping after commit of transaction 233178486, time 2016-01-04 15:33:50.559409+08",,,,,,,,"recoveryStopsAfter, xlog.c:5575",""  
2016-01-04 16:26:27.891 CST,,,32671,,568a2a13.7f9f,4641,,2016-01-04 16:15:15 CST,1/0,0,LOG,00000,"recovery has paused",,"Execute pg_xlog_replay_resume() to continue.",,,,,,"recoveryPausesHere, xlog.c:5621",""  

使用XLOG增量恢复,总耗时 11分12秒。

激活

postgres=# select pg_xlog_replay_resume();  
 pg_xlog_replay_resume   
-----------------------  
   
(1 row)  

开启归档

测试2

小范围测试

删除快照1,2

#zfs destroy zp1/data01@1  
#zfs destroy zp1/data01@2  

创建检查点

postgres=# checkpoint;  
CHECKPOINT  
  
start backup  
postgres=# select pg_start_backup('a');  
 pg_start_backup   
-----------------  
 5BC/9A000028  
(1 row)  

创建快照1

#zfs snapshot zp1/data01@1  
  
stop backup  
postgres=# select pg_stop_backup();  
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup  
 pg_stop_backup   
----------------  
 5BC/9A000130  
(1 row)  

压力测试600秒

pgbench -M prepared -n -r -P 1 -c 48 -j 48 -T 600 digoal  

创建检查点

postgres=# checkpoint;  
  
start backup  
postgres=# select pg_start_backup('a');  

创建快照2

#zfs snapshot zp1/data01@2  
  
stop backup  
postgres=# select pg_stop_backup();  

生成XID

postgres=# select txid_current();  
 txid_current   
--------------  
    283000806  
(1 row)  

切换日志

postgres=# select pg_switch_xlog();  

关闭数据库

pg_ctl stop -m fast  

备份快照增量

# nohup zfs send -i zp1/data01@1 zp1/data01@2 > /u02/dege.zzz/zfs_back/new_1_2 &  

回滚到快照1

#zfs rollback -r zp1/data01@1  

使用ZFS增量恢复

计时开始

使用zfs快照恢复

计时结束

#date +%F%T; cat /u02/dege.zzz/zfs_back/new_1_2 | zfs receive -d zp1/data01; date +%F%T;  
2016-01-0417:09:13  
2016-01-0417:09:33  

计时开始

关闭归档

vi postgresql.conf  
archive_mode = off  

创建recovery.conf

恢复到指定XID

vi recovery.conf  
restore_command = 'cp /u02/dege.zzz/archive/%f %p'  
recovery_target_xid = '283000806'  

计时结束

pg_ctl start  
  
2016-01-04 17:10:42.548 CST,,,51498,,568a3712.c92a,2,,2016-01-04 17:10:42 CST,,0,LOG,00000,"starting point-in-time recovery to XID 283000806",,,,,,,,"StartupXLOG, xlog.c:6011",""  
2016-01-04 17:10:42.584 CST,,,51498,,568a3712.c92a,3,,2016-01-04 17:10:42 CST,,0,LOG,00000,"restored log file ""00000003000005C300000060"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
2016-01-04 17:10:42.592 CST,,,51498,,568a3712.c92a,4,,2016-01-04 17:10:42 CST,1/0,0,LOG,00000,"redo starts at 5C3/60043250",,,,,,,,"StartupXLOG, xlog.c:6695",""  
2016-01-04 17:10:42.654 CST,,,51498,,568a3712.c92a,5,,2016-01-04 17:10:42 CST,1/0,0,LOG,00000,"restored log file ""00000003000005C300000061"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
2016-01-04 17:10:42.716 CST,,,51498,,568a3712.c92a,6,,2016-01-04 17:10:42 CST,1/0,0,LOG,00000,"restored log file ""00000003000005C300000062"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
...  
2016-01-04 17:10:44.224 CST,,,51498,,568a3712.c92a,33,,2016-01-04 17:10:42 CST,1/0,0,LOG,00000,"recovery stopping after commit of transaction 283000806, time 2016-01-04 17:06:25.459104+08",,,,,,,,"recoveryStopsAfter, xlog.c:5575",""  
2016-01-04 17:10:44.224 CST,,,51498,,568a3712.c92a,34,,2016-01-04 17:10:42 CST,1/0,0,LOG,00000,"recovery has paused",,"Execute pg_xlog_replay_resume() to continue.",,,,,,"recoveryPausesHere, xlog.c:5621",""  

使用zfs快照增量恢复,总耗时 22秒。

使用XLOG增量恢复

停库

pg_ctl stop -m fast  

回滚到快照1

#zfs rollback -r zp1/data01@1  

计时开始

vi postgresql.conf  
archive_mode = off  

创建recovery.conf

恢复到指定XID

vi recovery.conf  
restore_command = 'cp /u02/dege.zzz/archive/%f %p'  
recovery_target_xid = '283000806'  

计时结束

pg_ctl start  
  
2016-01-04 17:13:20.544 CST,,,51775,,568a37b0.ca3f,2,,2016-01-04 17:13:20 CST,,0,LOG,00000,"starting point-in-time recovery to XID 283000806",,,,,,,,"StartupXLOG, xlog.c:6011",""  
2016-01-04 17:13:20.580 CST,,,51775,,568a37b0.ca3f,3,,2016-01-04 17:13:20 CST,,0,LOG,00000,"restored log file ""00000003000005BF000000EC"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
2016-01-04 17:13:20.584 CST,,,51775,,568a37b0.ca3f,4,,2016-01-04 17:13:20 CST,1/0,0,LOG,00000,"redo starts at 5BF/EC000028",,,,,,,,"StartupXLOG, xlog.c:6695",""  
2016-01-04 17:13:20.584 CST,,,51775,,568a37b0.ca3f,5,,2016-01-04 17:13:20 CST,1/0,0,LOG,00000,"consistent recovery state reached at 5BF/EC000130",,,,,,,,"CheckRecoveryConsistency, xlog.c:7487",""  
2016-01-04 17:13:20.585 CST,,,51773,,568a37b0.ca3d,2,,2016-01-04 17:13:20 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,"sigusr1_handler, postmaster.c:4929",""  
2016-01-04 17:13:20.618 CST,,,51775,,568a37b0.ca3f,6,,2016-01-04 17:13:20 CST,1/0,0,LOG,00000,"restored log file ""00000003000005BF000000ED"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
...  
2016-01-04 17:19:50.393 CST,,,51775,,568a37b0.ca3f,915,,2016-01-04 17:13:20 CST,1/0,0,LOG,00000,"restored log file ""00000003000005C30000007A"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
2016-01-04 17:19:50.455 CST,,,51775,,568a37b0.ca3f,916,,2016-01-04 17:13:20 CST,1/0,0,LOG,00000,"restored log file ""00000003000005C30000007B"" from archive",,,,,,,,"RestoreArchivedFile, xlogarchive.c:254",""  
2016-01-04 17:19:50.461 CST,,,51775,,568a37b0.ca3f,917,,2016-01-04 17:13:20 CST,1/0,0,LOG,00000,"recovery stopping after commit of transaction 283000806, time 2016-01-04 17:06:25.459104+08",,,,,,,,"recoveryStopsAfter, xlog.c:5575",""  
2016-01-04 17:19:50.461 CST,,,51775,,568a37b0.ca3f,918,,2016-01-04 17:13:20 CST,1/0,0,LOG,00000,"recovery has paused",,"Execute pg_xlog_replay_resume() to continue.",,,,,,"recoveryPausesHere, xlog.c:5621",""  
2016-01-04 17:20:50.059 CST,,,51779,,568a37b0.ca43,2,,2016-01-04 17:13:20 CST,,0,LOG,00000,"restartpoint complete: wrote 95360 buffers (9.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=149.340 s, sync=0.000 s, total=149.373 s; sync files=20, longest=0.000 s, average=0.000 s; distance=5662284 kB, estimate=5662284 kB",,,,,,,,"LogCheckpointEnd, xlog.c:8061",""  
2016-01-04 17:20:50.059 CST,,,51779,,568a37b0.ca43,3,,2016-01-04 17:13:20 CST,,0,LOG,00000,"recovery restart point at 5C1/45993310","last completed transaction was at log time 2016-01-04 17:06:25.459104+08",,,,,,,"CreateRestartPoint, xlog.c:8886",""  

使用XLOG增量恢复,总耗时 6分30秒。

激活

postgres=# select pg_xlog_replay_resume();  
 pg_xlog_replay_resume   
-----------------------  
   
(1 row)  

开启归档

小结

1. 同一个阶段中,ZFS增量恢复速度比xlog增量恢复快,及时ZFS增量比xlog更大,也一样更快。

2. 当活跃数据较少时,ZFS比XLOG文件小很多,恢复速度快很多。

Flag Counter

digoal’s 大量PostgreSQL文章入口