PostgreSQL 最佳实践 - 冷备份与还原介绍
背景
PostgreSQL 冷备份, 指在数据库关机状态下对数据库的数据文件进行的备份.
这种备份在生产中并不实用, 一般生产环境都需要保证数据库7*24小时不间断运行.
冷备份需要备份的是数据库集群主目录($PGDATA), 表空间目录, 事务日志(pg_xlog)目录.
如果在参数文件中指定了其他目录或文件, 某些也需要备份下来.
一、在备份前我们先要搞清楚需要备份哪些东西 :
1. 数据库目录 :
pg93@db-172-16-3-33-> cd $PGDATA
主目录所在位置.
pg93@db-172-16-3-33-> pwd
/pgdata1999
主目录中的文件和文件夹
pg93@db-172-16-3-33-> ll
total 140K
drwx------ 9 pg93 pg93 4.0K May 25 16:33 base
drwx------ 2 pg93 pg93 4.0K May 26 11:07 global
drwxr-xr-x 2 pg93 pg93 4.0K May 26 11:16 pgbak
drwx------ 2 pg93 pg93 4.0K May 24 14:52 pg_clog
-rw------- 1 pg93 pg93 4.6K May 25 17:59 pg_hba.conf
-rw------- 1 pg93 pg93 1.7K May 24 14:30 pg_ident.conf
drwx------ 2 pg93 pg93 12K May 26 11:06 pg_log
drwx------ 4 pg93 pg93 4.0K May 5 20:26 pg_multixact
drwx------ 2 pg93 pg93 4.0K May 26 11:06 pg_notify
drwx------ 2 pg93 pg93 4.0K May 5 20:26 pg_serial
drwx------ 2 pg93 pg93 4.0K May 5 20:26 pg_snapshots
drwx------ 2 pg93 pg93 4.0K May 26 11:06 pg_stat
drwx------ 2 pg93 pg93 4.0K May 26 11:16 pg_stat_tmp
drwx------ 2 pg93 pg93 16K May 16 15:30 pg_subtrans
drwx------ 2 pg93 pg93 4.0K May 26 11:15 pg_tblspc
drwx------ 2 pg93 pg93 4.0K May 5 20:26 pg_twophase
-rw------- 1 pg93 pg93 4 May 5 20:26 PG_VERSION
lrwxrwxrwx 1 pg93 pg93 44 May 26 11:15 pg_xlog -> /pgdata/digoal/1921/data03/pg93/1999/pg_xlog
-rw------- 1 pg93 pg93 20K May 24 09:36 postgresql.conf
-rw------- 1 pg93 pg93 27 May 26 11:06 postmaster.opts
-rw------- 1 pg93 pg93 64 May 26 11:06 postmaster.pid
-rw-r--r-- 1 root root 4.7K May 8 15:37 recovery.done
-rw-r--r-- 1 pg93 pg93 2.5K May 24 13:46 root.crt
-rw-r--r-- 1 pg93 pg93 1.3K May 24 13:37 server.crt
-r-------- 1 pg93 pg93 1.7K May 24 13:32 server.key
2. 事务日志目录
本例为
$PGDATA/pg_xlog
3. 表空间目录
进入表空间目录查看表空间 :
pg93@db-172-16-3-33-> cd pg_tblspc/
pg93@db-172-16-3-33-> ll
total 0
lrwxrwxrwx 1 pg93 pg93 47 May 26 11:15 26417 -> /pgdata/digoal/1921/data03/pg93/1999/tbs_digoal
4. 配置文件中指定的在主目录以外的文件或目录
回到$PGDATA, 查看postgresql.conf中是否配置了$PGDATA以外的其他文件
pg93@db-172-16-3-33-> grep -E -i "dir|file" postgresql.conf
#data_directory = 'ConfigDir' # use data in another directory
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
ssl_cert_file = 'server.crt' # (change requires restart)
ssl_key_file = 'server.key' # (change requires restart)
ssl_ca_file = 'root.crt' # (change requires restart)
#ssl_crl_file = '' # (change requires restart)
#krb_server_keyfile = ''
#include_dir = 'conf.d' # include files ending in '.conf' from
# directory 'conf.d'
#include_if_exists = 'exists.conf' # include file only if it exists
#include = 'special.conf' # include file
以上配置文件中指定的目录或者文件同样需要备份下来.
二、检查哪些是不需要备份的
1. pg_xlog中的文件是不是需要全部备份呢?
在数据库关闭后, 通过控制文件的信息可以知道需要备份哪些文件.
注意是要在数据库关闭后查看哦, 否则是不准确的.
pg93@db-172-16-3-33-> pg_controldata
pg_control version number: 937
Catalog version number: 201304271
Database system identifier: 5874470726249995168
Database cluster state: shut down
pg_control last modified: Sun 26 May 2013 11:28:29 AM CST
Latest checkpoint location: E/E9000028
Prior checkpoint location: E/E8000028
Latest checkpoint's REDO location: E/E9000028
Latest checkpoint's REDO WAL file: 000000030000000E000000E9
Latest checkpoint's TimeLineID: 3
Latest checkpoint's PrevTimeLineID: 3
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/221848459
Latest checkpoint's NextOID: 26418
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 150439034
Latest checkpoint's oldestXID's DB: 12815
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint: Sun 26 May 2013 11:28:29 AM CST
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current max_connections setting: 100
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 16384
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
我们需要的信息是
Latest checkpoint location: E/E9000028
Latest checkpoint's REDO location: E/E9000028
Latest checkpoint's TimeLineID: 3
从这3个信息都可以得到1个pg_xlog文件名, 表示这个数据库最后一次成功的checkpoint完成的xlog位置.
000000030000000E000000E9
对于PostgreSQL 9.3直接看这个就要可以了.
Latest checkpoint's REDO WAL file: 000000030000000E000000E9
因此需要备份的pg_xlog内容是从这个文件开始在内以及后面产生的一系列xlog文件.
本例中就只有这个文件, 如下.
pg93@db-172-16-3-33-> cd pg_xlog
pg93@db-172-16-3-33-> ll -rt
-rw------- 1 pg93 pg93 16M May 26 11:28 000000030000000E000000E9
drwx------ 2 pg93 pg93 32K May 26 11:28 archive_status
2. 不属于数据库系统的文件时不需要的.
例如在$PGDATA中,
drwxr-xr-x 2 pg93 pg93 4.0K May 26 11:16 pgbak
这个目录不是数据库集群的目录, 所以不需要备份.
另外要提一下, 尽量不要在$PGDATA或者其他数据库目录中放其他文件或目录, 这样会使备份过程变得复杂, 因为还要去过滤这些文件, 不过滤的话又浪费空间.
3. 数据库日志也是不需要备份的, 例如本例中的$PGDATA/pg_log目录.
三、备份
1. 准备好备份需要的空间. 可以是本地的也可以是异地的存储.
查看数据库的大小 :
digoal=# select round(sum(pg_database_size(oid))/1024/1024.0,2)||'MB' from pg_database;
?column?
----------
73.59MB
(1 row)
准备好可以放下整个备份的目录
[root@db-172-16-3-33 1999]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/cciss/c0d0p1 29G 16G 12G 56% /
tmpfs 6.9G 0 6.9G 0% /dev/shm
/dev/mapper/vgdata01-lv03
135G 25G 104G 20% /pgdata/digoal/1921/data03
/dev/mapper/vgdata01-lv04
135G 69G 59G 54% /pgdata/digoal/1921/data04
/dev/mapper/vgdata01-lv05
135G 85G 44G 66% /pgdata/digoal/1921/data05
/dev/mapper/vgdata01-lv06
98G 189M 93G 1% /mnt
/mnt/enc_dir 98G 189M 93G 1% /mnt/enc_dir
[root@db-172-16-3-33 1999]# mkdir -p /pgdata/digoal/1921/data04/pg93backup
[root@db-172-16-3-33 1999]# chown pg93:pg93 /pgdata/digoal/1921/data04/pg93backup
2. 停库
pg93@db-172-16-3-33-> pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
3. 备份$PGDATA, 排除pg_xlog, pg_log以及不需要备份的目录pgbak.
pg93@db-172-16-3-33-> rsync -acvz -L --exclude "pg_xlog" --exclude "pgbak" --exclude "pg_log" $PGDATA /pgdata/digoal/1921/data04/pg93backup/
4. 备份pg_xlog
pg93@db-172-16-3-33-> pg_controldata |grep checkpoint
Latest checkpoint location: E/EB000028
Prior checkpoint location: E/EA000028
Latest checkpoint's REDO location: E/EB000028
Latest checkpoint's REDO WAL file: 000000030000000E000000EB
Latest checkpoint's TimeLineID: 3
Latest checkpoint's PrevTimeLineID: 3
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/221848464
Latest checkpoint's NextOID: 26421
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 150439034
Latest checkpoint's oldestXID's DB: 12815
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint: Sun 26 May 2013 12:24:37 PM CST
在备份目录中创建pg_xlog目录
pg93@db-172-16-3-33-> mkdir -p /pgdata/digoal/1921/data04/pg93backup/pgdata1999/pg_xlog
修改目录权限
pg93@db-172-16-3-33-> chmod 700 /pgdata/digoal/1921/data04/pg93backup/pgdata1999/pg_xlog
查找需要的pg_xlog文件
pg93@db-172-16-3-33-> cd $PGDATA
pg93@db-172-16-3-33-> ll -rt $PGDATA/pg_xlog/000000030000000E000000E*
-rw------- 1 pg93 pg93 16M May 26 12:24 /pgdata1999/pg_xlog/000000030000000E000000EA
-rw------- 1 pg93 pg93 16M May 26 12:24 /pgdata1999/pg_xlog/000000030000000E000000EB
拷贝需要的pg_xlog文件
pg93@db-172-16-3-33-> cp $PGDATA/pg_xlog/000000030000000E000000EB /pgdata/digoal/1921/data04/pg93backup/pgdata1999/pg_xlog/
5. 检查备份目录, 是否备份正常
pg93@db-172-16-3-33-> ll
total 112K
drwx------ 9 pg93 pg93 4.0K May 25 16:33 base
drwx------ 2 pg93 pg93 4.0K May 26 11:49 global
drwx------ 2 pg93 pg93 4.0K May 24 14:52 pg_clog
-rw------- 1 pg93 pg93 4.6K May 25 17:59 pg_hba.conf
-rw------- 1 pg93 pg93 1.7K May 24 14:30 pg_ident.conf
drwx------ 4 pg93 pg93 4.0K May 5 20:26 pg_multixact
drwx------ 2 pg93 pg93 4.0K May 26 11:45 pg_notify
drwx------ 2 pg93 pg93 4.0K May 5 20:26 pg_serial
drwx------ 2 pg93 pg93 4.0K May 5 20:26 pg_snapshots
drwx------ 2 pg93 pg93 4.0K May 26 11:49 pg_stat
drwx------ 2 pg93 pg93 4.0K May 26 11:49 pg_stat_tmp
drwx------ 2 pg93 pg93 4.0K May 16 15:30 pg_subtrans
drwx------ 3 pg93 pg93 4.0K May 26 11:15 pg_tblspc
drwx------ 2 pg93 pg93 4.0K May 5 20:26 pg_twophase
-rw------- 1 pg93 pg93 4 May 5 20:26 PG_VERSION
drwx------ 2 pg93 pg93 4.0K May 26 12:13 pg_xlog
-rw------- 1 pg93 pg93 20K May 24 09:36 postgresql.conf
-rw------- 1 pg93 pg93 27 May 26 11:45 postmaster.opts
-rw-r--r-- 1 pg93 pg93 4.7K May 8 15:37 recovery.done
-rw-r--r-- 1 pg93 pg93 2.5K May 24 13:46 root.crt
-rw-r--r-- 1 pg93 pg93 1.3K May 24 13:37 server.crt
-r-------- 1 pg93 pg93 1.7K May 24 13:32 server.key
pg93@db-172-16-3-33-> cd pg_tblspc/
pg93@db-172-16-3-33-> ll
total 4.0K
drwx------ 3 pg93 pg93 4.0K May 26 11:15 26417
6. 其他
如果表空间目录非常大, 也可以分开进行备份.
在备份$PGDATA时排除pg_tblspc目录即可,另外再分步进行表空间目录的备份.
四, 还原
还原冷备份前需要注意几个问题 :
还原环境中的数据库软件需与备份时的版本一致(例如9.0的备份不能还原到9.1上),
还原环境中的数据库软件的小版本尽量与原始环境一致(例如备份的版本为9.0.4, 还原的版本也尽量使用9.0.4, 版本升级请参考release说明.)
还原环境中的数据库用到的lib库应该与原始环境一致, 例如原始环境中用到了postgis, 那么还原环境也必须编译同版本的postgis.
还原环境的数据库软件编译项应该与原始环境保持一致, (特别是数据块的大小)可以在原始环境的config.log中找到,也可以使用pg_config得到,或者查看控制文件的内容得到 :
./configure --prefix=/opt/pgsql9.3beta1 --with-pgport=2099 --with-segsize=8 --with-wal-segsize=64 --with-wal-blocksize=64 --with-perl --with-python --with-openssl --with-pam --with-ldap --with-libxml --with-libxslt --enable-thread-safety
或者使用pg_config也可以得到
pg93@db-172-16-3-33-> pg_config --configure
'--prefix=/opt/pgsql9.3' '--with-pgport=1999' '--with-perl' '--with-tcl' '--with-python' '--with-openssl' '--with-pam' '--without-ldap' '--with-libxml' '--with-libxslt' '--enable-thread-safety' '--with-wal-blocksize=16' '--enable-debug'
操作系统平台一致(例如linux的备份不能还原到windows上),
硬件架构一致(例如x86的备份不能还原到小型机上)
如果是在本地环境恢复, 那么不需要考虑以上问题, 如果是在其他服务器上恢复, 请按以上要求配置好还原环境后, 再开始恢复.
1. 把数据库目录, 表空间目录, pg_xlog全部删掉.
pg93@db-172-16-3-33-> cd $PGDATA
pg93@db-172-16-3-33-> rm -rf *
pg93@db-172-16-3-33-> cd /pgdata/digoal/1921/data03/pg93/1999/pg_xlog
pg93@db-172-16-3-33-> rm -rf *
pg93@db-172-16-3-33-> cd /pgdata/digoal/1921/data03/pg93/1999/tbs_digoal
pg93@db-172-16-3-33-> rm -rf *
2. 还原备份
pg93@db-172-16-3-33-> cp -r /pgdata/digoal/1921/data04/pg93backup/pgdata1999/* /pgdata1999/
3. 创建日志目录
pg93@db-172-16-3-33-> cd $PGDATA
pg93@db-172-16-3-33-> mkdir pg_log
pg93@db-172-16-3-33-> chmod 700 pg_log
4. 如果pg_xlog, pg_tblspc使用了软链接.
也恢复一下 :
pg93@db-172-16-3-33-> mv /pgdata1999/pg_tblspc/26425/* /pgdata/digoal/1921/data03/pg93/1999/tbs_digoal/
pg93@db-172-16-3-33-> rm -rf /pgdata1999/pg_tblspc/26425
pg93@db-172-16-3-33-> ln -s /pgdata/digoal/1921/data03/pg93/1999/tbs_digoal /pgdata1999/pg_tblspc/26425
如果不是软链可千万别删哦。
5. 启动数据库
pg93@db-172-16-3-33-> pg_ctl start
server starting
pg93@db-172-16-3-33-> LOG: 00000: loaded library "pg_stat_statements"
LOCATION: load_libraries, miscinit.c:1296
6. 验证, 抽查几条数据看看.
pg93@db-172-16-3-33-> psql
psql (9.3devel)
Type "help" for help.
digoal=# \db
List of tablespaces
Name | Owner | Location
------------+----------+-------------------------------------------------
pg_default | postgres |
pg_global | postgres |
tbs_digoal | postgres | /pgdata/digoal/1921/data03/pg93/1999/tbs_digoal
(3 rows)
digoal=# select count(*) from test;
count
-------
10000
(1 row)
digoal=# \d test
Table "public.test"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
Tablespace: "tbs_digoal"