PostgreSQL 最佳实践 - 冷备份与还原介绍

6 minute read

背景

PostgreSQL 冷备份, 指在数据库关机状态下对数据库的数据文件进行的备份.

这种备份在生产中并不实用, 一般生产环境都需要保证数据库7*24小时不间断运行.

冷备份需要备份的是数据库集群主目录($PGDATA), 表空间目录, 事务日志(pg_xlog)目录.

如果在参数文件中指定了其他目录或文件, 某些也需要备份下来.

一、在备份前我们先要搞清楚需要备份哪些东西 :

1. 数据库目录 :

pg93@db-172-16-3-33-> cd $PGDATA    

主目录所在位置.

pg93@db-172-16-3-33-> pwd    
/pgdata1999    

主目录中的文件和文件夹

pg93@db-172-16-3-33-> ll    
total 140K    
drwx------ 9 pg93 pg93 4.0K May 25 16:33 base    
drwx------ 2 pg93 pg93 4.0K May 26 11:07 global    
drwxr-xr-x 2 pg93 pg93 4.0K May 26 11:16 pgbak    
drwx------ 2 pg93 pg93 4.0K May 24 14:52 pg_clog    
-rw------- 1 pg93 pg93 4.6K May 25 17:59 pg_hba.conf    
-rw------- 1 pg93 pg93 1.7K May 24 14:30 pg_ident.conf    
drwx------ 2 pg93 pg93  12K May 26 11:06 pg_log    
drwx------ 4 pg93 pg93 4.0K May  5 20:26 pg_multixact    
drwx------ 2 pg93 pg93 4.0K May 26 11:06 pg_notify    
drwx------ 2 pg93 pg93 4.0K May  5 20:26 pg_serial    
drwx------ 2 pg93 pg93 4.0K May  5 20:26 pg_snapshots    
drwx------ 2 pg93 pg93 4.0K May 26 11:06 pg_stat    
drwx------ 2 pg93 pg93 4.0K May 26 11:16 pg_stat_tmp    
drwx------ 2 pg93 pg93  16K May 16 15:30 pg_subtrans    
drwx------ 2 pg93 pg93 4.0K May 26 11:15 pg_tblspc    
drwx------ 2 pg93 pg93 4.0K May  5 20:26 pg_twophase    
-rw------- 1 pg93 pg93    4 May  5 20:26 PG_VERSION    
lrwxrwxrwx 1 pg93 pg93   44 May 26 11:15 pg_xlog -> /pgdata/digoal/1921/data03/pg93/1999/pg_xlog    
-rw------- 1 pg93 pg93  20K May 24 09:36 postgresql.conf    
-rw------- 1 pg93 pg93   27 May 26 11:06 postmaster.opts    
-rw------- 1 pg93 pg93   64 May 26 11:06 postmaster.pid    
-rw-r--r-- 1 root root 4.7K May  8 15:37 recovery.done    
-rw-r--r-- 1 pg93 pg93 2.5K May 24 13:46 root.crt    
-rw-r--r-- 1 pg93 pg93 1.3K May 24 13:37 server.crt    
-r-------- 1 pg93 pg93 1.7K May 24 13:32 server.key    

2. 事务日志目录

本例为

$PGDATA/pg_xlog    

3. 表空间目录

进入表空间目录查看表空间 :

pg93@db-172-16-3-33-> cd pg_tblspc/    
pg93@db-172-16-3-33-> ll    
total 0    
lrwxrwxrwx 1 pg93 pg93 47 May 26 11:15 26417 -> /pgdata/digoal/1921/data03/pg93/1999/tbs_digoal    

4. 配置文件中指定的在主目录以外的文件或目录

回到$PGDATA, 查看postgresql.conf中是否配置了$PGDATA以外的其他文件

pg93@db-172-16-3-33-> grep -E -i "dir|file" postgresql.conf     
#data_directory = 'ConfigDir'           # use data in another directory    
#hba_file = 'ConfigDir/pg_hba.conf'     # host-based authentication file    
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file    
ssl_cert_file = 'server.crt'            # (change requires restart)    
ssl_key_file = 'server.key'             # (change requires restart)    
ssl_ca_file = 'root.crt'                        # (change requires restart)    
#ssl_crl_file = ''                      # (change requires restart)    
#krb_server_keyfile = ''    
#include_dir = 'conf.d'                 # include files ending in '.conf' from    
                                        # directory 'conf.d'    
#include_if_exists = 'exists.conf'      # include file only if it exists    
#include = 'special.conf'               # include file    

以上配置文件中指定的目录或者文件同样需要备份下来.

二、检查哪些是不需要备份的

1. pg_xlog中的文件是不是需要全部备份呢?

在数据库关闭后, 通过控制文件的信息可以知道需要备份哪些文件.

注意是要在数据库关闭后查看哦, 否则是不准确的.

pg93@db-172-16-3-33-> pg_controldata     
pg_control version number:            937    
Catalog version number:               201304271    
Database system identifier:           5874470726249995168    
Database cluster state:               shut down    
pg_control last modified:             Sun 26 May 2013 11:28:29 AM CST    
Latest checkpoint location:           E/E9000028    
Prior checkpoint location:            E/E8000028    
Latest checkpoint's REDO location:    E/E9000028    
Latest checkpoint's REDO WAL file:    000000030000000E000000E9    
Latest checkpoint's TimeLineID:       3    
Latest checkpoint's PrevTimeLineID:   3    
Latest checkpoint's full_page_writes: on    
Latest checkpoint's NextXID:          0/221848459    
Latest checkpoint's NextOID:          26418    
Latest checkpoint's NextMultiXactId:  1    
Latest checkpoint's NextMultiOffset:  0    
Latest checkpoint's oldestXID:        150439034    
Latest checkpoint's oldestXID's DB:   12815    
Latest checkpoint's oldestActiveXID:  0    
Latest checkpoint's oldestMultiXid:   1    
Latest checkpoint's oldestMulti's DB: 1    
Time of latest checkpoint:            Sun 26 May 2013 11:28:29 AM CST    
Fake LSN counter for unlogged rels:   0/1    
Minimum recovery ending location:     0/0    
Min recovery ending loc's timeline:   0    
Backup start location:                0/0    
Backup end location:                  0/0    
End-of-backup record required:        no    
Current wal_level setting:            hot_standby    
Current max_connections setting:      100    
Current max_prepared_xacts setting:   0    
Current max_locks_per_xact setting:   64    
Maximum data alignment:               8    
Database block size:                  8192    
Blocks per segment of large relation: 131072    
WAL block size:                       16384    
Bytes per WAL segment:                16777216    
Maximum length of identifiers:        64    
Maximum columns in an index:          32    
Maximum size of a TOAST chunk:        1996    
Date/time type storage:               64-bit integers    
Float4 argument passing:              by value    
Float8 argument passing:              by value    
Data page checksum version:           0    

我们需要的信息是

Latest checkpoint location:           E/E9000028     
Latest checkpoint's REDO location:    E/E9000028    
Latest checkpoint's TimeLineID:       3    

从这3个信息都可以得到1个pg_xlog文件名, 表示这个数据库最后一次成功的checkpoint完成的xlog位置.

000000030000000E000000E9    

对于PostgreSQL 9.3直接看这个就要可以了.

Latest checkpoint's REDO WAL file:    000000030000000E000000E9    

因此需要备份的pg_xlog内容是从这个文件开始在内以及后面产生的一系列xlog文件.

本例中就只有这个文件, 如下.

pg93@db-172-16-3-33-> cd pg_xlog    
pg93@db-172-16-3-33-> ll -rt    
-rw------- 1 pg93 pg93 16M May 26 11:28 000000030000000E000000E9    
drwx------ 2 pg93 pg93 32K May 26 11:28 archive_status    

2. 不属于数据库系统的文件时不需要的.

例如在$PGDATA中,

drwxr-xr-x 2 pg93 pg93 4.0K May 26 11:16 pgbak    

这个目录不是数据库集群的目录, 所以不需要备份.

另外要提一下, 尽量不要在$PGDATA或者其他数据库目录中放其他文件或目录, 这样会使备份过程变得复杂, 因为还要去过滤这些文件, 不过滤的话又浪费空间.

3. 数据库日志也是不需要备份的, 例如本例中的$PGDATA/pg_log目录.

三、备份

1. 准备好备份需要的空间. 可以是本地的也可以是异地的存储.

查看数据库的大小 :

digoal=# select round(sum(pg_database_size(oid))/1024/1024.0,2)||'MB' from pg_database;    
 ?column?     
----------    
 73.59MB    
(1 row)    

准备好可以放下整个备份的目录

[root@db-172-16-3-33 1999]# df -h    
Filesystem            Size  Used Avail Use% Mounted on    
/dev/cciss/c0d0p1      29G   16G   12G  56% /    
tmpfs                 6.9G     0  6.9G   0% /dev/shm    
/dev/mapper/vgdata01-lv03    
                      135G   25G  104G  20% /pgdata/digoal/1921/data03    
/dev/mapper/vgdata01-lv04    
                      135G   69G   59G  54% /pgdata/digoal/1921/data04    
/dev/mapper/vgdata01-lv05    
                      135G   85G   44G  66% /pgdata/digoal/1921/data05    
/dev/mapper/vgdata01-lv06    
                       98G  189M   93G   1% /mnt    
/mnt/enc_dir           98G  189M   93G   1% /mnt/enc_dir    
  
[root@db-172-16-3-33 1999]# mkdir -p /pgdata/digoal/1921/data04/pg93backup    
[root@db-172-16-3-33 1999]# chown pg93:pg93 /pgdata/digoal/1921/data04/pg93backup    

2. 停库

pg93@db-172-16-3-33-> pg_ctl stop -m fast    
waiting for server to shut down.... done    
server stopped    

3. 备份$PGDATA, 排除pg_xlog, pg_log以及不需要备份的目录pgbak.

pg93@db-172-16-3-33-> rsync -acvz -L --exclude "pg_xlog" --exclude "pgbak" --exclude "pg_log" $PGDATA /pgdata/digoal/1921/data04/pg93backup/    

4. 备份pg_xlog

pg93@db-172-16-3-33-> pg_controldata |grep checkpoint    
Latest checkpoint location:           E/EB000028    
Prior checkpoint location:            E/EA000028    
Latest checkpoint's REDO location:    E/EB000028    
Latest checkpoint's REDO WAL file:    000000030000000E000000EB    
Latest checkpoint's TimeLineID:       3    
Latest checkpoint's PrevTimeLineID:   3    
Latest checkpoint's full_page_writes: on    
Latest checkpoint's NextXID:          0/221848464    
Latest checkpoint's NextOID:          26421    
Latest checkpoint's NextMultiXactId:  1    
Latest checkpoint's NextMultiOffset:  0    
Latest checkpoint's oldestXID:        150439034    
Latest checkpoint's oldestXID's DB:   12815    
Latest checkpoint's oldestActiveXID:  0    
Latest checkpoint's oldestMultiXid:   1    
Latest checkpoint's oldestMulti's DB: 1    
Time of latest checkpoint:            Sun 26 May 2013 12:24:37 PM CST    

在备份目录中创建pg_xlog目录

pg93@db-172-16-3-33-> mkdir -p /pgdata/digoal/1921/data04/pg93backup/pgdata1999/pg_xlog    

修改目录权限

pg93@db-172-16-3-33-> chmod 700 /pgdata/digoal/1921/data04/pg93backup/pgdata1999/pg_xlog    

查找需要的pg_xlog文件

pg93@db-172-16-3-33-> cd $PGDATA    
pg93@db-172-16-3-33-> ll -rt $PGDATA/pg_xlog/000000030000000E000000E*    
-rw------- 1 pg93 pg93 16M May 26 12:24 /pgdata1999/pg_xlog/000000030000000E000000EA    
-rw------- 1 pg93 pg93 16M May 26 12:24 /pgdata1999/pg_xlog/000000030000000E000000EB    

拷贝需要的pg_xlog文件

pg93@db-172-16-3-33-> cp $PGDATA/pg_xlog/000000030000000E000000EB /pgdata/digoal/1921/data04/pg93backup/pgdata1999/pg_xlog/    

5. 检查备份目录, 是否备份正常

pg93@db-172-16-3-33-> ll    
total 112K    
drwx------ 9 pg93 pg93 4.0K May 25 16:33 base    
drwx------ 2 pg93 pg93 4.0K May 26 11:49 global    
drwx------ 2 pg93 pg93 4.0K May 24 14:52 pg_clog    
-rw------- 1 pg93 pg93 4.6K May 25 17:59 pg_hba.conf    
-rw------- 1 pg93 pg93 1.7K May 24 14:30 pg_ident.conf    
drwx------ 4 pg93 pg93 4.0K May  5 20:26 pg_multixact    
drwx------ 2 pg93 pg93 4.0K May 26 11:45 pg_notify    
drwx------ 2 pg93 pg93 4.0K May  5 20:26 pg_serial    
drwx------ 2 pg93 pg93 4.0K May  5 20:26 pg_snapshots    
drwx------ 2 pg93 pg93 4.0K May 26 11:49 pg_stat    
drwx------ 2 pg93 pg93 4.0K May 26 11:49 pg_stat_tmp    
drwx------ 2 pg93 pg93 4.0K May 16 15:30 pg_subtrans    
drwx------ 3 pg93 pg93 4.0K May 26 11:15 pg_tblspc    
drwx------ 2 pg93 pg93 4.0K May  5 20:26 pg_twophase    
-rw------- 1 pg93 pg93    4 May  5 20:26 PG_VERSION    
drwx------ 2 pg93 pg93 4.0K May 26 12:13 pg_xlog    
-rw------- 1 pg93 pg93  20K May 24 09:36 postgresql.conf    
-rw------- 1 pg93 pg93   27 May 26 11:45 postmaster.opts    
-rw-r--r-- 1 pg93 pg93 4.7K May  8 15:37 recovery.done    
-rw-r--r-- 1 pg93 pg93 2.5K May 24 13:46 root.crt    
-rw-r--r-- 1 pg93 pg93 1.3K May 24 13:37 server.crt    
-r-------- 1 pg93 pg93 1.7K May 24 13:32 server.key    
pg93@db-172-16-3-33-> cd pg_tblspc/    
pg93@db-172-16-3-33-> ll    
total 4.0K    
drwx------ 3 pg93 pg93 4.0K May 26 11:15 26417    

6. 其他
如果表空间目录非常大, 也可以分开进行备份.

在备份$PGDATA时排除pg_tblspc目录即可,另外再分步进行表空间目录的备份.

四, 还原

还原冷备份前需要注意几个问题 :

还原环境中的数据库软件需与备份时的版本一致(例如9.0的备份不能还原到9.1上),

还原环境中的数据库软件的小版本尽量与原始环境一致(例如备份的版本为9.0.4, 还原的版本也尽量使用9.0.4, 版本升级请参考release说明.)

还原环境中的数据库用到的lib库应该与原始环境一致, 例如原始环境中用到了postgis, 那么还原环境也必须编译同版本的postgis.

还原环境的数据库软件编译项应该与原始环境保持一致, (特别是数据块的大小)可以在原始环境的config.log中找到,也可以使用pg_config得到,或者查看控制文件的内容得到 :

./configure --prefix=/opt/pgsql9.3beta1 --with-pgport=2099 --with-segsize=8 --with-wal-segsize=64 --with-wal-blocksize=64 --with-perl --with-python --with-openssl --with-pam --with-ldap --with-libxml --with-libxslt --enable-thread-safety    

或者使用pg_config也可以得到

pg93@db-172-16-3-33-> pg_config --configure    
'--prefix=/opt/pgsql9.3' '--with-pgport=1999' '--with-perl' '--with-tcl' '--with-python' '--with-openssl' '--with-pam' '--without-ldap' '--with-libxml' '--with-libxslt' '--enable-thread-safety' '--with-wal-blocksize=16' '--enable-debug'    

操作系统平台一致(例如linux的备份不能还原到windows上),

硬件架构一致(例如x86的备份不能还原到小型机上)

如果是在本地环境恢复, 那么不需要考虑以上问题, 如果是在其他服务器上恢复, 请按以上要求配置好还原环境后, 再开始恢复.

1. 把数据库目录, 表空间目录, pg_xlog全部删掉.

pg93@db-172-16-3-33-> cd $PGDATA    
pg93@db-172-16-3-33-> rm -rf *    
pg93@db-172-16-3-33-> cd /pgdata/digoal/1921/data03/pg93/1999/pg_xlog    
pg93@db-172-16-3-33-> rm -rf *    
pg93@db-172-16-3-33-> cd /pgdata/digoal/1921/data03/pg93/1999/tbs_digoal    
pg93@db-172-16-3-33-> rm -rf *    

2. 还原备份

pg93@db-172-16-3-33-> cp -r /pgdata/digoal/1921/data04/pg93backup/pgdata1999/* /pgdata1999/    

3. 创建日志目录

pg93@db-172-16-3-33-> cd $PGDATA    
pg93@db-172-16-3-33-> mkdir pg_log    
pg93@db-172-16-3-33-> chmod 700 pg_log    

4. 如果pg_xlog, pg_tblspc使用了软链接.

也恢复一下 :

pg93@db-172-16-3-33-> mv /pgdata1999/pg_tblspc/26425/* /pgdata/digoal/1921/data03/pg93/1999/tbs_digoal/    
pg93@db-172-16-3-33-> rm -rf /pgdata1999/pg_tblspc/26425    
pg93@db-172-16-3-33-> ln -s /pgdata/digoal/1921/data03/pg93/1999/tbs_digoal /pgdata1999/pg_tblspc/26425    

如果不是软链可千万别删哦。

5. 启动数据库

pg93@db-172-16-3-33-> pg_ctl start    
server starting    
pg93@db-172-16-3-33-> LOG:  00000: loaded library "pg_stat_statements"    
LOCATION:  load_libraries, miscinit.c:1296    

6. 验证, 抽查几条数据看看.

pg93@db-172-16-3-33-> psql    
psql (9.3devel)    
Type "help" for help.    
  
digoal=# \db    
                           List of tablespaces    
    Name    |  Owner   |                    Location                         
------------+----------+-------------------------------------------------    
 pg_default | postgres |     
 pg_global  | postgres |     
 tbs_digoal | postgres | /pgdata/digoal/1921/data03/pg93/1999/tbs_digoal    
(3 rows)    
digoal=# select count(*) from test;    
 count     
-------    
 10000    
(1 row)    
  
digoal=# \d test    
     Table "public.test"    
 Column |  Type   | Modifiers     
--------+---------+-----------    
 id     | integer |     
Tablespace: "tbs_digoal"    

Flag Counter

digoal’s 大量PostgreSQL文章入口