PostgreSQL 两阶段(非库级一致性)逻辑备份 - PostgreSQL non-consistent backup script

2 minute read

背景

pg_dump 备份为一致性备份. 如下 :

src/bin/pg_dump/pg_dump.c

        /*  
         * Start transaction-snapshot mode transaction to dump consistent data.  
         */  
        ExecuteSqlStatement(fout, "BEGIN");  
        if (fout->remoteVersion >= 90100)  
        {  
                if (serializable_deferrable)  
                        ExecuteSqlStatement(fout,  
                                                                "SET TRANSACTION ISOLATION LEVEL "  
                                                                "SERIALIZABLE, READ ONLY, DEFERRABLE");  
                else  
                        ExecuteSqlStatement(fout,  
                                                                "SET TRANSACTION ISOLATION LEVEL "  
                                                                "REPEATABLE READ");  
        }  
        else  
                ExecuteSqlStatement(fout,  
                                                        "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");  
  
  
# 并且pg_dump是在一个事务中完成的,   
# pg_dump备份时需要依次进行锁表的操作, 锁级别为IN ACCESS SHARE MODE. 这个锁与AccessExclusiveLock冲突.  
# 因此与操作TRUNCATE TABLE, ALTER TABLE, DROP TABLE, VACUUM FULL 以及unqualified LOCK TABLE冲突.  
                /*  
                 * Read-lock target tables to make sure they aren't DROPPED or altered  
                 * in schema before we get around to dumping them.  
                 *  
                 * Note that we don't explicitly lock parents of the target tables; we  
                 * assume our lock on the child is enough to prevent schema  
                 * alterations to parent tables.  
                 *  
                 * NOTE: it'd be kinda nice to lock other relations too, not only  
                 * plain tables, but the backend doesn't presently allow that.  
                 */  
                if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION)  
                {  
                        resetPQExpBuffer(query);  
                        appendPQExpBuffer(query,  
                                                          "LOCK TABLE %s IN ACCESS SHARE MODE",  
                                                          fmtQualifiedId(fout,  
                                                                                tblinfo[i].dobj.namespace->dobj.name,  
                                                                                         tblinfo[i].dobj.name));  
                        ExecuteSqlStatement(fout, query->data);  
                }  

当程序中使用到了这些SQL(如TRUNCATE TABLE, ALTER TABLE, DROP TABLE, VACUUM FULL 以及unqualified LOCK TABLE)时, 如果数据库非常庞大并且在做pg_dump备份时, SQL将会处于等待状态.

等待直到pg_dump结束(锁的释放需要等待顶级事务块结束).

例如 :

1. 系统中有1000个表, 备份第一个表a时会对第一个表a加access share锁. 此时如果其他程序要truncate table a; 需要等待1000个表完全备份完毕才可以进行下去.

2. 当程序先持有a的AccessExclusiveLock, 那么pg_dump也需要等待程序释放AccessExclusiveLock. 同时pg_dump有锁获取超时机制.

        if (lockWaitTimeout && fout->remoteVersion >= 70300)  
        {  
                /*  
                 * Arrange to fail instead of waiting forever for a table lock.  
                 *  
                 * NB: this coding assumes that the only queries issued within the  
                 * following loop are LOCK TABLEs; else the timeout may be undesirably  
                 * applied to other things too.  
                 */  
                resetPQExpBuffer(query);  
                appendPQExpBuffer(query, "SET statement_timeout = ");  
                appendStringLiteralConn(query, lockWaitTimeout, GetConnection(fout));  
                ExecuteSqlStatement(fout, query->data);  
        }  

因此庞大的数据库做pg_dump需要考虑到以上问题.

例如在不需要全库一致性备份的前提下, 缩小一致性备份的范围, 这样可以减少单次pg_dump的时间, 从而减少程序等待锁的时间.

如下脚本 :

#!/bin/bash  
# 环境变量  
PATH=$PATH:$HOME/bin  
export PATH  
export LANG=en_US.utf8  
export PGHOME=/opt/pgsql  
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib  
export DATE=`date +"%Y%m%d%H%M"`  
export PATH=$PGHOME/bin:$PATH:.  
  
# 程序变量  
TODAY=`date +%Y%m%d`  
EMAIL="digoal@126.com"  
BAKBASEDIR="/database/pgbak"  
RESERVE_DAY=4  
  
HOST="10.10.10.10"  
PORT="1921"  
ROLE="postgres"  
  
# 库级不一致备份, 按单表进行.  
for DB in `psql -A -q -t -h $HOST -p $PORT -U $ROLE postgres -c "select datname from pg_database where datname not in ('postgres','template0','template1')"`  
do  
echo -e "------`date +%F\ %T`----Start Backup----IP:$HOST PORT:$PORT DBNAME:$DB TYPE:$BAKTYPE TO:$BAKBASEDIR------"  
  
for TABLE in `psql -A -q -t -h $HOST -p $PORT -U $ROLE $DB -c "select schemaname||'.'||tablename from pg_tables where schemaname !~ '^pg_' and schemaname <>'information_schema'"`  
do  
pg_dump -f ${BAKBASEDIR}/${DB}-${TABLE}-${TODAY}.dmp.ing -F c -t $TABLE --lock-wait-timeout=6000 -E UTF8 -h ${HOST} -p ${PORT} -U ${ROLE} -w ${DB}  
if [ $? -ne 0 ]; then  
echo -e "backup $HOST $PORT $DB $BAKBASEDIR error \n `date +%F%T` \n"|mutt -s "ERROR : PostgreSQL_backup " ${EMAIL}  
echo -e "------`date +%F\ %T`----Error Backup----IP:$HOST PORT:$PORT DBNAME:$DB TABLE:$TABLE TO:$BAKBASEDIR------"  
rm -f ${BAKBASEDIR}/${DB}-${TABLE}-${TODAY}.dmp.ing  
break  
fi  
mv ${BAKBASEDIR}/${DB}-${TABLE}-${TODAY}.dmp.ing ${BAKBASEDIR}/${DB}-${TABLE}-${TODAY}.dmp  
echo -e "------`date +%F\ %T`----Success Backup----IP:$HOST PORT:$PORT DBNAME:$DB TABLE:$TABLE TO:$BAKBASEDIR------"  
done  
  
done  
  
echo -e "find ${BAKBASEDIR}/${DB}_${TABLE}_${TODAY}.dmp* -daystart -mtime +${RESERVE_DAY} -delete"  
find ${BAKBASEDIR}/${DB}_${TABLE}_${TODAY}.dmp* -daystart -mtime +${RESERVE_DAY} -delete  

这只是个简单的例子, 实际使用时需要考虑业务逻辑, 确保业务数据一致性. 例如某一些有关联的表确保放在同一个pg_dump中导出.

多个 -t talbename 参数即可.

同时需要注意单表不要太大, 否则该单表的DDL操作也会加大和pg_dump发生冲突的概率.

对于较大的数据库还是建议使用PITR物理增量备份方式.

参考

1. src/bin/pg_dump/pg_dump.c

Flag Counter

digoal’s 大量PostgreSQL文章入口