PostgreSQL 逻辑备份与业务DDL的锁冲突案例 - PostgreSQL logical backup attention on lock and long time dump

2 minute read

背景

PostgreSQL逻辑备份, 指通过pg_dump或者直接调用COPY进行的备份方法.

一般来说, PostgreSQL 使用pg_dump备份数据库时, 会

  1. 开启一个repeatable read事务,

  2. 然后需要从系统表以及传入的参数, 生成备份集,

  3. 备份开始前, 需要一次性把需要备份的所有对象都加上access share lock, 这个锁与DML不冲突, 只和DDL冲突, 例如TRUNCATE表会和pg_dump发生冲突(当需要备份的表被执行DDL时, 只要pg_dump未结束, DDL就会处于等待状态).

而DDL又会堵塞后面的对应对象的任何请求(包括查询),所以建议业务执行DDL时加锁超时机制(语句超时或者锁超时都行)。

锁或语句超时配置如下,可以在会话中设置

postgres=# show lock_timeout ;  
 lock_timeout   
--------------  
 0  
(1 row)  
  
postgres=# show statement_timeout ;  
 statement_timeout   
-------------------  
 0  
(1 row)  

对于并行备份,还有一些小问题,

1. 因为worker并不知道它要备份哪个对象,是pg_dump分给它的(至少目前如此),并且worker是另外起的会话(通过共享事务快照达到一致视角),所以worker加access share lock锁是在拿到pg_dump分配的任务后才加的

2. worker加锁使用了nowait的机制,如果加锁失败,直接报错,导致整个备份失败。

/*
 * Acquire lock on a table to be dumped by a worker process.
 *
 * The master process is already holding an ACCESS SHARE lock.  Ordinarily
 * it's no problem for a worker to get one too, but if anything else besides
 * pg_dump is running, there's a possible deadlock:
 *
 * 1) Master dumps the schema and locks all tables in ACCESS SHARE mode.
 * 2) Another process requests an ACCESS EXCLUSIVE lock (which is not granted
 *        because the master holds a conflicting ACCESS SHARE lock).
 * 3) A worker process also requests an ACCESS SHARE lock to read the table.
 *        The worker is enqueued behind the ACCESS EXCLUSIVE lock request.
 * 4) Now we have a deadlock, since the master is effectively waiting for
 *        the worker.  The server cannot detect that, however.
 *
 * To prevent an infinite wait, prior to touching a table in a worker, request
 * a lock in ACCESS SHARE mode but with NOWAIT.  If we don't get the lock,
 * then we know that somebody else has requested an ACCESS EXCLUSIVE lock and
 * so we have a deadlock.  We must fail the backup in that case.
 */
static void
lockTableForWorker(ArchiveHandle *AH, TocEntry *te)
{
        const char *qualId;
        PQExpBuffer query;
        PGresult   *res;

        /* Nothing to do for BLOBS */
        if (strcmp(te->desc, "BLOBS") == 0)
                return;

        query = createPQExpBuffer();

        qualId = fmtQualifiedId(AH->public.remoteVersion, te->namespace, te->tag);

        appendPQExpBuffer(query, "LOCK TABLE %s IN ACCESS SHARE MODE NOWAIT",
                                          qualId);

        res = PQexec(AH->connection, query->data);

        if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
                exit_horribly(modulename,
                                          "could not obtain lock on relation \"%s\"\n"
                "This usually means that someone requested an ACCESS EXCLUSIVE lock "
                          "on the table after the pg_dump parent process had gotten the "
                                          "initial ACCESS SHARE lock on the table.\n", qualId);

        PQclear(res);
        destroyPQExpBuffer(query);
}

3. 如果用户在并行备份开启后,worker备份对应对象前,执行DDL,并且被access share lock堵塞后,如果DDL没有加锁超时,那么worker备份这个对象时,加载access share lock nowait会失败,导致整个备份失败。

并行备份代码改进点如下

1. pg_dump备份开始时,对所有要备份的对象加access share lock,(pg_dump增加锁超时参数)

2. worker备份数据前,与pg_dump进程一样,对所有对象加锁access share lock,不要等备份对应对象前加。

3. worker加access share lock不要使用nowait选项,使用lock_timeout来控制,这样就不会一旦加不上马上就报错。减少备份失败率。

所以如果需要备份的数据集很大的话, 备份程序pg_dump持有access share lock的时间就越长, 这段时间如果程序需要对备份的表做出DDL操作(如drop, truncate), DDL SQL就要进入等待.

同样的道理, 如果系统正在跑比较长的事务或函数, 并且事务或函数中使用了表的DDL, 例如truncate. 那么备份这些表也将被堵塞知道长事务或函数结束.

这样的情况其实很容易发生, 例如凌晨是比较空闲的时间, 备份和统计作业可能都会放到凌晨来处理.

假设1点开始备份, 备份到5点结束. 备份作业包含了以下truncate操作的表.

2点的时候开始一个统计作业, 统计函数里面如果用到truncate操作的话, 这个操作可能要等到5点才能获得锁而继续下去.

这种场景的备份需要排除需要truncate的表, 那么就不会发生锁冲突.

pg_dump有一个参数来排除不需要备份的表.

  -T, --exclude-table=TABLE   do NOT dump the named table(s)  

另外还有一个参数, 指定锁等待时间, 当pg_dump不能在这个时间内获得access share lock的话, 就报错.

       --lock-wait-timeout=timeout  
              Do  not  wait forever to acquire shared table locks at the beginning of the dump. Instead fail if unable  
              to lock a table within the specified timeout. The timeout  may  be  specified  in  any  of  the  formats  
              accepted  by SET statement_timeout. (Allowed values vary depending on the server version you are dumping  
              from, but an integer number of milliseconds is accepted by  all  versions  since  7.3.  This  option  is  
              ignored when dumping from a pre-7.3 server.)  

参考

1. src/bin/pg_dump

2. man pg_dump

Flag Counter

digoal’s 大量PostgreSQL文章入口