PostgreSQL源码分析备库查询冲突 - User was holding shared buffer pin for too long

3 minute read

背景

PostgreSQL 的基于流复制的物理备库是基于redo的物理块复制备库，允许开放只读的功能，但是需要注意，由于主库可能不断的产生redo，这些redo可能会与备库的QUERY产生冲突。

什么情况下query会堵塞、或与恢复冲突？

当以下操作产生的REDO被复制到备库，并且备库准备拿这些REDO来恢复时。

Access Exclusive locks taken on the primary server, including both explicit LOCK commands and various DDL actions, conflict with table accesses in standby queries.

主库的访问排它锁，与备库对应的锁产生冲突。

例如主库truncate a表, 备库查询a表。

这种情况的冲突面很窄。
Dropping a tablespace on the primary conflicts with standby queries using that tablespace for temporary work files.

主库删除表空间，备库使用这个表空间产生临时文件。例如主库删除TBS，备库的一个大的查询需要写临时文件，并且这个临时文件是写到这个表空间的。

这种情况非常少见，也很容易规避，新建一个临时表空间不要删除即可。
Dropping a database on the primary conflicts with sessions connected to that database on the standby.

主库删除数据库，备库刚好连在这个数据库上。

这种情况也非常的少见。
Application of a vacuum cleanup record from WAL conflicts with standby transactions whose snapshots can still “see” any of the rows to be removed.

主库回收dead tuple的REDO，同事备库当前的query snapshot需要看到这些记录。

这种情况可以通过参数控制，恢复优先，或查询优先。可以配置时间窗口。

而且这种冲突出现的概率也非常的小，除非用户在备库使用repeatable read，同时是非常大的事务。

而通常用户用的都是read committed.
Application of a vacuum cleanup record from WAL conflicts with queries accessing the target page on the standby, whether or not the data to be removed is visible.

同上，但是当query访问的页就是要清理垃圾的页时，也是有冲突的。

这是物理复制与逻辑复制唯一有差别的地方，但是对现实场景来说，这种情况出现的概率也不大。

案例

最近收到了一个冲突的例子，是这样的，在备库查询一些函数，导致了这样的报错。

< 2016-08-09 22:03:39.534 CST >STATEMENT:  SELECTxxx('32980770','-1','0','0',20,100,10)  
< 2016-08-09 22:03:39.534 CST >ERROR:  canceling statement due to conflict with recovery  
< 2016-08-09 22:03:39.534 CST >DETAIL:  User was holding shared buffer pin for too long.  
< 2016-08-09 22:03:39.534 CST >CONTEXT:  PL/pgSQL function xxx(integer,bigint[],integer,integer) line 7 at RETURN QUERY  
  PL/pgSQL function xxx(integer,integer,bigint,bigint,integer,integer,integer) line 15 at RETURN QUERY  

是什么原因造成的呢？

PostgreSQL 备库apply与query之间可能存在的冲突种类如下

src/include/storage/procsignal.h

        /* Recovery conflict reasons */  
        PROCSIG_RECOVERY_CONFLICT_DATABASE,  
        PROCSIG_RECOVERY_CONFLICT_TABLESPACE,  
        PROCSIG_RECOVERY_CONFLICT_LOCK,  
        PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,  
        PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,  
        PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,  

例子的这个报错的冲突类型属于 PROCSIG_RECOVERY_CONFLICT_BUFFERPIN

它来自这个个函数

/*  
 * errdetail_recovery_conflict  
 *  
 * Add an errdetail() line showing conflict source.  
 */  
static int  
errdetail_recovery_conflict(void)  
{  
        switch (RecoveryConflictReason)  
        {  
                case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:  
                        errdetail("User was holding shared buffer pin for too long.");  
                        break;  

备库apply redo时，产生的冲突，调用如下

screenshot

源码

备库在apply redo时，如果遇到vacuum操作，则会触发以下。

而vacuum必须要拿到block的exclusive lock才能继续下去。

void  
LockBufferForCleanup(Buffer buffer)  
{  
...  
	for (;;)  
	{  
		/* Try to acquire lock */  
		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);  
		LockBufHdr(bufHdr);  
...  
		/* Wait to be signaled by UnpinBuffer() */  
		if (InHotStandby)  
		{  
			/* Publish the bufid that Startup process waits on */  
			SetStartupBufferPinWaitBufId(buffer - 1);  
			/* Set alarm and then wait to be signaled by UnpinBuffer() */  
			ResolveRecoveryConflictWithBufferPin();  
			/* Reset the published bufid */  
			SetStartupBufferPinWaitBufId(-1);  
		}  

如果是standby，则判断是否超时，如果没有超时，vacuum redo会继续等待。

void  
ResolveRecoveryConflictWithBufferPin(void)  
{  
        TimestampTz ltime;  
  
        Assert(InHotStandby);  
  
        ltime = GetStandbyLimitTime();  
  
        if (ltime == 0)  
        {  
                /*  
                 * We're willing to wait forever for conflicts, so set timeout for  
                 * deadlock check only  
                 */  
                enable_timeout_after(STANDBY_DEADLOCK_TIMEOUT, DeadlockTimeout);  
        }  
        else if (GetCurrentTimestamp() >= ltime)  
        {  
                /*  
                 * We're already behind, so clear a path as quickly as possible.  
                 */  
                SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);  
        }  

冲突超时时间获取，超时则触发 SendRecoveryConflictWithBufferPin ，导致这个CASE的错误。

/*  
 * Determine the cutoff time at which we want to start canceling conflicting  
 * transactions.  Returns zero (a time safely in the past) if we are willing  
 * to wait forever.  
 */  
static TimestampTz  
GetStandbyLimitTime(void)  
{  
        TimestampTz rtime;  
        bool            fromStream;  
  
        /*  
         * The cutoff time is the last WAL data receipt time plus the appropriate  
         * delay variable.  Delay of -1 means wait forever.  
         */  
        GetXLogReceiptTime(&rtime, &fromStream);  
        if (fromStream)  
        {  
                if (max_standby_streaming_delay < 0)  
                        return 0;                       /* wait forever */  
                return TimestampTzPlusMilliseconds(rtime, max_standby_streaming_delay);  
        }  
        else  
        {  
                if (max_standby_archive_delay < 0)  
                        return 0;                       /* wait forever */  
                return TimestampTzPlusMilliseconds(rtime, max_standby_archive_delay);  
        }  
}  

如何避免或降低冲突

PostgreSQL提供了3种解决备库上查询与恢复冲突的办法

在主库配置vacuum_defer_cleanup_age来解决以上最后两种冲突。

vacuum_defer_cleanup_age = 0   # number of xacts by which cleanup is delayed

在备库配置recovery延迟来解决以上所有冲突，给备库的QUERY设置一个执行窗口

max_standby_archive_delay = 30s        # max delay before canceling queries  
# when reading WAL from archive;  
# -1 allows indefinite delay  
#   
max_standby_streaming_delay = 30s      # max delay before canceling queries  
# when reading streaming WAL;  
# -1 allows indefinite delay  

在备库配置hot_standby_feedback，备库会将备库的xmin反馈给上游

从而主库知道备库还需要哪些记录，在cleanup dead tuple时，会考虑备库的情况，防止冲突。

feedback 协议如下

Hot Standby feedback message (F)
Byte1('h')
Identifies the message as a Hot Standby feedback message.

Int64
The client's system clock at the time of transmission, as microseconds since midnight on 2000-01-01.

Int32
The standby's current xmin. This may be 0, if the standby is sending notification that Hot Standby feedback will no longer be sent on this connection. Later non-zero messages may reinitiate the feedback mechanism.

Int32
The standby's current epoch.

参数如下

hot_standby_feedback = off             # send info from standby to prevent query conflicts  
#   
wal_retrieve_retry_interval = 1s   

通过pg_stat_database.conflicts可以了解冲突次数
通过pg_stat_database_conflicts 可以了解每种冲突类型的次数
源码

/* --------
 * pgstat_report_recovery_conflict() -
 *
 *      Tell the collector about a Hot Standby recovery conflict.
 * --------
 */
void
pgstat_report_recovery_conflict(int reason)
{
        PgStat_MsgRecoveryConflict msg;

        if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
                return;

        pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RECOVERYCONFLICT);
        msg.m_databaseid = MyDatabaseId;
        msg.m_reason = reason;
        pgstat_send(&msg, sizeof(msg));
}

祝大家玩得开心，欢迎随时来 阿里云促膝长谈业务需求，恭候光临。

阿里云的小伙伴们加油，努力 做好内核与服务，打造最贴地气的云数据库 。

digoal’s 大量PostgreSQL文章入口

Twitter Facebook Google+ LinkedIn

Digoal.zhou

PostgreSQL源码分析备库查询冲突 - User was holding shared buffer pin for too long

背景

什么情况下query会堵塞、或与恢复冲突？

案例

如何避免或降低冲突

digoal’s 大量PostgreSQL文章入口

You May Also Enjoy

PostgreSQL(PPAS 兼容Oracle) 从零开始入门手册 - 珍藏版

PostgreSQL pipelinedb 流计算插件 - IoT应用 - 实时轨迹聚合

PostgreSQL plpgsql 存储过程、函数 - 状态、异常变量打印、异常捕获… - GET [STACKED] DIAGNOSTICS

PostgreSQL datediff 日期间隔（单位转换）兼容SQL用法