parallel blocking waiting by slow BLOCK extend relation , ExclusiveLock on extension of relation

5 minute read

背景

前面一篇文章我简单的介绍了一下如何找到出现锁等待的会话,以及它到底在等那个会话,在等的这个会话现在在执行什么操作,事务开启多久了?

参考:

http://blog.163.com/digoal@126/blog/static/1638770402015410104726197/

锁在数据库中是一个非常需要注意的东西,锁时间越长,影响可能越大。

有一个例子是这样的,某个业务有比较高并发的插入请求,当对象的空间被写满时,数据库会扩展一个数据块供客户使用,但是一次只会扩展一个数据块,而且扩展数据块的操作是extend locktype的ExclusiveLock。

也就是说,同一时间只允许一个扩展块的操作,其他的都会进入锁等待队列。

那么问题来了,如果第一个请求的扩展块的动作变慢了的话,因为并发量大,接下来的插入请求也需要扩展块,而如果其中任何一个出现扩展块堵塞,那么就非常容易引起越来越多的扩展块请求和拥塞。

这个时候该怎么办呢?

注意

PostgreSQL 9.6版本已经解决了这个问题,扩展块时会自适应扩展多个块,而不是一块一块的扩。

https://www.postgresql.org/docs/9.6/static/release-9-6.html

Extend relations multiple blocks at a time when there is contention for the relation’s extension lock (Dilip Kumar)

This improves scalability by decreasing contention.

正文

找到堵塞原因,

1. 从数据库层面找,使用如下:

postgres=# create or replace function f_lock_level(i_mode text) returns int as $$  
declare  
begin  
  case i_mode  
    when 'INVALID' then return 0;  
    when 'AccessShareLock' then return 1;  
    when 'RowShareLock' then return 2;  
    when 'RowExclusiveLock' then return 3;  
    when 'ShareUpdateExclusiveLock' then return 4;  
    when 'ShareLock' then return 5;  
    when 'ShareRowExclusiveLock' then return 6;  
    when 'ExclusiveLock' then return 7;  
    when 'AccessExclusiveLock' then return 8;  
    else return 0;  
  end case;  
end;   
$$ language plpgsql strict;  

修改查询语句,按锁级别排序:

with t_wait as                       
(select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a,transactionid,b.query,b.xact_start,b.query_start,b.usename,b.datname from pg_locks a,pg_stat_activity b where a.pid=b.pid and not a.granted),  
t_run as   
(select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a,transactionid,b.query,b.xact_start,b.query_start,b.usename,b.datname from pg_locks a,pg_stat_activity b where a.pid=b.pid and a.granted)   
select r.locktype,r.mode r_mode,r.usename r_user,r.datname r_db,r.relation::regclass,r.pid r_pid,r.xact_start r_xact_start,r.query_start r_query_start,now()-r.query_start r_locktime,r.query r_query,  
w.mode w_mode,w.pid w_pid,w.xact_start w_xact_start,w.query_start w_query_start,now()-w.query_start w_locktime,w.query w_query    
from t_wait w,t_run r where  
  r.locktype is not distinct from w.locktype and  
  r.database is not distinct from w.database and  
  r.relation is not distinct from w.relation and  
  r.page is not distinct from w.page and  
  r.tuple is not distinct from w.tuple and  
  r.classid is not distinct from w.classid and  
  r.objid is not distinct from w.objid and  
  r.objsubid is not distinct from w.objsubid and  
  r.transactionid is not distinct from w.transactionid and  
  r.pid <> w.pid  
  order by f_lock_level(w.mode)+f_lock_level(r.mode) desc,r.xact_start;  

如果无法排查,从系统层面或内核层面排查。例如从上面找到的大家都在等的那个PID,跟踪进程的系统调用。

如果来不及排查,迅速杀掉所有的扩展块请求,恢复业务先。

另一方面,对于这种业务类型,使用32KB的数据块可以更好的提高效率。

http://blog.163.com/digoal@126/blog/static/163877040201392641033482

常见的日志如下,如果你的系统频繁出现这样的等待,说明你要提高IO能力,或者将数据块改为32KB的。

"process 33604 still waiting for ExclusiveLock on extension of relation 686064100 of database 35078604 after 1000.289 ms"  
"process 5626 acquired ExclusiveLock on extension of relation 686064091 of database 35078604 after 3247.773 ms"  

下面这个错误,是因为设置了锁超时参数:

ERROR,55P03,"canceling statement due to lock timeout"  

如果你的环境遇到这种问题了,怎么处理比较好呢?

1. 提高IO能力

2. 将数据块改为32KB,必须重新编译PostgreSQL,重新初始化数据库,迁移数据。

3. 监控,遇到高并发的块扩展等待,自动杀掉。

下面是自动杀的方法:

将pg_locks中的mode信息转换为数字,数字越大,锁级别越高。

create or replace function f_lock_level(i_mode text) returns int as $$  
declare  
begin  
  case i_mode  
    when 'INVALID' then return 0;  
    when 'AccessShareLock' then return 1;  
    when 'RowShareLock' then return 2;  
    when 'RowExclusiveLock' then return 3;  
    when 'ShareUpdateExclusiveLock' then return 4;  
    when 'ShareLock' then return 5;  
    when 'ShareRowExclusiveLock' then return 6;  
    when 'ExclusiveLock' then return 7;  
    when 'AccessExclusiveLock' then return 8;  
    else return 0;  
  end case;  
end;   
$$ language plpgsql strict;  

事务持续时间和发生等待的会话个数作为参数,当扩展数据块的锁等待超出时间和会话个数时,杀掉这些会话。

create or replace function f_kill_extend(i_interval interval, i_waiting int8) returns void as $$  
declare  
  v_database oid;  
  v_relation oid;  
  v_pid int;  
  v_record record;  
begin  
if (pg_is_in_recovery()) then  
  return;  
end if;  
  
for v_record in with t_wait as                       
(select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a,transactionid,b.query,b.xact_start,b.query_start,b.usename,b.datname from pg_locks a,pg_stat_activity b where a.pid=b.pid and not a.granted),  
t_run as   
(select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a,transactionid,b.query,b.xact_start,b.query_start,b.usename,b.datname from pg_locks a,pg_stat_activity b where a.pid=b.pid and a.granted)   
select r.locktype,r.mode r_mode,r.usename r_user,r.datname r_db,r.relation::regclass,r.pid r_pid,r.xact_start r_xact_start,r.query_start r_query_start,now()-r.query_start r_locktime,r.query r_query,  
w.mode w_mode,w.pid w_pid,w.xact_start w_xact_start,w.query_start w_query_start,now()-w.query_start w_locktime,w.query w_query    
from t_wait w,t_run r where  
  r.locktype is not distinct from w.locktype and  
  r.database is not distinct from w.database and  
  r.relation is not distinct from w.relation and  
  r.page is not distinct from w.page and  
  r.tuple is not distinct from w.tuple and  
  r.classid is not distinct from w.classid and  
  r.objid is not distinct from w.objid and  
  r.objsubid is not distinct from w.objsubid and  
  r.transactionid is not distinct from w.transactionid and  
  r.pid <> w.pid  
  order by f_lock_level(w.mode)+f_lock_level(r.mode) desc,r.xact_start   
LOOP  
  raise notice '%', v_record;  
END LOOP;  
  
for v_database,v_relation in select database,relation from pg_locks where   
  locktype='extend' and mode='ExclusiveLock' and not granted and  
  pid in (select pid from pg_stat_activity where now()-xact_start > i_interval)   
  group by 1,2 having count(*) > i_waiting  
loop  
  perform pg_terminate_backend(pid) from pg_locks   
    where locktype='extend' and mode='ExclusiveLock'   
    and database=v_database and relation=v_relation;  
end loop;  
return;  
end;  
$$ language plpgsql strict;  

输出示例:

postgres=# select f_kill_extend(interval '1 s', 10);  
NOTICE:  (extend,ExclusiveLock,digoal,digoal,16399,13698,"2015-05-11 13:14:55.10005+08","2015-05-11 13:14:55.100272+08",00:00:00.309695,"update tbl set crt_time=now() where id between $1 and $2+10000;",ExclusiveLock,13671,"2015-05-11 13:14:55.315437+08","2015-05-11 13:14:55.315592+08",00:00:00.094375,"update tbl set crt_time=now() where id between $1 and $2+10000;")  
NOTICE:  (extend,ExclusiveLock,digoal,digoal,16399,13698,"2015-05-11 13:14:55.10005+08","2015-05-11 13:14:55.100272+08",00:00:00.309695,"update tbl set crt_time=now() where id between $1 and $2+10000;",ExclusiveLock,13681,"2015-05-11 13:14:55.037967+08","2015-05-11 13:14:55.038253+08",00:00:00.371714,"update tbl set crt_time=now() where id between $1 and $2+10000;")  
NOTICE:  (extend,ExclusiveLock,digoal,digoal,16399,13698,"2015-05-11 13:14:55.10005+08","2015-05-11 13:14:55.100272+08",00:00:00.309695,"update tbl set crt_time=now() where id between $1 and $2+10000;",ExclusiveLock,13665,"2015-05-11 13:14:55.129919+08","2015-05-11 13:14:55.130171+08",00:00:00.279796,"update tbl set crt_time=now() where id between $1 and $2+10000;")  
NOTICE:  (extend,ExclusiveLock,digoal,digoal,16399,13698,"2015-05-11 13:14:55.10005+08","2015-05-11 13:14:55.100272+08",00:00:00.309695,"update tbl set crt_time=now() where id between $1 and $2+10000;",ExclusiveLock,13699,"2015-05-11 13:14:55.229596+08","2015-05-11 13:14:55.229881+08",00:00:00.180086,"update tbl set crt_time=now() where id between $1 and $2+10000;")  
NOTICE:  (extend,ExclusiveLock,digoal,digoal,16399,13698,"2015-05-11 13:14:55.10005+08","2015-05-11 13:14:55.100272+08",00:00:00.309695,"update tbl set crt_time=now() where id between $1 and $2+10000;",ExclusiveLock,13677,"2015-05-11 13:14:55.276543+08","2015-05-11 13:14:55.276723+08",00:00:00.133244,"update tbl set crt_time=now() where id between $1 and $2+10000;")  
NOTICE:  (transactionid,ExclusiveLock,digoal,digoal,,13564,"2015-05-11 13:12:48.891089+08","2015-05-11 13:13:02.936059+08",00:01:52.473908,"update tbl set info=now() where id=1;",ShareLock,13614,"2015-05-11 13:13:12.810098+08","2015-05-11 13:13:21.54997+08",00:01:33.859997,"update tbl set info=now() where id<1000;")  
 f_kill_extend   
---------------  
   
(1 row)  

参考

1. http://blog.163.com/digoal@126/blog/static/1638770402015410104726197/

2. src/backend/storage/lmgr/lock.c

3. src/include/storage/lock.h

4. src/backend/storage/smgr/md.c

/*  
 *      mdextend() -- Add a block to the specified relation.  
 *  
 *              The semantics are nearly the same as mdwrite(): write at the  
 *              specified position.  However, this is to be used for the case of  
 *              extending a relation (i.e., blocknum is at or beyond the current  
 *              EOF).  Note that we assume writing a block beyond current EOF  
 *              causes intervening file space to become filled with zeroes.  
 */  
void  
mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,  
                 char *buffer, bool skipFsync)  
{  
        off_t           seekpos;  
        int                     nbytes;  
        MdfdVec    *v;  
  
        /* This assert is too expensive to have on normally ... */  
#ifdef CHECK_WRITE_VS_EXTEND  
        Assert(blocknum >= mdnblocks(reln, forknum));  
#endif  
  
        /*  
         * If a relation manages to grow to 2^32-1 blocks, refuse to extend it any  
         * more --- we mustn't create a block whose number actually is  
         * InvalidBlockNumber.  
         */  
        if (blocknum == InvalidBlockNumber)  
                ereport(ERROR,  
                                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),  
                                 errmsg("cannot extend file \"%s\" beyond %u blocks",  
                                                relpath(reln->smgr_rnode, forknum),  
                                                InvalidBlockNumber)));  
  
        v = _mdfd_getseg(reln, forknum, blocknum, skipFsync, EXTENSION_CREATE);  
  
        seekpos = (off_t) BLCKSZ *(blocknum % ((BlockNumber) RELSEG_SIZE));  
        Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);  
  
        /*  
         * Note: because caller usually obtained blocknum by calling mdnblocks,  
         * which did a seek(SEEK_END), this seek is often redundant and will be  
         * optimized away by fd.c.  It's not redundant, however, if there is a  
         * partial page at the end of the file. In that case we want to try to  
         * overwrite the partial page with a full page.  It's also not redundant  
         * if bufmgr.c had to dump another buffer of the same file to make room  
         * for the new page's buffer.  
         */  
        if (FileSeek(v->mdfd_vfd, seekpos, SEEK_SET) != seekpos)  
                ereport(ERROR,  
                                (errcode_for_file_access(),  
                                 errmsg("could not seek to block %u in file \"%s\": %m",  
                                                blocknum, FilePathName(v->mdfd_vfd))));  
  
        if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ)) != BLCKSZ)  
        {  
                if (nbytes < 0)  
                        ereport(ERROR,  
                                        (errcode_for_file_access(),  
                                         errmsg("could not extend file \"%s\": %m",  
                                                        FilePathName(v->mdfd_vfd)),  
                                         errhint("Check free disk space.")));  
                /* short write: complain appropriately */  
                ereport(ERROR,  
                                (errcode(ERRCODE_DISK_FULL),  
                                 errmsg("could not extend file \"%s\": wrote only %d of %d bytes at block %u",  
                                                FilePathName(v->mdfd_vfd),  
                                                nbytes, BLCKSZ, blocknum),  
                                 errhint("Check free disk space.")));  
        }  
  
        if (!skipFsync && !SmgrIsTemp(reln))  
                register_dirty_segment(reln, forknum, v);  
  
        Assert(_mdnblocks(reln, forknum, v) <= ((BlockNumber) RELSEG_SIZE));  
}  

Flag Counter

digoal’s 大量PostgreSQL文章入口