Systemtap EXP: PostgreSQL IN-BUILD mark Class 6 - lock


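Background

This article covers PostgreSQL's built-in lock-class probes and how to trace them with stap.

Main text

The lock-class probes fall into three groups: lightweight lock (LWLock) probes, heavyweight lock probes, and the deadlock probe.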

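Lightweight lock probes:

        probe lwlock__acquire(LWLockId, LWLockMode); Fires when a lightweight lock has been acquired. For the argument types, see references 5 and 6.
        probe lwlock__release(LWLockId); Fires when a lightweight lock has been released.
        probe lwlock__wait__start(LWLockId, LWLockMode); Fires when a wait for a lightweight lock begins.
        probe lwlock__wait__done(LWLockId, LWLockMode); Fires when a wait for a lightweight lock ends.
        probe lwlock__condacquire(LWLockId, LWLockMode); Fires when a lightweight lock has been acquired by the no-wait request function (LWLockConditionalAcquire). That function never waits, it only succeeds or fails, which is what distinguishes this probe from lwlock__acquire.
        probe lwlock__condacquire__fail(LWLockId, LWLockMode); Fires when the no-wait request fails to acquire the lock.
        probe lwlock__wait__until__free(LWLockId, LWLockMode); Fires in LWLockAcquireOrWait when the lock has been acquired. This function has unusual semantics: if the lock is free, it acquires the lock and returns true; if the lock is not immediately free, it waits until the lock is released and returns false without acquiring it. It is currently used only for WALWriteLock.
        probe lwlock__wait__until__free__fail(LWLockId, LWLockMode); Same function as above; fires when the lock was not acquired (the function waited and returns false).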

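Heavyweight lock probes:

        probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
        Fires when a heavyweight lock request begins to wait. The first five arguments are the first five fields of LOCKTAG (locktag_field1 through locktag_field4, then locktag_type; see reference 7). For LOCKMODE, see reference 8.

        probe lock__wait__done(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
        Fires when the wait ends.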

Deadlock probe:

        probe deadlock__found();  

Probe details:

Probe | Parameters | Description
------+------------+------------
lwlock-acquire | (LWLockId, LWLockMode) | Probe that fires when an LWLock has been acquired. arg0 is the LWLock’s ID. arg1 is the requested lock mode, either exclusive or shared.
lwlock-release | (LWLockId) | Probe that fires when an LWLock has been released (but note that any released waiters have not yet been awakened). arg0 is the LWLock’s ID.
lwlock-wait-start | (LWLockId, LWLockMode) | Probe that fires when an LWLock was not immediately available and a server process has begun to wait for the lock to become available. arg0 is the LWLock’s ID. arg1 is the requested lock mode, either exclusive or shared.
lwlock-wait-done | (LWLockId, LWLockMode) | Probe that fires when a server process has been released from its wait for an LWLock (it does not actually have the lock yet). arg0 is the LWLock’s ID. arg1 is the requested lock mode, either exclusive or shared.
lwlock-condacquire | (LWLockId, LWLockMode) | Probe that fires when an LWLock was successfully acquired when the caller specified no waiting. arg0 is the LWLock’s ID. arg1 is the requested lock mode, either exclusive or shared.
lwlock-condacquire-fail | (LWLockId, LWLockMode) | Probe that fires when an LWLock was not successfully acquired when the caller specified no waiting. arg0 is the LWLock’s ID. arg1 is the requested lock mode, either exclusive or shared.
lock-wait-start | (unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE) | Probe that fires when a request for a heavyweight lock (lmgr lock) has begun to wait because the lock is not available. arg0 through arg3 are the tag fields identifying the object being locked. arg4 indicates the type of object being locked. arg5 indicates the lock type being requested.
lock-wait-done | (unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE) | Probe that fires when a request for a heavyweight lock (lmgr lock) has finished waiting (i.e., has acquired the lock). The arguments are the same as for lock-wait-start.
deadlock-found | () | Probe that fires when a deadlock is found by the deadlock detector.

Note that these descriptions number the arguments from arg0, while stap scripts access them as $arg1, $arg2, and so on.

Examples

1. Count lightweight lock waits.

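stap -e '
// Count LWLock waits per (lock id, lock mode).
// The trailing 5 on the command line is the reporting interval in seconds.
global var1
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("lwlock__wait__done") {
  // First argument is the LWLockId, second is the LWLockMode.
  var1[$arg1, $arg2]++
}
probe timer.s($1) {
  println("*******************")
  // Print in ascending order of wait count, then reset the counters.
  foreach(v=[x,y] in var1+)
    printf("lockid:%d, lockmode:%d, wait_count:%d\n", x,y,v)
  delete var1
}' 5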

SQL :

digoal=# \sf f_test(int)  
CREATE OR REPLACE FUNCTION public.f_test(i_id integer)  
 RETURNS void  
 LANGUAGE plpgsql  
 STRICT  
AS $function$  
declare  
begin  
  update test set info=md5(random()::text), crt_time=clock_timestamp() where id=i_id;  
  if not found then  
    insert into test(id,info,crt_time) values(i_id,md5(random()::text),clock_timestamp());  
  end if;  
  return;  
  exception when others then  
    return;  
end;  
$function$  
digoal=# \d test  
                Table "public.test"  
  Column  |            Type             | Modifiers   
----------+-----------------------------+-----------  
 id       | integer                     | not null  
 info     | text                        |   
 crt_time | timestamp without time zone |   
Indexes:  
    "test_pkey" PRIMARY KEY, btree (id)  

stap output:

*******************  
lockid:30604, lockmode:1, wait_count:1  
lockid:6150, lockmode:0, wait_count:1  
lockid:137242, lockmode:1, wait_count:1  
lockid:33180, lockmode:1, wait_count:1  
lockid:37820, lockmode:1, wait_count:1  
lockid:55818, lockmode:1, wait_count:1  
lockid:152122, lockmode:1, wait_count:1  
lockid:3, lockmode:1, wait_count:1  
lockid:18220, lockmode:0, wait_count:1  
lockid:34, lockmode:1, wait_count:1  
lockid:38, lockmode:0, wait_count:1  
lockid:63456, lockmode:1, wait_count:1  
lockid:23704, lockmode:1, wait_count:1  
lockid:52826, lockmode:1, wait_count:1  
lockid:49032, lockmode:1, wait_count:1  
lockid:45, lockmode:1, wait_count:1  
lockid:45348, lockmode:1, wait_count:1  
lockid:2364, lockmode:1, wait_count:1  
lockid:40, lockmode:0, wait_count:1  
lockid:45918, lockmode:1, wait_count:1  
lockid:150650, lockmode:1, wait_count:1  
lockid:56324, lockmode:1, wait_count:1  
lockid:32554, lockmode:1, wait_count:1  
lockid:38, lockmode:1, wait_count:1  
lockid:20636, lockmode:1, wait_count:1  
lockid:6534, lockmode:1, wait_count:1  
lockid:39126, lockmode:1, wait_count:1  
lockid:42, lockmode:0, wait_count:1  
lockid:1640, lockmode:1, wait_count:1  
lockid:46, lockmode:0, wait_count:1  
lockid:39, lockmode:1, wait_count:1  
lockid:53778, lockmode:1, wait_count:1  
lockid:35, lockmode:1, wait_count:1  
lockid:33450, lockmode:1, wait_count:1  
lockid:33, lockmode:1, wait_count:2  
lockid:46, lockmode:1, wait_count:2  
lockid:48, lockmode:0, wait_count:2  
lockid:43, lockmode:0, wait_count:2  
lockid:96, lockmode:1, wait_count:2  
lockid:44, lockmode:1, wait_count:2  
lockid:37, lockmode:1, wait_count:2  
lockid:40, lockmode:1, wait_count:2  
lockid:13932, lockmode:1, wait_count:2  
lockid:48, lockmode:1, wait_count:4  
lockid:8, lockmode:0, wait_count:7  
lockid:13, lockmode:0, wait_count:88  
lockid:60, lockmode:0, wait_count:224  
lockid:54, lockmode:0, wait_count:235  
lockid:58, lockmode:0, wait_count:237  
lockid:49, lockmode:0, wait_count:240  
lockid:56, lockmode:0, wait_count:243  
lockid:57, lockmode:0, wait_count:249  
lockid:64, lockmode:0, wait_count:251  
lockid:63, lockmode:0, wait_count:261  
lockid:55, lockmode:0, wait_count:262  
lockid:59, lockmode:0, wait_count:263  
lockid:53, lockmode:0, wait_count:273  
lockid:52, lockmode:0, wait_count:275  
lockid:51, lockmode:0, wait_count:275  
lockid:62, lockmode:0, wait_count:276  
lockid:61, lockmode:0, wait_count:281  
lockid:50, lockmode:0, wait_count:287  
lockid:12, lockmode:0, wait_count:1514  
lockid:11, lockmode:1, wait_count:3385  
lockid:11, lockmode:0, wait_count:4103  
lockid:3, lockmode:0, wait_count:6376  
lockid:4, lockmode:1, wait_count:6980  
lockid:4, lockmode:0, wait_count:19500  
lockid:7, lockmode:0, wait_count:31472  
... (remaining output omitted)
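
The numeric lockid and lockmode values in this output can be mapped back to names with the LWLockId and LWLockMode enums quoted in references 5 and 6. The following is a minimal Python sketch of that mapping, not part of the original test. It assumes that NUM_BUFFER_PARTITIONS, NUM_LOCK_PARTITIONS and NUM_PREDICATELOCK_PARTITIONS are all 16; those constants are not shown in this article, so check them against your own build. The helper names lwlock_name and decode_line are hypothetical.

```python
import re

# Individual LWLockId names, in the order of the enum quoted in reference 5.
FIXED_LOCKS = [
    "BufFreelistLock", "ShmemIndexLock", "OidGenLock", "XidGenLock",
    "ProcArrayLock", "SInvalReadLock", "SInvalWriteLock", "WALInsertLock",
    "WALWriteLock", "ControlFileLock", "CheckpointLock", "CLogControlLock",
    "SubtransControlLock", "MultiXactGenLock", "MultiXactOffsetControlLock",
    "MultiXactMemberControlLock", "RelCacheInitLock", "CheckpointerCommLock",
    "TwoPhaseStateLock", "TablespaceCreateLock", "BtreeVacuumLock",
    "AddinShmemInitLock", "AutovacuumLock", "AutovacuumScheduleLock",
    "SyncScanLock", "RelationMappingLock", "AsyncCtlLock", "AsyncQueueLock",
    "SerializableXactHashLock", "SerializableFinishedListLock",
    "SerializablePredicateLockListLock", "OldSerXidLock", "SyncRepLock",
]

# ASSUMPTION: partition counts are not shown in the article; 16 each is assumed.
NUM_BUFFER_PARTITIONS = 16
NUM_LOCK_PARTITIONS = 16
NUM_PREDICATELOCK_PARTITIONS = 16

# Derived enum values, computed the same way as in the enum declaration.
FIRST_BUF_MAPPING = len(FIXED_LOCKS)
FIRST_LOCK_MGR = FIRST_BUF_MAPPING + NUM_BUFFER_PARTITIONS
FIRST_PREDICATE_LOCK_MGR = FIRST_LOCK_MGR + NUM_LOCK_PARTITIONS
NUM_FIXED = FIRST_PREDICATE_LOCK_MGR + NUM_PREDICATELOCK_PARTITIONS

LW_MODES = {0: "LW_EXCLUSIVE", 1: "LW_SHARED", 2: "LW_WAIT_UNTIL_FREE"}


def lwlock_name(lockid):
    """Map a numeric LWLockId to a readable name."""
    if lockid < FIRST_BUF_MAPPING:
        return FIXED_LOCKS[lockid]
    if lockid < FIRST_LOCK_MGR:
        return "BufMappingLock[%d]" % (lockid - FIRST_BUF_MAPPING)
    if lockid < FIRST_PREDICATE_LOCK_MGR:
        return "LockMgrLock[%d]" % (lockid - FIRST_LOCK_MGR)
    if lockid < NUM_FIXED:
        return "PredicateLockMgrLock[%d]" % (lockid - FIRST_PREDICATE_LOCK_MGR)
    # Anything beyond the fixed range is a dynamically assigned LWLock.
    return "dynamic(%d)" % lockid


LINE_RE = re.compile(r"lockid:(\d+), lockmode:(\d+), wait_count:(\d+)")


def decode_line(line):
    """Return (lock name, mode name, wait count), or None for other lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    lockid, mode, count = (int(g) for g in m.groups())
    return lwlock_name(lockid), LW_MODES.get(mode, str(mode)), count


print(decode_line("lockid:7, lockmode:0, wait_count:31472"))
```

Under these assumptions, the busiest entries above decode to WALInsertLock (7), ProcArrayLock (4), XidGenLock (3) and CLogControlLock (11), and ids 49 through 64 fall in the lock manager partition range.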

2. Trace heavyweight lock waits:

stap :

[root@db-172-16-3-150 postgresql-9.3.1]# stap -v -D MAXSKIPPED=10000000 -e '
// var1: wait start time per backend pid.
// var2: wait-time statistics keyed by the six probe arguments (lock tag fields, tag type, lock mode).
global var1%[120000], var2%[120000]
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("lock__wait__start") {
  var1[pid()] = gettimeofday_us()
}
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("lock__wait__done") {
  p=pid()
  t=gettimeofday_us()
  // Record the elapsed wait in microseconds if the start was seen.
  if (p in var1)
    var2[$arg1, $arg2, $arg3, $arg4, $arg5, $arg6] <<< (t - var1[p])
}
probe timer.s($1) {
  println("*******************")
  // Top 5 keys by total wait time: total (ms), count, average (ms).
  foreach([a,b,c,d,e,f] in var2 @sum - limit 5)
    printdln("**",a,b,c,d,e,f,@sum(var2[a,b,c,d,e,f])/1000,@count(var2[a,b,c,d,e,f]),@avg(var2[a,b,c,d,e,f])/1000)
  delete var2
}' 5

Test SQL:

pg93@db-172-16-3-150-> cat test.sql  
\setrandom id 1 8  
select f_test(:id);  
pg93@db-172-16-3-150-> pgbench -M prepared -n -r -f ./test.sql -c 64 -j 8 -T 1000  

With 64 connections and id restricted to the range 1 to 8, this workload produces a large number of lock waits.


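stap output:

The last three columns are the total wait time (ms), the number of waits, and the average wait time (ms).

In the fifth column, 3 means LOCKTAG_TUPLE and 4 means LOCKTAG_TRANSACTION; see the definitions in reference 7.

In the sixth column, 7 means ExclusiveLock and 5 means ShareLock; see the definitions in reference 8.

For a tuple lock, the first four fields, for example 16384**24735**77940**116, are dboid, reloid, blocknum, offnum. For a transaction lock, the first field is the transaction ID and the next three are 0.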

digoal=# select oid from pg_database where datname='digoal';  
  oid    
-------  
 16384  
(1 row)  
digoal=# select oid from pg_class where relname='test';  
  oid    
-------  
 24735  
(1 row)  
digoal=# select max(ctid) from test;  
    max       
------------  
 (80505,78)  
(1 row)  
*******************  
16384**24735**77940**116**3**7**304**38**8  
16384**24735**77921**83**3**7**295**32**9  
16384**24735**77979**32**3**7**271**41**6  
16384**24735**77906**29**3**7**235**37**6  
16384**24735**77921**138**3**7**233**44**5  
*******************  
129233058**0**0**0**4**5**1355**9**150  
129233135**0**0**0**4**5**1197**8**149  
129233044**0**0**0**4**5**749**6**124  
129233289**0**0**0**4**5**747**5**149  
16384**24735**78068**67**3**7**317**14**22  
*******************  
129363231**0**0**0**4**5**921**19**48  
16384**24735**78091**103**3**7**450**47**9  
16384**24735**78199**108**3**7**300**25**12  
16384**24735**78130**37**3**7**264**5**52  
16384**24735**78131**66**3**7**261**56**4  
*******************  
16384**24735**78277**36**3**7**350**38**9  
16384**24735**78290**20**3**7**301**35**8  
16384**24735**78206**114**3**7**268**43**6  
16384**24735**78277**93**3**7**254**30**8  
16384**24735**78290**79**3**7**253**31**8  
*******************  
16384**24735**78613**25**3**7**220**6**36  
16384**24735**78781**19**3**7**206**21**9  
16384**24735**78621**32**3**7**184**22**8  
16384**24735**78613**21**3**7**165**3**55  
16384**24735**78702**78**3**7**159**17**9  
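
The output lines above can be turned into readable records with the LockTagType and LOCKMODE definitions quoted in references 7 and 8. The following is a minimal Python sketch under that assumption; the function name decode_wait_line and the dictionary keys are hypothetical and chosen only for illustration.

```python
# LockTagType names, in the order of the enum quoted in reference 7.
LOCK_TAG_TYPES = [
    "LOCKTAG_RELATION", "LOCKTAG_RELATION_EXTEND", "LOCKTAG_PAGE",
    "LOCKTAG_TUPLE", "LOCKTAG_TRANSACTION", "LOCKTAG_VIRTUALTRANSACTION",
    "LOCKTAG_OBJECT", "LOCKTAG_USERLOCK", "LOCKTAG_ADVISORY",
]

# LOCKMODE names indexed by their numeric value, from reference 8.
LOCK_MODES = [
    "NoLock", "AccessShareLock", "RowShareLock", "RowExclusiveLock",
    "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
    "ExclusiveLock", "AccessExclusiveLock",
]


def _name(table, index):
    """Return the symbolic name, or the raw number if it is out of range."""
    return table[index] if 0 <= index < len(table) else str(index)


def decode_wait_line(line):
    """Decode one '**'-separated line; return None for separators and junk."""
    parts = line.strip().split("**")
    if len(parts) != 9:
        return None
    try:
        nums = [int(p) for p in parts]
    except ValueError:
        return None
    return {
        "fields": tuple(nums[0:4]),          # locktag_field1..locktag_field4
        "tag_type": _name(LOCK_TAG_TYPES, nums[4]),
        "mode": _name(LOCK_MODES, nums[5]),
        "total_ms": nums[6],
        "count": nums[7],
        "avg_ms": nums[8],
    }


for sample in ("16384**24735**77940**116**3**7**304**38**8",
               "129233058**0**0**0**4**5**1355**9**150"):
    print(decode_wait_line(sample))
```

The second sample decodes to a LOCKTAG_TRANSACTION wait in ShareLock mode, which is one backend waiting for another transaction to finish.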

References

1. http://www.postgresql.org/docs/9.3/static/dynamic-trace.html

2.

src/backend/storage/lmgr/lwlock.c

src/backend/storage/lmgr/lock.c

src/backend/storage/lmgr/deadlock.c

3. Probe macro definitions:

/* TRACE_POSTGRESQL_LWLOCK_ACQUIRE ( int, int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED() __builtin_expect (lwlock__acquire_semaphore, 0)  
#define postgresql_lwlock__acquire_semaphore lwlock__acquire_semaphore  
#else  
#define TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED() __builtin_expect (postgresql_lwlock__acquire_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_lwlock__acquire_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_LWLOCK_ACQUIRE(arg1,arg2) \  
DTRACE_PROBE2(postgresql,lwlock__acquire,arg1,arg2)  
  
/* TRACE_POSTGRESQL_LWLOCK_RELEASE ( int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED() __builtin_expect (lwlock__release_semaphore, 0)  
#define postgresql_lwlock__release_semaphore lwlock__release_semaphore  
#else  
#define TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED() __builtin_expect (postgresql_lwlock__release_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_lwlock__release_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_LWLOCK_RELEASE(arg1) \  
DTRACE_PROBE1(postgresql,lwlock__release,arg1)  
  
/* TRACE_POSTGRESQL_LWLOCK_WAIT_START ( int, int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED() __builtin_expect (lwlock__wait__start_semaphore, 0)  
#define postgresql_lwlock__wait__start_semaphore lwlock__wait__start_semaphore  
#else  
#define TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED() __builtin_expect (postgresql_lwlock__wait__start_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_lwlock__wait__start_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_LWLOCK_WAIT_START(arg1,arg2) \  
DTRACE_PROBE2(postgresql,lwlock__wait__start,arg1,arg2)  
  
/* TRACE_POSTGRESQL_LWLOCK_WAIT_DONE ( int, int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_LWLOCK_WAIT_DONE_ENABLED() __builtin_expect (lwlock__wait__done_semaphore, 0)  
#define postgresql_lwlock__wait__done_semaphore lwlock__wait__done_semaphore  
#else  
#define TRACE_POSTGRESQL_LWLOCK_WAIT_DONE_ENABLED() __builtin_expect (postgresql_lwlock__wait__done_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_lwlock__wait__done_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(arg1,arg2) \  
DTRACE_PROBE2(postgresql,lwlock__wait__done,arg1,arg2)  
  
/* TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE ( int, int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_ENABLED() __builtin_expect (lwlock__condacquire_semaphore, 0)  
#define postgresql_lwlock__condacquire_semaphore lwlock__condacquire_semaphore  
#else  
#define TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_ENABLED() __builtin_expect (postgresql_lwlock__condacquire_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_lwlock__condacquire_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(arg1,arg2) \  
DTRACE_PROBE2(postgresql,lwlock__condacquire,arg1,arg2)  
  
/* TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL ( int, int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL_ENABLED() __builtin_expect (lwlock__condacquire__fail_semaphore, 0)  
#define postgresql_lwlock__condacquire__fail_semaphore lwlock__condacquire__fail_semaphore  
#else  
#define TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL_ENABLED() __builtin_expect (postgresql_lwlock__condacquire__fail_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_lwlock__condacquire__fail_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(arg1,arg2) \  
DTRACE_PROBE2(postgresql,lwlock__condacquire__fail,arg1,arg2)  
  
/* TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE ( int, int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_ENABLED() __builtin_expect (lwlock__wait__until__free_semaphore, 0)  
#define postgresql_lwlock__wait__until__free_semaphore lwlock__wait__until__free_semaphore  
#else  
#define TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_ENABLED() __builtin_expect (postgresql_lwlock__wait__until__free_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_lwlock__wait__until__free_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE(arg1,arg2) \  
DTRACE_PROBE2(postgresql,lwlock__wait__until__free,arg1,arg2)  
  
/* TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_FAIL ( int, int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_FAIL_ENABLED() __builtin_expect (lwlock__wait__until__free__fail_semaphore, 0)  
#define postgresql_lwlock__wait__until__free__fail_semaphore lwlock__wait__until__free__fail_semaphore  
#else  
#define TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_FAIL_ENABLED() __builtin_expect (postgresql_lwlock__wait__until__free__fail_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_lwlock__wait__until__free__fail_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_FAIL(arg1,arg2) \  
DTRACE_PROBE2(postgresql,lwlock__wait__until__free__fail,arg1,arg2)  
  
/* TRACE_POSTGRESQL_LOCK_WAIT_START ( unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_LOCK_WAIT_START_ENABLED() __builtin_expect (lock__wait__start_semaphore, 0)  
#define postgresql_lock__wait__start_semaphore lock__wait__start_semaphore  
#else  
#define TRACE_POSTGRESQL_LOCK_WAIT_START_ENABLED() __builtin_expect (postgresql_lock__wait__start_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_lock__wait__start_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_LOCK_WAIT_START(arg1,arg2,arg3,arg4,arg5,arg6) \  
DTRACE_PROBE6(postgresql,lock__wait__start,arg1,arg2,arg3,arg4,arg5,arg6)  
  
/* TRACE_POSTGRESQL_LOCK_WAIT_DONE ( unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_LOCK_WAIT_DONE_ENABLED() __builtin_expect (lock__wait__done_semaphore, 0)  
#define postgresql_lock__wait__done_semaphore lock__wait__done_semaphore  
#else  
#define TRACE_POSTGRESQL_LOCK_WAIT_DONE_ENABLED() __builtin_expect (postgresql_lock__wait__done_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_lock__wait__done_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_LOCK_WAIT_DONE(arg1,arg2,arg3,arg4,arg5,arg6) \  
DTRACE_PROBE6(postgresql,lock__wait__done,arg1,arg2,arg3,arg4,arg5,arg6)  
  
/* TRACE_POSTGRESQL_DEADLOCK_FOUND () */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_DEADLOCK_FOUND_ENABLED() __builtin_expect (deadlock__found_semaphore, 0)  
#define postgresql_deadlock__found_semaphore deadlock__found_semaphore  
#else  
#define TRACE_POSTGRESQL_DEADLOCK_FOUND_ENABLED() __builtin_expect (postgresql_deadlock__found_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_deadlock__found_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_DEADLOCK_FOUND() \  
DTRACE_PROBE(postgresql,deadlock__found)  

4. Where the probes appear in the source code:

Lightweight locks

src/backend/storage/lmgr/lwlock.c

/*  
 * LWLockAcquire - acquire a lightweight lock in the specified mode  
 *  
 * If the lock is not available, sleep until it is.  
 *  
 * Side effect: cancel/die interrupts are held off until lock release.  
 */  
void  
LWLockAcquire(LWLockId lockid, LWLockMode mode)  
...  
  
                TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);  
  
                for (;;)  
                {  
                        /* "false" means cannot accept cancel/die interrupt here. */  
                        PGSemaphoreLock(&proc->sem, false);  
                        if (!proc->lwWaiting)  
                                break;  
                        extraWaits++;  
                }  
  
                TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode);  
...  
        /* We are done updating shared state of the lock itself. */  
        SpinLockRelease(&lock->mutex);  
  
        TRACE_POSTGRESQL_LWLOCK_ACQUIRE(lockid, mode);  
...  
/*  
 * LWLockConditionalAcquire - acquire a lightweight lock in the specified mode  
 *  
 * If the lock is not available, return FALSE with no side-effects.  
 *  
 * If successful, cancel/die interrupts are held off until lock release.  
 */  
bool  
LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)  
{  
...  
        /* We are done updating shared state of the lock itself. */  
        SpinLockRelease(&lock->mutex);  
  
        if (mustwait)  
        {  
                /* Failed to get lock, so release interrupt holdoff */  
                RESUME_INTERRUPTS();  
                LOG_LWDEBUG("LWLockConditionalAcquire", lockid, "failed");  
                TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(lockid, mode);  
        }  
        else  
        {  
                /* Add lock to list of locks held by this backend */  
                held_lwlocks[num_held_lwlocks++] = lockid;  
                TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(lockid, mode);  
        }  
...  
/*  
 * LWLockAcquireOrWait - Acquire lock, or wait until it's free  
 *  
 * The semantics of this function are a bit funky.      If the lock is currently  
 * free, it is acquired in the given mode, and the function returns true.  If  
 * the lock isn't immediately free, the function waits until it is released  
 * and returns false, but does not acquire the lock.  
 *  
 * This is currently used for WALWriteLock: when a backend flushes the WAL,  
 * holding WALWriteLock, it can flush the commit records of many other  
 * backends as a side-effect.  Those other backends need to wait until the  
 * flush finishes, but don't need to acquire the lock anymore.  They can just  
 * wake up, observe that their records have already been flushed, and return.  
 */  
bool  
LWLockAcquireOrWait(LWLockId lockid, LWLockMode mode)  
{  
...  
                TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);  
  
                for (;;)  
                {  
                        /* "false" means cannot accept cancel/die interrupt here. */  
                        PGSemaphoreLock(&proc->sem, false);  
                        if (!proc->lwWaiting)  
                                break;  
                        extraWaits++;  
                }  
  
                TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode);  
...  
        /*  
         * Fix the process wait semaphore's count for any absorbed wakeups.  
         */  
        while (extraWaits-- > 0)  
                PGSemaphoreUnlock(&proc->sem);  
  
        if (mustwait)  
        {  
                /* Failed to get lock, so release interrupt holdoff */  
                RESUME_INTERRUPTS();  
                LOG_LWDEBUG("LWLockAcquireOrWait", lockid, "failed");  
                TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_FAIL(lockid, mode);  
        }  
        else  
        {  
                /* Add lock to list of locks held by this backend */  
                held_lwlocks[num_held_lwlocks++] = lockid;  
                TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE(lockid, mode);  
        }  
...  
/*  
 * LWLockRelease - release a previously acquired lock  
 */  
void  
LWLockRelease(LWLockId lockid)  
{  
...  
        /* We are done updating shared state of the lock itself. */  
        SpinLockRelease(&lock->mutex);  
  
        TRACE_POSTGRESQL_LWLOCK_RELEASE(lockid);  

Heavyweight locks

src/backend/storage/lmgr/lock.c

/*  
 * LockAcquireExtended - allows us to specify additional options  
 *  
 * reportMemoryError specifies whether a lock request that fills the  
 * lock table should generate an ERROR or not. This allows a priority  
 * caller to note that the lock table is full and then begin taking  
 * extreme action to reduce the number of other lock holders before  
 * retrying the action.  
 */  
LockAcquireResult  
LockAcquireExtended(const LOCKTAG *locktag,  
                                        LOCKMODE lockmode,  
                                        bool sessionLock,  
                                        bool dontWait,  
                                        bool reportMemoryError)  
{  
...  
                /*  
                 * Sleep till someone wakes me up.  
                 */  
  
                TRACE_POSTGRESQL_LOCK_WAIT_START(locktag->locktag_field1,  
                                                                                 locktag->locktag_field2,  
                                                                                 locktag->locktag_field3,  
                                                                                 locktag->locktag_field4,  
                                                                                 locktag->locktag_type,  
                                                                                 lockmode);  
  
                WaitOnLock(locallock, owner);  
  
                TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,  
                                                                                locktag->locktag_field2,  
                                                                                locktag->locktag_field3,  
                                                                                locktag->locktag_field4,  
                                                                                locktag->locktag_type,  
                                                                                lockmode);  
...  

Deadlock

src/backend/storage/lmgr/deadlock.c

/*  
 * DeadLockCheck -- Checks for deadlocks for a given process  
 *  
 * This code looks for deadlocks involving the given process.  If any  
 * are found, it tries to rearrange lock wait queues to resolve the  
 * deadlock.  If resolution is impossible, return DS_HARD_DEADLOCK ---  
 * the caller is then expected to abort the given proc's transaction.  
 *  
 * Caller must already have locked all partitions of the lock tables.  
 *  
 * On failure, deadlock details are recorded in deadlockDetails[] for  
 * subsequent printing by DeadLockReport().  That activity is separate  
 * because (a) we don't want to do it while holding all those LWLocks,  
 * and (b) we are typically invoked inside a signal handler.  
 */  
DeadLockState  
DeadLockCheck(PGPROC *proc)  
{  
...  
        /* Search for deadlocks and possible fixes */  
        if (DeadLockCheckRecurse(proc))  
        {  
                /*  
                 * Call FindLockCycle one more time, to record the correct  
                 * deadlockDetails[] for the basic state with no rearrangements.  
                 */  
                int                     nSoftEdges;  
  
                TRACE_POSTGRESQL_DEADLOCK_FOUND();  
  
                nWaitOrders = 0;  
                if (!FindLockCycle(proc, possibleConstraints, &nSoftEdges))  
                        elog(FATAL, "deadlock seems to have disappeared");  
  
                return DS_HARD_DEADLOCK;        /* cannot find a non-deadlocked state */  
        }  

5. LWLockId type definition

src/include/storage/lwlock.h

/*  
 * We have a number of predefined LWLocks, plus a bunch of LWLocks that are  
 * dynamically assigned (e.g., for shared buffers).  The LWLock structures  
 * live in shared memory (since they contain shared data) and are identified  
 * by values of this enumerated type.  We abuse the notion of an enum somewhat  
 * by allowing values not listed in the enum declaration to be assigned.  
 * The extra value MaxDynamicLWLock is there to keep the compiler from  
 * deciding that the enum can be represented as char or short ...  
 *  
 * If you remove a lock, please replace it with a placeholder. This retains  
 * the lock numbering, which is helpful for DTrace and other external  
 * debugging scripts.  
 */  
typedef enum LWLockId  
{  
        BufFreelistLock,  
        ShmemIndexLock,  
        OidGenLock,  
        XidGenLock,  
        ProcArrayLock,  
        SInvalReadLock,  
        SInvalWriteLock,  
        WALInsertLock,  
        WALWriteLock,  
        ControlFileLock,  
        CheckpointLock,  
        CLogControlLock,  
        SubtransControlLock,  
        MultiXactGenLock,  
        MultiXactOffsetControlLock,  
        MultiXactMemberControlLock,  
        RelCacheInitLock,  
        CheckpointerCommLock,  
        TwoPhaseStateLock,  
        TablespaceCreateLock,  
        BtreeVacuumLock,  
        AddinShmemInitLock,  
        AutovacuumLock,  
        AutovacuumScheduleLock,  
        SyncScanLock,  
        RelationMappingLock,  
        AsyncCtlLock,  
        AsyncQueueLock,  
        SerializableXactHashLock,  
        SerializableFinishedListLock,  
        SerializablePredicateLockListLock,  
        OldSerXidLock,  
        SyncRepLock,  
        /* Individual lock IDs end here */  
        FirstBufMappingLock,  
        FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,  
        FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,  
  
        /* must be last except for MaxDynamicLWLock: */  
        NumFixedLWLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS,  
  
        MaxDynamicLWLock = 1000000000  
} LWLockId;  

6. LWLockMode type definition

src/include/storage/lwlock.h

typedef enum LWLockMode  
{  
        LW_EXCLUSIVE,  
        LW_SHARED,  
        LW_WAIT_UNTIL_FREE                      /* A special mode used in PGPROC->lwlockMode,  
                                                                 * when waiting for lock to become free. Not  
                                                                 * to be used as LWLockAcquire argument */  
} LWLockMode;  

7. LOCKTAG and LockTagType type definitions

src/include/storage/lock.h

/*  
 * LOCKTAG is the key information needed to look up a LOCK item in the  
 * lock hashtable.      A LOCKTAG value uniquely identifies a lockable object.  
 *  
 * The LockTagType enum defines the different kinds of objects we can lock.  
 * We can handle up to 256 different LockTagTypes.  
 */  
typedef enum LockTagType  
{  
        LOCKTAG_RELATION,                       /* whole relation */  
        /* ID info for a relation is DB OID + REL OID; DB OID = 0 if shared */  
        LOCKTAG_RELATION_EXTEND,        /* the right to extend a relation */  
        /* same ID info as RELATION */  
        LOCKTAG_PAGE,                           /* one page of a relation */  
        /* ID info for a page is RELATION info + BlockNumber */  
        LOCKTAG_TUPLE,                          /* one physical tuple */  
        /* ID info for a tuple is PAGE info + OffsetNumber */  
        LOCKTAG_TRANSACTION,            /* transaction (for waiting for xact done) */  
        /* ID info for a transaction is its TransactionId */  
        LOCKTAG_VIRTUALTRANSACTION, /* virtual transaction (ditto) */  
        /* ID info for a virtual transaction is its VirtualTransactionId */  
        LOCKTAG_OBJECT,                         /* non-relation database object */  
        /* ID info for an object is DB OID + CLASS OID + OBJECT OID + SUBID */  
  
        /*  
         * Note: object ID has same representation as in pg_depend and  
         * pg_description, but notice that we are constraining SUBID to 16 bits.  
         * Also, we use DB OID = 0 for shared objects such as tablespaces.  
         */  
        LOCKTAG_USERLOCK,                       /* reserved for old contrib/userlock code */  
        LOCKTAG_ADVISORY                        /* advisory user locks */  
} LockTagType;  
/*  
 * The LOCKTAG struct is defined with malice aforethought to fit into 16  
 * bytes with no padding.  Note that this would need adjustment if we were  
 * to widen Oid, BlockNumber, or TransactionId to more than 32 bits.  
 *  
 * We include lockmethodid in the locktag so that a single hash table in  
 * shared memory can store locks of different lockmethods.  
 */  
typedef struct LOCKTAG  
{  
        uint32          locktag_field1; /* a 32-bit ID field */  
        uint32          locktag_field2; /* a 32-bit ID field */  
        uint32          locktag_field3; /* a 32-bit ID field */  
        uint16          locktag_field4; /* a 16-bit ID field */  
        uint8           locktag_type;   /* see enum LockTagType */  
        uint8           locktag_lockmethodid;   /* lockmethod indicator */  
} LOCKTAG;  
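
The "16 bytes with no padding" claim in the comment above can be checked by mirroring the struct layout. The following is a small Python sketch using the standard struct module; the sample field values are taken from the tuple lock in example 2, and the lock method id of 1 is an assumption, since the lock method constants are not shown in this article.

```python
import struct

# "=" selects standard sizes with no alignment padding:
# I = uint32, H = uint16, B = uint8, matching the LOCKTAG members in order.
LOCKTAG_FORMAT = "=IIIHBB"


def pack_locktag(field1, field2, field3, field4, tag_type, lockmethodid):
    """Pack the six LOCKTAG members into their raw byte representation."""
    return struct.pack(LOCKTAG_FORMAT, field1, field2, field3, field4,
                       tag_type, lockmethodid)


LOCKTAG_TUPLE = 3            # position in the LockTagType enum above
ASSUMED_LOCKMETHOD_ID = 1    # ASSUMPTION: not defined anywhere in this article

raw = pack_locktag(16384, 24735, 77940, 116, LOCKTAG_TUPLE, ASSUMED_LOCKMETHOD_ID)
print(struct.calcsize(LOCKTAG_FORMAT))      # total size in bytes
print(struct.unpack(LOCKTAG_FORMAT, raw))   # round-trips to the same values
```

The sizes add up as 4 + 4 + 4 + 2 + 1 + 1, which is why widening Oid, BlockNumber, or TransactionId beyond 32 bits would require adjusting the struct.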

8. LOCKMODE type definition and values

src/include/storage/lock.h

/*  
 * LOCKMODE is an integer (1..N) indicating a lock type.  LOCKMASK is a bit  
 * mask indicating a set of held or requested lock types (the bit 1<<mode  
 * corresponds to a particular lock mode).  
 */  
typedef int LOCKMASK;  
typedef int LOCKMODE;  
/*  
 * These are the valid values of type LOCKMODE for all the standard lock  
 * methods (both DEFAULT and USER).  
 */  
  
/* NoLock is not a lock mode, but a flag value meaning "don't get a lock" */  
#define NoLock                                  0  
  
#define AccessShareLock                 1               /* SELECT */  
#define RowShareLock                    2               /* SELECT FOR UPDATE/FOR SHARE */  
#define RowExclusiveLock                3               /* INSERT, UPDATE, DELETE */  
#define ShareUpdateExclusiveLock 4              /* VACUUM (non-FULL),ANALYZE, CREATE  
                                                                                 * INDEX CONCURRENTLY */  
#define ShareLock                               5               /* CREATE INDEX (WITHOUT CONCURRENTLY) */  
#define ShareRowExclusiveLock   6               /* like EXCLUSIVE MODE, but allows ROW  
                                                                                 * SHARE */  
#define ExclusiveLock                   7               /* blocks ROW SHARE/SELECT...FOR  
                                                                                 * UPDATE */  
#define AccessExclusiveLock             8               /* ALTER TABLE, DROP TABLE, VACUUM  
                                                                                 * FULL, and unqualified LOCK TABLE */  

9. Functions that call the heavyweight lock request function :

Referenced by ConditionalLockPage(), ConditionalLockRelation(), ConditionalLockRelationOid(), ConditionalLockTuple(), ConditionalXactLockTableWait(), LockDatabaseObject(), LockPage(), LockRelation(), LockRelationForExtension(), LockRelationIdForSession(), LockRelationOid(), LockSharedObject(), LockSharedObjectForSession(), LockTuple(), pg_advisory_lock_int4(), pg_advisory_lock_int8(), pg_advisory_lock_shared_int4(), pg_advisory_lock_shared_int8(), pg_advisory_xact_lock_int4(), pg_advisory_xact_lock_int8(), pg_advisory_xact_lock_shared_int4(), pg_advisory_xact_lock_shared_int8(), pg_try_advisory_lock_int4(), pg_try_advisory_lock_int8(), pg_try_advisory_lock_shared_int4(), pg_try_advisory_lock_shared_int8(), pg_try_advisory_xact_lock_int4(), pg_try_advisory_xact_lock_int8(), pg_try_advisory_xact_lock_shared_int4(), pg_try_advisory_xact_lock_shared_int8(), VirtualXactLock(), XactLockTableInsert(), and XactLockTableWait().  

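In this example the lock type is TUPLE, that is, a lock requested through the LockTuple function. The SET_LOCKTAG_TUPLE macro therefore explains the leading probe arguments: the first four are (dbid, relid, blocknum, tupleoffset_inblock), and the fifth is the lock tag type (locktag_type).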

void  
LockTuple(Relation relation, ItemPointer tid, LOCKMODE lockmode)  
{  
    LOCKTAG     tag;  
  
    SET_LOCKTAG_TUPLE(tag,  
                      relation->rd_lockInfo.lockRelId.dbId,  
                      relation->rd_lockInfo.lockRelId.relId,  
                      ItemPointerGetBlockNumber(tid),  
                      ItemPointerGetOffsetNumber(tid));  
  
    (void) LockAcquire(&tag, lockmode, false, false);  
}  

10. For the macros that set the heavyweight lock TAG, see the following header file :

src/include/storage/lock.h

/*  
 * These macros define how we map logical IDs of lockable objects into  
 * the physical fields of LOCKTAG.      Use these to set up LOCKTAG values,  
 * rather than accessing the fields directly.  Note multiple eval of target!  
 */  
#define SET_LOCKTAG_RELATION(locktag,dboid,reloid) \  
        ((locktag).locktag_field1 = (dboid), \  
         (locktag).locktag_field2 = (reloid), \  
         (locktag).locktag_field3 = 0, \  
         (locktag).locktag_field4 = 0, \  
         (locktag).locktag_type = LOCKTAG_RELATION, \  
         (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)  
  
#define SET_LOCKTAG_RELATION_EXTEND(locktag,dboid,reloid) \  
        ((locktag).locktag_field1 = (dboid), \  
         (locktag).locktag_field2 = (reloid), \  
         (locktag).locktag_field3 = 0, \  
         (locktag).locktag_field4 = 0, \  
         (locktag).locktag_type = LOCKTAG_RELATION_EXTEND, \  
         (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)  
  
#define SET_LOCKTAG_PAGE(locktag,dboid,reloid,blocknum) \  
        ((locktag).locktag_field1 = (dboid), \  
         (locktag).locktag_field2 = (reloid), \  
         (locktag).locktag_field3 = (blocknum), \  
         (locktag).locktag_field4 = 0, \  
         (locktag).locktag_type = LOCKTAG_PAGE, \  
         (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)  
  
#define SET_LOCKTAG_TUPLE(locktag,dboid,reloid,blocknum,offnum) \  
        ((locktag).locktag_field1 = (dboid), \  
         (locktag).locktag_field2 = (reloid), \  
         (locktag).locktag_field3 = (blocknum), \  
         (locktag).locktag_field4 = (offnum), \  
         (locktag).locktag_type = LOCKTAG_TUPLE, \  
         (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)  
  
#define SET_LOCKTAG_TRANSACTION(locktag,xid) \  
        ((locktag).locktag_field1 = (xid), \  
         (locktag).locktag_field2 = 0, \  
         (locktag).locktag_field3 = 0, \  
         (locktag).locktag_field4 = 0, \  
         (locktag).locktag_type = LOCKTAG_TRANSACTION, \  
         (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)  
  
#define SET_LOCKTAG_VIRTUALTRANSACTION(locktag,vxid) \  
        ((locktag).locktag_field1 = (vxid).backendId, \  
         (locktag).locktag_field2 = (vxid).localTransactionId, \  
         (locktag).locktag_field3 = 0, \  
         (locktag).locktag_field4 = 0, \  
         (locktag).locktag_type = LOCKTAG_VIRTUALTRANSACTION, \  
         (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)  
  
#define SET_LOCKTAG_OBJECT(locktag,dboid,classoid,objoid,objsubid) \  
        ((locktag).locktag_field1 = (dboid), \  
         (locktag).locktag_field2 = (classoid), \  
         (locktag).locktag_field3 = (objoid), \  
         (locktag).locktag_field4 = (objsubid), \  
         (locktag).locktag_type = LOCKTAG_OBJECT, \  
         (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)  
  
#define SET_LOCKTAG_ADVISORY(locktag,id1,id2,id3,id4) \  
        ((locktag).locktag_field1 = (id1), \  
         (locktag).locktag_field2 = (id2), \  
         (locktag).locktag_field3 = (id3), \  
         (locktag).locktag_field4 = (id4), \  
         (locktag).locktag_type = LOCKTAG_ADVISORY, \  
         (locktag).locktag_lockmethodid = USER_LOCKMETHOD)  
