SystemTap EXP: PostgreSQL built-in mark Class 6 - lock
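Background
This article covers PostgreSQL's built-in lock-class probes and how to trace them with stap.
Main text
PostgreSQL's lock-class probes fall into three groups: lightweight-lock probes, heavyweight-lock probes, and the deadlock probe.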
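Lightweight-lock probes:
probe lwlock__acquire(LWLockId, LWLockMode); fires when a lightweight lock has been acquired. For the meaning of the arguments, see references 5 and 6 at the end of this article.
probe lwlock__release(LWLockId); fires when a lightweight lock has been released.
probe lwlock__wait__start(LWLockId, LWLockMode); fires when a wait for a lightweight lock begins.
probe lwlock__wait__done(LWLockId, LWLockMode); fires when a wait for a lightweight lock ends.
probe lwlock__condacquire(LWLockId, LWLockMode); fires when a lightweight lock has been acquired. Note that this probe sits in the no-wait lightweight-lock request function, so there is no waiting: the request either succeeds or fails. Do not confuse it with lwlock__acquire.
probe lwlock__condacquire__fail(LWLockId, LWLockMode); fires when a no-wait lightweight-lock request fails.
probe lwlock__wait__until__free(LWLockId, LWLockMode); located in LWLockAcquireOrWait. This function is somewhat unusual: if the lock is free, it is acquired and the function returns true; if the lock is not immediately available, the function waits until it is released and returns false without acquiring it. The function is currently used only for WALWriteLock. This probe fires when the lock was acquired.
probe lwlock__wait__until__free__fail(LWLockId, LWLockMode); same function as above; fires when the lock was not acquired.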
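Heavyweight-lock probes:
probe lock__wait__start(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
Fires when a heavyweight lock request starts waiting. The first five arguments correspond to the first five fields of LOCKTAG; for LOCKMODE see reference 8 at the end of this article.
probe lock__wait__done(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE);
Fires when the wait ends.
Deadlock probe:
probe deadlock__found();
Probe details: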
Probe | Parameters | Description |
---|---|---|
lwlock-acquire | (LWLockId, LWLockMode) | Probe that fires when an LWLock has been acquired. arg0 is the LWLock’s ID. arg1 is the requested lock mode, either exclusive or shared. |
lwlock-release | (LWLockId) | Probe that fires when an LWLock has been released (but note that any released waiters have not yet been awakened). arg0 is the LWLock’s ID. |
lwlock-wait-start | (LWLockId, LWLockMode) | Probe that fires when an LWLock was not immediately available and a server process has begun to wait for the lock to become available. arg0 is the LWLock’s ID. arg1 is the requested lock mode, either exclusive or shared. |
lwlock-wait-done | (LWLockId, LWLockMode) | Probe that fires when a server process has been released from its wait for an LWLock (it does not actually have the lock yet). arg0 is the LWLock’s ID. arg1 is the requested lock mode, either exclusive or shared. |
lwlock-condacquire | (LWLockId, LWLockMode) | Probe that fires when an LWLock was successfully acquired when the caller specified no waiting. arg0 is the LWLock’s ID. arg1 is the requested lock mode, either exclusive or shared. |
lwlock-condacquire-fail | (LWLockId, LWLockMode) | Probe that fires when an LWLock was not successfully acquired when the caller specified no waiting. arg0 is the LWLock’s ID. arg1 is the requested lock mode, either exclusive or shared. |
lock-wait-start | (unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE) | Probe that fires when a request for a heavyweight lock (lmgr lock) has begun to wait because the lock is not available. arg0 through arg3 are the tag fields identifying the object being locked. arg4 indicates the type of object being locked. arg5 indicates the lock type being requested. |
lock-wait-done | (unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, LOCKMODE) | Probe that fires when a request for a heavyweight lock (lmgr lock) has finished waiting (i.e., has acquired the lock). The arguments are the same as for lock-wait-start. |
deadlock-found | () | Probe that fires when a deadlock is found by the deadlock detector. |
Examples
1. Count lightweight-lock waits.
stap -e '
global var1
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("lwlock__wait__done") {
var1[$arg1, $arg2]++
}
probe timer.s($1) {
println("*******************")
foreach(v=[x,y] in var1+)
printf("lockid:%d, lockmode:%d, wait_count:%d\n", x,y,v)
delete var1
}' 5
SQL :
digoal=# \sf f_test(int)
CREATE OR REPLACE FUNCTION public.f_test(i_id integer)
RETURNS void
LANGUAGE plpgsql
STRICT
AS $function$
declare
begin
update test set info=md5(random()::text), crt_time=clock_timestamp() where id=i_id;
if not found then
insert into test(id,info,crt_time) values(i_id,md5(random()::text),clock_timestamp());
end if;
return;
exception when others then
return;
end;
$function$
digoal=# \d test
Table "public.test"
Column | Type | Modifiers
----------+-----------------------------+-----------
id | integer | not null
info | text |
crt_time | timestamp without time zone |
Indexes:
"test_pkey" PRIMARY KEY, btree (id)
stap output:
*******************
lockid:30604, lockmode:1, wait_count:1
lockid:6150, lockmode:0, wait_count:1
lockid:137242, lockmode:1, wait_count:1
lockid:33180, lockmode:1, wait_count:1
lockid:37820, lockmode:1, wait_count:1
lockid:55818, lockmode:1, wait_count:1
lockid:152122, lockmode:1, wait_count:1
lockid:3, lockmode:1, wait_count:1
lockid:18220, lockmode:0, wait_count:1
lockid:34, lockmode:1, wait_count:1
lockid:38, lockmode:0, wait_count:1
lockid:63456, lockmode:1, wait_count:1
lockid:23704, lockmode:1, wait_count:1
lockid:52826, lockmode:1, wait_count:1
lockid:49032, lockmode:1, wait_count:1
lockid:45, lockmode:1, wait_count:1
lockid:45348, lockmode:1, wait_count:1
lockid:2364, lockmode:1, wait_count:1
lockid:40, lockmode:0, wait_count:1
lockid:45918, lockmode:1, wait_count:1
lockid:150650, lockmode:1, wait_count:1
lockid:56324, lockmode:1, wait_count:1
lockid:32554, lockmode:1, wait_count:1
lockid:38, lockmode:1, wait_count:1
lockid:20636, lockmode:1, wait_count:1
lockid:6534, lockmode:1, wait_count:1
lockid:39126, lockmode:1, wait_count:1
lockid:42, lockmode:0, wait_count:1
lockid:1640, lockmode:1, wait_count:1
lockid:46, lockmode:0, wait_count:1
lockid:39, lockmode:1, wait_count:1
lockid:53778, lockmode:1, wait_count:1
lockid:35, lockmode:1, wait_count:1
lockid:33450, lockmode:1, wait_count:1
lockid:33, lockmode:1, wait_count:2
lockid:46, lockmode:1, wait_count:2
lockid:48, lockmode:0, wait_count:2
lockid:43, lockmode:0, wait_count:2
lockid:96, lockmode:1, wait_count:2
lockid:44, lockmode:1, wait_count:2
lockid:37, lockmode:1, wait_count:2
lockid:40, lockmode:1, wait_count:2
lockid:13932, lockmode:1, wait_count:2
lockid:48, lockmode:1, wait_count:4
lockid:8, lockmode:0, wait_count:7
lockid:13, lockmode:0, wait_count:88
lockid:60, lockmode:0, wait_count:224
lockid:54, lockmode:0, wait_count:235
lockid:58, lockmode:0, wait_count:237
lockid:49, lockmode:0, wait_count:240
lockid:56, lockmode:0, wait_count:243
lockid:57, lockmode:0, wait_count:249
lockid:64, lockmode:0, wait_count:251
lockid:63, lockmode:0, wait_count:261
lockid:55, lockmode:0, wait_count:262
lockid:59, lockmode:0, wait_count:263
lockid:53, lockmode:0, wait_count:273
lockid:52, lockmode:0, wait_count:275
lockid:51, lockmode:0, wait_count:275
lockid:62, lockmode:0, wait_count:276
lockid:61, lockmode:0, wait_count:281
lockid:50, lockmode:0, wait_count:287
lockid:12, lockmode:0, wait_count:1514
lockid:11, lockmode:1, wait_count:3385
lockid:11, lockmode:0, wait_count:4103
lockid:3, lockmode:0, wait_count:6376
lockid:4, lockmode:1, wait_count:6980
lockid:4, lockmode:0, wait_count:19500
lockid:7, lockmode:0, wait_count:31472
... (truncated)
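The output above is easier to read once the mode numbers are named and the counts are summed. The following is a minimal Python sketch for post-processing it, assuming the `lockid:..., lockmode:..., wait_count:...` line format printed by the stap script and the LWLockMode values listed in reference 6. The function names `parse_lwlock_waits` and `waits_per_mode` are hypothetical helpers, not part of SystemTap or PostgreSQL.

```python
import re
from collections import Counter

# LWLockMode values as listed in src/include/storage/lwlock.h (reference 6).
LWLOCK_MODES = {0: "LW_EXCLUSIVE", 1: "LW_SHARED", 2: "LW_WAIT_UNTIL_FREE"}

# Matches one line printed by the printf() in the stap script above.
LINE_RE = re.compile(r"lockid:(\d+), lockmode:(\d+), wait_count:(\d+)")


def parse_lwlock_waits(text):
    """Return a list of (lockid, mode_name, wait_count) tuples."""
    rows = []
    for match in LINE_RE.finditer(text):
        lockid, mode, count = (int(g) for g in match.groups())
        mode_name = LWLOCK_MODES.get(mode, "unknown(%d)" % mode)
        rows.append((lockid, mode_name, count))
    return rows


def waits_per_mode(rows):
    """Sum the wait counts for each lock mode."""
    totals = Counter()
    for _lockid, mode_name, count in rows:
        totals[mode_name] += count
    return totals


# A few lines copied from the output above.
sample = """\
lockid:11, lockmode:1, wait_count:3385
lockid:11, lockmode:0, wait_count:4103
lockid:4, lockmode:0, wait_count:19500
lockid:7, lockmode:0, wait_count:31472
"""
rows = parse_lwlock_waits(sample)
for lockid, mode_name, count in sorted(rows, key=lambda r: -r[2]):
    print(lockid, mode_name, count)
print(dict(waits_per_mode(rows)))
```

Lines that do not match the pattern, such as the `*******************` separators, are simply skipped.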
2. Trace heavyweight-lock waits:
stap :
[root@db-172-16-3-150 postgresql-9.3.1]# stap -v -D MAXSKIPPED=10000000 -e '
global var1%[120000], var2%[120000]
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("lock__wait__start") {
var1[pid()] = gettimeofday_us()
}
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("lock__wait__done") {
p=pid()
t=gettimeofday_us()
if (p in var1)
var2[$arg1, $arg2, $arg3, $arg4, $arg5, $arg6] <<< (t - var1[p])
}
probe timer.s($1) {
println("*******************")
foreach([a,b,c,d,e,f] in var2 @sum - limit 5)
printdln("**",a,b,c,d,e,f,@sum(var2[a,b,c,d,e,f])/1000,@count(var2[a,b,c,d,e,f]),@avg(var2[a,b,c,d,e,f])/1000)
delete var2
}' 5
Test SQL:
pg93@db-172-16-3-150-> cat test.sql
\setrandom id 1 8
select f_test(:id);
pg93@db-172-16-3-150-> pgbench -M prepared -n -r -f ./test.sql -c 64 -j 8 -T 1000
With 64 connections and id restricted to the range 1 to 8, this produces a large number of lock waits.
digoal=# \sf f_test(int)
CREATE OR REPLACE FUNCTION public.f_test(i_id integer)
RETURNS void
LANGUAGE plpgsql
STRICT
AS $function$
declare
begin
update test set info=md5(random()::text), crt_time=clock_timestamp() where id=i_id;
if not found then
insert into test(id,info,crt_time) values(i_id,md5(random()::text),clock_timestamp());
end if;
return;
exception when others then
return;
end;
$function$
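stap output:
The last three columns are the total wait time (ms), the number of waits, and the average wait time (ms).
3 means LOCKTAG_TUPLE; see the definition at the end of this article.
7 means ExclusiveLock; see the definition at the end of this article.
16384**24735**77940**116 are dboid, reloid, blocknum, offnum.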
digoal=# select oid from pg_database where datname='digoal';
oid
-------
16384
(1 row)
digoal=# select oid from pg_class where relname='test';
oid
-------
24735
(1 row)
digoal=# select max(ctid) from test;
max
------------
(80505,78)
(1 row)
*******************
16384**24735**77940**116**3**7**304**38**8
16384**24735**77921**83**3**7**295**32**9
16384**24735**77979**32**3**7**271**41**6
16384**24735**77906**29**3**7**235**37**6
16384**24735**77921**138**3**7**233**44**5
*******************
129233058**0**0**0**4**5**1355**9**150
129233135**0**0**0**4**5**1197**8**149
129233044**0**0**0**4**5**749**6**124
129233289**0**0**0**4**5**747**5**149
16384**24735**78068**67**3**7**317**14**22
*******************
129363231**0**0**0**4**5**921**19**48
16384**24735**78091**103**3**7**450**47**9
16384**24735**78199**108**3**7**300**25**12
16384**24735**78130**37**3**7**264**5**52
16384**24735**78131**66**3**7**261**56**4
*******************
16384**24735**78277**36**3**7**350**38**9
16384**24735**78290**20**3**7**301**35**8
16384**24735**78206**114**3**7**268**43**6
16384**24735**78277**93**3**7**254**30**8
16384**24735**78290**79**3**7**253**31**8
*******************
16384**24735**78613**25**3**7**220**6**36
16384**24735**78781**19**3**7**206**21**9
16384**24735**78621**32**3**7**184**22**8
16384**24735**78613**21**3**7**165**3**55
16384**24735**78702**78**3**7**159**17**9
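Each `**`-delimited line can be decoded mechanically. The sketch below is a minimal Python example, assuming the nine-column layout produced by the `printdln` call in the script (four tag fields, tag type, lock mode, total ms, wait count, average ms) and the enum and `#define` values listed in references 7 and 8. `LockWait` and `parse_lock_wait` are hypothetical names introduced for illustration.

```python
from collections import namedtuple

# LockTagType values in enum order (reference 7).
LOCK_TAG_TYPES = [
    "LOCKTAG_RELATION",
    "LOCKTAG_RELATION_EXTEND",
    "LOCKTAG_PAGE",
    "LOCKTAG_TUPLE",
    "LOCKTAG_TRANSACTION",
    "LOCKTAG_VIRTUALTRANSACTION",
    "LOCKTAG_OBJECT",
    "LOCKTAG_USERLOCK",
    "LOCKTAG_ADVISORY",
]

# LOCKMODE values (reference 8).
LOCK_MODES = {
    1: "AccessShareLock",
    2: "RowShareLock",
    3: "RowExclusiveLock",
    4: "ShareUpdateExclusiveLock",
    5: "ShareLock",
    6: "ShareRowExclusiveLock",
    7: "ExclusiveLock",
    8: "AccessExclusiveLock",
}

LockWait = namedtuple(
    "LockWait",
    "field1 field2 field3 field4 tag_type mode total_ms waits avg_ms",
)


def parse_lock_wait(line):
    """Decode one '**'-delimited line printed by the stap script."""
    parts = [int(p) for p in line.strip().split("**")]
    if len(parts) != 9:
        raise ValueError("expected 9 columns, got %d" % len(parts))
    f1, f2, f3, f4, tag, mode, total_ms, waits, avg_ms = parts
    return LockWait(f1, f2, f3, f4, LOCK_TAG_TYPES[tag], LOCK_MODES[mode],
                    total_ms, waits, avg_ms)


# Two lines copied from the output above.
print(parse_lock_wait("16384**24735**77940**116**3**7**304**38**8"))
print(parse_lock_wait("129233058**0**0**0**4**5**1355**9**150"))
```

Read this way, the rows with type 4 and mode 5 are waits on a LOCKTAG_TRANSACTION tag in ShareLock mode, where the first field is a transaction ID rather than a database OID.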
References
1. http://www.postgresql.org/docs/9.3/static/dynamic-trace.html
2.
src/backend/storage/lmgr/lwlock.c
src/backend/storage/lmgr/lock.c
src/backend/storage/lmgr/deadlock.c
3. Probe definitions:
/* TRACE_POSTGRESQL_LWLOCK_ACQUIRE ( int, int) */
#if defined STAP_SDT_V1
#define TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED() __builtin_expect (lwlock__acquire_semaphore, 0)
#define postgresql_lwlock__acquire_semaphore lwlock__acquire_semaphore
#else
#define TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED() __builtin_expect (postgresql_lwlock__acquire_semaphore, 0)
#endif
__extension__ extern unsigned short postgresql_lwlock__acquire_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));
#define TRACE_POSTGRESQL_LWLOCK_ACQUIRE(arg1,arg2) \
DTRACE_PROBE2(postgresql,lwlock__acquire,arg1,arg2)
/* TRACE_POSTGRESQL_LWLOCK_RELEASE ( int) */
#if defined STAP_SDT_V1
#define TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED() __builtin_expect (lwlock__release_semaphore, 0)
#define postgresql_lwlock__release_semaphore lwlock__release_semaphore
#else
#define TRACE_POSTGRESQL_LWLOCK_RELEASE_ENABLED() __builtin_expect (postgresql_lwlock__release_semaphore, 0)
#endif
__extension__ extern unsigned short postgresql_lwlock__release_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));
#define TRACE_POSTGRESQL_LWLOCK_RELEASE(arg1) \
DTRACE_PROBE1(postgresql,lwlock__release,arg1)
/* TRACE_POSTGRESQL_LWLOCK_WAIT_START ( int, int) */
#if defined STAP_SDT_V1
#define TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED() __builtin_expect (lwlock__wait__start_semaphore, 0)
#define postgresql_lwlock__wait__start_semaphore lwlock__wait__start_semaphore
#else
#define TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED() __builtin_expect (postgresql_lwlock__wait__start_semaphore, 0)
#endif
__extension__ extern unsigned short postgresql_lwlock__wait__start_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));
#define TRACE_POSTGRESQL_LWLOCK_WAIT_START(arg1,arg2) \
DTRACE_PROBE2(postgresql,lwlock__wait__start,arg1,arg2)
/* TRACE_POSTGRESQL_LWLOCK_WAIT_DONE ( int, int) */
#if defined STAP_SDT_V1
#define TRACE_POSTGRESQL_LWLOCK_WAIT_DONE_ENABLED() __builtin_expect (lwlock__wait__done_semaphore, 0)
#define postgresql_lwlock__wait__done_semaphore lwlock__wait__done_semaphore
#else
#define TRACE_POSTGRESQL_LWLOCK_WAIT_DONE_ENABLED() __builtin_expect (postgresql_lwlock__wait__done_semaphore, 0)
#endif
__extension__ extern unsigned short postgresql_lwlock__wait__done_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));
#define TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(arg1,arg2) \
DTRACE_PROBE2(postgresql,lwlock__wait__done,arg1,arg2)
/* TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE ( int, int) */
#if defined STAP_SDT_V1
#define TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_ENABLED() __builtin_expect (lwlock__condacquire_semaphore, 0)
#define postgresql_lwlock__condacquire_semaphore lwlock__condacquire_semaphore
#else
#define TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_ENABLED() __builtin_expect (postgresql_lwlock__condacquire_semaphore, 0)
#endif
__extension__ extern unsigned short postgresql_lwlock__condacquire_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));
#define TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(arg1,arg2) \
DTRACE_PROBE2(postgresql,lwlock__condacquire,arg1,arg2)
/* TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL ( int, int) */
#if defined STAP_SDT_V1
#define TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL_ENABLED() __builtin_expect (lwlock__condacquire__fail_semaphore, 0)
#define postgresql_lwlock__condacquire__fail_semaphore lwlock__condacquire__fail_semaphore
#else
#define TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL_ENABLED() __builtin_expect (postgresql_lwlock__condacquire__fail_semaphore, 0)
#endif
__extension__ extern unsigned short postgresql_lwlock__condacquire__fail_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));
#define TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(arg1,arg2) \
DTRACE_PROBE2(postgresql,lwlock__condacquire__fail,arg1,arg2)
/* TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE ( int, int) */
#if defined STAP_SDT_V1
#define TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_ENABLED() __builtin_expect (lwlock__wait__until__free_semaphore, 0)
#define postgresql_lwlock__wait__until__free_semaphore lwlock__wait__until__free_semaphore
#else
#define TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_ENABLED() __builtin_expect (postgresql_lwlock__wait__until__free_semaphore, 0)
#endif
__extension__ extern unsigned short postgresql_lwlock__wait__until__free_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));
#define TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE(arg1,arg2) \
DTRACE_PROBE2(postgresql,lwlock__wait__until__free,arg1,arg2)
/* TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_FAIL ( int, int) */
#if defined STAP_SDT_V1
#define TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_FAIL_ENABLED() __builtin_expect (lwlock__wait__until__free__fail_semaphore, 0)
#define postgresql_lwlock__wait__until__free__fail_semaphore lwlock__wait__until__free__fail_semaphore
#else
#define TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_FAIL_ENABLED() __builtin_expect (postgresql_lwlock__wait__until__free__fail_semaphore, 0)
#endif
__extension__ extern unsigned short postgresql_lwlock__wait__until__free__fail_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));
#define TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_FAIL(arg1,arg2) \
DTRACE_PROBE2(postgresql,lwlock__wait__until__free__fail,arg1,arg2)
/* TRACE_POSTGRESQL_LOCK_WAIT_START ( unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, int) */
#if defined STAP_SDT_V1
#define TRACE_POSTGRESQL_LOCK_WAIT_START_ENABLED() __builtin_expect (lock__wait__start_semaphore, 0)
#define postgresql_lock__wait__start_semaphore lock__wait__start_semaphore
#else
#define TRACE_POSTGRESQL_LOCK_WAIT_START_ENABLED() __builtin_expect (postgresql_lock__wait__start_semaphore, 0)
#endif
__extension__ extern unsigned short postgresql_lock__wait__start_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));
#define TRACE_POSTGRESQL_LOCK_WAIT_START(arg1,arg2,arg3,arg4,arg5,arg6) \
DTRACE_PROBE6(postgresql,lock__wait__start,arg1,arg2,arg3,arg4,arg5,arg6)
/* TRACE_POSTGRESQL_LOCK_WAIT_DONE ( unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, int) */
#if defined STAP_SDT_V1
#define TRACE_POSTGRESQL_LOCK_WAIT_DONE_ENABLED() __builtin_expect (lock__wait__done_semaphore, 0)
#define postgresql_lock__wait__done_semaphore lock__wait__done_semaphore
#else
#define TRACE_POSTGRESQL_LOCK_WAIT_DONE_ENABLED() __builtin_expect (postgresql_lock__wait__done_semaphore, 0)
#endif
__extension__ extern unsigned short postgresql_lock__wait__done_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));
#define TRACE_POSTGRESQL_LOCK_WAIT_DONE(arg1,arg2,arg3,arg4,arg5,arg6) \
DTRACE_PROBE6(postgresql,lock__wait__done,arg1,arg2,arg3,arg4,arg5,arg6)
/* TRACE_POSTGRESQL_DEADLOCK_FOUND () */
#if defined STAP_SDT_V1
#define TRACE_POSTGRESQL_DEADLOCK_FOUND_ENABLED() __builtin_expect (deadlock__found_semaphore, 0)
#define postgresql_deadlock__found_semaphore deadlock__found_semaphore
#else
#define TRACE_POSTGRESQL_DEADLOCK_FOUND_ENABLED() __builtin_expect (postgresql_deadlock__found_semaphore, 0)
#endif
__extension__ extern unsigned short postgresql_deadlock__found_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));
#define TRACE_POSTGRESQL_DEADLOCK_FOUND() \
DTRACE_PROBE(postgresql,deadlock__found)
4. Probe locations in the source code:
Lightweight locks
src/backend/storage/lmgr/lwlock.c
/*
* LWLockAcquire - acquire a lightweight lock in the specified mode
*
* If the lock is not available, sleep until it is.
*
* Side effect: cancel/die interrupts are held off until lock release.
*/
void
LWLockAcquire(LWLockId lockid, LWLockMode mode)
...
TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);
for (;;)
{
/* "false" means cannot accept cancel/die interrupt here. */
PGSemaphoreLock(&proc->sem, false);
if (!proc->lwWaiting)
break;
extraWaits++;
}
TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode);
...
/* We are done updating shared state of the lock itself. */
SpinLockRelease(&lock->mutex);
TRACE_POSTGRESQL_LWLOCK_ACQUIRE(lockid, mode);
...
/*
* LWLockConditionalAcquire - acquire a lightweight lock in the specified mode
*
* If the lock is not available, return FALSE with no side-effects.
*
* If successful, cancel/die interrupts are held off until lock release.
*/
bool
LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
{
...
/* We are done updating shared state of the lock itself. */
SpinLockRelease(&lock->mutex);
if (mustwait)
{
/* Failed to get lock, so release interrupt holdoff */
RESUME_INTERRUPTS();
LOG_LWDEBUG("LWLockConditionalAcquire", lockid, "failed");
TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE_FAIL(lockid, mode);
}
else
{
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks++] = lockid;
TRACE_POSTGRESQL_LWLOCK_CONDACQUIRE(lockid, mode);
}
...
/*
* LWLockAcquireOrWait - Acquire lock, or wait until it's free
*
* The semantics of this function are a bit funky. If the lock is currently
* free, it is acquired in the given mode, and the function returns true. If
* the lock isn't immediately free, the function waits until it is released
* and returns false, but does not acquire the lock.
*
* This is currently used for WALWriteLock: when a backend flushes the WAL,
* holding WALWriteLock, it can flush the commit records of many other
* backends as a side-effect. Those other backends need to wait until the
* flush finishes, but don't need to acquire the lock anymore. They can just
* wake up, observe that their records have already been flushed, and return.
*/
bool
LWLockAcquireOrWait(LWLockId lockid, LWLockMode mode)
{
...
TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode);
for (;;)
{
/* "false" means cannot accept cancel/die interrupt here. */
PGSemaphoreLock(&proc->sem, false);
if (!proc->lwWaiting)
break;
extraWaits++;
}
TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode);
...
/*
* Fix the process wait semaphore's count for any absorbed wakeups.
*/
while (extraWaits-- > 0)
PGSemaphoreUnlock(&proc->sem);
if (mustwait)
{
/* Failed to get lock, so release interrupt holdoff */
RESUME_INTERRUPTS();
LOG_LWDEBUG("LWLockAcquireOrWait", lockid, "failed");
TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE_FAIL(lockid, mode);
}
else
{
/* Add lock to list of locks held by this backend */
held_lwlocks[num_held_lwlocks++] = lockid;
TRACE_POSTGRESQL_LWLOCK_WAIT_UNTIL_FREE(lockid, mode);
}
...
/*
* LWLockRelease - release a previously acquired lock
*/
void
LWLockRelease(LWLockId lockid)
{
...
/* We are done updating shared state of the lock itself. */
SpinLockRelease(&lock->mutex);
TRACE_POSTGRESQL_LWLOCK_RELEASE(lockid);
Heavyweight locks
src/backend/storage/lmgr/lock.c
/*
* LockAcquireExtended - allows us to specify additional options
*
* reportMemoryError specifies whether a lock request that fills the
* lock table should generate an ERROR or not. This allows a priority
* caller to note that the lock table is full and then begin taking
* extreme action to reduce the number of other lock holders before
* retrying the action.
*/
LockAcquireResult
LockAcquireExtended(const LOCKTAG *locktag,
LOCKMODE lockmode,
bool sessionLock,
bool dontWait,
bool reportMemoryError)
{
...
/*
* Sleep till someone wakes me up.
*/
TRACE_POSTGRESQL_LOCK_WAIT_START(locktag->locktag_field1,
locktag->locktag_field2,
locktag->locktag_field3,
locktag->locktag_field4,
locktag->locktag_type,
lockmode);
WaitOnLock(locallock, owner);
TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
locktag->locktag_field2,
locktag->locktag_field3,
locktag->locktag_field4,
locktag->locktag_type,
lockmode);
...
Deadlock
src/backend/storage/lmgr/deadlock.c
/*
* DeadLockCheck -- Checks for deadlocks for a given process
*
* This code looks for deadlocks involving the given process. If any
* are found, it tries to rearrange lock wait queues to resolve the
* deadlock. If resolution is impossible, return DS_HARD_DEADLOCK ---
* the caller is then expected to abort the given proc's transaction.
*
* Caller must already have locked all partitions of the lock tables.
*
* On failure, deadlock details are recorded in deadlockDetails[] for
* subsequent printing by DeadLockReport(). That activity is separate
* because (a) we don't want to do it while holding all those LWLocks,
* and (b) we are typically invoked inside a signal handler.
*/
DeadLockState
DeadLockCheck(PGPROC *proc)
{
...
/* Search for deadlocks and possible fixes */
if (DeadLockCheckRecurse(proc))
{
/*
* Call FindLockCycle one more time, to record the correct
* deadlockDetails[] for the basic state with no rearrangements.
*/
int nSoftEdges;
TRACE_POSTGRESQL_DEADLOCK_FOUND();
nWaitOrders = 0;
if (!FindLockCycle(proc, possibleConstraints, &nSoftEdges))
elog(FATAL, "deadlock seems to have disappeared");
return DS_HARD_DEADLOCK; /* cannot find a non-deadlocked state */
}
5. LWLockId type definition
src/include/storage/lwlock.h
/*
* We have a number of predefined LWLocks, plus a bunch of LWLocks that are
* dynamically assigned (e.g., for shared buffers). The LWLock structures
* live in shared memory (since they contain shared data) and are identified
* by values of this enumerated type. We abuse the notion of an enum somewhat
* by allowing values not listed in the enum declaration to be assigned.
* The extra value MaxDynamicLWLock is there to keep the compiler from
* deciding that the enum can be represented as char or short ...
*
* If you remove a lock, please replace it with a placeholder. This retains
* the lock numbering, which is helpful for DTrace and other external
* debugging scripts.
*/
typedef enum LWLockId
{
BufFreelistLock,
ShmemIndexLock,
OidGenLock,
XidGenLock,
ProcArrayLock,
SInvalReadLock,
SInvalWriteLock,
WALInsertLock,
WALWriteLock,
ControlFileLock,
CheckpointLock,
CLogControlLock,
SubtransControlLock,
MultiXactGenLock,
MultiXactOffsetControlLock,
MultiXactMemberControlLock,
RelCacheInitLock,
CheckpointerCommLock,
TwoPhaseStateLock,
TablespaceCreateLock,
BtreeVacuumLock,
AddinShmemInitLock,
AutovacuumLock,
AutovacuumScheduleLock,
SyncScanLock,
RelationMappingLock,
AsyncCtlLock,
AsyncQueueLock,
SerializableXactHashLock,
SerializableFinishedListLock,
SerializablePredicateLockListLock,
OldSerXidLock,
SyncRepLock,
/* Individual lock IDs end here */
FirstBufMappingLock,
FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,
FirstPredicateLockMgrLock = FirstLockMgrLock + NUM_LOCK_PARTITIONS,
/* must be last except for MaxDynamicLWLock: */
NumFixedLWLocks = FirstPredicateLockMgrLock + NUM_PREDICATELOCK_PARTITIONS,
MaxDynamicLWLock = 1000000000
} LWLockId;
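The enum above is what turns a numeric lockid from the probe into a name. A minimal Python sketch of that mapping follows. The partition counts are not shown in this article, so the default of 16 for each of NUM_BUFFER_PARTITIONS, NUM_LOCK_PARTITIONS, and NUM_PREDICATELOCK_PARTITIONS is an assumption to check against your own source tree. The bracketed labels such as `LockMgrLock[0]` and the function name `lwlock_name` are hypothetical.

```python
# Individually named LWLocks, in enum order (values 0..32).
FIXED_LWLOCKS = [
    "BufFreelistLock", "ShmemIndexLock", "OidGenLock", "XidGenLock",
    "ProcArrayLock", "SInvalReadLock", "SInvalWriteLock", "WALInsertLock",
    "WALWriteLock", "ControlFileLock", "CheckpointLock", "CLogControlLock",
    "SubtransControlLock", "MultiXactGenLock", "MultiXactOffsetControlLock",
    "MultiXactMemberControlLock", "RelCacheInitLock", "CheckpointerCommLock",
    "TwoPhaseStateLock", "TablespaceCreateLock", "BtreeVacuumLock",
    "AddinShmemInitLock", "AutovacuumLock", "AutovacuumScheduleLock",
    "SyncScanLock", "RelationMappingLock", "AsyncCtlLock", "AsyncQueueLock",
    "SerializableXactHashLock", "SerializableFinishedListLock",
    "SerializablePredicateLockListLock", "OldSerXidLock", "SyncRepLock",
]


def lwlock_name(lockid, buffer_partitions=16, lock_partitions=16,
                predicate_partitions=16):
    """Map a numeric LWLockId to a readable label.

    The partition counts are assumptions; pass the values from your build.
    """
    if lockid < 0:
        raise ValueError("lockid must be non-negative")
    if lockid < len(FIXED_LWLOCKS):
        return FIXED_LWLOCKS[lockid]
    # Mirrors the arithmetic in the enum: each range starts where the
    # previous one ends.
    first_buf = len(FIXED_LWLOCKS)
    first_lockmgr = first_buf + buffer_partitions
    first_predicate = first_lockmgr + lock_partitions
    num_fixed = first_predicate + predicate_partitions
    if lockid < first_lockmgr:
        return "BufMappingLock[%d]" % (lockid - first_buf)
    if lockid < first_predicate:
        return "LockMgrLock[%d]" % (lockid - first_lockmgr)
    if lockid < num_fixed:
        return "PredicateLockMgrLock[%d]" % (lockid - first_predicate)
    return "dynamic LWLock %d" % lockid


for lockid in (3, 4, 7, 11, 49, 30604):
    print(lockid, lwlock_name(lockid))
```

Under these assumed partition counts, IDs 49 through 64 fall in the lock manager partition range, and large IDs such as 30604 are dynamically assigned locks.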
6. LWLockMode type definition
src/include/storage/lwlock.h
typedef enum LWLockMode
{
LW_EXCLUSIVE,
LW_SHARED,
LW_WAIT_UNTIL_FREE /* A special mode used in PGPROC->lwlockMode,
* when waiting for lock to become free. Not
* to be used as LWLockAcquire argument */
} LWLockMode;
7. LOCKTAG and LockTagType type definitions
src/include/storage/lock.h
/*
* LOCKTAG is the key information needed to look up a LOCK item in the
* lock hashtable. A LOCKTAG value uniquely identifies a lockable object.
*
* The LockTagType enum defines the different kinds of objects we can lock.
* We can handle up to 256 different LockTagTypes.
*/
typedef enum LockTagType
{
LOCKTAG_RELATION, /* whole relation */
/* ID info for a relation is DB OID + REL OID; DB OID = 0 if shared */
LOCKTAG_RELATION_EXTEND, /* the right to extend a relation */
/* same ID info as RELATION */
LOCKTAG_PAGE, /* one page of a relation */
/* ID info for a page is RELATION info + BlockNumber */
LOCKTAG_TUPLE, /* one physical tuple */
/* ID info for a tuple is PAGE info + OffsetNumber */
LOCKTAG_TRANSACTION, /* transaction (for waiting for xact done) */
/* ID info for a transaction is its TransactionId */
LOCKTAG_VIRTUALTRANSACTION, /* virtual transaction (ditto) */
/* ID info for a virtual transaction is its VirtualTransactionId */
LOCKTAG_OBJECT, /* non-relation database object */
/* ID info for an object is DB OID + CLASS OID + OBJECT OID + SUBID */
/*
* Note: object ID has same representation as in pg_depend and
* pg_description, but notice that we are constraining SUBID to 16 bits.
* Also, we use DB OID = 0 for shared objects such as tablespaces.
*/
LOCKTAG_USERLOCK, /* reserved for old contrib/userlock code */
LOCKTAG_ADVISORY /* advisory user locks */
} LockTagType;
/*
* The LOCKTAG struct is defined with malice aforethought to fit into 16
* bytes with no padding. Note that this would need adjustment if we were
* to widen Oid, BlockNumber, or TransactionId to more than 32 bits.
*
* We include lockmethodid in the locktag so that a single hash table in
* shared memory can store locks of different lockmethods.
*/
typedef struct LOCKTAG
{
uint32 locktag_field1; /* a 32-bit ID field */
uint32 locktag_field2; /* a 32-bit ID field */
uint32 locktag_field3; /* a 32-bit ID field */
uint16 locktag_field4; /* a 16-bit ID field */
uint8 locktag_type; /* see enum LockTagType */
uint8 locktag_lockmethodid; /* lockmethod indicator */
} LOCKTAG;
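The comment above says the struct fits into 16 bytes with no padding. A small Python sketch can confirm the arithmetic using the standard `struct` module, with three 32-bit fields, one 16-bit field, and two 8-bit fields. The lockmethodid value of 1 used in the sample is an assumption (the value of DEFAULT_LOCKMETHOD is not shown in this article), and `pack_locktag` is a hypothetical helper.

```python
import struct

# "=" selects native byte order with standard sizes and no alignment padding:
# three uint32 (I), one uint16 (H), two uint8 (B).
LOCKTAG_FORMAT = "=IIIHBB"


def pack_locktag(field1, field2, field3, field4, tag_type, lockmethodid):
    """Pack the six LOCKTAG members in declaration order."""
    return struct.pack(LOCKTAG_FORMAT, field1, field2, field3, field4,
                       tag_type, lockmethodid)


# Tuple lock tag from the example output: dboid, reloid, blocknum, offnum,
# LOCKTAG_TUPLE (3); lockmethodid 1 is an assumed value.
packed = pack_locktag(16384, 24735, 77940, 116, 3, 1)
print(struct.calcsize(LOCKTAG_FORMAT), len(packed))
print(struct.unpack(LOCKTAG_FORMAT, packed))
```

The size works out as 3 * 4 + 2 + 1 + 1 = 16 bytes, which matches the comment.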
8. LOCKMODE type definition and values.
src/include/storage/lock.h
/*
* LOCKMODE is an integer (1..N) indicating a lock type. LOCKMASK is a bit
* mask indicating a set of held or requested lock types (the bit 1<<mode
* corresponds to a particular lock mode).
*/
typedef int LOCKMASK;
typedef int LOCKMODE;
/*
* These are the valid values of type LOCKMODE for all the standard lock
* methods (both DEFAULT and USER).
*/
/* NoLock is not a lock mode, but a flag value meaning "don't get a lock" */
#define NoLock 0
#define AccessShareLock 1 /* SELECT */
#define RowShareLock 2 /* SELECT FOR UPDATE/FOR SHARE */
#define RowExclusiveLock 3 /* INSERT, UPDATE, DELETE */
#define ShareUpdateExclusiveLock 4 /* VACUUM (non-FULL),ANALYZE, CREATE
* INDEX CONCURRENTLY */
#define ShareLock 5 /* CREATE INDEX (WITHOUT CONCURRENTLY) */
#define ShareRowExclusiveLock 6 /* like EXCLUSIVE MODE, but allows ROW
* SHARE */
#define ExclusiveLock 7 /* blocks ROW SHARE/SELECT...FOR
* UPDATE */
#define AccessExclusiveLock 8 /* ALTER TABLE, DROP TABLE, VACUUM
* FULL, and unqualified LOCK TABLE */
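The header comment notes that a LOCKMASK holds one bit per lock mode, with bit `1<<mode` standing for that mode. The following Python sketch illustrates the relationship using the mode numbers defined above. `lockbit` and `mask_to_modes` are hypothetical helper names for illustration only.

```python
# LOCKMODE values from the #define list above.
LOCK_MODES = {
    1: "AccessShareLock",
    2: "RowShareLock",
    3: "RowExclusiveLock",
    4: "ShareUpdateExclusiveLock",
    5: "ShareLock",
    6: "ShareRowExclusiveLock",
    7: "ExclusiveLock",
    8: "AccessExclusiveLock",
}


def lockbit(mode):
    """Return the LOCKMASK bit for one lock mode (1 << mode)."""
    if mode not in LOCK_MODES:
        raise ValueError("not a valid LOCKMODE: %r" % (mode,))
    return 1 << mode


def mask_to_modes(mask):
    """List the lock mode names whose bits are set in a LOCKMASK."""
    return [name for mode, name in sorted(LOCK_MODES.items())
            if mask & (1 << mode)]


# A mask holding RowExclusiveLock (3) and ExclusiveLock (7).
mask = lockbit(3) | lockbit(7)
print(mask, mask_to_modes(mask))
```

Because NoLock is 0 and is not a real mode, bit 0 of a mask is never used by this scheme.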
9. Functions that reference the heavyweight lock request function:
Referenced by ConditionalLockPage(), ConditionalLockRelation(), ConditionalLockRelationOid(), ConditionalLockTuple(), ConditionalXactLockTableWait(), LockDatabaseObject(), LockPage(), LockRelation(), LockRelationForExtension(), LockRelationIdForSession(), LockRelationOid(), LockSharedObject(), LockSharedObjectForSession(), LockTuple(), pg_advisory_lock_int4(), pg_advisory_lock_int8(), pg_advisory_lock_shared_int4(), pg_advisory_lock_shared_int8(), pg_advisory_xact_lock_int4(), pg_advisory_xact_lock_int8(), pg_advisory_xact_lock_shared_int4(), pg_advisory_xact_lock_shared_int8(), pg_try_advisory_lock_int4(), pg_try_advisory_lock_int8(), pg_try_advisory_lock_shared_int4(), pg_try_advisory_lock_shared_int8(), pg_try_advisory_xact_lock_int4(), pg_try_advisory_xact_lock_int8(), pg_try_advisory_xact_lock_shared_int4(), pg_try_advisory_xact_lock_shared_int8(), VirtualXactLock(), XactLockTableInsert(), and XactLockTableWait().
In this example, when the lock type is TUPLE, the lock is requested through LockTuple. The SET_LOCKTAG_TUPLE macro therefore explains the first five probe arguments: (dbid, relid, blocknum, tupleoffset_inblock), followed by the lock tag type.
void
LockTuple(Relation relation, ItemPointer tid, LOCKMODE lockmode)
{
    LOCKTAG tag;
    SET_LOCKTAG_TUPLE(tag,
                      relation->rd_lockInfo.lockRelId.dbId,
                      relation->rd_lockInfo.lockRelId.relId,
                      ItemPointerGetBlockNumber(tid),
                      ItemPointerGetOffsetNumber(tid));
    (void) LockAcquire(&tag, lockmode, false, false);
}
10. For the macros that set heavyweight lock tags, see the following header file:
src/include/storage/lock.h
/*
* These macros define how we map logical IDs of lockable objects into
* the physical fields of LOCKTAG. Use these to set up LOCKTAG values,
* rather than accessing the fields directly. Note multiple eval of target!
*/
#define SET_LOCKTAG_RELATION(locktag,dboid,reloid) \
((locktag).locktag_field1 = (dboid), \
(locktag).locktag_field2 = (reloid), \
(locktag).locktag_field3 = 0, \
(locktag).locktag_field4 = 0, \
(locktag).locktag_type = LOCKTAG_RELATION, \
(locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)
#define SET_LOCKTAG_RELATION_EXTEND(locktag,dboid,reloid) \
((locktag).locktag_field1 = (dboid), \
(locktag).locktag_field2 = (reloid), \
(locktag).locktag_field3 = 0, \
(locktag).locktag_field4 = 0, \
(locktag).locktag_type = LOCKTAG_RELATION_EXTEND, \
(locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)
#define SET_LOCKTAG_PAGE(locktag,dboid,reloid,blocknum) \
((locktag).locktag_field1 = (dboid), \
(locktag).locktag_field2 = (reloid), \
(locktag).locktag_field3 = (blocknum), \
(locktag).locktag_field4 = 0, \
(locktag).locktag_type = LOCKTAG_PAGE, \
(locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)
#define SET_LOCKTAG_TUPLE(locktag,dboid,reloid,blocknum,offnum) \
((locktag).locktag_field1 = (dboid), \
(locktag).locktag_field2 = (reloid), \
(locktag).locktag_field3 = (blocknum), \
(locktag).locktag_field4 = (offnum), \
(locktag).locktag_type = LOCKTAG_TUPLE, \
(locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)
#define SET_LOCKTAG_TRANSACTION(locktag,xid) \
((locktag).locktag_field1 = (xid), \
(locktag).locktag_field2 = 0, \
(locktag).locktag_field3 = 0, \
(locktag).locktag_field4 = 0, \
(locktag).locktag_type = LOCKTAG_TRANSACTION, \
(locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)
#define SET_LOCKTAG_VIRTUALTRANSACTION(locktag,vxid) \
((locktag).locktag_field1 = (vxid).backendId, \
(locktag).locktag_field2 = (vxid).localTransactionId, \
(locktag).locktag_field3 = 0, \
(locktag).locktag_field4 = 0, \
(locktag).locktag_type = LOCKTAG_VIRTUALTRANSACTION, \
(locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)
#define SET_LOCKTAG_OBJECT(locktag,dboid,classoid,objoid,objsubid) \
((locktag).locktag_field1 = (dboid), \
(locktag).locktag_field2 = (classoid), \
(locktag).locktag_field3 = (objoid), \
(locktag).locktag_field4 = (objsubid), \
(locktag).locktag_type = LOCKTAG_OBJECT, \
(locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)
#define SET_LOCKTAG_ADVISORY(locktag,id1,id2,id3,id4) \
((locktag).locktag_field1 = (id1), \
(locktag).locktag_field2 = (id2), \
(locktag).locktag_field3 = (id3), \
(locktag).locktag_field4 = (id4), \
(locktag).locktag_type = LOCKTAG_ADVISORY, \
(locktag).locktag_lockmethodid = USER_LOCKMETHOD)
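Since the macros above decide what each tag field means for a given tag type, the first five probe arguments can be labeled by a table lookup. Here is a minimal Python sketch of that lookup, with the field labels taken from the macro parameter names. `TAG_FIELDS` and `describe_locktag` are hypothetical names, and LOCKTAG_USERLOCK (7) is left out because no macro for it appears in the excerpt.

```python
# tag type -> (name, labels for locktag_field1..field4 that the macro sets
# to something other than 0), following the SET_LOCKTAG_* macros above.
TAG_FIELDS = {
    0: ("LOCKTAG_RELATION", ("dboid", "reloid")),
    1: ("LOCKTAG_RELATION_EXTEND", ("dboid", "reloid")),
    2: ("LOCKTAG_PAGE", ("dboid", "reloid", "blocknum")),
    3: ("LOCKTAG_TUPLE", ("dboid", "reloid", "blocknum", "offnum")),
    4: ("LOCKTAG_TRANSACTION", ("xid",)),
    5: ("LOCKTAG_VIRTUALTRANSACTION", ("backendId", "localTransactionId")),
    6: ("LOCKTAG_OBJECT", ("dboid", "classoid", "objoid", "objsubid")),
    8: ("LOCKTAG_ADVISORY", ("id1", "id2", "id3", "id4")),
}


def describe_locktag(field1, field2, field3, field4, tag_type):
    """Label the meaningful tag fields for the given tag type."""
    if tag_type not in TAG_FIELDS:
        raise ValueError("no field layout known for tag type %r" % (tag_type,))
    name, labels = TAG_FIELDS[tag_type]
    # zip() stops at the shortest input, so unused trailing fields are dropped.
    return name, dict(zip(labels, (field1, field2, field3, field4)))


# The first five probe arguments from two lines of the example output.
print(describe_locktag(16384, 24735, 77940, 116, 3))
print(describe_locktag(129233058, 0, 0, 0, 4))
```

The lookup is keyed on the fifth probe argument (locktag_type), which is why the stap script includes it in the aggregation key.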