Systemtap EXP: PostgreSQL IN-BUILD mark Class 1 - transaction

6 minute read

背景

本文开始将以PostgreSQL内建的探针为例, 探讨一下围绕这些探针可以实现哪些想要的功能.  
使用systemtap时, 需要把postgresql mark name中的-换成两个下划线.  
第一个分类是事务相关探针 :   
Name Parameters Description
transaction-start (LocalTransactionId) Probe that fires at the start of a new transaction. arg0 is the transaction ID.
transaction-commit (LocalTransactionId) Probe that fires when a transaction completes successfully. arg0 is the transaction ID.
transaction-abort (LocalTransactionId) Probe that fires when a transaction completes unsuccessfully. arg0 is the transaction ID.
事务开始, 事务提交, 事务回滚3个mark.  
探针信息 :   
src/backend/utils/probes.h  
/* TRACE_POSTGRESQL_TRANSACTION_START ( unsigned int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_TRANSACTION_START_ENABLED() __builtin_expect (transaction__start_semaphore, 0)  
#define postgresql_transaction__start_semaphore transaction__start_semaphore  
#else  
#define TRACE_POSTGRESQL_TRANSACTION_START_ENABLED() __builtin_expect (postgresql_transaction__start_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_transaction__start_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_TRANSACTION_START(arg1) \  
DTRACE_PROBE1(postgresql,transaction__start,arg1)  
  
/* TRACE_POSTGRESQL_TRANSACTION_COMMIT ( unsigned int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_TRANSACTION_COMMIT_ENABLED() __builtin_expect (transaction__commit_semaphore, 0)  
#define postgresql_transaction__commit_semaphore transaction__commit_semaphore  
#else  
#define TRACE_POSTGRESQL_TRANSACTION_COMMIT_ENABLED() __builtin_expect (postgresql_transaction__commit_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_transaction__commit_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_TRANSACTION_COMMIT(arg1) \  
DTRACE_PROBE1(postgresql,transaction__commit,arg1)  
  
/* TRACE_POSTGRESQL_TRANSACTION_ABORT ( unsigned int) */  
#if defined STAP_SDT_V1  
#define TRACE_POSTGRESQL_TRANSACTION_ABORT_ENABLED() __builtin_expect (transaction__abort_semaphore, 0)  
#define postgresql_transaction__abort_semaphore transaction__abort_semaphore  
#else  
#define TRACE_POSTGRESQL_TRANSACTION_ABORT_ENABLED() __builtin_expect (postgresql_transaction__abort_semaphore, 0)  
#endif  
__extension__ extern unsigned short postgresql_transaction__abort_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));  
#define TRACE_POSTGRESQL_TRANSACTION_ABORT(arg1) \  
DTRACE_PROBE1(postgresql,transaction__abort,arg1)  
探针在pg源码中的信息 :   
src/backend/access/transam/xact.c  
/*  
 *      StartTransaction  
 */  
static void  
StartTransaction(void)  
{  
...  
        VirtualTransactionId vxid;  
...  
        TRACE_POSTGRESQL_TRANSACTION_START(vxid.localTransactionId);  
...  
/*  
 *      CommitTransaction  
 *  
 * NB: if you change this routine, better look at PrepareTransaction too!  
 */  
static void  
CommitTransaction(void)  
{  
...  
        TRACE_POSTGRESQL_TRANSACTION_COMMIT(MyProc->lxid);  
...   
/*  
 *      AbortTransaction  
 */  
static void  
AbortTransaction(void)  
...  
        TRACE_POSTGRESQL_TRANSACTION_ABORT(MyProc->lxid);  
这三个探针对应的参数为本地事务号, 和全局事务号无关.  
本地事务类型定义 :   
src/include/storage/lock.h  
/*  
 * Top-level transactions are identified by VirtualTransactionIDs comprising  
 * the BackendId of the backend running the xact, plus a locally-assigned  
 * LocalTransactionId.  These are guaranteed unique over the short term,  
 * but will be reused after a database restart; hence they should never  
 * be stored on disk.  
 *  
 * Note that struct VirtualTransactionId can not be assumed to be atomically  
 * assignable as a whole.  However, type LocalTransactionId is assumed to  
 * be atomically assignable, and the backend ID doesn't change often enough  
 * to be a problem, so we can fetch or assign the two fields separately.  
 * We deliberately refrain from using the struct within PGPROC, to prevent  
 * coding errors from trying to use struct assignment with it; instead use  
 * GET_VXID_FROM_PGPROC().  
 */  
typedef struct  
{  
        BackendId       backendId;              /* determined at backend startup */  
        LocalTransactionId localTransactionId;          /* backend-local transaction  
                                                                                                 * id */  
} VirtualTransactionId;  
使用事务相关的探针, 一般可以用于统计以下信息 :   
1. 每秒新建, 提交, 回滚的事务数  
2. 记录每个事务的时长  
3. 结合内核探针, 可以记录每个事务的io开销, 网络开销, 单步指令等.  
例子 :   
1. 每秒新建, 提交, 回滚的事务数  
digoal=#   
CREATE OR REPLACE FUNCTION public.f_test(i_id integer)  
 RETURNS void  
 LANGUAGE plpgsql  
 STRICT  
AS $function$  
declare  
begin  
  update test set info=md5(random()::text), crt_time=clock_timestamp() where id=i_id;  
  if not found then  
    insert into test(id,info,crt_time) values(i_id,md5(random()::text),clock_timestamp());  
  end if;  
  return;  
  exception when others then  
    return;  
end;  
$function$;  
digoal=# create table test(id int primary key, info text, crt_time timestamp);  
CREATE TABLE  
  
pg93@db-172-16-3-150-> cat test.sql  
\setrandom id 1 5000000  
select f_test(:id);  
  
[root@db-172-16-3-150 ~]# stap -e '  
global var1  
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__start") {  
  var1["START"]++  
}   
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__commit") {  
  var1["COMMIT"]++  
}   
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__abort") {  
  var1["ABORT"]++  
}   
probe timer.s(1) {  
  printf("START/s:%d, COMMIT/s:%d, ABORT/s:%d\n", var1["START"], var1["COMMIT"], var1["ABORT"])  
  var1["START"]=0  
  var1["COMMIT"]=0  
  var1["ABORT"]=0  
}'  
  
pg93@db-172-16-3-150-> pgbench -M prepared -n -r -f ./test.sql -c 8 -j 1 -T 10  
transaction type: Custom query  
scaling factor: 1  
query mode: prepared  
number of clients: 8  
number of threads: 1  
duration: 10 s  
number of transactions actually processed: 175602  
tps = 17559.485329 (including connections establishing)  
tps = 17601.731285 (excluding connections establishing)  
statement latencies in milliseconds:  
        0.001609        \setrandom id 1 5000000  
        0.451537        select f_test(:id);  
  
stap输出 :   
START/s:0, COMMIT/s:0, ABORT/s:0  
START/s:0, COMMIT/s:0, ABORT/s:0  
START/s:7484, COMMIT/s:7483, ABORT/s:0  
START/s:18035, COMMIT/s:18032, ABORT/s:0  
START/s:17345, COMMIT/s:17346, ABORT/s:0  
START/s:17151, COMMIT/s:17150, ABORT/s:0  
START/s:17517, COMMIT/s:17520, ABORT/s:0  
START/s:18048, COMMIT/s:18046, ABORT/s:0  
START/s:17597, COMMIT/s:17600, ABORT/s:0  
START/s:17648, COMMIT/s:17645, ABORT/s:0  
START/s:17728, COMMIT/s:17724, ABORT/s:0  
START/s:17346, COMMIT/s:17348, ABORT/s:0  
START/s:9720, COMMIT/s:9725, ABORT/s:0  
START/s:0, COMMIT/s:0, ABORT/s:0  
START/s:0, COMMIT/s:0, ABORT/s:0  
  
2. 记录每个事务的时长, 输出柱状图  
首先要确保三个探针的本地事务号一致 :   
[root@db-172-16-3-150 ~]# stap -e '  
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__start") {  
  println(pn(),$arg1)  
}   
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__commit") {  
  println(pn(),$arg1)  
}   
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__abort") {  
  println(pn(),$arg1)  
}'  
  
digoal=# select 1;  
 ?column?   
----------  
        1  
(1 row)  
  
digoal=# begin;  
BEGIN  
digoal=# abort;  
ROLLBACK  
  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__start")55238  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__commit")55238  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__start")55239  
process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__abort")55239  
stap脚本如下 :   
stap -e '  
global var1%[819200], var2, var3  
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__start") {  
  var1[pid(),$arg1] = gettimeofday_ms()  
}   
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__commit") {  
  if (var1[pid(),$arg1] != 0)  
    var2 <<< (gettimeofday_ms()-var1[pid(),$arg1])  
}   
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__abort") {  
  if (var1[pid(),$arg1] != 0)  
    var3 <<< (gettimeofday_ms()-var1[pid(),$arg1])  
}   
probe timer.s($1) {  
  if (@count(var2) != 0) {  
    printf("COMMIT/s:%d\n", @count(var2) / $1)  
    println(@hist_log(var2))   
    delete var2  
  }  
  if (@count(var3) != 0) {  
    printf("ABORT/s:%d\n", @count(var3) / $1)  
    println(@hist_log(var3))   
    delete var3  
  }  
}' 3  
  
pg93@db-172-16-3-150-> pgbench -M prepared -n -r -f ./test.sql -c 8 -j 1 -T 30  
transaction type: Custom query  
scaling factor: 1  
query mode: prepared  
number of clients: 8  
number of threads: 1  
duration: 30 s  
number of transactions actually processed: 737912  
tps = 24596.734611 (including connections establishing)  
tps = 24615.235172 (excluding connections establishing)  
statement latencies in milliseconds:  
        0.002188        \setrandom id 1 5000000  
        0.321188        select f_test(:id);  
  
输出  
COMMIT/s:29660  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  72349  
    1 |@@@@@@@@@@@                                        16600  
    2 |                                                      32  
    4 |                                                       0  
    8 |                                                       0  
  
COMMIT/s:57109  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  138167  
    1 |@@@@@@@@@@@                                         33128  
    2 |                                                       29  
    4 |                                                        2  
    8 |                                                        2  
   16 |                                                        0  
   32 |                                                        0  
  
COMMIT/s:24830  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  58175  
    1 |@@@@@@@@@@@@@@                                     16310  
    2 |                                                       7  
    4 |                                                       0  
    8 |                                                       0  
  
COMMIT/s:22335  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  51034  
    1 |@@@@@@@@@@@@@@@                                    15960  
    2 |                                                       9  
    4 |                                                       0  
    8 |                                                       2  
   16 |                                                       0  
   32 |                                                       0  
  
COMMIT/s:20267  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  45321  
    1 |@@@@@@@@@@@@@@@@@                                  15468  
    2 |                                                      13  
    4 |                                                       0  
    8 |                                                       0  
  
COMMIT/s:18493  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  40386  
    1 |@@@@@@@@@@@@@@@@@@                                 15087  
    2 |                                                       5  
    4 |                                                       1  
    8 |                                                       0  
   16 |                                                       0  
[其他]  
1. 第二个例子可能存在偏差, 与stap全局变量锁有关, 不要除以定量时间即可.  
修改如下 :   
stap -e '  
global var1%[819200], var2, var3, var4  
probe begin {  
  var4=gettimeofday_ms()  
}  
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__start") {  
  var1[pid(),$arg1] = gettimeofday_ms()  
}   
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__commit") {  
  if (var1[pid(),$arg1] != 0)  
    var2 <<< (gettimeofday_ms()-var1[pid(),$arg1])  
}   
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("transaction__abort") {  
  if (var1[pid(),$arg1] != 0)  
    var3 <<< (gettimeofday_ms()-var1[pid(),$arg1])  
}   
probe timer.s($1) {  
  now=gettimeofday_ms()  
  if (@count(var2) != 0) {  
    printf("COMMIT/s:%d\n", (1000*@count(var2)) / (now-var4))  
    println(@hist_log(var2))   
    delete var2  
  }  
  if (@count(var3) != 0) {  
    printf("ABORT/s:%d\n", (1000*@count(var3)) / (now-var4))  
    println(@hist_log(var3))   
    delete var3  
  }  
  var4=now  
}' 3  
  
输出精准 :   
COMMIT/s:16364  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  39936  
    1 |@@@@@@@@@@@                                         9137  
    2 |                                                      20  
    4 |                                                       0  
    8 |                                                       0  
  
COMMIT/s:16427  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  39927  
    1 |@@@@@@@@@@@                                         9346  
    2 |                                                      10  
    4 |                                                       0  
    8 |                                                       0  
  
COMMIT/s:16436  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  40073  
    1 |@@@@@@@@@@@                                         9223  
    2 |                                                      13  
    4 |                                                       0  
    8 |                                                       0  
  
COMMIT/s:16287  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  39649  
    1 |@@@@@@@@@@@                                         9195  
    2 |                                                      17  
    4 |                                                       0  
    8 |                                                       0  
  
COMMIT/s:16233  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 39250  
    1 |@@@@@@@@@@@@                                        9432  
    2 |                                                      17  
    4 |                                                       0  
    8 |                                                       0  
  
COMMIT/s:15605  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  37605  
    1 |@@@@@@@@@@@@                                        9191  
    2 |                                                      21  
    4 |                                                       0  
    8 |                                                       0  
  
COMMIT/s:15581  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  37382  
    1 |@@@@@@@@@@@@                                        9350  
    2 |                                                      11  
    4 |                                                       1  
    8 |                                                       0  
   16 |                                                       0  
  
COMMIT/s:15422  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  36838  
    1 |@@@@@@@@@@@@                                        9412  
    2 |                                                      16  
    4 |                                                       1  
    8 |                                                       0  
   16 |                                                       0  
  
COMMIT/s:15521  
value |-------------------------------------------------- count  
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  37117  
    1 |@@@@@@@@@@@@                                        9437  
    2 |                                                      11  
    4 |                                                       0  
    8 |                                                       0  
  
结合内核探针的例子可参考 :   

Systemtap EXP: Trace PostgreSQL iostat per SQL statement

http://blog.163.com/digoal@126/blog/static/16387704020139152191581/

Systemtap EXP: trace PostgreSQL netflow per session or per sql

http://blog.163.com/digoal@126/blog/static/16387704020139153195701/

Systemtap EXP: trace PostgreSQL instruction or block of instructions per sql or per session

http://blog.163.com/digoal@126/blog/static/16387704020139153455311/

参考

1. http://www.postgresql.org/docs/9.3/static/dynamic-trace.html

2. http://doxygen.postgresql.org/

3. src/backend/access/transam/xact.c

4. src/backend/utils/probes.h

5. src/include/storage/lock.h

6. https://sourceware.org/systemtap/tapsets/

Flag Counter

digoal’s 大量PostgreSQL文章入口