use systemtap statistics vs pgbench progress output
背景
PostgreSQL 9.4的pgbench添加了progress输出. 有兴趣朋友可以参考如下 :
http://blog.163.com/digoal@126/blog/static/1638770402013772212424/
progress输出时输出的是整个pgbench的统计信息, 如果要输出每个连接的tps, 可以通过systemtap来实现.
首先来看一个pgbench输出的例子 :
测试表 :
digoal=# \d test
Table "public.test"
Column | Type | Modifiers
----------+-----------------------------+-----------
id | integer | not null
info | text |
crt_time | timestamp without time zone |
Indexes:
"test_pkey" PRIMARY KEY, btree (id)
测试函数 :
CREATE OR REPLACE FUNCTION public.f_id_add(i_id integer)
RETURNS void
LANGUAGE plpgsql
STRICT
AS $function$
declare
begin
update test set crt_time=clock_timestamp() where id=i_id;
if not found then
insert into test values (i_id,md5(random()::text),clock_timestamp());
return;
else
return;
end if;
return;
exception when others then
return;
end;
$function$;
pgbench测试脚本 :
pg94@db-172-16-3-39-> cat test.sql
\setrandom id 1 5000000
select * from f_id_add(:id);
测试, 使用8个连接, 测试10秒, 每秒输出一次统计信息, 包含tps, 事务时长.
pg94@db-172-16-3-39-> pgbench -M prepared -n -r -f ./test.sql -P 1 -c 8 -j 4 -T 10
progress: 1.0 s, 30233.7 tps, 0.265 ms lat
progress: 2.0 s, 34207.7 tps, 0.234 ms lat
progress: 3.0 s, 35951.8 tps, 0.223 ms lat
progress: 4.0 s, 37339.0 tps, 0.214 ms lat
progress: 5.0 s, 38490.7 tps, 0.208 ms lat
progress: 6.0 s, 39307.9 tps, 0.204 ms lat
progress: 7.0 s, 39779.5 tps, 0.201 ms lat
progress: 8.0 s, 40124.5 tps, 0.199 ms lat
progress: 9.0 s, 40458.7 tps, 0.198 ms lat
progress: 10.0 s, 40294.4 tps, 0.199 ms lat
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 4
duration: 10 s
number of transactions actually processed: 376198
tps = 37618.306553 (including connections establishing)
tps = 37651.229364 (excluding connections establishing)
statement latencies in milliseconds:
0.002222 \setrandom id 1 5000000
0.208181 select * from f_id_add(:id);
接下来使用systemtap来输出每个连接的统计信息, 因为systemtap不支持浮点输出, 所以使用us作为时间单位 :
[root@db-172-16-3-39 ~]# cat test.stp
global v_starttime, v_time1, v_time2
probe begin {
v_starttime = gettimeofday_us()
}
probe process("/home/pg94/pgsql9.4devel/bin/postgres").mark("transaction__start") {
us = gettimeofday_us()
if (! v_time1[tid()])
v_time1[tid()] = us
v_time2[execname(),tid()] <<< (us-v_time1[tid()])
v_time1[tid()] = us
}
probe timer.s(5) {
us = gettimeofday_us()
foreach ([e,i+] in v_time2) {
printf("process:%s, tid:%d, sec:%d, tps:%d, lat:%d\n", e, i, (us-v_starttime)/1000000, @count(v_time2[e,i])/5, @avg(v_time2[e,i]))
}
delete v_time1
delete v_time2
}
使用pgbench执行压力测试, 将输出时间间隔改为5秒, 输出如下.
pg94@db-172-16-3-39-> pgbench -M prepared -n -r -f ./test.sql -P 5 -c 8 -j 4 -T 60
progress: 5.0 s, 33871.7 tps, 0.236 ms lat
progress: 10.0 s, 38076.4 tps, 0.210 ms lat
progress: 15.0 s, 38758.3 tps, 0.206 ms lat
progress: 20.0 s, 39018.6 tps, 0.205 ms lat
progress: 25.0 s, 38915.2 tps, 0.206 ms lat
progress: 30.0 s, 32252.4 tps, 0.248 ms lat
progress: 35.0 s, 32478.8 tps, 0.246 ms lat
progress: 40.0 s, 37543.3 tps, 0.213 ms lat
progress: 45.0 s, 40086.2 tps, 0.200 ms lat
progress: 50.0 s, 37702.3 tps, 0.212 ms lat
progress: 55.0 s, 37713.4 tps, 0.212 ms lat
progress: 60.0 s, 37763.7 tps, 0.212 ms lat
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 8
number of threads: 4
duration: 60 s
number of transactions actually processed: 2220929
tps = 37015.020029 (including connections establishing)
tps = 37019.755876 (excluding connections establishing)
statement latencies in milliseconds:
0.002056 \setrandom id 1 5000000
0.212123 select * from f_id_add(:id);
systemtap输出, 去除开始5秒和末尾5秒, 因为stap启动时间和pgbench启动时间有一定差异 :
可以看到每个连接处理的tps, 以及事务时长. 注意事务时长pgbench时间单位为ms, systemtap这里用了us.
每个连接的tps相加和pgbench输出的tps基本一致.
[root@db-172-16-3-39 ~]# stap test.stp
process:postgres, tid:26808, sec:5, tps:0, lat:0
process:postgres, tid:26812, sec:5, tps:3321, lat:220
process:postgres, tid:26813, sec:5, tps:3292, lat:222
process:postgres, tid:26814, sec:5, tps:3200, lat:228
process:postgres, tid:26815, sec:5, tps:2778, lat:263
process:postgres, tid:26816, sec:5, tps:2922, lat:249
process:postgres, tid:26817, sec:5, tps:3097, lat:235
process:postgres, tid:26818, sec:5, tps:2694, lat:270
process:postgres, tid:26819, sec:5, tps:2743, lat:266
process:postgres, tid:26812, sec:10, tps:5114, lat:195
process:postgres, tid:26813, sec:10, tps:4923, lat:203
process:postgres, tid:26814, sec:10, tps:4534, lat:220
process:postgres, tid:26815, sec:10, tps:4608, lat:216
process:postgres, tid:26816, sec:10, tps:4866, lat:205
process:postgres, tid:26817, sec:10, tps:4656, lat:214
process:postgres, tid:26818, sec:10, tps:4272, lat:234
process:postgres, tid:26819, sec:10, tps:4505, lat:221
process:postgres, tid:26812, sec:15, tps:5383, lat:185
process:postgres, tid:26813, sec:15, tps:4985, lat:200
process:postgres, tid:26814, sec:15, tps:4710, lat:212
process:postgres, tid:26815, sec:15, tps:4695, lat:212
process:postgres, tid:26816, sec:15, tps:4947, lat:202
process:postgres, tid:26817, sec:15, tps:4798, lat:208
process:postgres, tid:26818, sec:15, tps:4786, lat:208
process:postgres, tid:26819, sec:15, tps:4391, lat:227
process:postgres, tid:26812, sec:20, tps:5288, lat:189
process:postgres, tid:26813, sec:20, tps:5046, lat:198
process:postgres, tid:26814, sec:20, tps:4942, lat:202
process:postgres, tid:26815, sec:20, tps:4735, lat:211
process:postgres, tid:26816, sec:20, tps:4554, lat:219
process:postgres, tid:26817, sec:20, tps:4688, lat:213
process:postgres, tid:26818, sec:20, tps:4626, lat:216
process:postgres, tid:26819, sec:20, tps:5141, lat:194
process:postgres, tid:26812, sec:25, tps:5308, lat:188
process:postgres, tid:26813, sec:25, tps:4841, lat:206
process:postgres, tid:26814, sec:25, tps:4868, lat:205
process:postgres, tid:26815, sec:25, tps:5103, lat:195
process:postgres, tid:26816, sec:25, tps:4672, lat:213
process:postgres, tid:26817, sec:25, tps:4861, lat:205
process:postgres, tid:26818, sec:25, tps:4692, lat:213
process:postgres, tid:26819, sec:25, tps:4730, lat:211
process:postgres, tid:26812, sec:30, tps:4617, lat:216
process:postgres, tid:26813, sec:30, tps:4237, lat:235
process:postgres, tid:26814, sec:30, tps:4200, lat:238
process:postgres, tid:26815, sec:30, tps:3744, lat:267
process:postgres, tid:26816, sec:30, tps:3949, lat:253
process:postgres, tid:26817, sec:30, tps:4318, lat:231
process:postgres, tid:26818, sec:30, tps:4048, lat:246
process:postgres, tid:26819, sec:30, tps:4105, lat:243
process:postgres, tid:26812, sec:35, tps:4226, lat:236
process:postgres, tid:26813, sec:35, tps:3916, lat:255
process:postgres, tid:26814, sec:35, tps:3907, lat:255
process:postgres, tid:26815, sec:35, tps:3703, lat:269
process:postgres, tid:26816, sec:35, tps:3750, lat:266
process:postgres, tid:26817, sec:35, tps:4178, lat:239
process:postgres, tid:26818, sec:35, tps:3868, lat:258
process:postgres, tid:26819, sec:35, tps:4185, lat:238
process:postgres, tid:26812, sec:40, tps:4683, lat:213
process:postgres, tid:26813, sec:40, tps:4759, lat:210
process:postgres, tid:26814, sec:40, tps:4756, lat:210
process:postgres, tid:26815, sec:40, tps:4661, lat:214
process:postgres, tid:26816, sec:40, tps:4593, lat:217
process:postgres, tid:26817, sec:40, tps:4719, lat:211
process:postgres, tid:26818, sec:40, tps:4602, lat:217
process:postgres, tid:26819, sec:40, tps:4728, lat:211
process:postgres, tid:7324, sec:45, tps:0, lat:0
process:postgres, tid:26812, sec:45, tps:4948, lat:202
process:postgres, tid:26813, sec:45, tps:5003, lat:199
process:postgres, tid:26814, sec:45, tps:5064, lat:197
process:postgres, tid:26815, sec:45, tps:5113, lat:195
process:postgres, tid:26816, sec:45, tps:5036, lat:198
process:postgres, tid:26817, sec:45, tps:4978, lat:200
process:postgres, tid:26818, sec:45, tps:4910, lat:203
process:postgres, tid:26819, sec:45, tps:5043, lat:198
process:postgres, tid:26834, sec:45, tps:0, lat:10867
process:postgres, tid:26812, sec:50, tps:4712, lat:212
process:postgres, tid:26813, sec:50, tps:4802, lat:208
process:postgres, tid:26814, sec:50, tps:4761, lat:209
process:postgres, tid:26815, sec:50, tps:4668, lat:214
process:postgres, tid:26816, sec:50, tps:4768, lat:209
process:postgres, tid:26817, sec:50, tps:4696, lat:212
process:postgres, tid:26818, sec:50, tps:4739, lat:210
process:postgres, tid:26819, sec:50, tps:4646, lat:215
process:postgres, tid:26834, sec:50, tps:0, lat:0
process:postgres, tid:26812, sec:55, tps:4630, lat:215
process:postgres, tid:26813, sec:55, tps:4753, lat:210
process:postgres, tid:26814, sec:55, tps:4908, lat:203
process:postgres, tid:26815, sec:55, tps:4612, lat:216
process:postgres, tid:26816, sec:55, tps:4585, lat:218
process:postgres, tid:26817, sec:55, tps:4610, lat:216
process:postgres, tid:26818, sec:55, tps:4866, lat:205
process:postgres, tid:26819, sec:55, tps:4641, lat:215
process:postgres, tid:26812, sec:60, tps:4647, lat:215
process:postgres, tid:26813, sec:60, tps:4786, lat:208
process:postgres, tid:26814, sec:60, tps:4756, lat:210
process:postgres, tid:26815, sec:60, tps:4755, lat:210
process:postgres, tid:26816, sec:60, tps:4704, lat:212
process:postgres, tid:26817, sec:60, tps:4786, lat:208
process:postgres, tid:26818, sec:60, tps:4622, lat:216
process:postgres, tid:26819, sec:60, tps:4657, lat:214
process:postgres, tid:26812, sec:65, tps:1309, lat:204
process:postgres, tid:26813, sec:65, tps:1220, lat:219
process:postgres, tid:26814, sec:65, tps:1280, lat:209
process:postgres, tid:26815, sec:65, tps:1253, lat:214
process:postgres, tid:26816, sec:65, tps:1299, lat:206
process:postgres, tid:26817, sec:65, tps:1271, lat:211
process:postgres, tid:26818, sec:65, tps:1283, lat:209
process:postgres, tid:26819, sec:65, tps:1256, lat:213
另外stap跟踪的是postgres进程, 所以checkpoint, vacuum等进程都会被跟踪到, 例如本例输出的
process:postgres, tid:26834, sec:50, tps:0, lat:0
这个对应的就是auto vacuum 进程 :
2013-09-03 15:58:32.677 CST,,,26834,,522596a4.68d2,1,,2013-09-03 15:58:28 CST,10/3089,141847803,LOG,00000,"automatic analyze of table ""digoal.public.test"" system usage: CPU 0.13s/0.40u sec elapsed 3.97 sec",,,,,,,,"do_analyze_rel, analyze.c:655",""
systemtap还可以做柱状图统计, 例如统计输出pgbench提交的事务耗时的分布图 :
输出时间范围0-1000us, 间隔50us的耗时柱状图.
[root@db-172-16-3-39 ~]# cat test.stp
global v_starttime, v_time1, v_time2
probe begin {
v_starttime = gettimeofday_us()
}
probe process("/home/pg94/pgsql9.4devel/bin/postgres").mark("transaction__start") {
us = gettimeofday_us()
if (! v_time1[tid()])
v_time1[tid()] = us
v_time2[execname(),tid()] <<< (us-v_time1[tid()])
v_time1[tid()] = us
}
probe timer.s($1) {
us = gettimeofday_us()
foreach ([e,i+] in v_time2) {
printf("process:%s, tid:%d, sec:%d, tps:%d, lat:%d\n", e, i, (us-v_starttime)/1000000, @count(v_time2[e,i])/$1, @avg(v_time2[e,i]))
print(@hist_linear(v_time2[e,i], 0, 1000, 50))
}
delete v_time1
delete v_time2
exit()
}
开启pgbench, 进行压力测试.
pg94@db-172-16-3-39-> pgbench -M prepared -n -r -f ./test.sql -c 8 -j 4 -T 1000
然后开启systemtap.
例如我要统计5秒的柱状图.
[root@db-172-16-3-39 ~]# stap test.stp 5
process:postgres, tid:27912, sec:5, tps:4798, lat:208
value |-------------------------------------------------- count
0 | 1
50 | 0
100 |@@@@@@@@ 2384
150 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 13313
200 |@@@@@@@@@@@@@@@@@@@@@ 5746
250 |@@@@@ 1342
300 |@ 522
350 | 179
400 | 82
450 | 50
500 | 33
550 | 36
600 | 17
650 | 16
700 | 22
750 | 24
800 | 27
850 | 21
900 | 18
950 | 17
1000 | 13
>1000 | 128
process:postgres, tid:27913, sec:5, tps:5473, lat:182
value |-------------------------------------------------- count
0 | 1
50 | 0
100 |@@@@@@@@@@@@@@@@@@@ 6281
150 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 16498
200 |@@@@@@@@@@ 3387
250 |@ 552
300 | 302
350 | 69
400 | 41
450 | 25
500 | 16
550 | 16
600 | 25
650 | 11
700 | 17
750 | 12
800 | 12
850 | 10
900 | 14
950 | 13
1000 | 14
>1000 | 53
process:postgres, tid:27914, sec:5, tps:4626, lat:216
value |-------------------------------------------------- count
0 | 1
50 | 0
100 |@@@@@@ 1594
150 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12477
200 |@@@@@@@@@@@@@@@@@@@@@@@@ 6247
250 |@@@@@@ 1683
300 |@@ 542
350 | 164
400 | 73
450 | 32
500 | 19
550 | 15
600 | 11
650 | 16
700 | 16
750 | 10
800 | 11
850 | 9
900 | 16
950 | 17
1000 | 13
>1000 | 168
process:postgres, tid:27915, sec:5, tps:5442, lat:183
value |-------------------------------------------------- count
0 | 1
50 | 0
100 |@@@@@@@@@@@@@ 4877
150 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 17806
200 |@@@@@@@@ 3072
250 |@@ 759
300 | 340
350 | 90
400 | 40
450 | 20
500 | 14
550 | 15
600 | 24
650 | 14
700 | 15
750 | 12
800 | 17
850 | 14
900 | 17
950 | 11
1000 | 9
>1000 | 45
process:postgres, tid:27916, sec:5, tps:5435, lat:183
value |-------------------------------------------------- count
0 | 1
50 | 0
100 |@@@@@@@@@@@@@@@@@ 5777
150 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 16422
200 |@@@@@@@@@@ 3578
250 |@@ 687
300 |@ 348
350 | 103
400 | 50
450 | 20
500 | 15
550 | 12
600 | 16
650 | 19
700 | 10
750 | 11
800 | 11
850 | 12
900 | 13
950 | 11
1000 | 9
>1000 | 51
process:postgres, tid:27917, sec:5, tps:4092, lat:244
value |-------------------------------------------------- count
0 | 1
50 | 0
100 |@@@@@@@ 1193
150 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6921
200 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 7957
250 |@@@@@@@@@@@@@@@@@@@ 3088
300 |@@@ 493
350 | 123
400 | 57
450 | 33
500 | 29
550 | 28
600 | 15
650 | 26
700 | 30
750 | 16
800 | 40
850 | 38
900 | 13
950 | 20
1000 | 26
>1000 |@ 313
process:postgres, tid:27918, sec:5, tps:4475, lat:223
value |-------------------------------------------------- count
0 | 1
50 | 0
100 |@@@@@@@@@@@@@ 2712
150 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 9720
200 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 7403
250 |@@@@@@@ 1434
300 |@@ 391
350 | 106
400 | 64
450 | 30
500 | 23
550 | 23
600 | 20
650 | 46
700 | 27
750 | 18
800 | 23
850 | 27
900 | 17
950 | 21
1000 | 20
>1000 |@ 250
process:postgres, tid:27919, sec:5, tps:5470, lat:182
value |-------------------------------------------------- count
0 | 1
50 | 0
100 |@@@@@@@@@@@@@ 4936
150 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 18122
200 |@@@@@@@@ 2939
250 |@ 682
300 | 355
350 | 91
400 | 34
450 | 19
500 | 11
550 | 17
600 | 18
650 | 14
700 | 15
750 | 7
800 | 11
850 | 6
900 | 12
950 | 8
1000 | 8
>1000 | 48
process:postgres, tid:27977, sec:5, tps:0, lat:0
value |-------------------------------------------------- count
0 |@ 1
50 | 0
100 | 0
查看总的柱状图, 不分连接.
[root@db-172-16-3-39 ~]# cat test.stp
global v_starttime, v_time1, v_time2
probe begin {
v_starttime = gettimeofday_us()
}
probe process("/home/pg94/pgsql9.4devel/bin/postgres").mark("transaction__start") {
us = gettimeofday_us()
if (! v_time1[tid()])
v_time1[tid()] = us
v_time2 <<< (us-v_time1[tid()])
v_time1[tid()] = us
}
probe timer.s($1) {
us = gettimeofday_us()
printf("sec:%d, tps:%d, lat:%d\n", (us-v_starttime)/1000000, @count(v_time2)/$1, @avg(v_time2))
print(@hist_linear(v_time2, 0, 1000, 50))
delete v_time1
delete v_time2
exit()
}
输出
[root@db-172-16-3-39 ~]# stap test.stp 5
sec:5, tps:37929, lat:210
value |-------------------------------------------------- count
0 | 8
50 | 0
100 | 2172
150 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 112183
200 |@@@@@@@@@@@@@@@@@@@@@@ 49877
250 |@@@@ 9449
300 |@@@@ 10530
350 |@ 3331
400 | 584
450 | 570
500 | 264
550 | 98
600 | 89
650 | 111
700 | 50
750 | 44
800 | 43
850 | 30
900 | 29
950 | 46
1000 | 27
>1000 | 113
从图中可以看出耗时基本上集中在150-250之间.
参考
1. http://blog.163.com/digoal@126/blog/static/163877040201062810748700/
2. http://blog.163.com/digoal@126/blog/static/16387704020137140265557/
3. http://blog.163.com/digoal@126/blog/static/16387704020138392759478/
4. http://blog.163.com/digoal@126/blog/static/16387704020138310438924/
5. https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/SystemTap_Tapset_Reference/index.html
6. http://blog.163.com/digoal@126/blog/static/1638770402013772212424/