USE blockdev –setra 0 and systemtap test real BLOCKDEV iops
背景
我在上一篇文章中讨论了关于调整CPU亲和来降低systemtap带来的性能影响, 本文是一个扩展, 还是用到那个例子, 但是我增加了IO响应时间的柱状图输出, 从柱状图分析IO的一些”奇妙”的东西.
首先启动数据库, 亲和设置为1
pg93@db-172-16-3-150-> taskset -c 1 /home/pg93/pgsql9.3.1/bin/postgres >/dev/null 2>&1
然后清除OS CACHE
[root@db-172-16-3-150 ~]# sync; echo 3 > /proc/sys/vm/drop_caches
然后启动psql
pg93@db-172-16-3-150-> psql
psql (9.3.1)
Type "help" for help.
digoal=# select pg_backend_pid();
pg_backend_pid
----------------
5167
(1 row)
然后启动stap
[root@db-172-16-3-150 ~]# taskset -c 7 stap -e '
global a
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__start") {
delete a
println("query__start ", user_string($arg1), "pid:", pid())
}
probe vfs.read.return {
t = gettimeofday_ns() - @entry(gettimeofday_ns())
# if (execname() == "postgres" && devname != "N/A")
a[pid()] <<< t
}
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__done") {
if (@count(a[pid()]))
printdln("**", pid(), @count(a[pid()]), @avg(a[pid()]))
println("query__done ", user_string($arg1), "pid:", pid())
if (@count(a[pid()])) {
println(@hist_log(a[pid()]))
#println(@hist_linear(a[pid()],1024,4096,100))
}
delete a
}' -x 5167
然后执行一条查询
digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
----
Seq Scan on postgres.tbl_cost_align (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.683..3252.479 rows=10100000 loops
=1)
Output: id, info, crt_time
Buffers: shared read=94393
Total runtime: 4325.405 ms
(4 rows)
stap的输出如下
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:5167
5167**94418**14464
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:5167
value |-------------------------------------------------- count
1024 | 0
2048 | 0
4096 | 109
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 86181
16384 |@ 2045
32768 | 90
65536 |@@@ 5903
131072 | 72
262144 | 7
524288 | 9
1048576 | 1
2097152 | 1
4194304 | 0
8388608 | 0
平均的IO响应时间为14464纳秒.
但是从IO响应时间分布图, 我们看到了比较奇妙的现象.
响应时间在8192纳秒以内的操作数有86181次, 占了所有IO的绝大部分.
而另一较大部分32768 到65536纳秒之间的操作数是5903个.
这是为什么呢? 原因是块设备的预读.
例如本例的tbl_cost_align表对应的磁盘为/dev/sdb
digoal=# \db
List of tablespaces
Name | Owner | Location
------------+----------+-----------------------
pg_default | postgres |
pg_global | postgres |
tbs_digoal | postgres | /ssd3/pg93/tbs_digoal
tbs_idx | postgres | /ssd4/pg93/tbs_idx
(4 rows)
digoal=# \d tbl_cost_align
Table "postgres.tbl_cost_align"
Column | Type | Modifiers
----------+-----------------------------+-----------
id | integer |
info | text |
crt_time | timestamp without time zone |
Indexes:
"idx_tbl_cost_align_id" btree (id), tablespace "tbs_idx"
Tablespace: "tbs_idx"
digoal=# \q
pg93@db-172-16-3-150-> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sdc1 29G 7.6G 20G 28% /
tmpfs 48G 0 48G 0% /dev/shm
/dev/sdc3 98G 27G 67G 29% /opt
/dev/sdd1 183G 18G 157G 10% /ssd1
/dev/sde1 469G 1.7G 444G 1% /ssd2
/dev/sda1 221G 12G 198G 6% /ssd3
/dev/sdb1 221G 26G 184G 13% /ssd4
sdb的预读是多少呢?
[root@db-172-16-3-150 ~]# blockdev --getra /dev/sdb
256
表的大小
digoal=# \dt+ tbl_cost_align
List of relations
Schema | Name | Type | Owner | Size | Description
----------+----------------+-------+----------+--------+-------------
postgres | tbl_cost_align | table | postgres | 738 MB |
(1 row)
在柱状图中的5903个操作, 如果算上预读的话刚好和表的大小差不多.
digoal=# select 5903*256*512/1024/1024.0;
?column?
----------------------
737.8750000000000000
(1 row)
柱状图反映的是一个真实的IO获取的响应时间的现象, 同时也体现了块设备readahead在这里所的作用.
接下来我把read ahead设置为0 , 那么每次响应就变得很慢了. 来看结果.
[root@db-172-16-3-150 ~]# blockdev --setra 0 /dev/sdb
[root@db-172-16-3-150 ~]# sync; echo 3 > /proc/sys/vm/drop_caches
pg93@db-172-16-3-150-> taskset -c 1 /home/pg93/pgsql9.3.1/bin/postgres >/dev/null 2>&1
pg93@db-172-16-3-150-> psql
psql (9.3.1)
Type "help" for help.
digoal=# select pg_backend_pid();
pg_backend_pid
----------------
5330
(1 row)
[root@db-172-16-3-150 ~]# taskset -c 7 stap -e '
global a
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__start") {
delete a
println("query__start ", user_string($arg1), "pid:", pid())
}
probe vfs.read.return {
t = gettimeofday_ns() - @entry(gettimeofday_ns())
# if (execname() == "postgres" && devname != "N/A")
a[pid()] <<< t
}
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__done") {
if (@count(a[pid()]))
printdln("**", pid(), @count(a[pid()]), @avg(a[pid()]))
println("query__done ", user_string($arg1), "pid:", pid())
if (@count(a[pid()])) {
println(@hist_log(a[pid()]))
#println(@hist_linear(a[pid()],1024,4096,100))
}
delete a
}' -x 5330
digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
-----
Seq Scan on postgres.tbl_cost_align (cost=0.00..195393.00 rows=10100000 width=45) (actual time=1.115..28424.672 rows=10100000 loop
s=1)
Output: id, info, crt_time
Buffers: shared read=94393
Total runtime: 29471.988 ms
(4 rows)
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:5330
5330**94418**277500
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:5330
value |-------------------------------------------------- count
2048 | 0
4096 | 0
8192 | 7
16384 | 0
32768 | 1
65536 | 0
131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 62765
262144 |@@@@@@@@@@@@@@@@@@@@@@ 28131
524288 |@@ 3509
1048576 | 2
2097152 | 3
4194304 | 0
8388608 | 0
从结果看到, 平均IO响应时间变成277500纳秒了, 换算成IOPS只有3603.
参考
1. man blockdev
--setra N
Set readahead to N 512-byte sectors.
--getra
Print readahead (in 512-byte sectors).
2. 《设置进程亲和 - numactl 或 taskset - retrieve or set a process’s CPU affinity (affect SYSTEMTAP TIME)》