Systemtap BUG? : stap -R no effect

4 minute read

背景

在查看stap man手册时, 对手册上所述的stap处理流程的第一部分验证, 发现描述和测试结果不符有1处.  
1. 手册上所述stap -R 指定内核版本可以让stap根据指定的内核版本搜索-I 指定目录的相关子目录, 发现这个完全不符.  
手册上的描述如下 :   
PROCESSING  
The translator begins pass 1 by parsing the given input script, and all scripts (files named *.stp) found in a tapset directory.   
The directories listed with -I are processed in sequence, each processed in "guru mode".   
"guru mode" : Enable parsing of unsafe expert-level constructs like embedded C.  
For each directory, a number of subdirectories are also searched.   
These subdirectories are derived from the selected kernel version (the -R option), in order to allow more kernel-version-specific scripts to override less specific ones.   
For example, for a kernel version 2.6.12-23.FC3 the following patterns would be searched, in sequence:   
  2.6.12-23.FC3/*.stp,   
  2.6.12/*.stp,   
  2.6/*.stp,   
and finally *.stp Stopping the translator after pass 1 causes it to print the parse trees.  
测试方法很简单, 自建alias探针, 看看能否匹配到自建的探针.  
1. 查看当前的内核版本号  
[root@db-172-16-3-150 tmp]# uname -r  
2.6.32-358.el6.x86_64  
2. 创建自定义库文件以及自定义探针.  
[root@db-172-16-3-150 tmp]# cd /tmp  
[root@db-172-16-3-150 tmp]# mkdir 2.6.31  
[root@db-172-16-3-150 tmp]# cd 2.6.31  
[root@db-172-16-3-150 2.6.31]# vi test.stp  
probe p1 = syscall.read {  
  var1 = "hello"  
}  
3. 使用刚才的探针, 但是我们只指定/tmp目录, 同时指定-R参数, 验证第1条不符的情况  
[root@db-172-16-3-150 tmp]# cd  
[root@db-172-16-3-150 ~]# stap --vp 50000 -I /tmp -R 2.6.31 -e 'probe p1 {println(var1); exit()}'  
Parsed kernel "/lib/modules/2.6.32-358.el6.x86_64/build/.config", containing 3166 tuples  
Parsed kernel /lib/modules/2.6.32-358.el6.x86_64/build/Module.symvers, which contained 5541 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 10, processed: 10  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 82, processed: 82  
Pass 1: parsed user script and 92 library script(s) using 203488virt/31760res/3104shr/29424data kb, in 300usr/10sys/307real ms.  
semantic error: while resolving probe point: identifier 'p1' at <input>:1:7  
        source: probe p1 {println(var1); exit()}  
                      ^  
  
semantic error: probe point mismatch at position 0  (alternatives: __nfs __scheduler __signal __tcpmib __vm _linuxmib _signal _sunrpc _syscall _vfs begin begin(number) end end(number) error error(number) generic hotspot ioblock ioblock_trace ioscheduler ioscheduler_trace ipmib irq_handler kernel kprobe kprocess linuxmib module(string) nd_syscall netdev netfilter never nfs nfsd perf process process(number) process(string) procfs procfs(string) python scheduler scsi signal socket softirq stap staprun sunrpc syscall tcp tcpmib timer tty udp vfs vm workqueue): identifier 'p1' at :1:7  
        source: probe p1 {println(var1); exit()}  
                      ^  
  
Pass 2: analysis failed.  Try again with another '--vp 01' option.  
4. 如果把目录名改为2.6.32(uname -r输出的内核版本的母项) 可以匹配到这个子目录中的test.stp, 输出正常.  
[root@db-172-16-3-150 tmp]# cd /tmp  
[root@db-172-16-3-150 tmp]# mv 2.6.31 2.6.32  
[root@db-172-16-3-150 tmp]# cd  
[root@db-172-16-3-150 ~]# stap --vp 50000 -I /tmp -e 'probe p1 {println(var1); exit()}'  
Parsed kernel "/lib/modules/2.6.32-358.el6.x86_64/build/.config", containing 3166 tuples  
Parsed kernel /lib/modules/2.6.32-358.el6.x86_64/build/Module.symvers, which contained 5541 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 10, processed: 10  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 82, processed: 82  
Searched: " /tmp/2.6.32/*.stp ", found: 1, processed: 1  
Pass 1: parsed user script and 93 library script(s) using 203484virt/31760res/3104shr/29420data kb, in 290usr/20sys/307real ms.  
hello  
5. 验证当前目录以及子目录的匹配顺序.  
[root@db-172-16-3-150 tmp]# cd /tmp  
[root@db-172-16-3-150 tmp]# mkdir 2.6.32-358.el6.x86_64  
[root@db-172-16-3-150 tmp]# mkdir 2.6  
[root@db-172-16-3-150 tmp]# cp 2.6.32/test.stp 2.6/  
[root@db-172-16-3-150 tmp]# cp 2.6.32/test.stp 2.6.32-358.el6.x86_64/  
[root@db-172-16-3-150 tmp]# cp 2.6.32/test.stp ./  
[root@db-172-16-3-150 tmp]# cat test.stp  
probe p1 = syscall.read {  
  var1 = "hello"  
}  
一共在4个stp文件中定义了probe p1,   
/tmp/test.stp  
/tmp/2.6.32/test.stp  
/tmp/2.6.32-358.el6.x86_64/test.stp  
/tmp/2.6/test.stp  
最终会匹配哪个呢?  
[root@db-172-16-3-150 tmp]# cd  
[root@db-172-16-3-150 ~]# stap --vp 50000 -I /tmp -e 'probe p1 {println(var1); exit()}'  
Parsed kernel "/lib/modules/2.6.32-358.el6.x86_64/build/.config", containing 3166 tuples  
Parsed kernel /lib/modules/2.6.32-358.el6.x86_64/build/Module.symvers, which contained 5541 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 10, processed: 10  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 82, processed: 82  
Searched: " /tmp/2.6.32-358.el6.x86_64/*.stp ", found: 1, processed: 1  
Searched: " /tmp/2.6.32/*.stp ", found: 1, processed: 1  
Searched: " /tmp/2.6/*.stp ", found: 1, processed: 1  
Searched: " /tmp/*.stp ", found: 1, processed: 1  
Pass 1: parsed user script and 96 library script(s) using 203484virt/31776res/3104shr/29420data kb, in 290usr/20sys/309real ms.  
hello  
从以上输出的检索顺序来看, 应该使用的是/tmp/2.6.32-358.el6.x86_64/*.stp, 下面修改/tmp/2.6.32-358.el6.x86_64/test.stp看看是不是这样的.  
[root@db-172-16-3-150 ~]# vi /tmp/2.6.32-358.el6.x86_64/test.stp   
probe p1 = syscall.read {  
  var1 = "HELLO"  
}  
输出 :   
[root@db-172-16-3-150 ~]# stap --vp 50000 -I /tmp -e 'probe p1 {println(var1); exit()}'  
Parsed kernel "/lib/modules/2.6.32-358.el6.x86_64/build/.config", containing 3166 tuples  
Parsed kernel /lib/modules/2.6.32-358.el6.x86_64/build/Module.symvers, which contained 5541 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 10, processed: 10  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 82, processed: 82  
Searched: " /tmp/2.6.32-358.el6.x86_64/*.stp ", found: 1, processed: 1  
Searched: " /tmp/2.6.32/*.stp ", found: 1, processed: 1  
Searched: " /tmp/2.6/*.stp ", found: 1, processed: 1  
Searched: " /tmp/*.stp ", found: 1, processed: 1  
Pass 1: parsed user script and 96 library script(s) using 203480virt/31772res/3104shr/29416data kb, in 300usr/10sys/308real ms.  
hello  
结果不是这样, 还是输出小写的hello.  
原因 :   
对于重名的probe alias, 最后匹配的那个alias将覆盖前面的alias定义, 所以这里匹配的probe p1是/tmp/test.stp  
[root@db-172-16-3-150 ~]# vi /tmp/test.stp   
probe p1 = syscall.read {  
  var1 = "Hello"  
}  
输出 :   
[root@db-172-16-3-150 ~]# stap --vp 50000 -I /tmp -e 'probe p1 {println(var1); exit()}'  
Parsed kernel "/lib/modules/2.6.32-358.el6.x86_64/build/.config", containing 3166 tuples  
Parsed kernel /lib/modules/2.6.32-358.el6.x86_64/build/Module.symvers, which contained 5541 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 10, processed: 10  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 82, processed: 82  
Searched: " /tmp/2.6.32-358.el6.x86_64/*.stp ", found: 1, processed: 1  
Searched: " /tmp/2.6.32/*.stp ", found: 1, processed: 1  
Searched: " /tmp/2.6/*.stp ", found: 1, processed: 1  
Searched: " /tmp/*.stp ", found: 1, processed: 1  
Pass 1: parsed user script and 96 library script(s) using 203484virt/31776res/3104shr/29420data kb, in 290usr/20sys/309real ms.  
Hello  

参考

1. http://blog.163.com/digoal@126/blog/static/163877040201391391613269/

2. https://sourceware.org/systemtap/man/stap.1.html

3. man stap

       --vp ABCDE  
              Increase  verbosity  on  a  per-pass basis.  For example, "--vp 002" adds 2 units of verbosity to pass 3  
              only.  The combination "-v --vp 00004" adds 1 unit of verbosity for all passes, and 4 more for pass 5.  

ABCDE分别代表stap的5步, 每步可以指定对应的输出verbose级别, 例如–vp 04000表示 pass2的verbose级别为4, 其他pass的verbose级别为0.

Flag Counter

digoal’s 大量PostgreSQL文章入口