find systemtap pre-built probe points & probe points reference manual

8 minute read

背景

systemtap内建了大量的probe event. 以及alias.

一般放在/usr/share/systemtap/tapset/这个目录中, 以stp后缀结尾的文件中.

使用stap –vp 5输出详细信息, 可以看到stap搜索的目录.

[root@db-172-16-3-39 ~]# stap --vp 5 -e "probe begin { exit() }"  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 147264virt/23696res/3012shr/21860data kb, in 160usr/10sys/171real ms.  

如果自己定义了一些probe alias放在.stp文件中, 那么可以通过-I来添加存放自定义stp文件的目录.

man stap  
       -I DIR Add the given directory to the tapset search directory.  See the description of pass 2 for details.  

例如 :

[root@db-172-16-3-39 ~]# pwd  
/root  
[root@db-172-16-3-39 ~]# ll *.stp  
-rw-r--r-- 1 root root 107 Sep  6 16:50 p.stp  
-rw-r--r-- 1 root root 320 Sep 10 15:13 test.stp  
[root@db-172-16-3-39 ~]# stap -I /root --vp 5 -e "probe begin { exit() }"  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
parse error: command line argument index 1 out of range [1-0]  
        at: unknown token '$1' at /root/test.stp:6:17  
     source:     for (i=0; i<$1; i++) {  
                             ^  
1 parse error.  
WARNING: tapset '/root/test.stp' has errors, and will be skipped.  

在指定的目录中发现两个.stp文件, 处理了1个.

Searched: " /root/*.stp ", found: 2, processed: 1  
Pass 1: parsed user script and 86 library script(s) using 146916virt/23728res/3024shr/21512data kb, in 170usr/10sys/174real ms.  

一.

查找内建的probe event或者别名, 可以直接去这几个检索的目录中grep, 也可以使用stap命令行参数进行匹配检索.

1. 直接去预装的stp脚本目录中检索举例 :

[root@db-172-16-3-39 ~]# grep -r "^probe " /usr/share/systemtap/tapset/* | less  
/usr/share/systemtap/tapset/argv.stp:probe begin(-1) {  
/usr/share/systemtap/tapset/i386/nd_syscalls.stp:probe nd_syscall.get_thread_area = kprobe.function("sys_get_thread_area")  
/usr/share/systemtap/tapset/i386/nd_syscalls.stp:probe nd_syscall.get_thread_area.return = kprobe.function("sys_get_thread_area").return  
/usr/share/systemtap/tapset/i386/nd_syscalls.stp:probe nd_syscall.iopl = kprobe.function("sys_iopl")  
/usr/share/systemtap/tapset/i386/nd_syscalls.stp:probe nd_syscall.iopl.return = kprobe.function("sys_iopl").return  
  
... 略  

使用这种方法检索到的基本上都是probe alias, 方便我们使用, 其实这些别名都是kernel.funciton或者kprobe.funciton之类的event.

例如syscall :

[root@db-172-16-3-39 ~]# grep -r "^probe " /usr/share/systemtap/tapset/* | grep "probe syscall" | less  
/usr/share/systemtap/tapset/i386/syscalls.stp:probe syscall.get_thread_area = kernel.function("sys_get_thread_area")  
/usr/share/systemtap/tapset/i386/syscalls.stp:probe syscall.get_thread_area.return = kernel.function("sys_get_thread_area").return  
/usr/share/systemtap/tapset/i386/syscalls.stp:probe syscall.iopl = kernel.function("sys_iopl")  
/usr/share/systemtap/tapset/i386/syscalls.stp:probe syscall.iopl.return = kernel.function("sys_iopl").return  
... 略  

2. 使用stap -l 参数检索举例 :

[root@db-172-16-3-39 ~]# stap --vp 5 -l 'syscall.*'|less  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146604virt/23692res/3008shr/21200data kb, in 160usr/10sys/172real ms.  
syscall.accept  
syscall.access  
syscall.acct  
syscall.add_key  
syscall.adjtimex  
syscall.alarm  
syscall.arch_prctl  
... 略  

使用stap支持通配符查找.

[root@db-172-16-3-39 ~]# stap -l 'syscall.*.return'|less  
syscall.accept.return  
syscall.access.return  
syscall.acct.return  
syscall.add_key.return  
syscall.adjtimex.return  
syscall.alarm.return  
syscall.arch_prctl.return  
syscall.bdflush.return  
... 略  

查找kernel.function(“*”) , :

[root@db-172-16-3-39 ~]# stap --vp 5 -l 'kernel.function("*")'|less  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146604virt/23688res/3008shr/21200data kb, in 170usr/0sys/173real ms.  
kernel.function("BDEV_I@fs/block_dev.c:32")  
kernel.function("DQUOT_ALLOC_INODE@include/linux/quotaops.h:168")  
kernel.function("DQUOT_ALLOC_SPACE@include/linux/quotaops.h:148")  
kernel.function("DQUOT_ALLOC_SPACE_NODIRTY@include/linux/quotaops.h:126")  
kernel.function("DQUOT_DROP@include/linux/quotaops.h:83")  
kernel.function("DQUOT_FREE_INODE@include/linux/quotaops.h:219")  
kernel.function("DQUOT_FREE_SPACE@include/linux/quotaops.h:213")  
... 略  
kernel.function("*")有18124个, 而tapset目录中只有2千多个alias.   

所以需要更精细的事件, 还是去kernel.funciton中找吧. 当然2千多个alias日常已经够用了.

[root@db-172-16-3-39 ~]# stap --vp 5 -l 'kernel.function("*")'|wc -l  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146604virt/23688res/3008shr/21200data kb, in 170usr/0sys/172real ms.  
18124  
[root@db-172-16-3-39 ~]# grep -r "^probe " /usr/share/systemtap/tapset/|wc -l  
2230  
常用的查询条件, '*', '*.*' '**'  
[root@db-172-16-3-39 ~]# stap --vp 5 -l '*'  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146608virt/23696res/3008shr/21204data kb, in 160usr/10sys/172real ms.  
begin  
end  
error  
never  
  
[root@db-172-16-3-39 ~]# stap --vp 5 -l '*.*'| less  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146604virt/23700res/3008shr/21200data kb, in 160usr/10sys/173real ms.  
_syscall.accept  
_vfs.block_prepare_write  
_vfs.block_write_begin  
_vfs.block_write_end  
_vfs.generic_block_bmap  
_vfs.generic_commit_write  
_vfs.generic_file_readonly_mmap  
... 略  
  
[root@db-172-16-3-39 tapset]# stap --vp 5 -l '**'| less  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146608virt/23696res/3008shr/21204data kb, in 170usr/0sys/172real ms.  
__nfs.aop.write_begin  
__nfs.aop.write_begin.return  
__nfs.aop.write_end  
__nfs.aop.write_end.return  
__scheduler.kthread_stop.kp  

这些通配符的含义可以参考man stapprobes

The  general  probe  point  syntax is a dotted-symbol sequence.  This allows a breakdown of the event namespace  
       into parts, somewhat like the Domain Name System does on  the  Internet.   Each  component  identifier  may  be  
       parametrized  by a string or number literal, with a syntax like a function call.  A component may include a "*"  
       character, to expand to a set of matching probe points.  It may also include "**" to match multiple  sequential  
       components at once.  Probe aliases likewise expand to other probe points.  Each and every resulting probe point  
       is normally resolved to some low-level system instrumentation facility (e.g., a kprobe address,  marker,  or  a  
       timer configuration), otherwise the elaboration phase will fail.  

二.

获得event或者alias后, 怎么获得帮助呢? 比如要得到这些事件可用的target variable,

可以通过帮助手册,或者使用stap -L 查找, 或者使用$$vars打印.

例如 :

[root@db-172-16-3-39 ~]# stap -l "*.*"|grep vm.brk  
vm.brk  

变量获得 :

1. 通过stap -L

[root@db-172-16-3-39 ~]# stap -L vm.brk  
vm.brk name:string address:long length:long $addr:long unsigned int $len:long unsigned int $prev:struct vm_area_struct*  
man stap  
       -L PROBE  
              Similar to "-l", but list probe points and script-level local variables.  

2. 通过probe::*帮助手册 :

取自SystemTap Tapset Reference Manual

[root@db-172-16-3-39 ~]# man probe::vm.brk  
PROBE::VM.BRK(3stap)             Memory Tapset            PROBE::VM.BRK(3stap)  
  
NAME  
       probe::vm.brk - Fires when a brk is requested (i.e. the heap will be resized)  
  
SYNOPSIS  
       vm.brk  
  
VALUES  
       length the length of the memory segment  
  
       name   name of the probe point  
  
       address  
              the requested address  
  
CONTEXT  
       The process calling brk.  
  
SystemTap Tapset Reference       January 2013             PROBE::VM.BRK(3stap)  

3. 还可以使用stap这种的特殊变量$$vars打印这个event的变量, 但是这种方法必须要当事件被触发才行.

[root@db-172-16-3-39 ~]# cat test.stp   
probe vm.brk {  
  printf("vars: %s\n", $$vars)  
  exit()  
}  
[root@db-172-16-3-39 ~]# stap test.stp   
vars: addr=0x1fd59000 len=0x21000 mm=? vma=? prev=0xc000003e0000000c flags=? rb_link=? rb_parent=? pgoff=? error=?  

使用$$vars打印出来的是这个内核函数中定义的所有变量, 比stap -L和帮助手册更全面.

expands to a character string that is equivalent to sprintf("parm1=%x ...  parmN=%x  var1=%x  ...  
              varN=%x", parm1, ..., parmN, var1, ..., varN)  

三.

probe points 帮助手册的结构 :

probe::*(3stap), tapset::*(3stap), function::*(3stap)  

这三个man手册其实都是源自以下这本手册.

https://sourceware.org/systemtap/tapsets/ ```tapset::*``` 指大的分类帮助查询, 分类中包含了属于该分类的probe和funciton. ```probe::*``` 指单个probe point的帮助查询 ```function::*``` 指单个函数的帮助查询 另外一个更全的帮助索引见如下页面 https://sourceware.org/systemtap/man/index.html 例如 : ``` tapset::kprocess(3stap) tapset::kprocess(3stap) NAME tapset::kprocess - systemtap kprocess tapset DESCRIPTION This family of probe points is used to probe process-related activities. kprocess.create Fires whenever a new process or thread is successfully created See probe::kprocess.create(3stap) for details. ... 略 ``` 对应的是 第16章. 里面包含了kernel process tapset中的probe point和funciton. 例如 ``` [root@db-172-16-3-39 ~]# man probe::kprocess.create PROBE::KPROCESS.CREA(3stap) Kernel Process Tapset PROBE::KPROCESS.CREA(3stap) NAME probe::kprocess.create - Fires whenever a new process or thread is successfully created SYNOPSIS kprocess.create VALUES new_tid The TID of the newly created task new_pid The PID of the newly created process CONTEXT Parent of the created process. DESCRIPTION Fires whenever a new process is successfully created, either as a result of fork (or one of its syscall variants), or a new kernel thread. SystemTap Tapset Reference January 2013 PROBE::KPROCESS.CREA(3stap) [root@db-172-16-3-39 ~]# man function::target_set_pid FUNCTION::TARGET_SET(3stap) Kernel Process Tapset FUNCTION::TARGET_SET(3stap) NAME function::target_set_pid - Does pid descend from target process? SYNOPSIS target_set_pid(pid:) ARGUMENTS pid The pid of the process to query DESCRIPTION This function returns whether the given process-id is within the “target set”, that is whether it is a descendant of the top-level target process. SystemTap Tapset Reference January 2013 FUNCTION::TARGET_SET(3stap) ``` ### 四. 查看vm.brk对应的源文件 : 首先在tapset目录中查找到alias对应的kernel.funciton. ``` [root@db-172-16-3-39 ~]# grep -r "^probe " /usr/share/systemtap/tapset/|grep vm.brk /usr/share/systemtap/tapset/memory.stp:probe vm.brk = kernel.function("do_brk") { ``` 然后用stap -l 查出对应的kernel.function详细信息 ``` [root@db-172-16-3-39 ~]# stap -l 'kernel.function("*")'|grep do_brk kernel.function("do_brk@mm/mmap.c:2037") ``` 在源码的debug分支中可以找到这个源码文件. ``` cd /usr/src/debug/kernel-2.6.18/linux-2.6.18-348.12.1.el5.x86_64 less mm/mmap.c ``` 对应的行查找到函数do_brk ``` /* * this is really a simplified "do_mmap". it only handles * anonymous maps. eventually we may be able to do some * brk-specific accounting here. */ unsigned long do_brk(unsigned long addr, unsigned long len) 2037: { struct mm_struct * mm = current->mm; struct vm_area_struct * vma, * prev; unsigned long flags; struct rb_node ** rb_link, * rb_parent; pgoff_t pgoff = addr >> PAGE_SHIFT; int error; len = PAGE_ALIGN(len); if (!len) return addr; error = security_file_mmap_addr(0, 0, 0, 0, addr, 1); if (error) return error; flags = VM_DATA_DEFAULT_FLAGS | VM_ACCOUNT | mm->def_flags; error = get_unmapped_area(NULL, addr, len, 0, MAP_FIXED); error = get_unmapped_area(NULL, addr, len, 0, MAP_FIXED); if (error & ~PAGE_MASK) return error; /* * mlock MCL_FUTURE? */ if (mm->def_flags & VM_LOCKED) { unsigned long locked, lock_limit; locked = len >> PAGE_SHIFT; locked += mm->locked_vm; lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur; lock_limit >>= PAGE_SHIFT; if (locked > lock_limit && !capable(CAP_IPC_LOCK)) return -EAGAIN; } /* * mm->mmap_sem is required to protect against another thread * changing the mappings in case we sleep. */ verify_mm_writelocked(mm); /* * Clear old maps. this also does some error checking for us */ munmap_back: vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent); if (vma && vma->vm_start < addr + len) { if (do_munmap(mm, addr, len)) return -ENOMEM; goto munmap_back; } /* Check against address space limits *after* clearing old maps... */ if (!may_expand_vm(mm, len >> PAGE_SHIFT)) /* Check against address space limits *after* clearing old maps... */ if (!may_expand_vm(mm, len >> PAGE_SHIFT)) return -ENOMEM; if (mm->map_count > sysctl_max_map_count) return -ENOMEM; if (security_vm_enough_memory(len >> PAGE_SHIFT)) return -ENOMEM; /* Can we just expand an old private anonymous mapping? */ if (vma_merge(mm, prev, addr, addr + len, flags, NULL, NULL, pgoff, NULL)) goto out; /* * create a vma struct for an anonymous mapping */ vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL); if (!vma) { vm_unacct_memory(len >> PAGE_SHIFT); return -ENOMEM; } vma->vm_mm = mm; vma->vm_start = addr; vma->vm_end = addr + len; vma->vm_pgoff = pgoff; vma->vm_flags = flags; vma->vm_page_prot = protection_map[flags & (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]; vma_link(mm, vma, prev, rb_link, rb_parent); out: mm->total_vm += len >> PAGE_SHIFT; if (flags & VM_LOCKED) { mm->locked_vm += len >> PAGE_SHIFT; make_pages_present(addr, addr + len); mm->locked_vm += len >> PAGE_SHIFT; make_pages_present(addr, addr + len); } return addr; } EXPORT_SYMBOL(do_brk); ``` ## 参考 1\. https://sourceware.org/systemtap/langref/Components_SystemTap_script.html 2\. man stapprobes 3\. man probe::* tapset::* 4\. stappaths 5\. SystemTap Tapset Reference Manual https://sourceware.org/systemtap/tapsets/ 6\. 帮助手册索引 https://sourceware.org/systemtap/man/index.html Flag Counter ## [digoal's 大量PostgreSQL文章入口](https://github.com/digoal/blog/blob/master/README.md "22709685feb7cab07d30f30387f0a9ae")