Systemtap Userspace probing - 1

8 minute read

背景

Systemtap Support for userspace probing is supported on kernels that are configured to include the utrace or uprobes extensions.

前面几篇blog介绍了DWARF和DWARF-less探针, 本文将要介绍一个新的探针分类, userspace probe, 这个顾名思义是和用户或者运行的程序更相关的探针.

要支持user-space probing, 首先内核要支持, 内核需包含了utrace或者uprobes扩展.

Systemtap最初是为内核空间监视设计的, 在0.6的版本后才加入了进程空间的监视, 即user-space probe.

SystemTap initially focused on kernel-space probing. However, there are many instances where user-space probing can help diagnose a problem. SystemTap 0.6 added support to allow probing user-space processes. SystemTap includes support for probing the entry into and return from a function in user-space processes, probing predefined markers in user-space code, and monitoring user-process events.  
  
The SystemTap user-space probing requires the utrace kernel extensions which provide an API for tracking various user-space events. More details about the utrace infrastructure are available athttp://sourceware.org/systemtap/wiki/utrace. The following command determines whether the currently running Linux kernel provides the needed utrace support:  
  
grep CONFIG_UTRACE /boot/config-`uname -r`  
  
If the Linux kernel support user-space probing, the following output is printed:  
  
CONFIG_UTRACE=y  
  
The SystemTap user-space probing also needs the uprobes kernel module. If the uprobes kernel module is not available, you will see an error message like the following when attempting to run a script that requires the uprobes kernel module:  
  
SystemTap's version of uprobes is out of date.  
  
As root, or a member of the 'root' group, run  
  
"make -C /usr/share/systemtap/runtime/uprobes".  
  
Pass 4: compilation failed.  Try again with another '--vp 0001' option.  
  
If this occurs, you need to generate a uprobes.ko module for the kernel as directed.  

UTRACE介绍截取自/usr/share/doc/kernel-doc-x.x.x/Documentation/utrace.txt :

The UTRACE is infrastructure code for tracing and controlling user  
threads.  This is the foundation for writing tracing engines, which  
can be loadable kernel modules.  The UTRACE interfaces provide three  
basic facilities:  
  
* Thread event reporting  
  
  Tracing engines can request callbacks for events of interest in  
  the thread: signals, system calls, exit, exec, clone, etc.  
  
* Core thread control  
  
  Tracing engines can prevent a thread from running (keeping it in  
  TASK_TRACED state), or make it single-step or block-step (when  
  hardware supports it).  Engines can cause a thread to abort system  
  calls, they change the behaviors of signals, and they can inject  
  signal-style actions at will.  
  
* Thread machine state access  
  
  Tracing engines can read and write a thread's registers and  
  similar per-thread CPU state.  

检查内核是否支持utrace :

[root@db-172-16-3-150 pg93]# grep CONFIG_UTRACE /boot/config-`uname -r`  
CONFIG_UTRACE=y  

userspace probe的几种使用方法.

1. 进程创建和进程消亡.

process.begin   所有进程创建时触发. (如果为target process模式则受指定进程限制.stap -c or -x; 或者是stap --unprivileged 则限定为当前用户的进程)  
process("PATH").begin PATH可以为当前的相对路径或者绝对路径或者在当前环境下的$PATH中搜索指定的进程名称. 在该进程创建时触发.  
process(PID).begin 指进程ID为指定PID的进程被创建时触发  

以下同上, 只不过代表线程.

process.thread.begin   
process("PATH").thread.begin  
process(PID).thread.begin  
.end 指进程消亡时触发. 其他含义同上.  
process.end  
process("PATH").end  
process(PID).end  
  
process.thread.end  
process("PATH").thread.end  
process(PID).thread.end  
  
The .begin variant is called when a new process described by PID or PATH is created. If no PID or PATH argument is specified (for example process.begin), the probe flags any new process being spawned.  
  
The .thread.begin variant is called when a new thread described by PID or PATH is created.  
  
The .end variant is called when a process described by PID or PATH dies.  
  
The .thread.end variant is called when a thread described by PID or PATH dies.  

例如 :

相对路径 :

[root@db-172-16-3-39 pgsql]# cd /opt  
[root@db-172-16-3-39 opt]# ll pgsql9.3beta2/bin/postgres  
-rwxr-xr-x 1 root root 5710772 Aug  8 22:01 pgsql9.3beta2/bin/postgres  
[root@db-172-16-3-39 opt]# stap -e 'probe process("pgsql9.3beta2/bin/postgres").begin {printf("%s, %s, %d, %d\n", execname(), pp(), pid(), cpu()); exit();}'  
postgres, process("/opt/pgsql9.3beta2/bin/postgres").begin, 13707, 0  

绝对路径 :

[root@db-172-16-3-39 opt]# cd  
[root@db-172-16-3-39 ~]# stap -e 'probe process("/opt/pgsql9.3beta2/bin/postgres").begin {printf("%s, %s, %d, %d\n", execname(), pp(), pid(), cpu()); exit();}'  
postgres, process("/opt/pgsql9.3beta2/bin/postgres").begin, 13707, 0  

注意这个进程并不是探测过程中新建的, 而是已经存在的, 和systemtap介绍的不一样?

[root@db-172-16-3-39 ~]# ps -ewf|grep 13707  
pg93     13707  3917  0 16:27 ?        00:00:00 postgres: postgres digoal [local] idle  
root     14008  3537  0 16:29 pts/2    00:00:00 grep 13707  
  
.end 指进程消亡时倒是对的.  
  
[root@db-172-16-3-39 ~]# stap -e 'probe process("/opt/pgsql9.3beta2/bin/postgres").end {printf("%s, %s, %d, %d\n", execname(), pp(), pid(), cpu()); exit();}'  

进程退出后, 输出如下 :

postgres, process("/opt/pgsql9.3beta2/bin/postgres").end, 13707, 4  

2. 进程的系统调用

Constructs:  
process.syscall  
process("PATH").syscall  
process(PID).syscall  
  
process.syscall.return  
process("PATH").syscall.return  
process(PID).syscall.return  
The .syscall variant is called when a thread described by PID or PATH makes a system call.   
The system call number is available in the $syscall context variable.   
The first six arguments of the system call are available in the $argN parameter, for example $arg1, $arg2, and so on.  
  
The .syscall.return variant is called when a thread described by PID or PATH returns from a system call.   
The system call number is available in the $syscall context variable.   
The return value of the system call is available in the $return context variable.  

以上为指定进程或所有进程的系统调用探针的用法.

操作系统中支持哪些系统调用可以通过stap -l 'syscall.**' 来输出.

例如 :

[root@db-172-16-3-39 ~]# stap -l 'syscall.**'|less  
syscall.accept  
syscall.accept.return  
syscall.access  
syscall.access.return  
.....略  
syscall.write  
syscall.write.return  

举例 :

$syscall表示系统调用号, $arg1 - $arg6表示该系统调函数的前6个参数. .return不支持$arg, 但是支持$return.

[root@db-172-16-3-39 ~]# stap -e 'probe process("/opt/pgsql9.3beta2/bin/postgres").syscall { printf("%s, %d, %s, %d, %d, %d\n", execname(), $syscall, pp(), pid(), cpu(), $arg1); } probe timer.s(1) { exit(); }'  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3918, 1, 3  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3918, 1, 140734936355648  
postgres, 45, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 4583, 1, 6  
postgres, 45, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 4591, 1, 7  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3920, 2, 3  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3921, 4, 3  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3922, 6, 3  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3924, 7, 3  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3921, 4, 140734936357808  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3920, 2, 140734936357776  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3922, 6, 140734936357808  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3924, 7, 140734936356176  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3923, 4, 3  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3923, 4, 140734936358000  
postgres, 45, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 4758, 4, 7  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3922, 6, 3  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3922, 6, 140734936357808  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3922, 6, 3  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3922, 6, 140734936357808  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3922, 6, 3  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3922, 6, 140734936357808  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3922, 6, 3  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall, 3922, 6, 140734936357808  

系统调用返回探针

[root@db-172-16-3-39 ~]# stap -e 'probe process("/opt/pgsql9.3beta2/bin/postgres").syscall.return { printf("%s, %d, %s, %d, %d, %d\n", execname(), $syscall, pp(), pid(), cpu(), $return); } probe timer.s(1) { exit(); }'  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall.return, 3922, 6, -11  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall.return, 3921, 4, -11  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall.return, 3923, 4, -11  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall.return, 3922, 6, 0  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall.return, 3922, 6, -11  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall.return, 3922, 6, 0  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall.return, 3922, 6, -11  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall.return, 3922, 6, 0  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall.return, 3922, 6, -11  
postgres, 7, process("/opt/pgsql9.3beta2/bin/postgres").syscall.return, 3922, 6, 0  
postgres, 0, process("/opt/pgsql9.3beta2/bin/postgres").syscall.return, 3922, 6, -11  

3. 进程的函数或者语句级探针.

与内核的函数和语句级探针类似, 只是这里限定在指定进程的函数调用或者语句探针.

Constructs:  
process("PATH").function("NAME")  
process("PATH").statement("*@FILE.c:123")  
process("PATH").function("*").return  
process("PATH").function("myfun").label("foo")  
Full symbolic source-level probes in userspace programs and shared libraries are supported.   
These are exactly analogous to the symbolic DWARF-based kernel or module probes described previously and expose similar contextual $-variables.  
Here is an example of prototype symbolic userspace probing support:  
  
# stap -e 'probe process("ls").function("*").call {  
           log (probefunc()." ".$$parms)  
           }' \  
       -c 'ls -l'  
  
To run, this script requires debugging information for the named program and utrace support in the kernel.   
  
If you see a "pass 4a-time" build failure, check that your kernel supports utrace.  

注意如果要使用进程的函数或语句级探针, 与DWARF-based kernel or module探针一样, 需要先安装debuginfo包(指对应的function或者statement指定的函数对应的debug信息), 当然也需要内核支持utrace.

[root@db-172-16-3-39 ~]# rpm -qa|grep debuginf  
kernel-debuginfo-common-2.6.18-348.12.1.el5  
kernel-debuginfo-2.6.18-348.12.1.el5  

进程级的函数或语句探针, 同样支持函数参数, 本地变量, 全局变量等. 详细介绍请参考

http://blog.163.com/digoal@126/blog/static/1638770402013823101827553/

举例 :

当进程不支持debuginfo时, 会报错 :

[root@db-172-16-3-39 ~]# stap -e 'probe process("/opt/pgsql9.3beta2/bin/postgres").function("tcp_v4_connect") { printf("%s, %d, %d, %s\n", pp(), pid(), cpu(), $$vars); } probe timer.s(1) { exit(); }'  
WARNING: cannot find module /opt/pgsql9.3beta2/bin/postgres debuginfo: No DWARF information found  
semantic error: while resolving probe point: identifier 'process' at <input>:1:7  
        source: probe process("/opt/pgsql9.3beta2/bin/postgres").function("tcp_v4_connect") { printf("%s, %d, %d, %s\n", pp(), pid(), cpu(), $$vars); } probe timer.s(1) { exit(); }  
                      ^  
  
semantic error: no match  
Pass 2: analysis failed.  Try again with another '--vp 01' option.  

postgresql编译时未加–enable-dtrace参数 :

pg93@db-172-16-3-39-> pg_config --configure  
'--prefix=/opt/pgsql9.3beta2' '--with-pgport=1921' '--with-perl' '--with-python' '--with-openssl' '--with-pam' '--without-ldap' '--with-libxml' '--with-libxslt' '--enable-thread-safety' '--with-wal-blocksize=16'  

换个加了–enable-dtrace以及–enable-debug的试试 :

pg94@db-172-16-3-39-> pg_config --configure  
'--prefix=/home/pg94/pgsql9.4devel' '--with-pgport=2999' '--with-perl' '--with-tcl' '--with-python' '--with-openssl' '--with-pam' '--without-ldap' '--with-libxml' '--with-libxslt' '--enable-thread-safety' '--with-wal-blocksize=16' '--enable-dtrace' '--enable-debug'  

可以了

[root@db-172-16-3-150 ~]# /usr/bin/stap -e 'probe process("/home/pg93/pgsql9.3.1/bin/postgres").function("*") { printf("%s, %d, %d, %s\n", pp(), pid(), cpu(), probefunc()); }'  
WARNING: u*probe failed postgres[10269] 'process("/home/pg93/pgsql9.3.1/bin/postgres").function("tas@../../../../src/include/storage/s_lock.h:204")' addr 00000000004b08fd rc -1  
WARNING: u*probe failed postgres[10268] 'process("/home/pg93/pgsql9.3.1/bin/postgres").function("tas@../../../../src/include/storage/s_lock.h:204")' addr 00000000004b08fd rc -1  
WARNING: u*probe failed postgres[10269] 'process("/home/pg93/pgsql9.3.1/bin/postgres").function("tas@../../../src/include/storage/s_lock.h:204")' addr 0000000000622070 rc -1  
WARNING: u*probe failed postgres[10269] 'process("/home/pg93/pgsql9.3.1/bin/postgres").function("tas@../../../src/include/storage/s_lock.h:204")' addr 0000000000622160 rc -1  
WARNING: u*probe failed postgres[10268] 'process("/home/pg93/pgsql9.3.1/bin/postgres").function("tas@../../../src/include/storage/s_lock.h:204")' addr 0000000000622070 rc -1  
WARNING: u*probe failed postgres[10268] 'process("/home/pg93/pgsql9.3.1/bin/postgres").function("tas@../../../src/include/storage/s_lock.h:204")' addr 0000000000622160 rc -1  
process("/home/pg93/pgsql9.3.1/bin/postgres").function("ResetLatch@/opt/soft_bak/postgresql-9.3.1/src/backend/port/pg_latch.c:552"), 10269, 1, ResetLatch  
process("/home/pg93/pgsql9.3.1/bin/postgres").function("XLogBackgroundFlush@/opt/soft_bak/postgresql-9.3.1/src/backend/access/transam/xlog.c:2074"), 10269, 1, XLogBackgroundFlush  
process("/home/pg93/pgsql9.3.1/bin/postgres").function("RecoveryInProgress@/opt/soft_bak/postgresql-9.3.1/src/backend/access/transam/xlog.c:6223"), 10269, 1, RecoveryInProgress  
process("/home/pg93/pgsql9.3.1/bin/postgres").function("tas@../../../../src/include/storage/s_lock.h:204"), 10269, 1, tas  
process("/home/pg93/pgsql9.3.1/bin/postgres").function("tas@../../../../src/include/storage/s_lock.h:204"), 10269, 1, tas  
process("/home/pg93/pgsql9.3.1/bin/postgres").function("WaitLatch@/opt/soft_bak/postgresql-9.3.1/src/backend/port/pg_latch.c:194"), 10269, 1, WaitLatch  
  
......略  

使用stap -l输出支持的探针.

[root@db-172-16-3-150 ~]# /usr/bin/stap -l 'process("/home/pg93/pgsql9.3.1/bin/postgres").function("*")'|less  
process("/home/pg93/pgsql9.3.1/bin/postgres").function("ATAddCheckConstraint@/opt/soft_bak/postgresql-9.3.1/src/backend/commands/tablecmds.c:5659")  
process("/home/pg93/pgsql9.3.1/bin/postgres").function("ATAddForeignKeyConstraint@/opt/soft_bak/postgresql-9.3.1/src/backend/commands/tablecmds.c:5780")  
process("/home/pg93/pgsql9.3.1/bin/postgres").function("ATColumnChangeRequiresRewrite@/opt/soft_bak/postgresql-9.3.1/src/backend/commands/tablecmds.c:7382")  
process("/home/pg93/pgsql9.3.1/bin/postgres").function("ATController@/opt/soft_bak/postgresql-9.3.1/src/backend/commands/tablecmds.c:2913")  
... 略  

使用系统提供的例子 :

[root@db-172-16-3-39 ~]# stap -e 'probe process("ls").function("*").call {  
>            log (probefunc()." ".$$parms)  
>            }' \  
>        -c 'ls -l'  
WARNING: cannot find module /bin/ls debuginfo: No DWARF information found  
semantic error: while resolving probe point: identifier 'process' at <input>:1:7  
        source: probe process("ls").function("*").call {  
                      ^  
  
semantic error: no match  
Pass 2: analysis failed.  Try again with another '--vp 01' option.  
Missing separate debuginfos, use: debuginfo-install coreutils-5.97-34.el5.x86_64   

以上报错原因是系统中未安装ls的debuginfo.

[root@db-172-16-3-39 ~]# stap -l 'process("/bin/ls").function("**")'  
[root@db-172-16-3-39 ~]# stap -l 'process("/bin/ls").function("*")'  
[root@db-172-16-3-39 ~]# stap -l 'process("/bin/ls").function("*.*")'  

4. Absolute variant

A non-symbolic probe point such as process(PID).statement(ADDRESS).absolute is analogous to   
kernel.statement(ADDRESS).absolute in that both use raw, unverified virtual addresses and provide no $variables.   
The target PID parameter must identify a running process and ADDRESS must identify a valid instruction address.   
All threads of the listed process will be probed.   
This is a guru mode probe.  

必须使用stap -g 模式运行.

5. 进程路径搜索(PATH).

For all process probes, PATH names refer to executables that are searched the same way that shells do: the explicit path specified if the path name begins with a slash (/) character sequence; otherwise $PATH is searched. For example, the following probe syntax:  
probe process("ls").syscall {}  
probe process("./a.out").syscall {}  
works the same as:  
  
probe process("/bin/ls").syscall {}  
probe process("/my/directory/a.out").syscall {}  
If a process probe is specified without a PID or PATH parameter, all user threads are probed. However, if systemtap is invoked in target process mode, process probes are restricted to the process hierarchy associated with the target process. If stap is running in -unprivileged mode, only processes owned by the current user are selected.  

相对路径, 绝对路径, 或者当前环境$PATH下搜索.

例如 :

root@db-172-16-3-39-> which postgres  
/home/pg94/pgsql9.4devel/bin/postgres  
root@db-172-16-3-39-> whoami  
root  

当前环境中搜索

root@db-172-16-3-39-> stap -e 'probe process("postgres").syscall { printf("%s, %d\n", pp(), $syscall); exit() }'  
process("/home/pg94/pgsql9.4devel/bin/postgres").syscall, 23  

相对路径

root@db-172-16-3-39-> stap -e 'probe process("pgsql9.4devel/bin/postgres").syscall { printf("%s, %d\n", pp(), $syscall); exit() }'  
process("/home/pg94/pgsql9.4devel/bin/postgres").syscall, 0  

绝对路径

root@db-172-16-3-39-> stap -e 'probe process("/home/pg94/pgsql9.4devel/bin/postgres").syscall { printf("%s, %d\n", pp(), $syscall); exit() }'  
process("/home/pg94/pgsql9.4devel/bin/postgres").syscall, 0  

未完待续.

参考

1. https://sourceware.org/systemtap/langref/Probe_points.html

2. http://lwn.net/Articles/499190/

3. http://lwn.net/Articles/224772/

4. /usr/share/doc/kernel-doc-x.x.x/Documentation/utrace.txt

5. http://blog.163.com/digoal@126/blog/static/1638770402013823101827553/

6. https://sourceware.org/systemtap/SystemTap_Beginners_Guide/userspace-probing.html

Flag Counter

digoal’s 大量PostgreSQL文章入口