Systemtap Userspace probing - 2

4 minute read

背景

接上一篇BLOG

http://blog.163.com/digoal@126/blog/static/163877040201382941342901/

本文讲一下 Target process mode

Target process mode (invoked with stap -c CMD or -x PID) implicitly restricts all process.* probes to the given child process. It does not affect kernel.* or other probe types.   
The CMD string is normally run directly, rather than from a ``/bin/sh -c'' sub-shell, since utrace and uprobe probes receive a fairly "clean" event stream.   
If meta-characters such as redirection operators are present in CMD, ``/bin/sh -c CMD'' is still used, and utrace and uprobe probes will receive events from the shell.  

目标进程模式使用方法stap -c CMDstap -x PID. 使用这种模式可限制探针只在指定进程生效, 对于其他进程不会触发. 注意目标进程模式只限制process.*探针, 其他探针不受此限制, 例如kernel.*, syscall.*不受此限制. 例如 :

[root@db-172-16-3-39 ~]# ps -ewf|grep postgres|grep pg94  
pg94     15967     1  0 Sep29 ?        00:00:00 /home/pg94/pgsql9.4devel/bin/postgres  

process.*探针在目标进程模式中, 被限制在目标进程激活.

[root@db-172-16-3-39 ~]# stap -x 15967 -e 'probe process.syscall {printf("%s, %s, %d\n", pp(), execname(), $syscall); }'  
process.syscall, postgres, 23  

其他探针不受-x限制. 如以下的stapio也触发了

[root@db-172-16-3-39 ~]# stap -x 15967 -e 'probe syscall.* {printf("%s, %s\n", pp(), execname()); exit();}'  
kernel.function("sys_fcntl@fs/fcntl.c:357").call?, stapio  

其他探针需要限制target pid的话可以使用if (pid() == target())

stap -x 15967 -e 'probe syscall.* {if(pid() == target()) {printf("%s, %s\n", pp(), execname()); exit();}}  

另一个书上的例子 :

% stap -e 'probe process.syscall, process.end {  
           printf("%s %d %s\n", execname(), pid(), pp())}' \  
       -c ls  
Here is the output from this command:  
ls 2323 process.syscall  
ls 2323 process.syscall  
ls 2323 process.end  

一般在探针的process(“PATH”)中PATH指定的是可执行程序, 如果指定的是动态库文件会怎么样呢?

If PATH names a shared library, all processes that map that shared library can be probed.   
If dwarf debugging information is installed, try using a command with this syntax:  
        probe process("/lib64/libc-2.8.so").function("....") { ... }  
This command probes all threads that call into that library.   

以上探针表示所有调用到libc-2.8.so这个动态库的进程都被触发, 如果加了.function这限定到函数级别(必须有对应的debufinfo包).

  
Typing ``stap -c CMD'' or ``stap -x PID'' restricts this to the target command and descendants only.   
You can use $$vars and others.   

使用目标进程模式时加stap -c或者 -x参数来限定进程. 如果使用了.function函数探针, 同样可以使用$$vars以及其他输出参数, 本地变量等的输出.

You can provide the location of debug information to the stap command with the -d DIRECTORY option.   

1.8的stap版本中未找到对应的-d DIRECTORY参数.

To qualify a probe point to a location in a library required by a particular process try using a command with this syntax:  
        probe process("...").library("...").function("....") { ... }  
The library name may use wildcards.  

如果要在探针中限定进程号或者程序名与库文件对应时的触发, 而非目标进程模式的话, 可以使用以上探针语法.

举例 :

[root@db-172-16-3-39 lib]# stap --download-debuginfo=yes -e 'probe process("/lib64/ld-2.5.so").function("*") { printf ("%s, %s\n", pp(), execname()); exit(); }'  
WARNING: abrt-action-install-debuginfo-to-abrt-cache is not installed. Continuing without downloading debuginfo.  
WARNING: cannot find module /lib64/ld-2.5.so debuginfo: No DWARF information found  
semantic error: while resolving probe point: identifier 'process' at <input>:1:7  
        source: probe process("/lib64/ld-2.5.so").function("*") { printf ("%s, %s\n", pp(), execname()); exit(); }  
                      ^  
semantic error: no match  
Pass 2: analysis failed.  Try again with another '--vp 01' option.  
Missing separate debuginfos, use: debuginfo-install glibc-2.5-65.x86_64  

被告知缺少debuginfo.

换个有debuginfo的动态库

[root@db-172-16-3-39 lib]# stap --download-debuginfo=yes -e 'probe process("/usr/local/lib/libevent-1.4.so.2.2.0").function("*") { printf ("%s, %s\n", pp(), execname()); } probe timer.s(1) {exit();}'  
WARNING: abrt-action-install-debuginfo-to-abrt-cache is not installed. Continuing without downloading debuginfo.  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("evsignal_process@/opt/soft_bak/libevent-1.4.14b-stable/signal.c:314"), rpc.idmapd  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("timeout_process@/opt/soft_bak/libevent-1.4.14b-stable/event.c:927"), rpc.idmapd  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("gettime@/opt/soft_bak/libevent-1.4.14b-stable/event.c:140"), rpc.idmapd  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("timeout_correct@/opt/soft_bak/libevent-1.4.14b-stable/event.c:892"), rpc.idmapd  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("min_heap_top@/opt/soft_bak/libevent-1.4.14b-stable/min_heap.h:63"), rpc.idmapd  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("timeout_next@/opt/soft_bak/libevent-1.4.14b-stable/event.c:856"), rpc.idmapd  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("gettime@/opt/soft_bak/libevent-1.4.14b-stable/event.c:140"), rpc.idmapd  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("epoll_dispatch@/opt/soft_bak/libevent-1.4.14b-stable/epoll.c:183"), rpc.idmapd  

可以输出函数参数, 本地变量等.

[root@db-172-16-3-39 lib]# stap --download-debuginfo=yes -e 'probe process("/usr/local/lib/libevent-1.4.so.2.2.0").function("*") { printf ("%s, %s, %s\n", pp(), execname(), $$vars); } probe timer.s(1) {exit();}'  
WARNING: abrt-action-install-debuginfo-to-abrt-cache is not installed. Continuing without downloading debuginfo.  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("evsignal_process@/opt/soft_bak/libevent-1.4.14b-stable/signal.c:314"), rpc.idmapd, base=0x2b27590e74f0 sig=? ev=? next_ev=? ncalls=? i=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("timeout_process@/opt/soft_bak/libevent-1.4.14b-stable/event.c:927"), rpc.idmapd, base=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("gettime@/opt/soft_bak/libevent-1.4.14b-stable/event.c:140"), rpc.idmapd, base=0x2b27590e74f0 tp=0x2b27590e7b08  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("timeout_correct@/opt/soft_bak/libevent-1.4.14b-stable/event.c:892"), rpc.idmapd, tv=? base=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("min_heap_top@/opt/soft_bak/libevent-1.4.14b-stable/min_heap.h:63"), rpc.idmapd, s=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("timeout_next@/opt/soft_bak/libevent-1.4.14b-stable/event.c:856"), rpc.idmapd, tv_p=? base=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("gettime@/opt/soft_bak/libevent-1.4.14b-stable/event.c:140"), rpc.idmapd, base=0x2b27590e74f0 tp=0x2b27590e7ae8  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("epoll_dispatch@/opt/soft_bak/libevent-1.4.14b-stable/epoll.c:183"), rpc.idmapd, base=0x2b27590e74f0 arg=0x2b27590e6170 tv=0x0 epollop=? events=? evep=? i=? res=? timeout=?  

对应的进程如下

[root@db-172-16-3-39 lib]# which rpc.idmapd  
/usr/sbin/rpc.idmapd  

使用目标进程模式

[root@db-172-16-3-39 lib]# ps -ewf|grep rpc.idmapd  
root      3299     1  0 Sep26 ?        00:00:00 rpc.idmapd  
[root@db-172-16-3-39 lib]# stap -x 3299 -e 'probe process("/usr/local/lib/libevent-1.4.so.2.2.0").function("*") { printf ("%s, %s, %s\n", pp(), execname(), $$vars); } probe timer.s(1) {exit();}'  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("evsignal_process@/opt/soft_bak/libevent-1.4.14b-stable/signal.c:314"), rpc.idmapd, base=0x2b27590e74f0 sig=? ev=? next_ev=? ncalls=? i=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("timeout_process@/opt/soft_bak/libevent-1.4.14b-stable/event.c:927"), rpc.idmapd, base=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("gettime@/opt/soft_bak/libevent-1.4.14b-stable/event.c:140"), rpc.idmapd, base=0x2b27590e74f0 tp=0x2b27590e7b08  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("timeout_correct@/opt/soft_bak/libevent-1.4.14b-stable/event.c:892"), rpc.idmapd, tv=? base=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("min_heap_top@/opt/soft_bak/libevent-1.4.14b-stable/min_heap.h:63"), rpc.idmapd, s=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("timeout_next@/opt/soft_bak/libevent-1.4.14b-stable/event.c:856"), rpc.idmapd, tv_p=? base=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("gettime@/opt/soft_bak/libevent-1.4.14b-stable/event.c:140"), rpc.idmapd, base=0x2b27590e74f0 tp=0x2b27590e7ae8  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("epoll_dispatch@/opt/soft_bak/libevent-1.4.14b-stable/epoll.c:183"), rpc.idmapd, base=0x2b27590e74f0 arg=0x2b27590e6170 tv=0x0 epollop=? events=? evep=? i=? res=? timeout=?  

在探针中限制进程:

[root@db-172-16-3-39 lib]# stap -e 'probe process("/usr/sbin/rpc.idmapd").library("/usr/local/lib/libevent-1.4.so.2.2.0").function("*") { printf ("%s, %s, %s\n", pp(), execname(), $$vars); } probe timer.s(1) {exit();}'  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("evsignal_process@/opt/soft_bak/libevent-1.4.14b-stable/signal.c:314"), rpc.idmapd, base=0x2b27590e74f0 sig=? ev=? next_ev=? ncalls=? i=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("timeout_process@/opt/soft_bak/libevent-1.4.14b-stable/event.c:927"), rpc.idmapd, base=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("gettime@/opt/soft_bak/libevent-1.4.14b-stable/event.c:140"), rpc.idmapd, base=0x2b27590e74f0 tp=0x2b27590e7b08  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("timeout_correct@/opt/soft_bak/libevent-1.4.14b-stable/event.c:892"), rpc.idmapd, tv=? base=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("min_heap_top@/opt/soft_bak/libevent-1.4.14b-stable/min_heap.h:63"), rpc.idmapd, s=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("timeout_next@/opt/soft_bak/libevent-1.4.14b-stable/event.c:856"), rpc.idmapd, tv_p=? base=?  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("gettime@/opt/soft_bak/libevent-1.4.14b-stable/event.c:140"), rpc.idmapd, base=0x2b27590e74f0 tp=0x2b27590e7ae8  
process("/usr/local/lib/libevent-1.4.so.2.2.0").function("epoll_dispatch@/opt/soft_bak/libevent-1.4.14b-stable/epoll.c:183"), rpc.idmapd, base=0x2b27590e74f0 arg=0x2b27590e6170 tv=0x0 epollop=? events=? evep=? i=? res=? timeout=?  

程序链表探针 :

The first syntax in the following will probe the functions in the program linkage table of a particular process.   
The second syntax will also add the program linkage tables of libraries required by that process.   
.plt("...") can be specified to match particular plt entries.  
  
probe process("...").plt { ... }  
probe process("...").plt process("...").library("...").plt { ... }  

举例 :

[root@db-172-16-3-39 lib]# stap -e 'probe process("/usr/sbin/rpc.idmapd").library("/usr/local/lib/libevent-1.4.so.2.2.0").plt { printf ("%s, %s\n", pp(), execname()); } probe timer.s(1) {exit();}'  
process("/usr/local/lib/libevent-1.4.so.2.2.0").statement(0x5080)?, rpc.idmapd  
process("/usr/local/lib/libevent-1.4.so.2.2.0").statement(0x50a0)?, rpc.idmapd  
process("/usr/local/lib/libevent-1.4.so.2.2.0").statement(0x5350)?, rpc.idmapd  
process("/usr/local/lib/libevent-1.4.so.2.2.0").statement(0x5160)?, rpc.idmapd  
  
[root@db-172-16-3-39 lib]# objdump -d /usr/local/lib/libevent-1.4.so.2.2.0|grep -E "5080|50a0|5350|5160"  
0000000000005080 <__errno_location@plt>:  
    5080:       ff 25 c2 41 21 00       jmpq   *2179522(%rip)        # 219248 <_GLOBAL_OFFSET_TABLE_+0x380>  
00000000000050a0 <evsignal_process@plt>:  
    50a0:       ff 25 b2 41 21 00       jmpq   *2179506(%rip)        # 219258 <_GLOBAL_OFFSET_TABLE_+0x390>  
0000000000005160 <epoll_wait@plt>:  
    5160:       ff 25 52 41 21 00       jmpq   *2179410(%rip)        # 2192b8 <_GLOBAL_OFFSET_TABLE_+0x3f0>  
0000000000005350 <clock_gettime@plt>:  
    5350:       ff 25 5a 40 21 00       jmpq   *2179162(%rip)        # 2193b0 <_GLOBAL_OFFSET_TABLE_+0x4e8>  

参考

1. http://blog.163.com/digoal@126/blog/static/163877040201382941342901/

2. http://docs.oracle.com/cd/E22055_01/html/821-2505/afamv.html

3. http://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html

4. http://stackoverflow.com/questions/9688076/process-linkage-table-and-global-offset-table

5. man readelf

6. man nm

7. man objdump

8.

共享对象之间的函数调用

一个共享对象中的函数调用另一个共享对象中的函数时,其执行情况比在程序内对函数的简单调用更复杂。每个共享对象都包含一个程序链接表 (Program Linkage Table, PLT),该表包含位于该共享对象外部并从该共享对象引用的每个函数的条目。最初,PLT 中每个外部函数的地址实际上是 ld.so(即动态链接程序)内的地址。第一次调用这样的函数时,控制权将转移到动态链接程序,该动态链接程序会解析对实际外部函数的调用并为后续调用修补 PLT 地址。

如果在执行三个 PLT 指令之一的过程中发生分析事件,则 PLT PC 会被删除,并将独占时间归属到调用指令。如果在通过 PLT 条目首次调用期间发生分析事件,但是叶 PC 不是 PLT 指令之一,PLT 和 ld.so 中的代码引起的任何 PC 都将归属到人工函数 @plt 中,该函数将累计非独占时间。每个共享对象都存在一个这样的人工函数。如果程序使用 LD_AUDIT 接口,则可能从不修补 PLT 条目,而且来自 @plt 的非叶 PC 可能发生得更频繁。

Flag Counter

digoal’s 大量PostgreSQL文章入口