systemtap Auxiliary functions and Embedded C


Background

1. SystemTap functions

SystemTap functions only deal in scalar values, that is, the two types SystemTap supports: strings and long integers.

Functions may take any number of scalar arguments, and must return a single scalar value. Scalars in this context are integers or strings.  

The syntax is:

function <name>[:<type>] ( <arg1>[:<type>], ... ) { <stmts> }  

Types may be declared explicitly or left out, in which case the translator infers them:

function thisfn (arg1, arg2) {  
    return arg1 + arg2  
}  
  
function thatfn:string(arg1:long, arg2) {  
    return sprintf("%d%s", arg1, arg2)  
}  

SystemTap functions can also call themselves recursively, but the nesting depth is capped by a safety limit. See:

http://blog.163.com/digoal@126/blog/static/163877040201381021752228/
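The depth cap can be pictured with a small user-space C sketch. As far as I know, SystemTap tracks call nesting in the C code it generates and reports an error once the MAXNESTING parameter (tunable with stap -D MAXNESTING=N) would be exceeded. The MAX_DEPTH constant, the depth counter and the overflow flag below are hypothetical stand-ins for that mechanism, not SystemTap's actual code:

```c
/* Hypothetical cap, standing in for SystemTap's MAXNESTING parameter. */
#define MAX_DEPTH 10

static int depth = 0;       /* current call nesting level */
static int overflowed = 0;  /* set when a call would exceed MAX_DEPTH */

/* Recursive sum 1 + 2 + ... + n that refuses to nest past MAX_DEPTH.
 * Where SystemTap would raise a runtime error, this sketch sets a flag
 * and returns 0 instead. */
static long sum_to(long n)
{
    if (depth >= MAX_DEPTH) {
        overflowed = 1;
        return 0;
    }
    depth++;
    long result = (n <= 0) ? 0 : n + sum_to(n - 1);
    depth--;
    return result;
}
```

Calling sum_to(5) stays within the limit and returns 15, while sum_to(50) trips the guard and sets overflowed.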

2. Embedded C

SystemTap scripts can contain embedded C code, enclosed between the markers %{ and %}.

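Embedded C can appear at the top level of a script (the outermost level, alongside functions, global variables and probes), as the body of a function, or even inside an expression.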

For example:

%{  
#include <linux/in.h>  
#include <linux/ip.h>  
%} /* <-- top level */  
  
/* Reads the char value stored at a given address: */   
function __read_char:long(addr:long) %{ /* pure */  
         STAP_RETVALUE = kderef(sizeof(char), STAP_ARG_addr);  
         CATCH_DEREF_FAULT ();  
%} /* <-- function body */  
  
/* Determines whether an IP packet is TCP, based on the iphdr: */  
function is_tcp_packet:long(iphdr) {  
         protocol = @cast(iphdr, "iphdr")->protocol  
         return (protocol == %{ IPPROTO_TCP %}) /* <-- expression */  
}  
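The %{ IPPROTO_TCP %} expression simply splices the C constant from <linux/in.h> into the script. The same comparison can be sketched in user-space C with glibc's <netinet/ip.h>, whose struct iphdr describes the same IPv4 header. This is an illustration only: inside a probe the header has to be read through @cast (or kread() in embedded C), not through a plain pointer as here.

```c
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */
#include <netinet/ip.h>   /* struct iphdr */

/* User-space analogue of the is_tcp_packet() script function:
 * read the protocol field of an IPv4 header and compare it with
 * IPPROTO_TCP. */
static int is_tcp_packet(const struct iphdr *iph)
{
    return iph->protocol == IPPROTO_TCP;
}
```

A header whose protocol field is IPPROTO_TCP yields 1; one carrying IPPROTO_UDP yields 0.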

3. Embedded C functions

When embedded C is used as the body of a SystemTap function, the syntax is:

function <name>:<type> ( <arg1>:<type>, ... ) %{ <C_stmts> %}  
The enclosed code may do anything reasonable and safe as allowed by the C parser.  

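The C code in the function body may do anything the C compiler accepts. This is not a safety net: the compiler only checks that the code is valid C, so keeping it reasonable and safe is entirely the script author's job. The C parser itself is not covered further here.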

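In addition, SystemTap applies a set of complex constraints to code written in the script language. These limit concurrency, resource consumption and handler run time, so that a handler cannot overload the kernel or crash the system. The constraints cannot be checked against embedded C, which makes embedded C considerably more dangerous than ordinary SystemTap script code.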

There are a number of undocumented but complex safety constraints on concurrency, resource consumption and runtime limits that are applied to code written in the SystemTap language. These constraints are not applied to embedded C code, so use embedded C code with extreme caution. Be especially careful when dereferencing pointers. Use the kread() macro to dereference any pointers that could potentially be invalid or dangerous. If you are unsure, err on the side of caution and use kread(). The kread() macro is one of the safety mechanisms used in code generated by embedded C. It protects against pointer accesses that could crash the system.

In particular, pointer dereferences should go through the kread() macro wherever possible.

For example, to access the pointer chain name = skb->dev->name in embedded C, use the following code.  
struct net_device *dev;  
char *name;  
dev = kread(&(skb->dev));  
name = kread(&(dev->name));  

An embedded-C body reads its arguments and sets its return value through these macros:

STAP_ARG_foo (for arguments named foo)  

STAP_RETVALUE  

For example:

The memory locations reserved for input and output values are provided to a function using macros named STAP_ARG_foo (for arguments named foo) and STAP_RETVALUE. The following are examples.  
function add_one (val:long) %{  
    STAP_RETVALUE = STAP_ARG_val + 1;  
%}  
function add_one_str:string (val:string) %{  
    strlcpy (STAP_RETVALUE, STAP_ARG_val, MAXSTRINGLEN);  
    strlcat (STAP_RETVALUE, "one", MAXSTRINGLEN);  
%}  
The function argument and return value types should be stated; the translator does not analyze the embedded C code within the function. You should examine C code generated for ordinary script language functions to write compatible embedded-C.   

In other words, a function with an embedded-C body should always declare its return type and argument types explicitly, e.g. function add_one_str:string (val:string) %{
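To make the STAP_ARG_foo / STAP_RETVALUE mechanism concrete, here is a user-space C sketch that models each call as a context struct and defines the macros as accessors into it. The struct layouts, the THIS parameter and the MAXSTRINGLEN value of 128 are assumptions for illustration (the legacy THIS->foo / THIS->__retvalue spelling mentioned under /* unmangled */ below hints at a similar layout); the code stap really generates differs in detail, and, as the manual says, it is worth examining it directly (for example with stap -p3, which stops after the translation pass).

```c
#include <stdio.h>
#include <string.h>

/* Assumed value for illustration; in stap it is a compile-time parameter. */
#define MAXSTRINGLEN 128

/* Hypothetical per-call contexts holding arguments and return value. */
struct add_one_ctx     { long val; long retvalue; };
struct add_one_str_ctx { char val[MAXSTRINGLEN]; char retvalue[MAXSTRINGLEN]; };

/* Body of add_one, with the macros mapped onto the context. */
static void add_one(struct add_one_ctx *THIS)
{
#define STAP_ARG_val  (THIS->val)
#define STAP_RETVALUE (THIS->retvalue)
    STAP_RETVALUE = STAP_ARG_val + 1;
#undef STAP_ARG_val
#undef STAP_RETVALUE
}

/* Body of add_one_str; snprintf stands in for the kernel's strlcpy/strlcat,
 * truncating at MAXSTRINGLEN in the same way. */
static void add_one_str(struct add_one_str_ctx *THIS)
{
#define STAP_ARG_val  (THIS->val)
#define STAP_RETVALUE (THIS->retvalue)
    /* strlcpy(STAP_RETVALUE, STAP_ARG_val, MAXSTRINGLEN) */
    snprintf(STAP_RETVALUE, MAXSTRINGLEN, "%s", STAP_ARG_val);
    /* strlcat(STAP_RETVALUE, "one", MAXSTRINGLEN) */
    size_t used = strlen(STAP_RETVALUE);
    snprintf(STAP_RETVALUE + used, MAXSTRINGLEN - used, "%s", "one");
#undef STAP_ARG_val
#undef STAP_RETVALUE
}
```

Calling add_one with val = 41 leaves 42 in retvalue, and add_one_str with val = "zero" leaves "zeroone".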

Note that all SystemTap functions and probes run with interrupts disabled, thus you cannot call functions that might sleep within the embedded C.

Finally, SystemTap functions and probe handlers run with interrupts disabled, so embedded C must not call anything that might sleep.

4. Embedded C pragma comments

Users can put specific markers in the comments of embedded C code. These markers tell SystemTap which optimizations it may apply to that code and which safety properties it has.

For instance, a SystemTap script containing embedded C normally cannot be run as-is; it has to be run in guru mode with the -g option.

For example:

[root@db-172-16-3-39 ~]# cat test.stp   
function test:long (arg1:long)   
%{  
  STAP_RETVALUE = ++STAP_ARG_arg1 ;  
%}  
  
probe begin {  
  v1 = test($1)  
  printf("%d\n", v1)  
  exit()  
}  

Running it directly fails with an error (parse error: embedded code in unprivileged script; need stap -g):

[root@db-172-16-3-39 ~]# stap --vp 5 test.stp 10  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
parse error: embedded code in unprivileged script; need stap -g  
        saw: embedded-code at test.stp:2:1  
     source: %{  
             ^  
1 parse error.  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146804virt/23692res/3016shr/21400data kb, in 160usr/10sys/172real ms.  
Pass 1: parse failed.  Try again with another '--vp 1' option.  
Running rm -rf /tmp/stapkyhpL4  
Spawn waitpid result (0x0): 0  
Removed temporary directory "/tmp/stapkyhpL4"  

It only runs with -g:

[root@db-172-16-3-39 ~]# stap --vp 5 -g test.stp 10  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146800virt/23712res/3008shr/21396data kb, in 190usr/10sys/205real ms.  
11  

The other mechanism is the pragma comment, described in the manual as follows:

Embedded C blocks may contain various markers to assert optimization and safety properties.  
/* pure */ means that the C code has no side effects and may be elided entirely if its value is not used by script code.  
/* unprivileged */ means that the C code is so safe that even unprivileged users are permitted to use it. (This is useful, in particular, to define an embedded-C function inside a tapset that may be used by unprivileged code.)  
/* myproc-unprivileged */ means that the C code is so safe that even unprivileged users are permitted to use it, provided that the target of the current probe is within the user's own process.  
/* guru */ means that the C code is so unsafe that a systemtap user must specify -g (guru mode) to use this, even if the C code is being exported from a tapset.  
/* unmangled */, used in an embedded-C function, means that the legacy (pre-1.8) argument access syntax should be made available inside the function. Hence, in addition to STAP_ARG_foo and STAP_RETVALUE one can use THIS->foo and THIS->__retvalue respectively inside the function. This is useful for quickly migrating code written for SystemTap version 1.7 and earlier.  
/* string */ in embedded-C expressions only, means that the expression has const char * type and should be treated as a string value, instead of the default long numeric.  

Tapsets use embedded-C functions extensively, for example:

/usr/share/systemtap/tapset/ioblock.stp  
/* returns 0 for read, 1 for write */  
function bio_rw_num:long(rw:long)  
%{ /* pure */  
    long rw = (long)STAP_ARG_rw;  
    STAP_RETVALUE = (rw & REQ_WRITE);  
%}  

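Note the comment this function carries, /* pure */. According to the manual excerpt above, /* pure */ only declares that the code has no side effects, so the call may be elided if its value is unused; it is not a privilege marker (that role belongs to /* unprivileged */).

Let's call this function from a script and check whether -g is needed.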

[root@db-172-16-3-39 ~]# cat test.stp   
probe begin {  
  v1 = bio_rw_num($1)  
  printf("%d\n", v1)  
  exit()  
}  
[root@db-172-16-3-39 ~]# stap test.stp 101  
1  

Sure enough, it runs fine without -g.
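The tapset body only masks the REQ_WRITE bit of its argument, which is why 101 printed 1. A user-space C copy of that logic follows. It assumes REQ_WRITE is bit 0 (value 1), which is consistent with the 1 printed for the odd argument 101; the real value comes from the kernel's block-layer headers, so treat it as an assumption and check it on your own kernel.

```c
/* Assumption: REQ_WRITE is bit 0. The real definition comes from the
 * kernel's block-layer headers and may differ between kernel versions. */
#define REQ_WRITE 1L

/* User-space copy of the ioblock.stp bio_rw_num() body:
 * returns 0 for read, 1 for write (given REQ_WRITE == 1). */
static long bio_rw_num(long rw)
{
    return rw & REQ_WRITE;
}
```

bio_rw_num(101) returns 1 because 101 is odd, while bio_rw_num(100) returns 0.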

Now change the comment to /* guru */. According to the manual, the function should then require -g before it can be called.

[root@db-172-16-3-39 tapset]# vi ioblock.stp   
/* returns 0 for read, 1 for write */  
function bio_rw_num:long(rw:long)  
%{ /* guru */  
    long rw = (long)STAP_ARG_rw;  
    STAP_RETVALUE = (rw & REQ_WRITE);  
%}  

Run the same test.stp again:

[root@db-172-16-3-39 ~]# stap test.stp 101  
semantic error: function may not be used unless -g is specified: identifier 'bio_rw_num' at /usr/share/systemtap/tapset/ioblock.stp:41:10  
        source: function bio_rw_num:long(rw:long)  
                         ^  
Pass 2: analysis failed.  Try again with another '--vp 01' option.  

This time SystemTap reports that the function requires -g.

Finally, note that these comments only take effect in library .stp files, i.e. files in a directory passed with -I or in the default tapset path /usr/share/systemtap/tapset.

In the script you run directly, they are ignored. For example:

[root@db-172-16-3-39 ~]# cat test.stp   
function test:long (arg1:long)    
%{ /* pure */  
  STAP_RETVALUE = ++STAP_ARG_arg1 ;  
%}  
  
probe begin {  
  v1 = test($1)  
  printf("%d\n", v1)  
  exit()  
}  
  
[root@db-172-16-3-39 ~]# stap --vp 5 test.stp 99  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
parse error: embedded code in unprivileged script; need stap -g  
        saw: embedded-code at test.stp:2:1  
     source: %{ /* pure */  
             ^  
1 parse error.  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146796virt/23688res/3016shr/21392data kb, in 170usr/0sys/172real ms.  
Pass 1: parse failed.  Try again with another '--vp 1' option.  
Running rm -rf /tmp/stapyH0TYC  
Spawn waitpid result (0x0): 0  
Removed temporary directory "/tmp/stapyH0TYC"  

Even though the function carries /* pure */, it does not help here; -g is still required:

[root@db-172-16-3-39 ~]# stap --vp 5 -g test.stp 99  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146800virt/23704res/3008shr/21396data kb, in 150usr/20sys/173real ms.  
100  

But once the function is moved out into a custom library .stp file, the script runs without -g.

[root@db-172-16-3-39 ~]# vi /tmp/p.stp  
function test:long (arg1:long)  
%{ /* pure */  
  STAP_RETVALUE = ++STAP_ARG_arg1 ;  
%}  

Then change test.stp to:

[root@db-172-16-3-39 ~]# cat test.stp   
probe begin {  
  v1 = test($1)  
  printf("%d\n", v1)  
  exit()  
}  

Run it again with -I /tmp, and this time it works:

[root@db-172-16-3-39 ~]# stap --vp 5 -I /tmp test.stp 99  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Searched: " /tmp/*.stp ", found: 2, processed: 2  
Pass 1: parsed user script and 87 library script(s) using 146916virt/23700res/3012shr/21512data kb, in 170usr/10sys/174real ms.  
100  

References

1. https://sourceware.org/systemtap/langref/Components_SystemTap_script.html

2. http://blog.163.com/digoal@126/blog/static/163877040201381021752228/


digoal's index of PostgreSQL articles