Introduction to SystemTap Errors


Background

Common SystemTap errors fall roughly into two categories.  
  
I. Errors produced during the parsing and semantic-analysis passes  
  
These errors occur while SystemTap parses the .stp script and translates it into C code.  
  
Examples  
1. Parse (syntax) errors, reported as:  
parse error: expected foo, saw bar  
For example, omitting the handler body after each probe point causes parse errors:  
[root@db-172-16-3-150 share]# stap -e 'probe vfs.read                                  
probe vfs.write'  
parse error: expected one of '. , ( ? ! { = +='  
        saw: keyword at <input>:2:1  
     source: probe vfs.write  
             ^  
parse error: expected one of '. , ( ? ! { = +='  
        saw: <input> EOF  
2 parse errors.  
Pass 1: parse failed.  [man error::pass1]  
Adding the handler bodies fixes the errors:  
[root@db-172-16-3-150 share]# stap -e 'probe vfs.read {}                                  
probe vfs.write {}'  
  
2. Privilege error  
parse error: embedded code in unprivileged script  
For example, embedding C code with %{ ... %} without passing the stap -g (guru mode) option causes this error:  
[root@db-172-16-3-150 share]# stap -e '     
function square:long (i:long) %{  
  STAP_RETVALUE = STAP_ARG_i * STAP_ARG_i;  
%}  
probe begin {  
  i=square(9)  
  println(i)  
  exit()  
}'  
parse error: embedded code in unprivileged script; need stap -g  
        saw: embedded-code at <input>:2:31  
     source: function square:long (i:long) %{  
                                           ^  
1 parse error.  
Pass 1: parse failed.  [man error::pass1]  
Passing -g fixes it:  
[root@db-172-16-3-150 share]# stap -g -e '  
function square:long (i:long) %{  
  STAP_RETVALUE = STAP_ARG_i * STAP_ARG_i;  
%}  
probe begin {  
  i=square(9)  
  println(i)  
  exit()  
}'  
81  
  
3. Type mismatch error  
semantic error: type mismatch for identifier 'foo' ... string vs. long  
For example:  
[root@db-172-16-3-150 share]# stap -e '  
probe begin {  
  a = 10  
  a = execname()  
  println("a is:", a)  
  exit()  
}'  
semantic error: type mismatch (long vs. string): identifier 'a' at <input>:3:3  
        source:   a = 10  
                  ^  
  
semantic error: type was first inferred here (string): identifier 'a' at :3:3  
        source:   a = 10  
                  ^  
  
Pass 2: analysis failed.  [man error::pass2]  
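a is first assigned 10, which is a long, and then execname(), which is a string, hence the type mismatch.  
Fix it by keeping the type consistent, e.g. assign pid(), which also returns a long:  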
[root@db-172-16-3-150 share]# stap -e '  
probe begin {  
  a = 10  
  a = pid()       
  println("a is:", a)  
  exit()  
}'  
a is:23014  
  
4. This error is reported when the type of a variable cannot be inferred:  
semantic error: unresolved type for identifier 'foo'  
For example, passing a never-assigned variable to println:  
[root@db-172-16-3-150 share]# stap -e '  
probe begin {  
  println("v is:", v)  
  exit()  
}'  
WARNING: never-assigned local variable 'v' : identifier 'v' at <input>:3:20  
 source:   println("v is:", v)  
                            ^  
semantic error: unresolved type : identifier 'v' at :3:20  
        source:   println("v is:", v)  
                                   ^  
  
semantic error: unresolved type : identifier 'println' at :3:3  
        source:   println("v is:", v)  
                  ^  
  
Pass 2: analysis failed.  [man error::pass2]  
Initializing the variable solves it:  
[root@db-172-16-3-150 share]# stap -e '  
probe begin {  
  v = 100  
  println("v is:", v)  
  exit()  
}'  
v is:100  
  
5. This error is reported when the target of an assignment is not a variable or an array element:  
semantic error: Expecting symbol or array index expression.  
For example:  
[root@db-172-16-3-150 share]# stap -e '  
probe begin {  
  println("hello") = 1  
  exit()  
}'  
semantic error: Expecting symbol or array index expression: identifier 'println' at <input>:3:3  
        source:   println("hello") = 1  
                  ^  
  
Pass 2: analysis failed.  [man error::pass2]  
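The transcript above stops at the error. A minimal corrected sketch, assuming the intent was simply to store a value: the left-hand side of = must be a variable or an array element. The names x and arr are illustrative only.

```shell
stap -e '
global arr
probe begin {
  x = 1             # a plain variable is a valid assignment target
  arr["hello"] = 1  # so is an element of a global array
  println(x + arr["hello"])
  exit()
}'
```

This should print 2.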
  
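6. Arity mismatch: a function is called with a different number of arguments than it declares, or an array is accessed with a different number of indexes than elsewhere in the script.  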
while searching for arity N function, semantic error: unresolved function call  
For example:  
Mismatched function argument count:  
[root@db-172-16-3-150 share]# stap -e '  
function add:long (a:long, b:long) {  
  return a+b  
}  
global arr  
probe begin {  
  println("add(10): ", add(10))  
  exit()  
}'  
WARNING: mismatched arity-2 function found: identifier 'add' at <input>:2:10  
 source: function add:long (a:long, b:long) {  
                  ^  
semantic error: unresolved arity-1 function: identifier 'add' at :7:24  
        source:   println("add(10): ", add(10))  
                                       ^  
  
Pass 2: analysis failed.  [man error::pass2]  
Mismatched array index count:  
[root@db-172-16-3-150 share]# stap -e '  
global arr  
probe begin {  
  arr[1,2,3]="hello"  
  println("arr: ", arr[1,2])  
  exit()  
}'  
semantic error: inconsistent arity (3 vs 2): identifier 'arr' at <input>:5:20  
        source:   println("arr: ", arr[1,2])  
                                   ^  
  
semantic error: arity 3 first inferred here: identifier 'arr' at :4:3  
        source:   arr[1,2,3]="hello"  
                  ^  
  
Pass 2: analysis failed.  [man error::pass2]  
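A sketch of both fixes, assuming add() really should take two arguments and arr really should use three-part keys: call the function with every declared argument, and use the same number of indexes wherever the array appears.

```shell
stap -e '
function add:long (a:long, b:long) {
  return a + b
}
global arr
probe begin {
  # pass exactly as many arguments as add() declares
  println("add(10, 20): ", add(10, 20))
  # index arr with the same number of keys everywhere
  arr[1,2,3] = "hello"
  println("arr: ", arr[1,2,3])
  exit()
}'
```

Expected output is `add(10, 20): 30` followed by `arr: hello`.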
  
7. Using an array that has not been declared global causes this error:  
semantic error: array locals not supported, missing global declaration?  
For example:  
[root@db-172-16-3-150 share]# stap -e '  
probe begin {  
  arr[1,2]= "hello"  
  exit()  
}'  
semantic error: unresolved arity-2 global array arr, missing global declaration?: identifier 'arr' at <input>:3:3  
        source:   arr[1,2]= "hello"  
                  ^  
  
Pass 2: analysis failed.  [man error::pass2]  
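The fix is the declaration the error message asks for. A minimal sketch of the same script with arr declared global:

```shell
stap -e '
global arr          # arrays must be declared at file scope with global
probe begin {
  arr[1,2] = "hello"
  println(arr[1,2])
  exit()
}'
```

This should print hello.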
  
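8. An array cannot be modified inside a foreach loop that iterates over it; doing so is an error. The restriction is meant to keep handlers fast and avoid the performance problems that modification during iteration would cause.  
semantic error: variable 'foo' modified during 'foreach' iteration  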
For example:  
[root@db-172-16-3-150 share]# stap -e '  
global arr  
probe begin {  
  arr[1]="a"  
  arr[2]="b"  
  foreach(idx in arr)   
    arr[idx]="new"  
  exit()  
}'  
semantic error: variable 'arr' modified during 'foreach' iteration: identifier 'arr' at <input>:7:5  
        source:     arr[idx]="new"  
                    ^  
  
Pass 2: analysis failed.  [man error::pass2]  
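One workaround, sketched under the assumption that the goal is to update every element: iterate over arr, write the new values into a second array (tmp, an illustrative name), then copy them back in a loop over tmp. Only the array being iterated is locked, so assigning to arr there is allowed.

```shell
stap -e '
global arr, tmp
probe begin {
  arr[1] = "a"
  arr[2] = "b"
  # read arr while iterating, but write the new values into tmp
  foreach (idx in arr)
    tmp[idx] = "new"
  # arr is not being iterated here, so assigning to it is allowed
  foreach (idx in tmp)
    arr[idx] = tmp[idx]
  # idx+ iterates in ascending index order
  foreach (idx+ in arr)
    println(idx, ": ", arr[idx])
  exit()
}'
```

Expected output is `1: new` followed by `2: new`.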
  
9. This error is reported when a probe event does not exist or cannot be found in the tapset library:  
semantic error: probe point mismatch at position N, while resolving probe point foo  
For example:  
[root@db-172-16-3-150 share]# stap -e '  
probe test {  
}'  
semantic error: while resolving probe point: identifier 'test' at <input>:2:7  
        source: probe test {  
                      ^  
  
semantic error: probe point mismatch  (alternatives: __nd_syscall __nfs __scheduler __signal __tcpmib __vm _linuxmib _nfs _signal _sunrpc _syscall _vfs begin begin(number) end end(number) error error(number) generic ioblock ioblock_trace ioscheduler ioscheduler_trace ipmib irq_handler java(number) java(string) kernel kprobe kprocess linuxmib module(string) nd_syscall netdev netfilter never nfs nfsd perf process process(number) process(string) procfs procfs(string) scheduler scsi signal socket softirq stap staprun sunrpc syscall tcp tcpmib timer tty udp vfs vm workqueue): identifier 'test' at :2:7  
        source: probe test {  
                      ^  
  
Pass 2: analysis failed.  [man error::pass2]  
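Instead of guessing a name from the alternatives list, stap -l prints the probe points that match a pattern. A sketch; listing tapset aliases such as vfs.* resolves them to kernel functions, so it may require kernel debuginfo to be installed:

```shell
# List the tapset probe aliases in the vfs family
stap -l 'vfs.*'
# List the tapset probe aliases in the syscall family
stap -l 'syscall.*'
```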
  
10. This error is reported when a function named in a probe point does not exist, e.g. kernel.function("test") when the kernel has no function called test:  
semantic error: no match for probe point, while resolving probe point foo  
For example:  
[root@db-172-16-3-150 share]# stap -e '  
probe kernel.function("test") {  
}'  
semantic error: while resolving probe point: identifier 'kernel' at <input>:2:7  
        source: probe kernel.function("test") {  
                      ^  
  
semantic error: no match (similar functions: bs, del, dget, dput, eat)  
Pass 2: analysis failed.  [man error::pass2]  
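A wildcard in kernel.function() combined with stap -l shows which function names actually exist, which helps when a name is misspelled or differs between kernel versions. A sketch, assuming kernel debuginfo for the running kernel is installed:

```shell
# Which kernel functions have names starting with vfs_?
stap -l 'kernel.function("vfs_*")'
```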
  
11. Reading the value of a context variable (target variable) at the probe point from a handler can fail when the variable does not exist or its value cannot be accessed:  
semantic error: unresolved target-symbol expression  
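For example, $$vars prints the target variables available in vfs.read (a ? marks a value that cannot be read at this point):  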
[root@db-172-16-3-150 share]# stap -e '  
probe vfs.read {  
  println($$vars)  
  exit()  
}'  
file=0xffff8818169bc140 buf=0x7fff453edb70 count=0x2004 pos=0xffff88141aa27f48 ret=?  
Reading a target variable that does not exist fails:  
[root@db-172-16-3-150 share]# stap -e '  
probe vfs.read {  
  println($abc)    
  exit()  
}'  
semantic error: unable to find local 'abc', [man error::dwarf] dieoffset 0x125bd59 in kernel, near pc 0xffffffff81181610 in vfs_read fs/read_write.c (alternatives: $file $buf $count $pos $ret): identifier '$abc' at <input>:3:11  
        source:   println($abc)  
                          ^  
  
Pass 2: analysis failed.  [man error::pass2]  
Reading a variable whose value cannot be obtained at this address, such as $ret here, also fails:  
[root@db-172-16-3-150 share]# stap -e '  
probe vfs.read {  
  println($ret)  
  exit()  
}'  
semantic error: not accessible at this address [man error::dwarf] (0xffffffff81181610, dieoffset: 0x125bdbd): identifier '$ret' at <input>:3:11  
        source:   println($ret)  
                          ^  
  
Pass 2: analysis failed.  [man error::pass2]  
This error may also be a result of compiler optimization of the probed code.  
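stap -L shows which target variables a probe point exposes, so names can be checked before use. For a function's return value, a .return probe with $return is the usual choice. A sketch, assuming kernel debuginfo is installed and vfs_read exists in the running kernel:

```shell
# List the probe point together with the target variables it exposes
stap -L 'kernel.function("vfs_read")'

# $ret is a local of vfs_read() that has no value at function entry;
# the return value is available as $return in a .return probe instead.
stap -e '
probe kernel.function("vfs_read").return {
  println("vfs_read returned ", $return)
  exit()
}'
```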
  
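12. Errors like the following can occur when the installed kernel-debuginfo package does not match the running kernel version, or when a probe needs the debuginfo of another package and the installed debuginfo version does not match that package.  
semantic error: libdwfl failure  
For example, here the running kernel is 2.6.32-358.el6 but the installed kernel-debuginfo is 2.6.32-358.23.2.el6:  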
[root@db-172-16-3-150 share]# uname -r   
2.6.32-358.el6.x86_64  
[root@db-172-16-3-150 share]# rpm -qa|grep kernel-debuginfo  
kernel-debuginfo-2.6.32-358.23.2.el6.centos.plus.x86_64  
kernel-debuginfo-common-x86_64-2.6.32-358.23.2.el6.centos.plus.x86_64  
[root@db-172-16-3-150 share]# stap -e '           
probe vfs.read {  
  println($$vars)  
  exit()  
}'  
semantic error: while resolving probe point: identifier 'kernel' at /opt/systemtap/share/systemtap/tapset/linux/vfs.stp:768:18  
        source: probe vfs.read = kernel.function("vfs_read")  
                                 ^  
  
semantic error: missing x86_64 kernel/module debuginfo [man warning::debuginfo] under '/lib/modules/2.6.32-358.el6.x86_64/build'  
semantic error: while resolving probe point: identifier 'vfs' at <input>:2:7  
        source: probe vfs.read {  
                      ^  
  
semantic error: no match  
Pass 2: analysis failed.  [man error::pass2]  
Installing the kernel-debuginfo package that matches the running kernel fixes it:  
[root@db-172-16-3-150 share]# yum install -y kernel-debuginfo-2.6.32-358.el6.x86_64  
A similar error appears with user-space debuginfo. In the example of item 13 below, suppose coreutils-debuginfo 8.4-19.el6_4.2 is installed (with rpm -ivh) while coreutils itself is 8.4-19.el6:  
[root@db-172-16-3-150 share]# rpm -qa|grep coreutils  
coreutils-debuginfo-8.4-19.el6_4.2.x86_64  
coreutils-libs-8.4-19.el6.x86_64  
coreutils-8.4-19.el6.x86_64  
policycoreutils-2.0.83-19.30.el6.x86_64  
[root@db-172-16-3-150 share]# stap -d /bin/ls --ldd -e 'probe process("ls").function("xmalloc") {print_usyms(ubacktrace())}' -c "ls /"  
WARNING: cannot find module /bin/ls debuginfo: No DWARF information found [man warning::debuginfo]  
semantic error: while resolving probe point: identifier 'process' at <input>:1:7  
        source: probe process("ls").function("xmalloc") {print_usyms(ubacktrace())}  
                      ^  
  
semantic error: no match  
Pass 2: analysis failed.  [man error::pass2]  
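The kernel case boils down to comparing the running kernel release with the installed kernel-debuginfo version. A small sketch of that check, assuming an RPM-based system like the one above; check_kernel_debuginfo is a hypothetical helper name. On CentOS the debuginfo repository may also need to be enabled before yum can find the package.

```shell
# Hypothetical helper (not part of SystemTap): report whether the
# kernel-debuginfo package for the running kernel is installed.
check_kernel_debuginfo() {
  want="kernel-debuginfo-$(uname -r)"
  if rpm -q "$want" >/dev/null 2>&1; then
    echo "OK: $want is installed"
  else
    echo "MISSING: $want (install it, e.g. yum install -y $want)"
  fi
}
check_kernel_debuginfo
```

On the machine above, uname -r returns 2.6.32-358.el6.x86_64, so the helper looks for kernel-debuginfo-2.6.32-358.el6.x86_64.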
  
13. When a probe needs the debuginfo of a package that is not installed, an error like the following is reported:  
semantic error: cannot find foo debuginfo  
For example:  
[root@db-172-16-3-150 pg93]# stap -d /bin/ls --ldd -e 'probe process("ls").function("xmalloc") {print_usyms(ubacktrace())}' -c "ls /"  
WARNING: cannot find module /bin/ls debuginfo: No DWARF information found [man warning::debuginfo]  
semantic error: while resolving probe point: identifier 'process' at <input>:1:7  
        source: probe process("ls").function("xmalloc") {print_usyms(ubacktrace())}  
                      ^  
semantic error: no match  
Pass 2: analysis failed.  [man error::pass2]  
Installing the matching debuginfo package solves it.  
First find the package that owns /bin/ls:  
[root@db-172-16-3-150 pg93]# rpm -qf /bin/ls  
coreutils-8.4-19.el6.x86_64  
Then install the debuginfo package for that exact coreutils version:  
[root@db-172-16-3-150 pg93]# yum install -y coreutils-debuginfo-8.4-19.el6.x86_64  
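The two steps above can be combined so that the debuginfo version always matches the installed binary package. A sketch; debuginfo_pkg_for is a hypothetical helper, and it assumes the debuginfo package is named after the owning RPM, which holds for coreutils but not for every package. debuginfo-install from yum-utils resolves such names for you.

```shell
# Hypothetical helper (not part of SystemTap or rpm): print the debuginfo
# package name matching the exact version of the RPM that owns a file.
debuginfo_pkg_for() {
  out=$(rpm -qf --qf '%{NAME}-debuginfo-%{VERSION}-%{RELEASE}.%{ARCH}\n' "$1" 2>/dev/null) || return 1
  printf '%s\n' "$out" | head -n 1
}

if pkg=$(debuginfo_pkg_for /bin/ls); then
  echo "install: $pkg"    # e.g. yum install -y "$pkg"
else
  echo "no RPM owns /bin/ls"
fi
# Alternative that resolves names and versions itself: debuginfo-install coreutils
```

On the machine above this would print install: coreutils-debuginfo-8.4-19.el6.x86_64.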
  
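II. Errors and warnings at run time, after the module has been built and loaded into the kernel  
These errors occur at run time, while staprun drives the module that interacts with the kernel and collects data.  
Examples  
1. Summary of how many errors occurred and how many probes were skipped during execution:  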
WARNING: Number of errors: N, skipped probes: M  
For example:  
[root@db-172-16-3-150 share]# stap -e '  
probe begin {  
  error("1.error funn\n")  
}  
probe end {  
  printf("2.end probe\n")  
}  
probe error {  
  printf("3.error probe\n")  
}'  
ERROR: 1.error funn  
3.error probe  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /opt/systemtap/bin/staprun exited with status: 1  
Pass 5: run failed.  [man error::pass5]  
  
2. Division by zero  
division by 0  
For example:  
[root@db-172-16-3-150 share]# stap -e '  
probe begin {  
  println(10/0)  
  exit()  
}'  
ERROR: division by 0 near operator '/' at <input>:3:13  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /opt/systemtap/bin/staprun exited with status: 1  
Pass 5: run failed.  [man error::pass5]  
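As the transcript shows, the error makes the run fail, so the usual fix is to check the divisor before dividing. A minimal sketch; divisor is an illustrative variable:

```shell
stap -e '
probe begin {
  divisor = 0
  # check the divisor before dividing instead of letting the handler fail
  if (divisor != 0)
    println(10 / divisor)
  else
    println("divisor is 0, skipping division")
  exit()
}'
```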
  
3. Applying @avg, @min or @max (any operator other than @count and @sum) to a statistics aggregate that holds no values causes this error:  
aggregate element not found  
For example, @count and @sum return 0 on an empty aggregate:  
[root@db-172-16-3-150 share]# /usr/bin/stap -e '  
global s  
probe begin {  
  println(@count(s))     
  exit()  
}'  
WARNING: never assigned global variable 's' : identifier 's' at <input>:2:8  
 source: global s  
                ^  
0  
[root@db-172-16-3-150 share]# /usr/bin/stap -e '  
global s  
probe begin {  
  println(@sum(s))    
  exit()  
}'  
WARNING: never assigned global variable 's' : identifier 's' at <input>:2:8  
 source: global s  
                ^  
0  
whereas @avg, @min and @max fail:  
[root@db-172-16-3-150 share]# /usr/bin/stap -e '  
global s  
probe begin {  
  println(@avg(s))  
  exit()  
}'  
WARNING: never assigned global variable 's' : identifier 's' at <input>:2:8  
 source: global s  
                ^  
ERROR: empty aggregate near identifier '@avg' at <input>:4:11  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /usr/bin/staprun exited with status: 1  
Pass 5: run failed.  Try again with another '--vp 00001' option.  
[root@db-172-16-3-150 share]# /usr/bin/stap -e '  
global s  
probe begin {  
  println(@min(s))  
  exit()  
}'  
WARNING: never assigned global variable 's' : identifier 's' at <input>:2:8  
 source: global s  
                ^  
ERROR: empty aggregate near identifier '@min' at <input>:4:11  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /usr/bin/staprun exited with status: 1  
Pass 5: run failed.  Try again with another '--vp 00001' option.  
[root@db-172-16-3-150 share]# /usr/bin/stap -e '  
global s  
probe begin {  
  println(@max(s))  
  exit()  
}'  
WARNING: never assigned global variable 's' : identifier 's' at <input>:2:8  
 source: global s  
                ^  
ERROR: empty aggregate near identifier '@max' at <input>:4:11  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /usr/bin/staprun exited with status: 1  
Pass 5: run failed.  Try again with another '--vp 00001' option.  
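Since @count is safe on an empty aggregate, it can guard the other operators. A minimal sketch using the same never-assigned global s:

```shell
stap -e '
global s
probe begin {
  # @count is safe on an empty aggregate, so use it as a guard
  if (@count(s) > 0)
    printf("avg=%d min=%d max=%d\n", @avg(s), @min(s), @max(s))
  else
    println("no samples in s yet")
  exit()
}'
```

Apart from the never-assigned warning on stderr, this should print no samples in s yet.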
  
4. This error is reported when an array would hold more elements than its maximum size:  
aggregation overflow  
Array overflow  
For example:  
[root@db-172-16-3-150 share]# stap -e '  
global arr[10]  
probe timer.ms(1) {  
  arr[gettimeofday_ms()] <<< gettimeofday_ms()  
}  
probe timer.s(1) {  
  foreach (i in arr) {  
    println(@count(arr[i]))  
  }  
}'  
ERROR: Array overflow, check size limit (10) near identifier 'arr' at <input>:4:3  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /opt/systemtap/bin/staprun exited with status: 1  
Pass 5: run failed.  [man error::pass5]  
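Fix: use -D MAXMAPENTRIES=n to raise the default maximum number of elements per array, or declare a larger bound with global arr[n]. An explicit bound such as global arr[10] applies to that array regardless of MAXMAPENTRIES, so in this example the declaration itself has to change.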
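A sketch of the fix for this example, assuming the intent is a per-second summary: drop the explicit [10] bound so that MAXMAPENTRIES applies, raise it with -D, and clear the array after each report so it cannot grow without bound. The value 100000 is arbitrary. The script runs until interrupted with Ctrl-C.

```shell
stap -D MAXMAPENTRIES=100000 -e '
global arr               # no explicit bound, so MAXMAPENTRIES applies
probe timer.ms(1) {
  arr[gettimeofday_ms()] <<< gettimeofday_ms()
}
probe timer.s(1) {
  foreach (i in arr) {
    println(@count(arr[i]))
  }
  delete arr             # start each interval with an empty array
}'
```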
  
5. Function calls nested deeper than the limit  
MAXNESTING exceeded  
For example, fibonacci(10) stays within the limit:  
[root@db-172-16-3-150 share]# stap -e '  
function fibonacci(i) {  
    if (i < 1) error ("bad number")  
    if (i == 1) return 1  
    if (i == 2) return 2  
    return fibonacci (i-1) + fibonacci (i-2)  
}  
probe begin {  
  println(fibonacci(10))  
  exit()  
}  
'  
89  
fibonacci(100), however, exceeds the nesting limit:  
[root@db-172-16-3-150 share]# stap -e '  
function fibonacci(i) {  
    if (i < 1) error ("bad number")  
    if (i == 1) return 1  
    if (i == 2) return 2  
    return fibonacci (i-1) + fibonacci (i-2)  
}  
probe begin {  
  println(fibonacci(100))  
  exit()  
}  
'  
ERROR: MAXNESTING exceeded near identifier 'fibonacci' at <input>:2:10  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /opt/systemtap/bin/staprun exited with status: 1  
Pass 5: run failed.  [man error::pass5]  
Fix: use -D MAXNESTING=n to allow deeper nesting.  
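Raising MAXNESTING only helps when recursion depth is the real problem. This recursive fibonacci() makes an exponential number of calls, so fibonacci(100) would most likely run into MAXACTION next, and with this definition any result beyond fibonacci(91) no longer fits in SystemTap's 64-bit long. A sketch of an iterative rewrite that needs no nesting at all; the loop should stay well within the default MAXACTION limit:

```shell
stap -e '
function fibonacci:long (n:long) {
  if (n < 1) error("bad number")
  a = 1                  # fibonacci(1)
  b = 2                  # fibonacci(2)
  if (n == 1) return a
  for (k = 3; k <= n; k++) {
    t = a + b            # fibonacci(k) = fibonacci(k-1) + fibonacci(k-2)
    a = b
    b = t
  }
  return b
}
probe begin {
  println(fibonacci(10))
  println(fibonacci(90))   # the largest value that fits is fibonacci(91)
  exit()
}'
```

The first line should be 89, matching the recursive version, and the second 4660046610375530309.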
  
6. A handler executes more statements than the limit allows  
MAXACTION exceeded  
For example:  
[root@db-172-16-3-150 share]# stap -e '  
probe begin {  
  for(i=0;i<10000;i++) {  
  }  
  exit()  
}'  
ERROR: MAXACTION exceeded near keyword at <input>:3:3  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /opt/systemtap/bin/staprun exited with status: 1  
Pass 5: run failed.  [man error::pass5]  
Fix: raise the limit with -D MAXACTION=n.  
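The same loop with a higher limit, as a sketch. The value 100000 is arbitrary and leaves headroom, since the exact number of actions a loop iteration consumes is an implementation detail:

```shell
stap -D MAXACTION=100000 -e '
probe begin {
  for (i = 0; i < 10000; i++) {
  }
  println(i)   # prints the final loop counter once the loop completes
  exit()
}'
```

This should print 10000.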
  
7. Copying from an address fails because the address does not exist or cannot be read for another reason:  
kernel/user string copy fault at ADDR  
For example:  
[root@db-172-16-3-150 share]# stap -e '  
probe begin {  
  println(user_string(123))  
  exit()  
}'  
ERROR: user string copy fault -14 at 000000000000007b near identifier 'user_string_n' at /opt/systemtap/share/systemtap/tapset/uconversions.stp:120:10  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /opt/systemtap/bin/staprun exited with status: 1  
Pass 5: run failed.  [man error::pass5]  
[root@db-172-16-3-150 share]# stap -e '  
probe begin {  
  println(kernel_string(123))  
  exit()  
}'  
ERROR: kernel string copy fault at 0x000000000000007b near identifier 'kernel_string' at /opt/systemtap/share/systemtap/tapset/linux/conversions.stp:18:10  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /opt/systemtap/bin/staprun exited with status: 1  
Pass 5: run failed.  [man error::pass5]  
[root@db-172-16-3-150 share]# stap -e '  
probe begin {  
  println(kernel_int(123))     
  exit()  
}'  
ERROR: kernel int copy fault at 0x000000000000007b near identifier 'kernel_int' at /opt/systemtap/share/systemtap/tapset/linux/conversions.stp:198:10  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /opt/systemtap/bin/staprun exited with status: 1  
Pass 5: run failed.  [man error::pass5]  
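When a bad address is possible, the script can handle the fault instead of aborting. A sketch, assuming a SystemTap release that provides user_string2() (which returns a fallback message instead of raising an error) and try/catch; the fallback text is illustrative:

```shell
stap -e '
probe begin {
  # user_string2() returns its second argument instead of raising an error
  println(user_string2(123, "<bad user address>"))
  # try/catch traps the error raised by kernel_string(); msg holds its text
  try {
    println(kernel_string(123))
  } catch (msg) {
    println("caught: ", msg)
  }
  exit()
}'
```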
  
8. A fault while dereferencing a context pointer variable:  
pointer dereference fault  
There was a fault encountered during a pointer dereference operation such as a target variable evaluation.  



digoal's PostgreSQL article index