SystemTap Errors Introduce
背景
SystemTap的常见错误大致可以分为两类.
一, 解析和语义阶段产生的错误
这类错误发生在systemtap解析stp脚本以及转换成C代码的阶段.
错误举例
1. 语义错误, 错误表现
parse error: expected foo, saw bar
例如, 缺失handler部分, 导致语义错误.
[root@db-172-16-3-150 share]# stap -e 'probe vfs.read
probe vfs.write'
parse error: expected one of '. , ( ? ! { = +='
saw: keyword at <input>:2:1
source: probe vfs.write
^
parse error: expected one of '. , ( ? ! { = +='
saw: <input> EOF
2 parse errors.
Pass 1: parse failed. [man error::pass1]
补充handler即可修正错误 :
[root@db-172-16-3-150 share]# stap -e 'probe vfs.read {}
probe vfs.write {}'
2. 权限错误
parse error: embedded code in unprivileged script
例如, 在代码中使用了%{ embedded C code }%, 但是未使用stap -g选项会导致这个错误.
[root@db-172-16-3-150 share]# stap -e '
function square:long (i:long) %{
STAP_RETVALUE = STAP_ARG_i * STAP_ARG_i;
%}
probe begin {
i=square(9)
println(i)
exit()
}'
parse error: embedded code in unprivileged script; need stap -g
saw: embedded-code at <input>:2:31
source: function square:long (i:long) %{
^
1 parse error.
Pass 1: parse failed. [man error::pass1]
使用-g选项修正错误.
[root@db-172-16-3-150 share]# stap -g -e '
function square:long (i:long) %{
STAP_RETVALUE = STAP_ARG_i * STAP_ARG_i;
%}
probe begin {
i=square(9)
println(i)
exit()
}'
81
3. 类型匹配错误
semantic error: type mismatch for identifier 'foo' ... string vs. long
例如 :
[root@db-172-16-3-150 share]# stap -e '
probe begin {
a = 10
a = execname()
println("a is:", a)
exit()
}'
semantic error: type mismatch (long vs. string): identifier 'a' at <input>:3:3
source: a = 10
^
semantic error: type was first inferred here (string): identifier 'a' at :3:3
source: a = 10
^
Pass 2: analysis failed. [man error::pass2]
a开始=10, 是long类型, 后来又赋值execname(), 是string, 所以发生了不匹配的错误.
使用一致的类型修正即可.
[root@db-172-16-3-150 share]# stap -e '
probe begin {
a = 10
a = pid()
println("a is:", a)
exit()
}'
a is:23014
4. 不能推测出变量的类型时, 会报这个错误.
semantic error: unresolved type for identifier 'foo'
例如, 在printf函数中使用了一个未初始化的变量.
[root@db-172-16-3-150 share]# stap -e '
probe begin {
println("v is:", v)
exit()
}'
WARNING: never-assigned local variable 'v' : identifier 'v' at <input>:3:20
source: println("v is:", v)
^
semantic error: unresolved type : identifier 'v' at :3:20
source: println("v is:", v)
^
semantic error: unresolved type : identifier 'println' at :3:3
source: println("v is:", v)
^
Pass 2: analysis failed. [man error::pass2]
变量初始化即可解决 :
[root@db-172-16-3-150 share]# stap -e '
probe begin {
v = 100
println("v is:", v)
exit()
}'
v is:100
5. 当赋值对象不是一个有效的变量或数组元素时, 会报如下错误.
semantic error: Expecting symbol or array index expression.
例如 :
[root@db-172-16-3-150 share]# stap -e '
probe begin {
println("hello") = 1
exit()
}'
semantic error: Expecting symbol or array index expression: identifier 'println' at <input>:3:3
source: println("hello") = 1
^
Pass 2: analysis failed. [man error::pass2]
6. 调用函数时, 传入的参数个数和函数参数个数不匹配.
或者是数组的索引个数不匹配时报错.
while searching for arity N function, semantic error: unresolved function call
例如 :
函数参数个数不匹配
[root@db-172-16-3-150 share]# stap -e '
function add:long (a:long, b:long) {
return a+b
}
global arr
probe begin {
println("add(10): ", add(10))
exit()
}'
WARNING: mismatched arity-2 function found: identifier 'add' at <input>:2:10
source: function add:long (a:long, b:long) {
^
semantic error: unresolved arity-1 function: identifier 'add' at :7:24
source: println("add(10): ", add(10))
^
Pass 2: analysis failed. [man error::pass2]
数组索引个数不匹配
[root@db-172-16-3-150 share]# stap -e '
global arr
probe begin {
arr[1,2,3]="hello"
println("arr: ", arr[1,2])
exit()
}'
semantic error: inconsistent arity (3 vs 2): identifier 'arr' at <input>:5:20
source: println("arr: ", arr[1,2])
^
semantic error: arity 3 first inferred here: identifier 'arr' at :4:3
source: arr[1,2,3]="hello"
^
Pass 2: analysis failed. [man error::pass2]
7. 当数组变量未定义为全局变量时报错,
semantic error: array locals not supported, missing global declaration?
例如 :
[root@db-172-16-3-150 share]# stap -e '
probe begin {
arr[1,2]= "hello"
exit()
}'
semantic error: unresolved arity-2 global array arr, missing global declaration?: identifier 'arr' at <input>:3:3
source: arr[1,2]= "hello"
^
Pass 2: analysis failed. [man error::pass2]
8. 在foreach中, 不允许修改数组的值, 否则会报错. 这样的限制是为了提高stap 一个handler的运行速度. 减少带来的性能问题.
semantic error: variable ’foo’ modi?ed during ’foreach’ iteration
例如 :
[root@db-172-16-3-150 share]# stap -e '
global arr
probe begin {
arr[1]="a"
arr[2]="b"
foreach(idx in arr)
arr[idx]="new"
exit()
}'
semantic error: variable 'arr' modified during 'foreach' iteration: identifier 'arr' at <input>:7:5
source: arr[idx]="new"
^
Pass 2: analysis failed. [man error::pass2]
9. 当event不存在或者在tapset库中无法找到时, 会报如下错误
semantic error: probe point mismatch at position N, while resolving probe point foo
例如 :
[root@db-172-16-3-150 share]# stap -e '
probe test {
}'
semantic error: while resolving probe point: identifier 'test' at <input>:2:7
source: probe test {
^
semantic error: probe point mismatch (alternatives: __nd_syscall __nfs __scheduler __signal __tcpmib __vm _linuxmib _nfs _signal _sunrpc _syscall _vfs begin begin(number) end end(number) error error(number) generic ioblock ioblock_trace ioscheduler ioscheduler_trace ipmib irq_handler java(number) java(string) kernel kprobe kprocess linuxmib module(string) nd_syscall netdev netfilter never nfs nfsd perf process process(number) process(string) procfs procfs(string) scheduler scsi signal socket softirq stap staprun sunrpc syscall tcp tcpmib timer tty udp vfs vm workqueue): identifier 'test' at :2:7
source: probe test {
^
Pass 2: analysis failed. [man error::pass2]
10. 当探针中的函数不存在时, 报如下错误. 例如kernel.function("test"), test函数不存在.
semantic error: no match for probe point, while resolving probe point foo
例如 :
[root@db-172-16-3-150 share]# stap -e '
probe kernel.function("test") {
}'
semantic error: while resolving probe point: identifier 'kernel' at <input>:2:7
source: probe kernel.function("test") {
^
semantic error: no match (similar functions: bs, del, dget, dput, eat)
Pass 2: analysis failed. [man error::pass2]
11. 在handler中获取探针处的上下文变量(target variables)的值时, 可能由于变量值不可获取(或变量不存在等)报错 :
semantic error: unresolved target-symbol expression
例如 :
[root@db-172-16-3-150 share]# stap -e '
probe vfs.read {
println($$vars)
exit()
}'
file=0xffff8818169bc140 buf=0x7fff453edb70 count=0x2004 pos=0xffff88141aa27f48 ret=?
读取一个不存在的target variable将报错 :
[root@db-172-16-3-150 share]# stap -e '
probe vfs.read {
println($abc)
exit()
}'
semantic error: unable to find local 'abc', [man error::dwarf] dieoffset 0x125bd59 in kernel, near pc 0xffffffff81181610 in vfs_read fs/read_write.c (alternatives: $file $buf $count $pos $ret): identifier '$abc' at <input>:3:11
source: println($abc)
^
Pass 2: analysis failed. [man error::pass2]
或者该变量的地址中无法获得相应的值.
[root@db-172-16-3-150 share]# stap -e '
probe vfs.read {
println($ret)
exit()
}'
semantic error: not accessible at this address [man error::dwarf] (0xffffffff81181610, dieoffset: 0x125bdbd): identifier '$ret' at <input>:3:11
source: println($ret)
^
Pass 2: analysis failed. [man error::pass2]
这个错误也可能是由于代码优化导致的.
This may be a result of compiler optimization of the generated code.
12. 当安装的kernel-debuginfo包和运行的kernel版本不一致, 或者需要探针对应的包的debuginfo但是对应的debuginfo包版本不一致时可能产生如下类型的错误.
semantic error: libdw? failure
例如 :
[root@db-172-16-3-150 share]# uname -r
2.6.32-358.el6.x86_64
[root@db-172-16-3-150 share]# rpm -qa|grep kernel-debuginfo
kernel-debuginfo-2.6.32-358.23.2.el6.centos.plus.x86_64
kernel-debuginfo-common-x86_64-2.6.32-358.23.2.el6.centos.plus.x86_64
[root@db-172-16-3-150 share]# stap -e '
probe vfs.read {
println($$vars)
exit()
}'
semantic error: while resolving probe point: identifier 'kernel' at /opt/systemtap/share/systemtap/tapset/linux/vfs.stp:768:18
source: probe vfs.read = kernel.function("vfs_read")
^
semantic error: missing x86_64 kernel/module debuginfo [man warning::debuginfo] under '/lib/modules/2.6.32-358.el6.x86_64/build'
semantic error: while resolving probe point: identifier 'vfs' at <input>:2:7
source: probe vfs.read {
^
semantic error: no match
Pass 2: analysis failed. [man error::pass2]
安装与kernel版本对应的kernel-debuginfo包即可.
[root@db-172-16-3-150 share]# yum install -y kernel-debuginfo-2.6.32-358.el6.x86_64
或者本文第13条中的例子中如果使用了不同版本的debuginfo, 也是会报类似错误.
rpm -ivh coreutils-debuginfo.x86_64 0:8.4-19.el6_4.2
[root@db-172-16-3-150 share]# rpm -qa|grep coreutils
coreutils-debuginfo-8.4-19.el6_4.2.x86_64
coreutils-libs-8.4-19.el6.x86_64
coreutils-8.4-19.el6.x86_64
policycoreutils-2.0.83-19.30.el6.x86_64
[root@db-172-16-3-150 share]# stap -d /bin/ls --ldd -e 'probe process("ls").function("xmalloc") {print_usyms(ubacktrace())}' -c "ls /"
WARNING: cannot find module /bin/ls debuginfo: No DWARF information found [man warning::debuginfo]
semantic error: while resolving probe point: identifier 'process' at <input>:1:7
source: probe process("ls").function("xmalloc") {print_usyms(ubacktrace())}
^
semantic error: no match
Pass 2: analysis failed. [man error::pass2]
13. 当需要探针对应的包的debuginfo时, 但是该包未安装. 会产生类似如下错误.
semantic error: cannot find foo debuginfo
例如 :
[root@db-172-16-3-150 pg93]# stap -d /bin/ls --ldd -e 'probe process("ls").function("xmalloc") {print_usyms(ubacktrace())}' -c "ls /"
WARNING: cannot find module /bin/ls debuginfo: No DWARF information found [man warning::debuginfo]
semantic error: while resolving probe point: identifier 'process' at <input>:1:7
source: probe process("ls").function("xmalloc") {print_usyms(ubacktrace())}
^
semantic error: no match
Pass 2: analysis failed. [man error::pass2]
安装对应的debuginfo即可解决
查找/bin/ls所在的包名
[root@db-172-16-3-150 pg93]# rpm -qf /bin/ls
coreutils-8.4-19.el6.x86_64
安装coreutils对于的debuginfo包.
[root@db-172-16-3-150 pg93]# yum install -y coreutils-debuginfo-8.4-19.el6.x86_64
二, 生产模块后, 模块在内核中运行阶段产生的错误和警告.
这类错误发生在运行时, staprun通过模块与内核交互, 采集数据的阶段.
错误举例
1. 执行过程中产生了多少错误以及跳过了多少probe.
WARNING: Number of errors: N, skipped probes: M
例如
[root@db-172-16-3-150 share]# stap -e '
probe begin {
error("1.error funn\n")
}
probe end {
printf("2.end probe\n")
}
probe error {
printf("3.error probe\n")
}'
ERROR: 1.error funn
3.error probe
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed. [man error::pass5]
2. 除数为0时报错
division by 0
例如
[root@db-172-16-3-150 share]# stap -e '
probe begin {
println(10/0)
exit()
}'
ERROR: division by 0 near operator '/' at <input>:3:13
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed. [man error::pass5]
3. 当统计类型变量中没有元素, 但是使用了@count, @sum以外的操作符(avg, min, max)时, 会报如下错误
aggregate element not found
例如
[root@db-172-16-3-150 share]# /usr/bin/stap -e '
global s
probe begin {
println(@count(s))
exit()
}'
WARNING: never assigned global variable 's' : identifier 's' at <input>:2:8
source: global s
^
0
[root@db-172-16-3-150 share]# /usr/bin/stap -e '
global s
probe begin {
println(@sum(s))
exit()
}'
WARNING: never assigned global variable 's' : identifier 's' at <input>:2:8
source: global s
^
0
avg, min, max报错
[root@db-172-16-3-150 share]# /usr/bin/stap -e '
global s
probe begin {
println(@avg(s))
exit()
}'
WARNING: never assigned global variable 's' : identifier 's' at <input>:2:8
source: global s
^
ERROR: empty aggregate near identifier '@avg' at <input>:4:11
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed. Try again with another '--vp 00001' option.
[root@db-172-16-3-150 share]# /usr/bin/stap -e '
global s
probe begin {
println(@min(s))
exit()
}'
WARNING: never assigned global variable 's' : identifier 's' at <input>:2:8
source: global s
^
ERROR: empty aggregate near identifier '@min' at <input>:4:11
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed. Try again with another '--vp 00001' option.
[root@db-172-16-3-150 share]# /usr/bin/stap -e '
global s
probe begin {
println(@max(s))
exit()
}'
WARNING: never assigned global variable 's' : identifier 's' at <input>:2:8
source: global s
^
ERROR: empty aggregate near identifier '@max' at <input>:4:11
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed. Try again with another '--vp 00001' option.
4. 数组中包含的索引个数超出数组初始化的元素个数时, 报错
aggregation overflow
Array overflow
例如 :
[root@db-172-16-3-150 share]# stap -e '
global arr[10]
probe timer.ms(1) {
arr[gettimeofday_ms()] <<< gettimeofday_ms()
}
probe timer.s(1) {
foreach (i in arr) {
println(@count(arr[i]))
}
}'
ERROR: Array overflow, check size limit (10) near identifier 'arr' at <input>:4:3
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed. [man error::pass5]
解决办法, 使用-D MAXMAPENTRIES=n 指定更大的元素初始值, 或者使用global arr[n] 定义更大的初始值.
5. 函数嵌套调用次数超出限制
MAXNESTING exceeded
例如
[root@db-172-16-3-150 share]# stap -e '
> function fibonacci(i) {
> if (i < 1) error ("bad number")
> if (i == 1) return 1
> if (i == 2) return 2
> return fibonacci (i-1) + fibonacci (i-2)
> }
> probe begin {
> println(fibonacci(10))
> exit()
> }
> '
89
[root@db-172-16-3-150 share]# stap -e '
function fibonacci(i) {
if (i < 1) error ("bad number")
if (i == 1) return 1
if (i == 2) return 2
return fibonacci (i-1) + fibonacci (i-2)
}
probe begin {
println(fibonacci(100))
exit()
}
'
ERROR: MAXNESTING exceeded near identifier 'fibonacci' at <input>:2:10
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed. [man error::pass5]
解决办法, 使用-D MAXNESTING=n指定更大的允许嵌套次数
6. 当handler执行的语句数超出限制时报错
MAXACTION exceeded
例如 :
[root@db-172-16-3-150 share]# stap -e '
> probe begin {
> for(i=0;i<10000;i++) {
> }
> exit()
> }'
ERROR: MAXACTION exceeded near keyword at <input>:3:3
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed. [man error::pass5]
解决办法, 使用-D MAXACTION=n 提高限制数.
7. 当地址不存在, 或者其他原因导致获取制定地址信息错误.
kernel/user string copy fault at ADDR
例如 :
[root@db-172-16-3-150 share]# stap -e '
> probe begin {
> println(user_string(123))
> exit()
> }'
ERROR: user string copy fault -14 at 000000000000007b near identifier 'user_string_n' at /opt/systemtap/share/systemtap/tapset/uconversions.stp:120:10
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed. [man error::pass5]
[root@db-172-16-3-150 share]# stap -e '
probe begin {
println(kernel_string(123))
exit()
}'
ERROR: kernel string copy fault at 0x000000000000007b near identifier 'kernel_string' at /opt/systemtap/share/systemtap/tapset/linux/conversions.stp:18:10
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed. [man error::pass5]
[root@db-172-16-3-150 share]# stap -e '
probe begin {
println(kernel_int(123))
exit()
}'
ERROR: kernel int copy fault at 0x000000000000007b near identifier 'kernel_int' at /opt/systemtap/share/systemtap/tapset/linux/conversions.stp:198:10
WARNING: Number of errors: 1, skipped probes: 0
WARNING: /opt/systemtap/bin/staprun exited with status: 1
Pass 5: run failed. [man error::pass5]
8. 取消引用上下文指针变量时的报错.
pointer dereference fault
There was a fault encountered during a pointer dereference operation such as a target variable evaluation.
参考
1. https://sourceware.org/systemtap/SystemTap_Beginners_Guide/errors.html
2. https://sourceware.org/systemtap/SystemTap_Beginners_Guide/runtimeerror.html
3. https://sourceware.org/systemtap/wiki/TipExhaustedResourceErrors