systemtap SAFETY AND SECURITY


Background

SAFETY AND SECURITY
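SystemTap has two main components: the stap program, which translates and compiles scripts (and performs the safety checks), and the staprun program, which loads the modules built by stap (and performs no additional safety checks).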

       Systemtap is an administrative tool.  It exposes kernel internal data structures and potentially  private  user  
       information.  

The operating system privileges needed to run stap are as follows:

       To actually run the kernel objects it builds, a user must be one of the following:  
  
       ·   the root user;  
  
       ·   a member of the stapdev and stapusr groups;  
  
       ·   a member of the stapsys and stapusr groups; or  
  
       ·   a member of the stapusr group.  

Users who can build and run any systemtap script:

       The  root user or a user who is a member of both the stapdev and stapusr groups can build and run any systemtap  
       script.  

Users who can run pre-built modules under restricted conditions (members of both stapsys and stapusr):

       A user who is a member of both the stapsys and stapusr groups can only use pre-built modules under the  follow-  
       ing conditions:  
  
       ·   The  module  has  been  signed  by a trusted signer. Trusted signers are normally systemtap compile-servers  
           which sign modules when the --privilege option is specified by the client. See  the  stap-server(8)  manual  
           page for more information.  
  
       ·   The module was built using the --privilege=stapsys or the --privilege=stapusr options.  

Users who can run pre-built modules under restricted conditions (members of stapusr only):

       Members of only the stapusr group can only use pre-built modules under the following conditions:  
  
       ·   The  module  is  located  in the /lib/modules/VERSION/systemtap directory.  This directory must be owned by  
           root and not be world writable.  
  
       or  
  
       ·   The module has been signed by a trusted signer. Trusted  signers  are  normally  systemtap  compile-servers  
           which  sign  modules  when the --privilege option is specified by the client. See the stap-server(8) manual  
           page for more information.  
       ·   The module was built using the --privilege=stapusr option.  
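
The group rules quoted above can be summarized as a small decision function. This is only a sketch of a literal reading of the man page text, not the logic staprun actually implements; the `Module` record and the function names are hypothetical, and the two signing conditions are assumed to apply together.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Module:
    """Hypothetical description of a pre-built systemtap module."""
    in_systemtap_dir: bool           # under /lib/modules/VERSION/systemtap
    signed_by_trusted_signer: bool   # e.g. signed by a compile-server
    built_privilege: Optional[str]   # "stapusr", "stapsys", or None


def can_build_and_run_any(is_root: bool, groups: Set[str]) -> bool:
    # root, or a member of both stapdev and stapusr
    return is_root or {"stapdev", "stapusr"} <= groups


def can_run(is_root: bool, groups: Set[str], module: Module) -> bool:
    if can_build_and_run_any(is_root, groups):
        return True
    if "stapusr" not in groups:
        return False
    signed = module.signed_by_trusted_signer
    if "stapsys" in groups:
        # stapsys + stapusr: signed, and built with --privilege=stapsys/stapusr
        return signed and module.built_privilege in ("stapsys", "stapusr")
    # stapusr only: trusted directory, or signed and built with --privilege=stapusr
    return module.in_systemtap_dir or (
        signed and module.built_privilege == "stapusr"
    )
```

For example, under this reading a stapusr-only user can run an unsigned module placed in the trusted directory, but not a signed module that was built with --privilege=stapsys.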
  
       The kernel modules generated by stap program are run by the staprun program.  The latter is a part of the  Sys-  
       temtap package, dedicated to module loading and unloading (but only in the white zone), and kernel-to-user data  
       transfer.  Since staprun does not perform any additional security checks on the kernel objects it is given,  it  
       would be unwise for a system administrator to add untrusted users to the stapdev or stapusr groups.  
  
       The  translator asserts certain safety constraints.  It aims to ensure that no handler routine can run for very  
       long, allocate memory, perform unsafe operations, or unintentionally interfere with the kernel.  Uses of
       script  global variables are automatically read/write locked as appropriate, to protect against manipulation by  
       concurrent probe handlers.  (Deadlocks are detected with timeouts.  Use the -t flag to receive reports  of  ex-  
       cessive  lock contention.)  Use of guru mode constructs such as embedded C can violate these constraints, lead-  
       ing to kernel crash or data corruption.  

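The following macros are used by stap for safety checks, or to terminate the run when certain thresholds are reached at run time.

To limit the impact of stap on system performance, handlers for synchronous events must finish quickly and release their resources.

Hence the limits below, which can be customized with the stap -D option.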

       The resource use limits are set by macros in the generated C code.  These may be overridden with the  -D  flag.  
       A selection of these is as follows:  

Maximum nesting depth of function calls

       MAXNESTING  
              Maximum  number  of nested function calls.  Default determined by script analysis, with a bonus 10 slots  
              added for recursive scripts.  

Maximum string length

       MAXSTRINGLEN  
              Maximum length of strings, default 128.  

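Maximum number of attempts to acquire a lock on a global variable. Once this is exceeded, a deadlock is assumed, the handler is skipped, and one skip is recorded.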

       MAXTRYLOCK  
              Maximum number of iterations to wait for locks on global variables before  declaring  possible  deadlock  
              and skipping the probe, default 1000.  
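
The bounded-retry idea behind MAXTRYLOCK can be illustrated in user space. The sketch below is an analogy only, assuming a plain `threading.Lock` in place of the locks that guard script globals; `run_with_trylock` is a hypothetical name, not part of the systemtap runtime.

```python
import threading

MAXTRYLOCK = 1000  # default quoted in the man page excerpt above


def run_with_trylock(lock, handler, max_trylock=MAXTRYLOCK):
    """Try the lock at most max_trylock times.

    Returns True if the handler ran, False if it was skipped
    (the case the man page calls a possible deadlock).
    """
    for _ in range(max_trylock):
        if lock.acquire(blocking=False):  # non-blocking attempt
            try:
                handler()
            finally:
                lock.release()
            return True
    return False  # give up: the probe hit is skipped
```

The point of the design is that a handler never waits indefinitely: after a fixed number of failed attempts it gives up, and the skip counts toward MAXSKIPPED.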

Maximum number of statements a handler may execute during a single probe hit (with interrupts disabled), including statements executed in recursive calls.

       MAXACTION  
              Maximum  number of statements to execute during any single probe hit (with interrupts disabled), default  
              1000.  
  
       MAXACTION_INTERRUPTIBLE  
              Maximum number of statements to execute during any single probe hit which is  executed  with  interrupts  
              enabled (such as begin/end probes), default (MAXACTION * 10).  
  
       MAXBACKTRACE  
              Maximum number of stack frames that will be processed by the stap runtime unwinder as produced by the
              backtrace functions in the [u]context-unwind.stp tapsets, default 20.  

Array size limit; applies only to arrays declared without an explicit size

       MAXMAPENTRIES  
              Default maximum number of rows in any single global array, default 2048.  Individual arrays may  be  de-  
              clared with a larger or smaller limit instead:  
              global big[10000],little[5]  

Error limit before exit

       MAXERRORS  
              Maximum  number  of soft errors before an exit is triggered, default 0, which means that the first error  
              will exit the script.  Note that with the --suppress-handler-errors option, this limit is not  enforced.  

Skipped-probe limit before exit

       MAXSKIPPED  
              Maximum  number  of  skipped probes before an exit is triggered, default 100.  Running systemtap with -t  
              (timing) mode gives more details about skipped probes.   With  the  default  -DINTERRUPTIBLE=1  setting,  
              probes  skipped  due  to  reentrancy  are not accumulated against this limit.  Note that with the --sup-  
              press-handler-errors option, this limit is not enforced.  
  
       MINSTACKSPACE  
              Minimum number of free kernel stack bytes required in order to run a probe handler, default 1024.   This  
              number should be large enough for the probe handler’s own needs, plus a safety margin.  
  
       MAXUPROBES  
              Maximum number of concurrently armed user-space probes (uprobes), default somewhat larger than the
              number of user-space probe points named in the script.  This pool needs to be potentially large
              because individual uprobe objects (about 64 bytes each) are allocated for each process for each
              matching script-level probe.
  
       STP_MAXMEMORY  
              Maximum amount of memory (in kilobytes) that the systemtap module should use,  default  unlimited.   The  
              memory  size  includes the size of the module itself, plus any additional allocations.  This only tracks  
              direct allocations by the systemtap runtime.  This does not  track  indirect  allocations  (as  done  by  
              kprobes/uprobes/etc. internals).  
  
       STP_PROCFS_BUFSIZE  
              Size  of  procfs probe read buffers (in bytes).  Defaults to MAXSTRINGLEN.  This value can be overridden  
              on a per-procfs file basis using the procfs read probe .maxsize(MAXSIZE) parameter.  

       With scripts that contain probes on any interrupt path, it is possible that those interrupts may occur in the
       middle  of  another  probe  handler.  The probe in the interrupt handler would be skipped in this case to avoid  
       reentrance.  To work around this issue, execute stap with  the  option  -DINTERRUPTIBLE=0  to  mask  interrupts  
       throughout  the  probe handler.  This does add some extra overhead to the probes, but it may prevent reentrance  
       for common problem cases.  However, probes in NMI handlers and in the callpath of the stap runtime may still be  
       skipped due to reentrance.  
  
       Multiple  scripts  can write data into a relay buffer concurrently. A host script provides an interface for ac-  
       cessing its relay buffer to guest scripts.  Then, the output of the guests are merged into the  output  of  the  
       host.   To  run  a script as a host, execute stap with -DRELAYHOST[=name] option. The name identifies your host  
       script among several hosts.  While running the host, execute stap  with  -DRELAYGUEST[=name]  to  add  a  guest  
       script to the host.  Note that you must unload guests before unloading a host.  If any guests are still
       connected to the host, unloading the host will fail.
  
       In case something goes wrong with stap or staprun after a probe has already started  running,  one  may  safely  
       kill both user processes, and remove the active probe kernel module with rmmod.  Any pending trace messages may  
       be lost.  
  
       In addition to the methods outlined above, the generated kernel module also uses overload  processing  to  make  
       sure  that  probes can’t run for too long.  If more than STP_OVERLOAD_THRESHOLD cycles (default 500000000) have  
       been spent in all the probes on a single cpu during the last STP_OVERLOAD_INTERVAL cycles (default 1000000000),  
       the probes have overloaded the system and an exit is triggered.  
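
With the default values quoted above, the overload rule amounts to a 50% budget: probes may use at most half of the cycles in each interval on a CPU. The sketch below only restates that arithmetic; the function names are hypothetical, and the real runtime's bookkeeping is not shown.

```python
STP_OVERLOAD_THRESHOLD = 500_000_000    # default from the man page
STP_OVERLOAD_INTERVAL = 1_000_000_000   # default from the man page


def probe_budget(threshold=STP_OVERLOAD_THRESHOLD,
                 interval=STP_OVERLOAD_INTERVAL):
    """Fraction of each interval that probes may consume on one CPU."""
    return threshold / interval


def is_overloaded(probe_cycles_in_interval,
                  threshold=STP_OVERLOAD_THRESHOLD):
    """True if the cycles spent in probes during the last interval
    exceed the threshold ("more than", so equality is still allowed)."""
    return probe_cycles_in_interval > threshold
```

Raising STP_OVERLOAD_THRESHOLD with -D (or lowering STP_OVERLOAD_INTERVAL) increases this fraction, which lets heavier probes run at the cost of more system overhead.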
  
       By  default,  overload processing is turned on for all modules.  If you would like to disable overload process-  
       ing, define STP_NO_OVERLOAD (or its alias STAP_NO_OVERLOAD).  

A few examples follow:

Function nesting limit:

[root@db-172-16-3-39 ~]# cat test.stp   
global nest  
function fibonacci(i) {  
    if (i < 1) error ("bad number")  
    if (i == 1) return 1  
    if (i == 2) return 2  
    nest++  
    printf("nest: %d, i: %d\n", nest, i)  
    return fibonacci (i-1) + fibonacci (i-2)  
}  
  
probe begin {  
    printf ("%d's fibonacci number: %d\n", $1, fibonacci ($1))  
    exit ()  
}  

When the recursion depth of a function exceeds the limit:

[root@db-172-16-3-39 ~]# stap --vp 00001 -D MAXNESTING=6 test.stp 8  
Pass 5: starting run.  
ERROR: MAXNESTING exceeded near identifier 'fibonacci' at test.stp:2:10  
nest: 1, i: 8  
nest: 2, i: 7  
nest: 3, i: 6  
nest: 4, i: 5  
nest: 5, i: 4  
nest: 6, i: 3  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /usr/bin/staprun exited with status: 1  
Pass 5: run completed in 10usr/20sys/307real ms.  
Pass 5: run failed.  Try again with another '--vp 00001' option.  

With MAXNESTING raised:

[root@db-172-16-3-39 ~]# stap --vp 00001 -D MAXNESTING=7 test.stp 8  
Pass 5: starting run.  
nest: 1, i: 8  
nest: 2, i: 7  
nest: 3, i: 6  
nest: 4, i: 5  
nest: 5, i: 4  
nest: 6, i: 3  
nest: 7, i: 3  
nest: 8, i: 4  
nest: 9, i: 3  
nest: 10, i: 5  
nest: 11, i: 4  
nest: 12, i: 3  
nest: 13, i: 3  
nest: 14, i: 6  
nest: 15, i: 5  
nest: 16, i: 4  
nest: 17, i: 3  
nest: 18, i: 3  
nest: 19, i: 4  
nest: 20, i: 3  
8's fibonacci number: 34  
Pass 5: run completed in 20usr/30sys/308real ms.  
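
Why 6 fails and 7 succeeds can be checked with a small Python model of the same function. This is a sketch that mirrors test.stp (including its base cases of 1 and 2) and tracks the deepest call level; it assumes that MAXNESTING counts nested function calls in the same way, which agrees with the two transcripts above.

```python
def trace_fibonacci(n):
    """Return (value, max_call_depth, nest_count) for the test.stp logic."""
    stats = {"max_depth": 0, "nest": 0}

    def fib(i, depth):
        stats["max_depth"] = max(stats["max_depth"], depth)
        if i < 1:
            raise ValueError("bad number")
        if i == 1:
            return 1
        if i == 2:
            return 2
        stats["nest"] += 1  # same counter as the script's global 'nest'
        return fib(i - 1, depth + 1) + fib(i - 2, depth + 1)

    value = fib(n, 1)
    return value, stats["max_depth"], stats["nest"]


print(trace_fibonacci(8))  # (34, 7, 20)
```

For an argument of 8 the deepest chain is 8, 7, 6, 5, 4, 3, 2, which is seven nested calls, and the counter reaches 20, matching the last "nest:" line of the successful run.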

Statement limit per probe hit:

[root@db-172-16-3-39 ~]# stap --vp 00001 -D MAXNESTING=7 -D MAXACTION=10 test.stp 8  
Pass 5: starting run.  
ERROR: MAXACTION exceeded near keyword at test.stp:5:17  
nest: 1, i: 8  
nest: 2, i: 7  
nest: 3, i: 6  
nest: 4, i: 5  
nest: 5, i: 4  
nest: 6, i: 3  
nest: 7, i: 3  
nest: 8, i: 4  
nest: 9, i: 3  
nest: 10, i: 5  
nest: 11, i: 4  
nest: 12, i: 3  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /usr/bin/staprun exited with status: 1  
Pass 5: run completed in 10usr/20sys/307real ms.  
Pass 5: run failed.  Try again with another '--vp 00001' option.  

String length limit:

[root@db-172-16-3-39 ~]# cat test.stp   
probe begin {  
    a=@1  
    printf ("%s\n", a)  
    exit ()  
}  
[root@db-172-16-3-39 ~]# stap -D MAXSTRINGLEN=2 test.stp abcdefghijkfffffffffffffffffffffffffffffff  
a  
[root@db-172-16-3-39 ~]# stap -D MAXSTRINGLEN=3 test.stp abcdefghijkfffffffffffffffffffffffffffffff  
ab  
The excess is truncated. (Note that the terminating \0 occupies one byte.)
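
The truncation seen above follows from reserving one byte for the terminator. A minimal model, assuming ASCII input so that one character equals one byte (`stap_truncate` is a hypothetical helper, not a systemtap function):

```python
def stap_truncate(s, maxstringlen=128):
    """Keep at most maxstringlen - 1 characters; the last byte of the
    buffer is taken by the terminating NUL."""
    return s[:max(maxstringlen - 1, 0)]


arg = "abcdefghijkfffffffffffffffffffffffffffffff"
print(stap_truncate(arg, 2))  # a
print(stap_truncate(arg, 3))  # ab
```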

Array size limit, which applies only to arrays declared without an explicit size:

[root@db-172-16-3-39 ~]# cat test.stp   
global arr1[20], arr2  
probe begin {  
    for (i=0; i<20; i++) {  
      arr1[i] = "test,arr1"  
    }  
    for (i=0; i<$1; i++) {  
      arr2[i] = "test,arr2"  
    }  
    foreach (s1- in arr1) {  
      printf ("%d, %s\n", s1, arr1[s1])  
    }  
    foreach (s2- in arr2) {  
      printf ("%d, %s\n", s2, arr2[s2])  
    }  
    exit ()  
}  

The array declared with size 20 is not subject to MAXMAPENTRIES; only arr2, which is declared without a size, is limited by it.

[root@db-172-16-3-39 ~]# stap --vp 00001 -D MAXMAPENTRIES=4 test.stp 5  
Pass 5: starting run.  
ERROR: Array overflow, check MAXMAPENTRIES near identifier 'arr2' at test.stp:7:7  
WARNING: Number of errors: 1, skipped probes: 0  
WARNING: /usr/bin/staprun exited with status: 1  
Pass 5: run completed in 10usr/20sys/307real ms.  
Pass 5: run failed.  Try again with another '--vp 00001' option.  

As shown below, with MAXMAPENTRIES=5 the five entries of arr2 fit, and the array declared with size 20 works normally.

[root@db-172-16-3-39 ~]# stap --vp 00001 -D MAXMAPENTRIES=5 test.stp 5  
Pass 5: starting run.  
19, test,arr1  
18, test,arr1  
17, test,arr1  
16, test,arr1  
15, test,arr1  
14, test,arr1  
13, test,arr1  
12, test,arr1  
11, test,arr1  
10, test,arr1  
9, test,arr1  
8, test,arr1  
7, test,arr1  
6, test,arr1  
5, test,arr1  
4, test,arr1  
3, test,arr1  
2, test,arr1  
1, test,arr1  
0, test,arr1  
4, test,arr2  
3, test,arr2  
2, test,arr2  
1, test,arr2  
0, test,arr2  
Pass 5: run completed in 10usr/20sys/308real ms.  
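
The behaviour in the two runs above can be modelled with a dictionary that refuses new keys once it is full. This is a sketch, not the runtime's map implementation: `BoundedMap` and `fill` are hypothetical names, and Python's `OverflowError` stands in for the "Array overflow" soft error.

```python
class BoundedMap(dict):
    """Dictionary with a fixed maximum number of entries."""

    def __init__(self, max_entries=2048):  # 2048 is the MAXMAPENTRIES default
        super().__init__()
        self.max_entries = max_entries

    def __setitem__(self, key, value):
        # Overwriting an existing key is always allowed; only new keys count.
        if key not in self and len(self) >= self.max_entries:
            raise OverflowError("Array overflow, check MAXMAPENTRIES")
        super().__setitem__(key, value)


def fill(m, n, value):
    """Insert keys 0..n-1, like the for loops in test.stp."""
    for i in range(n):
        m[i] = value


arr1 = BoundedMap(20)   # like 'global arr1[20]': its own explicit limit
arr2 = BoundedMap(5)    # like 'global arr2' with -D MAXMAPENTRIES=5
fill(arr1, 20, "test,arr1")
fill(arr2, 5, "test,arr2")
```

With a limit of 4 instead of 5, the fifth insert into `arr2` raises, which corresponds to the failing run with -D MAXMAPENTRIES=4.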

References

1. man stap

2. https://sourceware.org/systemtap/langref/SystemTap_overview.html
