Systemtap kernel Trace probes

2 minute read

背景

This family of probe points hooks to static probing tracepoints inserted into the kernel or kernel modules.   
As with marker probes, these tracepoints are special macro calls inserted by kernel developers to make probing faster and more reliable than with DWARF-based probes.   
DWARF debugging information is not required to probe tracepoints.   
Tracepoints have more strongly-typed parameters than marker probes.  
内核trace探针和前面讲的内核mark探针类似, 也是一些内核中的特殊宏, 编译内核时就有了, 不需要DWARF-based debuginfo包的支持.  
使用trace探针比使用DWARF-based探针速度快, 可靠.  
trace探针比mark探针有更多的强类型参数可以使用, 命名规则也不一样, mark探针的参数$arg1 ... $argNN, trace探针不是以此规则命名, 见下面的讲解.  
  
Tracepoint probes begin with kernel. The next part names the tracepoint itself: trace("name").   
The tracepoint name string, which can contain wildcard characters, is matched against the names defined by the kernel developers in the tracepoint header files.  
trace探针使用方法: kernel.trace("NAME"),  这里NAME同样可以使用通配符.  
  
The handler associated with a tracepoint-based probe can read the optional parameters specified at the macro call site.   
These parameters are named according to the declaration by the tracepoint author.   
For example, the tracepoint probe kernel.trace("sched_switch") provides the parameters $rq, $prev, and $next.   
If the parameter is a complex type such as a struct pointer, then a script can access fields with the same syntax as DWARF $target variables.   
Tracepoint parameters cannot be modified; however, in guru mode a script can modify fields of parameters.  
trace探针的参数名是探针的作者指定的, 如 kernel.trace("sched_switch") provides the parameters $rq, $prev, and $next.  
如果参数类型为结构类型, 可以使用suffix $来输出子结构或者$$来输出所有子结构直到基础类型. (注意前面讲过的mark探针不行.)  
  
The name of the tracepoint is available in $$name, and a string of name=value pairs for all parameters of the tracepoint is available in $$vars or $$parms.  
这里有几个特殊的变量, $$name指tracepoint名字.  
$$vars指tracepoint的所有变量以name=value形式输出  
$$parms指tracepoint的参数以name=value形式输出  

举例 :

使用stap -l输出系统支持的trace探针 :

[root@db-172-16-3-39 ~]# stap -l 'kernel.trace("**")'  
kernel.trace("activate_task")  
kernel.trace("add_to_page_cache")  
kernel.trace("block_bio_backmerge")  
kernel.trace("block_bio_bounce")  
kernel.trace("block_bio_complete")  
kernel.trace("block_bio_frontmerge")  
kernel.trace("block_bio_queue")  
kernel.trace("block_getrq")  
kernel.trace("block_plug")  
kernel.trace("block_remap")  
kernel.trace("block_rq_complete")  
kernel.trace("block_rq_insert")  
kernel.trace("block_rq_issue")  
kernel.trace("block_rq_requeue")  
kernel.trace("block_sleeprq")  
kernel.trace("block_split")  
kernel.trace("block_unplug_io")  
kernel.trace("block_unplug_timer")  
kernel.trace("consume_skb")  
kernel.trace("deactivate_task")  
kernel.trace("irq_entry")  
kernel.trace("irq_exit")  
kernel.trace("irq_softirq_entry")  
kernel.trace("irq_softirq_exit")  
kernel.trace("irq_tasklet_high_entry")  
kernel.trace("irq_tasklet_high_exit")  
kernel.trace("irq_tasklet_low_entry")  
kernel.trace("irq_tasklet_low_exit")  
kernel.trace("itimer_expire")  
kernel.trace("itimer_state")  
kernel.trace("kfree_skb")  
kernel.trace("mm_anon_cow")  
kernel.trace("mm_anon_fault")  
kernel.trace("mm_anon_pgin")  
kernel.trace("mm_anon_unmap")  
kernel.trace("mm_anon_userfree")  
kernel.trace("mm_directreclaim_reclaimall")  
kernel.trace("mm_directreclaim_reclaimzone")  
kernel.trace("mm_filemap_cow")  
kernel.trace("mm_filemap_fault")  
kernel.trace("mm_filemap_unmap")  
kernel.trace("mm_filemap_userunmap")  
kernel.trace("mm_kernel_pagefault")  
kernel.trace("mm_kswapd_runs")  
kernel.trace("mm_page_allocation")  
kernel.trace("mm_page_free")  
kernel.trace("mm_pagereclaim_free")  
kernel.trace("mm_pagereclaim_pgout")  
kernel.trace("mm_pagereclaim_shrinkactive")  
kernel.trace("mm_pagereclaim_shrinkactive_a2a")  
kernel.trace("mm_pagereclaim_shrinkactive_a2i")  
kernel.trace("mm_pagereclaim_shrinkinactive")  
kernel.trace("mm_pagereclaim_shrinkinactive_i2a")  
kernel.trace("mm_pagereclaim_shrinkinactive_i2i")  
kernel.trace("mm_pagereclaim_shrinkzone")  
kernel.trace("mm_pdflush_bgwriteout")  
kernel.trace("mm_pdflush_kupdate")  
kernel.trace("napi_poll")  
kernel.trace("net_dev_queue")  
kernel.trace("net_dev_receive")  
kernel.trace("net_dev_xmit")  
kernel.trace("netif_rx")  
kernel.trace("remove_from_page_cache")  
kernel.trace("rpc_bind_status")  
kernel.trace("rpc_call_status")  
kernel.trace("rpc_connect_status")  
kernel.trace("sched_process_exit")  
kernel.trace("sched_process_fork")  
kernel.trace("sched_process_free")  
kernel.trace("sched_process_wait")  
kernel.trace("sched_switch")  
kernel.trace("sched_wakeup")  
kernel.trace("sched_wakeup_new")  
kernel.trace("scsi_dispatch_cmd_done")  
kernel.trace("scsi_dispatch_cmd_error")  
kernel.trace("scsi_dispatch_cmd_start")  
kernel.trace("scsi_dispatch_cmd_timeout")  
kernel.trace("scsi_eh_wakeup")  
kernel.trace("signal_coredump")  
kernel.trace("signal_deliver")  
kernel.trace("signal_generate")  
kernel.trace("signal_lose_info")  
kernel.trace("signal_overflow_fail")  
kernel.trace("socket_recvmsg")  
kernel.trace("socket_sendmsg")  
kernel.trace("socket_sendpage")  
kernel.trace("softirq_raise")  
kernel.trace("timer_cancel")  
kernel.trace("timer_expire_entry")  
kernel.trace("timer_expire_exit")  
kernel.trace("timer_init")  
kernel.trace("timer_start")  

探针使用举例 :

[root@db-172-16-3-39 ~]# stap -e 'probe kernel.trace("socket_sendmsg") { printf("%s\n%s\n%s\n", $$name, $$vars, $$parms) }'  
socket_sendmsg  
sock=0xffff81012f485580 msg=0xffff81012eb97da0 size=0xb0 ret=0xb0  
sock=0xffff81012f485580 msg=0xffff81012eb97da0 size=0xb0 ret=0xb0  
socket_sendmsg  
sock=0xffff81012efbfac0 msg=0xffff81022af7fe18 size=0xe8 ret=0xe8  
sock=0xffff81012efbfac0 msg=0xffff81022af7fe18 size=0xe8 ret=0xe8  

使用suffix $$输出结构数据 :

[root@db-172-16-3-39 ~]# stap -e 'probe kernel.trace("socket_sendmsg") { printf("%s\n%s\n%s\n%s\n", $$name, $$vars, $$parms, $sock$$) }'  
socket_sendmsg  
sock=0xffff81012f485580 msg=0xffff81012eb97da0 size=0xb0 ret=0xb0  
sock=0xffff81012f485580 msg=0xffff81012eb97da0 size=0xb0 ret=0xb0  
{.state=1, .flags=0, .ops=0xffffffff802b5860, .fasync_list=0x0, .file=0xffff81012f5e69c0, .sk=0xffff81012fc6f100, .wait={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012f4855b8, .prev=0xffff81012f4855b8}}, .type=2}  
socket_sendmsg  
sock=0xffff8102004e8d00 msg=0xffff810202efdda0 size=0x1a4 ret=0x1a4  
sock=0xffff8102004e8d00 msg=0xffff810202efdda0 size=0x1a4 ret=0x1a4  
{.state=3, .flags=0, .ops=0xffffffff802b4960, .fasync_list=0x0, .file=0xffff81022ff75480, .sk=0xffff81022ac866c0, .wait={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff8102004e8d38, .prev=0xffff8102004e8d38}}, .type=1}  

参考

  1. https://sourceware.org/systemtap/langref/Probe_points.html
/usr/share/doc/systemtap-client-1.8/examples/memory/vm.tracepoints.meta  
/usr/share/doc/systemtap-client-1.8/examples/memory/vm.tracepoints.stp  
/usr/share/systemtap/runtime/autoconf-utrace-via-tracepoints.c  
/usr/src/debug/kernel-2.6.18/linux-2.6.18-348.12.1.el5.x86_64/include/linux/tracepoint.h  
/usr/src/debug/kernel-2.6.18/linux-2.6.18-348.12.1.el5.x86_64/kernel/tracepoint.c  
/usr/src/kernels/2.6.18-348.12.1.el5-x86_64/include/config/tracepoints.h  
/usr/src/kernels/2.6.18-348.12.1.el5-x86_64/include/config/have/syscall/tracepoints.h  
/usr/src/kernels/2.6.18-348.12.1.el5-x86_64/include/config/sample/tracepoints.h  
/usr/src/kernels/2.6.18-348.12.1.el5-x86_64/include/linux/tracepoint.h  
/usr/src/kernels/2.6.18-348.12.1.el5-x86_64/samples/tracepoints  
/usr/src/kernels/2.6.18-348.12.1.el5-x86_64/samples/tracepoints/Makefile  

Flag Counter

digoal’s 大量PostgreSQL文章入口