6.6 KiB
eBPF 入门实践教程:
origin
origin from:
https://github.com/iovisor/bcc/blob/master/libbpf-tools/runqslower.bpf.c
result:
$ sudo ecli/build/bin/Release/ecli run examples/bpftools/runqslower/package.json
running and waiting for the ebpf events from perf event...
time task prev_task delta_us pid prev_pid
20:11:59 gnome-shell swapper/0 32 2202 0
20:11:59 ecli swapper/3 23 3437 0
20:11:59 rcu_sched swapper/1 1 14 0
20:11:59 gnome-terminal- swapper/1 13 2714 0
20:11:59 ecli swapper/3 2 3437 0
20:11:59 kworker/3:3 swapper/3 3 215 0
20:11:59 containerd swapper/1 8 1088 0
20:11:59 ecli swapper/2 5 3437 0
20:11:59 HangDetector swapper/3 6 854 0
20:11:59 ecli swapper/2 60 3437 0
20:11:59 rcu_sched swapper/1 26 14 0
20:11:59 kworker/0:1 swapper/0 26 3414 0
20:11:59 ecli swapper/2 6 3437 0
这段代码定义了一个 eBPF 程序,该程序用于跟踪进程在运行队列中的等待时间。它通过使用 tracepoint 和 perf event 输出来实现。
程序首先定义了两个 BPF 内核映射:start 映射用于存储每个进程在被调度运行之前的时间戳,events 映射用于存储 perf 事件。
然后,程序定义了一些帮助函数,用于跟踪每个进程的调度状态。 trace_enqueue 函数用于在进程被调度运行之前记录时间戳, handle_switch 函数用于处理进程切换,并计算进程在队列中等待的时间。
接下来,程序定义了五个 tracepoint 程序,用于捕获不同的调度器事件。 sched_wakeup 和 sched_wakeup_new 程序用于捕获新进程被唤醒的事件, sched_switch 程序用于捕获进程切换事件, handle_sched_wakeup 和 handle_sched_wakeup_new 程序用于捕获 raw tracepoint 事件。这些 tracepoint 程序调用了前面定义的帮助函数来跟踪进程的调度状态。
最后,程序将计算得到的等待时间输出到 perf 事件中,供用户空间工具进行捕获和分析。
Compile and Run
Compile:
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
Run:
sudo ./ecli run examples/bpftools/runqslower/package.json
details in bcc
Demonstrations of runqslower, the Linux eBPF/bcc version.
runqslower traces high scheduling delays between tasks being ready to run and them running on CPU after that. Example output:
# ./runqslower
Tracing run queue latency higher than 10000 us
TIME COMM TID LAT(us)
13:11:43 b'kworker/0:2' 8680 10250
13:12:18 b'irq/16-vmwgfx' 422 10838
13:12:18 b'systemd-oomd' 753 11012
13:12:18 b'containerd' 8272 11254
13:12:18 b'HangDetector' 764 12042
^C
``
This measures the time a task spends waiting on a run queue for a turn on-CPU, and shows this time as a individual events. This time should be small, but a task may need to wait its turn due to CPU load.
This measures two types of run queue latency:
1. The time from a task being enqueued on a run queue to its context switch and execution. This traces ttwu_do_wakeup(), wake_up_new_task() -> finish_task_switch() with either raw tracepoints (if supported) or kprobes and instruments the run queue latency after a voluntary context switch.
2. The time from when a task was involuntary context switched and still in the runnable state, to when it next executed. This is instrumented from finish_task_switch() alone.
The overhead of this tool may become significant for some workloads: see the OVERHEAD section.
This works by tracing various kernel scheduler functions using dynamic tracing, and will need updating to match any changes to these functions.
Since this uses BPF, only the root user can use this tool.
```console
Usage: runqslower [-h] [-p PID | -t TID | -P] [min_us]
The min_us option sets the latency of the run queue to track:
# ./runqslower 100
Tracing run queue latency higher than 100 us
TIME COMM TID LAT(us)
20:48:26 b'gnome-shell' 3005 201
20:48:26 b'gnome-shell' 3005 202
20:48:26 b'gnome-shell' 3005 254
20:48:26 b'gnome-shell' 3005 208
20:48:26 b'gnome-shell' 3005 132
20:48:26 b'gnome-shell' 3005 213
20:48:26 b'gnome-shell' 3005 205
20:48:26 b'python3' 5224 127
20:48:26 b'gnome-shell' 3005 214
20:48:26 b'gnome-shell' 3005 126
20:48:26 b'gnome-shell' 3005 285
20:48:26 b'Xorg' 2869 296
20:48:26 b'gnome-shell' 3005 119
20:48:26 b'gnome-shell' 3005 206
The -p PID option only traces this PID:
# ./runqslower -p 3005
Tracing run queue latency higher than 10000 us
TIME COMM TID LAT(us)
20:46:22 b'gnome-shell' 3005 16024
20:46:45 b'gnome-shell' 3005 11494
20:46:45 b'gnome-shell' 3005 21430
20:46:45 b'gnome-shell' 3005 14948
20:47:16 b'gnome-shell' 3005 10164
20:47:16 b'gnome-shell' 3005 18070
20:47:17 b'gnome-shell' 3005 13272
20:47:18 b'gnome-shell' 3005 10451
20:47:18 b'gnome-shell' 3005 15010
20:47:18 b'gnome-shell' 3005 19449
20:47:22 b'gnome-shell' 3005 19327
20:47:23 b'gnome-shell' 3005 13178
20:47:23 b'gnome-shell' 3005 13483
20:47:23 b'gnome-shell' 3005 15562
20:47:23 b'gnome-shell' 3005 13655
20:47:23 b'gnome-shell' 3005 19571
The -P option also shows previous task name and TID:
# ./runqslower -P
Tracing run queue latency higher than 10000 us
TIME COMM TID LAT(us) PREV COMM PREV TID
20:42:48 b'sysbench' 5159 10562 b'sysbench' 5152
20:42:48 b'sysbench' 5159 10367 b'sysbench' 5152
20:42:49 b'sysbench' 5158 11818 b'sysbench' 5159
20:42:49 b'sysbench' 5160 16913 b'sysbench' 5153
20:42:49 b'sysbench' 5157 13742 b'sysbench' 5160
20:42:49 b'sysbench' 5152 13746 b'sysbench' 5160
20:42:49 b'sysbench' 5153 13731 b'sysbench' 5160
20:42:49 b'sysbench' 5158 14688 b'sysbench' 5161
20:42:50 b'sysbench' 5155 10468 b'sysbench' 5152
20:42:50 b'sysbench' 5156 17695 b'sysbench' 5158
20:42:50 b'sysbench' 5155 11251 b'sysbench' 5152
20:42:50 b'sysbench' 5154 13283 b'sysbench' 5152
20:42:50 b'sysbench' 5158 22278 b'sysbench' 5157
For more details, see docs/special_filtering.md