mirror of
https://github.com/eunomia-bpf/bpf-developer-tutorial.git
synced 2026-05-09 23:32:34 +08:00
rename README to chinese documents
This commit is contained in:
@@ -1,35 +1,36 @@
|
||||
# eBPF 入门开发实践教程二:在 eBPF 中使用 kprobe 监测捕获 unlink 系统调用
|
||||
# eBPF Tutorial by Example 2: Monitoring unlink System Calls with kprobe
|
||||
|
||||
eBPF (Extended Berkeley Packet Filter) 是 Linux 内核上的一个强大的网络和性能分析工具。它允许开发者在内核运行时动态加载、更新和运行用户定义的代码。
|
||||
eBPF (Extended Berkeley Packet Filter) is a powerful network and performance analysis tool on the Linux kernel. It allows developers to dynamically load, update, and run user-defined code at runtime.
|
||||
|
||||
本文是 eBPF 入门开发实践教程的第二篇,在 eBPF 中使用 kprobe 捕获 unlink 系统调用。本文会先讲解关于 kprobes 的基本概念和技术背景,然后介绍如何在 eBPF 中使用 kprobe 捕获 unlink 系统调用。
|
||||
This article is the second part of the eBPF Tutorial by Example, focusing on using kprobe to capture the unlink system call in eBPF. The article will first explain the basic concepts and technical background of kprobes, and then introduce how to use kprobe to capture the unlink system call in eBPF.
|
||||
|
||||
## kprobes 技术背景
|
||||
## Background of kprobes Technology
|
||||
|
||||
开发人员在内核或者模块的调试过程中,往往会需要要知道其中的一些函数有无被调用、何时被调用、执行是否正确以及函数的入参和返回值是什么等等。比较简单的做法是在内核代码对应的函数中添加日志打印信息,但这种方式往往需要重新编译内核或模块,重新启动设备之类的,操作较为复杂甚至可能会破坏原有的代码执行过程。
|
||||
During the debugging process of the kernel or modules, developers often need to know whether certain functions are called, when they are called, whether the execution is correct, and what the input and return values of the functions are. A simple approach is to add log print information to the corresponding functions in the kernel code. However, this approach often requires recompiling the kernel or modules, restarting the device, etc., which is complex and may disrupt the original code execution process.
|
||||
|
||||
而利用 kprobes 技术,用户可以定义自己的回调函数,然后在内核或者模块中几乎所有的函数中(有些函数是不可探测的,例如kprobes自身的相关实现函数,后文会有详细说明)动态地插入探测点,当内核执行流程执行到指定的探测函数时,会调用该回调函数,用户即可收集所需的信息了,同时内核最后还会回到原本的正常执行流程。如果用户已经收集足够的信息,不再需要继续探测,则同样可以动态地移除探测点。因此 kprobes 技术具有对内核执行流程影响小和操作方便的优点。
|
||||
By using the kprobes technology, users can define their own callback functions and dynamically insert probes into almost all functions in the kernel or modules (some functions cannot be probed, such as the kprobes' own implementation functions, which will be explained in detail later). When the kernel execution flow reaches the specified probe function, it will invoke the callback function, allowing the user to collect the desired information. The kernel will then return to the normal execution flow. If the user has collected sufficient information and no longer needs to continue probing, the probes can be dynamically removed. Therefore, the kprobes technology has the advantages of minimal impact on the kernel execution flow and easy operation.
|
||||
|
||||
kprobes 技术包括的3种探测手段分别时 kprobe、jprobe 和 kretprobe。首先 kprobe 是最基本的探测方式,是实现后两种的基础,它可以在任意的位置放置探测点(就连函数内部的某条指令处也可以),它提供了探测点的调用前、调用后和内存访问出错3种回调方式,分别是 `pre_handler`、`post_handler` 和 `fault_handler`,其中 `pre_handler` 函数将在被探测指令被执行前回调,`post_handler` 会在被探测指令执行完毕后回调(注意不是被探测函数),`fault_handler` 会在内存访问出错时被调用;jprobe 基于 kprobe 实现,它用于获取被探测函数的入参值;最后 kretprobe 从名字中就可以看出其用途了,它同样基于 kprobe 实现,用于获取被探测函数的返回值。
|
||||
The kprobes technology includes three detection methods: kprobe, jprobe, and kretprobe. First, kprobe is the most basic detection method and serves as the basis for the other two. It allows probes to be placed at any position (including within a function). It provides three callback modes for probes: `pre_handler`, `post_handler`, and `fault_handler`. The `pre_handler` function is called before the probed instruction is executed, the `post_handler` is called after the probed instruction is completed (note that it is not the probed function), and the `fault_handler` is called when a memory access error occurs. The jprobe is based on kprobe and is used to obtain the input values of the probed function. Finally, as the name suggests, kretprobe is also based on kprobe and is used to obtain the return values of the probed function.
|
||||
|
||||
kprobes 的技术原理并不仅仅包含纯软件的实现方案,它也需要硬件架构提供支持。其中涉及硬件架构相关的是 CPU 的异常处理和单步调试技术,前者用于让程序的执行流程陷入到用户注册的回调函数中去,而后者则用于单步执行被探测点指令,因此并不是所有的架构均支持 kprobes。目前 kprobes 技术已经支持多种架构,包括 i386、x86_64、ppc64、ia64、sparc64、arm、ppc 和 mips(有些架构实现可能并不完全,具体可参考内核的 Documentation/kprobes.txt)。
|
||||
The kprobes technology is not only implemented through software but also requires support from the hardware architecture. This involves CPU exception handling and single-step debugging techniques. The former is used to make the program's execution flow enter the user-registered callback function, and the latter is used to single-step execute the probed instruction. Therefore, not all architectures support kprobes. Currently, kprobes technology supports various architectures, including i386, x86_64, ppc64, ia64, sparc64, arm, ppc, and mips (note that some architecture implementations may not be complete, see the kernel's Documentation/kprobes.txt for details).
|
||||
|
||||
kprobes 的特点与使用限制:
|
||||
Features and Usage Restrictions of kprobes:
|
||||
|
||||
1. kprobes 允许在同一个被探测位置注册多个 kprobe,但是目前 jprobe 却不可以;同时也不允许以其他的 jprobe 回调函数和 kprobe 的 `post_handler` 回调函数作为被探测点。
|
||||
2. 一般情况下,可以探测内核中的任何函数,包括中断处理函数。不过在 kernel/kprobes.c 和 arch/*/kernel/kprobes.c 程序中用于实现 kprobes 自身的函数是不允许被探测的,另外还有`do_page_fault` 和 `notifier_call_chain`;
|
||||
3. 如果以一个内联函数为探测点,则 kprobes 可能无法保证对该函数的所有实例都注册探测点。由于 gcc 可能会自动将某些函数优化为内联函数,因此可能无法达到用户预期的探测效果;
|
||||
4. 一个探测点的回调函数可能会修改被探测函数的运行上下文,例如通过修改内核的数据结构或者保存与`struct pt_regs`结构体中的触发探测器之前寄存器信息。因此 kprobes 可以被用来安装 bug 修复代码或者注入故障测试代码;
|
||||
5. kprobes 会避免在处理探测点函数时再次调用另一个探测点的回调函数,例如在`printk()`函数上注册了探测点,而在它的回调函数中可能会再次调用`printk`函数,此时将不再触发`printk`探测点的回调,仅仅是增加了`kprobe`结构体中`nmissed`字段的数值;
|
||||
6. 在 kprobes 的注册和注销过程中不会使用 mutex 锁和动态的申请内存;
|
||||
7. kprobes 回调函数的运行期间是关闭内核抢占的,同时也可能在关闭中断的情况下执行,具体要视CPU架构而定。因此不论在何种情况下,在回调函数中不要调用会放弃 CPU 的函数(如信号量、mutex 锁等);
|
||||
8. kretprobe 通过替换返回地址为预定义的 trampoline 的地址来实现,因此栈回溯和 gcc 内嵌函数`__builtin_return_address()`调用将返回 trampoline 的地址而不是真正的被探测函数的返回地址;
|
||||
9. 如果一个函数的调用次数和返回次数不相等,则在类似这样的函数上注册 kretprobe 将可能不会达到预期的效果,例如`do_exit()`函数会存在问题,而`do_execve()`函数和`do_fork()`函数不会;
|
||||
10. 当在进入和退出一个函数时,如果 CPU 运行在非当前任务所有的栈上,那么往该函数上注册 kretprobe 可能会导致不可预料的后果,因此,kprobes 不支持在 X86_64 的结构下为`__switch_to()`函数注册 kretprobe,将直接返回`-EINVAL`。
|
||||
1. kprobes allows multiple kprobes to be registered at the same probe position, but jprobe currently does not support this. It is also not allowed to use other jprobe callback functions or the `post_handler` callback function of kprobe as probe points.
|
||||
2. In general, any function in the kernel can be probed, including interrupt handlers. However, the functions used to implement kprobes themselves in kernel/kprobes.c and arch/*/kernel/kprobes.c are not allowed to be probed. Additionally, `do_page_fault` and `notifier_call_chain` are also not allowed.
|
||||
3. If an inline function is used as a probe point, kprobes may not be able to guarantee that probe points are registered for all instances of that function. Since gcc may automatically optimize certain functions as inline functions, the desired probing effect may not be achieved.
|
||||
4. The callback function of a probe point may modify the runtime context of the probed function, such as by modifying the kernel's data structure or saving register information before triggering the prober in the `struct pt_regs` structure. Therefore, kprobes can be used to install bug fixes or inject fault testing code.
|
||||
5. kprobes avoids calling the callback function of another probe point again when processing the probe point function. For example, if a probe point is registered on the `printk()` function and the callback function may call `printk()` again, the callback for the `printk` probe point will not be triggered again. Only the `nmissed` field in the `kprobe` structure will be incremented.
|
||||
6. mutex locks and dynamic memory allocation are not used in the registration and removal process of kprobes.
|
||||
|
||||
## kprobe 示例
|
||||
7. During the execution of kprobes callback functions, kernel preemption is disabled, and it may also be executed with interrupts disabled, which depends on the CPU architecture. Therefore, regardless of the situation, do not call functions that will give up the CPU in the callback function (such as semaphore, mutex lock, etc.);
|
||||
8. kretprobe is implemented by replacing the return address with the pre-defined trampoline address, so stack backtraces and gcc inline function `__builtin_return_address()` will return the address of the trampoline instead of the actual return address of the probed function;
|
||||
9. If the number of function calls and return calls of a function are unequal, registering kretprobe on such a function may not achieve the expected effect, for example, the `do_exit()` function will have problems, while the `do_execve()` function and `do_fork()` function will not;
|
||||
10. When entering and exiting a function, if the CPU is running on a stack that does not belong to the current task, registering kretprobe on that function may have unpredictable consequences. Therefore, kprobes does not support registering kretprobe for the `__switch_to()` function under the X86_64 architecture and will directly return `-EINVAL`.
|
||||
|
||||
完整代码如下:
|
||||
## kprobe Example
|
||||
|
||||
The complete code is as follows:
|
||||
|
||||
```c
|
||||
#include "vmlinux.h"
|
||||
@@ -62,9 +63,9 @@ int BPF_KRETPROBE(do_unlinkat_exit, long ret)
|
||||
}
|
||||
```
|
||||
|
||||
这段代码是一个简单的 eBPF 程序,用于监测和捕获在 Linux 内核中执行的 unlink 系统调用。unlink 系统调用的功能是删除一个文件,这个 eBPF 程序通过使用 kprobe(内核探针)在`do_unlinkat`函数的入口和退出处放置钩子,实现对该系统调用的跟踪。
|
||||
This code is a simple eBPF program used to monitor and capture the unlink system call executed in the Linux kernel. The unlink system call is used to delete a file. This eBPF program traces this system call by placing hooks at the entry and exit points of the `do_unlinkat` function using a kprobe (kernel probe).
|
||||
|
||||
首先,我们导入必要的头文件,如 vmlinux.h,bpf_helpers.h,bpf_tracing.h 和 bpf_core_read.h。接着,我们定义许可证,以允许程序在内核中运行。
|
||||
First, we import necessary header files such as vmlinux.h, bpf_helpers.h, bpf_tracing.h, and bpf_core_read.h. Then, we define a license to allow the program to run in the kernel.
|
||||
|
||||
```c
|
||||
#include "vmlinux.h"
|
||||
@@ -75,7 +76,7 @@ int BPF_KRETPROBE(do_unlinkat_exit, long ret)
|
||||
char LICENSE[] SEC("license") = "Dual BSD/GPL";
|
||||
```
|
||||
|
||||
接下来,我们定义一个名为`BPF_KPROBE(do_unlinkat)`的 kprobe,当进入`do_unlinkat`函数时,它会被触发。该函数接受两个参数:`dfd`(文件描述符)和`name`(文件名结构体指针)。在这个 kprobe 中,我们获取当前进程的 PID(进程标识符),然后读取文件名。最后,我们使用`bpf_printk`函数在内核日志中打印 PID 和文件名。
|
||||
Next, we define a kprobe named `BPF_KPROBE(do_unlinkat)` which gets triggered when the `do_unlinkat` function is entered. It takes two parameters: `dfd` (file descriptor) and `name` (filename structure pointer). In this kprobe, we retrieve the PID (process identifier) of the current process and then read the filename. Finally, we use the `bpf_printk` function to print the PID and filename in the kernel log.
|
||||
|
||||
```c
|
||||
SEC("kprobe/do_unlinkat")
|
||||
@@ -91,7 +92,7 @@ int BPF_KPROBE(do_unlinkat, int dfd, struct filename *name)
|
||||
}
|
||||
```
|
||||
|
||||
接下来,我们定义一个名为`BPF_KRETPROBE(do_unlinkat_exit)`的 kretprobe,当从`do_unlinkat`函数退出时,它会被触发。这个 kretprobe 的目的是捕获函数的返回值(ret)。我们再次获取当前进程的 PID,并使用`bpf_printk`函数在内核日志中打印 PID 和返回值。
|
||||
Next, we define a kretprobe named `BPF_KRETPROBE(do_unlinkat_exit)` that will be triggered when exiting the `do_unlinkat` function. The purpose of this kretprobe is to capture the return value (`ret`) of the function. We again obtain the PID of the current process and use the `bpf_printk` function to print the PID and return value in the kernel log.
|
||||
|
||||
```c
|
||||
SEC("kretprobe/do_unlinkat")
|
||||
@@ -105,9 +106,9 @@ int BPF_KRETPROBE(do_unlinkat_exit, long ret)
|
||||
}
|
||||
```
|
||||
|
||||
eunomia-bpf 是一个结合 Wasm 的开源 eBPF 动态加载运行时和开发工具链,它的目的是简化 eBPF 程序的开发、构建、分发、运行。可以参考 <https://github.com/eunomia-bpf/eunomia-bpf> 下载和安装 ecc 编译工具链和 ecli 运行时。
|
||||
eunomia-bpf is an open-source eBPF dynamic loading runtime and development toolchain that combines with Wasm. Its goal is to simplify the development, build, distribution, and execution of eBPF programs. You can refer to <https://github.com/eunomia-bpf/eunomia-bpf> to download and install the ecc compiler toolchain and ecli runtime.
|
||||
|
||||
要编译这个程序,请使用 ecc 工具:
|
||||
To compile this program, use the ecc tool:
|
||||
|
||||
```console
|
||||
$ ecc kprobe-link.bpf.c
|
||||
@@ -115,13 +116,13 @@ Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
然后运行:
|
||||
Then run:
|
||||
|
||||
```console
|
||||
sudo ecli run package.json
|
||||
```
|
||||
|
||||
在另外一个窗口中:
|
||||
In another window:
|
||||
|
||||
```shell
|
||||
touch test1
|
||||
@@ -130,7 +131,7 @@ touch test2
|
||||
rm test2
|
||||
```
|
||||
|
||||
在 /sys/kernel/debug/tracing/trace_pipe 文件中,应该能看到类似下面的 kprobe 演示输出:
|
||||
You should see kprobe demo output similar to the following in the /sys/kernel/debug/tracing/trace_pipe file:
|
||||
|
||||
```shell
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
@@ -140,10 +141,10 @@ $ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
rm-9346 [005] d..4 4710.951895: bpf_trace_printk: KPROBE EXIT: ret = 0
|
||||
```
|
||||
|
||||
## 总结
|
||||
## Summary
|
||||
|
||||
通过本文的示例,我们学习了如何使用 eBPF 的 kprobe 和 kretprobe 捕获 unlink 系统调用。更多的例子和详细的开发指南,请参考 eunomia-bpf 的官方文档:<https://github.com/eunomia-bpf/eunomia-bpf>
|
||||
In this article's example, we learned how to use eBPF's kprobe and kretprobe to capture the unlink system call. For more examples and detailed development guides, please refer to the official documentation of eunomia-bpf: <https://github.com/eunomia-bpf/eunomia-bpf>
|
||||
|
||||
本文是 eBPF 入门开发实践教程的第二篇。下一篇文章将介绍如何在 eBPF 中使用 fentry 监测捕获 unlink 系统调用。
|
||||
This article is the second part of the introductory eBPF development tutorial. The next article will explain how to use fentry to monitor and capture the unlink system call in eBPF.
|
||||
|
||||
如果您希望学习更多关于 eBPF 的知识和实践,可以访问我们的教程代码仓库 <https://github.com/eunomia-bpf/bpf-developer-tutorial> 或网站 <https://eunomia.dev/zh/tutorials/> 以获取更多示例和完整的教程。
|
||||
If you'd like to learn more about eBPF knowledge and practices, you can visit our tutorial code repository at <https://github.com/eunomia-bpf/bpf-developer-tutorial> or website <https://eunomia.dev/tutorials/> for more examples and complete tutorials.
|
||||
|
||||
149
src/2-kprobe-unlink/README.zh.md
Normal file
149
src/2-kprobe-unlink/README.zh.md
Normal file
@@ -0,0 +1,149 @@
|
||||
# eBPF 入门开发实践教程二:在 eBPF 中使用 kprobe 监测捕获 unlink 系统调用
|
||||
|
||||
eBPF (Extended Berkeley Packet Filter) 是 Linux 内核上的一个强大的网络和性能分析工具。它允许开发者在内核运行时动态加载、更新和运行用户定义的代码。
|
||||
|
||||
本文是 eBPF 入门开发实践教程的第二篇,在 eBPF 中使用 kprobe 捕获 unlink 系统调用。本文会先讲解关于 kprobes 的基本概念和技术背景,然后介绍如何在 eBPF 中使用 kprobe 捕获 unlink 系统调用。
|
||||
|
||||
## kprobes 技术背景
|
||||
|
||||
开发人员在内核或者模块的调试过程中,往往会需要要知道其中的一些函数有无被调用、何时被调用、执行是否正确以及函数的入参和返回值是什么等等。比较简单的做法是在内核代码对应的函数中添加日志打印信息,但这种方式往往需要重新编译内核或模块,重新启动设备之类的,操作较为复杂甚至可能会破坏原有的代码执行过程。
|
||||
|
||||
而利用 kprobes 技术,用户可以定义自己的回调函数,然后在内核或者模块中几乎所有的函数中(有些函数是不可探测的,例如kprobes自身的相关实现函数,后文会有详细说明)动态地插入探测点,当内核执行流程执行到指定的探测函数时,会调用该回调函数,用户即可收集所需的信息了,同时内核最后还会回到原本的正常执行流程。如果用户已经收集足够的信息,不再需要继续探测,则同样可以动态地移除探测点。因此 kprobes 技术具有对内核执行流程影响小和操作方便的优点。
|
||||
|
||||
kprobes 技术包括的3种探测手段分别时 kprobe、jprobe 和 kretprobe。首先 kprobe 是最基本的探测方式,是实现后两种的基础,它可以在任意的位置放置探测点(就连函数内部的某条指令处也可以),它提供了探测点的调用前、调用后和内存访问出错3种回调方式,分别是 `pre_handler`、`post_handler` 和 `fault_handler`,其中 `pre_handler` 函数将在被探测指令被执行前回调,`post_handler` 会在被探测指令执行完毕后回调(注意不是被探测函数),`fault_handler` 会在内存访问出错时被调用;jprobe 基于 kprobe 实现,它用于获取被探测函数的入参值;最后 kretprobe 从名字中就可以看出其用途了,它同样基于 kprobe 实现,用于获取被探测函数的返回值。
|
||||
|
||||
kprobes 的技术原理并不仅仅包含纯软件的实现方案,它也需要硬件架构提供支持。其中涉及硬件架构相关的是 CPU 的异常处理和单步调试技术,前者用于让程序的执行流程陷入到用户注册的回调函数中去,而后者则用于单步执行被探测点指令,因此并不是所有的架构均支持 kprobes。目前 kprobes 技术已经支持多种架构,包括 i386、x86_64、ppc64、ia64、sparc64、arm、ppc 和 mips(有些架构实现可能并不完全,具体可参考内核的 Documentation/kprobes.txt)。
|
||||
|
||||
kprobes 的特点与使用限制:
|
||||
|
||||
1. kprobes 允许在同一个被探测位置注册多个 kprobe,但是目前 jprobe 却不可以;同时也不允许以其他的 jprobe 回调函数和 kprobe 的 `post_handler` 回调函数作为被探测点。
|
||||
2. 一般情况下,可以探测内核中的任何函数,包括中断处理函数。不过在 kernel/kprobes.c 和 arch/*/kernel/kprobes.c 程序中用于实现 kprobes 自身的函数是不允许被探测的,另外还有`do_page_fault` 和 `notifier_call_chain`;
|
||||
3. 如果以一个内联函数为探测点,则 kprobes 可能无法保证对该函数的所有实例都注册探测点。由于 gcc 可能会自动将某些函数优化为内联函数,因此可能无法达到用户预期的探测效果;
|
||||
4. 一个探测点的回调函数可能会修改被探测函数的运行上下文,例如通过修改内核的数据结构或者保存与`struct pt_regs`结构体中的触发探测器之前寄存器信息。因此 kprobes 可以被用来安装 bug 修复代码或者注入故障测试代码;
|
||||
5. kprobes 会避免在处理探测点函数时再次调用另一个探测点的回调函数,例如在`printk()`函数上注册了探测点,而在它的回调函数中可能会再次调用`printk`函数,此时将不再触发`printk`探测点的回调,仅仅是增加了`kprobe`结构体中`nmissed`字段的数值;
|
||||
6. 在 kprobes 的注册和注销过程中不会使用 mutex 锁和动态的申请内存;
|
||||
7. kprobes 回调函数的运行期间是关闭内核抢占的,同时也可能在关闭中断的情况下执行,具体要视CPU架构而定。因此不论在何种情况下,在回调函数中不要调用会放弃 CPU 的函数(如信号量、mutex 锁等);
|
||||
8. kretprobe 通过替换返回地址为预定义的 trampoline 的地址来实现,因此栈回溯和 gcc 内嵌函数`__builtin_return_address()`调用将返回 trampoline 的地址而不是真正的被探测函数的返回地址;
|
||||
9. 如果一个函数的调用次数和返回次数不相等,则在类似这样的函数上注册 kretprobe 将可能不会达到预期的效果,例如`do_exit()`函数会存在问题,而`do_execve()`函数和`do_fork()`函数不会;
|
||||
10. 当在进入和退出一个函数时,如果 CPU 运行在非当前任务所有的栈上,那么往该函数上注册 kretprobe 可能会导致不可预料的后果,因此,kprobes 不支持在 X86_64 的结构下为`__switch_to()`函数注册 kretprobe,将直接返回`-EINVAL`。
|
||||
|
||||
## kprobe 示例
|
||||
|
||||
完整代码如下:
|
||||
|
||||
```c
|
||||
#include "vmlinux.h"
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
#include <bpf/bpf_core_read.h>
|
||||
|
||||
char LICENSE[] SEC("license") = "Dual BSD/GPL";
|
||||
|
||||
SEC("kprobe/do_unlinkat")
|
||||
int BPF_KPROBE(do_unlinkat, int dfd, struct filename *name)
|
||||
{
|
||||
pid_t pid;
|
||||
const char *filename;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
filename = BPF_CORE_READ(name, name);
|
||||
bpf_printk("KPROBE ENTRY pid = %d, filename = %s\n", pid, filename);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("kretprobe/do_unlinkat")
|
||||
int BPF_KRETPROBE(do_unlinkat_exit, long ret)
|
||||
{
|
||||
pid_t pid;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
bpf_printk("KPROBE EXIT: pid = %d, ret = %ld\n", pid, ret);
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
|
||||
这段代码是一个简单的 eBPF 程序,用于监测和捕获在 Linux 内核中执行的 unlink 系统调用。unlink 系统调用的功能是删除一个文件,这个 eBPF 程序通过使用 kprobe(内核探针)在`do_unlinkat`函数的入口和退出处放置钩子,实现对该系统调用的跟踪。
|
||||
|
||||
首先,我们导入必要的头文件,如 vmlinux.h,bpf_helpers.h,bpf_tracing.h 和 bpf_core_read.h。接着,我们定义许可证,以允许程序在内核中运行。
|
||||
|
||||
```c
|
||||
#include "vmlinux.h"
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
#include <bpf/bpf_core_read.h>
|
||||
|
||||
char LICENSE[] SEC("license") = "Dual BSD/GPL";
|
||||
```
|
||||
|
||||
接下来,我们定义一个名为`BPF_KPROBE(do_unlinkat)`的 kprobe,当进入`do_unlinkat`函数时,它会被触发。该函数接受两个参数:`dfd`(文件描述符)和`name`(文件名结构体指针)。在这个 kprobe 中,我们获取当前进程的 PID(进程标识符),然后读取文件名。最后,我们使用`bpf_printk`函数在内核日志中打印 PID 和文件名。
|
||||
|
||||
```c
|
||||
SEC("kprobe/do_unlinkat")
|
||||
int BPF_KPROBE(do_unlinkat, int dfd, struct filename *name)
|
||||
{
|
||||
pid_t pid;
|
||||
const char *filename;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
filename = BPF_CORE_READ(name, name);
|
||||
bpf_printk("KPROBE ENTRY pid = %d, filename = %s\n", pid, filename);
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
|
||||
接下来,我们定义一个名为`BPF_KRETPROBE(do_unlinkat_exit)`的 kretprobe,当从`do_unlinkat`函数退出时,它会被触发。这个 kretprobe 的目的是捕获函数的返回值(ret)。我们再次获取当前进程的 PID,并使用`bpf_printk`函数在内核日志中打印 PID 和返回值。
|
||||
|
||||
```c
|
||||
SEC("kretprobe/do_unlinkat")
|
||||
int BPF_KRETPROBE(do_unlinkat_exit, long ret)
|
||||
{
|
||||
pid_t pid;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
bpf_printk("KPROBE EXIT: pid = %d, ret = %ld\n", pid, ret);
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
|
||||
eunomia-bpf 是一个结合 Wasm 的开源 eBPF 动态加载运行时和开发工具链,它的目的是简化 eBPF 程序的开发、构建、分发、运行。可以参考 <https://github.com/eunomia-bpf/eunomia-bpf> 下载和安装 ecc 编译工具链和 ecli 运行时。
|
||||
|
||||
要编译这个程序,请使用 ecc 工具:
|
||||
|
||||
```console
|
||||
$ ecc kprobe-link.bpf.c
|
||||
Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
然后运行:
|
||||
|
||||
```console
|
||||
sudo ecli run package.json
|
||||
```
|
||||
|
||||
在另外一个窗口中:
|
||||
|
||||
```shell
|
||||
touch test1
|
||||
rm test1
|
||||
touch test2
|
||||
rm test2
|
||||
```
|
||||
|
||||
在 /sys/kernel/debug/tracing/trace_pipe 文件中,应该能看到类似下面的 kprobe 演示输出:
|
||||
|
||||
```shell
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
rm-9346 [005] d..3 4710.951696: bpf_trace_printk: KPROBE ENTRY pid = 9346, filename = test1
|
||||
rm-9346 [005] d..4 4710.951819: bpf_trace_printk: KPROBE EXIT: ret = 0
|
||||
rm-9346 [005] d..3 4710.951852: bpf_trace_printk: KPROBE ENTRY pid = 9346, filename = test2
|
||||
rm-9346 [005] d..4 4710.951895: bpf_trace_printk: KPROBE EXIT: ret = 0
|
||||
```
|
||||
|
||||
## 总结
|
||||
|
||||
通过本文的示例,我们学习了如何使用 eBPF 的 kprobe 和 kretprobe 捕获 unlink 系统调用。更多的例子和详细的开发指南,请参考 eunomia-bpf 的官方文档:<https://github.com/eunomia-bpf/eunomia-bpf>
|
||||
|
||||
本文是 eBPF 入门开发实践教程的第二篇。下一篇文章将介绍如何在 eBPF 中使用 fentry 监测捕获 unlink 系统调用。
|
||||
|
||||
如果您希望学习更多关于 eBPF 的知识和实践,可以访问我们的教程代码仓库 <https://github.com/eunomia-bpf/bpf-developer-tutorial> 或网站 <https://eunomia.dev/zh/tutorials/> 以获取更多示例和完整的教程。
|
||||
@@ -1,150 +0,0 @@
|
||||
# eBPF Tutorial by Example 2: Monitoring unlink System Calls with kprobe
|
||||
|
||||
eBPF (Extended Berkeley Packet Filter) is a powerful network and performance analysis tool on the Linux kernel. It allows developers to dynamically load, update, and run user-defined code at runtime.
|
||||
|
||||
This article is the second part of the eBPF Tutorial by Example, focusing on using kprobe to capture the unlink system call in eBPF. The article will first explain the basic concepts and technical background of kprobes, and then introduce how to use kprobe to capture the unlink system call in eBPF.
|
||||
|
||||
## Background of kprobes Technology
|
||||
|
||||
During the debugging process of the kernel or modules, developers often need to know whether certain functions are called, when they are called, whether the execution is correct, and what the input and return values of the functions are. A simple approach is to add log print information to the corresponding functions in the kernel code. However, this approach often requires recompiling the kernel or modules, restarting the device, etc., which is complex and may disrupt the original code execution process.
|
||||
|
||||
By using the kprobes technology, users can define their own callback functions and dynamically insert probes into almost all functions in the kernel or modules (some functions cannot be probed, such as the kprobes' own implementation functions, which will be explained in detail later). When the kernel execution flow reaches the specified probe function, it will invoke the callback function, allowing the user to collect the desired information. The kernel will then return to the normal execution flow. If the user has collected sufficient information and no longer needs to continue probing, the probes can be dynamically removed. Therefore, the kprobes technology has the advantages of minimal impact on the kernel execution flow and easy operation.
|
||||
|
||||
The kprobes technology includes three detection methods: kprobe, jprobe, and kretprobe. First, kprobe is the most basic detection method and serves as the basis for the other two. It allows probes to be placed at any position (including within a function). It provides three callback modes for probes: `pre_handler`, `post_handler`, and `fault_handler`. The `pre_handler` function is called before the probed instruction is executed, the `post_handler` is called after the probed instruction is completed (note that it is not the probed function), and the `fault_handler` is called when a memory access error occurs. The jprobe is based on kprobe and is used to obtain the input values of the probed function. Finally, as the name suggests, kretprobe is also based on kprobe and is used to obtain the return values of the probed function.
|
||||
|
||||
The kprobes technology is not only implemented through software but also requires support from the hardware architecture. This involves CPU exception handling and single-step debugging techniques. The former is used to make the program's execution flow enter the user-registered callback function, and the latter is used to single-step execute the probed instruction. Therefore, not all architectures support kprobes. Currently, kprobes technology supports various architectures, including i386, x86_64, ppc64, ia64, sparc64, arm, ppc, and mips (note that some architecture implementations may not be complete, see the kernel's Documentation/kprobes.txt for details).
|
||||
|
||||
Features and Usage Restrictions of kprobes:
|
||||
|
||||
1. kprobes allows multiple kprobes to be registered at the same probe position, but jprobe currently does not support this. It is also not allowed to use other jprobe callback functions or the `post_handler` callback function of kprobe as probe points.
|
||||
2. In general, any function in the kernel can be probed, including interrupt handlers. However, the functions used to implement kprobes themselves in kernel/kprobes.c and arch/*/kernel/kprobes.c are not allowed to be probed. Additionally, `do_page_fault` and `notifier_call_chain` are also not allowed.
|
||||
3. If an inline function is used as a probe point, kprobes may not be able to guarantee that probe points are registered for all instances of that function. Since gcc may automatically optimize certain functions as inline functions, the desired probing effect may not be achieved.
|
||||
4. The callback function of a probe point may modify the runtime context of the probed function, such as by modifying the kernel's data structure or saving register information before triggering the prober in the `struct pt_regs` structure. Therefore, kprobes can be used to install bug fixes or inject fault testing code.
|
||||
5. kprobes avoids calling the callback function of another probe point again when processing the probe point function. For example, if a probe point is registered on the `printk()` function and the callback function may call `printk()` again, the callback for the `printk` probe point will not be triggered again. Only the `nmissed` field in the `kprobe` structure will be incremented.
|
||||
6. mutex locks and dynamic memory allocation are not used in the registration and removal process of kprobes.
|
||||
|
||||
7. During the execution of kprobes callback functions, kernel preemption is disabled, and it may also be executed with interrupts disabled, which depends on the CPU architecture. Therefore, regardless of the situation, do not call functions that will give up the CPU in the callback function (such as semaphore, mutex lock, etc.);
|
||||
8. kretprobe is implemented by replacing the return address with the pre-defined trampoline address, so stack backtraces and gcc inline function `__builtin_return_address()` will return the address of the trampoline instead of the actual return address of the probed function;
|
||||
9. If the number of function calls and return calls of a function are unequal, registering kretprobe on such a function may not achieve the expected effect, for example, the `do_exit()` function will have problems, while the `do_execve()` function and `do_fork()` function will not;
|
||||
10. When entering and exiting a function, if the CPU is running on a stack that does not belong to the current task, registering kretprobe on that function may have unpredictable consequences. Therefore, kprobes does not support registering kretprobe for the `__switch_to()` function under the X86_64 architecture and will directly return `-EINVAL`.
|
||||
|
||||
## kprobe Example
|
||||
|
||||
The complete code is as follows:
|
||||
|
||||
```c
|
||||
#include "vmlinux.h"
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
#include <bpf/bpf_core_read.h>
|
||||
|
||||
char LICENSE[] SEC("license") = "Dual BSD/GPL";
|
||||
|
||||
SEC("kprobe/do_unlinkat")
|
||||
int BPF_KPROBE(do_unlinkat, int dfd, struct filename *name)
|
||||
{
|
||||
pid_t pid;
|
||||
const char *filename;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
filename = BPF_CORE_READ(name, name);
|
||||
bpf_printk("KPROBE ENTRY pid = %d, filename = %s\n", pid, filename);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("kretprobe/do_unlinkat")
|
||||
int BPF_KRETPROBE(do_unlinkat_exit, long ret)
|
||||
{
|
||||
pid_t pid;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
bpf_printk("KPROBE EXIT: pid = %d, ret = %ld\n", pid, ret);
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
|
||||
This code is a simple eBPF program used to monitor and capture the unlink system call executed in the Linux kernel. The unlink system call is used to delete a file. This eBPF program traces this system call by placing hooks at the entry and exit points of the `do_unlinkat` function using a kprobe (kernel probe).
|
||||
|
||||
First, we import necessary header files such as vmlinux.h, bpf_helpers.h, bpf_tracing.h, and bpf_core_read.h. Then, we define a license to allow the program to run in the kernel.
|
||||
|
||||
```c
|
||||
#include "vmlinux.h"
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
#include <bpf/bpf_core_read.h>
|
||||
|
||||
char LICENSE[] SEC("license") = "Dual BSD/GPL";
|
||||
```
|
||||
|
||||
Next, we define a kprobe named `BPF_KPROBE(do_unlinkat)` which gets triggered when the `do_unlinkat` function is entered. It takes two parameters: `dfd` (file descriptor) and `name` (filename structure pointer). In this kprobe, we retrieve the PID (process identifier) of the current process and then read the filename. Finally, we use the `bpf_printk` function to print the PID and filename in the kernel log.
|
||||
|
||||
```c
|
||||
SEC("kprobe/do_unlinkat")
|
||||
int BPF_KPROBE(do_unlinkat, int dfd, struct filename *name)
|
||||
{
|
||||
pid_t pid;
|
||||
const char *filename;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
filename = BPF_CORE_READ(name, name);
|
||||
bpf_printk("KPROBE ENTRY pid = %d, filename = %s\n", pid, filename);
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
|
||||
Next, we define a kretprobe named `BPF_KRETPROBE(do_unlinkat_exit)` that will be triggered when exiting the `do_unlinkat` function. The purpose of this kretprobe is to capture the return value (`ret`) of the function. We again obtain the PID of the current process and use the `bpf_printk` function to print the PID and return value in the kernel log.
|
||||
|
||||
```c
|
||||
SEC("kretprobe/do_unlinkat")
|
||||
int BPF_KRETPROBE(do_unlinkat_exit, long ret)
|
||||
{
|
||||
pid_t pid;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
bpf_printk("KPROBE EXIT: pid = %d, ret = %ld\n", pid, ret);
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
|
||||
eunomia-bpf is an open-source eBPF dynamic loading runtime and development toolchain that combines with Wasm. Its goal is to simplify the development, build, distribution, and execution of eBPF programs. You can refer to <https://github.com/eunomia-bpf/eunomia-bpf> to download and install the ecc compiler toolchain and ecli runtime.
|
||||
|
||||
To compile this program, use the ecc tool:
|
||||
|
||||
```console
|
||||
$ ecc kprobe-link.bpf.c
|
||||
Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
Then run:
|
||||
|
||||
```console
|
||||
sudo ecli run package.json
|
||||
```
|
||||
|
||||
In another window:
|
||||
|
||||
```shell
|
||||
touch test1
|
||||
rm test1
|
||||
touch test2
|
||||
rm test2
|
||||
```
|
||||
|
||||
You should see kprobe demo output similar to the following in the /sys/kernel/debug/tracing/trace_pipe file:
|
||||
|
||||
```shell
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
rm-9346 [005] d..3 4710.951696: bpf_trace_printk: KPROBE ENTRY pid = 9346, filename = test1
|
||||
rm-9346 [005] d..4 4710.951819: bpf_trace_printk: KPROBE EXIT: ret = 0
|
||||
rm-9346 [005] d..3 4710.951852: bpf_trace_printk: KPROBE ENTRY pid = 9346, filename = test2
|
||||
rm-9346 [005] d..4 4710.951895: bpf_trace_printk: KPROBE EXIT: ret = 0
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
In this article's example, we learned how to use eBPF's kprobe and kretprobe to capture the unlink system call. For more examples and detailed development guides, please refer to the official documentation of eunomia-bpf: <https://github.com/eunomia-bpf/eunomia-bpf>
|
||||
|
||||
This article is the second part of the introductory eBPF development tutorial. The next article will explain how to use fentry to monitor and capture the unlink system call in eBPF.
|
||||
|
||||
If you'd like to learn more about eBPF knowledge and practices, you can visit our tutorial code repository at <https://github.com/eunomia-bpf/bpf-developer-tutorial> or website <https://eunomia.dev/tutorials/> for more examples and complete tutorials.
|
||||
Reference in New Issue
Block a user