Add new post for 28 29 34 (#92)

* add 28 blog

* update test

* add 34

* Update test-libbpf.yml

* add link

* update 28

* add read only

* Update english version

* update 29

* update 29

* fix ci for 34
云微
2024-01-19 23:48:42 +00:00
committed by GitHub
parent 0587db4c42
commit deee286952
21 changed files with 1047 additions and 82 deletions

View File

@@ -36,6 +36,7 @@ jobs:
run: |
./ecc src/4-opensnoop/opensnoop.bpf.c
sudo timeout -s 2 3 ./ecli run src/4-opensnoop/package.json || if [ $? = 124 ]; then exit 0; else exit $?; fi
sudo timeout -s 2 3 ./ecli run src/4-opensnoop/package.json --pid_target 1 || if [ $? = 124 ]; then exit 0; else exit $?; fi
- name: test 5 bashreadline
run: |
./ecc src/5-uprobe-bashreadline/bashreadline.bpf.c
@@ -68,7 +69,12 @@ jobs:
run: |
./ecc src/23-http/accept.bpf.c src/23-http/accept.h
sudo timeout -s 2 3 ./ecli run src/23-http/package.json || if [ $? = 124 ]; then exit 0; else exit $?; fi
- name: test 34 syscall
run: |
./ecc src/34-syscall/open_modify.bpf.c src/34-syscall/open_modify.h
sudo timeout -s 2 3 ./ecli run src/34-syscall/package.json || if [ $? = 124 ]; then exit 0; else exit $?; fi
./ecc src/34-syscall/exechijack.bpf.c src/34-syscall/exechijack.h
sudo timeout -s 2 3 ./ecli run src/34-syscall/package.json || if [ $? = 124 ]; then exit 0; else exit $?; fi
- name: test 25 signal
run: |
./ecc src/25-signal/signal.bpf.c src/25-signal/signal.h

View File

@@ -30,24 +30,34 @@ jobs:
- name: test 13 tcpconnlat
run: |
make -C src/13-tcpconnlat
# sudo timeout -s 2 3 src/13-tcpconnlat/tcpconnlat || if [ $? = 124 ]; then exit 0; else exit $?; fi
sudo timeout -s 2 3 src/13-tcpconnlat/tcpconnlat || if [ $? = 124 ]; then exit 0; else exit $?; fi
- name: test 14 tcpstates
run: |
make -C src/14-tcpstates
# sudo timeout -s 2 3 src/14-tcpstates/tcpstates || if [ $? = 124 ]; then exit 0; else exit $?; fi
sudo timeout -s 2 3 src/14-tcpstates/tcpstates || if [ $? = 124 ]; then exit 0; else exit $?; fi
- name: test 16 memleak
run: |
make -C src/16-memleak
# sudo timeout -s 2 3 src/16-memleak/memleak || if [ $? = 124 ]; then exit 0; else exit $?; fi
sudo timeout -s 2 3 src/16-memleak/memleak || if [ $? = 124 ]; then exit 0; else exit $?; fi
- name: test 17 biopattern
run: |
make -C src/17-biopattern
# sudo timeout -s 2 3 src/17-biopattern/biopattern || if [ $? = 124 ]; then exit 0; else exit $?; fi
sudo timeout -s 2 3 src/17-biopattern/biopattern || if [ $? = 124 ]; then exit 0; else exit $?; fi
- name: test 23 http
run: |
make -C src/23-http
# sudo timeout -s 2 3 src/23-http/sockfilter || if [ $? = 124 ]; then exit 0; else exit $?; fi
sudo timeout -s 2 3 src/23-http/sockfilter || if [ $? = 124 ]; then exit 0; else exit $?; fi
- name: test 28 detach
run: |
make -C src/28-detach
sudo mount bpffs -t bpf /sys/fs/bpf
sudo mkdir /sys/fs/bpf/textreplace
# sudo src/28-detach/textreplace2 -f /proc/modules -i 'joydev' -r 'cryptd' -d || if [ $? = 124 ]; then exit 0; else exit $?; fi
- name: test 29 sockops
run: |
make -C src/29-sockops
# TODO: add test
- name: test 30 sslsniff
run: |
make -C src/30-sslsniff
# sudo timeout -s 2 3 src/30-sslsniff/sslsniff || if [ $? = 124 ]; then exit 0; else exit $?; fi
sudo timeout -s 2 3 src/30-sslsniff/sslsniff || if [ $? = 124 ]; then exit 0; else exit $?; fi

View File

@@ -2,7 +2,7 @@
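eBPF (Extended Berkeley Packet Filter) is a revolutionary technology in the Linux kernel that allows users to run custom programs in kernel space without modifying the kernel source code or loading any kernel modules. This lets developers observe, modify, and control a Linux system very flexibly.
This article describes how to use eBPF's bpf_send_signal feature to intervene in a specified process by sending it a signal. For more tutorial documentation, see <https://github.com/eunomia-bpf/bpf-developer-tutorial>
This article describes how to use eBPF's bpf_send_signal feature to intervene in a specified process by sending it a signal. For the complete source code of this article and more tutorial documentation, see <https://github.com/eunomia-bpf/bpf-developer-tutorial>
## Use Cases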

View File

@@ -2,7 +2,7 @@
eBPF (Extended Berkeley Packet Filter) is a revolutionary technology in the Linux kernel that allows users to execute custom programs in kernel space without modifying the kernel source code or loading any kernel modules. This provides developers with great flexibility to observe, modify, and control the Linux system.
This article introduces how to use the `bpf_send_signal` feature of eBPF to intervene by sending signals to specified processes. For more tutorial documentation, please refer to <https://github.com/eunomia-bpf/bpf-developer-tutorial>.
This article introduces how to use the `bpf_send_signal` feature of eBPF to intervene by sending signals to specified processes. For more tutorial documentation and complete source code, please refer to <https://github.com/eunomia-bpf/bpf-developer-tutorial>.
## Use Cases

View File

@@ -1,5 +1,7 @@
# Using eBPF to Add a sudo User
Full source code for this article: <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/26-sudo>
Compile:
```bash

View File

@@ -1,5 +1,7 @@
# Using eBPF to add sudo user
The full source code for this article can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/26-sudo>
Compilation:
```bash

View File

@@ -1,52 +1,106 @@
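# Running eBPF Programs After the User-Space Application Exits: The Lifecycle of eBPF Programs
# Running eBPF Programs After the Application Exits: The Lifecycle of eBPF Programs
By running an eBPF program in detached mode, the user-space loader can exit without stopping the eBPF program.
eBPF (Extended Berkeley Packet Filter) is a major technical innovation in the Linux kernel that allows users to run custom programs in kernel space without modifying the kernel source code or loading any kernel modules. This gives developers great flexibility to observe, modify, and control a Linux system.
This article covers the lifecycle of eBPF programs, how to keep an eBPF program running after the user-space application exits, and how to use "pin" to share eBPF objects between processes. It is part of the eBPF Developer Tutorial; more details can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial> and <https://eunomia.dev/tutorials>.
By running an eBPF program with the "detach" method, the user-space loader can exit without stopping the eBPF program. In addition, pinning lets processes share eBPF objects and keeps those objects alive.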
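## The Lifecycle of eBPF Programs
First, we need to understand a few key concepts, such as BPF objects (programs, maps, and debug information), file descriptors (FDs), and reference counts (refcnt). In eBPF, user space accesses BPF objects through file descriptors, and every object has a reference count. When an object is created, its reference count starts at 1. Once the object is no longer in use (that is, no program or file descriptor references it), its reference count drops to 0 and its memory is freed after an RCU grace period.
BPF objects (programs, maps, and debug information) are accessed through file descriptors (FDs), and each has a reference counter that tracks how many times the object is referenced. For example, when a map is created, the kernel allocates a struct bpf_map object and initializes its reference count to 1. The map's file descriptor is then returned to the user-space process. If the process exits or crashes, the file descriptor is closed and the map's reference count is decremented. When the reference count reaches zero, the memory is freed.
Next, we need to understand the lifecycle of an eBPF program. When you create a BPF program and attach it to a "hook" (a network interface, a system call, and so on), its reference count increases. Then, even if the user-space process that created and loaded the program exits, the program stays active as long as its reference count is greater than 0. One important point, however, is that not all hooks are equal. Some hooks are global, such as XDP, tc's clsact, and cgroup-based hooks. These global hooks keep the BPF program active until the objects themselves go away. Other hooks are local and only run while the process that owns them is alive.
A BPF program uses maps in two phases. First, the maps are created and their file descriptors are stored as part of BPF_LD_IMM64 instructions. When the kernel verifies the program, it increments the reference counts of the maps the program uses and initializes the program's reference count to 1. At this point user space can close the file descriptors of the maps, but the maps are not destroyed because the program is still using them. When the program's file descriptor is closed and its reference count reaches zero, the destruction logic decrements the reference counts of the maps. This allows several programs of different types to use the same map at the same time.
Another key operation in managing the lifecycle of BPF objects (programs or maps) is "detach". It prevents any future execution of the attached program. When a BPF program needs to be replaced, you can use the replace operation. This is a complex process, because you must make sure no in-flight events are lost during the replacement, and the old and new programs may run on different CPUs at the same time.
When a program is attached to a hook, the program's reference count increases. After a user-space process has created the maps and the program, loaded the program, and attached it to a hook, it can exit. The maps and the program it created stay alive because their reference counts are greater than 0. This is the lifecycle of a BPF object: as long as its reference count is greater than 0, the kernel keeps it alive.
Finally, besides file descriptors and reference counts, there is another way to manage the lifecycle of BPF objects: BPFFS, the "BPF file system". A user-space process can "pin" a BPF program or map in BPFFS, which increases the object's reference count, so the object stays alive even if the program is not attached anywhere or the map is not used by any program.
However, attachment points behave differently. Some, such as XDP, tc's clsact, and cgroup-based hooks, are global: the program keeps processing packets even when no process is using it. Others, such as kprobe, uprobe, tracepoint, perf_event, raw_tracepoint, socket filters, and so_reuseport hooks, are only in effect during the lifetime of the process that holds the event. When that process crashes, the kernel detaches the BPF program and decrements its reference count.
So when we talk about running eBPF programs in the background, we need to be clear about what that means. In some cases we want the BPF program to keep running even after the user-space process has exited. That requires managing the lifecycle of BPF objects correctly.
In summary: XDP, tc, lwt, and cgroup hooks are global, while kprobe, uprobe, tracepoint, perf_event, raw_tracepoint, socket filter, and so_reuseport hooks are local to the process. The file descriptor based API has the advantage of automatic cleanup: if something goes wrong in the user-space process, the kernel cleans up all objects automatically. On the networking side, the file descriptor based API can also prevent programs from running indefinitely.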
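## Running
Another way to keep BPF programs and maps alive is BPFFS, the BPF file system. Pinning a program or map to a location in BPFFS increases its reference count and keeps it alive, even if the pinned program is not attached anywhere and the pinned map is not used by any program.
Here we again use the string replacement application from the previous article to illustrate the potential security risks. By running the program with `--detach`, the user-space loader can exit without stopping the eBPF program.
Understanding the lifecycle of BPF programs and maps is essential for using BPF safely and reliably. Mechanisms such as file descriptors, reference counters, and BPFFS help manage the lifecycle of BPF objects and ensure they are created, attached, detached, and replaced correctly.
Compile:
### eBPF in Kubernetes: Deploying eBPF Programs via Remote Procedure Call (RPC)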
```bash
make
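In a Kubernetes environment, deploying eBPF programs usually requires elevated system privileges. These applications typically need at least CAP_BPF, and may need additional capabilities depending on the program type. In a multi-tenant Kubernetes environment, granting broad privileges to every container or application can be a security risk.
One way to address the privilege problem is to reduce the privilege requirements by pinning eBPF maps. Pinning lets eBPF objects outlive the process that created them so that other processes can access them. This is useful in Kubernetes, where different containers may need to interact with the same eBPF objects.
For example, a privileged init container can create and pin an eBPF map. Later containers, possibly running with fewer privileges, can then interact with the pinned eBPF objects. This approach confines the privilege requirement to the initialization phase and improves overall security.
The bpfman project plays a key role here. bpfman (the BPF Daemon) aims to manage the lifecycle of eBPF programs and maps in a more controlled and secure way. It acts as an intermediary between user space and kernel space, providing a mechanism to load and manage eBPF programs without granting broad privileges to each individual container or application.
In Kubernetes, bpfman can be deployed as a privileged service that loads and manages eBPF programs on the nodes of the cluster. It handles the complexity of eBPF lifecycle management, such as loading, unloading, and updating eBPF programs and managing their state. This centralized approach simplifies deploying and managing eBPF programs in a Kubernetes cluster while following security best practices.
## Using Detach to Keep an eBPF Program Running After the Application Exits
In libbpf, the `bpf_object__pin_maps` function pins the maps of a BPF object. Similar APIs exist for programs and links.
The following example uses a string replacement program similar to textreplace from the previous section to demonstrate the detach method. Programs, maps, and links can be pinned with code like this: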
```c
int pin_program(struct bpf_program *prog, const char* path)
{
    int err;
    err = bpf_program__pin(prog, path);
    if (err) {
        fprintf(stdout, "could not pin prog %s: %d\n", path, err);
        return err;
    }
    return err;
}
int pin_map(struct bpf_map *map, const char* path)
{
    int err;
    err = bpf_map__pin(map, path);
    if (err) {
        fprintf(stdout, "could not pin map %s: %d\n", path, err);
        return err;
    }
    return err;
}
int pin_link(struct bpf_link *link, const char* path)
{
    int err;
    err = bpf_link__pin(link, path);
    if (err) {
        fprintf(stdout, "could not pin link %s: %d\n", path, err);
        return err;
    }
    return err;
}
```
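Before running, first make sure the bpf file system is mounted:
## Running the Example
In this example, we continue to use the string replacement example from the previous section to demonstrate running an eBPF program after the application exits, and to show the potential security risks. Running the program with the `--detach` option lets the user-space loader exit without stopping the eBPF program. The complete example code can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/28-detach>.
Before running, make sure the BPF file system is mounted: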
```bash
sudo mount bpffs -t bpf /sys/fs/bpf
mkdir /sys/fs/bpf/textreplace
```
Then you can run text-replace2 detached:
Then run the text-replace2 program with the detach option:
```bash
./textreplace2 -f /proc/modules -i 'joydev' -r 'cryptd' -d
```
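This creates some eBPF link files under `/sys/fs/bpf/textreplace`.
Once the loader has run successfully, you can check the log by running the following command:
This creates some eBPF link files in the `/sys/fs/bpf/textreplace` directory. After the loader has run successfully, check the log with the following command: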
```bash
sudo cat /sys/kernel/debug/tracing/trace_pipe
# Confirm the link files exist
# Confirm that the link files exist
sudo ls -l /sys/fs/bpf/textreplace
```
Finally, to stop, simply delete the link files.
Finally, to stop the program, just delete the link files:
```bash
sudo rm -r /sys/fs/bpf/textreplace
@@ -54,5 +108,8 @@ sudo rm -r /sys/fs/bpf/textreplace
## References
- <https://github.com/pathtofile/bad-bpf>
- <https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html>
You can visit our tutorial code repository <https://github.com/eunomia-bpf/bpf-developer-tutorial> or website <https://eunomia.dev/zh/tutorials/> for more examples and the complete tutorial.
- [bad-bpf](https://github.com/pathtofile/bad-bpf)
- [Object Lifetime in the Linux kernel](https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html)
- [BPFMan: A Novel Way to Manage eBPF—Beyond Capsule Mode](https://bpfman.io/main/blog/2023/09/07/bpfman-a-novel-way-to-manage-ebpf)

View File

@@ -1,23 +1,91 @@
# Running eBPF Programs After User-Space Application Exits: The Lifecycle of eBPF Programs
# Running eBPF After Application Exits: The Lifecycle of eBPF Programs
By using the detach method to run eBPF programs, the user space loader can exit without stopping the eBPF program.
eBPF (Extended Berkeley Packet Filter) is a revolutionary technology in the Linux kernel that allows users to execute custom programs in kernel space without modifying the kernel source code or loading any kernel modules. This provides developers with great flexibility to observe, modify, and control the Linux system.
This article introduces the lifecycle of eBPF programs, how to keep eBPF programs running after the user-space application exits, and how to use pinning to share eBPF objects between processes. It is part of the eBPF Developer Tutorial; more details can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial> and <https://eunomia.dev/tutorials>.
By using the detach method to run eBPF programs, the user space loader can exit without stopping the eBPF program. Another common use case for pinning is sharing eBPF objects between processes. For example, one could create a Map from Go, pin it, and inspect it using `bpftool map dump pinned /sys/fs/bpf/my_map`.
## The Lifecycle of eBPF Programs
First, we need to understand some key concepts, such as BPF objects (including programs, maps, and debug information), file descriptors (FDs), reference counting (refcnt), etc. In the eBPF system, user space accesses BPF objects through file descriptors, and each object has a reference count. When an object is created, its reference count is initialized to 1. If the object is no longer in use (i.e., no other programs or file descriptors reference it), its reference count will decrease to 0 and be cleaned up in memory after the RCU grace period.
File descriptors and reference counters are used to manage BPF objects (progs, maps, and debug info). When a map is created, the kernel initializes its reference counter to 1 and returns a file descriptor to the user space process. If the process exits or crashes, the file descriptor is closed and the reference counter of the map is decremented. Once the counter reaches zero, the map is freed from memory after an RCU grace period.
Next, we need to understand the lifecycle of eBPF programs. First, when you create a BPF program and attach it to a "hook" (e.g., a network interface, a system call, etc.), its reference count increases. Then, even if the user space process that originally created and loaded the program exits, as long as the reference count of the BPF program is greater than 0, it will remain active. However, there is an important point in this process: not all hooks are equal. Some hooks are global, such as XDP, tc's clsact, and cgroup-based hooks. These global hooks keep the BPF program active until the objects themselves disappear. Some hooks are local and only run during the lifetime of the process that owns them.
BPF programs that use BPF maps are loaded in two phases. First, the maps are created and their file descriptors are stored in the `imm` field of the program's `BPF_LD_IMM64` instructions. Then, while verifying the program, the kernel increments the reference counters of the maps it uses and initializes the program's reference counter to 1. Even if the user space process closes the file descriptors associated with the maps, the maps will not disappear because the program is still "using" them. When the file descriptor of the program is closed and its reference counter reaches zero, the destruction logic decrements the reference counters of all maps used by the program. This allows the same map to be used by multiple programs at once.
For managing the lifecycle of BPF objects (programs or maps), another key operation is "detach." This operation prevents any future execution of the attached program. Then, in cases where you need to replace a BPF program, you can use the replace operation. This is a complex process because you need to ensure that during the replacement process, no events being processed are lost, and the old and new program may run simultaneously on different CPUs.
When a program is attached to a hook, its reference counter is incremented. The user space process that created the maps and program can then exit, and the maps and program will remain alive as long as their reference counters are greater than zero. This is the lifecycle of a BPF object.
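The reference counting described above can be illustrated with a small user-space model. This is only a sketch: it is not kernel code, every name in it (`toy_obj`, `toy_create`, `toy_get`, `toy_put`, `toy_loader_exits_scenario`) is hypothetical, and it ignores details such as RCU grace periods.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model of BPF object lifetimes (hypothetical names,
 * not a kernel or libbpf API). */
struct toy_obj {
    int refcnt;            /* holders: FDs, attachments, pins, users */
    int freed;             /* set once refcnt drops to zero */
    struct toy_obj *map;   /* for a program: the map it uses, else NULL */
};

/* Creating an object gives one reference to the FD returned to user space.
 * A program additionally takes a reference on the map it uses. */
static void toy_create(struct toy_obj *o, struct toy_obj *used_map)
{
    o->refcnt = 1;
    o->freed = 0;
    o->map = used_map;
    if (used_map)
        used_map->refcnt++;
}

/* Take a reference, e.g. when attaching to a hook or pinning. */
static void toy_get(struct toy_obj *o)
{
    assert(!o->freed);
    o->refcnt++;
}

/* Drop a reference, e.g. when closing an FD or detaching. */
static void toy_put(struct toy_obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        o->freed = 1;
        if (o->map)
            toy_put(o->map);   /* a dying program releases its map */
    }
}

/* Loader creates a map and a program, attaches the program, then exits.
 * Returns 1 if both objects survive the exit and are freed on detach. */
static int toy_loader_exits_scenario(void)
{
    struct toy_obj map, prog;
    int alive_after_exit;

    toy_create(&map, NULL);    /* map FD:  map = 1 */
    toy_create(&prog, &map);   /* prog FD: prog = 1, map = 2 */
    toy_get(&prog);            /* attach:  prog = 2 */
    toy_put(&map);             /* loader exits, map FD closed:  map = 1 */
    toy_put(&prog);            /* loader exits, prog FD closed: prog = 1 */
    alive_after_exit = !prog.freed && !map.freed;
    toy_put(&prog);            /* detach: prog = 0, then map = 0 */
    return alive_after_exit && prog.freed && map.freed;
}
```

Calling `toy_loader_exits_scenario()` walks through the sequence from the text: the attachment keeps the program alive after both file descriptors are closed, and the program in turn keeps its map alive.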
Finally, in addition to managing the lifecycle of BPF objects through file descriptors and reference counting, there is another method called BPFFS, which is the "BPF Filesystem." User space processes can "pin" a BPF program or map in BPFFS, which increases the reference count of the object, keeping the BPF object active even if the BPF program is not attached anywhere or the BPF map is not used by any program.
Not all attachment points are the same. XDP, tc's clsact, and cgroup-based hooks are global, meaning that programs will stay attached to them as long as the corresponding objects are alive. On the other hand, programs attached to kprobe, uprobe, tracepoint, perf_event, raw_tracepoint, socket filters, and so_reuseport hooks are local to the process. If the process crashes or closes the file descriptors associated with these hooks, the kernel will detach the BPF program and decrement its reference counter.
So when we talk about running eBPF programs in the background, we need to understand the meaning of this process. In some cases, even if the user space process has exited, we may still want the BPF program to keep running. This requires us to manage the lifecycle of BPF objects correctly.
The file descriptor based interface provides auto-cleanup, meaning that if anything goes wrong with the user space process, the kernel will automatically clean up all BPF objects. This interface is useful for networking as well. The use of BPFFS (BPF File System) allows a process to pin a BPF program or map, which increments their reference counters and keeps them alive even if they are not attached or used by any program. This is useful when an admin wants to examine a map even when the associated program is not running.
Detach and replace are important aspects of the lifetime of a BPF program. Detaching a program from a hook prevents it from executing on any future events, while the replace feature allows a program to be swapped in cgroup-based hooks. There is a window where the old and new programs can be executing on different CPUs, but the kernel guarantees that one of them will be processing events. Some BPF developers load the new program with the same maps as the old program to ensure safe replacement.
Overall, understanding the lifetime of BPF programs and maps is crucial for users to use BPF safely and without surprises. The use of file descriptors, reference counters, and BPFFS helps manage the lifecycle of BPF objects, ensuring their proper creation, attachment, detachment, and replacement.
### eBPF in Kubernetes: Deploy eBPF Programs via Remote Procedure Call
In a Kubernetes environment, deploying eBPF programs often necessitates a higher level of system privileges. Typically, these applications require at least CAP_BPF permissions, and depending on the program type, they may need even more. This requirement poses a challenge in a multi-tenant Kubernetes environment where granting extensive privileges can be a security risk.
#### Using Pin to Mitigate Privilege Requirements
One way to address the privilege issue is through the use of pinning eBPF maps. Pinning allows eBPF objects to persist beyond the life of the process that created them, making them accessible to other processes. This method can be particularly useful in Kubernetes, where different containers might need to interact with the same eBPF objects.
For example, an eBPF map can be created and pinned by a privileged initializer container. Subsequent containers, which may run with fewer privileges, can then interact with the pinned eBPF objects. This approach limits the need for elevated privileges to the initialization phase, thereby enhancing overall security.
#### The Role of bpfman in eBPF Lifecycle Management
The bpfman project can play a crucial role in this context. bpfman, or BPF Daemon, is designed to manage the lifecycle of eBPF programs and maps in a more controlled and secure manner. It acts as a mediator between user space and kernel space, providing a mechanism to load and manage eBPF programs without granting extensive privileges to each individual container or application.
In Kubernetes, bpfman could be deployed as a privileged service, responsible for loading and managing eBPF programs across different nodes in the cluster. It can handle the intricacies of eBPF lifecycle management, such as loading, unloading, updating eBPF programs, and managing their state. This centralized approach simplifies the deployment and management of eBPF programs in a Kubernetes cluster, while adhering to security best practices.
## Using Detach to Keep an eBPF Program Running After the Application Exits
In libbpf, the `bpf_object__pin_maps` function can be used to pin the maps of a BPF object. Similar APIs exist for programs and links.
Here we use a program similar to textreplace from the previous section to demonstrate the detach method. The code that pins the eBPF objects looks like this:
```c
int pin_program(struct bpf_program *prog, const char* path)
{
    int err;
    err = bpf_program__pin(prog, path);
    if (err) {
        fprintf(stdout, "could not pin prog %s: %d\n", path, err);
        return err;
    }
    return err;
}
int pin_map(struct bpf_map *map, const char* path)
{
    int err;
    err = bpf_map__pin(map, path);
    if (err) {
        fprintf(stdout, "could not pin map %s: %d\n", path, err);
        return err;
    }
    return err;
}
int pin_link(struct bpf_link *link, const char* path)
{
    int err;
    err = bpf_link__pin(link, path);
    if (err) {
        fprintf(stdout, "could not pin link %s: %d\n", path, err);
        return err;
    }
    return err;
}
```
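Each of the pin helpers above takes a path inside the mounted BPF file system. As a minimal sketch, a loader might build those paths with a small helper like the one below. The helper `make_pin_path` and the object name `example_link` are hypothetical and are not taken from the tutorial's source; only the directory `/sys/fs/bpf/textreplace` comes from this article.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: join a BPFFS directory and an object name into a
 * pin path. Returns 0 on success, -1 if the result does not fit in buf. */
static int make_pin_path(char *buf, size_t size,
                         const char *dir, const char *name)
{
    int n = snprintf(buf, size, "%s/%s", dir, name);

    /* snprintf returns the length it wanted to write; treat truncation
     * as an error so a shortened path is never passed to a pin call. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A caller could then pass the resulting buffer to `pin_link`, `pin_map`, or `pin_program`, for example after `make_pin_path(path, sizeof(path), "/sys/fs/bpf/textreplace", "example_link")`.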
## Running
Here, we still use the example of string replacement used in the previous application to demonstrate potential security risks. By using `--detach` to run the program, the user space loader can exit without stopping the eBPF program.
The code for this example can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/28-detach>.
Compilation:
```bash
@@ -53,5 +121,8 @@ sudo rm -r /sys/fs/bpf/textreplace
## References
You can visit our tutorial code repository at [https://github.com/eunomia-bpf/bpf-developer-tutorial](https://github.com/eunomia-bpf/bpf-developer-tutorial) or our website at [https://eunomia.dev/zh/tutorials/](https://eunomia.dev/zh/tutorials/) for more examples and a complete tutorial.
- <https://github.com/pathtofile/bad-bpf>
- <https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html>
- <https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html>
- <https://bpfman.io/main/blog/2023/09/07/bpfman-a-novel-way-to-manage-ebpf>

View File

@@ -245,6 +245,8 @@ int BPF_PROG(find_possible_addrs, struct pt_regs *regs, long ret)
return 0;
}
char name_cmp[TEXT_LEN_MAX+1];
SEC("fexit/__x64_sys_read")
int BPF_PROG(check_possible_addresses, struct pt_regs *regs, long ret)
{
@@ -260,7 +262,6 @@ int BPF_PROG(check_possible_addresses, struct pt_regs *regs, long ret)
unsigned int newline_counter = 0;
unsigned int match_counter = 0;
char name[TEXT_LEN_MAX+1];
unsigned int j = 0;
char old = 0;
@@ -289,14 +290,15 @@ int BPF_PROG(check_possible_addresses, struct pt_regs *regs, long ret)
if (name_addr == 0) {
break;
}
bpf_probe_read_user(&name, TEXT_LEN_MAX, (char*)name_addr);
bpf_probe_read_user(&name_cmp, TEXT_LEN_MAX, (char*)name_addr);
for (j = 0; j < TEXT_LEN_MAX; j++) {
if (name[j] != pFind->text[j]) {
if (name_cmp[j] != pFind->text[j]) {
break;
}
}
// for newer kernels, maybe use bpf_strncmp
// if (bpf_strncmp(pFind->text, TEXT_LEN_MAX, name) == 0) {
// const char *p = name_cmp;
// if (bpf_strncmp(pFind->text, TEXT_LEN_MAX, p) == 0) {
if (j >= name_len) {
// ***********
// We've found our text!

View File

@@ -1,4 +1,12 @@
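# eBPF sockops Example
# eBPF Development Practice: Accelerating Network Request Forwarding with sockops
eBPF (Extended Berkeley Packet Filter) is a powerful feature of the Linux kernel that makes it possible to run, load, and update user-defined code without changing the kernel source code or rebooting the kernel. This has made eBPF widely used in network and system performance analysis, packet filtering, security policy, and other areas.
This tutorial focuses on eBPF in networking, specifically on how to use sockops-type eBPF programs to accelerate the forwarding of local network requests. This is often valuable when a software load balancer such as Nginx or HAProxy is used to forward requests.
In many workloads, such as service-to-service communication in a microservices architecture, the overhead of network requests made over the local machine can noticeably affect the performance of the whole application. Because these requests must pass through the local network stack, their processing can become a bottleneck, especially under high concurrency. To address this, sockops-type eBPF programs can be used to accelerate local request forwarding. A sockops program manages sockets in kernel space and forwards packets directly between sockets on the local machine, reducing the CPU time spent forwarding packets in the TCP/IP stack.
This tutorial uses a concrete example to show how to accelerate network request forwarding with sockops-type eBPF programs. To help you understand how to use sockops programs, we walk through the code of the example step by step and discuss how each part works. The complete source code and project can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/29-sockops>.
## Leveraging eBPF sockops for Performance Optimization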
@@ -10,12 +18,114 @@ Merbridge 项目就是这样实现了用 eBPF 代替 iptables 为 Istio 进行
![merbridge](merbridge.png)
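## Running the Example
## Example Program
This example program redirects traffic from the sender's socket (egress) to the receiver's socket (ingress), **skipping the kernel TCP/IP network stack**. In this example, we assume the sender and the receiver are running on the **same** machine.
This example program redirects traffic from the sender's socket (egress) to the receiver's socket (ingress), **skipping the kernel TCP/IP network stack**. In this example, we assume the sender and the receiver are running on the **same** machine. The example program has two parts that share one map definition: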
bpf_sockmap.h
```c
#include "vmlinux.h"
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>
#define LOCALHOST_IPV4 16777343
struct sock_key {
__u32 sip;
__u32 dip;
__u32 sport;
__u32 dport;
__u32 family;
};
struct {
__uint(type, BPF_MAP_TYPE_SOCKHASH);
__uint(max_entries, 65535);
__type(key, struct sock_key);
__type(value, int);
} sock_ops_map SEC(".maps");
```
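The constant `LOCALHOST_IPV4` (16777343) in the header above is the address 127.0.0.1 stored in network byte order and then read as a 32-bit integer on a little-endian machine. The sketch below shows that arithmetic; the helper name `u32_from_le_bytes` is hypothetical and not part of the tutorial's code.

```c
#include <stdint.h>

/* Interpret four bytes, given in memory order, as a little-endian 32-bit
 * integer. This mirrors how a little-endian CPU reads an IPv4 address
 * that is stored in network byte order (first octet at the lowest address). */
static uint32_t u32_from_le_bytes(uint8_t b0, uint8_t b1,
                                  uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0 |
           ((uint32_t)b1 << 8) |
           ((uint32_t)b2 << 16) |
           ((uint32_t)b3 << 24);
}
```

With the octets of 127.0.0.1 this gives 127 + (1 << 24) = 16777343, which is why the BPF programs can compare `local_ip4` and `remote_ip4` against the constant directly on such a machine.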
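The BPF code in this example is split into two parts: `bpf_redirect.bpf.c` and `bpf_contrack.bpf.c`.
- The BPF code in `bpf_contrack.bpf.c` defines a socket operations (`sockops`) program. Whenever a TCP connection is established on the local machine (over localhost), it creates an entry in the `sock_ops_map` BPF map keyed by the new connection's five-tuple (source address, destination address, source port, destination port, protocol). The map is of type `BPF_MAP_TYPE_SOCKHASH`, which stores sockets together with their five-tuple keys. As a result, every local TCP connection can be found in the BPF map by its five-tuple.
- The BPF code in `bpf_redirect.bpf.c` defines a socket message (`sk_msg`) program, which is invoked when a message is sent on a local socket. It checks whether the message is local traffic and, if so, uses the five-tuple (source address, destination address, source port, destination port, protocol) to look up the matching socket in `sock_ops_map` and redirects the message to that socket, bypassing the kernel network stack.
For example, suppose two processes run locally: process A is bound to port 8000, process B is bound to port 9000, and process A sends a message to process B.
1. When process A first establishes a TCP connection with process B, the `sockops` program in `bpf_contrack.bpf.c` is triggered. It stores the five-tuple `{127.0.0.1, 127.0.0.1, 8000, 9000, TCP}` in `sock_ops_map` with process A's socket as the value. Process B's socket is registered in the same way under the mirrored key `{127.0.0.1, 127.0.0.1, 9000, 8000, TCP}`.
2. When process A sends a message, the `sk_msg` program in `bpf_redirect.bpf.c` is triggered. It builds the key with source and destination swapped, finds process B's socket in `sock_ops_map`, and redirects the message from process A's socket to it. The message is thus delivered directly from process A to process B, bypassing the kernel network stack.
In short, this example uses BPF to quickly redirect messages from the sender's socket to the receiver's socket during local communication, bypassing the kernel network stack to improve transfer efficiency.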
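The key handling in the two programs can be modeled in plain user-space C. In this sketch the `sockops` side registers each endpoint under (local, remote), and the `sk_msg` side looks up (remote, local), so a sender always finds its peer. The names `toy_key`, `toy_register`, and `toy_lookup_peer` are hypothetical, the array stands in for `sock_ops_map`, and byte-order conversion and the `family` field are left out for brevity.

```c
#include <stdint.h>

/* Hypothetical user-space stand-ins for sock_key and sock_ops_map. */
struct toy_key {
    uint32_t sip, dip, sport, dport;   /* family omitted for brevity */
};

struct toy_entry {
    struct toy_key key;
    int sock_id;                       /* stands in for the stored socket */
    int used;
};

#define TOY_MAX 8
static struct toy_entry toy_map[TOY_MAX];

static int toy_key_eq(const struct toy_key *a, const struct toy_key *b)
{
    return a->sip == b->sip && a->dip == b->dip &&
           a->sport == b->sport && a->dport == b->dport;
}

/* sockops side: each endpoint registers itself under (local -> remote). */
static int toy_register(uint32_t local_ip, uint32_t remote_ip,
                        uint32_t local_port, uint32_t remote_port,
                        int sock_id)
{
    for (int i = 0; i < TOY_MAX; i++) {
        if (!toy_map[i].used) {
            toy_map[i].key.sip = local_ip;
            toy_map[i].key.dip = remote_ip;
            toy_map[i].key.sport = local_port;
            toy_map[i].key.dport = remote_port;
            toy_map[i].sock_id = sock_id;
            toy_map[i].used = 1;
            return 0;
        }
    }
    return -1;   /* map full */
}

/* sk_msg side: the sender swaps local and remote, so the key matches the
 * entry its peer registered. Returns the peer's sock_id, or -1. */
static int toy_lookup_peer(uint32_t local_ip, uint32_t remote_ip,
                           uint32_t local_port, uint32_t remote_port)
{
    struct toy_key k = { remote_ip, local_ip, remote_port, local_port };

    for (int i = 0; i < TOY_MAX; i++)
        if (toy_map[i].used && toy_key_eq(&toy_map[i].key, &k))
            return toy_map[i].sock_id;
    return -1;
}
```

For the A (port 8000) and B (port 9000) example, registering both endpoints and then calling `toy_lookup_peer` with A's local and remote values yields B's entry, which matches the direction of the redirect in `bpf_redir`.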
bpf_redirect.bpf.c
```c
#include "bpf_sockmap.h"
char LICENSE[] SEC("license") = "Dual BSD/GPL";
SEC("sk_msg")
int bpf_redir(struct sk_msg_md *msg)
{
if(msg->remote_ip4 != LOCALHOST_IPV4 || msg->local_ip4!= LOCALHOST_IPV4)
return SK_PASS;
struct sock_key key = {
.sip = msg->remote_ip4,
.dip = msg->local_ip4,
.dport = bpf_htonl(msg->local_port), /* convert to network byte order */
.sport = msg->remote_port,
.family = msg->family,
};
return bpf_msg_redirect_hash(msg, &sock_ops_map, &key, BPF_F_INGRESS);
}
```
bpf_contrack.bpf.c
```c
#include "bpf_sockmap.h"
char LICENSE[] SEC("license") = "Dual BSD/GPL";
SEC("sockops")
int bpf_sockops_handler(struct bpf_sock_ops *skops){
u32 family, op;
family = skops->family;
op = skops->op;
if (op != BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
&& op != BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB) {
return BPF_OK;
}
if(skops->remote_ip4 != LOCALHOST_IPV4 || skops->local_ip4!= LOCALHOST_IPV4) {
return BPF_OK;
}
struct sock_key key = {
.dip = skops->remote_ip4,
.sip = skops->local_ip4,
.sport = bpf_htonl(skops->local_port), /* convert to network byte order */
.dport = skops->remote_port,
.family = skops->family,
};
bpf_printk(">>> new connection: OP:%d, PORT:%d --> %d\n", op, bpf_ntohl(key.sport), bpf_ntohl(key.dport));
bpf_sock_hash_update(skops, &sock_ops_map, &key, BPF_NOEXIST);
return BPF_OK;
}
```
### Compiling the eBPF Program
Here we use libbpf to compile the eBPF program. The complete source code and project can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/29-sockops>.
```shell
# Compile the bpf program with libbpf
make
@@ -23,10 +133,50 @@ make
### Loading the eBPF Program
We provide a script that loads the eBPF programs. It automatically loads both eBPF programs and creates a BPF map:
```shell
sudo ./load.sh
```
The script actually performs the following steps:
```sh
#!/bin/bash
set -x
set -e
sudo mount -t bpf bpf /sys/fs/bpf/
# check if old program already loaded
if [ -e "/sys/fs/bpf/bpf_sockops" ]; then
echo ">>> bpf_sockops already loaded, uninstalling..."
./unload.sh
echo ">>> old program already deleted..."
fi
# load and attach sock_ops program
sudo bpftool prog load bpf_contrack.bpf.o /sys/fs/bpf/bpf_sockops type sockops pinmaps /sys/fs/bpf/
sudo bpftool cgroup attach "/sys/fs/cgroup/" sock_ops pinned "/sys/fs/bpf/bpf_sockops"
# load and attach sk_msg program
sudo bpftool prog load bpf_redirect.bpf.o "/sys/fs/bpf/bpf_redir" map name sock_ops_map pinned "/sys/fs/bpf/sock_ops_map"
sudo bpftool prog attach pinned /sys/fs/bpf/bpf_redir msg_verdict pinned /sys/fs/bpf/sock_ops_map
```
This is a loader script for BPF. Its main job is to load the BPF programs into the kernel and attach them, and to pin the associated BPF map in the BPF file system so that the BPF programs can access and operate on it.
Let's look at what each line of the script does.
1. `sudo mount -t bpf bpf /sys/fs/bpf/` mounts the BPF file system so that BPF programs and their maps can be accessed and manipulated by the system.
2. The condition `[ -e "/sys/fs/bpf/bpf_sockops" ]` checks whether the file `/sys/fs/bpf/bpf_sockops` already exists. If it does, the `bpf_sockops` program is already loaded, and it is unloaded with the `./unload.sh` script.
3. `sudo bpftool prog load bpf_contrack.bpf.o /sys/fs/bpf/bpf_sockops type sockops pinmaps /sys/fs/bpf/` loads the BPF object file `bpf_contrack.bpf.o`, compiled from `bpf_contrack.bpf.c` above, pins it at `/sys/fs/bpf/bpf_sockops`, and sets its type to `sockops`. `pinmaps /sys/fs/bpf/` pins the maps used by the loaded program under `/sys/fs/bpf/`.
4. `sudo bpftool cgroup attach "/sys/fs/cgroup/" sock_ops pinned "/sys/fs/bpf/bpf_sockops"` attaches the `bpf_sockops` program pinned in the BPF file system to the cgroup at "/sys/fs/cgroup/". Once attached, socket operations of all sockets belonging to this cgroup are handled by `bpf_sockops`.
5. `sudo bpftool prog load bpf_redirect.bpf.o "/sys/fs/bpf/bpf_redir" map name sock_ops_map pinned "/sys/fs/bpf/sock_ops_map"` loads the BPF object file `bpf_redirect.bpf.o`, compiled from `bpf_redirect.bpf.c`, pins it at `/sys/fs/bpf/bpf_redir`, and tells it to reuse the map named `sock_ops_map`, which is pinned at `/sys/fs/bpf/sock_ops_map`.
6. `sudo bpftool prog attach pinned /sys/fs/bpf/bpf_redir msg_verdict pinned /sys/fs/bpf/sock_ops_map` attaches the loaded `bpf_redir` program to `sock_ops_map` with attach type `msg_verdict`, meaning that `bpf_redir` is invoked when a message is sent on a socket stored in that map.
In summary, the script loads the two BPF programs that handle local socket traffic, attaches each to the right place so that they are invoked correctly, and makes sure they can access the shared BPF map.
You can use the [bpftool utility](https://github.com/torvalds/linux/blob/master/tools/bpf/bpftool/Documentation/bpftool-prog.rst) to check whether the two eBPF programs have been loaded.
```console
@@ -54,6 +204,7 @@ iperf3 -c 127.0.0.1 -t 10 -l 64k -p 5001
### Collecting Traces
View the ``sock_ops`` trace of local connection establishment:
```console
$ ./trace_bpf_output.sh
iperf3-9516 [001] .... 22500.634108: 0: <<< ipv4 op = 4, port 18583 --> 4135
@@ -66,9 +217,9 @@ iperf3-9516 [001] ..s1 22500.634536: 0: <<< ipv4 op = 5, port 4135 --> 19095
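In addition, once ``sk_msg`` is in effect, capturing traffic on the local lo device with tcpdump shows only the three-way handshake and the four-way teardown. The iperf data traffic is not captured. If iperf data traffic is captured, the eBPF program may not be attached correctly.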
```console
$ ./trace_lo_traffic.sh
$ ./trace_lo_traffic.sh # this is essentially sudo cat /sys/kernel/debug/tracing/trace_pipe
# Three-way handshake
13:24:07.181804 IP localhost.46506 > localhost.5001: Flags [S], seq 620239881, win 65495, options [mss 65495,sackOK,TS val 1982813394 ecr 0,nop,wscale 7], length 0
13:24:07.181815 IP localhost.5001 > localhost.46506: Flags [S.], seq 1084484879, ack 620239882, win 65483, options [mss 65495,sackOK,TS val 1982813394 ecr 1982813394,nop,wscale 7], length 0
@@ -79,7 +230,6 @@ $ ./trace_lo_traffic.sh
13:24:12.479621 IP localhost.5001 > localhost.46506: Flags [.], ack 2, win 512, options [nop,nop,TS val 1982818692 ecr 1982818688], length 0
13:24:12.481265 IP localhost.5001 > localhost.46506: Flags [F.], seq 1, ack 2, win 512, options [nop,nop,TS val 1982818694 ecr 1982818688], length 0
13:24:12.481270 IP localhost.46506 > localhost.5001: Flags [.], ack 2, win 512, options [nop,nop,TS val 1982818694 ecr 1982818694], length 0
```
### Unloading the eBPF Program
@@ -90,5 +240,7 @@ sudo ./unload.sh
## References
Finally, if you are interested in eBPF and want to learn and practice further, visit our tutorial code repository <https://github.com/eunomia-bpf/bpf-developer-tutorial> and tutorial website <https://eunomia.dev/zh/tutorials/>.
- <https://github.com/zachidan/ebpf-sockops>
- <https://github.com/merbridge/merbridge>

View File

@@ -1,32 +1,182 @@
# eBPF sockops Example
# eBPF Development Practices: Accelerating Network Request Forwarding with Sockops
## Performance Optimization using eBPF sockops
eBPF (Extended Berkeley Packet Filter) is a powerful feature in the Linux kernel that allows running, loading, and updating user-defined code without the need to modify the kernel source code or reboot the kernel. This capability makes eBPF widely used in various areas such as network and system performance analysis, packet filtering, security policies, etc.
Network connections are essentially communication between sockets. eBPF provides a [bpf_msg_redirect_hash](https://man7.org/linux/man-pages/man7/bpf-helpers.7.html) function, which allows packets sent by applications to be directly forwarded to the destination socket, greatly accelerating the packet processing flow in the kernel.
This tutorial will focus on the application of eBPF in the networking domain, specifically how to use sockops-type eBPF programs to accelerate the forwarding of local network requests. This application is often valuable in scenarios where software load balancers are used for request forwarding, such as using tools like Nginx or HAProxy.
Here, sock_map is the crucial part that records socket rules, i.e., based on the current packet information, a socket connection is selected from sock_map to forward the request. Therefore, it is necessary to save the socket information to sock_map at the hook of sockops or elsewhere, and provide a key-based rule (generally a four-tuple) to find the socket.
In many workloads, such as inter-service communication in a microservices architecture, the performance overhead of network requests made through the loopback interface can significantly impact the overall application performance. Since these requests have to go through the local network stack, their processing performance can become a bottleneck, especially in high-concurrency scenarios. To address this issue, sockops-type eBPF programs can be used to accelerate local request forwarding, providing functionality similar to direct memory access (DMA). Sockops programs can manage sockets in the kernel space and directly forward packets between sockets on the local machine, reducing the CPU time required for packet forwarding in the TCP/IP stack.
The Merbridge project uses eBPF instead of iptables to accelerate Istio. After using Merbridge (eBPF) optimization, inbound and outbound traffic will bypass many kernel modules, significantly improving performance, as shown in the figure below:
This tutorial demonstrates, through a concrete example, how to use sockops-type eBPF programs to accelerate network request forwarding. We walk through the code of the example program step by step and explain how each part works. The complete source code and project can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/29-sockops>.
## Leveraging eBPF Sockops for Performance Optimization
Network connections are essentially communication between sockets, and eBPF provides a `bpf_msg_redirect_hash` function that allows packets sent by an application to be directly forwarded to the corresponding socket on the recipient side, greatly accelerating the packet processing flow in the kernel.
Here, `sock_map` is the key component: it stores sockets together with the key used to find them, so that an existing socket can be selected based on the information in the current packet. The socket therefore has to be saved into `sock_map` from a sockops hook (or elsewhere), along with a key (usually a four-tuple) that can later be used to look it up.
The Merbridge project has achieved acceleration for Istio by replacing iptables with eBPF. After using Merbridge (eBPF) optimization, the inbound and outbound traffic bypasses many kernel modules, significantly improving performance, as shown in the following diagram:
![merbridge](merbridge.png)
## Running the Example
## Example Program
This example program redirects traffic from the sender's socket (outbound) to the receiver's socket (inbound), **bypassing the TCP/IP kernel network stack**. In this example, we assume that the sender and receiver are running on the **same machine**.
This example program redirects traffic from the sender's socket (outgoing) to the recipient's socket (incoming), bypassing the TCP/IP kernel network stack. In this example, we assume that the sender and recipient are both running on the **same** machine. The example consists of two BPF programs that share a map definition:
bpf_sockmap.h
```c
#include "vmlinux.h"
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>
#define LOCALHOST_IPV4 16777343
struct sock_key {
__u32 sip;
__u32 dip;
__u32 sport;
__u32 dport;
__u32 family;
};
struct {
__uint(type, BPF_MAP_TYPE_SOCKHASH);
__uint(max_entries, 65535);
__type(key, struct sock_key);
__type(value, int);
} sock_ops_map SEC(".maps");
```
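The `LOCALHOST_IPV4` constant in the header above is 127.0.0.1 as it looks when the four address bytes, kept in network byte order, are read as a 32-bit integer on a little-endian machine. The following is a minimal user-space sketch in plain C, not part of the tutorial's sources; `ipv4_net_order` is a hypothetical helper name introduced only for illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: parse a dotted-quad IPv4 string and return the
 * address as a 32-bit value in network byte order, which is the form the
 * BPF programs compare against LOCALHOST_IPV4.
 * Returns 0 on success, -1 if the string is not a valid IPv4 address.
 */
static int ipv4_net_order(const char *dotted, uint32_t *out)
{
    struct in_addr addr;

    if (inet_pton(AF_INET, dotted, &addr) != 1)
        return -1;
    *out = addr.s_addr; /* already in network byte order */
    return 0;
}
```

On a little-endian host such as x86-64, calling this helper with `"127.0.0.1"` yields 16777343 (0x0100007F), the value hard-coded in the header. On a big-endian host the same bytes would read as a different integer, so the constant as written assumes a little-endian target.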
The BPF program in this example is divided into two parts: `bpf_redirect.bpf.c` and `bpf_contrack.bpf.c`.
- The BPF code in `bpf_contrack.bpf.c` defines a socket operations (`sockops`) program. For each new TCP connection established on the local machine over localhost, it adds the socket to the `sock_ops_map` BPF map under a key built from the connection's source address, destination address, source port, destination port, and address family (the fields of `struct sock_key`). The map has type `BPF_MAP_TYPE_SOCKHASH`, so it stores sockets indexed by that key, which lets any local TCP connection be looked up later.
- The BPF code in `bpf_redirect.bpf.c` defines an `sk_msg` program that is called when a socket stored in the map sends a message. It checks whether both endpoints are local addresses, and if so it builds a key from the message metadata with source and destination swapped, so that the key identifies the peer's socket, and looks that socket up in `sock_ops_map`. It then redirects the message to the socket it found, bypassing the kernel network stack and delivering the message directly from the sender's socket to the receiver's socket.
For example, let's assume that there are two processes running locally, process A binds to port 8000, and process B binds to port 9000. Process A sends a message to process B.
1. When the TCP connection between process A and process B is established, the `sockops` program in `bpf_contrack.bpf.c` is triggered for each end of the connection. For process A's end it creates an entry in `sock_ops_map` keyed by `{127.0.0.1, 127.0.0.1, 8000, 9000, AF_INET}` whose value is process A's socket; process B's end gets the mirrored entry `{127.0.0.1, 127.0.0.1, 9000, 8000, AF_INET}`.
2. When process A sends a message, the `sk_msg` program in `bpf_redirect.bpf.c` is triggered. It builds the mirrored key from the message metadata, finds process B's socket in `sock_ops_map`, and redirects the message to that socket's ingress path. As a result, the message is delivered directly from process A to process B, bypassing the kernel network stack.
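The way the two programs agree on a key can be checked in user space. The sketch below is plain C and only mirrors the field assignments from the two listings; `struct sock_view` and all function names are hypothetical, and it assumes `remote_port` holds the port as a 32-bit value in network byte order, which is how the listings treat it.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Same field layout as struct sock_key in bpf_sockmap.h. */
struct sock_key {
    uint32_t sip, dip, sport, dport, family;
};

/* Hypothetical stand-in for the per-socket fields both BPF contexts expose. */
struct sock_view {
    uint32_t local_ip4;   /* network byte order */
    uint32_t remote_ip4;  /* network byte order */
    uint32_t local_port;  /* host byte order */
    uint32_t remote_port; /* network byte order (assumed) */
    uint32_t family;
};

/* Describe one end of a loopback TCP connection. */
static struct sock_view loopback_view(uint16_t local_port, uint16_t remote_port)
{
    struct sock_view v = {
        .local_ip4 = htonl(INADDR_LOOPBACK),
        .remote_ip4 = htonl(INADDR_LOOPBACK),
        .local_port = local_port,
        .remote_port = htonl(remote_port),
        .family = AF_INET,
    };
    return v;
}

/* Key under which the sockops program stores this socket. */
static struct sock_key key_stored_by_sockops(const struct sock_view *v)
{
    struct sock_key k = {
        .sip = v->local_ip4,
        .dip = v->remote_ip4,
        .sport = htonl(v->local_port),
        .dport = v->remote_port,
        .family = v->family,
    };
    return k;
}

/* Key the sk_msg program looks up when this socket sends a message. */
static struct sock_key key_used_by_sk_msg(const struct sock_view *v)
{
    struct sock_key k = {
        .sip = v->remote_ip4,
        .dip = v->local_ip4,
        .sport = v->remote_port,
        .dport = htonl(v->local_port),
        .family = v->family,
    };
    return k;
}

/* Field-by-field comparison of two keys. */
static int same_key(const struct sock_key *a, const struct sock_key *b)
{
    return a->sip == b->sip && a->dip == b->dip && a->sport == b->sport &&
           a->dport == b->dport && a->family == b->family;
}
```

With process A on port 8000 and process B on port 9000, the key that `key_used_by_sk_msg` builds for A's socket equals the key that `key_stored_by_sockops` builds for B's socket, and differs from A's own stored key. That is why the lookup returns the peer's socket rather than the sender's.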
This example program uses BPF to efficiently redirect messages from the sender's socket to the recipient's socket during local communication, bypassing the kernel network stack to improve transmission efficiency.
bpf_redirect.bpf.c
```c
#include "bpf_sockmap.h"
char LICENSE[] SEC("license") = "Dual BSD/GPL";
SEC("sk_msg")
int bpf_redir(struct sk_msg_md *msg)
{
if(msg->remote_ip4 != LOCALHOST_IPV4 || msg->local_ip4!= LOCALHOST_IPV4)
return SK_PASS;
struct sock_key key = {
.sip = msg->remote_ip4,
.dip = msg->local_ip4,
.dport = bpf_htonl(msg->local_port), /* convert to network byte order */
.sport = msg->remote_port,
.family = msg->family,
};
return bpf_msg_redirect_hash(msg, &sock_ops_map, &key, BPF_F_INGRESS);
}
```
bpf_contrack.bpf.c
```c
#include "bpf_sockmap.h"
char LICENSE[] SEC("license") = "Dual BSD/GPL";
SEC("sockops")
int bpf_sockops_handler(struct bpf_sock_ops *skops){
u32 family, op;
family = skops->family;
op = skops->op;
if (op != BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
&& op != BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB) {
return BPF_OK;
}
if(skops->remote_ip4 != LOCALHOST_IPV4 || skops->local_ip4!= LOCALHOST_IPV4) {
return BPF_OK;
}
struct sock_key key = {
.dip = skops->remote_ip4,
.sip = skops->local_ip4,
.sport = bpf_htonl(skops->local_port), /* convert to network byte order */
.dport = skops->remote_port,
.family = skops->family,
};
bpf_printk(">>> new connection: OP:%d, PORT:%d --> %d\n", op, bpf_ntohl(key.sport), bpf_ntohl(key.dport));
bpf_sock_hash_update(skops, &sock_ops_map, &key, BPF_NOEXIST);
return BPF_OK;
}
```
### Compiling the eBPF Program
Here, we use libbpf to compile the eBPF program. The complete source code and project can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/29-sockops>.
```shell
# Compile the bpf_sockops program
# Compile the bpf program with libbpf
make
```
### Loading the eBPF Program
We have created a script to load the eBPF program, which will automatically load both eBPF programs and create a BPF map:
```shell
sudo ./load.sh
```
This script actually performs the following operations:
```sh
#!/bin/bash
set -x
set -e
sudo mount -t bpf bpf /sys/fs/bpf/
# check if old program already loaded
if [ -e "/sys/fs/bpf/bpf_sockops" ]; then
echo ">>> bpf_sockops already loaded, uninstalling..."
./unload.sh
echo ">>> old program already deleted..."
fi
# load and attach sock_ops program
sudo bpftool prog load bpf_contrack.bpf.o /sys/fs/bpf/bpf_sockops type sockops pinmaps /sys/fs/bpf/
sudo bpftool cgroup attach "/sys/fs/cgroup/" sock_ops pinned "/sys/fs/bpf/bpf_sockops"
# load and attach sk_msg program
sudo bpftool prog load bpf_redirect.bpf.o "/sys/fs/bpf/bpf_redir" map name sock_ops_map pinned "/sys/fs/bpf/sock_ops_map"
sudo bpftool prog attach pinned /sys/fs/bpf/bpf_redir msg_verdict pinned /sys/fs/bpf/sock_ops_map
```
This is a script for loading BPF programs. Its main function is to load and attach BPF programs to the kernel system, and store the associated BPF maps in the BPF file system so that the BPF programs can access and operate on these maps.
Let's take a detailed look at what each line of the script does.
1. `sudo mount -t bpf bpf /sys/fs/bpf/` mounts the BPF file system, enabling access to and operation on BPF programs and related maps by the system.
2. The condition check `[ -e "/sys/fs/bpf/bpf_sockops" ]` checks whether the `/sys/fs/bpf/bpf_sockops` file already exists. If it does exist, it means that the `bpf_sockops` program has already been loaded into the system, so it will be uninstalled using the `./unload.sh` script.
3. `sudo bpftool prog load bpf_contrack.bpf.o /sys/fs/bpf/bpf_sockops type sockops pinmaps /sys/fs/bpf/` loads the BPF object file `bpf_contrack.bpf.o` compiled from the `bpf_contrack.bpf.c` into the BPF file system, storing it in `/sys/fs/bpf/bpf_sockops`, and specifying its type as `sockops`. `pinmaps /sys/fs/bpf/` specifies that the BPF maps associated with the loaded BPF program will be stored under `/sys/fs/bpf/`.
4. `sudo bpftool cgroup attach "/sys/fs/cgroup/" sock_ops pinned "/sys/fs/bpf/bpf_sockops"` attaches the `bpf_sockops` program that has been loaded into the BPF file system to the cgroup (the path is `"/sys/fs/cgroup/"`). After the attachment, all socket operations belonging to this cgroup will be affected by the `bpf_sockops` program.
5. `sudo bpftool prog load bpf_redirect.bpf.o "/sys/fs/bpf/bpf_redir" map name sock_ops_map pinned "/sys/fs/bpf/sock_ops_map"` loads the BPF object file `bpf_redirect.bpf.o` compiled from `bpf_redirect.bpf.c` and pins the program at `/sys/fs/bpf/bpf_redir`. The `map name sock_ops_map pinned ...` part tells bpftool to reuse the map already pinned at `/sys/fs/bpf/sock_ops_map` for the program's `sock_ops_map` instead of creating a new one, so both programs share the same map.
6. `sudo bpftool prog attach pinned /sys/fs/bpf/bpf_redir msg_verdict pinned /sys/fs/bpf/sock_ops_map` attaches the already loaded `bpf_redir` program to `sock_ops_map` using the `msg_verdict` attach type, which means that when a socket stored in this map sends a message, the `bpf_redir` program is called to decide how to handle it.
In summary, the main function of this script is to load the two BPF programs used to process local socket traffic into the system and attach them to the correct locations so that they can be correctly called, ensuring that they can access and manipulate the associated BPF maps.
You can use the [bpftool utility](https://github.com/torvalds/linux/blob/master/tools/bpf/bpftool/Documentation/bpftool-prog.rst) to check if these two eBPF programs have been loaded.
```console
@@ -39,13 +189,13 @@ $ sudo bpftool prog show
xlated 304B jited 233B memlock 4096B map_ids 58
```
### Running the [iperf3](https://iperf.fr/) Server
### Running the iperf3 Server
```shell
iperf3 -s -p 5001
```
### Running the [iperf3](https://iperf.fr/) Client
### Running the iperf3 Client
```shell
iperf3 -c 127.0.0.1 -t 10 -l 64k -p 5001
@@ -53,7 +203,8 @@ iperf3 -c 127.0.0.1 -t 10 -l 64k -p 5001
### Collecting Traces
Show connection setup tracing for localhost connection.
Check the `sock_ops` trace for local connection establishments.
```console
$ ./trace_bpf_output.sh
iperf3-9516 [001] .... 22500.634108: 0: <<< ipv4 op = 4, port 18583 --> 4135
@@ -62,25 +213,25 @@ iperf3-9516 [001] .... 22500.634523: 0: <<< ipv4 op = 4, port 19095 --> 4135
iperf3-9516 [001] ..s1 22500.634536: 0: <<< ipv4 op = 5, port 4135 --> 19095
```
When ``iperf3 -c`` creates a connection, you should see the above events for socket setup. If you do not see any events, then the eBPF program may not have been properly attached.
When the connection is established between `iperf3 -c` and the server, you should see the events above for socket establishment. If you don't see any events, then the eBPF programs may not have been attached correctly.
In addition, when ``sk_msg`` is enabled, it can be observed that when using tcpdump to capture the traffic on the local lo device, only the three-way handshake and the four-way handshake traffic are captured, but the iperf data traffic is not captured. If the iperf data traffic is captured, then the eBPF program may not have been properly attached.
Furthermore, when `sk_msg` takes effect, you should observe that when capturing local traffic on the loopback interface using tcpdump, only the three-way handshake and four-way termination traffic are captured, and the actual data flow of iperf is not captured. If the iperf data flow is captured, then the eBPF programs may not have been attached correctly.
```console
$ ./trace_lo_traffic.sh
# three-way handshake
$ ./trace_lo_traffic.sh # captures traffic on the loopback interface with tcpdump
# Three-way handshake
13:24:07.181804 IP localhost.46506 > localhost.5001: Flags [S], seq 620239881, win 65495, options [mss 65495,sackOK,TS val 1982813394 ecr 0,nop,wscale 7], length 0
13:24:07.181815 IP localhost.5001 > localhost.46506: Flags [S.], seq 1084484879, ack 620239882, win 65483, options [mss 65495,sackOK,TS val 1982813394 ecr 1982813394,nop,wscale 7], length 0
13:24:07.181832 IP localhost.46506 > localhost.5001: Flags [.], ack 1, win 512, options [nop,nop,TS val 1982813394 ecr 1982813394], length 0
# four-way handshake traffic
# Four-way termination
13:24:12.475649 IP localhost.46506 > localhost.5001: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 1982818688 ecr 1982813394], length 0
13:24:12.479621 IP localhost.5001 > localhost.46506: Flags [.], ack 2, win 512, options [nop,nop,TS val 1982818692 ecr 1982818688], length 0
13:24:12.481265 IP localhost.5001 > localhost.46506: Flags [F.], seq 1, ack 2, win 512, options [nop,nop,TS val 1982818694 ecr 1982818688], length 0
13:24:12.481270 IP localhost.46506 > localhost.5001: Flags [.], ack 2, win 512, options [nop,nop,TS val 1982818694 ecr 1982818694], length 0
```
### Unloading the eBPF Program
```shell
@@ -89,4 +240,7 @@ sudo ./unload.sh
## References
- <https://github.com/zachidan/ebpf-sockops>- <https://github.com/merbridge/merbridge>
Finally, if you are interested in eBPF technology and want to learn more and practice further, you can visit our tutorial code repository at <https://github.com/eunomia-bpf/bpf-developer-tutorial> and the tutorial website at <https://eunomia.dev/>.
- <https://github.com/zachidan/ebpf-sockops>
- <https://github.com/merbridge/merbridge>

View File

@@ -4,7 +4,7 @@
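But now there is a new solution. eBPF's ability to probe in user space provides a way to recover the plaintext data, letting us view the content of communications before it is encrypted. However, each application may use a different library, and each library has multiple versions, and this diversity makes tracing more complex.
In this tutorial, we walk you through an eBPF tracing technique that works across multiple user-space SSL/TLS libraries. It can trace user-space libraries such as GnuTLS and OpenSSL at the same time, and it greatly reduces the maintenance work needed for new library versions compared with earlier approaches.
In this tutorial, we walk you through an eBPF tracing technique that works across multiple user-space SSL/TLS libraries. It can trace user-space libraries such as GnuTLS and OpenSSL at the same time, and it greatly reduces the maintenance work needed for new library versions compared with earlier approaches. The complete source code can be found here: <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/30-sslsniff>.
## Background Knowledge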

View File

@@ -4,7 +4,7 @@ With the widespread use of TLS in modern network environments, tracing microserv
However, a new solution is now available. Through the use of eBPF technology and its capability to perform probing in user space, a method has emerged to regain plain text data, allowing us to intuitively view the pre-encrypted communication content. Nevertheless, each application might utilize different libraries, and each library comes in multiple versions, introducing complexity to the tracking process.
In this tutorial, we will guide you through an eBPF tracing technique that spans across various user-space SSL/TLS libraries. This technique not only allows simultaneous tracing of user-space libraries like GnuTLS and OpenSSL but also significantly reduces maintenance efforts for new library versions compared to previous methods.
In this tutorial, we will guide you through an eBPF tracing technique that spans across various user-space SSL/TLS libraries. This technique not only allows simultaneous tracing of user-space libraries like GnuTLS and OpenSSL but also significantly reduces maintenance efforts for new library versions compared to previous methods. The complete source code for this tutorial can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/30-sslsniff>.
## Background Knowledge

View File

@@ -1,20 +1,201 @@
# ebpf modify syscall parameters
# eBPF Development Practice: Modifying System Call Arguments with eBPF
## modify open filename
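eBPF (Extended Berkeley Packet Filter) is a powerful feature in the Linux kernel that allows user-defined code to be run, loaded, and updated without changing the kernel source code or rebooting the kernel. This capability has led to wide use of eBPF in network and system performance analysis, packet filtering, security policies, and more.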
```bash
make
./victim
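This tutorial shows how to use eBPF to modify the arguments of a system call that is in progress. The technique can be used for security auditing, system monitoring, or even malicious behavior. Note, however, that tampering with system call arguments can harm system stability and security, so it must be used with caution. The implementation relies on eBPF's `bpf_probe_write_user` helper, which can modify user-space memory and can therefore be used to change system call arguments to the values we want before the kernel reads them from user space.
The complete code for this article can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/34-syscall/>.
## Modifying the File Name of the `open` System Call
This functionality modifies the arguments of the `openat` system call so that it opens a different file. It may be useful for:
1. **File access auditing**: In environments with strict legal compliance and data security requirements, auditors may need to record every access to sensitive files. By modifying the `openat` system call arguments, all attempts to access a given sensitive file can be redirected to a backup file or a log file.
2. **Secure sandbox**: In the early stages of development, you may want to monitor the files an application tries to open. By changing the `openat` calls, the application can run in a secure sandbox where all file operations are redirected to an isolated file system path.
3. **Sensitive data protection**: For files that store sensitive information, such as a configuration file containing database passwords, an eBPF-based system can redirect these calls to an encrypted or temporary location to improve data security.
If this technique is exploited by malware, an attacker can redirect file operations, causing data leaks or compromising data integrity. For example, when a program writes to a log file, the attacker could redirect the data to a file under the attacker's control, disrupting the audit trail.
Kernel-space code (partial; see the bpf-developer-tutorial repository on GitHub for the complete code):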
```c
SEC("tracepoint/syscalls/sys_enter_openat")
int tracepoint__syscalls__sys_enter_openat(struct trace_event_raw_sys_enter *ctx)
{
u64 pid = bpf_get_current_pid_tgid() >> 32;
/* use kernel terminology here for tgid/pid: */
if (target_pid && pid != target_pid) {
return 0;
}
/* store arg info for later lookup */
// since we can manually specify the attach process in userspace,
// we don't need to check the process allowed here
struct args_t args = {};
args.fname = (const char *)ctx->args[1];
args.flags = (int)ctx->args[2];
if (rewrite) {
bpf_probe_write_user((char*)ctx->args[1], "hijacked", 9);
}
bpf_map_update_elem(&start, &pid, &args, 0);
return 0;
}
```
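Analysis of the kernel-space code:
- `bpf_get_current_pid_tgid()` gets the current process ID.
- If `target_pid` is specified and does not match the current process ID, the function returns immediately.
- We create an `args_t` structure to store the file name and flags.
- We use `bpf_probe_write_user` to change the file name in user-space memory to "hijacked".
eunomia-bpf is an open-source eBPF dynamic loading runtime and development toolchain that aims to simplify developing, building, distributing, and running eBPF programs. See <https://github.com/eunomia-bpf/eunomia-bpf> or <https://eunomia.dev/tutorials/1-helloworld/> to download and install the ecc compiler toolchain and the ecli runtime. We use eunomia-bpf to compile and run this example.
Compile: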
```bash
sudo ./ecli run package.json -- --rewrite --target_pid=$(pidof victim)
./ecc open_modify.bpf.c open_modify.h
```
## modify exec commands
Build a simple victim program with make for testing:
TODO
```c
int main()
{
char filename[100] = "my_test.txt";
// print pid
int pid = getpid();
std::cout << "current pid: " << pid << std::endl;
system("echo \"hello\" > my_test.txt");
system("echo \"world\" >> hijacked");
while (true) {
std::cout << "Opening my_test.txt" << std::endl;
## reference
int fd = open(filename, O_RDONLY);
assert(fd != -1);
- <https://github.com/pathtofile/bad-bpf/blob/main/src/exechijack.bpf.c>
std::cout << "test.txt opened, fd=" << fd << std::endl;
usleep(1000 * 300);
// print the file content
char buf[100] = {0};
int ret = read(fd, buf, 5);
std::cout << "read " << ret << " bytes: " << buf << std::endl;
std::cout << "Closing test.txt..." << std::endl;
close(fd);
std::cout << "test.txt closed" << std::endl;
}
return 0;
}
```
Compile and run the test code:
```sh
$ ./victim
test.txt opened, fd=3
read 5 bytes: hello
Closing test.txt...
test.txt closed
```
Use the following command to specify the ID of the target process whose `openat` system call arguments should be modified:
```bash
sudo ./ecli run package.json --rewrite --target_pid=$(pidof victim)
```
The output then changes to "world": we intended to open the "my_test.txt" file, but the call was hijacked and the "hijacked" file was opened instead:
```console
test.txt opened, fd=3
read 5 bytes: hello
Closing test.txt...
test.txt closed
Opening my_test.txt
test.txt opened, fd=3
read 5 bytes: world
Closing test.txt...
test.txt closed
Opening my_test.txt
test.txt opened, fd=3
read 5 bytes: world
```
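The complete code, including the test case, can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial>.
## Modifying the Process Name of bash `execve`
This functionality modifies the name of the program being executed when the `execve` system call is made. In some auditing or monitoring scenarios, it may be used to record the behavior of specific processes or to change that behavior. However, this kind of tampering can cause confusion and make it hard for users or administrators to determine which program the system is actually running. The most serious risk is that a malicious user who controls the eBPF program could redirect legitimate system commands to malware, creating a severe security threat.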
```c
SEC("tp/syscalls/sys_enter_execve")
int handle_execve_enter(struct trace_event_raw_sys_enter *ctx)
{
size_t pid_tgid = bpf_get_current_pid_tgid();
// Check if we're a process of interest
if (target_ppid != 0) {
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
int ppid = BPF_CORE_READ(task, real_parent, tgid);
if (ppid != target_ppid) {
return 0;
}
}
// Read in program from first arg of execve
char prog_name[TASK_COMM_LEN];
char prog_name_orig[TASK_COMM_LEN];
__builtin_memset(prog_name, '\x00', TASK_COMM_LEN);
bpf_probe_read_user(&prog_name, TASK_COMM_LEN, (void*)ctx->args[0]);
bpf_probe_read_user(&prog_name_orig, TASK_COMM_LEN, (void*)ctx->args[0]);
prog_name[TASK_COMM_LEN-1] = '\x00';
bpf_printk("[EXECVE_HIJACK] %s\n", prog_name);
// Program can't be shorter than our two-char name
if (prog_name[1] == '\x00') {
bpf_printk("[EXECVE_HIJACK] program name too small\n");
return 0;
}
// Attempt to overwrite with hijacked binary path
prog_name[0] = '/';
prog_name[1] = 'a';
for (int i = 2; i < TASK_COMM_LEN ; i++) {
prog_name[i] = '\x00';
}
long ret = bpf_probe_write_user((void*)ctx->args[0], &prog_name, 3);
// Send an event
struct event *e;
e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
if (e) {
e->success = (ret == 0);
e->pid = (pid_tgid >> 32);
for (int i = 0; i < TASK_COMM_LEN; i++) {
e->comm[i] = prog_name_orig[i];
}
bpf_ringbuf_submit(e, 0);
}
return 0;
}
```
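Analysis of the kernel-space code:
- Call `bpf_get_current_pid_tgid` to get a 64-bit value that packs the process ID (tgid) in the upper 32 bits and the thread ID in the lower 32 bits.
- If `target_ppid` is set, the code checks whether the parent process ID of the current process matches.
- Read the first `execve` argument into `prog_name`; this is usually the path of the program to be executed.
- Rewrite this argument with `bpf_probe_write_user` so that the system actually executes a different program.
The risk of this approach is that it can be used to hijack software behavior and make the system run malicious code. The example can likewise be compiled and run with ecc and ecli: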
```bash
./ecc exechijack.bpf.c exechijack.h
sudo ./ecli run package.json
```
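## Conclusion
eBPF provides powerful capabilities for monitoring and intervening in a running system in real time. Combined with appropriate governance and security policies, this can bring many benefits, such as stronger security, performance optimization, and easier operations. However, the technology must be used with great care, because mistakes or misuse can disrupt normal system operation or cause serious security incidents. In practice, make sure that only authorized users and programs can deploy and manage eBPF programs, and validate their behavior in an isolated test environment, applying them in production only once their impact is fully understood.
You can also visit our tutorial code repository <https://github.com/eunomia-bpf/bpf-developer-tutorial> or website <https://eunomia.dev/zh/tutorials/> for more examples and complete tutorials.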

199
src/34-syscall/README_en.md Normal file
View File

@@ -0,0 +1,199 @@
# eBPF Development Practice: Modifying System Call Arguments with eBPF
eBPF (Extended Berkeley Packet Filter) is a powerful feature in the Linux kernel that allows user-defined code to be run, loaded, and updated without the need to modify kernel source code or reboot the kernel. This functionality has enabled a wide range of applications for eBPF, such as network and system performance analysis, packet filtering, and security policies.
In this tutorial, we will explore how to use eBPF to modify the arguments of a system call that is in progress. This technique can be used for security auditing, system monitoring, or even malicious behavior. However, modifying system call arguments can harm system stability and security, so it must be used with caution. To implement this functionality, we use the `bpf_probe_write_user` helper, which modifies user-space memory and therefore lets us change system call arguments before the kernel reads them from user space.
The complete code for this tutorial can be found in the <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/34-syscall/> repository on GitHub.
## Modifying the File Name of the `open` System Call
This functionality is used to modify the arguments of the `openat` system call to open a different file. This technique can be useful for:
1. **File Access Auditing**: In environments with strict legal and data security requirements, auditors may need to record access to sensitive files. By modifying the `openat` system call arguments, all attempts to access a specific sensitive file can be redirected to a backup file or a log file.
2. **Secure Sandbox**: In the early stages of development, it may be desirable to monitor the files accessed by an application. By changing the `openat` calls, the application can be run in a secure sandbox environment where all file operations are redirected to an isolated file system path.
3. **Sensitive Data Protection**: For files containing sensitive information, such as a configuration file that contains database passwords, an eBPF-based system can redirect those calls to an encrypted or temporary location to enhance data security.
If leveraged by malicious software, this technique can be used to redirect file operations, resulting in data leaks or compromised data integrity. For example, when a program writes to a log file, an attacker could redirect the data to a file under the attacker's control, disrupting the audit trail.
Kernel code (partial; see the bpf-developer-tutorial repository on GitHub for the complete code):
```c
SEC("tracepoint/syscalls/sys_enter_openat")
int tracepoint__syscalls__sys_enter_openat(struct trace_event_raw_sys_enter *ctx)
{
u64 pid = bpf_get_current_pid_tgid() >> 32;
/* use kernel terminology here for tgid/pid: */
if (target_pid && pid != target_pid) {
return 0;
}
/* store arg info for later lookup */
// since we can manually specify the attach process in userspace,
// we don't need to check the process allowed here
struct args_t args = {};
args.fname = (const char *)ctx->args[1];
args.flags = (int)ctx->args[2];
if (rewrite) {
bpf_probe_write_user((char*)ctx->args[1], "hijacked", 9);
}
bpf_map_update_elem(&start, &pid, &args, 0);
return 0;
}
```
Analysis of the kernel code:
- `bpf_get_current_pid_tgid()` returns a 64-bit value whose upper 32 bits hold the process ID (the kernel's tgid); shifting it right by 32 extracts that ID.
- If `target_pid` is specified and does not match the current process ID, the function returns 0 and does not execute further.
- We create an `args_t` structure to store the file name and flags.
- We use `bpf_probe_write_user` to modify the file name in the user space memory to "hijacked".
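The effect of the 9-byte write on the caller's buffer can be reproduced in user space. The sketch below is plain C rather than BPF code, and `overwrite_path` is a hypothetical helper in which `memcpy` stands in for `bpf_probe_write_user`. It shows why the length is 9: the eight characters of "hijacked" plus the terminating NUL, which cuts the string off even though leftover bytes of the old name remain in the buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `replacement`, including its terminating NUL,
 * over the start of `buf`.
 * Returns 0 on success, or -1 (leaving buf untouched) if the replacement
 * plus NUL does not fit in buf_len bytes.
 */
static int overwrite_path(char *buf, size_t buf_len, const char *replacement)
{
    size_t n = strlen(replacement) + 1; /* 9 for "hijacked" */

    if (n > buf_len)
        return -1;
    memcpy(buf, replacement, n); /* stands in for bpf_probe_write_user */
    return 0;
}
```

Applied to a 100-byte buffer holding "my_test.txt", the buffer afterwards reads as "hijacked", while the bytes after the new NUL (the trailing "xt") are still present but ignored. The BPF program has no equivalent of the `buf_len` check, so the rewrite is only safe when the caller's buffer is at least 9 bytes long, as it is in the victim program.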
`eunomia-bpf` is an open-source eBPF dynamic loading runtime and development toolchain that aims to make developing, building, distributing, and running eBPF programs easier. See <https://github.com/eunomia-bpf/eunomia-bpf> or <https://eunomia.dev/tutorials/1-helloworld/> to install the ecc compiler toolchain and the ecli runtime. We will use `eunomia-bpf` to compile and run this example.
Compile the code:
```bash
./ecc open_modify.bpf.c open_modify.h
```
Build a simple victim program using make for testing:
```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>

#include <fcntl.h>
#include <unistd.h>

int main()
{
    char filename[100] = "my_test.txt";
    // print pid
    int pid = getpid();
    std::cout << "current pid: " << pid << std::endl;
    // create the file we intend to open and the file we get redirected to
    system("echo \"hello\" > my_test.txt");
    system("echo \"world\" >> hijacked");
    while (true) {
        std::cout << "Opening my_test.txt" << std::endl;

        int fd = open(filename, O_RDONLY);
        assert(fd != -1);

        std::cout << "test.txt opened, fd=" << fd << std::endl;
        usleep(1000 * 300);
        // print the file content
        char buf[100] = {0};
        int ret = read(fd, buf, 5);
        std::cout << "read " << ret << " bytes: " << buf << std::endl;
        std::cout << "Closing test.txt..." << std::endl;
        close(fd);
        std::cout << "test.txt closed" << std::endl;
    }
    return 0;
}
```
Compile and run the test code:
```sh
$ ./victim
test.txt opened, fd=3
read 5 bytes: hello
Closing test.txt...
test.txt closed
```
Use the following command to specify the target process ID to modify the `openat` system call arguments:
```bash
sudo ./ecli run package.json --rewrite --target_pid=$(pidof victim)
```
You will see that the output changes to "world". Instead of opening the "my_test.txt" file, it opens the "hijacked" file:
```console
test.txt opened, fd=3
read 5 bytes: hello
Closing test.txt...
test.txt closed
Opening my_test.txt
test.txt opened, fd=3
read 5 bytes: world
Closing test.txt...
test.txt closed
Opening my_test.txt
test.txt opened, fd=3
read 5 bytes: world
```
The complete code with test cases can be found in the <https://github.com/eunomia-bpf/bpf-developer-tutorial> repository.
## Modifying the Process Name of bash `execve`
This functionality is used to modify the program name when the `execve` system call is made. In certain auditing or monitoring scenarios, this may be used to track the behavior of specific processes or modify their behavior. However, such modifications can lead to confusion and make it difficult for users or administrators to determine the actual program being executed by the system. The most serious risk is that if malicious users are able to control the eBPF program, they could redirect legitimate system commands to malicious software, resulting in a significant security threat.
```c
SEC("tp/syscalls/sys_enter_execve")
int handle_execve_enter(struct trace_event_raw_sys_enter *ctx)
{
size_t pid_tgid = bpf_get_current_pid_tgid();
// Check if we're a process of interest
if (target_ppid != 0) {
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
int ppid = BPF_CORE_READ(task, real_parent, tgid);
if (ppid != target_ppid) {
return 0;
}
}
// Read in program from first arg of execve
char prog_name[TASK_COMM_LEN];
char prog_name_orig[TASK_COMM_LEN];
__builtin_memset(prog_name, '\x00', TASK_COMM_LEN);
bpf_probe_read_user(&prog_name, TASK_COMM_LEN, (void*)ctx->args[0]);
bpf_probe_read_user(&prog_name_orig, TASK_COMM_LEN, (void*)ctx->args[0]);
prog_name[TASK_COMM_LEN-1] = '\x00';
bpf_printk("[EXECVE_HIJACK] %s\n", prog_name);
// Program can't be shorter than our two-char name
if (prog_name[1] == '\x00') {
bpf_printk("[EXECVE_HIJACK] program name too small\n");
return 0;
}
// Attempt to overwrite with hijacked binary path
prog_name[0] = '/';
prog_name[1] = 'a';
for (int i = 2; i < TASK_COMM_LEN ; i++) {
prog_name[i] = '\x00';
}
long ret = bpf_probe_write_user((void*)ctx->args[0], &prog_name, 3);
// Send an event
struct event *e;
e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
if (e) {
e->success = (ret == 0);
e->pid = (pid_tgid >> 32);
for (int i = 0; i < TASK_COMM_LEN; i++) {
e->comm[i] = prog_name_orig[i];
}
bpf_ringbuf_submit(e, 0);
}
return 0;
}
```
Analysis of the kernel code:
- Call `bpf_get_current_pid_tgid` to get a 64-bit value that packs the process ID (tgid) in the upper 32 bits and the thread ID in the lower 32 bits.
- If `target_ppid` is set, the code checks if the current process's parent process ID matches.
- Read the program name from the first argument of `execve`.
- Use `bpf_probe_write_user` to overwrite the argument with a hijacked binary path.
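The length check and the 3-byte write in the handler can be mirrored in user space. The sketch below is plain C and an illustration only: `hijack_exec_path` is a hypothetical name, `strncpy` stands in for `bpf_probe_read_user`, and `memcpy` stands in for `bpf_probe_write_user`.

```c
#include <string.h>

#define TASK_COMM_LEN 16

/*
 * Hypothetical user-space mirror of the handler's rewrite step.
 * Returns 0 if the path was overwritten with "/a", or -1 if the original
 * path is shorter than two characters and was left untouched.
 */
static int hijack_exec_path(char *user_path)
{
    char prog_name[TASK_COMM_LEN] = {0};

    /* Read at most TASK_COMM_LEN - 1 bytes of the path, always NUL-terminated. */
    strncpy(prog_name, user_path, TASK_COMM_LEN - 1);

    /* A one-character path only guarantees two bytes of storage. */
    if (prog_name[1] == '\0')
        return -1;

    /* Write '/', 'a', and the terminating NUL: three bytes in total. */
    memcpy(user_path, "/a", 3);
    return 0;
}
```

For a path such as "/bin/ls" the buffer afterwards reads as "/a", with the rest of the old path still sitting behind the new NUL. The two-character minimum matters because the write covers three bytes, and a shorter original string may not own that much memory.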
This approach is risky because it can be used to hijack the behavior of software and make the system run malicious code. The example can be compiled and run with ecc and ecli in the same way:
```bash
./ecc exechijack.bpf.c exechijack.h
sudo ./ecli run package.json
```
## Conclusion
eBPF provides powerful capabilities for real-time monitoring and intervention in running systems. When used in conjunction with appropriate governance and security policies, this can bring many benefits such as enhanced security, performance optimization, and operational convenience. However, this technology must be used with great care as incorrect operations or misuse can result in system disruption or serious security incidents. In practice, it should be ensured that only authorized users and programs can deploy and manage eBPF programs, and their behavior should be validated in isolated test environments before they are applied in production.

View File

@@ -132,6 +132,70 @@ eBPF原本因其在内核空间的强大性能而被广泛认知但近年
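From a broader perspective, the eBPF runtime and Wasm can be seen as complementary. Although eBPF has an excellent verifier mechanism to ensure runtime safety, the limitations of its programming language and its relatively high development difficulty mean it is not always the preferred runtime for business logic. Instead, eBPF is better suited to highly specialized tasks such as network traffic forwarding, observability, and livepatch. In contrast, a Wasm runtime can be the first choice for scenarios such as Serverless runtime platforms, plugin systems, and lightweight virtualization. Both have their own strengths, and the choice between them depends on the specific use case and priorities.
## bpftime Quick Start
With `bpftime`, you can build eBPF applications using familiar tools like clang and libbpf, and execute them in user space. For example, the `malloc` eBPF program traces malloc calls using uprobe and counts them using a hash map.
You can follow [documents/build-and-test.md](https://eunomia.dev/bpftime/documents/build-and-test) to build the project, or use the container images from [GitHub packages](https://github.com/eunomia-bpf/bpftime/pkgs/container/bpftime).
To get started, build and run a libbpf-based eBPF program with the following commands: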
```console
make -C example/malloc # Build the example eBPF program
bpftime load ./example/malloc/malloc
```
In another shell, run the target program with eBPF inside:
```console
$ bpftime start ./example/malloc/victim
Hello malloc!
malloc called from pid 250215
continue malloc...
malloc called from pid 250215
```
You can also dynamically attach the eBPF program to a running process:
```console
$ ./example/malloc/victim & echo $! # The pid is 101771
[1] 101771
101771
continue malloc...
continue malloc...
```
Then attach to the process:
```console
$ sudo bpftime attach 101771 # You may need to run make install as root
Inject: "/root/.bpftime/libbpftime-agent.so"
Successfully injected. ID: 1
```
You can see the output from the original program:
```console
$ bpftime load ./example/malloc/malloc
...
12:44:35
pid=247299 malloc calls: 10
pid=247322 malloc calls: 10
```
Alternatively, you can run our sample eBPF program directly in kernel eBPF to see similar output:
```console
$ sudo example/malloc/malloc
15:38:05
pid=30415 malloc calls: 1079
pid=30393 malloc calls: 203
pid=29882 malloc calls: 1076
pid=34809 malloc calls: 8
```
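See [documents/usage.md](https://eunomia.dev/bpftime/documents/usage) for more details.
## Summary and Outlook
Userspace eBPF runtimes are breaking boundaries, extending eBPF's capabilities from the kernel into much broader territory. This extension brings notable gains in performance, flexibility, and security. For example, the `bpftime` runtime has shown that it can even outperform other technologies such as Wasm in certain low-level performance scenarios. A growing number of applications also use userspace eBPF for hot patching, lightweight virtualization, network filtering, and similar scenarios.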


@@ -129,6 +129,70 @@ For both technologies, reliance on underlying libraries for complex operations i
On the language support front, while eBPF's niche and specialized nature mean limited language support, Wasm boasts a broader language portfolio due to its origin and design for the web.
## bpftime Quick Start
With `bpftime`, you can build eBPF applications using familiar tools like clang and libbpf, and execute them in userspace. For instance, the `malloc` eBPF program traces malloc calls using uprobe and aggregates the counts using a hash map.
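
To make the "aggregate counts in a hash map" idea concrete, here is a minimal user-space model in plain C. It is not the `malloc` example's eBPF code and does not use bpftime or libbpf: the table layout and the names `on_malloc`, `malloc_calls`, and `MAX_ENTRIES` are hypothetical, chosen only to illustrate per-pid counting. In the real program the uprobe handler would update a BPF hash map instead.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 1024 /* hypothetical capacity, standing in for the map's max_entries */

struct entry {
    uint32_t pid;   /* key: process ID */
    uint64_t calls; /* value: number of malloc calls seen */
    int used;       /* 0 while the slot is empty */
};

static struct entry table[MAX_ENTRIES];

/* Find the slot for `pid` with linear probing, claiming an empty slot if the
 * pid is new. Returns NULL when the table is full. */
static struct entry *lookup_or_insert(uint32_t pid)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        struct entry *e = &table[(pid + i) % MAX_ENTRIES];
        if (!e->used) {
            e->used = 1;
            e->pid = pid;
            e->calls = 0;
            return e;
        }
        if (e->pid == pid)
            return e;
    }
    return NULL;
}

/* What the probe handler does conceptually: bump the counter for the caller.
 * Returns 0 on success, -1 if the table is full. */
static int on_malloc(uint32_t pid)
{
    struct entry *e = lookup_or_insert(pid);
    if (!e)
        return -1;
    e->calls++;
    return 0;
}

/* Read side: how many malloc calls were recorded for `pid` (0 if none). */
static uint64_t malloc_calls(uint32_t pid)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        const struct entry *e = &table[(pid + i) % MAX_ENTRIES];
        if (!e->used)
            return 0;
        if (e->pid == pid)
            return e->calls;
    }
    return 0;
}
```

A loader that periodically walks the table and prints each pid with its count would produce lines in the same spirit as the `pid=... malloc calls: ...` output shown later in this section.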
You can refer to [documents/build-and-test.md](https://eunomia.dev/bpftime/documents/build-and-test) for how to build the project, or use the container images from [GitHub packages](https://github.com/eunomia-bpf/bpftime/pkgs/container/bpftime).
To get started, build and run a libbpf-based eBPF program with the `bpftime` CLI:
```console
make -C example/malloc # Build the eBPF program example
bpftime load ./example/malloc/malloc
```
In another shell, run the target program with eBPF inside:
```console
$ bpftime start ./example/malloc/victim
Hello malloc!
malloc called from pid 250215
continue malloc...
malloc called from pid 250215
```
You can also dynamically attach the eBPF program to a running process:
```console
$ ./example/malloc/victim & echo $! # The pid is 101771
[1] 101771
101771
continue malloc...
continue malloc...
```
And attach to it:
```console
$ sudo bpftime attach 101771 # You may need to run make install as root
Inject: "/root/.bpftime/libbpftime-agent.so"
Successfully injected. ID: 1
```
You can see the output from the original program:
```console
$ bpftime load ./example/malloc/malloc
...
12:44:35
pid=247299 malloc calls: 10
pid=247322 malloc calls: 10
```
Alternatively, you can run our sample eBPF program directly in kernel eBPF to see similar output:
```console
$ sudo example/malloc/malloc
15:38:05
pid=30415 malloc calls: 1079
pid=30393 malloc calls: 203
pid=29882 malloc calls: 1076
pid=34809 malloc calls: 8
```
See [documents/usage.md](https://eunomia.dev/bpftime/documents/usage) for more details.
## Conclusion
Userspace eBPF runtimes are an exciting development that expands the capabilities of eBPF beyond the kernel. As highlighted in this post, they offer compelling benefits like enhanced performance, flexibility, and security compared to kernel-based eBPF. Runtimes like bpftime demonstrate the potential for substantial speedups, even outperforming alternatives like Wasm runtimes in certain dimensions like low-level performance.


@@ -49,7 +49,7 @@ char LICENSE[] SEC("license") = "GPL";
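This eBPF program can be loaded into the kernel and executed with tools such as libbpf or eunomia-bpf. It captures the sys_openat system call of the specified process (or all processes) and outputs the relevant information in user space.
eunomia-bpf is an open-source eBPF dynamic loading runtime and development toolchain combined with Wasm. Its purpose is to simplify the development, building, distribution, and running of eBPF programs. You can refer to <https://github.com/eunomia-bpf/eunomia-bpf> to download and install the ecc compilation toolchain and the ecli runtime. We use eunomia-bpf to compile and run this example.
eunomia-bpf is an open-source eBPF dynamic loading runtime and development toolchain combined with Wasm. Its purpose is to simplify the development, building, distribution, and running of eBPF programs. You can refer to <https://github.com/eunomia-bpf/eunomia-bpf> to download and install the ecc compilation toolchain and the ecli runtime. We use eunomia-bpf to compile and run this example. For the complete code, see <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/4-opensnoop>.
Compile and run the above code: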
@@ -100,7 +100,7 @@ See https://github.com/eunomia-bpf/eunomia-bpf for more information.
You can use the `--pid_target` option to specify the pid of the process to capture, for example:
```console
$ sudo ./ecli run package.json --pid_target 618
$ sudo ./ecli run package.json --pid_target 618
Runing eBPF program...
```


@@ -51,7 +51,7 @@ Translate the following Chinese text to English while maintaining the original f
This eBPF program can be loaded into the kernel and executed using tools like libbpf or eunomia-bpf. It captures the sys_openat system call of the specified process (or all processes) and outputs relevant information in user-space.
eunomia-bpf is an open-source eBPF dynamic loading runtime and development toolchain combined with Wasm. Its purpose is to simplify the development, building, distribution, and execution of eBPF programs. You can refer to <https://github.com/eunomia-bpf/eunomia-bpf> to download and install the ecc compilation toolchain and ecli runtime. We will use eunomia-bpf to compile and run this example.
eunomia-bpf is an open-source eBPF dynamic loading runtime and development toolchain combined with Wasm. Its purpose is to simplify the development, building, distribution, and execution of eBPF programs. You can refer to <https://github.com/eunomia-bpf/eunomia-bpf> to download and install the ecc compilation toolchain and ecli runtime. We will use eunomia-bpf to compile and run this example. The complete code of this example can be found at <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/4-opensnoop> .
Compile and run the above code:


@@ -46,14 +46,14 @@
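- [Capturing SSL/TLS plaintext data from multiple libraries using uprobe](30-sslsniff/README.md)
- [Tracing HTTP requests and other layer-7 protocols using eBPF socket filter or syscall trace](23-http/README.md)
- [Accelerating network request forwarding using sockops](29-sockops/README.md)
- [Modifying system call parameters with eBPF](34-syscall/README.md)
- [Hiding process or file information using eBPF](24-hide/README.md)
- [Terminating processes by sending signals with bpf_send_signal](25-signal/README.md)
- [Adding sudo users using eBPF](26-sudo/README.md)
- [Replacing text read or written by any program using eBPF](27-replace/README.md)
- [BPF lifecycle: Running eBPF programs continuously in Detached mode after user-mode applications exit](28-detach/README.md)
- [Security of the eBPF runtime and the challenges it faces](18-further-reading/ebpf-security.zh.md)
- [Userspace eBPF runtimes: in-depth analysis and practical applications](src\36-userspace-ebpf\README.md)
- [Modifying system call parameters with eBPF](34-syscall/README.md)
- [Userspace eBPF runtimes: in-depth analysis and practical applications](36-userspace-ebpf/README.md)
# bcc and bpftrace tutorials and documentation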


@@ -62,6 +62,7 @@ Security:
- [Adding sudo users using eBPF](26-sudo/README.md)
- [Replacing text read or written by any program using eBPF](27-replace/README.md)
- [BPF lifecycle: Running eBPF programs continuously in Detached mode after user-mode applications exit](28-detach/README.md)
- [Modifying System Call Parameters with eBPF](34-syscall/README.md)
- [Userspace eBPF Runtimes: Overview and Applications](36-userspace-ebpf/README.md)
# bcc and bpftrace tutorial