mirror of
https://github.com/eunomia-bpf/bpf-developer-tutorial.git
synced 2026-02-03 02:04:30 +08:00
Add more XDP related blogs (#135)
* add setup * update * fix code * move to dir * fix code * update the code * update code of 42 * update 21 * update * fix linter issues and add xdp in rust * update the docker file * fix CI * fix kernel code * update * update * add guideline
This commit is contained in:

72 README.md
@@ -1,6 +1,7 @@

# eBPF Developer Tutorial: Learning eBPF Step by Step with Examples

[](https://github.com/eunomia-bpf/bpf-developer-tutorial/actions/workflows/main.yml)
[](https://github.com/eunomia-bpf/bpf-developer-tutorial/actions/workflows/trigger-sync.yml)

[GitHub](https://github.com/eunomia-bpf/bpf-developer-tutorial)
[Gitee Mirror](https://gitee.com/yunwei37/bpf-developer-tutorial)
@@ -58,6 +59,8 @@ Android:

Networking:

- [Accelerating network request forwarding using sockops](src/29-sockops/README.md)
- [Capturing TCP Information with XDP](src/41-xdp-tcpdump/README.md)
- [XDP Load Balancer](src/42-xdp-loadbalancer/README.md)

tracing:
@@ -65,7 +68,7 @@ tracing:

- [Capturing Plain Text Data of Various Libraries' SSL/TLS Using uprobe](src/30-sslsniff/README.md)
- [Using eBPF to Trace Go Routine States](src/31-goroutine/README.md)
- [Measuring Function Latency with eBPF](src/33-funclatency/README.md)
- [Use uprobe to trace Rust programs](src/37-uprobe-rust/README.md)
- [Use Uprobe to trace Rust programs](src/37-uprobe-rust/README.md)
- [Using eBPF to Trace Nginx Requests](src/39-nginx/README.md)
- [Using eBPF to Trace MySQL Queries](src/40-mysql)
@@ -89,9 +92,9 @@ Continuously updating...

## Why write this tutorial?

In the process of learning eBPF, we have been inspired and helped by the [bcc python developer tutorial](src/bcc-documents/tutorial_bcc_python_developer.md). However, from the current perspective, using libbpf to develop eBPF applications is a relatively better choice. Yet there seem to be few tutorials that focus on eBPF development based on libbpf and BPF CO-RE, introducing it through examples and tools. Therefore, we initiated this project, adopting a similar organization method as the bcc python developer tutorial, but using CO-RE's libbpf for development.
In the process of learning eBPF, we have been inspired and helped by the [bcc python developer tutorial](src/bcc-documents/tutorial_bcc_python_developer.md). However, from the current perspective, using `libbpf` to develop eBPF applications is a relatively better choice.

This project is mainly based on the [libbpf-bootstrap](https://github.com/libbpf/libbpf-bootstrap) and [eunomia-bpf](https://github.com/eunomia-bpf/eunomia-bpf) frameworks, and uses eunomia-bpf to help simplify the development of some user-space libbpf eBPF code, allowing developers to focus on kernel-space eBPF code development.
This project is mainly based on the [libbpf](https://github.com/libbpf/libbpf) framework.

> - We also provide a small tool called GPTtrace, which uses ChatGPT to automatically write eBPF programs and trace Linux systems through natural language descriptions. This tool allows you to interactively learn eBPF programs: [GPTtrace](https://github.com/eunomia-bpf/GPTtrace)
> - Feel free to raise any questions or issues related to eBPF learning, or bugs encountered in practice, in the issue or discussion section of this repository. We will do our best to help you!
@@ -145,65 +148,18 @@ TIME COMM TID LAT(us)

## build

The example of local compilation is shown as follows:

```shell
$ git clone https://github.com/eunomia-bpf/bpf-developer-tutorial.git
$ cd bpf-developer-tutorial
$ git submodule update --init --recursive # Synchronize submodule
$ cd src/24-hide
$ make
git clone https://github.com/eunomia-bpf/bpf-developer-tutorial.git
cd bpf-developer-tutorial
git submodule update --init --recursive # Synchronize submodule
cd src/24-hide
make
```
## Why do we need tutorials based on libbpf and BPF CO-RE?
## LICENSE

> Historically, when it came to developing a BPF application, one could choose the BCC framework to load BPF programs into the kernel when implementing various BPF programs for tracepoints. BCC provides a built-in Clang compiler that compiles BPF code at runtime and tailors it into a program that conforms to the specific host kernel. This was the only way to develop maintainable BPF applications under the constantly changing internal kernel environment. The portability of BPF and the introduction of CO-RE are detailed in the article "BPF Portability and CO-RE", explaining why BCC was the only viable option before and why libbpf is now considered a better choice. Last year, libbpf saw significant improvements in functionality and complexity, eliminating many differences with BCC (especially for tracepoint applications) and adding many new and powerful features that BCC does not support (such as global variables and BPF skeletons).
>
> Admittedly, BCC does its best to simplify the work of BPF developers, but sometimes it also increases the difficulty of problem localization and fixing while providing convenience. Users must remember its naming conventions and the autogenerated structures for tracepoints, and they must rely on rewriting this code to read kernel data and access kprobe parameters. When using BPF maps, it is necessary to write half-object-oriented C code that does not completely match what happens in the kernel. Furthermore, BCC leads to a large amount of boilerplate code in user space, with even the most trivial parts configured manually.
>
> As mentioned above, BCC relies on runtime compilation and embeds a large LLVM/Clang library, which creates certain gaps between BCC and an ideal usage scenario:
>
> - High resource utilization (memory and CPU) at compile time, which may interfere with the main process on busy servers.
> - It relies on the kernel header package, which needs to be installed on each target host. Even so, if certain kernel contents are not exposed through public header files, type definitions need to be copied and pasted into the BPF code.
> - Even the smallest compile-time errors can only be detected at runtime, followed by recompiling and restarting the user-space application. This greatly affects the iteration time of development (and increases frustration...).
>
> Libbpf + BPF CO-RE (Compile Once - Run Everywhere) takes a different approach, treating BPF programs as normal user-space programs: they only need to be compiled into small binaries that can be deployed on target hosts without modification. libbpf acts as a loader for BPF programs, responsible for configuration work (relocating, loading, and verifying BPF programs, creating BPF maps, attaching to BPF hooks, etc.), so developers only need to focus on the correctness and performance of their BPF programs. This approach minimizes overhead, eliminates dependencies, and improves the overall developer experience.
>
> In terms of API and code conventions, libbpf adheres to the philosophy of "least surprise": most things need to be stated explicitly, no header files are implied, and no code is rewritten. Most monotonous steps can be eliminated using simple C code and appropriate helper macros. In addition, what users write is what gets executed: the structure of a BPF application maps one-to-one onto what is finally verified and executed by the kernel.
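
As a rough illustration of the loader role described above, here is a minimal user-space sketch using libbpf's object API (the object file name `prog.bpf.o` is a placeholder, and error handling is abbreviated):

```c
// Minimal libbpf loader sketch: open, load, and attach a compiled BPF object.
#include <stdio.h>
#include <bpf/libbpf.h>

int main(void)
{
    /* open the CO-RE object file produced by clang (placeholder name) */
    struct bpf_object *obj = bpf_object__open_file("prog.bpf.o", NULL);
    if (!obj)
        return 1;

    /* relocate + verify + load all programs, and create their maps */
    if (bpf_object__load(obj)) {
        bpf_object__close(obj);
        return 1;
    }

    /* attach each program according to its SEC() annotation */
    struct bpf_program *prog;
    bpf_object__for_each_program(prog, obj) {
        if (!bpf_program__attach(prog))
            fprintf(stderr, "failed to attach %s\n", bpf_program__name(prog));
    }

    /* ... poll maps or read trace output here ... */
    bpf_object__close(obj);
    return 0;
}
```

Running this requires libbpf installed and root privileges, so treat it as a sketch of the workflow rather than a drop-in tool.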

Reference: [BCC to Libbpf Conversion Guide (Translation) - Deep Dive into eBPF](https://www.ebpf.top/post/bcc-to-libbpf-guid/)

## eunomia-bpf

[eunomia-bpf](https://github.com/eunomia-bpf/eunomia-bpf) is an open-source eBPF dynamic loading runtime and development toolkit designed to simplify the development, building, distribution, and execution of eBPF programs. It is based on the libbpf CO-RE lightweight development framework.

With eunomia-bpf, you can:

- Write only the libbpf kernel-mode code when building eBPF programs or tools, automatically retrieving kernel-mode export information.
- Use Wasm to develop eBPF user-mode programs, controlling the entire eBPF program's loading and execution, as well as handling related data, within the WASM virtual machine.
- Package pre-compiled eBPF programs into universal JSON or WASM modules for distribution across architectures and kernel versions, allowing dynamic loading and execution without recompilation.

eunomia-bpf consists of a compilation toolchain and a runtime library. Compared to traditional frameworks like BCC and native libbpf, it greatly simplifies the development process of eBPF programs: in most cases, only the kernel-mode code needs to be written to easily build, package, and publish complete eBPF applications. At the same time, the kernel-mode eBPF code guarantees compatibility with mainstream development frameworks such as libbpf, libbpfgo, libbpf-rs, and more. When user-mode code needs to be written, multiple languages can be used with the help of WebAssembly. Compared to script tools like bpftrace, eunomia-bpf maintains similar convenience while not being limited to tracing scenarios, and can also be used in fields such as networking and security.
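
To make the kernel-only workflow concrete, here is a sketch of what such a single-file program might look like (the file name and tracepoint are chosen for this example, not taken from the tutorial):

```c
// hello.bpf.c -- kernel-mode half only; eunomia-bpf supplies the loader.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

// fires on every write() syscall entry and logs to the trace pipe
SEC("tp/syscalls/sys_enter_write")
int handle_write(void *ctx)
{
    bpf_printk("write syscall observed");
    return 0;
}

char LICENSE[] SEC("license") = "Dual BSD/GPL";
```

A file like this would be compiled with `ecc hello.bpf.c` and run with `sudo ecli run package.json`, matching the `ecc`/`ecli` workflow shown elsewhere on this page.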

- eunomia-bpf project GitHub address: <https://github.com/eunomia-bpf/eunomia-bpf>
- gitee mirror: <https://gitee.com/anolis/eunomia>

## Let ChatGPT Help Us

This tutorial uses ChatGPT to learn how to write eBPF programs. At the same time, we try to teach ChatGPT how to write eBPF programs. The general steps are as follows:

1. Teach it the basic knowledge of eBPF programming.
2. Show it some cases: hello world, the basic structure of eBPF programs, how to use eBPF programs for tracing, and let it start writing tutorials.
3. Manually adjust the tutorials and correct errors in the code and documents.
4. Feed the modified code back to ChatGPT for further learning.
5. Try to make ChatGPT generate eBPF programs and corresponding tutorial documents automatically! For example:

The complete conversation log can be found here: [ChatGPT.md](ChatGPT.md)

We have also built a demo of a command-line tool. Through training in this tutorial, it can automatically write eBPF programs and trace Linux systems using natural language descriptions: <https://github.com/eunomia-bpf/GPTtrace>

MIT
@@ -61,12 +61,10 @@ The main purpose of the Linux kernel is to abstract hardware or virtual hardware and provide a

Get to know and try out the eBPF development frameworks:

- bpftrace tutorial; for the simplest applications, bpftrace is probably the most convenient: https://eunomia.dev/zh/tutorials/bpftrace-tutorial/ (try it out, 1h)
- bpftrace tutorial; for the simplest applications, bpftrace is probably the most convenient: <https://eunomia.dev/zh/tutorials/bpftrace-tutorial/> (try it out, 1h)
- Examples of building various small tools with BCC: <https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md> (run through them, 3-4h)
- Some libbpf examples: <https://github.com/libbpf/libbpf-bootstrap> (run the ones you find interesting and read the source code, 2h)
- Tutorial based on libbpf and eunomia-bpf: <https://github.com/eunomia-bpf/bpf-developer-tutorial> (read parts 1-10, 3-4h)

Other development frameworks: for Go or Rust, please search and try them yourself (0-2h)
- Tutorial based on C libbpf, Go, or Rust with eunomia-bpf: <https://github.com/eunomia-bpf/bpf-developer-tutorial> (read parts 1-20, 3-8h)

If you have any questions or anything you would like to know, whether related to this project or not, feel free to start a discussion in this project's discussions section.
@@ -147,17 +145,14 @@ The eBPF Go library provides a generic eBPF library that decouples obtaining eBPF bytecode

### eunomia-bpf

Developing, building, and distributing eBPF has always been a high-barrier task. Tools such as BCC and bpftrace offer high development efficiency and good portability, but distribution and deployment require installing LLVM, Clang, and other compilation environments, and a local or remote compilation step runs every time, consuming considerable resources. Using native CO-RE libbpf, on the other hand, requires writing a fair amount of user-space loading code to help the eBPF program load correctly and retrieve reported information from the kernel, and there is still no good solution for distributing and managing eBPF programs.

[eunomia-bpf](https://github.com/eunomia-bpf/eunomia-bpf) is an open-source eBPF dynamic loading runtime and development toolchain designed to simplify the development, building, distribution, and execution of eBPF programs, based on the libbpf CO-RE lightweight development framework.

With eunomia-bpf, you can:

- Write only kernel-space code when building eBPF programs or tools, automatically retrieve the kernel-space exported information, and load it dynamically as a module;
- Develop user-space interactive programs with WASM, controlling the loading and execution of the entire eBPF program, as well as processing related data, inside the WASM virtual machine;
- Package pre-compiled eBPF programs into universal JSON or WASM modules for distribution across architectures and kernel versions, dynamically loading and running them without recompilation.

eunomia-bpf consists of a compilation toolchain and a runtime library. Compared with traditional frameworks such as BCC and native libbpf, it greatly simplifies the development process of eBPF programs: in most cases, you only need to write the kernel-space code to easily build, package, and publish a complete eBPF application, while the kernel-space eBPF code stays 100% compatible with mainstream development frameworks such as libbpf, libbpfgo, and libbpf-rs. When user-space code is needed, WebAssembly enables user-space development in multiple languages. Compared with script tools such as bpftrace, eunomia-bpf keeps similar convenience while not being limited to tracing: it can also be used in more scenarios such as networking and security.
eunomia-bpf consists of a compilation toolchain and a runtime library. Compared with traditional frameworks such as BCC and native libbpf, it simplifies the development process of eBPF programs: in most cases, you only need to write the kernel-space code to easily build, package, and publish a complete eBPF application.

> - eunomia-bpf project GitHub address: <https://github.com/eunomia-bpf/eunomia-bpf>
> - gitee mirror: <https://gitee.com/anolis/eunomia>
@@ -125,17 +125,14 @@ as well as the key functions used to load kernel space code. After the user-spac

### eunomia-bpf

Developing, building, and distributing eBPF has always been a high-threshold task. Tools such as BCC and bpftrace offer high development efficiency and good portability. However, distribution and deployment require the installation of LLVM, Clang, and other compilation environments, and a local or remote compilation process must be executed every time, resulting in substantial resource consumption. Using the native CO-RE libbpf, on the other hand, requires writing a considerable amount of user-mode loading code to properly load eBPF programs and obtain reported information from the kernel, and there is still no good solution for distributing and managing eBPF programs.

[eunomia-bpf](https://github.com/eunomia-bpf/eunomia-bpf) is an open-source eBPF dynamic loading runtime and development toolchain designed to simplify the development, building, distribution, and execution of eBPF programs. It is based on the libbpf CO-RE lightweight development framework.
[eunomia-bpf](https://github.com/eunomia-bpf/eunomia-bpf) is an open-source eBPF dynamic loading tool and development toolchain designed to simplify the development, building, distribution, and execution of eBPF programs. It is based on the libbpf CO-RE lightweight development framework.

With eunomia-bpf, you can:

- When writing eBPF programs or tools, write only kernel space code, automatically retrieve kernel space export information, and dynamically load it as a module.
- Use WASM for user space interactive program development to control the loading and execution of the entire eBPF program, as well as the processing of related data inside the WASM virtual machine.
- Package pre-compiled eBPF programs into universal JSON or WASM modules for distribution across architectures and kernel versions. They can be dynamically loaded and run without the need for recompilation.

eunomia-bpf consists of a compilation toolchain and a runtime library. Compared with traditional frameworks such as BCC and native libbpf, it greatly simplifies the development process of eBPF programs. In most cases, only writing kernel space code is required to easily build, package, and publish complete eBPF applications. At the same time, kernel space eBPF code ensures 100% compatibility with mainstream development frameworks such as libbpf, libbpfgo, libbpf-rs, etc. When there is a need to write user-space code, it can also be developed in multiple languages with the help of WebAssembly. Compared with script tools such as bpftrace, eunomia-bpf retains similar convenience while not being limited to tracing, and is also applicable to more scenarios such as networking and security.
eunomia-bpf consists of a compilation toolchain and a runtime library. Compared with traditional frameworks such as BCC and native libbpf, it simplifies the development process of eBPF programs.

> - eunomia-bpf project Github address: <https://github.com/eunomia-bpf/eunomia-bpf>
> - gitee mirror: <https://gitee.com/anolis/eunomia>
@@ -17,7 +17,7 @@

To develop eBPF programs, you need to install the following software and tools:

- Linux kernel: Since eBPF is a kernel technology, you need a relatively new version of the Linux kernel (at least 4.8, preferably 5.15 or later) to support eBPF features.
  - We recommend the latest Ubuntu release (e.g. Ubuntu 23.10) for the best learning experience; older kernels may have relatively incomplete eBPF feature support.
- LLVM and Clang: These tools are used to compile eBPF programs. Installing the latest versions of LLVM and Clang ensures that you get the best eBPF support.

An eBPF program consists of two main parts: the kernel-space part and the user-space part. The kernel-space part contains the actual logic of the eBPF program, while the user-space part is responsible for loading, running, and monitoring the kernel-space program.
@@ -127,7 +127,7 @@ docker run -it -v `pwd`/:/src/ ghcr.io/eunomia-bpf/ecc-`uname -m`:latest

```console
$ sudo ./ecli run package.json
Runing eBPF program...
Running eBPF program...
```

After running this program, you can view the output of the eBPF program by reading the /sys/kernel/debug/tracing/trace_pipe file:
@@ -17,7 +17,7 @@ Before starting to write eBPF programs, we need to prepare a suitable developmen

To develop eBPF programs, you need to install the following software and tools:

- Linux kernel: Since eBPF is a kernel technology, you need a relatively new version of the Linux kernel (at least version 4.8; 5.15+ or 6.2+ is suggested) to support eBPF functionality.
  - If possible, installing a newer version of Ubuntu (e.g. 23.10) is recommended.
- LLVM and Clang: These tools are used to compile eBPF programs. Installing the latest version of LLVM and Clang ensures that you get the best eBPF support.

An eBPF program consists of two main parts: the kernel space part and the user space part. The kernel space part contains the actual logic of the eBPF program, while the user space part is responsible for loading, running, and monitoring the kernel space program.
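
A hedged sketch of what the kernel space part can look like (the tracepoint and names here are illustrative, not from this document):

```c
// kernel space part: log every openat() syscall entry
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tp/syscalls/sys_enter_openat")
int handle_openat(void *ctx)
{
    bpf_printk("openat called");
    return 0;
}

char LICENSE[] SEC("license") = "Dual BSD/GPL";
```

The user space part then loads and attaches this object (for example via libbpf or the `ecli` runner described elsewhere in this document) and reads the output from the trace pipe.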

@@ -58,6 +58,7 @@ eunomia-bpf compiler
Usage: ecc [OPTIONS] <SOURCE_PATH> [EXPORT_EVENT_HEADER]
....
```

Note: If you are on the aarch64 platform, please use [ecc-aarch64](https://github.com/eunomia-bpf/eunomia-bpf/releases/latest/download/ecc-aarch64) and [ecli-aarch64](https://github.com/eunomia-bpf/eunomia-bpf/releases/latest/download/ecli-aarch64).

You can also compile using the docker image:
@@ -299,7 +299,6 @@ $ sudo ./tcpstates

SKADDR PID COMM LADDR LPORT RADDR RPORT OLDSTATE -> NEWSTATE MS
ffff9bf61bb62bc0 164978 node 192.168.88.15 0 52.178.17.2 443 CLOSE -> SYN_SENT 0.000
ffff9bf61bb62bc0 0 swapper/0 192.168.88.15 41596 52.178.17.2 443 SYN_SENT -> ESTABLISHED 225.794
ffff9bf61bb62bc0 0 swapper/0 192.168.88.15 41596 52.178.17.2 443 ESTABLISHED -> CLOSE_WAIT 901.454
ffff9bf61bb62bc0 164978 node 192.168.88.15 41596 52.178.17.2 443 CLOSE_WAIT -> LAST_ACK 0.793
ffff9bf61bb62bc0 164978 node 192.168.88.15 41596 52.178.17.2 443 LAST_ACK -> LAST_ACK 0.086
@@ -51,9 +51,8 @@ int tc_ingress(struct __sk_buff *ctx)

char __license[] SEC("license") = "GPL";
```

This code defines an eBPF program that can capture and process packets through Linux TC (Traffic Control). In this program, we limit it to capture only IPv4 packets, and then print out each packet's total length and Time-To-Live (TTL) value using the bpf_printk function.

What needs to be noted is that we use some BPF library functions in the code, such as bpf_htons and bpf_ntohs, which convert between network byte order and host byte order. In addition, we also use comments to provide additional attach-point and option information for TC. For example, at the beginning of this code, we use the following comments:

```c
@@ -1,21 +1,76 @@

# eBPF Tutorial by Example 21: Programmable Packet Processing with xdp
# eBPF Tutorial by Example 21: Programmable Packet Processing with XDP

## Background
In this tutorial, we will introduce XDP (eXpress Data Path) and walk through a small example to help you get started. Later on, we will explore more advanced XDP applications, such as load balancers, firewalls, and other real-world use cases. Please give us a star on [Github](https://github.com/eunomia-bpf/bpf-developer-tutorial) if you are interested in eBPF or XDP!

xdp (eXpress Data Path) is an emerging kernel-bypassing, programmable packet processing scheme in the Linux kernel. Compared to cBPF, xdp's attach point is very low-level, located in the network device driver's soft interrupt handling, even before the allocation of the skb_buff structure. Therefore, mounting eBPF programs on xdp suits many simple but extremely frequent packet processing operations (such as defending against DoS attacks) and can achieve very high performance (24Mpps/core).
## What is XDP?

## Overview of XDP
XDP is a high-performance, programmable data path in the Linux kernel, designed for packet processing at the network interface level. By attaching eBPF programs directly to network device drivers, XDP can intercept and handle packets before they reach the kernel's networking stack. This allows for extremely low-latency and efficient packet processing, making it ideal for tasks like DDoS defense, load balancing, and traffic filtering. In fact, XDP can achieve throughput as high as **24 million packets per second (Mpps) per core**.

xdp is not the first system to support programmable packet processing. Before it, kernel-bypass solutions represented by DPDK (Data Plane Development Kit) could achieve even higher performance. Their idea is to completely bypass the kernel and let user-space network applications take over the network device, avoiding the overhead of switching between user and kernel mode. However, this approach has many inherent drawbacks:
### Why XDP?

+ Inability to integrate with the kernel's mature networking modules, which must instead be reimplemented in user space;
+ Breaking the kernel's security boundary, rendering many kernel-provided networking tools unusable;
+ When interacting with regular sockets, packets must be reinjected into the kernel from user space;
+ One or more separate CPUs must be dedicated to packet processing;
XDP runs at a lower level than traditional Linux networking components such as cBPF, executing in the soft interrupt context of the network device driver. It can process packets before they are handled by the kernel's standard networking stack, avoiding the creation of the `skb_buff` structure that represents network packets in Linux. This early-stage processing brings significant performance gains for simple but frequent operations, such as dropping malicious packets or load balancing across servers.

Besides that, leveraging kernel modules and hook points in the kernel's network protocol stack is another option. However, the former requires large kernel modifications with a high cost of errors, while the latter sits late in the packet processing pipeline and is not efficient enough.
Compared to other packet processing mechanisms, XDP strikes a balance between performance and usability, leveraging the security and reliability of the Linux kernel while providing flexibility through programmable eBPF.

In summary, xdp + eBPF offers a more robust approach to programmable packet processing. It balances the strengths and weaknesses of the solutions above, achieving high performance without overly changing the kernel's packet processing workflow, while the eBPF virtual machine isolates and constrains user-defined packet processing routines, improving security.
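
The smallest possible XDP program simply lets every packet continue up the stack. A minimal sketch in the style of the xdp-tutorial `basic01-xdp-pass` example (listed in the references at the end of this tutorial):

```c
// Minimal XDP program: pass every packet on to the normal kernel stack.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_pass_prog(struct xdp_md *ctx)
{
    /* other verdicts: XDP_DROP, XDP_TX, XDP_REDIRECT, XDP_ABORTED */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```

Once compiled to BPF bytecode (for example with `clang -O2 -g -target bpf -c xdp_pass.c -o xdp_pass.o`), it can be attached with iproute2, e.g. `sudo ip link set dev lo xdpgeneric obj xdp_pass.o sec xdp`, and detached with `sudo ip link set dev lo xdpgeneric off`.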

## Comparing XDP with Other Approaches

Before XDP, some solutions accelerated packet processing by bypassing the kernel entirely. One notable example is **DPDK** (Data Plane Development Kit). DPDK lets user-space applications take direct control of network devices, achieving very high performance. However, this approach comes with trade-offs:

1. **Lack of Kernel Integration**: DPDK and other kernel-bypass solutions cannot take advantage of existing kernel networking features; developers must reimplement many protocols and functions in user space.

2. **Security Boundaries**: These bypass techniques break the kernel's security model, making it hard to leverage the security tools the kernel provides.
3. **User-Kernel Transition Costs**: When user-space packet processing needs to interact with traditional kernel networking (e.g. socket-based applications), packets must be reinjected into the kernel, adding overhead and complexity.
4. **Dedicated CPU Usage**: To handle high traffic, DPDK and similar solutions usually require dedicating CPU cores to packet processing, which limits the scalability and efficiency of general-purpose systems.

Another alternative to XDP is using **kernel modules** or **hooks** in the Linux networking stack. While this approach integrates well with existing kernel features, it requires extensive kernel modifications and, because it runs later in the packet processing pipeline, cannot deliver the same performance benefits as XDP.

### The XDP + eBPF Advantage

XDP combined with eBPF provides a middle ground between kernel-bypass solutions (like DPDK) and kernel-integrated solutions. Here is why XDP + eBPF stands out:

- **High Performance**: By intercepting packets at the network interface card (NIC) driver level, XDP can achieve near-line-rate performance for dropping, redirecting, or load balancing packets, while keeping resource consumption low.

- **Kernel Integration**: Unlike DPDK, XDP works within the Linux kernel, allowing seamless interaction with the existing kernel networking stack and tools (such as `iptables`, `nftables`, or sockets). There is no need to reimplement networking protocols in user space.

- **Security**: The eBPF virtual machine ensures that user-defined XDP programs are sandboxed and cannot destabilize the kernel. eBPF's security model prevents malicious or buggy code from harming the system, providing a safe environment for programmable packet processing.

- **No Dedicated CPUs Required**: XDP allows packet processing without dedicating entire CPU cores to networking tasks. This improves overall system efficiency and allows more flexible resource allocation.

Overall, XDP + eBPF offers a powerful solution for programmable packet processing, combining high performance with the flexibility and security of kernel integration. It eliminates the drawbacks of full kernel-bypass solutions while retaining the benefits of kernel security and functionality.

## Projects and Use Cases with XDP

XDP has already been adopted in many high-profile projects that demonstrate its power and flexibility in real-world networking scenarios:

### 1. **Cilium**

- **Description**: Cilium is an open-source networking, security, and observability tool designed for cloud-native environments, especially Kubernetes. It leverages XDP to implement high-performance packet filtering and load balancing.
- **Use Case**: Cilium offloads packet filtering and security policies to XDP, enabling high-throughput, low-latency traffic management for containerized environments without sacrificing scalability.
- **Link**: [Cilium](https://cilium.io/)

### 2. **Katran**

- **Description**: Katran is a layer 4 load balancer developed by Facebook, optimized for high scalability and performance. It uses XDP to handle packet forwarding with minimal overhead.
- **Use Case**: Katran processes millions of packets per second, efficiently distributing traffic across backend servers and using XDP to achieve low-latency, high-performance load balancing in large-scale data centers.
- **Link**: [Katran GitHub](https://github.com/facebookincubator/katran)

### 3. **XDP DDoS Protection at Cloudflare**

- **Description**: Cloudflare has implemented XDP-based real-time DDoS mitigation. By processing packets at the NIC level, Cloudflare can filter out attack traffic before it enters the networking stack, minimizing the impact of DDoS attacks on its systems.
- **Use Case**: Cloudflare uses XDP to drop malicious packets early in the pipeline, protecting its infrastructure from large-scale DDoS attacks while maintaining high availability for legitimate traffic.
- **Link**: [Cloudflare blog on XDP](https://blog.cloudflare.com/l4drop-xdp-ebpf-based-ddos-mitigations/)

These projects demonstrate XDP's scalable, efficient packet processing across different domains, from security and load balancing to cloud-native networking.

### Why Choose XDP over Other Approaches?

Compared with traditional approaches (such as `iptables`, `nftables`, or `tc`), XDP offers several clear advantages:

- **Speed and Low Overhead**: XDP runs directly in the NIC driver, bypassing most of the kernel's overhead and making packet processing faster.

- **Customizability**: XDP lets developers build custom packet processing programs with eBPF, offering greater flexibility and finer-grained control than traditional tools like `iptables`.

- **Resource Efficiency**: XDP does not require dedicating entire CPU cores to packet processing the way user-space solutions like DPDK do, making it a more efficient choice for high-performance networking.

## Writing an eBPF Program

@@ -97,6 +152,6 @@ $ sudo cat /sys/kernel/tracing/trace_pipe

## References

+ <http://arthurchiao.art/blog/xdp-paper-acm-2018-zh/>
+ <http://arthurchiao.art/blog/linux-net-stack-implementation-rx-zh/>
+ <https://github.com/xdp-project/xdp-tutorial/tree/master/basic01-xdp-pass>
- <http://arthurchiao.art/blog/xdp-paper-acm-2018-zh/>
- <http://arthurchiao.art/blog/linux-net-stack-implementation-rx-zh/>
- <https://github.com/xdp-project/xdp-tutorial/tree/master/basic01-xdp-pass>
@@ -1,23 +1,80 @@

# eBPF Tutorial by Example 21: Programmable Packet Processing with XDP

## Background
In this tutorial, we will introduce XDP (eXpress Data Path) and walk through a small example to help you get started. Later on, we will explore more advanced XDP applications, such as load balancers, firewalls, and other real-world use cases. Please give us a star on [Github](https://github.com/eunomia-bpf/bpf-developer-tutorial) if you are interested in eBPF or XDP!

XDP (eXpress Data Path) is an emerging scheme in the Linux kernel for programmable packet processing that bypasses the kernel. Compared to cBPF, XDP operates at a much lower level, residing within the network device driver's soft interrupt processing, even before the allocation of the `skb_buff` structure. Thus, eBPF programs mounted on XDP are suitable for many simple yet frequent packet processing operations (like defending against DoS attacks), achieving high performance (24Mpps/core).
## What is XDP?

## Overview of XDP
XDP is a high-performance, programmable data path in the Linux kernel, designed for packet processing at the network interface level. By attaching eBPF programs directly to network device drivers, XDP can intercept and handle packets before they reach the kernel's networking stack. This allows for extremely low-latency and efficient packet processing, making it ideal for tasks like DDoS defense, load balancing, and traffic filtering. In fact, XDP can achieve throughput as high as **24 million packets per second (Mpps) per core**.

XDP isn't the first system supporting programmable packet processing. Before it, kernel-bypass solutions like DPDK (Data Plane Development Kit) could achieve even higher performance. The idea behind such solutions is to completely bypass the kernel and let user-level network applications take over network devices, eliminating the overhead of transitioning between user and kernel mode. However, this approach has inherent drawbacks:
### Why XDP?

+ Inability to integrate with mature network modules in the kernel, necessitating reimplementation in user space.
+ Breaking the kernel's security boundary, rendering many kernel-provided networking tools unusable.
+ When interacting with conventional sockets, packets must be reinjected into the kernel from user space.
+ Requires dedicating one or more separate CPUs for packet processing.
XDP operates at a lower level than traditional Linux networking components, like cBPF (Classic BPF), by running inside the soft interrupt context of the network device driver. It can handle packets before they are even processed by the kernel's standard networking stack, bypassing the creation of the `skb_buff` structure, which represents network packets in Linux. This early-stage processing provides significant performance gains for simple but frequent operations like dropping malicious packets or load balancing across servers.

Additionally, using kernel modules and hook points in the kernel's network protocol stack is another approach. However, the former entails extensive kernel modifications with high error costs, while the latter, due to its late position in the packet processing workflow, isn't as efficient.
Compared to other packet processing mechanisms, XDP strikes a balance between performance and usability, leveraging the security and reliability of the Linux kernel while providing flexibility through programmable eBPF.

In summary, XDP + eBPF presents a more robust approach for programmable packet processing. It balances the strengths and weaknesses of the aforementioned solutions, achieving high performance without altering the kernel's packet processing workflow too much. Moreover, the eBPF virtual machine isolates and constrains user-defined packet processing routines, enhancing security.
## Overview of XDP vs. Other Approaches

## Writing an eBPF Program
Before XDP, several other solutions aimed to accelerate packet processing by bypassing the kernel entirely. One prominent example is **DPDK** (Data Plane Development Kit). DPDK allows user-space applications to take direct control of network devices, achieving very high performance. However, this approach comes with trade-offs:

1. **Lack of Kernel Integration**: DPDK and other kernel-bypass solutions cannot utilize existing kernel networking features, requiring developers to reimplement many protocols and functions in user space.

2. **Security Boundaries**: These bypass techniques break the kernel's security model, making it harder to leverage security tools provided by the kernel.

3. **User-Kernel Transition Costs**: When user-space packet processing needs to interact with traditional kernel networking (like socket-based applications), packets must be reinjected into the kernel, adding overhead and complexity.

4. **Dedicated CPU Usage**: To handle high traffic, DPDK and similar solutions often require dedicating one or more CPU cores solely to packet processing, which limits the scalability and efficiency of general-purpose systems.

Another alternative to XDP is using **kernel modules** or **hooks** in the Linux networking stack. While this method integrates well with existing kernel features, it requires extensive kernel modifications and does not provide the same performance benefits, as it operates later in the packet processing pipeline.

### The XDP + eBPF Advantage

XDP combined with eBPF offers a middle ground between kernel-bypass solutions like DPDK and kernel-integrated solutions. Here's why XDP + eBPF stands out:

- **High Performance**: By intercepting packets early at the NIC driver level, XDP achieves near-line-rate performance for tasks like dropping, redirecting, or load balancing packets, all while keeping resource usage low.

- **Kernel Integration**: Unlike DPDK, XDP works within the Linux kernel, allowing seamless interaction with the existing kernel network stack and tools (such as `iptables`, `nftables`, or sockets). There's no need to reimplement networking protocols in user space.

- **Security**: The eBPF virtual machine (VM) ensures that user-defined XDP programs are sandboxed and constrained, which means they cannot destabilize the kernel. The security model of eBPF prevents malicious or buggy code from harming the system, providing a safe environment for programmable packet processing.

- **No Dedicated CPUs Required**: XDP allows packet processing without dedicating entire CPU cores solely to network tasks. This improves the overall efficiency of the system, allowing for more flexible resource allocation.

In summary, XDP + eBPF delivers a robust solution for programmable packet processing that combines high performance with the flexibility and safety of kernel integration. It eliminates the drawbacks of full kernel-bypass solutions while retaining the benefits of kernel security and functionality.
|
||||
|
||||
## Projects and Use Cases with XDP
|
||||
|
||||
XDP is already being used in a number of high-profile projects that highlight its power and flexibility in real-world networking scenarios:
|
||||
|
||||
### 1. **Cilium**
|
||||
|
||||
- **Description**: Cilium is an open-source networking, security, and observability tool designed for cloud-native environments, especially Kubernetes. It leverages XDP to implement high-performance packet filtering and load balancing.
|
||||
- **Use Case**: Cilium offloads packet filtering and security policies to XDP, enabling high-throughput and low-latency traffic management in containerized environments without sacrificing scalability.
|
||||
- **Link**: [Cilium](https://cilium.io/)
|
||||
|
||||
### 2. **Katran**
|
||||
|
||||
- **Description**: Katran is a layer 4 load balancer developed by Facebook, optimized for high scalability and performance. It uses XDP to handle packet forwarding with minimal overhead.
|
||||
- **Use Case**: Katran processes millions of packets per second to distribute traffic across backend servers efficiently, using XDP to achieve low-latency and high-performance load balancing in large-scale data centers.
|
||||
- **Link**: [Katran GitHub](https://github.com/facebookincubator/katran)
|
||||
|
||||
### 3. **XDP DDoS Protection at Cloudflare**
|
||||
|
||||
- **Description**: Cloudflare has implemented XDP for real-time DDoS mitigation. By processing packets at the NIC level, Cloudflare can filter out attack traffic before it reaches the networking stack, minimizing the impact of DDoS attacks on their systems.
|
||||
- **Use Case**: Cloudflare leverages XDP to drop malicious packets early in the pipeline, protecting their infrastructure from large-scale DDoS attacks while maintaining high availability for legitimate traffic.
|
||||
- **Link**: [Cloudflare Blog on XDP](https://blog.cloudflare.com/l4drop-xdp-ebpf-based-ddos-mitigations/)
|
||||
|
||||
These projects demonstrate the real-world capabilities of XDP for scalable and efficient packet processing across different domains, from security and load balancing to cloud-native networking.
|
||||
|
||||
### Why Use XDP Over Other Methods?
|
||||
|
||||
Compared to traditional methods like `iptables`, `nftables`, or `tc`, XDP offers several clear advantages:
|
||||
|
||||
- **Speed and Low Overhead**: Operating directly in the NIC driver, XDP bypasses much of the kernel’s overhead, enabling faster packet processing.
|
||||
|
||||
- **Customizability**: XDP allows developers to create custom packet-processing programs with eBPF, providing more flexibility and granularity than legacy tools like `iptables`.
|
||||
|
||||
- **Resource Efficiency**: XDP does not require dedicating entire CPU cores to packet processing, unlike user-space solutions like DPDK, making it a more efficient choice for high-performance networking.
|
||||
|
||||
## Writing your first XDP Program
|
||||
|
||||
```C
|
||||
#include "vmlinux.h"
|
||||
@@ -99,8 +156,8 @@ For those interested in further exploring eBPF, visit our tutorial code reposito

For more information, you can refer to:

+ <http://arthurchiao.art/blog/xdp-paper-acm-2018-zh/>
+ <http://arthurchiao.art/blog/linux-net-stack-implementation-rx-zh/>
+ <https://github.com/xdp-project/xdp-tutorial/tree/master/basic01-xdp-pass>
- <http://arthurchiao.art/blog/xdp-paper-acm-2018-zh/>
- <http://arthurchiao.art/blog/linux-net-stack-implementation-rx-zh/>
- <https://github.com/xdp-project/xdp-tutorial/tree/master/basic01-xdp-pass>

> The original link of this article: <https://eunomia.dev/tutorials/21-xdp>
@@ -1,4 +1,4 @@

# Using eBPF Programs on Andorid
# Using eBPF Programs on Android

> This article mainly documents the author's exploration process, results, and issues encountered while testing how well high-version Android kernels in the Android Studio Emulator support libbpf-based CO-RE technology.
> The test was conducted by building a Debian environment inside the Android shell and using it to build the eunomia-bpf toolchain and run its test cases.

@@ -1,6 +1,6 @@

# eBPF Tutorial by Example: Using eBPF Programs on Android

> This article mainly documents the author's exploration process, results, and issues encountered while testing the level of support for CO-RE technology based on the libbpf library on high version Android kernels in the Android Studio Emulator.
> The test was conducted by building a Debian environment in the Android Shell environment and attempting to build the eunomia-bpf toolchain and run its test cases based on this.

## Background
@@ -1 +0,0 @@

@@ -1,53 +0,0 @@

// xdp_lb.bpf.c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 2); // Two backends
    __type(key, __u32);
    __type(value, __u32); // Backend IPs
} backends SEC(".maps");

SEC("xdp")
int xdp_load_balancer(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    if (eth->h_proto != __constant_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *iph = (struct iphdr *)(eth + 1);
    if ((void *)(iph + 1) > data_end)
        return XDP_PASS;

    if (iph->protocol != IPPROTO_TCP && iph->protocol != IPPROTO_UDP)
        return XDP_PASS;

    __u32 key = 0;
    static __u32 cnt = 0;
    __u32 *backend_ip = bpf_map_lookup_elem(&backends, &key);

    if (!backend_ip)
        return XDP_PASS;

    cnt = (cnt + 1) % 2; // Round-robin
    key = cnt;
    backend_ip = bpf_map_lookup_elem(&backends, &key);

    if (backend_ip) {
        iph->daddr = *backend_ip; // Redirect to the backend IP
        iph->check = 0; // Needs recomputation in real cases
    }

    return XDP_TX; // Transmit modified packet back
}

char _license[] SEC("license") = "GPL";
@@ -1,83 +0,0 @@

// xdp_lb.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>    // sleep()
#include <arpa/inet.h> // htonl()
#include <bpf/libbpf.h>
#include <bpf/bpf.h>
#include <net/if.h>
#include <linux/if_link.h>
#include "xdp_lb.skel.h" // This header is auto-generated by bpftool

#define IFACE "eth0" // Replace with your network interface

static int set_up_backends(struct xdp_lb_bpf *skel) {
    __u32 backend1 = htonl(0xC0A80102); // 192.168.1.2
    __u32 backend2 = htonl(0xC0A80103); // 192.168.1.3
    __u32 key = 0;

    if (bpf_map_update_elem(bpf_map__fd(skel->maps.backends), &key, &backend1, BPF_ANY) < 0) {
        fprintf(stderr, "Failed to update backend 1\n");
        return -1;
    }

    key = 1;
    if (bpf_map_update_elem(bpf_map__fd(skel->maps.backends), &key, &backend2, BPF_ANY) < 0) {
        fprintf(stderr, "Failed to update backend 2\n");
        return -1;
    }

    return 0;
}

int main() {
    struct xdp_lb_bpf *skel;
    int err, ifindex;

    // Open the eBPF skeleton
    skel = xdp_lb_bpf__open();
    if (!skel) {
        fprintf(stderr, "Failed to open and load skeleton\n");
        return 1;
    }

    // Load and verify the eBPF program
    err = xdp_lb_bpf__load(skel);
    if (err) {
        fprintf(stderr, "Failed to load BPF program: %d\n", err);
        return 1;
    }

    // Set up the backend IP addresses
    if (set_up_backends(skel) < 0) {
        fprintf(stderr, "Failed to set up backend IP addresses\n");
        return 1;
    }

    // Get interface index
    ifindex = if_nametoindex(IFACE);
    if (ifindex == 0) {
        perror("if_nametoindex");
        return 1;
    }

    // Attach the XDP program
    err = bpf_xdp_attach(ifindex, bpf_program__fd(skel->progs.xdp_load_balancer), XDP_FLAGS_SKB_MODE, NULL);
    if (err) {
        fprintf(stderr, "Failed to attach XDP program: %d\n", err);
        return 1;
    }

    printf("XDP Load Balancer is running on interface %s...\n", IFACE);
    sleep(60); // Keep running for 60 seconds

    // Detach the XDP program before exiting
    err = bpf_xdp_detach(ifindex, XDP_FLAGS_SKB_MODE, NULL);
    if (err) {
        fprintf(stderr, "Failed to detach XDP program: %d\n", err);
        return 1;
    }

    // Clean up
    xdp_lb_bpf__destroy(skel);

    return 0;
}
6 src/41-xdp-tcpdump/.gitignore vendored Normal file
@@ -0,0 +1,6 @@
.output
uprobe
merge-btf
*.btf
xdp_lb
xdp-tcpdump
@@ -24,7 +24,7 @@ INCLUDES := -I$(OUTPUT) -I../third_party/libbpf/include/uapi -I$(dir $(VMLINUX))

CFLAGS := -g -Wall
ALL_LDFLAGS := $(LDFLAGS) $(EXTRA_LDFLAGS)

APPS = funclatency
APPS = xdp-tcpdump

CARGO ?= $(shell which cargo)
ifeq ($(strip $(CARGO)),)
509 src/41-xdp-tcpdump/README.md Normal file
@@ -0,0 +1,509 @@
# eBPF Tutorial by Example: Capturing TCP Information with XDP

Extended Berkeley Packet Filter (eBPF) is a revolutionary technology in the Linux kernel that allows developers to run sandboxed programs within the kernel space. It enables powerful networking, security, and tracing capabilities without the need to modify the kernel source code or load kernel modules. This tutorial focuses on using eBPF with the Express Data Path (XDP) to capture TCP header information directly from network packets at the earliest point of ingress.

## Capturing TCP Headers with XDP

Capturing network packets is essential for monitoring, debugging, and securing network communications. Traditional tools like `tcpdump` operate in user space and can incur significant overhead. By leveraging eBPF and XDP, we can capture TCP header information directly within the kernel, minimizing overhead and improving performance.

In this tutorial, we'll develop an XDP program that intercepts incoming TCP packets and extracts their header information. We'll store this data in a ring buffer, which a user-space program will read and display in a human-readable format.

### Why Use XDP for Packet Capturing?

XDP is a high-performance data path within the Linux kernel that allows for programmable packet processing at the lowest level of the network stack. By attaching an eBPF program to XDP, we can process packets immediately as they arrive, reducing latency and improving efficiency.

## Kernel eBPF Code Analysis

Let's dive into the kernel-space eBPF code that captures TCP header information.

### Full Kernel Code
```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define ETH_P_IP 0x0800

// Define the ring buffer map
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24); // 16 MB buffer
} rb SEC(".maps");

// Helper function to check if the packet is TCP
static bool is_tcp(struct ethhdr *eth, void *data_end)
{
    // Ensure Ethernet header is within bounds
    if ((void *)(eth + 1) > data_end)
        return false;

    // Only handle IPv4 packets
    if (bpf_ntohs(eth->h_proto) != ETH_P_IP)
        return false;

    struct iphdr *ip = (struct iphdr *)(eth + 1);

    // Ensure IP header is within bounds
    if ((void *)(ip + 1) > data_end)
        return false;

    // Check if the protocol is TCP
    if (ip->protocol != IPPROTO_TCP)
        return false;

    return true;
}

SEC("xdp")
int xdp_pass(struct xdp_md *ctx)
{
    // Pointers to packet data
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // Parse Ethernet header
    struct ethhdr *eth = data;

    // Check if the packet is a TCP packet
    if (!is_tcp(eth, data_end)) {
        return XDP_PASS;
    }

    // Cast to IP header
    struct iphdr *ip = (struct iphdr *)(eth + 1);

    // Calculate IP header length
    int ip_hdr_len = ip->ihl * 4;
    if (ip_hdr_len < sizeof(struct iphdr)) {
        return XDP_PASS;
    }

    // Ensure IP header is within packet bounds
    if ((void *)ip + ip_hdr_len > data_end) {
        return XDP_PASS;
    }

    // Parse TCP header
    struct tcphdr *tcp = (struct tcphdr *)((unsigned char *)ip + ip_hdr_len);

    // Ensure TCP header is within packet bounds
    if ((void *)(tcp + 1) > data_end) {
        return XDP_PASS;
    }

    // Define the number of bytes to capture from the TCP header
    const int tcp_header_bytes = 32;

    // Ensure that the desired number of bytes does not exceed packet bounds
    if ((void *)tcp + tcp_header_bytes > data_end) {
        return XDP_PASS;
    }

    // Reserve space in the ring buffer
    void *ringbuf_space = bpf_ringbuf_reserve(&rb, tcp_header_bytes, 0);
    if (!ringbuf_space) {
        return XDP_PASS; // If reservation fails, skip processing
    }

    // Copy the TCP header bytes into the ring buffer
    // Using a loop to ensure compliance with the eBPF verifier
    for (int i = 0; i < tcp_header_bytes; i++) {
        unsigned char byte = *((unsigned char *)tcp + i);
        ((unsigned char *)ringbuf_space)[i] = byte;
    }

    // Submit the data to the ring buffer
    bpf_ringbuf_submit(ringbuf_space, 0);

    // Optional: Print a debug message
    bpf_printk("Captured TCP header (%d bytes)", tcp_header_bytes);

    return XDP_PASS;
}

char __license[] SEC("license") = "GPL";
```
### Code Explanation

#### Defining the Ring Buffer Map

We define a ring buffer map named `rb` to pass data from the kernel to user space efficiently.

```c
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24); // 16 MB buffer
} rb SEC(".maps");
```

#### Packet Parsing and Validation

The `is_tcp` helper function checks whether the incoming packet is a TCP packet by verifying the Ethernet and IP headers.

```c
static bool is_tcp(struct ethhdr *eth, void *data_end)
{
    // ... (checks omitted for brevity)
}
```

#### Capturing TCP Header Information

In the `xdp_pass` function, we:

1. Parse the Ethernet, IP, and TCP headers.
2. Ensure all headers are within the packet bounds to prevent invalid memory access.
3. Reserve space in the ring buffer to store the TCP header.
4. Copy the TCP header bytes into the ring buffer.
5. Submit the data to the ring buffer for user-space consumption.

```c
// Reserve space in the ring buffer
void *ringbuf_space = bpf_ringbuf_reserve(&rb, tcp_header_bytes, 0);
if (!ringbuf_space) {
    return XDP_PASS;
}

// Copy the TCP header bytes
for (int i = 0; i < tcp_header_bytes; i++) {
    unsigned char byte = *((unsigned char *)tcp + i);
    ((unsigned char *)ringbuf_space)[i] = byte;
}

// Submit to ring buffer
bpf_ringbuf_submit(ringbuf_space, 0);
```

#### Using bpf_printk for Debugging

The `bpf_printk` function logs messages to the kernel's trace pipe, which can be invaluable for debugging.

```c
bpf_printk("Captured TCP header (%d bytes)", tcp_header_bytes);
```
## User-Space Code Analysis

Let's examine the user-space program that reads the captured TCP headers from the ring buffer and displays them.

### Full User-Space Code

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <net/if.h>
#include <arpa/inet.h> // ntohs(), ntohl()

#include <bpf/libbpf.h>
#include <bpf/bpf.h>

#include "xdp-tcpdump.skel.h" // Generated skeleton header

// Callback function to handle events from the ring buffer
static int handle_event(void *ctx, void *data, size_t data_sz)
{
    if (data_sz < 20) { // Minimum TCP header size
        fprintf(stderr, "Received incomplete TCP header\n");
        return 0;
    }

    // Parse the raw TCP header bytes
    struct tcphdr {
        uint16_t source;
        uint16_t dest;
        uint32_t seq;
        uint32_t ack_seq;
        uint16_t res1:4,
                 doff:4,
                 fin:1,
                 syn:1,
                 rst:1,
                 psh:1,
                 ack:1,
                 urg:1,
                 ece:1,
                 cwr:1;
        uint16_t window;
        uint16_t check;
        uint16_t urg_ptr;
        // Options and padding may follow
    } __attribute__((packed));

    if (data_sz < sizeof(struct tcphdr)) {
        fprintf(stderr, "Data size (%zu) less than TCP header size\n", data_sz);
        return 0;
    }

    struct tcphdr *tcp = (struct tcphdr *)data;

    // Convert fields from network byte order to host byte order
    uint16_t source_port = ntohs(tcp->source);
    uint16_t dest_port = ntohs(tcp->dest);
    uint32_t seq = ntohl(tcp->seq);
    uint32_t ack_seq = ntohl(tcp->ack_seq);
    uint16_t window = ntohs(tcp->window);

    // Extract flags
    uint8_t flags = 0;
    flags |= (tcp->fin) ? 0x01 : 0x00;
    flags |= (tcp->syn) ? 0x02 : 0x00;
    flags |= (tcp->rst) ? 0x04 : 0x00;
    flags |= (tcp->psh) ? 0x08 : 0x00;
    flags |= (tcp->ack) ? 0x10 : 0x00;
    flags |= (tcp->urg) ? 0x20 : 0x00;
    flags |= (tcp->ece) ? 0x40 : 0x00;
    flags |= (tcp->cwr) ? 0x80 : 0x00;

    printf("Captured TCP Header:\n");
    printf(" Source Port: %u\n", source_port);
    printf(" Destination Port: %u\n", dest_port);
    printf(" Sequence Number: %u\n", seq);
    printf(" Acknowledgment Number: %u\n", ack_seq);
    printf(" Data Offset: %u\n", tcp->doff);
    printf(" Flags: 0x%02x\n", flags);
    printf(" Window Size: %u\n", window);
    printf("\n");

    return 0;
}

int main(int argc, char **argv)
{
    struct xdp_tcpdump_bpf *skel;
    struct ring_buffer *rb = NULL;
    int ifindex;
    int err;

    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s <ifname>\n", argv[0]);
        return 1;
    }

    const char *ifname = argv[1];
    ifindex = if_nametoindex(ifname);
    if (ifindex == 0)
    {
        fprintf(stderr, "Invalid interface name %s\n", ifname);
        return 1;
    }

    /* Open and load BPF application */
    skel = xdp_tcpdump_bpf__open();
    if (!skel)
    {
        fprintf(stderr, "Failed to open BPF skeleton\n");
        return 1;
    }

    /* Load & verify BPF programs */
    err = xdp_tcpdump_bpf__load(skel);
    if (err)
    {
        fprintf(stderr, "Failed to load and verify BPF skeleton: %d\n", err);
        goto cleanup;
    }

    /* Attach XDP program */
    err = xdp_tcpdump_bpf__attach(skel);
    if (err)
    {
        fprintf(stderr, "Failed to attach BPF skeleton: %d\n", err);
        goto cleanup;
    }

    /* Attach the XDP program to the specified interface */
    skel->links.xdp_pass = bpf_program__attach_xdp(skel->progs.xdp_pass, ifindex);
    if (!skel->links.xdp_pass)
    {
        err = -errno;
        fprintf(stderr, "Failed to attach XDP program: %s\n", strerror(errno));
        goto cleanup;
    }

    printf("Successfully attached XDP program to interface %s\n", ifname);

    /* Set up ring buffer polling */
    rb = ring_buffer__new(bpf_map__fd(skel->maps.rb), handle_event, NULL, NULL);
    if (!rb)
    {
        fprintf(stderr, "Failed to create ring buffer\n");
        err = -1;
        goto cleanup;
    }

    printf("Started polling the ring buffer\n");

    /* Poll the ring buffer */
    while (1)
    {
        err = ring_buffer__poll(rb, -1);
        if (err == -EINTR)
            continue;
        if (err < 0)
        {
            fprintf(stderr, "Error polling ring buffer: %d\n", err);
            break;
        }
    }

cleanup:
    ring_buffer__free(rb);
    xdp_tcpdump_bpf__destroy(skel);
    return -err;
}
```
### Code Explanation

#### Handling Ring Buffer Events

The `handle_event` function processes the TCP header data received from the ring buffer.

```c
static int handle_event(void *ctx, void *data, size_t data_sz)
{
    // Validate the data size
    if (data_sz < 20) {
        fprintf(stderr, "Received incomplete TCP header\n");
        return 0;
    }

    // Parse the TCP header
    // ... (parsing code)
}
```

#### Parsing the TCP Header

We define a local `tcphdr` structure to interpret the raw bytes.

```c
struct tcphdr {
    uint16_t source;
    uint16_t dest;
    uint32_t seq;
    uint32_t ack_seq;
    // ... (other fields)
} __attribute__((packed));
```

#### Displaying the Captured Information

After parsing, we print the TCP header fields in a human-readable format.

```c
printf("Captured TCP Header:\n");
printf(" Source Port: %u\n", source_port);
printf(" Destination Port: %u\n", dest_port);
// ... (other fields)
```
#### Setting Up the eBPF Skeleton

We use the generated skeleton `xdp-tcpdump.skel.h` to load and attach the eBPF program.

```c
/* Open and load BPF application */
skel = xdp_tcpdump_bpf__open();
if (!skel) {
    fprintf(stderr, "Failed to open BPF skeleton\n");
    return 1;
}

/* Load & verify BPF programs */
err = xdp_tcpdump_bpf__load(skel);
if (err) {
    fprintf(stderr, "Failed to load and verify BPF skeleton: %d\n", err);
    goto cleanup;
}
```

#### Attaching to the Network Interface

We attach the XDP program to the specified network interface by name.

```c
/* Attach the XDP program to the specified interface */
skel->links.xdp_pass = bpf_program__attach_xdp(skel->progs.xdp_pass, ifindex);
if (!skel->links.xdp_pass) {
    err = -errno;
    fprintf(stderr, "Failed to attach XDP program: %s\n", strerror(errno));
    goto cleanup;
}
```
## Compilation and Execution Instructions

### Prerequisites

- A Linux system with a kernel that supports eBPF and XDP.
- The libbpf library installed.
- A compiler with eBPF support (such as clang).

### Building the Program

Assuming you have cloned the repository from [GitHub](https://github.com/eunomia-bpf/bpf-developer-tutorial), navigate to the `bpf-developer-tutorial/src/41-xdp-tcpdump` directory.

```bash
cd bpf-developer-tutorial/src/41-xdp-tcpdump
make
```

This command compiles both the kernel eBPF code and the user-space application.

### Running the Program

First, identify your network interface:

```bash
ifconfig
```

Sample output:

```
wlp0s20f3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.10 netmask 255.255.255.0 broadcast 192.168.1.255
        ether 00:1a:2b:3c:4d:5e txqueuelen 1000 (Ethernet)
```

Run the user-space program with the desired network interface:

```bash
sudo ./xdp-tcpdump wlp0s20f3
```

Sample output:

```
Successfully attached XDP program to interface wlp0s20f3
Started polling the ring buffer
Captured TCP Header:
 Source Port: 443
 Destination Port: 53500
 Sequence Number: 572012449
 Acknowledgment Number: 380198588
 Data Offset: 8
 Flags: 0x10
 Window Size: 16380
```

### Full Source Code and Resources

- **Source code repository:** [GitHub - bpf-developer-tutorial](https://github.com/eunomia-bpf/bpf-developer-tutorial)
- **Tutorial website:** [eunomia.dev Tutorials](https://eunomia.dev/tutorials/)

## Summary and Conclusion

In this tutorial, we explored how to capture TCP header information directly within the Linux kernel using eBPF and XDP. By analyzing the kernel eBPF code and the user-space application, we learned how to intercept packets, extract key TCP fields, and pass this data efficiently to user space using a ring buffer.

This approach offers a high-performance alternative to traditional packet-capturing methods, with minimal impact on system resources. It is a powerful technique for network monitoring, security analysis, and debugging.

If you'd like to learn more about eBPF, visit our tutorial code repository <https://github.com/eunomia-bpf/bpf-developer-tutorial> or our website <https://eunomia.dev/tutorials/>.

Happy coding!
508 src/41-xdp-tcpdump/README_en.md Normal file
@@ -0,0 +1,508 @@
# eBPF Tutorial by Example: Capturing TCP Information with XDP

Extended Berkeley Packet Filter (eBPF) is a revolutionary technology in the Linux kernel that allows developers to run sandboxed programs within the kernel space. It enables powerful networking, security, and tracing capabilities without the need to modify the kernel source code or load kernel modules. This tutorial focuses on using eBPF with the Express Data Path (XDP) to capture TCP header information directly from network packets at the earliest point of ingress.

## Capturing TCP Headers with XDP

Capturing network packets is essential for monitoring, debugging, and securing network communications. Traditional tools like `tcpdump` operate in user space and can incur significant overhead. By leveraging eBPF and XDP, we can capture TCP header information directly within the kernel, minimizing overhead and improving performance.

In this tutorial, we'll develop an XDP program that intercepts incoming TCP packets and extracts their header information. We'll store this data in a ring buffer, which a user-space program will read and display in a human-readable format.

### Why Use XDP for Packet Capturing?

XDP is a high-performance data path within the Linux kernel that allows for programmable packet processing at the lowest level of the network stack. By attaching an eBPF program to XDP, we can process packets immediately as they arrive, reducing latency and improving efficiency.

## Kernel eBPF Code Analysis

Let's dive into the kernel-space eBPF code that captures TCP header information.

### Full Kernel Code

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define ETH_P_IP 0x0800

// Define the ring buffer map
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24); // 16 MB buffer
} rb SEC(".maps");

// Helper function to check if the packet is TCP
static bool is_tcp(struct ethhdr *eth, void *data_end)
{
    // Ensure Ethernet header is within bounds
    if ((void *)(eth + 1) > data_end)
        return false;

    // Only handle IPv4 packets
    if (bpf_ntohs(eth->h_proto) != ETH_P_IP)
        return false;

    struct iphdr *ip = (struct iphdr *)(eth + 1);

    // Ensure IP header is within bounds
    if ((void *)(ip + 1) > data_end)
        return false;

    // Check if the protocol is TCP
    if (ip->protocol != IPPROTO_TCP)
        return false;

    return true;
}

SEC("xdp")
int xdp_pass(struct xdp_md *ctx)
{
    // Pointers to packet data
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // Parse Ethernet header
    struct ethhdr *eth = data;

    // Check if the packet is a TCP packet
    if (!is_tcp(eth, data_end)) {
        return XDP_PASS;
    }

    // Cast to IP header
    struct iphdr *ip = (struct iphdr *)(eth + 1);

    // Calculate IP header length
    int ip_hdr_len = ip->ihl * 4;
    if (ip_hdr_len < sizeof(struct iphdr)) {
        return XDP_PASS;
    }

    // Ensure IP header is within packet bounds
    if ((void *)ip + ip_hdr_len > data_end) {
        return XDP_PASS;
    }

    // Parse TCP header
    struct tcphdr *tcp = (struct tcphdr *)((unsigned char *)ip + ip_hdr_len);

    // Ensure TCP header is within packet bounds
    if ((void *)(tcp + 1) > data_end) {
        return XDP_PASS;
    }

    // Define the number of bytes you want to capture from the TCP header
    const int tcp_header_bytes = 32;

    // Ensure that the desired number of bytes does not exceed packet bounds
    if ((void *)tcp + tcp_header_bytes > data_end) {
        return XDP_PASS;
    }

    // Reserve space in the ring buffer
    void *ringbuf_space = bpf_ringbuf_reserve(&rb, tcp_header_bytes, 0);
    if (!ringbuf_space) {
        return XDP_PASS; // If reservation fails, skip processing
    }

    // Copy the TCP header bytes into the ring buffer
    // Using a loop to ensure compliance with eBPF verifier
    for (int i = 0; i < tcp_header_bytes; i++) {
        unsigned char byte = *((unsigned char *)tcp + i);
        ((unsigned char *)ringbuf_space)[i] = byte;
    }

    // Submit the data to the ring buffer
    bpf_ringbuf_submit(ringbuf_space, 0);

    // Optional: Print a debug message
    bpf_printk("Captured TCP header (%d bytes)", tcp_header_bytes);

    return XDP_PASS;
}

char __license[] SEC("license") = "GPL";
```
### Code Explanation

#### Defining the Ring Buffer Map

We define a ring buffer map named `rb` to pass data from the kernel to user space efficiently.

```c
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24); // 16 MB buffer
} rb SEC(".maps");
```

#### Packet Parsing and Validation

The `is_tcp` helper function checks whether the incoming packet is a TCP packet by verifying the Ethernet and IP headers.

```c
static bool is_tcp(struct ethhdr *eth, void *data_end)
{
    // ... (checks omitted for brevity)
}
```

#### Capturing TCP Header Information

In the `xdp_pass` function, we:

1. Parse the Ethernet, IP, and TCP headers.
2. Ensure all headers are within the packet bounds to prevent invalid memory access.
3. Reserve space in the ring buffer to store the TCP header.
4. Copy the TCP header bytes into the ring buffer.
5. Submit the data to the ring buffer for user-space consumption.

```c
// Reserve space in the ring buffer
void *ringbuf_space = bpf_ringbuf_reserve(&rb, tcp_header_bytes, 0);
if (!ringbuf_space) {
    return XDP_PASS;
}

// Copy the TCP header bytes
for (int i = 0; i < tcp_header_bytes; i++) {
    unsigned char byte = *((unsigned char *)tcp + i);
    ((unsigned char *)ringbuf_space)[i] = byte;
}

// Submit to ring buffer
bpf_ringbuf_submit(ringbuf_space, 0);
```

#### Using bpf_printk for Debugging

The `bpf_printk` function logs messages to the kernel's trace pipe, which can be invaluable for debugging.

```c
bpf_printk("Captured TCP header (%d bytes)", tcp_header_bytes);
```

## User-Space Code Analysis

Let's examine the user-space program that reads the captured TCP headers from the ring buffer and displays them.

### Full User-Space Code
```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <net/if.h>
#include <arpa/inet.h> // For ntohs()/ntohl()

#include <bpf/libbpf.h>
#include <bpf/bpf.h>

#include "xdp-tcpdump.skel.h" // Generated skeleton header

// Callback function to handle events from the ring buffer
static int handle_event(void *ctx, void *data, size_t data_sz)
{
    if (data_sz < 20) { // Minimum TCP header size
        fprintf(stderr, "Received incomplete TCP header\n");
        return 0;
    }

    // Parse the raw TCP header bytes
    struct tcphdr {
        uint16_t source;
        uint16_t dest;
        uint32_t seq;
        uint32_t ack_seq;
        uint16_t res1:4,
                 doff:4,
                 fin:1,
                 syn:1,
                 rst:1,
                 psh:1,
                 ack:1,
                 urg:1,
                 ece:1,
                 cwr:1;
        uint16_t window;
        uint16_t check;
        uint16_t urg_ptr;
        // Options and padding may follow
    } __attribute__((packed));

    if (data_sz < sizeof(struct tcphdr)) {
        fprintf(stderr, "Data size (%zu) less than TCP header size\n", data_sz);
        return 0;
    }

    struct tcphdr *tcp = (struct tcphdr *)data;

    // Convert fields from network byte order to host byte order
    uint16_t source_port = ntohs(tcp->source);
    uint16_t dest_port = ntohs(tcp->dest);
    uint32_t seq = ntohl(tcp->seq);
    uint32_t ack_seq = ntohl(tcp->ack_seq);
    uint16_t window = ntohs(tcp->window);

    // Extract flags
    uint8_t flags = 0;
    flags |= (tcp->fin) ? 0x01 : 0x00;
    flags |= (tcp->syn) ? 0x02 : 0x00;
    flags |= (tcp->rst) ? 0x04 : 0x00;
    flags |= (tcp->psh) ? 0x08 : 0x00;
    flags |= (tcp->ack) ? 0x10 : 0x00;
    flags |= (tcp->urg) ? 0x20 : 0x00;
    flags |= (tcp->ece) ? 0x40 : 0x00;
    flags |= (tcp->cwr) ? 0x80 : 0x00;

    printf("Captured TCP Header:\n");
    printf("  Source Port: %u\n", source_port);
    printf("  Destination Port: %u\n", dest_port);
    printf("  Sequence Number: %u\n", seq);
    printf("  Acknowledgment Number: %u\n", ack_seq);
    printf("  Data Offset: %u\n", tcp->doff);
    printf("  Flags: 0x%02x\n", flags);
    printf("  Window Size: %u\n", window);
    printf("\n");

    return 0;
}

int main(int argc, char **argv)
{
    struct xdp_tcpdump_bpf *skel;
    struct ring_buffer *rb = NULL;
    int ifindex;
    int err;

    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s <ifname>\n", argv[0]);
        return 1;
    }

    const char *ifname = argv[1];
    ifindex = if_nametoindex(ifname);
    if (ifindex == 0)
    {
        fprintf(stderr, "Invalid interface name %s\n", ifname);
        return 1;
    }

    /* Open BPF application */
    skel = xdp_tcpdump_bpf__open();
    if (!skel)
    {
        fprintf(stderr, "Failed to open BPF skeleton\n");
        return 1;
    }

    /* Load & verify BPF programs */
    err = xdp_tcpdump_bpf__load(skel);
    if (err)
    {
        fprintf(stderr, "Failed to load and verify BPF skeleton: %d\n", err);
        goto cleanup;
    }

    /* Attach auto-attachable BPF programs */
    err = xdp_tcpdump_bpf__attach(skel);
    if (err)
    {
        fprintf(stderr, "Failed to attach BPF skeleton: %d\n", err);
        goto cleanup;
    }

    /* Attach the XDP program to the specified interface */
    skel->links.xdp_pass = bpf_program__attach_xdp(skel->progs.xdp_pass, ifindex);
    if (!skel->links.xdp_pass)
    {
        err = -errno;
        fprintf(stderr, "Failed to attach XDP program: %s\n", strerror(errno));
        goto cleanup;
    }

    printf("Successfully attached XDP program to interface %s\n", ifname);

    /* Set up ring buffer polling */
    rb = ring_buffer__new(bpf_map__fd(skel->maps.rb), handle_event, NULL, NULL);
    if (!rb)
    {
        fprintf(stderr, "Failed to create ring buffer\n");
        err = -1;
        goto cleanup;
    }

    printf("Start polling ring buffer\n");

    /* Poll the ring buffer */
    while (1)
    {
        err = ring_buffer__poll(rb, -1);
        if (err == -EINTR)
            continue;
        if (err < 0)
        {
            fprintf(stderr, "Error polling ring buffer: %d\n", err);
            break;
        }
    }

cleanup:
    ring_buffer__free(rb);
    xdp_tcpdump_bpf__destroy(skel);
    return -err;
}
```
### Code Explanation

#### Handling Ring Buffer Events

The `handle_event` function processes TCP header data received from the ring buffer.

```c
static int handle_event(void *ctx, void *data, size_t data_sz)
{
    // Validate data size
    if (data_sz < 20) {
        fprintf(stderr, "Received incomplete TCP header\n");
        return 0;
    }

    // Parse the TCP header
    // ... (parsing code)
}
```
#### Parsing the TCP Header

We define a local `tcphdr` structure to interpret the raw bytes.

```c
struct tcphdr {
    uint16_t source;
    uint16_t dest;
    uint32_t seq;
    uint32_t ack_seq;
    // ... (other fields)
} __attribute__((packed));
```
#### Displaying Captured Information

After parsing, we print the TCP header fields in a readable format.

```c
printf("Captured TCP Header:\n");
printf("  Source Port: %u\n", source_port);
printf("  Destination Port: %u\n", dest_port);
// ... (other fields)
```
#### Setting Up the eBPF Skeleton

We use the generated skeleton `xdp-tcpdump.skel.h` to load and attach the eBPF program.

```c
/* Open BPF application */
skel = xdp_tcpdump_bpf__open();
if (!skel) {
    fprintf(stderr, "Failed to open BPF skeleton\n");
    return 1;
}

/* Load & verify BPF programs */
err = xdp_tcpdump_bpf__load(skel);
if (err) {
    fprintf(stderr, "Failed to load and verify BPF skeleton: %d\n", err);
    goto cleanup;
}
```
#### Attaching to the Network Interface

We attach the XDP program to the specified network interface by name.

```c
skel->links.xdp_pass = bpf_program__attach_xdp(skel->progs.xdp_pass, ifindex);
if (!skel->links.xdp_pass) {
    err = -errno;
    fprintf(stderr, "Failed to attach XDP program: %s\n", strerror(errno));
    goto cleanup;
}
```
## Compilation and Execution Instructions

### Prerequisites

- A Linux system with a kernel version that supports eBPF and XDP.
- The libbpf library installed.
- A compiler with eBPF support (clang).

### Building the Program

Assuming you have cloned the repository from [GitHub](https://github.com/eunomia-bpf/bpf-developer-tutorial), navigate to the `bpf-developer-tutorial/src/41-xdp-tcpdump` directory.

```bash
cd bpf-developer-tutorial/src/41-xdp-tcpdump
make
```
This command compiles both the kernel eBPF code and the user-space application.

### Running the Program

First, identify your network interfaces:

```bash
ifconfig
```
Sample output:

```
wlp0s20f3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255
        ether 00:1a:2b:3c:4d:5e  txqueuelen 1000  (Ethernet)
```
Run the user-space program with the desired network interface:

```bash
sudo ./xdp-tcpdump wlp0s20f3
```
Sample output:

```
Successfully attached XDP program to interface wlp0s20f3
Start polling ring buffer
Captured TCP Header:
  Source Port: 443
  Destination Port: 53500
  Sequence Number: 572012449
  Acknowledgment Number: 380198588
  Data Offset: 8
  Flags: 0x10
  Window Size: 16380
```
### Complete Source Code and Resources

- **Source Code Repository:** [GitHub - bpf-developer-tutorial](https://github.com/eunomia-bpf/bpf-developer-tutorial)
- **Tutorial Website:** [eunomia.dev Tutorials](https://eunomia.dev/tutorials/)

## Summary and Conclusion

In this tutorial, we explored how to use eBPF and XDP to capture TCP header information directly within the Linux kernel. By analyzing both the kernel eBPF code and the user-space application, we learned how to intercept packets, extract essential TCP fields, and communicate this data to user space efficiently using a ring buffer.

This approach offers a high-performance alternative to traditional packet-capturing methods, with minimal impact on system resources. It is a powerful technique for network monitoring, security analysis, and debugging.

If you would like to learn more about eBPF, visit our tutorial code repository at <https://github.com/eunomia-bpf/bpf-developer-tutorial> or our website at <https://eunomia.dev/tutorials/>.

Happy coding!
`src/41-xdp-tcpdump/xdp-tcpdump.bpf.c` (new file, 107 lines):

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define ETH_P_IP 0x0800

// Define the ring buffer map
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24); // 16 MB buffer
} rb SEC(".maps");

// Helper function to check if the packet is TCP
static bool is_tcp(struct ethhdr *eth, void *data_end)
{
    // Ensure Ethernet header is within bounds
    if ((void *)(eth + 1) > data_end)
        return false;

    // Only handle IPv4 packets
    if (bpf_ntohs(eth->h_proto) != ETH_P_IP)
        return false;

    struct iphdr *ip = (struct iphdr *)(eth + 1);

    // Ensure IP header is within bounds
    if ((void *)(ip + 1) > data_end)
        return false;

    // Check if the protocol is TCP
    if (ip->protocol != IPPROTO_TCP)
        return false;

    return true;
}

SEC("xdp")
int xdp_pass(struct xdp_md *ctx)
{
    // Pointers to packet data
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // Parse Ethernet header
    struct ethhdr *eth = data;

    // Check if the packet is a TCP packet
    if (!is_tcp(eth, data_end)) {
        return XDP_PASS;
    }

    // Cast to IP header
    struct iphdr *ip = (struct iphdr *)(eth + 1);

    // Calculate IP header length
    int ip_hdr_len = ip->ihl * 4;
    if (ip_hdr_len < sizeof(struct iphdr)) {
        return XDP_PASS;
    }

    // Ensure IP header is within packet bounds
    if ((void *)ip + ip_hdr_len > data_end) {
        return XDP_PASS;
    }

    // Parse TCP header
    struct tcphdr *tcp = (struct tcphdr *)((unsigned char *)ip + ip_hdr_len);

    // Ensure TCP header is within packet bounds
    if ((void *)(tcp + 1) > data_end) {
        return XDP_PASS;
    }

    // Define the number of bytes you want to capture from the TCP header.
    // Typically, the TCP header is 20 bytes, but with options, it can be longer.
    // Here, we'll capture the first 32 bytes to include possible options.
    const int tcp_header_bytes = 32;

    // Ensure that the desired number of bytes does not exceed packet bounds
    if ((void *)tcp + tcp_header_bytes > data_end) {
        return XDP_PASS;
    }

    // Reserve space in the ring buffer
    void *ringbuf_space = bpf_ringbuf_reserve(&rb, tcp_header_bytes, 0);
    if (!ringbuf_space) {
        return XDP_PASS; // If reservation fails, skip processing
    }

    // Copy the TCP header bytes into the ring buffer,
    // using a loop to ensure compliance with the eBPF verifier
    for (int i = 0; i < tcp_header_bytes; i++) {
        // Accessing each byte safely within bounds
        unsigned char byte = *((unsigned char *)tcp + i);
        ((unsigned char *)ringbuf_space)[i] = byte;
    }

    // Submit the data to the ring buffer
    bpf_ringbuf_submit(ringbuf_space, 0);

    // Optional: Print a debug message (will appear in kernel logs)
    bpf_printk("Captured TCP header (%d bytes)", tcp_header_bytes);

    return XDP_PASS;
}

char __license[] SEC("license") = "GPL";
```
The complete user-space program shown above is shipped as `src/41-xdp-tcpdump/xdp-tcpdump.c` (165 lines) in the repository.
`src/42-xdp-loadbalancer/README.md` (new file, 527 lines):
# eBPF Developer Tutorial: A Simple XDP Load Balancer

In this tutorial, we will guide you through implementing a simple XDP (eXpress Data Path) load balancer using eBPF (extended Berkeley Packet Filter). With just C and the libbpf library, and no external dependencies, this hands-on guide helps developers harness the power of the Linux kernel to build efficient network applications.

## Why XDP?

`XDP` (eXpress Data Path) is a fast, in-kernel networking framework in Linux that allows packets to be processed at the earliest point in the network stack, right at the network interface card (NIC). This enables ultra-low-latency, high-throughput packet handling, making XDP ideal for tasks such as load balancing, DDoS protection, and traffic filtering.

Key features of XDP:

1. **Fast packet processing**: XDP handles packets directly at the NIC level, reducing latency and improving performance by avoiding the usual network-stack overhead.
2. **Efficient**: Because packets are processed before they enter the kernel stack, XDP minimizes CPU usage and keeps the system responsive under heavy traffic loads.
3. **Customizable with eBPF**: XDP programs are written in eBPF, letting you create custom packet-processing logic for specific use cases such as dropping, redirecting, or forwarding packets.
4. **Low CPU overhead**: With support for zero-copy packet forwarding, XDP consumes fewer system resources, making it well suited to handling high traffic with minimal CPU load.
5. **Simple actions**: XDP programs return predefined actions such as drop, pass, or redirect, giving you control over how traffic is handled.

### Projects that use XDP

- `Cilium` is an open-source networking tool designed for cloud-native environments such as Kubernetes. It uses XDP for efficient packet filtering and load balancing, improving performance in high-traffic networks.
- `Katran`, developed by Facebook, is a load balancer that uses XDP to handle millions of connections with low CPU usage. It distributes traffic to servers efficiently and is used inside Facebook at large scale.
- `Cloudflare` uses XDP to defend against DDoS attacks. By filtering malicious traffic at the NIC level, Cloudflare can drop attack packets before they enter the kernel, minimizing their impact on the network.

### Why XDP over other approaches?

Compared with traditional tools such as `iptables` or `tc`, XDP offers:

- **Speed**: It operates directly in the NIC driver, processing packets far faster than traditional methods.
- **Flexibility**: With eBPF, you can write custom packet-processing logic tailored to specific needs.
- **Efficiency**: XDP uses fewer resources, making it ideal for environments that must handle high traffic without overloading the system.

## Project: Building a Simple Load Balancer

In this project, we focus on building a load balancer with XDP. A load balancer distributes incoming network traffic efficiently across multiple backend servers, preventing any single server from becoming overloaded. By combining XDP and eBPF, we can build a load balancer that runs at the edge of the Linux network stack and sustains high performance even under heavy traffic.

The load balancer we implement will:

- Listen for incoming network packets.
- Compute a hash based on the packet's source IP and port, so that traffic is distributed across multiple backend servers.
- Forward each packet to the backend server selected by the computed hash.

We keep the design simple but powerful, showing how to use eBPF's capabilities to create a lightweight load-balancing solution.

## Kernel eBPF Code
```c
// xdp_lb.bpf.c
#include <bpf/bpf_endian.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/tcp.h>
#include "xx_hash.h"

struct backend_config {
    __u32 ip;
    unsigned char mac[ETH_ALEN];
};

// Backend IP and MAC address map
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 2); // Two backends
    __type(key, __u32);
    __type(value, struct backend_config);
} backends SEC(".maps");

int client_ip = bpf_htonl(0xa000001);
unsigned char client_mac[ETH_ALEN] = {0xDE, 0xAD, 0xBE, 0xEF, 0x0, 0x1};
int load_balancer_ip = bpf_htonl(0xa00000a);
unsigned char load_balancer_mac[ETH_ALEN] = {0xDE, 0xAD, 0xBE, 0xEF, 0x0, 0x10};

static __always_inline __u16
csum_fold_helper(__u64 csum)
{
    int i;
    for (i = 0; i < 4; i++)
    {
        if (csum >> 16)
            csum = (csum & 0xffff) + (csum >> 16);
    }
    return ~csum;
}

static __always_inline __u16
iph_csum(struct iphdr *iph)
{
    iph->check = 0;
    unsigned long long csum = bpf_csum_diff(0, 0, (unsigned int *)iph, sizeof(struct iphdr), 0);
    return csum_fold_helper(csum);
}

SEC("xdp")
int xdp_load_balancer(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;

    bpf_printk("xdp_load_balancer received packet");

    // Ethernet header
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    // Check if the packet is IP (IPv4)
    if (eth->h_proto != __constant_htons(ETH_P_IP))
        return XDP_PASS;

    // IP header
    struct iphdr *iph = (struct iphdr *)(eth + 1);
    if ((void *)(iph + 1) > data_end)
        return XDP_PASS;

    // Only handle TCP traffic
    if (iph->protocol != IPPROTO_TCP)
        return XDP_PASS;

    bpf_printk("Received Source IP: 0x%x", bpf_ntohl(iph->saddr));
    bpf_printk("Received Destination IP: 0x%x", bpf_ntohl(iph->daddr));
    bpf_printk("Received Source MAC: %x:%x:%x:%x:%x:%x", eth->h_source[0], eth->h_source[1], eth->h_source[2], eth->h_source[3], eth->h_source[4], eth->h_source[5]);
    bpf_printk("Received Destination MAC: %x:%x:%x:%x:%x:%x", eth->h_dest[0], eth->h_dest[1], eth->h_dest[2], eth->h_dest[3], eth->h_dest[4], eth->h_dest[5]);

    if (iph->saddr == client_ip)
    {
        bpf_printk("Packet from client");

        __u32 key = xxhash32((const char*)iph, sizeof(struct iphdr), 0) % 2;

        struct backend_config *backend = bpf_map_lookup_elem(&backends, &key);
        if (!backend)
            return XDP_PASS;

        iph->daddr = backend->ip;
        __builtin_memcpy(eth->h_dest, backend->mac, ETH_ALEN);
    }
    else
    {
        bpf_printk("Packet from backend");
        iph->daddr = client_ip;
        __builtin_memcpy(eth->h_dest, client_mac, ETH_ALEN);
    }

    // Update IP source address to the load balancer's IP
    iph->saddr = load_balancer_ip;
    // Update Ethernet source MAC address to the load balancer's MAC
    __builtin_memcpy(eth->h_source, load_balancer_mac, ETH_ALEN);

    // Recalculate IP checksum
    iph->check = iph_csum(iph);

    bpf_printk("Redirecting packet to new IP 0x%x from IP 0x%x",
               bpf_ntohl(iph->daddr),
               bpf_ntohl(iph->saddr)
    );
    bpf_printk("New Dest MAC: %x:%x:%x:%x:%x:%x", eth->h_dest[0], eth->h_dest[1], eth->h_dest[2], eth->h_dest[3], eth->h_dest[4], eth->h_dest[5]);
    bpf_printk("New Source MAC: %x:%x:%x:%x:%x:%x\n", eth->h_source[0], eth->h_source[1], eth->h_source[2], eth->h_source[3], eth->h_source[4], eth->h_source[5]);
    // Return XDP_TX to transmit the modified packet back to the network
    return XDP_TX;
}

char _license[] SEC("license") = "GPL";
```
## Walking Through the Kernel Code

### 1. **Headers and Data Structures**

The code begins by including the necessary headers, such as `<bpf/bpf_helpers.h>`, `<linux/if_ether.h>`, and `<linux/ip.h>`. These provide the definitions for working with Ethernet frames, IP packets, and the BPF helper functions.

The `backend_config` struct stores a backend server's IP and MAC address. The load-balancing logic uses it to route packets according to the traffic-distribution rules.

```c
struct backend_config {
    __u32 ip;
    unsigned char mac[ETH_ALEN];
};
```

### 2. **Backend and Load Balancer Configuration**

The code defines an eBPF map named `backends` that stores the IP and MAC addresses of two backends. The `BPF_MAP_TYPE_ARRAY` map type holds the backend configuration, and `max_entries` is set to 2, meaning this load balancer distributes traffic across two backend servers.

```c
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 2);
    __type(key, __u32);
    __type(value, struct backend_config);
} backends SEC(".maps");
```

The client's and the load balancer's IP and MAC addresses are also predefined:

```c
int client_ip = bpf_htonl(0xa000001);
unsigned char client_mac[ETH_ALEN] = {0xDE, 0xAD, 0xBE, 0xEF, 0x0, 0x1};
int load_balancer_ip = bpf_htonl(0xa00000a);
unsigned char load_balancer_mac[ETH_ALEN] = {0xDE, 0xAD, 0xBE, 0xEF, 0x0, 0x10};
```

### 3. **Checksum Functions**

The `iph_csum()` function recalculates the IP header checksum after the packet contents are modified. Keeping the checksum valid is essential whenever any header field changes.

```c
static __always_inline __u16 iph_csum(struct iphdr *iph) {
    iph->check = 0;
    unsigned long long csum = bpf_csum_diff(0, 0, (unsigned int *)iph, sizeof(struct iphdr), 0);
    return csum_fold_helper(csum);
}
```
### 4. **XDP Program Logic**

The core logic of the XDP load balancer lives in the `xdp_load_balancer` function, which is attached to the XDP hook. It processes incoming packets and, depending on where they came from, forwards them to a backend or back to the client.

- **Initial checks**:
  The function first validates the Ethernet frame, then checks that the packet is IPv4 and carries TCP.

  ```c
  if (eth->h_proto != __constant_htons(ETH_P_IP))
      return XDP_PASS;
  if (iph->protocol != IPPROTO_TCP)
      return XDP_PASS;
  ```

- **Handling packets from the client**:
  If the source IP matches the client IP, the code hashes the IP header with `xxhash32` to pick a backend (the key is taken modulo 2).

  ```c
  if (iph->saddr == client_ip) {
      __u32 key = xxhash32((const char*)iph, sizeof(struct iphdr), 0) % 2;
      struct backend_config *backend = bpf_map_lookup_elem(&backends, &key);
  ```

  The destination IP and MAC are then replaced with the selected backend's values, and the packet is forwarded to that backend.

- **Handling packets from a backend**:
  If the packet comes from a backend server, the code sets the destination to the client's IP and MAC address, so the backend's response is forwarded back to the client.

  ```c
  iph->daddr = client_ip;
  __builtin_memcpy(eth->h_dest, client_mac, ETH_ALEN);
  ```

- **Rewriting IP and MAC addresses**:
  For all outgoing packets, the source IP and MAC are updated to the load balancer's values, so the load balancer appears as the source in both client-to-backend and backend-to-client traffic.

  ```c
  iph->saddr = load_balancer_ip;
  __builtin_memcpy(eth->h_source, load_balancer_mac, ETH_ALEN);
  ```

- **Recalculating the checksum**:
  After modifying the IP header, the checksum is recomputed with the `iph_csum()` function defined earlier.

  ```c
  iph->check = iph_csum(iph);
  ```

- **Final action**:
  The packet is sent with the `XDP_TX` action, which tells the NIC to transmit the modified packet back out.

  ```c
  return XDP_TX;
  ```
### 5. **Conclusion**

The load balancer distributes traffic by inspecting the source IP, hashing the header to choose a backend, and rewriting the destination IP and MAC so each packet is forwarded correctly. The `XDP_TX` action is what lets eBPF achieve high-speed packet processing at the XDP layer.

With this picture of how packets flow, the role each part of the code plays in balancing load across multiple backends should be clear.

## User-Space Code
```c
// xdp_lb.c
#include <arpa/inet.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include "xdp_lb.skel.h" // The generated skeleton

struct backend_config {
    __u32 ip;
    unsigned char mac[6];
};

static int parse_mac(const char *str, unsigned char *mac) {
    if (sscanf(str, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
               &mac[0], &mac[1], &mac[2], &mac[3], &mac[4], &mac[5]) != 6) {
        fprintf(stderr, "Invalid MAC address format\n");
        return -1;
    }
    return 0;
}

int main(int argc, char **argv) {
    if (argc != 6) {
        fprintf(stderr, "Usage: %s <ifname> <backend1_ip> <backend1_mac> <backend2_ip> <backend2_mac>\n", argv[0]);
        return 1;
    }

    const char *ifname = argv[1];
    struct backend_config backend[2];

    // Parse backend 1
    if (inet_pton(AF_INET, argv[2], &backend[0].ip) != 1) {
        fprintf(stderr, "Invalid backend 1 IP address\n");
        return 1;
    }
    if (parse_mac(argv[3], backend[0].mac) < 0) {
        return 1;
    }

    // Parse backend 2
    if (inet_pton(AF_INET, argv[4], &backend[1].ip) != 1) {
        fprintf(stderr, "Invalid backend 2 IP address\n");
        return 1;
    }
    if (parse_mac(argv[5], backend[1].mac) < 0) {
        return 1;
    }

    // Load and attach the BPF program
    struct xdp_lb_bpf *skel = xdp_lb_bpf__open_and_load();
    if (!skel) {
        fprintf(stderr, "Failed to open and load BPF skeleton\n");
        return 1;
    }

    int ifindex = if_nametoindex(ifname);
    if (ifindex == 0) { // if_nametoindex() returns 0 on error
        perror("if_nametoindex");
        xdp_lb_bpf__destroy(skel);
        return 1;
    }

    if (!bpf_program__attach_xdp(skel->progs.xdp_load_balancer, ifindex)) {
        fprintf(stderr, "Failed to attach XDP program\n");
        xdp_lb_bpf__destroy(skel);
        return 1;
    }

    // Update backend configurations
    for (int i = 0; i < 2; i++) {
        if (bpf_map_update_elem(bpf_map__fd(skel->maps.backends), &i, &backend[i], 0) < 0) {
            perror("bpf_map_update_elem");
            xdp_lb_bpf__destroy(skel);
            return 1;
        }
    }

    printf("XDP load balancer configured with backends:\n");
    printf("Backend 1 - IP: %s, MAC: %s\n", argv[2], argv[3]);
    printf("Backend 2 - IP: %s, MAC: %s\n", argv[4], argv[5]);

    printf("Press Ctrl+C to exit...\n");
    while (1) {
        sleep(1); // Keep the program running
    }

    // Cleanup and detach (not reached while the loop above runs)
    bpf_xdp_detach(ifindex, 0, NULL);
    xdp_lb_bpf__detach(skel);
    xdp_lb_bpf__destroy(skel);
    return 0;
}
```
### User-Space Code Overview

The user-space code is responsible for setting up and configuring the XDP load balancer program that runs in the kernel. It accepts command-line arguments, loads the eBPF program, attaches it to a network interface, and populates the backend server configuration.

### 1. **Parsing Command-Line Arguments and Setting Up Backends**

The program expects five command-line arguments: the network interface name (`ifname`) and the IP and MAC addresses of two backend servers. It parses the IP addresses with `inet_pton()` and the MAC addresses with `parse_mac()`, ensuring the provided MAC addresses are correctly formatted. The parsed backend information is stored in `backend_config` structs.
### 2. **Loading and Attaching the BPF Program**

The BPF skeleton (generated as `xdp_lb.skel.h`) is used to open and load the XDP program into the kernel. The program converts the interface name to an index with `if_nametoindex()`, then attaches the loaded BPF program to that interface with `bpf_program__attach_xdp()`.

### 3. **Configuring the Backend Servers**

The backends' IP and MAC addresses are written into the `backends` BPF map with `bpf_map_update_elem()`. This step makes the backend configuration visible to the BPF program, so it can route packets to the correct backend server according to the kernel-side logic.

### 4. **Main Loop and Cleanup**

The program enters an infinite loop (`while (1) { sleep(1); }`) to keep the XDP program running. When the user exits with Ctrl+C, the BPF program is detached from the network interface and resources are released via `xdp_lb_bpf__destroy()`.

Overall, this user-space code manages the configuration and lifecycle of the XDP load balancer, allowing the backend configuration to be updated dynamically and ensuring the load balancer is attached to the right interface.
### Test Environment Topology

The topology represents a test environment in which a local machine communicates with two backend nodes (h2 and h3) through the load balancer. Virtual Ethernet pairs (veth0 through veth6) connect the local machine to the load balancer, simulating network links in a controlled environment. Each virtual interface has its own IP and MAC address, representing a distinct entity.

```txt
        +---------------------------+
        |      Local Machine        |
        |  IP: 10.0.0.1 (veth0)     |
        |  MAC: DE:AD:BE:EF:00:01   |
        +------------+--------------+
                     |
                     | (veth1)
                     |
        +--------+---------------+
        |     Load Balancer      |
        | IP: 10.0.0.10 (veth6)  |
        | MAC: DE:AD:BE:EF:00:10 |
        +--------+---------------+
                 |
       +---------+----------------------------+
       |                                      |
    (veth2)                                (veth4)
       |                                      |
+--+---------------+               +--------+---------+
|       h2         |               |       h3         |
| IP:              |               | IP:              |
| 10.0.0.2 (veth3) |               | 10.0.0.3 (veth5) |
| MAC:             |               | MAC:             |
| DE:AD:BE:EF:00:02|               | DE:AD:BE:EF:00:03|
+------------------+               +------------------+
```
This environment can be initialized with a script (`setup.sh`) and removed with another (`teardown.sh`).

> If you find this tutorial useful, please help us create a containerized version to simplify the setup and topology! The current setup and teardown are based on network namespaces; a containerized version would be more user friendly.

Initialize the environment:

```sh
sudo ./setup.sh
```

Tear it down:

```sh
sudo ./teardown.sh
```
### Running the Load Balancer

To run the XDP load balancer, execute the following command, specifying the interface and the backends' IP and MAC addresses:

```console
sudo ip netns exec lb ./xdp_lb veth6 10.0.0.2 de:ad:be:ef:00:02 10.0.0.3 de:ad:be:ef:00:03
```

This configures the load balancer and prints the backend details:

```console
XDP load balancer configured with backends:
Backend 1 - IP: 10.0.0.2, MAC: de:ad:be:ef:00:02
Backend 2 - IP: 10.0.0.3, MAC: de:ad:be:ef:00:03
Press Ctrl+C to exit...
```
### Testing the Setup

You can test the setup by starting HTTP servers in the two backend namespaces (`h2` and `h3`) and sending requests from the local machine to the load balancer.

Start the servers on `h2` and `h3`:

```sh
sudo ip netns exec h2 python3 -m http.server
sudo ip netns exec h3 python3 -m http.server
```

Then send requests to the load balancer's IP:

```sh
curl 10.0.0.10:8000
```

The load balancer distributes the traffic between the backend servers (`h2` and `h3`) according to the hash function.
### Monitoring with `bpf_printk`

You can monitor the load balancer's activity through the `bpf_printk` logs. The BPF program prints diagnostic messages as it processes each packet. View them with:

```console
sudo cat /sys/kernel/debug/tracing/trace_pipe
```

Sample log:

```console
<idle>-0 [004] ..s2. 24174.812722: bpf_trace_printk: xdp_load_balancer received packet
<idle>-0 [004] .Ns2. 24174.812729: bpf_trace_printk: Received Source IP: 0xa000001
<idle>-0 [004] .Ns2. 24174.812729: Received Destination IP: 0xa00000a
<idle>-0 [004] .Ns2. 24174.812731: Received Source MAC: de:ad:be:ef:0:1
<idle>-0 [004] .Ns2. 24174.812732: Received Destination MAC: de:ad:be:ef:0:10
<idle>-0 [004] .Ns2. 24174.812732: Packet from client
<idle>-0 [004] .Ns2. 24174.812734: bpf_trace_printk: Redirecting packet to new IP 0xa000002 from IP 0xa00000a
<idle>-0 [004] .Ns2. 24174.812735: New Dest MAC: de:ad:be:ef:0:2
<idle>-0 [004] .Ns2. 24174.812735: New Source MAC: de:ad:be:ef:0:10
```
|
||||
|
||||
### 调试问题
|
||||
|
||||
某些系统可能会因为类似于此[博客文章](https://fedepaol.github.io/blog/2023/09/11/xdp-ate-my-packets-and-how-i-debugged-it/)中描述的问题而导致数据包丢失或转发失败。您可以使用 `bpftrace` 跟踪 XDP 错误进行调试:
|
||||
|
||||
```sh
|
||||
sudo bpftrace -e 'tracepoint:xdp:xdp_bulk_tx{@redir_errno[-args->err] = count();}'
|
||||
```
|
||||
|
||||
如果输出如下所示:
|
||||
|
||||
```sh
|
||||
@redir_errno[6]: 3
|
||||
```
|
||||
|
||||
This indicates errors related to XDP packet forwarding. On Linux, error code `6` corresponds to `ENXIO` ("No such device or address"), which usually means the driver could not transmit on the target device and can be investigated further.
|
||||
|
||||
### 结论
|
||||
|
||||
本教程展示了如何使用 eBPF 设置一个简单的 XDP 负载均衡器,以实现高效的流量分发。对于那些想了解更多关于 eBPF 知识的用户,包括更高级的示例和教程,请访问我们的 [https://github.com/eunomia-bpf/bpf-developer-tutorial](https://github.com/eunomia-bpf/bpf-developer-tutorial) 或我们的网站 [https://eunomia.dev/tutorials/](https://eunomia.dev/tutorials/)。
|
||||
|
||||
### 参考文献
|
||||
|
||||
- [XDP Programming Hands-On Tutorial](https://github.com/xdp-project/xdp-tutorial)
- [XDP Tutorial in bpf-developer-tutorial](https://eunomia.dev/tutorials/21-xdp/)
|
||||
src/42-xdp-loadbalancer/README_en.md (new file, 528 lines)
|
||||
|
||||
# eBPF Developer Tutorial: XDP Load Balancer
|
||||
|
||||
In this tutorial, we will guide you through the process of implementing a simple XDP (eXpress Data Path) load balancer using eBPF (Extended Berkeley Packet Filter). With just C, libbpf, and no external dependencies, this hands-on guide is perfect for developers interested in harnessing the full power of the Linux kernel to build highly efficient network applications.
|
||||
|
||||
## Why XDP?
|
||||
|
||||
`XDP` (eXpress Data Path) is a fast, in-kernel networking framework in Linux that allows packet processing at the earliest point in the network stack, right in the network interface card (NIC). This enables ultra-low-latency and high-throughput packet handling, making XDP ideal for tasks like load balancing, DDoS protection, and traffic filtering.
|
||||
|
||||
### Key Features of XDP
|
||||
|
||||
1. **Fast Packet Processing**: XDP handles packets directly at the NIC level, reducing latency and improving performance by avoiding the usual networking stack overhead.
|
||||
2. **Efficient**: Because it processes packets before they reach the kernel, XDP minimizes CPU usage and handles high traffic loads without slowing down the system.
|
||||
3. **Customizable with eBPF**: XDP programs are written using eBPF, allowing you to create custom packet-handling logic for specific use cases like dropping, redirecting, or forwarding packets.
|
||||
4. **Low CPU Overhead**: With support for zero-copy packet forwarding, XDP uses fewer system resources, making it perfect for handling high traffic with minimal CPU load.
|
||||
5. **Simple Actions**: XDP programs return predefined actions like dropping, passing, or redirecting packets, providing control over how traffic is handled.
|
||||
|
||||
### Projects That Use XDP
|
||||
|
||||
- `Cilium` is an open-source networking tool for cloud-native environments like Kubernetes. It uses XDP to efficiently handle packet filtering and load balancing, improving performance in high-traffic networks.
|
||||
- `Katran`, developed by Facebook, is a load balancer that uses XDP to handle millions of connections with low CPU usage. It distributes traffic efficiently across servers and is used internally at Facebook for large-scale networking.
|
||||
- `Cloudflare` uses XDP to protect against DDoS attacks. By filtering out malicious traffic at the NIC level, Cloudflare can drop attack packets before they even reach the kernel, minimizing the impact on their network.
|
||||
|
||||
### Why Choose XDP Over Other Methods?
|
||||
|
||||
Compared to traditional tools like `iptables` or `tc`, XDP offers:
|
||||
|
||||
- **Speed**: It operates directly in the NIC driver, processing packets much faster than traditional methods.
|
||||
- **Flexibility**: With eBPF, you can write custom packet-handling logic to meet specific needs.
|
||||
- **Efficiency**: XDP uses fewer resources, making it suitable for environments that need to handle high traffic without overloading the system.
|
||||
|
||||
## The Project: Building a Simple Load Balancer
|
||||
|
||||
In this project, we will be focusing on building a load balancer using XDP. A load balancer efficiently distributes incoming network traffic across multiple backend servers to prevent any single server from becoming overwhelmed. With the combination of XDP and eBPF, we can build a load balancer that operates at the edge of the Linux networking stack, ensuring high performance even under heavy traffic conditions.
|
||||
|
||||
The load balancer we’ll be implementing will:
|
||||
|
||||
- Listen for incoming network packets.
|
||||
- Calculate a hash based on the packet's source IP and port, allowing us to distribute the traffic across multiple backend servers.
|
||||
- Forward the packet to the appropriate backend server based on the calculated hash.
|
||||
|
||||
We'll keep the design simple but powerful, showing you how to leverage eBPF’s capabilities to create a lightweight load balancing solution.
|
||||
|
||||
## Kernel eBPF code
|
||||
|
||||
```c
|
||||
// xdp_lb.bpf.c
|
||||
#include <bpf/bpf_endian.h>
|
||||
#include <linux/bpf.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <linux/if_ether.h>
|
||||
#include <linux/ip.h>
|
||||
#include <linux/in.h>
|
||||
#include <linux/tcp.h>
|
||||
#include "xx_hash.h"
|
||||
|
||||
struct backend_config {
|
||||
__u32 ip;
|
||||
unsigned char mac[ETH_ALEN];
|
||||
};
|
||||
|
||||
// Backend IP and MAC address map
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_ARRAY);
|
||||
__uint(max_entries, 2); // Two backends
|
||||
__type(key, __u32);
|
||||
__type(value, struct backend_config);
|
||||
} backends SEC(".maps");
|
||||
|
||||
int client_ip = bpf_htonl(0xa000001);
|
||||
unsigned char client_mac[ETH_ALEN] = {0xDE, 0xAD, 0xBE, 0xEF, 0x0, 0x1};
|
||||
int load_balancer_ip = bpf_htonl(0xa00000a);
|
||||
unsigned char load_balancer_mac[ETH_ALEN] = {0xDE, 0xAD, 0xBE, 0xEF, 0x0, 0x10};
|
||||
|
||||
static __always_inline __u16
|
||||
csum_fold_helper(__u64 csum)
|
||||
{
|
||||
int i;
|
||||
for (i = 0; i < 4; i++)
|
||||
{
|
||||
if (csum >> 16)
|
||||
csum = (csum & 0xffff) + (csum >> 16);
|
||||
}
|
||||
return ~csum;
|
||||
}
|
||||
|
||||
static __always_inline __u16
|
||||
iph_csum(struct iphdr *iph)
|
||||
{
|
||||
iph->check = 0;
|
||||
unsigned long long csum = bpf_csum_diff(0, 0, (unsigned int *)iph, sizeof(struct iphdr), 0);
|
||||
return csum_fold_helper(csum);
|
||||
}
|
||||
|
||||
SEC("xdp")
|
||||
int xdp_load_balancer(struct xdp_md *ctx) {
|
||||
void *data_end = (void *)(long)ctx->data_end;
|
||||
void *data = (void *)(long)ctx->data;
|
||||
|
||||
bpf_printk("xdp_load_balancer received packet");
|
||||
|
||||
// Ethernet header
|
||||
struct ethhdr *eth = data;
|
||||
if ((void *)(eth + 1) > data_end)
|
||||
return XDP_PASS;
|
||||
|
||||
// Check if the packet is IP (IPv4)
|
||||
if (eth->h_proto != __constant_htons(ETH_P_IP))
|
||||
return XDP_PASS;
|
||||
|
||||
// IP header
|
||||
struct iphdr *iph = (struct iphdr *)(eth + 1);
|
||||
if ((void *)(iph + 1) > data_end)
|
||||
return XDP_PASS;
|
||||
|
||||
// Only process TCP packets
|
||||
if (iph->protocol != IPPROTO_TCP)
|
||||
return XDP_PASS;
|
||||
|
||||
bpf_printk("Received Source IP: 0x%x", bpf_ntohl(iph->saddr));
|
||||
bpf_printk("Received Destination IP: 0x%x", bpf_ntohl(iph->daddr));
|
||||
bpf_printk("Received Source MAC: %x:%x:%x:%x:%x:%x", eth->h_source[0], eth->h_source[1], eth->h_source[2], eth->h_source[3], eth->h_source[4], eth->h_source[5]);
|
||||
bpf_printk("Received Destination MAC: %x:%x:%x:%x:%x:%x", eth->h_dest[0], eth->h_dest[1], eth->h_dest[2], eth->h_dest[3], eth->h_dest[4], eth->h_dest[5]);
|
||||
|
||||
if (iph->saddr == client_ip)
|
||||
{
|
||||
bpf_printk("Packet from client");
|
||||
|
||||
__u32 key = xxhash32((const char*)iph, sizeof(struct iphdr), 0) % 2;
|
||||
|
||||
struct backend_config *backend = bpf_map_lookup_elem(&backends, &key);
|
||||
if (!backend)
|
||||
return XDP_PASS;
|
||||
|
||||
iph->daddr = backend->ip;
|
||||
__builtin_memcpy(eth->h_dest, backend->mac, ETH_ALEN);
|
||||
}
|
||||
else
|
||||
{
|
||||
bpf_printk("Packet from backend");
|
||||
iph->daddr = client_ip;
|
||||
__builtin_memcpy(eth->h_dest, client_mac, ETH_ALEN);
|
||||
}
|
||||
|
||||
// Update IP source address to the load balancer's IP
|
||||
iph->saddr = load_balancer_ip;
|
||||
// Update Ethernet source MAC address to the current lb's MAC
|
||||
__builtin_memcpy(eth->h_source, load_balancer_mac, ETH_ALEN);
|
||||
|
||||
// Recalculate IP checksum
|
||||
iph->check = iph_csum(iph);
|
||||
|
||||
bpf_printk("Redirecting packet to new IP 0x%x from IP 0x%x",
|
||||
bpf_ntohl(iph->daddr),
|
||||
bpf_ntohl(iph->saddr)
|
||||
);
|
||||
bpf_printk("New Dest MAC: %x:%x:%x:%x:%x:%x", eth->h_dest[0], eth->h_dest[1], eth->h_dest[2], eth->h_dest[3], eth->h_dest[4], eth->h_dest[5]);
|
||||
bpf_printk("New Source MAC: %x:%x:%x:%x:%x:%x\n", eth->h_source[0], eth->h_source[1], eth->h_source[2], eth->h_source[3], eth->h_source[4], eth->h_source[5]);
|
||||
// Return XDP_TX to transmit the modified packet back to the network
|
||||
return XDP_TX;
|
||||
}
|
||||
|
||||
char _license[] SEC("license") = "GPL";
|
||||
```
|
||||
|
||||
Here’s a breakdown of the key sections of the kernel code:
|
||||
|
||||
### 1. **Header Files and Data Structures**
|
||||
|
||||
The code begins with necessary header files like `<bpf/bpf_helpers.h>`, `<linux/if_ether.h>`, `<linux/ip.h>`, and more. These headers provide definitions for handling Ethernet frames, IP packets, and BPF helper functions.
|
||||
|
||||
The `backend_config` struct is defined to hold the IP and MAC address of backend servers. This will later be used for routing packets based on load balancing logic.
|
||||
|
||||
```c
|
||||
struct backend_config {
|
||||
__u32 ip;
|
||||
unsigned char mac[ETH_ALEN];
|
||||
};
|
||||
```
|
||||
|
||||
### 2. **Backend and Load Balancer Configuration**
|
||||
|
||||
The code defines an eBPF map named `backends` that stores IP and MAC addresses for two backends. The `BPF_MAP_TYPE_ARRAY` type is used to store backend configuration, with `max_entries` set to 2, indicating the load balancer will route to two backend servers.
|
||||
|
||||
```c
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_ARRAY);
|
||||
__uint(max_entries, 2);
|
||||
__type(key, __u32);
|
||||
__type(value, struct backend_config);
|
||||
} backends SEC(".maps");
|
||||
```
|
||||
|
||||
There are also predefined IP addresses and MAC addresses for the client and load balancer:
|
||||
|
||||
```c
|
||||
int client_ip = bpf_htonl(0xa000001);
|
||||
unsigned char client_mac[ETH_ALEN] = {0xDE, 0xAD, 0xBE, 0xEF, 0x0, 0x1};
|
||||
int load_balancer_ip = bpf_htonl(0xa00000a);
|
||||
unsigned char load_balancer_mac[ETH_ALEN] = {0xDE, 0xAD, 0xBE, 0xEF, 0x0, 0x10};
|
||||
```
|
||||
|
||||
### 3. **Checksum Functions**
|
||||
|
||||
The function `iph_csum()` recalculates the IP header checksum after modifying the packet's contents. It's essential to keep the integrity of IP packets when any modification is done to the headers.
|
||||
|
||||
```c
|
||||
static __always_inline __u16 iph_csum(struct iphdr *iph) {
|
||||
iph->check = 0;
|
||||
unsigned long long csum = bpf_csum_diff(0, 0, (unsigned int *)iph, sizeof(struct iphdr), 0);
|
||||
return csum_fold_helper(csum);
|
||||
}
|
||||
```
|
||||
|
||||
### 4. **XDP Program Logic**
|
||||
|
||||
The core of the XDP load balancer logic is implemented in the `xdp_load_balancer` function, which is attached to the XDP hook. It processes incoming packets and directs them either to a backend or back to the client.
|
||||
|
||||
- **Initial Checks**:
|
||||
The function begins by verifying that the packet is an Ethernet frame, then checks if it's an IP packet (IPv4) and if it's using the TCP protocol.
|
||||
|
||||
```c
|
||||
if (eth->h_proto != __constant_htons(ETH_P_IP))
|
||||
return XDP_PASS;
|
||||
if (iph->protocol != IPPROTO_TCP)
|
||||
return XDP_PASS;
|
||||
```
|
||||
|
||||
- **Client Packet Handling**:
|
||||
If the source IP matches the client IP, the code hashes the IP header using `xxhash32` to determine the appropriate backend (based on the key modulo 2).
|
||||
|
||||
```c
|
||||
if (iph->saddr == client_ip) {
|
||||
__u32 key = xxhash32((const char*)iph, sizeof(struct iphdr), 0) % 2;
|
||||
struct backend_config *backend = bpf_map_lookup_elem(&backends, &key);
|
||||
```
|
||||
|
||||
The destination IP and MAC are replaced with those of the selected backend, and the packet is forwarded to the backend.
|
||||
|
||||
- **Backend Packet Handling**:
|
||||
If the packet is from a backend server, the destination is set to the client’s IP and MAC address, ensuring that the backend’s response is directed back to the client.
|
||||
|
||||
```c
|
||||
iph->daddr = client_ip;
|
||||
__builtin_memcpy(eth->h_dest, client_mac, ETH_ALEN);
|
||||
```
|
||||
|
||||
- **Rewriting IP and MAC Addresses**:
|
||||
The source IP and MAC are updated to the load balancer’s values for all outgoing packets, ensuring that the load balancer appears as the source for both client-to-backend and backend-to-client communication.
|
||||
|
||||
```c
|
||||
iph->saddr = load_balancer_ip;
|
||||
__builtin_memcpy(eth->h_source, load_balancer_mac, ETH_ALEN);
|
||||
```
|
||||
|
||||
- **Recalculate Checksum**:
|
||||
After modifying the IP header, the checksum is recalculated using the previously defined `iph_csum()` function.
|
||||
|
||||
```c
|
||||
iph->check = iph_csum(iph);
|
||||
```
|
||||
|
||||
- **Final Action**:
|
||||
The packet is transmitted using the `XDP_TX` action, which instructs the NIC to send the modified packet.
|
||||
|
||||
```c
|
||||
return XDP_TX;
|
||||
```
|
||||
|
||||
### 5. **Conclusion**
|
||||
|
||||
In short, the load balancer routes traffic between the client and the two backend servers by inspecting the source IP, hashing the IP header for load distribution, and rewriting the destination IP and MAC before forwarding the packet. The `XDP_TX` action is what enables the high-speed packet handling provided by eBPF at the XDP layer.
|
||||
|
||||
## Userspace code
|
||||
|
||||
```c
|
||||
// xdp_lb.c
|
||||
#include <arpa/inet.h>
|
||||
#include <bpf/bpf.h>
|
||||
#include <bpf/libbpf.h>
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <unistd.h>
|
||||
#include <net/if.h>
|
||||
#include "xdp_lb.skel.h" // The generated skeleton
|
||||
|
||||
struct backend_config {
|
||||
__u32 ip;
|
||||
unsigned char mac[6];
|
||||
};
|
||||
|
||||
static int parse_mac(const char *str, unsigned char *mac) {
|
||||
if (sscanf(str, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
|
||||
&mac[0], &mac[1], &mac[2], &mac[3], &mac[4], &mac[5]) != 6) {
|
||||
fprintf(stderr, "Invalid MAC address format\n");
|
||||
return -1;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
int main(int argc, char **argv) {
|
||||
if (argc != 6) {
|
||||
fprintf(stderr, "Usage: %s <ifname> <backend1_ip> <backend1_mac> <backend2_ip> <backend2_mac>\n", argv[0]);
|
||||
return 1;
|
||||
}
|
||||
|
||||
const char *ifname = argv[1];
|
||||
struct backend_config backend[2];
|
||||
|
||||
// Parse backend 1
|
||||
if (inet_pton(AF_INET, argv[2], &backend[0].ip) != 1) {
|
||||
fprintf(stderr, "Invalid backend 1 IP address\n");
|
||||
return 1;
|
||||
}
|
||||
if (parse_mac(argv[3], backend[0].mac) < 0) {
|
||||
return 1;
|
||||
}
|
||||
|
||||
// Parse backend 2
|
||||
if (inet_pton(AF_INET, argv[4], &backend[1].ip) != 1) {
|
||||
fprintf(stderr, "Invalid backend 2 IP address\n");
|
||||
return 1;
|
||||
}
|
||||
if (parse_mac(argv[5], backend[1].mac) < 0) {
|
||||
return 1;
|
||||
}
|
||||
|
||||
// Load and attach the BPF program
|
||||
struct xdp_lb_bpf *skel = xdp_lb_bpf__open_and_load();
|
||||
if (!skel) {
|
||||
fprintf(stderr, "Failed to open and load BPF skeleton\n");
|
||||
return 1;
|
||||
}
|
||||
|
||||
int ifindex = if_nametoindex(ifname);
|
||||
if (ifindex == 0) {
|
||||
perror("if_nametoindex");
|
||||
xdp_lb_bpf__destroy(skel);
|
||||
return 1;
|
||||
}
|
||||
|
||||
if (bpf_program__attach_xdp(skel->progs.xdp_load_balancer, ifindex) < 0) {
|
||||
fprintf(stderr, "Failed to attach XDP program\n");
|
||||
xdp_lb_bpf__destroy(skel);
|
||||
return 1;
|
||||
}
|
||||
|
||||
// Update backend configurations
|
||||
for (int i = 0; i < 2; i++) {
|
||||
if (bpf_map_update_elem(bpf_map__fd(skel->maps.backends), &i, &backend[i], 0) < 0) {
|
||||
perror("bpf_map_update_elem");
|
||||
xdp_lb_bpf__destroy(skel);
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
|
||||
printf("XDP load balancer configured with backends:\n");
|
||||
printf("Backend 1 - IP: %s, MAC: %s\n", argv[2], argv[3]);
|
||||
printf("Backend 2 - IP: %s, MAC: %s\n", argv[4], argv[5]);
|
||||
|
||||
printf("Press Ctrl+C to exit...\n");
|
||||
while (1) {
|
||||
sleep(1); // Keep the program running; without a signal handler, the cleanup below is never reached
|
||||
}
|
||||
|
||||
// Cleanup and detach
|
||||
bpf_xdp_detach(ifindex, 0, NULL);
|
||||
xdp_lb_bpf__detach(skel);
|
||||
xdp_lb_bpf__destroy(skel);
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
|
||||
The userspace code provided is responsible for setting up and configuring the XDP load balancer program that runs in the kernel. It accepts command-line arguments, loads the eBPF program, attaches it to a network interface, and updates the backend configurations.
|
||||
|
||||
### 1. **Argument Parsing and Backend Setup**
|
||||
|
||||
The program expects five command-line arguments: the name of the network interface (`ifname`), the IP addresses and MAC addresses of two backend servers. It then parses the IP addresses using `inet_pton()` and the MAC addresses using the `parse_mac()` function, which ensures that the format of the provided MAC addresses is correct. The parsed backend information is stored in a `backend_config` structure.
|
||||
|
||||
### 2. **Loading and Attaching the BPF Program**
|
||||
|
||||
The BPF skeleton (generated via `xdp_lb.skel.h`) is used to open and load the XDP program into the kernel. The program then identifies the network interface by converting the interface name into an index using `if_nametoindex()`. Afterward, it attaches the loaded BPF program to this interface using `bpf_program__attach_xdp()`.
|
||||
|
||||
### 3. **Configuring Backend Information**
|
||||
|
||||
The backend IP and MAC addresses are written to the `backends` BPF map using `bpf_map_update_elem()`. This step ensures that the BPF program has access to the backend configurations, allowing it to route packets to the correct backend servers based on the logic in the kernel code.
|
||||
|
||||
### 4. **Program Loop and Cleanup**
|
||||
|
||||
The program enters an infinite loop (`while (1) { sleep(1); }`) to keep running, allowing the XDP program to continue functioning. Note that, as written, pressing Ctrl+C terminates the process before the cleanup code after the loop can run; the kernel still releases the program when the process exits, but explicit cleanup via `bpf_xdp_detach()` and `xdp_lb_bpf__destroy()` would require a SIGINT handler that breaks out of the loop.
|
||||
|
||||
In summary, this userspace code is responsible for configuring and managing the lifecycle of the XDP load balancer, making it easy to update backend configurations dynamically and ensuring the load balancer is correctly attached to a network interface.
|
||||
|
||||
## The topology of test environment
|
||||
|
||||
The topology represents a test environment where a local machine communicates with two backend nodes (h2 and h3) through a load balancer. The local machine is connected to the load balancer via virtual Ethernet pairs (veth0 to veth6), simulating network connections in a controlled environment. Each virtual interface has its own IP and MAC address to represent different entities.
|
||||
|
||||
```txt
|
||||
+---------------------------+
|
||||
| Local Machine |
|
||||
| IP: 10.0.0.1 (veth0) |
|
||||
| MAC: DE:AD:BE:EF:00:01 |
|
||||
+------------+---------------+
|
||||
|
|
||||
| (veth1)
|
||||
|
|
||||
+--------+---------------+
|
||||
| Load Balancer |
|
||||
| IP: 10.0.0.10 (veth6) |
|
||||
| MAC: DE:AD:BE:EF:00:10|
|
||||
+--------+---------------+
|
||||
|
|
||||
+---------+----------------------------+
|
||||
| |
|
||||
(veth2) (veth4)
|
||||
| |
|
||||
+--+---------------+ +--------+---------+
|
||||
| h2 | | h3 |
|
||||
| IP: | | IP: |
|
||||
|10.0.0.2 (veth3) | |10.0.0.3 (veth5) |
|
||||
| MAC: | | MAC: |
|
||||
|DE:AD:BE:EF:00:02 | |DE:AD:BE:EF:00:03 |
|
||||
+------------------+ +------------------+
|
||||
```
|
||||
|
||||
The setup can be easily initialized with a script (`setup.sh`) and removed with a teardown script (`teardown.sh`).
|
||||
|
||||
> If you are interested in this tutorial, please help us create a containerized version of the setup and topology! The current setup and teardown scripts are based on network namespaces; a containerized version would be more user-friendly.
|
||||
|
||||
Setup:
|
||||
|
||||
```sh
|
||||
sudo ./setup.sh
|
||||
```
|
||||
|
||||
Teardown:
|
||||
|
||||
```sh
|
||||
sudo ./teardown.sh
|
||||
```
|
||||
|
||||
### Running the Load Balancer
|
||||
|
||||
To run the XDP load balancer, execute the following command, specifying the interface and backends' IP and MAC addresses:
|
||||
|
||||
```console
|
||||
sudo ip netns exec lb ./xdp_lb veth6 10.0.0.2 de:ad:be:ef:00:02 10.0.0.3 de:ad:be:ef:00:03
|
||||
```
|
||||
|
||||
This will configure the load balancer and print the details of the backends:
|
||||
|
||||
```console
|
||||
XDP load balancer configured with backends:
|
||||
Backend 1 - IP: 10.0.0.2, MAC: de:ad:be:ef:00:02
|
||||
Backend 2 - IP: 10.0.0.3, MAC: de:ad:be:ef:00:03
|
||||
Press Ctrl+C to exit...
|
||||
```
|
||||
|
||||
### Testing the Setup
|
||||
|
||||
You can test the setup by starting HTTP servers on the two backend namespaces (`h2` and `h3`) and sending requests from the local machine to the load balancer:
|
||||
|
||||
Start servers on `h2` and `h3`:
|
||||
|
||||
```sh
|
||||
sudo ip netns exec h2 python3 -m http.server
|
||||
sudo ip netns exec h3 python3 -m http.server
|
||||
```
|
||||
|
||||
Then, send a request to the load balancer IP:
|
||||
|
||||
```sh
|
||||
curl 10.0.0.10:8000
|
||||
```
|
||||
|
||||
The load balancer will distribute traffic to the backends (`h2` and `h3`) based on the hashing function.
|
||||
|
||||
### Monitoring with `bpf_printk`
|
||||
|
||||
You can monitor the load balancer's activity by checking the `bpf_printk` logs. The BPF program prints diagnostic messages whenever a packet is processed. You can view these logs using:
|
||||
|
||||
```console
|
||||
sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
```
|
||||
|
||||
Example output:
|
||||
|
||||
```console
|
||||
<idle>-0 [004] ..s2. 24174.812722: bpf_trace_printk: xdp_load_balancer received packet
|
||||
<idle>-0 [004] .Ns2. 24174.812729: bpf_trace_printk: Received Source IP: 0xa000001
|
||||
<idle>-0 [004] .Ns2. 24174.812729: bpf_trace_printk: Received Destination IP: 0xa00000a
|
||||
<idle>-0 [004] .Ns2. 24174.812731: bpf_trace_printk: Received Source MAC: de:ad:be:ef:0:1
|
||||
<idle>-0 [004] .Ns2. 24174.812732: bpf_trace_printk: Received Destination MAC: de:ad:be:ef:0:10
|
||||
<idle>-0 [004] .Ns2. 24174.812732: bpf_trace_printk: Packet from client
|
||||
<idle>-0 [004] .Ns2. 24174.812734: bpf_trace_printk: Redirecting packet to new IP 0xa000002 from IP 0xa00000a
|
||||
<idle>-0 [004] .Ns2. 24174.812735: bpf_trace_printk: New Dest MAC: de:ad:be:ef:0:2
|
||||
<idle>-0 [004] .Ns2. 24174.812735: bpf_trace_printk: New Source MAC: de:ad:be:ef:0:10
|
||||
```
|
||||
|
||||
### Debugging Issues
|
||||
|
||||
Some systems may experience packet loss or failure to forward packets due to issues similar to those described in this [blog post](https://fedepaol.github.io/blog/2023/09/11/xdp-ate-my-packets-and-how-i-debugged-it/). You can debug these issues using `bpftrace` to trace XDP errors:
|
||||
|
||||
```sh
|
||||
sudo bpftrace -e 'tracepoint:xdp:xdp_bulk_tx{@redir_errno[-args->err] = count();}'
|
||||
```
|
||||
|
||||
If you see an output like this:
|
||||
|
||||
```sh
|
||||
@redir_errno[6]: 3
|
||||
```
|
||||
|
||||
It indicates errors related to XDP packet forwarding. On Linux, error code `6` corresponds to `ENXIO` ("No such device or address"), which usually means the driver could not transmit on the target device and can be investigated further.
|
||||
|
||||
### Conclusion
|
||||
|
||||
This tutorial demonstrates how to set up a simple XDP load balancer using eBPF, providing efficient traffic distribution across backend servers. For those interested in learning more about eBPF, including more advanced examples and tutorials, please visit our [https://github.com/eunomia-bpf/bpf-developer-tutorial](https://github.com/eunomia-bpf/bpf-developer-tutorial) or our website [https://eunomia.dev/tutorials/](https://eunomia.dev/tutorials/).
|
||||
|
||||
### References
|
||||
|
||||
Here’s a simple list of XDP references:
|
||||
|
||||
1. [XDP Programming Hands-On Tutorial](https://github.com/xdp-project/xdp-tutorial)
|
||||
2. [XDP Tutorial in bpf-developer-tutorial](https://eunomia.dev/tutorials/21-xdp/)
|
||||
src/42-xdp-loadbalancer/connect.md (new file, 70 lines)
|
||||
# Network setup for bpf-developer-tutorial
|
||||
|
||||
In this tutorial, we will set up a simple network topology that simulates a load balancer using eBPF/XDP (Express Data Path). The setup includes a local machine, a load balancer (which can be enhanced with an XDP program), and two backend servers (`h2` and `h3`). The local machine routes packets to the load balancer, which then distributes traffic between the backend servers.
|
||||
|
||||
|
||||
|
||||
## Network Topology
|
||||
|
||||
```txt
|
||||
+------------------+
|
||||
| Local Machine |
|
||||
| IP: 10.0.0.1 |
|
||||
+--------+---------+
|
||||
|
|
||||
+--------+---------+
|
||||
| Load Balancer |
|
||||
| IP: 10.0.0.10 |
|
||||
+--------+---------+
|
||||
|
|
||||
+-------+-------+
|
||||
| |
|
||||
+---+---+ +---+---+
|
||||
| h2 | | h3 |
|
||||
|10.0.0.2| |10.0.0.3|
|
||||
+-------+ +-------+
|
||||
```
|
||||
|
||||
- **Local Machine**: Simulates a client (`10.0.0.1`) sending traffic.
|
||||
- **Load Balancer**: Distributes traffic to backend servers (`10.0.0.10`).
|
||||
- **h2** and **h3**: Simulate backend servers (`10.0.0.2` and `10.0.0.3`).
|
||||
|
||||
### Setup Steps
|
||||
|
||||
This script creates virtual network namespaces and sets up IP addresses for the local machine, load balancer, and backend servers.
|
||||
|
||||
```bash
|
||||
sudo ./setup.sh
|
||||
```
|
||||
|
||||
To clean up the setup after testing:
|
||||
|
||||
```bash
|
||||
sudo ./teardown.sh
|
||||
```
|
||||
|
||||
### Testing the Network
|
||||
|
||||
You can test the network connectivity using `ping` commands:
|
||||
|
||||
Ping Between Backend Servers (`h2` to `h3`)
|
||||
|
||||
```bash
|
||||
sudo ip netns exec h2 ping -c 3 10.0.0.3
|
||||
```
|
||||
|
||||
Ping from Backend Server (`h2`) to Load Balancer
|
||||
|
||||
```bash
|
||||
sudo ip netns exec h2 ping -c 3 10.0.0.10
|
||||
```
|
||||
|
||||
Ping from Local Machine to Load Balancer
|
||||
|
||||
```bash
|
||||
ping -c 3 10.0.0.10
|
||||
```
|
||||
|
||||
That's it! This simple setup lets you simulate a load balancer using eBPF/XDP. You can extend it by adding custom XDP programs to control the traffic distribution between `h2` and `h3`.
|
||||
src/42-xdp-loadbalancer/no-docker/xdp_pass.c (new file, 14 lines)
|
||||
#include "vmlinux.h"
|
||||
#include <bpf/bpf_helpers.h>
|
||||
|
||||
SEC("xdp")
|
||||
int xdp_pass(struct xdp_md* ctx) {
|
||||
void* data = (void*)(long)ctx->data;
|
||||
void* data_end = (void*)(long)ctx->data_end;
|
||||
int pkt_sz = data_end - data;
|
||||
|
||||
bpf_printk("packet size is %d", pkt_sz);
|
||||
return XDP_PASS;
|
||||
}
|
||||
|
||||
char __license[] SEC("license") = "GPL";
|
||||
src/42-xdp-loadbalancer/no-docker/xdp_pass.o (binary file, not shown)
src/42-xdp-loadbalancer/setup.sh (new executable file, 159 lines)
|
||||
#!/bin/bash
|
||||
|
||||
set -xe
|
||||
|
||||
part_mac="DE:AD:BE:EF:00:"
|
||||
|
||||
create_bridge () {
|
||||
if ! ip link show $1 &> /dev/null; then
|
||||
ip link add name $1 type bridge
|
||||
ip link set dev $1 up
|
||||
else
|
||||
echo "Bridge $1 already exists."
|
||||
fi
|
||||
}
|
||||
|
||||
create_pair () {
|
||||
if ! ip link show $1 &> /dev/null; then
|
||||
ip link add name $1 type veth peer name $2
|
||||
ip link set $1 address "$part_mac""$5"
|
||||
ip addr add $3 brd + dev $1
|
||||
ip link set $2 master $4
|
||||
ip link set dev $1 up
|
||||
ip link set dev $2 up
|
||||
else
|
||||
echo "Veth pair $1 <--> $2 already exists."
|
||||
fi
|
||||
}
|
||||
|
||||
create_pair_ns () {
|
||||
if ! ip link show $2 &> /dev/null; then
|
||||
ip link add name $1 type veth peer name $2
|
||||
ip link set $2 master $4
|
||||
ip link set dev $2 up
|
||||
|
||||
ip netns add $5
|
||||
ip link set $1 netns $5
|
||||
ip netns exec $5 ip addr add $3 brd + dev $1
|
||||
ip netns exec $5 ip link set $1 address "$part_mac""$6"
|
||||
ip netns exec $5 ip link set dev $1 up
|
||||
ip netns exec $5 ip link set lo up # Bring up loopback interface
|
||||
else
|
||||
echo "Veth pair $1 <--> $2 already exists in namespace $5."
|
||||
fi
|
||||
}
|
||||
|
||||
# Create bridge br0
|
||||
create_bridge br0
|
||||
|
||||
# Create veth pairs and assign IPs
|
||||
create_pair veth0 veth1 "10.0.0.1/24" br0 01
|
||||
|
||||
# Create veth pairs in namespaces h2, h3, and lb
|
||||
create_pair_ns veth2 veth3 "10.0.0.2/24" br0 h2 02
|
||||
create_pair_ns veth4 veth5 "10.0.0.3/24" br0 h3 03
|
||||
|
||||
# Create the lb namespace
|
||||
create_pair_ns veth6 veth7 "10.0.0.10/24" br0 lb 10
|
||||
|
||||
# Enable IP forwarding on the host
|
||||
sudo sysctl -w net.ipv4.ip_forward=1
|
||||
|
||||
# Set the FORWARD chain policy to ACCEPT in iptables to ensure packets are forwarded
|
||||
sudo iptables -P FORWARD ACCEPT
|
||||
|
||||
# Optionally, attach a pass-through XDP program inside the namespaces, e.g.:
# sudo ip netns exec h2 ip link set dev veth2 xdp obj xdp_pass.o sec xdp
# sudo ip netns exec h3 ip link set dev veth4 xdp obj xdp_pass.o sec xdp
|
||||
|
||||
# Helper function for error exit on ping failure
|
||||
function ping_or_fail() {
|
||||
if ! sudo ip netns exec $1 ping -c 3 $2; then
|
||||
echo "Ping from $1 to $2 failed!"
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
# Ping test with failure checks
|
||||
function check_connectivity() {
|
||||
echo "Testing connectivity between namespaces and Load Balancer..."
|
||||
|
||||
# Ping from h2 to h3 and h3 to h2
|
||||
ping_or_fail h2 10.0.0.3
|
||||
ping_or_fail h3 10.0.0.2
|
||||
|
||||
# Ping from h2 to Load Balancer and h3 to Load Balancer
|
||||
ping_or_fail h2 10.0.0.10
|
||||
ping_or_fail h3 10.0.0.10
|
||||
|
||||
# Ping from Load Balancer to h2 and h3
|
||||
ping_or_fail lb 10.0.0.2
|
||||
ping_or_fail lb 10.0.0.3
|
||||
|
||||
# Ping from Local Machine to Load Balancer
|
||||
ping -c 3 10.0.0.10 || { echo "Ping from Local Machine to Load Balancer failed!"; exit 1; }
|
||||
|
||||
echo "All ping tests passed!"
|
||||
}
|
||||
|
||||
# Debugging helper functions
|
||||
|
||||
# Check if all interfaces are up and running
|
||||
check_interfaces () {
|
||||
for ns in h2 h3 lb; do
|
||||
echo "Checking interfaces in namespace $ns..."
|
||||
sudo ip netns exec $ns ip addr show
|
||||
sudo ip netns exec $ns ip link show
|
||||
done
|
||||
|
||||
echo "Checking bridge br0..."
|
||||
ip addr show br0
|
||||
ip link show br0
|
||||
}
|
||||
|
||||
# Check IP forwarding settings
|
||||
check_ip_forwarding () {
|
||||
echo "Checking IP forwarding status on the host..."
|
||||
sudo sysctl net.ipv4.ip_forward
|
||||
|
||||
echo "Checking IP forwarding status in namespace $ns..."
|
||||
sudo ip netns exec $ns sysctl net.ipv4.ip_forward
|
||||
}
|
||||
|
||||
# Check ARP table
|
||||
check_arp_table () {
|
||||
echo "Checking ARP table on the host..."
|
||||
arp -n
|
||||
|
||||
for ns in h2 h3 lb; do
|
||||
echo "Checking ARP table in namespace $ns..."
|
||||
sudo ip netns exec $ns ip neigh show
|
||||
done
|
||||
}
|
||||
|
||||
# Check routing tables
|
||||
check_routing_table () {
|
||||
echo "Checking routing table on the host..."
|
||||
ip route show
|
||||
|
||||
for ns in h2 h3 lb; do
|
||||
echo "Checking routing table in namespace $ns..."
|
||||
sudo ip netns exec $ns ip route show
|
||||
done
|
||||
}
|
||||
|
||||
# Check if firewall rules are blocking traffic
|
||||
check_firewall_rules () {
|
||||
echo "Checking firewall rules on the host..."
|
||||
sudo iptables -L
|
||||
}
|
||||
|
||||
# Run checks to verify the network
|
||||
check_interfaces
|
||||
check_ip_forwarding
|
||||
check_arp_table
|
||||
check_routing_table
|
||||
check_firewall_rules
|
||||
check_connectivity
|
||||
|
||||
echo "Setup and checks completed!"
|
||||
36
src/42-xdp-loadbalancer/teardown.sh
Executable file
@@ -0,0 +1,36 @@
#!/bin/bash

set -xe

rm_bridge () {
    if ip link show $1 &> /dev/null; then
        ip link set dev $1 down
        ip link delete $1 type bridge
    fi
}

rm_pair () {
    if ip link show $1 &> /dev/null; then
        ip link delete $1 type veth
    fi
}

rm_ns () {
    if ip netns list | grep -w "$1" &> /dev/null; then
        ip netns delete $1
    fi
}

# Remove bridge br0
rm_bridge br0

# Remove veth pairs
rm_pair veth0
rm_pair veth2
rm_pair veth4
rm_pair veth6

# Remove namespaces
rm_ns h2
rm_ns h3
rm_ns lb
117
src/42-xdp-loadbalancer/xdp_lb.bpf.c
Normal file
@@ -0,0 +1,117 @@
// xdp_lb.bpf.c
#include <bpf/bpf_endian.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/tcp.h>
#include "xx_hash.h"

struct backend_config {
    __u32 ip;
    unsigned char mac[ETH_ALEN];
};

// Backend IP and MAC address map
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 2); // Two backends
    __type(key, __u32);
    __type(value, struct backend_config);
} backends SEC(".maps");

int client_ip = bpf_htonl(0xa000001);
unsigned char client_mac[ETH_ALEN] = {0xDE, 0xAD, 0xBE, 0xEF, 0x0, 0x1};
int load_balancer_ip = bpf_htonl(0xa00000a);
unsigned char load_balancer_mac[ETH_ALEN] = {0xDE, 0xAD, 0xBE, 0xEF, 0x0, 0x10};

static __always_inline __u16
csum_fold_helper(__u64 csum)
{
    int i;
    for (i = 0; i < 4; i++)
    {
        if (csum >> 16)
            csum = (csum & 0xffff) + (csum >> 16);
    }
    return ~csum;
}

static __always_inline __u16
iph_csum(struct iphdr *iph)
{
    iph->check = 0;
    unsigned long long csum = bpf_csum_diff(0, 0, (unsigned int *)iph, sizeof(struct iphdr), 0);
    return csum_fold_helper(csum);
}

SEC("xdp")
int xdp_load_balancer(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    void *data = (void *)(long)ctx->data;

    bpf_printk("xdp_load_balancer received packet");

    // Ethernet header
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    // Check if the packet is IP (IPv4)
    if (eth->h_proto != __constant_htons(ETH_P_IP))
        return XDP_PASS;

    // IP header
    struct iphdr *iph = (struct iphdr *)(eth + 1);
    if ((void *)(iph + 1) > data_end)
        return XDP_PASS;

    // Only load-balance TCP traffic
    if (iph->protocol != IPPROTO_TCP)
        return XDP_PASS;

    bpf_printk("Received Source IP: 0x%x", bpf_ntohl(iph->saddr));
    bpf_printk("Received Destination IP: 0x%x", bpf_ntohl(iph->daddr));
    bpf_printk("Received Source MAC: %x:%x:%x:%x:%x:%x", eth->h_source[0], eth->h_source[1], eth->h_source[2], eth->h_source[3], eth->h_source[4], eth->h_source[5]);
    bpf_printk("Received Destination MAC: %x:%x:%x:%x:%x:%x", eth->h_dest[0], eth->h_dest[1], eth->h_dest[2], eth->h_dest[3], eth->h_dest[4], eth->h_dest[5]);

    if (iph->saddr == client_ip)
    {
        bpf_printk("Packet from client");

        // Hash the IP header to pick one of the two backends
        __u32 key = xxhash32((const char*)iph, sizeof(struct iphdr), 0) % 2;

        struct backend_config *backend = bpf_map_lookup_elem(&backends, &key);
        if (!backend)
            return XDP_PASS;

        iph->daddr = backend->ip;
        __builtin_memcpy(eth->h_dest, backend->mac, ETH_ALEN);
    }
    else
    {
        bpf_printk("Packet from backend");
        iph->daddr = client_ip;
        __builtin_memcpy(eth->h_dest, client_mac, ETH_ALEN);
    }

    // Update IP source address to the load balancer's IP
    iph->saddr = load_balancer_ip;
    // Update Ethernet source MAC address to the current lb's MAC
    __builtin_memcpy(eth->h_source, load_balancer_mac, ETH_ALEN);

    // Recalculate IP checksum
    iph->check = iph_csum(iph);

    bpf_printk("Redirecting packet to new IP 0x%x from IP 0x%x",
               bpf_ntohl(iph->daddr),
               bpf_ntohl(iph->saddr));
    bpf_printk("New Dest MAC: %x:%x:%x:%x:%x:%x", eth->h_dest[0], eth->h_dest[1], eth->h_dest[2], eth->h_dest[3], eth->h_dest[4], eth->h_dest[5]);
    bpf_printk("New Source MAC: %x:%x:%x:%x:%x:%x\n", eth->h_source[0], eth->h_source[1], eth->h_source[2], eth->h_source[3], eth->h_source[4], eth->h_source[5]);
    // Return XDP_TX to transmit the modified packet back to the network
    return XDP_TX;
}

char _license[] SEC("license") = "GPL";
96
src/42-xdp-loadbalancer/xdp_lb.c
Normal file
@@ -0,0 +1,96 @@
// xdp_lb.c
#include <arpa/inet.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include "xdp_lb.skel.h" // The generated skeleton

struct backend_config {
    __u32 ip;
    unsigned char mac[6];
};

static int parse_mac(const char *str, unsigned char *mac) {
    if (sscanf(str, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
               &mac[0], &mac[1], &mac[2], &mac[3], &mac[4], &mac[5]) != 6) {
        fprintf(stderr, "Invalid MAC address format\n");
        return -1;
    }
    return 0;
}

int main(int argc, char **argv) {
    if (argc != 6) {
        fprintf(stderr, "Usage: %s <ifname> <backend1_ip> <backend1_mac> <backend2_ip> <backend2_mac>\n", argv[0]);
        return 1;
    }

    const char *ifname = argv[1];
    struct backend_config backend[2];

    // Parse backend 1
    if (inet_pton(AF_INET, argv[2], &backend[0].ip) != 1) {
        fprintf(stderr, "Invalid backend 1 IP address\n");
        return 1;
    }
    if (parse_mac(argv[3], backend[0].mac) < 0) {
        return 1;
    }

    // Parse backend 2
    if (inet_pton(AF_INET, argv[4], &backend[1].ip) != 1) {
        fprintf(stderr, "Invalid backend 2 IP address\n");
        return 1;
    }
    if (parse_mac(argv[5], backend[1].mac) < 0) {
        return 1;
    }

    // Load and attach the BPF program
    struct xdp_lb_bpf *skel = xdp_lb_bpf__open_and_load();
    if (!skel) {
        fprintf(stderr, "Failed to open and load BPF skeleton\n");
        return 1;
    }

    // if_nametoindex() returns 0 (not a negative value) on error
    int ifindex = if_nametoindex(ifname);
    if (ifindex == 0) {
        perror("if_nametoindex");
        xdp_lb_bpf__destroy(skel);
        return 1;
    }

    // bpf_program__attach_xdp() returns a bpf_link pointer, NULL on failure
    struct bpf_link *link = bpf_program__attach_xdp(skel->progs.xdp_load_balancer, ifindex);
    if (!link) {
        fprintf(stderr, "Failed to attach XDP program\n");
        xdp_lb_bpf__destroy(skel);
        return 1;
    }

    // Update backend configurations
    for (int i = 0; i < 2; i++) {
        if (bpf_map_update_elem(bpf_map__fd(skel->maps.backends), &i, &backend[i], 0) < 0) {
            perror("bpf_map_update_elem");
            xdp_lb_bpf__destroy(skel);
            return 1;
        }
    }

    printf("XDP load balancer configured with backends:\n");
    printf("Backend 1 - IP: %s, MAC: %s\n", argv[2], argv[3]);
    printf("Backend 2 - IP: %s, MAC: %s\n", argv[4], argv[5]);

    printf("Press Ctrl+C to exit...\n");
    while (1) {
        sleep(1); // Keep the program running
    }

    // Cleanup and detach (not reached; kept for reference)
    bpf_xdp_detach(ifindex, 0, NULL);
    xdp_lb_bpf__detach(skel);
    xdp_lb_bpf__destroy(skel);
    return 0;
}
57
src/42-xdp-loadbalancer/xx_hash.h
Normal file
@@ -0,0 +1,57 @@
#ifndef XXHASH_BPF_H
#define XXHASH_BPF_H

#define PRIME1 0x9E3779B1U
#define PRIME2 0x85EBCA77U
#define PRIME3 0xC2B2AE3DU
#define PRIME4 0x27D4EB2FU
#define PRIME5 0x165667B1U

static __always_inline unsigned int rotl (unsigned int x, int r) {
    return ((x << r) | (x >> (32 - r)));
}

// Normal stripe processing routine.
static __always_inline unsigned int round_xxhash(unsigned int acc, const unsigned int input) {
    return rotl(acc + (input * PRIME2), 13) * PRIME1;
}

static __always_inline unsigned int avalanche_step (const unsigned int h, const int rshift, const unsigned int prime) {
    return (h ^ (h >> rshift)) * prime;
}

// Mixes all bits to finalize the hash.
static __always_inline unsigned int avalanche (const unsigned int h) {
    return avalanche_step(avalanche_step(avalanche_step(h, 15, PRIME2), 13, PRIME3), 16, 1);
}

static __always_inline unsigned int endian32 (const char *v) {
    return (unsigned int)((unsigned char)(v[0]))|((unsigned int)((unsigned char)(v[1])) << 8)
           |((unsigned int)((unsigned char)(v[2])) << 16)|((unsigned int)((unsigned char)(v[3])) << 24);
}

static __always_inline unsigned int fetch32 (const char *p, const unsigned int v) {
    return round_xxhash(v, endian32(p));
}

// Processes the last 0-15 bytes of p.
static __always_inline unsigned int finalize (const unsigned int h, const char *p, unsigned int len) {
    return
        (len >= 4) ? finalize(rotl(h + (endian32(p) * PRIME3), 17) * PRIME4, p + 4, len - 4) :
        (len > 0)  ? finalize(rotl(h + ((unsigned char)(*p) * PRIME5), 11) * PRIME1, p + 1, len - 1) :
                     avalanche(h);
}

static __always_inline unsigned int h16bytes_4 (const char *p, unsigned int len, const unsigned int v1, const unsigned int v2, const unsigned int v3, const unsigned int v4) {
    return
        (len >= 16) ? h16bytes_4(p + 16, len - 16, fetch32(p, v1), fetch32(p+4, v2), fetch32(p+8, v3), fetch32(p+12, v4)) :
                      rotl(v1, 1) + rotl(v2, 7) + rotl(v3, 12) + rotl(v4, 18);
}

static __always_inline unsigned int h16bytes_3 (const char *p, unsigned int len, const unsigned int seed) {
    return h16bytes_4(p, len, seed + PRIME1 + PRIME2, seed + PRIME2, seed, seed - PRIME1);
}

static __always_inline unsigned int xxhash32 (const char *input, unsigned int len, unsigned int seed) {
    return finalize((len >= 16 ? h16bytes_3(input, len, seed) : seed + PRIME5) + len, (input) + (len & ~0xF), len & 0xF);
}

#endif
@@ -87,8 +87,9 @@ int kill_exit(struct trace_event_raw_sys_exit *ctx)
char LICENSE[] SEC("license") = "Dual BSD/GPL";
```

The above code defines an eBPF program for capturing system calls that send signals to processes, including kill, tkill, and tgkill. It captures the enter and exit events of system calls by using tracepoints, and executes specified probe functions such as `probe_entry` and `probe_exit` when these events occur.

In the probe function, we use the bpf_map to store the captured event information, including the process ID of the sending signal, the process ID of the receiving signal, the signal value, and the name of the executable for the current task. When the system call exits, we retrieve the event information stored in the bpf_map and use bpf_printk to print the process ID, process name, sent signal, and return value of the system call.

Finally, we also need to use the SEC macro to define the probe and specify the name of the system call to be captured and the probe function to be executed.
@@ -144,7 +144,7 @@ Run:
```console
$ sudo ./ecli run package.json
TIME     PID    PPID   EXIT_CODE DURATION_NS COMM
21:40:09 42050 42049 0 0 which
21:40:09 42049 3517 0 0 sh
21:40:09 42052 42051 0 0 ps
21:40:09 42051 3517 0 0 sh
1904
src/ChatGPT.md
File diff suppressed because it is too large
@@ -50,6 +50,8 @@ Android:
Networking:

- [Accelerating network request forwarding using sockops](29-sockops/README.md)
- [Capturing TCP Information with XDP](41-xdp-tcpdump/README.md)
- [XDP Load Balancer](42-xdp-loadbalancer/README.md)

tracing:
@@ -9,6 +9,7 @@ Kernel version | Commit
## JIT compiling

The list of supported architectures for your kernel can be retrieved with:

```sh
git grep HAVE_EBPF_JIT arch/
```
@@ -34,6 +35,7 @@ LoongArch | 6.1 | [`5dc615520c4d`](https://github.com/
Several (but not all) of these *main features* translate to an eBPF program type.
The list of such program types supported in your kernel can be found in file
[`include/uapi/linux/bpf.h`](https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h):

```sh
git grep -W 'bpf_prog_type {' include/uapi/linux/bpf.h
```
@@ -43,7 +45,7 @@ Feature | Kernel version | Commit
`AF_PACKET` (libpcap/tcpdump, `cls_bpf` classifier, netfilter's `xt_bpf`, team driver's load-balancing mode…) | 3.15 | [`bd4cf0ed331a`](https://github.com/torvalds/linux/commit/bd4cf0ed331a275e9bf5a49e6d0fd55dffc551b8)
Kernel helpers | 3.15 | [`bd4cf0ed331a`](https://github.com/torvalds/linux/commit/bd4cf0ed331a275e9bf5a49e6d0fd55dffc551b8)
`bpf()` syscall | 3.18 | [`99c55f7d47c0`](https://github.com/torvalds/linux/commit/99c55f7d47c0dc6fc64729f37bf435abf43f4c60)
Maps (*a.k.a.* Tables; details below) | 3.18 | [`99c55f7d47c0`](https://github.com/torvalds/linux/commit/99c55f7d47c0dc6fc64729f37bf435abf43f4c60)
BPF attached to sockets | 3.19 | [`89aa075832b0`](https://github.com/torvalds/linux/commit/89aa075832b0da4402acebd698d0411dcc82d03e)
BPF attached to `kprobes` | 4.1 | [`2541517c32be`](https://github.com/torvalds/linux/commit/2541517c32be2531e0da59dfd7efc1ce844644f5)
`cls_bpf` / `act_bpf` for `tc` | 4.1 | [`e2e9b6541dd4`](https://github.com/torvalds/linux/commit/e2e9b6541dd4b31848079da80fe2253daaafb549)
@@ -123,12 +125,13 @@ LSM | 5.7 | [`fc611f47f218`](https://github.com/torv
lookup listening socket | 5.9 | [`e9ddbb7707ff`](https://github.com/torvalds/linux/commit/e9ddbb7707ff5891616240026062b8c1e29864ca) | BPF_PROG_TYPE_SK_LOOKUP
Allow executing syscalls | 5.15 | [`79a7f8bdb159`](https://github.com/torvalds/linux/commit/79a7f8bdb159d9914b58740f3d31d602a6e4aca8) | BPF_PROG_TYPE_SYSCALL

## Maps (*a.k.a.* Tables, in BCC lingo)

### Map types

The list of map types supported in your kernel can be found in file
[`include/uapi/linux/bpf.h`](https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h):

```sh
git grep -W 'bpf_map_type {' include/uapi/linux/bpf.h
```
@@ -170,6 +173,7 @@ user ringbuf | 6.1 | [`583c1f420173`](https://github.com/tor
Some (but not all) of these *API features* translate to a subcommand beginning with `BPF_MAP_`.
The list of subcommands supported in your kernel can be found in file
[`include/uapi/linux/bpf.h`](https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h):

```sh
git grep -W 'bpf_cmd {' include/uapi/linux/bpf.h
```
@@ -222,7 +226,7 @@ Generic XDP | 4.12 | [`b5cdae3291f7`](https://github.com/torvalds/linux/commit/b
The list of helpers supported in your kernel can be found in file
[`include/uapi/linux/bpf.h`](https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h):

```sh
git grep ' FN(' include/uapi/linux/bpf.h
```

Alphabetical order
@@ -463,7 +467,9 @@ Note: GPL-only BPF helpers require a GPL-compatible license. The current license
Check the list of GPL-compatible licenses in your [kernel source code](https://github.com/torvalds/linux/blob/master/include/linux/license.h).

## Program Types

The list of program types and supported helper functions can be retrieved with:

```sh
git grep -W 'func_proto(enum bpf_func_id func_id' kernel/ net/ drivers/
```
@@ -492,4 +498,4 @@ git grep -W 'func_proto(enum bpf_func_id func_id' kernel/ net/ drivers/
|Function Group| Functions|
|------------------|-------|
|Base functions| `BPF_FUNC_map_lookup_elem()` <br> `BPF_FUNC_map_update_elem()` <br> `BPF_FUNC_map_delete_elem()` <br> `BPF_FUNC_map_peek_elem()` <br> `BPF_FUNC_map_pop_elem()` <br> `BPF_FUNC_map_push_elem()` <br> `BPF_FUNC_get_prandom_u32()` <br> `BPF_FUNC_get_smp_processor_id()` <br> `BPF_FUNC_get_numa_node_id()` <br> `BPF_FUNC_tail_call()` <br> `BPF_FUNC_ktime_get_boot_ns()` <br> `BPF_FUNC_ktime_get_ns()` <br> `BPF_FUNC_trace_printk()` <br> `BPF_FUNC_spin_lock()` <br> `BPF_FUNC_spin_unlock()` |
|`Tracing functions`|`BPF_FUNC_map_lookup_elem()` <br> `BPF_FUNC_map_update_elem()` <br> `BPF_FUNC_map_delete_elem()` <br> `BPF_FUNC_probe_read()` <br> `BPF_FUNC_ktime_get_boot_ns()` <br> `BPF_FUNC_ktime_get_ns()` <br> `BPF_FUNC_tail_call()` <br> `BPF_FUNC_get_current_pid_tgid()` <br> `BPF_FUNC_get_current_task()` <br> `BPF_FUNC_get_current_uid_gid()` <br> `BPF_FUNC_get_current_comm()` <br> `BPF_FUNC_trace_printk()` <br> `BPF_FUNC_get_smp_processor_id()` <br> `BPF_FUNC_get_numa_node_id()` <br> `BPF_FUNC_perf_event_read()` <br> `BPF_FUNC_probe_write_user()` <br> `BPF_FUNC_current_task_under_cgroup()` <br> `BPF_FUNC_get_prandom_u32()` <br> `BPF_FUNC_probe_read_str()` <br> `BPF_FUNC_get_current_cgroup_id()` <br> `BPF_FUNC_send_signal()` <br> `BPF_FUNC_probe_read_kernel()` <br> `BPF_FUNC_probe_read_kernel_str()` <br> `BPF_FUNC_probe_read_user()` <br> `BPF_FUNC_probe_read_user_str()` <br> `BPF_FUNC_send_signal_thread()` <br> `BPF_FUNC_get_ns_current_pid_tgid()` <br> `BPF_FUNC_xdp_output()` <br> `BPF_FUNC_get_task_stack()`|
|`LWT functions`| `BPF_FUNC_skb_load_bytes()` <br> `BPF_FUNC_skb_pull_data()` <br> `BPF_FUNC_csum_diff()` <br> `BPF_FUNC_get_cgroup_classid()` <br> `BPF_FUNC_get_route_realm()` <br> `BPF_FUNC_get_hash_recalc()` <br> `BPF_FUNC_perf_event_output()` <br> `BPF_FUNC_get_smp_processor_id()` <br> `BPF_FUNC_skb_under_cgroup()`|
116
src/guideline.md
Normal file
@@ -0,0 +1,116 @@
# blog guideline or pattern

## Key Pattern and Requirements for eBPF Tutorial Blog Posts

### 1. **Title**

- Begin with a clear and descriptive title, following the format:

  ```
  # eBPF Tutorial by Example: [Topic Description]
  ```

  *Example:*

  ```
  # eBPF Tutorial by Example: Recording TCP Connection Status and TCP RTT
  ```

- Or a slightly different form:

  ```
  # eBPF Developer Tutorial: [Topic Description]
  ```

### 2. **Introduction and Background**

- Start with a brief introduction to eBPF, explaining its significance and capabilities.
- Provide context for the tutorial's focus, mentioning the specific tools, examples, or use cases that will be covered.
- **Goal:** Help readers understand what they will learn and why it's important.

### 3. **Overview of the Tools, Examples, or Features Described in This Tutorial**

Think of a better subtitle related to this part.

- Introduce the specific eBPF programs or tools that will be discussed.
- Explain their purpose and how they can help you: their use case, or why you need them.
- Which key eBPF features or kernel events are used or related? Only discuss the important ones, but in detail.
- **Goal:** Give readers a clear understanding of what each tool does.

Note that it might not always be a tool. It might be examples or something else.

### 4. **Kernel eBPF Code Analysis**

- Present the kernel-mode eBPF code related to the tools.
- Include code snippets with proper formatting for readability.
- If not too long, include the full code first.
- Provide detailed explanations of key sections in the code, for example:
  - *Define BPF Maps:* Explain the maps used and their purposes.
  - *Events:* Describe how the code attaches to kernel events.
  - *Logic:* Explain how the processing in the kernel happens.
  - *Features:* Introduce the eBPF features used.
- **Goal:** Help readers understand how the eBPF code works internally.

### 5. **User-Space Code Analysis**

- Present the user-space code that interacts with the eBPF program.
- If not too long, include the full code first.
- Include code snippets and explain how the user-space application processes data from the eBPF program, for example:
  - *Event Handling:* Describe how events are received and processed.
  - *Data Presentation:* Explain how data is formatted and displayed to the user.
- **Goal:** Show how the eBPF program communicates with user-space and how to interpret the results.

### 6. **Compilation and Execution Instructions**

- Provide step-by-step instructions on how to compile and run the eBPF programs.
- Include commands and expected outputs.
- Mention any prerequisites or dependencies required.
- *Compiling the eBPF Program:* Commands and explanations.
- *Running the User-Space Application:* How to execute and interpret outputs.
- **Goal:** Enable readers to replicate the examples on their own systems.

You need to provide a **Complete Source Code and Resources** link in the Compilation and Execution Instructions.

- Provide links to the complete source code repositories.
- Include references to related tools, documentation, or tutorials.
- *Source Code:* Direct links to GitHub or relevant repositories.
- *References:* List of resources for further reading.
- **Goal:** Offer additional resources for readers to explore more deeply.

The repo is at <https://github.com/eunomia-bpf/bpf-developer-tutorial>, with the website at <https://eunomia.dev/tutorials/>. Typically you can run `make` in the related dir, such as `bpf-developer-tutorial/src/41-xdp-tcpdump`, to build it.

### 7. **Summary and Conclusion**

- Summarize the key points covered in the tutorial.
- Emphasize the importance of the tools and concepts learned.
- Encourage readers to apply this knowledge and explore further.
- **Goal:** Reinforce learning outcomes and inspire continued learning.

You need to have a **Call to Action** in the Summary and Conclusion:

- Invite readers to visit your tutorial repository and website for more examples and complete tutorials.
- Provide links to the main tutorial site and any relevant sections. The links should be shown directly.

- **Example:**

  ```md
  If you would like to learn more about eBPF, visit our tutorial code repository at <https://github.com/eunomia-bpf/bpf-developer-tutorial> or our website at <https://eunomia.dev/tutorials/>.
  ```

## Additional Guidelines

- **Consistency:** Maintain a consistent writing style and formatting across all blog posts.
- **Clarity:** Use clear and concise language to explain complex concepts.
- **Code Formatting:** Ensure all code snippets are properly formatted and syntax-highlighted for readability.
- **Visual Aids:** Include diagrams or charts if they help in explaining concepts better.
- **Audience Engagement:** Pose questions or scenarios that encourage readers to think and engage with the material.
- **Proofreading:** Check for grammatical errors and ensure technical accuracy.
- **Accessibility:** Make sure that the tutorials are accessible to readers with varying levels of expertise in eBPF.

Also, do not just list points; try to use paragraphs unless a bullet list is clearly better. Use plain, conversational English with clear and simple words and short sentences. Make it attractive and easy to read; do not make it read like a paper.

## Template Summary

By following this pattern, anyone tasked with writing future blog posts will have a clear structure to adhere to, ensuring that all necessary information is included and presented in a logical order.
158719
src/third_party/vmlinux/x86/vmlinux_601.h
vendored
File diff suppressed because it is too large