mirror of
https://github.com/eunomia-bpf/bpf-developer-tutorial.git
synced 2026-02-03 10:14:44 +08:00
implement opensnoop and uprobe
This commit is contained in:
@@ -3,7 +3,7 @@
|
||||
<!-- TOC -->
|
||||
|
||||
- [eBPF Beginner's Development Tutorial 1: Basic eBPF concepts and common development tools](#ebpf-入门开发实践指南一介绍-ebpf-的基本概念常见的开发工具)
- [1. What is eBPF](#1-什么是ebpf)
- [1. Why does eBPF exist?](#1-为什么会有-ebpf-技术)
- [1.1. Origin](#11-起源)
- [1.2. Execution logic](#12-执行逻辑)
- [1.3. Architecture](#13-架构)
|
||||
@@ -18,15 +18,11 @@
|
||||
|
||||
<!-- /TOC -->
|
||||
|
||||
## 1. What is eBPF
|
||||
## 1. Why does eBPF exist?
|
||||
|
||||
The Linux kernel has always been an ideal place to implement monitoring/observability, networking, and security features. Doing this work directly in the kernel is not easy, however. In traditional Linux development, these features usually require modifying the kernel source or loading a kernel module. Modifying the kernel source is dangerous: a small mistake can crash the system, and every change has to be verified by recompiling the kernel, which costs time and effort.
|
||||
|
||||
Loading a kernel module is more flexible because it does not require recompiling the kernel, but it can still crash the kernel, and the module has to be updated as kernel versions change or it stops working.
|
||||
|
||||
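eBPF emerged against this background. It is a revolutionary technology that runs sandboxed programs in the kernel without modifying the kernel source or loading kernel modules. Through the interfaces it provides, users can trace and monitor the system from inside the kernel.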
|
||||
|
||||
|
||||
@@ -30,49 +30,48 @@ int handle_tp(void *ctx)
|
||||
}
|
||||
```
|
||||
|
||||
`minimal` is just that – a minimal practical BPF application example. It
|
||||
doesn't use or require BPF CO-RE, so should run on quite old kernels. It
|
||||
installs a tracepoint handler which is triggered once every second. It uses
|
||||
`bpf_printk()` BPF helper to communicate with the world.
|
||||
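This program defines a handle_tp function and uses the SEC macro to attach it to the sys_enter_write tracepoint, so it runs on entry to the write system call. The function calls bpf_get_current_pid_tgid to get the ID of the process that called write and prints it to the kernel trace log with bpf_printk.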
|
||||
|
||||
To compile and run this program, use the ecc tool and the ecli command. First compile it with ecc:
|
||||
|
||||
```console
|
||||
$ sudo ecli examples/bpftools/minimal/package.json
|
||||
Runing eBPF program...
|
||||
```
|
||||
|
||||
To see its output,
read the `/sys/kernel/debug/tracing/trace_pipe` file as root:
|
||||
|
||||
```shell
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
<...>-3840345 [010] d... 3220701.101143: bpf_trace_printk: BPF triggered from PID 3840345.
|
||||
<...>-3840345 [010] d... 3220702.101265: bpf_trace_printk: BPF triggered from PID 3840345.
|
||||
```
|
||||
|
||||
`minimal` is great as a bare-bones experimental playground to quickly try out
|
||||
new ideas or BPF features.
|
||||
|
||||
## Compile and Run with eunomia-bpf
|
||||
|
||||
|
||||
|
||||
Compile:
|
||||
|
||||
```console
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
or compile with `ecc`:
|
||||
|
||||
```console
|
||||
$ ecc minimal.bpf.c
|
||||
$ ecc hello.bpf.c
|
||||
Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
Run:
|
||||
```console
|
||||
Then run the compiled program with ecli:
|
||||
|
||||
```console
|
||||
sudo ecli ./package.json
|
||||
$ sudo ecli ./package.json
|
||||
Runing eBPF program...
|
||||
```
|
||||
Once the program is running, read /sys/kernel/debug/tracing/trace_pipe to see the output of the eBPF program:
|
||||
|
||||
```console
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
<...>-3840345 [010] d... 3220701.101143: bpf_trace_printk: write system call from PID 3840345.
|
||||
<...>-3840345 [010] d... 3220701.101143: bpf_trace_printk: write system call from PID 3840345.
|
||||
```
|
||||
|
||||
## Basic structure of an eBPF program

As shown above, the basic structure of an eBPF program consists of:

- Header files: include `<linux/bpf.h>` and `<bpf/bpf_helpers.h>`.
- License: define a license, typically "Dual BSD/GPL".
- BPF function: define a BPF function, for example one named handle_tp that takes a void *ctx parameter and returns an int.
- BPF helper functions: inside the BPF function you can call helpers such as bpf_get_current_pid_tgid() and bpf_printk().
- Return value
|
||||
|
||||
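## Development workflow for eBPF programs

The development workflow for an eBPF program can be summarized in the following steps:

- Define the interfaces and types of the eBPF program: define its entry functions, define and implement eBPF maps and shared memory (perf events), and decide which eBPF helper functions to use.
- Write the eBPF program: implement the main logic, read and write the eBPF maps, and call the eBPF helper functions.
- Compile the eBPF program: use an eBPF-capable compiler such as clang to compile the source into eBPF bytecode, producing a BPF object file that can be loaded into the kernel (it is not a kernel module). ecc itself invokes clang to compile eBPF programs.
- Load the eBPF program into the kernel: load the compiled BPF object into the Linux kernel and attach the eBPF program to the chosen kernel event.
- Use the eBPF program: monitor the running program and exchange and share data through eBPF maps and shared memory.

In practice, other steps may also be needed, such as configuring compile and load parameters, managing loaded eBPF programs and maps, and using other advanced features.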
|
||||
|
||||
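Note that BPF programs execute in kernel space, so writing, compiling, and debugging them requires special tools and techniques. eunomia-bpf is an open-source BPF compiler and toolkit that helps developers write and run BPF programs quickly and easily.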
|
||||
@@ -1,76 +0,0 @@
|
||||
---
|
||||
layout: post
|
||||
title: fentry-link
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, examples, fentry, no-output]
|
||||
summary: an example that uses fentry and fexit BPF programs to trace file deletion
|
||||
---
|
||||
|
||||
## Fentry
|
||||
|
||||
`fentry` is an example that uses fentry and fexit BPF programs for tracing. It
|
||||
attaches `fentry` and `fexit` traces to `do_unlinkat()` which is called when a
|
||||
file is deleted and logs the return value, PID, and filename to the
|
||||
trace pipe.
|
||||
|
||||
Important differences, compared to kprobes, are improved performance and
|
||||
usability. In this example, better usability is shown with the ability to
|
||||
directly dereference pointer arguments, like in normal C, instead of using
|
||||
various read helpers. The big distinction between **fexit** and **kretprobe**
|
||||
programs is that fexit one has access to both input arguments and returned
|
||||
result, while kretprobe can only access the result.
|
||||
|
||||
fentry and fexit programs are available starting from 5.5 kernels.
|
||||
|
||||
```console
|
||||
$ sudo ecli examples/bpftools/fentry-link/package.json
|
||||
Runing eBPF program...
|
||||
```
|
||||
|
||||
The `fentry` output in `/sys/kernel/debug/tracing/trace_pipe` should look
|
||||
something like this:
|
||||
|
||||
```console
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
rm-9290 [004] d..2 4637.798698: bpf_trace_printk: fentry: pid = 9290, filename = test_file
|
||||
rm-9290 [004] d..2 4637.798843: bpf_trace_printk: fexit: pid = 9290, filename = test_file, ret = 0
|
||||
rm-9290 [004] d..2 4637.798698: bpf_trace_printk: fentry: pid = 9290, filename = test_file2
|
||||
rm-9290 [004] d..2 4637.798843: bpf_trace_printk: fexit: pid = 9290, filename = test_file2, ret = 0
|
||||
```
|
||||
|
||||
## Run
|
||||
|
||||
|
||||
|
||||
- Compile:
|
||||
|
||||
```console
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
or
|
||||
|
||||
```console
|
||||
$ ecc fentry-link.bpf.c
|
||||
Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
- Run and help:
|
||||
|
||||
```console
|
||||
sudo ecli examples/bpftools/fentry-link/package.json -h
|
||||
Usage: fentry_link_bpf [--help] [--version] [--verbose]
|
||||
|
||||
A simple eBPF program
|
||||
|
||||
Optional arguments:
|
||||
-h, --help shows help message and exits
|
||||
-v, --version prints version information and exits
|
||||
--verbose prints libbpf debug information
|
||||
|
||||
Built with eunomia-bpf framework.
|
||||
See https://github.com/eunomia-bpf/eunomia-bpf for more information.
|
||||
```
|
||||
66
2-kprobe-unlink/README.md
Normal file
66
2-kprobe-unlink/README.md
Normal file
@@ -0,0 +1,66 @@
|
||||
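# eBPF Beginner's Development Tutorial 2: Capturing the unlink system call with kprobe in eBPF

eBPF (Extended Berkeley Packet Filter) is a powerful networking and performance analysis tool in the Linux kernel. It lets developers dynamically load, update, and run user-defined code while the kernel is running.

This is the second part of the eBPF beginner's development tutorial. It uses kprobe in eBPF to capture the unlink system call.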
|
||||
|
||||
## kprobe
|
||||
|
||||
```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "Dual BSD/GPL";

/* Runs on entry to do_unlinkat(); logs the caller's PID and the file name. */
SEC("kprobe/do_unlinkat")
int BPF_KPROBE(do_unlinkat, int dfd, struct filename *name)
{
	pid_t pid;
	const char *filename;

	/* The upper 32 bits of the helper's result hold the process ID (TGID). */
	pid = bpf_get_current_pid_tgid() >> 32;
	/* A kprobe cannot dereference kernel pointers directly, so use a CO-RE read. */
	filename = BPF_CORE_READ(name, name);
	bpf_printk("KPROBE ENTRY pid = %d, filename = %s\n", pid, filename);
	return 0;
}

/* Runs when do_unlinkat() returns; logs the PID and the return value. */
SEC("kretprobe/do_unlinkat")
int BPF_KRETPROBE(do_unlinkat_exit, long ret)
{
	pid_t pid;

	pid = bpf_get_current_pid_tgid() >> 32;
	bpf_printk("KPROBE EXIT: pid = %d, ret = %ld\n", pid, ret);
	return 0;
}
```
|
||||
|
||||
|
||||
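This example shows how eBPF handles kernel-space entry and exit (return) probes, kprobe and kretprobe. It attaches kprobe and kretprobe BPF programs to the do_unlinkat() function and uses the bpf_printk() macro to log the PID, the file name, and the return value.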
|
||||
|
||||
To compile this program, use the ecc tool:
|
||||
|
||||
```console
|
||||
$ ecc kprobe-link.bpf.c
|
||||
Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
Then run it:
|
||||
|
||||
```console
|
||||
sudo ecli package.json
|
||||
```
|
||||
|
||||
In /sys/kernel/debug/tracing/trace_pipe you should see kprobe demo output similar to the following:
|
||||
|
||||
```shell
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
rm-9346 [005] d..3 4710.951696: bpf_trace_printk: KPROBE ENTRY pid = 9346, filename = test1
|
||||
rm-9346 [005] d..4 4710.951819: bpf_trace_printk: KPROBE EXIT: ret = 0
|
||||
rm-9346 [005] d..3 4710.951852: bpf_trace_printk: KPROBE ENTRY pid = 9346, filename = test2
|
||||
rm-9346 [005] d..4 4710.951895: bpf_trace_printk: KPROBE EXIT: ret = 0
|
||||
```
|
||||
|
||||
61
3-fentry-unlink/README.md
Normal file
61
3-fentry-unlink/README.md
Normal file
@@ -0,0 +1,61 @@
|
||||
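# eBPF Beginner's Development Tutorial 3: Capturing the unlink system call with fentry in eBPF

eBPF (Extended Berkeley Packet Filter) is a powerful networking and performance analysis tool in the Linux kernel. It lets developers dynamically load, update, and run user-defined code while the kernel is running.

This is the third part of the eBPF beginner's development tutorial. It uses fentry in eBPF to capture the unlink system call.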
|
||||
|
||||
## Fentry
|
||||
|
||||
```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "Dual BSD/GPL";

/* Runs on entry to do_unlinkat(); name->name can be dereferenced directly. */
SEC("fentry/do_unlinkat")
int BPF_PROG(do_unlinkat, int dfd, struct filename *name)
{
	pid_t pid;

	pid = bpf_get_current_pid_tgid() >> 32;
	bpf_printk("fentry: pid = %d, filename = %s\n", pid, name->name);
	return 0;
}

/* Runs when do_unlinkat() returns; sees both the arguments and the result. */
SEC("fexit/do_unlinkat")
int BPF_PROG(do_unlinkat_exit, int dfd, struct filename *name, long ret)
{
	pid_t pid;

	pid = bpf_get_current_pid_tgid() >> 32;
	bpf_printk("fexit: pid = %d, filename = %s, ret = %ld\n", pid, name->name, ret);
	return 0;
}
```
|
||||
|
||||
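This program defines two functions, do_unlinkat and do_unlinkat_exit, and attaches them to the entry (fentry) and exit (fexit) of the kernel function do_unlinkat. They run when do_unlinkat is entered and when it returns, respectively. Both call bpf_get_current_pid_tgid to get the ID of the calling process and use bpf_printk to write the PID, the file name, and (on exit) the return value to the kernel trace log.

Compared with kprobes, fentry and fexit programs offer better performance and usability. In this example, pointer arguments can be dereferenced directly, as in ordinary C, without the various read helpers. The biggest difference between fexit and kretprobe programs is that an fexit program can access both the input arguments and the return value, whereas a kretprobe can only access the return value.

fentry and fexit programs are available starting from kernel 5.5.

Compile and run the code above: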
|
||||
|
||||
```console
|
||||
$ ecc fentry-link.bpf.c
|
||||
Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
$ sudo ecli package.json
|
||||
Runing eBPF program...
|
||||
```
|
||||
|
||||
Once the program is running, read /sys/kernel/debug/tracing/trace_pipe to see the output of the eBPF program:
|
||||
|
||||
```console
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
rm-9290 [004] d..2 4637.798698: bpf_trace_printk: fentry: pid = 9290, filename = test_file
|
||||
rm-9290 [004] d..2 4637.798843: bpf_trace_printk: fexit: pid = 9290, filename = test_file, ret = 0
|
||||
rm-9290 [004] d..2 4637.798698: bpf_trace_printk: fentry: pid = 9290, filename = test_file2
|
||||
rm-9290 [004] d..2 4637.798843: bpf_trace_printk: fexit: pid = 9290, filename = test_file2, ret = 0
|
||||
```
|
||||
@@ -1,55 +0,0 @@
|
||||
---
|
||||
layout: post
|
||||
title: kprobe-link
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, examples, kprobe, no-output]
|
||||
summary: an example of dealing with kernel-space entry and exit (return) probes, `kprobe` and `kretprobe` in libbpf lingo
|
||||
---
|
||||
|
||||
|
||||
`kprobe` is an example of dealing with kernel-space entry and exit (return)
|
||||
probes, `kprobe` and `kretprobe` in libbpf lingo. It attaches `kprobe` and
|
||||
`kretprobe` BPF programs to the `do_unlinkat()` function and logs the PID,
|
||||
filename, and return result, respectively, using `bpf_printk()` macro.
|
||||
|
||||
```console
|
||||
$ sudo ecli examples/bpftools/kprobe-link/package.json
|
||||
Runing eBPF program...
|
||||
```
|
||||
|
||||
The `kprobe` demo output in `/sys/kernel/debug/tracing/trace_pipe` should look
|
||||
something like this:
|
||||
|
||||
```shell
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
rm-9346 [005] d..3 4710.951696: bpf_trace_printk: KPROBE ENTRY pid = 9346, filename = test1
|
||||
rm-9346 [005] d..4 4710.951819: bpf_trace_printk: KPROBE EXIT: ret = 0
|
||||
rm-9346 [005] d..3 4710.951852: bpf_trace_printk: KPROBE ENTRY pid = 9346, filename = test2
|
||||
rm-9346 [005] d..4 4710.951895: bpf_trace_printk: KPROBE EXIT: ret = 0
|
||||
```
|
||||
|
||||
## Run
|
||||
|
||||
|
||||
|
||||
Compile with docker:
|
||||
|
||||
```console
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
or compile with `ecc`:
|
||||
|
||||
```console
|
||||
$ ecc kprobe-link.bpf.c
|
||||
Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
Run:
|
||||
|
||||
```console
|
||||
sudo ecli examples/bpftools/kprobe-link/package.json
|
||||
```
|
||||
@@ -1,263 +0,0 @@
|
||||
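## eBPF beginner's tutorial: writing an eBPF program to monitor opened file paths and visualizing them with Prometheus

### Background

By monitoring the open system call, `opensnoop` shows information about every process in the system that calls open.

### Run with ecli in one step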
|
||||
|
||||
```console
|
||||
$ # Download and install the ecli binary
|
||||
$ wget https://aka.pw/bpf-ecli -O ./ecli && chmod +x ./ecli
|
||||
$ # Run directly from a URL
|
||||
$ ./ecli run https://eunomia-bpf.github.io/eunomia-bpf/opensnoop/package.json
|
||||
|
||||
running and waiting for the ebpf events from perf event...
|
||||
time ts pid uid ret flags comm fname
|
||||
00:58:08 0 812 0 9 524288 vmtoolsd /etc/mtab
|
||||
00:58:08 0 812 0 11 0 vmtoolsd /proc/devices
|
||||
00:58:08 0 34351 0 24 524288 ecli /etc/localtime
|
||||
00:58:08 0 812 0 9 0 vmtoolsd /sys/class/block/sda5/../device/../../../class
|
||||
00:58:08 0 812 0 -2 0 vmtoolsd /sys/class/block/sda5/../device/../../../label
|
||||
00:58:08 0 812 0 9 0 vmtoolsd /sys/class/block/sda1/../device/../../../class
|
||||
00:58:08 0 812 0 -2 0 vmtoolsd /sys/class/block/sda1/../device/../../../label
|
||||
00:58:08 0 812 0 9 0 vmtoolsd /run/systemd/resolve/resolv.conf
|
||||
00:58:08 0 812 0 9 0 vmtoolsd /proc/net/route
|
||||
00:58:08 0 812 0 9 0 vmtoolsd /proc/net/ipv6_route
|
||||
```
|
||||
|
||||
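### Implementation

With eunomia-bpf you only need to write the kernel-side program; no user-space scaffolding code is required. The code you write has two parts:

- The header file opensnoop.h, which defines the C structs to export:
- The source file opensnoop.bpf.c, which contains the BPF code:

Header file opensnoop.h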
|
||||
|
||||
```c
|
||||
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
|
||||
#ifndef __OPENSNOOP_H
|
||||
#define __OPENSNOOP_H
|
||||
|
||||
#define TASK_COMM_LEN 16
|
||||
#define NAME_MAX 255
|
||||
#define INVALID_UID ((uid_t)-1)
|
||||
|
||||
// used for export event
|
||||
struct event {
|
||||
/* user terminology for pid: */
|
||||
unsigned long long ts;
|
||||
int pid;
|
||||
int uid;
|
||||
int ret;
|
||||
int flags;
|
||||
char comm[TASK_COMM_LEN];
|
||||
char fname[NAME_MAX];
|
||||
};
|
||||
|
||||
#endif /* __OPENSNOOP_H */
|
||||
```
|
||||
|
||||
The implementation of `opensnoop` is fairly simple. It attaches handlers to the `sys_enter_open` and `sys_enter_openat` tracepoints, so they are triggered whenever an open system call is made. It likewise attaches handlers to the corresponding `sys_exit_open` and `sys_exit_openat` tracepoints.

Source file opensnoop.bpf.c
|
||||
|
||||
```c
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
// Copyright (c) 2019 Facebook
|
||||
// Copyright (c) 2020 Netflix
|
||||
#include <vmlinux.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include "opensnoop.h"
|
||||
|
||||
struct args_t {
|
||||
const char *fname;
|
||||
int flags;
|
||||
};
|
||||
|
||||
const volatile pid_t targ_pid = 0;
|
||||
const volatile pid_t targ_tgid = 0;
|
||||
const volatile uid_t targ_uid = 0;
|
||||
const volatile bool targ_failed = false;
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, 10240);
|
||||
__type(key, u32);
|
||||
__type(value, struct args_t);
|
||||
} start SEC(".maps");
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
|
||||
__uint(key_size, sizeof(u32));
|
||||
__uint(value_size, sizeof(u32));
|
||||
} events SEC(".maps");
|
||||
|
||||
static __always_inline bool valid_uid(uid_t uid) {
|
||||
return uid != INVALID_UID;
|
||||
}
|
||||
|
||||
static __always_inline
|
||||
bool trace_allowed(u32 tgid, u32 pid)
|
||||
{
|
||||
u32 uid;
|
||||
|
||||
/* filters */
|
||||
if (targ_tgid && targ_tgid != tgid)
|
||||
return false;
|
||||
if (targ_pid && targ_pid != pid)
|
||||
return false;
|
||||
if (valid_uid(targ_uid)) {
|
||||
uid = (u32)bpf_get_current_uid_gid();
|
||||
if (targ_uid != uid) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_enter_open")
|
||||
int tracepoint__syscalls__sys_enter_open(struct trace_event_raw_sys_enter* ctx)
|
||||
{
|
||||
u64 id = bpf_get_current_pid_tgid();
|
||||
/* use kernel terminology here for tgid/pid: */
|
||||
u32 tgid = id >> 32;
|
||||
u32 pid = id;
|
||||
|
||||
/* store arg info for later lookup */
|
||||
if (trace_allowed(tgid, pid)) {
|
||||
struct args_t args = {};
|
||||
args.fname = (const char *)ctx->args[0];
|
||||
args.flags = (int)ctx->args[1];
|
||||
bpf_map_update_elem(&start, &pid, &args, 0);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_enter_openat")
|
||||
int tracepoint__syscalls__sys_enter_openat(struct trace_event_raw_sys_enter* ctx)
|
||||
{
|
||||
u64 id = bpf_get_current_pid_tgid();
|
||||
/* use kernel terminology here for tgid/pid: */
|
||||
u32 tgid = id >> 32;
|
||||
u32 pid = id;
|
||||
|
||||
/* store arg info for later lookup */
|
||||
if (trace_allowed(tgid, pid)) {
|
||||
struct args_t args = {};
|
||||
args.fname = (const char *)ctx->args[1];
|
||||
args.flags = (int)ctx->args[2];
|
||||
bpf_map_update_elem(&start, &pid, &args, 0);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
static __always_inline
|
||||
int trace_exit(struct trace_event_raw_sys_exit* ctx)
|
||||
{
|
||||
struct event event = {};
|
||||
struct args_t *ap;
|
||||
int ret;
|
||||
u32 pid = bpf_get_current_pid_tgid();
|
||||
|
||||
ap = bpf_map_lookup_elem(&start, &pid);
|
||||
if (!ap)
|
||||
return 0; /* missed entry */
|
||||
ret = ctx->ret;
|
||||
if (targ_failed && ret >= 0)
|
||||
goto cleanup; /* want failed only */
|
||||
|
||||
/* event data */
|
||||
event.pid = bpf_get_current_pid_tgid() >> 32;
|
||||
event.uid = bpf_get_current_uid_gid();
|
||||
bpf_get_current_comm(&event.comm, sizeof(event.comm));
|
||||
bpf_probe_read_user_str(&event.fname, sizeof(event.fname), ap->fname);
|
||||
event.flags = ap->flags;
|
||||
event.ret = ret;
|
||||
|
||||
/* emit event */
|
||||
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
|
||||
&event, sizeof(event));
|
||||
|
||||
cleanup:
|
||||
bpf_map_delete_elem(&start, &pid);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_exit_open")
|
||||
int tracepoint__syscalls__sys_exit_open(struct trace_event_raw_sys_exit* ctx)
|
||||
{
|
||||
return trace_exit(ctx);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_exit_openat")
|
||||
int tracepoint__syscalls__sys_exit_openat(struct trace_event_raw_sys_exit* ctx)
|
||||
{
|
||||
return trace_exit(ctx);
|
||||
}
|
||||
|
||||
char LICENSE[] SEC("license") = "GPL";
|
||||
```
|
||||
|
||||
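On entry, `opensnoop` stores the call's arguments (the file name pointer and the flags) in a hash map keyed by the caller's ID. On exit, it looks up that entry by the same key, combines it with the return value, PID, UID, and command name, and sends the event to the user-space handler, which displays it to the user.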
|
||||
|
||||
For the complete example code, see: https://github.com/eunomia-bpf/eunomia-bpf/tree/master/examples/bpftools/opensnoop
|
||||
|
||||
Put the header and the source file in a directory of their own, then compile and run:
|
||||
|
||||
```bash
|
||||
$ # Compile in a container; this generates a package.json file containing the compiled code and some auxiliary information
|
||||
$ docker run -it -v /path/to/opensnoop:/src yunwei37/ebpm:latest
|
||||
$ # Run the eBPF program (root shell)
|
||||
$ sudo ecli run package.json
|
||||
```
|
||||
|
||||
### Prometheus visualization

Write a YAML configuration file:
|
||||
|
||||
```yaml
|
||||
programs:
|
||||
- name: opensnoop
|
||||
metrics:
|
||||
counters:
|
||||
- name: eunomia_file_open_counter
|
||||
description: test
|
||||
labels:
|
||||
- name: pid
|
||||
- name: comm
|
||||
- name: filename
|
||||
from: fname
|
||||
compiled_ebpf_filename: package.json
|
||||
```
|
||||
|
||||
Use eunomia-exporter to export the data to Prometheus:

- Download eunomia-exporter from https://github.com/eunomia-bpf/eunomia-bpf/releases
|
||||
|
||||
```console
|
||||
$ ls
|
||||
config.yaml eunomia-exporter package.json
|
||||
$ sudo ./eunomia-exporter
|
||||
|
||||
Running ebpf program opensnoop takes 46 ms
|
||||
Listening on http://127.0.0.1:8526
|
||||
running and waiting for the ebpf events from perf event...
|
||||
Receiving request at path /metrics
|
||||
```
|
||||
|
||||

|
||||
|
||||
### Summary and references

By tracing the open system call, `opensnoop` makes it easy to see which processes in the system are currently calling open.

References:

- Source code: https://github.com/eunomia-bpf/eunomia-bpf/tree/master/examples/bpftools/opensnoop
- libbpf reference code: https://github.com/iovisor/bcc/blob/master/libbpf-tools/opensnoop.bpf.c
- eunomia-bpf manual: https://eunomia-bpf.github.io/
|
||||
@@ -1,281 +1,86 @@
|
||||
---
|
||||
layout: post
|
||||
title: opensnoop
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, syscall]
|
||||
summary: opensnoop traces the open() syscall system-wide, and prints various details.
|
||||
---
|
||||
# eBPF Beginner's Development Tutorial 4: Capturing the open family of system calls and filtering by process PID with a global variable in eBPF
|
||||
|
||||
eBPF (Extended Berkeley Packet Filter) is a powerful networking and performance analysis tool in the Linux kernel. It lets developers dynamically load, update, and run user-defined code while the kernel is running.

This is the fourth part of the eBPF beginner's development tutorial. It shows how to capture the family of system calls a process uses to open files, and how to filter by process PID in eBPF with a global variable.

## Capturing the open family of system calls in eBPF

First, we write an eBPF program that captures the system calls a process uses to open files:

```c
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2019 Facebook
// Copyright (c) 2020 Netflix
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include "opensnoop.h"

/// Process ID to trace
const volatile int pid_target = 0;

SEC("tracepoint/syscalls/sys_enter_open")
int tracepoint__syscalls__sys_enter_open(struct trace_event_raw_sys_enter* ctx)
{
	u64 id = bpf_get_current_pid_tgid();
	/* The upper 32 bits hold the process ID (TGID). */
	u32 pid = id >> 32;

	if (pid_target && pid_target != pid)
		return 0;
	// Use bpf_printk to print the process information
	bpf_printk("Process ID: %d enter sys open\n", pid);
	return 0;
}

SEC("tracepoint/syscalls/sys_enter_openat")
int tracepoint__syscalls__sys_enter_openat(struct trace_event_raw_sys_enter* ctx)
{
	u64 id = bpf_get_current_pid_tgid();
	/* The upper 32 bits hold the process ID (TGID). */
	u32 pid = id >> 32;

	if (pid_target && pid_target != pid)
		return 0;
	// Use bpf_printk to print the process information
	bpf_printk("Process ID: %d enter sys openat\n", pid);
	return 0;
}

/// Trace open family syscalls.
char LICENSE[] SEC("license") = "GPL";
```

## origin

The kernel code originates from:

<https://github.com/iovisor/bcc/blob/master/libbpf-tools/opensnoop.bpf.c>

result:

```console
$ sudo ecli examples/bpftools/opensnoop/package.json -h
Usage: opensnoop_bpf [--help] [--version] [--verbose] [--pid_target VAR] [--tgid_target VAR] [--uid_target VAR] [--failed]

Trace open family syscalls.

Optional arguments:
-h, --help shows help message and exits
-v, --version prints version information and exits
--verbose prints libbpf debug information
--pid_target Process ID to trace
--tgid_target Thread ID to trace
--uid_target User ID to trace
-f, --failed trace only failed events

Built with eunomia-bpf framework.
See https://github.com/eunomia-bpf/eunomia-bpf for more information.

$ sudo ecli examples/bpftools/opensnoop/package.json
TIME TS PID UID RET FLAGS COMM FNAME
20:31:50 0 1 0 51 524288 systemd /proc/614/cgroup
20:31:50 0 33182 0 25 524288 ecli /etc/localtime
20:31:53 0 754 0 6 0 irqbalance /proc/interrupts
20:31:53 0 754 0 6 0 irqbalance /proc/stat
20:32:03 0 754 0 6 0 irqbalance /proc/interrupts
20:32:03 0 754 0 6 0 irqbalance /proc/stat
20:32:03 0 632 0 7 524288 vmtoolsd /etc/mtab
20:32:03 0 632 0 9 0 vmtoolsd /proc/devices

$ sudo ecli examples/bpftools/opensnoop/package.json --pid_target 754
TIME TS PID UID RET FLAGS COMM FNAME
20:34:13 0 754 0 6 0 irqbalance /proc/interrupts
20:34:13 0 754 0 6 0 irqbalance /proc/stat
20:34:23 0 754 0 6 0 irqbalance /proc/interrupts
20:34:23 0 754 0 6 0 irqbalance /proc/stat
```
|
||||
|
||||
## Compile and Run
|
||||
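The eBPF program above defines two functions, tracepoint__syscalls__sys_enter_open and tracepoint__syscalls__sys_enter_openat, and uses the SEC macro to attach them to the sys_enter_open and sys_enter_openat tracepoints, so they run on entry to the open and openat system calls. Each function calls bpf_get_current_pid_tgid to get the ID of the process calling open or openat and prints it to the kernel trace log with bpf_printk.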
|
||||
|
||||
Compile with docker:
|
||||
|
||||
```shell
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
or compile with `ecc`:
|
||||
Compile and run the code above:
|
||||
|
||||
```console
|
||||
$ ecc opensnoop.bpf.c opensnoop.h
|
||||
$ ecc opensnoop.bpf.c
|
||||
Compiling bpf object...
|
||||
Generating export types...
|
||||
Packing ebpf object and config into package.json...
|
||||
$ sudo ecli package.json
|
||||
Runing eBPF program...
|
||||
```
|
||||
|
||||
Run:
|
||||
|
||||
```shell
|
||||
sudo ./ecli run examples/bpftools/opensnoop/package.json
|
||||
```
|
||||
|
||||
## details in bcc
|
||||
|
||||
Demonstrations of opensnoop, the Linux eBPF/bcc version.
|
||||
|
||||
opensnoop traces the open() syscall system-wide, and prints various details.
|
||||
Example output:
|
||||
Once the program is running, read /sys/kernel/debug/tracing/trace_pipe to see the output of the eBPF program:
|
||||
|
||||
```console
|
||||
# ./opensnoop
|
||||
PID COMM FD ERR PATH
|
||||
17326 <...> 7 0 /sys/kernel/debug/tracing/trace_pipe
|
||||
1576 snmpd 9 0 /proc/net/dev
|
||||
1576 snmpd 11 0 /proc/net/if_inet6
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv4/neigh/eth0/retrans_time_ms
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv6/neigh/eth0/retrans_time_ms
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv6/conf/eth0/forwarding
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv6/neigh/eth0/base_reachable_time_ms
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv4/neigh/lo/retrans_time_ms
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv6/neigh/lo/retrans_time_ms
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv6/conf/lo/forwarding
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv6/neigh/lo/base_reachable_time_ms
|
||||
1576 snmpd 9 0 /proc/diskstats
|
||||
1576 snmpd 9 0 /proc/stat
|
||||
1576 snmpd 9 0 /proc/vmstat
|
||||
1956 supervise 9 0 supervise/status.new
|
||||
1956 supervise 9 0 supervise/status.new
|
||||
17358 run 3 0 /etc/ld.so.cache
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libtinfo.so.5
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libdl.so.2
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libc.so.6
|
||||
17358 run -1 6 /dev/tty
|
||||
17358 run 3 0 /proc/meminfo
|
||||
17358 run 3 0 /etc/nsswitch.conf
|
||||
17358 run 3 0 /etc/ld.so.cache
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libnss_compat.so.2
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libnsl.so.1
|
||||
17358 run 3 0 /etc/ld.so.cache
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libnss_nis.so.2
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libnss_files.so.2
|
||||
17358 run 3 0 /etc/passwd
|
||||
17358 run 3 0 ./run
|
||||
^C
|
||||
```

While tracing, the snmpd process opened various /proc files (reading metrics),
and a "run" process read various libraries and config files (looks like it
was starting up: a new process).

opensnoop can be useful for discovering configuration and log files, if used
during application startup.

The -p option can be used to filter on a PID, which is filtered in-kernel. Here
I've used it with -T to print timestamps:

```console
# ./opensnoop -Tp 1956
|
||||
TIME(s) PID COMM FD ERR PATH
|
||||
0.000000000 1956 supervise 9 0 supervise/status.new
|
||||
0.000289999 1956 supervise 9 0 supervise/status.new
|
||||
1.023068000 1956 supervise 9 0 supervise/status.new
|
||||
1.023381997 1956 supervise 9 0 supervise/status.new
|
||||
2.046030000 1956 supervise 9 0 supervise/status.new
|
||||
2.046363000 1956 supervise 9 0 supervise/status.new
|
||||
3.068203997 1956 supervise 9 0 supervise/status.new
|
||||
3.068544999 1956 supervise 9 0 supervise/status.new
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
<...>-3840345 [010] d... 3220701.101143: bpf_trace_printk: Process ID: 3840345 enter sys open
|
||||
<...>-3840345 [010] d... 3220701.101179: bpf_trace_printk: Process ID: 3840345 enter sys openat
|
||||
<...>-3840345 [010] d... 3220702.157967: bpf_trace_printk: Process ID: 3840345 enter sys open
|
||||
<...>-3840345 [010] d... 3220702.158000: bpf_trace_printk: Process ID: 3840345 enter sys openat
|
||||
```
|
||||
|
||||
This shows the supervise process is opening the status.new file twice every
|
||||
second.
|
||||
At this point we can capture the system calls a process uses to open files.
|
||||
|
||||
The -U option includes the UID in the output:
|
||||
## Filtering by process PID in eBPF with a global variable
|
||||
|
||||
```console
|
||||
# ./opensnoop -U
|
||||
UID PID COMM FD ERR PATH
|
||||
0 27063 vminfo 5 0 /var/run/utmp
|
||||
103 628 dbus-daemon -1 2 /usr/local/share/dbus-1/system-services
|
||||
103 628 dbus-daemon 18 0 /usr/share/dbus-1/system-services
|
||||
103 628 dbus-daemon -1 2 /lib/dbus-1/system-services
|
||||
```
|
||||
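In the program above we defined a global variable, pid_target, that specifies the PID of the process to capture. The tracepoint__syscalls__sys_enter_open and tracepoint__syscalls__sys_enter_openat functions use this global variable to filter the output so that only the specified process is reported.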
|
||||
|
||||
The -u option filters by UID:
|
||||
Run ecli with -h to see the help information for opensnoop:
|
||||
|
||||
```console
|
||||
# ./opensnoop -Uu 1000
|
||||
UID PID COMM FD ERR PATH
|
||||
1000 30240 ls 3 0 /etc/ld.so.cache
|
||||
1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libselinux.so.1
|
||||
1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libc.so.6
|
||||
1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libpcre.so.3
|
||||
1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libdl.so.2
|
||||
1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libpthread.so.0
|
||||
```
|
||||
```c
|
||||
|
||||
The -x option only prints failed opens:
|
||||
|
||||
```console
|
||||
# ./opensnoop -x
|
||||
PID COMM FD ERR PATH
|
||||
18372 run -1 6 /dev/tty
|
||||
18373 run -1 6 /dev/tty
|
||||
18373 multilog -1 13 lock
|
||||
18372 multilog -1 13 lock
|
||||
18384 df -1 2 /usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo
|
||||
18384 df -1 2 /usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo
|
||||
18384 df -1 2 /usr/share/locale/en_US/LC_MESSAGES/coreutils.mo
|
||||
18384 df -1 2 /usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo
|
||||
18384 df -1 2 /usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo
|
||||
18384 df -1 2 /usr/share/locale/en/LC_MESSAGES/coreutils.mo
|
||||
18385 run -1 6 /dev/tty
|
||||
18386 run -1 6 /dev/tty
|
||||
```
|
||||
|
||||
This caught a df command failing to open a coreutils.mo file, and trying from
|
||||
different directories.
|
||||
|
||||
The ERR column is the system error number. Error number 2 is ENOENT: no such
|
||||
file or directory.
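
As a small illustration of reading the ERR column, the sketch below turns an error number into its message with the standard `strerror` function. The helper name `describe_err` is hypothetical. On Linux, the other values in the output above are 6 (ENXIO) and 13 (EACCES).

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: return a human-readable message for a value from
 * the ERR column. ERR is a positive errno value; 0 means the open succeeded. */
const char *describe_err(int err)
{
    return err == 0 ? "success" : strerror(err);
}
```

For example, `describe_err(ENOENT)` yields the "No such file or directory" text that corresponds to the failed df lookups.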

A maximum tracing duration can be set with the -d option. For example, to trace
for 2 seconds:

```console
# ./opensnoop -d 2
PID COMM FD ERR PATH
2191 indicator-multi 11 0 /sys/block
2191 indicator-multi 11 0 /sys/block
2191 indicator-multi 11 0 /sys/block
2191 indicator-multi 11 0 /sys/block
2191 indicator-multi 11 0 /sys/block

```

The -n option can be used to filter on process name using partial matches:

```console
# ./opensnoop -n ed

PID COMM FD ERR PATH
2679 sed 3 0 /etc/ld.so.cache
2679 sed 3 0 /lib/x86_64-linux-gnu/libselinux.so.1
2679 sed 3 0 /lib/x86_64-linux-gnu/libc.so.6
2679 sed 3 0 /lib/x86_64-linux-gnu/libpcre.so.3
2679 sed 3 0 /lib/x86_64-linux-gnu/libdl.so.2
2679 sed 3 0 /lib/x86_64-linux-gnu/libpthread.so.0
2679 sed 3 0 /proc/filesystems
2679 sed 3 0 /usr/lib/locale/locale-archive
2679 sed -1 2
2679 sed 3 0 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
2679 sed 3 0 /dev/null
2680 sed 3 0 /etc/ld.so.cache
2680 sed 3 0 /lib/x86_64-linux-gnu/libselinux.so.1
2680 sed 3 0 /lib/x86_64-linux-gnu/libc.so.6
2680 sed 3 0 /lib/x86_64-linux-gnu/libpcre.so.3
2680 sed 3 0 /lib/x86_64-linux-gnu/libdl.so.2
2680 sed 3 0 /lib/x86_64-linux-gnu/libpthread.so.0
2680 sed 3 0 /proc/filesystems
2680 sed 3 0 /usr/lib/locale/locale-archive
2680 sed -1 2
^C
```

This caught the 'sed' command because it partially matches 'ed' that's passed
to the '-n' option.
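
A partial match of this kind can be expressed as a substring test. The sketch below assumes plain substring semantics and uses a hypothetical helper name, `comm_matches`. It shows the idea only and is not taken from the tool's source.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when `pattern` occurs anywhere in the
 * process name `comm`, so "ed" matches "sed". */
bool comm_matches(const char *comm, const char *pattern)
{
    return strstr(comm, pattern) != NULL;
}
```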

The -e option prints out extra columns; for example, the following output
contains the flags passed to open(2), in octal:

```console
# ./opensnoop -e
PID COMM FD ERR FLAGS PATH
28512 sshd 10 0 00101101 /proc/self/oom_score_adj
28512 sshd 3 0 02100000 /etc/ld.so.cache
28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libwrap.so.0
28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libaudit.so.1
28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libpam.so.0
28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libselinux.so.1
28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libsystemd.so.0
28512 sshd 3 0 02100000 /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.2
28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libutil.so.1
```
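
The FLAGS column is the integer flags argument rendered as zero-padded octal. A minimal sketch of that formatting follows. The helper name `format_flags` is hypothetical, and the 8-digit width is inferred from the sample output above rather than from the tool's source.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format open(2) flags as zero-padded 8-digit octal,
 * matching the look of the FLAGS column. Returns snprintf's result. */
int format_flags(int flags, char *buf, size_t len)
{
    return snprintf(buf, len, "%08o", (unsigned int)flags);
}
```

A caller would pass a buffer of at least 9 bytes to hold the eight digits and the terminating NUL.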

The -f option filters based on flags to the open(2) call, for example:

```console
# ./opensnoop -e -f O_WRONLY -f O_RDWR
PID COMM FD ERR FLAGS PATH
28084 clear_console 3 0 00100002 /dev/tty
28084 clear_console -1 13 00100002 /dev/tty0
28084 clear_console -1 13 00100001 /dev/tty0
28084 clear_console -1 13 00100002 /dev/console
28084 clear_console -1 13 00100001 /dev/console
28051 sshd 8 0 02100002 /var/run/utmp
28051 sshd 7 0 00100001 /var/log/wtmp
```
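
To read the access mode out of a FLAGS value such as those above, mask it with `O_ACCMODE` from `<fcntl.h>`. The sketch below is one way to interpret the column and is not how the -f option itself is implemented. The helper name `access_mode_is` is hypothetical.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical helper: true when the access mode encoded in `flags`
 * equals `wanted` (one of O_RDONLY, O_WRONLY or O_RDWR). */
bool access_mode_is(int flags, int wanted)
{
    return (flags & O_ACCMODE) == wanted;
}
```

Under this reading, 00100002 is a read-write open and 00100001 is a write-only open, which agrees with the O_WRONLY and O_RDWR filters used in the command.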

The --cgroupmap option filters based on a cgroup set. It is meant to be used
with an externally created map.

```console
# ./opensnoop --cgroupmap /sys/fs/bpf/test01
```

For more details, see docs/special_filtering.md
```
@@ -1,12 +0,0 @@
programs:
- name: opensnoop
  metrics:
    counters:
    - name: eunomia_file_open_counter
      description: test
      labels:
      - name: pid
      - name: comm
      - name: filename
        from: fname
  compiled_ebpf_filename: package.json
@@ -5,72 +5,20 @@
#include <bpf/bpf_helpers.h>
#include "opensnoop.h"

struct args_t {
    const char *fname;
    int flags;
};

/// Process ID to trace
const volatile int pid_target = 0;
/// Thread ID to trace
const volatile int tgid_target = 0;
/// @description User ID to trace
const volatile int uid_target = 0;
/// @cmdarg {"default": false, "short": "f", "long": "failed"}
/// @description trace only failed events
const volatile bool targ_failed = false;

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);
    __type(value, struct args_t);
} start SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(u32));
    __uint(value_size, sizeof(u32));
} events SEC(".maps");

static __always_inline bool valid_uid(uid_t uid) {
    return uid != INVALID_UID;
}

static __always_inline
bool trace_allowed(u32 tgid, u32 pid)
{
    u32 uid;

    /* filters */
    if (tgid_target && tgid_target != tgid)
        return false;
    if (pid_target && pid_target != pid)
        return false;
    if (valid_uid(uid_target)) {
        uid = (u32)bpf_get_current_uid_gid();
        if (uid_target != uid) {
            return false;
        }
    }
    return true;
}

SEC("tracepoint/syscalls/sys_enter_open")
int tracepoint__syscalls__sys_enter_open(struct trace_event_raw_sys_enter* ctx)
{
    u64 id = bpf_get_current_pid_tgid();
    /* use kernel terminology here for tgid/pid: */
    u32 tgid = id >> 32;
    u32 pid = id;

    /* store arg info for later lookup */
    if (trace_allowed(tgid, pid)) {
        struct args_t args = {};
        args.fname = (const char *)ctx->args[0];
        args.flags = (int)ctx->args[1];
        bpf_map_update_elem(&start, &pid, &args, 0);
    }
    if (pid_target && pid_target != pid)
        return 0;
    // Use bpf_printk to print the process information
    bpf_printk("Process ID: %d enter sys open\n", pid);
    return 0;
}

@@ -78,63 +26,14 @@ SEC("tracepoint/syscalls/sys_enter_openat")
int tracepoint__syscalls__sys_enter_openat(struct trace_event_raw_sys_enter* ctx)
{
    u64 id = bpf_get_current_pid_tgid();
    /* use kernel terminology here for tgid/pid: */
    u32 tgid = id >> 32;
    u32 pid = id;

    /* store arg info for later lookup */
    if (trace_allowed(tgid, pid)) {
        struct args_t args = {};
        args.fname = (const char *)ctx->args[1];
        args.flags = (int)ctx->args[2];
        bpf_map_update_elem(&start, &pid, &args, 0);
    }
    if (pid_target && pid_target != pid)
        return 0;
    // Use bpf_printk to print the process information
    bpf_printk("Process ID: %d enter sys openat\n", pid);
    return 0;
}

static __always_inline
int trace_exit(struct trace_event_raw_sys_exit* ctx)
{
    struct event event = {};
    struct args_t *ap;
    int ret;
    u32 pid = bpf_get_current_pid_tgid();

    ap = bpf_map_lookup_elem(&start, &pid);
    if (!ap)
        return 0; /* missed entry */
    ret = ctx->ret;
    if (targ_failed && ret >= 0)
        goto cleanup; /* want failed only */

    /* event data */
    event.pid = bpf_get_current_pid_tgid() >> 32;
    event.uid = bpf_get_current_uid_gid();
    bpf_get_current_comm(&event.comm, sizeof(event.comm));
    bpf_probe_read_user_str(&event.fname, sizeof(event.fname), ap->fname);
    event.flags = ap->flags;
    event.ret = ret;

    /* emit event */
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                  &event, sizeof(event));

cleanup:
    bpf_map_delete_elem(&start, &pid);
    return 0;
}

SEC("tracepoint/syscalls/sys_exit_open")
int tracepoint__syscalls__sys_exit_open(struct trace_event_raw_sys_exit* ctx)
{
    return trace_exit(ctx);
}

SEC("tracepoint/syscalls/sys_exit_openat")
int tracepoint__syscalls__sys_exit_openat(struct trace_event_raw_sys_exit* ctx)
{
    return trace_exit(ctx);
}

/// Trace open family syscalls.
char LICENSE[] SEC("license") = "GPL";

@@ -6,12 +6,7 @@
#include "bashreadline.h"

#define TASK_COMM_LEN 16

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} events SEC(".maps");
#define MAX_LINE_SIZE 80

/* Format of u[ret]probe section definition supporting auto-attach:
 * u[ret]probe/binary:function[+offset]
@@ -25,7 +20,7 @@ struct {
 */
SEC("uretprobe//bin/bash:readline")
int BPF_KRETPROBE(printret, const void *ret) {
    struct str_t data;
    char str[MAX_LINE_SIZE];
    char comm[TASK_COMM_LEN];
    u32 pid;

@@ -33,14 +28,11 @@ int BPF_KRETPROBE(printret, const void *ret) {
        return 0;

    bpf_get_current_comm(&comm, sizeof(comm));
    if (comm[0] != 'b' || comm[1] != 'a' || comm[2] != 's' || comm[3] != 'h' || comm[4] != 0 )
        return 0;

    pid = bpf_get_current_pid_tgid() >> 32;
    data.pid = pid;
    bpf_probe_read_user_str(&data.str, sizeof(data.str), ret);
    bpf_probe_read_user_str(str, sizeof(str), ret);

    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &data, sizeof(data));
    bpf_printk("PID %d (%s) read: %s ", pid, comm, str);

    return 0;
};

@@ -12,8 +12,8 @@

- [lesson 0-introduce](0-introduce/README.md) Introduces the basic concepts of eBPF and common development tools
- [lesson 1-helloworld](1-helloworld/README.md) Develops the simplest "Hello World" program with eBPF and introduces the basic eBPF framework and development workflow
- [lesson 2-fentry-unlink](2-fentry-unlink/README.md) Uses fentry in eBPF to capture the unlink system call
- [lesson 3-kprobe-unlink](3-kprobe-unlink/README.md) Uses kprobe in eBPF to capture the unlink system call
- [lesson 2-kprobe-unlink](2-kprobe-unlink/README.md) Uses kprobe in eBPF to capture the unlink system call
- [lesson 3-fentry-unlink](3-fentry-unlink/README.md) Uses fentry in eBPF to capture the unlink system call
- [lesson 4-opensnoop](4-opensnoop/README.md) Captures the set of system calls that processes use to open files, using a global variable to filter by process pid in eBPF
- [lesson 5-uprobe-bashreadline](5-uprobe-bashreadline/README.md) Uses uprobe to capture calls to bash's readline function
- [lesson 6-sigsnoop](6-sigsnoop/README.md) Captures the set of system calls that processes use to send signals, using a hash map to save state
