mirror of
https://github.com/eunomia-bpf/bpf-developer-tutorial.git
synced 2026-03-20 20:05:56 +08:00
init with documents from eunomia-bpf
161
0-introduce/introduce.md
Normal file
@@ -0,0 +1,161 @@
|
||||
# eBPF Beginner's Development Practice Guide, Part 1: Introduction and Quick Start
|
||||
|
||||
<!-- TOC -->
|
||||
|
||||
- [1. What is eBPF](#1-什么是ebpf)
  - [1.1. Origins](#11-起源)
  - [1.2. Execution logic](#12-执行逻辑)
  - [1.3. Architecture](#13-架构)
    - [1.3.1. Register design](#131-寄存器设计)
    - [1.3.2. Instruction encoding](#132-指令编码格式)
  - [1.4. References for this section](#14-本节参考文章)
- [2. How to program with eBPF](#2-如何使用ebpf编程)
  - [2.1. BCC](#21-bcc)
  - [2.2. libbpf-bootstrap](#22-libbpf-bootstrap)
  - [2.3 eunomia-bpf](#23-eunomia-bpf)
|
||||
|
||||
<!-- /TOC -->
|
||||
|
||||
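## 1. What is eBPF

The Linux kernel has always been an ideal place to implement monitoring/observability, networking, and security functionality, but doing so directly in the kernel is not easy. In traditional Linux software development, implementing these features usually means modifying the kernel source or loading a kernel module. Modifying the kernel source is risky: a small mistake can crash the system, and every change has to be validated by recompiling the kernel, which costs time and effort.

Loading a kernel module is more flexible and does not require recompiling the kernel, but it can still crash the kernel, and the module has to be updated as kernel versions change or it stops working.

eBPF emerged against this background. It is a revolutionary technology that runs sandboxed programs in the kernel without modifying the kernel source or loading kernel modules. Through the interfaces it provides, users can trace and monitor the system from inside the kernel.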
|
||||
|
||||
### 1.1. Origins

eBPF grew out of BPF (Berkeley Packet Filter), which Steven McCanne and Van Jacobson proposed in 1992 in their [paper](https://www.tcpdump.org/papers/bpf-usenix93.pdf). Their goal was to provide a new method of packet filtering; its model is shown in the figure below.
|
||||

|
||||
|
||||
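Compared with other filtering methods, BPF had two major innovations. First, it used a new virtual machine designed to run efficiently on register-based CPUs. Second, it did not copy the whole packet, only the data relevant to the filter, which improved efficiency considerably. These two innovations made BPF very successful in practice. After being ported to Linux, it was used by upper-layer software such as `libpcap` and `tcpdump`, and it performed very well.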
|
||||
|
||||
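Classic BPF is a 32-bit architecture. Its instruction encoding is:

- 16 bit: opcode
- 8 bit: offset to jump to when the condition is true
- 8 bit: offset to jump to when the condition is false
- 32 bit: a generic field (`k`) whose meaning depends on the opcode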
|
||||
|
||||
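After more than a decade, in 2013 Alexei Starovoitov thoroughly reworked BPF. The result was named eBPF (extended BPF) and was merged into the kernel source in Linux 3.15.
eBPF is a major change from BPF. First, it covers many more use cases: besides packet filtering, it can trace events through existing kernel mechanisms such as `kprobe`, `tracepoint`, and `lsm`. It is also more flexible and more convenient to use. In addition, its JIT compiler was upgraded and the interpreter was replaced, which lets eBPF programs approach native execution performance.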
|
||||
|
||||
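### 1.2. Execution logic

eBPF's execution logic resembles BPF's. eBPF can also be viewed as a register-based miniature "virtual machine" with a custom 64-bit RISC instruction set. It runs natively compiled eBPF programs inside the Linux kernel in a safe, controlled way, with access to a subset of kernel functions and memory.

After writing a program, we compile it with LLVM into an ELF file that uses the BPF instruction set, extract the parts to be loaded, and call a loader function to inject them into the kernel. The user-space program and the bytecode running in the kernel share an eBPF map, which lives in the kernel, to exchange data. To ensure the injected program cannot harm the kernel, the eBPF verifier strictly checks the compiled bytecode before it is loaded.

eBPF programs are event-driven: the program must declare its attach point in advance. Once the compiled program is loaded, it is triggered whenever that attach point is hit and handles the event as written.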
|
||||
|
||||
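### 1.3. Architecture

#### 1.3.1. Register design

eBPF has 11 registers, R0 to R10. Each is 64 bits wide and has a corresponding 32-bit subregister. Instructions have a fixed width of 64 bits.

#### 1.3.2. Instruction encoding

The eBPF instruction encoding is:

- 8 bit: the opcode
- 4 bit: the destination register number used by the instruction
- 4 bit: the source register number used by the instruction
- 16 bit: an offset whose meaning depends on the instruction type
- 32 bit: an immediate value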
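
To make the layout above concrete, here is a minimal user-space sketch that splits one 8-byte instruction into its five fields. It assumes the instruction is stored in little-endian byte order, with the destination register in the low nibble of the second byte. The type `ebpf_insn_fields` and the function `decode_insn` are hypothetical names used only for this illustration; the kernel's own definition is `struct bpf_insn`.

```c
#include <stdint.h>

/* Hypothetical helper type for this sketch (not a kernel or libbpf type). */
struct ebpf_insn_fields {
    uint8_t opcode;   /* 8 bit: operation code */
    uint8_t dst_reg;  /* 4 bit: destination register number */
    uint8_t src_reg;  /* 4 bit: source register number */
    int16_t offset;   /* 16 bit: signed offset */
    int32_t imm;      /* 32 bit: signed immediate */
};

/* Decode one 8-byte instruction, assuming little-endian storage. */
static struct ebpf_insn_fields decode_insn(const uint8_t raw[8])
{
    struct ebpf_insn_fields f;

    f.opcode  = raw[0];
    f.dst_reg = raw[1] & 0x0f;         /* low nibble of byte 1 */
    f.src_reg = (raw[1] >> 4) & 0x0f;  /* high nibble of byte 1 */
    f.offset  = (int16_t)((uint16_t)raw[2] | ((uint16_t)raw[3] << 8));
    f.imm     = (int32_t)((uint32_t)raw[4] | ((uint32_t)raw[5] << 8) |
                          ((uint32_t)raw[6] << 16) | ((uint32_t)raw[7] << 24));
    return f;
}
```

For example, the bytes `b7 01 00 00 2a 00 00 00` decode to opcode `0xb7` (a 64-bit move of an immediate), destination register 1, source register 0, offset 0, and immediate 42.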
|
||||
|
||||
### 1.4. References for this section

[A thorough introduction to eBPF](https://lwn.net/Articles/740157/)
[Introduction to BPF](https://www.collabora.com/news-and-blog/blog/2019/04/05/an-ebpf-overview-part-1-introduction/)
[BPF architecture](https://www.collabora.com/news-and-blog/blog/2019/04/15/an-ebpf-overview-part-2-machine-and-bytecode/)
|
||||
|
||||
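## 2. How to program with eBPF

Writing raw eBPF programs is tedious and difficult. To improve on this, LLVM gained in 2015 the ability to compile code written in a high-level language into eBPF bytecode. In addition, the kernel community wrapped raw system calls such as `bpf()` in the `libbpf` library, which provides functions for loading bytecode into the kernel along with other key functions. The `samples/bpf/` directory of the Linux source tree contains many `libbpf`-based eBPF examples.

A typical `libbpf`-based eBPF program consists of two files, `*_kern.c` and `*_user.c`. `*_kern.c` contains the kernel attach points and handler functions; `*_user.c` contains the user-space code that loads the kernel-side code and handles interaction with the user. For a more detailed tutorial, see [this video](https://www.bilibili.com/video/BV1f54y1h74r?spm_id_from=333.999.0.0).
This approach is still hard to understand and has a steep learning curve, so most eBPF development today builds on higher-level tools such as: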
|
||||
|
||||
- BCC
|
||||
- BPFtrace
|
||||
- libbpf-bootstrap
|
||||
|
||||
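There are also newer tools, for example `eunomia-bpf`, which runs CO-RE eBPF functionality as a service and consists of a toolchain and a runtime. Its main features are:

- No need to write user-space boilerplate for each eBPF tool: in most cases you only write the kernel-side program, and it is loaded and run correctly. The kernel-side code is fully compatible with libbpf, so migration is easy;
- A collector for custom observability data, built on async Rust and exporting to Prometheus or OpenTelemetry, that typically uses less than 1% of resources. Writing the kernel-side code and a yaml configuration file is enough to visualize eBPF data, and once compiled the program can be deployed to other machines directly through an API request;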
|
||||
|
||||
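### 2.1. BCC

BCC stands for BPF Compiler Collection. The project is a Python library that includes a complete toolchain for writing, compiling, and loading BPF programs, as well as tools for debugging and diagnosing performance problems.

Since its release in 2015, BCC has been refined by hundreds of contributors and now includes a large number of ready-to-use tracing tools. [Its official repository](https://github.com/iovisor/bcc/blob/master/docs/tutorial.md) provides an approachable tutorial that gets users started with BCC quickly.

Users can program against BCC in high-level languages such as Python and Lua. Compared with programming directly in C, these languages are far more convenient: only the in-kernel BPF program has to be written in C, and BCC takes care of the rest, including compiling, parsing, and loading.

BCC's drawback is poor portability. A BCC-based eBPF program is compiled every time it runs, and compilation requires the user to have the matching headers and their implementations set up. As anyone who has dealt with them knows, build dependencies are a thorny problem in practice. For this reason we dropped BCC in this project and chose libbpf-bootstrap, which supports compile once, run many times.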
|
||||
|
||||
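### 2.2. libbpf-bootstrap

`libbpf-bootstrap` is a scaffold for BPF development built on the `libbpf` library. Its source is available on [GitHub](https://github.com/libbpf/libbpf-bootstrap).

`libbpf-bootstrap` distills years of practice in the BPF community and gives developers a modern, convenient workflow that achieves compile once, run repeatedly.

BPF programs based on `libbpf-bootstrap` follow a naming convention for source files: the file that produces the kernel-side bytecode ends in `.bpf.c`, the user-space file that loads the bytecode ends in `.c`, and the two files must share the same prefix.

At build time, the `*.bpf.c` file is first compiled into a corresponding `.o` file, from which a skeleton file, `*.skel.h`, is generated. The skeleton contains some of the data structures defined on the kernel side and the key functions for loading the kernel-side code. After the user-space code `include`s this file, calling the corresponding load function loads the bytecode into the kernel. `libbpf-bootstrap` also has a thorough getting-started tutorial; detailed instructions are available [here](https://nakryiko.com/posts/libbpf-bootstrap/).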
|
||||
|
||||
### 2.3 eunomia-bpf
|
||||
|
||||
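Developing, building, and distributing eBPF programs has always had a high barrier to entry. Tools such as BCC and bpftrace offer high development efficiency and good portability, but deployment requires a compiler environment such as LLVM and Clang, and a local or remote compilation runs on every execution, which consumes significant resources. Using native CO-RE libbpf, on the other hand, requires writing a fair amount of user-space loading code to load the eBPF program correctly and to collect the data it reports from the kernel, and there is still no good solution for distributing and managing eBPF programs.

[eunomia-bpf](https://github.com/eunomia-bpf/eunomia-bpf) is an open-source eBPF dynamic-loading runtime and development toolchain. It is a lightweight, libbpf-based CO-RE development framework designed to simplify developing, building, distributing, and running eBPF programs.

With eunomia-bpf you can:

- Write only kernel-side code when building an eBPF program or tool; the data exported from the kernel is collected automatically;
- Develop the user-space side in WASM, controlling the loading and execution of the whole eBPF program and processing its data from inside the WASM virtual machine;
- Package precompiled eBPF programs as generic JSON or WASM modules that can be distributed across architectures and kernel versions and loaded dynamically without recompiling.

eunomia-bpf consists of a compilation toolchain and a runtime library. Compared with traditional frameworks such as BCC and native libbpf, it greatly simplifies the eBPF development workflow: in most cases, writing the kernel-side code is enough to build, package, and publish a complete eBPF application, and the kernel-side eBPF code stays 100% compatible with mainstream frameworks such as libbpf, libbpfgo, and libbpf-rs. When user-space code is needed, WebAssembly makes it possible to write it in a variety of languages. Compared with scripting tools such as bpftrace, eunomia-bpf keeps similar convenience but is not limited to tracing; it can also be used for networking, security, and other scenarios.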
|
||||
|
||||
> - eunomia-bpf project on GitHub: <https://github.com/eunomia-bpf/eunomia-bpf>
> - gitee mirror: <https://gitee.com/anolis/eunomia>

## References
|
||||
6
1-helloworld/.gitignore
vendored
Normal file
@@ -0,0 +1,6 @@
|
||||
.vscode
|
||||
package.json
|
||||
*.o
|
||||
*.skel.json
|
||||
*.skel.yaml
|
||||
package.yaml
|
||||
57
1-helloworld/README.md
Normal file
@@ -0,0 +1,57 @@
|
||||
---
|
||||
layout: post
|
||||
title: minimal
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, tracepoint, example, syscall]
|
||||
summary: a minimal example of a BPF application that installs a tracepoint handler triggered by the write syscall
|
||||
---
|
||||
|
||||
|
||||
`minimal` is just that: a minimal practical BPF application example. It
doesn't use or require BPF CO-RE, so it should run on quite old kernels. It
installs a tracepoint handler that is triggered by every `write` syscall. It uses
the `bpf_printk()` BPF helper to communicate with the world.
|
||||
|
||||
|
||||
```console
|
||||
$ sudo ecli examples/bpftools/minimal/package.json
|
||||
Runing eBPF program...
|
||||
```
|
||||
|
||||
To see its output,
read the `/sys/kernel/debug/tracing/trace_pipe` file as root:
|
||||
|
||||
```shell
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
<...>-3840345 [010] d... 3220701.101143: bpf_trace_printk: BPF triggered from PID 3840345.
|
||||
<...>-3840345 [010] d... 3220702.101265: bpf_trace_printk: BPF triggered from PID 3840345.
|
||||
```
|
||||
|
||||
`minimal` is great as a bare-bones experimental playground to quickly try out
|
||||
new ideas or BPF features.
|
||||
|
||||
## Compile and Run
|
||||
|
||||
|
||||
|
||||
Compile:
|
||||
|
||||
```console
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
or compile with `ecc`:
|
||||
|
||||
```console
|
||||
$ ecc minimal.bpf.c
|
||||
Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
Run:
|
||||
|
||||
```console
|
||||
sudo ecli ./package.json
|
||||
```
|
||||
21
1-helloworld/minimal.bpf.c
Normal file
@@ -0,0 +1,21 @@
|
||||
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
|
||||
#define BPF_NO_GLOBAL_DATA
|
||||
#include <linux/bpf.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
|
||||
typedef unsigned int u32;
|
||||
typedef int pid_t;
|
||||
const pid_t pid_filter = 0;
|
||||
|
||||
char LICENSE[] SEC("license") = "Dual BSD/GPL";
|
||||
|
||||
SEC("tp/syscalls/sys_enter_write")
|
||||
int handle_tp(void *ctx)
|
||||
{
|
||||
pid_t pid = bpf_get_current_pid_tgid() >> 32;
|
||||
if (pid_filter && pid != pid_filter)
|
||||
return 0;
|
||||
bpf_printk("BPF triggered from PID %d.\n", pid);
|
||||
return 0;
|
||||
}
|
||||
6
10-lsm-connect/.gitignore
vendored
Normal file
@@ -0,0 +1,6 @@
|
||||
.vscode
|
||||
package.json
|
||||
*.o
|
||||
*.skel.json
|
||||
*.skel.yaml
|
||||
package.yaml
|
||||
34
10-lsm-connect/README.md
Normal file
@@ -0,0 +1,34 @@
|
||||
---
|
||||
layout: post
|
||||
title: lsm-connect
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, examples, lsm, no-output]
|
||||
summary: BPF LSM program (on the socket_connect hook) that blocks any connection to 1.1.1.1. Found in demo-cloud-native-ebpf-day
|
||||
---
|
||||
|
||||
|
||||
## Compile and Run

Compile:
|
||||
|
||||
```console
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
or compile with `ecc`:
|
||||
|
||||
```console
|
||||
$ ecc lsm-connect.bpf.c
|
||||
Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
Run:
|
||||
|
||||
```console
|
||||
sudo ecli examples/bpftools/lsm-connect/package.json
|
||||
```
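
The `blockme` constant in `lsm-connect.bpf.c` is the address 1.1.1.1 written as a 32-bit integer, 16843009. The sketch below shows one way to derive such a constant in user space. The function names `ipv4_s_addr` and `ipv4_numeric` are hypothetical and exist only for this illustration. Because `sin_addr.s_addr` holds the four octets in network byte order, the integer seen by the program depends on host endianness for most addresses; 1.1.1.1 is a special case where both readings agree.

```c
#include <stdint.h>
#include <string.h>

/* Value of sin_addr.s_addr for a.b.c.d: the four octets laid out in memory
 * in network order, reinterpreted as a host integer. */
static uint32_t ipv4_s_addr(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    uint8_t bytes[4] = { a, b, c, d };
    uint32_t value;

    memcpy(&value, bytes, sizeof(value));
    return value;
}

/* The same address as a plain number with `a` as the most significant octet. */
static uint32_t ipv4_numeric(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}
```

To block a different address, a constant computed the way `ipv4_s_addr` does it is the one that matches what the program reads from `addr->sin_addr.s_addr`.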
|
||||
|
||||
## reference
|
||||
|
||||
<https://github.com/leodido/demo-cloud-native-ebpf-day>
|
||||
41
10-lsm-connect/lsm-connect.bpf.c
Normal file
@@ -0,0 +1,41 @@
|
||||
#include "vmlinux.h"
|
||||
#include <bpf/bpf_core_read.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
|
||||
char LICENSE[] SEC("license") = "GPL";
|
||||
|
||||
#define EPERM 1
|
||||
#define AF_INET 2
|
||||
|
||||
const __u32 blockme = 16843009; // 1.1.1.1 -> int
|
||||
|
||||
SEC("lsm/socket_connect")
|
||||
int BPF_PROG(restrict_connect, struct socket *sock, struct sockaddr *address, int addrlen, int ret)
|
||||
{
|
||||
// Satisfying "cannot override a denial" rule
|
||||
if (ret != 0)
|
||||
{
|
||||
return ret;
|
||||
}
|
||||
|
||||
// Only IPv4 in this example
|
||||
if (address->sa_family != AF_INET)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
|
||||
// Cast the address to an IPv4 socket address
|
||||
struct sockaddr_in *addr = (struct sockaddr_in *)address;
|
||||
|
||||
// Where do you want to go?
|
||||
__u32 dest = addr->sin_addr.s_addr;
|
||||
bpf_printk("lsm: found connect to %d", dest);
|
||||
|
||||
if (dest == blockme)
|
||||
{
|
||||
bpf_printk("lsm: blocking %d", dest);
|
||||
return -EPERM;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
10
11-tc/.gitignore
vendored
Executable file
@@ -0,0 +1,10 @@
|
||||
.vscode
|
||||
package.json
|
||||
*.wasm
|
||||
ewasm-skel.h
|
||||
ecli
|
||||
ewasm
|
||||
*.o
|
||||
*.skel.json
|
||||
*.skel.yaml
|
||||
package.yaml
|
||||
56
11-tc/README.md
Normal file
@@ -0,0 +1,56 @@
|
||||
---
|
||||
layout: post
|
||||
title: tc
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, tc, example]
|
||||
summary: a minimal example of a BPF application that uses tc
|
||||
---
|
||||
|
||||
|
||||
`tc` (short for Traffic Control) is an example of handling ingress network traffic.
It creates a qdisc on the `lo` interface and attaches the `tc_ingress` BPF program to it.
It reports the metadata of the IP packets arriving on the `lo` interface.
|
||||
|
||||
```shell
|
||||
$ sudo ecli ./package.json
|
||||
...
|
||||
Successfully started! Please run `sudo cat /sys/kernel/debug/tracing/trace_pipe` to see output of the BPF program.
|
||||
......
|
||||
```
|
||||
|
||||
The `tc` output in `/sys/kernel/debug/tracing/trace_pipe` should look
|
||||
something like this:
|
||||
|
||||
```
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
node-1254811 [007] ..s1 8737831.671074: 0: Got IP packet: tot_len: 79, ttl: 64
|
||||
sshd-1254728 [006] ..s1 8737831.674334: 0: Got IP packet: tot_len: 79, ttl: 64
|
||||
sshd-1254728 [006] ..s1 8737831.674349: 0: Got IP packet: tot_len: 72, ttl: 64
|
||||
node-1254811 [007] ..s1 8737831.674550: 0: Got IP packet: tot_len: 71, ttl: 64
|
||||
```
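
The `tot_len` and `ttl` values in this output come from the IPv4 header that follows the Ethernet header. As a rough user-space analogue of the bounds checks in `tc.bpf.c`, the sketch below reads the same two fields from a raw byte buffer. The names `ip_meta`, `parse_ip_meta`, and the three constants are hypothetical, and unlike the BPF program (which checks `ctx->protocol`) it reads the EtherType from the frame itself.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HDR_LEN    14      /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define SKETCH_IPV4_MIN_LEN   20      /* IPv4 header without options */
#define SKETCH_ETHERTYPE_IPV4 0x0800

struct ip_meta {
    uint16_t tot_len;  /* total length of the IP packet, host byte order */
    uint8_t  ttl;      /* time to live */
};

/* Returns 1 and fills *out if the buffer holds an Ethernet frame carrying a
 * complete minimal IPv4 header; returns 0 otherwise. */
static int parse_ip_meta(const uint8_t *data, size_t len, struct ip_meta *out)
{
    const uint8_t *ip;
    uint16_t ethertype;

    /* Never read past the end of the buffer, as the BPF verifier requires. */
    if (len < SKETCH_ETH_HDR_LEN)
        return 0;

    ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (ethertype != SKETCH_ETHERTYPE_IPV4)
        return 0;

    if (len < SKETCH_ETH_HDR_LEN + SKETCH_IPV4_MIN_LEN)
        return 0;

    ip = data + SKETCH_ETH_HDR_LEN;
    out->tot_len = (uint16_t)((ip[2] << 8) | ip[3]);  /* big-endian on the wire */
    out->ttl = ip[8];
    return 1;
}
```

The two length checks play the same role as the `(void *)(l2 + 1) > data_end` and `(void *)(l3 + 1) > data_end` comparisons in the BPF program.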
|
||||
|
||||
## Compile and Run
|
||||
|
||||
|
||||
|
||||
Compile:
|
||||
|
||||
```console
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
or compile with `ecc`:
|
||||
|
||||
```console
|
||||
$ ecc tc.bpf.c
|
||||
Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
Run:
|
||||
|
||||
```console
|
||||
sudo ecli ./package.json
|
||||
```
|
||||
36
11-tc/tc.bpf.c
Normal file
@@ -0,0 +1,36 @@
|
||||
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
||||
/* Copyright (c) 2022 Hengqi Chen */
|
||||
#include <vmlinux.h>
|
||||
#include <bpf/bpf_endian.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
|
||||
#define TC_ACT_OK 0
|
||||
#define ETH_P_IP 0x0800 /* Internet Protocol packet */
|
||||
|
||||
/// @tchook {"ifindex":1, "attach_point":"BPF_TC_INGRESS"}
|
||||
/// @tcopts {"handle":1, "priority":1}
|
||||
SEC("tc")
|
||||
int tc_ingress(struct __sk_buff *ctx)
|
||||
{
|
||||
void *data_end = (void *)(__u64)ctx->data_end;
|
||||
void *data = (void *)(__u64)ctx->data;
|
||||
struct ethhdr *l2;
|
||||
struct iphdr *l3;
|
||||
|
||||
if (ctx->protocol != bpf_htons(ETH_P_IP))
|
||||
return TC_ACT_OK;
|
||||
|
||||
l2 = data;
|
||||
if ((void *)(l2 + 1) > data_end)
|
||||
return TC_ACT_OK;
|
||||
|
||||
l3 = (struct iphdr *)(l2 + 1);
|
||||
if ((void *)(l3 + 1) > data_end)
|
||||
return TC_ACT_OK;
|
||||
|
||||
bpf_printk("Got IP packet: tot_len: %d, ttl: %d", bpf_ntohs(l3->tot_len), l3->ttl);
|
||||
return TC_ACT_OK;
|
||||
}
|
||||
|
||||
char __license[] SEC("license") = "GPL";
|
||||
3
12-bindsnoop/.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
|
||||
.vscode
|
||||
package.json
|
||||
ecli
|
||||
106
12-bindsnoop/README.md
Normal file
@@ -0,0 +1,106 @@
|
||||
---
|
||||
layout: post
|
||||
title: bindsnoop
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, syscall, kprobe, perf-event]
|
||||
summary: This tool traces the kernel function performing socket binding and prints the socket options set before the system call.
|
||||
---
|
||||
|
||||
## origin
|
||||
|
||||
origin from:
|
||||
|
||||
https://github.com/iovisor/bcc/blob/master/libbpf-tools/bindsnoop.bpf.c
|
||||
|
||||
## Compile and Run
|
||||
|
||||
Compile:
|
||||
|
||||
```shell
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
Run:
|
||||
|
||||
```shell
|
||||
sudo ./ecli run examples/bpftools/bindsnoop/package.json
|
||||
```
|
||||
|
||||
## details in bcc
|
||||
|
||||
Demonstrations of bindsnoop, the Linux eBPF/bcc version.
|
||||
|
||||
This tool traces the kernel function performing socket binding and
prints the socket options set before the system call invocation that might
impact bind behavior and bound interface:

```console
|
||||
SOL_IP IP_FREEBIND F....
|
||||
SOL_IP IP_TRANSPARENT .T...
|
||||
SOL_IP IP_BIND_ADDRESS_NO_PORT ..N..
|
||||
SOL_SOCKET SO_REUSEADDR ...R.
|
||||
SOL_SOCKET SO_REUSEPORT ....r
|
||||
```
|
||||
```console
|
||||
# ./bindsnoop.py
|
||||
Tracing binds ... Hit Ctrl-C to end
|
||||
PID COMM PROT ADDR PORT OPTS IF
|
||||
3941081 test_bind_op TCP 192.168.1.102 0 F.N.. 0
|
||||
3940194 dig TCP :: 62087 ..... 0
|
||||
3940219 dig UDP :: 48665 ..... 0
|
||||
3940893 Acceptor Thr TCP :: 35343 ...R. 0
|
||||
```
|
||||
The output shows four bind system calls:
one from a "test_bind_op" instance with the IP_FREEBIND and IP_BIND_ADDRESS_NO_PORT
options set, two from a dig process that called bind for a TCP and a UDP socket,
and one from Acceptor, which called bind for TCP with the SO_REUSEADDR option set.
|
||||
|
||||
|
||||
The -t option prints a timestamp column:
|
||||
```console
|
||||
# ./bindsnoop.py -t
|
||||
TIME(s) PID COMM PROT ADDR PORT OPTS IF
|
||||
0.000000 3956801 dig TCP :: 49611 ..... 0
|
||||
0.011045 3956822 dig UDP :: 56343 ..... 0
|
||||
2.310629 3956498 test_bind_op TCP 192.168.1.102 39609 F...r 0
|
||||
```
|
||||
|
||||
The -U option prints a UID column:
|
||||
```console
|
||||
# ./bindsnoop.py -U
|
||||
Tracing binds ... Hit Ctrl-C to end
|
||||
UID PID COMM PROT ADDR PORT OPTS IF
|
||||
127072 3956498 test_bind_op TCP 192.168.1.102 44491 F...r 0
|
||||
127072 3960261 Acceptor Thr TCP :: 48869 ...R. 0
|
||||
0 3960729 Acceptor Thr TCP :: 44637 ...R. 0
|
||||
0 3959075 chef-client UDP :: 61722 ..... 0
|
||||
```
|
||||
|
||||
The -u option filters by UID:
|
||||
```console
|
||||
# ./bindsnoop.py -Uu 0
|
||||
Tracing binds ... Hit Ctrl-C to end
|
||||
UID PID COMM PROT ADDR PORT OPTS IF
|
||||
0 3966330 Acceptor Thr TCP :: 39319 ...R. 0
|
||||
0 3968044 python3.7 TCP ::1 59371 ..... 0
|
||||
0 10224 fetch TCP 0.0.0.0 42091 ...R. 0
|
||||
```
|
||||
|
||||
The --cgroupmap option filters based on a cgroup set.
|
||||
It is meant to be used with an externally created map.
|
||||
```console
|
||||
# ./bindsnoop.py --cgroupmap /sys/fs/bpf/test01
|
||||
```
|
||||
For more details, see docs/special_filtering.md
|
||||
|
||||
|
||||
To track heavy bind usage, one can use the --count option:
|
||||
```console
|
||||
# ./bindsnoop.py --count
|
||||
Tracing binds ... Hit Ctrl-C to end
|
||||
LADDR LPORT BINDS
|
||||
0.0.0.0 6771 4
|
||||
0.0.0.0 4433 4
|
||||
127.0.0.1 33665 1
|
||||
```
|
||||
151
12-bindsnoop/bindsnoop.bpf.c
Normal file
@@ -0,0 +1,151 @@
|
||||
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
|
||||
/* Copyright (c) 2021 Hengqi Chen */
|
||||
#include <vmlinux.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_core_read.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
#include <bpf/bpf_endian.h>
|
||||
#include "bindsnoop.bpf.h"
|
||||
|
||||
#define MAX_ENTRIES 10240
|
||||
#define MAX_PORTS 1024
|
||||
|
||||
const volatile bool filter_cg = false;
|
||||
const volatile pid_t target_pid = 0;
|
||||
const volatile bool ignore_errors = true;
|
||||
const volatile bool filter_by_port = false;
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
|
||||
__type(key, u32);
|
||||
__type(value, u32);
|
||||
__uint(max_entries, 1);
|
||||
} cgroup_map SEC(".maps");
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, MAX_ENTRIES);
|
||||
__type(key, __u32);
|
||||
__type(value, struct socket *);
|
||||
} sockets SEC(".maps");
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, MAX_PORTS);
|
||||
__type(key, __u16);
|
||||
__type(value, __u16);
|
||||
} ports SEC(".maps");
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
|
||||
__uint(key_size, sizeof(__u32));
|
||||
__uint(value_size, sizeof(__u32));
|
||||
} events SEC(".maps");
|
||||
|
||||
static int probe_entry(struct pt_regs *ctx, struct socket *socket)
|
||||
{
|
||||
__u64 pid_tgid = bpf_get_current_pid_tgid();
|
||||
__u32 pid = pid_tgid >> 32;
|
||||
__u32 tid = (__u32)pid_tgid;
|
||||
|
||||
if (target_pid && target_pid != pid)
|
||||
return 0;
|
||||
|
||||
bpf_map_update_elem(&sockets, &tid, &socket, BPF_ANY);
|
||||
return 0;
|
||||
};
|
||||
|
||||
static int probe_exit(struct pt_regs *ctx, short ver)
|
||||
{
|
||||
__u64 pid_tgid = bpf_get_current_pid_tgid();
|
||||
__u32 pid = pid_tgid >> 32;
|
||||
__u32 tid = (__u32)pid_tgid;
|
||||
struct socket **socketp, *socket;
|
||||
struct inet_sock *inet_sock;
|
||||
struct sock *sock;
|
||||
union bind_options opts;
|
||||
struct bind_event event = {};
|
||||
__u16 sport = 0, *port;
|
||||
int ret;
|
||||
|
||||
socketp = bpf_map_lookup_elem(&sockets, &tid);
|
||||
if (!socketp)
|
||||
return 0;
|
||||
|
||||
ret = PT_REGS_RC(ctx);
|
||||
if (ignore_errors && ret != 0)
|
||||
goto cleanup;
|
||||
|
||||
socket = *socketp;
|
||||
sock = BPF_CORE_READ(socket, sk);
|
||||
inet_sock = (struct inet_sock *)sock;
|
||||
|
||||
sport = bpf_ntohs(BPF_CORE_READ(inet_sock, inet_sport));
|
||||
port = bpf_map_lookup_elem(&ports, &sport);
|
||||
if (filter_by_port && !port)
|
||||
goto cleanup;
|
||||
|
||||
opts.fields.freebind = BPF_CORE_READ_BITFIELD_PROBED(inet_sock, freebind);
|
||||
opts.fields.transparent = BPF_CORE_READ_BITFIELD_PROBED(inet_sock, transparent);
|
||||
opts.fields.bind_address_no_port = BPF_CORE_READ_BITFIELD_PROBED(inet_sock, bind_address_no_port);
|
||||
opts.fields.reuseaddress = BPF_CORE_READ_BITFIELD_PROBED(sock, __sk_common.skc_reuse);
|
||||
opts.fields.reuseport = BPF_CORE_READ_BITFIELD_PROBED(sock, __sk_common.skc_reuseport);
|
||||
event.opts = opts.data;
|
||||
event.ts_us = bpf_ktime_get_ns() / 1000;
|
||||
event.pid = pid;
|
||||
event.port = sport;
|
||||
event.bound_dev_if = BPF_CORE_READ(sock, __sk_common.skc_bound_dev_if);
|
||||
event.ret = ret;
|
||||
event.proto = BPF_CORE_READ_BITFIELD_PROBED(sock, sk_protocol);
|
||||
bpf_get_current_comm(&event.task, sizeof(event.task));
|
||||
if (ver == 4) {
|
||||
event.ver = ver;
|
||||
bpf_probe_read_kernel(&event.addr, sizeof(event.addr), &inet_sock->inet_saddr);
|
||||
} else { /* ver == 6 */
|
||||
event.ver = ver;
|
||||
bpf_probe_read_kernel(&event.addr, sizeof(event.addr), sock->__sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32);
|
||||
}
|
||||
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &event, sizeof(event));
|
||||
|
||||
cleanup:
|
||||
bpf_map_delete_elem(&sockets, &tid);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("kprobe/inet_bind")
|
||||
int BPF_KPROBE(ipv4_bind_entry, struct socket *socket)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
return probe_entry(ctx, socket);
|
||||
}
|
||||
|
||||
SEC("kretprobe/inet_bind")
|
||||
int BPF_KRETPROBE(ipv4_bind_exit)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
return probe_exit(ctx, 4);
|
||||
}
|
||||
|
||||
SEC("kprobe/inet6_bind")
|
||||
int BPF_KPROBE(ipv6_bind_entry, struct socket *socket)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
return probe_entry(ctx, socket);
|
||||
}
|
||||
|
||||
SEC("kretprobe/inet6_bind")
|
||||
int BPF_KRETPROBE(ipv6_bind_exit)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
return probe_exit(ctx, 6);
|
||||
}
|
||||
|
||||
char LICENSE[] SEC("license") = "Dual BSD/GPL";
|
||||
31
12-bindsnoop/bindsnoop.bpf.h
Normal file
@@ -0,0 +1,31 @@
|
||||
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
|
||||
#ifndef __BINDSNOOP_H
|
||||
#define __BINDSNOOP_H
|
||||
|
||||
#define TASK_COMM_LEN 16
|
||||
|
||||
struct bind_event {
|
||||
unsigned __int128 addr;
|
||||
unsigned long long ts_us;
|
||||
unsigned int pid;
|
||||
unsigned int bound_dev_if;
|
||||
int ret;
|
||||
unsigned short port;
|
||||
unsigned short proto;
|
||||
unsigned char opts;
|
||||
unsigned char ver;
|
||||
char task[TASK_COMM_LEN];
|
||||
};
|
||||
|
||||
union bind_options {
|
||||
unsigned char data;
|
||||
struct {
|
||||
unsigned char freebind : 1;
|
||||
unsigned char transparent : 1;
|
||||
unsigned char bind_address_no_port : 1;
|
||||
unsigned char reuseaddress : 1;
|
||||
unsigned char reuseport : 1;
|
||||
} fields;
|
||||
};
|
||||
|
||||
#endif /* __BINDSNOOP_H */
|
||||
95
12-bindsnoop/bindsnoop.md
Normal file
@@ -0,0 +1,95 @@
|
||||
## eBPF Beginner's Practice Tutorial: Writing the eBPF Program Bindsnoop to Monitor Socket Port Binding Events

### Background

Bindsnoop traces the kernel functions that perform socket port binding and prints the socket options that were set before the system call, since those options may affect the binding.

### How it works

Bindsnoop is implemented with kprobes. Its main attach points are inet_bind and inet6_bind: inet_bind handles the bind system call for IPv4 sockets, and inet6_bind handles it for IPv6 sockets.
|
||||
|
||||
```c
|
||||
SEC("kprobe/inet_bind")
|
||||
int BPF_KPROBE(ipv4_bind_entry, struct socket *socket)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
return probe_entry(ctx, socket);
|
||||
}
|
||||
SEC("kretprobe/inet_bind")
|
||||
|
||||
int BPF_KRETPROBE(ipv4_bind_exit)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
return probe_exit(ctx, 4);
|
||||
}
|
||||
|
||||
SEC("kprobe/inet6_bind")
|
||||
int BPF_KPROBE(ipv6_bind_entry, struct socket *socket)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
return probe_entry(ctx, socket);
|
||||
}
|
||||
|
||||
SEC("kretprobe/inet6_bind")
|
||||
int BPF_KRETPROBE(ipv6_bind_exit)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
return probe_exit(ctx, 6);
|
||||
}
|
||||
```
|
||||
|
||||
When the system performs a socket port binding, the handlers attached via kprobe are triggered. On entry to the bind function, `probe_entry` runs first and stores the socket pointer in a map keyed by tid.
|
||||
|
||||
```c
|
||||
static int probe_entry(struct pt_regs *ctx, struct socket *socket)
|
||||
{
|
||||
__u64 pid_tgid = bpf_get_current_pid_tgid();
|
||||
__u32 pid = pid_tgid >> 32;
|
||||
__u32 tid = (__u32)pid_tgid;
|
||||
|
||||
if (target_pid && target_pid != pid)
|
||||
return 0;
|
||||
|
||||
bpf_map_update_elem(&sockets, &tid, &socket, BPF_ANY);
|
||||
return 0;
|
||||
};
|
||||
```
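
`bpf_get_current_pid_tgid()` returns a single 64-bit value with the process ID (tgid) in the upper 32 bits and the thread ID in the lower 32 bits, which is why `probe_entry` shifts for one and truncates for the other. A minimal user-space sketch of that split follows; the helper names `tgid_of` and `tid_of` are hypothetical.

```c
#include <stdint.h>

/* Upper 32 bits: the process ID as user space sees it (kernel tgid). */
static uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: the ID of the individual thread. */
static uint32_t tid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```

Keying the map by thread ID rather than process ID fits this use because the entry probe and the return probe for one call run on the same thread, so concurrent binds from different threads of a process do not overwrite each other.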
|
||||
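After the bind function returns, `probe_exit` runs. It looks up the socket stored under the current tid, writes it together with other information into an event structure, and sends the event to user space.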
|
||||
|
||||
```c
|
||||
struct bind_event {
|
||||
unsigned __int128 addr;
|
||||
__u64 ts_us;
|
||||
__u32 pid;
|
||||
__u32 bound_dev_if;
|
||||
int ret;
|
||||
__u16 port;
|
||||
__u16 proto;
|
||||
__u8 opts;
|
||||
__u8 ver;
|
||||
char task[TASK_COMM_LEN];
|
||||
};
|
||||
```
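
The `opts` byte in this structure packs the five one-bit flags of `union bind_options`. As a sketch of how user space might turn it into an OPTS column such as `F.N..`, the function below maps each bit to a letter. It assumes bit 0 is `freebind` and bit 4 is `reuseport`, which is how GCC and Clang typically lay out these bitfields on little-endian targets but is not guaranteed by the C standard; `format_bind_opts` is a hypothetical name.

```c
#include <stdint.h>

/* Render the five option bits as a flag string such as "F.N..".
 * Assumed bit order (an assumption of this sketch):
 *   bit 0 freebind, bit 1 transparent, bit 2 bind_address_no_port,
 *   bit 3 reuseaddress, bit 4 reuseport.
 * `out` must have room for 6 bytes including the terminator. */
static void format_bind_opts(uint8_t opts, char out[6])
{
    static const char letters[5] = { 'F', 'T', 'N', 'R', 'r' };
    int i;

    for (i = 0; i < 5; i++)
        out[i] = (opts & (1u << i)) ? letters[i] : '.';
    out[5] = '\0';
}
```

With that bit order, a socket that has only `freebind` and `bind_address_no_port` set is rendered as `F.N..`.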
|
||||
|
||||
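When the user stops the tool, its user-space code reads the stored data and prints it as requested.

### Usage in Eunomia

![result](../imgs/bindsnoop.png)
![result](../imgs/bindsnoop-prometheus.png)

### Summary

Bindsnoop monitors socket port binding through kprobe attach points, extending the range of scenarios Eunomia covers.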
|
||||
2
13-tcpconnlat/.gitignore
vendored
Normal file
@@ -0,0 +1,2 @@
|
||||
.vscode
|
||||
package.json
|
||||
137
13-tcpconnlat/README.md
Normal file
@@ -0,0 +1,137 @@
|
||||
---
|
||||
layout: post
|
||||
title: tcpconnlat
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, syscall, network]
|
||||
summary: Traces the kernel function performing active TCP connections (e.g., via a connect() syscall; accept() calls are passive connections) and shows connection latency.
|
||||
---
|
||||
|
||||
## origin
|
||||
|
||||
origin from:
|
||||
|
||||
https://github.com/iovisor/bcc/blob/master/libbpf-tools/tcpconnlat.bpf.c
|
||||
|
||||
## Compile and Run
|
||||
|
||||
Compile:
|
||||
|
||||
```shell
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
Run:
|
||||
|
||||
```shell
|
||||
sudo ./ecli run package.json
|
||||
```
|
||||
|
||||
TODO: support union in C
|
||||
|
||||
## details in bcc
|
||||
|
||||
Demonstrations of tcpconnect, the Linux eBPF/bcc version.
|
||||
|
||||
|
||||
This tool traces the kernel function performing active TCP connections
|
||||
(eg, via a connect() syscall; accept() are passive connections). Some example
|
||||
output (IP addresses changed to protect the innocent):
|
||||
```console
|
||||
# ./tcpconnect
|
||||
PID COMM IP SADDR DADDR DPORT
|
||||
1479 telnet 4 127.0.0.1 127.0.0.1 23
|
||||
1469 curl 4 10.201.219.236 54.245.105.25 80
|
||||
1469 curl 4 10.201.219.236 54.67.101.145 80
|
||||
1991 telnet 6 ::1 ::1 23
|
||||
2015 ssh 6 fe80::2000:bff:fe82:3ac fe80::2000:bff:fe82:3ac 22
|
||||
```
|
||||
This output shows five connections: two from "telnet" processes, two from
"curl", and one from "ssh". The output details show the IP version, source
address, destination address, and destination port. This traces attempted
connections: these may have failed.
|
||||
|
||||
The overhead of this tool should be negligible, since it is only tracing the
|
||||
kernel functions performing connect. It is not tracing every packet and then
|
||||
filtering.
|
||||
|
||||
|
||||
The -t option prints a timestamp column:
|
||||
```console
|
||||
# ./tcpconnect -t
|
||||
TIME(s) PID COMM IP SADDR DADDR DPORT
|
||||
31.871 2482 local_agent 4 10.103.219.236 10.251.148.38 7001
|
||||
31.874 2482 local_agent 4 10.103.219.236 10.101.3.132 7001
|
||||
31.878 2482 local_agent 4 10.103.219.236 10.171.133.98 7101
|
||||
90.917 2482 local_agent 4 10.103.219.236 10.251.148.38 7001
|
||||
90.928 2482 local_agent 4 10.103.219.236 10.102.64.230 7001
|
||||
90.938 2482 local_agent 4 10.103.219.236 10.115.167.169 7101
|
||||
```
|
||||
The output shows some periodic connections (or attempts) from a "local_agent"
|
||||
process to various other addresses. A few connections occur every minute.
|
||||
|
||||
The -d option tracks DNS responses and tries to associate each connection with
|
||||
a previous DNS query issued before it. If a DNS response matching the IP
|
||||
is found, it will be printed. If no match was found, "No DNS Query" is printed
|
||||
in this column. Queries for 127.0.0.1 and ::1 are automatically associated with
|
||||
"localhost". If the time between when the DNS response was received and a
|
||||
connect call was traced exceeds 100ms, the tool will print the time delta
|
||||
after the query name. See below for www.domain.com for an example.
|
||||
```console
|
||||
# ./tcpconnect -d
|
||||
PID COMM IP SADDR DADDR DPORT QUERY
|
||||
1543 amazon-ssm-a 4 10.66.75.54 176.32.119.67 443 ec2messages.us-west-1.amazonaws.com
|
||||
1479 telnet 4 127.0.0.1 127.0.0.1 23 localhost
|
||||
1469 curl 4 10.201.219.236 54.245.105.25 80 www.domain.com (123.342ms)
|
||||
1469 curl 4 10.201.219.236 54.67.101.145 80 No DNS Query
|
||||
1991 telnet 6 ::1 ::1 23 localhost
|
||||
2015 ssh 6 fe80::2000:bff:fe82:3ac fe80::2000:bff:fe82:3ac 22 anotherhost.org
|
||||
```
|
||||
|
||||
The -L option prints a LPORT column:
|
||||
```console
|
||||
# ./tcpconnect -L
|
||||
PID COMM IP SADDR LPORT DADDR DPORT
|
||||
3706 nc 4 192.168.122.205 57266 192.168.122.150 5000
|
||||
3722 ssh 4 192.168.122.205 50966 192.168.122.150 22
|
||||
3779 ssh 6 fe80::1 52328 fe80::2 22
|
||||
```
|
||||
|
||||
The -U option prints a UID column:
|
||||
```console
|
||||
# ./tcpconnect -U
|
||||
UID PID COMM IP SADDR DADDR DPORT
|
||||
0 31333 telnet 6 ::1 ::1 23
|
||||
0 31333 telnet 4 127.0.0.1 127.0.0.1 23
|
||||
1000 31322 curl 4 127.0.0.1 127.0.0.1 80
|
||||
1000 31322 curl 6 ::1 ::1 80
|
||||
```
|
||||
|
||||
The -u option filters by UID:
|
||||
```console
|
||||
# ./tcpconnect -Uu 1000
|
||||
UID PID COMM IP SADDR DADDR DPORT
|
||||
1000 31338 telnet 6 ::1 ::1 23
|
||||
1000 31338 telnet 4 127.0.0.1 127.0.0.1 23
|
||||
```
|
||||
To spot heavy outbound connections quickly one can use the -c flag. It will
|
||||
count all active connections per source ip and destination ip/port.
|
||||
```console
|
||||
# ./tcpconnect.py -c
|
||||
Tracing connect ... Hit Ctrl-C to end
|
||||
^C
|
||||
LADDR RADDR RPORT CONNECTS
|
||||
192.168.10.50 172.217.21.194 443 70
|
||||
192.168.10.50 172.213.11.195 443 34
|
||||
192.168.10.50 172.212.22.194 443 21
|
||||
[...]
|
||||
```
|
||||
|
||||
The --cgroupmap option filters based on a cgroup set. It is meant to be used
|
||||
with an externally created map.
|
||||
```console
|
||||
# ./tcpconnect --cgroupmap /sys/fs/bpf/test01
|
||||
```
|
||||
For more details, see docs/special_filtering.md
|
||||
|
||||
113
13-tcpconnlat/tcpconnlat.bpf.c
Normal file
@@ -0,0 +1,113 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
// Copyright (c) 2020 Wenbo Zhang
|
||||
#include <vmlinux.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_core_read.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
#include "tcpconnlat.bpf.h"
|
||||
|
||||
#define AF_INET 2
|
||||
#define AF_INET6 10
|
||||
|
||||
const volatile __u64 targ_min_us = 0;
|
||||
const volatile pid_t targ_tgid = 0;
|
||||
|
||||
struct piddata {
|
||||
char comm[TASK_COMM_LEN];
|
||||
u64 ts;
|
||||
u32 tgid;
|
||||
};
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, 4096);
|
||||
__type(key, struct sock *);
|
||||
__type(value, struct piddata);
|
||||
} start SEC(".maps");
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
|
||||
__uint(key_size, sizeof(u32));
|
||||
__uint(value_size, sizeof(u32));
|
||||
} events SEC(".maps");
|
||||
|
||||
static int trace_connect(struct sock *sk)
|
||||
{
|
||||
u32 tgid = bpf_get_current_pid_tgid() >> 32;
|
||||
struct piddata piddata = {};
|
||||
|
||||
if (targ_tgid && targ_tgid != tgid)
|
||||
return 0;
|
||||
|
||||
bpf_get_current_comm(&piddata.comm, sizeof(piddata.comm));
|
||||
piddata.ts = bpf_ktime_get_ns();
|
||||
piddata.tgid = tgid;
|
||||
bpf_map_update_elem(&start, &sk, &piddata, 0);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int handle_tcp_rcv_state_process(void *ctx, struct sock *sk)
|
||||
{
|
||||
struct piddata *piddatap;
|
||||
struct event event = {};
|
||||
s64 delta;
|
||||
u64 ts;
|
||||
|
||||
if (BPF_CORE_READ(sk, __sk_common.skc_state) != TCP_SYN_SENT)
|
||||
return 0;
|
||||
|
||||
piddatap = bpf_map_lookup_elem(&start, &sk);
|
||||
if (!piddatap)
|
||||
return 0;
|
||||
|
||||
ts = bpf_ktime_get_ns();
|
||||
delta = (s64)(ts - piddatap->ts);
|
||||
if (delta < 0)
|
||||
goto cleanup;
|
||||
|
||||
event.delta_us = delta / 1000U;
|
||||
if (targ_min_us && event.delta_us < targ_min_us)
|
||||
goto cleanup;
|
||||
__builtin_memcpy(&event.comm, piddatap->comm,
|
||||
sizeof(event.comm));
|
||||
event.ts_us = ts / 1000;
|
||||
event.tgid = piddatap->tgid;
|
||||
event.lport = BPF_CORE_READ(sk, __sk_common.skc_num);
|
||||
event.dport = BPF_CORE_READ(sk, __sk_common.skc_dport);
|
||||
event.af = BPF_CORE_READ(sk, __sk_common.skc_family);
|
||||
if (event.af == AF_INET) {
|
||||
event.saddr_v4 = BPF_CORE_READ(sk, __sk_common.skc_rcv_saddr);
|
||||
event.daddr_v4 = BPF_CORE_READ(sk, __sk_common.skc_daddr);
|
||||
} else {
|
||||
BPF_CORE_READ_INTO(&event.saddr_v6, sk,
|
||||
__sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32);
|
||||
BPF_CORE_READ_INTO(&event.daddr_v6, sk,
|
||||
__sk_common.skc_v6_daddr.in6_u.u6_addr32);
|
||||
}
|
||||
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
|
||||
&event, sizeof(event));
|
||||
|
||||
cleanup:
|
||||
bpf_map_delete_elem(&start, &sk);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("kprobe/tcp_v4_connect")
|
||||
int BPF_KPROBE(tcp_v4_connect, struct sock *sk)
|
||||
{
|
||||
return trace_connect(sk);
|
||||
}
|
||||
|
||||
SEC("kprobe/tcp_v6_connect")
|
||||
int BPF_KPROBE(tcp_v6_connect, struct sock *sk)
|
||||
{
|
||||
return trace_connect(sk);
|
||||
}
|
||||
|
||||
SEC("kprobe/tcp_rcv_state_process")
|
||||
int BPF_KPROBE(tcp_rcv_state_process, struct sock *sk)
|
||||
{
|
||||
return handle_tcp_rcv_state_process(ctx, sk);
|
||||
}
|
||||
|
||||
char LICENSE[] SEC("license") = "GPL";
|
||||
26
13-tcpconnlat/tcpconnlat.bpf.h
Normal file
@@ -0,0 +1,26 @@
|
||||
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
|
||||
#ifndef __TCPCONNLAT_H
|
||||
#define __TCPCONNLAT_H
|
||||
|
||||
#define TASK_COMM_LEN 16
|
||||
|
||||
struct event {
|
||||
// union {
|
||||
unsigned int saddr_v4;
|
||||
unsigned char saddr_v6[16];
|
||||
// };
|
||||
// union {
|
||||
unsigned int daddr_v4;
|
||||
unsigned char daddr_v6[16];
|
||||
// };
|
||||
char comm[TASK_COMM_LEN];
|
||||
unsigned long long delta_us;
|
||||
unsigned long long ts_us;
|
||||
unsigned int tgid;
|
||||
int af;
|
||||
unsigned short lport;
|
||||
unsigned short dport;
|
||||
};
|
||||
|
||||
|
||||
#endif /* __TCPCONNLAT_H */
|
||||
186
13-tcpconnlat/tcpconnlat.md
Normal file
@@ -0,0 +1,186 @@
|
||||
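## eBPF Beginner's Tutorial: Writing the eBPF Program tcpconnlat to Measure TCP Connection Latency

### Background

In day-to-day backend development, whether you work in C, Java, PHP, or Golang, an interface almost always has to call components such as MySQL or Redis to fetch data, and may also make RPC calls or call other RESTful APIs. Underneath, nearly all of these calls travel over TCP. Among transport-layer protocols, TCP offers reliable connections, retransmission on error, and congestion control, so it is currently more widely used than UDP. TCP also has drawbacks, however, such as the relatively long delay needed to establish a connection. This is one reason alternatives such as QUIC (Quick UDP Internet Connections) have emerged.

Analyzing TCP connection latency is useful both for network performance tuning and for troubleshooting.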
|
||||
|
||||
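### How tcpconnlat works

tcpconnlat traces the kernel functions that perform active TCP connections (for example, those made through the connect() system call) and shows the connection latency as measured locally, that is, the time from sending the SYN to receiving the response packet.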
|
||||
|
||||
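### How a TCP connection is established

The whole connection process is shown in the figure below:

![tcpconnlat1](tcpconnlat1.png)

Let us briefly look at where the time goes in each step:

1. The client sends the SYN packet: the client usually sends the SYN through the connect system call. The cost here is the local CPU time spent in the system call and in softirq handling.
2. The SYN travels to the server: the SYN leaves the client's network card and makes a long-distance trip across the network.
3. The server handles the SYN packet: the kernel receives the packet in softirq context, places the connection in the half-open connection queue, and then sends the SYN/ACK response. This is mostly CPU time.
4. The SYN/ACK travels to the client: another long-distance network trip.
5. The client handles the SYN/ACK: the client kernel receives and processes the SYN/ACK, which takes a few microseconds of CPU time, and then sends the ACK. This is again softirq processing cost.
6. The ACK travels to the server: another long-distance network trip.
7. The server receives the ACK: the server kernel receives and processes the ACK, removes the connection from the half-open connection queue, and places it in the accept queue. This costs one softirq of CPU time.
8. The server-side user process is woken up: the user process blocked in the accept system call is woken up and takes the established connection from the accept queue. This costs one context switch of CPU time.

From the client's point of view, under normal conditions the total time to establish a TCP connection is roughly one network RTT. In some situations, however, the network transfer time rises, the CPU processing overhead grows, or the connection fails outright. Once an unusually long latency has been spotted, it can be analyzed together with other information.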
|
||||
|
||||
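### How the eBPF program works

During the TCP three-way handshake, the Linux kernel maintains two queues:

- the half-open connection queue, also called the SYN queue;
- the fully established connection queue, also called the accept queue.

After the server receives a SYN from a client, the kernel stores the connection in the half-open connection queue and replies with a SYN+ACK. The client then returns an ACK. When the server receives this third-handshake ACK, the kernel removes the connection from the half-open connection queue, creates a new fully established connection, and adds it to the accept queue, where it waits for a process to take it out by calling accept.

Our eBPF code is available here: https://github.com/yunwei37/Eunomia/blob/master/bpftools/tcpconnlat/tcpconnlat.bpf.c

It mainly uses kprobes such as kprobe/tcp_rcv_state_process and kprobe/tcp_v4_connect: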
|
||||
|
||||
```c
|
||||
|
||||
SEC("kprobe/tcp_v4_connect")
|
||||
int BPF_KPROBE(tcp_v4_connect, struct sock *sk)
|
||||
{
|
||||
return trace_connect(sk);
|
||||
}
|
||||
|
||||
SEC("kprobe/tcp_v6_connect")
|
||||
int BPF_KPROBE(tcp_v6_connect, struct sock *sk)
|
||||
{
|
||||
return trace_connect(sk);
|
||||
}
|
||||
|
||||
SEC("kprobe/tcp_rcv_state_process")
|
||||
int BPF_KPROBE(tcp_rcv_state_process, struct sock *sk)
|
||||
{
|
||||
return handle_tcp_rcv_state_process(ctx, sk);
|
||||
}
|
||||
```
|
||||
|
||||
In trace_connect, we track each new TCP connection, record the time at which the connect began, and add it to a map:
|
||||
|
||||
```c
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, 4096);
|
||||
__type(key, struct sock *);
|
||||
__type(value, struct piddata);
|
||||
} start SEC(".maps");
|
||||
|
||||
static int trace_connect(struct sock *sk)
|
||||
{
|
||||
u32 tgid = bpf_get_current_pid_tgid() >> 32;
|
||||
struct piddata piddata = {};
|
||||
|
||||
if (targ_tgid && targ_tgid != tgid)
|
||||
return 0;
|
||||
|
||||
bpf_get_current_comm(&piddata.comm, sizeof(piddata.comm));
|
||||
piddata.ts = bpf_ktime_get_ns();
|
||||
piddata.tgid = tgid;
|
||||
bpf_map_update_elem(&start, &sk, &piddata, 0);
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
|
||||
In handle_tcp_rcv_state_process, we trace received TCP packets, look up the matching connect event in the map, and compute the latency:
|
||||
|
||||
```c
|
||||
static int handle_tcp_rcv_state_process(void *ctx, struct sock *sk)
|
||||
{
|
||||
struct piddata *piddatap;
|
||||
struct event event = {};
|
||||
s64 delta;
|
||||
u64 ts;
|
||||
|
||||
if (BPF_CORE_READ(sk, __sk_common.skc_state) != TCP_SYN_SENT)
|
||||
return 0;
|
||||
|
||||
piddatap = bpf_map_lookup_elem(&start, &sk);
|
||||
if (!piddatap)
|
||||
return 0;
|
||||
|
||||
ts = bpf_ktime_get_ns();
|
||||
delta = (s64)(ts - piddatap->ts);
|
||||
if (delta < 0)
|
||||
goto cleanup;
|
||||
|
||||
event.delta_us = delta / 1000U;
|
||||
if (targ_min_us && event.delta_us < targ_min_us)
|
||||
goto cleanup;
|
||||
__builtin_memcpy(&event.comm, piddatap->comm,
|
||||
sizeof(event.comm));
|
||||
event.ts_us = ts / 1000;
|
||||
event.tgid = piddatap->tgid;
|
||||
event.lport = BPF_CORE_READ(sk, __sk_common.skc_num);
|
||||
event.dport = BPF_CORE_READ(sk, __sk_common.skc_dport);
|
||||
event.af = BPF_CORE_READ(sk, __sk_common.skc_family);
|
||||
if (event.af == AF_INET) {
|
||||
event.saddr_v4 = BPF_CORE_READ(sk, __sk_common.skc_rcv_saddr);
|
||||
event.daddr_v4 = BPF_CORE_READ(sk, __sk_common.skc_daddr);
|
||||
} else {
|
||||
BPF_CORE_READ_INTO(&event.saddr_v6, sk,
|
||||
__sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32);
|
||||
BPF_CORE_READ_INTO(&event.daddr_v6, sk,
|
||||
__sk_common.skc_v6_daddr.in6_u.u6_addr32);
|
||||
}
|
||||
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
|
||||
&event, sizeof(event));
|
||||
|
||||
cleanup:
|
||||
bpf_map_delete_elem(&start, &sk);
|
||||
return 0;
|
||||
}
|
||||
```
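
The timing logic of the handler above can be tried outside the kernel. The following is a minimal user-space sketch, not the tool's actual code: `conn_latency_us` is a hypothetical helper that mirrors the signed delta check and the `targ_min_us` filter, and `port_to_host` is a hypothetical helper for a detail the handler leaves to user space, assuming (as in the Linux kernel) that `skc_dport` is stored in network byte order while `skc_num` is already in host byte order.

```c
#include <stdint.h>
#include <arpa/inet.h> /* ntohs, htons */

/*
 * Hypothetical helper: returns the connection latency in microseconds,
 * or -1 if the sample should be dropped (the clock appears to have gone
 * backwards, or the latency is below min_us; min_us == 0 disables the filter).
 */
static int64_t conn_latency_us(uint64_t start_ns, uint64_t now_ns, uint64_t min_us)
{
	int64_t delta = (int64_t)(now_ns - start_ns);

	if (delta < 0)
		return -1;

	uint64_t delta_us = (uint64_t)delta / 1000U;
	if (min_us && delta_us < min_us)
		return -1;
	return (int64_t)delta_us;
}

/* Hypothetical helper: convert a network-byte-order port for display. */
static uint16_t port_to_host(uint16_t port_be)
{
	return ntohs(port_be);
}
```

A sample that starts at 1,000,000 ns and completes at 1,050,000 ns yields 50 microseconds; with a minimum of 100 microseconds configured, the same sample would be dropped.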
|
||||
|
||||
### Eunomia test demo

Trace from the command line:
|
||||
|
||||
```bash
|
||||
$ sudo build/bin/Release/eunomia run tcpconnlat
|
||||
[sudo] password for yunwei:
|
||||
[2022-08-07 02:13:39.601] [info] eunomia run in cmd...
|
||||
[2022-08-07 02:13:40.534] [info] press 'Ctrl C' key to exit...
|
||||
PID COMM IP SRC DEST PORT LAT(ms) CONATINER/OS
|
||||
3477 openresty 4 172.19.0.7 172.19.0.5 2379 0.05 docker-apisix_apisix_1
|
||||
3483 openresty 4 172.19.0.7 172.19.0.5 2379 0.08 docker-apisix_apisix_1
|
||||
3477 openresty 4 172.19.0.7 172.19.0.5 2379 0.04 docker-apisix_apisix_1
|
||||
3478 openresty 4 172.19.0.7 172.19.0.5 2379 0.05 docker-apisix_apisix_1
|
||||
3478 openresty 4 172.19.0.7 172.19.0.5 2379 0.03 docker-apisix_apisix_1
|
||||
3478 openresty 4 172.19.0.7 172.19.0.5 2379 0.03 docker-apisix_apisix_1
|
||||
```
|
||||
|
||||
eunomia can also be used as a Prometheus exporter. After running the command above, open the visualization panel that ships with Prometheus.

The following query shows a chart of the latency:
|
||||
|
||||
```
|
||||
rate(eunomia_observed_tcpconnlat_v4_histogram_sum[5m])
|
||||
/
|
||||
rate(eunomia_observed_tcpconnlat_v4_histogram_count[5m])
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||

|
||||
|
||||
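### Summary

As the experiment above shows, tcpconnlat works by tracing TCP connections inside the kernel and can report their connection latency. Besides command-line use, it can be combined with metadata from containers and k8s and used with tools such as `prometheus` and `grafana` for network performance analysis.

> `Eunomia` is a lightweight, high-performance, cloud-native monitoring tool based on eBPF and written in C/C++. It aims to help users understand container behavior and monitor suspicious container security events, and strives to provide a lightweight open-source monitoring solution that covers the whole container life cycle. It uses `Linux` `eBPF` to trace your system and applications at runtime and analyzes the collected events to detect suspicious behavior patterns. It currently includes performance analysis, container cluster network visualization*, container security alerting, one-click deployment, and persistent storage monitoring, and offers a variety of eBPF tracing points. Its core exporter/command-line tool needs a binary of only about 4 MB at minimum and can start on any supported Linux kernel.

Project address: https://github.com/yunwei37/Eunomia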
|
||||
|
||||
### References
|
||||
|
||||
1. http://kerneltravel.net/blog/2020/tcpconnlat/
|
||||
2. https://network.51cto.com/article/640631.html
|
||||
BIN
13-tcpconnlat/tcpconnlat1.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 132 KiB |
BIN
13-tcpconnlat/tcpconnlat_p.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 40 KiB |
5
14-tcpstates/.gitignore
vendored
Normal file
@@ -0,0 +1,5 @@
|
||||
.vscode
|
||||
package.json
|
||||
eunomia-exporter
|
||||
ecli
|
||||
|
||||
56
14-tcpstates/README.md
Normal file
@@ -0,0 +1,56 @@
|
||||
---
|
||||
layout: post
|
||||
title: tcpstates
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, syscall, network]
|
||||
summary: Tcpstates prints TCP state change information, including the duration in each state as milliseconds
|
||||
---
|
||||
|
||||
|
||||
## origin
|
||||
|
||||
origin from:
|
||||
|
||||
https://github.com/iovisor/bcc/blob/master/libbpf-tools/tcpstates.bpf.c
|
||||
|
||||
## Compile and Run
|
||||
|
||||
Compile:
|
||||
|
||||
```shell
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
Run:
|
||||
|
||||
```shell
|
||||
sudo ./ecli run package.json
|
||||
```
|
||||
|
||||
## details in bcc
|
||||
|
||||
Demonstrations of tcpstates, the Linux BPF/bcc version.
|
||||
|
||||
|
||||
tcpstates prints TCP state change information, including the duration in each
|
||||
state as milliseconds. For example, a single TCP session:
|
||||
```console
|
||||
# tcpstates
|
||||
SKADDR C-PID C-COMM LADDR LPORT RADDR RPORT OLDSTATE -> NEWSTATE MS
|
||||
ffff9fd7e8192000 22384 curl 100.66.100.185 0 52.33.159.26 80 CLOSE -> SYN_SENT 0.000
|
||||
ffff9fd7e8192000 0 swapper/5 100.66.100.185 63446 52.33.159.26 80 SYN_SENT -> ESTABLISHED 1.373
|
||||
ffff9fd7e8192000 22384 curl 100.66.100.185 63446 52.33.159.26 80 ESTABLISHED -> FIN_WAIT1 176.042
|
||||
ffff9fd7e8192000 0 swapper/5 100.66.100.185 63446 52.33.159.26 80 FIN_WAIT1 -> FIN_WAIT2 0.536
|
||||
ffff9fd7e8192000 0 swapper/5 100.66.100.185 63446 52.33.159.26 80 FIN_WAIT2 -> CLOSE 0.006
|
||||
^C
|
||||
```
|
||||
This showed that the most time was spent in the ESTABLISHED state (which then
|
||||
transitioned to FIN_WAIT1), which was 176.042 milliseconds.
|
||||
|
||||
The first column is the socket address, as the output may include lines from
|
||||
different sessions interleaved. The next two columns show the current on-CPU
|
||||
process ID and command name: these may show the process that owns the TCP
|
||||
session, depending on whether the state change executes synchronously in
|
||||
process context. If that's not the case, they may show kernel details.
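
The OLDSTATE, NEWSTATE and MS columns can be derived from the raw event fields. The sketch below is illustrative only: `tcp_state_name` and `delta_us_to_ms` are hypothetical helpers, and the table assumes the state numbering used by the Linux kernel in `include/net/tcp_states.h` (`TCP_ESTABLISHED` = 1 through `TCP_NEW_SYN_RECV` = 12); any other value is reported as unknown.

```c
#include <stddef.h>

/* State names indexed by the kernel's TCP state numbers (assumed 1..12). */
static const char *const tcp_states[] = {
	[1] = "ESTABLISHED", [2] = "SYN_SENT",    [3] = "SYN_RECV",
	[4] = "FIN_WAIT1",   [5] = "FIN_WAIT2",   [6] = "TIME_WAIT",
	[7] = "CLOSE",       [8] = "CLOSE_WAIT",  [9] = "LAST_ACK",
	[10] = "LISTEN",     [11] = "CLOSING",    [12] = "NEW_SYN_RECV",
};

/* Hypothetical helper: map a numeric state to a printable name. */
static const char *tcp_state_name(int state)
{
	size_t n = sizeof(tcp_states) / sizeof(tcp_states[0]);

	if (state < 1 || (size_t)state >= n)
		return "UNKNOWN";
	return tcp_states[state];
}

/* Hypothetical helper: the event carries microseconds, the MS column shows milliseconds. */
static double delta_us_to_ms(unsigned long long delta_us)
{
	return (double)delta_us / 1000.0;
}
```

For the second line of the sample session above, an event with `oldstate` 2, `newstate` 1 and a `delta_us` of 1373 would be printed as `SYN_SENT -> ESTABLISHED 1.373`.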
|
||||
|
||||
109
14-tcpstates/tcpstates.bpf.c
Normal file
@@ -0,0 +1,109 @@
|
||||
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
||||
/* Copyright (c) 2021 Hengqi Chen */
|
||||
#include <vmlinux.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
#include <bpf/bpf_core_read.h>
|
||||
#include "tcpstates.bpf.h"
|
||||
|
||||
#define MAX_ENTRIES 10240
|
||||
#define AF_INET 2
|
||||
#define AF_INET6 10
|
||||
|
||||
const volatile bool filter_by_sport = false;
|
||||
const volatile bool filter_by_dport = false;
|
||||
const volatile short target_family = 0;
|
||||
|
||||
struct
|
||||
{
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, MAX_ENTRIES);
|
||||
__type(key, __u16);
|
||||
__type(value, __u16);
|
||||
} sports SEC(".maps");
|
||||
|
||||
struct
|
||||
{
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, MAX_ENTRIES);
|
||||
__type(key, __u16);
|
||||
__type(value, __u16);
|
||||
} dports SEC(".maps");
|
||||
|
||||
struct
|
||||
{
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, MAX_ENTRIES);
|
||||
__type(key, struct sock *);
|
||||
__type(value, __u64);
|
||||
} timestamps SEC(".maps");
|
||||
|
||||
struct
|
||||
{
|
||||
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
|
||||
__uint(key_size, sizeof(__u32));
|
||||
__uint(value_size, sizeof(__u32));
|
||||
} events SEC(".maps");
|
||||
|
||||
SEC("tracepoint/sock/inet_sock_set_state")
|
||||
int handle_set_state(struct trace_event_raw_inet_sock_set_state *ctx)
|
||||
{
|
||||
struct sock *sk = (struct sock *)ctx->skaddr;
|
||||
__u16 family = ctx->family;
|
||||
__u16 sport = ctx->sport;
|
||||
__u16 dport = ctx->dport;
|
||||
__u64 *tsp, delta_us, ts;
|
||||
struct event event = {};
|
||||
|
||||
if (ctx->protocol != IPPROTO_TCP)
|
||||
return 0;
|
||||
|
||||
if (target_family && target_family != family)
|
||||
return 0;
|
||||
|
||||
if (filter_by_sport && !bpf_map_lookup_elem(&sports, &sport))
|
||||
return 0;
|
||||
|
||||
if (filter_by_dport && !bpf_map_lookup_elem(&dports, &dport))
|
||||
return 0;
|
||||
|
||||
tsp = bpf_map_lookup_elem(&timestamps, &sk);
|
||||
ts = bpf_ktime_get_ns();
|
||||
if (!tsp)
|
||||
delta_us = 0;
|
||||
else
|
||||
delta_us = (ts - *tsp) / 1000;
|
||||
|
||||
event.skaddr = (__u64)sk;
|
||||
event.ts_us = ts / 1000;
|
||||
event.delta_us = delta_us;
|
||||
event.pid = bpf_get_current_pid_tgid() >> 32;
|
||||
event.oldstate = ctx->oldstate;
|
||||
event.newstate = ctx->newstate;
|
||||
event.family = family;
|
||||
event.sport = sport;
|
||||
event.dport = dport;
|
||||
bpf_get_current_comm(&event.task, sizeof(event.task));
|
||||
|
||||
if (family == AF_INET)
|
||||
{
|
||||
bpf_probe_read_kernel(&event.saddr, sizeof(event.saddr), &sk->__sk_common.skc_rcv_saddr);
|
||||
bpf_probe_read_kernel(&event.daddr, sizeof(event.daddr), &sk->__sk_common.skc_daddr);
|
||||
}
|
||||
else
|
||||
{ /* family == AF_INET6 */
|
||||
bpf_probe_read_kernel(&event.saddr, sizeof(event.saddr), &sk->__sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32);
|
||||
bpf_probe_read_kernel(&event.daddr, sizeof(event.daddr), &sk->__sk_common.skc_v6_daddr.in6_u.u6_addr32);
|
||||
}
|
||||
|
||||
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &event, sizeof(event));
|
||||
|
||||
if (ctx->newstate == TCP_CLOSE)
|
||||
bpf_map_delete_elem(&timestamps, &sk);
|
||||
else
|
||||
bpf_map_update_elem(&timestamps, &sk, &ts, BPF_ANY);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
char LICENSE[] SEC("license") = "Dual BSD/GPL";
|
||||
24
14-tcpstates/tcpstates.bpf.h
Normal file
@@ -0,0 +1,24 @@
|
||||
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
||||
/* Copyright (c) 2021 Hengqi Chen */
|
||||
#ifndef __TCPSTATES_H
|
||||
#define __TCPSTATES_H
|
||||
|
||||
#define TASK_COMM_LEN 16
|
||||
|
||||
struct event
|
||||
{
|
||||
unsigned __int128 saddr;
|
||||
unsigned __int128 daddr;
|
||||
__u64 skaddr;
|
||||
__u64 ts_us;
|
||||
__u64 delta_us;
|
||||
__u32 pid;
|
||||
int oldstate;
|
||||
int newstate;
|
||||
__u16 family;
|
||||
__u16 sport;
|
||||
__u16 dport;
|
||||
char task[TASK_COMM_LEN];
|
||||
};
|
||||
|
||||
#endif /* __TCPSTATES_H */
|
||||
116
15-tcprtt/tcprtt.md
Normal file
@@ -0,0 +1,116 @@
|
||||
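## eBPF Beginner's Tutorial: Writing the eBPF Program Tcprtt to Measure TCP Round-Trip Time

### Background
Network quality matters a great deal in today's connected world. Poor network quality can have many causes, from hardware faults to badly written programs. The `tcprtt` tool was created to make network problems easier to locate. It monitors the round-trip time of TCP connections so that network quality can be analyzed and the source of a problem tracked down.

### How it works
`tcprtt` attaches its handlers to `tcp_rcv_established`, the kernel function that runs when packets are received on an established TCP connection.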
|
||||
```c
|
||||
SEC("fentry/tcp_rcv_established")
|
||||
int BPF_PROG(tcp_rcv, struct sock *sk)
|
||||
{
|
||||
const struct inet_sock *inet = (struct inet_sock *)(sk);
|
||||
struct tcp_sock *ts;
|
||||
struct hist *histp;
|
||||
u64 key, slot;
|
||||
u32 srtt;
|
||||
|
||||
if (targ_sport && targ_sport != inet->inet_sport)
|
||||
return 0;
|
||||
if (targ_dport && targ_dport != sk->__sk_common.skc_dport)
|
||||
return 0;
|
||||
if (targ_saddr && targ_saddr != inet->inet_saddr)
|
||||
return 0;
|
||||
if (targ_daddr && targ_daddr != sk->__sk_common.skc_daddr)
|
||||
return 0;
|
||||
|
||||
if (targ_laddr_hist)
|
||||
key = inet->inet_saddr;
|
||||
else if (targ_raddr_hist)
|
||||
key = inet->sk.__sk_common.skc_daddr;
|
||||
else
|
||||
key = 0;
|
||||
histp = bpf_map_lookup_or_try_init(&hists, &key, &zero);
|
||||
if (!histp)
|
||||
return 0;
|
||||
ts = (struct tcp_sock *)(sk);
|
||||
srtt = BPF_CORE_READ(ts, srtt_us) >> 3;
|
||||
if (targ_ms)
|
||||
srtt /= 1000U;
|
||||
slot = log2l(srtt);
|
||||
if (slot >= MAX_SLOTS)
|
||||
slot = MAX_SLOTS - 1;
|
||||
__sync_fetch_and_add(&histp->slots[slot], 1);
|
||||
if (targ_show_ext) {
|
||||
__sync_fetch_and_add(&histp->latency, srtt);
|
||||
__sync_fetch_and_add(&histp->cnt, 1);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("kprobe/tcp_rcv_established")
|
||||
int BPF_KPROBE(tcp_rcv_kprobe, struct sock *sk)
|
||||
{
|
||||
const struct inet_sock *inet = (struct inet_sock *)(sk);
|
||||
u32 srtt, saddr, daddr;
|
||||
struct tcp_sock *ts;
|
||||
struct hist *histp;
|
||||
u64 key, slot;
|
||||
|
||||
if (targ_sport) {
|
||||
u16 sport;
|
||||
bpf_probe_read_kernel(&sport, sizeof(sport), &inet->inet_sport);
|
||||
if (targ_sport != sport)
|
||||
return 0;
|
||||
}
|
||||
if (targ_dport) {
|
||||
u16 dport;
|
||||
bpf_probe_read_kernel(&dport, sizeof(dport), &sk->__sk_common.skc_dport);
|
||||
if (targ_dport != dport)
|
||||
return 0;
|
||||
}
|
||||
bpf_probe_read_kernel(&saddr, sizeof(saddr), &inet->inet_saddr);
|
||||
if (targ_saddr && targ_saddr != saddr)
|
||||
return 0;
|
||||
bpf_probe_read_kernel(&daddr, sizeof(daddr), &sk->__sk_common.skc_daddr);
|
||||
if (targ_daddr && targ_daddr != daddr)
|
||||
return 0;
|
||||
|
||||
if (targ_laddr_hist)
|
||||
key = saddr;
|
||||
else if (targ_raddr_hist)
|
||||
key = daddr;
|
||||
else
|
||||
key = 0;
|
||||
histp = bpf_map_lookup_or_try_init(&hists, &key, &zero);
|
||||
if (!histp)
|
||||
return 0;
|
||||
ts = (struct tcp_sock *)(sk);
|
||||
bpf_probe_read_kernel(&srtt, sizeof(srtt), &ts->srtt_us);
|
||||
srtt >>= 3;
|
||||
if (targ_ms)
|
||||
srtt /= 1000U;
|
||||
slot = log2l(srtt);
|
||||
if (slot >= MAX_SLOTS)
|
||||
slot = MAX_SLOTS - 1;
|
||||
__sync_fetch_and_add(&histp->slots[slot], 1);
|
||||
if (targ_show_ext) {
|
||||
__sync_fetch_and_add(&histp->latency, srtt);
|
||||
__sync_fetch_and_add(&histp->cnt, 1);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
```
|
||||
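The tool provides both an fentry and a kprobe version of the handler and automatically selects the one that the running system supports.
In the handler, `tcprtt` collects basic information about the TCP connection, including the addresses, source port, destination port, and round-trip time, and updates the histogram map with it. When the run ends, user-space code presents the result to the user.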
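
Two details of the handlers above are easy to miss: the kernel keeps `srtt_us` as the smoothed RTT in microseconds multiplied by 8, which is why the code shifts right by 3, and the histogram slot is the integer base-2 logarithm of the value. A minimal user-space sketch of both steps follows. The names `log2_slot` and `srtt_to_slot` are hypothetical, and `MAX_SLOTS` is assumed to be 27 purely for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_SLOTS 27 /* assumed value, for illustration only */

/* floor(log2(v)) for v > 0; returns 0 for v == 0 */
static unsigned int log2_slot(uint64_t v)
{
	unsigned int slot = 0;

	while (v >>= 1)
		slot++;
	return slot;
}

/* Map a raw srtt_us value (microseconds, scaled by 8) to a histogram slot. */
static unsigned int srtt_to_slot(uint32_t raw_srtt_us, bool in_ms)
{
	uint32_t srtt = raw_srtt_us >> 3; /* undo the x8 scaling */

	if (in_ms)
		srtt /= 1000U;

	unsigned int slot = log2_slot(srtt);
	if (slot >= MAX_SLOTS)
		slot = MAX_SLOTS - 1; /* clamp outliers into the last slot */
	return slot;
}
```

For example, a raw value of 8000 corresponds to 1000 microseconds, which falls in slot 9 (the bucket covering 512 to 1023 microseconds).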
|
||||
|
||||
### Usage in Eunomia


### Summary

By presenting round-trip times as a histogram, `tcprtt` makes network jitter on the current system easy to see and helps developers quickly locate network problems.
|
||||
104
16-profile/profile.md
Normal file
@@ -0,0 +1,104 @@
|
||||
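## eBPF Beginner's Tutorial: Writing the eBPF Program profile for Performance Analysis

### Background

`profile` is a tool for tracing the call flow of running programs, similar to the -g option of perf. Compared with perf, `profile` is more fine-grained: the user can choose the level to trace, for example tracing at the user-space level or at the kernel level.

### How it works

`profile` relies on perf_event in Linux. Before attaching the eBPF program, the `profile` tool first sets up the perf_event.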
|
||||
```c
|
||||
static int open_and_attach_perf_event(int freq, struct bpf_program *prog,
|
||||
struct bpf_link *links[])
|
||||
{
|
||||
struct perf_event_attr attr = {
|
||||
.type = PERF_TYPE_SOFTWARE,
|
||||
.freq = env.freq,
|
||||
.sample_freq = env.sample_freq,
|
||||
.config = PERF_COUNT_SW_CPU_CLOCK,
|
||||
};
|
||||
int i, fd;
|
||||
|
||||
for (i = 0; i < nr_cpus; i++) {
|
||||
if (env.cpu != -1 && env.cpu != i)
|
||||
continue;
|
||||
|
||||
fd = syscall(__NR_perf_event_open, &attr, -1, i, -1, 0);
|
||||
if (fd < 0) {
|
||||
/* Ignore CPU that is offline */
|
||||
if (errno == ENODEV)
|
||||
continue;
|
||||
fprintf(stderr, "failed to init perf sampling: %s\n",
|
||||
strerror(errno));
|
||||
return -1;
|
||||
}
|
||||
links[i] = bpf_program__attach_perf_event(prog, fd);
|
||||
if (!links[i]) {
|
||||
fprintf(stderr, "failed to attach perf event on cpu: "
|
||||
"%d\n", i);
|
||||
links[i] = NULL;
|
||||
close(fd);
|
||||
return -1;
|
||||
}
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
The eBPF program samples the program's stack at regular intervals and thereby captures its execution flow.
|
||||
```c
|
||||
SEC("perf_event")
|
||||
int do_perf_event(struct bpf_perf_event_data *ctx)
|
||||
{
|
||||
__u64 id = bpf_get_current_pid_tgid();
|
||||
__u32 pid = id >> 32;
|
||||
__u32 tid = id;
|
||||
__u64 *valp;
|
||||
static const __u64 zero;
|
||||
struct key_t key = {};
|
||||
|
||||
if (!include_idle && tid == 0)
|
||||
return 0;
|
||||
|
||||
if (targ_pid != -1 && targ_pid != pid)
|
||||
return 0;
|
||||
if (targ_tid != -1 && targ_tid != tid)
|
||||
return 0;
|
||||
|
||||
key.pid = pid;
|
||||
bpf_get_current_comm(&key.name, sizeof(key.name));
|
||||
|
||||
if (user_stacks_only)
|
||||
key.kern_stack_id = -1;
|
||||
else
|
||||
key.kern_stack_id = bpf_get_stackid(&ctx->regs, &stackmap, 0);
|
||||
|
||||
if (kernel_stacks_only)
|
||||
key.user_stack_id = -1;
|
||||
else
|
||||
key.user_stack_id = bpf_get_stackid(&ctx->regs, &stackmap, BPF_F_USER_STACK);
|
||||
|
||||
if (key.kern_stack_id >= 0) {
|
||||
// populate extras to fix the kernel stack
|
||||
__u64 ip = PT_REGS_IP(&ctx->regs);
|
||||
|
||||
if (is_kernel_addr(ip)) {
|
||||
key.kernel_ip = ip;
|
||||
}
|
||||
}
|
||||
|
||||
valp = bpf_map_lookup_or_try_init(&counts, &key, &zero);
|
||||
if (valp)
|
||||
__sync_fetch_and_add(valp, 1);
|
||||
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
In this way, based on the user's options, the tool can trace the execution flow at the user-space level, at the kernel level, or both.
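
The pid/tid split and the stack-selection flags in `do_perf_event` can be illustrated in plain C. This is a sketch under assumptions: `split_pid_tgid` and `wants_stack` are hypothetical helpers that only mirror the logic shown above. The packed value follows the layout returned by `bpf_get_current_pid_tgid()`, with the thread group ID (what user space calls the PID) in the upper 32 bits and the thread ID in the lower 32 bits.

```c
#include <stdint.h>
#include <stdbool.h>

struct ids {
	uint32_t pid; /* thread group ID, what user space calls the PID */
	uint32_t tid; /* kernel thread ID */
};

/* Hypothetical helper: unpack the 64-bit value into its two halves. */
static struct ids split_pid_tgid(uint64_t id)
{
	struct ids out = {
		.pid = (uint32_t)(id >> 32),
		.tid = (uint32_t)id,
	};
	return out;
}

enum stack_kind { KERNEL_STACK, USER_STACK };

/* A stack is skipped only when the opposite "only" flag is set. */
static bool wants_stack(enum stack_kind kind, bool user_stacks_only,
			bool kernel_stacks_only)
{
	if (kind == KERNEL_STACK)
		return !user_stacks_only;
	return !kernel_stacks_only;
}
```

With neither flag set, both stacks are collected, which matches the default behavior of the handler above.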
|
||||
### Usage in Eunomia


### Summary
`profile` analyzes a program's execution flow and can greatly improve developer efficiency in tasks such as debugging.
|
||||
80
17-memleak/memleak.md
Normal file
@@ -0,0 +1,80 @@
|
||||
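## eBPF Beginner's Tutorial: Writing the eBPF Program Memleak to Monitor Memory Leaks

### Background

A memory leak is a serious problem for a program. If a leaking program is left running, system memory is gradually exhausted and the program slows down noticeably. The `memleak` tool was created to prevent this. It traces memory allocation and free requests, matches them against each other, and prints the stack traces of allocations that have not yet been freed.

### How it works

The logic of `memleak` is straightforward. It attaches eBPF programs on the paths of the commonly used dynamic memory allocation functions, and also on free. When an allocation function is called, `memleak` records basic data such as the caller's pid, the address of the allocated memory, and the size of the allocation. After free is called, `memleak` deletes the corresponding allocation record from the map. For common user-space allocation functions such as `malloc` and `calloc`, `memleak` attaches with uprobes; for kernel functions such as `kmalloc`, it uses existing tracepoints.
The main attach points of `memleak` are: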
|
||||
```c
|
||||
SEC("uprobe/malloc")
|
||||
|
||||
SEC("uretprobe/malloc")
|
||||
|
||||
SEC("uprobe/calloc")
|
||||
|
||||
SEC("uretprobe/calloc")
|
||||
|
||||
SEC("uprobe/realloc")
|
||||
|
||||
SEC("uretprobe/realloc")
|
||||
|
||||
SEC("uprobe/memalign")
|
||||
|
||||
SEC("uretprobe/memalign")
|
||||
|
||||
SEC("uprobe/posix_memalign")
|
||||
|
||||
SEC("uretprobe/posix_memalign")
|
||||
|
||||
SEC("uprobe/valloc")
|
||||
|
||||
SEC("uretprobe/valloc")
|
||||
|
||||
SEC("uprobe/pvalloc")
|
||||
|
||||
SEC("uretprobe/pvalloc")
|
||||
|
||||
SEC("uprobe/aligned_alloc")
|
||||
|
||||
SEC("uretprobe/aligned_alloc")
|
||||
|
||||
SEC("uprobe/free")
|
||||
|
||||
SEC("tracepoint/kmem/kmalloc")
|
||||
|
||||
SEC("tracepoint/kmem/kfree")
|
||||
|
||||
|
||||
SEC("tracepoint/kmem/kmalloc_node")
|
||||
|
||||
SEC("tracepoint/kmem/kmem_cache_alloc")
|
||||
|
||||
SEC("tracepoint/kmem/kmem_cache_alloc_node")
|
||||
|
||||
SEC("tracepoint/kmem/kmem_cache_free")
|
||||
|
||||
SEC("tracepoint/kmem/mm_page_alloc")
|
||||
|
||||
SEC("tracepoint/kmem/mm_page_free")
|
||||
|
||||
SEC("tracepoint/percpu/percpu_alloc_percpu")
|
||||
|
||||
SEC("tracepoint/percpu/percpu_free_percpu")
|
||||
|
||||
```
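
The matching of allocations against frees described above can be sketched in user space. This is not the tool's code: the fixed-size table and the names `track_alloc`, `track_free`, and `outstanding_bytes` are hypothetical stand-ins for the BPF hash map that is keyed by the allocated address.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_ALLOCS 64 /* illustrative capacity */

struct alloc_info {
	uint64_t addr; /* 0 marks an unused table entry */
	uint64_t size;
	uint32_t pid;
};

static struct alloc_info allocs[MAX_ALLOCS];

/* Called on the allocation return path: remember what was handed out. */
static int track_alloc(uint32_t pid, uint64_t addr, uint64_t size)
{
	if (addr == 0)
		return -1; /* failed allocation, nothing to track */
	for (size_t i = 0; i < MAX_ALLOCS; i++) {
		if (allocs[i].addr == 0) {
			allocs[i] = (struct alloc_info){ addr, size, pid };
			return 0;
		}
	}
	return -1; /* table full */
}

/* Called on free: forget the matching record, if there is one. */
static int track_free(uint64_t addr)
{
	if (addr == 0)
		return -1; /* free(NULL) matches no record */
	for (size_t i = 0; i < MAX_ALLOCS; i++) {
		if (allocs[i].addr == addr) {
			allocs[i].addr = 0;
			return 0;
		}
	}
	return -1; /* allocation was made before tracing started */
}

/* Whatever is still in the table was allocated but not yet freed. */
static uint64_t outstanding_bytes(void)
{
	uint64_t total = 0;

	for (size_t i = 0; i < MAX_ALLOCS; i++)
		if (allocs[i].addr != 0)
			total += allocs[i].size;
	return total;
}
```

Records that remain in the table for a long time are leak candidates; the real tool additionally stores a stack ID with each record so that the allocating call path can be printed.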
|
||||
|
||||
### Usage in Eunomia


### Summary
`memleak` monitors and traces the family of memory allocation functions. It helps developers catch memory leaks before they turn into serious incidents.
|
||||
121
18-biopattern/biolatency.md
Normal file
@@ -0,0 +1,121 @@
|
||||
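## eBPF Beginner's Tutorial: Writing the eBPF Program Biolatency to Summarize Block I/O Events

### Background

Biolatency counts the I/O events that occur on the system while the tool is running, computes how those events are distributed across latency ranges, and presents the result to the user as a histogram.

### How it works

Biolatency is implemented mainly with tracepoints. It installs handlers at the block_rq_insert, block_rq_issue, and block_rq_complete attach points. At block_rq_insert and block_rq_issue, Biolatency stores the request and the time at which the I/O operation occurred in a map.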
|
||||
```c
|
||||
int trace_rq_start(struct request *rq, int issue)
|
||||
{
|
||||
if (issue && targ_queued && BPF_CORE_READ(rq->q, elevator))
|
||||
return 0;
|
||||
|
||||
u64 ts = bpf_ktime_get_ns();
|
||||
|
||||
if (filter_dev) {
|
||||
struct gendisk *disk = get_disk(rq);
|
||||
u32 dev;
|
||||
|
||||
dev = disk ? MKDEV(BPF_CORE_READ(disk, major),
|
||||
BPF_CORE_READ(disk, first_minor)) : 0;
|
||||
if (targ_dev != dev)
|
||||
return 0;
|
||||
}
|
||||
bpf_map_update_elem(&start, &rq, &ts, 0);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("tp_btf/block_rq_insert")
|
||||
int block_rq_insert(u64 *ctx)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
if (LINUX_KERNEL_VERSION < KERNEL_VERSION(5, 11, 0))
|
||||
return trace_rq_start((void *)ctx[1], false);
|
||||
else
|
||||
return trace_rq_start((void *)ctx[0], false);
|
||||
}
|
||||
|
||||
SEC("tp_btf/block_rq_issue")
|
||||
int block_rq_issue(u64 *ctx)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
if (LINUX_KERNEL_VERSION < KERNEL_VERSION(5, 11, 0))
|
||||
return trace_rq_start((void *)ctx[1], true);
|
||||
else
|
||||
return trace_rq_start((void *)ctx[0], true);
|
||||
}
|
||||
|
||||
```
|
||||
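At the block_rq_complete attach point, Biolatency uses the request to look up the time of the earlier operation in the map, computes the difference from the current time to determine which histogram bucket the operation falls into, and increments the I/O count for that bucket.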
|
||||
```c
|
||||
SEC("tp_btf/block_rq_complete")
|
||||
int BPF_PROG(block_rq_complete, struct request *rq, int error,
|
||||
unsigned int nr_bytes)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
u64 slot, *tsp, ts = bpf_ktime_get_ns();
|
||||
struct hist_key hkey = {};
|
||||
struct hist *histp;
|
||||
s64 delta;
|
||||
|
||||
tsp = bpf_map_lookup_elem(&start, &rq);
|
||||
if (!tsp)
|
||||
return 0;
|
||||
delta = (s64)(ts - *tsp);
|
||||
if (delta < 0)
|
||||
goto cleanup;
|
||||
|
||||
if (targ_per_disk) {
|
||||
struct gendisk *disk = get_disk(rq);
|
||||
|
||||
hkey.dev = disk ? MKDEV(BPF_CORE_READ(disk, major),
|
||||
BPF_CORE_READ(disk, first_minor)) : 0;
|
||||
}
|
||||
if (targ_per_flag)
|
||||
hkey.cmd_flags = rq->cmd_flags;
|
||||
|
||||
histp = bpf_map_lookup_elem(&hists, &hkey);
|
||||
if (!histp) {
|
||||
bpf_map_update_elem(&hists, &hkey, &initial_hist, 0);
|
||||
histp = bpf_map_lookup_elem(&hists, &hkey);
|
||||
if (!histp)
|
||||
goto cleanup;
|
||||
}
|
||||
|
||||
if (targ_ms)
|
||||
delta /= 1000000U;
|
||||
else
|
||||
delta /= 1000U;
|
||||
slot = log2l(delta);
|
||||
if (slot >= MAX_SLOTS)
|
||||
slot = MAX_SLOTS - 1;
|
||||
__sync_fetch_and_add(&histp->slots[slot], 1);
|
||||
|
||||
cleanup:
|
||||
bpf_map_delete_elem(&start, &rq);
|
||||
return 0;
|
||||
}
|
||||
|
||||
```
|
||||
When the user stops the program, the user-space program reads the data in the histogram map and prints it.
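
When printing, each slot index has to be turned back into a range of values. Because the slot is the integer base-2 logarithm of the latency, slot k covers the values from 2^k to 2^(k+1) - 1, and slot 0 holds both 0 and 1. A small sketch of that conversion, using the hypothetical helpers `slot_low` and `slot_high` (they are not part of the tool and are valid for slot indexes below 63):

```c
#include <stdint.h>

/* Smallest value counted in a slot; slot 0 holds both 0 and 1. */
static uint64_t slot_low(unsigned int slot)
{
	return slot == 0 ? 0 : (1ULL << slot);
}

/* Largest value counted in a slot. */
static uint64_t slot_high(unsigned int slot)
{
	return (1ULL << (slot + 1)) - 1;
}
```

With microsecond units, slot 9 therefore stands for latencies of 512 to 1023 microseconds; with millisecond units the same slot stands for 512 to 1023 milliseconds.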
|
||||
|
||||
### Usage in Eunomia


### Summary
Biolatency uses tracepoint attach points to count I/O events and presents them as a histogram, which makes it easy for developers to understand the I/O behavior of the system.
|
||||
48
18-biopattern/biopattern.md
Normal file
@@ -0,0 +1,48 @@
|
||||
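## eBPF Beginner's Tutorial: Writing the eBPF Program Biopattern to Count Random and Sequential Disk I/O

### Background

Biopattern reports the ratio of random to sequential disk I/O operations.

### How it works

The eBPF code of Biopattern is attached at the tracepoint/block/block_rq_complete attach point, which is reached after the disk completes an I/O request. Biopattern keeps a hash table keyed by device number. Each time the attach point is hit, Biopattern obtains the details of the operation, uses the record of the previous operation on that device in the hash table to decide whether this operation is random or sequential I/O, and updates the counters.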
|
||||
|
||||
```c
|
||||
SEC("tracepoint/block/block_rq_complete")
|
||||
int handle__block_rq_complete(struct trace_event_raw_block_rq_complete *ctx)
|
||||
{
|
||||
sector_t *last_sectorp, sector = ctx->sector;
|
||||
struct counter *counterp, zero = {};
|
||||
u32 nr_sector = ctx->nr_sector;
|
||||
dev_t dev = ctx->dev;
|
||||
|
||||
if (targ_dev != -1 && targ_dev != dev)
|
||||
return 0;
|
||||
|
||||
counterp = bpf_map_lookup_or_try_init(&counters, &dev, &zero);
|
||||
if (!counterp)
|
||||
return 0;
|
||||
if (counterp->last_sector) {
|
||||
if (counterp->last_sector == sector)
|
||||
__sync_fetch_and_add(&counterp->sequential, 1);
|
||||
else
|
||||
__sync_fetch_and_add(&counterp->random, 1);
|
||||
__sync_fetch_and_add(&counterp->bytes, nr_sector * 512);
|
||||
}
|
||||
counterp->last_sector = sector + nr_sector;
|
||||
return 0;
|
||||
}
|
||||
|
||||
```
|
||||
After the user stops Biopattern, the user-space program reads the collected counters and prints them.
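The classification rule is easy to try out in user space: a request counts as sequential when it starts exactly where the previous one on the same device ended. The sketch below mirrors the per-device logic of the handler under that reading. The function names `account_io` and `sequential_pct` are hypothetical, and the struct layout is an assumption that only keeps the fields the handler touches.

```c
#include <stdint.h>

/* Per-device state; field names follow the handler, layout is assumed. */
struct counter {
	uint64_t last_sector; /* first sector after the previous request */
	uint64_t bytes;
	uint32_t sequential;
	uint32_t random;
};

/*
 * Account one completed request that starts at `sector` and spans
 * `nr_sector` sectors of 512 bytes. The very first request on a device
 * only records where it ended, because there is nothing to compare with.
 */
static void account_io(struct counter *c, uint64_t sector, uint32_t nr_sector)
{
	if (c->last_sector) {
		if (c->last_sector == sector)
			c->sequential++;
		else
			c->random++;
		c->bytes += (uint64_t)nr_sector * 512;
	}
	c->last_sector = sector + nr_sector;
}

/* Share of sequential requests in percent; 0 when nothing was counted. */
static double sequential_pct(const struct counter *c)
{
	uint64_t total = (uint64_t)c->sequential + c->random;

	return total ? 100.0 * c->sequential / total : 0.0;
}
```

Feeding it requests at sectors 100, 108, and 500 (8 sectors each) yields one sequential and one random request, since the first request is only used as the reference point.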
|
||||
|
||||
### Usage in Eunomia

Not yet integrated.

### Summary

Biopattern shows the ratio of random to sequential disk I/O, which helps developers understand the overall I/O pattern.
|
||||
100
18-biopattern/biostacks.md
Normal file
@@ -0,0 +1,100 @@
|
||||
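## eBPF Beginner's Hands-on Tutorial: Writing the eBPF Program Biostacks to Monitor Kernel I/O Latency

### Background

Some disk I/O is not issued directly by applications, metadata reads and writes for example, so tracing disk I/O alone can leave operations that are hard to explain. Biostacks therefore traces the kernel functions that initialize I/O operations and presents disk I/O latency as a histogram.

### How it works

Biostacks attaches to fentry/blk_account_io_start, kprobe/blk_account_io_merge_bio, and fentry/blk_account_io_done. Both fentry/blk_account_io_start and kprobe/blk_account_io_merge_bio lie on the initialization path that the kernel must take when it issues an I/O operation. When either hook fires, Biostacks stores the request information in a map keyed by the request (`rq`).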
|
||||
```c
|
||||
static __always_inline
|
||||
int trace_start(void *ctx, struct request *rq, bool merge_bio)
|
||||
{
|
||||
struct internal_rqinfo *i_rqinfop = NULL, i_rqinfo = {};
|
||||
struct gendisk *disk = BPF_CORE_READ(rq, rq_disk);
|
||||
dev_t dev;
|
||||
|
||||
dev = disk ? MKDEV(BPF_CORE_READ(disk, major),
|
||||
BPF_CORE_READ(disk, first_minor)) : 0;
|
||||
if (targ_dev != -1 && targ_dev != dev)
|
||||
return 0;
|
||||
|
||||
if (merge_bio)
|
||||
i_rqinfop = bpf_map_lookup_elem(&rqinfos, &rq);
|
||||
if (!i_rqinfop)
|
||||
i_rqinfop = &i_rqinfo;
|
||||
|
||||
i_rqinfop->start_ts = bpf_ktime_get_ns();
|
||||
i_rqinfop->rqinfo.pid = bpf_get_current_pid_tgid();
|
||||
i_rqinfop->rqinfo.kern_stack_size =
|
||||
bpf_get_stack(ctx, i_rqinfop->rqinfo.kern_stack,
|
||||
sizeof(i_rqinfop->rqinfo.kern_stack), 0);
|
||||
bpf_get_current_comm(&i_rqinfop->rqinfo.comm,
|
||||
sizeof(i_rqinfop->rqinfo.comm));
|
||||
i_rqinfop->rqinfo.dev = dev;
|
||||
|
||||
if (i_rqinfop == &i_rqinfo)
|
||||
bpf_map_update_elem(&rqinfos, &rq, i_rqinfop, 0);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("fentry/blk_account_io_start")
|
||||
int BPF_PROG(blk_account_io_start, struct request *rq)
|
||||
{
|
||||
return trace_start(ctx, rq, false);
|
||||
}
|
||||
|
||||
SEC("kprobe/blk_account_io_merge_bio")
|
||||
int BPF_KPROBE(blk_account_io_merge_bio, struct request *rq)
|
||||
{
|
||||
return trace_start(ctx, rq, true);
|
||||
}
|
||||
|
||||
```
|
||||
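When the I/O operation completes, the handler attached to fentry/blk_account_io_done reads the previously stored information from the map, computes the elapsed time from the current timestamp to obtain the latency of the operation, and updates the map that holds the histogram data.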
|
||||
```c
|
||||
SEC("fentry/blk_account_io_done")
|
||||
int BPF_PROG(blk_account_io_done, struct request *rq)
|
||||
{
|
||||
u64 slot, ts = bpf_ktime_get_ns();
|
||||
struct internal_rqinfo *i_rqinfop;
|
||||
struct rqinfo *rqinfop;
|
||||
struct hist *histp;
|
||||
s64 delta;
|
||||
|
||||
i_rqinfop = bpf_map_lookup_elem(&rqinfos, &rq);
|
||||
if (!i_rqinfop)
|
||||
return 0;
|
||||
delta = (s64)(ts - i_rqinfop->start_ts);
|
||||
if (delta < 0)
|
||||
goto cleanup;
|
||||
histp = bpf_map_lookup_or_try_init(&hists, &i_rqinfop->rqinfo, &zero);
|
||||
if (!histp)
|
||||
goto cleanup;
|
||||
if (targ_ms)
|
||||
delta /= 1000000U;
|
||||
else
|
||||
delta /= 1000U;
|
||||
slot = log2l(delta);
|
||||
if (slot >= MAX_SLOTS)
|
||||
slot = MAX_SLOTS - 1;
|
||||
__sync_fetch_and_add(&histp->slots[slot], 1);
|
||||
|
||||
cleanup:
|
||||
bpf_map_delete_elem(&rqinfos, &rq);
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
After the user asks the program to exit, the user-space side reads the information from the histogram map and prints it.
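When printing such a histogram, each slot index has to be turned back into a value range. The sketch below shows one way to do that for the log2 buckets filled by the handler, assuming slot 0 holds the values 0 and 1 and slot k holds 2^k through 2^(k+1)-1. `slot_bounds` and `print_hist` are hypothetical helpers, and the output format is an assumption, not the format of the real tool.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Value range [*low, *high] covered by one log2 histogram slot.
 * Valid for slot < 63.
 */
static void slot_bounds(unsigned int slot, uint64_t *low, uint64_t *high)
{
	*low = slot == 0 ? 0 : (uint64_t)1 << slot;
	*high = ((uint64_t)1 << (slot + 1)) - 1;
}

/* Print every non-empty slot as "low -> high unit : count". */
static void print_hist(const uint32_t *slots, unsigned int n, const char *unit)
{
	for (unsigned int i = 0; i < n; i++) {
		uint64_t low, high;

		if (slots[i] == 0)
			continue;
		slot_bounds(i, &low, &high);
		printf("%10llu -> %-10llu %s : %u\n",
		       (unsigned long long)low, (unsigned long long)high,
		       unit, (unsigned int)slots[i]);
	}
}
```

Calling `print_hist(slots, MAX_SLOTS, "usecs")` on the array read from the map would list only the buckets that received at least one sample.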
|
||||
|
||||
### Usage in Eunomia

### Summary

Biostacks traces I/O operations at their source, which makes disk I/O behavior much easier to understand.
|
||||
63
18-biopattern/bitesize.md
Normal file
@@ -0,0 +1,63 @@
|
||||
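## eBPF Beginner's Hands-on Tutorial: Writing the eBPF Program Bitesize to Monitor Block Device I/O

### Background

Bitesize was developed to show the sizes of the disk blocks that I/O operations request. Once started, it traces the block sizes requested by each process and displays their distribution as a histogram.

### How it works

Bitesize attaches a handler to the block_rq_issue tracepoint. When a process issues a block I/O request to the disk, execution passes through this hook; the handler then reads the request information and stores it in the corresponding map.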
|
||||
```c
|
||||
static int trace_rq_issue(struct request *rq)
|
||||
{
|
||||
struct hist_key hkey;
|
||||
struct hist *histp;
|
||||
u64 slot;
|
||||
|
||||
if (filter_dev) {
|
||||
struct gendisk *disk = get_disk(rq);
|
||||
u32 dev;
|
||||
|
||||
dev = disk ? MKDEV(BPF_CORE_READ(disk, major),
|
||||
BPF_CORE_READ(disk, first_minor)) : 0;
|
||||
if (targ_dev != dev)
|
||||
return 0;
|
||||
}
|
||||
bpf_get_current_comm(&hkey.comm, sizeof(hkey.comm));
|
||||
if (!comm_allowed(hkey.comm))
|
||||
return 0;
|
||||
|
||||
histp = bpf_map_lookup_elem(&hists, &hkey);
|
||||
if (!histp) {
|
||||
bpf_map_update_elem(&hists, &hkey, &initial_hist, 0);
|
||||
histp = bpf_map_lookup_elem(&hists, &hkey);
|
||||
if (!histp)
|
||||
return 0;
|
||||
}
|
||||
slot = log2l(rq->__data_len / 1024);
|
||||
if (slot >= MAX_SLOTS)
|
||||
slot = MAX_SLOTS - 1;
|
||||
__sync_fetch_and_add(&histp->slots[slot], 1);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("tp_btf/block_rq_issue")
|
||||
int BPF_PROG(block_rq_issue)
|
||||
{
|
||||
if (LINUX_KERNEL_VERSION >= KERNEL_VERSION(5, 11, 0))
|
||||
return trace_rq_issue((void *)ctx[0]);
|
||||
else
|
||||
return trace_rq_issue((void *)ctx[1]);
|
||||
}
|
||||
```
|
||||
|
||||
After the user stops the tool, the user-space code reads the data stored in the map and displays the results per process.
|
||||
|
||||
### Usage in Eunomia

### Summary

Bitesize works at process granularity, which helps developers understand the disk I/O requests a program issues.
|
||||
81
19-syscount/syscount.md
Normal file
@@ -0,0 +1,81 @@
|
||||
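## eBPF Beginner's Hands-on Tutorial: Writing the eBPF Program syscount to Monitor Slow System Calls

### Background

`syscount` counts the system calls made system-wide or by a given process, reporting either the number of calls or the time they take.

### How it works

The logic of `syscount` is straightforward: it attaches handlers to the `sys_enter` and `sys_exit` tracepoints.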
|
||||
```c
|
||||
SEC("tracepoint/raw_syscalls/sys_enter")
|
||||
int sys_enter(struct trace_event_raw_sys_enter *args)
|
||||
{
|
||||
u64 id = bpf_get_current_pid_tgid();
|
||||
pid_t pid = id >> 32;
|
||||
u32 tid = id;
|
||||
u64 ts;
|
||||
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
if (filter_pid && pid != filter_pid)
|
||||
return 0;
|
||||
|
||||
ts = bpf_ktime_get_ns();
|
||||
bpf_map_update_elem(&start, &tid, &ts, 0);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("tracepoint/raw_syscalls/sys_exit")
|
||||
int sys_exit(struct trace_event_raw_sys_exit *args)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
u64 id = bpf_get_current_pid_tgid();
|
||||
static const struct data_t zero;
|
||||
pid_t pid = id >> 32;
|
||||
struct data_t *val;
|
||||
u64 *start_ts, lat = 0;
|
||||
u32 tid = id;
|
||||
u32 key;
|
||||
|
||||
/* this happens when there is an interrupt */
|
||||
if (args->id == -1)
|
||||
return 0;
|
||||
|
||||
if (filter_pid && pid != filter_pid)
|
||||
return 0;
|
||||
if (filter_failed && args->ret >= 0)
|
||||
return 0;
|
||||
if (filter_errno && args->ret != -filter_errno)
|
||||
return 0;
|
||||
|
||||
if (measure_latency) {
|
||||
start_ts = bpf_map_lookup_elem(&start, &tid);
|
||||
if (!start_ts)
|
||||
return 0;
|
||||
lat = bpf_ktime_get_ns() - *start_ts;
|
||||
}
|
||||
|
||||
key = (count_by_process) ? pid : args->id;
|
||||
val = bpf_map_lookup_or_try_init(&data, &key, &zero);
|
||||
if (val) {
|
||||
__sync_fetch_and_add(&val->count, 1);
|
||||
if (count_by_process)
|
||||
save_proc_name(val);
|
||||
if (measure_latency)
|
||||
__sync_fetch_and_add(&val->total_ns, lat);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
```
|
||||
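When a syscall starts, `syscount` records the thread id (tid) and the start timestamp in a map. When the syscall returns, `syscount` counts the call and, if the user asked for latency, adds the elapsed time to the total.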
|
||||
### Usage in Eunomia

### Summary

`syscount` makes it easy to track the system calls made by a given process or by the whole system.
|
||||
6
2-fentry-unlink/.gitignore
vendored
Normal file
@@ -0,0 +1,6 @@
|
||||
.vscode
|
||||
package.json
|
||||
*.o
|
||||
*.skel.json
|
||||
*.skel.yaml
|
||||
package.yaml
|
||||
76
2-fentry-unlink/README.md
Normal file
@@ -0,0 +1,76 @@
|
||||
---
|
||||
layout: post
|
||||
title: fentry-link
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, examples, fentry, no-output]
|
||||
summary: an example that uses fentry and fexit BPF programs to trace file deletion
|
||||
---
|
||||
|
||||
## Fentry
|
||||
|
||||
`fentry` is an example that uses fentry and fexit BPF programs for tracing. It
|
||||
attaches `fentry` and `fexit` traces to `do_unlinkat()` which is called when a
|
||||
file is deleted and logs the return value, PID, and filename to the
|
||||
trace pipe.
|
||||
|
||||
Important differences, compared to kprobes, are improved performance and
|
||||
usability. In this example, better usability is shown with the ability to
|
||||
directly dereference pointer arguments, like in normal C, instead of using
|
||||
various read helpers. The big distinction between **fexit** and **kretprobe**
programs is that an fexit program has access to both the input arguments and the
returned result, while a kretprobe can only access the result.
|
||||
|
||||
fentry and fexit programs are available starting from 5.5 kernels.
|
||||
|
||||
```console
|
||||
$ sudo ecli examples/bpftools/fentry-link/package.json
|
||||
Runing eBPF program...
|
||||
```
|
||||
|
||||
The `fentry` output in `/sys/kernel/debug/tracing/trace_pipe` should look
|
||||
something like this:
|
||||
|
||||
```console
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
rm-9290 [004] d..2 4637.798698: bpf_trace_printk: fentry: pid = 9290, filename = test_file
|
||||
rm-9290 [004] d..2 4637.798843: bpf_trace_printk: fexit: pid = 9290, filename = test_file, ret = 0
|
||||
rm-9290 [004] d..2 4637.798698: bpf_trace_printk: fentry: pid = 9290, filename = test_file2
|
||||
rm-9290 [004] d..2 4637.798843: bpf_trace_printk: fexit: pid = 9290, filename = test_file2, ret = 0
|
||||
```
|
||||
|
||||
## Run
|
||||
|
||||
|
||||
|
||||
- Compile:
|
||||
|
||||
```console
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
or
|
||||
|
||||
```console
|
||||
$ ecc fentry-link.bpf.c
|
||||
Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
- Run and help:
|
||||
|
||||
```console
|
||||
sudo ecli examples/bpftools/fentry-link/package.json -h
|
||||
Usage: fentry_link_bpf [--help] [--version] [--verbose]
|
||||
|
||||
A simple eBPF program
|
||||
|
||||
Optional arguments:
|
||||
-h, --help shows help message and exits
|
||||
-v, --version prints version information and exits
|
||||
--verbose prints libbpf debug information
|
||||
|
||||
Built with eunomia-bpf framework.
|
||||
See https://github.com/eunomia-bpf/eunomia-bpf for more information.
|
||||
```
|
||||
27
2-fentry-unlink/fentry-link.bpf.c
Normal file
@@ -0,0 +1,27 @@
|
||||
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
|
||||
/* Copyright (c) 2021 Sartura */
|
||||
#include "vmlinux.h"
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
|
||||
char LICENSE[] SEC("license") = "Dual BSD/GPL";
|
||||
|
||||
SEC("fentry/do_unlinkat")
|
||||
int BPF_PROG(do_unlinkat, int dfd, struct filename *name)
|
||||
{
|
||||
pid_t pid;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
bpf_printk("fentry: pid = %d, filename = %s\n", pid, name->name);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("fexit/do_unlinkat")
|
||||
int BPF_PROG(do_unlinkat_exit, int dfd, struct filename *name, long ret)
|
||||
{
|
||||
pid_t pid;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
bpf_printk("fexit: pid = %d, filename = %s, ret = %ld\n", pid, name->name, ret);
|
||||
return 0;
|
||||
}
|
||||
75
21-llcstat/llcstat.md
Normal file
@@ -0,0 +1,75 @@
|
||||
## eBPF Beginner's Hands-on Tutorial: Writing the eBPF Program llcstat to Monitor Cache Misses and Cache References

### Background

To optimize program performance, developers sometimes need to reduce cache misses, but how many cache misses a program actually incurs is hard to answer. `llcstat` uses eBPF to track cache misses and cache references, which helps developers debug programs and tune performance.

### How it works

`llcstat` builds on the Linux `perf_event` mechanism. When the program is loaded from user space, it opens the hardware `perf_event` counters and attaches the BPF programs to them.
|
||||
```c
|
||||
if (open_and_attach_perf_event(PERF_COUNT_HW_CACHE_MISSES,
|
||||
env.sample_period,
|
||||
obj->progs.on_cache_miss, mlinks))
|
||||
goto cleanup;
|
||||
if (open_and_attach_perf_event(PERF_COUNT_HW_CACHE_REFERENCES,
|
||||
env.sample_period,
|
||||
obj->progs.on_cache_ref, rlinks))
goto cleanup;
|
||||
```
|
||||
|
||||
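In the kernel, `llcstat` attaches its handlers under `perf_event`. Each time a counter reaches its sample period, the handler runs, adds the sample period to the matching count, and writes the result to the map.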
|
||||
|
||||
```c
|
||||
static __always_inline
|
||||
int trace_event(__u64 sample_period, bool miss)
|
||||
{
|
||||
struct key_info key = {};
|
||||
struct value_info *infop, zero = {};
|
||||
|
||||
u64 pid_tgid = bpf_get_current_pid_tgid();
|
||||
key.cpu = bpf_get_smp_processor_id();
|
||||
key.pid = pid_tgid >> 32;
|
||||
if (targ_per_thread)
|
||||
key.tid = (u32)pid_tgid;
|
||||
else
|
||||
key.tid = key.pid;
|
||||
|
||||
infop = bpf_map_lookup_or_try_init(&infos, &key, &zero);
|
||||
if (!infop)
|
||||
return 0;
|
||||
if (miss)
|
||||
infop->miss += sample_period;
|
||||
else
|
||||
infop->ref += sample_period;
|
||||
bpf_get_current_comm(infop->comm, sizeof(infop->comm));
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("perf_event")
|
||||
int on_cache_miss(struct bpf_perf_event_data *ctx)
|
||||
{
|
||||
return trace_event(ctx->sample_period, true);
|
||||
}
|
||||
|
||||
SEC("perf_event")
|
||||
int on_cache_ref(struct bpf_perf_event_data *ctx)
|
||||
{
|
||||
return trace_event(ctx->sample_period, false);
|
||||
}
|
||||
```
|
||||
|
||||
The user-space program reads the cache miss and cache reference counts stored in the map and displays them per process.
|
||||
|
||||
### Usage in Eunomia

### Summary

`llcstat` uses eBPF to show, concisely and with low overhead, how many cache misses and cache references a thread incurs, which gives developers a concrete metric when optimizing programs.
|
||||
6
3-kprobe-unlink/.gitignore
vendored
Normal file
@@ -0,0 +1,6 @@
|
||||
.vscode
|
||||
package.json
|
||||
*.o
|
||||
*.skel.json
|
||||
*.skel.yaml
|
||||
package.yaml
|
||||
55
3-kprobe-unlink/README.md
Normal file
@@ -0,0 +1,55 @@
|
||||
---
|
||||
layout: post
|
||||
title: kprobe-link
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, examples, kprobe, no-output]
|
||||
summary: an example of dealing with kernel-space entry and exit (return) probes, `kprobe` and `kretprobe` in libbpf lingo
|
||||
---
|
||||
|
||||
|
||||
`kprobe` is an example of dealing with kernel-space entry and exit (return)
|
||||
probes, `kprobe` and `kretprobe` in libbpf lingo. It attaches `kprobe` and
|
||||
`kretprobe` BPF programs to the `do_unlinkat()` function and logs the PID,
|
||||
filename, and return result, respectively, using `bpf_printk()` macro.
|
||||
|
||||
```console
|
||||
$ sudo ecli examples/bpftools/kprobe-link/package.json
|
||||
Runing eBPF program...
|
||||
```
|
||||
|
||||
The `kprobe` demo output in `/sys/kernel/debug/tracing/trace_pipe` should look
|
||||
something like this:
|
||||
|
||||
```shell
|
||||
$ sudo cat /sys/kernel/debug/tracing/trace_pipe
|
||||
rm-9346 [005] d..3 4710.951696: bpf_trace_printk: KPROBE ENTRY pid = 9346, filename = test1
|
||||
rm-9346 [005] d..4 4710.951819: bpf_trace_printk: KPROBE EXIT: ret = 0
|
||||
rm-9346 [005] d..3 4710.951852: bpf_trace_printk: KPROBE ENTRY pid = 9346, filename = test2
|
||||
rm-9346 [005] d..4 4710.951895: bpf_trace_printk: KPROBE EXIT: ret = 0
|
||||
```
|
||||
|
||||
## Run
|
||||
|
||||
|
||||
|
||||
Compile with docker:
|
||||
|
||||
```console
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
or compile with `ecc`:
|
||||
|
||||
```console
|
||||
$ ecc kprobe-link.bpf.c
|
||||
Compiling bpf object...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
Run:
|
||||
|
||||
```console
|
||||
sudo ecli examples/bpftools/kprobe-link/package.json
|
||||
```
|
||||
30
3-kprobe-unlink/kprobe-link.bpf.c
Normal file
@@ -0,0 +1,30 @@
|
||||
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
|
||||
/* Copyright (c) 2021 Sartura */
|
||||
#include "vmlinux.h"
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
#include <bpf/bpf_core_read.h>
|
||||
|
||||
char LICENSE[] SEC("license") = "Dual BSD/GPL";
|
||||
|
||||
SEC("kprobe/do_unlinkat")
|
||||
int BPF_KPROBE(do_unlinkat, int dfd, struct filename *name)
|
||||
{
|
||||
pid_t pid;
|
||||
const char *filename;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
filename = BPF_CORE_READ(name, name);
|
||||
bpf_printk("KPROBE ENTRY pid = %d, filename = %s\n", pid, filename);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("kretprobe/do_unlinkat")
|
||||
int BPF_KRETPROBE(do_unlinkat_exit, long ret)
|
||||
{
|
||||
pid_t pid;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
bpf_printk("KPROBE EXIT: pid = %d, ret = %ld\n", pid, ret);
|
||||
return 0;
|
||||
}
|
||||
7
4-opensnoop/.gitignore
vendored
Normal file
@@ -0,0 +1,7 @@
|
||||
.vscode
|
||||
package.json
|
||||
eunomia-exporter
|
||||
ecli
|
||||
*.bpf.o
|
||||
*.skel.json
|
||||
*.skel.yaml
|
||||
263
4-opensnoop/1_opensnoop.md
Normal file
@@ -0,0 +1,263 @@
|
||||
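## eBPF Beginner's Hands-on Tutorial: Writing an eBPF Program to Monitor Opened File Paths and Visualize Them with Prometheus

### Background

By monitoring the open system call, `opensnoop` shows information about every process on the system that calls it.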
|
||||
|
||||
### Run with one command using ecli
|
||||
|
||||
```console
|
||||
$ # Download and install the ecli binary
|
||||
$ wget https://aka.pw/bpf-ecli -O ./ecli && chmod +x ./ecli
|
||||
$ # Run directly from a URL
|
||||
$ ./ecli run https://eunomia-bpf.github.io/eunomia-bpf/opensnoop/package.json
|
||||
|
||||
running and waiting for the ebpf events from perf event...
|
||||
time ts pid uid ret flags comm fname
|
||||
00:58:08 0 812 0 9 524288 vmtoolsd /etc/mtab
|
||||
00:58:08 0 812 0 11 0 vmtoolsd /proc/devices
|
||||
00:58:08 0 34351 0 24 524288 ecli /etc/localtime
|
||||
00:58:08 0 812 0 9 0 vmtoolsd /sys/class/block/sda5/../device/../../../class
|
||||
00:58:08 0 812 0 -2 0 vmtoolsd /sys/class/block/sda5/../device/../../../label
|
||||
00:58:08 0 812 0 9 0 vmtoolsd /sys/class/block/sda1/../device/../../../class
|
||||
00:58:08 0 812 0 -2 0 vmtoolsd /sys/class/block/sda1/../device/../../../label
|
||||
00:58:08 0 812 0 9 0 vmtoolsd /run/systemd/resolve/resolv.conf
|
||||
00:58:08 0 812 0 9 0 vmtoolsd /proc/net/route
|
||||
00:58:08 0 812 0 9 0 vmtoolsd /proc/net/ipv6_route
|
||||
```
|
||||
|
||||
### Implementation

With eunomia-bpf you only need to write the kernel-side program; no user-space helper code is required. The code consists of two parts:

- the header file opensnoop.h, which defines the C structs to export;
- the source file opensnoop.bpf.c, which contains the BPF code.

Header file opensnoop.h
|
||||
|
||||
```c
|
||||
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
|
||||
#ifndef __OPENSNOOP_H
|
||||
#define __OPENSNOOP_H
|
||||
|
||||
#define TASK_COMM_LEN 16
|
||||
#define NAME_MAX 255
|
||||
#define INVALID_UID ((uid_t)-1)
|
||||
|
||||
// used for export event
|
||||
struct event {
|
||||
/* user terminology for pid: */
|
||||
unsigned long long ts;
|
||||
int pid;
|
||||
int uid;
|
||||
int ret;
|
||||
int flags;
|
||||
char comm[TASK_COMM_LEN];
|
||||
char fname[NAME_MAX];
|
||||
};
|
||||
|
||||
#endif /* __OPENSNOOP_H */
|
||||
```
|
||||
|
||||
The logic of `opensnoop` is fairly simple. It attaches handlers to the `sys_enter_open` and `sys_enter_openat` tracepoints, so a handler fires whenever an open system call is made. It likewise attaches handlers to the matching `sys_exit_open` and `sys_exit_openat` tracepoints.

Source file opensnoop.bpf.c
|
||||
|
||||
```c
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
// Copyright (c) 2019 Facebook
|
||||
// Copyright (c) 2020 Netflix
|
||||
#include <vmlinux.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include "opensnoop.h"
|
||||
|
||||
struct args_t {
|
||||
const char *fname;
|
||||
int flags;
|
||||
};
|
||||
|
||||
const volatile pid_t targ_pid = 0;
|
||||
const volatile pid_t targ_tgid = 0;
|
||||
const volatile uid_t targ_uid = 0;
|
||||
const volatile bool targ_failed = false;
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, 10240);
|
||||
__type(key, u32);
|
||||
__type(value, struct args_t);
|
||||
} start SEC(".maps");
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
|
||||
__uint(key_size, sizeof(u32));
|
||||
__uint(value_size, sizeof(u32));
|
||||
} events SEC(".maps");
|
||||
|
||||
static __always_inline bool valid_uid(uid_t uid) {
|
||||
return uid != INVALID_UID;
|
||||
}
|
||||
|
||||
static __always_inline
|
||||
bool trace_allowed(u32 tgid, u32 pid)
|
||||
{
|
||||
u32 uid;
|
||||
|
||||
/* filters */
|
||||
if (targ_tgid && targ_tgid != tgid)
|
||||
return false;
|
||||
if (targ_pid && targ_pid != pid)
|
||||
return false;
|
||||
if (valid_uid(targ_uid)) {
|
||||
uid = (u32)bpf_get_current_uid_gid();
|
||||
if (targ_uid != uid) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_enter_open")
|
||||
int tracepoint__syscalls__sys_enter_open(struct trace_event_raw_sys_enter* ctx)
|
||||
{
|
||||
u64 id = bpf_get_current_pid_tgid();
|
||||
/* use kernel terminology here for tgid/pid: */
|
||||
u32 tgid = id >> 32;
|
||||
u32 pid = id;
|
||||
|
||||
/* store arg info for later lookup */
|
||||
if (trace_allowed(tgid, pid)) {
|
||||
struct args_t args = {};
|
||||
args.fname = (const char *)ctx->args[0];
|
||||
args.flags = (int)ctx->args[1];
|
||||
bpf_map_update_elem(&start, &pid, &args, 0);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_enter_openat")
|
||||
int tracepoint__syscalls__sys_enter_openat(struct trace_event_raw_sys_enter* ctx)
|
||||
{
|
||||
u64 id = bpf_get_current_pid_tgid();
|
||||
/* use kernel terminology here for tgid/pid: */
|
||||
u32 tgid = id >> 32;
|
||||
u32 pid = id;
|
||||
|
||||
/* store arg info for later lookup */
|
||||
if (trace_allowed(tgid, pid)) {
|
||||
struct args_t args = {};
|
||||
args.fname = (const char *)ctx->args[1];
|
||||
args.flags = (int)ctx->args[2];
|
||||
bpf_map_update_elem(&start, &pid, &args, 0);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
static __always_inline
|
||||
int trace_exit(struct trace_event_raw_sys_exit* ctx)
|
||||
{
|
||||
struct event event = {};
|
||||
struct args_t *ap;
|
||||
int ret;
|
||||
u32 pid = bpf_get_current_pid_tgid();
|
||||
|
||||
ap = bpf_map_lookup_elem(&start, &pid);
|
||||
if (!ap)
|
||||
return 0; /* missed entry */
|
||||
ret = ctx->ret;
|
||||
if (targ_failed && ret >= 0)
|
||||
goto cleanup; /* want failed only */
|
||||
|
||||
/* event data */
|
||||
event.pid = bpf_get_current_pid_tgid() >> 32;
|
||||
event.uid = bpf_get_current_uid_gid();
|
||||
bpf_get_current_comm(&event.comm, sizeof(event.comm));
|
||||
bpf_probe_read_user_str(&event.fname, sizeof(event.fname), ap->fname);
|
||||
event.flags = ap->flags;
|
||||
event.ret = ret;
|
||||
|
||||
/* emit event */
|
||||
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
|
||||
&event, sizeof(event));
|
||||
|
||||
cleanup:
|
||||
bpf_map_delete_elem(&start, &pid);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_exit_open")
|
||||
int tracepoint__syscalls__sys_exit_open(struct trace_event_raw_sys_exit* ctx)
|
||||
{
|
||||
return trace_exit(ctx);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_exit_openat")
|
||||
int tracepoint__syscalls__sys_exit_openat(struct trace_event_raw_sys_exit* ctx)
|
||||
{
|
||||
return trace_exit(ctx);
|
||||
}
|
||||
|
||||
char LICENSE[] SEC("license") = "GPL";
|
||||
```
|
||||
|
||||
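On entry, `opensnoop` stores the file name pointer and the flags of the call in a map, keyed by the caller's thread id. On exit, it looks up that entry by the same id, combines it with the return value, pid, uid, and comm captured at that point, and sends the event to the user-space handler, which displays it.

For the complete example code, see: https://github.com/eunomia-bpf/eunomia-bpf/tree/master/examples/bpftools/opensnoop

Put the header and source files in a separate directory, then compile and run: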
|
||||
|
||||
```bash
|
||||
$ # Compile in a container; this produces a package.json file with the compiled code and some auxiliary information
|
||||
$ docker run -it -v /path/to/opensnoop:/src yunwei37/ebpm:latest
|
||||
$ # Run the eBPF program (root shell)
|
||||
$ sudo ecli run package.json
|
||||
```
|
||||
|
||||
### Visualization with Prometheus

Write a YAML configuration file:
|
||||
|
||||
```yaml
|
||||
programs:
|
||||
- name: opensnoop
|
||||
metrics:
|
||||
counters:
|
||||
- name: eunomia_file_open_counter
|
||||
description: test
|
||||
labels:
|
||||
- name: pid
|
||||
- name: comm
|
||||
- name: filename
|
||||
from: fname
|
||||
compiled_ebpf_filename: package.json
|
||||
```
|
||||
|
||||
Use eunomia-exporter to export the data to Prometheus:

- Download eunomia-exporter from https://github.com/eunomia-bpf/eunomia-bpf/releases
|
||||
|
||||
```console
|
||||
$ ls
|
||||
config.yaml eunomia-exporter package.json
|
||||
$ sudo ./eunomia-exporter
|
||||
|
||||
Running ebpf program opensnoop takes 46 ms
|
||||
Listening on http://127.0.0.1:8526
|
||||
running and waiting for the ebpf events from perf event...
|
||||
Receiving request at path /metrics
|
||||
```
|
||||
|
||||

|
||||
|
||||
### Summary and references

By tracing the open system call, `opensnoop` makes it easy to see which processes on the system are calling open.

References:

- Source code: https://github.com/eunomia-bpf/eunomia-bpf/tree/master/examples/bpftools/opensnoop
- libbpf reference code: https://github.com/iovisor/bcc/blob/master/libbpf-tools/opensnoop.bpf.c
- eunomia-bpf manual: https://eunomia-bpf.github.io/
|
||||
281
4-opensnoop/README.md
Normal file
@@ -0,0 +1,281 @@
|
||||
---
|
||||
layout: post
|
||||
title: opensnoop
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, syscall]
|
||||
summary: opensnoop traces the open() syscall system-wide, and prints various details.
|
||||
---
|
||||
|
||||
## origin
|
||||
|
||||
The kernel code originates from:
|
||||
|
||||
<https://github.com/iovisor/bcc/blob/master/libbpf-tools/opensnoop.bpf.c>
|
||||
|
||||
result:
|
||||
|
||||
```console
|
||||
$ sudo ecli examples/bpftools/opensnoop/package.json -h
|
||||
Usage: opensnoop_bpf [--help] [--version] [--verbose] [--pid_target VAR] [--tgid_target VAR] [--uid_target VAR] [--failed]
|
||||
|
||||
Trace open family syscalls.
|
||||
|
||||
Optional arguments:
|
||||
-h, --help shows help message and exits
|
||||
-v, --version prints version information and exits
|
||||
--verbose prints libbpf debug information
|
||||
--pid_target Process ID to trace
|
||||
--tgid_target Thread ID to trace
|
||||
--uid_target User ID to trace
|
||||
-f, --failed trace only failed events
|
||||
|
||||
Built with eunomia-bpf framework.
|
||||
See https://github.com/eunomia-bpf/eunomia-bpf for more information.
|
||||
|
||||
$ sudo ecli examples/bpftools/opensnoop/package.json
|
||||
TIME TS PID UID RET FLAGS COMM FNAME
|
||||
20:31:50 0 1 0 51 524288 systemd /proc/614/cgroup
|
||||
20:31:50 0 33182 0 25 524288 ecli /etc/localtime
|
||||
20:31:53 0 754 0 6 0 irqbalance /proc/interrupts
|
||||
20:31:53 0 754 0 6 0 irqbalance /proc/stat
|
||||
20:32:03 0 754 0 6 0 irqbalance /proc/interrupts
|
||||
20:32:03 0 754 0 6 0 irqbalance /proc/stat
|
||||
20:32:03 0 632 0 7 524288 vmtoolsd /etc/mtab
|
||||
20:32:03 0 632 0 9 0 vmtoolsd /proc/devices
|
||||
|
||||
$ sudo ecli examples/bpftools/opensnoop/package.json --pid_target 754
|
||||
TIME TS PID UID RET FLAGS COMM FNAME
|
||||
20:34:13 0 754 0 6 0 irqbalance /proc/interrupts
|
||||
20:34:13 0 754 0 6 0 irqbalance /proc/stat
|
||||
20:34:23 0 754 0 6 0 irqbalance /proc/interrupts
|
||||
20:34:23 0 754 0 6 0 irqbalance /proc/stat
|
||||
```
|
||||
|
||||
## Compile and Run
|
||||
|
||||
Compile with docker:
|
||||
|
||||
```shell
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
or compile with `ecc`:
|
||||
|
||||
```console
|
||||
$ ecc opensnoop.bpf.c opensnoop.h
|
||||
Compiling bpf object...
|
||||
Generating export types...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
Run:
|
||||
|
||||
```shell
|
||||
sudo ./ecli run examples/bpftools/opensnoop/package.json
|
||||
```
|
||||
|
||||
## details in bcc
|
||||
|
||||
Demonstrations of opensnoop, the Linux eBPF/bcc version.
|
||||
|
||||
opensnoop traces the open() syscall system-wide, and prints various details.
|
||||
Example output:
|
||||
|
||||
```console
|
||||
# ./opensnoop
|
||||
PID COMM FD ERR PATH
|
||||
17326 <...> 7 0 /sys/kernel/debug/tracing/trace_pipe
|
||||
1576 snmpd 9 0 /proc/net/dev
|
||||
1576 snmpd 11 0 /proc/net/if_inet6
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv4/neigh/eth0/retrans_time_ms
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv6/neigh/eth0/retrans_time_ms
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv6/conf/eth0/forwarding
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv6/neigh/eth0/base_reachable_time_ms
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv4/neigh/lo/retrans_time_ms
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv6/neigh/lo/retrans_time_ms
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv6/conf/lo/forwarding
|
||||
1576 snmpd 11 0 /proc/sys/net/ipv6/neigh/lo/base_reachable_time_ms
|
||||
1576 snmpd 9 0 /proc/diskstats
|
||||
1576 snmpd 9 0 /proc/stat
|
||||
1576 snmpd 9 0 /proc/vmstat
|
||||
1956 supervise 9 0 supervise/status.new
|
||||
1956 supervise 9 0 supervise/status.new
|
||||
17358 run 3 0 /etc/ld.so.cache
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libtinfo.so.5
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libdl.so.2
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libc.so.6
|
||||
17358 run -1 6 /dev/tty
|
||||
17358 run 3 0 /proc/meminfo
|
||||
17358 run 3 0 /etc/nsswitch.conf
|
||||
17358 run 3 0 /etc/ld.so.cache
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libnss_compat.so.2
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libnsl.so.1
|
||||
17358 run 3 0 /etc/ld.so.cache
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libnss_nis.so.2
|
||||
17358 run 3 0 /lib/x86_64-linux-gnu/libnss_files.so.2
|
||||
17358 run 3 0 /etc/passwd
|
||||
17358 run 3 0 ./run
|
||||
^C
|
||||
```
|
||||
While tracing, the snmpd process opened various /proc files (reading metrics),
|
||||
and a "run" process read various libraries and config files (looks like it
|
||||
was starting up: a new process).
|
||||
|
||||
opensnoop can be useful for discovering configuration and log files, if used
|
||||
during application startup.
|
||||
|
||||
The -p option can be used to filter on a PID, which is filtered in-kernel. Here
I've used it with -T to print timestamps:

```console
# ./opensnoop -Tp 1956
|
||||
TIME(s) PID COMM FD ERR PATH
|
||||
0.000000000 1956 supervise 9 0 supervise/status.new
|
||||
0.000289999 1956 supervise 9 0 supervise/status.new
|
||||
1.023068000 1956 supervise 9 0 supervise/status.new
|
||||
1.023381997 1956 supervise 9 0 supervise/status.new
|
||||
2.046030000 1956 supervise 9 0 supervise/status.new
|
||||
2.046363000 1956 supervise 9 0 supervise/status.new
|
||||
3.068203997 1956 supervise 9 0 supervise/status.new
|
||||
3.068544999 1956 supervise 9 0 supervise/status.new
|
||||
```
|
||||
|
||||
This shows the supervise process is opening the status.new file twice every
|
||||
second.
|
||||
|
||||
The -U option includes the UID in the output:
|
||||
|
||||
```console
|
||||
# ./opensnoop -U
|
||||
UID PID COMM FD ERR PATH
|
||||
0 27063 vminfo 5 0 /var/run/utmp
|
||||
103 628 dbus-daemon -1 2 /usr/local/share/dbus-1/system-services
|
||||
103 628 dbus-daemon 18 0 /usr/share/dbus-1/system-services
|
||||
103 628 dbus-daemon -1 2 /lib/dbus-1/system-services
|
||||
```
|
||||
|
||||
The -u option filters on UID:
|
||||
|
||||
```console
|
||||
# ./opensnoop -Uu 1000
|
||||
UID PID COMM FD ERR PATH
|
||||
1000 30240 ls 3 0 /etc/ld.so.cache
|
||||
1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libselinux.so.1
|
||||
1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libc.so.6
|
||||
1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libpcre.so.3
|
||||
1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libdl.so.2
|
||||
1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libpthread.so.0
|
||||
```
|
||||
|
||||
The -x option only prints failed opens:
|
||||
|
||||
```console
|
||||
# ./opensnoop -x
|
||||
PID COMM FD ERR PATH
|
||||
18372 run -1 6 /dev/tty
|
||||
18373 run -1 6 /dev/tty
|
||||
18373 multilog -1 13 lock
|
||||
18372 multilog -1 13 lock
|
||||
18384 df -1 2 /usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo
|
||||
18384 df -1 2 /usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo
|
||||
18384 df -1 2 /usr/share/locale/en_US/LC_MESSAGES/coreutils.mo
|
||||
18384 df -1 2 /usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo
|
||||
18384 df -1 2 /usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo
|
||||
18384 df -1 2 /usr/share/locale/en/LC_MESSAGES/coreutils.mo
|
||||
18385 run -1 6 /dev/tty
|
||||
18386 run -1 6 /dev/tty
|
||||
```
|
||||
|
||||
This caught a df command failing to open a coreutils.mo file, and trying from
|
||||
different directories.
|
||||
|
||||
The ERR column is the system error number. Error number 2 is ENOENT: no such
|
||||
file or directory.
|
||||
|
||||
A maximum tracing duration can be set with the -d option. For example, to trace
|
||||
for 2 seconds:
|
||||
|
||||
```console
|
||||
# ./opensnoop -d 2
|
||||
PID COMM FD ERR PATH
|
||||
2191 indicator-multi 11 0 /sys/block
|
||||
2191 indicator-multi 11 0 /sys/block
|
||||
2191 indicator-multi 11 0 /sys/block
|
||||
2191 indicator-multi 11 0 /sys/block
|
||||
2191 indicator-multi 11 0 /sys/block
|
||||
|
||||
```
|
||||
|
||||
The -n option can be used to filter on process name using partial matches:
|
||||
|
||||
```console
|
||||
# ./opensnoop -n ed
|
||||
|
||||
PID COMM FD ERR PATH
|
||||
2679 sed 3 0 /etc/ld.so.cache
|
||||
2679 sed 3 0 /lib/x86_64-linux-gnu/libselinux.so.1
|
||||
2679 sed 3 0 /lib/x86_64-linux-gnu/libc.so.6
|
||||
2679 sed 3 0 /lib/x86_64-linux-gnu/libpcre.so.3
|
||||
2679 sed 3 0 /lib/x86_64-linux-gnu/libdl.so.2
|
||||
2679 sed 3 0 /lib/x86_64-linux-gnu/libpthread.so.0
|
||||
2679 sed 3 0 /proc/filesystems
|
||||
2679 sed 3 0 /usr/lib/locale/locale-archive
|
||||
2679 sed -1 2
|
||||
2679 sed 3 0 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
|
||||
2679 sed 3 0 /dev/null
|
||||
2680 sed 3 0 /etc/ld.so.cache
|
||||
2680 sed 3 0 /lib/x86_64-linux-gnu/libselinux.so.1
|
||||
2680 sed 3 0 /lib/x86_64-linux-gnu/libc.so.6
|
||||
2680 sed 3 0 /lib/x86_64-linux-gnu/libpcre.so.3
|
||||
2680 sed 3 0 /lib/x86_64-linux-gnu/libdl.so.2
|
||||
2680 sed 3 0 /lib/x86_64-linux-gnu/libpthread.so.0
|
||||
2680 sed 3 0 /proc/filesystems
|
||||
2680 sed 3 0 /usr/lib/locale/locale-archive
|
||||
2680 sed -1 2
|
||||
^C
|
||||
```
|
||||
|
||||
This caught the 'sed' command because it partially matches 'ed' that's passed
|
||||
to the '-n' option.
|
||||
|
||||
The -e option prints out extra columns; for example, the following output
|
||||
contains the flags passed to open(2), in octal:
|
||||
|
||||
```console
|
||||
# ./opensnoop -e
|
||||
PID COMM FD ERR FLAGS PATH
|
||||
28512 sshd 10 0 00101101 /proc/self/oom_score_adj
|
||||
28512 sshd 3 0 02100000 /etc/ld.so.cache
|
||||
28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libwrap.so.0
|
||||
28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libaudit.so.1
|
||||
28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libpam.so.0
|
||||
28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libselinux.so.1
|
||||
28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libsystemd.so.0
|
||||
28512 sshd 3 0 02100000 /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.2
|
||||
28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libutil.so.1
|
||||
```
|
||||
|
||||
The -f option filters based on flags to the open(2) call, for example:
|
||||
|
||||
```console
|
||||
# ./opensnoop -e -f O_WRONLY -f O_RDWR
|
||||
PID COMM FD ERR FLAGS PATH
|
||||
28084 clear_console 3 0 00100002 /dev/tty
|
||||
28084 clear_console -1 13 00100002 /dev/tty0
|
||||
28084 clear_console -1 13 00100001 /dev/tty0
|
||||
28084 clear_console -1 13 00100002 /dev/console
|
||||
28084 clear_console -1 13 00100001 /dev/console
|
||||
28051 sshd 8 0 02100002 /var/run/utmp
|
||||
28051 sshd 7 0 00100001 /var/log/wtmp
|
||||
```
|
||||
|
||||
The --cgroupmap option filters based on a cgroup set. It is meant to be used
|
||||
with an externally created map.
|
||||
|
||||
```console
|
||||
# ./opensnoop --cgroupmap /sys/fs/bpf/test01
|
||||
```
|
||||
|
||||
For more details, see docs/special_filtering.md
|
||||
12
4-opensnoop/config.yaml
Normal file
@@ -0,0 +1,12 @@
|
||||
programs:
- name: opensnoop
  metrics:
    counters:
    - name: eunomia_file_open_counter
      description: test
      labels:
      - name: pid
      - name: comm
      - name: filename
        from: fname
  compiled_ebpf_filename: package.json
|
||||
140
4-opensnoop/opensnoop.bpf.c
Normal file
@@ -0,0 +1,140 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
// Copyright (c) 2019 Facebook
|
||||
// Copyright (c) 2020 Netflix
|
||||
#include <vmlinux.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include "opensnoop.h"
|
||||
|
||||
struct args_t {
|
||||
const char *fname;
|
||||
int flags;
|
||||
};
|
||||
|
||||
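/// Thread ID to trace (kernel "pid"; 0 means all threads)
const volatile int pid_target = 0;
/// Process ID to trace (kernel "tgid"; 0 means all processes)
const volatile int tgid_target = 0;
/* Note: 0 is a valid UID (root), so valid_uid() below accepts the default
 * and the UID filter is active; a value of -1 (INVALID_UID) disables it. */
/// @description User ID to trace
const volatile int uid_target = 0;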
|
||||
/// @cmdarg {"default": false, "short": "f", "long": "failed"}
|
||||
/// @description trace only failed events
|
||||
const volatile bool targ_failed = false;
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, 10240);
|
||||
__type(key, u32);
|
||||
__type(value, struct args_t);
|
||||
} start SEC(".maps");
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
|
||||
__uint(key_size, sizeof(u32));
|
||||
__uint(value_size, sizeof(u32));
|
||||
} events SEC(".maps");
|
||||
|
||||
static __always_inline bool valid_uid(uid_t uid) {
|
||||
return uid != INVALID_UID;
|
||||
}
|
||||
|
||||
static __always_inline
|
||||
bool trace_allowed(u32 tgid, u32 pid)
|
||||
{
|
||||
u32 uid;
|
||||
|
||||
/* filters */
|
||||
if (tgid_target && tgid_target != tgid)
|
||||
return false;
|
||||
if (pid_target && pid_target != pid)
|
||||
return false;
|
||||
if (valid_uid(uid_target)) {
|
||||
uid = (u32)bpf_get_current_uid_gid();
|
||||
if (uid_target != uid) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_enter_open")
|
||||
int tracepoint__syscalls__sys_enter_open(struct trace_event_raw_sys_enter* ctx)
|
||||
{
|
||||
u64 id = bpf_get_current_pid_tgid();
|
||||
/* use kernel terminology here for tgid/pid: */
|
||||
u32 tgid = id >> 32;
|
||||
u32 pid = id;
|
||||
|
||||
/* store arg info for later lookup */
|
||||
if (trace_allowed(tgid, pid)) {
|
||||
struct args_t args = {};
|
||||
args.fname = (const char *)ctx->args[0];
|
||||
args.flags = (int)ctx->args[1];
|
||||
bpf_map_update_elem(&start, &pid, &args, 0);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_enter_openat")
|
||||
int tracepoint__syscalls__sys_enter_openat(struct trace_event_raw_sys_enter* ctx)
|
||||
{
|
||||
u64 id = bpf_get_current_pid_tgid();
|
||||
/* use kernel terminology here for tgid/pid: */
|
||||
u32 tgid = id >> 32;
|
||||
u32 pid = id;
|
||||
|
||||
/* store arg info for later lookup */
|
||||
if (trace_allowed(tgid, pid)) {
|
||||
struct args_t args = {};
|
||||
args.fname = (const char *)ctx->args[1];
|
||||
args.flags = (int)ctx->args[2];
|
||||
bpf_map_update_elem(&start, &pid, &args, 0);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
static __always_inline
|
||||
int trace_exit(struct trace_event_raw_sys_exit* ctx)
|
||||
{
|
||||
struct event event = {};
|
||||
struct args_t *ap;
|
||||
int ret;
|
||||
u32 pid = bpf_get_current_pid_tgid();
|
||||
|
||||
ap = bpf_map_lookup_elem(&start, &pid);
|
||||
if (!ap)
|
||||
return 0; /* missed entry */
|
||||
ret = ctx->ret;
|
||||
if (targ_failed && ret >= 0)
|
||||
goto cleanup; /* want failed only */
|
||||
|
||||
/* event data */
|
||||
event.pid = bpf_get_current_pid_tgid() >> 32;
|
||||
event.uid = bpf_get_current_uid_gid();
|
||||
bpf_get_current_comm(&event.comm, sizeof(event.comm));
|
||||
bpf_probe_read_user_str(&event.fname, sizeof(event.fname), ap->fname);
|
||||
event.flags = ap->flags;
|
||||
event.ret = ret;
|
||||
|
||||
/* emit event */
|
||||
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
|
||||
&event, sizeof(event));
|
||||
|
||||
cleanup:
|
||||
bpf_map_delete_elem(&start, &pid);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_exit_open")
|
||||
int tracepoint__syscalls__sys_exit_open(struct trace_event_raw_sys_exit* ctx)
|
||||
{
|
||||
return trace_exit(ctx);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_exit_openat")
|
||||
int tracepoint__syscalls__sys_exit_openat(struct trace_event_raw_sys_exit* ctx)
|
||||
{
|
||||
return trace_exit(ctx);
|
||||
}
|
||||
|
||||
/// Trace open family syscalls.
|
||||
char LICENSE[] SEC("license") = "GPL";
|
||||
21
4-opensnoop/opensnoop.h
Normal file
@@ -0,0 +1,21 @@
|
||||
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
|
||||
#ifndef __OPENSNOOP_H
|
||||
#define __OPENSNOOP_H
|
||||
|
||||
#define TASK_COMM_LEN 16
|
||||
#define NAME_MAX 255
|
||||
#define INVALID_UID ((uid_t)-1)
|
||||
|
||||
// used for export event
|
||||
struct event {
|
||||
/* user terminology for pid: */
|
||||
unsigned long long ts;
|
||||
int pid;
|
||||
int uid;
|
||||
int ret;
|
||||
int flags;
|
||||
char comm[TASK_COMM_LEN];
|
||||
char fname[NAME_MAX];
|
||||
};
|
||||
|
||||
#endif /* __OPENSNOOP_H */
|
||||
7
5-uprobe-bashreadline/.gitignore
vendored
Normal file
@@ -0,0 +1,7 @@
|
||||
.vscode
|
||||
package.json
|
||||
ecli
|
||||
*.o
|
||||
*.skel.json
|
||||
*.skel.yaml
|
||||
package.yaml
|
||||
79
5-uprobe-bashreadline/README.md
Normal file
@@ -0,0 +1,79 @@
|
||||
---
|
||||
layout: post
|
||||
title: bashreadline
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, examples, uprobe, perf event]
|
||||
summary: an example of a simple (but realistic) BPF application that prints bash commands from all running bash shells on the system.
|
||||
---
|
||||
|
||||
|
||||
|
||||
This prints bash commands from all running bash shells on the system.
|
||||
|
||||
## System requirements:
|
||||
|
||||
- Linux kernel > 5.5
|
||||
- Eunomia's [ecli](https://github.com/eunomia-bpf/eunomia-bpf/tree/master/ecli) installed
|
||||
|
||||
|
||||
## Run
|
||||
|
||||
- Compile:
|
||||
|
||||
```shell
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
or
|
||||
|
||||
```shell
|
||||
ecc bashreadline.bpf.c bashreadline.h
|
||||
```
|
||||
|
||||
- Run:
|
||||
|
||||
```console
|
||||
$ sudo ./ecli run eunomia-bpf/examples/bpftools/bootstrap/package.json
|
||||
TIME PID STR
|
||||
11:17:34 28796 whoami
|
||||
11:17:41 28796 ps -ef
|
||||
11:17:51 28796 echo "Hello eBPF!"
|
||||
```
|
||||
|
||||
## details in bcc
|
||||
|
||||
|
||||
```
|
||||
Demonstrations of bashreadline, the Linux eBPF/bcc version.
|
||||
|
||||
This prints bash commands from all running bash shells on the system. For
|
||||
example:
|
||||
|
||||
# ./bashreadline
|
||||
TIME PID COMMAND
|
||||
05:28:25 21176 ls -l
|
||||
05:28:28 21176 date
|
||||
05:28:35 21176 echo hello world
|
||||
05:28:43 21176 foo this command failed
|
||||
05:28:45 21176 df -h
|
||||
05:29:04 3059 echo another shell
|
||||
05:29:13 21176 echo first shell again
|
||||
|
||||
When running the script on Arch Linux, you may need to specify the location
|
||||
of libreadline.so library:
|
||||
|
||||
# ./bashreadline -s /lib/libreadline.so
|
||||
TIME PID COMMAND
|
||||
11:17:34 28796 whoami
|
||||
11:17:41 28796 ps -ef
|
||||
11:17:51 28796 echo "Hello eBPF!"
|
||||
|
||||
|
||||
The entered command may fail. This is just showing what command lines were
|
||||
entered interactively for bash to process.
|
||||
|
||||
It works by tracing the return of the readline() function using uprobes
|
||||
(specifically a uretprobe).
|
||||
```
|
||||
48
5-uprobe-bashreadline/bashreadline.bpf.c
Normal file
@@ -0,0 +1,48 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
/* Copyright (c) 2021 Facebook */
|
||||
#include <vmlinux.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
#include "bashreadline.h"
|
||||
|
||||
#define TASK_COMM_LEN 16
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
|
||||
__uint(key_size, sizeof(__u32));
|
||||
__uint(value_size, sizeof(__u32));
|
||||
} events SEC(".maps");
|
||||
|
||||
/* Format of u[ret]probe section definition supporting auto-attach:
|
||||
* u[ret]probe/binary:function[+offset]
|
||||
*
|
||||
* binary can be an absolute/relative path or a filename; the latter is resolved to a
|
||||
* full binary path via bpf_program__attach_uprobe_opts.
|
||||
*
|
||||
* Specifying uprobe+ ensures we carry out strict matching; either "uprobe" must be
|
||||
* specified (and auto-attach is not possible) or the above format is specified for
|
||||
* auto-attach.
|
||||
*/
|
||||
/* readline() returns the entered line, so attach at function return. */
SEC("uretprobe//bin/bash:readline")
|
||||
int BPF_KRETPROBE(printret, const void *ret) {
|
||||
struct str_t data;
|
||||
char comm[TASK_COMM_LEN];
|
||||
u32 pid;
|
||||
|
||||
if (!ret)
|
||||
return 0;
|
||||
|
||||
bpf_get_current_comm(&comm, sizeof(comm));
|
||||
if (comm[0] != 'b' || comm[1] != 'a' || comm[2] != 's' || comm[3] != 'h' || comm[4] != 0 )
|
||||
return 0;
|
||||
|
||||
pid = bpf_get_current_pid_tgid() >> 32;
|
||||
data.pid = pid;
|
||||
bpf_probe_read_user_str(&data.str, sizeof(data.str), ret);
|
||||
|
||||
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &data, sizeof(data));
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
char LICENSE[] SEC("license") = "GPL";
|
||||
13
5-uprobe-bashreadline/bashreadline.h
Normal file
@@ -0,0 +1,13 @@
|
||||
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
|
||||
/* Copyright (c) 2021 Facebook */
|
||||
#ifndef __BASHREADLINE_H
|
||||
#define __BASHREADLINE_H
|
||||
|
||||
#define MAX_LINE_SIZE 80
|
||||
|
||||
struct str_t {
|
||||
__u32 pid;
|
||||
char str[MAX_LINE_SIZE];
|
||||
};
|
||||
|
||||
#endif /* __BASHREADLINE_H */
|
||||
10
6-sigsnoop/.gitignore
vendored
Executable file
@@ -0,0 +1,10 @@
|
||||
.vscode
|
||||
package.json
|
||||
*.wasm
|
||||
ewasm-skel.h
|
||||
ecli
|
||||
ewasm
|
||||
*.o
|
||||
*.skel.json
|
||||
*.skel.yaml
|
||||
package.yaml
|
||||
155
6-sigsnoop/README.md
Executable file
@@ -0,0 +1,155 @@
|
||||
---
|
||||
layout: post
|
||||
title: sigsnoop
|
||||
date: 2022-10-10 16:18
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, syscall, kprobe, tracepoint]
|
||||
summary: Trace signals generated system wide, from syscalls and others.
|
||||
---
|
||||
|
||||
|
||||
## origin
|
||||
|
||||
Originally from:
|
||||
|
||||
https://github.com/iovisor/bcc/blob/master/libbpf-tools/sigsnoop.bpf.c
|
||||
|
||||
## Compile and Run
|
||||
|
||||
Compile:
|
||||
|
||||
```shell
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
Or compile with `ecc`:
|
||||
|
||||
```console
|
||||
$ ecc sigsnoop.bpf.c sigsnoop.h
|
||||
Compiling bpf object...
|
||||
Generating export types...
|
||||
Packing ebpf object and config into package.json...
|
||||
```
|
||||
|
||||
Run:
|
||||
|
||||
```console
|
||||
$ sudo ./ecli examples/bpftools/sigsnoop/package.json
|
||||
TIME PID TPID SIG RET COMM
|
||||
20:43:44 21276 3054 0 0 cpptools-srv
|
||||
20:43:44 22407 3054 0 0 cpptools-srv
|
||||
20:43:44 20222 3054 0 0 cpptools-srv
|
||||
20:43:44 8933 3054 0 0 cpptools-srv
|
||||
20:43:44 2915 2803 0 0 node
|
||||
20:43:44 2943 2803 0 0 node
|
||||
20:43:44 31453 3054 0 0 cpptools-srv
|
||||
$ sudo ./ecli examples/bpftools/sigsnoop/package.json -h
|
||||
Usage: sigsnoop_bpf [--help] [--version] [--verbose] [--filtered_pid VAR] [--target_signal VAR] [--failed_only]
|
||||
|
||||
A simple eBPF program
|
||||
|
||||
Optional arguments:
|
||||
-h, --help shows help message and exits
|
||||
-v, --version prints version information and exits
|
||||
--verbose prints libbpf debug information
|
||||
--filtered_pid set value of pid_t variable filtered_pid
|
||||
--target_signal set value of int variable target_signal
|
||||
--failed_only set value of bool variable failed_only
|
||||
|
||||
Built with eunomia-bpf framework.
|
||||
See https://github.com/eunomia-bpf/eunomia-bpf for more information.
|
||||
```
|
||||
|
||||
## WASM example
|
||||
|
||||
Generate WASM skel:
|
||||
|
||||
```shell
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest gen-wasm-skel
|
||||
```
|
||||
|
||||
> The skel is generated and committed, so you don't need to generate it again.
> The skel includes:
>
> - eunomia-include: include headers for WASM
> - app.c: the WASM app. All libraries are header-only.
|
||||
|
||||
Build the WASM module:
|
||||
|
||||
```shell
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest build-wasm
|
||||
```
|
||||
|
||||
Run:
|
||||
|
||||
```console
|
||||
$ sudo ./ecli run app.wasm -h
|
||||
Usage: sigsnoop [-h] [-x] [-k] [-n] [-p PID] [-s SIGNAL]
|
||||
Trace standard and real-time signals.
|
||||
|
||||
|
||||
-h, --help show this help message and exit
|
||||
-x, --failed failed signals only
|
||||
-k, --killed kill only
|
||||
-p, --pid=<int> target pid
|
||||
-s, --signal=<int> target signal
|
||||
|
||||
$ sudo ./ecli run app.wasm
|
||||
running and waiting for the ebpf events from perf event...
|
||||
{"pid":185539,"tpid":185538,"sig":17,"ret":0,"comm":"cat","sig_name":"SIGCHLD"}
|
||||
{"pid":185540,"tpid":185538,"sig":17,"ret":0,"comm":"grep","sig_name":"SIGCHLD"}
|
||||
|
||||
$ sudo ./ecli run app.wasm -p 1641
|
||||
running and waiting for the ebpf events from perf event...
|
||||
{"pid":1641,"tpid":2368,"sig":23,"ret":0,"comm":"YDLive","sig_name":"SIGURG"}
|
||||
{"pid":1641,"tpid":2368,"sig":23,"ret":0,"comm":"YDLive","sig_name":"SIGURG"}
|
||||
```
|
||||
|
||||
## details in bcc
|
||||
|
||||
Demonstrations of sigsnoop.
|
||||
|
||||
|
||||
This traces signals generated system wide. For example:
|
||||
```console
|
||||
# ./sigsnoop -n
|
||||
TIME PID COMM SIG TPID RESULT
|
||||
19:56:14 3204808 a.out SIGSEGV 3204808 0
|
||||
19:56:14 3204808 a.out SIGPIPE 3204808 0
|
||||
19:56:14 3204808 a.out SIGCHLD 3204722 0
|
||||
```
|
||||
The first line shows that a.out (a test program) delivered a SIGSEGV signal.
The result, 0, means success.

The second and third lines show that a.out also delivered SIGPIPE and SIGCHLD
signals in succession.
|
||||
|
||||
USAGE message:
|
||||
```console
|
||||
# ./sigsnoop -h
|
||||
Usage: sigsnoop [OPTION...]
|
||||
Trace standard and real-time signals.
|
||||
|
||||
USAGE: sigsnoop [-h] [-x] [-k] [-n] [-p PID] [-s SIGNAL]
|
||||
|
||||
EXAMPLES:
|
||||
sigsnoop # trace signals system-wide
|
||||
sigsnoop -k # trace signals issued by kill syscall only
|
||||
sigsnoop -x # trace failed signals only
|
||||
sigsnoop -p 1216 # only trace PID 1216
|
||||
sigsnoop -s 9 # only trace signal 9
|
||||
|
||||
-k, --kill Trace signals issued by kill syscall only.
|
||||
-n, --name Output signal name instead of signal number.
|
||||
-p, --pid=PID Process ID to trace
|
||||
-s, --signal=SIGNAL Signal to trace.
|
||||
-x, --failed Trace failed signals only.
|
||||
-?, --help Give this help list
|
||||
--usage Give a short usage message
|
||||
-V, --version Print program version
|
||||
```
|
||||
Mandatory or optional arguments to long options are also mandatory or optional
|
||||
for any corresponding short options.
|
||||
|
||||
Report bugs to https://github.com/iovisor/bcc/tree/master/libbpf-tools.
|
||||
245
6-sigsnoop/app.c
Executable file
@@ -0,0 +1,245 @@
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <stdarg.h>
|
||||
#include <stdint.h>
|
||||
#include <stdbool.h>
#include <errno.h> /* errno and EINTR are used below */
|
||||
#include "eunomia-include/wasm-app.h"
|
||||
#include "eunomia-include/entry.h"
|
||||
#include "eunomia-include/argp.h"
|
||||
#include "sigsnoop.bpf.h"
|
||||
#include "ewasm-skel.h"
|
||||
#include "eunomia-include/sigsnoop.skel.h"
|
||||
#define PERF_BUFFER_PAGES 16
|
||||
#define PERF_POLL_TIMEOUT_MS 100
|
||||
#define warn(...) printf(__VA_ARGS__)
|
||||
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
|
||||
|
||||
static volatile int exiting = 0;
|
||||
|
||||
static int target_pid = 0;
|
||||
static int target_signal = 0;
|
||||
static bool failed_only = false;
|
||||
static bool kill_only = false;
|
||||
static bool signal_name = false;
|
||||
static bool verbose = false;
|
||||
|
||||
static const char *sig_name[] = {
|
||||
[0] = "N/A",
|
||||
[1] = "SIGHUP",
|
||||
[2] = "SIGINT",
|
||||
[3] = "SIGQUIT",
|
||||
[4] = "SIGILL",
|
||||
[5] = "SIGTRAP",
|
||||
[6] = "SIGABRT",
|
||||
[7] = "SIGBUS",
|
||||
[8] = "SIGFPE",
|
||||
[9] = "SIGKILL",
|
||||
[10] = "SIGUSR1",
|
||||
[11] = "SIGSEGV",
|
||||
[12] = "SIGUSR2",
|
||||
[13] = "SIGPIPE",
|
||||
[14] = "SIGALRM",
|
||||
[15] = "SIGTERM",
|
||||
[16] = "SIGSTKFLT",
|
||||
[17] = "SIGCHLD",
|
||||
[18] = "SIGCONT",
|
||||
[19] = "SIGSTOP",
|
||||
[20] = "SIGTSTP",
|
||||
[21] = "SIGTTIN",
|
||||
[22] = "SIGTTOU",
|
||||
[23] = "SIGURG",
|
||||
[24] = "SIGXCPU",
|
||||
[25] = "SIGXFSZ",
|
||||
[26] = "SIGVTALRM",
|
||||
[27] = "SIGPROF",
|
||||
[28] = "SIGWINCH",
|
||||
[29] = "SIGIO",
|
||||
[30] = "SIGPWR",
|
||||
[31] = "SIGSYS",
|
||||
};
|
||||
|
||||
const char *argp_program_version = "sigsnoop 0.1";
|
||||
const char *argp_program_bug_address =
|
||||
"https://github.com/iovisor/bcc/tree/master/libbpf-tools";
|
||||
const char argp_program_doc[] =
|
||||
"Trace standard and real-time signals.\n"
|
||||
"\n"
|
||||
"USAGE: sigsnoop [-h] [-x] [-k] [-n] [-p PID] [-s SIGNAL]\n"
|
||||
"\n"
|
||||
"EXAMPLES:\n"
|
||||
" sigsnoop # trace signals system-wide\n"
|
||||
" sigsnoop -k # trace signals issued by kill syscall only\n"
|
||||
" sigsnoop -x # trace failed signals only\n"
|
||||
" sigsnoop -p 1216 # only trace PID 1216\n"
|
||||
" sigsnoop -s 9 # only trace signal 9\n";
|
||||
|
||||
static const struct argp_option opts[] = {
|
||||
{ "failed", 'x', NULL, 0, "Trace failed signals only." },
|
||||
{ "kill", 'k', NULL, 0, "Trace signals issued by kill syscall only." },
|
||||
{ "pid", 'p', "PID", 0, "Process ID to trace" },
|
||||
{ "signal", 's', "SIGNAL", 0, "Signal to trace." },
|
||||
{ "name", 'n', NULL, 0, "Output signal name instead of signal number." },
|
||||
{ "verbose", 'v', NULL, 0, "Verbose debug output" },
|
||||
{ NULL, 'h', NULL, OPTION_HIDDEN, "Show the full help" },
|
||||
{},
|
||||
};
|
||||
|
||||
static error_t parse_arg(int key, char *arg, struct argp_state *state)
|
||||
{
|
||||
long pid, sig;
|
||||
|
||||
switch (key) {
|
||||
case 'p':
|
||||
errno = 0;
|
||||
pid = strtol(arg, NULL, 10);
|
||||
if (errno || pid <= 0) {
|
||||
warn("Invalid PID: %s\n", arg);
|
||||
argp_usage(state);
|
||||
}
|
||||
target_pid = pid;
|
||||
break;
|
||||
case 's':
|
||||
errno = 0;
|
||||
sig = strtol(arg, NULL, 10);
|
||||
if (errno || sig <= 0) {
|
||||
warn("Invalid SIGNAL: %s\n", arg);
|
||||
argp_usage(state);
|
||||
}
|
||||
target_signal = sig;
|
||||
break;
|
||||
case 'n':
|
||||
signal_name = true;
|
||||
break;
|
||||
case 'x':
|
||||
failed_only = true;
|
||||
break;
|
||||
case 'k':
|
||||
kill_only = true;
|
||||
break;
|
||||
case 'v':
|
||||
verbose = true;
|
||||
break;
|
||||
case 'h':
|
||||
argp_state_help(state, ARGP_HELP_STD_HELP);
|
||||
break;
|
||||
default:
|
||||
return ARGP_ERR_UNKNOWN;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int libbpf_print_fn(const char *format, va_list args)
|
||||
{
|
||||
if (!verbose)
|
||||
return 0;
|
||||
	/* args is a va_list, so the v-variant of printf is required */
	return vprintf(format, args);
|
||||
}
|
||||
|
||||
static void alias_parse(char *prog)
|
||||
{
|
||||
char *name = prog;
|
||||
|
||||
if (!strcmp(name, "killsnoop")) {
|
||||
kill_only = true;
|
||||
}
|
||||
}
|
||||
|
||||
static void sig_int(int signo)
|
||||
{
|
||||
exiting = 1;
|
||||
}
|
||||
|
||||
static void handle_event(void *ctx, int cpu, void *data, unsigned int data_sz)
|
||||
{
|
||||
struct event *e = data;
|
||||
char ts[32] = "12:47:32";
|
||||
|
||||
if (signal_name && e->sig < ARRAY_SIZE(sig_name))
|
||||
printf("%-8s %-7d %-16s %-9s %-7d %-6d\n",
|
||||
ts, e->pid, e->comm, sig_name[e->sig], e->tpid, e->ret);
|
||||
else
|
||||
printf("%-8s %-7d %-16s %-9d %-7d %-6d\n",
|
||||
ts, e->pid, e->comm, e->sig, e->tpid, e->ret);
|
||||
}
|
||||
|
||||
static void handle_lost_events(void *ctx, int cpu, unsigned long long lost_cnt)
|
||||
{
|
||||
warn("lost %llu events on CPU #%d\n", lost_cnt, cpu);
|
||||
}
|
||||
|
||||
int main(int argc, char **argv)
|
||||
{
|
||||
static const struct argp argp = {
|
||||
.options = opts,
|
||||
.parser = parse_arg,
|
||||
.doc = argp_program_doc,
|
||||
};
|
||||
struct perf_buffer *pb = NULL;
|
||||
struct sigsnoop_bpf *obj;
|
||||
int err;
|
||||
|
||||
alias_parse(argv[0]);
|
||||
err = argp_parse(&argp, argc, argv, 0, NULL, NULL);
|
||||
if (err)
|
||||
return err;
|
||||
|
||||
obj = sigsnoop_bpf__open();
|
||||
if (!obj) {
|
||||
warn("failed to open BPF object\n");
|
||||
return 1;
|
||||
}
|
||||
|
||||
obj->rodata->filtered_pid = target_pid;
|
||||
obj->rodata->target_signal = target_signal;
|
||||
obj->rodata->failed_only = failed_only;
|
||||
|
||||
if (kill_only) {
|
||||
bpf_program__set_autoload(obj->progs.sig_trace, false);
|
||||
} else {
|
||||
bpf_program__set_autoload(obj->progs.kill_entry, false);
|
||||
bpf_program__set_autoload(obj->progs.kill_exit, false);
|
||||
bpf_program__set_autoload(obj->progs.tkill_entry, false);
|
||||
bpf_program__set_autoload(obj->progs.tkill_exit, false);
|
||||
bpf_program__set_autoload(obj->progs.tgkill_entry, false);
|
||||
bpf_program__set_autoload(obj->progs.tgkill_exit, false);
|
||||
}
|
||||
|
||||
err = sigsnoop_bpf__load(obj);
|
||||
if (err) {
|
||||
warn("failed to load BPF object: %d\n", err);
|
||||
goto cleanup;
|
||||
}
|
||||
|
||||
err = sigsnoop_bpf__attach(obj);
|
||||
if (err) {
|
||||
warn("failed to attach BPF programs: %d\n", err);
|
||||
goto cleanup;
|
||||
}
|
||||
|
||||
pb = perf_buffer__new(bpf_map__fd(obj->maps.events), PERF_BUFFER_PAGES,
|
||||
handle_event, handle_lost_events, NULL, NULL);
|
||||
if (!pb) {
|
||||
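		/* err is still 0 here; set it so the failure is reported and returned */
		err = errno ? -errno : -1;
		warn("failed to open perf buffer: %d\n", err);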
|
||||
goto cleanup;
|
||||
}
|
||||
|
||||
printf("%-8s %-7s %-16s %-9s %-7s %-6s\n",
|
||||
"TIME", "PID", "COMM", "SIG", "TPID", "RESULT");
|
||||
|
||||
while (!exiting) {
|
||||
err = perf_buffer__poll(pb, PERF_POLL_TIMEOUT_MS);
|
||||
if (err < 0 && err != -EINTR) {
|
||||
warn("error polling perf buffer: %s\n", strerror(-err));
|
||||
goto cleanup;
|
||||
}
|
||||
/* reset err to return 0 if exiting */
|
||||
err = 0;
|
||||
}
|
||||
|
||||
cleanup:
|
||||
perf_buffer__free(pb);
|
||||
sigsnoop_bpf__destroy(obj);
|
||||
|
||||
return err != 0;
|
||||
}
|
||||
96
6-sigsnoop/eunomia-include/argp-namefrob.h
Normal file
@@ -0,0 +1,96 @@
|
||||
/* Name frobnication for compiling argp outside of glibc
|
||||
Copyright (C) 1997 Free Software Foundation, Inc.
|
||||
This file is part of the GNU C Library.
|
||||
Written by Miles Bader <miles@gnu.ai.mit.edu>.
|
||||
|
||||
The GNU C Library is free software; you can redistribute it and/or
|
||||
modify it under the terms of the GNU Library General Public License as
|
||||
published by the Free Software Foundation; either version 2 of the
|
||||
License, or (at your option) any later version.
|
||||
|
||||
The GNU C Library is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
Library General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU Library General Public
|
||||
License along with the GNU C Library; see the file COPYING.LIB. If not,
|
||||
write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
|
||||
Boston, MA 02111-1307, USA. */
|
||||
|
||||
#if !_LIBC
|
||||
/* This code is written for inclusion in gnu-libc, and uses names in the
|
||||
namespace reserved for libc. If we're not compiling in libc, define those
|
||||
names to be the normal ones instead. */
|
||||
|
||||
/* argp-parse functions */
|
||||
#undef __argp_parse
|
||||
#define __argp_parse argp_parse
|
||||
#undef __option_is_end
|
||||
#define __option_is_end _option_is_end
|
||||
#undef __option_is_short
|
||||
#define __option_is_short _option_is_short
|
||||
#undef __argp_input
|
||||
#define __argp_input _argp_input
|
||||
|
||||
/* argp-help functions */
|
||||
#undef __argp_help
|
||||
#define __argp_help argp_help
|
||||
#undef __argp_error
|
||||
#define __argp_error argp_error
|
||||
#undef __argp_failure
|
||||
#define __argp_failure argp_failure
|
||||
#undef __argp_state_help
|
||||
#define __argp_state_help argp_state_help
|
||||
#undef __argp_usage
|
||||
#define __argp_usage argp_usage
|
||||
#undef __argp_basename
|
||||
#define __argp_basename _argp_basename
|
||||
#undef __argp_short_program_name
|
||||
#define __argp_short_program_name _argp_short_program_name
|
||||
|
||||
/* argp-fmtstream functions */
|
||||
#undef __argp_make_fmtstream
|
||||
#define __argp_make_fmtstream argp_make_fmtstream
|
||||
#undef __argp_fmtstream_free
|
||||
#define __argp_fmtstream_free argp_fmtstream_free
|
||||
#undef __argp_fmtstream_putc
|
||||
#define __argp_fmtstream_putc argp_fmtstream_putc
|
||||
#undef __argp_fmtstream_puts
|
||||
#define __argp_fmtstream_puts argp_fmtstream_puts
|
||||
#undef __argp_fmtstream_write
|
||||
#define __argp_fmtstream_write argp_fmtstream_write
|
||||
#undef __argp_fmtstream_printf
|
||||
#define __argp_fmtstream_printf argp_fmtstream_printf
|
||||
#undef __argp_fmtstream_set_lmargin
|
||||
#define __argp_fmtstream_set_lmargin argp_fmtstream_set_lmargin
|
||||
#undef __argp_fmtstream_set_rmargin
|
||||
#define __argp_fmtstream_set_rmargin argp_fmtstream_set_rmargin
|
||||
#undef __argp_fmtstream_set_wmargin
|
||||
#define __argp_fmtstream_set_wmargin argp_fmtstream_set_wmargin
|
||||
#undef __argp_fmtstream_point
|
||||
#define __argp_fmtstream_point argp_fmtstream_point
|
||||
#undef __argp_fmtstream_update
|
||||
#define __argp_fmtstream_update _argp_fmtstream_update
|
||||
#undef __argp_fmtstream_ensure
|
||||
#define __argp_fmtstream_ensure _argp_fmtstream_ensure
|
||||
#undef __argp_fmtstream_lmargin
|
||||
#define __argp_fmtstream_lmargin argp_fmtstream_lmargin
|
||||
#undef __argp_fmtstream_rmargin
|
||||
#define __argp_fmtstream_rmargin argp_fmtstream_rmargin
|
||||
#undef __argp_fmtstream_wmargin
|
||||
#define __argp_fmtstream_wmargin argp_fmtstream_wmargin
|
||||
|
||||
/* normal libc functions we call */
|
||||
#undef __sleep
|
||||
#define __sleep sleep
|
||||
#undef __strcasecmp
|
||||
#define __strcasecmp strcasecmp
|
||||
#undef __vsnprintf
|
||||
#define __vsnprintf vsnprintf
|
||||
|
||||
#endif /* !_LIBC */
|
||||
|
||||
#ifndef __set_errno
|
||||
#define __set_errno(e) (errno = (e))
|
||||
#endif
|
||||
1854
6-sigsnoop/eunomia-include/argp.h
Normal file
File diff suppressed because it is too large
403
6-sigsnoop/eunomia-include/argparse/argparse.c
Normal file
@@ -0,0 +1,403 @@
|
||||
#ifndef ARGPARSE_C_H_
|
||||
#define ARGPARSE_C_H_
|
||||
|
||||
/**
|
||||
* Copyright (C) 2012-2015 Yecheng Fu <cofyc.jackson at gmail dot com>
|
||||
* All rights reserved.
|
||||
*
|
||||
* Use of this source code is governed by a MIT-style license that can be found
|
||||
* in the LICENSE file.
|
||||
*/
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <assert.h>
|
||||
#include <errno.h>
|
||||
#include "argparse.h"
|
||||
|
||||
#define OPT_UNSET 1
|
||||
#define OPT_LONG (1 << 1)
|
||||
|
||||
/* We define these the same for all machines.
|
||||
Changes from this to the outside world should be done in `_exit'. */
|
||||
#define EXIT_FAILURE 1 /* Failing exit status. */
|
||||
#define EXIT_SUCCESS 0 /* Successful exit status. */
|
||||
|
||||
static const char *
|
||||
prefix_skip(const char *str, const char *prefix)
|
||||
{
|
||||
size_t len = strlen(prefix);
|
||||
return strncmp(str, prefix, len) ? NULL : str + len;
|
||||
}
|
||||
|
||||
static int
|
||||
prefix_cmp(const char *str, const char *prefix)
|
||||
{
|
||||
for (;; str++, prefix++)
|
||||
if (!*prefix) {
|
||||
return 0;
|
||||
} else if (*str != *prefix) {
|
||||
return (unsigned char)*prefix - (unsigned char)*str;
|
||||
}
|
||||
}
|
||||
|
||||
static void
|
||||
argparse_error(struct argparse *self, const struct argparse_option *opt,
|
||||
const char *reason, int flags)
|
||||
{
|
||||
(void)self;
|
||||
if (flags & OPT_LONG) {
|
||||
printf("error: option `--%s` %s\n", opt->long_name, reason);
|
||||
} else {
|
||||
printf("error: option `-%c` %s\n", opt->short_name, reason);
|
||||
}
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
|
||||
static int
|
||||
argparse_getvalue(struct argparse *self, const struct argparse_option *opt,
|
||||
int flags)
|
||||
{
|
||||
const char *s = NULL;
|
||||
if (!opt->value)
|
||||
goto skipped;
|
||||
switch (opt->type) {
|
||||
case ARGPARSE_OPT_BOOLEAN:
|
||||
if (flags & OPT_UNSET) {
|
||||
*(int *)opt->value = *(int *)opt->value - 1;
|
||||
} else {
|
||||
*(int *)opt->value = *(int *)opt->value + 1;
|
||||
}
|
||||
if (*(int *)opt->value < 0) {
|
||||
*(int *)opt->value = 0;
|
||||
}
|
||||
break;
|
||||
case ARGPARSE_OPT_BIT:
|
||||
if (flags & OPT_UNSET) {
|
||||
*(int *)opt->value &= ~opt->data;
|
||||
} else {
|
||||
*(int *)opt->value |= opt->data;
|
||||
}
|
||||
break;
|
||||
case ARGPARSE_OPT_STRING:
|
||||
if (self->optvalue) {
|
||||
*(const char **)opt->value = self->optvalue;
|
||||
self->optvalue = NULL;
|
||||
} else if (self->argc > 1) {
|
||||
self->argc--;
|
||||
*(const char **)opt->value = *++self->argv;
|
||||
} else {
|
||||
argparse_error(self, opt, "requires a value", flags);
|
||||
}
|
||||
break;
|
||||
case ARGPARSE_OPT_INTEGER:
|
||||
// errno = 0;
|
||||
if (self->optvalue) {
|
||||
*(int *)opt->value = strtol(self->optvalue, (char **)&s, 0);
|
||||
self->optvalue = NULL;
|
||||
} else if (self->argc > 1) {
|
||||
self->argc--;
|
||||
*(int *)opt->value = strtol(*++self->argv, (char **)&s, 0);
|
||||
} else {
|
||||
argparse_error(self, opt, "requires a value", flags);
|
||||
}
|
||||
// if (errno == ERANGE)
|
||||
// argparse_error(self, opt, "numerical result out of range", flags);
|
||||
if (s[0] != '\0') // no digits or contains invalid characters
|
||||
argparse_error(self, opt, "expects an integer value", flags);
|
||||
break;
|
||||
case ARGPARSE_OPT_FLOAT:
|
||||
// errno = 0;
|
||||
if (self->optvalue) {
|
||||
*(float *)opt->value = strtod(self->optvalue, (char **)&s);
|
||||
self->optvalue = NULL;
|
||||
} else if (self->argc > 1) {
|
||||
self->argc--;
|
||||
*(float *)opt->value = strtod(*++self->argv, (char **)&s);
|
||||
} else {
|
||||
argparse_error(self, opt, "requires a value", flags);
|
||||
}
|
||||
// if (errno == ERANGE)
|
||||
// argparse_error(self, opt, "numerical result out of range", flags);
|
||||
if (s[0] != '\0') // no digits or contains invalid characters
|
||||
argparse_error(self, opt, "expects a numerical value", flags);
|
||||
break;
|
||||
default:
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
|
||||
skipped:
|
||||
if (opt->callback) {
|
||||
return opt->callback(self, opt);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void
|
||||
argparse_options_check(const struct argparse_option *options)
|
||||
{
|
||||
for (; options->type != ARGPARSE_OPT_END; options++) {
|
||||
switch (options->type) {
|
||||
case ARGPARSE_OPT_END:
|
||||
case ARGPARSE_OPT_BOOLEAN:
|
||||
case ARGPARSE_OPT_BIT:
|
||||
case ARGPARSE_OPT_INTEGER:
|
||||
case ARGPARSE_OPT_FLOAT:
|
||||
case ARGPARSE_OPT_STRING:
|
||||
case ARGPARSE_OPT_GROUP:
|
||||
continue;
|
||||
default:
|
||||
printf("wrong option type: %d\n", options->type);
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
static int
|
||||
argparse_short_opt(struct argparse *self, const struct argparse_option *options)
|
||||
{
|
||||
for (; options->type != ARGPARSE_OPT_END; options++) {
|
||||
if (options->short_name == *self->optvalue) {
|
||||
self->optvalue = self->optvalue[1] ? self->optvalue + 1 : NULL;
|
||||
return argparse_getvalue(self, options, 0);
|
||||
}
|
||||
}
|
||||
return -2;
|
||||
}
|
||||
|
||||
static int
|
||||
argparse_long_opt(struct argparse *self, const struct argparse_option *options)
|
||||
{
|
||||
for (; options->type != ARGPARSE_OPT_END; options++) {
|
||||
const char *rest;
|
||||
int opt_flags = 0;
|
||||
if (!options->long_name)
|
||||
continue;
|
||||
|
||||
rest = prefix_skip(self->argv[0] + 2, options->long_name);
|
||||
if (!rest) {
|
||||
// negation disabled?
|
||||
if (options->flags & OPT_NONEG) {
|
||||
continue;
|
||||
}
|
||||
// only OPT_BOOLEAN/OPT_BIT supports negation
|
||||
if (options->type != ARGPARSE_OPT_BOOLEAN && options->type !=
|
||||
ARGPARSE_OPT_BIT) {
|
||||
continue;
|
||||
}
|
||||
|
||||
if (prefix_cmp(self->argv[0] + 2, "no-")) {
|
||||
continue;
|
||||
}
|
||||
rest = prefix_skip(self->argv[0] + 2 + 3, options->long_name);
|
||||
if (!rest)
|
||||
continue;
|
||||
opt_flags |= OPT_UNSET;
|
||||
}
|
||||
if (*rest) {
|
||||
if (*rest != '=')
|
||||
continue;
|
||||
self->optvalue = rest + 1;
|
||||
}
|
||||
return argparse_getvalue(self, options, opt_flags | OPT_LONG);
|
||||
}
|
||||
return -2;
|
||||
}
|
||||
|
||||
int
|
||||
argparse_init(struct argparse *self, struct argparse_option *options,
|
||||
const char *const *usages, int flags)
|
||||
{
|
||||
memset(self, 0, sizeof(*self));
|
||||
self->options = options;
|
||||
self->usages = usages;
|
||||
self->flags = flags;
|
||||
self->description = NULL;
|
||||
self->epilog = NULL;
|
||||
return 0;
|
||||
}
|
||||
|
||||
void
|
||||
argparse_describe(struct argparse *self, const char *description,
|
||||
const char *epilog)
|
||||
{
|
||||
self->description = description;
|
||||
self->epilog = epilog;
|
||||
}
|
||||
|
||||
int
|
||||
argparse_parse(struct argparse *self, int argc, const char **argv)
|
||||
{
|
||||
self->argc = argc - 1;
|
||||
self->argv = argv + 1;
|
||||
self->out = argv;
|
||||
|
||||
argparse_options_check(self->options);
|
||||
|
||||
for (; self->argc; self->argc--, self->argv++) {
|
||||
const char *arg = self->argv[0];
|
||||
if (arg[0] != '-' || !arg[1]) {
|
||||
if (self->flags & ARGPARSE_STOP_AT_NON_OPTION) {
|
||||
goto end;
|
||||
}
|
||||
// if it's not option or is a single char '-', copy verbatim
|
||||
self->out[self->cpidx++] = self->argv[0];
|
||||
continue;
|
||||
}
|
||||
// short option
|
||||
if (arg[1] != '-') {
|
||||
self->optvalue = arg + 1;
|
||||
switch (argparse_short_opt(self, self->options)) {
|
||||
case -1:
|
||||
break;
|
||||
case -2:
|
||||
goto unknown;
|
||||
}
|
||||
while (self->optvalue) {
|
||||
switch (argparse_short_opt(self, self->options)) {
|
||||
case -1:
|
||||
break;
|
||||
case -2:
|
||||
goto unknown;
|
||||
}
|
||||
}
|
||||
continue;
|
||||
}
|
||||
// if '--' is present, stop option parsing
|
||||
if (!arg[2]) {
|
||||
self->argc--;
|
||||
self->argv++;
|
||||
break;
|
||||
}
|
||||
// long option
|
||||
switch (argparse_long_opt(self, self->options)) {
|
||||
case -1:
|
||||
break;
|
||||
case -2:
|
||||
goto unknown;
|
||||
}
|
||||
continue;
|
||||
|
||||
unknown:
|
||||
printf("error: unknown option `%s`\n", self->argv[0]);
|
||||
argparse_usage(self);
|
||||
if (!(self->flags & ARGPARSE_IGNORE_UNKNOWN_ARGS)) {
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
}
|
||||
|
||||
end:
|
||||
memmove(self->out + self->cpidx, self->argv,
|
||||
self->argc * sizeof(*self->out));
|
||||
self->out[self->cpidx + self->argc] = NULL;
|
||||
|
||||
return self->cpidx + self->argc;
|
||||
}
|
||||
|
||||
void
|
||||
argparse_usage(struct argparse *self)
|
||||
{
|
||||
if (self->usages) {
|
||||
printf("Usage: %s\n", *self->usages++);
|
||||
while (*self->usages && **self->usages)
|
||||
printf(" or: %s\n", *self->usages++);
|
||||
} else {
|
||||
printf("Usage:\n");
|
||||
}
|
||||
|
||||
// print description
|
||||
if (self->description)
|
||||
printf("%s\n", self->description);
|
||||
|
||||
putchar('\n');
|
||||
|
||||
const struct argparse_option *options;
|
||||
|
||||
// figure out best width
|
||||
size_t usage_opts_width = 0;
|
||||
size_t len;
|
||||
options = self->options;
|
||||
for (; options->type != ARGPARSE_OPT_END; options++) {
|
||||
len = 0;
|
||||
if ((options)->short_name) {
|
||||
len += 2;
|
||||
}
|
||||
if ((options)->short_name && (options)->long_name) {
|
||||
len += 2; // separator ", "
|
||||
}
|
||||
if ((options)->long_name) {
|
||||
len += strlen((options)->long_name) + 2;
|
||||
}
|
||||
if (options->type == ARGPARSE_OPT_INTEGER) {
|
||||
len += strlen("=<int>");
|
||||
}
|
||||
if (options->type == ARGPARSE_OPT_FLOAT) {
|
||||
len += strlen("=<flt>");
|
||||
} else if (options->type == ARGPARSE_OPT_STRING) {
|
||||
len += strlen("=<str>");
|
||||
}
|
||||
len = (len + 3) - ((len + 3) & 3);
|
||||
if (usage_opts_width < len) {
|
||||
usage_opts_width = len;
|
||||
}
|
||||
}
|
||||
usage_opts_width += 4; // 4 spaces prefix
|
||||
|
||||
options = self->options;
|
||||
for (; options->type != ARGPARSE_OPT_END; options++) {
|
||||
size_t pos = 0;
|
||||
size_t pad = 0;
|
||||
if (options->type == ARGPARSE_OPT_GROUP) {
|
||||
putchar('\n');
|
||||
printf("%s", options->help);
|
||||
putchar('\n');
|
||||
continue;
|
||||
}
|
||||
pos = printf("    ");
|
||||
if (options->short_name) {
|
||||
pos += printf("-%c", options->short_name);
|
||||
}
|
||||
if (options->long_name && options->short_name) {
|
||||
pos += printf(", ");
|
||||
}
|
||||
if (options->long_name) {
|
||||
pos += printf("--%s", options->long_name);
|
||||
}
|
||||
if (options->type == ARGPARSE_OPT_INTEGER) {
|
||||
pos += printf("=<int>");
|
||||
} else if (options->type == ARGPARSE_OPT_FLOAT) {
|
||||
pos += printf("=<flt>");
|
||||
} else if (options->type == ARGPARSE_OPT_STRING) {
|
||||
pos += printf("=<str>");
|
||||
}
|
||||
if (pos <= usage_opts_width) {
|
||||
pad = usage_opts_width - pos;
|
||||
} else {
|
||||
putchar('\n');
|
||||
pad = usage_opts_width;
|
||||
}
|
||||
printf("%*s%s\n", (int)pad + 2, "", options->help);
|
||||
}
|
||||
|
||||
// print epilog
|
||||
if (self->epilog)
|
||||
printf("%s\n", self->epilog);
|
||||
}
|
||||
|
||||
int
|
||||
argparse_help_cb_no_exit(struct argparse *self,
|
||||
const struct argparse_option *option)
|
||||
{
|
||||
(void)option;
|
||||
argparse_usage(self);
|
||||
return (EXIT_SUCCESS);
|
||||
}
|
||||
|
||||
int
|
||||
argparse_help_cb(struct argparse *self, const struct argparse_option *option)
|
||||
{
|
||||
argparse_help_cb_no_exit(self, option);
|
||||
exit(EXIT_SUCCESS);
|
||||
}
|
||||
|
||||
#endif /* ARGPARSE_C_H_ */
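The `ARGPARSE_OPT_INTEGER` branch of `argparse_getvalue` above keeps its `errno`/`ERANGE` check commented out, stores the `long` returned by `strtol` straight into an `int`, and only tests `s[0] != '\0'`, which lets an empty value such as `--num=` through as 0. The following is a minimal sketch of a stricter conversion, not part of argparse itself; the helper name `parse_int_strict` and its return codes are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not part of argparse: parse `text` as an int.
 * Returns 0 on success, -1 if there are no digits or trailing characters,
 * and -2 if the value does not fit in an int. */
static int parse_int_strict(const char *text, int *out)
{
    char *end = NULL;
    long value;

    assert(text != NULL && out != NULL);

    errno = 0;
    value = strtol(text, &end, 0); /* base 0 accepts 10, 0x10 and 010 */

    if (end == text || *end != '\0')
        return -1; /* nothing converted, or junk after the number */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return -2; /* overflowed long, or does not fit in int */

    *out = (int)value;
    return 0;
}
```

A caller could map the two error codes onto the existing messages "expects an integer value" and "numerical result out of range" before calling `argparse_error`.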
|
||||
133
6-sigsnoop/eunomia-include/argparse/argparse.h
Normal file
@@ -0,0 +1,133 @@
|
||||
/**
|
||||
* Copyright (C) 2012-2015 Yecheng Fu <cofyc.jackson at gmail dot com>
|
||||
* All rights reserved.
|
||||
*
|
||||
* Use of this source code is governed by a MIT-style license that can be found
|
||||
* in the LICENSE file.
|
||||
*/
|
||||
#ifndef ARGPARSE_H
|
||||
#define ARGPARSE_H
|
||||
|
||||
/* For c++ compatibility */
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#include <stdint.h>
|
||||
|
||||
struct argparse;
|
||||
struct argparse_option;
|
||||
|
||||
typedef int argparse_callback (struct argparse *self,
|
||||
const struct argparse_option *option);
|
||||
|
||||
enum argparse_flag {
|
||||
ARGPARSE_STOP_AT_NON_OPTION = 1 << 0,
|
||||
ARGPARSE_IGNORE_UNKNOWN_ARGS = 1 << 1,
|
||||
};
|
||||
|
||||
enum argparse_option_type {
|
||||
/* special */
|
||||
ARGPARSE_OPT_END,
|
||||
ARGPARSE_OPT_GROUP,
|
||||
/* options with no arguments */
|
||||
ARGPARSE_OPT_BOOLEAN,
|
||||
ARGPARSE_OPT_BIT,
|
||||
/* options with arguments (optional or required) */
|
||||
ARGPARSE_OPT_INTEGER,
|
||||
ARGPARSE_OPT_FLOAT,
|
||||
ARGPARSE_OPT_STRING,
|
||||
};
|
||||
|
||||
enum argparse_option_flags {
|
||||
OPT_NONEG = 1, /* disable negation */
|
||||
};
|
||||
|
||||
/**
|
||||
* argparse option
|
||||
*
|
||||
* `type`:
|
||||
* holds the type of the option, you must have an ARGPARSE_OPT_END last in your
|
||||
* array.
|
||||
*
|
||||
* `short_name`:
|
||||
* the character to use as a short option name, '\0' if none.
|
||||
*
|
||||
* `long_name`:
|
||||
* the long option name, without the leading dash, NULL if none.
|
||||
*
|
||||
* `value`:
|
||||
* stores pointer to the value to be filled.
|
||||
*
|
||||
* `help`:
|
||||
* the short help message describing what the option does.
|
||||
* Must never be NULL (except for ARGPARSE_OPT_END).
|
||||
*
|
||||
* `callback`:
|
||||
* function is called when corresponding argument is parsed.
|
||||
*
|
||||
* `data`:
|
||||
* associated data. Callbacks can use it like they want.
|
||||
*
|
||||
* `flags`:
|
||||
* option flags.
|
||||
*/
|
||||
struct argparse_option {
|
||||
enum argparse_option_type type;
|
||||
const char short_name;
|
||||
const char *long_name;
|
||||
void *value;
|
||||
const char *help;
|
||||
argparse_callback *callback;
|
||||
intptr_t data;
|
||||
int flags;
|
||||
};
|
||||
|
||||
/**
|
||||
* argparse
|
||||
*/
|
||||
struct argparse {
|
||||
// user supplied
|
||||
const struct argparse_option *options;
|
||||
const char *const *usages;
|
||||
int flags;
|
||||
const char *description; // a description after usage
|
||||
const char *epilog; // a description at the end
|
||||
// internal context
|
||||
int argc;
|
||||
const char **argv;
|
||||
const char **out;
|
||||
int cpidx;
|
||||
const char *optvalue; // current option value
|
||||
};
|
||||
|
||||
// built-in callbacks
|
||||
int argparse_help_cb(struct argparse *self,
|
||||
const struct argparse_option *option);
|
||||
int argparse_help_cb_no_exit(struct argparse *self,
|
||||
const struct argparse_option *option);
|
||||
|
||||
// built-in option macros
|
||||
#define OPT_END() { ARGPARSE_OPT_END, 0, NULL, NULL, 0, NULL, 0, 0 }
|
||||
#define OPT_BOOLEAN(...) { ARGPARSE_OPT_BOOLEAN, __VA_ARGS__ }
|
||||
#define OPT_BIT(...) { ARGPARSE_OPT_BIT, __VA_ARGS__ }
|
||||
#define OPT_INTEGER(...) { ARGPARSE_OPT_INTEGER, __VA_ARGS__ }
|
||||
#define OPT_FLOAT(...) { ARGPARSE_OPT_FLOAT, __VA_ARGS__ }
|
||||
#define OPT_STRING(...) { ARGPARSE_OPT_STRING, __VA_ARGS__ }
|
||||
#define OPT_GROUP(h) { ARGPARSE_OPT_GROUP, 0, NULL, NULL, h, NULL, 0, 0 }
|
||||
#define OPT_HELP() OPT_BOOLEAN('h', "help", NULL, \
|
||||
"show this help message and exit", \
|
||||
argparse_help_cb, 0, OPT_NONEG)
|
||||
|
||||
int argparse_init(struct argparse *self, struct argparse_option *options,
|
||||
const char *const *usages, int flags);
|
||||
void argparse_describe(struct argparse *self, const char *description,
|
||||
const char *epilog);
|
||||
int argparse_parse(struct argparse *self, int argc, const char **argv);
|
||||
void argparse_usage(struct argparse *self);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif
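The `OPT_NONEG` flag declared in this header controls whether a long option may be negated as `--no-<name>`. As a rough, self-contained sketch of how that matching works in `argparse_long_opt`, the code below classifies one argument against one long name. The names `match_kind`, `skip_prefix` and `match_long` are hypothetical and exist only in this example. The real parser also restricts negation to `ARGPARSE_OPT_BOOLEAN` and `ARGPARSE_OPT_BIT`, which is reduced here to a single `allow_negation` parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch only. */
enum match_kind { MATCH_NONE = 0, MATCH_SET = 1, MATCH_UNSET = 2 };

/* Return the text after `prefix` in `str`, or NULL if `str` does not
 * start with `prefix`. */
static const char *skip_prefix(const char *str, const char *prefix)
{
    size_t len = strlen(prefix);
    return strncmp(str, prefix, len) == 0 ? str + len : NULL;
}

/* Classify `arg` (e.g. "--verbose", "--no-verbose", "--level=3") against
 * `long_name`. On a match, *value points at the text after '=' or is NULL
 * when no value was attached. */
static enum match_kind match_long(const char *arg, const char *long_name,
                                  int allow_negation, const char **value)
{
    enum match_kind kind = MATCH_SET;
    const char *body;
    const char *rest;

    assert(arg != NULL && long_name != NULL && value != NULL);

    *value = NULL;
    body = skip_prefix(arg, "--");
    if (!body)
        return MATCH_NONE;

    rest = skip_prefix(body, long_name);
    if (!rest) {
        /* Fall back to the negated spelling "--no-<name>" if permitted. */
        const char *negated = allow_negation ? skip_prefix(body, "no-") : NULL;
        rest = negated ? skip_prefix(negated, long_name) : NULL;
        if (!rest)
            return MATCH_NONE;
        kind = MATCH_UNSET;
    }

    if (*rest == '=')
        *value = rest + 1;
    else if (*rest != '\0')
        return MATCH_NONE; /* "--verbosity" must not match "verbose" */

    return kind;
}
```

Passing `allow_negation = 0` corresponds to an option declared with `OPT_NONEG`, such as the built-in `OPT_HELP()`.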
|
||||
2917
6-sigsnoop/eunomia-include/cJSON/cJSON.c
Normal file
File diff suppressed because it is too large
358
6-sigsnoop/eunomia-include/cJSON/cJSON.h
Normal file
@@ -0,0 +1,358 @@
|
||||
/*
|
||||
Copyright (c) 2009-2017 Dave Gamble and cJSON contributors
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
The above copyright notice and this permission notice shall be included in
|
||||
all copies or substantial portions of the Software.
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
THE SOFTWARE.
|
||||
|
||||
A header only cJSON library for C and C++.
|
||||
*/
|
||||
|
||||
#ifndef cJSON__h
|
||||
#define cJSON__h
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
#if !defined(__WINDOWS__) \
|
||||
&& (defined(WIN32) || defined(WIN64) || defined(_MSC_VER) \
|
||||
|| defined(_WIN32))
|
||||
#define __WINDOWS__
|
||||
#endif
|
||||
|
||||
#ifdef __WINDOWS__
|
||||
|
||||
/**
|
||||
* When compiling for windows, we specify a specific calling convention to avoid
|
||||
* issues where we are being called from a project with a different default
|
||||
* calling convention. For windows you have 3 define options:
|
||||
* CJSON_HIDE_SYMBOLS - Define this in the case where you don't want to ever
|
||||
* dllexport symbols
|
||||
* CJSON_EXPORT_SYMBOLS - Define this on library build when you want to
|
||||
* dllexport symbols (default)
|
||||
* CJSON_IMPORT_SYMBOLS - Define this if you want to dllimport symbols
|
||||
*
|
||||
* For *nix builds that support visibility attribute, you can define similar
|
||||
* behavior by setting default visibility to hidden by adding
|
||||
* -fvisibility=hidden (for gcc)
|
||||
* or
|
||||
* -xldscope=hidden (for sun cc)
|
||||
* to CFLAGS, then using the CJSON_API_VISIBILITY flag to "export" the same
|
||||
* symbols the way CJSON_EXPORT_SYMBOLS does
|
||||
*/
|
||||
|
||||
#define CJSON_CDECL __cdecl
|
||||
#define CJSON_STDCALL __stdcall
|
||||
|
||||
/* export symbols by default, this is necessary for copy pasting the C and
|
||||
header file */
|
||||
#if !defined(CJSON_HIDE_SYMBOLS) && !defined(CJSON_IMPORT_SYMBOLS) \
|
||||
&& !defined(CJSON_EXPORT_SYMBOLS)
|
||||
#define CJSON_EXPORT_SYMBOLS
|
||||
#endif
|
||||
|
||||
#if defined(CJSON_HIDE_SYMBOLS)
|
||||
#define CJSON_PUBLIC(type) type CJSON_STDCALL
|
||||
#elif defined(CJSON_EXPORT_SYMBOLS)
|
||||
#define CJSON_PUBLIC(type) __declspec(dllexport) type CJSON_STDCALL
|
||||
#elif defined(CJSON_IMPORT_SYMBOLS)
|
||||
#define CJSON_PUBLIC(type) __declspec(dllimport) type CJSON_STDCALL
|
||||
#endif
|
||||
#else /* !__WINDOWS__ */
|
||||
#define CJSON_CDECL
|
||||
#define CJSON_STDCALL
|
||||
|
||||
#if (defined(__GNUC__) || defined(__SUNPRO_CC) || defined(__SUNPRO_C)) \
|
||||
&& defined(CJSON_API_VISIBILITY)
|
||||
#define CJSON_PUBLIC(type) __attribute__((visibility("default"))) type
|
||||
#else
|
||||
#define CJSON_PUBLIC(type) type
|
||||
#endif
|
||||
#endif
|
||||
|
||||
/* project version */
|
||||
#define CJSON_VERSION_MAJOR 1
|
||||
#define CJSON_VERSION_MINOR 7
|
||||
#define CJSON_VERSION_PATCH 10
|
||||
|
||||
#include <stddef.h>
|
||||
|
||||
/* cJSON Types: */
|
||||
#define cJSON_Invalid (0)
|
||||
#define cJSON_False (1 << 0)
|
||||
#define cJSON_True (1 << 1)
|
||||
#define cJSON_NULL (1 << 2)
|
||||
#define cJSON_Number (1 << 3)
|
||||
#define cJSON_String (1 << 4)
|
||||
#define cJSON_Array (1 << 5)
|
||||
#define cJSON_Object (1 << 6)
|
||||
#define cJSON_Raw (1 << 7) /* raw json */
|
||||
|
||||
#define cJSON_IsReference 256
|
||||
#define cJSON_StringIsConst 512
|
||||
|
||||
/* The cJSON structure: */
|
||||
typedef struct cJSON {
|
||||
/* next/prev allow you to walk array/object chains. Alternatively, use
|
||||
GetArraySize/GetArrayItem/GetObjectItem */
|
||||
struct cJSON *next;
|
||||
struct cJSON *prev;
|
||||
/* An array or object item will have a child pointer pointing to a chain of
|
||||
the items in the array/object. */
|
||||
struct cJSON *child;
|
||||
|
||||
/* The type of the item, as above. */
|
||||
int type;
|
||||
|
||||
/* The item's string, if type==cJSON_String or type==cJSON_Raw */
|
||||
char *valuestring;
|
||||
/* writing to valueint is DEPRECATED, use cJSON_SetNumberValue instead */
|
||||
int valueint;
|
||||
/* The item's number, if type==cJSON_Number */
|
||||
double valuedouble;
|
||||
|
||||
/* The item's name string, if this item is the child of, or is in the list
|
||||
of subitems of an object. */
|
||||
char *string;
|
||||
} cJSON;
|
||||
|
||||
typedef struct cJSON_Hooks {
|
||||
/* malloc/free are CDECL on Windows regardless of the default calling
|
||||
* convention of the compiler, so ensure the hooks allow passing those
|
||||
* functions directly. */
|
||||
void *(CJSON_CDECL *malloc_fn)(size_t sz);
|
||||
void(CJSON_CDECL *free_fn)(void *ptr);
|
||||
} cJSON_Hooks;
|
||||
|
||||
typedef int cJSON_bool;
|
||||
|
||||
/* Limits how deeply nested arrays/objects can be before cJSON refuses to parse
|
||||
them. This is to prevent stack overflows. */
|
||||
#ifndef CJSON_NESTING_LIMIT
|
||||
#define CJSON_NESTING_LIMIT 1000
|
||||
#endif
|
||||
|
||||
/* returns the version of cJSON as a string */
|
||||
CJSON_PUBLIC(const char *) cJSON_Version(void);
|
||||
|
||||
/* Supply malloc and free functions to cJSON */
|
||||
CJSON_PUBLIC(void) cJSON_InitHooks(cJSON_Hooks *hooks);
|
||||
|
||||
/* Memory Management: the caller is always responsible to free the results from
|
||||
* all variants of cJSON_Parse (with cJSON_Delete) and cJSON_Print (with stdlib
|
||||
* free, cJSON_Hooks.free_fn, or cJSON_free as appropriate). The exception is
|
||||
* cJSON_PrintPreallocated, where the caller has full responsibility of the
|
||||
* buffer. */
|
||||
/* Supply a block of JSON, and this returns a cJSON object you can interrogate.
|
||||
*/
|
||||
CJSON_PUBLIC(cJSON *) cJSON_Parse(const char *value);
|
||||
/* ParseWithOpts allows you to require (and check) that the JSON is null
|
||||
* terminated, and to retrieve the pointer to the final byte parsed. */
|
||||
/* If you supply a ptr in return_parse_end and parsing fails, then
|
||||
* return_parse_end will contain a pointer to the error so will match
|
||||
* cJSON_GetErrorPtr(). */
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_ParseWithOpts(const char *value, const char **return_parse_end,
|
||||
cJSON_bool require_null_terminated);
|
||||
|
||||
/* Render a cJSON entity to text for transfer/storage. */
|
||||
CJSON_PUBLIC(char *) cJSON_Print(const cJSON *item);
|
||||
/* Render a cJSON entity to text for transfer/storage without any formatting. */
|
||||
CJSON_PUBLIC(char *) cJSON_PrintUnformatted(const cJSON *item);
|
||||
/* Render a cJSON entity to text using a buffered strategy. prebuffer is a guess
|
||||
* at the final size. guessing well reduces reallocation. fmt=0 gives
|
||||
* unformatted, =1 gives formatted */
|
||||
CJSON_PUBLIC(char *)
|
||||
cJSON_PrintBuffered(const cJSON *item, int prebuffer, cJSON_bool fmt);
|
||||
/* Render a cJSON entity to text using a buffer already allocated in memory with
|
||||
* given length. Returns 1 on success and 0 on failure. */
|
||||
/* NOTE: cJSON is not always 100% accurate in estimating how much memory it will
|
||||
* use, so to be safe allocate 5 bytes more than you actually need */
|
||||
CJSON_PUBLIC(cJSON_bool)
|
||||
cJSON_PrintPreallocated(cJSON *item, char *buffer, const int length,
|
||||
const cJSON_bool format);
|
||||
/* Delete a cJSON entity and all subentities. */
|
||||
CJSON_PUBLIC(void) cJSON_Delete(cJSON *c);
|
||||
|
||||
/* Returns the number of items in an array (or object). */
|
||||
CJSON_PUBLIC(int) cJSON_GetArraySize(const cJSON *array);
|
||||
/* Retrieve item number "index" from array "array". Returns NULL if
|
||||
* unsuccessful. */
|
||||
CJSON_PUBLIC(cJSON *) cJSON_GetArrayItem(const cJSON *array, int index);
|
||||
/* Get item "string" from object. Case insensitive. */
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_GetObjectItem(const cJSON *const object, const char *const string);
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_GetObjectItemCaseSensitive(const cJSON *const object,
|
||||
const char *const string);
|
||||
CJSON_PUBLIC(cJSON_bool)
|
||||
cJSON_HasObjectItem(const cJSON *object, const char *string);
|
||||
/* For analysing failed parses. This returns a pointer to the parse error.
|
||||
* You'll probably need to look a few chars back to make sense of it. Defined
|
||||
* when cJSON_Parse() returns 0. 0 when cJSON_Parse() succeeds. */
|
||||
CJSON_PUBLIC(const char *) cJSON_GetErrorPtr(void);
|
||||
|
||||
/* Check if the item is a string and return its valuestring */
|
||||
CJSON_PUBLIC(char *) cJSON_GetStringValue(cJSON *item);
|
||||
|
||||
/* These functions check the type of an item */
|
||||
CJSON_PUBLIC(cJSON_bool) cJSON_IsInvalid(const cJSON *const item);
|
||||
CJSON_PUBLIC(cJSON_bool) cJSON_IsFalse(const cJSON *const item);
|
||||
CJSON_PUBLIC(cJSON_bool) cJSON_IsTrue(const cJSON *const item);
|
||||
CJSON_PUBLIC(cJSON_bool) cJSON_IsBool(const cJSON *const item);
|
||||
CJSON_PUBLIC(cJSON_bool) cJSON_IsNull(const cJSON *const item);
|
||||
CJSON_PUBLIC(cJSON_bool) cJSON_IsNumber(const cJSON *const item);
|
||||
CJSON_PUBLIC(cJSON_bool) cJSON_IsString(const cJSON *const item);
|
||||
CJSON_PUBLIC(cJSON_bool) cJSON_IsArray(const cJSON *const item);
|
||||
CJSON_PUBLIC(cJSON_bool) cJSON_IsObject(const cJSON *const item);
|
||||
CJSON_PUBLIC(cJSON_bool) cJSON_IsRaw(const cJSON *const item);
|
||||
|
||||
/* These calls create a cJSON item of the appropriate type. */
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateNull(void);
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateTrue(void);
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateFalse(void);
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateBool(cJSON_bool boolean);
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateNumber(double num);
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateString(const char *string);
|
||||
/* raw json */
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateRaw(const char *raw);
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateArray(void);
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateObject(void);
|
||||
|
||||
/* Create a string where valuestring references a string so
|
||||
it will not be freed by cJSON_Delete */
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateStringReference(const char *string);
|
||||
/* Create an object/array that only references its elements so
|
||||
they will not be freed by cJSON_Delete */
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateObjectReference(const cJSON *child);
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateArrayReference(const cJSON *child);
|
||||
|
||||
/* These utilities create an Array of count items. */
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateIntArray(const int *numbers, int count);
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateFloatArray(const float *numbers, int count);
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateDoubleArray(const double *numbers, int count);
|
||||
CJSON_PUBLIC(cJSON *) cJSON_CreateStringArray(const char **strings, int count);
|
||||
|
||||
/* Append item to the specified array/object. */
|
||||
CJSON_PUBLIC(cJSON_bool) cJSON_AddItemToArray(cJSON *array, cJSON *item);
|
||||
CJSON_PUBLIC(cJSON_bool)
|
||||
cJSON_AddItemToObject(cJSON *object, const char *string, cJSON *item);
|
||||
/* Use this when string is definitely const (i.e. a literal, or as good as), and
|
||||
* will definitely survive the cJSON object. WARNING: When this function was
|
||||
* used, make sure to always check that (item->type & cJSON_StringIsConst) is
|
||||
* zero before writing to `item->string` */
|
||||
CJSON_PUBLIC(cJSON_bool)
|
||||
cJSON_AddItemToObjectCS(cJSON *object, const char *string, cJSON *item);
|
||||
/* Append reference to item to the specified array/object. Use this when you
|
||||
* want to add an existing cJSON to a new cJSON, but don't want to corrupt your
|
||||
* existing cJSON. */
|
||||
CJSON_PUBLIC(cJSON_bool)
|
||||
cJSON_AddItemReferenceToArray(cJSON *array, cJSON *item);
|
||||
CJSON_PUBLIC(cJSON_bool)
|
||||
cJSON_AddItemReferenceToObject(cJSON *object, const char *string, cJSON *item);
|
||||
|
||||
/* Remove/Detach items from Arrays/Objects. */
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_DetachItemViaPointer(cJSON *parent, cJSON *const item);
|
||||
CJSON_PUBLIC(cJSON *) cJSON_DetachItemFromArray(cJSON *array, int which);
|
||||
CJSON_PUBLIC(void) cJSON_DeleteItemFromArray(cJSON *array, int which);
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_DetachItemFromObject(cJSON *object, const char *string);
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_DetachItemFromObjectCaseSensitive(cJSON *object, const char *string);
|
||||
CJSON_PUBLIC(void)
|
||||
cJSON_DeleteItemFromObject(cJSON *object, const char *string);
|
||||
CJSON_PUBLIC(void)
|
||||
cJSON_DeleteItemFromObjectCaseSensitive(cJSON *object, const char *string);
|
||||
|
||||
/* Update array items. */
|
||||
CJSON_PUBLIC(cJSON_bool)
|
||||
cJSON_InsertItemInArray(
|
||||
cJSON *array, int which,
|
||||
cJSON *newitem); /* Shifts pre-existing items to the right. */
|
||||
CJSON_PUBLIC(cJSON_bool)
|
||||
cJSON_ReplaceItemViaPointer(cJSON *const parent, cJSON *const item,
|
||||
cJSON *replacement);
|
||||
CJSON_PUBLIC(void)
|
||||
cJSON_ReplaceItemInArray(cJSON *array, int which, cJSON *newitem);
|
||||
CJSON_PUBLIC(void)
|
||||
cJSON_ReplaceItemInObject(cJSON *object, const char *string, cJSON *newitem);
|
||||
CJSON_PUBLIC(void)
|
||||
cJSON_ReplaceItemInObjectCaseSensitive(cJSON *object, const char *string,
|
||||
cJSON *newitem);
|
||||
|
||||
/* Duplicate a cJSON item */
|
||||
CJSON_PUBLIC(cJSON *) cJSON_Duplicate(const cJSON *item, cJSON_bool recurse);
|
||||
/* Duplicate will create a new, identical cJSON item to the one you pass, in new
|
||||
memory that will need to be released. With recurse!=0, it will duplicate any
|
||||
children connected to the item. The item->next and ->prev pointers are always
|
||||
zero on return from Duplicate. */
|
||||
/* Recursively compare two cJSON items for equality. If either a or b is NULL or
|
||||
* invalid, they will be considered unequal.
|
||||
* case_sensitive determines if object keys are treated case sensitive (1) or
|
||||
* case insensitive (0) */
|
||||
CJSON_PUBLIC(cJSON_bool)
|
||||
cJSON_Compare(const cJSON *const a, const cJSON *const b,
|
||||
const cJSON_bool case_sensitive);
|
||||
|
||||
CJSON_PUBLIC(void) cJSON_Minify(char *json);
|
||||
|
||||
/* Helper functions for creating and adding items to an object at the same time.
|
||||
They return the added item or NULL on failure. */
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_AddNullToObject(cJSON *const object, const char *const name);
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_AddTrueToObject(cJSON *const object, const char *const name);
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_AddFalseToObject(cJSON *const object, const char *const name);
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_AddBoolToObject(cJSON *const object, const char *const name,
|
||||
const cJSON_bool boolean);
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_AddNumberToObject(cJSON *const object, const char *const name,
|
||||
const double number);
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_AddStringToObject(cJSON *const object, const char *const name,
|
||||
const char *const string);
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_AddRawToObject(cJSON *const object, const char *const name,
|
||||
const char *const raw);
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_AddObjectToObject(cJSON *const object, const char *const name);
|
||||
CJSON_PUBLIC(cJSON *)
|
||||
cJSON_AddArrayToObject(cJSON *const object, const char *const name);
|
||||
|
||||
/* When assigning an integer value, it needs to be propagated to valuedouble
|
||||
too. */
|
||||
#define cJSON_SetIntValue(object, number) \
|
||||
((object) ? (object)->valueint = (object)->valuedouble = (number) \
|
||||
: (number))
|
||||
/* helper for the cJSON_SetNumberValue macro */
|
||||
CJSON_PUBLIC(double) cJSON_SetNumberHelper(cJSON *object, double number);
|
||||
#define cJSON_SetNumberValue(object, number) \
|
||||
((object != NULL) ? cJSON_SetNumberHelper(object, (double)number) \
|
||||
: (number))
|
||||
|
||||
/* Macro for iterating over an array or object */
|
||||
#define cJSON_ArrayForEach(element, array) \
|
||||
for (element = (array != NULL) ? (array)->child : NULL; element != NULL; \
|
||||
element = element->next)
|
||||
|
||||
/* malloc/free objects using the malloc/free functions that have been set with
|
||||
cJSON_InitHooks */
|
||||
CJSON_PUBLIC(void *) cJSON_malloc(size_t size);
|
||||
CJSON_PUBLIC(void) cJSON_free(void *object);
|
||||
|
||||
#endif
|
||||
40
6-sigsnoop/eunomia-include/entry.h
Normal file
@@ -0,0 +1,40 @@
|
||||
#ifndef ENTRY_H_
|
||||
#define ENTRY_H_
|
||||
|
||||
// header-only helpers for developing wasm apps
|
||||
#include "cJSON/cJSON.c"
|
||||
#include "helpers.h"
|
||||
|
||||
#define MAX_ARGS 32
|
||||
|
||||
int main(int argc, char **argv);
|
||||
int bpf_main(char *env_json, int str_len)
|
||||
{
|
||||
cJSON *env = cJSON_Parse(env_json);
|
||||
if (!env)
|
||||
{
|
||||
printf("cJSON_Parse failed for env json args.\n");
|
||||
return 1;
|
||||
}
|
||||
if (!cJSON_IsArray(env)) {
|
||||
printf("env json args is not an array.\n");
|
||||
return 1;
|
||||
}
|
||||
int argc = cJSON_GetArraySize(env);
|
||||
if (argc > MAX_ARGS) {
|
||||
printf("env json args is too long.\n");
|
||||
return 1;
|
||||
}
|
||||
char *argv[MAX_ARGS];
|
||||
for (int i = 0; i < argc; i++) {
|
||||
cJSON *item = cJSON_GetArrayItem(env, i);
|
||||
if (!cJSON_IsString(item)) {
|
||||
printf("env json args is not a string.\n");
|
||||
return 1;
|
||||
}
|
||||
argv[i] = item->valuestring;
|
||||
}
|
||||
return main(argc, argv);
|
||||
}
|
||||
|
||||
#endif
|
||||
40
6-sigsnoop/eunomia-include/errno-base.h
Normal file
@@ -0,0 +1,40 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
|
||||
#ifndef _ASM_GENERIC_ERRNO_BASE_H
|
||||
#define _ASM_GENERIC_ERRNO_BASE_H
|
||||
|
||||
#define EPERM 1 /* Operation not permitted */
|
||||
#define ENOENT 2 /* No such file or directory */
|
||||
#define ESRCH 3 /* No such process */
|
||||
#define EINTR 4 /* Interrupted system call */
|
||||
#define EIO 5 /* I/O error */
|
||||
#define ENXIO 6 /* No such device or address */
|
||||
#define E2BIG 7 /* Argument list too long */
|
||||
#define ENOEXEC 8 /* Exec format error */
|
||||
#define EBADF 9 /* Bad file number */
|
||||
#define ECHILD 10 /* No child processes */
|
||||
#define EAGAIN 11 /* Try again */
|
||||
#define ENOMEM 12 /* Out of memory */
|
||||
#define EACCES 13 /* Permission denied */
|
||||
#define EFAULT 14 /* Bad address */
|
||||
#define ENOTBLK 15 /* Block device required */
|
||||
#define EBUSY 16 /* Device or resource busy */
|
||||
#define EEXIST 17 /* File exists */
|
||||
#define EXDEV 18 /* Cross-device link */
|
||||
#define ENODEV 19 /* No such device */
|
||||
#define ENOTDIR 20 /* Not a directory */
|
||||
#define EISDIR 21 /* Is a directory */
|
||||
#define EINVAL 22 /* Invalid argument */
|
||||
#define ENFILE 23 /* File table overflow */
|
||||
#define EMFILE 24 /* Too many open files */
|
||||
#define ENOTTY 25 /* Not a typewriter */
|
||||
#define ETXTBSY 26 /* Text file busy */
|
||||
#define EFBIG 27 /* File too large */
|
||||
#define ENOSPC 28 /* No space left on device */
|
||||
#define ESPIPE 29 /* Illegal seek */
|
||||
#define EROFS 30 /* Read-only file system */
|
||||
#define EMLINK 31 /* Too many links */
|
||||
#define EPIPE 32 /* Broken pipe */
|
||||
#define EDOM 33 /* Math argument out of domain of func */
|
||||
#define ERANGE 34 /* Math result not representable */
|
||||
|
||||
#endif
|
||||
54
6-sigsnoop/eunomia-include/helpers.h
Normal file
@@ -0,0 +1,54 @@
|
||||
#ifndef EWASM_APP_HELPERS_H_
|
||||
#define EWASM_APP_HELPERS_H_
|
||||
|
||||
#include "native-ewasm.h"
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <stdint.h>
|
||||
#include "cJSON/cJSON.h"
|
||||
|
||||
/// @brief start the eBPF program with JSON and wait for it to exit
|
||||
/// @param program_data the json data of eBPF program
|
||||
/// @return 0 on success, -1 on failure, the eBPF program will be terminated in failure case
|
||||
int
|
||||
start_bpf_program(char *program_data)
|
||||
{
|
||||
int res = create_bpf(program_data, strlen(program_data));
|
||||
if (res < 0) {
|
||||
printf("create_bpf failed %d\n", res);
|
||||
return -1;
|
||||
}
|
||||
res = run_bpf(res);
|
||||
if (res < 0) {
|
||||
printf("run_bpf failed %d\n", res);
|
||||
return -1;
|
||||
}
|
||||
res = wait_and_poll_bpf(res);
|
||||
if (res < 0) {
|
||||
printf("wait_and_poll_bpf failed %d\n", res);
|
||||
return -1;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
/// @brief set the global variable of bpf program to the value
|
||||
/// @param program the json program data
|
||||
/// @param key name of the global variable
|
||||
/// @param value arg value
|
||||
/// @return new eBPF program
|
||||
cJSON *
|
||||
set_bpf_program_global_var(cJSON *program, char *key, cJSON *value)
|
||||
{
|
||||
|
||||
cJSON *args = cJSON_GetObjectItem(program, "runtime_args");
|
||||
if (args == NULL)
|
||||
{
|
||||
args = cJSON_CreateObject();
|
||||
cJSON_AddItemToObject(program, "runtime_args", args);
|
||||
}
|
||||
cJSON_AddItemToObject(args, key, value);
|
||||
return program;
|
||||
}
|
||||
|
||||
#endif // EWASM_APP_HELPERS_H_
|
||||
50
6-sigsnoop/eunomia-include/native-ewasm.h
Normal file
@@ -0,0 +1,50 @@
|
||||
#ifndef EWASM_NATIVE_API_H_
|
||||
#define EWASM_NATIVE_API_H_
|
||||
|
||||
/// C function interface to be called from wasm
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
/// @brief create an eBPF program from JSON data
|
||||
/// @param ebpf_json
|
||||
/// @return id on success, -1 on failure
|
||||
int
|
||||
create_bpf(char *ebpf_json, int str_len);
|
||||
|
||||
/// @brief start running the ebpf program
|
||||
/// @details load and attach the eBPF program to the kernel and run it.
/// If the program has maps to export to user space, you also need to
/// call wait_and_poll_bpf.
|
||||
int
|
||||
run_bpf(int id);
|
||||
|
||||
/// @brief wait for the program to exit and receive data from export maps and
|
||||
/// print the data
|
||||
/// @details if the program has a ring buffer or perf event to export data
|
||||
/// to user space, the program will help load the map info and poll the
|
||||
/// events automatically.
|
||||
int
|
||||
wait_and_poll_bpf(int id);
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
/// @brief init the eBPF program
|
||||
/// @param env_json the env config from input
|
||||
/// @return 0 on success, -1 on failure, the eBPF program will be terminated in
|
||||
/// failure case
|
||||
int
|
||||
bpf_main(char *env_json, int str_len);
|
||||
|
||||
/// @brief handle the event output from the eBPF program, valid only when
|
||||
/// wait_and_poll_bpf is called
|
||||
/// @param ctx user defined context
|
||||
/// @param e json event message
|
||||
/// @return 0 on success, -1 on failure,
|
||||
/// the event will be sent to the next handler in the chain on success, or dropped on
|
||||
/// failure
|
||||
int
|
||||
process_event(int ctx, char *e, int str_len);
|
||||
|
||||
#endif // EWASM_NATIVE_API_H_
|
||||
195
6-sigsnoop/eunomia-include/sigsnoop.skel.h
Normal file
@@ -0,0 +1,195 @@
|
||||
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
|
||||
|
||||
/* THIS FILE IS AUTOGENERATED BY BPFTOOL! */
|
||||
#ifndef __SIGSNOOP_BPF_SKEL_H__
|
||||
#define __SIGSNOOP_BPF_SKEL_H__
|
||||
|
||||
extern int errno;
|
||||
#include <stdlib.h>
|
||||
|
||||
struct bpf_object_skeleton;
|
||||
struct bpf_object;
|
||||
struct bpf_map;
|
||||
struct bpf_program;
|
||||
struct bpf_object_open_opts;
|
||||
struct bpf_link;
|
||||
|
||||
struct sigsnoop_bpf {
|
||||
struct bpf_object_skeleton *skeleton;
|
||||
struct bpf_object *obj;
|
||||
struct {
|
||||
struct bpf_map *events;
|
||||
struct bpf_map *values;
|
||||
struct bpf_map *rodata;
|
||||
} maps;
|
||||
struct {
|
||||
struct bpf_program *kill_entry;
|
||||
struct bpf_program *kill_exit;
|
||||
struct bpf_program *tkill_entry;
|
||||
struct bpf_program *tkill_exit;
|
||||
struct bpf_program *tgkill_entry;
|
||||
struct bpf_program *tgkill_exit;
|
||||
struct bpf_program *sig_trace;
|
||||
} progs;
|
||||
struct {
|
||||
struct bpf_link *kill_entry;
|
||||
struct bpf_link *kill_exit;
|
||||
struct bpf_link *tkill_entry;
|
||||
struct bpf_link *tkill_exit;
|
||||
struct bpf_link *tgkill_entry;
|
||||
struct bpf_link *tgkill_exit;
|
||||
struct bpf_link *sig_trace;
|
||||
} links;
|
||||
struct sigsnoop_bpf__rodata {
|
||||
int filtered_pid;
|
||||
int target_signal;
|
||||
bool failed_only;
|
||||
} *rodata;
|
||||
|
||||
#ifdef __cplusplus
|
||||
static inline struct sigsnoop_bpf *open(const struct bpf_object_open_opts *opts = nullptr);
|
||||
static inline struct sigsnoop_bpf *open_and_load();
|
||||
static inline int load(struct sigsnoop_bpf *skel);
|
||||
static inline int attach(struct sigsnoop_bpf *skel);
|
||||
static inline void detach(struct sigsnoop_bpf *skel);
|
||||
static inline void destroy(struct sigsnoop_bpf *skel);
|
||||
static inline const void *elf_bytes(size_t *sz);
|
||||
#endif /* __cplusplus */
|
||||
};
|
||||
|
||||
static void
|
||||
sigsnoop_bpf__destroy(struct sigsnoop_bpf *obj)
|
||||
{
|
||||
|
||||
}
|
||||
|
||||
static inline int
|
||||
sigsnoop_bpf__create_skeleton(struct sigsnoop_bpf *obj);
|
||||
|
||||
static inline struct sigsnoop_bpf *
|
||||
sigsnoop_bpf__open_opts(const struct bpf_object_open_opts *opts)
|
||||
{
|
||||
struct sigsnoop_bpf *obj;
|
||||
int err;
|
||||
|
||||
obj = (struct sigsnoop_bpf *)calloc(1, sizeof(*obj));
|
||||
if (!obj) {
|
||||
errno = ENOMEM;
|
||||
return NULL;
|
||||
}
|
||||
return obj;
|
||||
}
|
||||
|
||||
static inline struct sigsnoop_bpf *
|
||||
sigsnoop_bpf__open(void)
|
||||
{
|
||||
return sigsnoop_bpf__open_opts(NULL);
|
||||
}
|
||||
|
||||
static inline int
|
||||
sigsnoop_bpf__load(struct sigsnoop_bpf *obj)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
|
||||
static inline struct sigsnoop_bpf *
|
||||
sigsnoop_bpf__open_and_load(void)
|
||||
{
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static inline int
|
||||
sigsnoop_bpf__attach(struct sigsnoop_bpf *obj)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
|
||||
static inline void
|
||||
sigsnoop_bpf__detach(struct sigsnoop_bpf *obj)
|
||||
{
|
||||
}
|
||||
|
||||
static inline const void *sigsnoop_bpf__elf_bytes(size_t *sz);
|
||||
|
||||
static inline int
|
||||
sigsnoop_bpf__create_skeleton(struct sigsnoop_bpf *obj)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
|
||||
#ifdef __cplusplus
|
||||
struct sigsnoop_bpf *sigsnoop_bpf::open(const struct bpf_object_open_opts *opts) { return sigsnoop_bpf__open_opts(opts); }
|
||||
struct sigsnoop_bpf *sigsnoop_bpf::open_and_load() { return sigsnoop_bpf__open_and_load(); }
|
||||
int sigsnoop_bpf::load(struct sigsnoop_bpf *skel) { return sigsnoop_bpf__load(skel); }
|
||||
int sigsnoop_bpf::attach(struct sigsnoop_bpf *skel) { return sigsnoop_bpf__attach(skel); }
|
||||
void sigsnoop_bpf::detach(struct sigsnoop_bpf *skel) { sigsnoop_bpf__detach(skel); }
|
||||
void sigsnoop_bpf::destroy(struct sigsnoop_bpf *skel) { sigsnoop_bpf__destroy(skel); }
|
||||
const void *sigsnoop_bpf::elf_bytes(size_t *sz) { return sigsnoop_bpf__elf_bytes(sz); }
|
||||
#endif /* __cplusplus */
|
||||
|
||||
__attribute__((unused)) static void
|
||||
sigsnoop_bpf__assert(struct sigsnoop_bpf *s __attribute__((unused)))
|
||||
{
|
||||
#ifdef __cplusplus
|
||||
#define _Static_assert static_assert
|
||||
#endif
|
||||
_Static_assert(sizeof(s->rodata->filtered_pid) == 4, "unexpected size of 'filtered_pid'");
|
||||
_Static_assert(sizeof(s->rodata->target_signal) == 4, "unexpected size of 'target_signal'");
|
||||
_Static_assert(sizeof(s->rodata->failed_only) == 1, "unexpected size of 'failed_only'");
|
||||
#ifdef __cplusplus
|
||||
#undef _Static_assert
|
||||
#endif
|
||||
}
|
||||
|
||||
struct perf_buffer;
|
||||
void perf_buffer__free(struct perf_buffer *pb) {
|
||||
}
|
||||
int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms) {
|
||||
return start_bpf_program(program_data);
|
||||
}
|
||||
int bpf_program__set_autoload(struct bpf_program *prog, bool autoload) {
|
||||
return 0;
|
||||
}
|
||||
char* strerror(int errnum) {
|
||||
return "error";
|
||||
}
|
||||
int bpf_map__fd(const struct bpf_map *map) {
|
||||
return 0;
|
||||
}
|
||||
typedef void (*perf_buffer_sample_fn)(void *ctx, int cpu,
|
||||
void *data, unsigned int size);
|
||||
typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, unsigned long long cnt);
|
||||
struct perf_buffer;
|
||||
|
||||
perf_buffer_sample_fn global_cb;
|
||||
struct perf_buffer_opts;
|
||||
|
||||
struct perf_buffer *
|
||||
perf_buffer__new(int map_fd, size_t page_cnt,
|
||||
perf_buffer_sample_fn sample_cb, perf_buffer_lost_fn lost_cb, void *ctx,
|
||||
const struct perf_buffer_opts *opts) {
|
||||
global_cb = sample_cb;
|
||||
return (void*)1;
|
||||
}
|
||||
|
||||
int process_event(int ctx, char *e, int str_len)
|
||||
{
|
||||
struct event eve = {0};
cJSON *json = cJSON_Parse(e);
if (!json) return -1; /* malformed event JSON: drop the event */
eve.sig = cJSON_GetObjectItem(json, "sig")->valueint;
eve.pid = cJSON_GetObjectItem(json, "pid")->valueint;
/* bounded copy: eve is zero-initialized, so comm stays NUL-terminated */
strncpy(eve.comm, cJSON_GetObjectItem(json, "comm")->valuestring, sizeof(eve.comm) - 1);
eve.tpid = cJSON_GetObjectItem(json, "tpid")->valueint;
eve.ret = cJSON_GetObjectItem(json, "ret")->valueint;
global_cb((void*)ctx, 0, &eve, str_len);
cJSON_Delete(json); /* release the parsed tree */
return 0;
|
||||
}
|
||||
|
||||
extern const char argp_program_doc[];
|
||||
|
||||
void argp_state_help(const struct argp_state *__state, int flag) {
|
||||
printf("%s", argp_program_doc);
|
||||
exit(0);
|
||||
}
|
||||
|
||||
#endif /* __SIGSNOOP_BPF_SKEL_H__ */
|
||||
8
6-sigsnoop/eunomia-include/wasm-app.h
Normal file
@@ -0,0 +1,8 @@
|
||||
#ifndef EWASM_EWASM_APP_H_
|
||||
#define EWASM_EWASM_APP_H_
|
||||
|
||||
// header-only helpers for developing wasm apps
|
||||
#include "cJSON/cJSON.c"
|
||||
#include "helpers.h"
|
||||
|
||||
#endif // EWASM_EWASM_APP_H_
|
||||
145
6-sigsnoop/sigsnoop.bpf.c
Executable file
@@ -0,0 +1,145 @@
|
||||
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
||||
/* Copyright (c) 2021~2022 Hengqi Chen */
|
||||
#include <vmlinux.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include "sigsnoop.h"
|
||||
|
||||
#define MAX_ENTRIES 10240
|
||||
|
||||
const volatile pid_t filtered_pid = 0;
|
||||
const volatile int target_signal = 0;
|
||||
const volatile bool failed_only = false;
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, MAX_ENTRIES);
|
||||
__type(key, __u32);
|
||||
__type(value, struct event);
|
||||
} values SEC(".maps");
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
|
||||
__uint(key_size, sizeof(__u32));
|
||||
__uint(value_size, sizeof(__u32));
|
||||
} events SEC(".maps");
|
||||
|
||||
static int probe_entry(pid_t tpid, int sig)
|
||||
{
|
||||
struct event event = {};
|
||||
__u64 pid_tgid;
|
||||
__u32 pid, tid;
|
||||
|
||||
if (target_signal && sig != target_signal)
|
||||
return 0;
|
||||
|
||||
pid_tgid = bpf_get_current_pid_tgid();
|
||||
pid = pid_tgid >> 32;
|
||||
tid = (__u32)pid_tgid;
|
||||
if (filtered_pid && pid != filtered_pid)
|
||||
return 0;
|
||||
|
||||
event.pid = pid;
|
||||
event.tpid = tpid;
|
||||
event.sig = sig;
|
||||
bpf_get_current_comm(event.comm, sizeof(event.comm));
|
||||
bpf_map_update_elem(&values, &tid, &event, BPF_ANY);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int probe_exit(void *ctx, int ret)
|
||||
{
|
||||
__u64 pid_tgid = bpf_get_current_pid_tgid();
|
||||
__u32 tid = (__u32)pid_tgid;
|
||||
struct event *eventp;
|
||||
|
||||
eventp = bpf_map_lookup_elem(&values, &tid);
|
||||
if (!eventp)
|
||||
return 0;
|
||||
|
||||
if (failed_only && ret >= 0)
|
||||
goto cleanup;
|
||||
|
||||
eventp->ret = ret;
|
||||
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, eventp, sizeof(*eventp));
|
||||
|
||||
cleanup:
|
||||
bpf_map_delete_elem(&values, &tid);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_enter_kill")
|
||||
int kill_entry(struct trace_event_raw_sys_enter *ctx)
|
||||
{
|
||||
pid_t tpid = (pid_t)ctx->args[0];
|
||||
int sig = (int)ctx->args[1];
|
||||
|
||||
return probe_entry(tpid, sig);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_exit_kill")
|
||||
int kill_exit(struct trace_event_raw_sys_exit *ctx)
|
||||
{
|
||||
return probe_exit(ctx, ctx->ret);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_enter_tkill")
|
||||
int tkill_entry(struct trace_event_raw_sys_enter *ctx)
|
||||
{
|
||||
pid_t tpid = (pid_t)ctx->args[0];
|
||||
int sig = (int)ctx->args[1];
|
||||
|
||||
return probe_entry(tpid, sig);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_exit_tkill")
|
||||
int tkill_exit(struct trace_event_raw_sys_exit *ctx)
|
||||
{
|
||||
return probe_exit(ctx, ctx->ret);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_enter_tgkill")
|
||||
int tgkill_entry(struct trace_event_raw_sys_enter *ctx)
|
||||
{
|
||||
pid_t tpid = (pid_t)ctx->args[1];
|
||||
int sig = (int)ctx->args[2];
|
||||
|
||||
return probe_entry(tpid, sig);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_exit_tgkill")
|
||||
int tgkill_exit(struct trace_event_raw_sys_exit *ctx)
|
||||
{
|
||||
return probe_exit(ctx, ctx->ret);
|
||||
}
|
||||
|
||||
SEC("tracepoint/signal/signal_generate")
|
||||
int sig_trace(struct trace_event_raw_signal_generate *ctx)
|
||||
{
|
||||
struct event event = {};
|
||||
pid_t tpid = ctx->pid;
|
||||
int ret = ctx->errno;
|
||||
int sig = ctx->sig;
|
||||
__u64 pid_tgid;
|
||||
__u32 pid;
|
||||
|
||||
if (failed_only && ret == 0)
|
||||
return 0;
|
||||
|
||||
if (target_signal && sig != target_signal)
|
||||
return 0;
|
||||
|
||||
pid_tgid = bpf_get_current_pid_tgid();
|
||||
pid = pid_tgid >> 32;
|
||||
if (filtered_pid && pid != filtered_pid)
|
||||
return 0;
|
||||
|
||||
event.pid = pid;
|
||||
event.tpid = tpid;
|
||||
event.sig = sig;
|
||||
event.ret = ret;
|
||||
bpf_get_current_comm(event.comm, sizeof(event.comm));
|
||||
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &event, sizeof(event));
|
||||
return 0;
|
||||
}
|
||||
|
||||
char LICENSE[] SEC("license") = "Dual BSD/GPL";
|
||||
16
6-sigsnoop/sigsnoop.h
Executable file
@@ -0,0 +1,16 @@
|
||||
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
||||
/* Copyright (c) 2021~2022 Hengqi Chen */
|
||||
#ifndef __SIGSNOOP_H
|
||||
#define __SIGSNOOP_H
|
||||
|
||||
#define TASK_COMM_LEN 16
|
||||
|
||||
struct event {
|
||||
unsigned int pid;
|
||||
unsigned int tpid;
|
||||
int sig;
|
||||
int ret;
|
||||
char comm[TASK_COMM_LEN];
|
||||
};
|
||||
|
||||
#endif /* __SIGSNOOP_H */
|
||||
92
6-sigsnoop/sigsnoop.md
Normal file
@@ -0,0 +1,92 @@
|
||||
## eBPF Beginner's Practical Tutorial: Writing the eBPF Program sigsnoop to Monitor System-Wide Signal Events
|
||||
|
||||
### Background
|
||||
|
||||
### How It Works
|
||||
|
||||
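`sigsnoop` uses Linux tracepoints: it attaches handler functions to the entry and exit tracepoints of the `kill`, `tkill`, and `tgkill` syscalls, and to the `signal/signal_generate` tracepoint.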
|
||||
```c
|
||||
SEC("tracepoint/syscalls/sys_enter_kill")
|
||||
int kill_entry(struct trace_event_raw_sys_enter *ctx)
|
||||
{
|
||||
pid_t tpid = (pid_t)ctx->args[0];
|
||||
int sig = (int)ctx->args[1];
|
||||
|
||||
return probe_entry(tpid, sig);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_exit_kill")
|
||||
int kill_exit(struct trace_event_raw_sys_exit *ctx)
|
||||
{
|
||||
return probe_exit(ctx, ctx->ret);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_enter_tkill")
|
||||
int tkill_entry(struct trace_event_raw_sys_enter *ctx)
|
||||
{
|
||||
pid_t tpid = (pid_t)ctx->args[0];
|
||||
int sig = (int)ctx->args[1];
|
||||
|
||||
return probe_entry(tpid, sig);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_exit_tkill")
|
||||
int tkill_exit(struct trace_event_raw_sys_exit *ctx)
|
||||
{
|
||||
return probe_exit(ctx, ctx->ret);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_enter_tgkill")
|
||||
int tgkill_entry(struct trace_event_raw_sys_enter *ctx)
|
||||
{
|
||||
pid_t tpid = (pid_t)ctx->args[1];
|
||||
int sig = (int)ctx->args[2];
|
||||
|
||||
return probe_entry(tpid, sig);
|
||||
}
|
||||
|
||||
SEC("tracepoint/syscalls/sys_exit_tgkill")
|
||||
int tgkill_exit(struct trace_event_raw_sys_exit *ctx)
|
||||
{
|
||||
return probe_exit(ctx, ctx->ret);
|
||||
}
|
||||
|
||||
SEC("tracepoint/signal/signal_generate")
|
||||
int sig_trace(struct trace_event_raw_signal_generate *ctx)
|
||||
{
|
||||
struct event event = {};
|
||||
pid_t tpid = ctx->pid;
|
||||
int ret = ctx->errno;
|
||||
int sig = ctx->sig;
|
||||
__u64 pid_tgid;
|
||||
__u32 pid;
|
||||
|
||||
if (failed_only && ret == 0)
|
||||
return 0;
|
||||
|
||||
if (target_signal && sig != target_signal)
|
||||
return 0;
|
||||
|
||||
pid_tgid = bpf_get_current_pid_tgid();
|
||||
pid = pid_tgid >> 32;
|
||||
if (filtered_pid && pid != filtered_pid)
|
||||
return 0;
|
||||
|
||||
event.pid = pid;
|
||||
event.tpid = tpid;
|
||||
event.sig = sig;
|
||||
event.ret = ret;
|
||||
bpf_get_current_comm(event.comm, sizeof(event.comm));
|
||||
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &event, sizeof(event));
|
||||
return 0;
|
||||
}
|
||||
|
||||
```
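
The entry and exit handlers above are paired through a map keyed by thread id: the entry handler stores the syscall arguments, and the exit handler looks them up, adds the return value, and reports the event. The following is a minimal user-space sketch of that pairing, not the BPF program itself. The names `pending`, `table`, `SLOTS`, `sketch_entry`, and `sketch_exit` are hypothetical, and a fixed-size array stands in for the BPF hash map.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 64 /* illustrative; the BPF hash map is sized by MAX_ENTRIES */

/* Hypothetical user-space stand-in for one entry of the `values` map. */
struct pending {
    int      used;
    uint32_t tid;  /* key: thread that entered the syscall */
    uint32_t pid;  /* sender's process id (upper 32 bits of pid_tgid) */
    uint32_t tpid; /* target of the signal */
    int      sig;
    int      ret;
};

static struct pending table[SLOTS];

/* Entry side: stash the arguments under the calling thread's id. */
static int sketch_entry(uint64_t pid_tgid, uint32_t tpid, int sig)
{
    uint32_t tid = (uint32_t)pid_tgid;
    int slot = -1;

    for (int i = 0; i < SLOTS; i++) {
        if (table[i].used && table[i].tid == tid) {
            slot = i; /* overwrite an existing entry, like BPF_ANY */
            break;
        }
        if (!table[i].used && slot < 0)
            slot = i; /* remember the first free slot */
    }
    if (slot < 0)
        return -1; /* table full */

    table[slot] = (struct pending){
        .used = 1, .tid = tid, .pid = (uint32_t)(pid_tgid >> 32),
        .tpid = tpid, .sig = sig, .ret = 0,
    };
    return 0;
}

/* Exit side: pair the return value with the stashed entry.
 * Returns 1 and fills *out when an event should be reported, else 0. */
static int sketch_exit(uint64_t pid_tgid, int ret, int failed_only,
                       struct pending *out)
{
    uint32_t tid = (uint32_t)pid_tgid;

    assert(out != NULL);
    for (int i = 0; i < SLOTS; i++) {
        if (!table[i].used || table[i].tid != tid)
            continue;
        int report = !(failed_only && ret >= 0);
        if (report) {
            *out = table[i];
            out->ret = ret;
        }
        memset(&table[i], 0, sizeof(table[i])); /* always clean up */
        return report;
    }
    return 0; /* no matching entry: nothing to report */
}
```

Keying on the thread id rather than the process id keeps concurrent syscalls from different threads of the same process from overwriting each other's pending entry.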
|
||||
|
||||
|
||||
### Usage in Eunomia
|
||||
|
||||

|
||||

|
||||
|
||||
### Summary
|
||||
3
7-execsnoop/.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
|
||||
ecli
|
||||
package.json
|
||||
|
||||
148
7-execsnoop/README.md
Normal file
@@ -0,0 +1,148 @@
|
||||
---
|
||||
layout: post
|
||||
title: execsnoop
|
||||
date: 2022-11-17 19:57
|
||||
category: bpftools
|
||||
author: yunwei37
|
||||
tags: [bpftools, syscall]
|
||||
summary: execsnoop traces the exec() syscall system-wide, and prints various details.
|
||||
---
|
||||
|
||||
## origin
|
||||
|
||||
Originally from:
|
||||
|
||||
https://github.com/iovisor/bcc/blob/master/libbpf-tools/execsnoop.bpf.c
|
||||
|
||||
## Compile and Run
|
||||
|
||||
Compile:
|
||||
|
||||
```shell
|
||||
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
|
||||
```
|
||||
|
||||
Run:
|
||||
|
||||
```
|
||||
$ sudo ./ecli run package.json
|
||||
|
||||
running and waiting for the ebpf events from perf event...
|
||||
time pid ppid uid retval args_count args_size comm args
|
||||
23:07:35 32940 32783 1000 0 1 13 cat /usr/bin/cat
|
||||
23:07:43 32946 24577 1000 0 1 10 bash /bin/bash
|
||||
23:07:43 32948 32946 1000 0 1 18 lesspipe /usr/bin/lesspipe
|
||||
23:07:43 32949 32948 1000 0 2 36 basename /usr/bin/basename
|
||||
23:07:43 32951 32950 1000 0 2 35 dirname /usr/bin/dirname
|
||||
23:07:43 32952 32946 1000 0 2 22 dircolors /usr/bin/dircolors
|
||||
23:07:48 32953 32946 1000 0 2 25 ls /usr/bin/ls
|
||||
23:07:53 32957 32946 1000 0 2 17 sleep /usr/bin/sleep
|
||||
23:07:57 32959 32946 1000 0 1 17 oneko /usr/games/oneko
|
||||
|
||||
```
|
||||
|
||||
## details in bcc
|
||||
|
||||
Demonstrations of execsnoop, the Linux eBPF/bcc version.
|
||||
|
||||
execsnoop traces the exec() syscall system-wide, and prints various details.
|
||||
Example output:
|
||||
|
||||
```
|
||||
# ./execsnoop
|
||||
COMM PID PPID RET ARGS
|
||||
bash 33161 24577 0 /bin/bash
|
||||
lesspipe 33163 33161 0 /usr/bin/lesspipe
|
||||
basename 33164 33163 0 /usr/bin/basename /usr/bin/lesspipe
|
||||
dirname 33166 33165 0 /usr/bin/dirname /usr/bin/lesspipe
|
||||
dircolors 33167 33161 0 /usr/bin/dircolors -b
|
||||
ls 33172 33161 0 /usr/bin/ls --color=auto
|
||||
top 33173 33161 0 /usr/bin/top
|
||||
oneko 33174 33161 0 /usr/games/oneko
|
||||
systemctl 33175 2975 0 /bin/systemctl is-enabled -q whoopsie.path
|
||||
apport-checkrep 33176 2975 0 /usr/share/apport/apport-checkreports
|
||||
apport-checkrep 33177 2975 0 /usr/share/apport/apport-checkreports --system
|
||||
apport-checkrep 33178 2975 0 /usr/share/apport/apport-checkreports --system
|
||||
|
||||
```
|
||||
|
||||
This shows process information each time the exec() system call is invoked.
|
||||
|
||||
USAGE message:
|
||||
|
||||
```
|
||||
usage: execsnoop [-h] [-T] [-t] [-x] [--cgroupmap CGROUPMAP]
|
||||
[--mntnsmap MNTNSMAP] [-u USER] [-q] [-n NAME]
|
||||
[-l LINE] [-U] [--max-args MAX_ARGS]
|
||||
|
||||
Trace exec() syscalls
|
||||
|
||||
options:
|
||||
-h, --help show this help message and exit
|
||||
-T, --time include time column on output (HH:MM:SS)
|
||||
-t, --timestamp include timestamp on output
|
||||
-x, --fails include failed exec()s
|
||||
--cgroupmap CGROUPMAP
|
||||
trace cgroups in this BPF map only
|
||||
--mntnsmap MNTNSMAP trace mount namespaces in this BPF map only
|
||||
-u USER, --uid USER trace this UID only
|
||||
-q, --quote Add quotemarks (") around arguments.
|
||||
-n NAME, --name NAME only print commands matching this name (regex), any
|
||||
arg
|
||||
-l LINE, --line LINE only print commands where arg contains this line
|
||||
(regex)
|
||||
-U, --print-uid print UID column
|
||||
--max-args MAX_ARGS maximum number of arguments parsed and displayed,
|
||||
defaults to 20
|
||||
|
||||
examples:
|
||||
./execsnoop # trace all exec() syscalls
|
||||
./execsnoop -x # include failed exec()s
|
||||
./execsnoop -T # include time (HH:MM:SS)
|
||||
./execsnoop -U # include UID
|
||||
./execsnoop -u 1000 # only trace UID 1000
|
||||
./execsnoop -u user # get user UID and trace only them
|
||||
./execsnoop -t # include timestamps
|
||||
./execsnoop -q # add "quotemarks" around arguments
|
||||
./execsnoop -n main # only print command lines containing "main"
|
||||
./execsnoop -l tpkg # only print command where arguments contains "tpkg"
|
||||
./execsnoop --cgroupmap mappath # only trace cgroups in this BPF map
|
||||
./execsnoop --mntnsmap mappath # only trace mount namespaces in the map
|
||||
|
||||
|
||||
```
|
||||
|
||||
The -T and -t options include the time and a timestamp in the output:
|
||||
|
||||
```
|
||||
# ./execsnoop -T -t
|
||||
TIME TIME(s) PCOMM PID PPID RET ARGS
|
||||
23:35:25 4.335 bash 33360 24577 0 /bin/bash
|
||||
23:35:25 4.338 lesspipe 33361 33360 0 /usr/bin/lesspipe
|
||||
23:35:25 4.340 basename 33362 33361 0 /usr/bin/basename /usr/bin/lesspipe
|
||||
23:35:25 4.342 dirname 33364 33363 0 /usr/bin/dirname /usr/bin/lesspipe
|
||||
23:35:25 4.347 dircolors 33365 33360 0 /usr/bin/dircolors -b
|
||||
23:35:40 19.327 touch 33367 33366 0 /usr/bin/touch /run/udev/gdm-machine-has-hardware-gpu
|
||||
23:35:40 19.329 snap-device-hel 33368 33366 0 /usr/lib/snapd/snap-device-helper change snap_firefox_firefox /devices/pci0000:00/0000:00:02.0/drm/card0 226:0
|
||||
23:35:40 19.331 snap-device-hel 33369 33366 0 /usr/lib/snapd/snap-device-helper change snap_firefox_geckodriver /devices/pci0000:00/0000:00:02.0/drm/card0 226:0
|
||||
23:35:40 19.332 snap-device-hel 33370 33366 0 /usr/lib/snapd/snap-device-helper change snap_snap-store_snap-store /devices/pci0000:00/0000:00:02.0/drm/card0 226:0
|
||||
|
||||
```
|
||||
|
||||
The -u option filters by UID:
|
||||
|
||||
```
|
||||
# ./execsnoop -Uu 1000
|
||||
UID PCOMM PID PPID RET ARGS
|
||||
1000 bash 33604 24577 0 /bin/bash
|
||||
1000 lesspipe 33606 33604 0 /usr/bin/lesspipe
|
||||
1000 basename 33607 33606 0 /usr/bin/basename /usr/bin/lesspipe
|
||||
1000 dirname 33609 33608 0 /usr/bin/dirname /usr/bin/lesspipe
|
||||
1000 dircolors 33610 33604 0 /usr/bin/dircolors -b
|
||||
1000 sleep 33615 33604 0 /usr/bin/sleep
|
||||
1000 sleep 33616 33604 0 /usr/bin/sleep 1
|
||||
1000 clear 33617 33604 0 /usr/bin/clear
|
||||
|
||||
```
|
||||
|
||||
Report bugs to https://github.com/iovisor/bcc/tree/master/libbpf-tools.
|
||||
146
7-execsnoop/execsnoop.bpf.c
Normal file
@@ -0,0 +1,146 @@
|
||||
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
||||
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include "execsnoop.bpf.h"

const volatile bool filter_cg = false;
const volatile bool ignore_failed = true;
const volatile uid_t targ_uid = INVALID_UID;
const volatile int max_args = DEFAULT_MAXARGS;

static const struct event empty_event = {};

struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
	__type(key, u32);
	__type(value, u32);
	__uint(max_entries, 1);
} cgroup_map SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, pid_t);
	__type(value, struct event);
} execs SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(u32));
	__uint(value_size, sizeof(u32));
} events SEC(".maps");

static __always_inline bool valid_uid(uid_t uid) {
	return uid != INVALID_UID;
}

SEC("tracepoint/syscalls/sys_enter_execve")
int tracepoint__syscalls__sys_enter_execve(struct trace_event_raw_sys_enter* ctx)
{
	u64 id;
	pid_t pid, tgid;
	unsigned int ret;
	struct event *event;
	struct task_struct *task;
	const char **args = (const char **)(ctx->args[1]);
	const char *argp;

	if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
		return 0;

	uid_t uid = (u32)bpf_get_current_uid_gid();
	int i;

	if (valid_uid(targ_uid) && targ_uid != uid)
		return 0;

	id = bpf_get_current_pid_tgid();
	pid = (pid_t)id;	/* lower 32 bits: thread id */
	tgid = id >> 32;	/* upper 32 bits: process id */
	if (bpf_map_update_elem(&execs, &pid, &empty_event, BPF_NOEXIST))
		return 0;

	event = bpf_map_lookup_elem(&execs, &pid);
	if (!event)
		return 0;

	event->pid = tgid;
	event->uid = uid;
	task = (struct task_struct*)bpf_get_current_task();
	event->ppid = (pid_t)BPF_CORE_READ(task, real_parent, tgid);
	event->args_count = 0;
	event->args_size = 0;

	ret = bpf_probe_read_user_str(event->args, ARGSIZE, (const char*)ctx->args[0]);
	if (ret <= ARGSIZE) {
		event->args_size += ret;
	} else {
		/* write an empty string */
		event->args[0] = '\0';
		event->args_size++;
	}

	event->args_count++;
	#pragma unroll
	for (i = 1; i < TOTAL_MAX_ARGS && i < max_args; i++) {
		bpf_probe_read_user(&argp, sizeof(argp), &args[i]);
		if (!argp)
			return 0;

		if (event->args_size > LAST_ARG)
			return 0;

		ret = bpf_probe_read_user_str(&event->args[event->args_size], ARGSIZE, argp);
		if (ret > ARGSIZE)
			return 0;

		event->args_count++;
		event->args_size += ret;
	}
	/* try to read one more argument to check if there is one */
	bpf_probe_read_user(&argp, sizeof(argp), &args[max_args]);
	if (!argp)
		return 0;

	/* pointer to max_args+1 isn't null, assume we have more arguments */
	event->args_count++;
	return 0;
}

SEC("tracepoint/syscalls/sys_exit_execve")
int tracepoint__syscalls__sys_exit_execve(struct trace_event_raw_sys_exit* ctx)
{
	u64 id;
	pid_t pid;
	int ret;
	struct event *event;

	if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
		return 0;

	u32 uid = (u32)bpf_get_current_uid_gid();

	if (valid_uid(targ_uid) && targ_uid != uid)
		return 0;
	id = bpf_get_current_pid_tgid();
	pid = (pid_t)id;
	event = bpf_map_lookup_elem(&execs, &pid);
	if (!event)
		return 0;
	ret = ctx->ret;
	if (ignore_failed && ret < 0)
		goto cleanup;

	event->retval = ret;
	bpf_get_current_comm(&event->comm, sizeof(event->comm));
	/* only send the header plus the bytes of args that were filled in */
	size_t len = ((size_t)(&((struct event*)0)->args) + event->args_size);
	if (len <= sizeof(*event))
		bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, event, len);
cleanup:
	bpf_map_delete_elem(&execs, &pid);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

26
7-execsnoop/execsnoop.bpf.h
Normal file
@@ -0,0 +1,26 @@
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
#ifndef __EXECSNOOP_H
#define __EXECSNOOP_H

#define ARGSIZE 128
#define TASK_COMM_LEN 16
#define TOTAL_MAX_ARGS 60
#define DEFAULT_MAXARGS 20
#define FULL_MAX_ARGS_ARR (TOTAL_MAX_ARGS * ARGSIZE)
#define INVALID_UID ((uid_t)-1)
#define LAST_ARG (FULL_MAX_ARGS_ARR - ARGSIZE)

struct event {
	int pid;
	int ppid;
	int uid;
	int retval;
	int args_count;
	unsigned int args_size;
	char comm[TASK_COMM_LEN];
	char args[FULL_MAX_ARGS_ARR];
};

#endif /* __EXECSNOOP_H */

4
8-runqslower/.gitignore
vendored
Normal file
@@ -0,0 +1,4 @@
.vscode
package.json
eunomia-exporter
ecli
147
8-runqslower/README.md
Normal file
@@ -0,0 +1,147 @@
| layout | title | date | category | author | tags | summary |
| ------ | ---------- | ---------------- | -------- | -------- | --------------- | ----------------------------------------------- |
| post | runqslower | 2022-11-11-20:50 | bpftools | yunwei37 | bpftool syscall | runqslower: trace long process scheduling delays |

## origin

Adapted from:

https://github.com/iovisor/bcc/blob/master/libbpf-tools/runqslower.bpf.c

Result:

```
$ sudo ecli/build/bin/Release/ecli run examples/bpftools/runqslower/package.json

running and waiting for the ebpf events from perf event...
time     task             prev_task  delta_us pid   prev_pid
20:11:59 gnome-shell      swapper/0  32       2202  0
20:11:59 ecli             swapper/3  23       3437  0
20:11:59 rcu_sched        swapper/1  1        14    0
20:11:59 gnome-terminal-  swapper/1  13       2714  0
20:11:59 ecli             swapper/3  2        3437  0
20:11:59 kworker/3:3      swapper/3  3        215   0
20:11:59 containerd       swapper/1  8        1088  0
20:11:59 ecli             swapper/2  5        3437  0
20:11:59 HangDetector     swapper/3  6        854   0
20:11:59 ecli             swapper/2  60       3437  0
20:11:59 rcu_sched        swapper/1  26       14    0
20:11:59 kworker/0:1      swapper/0  26       3414  0
20:11:59 ecli             swapper/2  6        3437  0
```

## Compile and Run

Compile:

```
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
```

Run:

```
sudo ./ecli run examples/bpftools/runqslower/package.json
```

## details in bcc

Demonstrations of runqslower, the Linux eBPF/bcc version.

runqslower traces high scheduling delays between a task becoming ready to run and it actually running on CPU. Example output:

```
# ./runqslower
Tracing run queue latency higher than 10000 us
TIME     COMM               TID    LAT(us)
13:11:43 b'kworker/0:2'     8680   10250
13:12:18 b'irq/16-vmwgfx'   422    10838
13:12:18 b'systemd-oomd'    753    11012
13:12:18 b'containerd'      8272   11254
13:12:18 b'HangDetector'    764    12042
^C
```
This measures the time a task spends waiting on a run queue for a turn on-CPU, and shows this time as individual events. This time should be small, but a task may need to wait its turn due to CPU load.

This measures two types of run queue latency:
1. The time from a task being enqueued on a run queue to its context switch and execution. This traces ttwu_do_wakeup(), wake_up_new_task() -> finish_task_switch() with either raw tracepoints (if supported) or kprobes and instruments the run queue latency after a voluntary context switch.
2. The time from when a task was involuntarily context-switched while still in the runnable state to when it next executed. This is instrumented from finish_task_switch() alone.
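
The bookkeeping behind both cases can be illustrated with a small user-space model. This is only a sketch, not the tool's code: `on_enqueue`, `on_switch_in`, and the fixed-size `enqueue_ns` table are hypothetical names standing in for the BPF hash map and tracepoint handlers, and timestamps are passed in by hand instead of coming from the kernel clock.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TASKS 64 /* hypothetical table size for this sketch */

/* Enqueue timestamp (ns) per thread id; 0 means "nothing recorded". */
uint64_t enqueue_ns[MAX_TASKS];

/* A task became runnable: it was woken up, or it was switched out
 * involuntarily while still runnable. */
void on_enqueue(uint32_t tid, uint64_t now_ns)
{
	if (tid == 0 || tid >= MAX_TASKS)
		return; /* skip the idle task and ids outside the table */
	enqueue_ns[tid] = now_ns;
}

/* A task was switched in. Returns the run queue latency in microseconds,
 * or -1 if no enqueue was seen or the latency is not above min_us.
 * Simplification: the timestamp is cleared in every case. */
int64_t on_switch_in(uint32_t tid, uint64_t now_ns, uint64_t min_us)
{
	uint64_t start, delta_us;

	if (tid >= MAX_TASKS || enqueue_ns[tid] == 0)
		return -1; /* missed enqueue */
	start = enqueue_ns[tid];
	enqueue_ns[tid] = 0;
	delta_us = (now_ns - start) / 1000;
	if (min_us && delta_us <= min_us)
		return -1; /* below the reporting threshold */
	return (int64_t)delta_us;
}
```

For instance, a task enqueued at 1 ms and switched in at 13 ms yields a latency of 12000 us, which would be reported with the default 10000 us threshold; a task that waited only 500 us would be dropped.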

The overhead of this tool may become significant for some workloads: see the OVERHEAD section.

This works by tracing various kernel scheduler functions using dynamic tracing, and will need updating to match any changes to these functions.

Since this uses BPF, only the root user can use this tool.

```console
Usage: runqslower [-h] [-p PID | -t TID | -P] [min_us]
```

The min_us argument sets the minimum run queue latency, in microseconds, to report:

```
# ./runqslower 100
Tracing run queue latency higher than 100 us
TIME     COMM             TID    LAT(us)
20:48:26 b'gnome-shell'   3005   201
20:48:26 b'gnome-shell'   3005   202
20:48:26 b'gnome-shell'   3005   254
20:48:26 b'gnome-shell'   3005   208
20:48:26 b'gnome-shell'   3005   132
20:48:26 b'gnome-shell'   3005   213
20:48:26 b'gnome-shell'   3005   205
20:48:26 b'python3'       5224   127
20:48:26 b'gnome-shell'   3005   214
20:48:26 b'gnome-shell'   3005   126
20:48:26 b'gnome-shell'   3005   285
20:48:26 b'Xorg'          2869   296
20:48:26 b'gnome-shell'   3005   119
20:48:26 b'gnome-shell'   3005   206
```

The -p PID option traces only this PID:

```
# ./runqslower -p 3005
Tracing run queue latency higher than 10000 us
TIME     COMM             TID    LAT(us)
20:46:22 b'gnome-shell'   3005   16024
20:46:45 b'gnome-shell'   3005   11494
20:46:45 b'gnome-shell'   3005   21430
20:46:45 b'gnome-shell'   3005   14948
20:47:16 b'gnome-shell'   3005   10164
20:47:16 b'gnome-shell'   3005   18070
20:47:17 b'gnome-shell'   3005   13272
20:47:18 b'gnome-shell'   3005   10451
20:47:18 b'gnome-shell'   3005   15010
20:47:18 b'gnome-shell'   3005   19449
20:47:22 b'gnome-shell'   3005   19327
20:47:23 b'gnome-shell'   3005   13178
20:47:23 b'gnome-shell'   3005   13483
20:47:23 b'gnome-shell'   3005   15562
20:47:23 b'gnome-shell'   3005   13655
20:47:23 b'gnome-shell'   3005   19571
```

The -P option also shows the previous task's name and TID:

```
# ./runqslower -P
Tracing run queue latency higher than 10000 us
TIME     COMM          TID    LAT(us)  PREV COMM     PREV TID
20:42:48 b'sysbench'   5159   10562    b'sysbench'   5152
20:42:48 b'sysbench'   5159   10367    b'sysbench'   5152
20:42:49 b'sysbench'   5158   11818    b'sysbench'   5159
20:42:49 b'sysbench'   5160   16913    b'sysbench'   5153
20:42:49 b'sysbench'   5157   13742    b'sysbench'   5160
20:42:49 b'sysbench'   5152   13746    b'sysbench'   5160
20:42:49 b'sysbench'   5153   13731    b'sysbench'   5160
20:42:49 b'sysbench'   5158   14688    b'sysbench'   5161
20:42:50 b'sysbench'   5155   10468    b'sysbench'   5152
20:42:50 b'sysbench'   5156   17695    b'sysbench'   5158
20:42:50 b'sysbench'   5155   11251    b'sysbench'   5152
20:42:50 b'sysbench'   5154   13283    b'sysbench'   5152
20:42:50 b'sysbench'   5158   22278    b'sysbench'   5157
```

For more details, see docs/special_filtering.md
112
8-runqslower/core_fixes.h
Normal file
@@ -0,0 +1,112 @@
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
/* Copyright (c) 2021 Hengqi Chen */

#ifndef __CORE_FIXES_BPF_H
#define __CORE_FIXES_BPF_H

#include <vmlinux.h>
#include <bpf/bpf_core_read.h>

/**
 * commit 2f064a59a1 ("sched: Change task_struct::state") changes
 * the name of task_struct::state to task_struct::__state
 * see:
 *     https://github.com/torvalds/linux/commit/2f064a59a1
 */
struct task_struct___o {
	volatile long int state;
} __attribute__((preserve_access_index));

struct task_struct___x {
	unsigned int __state;
} __attribute__((preserve_access_index));

static __always_inline __s64 get_task_state(void *task)
{
	struct task_struct___x *t = task;

	if (bpf_core_field_exists(t->__state))
		return BPF_CORE_READ(t, __state);
	return BPF_CORE_READ((struct task_struct___o *)task, state);
}

/**
 * commit 309dca309fc3 ("block: store a block_device pointer in struct bio")
 * adds a new member bi_bdev which is a pointer to struct block_device
 * see:
 *     https://github.com/torvalds/linux/commit/309dca309fc3
 */
struct bio___o {
	struct gendisk *bi_disk;
} __attribute__((preserve_access_index));

struct bio___x {
	struct block_device *bi_bdev;
} __attribute__((preserve_access_index));

static __always_inline struct gendisk *get_gendisk(void *bio)
{
	struct bio___x *b = bio;

	if (bpf_core_field_exists(b->bi_bdev))
		return BPF_CORE_READ(b, bi_bdev, bd_disk);
	return BPF_CORE_READ((struct bio___o *)bio, bi_disk);
}

/**
 * commit d5869fdc189f ("block: introduce block_rq_error tracepoint")
 * adds a new tracepoint block_rq_error and it shares the same arguments
 * with tracepoint block_rq_complete. As a result, the kernel BTF now has
 * a `struct trace_event_raw_block_rq_completion` instead of
 * `struct trace_event_raw_block_rq_complete`.
 * see:
 *     https://github.com/torvalds/linux/commit/d5869fdc189f
 */
struct trace_event_raw_block_rq_complete___x {
	dev_t dev;
	sector_t sector;
	unsigned int nr_sector;
} __attribute__((preserve_access_index));

struct trace_event_raw_block_rq_completion___x {
	dev_t dev;
	sector_t sector;
	unsigned int nr_sector;
} __attribute__((preserve_access_index));

static __always_inline bool has_block_rq_completion()
{
	if (bpf_core_type_exists(struct trace_event_raw_block_rq_completion___x))
		return true;
	return false;
}

/**
 * commit d152c682f03c ("block: add an explicit ->disk backpointer to the
 * request_queue") and commit f3fa33acca9f ("block: remove the ->rq_disk
 * field in struct request") make some changes to `struct request` and
 * `struct request_queue`. Now, to get the `struct gendisk *` field in a CO-RE
 * way, we need both `struct request` and `struct request_queue`.
 * see:
 *     https://github.com/torvalds/linux/commit/d152c682f03c
 *     https://github.com/torvalds/linux/commit/f3fa33acca9f
 */
struct request_queue___x {
	struct gendisk *disk;
} __attribute__((preserve_access_index));

struct request___x {
	struct request_queue___x *q;
	struct gendisk *rq_disk;
} __attribute__((preserve_access_index));

static __always_inline struct gendisk *get_disk(void *request)
{
	struct request___x *r = request;

	if (bpf_core_field_exists(r->rq_disk))
		return BPF_CORE_READ(r, rq_disk);
	return BPF_CORE_READ(r, q, disk);
}

#endif /* __CORE_FIXES_BPF_H */
117
8-runqslower/runqslower.bpf.c
Normal file
@@ -0,0 +1,117 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2019 Facebook
#include <vmlinux.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "runqslower.bpf.h"
#include "core_fixes.h"

#define TASK_RUNNING 0

const volatile __u64 min_us = 0;
const volatile pid_t targ_pid = 0;
const volatile pid_t targ_tgid = 0;

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, u32);
	__type(value, u64);
} start SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(u32));
	__uint(value_size, sizeof(u32));
} events SEC(".maps");

/* record enqueue timestamp */
static int trace_enqueue(u32 tgid, u32 pid)
{
	u64 ts;

	if (!pid)
		return 0;
	if (targ_tgid && targ_tgid != tgid)
		return 0;
	if (targ_pid && targ_pid != pid)
		return 0;

	ts = bpf_ktime_get_ns();
	bpf_map_update_elem(&start, &pid, &ts, 0);
	return 0;
}

static int handle_switch(void *ctx, struct task_struct *prev, struct task_struct *next)
{
	struct event event = {};
	u64 *tsp, delta_us;
	u32 pid;

	/* ivcsw: treat like an enqueue event and store timestamp */
	if (get_task_state(prev) == TASK_RUNNING)
		trace_enqueue(BPF_CORE_READ(prev, tgid), BPF_CORE_READ(prev, pid));

	pid = BPF_CORE_READ(next, pid);

	/* fetch timestamp and calculate delta */
	tsp = bpf_map_lookup_elem(&start, &pid);
	if (!tsp)
		return 0;   /* missed enqueue */

	delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
	if (min_us && delta_us <= min_us)
		return 0;

	event.pid = pid;
	event.prev_pid = BPF_CORE_READ(prev, pid);
	event.delta_us = delta_us;
	bpf_probe_read_kernel_str(&event.task, sizeof(event.task), next->comm);
	bpf_probe_read_kernel_str(&event.prev_task, sizeof(event.prev_task), prev->comm);

	/* output */
	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
			      &event, sizeof(event));

	bpf_map_delete_elem(&start, &pid);
	return 0;
}

SEC("tp_btf/sched_wakeup")
int BPF_PROG(sched_wakeup, struct task_struct *p)
{
	return trace_enqueue(p->tgid, p->pid);
}

SEC("tp_btf/sched_wakeup_new")
int BPF_PROG(sched_wakeup_new, struct task_struct *p)
{
	return trace_enqueue(p->tgid, p->pid);
}

SEC("tp_btf/sched_switch")
int BPF_PROG(sched_switch, bool preempt, struct task_struct *prev, struct task_struct *next)
{
	return handle_switch(ctx, prev, next);
}

SEC("raw_tp/sched_wakeup")
int BPF_PROG(handle_sched_wakeup, struct task_struct *p)
{
	return trace_enqueue(BPF_CORE_READ(p, tgid), BPF_CORE_READ(p, pid));
}

SEC("raw_tp/sched_wakeup_new")
int BPF_PROG(handle_sched_wakeup_new, struct task_struct *p)
{
	return trace_enqueue(BPF_CORE_READ(p, tgid), BPF_CORE_READ(p, pid));
}

SEC("raw_tp/sched_switch")
int BPF_PROG(handle_sched_switch, bool preempt, struct task_struct *prev, struct task_struct *next)
{
	return handle_switch(ctx, prev, next);
}

char LICENSE[] SEC("license") = "GPL";
15
8-runqslower/runqslower.bpf.h
Normal file
@@ -0,0 +1,15 @@
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
#ifndef __RUNQSLOWER_H
#define __RUNQSLOWER_H

#define TASK_COMM_LEN 16

struct event {
	char task[TASK_COMM_LEN];
	char prev_task[TASK_COMM_LEN];
	__u64 delta_us;
	int pid;
	int prev_pid;
};

#endif /* __RUNQSLOWER_H */
6
9-runqlat/.gitignore
vendored
Normal file
@@ -0,0 +1,6 @@
.vscode
package.json
*.o
*.skel.json
*.skel.yaml
package.yaml
675
9-runqlat/README.md
Executable file
@@ -0,0 +1,675 @@
---
layout: post
title: runqlat
date: 2022-10-10 16:18
category: bpftools
author: yunwei37
tags: [bpftools, syscall, tracepoint]
summary: Summarize run queue (scheduler) latency as a histogram.
---


## origin

Adapted from:

<https://github.com/iovisor/bcc/blob/master/libbpf-tools/runqlat.bpf.c>

This program summarizes scheduler run queue latency as a histogram, showing
how long tasks spent waiting their turn to run on-CPU.
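
The histogram rows in the output of this tool are power-of-two ranges (`0 -> 1`, `2 -> 3`, `4 -> 7`, and so on). A minimal user-space sketch of that bucketing is shown here, assuming a plain array of counters; `log2_slot`, `slot_low`, `slot_high`, `record`, and `MAX_SLOTS` are illustrative names chosen for this example, not identifiers taken from the tool's source.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLOTS 26 /* hypothetical slot count for this sketch */

unsigned int hist[MAX_SLOTS];

/* Slot index for a latency value: floor(log2(v)), with 0 and 1 sharing slot 0. */
unsigned int log2_slot(uint64_t v)
{
	unsigned int slot = 0;

	while ((v >>= 1) != 0)
		slot++;
	return slot;
}

/* Lowest value that falls into a slot (left column of a histogram row). */
uint64_t slot_low(unsigned int slot)
{
	return slot == 0 ? 0 : (uint64_t)1 << slot;
}

/* Highest value that falls into a slot (right column of a histogram row). */
uint64_t slot_high(unsigned int slot)
{
	return ((uint64_t)1 << (slot + 1)) - 1;
}

/* Count one latency sample, clamping outliers into the last slot. */
void record(uint64_t latency)
{
	unsigned int slot = log2_slot(latency);

	if (slot >= MAX_SLOTS)
		slot = MAX_SLOTS - 1;
	hist[slot]++;
}
```

With this scheme a 20000 us sample lands in slot 14, which covers 16384 to 32767 us. Storing only one counter per slot keeps the per-event cost small, at the price of losing the exact values.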

## Compile and Run

Compile:

```shell
docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
```

```console
$ ecc runqlat.bpf.c runqlat.h
Compiling bpf object...
Generating export types...
Packing ebpf object and config into package.json...
```

Run:

```console
$ sudo ecli examples/bpftools/runqlat/package.json -h
Usage: runqlat_bpf [--help] [--version] [--verbose] [--filter_cg] [--targ_per_process] [--targ_per_thread] [--targ_per_pidns] [--targ_ms] [--targ_tgid VAR]

A simple eBPF program

Optional arguments:
  -h, --help          shows help message and exits
  -v, --version       prints version information and exits
  --verbose           prints libbpf debug information
  --filter_cg         set value of bool variable filter_cg
  --targ_per_process  set value of bool variable targ_per_process
  --targ_per_thread   set value of bool variable targ_per_thread
  --targ_per_pidns    set value of bool variable targ_per_pidns
  --targ_ms           set value of bool variable targ_ms
  --targ_tgid         set value of pid_t variable targ_tgid

Built with eunomia-bpf framework.
See https://github.com/eunomia-bpf/eunomia-bpf for more information.

$ sudo ecli examples/bpftools/runqlat/package.json
key = 4294967295
comm = rcu_preempt

(unit) : count distribution
0 -> 1 : 9 |**** |
2 -> 3 : 6 |** |
4 -> 7 : 12 |***** |
8 -> 15 : 28 |************* |
16 -> 31 : 40 |******************* |
32 -> 63 : 83 |****************************************|
64 -> 127 : 57 |*************************** |
128 -> 255 : 19 |********* |
256 -> 511 : 11 |***** |
512 -> 1023 : 2 | |
1024 -> 2047 : 2 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 0 | |
8192 -> 16383 : 0 | |
16384 -> 32767 : 1 | |

$ sudo ecli examples/bpftools/runqlat/package.json --targ_per_process
key = 3189
comm = cpptools

(unit) : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 1 |*** |
16 -> 31 : 2 |******* |
32 -> 63 : 11 |****************************************|
64 -> 127 : 8 |***************************** |
128 -> 255 : 3 |********** |
```

## details in bcc

```text
Demonstrations of runqlat, the Linux eBPF/bcc version.


This program summarizes scheduler run queue latency as a histogram, showing
how long tasks spent waiting their turn to run on-CPU.

Here is a heavily loaded system:

# ./runqlat
Tracing run queue latency... Hit Ctrl-C to end.
^C
usecs : count distribution
0 -> 1 : 233 |*********** |
2 -> 3 : 742 |************************************ |
4 -> 7 : 203 |********** |
8 -> 15 : 173 |******** |
16 -> 31 : 24 |* |
32 -> 63 : 0 | |
64 -> 127 : 30 |* |
128 -> 255 : 6 | |
256 -> 511 : 3 | |
512 -> 1023 : 5 | |
1024 -> 2047 : 27 |* |
2048 -> 4095 : 30 |* |
4096 -> 8191 : 20 | |
8192 -> 16383 : 29 |* |
16384 -> 32767 : 809 |****************************************|
32768 -> 65535 : 64 |*** |

The distribution is bimodal, with one mode between 0 and 15 microseconds,
and another between 16 and 65 milliseconds. These modes are visible as the
spikes in the ASCII distribution (which is merely a visual representation
of the "count" column). As an example of reading one line: 809 events fell
into the 16384 to 32767 microsecond range (16 to 32 ms) while tracing.

I would expect the two modes to be due to the workload: 16 hot CPU-bound threads,
and many other mostly idle threads doing occasional work. I suspect the mostly
idle threads will run with a higher priority when they wake up, and are
the reason for the low latency mode. The high latency mode will be the
CPU-bound threads. More analysis with this and other tools can confirm.


A -m option can be used to show milliseconds instead, as well as an interval
and a count. For example, showing three five-second summaries in milliseconds:

# ./runqlat -m 5 3
Tracing run queue latency... Hit Ctrl-C to end.

msecs : count distribution
0 -> 1 : 3818 |****************************************|
2 -> 3 : 39 | |
4 -> 7 : 39 | |
8 -> 15 : 62 | |
16 -> 31 : 2214 |*********************** |
32 -> 63 : 226 |** |

msecs : count distribution
0 -> 1 : 3775 |****************************************|
2 -> 3 : 52 | |
4 -> 7 : 37 | |
8 -> 15 : 65 | |
16 -> 31 : 2230 |*********************** |
32 -> 63 : 212 |** |

msecs : count distribution
0 -> 1 : 3816 |****************************************|
2 -> 3 : 49 | |
4 -> 7 : 40 | |
8 -> 15 : 53 | |
16 -> 31 : 2228 |*********************** |
32 -> 63 : 221 |** |

This shows a similar distribution across the three summaries.


A -p option can be used to show one PID only, which is filtered in kernel for
efficiency. For example, PID 4505, and one-second summaries:

# ./runqlat -mp 4505 1
Tracing run queue latency... Hit Ctrl-C to end.

msecs : count distribution
0 -> 1 : 1 |* |
2 -> 3 : 2 |*** |
4 -> 7 : 1 |* |
8 -> 15 : 0 | |
16 -> 31 : 25 |****************************************|
32 -> 63 : 3 |**** |

msecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 2 |** |
4 -> 7 : 0 | |
8 -> 15 : 1 |* |
16 -> 31 : 30 |****************************************|
32 -> 63 : 1 |* |

msecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 28 |****************************************|
32 -> 63 : 2 |** |

msecs : count distribution
0 -> 1 : 1 |* |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 27 |****************************************|
32 -> 63 : 4 |***** |
[...]

For comparison, here is pidstat(1) for that process:

# pidstat -p 4505 1
Linux 4.4.0-virtual (bgregg-xxxxxxxx) 02/08/2016 _x86_64_ (8 CPU)

08:56:11 AM   UID   PID    %usr  %system  %guest   %CPU  CPU  Command
08:56:12 AM     0  4505    9.00     3.00    0.00  12.00    0  bash
08:56:13 AM     0  4505    7.00     5.00    0.00  12.00    0  bash
08:56:14 AM     0  4505   10.00     2.00    0.00  12.00    0  bash
08:56:15 AM     0  4505   11.00     2.00    0.00  13.00    0  bash
08:56:16 AM     0  4505    9.00     3.00    0.00  12.00    0  bash
[...]

This is a synthetic workload that is CPU bound. It's only spending 12% on-CPU
each second because of high CPU demand on this server: the remaining time
is spent waiting on a run queue, as visualized by runqlat.


Here is the same system, but when it is CPU idle:

# ./runqlat 5 1
Tracing run queue latency... Hit Ctrl-C to end.

usecs : count distribution
0 -> 1 : 2250 |******************************** |
2 -> 3 : 2340 |********************************** |
4 -> 7 : 2746 |****************************************|
8 -> 15 : 418 |****** |
16 -> 31 : 93 |* |
32 -> 63 : 28 | |
64 -> 127 : 119 |* |
128 -> 255 : 9 | |
256 -> 511 : 4 | |
512 -> 1023 : 20 | |
1024 -> 2047 : 22 | |
2048 -> 4095 : 5 | |
4096 -> 8191 : 2 | |

Back to a microsecond scale, this time there is little run queue latency past 1
millisecond, as would be expected.


Now 16 threads are performing heavy disk I/O:

# ./runqlat 5 1
Tracing run queue latency... Hit Ctrl-C to end.

usecs : count distribution
0 -> 1 : 204 | |
2 -> 3 : 944 |* |
4 -> 7 : 16315 |********************* |
8 -> 15 : 29897 |****************************************|
16 -> 31 : 1044 |* |
32 -> 63 : 23 | |
64 -> 127 : 128 | |
128 -> 255 : 24 | |
256 -> 511 : 5 | |
512 -> 1023 : 13 | |
1024 -> 2047 : 15 | |
2048 -> 4095 : 13 | |
4096 -> 8191 : 10 | |

The distribution hasn't changed too much. While the disks are 100% busy, there
is still plenty of CPU headroom, and threads still don't spend much time
waiting their turn.


A -P option will print a distribution for each PID:

# ./runqlat -P
Tracing run queue latency... Hit Ctrl-C to end.
^C

pid = 0
usecs : count distribution
0 -> 1 : 351 |******************************** |
2 -> 3 : 96 |******** |
4 -> 7 : 437 |****************************************|
8 -> 15 : 12 |* |
16 -> 31 : 10 | |
32 -> 63 : 0 | |
64 -> 127 : 16 |* |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 0 | |
8192 -> 16383 : 1 | |

pid = 12929
usecs : count distribution
0 -> 1 : 1 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 1 |****************************************|

pid = 12930
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 1 |****************************************|
32 -> 63 : 0 | |
64 -> 127 : 1 |****************************************|

pid = 12931
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 1 |******************** |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 2 |****************************************|

pid = 12932
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 1 |****************************************|
256 -> 511 : 0 | |
512 -> 1023 : 1 |****************************************|

pid = 7
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 426 |************************************* |
4 -> 7 : 457 |****************************************|
8 -> 15 : 16 |* |

pid = 9
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 425 |****************************************|
8 -> 15 : 16 |* |

pid = 11
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 10 |****************************************|

pid = 14
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 8 |****************************************|
4 -> 7 : 2 |********** |

pid = 18
usecs : count distribution
0 -> 1 : 414 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 20 |* |
8 -> 15 : 8 | |

pid = 12928
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 1 |****************************************|
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 1 |****************************************|

pid = 1867
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 15 |****************************************|
16 -> 31 : 1 |** |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 4 |********** |

pid = 1871
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 2 |****************************************|
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 1 |******************** |

pid = 1876
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 1 |****************************************|
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 1 |****************************************|

pid = 1878
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 3 |****************************************|

pid = 1880
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 3 |****************************************|

pid = 9307
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 1 |****************************************|

pid = 1886
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 1 |******************** |
8 -> 15 : 2 |****************************************|

pid = 1888
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 3 |****************************************|

pid = 3297
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 1 |****************************************|

pid = 1892
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 1 |******************** |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 2 |****************************************|

pid = 7024
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 4 |****************************************|

pid = 16468
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 3 |****************************************|

pid = 12922
usecs : count distribution
0 -> 1 : 1 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 1 |****************************************|
16 -> 31 : 1 |****************************************|
32 -> 63 : 0 | |
64 -> 127 : 1 |****************************************|

pid = 12923
usecs : count distribution
|
||||
0 -> 1 : 0 | |
|
||||
2 -> 3 : 0 | |
|
||||
4 -> 7 : 1 |******************** |
|
||||
8 -> 15 : 0 | |
|
||||
16 -> 31 : 0 | |
|
||||
32 -> 63 : 0 | |
|
||||
64 -> 127 : 2 |****************************************|
|
||||
128 -> 255 : 0 | |
|
||||
256 -> 511 : 0 | |
|
||||
512 -> 1023 : 1 |******************** |
|
||||
1024 -> 2047 : 1 |******************** |
|
||||
|
||||
pid = 12924
|
||||
usecs : count distribution
|
||||
0 -> 1 : 0 | |
|
||||
2 -> 3 : 0 | |
|
||||
4 -> 7 : 2 |******************** |
|
||||
8 -> 15 : 4 |****************************************|
|
||||
16 -> 31 : 1 |********** |
|
||||
32 -> 63 : 0 | |
|
||||
64 -> 127 : 0 | |
|
||||
128 -> 255 : 0 | |
|
||||
256 -> 511 : 0 | |
|
||||
512 -> 1023 : 0 | |
|
||||
1024 -> 2047 : 1 |********** |
|
||||
|
||||
pid = 12925
|
||||
usecs : count distribution
|
||||
0 -> 1 : 0 | |
|
||||
2 -> 3 : 0 | |
|
||||
4 -> 7 : 0 | |
|
||||
8 -> 15 : 0 | |
|
||||
16 -> 31 : 0 | |
|
||||
32 -> 63 : 0 | |
|
||||
64 -> 127 : 1 |****************************************|
|
||||
|
||||
pid = 12926
|
||||
usecs : count distribution
|
||||
0 -> 1 : 0 | |
|
||||
2 -> 3 : 1 |****************************************|
|
||||
4 -> 7 : 0 | |
|
||||
8 -> 15 : 1 |****************************************|
|
||||
16 -> 31 : 0 | |
|
||||
32 -> 63 : 0 | |
|
||||
64 -> 127 : 0 | |
|
||||
128 -> 255 : 0 | |
|
||||
256 -> 511 : 0 | |
|
||||
512 -> 1023 : 1 |****************************************|
|
||||
|
||||
pid = 12927
|
||||
usecs : count distribution
|
||||
0 -> 1 : 1 |****************************************|
|
||||
2 -> 3 : 0 | |
|
||||
4 -> 7 : 1 |****************************************|
|
||||
|
||||
|
||||
A -L option will print a distribution for each TID:
|
||||
|
||||
# ./runqlat -L
|
||||
Tracing run queue latency... Hit Ctrl-C to end.
|
||||
^C
|
||||
|
||||
tid = 0
|
||||
usecs : count distribution
|
||||
0 -> 1 : 593 |**************************** |
|
||||
2 -> 3 : 829 |****************************************|
|
||||
4 -> 7 : 300 |************** |
|
||||
8 -> 15 : 321 |*************** |
|
||||
16 -> 31 : 132 |****** |
|
||||
32 -> 63 : 58 |** |
|
||||
64 -> 127 : 0 | |
|
||||
128 -> 255 : 0 | |
|
||||
256 -> 511 : 13 | |
|
||||
|
||||
tid = 7
|
||||
usecs : count distribution
|
||||
0 -> 1 : 8 |******** |
|
||||
2 -> 3 : 19 |******************** |
|
||||
4 -> 7 : 37 |****************************************|
|
||||
[...]
|
||||
|
||||
|
||||
And a --pidnss option (short for PID namespaces) will print for each PID
|
||||
namespace, for analyzing container performance:
|
||||
|
||||
# ./runqlat --pidnss -m
|
||||
Tracing run queue latency... Hit Ctrl-C to end.
|
||||
^C
|
||||
|
||||
pidns = 4026532870
|
||||
msecs : count distribution
|
||||
0 -> 1 : 40 |****************************************|
|
||||
2 -> 3 : 1 |* |
|
||||
4 -> 7 : 0 | |
|
||||
8 -> 15 : 0 | |
|
||||
16 -> 31 : 0 | |
|
||||
32 -> 63 : 2 |** |
|
||||
64 -> 127 : 5 |***** |
|
||||
|
||||
pidns = 4026532809
|
||||
msecs : count distribution
|
||||
0 -> 1 : 67 |****************************************|
|
||||
|
||||
pidns = 4026532748
|
||||
msecs : count distribution
|
||||
0 -> 1 : 63 |****************************************|
|
||||
|
||||
pidns = 4026532687
|
||||
msecs : count distribution
|
||||
0 -> 1 : 7 |****************************************|
|
||||
|
||||
pidns = 4026532626
|
||||
msecs : count distribution
|
||||
0 -> 1 : 45 |****************************************|
|
||||
2 -> 3 : 0 | |
|
||||
4 -> 7 : 0 | |
|
||||
8 -> 15 : 0 | |
|
||||
16 -> 31 : 0 | |
|
||||
32 -> 63 : 0 | |
|
||||
64 -> 127 : 3 |** |
|
||||
|
||||
pidns = 4026531836
|
||||
msecs : count distribution
|
||||
0 -> 1 : 314 |****************************************|
|
||||
2 -> 3 : 1 | |
|
||||
4 -> 7 : 11 |* |
|
||||
8 -> 15 : 28 |*** |
|
||||
16 -> 31 : 137 |***************** |
|
||||
32 -> 63 : 86 |********** |
|
||||
64 -> 127 : 1 | |
|
||||
|
||||
pidns = 4026532382
|
||||
msecs : count distribution
|
||||
0 -> 1 : 285 |****************************************|
|
||||
2 -> 3 : 5 | |
|
||||
4 -> 7 : 16 |** |
|
||||
8 -> 15 : 9 |* |
|
||||
16 -> 31 : 69 |********* |
|
||||
32 -> 63 : 25 |*** |
|
||||
|
||||
Many of these distributions have two modes: the second, in this case, is
|
||||
caused by capping CPU usage via CPU shares.
|
||||
|
||||
|
||||
USAGE message:
|
||||
|
||||
# ./runqlat -h
|
||||
usage: runqlat.py [-h] [-T] [-m] [-P] [--pidnss] [-L] [-p PID]
|
||||
[interval] [count]
|
||||
|
||||
Summarize run queue (scheduler) latency as a histogram
|
||||
|
||||
positional arguments:
|
||||
interval output interval, in seconds
|
||||
count number of outputs
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
-T, --timestamp include timestamp on output
|
||||
-m, --milliseconds millisecond histogram
|
||||
-P, --pids print a histogram per process ID
|
||||
--pidnss print a histogram per PID namespace
|
||||
-L, --tids print a histogram per thread ID
|
||||
-p PID, --pid PID trace this PID only
|
||||
|
||||
examples:
|
||||
./runqlat # summarize run queue latency as a histogram
|
||||
./runqlat 1 10 # print 1 second summaries, 10 times
|
||||
./runqlat -mT 1 # 1s summaries, milliseconds, and timestamps
|
||||
./runqlat -P # show each PID separately
|
||||
./runqlat -p 185 # trace PID 185 only
|
||||
|
||||
```
|
||||
31
9-runqlat/bits.bpf.h
Normal file
@@ -0,0 +1,31 @@
|
||||
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
|
||||
#ifndef __BITS_BPF_H
|
||||
#define __BITS_BPF_H
|
||||
|
||||
#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
|
||||
#define WRITE_ONCE(x, val) ((*(volatile typeof(x) *)&(x)) = val)
|
||||
|
||||
static __always_inline u64 log2(u32 v)
|
||||
{
|
||||
u32 shift, r;
|
||||
|
||||
r = (v > 0xFFFF) << 4; v >>= r;
|
||||
shift = (v > 0xFF) << 3; v >>= shift; r |= shift;
|
||||
shift = (v > 0xF) << 2; v >>= shift; r |= shift;
|
||||
shift = (v > 0x3) << 1; v >>= shift; r |= shift;
|
||||
r |= (v >> 1);
|
||||
|
||||
return r;
|
||||
}
|
||||
|
||||
static __always_inline u64 log2l(u64 v)
|
||||
{
|
||||
u32 hi = v >> 32;
|
||||
|
||||
if (hi)
|
||||
return log2(hi) + 32;
|
||||
else
|
||||
return log2(v);
|
||||
}
|
||||
|
||||
#endif /* __BITS_BPF_H */
|
||||
112
9-runqlat/core_fixes.bpf.h
Normal file
@@ -0,0 +1,112 @@
|
||||
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
|
||||
/* Copyright (c) 2021 Hengqi Chen */
|
||||
|
||||
#ifndef __CORE_FIXES_BPF_H
|
||||
#define __CORE_FIXES_BPF_H
|
||||
|
||||
#include <vmlinux.h>
|
||||
#include <bpf/bpf_core_read.h>
|
||||
|
||||
/**
|
||||
* commit 2f064a59a1 ("sched: Change task_struct::state") changes
|
||||
* the name of task_struct::state to task_struct::__state
|
||||
* see:
|
||||
* https://github.com/torvalds/linux/commit/2f064a59a1
|
||||
*/
|
||||
struct task_struct___o {
|
||||
volatile long int state;
|
||||
} __attribute__((preserve_access_index));
|
||||
|
||||
struct task_struct___x {
|
||||
unsigned int __state;
|
||||
} __attribute__((preserve_access_index));
|
||||
|
||||
static __always_inline __s64 get_task_state(void *task)
|
||||
{
|
||||
struct task_struct___x *t = task;
|
||||
|
||||
if (bpf_core_field_exists(t->__state))
|
||||
return BPF_CORE_READ(t, __state);
|
||||
return BPF_CORE_READ((struct task_struct___o *)task, state);
|
||||
}
|
||||
|
||||
/**
|
||||
* commit 309dca309fc3 ("block: store a block_device pointer in struct bio")
|
||||
* adds a new member bi_bdev which is a pointer to struct block_device
|
||||
* see:
|
||||
* https://github.com/torvalds/linux/commit/309dca309fc3
|
||||
*/
|
||||
struct bio___o {
|
||||
struct gendisk *bi_disk;
|
||||
} __attribute__((preserve_access_index));
|
||||
|
||||
struct bio___x {
|
||||
struct block_device *bi_bdev;
|
||||
} __attribute__((preserve_access_index));
|
||||
|
||||
static __always_inline struct gendisk *get_gendisk(void *bio)
|
||||
{
|
||||
struct bio___x *b = bio;
|
||||
|
||||
if (bpf_core_field_exists(b->bi_bdev))
|
||||
return BPF_CORE_READ(b, bi_bdev, bd_disk);
|
||||
return BPF_CORE_READ((struct bio___o *)bio, bi_disk);
|
||||
}
|
||||
|
||||
/**
|
||||
* commit d5869fdc189f ("block: introduce block_rq_error tracepoint")
|
||||
* adds a new tracepoint block_rq_error and it shares the same arguments
|
||||
* with tracepoint block_rq_complete. As a result, the kernel BTF now has
|
||||
* a `struct trace_event_raw_block_rq_completion` instead of
|
||||
* `struct trace_event_raw_block_rq_complete`.
|
||||
* see:
|
||||
* https://github.com/torvalds/linux/commit/d5869fdc189f
|
||||
*/
|
||||
struct trace_event_raw_block_rq_complete___x {
|
||||
dev_t dev;
|
||||
sector_t sector;
|
||||
unsigned int nr_sector;
|
||||
} __attribute__((preserve_access_index));
|
||||
|
||||
struct trace_event_raw_block_rq_completion___x {
|
||||
dev_t dev;
|
||||
sector_t sector;
|
||||
unsigned int nr_sector;
|
||||
} __attribute__((preserve_access_index));
|
||||
|
||||
static __always_inline bool has_block_rq_completion()
|
||||
{
|
||||
if (bpf_core_type_exists(struct trace_event_raw_block_rq_completion___x))
|
||||
return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
/**
|
||||
* commit d152c682f03c ("block: add an explicit ->disk backpointer to the
|
||||
* request_queue") and commit f3fa33acca9f ("block: remove the ->rq_disk
|
||||
* field in struct request") make some changes to `struct request` and
|
||||
* `struct request_queue`. Now, to get the `struct gendisk *` field in a CO-RE
|
||||
* way, we need both `struct request` and `struct request_queue`.
|
||||
* see:
|
||||
* https://github.com/torvalds/linux/commit/d152c682f03c
|
||||
* https://github.com/torvalds/linux/commit/f3fa33acca9f
|
||||
*/
|
||||
struct request_queue___x {
|
||||
struct gendisk *disk;
|
||||
} __attribute__((preserve_access_index));
|
||||
|
||||
struct request___x {
|
||||
struct request_queue___x *q;
|
||||
struct gendisk *rq_disk;
|
||||
} __attribute__((preserve_access_index));
|
||||
|
||||
static __always_inline struct gendisk *get_disk(void *request)
|
||||
{
|
||||
struct request___x *r = request;
|
||||
|
||||
if (bpf_core_field_exists(r->rq_disk))
|
||||
return BPF_CORE_READ(r, rq_disk);
|
||||
return BPF_CORE_READ(r, q, disk);
|
||||
}
|
||||
|
||||
#endif /* __CORE_FIXES_BPF_H */
|
||||
26
9-runqlat/maps.bpf.h
Normal file
@@ -0,0 +1,26 @@
|
||||
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
|
||||
// Copyright (c) 2020 Anton Protopopov
|
||||
#ifndef __MAPS_BPF_H
|
||||
#define __MAPS_BPF_H
|
||||
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <asm-generic/errno.h>
|
||||
|
||||
static __always_inline void *
|
||||
bpf_map_lookup_or_try_init(void *map, const void *key, const void *init)
|
||||
{
|
||||
void *val;
|
||||
long err;
|
||||
|
||||
val = bpf_map_lookup_elem(map, key);
|
||||
if (val)
|
||||
return val;
|
||||
|
||||
err = bpf_map_update_elem(map, key, init, BPF_NOEXIST);
|
||||
if (err && err != -EEXIST)
|
||||
return 0;
|
||||
|
||||
return bpf_map_lookup_elem(map, key);
|
||||
}
|
||||
|
||||
#endif /* __MAPS_BPF_H */
|
||||
152
9-runqlat/runqlat.bpf.c
Normal file
@@ -0,0 +1,152 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
// Copyright (c) 2020 Wenbo Zhang
|
||||
#include <vmlinux.h>
|
||||
#include <bpf/bpf_helpers.h>
|
||||
#include <bpf/bpf_core_read.h>
|
||||
#include <bpf/bpf_tracing.h>
|
||||
#include "runqlat.h"
|
||||
#include "bits.bpf.h"
|
||||
#include "maps.bpf.h"
|
||||
#include "core_fixes.bpf.h"
|
||||
|
||||
#define MAX_ENTRIES 10240
|
||||
#define TASK_RUNNING 0
|
||||
|
||||
const volatile bool filter_cg = false;
|
||||
const volatile bool targ_per_process = false;
|
||||
const volatile bool targ_per_thread = false;
|
||||
const volatile bool targ_per_pidns = false;
|
||||
const volatile bool targ_ms = false;
|
||||
const volatile pid_t targ_tgid = 0;
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
|
||||
__type(key, u32);
|
||||
__type(value, u32);
|
||||
__uint(max_entries, 1);
|
||||
} cgroup_map SEC(".maps");
|
||||
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, MAX_ENTRIES);
|
||||
__type(key, u32);
|
||||
__type(value, u64);
|
||||
} start SEC(".maps");
|
||||
|
||||
static struct hist zero;
|
||||
|
||||
/// @sample {"interval": 1000, "type" : "log2_hist"}
|
||||
struct {
|
||||
__uint(type, BPF_MAP_TYPE_HASH);
|
||||
__uint(max_entries, MAX_ENTRIES);
|
||||
__type(key, u32);
|
||||
__type(value, struct hist);
|
||||
} hists SEC(".maps");
|
||||
|
||||
static int trace_enqueue(u32 tgid, u32 pid)
|
||||
{
|
||||
u64 ts;
|
||||
|
||||
if (!pid)
|
||||
return 0;
|
||||
if (targ_tgid && targ_tgid != tgid)
|
||||
return 0;
|
||||
|
||||
ts = bpf_ktime_get_ns();
|
||||
bpf_map_update_elem(&start, &pid, &ts, BPF_ANY);
|
||||
return 0;
|
||||
}
|
||||
|
||||
static unsigned int pid_namespace(struct task_struct *task)
|
||||
{
|
||||
struct pid *pid;
|
||||
unsigned int level;
|
||||
struct upid upid;
|
||||
unsigned int inum;
|
||||
|
||||
/* get the pid namespace by following task_active_pid_ns(),
|
||||
* pid->numbers[pid->level].ns
|
||||
*/
|
||||
pid = BPF_CORE_READ(task, thread_pid);
|
||||
level = BPF_CORE_READ(pid, level);
|
||||
bpf_core_read(&upid, sizeof(upid), &pid->numbers[level]);
|
||||
inum = BPF_CORE_READ(upid.ns, ns.inum);
|
||||
|
||||
return inum;
|
||||
}
|
||||
|
||||
static int handle_switch(bool preempt, struct task_struct *prev, struct task_struct *next)
|
||||
{
|
||||
struct hist *histp;
|
||||
u64 *tsp, slot;
|
||||
u32 pid, hkey;
|
||||
s64 delta;
|
||||
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
if (get_task_state(prev) == TASK_RUNNING)
|
||||
trace_enqueue(BPF_CORE_READ(prev, tgid), BPF_CORE_READ(prev, pid));
|
||||
|
||||
pid = BPF_CORE_READ(next, pid);
|
||||
|
||||
tsp = bpf_map_lookup_elem(&start, &pid);
|
||||
if (!tsp)
|
||||
return 0;
|
||||
delta = bpf_ktime_get_ns() - *tsp;
|
||||
if (delta < 0)
|
||||
goto cleanup;
|
||||
|
||||
if (targ_per_process)
|
||||
hkey = BPF_CORE_READ(next, tgid);
|
||||
else if (targ_per_thread)
|
||||
hkey = pid;
|
||||
else if (targ_per_pidns)
|
||||
hkey = pid_namespace(next);
|
||||
else
|
||||
hkey = -1;
|
||||
histp = bpf_map_lookup_or_try_init(&hists, &hkey, &zero);
|
||||
if (!histp)
|
||||
goto cleanup;
|
||||
if (!histp->comm[0])
|
||||
bpf_probe_read_kernel_str(&histp->comm, sizeof(histp->comm),
|
||||
next->comm);
|
||||
if (targ_ms)
|
||||
delta /= 1000000U;
|
||||
else
|
||||
delta /= 1000U;
|
||||
slot = log2l(delta);
|
||||
if (slot >= MAX_SLOTS)
|
||||
slot = MAX_SLOTS - 1;
|
||||
__sync_fetch_and_add(&histp->slots[slot], 1);
|
||||
|
||||
cleanup:
|
||||
bpf_map_delete_elem(&start, &pid);
|
||||
return 0;
|
||||
}
|
||||
|
||||
SEC("raw_tp/sched_wakeup")
|
||||
int BPF_PROG(handle_sched_wakeup, struct task_struct *p)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
return trace_enqueue(BPF_CORE_READ(p, tgid), BPF_CORE_READ(p, pid));
|
||||
}
|
||||
|
||||
SEC("raw_tp/sched_wakeup_new")
|
||||
int BPF_PROG(handle_sched_wakeup_new, struct task_struct *p)
|
||||
{
|
||||
if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
|
||||
return 0;
|
||||
|
||||
return trace_enqueue(BPF_CORE_READ(p, tgid), BPF_CORE_READ(p, pid));
|
||||
}
|
||||
|
||||
SEC("raw_tp/sched_switch")
|
||||
int BPF_PROG(handle_sched_switch, bool preempt, struct task_struct *prev, struct task_struct *next)
|
||||
{
|
||||
return handle_switch(preempt, prev, next);
|
||||
}
|
||||
|
||||
char LICENSE[] SEC("license") = "GPL";
|
||||
14
9-runqlat/runqlat.h
Normal file
@@ -0,0 +1,14 @@
|
||||
|
||||
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
|
||||
#ifndef __RUNQLAT_H
|
||||
#define __RUNQLAT_H
|
||||
|
||||
#define TASK_COMM_LEN 16
|
||||
#define MAX_SLOTS 26
|
||||
|
||||
struct hist {
|
||||
__u32 slots[MAX_SLOTS];
|
||||
char comm[TASK_COMM_LEN];
|
||||
};
|
||||
|
||||
#endif /* __RUNQLAT_H */
|
||||