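diff --git a/0-introduce/introduce.md b/0-introduce/introduce.md
new file mode 100644
index 0000000..9dbe637
--- /dev/null
+++ b/0-introduce/introduce.md
@@ -0,0 +1,161 @@
+# eBPF Beginner's Development Practice Guide 1: Introduction and Quick Start
+
+
+
+- [1. What is eBPF](#1-what-is-ebpf)
+    - [1.1. Origins](#11-origins)
+    - [1.2. Execution logic](#12-execution-logic)
+    - [1.3. Architecture](#13-architecture)
+        - [1.3.1. Register design](#131-register-design)
+        - [1.3.2. Instruction encoding format](#132-instruction-encoding-format)
+    - [1.4. References for this section](#14-references-for-this-section)
+- [2. How to program with eBPF](#2-how-to-program-with-ebpf)
+    - [2.1. BCC](#21-bcc)
+    - [2.2. libbpf-bootstrap](#22-libbpf-bootstrap)
+    - [2.3. eunomia-bpf](#23-eunomia-bpf)
+
+
+
+## 1. What is eBPF
+
+The Linux kernel has always been an ideal place to implement monitoring/observability, networking, and security,
+but working directly in the kernel is not easy. In traditional Linux software development,
+implementing these features usually means modifying the kernel source or loading a kernel module. Modifying the kernel source is risky:
+a small mistake can crash the system, and every change has to be verified by recompiling the kernel, which costs time and effort.
+
+Loading a kernel module is more flexible and needs no kernel rebuild, but it can still crash the kernel, and as kernel versions change
+the module has to be updated accordingly or it stops working.
+
+Against this background, eBPF emerged. It is a revolutionary technology that runs sandboxed programs inside the kernel without modifying the kernel source or loading kernel modules. Through the interfaces it provides, users can trace and monitor the system from within the kernel.
+
+### 1.1. Origins
+
+eBPF grew out of BPF (Berkeley Packet Filter), which Steven McCanne and Van Jacobson
+proposed in 1992 in their [paper](https://www.tcpdump.org/papers/bpf-usenix93.pdf).
+Their original goal was a new way to filter network packets; the model is shown in the figure below.
+![](../imgs/original_bpf.png)
+
+Compared with other filtering approaches, BPF brought two innovations. First, it used a new virtual machine that works efficiently on register-based CPUs. Second, it did not copy the whole packet, only the data the filter needs, which improves efficiency. These two innovations made BPF a great success in practice; after being ported to Linux it was used by user-space tools such as `libpcap`
+and `tcpdump`, and proved to be a high-performance tool.
+
+Classic BPF is a 32-bit architecture. Each instruction carries a 32-bit general-purpose field (`k`) plus the following fields:
+
+- 16 bit: the opcode
+- 8 bit: the offset to jump to when the condition is true
+- 8 bit: the offset to jump to when the condition is false
+
+After more than a decade, in 2013 Alexei Starovoitov thoroughly reworked BPF. The result was named eBPF (extended BPF) and was merged into the kernel source in Linux 3.15.
+eBPF is a major change from BPF. First, it supports many more use cases: beyond packet filtering, it can trace events through
+existing Linux mechanisms such as `kprobe`, `tracepoint`, and `lsm`. It is also more
+flexible and convenient to use. In addition, the JIT compiler was upgraded and the interpreter replaced, which lets eBPF reach
+near-native execution performance on the platform.
+
+### 1.2. 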
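Execution logic
+
+eBPF's execution logic resembles BPF's. eBPF can also be seen as a register-based miniature "virtual machine" with a custom 64-bit RISC
+instruction set. Inside the Linux kernel it runs natively compiled (JIT-compiled) eBPF programs in a safe, controlled way and gives them access to a subset of kernel functions and memory.
+
+Once the program is written, we compile it with LLVM into an ELF file that uses the BPF instruction set, extract the parts to be injected, and call a loader function to
+inject them into the kernel. The user-space program and the bytecode in the kernel communicate through eBPF maps that live in the kernel. Also,
+so that the injected program cannot seriously affect the kernel, the eBPF verifier strictly checks the compiled bytecode before it is loaded.
+
+eBPF programs are event-driven, so we must decide the program's attach point in advance. After the compiled program is injected, whenever that attach point
+is hit, the program is triggered and handles the event as defined.
+
+### 1.3. Architecture
+
+#### 1.3.1. Register design
+
+eBPF has 11 registers, R0 to R10. Each is 64 bits wide with a corresponding 32-bit sub-register, and instructions have a fixed 64-bit width.
+
+#### 1.3.2. Instruction encoding format
+
+The eBPF instruction encoding is:
+
+- 8 bit: the opcode
+- 4 bit: the destination register number
+- 4 bit: the source register number
+- 16 bit: an offset, whose meaning depends on the instruction type
+- 32 bit: an immediate value
+
+### 1.4. References for this section
+
+[A thorough introduction to eBPF](https://lwn.net/Articles/740157/)
+[Introduction to BPF](https://www.collabora.com/news-and-blog/blog/2019/04/05/an-ebpf-overview-part-1-introduction/)
+[BPF architecture](https://www.collabora.com/news-and-blog/blog/2019/04/15/an-ebpf-overview-part-2-machine-and-bytecode/)
+
+## 2. How to program with eBPF
+
+Writing raw eBPF programs is tedious and difficult. To change this,
+LLVM gained in 2015 the ability to compile code written in a high-level language into eBPF bytecode. Around the same time, raw system calls such as `bpf()`
+were given a first layer of wrapping in the `libbpf` library, which is maintained in the kernel tree. The library includes functions that load bytecode into the kernel
+along with other key helpers. The `samples/bpf/` directory of the Linux source tree contains many
+`libbpf`-based eBPF examples provided by Linux.
+
+A typical `libbpf`-based eBPF program consists of two files, `*_kern.c` and `*_user.c`.
+`*_kern.c` holds the in-kernel attach points and handler functions; `*_user.c` holds the user-space code
+that injects the kernel-side code and handles interaction with the user. For a more detailed tutorial, see [this video](https://www.bilibili.com/video/BV1f54y1h74r?spm_id_from=333.999.0.0).
+However, this approach is still hard to understand and has a steep entry barrier, so most eBPF development today builds on tools such as:
+
+- BCC
+- bpftrace
+- libbpf-bootstrap
+
+There are also newer tools. For example, `eunomia-bpf` runs CO-RE eBPF as a service and consists of a toolchain and a runtime. Its main features include:
+
+- No need to write a user-space skeleton for each eBPF tool: in most cases writing the kernel-side program is enough to load and run the eBPF program correctly. The kernel-side code is fully compatible with libbpf, so migration is easy;
+- An async Rust based collector for custom observability data, exporting to Prometheus or OpenTelemetry, that typically adds less than 1% resource overhead. Kernel-side code plus a YAML config file is enough to visualize eBPF data, and once compiled it can be deployed on other machines directly through an API request;
+
+### 2.1. 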
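BCC
+
+BCC stands for BPF Compiler Collection. The project provides a Python library
+together with a complete toolchain for writing, compiling, and loading BPF programs, plus tools for debugging and diagnosing performance problems.
+
+Since its release in 2015, BCC has been refined by hundreds of contributors and now ships a large number of ready-to-use tracing tools. [Its official repository](https://github.com/iovisor/bcc/blob/master/docs/tutorial.md)
+provides an approachable tutorial that gets users started with BCC quickly.
+
+With BCC, users can program in high-level languages such as Python and Lua.
+Compared with programming directly in C, these languages are far more convenient: the user writes only the in-kernel
+BPF program in C, and BCC takes care of the rest, including compiling, parsing, and loading.
+
+BCC's drawback is poor portability. A BCC-based
+eBPF program is compiled every time it runs, and compiling requires the user to set up the relevant headers and their implementations. In practice,
+as many readers will have experienced, build dependencies are a thorny problem. For this reason we dropped BCC in this project
+and chose libbpf-bootstrap, which supports compile once, run many times.
+
+### 2.2. libbpf-bootstrap
+
+`libbpf-bootstrap` is a BPF development scaffold built on the `libbpf` library; its source is available
+on [GitHub](https://github.com/libbpf/libbpf-bootstrap).
+
+`libbpf-bootstrap` distills years of practice from the BPF community and gives developers a modern, convenient workflow
+that achieves the goal of compiling once and reusing the result.
+
+BPF programs based on `libbpf-bootstrap` follow a naming convention for source files:
+the file that produces the kernel-side bytecode ends in `.bpf.c`, the user-space loader ends in `.c`, and the two files
+must share the same prefix.
+
+When such a program is built, the `*.bpf.c` file is first compiled into
+the corresponding `.o` file, from which a skeleton file, `*.skel.h`, is generated. The skeleton contains some of the data structures defined
+on the kernel side and the key functions for loading the kernel-side code. After the user-space code includes this file, calling the load function
+loads the bytecode into the kernel. `libbpf-bootstrap` also has a thorough getting-started tutorial, available [here](https://nakryiko.com/posts/libbpf-bootstrap/),
+with a detailed walkthrough.
+
+### 2.3. eunomia-bpf
+
+Developing, building, and distributing eBPF programs has always had a high barrier to entry. Tools such as BCC and bpftrace offer high development efficiency and good portability, but deployment requires LLVM, Clang, and the rest of the build environment, and every run performs a local or remote compilation that consumes considerable resources. Native CO-RE libbpf, on the other hand, requires a fair amount of user-space loading code to load the eBPF program correctly and to fetch the data it reports from the kernel, and it offers no good solution for distributing and managing eBPF programs.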
+
+[eunomia-bpf](https://github.com/eunomia-bpf/eunomia-bpf) is an open-source eBPF dynamic-loading runtime and development toolchain. It is a lightweight CO-RE development framework built on libbpf, designed to simplify developing, building, distributing, and running eBPF programs.
+
+With eunomia-bpf you can:
+
+- write only kernel-side code when building an eBPF program or tool, with the data exported from the kernel collected automatically;
+- develop the user-space interaction program in WASM, controlling the loading and execution of the whole eBPF program and processing its data from inside the WASM virtual machine;
+- package precompiled eBPF programs as generic JSON or WASM modules and distribute them across architectures and kernel versions, then load and run them dynamically without recompiling.
+
+eunomia-bpf consists of a compilation toolchain and a runtime library. Compared with traditional frameworks such as BCC and native libbpf, it greatly simplifies the eBPF development workflow: in most cases you write only kernel-side code and can then build, package, and publish a complete eBPF application. The kernel-side eBPF code stays 100% compatible with mainstream frameworks such as libbpf, libbpfgo, and libbpf-rs. When user-space code is needed, WebAssembly lets you write it in a variety of languages. Compared with scripting tools such as bpftrace, eunomia-bpf keeps similar convenience but is not limited to tracing; it also covers other scenarios such as networking and security.
+
+> - eunomia-bpf GitHub repository:
+> - Gitee mirror:
+
+## References
\ No newline at end of file
diff --git a/1-helloworld/.gitignore b/1-helloworld/.gitignore
new file mode 100644
index 0000000..7d5aebf
--- /dev/null
+++ b/1-helloworld/.gitignore
@@ -0,0 +1,6 @@
+.vscode
+package.json
+*.o
+*.skel.json
+*.skel.yaml
+package.yaml
diff --git a/1-helloworld/README.md b/1-helloworld/README.md
new file mode 100644
index 0000000..f8a947e
--- /dev/null
+++ b/1-helloworld/README.md
@@ -0,0 +1,57 @@
+---
+layout: post
+title: minimal
+date: 2022-10-10 16:18
+category: bpftools
+author: yunwei37
+tags: [bpftools, tracepoint, example, syscall]
+summary: a minimal example of a BPF application that installs a tracepoint handler triggered by the write syscall
+---
+
+
+`minimal` is just that – a minimal practical BPF application example. It
+doesn't use or require BPF CO-RE, so should run on quite old kernels. It
+installs a tracepoint handler that fires on every `write` syscall. It uses
+the `bpf_printk()` BPF helper to communicate with the world.
+
+
+```console
+$ sudo ecli examples/bpftools/minimal/package.json
+Runing eBPF program...
+```
+
+To see its output,
+read the `/sys/kernel/debug/tracing/trace_pipe` file as root:
+
+```shell
+$ sudo cat /sys/kernel/debug/tracing/trace_pipe
+ <...>-3840345 [010] d... 
3220701.101143: bpf_trace_printk: BPF triggered from PID 3840345.
+ <...>-3840345 [010] d... 3220702.101265: bpf_trace_printk: BPF triggered from PID 3840345.
+```
+
+`minimal` is great as a bare-bones experimental playground to quickly try out
+new ideas or BPF features.
+
+## Compile and Run
+
+
+
+Compile:
+
+```console
+docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
+```
+
+or compile with `ecc`:
+
+```console
+$ ecc minimal.bpf.c
+Compiling bpf object...
+Packing ebpf object and config into package.json...
+```
+
+Run:
+
+```console
+sudo ecli ./package.json
+```
\ No newline at end of file
diff --git a/1-helloworld/minimal.bpf.c b/1-helloworld/minimal.bpf.c
new file mode 100644
index 0000000..0c65717
--- /dev/null
+++ b/1-helloworld/minimal.bpf.c
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+#define BPF_NO_GLOBAL_DATA
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+typedef unsigned int u32;
+typedef int pid_t;
+const pid_t pid_filter = 0;
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
+
+SEC("tp/syscalls/sys_enter_write")
+int handle_tp(void *ctx)
+{
+    pid_t pid = bpf_get_current_pid_tgid() >> 32; /* upper 32 bits hold the tgid (process ID) */
+    if (pid_filter && pid != pid_filter)
+        return 0;
+    bpf_printk("BPF triggered from PID %d.\n", pid);
+    return 0;
+}
diff --git a/10-lsm-connect/.gitignore b/10-lsm-connect/.gitignore
new file mode 100644
index 0000000..7d5aebf
--- /dev/null
+++ b/10-lsm-connect/.gitignore
@@ -0,0 +1,6 @@
+.vscode
+package.json
+*.o
+*.skel.json
+*.skel.yaml
+package.yaml
diff --git a/10-lsm-connect/README.md b/10-lsm-connect/README.md
new file mode 100644
index 0000000..f40a75c
--- /dev/null
+++ b/10-lsm-connect/README.md
@@ -0,0 +1,34 @@
+---
+layout: post
+title: lsm-connect
+date: 2022-10-10 16:18
+category: bpftools
+author: yunwei37
+tags: [bpftools, examples, lsm, no-output]
+summary: BPF LSM program (on socket_connect hook) that prevents any connection towards 1.1.1.1 from happening. 
Found in demo-cloud-native-ebpf-day
+---
+
+
+## Compile and Run
+
+```console
+docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
+```
+
+or compile with `ecc`:
+
+```console
+$ ecc lsm-connect.bpf.c
+Compiling bpf object...
+Packing ebpf object and config into package.json...
+```
+
+Run:
+
+```console
+sudo ecli examples/bpftools/lsm-connect/package.json
+```
+
+## reference
+
+
\ No newline at end of file
diff --git a/10-lsm-connect/lsm-connect.bpf.c b/10-lsm-connect/lsm-connect.bpf.c
new file mode 100644
index 0000000..c731c93
--- /dev/null
+++ b/10-lsm-connect/lsm-connect.bpf.c
@@ -0,0 +1,41 @@
+#include "vmlinux.h"
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char LICENSE[] SEC("license") = "GPL";
+
+#define EPERM 1
+#define AF_INET 2
+
+const __u32 blockme = 16843009; // 1.1.1.1 -> int
+
+SEC("lsm/socket_connect")
+int BPF_PROG(restrict_connect, struct socket *sock, struct sockaddr *address, int addrlen, int ret)
+{
+    // Satisfying "cannot override a denial" rule
+    if (ret != 0)
+    {
+        return ret;
+    }
+
+    // Only IPv4 in this example
+    if (address->sa_family != AF_INET)
+    {
+        return 0;
+    }
+
+    // Cast the address to an IPv4 socket address
+    struct sockaddr_in *addr = (struct sockaddr_in *)address;
+
+    // Where do you want to go? 
+    __u32 dest = addr->sin_addr.s_addr;
+    bpf_printk("lsm: found connect to %d", dest);
+
+    if (dest == blockme)
+    {
+        bpf_printk("lsm: blocking %d", dest);
+        return -EPERM;
+    }
+    return 0;
+}
diff --git a/11-tc/.gitignore b/11-tc/.gitignore
new file mode 100755
index 0000000..bbee7c8
--- /dev/null
+++ b/11-tc/.gitignore
@@ -0,0 +1,10 @@
+.vscode
+package.json
+*.wasm
+ewasm-skel.h
+ecli
+ewasm
+*.o
+*.skel.json
+*.skel.yaml
+package.yaml
diff --git a/11-tc/README.md b/11-tc/README.md
new file mode 100644
index 0000000..380fc7c
--- /dev/null
+++ b/11-tc/README.md
@@ -0,0 +1,56 @@
+---
+layout: post
+title: tc
+date: 2022-10-10 16:18
+category: bpftools
+author: yunwei37
+tags: [bpftools, tc, example]
+summary: a minimal example of a BPF application that uses tc
+---
+
+
+`tc` (short for Traffic Control) is an example of handling ingress network traffic.
+It creates a qdisc on the `lo` interface and attaches the `tc_ingress` BPF program to it.
+It reports the metadata of the IP packets coming into the `lo` interface.
+
+```shell
+$ sudo ecli ./package.json
+...
+Successfully started! Please run `sudo cat /sys/kernel/debug/tracing/trace_pipe` to see output of the BPF program.
+......
+```
+
+The `tc` output in `/sys/kernel/debug/tracing/trace_pipe` should look
+something like this:
+
+```
+$ sudo cat /sys/kernel/debug/tracing/trace_pipe
+            node-1254811 [007] ..s1 8737831.671074: 0: Got IP packet: tot_len: 79, ttl: 64
+            sshd-1254728 [006] ..s1 8737831.674334: 0: Got IP packet: tot_len: 79, ttl: 64
+            sshd-1254728 [006] ..s1 8737831.674349: 0: Got IP packet: tot_len: 72, ttl: 64
+            node-1254811 [007] ..s1 8737831.674550: 0: Got IP packet: tot_len: 71, ttl: 64
+```
+
+## Compile and Run
+
+
+
+Compile:
+
+```console
+docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
+```
+
+or compile with `ecc`:
+
+```console
+$ ecc tc.bpf.c
+Compiling bpf object...
+Packing ebpf object and config into package.json... 
+```
+
+Run:
+
+```console
+sudo ecli ./package.json
+```
\ No newline at end of file
diff --git a/11-tc/tc.bpf.c b/11-tc/tc.bpf.c
new file mode 100644
index 0000000..4b82864
--- /dev/null
+++ b/11-tc/tc.bpf.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+/* Copyright (c) 2022 Hengqi Chen */
+#include <vmlinux.h>
+#include <bpf/bpf_endian.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#define TC_ACT_OK 0
+#define ETH_P_IP 0x0800 /* Internet Protocol packet */
+
+/// @tchook {"ifindex":1, "attach_point":"BPF_TC_INGRESS"}
+/// @tcopts {"handle":1, "priority":1}
+SEC("tc")
+int tc_ingress(struct __sk_buff *ctx)
+{
+    void *data_end = (void *)(__u64)ctx->data_end;
+    void *data = (void *)(__u64)ctx->data;
+    struct ethhdr *l2;
+    struct iphdr *l3;
+
+    if (ctx->protocol != bpf_htons(ETH_P_IP))
+        return TC_ACT_OK;
+
+    l2 = data;
+    if ((void *)(l2 + 1) > data_end)
+        return TC_ACT_OK;
+
+    l3 = (struct iphdr *)(l2 + 1);
+    if ((void *)(l3 + 1) > data_end)
+        return TC_ACT_OK;
+
+    bpf_printk("Got IP packet: tot_len: %d, ttl: %d", bpf_ntohs(l3->tot_len), l3->ttl);
+    return TC_ACT_OK;
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/12-bindsnoop/.gitignore b/12-bindsnoop/.gitignore
new file mode 100644
index 0000000..a1027ce
--- /dev/null
+++ b/12-bindsnoop/.gitignore
@@ -0,0 +1,3 @@
+.vscode
+package.json
+ecli
diff --git a/12-bindsnoop/README.md b/12-bindsnoop/README.md
new file mode 100644
index 0000000..35f167a
--- /dev/null
+++ b/12-bindsnoop/README.md
@@ -0,0 +1,106 @@
+---
+layout: post
+title: bindsnoop
+date: 2022-10-10 16:18
+category: bpftools
+author: yunwei37
+tags: [bpftools, syscall, kprobe, perf-event]
+summary: This tool traces the kernel function performing socket binding and prints the socket options set before the system call.
+---
+
+## origin
+
+Adapted from:
+
+https://github.com/iovisor/bcc/blob/master/libbpf-tools/bindsnoop.bpf.c
+
+## Compile and Run
+
+Compile:
+
+```shell
+docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
+```
+
+Run:
+
+```shell
+sudo ./ecli run examples/bpftools/bindsnoop/package.json
+```
+
+## details in bcc
+
+Demonstrations of bindsnoop, the Linux eBPF/bcc version.
+
+This tool traces the kernel function performing socket binding and
+prints the socket options set before the system call invocation that might
+impact bind behavior and bound interface:
+```console
+    SOL_IP     IP_FREEBIND              F....
+    SOL_IP     IP_TRANSPARENT           .T...
+    SOL_IP     IP_BIND_ADDRESS_NO_PORT  ..N..
+    SOL_SOCKET SO_REUSEADDR             ...R.
+    SOL_SOCKET SO_REUSEPORT             ....r
+```
+```console
+# ./bindsnoop.py
+Tracing binds ... Hit Ctrl-C to end
+PID     COMM         PROT ADDR            PORT   OPTS IF
+3941081 test_bind_op TCP  192.168.1.102       0 F.N..  0
+3940194 dig          TCP  ::              62087 .....  0
+3940219 dig          UDP  ::              48665 .....  0
+3940893 Acceptor Thr TCP  ::              35343 ...R.  0
+```
+The output shows four bind system calls:
+one "test_bind_op" instance with the IP_FREEBIND and IP_BIND_ADDRESS_NO_PORT
+options set, a dig process that called bind for TCP and UDP sockets,
+and Acceptor, which called bind for TCP with the SO_REUSEADDR option set.
+
+
+The -t option prints a timestamp column:
+```console
+# ./bindsnoop.py -t
+TIME(s)        PID     COMM         PROT ADDR            PORT   OPTS IF
+0.000000       3956801 dig          TCP  ::              49611 .....  0
+0.011045       3956822 dig          UDP  ::              56343 .....  0
+2.310629       3956498 test_bind_op TCP  192.168.1.102   39609 F...r  0
+```
+
+The -U option prints a UID column:
+```console
+# ./bindsnoop.py -U
+Tracing binds ... Hit Ctrl-C to end
+   UID      PID COMM         PROT ADDR            PORT   OPTS IF
+127072  3956498 test_bind_op TCP  192.168.1.102   44491 F...r  0
+127072  3960261 Acceptor Thr TCP  ::              48869 ...R.  0
+     0  3960729 Acceptor Thr TCP  ::              44637 ...R.  0
+     0  3959075 chef-client  UDP  ::              61722 .....  0
+```
+
+The -u option filters by UID:
+```console
+# ./bindsnoop.py -Uu 0
+Tracing binds ... 
Hit Ctrl-C to end
+   UID      PID COMM         PROT ADDR            PORT   OPTS IF
+     0  3966330 Acceptor Thr TCP  ::              39319 ...R.  0
+     0  3968044 python3.7    TCP  ::1             59371 .....  0
+     0    10224 fetch        TCP  0.0.0.0         42091 ...R.  0
+```
+
+The --cgroupmap option filters based on a cgroup set.
+It is meant to be used with an externally created map.
+```console
+# ./bindsnoop.py --cgroupmap /sys/fs/bpf/test01
+```
+For more details, see docs/special_filtering.md
+
+
+To track heavy bind usage, use the --count option:
+```console
+# ./bindsnoop.py --count
+Tracing binds ... Hit Ctrl-C to end
+LADDR                                           LPORT     BINDS
+0.0.0.0                                          6771         4
+0.0.0.0                                          4433         4
+127.0.0.1                                       33665         1
+```
\ No newline at end of file
diff --git a/12-bindsnoop/bindsnoop.bpf.c b/12-bindsnoop/bindsnoop.bpf.c
new file mode 100644
index 0000000..dc99ba4
--- /dev/null
+++ b/12-bindsnoop/bindsnoop.bpf.c
@@ -0,0 +1,151 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+/* Copyright (c) 2021 Hengqi Chen */
+#include <vmlinux.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+#include "bindsnoop.bpf.h"
+
+#define MAX_ENTRIES 10240
+#define MAX_PORTS 1024
+
+const volatile bool filter_cg = false;
+const volatile pid_t target_pid = 0;
+const volatile bool ignore_errors = true;
+const volatile bool filter_by_port = false;
+
+struct {
+    __uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
+    __type(key, u32);
+    __type(value, u32);
+    __uint(max_entries, 1);
+} cgroup_map SEC(".maps");
+
+struct {
+    __uint(type, BPF_MAP_TYPE_HASH);
+    __uint(max_entries, MAX_ENTRIES);
+    __type(key, __u32);
+    __type(value, struct socket *);
+} sockets SEC(".maps");
+
+struct {
+    __uint(type, BPF_MAP_TYPE_HASH);
+    __uint(max_entries, MAX_PORTS);
+    __type(key, __u16);
+    __type(value, __u16);
+} ports SEC(".maps");
+
+struct {
+    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+    __uint(key_size, sizeof(__u32));
+    __uint(value_size, sizeof(__u32));
+} events SEC(".maps");
+
+static int probe_entry(struct pt_regs *ctx, struct socket *socket)
+{
+    __u64 pid_tgid = bpf_get_current_pid_tgid();
+    __u32 pid = 
pid_tgid >> 32;
+    __u32 tid = (__u32)pid_tgid;
+
+    if (target_pid && target_pid != pid)
+        return 0;
+
+    bpf_map_update_elem(&sockets, &tid, &socket, BPF_ANY);
+    return 0;
+};
+
+static int probe_exit(struct pt_regs *ctx, short ver)
+{
+    __u64 pid_tgid = bpf_get_current_pid_tgid();
+    __u32 pid = pid_tgid >> 32;
+    __u32 tid = (__u32)pid_tgid;
+    struct socket **socketp, *socket;
+    struct inet_sock *inet_sock;
+    struct sock *sock;
+    union bind_options opts;
+    struct bind_event event = {};
+    __u16 sport = 0, *port;
+    int ret;
+
+    socketp = bpf_map_lookup_elem(&sockets, &tid);
+    if (!socketp)
+        return 0;
+
+    ret = PT_REGS_RC(ctx);
+    if (ignore_errors && ret != 0)
+        goto cleanup;
+
+    socket = *socketp;
+    sock = BPF_CORE_READ(socket, sk);
+    inet_sock = (struct inet_sock *)sock;
+
+    sport = bpf_ntohs(BPF_CORE_READ(inet_sock, inet_sport));
+    port = bpf_map_lookup_elem(&ports, &sport);
+    if (filter_by_port && !port)
+        goto cleanup;
+
+    opts.fields.freebind = BPF_CORE_READ_BITFIELD_PROBED(inet_sock, freebind);
+    opts.fields.transparent = BPF_CORE_READ_BITFIELD_PROBED(inet_sock, transparent);
+    opts.fields.bind_address_no_port = BPF_CORE_READ_BITFIELD_PROBED(inet_sock, bind_address_no_port);
+    opts.fields.reuseaddress = BPF_CORE_READ_BITFIELD_PROBED(sock, __sk_common.skc_reuse);
+    opts.fields.reuseport = BPF_CORE_READ_BITFIELD_PROBED(sock, __sk_common.skc_reuseport);
+    event.opts = opts.data;
+    event.ts_us = bpf_ktime_get_ns() / 1000;
+    event.pid = pid;
+    event.port = sport;
+    event.bound_dev_if = BPF_CORE_READ(sock, __sk_common.skc_bound_dev_if);
+    event.ret = ret;
+    event.proto = BPF_CORE_READ_BITFIELD_PROBED(sock, sk_protocol);
+    bpf_get_current_comm(&event.task, sizeof(event.task));
+    if (ver == 4) {
+        event.ver = ver;
+        bpf_probe_read_kernel(&event.addr, sizeof(event.addr), &inet_sock->inet_saddr);
+    } else { /* ver == 6 */
+        event.ver = ver;
+        bpf_probe_read_kernel(&event.addr, sizeof(event.addr), sock->__sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32);
+    }
+    
bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &event, sizeof(event));
+
+cleanup:
+    bpf_map_delete_elem(&sockets, &tid);
+    return 0;
+}
+
+SEC("kprobe/inet_bind")
+int BPF_KPROBE(ipv4_bind_entry, struct socket *socket)
+{
+    if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
+        return 0;
+
+    return probe_entry(ctx, socket);
+}
+
+SEC("kretprobe/inet_bind")
+int BPF_KRETPROBE(ipv4_bind_exit)
+{
+    if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
+        return 0;
+
+    return probe_exit(ctx, 4);
+}
+
+SEC("kprobe/inet6_bind")
+int BPF_KPROBE(ipv6_bind_entry, struct socket *socket)
+{
+    if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
+        return 0;
+
+    return probe_entry(ctx, socket);
+}
+
+SEC("kretprobe/inet6_bind")
+int BPF_KRETPROBE(ipv6_bind_exit)
+{
+    if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
+        return 0;
+
+    return probe_exit(ctx, 6);
+}
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
\ No newline at end of file
diff --git a/12-bindsnoop/bindsnoop.bpf.h b/12-bindsnoop/bindsnoop.bpf.h
new file mode 100644
index 0000000..9643c86
--- /dev/null
+++ b/12-bindsnoop/bindsnoop.bpf.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+#ifndef __BINDSNOOP_H
+#define __BINDSNOOP_H
+
+#define TASK_COMM_LEN 16
+
+struct bind_event {
+    unsigned __int128 addr;
+    unsigned long long ts_us;
+    unsigned int pid;
+    unsigned int bound_dev_if;
+    int ret;
+    unsigned short port;
+    unsigned short proto;
+    unsigned char opts;
+    unsigned char ver;
+    char task[TASK_COMM_LEN];
+};
+
+union bind_options {
+    unsigned char data;
+    struct {
+        unsigned char freebind : 1;
+        unsigned char transparent : 1;
+        unsigned char bind_address_no_port : 1;
+        unsigned char reuseaddress : 1;
+        unsigned char reuseport : 1;
+    } fields;
+};
+
+#endif /* __BINDSNOOP_H */
diff --git a/12-bindsnoop/bindsnoop.md b/12-bindsnoop/bindsnoop.md
new file mode 100644
index 0000000..d98809a
--- /dev/null
+++ 
b/12-bindsnoop/bindsnoop.md
@@ -0,0 +1,95 @@
+## eBPF Beginner's Practice Tutorial: Writing the eBPF Program Bindsnoop to Monitor Socket Port Binding Events
+
+### Background
+
+Bindsnoop traces the kernel functions that bind socket ports and prints the socket options, set before the bind call is made, that
+might affect binding behavior.
+
+### How it works
+
+Bindsnoop is implemented with kprobes. Its main attach points are inet_bind and inet6_bind. inet_bind handles the port-binding system call for IPv4
+sockets, and inet6_bind handles it for IPv6 sockets.
+
+```c
+SEC("kprobe/inet_bind")
+int BPF_KPROBE(ipv4_bind_entry, struct socket *socket)
+{
+    if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
+        return 0;
+
+    return probe_entry(ctx, socket);
+}
+
+SEC("kretprobe/inet_bind")
+int BPF_KRETPROBE(ipv4_bind_exit)
+{
+    if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
+        return 0;
+
+    return probe_exit(ctx, 4);
+}
+
+SEC("kprobe/inet6_bind")
+int BPF_KPROBE(ipv6_bind_entry, struct socket *socket)
+{
+    if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
+        return 0;
+
+    return probe_entry(ctx, socket);
+}
+
+SEC("kretprobe/inet6_bind")
+int BPF_KRETPROBE(ipv6_bind_exit)
+{
+    if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
+        return 0;
+
+    return probe_exit(ctx, 6);
+}
+```
+
+When the system attempts to bind a socket port, the handlers attached via kprobe fire. On entry to the bind function, `probe_entry` runs
+first and stores the socket pointer in a map keyed by the thread ID (tid).
+
+```c
+static int probe_entry(struct pt_regs *ctx, struct socket *socket)
+{
+    __u64 pid_tgid = bpf_get_current_pid_tgid();
+    __u32 pid = pid_tgid >> 32;
+    __u32 tid = (__u32)pid_tgid;
+
+    if (target_pid && target_pid != pid)
+        return 0;
+
+    bpf_map_update_elem(&sockets, &tid, &socket, BPF_ANY);
+    return 0;
+};
+```
+After the bind function returns, `probe_exit` is called. It looks up the socket stored for the tid, writes it together with other information
+into an event structure, and sends the event to user space.
+
+```c
+struct bind_event {
+    unsigned __int128 addr;
+    __u64 ts_us;
+    __u32 pid;
+    __u32 bound_dev_if;
+    int ret;
+    __u16 port;
+    __u16 proto;
+    __u8 opts;
+    __u8 ver;
+    char task[TASK_COMM_LEN];
+};
+```
+
+When the user stops the tool, its user-space code reads the stored data and prints it as requested.
+
+### Usage in Eunomia
+
+![result](../imgs/mountsnoop.jpg)
+![result](../imgs/bindsnoop-prometheus.png)
+
+### Summary
+
+Bindsnoop uses 
kprobe attach points to monitor socket port binding, which broadens the range of Eunomia's applications.
\ No newline at end of file
diff --git a/13-tcpconnlat/.gitignore b/13-tcpconnlat/.gitignore
new file mode 100644
index 0000000..3e91eef
--- /dev/null
+++ b/13-tcpconnlat/.gitignore
@@ -0,0 +1,2 @@
+.vscode
+package.json
diff --git a/13-tcpconnlat/README.md b/13-tcpconnlat/README.md
new file mode 100644
index 0000000..a7d8589
--- /dev/null
+++ b/13-tcpconnlat/README.md
@@ -0,0 +1,137 @@
+---
+layout: post
+title: tcpconnlat
+date: 2022-10-10 16:18
+category: bpftools
+author: yunwei37
+tags: [bpftools, syscall, network]
+summary: Traces the kernel function performing active TCP connections (e.g., via a connect() syscall; accept() calls are passive connections) and shows connection latency.
+---
+
+## origin
+
+Adapted from:
+
+https://github.com/iovisor/bcc/blob/master/libbpf-tools/tcpconnlat.bpf.c
+
+## Compile and Run
+
+Compile:
+
+```shell
+docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest
+```
+
+Run:
+
+```shell
+sudo ./ecli run package.json
+```
+
+TODO: support union in C
+
+## details in bcc
+
+Demonstrations of tcpconnect, the Linux eBPF/bcc version.
+
+
+This tool traces the kernel function performing active TCP connections
+(e.g., via a connect() syscall; accept() calls are passive connections). Some example
+output (IP addresses changed to protect the innocent):
+```console
+# ./tcpconnect
+PID    COMM         IP SADDR            DADDR            DPORT
+1479   telnet       4  127.0.0.1        127.0.0.1        23
+1469   curl         4  10.201.219.236   54.245.105.25    80
+1469   curl         4  10.201.219.236   54.67.101.145    80
+1991   telnet       6  ::1              ::1              23
+2015   ssh          6  fe80::2000:bff:fe82:3ac fe80::2000:bff:fe82:3ac 22
+```
+This output shows five connections: two from "telnet" processes, two from
+"curl", and one from "ssh". The output shows the IP version, source
+address, destination address, and destination port. This traces attempted
+connections: these may have failed.
+
+The overhead of this tool should be negligible, since it is only tracing the
+kernel functions performing connect. 
It is not tracing every packet and then
+filtering.
+
+
+The -t option prints a timestamp column:
+```console
+# ./tcpconnect -t
+TIME(s)  PID    COMM         IP SADDR            DADDR            DPORT
+31.871   2482   local_agent  4  10.103.219.236   10.251.148.38    7001
+31.874   2482   local_agent  4  10.103.219.236   10.101.3.132     7001
+31.878   2482   local_agent  4  10.103.219.236   10.171.133.98    7101
+90.917   2482   local_agent  4  10.103.219.236   10.251.148.38    7001
+90.928   2482   local_agent  4  10.103.219.236   10.102.64.230    7001
+90.938   2482   local_agent  4  10.103.219.236   10.115.167.169   7101
+```
+The output shows some periodic connections (or attempts) from a "local_agent"
+process to various other addresses. A few connections occur every minute.
+
+The -d option tracks DNS responses and tries to associate each connection with
+a previous DNS query issued before it. If a DNS response matching the IP
+is found, it will be printed. If no match was found, "No DNS Query" is printed
+in this column. Queries for 127.0.0.1 and ::1 are automatically associated with
+"localhost". If the time between when the DNS response was received and a
+connect call was traced exceeds 100ms, the tool will print the time delta
+after the query name. See below for www.domain.com for an example. 
+```console
+# ./tcpconnect -d
+PID    COMM         IP SADDR            DADDR            DPORT QUERY
+1543   amazon-ssm-a 4  10.66.75.54      176.32.119.67    443   ec2messages.us-west-1.amazonaws.com
+1479   telnet       4  127.0.0.1        127.0.0.1        23    localhost
+1469   curl         4  10.201.219.236   54.245.105.25    80    www.domain.com (123.342ms)
+1469   curl         4  10.201.219.236   54.67.101.145    80    No DNS Query
+1991   telnet       6  ::1              ::1              23    localhost
+2015   ssh          6  fe80::2000:bff:fe82:3ac fe80::2000:bff:fe82:3ac 22 anotherhost.org
+```
+
+The -L option prints a LPORT column:
+```console
+# ./tcpconnect -L
+PID    COMM         IP SADDR            LPORT  DADDR            DPORT
+3706   nc           4  192.168.122.205  57266  192.168.122.150  5000
+3722   ssh          4  192.168.122.205  50966  192.168.122.150  22
+3779   ssh          6  fe80::1          52328  fe80::2          22
+```
+
+The -U option prints a UID column:
+```console
+# ./tcpconnect -U
+UID   PID    COMM         IP SADDR            DADDR            DPORT
+0     31333  telnet       6  ::1              ::1              23
+0     31333  telnet       4  127.0.0.1        127.0.0.1        23
+1000  31322  curl         4  127.0.0.1        127.0.0.1        80
+1000  31322  curl         6  ::1              ::1              80
+```
+
+The -u option filters by UID:
+```console
+# ./tcpconnect -Uu 1000
+UID   PID    COMM         IP SADDR            DADDR            DPORT
+1000  31338  telnet       6  ::1              ::1              23
+1000  31338  telnet       4  127.0.0.1        127.0.0.1        23
+```
+To spot heavy outbound connections quickly one can use the -c flag. It will
+count all active connections per source IP and destination IP/port.
+```console
+# ./tcpconnect.py -c
+Tracing connect ... Hit Ctrl-C to end
+^C
+LADDR                    RADDR                     RPORT              CONNECTS
+192.168.10.50            172.217.21.194            443                70
+192.168.10.50            172.213.11.195            443                34
+192.168.10.50            172.212.22.194            443                21
+[...]
+```
+
+The --cgroupmap option filters based on a cgroup set. It is meant to be used
+with an externally created map. 
+```console
+# ./tcpconnect --cgroupmap /sys/fs/bpf/test01
+```
+For more details, see docs/special_filtering.md
+
diff --git a/13-tcpconnlat/tcpconnlat.bpf.c b/13-tcpconnlat/tcpconnlat.bpf.c
new file mode 100644
index 0000000..544701b
--- /dev/null
+++ b/13-tcpconnlat/tcpconnlat.bpf.c
@@ -0,0 +1,113 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Wenbo Zhang
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_tracing.h>
+#include "tcpconnlat.bpf.h"
+
+#define AF_INET 2
+#define AF_INET6 10
+
+const volatile __u64 targ_min_us = 0;
+const volatile pid_t targ_tgid = 0;
+
+struct piddata {
+    char comm[TASK_COMM_LEN];
+    u64 ts;
+    u32 tgid;
+};
+
+struct {
+    __uint(type, BPF_MAP_TYPE_HASH);
+    __uint(max_entries, 4096);
+    __type(key, struct sock *);
+    __type(value, struct piddata);
+} start SEC(".maps");
+
+struct {
+    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+    __uint(key_size, sizeof(u32));
+    __uint(value_size, sizeof(u32));
+} events SEC(".maps");
+
+static int trace_connect(struct sock *sk)
+{
+    u32 tgid = bpf_get_current_pid_tgid() >> 32;
+    struct piddata piddata = {};
+
+    if (targ_tgid && targ_tgid != tgid)
+        return 0;
+
+    bpf_get_current_comm(&piddata.comm, sizeof(piddata.comm));
+    piddata.ts = bpf_ktime_get_ns();
+    piddata.tgid = tgid;
+    bpf_map_update_elem(&start, &sk, &piddata, 0);
+    return 0;
+}
+
+static int handle_tcp_rcv_state_process(void *ctx, struct sock *sk)
+{
+    struct piddata *piddatap;
+    struct event event = {};
+    s64 delta;
+    u64 ts;
+
+    if (BPF_CORE_READ(sk, __sk_common.skc_state) != TCP_SYN_SENT)
+        return 0;
+
+    piddatap = bpf_map_lookup_elem(&start, &sk);
+    if (!piddatap)
+        return 0;
+
+    ts = bpf_ktime_get_ns();
+    delta = (s64)(ts - piddatap->ts);
+    if (delta < 0)
+        goto cleanup;
+
+    event.delta_us = delta / 1000U;
+    if (targ_min_us && event.delta_us < targ_min_us)
+        goto cleanup;
+    __builtin_memcpy(&event.comm, piddatap->comm,
+            sizeof(event.comm));
+    event.ts_us = ts / 1000;
+    event.tgid = piddatap->tgid;
+    event.lport = BPF_CORE_READ(sk, 
__sk_common.skc_num);
+    event.dport = BPF_CORE_READ(sk, __sk_common.skc_dport);
+    event.af = BPF_CORE_READ(sk, __sk_common.skc_family);
+    if (event.af == AF_INET) {
+        event.saddr_v4 = BPF_CORE_READ(sk, __sk_common.skc_rcv_saddr);
+        event.daddr_v4 = BPF_CORE_READ(sk, __sk_common.skc_daddr);
+    } else {
+        BPF_CORE_READ_INTO(&event.saddr_v6, sk,
+                __sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32);
+        BPF_CORE_READ_INTO(&event.daddr_v6, sk,
+                __sk_common.skc_v6_daddr.in6_u.u6_addr32);
+    }
+    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
+            &event, sizeof(event));
+
+cleanup:
+    bpf_map_delete_elem(&start, &sk);
+    return 0;
+}
+
+SEC("kprobe/tcp_v4_connect")
+int BPF_KPROBE(tcp_v4_connect, struct sock *sk)
+{
+    return trace_connect(sk);
+}
+
+SEC("kprobe/tcp_v6_connect")
+int BPF_KPROBE(tcp_v6_connect, struct sock *sk)
+{
+    return trace_connect(sk);
+}
+
+SEC("kprobe/tcp_rcv_state_process")
+int BPF_KPROBE(tcp_rcv_state_process, struct sock *sk)
+{
+    return handle_tcp_rcv_state_process(ctx, sk);
+}
+
+char LICENSE[] SEC("license") = "GPL";
diff --git a/13-tcpconnlat/tcpconnlat.bpf.h b/13-tcpconnlat/tcpconnlat.bpf.h
new file mode 100644
index 0000000..d6cd930
--- /dev/null
+++ b/13-tcpconnlat/tcpconnlat.bpf.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+#ifndef __TCPCONNLAT_H
+#define __TCPCONNLAT_H
+
+#define TASK_COMM_LEN 16
+
+struct event {
+    // union {
+    unsigned int saddr_v4;
+    unsigned char saddr_v6[16];
+    // };
+    // union {
+    unsigned int daddr_v4;
+    unsigned char daddr_v6[16];
+    // };
+    char comm[TASK_COMM_LEN];
+    unsigned long long delta_us;
+    unsigned long long ts_us;
+    unsigned int tgid;
+    int af;
+    unsigned short lport;
+    unsigned short dport;
+};
+
+
+#endif /* __TCPCONNLAT_H */
diff --git a/13-tcpconnlat/tcpconnlat.md b/13-tcpconnlat/tcpconnlat.md
new file mode 100644
index 0000000..9f19bf1
--- /dev/null
+++ b/13-tcpconnlat/tcpconnlat.md
@@ -0,0 +1,186 @@
+## eBPF Beginner's Practice Tutorial: Writing the eBPF Program tcpconnlat to Measure TCP Connection Latency
+
+### Background
+
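+When building backend services, whether in C, Java, PHP, or Golang, you inevitably call components such as MySQL and Redis to fetch data, and often make RPC calls or call other RESTful APIs as well. Underneath, almost all of these calls are carried over TCP. Among transport-layer protocols, TCP offers reliable connections, retransmission on error, and congestion control, which is why it is currently more widely used than UDP. TCP also has drawbacks, however, such as the relatively long delay of establishing a connection, which is one reason alternatives such as QUIC (Quick UDP Internet Connections) have appeared.
+
+Analyzing TCP connection latency is useful both for network performance tuning and for troubleshooting.
+
+### How tcpconnlat works
+
+tcpconnlat traces the kernel functions that perform active TCP connections (for example, via the connect() system call) and reports the connection latency as measured locally, that is, the time from sending the SYN to receiving the response packet.
+
+### How a TCP connection is established
+
+The whole connection process is shown in the figure below:
+
+![tcpconnlat](tcpconnlat1.png)
+
+Let's briefly look at where the time goes in each step of this process:
+
+1. The client sends the SYN packet: the client usually issues the SYN through the connect system call, which costs CPU time for the local system call and soft interrupt handling.
+2. The SYN travels to the server: the SYN leaves the client's NIC; this is one long-distance network transfer.
+3. The server processes the SYN: the kernel receives the packet in a soft interrupt, places the connection in the half-open (SYN) queue, and sends the SYN/ACK response. The cost is mainly CPU time.
+4. The SYN/ACK travels to the client: another long-distance network transfer.
+5. The client processes the SYN/ACK: the client kernel receives and processes the packet, which takes a few microseconds of CPU time, and then sends the ACK. Again the cost is soft interrupt handling.
+6. The ACK travels to the server: another long-distance network transfer.
+7. The server receives the ACK: the server kernel receives and processes the ACK, removes the connection from the half-open queue, and places it in the accept queue. This costs one soft interrupt's worth of CPU time.
+8. The server's user process is woken up: the process blocked in the accept system call is woken and takes the established connection from the accept queue. This costs one context switch.
+
+From the client's point of view, establishing a TCP connection normally takes about one network RTT. In some situations, though, network transfer time rises, CPU processing overhead grows, or the connection fails outright. Once an abnormally long latency is observed, it can be analyzed together with other information.
+
+### eBPF implementation
+
+During the TCP three-way handshake, the Linux kernel maintains two queues:
+
+- the half-open connection queue, also called the SYN queue;
+- the fully established connection queue, also called the accept queue.
+
+
+After the server receives the client's SYN, the kernel stores the connection in the half-open queue and replies with SYN+ACK. The client then returns an ACK. When the server receives this third-handshake ACK, the kernel removes the connection from the half-open queue, creates a fully established connection, and adds it to the accept queue, where it waits for a process to take it out by calling accept().
+
+Our eBPF code is in https://github.com/yunwei37/Eunomia/blob/master/bpftools/tcpconnlat/tcpconnlat.bpf.c:
+
+It mainly attaches kprobes such as `kprobe/tcp_v4_connect` and `kprobe/tcp_rcv_state_process`:
+
+```c
+
+SEC("kprobe/tcp_v4_connect")
+int BPF_KPROBE(tcp_v4_connect, struct sock *sk)
+{
+    return trace_connect(sk);
+}
+
+SEC("kprobe/tcp_v6_connect")
+int BPF_KPROBE(tcp_v6_connect, struct sock *sk)
+{
+    return trace_connect(sk);
+}
+
+SEC("kprobe/tcp_rcv_state_process")
+int BPF_KPROBE(tcp_rcv_state_process, struct sock *sk)
+{
+    return handle_tcp_rcv_state_process(ctx, sk);
+}
+```
+
+In trace_connect we track each new TCP connection, record the time at which the connect started, and store it in the map, keyed by the socket pointer:
+
+```c
+struct {
+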
__uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 4096); + __type(key, struct sock *); + __type(value, struct piddata); +} start SEC(".maps"); + +static int trace_connect(struct sock *sk) +{ + u32 tgid = bpf_get_current_pid_tgid() >> 32; + struct piddata piddata = {}; + + if (targ_tgid && targ_tgid != tgid) + return 0; + + bpf_get_current_comm(&piddata.comm, sizeof(piddata.comm)); + piddata.ts = bpf_ktime_get_ns(); + piddata.tgid = tgid; + bpf_map_update_elem(&start, &sk, &piddata, 0); + return 0; +} +``` + +在 handle_tcp_rcv_state_process 中,我们跟踪接收到的 tcp 数据包,从 map 从提取出对应的 connect 事件,并且计算延迟: + +```c +static int handle_tcp_rcv_state_process(void *ctx, struct sock *sk) +{ + struct piddata *piddatap; + struct event event = {}; + s64 delta; + u64 ts; + + if (BPF_CORE_READ(sk, __sk_common.skc_state) != TCP_SYN_SENT) + return 0; + + piddatap = bpf_map_lookup_elem(&start, &sk); + if (!piddatap) + return 0; + + ts = bpf_ktime_get_ns(); + delta = (s64)(ts - piddatap->ts); + if (delta < 0) + goto cleanup; + + event.delta_us = delta / 1000U; + if (targ_min_us && event.delta_us < targ_min_us) + goto cleanup; + __builtin_memcpy(&event.comm, piddatap->comm, + sizeof(event.comm)); + event.ts_us = ts / 1000; + event.tgid = piddatap->tgid; + event.lport = BPF_CORE_READ(sk, __sk_common.skc_num); + event.dport = BPF_CORE_READ(sk, __sk_common.skc_dport); + event.af = BPF_CORE_READ(sk, __sk_common.skc_family); + if (event.af == AF_INET) { + event.saddr_v4 = BPF_CORE_READ(sk, __sk_common.skc_rcv_saddr); + event.daddr_v4 = BPF_CORE_READ(sk, __sk_common.skc_daddr); + } else { + BPF_CORE_READ_INTO(&event.saddr_v6, sk, + __sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32); + BPF_CORE_READ_INTO(&event.daddr_v6, sk, + __sk_common.skc_v6_daddr.in6_u.u6_addr32); + } + bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, + &event, sizeof(event)); + +cleanup: + bpf_map_delete_elem(&start, &sk); + return 0; +} +``` + +### Eunomia 测试 demo + +使用命令行进行追踪: + +```bash +$ sudo build/bin/Release/eunomia 
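run tcpconnlat
+[sudo] password for yunwei:
+[2022-08-07 02:13:39.601] [info] eunomia run in cmd...
+[2022-08-07 02:13:40.534] [info] press 'Ctrl C' key to exit...
+PID     COMM        IP  SRC         DEST        PORT    LAT(ms) CONATINER/OS
+3477    openresty   4   172.19.0.7  172.19.0.5  2379    0.05    docker-apisix_apisix_1
+3483    openresty   4   172.19.0.7  172.19.0.5  2379    0.08    docker-apisix_apisix_1
+3477    openresty   4   172.19.0.7  172.19.0.5  2379    0.04    docker-apisix_apisix_1
+3478    openresty   4   172.19.0.7  172.19.0.5  2379    0.05    docker-apisix_apisix_1
+3478    openresty   4   172.19.0.7  172.19.0.5  2379    0.03    docker-apisix_apisix_1
+3478    openresty   4   172.19.0.7  172.19.0.5  2379    0.03    docker-apisix_apisix_1
+```
+
+eunomia can also act as a prometheus exporter. After running the command above, open the visualization panel that ships with prometheus:
+
+The following query shows the average connection latency over time:
+
+```
+  rate(eunomia_observed_tcpconnlat_v4_histogram_sum[5m])
+/
+  rate(eunomia_observed_tcpconnlat_v4_histogram_count[5m])
+```
+
+Result:
+
+![result](tcpconnlat_p.png)
+
+### Summary
+
+From the experiment above we can see that tcpconnlat works by tracing TCP connection setup inside the kernel and can report the latency of each TCP connection. Besides command-line use, its data can be combined with container and k8s metadata and analyzed with tools such as `prometheus` and `grafana` for network performance analysis.
+
+> `Eunomia` is a lightweight, high-performance cloud-native monitoring tool based on eBPF and written in C/C++. It aims to help users understand container behavior and monitor suspicious container security events, and strives to provide a lightweight open-source monitoring solution covering the whole container lifecycle. It uses `Linux` `eBPF` to trace your system and applications at runtime and analyzes the collected events to detect suspicious behavior patterns. It currently includes performance analysis, visual analysis of container cluster networking, container security alerting, one-click deployment, and persistent storage monitoring, and provides a variety of eBPF tracepoints. Its core exporter/command-line tool needs only a binary of about 4 MB at minimum and can start on any supported Linux kernel.
+
+Project page: https://github.com/yunwei37/Eunomia
+
+### References
+
+1. http://kerneltravel.net/blog/2020/tcpconnlat/
+2.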
https://network.51cto.com/article/640631.html \ No newline at end of file diff --git a/13-tcpconnlat/tcpconnlat1.png b/13-tcpconnlat/tcpconnlat1.png new file mode 100644 index 0000000..4fd5eda Binary files /dev/null and b/13-tcpconnlat/tcpconnlat1.png differ diff --git a/13-tcpconnlat/tcpconnlat_p.png b/13-tcpconnlat/tcpconnlat_p.png new file mode 100644 index 0000000..74caa9a Binary files /dev/null and b/13-tcpconnlat/tcpconnlat_p.png differ diff --git a/14-tcpstates/.gitignore b/14-tcpstates/.gitignore new file mode 100644 index 0000000..c610807 --- /dev/null +++ b/14-tcpstates/.gitignore @@ -0,0 +1,5 @@ +.vscode +package.json +eunomia-exporter +ecli + \ No newline at end of file diff --git a/14-tcpstates/README.md b/14-tcpstates/README.md new file mode 100644 index 0000000..f284c55 --- /dev/null +++ b/14-tcpstates/README.md @@ -0,0 +1,56 @@ +--- +layout: post +title: tcpstates +date: 2022-10-10 16:18 +category: bpftools +author: yunwei37 +tags: [bpftools, syscall, network] +summary: Tcpstates prints TCP state change information, including the duration in each state as milliseconds +--- + + +## origin + +origin from: + +https://github.com/iovisor/bcc/blob/master/libbpf-tools/tcpconnlat.bpf.c + +## Compile and Run + +Compile: + +```shell +docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest +``` +Run: + +```shell +sudo ./ecli run package.json +``` + +## details in bcc + +Demonstrations of tcpstates, the Linux BPF/bcc version. + + +tcpstates prints TCP state change information, including the duration in each +state as milliseconds. 
For example, a single TCP session:
+```console
+# tcpstates
+SKADDR           C-PID C-COMM     LADDR           LPORT RADDR           RPORT OLDSTATE    -> NEWSTATE    MS
+ffff9fd7e8192000 22384 curl       100.66.100.185  0     52.33.159.26    80    CLOSE       -> SYN_SENT    0.000
+ffff9fd7e8192000 0     swapper/5  100.66.100.185  63446 52.33.159.26    80    SYN_SENT    -> ESTABLISHED 1.373
+ffff9fd7e8192000 22384 curl       100.66.100.185  63446 52.33.159.26    80    ESTABLISHED -> FIN_WAIT1   176.042
+ffff9fd7e8192000 0     swapper/5  100.66.100.185  63446 52.33.159.26    80    FIN_WAIT1   -> FIN_WAIT2   0.536
+ffff9fd7e8192000 0     swapper/5  100.66.100.185  63446 52.33.159.26    80    FIN_WAIT2   -> CLOSE       0.006
+^C
+```
+This shows that most of the time was spent in the ESTABLISHED state (which then
+transitioned to FIN_WAIT1): 176.042 milliseconds.
+
+The first column is the socket address, as the output may include lines from
+different sessions interleaved. The next two columns show the current on-CPU
+process ID and command name: these may show the process that owns the TCP
+session, depending on whether the state change executes synchronously in
+process context. If that's not the case, they may show kernel details.
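When reading the MS column programmatically, keep in mind that each row reports the time spent in OLDSTATE before the transition. The sketch below is plain userspace C with hypothetical names (`struct transition`, `longest_state`, `sample`), fed with the values from the sample curl session above. It only illustrates how to interpret the output and is not part of tcpstates itself.

```c
#include <stddef.h>

/* Hypothetical record for one output row:
 * `ms` is the time spent in `oldstate` before leaving it. */
struct transition {
    const char *oldstate;
    const char *newstate;
    double ms;
};

/* Return the index of the row with the largest duration, or -1 if n == 0. */
static int longest_state(const struct transition *t, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (best < 0 || t[i].ms > t[best].ms)
            best = (int)i;
    }
    return best;
}

/* Rows copied from the sample session shown above. */
static const struct transition sample[] = {
    { "CLOSE",       "SYN_SENT",    0.000 },
    { "SYN_SENT",    "ESTABLISHED", 1.373 },
    { "ESTABLISHED", "FIN_WAIT1",   176.042 },
    { "FIN_WAIT1",   "FIN_WAIT2",   0.536 },
    { "FIN_WAIT2",   "CLOSE",       0.006 },
};
```

For the sample rows, the longest entry is the one whose OLDSTATE is ESTABLISHED, which matches the observation in the text.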
+
diff --git a/14-tcpstates/tcpstates.bpf.c b/14-tcpstates/tcpstates.bpf.c
new file mode 100644
index 0000000..b479ca4
--- /dev/null
+++ b/14-tcpstates/tcpstates.bpf.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+/* Copyright (c) 2021 Hengqi Chen */
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include "tcpstates.bpf.h"
+
+#define MAX_ENTRIES 10240
+#define AF_INET 2
+#define AF_INET6 10
+
+const volatile bool filter_by_sport = false;
+const volatile bool filter_by_dport = false;
+const volatile short target_family = 0;
+
+struct
+{
+    __uint(type, BPF_MAP_TYPE_HASH);
+    __uint(max_entries, MAX_ENTRIES);
+    __type(key, __u16);
+    __type(value, __u16);
+} sports SEC(".maps");
+
+struct
+{
+    __uint(type, BPF_MAP_TYPE_HASH);
+    __uint(max_entries, MAX_ENTRIES);
+    __type(key, __u16);
+    __type(value, __u16);
+} dports SEC(".maps");
+
+struct
+{
+    __uint(type, BPF_MAP_TYPE_HASH);
+    __uint(max_entries, MAX_ENTRIES);
+    __type(key, struct sock *);
+    __type(value, __u64);
+} timestamps SEC(".maps");
+
+struct
+{
+    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+    __uint(key_size, sizeof(__u32));
+    __uint(value_size, sizeof(__u32));
+} events SEC(".maps");
+
+SEC("tracepoint/sock/inet_sock_set_state")
+int handle_set_state(struct trace_event_raw_inet_sock_set_state *ctx)
+{
+    struct sock *sk = (struct sock *)ctx->skaddr;
+    __u16 family = ctx->family;
+    __u16 sport = ctx->sport;
+    __u16 dport = ctx->dport;
+    __u64 *tsp, delta_us, ts;
+    struct event event = {};
+
+    if (ctx->protocol != IPPROTO_TCP)
+        return 0;
+
+    if (target_family && target_family != family)
+        return 0;
+
+    if (filter_by_sport && !bpf_map_lookup_elem(&sports, &sport))
+        return 0;
+
+    if (filter_by_dport && !bpf_map_lookup_elem(&dports, &dport))
+        return 0;
+
+    tsp = bpf_map_lookup_elem(&timestamps, &sk);
+    ts = bpf_ktime_get_ns();
+    if (!tsp)
+        delta_us = 0;
+    else
+        delta_us = (ts - *tsp) / 1000;
+
+    event.skaddr = (__u64)sk;
+    event.ts_us = ts / 1000;
+    event.delta_us = delta_us;
+    event.pid =
bpf_get_current_pid_tgid() >> 32;
+    event.oldstate = ctx->oldstate;
+    event.newstate = ctx->newstate;
+    event.family = family;
+    event.sport = sport;
+    event.dport = dport;
+    bpf_get_current_comm(&event.task, sizeof(event.task));
+
+    if (family == AF_INET)
+    {
+        bpf_probe_read_kernel(&event.saddr, sizeof(event.saddr), &sk->__sk_common.skc_rcv_saddr);
+        bpf_probe_read_kernel(&event.daddr, sizeof(event.daddr), &sk->__sk_common.skc_daddr);
+    }
+    else
+    { /* family == AF_INET6 */
+        bpf_probe_read_kernel(&event.saddr, sizeof(event.saddr), &sk->__sk_common.skc_v6_rcv_saddr.in6_u.u6_addr32);
+        bpf_probe_read_kernel(&event.daddr, sizeof(event.daddr), &sk->__sk_common.skc_v6_daddr.in6_u.u6_addr32);
+    }
+
+    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &event, sizeof(event));
+
+    if (ctx->newstate == TCP_CLOSE)
+        bpf_map_delete_elem(&timestamps, &sk);
+    else
+        bpf_map_update_elem(&timestamps, &sk, &ts, BPF_ANY);
+
+    return 0;
+}
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
diff --git a/14-tcpstates/tcpstates.bpf.h b/14-tcpstates/tcpstates.bpf.h
new file mode 100644
index 0000000..9084301
--- /dev/null
+++ b/14-tcpstates/tcpstates.bpf.h
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+/* Copyright (c) 2021 Hengqi Chen */
+#ifndef __TCPSTATES_H
+#define __TCPSTATES_H
+
+#define TASK_COMM_LEN 16
+
+struct event
+{
+    unsigned __int128 saddr;
+    unsigned __int128 daddr;
+    __u64 skaddr;
+    __u64 ts_us;
+    __u64 delta_us;
+    __u32 pid;
+    int oldstate;
+    int newstate;
+    __u16 family;
+    __u16 sport;
+    __u16 dport;
+    char task[TASK_COMM_LEN];
+};
+
+#endif /* __TCPSTATES_H */
diff --git a/15-tcprtt/tcprtt.md b/15-tcprtt/tcprtt.md
new file mode 100644
index 0000000..712164d
--- /dev/null
+++ b/15-tcprtt/tcprtt.md
@@ -0,0 +1,116 @@
+## eBPF Beginner's Tutorial: Writing the eBPF Program tcprtt to Measure TCP Round-Trip Time
+
+### Background
+Network quality matters a great deal for Internet services. Poor network quality has many possible causes, from faulty hardware to poorly
+written software. `tcprtt` was created to help locate such problems: it monitors the round-trip time (RTT) of TCP connections so that users can assess
+network quality and narrow down the source of a problem.
+
+### How it works
+`tcprtt` attaches its handler to `tcp_rcv_established`, the kernel function that processes packets received on established TCP connections.
+```c +SEC("fentry/tcp_rcv_established") +int BPF_PROG(tcp_rcv, struct sock *sk) +{ + const struct inet_sock *inet = (struct inet_sock *)(sk); + struct tcp_sock *ts; + struct hist *histp; + u64 key, slot; + u32 srtt; + + if (targ_sport && targ_sport != inet->inet_sport) + return 0; + if (targ_dport && targ_dport != sk->__sk_common.skc_dport) + return 0; + if (targ_saddr && targ_saddr != inet->inet_saddr) + return 0; + if (targ_daddr && targ_daddr != sk->__sk_common.skc_daddr) + return 0; + + if (targ_laddr_hist) + key = inet->inet_saddr; + else if (targ_raddr_hist) + key = inet->sk.__sk_common.skc_daddr; + else + key = 0; + histp = bpf_map_lookup_or_try_init(&hists, &key, &zero); + if (!histp) + return 0; + ts = (struct tcp_sock *)(sk); + srtt = BPF_CORE_READ(ts, srtt_us) >> 3; + if (targ_ms) + srtt /= 1000U; + slot = log2l(srtt); + if (slot >= MAX_SLOTS) + slot = MAX_SLOTS - 1; + __sync_fetch_and_add(&histp->slots[slot], 1); + if (targ_show_ext) { + __sync_fetch_and_add(&histp->latency, srtt); + __sync_fetch_and_add(&histp->cnt, 1); + } + return 0; +} + +SEC("kprobe/tcp_rcv_established") +int BPF_KPROBE(tcp_rcv_kprobe, struct sock *sk) +{ + const struct inet_sock *inet = (struct inet_sock *)(sk); + u32 srtt, saddr, daddr; + struct tcp_sock *ts; + struct hist *histp; + u64 key, slot; + + if (targ_sport) { + u16 sport; + bpf_probe_read_kernel(&sport, sizeof(sport), &inet->inet_sport); + if (targ_sport != sport) + return 0; + } + if (targ_dport) { + u16 dport; + bpf_probe_read_kernel(&dport, sizeof(dport), &sk->__sk_common.skc_dport); + if (targ_dport != dport) + return 0; + } + bpf_probe_read_kernel(&saddr, sizeof(saddr), &inet->inet_saddr); + if (targ_saddr && targ_saddr != saddr) + return 0; + bpf_probe_read_kernel(&daddr, sizeof(daddr), &sk->__sk_common.skc_daddr); + if (targ_daddr && targ_daddr != daddr) + return 0; + + if (targ_laddr_hist) + key = saddr; + else if (targ_raddr_hist) + key = daddr; + else + key = 0; + histp = bpf_map_lookup_or_try_init(&hists, 
&key, &zero); + if (!histp) + return 0; + ts = (struct tcp_sock *)(sk); + bpf_probe_read_kernel(&srtt, sizeof(srtt), &ts->srtt_us); + srtt >>= 3; + if (targ_ms) + srtt /= 1000U; + slot = log2l(srtt); + if (slot >= MAX_SLOTS) + slot = MAX_SLOTS - 1; + __sync_fetch_and_add(&histp->slots[slot], 1); + if (targ_show_ext) { + __sync_fetch_and_add(&histp->latency, srtt); + __sync_fetch_and_add(&histp->cnt, 1); + } + return 0; +} + +``` +当有tcp链接建立时,该工具会自动根据当前系统的支持情况,选择合适的执行函数。 +在执行函数中,`tcprtt`会收集tcp链接的各项基本底薪,包括地址,源端口,目标端口,耗时 +等等,并将其更新到直方图的map中。运行结束后通过用户态代码,展现给用户。 + +### Eunomia中使用方式 + + +### 总结 + +`tcprtt` 通过直方图的形式,可以轻松展现当前系统中网络抖动的情况,方便开发者快速定位系统网络问题 \ No newline at end of file diff --git a/16-profile/profile.md b/16-profile/profile.md new file mode 100644 index 0000000..0fdaedd --- /dev/null +++ b/16-profile/profile.md @@ -0,0 +1,104 @@ +## eBPF 入门实践教程:编写 eBPF 程序 profile 进行性能分析 + +### 背景 + +`profile` 是一款用户追踪程序执行调用流程的工具,类似于perf中的 -g 指令。但是相较于perf而言, +`profile`的功能更为细化,它可以选择用户需要追踪的层面,比如在用户态层面进行追踪,或是在内核态进行追踪。 + +### 实现原理 + +`profile` 的实现依赖于linux中的perf_event。在注入ebpf程序前,`profile` 工具会先将 perf_event +注册好。 +```c +static int open_and_attach_perf_event(int freq, struct bpf_program *prog, + struct bpf_link *links[]) +{ + struct perf_event_attr attr = { + .type = PERF_TYPE_SOFTWARE, + .freq = env.freq, + .sample_freq = env.sample_freq, + .config = PERF_COUNT_SW_CPU_CLOCK, + }; + int i, fd; + + for (i = 0; i < nr_cpus; i++) { + if (env.cpu != -1 && env.cpu != i) + continue; + + fd = syscall(__NR_perf_event_open, &attr, -1, i, -1, 0); + if (fd < 0) { + /* Ignore CPU that is offline */ + if (errno == ENODEV) + continue; + fprintf(stderr, "failed to init perf sampling: %s\n", + strerror(errno)); + return -1; + } + links[i] = bpf_program__attach_perf_event(prog, fd); + if (!links[i]) { + fprintf(stderr, "failed to attach perf event on cpu: " + "%d\n", i); + links[i] = NULL; + close(fd); + return -1; + } + } + + return 0; +} +``` +其ebpf程序实现逻辑是对程序的堆栈进行定时采样,从而捕获程序的执行流程。 +```c +SEC("perf_event") 
+int do_perf_event(struct bpf_perf_event_data *ctx) +{ + __u64 id = bpf_get_current_pid_tgid(); + __u32 pid = id >> 32; + __u32 tid = id; + __u64 *valp; + static const __u64 zero; + struct key_t key = {}; + + if (!include_idle && tid == 0) + return 0; + + if (targ_pid != -1 && targ_pid != pid) + return 0; + if (targ_tid != -1 && targ_tid != tid) + return 0; + + key.pid = pid; + bpf_get_current_comm(&key.name, sizeof(key.name)); + + if (user_stacks_only) + key.kern_stack_id = -1; + else + key.kern_stack_id = bpf_get_stackid(&ctx->regs, &stackmap, 0); + + if (kernel_stacks_only) + key.user_stack_id = -1; + else + key.user_stack_id = bpf_get_stackid(&ctx->regs, &stackmap, BPF_F_USER_STACK); + + if (key.kern_stack_id >= 0) { + // populate extras to fix the kernel stack + __u64 ip = PT_REGS_IP(&ctx->regs); + + if (is_kernel_addr(ip)) { + key.kernel_ip = ip; + } + } + + valp = bpf_map_lookup_or_try_init(&counts, &key, &zero); + if (valp) + __sync_fetch_and_add(valp, 1); + + return 0; +} +``` +通过这种方式,它可以根据用户指令,简单的决定追踪用户态层面的执行流程或是内核态层面的执行流程。 +### Eunomia中使用方式 + + +### 总结 +`profile` 实现了对程序执行流程的分析,在debug等操作中可以极大的帮助开发者提高效率。 \ No newline at end of file diff --git a/17-memleak/memleak.md b/17-memleak/memleak.md new file mode 100644 index 0000000..4099494 --- /dev/null +++ b/17-memleak/memleak.md @@ -0,0 +1,80 @@ +## eBPF 入门实践教程:编写 eBPF 程序 Memleak 监控内存泄漏 + +### 背景 + +内存泄漏对于一个程序而言是一个很严重的问题。倘若放任一个存在内存泄漏的程序运行,久而久之 +系统的内存会慢慢被耗尽,导致程序运行速度显著下降。为了避免这一情况,`memleak`工具被提出。 +它可以跟踪并匹配内存分配和释放的请求,并且打印出已经被分配资源而又尚未释放的堆栈信息。 + +### 实现原理 + +`memleak` 的实现逻辑非常直观。它在我们常用的动态分配内存的函数接口路径上挂载了ebpf程序, +同时在free上也挂载了ebpf程序。在调用分配内存相关函数时,`memleak` 会记录调用者的pid,分配得到 +内存的地址,分配得到的内存大小等基本数据。在free之后,`memeleak`则会去map中删除记录的对应的分配 +信息。对于用户态常用的分配函数 `malloc`, `calloc` 等,`memleak`使用了 uporbe 技术实现挂载,对于 +内核态的函数,比如 `kmalloc` 等,`memleak` 则使用了现有的 tracepoint 来实现。 +`memleak`主要的挂载点为 +```c +SEC("uprobe/malloc") + +SEC("uretprobe/malloc") + +SEC("uprobe/calloc") + +SEC("uretprobe/calloc") + +SEC("uprobe/realloc") + 
+SEC("uretprobe/realloc") + +SEC("uprobe/memalign") + +SEC("uretprobe/memalign") + +SEC("uprobe/posix_memalign") + +SEC("uretprobe/posix_memalign") + +SEC("uprobe/valloc") + +SEC("uretprobe/valloc") + +SEC("uprobe/pvalloc") + +SEC("uretprobe/pvalloc") + +SEC("uprobe/aligned_alloc") + +SEC("uretprobe/aligned_alloc") + +SEC("uprobe/free") + +SEC("tracepoint/kmem/kmalloc") + +SEC("tracepoint/kmem/kfree") + + +SEC("tracepoint/kmem/kmalloc_node") + +SEC("tracepoint/kmem/kmem_cache_alloc") + +SEC("tracepoint/kmem/kmem_cache_alloc_node") + +SEC("tracepoint/kmem/kmem_cache_free") + +SEC("tracepoint/kmem/mm_page_alloc") + +SEC("tracepoint/kmem/mm_page_free") + +SEC("tracepoint/percpu/percpu_alloc_percpu") + +SEC("tracepoint/percpu/percpu_free_percpu") + +``` + +### Eunomia中使用方式 + + +### 总结 +`memleak` 实现了对内存分配系列函数的监控追踪,可以避免程序发生严重的内存泄漏事故,对于开发者而言 +具有极大的帮助。 diff --git a/18-biopattern/biolatency.md b/18-biopattern/biolatency.md new file mode 100644 index 0000000..423fca2 --- /dev/null +++ b/18-biopattern/biolatency.md @@ -0,0 +1,121 @@ +## eBPF 入门实践教程:编写 eBPF 程序 Biolatency: 统计系统中发生的I/O事件 + +### 背景 + +Biolatency 可以统计在该工具运行后系统中发生的I/O事件个数,并且计算I/O事件在不同时间段内的分布情况,以 +直方图的形式展现给用户。 + +### 实现原理 + +Biolatency 主要通过 tracepoint 实现,其在 block_rq_insert, block_rq_issue, +block_rq_complete 挂载点下设置了处理函数。在 block_rq_insert 和 block_rq_issue 挂载点下, +Biolatency 会将IO操作发生时的request queue和时间计入map中。 +```c +int trace_rq_start(struct request *rq, int issue) +{ + if (issue && targ_queued && BPF_CORE_READ(rq->q, elevator)) + return 0; + + u64 ts = bpf_ktime_get_ns(); + + if (filter_dev) { + struct gendisk *disk = get_disk(rq); + u32 dev; + + dev = disk ? 
MKDEV(BPF_CORE_READ(disk, major), + BPF_CORE_READ(disk, first_minor)) : 0; + if (targ_dev != dev) + return 0; + } + bpf_map_update_elem(&start, &rq, &ts, 0); + return 0; +} + +SEC("tp_btf/block_rq_insert") +int block_rq_insert(u64 *ctx) +{ + if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0)) + return 0; + + if (LINUX_KERNEL_VERSION < KERNEL_VERSION(5, 11, 0)) + return trace_rq_start((void *)ctx[1], false); + else + return trace_rq_start((void *)ctx[0], false); +} + +SEC("tp_btf/block_rq_issue") +int block_rq_issue(u64 *ctx) +{ + if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0)) + return 0; + + if (LINUX_KERNEL_VERSION < KERNEL_VERSION(5, 11, 0)) + return trace_rq_start((void *)ctx[1], true); + else + return trace_rq_start((void *)ctx[0], true); +} + +``` +在block_rq_complete 挂载点下,Biolatency 会根据 request queue 从map中读取 +上一次操作发生的时间,然后计算与当前时间的差值来判断其在直方图中存在的区域,将该区域内的IO操作 +计数加一。 +```c +SEC("tp_btf/block_rq_complete") +int BPF_PROG(block_rq_complete, struct request *rq, int error, + unsigned int nr_bytes) +{ + if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0)) + return 0; + + u64 slot, *tsp, ts = bpf_ktime_get_ns(); + struct hist_key hkey = {}; + struct hist *histp; + s64 delta; + + tsp = bpf_map_lookup_elem(&start, &rq); + if (!tsp) + return 0; + delta = (s64)(ts - *tsp); + if (delta < 0) + goto cleanup; + + if (targ_per_disk) { + struct gendisk *disk = get_disk(rq); + + hkey.dev = disk ? 
MKDEV(BPF_CORE_READ(disk, major), + BPF_CORE_READ(disk, first_minor)) : 0; + } + if (targ_per_flag) + hkey.cmd_flags = rq->cmd_flags; + + histp = bpf_map_lookup_elem(&hists, &hkey); + if (!histp) { + bpf_map_update_elem(&hists, &hkey, &initial_hist, 0); + histp = bpf_map_lookup_elem(&hists, &hkey); + if (!histp) + goto cleanup; + } + + if (targ_ms) + delta /= 1000000U; + else + delta /= 1000U; + slot = log2l(delta); + if (slot >= MAX_SLOTS) + slot = MAX_SLOTS - 1; + __sync_fetch_and_add(&histp->slots[slot], 1); + +cleanup: + bpf_map_delete_elem(&start, &rq); + return 0; +} + +``` +当用户中止程序时,用户态程序会读取直方图map中的数据,并打印呈现。 + +### Eunomia中使用方式 + + +### 总结 +Biolatency 通过 tracepoint 挂载点实现了对IO事件个数的统计,并且能以直方图的 +形式进行展现,可以方便开发者了解系统I/O事件情况。 \ No newline at end of file diff --git a/18-biopattern/biopattern.md b/18-biopattern/biopattern.md new file mode 100644 index 0000000..d06a473 --- /dev/null +++ b/18-biopattern/biopattern.md @@ -0,0 +1,48 @@ +## eBPF 入门实践教程:编写 eBPF 程序 Biopattern: 统计随机/顺序磁盘 I/O + +### 背景 + +Biopattern 可以统计随机/顺序磁盘I/O次数的比例。 + +### 实现原理 + +Biopattern 的ebpf代码在 tracepoint/block/block_rq_complete 挂载点下实现。在磁盘完成IO请求 +后,程序会经过此挂载点。Biopattern 内部存有一张以设备号为主键的哈希表,当程序经过挂载点时, Biopattern +会获得操作信息,根据哈希表中该设备的上一次操作记录来判断本次操作是随机IO还是顺序IO,并更新操作计数。 + +```c +SEC("tracepoint/block/block_rq_complete") +int handle__block_rq_complete(struct trace_event_raw_block_rq_complete *ctx) +{ + sector_t *last_sectorp, sector = ctx->sector; + struct counter *counterp, zero = {}; + u32 nr_sector = ctx->nr_sector; + dev_t dev = ctx->dev; + + if (targ_dev != -1 && targ_dev != dev) + return 0; + + counterp = bpf_map_lookup_or_try_init(&counters, &dev, &zero); + if (!counterp) + return 0; + if (counterp->last_sector) { + if (counterp->last_sector == sector) + __sync_fetch_and_add(&counterp->sequential, 1); + else + __sync_fetch_and_add(&counterp->random, 1); + __sync_fetch_and_add(&counterp->bytes, nr_sector * 512); + } + counterp->last_sector = sector + nr_sector; + return 0; +} + +``` 
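The sequential-versus-random decision made by the handler above can be sketched in plain userspace C. This is a minimal illustration rather than the tool's code: `struct dev_counter` and `account_io` are hypothetical names, and the sketch assumes 512-byte sectors, as the handler does. It mirrors the rule that a request counts as sequential only when it starts at the sector where the previous request on that device ended.

```c
#include <stdint.h>

/* Hypothetical userspace mirror of the per-device counter kept by the BPF program. */
struct dev_counter {
    uint64_t last_sector; /* sector right after the previous request; 0 = none seen yet */
    uint64_t bytes;       /* bytes transferred by classified requests */
    uint32_t sequential;  /* requests that started where the previous one ended */
    uint32_t random;      /* all other classified requests */
};

/* Account one completed request starting at `sector` and spanning `nr_sector` sectors. */
static void account_io(struct dev_counter *c, uint64_t sector, uint32_t nr_sector)
{
    if (c->last_sector) {
        if (c->last_sector == sector)
            c->sequential++;
        else
            c->random++;
        c->bytes += (uint64_t)nr_sector * 512;
    }
    /* Remember where this request ended for the next comparison. */
    c->last_sector = sector + nr_sector;
}
```

As in the handler, the first request seen on a device only seeds `last_sector` and is not counted either way.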
+After the user stops Biopattern, the user-space program reads the collected counters and prints them.
+
+### Usage in Eunomia
+
+Not yet integrated.
+
+### Summary
+
+Biopattern shows the ratio of random to sequential disk I/O, which helps developers understand the overall I/O pattern of a system.
\ No newline at end of file
diff --git a/18-biopattern/biostacks.md b/18-biopattern/biostacks.md
new file mode 100644
index 0000000..3fb08fd
--- /dev/null
+++ b/18-biopattern/biostacks.md
@@ -0,0 +1,100 @@
+## eBPF Beginner's Tutorial: Writing the eBPF Program Biostacks to Monitor Kernel I/O Latency
+
+
+### Background
+Some disk I/O is not issued directly by applications, metadata reads and writes for example, so tools that only capture disk I/O events can
+show I/O operations that are hard to explain. Biostacks therefore traces the kernel functions that initialize I/O operations directly
+and presents disk I/O latency as a histogram.
+
+### How it works
+Biostacks attaches to fentry/blk_account_io_start, kprobe/blk_account_io_merge_bio, and
+fentry/blk_account_io_done. fentry/blk_account_io_start and kprobe/blk_account_io_merge_bio
+are both on the initialization path that the kernel must take when issuing an I/O operation. When execution passes through them, Biostacks stores the request's data
+in a map, keyed by the request pointer (`struct request *`).
MKDEV(BPF_CORE_READ(disk, major), + BPF_CORE_READ(disk, first_minor)) : 0; + if (targ_dev != -1 && targ_dev != dev) + return 0; + + if (merge_bio) + i_rqinfop = bpf_map_lookup_elem(&rqinfos, &rq); + if (!i_rqinfop) + i_rqinfop = &i_rqinfo; + + i_rqinfop->start_ts = bpf_ktime_get_ns(); + i_rqinfop->rqinfo.pid = bpf_get_current_pid_tgid(); + i_rqinfop->rqinfo.kern_stack_size = + bpf_get_stack(ctx, i_rqinfop->rqinfo.kern_stack, + sizeof(i_rqinfop->rqinfo.kern_stack), 0); + bpf_get_current_comm(&i_rqinfop->rqinfo.comm, + sizeof(&i_rqinfop->rqinfo.comm)); + i_rqinfop->rqinfo.dev = dev; + + if (i_rqinfop == &i_rqinfo) + bpf_map_update_elem(&rqinfos, &rq, i_rqinfop, 0); + return 0; +} + +SEC("fentry/blk_account_io_start") +int BPF_PROG(blk_account_io_start, struct request *rq) +{ + return trace_start(ctx, rq, false); +} + +SEC("kprobe/blk_account_io_merge_bio") +int BPF_KPROBE(blk_account_io_merge_bio, struct request *rq) +{ + return trace_start(ctx, rq, true); +} + +``` +在I/O操作完成后,fentry/blk_account_io_done 下的处理函数会从map中读取之前存入的信息,根据当下时间 +记录时间差值,得到I/O操作的耗时信息,并更新到存储直方图数据的map中。 +```c +SEC("fentry/blk_account_io_done") +int BPF_PROG(blk_account_io_done, struct request *rq) +{ + u64 slot, ts = bpf_ktime_get_ns(); + struct internal_rqinfo *i_rqinfop; + struct rqinfo *rqinfop; + struct hist *histp; + s64 delta; + + i_rqinfop = bpf_map_lookup_elem(&rqinfos, &rq); + if (!i_rqinfop) + return 0; + delta = (s64)(ts - i_rqinfop->start_ts); + if (delta < 0) + goto cleanup; + histp = bpf_map_lookup_or_try_init(&hists, &i_rqinfop->rqinfo, &zero); + if (!histp) + goto cleanup; + if (targ_ms) + delta /= 1000000U; + else + delta /= 1000U; + slot = log2l(delta); + if (slot >= MAX_SLOTS) + slot = MAX_SLOTS - 1; + __sync_fetch_and_add(&histp->slots[slot], 1); + +cleanup: + bpf_map_delete_elem(&rqinfos, &rq); + return 0; +} +``` +在用户输入程序退出指令后,其用户态程序会将直方图map中的信息读出并打印。 + +### Eunomia中使用方式 + + +### 总结 +Biostacks 从源头实现了对I/O操作的追踪,可以极大的方便我们掌握磁盘I/O情况。 \ No newline at end of file diff --git 
a/18-biopattern/bitesize.md b/18-biopattern/bitesize.md new file mode 100644 index 0000000..dd4a0be --- /dev/null +++ b/18-biopattern/bitesize.md @@ -0,0 +1,63 @@ +## eBPF 入门实践教程:编写 eBPF 程序 Bitesize: 监控块设备 I/O + +### 背景 + +为了能更好的获得 I/O 操作需要的磁盘块大小相关信息,Bitesize 工具被开发。它可以在启动后追踪 +不同进程所需要的块大小,并以直方图的形式显示分布 + +### 实现原理 + +Biteszie 在 block_rq_issue 追踪点下挂在了处理函数。当进程对磁盘发出了块 I/O 请求操作时, +系统会经过此挂载点,此时处理函数或许请求的信息,将其存入对应的map中。 +```c +static int trace_rq_issue(struct request *rq) +{ + struct hist_key hkey; + struct hist *histp; + u64 slot; + + if (filter_dev) { + struct gendisk *disk = get_disk(rq); + u32 dev; + + dev = disk ? MKDEV(BPF_CORE_READ(disk, major), + BPF_CORE_READ(disk, first_minor)) : 0; + if (targ_dev != dev) + return 0; + } + bpf_get_current_comm(&hkey.comm, sizeof(hkey.comm)); + if (!comm_allowed(hkey.comm)) + return 0; + + histp = bpf_map_lookup_elem(&hists, &hkey); + if (!histp) { + bpf_map_update_elem(&hists, &hkey, &initial_hist, 0); + histp = bpf_map_lookup_elem(&hists, &hkey); + if (!histp) + return 0; + } + slot = log2l(rq->__data_len / 1024); + if (slot >= MAX_SLOTS) + slot = MAX_SLOTS - 1; + __sync_fetch_and_add(&histp->slots[slot], 1); + + return 0; +} + +SEC("tp_btf/block_rq_issue") +int BPF_PROG(block_rq_issue) +{ + if (LINUX_KERNEL_VERSION >= KERNEL_VERSION(5, 11, 0)) + return trace_rq_issue((void *)ctx[0]); + else + return trace_rq_issue((void *)ctx[1]); +} +``` + +当用户发出中止工具的指令后,其用户态代码会将map中存储的数据读出并逐进程的展示追踪结果 + +### Eunomia中使用方式 + + +### 总结 +Bitesize 以进程为粒度,使得开发者可以更好的掌握程序对磁盘 I/O 的请求情况。 \ No newline at end of file diff --git a/19-syscount/syscount.md b/19-syscount/syscount.md new file mode 100644 index 0000000..167db71 --- /dev/null +++ b/19-syscount/syscount.md @@ -0,0 +1,81 @@ +## eBPF 入门实践教程:编写 eBPF 程序 syscount 监控慢系统调用 + +### 背景 + +`syscount` 可以统计系统或者某个进程发生的各类syscall的总数或者时耗时。 + +### 实现原理 +`syscount` 的实现逻辑非常直观,他在 `sys_enter` 和 `sys_exit` 这两个 `tracepoint` 下挂载了 +执行函数。 +```c +SEC("tracepoint/raw_syscalls/sys_enter") +int sys_enter(struct 
trace_event_raw_sys_enter *args) +{ + u64 id = bpf_get_current_pid_tgid(); + pid_t pid = id >> 32; + u32 tid = id; + u64 ts; + + if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0)) + return 0; + + if (filter_pid && pid != filter_pid) + return 0; + + ts = bpf_ktime_get_ns(); + bpf_map_update_elem(&start, &tid, &ts, 0); + return 0; +} + +SEC("tracepoint/raw_syscalls/sys_exit") +int sys_exit(struct trace_event_raw_sys_exit *args) +{ + if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0)) + return 0; + + u64 id = bpf_get_current_pid_tgid(); + static const struct data_t zero; + pid_t pid = id >> 32; + struct data_t *val; + u64 *start_ts, lat = 0; + u32 tid = id; + u32 key; + + /* this happens when there is an interrupt */ + if (args->id == -1) + return 0; + + if (filter_pid && pid != filter_pid) + return 0; + if (filter_failed && args->ret >= 0) + return 0; + if (filter_errno && args->ret != -filter_errno) + return 0; + + if (measure_latency) { + start_ts = bpf_map_lookup_elem(&start, &tid); + if (!start_ts) + return 0; + lat = bpf_ktime_get_ns() - *start_ts; + } + + key = (count_by_process) ? 
pid : args->id;
+    val = bpf_map_lookup_or_try_init(&data, &key, &zero);
+    if (val) {
+        __sync_fetch_and_add(&val->count, 1);
+        if (count_by_process)
+            save_proc_name(val);
+        if (measure_latency)
+            __sync_fetch_and_add(&val->total_ns, lat);
+    }
+    return 0;
+}
+
+```
+When a syscall is entered, `syscount` stores the entry timestamp in a map keyed by the thread ID. When the syscall returns, `syscount` increments
+the call count and, if latency measurement was requested, adds the syscall's duration to the running total.
+### Usage in Eunomia
+
+
+### Summary
+`syscount` lets users conveniently track the system calls made by a single process or by the whole system.
\ No newline at end of file
diff --git a/2-fentry-unlink/.gitignore b/2-fentry-unlink/.gitignore
new file mode 100644
index 0000000..7d5aebf
--- /dev/null
+++ b/2-fentry-unlink/.gitignore
@@ -0,0 +1,6 @@
+.vscode
+package.json
+*.o
+*.skel.json
+*.skel.yaml
+package.yaml
diff --git a/2-fentry-unlink/README.md b/2-fentry-unlink/README.md
new file mode 100644
index 0000000..8ba6d4b
--- /dev/null
+++ b/2-fentry-unlink/README.md
@@ -0,0 +1,76 @@
+---
+layout: post
+title: fentry-link
+date: 2022-10-10 16:18
+category: bpftools
+author: yunwei37
+tags: [bpftools, examples, fentry, no-output]
+summary: an example that uses fentry and fexit BPF programs to trace file deletion
+---
+
+## Fentry
+
+`fentry` is an example that uses fentry and fexit BPF programs for tracing. It
+attaches `fentry` and `fexit` traces to `do_unlinkat()`, which is called when a
+file is deleted, and logs the return value, PID, and filename to the
+trace pipe.
+
+Important differences, compared to kprobes, are improved performance and
+usability. In this example, better usability is shown with the ability to
+directly dereference pointer arguments, like in normal C, instead of using
+various read helpers. The big distinction between **fexit** and **kretprobe**
+programs is that an fexit program has access to both the input arguments and the returned
+result, while a kretprobe can only access the result.
+
+fentry and fexit programs are available starting from 5.5 kernels.
+
+```console
+$ sudo ecli examples/bpftools/fentry-link/package.json
+Runing eBPF program...
+``` + +The `fentry` output in `/sys/kernel/debug/tracing/trace_pipe` should look +something like this: + +```console +$ sudo cat /sys/kernel/debug/tracing/trace_pipe + rm-9290 [004] d..2 4637.798698: bpf_trace_printk: fentry: pid = 9290, filename = test_file + rm-9290 [004] d..2 4637.798843: bpf_trace_printk: fexit: pid = 9290, filename = test_file, ret = 0 + rm-9290 [004] d..2 4637.798698: bpf_trace_printk: fentry: pid = 9290, filename = test_file2 + rm-9290 [004] d..2 4637.798843: bpf_trace_printk: fexit: pid = 9290, filename = test_file2, ret = 0 +``` + +## Run + + + +- Compile: + + ```console + docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest + ``` + + or + + ```console + $ ecc fentry-link.bpf.c + Compiling bpf object... + Packing ebpf object and config into package.json... + ``` + +- Run and help: + + ```console + sudo ecli examples/bpftools/fentry-link/package.json -h + Usage: fentry_link_bpf [--help] [--version] [--verbose] + + A simple eBPF program + + Optional arguments: + -h, --help shows help message and exits + -v, --version prints version information and exits + --verbose prints libbpf debug information + + Built with eunomia-bpf framework. + See https://github.com/eunomia-bpf/eunomia-bpf for more information. 
+ ``` \ No newline at end of file diff --git a/2-fentry-unlink/fentry-link.bpf.c b/2-fentry-unlink/fentry-link.bpf.c new file mode 100644 index 0000000..baf5575 --- /dev/null +++ b/2-fentry-unlink/fentry-link.bpf.c @@ -0,0 +1,27 @@ +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause +/* Copyright (c) 2021 Sartura */ +#include "vmlinux.h" +#include +#include + +char LICENSE[] SEC("license") = "Dual BSD/GPL"; + +SEC("fentry/do_unlinkat") +int BPF_PROG(do_unlinkat, int dfd, struct filename *name) +{ + pid_t pid; + + pid = bpf_get_current_pid_tgid() >> 32; + bpf_printk("fentry: pid = %d, filename = %s\n", pid, name->name); + return 0; +} + +SEC("fexit/do_unlinkat") +int BPF_PROG(do_unlinkat_exit, int dfd, struct filename *name, long ret) +{ + pid_t pid; + + pid = bpf_get_current_pid_tgid() >> 32; + bpf_printk("fexit: pid = %d, filename = %s, ret = %ld\n", pid, name->name, ret); + return 0; +} diff --git a/21-llcstat/llcstat.md b/21-llcstat/llcstat.md new file mode 100644 index 0000000..f9f887b --- /dev/null +++ b/21-llcstat/llcstat.md @@ -0,0 +1,75 @@ +## eBPF Beginner's Practical Tutorial: Writing the eBPF Program llcstat to Monitor Cache Misses and Cache References + +### Background + +To optimize program performance, developers sometimes need to reduce the number of cache misses. +How many cache misses a program actually incurs, however, is hard to answer. `llcstat` uses +eBPF to track cache misses and cache references, which makes it much easier for developers +to debug programs and tune performance. + +### Implementation + +`llcstat` relies on the Linux `perf_event` mechanism. When the program is loaded from user space, +it attaches the BPF programs to the corresponding `perf_event`s. +```c + if (open_and_attach_perf_event(PERF_COUNT_HW_CACHE_MISSES, + env.sample_period, + obj->progs.on_cache_miss, mlinks)) + goto cleanup; + if (open_and_attach_perf_event(PERF_COUNT_HW_CACHE_REFERENCES, + env.sample_period, + obj->progs.on_cache_ref, rlinks)) + goto cleanup; +``` + +In kernel space, `llcstat` attaches handler functions to the `perf_event`s. When an event +fires, the handler runs, updates the count, and writes the result to the corresponding map. + +```c +static __always_inline +int trace_event(__u64 sample_period, bool miss) +{ + struct key_info key = {}; + struct value_info *infop, zero = {}; + + u64 pid_tgid = bpf_get_current_pid_tgid(); + key.cpu = bpf_get_smp_processor_id(); + 
key.pid = pid_tgid >> 32; + if (targ_per_thread) + key.tid = (u32)pid_tgid; + else + key.tid = key.pid; + + infop = bpf_map_lookup_or_try_init(&infos, &key, &zero); + if (!infop) + return 0; + if (miss) + infop->miss += sample_period; + else + infop->ref += sample_period; + bpf_get_current_comm(infop->comm, sizeof(infop->comm)); + + return 0; +} + +SEC("perf_event") +int on_cache_miss(struct bpf_perf_event_data *ctx) +{ + return trace_event(ctx->sample_period, true); +} + +SEC("perf_event") +int on_cache_ref(struct bpf_perf_event_data *ctx) +{ + return trace_event(ctx->sample_period, false); +} +``` + +The user-space program reads the cache miss and cache reference counts stored in the map and +displays them per process. + +### Usage in Eunomia + + +### Summary +`llcstat` uses eBPF to show, efficiently and concisely, how many cache misses and cache +references a thread incurs, giving developers a clearer quantitative metric when optimizing programs. diff --git a/3-kprobe-unlink/.gitignore b/3-kprobe-unlink/.gitignore new file mode 100644 index 0000000..7d5aebf --- /dev/null +++ b/3-kprobe-unlink/.gitignore @@ -0,0 +1,6 @@ +.vscode +package.json +*.o +*.skel.json +*.skel.yaml +package.yaml diff --git a/3-kprobe-unlink/README.md b/3-kprobe-unlink/README.md new file mode 100644 index 0000000..df7a415 --- /dev/null +++ b/3-kprobe-unlink/README.md @@ -0,0 +1,55 @@ +--- +layout: post +title: kprobe-link +date: 2022-10-10 16:18 +category: bpftools +author: yunwei37 +tags: [bpftools, examples, kprobe, no-output] +summary: an example of dealing with kernel-space entry and exit (return) probes, `kprobe` and `kretprobe` in libbpf lingo +--- + + +`kprobe` is an example of dealing with kernel-space entry and exit (return) +probes, `kprobe` and `kretprobe` in libbpf lingo. It attaches `kprobe` and +`kretprobe` BPF programs to the `do_unlinkat()` function and logs the PID, +filename, and return result, respectively, using the `bpf_printk()` macro. + +```console +$ sudo ecli examples/bpftools/kprobe-link/package.json +Runing eBPF program... 
+``` + +The `kprobe` demo output in `/sys/kernel/debug/tracing/trace_pipe` should look +something like this: + +```shell +$ sudo cat /sys/kernel/debug/tracing/trace_pipe + rm-9346 [005] d..3 4710.951696: bpf_trace_printk: KPROBE ENTRY pid = 9346, filename = test1 + rm-9346 [005] d..4 4710.951819: bpf_trace_printk: KPROBE EXIT: ret = 0 + rm-9346 [005] d..3 4710.951852: bpf_trace_printk: KPROBE ENTRY pid = 9346, filename = test2 + rm-9346 [005] d..4 4710.951895: bpf_trace_printk: KPROBE EXIT: ret = 0 +``` + +## Run + + + +Compile with docker: + +```console +docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest +``` + +or compile with `ecc`: + +```console +$ ecc kprobe-link.bpf.c +Compiling bpf object... +Packing ebpf object and config into package.json... +``` + +Run: + +```console +sudo ecli examples/bpftools/kprobe-link/package.json +``` \ No newline at end of file diff --git a/3-kprobe-unlink/kprobe-link.bpf.c b/3-kprobe-unlink/kprobe-link.bpf.c new file mode 100644 index 0000000..e1dc288 --- /dev/null +++ b/3-kprobe-unlink/kprobe-link.bpf.c @@ -0,0 +1,30 @@ +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause +/* Copyright (c) 2021 Sartura */ +#include "vmlinux.h" +#include +#include +#include + +char LICENSE[] SEC("license") = "Dual BSD/GPL"; + +SEC("kprobe/do_unlinkat") +int BPF_KPROBE(do_unlinkat, int dfd, struct filename *name) +{ + pid_t pid; + const char *filename; + + pid = bpf_get_current_pid_tgid() >> 32; + filename = BPF_CORE_READ(name, name); + bpf_printk("KPROBE ENTRY pid = %d, filename = %s\n", pid, filename); + return 0; +} + +SEC("kretprobe/do_unlinkat") +int BPF_KRETPROBE(do_unlinkat_exit, long ret) +{ + pid_t pid; + + pid = bpf_get_current_pid_tgid() >> 32; + bpf_printk("KPROBE EXIT: pid = %d, ret = %ld\n", pid, ret); + return 0; +} \ No newline at end of file diff --git a/4-opensnoop/.gitignore b/4-opensnoop/.gitignore new file mode 100644 index 0000000..b669b39 --- /dev/null +++ b/4-opensnoop/.gitignore @@ -0,0 +1,7 @@ +.vscode +package.json 
+eunomia-exporter +ecli +*.bpf.o +*.skel.json +*.skel.yaml diff --git a/4-opensnoop/1_opensnoop.md b/4-opensnoop/1_opensnoop.md new file mode 100644 index 0000000..fdc82da --- /dev/null +++ b/4-opensnoop/1_opensnoop.md @@ -0,0 +1,263 @@ +## eBPF 入门实践教程:编写 eBPF 程序监控打开文件路径并使用 Prometheus 可视化 + +### 背景 + +通过对 open 系统调用的监测,`opensnoop`可以展现系统内所有调用了 open 系统调用的进程信息。 + +### 使用 ecli 一键运行 + +```console +$ # 下载安装 ecli 二进制 +$ wget https://aka.pw/bpf-ecli -O ./ecli && chmod +x ./ecli +$ # 使用 url 一键运行 +$ ./ecli run https://eunomia-bpf.github.io/eunomia-bpf/opensnoop/package.json + +running and waiting for the ebpf events from perf event... +time ts pid uid ret flags comm fname +00:58:08 0 812 0 9 524288 vmtoolsd /etc/mtab +00:58:08 0 812 0 11 0 vmtoolsd /proc/devices +00:58:08 0 34351 0 24 524288 ecli /etc/localtime +00:58:08 0 812 0 9 0 vmtoolsd /sys/class/block/sda5/../device/../../../class +00:58:08 0 812 0 -2 0 vmtoolsd /sys/class/block/sda5/../device/../../../label +00:58:08 0 812 0 9 0 vmtoolsd /sys/class/block/sda1/../device/../../../class +00:58:08 0 812 0 -2 0 vmtoolsd /sys/class/block/sda1/../device/../../../label +00:58:08 0 812 0 9 0 vmtoolsd /run/systemd/resolve/resolv.conf +00:58:08 0 812 0 9 0 vmtoolsd /proc/net/route +00:58:08 0 812 0 9 0 vmtoolsd /proc/net/ipv6_route +``` + +### 实现 + +使用 eunomia-bpf 可以帮助你只需要编写内核态应用程序,不需要编写任何用户态辅助框架代码;需要编写的代码由两个部分组成: + +- 头文件 opensnoop.h 里面定义需要导出的 C 语言结构体: +- 源文件 opensnoop.bpf.c 里面定义 BPF 代码: + +头文件 opensnoop.h + +```c +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +#ifndef __OPENSNOOP_H +#define __OPENSNOOP_H + +#define TASK_COMM_LEN 16 +#define NAME_MAX 255 +#define INVALID_UID ((uid_t)-1) + +// used for export event +struct event { + /* user terminology for pid: */ + unsigned long long ts; + int pid; + int uid; + int ret; + int flags; + char comm[TASK_COMM_LEN]; + char fname[NAME_MAX]; +}; + +#endif /* __OPENSNOOP_H */ +``` + +`opensnoop` 的实现逻辑比较简单,它在 `sys_enter_open` 和 `sys_enter_openat` 这两个追踪点下 +加了执行函数,当有 open 
系统调用发生时,执行函数便会被触发。同样在,在对应的 `sys_exit_open` 和 +`sys_exit_openat` 系统调用下,`opensnoop` 也加了执行函数。 + +源文件 opensnoop.bpf.c + +```c +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2019 Facebook +// Copyright (c) 2020 Netflix +#include +#include +#include "opensnoop.h" + +struct args_t { + const char *fname; + int flags; +}; + +const volatile pid_t targ_pid = 0; +const volatile pid_t targ_tgid = 0; +const volatile uid_t targ_uid = 0; +const volatile bool targ_failed = false; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10240); + __type(key, u32); + __type(value, struct args_t); +} start SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY); + __uint(key_size, sizeof(u32)); + __uint(value_size, sizeof(u32)); +} events SEC(".maps"); + +static __always_inline bool valid_uid(uid_t uid) { + return uid != INVALID_UID; +} + +static __always_inline +bool trace_allowed(u32 tgid, u32 pid) +{ + u32 uid; + + /* filters */ + if (targ_tgid && targ_tgid != tgid) + return false; + if (targ_pid && targ_pid != pid) + return false; + if (valid_uid(targ_uid)) { + uid = (u32)bpf_get_current_uid_gid(); + if (targ_uid != uid) { + return false; + } + } + return true; +} + +SEC("tracepoint/syscalls/sys_enter_open") +int tracepoint__syscalls__sys_enter_open(struct trace_event_raw_sys_enter* ctx) +{ + u64 id = bpf_get_current_pid_tgid(); + /* use kernel terminology here for tgid/pid: */ + u32 tgid = id >> 32; + u32 pid = id; + + /* store arg info for later lookup */ + if (trace_allowed(tgid, pid)) { + struct args_t args = {}; + args.fname = (const char *)ctx->args[0]; + args.flags = (int)ctx->args[1]; + bpf_map_update_elem(&start, &pid, &args, 0); + } + return 0; +} + +SEC("tracepoint/syscalls/sys_enter_openat") +int tracepoint__syscalls__sys_enter_openat(struct trace_event_raw_sys_enter* ctx) +{ + u64 id = bpf_get_current_pid_tgid(); + /* use kernel terminology here for tgid/pid: */ + u32 tgid = id >> 32; + u32 pid = id; + + /* store arg info for later 
lookup */ + if (trace_allowed(tgid, pid)) { + struct args_t args = {}; + args.fname = (const char *)ctx->args[1]; + args.flags = (int)ctx->args[2]; + bpf_map_update_elem(&start, &pid, &args, 0); + } + return 0; +} + +static __always_inline +int trace_exit(struct trace_event_raw_sys_exit* ctx) +{ + struct event event = {}; + struct args_t *ap; + int ret; + u32 pid = bpf_get_current_pid_tgid(); + + ap = bpf_map_lookup_elem(&start, &pid); + if (!ap) + return 0; /* missed entry */ + ret = ctx->ret; + if (targ_failed && ret >= 0) + goto cleanup; /* want failed only */ + + /* event data */ + event.pid = bpf_get_current_pid_tgid() >> 32; + event.uid = bpf_get_current_uid_gid(); + bpf_get_current_comm(&event.comm, sizeof(event.comm)); + bpf_probe_read_user_str(&event.fname, sizeof(event.fname), ap->fname); + event.flags = ap->flags; + event.ret = ret; + + /* emit event */ + bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, + &event, sizeof(event)); + +cleanup: + bpf_map_delete_elem(&start, &pid); + return 0; +} + +SEC("tracepoint/syscalls/sys_exit_open") +int tracepoint__syscalls__sys_exit_open(struct trace_event_raw_sys_exit* ctx) +{ + return trace_exit(ctx); +} + +SEC("tracepoint/syscalls/sys_exit_openat") +int tracepoint__syscalls__sys_exit_openat(struct trace_event_raw_sys_exit* ctx) +{ + return trace_exit(ctx); +} + +char LICENSE[] SEC("license") = "GPL"; +``` + +在 enter 环节,`opensnoop` 会记录调用者的 pid, comm 等基本信息,并存入 map 中。在 exit 环节,`opensnoop` +会根据 pid 读出之前存入的数据,再结合捕获的其他数据,输出到用户态处理函数中,展现给用户。 + +完整示例代码请参考:https://github.com/eunomia-bpf/eunomia-bpf/tree/master/examples/bpftools/opensnoop + +把头文件和源文件放在独立的目录里面,编译运行: + +```bash +$ # 使用容器进行编译,生成一个 package.json 文件,里面是已经编译好的代码和一些辅助信息 +$ docker run -it -v /path/to/opensnoop:/src yunwei37/ebpm:latest +$ # 运行 eBPF 程序(root shell) +$ sudo ecli run package.json +``` + +### Prometheus 可视化 + +编写 yaml 配置文件: + +```yaml +programs: + - name: opensnoop + metrics: + counters: + - name: eunomia_file_open_counter + description: test + 
labels: + - name: pid + - name: comm + - name: filename + from: fname + compiled_ebpf_filename: package.json +``` + +使用 eunomia-exporter 实现导出信息到 Prometheus: + +- 通过 https://github.com/eunomia-bpf/eunomia-bpf/releases 下载 eunomia-exporter + +```console +$ ls +config.yaml eunomia-exporter package.json +$ sudo ./eunomia-exporter + +Running ebpf program opensnoop takes 46 ms +Listening on http://127.0.0.1:8526 +running and waiting for the ebpf events from perf event... +Receiving request at path /metrics +``` + +![result](../img/opensnoop_prometheus.png) + +### 总结和参考资料 + +`opensnoop` 通过对 open 系统调用的追踪,使得用户可以较为方便地掌握目前系统中调用了 open 系统调用的进程信息。 + +参考资料: + +- 源代码:https://github.com/eunomia-bpf/eunomia-bpf/tree/master/examples/bpftools/opensnoop +- libbpf 参考代码:https://github.com/iovisor/bcc/blob/master/libbpf-tools/opensnoop.bpf.c +- eunomia-bpf 手册:https://eunomia-bpf.github.io/ diff --git a/4-opensnoop/README.md b/4-opensnoop/README.md new file mode 100644 index 0000000..6954b29 --- /dev/null +++ b/4-opensnoop/README.md @@ -0,0 +1,281 @@ +--- +layout: post +title: opensnoop +date: 2022-10-10 16:18 +category: bpftools +author: yunwei37 +tags: [bpftools, syscall] +summary: opensnoop traces the open() syscall system-wide, and prints various details. +--- + +## origin + +The kernel code is origin from: + + + +result: + +```console +$ sudo ecli examples/bpftools/opensnoop/package.json -h +Usage: opensnoop_bpf [--help] [--version] [--verbose] [--pid_target VAR] [--tgid_target VAR] [--uid_target VAR] [--failed] + +Trace open family syscalls. + +Optional arguments: + -h, --help shows help message and exits + -v, --version prints version information and exits + --verbose prints libbpf debug information + --pid_target Process ID to trace + --tgid_target Thread ID to trace + --uid_target User ID to trace + -f, --failed trace only failed events + +Built with eunomia-bpf framework. +See https://github.com/eunomia-bpf/eunomia-bpf for more information. 
+ +$ sudo ecli examples/bpftools/opensnoop/package.json +TIME TS PID UID RET FLAGS COMM FNAME +20:31:50 0 1 0 51 524288 systemd /proc/614/cgroup +20:31:50 0 33182 0 25 524288 ecli /etc/localtime +20:31:53 0 754 0 6 0 irqbalance /proc/interrupts +20:31:53 0 754 0 6 0 irqbalance /proc/stat +20:32:03 0 754 0 6 0 irqbalance /proc/interrupts +20:32:03 0 754 0 6 0 irqbalance /proc/stat +20:32:03 0 632 0 7 524288 vmtoolsd /etc/mtab +20:32:03 0 632 0 9 0 vmtoolsd /proc/devices + +$ sudo ecli examples/bpftools/opensnoop/package.json --pid_target 754 +TIME TS PID UID RET FLAGS COMM FNAME +20:34:13 0 754 0 6 0 irqbalance /proc/interrupts +20:34:13 0 754 0 6 0 irqbalance /proc/stat +20:34:23 0 754 0 6 0 irqbalance /proc/interrupts +20:34:23 0 754 0 6 0 irqbalance /proc/stat +``` + +## Compile and Run + +Compile with docker: + +```shell +docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest +``` + +or compile with `ecc`: + +```console +$ ecc opensnoop.bpf.c opensnoop.h +Compiling bpf object... +Generating export types... +Packing ebpf object and config into package.json... +``` + +Run: + +```shell +sudo ./ecli run examples/bpftools/opensnoop/package.json +``` + +## details in bcc + +Demonstrations of opensnoop, the Linux eBPF/bcc version. + +opensnoop traces the open() syscall system-wide, and prints various details. 
+Example output: + +```console +# ./opensnoop +PID COMM FD ERR PATH +17326 <...> 7 0 /sys/kernel/debug/tracing/trace_pipe +1576 snmpd 9 0 /proc/net/dev +1576 snmpd 11 0 /proc/net/if_inet6 +1576 snmpd 11 0 /proc/sys/net/ipv4/neigh/eth0/retrans_time_ms +1576 snmpd 11 0 /proc/sys/net/ipv6/neigh/eth0/retrans_time_ms +1576 snmpd 11 0 /proc/sys/net/ipv6/conf/eth0/forwarding +1576 snmpd 11 0 /proc/sys/net/ipv6/neigh/eth0/base_reachable_time_ms +1576 snmpd 11 0 /proc/sys/net/ipv4/neigh/lo/retrans_time_ms +1576 snmpd 11 0 /proc/sys/net/ipv6/neigh/lo/retrans_time_ms +1576 snmpd 11 0 /proc/sys/net/ipv6/conf/lo/forwarding +1576 snmpd 11 0 /proc/sys/net/ipv6/neigh/lo/base_reachable_time_ms +1576 snmpd 9 0 /proc/diskstats +1576 snmpd 9 0 /proc/stat +1576 snmpd 9 0 /proc/vmstat +1956 supervise 9 0 supervise/status.new +1956 supervise 9 0 supervise/status.new +17358 run 3 0 /etc/ld.so.cache +17358 run 3 0 /lib/x86_64-linux-gnu/libtinfo.so.5 +17358 run 3 0 /lib/x86_64-linux-gnu/libdl.so.2 +17358 run 3 0 /lib/x86_64-linux-gnu/libc.so.6 +17358 run -1 6 /dev/tty +17358 run 3 0 /proc/meminfo +17358 run 3 0 /etc/nsswitch.conf +17358 run 3 0 /etc/ld.so.cache +17358 run 3 0 /lib/x86_64-linux-gnu/libnss_compat.so.2 +17358 run 3 0 /lib/x86_64-linux-gnu/libnsl.so.1 +17358 run 3 0 /etc/ld.so.cache +17358 run 3 0 /lib/x86_64-linux-gnu/libnss_nis.so.2 +17358 run 3 0 /lib/x86_64-linux-gnu/libnss_files.so.2 +17358 run 3 0 /etc/passwd +17358 run 3 0 ./run +^C +``` +While tracing, the snmpd process opened various /proc files (reading metrics), +and a "run" process read various libraries and config files (looks like it +was starting up: a new process). + +opensnoop can be useful for discovering configuration and log files, if used +during application startup. + +The -p option can be used to filter on a PID, which is filtered in-kernel. 
Here +I've used it with -T to print timestamps: + +```console +# ./opensnoop -Tp 1956 +TIME(s) PID COMM FD ERR PATH +0.000000000 1956 supervise 9 0 supervise/status.new +0.000289999 1956 supervise 9 0 supervise/status.new +1.023068000 1956 supervise 9 0 supervise/status.new +1.023381997 1956 supervise 9 0 supervise/status.new +2.046030000 1956 supervise 9 0 supervise/status.new +2.046363000 1956 supervise 9 0 supervise/status.new +3.068203997 1956 supervise 9 0 supervise/status.new +3.068544999 1956 supervise 9 0 supervise/status.new +``` + +This shows the supervise process is opening the status.new file twice every +second. + +The -U option includes the UID in the output: + +```console +# ./opensnoop -U +UID PID COMM FD ERR PATH +0 27063 vminfo 5 0 /var/run/utmp +103 628 dbus-daemon -1 2 /usr/local/share/dbus-1/system-services +103 628 dbus-daemon 18 0 /usr/share/dbus-1/system-services +103 628 dbus-daemon -1 2 /lib/dbus-1/system-services +``` + +The -u option filters by UID: + +```console +# ./opensnoop -Uu 1000 +UID PID COMM FD ERR PATH +1000 30240 ls 3 0 /etc/ld.so.cache +1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libselinux.so.1 +1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libc.so.6 +1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libpcre.so.3 +1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libdl.so.2 +1000 30240 ls 3 0 /lib/x86_64-linux-gnu/libpthread.so.0 +``` + +The -x option only prints failed opens: + +```console +# ./opensnoop -x +PID COMM FD ERR PATH +18372 run -1 6 /dev/tty +18373 run -1 6 /dev/tty +18373 multilog -1 13 lock +18372 multilog -1 13 lock +18384 df -1 2 /usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo +18384 df -1 2 /usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo +18384 df -1 2 /usr/share/locale/en_US/LC_MESSAGES/coreutils.mo +18384 df -1 2 /usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo +18384 df -1 2 /usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo +18384 df -1 2 /usr/share/locale/en/LC_MESSAGES/coreutils.mo +18385 run -1 6 /dev/tty +18386 run -1 6 /dev/tty +``` + 
+This caught a df command failing to open a coreutils.mo file, and trying from +different directories. + +The ERR column is the system error number. Error number 2 is ENOENT: no such +file or directory. + +A maximum tracing duration can be set with the -d option. For example, to trace +for 2 seconds: + +```console +# ./opensnoop -d 2 +PID COMM FD ERR PATH +2191 indicator-multi 11 0 /sys/block +2191 indicator-multi 11 0 /sys/block +2191 indicator-multi 11 0 /sys/block +2191 indicator-multi 11 0 /sys/block +2191 indicator-multi 11 0 /sys/block + +``` + +The -n option can be used to filter on process name using partial matches: + +```console +# ./opensnoop -n ed + +PID COMM FD ERR PATH +2679 sed 3 0 /etc/ld.so.cache +2679 sed 3 0 /lib/x86_64-linux-gnu/libselinux.so.1 +2679 sed 3 0 /lib/x86_64-linux-gnu/libc.so.6 +2679 sed 3 0 /lib/x86_64-linux-gnu/libpcre.so.3 +2679 sed 3 0 /lib/x86_64-linux-gnu/libdl.so.2 +2679 sed 3 0 /lib/x86_64-linux-gnu/libpthread.so.0 +2679 sed 3 0 /proc/filesystems +2679 sed 3 0 /usr/lib/locale/locale-archive +2679 sed -1 2 +2679 sed 3 0 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache +2679 sed 3 0 /dev/null +2680 sed 3 0 /etc/ld.so.cache +2680 sed 3 0 /lib/x86_64-linux-gnu/libselinux.so.1 +2680 sed 3 0 /lib/x86_64-linux-gnu/libc.so.6 +2680 sed 3 0 /lib/x86_64-linux-gnu/libpcre.so.3 +2680 sed 3 0 /lib/x86_64-linux-gnu/libdl.so.2 +2680 sed 3 0 /lib/x86_64-linux-gnu/libpthread.so.0 +2680 sed 3 0 /proc/filesystems +2680 sed 3 0 /usr/lib/locale/locale-archive +2680 sed -1 2 +^C +``` + +This caught the 'sed' command because it partially matches 'ed' that's passed +to the '-n' option. 
+ +The -e option prints out extra columns; for example, the following output +contains the flags passed to open(2), in octal: + +```console +# ./opensnoop -e +PID COMM FD ERR FLAGS PATH +28512 sshd 10 0 00101101 /proc/self/oom_score_adj +28512 sshd 3 0 02100000 /etc/ld.so.cache +28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libwrap.so.0 +28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libaudit.so.1 +28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libpam.so.0 +28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libselinux.so.1 +28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libsystemd.so.0 +28512 sshd 3 0 02100000 /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.2 +28512 sshd 3 0 02100000 /lib/x86_64-linux-gnu/libutil.so.1 +``` + +The -f option filters based on flags to the open(2) call, for example: + +```console +# ./opensnoop -e -f O_WRONLY -f O_RDWR +PID COMM FD ERR FLAGS PATH +28084 clear_console 3 0 00100002 /dev/tty +28084 clear_console -1 13 00100002 /dev/tty0 +28084 clear_console -1 13 00100001 /dev/tty0 +28084 clear_console -1 13 00100002 /dev/console +28084 clear_console -1 13 00100001 /dev/console +28051 sshd 8 0 02100002 /var/run/utmp +28051 sshd 7 0 00100001 /var/log/wtmp +``` + +The --cgroupmap option filters based on a cgroup set. It is meant to be used +with an externally created map. 
+ +```console +# ./opensnoop --cgroupmap /sys/fs/bpf/test01 +``` + +For more details, see docs/special_filtering.md diff --git a/4-opensnoop/config.yaml b/4-opensnoop/config.yaml new file mode 100644 index 0000000..0a859ba --- /dev/null +++ b/4-opensnoop/config.yaml @@ -0,0 +1,12 @@ +programs: +- name: opensnoop + metrics: + counters: + - name: eunomia_file_open_counter + description: test + labels: + - name: pid + - name: comm + - name: filename + from: fname + compiled_ebpf_filename: package.json diff --git a/4-opensnoop/opensnoop.bpf.c b/4-opensnoop/opensnoop.bpf.c new file mode 100644 index 0000000..597c760 --- /dev/null +++ b/4-opensnoop/opensnoop.bpf.c @@ -0,0 +1,140 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2019 Facebook +// Copyright (c) 2020 Netflix +#include +#include +#include "opensnoop.h" + +struct args_t { + const char *fname; + int flags; +}; + +/// Process ID to trace +const volatile int pid_target = 0; +/// Thread ID to trace +const volatile int tgid_target = 0; +/// @description User ID to trace +const volatile int uid_target = 0; +/// @cmdarg {"default": false, "short": "f", "long": "failed"} +/// @description trace only failed events +const volatile bool targ_failed = false; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10240); + __type(key, u32); + __type(value, struct args_t); +} start SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY); + __uint(key_size, sizeof(u32)); + __uint(value_size, sizeof(u32)); +} events SEC(".maps"); + +static __always_inline bool valid_uid(uid_t uid) { + return uid != INVALID_UID; +} + +static __always_inline +bool trace_allowed(u32 tgid, u32 pid) +{ + u32 uid; + + /* filters */ + if (tgid_target && tgid_target != tgid) + return false; + if (pid_target && pid_target != pid) + return false; + if (valid_uid(uid_target)) { + uid = (u32)bpf_get_current_uid_gid(); + if (uid_target != uid) { + return false; + } + } + return true; +} + 
+SEC("tracepoint/syscalls/sys_enter_open") +int tracepoint__syscalls__sys_enter_open(struct trace_event_raw_sys_enter* ctx) +{ + u64 id = bpf_get_current_pid_tgid(); + /* use kernel terminology here for tgid/pid: */ + u32 tgid = id >> 32; + u32 pid = id; + + /* store arg info for later lookup */ + if (trace_allowed(tgid, pid)) { + struct args_t args = {}; + args.fname = (const char *)ctx->args[0]; + args.flags = (int)ctx->args[1]; + bpf_map_update_elem(&start, &pid, &args, 0); + } + return 0; +} + +SEC("tracepoint/syscalls/sys_enter_openat") +int tracepoint__syscalls__sys_enter_openat(struct trace_event_raw_sys_enter* ctx) +{ + u64 id = bpf_get_current_pid_tgid(); + /* use kernel terminology here for tgid/pid: */ + u32 tgid = id >> 32; + u32 pid = id; + + /* store arg info for later lookup */ + if (trace_allowed(tgid, pid)) { + struct args_t args = {}; + args.fname = (const char *)ctx->args[1]; + args.flags = (int)ctx->args[2]; + bpf_map_update_elem(&start, &pid, &args, 0); + } + return 0; +} + +static __always_inline +int trace_exit(struct trace_event_raw_sys_exit* ctx) +{ + struct event event = {}; + struct args_t *ap; + int ret; + u32 pid = bpf_get_current_pid_tgid(); + + ap = bpf_map_lookup_elem(&start, &pid); + if (!ap) + return 0; /* missed entry */ + ret = ctx->ret; + if (targ_failed && ret >= 0) + goto cleanup; /* want failed only */ + + /* event data */ + event.pid = bpf_get_current_pid_tgid() >> 32; + event.uid = bpf_get_current_uid_gid(); + bpf_get_current_comm(&event.comm, sizeof(event.comm)); + bpf_probe_read_user_str(&event.fname, sizeof(event.fname), ap->fname); + event.flags = ap->flags; + event.ret = ret; + + /* emit event */ + bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, + &event, sizeof(event)); + +cleanup: + bpf_map_delete_elem(&start, &pid); + return 0; +} + +SEC("tracepoint/syscalls/sys_exit_open") +int tracepoint__syscalls__sys_exit_open(struct trace_event_raw_sys_exit* ctx) +{ + return trace_exit(ctx); +} + 
+SEC("tracepoint/syscalls/sys_exit_openat") +int tracepoint__syscalls__sys_exit_openat(struct trace_event_raw_sys_exit* ctx) +{ + return trace_exit(ctx); +} + +/// Trace open family syscalls. +char LICENSE[] SEC("license") = "GPL"; diff --git a/4-opensnoop/opensnoop.h b/4-opensnoop/opensnoop.h new file mode 100644 index 0000000..a5aa43f --- /dev/null +++ b/4-opensnoop/opensnoop.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +#ifndef __OPENSNOOP_H +#define __OPENSNOOP_H + +#define TASK_COMM_LEN 16 +#define NAME_MAX 255 +#define INVALID_UID ((uid_t)-1) + +// used for export event +struct event { + /* user terminology for pid: */ + unsigned long long ts; + int pid; + int uid; + int ret; + int flags; + char comm[TASK_COMM_LEN]; + char fname[NAME_MAX]; +}; + +#endif /* __OPENSNOOP_H */ \ No newline at end of file diff --git a/5-uprobe-bashreadline/.gitignore b/5-uprobe-bashreadline/.gitignore new file mode 100644 index 0000000..a857114 --- /dev/null +++ b/5-uprobe-bashreadline/.gitignore @@ -0,0 +1,7 @@ +.vscode +package.json +ecli +*.o +*.skel.json +*.skel.yaml +package.yaml \ No newline at end of file diff --git a/5-uprobe-bashreadline/README.md b/5-uprobe-bashreadline/README.md new file mode 100644 index 0000000..f81050d --- /dev/null +++ b/5-uprobe-bashreadline/README.md @@ -0,0 +1,79 @@ +--- +layout: post +title: bootstrap +date: 2022-10-10 16:18 +category: bpftools +author: yunwei37 +tags: [bpftools, examples, uprobe, perf event] +summary: an example of a simple (but realistic) BPF application prints bash commands from all running bash shells on the system. +--- + + + +This prints bash commands from all running bash shells on the system. 
+ +## System requirements: + +- Linux kernel > 5.5 +- Eunomia's [ecli](https://github.com/eunomia-bpf/eunomia-bpf/tree/master/ecli) installed + + +## Run + +- Compile: + + ```shell + docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest + ``` + + or + + ```shell + ecc bashreadline.bpf.c bashreadline.h + ``` + +- Run: + + ```console + $ sudo ./ecli run eunomia-bpf/examples/bpftools/bootstrap/package.json + TIME PID STR + 11:17:34 28796 whoami + 11:17:41 28796 ps -ef + 11:17:51 28796 echo "Hello eBPF!" + ``` + +## details in bcc + + +``` +Demonstrations of bashreadline, the Linux eBPF/bcc version. + +This prints bash commands from all running bash shells on the system. For +example: + +# ./bashreadline +TIME PID COMMAND +05:28:25 21176 ls -l +05:28:28 21176 date +05:28:35 21176 echo hello world +05:28:43 21176 foo this command failed +05:28:45 21176 df -h +05:29:04 3059 echo another shell +05:29:13 21176 echo first shell again + +When running the script on Arch Linux, you may need to specify the location +of libreadline.so library: + +# ./bashreadline -s /lib/libreadline.so +TIME PID COMMAND +11:17:34 28796 whoami +11:17:41 28796 ps -ef +11:17:51 28796 echo "Hello eBPF!" + + +The entered command may fail. This is just showing what command lines were +entered interactively for bash to process. + +It works by tracing the return of the readline() function using uprobes +(specifically a uretprobe). 
+``` \ No newline at end of file diff --git a/5-uprobe-bashreadline/bashreadline.bpf.c b/5-uprobe-bashreadline/bashreadline.bpf.c new file mode 100644 index 0000000..0464cd4 --- /dev/null +++ b/5-uprobe-bashreadline/bashreadline.bpf.c @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (c) 2021 Facebook */ +#include +#include +#include +#include "bashreadline.h" + +#define TASK_COMM_LEN 16 + +struct { + __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u32)); +} events SEC(".maps"); + +/* Format of u[ret]probe section definition supporting auto-attach: + * u[ret]probe/binary:function[+offset] + * + * binary can be an absolute/relative path or a filename; the latter is resolved to a + * full binary path via bpf_program__attach_uprobe_opts. + * + * Specifying uprobe+ ensures we carry out strict matching; either "uprobe" must be + * specified (and auto-attach is not possible) or the above format is specified for + * auto-attach. 
+ */ +SEC("uretprobe//bin/bash:readline") /* readline()'s return value is only valid at function return */ +int BPF_KRETPROBE(printret, const void *ret) { + struct str_t data = {}; /* zero-initialize so no stale stack bytes are exported */ + char comm[TASK_COMM_LEN]; + u32 pid; + + if (!ret) + return 0; + + bpf_get_current_comm(&comm, sizeof(comm)); + if (comm[0] != 'b' || comm[1] != 'a' || comm[2] != 's' || comm[3] != 'h' || comm[4] != 0 ) + return 0; + + pid = bpf_get_current_pid_tgid() >> 32; + data.pid = pid; + bpf_probe_read_user_str(&data.str, sizeof(data.str), ret); + + bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &data, sizeof(data)); + + return 0; +} + +char LICENSE[] SEC("license") = "GPL"; \ No newline at end of file diff --git a/5-uprobe-bashreadline/bashreadline.h b/5-uprobe-bashreadline/bashreadline.h new file mode 100644 index 0000000..9348347 --- /dev/null +++ b/5-uprobe-bashreadline/bashreadline.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +/* Copyright (c) 2021 Facebook */ +#ifndef __BASHREADLINE_H +#define __BASHREADLINE_H + +#define MAX_LINE_SIZE 80 + +struct str_t { + __u32 pid; + char str[MAX_LINE_SIZE]; +}; + +#endif /* __BASHREADLINE_H */ \ No newline at end of file diff --git a/6-sigsnoop/.gitignore b/6-sigsnoop/.gitignore new file mode 100755 index 0000000..bbee7c8 --- /dev/null +++ b/6-sigsnoop/.gitignore @@ -0,0 +1,10 @@ +.vscode +package.json +*.wasm +ewasm-skel.h +ecli +ewasm +*.o +*.skel.json +*.skel.yaml +package.yaml diff --git a/6-sigsnoop/README.md b/6-sigsnoop/README.md new file mode 100755 index 0000000..185aab0 --- /dev/null +++ b/6-sigsnoop/README.md @@ -0,0 +1,155 @@ +--- +layout: post +title: sigsnoop +date: 2022-10-10 16:18 +category: bpftools +author: yunwei37 +tags: [bpftools, syscall, kprobe, tracepoint] +summary: Trace signals generated system wide, from syscalls and others. 
+--- + + +## origin + +origin from: + +https://github.com/iovisor/bcc/blob/master/libbpf-tools/sigsnoop.bpf.c + +## Compile and Run + +Compile: + +```shell +docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest +``` + +Or compile with `ecc`: + +```console +$ ecc sigsnoop.bpf.c sigsnoop.h +Compiling bpf object... +Generating export types... +Packing ebpf object and config into package.json... +``` + +Run: + +```console +$ sudo ./ecli examples/bpftools/sigsnoop/package.json +TIME PID TPID SIG RET COMM +20:43:44 21276 3054 0 0 cpptools-srv +20:43:44 22407 3054 0 0 cpptools-srv +20:43:44 20222 3054 0 0 cpptools-srv +20:43:44 8933 3054 0 0 cpptools-srv +20:43:44 2915 2803 0 0 node +20:43:44 2943 2803 0 0 node +20:43:44 31453 3054 0 0 cpptools-srv +$ sudo ./ecli examples/bpftools/sigsnoop/package.json -h +Usage: sigsnoop_bpf [--help] [--version] [--verbose] [--filtered_pid VAR] [--target_signal VAR] [--failed_only] + +A simple eBPF program + +Optional arguments: + -h, --help shows help message and exits + -v, --version prints version information and exits + --verbose prints libbpf debug information + --filtered_pid set value of pid_t variable filtered_pid + --target_signal set value of int variable target_signal + --failed_only set value of bool variable failed_only + +Built with eunomia-bpf framework. +See https://github.com/eunomia-bpf/eunomia-bpf for more information. +``` + +## WASM example + +Generate WASM skel: + +```shell +docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest gen-wasm-skel +``` + +> The skel is generated and commit, so you don't need to generate it again. +> skel includes: +> +> - eunomia-include: include headers for WASM +> - app.c: the WASM app. all library is header only. + +Build WASM module + +```shell +docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest build-wasm +``` + +Run: + +```console +$ sudo ./ecli run app.wasm -h +Usage: sigsnoop [-h] [-x] [-k] [-n] [-p PID] [-s SIGNAL] +Trace standard and real-time signals. 
+ + + -h, --help show this help message and exit + -x, --failed failed signals only + -k, --killed kill only + -p, --pid= target pid + -s, --signal= target signal + +$ sudo ./ecli run app.wasm +running and waiting for the ebpf events from perf event... +{"pid":185539,"tpid":185538,"sig":17,"ret":0,"comm":"cat","sig_name":"SIGCHLD"} +{"pid":185540,"tpid":185538,"sig":17,"ret":0,"comm":"grep","sig_name":"SIGCHLD"} + +$ sudo ./ecli run app.wasm -p 1641 +running and waiting for the ebpf events from perf event... +{"pid":1641,"tpid":2368,"sig":23,"ret":0,"comm":"YDLive","sig_name":"SIGURG"} +{"pid":1641,"tpid":2368,"sig":23,"ret":0,"comm":"YDLive","sig_name":"SIGURG"} +``` + +## details in bcc + +Demonstrations of sigsnoop. + + +This traces signals generated system-wide. For example: +```console +# ./sigsnoop -n +TIME PID COMM SIG TPID RESULT +19:56:14 3204808 a.out SIGSEGV 3204808 0 +19:56:14 3204808 a.out SIGPIPE 3204808 0 +19:56:14 3204808 a.out SIGCHLD 3204722 0 +``` +The first line shows that a.out (a test program) delivered a SIGSEGV signal. +The result, 0, means success. + +The second and third lines show that a.out also delivered SIGPIPE and SIGCHLD +signals in succession. + +USAGE message: +```console +# ./sigsnoop -h +Usage: sigsnoop [OPTION...] +Trace standard and real-time signals. + +USAGE: sigsnoop [-h] [-x] [-k] [-n] [-p PID] [-s SIGNAL] + +EXAMPLES: + sigsnoop # trace signals system-wide + sigsnoop -k # trace signals issued by kill syscall only + sigsnoop -x # trace failed signals only + sigsnoop -p 1216 # only trace PID 1216 + sigsnoop -s 9 # only trace signal 9 + + -k, --kill Trace signals issued by kill syscall only. + -n, --name Output signal name instead of signal number. + -p, --pid=PID Process ID to trace + -s, --signal=SIGNAL Signal to trace. + -x, --failed Trace failed signals only.
+ -?, --help Give this help list + --usage Give a short usage message + -V, --version Print program version +``` +Mandatory or optional arguments to long options are also mandatory or optional +for any corresponding short options. + +Report bugs to https://github.com/iovisor/bcc/tree/master/libbpf-tools. \ No newline at end of file diff --git a/6-sigsnoop/app.c b/6-sigsnoop/app.c new file mode 100755 index 0000000..adb0ba2 --- /dev/null +++ b/6-sigsnoop/app.c @@ -0,0 +1,245 @@ +#include +#include +#include +#include +#include +#include +#include "eunomia-include/wasm-app.h" +#include "eunomia-include/entry.h" +#include "eunomia-include/argp.h" +#include "sigsnoop.bpf.h" +#include "ewasm-skel.h" +#include "eunomia-include/sigsnoop.skel.h" +#define PERF_BUFFER_PAGES 16 +#define PERF_POLL_TIMEOUT_MS 100 +#define warn(...) printf(__VA_ARGS__) +#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) + +static volatile int exiting = 0; + +static int target_pid = 0; +static int target_signal = 0; +static bool failed_only = false; +static bool kill_only = false; +static bool signal_name = false; +static bool verbose = false; + +static const char *sig_name[] = { + [0] = "N/A", + [1] = "SIGHUP", + [2] = "SIGINT", + [3] = "SIGQUIT", + [4] = "SIGILL", + [5] = "SIGTRAP", + [6] = "SIGABRT", + [7] = "SIGBUS", + [8] = "SIGFPE", + [9] = "SIGKILL", + [10] = "SIGUSR1", + [11] = "SIGSEGV", + [12] = "SIGUSR2", + [13] = "SIGPIPE", + [14] = "SIGALRM", + [15] = "SIGTERM", + [16] = "SIGSTKFLT", + [17] = "SIGCHLD", + [18] = "SIGCONT", + [19] = "SIGSTOP", + [20] = "SIGTSTP", + [21] = "SIGTTIN", + [22] = "SIGTTOU", + [23] = "SIGURG", + [24] = "SIGXCPU", + [25] = "SIGXFSZ", + [26] = "SIGVTALRM", + [27] = "SIGPROF", + [28] = "SIGWINCH", + [29] = "SIGIO", + [30] = "SIGPWR", + [31] = "SIGSYS", +}; + +const char *argp_program_version = "sigsnoop 0.1"; +const char *argp_program_bug_address = + "https://github.com/iovisor/bcc/tree/master/libbpf-tools"; +const char argp_program_doc[] = +"Trace 
standard and real-time signals.\n" +"\n" +"USAGE: sigsnoop [-h] [-x] [-k] [-n] [-p PID] [-s SIGNAL]\n" +"\n" +"EXAMPLES:\n" +" sigsnoop # trace signals system-wide\n" +" sigsnoop -k # trace signals issued by kill syscall only\n" +" sigsnoop -x # trace failed signals only\n" +" sigsnoop -p 1216 # only trace PID 1216\n" +" sigsnoop -s 9 # only trace signal 9\n"; + +static const struct argp_option opts[] = { + { "failed", 'x', NULL, 0, "Trace failed signals only." }, + { "kill", 'k', NULL, 0, "Trace signals issued by kill syscall only." }, + { "pid", 'p', "PID", 0, "Process ID to trace" }, + { "signal", 's', "SIGNAL", 0, "Signal to trace." }, + { "name", 'n', NULL, 0, "Output signal name instead of signal number." }, + { "verbose", 'v', NULL, 0, "Verbose debug output" }, + { NULL, 'h', NULL, OPTION_HIDDEN, "Show the full help" }, + {}, +}; + +static error_t parse_arg(int key, char *arg, struct argp_state *state) +{ + long pid, sig; + + switch (key) { + case 'p': + errno = 0; + pid = strtol(arg, NULL, 10); + if (errno || pid <= 0) { + warn("Invalid PID: %s\n", arg); + argp_usage(state); + } + target_pid = pid; + break; + case 's': + errno = 0; + sig = strtol(arg, NULL, 10); + if (errno || sig <= 0) { + warn("Invalid SIGNAL: %s\n", arg); + argp_usage(state); + } + target_signal = sig; + break; + case 'n': + signal_name = true; + break; + case 'x': + failed_only = true; + break; + case 'k': + kill_only = true; + break; + case 'v': + verbose = true; + break; + case 'h': + argp_state_help(state, ARGP_HELP_STD_HELP); + break; + default: + return ARGP_ERR_UNKNOWN; + } + return 0; +} + +static int libbpf_print_fn(const char *format, va_list args) +{ + if (!verbose) + return 0; + return vprintf(format, args); +} + +static void alias_parse(char *prog) +{ + char *name = prog; + + if (!strcmp(name, "killsnoop")) { + kill_only = true; + } +} + +static void sig_int(int signo) +{ + exiting = 1; +} + +static void handle_event(void *ctx, int cpu, void *data, unsigned int data_sz) +{ +
struct event *e = data; + char ts[32] = "12:47:32"; + + if (signal_name && e->sig < ARRAY_SIZE(sig_name)) + printf("%-8s %-7d %-16s %-9s %-7d %-6d\n", + ts, e->pid, e->comm, sig_name[e->sig], e->tpid, e->ret); + else + printf("%-8s %-7d %-16s %-9d %-7d %-6d\n", + ts, e->pid, e->comm, e->sig, e->tpid, e->ret); +} + +static void handle_lost_events(void *ctx, int cpu, unsigned long long lost_cnt) +{ + warn("lost %llu events on CPU #%d\n", lost_cnt, cpu); +} + +int main(int argc, char **argv) +{ + static const struct argp argp = { + .options = opts, + .parser = parse_arg, + .doc = argp_program_doc, + }; + struct perf_buffer *pb = NULL; + struct sigsnoop_bpf *obj; + int err; + + alias_parse(argv[0]); + err = argp_parse(&argp, argc, argv, 0, NULL, NULL); + if (err) + return err; + + obj = sigsnoop_bpf__open(); + if (!obj) { + warn("failed to open BPF object\n"); + return 1; + } + + obj->rodata->filtered_pid = target_pid; + obj->rodata->target_signal = target_signal; + obj->rodata->failed_only = failed_only; + + if (kill_only) { + bpf_program__set_autoload(obj->progs.sig_trace, false); + } else { + bpf_program__set_autoload(obj->progs.kill_entry, false); + bpf_program__set_autoload(obj->progs.kill_exit, false); + bpf_program__set_autoload(obj->progs.tkill_entry, false); + bpf_program__set_autoload(obj->progs.tkill_exit, false); + bpf_program__set_autoload(obj->progs.tgkill_entry, false); + bpf_program__set_autoload(obj->progs.tgkill_exit, false); + } + + err = sigsnoop_bpf__load(obj); + if (err) { + warn("failed to load BPF object: %d\n", err); + goto cleanup; + } + + err = sigsnoop_bpf__attach(obj); + if (err) { + warn("failed to attach BPF programs: %d\n", err); + goto cleanup; + } + + pb = perf_buffer__new(bpf_map__fd(obj->maps.events), PERF_BUFFER_PAGES, + handle_event, handle_lost_events, NULL, NULL); + if (!pb) { + err = -errno; + warn("failed to open perf buffer: %d\n", err); + goto cleanup; + } + + printf("%-8s %-7s %-16s %-9s %-7s %-6s\n", + "TIME", "PID", "COMM", "SIG", "TPID",
"RESULT"); + + while (!exiting) { + err = perf_buffer__poll(pb, PERF_POLL_TIMEOUT_MS); + if (err < 0 && err != -EINTR) { + warn("error polling perf buffer: %s\n", strerror(-err)); + goto cleanup; + } + /* reset err to return 0 if exiting */ + err = 0; + } + +cleanup: + perf_buffer__free(pb); + sigsnoop_bpf__destroy(obj); + + return err != 0; +} diff --git a/6-sigsnoop/eunomia-include/argp-namefrob.h b/6-sigsnoop/eunomia-include/argp-namefrob.h new file mode 100644 index 0000000..0ce1148 --- /dev/null +++ b/6-sigsnoop/eunomia-include/argp-namefrob.h @@ -0,0 +1,96 @@ +/* Name frobnication for compiling argp outside of glibc + Copyright (C) 1997 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Written by Miles Bader . + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Library General Public License as + published by the Free Software Foundation; either version 2 of the + License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Library General Public License for more details. + + You should have received a copy of the GNU Library General Public + License along with the GNU C Library; see the file COPYING.LIB. If not, + write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, + Boston, MA 02111-1307, USA. */ + +#if !_LIBC +/* This code is written for inclusion in gnu-libc, and uses names in the + namespace reserved for libc. If we're not compiling in libc, define those + names to be the normal ones instead. 
*/ + +/* argp-parse functions */ +#undef __argp_parse +#define __argp_parse argp_parse +#undef __option_is_end +#define __option_is_end _option_is_end +#undef __option_is_short +#define __option_is_short _option_is_short +#undef __argp_input +#define __argp_input _argp_input + +/* argp-help functions */ +#undef __argp_help +#define __argp_help argp_help +#undef __argp_error +#define __argp_error argp_error +#undef __argp_failure +#define __argp_failure argp_failure +#undef __argp_state_help +#define __argp_state_help argp_state_help +#undef __argp_usage +#define __argp_usage argp_usage +#undef __argp_basename +#define __argp_basename _argp_basename +#undef __argp_short_program_name +#define __argp_short_program_name _argp_short_program_name + +/* argp-fmtstream functions */ +#undef __argp_make_fmtstream +#define __argp_make_fmtstream argp_make_fmtstream +#undef __argp_fmtstream_free +#define __argp_fmtstream_free argp_fmtstream_free +#undef __argp_fmtstream_putc +#define __argp_fmtstream_putc argp_fmtstream_putc +#undef __argp_fmtstream_puts +#define __argp_fmtstream_puts argp_fmtstream_puts +#undef __argp_fmtstream_write +#define __argp_fmtstream_write argp_fmtstream_write +#undef __argp_fmtstream_printf +#define __argp_fmtstream_printf argp_fmtstream_printf +#undef __argp_fmtstream_set_lmargin +#define __argp_fmtstream_set_lmargin argp_fmtstream_set_lmargin +#undef __argp_fmtstream_set_rmargin +#define __argp_fmtstream_set_rmargin argp_fmtstream_set_rmargin +#undef __argp_fmtstream_set_wmargin +#define __argp_fmtstream_set_wmargin argp_fmtstream_set_wmargin +#undef __argp_fmtstream_point +#define __argp_fmtstream_point argp_fmtstream_point +#undef __argp_fmtstream_update +#define __argp_fmtstream_update _argp_fmtstream_update +#undef __argp_fmtstream_ensure +#define __argp_fmtstream_ensure _argp_fmtstream_ensure +#undef __argp_fmtstream_lmargin +#define __argp_fmtstream_lmargin argp_fmtstream_lmargin +#undef __argp_fmtstream_rmargin +#define 
__argp_fmtstream_rmargin argp_fmtstream_rmargin +#undef __argp_fmtstream_wmargin +#define __argp_fmtstream_wmargin argp_fmtstream_wmargin + +/* normal libc functions we call */ +#undef __sleep +#define __sleep sleep +#undef __strcasecmp +#define __strcasecmp strcasecmp +#undef __vsnprintf +#define __vsnprintf vsnprintf + +#endif /* !_LIBC */ + +#ifndef __set_errno +#define __set_errno(e) (errno = (e)) +#endif diff --git a/6-sigsnoop/eunomia-include/argp.h b/6-sigsnoop/eunomia-include/argp.h new file mode 100644 index 0000000..76234a0 --- /dev/null +++ b/6-sigsnoop/eunomia-include/argp.h @@ -0,0 +1,1854 @@ +#ifndef EUNOMIA_ARGP_H +#define EUNOMIA_ARGP_H + +/* Hierarchial argument parsing. + Copyright (C) 1995, 96, 97, 98, 99, 2003 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Written by Miles Bader . + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Library General Public License as + published by the Free Software Foundation; either version 2 of the + License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Library General Public License for more details. + + You should have received a copy of the GNU Library General Public + License along with the GNU C Library; see the file COPYING.LIB. If not, + write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, + Boston, MA 02111-1307, USA. */ + +#ifndef _ARGP_H +#define _ARGP_H + +#include +#include +#include + +#include "errno-base.h" +#include + +#ifndef __THROW +# define __THROW +#endif + +#ifndef __const +# define __const const +#endif + +#ifndef __error_t_defined +typedef int error_t; +# define __error_t_defined +#endif + +/* FIXME: What's the right way to check for __restrict? Sun's cc seems + not to have it. 
Perhaps it's easiest to just delete the use of + __restrict from the prototypes. */ +#ifndef __restrict +# ifndef __GNUC___ +# define __restrict +# endif +#endif + +/* NOTE: We can't use the autoconf tests, since this is supposed to be + an installed header file and argp's config.h is of course not + installed. */ +#ifndef PRINTF_STYLE +# if __GNUC__ >= 2 +# define PRINTF_STYLE(f, a) __attribute__ ((__format__ (__printf__, f, a))) +# else +# define PRINTF_STYLE(f, a) +# endif +#endif + + +#define assert(expr) ((void)(expr)) + +#ifdef __cplusplus +extern "C" { +#endif + +/* A description of a particular option. A pointer to an array of + these is passed in the OPTIONS field of an argp structure. Each option + entry can correspond to one long option and/or one short option; more + names for the same option can be added by following an entry in an option + array with options having the OPTION_ALIAS flag set. */ +struct argp_option +{ + /* The long option name. For more than one name for the same option, you + can use following options with the OPTION_ALIAS flag set. */ + __const char *name; + + /* What key is returned for this option. If > 0 and printable, then it's + also accepted as a short option. */ + int key; + + /* If non-NULL, this is the name of the argument associated with this + option, which is required unless the OPTION_ARG_OPTIONAL flag is set. */ + __const char *arg; + + /* OPTION_ flags. */ + int flags; + + /* The doc string for this option. If both NAME and KEY are 0, This string + will be printed outdented from the normal option column, making it + useful as a group header (it will be the first thing printed in its + group); in this usage, it's conventional to end the string with a `:'. */ + __const char *doc; + + /* The group this option is in. In a long help message, options are sorted + alphabetically within each group, and the groups presented in the order + 0, 1, 2, ..., n, -m, ..., -2, -1. 
Every entry in an options array with + if this field 0 will inherit the group number of the previous entry, or + zero if it's the first one, unless its a group header (NAME and KEY both + 0), in which case, the previous entry + 1 is the default. Automagic + options such as --help are put into group -1. */ + int group; +}; + +/* The argument associated with this option is optional. */ +#define OPTION_ARG_OPTIONAL 0x1 + +/* This option isn't displayed in any help messages. */ +#define OPTION_HIDDEN 0x2 + +/* This option is an alias for the closest previous non-alias option. This + means that it will be displayed in the same help entry, and will inherit + fields other than NAME and KEY from the aliased option. */ +#define OPTION_ALIAS 0x4 + +/* This option isn't actually an option (and so should be ignored by the + actual option parser), but rather an arbitrary piece of documentation that + should be displayed in much the same manner as the options. If this flag + is set, then the option NAME field is displayed unmodified (e.g., no `--' + prefix is added) at the left-margin (where a *short* option would normally + be displayed), and the documentation string in the normal place. For + purposes of sorting, any leading whitespace and puncuation is ignored, + except that if the first non-whitespace character is not `-', this entry + is displayed after all options (and OPTION_DOC entries with a leading `-') + in the same group. */ +#define OPTION_DOC 0x8 + +/* This option shouldn't be included in `long' usage messages (but is still + included in help messages). This is mainly intended for options that are + completely documented in an argp's ARGS_DOC field, in which case including + the option in the generic usage list would be redundant. For instance, + if ARGS_DOC is "FOO BAR\n-x BLAH", and the `-x' option's purpose is to + distinguish these two cases, -x should probably be marked + OPTION_NO_USAGE. 
*/ +#define OPTION_NO_USAGE 0x10 + +struct argp; /* fwd declare this type */ +struct argp_state; /* " */ +struct argp_child; /* " */ + +/* The type of a pointer to an argp parsing function. */ +typedef error_t (*argp_parser_t) (int key, char *arg, + struct argp_state *state); + +/* What to return for unrecognized keys. For special ARGP_KEY_ keys, such + returns will simply be ignored. For user keys, this error will be turned + into EINVAL (if the call to argp_parse is such that errors are propagated + back to the user instead of exiting); returning EINVAL itself would result + in an immediate stop to parsing in *all* cases. */ +#define ARGP_ERR_UNKNOWN E2BIG /* Hurd should never need E2BIG. XXX */ + +/* Special values for the KEY argument to an argument parsing function. + ARGP_ERR_UNKNOWN should be returned if they aren't understood. + + The sequence of keys to a parsing function is either (where each + uppercased word should be prefixed by `ARGP_KEY_' and opt is a user key): + + INIT opt... NO_ARGS END SUCCESS -- No non-option arguments at all + or INIT (opt | ARG)... END SUCCESS -- All non-option args parsed + or INIT (opt | ARG)... SUCCESS -- Some non-option arg unrecognized + + The third case is where every parser returned ARGP_KEY_UNKNOWN for an + argument, in which case parsing stops at that argument (returning the + unparsed arguments to the caller of argp_parse if requested, or stopping + with an error message if not). + + If an error occurs (either detected by argp, or because the parsing + function returned an error value), then the parser is called with + ARGP_KEY_ERROR, and no further calls are made. */ + +/* This is not an option at all, but rather a command line argument. If a + parser receiving this key returns success, the fact is recorded, and the + ARGP_KEY_NO_ARGS case won't be used. 
HOWEVER, if while processing the + argument, a parser function decrements the NEXT field of the state it's + passed, the option won't be considered processed; this is to allow you to + actually modify the argument (perhaps into an option), and have it + processed again. */ +#define ARGP_KEY_ARG 0 +/* There are remaining arguments not parsed by any parser, which may be found + starting at (STATE->argv + STATE->next). If success is returned, but + STATE->next left untouched, it's assumed that all arguments were consume, + otherwise, the parser should adjust STATE->next to reflect any arguments + consumed. */ +#define ARGP_KEY_ARGS 0x1000006 +/* There are no more command line arguments at all. */ +#define ARGP_KEY_END 0x1000001 +/* Because it's common to want to do some special processing if there aren't + any non-option args, user parsers are called with this key if they didn't + successfully process any non-option arguments. Called just before + ARGP_KEY_END (where more general validity checks on previously parsed + arguments can take place). */ +#define ARGP_KEY_NO_ARGS 0x1000002 +/* Passed in before any parsing is done. Afterwards, the values of each + element of the CHILD_INPUT field, if any, in the state structure is + copied to each child's state to be the initial value of the INPUT field. */ +#define ARGP_KEY_INIT 0x1000003 +/* Use after all other keys, including SUCCESS & END. */ +#define ARGP_KEY_FINI 0x1000007 +/* Passed in when parsing has successfully been completed (even if there are + still arguments remaining). */ +#define ARGP_KEY_SUCCESS 0x1000004 +/* Passed in if an error occurs. */ +#define ARGP_KEY_ERROR 0x1000005 + +/* An argp structure contains a set of options declarations, a function to + deal with parsing one, documentation string, a possible vector of child + argp's, and perhaps a function to filter help output. 
When actually + parsing options, getopt is called with the union of all the argp + structures chained together through their CHILD pointers, with conflicts + being resolved in favor of the first occurrence in the chain. */ +struct argp +{ + /* An array of argp_option structures, terminated by an entry with both + NAME and KEY having a value of 0. */ + __const struct argp_option *options; + + /* What to do with an option from this structure. KEY is the key + associated with the option, and ARG is any associated argument (NULL if + none was supplied). If KEY isn't understood, ARGP_ERR_UNKNOWN should be + returned. If a non-zero, non-ARGP_ERR_UNKNOWN value is returned, then + parsing is stopped immediately, and that value is returned from + argp_parse(). For special (non-user-supplied) values of KEY, see the + ARGP_KEY_ definitions below. */ + argp_parser_t parser; + + /* A string describing what other arguments are wanted by this program. It + is only used by argp_usage to print the `Usage:' message. If it + contains newlines, the strings separated by them are considered + alternative usage patterns, and printed on separate lines (lines after + the first are prefix by ` or: ' instead of `Usage:'). */ + __const char *args_doc; + + /* If non-NULL, a string containing extra text to be printed before and + after the options in a long help message (separated by a vertical tab + `\v' character). */ + __const char *doc; + + /* A vector of argp_children structures, terminated by a member with a 0 + argp field, pointing to child argps should be parsed with this one. Any + conflicts are resolved in favor of this argp, or early argps in the + CHILDREN list. This field is useful if you use libraries that supply + their own argp structure, which you want to use in conjunction with your + own. */ + __const struct argp_child *children; + + /* If non-zero, this should be a function to filter the output of help + messages. 
KEY is either a key from an option, in which case TEXT is + that option's help text, or a special key from the ARGP_KEY_HELP_ + defines, below, describing which other help text TEXT is. The function + should return either TEXT, if it should be used as-is, a replacement + string, which should be malloced, and will be freed by argp, or NULL, + meaning `print nothing'. The value for TEXT is *after* any translation + has been done, so if any of the replacement text also needs translation, + that should be done by the filter function. INPUT is either the input + supplied to argp_parse, or NULL, if argp_help was called directly. */ + char *(*help_filter) (int __key, __const char *__text, void *__input); + + /* If non-zero the strings used in the argp library are translated using + the domain described by this string. Otherwise the currently installed + default domain is used. */ + const char *argp_domain; +}; + +/* Possible KEY arguments to a help filter function. */ +#define ARGP_KEY_HELP_PRE_DOC 0x2000001 /* Help text preceeding options. */ +#define ARGP_KEY_HELP_POST_DOC 0x2000002 /* Help text following options. */ +#define ARGP_KEY_HELP_HEADER 0x2000003 /* Option header string. */ +#define ARGP_KEY_HELP_EXTRA 0x2000004 /* After all other documentation; + TEXT is NULL for this key. */ +/* Explanatory note emitted when duplicate option arguments have been + suppressed. */ +#define ARGP_KEY_HELP_DUP_ARGS_NOTE 0x2000005 +#define ARGP_KEY_HELP_ARGS_DOC 0x2000006 /* Argument doc string. */ + +/* When an argp has a non-zero CHILDREN field, it should point to a vector of + argp_child structures, each of which describes a subsidiary argp. */ +struct argp_child +{ + /* The child parser. */ + __const struct argp *argp; + + /* Flags for this child. */ + int flags; + + /* If non-zero, an optional header to be printed in help output before the + child options. 
As a side-effect, a non-zero value forces the child + options to be grouped together; to achieve this effect without actually + printing a header string, use a value of "". */ + __const char *header; + + /* Where to group the child options relative to the other (`consolidated') + options in the parent argp; the values are the same as the GROUP field + in argp_option structs, but all child-groupings follow parent options at + a particular group level. If both this field and HEADER are zero, then + they aren't grouped at all, but rather merged with the parent options + (merging the child's grouping levels with the parents). */ + int group; +}; + +/* Parsing state. This is provided to parsing functions called by argp, + which may examine and, as noted, modify fields. */ +struct argp_state +{ + /* The top level ARGP being parsed. */ + __const struct argp *root_argp; + + /* The argument vector being parsed. May be modified. */ + int argc; + char **argv; + + /* The index in ARGV of the next arg that to be parsed. May be modified. */ + int next; + + /* The flags supplied to argp_parse. May be modified. */ + unsigned flags; + + /* While calling a parsing function with a key of ARGP_KEY_ARG, this is the + number of the current arg, starting at zero, and incremented after each + such call returns. At all other times, this is the number of such + arguments that have been processed. */ + unsigned arg_num; + + /* If non-zero, the index in ARGV of the first argument following a special + `--' argument (which prevents anything following being interpreted as an + option). Only set once argument parsing has proceeded past this point. */ + int quoted; + + /* An arbitrary pointer passed in from the user. */ + void *input; + /* Values to pass to child parsers. This vector will be the same length as + the number of children for the current parser. */ + void **child_inputs; + + /* For the parser's use. Initialized to 0. */ + void *hook; + + /* The name used when printing messages. 
This is initialized to ARGV[0], + or PROGRAM_INVOCATION_NAME if that is unavailable. */ + char *name; + + void *pstate; /* Private, for use by argp. */ +}; + +/* Flags for argp_parse (note that the defaults are those that are + convenient for program command line parsing): */ + +/* Don't ignore the first element of ARGV. Normally (and always unless + ARGP_NO_ERRS is set) the first element of the argument vector is + skipped for option parsing purposes, as it corresponds to the program name + in a command line. */ +#define ARGP_PARSE_ARGV0 0x01 + +/* Don't print error messages for unknown options to stderr; unless this flag + is set, ARGP_PARSE_ARGV0 is ignored, as ARGV[0] is used as the program + name in the error messages. This flag implies ARGP_NO_EXIT (on the + assumption that silent exiting upon errors is bad behaviour). */ +#define ARGP_NO_ERRS 0x02 + +/* Don't parse any non-option args. Normally non-option args are parsed by + calling the parse functions with a key of ARGP_KEY_ARG, and the actual arg + as the value. Since it's impossible to know which parse function wants to + handle it, each one is called in turn, until one returns 0 or an error + other than ARGP_ERR_UNKNOWN; if an argument is handled by no one, the + argp_parse returns prematurely (but with a return value of 0). If all + args have been parsed without error, all parsing functions are called one + last time with a key of ARGP_KEY_END. This flag needn't normally be set, + as the normal behavior is to stop parsing as soon as some argument can't + be handled. */ +#define ARGP_NO_ARGS 0x04 + +/* Parse options and arguments in the same order they occur on the command + line -- normally they're rearranged so that all options come first. */ +#define ARGP_IN_ORDER 0x08 + +/* Don't provide the standard long option --help, which causes usage and + option help information to be output to stdout, and exit (0) called. 
*/ +#define ARGP_NO_HELP 0x10 + +/* Don't exit on errors (they may still result in error messages). */ +#define ARGP_NO_EXIT 0x20 + +/* Use the gnu getopt `long-only' rules for parsing arguments. */ +#define ARGP_LONG_ONLY 0x40 + +/* Turns off any message-printing/exiting options. */ +#define ARGP_SILENT (ARGP_NO_EXIT | ARGP_NO_ERRS | ARGP_NO_HELP) + +/* Parse the options strings in ARGC & ARGV according to the options in ARGP. + FLAGS is one of the ARGP_ flags above. If ARG_INDEX is non-NULL, the + index in ARGV of the first unparsed option is returned in it. If an + unknown option is present, ARGP_ERR_UNKNOWN is returned; if some parser + routine returned a non-zero value, it is returned; otherwise 0 is + returned. This function may also call exit unless the ARGP_NO_HELP flag + is set. INPUT is a pointer to a value to be passed in to the parser. */ +extern error_t argp_parse (__const struct argp *__restrict argp, + int argc, char **__restrict argv, + unsigned flags, int *__restrict arg_index, + void *__restrict input) __THROW; +extern error_t __argp_parse (__const struct argp *__restrict argp, + int argc, char **__restrict argv, + unsigned flags, int *__restrict arg_index, + void *__restrict input) __THROW; + +/* Global variables. */ + +/* If defined or set by the user program to a non-zero value, then a default + option --version is added (unless the ARGP_NO_HELP flag is used), which + will print this string followed by a newline and exit (unless the + ARGP_NO_EXIT flag is used). Overridden by ARGP_PROGRAM_VERSION_HOOK. */ +extern __const char *argp_program_version; + +/* If defined or set by the user program to a non-zero value, then a default + option --version is added (unless the ARGP_NO_HELP flag is used), which + calls this function with a stream to print the version to and a pointer to + the current parsing state, and then exits (unless the ARGP_NO_EXIT flag is + used). This variable takes precedent over ARGP_PROGRAM_VERSION. 
*/ +extern void (*argp_program_version_hook) ( + struct argp_state *__restrict + __state); + +/* If defined or set by the user program, it should point to string that is + the bug-reporting address for the program. It will be printed by + argp_help if the ARGP_HELP_BUG_ADDR flag is set (as it is by various + standard help messages), embedded in a sentence that says something like + `Report bugs to ADDR.'. */ +extern __const char *argp_program_bug_address; + +/* The exit status that argp will use when exiting due to a parsing error. + If not defined or set by the user program, this defaults to EX_USAGE from + . */ +extern error_t argp_err_exit_status; + +/* Flags for argp_help. */ +#define ARGP_HELP_USAGE 0x01 /* a Usage: message. */ +#define ARGP_HELP_SHORT_USAGE 0x02 /* " but don't actually print options. */ +#define ARGP_HELP_SEE 0x04 /* a `Try ... for more help' message. */ +#define ARGP_HELP_LONG 0x08 /* a long help message. */ +#define ARGP_HELP_PRE_DOC 0x10 /* doc string preceding long help. */ +#define ARGP_HELP_POST_DOC 0x20 /* doc string following long help. */ +#define ARGP_HELP_DOC (ARGP_HELP_PRE_DOC | ARGP_HELP_POST_DOC) +#define ARGP_HELP_BUG_ADDR 0x40 /* bug report address */ +#define ARGP_HELP_LONG_ONLY 0x80 /* modify output appropriately to + reflect ARGP_LONG_ONLY mode. */ + +/* These ARGP_HELP flags are only understood by argp_state_help. */ +#define ARGP_HELP_EXIT_ERR 0x100 /* Call exit(1) instead of returning. */ +#define ARGP_HELP_EXIT_OK 0x200 /* Call exit(0) instead of returning. */ + +/* The standard thing to do after a program command line parsing error, if an + error message has already been printed. */ +#define ARGP_HELP_STD_ERR \ + (ARGP_HELP_SEE | ARGP_HELP_EXIT_ERR) +/* The standard thing to do after a program command line parsing error, if no + more specific error message has been printed. 
*/ +#define ARGP_HELP_STD_USAGE \ + (ARGP_HELP_SHORT_USAGE | ARGP_HELP_SEE | ARGP_HELP_EXIT_ERR) +/* The standard thing to do in response to a --help option. */ +#define ARGP_HELP_STD_HELP \ + (ARGP_HELP_SHORT_USAGE | ARGP_HELP_LONG | ARGP_HELP_EXIT_OK \ + | ARGP_HELP_DOC | ARGP_HELP_BUG_ADDR) + + +/* Possibly output the standard usage message for ARGP to stderr and exit. */ +extern void argp_usage (__const struct argp_state *__state) __THROW; +extern void __argp_usage (__const struct argp_state *__state) __THROW; + +/* If appropriate, print the printf string FMT and following args, preceded + by the program name and `:', to stderr, and followed by a `Try ... --help' + message, then exit (1). */ +extern void argp_error (__const struct argp_state *__restrict __state, + __const char *__restrict __fmt, ...) __THROW + PRINTF_STYLE(2,3); +extern void __argp_error (__const struct argp_state *__restrict __state, + __const char *__restrict __fmt, ...) __THROW + PRINTF_STYLE(2,3); + +/* Returns true if the option OPT is a valid short option. */ +extern int _option_is_short (__const struct argp_option *__opt) __THROW; +extern int __option_is_short (__const struct argp_option *__opt) __THROW; + +/* Returns true if the option OPT is in fact the last (unused) entry in an + options array. */ +extern int _option_is_end (__const struct argp_option *__opt) __THROW; +extern int __option_is_end (__const struct argp_option *__opt) __THROW; + +/* Return the input field for ARGP in the parser corresponding to STATE; used + by the help routines. 
*/ +extern void *_argp_input (__const struct argp *__restrict __argp, + __const struct argp_state *__restrict __state) + __THROW; +extern void *__argp_input (__const struct argp *__restrict __argp, + __const struct argp_state *__restrict __state) + __THROW; + +/* Used for extracting the program name from argv[0] */ +extern char *_argp_basename(char *name) __THROW; +extern char *__argp_basename(char *name) __THROW; + +/* Getting the program name given an argp state */ +extern char * +_argp_short_program_name(const struct argp_state *state) __THROW; +extern char * +__argp_short_program_name(const struct argp_state *state) __THROW; + + +#ifdef __USE_EXTERN_INLINES + +# if !_LIBC +# define __argp_usage argp_usage +# define __argp_state_help argp_state_help +# define __option_is_short _option_is_short +# define __option_is_end _option_is_end +# endif + +# ifndef ARGP_EI +# define ARGP_EI extern __inline__ +# endif + +ARGP_EI void +__argp_usage (__const struct argp_state *__state) +{ + __argp_state_help (__state, stderr, ARGP_HELP_STD_USAGE); +} + +ARGP_EI int +__option_is_short (__const struct argp_option *__opt) +{ + if (__opt->flags & OPTION_DOC) + return 0; + else + { + int __key = __opt->key; + return __key > 0 && isprint (__key); + } +} + +ARGP_EI int +__option_is_end (__const struct argp_option *__opt) +{ + return !__opt->key && !__opt->name && !__opt->doc && !__opt->group; +} + +# if !_LIBC +# undef __argp_usage +# undef __argp_state_help +# undef __option_is_short +# undef __option_is_end +# endif +#endif /* Use extern inlines. */ + +#ifdef __cplusplus +} +#endif + +#endif /* argp.h */ + + +#ifndef _GNU_SOURCE +# define _GNU_SOURCE 1 +#endif + +#ifdef HAVE_CONFIG_H +#include +#endif + +/* AIX requires this to be the first thing in the file. 
*/ +#ifndef __GNUC__ +# if HAVE_ALLOCA_H +# include +# else +# ifdef _AIX + #pragma alloca +# else +# ifndef alloca /* predefined by HP cc +Olibcalls */ +char *alloca (); +# endif +# endif +# endif +#endif + +#include +#include +#if HAVE_UNISTD_H +# include +#endif +#include +#include + +#if HAVE_MALLOC_H +/* Needed, for alloca on windows */ +# include +#endif + +#ifndef _ +/* This is for other GNU distributions with internationalized messages. + When compiling libc, the _ macro is predefined. */ +# if defined HAVE_LIBINTL_H || defined _LIBC +# include +# ifdef _LIBC +# undef dgettext +# define dgettext(domain, msgid) __dcgettext (domain, msgid, LC_MESSAGES) +# endif +# else +# define dgettext(domain, msgid) (msgid) +# define gettext(msgid) (msgid) +# endif +#endif +#ifndef N_ +# define N_(msgid) (msgid) +#endif + +#if _LIBC - 0 +#include +#else +#ifdef HAVE_CTHREADS_H +#include +#endif +#endif /* _LIBC */ + +#include "argp.h" + + + + +/* The meta-argument used to prevent any further arguments being interpreted + as options. */ +#define QUOTE "--" + +/* EZ alias for ARGP_ERR_UNKNOWN. */ +#define EBADKEY ARGP_ERR_UNKNOWN + + +/* Default options. */ + +/* When argp is given the --HANG switch, _ARGP_HANG is set and argp will sleep + for one second intervals, decrementing _ARGP_HANG until it's zero. Thus + you can force the program to continue by attaching a debugger and setting + it to 0 yourself. 
*/ +volatile int _argp_hang; + +#define OPT_PROGNAME -2 +#define OPT_USAGE -3 +#if HAVE_SLEEP && HAVE_GETPID +#define OPT_HANG -4 +#endif + +static const struct argp_option argp_default_options[] = +{ + {"help", '?', 0, 0, N_("Give this help list"), -1}, + {"usage", OPT_USAGE, 0, 0, N_("Give a short usage message"), 0 }, + {"program-name",OPT_PROGNAME,"NAME", OPTION_HIDDEN, + N_("Set the program name"), 0}, +#if OPT_HANG + {"HANG", OPT_HANG, "SECS", OPTION_ARG_OPTIONAL | OPTION_HIDDEN, + N_("Hang for SECS seconds (default 3600)"), 0 }, +#endif + {0, 0, 0, 0, 0, 0} +}; + +static error_t +argp_default_parser (int key, char *arg, struct argp_state *state) +{ + switch (key) + { + case '?': + // __argp_state_help (state, ARGP_HELP_STD_HELP); + break; + case OPT_USAGE: + // __argp_state_help (state, + // ARGP_HELP_USAGE | ARGP_HELP_EXIT_OK); + break; + + case OPT_PROGNAME: /* Set the program name. */ +#if HAVE_DECL_PROGRAM_INVOCATION_NAME + program_invocation_name = arg; +#endif + /* [Note that some systems only have PROGRAM_INVOCATION_SHORT_NAME (aka + __PROGNAME), in which case, PROGRAM_INVOCATION_NAME is just defined + to be that, so we have to be a bit careful here.] */ + + /* Update what we use for messages. */ + + state->name = arg; + +#if HAVE_DECL_PROGRAM_INVOCATION_SHORT_NAME + program_invocation_short_name = state->name; +#endif + + if ((state->flags & (ARGP_PARSE_ARGV0 | ARGP_NO_ERRS)) + == ARGP_PARSE_ARGV0) + /* Update what getopt uses too. */ + state->argv[0] = arg; + + break; + +#if OPT_HANG + case OPT_HANG: + _argp_hang = atoi (arg ? 
arg : "3600"); + printf( "%s: pid = %ld\n", + state->name, (long) getpid()); + while (_argp_hang-- > 0) + __sleep (1); + break; +#endif + + default: + return EBADKEY; + } + return 0; +} + +static const struct argp argp_default_argp = + {argp_default_options, &argp_default_parser, NULL, NULL, NULL, NULL, "libc"}; + + +static const struct argp_option argp_version_options[] = +{ + {"version", 'V', 0, 0, N_("Print program version"), -1}, + {0, 0, 0, 0, 0, 0 } +}; + +static error_t +argp_version_parser (int key, char *arg, struct argp_state *state) +{ + switch (key) + { + case 'V': + if (argp_program_version_hook) + (*argp_program_version_hook) (state); + else if (argp_program_version) + printf ("%s\n", argp_program_version); + else; + // __argp_error (state, dgettext (state->root_argp->argp_domain, + // "(PROGRAM ERROR) No version known!?")); + if (! (state->flags & ARGP_NO_EXIT)) + exit (0); + break; + default: + return EBADKEY; + } + return 0; +} + +static const struct argp argp_version_argp = + {argp_version_options, &argp_version_parser, NULL, NULL, NULL, NULL, "libc"}; + + + +/* The state of a `group' during parsing. Each group corresponds to a + particular argp structure from the tree of such descending from the top + level argp passed to argp_parse. */ +struct group +{ + /* This group's parsing function. */ + argp_parser_t parser; + + /* Which argp this group is from. */ + const struct argp *argp; + + /* The number of non-option args successfully handled by this parser. */ + unsigned args_processed; + + /* This group's parser's parent's group. */ + struct group *parent; + unsigned parent_index; /* And our position in the parent. */ + + /* These fields are swapped into and out of the state structure when + calling this group's parser. */ + void *input, **child_inputs; + void *hook; +}; + +/* Call GROUP's parser with KEY and ARG, swapping any group-specific info + from STATE before calling, and back into state afterwards.
If GROUP has + no parser, EBADKEY is returned. */ +static error_t +group_parse (struct group *group, struct argp_state *state, int key, char *arg) +{ + if (group->parser) + { + error_t err; + state->hook = group->hook; + state->input = group->input; + state->child_inputs = group->child_inputs; + state->arg_num = group->args_processed; + err = (*group->parser)(key, arg, state); + group->hook = state->hook; + return err; + } + else + return EBADKEY; +} + +struct parser +{ + const struct argp *argp; + + const char *posixly_correct; + + /* True if there are only no-option arguments left, which are just + passed verbatim with ARGP_KEY_ARG. This is set if we encounter a + quote, or the end of the proper options, but may be cleared again + if the user moves the next argument pointer backwards. */ + int args_only; + + /* Describe how to deal with options that follow non-option ARGV-elements. + + If the caller did not specify anything, the default is + REQUIRE_ORDER if the environment variable POSIXLY_CORRECT is + defined, PERMUTE otherwise. + + REQUIRE_ORDER means don't recognize them as options; stop option + processing when the first non-option is seen. This is what Unix + does. This mode of operation is selected by either setting the + environment variable POSIXLY_CORRECT, or using `+' as the first + character of the list of option characters. + + PERMUTE is the default. We permute the contents of ARGV as we + scan, so that eventually all the non-options are at the end. This + allows options to be given in any order, even with programs that + were not written to expect this. + + RETURN_IN_ORDER is an option available to programs that were + written to expect options and other ARGV-elements in any order + and that care about the ordering of the two. We describe each + non-option ARGV-element as if it were the argument of an option + with character code 1. Using `-' as the first character of the + list of option characters selects this mode of operation. 
+ */ + enum { REQUIRE_ORDER, PERMUTE, RETURN_IN_ORDER } ordering; + + /* A segment of non-option arguments that have been skipped for + later processing, after all options. `first_nonopt' is the index + in ARGV of the first of them; `last_nonopt' is the index after + the last of them. + + If quoted or args_only is non-zero, this segment should be empty. */ + + /* FIXME: I'd prefer to use unsigned, but it's more consistent to + use the same type as for state.next. */ + int first_nonopt; + int last_nonopt; + + /* String of all recognized short options. Needed for ARGP_LONG_ONLY. */ + /* FIXME: Perhaps change to a pointer to a suitable bitmap instead? */ + char *short_opts; + + /* For parsing combined short options. */ + char *nextchar; + + /* States of the various parsing groups. */ + struct group *groups; + /* The end of the GROUPS array. */ + struct group *egroup; + /* A vector containing storage for the CHILD_INPUTS field in all groups. */ + void **child_inputs; + + /* State block supplied to parsing routines. */ + struct argp_state state; + + /* Memory used by this parser. */ + void *storage; +}; + +/* Search for a group defining a short option. */ +static const struct argp_option * +find_short_option(struct parser *parser, int key, struct group **p) +{ + struct group *group; + + for (group = parser->groups; group < parser->egroup; group++) + { + const struct argp_option *opts; + + for (opts = group->argp->options; !__option_is_end(opts); opts++) + if (opts->key == key) + { + *p = group; + return opts; + } + } + return NULL; +} + +enum match_result { MATCH_EXACT, MATCH_PARTIAL, MATCH_NO }; + +/* If defined, allow complete.el-like abbreviations of long options. */ +#ifndef ARGP_COMPLETE +#define ARGP_COMPLETE 0 +#endif + +/* Matches an encountered long-option argument ARG against an option NAME. + * ARG is terminated by NUL or '='.
*/ +static enum match_result +match_option(const char *arg, const char *name) +{ + unsigned i, j; + for (i = j = 0;; i++, j++) + { + switch(arg[i]) + { + case '\0': + case '=': + return name[j] ? MATCH_PARTIAL : MATCH_EXACT; +#if ARGP_COMPLETE + case '-': + while (name[j] != '-') + if (!name[j++]) + return MATCH_NO; + break; +#endif + default: + if (arg[i] != name[j]) + return MATCH_NO; + } + } +} + +static const struct argp_option * +find_long_option(struct parser *parser, + const char *arg, + struct group **p) +{ + struct group *group; + + /* Partial match found so far. */ + struct group *matched_group = NULL; + const struct argp_option *matched_option = NULL; + + /* Number of partial matches. */ + int num_partial = 0; + + for (group = parser->groups; group < parser->egroup; group++) + { + const struct argp_option *opts; + + for (opts = group->argp->options; !__option_is_end(opts); opts++) + { + if (!opts->name) + continue; + switch (match_option(arg, opts->name)) + { + case MATCH_NO: + break; + case MATCH_PARTIAL: + num_partial++; + + matched_group = group; + matched_option = opts; + + break; + case MATCH_EXACT: + /* Exact match. */ + *p = group; + return opts; + } + } + } + if (num_partial == 1) + { + *p = matched_group; + return matched_option; + } + + return NULL; +} + + +/* The next usable entries in the various parser tables being filled in by + convert_options. */ +struct parser_convert_state +{ + struct parser *parser; + char *short_end; + void **child_inputs_end; +}; + +/* Initialize GROUP from ARGP. If CVT->SHORT_END is non-NULL, short + options are recorded in the short options string. Returns the next + unused group entry. CVT holds state used during the conversion. 
*/ +static struct group * +convert_options (const struct argp *argp, + struct group *parent, unsigned parent_index, + struct group *group, struct parser_convert_state *cvt) +{ + const struct argp_option *opt = argp->options; + const struct argp_child *children = argp->children; + + if (opt || argp->parser) + { + /* This parser needs a group. */ + if (cvt->short_end) + { + /* Record any short options. */ + for ( ; !__option_is_end (opt); opt++) + if (__option_is_short(opt)) + *cvt->short_end++ = opt->key; + } + + group->parser = argp->parser; + group->argp = argp; + group->args_processed = 0; + group->parent = parent; + group->parent_index = parent_index; + group->input = 0; + group->hook = 0; + group->child_inputs = 0; + + if (children) + /* Assign GROUP's CHILD_INPUTS field some space from + CVT->child_inputs_end.*/ + { + unsigned num_children = 0; + while (children[num_children].argp) + num_children++; + group->child_inputs = cvt->child_inputs_end; + cvt->child_inputs_end += num_children; + } + parent = group++; + } + else + parent = 0; + + if (children) + { + unsigned index = 0; + while (children->argp) + group = + convert_options (children++->argp, parent, index++, group, cvt); + } + + return group; +} +/* Allocate and initialize the group structures, so that they are + ordered as if by traversing the corresponding argp parser tree in + pre-order. Also build the list of short options, if that is needed. */ +static void +parser_convert (struct parser *parser, const struct argp *argp) +{ + struct parser_convert_state cvt; + + cvt.parser = parser; + cvt.short_end = parser->short_opts; + cvt.child_inputs_end = parser->child_inputs; + + parser->argp = argp; + + if (argp) + parser->egroup = convert_options (argp, 0, 0, parser->groups, &cvt); + else + parser->egroup = parser->groups; /* No parsers at all! */ + + if (parser->short_opts) + *cvt.short_end ='\0'; +} + +/* Lengths of various parser fields which we will allocated. 
*/ +struct parser_sizes +{ + /* Needed only for ARGP_LONG_ONLY */ + size_t short_len; /* Number of short options. */ + + size_t num_groups; /* Group structures we allocate. */ + size_t num_child_inputs; /* Child input slots. */ +}; + +/* For ARGP, increments the NUM_GROUPS field in SZS by the total + number of argp structures descended from it, and the SHORT_LEN by + the total number of short options. */ +static void +calc_sizes (const struct argp *argp, struct parser_sizes *szs) +{ + const struct argp_child *child = argp->children; + const struct argp_option *opt = argp->options; + + if (opt || argp->parser) + { + /* This parser needs a group. */ + szs->num_groups++; + if (opt) + { + /* Count every short option up to the end marker, matching what + convert_options later records in the short options string. */ + for ( ; !__option_is_end (opt); opt++) + if (__option_is_short (opt)) + szs->short_len++; + } + } + + if (child) + while (child->argp) + { + calc_sizes ((child++)->argp, szs); + szs->num_child_inputs++; + } +} + +/* Initializes PARSER to parse ARGP in a manner described by FLAGS. */ +static error_t +parser_init (struct parser *parser, const struct argp *argp, + int argc, char **argv, int flags, void *input) +{ + error_t err = 0; + struct group *group; + struct parser_sizes szs; + + if (flags & ARGP_IN_ORDER) + parser->ordering = RETURN_IN_ORDER; + else if (flags & ARGP_NO_ARGS) + parser->ordering = REQUIRE_ORDER; + else if (parser->posixly_correct) + parser->ordering = REQUIRE_ORDER; + else + parser->ordering = PERMUTE; + + szs.short_len = 0; + szs.num_groups = 0; + szs.num_child_inputs = 0; + + if (argp) + calc_sizes (argp, &szs); + + if (!(flags & ARGP_LONG_ONLY)) + /* We have no use for the short option array. */ + szs.short_len = 0; + + /* Lengths of the various bits of storage used by PARSER. */ +#define GLEN ((szs.num_groups + 1) * sizeof (struct group)) +#define CLEN (szs.num_child_inputs * sizeof (void *)) +#define SLEN (szs.short_len + 1) +#define STORAGE(offset) ((void *) (((char *) parser->storage) + (offset))) + + parser->storage = malloc (GLEN + CLEN + SLEN); + if (!
parser->storage) + return ENOMEM; + + parser->groups = parser->storage; + + parser->child_inputs = STORAGE(GLEN); + memset (parser->child_inputs, 0, szs.num_child_inputs * sizeof (void *)); + + if (flags & ARGP_LONG_ONLY) + parser->short_opts = STORAGE(GLEN + CLEN); + else + parser->short_opts = NULL; + + parser_convert (parser, argp); + + memset (&parser->state, 0, sizeof (struct argp_state)); + + parser->state.root_argp = parser->argp; + parser->state.argc = argc; + parser->state.argv = argv; + parser->state.flags = flags; + parser->state.pstate = parser; + + parser->args_only = 0; + parser->nextchar = NULL; + parser->first_nonopt = parser->last_nonopt = 0; + + /* Call each parser for the first time, giving it a chance to propagate + values to child parsers. */ + if (parser->groups < parser->egroup) + parser->groups->input = input; + for (group = parser->groups; + group < parser->egroup && (!err || err == EBADKEY); + group++) + { + if (group->parent) + /* If a child parser, get the initial input value from the parent. */ + group->input = group->parent->child_inputs[group->parent_index]; + + if (!group->parser + && group->argp->children && group->argp->children->argp) + /* For the special case where no parsing function is supplied for an + argp, propagate its input to its first child, if any (this just + makes very simple wrapper argps more convenient). */ + group->child_inputs[0] = group->input; + + err = group_parse (group, &parser->state, ARGP_KEY_INIT, 0); + } + if (err == EBADKEY) + err = 0; /* Some parser didn't understand. */ + + if (err) + return err; + + if (argv[0] && !(parser->state.flags & ARGP_PARSE_ARGV0)) + /* There's an argv[0]; use it for messages. */ + { + parser->state.name = argv[0]; + + /* Don't parse it as an argument. */ + parser->state.next = 1; + } + else + parser->state.name = ""; + + return 0; +} + +/* Free any storage consumed by PARSER (but not PARSER itself). 
*/ +static error_t +parser_finalize (struct parser *parser, + error_t err, int arg_ebadkey, int *end_index) +{ + struct group *group; + + if (err == EBADKEY && arg_ebadkey) + /* Suppress errors generated by unparsed arguments. */ + err = 0; + + if (! err) + { + if (parser->state.next == parser->state.argc) + /* We successfully parsed all arguments! Call all the parsers again, + just a few more times... */ + { + for (group = parser->groups; + group < parser->egroup && (!err || err==EBADKEY); + group++) + if (group->args_processed == 0) + err = group_parse (group, &parser->state, ARGP_KEY_NO_ARGS, 0); + for (group = parser->egroup - 1; + group >= parser->groups && (!err || err==EBADKEY); + group--) + err = group_parse (group, &parser->state, ARGP_KEY_END, 0); + + if (err == EBADKEY) + err = 0; /* Some parser didn't understand. */ + + /* Tell the user that all arguments are parsed. */ + if (end_index) + *end_index = parser->state.next; + } + else if (end_index) + /* Return any remaining arguments to the user. */ + *end_index = parser->state.next; + else + /* No way to return the remaining arguments, they must be bogus. */ + { + if (!(parser->state.flags & ARGP_NO_ERRS)) + printf( + dgettext (parser->argp->argp_domain, + "%s: Too many arguments\n"), + parser->state.name); + err = EBADKEY; + } + } + + /* Okay, we're all done, with either an error or success; call the parsers + to indicate which one. */ + + if (err) + { + /* Maybe print an error message. */ + if (err == EBADKEY) + { + /* An appropriate message describing what the error was should have + been printed earlier. The help call is disabled; the braces keep + this `if' from capturing the loop below. */ + // __argp_state_help (&parser->state, + // ARGP_HELP_STD_ERR); + } + + /* Since we didn't exit, give each parser an error indication. */ + for (group = parser->groups; group < parser->egroup; group++) + group_parse (group, &parser->state, ARGP_KEY_ERROR, 0); + } + else + /* Notify parsers of success, and propagate back values from parsers.
*/ + { + /* We pass over the groups in reverse order so that child groups are + given a chance to do their processing before passing back a value to + the parent. */ + for (group = parser->egroup - 1 + ; group >= parser->groups && (!err || err == EBADKEY) + ; group--) + err = group_parse (group, &parser->state, ARGP_KEY_SUCCESS, 0); + if (err == EBADKEY) + err = 0; /* Some parser didn't understand. */ + } + + /* Call parsers once more, to do any final cleanup. Errors are ignored. */ + for (group = parser->egroup - 1; group >= parser->groups; group--) + group_parse (group, &parser->state, ARGP_KEY_FINI, 0); + + if (err == EBADKEY) + err = EINVAL; + + free (parser->storage); + + return err; +} + +/* Call the user parsers to parse the non-option argument VAL, at the + current position, returning any error. The state NEXT pointer + should point to the argument; this function will adjust it + correctly to reflect however many args actually end up being + consumed. */ +static error_t +parser_parse_arg (struct parser *parser, char *val) +{ + /* Save the starting value of NEXT */ + int index = parser->state.next; + error_t err = EBADKEY; + struct group *group; + int key = 0; /* Which of ARGP_KEY_ARG[S] we used. */ + + /* Try to parse the argument in each parser. */ + for (group = parser->groups + ; group < parser->egroup && err == EBADKEY + ; group++) + { + parser->state.next++; /* For ARGP_KEY_ARG, consume the arg. */ + key = ARGP_KEY_ARG; + err = group_parse (group, &parser->state, key, val); + + if (err == EBADKEY) + /* This parser doesn't like ARGP_KEY_ARG; try ARGP_KEY_ARGS instead. */ + { + parser->state.next--; /* For ARGP_KEY_ARGS, put back the arg. */ + key = ARGP_KEY_ARGS; + err = group_parse (group, &parser->state, key, 0); + } + } + + if (! err) + { + if (key == ARGP_KEY_ARGS) + /* The default for ARGP_KEY_ARGS is to assume that if NEXT isn't + changed by the user, *all* arguments should be considered + consumed.
*/ + parser->state.next = parser->state.argc; + + if (parser->state.next > index) + /* Remember that we successfully processed a non-option + argument -- but only if the user hasn't gotten tricky and set + the clock back. */ + (--group)->args_processed += (parser->state.next - index); + else + /* The user wants to reparse some args, so try looking for options again. */ + parser->args_only = 0; + } + + return err; +} + +/* Exchange two adjacent subsequences of ARGV. + One subsequence is elements [first_nonopt,last_nonopt) + which contains all the non-options that have been skipped so far. + The other is elements [last_nonopt,next), which contains all + the options processed since those non-options were skipped. + + `first_nonopt' and `last_nonopt' are relocated so that they describe + the new indices of the non-options in ARGV after they are moved. */ + +static void +exchange (struct parser *parser) +{ + int bottom = parser->first_nonopt; + int middle = parser->last_nonopt; + int top = parser->state.next; + char **argv = parser->state.argv; + + char *tem; + + /* Exchange the shorter segment with the far end of the longer segment. + That puts the shorter segment into the right place. + It leaves the longer segment in the right place overall, + but it consists of two parts that need to be swapped next. */ + + while (top > middle && middle > bottom) + { + if (top - middle > middle - bottom) + { + /* Bottom segment is the short one. */ + int len = middle - bottom; + register int i; + + /* Swap it with the top part of the top segment. */ + for (i = 0; i < len; i++) + { + tem = argv[bottom + i]; + argv[bottom + i] = argv[top - (middle - bottom) + i]; + argv[top - (middle - bottom) + i] = tem; + } + /* Exclude the moved bottom segment from further swapping. */ + top -= len; + } + else + { + /* Top segment is the short one. */ + int len = top - middle; + register int i; + + /* Swap it with the bottom part of the bottom segment. 
*/ + for (i = 0; i < len; i++) + { + tem = argv[bottom + i]; + argv[bottom + i] = argv[middle + i]; + argv[middle + i] = tem; + } + /* Exclude the moved top segment from further swapping. */ + bottom += len; + } + } + + /* Update records for the slots the non-options now occupy. */ + + parser->first_nonopt += (parser->state.next - parser->last_nonopt); + parser->last_nonopt = parser->state.next; +} + + +enum arg_type { ARG_ARG, ARG_SHORT_OPTION, + ARG_LONG_OPTION, ARG_LONG_ONLY_OPTION, + ARG_QUOTE }; + +static enum arg_type +classify_arg(struct parser *parser, char *arg, char **opt) +{ + if (arg[0] == '-') + /* Looks like an option... */ + switch (arg[1]) + { + case '\0': + /* "-" is not an option. */ + return ARG_ARG; + case '-': + /* Long option, or quote. */ + if (!arg[2]) + return ARG_QUOTE; + + /* A long option. */ + if (opt) + *opt = arg + 2; + return ARG_LONG_OPTION; + + default: + /* Short option. But if ARGP_LONG_ONLY, it can also be a long option. */ + + if (opt) + *opt = arg + 1; + + if (parser->state.flags & ARGP_LONG_ONLY) + { + /* Rules from getopt.c: + + If long_only and the ARGV-element has the form "-f", + where f is a valid short option, don't consider it an + abbreviated form of a long option that starts with f. + Otherwise there would be no way to give the -f short + option. + + On the other hand, if there's a long option "fubar" and + the ARGV-element is "-fu", do consider that an + abbreviation of the long option, just like "--fu", and + not "-f" with arg "u". + + This distinction seems to be the most useful approach. */ + + assert(parser->short_opts); + + if (arg[2] || !strchr(parser->short_opts, arg[1])) + return ARG_LONG_ONLY_OPTION; + } + + return ARG_SHORT_OPTION; + } + + else + return ARG_ARG; +} + +/* Parse the next argument in PARSER (as indicated by PARSER->state.next). + Any error from the parsers is returned, and *ARGP_EBADKEY indicates + whether a value of EBADKEY is due to an unrecognized argument (which is + generally not fatal). 
*/ +static error_t +parser_parse_next (struct parser *parser, int *arg_ebadkey) +{ + if (parser->state.quoted && parser->state.next < parser->state.quoted) + /* The next argument pointer has been moved to before the quoted + region, so pretend we never saw the quoting `--', and start + looking for options again. If the `--' is still there we'll just + process it one more time. */ + parser->state.quoted = parser->args_only = 0; + + /* Give FIRST_NONOPT & LAST_NONOPT rational values if NEXT has been + moved back by the user (who may also have changed the arguments). */ + if (parser->last_nonopt > parser->state.next) + parser->last_nonopt = parser->state.next; + if (parser->first_nonopt > parser->state.next) + parser->first_nonopt = parser->state.next; + + if (parser->nextchar) + /* Deal with short options. */ + { + struct group *group; + char c; + const struct argp_option *option; + char *value = NULL; + + assert(!parser->args_only); + + c = *parser->nextchar++; + + option = find_short_option(parser, c, &group); + if (!option) + { + if (parser->posixly_correct) + /* 1003.2 specifies the format of this message. */ + printf( + dgettext(parser->state.root_argp->argp_domain, + "%s: illegal option -- %c\n"), + parser->state.name, c); + else + printf( + dgettext(parser->state.root_argp->argp_domain, + "%s: invalid option -- %c\n"), + parser->state.name, c); + + *arg_ebadkey = 0; + return EBADKEY; + } + + if (!*parser->nextchar) + parser->nextchar = NULL; + + if (option->arg) + { + value = parser->nextchar; + parser->nextchar = NULL; + + if (!value + && !(option->flags & OPTION_ARG_OPTIONAL)) + /* We need a mandatory argument. */ + { + if (parser->state.next == parser->state.argc) + /* Missing argument */ + { + /* 1003.2 specifies the format of this message.
*/ + printf( + dgettext(parser->state.root_argp->argp_domain, + "%s: option requires an argument -- %c\n"), + parser->state.name, c); + + *arg_ebadkey = 0; + return EBADKEY; + } + value = parser->state.argv[parser->state.next++]; + } + } + return group_parse(group, &parser->state, + option->key, value); + } + else + /* Advance to the next ARGV-element. */ + { + if (parser->args_only) + { + *arg_ebadkey = 1; + if (parser->state.next >= parser->state.argc) + /* We're done. */ + return EBADKEY; + else + return parser_parse_arg(parser, + parser->state.argv[parser->state.next]); + } + + if (parser->state.next >= parser->state.argc) + /* Almost done. If there are non-options that we skipped + previously, we should process them now. */ + { + *arg_ebadkey = 1; + if (parser->first_nonopt != parser->last_nonopt) + { + exchange(parser); + + /* Start processing the arguments we skipped previously. */ + parser->state.next = parser->first_nonopt; + + parser->first_nonopt = parser->last_nonopt = 0; + + parser->args_only = 1; + return 0; + } + else + /* Indicate that we're really done. */ + return EBADKEY; + } + else + /* Look for options. */ + { + char *arg = parser->state.argv[parser->state.next]; + + char *optstart; + enum arg_type token = classify_arg(parser, arg, &optstart); + + switch (token) + { + case ARG_ARG: + switch (parser->ordering) + { + case PERMUTE: + if (parser->first_nonopt == parser->last_nonopt) + /* Skipped sequence is empty; start a new one. */ + parser->first_nonopt = parser->last_nonopt = parser->state.next; + + else if (parser->last_nonopt != parser->state.next) + /* We have a non-empty skipped sequence, and + we're not at the end-point, so move it. */ + exchange(parser); + + assert(parser->last_nonopt == parser->state.next); + + /* Skip this argument for now. */ + parser->state.next++; + parser->last_nonopt = parser->state.next; + + return 0; + + case REQUIRE_ORDER: + /* Implicit quote before the first argument. 
*/ + parser->args_only = 1; + return 0; + + case RETURN_IN_ORDER: + *arg_ebadkey = 1; + return parser_parse_arg(parser, arg); + + default: + exit(1); + } + case ARG_QUOTE: + /* Skip it, then exchange with any previous non-options. */ + parser->state.next++; + assert (parser->last_nonopt != parser->state.next); + + if (parser->first_nonopt != parser->last_nonopt) + { + exchange(parser); + + /* Start processing the skipped and the quoted + arguments. */ + + parser->state.quoted = parser->state.next = parser->first_nonopt; + + /* Also empty the skipped-list, to avoid confusion + if the user resets the next pointer. */ + parser->first_nonopt = parser->last_nonopt = 0; + } + else + parser->state.quoted = parser->state.next; + + parser->args_only = 1; + return 0; + + case ARG_LONG_ONLY_OPTION: + case ARG_LONG_OPTION: + { + struct group *group; + const struct argp_option *option; + char *value; + + parser->state.next++; + option = find_long_option(parser, optstart, &group); + + if (!option) + { + /* NOTE: This includes any "=something" in the output. */ + printf( + dgettext(parser->state.root_argp->argp_domain, + "%s: unrecognized option `%s'\n"), + parser->state.name, arg); + *arg_ebadkey = 0; + return EBADKEY; + } + + value = strchr(optstart, '='); + if (value) + value++; + + if (value && !option->arg) + /* Unexpected argument. */ + { + if (token == ARG_LONG_OPTION) + /* --option */ + printf( + dgettext(parser->state.root_argp->argp_domain, + "%s: option `--%s' doesn't allow an argument\n"), + parser->state.name, option->name); + else + /* +option or -option */ + printf( + dgettext(parser->state.root_argp->argp_domain, + "%s: option `%c%s' doesn't allow an argument\n"), + parser->state.name, arg[0], option->name); + + *arg_ebadkey = 0; + return EBADKEY; + } + + if (option->arg && !value + && !(option->flags & OPTION_ARG_OPTIONAL)) + /* We need an mandatory argument. 
*/ + { + if (parser->state.next == parser->state.argc) + /* Missing argument */ + { + if (token == ARG_LONG_OPTION) + /* --option */ + printf( + dgettext(parser->state.root_argp->argp_domain, + "%s: option `--%s' requires an argument\n"), + parser->state.name, option->name); + else + /* +option or -option */ + printf( + dgettext(parser->state.root_argp->argp_domain, + "%s: option `%c%s' requires an argument\n"), + parser->state.name, arg[0], option->name); + + *arg_ebadkey = 0; + return EBADKEY; + } + + value = parser->state.argv[parser->state.next++]; + } + *arg_ebadkey = 0; + return group_parse(group, &parser->state, + option->key, value); + } + case ARG_SHORT_OPTION: + parser->state.next++; + parser->nextchar = optstart; + return 0; + + default: + exit(1); + } + } + } +} + +/* Parse the options strings in ARGC & ARGV according to the argp in ARGP. + FLAGS is one of the ARGP_ flags above. If END_INDEX is non-NULL, the + index in ARGV of the first unparsed option is returned in it. If an + unknown option is present, EINVAL is returned; if some parser routine + returned a non-zero value, it is returned; otherwise 0 is returned. */ +error_t +argp_parse (const struct argp *argp, int argc, char **argv, unsigned flags, + int *end_index, void *input) +{ + error_t err; + struct parser parser; + + /* If true, then err == EBADKEY is a result of a non-option argument failing + to be parsed (which in some cases isn't actually an error). */ + int arg_ebadkey = 0; + + if (! (flags & ARGP_NO_HELP)) + /* Add our own options. */ + { + struct argp_child *child = alloca (4 * sizeof (struct argp_child)); + struct argp *top_argp = alloca (sizeof (struct argp)); + + /* TOP_ARGP has no options, it just serves to group the user & default + argps. 
*/ + memset (top_argp, 0, sizeof (*top_argp)); + top_argp->children = child; + + memset (child, 0, 4 * sizeof (struct argp_child)); + + if (argp) + (child++)->argp = argp; + (child++)->argp = &argp_default_argp; + if (argp_program_version || argp_program_version_hook) + (child++)->argp = &argp_version_argp; + child->argp = 0; + + argp = top_argp; + } + + /* Construct a parser for these arguments. */ + err = parser_init (&parser, argp, argc, argv, flags, input); + + if (! err) + /* Parse! */ + { + while (! err) + err = parser_parse_next (&parser, &arg_ebadkey); + err = parser_finalize (&parser, err, arg_ebadkey, end_index); + } + + return err; +} +#ifdef weak_alias +weak_alias (__argp_parse, argp_parse) +#endif + +/* Return the input field for ARGP in the parser corresponding to STATE; used + by the help routines. */ +void * +__argp_input (const struct argp *argp, const struct argp_state *state) +{ + if (state) + { + struct group *group; + struct parser *parser = state->pstate; + + for (group = parser->groups; group < parser->egroup; group++) + if (group->argp == argp) + return group->input; + } + + return 0; +} +#ifdef weak_alias +weak_alias (__argp_input, _argp_input) +#endif + +/* Defined here, in case a user is not inlining the definitions in + * argp.h */ +void +argp_usage (__const struct argp_state *__state) +{ +// __argp_state_help (__state, ARGP_HELP_STD_USAGE); +} + +int +__option_is_short (__const struct argp_option *__opt) +{ + if (__opt->flags & OPTION_DOC) + return 0; + else + { + int __key = __opt->key; + /* FIXME: whether or not a particular key implies a short option + * ought not to be locale dependent. 
*/ + return __key > 0 && isprint (__key); + } +} + +int +__option_is_end (__const struct argp_option *__opt) +{ + return !__opt->key && !__opt->name && !__opt->doc && !__opt->group; +} + +#endif diff --git a/6-sigsnoop/eunomia-include/argparse/argparse.c b/6-sigsnoop/eunomia-include/argparse/argparse.c new file mode 100644 index 0000000..b42bced --- /dev/null +++ b/6-sigsnoop/eunomia-include/argparse/argparse.c @@ -0,0 +1,403 @@ +#ifndef ARGPARSE_C_H_ +#define ARGPARSE_C_H_ + +/** + * Copyright (C) 2012-2015 Yecheng Fu + * All rights reserved. + * + * Use of this source code is governed by a MIT-style license that can be found + * in the LICENSE file. + */ +#include +#include +#include +#include +#include +#include "argparse.h" + +#define OPT_UNSET 1 +#define OPT_LONG (1 << 1) + +/* We define these the same for all machines. + Changes from this to the outside world should be done in `_exit'. */ +#define EXIT_FAILURE 1 /* Failing exit status. */ +#define EXIT_SUCCESS 0 /* Successful exit status. */ + +static const char * +prefix_skip(const char *str, const char *prefix) +{ + size_t len = strlen(prefix); + return strncmp(str, prefix, len) ? 
NULL : str + len; +} + +static int +prefix_cmp(const char *str, const char *prefix) +{ + for (;; str++, prefix++) + if (!*prefix) { + return 0; + } else if (*str != *prefix) { + return (unsigned char)*prefix - (unsigned char)*str; + } +} + +static void +argparse_error(struct argparse *self, const struct argparse_option *opt, + const char *reason, int flags) +{ + (void)self; + if (flags & OPT_LONG) { + printf("error: option `--%s` %s\n", opt->long_name, reason); + } else { + printf("error: option `-%c` %s\n", opt->short_name, reason); + } + exit(EXIT_FAILURE); +} + +static int +argparse_getvalue(struct argparse *self, const struct argparse_option *opt, + int flags) +{ + const char *s = NULL; + if (!opt->value) + goto skipped; + switch (opt->type) { + case ARGPARSE_OPT_BOOLEAN: + if (flags & OPT_UNSET) { + *(int *)opt->value = *(int *)opt->value - 1; + } else { + *(int *)opt->value = *(int *)opt->value + 1; + } + if (*(int *)opt->value < 0) { + *(int *)opt->value = 0; + } + break; + case ARGPARSE_OPT_BIT: + if (flags & OPT_UNSET) { + *(int *)opt->value &= ~opt->data; + } else { + *(int *)opt->value |= opt->data; + } + break; + case ARGPARSE_OPT_STRING: + if (self->optvalue) { + *(const char **)opt->value = self->optvalue; + self->optvalue = NULL; + } else if (self->argc > 1) { + self->argc--; + *(const char **)opt->value = *++self->argv; + } else { + argparse_error(self, opt, "requires a value", flags); + } + break; + case ARGPARSE_OPT_INTEGER: + // errno = 0; + if (self->optvalue) { + *(int *)opt->value = strtol(self->optvalue, (char **)&s, 0); + self->optvalue = NULL; + } else if (self->argc > 1) { + self->argc--; + *(int *)opt->value = strtol(*++self->argv, (char **)&s, 0); + } else { + argparse_error(self, opt, "requires a value", flags); + } + // if (errno == ERANGE) + // argparse_error(self, opt, "numerical result out of range", flags); + if (s[0] != '\0') // no digits or contains invalid characters + argparse_error(self, opt, "expects an integer value", 
flags); + break; + case ARGPARSE_OPT_FLOAT: + // errno = 0; + if (self->optvalue) { + *(float *)opt->value = strtod(self->optvalue, (char **)&s); + self->optvalue = NULL; + } else if (self->argc > 1) { + self->argc--; + *(float *)opt->value = strtod(*++self->argv, (char **)&s); + } else { + argparse_error(self, opt, "requires a value", flags); + } + // if (errno == ERANGE) + // argparse_error(self, opt, "numerical result out of range", flags); + if (s[0] != '\0') // no digits or contains invalid characters + argparse_error(self, opt, "expects a numerical value", flags); + break; + default: + exit(EXIT_FAILURE); + } + +skipped: + if (opt->callback) { + return opt->callback(self, opt); + } + return 0; +} + +static void +argparse_options_check(const struct argparse_option *options) +{ + for (; options->type != ARGPARSE_OPT_END; options++) { + switch (options->type) { + case ARGPARSE_OPT_END: + case ARGPARSE_OPT_BOOLEAN: + case ARGPARSE_OPT_BIT: + case ARGPARSE_OPT_INTEGER: + case ARGPARSE_OPT_FLOAT: + case ARGPARSE_OPT_STRING: + case ARGPARSE_OPT_GROUP: + continue; + default: + printf("wrong option type: %d", options->type); + break; + } + } +} + +static int +argparse_short_opt(struct argparse *self, const struct argparse_option *options) +{ + for (; options->type != ARGPARSE_OPT_END; options++) { + if (options->short_name == *self->optvalue) { + self->optvalue = self->optvalue[1] ? self->optvalue + 1 : NULL; + return argparse_getvalue(self, options, 0); + } + } + return -2; +} + +static int +argparse_long_opt(struct argparse *self, const struct argparse_option *options) +{ + for (; options->type != ARGPARSE_OPT_END; options++) { + const char *rest; + int opt_flags = 0; + if (!options->long_name) + continue; + + rest = prefix_skip(self->argv[0] + 2, options->long_name); + if (!rest) { + // negation disabled? 
+ if (options->flags & OPT_NONEG) { + continue; + } + // only OPT_BOOLEAN/OPT_BIT supports negation + if (options->type != ARGPARSE_OPT_BOOLEAN && options->type != + ARGPARSE_OPT_BIT) { + continue; + } + + if (prefix_cmp(self->argv[0] + 2, "no-")) { + continue; + } + rest = prefix_skip(self->argv[0] + 2 + 3, options->long_name); + if (!rest) + continue; + opt_flags |= OPT_UNSET; + } + if (*rest) { + if (*rest != '=') + continue; + self->optvalue = rest + 1; + } + return argparse_getvalue(self, options, opt_flags | OPT_LONG); + } + return -2; +} + +int +argparse_init(struct argparse *self, struct argparse_option *options, + const char *const *usages, int flags) +{ + memset(self, 0, sizeof(*self)); + self->options = options; + self->usages = usages; + self->flags = flags; + self->description = NULL; + self->epilog = NULL; + return 0; +} + +void +argparse_describe(struct argparse *self, const char *description, + const char *epilog) +{ + self->description = description; + self->epilog = epilog; +} + +int +argparse_parse(struct argparse *self, int argc, const char **argv) +{ + self->argc = argc - 1; + self->argv = argv + 1; + self->out = argv; + + argparse_options_check(self->options); + + for (; self->argc; self->argc--, self->argv++) { + const char *arg = self->argv[0]; + if (arg[0] != '-' || !arg[1]) { + if (self->flags & ARGPARSE_STOP_AT_NON_OPTION) { + goto end; + } + // if it's not option or is a single char '-', copy verbatim + self->out[self->cpidx++] = self->argv[0]; + continue; + } + // short option + if (arg[1] != '-') { + self->optvalue = arg + 1; + switch (argparse_short_opt(self, self->options)) { + case -1: + break; + case -2: + goto unknown; + } + while (self->optvalue) { + switch (argparse_short_opt(self, self->options)) { + case -1: + break; + case -2: + goto unknown; + } + } + continue; + } + // if '--' presents + if (!arg[2]) { + self->argc--; + self->argv++; + break; + } + // long option + switch (argparse_long_opt(self, self->options)) { + case 
-1: + break; + case -2: + goto unknown; + } + continue; + +unknown: + printf("error: unknown option `%s`\n", self->argv[0]); + argparse_usage(self); + if (!(self->flags & ARGPARSE_IGNORE_UNKNOWN_ARGS)) { + exit(EXIT_FAILURE); + } + } + +end: + memmove(self->out + self->cpidx, self->argv, + self->argc * sizeof(*self->out)); + self->out[self->cpidx + self->argc] = NULL; + + return self->cpidx + self->argc; +} + +void +argparse_usage(struct argparse *self) +{ + if (self->usages) { + printf("Usage: %s\n", *self->usages++); + while (*self->usages && **self->usages) + printf(" or: %s\n", *self->usages++); + } else { + printf("Usage:\n"); + } + + // print description + if (self->description) + printf("%s\n", self->description); + + putchar('\n'); + + const struct argparse_option *options; + + // figure out best width + size_t usage_opts_width = 0; + size_t len; + options = self->options; + for (; options->type != ARGPARSE_OPT_END; options++) { + len = 0; + if ((options)->short_name) { + len += 2; + } + if ((options)->short_name && (options)->long_name) { + len += 2; // separator ", " + } + if ((options)->long_name) { + len += strlen((options)->long_name) + 2; + } + if (options->type == ARGPARSE_OPT_INTEGER) { + len += strlen("="); + } + if (options->type == ARGPARSE_OPT_FLOAT) { + len += strlen("="); + } else if (options->type == ARGPARSE_OPT_STRING) { + len += strlen("="); + } + len = (len + 3) - ((len + 3) & 3); + if (usage_opts_width < len) { + usage_opts_width = len; + } + } + usage_opts_width += 4; // 4 spaces prefix + + options = self->options; + for (; options->type != ARGPARSE_OPT_END; options++) { + size_t pos = 0; + size_t pad = 0; + if (options->type == ARGPARSE_OPT_GROUP) { + putchar('\n'); + printf("%s", options->help); + putchar('\n'); + continue; + } + pos = printf(" "); + if (options->short_name) { + pos += printf("-%c", options->short_name); + } + if (options->long_name && options->short_name) { + pos += printf(", "); + } + if (options->long_name) { + pos 
+= printf("--%s", options->long_name); + } + if (options->type == ARGPARSE_OPT_INTEGER) { + pos += printf("="); + } else if (options->type == ARGPARSE_OPT_FLOAT) { + pos += printf("="); + } else if (options->type == ARGPARSE_OPT_STRING) { + pos += printf("="); + } + if (pos <= usage_opts_width) { + pad = usage_opts_width - pos; + } else { + putchar('\n'); + pad = usage_opts_width; + } + printf(" %s\n", options->help); + } + + // print epilog + if (self->epilog) + printf("%s\n", self->epilog); +} + +int +argparse_help_cb_no_exit(struct argparse *self, + const struct argparse_option *option) +{ + (void)option; + argparse_usage(self); + return (EXIT_SUCCESS); +} + +int +argparse_help_cb(struct argparse *self, const struct argparse_option *option) +{ + argparse_help_cb_no_exit(self, option); + exit(EXIT_SUCCESS); +} + +#endif /* ARGPARSE_C_H */ diff --git a/6-sigsnoop/eunomia-include/argparse/argparse.h b/6-sigsnoop/eunomia-include/argparse/argparse.h new file mode 100644 index 0000000..fd1ddfc --- /dev/null +++ b/6-sigsnoop/eunomia-include/argparse/argparse.h @@ -0,0 +1,133 @@ +/** + * Copyright (C) 2012-2015 Yecheng Fu + * All rights reserved. + * + * Use of this source code is governed by a MIT-style license that can be found + * in the LICENSE file. 
+ */ +#ifndef ARGPARSE_H +#define ARGPARSE_H + +/* For c++ compatibility */ +#ifdef __cplusplus +extern "C" { +#endif + +#include + +struct argparse; +struct argparse_option; + +typedef int argparse_callback (struct argparse *self, + const struct argparse_option *option); + +enum argparse_flag { + ARGPARSE_STOP_AT_NON_OPTION = 1 << 0, + ARGPARSE_IGNORE_UNKNOWN_ARGS = 1 << 1, +}; + +enum argparse_option_type { + /* special */ + ARGPARSE_OPT_END, + ARGPARSE_OPT_GROUP, + /* options with no arguments */ + ARGPARSE_OPT_BOOLEAN, + ARGPARSE_OPT_BIT, + /* options with arguments (optional or required) */ + ARGPARSE_OPT_INTEGER, + ARGPARSE_OPT_FLOAT, + ARGPARSE_OPT_STRING, +}; + +enum argparse_option_flags { + OPT_NONEG = 1, /* disable negation */ +}; + +/** + * argparse option + * + * `type`: + * holds the type of the option, you must have an ARGPARSE_OPT_END last in your + * array. + * + * `short_name`: + * the character to use as a short option name, '\0' if none. + * + * `long_name`: + * the long option name, without the leading dash, NULL if none. + * + * `value`: + * stores pointer to the value to be filled. + * + * `help`: + * the short help message associated to what the option does. + * Must never be NULL (except for ARGPARSE_OPT_END). + * + * `callback`: + * function is called when corresponding argument is parsed. + * + * `data`: + * associated data. Callbacks can use it like they want. + * + * `flags`: + * option flags. 
+ */ +struct argparse_option { + enum argparse_option_type type; + const char short_name; + const char *long_name; + void *value; + const char *help; + argparse_callback *callback; + intptr_t data; + int flags; +}; + +/** + * argparse + */ +struct argparse { + // user supplied + const struct argparse_option *options; + const char *const *usages; + int flags; + const char *description; // a description after usage + const char *epilog; // a description at the end + // internal context + int argc; + const char **argv; + const char **out; + int cpidx; + const char *optvalue; // current option value +}; + +// built-in callbacks +int argparse_help_cb(struct argparse *self, + const struct argparse_option *option); +int argparse_help_cb_no_exit(struct argparse *self, + const struct argparse_option *option); + +// built-in option macros +#define OPT_END() { ARGPARSE_OPT_END, 0, NULL, NULL, 0, NULL, 0, 0 } +#define OPT_BOOLEAN(...) { ARGPARSE_OPT_BOOLEAN, __VA_ARGS__ } +#define OPT_BIT(...) { ARGPARSE_OPT_BIT, __VA_ARGS__ } +#define OPT_INTEGER(...) { ARGPARSE_OPT_INTEGER, __VA_ARGS__ } +#define OPT_FLOAT(...) { ARGPARSE_OPT_FLOAT, __VA_ARGS__ } +#define OPT_STRING(...)
{ ARGPARSE_OPT_STRING, __VA_ARGS__ } +#define OPT_GROUP(h) { ARGPARSE_OPT_GROUP, 0, NULL, NULL, h, NULL, 0, 0 } +#define OPT_HELP() OPT_BOOLEAN('h', "help", NULL, \ + "show this help message and exit", \ + argparse_help_cb, 0, OPT_NONEG) + +int argparse_init(struct argparse *self, struct argparse_option *options, + const char *const *usages, int flags); +void argparse_describe(struct argparse *self, const char *description, + const char *epilog); +int argparse_parse(struct argparse *self, int argc, const char **argv); +void argparse_usage(struct argparse *self); + +#ifdef __cplusplus +} +#endif + +#endif diff --git a/6-sigsnoop/eunomia-include/cJSON/cJSON.c b/6-sigsnoop/eunomia-include/cJSON/cJSON.c new file mode 100644 index 0000000..270c1c8 --- /dev/null +++ b/6-sigsnoop/eunomia-include/cJSON/cJSON.c @@ -0,0 +1,2917 @@ +#ifndef CJSON_SRC_H +#define CJSON_SRC_H +/* + Copyright (c) 2009-2017 Dave Gamble and cJSON contributors + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + The above copyright notice and this permission notice shall be included in + all copies or substantial portions of the Software. + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + THE SOFTWARE. 
+ */ + +/* cJSON */ +/* JSON parser in C. */ + +/* disable warnings about old C89 functions in MSVC */ +#if !defined(_CRT_SECURE_NO_DEPRECATE) && defined(_MSC_VER) +#define _CRT_SECURE_NO_DEPRECATE +#endif + +#ifdef __GNUC__ +#pragma GCC visibility push(default) +#endif +#if defined(_MSC_VER) +#pragma warning(push) +/* disable warning about single line comments in system headers */ +#pragma warning(disable : 4001) +#endif + +#ifndef true +/* define our own boolean type */ +#define true ((cJSON_bool)1) +#endif +#ifndef false +#define false ((cJSON_bool)0) +#endif + +// declare to disable warning +void *realloc(void *__ptr, size_t __size); + +#include "cJSON.h" +#include +#include +#include +#include +#include + +// a basic strtod implementation +double +strtod(const char *str, char **endptr) +{ + double result = 0.0; + char signedResult = '\0'; + char signedExponent = '\0'; + int decimals = 0; + int isExponent = false; + int hasExponent = false; + int hasResult = false; + // exponent is logically int but is coded as double so that its eventual + // overflow detection can be the same as for double result + double exponent = 0; + char c; + + for (; '\0' != (c = *str); ++str) { + if ((c >= '0') && (c <= '9')) { + int digit = c - '0'; + if (isExponent) { + exponent = (10 * exponent) + digit; + hasExponent = true; + } + else if (decimals == 0) { + result = (10 * result) + digit; + hasResult = true; + } + else { + result += (double)digit / decimals; + decimals *= 10; + } + continue; + } + + if (c == '.') { + if (!hasResult) + break; // don't allow leading '.' + if (isExponent) + break; // don't allow decimal places in exponent + if (decimals != 0) + break; // this is the 2nd time we've found a '.' 
+ + decimals = 10; + continue; + } + + if ((c == '-') || (c == '+')) { + if (isExponent) { + if (signedExponent || (exponent != 0)) + break; + else + signedExponent = c; + } + else { + if (signedResult || (result != 0)) + break; + else + signedResult = c; + } + continue; + } + + if (c == 'E') { + if (!hasResult) + break; // don't allow leading 'E' + if (isExponent) + break; + else + isExponent = true; + continue; + } + + break; // unexpected character + } + + if (isExponent && !hasExponent) { + while (*str != 'E') + --str; + } + + if (!hasResult && signedResult) + --str; + + if (endptr) + *endptr = (char *)(str); + + for (; exponent != 0; --exponent) { + if (signedExponent == '-') + result /= 10; + else + result *= 10; + } + + if (signedResult == '-' && result != 0) + result = -result; + + return result; +} + +#ifdef ENABLE_LOCALES +#include +#endif + +#if defined(_MSC_VER) +#pragma warning(pop) +#endif +#ifdef __GNUC__ +#pragma GCC visibility pop +#endif + +typedef struct { + const unsigned char *json; + size_t position; +} error; +static error global_error = { NULL, 0 }; + +CJSON_PUBLIC(const char *) cJSON_GetErrorPtr(void) +{ + return (const char *)(global_error.json + global_error.position); +} + +CJSON_PUBLIC(char *) cJSON_GetStringValue(cJSON *item) +{ + if (!cJSON_IsString(item)) { + return NULL; + } + + return item->valuestring; +} + +/* This is a safeguard to prevent copy-pasters from using incompatible C and + * header files */ +#if (CJSON_VERSION_MAJOR != 1) || (CJSON_VERSION_MINOR != 7) \ + || (CJSON_VERSION_PATCH != 10) +#error cJSON.h and cJSON.c have different versions. Make sure that both have the same. 
+#endif + +CJSON_PUBLIC(const char *) cJSON_Version(void) +{ + static char version[15]; + snprintf(version, sizeof(version), "%i.%i.%i", CJSON_VERSION_MAJOR, + CJSON_VERSION_MINOR, CJSON_VERSION_PATCH); + + return version; +} + +/* Case insensitive string comparison, doesn't consider two NULL pointers equal + * though */ +static int +case_insensitive_strcmp(const unsigned char *string1, + const unsigned char *string2) +{ + if ((string1 == NULL) || (string2 == NULL)) { + return 1; + } + + if (string1 == string2) { + return 0; + } + + for (; tolower(*string1) == tolower(*string2); (void)string1++, string2++) { + if (*string1 == '\0') { + return 0; + } + } + + return tolower(*string1) - tolower(*string2); +} + +typedef struct internal_hooks { + void *(CJSON_CDECL *allocate)(size_t size); + void(CJSON_CDECL *deallocate)(void *pointer); + void *(CJSON_CDECL *reallocate)(void *pointer, size_t size); +} internal_hooks; + +#if defined(_MSC_VER) +/* work around MSVC error C2322: '...' address of dillimport '...' 
+ is not static */ +static void *CJSON_CDECL +internal_malloc(size_t size) +{ + return malloc(size); +} +static void CJSON_CDECL +internal_free(void *pointer) +{ + free(pointer); +} +static void *CJSON_CDECL +internal_realloc(void *pointer, size_t size) +{ + return realloc(pointer, size); +} +#else +#define internal_malloc malloc +#define internal_free free +#define internal_realloc realloc +#endif + +/* clang-format off */ +static internal_hooks global_hooks = { + internal_malloc, + internal_free, + internal_realloc +}; +/* clang-format on */ + +static unsigned char * +cJSON_strdup(const unsigned char *string, const internal_hooks *const hooks) +{ + size_t length = 0; + unsigned char *copy = NULL; + + if (string == NULL) { + return NULL; + } + + length = strlen((const char *)string) + sizeof(""); + copy = (unsigned char *)hooks->allocate(length); + if (copy == NULL) { + return NULL; + } + memcpy(copy, string, length); + + return copy; +} + +CJSON_PUBLIC(void) cJSON_InitHooks(cJSON_Hooks *hooks) +{ + if (hooks == NULL) { + /* Reset hooks */ + global_hooks.allocate = malloc; + global_hooks.deallocate = free; + global_hooks.reallocate = realloc; + return; + } + + global_hooks.allocate = malloc; + if (hooks->malloc_fn != NULL) { + global_hooks.allocate = hooks->malloc_fn; + } + + global_hooks.deallocate = free; + if (hooks->free_fn != NULL) { + global_hooks.deallocate = hooks->free_fn; + } + + /* use realloc only if both free and malloc are used */ + global_hooks.reallocate = NULL; + if ((global_hooks.allocate == malloc) + && (global_hooks.deallocate == free)) { + global_hooks.reallocate = realloc; + } +} + +/* Internal constructor. */ +static cJSON * +cJSON_New_Item(const internal_hooks *const hooks) +{ + cJSON *node = (cJSON *)hooks->allocate(sizeof(cJSON)); + if (node) { + memset(node, '\0', sizeof(cJSON)); + } + + return node; +} + +/* Delete a cJSON structure. 
*/ +CJSON_PUBLIC(void) cJSON_Delete(cJSON *item) +{ + cJSON *next = NULL; + while (item != NULL) { + next = item->next; + if (!(item->type & cJSON_IsReference) && (item->child != NULL)) { + cJSON_Delete(item->child); + } + if (!(item->type & cJSON_IsReference) && (item->valuestring != NULL)) { + global_hooks.deallocate(item->valuestring); + } + if (!(item->type & cJSON_StringIsConst) && (item->string != NULL)) { + global_hooks.deallocate(item->string); + } + global_hooks.deallocate(item); + item = next; + } +} + +/* get the decimal point character of the current locale */ +static unsigned char +get_decimal_point(void) +{ +#ifdef ENABLE_LOCALES + struct lconv *lconv = localeconv(); + return (unsigned char)lconv->decimal_point[0]; +#else + return '.'; +#endif +} + +typedef struct { + const unsigned char *content; + size_t length; + size_t offset; + size_t depth; /* How deeply nested (in arrays/objects) is the input at the + current offset. */ + internal_hooks hooks; +} parse_buffer; + +/* check if the given size is left to read in a given parse buffer (starting + * with 1) */ +#define can_read(buffer, size) \ + ((buffer != NULL) && (((buffer)->offset + size) <= (buffer)->length)) +/* check if the buffer can be accessed at the given index (starting with 0) */ +#define can_access_at_index(buffer, index) \ + ((buffer != NULL) && (((buffer)->offset + index) < (buffer)->length)) +#define cannot_access_at_index(buffer, index) \ + (!can_access_at_index(buffer, index)) +/* get a pointer to the buffer at the position */ +#define buffer_at_offset(buffer) ((buffer)->content + (buffer)->offset) + +/* Parse the input text to generate a number, and populate the result + into item. 
*/ +static cJSON_bool +parse_number(cJSON *const item, parse_buffer *const input_buffer) +{ + double number = 0; + unsigned char *after_end = NULL; + unsigned char number_c_string[64]; + unsigned char decimal_point = get_decimal_point(); + size_t i = 0; + + if ((input_buffer == NULL) || (input_buffer->content == NULL)) { + return false; + } + + /* copy the number into a temporary buffer and replace '.' with the decimal + * point of the current locale (for strtod) + * This also takes care of '\0' not necessarily being available for marking + * the end of the input */ + for (i = 0; (i < (sizeof(number_c_string) - 1)) + && can_access_at_index(input_buffer, i); + i++) { + switch (buffer_at_offset(input_buffer)[i]) { + case '0': + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + case '+': + case '-': + case 'e': + case 'E': + number_c_string[i] = buffer_at_offset(input_buffer)[i]; + break; + + case '.': + number_c_string[i] = decimal_point; + break; + + default: + goto loop_end; + } + } +loop_end: + number_c_string[i] = '\0'; + + number = strtod((const char *)number_c_string, (char **)&after_end); + if (number_c_string == after_end) { + return false; /* parse_error */ + } + + item->valuedouble = number; + + /* use saturation in case of overflow */ + if (number >= INT_MAX) { + item->valueint = INT_MAX; + } + else if (number <= (double)INT_MIN) { + item->valueint = INT_MIN; + } + else { + item->valueint = (int)number; + } + + item->type = cJSON_Number; + + input_buffer->offset += (size_t)(after_end - number_c_string); + return true; +} + +/* don't ask me, but the original cJSON_SetNumberValue returns an integer or + * double */ +CJSON_PUBLIC(double) cJSON_SetNumberHelper(cJSON *object, double number) +{ + if (number >= INT_MAX) { + object->valueint = INT_MAX; + } + else if (number <= (double)INT_MIN) { + object->valueint = INT_MIN; + } + else { + object->valueint = (int)number; + } + + return object->valuedouble = 
number; +} + +typedef struct { + unsigned char *buffer; + size_t length; + size_t offset; + size_t depth; /* current nesting depth (for formatted printing) */ + cJSON_bool noalloc; + cJSON_bool format; /* is this print a formatted print */ + internal_hooks hooks; +} printbuffer; + +/* realloc printbuffer if necessary to have at least "needed" bytes more */ +static unsigned char * +ensure(printbuffer *const p, size_t needed) +{ + unsigned char *newbuffer = NULL; + size_t newsize = 0; + + if ((p == NULL) || (p->buffer == NULL)) { + return NULL; + } + + if ((p->length > 0) && (p->offset >= p->length)) { + /* make sure that offset is valid */ + return NULL; + } + + if (needed > INT_MAX) { + /* sizes bigger than INT_MAX are currently not supported */ + return NULL; + } + + needed += p->offset + 1; + if (needed <= p->length) { + return p->buffer + p->offset; + } + + if (p->noalloc) { + return NULL; + } + + /* calculate new buffer size */ + if (needed > (INT_MAX / 2)) { + /* overflow of int, use INT_MAX if possible */ + if (needed <= INT_MAX) { + newsize = INT_MAX; + } + else { + return NULL; + } + } + else { + newsize = needed * 2; + } + + if (p->hooks.reallocate != NULL) { + /* reallocate with realloc if available */ + newbuffer = (unsigned char *)p->hooks.reallocate(p->buffer, newsize); + if (newbuffer == NULL) { + p->hooks.deallocate(p->buffer); + p->length = 0; + p->buffer = NULL; + + return NULL; + } + } + else { + /* otherwise reallocate manually */ + newbuffer = (unsigned char *)p->hooks.allocate(newsize); + if (!newbuffer) { + p->hooks.deallocate(p->buffer); + p->length = 0; + p->buffer = NULL; + + return NULL; + } + if (newbuffer) { + memcpy(newbuffer, p->buffer, p->offset + 1); + } + p->hooks.deallocate(p->buffer); + } + p->length = newsize; + p->buffer = newbuffer; + + return newbuffer + p->offset; +} + +/* calculate the new length of the string in a printbuffer and update the offset + */ +static void +update_offset(printbuffer *const buffer) +{ + const 
unsigned char *buffer_pointer = NULL; + if ((buffer == NULL) || (buffer->buffer == NULL)) { + return; + } + buffer_pointer = buffer->buffer + buffer->offset; + + buffer->offset += strlen((const char *)buffer_pointer); +} + +/* Render the number nicely from the given item into a string. */ +static cJSON_bool +print_number(const cJSON *const item, printbuffer *const output_buffer) +{ + unsigned char *output_pointer = NULL; + double d = item->valuedouble; + int length = 0; + size_t i = 0; + unsigned char + number_buffer[26]; /* temporary buffer to print the number into */ + unsigned char decimal_point = get_decimal_point(); + double test; + + if (output_buffer == NULL) { + return false; + } + + /* This checks for NaN and Infinity */ + if ((d * 0) != 0) { + length = snprintf((char *)number_buffer, sizeof(number_buffer), "null"); + } + else { + /* Try 15 decimal places of precision to avoid nonsignificant nonzero + * digits */ + length = + snprintf((char *)number_buffer, sizeof(number_buffer), "%1.15g", d); + } + + /* snprintf failed or buffer overrun occurred */ + if ((length < 0) || (length > (int)(sizeof(number_buffer) - 1))) { + return false; + } + + /* reserve appropriate space in the output */ + output_pointer = ensure(output_buffer, (size_t)length + sizeof("")); + if (output_pointer == NULL) { + return false; + } + + /* copy the printed number to the output and replace locale + * dependent decimal point with '.'
*/ + for (i = 0; i < ((size_t)length); i++) { + if (number_buffer[i] == decimal_point) { + output_pointer[i] = '.'; + continue; + } + + output_pointer[i] = number_buffer[i]; + } + output_pointer[i] = '\0'; + + output_buffer->offset += (size_t)length; + + return true; +} + +/* parse 4 digit hexadecimal number */ +static unsigned +parse_hex4(const unsigned char *const input) +{ + unsigned int h = 0; + size_t i = 0; + + for (i = 0; i < 4; i++) { + /* parse digit */ + if ((input[i] >= '0') && (input[i] <= '9')) { + h += (unsigned int)input[i] - '0'; + } + else if ((input[i] >= 'A') && (input[i] <= 'F')) { + h += (unsigned int)10 + input[i] - 'A'; + } + else if ((input[i] >= 'a') && (input[i] <= 'f')) { + h += (unsigned int)10 + input[i] - 'a'; + } + else /* invalid */ + { + return 0; + } + + if (i < 3) { + /* shift left to make place for the next nibble */ + h = h << 4; + } + } + + return h; +} + +/* converts a UTF-16 literal to UTF-8 + * A literal can be one or two sequences of the form \uXXXX */ +static unsigned char +utf16_literal_to_utf8(const unsigned char *const input_pointer, + const unsigned char *const input_end, + unsigned char **output_pointer) +{ + long unsigned int codepoint = 0; + unsigned int first_code = 0; + const unsigned char *first_sequence = input_pointer; + unsigned char utf8_length = 0; + unsigned char utf8_position = 0; + unsigned char sequence_length = 0; + unsigned char first_byte_mark = 0; + + if ((input_end - first_sequence) < 6) { + /* input ends unexpectedly */ + goto fail; + } + + /* get the first utf16 sequence */ + first_code = parse_hex4(first_sequence + 2); + + /* check that the code is valid */ + if (((first_code >= 0xDC00) && (first_code <= 0xDFFF))) { + goto fail; + } + + /* UTF16 surrogate pair */ + if ((first_code >= 0xD800) && (first_code <= 0xDBFF)) { + const unsigned char *second_sequence = first_sequence + 6; + unsigned int second_code = 0; + sequence_length = 12; /* \uXXXX\uXXXX */ + + if ((input_end - second_sequence) < 6) 
{ + /* input ends unexpectedly */ + goto fail; + } + + if ((second_sequence[0] != '\\') || (second_sequence[1] != 'u')) { + /* missing second half of the surrogate pair */ + goto fail; + } + + /* get the second utf16 sequence */ + second_code = parse_hex4(second_sequence + 2); + /* check that the code is valid */ + if ((second_code < 0xDC00) || (second_code > 0xDFFF)) { + /* invalid second half of the surrogate pair */ + goto fail; + } + + /* calculate the unicode codepoint from the surrogate pair */ + codepoint = + 0x10000 + (((first_code & 0x3FF) << 10) | (second_code & 0x3FF)); + } + else { + sequence_length = 6; /* \uXXXX */ + codepoint = first_code; + } + + /* encode as UTF-8 + * takes at maximum 4 bytes to encode: + * 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */ + if (codepoint < 0x80) { + /* normal ascii, encoding 0xxxxxxx */ + utf8_length = 1; + } + else if (codepoint < 0x800) { + /* two bytes, encoding 110xxxxx 10xxxxxx */ + utf8_length = 2; + first_byte_mark = 0xC0; /* 11000000 */ + } + else if (codepoint < 0x10000) { + /* three bytes, encoding 1110xxxx 10xxxxxx 10xxxxxx */ + utf8_length = 3; + first_byte_mark = 0xE0; /* 11100000 */ + } + else if (codepoint <= 0x10FFFF) { + /* four bytes, encoding 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */ + utf8_length = 4; + first_byte_mark = 0xF0; /* 11110000 */ + } + else { + /* invalid unicode codepoint */ + goto fail; + } + + /* encode as utf8 */ + for (utf8_position = (unsigned char)(utf8_length - 1); utf8_position > 0; + utf8_position--) { + /* 10xxxxxx */ + (*output_pointer)[utf8_position] = + (unsigned char)((codepoint | 0x80) & 0xBF); + codepoint >>= 6; + } + /* encode first byte */ + if (utf8_length > 1) { + (*output_pointer)[0] = + (unsigned char)((codepoint | first_byte_mark) & 0xFF); + } + else { + (*output_pointer)[0] = (unsigned char)(codepoint & 0x7F); + } + + *output_pointer += utf8_length; + + return sequence_length; + +fail: + return 0; +} + +/* Parse the input text into an unescaped C string, and populate item.
*/ +static cJSON_bool +parse_string(cJSON *const item, parse_buffer *const input_buffer) +{ + const unsigned char *input_pointer = buffer_at_offset(input_buffer) + 1; + const unsigned char *input_end = buffer_at_offset(input_buffer) + 1; + unsigned char *output_pointer = NULL; + unsigned char *output = NULL; + + /* not a string */ + if (buffer_at_offset(input_buffer)[0] != '\"') { + goto fail; + } + + { + /* calculate approximate size of the output (overestimate) */ + size_t allocation_length = 0; + size_t skipped_bytes = 0; + while ( + ((size_t)(input_end - input_buffer->content) < input_buffer->length) + && (*input_end != '\"')) { + /* is escape sequence */ + if (input_end[0] == '\\') { + if ((size_t)(input_end + 1 - input_buffer->content) + >= input_buffer->length) { + /* prevent buffer overflow when last input character is a + * backslash */ + goto fail; + } + skipped_bytes++; + input_end++; + } + input_end++; + } + if (((size_t)(input_end - input_buffer->content) + >= input_buffer->length) + || (*input_end != '\"')) { + goto fail; + /* string ended unexpectedly */ + } + + /* This is at most how much we need for the output */ + allocation_length = (size_t)(input_end - buffer_at_offset(input_buffer)) + - skipped_bytes; + output = (unsigned char *)input_buffer->hooks.allocate(allocation_length + + sizeof("")); + if (output == NULL) { + goto fail; + /* allocation failure */ + } + } + + output_pointer = output; + /* loop through the string literal */ + while (input_pointer < input_end) { + if (*input_pointer != '\\') { + *output_pointer++ = *input_pointer++; + } + /* escape sequence */ + else { + unsigned char sequence_length = 2; + if ((input_end - input_pointer) < 1) { + goto fail; + } + + switch (input_pointer[1]) { + case 'b': + *output_pointer++ = '\b'; + break; + case 'f': + *output_pointer++ = '\f'; + break; + case 'n': + *output_pointer++ = '\n'; + break; + case 'r': + *output_pointer++ = '\r'; + break; + case 't': + *output_pointer++ = '\t'; + break; + 
case '\"': + case '\\': + case '/': + *output_pointer++ = input_pointer[1]; + break; + + /* UTF-16 literal */ + case 'u': + sequence_length = utf16_literal_to_utf8( + input_pointer, input_end, &output_pointer); + if (sequence_length == 0) { + /* failed to convert UTF16-literal to UTF-8 */ + goto fail; + } + break; + + default: + goto fail; + } + input_pointer += sequence_length; + } + } + + /* zero terminate the output */ + *output_pointer = '\0'; + + item->type = cJSON_String; + item->valuestring = (char *)output; + + input_buffer->offset = (size_t)(input_end - input_buffer->content); + input_buffer->offset++; + + return true; + +fail: + if (output != NULL) { + input_buffer->hooks.deallocate(output); + } + + if (input_pointer != NULL) { + input_buffer->offset = (size_t)(input_pointer - input_buffer->content); + } + + return false; +} + +/* Render the cstring provided to an escaped version that can be printed. */ +static cJSON_bool +print_string_ptr(const unsigned char *const input, + printbuffer *const output_buffer) +{ + const unsigned char *input_pointer = NULL; + unsigned char *output = NULL, *output_end; + unsigned char *output_pointer = NULL; + size_t output_length = 0; + /* number of additional characters needed for escaping */ + size_t escape_characters = 0; + + if (output_buffer == NULL) { + return false; + } + + /* empty string */ + if (input == NULL) { + output = ensure(output_buffer, sizeof("\"\"")); + if (output == NULL) { + return false; + } + strcpy((char *)output, "\"\""); + + return true; + } + + /* count the additional characters needed for escaping */ + for (input_pointer = input; *input_pointer; input_pointer++) { + switch (*input_pointer) { + case '\"': + case '\\': + case '\b': + case '\f': + case '\n': + case '\r': + case '\t': + /* one character escape sequence */ + escape_characters++; + break; + default: + if (*input_pointer < 32) { + /* UTF-16 escape sequence uXXXX */ + escape_characters += 5; + } + break; + } + } + output_length =
(size_t)(input_pointer - input) + escape_characters; + + output = ensure(output_buffer, output_length + sizeof("\"\"")); + if (output == NULL) { + return false; + } + output_end = output + output_length + sizeof("\"\""); + + /* no characters have to be escaped */ + if (escape_characters == 0) { + output[0] = '\"'; + memcpy(output + 1, input, output_length); + output[output_length + 1] = '\"'; + output[output_length + 2] = '\0'; + + return true; + } + + output[0] = '\"'; + output_pointer = output + 1; + /* copy the string */ + for (input_pointer = input; *input_pointer != '\0'; + (void)input_pointer++, output_pointer++) { + if ((*input_pointer > 31) && (*input_pointer != '\"') + && (*input_pointer != '\\')) { + /* normal character, copy */ + *output_pointer = *input_pointer; + } + else { + /* character needs to be escaped */ + *output_pointer++ = '\\'; + switch (*input_pointer) { + case '\\': + *output_pointer = '\\'; + break; + case '\"': + *output_pointer = '\"'; + break; + case '\b': + *output_pointer = 'b'; + break; + case '\f': + *output_pointer = 'f'; + break; + case '\n': + *output_pointer = 'n'; + break; + case '\r': + *output_pointer = 'r'; + break; + case '\t': + *output_pointer = 't'; + break; + default: + /* escape and print as unicode codepoint */ + snprintf((char *)output_pointer, + output_end - output_pointer, "u%04x", + *input_pointer); + output_pointer += 4; + break; + } + } + } + output[output_length + 1] = '\"'; + output[output_length + 2] = '\0'; + + return true; +} + +/* Invoke print_string_ptr (which is useful) on an item. */ +static cJSON_bool +print_string(const cJSON *const item, printbuffer *const p) +{ + return print_string_ptr((unsigned char *)item->valuestring, p); +} + +/* Predeclare these prototypes. 
*/ +static cJSON_bool +parse_value(cJSON *const item, parse_buffer *const input_buffer); +static cJSON_bool +print_value(const cJSON *const item, printbuffer *const output_buffer); +static cJSON_bool +parse_array(cJSON *const item, parse_buffer *const input_buffer); +static cJSON_bool +print_array(const cJSON *const item, printbuffer *const output_buffer); +static cJSON_bool +parse_object(cJSON *const item, parse_buffer *const input_buffer); +static cJSON_bool +print_object(const cJSON *const item, printbuffer *const output_buffer); + +/* Utility to jump whitespace and cr/lf */ +static parse_buffer * +buffer_skip_whitespace(parse_buffer *const buffer) +{ + if ((buffer == NULL) || (buffer->content == NULL)) { + return NULL; + } + + while (can_access_at_index(buffer, 0) + && (buffer_at_offset(buffer)[0] <= 32)) { + buffer->offset++; + } + + if (buffer->offset == buffer->length) { + buffer->offset--; + } + + return buffer; +} + +/* skip the UTF-8 BOM (byte order mark) if it is at the beginning of a buffer */ +static parse_buffer * +skip_utf8_bom(parse_buffer *const buffer) +{ + if ((buffer == NULL) || (buffer->content == NULL) + || (buffer->offset != 0)) { + return NULL; + } + + if (can_access_at_index(buffer, 4) + && (strncmp((const char *)buffer_at_offset(buffer), "\xEF\xBB\xBF", 3) + == 0)) { + buffer->offset += 3; + } + + return buffer; +} + +/* Parse an object - create a new root, and populate. 
*/ +CJSON_PUBLIC(cJSON *) +cJSON_ParseWithOpts(const char *value, const char **return_parse_end, + cJSON_bool require_null_terminated) +{ + parse_buffer buffer = { 0, 0, 0, 0, { 0, 0, 0 } }; + cJSON *item = NULL; + + /* reset error position */ + global_error.json = NULL; + global_error.position = 0; + + if (value == NULL) { + goto fail; + } + + buffer.content = (const unsigned char *)value; + buffer.length = strlen((const char *)value) + sizeof(""); + buffer.offset = 0; + buffer.hooks = global_hooks; + + item = cJSON_New_Item(&global_hooks); + if (item == NULL) /* memory fail */ + { + goto fail; + } + + if (!parse_value(item, buffer_skip_whitespace(skip_utf8_bom(&buffer)))) { + /* parse failure. ep is set. */ + goto fail; + } + + /* if we require null-terminated JSON without appended garbage, skip and + * then check for a null terminator */ + if (require_null_terminated) { + buffer_skip_whitespace(&buffer); + if ((buffer.offset >= buffer.length) + || buffer_at_offset(&buffer)[0] != '\0') { + goto fail; + } + } + if (return_parse_end) { + *return_parse_end = (const char *)buffer_at_offset(&buffer); + } + + return item; + +fail: + if (item != NULL) { + cJSON_Delete(item); + } + + if (value != NULL) { + error local_error; + local_error.json = (const unsigned char *)value; + local_error.position = 0; + + if (buffer.offset < buffer.length) { + local_error.position = buffer.offset; + } + else if (buffer.length > 0) { + local_error.position = buffer.length - 1; + } + + if (return_parse_end != NULL) { + *return_parse_end = + (const char *)local_error.json + local_error.position; + } + + global_error = local_error; + } + + return NULL; +} + +/* Default options for cJSON_Parse */ +CJSON_PUBLIC(cJSON *) cJSON_Parse(const char *value) +{ + return cJSON_ParseWithOpts(value, 0, 0); +} + +#define cjson_min(a, b) ((a < b) ? 
a : b) + +static unsigned char * +print(const cJSON *const item, cJSON_bool format, + const internal_hooks *const hooks) +{ + static const size_t default_buffer_size = 256; + printbuffer buffer[1]; + unsigned char *printed = NULL; + + memset(buffer, 0, sizeof(buffer)); + + /* create buffer */ + buffer->buffer = (unsigned char *)hooks->allocate(default_buffer_size); + buffer->length = default_buffer_size; + buffer->format = format; + buffer->hooks = *hooks; + if (buffer->buffer == NULL) { + goto fail; + } + + /* print the value */ + if (!print_value(item, buffer)) { + goto fail; + } + update_offset(buffer); + + /* check if reallocate is available */ + if (hooks->reallocate != NULL) { + printed = (unsigned char *)hooks->reallocate(buffer->buffer, + buffer->offset + 1); + if (printed == NULL) { + goto fail; + } + buffer->buffer = NULL; + } + else /* otherwise copy the JSON over to a new buffer */ + { + printed = (unsigned char *)hooks->allocate(buffer->offset + 1); + if (printed == NULL) { + goto fail; + } + memcpy(printed, buffer->buffer, + cjson_min(buffer->length, buffer->offset + 1)); + printed[buffer->offset] = '\0'; /* just to be sure */ + + /* free the buffer */ + hooks->deallocate(buffer->buffer); + } + + return printed; + +fail: + if (buffer->buffer != NULL) { + hooks->deallocate(buffer->buffer); + } + + if (printed != NULL) { + hooks->deallocate(printed); + } + + return NULL; +} + +/* Render a cJSON item/entity/structure to text. 
*/ +CJSON_PUBLIC(char *) cJSON_Print(const cJSON *item) +{ + return (char *)print(item, true, &global_hooks); +} + +CJSON_PUBLIC(char *) cJSON_PrintUnformatted(const cJSON *item) +{ + return (char *)print(item, false, &global_hooks); +} + +CJSON_PUBLIC(char *) +cJSON_PrintBuffered(const cJSON *item, int prebuffer, cJSON_bool fmt) +{ + printbuffer p = { 0, 0, 0, 0, 0, 0, { 0, 0, 0 } }; + + if (prebuffer < 0) { + return NULL; + } + + p.buffer = (unsigned char *)global_hooks.allocate((size_t)prebuffer); + if (!p.buffer) { + return NULL; + } + + p.length = (size_t)prebuffer; + p.offset = 0; + p.noalloc = false; + p.format = fmt; + p.hooks = global_hooks; + + if (!print_value(item, &p)) { + global_hooks.deallocate(p.buffer); + return NULL; + } + + return (char *)p.buffer; +} + +CJSON_PUBLIC(cJSON_bool) +cJSON_PrintPreallocated(cJSON *item, char *buf, const int len, + const cJSON_bool fmt) +{ + printbuffer p = { 0, 0, 0, 0, 0, 0, { 0, 0, 0 } }; + + if ((len < 0) || (buf == NULL)) { + return false; + } + + p.buffer = (unsigned char *)buf; + p.length = (size_t)len; + p.offset = 0; + p.noalloc = true; + p.format = fmt; + p.hooks = global_hooks; + + return print_value(item, &p); +} + +/* Parser core - when encountering text, process appropriately. 
*/ +static cJSON_bool +parse_value(cJSON *const item, parse_buffer *const input_buffer) +{ + if ((input_buffer == NULL) || (input_buffer->content == NULL)) { + return false; /* no input */ + } + + /* parse the different types of values */ + /* null */ + if (can_read(input_buffer, 4) + && (strncmp((const char *)buffer_at_offset(input_buffer), "null", 4) + == 0)) { + item->type = cJSON_NULL; + input_buffer->offset += 4; + return true; + } + /* false */ + if (can_read(input_buffer, 5) + && (strncmp((const char *)buffer_at_offset(input_buffer), "false", 5) + == 0)) { + item->type = cJSON_False; + input_buffer->offset += 5; + return true; + } + /* true */ + if (can_read(input_buffer, 4) + && (strncmp((const char *)buffer_at_offset(input_buffer), "true", 4) + == 0)) { + item->type = cJSON_True; + item->valueint = 1; + input_buffer->offset += 4; + return true; + } + /* string */ + if (can_access_at_index(input_buffer, 0) + && (buffer_at_offset(input_buffer)[0] == '\"')) { + return parse_string(item, input_buffer); + } + /* number */ + if (can_access_at_index(input_buffer, 0) + && ((buffer_at_offset(input_buffer)[0] == '-') + || ((buffer_at_offset(input_buffer)[0] >= '0') + && (buffer_at_offset(input_buffer)[0] <= '9')))) { + return parse_number(item, input_buffer); + } + /* array */ + if (can_access_at_index(input_buffer, 0) + && (buffer_at_offset(input_buffer)[0] == '[')) { + return parse_array(item, input_buffer); + } + /* object */ + if (can_access_at_index(input_buffer, 0) + && (buffer_at_offset(input_buffer)[0] == '{')) { + return parse_object(item, input_buffer); + } + + return false; +} + +/* Render a value to text. 
*/ +static cJSON_bool +print_value(const cJSON *const item, printbuffer *const output_buffer) +{ + unsigned char *output = NULL; + + if ((item == NULL) || (output_buffer == NULL)) { + return false; + } + + switch ((item->type) & 0xFF) { + case cJSON_NULL: + output = ensure(output_buffer, 5); + if (output == NULL) { + return false; + } + strcpy((char *)output, "null"); + return true; + + case cJSON_False: + output = ensure(output_buffer, 6); + if (output == NULL) { + return false; + } + strcpy((char *)output, "false"); + return true; + + case cJSON_True: + output = ensure(output_buffer, 5); + if (output == NULL) { + return false; + } + strcpy((char *)output, "true"); + return true; + + case cJSON_Number: + return print_number(item, output_buffer); + + case cJSON_Raw: + { + size_t raw_length = 0; + if (item->valuestring == NULL) { + return false; + } + + raw_length = strlen(item->valuestring) + sizeof(""); + output = ensure(output_buffer, raw_length); + if (output == NULL) { + return false; + } + memcpy(output, item->valuestring, raw_length); + return true; + } + + case cJSON_String: + return print_string(item, output_buffer); + + case cJSON_Array: + return print_array(item, output_buffer); + + case cJSON_Object: + return print_object(item, output_buffer); + + default: + return false; + } +} + +/* Build an array from input text. 
*/ +static cJSON_bool +parse_array(cJSON *const item, parse_buffer *const input_buffer) +{ + cJSON *head = NULL; /* head of the linked list */ + cJSON *current_item = NULL; + + if (input_buffer->depth >= CJSON_NESTING_LIMIT) { + return false; /* too deeply nested */ + } + input_buffer->depth++; + + if (buffer_at_offset(input_buffer)[0] != '[') { + /* not an array */ + goto fail; + } + + input_buffer->offset++; + buffer_skip_whitespace(input_buffer); + if (can_access_at_index(input_buffer, 0) + && (buffer_at_offset(input_buffer)[0] == ']')) { + /* empty array */ + goto success; + } + + /* check if we skipped to the end of the buffer */ + if (cannot_access_at_index(input_buffer, 0)) { + input_buffer->offset--; + goto fail; + } + + /* step back to character in front of the first element */ + input_buffer->offset--; + /* loop through the comma separated array elements */ + do { + /* allocate next item */ + cJSON *new_item = cJSON_New_Item(&(input_buffer->hooks)); + if (new_item == NULL) { + goto fail; + /* allocation failure */ + } + + /* attach next item to list */ + if (head == NULL) { + /* start the linked list */ + current_item = head = new_item; + } + else { + /* add to the end and advance */ + current_item->next = new_item; + new_item->prev = current_item; + current_item = new_item; + } + + /* parse next value */ + input_buffer->offset++; + buffer_skip_whitespace(input_buffer); + if (!parse_value(current_item, input_buffer)) { + goto fail; + /* failed to parse value */ + } + buffer_skip_whitespace(input_buffer); + } while (can_access_at_index(input_buffer, 0) + && (buffer_at_offset(input_buffer)[0] == ',')); + + if (cannot_access_at_index(input_buffer, 0) + || buffer_at_offset(input_buffer)[0] != ']') { + goto fail; + /* expected end of array */ + } + +success: + input_buffer->depth--; + + item->type = cJSON_Array; + item->child = head; + + input_buffer->offset++; + + return true; + +fail: + if (head != NULL) { + cJSON_Delete(head); + } + + return false; +} + +/*
Render an array to text */ +static cJSON_bool +print_array(const cJSON *const item, printbuffer *const output_buffer) +{ + unsigned char *output_pointer = NULL; + size_t length = 0; + cJSON *current_element = item->child; + + if (output_buffer == NULL) { + return false; + } + + /* Compose the output array. */ + /* opening square bracket */ + output_pointer = ensure(output_buffer, 1); + if (output_pointer == NULL) { + return false; + } + + *output_pointer = '['; + output_buffer->offset++; + output_buffer->depth++; + + while (current_element != NULL) { + if (!print_value(current_element, output_buffer)) { + return false; + } + update_offset(output_buffer); + if (current_element->next) { + length = (size_t)(output_buffer->format ? 2 : 1); + output_pointer = ensure(output_buffer, length + 1); + if (output_pointer == NULL) { + return false; + } + *output_pointer++ = ','; + if (output_buffer->format) { + *output_pointer++ = ' '; + } + *output_pointer = '\0'; + output_buffer->offset += length; + } + current_element = current_element->next; + } + + output_pointer = ensure(output_buffer, 2); + if (output_pointer == NULL) { + return false; + } + *output_pointer++ = ']'; + *output_pointer = '\0'; + output_buffer->depth--; + + return true; +} + +/* Build an object from the text. 
*/ +static cJSON_bool +parse_object(cJSON *const item, parse_buffer *const input_buffer) +{ + cJSON *head = NULL; /* linked list head */ + cJSON *current_item = NULL; + + if (input_buffer->depth >= CJSON_NESTING_LIMIT) { + return false; /* too deeply nested */ + } + input_buffer->depth++; + + if (cannot_access_at_index(input_buffer, 0) + || (buffer_at_offset(input_buffer)[0] != '{')) { + goto fail; + /* not an object */ + } + + input_buffer->offset++; + buffer_skip_whitespace(input_buffer); + if (can_access_at_index(input_buffer, 0) + && (buffer_at_offset(input_buffer)[0] == '}')) { + goto success; + /* empty object */ + } + + /* check if we skipped to the end of the buffer */ + if (cannot_access_at_index(input_buffer, 0)) { + input_buffer->offset--; + goto fail; + } + + /* step back to character in front of the first element */ + input_buffer->offset--; + /* loop through the comma separated object members */ + do { + /* allocate next item */ + cJSON *new_item = cJSON_New_Item(&(input_buffer->hooks)); + if (new_item == NULL) { + goto fail; + /* allocation failure */ + } + + /* attach next item to list */ + if (head == NULL) { + /* start the linked list */ + current_item = head = new_item; + } + else { + /* add to the end and advance */ + current_item->next = new_item; + new_item->prev = current_item; + current_item = new_item; + } + + /* parse the name of the child */ + input_buffer->offset++; + buffer_skip_whitespace(input_buffer); + if (!parse_string(current_item, input_buffer)) { + goto fail; + /* failed to parse name */ + } + buffer_skip_whitespace(input_buffer); + + /* swap valuestring and string, because we parsed the name */ + current_item->string = current_item->valuestring; + current_item->valuestring = NULL; + + if (cannot_access_at_index(input_buffer, 0) + || (buffer_at_offset(input_buffer)[0] != ':')) { + goto fail; + /* invalid object */ + } + + /* parse the value */ + input_buffer->offset++; + buffer_skip_whitespace(input_buffer); + if
(!parse_value(current_item, input_buffer)) { + goto fail; + /* failed to parse value */ + } + buffer_skip_whitespace(input_buffer); + } while (can_access_at_index(input_buffer, 0) + && (buffer_at_offset(input_buffer)[0] == ',')); + + if (cannot_access_at_index(input_buffer, 0) + || (buffer_at_offset(input_buffer)[0] != '}')) { + goto fail; + /* expected end of object */ + } + +success: + input_buffer->depth--; + + item->type = cJSON_Object; + item->child = head; + + input_buffer->offset++; + return true; + +fail: + if (head != NULL) { + cJSON_Delete(head); + } + + return false; +} + +/* Render an object to text. */ +static cJSON_bool +print_object(const cJSON *const item, printbuffer *const output_buffer) +{ + unsigned char *output_pointer = NULL; + size_t length = 0; + cJSON *current_item = item->child; + + if (output_buffer == NULL) { + return false; + } + + /* Compose the output: */ + length = (size_t)(output_buffer->format ? 2 : 1); /* fmt: {\n */ + output_pointer = ensure(output_buffer, length + 1); + if (output_pointer == NULL) { + return false; + } + + *output_pointer++ = '{'; + output_buffer->depth++; + if (output_buffer->format) { + *output_pointer++ = '\n'; + } + output_buffer->offset += length; + + while (current_item) { + if (output_buffer->format) { + size_t i; + output_pointer = ensure(output_buffer, output_buffer->depth); + if (output_pointer == NULL) { + return false; + } + for (i = 0; i < output_buffer->depth; i++) { + *output_pointer++ = '\t'; + } + output_buffer->offset += output_buffer->depth; + } + + /* print key */ + if (!print_string_ptr((unsigned char *)current_item->string, + output_buffer)) { + return false; + } + update_offset(output_buffer); + + length = (size_t)(output_buffer->format ? 
2 : 1); + output_pointer = ensure(output_buffer, length); + if (output_pointer == NULL) { + return false; + } + *output_pointer++ = ':'; + if (output_buffer->format) { + *output_pointer++ = '\t'; + } + output_buffer->offset += length; + + /* print value */ + if (!print_value(current_item, output_buffer)) { + return false; + } + update_offset(output_buffer); + + /* print comma if not last */ + length = ((size_t)(output_buffer->format ? 1 : 0) + + (size_t)(current_item->next ? 1 : 0)); + output_pointer = ensure(output_buffer, length + 1); + if (output_pointer == NULL) { + return false; + } + if (current_item->next) { + *output_pointer++ = ','; + } + + if (output_buffer->format) { + *output_pointer++ = '\n'; + } + *output_pointer = '\0'; + output_buffer->offset += length; + + current_item = current_item->next; + } + + output_pointer = ensure( + output_buffer, output_buffer->format ? (output_buffer->depth + 1) : 2); + if (output_pointer == NULL) { + return false; + } + if (output_buffer->format) { + size_t i; + for (i = 0; i < (output_buffer->depth - 1); i++) { + *output_pointer++ = '\t'; + } + } + *output_pointer++ = '}'; + *output_pointer = '\0'; + output_buffer->depth--; + + return true; +} + +/* Get Array size/item / object item. */ +CJSON_PUBLIC(int) cJSON_GetArraySize(const cJSON *array) +{ + cJSON *child = NULL; + size_t size = 0; + + if (array == NULL) { + return 0; + } + + child = array->child; + + while (child != NULL) { + size++; + child = child->next; + } + + /* FIXME: Can overflow here. 
Cannot be fixed without breaking the API */ + + return (int)size; +} + +static cJSON * +get_array_item(const cJSON *array, size_t index) +{ + cJSON *current_child = NULL; + + if (array == NULL) { + return NULL; + } + + current_child = array->child; + while ((current_child != NULL) && (index > 0)) { + index--; + current_child = current_child->next; + } + + return current_child; +} + +CJSON_PUBLIC(cJSON *) cJSON_GetArrayItem(const cJSON *array, int index) +{ + if (index < 0) { + return NULL; + } + + return get_array_item(array, (size_t)index); +} + +static cJSON * +get_object_item(const cJSON *const object, const char *const name, + const cJSON_bool case_sensitive) +{ + cJSON *current_element = NULL; + + if ((object == NULL) || (name == NULL)) { + return NULL; + } + + current_element = object->child; + if (case_sensitive) { + while ((current_element != NULL) && (current_element->string != NULL) + && (strcmp(name, current_element->string) != 0)) { + current_element = current_element->next; + } + } + else { + while ((current_element != NULL) + && (case_insensitive_strcmp( + (const unsigned char *)name, + (const unsigned char *)(current_element->string)) + != 0)) { + current_element = current_element->next; + } + } + + if ((current_element == NULL) || (current_element->string == NULL)) { + return NULL; + } + + return current_element; +} + +CJSON_PUBLIC(cJSON *) +cJSON_GetObjectItem(const cJSON *const object, const char *const string) +{ + return get_object_item(object, string, false); +} + +CJSON_PUBLIC(cJSON *) +cJSON_GetObjectItemCaseSensitive(const cJSON *const object, + const char *const string) +{ + return get_object_item(object, string, true); +} + +CJSON_PUBLIC(cJSON_bool) +cJSON_HasObjectItem(const cJSON *object, const char *string) +{ + return cJSON_GetObjectItem(object, string) ? 1 : 0; +} + +/* Utility for array list handling. 
*/ +static void +suffix_object(cJSON *prev, cJSON *item) +{ + prev->next = item; + item->prev = prev; +} + +/* Utility for handling references. */ +static cJSON * +create_reference(const cJSON *item, const internal_hooks *const hooks) +{ + cJSON *reference = NULL; + if (item == NULL) { + return NULL; + } + + reference = cJSON_New_Item(hooks); + if (reference == NULL) { + return NULL; + } + + memcpy(reference, item, sizeof(cJSON)); + reference->string = NULL; + reference->type |= cJSON_IsReference; + reference->next = reference->prev = NULL; + return reference; +} + +static cJSON_bool +add_item_to_array(cJSON *array, cJSON *item) +{ + cJSON *child = NULL; + + if ((item == NULL) || (array == NULL)) { + return false; + } + + child = array->child; + + if (child == NULL) { + /* list is empty, start new one */ + array->child = item; + } + else { + /* append to the end */ + while (child->next) { + child = child->next; + } + suffix_object(child, item); + } + + return true; +} + +/* Add item to array/object. 
*/ +CJSON_PUBLIC(cJSON_bool) cJSON_AddItemToArray(cJSON *array, cJSON *item) +{ + return add_item_to_array(array, item); +} + +#if defined(__clang__) \ + || (defined(__GNUC__) \ + && ((__GNUC__ > 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ > 5)))) +#pragma GCC diagnostic push +#endif +#ifdef __GNUC__ +#pragma GCC diagnostic ignored "-Wcast-qual" +#endif +/* helper function to cast away const */ +static void * +cast_away_const(const void *string) +{ + return (void *)string; +} +#if defined(__clang__) \ + || (defined(__GNUC__) \ + && ((__GNUC__ > 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ > 5)))) +#pragma GCC diagnostic pop +#endif + +static cJSON_bool +add_item_to_object(cJSON *const object, const char *const string, + cJSON *const item, const internal_hooks *const hooks, + const cJSON_bool constant_key) +{ + char *new_key = NULL; + int new_type = cJSON_Invalid; + + if ((object == NULL) || (string == NULL) || (item == NULL)) { + return false; + } + + if (constant_key) { + new_key = (char *)cast_away_const(string); + new_type = item->type | cJSON_StringIsConst; + } + else { + new_key = (char *)cJSON_strdup((const unsigned char *)string, hooks); + if (new_key == NULL) { + return false; + } + + new_type = item->type & ~cJSON_StringIsConst; + } + + if (!(item->type & cJSON_StringIsConst) && (item->string != NULL)) { + hooks->deallocate(item->string); + } + + item->string = new_key; + item->type = new_type; + + return add_item_to_array(object, item); +} + +CJSON_PUBLIC(cJSON_bool) +cJSON_AddItemToObject(cJSON *object, const char *string, cJSON *item) +{ + return add_item_to_object(object, string, item, &global_hooks, false); +} + +/* Add an item to an object with constant string as key */ +CJSON_PUBLIC(cJSON_bool) +cJSON_AddItemToObjectCS(cJSON *object, const char *string, cJSON *item) +{ + return add_item_to_object(object, string, item, &global_hooks, true); +} + +CJSON_PUBLIC(cJSON_bool) +cJSON_AddItemReferenceToArray(cJSON *array, cJSON *item) +{ + if (array == NULL) { + 
return false; + } + + return add_item_to_array(array, create_reference(item, &global_hooks)); +} + +CJSON_PUBLIC(cJSON_bool) +cJSON_AddItemReferenceToObject(cJSON *object, const char *string, cJSON *item) +{ + if ((object == NULL) || (string == NULL)) { + return false; + } + + return add_item_to_object(object, string, + create_reference(item, &global_hooks), + &global_hooks, false); +} + +CJSON_PUBLIC(cJSON *) +cJSON_AddNullToObject(cJSON *const object, const char *const name) +{ + cJSON *null = cJSON_CreateNull(); + if (add_item_to_object(object, name, null, &global_hooks, false)) { + return null; + } + + cJSON_Delete(null); + return NULL; +} + +CJSON_PUBLIC(cJSON *) +cJSON_AddTrueToObject(cJSON *const object, const char *const name) +{ + cJSON *true_item = cJSON_CreateTrue(); + if (add_item_to_object(object, name, true_item, &global_hooks, false)) { + return true_item; + } + + cJSON_Delete(true_item); + return NULL; +} + +CJSON_PUBLIC(cJSON *) +cJSON_AddFalseToObject(cJSON *const object, const char *const name) +{ + cJSON *false_item = cJSON_CreateFalse(); + if (add_item_to_object(object, name, false_item, &global_hooks, false)) { + return false_item; + } + + cJSON_Delete(false_item); + return NULL; +} + +CJSON_PUBLIC(cJSON *) +cJSON_AddBoolToObject(cJSON *const object, const char *const name, + const cJSON_bool boolean) +{ + cJSON *bool_item = cJSON_CreateBool(boolean); + if (add_item_to_object(object, name, bool_item, &global_hooks, false)) { + return bool_item; + } + + cJSON_Delete(bool_item); + return NULL; +} + +CJSON_PUBLIC(cJSON *) +cJSON_AddNumberToObject(cJSON *const object, const char *const name, + const double number) +{ + cJSON *number_item = cJSON_CreateNumber(number); + if (add_item_to_object(object, name, number_item, &global_hooks, false)) { + return number_item; + } + + cJSON_Delete(number_item); + return NULL; +} + +CJSON_PUBLIC(cJSON *) +cJSON_AddStringToObject(cJSON *const object, const char *const name, + const char *const string) +{ + cJSON 
*string_item = cJSON_CreateString(string); + if (add_item_to_object(object, name, string_item, &global_hooks, false)) { + return string_item; + } + + cJSON_Delete(string_item); + return NULL; +} + +CJSON_PUBLIC(cJSON *) +cJSON_AddRawToObject(cJSON *const object, const char *const name, + const char *const raw) +{ + cJSON *raw_item = cJSON_CreateRaw(raw); + if (add_item_to_object(object, name, raw_item, &global_hooks, false)) { + return raw_item; + } + + cJSON_Delete(raw_item); + return NULL; +} + +CJSON_PUBLIC(cJSON *) +cJSON_AddObjectToObject(cJSON *const object, const char *const name) +{ + cJSON *object_item = cJSON_CreateObject(); + if (add_item_to_object(object, name, object_item, &global_hooks, false)) { + return object_item; + } + + cJSON_Delete(object_item); + return NULL; +} + +CJSON_PUBLIC(cJSON *) +cJSON_AddArrayToObject(cJSON *const object, const char *const name) +{ + cJSON *array = cJSON_CreateArray(); + if (add_item_to_object(object, name, array, &global_hooks, false)) { + return array; + } + + cJSON_Delete(array); + return NULL; +} + +CJSON_PUBLIC(cJSON *) +cJSON_DetachItemViaPointer(cJSON *parent, cJSON *const item) +{ + if ((parent == NULL) || (item == NULL)) { + return NULL; + } + + if (item->prev != NULL) { + /* not the first element */ + item->prev->next = item->next; + } + if (item->next != NULL) { + /* not the last element */ + item->next->prev = item->prev; + } + + if (item == parent->child) { + /* first element */ + parent->child = item->next; + } + /* make sure the detached item doesn't point anywhere anymore */ + item->prev = NULL; + item->next = NULL; + + return item; +} + +CJSON_PUBLIC(cJSON *) cJSON_DetachItemFromArray(cJSON *array, int which) +{ + if (which < 0) { + return NULL; + } + + return cJSON_DetachItemViaPointer(array, + get_array_item(array, (size_t)which)); +} + +CJSON_PUBLIC(void) cJSON_DeleteItemFromArray(cJSON *array, int which) +{ + cJSON_Delete(cJSON_DetachItemFromArray(array, which)); +} + +CJSON_PUBLIC(cJSON *) 
+cJSON_DetachItemFromObject(cJSON *object, const char *string) +{ + cJSON *to_detach = cJSON_GetObjectItem(object, string); + + return cJSON_DetachItemViaPointer(object, to_detach); +} + +CJSON_PUBLIC(cJSON *) +cJSON_DetachItemFromObjectCaseSensitive(cJSON *object, const char *string) +{ + cJSON *to_detach = cJSON_GetObjectItemCaseSensitive(object, string); + + return cJSON_DetachItemViaPointer(object, to_detach); +} + +CJSON_PUBLIC(void) cJSON_DeleteItemFromObject(cJSON *object, const char *string) +{ + cJSON_Delete(cJSON_DetachItemFromObject(object, string)); +} + +CJSON_PUBLIC(void) +cJSON_DeleteItemFromObjectCaseSensitive(cJSON *object, const char *string) +{ + cJSON_Delete(cJSON_DetachItemFromObjectCaseSensitive(object, string)); +} + +/* Replace array/object items with new ones. */ +CJSON_PUBLIC(cJSON_bool) +cJSON_InsertItemInArray(cJSON *array, int which, cJSON *newitem) +{ + cJSON *after_inserted = NULL; + + if (which < 0) { + return false; + } + + after_inserted = get_array_item(array, (size_t)which); + if (after_inserted == NULL) { + return add_item_to_array(array, newitem); + } + + newitem->next = after_inserted; + newitem->prev = after_inserted->prev; + after_inserted->prev = newitem; + if (after_inserted == array->child) { + array->child = newitem; + } + else { + newitem->prev->next = newitem; + } + return true; +} + +CJSON_PUBLIC(cJSON_bool) +cJSON_ReplaceItemViaPointer(cJSON *const parent, cJSON *const item, + cJSON *replacement) +{ + if ((parent == NULL) || (replacement == NULL) || (item == NULL)) { + return false; + } + + if (replacement == item) { + return true; + } + + replacement->next = item->next; + replacement->prev = item->prev; + + if (replacement->next != NULL) { + replacement->next->prev = replacement; + } + if (replacement->prev != NULL) { + replacement->prev->next = replacement; + } + if (parent->child == item) { + parent->child = replacement; + } + + item->next = NULL; + item->prev = NULL; + cJSON_Delete(item); + + return true; +} + 
+CJSON_PUBLIC(void) +cJSON_ReplaceItemInArray(cJSON *array, int which, cJSON *newitem) +{ + if (which < 0) { + return; + } + + cJSON_ReplaceItemViaPointer(array, get_array_item(array, (size_t)which), + newitem); +} + +static cJSON_bool +replace_item_in_object(cJSON *object, const char *string, cJSON *replacement, + cJSON_bool case_sensitive) +{ + if ((replacement == NULL) || (string == NULL)) { + return false; + } + + /* replace the name in the replacement */ + if (!(replacement->type & cJSON_StringIsConst) + && (replacement->string != NULL)) { + cJSON_free(replacement->string); + } + replacement->string = + (char *)cJSON_strdup((const unsigned char *)string, &global_hooks); + replacement->type &= ~cJSON_StringIsConst; + + cJSON_ReplaceItemViaPointer( + object, get_object_item(object, string, case_sensitive), replacement); + + return true; +} + +CJSON_PUBLIC(void) +cJSON_ReplaceItemInObject(cJSON *object, const char *string, cJSON *newitem) +{ + replace_item_in_object(object, string, newitem, false); +} + +CJSON_PUBLIC(void) +cJSON_ReplaceItemInObjectCaseSensitive(cJSON *object, const char *string, + cJSON *newitem) +{ + replace_item_in_object(object, string, newitem, true); +} + +/* Create basic types: */ +CJSON_PUBLIC(cJSON *) cJSON_CreateNull(void) +{ + cJSON *item = cJSON_New_Item(&global_hooks); + if (item) { + item->type = cJSON_NULL; + } + + return item; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateTrue(void) +{ + cJSON *item = cJSON_New_Item(&global_hooks); + if (item) { + item->type = cJSON_True; + } + + return item; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateFalse(void) +{ + cJSON *item = cJSON_New_Item(&global_hooks); + if (item) { + item->type = cJSON_False; + } + + return item; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateBool(cJSON_bool b) +{ + cJSON *item = cJSON_New_Item(&global_hooks); + if (item) { + item->type = b ? 
cJSON_True : cJSON_False; + } + + return item; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateNumber(double num) +{ + cJSON *item = cJSON_New_Item(&global_hooks); + if (item) { + item->type = cJSON_Number; + item->valuedouble = num; + + /* use saturation in case of overflow */ + if (num >= INT_MAX) { + item->valueint = INT_MAX; + } + else if (num <= (double)INT_MIN) { + item->valueint = INT_MIN; + } + else { + item->valueint = (int)num; + } + } + + return item; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateString(const char *string) +{ + cJSON *item = cJSON_New_Item(&global_hooks); + if (item) { + item->type = cJSON_String; + item->valuestring = + (char *)cJSON_strdup((const unsigned char *)string, &global_hooks); + if (!item->valuestring) { + cJSON_Delete(item); + return NULL; + } + } + + return item; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateStringReference(const char *string) +{ + cJSON *item = cJSON_New_Item(&global_hooks); + if (item != NULL) { + item->type = cJSON_String | cJSON_IsReference; + item->valuestring = (char *)cast_away_const(string); + } + + return item; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateObjectReference(const cJSON *child) +{ + cJSON *item = cJSON_New_Item(&global_hooks); + if (item != NULL) { + item->type = cJSON_Object | cJSON_IsReference; + item->child = (cJSON *)cast_away_const(child); + } + + return item; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateArrayReference(const cJSON *child) +{ + cJSON *item = cJSON_New_Item(&global_hooks); + if (item != NULL) { + item->type = cJSON_Array | cJSON_IsReference; + item->child = (cJSON *)cast_away_const(child); + } + + return item; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateRaw(const char *raw) +{ + cJSON *item = cJSON_New_Item(&global_hooks); + if (item) { + item->type = cJSON_Raw; + item->valuestring = + (char *)cJSON_strdup((const unsigned char *)raw, &global_hooks); + if (!item->valuestring) { + cJSON_Delete(item); + return NULL; + } + } + + return item; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateArray(void) +{ + cJSON 
*item = cJSON_New_Item(&global_hooks); + if (item) { + item->type = cJSON_Array; + } + + return item; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateObject(void) +{ + cJSON *item = cJSON_New_Item(&global_hooks); + if (item) { + item->type = cJSON_Object; + } + + return item; +} + +/* Create Arrays: */ +CJSON_PUBLIC(cJSON *) cJSON_CreateIntArray(const int *numbers, int count) +{ + size_t i = 0; + cJSON *n = NULL; + cJSON *p = NULL; + cJSON *a = NULL; + + if ((count < 0) || (numbers == NULL)) { + return NULL; + } + + a = cJSON_CreateArray(); + for (i = 0; a && (i < (size_t)count); i++) { + n = cJSON_CreateNumber(numbers[i]); + if (!n) { + cJSON_Delete(a); + return NULL; + } + if (!i) { + a->child = n; + } + else { + suffix_object(p, n); + } + p = n; + } + + return a; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateFloatArray(const float *numbers, int count) +{ + size_t i = 0; + cJSON *n = NULL; + cJSON *p = NULL; + cJSON *a = NULL; + + if ((count < 0) || (numbers == NULL)) { + return NULL; + } + + a = cJSON_CreateArray(); + + for (i = 0; a && (i < (size_t)count); i++) { + n = cJSON_CreateNumber((double)numbers[i]); + if (!n) { + cJSON_Delete(a); + return NULL; + } + if (!i) { + a->child = n; + } + else { + suffix_object(p, n); + } + p = n; + } + + return a; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateDoubleArray(const double *numbers, int count) +{ + size_t i = 0; + cJSON *n = NULL; + cJSON *p = NULL; + cJSON *a = NULL; + + if ((count < 0) || (numbers == NULL)) { + return NULL; + } + + a = cJSON_CreateArray(); + + for (i = 0; a && (i < (size_t)count); i++) { + n = cJSON_CreateNumber(numbers[i]); + if (!n) { + cJSON_Delete(a); + return NULL; + } + if (!i) { + a->child = n; + } + else { + suffix_object(p, n); + } + p = n; + } + + return a; +} + +CJSON_PUBLIC(cJSON *) cJSON_CreateStringArray(const char **strings, int count) +{ + size_t i = 0; + cJSON *n = NULL; + cJSON *p = NULL; + cJSON *a = NULL; + + if ((count < 0) || (strings == NULL)) { + return NULL; + } + + a = cJSON_CreateArray(); 
+ + for (i = 0; a && (i < (size_t)count); i++) { + n = cJSON_CreateString(strings[i]); + if (!n) { + cJSON_Delete(a); + return NULL; + } + if (!i) { + a->child = n; + } + else { + suffix_object(p, n); + } + p = n; + } + + return a; +} + +/* Duplication */ +CJSON_PUBLIC(cJSON *) cJSON_Duplicate(const cJSON *item, cJSON_bool recurse) +{ + cJSON *newitem = NULL; + cJSON *child = NULL; + cJSON *next = NULL; + cJSON *newchild = NULL; + + /* Bail on bad ptr */ + if (!item) { + goto fail; + } + /* Create new item */ + newitem = cJSON_New_Item(&global_hooks); + if (!newitem) { + goto fail; + } + /* Copy over all vars */ + newitem->type = item->type & (~cJSON_IsReference); + newitem->valueint = item->valueint; + newitem->valuedouble = item->valuedouble; + if (item->valuestring) { + newitem->valuestring = (char *)cJSON_strdup( + (unsigned char *)item->valuestring, &global_hooks); + if (!newitem->valuestring) { + goto fail; + } + } + if (item->string) { + newitem->string = (item->type & cJSON_StringIsConst) + ? item->string + : (char *)cJSON_strdup( + (unsigned char *)item->string, &global_hooks); + if (!newitem->string) { + goto fail; + } + } + /* If non-recursive, then we're done! */ + if (!recurse) { + return newitem; + } + /* Walk the ->next chain for the child. 
*/ + child = item->child; + while (child != NULL) { + newchild = cJSON_Duplicate( + child, + true); /* Duplicate (with recurse) each item in the ->next chain */ + if (!newchild) { + goto fail; + } + if (next != NULL) { + /* If newitem->child already set, then crosswire ->prev and ->next + * and move on */ + next->next = newchild; + newchild->prev = next; + next = newchild; + } + else { + /* Set newitem->child and move to it */ + newitem->child = newchild; + next = newchild; + } + child = child->next; + } + + return newitem; + +fail: + if (newitem != NULL) { + cJSON_Delete(newitem); + } + + return NULL; +} + +CJSON_PUBLIC(void) cJSON_Minify(char *json) +{ + unsigned char *into = (unsigned char *)json; + + if (json == NULL) { + return; + } + + while (*json) { + if (*json == ' ') { + json++; + } + else if (*json == '\t') { + /* Whitespace characters. */ + json++; + } + else if (*json == '\r') { + json++; + } + else if (*json == '\n') { + json++; + } + else if ((*json == '/') && (json[1] == '/')) { + /* double-slash comments, to end of line. */ + while (*json && (*json != '\n')) { + json++; + } + } + else if ((*json == '/') && (json[1] == '*')) { + /* multiline comments. */ + while (*json && !((*json == '*') && (json[1] == '/'))) { + json++; + } + json += 2; + } + else if (*json == '\"') { + /* string literals, which are \" sensitive. */ + *into++ = (unsigned char)*json++; + while (*json && (*json != '\"')) { + if (*json == '\\') { + *into++ = (unsigned char)*json++; + } + *into++ = (unsigned char)*json++; + } + *into++ = (unsigned char)*json++; + } + else { + /* All other characters. */ + *into++ = (unsigned char)*json++; + } + } + + /* and null-terminate. 
*/ + *into = '\0'; +} + +CJSON_PUBLIC(cJSON_bool) cJSON_IsInvalid(const cJSON *const item) +{ + if (item == NULL) { + return false; + } + + return (item->type & 0xFF) == cJSON_Invalid; +} + +CJSON_PUBLIC(cJSON_bool) cJSON_IsFalse(const cJSON *const item) +{ + if (item == NULL) { + return false; + } + + return (item->type & 0xFF) == cJSON_False; +} + +CJSON_PUBLIC(cJSON_bool) cJSON_IsTrue(const cJSON *const item) +{ + if (item == NULL) { + return false; + } + + return (item->type & 0xff) == cJSON_True; +} + +CJSON_PUBLIC(cJSON_bool) cJSON_IsBool(const cJSON *const item) +{ + if (item == NULL) { + return false; + } + + return (item->type & (cJSON_True | cJSON_False)) != 0; +} +CJSON_PUBLIC(cJSON_bool) cJSON_IsNull(const cJSON *const item) +{ + if (item == NULL) { + return false; + } + + return (item->type & 0xFF) == cJSON_NULL; +} + +CJSON_PUBLIC(cJSON_bool) cJSON_IsNumber(const cJSON *const item) +{ + if (item == NULL) { + return false; + } + + return (item->type & 0xFF) == cJSON_Number; +} + +CJSON_PUBLIC(cJSON_bool) cJSON_IsString(const cJSON *const item) +{ + if (item == NULL) { + return false; + } + + return (item->type & 0xFF) == cJSON_String; +} + +CJSON_PUBLIC(cJSON_bool) cJSON_IsArray(const cJSON *const item) +{ + if (item == NULL) { + return false; + } + + return (item->type & 0xFF) == cJSON_Array; +} + +CJSON_PUBLIC(cJSON_bool) cJSON_IsObject(const cJSON *const item) +{ + if (item == NULL) { + return false; + } + + return (item->type & 0xFF) == cJSON_Object; +} + +CJSON_PUBLIC(cJSON_bool) cJSON_IsRaw(const cJSON *const item) +{ + if (item == NULL) { + return false; + } + + return (item->type & 0xFF) == cJSON_Raw; +} + +CJSON_PUBLIC(cJSON_bool) +cJSON_Compare(const cJSON *const a, const cJSON *const b, + const cJSON_bool case_sensitive) +{ + if ((a == NULL) || (b == NULL) || ((a->type & 0xFF) != (b->type & 0xFF)) + || cJSON_IsInvalid(a)) { + return false; + } + + /* check if type is valid */ + switch (a->type & 0xFF) { + case cJSON_False: + case cJSON_True: 
+ case cJSON_NULL: + case cJSON_Number: + case cJSON_String: + case cJSON_Raw: + case cJSON_Array: + case cJSON_Object: + break; + + default: + return false; + } + + /* identical objects are equal */ + if (a == b) { + return true; + } + + switch (a->type & 0xFF) { + /* in these cases and equal type is enough */ + case cJSON_False: + case cJSON_True: + case cJSON_NULL: + return true; + + case cJSON_Number: + if (a->valuedouble == b->valuedouble) { + return true; + } + return false; + + case cJSON_String: + case cJSON_Raw: + if ((a->valuestring == NULL) || (b->valuestring == NULL)) { + return false; + } + if (strcmp(a->valuestring, b->valuestring) == 0) { + return true; + } + + return false; + + case cJSON_Array: + { + cJSON *a_element = a->child; + cJSON *b_element = b->child; + + for (; (a_element != NULL) && (b_element != NULL);) { + if (!cJSON_Compare(a_element, b_element, case_sensitive)) { + return false; + } + + a_element = a_element->next; + b_element = b_element->next; + } + + /* one of the arrays is longer than the other */ + if (a_element != b_element) { + return false; + } + + return true; + } + + case cJSON_Object: + { + cJSON *a_element = NULL; + cJSON *b_element = NULL; + cJSON_ArrayForEach(a_element, a) + { + /* TODO This has O(n^2) runtime, which is horrible! 
*/ + b_element = + get_object_item(b, a_element->string, case_sensitive); + if (b_element == NULL) { + return false; + } + + if (!cJSON_Compare(a_element, b_element, case_sensitive)) { + return false; + } + } + + /* doing this twice, once on a and b to prevent true comparison if a + * subset of b + * TODO: Do this the proper way, this is just a fix for now */ + cJSON_ArrayForEach(b_element, b) + { + a_element = + get_object_item(a, b_element->string, case_sensitive); + if (a_element == NULL) { + return false; + } + + if (!cJSON_Compare(b_element, a_element, case_sensitive)) { + return false; + } + } + + return true; + } + + default: + return false; + } +} + +CJSON_PUBLIC(void *) cJSON_malloc(size_t size) +{ + return global_hooks.allocate(size); +} + +CJSON_PUBLIC(void) cJSON_free(void *object) +{ + global_hooks.deallocate(object); +} +#endif diff --git a/6-sigsnoop/eunomia-include/cJSON/cJSON.h b/6-sigsnoop/eunomia-include/cJSON/cJSON.h new file mode 100644 index 0000000..6f42eb4 --- /dev/null +++ b/6-sigsnoop/eunomia-include/cJSON/cJSON.h @@ -0,0 +1,358 @@ +/* + Copyright (c) 2009-2017 Dave Gamble and cJSON contributors + Permission is hereby granted, free of charge, to any person obtaining a copy + of this software and associated documentation files (the "Software"), to deal + in the Software without restriction, including without limitation the rights + to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + copies of the Software, and to permit persons to whom the Software is + furnished to do so, subject to the following conditions: + The above copyright notice and this permission notice shall be included in + all copies or substantial portions of the Software. + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE + AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + THE SOFTWARE. + + A header only cJSON library for C and C++. + */ + +#ifndef cJSON__h +#define cJSON__h + +#ifdef __cplusplus +extern "C" { +#endif + +#if !defined(__WINDOWS__) \ + && (defined(WIN32) || defined(WIN64) || defined(_MSC_VER) \ + || defined(_WIN32)) +#define __WINDOWS__ +#endif + +#ifdef __WINDOWS__ + +/** + * When compiling for windows, we specify a specific calling convention to avoid + * issues where we are being called from a project with a different default + * calling convention. For windows you have 3 define options: + * CJSON_HIDE_SYMBOLS - Define this in the case where you don't want to ever + * dllexport symbols + * CJSON_EXPORT_SYMBOLS - Define this on library build when you want to + * dllexport symbols (default) + * CJSON_IMPORT_SYMBOLS - Define this if you want to dllimport symbol + * + * For *nix builds that support visibility attribute, you can define similar + * behavior by setting default visibility to hidden by adding + * -fvisibility=hidden (for gcc) + * or + * -xldscope=hidden (for sun cc) + * to CFLAGS, then using the CJSON_API_VISIBILITY flag to "export" the same + * symbols the way CJSON_EXPORT_SYMBOLS does + */ + +#define CJSON_CDECL __cdecl +#define CJSON_STDCALL __stdcall + +/* export symbols by default, this is necessary for copy pasting the C and + header file */ +#if !defined(CJSON_HIDE_SYMBOLS) && !defined(CJSON_IMPORT_SYMBOLS) \ + && !defined(CJSON_EXPORT_SYMBOLS) +#define CJSON_EXPORT_SYMBOLS +#endif + +#if defined(CJSON_HIDE_SYMBOLS) +#define CJSON_PUBLIC(type) type CJSON_STDCALL +#elif defined(CJSON_EXPORT_SYMBOLS) +#define CJSON_PUBLIC(type) __declspec(dllexport) type CJSON_STDCALL +#elif defined(CJSON_IMPORT_SYMBOLS) +#define CJSON_PUBLIC(type) __declspec(dllimport) 
type CJSON_STDCALL +#endif +#else /* !__WINDOWS__ */ +#define CJSON_CDECL +#define CJSON_STDCALL + +#if (defined(__GNUC__) || defined(__SUNPRO_CC) || defined(__SUNPRO_C)) \ + && defined(CJSON_API_VISIBILITY) +#define CJSON_PUBLIC(type) __attribute__((visibility("default"))) type +#else +#define CJSON_PUBLIC(type) type +#endif +#endif + +/* project version */ +#define CJSON_VERSION_MAJOR 1 +#define CJSON_VERSION_MINOR 7 +#define CJSON_VERSION_PATCH 10 + +#include <stddef.h> + +/* cJSON Types: */ +#define cJSON_Invalid (0) +#define cJSON_False (1 << 0) +#define cJSON_True (1 << 1) +#define cJSON_NULL (1 << 2) +#define cJSON_Number (1 << 3) +#define cJSON_String (1 << 4) +#define cJSON_Array (1 << 5) +#define cJSON_Object (1 << 6) +#define cJSON_Raw (1 << 7) /* raw json */ + +#define cJSON_IsReference 256 +#define cJSON_StringIsConst 512 + +/* The cJSON structure: */ +typedef struct cJSON { + /* next/prev allow you to walk array/object chains. Alternatively, use + GetArraySize/GetArrayItem/GetObjectItem */ + struct cJSON *next; + struct cJSON *prev; + /* An array or object item will have a child pointer pointing to a chain of + the items in the array/object. */ + struct cJSON *child; + + /* The type of the item, as above. */ + int type; + + /* The item's string, if type==cJSON_String or type==cJSON_Raw */ + char *valuestring; + /* writing to valueint is DEPRECATED, use cJSON_SetNumberValue instead */ + int valueint; + /* The item's number, if type==cJSON_Number */ + double valuedouble; + + /* The item's name string, if this item is the child of, or is in the list + of subitems of an object. */ + char *string; +} cJSON; + +typedef struct cJSON_Hooks { + /* malloc/free are CDECL on Windows regardless of the default calling + * convention of the compiler, so ensure the hooks allow passing those + * functions directly. 
*/ + void *(CJSON_CDECL *malloc_fn)(size_t sz); + void(CJSON_CDECL *free_fn)(void *ptr); +} cJSON_Hooks; + +typedef int cJSON_bool; + +/* Limits how deeply nested arrays/objects can be before cJSON rejects to parse + them. This is to prevent stack overflows. */ +#ifndef CJSON_NESTING_LIMIT +#define CJSON_NESTING_LIMIT 1000 +#endif + +/* returns the version of cJSON as a string */ +CJSON_PUBLIC(const char *) cJSON_Version(void); + +/* Supply malloc, realloc and free functions to cJSON */ +CJSON_PUBLIC(void) cJSON_InitHooks(cJSON_Hooks *hooks); + +/* Memory Management: the caller is always responsible to free the results from + * all variants of cJSON_Parse (with cJSON_Delete) and cJSON_Print (with stdlib + * free, cJSON_Hooks.free_fn, or cJSON_free as appropriate). The exception is + * cJSON_PrintPreallocated, where the caller has full responsibility of the + * buffer. */ +/* Supply a block of JSON, and this returns a cJSON object you can interrogate. + */ +CJSON_PUBLIC(cJSON *) cJSON_Parse(const char *value); +/* ParseWithOpts allows you to require (and check) that the JSON is null + * terminated, and to retrieve the pointer to the final byte parsed. */ +/* If you supply a ptr in return_parse_end and parsing fails, then + * return_parse_end will contain a pointer to the error so will match + * cJSON_GetErrorPtr(). */ +CJSON_PUBLIC(cJSON *) +cJSON_ParseWithOpts(const char *value, const char **return_parse_end, + cJSON_bool require_null_terminated); + +/* Render a cJSON entity to text for transfer/storage. */ +CJSON_PUBLIC(char *) cJSON_Print(const cJSON *item); +/* Render a cJSON entity to text for transfer/storage without any formatting. */ +CJSON_PUBLIC(char *) cJSON_PrintUnformatted(const cJSON *item); +/* Render a cJSON entity to text using a buffered strategy. prebuffer is a guess + * at the final size. guessing well reduces reallocation. 
fmt=0 gives + * unformatted, =1 gives formatted */ +CJSON_PUBLIC(char *) +cJSON_PrintBuffered(const cJSON *item, int prebuffer, cJSON_bool fmt); +/* Render a cJSON entity to text using a buffer already allocated in memory with + * given length. Returns 1 on success and 0 on failure. */ +/* NOTE: cJSON is not always 100% accurate in estimating how much memory it will + * use, so to be safe allocate 5 bytes more than you actually need */ +CJSON_PUBLIC(cJSON_bool) +cJSON_PrintPreallocated(cJSON *item, char *buffer, const int length, + const cJSON_bool format); +/* Delete a cJSON entity and all subentities. */ +CJSON_PUBLIC(void) cJSON_Delete(cJSON *c); + +/* Returns the number of items in an array (or object). */ +CJSON_PUBLIC(int) cJSON_GetArraySize(const cJSON *array); +/* Retrieve item number "index" from array "array". Returns NULL if + * unsuccessful. */ +CJSON_PUBLIC(cJSON *) cJSON_GetArrayItem(const cJSON *array, int index); +/* Get item "string" from object. Case insensitive. */ +CJSON_PUBLIC(cJSON *) +cJSON_GetObjectItem(const cJSON *const object, const char *const string); +CJSON_PUBLIC(cJSON *) +cJSON_GetObjectItemCaseSensitive(const cJSON *const object, + const char *const string); +CJSON_PUBLIC(cJSON_bool) +cJSON_HasObjectItem(const cJSON *object, const char *string); +/* For analysing failed parses. This returns a pointer to the parse error. + * You'll probably need to look a few chars back to make sense of it. Defined + * when cJSON_Parse() returns 0. 0 when cJSON_Parse() succeeds. 
*/ +CJSON_PUBLIC(const char *) cJSON_GetErrorPtr(void); + +/* Check if the item is a string and return its valuestring */ +CJSON_PUBLIC(char *) cJSON_GetStringValue(cJSON *item); + +/* These functions check the type of an item */ +CJSON_PUBLIC(cJSON_bool) cJSON_IsInvalid(const cJSON *const item); +CJSON_PUBLIC(cJSON_bool) cJSON_IsFalse(const cJSON *const item); +CJSON_PUBLIC(cJSON_bool) cJSON_IsTrue(const cJSON *const item); +CJSON_PUBLIC(cJSON_bool) cJSON_IsBool(const cJSON *const item); +CJSON_PUBLIC(cJSON_bool) cJSON_IsNull(const cJSON *const item); +CJSON_PUBLIC(cJSON_bool) cJSON_IsNumber(const cJSON *const item); +CJSON_PUBLIC(cJSON_bool) cJSON_IsString(const cJSON *const item); +CJSON_PUBLIC(cJSON_bool) cJSON_IsArray(const cJSON *const item); +CJSON_PUBLIC(cJSON_bool) cJSON_IsObject(const cJSON *const item); +CJSON_PUBLIC(cJSON_bool) cJSON_IsRaw(const cJSON *const item); + +/* These calls create a cJSON item of the appropriate type. */ +CJSON_PUBLIC(cJSON *) cJSON_CreateNull(void); +CJSON_PUBLIC(cJSON *) cJSON_CreateTrue(void); +CJSON_PUBLIC(cJSON *) cJSON_CreateFalse(void); +CJSON_PUBLIC(cJSON *) cJSON_CreateBool(cJSON_bool boolean); +CJSON_PUBLIC(cJSON *) cJSON_CreateNumber(double num); +CJSON_PUBLIC(cJSON *) cJSON_CreateString(const char *string); +/* raw json */ +CJSON_PUBLIC(cJSON *) cJSON_CreateRaw(const char *raw); +CJSON_PUBLIC(cJSON *) cJSON_CreateArray(void); +CJSON_PUBLIC(cJSON *) cJSON_CreateObject(void); + +/* Create a string where valuestring references a string so + it will not be freed by cJSON_Delete */ +CJSON_PUBLIC(cJSON *) cJSON_CreateStringReference(const char *string); +/* Create an object/array that only references its elements so + they will not be freed by cJSON_Delete */ +CJSON_PUBLIC(cJSON *) cJSON_CreateObjectReference(const cJSON *child); +CJSON_PUBLIC(cJSON *) cJSON_CreateArrayReference(const cJSON *child); + +/* These utilities create an Array of count items. 
*/ +CJSON_PUBLIC(cJSON *) cJSON_CreateIntArray(const int *numbers, int count); +CJSON_PUBLIC(cJSON *) cJSON_CreateFloatArray(const float *numbers, int count); +CJSON_PUBLIC(cJSON *) cJSON_CreateDoubleArray(const double *numbers, int count); +CJSON_PUBLIC(cJSON *) cJSON_CreateStringArray(const char **strings, int count); + +/* Append item to the specified array/object. */ +CJSON_PUBLIC(cJSON_bool) cJSON_AddItemToArray(cJSON *array, cJSON *item); +CJSON_PUBLIC(cJSON_bool) +cJSON_AddItemToObject(cJSON *object, const char *string, cJSON *item); +/* Use this when string is definitely const (i.e. a literal, or as good as), and + * will definitely survive the cJSON object. WARNING: When this function was + * used, make sure to always check that (item->type & cJSON_StringIsConst) is + * zero before writing to `item->string` */ +CJSON_PUBLIC(cJSON_bool) +cJSON_AddItemToObjectCS(cJSON *object, const char *string, cJSON *item); +/* Append reference to item to the specified array/object. Use this when you + * want to add an existing cJSON to a new cJSON, but don't want to corrupt your + * existing cJSON. */ +CJSON_PUBLIC(cJSON_bool) +cJSON_AddItemReferenceToArray(cJSON *array, cJSON *item); +CJSON_PUBLIC(cJSON_bool) +cJSON_AddItemReferenceToObject(cJSON *object, const char *string, cJSON *item); + +/* Remove/Detach items from Arrays/Objects. */ +CJSON_PUBLIC(cJSON *) +cJSON_DetachItemViaPointer(cJSON *parent, cJSON *const item); +CJSON_PUBLIC(cJSON *) cJSON_DetachItemFromArray(cJSON *array, int which); +CJSON_PUBLIC(void) cJSON_DeleteItemFromArray(cJSON *array, int which); +CJSON_PUBLIC(cJSON *) +cJSON_DetachItemFromObject(cJSON *object, const char *string); +CJSON_PUBLIC(cJSON *) +cJSON_DetachItemFromObjectCaseSensitive(cJSON *object, const char *string); +CJSON_PUBLIC(void) +cJSON_DeleteItemFromObject(cJSON *object, const char *string); +CJSON_PUBLIC(void) +cJSON_DeleteItemFromObjectCaseSensitive(cJSON *object, const char *string); + +/* Update array items. 
*/ +CJSON_PUBLIC(cJSON_bool) +cJSON_InsertItemInArray( + cJSON *array, int which, + cJSON *newitem); /* Shifts pre-existing items to the right. */ +CJSON_PUBLIC(cJSON_bool) +cJSON_ReplaceItemViaPointer(cJSON *const parent, cJSON *const item, + cJSON *replacement); +CJSON_PUBLIC(void) +cJSON_ReplaceItemInArray(cJSON *array, int which, cJSON *newitem); +CJSON_PUBLIC(void) +cJSON_ReplaceItemInObject(cJSON *object, const char *string, cJSON *newitem); +CJSON_PUBLIC(void) +cJSON_ReplaceItemInObjectCaseSensitive(cJSON *object, const char *string, + cJSON *newitem); + +/* Duplicate a cJSON item */ +CJSON_PUBLIC(cJSON *) cJSON_Duplicate(const cJSON *item, cJSON_bool recurse); +/* Duplicate will create a new, identical cJSON item to the one you pass, in new + memory that will need to be released. With recurse!=0, it will duplicate any + children connected to the item. The item->next and ->prev pointers are always + zero on return from Duplicate. */ +/* Recursively compare two cJSON items for equality. If either a or b is NULL or + * invalid, they will be considered unequal. + * case_sensitive determines if object keys are treated case sensitive (1) or + * case insensitive (0) */ +CJSON_PUBLIC(cJSON_bool) +cJSON_Compare(const cJSON *const a, const cJSON *const b, + const cJSON_bool case_sensitive); + +CJSON_PUBLIC(void) cJSON_Minify(char *json); + +/* Helper functions for creating and adding items to an object at the same time. + They return the added item or NULL on failure. 
*/ +CJSON_PUBLIC(cJSON *) +cJSON_AddNullToObject(cJSON *const object, const char *const name); +CJSON_PUBLIC(cJSON *) +cJSON_AddTrueToObject(cJSON *const object, const char *const name); +CJSON_PUBLIC(cJSON *) +cJSON_AddFalseToObject(cJSON *const object, const char *const name); +CJSON_PUBLIC(cJSON *) +cJSON_AddBoolToObject(cJSON *const object, const char *const name, + const cJSON_bool boolean); +CJSON_PUBLIC(cJSON *) +cJSON_AddNumberToObject(cJSON *const object, const char *const name, + const double number); +CJSON_PUBLIC(cJSON *) +cJSON_AddStringToObject(cJSON *const object, const char *const name, + const char *const string); +CJSON_PUBLIC(cJSON *) +cJSON_AddRawToObject(cJSON *const object, const char *const name, + const char *const raw); +CJSON_PUBLIC(cJSON *) +cJSON_AddObjectToObject(cJSON *const object, const char *const name); +CJSON_PUBLIC(cJSON *) +cJSON_AddArrayToObject(cJSON *const object, const char *const name); + +/* When assigning an integer value, it needs to be propagated to valuedouble + too. */ +#define cJSON_SetIntValue(object, number) \ + ((object) ? (object)->valueint = (object)->valuedouble = (number) \ + : (number)) +/* helper for the cJSON_SetNumberValue macro */ +CJSON_PUBLIC(double) cJSON_SetNumberHelper(cJSON *object, double number); +#define cJSON_SetNumberValue(object, number) \ + ((object != NULL) ? cJSON_SetNumberHelper(object, (double)number) \ + : (number)) + +/* Macro for iterating over an array or object */ +#define cJSON_ArrayForEach(element, array) \ + for (element = (array != NULL) ? 
(array)->child : NULL; element != NULL; \ + element = element->next) + +/* malloc/free objects using the malloc/free functions that have been set with + cJSON_InitHooks */ +CJSON_PUBLIC(void *) cJSON_malloc(size_t size); +CJSON_PUBLIC(void) cJSON_free(void *object); + +#endif \ No newline at end of file diff --git a/6-sigsnoop/eunomia-include/entry.h b/6-sigsnoop/eunomia-include/entry.h new file mode 100644 index 0000000..0aa5657 --- /dev/null +++ b/6-sigsnoop/eunomia-include/entry.h @@ -0,0 +1,40 @@ +#ifndef ENTRY_H_ +#define ENTRY_H_ + +// header only helpers for develop wasm app +#include "cJSON/cJSON.c" +#include "helpers.h" + +#define MAX_ARGS 32 + +int main(int argc, char **argv); +int bpf_main(char *env_json, int str_len) +{ + cJSON *env = cJSON_Parse(env_json); + if (!env) + { + printf("cJSON_Parse failed for env json args."); + return 1; + } + if (!cJSON_IsArray(env)) { + printf("env json args is not an array."); + return 1; + } + int argc = cJSON_GetArraySize(env); + if (argc > MAX_ARGS) { + printf("env json args is too long."); + return 1; + } + char *argv[MAX_ARGS]; + for (int i = 0; i < argc; i++) { + cJSON *item = cJSON_GetArrayItem(env, i); + if (!cJSON_IsString(item)) { + printf("env json args is not a string."); + return 1; + } + argv[i] = item->valuestring; + } + return main(argc, argv); +} + +#endif diff --git a/6-sigsnoop/eunomia-include/errno-base.h b/6-sigsnoop/eunomia-include/errno-base.h new file mode 100644 index 0000000..9653140 --- /dev/null +++ b/6-sigsnoop/eunomia-include/errno-base.h @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _ASM_GENERIC_ERRNO_BASE_H +#define _ASM_GENERIC_ERRNO_BASE_H + +#define EPERM 1 /* Operation not permitted */ +#define ENOENT 2 /* No such file or directory */ +#define ESRCH 3 /* No such process */ +#define EINTR 4 /* Interrupted system call */ +#define EIO 5 /* I/O error */ +#define ENXIO 6 /* No such device or address */ +#define E2BIG 7 /* Argument list too long 
*/ +#define ENOEXEC 8 /* Exec format error */ +#define EBADF 9 /* Bad file number */ +#define ECHILD 10 /* No child processes */ +#define EAGAIN 11 /* Try again */ +#define ENOMEM 12 /* Out of memory */ +#define EACCES 13 /* Permission denied */ +#define EFAULT 14 /* Bad address */ +#define ENOTBLK 15 /* Block device required */ +#define EBUSY 16 /* Device or resource busy */ +#define EEXIST 17 /* File exists */ +#define EXDEV 18 /* Cross-device link */ +#define ENODEV 19 /* No such device */ +#define ENOTDIR 20 /* Not a directory */ +#define EISDIR 21 /* Is a directory */ +#define EINVAL 22 /* Invalid argument */ +#define ENFILE 23 /* File table overflow */ +#define EMFILE 24 /* Too many open files */ +#define ENOTTY 25 /* Not a typewriter */ +#define ETXTBSY 26 /* Text file busy */ +#define EFBIG 27 /* File too large */ +#define ENOSPC 28 /* No space left on device */ +#define ESPIPE 29 /* Illegal seek */ +#define EROFS 30 /* Read-only file system */ +#define EMLINK 31 /* Too many links */ +#define EPIPE 32 /* Broken pipe */ +#define EDOM 33 /* Math argument out of domain of func */ +#define ERANGE 34 /* Math result not representable */ + +#endif diff --git a/6-sigsnoop/eunomia-include/helpers.h b/6-sigsnoop/eunomia-include/helpers.h new file mode 100644 index 0000000..cda0c75 --- /dev/null +++ b/6-sigsnoop/eunomia-include/helpers.h @@ -0,0 +1,54 @@ +#ifndef EWASM_APP_HELPERS_H_ +#define EWASM_APP_HELPERS_H_ + +#include "native-ewasm.h" +#include +#include +#include +#include +#include "cJSON/cJSON.h" + +/// @brief start the eBPF program with JSON and wait for it to exit +/// @param program_data the json data of eBPF program +/// @return 0 on success, -1 on failure, the eBPF program will be terminated in failure case +int +start_bpf_program(char *program_data) +{ + int res = create_bpf(program_data, strlen(program_data)); + if (res < 0) { + printf("create_bpf failed %d", res); + return -1; + } + res = run_bpf(res); + if (res < 0) { + printf("run_bpf failed %d\n", 
res); + return -1; + } + res = wait_and_poll_bpf(res); + if (res < 0) { + printf("wait_and_poll_bpf failed %d\n", res); + return -1; + } + return 0; +} + +/// @brief set a global variable of the bpf program to the given value +/// @param program the json program data +/// @param key name of the global variable +/// @param value arg value +/// @return the same program object, updated in place +cJSON * +set_bpf_program_global_var(cJSON *program, char *key, cJSON *value) +{ + + cJSON *args = cJSON_GetObjectItem(program, "runtime_args"); + if (args == NULL) + { + args = cJSON_CreateObject(); + cJSON_AddItemToObject(program, "runtime_args", args); + } + cJSON_AddItemToObject(args, key, value); + return program; +} + +#endif // EWASM_APP_HELPERS_H_ diff --git a/6-sigsnoop/eunomia-include/native-ewasm.h b/6-sigsnoop/eunomia-include/native-ewasm.h new file mode 100644 index 0000000..975d675 --- /dev/null +++ b/6-sigsnoop/eunomia-include/native-ewasm.h @@ -0,0 +1,50 @@ +#ifndef EWASM_NATIVE_API_H_ +#define EWASM_NATIVE_API_H_ + +/// C function interface to be called from wasm +#ifdef __cplusplus +extern "C" { +#endif +/// @brief create an ebpf program with json data +/// @param ebpf_json +/// @return id on success, -1 on failure +int +create_bpf(char *ebpf_json, int str_len); + +/// @brief start running the ebpf program +/// @details load and attach the ebpf program to the kernel to run it. +/// If the ebpf program has maps to export to user space, you need to +/// call wait_and_poll_bpf afterwards. +int +run_bpf(int id); + +/// @brief wait for the program to exit and receive data from export maps and +/// print the data +/// @details if the program has a ring buffer or perf event to export data +/// to user space, the program will help load the map info and poll the +/// events automatically.
+int +wait_and_poll_bpf(int id); +#ifdef __cplusplus +} +#endif + + +/// @brief init the eBPF program +/// @param env_json the env config from input +/// @return 0 on success, -1 on failure; the eBPF program will be terminated +/// on failure +int +bpf_main(char *env_json, int str_len); + +/// @brief handle the event output from the eBPF program, valid only when +/// wait_and_poll_bpf is called +/// @param ctx user defined context +/// @param e json event message +/// @return 0 on success, -1 on failure, +/// the event will be sent to the next handler in chain on success, or dropped +/// on failure +int +process_event(int ctx, char *e, int str_len); + +#endif // EWASM_NATIVE_API_H_ diff --git a/6-sigsnoop/eunomia-include/sigsnoop.skel.h b/6-sigsnoop/eunomia-include/sigsnoop.skel.h new file mode 100644 index 0000000..e65696f --- /dev/null +++ b/6-sigsnoop/eunomia-include/sigsnoop.skel.h @@ -0,0 +1,195 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ + +/* THIS FILE IS AUTOGENERATED BY BPFTOOL!
*/ +#ifndef __SIGSNOOP_BPF_SKEL_H__ +#define __SIGSNOOP_BPF_SKEL_H__ + +extern int errno; +#include + +struct bpf_object_skeleton; +struct bpf_object; +struct bpf_map; +struct bpf_program; +struct bpf_object_open_opts; +struct bpf_link; + +struct sigsnoop_bpf { + struct bpf_object_skeleton *skeleton; + struct bpf_object *obj; + struct { + struct bpf_map *events; + struct bpf_map *values; + struct bpf_map *rodata; + } maps; + struct { + struct bpf_program *kill_entry; + struct bpf_program *kill_exit; + struct bpf_program *tkill_entry; + struct bpf_program *tkill_exit; + struct bpf_program *tgkill_entry; + struct bpf_program *tgkill_exit; + struct bpf_program *sig_trace; + } progs; + struct { + struct bpf_link *kill_entry; + struct bpf_link *kill_exit; + struct bpf_link *tkill_entry; + struct bpf_link *tkill_exit; + struct bpf_link *tgkill_entry; + struct bpf_link *tgkill_exit; + struct bpf_link *sig_trace; + } links; + struct sigsnoop_bpf__rodata { + int filtered_pid; + int target_signal; + bool failed_only; + } *rodata; + +#ifdef __cplusplus + static inline struct sigsnoop_bpf *open(const struct bpf_object_open_opts *opts = nullptr); + static inline struct sigsnoop_bpf *open_and_load(); + static inline int load(struct sigsnoop_bpf *skel); + static inline int attach(struct sigsnoop_bpf *skel); + static inline void detach(struct sigsnoop_bpf *skel); + static inline void destroy(struct sigsnoop_bpf *skel); + static inline const void *elf_bytes(size_t *sz); +#endif /* __cplusplus */ +}; + +static void +sigsnoop_bpf__destroy(struct sigsnoop_bpf *obj) +{ + +} + +static inline int +sigsnoop_bpf__create_skeleton(struct sigsnoop_bpf *obj); + +static inline struct sigsnoop_bpf * +sigsnoop_bpf__open_opts(const struct bpf_object_open_opts *opts) +{ + struct sigsnoop_bpf *obj; + int err; + + obj = (struct sigsnoop_bpf *)calloc(1, sizeof(*obj)); + if (!obj) { + errno = ENOMEM; + return NULL; + } + return obj; +} + +static inline struct sigsnoop_bpf * +sigsnoop_bpf__open(void) +{ 
+ return sigsnoop_bpf__open_opts(NULL); +} + +static inline int +sigsnoop_bpf__load(struct sigsnoop_bpf *obj) +{ + return 0; +} + +static inline struct sigsnoop_bpf * +sigsnoop_bpf__open_and_load(void) +{ + return NULL; +} + +static inline int +sigsnoop_bpf__attach(struct sigsnoop_bpf *obj) +{ + return 0; +} + +static inline void +sigsnoop_bpf__detach(struct sigsnoop_bpf *obj) +{ +} + +static inline const void *sigsnoop_bpf__elf_bytes(size_t *sz); + +static inline int +sigsnoop_bpf__create_skeleton(struct sigsnoop_bpf *obj) +{ + return 0; +} + +#ifdef __cplusplus +struct sigsnoop_bpf *sigsnoop_bpf::open(const struct bpf_object_open_opts *opts) { return sigsnoop_bpf__open_opts(opts); } +struct sigsnoop_bpf *sigsnoop_bpf::open_and_load() { return sigsnoop_bpf__open_and_load(); } +int sigsnoop_bpf::load(struct sigsnoop_bpf *skel) { return sigsnoop_bpf__load(skel); } +int sigsnoop_bpf::attach(struct sigsnoop_bpf *skel) { return sigsnoop_bpf__attach(skel); } +void sigsnoop_bpf::detach(struct sigsnoop_bpf *skel) { sigsnoop_bpf__detach(skel); } +void sigsnoop_bpf::destroy(struct sigsnoop_bpf *skel) { sigsnoop_bpf__destroy(skel); } +const void *sigsnoop_bpf::elf_bytes(size_t *sz) { return sigsnoop_bpf__elf_bytes(sz); } +#endif /* __cplusplus */ + +__attribute__((unused)) static void +sigsnoop_bpf__assert(struct sigsnoop_bpf *s __attribute__((unused))) +{ +#ifdef __cplusplus +#define _Static_assert static_assert +#endif + _Static_assert(sizeof(s->rodata->filtered_pid) == 4, "unexpected size of 'filtered_pid'"); + _Static_assert(sizeof(s->rodata->target_signal) == 4, "unexpected size of 'target_signal'"); + _Static_assert(sizeof(s->rodata->failed_only) == 1, "unexpected size of 'failed_only'"); +#ifdef __cplusplus +#undef _Static_assert +#endif +} + +struct perf_buffer; +void perf_buffer__free(struct perf_buffer *pb) { +} +int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms) { + return start_bpf_program(program_data); +} +int bpf_program__set_autoload(struct 
bpf_program *prog, bool autoload) { + return 0; +} +char* strerror(int errnum) { + return "error"; +} +int bpf_map__fd(const struct bpf_map *map) { + return 0; +} +typedef void (*perf_buffer_sample_fn)(void *ctx, int cpu, + void *data, unsigned int size); +typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, unsigned long long cnt); +struct perf_buffer; + +perf_buffer_sample_fn global_cb; +struct perf_buffer_opts; + +struct perf_buffer * +perf_buffer__new(int map_fd, size_t page_cnt, + perf_buffer_sample_fn sample_cb, perf_buffer_lost_fn lost_cb, void *ctx, + const struct perf_buffer_opts *opts) { + global_cb = sample_cb; + return (void*)1; + } + +int process_event(int ctx, char *e, int str_len) +{ + struct event eve = {0}; + cJSON *json = cJSON_Parse(e); + eve.sig = cJSON_GetObjectItem(json, "sig")->valueint; + eve.pid = cJSON_GetObjectItem(json, "pid")->valueint; + strcpy(eve.comm, cJSON_GetObjectItem(json, "comm")->valuestring); + eve.tpid = cJSON_GetObjectItem(json, "tpid")->valueint; + eve.ret = cJSON_GetObjectItem(json, "ret")->valueint; + global_cb((void*)ctx, 0, &eve, str_len); + return 0; +} + +extern const char argp_program_doc[]; + +void argp_state_help(const struct argp_state *__state, int flag) { + printf("%s", argp_program_doc); + exit(0); +} + +#endif /* __SIGSNOOP_BPF_SKEL_H__ */ diff --git a/6-sigsnoop/eunomia-include/wasm-app.h b/6-sigsnoop/eunomia-include/wasm-app.h new file mode 100644 index 0000000..77eee77 --- /dev/null +++ b/6-sigsnoop/eunomia-include/wasm-app.h @@ -0,0 +1,8 @@ +#ifndef EWASM_EWASM_APP_H_ +#define EWASM_EWASM_APP_H_ + +// header only helpers for develop wasm app +#include "cJSON/cJSON.c" +#include "helpers.h" + +#endif // EWASM_EWASM_APP_H diff --git a/6-sigsnoop/sigsnoop.bpf.c b/6-sigsnoop/sigsnoop.bpf.c new file mode 100755 index 0000000..e03981f --- /dev/null +++ b/6-sigsnoop/sigsnoop.bpf.c @@ -0,0 +1,145 @@ +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) +/* Copyright (c) 2021~2022 Hengqi Chen */ +#include 
+#include +#include "sigsnoop.h" + +#define MAX_ENTRIES 10240 + +const volatile pid_t filtered_pid = 0; +const volatile int target_signal = 0; +const volatile bool failed_only = false; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, MAX_ENTRIES); + __type(key, __u32); + __type(value, struct event); +} values SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u32)); +} events SEC(".maps"); + +static int probe_entry(pid_t tpid, int sig) +{ + struct event event = {}; + __u64 pid_tgid; + __u32 pid, tid; + + if (target_signal && sig != target_signal) + return 0; + + pid_tgid = bpf_get_current_pid_tgid(); + pid = pid_tgid >> 32; + tid = (__u32)pid_tgid; + if (filtered_pid && pid != filtered_pid) + return 0; + + event.pid = pid; + event.tpid = tpid; + event.sig = sig; + bpf_get_current_comm(event.comm, sizeof(event.comm)); + bpf_map_update_elem(&values, &tid, &event, BPF_ANY); + return 0; +} + +static int probe_exit(void *ctx, int ret) +{ + __u64 pid_tgid = bpf_get_current_pid_tgid(); + __u32 tid = (__u32)pid_tgid; + struct event *eventp; + + eventp = bpf_map_lookup_elem(&values, &tid); + if (!eventp) + return 0; + + if (failed_only && ret >= 0) + goto cleanup; + + eventp->ret = ret; + bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, eventp, sizeof(*eventp)); + +cleanup: + bpf_map_delete_elem(&values, &tid); + return 0; +} + +SEC("tracepoint/syscalls/sys_enter_kill") +int kill_entry(struct trace_event_raw_sys_enter *ctx) +{ + pid_t tpid = (pid_t)ctx->args[0]; + int sig = (int)ctx->args[1]; + + return probe_entry(tpid, sig); +} + +SEC("tracepoint/syscalls/sys_exit_kill") +int kill_exit(struct trace_event_raw_sys_exit *ctx) +{ + return probe_exit(ctx, ctx->ret); +} + +SEC("tracepoint/syscalls/sys_enter_tkill") +int tkill_entry(struct trace_event_raw_sys_enter *ctx) +{ + pid_t tpid = (pid_t)ctx->args[0]; + int sig = (int)ctx->args[1]; + + return probe_entry(tpid, 
sig); +} + +SEC("tracepoint/syscalls/sys_exit_tkill") +int tkill_exit(struct trace_event_raw_sys_exit *ctx) +{ + return probe_exit(ctx, ctx->ret); +} + +SEC("tracepoint/syscalls/sys_enter_tgkill") +int tgkill_entry(struct trace_event_raw_sys_enter *ctx) +{ + pid_t tpid = (pid_t)ctx->args[1]; + int sig = (int)ctx->args[2]; + + return probe_entry(tpid, sig); +} + +SEC("tracepoint/syscalls/sys_exit_tgkill") +int tgkill_exit(struct trace_event_raw_sys_exit *ctx) +{ + return probe_exit(ctx, ctx->ret); +} + +SEC("tracepoint/signal/signal_generate") +int sig_trace(struct trace_event_raw_signal_generate *ctx) +{ + struct event event = {}; + pid_t tpid = ctx->pid; + int ret = ctx->errno; + int sig = ctx->sig; + __u64 pid_tgid; + __u32 pid; + + if (failed_only && ret == 0) + return 0; + + if (target_signal && sig != target_signal) + return 0; + + pid_tgid = bpf_get_current_pid_tgid(); + pid = pid_tgid >> 32; + if (filtered_pid && pid != filtered_pid) + return 0; + + event.pid = pid; + event.tpid = tpid; + event.sig = sig; + event.ret = ret; + bpf_get_current_comm(event.comm, sizeof(event.comm)); + bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &event, sizeof(event)); + return 0; +} + +char LICENSE[] SEC("license") = "Dual BSD/GPL"; diff --git a/6-sigsnoop/sigsnoop.h b/6-sigsnoop/sigsnoop.h new file mode 100755 index 0000000..a9826d8 --- /dev/null +++ b/6-sigsnoop/sigsnoop.h @@ -0,0 +1,16 @@ +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) +/* Copyright (c) 2021~2022 Hengqi Chen */ +#ifndef __SIGSNOOP_H +#define __SIGSNOOP_H + +#define TASK_COMM_LEN 16 + +struct event { + unsigned int pid; + unsigned int tpid; + int sig; + int ret; + char comm[TASK_COMM_LEN]; +}; + +#endif /* __SIGSNOOP_H */ diff --git a/6-sigsnoop/sigsnoop.md b/6-sigsnoop/sigsnoop.md new file mode 100644 index 0000000..e59540c --- /dev/null +++ b/6-sigsnoop/sigsnoop.md @@ -0,0 +1,92 @@ +## eBPF Beginner's Practical Tutorial: Writing the eBPF Program sigsnoop to Monitor System-Wide Signal Events + +### Background + +### How It Works + +`sigsnoop`
uses Linux tracepoints as its attach points: it attaches handler functions at the key syscall entry and exit tracepoints. +```c +SEC("tracepoint/syscalls/sys_enter_kill") +int kill_entry(struct trace_event_raw_sys_enter *ctx) +{ + pid_t tpid = (pid_t)ctx->args[0]; + int sig = (int)ctx->args[1]; + + return probe_entry(tpid, sig); +} + +SEC("tracepoint/syscalls/sys_exit_kill") +int kill_exit(struct trace_event_raw_sys_exit *ctx) +{ + return probe_exit(ctx, ctx->ret); +} + +SEC("tracepoint/syscalls/sys_enter_tkill") +int tkill_entry(struct trace_event_raw_sys_enter *ctx) +{ + pid_t tpid = (pid_t)ctx->args[0]; + int sig = (int)ctx->args[1]; + + return probe_entry(tpid, sig); +} + +SEC("tracepoint/syscalls/sys_exit_tkill") +int tkill_exit(struct trace_event_raw_sys_exit *ctx) +{ + return probe_exit(ctx, ctx->ret); +} + +SEC("tracepoint/syscalls/sys_enter_tgkill") +int tgkill_entry(struct trace_event_raw_sys_enter *ctx) +{ + pid_t tpid = (pid_t)ctx->args[1]; + int sig = (int)ctx->args[2]; + + return probe_entry(tpid, sig); +} + +SEC("tracepoint/syscalls/sys_exit_tgkill") +int tgkill_exit(struct trace_event_raw_sys_exit *ctx) +{ + return probe_exit(ctx, ctx->ret); +} + +SEC("tracepoint/signal/signal_generate") +int sig_trace(struct trace_event_raw_signal_generate *ctx) +{ + struct event event = {}; + pid_t tpid = ctx->pid; + int ret = ctx->errno; + int sig = ctx->sig; + __u64 pid_tgid; + __u32 pid; + + if (failed_only && ret == 0) + return 0; + + if (target_signal && sig != target_signal) + return 0; + + pid_tgid = bpf_get_current_pid_tgid(); + pid = pid_tgid >> 32; + if (filtered_pid && pid != filtered_pid) + return 0; + + event.pid = pid; + event.tpid = tpid; + event.sig = sig; + event.ret = ret; + bpf_get_current_comm(event.comm, sizeof(event.comm)); + bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &event, sizeof(event)); + return 0; +} + +``` + + +### Usage in Eunomia + +![result](../imgs/sigsnoop.png) +![result](../imgs/sigsnoop-prometheus.png) + +### Summary diff --git a/7-execsnoop/.gitignore b/7-execsnoop/.gitignore new file mode
100644 index 0000000..0e8325e --- /dev/null +++ b/7-execsnoop/.gitignore @@ -0,0 +1,3 @@ +ecli +package.json + diff --git a/7-execsnoop/README.md b/7-execsnoop/README.md new file mode 100644 index 0000000..804c073 --- /dev/null +++ b/7-execsnoop/README.md @@ -0,0 +1,148 @@ +--- +layout: post +title: execsnoop +date: 2022-11-17 19:57 +category: bpftools +author: yunwei37 +tags: [bpftools, syscall] +summary: execsnoop traces the exec() syscall system-wide, and prints various details. +--- + +## origin + +origin from: + +https://github.com/iovisor/bcc/blob/master/libbpf-tools/execsnoop.bpf.c + +## Compile and Run + +Compile: + +```shell +docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest +``` + +Run: + +``` +$ sudo ./ecli run package.json + +running and waiting for the ebpf events from perf event... +time pid ppid uid retval args_count args_size comm args +23:07:35 32940 32783 1000 0 1 13 cat /usr/bin/cat +23:07:43 32946 24577 1000 0 1 10 bash /bin/bash +23:07:43 32948 32946 1000 0 1 18 lesspipe /usr/bin/lesspipe +23:07:43 32949 32948 1000 0 2 36 basename /usr/bin/basename +23:07:43 32951 32950 1000 0 2 35 dirname /usr/bin/dirname +23:07:43 32952 32946 1000 0 2 22 dircolors /usr/bin/dircolors +23:07:48 32953 32946 1000 0 2 25 ls /usr/bin/ls +23:07:53 32957 32946 1000 0 2 17 sleep /usr/bin/sleep +23:07:57 32959 32946 1000 0 1 17 oneko /usr/games/oneko + +``` + +## details in bcc + +Demonstrations of execsnoop, the Linux eBPF/bcc version. + +execsnoop traces the exec() syscall system-wide, and prints various details. 
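The `args_count` and `args_size` columns in the ecli output above suggest that the arguments travel as one packed buffer of NUL-terminated strings. This matches how `execsnoop.bpf.c` fills `event->args` with `bpf_probe_read_user_str()`. The ecli `args` column appears to show only the first of those strings. Below is a minimal sketch of how user-space code could split such a buffer. `unpack_args` and `print_args` are hypothetical helpers written for illustration; they are not part of ecli, eunomia-bpf, or bcc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an ecli/bcc API): splits a packed buffer of
 * NUL-terminated strings into an array of pointers into that buffer.
 * `size` plays the role of event->args_size. Returns the number of
 * complete strings found, at most max_out. */
static int unpack_args(const char *buf, size_t size,
                       const char **out, int max_out)
{
    int n = 0;
    size_t off = 0;

    assert(buf != NULL && out != NULL);
    while (off < size && n < max_out) {
        /* Stop if the remaining bytes contain no terminator (truncated). */
        const char *end = memchr(buf + off, '\0', size - off);
        if (end == NULL)
            break;
        out[n++] = buf + off;
        off = (size_t)(end - buf) + 1; /* continue after the NUL byte */
    }
    return n;
}

/* Prints every unpacked argument on one line, separated by spaces. */
static void print_args(const char *buf, size_t size)
{
    const char *argv[60]; /* 60 mirrors TOTAL_MAX_ARGS in execsnoop.bpf.h */
    int n = unpack_args(buf, size, argv, 60);

    for (int i = 0; i < n; i++)
        printf("%s%s", i ? " " : "", argv[i]);
    printf("\n");
}
```

Under this assumption, the `basename` row above (`args_count` 2, `args_size` 36) would correspond to the two 17-character paths plus their two NUL bytes. The bcc version of the tool, described next, prints the full argument list directly.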
+Example output: + +``` +# ./execsnoop +COMM PID PPID RET ARGS +bash 33161 24577 0 /bin/bash +lesspipe 33163 33161 0 /usr/bin/lesspipe +basename 33164 33163 0 /usr/bin/basename /usr/bin/lesspipe +dirname 33166 33165 0 /usr/bin/dirname /usr/bin/lesspipe +dircolors 33167 33161 0 /usr/bin/dircolors -b +ls 33172 33161 0 /usr/bin/ls --color=auto +top 33173 33161 0 /usr/bin/top +oneko 33174 33161 0 /usr/games/oneko +systemctl 33175 2975 0 /bin/systemctl is-enabled -q whoopsie.path +apport-checkrep 33176 2975 0 /usr/share/apport/apport-checkreports +apport-checkrep 33177 2975 0 /usr/share/apport/apport-checkreports --system +apport-checkrep 33178 2975 0 /usr/share/apport/apport-checkreports --system + +``` + +This shows process information when exec system call is called. + +USAGE message: + +``` +usage: execsnoop [-h] [-T] [-t] [-x] [--cgroupmap CGROUPMAP] + [--mntnsmap MNTNSMAP] [-u USER] [-q] [-n NAME] + [-l LINE] [-U] [--max-args MAX_ARGS] + +Trace exec() syscalls + +options: + -h, --help show this help message and exit + -T, --time include time column on output (HH:MM:SS) + -t, --timestamp include timestamp on output + -x, --fails include failed exec()s + --cgroupmap CGROUPMAP + trace cgroups in this BPF map only + --mntnsmap MNTNSMAP trace mount namespaces in this BPF map only + -u USER, --uid USER trace this UID only + -q, --quote Add quotemarks (") around arguments. 
+ -n NAME, --name NAME only print commands matching this name (regex), any + arg + -l LINE, --line LINE only print commands where arg contains this line + (regex) + -U, --print-uid print UID column + --max-args MAX_ARGS maximum number of arguments parsed and displayed, + defaults to 20 + +examples: + ./execsnoop # trace all exec() syscalls + ./execsnoop -x # include failed exec()s + ./execsnoop -T # include time (HH:MM:SS) + ./execsnoop -U # include UID + ./execsnoop -u 1000 # only trace UID 1000 + ./execsnoop -u user # get user UID and trace only them + ./execsnoop -t # include timestamps + ./execsnoop -q # add "quotemarks" around arguments + ./execsnoop -n main # only print command lines containing "main" + ./execsnoop -l tpkg # only print command where arguments contains "tpkg" + ./execsnoop --cgroupmap mappath # only trace cgroups in this BPF map + ./execsnoop --mntnsmap mappath # only trace mount namespaces in the map + + +``` + +The -T and -t option include time and timestamps on output: + +``` +# ./execsnoop -T -t +TIME TIME(s) PCOMM PID PPID RET ARGS +23:35:25 4.335 bash 33360 24577 0 /bin/bash +23:35:25 4.338 lesspipe 33361 33360 0 /usr/bin/lesspipe +23:35:25 4.340 basename 33362 33361 0 /usr/bin/basename /usr/bin/lesspipe +23:35:25 4.342 dirname 33364 33363 0 /usr/bin/dirname /usr/bin/lesspipe +23:35:25 4.347 dircolors 33365 33360 0 /usr/bin/dircolors -b +23:35:40 19.327 touch 33367 33366 0 /usr/bin/touch /run/udev/gdm-machine-has-hardware-gpu +23:35:40 19.329 snap-device-hel 33368 33366 0 /usr/lib/snapd/snap-device-helper change snap_firefox_firefox /devices/pci0000:00/0000:00:02.0/drm/card0 226:0 +23:35:40 19.331 snap-device-hel 33369 33366 0 /usr/lib/snapd/snap-device-helper change snap_firefox_geckodriver /devices/pci0000:00/0000:00:02.0/drm/card0 226:0 +23:35:40 19.332 snap-device-hel 33370 33366 0 /usr/lib/snapd/snap-device-helper change snap_snap-store_snap-store /devices/pci0000:00/0000:00:02.0/drm/card0 226:0 + +``` + +The -u option filtering 
UID: + +``` +# ./execsnoop -Uu 1000 +UID PCOMM PID PPID RET ARGS +1000 bash 33604 24577 0 /bin/bash +1000 lesspipe 33606 33604 0 /usr/bin/lesspipe +1000 basename 33607 33606 0 /usr/bin/basename /usr/bin/lesspipe +1000 dirname 33609 33608 0 /usr/bin/dirname /usr/bin/lesspipe +1000 dircolors 33610 33604 0 /usr/bin/dircolors -b +1000 sleep 33615 33604 0 /usr/bin/sleep +1000 sleep 33616 33604 0 /usr/bin/sleep 1 +1000 clear 33617 33604 0 /usr/bin/clear + +``` + +Report bugs to https://github.com/iovisor/bcc/tree/master/libbpf-tools. diff --git a/7-execsnoop/execsnoop.bpf.c b/7-execsnoop/execsnoop.bpf.c new file mode 100644 index 0000000..06e08f4 --- /dev/null +++ b/7-execsnoop/execsnoop.bpf.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) +#include +#include +#include +#include "execsnoop.bpf.h" + +const volatile bool filter_cg = false; +const volatile bool ignore_failed = true; +const volatile uid_t targ_uid = INVALID_UID; +const volatile int max_args = DEFAULT_MAXARGS; + +static const struct event empty_event = {}; + +struct { + __uint(type, BPF_MAP_TYPE_CGROUP_ARRAY); + __type(key, u32); + __type(value, u32); + __uint(max_entries, 1); +} cgroup_map SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10240); + __type(key, pid_t); + __type(value, struct event); +} execs SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY); + __uint(key_size, sizeof(u32)); + __uint(value_size, sizeof(u32)); +} events SEC(".maps"); + +static __always_inline bool valid_uid(uid_t uid) { + return uid != INVALID_UID; +} + +SEC("tracepoint/syscalls/sys_enter_execve") +int tracepoint__syscalls__sys_enter_execve(struct trace_event_raw_sys_enter* ctx) +{ + u64 id; + pid_t pid, tgid; + unsigned int ret; + struct event *event; + struct task_struct *task; + const char **args = (const char **)(ctx->args[1]); + const char *argp; + + if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0)) + return 0; + + uid_t uid = 
(u32)bpf_get_current_uid_gid(); + int i; + + if (valid_uid(targ_uid) && targ_uid != uid) + return 0; + + id = bpf_get_current_pid_tgid(); + pid = (pid_t)id; + tgid = id >> 32; + if (bpf_map_update_elem(&execs, &pid, &empty_event, BPF_NOEXIST)) + return 0; + + event = bpf_map_lookup_elem(&execs, &pid); + if (!event) + return 0; + + event->pid = tgid; + event->uid = uid; + task = (struct task_struct*)bpf_get_current_task(); + event->ppid = (pid_t)BPF_CORE_READ(task, real_parent, tgid); + event->args_count = 0; + event->args_size = 0; + + ret = bpf_probe_read_user_str(event->args, ARGSIZE, (const char*)ctx->args[0]); + if (ret <= ARGSIZE) { + event->args_size += ret; + } else { + /* write an empty string */ + event->args[0] = '\0'; + event->args_size++; + } + + event->args_count++; + #pragma unroll + for (i = 1; i < TOTAL_MAX_ARGS && i < max_args; i++) { + bpf_probe_read_user(&argp, sizeof(argp), &args[i]); + if (!argp) + return 0; + + if (event->args_size > LAST_ARG) + return 0; + + ret = bpf_probe_read_user_str(&event->args[event->args_size], ARGSIZE, argp); + if (ret > ARGSIZE) + return 0; + + event->args_count++; + event->args_size += ret; + } + /* try to read one more argument to check if there is one */ + bpf_probe_read_user(&argp, sizeof(argp), &args[max_args]); + if (!argp) + return 0; + + /* pointer to max_args+1 isn't null, assume we have more arguments */ + event->args_count++; + return 0; +} + +SEC("tracepoint/syscalls/sys_exit_execve") +int tracepoint__syscalls__sys_exit_execve(struct trace_event_raw_sys_exit* ctx) +{ + u64 id; + pid_t pid; + int ret; + struct event *event; + + if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0)) + return 0; + + u32 uid = (u32)bpf_get_current_uid_gid(); + + if (valid_uid(targ_uid) && targ_uid != uid) + return 0; + id = bpf_get_current_pid_tgid(); + pid = (pid_t)id; + event = bpf_map_lookup_elem(&execs, &pid); + if (!event) + return 0; + ret = ctx->ret; + if (ignore_failed && ret < 0) + goto cleanup; + +
event->retval = ret; + bpf_get_current_comm(&event->comm, sizeof(event->comm)); + size_t len =((size_t)(&((struct event*)0)->args) + event->args_size); + if (len <= sizeof(*event)) + bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, event, len); +cleanup: + bpf_map_delete_elem(&execs, &pid); + return 0; +} + +char LICENSE[] SEC("license") = "GPL"; + diff --git a/7-execsnoop/execsnoop.bpf.h b/7-execsnoop/execsnoop.bpf.h new file mode 100644 index 0000000..37ed7ac --- /dev/null +++ b/7-execsnoop/execsnoop.bpf.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +#ifndef __EXECSNOOP_H +#define __EXECSNOOP_H + +#define ARGSIZE 128 +#define TASK_COMM_LEN 16 +#define TOTAL_MAX_ARGS 60 +#define DEFAULT_MAXARGS 20 +#define FULL_MAX_ARGS_ARR (TOTAL_MAX_ARGS * ARGSIZE) +#define INVALID_UID ((uid_t)-1) +#define LAST_ARG (FULL_MAX_ARGS_ARR - ARGSIZE) + +struct event { + int pid; + int ppid; + int uid; + int retval; + int args_count; + unsigned int args_size; + char comm[TASK_COMM_LEN]; + char args[FULL_MAX_ARGS_ARR]; +}; + +#endif /* __EXECSNOOP_H */ + + diff --git a/8-runqslower/.gitignore b/8-runqslower/.gitignore new file mode 100644 index 0000000..e4bde33 --- /dev/null +++ b/8-runqslower/.gitignore @@ -0,0 +1,4 @@ +.vscode +package.json +eunomia-exporter +ecli diff --git a/8-runqslower/README.md b/8-runqslower/README.md new file mode 100644 index 0000000..2190350 --- /dev/null +++ b/8-runqslower/README.md @@ -0,0 +1,147 @@ +| layout | title | date | category | author | tags | summary | +| ------ | ---------- | ---------------- | -------- | -------- | --------------- | ----------------------------------------------- | +| post | runqslower | 2022-11-11-20:50 | bpftools | yunwei37 | bpftool syscall | runqslower Trace long process scheduling delays | + +## origin + +origin from: + +https://github.com/iovisor/bcc/blob/master/libbpf-tools/runqslower.bpf.c + +result: + +``` +$ sudo ecli/build/bin/Release/ecli run 
examples/bpftools/runqslower/package.json + +running and waiting for the ebpf events from perf event... +time task prev_task delta_us pid prev_pid +20:11:59 gnome-shell swapper/0 32 2202 0 +20:11:59 ecli swapper/3 23 3437 0 +20:11:59 rcu_sched swapper/1 1 14 0 +20:11:59 gnome-terminal- swapper/1 13 2714 0 +20:11:59 ecli swapper/3 2 3437 0 +20:11:59 kworker/3:3 swapper/3 3 215 0 +20:11:59 containerd swapper/1 8 1088 0 +20:11:59 ecli swapper/2 5 3437 0 +20:11:59 HangDetector swapper/3 6 854 0 +20:11:59 ecli swapper/2 60 3437 0 +20:11:59 rcu_sched swapper/1 26 14 0 +20:11:59 kworker/0:1 swapper/0 26 3414 0 +20:11:59 ecli swapper/2 6 3437 0 +``` + +## Compile and Run + +Compile: + +``` +docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest +``` + +Run: + +``` +sudo ./ecli run examples/bpftools/runqslower/package.json +``` + +## details in bcc + +Demonstrations of runqslower, the Linux eBPF/bcc version. + +runqslower traces long scheduling delays between a task becoming ready to run and that task actually running on a CPU. Example output: + +``` +# ./runqslower +Tracing run queue latency higher than 10000 us +TIME COMM TID LAT(us) +13:11:43 b'kworker/0:2' 8680 10250 +13:12:18 b'irq/16-vmwgfx' 422 10838 +13:12:18 b'systemd-oomd' 753 11012 +13:12:18 b'containerd' 8272 11254 +13:12:18 b'HangDetector' 764 12042 +^C +``` +This measures the time a task spends waiting on a run queue for a turn on-CPU, and shows this time as individual events. This time should be small, but a task may need to wait its turn due to CPU load. + +This measures two types of run queue latency: +1. The time from a task being enqueued on a run queue to its context switch and execution. This traces ttwu_do_wakeup(), wake_up_new_task() -> finish_task_switch() with either raw tracepoints (if supported) or kprobes and instruments the run queue latency after a voluntary context switch. +2. The time from when a task was involuntarily context switched and still in the runnable state, to when it next executed.
This is instrumented from finish_task_switch() alone. + +The overhead of this tool may become significant for some workloads: see the OVERHEAD section. + +This works by tracing various kernel scheduler functions using dynamic tracing, and will need updating to match any changes to these functions. + +Since this uses BPF, only the root user can use this tool. + +```console +Usage: runqslower [-h] [-p PID | -t TID | -P] [min_us] +``` + +The min_us option sets the latency of the run queue to track: + +``` +# ./runqslower 100 +Tracing run queue latency higher than 100 us +TIME COMM TID LAT(us) +20:48:26 b'gnome-shell' 3005 201 +20:48:26 b'gnome-shell' 3005 202 +20:48:26 b'gnome-shell' 3005 254 +20:48:26 b'gnome-shell' 3005 208 +20:48:26 b'gnome-shell' 3005 132 +20:48:26 b'gnome-shell' 3005 213 +20:48:26 b'gnome-shell' 3005 205 +20:48:26 b'python3' 5224 127 +20:48:26 b'gnome-shell' 3005 214 +20:48:26 b'gnome-shell' 3005 126 +20:48:26 b'gnome-shell' 3005 285 +20:48:26 b'Xorg' 2869 296 +20:48:26 b'gnome-shell' 3005 119 +20:48:26 b'gnome-shell' 3005 206 +``` + +The -p PID option only traces this PID: + +``` +# ./runqslower -p 3005 +Tracing run queue latency higher than 10000 us +TIME COMM TID LAT(us) +20:46:22 b'gnome-shell' 3005 16024 +20:46:45 b'gnome-shell' 3005 11494 +20:46:45 b'gnome-shell' 3005 21430 +20:46:45 b'gnome-shell' 3005 14948 +20:47:16 b'gnome-shell' 3005 10164 +20:47:16 b'gnome-shell' 3005 18070 +20:47:17 b'gnome-shell' 3005 13272 +20:47:18 b'gnome-shell' 3005 10451 +20:47:18 b'gnome-shell' 3005 15010 +20:47:18 b'gnome-shell' 3005 19449 +20:47:22 b'gnome-shell' 3005 19327 +20:47:23 b'gnome-shell' 3005 13178 +20:47:23 b'gnome-shell' 3005 13483 +20:47:23 b'gnome-shell' 3005 15562 +20:47:23 b'gnome-shell' 3005 13655 +20:47:23 b'gnome-shell' 3005 19571 +``` + +The -P option also shows previous task name and TID: + +``` +# ./runqslower -P +Tracing run queue latency higher than 10000 us +TIME COMM TID LAT(us) PREV COMM PREV TID +20:42:48 b'sysbench' 5159 10562 
b'sysbench' 5152 +20:42:48 b'sysbench' 5159 10367 b'sysbench' 5152 +20:42:49 b'sysbench' 5158 11818 b'sysbench' 5159 +20:42:49 b'sysbench' 5160 16913 b'sysbench' 5153 +20:42:49 b'sysbench' 5157 13742 b'sysbench' 5160 +20:42:49 b'sysbench' 5152 13746 b'sysbench' 5160 +20:42:49 b'sysbench' 5153 13731 b'sysbench' 5160 +20:42:49 b'sysbench' 5158 14688 b'sysbench' 5161 +20:42:50 b'sysbench' 5155 10468 b'sysbench' 5152 +20:42:50 b'sysbench' 5156 17695 b'sysbench' 5158 +20:42:50 b'sysbench' 5155 11251 b'sysbench' 5152 +20:42:50 b'sysbench' 5154 13283 b'sysbench' 5152 +20:42:50 b'sysbench' 5158 22278 b'sysbench' 5157 +``` + +For more details, see docs/special_filtering.md \ No newline at end of file diff --git a/8-runqslower/core_fixes.h b/8-runqslower/core_fixes.h new file mode 100644 index 0000000..003163a --- /dev/null +++ b/8-runqslower/core_fixes.h @@ -0,0 +1,112 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +/* Copyright (c) 2021 Hengqi Chen */ + +#ifndef __CORE_FIXES_BPF_H +#define __CORE_FIXES_BPF_H + +#include <vmlinux.h> +#include <bpf/bpf_core_read.h> + +/** + * commit 2f064a59a1 ("sched: Change task_struct::state") changes + * the name of task_struct::state to task_struct::__state + * see: + * https://github.com/torvalds/linux/commit/2f064a59a1 + */ +struct task_struct___o { + volatile long int state; +} __attribute__((preserve_access_index)); + +struct task_struct___x { + unsigned int __state; +} __attribute__((preserve_access_index)); + +static __always_inline __s64 get_task_state(void *task) +{ + struct task_struct___x *t = task; + + if (bpf_core_field_exists(t->__state)) + return BPF_CORE_READ(t, __state); + return BPF_CORE_READ((struct task_struct___o *)task, state); +} + +/** + * commit 309dca309fc3 ("block: store a block_device pointer in struct bio") + * adds a new member bi_bdev which is a pointer to struct block_device + * see: + * https://github.com/torvalds/linux/commit/309dca309fc3 + */ +struct bio___o { + struct gendisk *bi_disk; +}
__attribute__((preserve_access_index)); + +struct bio___x { + struct block_device *bi_bdev; +} __attribute__((preserve_access_index)); + +static __always_inline struct gendisk *get_gendisk(void *bio) +{ + struct bio___x *b = bio; + + if (bpf_core_field_exists(b->bi_bdev)) + return BPF_CORE_READ(b, bi_bdev, bd_disk); + return BPF_CORE_READ((struct bio___o *)bio, bi_disk); +} + +/** + * commit d5869fdc189f ("block: introduce block_rq_error tracepoint") + * adds a new tracepoint block_rq_error and it shares the same arguments + * with tracepoint block_rq_complete. As a result, the kernel BTF now has + * a `struct trace_event_raw_block_rq_completion` instead of + * `struct trace_event_raw_block_rq_complete`. + * see: + * https://github.com/torvalds/linux/commit/d5869fdc189f + */ +struct trace_event_raw_block_rq_complete___x { + dev_t dev; + sector_t sector; + unsigned int nr_sector; +} __attribute__((preserve_access_index)); + +struct trace_event_raw_block_rq_completion___x { + dev_t dev; + sector_t sector; + unsigned int nr_sector; +} __attribute__((preserve_access_index)); + +static __always_inline bool has_block_rq_completion() +{ + if (bpf_core_type_exists(struct trace_event_raw_block_rq_completion___x)) + return true; + return false; +} + +/** + * commit d152c682f03c ("block: add an explicit ->disk backpointer to the + * request_queue") and commit f3fa33acca9f ("block: remove the ->rq_disk + * field in struct request") make some changes to `struct request` and + * `struct request_queue`. Now, to get the `struct gendisk *` field in a CO-RE + * way, we need both `struct request` and `struct request_queue`. 
+ * see: + * https://github.com/torvalds/linux/commit/d152c682f03c + * https://github.com/torvalds/linux/commit/f3fa33acca9f + */ +struct request_queue___x { + struct gendisk *disk; +} __attribute__((preserve_access_index)); + +struct request___x { + struct request_queue___x *q; + struct gendisk *rq_disk; +} __attribute__((preserve_access_index)); + +static __always_inline struct gendisk *get_disk(void *request) +{ + struct request___x *r = request; + + if (bpf_core_field_exists(r->rq_disk)) + return BPF_CORE_READ(r, rq_disk); + return BPF_CORE_READ(r, q, disk); +} + +#endif /* __CORE_FIXES_BPF_H */ diff --git a/8-runqslower/runqslower.bpf.c b/8-runqslower/runqslower.bpf.c new file mode 100644 index 0000000..ebc0103 --- /dev/null +++ b/8-runqslower/runqslower.bpf.c @@ -0,0 +1,117 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2019 Facebook +#include <vmlinux.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_core_read.h> +#include <bpf/bpf_tracing.h> +#include "runqslower.bpf.h" +#include "core_fixes.h" + +#define TASK_RUNNING 0 + +const volatile __u64 min_us = 0; +const volatile pid_t targ_pid = 0; +const volatile pid_t targ_tgid = 0; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10240); + __type(key, u32); + __type(value, u64); +} start SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY); + __uint(key_size, sizeof(u32)); + __uint(value_size, sizeof(u32)); +} events SEC(".maps"); + +/* record enqueue timestamp */ +static int trace_enqueue(u32 tgid, u32 pid) +{ + u64 ts; + + if (!pid) + return 0; + if (targ_tgid && targ_tgid != tgid) + return 0; + if (targ_pid && targ_pid != pid) + return 0; + + ts = bpf_ktime_get_ns(); + bpf_map_update_elem(&start, &pid, &ts, 0); + return 0; +} + +static int handle_switch(void *ctx, struct task_struct *prev, struct task_struct *next) +{ + struct event event = {}; + u64 *tsp, delta_us; + u32 pid; + + /* ivcsw: treat like an enqueue event and store timestamp */ + if (get_task_state(prev) == TASK_RUNNING) + trace_enqueue(BPF_CORE_READ(prev, tgid),
BPF_CORE_READ(prev, pid)); + + pid = BPF_CORE_READ(next, pid); + + /* fetch timestamp and calculate delta */ + tsp = bpf_map_lookup_elem(&start, &pid); + if (!tsp) + return 0; /* missed enqueue */ + + delta_us = (bpf_ktime_get_ns() - *tsp) / 1000; + if (min_us && delta_us <= min_us) + return 0; + + event.pid = pid; + event.prev_pid = BPF_CORE_READ(prev, pid); + event.delta_us = delta_us; + bpf_probe_read_kernel_str(&event.task, sizeof(event.task), next->comm); + bpf_probe_read_kernel_str(&event.prev_task, sizeof(event.prev_task), prev->comm); + + /* output */ + bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, + &event, sizeof(event)); + + bpf_map_delete_elem(&start, &pid); + return 0; +} + +SEC("tp_btf/sched_wakeup") +int BPF_PROG(sched_wakeup, struct task_struct *p) +{ + return trace_enqueue(p->tgid, p->pid); +} + +SEC("tp_btf/sched_wakeup_new") +int BPF_PROG(sched_wakeup_new, struct task_struct *p) +{ + return trace_enqueue(p->tgid, p->pid); +} + +SEC("tp_btf/sched_switch") +int BPF_PROG(sched_switch, bool preempt, struct task_struct *prev, struct task_struct *next) +{ + return handle_switch(ctx, prev, next); +} + +SEC("raw_tp/sched_wakeup") +int BPF_PROG(handle_sched_wakeup, struct task_struct *p) +{ + return trace_enqueue(BPF_CORE_READ(p, tgid), BPF_CORE_READ(p, pid)); +} + +SEC("raw_tp/sched_wakeup_new") +int BPF_PROG(handle_sched_wakeup_new, struct task_struct *p) +{ + return trace_enqueue(BPF_CORE_READ(p, tgid), BPF_CORE_READ(p, pid)); +} + +SEC("raw_tp/sched_switch") +int BPF_PROG(handle_sched_switch, bool preempt, struct task_struct *prev, struct task_struct *next) +{ + return handle_switch(ctx, prev, next); +} + +char LICENSE[] SEC("license") = "GPL"; diff --git a/8-runqslower/runqslower.bpf.h b/8-runqslower/runqslower.bpf.h new file mode 100644 index 0000000..06e91f4 --- /dev/null +++ b/8-runqslower/runqslower.bpf.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +#ifndef __RUNQSLOWER_H +#define __RUNQSLOWER_H + +#define 
TASK_COMM_LEN 16 + +struct event { + char task[TASK_COMM_LEN]; + char prev_task[TASK_COMM_LEN]; + __u64 delta_us; + int pid; + int prev_pid; +}; + +#endif /* __RUNQSLOWER_H */ diff --git a/9-runqlat/.gitignore b/9-runqlat/.gitignore new file mode 100644 index 0000000..7d5aebf --- /dev/null +++ b/9-runqlat/.gitignore @@ -0,0 +1,6 @@ +.vscode +package.json +*.o +*.skel.json +*.skel.yaml +package.yaml diff --git a/9-runqlat/README.md b/9-runqlat/README.md new file mode 100755 index 0000000..59cbe30 --- /dev/null +++ b/9-runqlat/README.md @@ -0,0 +1,675 @@ +--- +layout: post +title: runqlat +date: 2022-10-10 16:18 +category: bpftools +author: yunwei37 +tags: [bpftools, syscall, tracepoint] +summary: Summarize run queue (scheduler) latency as a histogram. +--- + + +## origin + +origin from: + + + +This program summarizes scheduler run queue latency as a histogram, showing +how long tasks spent waiting their turn to run on-CPU. + +## Compile and Run + +Compile: + +```shell +docker run -it -v `pwd`/:/src/ yunwei37/ebpm:latest +``` + +```console +$ ecc runqlat.bpf.c runqlat.h +Compiling bpf object... +Generating export types... +Packing ebpf object and config into package.json... +``` + +Run: + +```console +$ sudo ecli examples/bpftools/runqlat/package.json -h +Usage: runqlat_bpf [--help] [--version] [--verbose] [--filter_cg] [--targ_per_process] [--targ_per_thread] [--targ_per_pidns] [--targ_ms] [--targ_tgid VAR] + +A simple eBPF program + +Optional arguments: + -h, --help shows help message and exits + -v, --version prints version information and exits + --verbose prints libbpf debug information + --filter_cg set value of bool variable filter_cg + --targ_per_process set value of bool variable targ_per_process + --targ_per_thread set value of bool variable targ_per_thread + --targ_per_pidns set value of bool variable targ_per_pidns + --targ_ms set value of bool variable targ_ms + --targ_tgid set value of pid_t variable targ_tgid + +Built with eunomia-bpf framework. 
+See https://github.com/eunomia-bpf/eunomia-bpf for more information. + +$ sudo ecli examples/bpftools/runqlat/package.json +key = 4294967295 +comm = rcu_preempt + + (unit) : count distribution + 0 -> 1 : 9 |**** | + 2 -> 3 : 6 |** | + 4 -> 7 : 12 |***** | + 8 -> 15 : 28 |************* | + 16 -> 31 : 40 |******************* | + 32 -> 63 : 83 |****************************************| + 64 -> 127 : 57 |*************************** | + 128 -> 255 : 19 |********* | + 256 -> 511 : 11 |***** | + 512 -> 1023 : 2 | | + 1024 -> 2047 : 2 | | + 2048 -> 4095 : 0 | | + 4096 -> 8191 : 0 | | + 8192 -> 16383 : 0 | | + 16384 -> 32767 : 1 | | + +$ sudo ecli examples/bpftools/runqlat/package.json --targ_per_process +key = 3189 +comm = cpptools + + (unit) : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 1 |*** | + 16 -> 31 : 2 |******* | + 32 -> 63 : 11 |****************************************| + 64 -> 127 : 8 |***************************** | + 128 -> 255 : 3 |********** | +``` + +## details in bcc + +```text +Demonstrations of runqlat, the Linux eBPF/bcc version. + + +This program summarizes scheduler run queue latency as a histogram, showing +how long tasks spent waiting their turn to run on-CPU. + +Here is a heavily loaded system: + +# ./runqlat +Tracing run queue latency... Hit Ctrl-C to end. +^C + usecs : count distribution + 0 -> 1 : 233 |*********** | + 2 -> 3 : 742 |************************************ | + 4 -> 7 : 203 |********** | + 8 -> 15 : 173 |******** | + 16 -> 31 : 24 |* | + 32 -> 63 : 0 | | + 64 -> 127 : 30 |* | + 128 -> 255 : 6 | | + 256 -> 511 : 3 | | + 512 -> 1023 : 5 | | + 1024 -> 2047 : 27 |* | + 2048 -> 4095 : 30 |* | + 4096 -> 8191 : 20 | | + 8192 -> 16383 : 29 |* | + 16384 -> 32767 : 809 |****************************************| + 32768 -> 65535 : 64 |*** | + +The distribution is bimodal, with one mode between 0 and 15 microseconds, +and another between 16 and 65 milliseconds. 
These modes are visible as the +spikes in the ASCII distribution (which is merely a visual representation +of the "count" column). As an example of reading one line: 809 events fell +into the 16384 to 32767 microsecond range (16 to 32 ms) while tracing. + +I would expect the two modes to be due the workload: 16 hot CPU-bound threads, +and many other mostly idle threads doing occasional work. I suspect the mostly +idle threads will run with a higher priority when they wake up, and are +the reason for the low latency mode. The high latency mode will be the +CPU-bound threads. More analysis with this and other tools can confirm. + + +A -m option can be used to show milliseconds instead, as well as an interval +and a count. For example, showing three x five second summary in milliseconds: + +# ./runqlat -m 5 3 +Tracing run queue latency... Hit Ctrl-C to end. + + msecs : count distribution + 0 -> 1 : 3818 |****************************************| + 2 -> 3 : 39 | | + 4 -> 7 : 39 | | + 8 -> 15 : 62 | | + 16 -> 31 : 2214 |*********************** | + 32 -> 63 : 226 |** | + + msecs : count distribution + 0 -> 1 : 3775 |****************************************| + 2 -> 3 : 52 | | + 4 -> 7 : 37 | | + 8 -> 15 : 65 | | + 16 -> 31 : 2230 |*********************** | + 32 -> 63 : 212 |** | + + msecs : count distribution + 0 -> 1 : 3816 |****************************************| + 2 -> 3 : 49 | | + 4 -> 7 : 40 | | + 8 -> 15 : 53 | | + 16 -> 31 : 2228 |*********************** | + 32 -> 63 : 221 |** | + +This shows a similar distribution across the three summaries. + + +A -p option can be used to show one PID only, which is filtered in kernel for +efficiency. For example, PID 4505, and one second summaries: + +# ./runqlat -mp 4505 1 +Tracing run queue latency... Hit Ctrl-C to end. 
+ + msecs : count distribution + 0 -> 1 : 1 |* | + 2 -> 3 : 2 |*** | + 4 -> 7 : 1 |* | + 8 -> 15 : 0 | | + 16 -> 31 : 25 |****************************************| + 32 -> 63 : 3 |**** | + + msecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 2 |** | + 4 -> 7 : 0 | | + 8 -> 15 : 1 |* | + 16 -> 31 : 30 |****************************************| + 32 -> 63 : 1 |* | + + msecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 28 |****************************************| + 32 -> 63 : 2 |** | + + msecs : count distribution + 0 -> 1 : 1 |* | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 27 |****************************************| + 32 -> 63 : 4 |***** | +[...] + +For comparison, here is pidstat(1) for that process: + +# pidstat -p 4505 1 +Linux 4.4.0-virtual (bgregg-xxxxxxxx) 02/08/2016 _x86_64_ (8 CPU) + +08:56:11 AM UID PID %usr %system %guest %CPU CPU Command +08:56:12 AM 0 4505 9.00 3.00 0.00 12.00 0 bash +08:56:13 AM 0 4505 7.00 5.00 0.00 12.00 0 bash +08:56:14 AM 0 4505 10.00 2.00 0.00 12.00 0 bash +08:56:15 AM 0 4505 11.00 2.00 0.00 13.00 0 bash +08:56:16 AM 0 4505 9.00 3.00 0.00 12.00 0 bash +[...] + +This is a synthetic workload that is CPU bound. It's only spending 12% on-CPU +each second because of high CPU demand on this server: the remaining time +is spent waiting on a run queue, as visualized by runqlat. + + +Here is the same system, but when it is CPU idle: + +# ./runqlat 5 1 +Tracing run queue latency... Hit Ctrl-C to end. 
+ + usecs : count distribution + 0 -> 1 : 2250 |******************************** | + 2 -> 3 : 2340 |********************************** | + 4 -> 7 : 2746 |****************************************| + 8 -> 15 : 418 |****** | + 16 -> 31 : 93 |* | + 32 -> 63 : 28 | | + 64 -> 127 : 119 |* | + 128 -> 255 : 9 | | + 256 -> 511 : 4 | | + 512 -> 1023 : 20 | | + 1024 -> 2047 : 22 | | + 2048 -> 4095 : 5 | | + 4096 -> 8191 : 2 | | + +Back to a microsecond scale, this time there is little run queue latency past 1 +millisecond, as would be expected. + + +Now 16 threads are performing heavy disk I/O: + +# ./runqlat 5 1 +Tracing run queue latency... Hit Ctrl-C to end. + + usecs : count distribution + 0 -> 1 : 204 | | + 2 -> 3 : 944 |* | + 4 -> 7 : 16315 |********************* | + 8 -> 15 : 29897 |****************************************| + 16 -> 31 : 1044 |* | + 32 -> 63 : 23 | | + 64 -> 127 : 128 | | + 128 -> 255 : 24 | | + 256 -> 511 : 5 | | + 512 -> 1023 : 13 | | + 1024 -> 2047 : 15 | | + 2048 -> 4095 : 13 | | + 4096 -> 8191 : 10 | | + +The distribution hasn't changed too much. While the disks are 100% busy, there +is still plenty of CPU headroom, and threads still don't spend much time +waiting their turn. + + +A -P option will print a distribution for each PID: + +# ./runqlat -P +Tracing run queue latency... Hit Ctrl-C to end. 
+^C + +pid = 0 + usecs : count distribution + 0 -> 1 : 351 |******************************** | + 2 -> 3 : 96 |******** | + 4 -> 7 : 437 |****************************************| + 8 -> 15 : 12 |* | + 16 -> 31 : 10 | | + 32 -> 63 : 0 | | + 64 -> 127 : 16 |* | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 0 | | + 2048 -> 4095 : 0 | | + 4096 -> 8191 : 0 | | + 8192 -> 16383 : 1 | | + +pid = 12929 + usecs : count distribution + 0 -> 1 : 1 |****************************************| + 2 -> 3 : 0 | | + 4 -> 7 : 1 |****************************************| + +pid = 12930 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 1 |****************************************| + 32 -> 63 : 0 | | + 64 -> 127 : 1 |****************************************| + +pid = 12931 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 1 |******************** | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 2 |****************************************| + +pid = 12932 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 1 |****************************************| + 256 -> 511 : 0 | | + 512 -> 1023 : 1 |****************************************| + +pid = 7 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 426 |************************************* | + 4 -> 7 : 457 |****************************************| + 8 -> 15 : 16 |* | + +pid = 9 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 425 |****************************************| + 8 -> 15 : 16 |* | + +pid = 11 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 10 |****************************************| + +pid = 14 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 8 
|****************************************| + 4 -> 7 : 2 |********** | + +pid = 18 + usecs : count distribution + 0 -> 1 : 414 |****************************************| + 2 -> 3 : 0 | | + 4 -> 7 : 20 |* | + 8 -> 15 : 8 | | + +pid = 12928 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 1 |****************************************| + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 1 |****************************************| + +pid = 1867 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 15 |****************************************| + 16 -> 31 : 1 |** | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 4 |********** | + +pid = 1871 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 2 |****************************************| + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 1 |******************** | + +pid = 1876 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 1 |****************************************| + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 1 |****************************************| + +pid = 1878 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 3 |****************************************| + +pid = 1880 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 3 |****************************************| + +pid = 9307 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 1 |****************************************| + +pid = 1886 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 1 |******************** | + 8 -> 15 : 2 |****************************************| + +pid = 1888 + usecs : count distribution + 0 -> 1 : 0 | 
| + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 3 |****************************************| + +pid = 3297 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 1 |****************************************| + +pid = 1892 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 1 |******************** | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 2 |****************************************| + +pid = 7024 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 4 |****************************************| + +pid = 16468 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 3 |****************************************| + +pid = 12922 + usecs : count distribution + 0 -> 1 : 1 |****************************************| + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 1 |****************************************| + 16 -> 31 : 1 |****************************************| + 32 -> 63 : 0 | | + 64 -> 127 : 1 |****************************************| + +pid = 12923 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 1 |******************** | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 2 |****************************************| + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 1 |******************** | + 1024 -> 2047 : 1 |******************** | + +pid = 12924 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 2 |******************** | + 8 -> 15 : 4 |****************************************| + 16 -> 31 : 1 |********** | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 0 | | + 1024 -> 2047 : 1 |********** | + +pid = 12925 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 
-> 63 : 0 | | + 64 -> 127 : 1 |****************************************| + +pid = 12926 + usecs : count distribution + 0 -> 1 : 0 | | + 2 -> 3 : 1 |****************************************| + 4 -> 7 : 0 | | + 8 -> 15 : 1 |****************************************| + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 0 | | + 512 -> 1023 : 1 |****************************************| + +pid = 12927 + usecs : count distribution + 0 -> 1 : 1 |****************************************| + 2 -> 3 : 0 | | + 4 -> 7 : 1 |****************************************| + + +A -L option will print a distribution for each TID: + +# ./runqlat -L +Tracing run queue latency... Hit Ctrl-C to end. +^C + +tid = 0 + usecs : count distribution + 0 -> 1 : 593 |**************************** | + 2 -> 3 : 829 |****************************************| + 4 -> 7 : 300 |************** | + 8 -> 15 : 321 |*************** | + 16 -> 31 : 132 |****** | + 32 -> 63 : 58 |** | + 64 -> 127 : 0 | | + 128 -> 255 : 0 | | + 256 -> 511 : 13 | | + +tid = 7 + usecs : count distribution + 0 -> 1 : 8 |******** | + 2 -> 3 : 19 |******************** | + 4 -> 7 : 37 |****************************************| +[...] + + +And a --pidnss option (short for PID namespaces) will print for each PID +namespace, for analyzing container performance: + +# ./runqlat --pidnss -m +Tracing run queue latency... Hit Ctrl-C to end. 
+^C + +pidns = 4026532870 + msecs : count distribution + 0 -> 1 : 40 |****************************************| + 2 -> 3 : 1 |* | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 2 |** | + 64 -> 127 : 5 |***** | + +pidns = 4026532809 + msecs : count distribution + 0 -> 1 : 67 |****************************************| + +pidns = 4026532748 + msecs : count distribution + 0 -> 1 : 63 |****************************************| + +pidns = 4026532687 + msecs : count distribution + 0 -> 1 : 7 |****************************************| + +pidns = 4026532626 + msecs : count distribution + 0 -> 1 : 45 |****************************************| + 2 -> 3 : 0 | | + 4 -> 7 : 0 | | + 8 -> 15 : 0 | | + 16 -> 31 : 0 | | + 32 -> 63 : 0 | | + 64 -> 127 : 3 |** | + +pidns = 4026531836 + msecs : count distribution + 0 -> 1 : 314 |****************************************| + 2 -> 3 : 1 | | + 4 -> 7 : 11 |* | + 8 -> 15 : 28 |*** | + 16 -> 31 : 137 |***************** | + 32 -> 63 : 86 |********** | + 64 -> 127 : 1 | | + +pidns = 4026532382 + msecs : count distribution + 0 -> 1 : 285 |****************************************| + 2 -> 3 : 5 | | + 4 -> 7 : 16 |** | + 8 -> 15 : 9 |* | + 16 -> 31 : 69 |********* | + 32 -> 63 : 25 |*** | + +Many of these distributions have two modes: the second, in this case, is +caused by capping CPU usage via CPU shares. 
+ + +USAGE message: + +# ./runqlat -h +usage: runqlat.py [-h] [-T] [-m] [-P] [--pidnss] [-L] [-p PID] + [interval] [count] + +Summarize run queue (scheduler) latency as a histogram + +positional arguments: + interval output interval, in seconds + count number of outputs + +optional arguments: + -h, --help show this help message and exit + -T, --timestamp include timestamp on output + -m, --milliseconds millisecond histogram + -P, --pids print a histogram per process ID + --pidnss print a histogram per PID namespace + -L, --tids print a histogram per thread ID + -p PID, --pid PID trace this PID only + +examples: + ./runqlat # summarize run queue latency as a histogram + ./runqlat 1 10 # print 1 second summaries, 10 times + ./runqlat -mT 1 # 1s summaries, milliseconds, and timestamps + ./runqlat -P # show each PID separately + ./runqlat -p 185 # trace PID 185 only + +``` diff --git a/9-runqlat/bits.bpf.h b/9-runqlat/bits.bpf.h new file mode 100644 index 0000000..a2b7bb9 --- /dev/null +++ b/9-runqlat/bits.bpf.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +#ifndef __BITS_BPF_H +#define __BITS_BPF_H + +#define READ_ONCE(x) (*(volatile typeof(x) *)&(x)) +#define WRITE_ONCE(x, val) ((*(volatile typeof(x) *)&(x)) = val) + +static __always_inline u64 log2(u32 v) +{ + u32 shift, r; + + r = (v > 0xFFFF) << 4; v >>= r; + shift = (v > 0xFF) << 3; v >>= shift; r |= shift; + shift = (v > 0xF) << 2; v >>= shift; r |= shift; + shift = (v > 0x3) << 1; v >>= shift; r |= shift; + r |= (v >> 1); + + return r; +} + +static __always_inline u64 log2l(u64 v) +{ + u32 hi = v >> 32; + + if (hi) + return log2(hi) + 32; + else + return log2(v); +} + +#endif /* __BITS_BPF_H */ diff --git a/9-runqlat/core_fixes.bpf.h b/9-runqlat/core_fixes.bpf.h new file mode 100644 index 0000000..003163a --- /dev/null +++ b/9-runqlat/core_fixes.bpf.h @@ -0,0 +1,112 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +/* Copyright (c) 2021 Hengqi Chen */ + +#ifndef 
__CORE_FIXES_BPF_H +#define __CORE_FIXES_BPF_H + +#include <vmlinux.h> +#include <bpf/bpf_core_read.h> + +/** + * commit 2f064a59a1 ("sched: Change task_struct::state") changes + * the name of task_struct::state to task_struct::__state + * see: + * https://github.com/torvalds/linux/commit/2f064a59a1 + */ +struct task_struct___o { + volatile long int state; +} __attribute__((preserve_access_index)); + +struct task_struct___x { + unsigned int __state; +} __attribute__((preserve_access_index)); + +static __always_inline __s64 get_task_state(void *task) +{ + struct task_struct___x *t = task; + + if (bpf_core_field_exists(t->__state)) + return BPF_CORE_READ(t, __state); + return BPF_CORE_READ((struct task_struct___o *)task, state); +} + +/** + * commit 309dca309fc3 ("block: store a block_device pointer in struct bio") + * adds a new member bi_bdev which is a pointer to struct block_device + * see: + * https://github.com/torvalds/linux/commit/309dca309fc3 + */ +struct bio___o { + struct gendisk *bi_disk; +} __attribute__((preserve_access_index)); + +struct bio___x { + struct block_device *bi_bdev; +} __attribute__((preserve_access_index)); + +static __always_inline struct gendisk *get_gendisk(void *bio) +{ + struct bio___x *b = bio; + + if (bpf_core_field_exists(b->bi_bdev)) + return BPF_CORE_READ(b, bi_bdev, bd_disk); + return BPF_CORE_READ((struct bio___o *)bio, bi_disk); +} + +/** + * commit d5869fdc189f ("block: introduce block_rq_error tracepoint") + * adds a new tracepoint block_rq_error and it shares the same arguments + * with tracepoint block_rq_complete. As a result, the kernel BTF now has + * a `struct trace_event_raw_block_rq_completion` instead of + * `struct trace_event_raw_block_rq_complete`.
+ * see: + * https://github.com/torvalds/linux/commit/d5869fdc189f + */ +struct trace_event_raw_block_rq_complete___x { + dev_t dev; + sector_t sector; + unsigned int nr_sector; +} __attribute__((preserve_access_index)); + +struct trace_event_raw_block_rq_completion___x { + dev_t dev; + sector_t sector; + unsigned int nr_sector; +} __attribute__((preserve_access_index)); + +static __always_inline bool has_block_rq_completion() +{ + if (bpf_core_type_exists(struct trace_event_raw_block_rq_completion___x)) + return true; + return false; +} + +/** + * commit d152c682f03c ("block: add an explicit ->disk backpointer to the + * request_queue") and commit f3fa33acca9f ("block: remove the ->rq_disk + * field in struct request") make some changes to `struct request` and + * `struct request_queue`. Now, to get the `struct gendisk *` field in a CO-RE + * way, we need both `struct request` and `struct request_queue`. + * see: + * https://github.com/torvalds/linux/commit/d152c682f03c + * https://github.com/torvalds/linux/commit/f3fa33acca9f + */ +struct request_queue___x { + struct gendisk *disk; +} __attribute__((preserve_access_index)); + +struct request___x { + struct request_queue___x *q; + struct gendisk *rq_disk; +} __attribute__((preserve_access_index)); + +static __always_inline struct gendisk *get_disk(void *request) +{ + struct request___x *r = request; + + if (bpf_core_field_exists(r->rq_disk)) + return BPF_CORE_READ(r, rq_disk); + return BPF_CORE_READ(r, q, disk); +} + +#endif /* __CORE_FIXES_BPF_H */ diff --git a/9-runqlat/maps.bpf.h b/9-runqlat/maps.bpf.h new file mode 100644 index 0000000..51d1012 --- /dev/null +++ b/9-runqlat/maps.bpf.h @@ -0,0 +1,26 @@ +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) +// Copyright (c) 2020 Anton Protopopov +#ifndef __MAPS_BPF_H +#define __MAPS_BPF_H + +#include <bpf/bpf_helpers.h> +#include <asm-generic/errno.h> + +static __always_inline void * +bpf_map_lookup_or_try_init(void *map, const void *key, const void *init) +{ + void *val; + long err; + + val =
bpf_map_lookup_elem(map, key); + if (val) + return val; + + err = bpf_map_update_elem(map, key, init, BPF_NOEXIST); + if (err && err != -EEXIST) + return 0; + + return bpf_map_lookup_elem(map, key); +} + +#endif /* __MAPS_BPF_H */ diff --git a/9-runqlat/runqlat.bpf.c b/9-runqlat/runqlat.bpf.c new file mode 100644 index 0000000..0659151 --- /dev/null +++ b/9-runqlat/runqlat.bpf.c @@ -0,0 +1,152 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2020 Wenbo Zhang +#include <vmlinux.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_core_read.h> +#include <bpf/bpf_tracing.h> +#include "runqlat.h" +#include "bits.bpf.h" +#include "maps.bpf.h" +#include "core_fixes.bpf.h" + +#define MAX_ENTRIES 10240 +#define TASK_RUNNING 0 + +const volatile bool filter_cg = false; +const volatile bool targ_per_process = false; +const volatile bool targ_per_thread = false; +const volatile bool targ_per_pidns = false; +const volatile bool targ_ms = false; +const volatile pid_t targ_tgid = 0; + +struct { + __uint(type, BPF_MAP_TYPE_CGROUP_ARRAY); + __type(key, u32); + __type(value, u32); + __uint(max_entries, 1); +} cgroup_map SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, MAX_ENTRIES); + __type(key, u32); + __type(value, u64); +} start SEC(".maps"); + +static struct hist zero; + +/// @sample {"interval": 1000, "type" : "log2_hist"} +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, MAX_ENTRIES); + __type(key, u32); + __type(value, struct hist); +} hists SEC(".maps"); + +static int trace_enqueue(u32 tgid, u32 pid) +{ + u64 ts; + + if (!pid) + return 0; + if (targ_tgid && targ_tgid != tgid) + return 0; + + ts = bpf_ktime_get_ns(); + bpf_map_update_elem(&start, &pid, &ts, BPF_ANY); + return 0; +} + +static unsigned int pid_namespace(struct task_struct *task) +{ + struct pid *pid; + unsigned int level; + struct upid upid; + unsigned int inum; + + /* get the pid namespace by following task_active_pid_ns(), + * pid->numbers[pid->level].ns + */ + pid = BPF_CORE_READ(task, thread_pid); + level =
BPF_CORE_READ(pid, level); + bpf_core_read(&upid, sizeof(upid), &pid->numbers[level]); + inum = BPF_CORE_READ(upid.ns, ns.inum); + + return inum; +} + +static int handle_switch(bool preempt, struct task_struct *prev, struct task_struct *next) +{ + struct hist *histp; + u64 *tsp, slot; + u32 pid, hkey; + s64 delta; + + if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0)) + return 0; + + if (get_task_state(prev) == TASK_RUNNING) + trace_enqueue(BPF_CORE_READ(prev, tgid), BPF_CORE_READ(prev, pid)); + + pid = BPF_CORE_READ(next, pid); + + tsp = bpf_map_lookup_elem(&start, &pid); + if (!tsp) + return 0; + delta = bpf_ktime_get_ns() - *tsp; + if (delta < 0) + goto cleanup; + + if (targ_per_process) + hkey = BPF_CORE_READ(next, tgid); + else if (targ_per_thread) + hkey = pid; + else if (targ_per_pidns) + hkey = pid_namespace(next); + else + hkey = -1; + histp = bpf_map_lookup_or_try_init(&hists, &hkey, &zero); + if (!histp) + goto cleanup; + if (!histp->comm[0]) + bpf_probe_read_kernel_str(&histp->comm, sizeof(histp->comm), + next->comm); + if (targ_ms) + delta /= 1000000U; + else + delta /= 1000U; + slot = log2l(delta); + if (slot >= MAX_SLOTS) + slot = MAX_SLOTS - 1; + __sync_fetch_and_add(&histp->slots[slot], 1); + +cleanup: + bpf_map_delete_elem(&start, &pid); + return 0; +} + +SEC("raw_tp/sched_wakeup") +int BPF_PROG(handle_sched_wakeup, struct task_struct *p) +{ + if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0)) + return 0; + + return trace_enqueue(BPF_CORE_READ(p, tgid), BPF_CORE_READ(p, pid)); +} + +SEC("raw_tp/sched_wakeup_new") +int BPF_PROG(handle_sched_wakeup_new, struct task_struct *p) +{ + if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0)) + return 0; + + return trace_enqueue(BPF_CORE_READ(p, tgid), BPF_CORE_READ(p, pid)); +} + +SEC("raw_tp/sched_switch") +int BPF_PROG(handle_sched_switch, bool preempt, struct task_struct *prev, struct task_struct *next) +{ + return handle_switch(preempt, prev, next); +} + +char 
LICENSE[] SEC("license") = "GPL"; diff --git a/9-runqlat/runqlat.h b/9-runqlat/runqlat.h new file mode 100644 index 0000000..b6f0a02 --- /dev/null +++ b/9-runqlat/runqlat.h @@ -0,0 +1,14 @@ + +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +#ifndef __RUNQLAT_H +#define __RUNQLAT_H + +#define TASK_COMM_LEN 16 +#define MAX_SLOTS 26 + +struct hist { + __u32 slots[MAX_SLOTS]; + char comm[TASK_COMM_LEN]; +}; + +#endif /* __RUNQLAT_H */
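
The bucketing step in `handle_switch()` (scale the delta, take a base-2 logarithm, clamp to `MAX_SLOTS - 1`) can be tried outside the kernel with a small user-space sketch. This is only an illustration under assumptions: `floor_log2_u64`, `slot_for_delta_ns`, `record_latency` and `struct hist_sketch` are hypothetical names, and `floor_log2_u64` is assumed to behave like `log2l()` from `bits.bpf.h`, which is not shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 26 /* same value as in runqlat.h */

/* Hypothetical user-space mirror of struct hist (comm field omitted). */
struct hist_sketch {
	uint32_t slots[MAX_SLOTS];
};

/* Assumed stand-in for log2l(): floor(log2(v)), with 0 and 1 both giving 0. */
static unsigned int floor_log2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Scale a latency in nanoseconds to us or ms, then pick a log2 slot. */
static unsigned int slot_for_delta_ns(uint64_t delta_ns, bool use_ms)
{
	uint64_t scaled = use_ms ? delta_ns / 1000000U : delta_ns / 1000U;
	unsigned int slot = floor_log2_u64(scaled);

	/* Anything beyond the last bucket is folded into it. */
	if (slot >= MAX_SLOTS)
		slot = MAX_SLOTS - 1;
	return slot;
}

/* Count one sample; the BPF program uses an atomic add instead. */
static void record_latency(struct hist_sketch *h, uint64_t delta_ns, bool use_ms)
{
	unsigned int slot = slot_for_delta_ns(delta_ns, use_ms);

	assert(slot < MAX_SLOTS);
	h->slots[slot]++;
}
```

With this sketch, a 5000 ns wait in microsecond mode scales to 5 and lands in slot 2, the bucket covering 4 to 7 us. Very large values fall into the last slot rather than indexing past the array.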