diff --git a/README.md b/README.md index 9942c38..f120b9c 100644 --- a/README.md +++ b/README.md @@ -46,7 +46,7 @@ Dive straight into eBPF development with this concise tutorial, built around the - [lesson 15-javagc](src/15-javagc/README.md) 使用 usdt 捕获用户态 Java GC 事件耗时 - [lesson 16-memleak](src/16-memleak/README.md) 检测内存泄漏 - [lesson 17-biopattern](src/17-biopattern/README.md) 捕获磁盘 IO 模式 -- [lesson 18-further-reading](src/18-further-reading/README.md) 更进一步的相关资料? +- [lesson 18-further-reading](src/18-further-reading/README.md) 更进一步的相关资料:论文列表、项目、博客等等 - [lesson 19-lsm-connect](src/19-lsm-connect/README.md) 使用 LSM 进行安全检测防御 - [lesson 20-tc](src/20-tc/README.md) 使用 eBPF 进行 tc 流量控制 - [lesson 21-xdp](src/21-xdp/README.md) 使用 eBPF 进行 XDP 报文处理 diff --git a/README_en.md b/README_en.md index 4088252..6bb0e73 100644 --- a/README_en.md +++ b/README_en.md @@ -34,7 +34,7 @@ Includes simple eBPF program samples and introductions. - [lesson 14-tcpstates](src/14-tcpstates/README_en.md) Records TCP connection state and TCP RTT.- [lesson 15-javagc](src/15-javagc/README_en.md) Capture user-level Java GC event duration using usdt - [lesson 16-memleak](src/16-memleak/README_en.md) Detect memory leaks - [lesson 17-biopattern](src/17-biopattern/README_en.md) Capture disk IO patterns -- [lesson 18-further-reading](src/18-further-reading/README_en.md) Further reading? +- [lesson 18-further-reading](src/18-further-reading/README_en.md) Further reading: papers list, projects, blogs, etc. 
- [lesson 19-lsm-connect](src/19-lsm-connect/README_en.md) Use LSM for security detection and defense - [lesson 20-tc](src/20-tc/README_en.md) Use eBPF for tc traffic control - [lesson 21-xdp](src/21-xdp/README_en.md) Use eBPF for XDP packet processing diff --git a/src/18-further-reading/README.md b/src/18-further-reading/README.md index 323ef68..9d9480f 100644 --- a/src/18-further-reading/README.md +++ b/src/18-further-reading/README.md @@ -1,6 +1,145 @@ -# 更多的参考资料 +# 更多的参考资料:论文、项目等等 可以在这里找到更多关于 eBPF 的信息: -- -- +- 一个关于 eBPF 相关内容和信息的详细列表: +- eBPF 相关项目、教程: + +这是我近年来读过的与 eBPF 相关的论文列表,可能对于对 eBPF 相关研究感兴趣的人有所帮助。 + +eBPF(扩展的伯克利数据包过滤器)是一种新兴的技术,允许在 Linux 内核中安全地执行用户提供的程序。近年来,它因加速网络处理、增强可观察性和实现可编程数据包处理而得到了广泛的应用。此文档列出了过去几年关于 eBPF 的一些关键研究论文。这些论文涵盖了 eBPF 的几个方面,包括加速分布式系统、存储和网络,正式验证 eBPF 的 JIT 编译器和验证器,将 eBPF 用于入侵检测,以及从 eBPF 程序自动生成硬件设计。 + +一些关键亮点: + +- eBPF 允许在内核中执行自定义函数,以加速分布式协议、存储引擎和网络应用,与传统的用户空间实现相比,可以提高吞吐量和降低延迟。 +- eBPF 组件(如 JIT 和验证器)的正式验证确保了正确性,并揭示了实际实现中的错误。 +- eBPF 的可编程性和效率使其适合在内核中完全构建入侵检测和网络监控应用。 +- 从 eBPF 程序中自动生成硬件设计允许软件开发人员快速生成网络卡中的优化数据包处理管道。 + +这些论文展示了 eBPF 在加速系统、增强安全性和简化网络编程方面的多功能性。随着 eBPF 的采用不断增加,它是一个与性能、安全性、硬件集成和易用性相关的系统研究的重要领域。 + +如果您有任何建议或添加论文的意见,请随时开放一个问题或PR。此列表创建于 2023.10,未来将添加新的论文。 + +> 如果您对 eBPF 有些进一步的兴趣的话,也可以查看我们在 [eunomia-bpf](https://github.com/eunomia-bpf) 的开源项目和 [bpf-developer-tutorial](https://github.com/eunomia-bpf/bpf-developer-tutorial) 的 eBPF 教程。我也在寻找 2024/2025 年系统和网络领域的 PhD 相关机会,这是我的 [Github](https://github.com/yunwei37) 和 [邮箱](yunwei356@gmail.com)。 + +## XRP: In-Kernel Storage Functions with eBPF + +随着微秒级 NVMe 存储设备的出现,Linux 内核存储堆栈开销变得显著,几乎使访问时间翻倍。我们介绍了 XRP,一个框架,允许应用程序从 eBPF 在 NVMe 驱动程序中的钩子执行用户定义的存储功能,如索引查找或聚合,安全地绕过大部分内核的存储堆栈。为了保持文件系统的语义,XRP 将少量的内核状态传播到其 NVMe 驱动程序钩子,在那里调用用户注册的 eBPF 函数。我们展示了如何利用 XRP 显著提高两个键值存储,BPF-KV,一个简单的 B+ 树键值存储,和 WiredTiger,一个流行的日志结构合并树存储引擎的吞吐量和延迟。 + +OSDI '22 最佳论文: + +## Specification and verification in the field: Applying formal methods to BPF just-in-time compilers in the Linux kernel + +本文描述了我们将形式方法应用于 Linux 内核中的一个关键组件,即 
Berkeley 数据包过滤器 (BPF) 虚拟机的即时编译器 ("JIT") 的经验。我们使用 Jitterbug 验证这些 JIT,这是第一个提供 JIT 正确性的精确规范的框架,能够排除实际错误,并提供一个自动化的证明策略,该策略可以扩展到实际实现。使用 Jitterbug,我们设计、实施并验证了一个新的针对 32 位 RISC-V 的 BPF JIT,在五个其他部署的 JIT 中找到并修复了 16 个之前未知的错误,并开发了新的 JIT 优化;所有这些更改都已上传到 Linux 内核。结果表明,在一个大型的、未经验证的系统中,通过仔细设计规范和证明策略,可以构建一个经过验证的组件。 + +OSDI 20: + +## λ-IO: A Unified IO Stack for Computational Storage + +新兴的计算存储设备为存储内计算提供了一个机会。它减少了主机与设备之间的数据移动开销,从而加速了数据密集型应用程序。在这篇文章中,我们介绍 λ-IO,一个统一的 IO 堆栈,跨主机和设备管理计算和存储资源。我们提出了一套设计 - 接口、运行时和调度 - 来解决三个关键问题。我们在全堆栈软件和硬件环境中实施了 λ-IO,并使用合成和实际应用程序对其 + +进行评估,与 Linux IO 相比,显示出高达 5.12 倍的性能提升。 + +FAST23: + +## Extension Framework for File Systems in User space + +用户文件系统相对于其内核实现提供了许多优势,例如开发的简易性和更好的系统可靠性。然而,它们会导致重大的性能损失。我们观察到现有的用户文件系统框架非常通用;它们由一个位于内核中的最小干预层组成,该层简单地将所有低级请求转发到用户空间。虽然这种设计提供了灵活性,但由于频繁的内核-用户上下文切换,它也严重降低了性能。 + +这项工作介绍了 ExtFUSE,一个用于开发可扩展用户文件系统的框架,该框架还允许应用程序在内核中注册"薄"的专用请求处理程序,以满足其特定的操作需求,同时在用户空间中保留复杂的功能。我们使用两个 FUSE 文件系统对 ExtFUSE 进行评估,结果表明 ExtFUSE 可以通过平均不到几百行的改动来提高用户文件系统的性能。ExtFUSE 可在 GitHub 上找到。 + +ATC 19: + +## Electrode: Accelerating Distributed Protocols with eBPF + +在标准的Linux内核网络栈下实现分布式协议可以享受到负载感知的CPU缩放、高兼容性以及强大的安全性和隔离性。但由于过多的用户-内核切换和内核网络栈遍历,其性能较低。我们介绍了Electrode,这是一套为分布式协议设计的基于eBPF的性能优化。这些优化在网络栈之前在内核中执行,但实现了与用户空间中实现的相似功能(例如,消息广播,收集ack的仲裁),从而避免了用户-内核切换和内核网络栈遍历所带来的开销。我们展示,当应用于经典的Multi-Paxos状态机复制协议时,Electrode可以提高其吞吐量高达128.4%,并将延迟降低高达41.7%。 + +NSDI 23: [链接](https://www.usenix.org/conference/nsdi23/presentation/zhou) + +## BMC: Accelerating Memcached using Safe In-kernel Caching and Pre-stack Processing + +内存键值存储是帮助扩展大型互联网服务的关键组件,通过提供对流行数据的低延迟访问。Memcached是最受欢迎的键值存储之一,由于Linux网络栈固有的性能限制,当使用高速网络接口时,其性能不高。虽然可以使用DPDK基础方案绕过Linux网络栈,但这种方法需要对软件栈进行完全重新设计,而且在客户端负载较低时也会导致高CPU利用率。 + 
+为了克服这些限制,我们提出了BMC,这是一个为Memcached设计的内核缓存,可以在执行标准网络栈之前服务于请求。对BMC缓存的请求被视为NIC中断的一部分,这允许性能随着为NIC队列服务的核心数量而扩展。为确保安全,BMC使用eBPF实现。尽管eBPF具有安全约束,但我们展示了实现复杂缓存服务是可能的。因为BMC在商用硬件上运行,并且不需要修改Linux内核或Memcached应用程序,所以它可以在现有系统上广泛部署。BMC优化了Facebook样式的小型请求的处理时间。在这个目标工作负载上,我们的评估显示,与原始的Memcached应用程序相比,BMC的吞吐量提高了高达18倍,与使用SO_REUSEPORT套接字标志的优化版Memcached相比,提高了高达6倍。此外,我们的结果还显示,对于非目标工作负载,BMC的开销可以忽略不计,并且不会降低吞吐量。 + +NSDI 21: [链接](https://www.usenix.org/conference/nsdi21/presentation/ghigoff) + +## hXDP: Efficient Software Packet Processing on FPGA NICs + +FPGA加速器在NIC上使得从CPU卸载昂贵的数据包处理任务成为可能。但是,FPGA有限的资源可能需要在多个应用程序之间共享,而编程它们则很困难。 + +我们提出了一种在FPGA上运行Linux的eXpress Data Path程序的解决方案,这些程序使用eBPF编写,仅使用可用硬件资源的一部分,同时匹配高端CPU的性能。eBPF的迭代执行模型不适合FPGA加速器。尽管如此,我们展示了,当针对一个特定的FPGA执行器时,一个eBPF程序的许多指令可以被压缩、并行化或完全删除,从而显著提高性能。我们利用这一点设计了hXDP,它包括(i)一个优化编译器,该编译器并行化并将eBPF字节码转换为我们定义的扩展eBPF指令集架构;(ii)一个在FPGA上执行这些指令的软处理器;以及(iii)一个基于FPGA的基础设施,提供XDP的maps和Linux内核中定义的helper函数。 + +我们在FPGA NIC上实现了hXDP,并评估了其运行真实世界的未经修改的eBPF程序的性能。我们的实现以156.25MHz的速度时钟,使用约15%的FPGA资源,并可以运行动态加载的程序。尽管有这些适度的要求,但它达到了高端CPU核心的数据包处理吞吐量,并提供了10倍低的数据包转发延迟。 + +OSDI 20: [链接](https://www.usenix.org/conference/osdi20/presentation/brunella) + +## Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code + +微服务正变得越来越复杂,给传统的性能监控解决方案带来了新的挑战。一方面,微服务的快速演变给现有的分布式跟踪框架的使用和维护带来了巨大的负担。另一方面,复杂的基础设施增加了网络性能问题的概率,并在网络侧创造了更多的盲点。在这篇论文中,我们介绍了 DeepFlow,一个用于微服务故障排除的以网络为中心的分布式跟踪框架。DeepFlow 通过一个以网络为中心的跟踪平面和隐式的上下文传播提供开箱即用的跟踪。此外,它消除了网络基础设施中的盲点,以低成本方式捕获网络指标,并增强了不同组件和层之间的关联性。我们从分析和实证上证明,DeepFlow 能够准确地定位微服务性能异常,而开销几乎可以忽略不计。DeepFlow 已经为超过26家公司发现了71多个关键性能异常,并已被数百名开发人员所使用。我们的生产评估显示,DeepFlow 能够为用户节省数小时的仪表化工作,并将故障排除时间从数小时缩短到几分钟。 + +SIGCOMM 23: + +## Fast In-kernel Traffic Sketching in eBPF + +扩展的伯克利数据包过滤器(eBPF)是一个基础设施,允许在不重新编译的情况下动态加载并直接在 Linux 内核中运行微程序。 + +在这项工作中,我们研究如何在 eBPF 中开发高性能的网络测量。我们以绘图为案例研究,因为它们具有支持广泛任务的能力,同时提供低内存占用和准确性保证。我们实现了 NitroSketch,一个用于用户空间网络的最先进的绘图,并表明用户空间网络的最佳实践不能直接应用于 eBPF,因为它的性能特点不同。通过应用我们学到的经验教训,我们将其性能提高了40%,与初级实现相比。 + +SIGCOMM 23: + +## 
SPRIGHT: extracting the server from serverless computing! high-performance eBPF-based event-driven, shared-memory processing + +无服务器计算在云环境中承诺提供高效、低成本的计算能力。然而,现有的解决方案,如Knative这样的开源平台,包含了繁重的组件,破坏了无服务器计算的目标。此外,这种无服务器平台缺乏数据平面优化,无法实现高效的、高性能的功能链,这也是流行的微服务开发范式的设施。它们为构建功能链使用的不必要的复杂和重复的功能严重降低了性能。"冷启动"延迟是另一个威慑因素。 + +我们描述了 SPRIGHT,一个轻量级、高性能、响应式的无服务器框架。SPRIGHT 利用共享内存处理显著提高了数据平面的可伸缩性,通过避免不必要的协议处理和序列化-反序列化开销。SPRIGHT 大量利用扩展的伯克利数据包过滤器 (eBPF) 进行事件驱动处理。我们创造性地使用 eBPF 的套接字消息机制支持共享内存处理,其开销严格与负载成正比。与常驻、基于轮询的DPDK相比,SPRIGHT 在真实工作负载下实现了相同的数据平面性能,但 CPU 使用率降低了10倍。此外,eBPF 为 SPRIGHT 带来了好处,替换了繁重的无服务器组件,使我们能够以微不足道的代价保持函数处于"暖"状态。 + +我们的初步实验结果显示,与 Knative 相比,SPRIGHT 在吞吐量和延迟方面实现了一个数量级的提高,同时大大减少了 CPU 使用,并消除了 "冷启动"的需要。 + + + +## Programmable System Call Security with eBPF + +利用 eBPF 进行可编程的系统调用安全 + +系统调用过滤是一种广泛用于保护共享的 OS 内核免受不受信任的用户应用程序威胁的安全机制。但是,现有的系统调用过滤技术要么由于用户空间代理带来的上下文切换开销过于昂贵,要么缺乏足够的可编程性来表达高级策略。Seccomp 是 Linux 的系统调用过滤模块,广泛用于现代的容器技术、移动应用和系统管理服务。尽管采用了经典的 BPF 语言(cBPF),但 Seccomp 中的安全策略主要限于静态的允许列表,主要是因为 cBPF 不支持有状态的策略。因此,许多关键的安全功能无法准确地表达,和/或需要修改内核。 + +在这篇论文中,我们介绍了一个可编程的系统调用过滤机制,它通过利用扩展的 BPF 语言(eBPF)使得更高级的安全策略得以表达。更具体地说,我们创建了一个新的 Seccomp eBPF 程序类型,暴露、修改或创建新的 eBPF 助手函数来安全地管理过滤状态、访问内核和用户状态,以及利用同步原语。重要的是,我们的系统与现有的内核特权和能力机制集成,使非特权用户能够安全地安装高级过滤器。我们的评估表明,我们基于 eBPF 的过滤可以增强现有策略(例如,通过时间专化,减少早期执行阶段的攻击面积高达55.4%)、缓解实际漏洞并加速过滤器。 + + + +## Cross Container Attacks: The Bewildered eBPF on Clouds + +在云上困惑的 eBPF 之间的容器攻击 + +扩展的伯克利数据包过滤器(eBPF)为用户空间程序提供了强大而灵活的内核接口,通过在内核空间直接运行字节码来扩展内核功能。它已被云服务广泛使用,以增强容器安全性、网络管理和系统可观察性。然而,我们发现在 Linux 主机上广泛讨论的攻击性 eBPF 可以为容器带来新的攻击面。通过 eBPF 的追踪特性,攻击者可以破坏容器的隔离并攻击主机,例如,窃取敏感数据、进行 DoS 攻击,甚至逃逸容器。在这篇论文中,我们研究基于 eBPF 的跨容器攻击,并揭示其在实际服务中的安全影响。利用 eBPF 攻击,我们成功地妨害了五个在线的 Jupyter/交互式 Shell 服务和 Google Cloud Platform 的 Cloud Shell。此外,我们发现三家领先的云供应商提供的 Kubernetes 服务在攻击者通过 eBPF 逃逸容器后可以被利用来发起跨节点攻击。具体来说,在阿里巴巴的 Kubernetes 服务中,攻击者可以通过滥用他们过度特权的云指标或管理 Pods 来妨害整个集群。不幸的是,容器上的 eBPF 攻击鲜为人知,并且现有的入侵检测系统几乎无法发现它们。此外,现有的 eBPF 权限模型无法限制 eBPF 并确保在共享内核的容器环境中安全使用。为此,我们提出了一个新的 eBPF 权限模型,以对抗容器中的 eBPF 攻击。 + + + +## Comparing Security in 
eBPF and WebAssembly + +比较 eBPF 和 WebAssembly 中的安全性 + +本文研究了 eBPF 和 WebAssembly(Wasm)的安全性,这两种技术近年来得到了广泛的采用,尽管它们是为非常不同的用途和环境而设计的。当 eBPF 主要用于 Linux 等操作系统内核时,Wasm 是一个为基于堆栈的虚拟机设计的二进制指令格式,其用途超出了 web。鉴于 eBPF 的增长和不断扩大的雄心,Wasm 可能提供有启发性的见解,因为它围绕在如 web 浏览器和云等复杂和敌对环境中安全执行任意不受信任的程序进行设计。我们分析了两种技术的安全目标 + +、社区发展、内存模型和执行模型,并进行了比较安全性评估,探讨了内存安全性、控制流完整性、API 访问和旁路通道。我们的结果表明,eBPF 有一个首先关注性能、其次关注安全的历史,而 Wasm 更强调安全,尽管要支付一些运行时开销。考虑 eBPF 的基于语言的限制和一个用于 API 访问的安全模型是未来工作的有益方向。 + + + +更多内容可以在第一个 eBPF 研讨会中找到: + +## A flow-based IDS using Machine Learning in eBPF + +基于eBPF中的机器学习的流式入侵检测系统 + +eBPF 是一种新技术,允许动态加载代码片段到 Linux 内核中。它可以大大加速网络,因为它使内核能够处理某些数据包而无需用户空间程序的参与。到目前为止,eBPF 主要用于简单的数据包过滤应用,如防火墙或拒绝服务保护。我们证明在 eBPF 中完全基于机器学习开发流式网络入侵检测系统是可行的。我们的解决方案使用决策树,并为每个数据包决定它是否恶意,考虑到网络流的整个先前上下文。与作为用户空间程序实现的同一解决方案相比,我们实现了超过 20% 的性能提升。 + + + +## Femto-containers: lightweight virtualization and fault isolation for small software functions on low-power IoT microcontrollers + +针对低功耗 IoT 微控制器上的小型软件功能的轻量级虚拟化和故障隔离: Femto-容器 + +低功耗的 IoT 微控制器上运行的操作系统运行时通常提供基础的 API、基本的连接性和(有时)一个(安全的)固件更新机制。相比之下,在硬件约束较少的场合,网络化软件已进入无服务器、微服务和敏捷的时代。考虑到弥合这一差距,我们在论文中设计了 Femto-容器,这是一种新的中间件运行时,可以嵌入到各种低功耗 IoT 设备中。Femto-容器使得可以在低功耗 IoT 设备上通过网络安全地部署、执行和隔离小型虚拟软件功能。我们实施了 Femto-容器,并在 RIOT 中提供了集成,这是一个受欢迎的开源 IoT 操作系统。然后,我们评估了我们的实现性能,它已被正式验证用于故障隔离,确保 RIOT 受到加载并在 Femto-容器中执行的逻辑的保护。我们在各种受欢迎的微控制器架构(Arm Cortex-M、ESP32 和 RISC-V)上的实验表明,Femto-容器在内存占用开销、能源消耗和安全性方面提供了有吸引力的权衡。 + + diff --git a/src/18-further-reading/README_en.md b/src/18-further-reading/README_en.md index ff754c6..b77b39b 100644 --- a/src/18-further-reading/README_en.md +++ b/src/18-further-reading/README_en.md @@ -1,6 +1,132 @@ -# More Reference Materials +# More Reference Materials: papers, projects You may find more about eBPF in these places: - A curated list of awesome projects related to eBPF: -- +- A website of eBPF projects and tutorials: + +This is also a list of eBPF-related papers I have read in recent years, which may be helpful for people interested in eBPF-related research. 
+ +eBPF (extended Berkeley Packet Filter) is an emerging technology that allows safe execution of user-provided programs in the Linux kernel. It has gained widespread adoption in recent years for accelerating network processing, enhancing observability, and enabling programmable packet processing. + +This document lists some key research papers on eBPF from the past few years. The papers cover several aspects of eBPF: accelerating distributed systems, storage, and networking; formally verifying the eBPF JIT compilers and verifier; applying eBPF to intrusion detection; and automatically generating hardware designs from eBPF programs. + +Some key highlights: + +- eBPF enables executing custom functions in the kernel to accelerate distributed protocols, storage engines, and networking applications, with improved throughput and lower latency compared to traditional userspace implementations. +- Formal verification of eBPF components such as the JIT compilers and the verifier helps establish correctness and has revealed bugs in real-world implementations. +- eBPF's programmability and efficiency make it suitable for building intrusion detection and network monitoring applications entirely in the kernel. +- Automated synthesis of hardware designs from eBPF programs allows software developers to quickly generate optimized packet processing pipelines in network cards. + +The papers demonstrate eBPF's versatility in accelerating systems, enhancing security, and simplifying network programming. As eBPF adoption grows, it is an important area of systems research with many open problems related to performance, safety, hardware integration, and ease of use. + +If you have suggestions or papers to add, please feel free to open an issue or PR. The list was created in October 2023; new papers will be added in the future. 
+ +> Check out our open-source projects at [eunomia-bpf](https://github.com/eunomia-bpf) and eBPF tutorials at [bpf-developer-tutorial](https://github.com/eunomia-bpf/bpf-developer-tutorial). I'm also looking for a PhD position in the area of systems and networking in 2024/2025. Here are my [GitHub](https://github.com/yunwei37) and [email](yunwei356@gmail.com). + +## XRP: In-Kernel Storage Functions with eBPF + +With the emergence of microsecond-scale NVMe storage devices, the Linux kernel storage stack overhead has become significant, almost doubling access times. We present XRP, a framework that allows applications to execute user-defined storage functions, such as index lookups or aggregations, from an eBPF hook in the NVMe driver, safely bypassing most of the kernel’s storage stack. To preserve file system semantics, XRP propagates a small amount of kernel state to its NVMe driver hook where the user-registered eBPF functions are called. We show how two key-value stores, BPF-KV, a simple B+-tree key-value store, and WiredTiger, a popular log-structured merge tree storage engine, can leverage XRP to significantly improve throughput and latency. + +OSDI '22 Best Paper: + +## Specification and verification in the field: Applying formal methods to BPF just-in-time compilers in the Linux kernel + +This paper describes our experience applying formal methods to a critical component in the Linux kernel, the just-in-time compilers ("JITs") for the Berkeley Packet Filter (BPF) virtual machine. We verify these JITs using Jitterbug, the first framework to provide a precise specification of JIT correctness that is capable of ruling out real-world bugs, and an automated proof strategy that scales to practical implementations. 
Using Jitterbug, we have designed, implemented, and verified a new BPF JIT for 32-bit RISC-V, found and fixed 16 previously unknown bugs in five other deployed JITs, and developed new JIT optimizations; all of these changes have been upstreamed to the Linux kernel. The results show that it is possible to build a verified component within a large, unverified system with careful design of specification and proof strategy. + +OSDI 20: + +## λ-IO: A Unified IO Stack for Computational Storage + +The emerging computational storage device offers an opportunity for in-storage computing. It alleviates the overhead of data movement between the host and the device, and thus accelerates data-intensive applications. In this paper, we present λ-IO, a unified IO stack managing both computation and storage resources across the host and the device. We propose a set of designs – interface, runtime, and scheduling – to tackle three critical issues. We implement λ-IO in full-stack software and hardware environment, and evaluate it with synthetic and real applications against Linux IO, showing up to 5.12× performance improvement. + +FAST 23: + +## Extension Framework for File Systems in User space + +User file systems offer numerous advantages over their in-kernel implementations, such as ease of development and better system reliability. However, they incur heavy performance penalty. We observe that existing user file system frameworks are highly general; they consist of a minimal interposition layer in the kernel that simply forwards all low-level requests to user space. While this design offers flexibility, it also severely degrades performance due to frequent kernel-user context switching. + +This work introduces ExtFUSE, a framework for developing extensible user file systems that also allows applications to register "thin" specialized request handlers in the kernel to meet their specific operative needs, while retaining the complex functionality in user space. 
Our evaluation with two FUSE file systems shows that ExtFUSE can improve the performance of user file systems with less than a few hundred lines on average. ExtFUSE is available on GitHub. + +ATC 19: + +## Electrode: Accelerating Distributed Protocols with eBPF + +Implementing distributed protocols under a standard Linux kernel networking stack enjoys the benefits of load-aware CPU scaling, high compatibility, and robust security and isolation. However, it suffers from low performance because of excessive user-kernel crossings and kernel networking stack traversing. We present Electrode with a set of eBPF-based performance optimizations designed for distributed protocols. These optimizations get executed in the kernel before the networking stack but achieve similar functionalities as were implemented in user space (e.g., message broadcasting, collecting quorum of acknowledgments), thus avoiding the overheads incurred by user-kernel crossings and kernel networking stack traversing. We show that when applied to a classic Multi-Paxos state machine replication protocol, Electrode improves its throughput by up to 128.4% and latency by up to 41.7%. + +NSDI 23: + +## BMC: Accelerating Memcached using Safe In-kernel Caching and Pre-stack Processing + +In-memory key-value stores are critical components that help scale large internet services by providing low-latency access to popular data. Memcached, one of the most popular key-value stores, suffers from performance limitations inherent to the Linux networking stack and fails to achieve high performance when using high-speed network interfaces. While the Linux network stack can be bypassed using DPDK based solutions, such approaches require a complete redesign of the software stack and induce high CPU utilization even when client load is low. + +To overcome these limitations, we present BMC, an in-kernel cache for Memcached that serves requests before the execution of the standard network stack. 
Requests to the BMC cache are treated as part of the NIC interrupts, which allows performance to scale with the number of cores serving the NIC queues. To ensure safety, BMC is implemented using eBPF. Despite the safety constraints of eBPF, we show that it is possible to implement a complex cache service. Because BMC runs on commodity hardware and requires modification of neither the Linux kernel nor the Memcached application, it can be widely deployed on existing systems. BMC optimizes the processing time of Facebook-like small-size requests. On this target workload, our evaluations show that BMC improves throughput by up to 18x compared to the vanilla Memcached application and up to 6x compared to an optimized version of Memcached that uses the SO_REUSEPORT socket flag. In addition, our results also show that BMC has negligible overhead and does not deteriorate throughput when treating non-target workloads. + +NSDI 21: + +## hXDP: Efficient Software Packet Processing on FPGA NICs + +FPGA accelerators on the NIC enable the offloading of expensive packet processing tasks from the CPU. However, FPGAs have limited resources that may need to be shared among diverse applications, and programming them is difficult. + +We present a solution to run Linux's eXpress Data Path programs written in eBPF on FPGAs, using only a fraction of the available hardware resources while matching the performance of high-end CPUs. The iterative execution model of eBPF is not a good fit for FPGA accelerators. Nonetheless, we show that many of the instructions of an eBPF program can be compressed, parallelized or completely removed, when targeting a purpose-built FPGA executor, thereby significantly improving performance. 
We leverage that to design hXDP, which includes (i) an optimizing-compiler that parallelizes and translates eBPF bytecode to an extended eBPF Instruction-set Architecture defined by us; a (ii) soft-processor to execute such instructions on FPGA; and (iii) an FPGA-based infrastructure to provide XDP's maps and helper functions as defined within the Linux kernel. + +We implement hXDP on an FPGA NIC and evaluate it running real-world unmodified eBPF programs. Our implementation is clocked at 156.25MHz, uses about 15% of the FPGA resources, and can run dynamically loaded programs. Despite these modest requirements, it achieves the packet processing throughput of a high-end CPU core and provides a 10x lower packet forwarding latency. + +OSDI 20: + +## Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code + +Microservices are becoming more complicated, posing new challenges for traditional performance monitoring solutions. On the one hand, the rapid evolution of microservices places a significant burden on the utilization and maintenance of existing distributed tracing frameworks. On the other hand, complex infrastructure increases the probability of network performance problems and creates more blind spots on the network side. In this paper, we present DeepFlow, a network-centric distributed tracing framework for troubleshooting microservices. DeepFlow provides out-of-the-box tracing via a network-centric tracing plane and implicit context propagation. In addition, it eliminates blind spots in network infrastructure, captures network metrics in a low-cost way, and enhances correlation between different components and layers. We demonstrate analytically and empirically that DeepFlow is capable of locating microservice performance anomalies with negligible overhead. DeepFlow has already identified over 71 critical performance anomalies for more than 26 companies and has been utilized by hundreds of individual developers. 
Our production evaluations demonstrate that DeepFlow is able to save users hours of instrumentation efforts and reduce troubleshooting time from several hours to just a few minutes. + +SIGCOMM 23: + +## Fast In-kernel Traffic Sketching in eBPF + +The extended Berkeley Packet Filter (eBPF) is an infrastructure that allows to dynamically load and run micro-programs directly in the Linux kernel without recompiling it. + +In this work, we study how to develop high-performance network measurements in eBPF. We take sketches as case-study, given their ability to support a wide-range of tasks while providing low-memory footprint and accuracy guarantees. We implemented NitroSketch, the state-of-the-art sketch for user-space networking and show that best practices in user-space networking cannot be directly applied to eBPF, because of its different performance characteristics. By applying our lesson learned we improve its performance by 40% compared to a naive implementation. + +SIGCOMM 23: + +## SPRIGHT: extracting the server from serverless computing! high-performance eBPF-based event-driven, shared-memory processing + +Serverless computing promises an efficient, low-cost compute capability in cloud environments. However, existing solutions, epitomized by open-source platforms such as Knative, include heavyweight components that undermine this goal of serverless computing. Additionally, such serverless platforms lack dataplane optimizations to achieve efficient, high-performance function chains that facilitate the popular microservices development paradigm. Their use of unnecessarily complex and duplicate capabilities for building function chains severely degrades performance. 'Cold-start' latency is another deterrent. + +We describe SPRIGHT, a lightweight, high-performance, responsive serverless framework. 
SPRIGHT exploits shared memory processing and dramatically improves the scalability of the dataplane by avoiding unnecessary protocol processing and serialization-deserialization overheads. SPRIGHT extensively leverages event-driven processing with the extended Berkeley Packet Filter (eBPF). We creatively use eBPF's socket message mechanism to support shared memory processing, with overheads being strictly load-proportional. Compared to constantly-running, polling-based DPDK, SPRIGHT achieves the same dataplane performance with 10× less CPU usage under realistic workloads. Additionally, eBPF benefits SPRIGHT, by replacing heavyweight serverless components, allowing us to keep functions 'warm' with negligible penalty. + +Our preliminary experimental results show that SPRIGHT achieves an order of magnitude improvement in throughput and latency compared to Knative, while substantially reducing CPU usage, and obviates the need for 'cold-start'. + + + +## Programmable System Call Security with eBPF + +System call filtering is a widely used security mechanism for protecting a shared OS kernel against untrusted user applications. However, existing system call filtering techniques either are too expensive due to the context switch overhead imposed by userspace agents, or lack sufficient programmability to express advanced policies. Seccomp, Linux's system call filtering module, is widely used by modern container technologies, mobile apps, and system management services. Despite the adoption of the classic BPF language (cBPF), security policies in Seccomp are mostly limited to static allow lists, primarily because cBPF does not support stateful policies. Consequently, many essential security features cannot be expressed precisely and/or require kernel modifications. +In this paper, we present a programmable system call filtering mechanism, which enables more advanced security policies to be expressed by leveraging the extended BPF language (eBPF). 
More specifically, we create a new Seccomp eBPF program type, exposing, modifying or creating new eBPF helper functions to safely manage filter state, access kernel and user state, and utilize synchronization primitives. Importantly, our system integrates with existing kernel privilege and capability mechanisms, enabling unprivileged users to install advanced filters safely. Our evaluation shows that our eBPF-based filtering can enhance existing policies (e.g., reducing the attack surface of early execution phase by up to 55.4% for temporal specialization), mitigate real-world vulnerabilities, and accelerate filters. + + + +## Cross Container Attacks: The Bewildered eBPF on Clouds + +The extended Berkeley Packet Filter (eBPF) provides powerful and flexible kernel interfaces to extend the kernel functions for user space programs via running bytecode directly in the kernel space. It has been widely used by cloud services to enhance container security, network management, and system observability. However, we discover that the offensive eBPF that have been extensively discussed in Linux hosts can bring new attack surfaces to containers. With eBPF tracing features, attackers can break the container's isolation and attack the host, e.g., steal sensitive data, DoS, and even escape the container. In this paper, we study the eBPF-based cross container attacks and reveal their security impacts in real world services. With eBPF attacks, we successfully compromise five online Jupyter/Interactive Shell services and the Cloud Shell of Google Cloud Platform. Furthermore, we find that the Kubernetes services offered by three leading cloud vendors can be exploited to launch cross-node attacks after the attackers escape the container via eBPF. Specifically, in Alibaba's Kubernetes services, attackers can compromise the whole cluster by abusing their over-privileged cloud metrics or management Pods. 
Unfortunately, the eBPF attacks on containers are seldom known and can hardly be discovered by existing intrusion detection systems. Also, the existing eBPF permission model cannot confine the eBPF and ensure secure usage in shared-kernel container environments. To this end, we propose a new eBPF permission model to counter the eBPF attacks in containers. + + + +## Comparing Security in eBPF and WebAssembly + +This paper examines the security of eBPF and WebAssembly (Wasm), two technologies that have gained widespread adoption in recent years, despite being designed for very different use cases and environments. While eBPF is a technology primarily used within operating system kernels such as Linux, Wasm is a binary instruction format designed for a stack-based virtual machine with use cases extending beyond the web. Recognizing the growth and expanding ambitions of eBPF, Wasm may provide instructive insights, given its design around securely executing arbitrary untrusted programs in complex and hostile environments such as web browsers and clouds. We analyze the security goals, community evolution, memory models, and execution models of both technologies, and conduct a comparative security assessment, exploring memory safety, control flow integrity, API access, and side-channels. Our results show that eBPF has a history of focusing on performance first and security second, while Wasm puts more emphasis on security at the cost of some runtime overheads. Considering language-based restrictions for eBPF and a security model for API access are fruitful directions for future work. + + + +More papers can be found in the first eBPF workshop: + +## A flow-based IDS using Machine Learning in eBPF + +eBPF is a new technology which allows dynamically loading pieces of code into the Linux kernel. It can greatly speed up networking since it enables the kernel to process certain packets without the involvement of a userspace program. 
So far eBPF has been used for simple packet filtering applications such as firewalls or Denial of Service protection. We show that it is possible to develop a flow based network intrusion detection system based on machine learning entirely in eBPF. Our solution uses a decision tree and decides for each packet whether it is malicious or not, considering the entire previous context of the network flow. We achieve a performance increase of over 20% compared to the same solution implemented as a userspace program. + + + +## Femto-containers: lightweight virtualization and fault isolation for small software functions on low-power IoT microcontrollers + +Low-power operating system runtimes used on IoT microcontrollers typically provide rudimentary APIs, basic connectivity and, sometimes, a (secure) firmware update mechanism. In contrast, on less constrained hardware, networked software has entered the age of serverless, microservices and agility. With a view to bridge this gap, in the paper we design Femto-Containers, a new middleware runtime which can be embedded on heterogeneous low-power IoT devices. Femto-Containers enable the secure deployment, execution and isolation of small virtual software functions on low-power IoT devices, over the network. We implement Femto-Containers, and provide integration in RIOT, a popular open source IoT operating system. We then evaluate the performance of our implementation, which was formally verified for fault-isolation, guaranteeing that RIOT is shielded from logic loaded and executed in a Femto-Container. Our experiments on various popular micro-controller architectures (Arm Cortex-M, ESP32 and RISC-V) show that Femto-Containers offer an attractive trade-off in terms of memory footprint overhead, energy consumption, and security. 
+ + diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 2dfbbeb..6f548d9 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -4,8 +4,7 @@ 本教程不会进行复杂的概念讲解和场景介绍,主要希望提供一些 eBPF 小工具的案例(**非常短小,从二十行代码开始入门!**),来帮助 eBPF 应用的开发者快速上手 eBPF 的开发方法和技巧。教程内容可以在目录中找到,每个目录都是一个独立的 eBPF 工具案例。 -教程关注于可观测性、网络、安全等等方面的 eBPF 示例。完整的代码和教程可以在 [https://github.com/eunomia-bpf/bpf-developer-tutorial](https://github.com/eunomia-bpf/bpf-developer-tutorial) GitHub 开源仓库中找到。 - +教程关注于可观测性、网络、安全等等方面的 eBPF 示例。完整的代码和教程可以在 [https://github.com/eunomia-bpf/bpf-developer-tutorial](https://github.com/eunomia-bpf/bpf-developer-tutorial) GitHub 开源仓库中找到。**如果您认为本教程对您有所帮助,也请给我们一个 star 鼓励一下!** # 目录 @@ -26,7 +25,7 @@ - [使用 USDT 捕获用户态 Java GC 事件耗时](15-javagc/README.md) - [编写 eBPF 程序 Memleak 监控内存泄漏](16-memleak/README.md) - [编写 eBPF 程序 Biopattern 统计随机/顺序磁盘 I/O](17-biopattern/README.md) -- [更多的参考资料](18-further-reading/README.md) +- [更多的参考资料:论文列表、项目、博客等等](18-further-reading/README.md) - [使用 LSM 进行安全检测防御](19-lsm-connect/README.md) - [使用 eBPF 进行 tc 流量控制](20-tc/README.md) diff --git a/src/SUMMARY_en.md b/src/SUMMARY_en.md index 745ec7a..7262c8e 100644 --- a/src/SUMMARY_en.md +++ b/src/SUMMARY_en.md @@ -6,7 +6,7 @@ This is a development tutorial for eBPF based on CO-RE (Compile Once, Run Everyw This tutorial does not cover complex concepts and scenario introductions. Its main purpose is to provide examples of eBPF tools (**very short, starting with twenty lines of code!**) to help eBPF application developers quickly grasp eBPF development methods and techniques. The tutorial content can be found in the directory, with each directory being an independent eBPF tool example. -For the complete source code of the tutorial, please refer to the repo [https://github.com/eunomia-bpf/bpf-developer-tutorial](https://github.com/eunomia-bpf/bpf-developer-tutorial) on GitHub. 
+For the complete source code of the tutorial, please refer to the repo [https://github.com/eunomia-bpf/bpf-developer-tutorial](https://github.com/eunomia-bpf/bpf-developer-tutorial) on GitHub. **If you find this tutorial helpful, please give us a star!** # Table of Contents @@ -27,7 +27,7 @@ For the complete source code of the tutorial, please refer to the repo [https:// - [Capturing user space Java GC event duration using USDT](15-javagc/README.md) - [Writing eBPF program Memleak to monitor memory leaks](16-memleak/README.md) - [Writing eBPF program Biopattern to measure random/sequential disk I/O](17-biopattern/README.md) -- [More reference materials](18-further-reading/README.md) +- [More reference materials: papers list, projects, blogs, etc.](18-further-reading/README.md) - [Performing security detection and defense using LSM](19-lsm-connect/README.md) - [Performing traffic control using eBPF and tc](20-tc/README.md)