docs: Update README files to include information on extending GPU driver behavior with eBPF and the gpu_ext project

2026-02-03 18:24:27 +08:00 · 2026-01-11 23:27:20 -08:00
parent 48afb3265d
commit 34f690385e
6 changed files with 13 additions and 3 deletions
--- a/src/xpu/gpu-kernel-driver/README.md
+++ b/src/xpu/gpu-kernel-driver/README.md
@@ -394,7 +394,7 @@ sudo cat /sys/kernel/debug/tracing/available_events | grep -E '(gpu_scheduler|i9

 This tutorial focuses on kernel-side GPU driver tracing, which provides visibility into job scheduling, memory management, and driver-firmware communication. However, kernel tracepoints have fundamental limitations. When `drm_run_job` fires, we know a job started executing on GPU hardware, but we cannot observe what happens inside the GPU itself. The execution of thousands of parallel threads, their memory access patterns, branch divergence, warp occupancy, and instruction-level behavior remain invisible. These details are critical for understanding performance bottlenecks - whether memory coalescing is failing, whether thread divergence is killing efficiency, or whether shared memory bank conflicts are stalling execution.

-To achieve fine-grained GPU observability, eBPF programs must run directly on the GPU. This is the direction explored by the eGPU paper and [bpftime GPU examples](https://github.com/eunomia-bpf/bpftime/tree/master/example/gpu). bpftime converts eBPF bytecode to PTX instructions that GPUs can execute, then dynamically patches CUDA binaries at runtime to inject these eBPF programs at kernel entry/exit points. This enables observing GPU-specific information like block indices, thread indices, global timers, and warp-level metrics. Developers can instrument critical paths inside GPU kernels to measure execution behavior and diagnose complex performance issues that kernel-side tracing cannot reach. This GPU-internal observability complements kernel tracepoints - together they provide end-to-end visibility from API calls through kernel drivers to GPU execution.
+To achieve fine-grained GPU observability, eBPF programs must run directly on the GPU. This is the direction explored by the eGPU paper and [bpftime GPU examples](https://github.com/eunomia-bpf/bpftime/tree/master/example/gpu). bpftime converts eBPF bytecode to PTX instructions that GPUs can execute, then dynamically patches CUDA binaries at runtime to inject these eBPF programs at kernel entry/exit points. This enables observing GPU-specific information like block indices, thread indices, global timers, and warp-level metrics. Developers can instrument critical paths inside GPU kernels to measure execution behavior and diagnose complex performance issues that kernel-side tracing cannot reach. This GPU-internal observability complements kernel tracepoints - together they provide end-to-end visibility from API calls through kernel drivers to GPU execution. Beyond tracing, eBPF can also extend GPU driver behavior—see our [gpu_ext project](https://github.com/eunomia-bpf/gpu_ext) for GPU scheduling and memory offloading via BPF struct_ops ([LPC 2024 talk](https://lpc.events/event/19/contributions/2168/)).

 ## Summary