Problem:
The directory derived from `Path(__file__)` can be "."; in that case `self.oncpu_tool` is "oncputime", which the shell cannot find.
Solution:
Convert `Path(__file__)` to an absolute path.
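A minimal sketch of the fix, assuming the tool binary sits next to the script. The helper name `tool_path` is hypothetical and is not taken from the actual change:

```python
from pathlib import Path


def tool_path(script_file: str, tool_name: str) -> str:
    """Return an absolute path to a tool located next to the given script."""
    # resolve() makes the path absolute before taking the parent directory,
    # so a relative script path can no longer yield a bare tool name.
    script_dir = Path(script_file).resolve().parent
    return str(script_dir / tool_name)
```

With a relative script path such as `wallclock_profiler.py`, `Path("wallclock_profiler.py").parent / "oncputime"` collapses to the bare name `oncputime`, which the shell then looks up in `PATH`. Resolving first avoids that.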
- Added an example flamegraph for Qwen3 LLM inference, highlighting key insights and performance bottlenecks.
- Updated README.md to include detailed explanations of CPU and GPU profiling results, emphasizing the correlation between CPU stacks and GPU kernels.
- Modified gpuperf.py to ensure absolute paths are used for output files, improving reliability across different working directories.
- Enhanced merge_gpu_cpu_trace.py to strip ANSI escape sequences from CPU stack traces, ensuring cleaner output for analysis.
- Introduced a new SVG file for the Qwen3 flamegraph, providing a visual representation of profiling data with interactive features.
- Updated README.zh.md for GPU kernel driver to improve clarity and formatting.
- Added nvidia_driver.bt script for monitoring NVIDIA proprietary GPU driver activity using kernel probes.
- Revised README.md for NPU kernel driver to enhance explanations and correct minor grammatical issues.
- Introduced a new tutorial on using HID-BPF to create virtual mouse devices and modify their input dynamically.
- Explained common issues with HID devices and why traditional fixes are cumbersome.
- Provided detailed implementation steps for creating a virtual HID device using uhid and modifying input with eBPF.
- Included example code for both user space and BPF programs, demonstrating how to intercept and modify HID reports.
- Highlighted the advantages of using virtual devices for learning and experimentation.
- Added references for further reading on HID-BPF and related projects.
- Introduced a comprehensive tutorial in README.md explaining how to fix broken HID devices using eBPF without kernel patches.
- Implemented a userspace program (hid-input-modifier.c) that creates a virtual HID mouse using the uhid interface and sends synthetic mouse events.
- Developed a BPF program (hid-input-modifier.bpf.c) that intercepts HID events and modifies mouse movement data, effectively doubling the X and Y movement.
- Created necessary header files (hid_bpf.h, hid_bpf_defs.h, hid_bpf_helpers.h) to define structures and helper functions for the BPF program.
- Added functionality to find and manage the virtual HID device, ensuring seamless integration with the BPF program.
- Created SUMMARY.md.template and SUMMARY.zh.md.template for eBPF tutorial.
- Updated generate_toc.py to generate table of contents for English and Chinese versions.
- Added configuration files for new eBPF examples, categorizing them by level and type.
- Updated SUMMARY.md and SUMMARY.zh.md to reflect new lessons and reorganized sections.
- Introduced new features related to GPU and tracing in the tutorial.
- Removed unnecessary print statements for arena sum and number of elements in the test_arena_list_add_del function.
- Simplified output to focus on essential test results, improving clarity and conciseness of the test logs.
- Introduced a comprehensive README.md detailing the use of eBPF for monitoring GPU activity through kernel tracepoints.
- Added bpftrace scripts for monitoring AMD GPU operations, including buffer object creation, command submission, and interrupts.
- Created a bpftrace script for tracking DRM GPU scheduler activity across all modern GPU drivers.
- Developed a bpftrace script to monitor display vertical blanking events for frame timing analysis.
- Implemented a bpftrace script for Intel i915 GPU activity, focusing on GEM object management, memory operations, and page faults.
- Introduced BPF workqueues to enable asynchronous work from BPF programs, allowing deferred processing, non-blocking operations, and sleepable contexts for long-running tasks.
- Added README.md to document the BPF workqueues, including use cases, technical architecture, and code examples.
- Created bpf_experimental.h header file to define necessary BPF workqueue functions and structures.
- Implemented a simple BPF workqueue example (wq_simple) demonstrating the initialization, scheduling, and execution of work in a separate context.
- Developed a userspace test (wq_simple.c) to verify the functionality of the BPF workqueue by triggering a syscall and checking the execution results.
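Of the changes listed above, the ANSI stripping in merge_gpu_cpu_trace.py is easy to illustrate. The following is a minimal sketch, not the script's actual code: the names `ANSI_ESCAPE` and `strip_ansi` are hypothetical, and the pattern assumes only CSI sequences (such as color codes) appear in the CPU stack traces.

```python
import re

# Matches CSI escape sequences: ESC '[', parameter bytes, intermediate bytes,
# and one final byte (for example "\x1b[31m" or "\x1b[0m").
ANSI_ESCAPE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences from a stack trace line."""
    return ANSI_ESCAPE.sub("", text)
```

Applying this to each stack line before merging keeps frame names comparable between colored and plain output.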
This commit introduces a new BPF program that manages an arena list, allowing for the addition and deletion of elements. The following changes were made:
- Added `arena_list.bpf.c` to implement BPF functions for adding and deleting elements in an arena list.
- Created `arena_list.c` for user-space testing of the BPF program, including functions to sum elements and validate the arena list operations.
- Introduced `bpf_arena_alloc.h` and `bpf_arena_common.h` for memory allocation and common definitions related to arena management.
- Defined `bpf_arena_list.h` to establish the structure of the arena list nodes and heads.
- Added `bpf_experimental.h` to include experimental BPF features and helper functions for object management.
These changes enhance the BPF capabilities for managing memory in a structured way, facilitating efficient allocation and deallocation of resources.
* Add combined on-CPU and off-CPU profiler script
- Implemented a new profiling tool that captures both on-CPU and off-CPU activity for a specified process.
- The script runs 'oncputime' and 'offcputime' tools simultaneously and combines their results into a unified flamegraph.
- Added functionality to discover threads and profile them individually if the application is multi-threaded.
- Included error handling for tool execution and output processing.
- Created methods for generating flamegraph data and SVG visualizations.
- Added command-line argument parsing for user-defined profiling parameters.
- Implemented detailed analysis reports for both individual threads and overall profiling results.
* feat: integrate blazesym v0.2.0 for improved symbol resolution and add test program for memory leak detection
* docs: update README to enhance wall clock profiling tutorial with detailed explanations and examples
* feat: add wallclock-profiler tests to CI workflow for tool validation
* fix: rename combined_profiler.py to wallclock_profiler.py in test scripts and usage examples
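The combining of on-CPU and off-CPU results into one flamegraph input, described above, might look roughly like the sketch below. It assumes both tools emit folded stacks in the form `frame1;frame2 count`. The function name `merge_folded` and the `_[c]` / `_[o]` leaf tags are hypothetical, and any normalization between on-CPU sample counts and off-CPU durations is left out.

```python
from collections import Counter


def merge_folded(oncpu_lines, offcpu_lines):
    """Merge two folded-stack listings, tagging each leaf frame by origin."""
    merged = Counter()
    for tag, lines in (("_[c]", oncpu_lines), ("_[o]", offcpu_lines)):
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # The count is the last space-separated field of a folded line.
            stack, _, count = line.rpartition(" ")
            merged[stack + tag] += int(count)
    # Return folded lines again, sorted by stack for stable output.
    return [f"{stack} {count}" for stack, count in sorted(merged.items())]
```

Tagging the leaf frame keeps identical stacks from the two sources distinguishable in the merged flamegraph.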