Mirror of https://github.com/eunomia-bpf/bpf-developer-tutorial.git
Add Python stack profiler tutorial with eBPF
Implement a complete Python stack profiler that demonstrates how to:
- Walk CPython interpreter frame structures from eBPF
- Extract Python function names, filenames, and line numbers
- Combine native C stacks with Python interpreter stacks
- Profile Python applications with minimal overhead

Key features:
- Python internal struct definitions (PyFrameObject, PyCodeObject, PyThreadState)
- String reading for both PyUnicodeObject and PyBytesObject
- Frame walking with configurable stack depth
- Both human-readable and flamegraph-compatible output formats
- Command-line options for PID filtering and sampling frequency

Files added:
- python-stack.bpf.c: eBPF program for capturing Python stacks
- python-stack.c: Userspace program for printing results
- python-stack.h: Python internal structure definitions
- test_program.py: Python test workload
- run_test.sh: Automated test script
- README.md: Comprehensive tutorial documentation
- Makefile: Build configuration
- .gitignore: Ignore build artifacts

This tutorial serves as an educational foundation for understanding:
1. How to read userspace memory from eBPF
2. CPython internals and frame management
3. Sampling-based profiling techniques
4. Combining kernel and userspace observability

Note: The current implementation demonstrates the concepts but requires additional work for production use (thread state discovery, multi-version support, symbol resolution).

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
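The frame walk summarized above reduces to a small, bounded loop in the BPF program: read a frame pointer from user memory, record its code object, follow the back pointer to the caller, and stop at a fixed depth. The sketch below only illustrates that pattern and is not the tutorial's python-stack.bpf.c; the `OFF_FRAME_*` offsets are placeholders, since the real, version-specific values come from python-stack.h.

```c
// Illustrative sketch only -- not the tutorial's actual BPF program.
// OFF_FRAME_* are placeholder offsets; real values depend on the CPython
// version and are provided by python-stack.h in this tutorial.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define MAX_PY_DEPTH   16
#define OFF_FRAME_BACK 0x18   /* hypothetical offset of the frame's back pointer */
#define OFF_FRAME_CODE 0x20   /* hypothetical offset of the frame's code object  */

static __always_inline int walk_python_frames(__u64 frame, __u64 *code_ptrs)
{
	int depth = 0;

	for (int i = 0; i < MAX_PY_DEPTH && frame; i++) {
		__u64 code = 0, back = 0;

		/* Interpreter frames live in user memory, so read them with
		 * the *_user probe helpers. */
		if (bpf_probe_read_user(&code, sizeof(code),
					(void *)(frame + OFF_FRAME_CODE)))
			break;
		code_ptrs[depth++] = code;

		if (bpf_probe_read_user(&back, sizeof(back),
					(void *)(frame + OFF_FRAME_BACK)))
			break;
		frame = back;	/* follow the chain toward the outermost frame */
	}
	return depth;
}
```

The fixed `MAX_PY_DEPTH` bound keeps the loop verifier-friendly; the actual program additionally dereferences each code object to copy out the function name, file name, and line number.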
@@ -28,11 +28,51 @@ This tutorial shows how to use eBPF to capture both native C stacks AND Python i
- Root access (for loading eBPF programs)
- Understanding of stack traces and profiling concepts

## Quick Start

```bash
# Build the profiler
make

# Run the test
sudo ./run_test.sh

# Or profile a specific Python process
sudo ./python-stack -p <PID> -d 10
```

## Building and Running

### Build

```bash
make
sudo ./python-stack
```

### Profile All Python Processes

```bash
sudo ./python-stack -d 10
```

### Profile Specific Process

```bash
# Find your Python process
ps aux | grep python

# Profile it
sudo ./python-stack -p 12345 -d 30
```

### Generate Flamegraph

```bash
# Collect folded stacks
sudo ./python-stack -p 12345 -f -d 10 > stacks.txt

# Generate flamegraph (requires flamegraph.pl from Brendan Gregg)
flamegraph.pl stacks.txt > flamegraph.svg
```

## How It Works
@@ -79,12 +119,44 @@ Each line shows the stack trace and sample count.
- **Data processing**: Optimize pandas, polars operations
- **General Python**: Any Python application performance analysis

## Current Limitations

This is an educational implementation demonstrating the concepts. For production use, you would need:

1. **Python Thread State Discovery**: The current implementation requires manually populating the `python_thread_states` map. A complete implementation would (see the sketch after this list):
   - Parse `/proc/<pid>/maps` to find `libpython.so`
   - Read Python's global interpreter state (`_PyRuntime`)
   - Walk the thread state list to find each thread's `PyThreadState`
   - Use uprobes on Python's thread creation functions

2. **Python Version Compatibility**: Python internal structures vary between versions (3.8, 3.9, 3.10, 3.11, 3.12). A robust implementation would:
   - Detect Python version from the binary
   - Use different struct layouts per version
   - Support both debug and release builds

3. **Symbol Resolution**: Native stack addresses need symbol resolution via:
   - `/proc/<pid>/maps` for address ranges
   - DWARF/ELF parsing for function names
   - Integration with blazesym (like in oncputime)
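
As a starting point for the first limitation, discovery can begin with a plain `/proc/<pid>/maps` scan from userspace. The sketch below is an illustration under those assumptions, not code from this tutorial: it only finds libpython's load address, and a real implementation would go on to resolve `_PyRuntime` from the ELF symbol table and walk the thread-state list.

```c
/* Sketch: find the load address of libpython for a given PID. */
#include <stdio.h>
#include <string.h>

static unsigned long find_libpython_base(int pid)
{
	char path[64], line[512];
	unsigned long base = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/maps", pid);
	f = fopen(path, "r");
	if (!f)
		return 0;

	while (fgets(line, sizeof(line), f)) {
		/* First mapping whose backing file looks like libpython3.x */
		if (strstr(line, "libpython3")) {
			sscanf(line, "%lx-", &base);
			break;
		}
	}
	fclose(f);
	return base;
}
```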

## Production Alternatives

For production Python profiling, consider:
- **py-spy**: Sampling profiler that doesn't require instrumentation
- **Austin**: Frame stack sampler for CPython
- **Pyroscope**: Continuous profiling platform with Python support
- **PyPerf** (BCC): experimental eBPF-based Python stack profiler from the BCC project

## Next Steps

- Extend to capture GIL contention
- Add Python object allocation tracking
- Integrate with other eBPF metrics (CPU, memory)
- Build flamegraph visualization
Extend this tutorial to:
- Implement Python thread state discovery via `/proc` parsing
- Add multi-version Python struct support (3.8-3.12)
- Integrate blazesym for native symbol resolution
- Capture GIL contention events (see the uprobe sketch after this list)
- Track Python object allocation
- Measure function-level CPU time
- Support PyPy and other Python implementations
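
For the GIL item above, one plausible route is a uprobe on CPython's internal `take_gil()` function. This is only a sketch: `take_gil` is not a public API and may be inlined or hidden depending on how Python was built, and the libpython path would be resolved from `/proc/<pid>/maps` rather than hard-coded.

```c
/* Sketch: attach a uprobe to take_gil() to observe GIL acquisitions. */
#include <stdio.h>
#include <bpf/libbpf.h>

static struct bpf_link *attach_gil_probe(struct bpf_program *prog, int pid,
					 const char *libpython_path)
{
	/* Let libbpf resolve the symbol by name inside the ELF file. */
	LIBBPF_OPTS(bpf_uprobe_opts, uopts, .func_name = "take_gil");
	struct bpf_link *link;

	link = bpf_program__attach_uprobe_opts(prog, pid, libpython_path,
					       0 /* offset */, &uopts);
	if (!link)
		fprintf(stderr, "failed to attach take_gil uprobe\n");
	return link;
}
```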

## References

@@ -1,10 +1,7 @@
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/*
 * profile Profile CPU usage by sampling stack traces at a timed interval.
 * Copyright (c) 2022 LG Electronics
 *
 * Based on profile from BCC by Brendan Gregg and others.
 * 28-Dec-2021 Eunseon Lee Created this.
 * Python Stack Profiler - Profile Python applications with eBPF
 * Based on oncputime by Eunseon Lee
 */
#include <argp.h>
#include <signal.h>
@@ -19,44 +16,116 @@
#include <bpf/bpf.h>
#include <sys/stat.h>
#include <string.h>
#include "oncputime.h"
#include "oncputime.skel.h"
#include "blazesym.h"
#include "arg_parse.h"
#include "python-stack.h"
#include "python-stack.skel.h"

#define SYM_INFO_LEN 2048

/*
 * -EFAULT in get_stackid normally means the stack-trace is not available,
 * such as getting kernel stack trace in user mode
 */
#define STACK_ID_EFAULT(stack_id) (stack_id == -EFAULT)

#define STACK_ID_ERR(stack_id) ((stack_id < 0) && !STACK_ID_EFAULT(stack_id))

/* hash collision (-EEXIST) suggests that stack map size may be too small */
#define CHECK_STACK_COLLISION(ustack_id, kstack_id) \
	(kstack_id == -EEXIST || ustack_id == -EEXIST)

#define MISSING_STACKS(ustack_id, kstack_id) \
	(!env.user_stacks_only && STACK_ID_ERR(kstack_id)) + (!env.kernel_stacks_only && STACK_ID_ERR(ustack_id))
	(STACK_ID_ERR(kstack_id) + STACK_ID_ERR(ustack_id))

/* This structure combines key_t and count which should be sorted together */
struct key_ext_t {
	struct key_t k;
	__u64 v;
};

static blaze_symbolizer *symbolizer;
static struct env {
	int duration;
	int sample_freq;
	int cpu;
	bool verbose;
	bool folded;
	bool python_only;
	int pid;
	int perf_max_stack_depth;
	int stack_storage_size;
} env = {
	.duration = 10,
	.sample_freq = 49,
	.cpu = -1,
	.verbose = false,
	.folded = false,
	.python_only = true,
	.pid = -1,
	.perf_max_stack_depth = 127,
	.stack_storage_size = 10240,
};

static int nr_cpus;
static volatile sig_atomic_t exiting = 0;

const char argp_program_doc[] =
"Profile Python applications using eBPF.\n"
"\n"
"USAGE: python-stack [OPTIONS]\n"
"\n"
"EXAMPLES:\n"
" python-stack # profile all Python processes for 10 seconds\n"
" python-stack -p 1234 # profile Python process with PID 1234\n"
" python-stack -F 99 -d 30 # profile at 99 Hz for 30 seconds\n";

static const struct argp_option opts[] = {
	{ "pid", 'p', "PID", 0, "Profile Python process with this PID" },
	{ "frequency", 'F', "FREQ", 0, "Sample frequency (default: 49 Hz)" },
	{ "duration", 'd', "DURATION", 0, "Duration in seconds (default: 10)" },
	{ "cpu", 'C', "CPU", 0, "CPU to profile on" },
	{ "folded", 'f', NULL, 0, "Output folded format for flame graphs" },
	{ "verbose", 'v', NULL, 0, "Verbose debug output" },
	{ NULL, 'h', NULL, OPTION_HIDDEN, "Show this help" },
	{},
};

static error_t parse_arg(int key, char *arg, struct argp_state *state)
{
	switch (key) {
	case 'p':
		env.pid = atoi(arg);
		break;
	case 'F':
		env.sample_freq = atoi(arg);
		break;
	case 'd':
		env.duration = atoi(arg);
		break;
	case 'C':
		env.cpu = atoi(arg);
		break;
	case 'f':
		env.folded = true;
		break;
	case 'v':
		env.verbose = true;
		break;
	case 'h':
		argp_state_help(state, stderr, ARGP_HELP_STD_HELP);
		break;
	default:
		return ARGP_ERR_UNKNOWN;
	}
	return 0;
}

static int libbpf_print_fn(enum libbpf_print_level level, const char *format,
			   va_list args)
{
	if (level == LIBBPF_DEBUG && !env.verbose)
		return 0;
	return vfprintf(stderr, format, args);
}

static void sig_handler(int sig)
{
	exiting = 1;
}

static int open_and_attach_perf_event(struct bpf_program *prog,
				      struct bpf_link *links[])
{
	struct perf_event_attr attr = {
		.type = PERF_TYPE_SOFTWARE,
		.freq = env.freq,
		.freq = 1,
		.sample_freq = env.sample_freq,
		.config = PERF_COUNT_SW_CPU_CLOCK,
	};
@@ -68,10 +137,8 @@ static int open_and_attach_perf_event(struct bpf_program *prog,

	fd = syscall(__NR_perf_event_open, &attr, -1, i, -1, 0);
	if (fd < 0) {
		/* Ignore CPU that is offline */
		if (errno == ENODEV)
			continue;

		fprintf(stderr, "failed to init perf sampling: %s\n",
			strerror(errno));
		return -1;
@@ -79,9 +146,7 @@ static int open_and_attach_perf_event(struct bpf_program *prog,

	links[i] = bpf_program__attach_perf_event(prog, fd);
	if (!links[i]) {
		fprintf(stderr, "failed to attach perf event on cpu: "
			"%d\n", i);
		links[i] = NULL;
		fprintf(stderr, "failed to attach perf event on cpu %d\n", i);
		close(fd);
		return -1;
	}
@@ -90,139 +155,91 @@ static int open_and_attach_perf_event(struct bpf_program *prog,
	return 0;
}

static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args)
{
	if (level == LIBBPF_DEBUG && !env.verbose)
		return 0;

	return vfprintf(stderr, format, args);
}

static void sig_handler(int sig)
{
}

static int cmp_counts(const void *a, const void *b)
{
	const __u64 x = ((struct key_ext_t *) a)->v;
	const __u64 y = ((struct key_ext_t *) b)->v;

	/* descending order */
	const __u64 x = ((struct key_ext_t *)a)->v;
	const __u64 y = ((struct key_ext_t *)b)->v;
	return y - x;
}

static int read_counts_map(int fd, struct key_ext_t *items, __u32 *count)
static void print_python_stack(const struct python_stack *py_stack)
{
	struct key_t empty = {};
	struct key_t *lookup_key = &empty;
	int i = 0;
	int err;
	if (py_stack->depth == 0)
		return;

	while (bpf_map_get_next_key(fd, lookup_key, &items[i].k) == 0) {
		err = bpf_map_lookup_elem(fd, &items[i].k, &items[i].v);
		if (err < 0) {
			fprintf(stderr, "failed to lookup counts: %d\n", err);
			return -err;
	for (int i = py_stack->depth - 1; i >= 0; i--) {
		const struct python_frame *frame = &py_stack->frames[i];

		if (env.folded) {
			// Folded format for flamegraphs
			if (i < py_stack->depth - 1)
				printf(";");
			printf("%s:%s:%d", frame->file_name,
			       frame->function_name, frame->line_number);
		} else {
			// Multi-line format
			printf(" %s:%d %s\n", frame->file_name,
			       frame->line_number, frame->function_name);
		}

		if (items[i].v == 0)
			continue;

		lookup_key = &items[i].k;
		i++;
	}

	*count = i;
	return 0;
}

static int print_count(struct key_t *event, __u64 count, int stack_map)
{
	unsigned long *ip;
	int ret;
	bool has_kernel_stack, has_user_stack;

	ip = calloc(env.perf_max_stack_depth, sizeof(unsigned long));
	if (!ip) {
		fprintf(stderr, "failed to alloc ip\n");
		return -ENOMEM;
	}

	has_kernel_stack = !STACK_ID_EFAULT(event->kern_stack_id);
	has_user_stack = !STACK_ID_EFAULT(event->user_stack_id);
	bool has_python_stack = (event->py_stack.depth > 0);

	if (!env.folded) {
		/* multi-line stack output */
		/* Show kernel stack first */
		if (!env.user_stacks_only && has_kernel_stack) {
			if (bpf_map_lookup_elem(stack_map, &event->kern_stack_id, ip) != 0) {
				fprintf(stderr, " [Missed Kernel Stack]\n");
			} else {
				show_stack_trace(symbolizer, (__u64 *)ip, env.perf_max_stack_depth, 0);
		// Multi-line format
		printf("Process: %s (PID: %d)\n", event->name, event->pid);

		// Print Python stack if available
		if (has_python_stack) {
			printf(" Python Stack:\n");
			print_python_stack(&event->py_stack);
		}

		// Print native stacks
		unsigned long *ip = calloc(env.perf_max_stack_depth, sizeof(unsigned long));
		if (!ip) {
			fprintf(stderr, "failed to alloc ip\n");
			return -ENOMEM;
		}

		// Show user stack
		if (!STACK_ID_EFAULT(event->user_stack_id)) {
			if (bpf_map_lookup_elem(stack_map, &event->user_stack_id, ip) == 0) {
				printf(" Native User Stack:\n");
				for (int i = 0; i < env.perf_max_stack_depth && ip[i]; i++) {
					printf(" 0x%lx\n", ip[i]);
				}
			}
		}

		if (env.delimiter && !env.user_stacks_only && !env.kernel_stacks_only &&
		    has_user_stack && has_kernel_stack) {
			printf(" --\n");
		}

		/* Then show user stack */
		if (!env.kernel_stacks_only && has_user_stack) {
			if (bpf_map_lookup_elem(stack_map, &event->user_stack_id, ip) != 0) {
				fprintf(stderr, " [Missed User Stack]\n");
			} else {
				show_stack_trace(symbolizer, (__u64 *)ip, env.perf_max_stack_depth, event->pid);
			}
		}

		printf(" %-16s %s (%d)\n", "-", event->name, event->pid);
		printf(" %lld\n", count);
		free(ip);
		printf(" Count: %lld\n\n", count);
	} else {
		/* folded stack output */
		printf("%s", event->name);

		/* Print user stack first for folded format */
		if (has_user_stack && !env.kernel_stacks_only) {
			if (bpf_map_lookup_elem(stack_map, &event->user_stack_id, ip) != 0) {
				printf(";[Missed User Stack]");
			} else {
				printf(";");
				show_stack_trace_folded(symbolizer, (__u64 *)ip, env.perf_max_stack_depth, event->pid, ';', true);
			}
		// Folded format for flamegraphs
		printf("%s;", event->name);

		if (has_python_stack) {
			print_python_stack(&event->py_stack);
		} else {
			printf("<no python stack>");
		}

		/* Then print kernel stack if it exists */
		if (has_kernel_stack && !env.user_stacks_only) {
			/* Add delimiter between user and kernel stacks if needed */
			if (has_user_stack && env.delimiter && !env.kernel_stacks_only)
				printf("-");

			if (bpf_map_lookup_elem(stack_map, &event->kern_stack_id, ip) != 0) {
				printf(";[Missed Kernel Stack]");
			} else {
				printf(";");
				show_stack_trace_folded(symbolizer, (__u64 *)ip, env.perf_max_stack_depth, 0, ';', true);
			}
		}

		printf(" %lld\n", count);
	}

	free(ip);

	return 0;
}

static int print_counts(int counts_map, int stack_map)
{
	struct key_ext_t *counts;
	struct key_t *event;
	__u64 count;
	__u32 nr_count = MAX_ENTRIES;
	size_t nr_missing_stacks = 0;
	bool has_collision = false;
	int i, ret = 0;
	struct key_t empty = {};
	struct key_t *lookup_key = &empty;
	int i = 0, err;
	__u32 nr_count = 0;

	counts = calloc(MAX_ENTRIES, sizeof(struct key_ext_t));
	if (!counts) {
@@ -230,89 +247,53 @@ static int print_counts(int counts_map, int stack_map)
		return -ENOMEM;
	}

	ret = read_counts_map(counts_map, counts, &nr_count);
	if (ret)
		goto cleanup;
	// Read all entries from the map
	while (bpf_map_get_next_key(counts_map, lookup_key, &counts[i].k) == 0) {
		err = bpf_map_lookup_elem(counts_map, &counts[i].k, &counts[i].v);
		if (err < 0) {
			fprintf(stderr, "failed to lookup counts: %d\n", err);
			free(counts);
			return -err;
		}

		if (counts[i].v == 0) {
			lookup_key = &counts[i].k;
			continue;
		}

		lookup_key = &counts[i].k;
		i++;
	}

	nr_count = i;
	qsort(counts, nr_count, sizeof(struct key_ext_t), cmp_counts);

	// Print results
	if (!env.folded) {
		printf("\n=== Python Stack Profile ===\n");
		printf("Captured %d unique stacks\n\n", nr_count);
	}

	for (i = 0; i < nr_count; i++) {
		event = &counts[i].k;
		count = counts[i].v;

		print_count(event, count, stack_map);

		/* Add a newline between stack traces for better readability */
		if (!env.folded && i < nr_count - 1)
			printf("\n");

		/* handle stack id errors */
		nr_missing_stacks += MISSING_STACKS(event->user_stack_id, event->kern_stack_id);
		has_collision = CHECK_STACK_COLLISION(event->user_stack_id, event->kern_stack_id);
		print_count(&counts[i].k, counts[i].v, stack_map);
	}

	if (nr_missing_stacks > 0) {
		fprintf(stderr, "WARNING: %zu stack traces could not be displayed.%s\n",
			nr_missing_stacks, has_collision ?
			" Consider increasing --stack-storage-size.":"");
	}

cleanup:
	free(counts);

	return ret;
}

static void print_headers()
{
	int i;

	if (env.folded)
		return; // Don't print headers in folded format

	printf("Sampling at %d Hertz of", env.sample_freq);

	if (env.pids[0]) {
		printf(" PID [");
		for (i = 0; i < MAX_PID_NR && env.pids[i]; i++)
			printf("%d%s", env.pids[i], (i < MAX_PID_NR - 1 && env.pids[i + 1]) ? ", " : "]");
	} else if (env.tids[0]) {
		printf(" TID [");
		for (i = 0; i < MAX_TID_NR && env.tids[i]; i++)
			printf("%d%s", env.tids[i], (i < MAX_TID_NR - 1 && env.tids[i + 1]) ? ", " : "]");
	} else {
		printf(" all threads");
	}

	if (env.user_stacks_only)
		printf(" by user");
	else if (env.kernel_stacks_only)
		printf(" by kernel");
	else
		printf(" by user + kernel");

	if (env.cpu != -1)
		printf(" on CPU#%d", env.cpu);

	if (env.duration < INT_MAX)
		printf(" for %d secs.\n", env.duration);
	else
		printf("... Hit Ctrl-C to end.\n");
	return 0;
}

int main(int argc, char **argv)
{
	static const struct argp argp = {
		.options = opts,
		.parser = parse_arg,
		.doc = argp_program_doc,
	};
	struct bpf_link *links[MAX_CPU_NR] = {};
	struct oncputime_bpf *obj;
	int pids_fd, tids_fd;
	int err, i;
	__u8 val = 0;
	struct python_stack_bpf *obj;
	int err;

	err = parse_common_args(argc, argv, TOOL_PROFILE);
	if (err)
		return err;

	err = validate_common_args();
	err = argp_parse(&argp, argc, argv, 0, NULL, NULL);
	if (err)
		return err;

@@ -320,64 +301,44 @@ int main(int argc, char **argv)

	nr_cpus = libbpf_num_possible_cpus();
	if (nr_cpus < 0) {
		printf("failed to get # of possible cpus: '%s'!\n",
		       strerror(-nr_cpus));
		fprintf(stderr, "failed to get # of possible cpus: %s\n",
			strerror(-nr_cpus));
		return 1;
	}
	if (nr_cpus > MAX_CPU_NR) {
		fprintf(stderr, "the number of cpu cores is too big, please "
			"increase MAX_CPU_NR's value and recompile");
		fprintf(stderr, "the number of cpu cores is too big\n");
		return 1;
	}

	symbolizer = blaze_symbolizer_new();
	if (!symbolizer) {
		fprintf(stderr, "Failed to create a blazesym symbolizer\n");
		return 1;
	}

	obj = oncputime_bpf__open();
	obj = python_stack_bpf__open();
	if (!obj) {
		fprintf(stderr, "failed to open BPF object\n");
		blaze_symbolizer_free(symbolizer);
		return 1;
	}

	/* initialize global data (filtering options) */
	obj->rodata->user_stacks_only = env.user_stacks_only;
	obj->rodata->kernel_stacks_only = env.kernel_stacks_only;
	obj->rodata->include_idle = env.include_idle;
	if (env.pids[0])
	// Configure BPF program
	obj->rodata->python_only = env.python_only;
	if (env.pid > 0)
		obj->rodata->filter_by_pid = true;
	else if (env.tids[0])
		obj->rodata->filter_by_tid = true;

	bpf_map__set_value_size(obj->maps.stackmap,
				env.perf_max_stack_depth * sizeof(unsigned long));
	bpf_map__set_max_entries(obj->maps.stackmap, env.stack_storage_size);

	err = oncputime_bpf__load(obj);
	err = python_stack_bpf__load(obj);
	if (err) {
		fprintf(stderr, "failed to load BPF programs\n");
		fprintf(stderr, "failed to load BPF programs: %d\n", err);
		goto cleanup;
	}

	if (env.pids[0]) {
		pids_fd = bpf_map__fd(obj->maps.pids);
		for (i = 0; i < MAX_PID_NR && env.pids[i]; i++) {
			if (bpf_map_update_elem(pids_fd, &(env.pids[i]), &val, BPF_ANY) != 0) {
				fprintf(stderr, "failed to init pids map: %s\n", strerror(errno));
				goto cleanup;
			}
		}
	}
	else if (env.tids[0]) {
		tids_fd = bpf_map__fd(obj->maps.tids);
		for (i = 0; i < MAX_TID_NR && env.tids[i]; i++) {
			if (bpf_map_update_elem(tids_fd, &(env.tids[i]), &val, BPF_ANY) != 0) {
				fprintf(stderr, "failed to init tids map: %s\n", strerror(errno));
				goto cleanup;
			}
	// Setup PID filter if specified
	if (env.pid > 0) {
		int pids_fd = bpf_map__fd(obj->maps.pids);
		__u8 val = 1;
		if (bpf_map_update_elem(pids_fd, &env.pid, &val, BPF_ANY) != 0) {
			fprintf(stderr, "failed to set pid filter: %s\n",
				strerror(errno));
			goto cleanup;
		}
	}

@@ -387,28 +348,25 @@ int main(int argc, char **argv)

	signal(SIGINT, sig_handler);

	if (!env.folded)
		print_headers();
	if (!env.folded) {
		printf("Profiling Python stacks at %d Hz", env.sample_freq);
		if (env.pid > 0)
			printf(" for PID %d", env.pid);
		printf("... Hit Ctrl-C to end.\n");
	}

	/*
	 * We'll get sleep interrupted when someone presses Ctrl-C.
	 * (which will be "handled" with noop by sig_handler)
	 */
	sleep(env.duration);

	if (!env.folded)
		printf("\nCollecting results...\n");

	print_counts(bpf_map__fd(obj->maps.counts),
		     bpf_map__fd(obj->maps.stackmap));

cleanup:
	if (env.cpu != -1)
		bpf_link__destroy(links[env.cpu]);
	else {
		for (i = 0; i < nr_cpus; i++)
			bpf_link__destroy(links[i]);
	}

	blaze_symbolizer_free(symbolizer);
	oncputime_bpf__destroy(obj);
	for (int i = 0; i < nr_cpus; i++)
		bpf_link__destroy(links[i]);

	python_stack_bpf__destroy(obj);
	return err != 0;
}
src/trace/python-stack-profiler/run_test.sh (new executable file, 53 lines)
@@ -0,0 +1,53 @@
#!/bin/bash
# Test script for Python stack profiler

set -e

echo "=== Python Stack Profiler Test ==="
echo ""

# Check if running as root
if [ "$EUID" -ne 0 ]; then
    echo "Please run as root (required for eBPF)"
    exit 1
fi

# Build the profiler
echo "Building Python stack profiler..."
make clean
make

if [ ! -f "./python-stack" ]; then
    echo "Error: Build failed"
    exit 1
fi

echo "Build successful!"
echo ""

# Start Python test program in background
echo "Starting Python test program..."
python3 test_program.py &
PYTHON_PID=$!

echo "Python test program PID: $PYTHON_PID"
echo "Waiting 2 seconds for it to start..."
sleep 2

# Run the profiler
echo ""
echo "Running profiler for 5 seconds..."
./python-stack -p $PYTHON_PID -d 5 -F 49

# Cleanup
echo ""
echo "Cleaning up..."
kill $PYTHON_PID 2>/dev/null || true
wait $PYTHON_PID 2>/dev/null || true

echo ""
echo "=== Test Complete ==="
echo ""
echo "To generate a flamegraph:"
echo " 1. Run: ./python-stack -p <PID> -f > stacks.txt"
echo " 2. Generate SVG: flamegraph.pl stacks.txt > flamegraph.svg"
src/trace/python-stack-profiler/test_program.py (new executable file, 46 lines)
@@ -0,0 +1,46 @@
#!/usr/bin/env python3
"""
Simple Python test program to demonstrate stack profiling
This simulates a typical workload with multiple function calls
"""

import time
import sys

def expensive_computation(n):
    """Simulate CPU-intensive work"""
    result = 0
    for i in range(n):
        result += i ** 2
    return result

def process_data(iterations):
    """Process data with nested function calls"""
    results = []
    for i in range(iterations):
        value = expensive_computation(10000)
        results.append(value)
    return results

def load_model():
    """Simulate model loading"""
    time.sleep(0.1)
    data = process_data(50)
    return sum(data)

def main():
    """Main function that orchestrates the workload"""
    print("Python test program starting...")
    print(f"PID: {__import__('os').getpid()}")
    print("Running CPU-intensive workload...")

    # Run for a while to allow profiling
    for iteration in range(100):
        result = load_model()
        if iteration % 10 == 0:
            print(f"Iteration {iteration}: result = {result}")

    print("Test program completed.")

if __name__ == "__main__":
    main()