fix and rename

2026-02-03 02:04:30 +08:00 · 2025-07-06 13:24:01 +00:00
parent abe14a69ae
commit df82b97103
3 changed files with 0 additions and 476 deletions
--- a/src/48-energy/README.zh.md
+++ b/src/48-energy/README.zh.md
--- a/src/48-energy/ebpf-energy.md
+++ b/src/48-energy/ebpf-energy.md
@@ -1,70 +0,0 @@
-Below is a quick map of **public eBPF projects and papers that touch CPU power-management knobs (DVFS, idle states, thermal) or do pure energy accounting.**
-They are grouped to show where work already exists and where the gaps remain.
-
---
-
-## 1  Projects/papers that *try to control* DVFS / idle / thermal directly
-
-| Name & date                                                     | What it does with eBPF                                                                                                                                                                                                             | Sub-knobs covered                          | Status / notes                                                                                                 |
-| --------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------ | -------------------------------------------------------------------------------------------------------------- |
-| **`cpufreq_ext` RFC (Zou, 2024)**                               | Hooks the cpufreq governor into a `bpf_struct_ops` table (`get_next_freq()` etc.) so a policy can be written in eBPF instead of C. Integrates with `sched_ext` to let a BPF scheduler and a BPF DVFS policy co-operate.            | **DVFS** (per-policy frequency)            | RFC on linux-pm & bpf lists. Compiles on ≥ 6.9 kernels; crude sample policy included. ([lwn.net][1])           |
-| **eBPF CPU-Idle governor prototype (Eco-Compute summit, 2024)** | Replaces the “menu/TEO” cpuidle governor with a BPF hook so that idle-state choice and idle-injection can be decided in eBPF.                                                                                                      | **Idle states** (C-states), idle injection | Academic prototype; slides only, but code expected to be released by the Eco-Compute students. ([jauu.net][2]) |
-| **Early “power-driver” & BEAR lineage**                         | Molnar/Rasmussen’s 2013 power-driver idea was to unify `go_faster/go_slower/enter_idle`.  Our BEAR concept simply modernises this with eBPF.  No public code yet, but it shows the *direction* the kernel community is discussing. | **DVFS + Idle + Thermal** (goal)           | Design idea; opportunity for a full implementation (research gap). ([jauu.net][2], [lwn.net][1])               |
-
-> **Reality check:** right now `cpufreq_ext` is the *only* eBPF code proposed for upstream that actually changes CPU frequency, and it is still an RFC. Idle and thermal hooks are research prototypes, so this area is wide open for publication.
-
---
-
-## 2  eBPF projects focused on **energy telemetry / accounting**
-
-*(These don't set DVFS or idle states, but they provide the per-process or per-container energy data needed to **drive** such policies.)*
-
-| Name                                                                       | Scope & technique                                                                                                                                                                                                                  | Why it matters                                                                                                                           |
-| -------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
-| **Wattmeter / *Energy-Aware Process Scheduling in Linux* (HotCarbon ’24)** | Attaches an eBPF program to every context-switch to read RAPL MSRs in-kernel, giving millisecond-scale per-process joules with <1 µs overhead.  Used to build energy-fair and energy-capped schedulers on top of ghOSt/sched\_ext. | Gives accurate, low-overhead energy numbers that could feed a DVFS/thermal policy. ([asafcidon.com][3])                                  |
-| **Kepler (CNCF sandbox, 2023-)**                                           | A Prometheus exporter for Kubernetes.  Uses eBPF tracepoints + perf counters + RAPL/NVML to attribute energy to pods/containers; ships ML models for platforms that lack RAPL.                                                     | Quickly gaining traction in cloud-native stacks; good data source for cluster-level power orchestration. ([sustainable-computing.io][4]) |
-| **DEEP-mon (Polimi, 2018)**                                                | In-kernel eBPF aggregation of scheduler events to attribute power to Docker containers; <5 % runtime overhead on Phoronix & NPB.                                                                                                   | Older but shows in-kernel aggregation trick to avoid 200 k sched-switch/sec user-space wake-ups. ([slideshare.net][5])                   |
-| **eBPF-energy-monitor (GitHub toy project)**                               | Minimal BCC script that latches on `sched_switch`, tracks CPU-time per PID, multiplies by per-core joules from RAPL.                                                                                                               | Handy starting point if you just need a working demo. ([github.com][6])                                                                  |
-| **Other software power meters: BitWatts, Scaphandre** | Both are software-defined power meters (not DEEP-mon derivatives): BitWatts focuses on VMs, Scaphandre on bare metal and Kubernetes. Scaphandre can optionally compile an eBPF sensor module for process attribution. | Useful for comparing accuracy/overhead trade-offs against Kepler. ([github.com][7], [github.com][8]) |
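
Several entries above (Kepler, eBPF-energy-monitor) pair per-process CPU time with a hardware energy reading. A minimal sketch of one common attribution rule, splitting a RAPL energy delta in proportion to each process's CPU time over the same interval, is shown below. It is not the exact model of any listed project, and `struct proc_sample` and `attribute_energy` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process sample for one measurement interval. */
struct proc_sample {
    uint32_t pid;
    uint64_t cpu_time_ns;   /* CPU time in this interval, e.g. summed from sched_switch */
    double energy_j;        /* output: energy attributed to this process */
};

/*
 * Split a package-level energy delta (joules, e.g. the difference of two RAPL
 * readings) across processes in proportion to their share of the total CPU
 * time. The whole delta, including idle/uncore power, is spread over the
 * listed processes.
 */
static void attribute_energy(struct proc_sample *procs, size_t n, double delta_energy_j)
{
    uint64_t total_ns = 0;
    for (size_t i = 0; i < n; i++)
        total_ns += procs[i].cpu_time_ns;

    for (size_t i = 0; i < n; i++) {
        procs[i].energy_j = total_ns == 0
            ? 0.0
            : delta_energy_j * (double)procs[i].cpu_time_ns / (double)total_ns;
    }
}
```

Because idle power is folded into whatever ran, a common refinement is to subtract an idle baseline from the delta before splitting it.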
-
---
-
-## 3  Quick takeaways
-
-* **Very little published work** uses eBPF to *control* DVFS/idle/thermal today; `cpufreq_ext` is the main concrete code.
-* **Telemetry is mature.** Kepler, Wattmeter and DEEP-mon already give fine-grained joule accounting that a governor could use as feedback.
-* **Open research space:** wiring those telemetry sources into an eBPF-based unified policy (BEAR-style) that calls `cpufreq_ext` + a future `cpuidle_bpf` hook + thermal caps is still almost untouched.
-
-[1]: https://lwn.net/Articles/991991/ "cpufreq_ext: Introduce cpufreq ext governor - LWN.net"
-[2]: https://jauu.net/talks/eco-compute-linux-power-analysis.pdf "[PDF] Linux Power Management Analysis for Embedded Systems"
-[3]: https://www.asafcidon.com/uploads/5/9/7/0/59701649/energy-aware-ebpf.pdf "Energy-Aware Process Scheduling in Linux"
-[4]: https://sustainable-computing.io/ "Kepler"
-[5]: https://www.slideshare.net/necstlab/deepmon-dynamic-and-energy-efficient-power-monitoring-for-containerbased-infrastructures "DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based infrastructures | PPT"
-[6]: https://github.com/fjebaker/eBPF-energy-monitor "Monitoring energy usage with eBPF at process level granularity."
-[7]: https://github.com/Spirals-Team/bitwatts "BitWatts is a software-defined power meter for virtualized ... - GitHub"
-[8]: https://github.com/hubblo-org/scaphandre "hubblo-org/scaphandre - GitHub"
-
-**Why manage DVFS and idle states yourself in eBPF?**
-
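-| Typical goal | Conventional approach | What dynamic control in eBPF adds | When it is worth it |
-| --- | --- | --- | --- |
-| **Lower energy / longer battery life** | Kernel default governors (`schedutil`, `menu`/`TEO`) | Picks the best P-/C-state in real time from scheduler events, load characteristics and temperature; 5-30% savings for specific apps have been reported with customized Android governors | Phones and battery-powered IoT; battery-life-sensitive devices with a single dominant workload (games, camera) |
-| **Stable frame rate / no thermal cliffs** | Wait for passive thermal throttling; cut frequency only after the limit is exceeded | Predicts heat ahead of time and lowers the frequency gradually or injects idle time, so average FPS is steadier and a sudden drop becomes a gradual one | Sustained full load (4K recording, running LLMs) where frames must not drop |
-| **Per-tenant / per-container power budgets** | Only a machine-wide RAPL limit (PL1) | eBPF accumulates energy in `sched_switch` in real time and gives the budget to high-priority tenants; with `cpufreq_ext`, frequency is lowered only where other tenants run | Multi-tenant cloud and edge nodes that need power isolation |
-| **Experimenting with new policies** | Patch the kernel and reboot | eBPF code is hot-swappable, so an algorithm can be replaced in about 5 seconds; joint "scheduling + DVFS" experiments with `sched_ext` are much faster | Research/performance teams running frequent A/B tests |
-| **Heterogeneous platforms (big.LITTLE, CPU+GPU)** | Vendor blobs, user-space daemons | eBPF can read GPU load and temperature from maps and lower the CPU frequency to hand thermal budget to the GPU, even without a vendor driver | SoC teams doing their own system integration that don't want to depend on a proprietary HAL |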
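
The thermal row's idea of turning a sudden drop into a gradual one can be sketched as a single policy step: extrapolate the temperature one tick ahead from the last two samples, then move the frequency by at most a fixed step per tick. This is plain C for illustration, not the `cpufreq_ext` interface; `struct thermal_policy`, `next_freq_khz` and the thresholds are assumptions.

```c
#include <stdint.h>

/* Hypothetical policy parameters; values are illustrative, not vendor defaults. */
struct thermal_policy {
    uint32_t min_khz, max_khz;   /* allowed frequency range */
    uint32_t step_khz;           /* maximum change per decision tick */
    int32_t soft_limit_mc;       /* start ramping down above this (milli-Celsius) */
};

/*
 * Predict the next temperature by linear extrapolation from the last two
 * samples, then move the frequency at most one step toward min or max.
 */
static uint32_t next_freq_khz(const struct thermal_policy *p, uint32_t cur_khz,
                              int32_t prev_temp_mc, int32_t temp_mc)
{
    int32_t predicted = temp_mc + (temp_mc - prev_temp_mc);

    if (predicted > p->soft_limit_mc) {
        /* Ramp down gradually instead of dropping straight to min_khz. */
        return cur_khz > p->min_khz + p->step_khz ? cur_khz - p->step_khz : p->min_khz;
    }
    /* Below the soft limit: ramp back up, also one step at a time. */
    return cur_khz + p->step_khz < p->max_khz ? cur_khz + p->step_khz : p->max_khz;
}
```

Calling this once per tick bounds the frequency change to `step_khz` per tick, which is what turns a throttling "cliff" into a slope.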
-
---
-
-### Is it really necessary? A quick checklist
-
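-* **Simple workload, not energy-sensitive** → the default governor is enough; eBPF is just a nice-to-have.
-* **Hard constraints on performance per watt or thermal limits** (mobile games, battery-powered drones, a 5U rack capped by its PDU) → a custom policy can often recover 10-30% of headroom.
-* **Systems research or a custom product** → eBPF is currently the easiest and safest way to experiment inside the kernel, with far lower maintenance cost than writing a loadable kernel module (LKM) or patching kernel source.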
-
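-> **In one sentence:**
-> *"Whether to manage power with eBPF depends on how many watts and degrees you care about, and whether the cost of changing the kernel pays for itself."*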
-
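-If you only want a rough power curve, powertop is enough. For fine-grained, adaptive, hot-updatable power or thermal control, however, the combination eBPF offers (event-driven hooks, in-kernel aggregation, and safe hot-swapping) is hard to match.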
--- a/src/48-energy/能源监控系统详解.md
+++ b/src/48-energy/能源监控系统详解.md
@@ -1,406 +0,0 @@
-# The eBPF Energy Monitoring System Explained
-
-## Overview
-
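-This project implements an eBPF-based, process-level CPU energy monitor. It captures scheduling events in kernel space, measures each process's CPU time precisely, and estimates the process's energy consumption from that time. Compared with traditional monitoring based on the `/proc` filesystem, this approach has lower system overhead and finer timing resolution.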
-
-## System Architecture
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│                       用户空间                               │
-│  ┌─────────────────────────────────────────────────────┐   │
-│  │         energy_monitor (用户态程序)                   │   │
-│  │  - 加载 eBPF 程序                                    │   │
-│  │  - 接收内核事件                                      │   │
-│  │  - 计算能耗并展示                                    │   │
-│  └──────────────────┬──────────────────────────────────┘   │
-│                     │ Ring Buffer                           │
-└─────────────────────┼───────────────────────────────────────┘
-                      │
-┌─────────────────────┼───────────────────────────────────────┐
-│                     ▼         内核空间                       │
-│  ┌─────────────────────────────────────────────────────┐   │
-│  │      energy_monitor.bpf.c (eBPF 程序)                │   │
-│  │  - 挂载到调度器跟踪点                                │   │
-│  │  - 记录进程运行时间                                  │   │
-│  │  - 发送事件到用户空间                                │   │
-│  └─────────────────────────────────────────────────────┘   │
-│                     ▲                                        │
-│                     │                                        │
-│  ┌─────────────────┴────────────────────────────────────┐   │
-│  │              Linux 调度器                            │   │
-│  │  - sched_switch (进程切换)                           │   │
-│  │  - sched_process_exit (进程退出)                     │   │
-│  └─────────────────────────────────────────────────────┘   │
-└─────────────────────────────────────────────────────────────┘
-```
-
-## Core Components
-
-### 1. Data Structure Definition (energy_monitor.h)
-
-```c
-struct energy_event {
-    __u64 ts;           // Timestamp (ns)
-    __u32 cpu;          // CPU number
-    __u32 pid;          // Process ID
-    __u64 runtime_ns;   // Run time (ns)
-    char comm[16];      // Process name
-};
-```
-
-This struct defines the event format passed from the kernel to user space and carries everything needed to compute energy.
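
The ring buffer hands user space raw bytes, so both sides must agree on this layout exactly. The field order needs no padding (the two 32-bit fields sit between the 64-bit ones), which a user-space mirror can check at compile time. A minimal sketch, assuming `<stdint.h>` types stand in for `__u64`/`__u32` (they have the same widths):

```c
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of struct energy_event using <stdint.h> types. */
struct energy_event {
    uint64_t ts;           /* timestamp (ns) */
    uint32_t cpu;          /* CPU number */
    uint32_t pid;          /* process ID */
    uint64_t runtime_ns;   /* run time (ns) */
    char comm[16];         /* process name */
};

/* Fail the build if the layout drifts from what the BPF side writes. */
_Static_assert(offsetof(struct energy_event, runtime_ns) == 16, "runtime_ns offset");
_Static_assert(sizeof(struct energy_event) == 40, "energy_event size");
```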
-
-### 2. eBPF Kernel Program (energy_monitor.bpf.c)
-
-#### 2.1 Core Data Structures
-
-```c
-// Timestamp at which each process last started running
-struct {
-    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
-    __uint(max_entries, 8192);
-    __type(key, pid_t);
-    __type(value, u64);
-} time_lookup SEC(".maps");
-
-// Accumulated run time per process
-struct {
-    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
-    __uint(max_entries, 8192);
-    __type(key, pid_t);
-    __type(value, u64);
-} runtime_lookup SEC(".maps");
-
-// Ring buffer used to deliver events to user space
-struct {
-    __uint(type, BPF_MAP_TYPE_RINGBUF);
-    __uint(max_entries, 256 * 1024);
-} rb SEC(".maps");
-```
-
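-**Key design decisions:**
-* `PERCPU_HASH` maps give each CPU its own copy of every value, avoiding contention when several cores update the same key
-* The 256 KB ring buffer balances memory use against the risk of dropped events (ring buffer sizes must be a power of two and a multiple of the page size)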
-
-#### 2.2 Context-Switch Handling
-
-```c
-SEC("tp/sched/sched_switch")
-int handle_switch(struct trace_event_raw_sched_switch *ctx)
-{
-    u64 ts = bpf_ktime_get_ns();
-    pid_t prev_pid = ctx->prev_pid;
-    pid_t next_pid = ctx->next_pid;
-    
-    // 1. Account the run time of the task being switched out
-    if (prev_pid != 0) {  // skip the idle task (PID 0)
-        u64 *start_ts = bpf_map_lookup_elem(&time_lookup, &prev_pid);
-        if (start_ts) {
-            u64 delta = ts - *start_ts;
-            // Add to the accumulated run time (helper not shown here)
-            update_runtime(prev_pid, delta);
-            // Emit an event to user space via the ring buffer (helper not shown here)
-            send_event(prev_pid, delta, ctx->prev_comm);
-        }
-    }
-    
-    // 2. Record when the next task starts running
-    if (next_pid != 0) {
-        bpf_map_update_elem(&time_lookup, &next_pid, &ts, BPF_ANY);
-    }
-
-    return 0;
-}
-```
-
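-**Workflow:**
-1. On each context switch, read the current timestamp
-2. Compute how long the task being switched out has been running
-3. Add that to the task's accumulated run time
-4. Send an event to user space through the ring buffer
-5. Record the start time of the task being switched in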
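
The same steps can be replayed in ordinary C to see why per-CPU start timestamps are enough: a task's slice always begins and ends on the same CPU, even if the task later migrates. In this sketch `struct switch_ev`, `on_switch` and the fixed-size arrays are hypothetical stand-ins for the BPF maps, and a start time of 0 means "no open slice".

```c
#include <stdint.h>

#define MAX_CPUS 8
#define MAX_PIDS 64   /* sketch only: PIDs index an array directly */

/* One sched_switch event as the BPF handler would see it. */
struct switch_ev {
    uint64_t ts;
    uint32_t cpu;
    uint32_t prev_pid;
    uint32_t next_pid;
};

static uint64_t start_ts[MAX_CPUS][MAX_PIDS];  /* per-CPU slice start, 0 = none */
static uint64_t runtime_ns[MAX_PIDS];          /* accumulated run time per PID */

/* Close the outgoing task's slice, then open the incoming task's slice. */
static void on_switch(const struct switch_ev *e)
{
    if (e->cpu >= MAX_CPUS || e->prev_pid >= MAX_PIDS || e->next_pid >= MAX_PIDS)
        return;
    if (e->prev_pid != 0 && start_ts[e->cpu][e->prev_pid] != 0) {
        runtime_ns[e->prev_pid] += e->ts - start_ts[e->cpu][e->prev_pid];
        start_ts[e->cpu][e->prev_pid] = 0;
    }
    if (e->next_pid != 0)
        start_ts[e->cpu][e->next_pid] = e->ts;
}
```

In the test sequence below, PID 5 runs once on CPU 0 and once on CPU 1; each slice is closed on the CPU that opened it, so both are counted.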
-
-#### 2.3 Safe Division Helper
-
-```c
-static __always_inline u64 div_u64_safe(u64 dividend, u64 divisor)
-{
-    // Guard against division by zero explicitly
-    if (divisor == 0)
-        return 0;
-
-    // Unsigned 64-bit division is a native BPF instruction, so the plain
-    // `/` operator compiles and passes the verifier. Only signed division
-    // requires a newer instruction set (clang -mcpu=v4).
-    return dividend / divisor;
-}
-```
-
-**Notes:**
-* eBPF programs can divide directly: unsigned `/` and `%` have been part of the instruction set from the start
-* Avoid approximating ns → µs with `dividend >> 10`: that divides by 1024, not 1000, and under-reports by about 2.3%
-* Avoid division by repeated subtraction in a bounded loop: with a 64-iteration cap, every quotient above 64 comes out wrong
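
Two division shortcuts are tempting in BPF code: approximating ns → µs with `>> 10`, and dividing by repeated subtraction in a bounded loop. Reproducing both in ordinary C makes their error easy to check against plain `/`; the function names below are illustrative.

```c
#include <stdint.h>

/* Shortcut 1: ">> 10" actually divides by 1024, not 1000. */
static uint64_t shift_approx_div1000(uint64_t ns)
{
    return ns >> 10;
}

/* Shortcut 2: repeated subtraction, capped at 64 iterations. */
static uint64_t capped_sub_div(uint64_t dividend, uint64_t divisor)
{
    uint64_t q = 0;
    if (divisor == 0)
        return 0;
    for (int i = 0; i < 64 && dividend >= divisor; i++) {
        dividend -= divisor;
        q++;
    }
    return q;   /* silently stops at 64 */
}

/* What a BPF program can simply do: unsigned 64-bit division. */
static uint64_t plain_div(uint64_t dividend, uint64_t divisor)
{
    return divisor == 0 ? 0 : dividend / divisor;
}
```

For 1,000,000 ns (1 ms), plain division gives 1000 µs, the shift gives 976, and the capped loop gives 64.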
-
-### 3. User-Space Program (energy_monitor.c)
-
-#### 3.1 Main Program Flow
-
-```c
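-#include <errno.h>
-#include <signal.h>
-#include <bpf/libbpf.h>
-#include "energy_monitor.skel.h"
-int main(int argc, char **argv)
-{
-    struct ring_buffer *rb = NULL;
-    int err;
-    // 1. Parse options; sig_handler (not shown) is assumed to set `exiting`
-    parse_args(argc, argv);
-    signal(SIGINT, sig_handler);
-    signal(SIGTERM, sig_handler);
-
-    // 2. Open, load and attach the eBPF program
-    struct energy_monitor_bpf *skel = energy_monitor_bpf__open_and_load();
-    if (!skel)
-        return 1;
-    err = energy_monitor_bpf__attach(skel);
-    if (err)
-        goto cleanup;
-
-    // 3. Register the ring buffer callback (4th argument: options, may be NULL)
-    rb = ring_buffer__new(bpf_map__fd(skel->maps.rb), handle_event, NULL, NULL);
-    if (!rb) { err = -errno; goto cleanup; }
-
-    // 4. Main loop: process events with a 100 ms poll timeout
-    while (!exiting) {
-        err = ring_buffer__poll(rb, 100);
-        if (err == -EINTR) { err = 0; break; }
-        if (err < 0) goto cleanup;
-    }
-
-    // 5. Print the energy summary
-    print_energy_summary();
-cleanup:
-    ring_buffer__free(rb);
-    energy_monitor_bpf__destroy(skel);
-    return err < 0 ? -err : 0;
-}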
-```
-
-#### 3.2 Event Handling
-
-```c
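-static int handle_event(void *ctx, void *data, size_t data_sz)
-{
-    const struct energy_event *e = data;
-    
-    // Accumulate the process's run time (get_or_create_process is not shown)
-    struct process_info *info = get_or_create_process(e->pid);
-    if (!info)
-        return 0;  // table full: drop this sample instead of crashing
-    info->runtime_ns += e->runtime_ns;
-    // comm is at most 16 bytes; copy with an explicit bound
-    snprintf(info->comm, sizeof(info->comm), "%.*s", (int)sizeof(e->comm), e->comm);
-    
-    if (env.verbose) {
-        printf("[%llu] PID %u (%.16s) ran on CPU %u for %llu ns\n",
-               e->ts, e->pid, e->comm, e->cpu, e->runtime_ns);
-    }
-    
-    return 0;
-}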
-```
-
-#### 3.3 Energy Model
-
-```c
-#include <stdio.h>
-#include <sys/sysinfo.h>   // get_nprocs()
-
-static void print_energy_summary(void)
-{
-    double cpu_power_per_core = env.cpu_power / get_nprocs();
-    
-    // processes[] / nr_processes: the user-space process table (names are illustrative)
-    for (size_t i = 0; i < nr_processes; i++) {
-        const struct process_info *p = &processes[i];
-        double runtime_ms = p->runtime_ns / 1000000.0;
-        double runtime_s = runtime_ms / 1000.0;
-        
-        // Energy (J) = power (W) × time (s)
-        double energy_j = cpu_power_per_core * runtime_s;
-        double energy_mj = energy_j * 1000;
-        
-        printf("PID %u (%s): run time %.2f ms, energy %.2f mJ\n",
-               (unsigned)p->pid, p->comm, runtime_ms, energy_mj);
-    }
-}
-```
-
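-**Model assumptions:**
-* Constant CPU power (15 W by default, adjustable with `-p`)
-* Power is split evenly across all online logical CPUs (`get_nprocs()`)
-* CPU frequency changes and sleep states are ignored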
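
Under these assumptions each process's energy is E = (P / N) × t, where P is the `-p` power, N the number of online CPUs and t the process's run time. A small helper makes the arithmetic checkable: 15 W over 4 logical CPUs gives 3.75 W per CPU, so 15234.56 ms of run time corresponds to about 57.13 J, or 57129.6 mJ. The helper name is illustrative.

```c
#include <stdint.h>

/* Constant-power model: E[mJ] = (P_total / N_cpus) * t[s] * 1000. */
static double energy_mj(double cpu_power_w, int ncpus, uint64_t runtime_ns)
{
    double per_core_w = cpu_power_w / ncpus;
    double runtime_s = (double)runtime_ns / 1e9;
    return per_core_w * runtime_s * 1000.0;   /* J -> mJ */
}
```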
-
-### 4. Comparison with Traditional Monitoring
-
-#### 4.1 Traditional Approach (energy_monitor_traditional.sh)
-
-```bash
-# Read /proc/stat roughly every 100 ms
-while true; do
-    # System-wide CPU times
-    cpu_times=$(grep '^cpu' /proc/stat)
-    
-    # Per-process CPU times (iterate over the PID directories themselves)
-    for dir in /proc/[0-9]*; do
-        pid=${dir#/proc/}
-        stat=$(cat "$dir/stat" 2>/dev/null)
-        # parse utime/stime and compute the deltas
-    done
-    
-    sleep 0.1
-```
-
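-**Problems with the traditional approach:**
-* A fixed sampling interval can miss short-lived processes
-* Frequent filesystem reads add significant overhead
-* Precision is limited by the sampling rate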
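
Each polling pass of the script has to parse every `/proc/<pid>/stat`. Its `utime` and `stime` fields (14 and 15) are counted in clock ticks (`sysconf(_SC_CLK_TCK)`, usually 100 per second), so per-process resolution is about 10 ms no matter how fast the script polls. A sketch of that parsing step in C (the function name is illustrative); `comm` can contain spaces and parentheses, so parsing starts after the last `)`:

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse utime and stime (fields 14 and 15 of /proc/<pid>/stat, in clock ticks).
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_stat_times(const char *line, unsigned long *utime, unsigned long *stime)
{
    /* comm (field 2) may contain spaces or ')', so start after the LAST ')' */
    const char *p = strrchr(line, ')');
    if (!p)
        return -1;
    /* Skip fields 3-13 (state, ppid, pgrp, session, tty_nr, tpgid, flags,
     * minflt, cminflt, majflt, cmajflt), then read utime and stime. */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               utime, stime) != 2)
        return -1;
    return 0;
}
```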
-
-#### 4.2 Performance Comparison
-
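-| Metric | eBPF approach | Traditional approach |
-|------|-----------|----------|
-| System overhead | < 1% CPU | 5-10% CPU |
-| Timing precision | Nanoseconds | Milliseconds |
-| Event capture | Every context switch (unless the ring buffer overflows) | Depends on sampling rate |
-| Short-lived processes | Fully captured | May be missed |
-| Latency | Real time | 100 ms or more |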
-
-### 5. Advanced Features
-
-#### 5.1 Handling Process Exit
-
-```c
-SEC("tp/sched/sched_process_exit")
-int handle_exit(struct trace_event_raw_sched_process_template *ctx)
-{
-    pid_t pid = ctx->pid;
-    
-    // Remove this task's tracking entries from both maps
-    bpf_map_delete_elem(&time_lookup, &pid);
-    bpf_map_delete_elem(&runtime_lookup, &pid);
-    
-    return 0;
-}
-```
-
-This keeps entries for exited tasks from piling up until the 8192-entry maps are full.
-
-#### 5.2 Multi-Core Support
-
-Because the maps are `PERCPU` maps, each CPU keeps its own copy of every value, so updates from different cores do not contend:
-
-```c
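-// Look up this CPU's copy of the value
-u64 *runtime = bpf_map_lookup_elem(&runtime_lookup, &pid);
-if (runtime) {
-    __sync_fetch_and_add(runtime, delta);  // atomic add (harmless; each per-CPU slot is written only by its own CPU)
-} else {
-    // First slice seen for this PID: create the entry (other CPUs' slots start at 0)
-    bpf_map_update_elem(&runtime_lookup, &pid, &delta, BPF_NOEXIST);
-}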
-```
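
Per-CPU values also change how user space reads the map: `bpf_map_lookup_elem()` on a `BPF_MAP_TYPE_PERCPU_HASH` fills one value slot per possible CPU, and the per-PID total is their sum. A minimal sketch of the reduction; the libbpf calls in the comment are real, while `sum_percpu` is an illustrative name:

```c
#include <stdint.h>

/*
 * Sum the per-CPU copies returned for one key. Typical use (needs libbpf):
 *   int ncpus = libbpf_num_possible_cpus();
 *   uint64_t vals[ncpus];
 *   if (bpf_map_lookup_elem(bpf_map__fd(skel->maps.runtime_lookup), &pid, vals) == 0)
 *       total = sum_percpu(vals, ncpus);
 */
static uint64_t sum_percpu(const uint64_t *values, int ncpus)
{
    uint64_t total = 0;
    for (int i = 0; i < ncpus; i++)
        total += values[i];
    return total;
}
```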
-
-## Use Cases
-
-### 1. Application Profiling
-
-```bash
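-# Monitor all processes for 60 seconds, then look up the application of interest
-./energy_monitor -v -d 60
-
-# Example output (default 15 W over 4 logical CPUs, i.e. 3.75 W per CPU):
-# PID 1234 (chrome): run time 15234.56 ms, energy 57129.60 mJ
-# PID 5678 (vscode): run time 8456.23 ms, energy 31710.86 mJ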
-```
-
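-### 2. Per-Container Energy Attribution
-
-Because `docker top` lists a container's processes by their host PIDs, the monitor's output can be filtered per container: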
-
-```bash
-# List the container's processes by their host PIDs
-docker top <container_id> -o pid
-
-# Monitor and keep only those PIDs
-./energy_monitor | grep -E "PID (1234|5678|...)"
-```
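
Matching PIDs from `docker top` misses processes that start later. An alternative, assuming a cgroup v2 host, is to read `/proc/<pid>/cgroup` (a single line of the form `0::<path>`) for each reported PID and group energy by that path; with Docker's systemd cgroup driver the path typically looks like `/system.slice/docker-<id>.scope`. The helper below is a hypothetical sketch of the parsing step.

```c
#include <stddef.h>
#include <string.h>

/*
 * Extract the cgroup v2 path from one line of /proc/<pid>/cgroup ("0::/path\n").
 * Writes a NUL-terminated path into out; returns 0 on success, -1 otherwise.
 */
static int cgroup_v2_path(const char *line, char *out, size_t out_len)
{
    if (strncmp(line, "0::", 3) != 0)
        return -1;               /* not a cgroup v2 entry */
    const char *path = line + 3;
    size_t len = strcspn(path, "\n");
    if (len == 0 || len >= out_len)
        return -1;               /* empty path or buffer too small */
    memcpy(out, path, len);
    out[len] = '\0';
    return 0;
}
```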
-
-### 3. Energy-Efficiency Optimization
-
-Compare energy data from before and after an optimization to evaluate its effect:
-
-```bash
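-# Before optimization
-./energy_monitor -d 300 > before.log
-
-# ...apply the code changes...
-
-# After optimization
-./energy_monitor -d 300 > after.log
-
-# Compare the two runs (compare_results.py is not shown here)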
-./compare_results.py before.log after.log
-```
-
-## Possible Extensions
-
-### 1. Integrating the RAPL Interface
-
-```c
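-#include <fcntl.h>
-#include <stdlib.h>
-#include <unistd.h>
-
-// Read RAPL package 0's cumulative energy counter in microjoules (0 on error)
-static unsigned long long read_rapl_energy(void)
-{
-    char buf[32];
-    int fd = open("/sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj", O_RDONLY);
-    if (fd < 0)
-        return 0;
-    ssize_t n = read(fd, buf, sizeof(buf) - 1);
-    close(fd);
-    if (n <= 0)
-        return 0;
-    buf[n] = '\0';   // read() does not NUL-terminate
-    return strtoull(buf, NULL, 10);
-}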
-```
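
Two details matter when turning these readings into energy: `energy_uj` is a cumulative counter that wraps back to zero after `max_energy_range_uj` (a file in the same directory), and on current kernels it is typically readable only by root. A minimal sketch of a wrap-aware delta, assuming at most one wrap between two readings; the function name is illustrative:

```c
#include <stdint.h>

/* Energy consumed between two energy_uj readings, handling one counter wrap. */
static uint64_t rapl_delta_uj(uint64_t prev_uj, uint64_t now_uj, uint64_t max_range_uj)
{
    if (now_uj >= prev_uj)
        return now_uj - prev_uj;
    /* Counter wrapped: count up to the maximum, then from zero to now. */
    return (max_range_uj - prev_uj) + now_uj;
}
```

Dividing the delta by the sampling interval in seconds gives average package power in µW, which could replace the constant `-p` value.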
-
-### 2. GPU Energy Monitoring
-
-```c
-// Extended energy_event struct
-struct energy_event {
-    // ... existing fields ...
-    __u64 gpu_time_ns;    // GPU busy time
-    __u32 gpu_id;         // GPU device ID
-};
-```
-
-### 3. Machine-Learning Models
-
-Train an energy-prediction model on the collected data:
-
-```python
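-# Pseudocode: train_energy_prediction_model is a placeholder, not an existing API
-# Features: CPU utilization, memory-access patterns, I/O frequency
-# Target: energy consumption over the next N seconds
-model = train_energy_prediction_model(historical_data)
-predicted_energy = model.predict(current_metrics)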
-```
-
-## Limitations and Future Work
-
-### Current Limitations
-
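-1. **Simplified energy model**: assumes constant CPU power and ignores dynamic frequency scaling
-2. **No hardware counters**: does not use CPU performance counters for more precise data
-3. **CPU only**: memory, disk and network energy are not included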
-
-### Improvements
-
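-1. **Integrate perf_event**: use hardware performance counters to improve accuracy
-2. **Dynamic power model**: adjust the power estimate according to CPU frequency and utilization
-3. **Whole-system energy**: extend to memory, I/O and other components
-4. **Real-time visualization**: build a web UI that shows energy data live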
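
For the dynamic power model, one starting point is a per-frequency power table, similar in spirit to the kernel's Energy Model, which assigns a power cost to each performance state. A minimal sketch, assuming the BPF side also records the frequency each slice ran at; `struct perf_state`, the helpers and any table values are placeholders, not measurements:

```c
#include <stddef.h>
#include <stdint.h>

/* One performance state: frequency and estimated active power of one CPU. */
struct perf_state {
    uint32_t freq_khz;
    double power_w;
};

/*
 * Power at a given frequency, linearly interpolated between the surrounding
 * entries of a table sorted by ascending freq_khz; clamped outside the range.
 */
static double power_at(const struct perf_state *t, size_t n, uint32_t freq_khz)
{
    if (freq_khz <= t[0].freq_khz)
        return t[0].power_w;
    for (size_t i = 1; i < n; i++) {
        if (freq_khz <= t[i].freq_khz) {
            double span = (double)(t[i].freq_khz - t[i - 1].freq_khz);
            double frac = (double)(freq_khz - t[i - 1].freq_khz) / span;
            return t[i - 1].power_w + frac * (t[i].power_w - t[i - 1].power_w);
        }
    }
    return t[n - 1].power_w;
}

/* Energy of one slice in joules: P(f) * t. */
static double slice_energy_j(const struct perf_state *t, size_t n,
                             uint32_t freq_khz, uint64_t runtime_ns)
{
    return power_at(t, n, freq_khz) * ((double)runtime_ns / 1e9);
}
```

Summing `slice_energy_j` over a process's slices replaces the constant per-core power of the current model with a frequency-dependent one.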
-
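-## Summary
-
-This eBPF energy monitor shows how modern Linux kernel features can be used for efficient, fine-grained system monitoring. Capturing scheduling events directly in kernel space avoids the overhead of periodic polling and provides nanosecond-resolution timing.
-
-Beyond being a practical tool, the implementation is a good case study for learning eBPF programming, covering:
-* The complete eBPF development workflow
-* Efficient kernel-to-user-space communication
-* Performance optimization techniques
-* Real-world use cases
-
-As data centers face ever stricter energy-efficiency requirements, fine-grained energy monitoring tools like this will become increasingly important.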