<!DOCTYPE HTML>
<html lang="en" class="sidebar-visible no-js light">
<head>
<!-- Book generated using mdBook -->
<meta charset="UTF-8">
<title>A Linux kernel BPF program that summarizes scheduler run queue latency as a histogram, showing how long tasks wait to run on a CPU - bpf-developer-tutorial</title>

<!-- Custom HTML head -->

<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="theme-color" content="#ffffff" />

<link rel="icon" href="../favicon.svg">
<link rel="shortcut icon" href="../favicon.png">
<link rel="stylesheet" href="../css/variables.css">
<link rel="stylesheet" href="../css/general.css">
<link rel="stylesheet" href="../css/chrome.css">
<link rel="stylesheet" href="../css/print.css" media="print">

<!-- Fonts -->
<link rel="stylesheet" href="../FontAwesome/css/font-awesome.css">
<link rel="stylesheet" href="../fonts/fonts.css">

<!-- Highlight.js Stylesheets -->
<link rel="stylesheet" href="../highlight.css">
<link rel="stylesheet" href="../tomorrow-night.css">
<link rel="stylesheet" href="../ayu-highlight.css">

<!-- Custom theme stylesheets -->

</head>
<body>
<div id="body-container">
<!-- Provide site root to javascript -->
<script>
var path_to_root = "../";
var default_theme = window.matchMedia("(prefers-color-scheme: dark)").matches ? "navy" : "light";
</script>

<!-- Work around some values being stored in localStorage wrapped in quotes -->
<script>
try {
var theme = localStorage.getItem('mdbook-theme');
var sidebar = localStorage.getItem('mdbook-sidebar');

if (theme.startsWith('"') && theme.endsWith('"')) {
localStorage.setItem('mdbook-theme', theme.slice(1, theme.length - 1));
}

if (sidebar.startsWith('"') && sidebar.endsWith('"')) {
localStorage.setItem('mdbook-sidebar', sidebar.slice(1, sidebar.length - 1));
}
} catch (e) { }
</script>

<!-- Set the theme before any content is loaded, prevents flash -->
<script>
var theme;
try { theme = localStorage.getItem('mdbook-theme'); } catch(e) { }
if (theme === null || theme === undefined) { theme = default_theme; }
var html = document.querySelector('html');
html.classList.remove('no-js')
html.classList.remove('light')
html.classList.add(theme);
html.classList.add('js');
</script>

<!-- Hide / unhide sidebar before it is displayed -->
<script>
var html = document.querySelector('html');
var sidebar = null;
if (document.body.clientWidth >= 1080) {
try { sidebar = localStorage.getItem('mdbook-sidebar'); } catch(e) { }
sidebar = sidebar || 'visible';
} else {
sidebar = 'hidden';
}
html.classList.remove('sidebar-visible');
html.classList.add("sidebar-" + sidebar);
</script>

<nav id="sidebar" class="sidebar" aria-label="Table of contents">
<div class="sidebar-scrollbox">
<ol class="chapter"><li class="chapter-item expanded affix "><li class="part-title">eBPF Practical Tutorial: Based on libbpf and CO-RE</li>
<li class="chapter-item expanded "><a href="../0-introduce/index.html"><strong aria-hidden="true">1.</strong> Introduction to basic eBPF concepts and common development tools</a></li>
<li class="chapter-item expanded "><a href="../1-helloworld/index.html"><strong aria-hidden="true">2.</strong> eBPF Hello World: basic framework and development workflow</a></li>
<li class="chapter-item expanded "><a href="../2-kprobe-unlink/index.html"><strong aria-hidden="true">3.</strong> Capturing the unlink system call with kprobe</a></li>
<li class="chapter-item expanded "><a href="../3-fentry-unlink/index.html"><strong aria-hidden="true">4.</strong> Capturing the unlink system call with fentry</a></li>
<li class="chapter-item expanded "><a href="../4-opensnoop/index.html"><strong aria-hidden="true">5.</strong> Capturing the system calls processes use to open files, filtering by pid with global variables</a></li>
<li class="chapter-item expanded "><a href="../5-uprobe-bashreadline/index.html"><strong aria-hidden="true">6.</strong> Capturing bash readline calls with uprobe</a></li>
<li class="chapter-item expanded "><a href="../6-sigsnoop/index.html"><strong aria-hidden="true">7.</strong> Capturing the system calls processes use to send signals, saving state in a hash map</a></li>
<li class="chapter-item expanded "><a href="../7-execsnoop/index.html"><strong aria-hidden="true">8.</strong> Capturing process exec/exit events and printing them to user space through a perf event array</a></li>
<li class="chapter-item expanded "><a href="../8-exitsnoop/index.html"><strong aria-hidden="true">9.</strong> Monitoring process exit events with exitsnoop and printing them to user space through a ring buffer</a></li>
<li class="chapter-item expanded "><a href="../9-runqlat/index.html" class="active"><strong aria-hidden="true">10.</strong> A Linux kernel BPF program that summarizes scheduler run queue latency as a histogram, showing how long tasks wait to run on a CPU</a></li>
<li class="chapter-item expanded "><a href="../10-hardirqs/index.html"><strong aria-hidden="true">11.</strong> Capturing interrupt events with hardirqs or softirqs</a></li>
<li class="chapter-item expanded "><a href="../11-bootstrap/index.html"><strong aria-hidden="true">12.</strong> Writing a user-space program with bootstrap and tracing the exec() and exit() system calls</a></li>
<li class="chapter-item expanded "><a href="../13-tcpconnlat/index.html"><strong aria-hidden="true">13.</strong> Measuring TCP connection latency with a libbpf-bootstrap program</a></li>
<li class="chapter-item expanded "><a href="../14-tcpstates/index.html"><strong aria-hidden="true">14.</strong> Recording TCP connection states and TCP RTT with libbpf-bootstrap</a></li>
<li class="chapter-item expanded "><a href="../15-javagc/index.html"><strong aria-hidden="true">15.</strong> Capturing the duration of user-space Java GC events with USDT</a></li>
<li class="chapter-item expanded "><a href="../16-memleak/index.html"><strong aria-hidden="true">16.</strong> Writing the Memleak eBPF program to monitor memory leaks</a></li>
<li class="chapter-item expanded "><a href="../17-biopattern/index.html"><strong aria-hidden="true">17.</strong> Writing the Biopattern eBPF program to count random/sequential disk I/O</a></li>
<li class="chapter-item expanded "><a href="../18-further-reading/index.html"><strong aria-hidden="true">18.</strong> Further reading</a></li>
<li class="chapter-item expanded "><a href="../19-lsm-connect/index.html"><strong aria-hidden="true">19.</strong> Security detection and defense with LSM</a></li>
<li class="chapter-item expanded "><a href="../20-tc/index.html"><strong aria-hidden="true">20.</strong> tc traffic control with eBPF</a></li>
<li class="chapter-item expanded affix "><li class="part-title">Advanced eBPF Features and Topics</li>
<li class="chapter-item expanded "><a href="../22-android/index.html"><strong aria-hidden="true">21.</strong> Using eBPF programs on Android</a></li>
<li class="chapter-item expanded "><a href="../23-http/index.html"><strong aria-hidden="true">22.</strong> Tracing HTTP requests or other layer-7 protocols with eBPF</a></li>
<li class="chapter-item expanded "><a href="../29-sockops/index.html"><strong aria-hidden="true">23.</strong> Accelerating network request forwarding with sockops</a></li>
<li class="chapter-item expanded "><a href="../24-hide/index.html"><strong aria-hidden="true">24.</strong> Hiding process or file information with eBPF</a></li>
<li class="chapter-item expanded "><a href="../25-signal/index.html"><strong aria-hidden="true">25.</strong> Terminating processes by sending signals with bpf_send_signal</a></li>
<li class="chapter-item expanded "><a href="../26-sudo/index.html"><strong aria-hidden="true">26.</strong> Adding sudo users with eBPF</a></li>
<li class="chapter-item expanded "><a href="../27-replace/index.html"><strong aria-hidden="true">27.</strong> Replacing text read or written by any program with eBPF</a></li>
<li class="chapter-item expanded "><a href="../28-detach/index.html"><strong aria-hidden="true">28.</strong> BPF lifecycle: keeping eBPF programs running in detached mode after the user-space application exits</a></li>
<li class="chapter-item expanded affix "><li class="part-title">bcc tutorial</li>
<li class="chapter-item expanded "><a href="../bcc-documents/kernel-versions.html"><strong aria-hidden="true">29.</strong> BPF Features by Linux Kernel Version</a></li>
<li class="chapter-item expanded "><a href="../bcc-documents/kernel_config.html"><strong aria-hidden="true">30.</strong> Kernel Configuration for BPF Features</a></li>
<li class="chapter-item expanded "><a href="../bcc-documents/reference_guide.html"><strong aria-hidden="true">31.</strong> bcc Reference Guide</a></li>
<li class="chapter-item expanded "><a href="../bcc-documents/special_filtering.html"><strong aria-hidden="true">32.</strong> Special Filtering</a></li>
<li class="chapter-item expanded "><a href="../bcc-documents/tutorial.html"><strong aria-hidden="true">33.</strong> bcc Tutorial</a></li>
<li class="chapter-item expanded "><a href="../bcc-documents/tutorial_bcc_python_developer.html"><strong aria-hidden="true">34.</strong> bcc Python Developer Tutorial</a></li></ol>
</div>
<div id="sidebar-resize-handle" class="sidebar-resize-handle"></div>
</nav>

<!-- Track and set sidebar scroll position -->
<script>
var sidebarScrollbox = document.querySelector('#sidebar .sidebar-scrollbox');
sidebarScrollbox.addEventListener('click', function(e) {
if (e.target.tagName === 'A') {
sessionStorage.setItem('sidebar-scroll', sidebarScrollbox.scrollTop);
}
}, { passive: true });
var sidebarScrollTop = sessionStorage.getItem('sidebar-scroll');
sessionStorage.removeItem('sidebar-scroll');
if (sidebarScrollTop) {
// preserve sidebar scroll position when navigating via links within sidebar
sidebarScrollbox.scrollTop = sidebarScrollTop;
} else {
// scroll sidebar to current active section when navigating via "next/previous chapter" buttons
var activeSection = document.querySelector('#sidebar .active');
if (activeSection) {
activeSection.scrollIntoView({ block: 'center' });
}
}
</script>

<div id="page-wrapper" class="page-wrapper">

<div class="page">
<div id="menu-bar-hover-placeholder"></div>
<div id="menu-bar" class="menu-bar sticky">
<div class="left-buttons">
<button id="sidebar-toggle" class="icon-button" type="button" title="Toggle Table of Contents" aria-label="Toggle Table of Contents" aria-controls="sidebar">
<i class="fa fa-bars"></i>
</button>
<button id="theme-toggle" class="icon-button" type="button" title="Change theme" aria-label="Change theme" aria-haspopup="true" aria-expanded="false" aria-controls="theme-list">
<i class="fa fa-paint-brush"></i>
</button>
<ul id="theme-list" class="theme-popup" aria-label="Themes" role="menu">
<li role="none"><button role="menuitem" class="theme" id="light">Light</button></li>
<li role="none"><button role="menuitem" class="theme" id="rust">Rust</button></li>
<li role="none"><button role="menuitem" class="theme" id="coal">Coal</button></li>
<li role="none"><button role="menuitem" class="theme" id="navy">Navy</button></li>
<li role="none"><button role="menuitem" class="theme" id="ayu">Ayu</button></li>
</ul>
<button id="search-toggle" class="icon-button" type="button" title="Search. (Shortkey: s)" aria-label="Toggle Searchbar" aria-expanded="false" aria-keyshortcuts="S" aria-controls="searchbar">
<i class="fa fa-search"></i>
</button>
</div>

<h1 class="menu-title">bpf-developer-tutorial</h1>

<div class="right-buttons">
<a href="../print.html" title="Print this book" aria-label="Print this book">
<i id="print-button" class="fa fa-print"></i>
</a>

</div>
</div>

<div id="search-wrapper" class="hidden">
<form id="searchbar-outer" class="searchbar-outer">
<input type="search" id="searchbar" name="searchbar" placeholder="Search this book ..." aria-controls="searchresults-outer" aria-describedby="searchresults-header">
</form>
<div id="searchresults-outer" class="searchresults-outer hidden">
<div id="searchresults-header" class="searchresults-header"></div>
<ul id="searchresults">
</ul>
</div>
</div>

<!-- Apply ARIA attributes after the sidebar and the sidebar toggle button are added to the DOM -->
<script>
document.getElementById('sidebar-toggle').setAttribute('aria-expanded', sidebar === 'visible');
document.getElementById('sidebar').setAttribute('aria-hidden', sidebar !== 'visible');
Array.from(document.querySelectorAll('#sidebar a')).forEach(function(link) {
link.setAttribute('tabIndex', sidebar === 'visible' ? 0 : -1);
});
</script>

<div id="content" class="content">
<main>
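<h1 id="ebpf-入门开发实践教程九捕获进程调度延迟以直方图方式记录"><a class="header" href="#ebpf-入门开发实践教程九捕获进程调度延迟以直方图方式记录">eBPF Tutorial by Example 9: Capturing Process Scheduling Latency and Recording It as a Histogram</a></h1>
<p>eBPF (Extended Berkeley Packet Filter) is a powerful networking and performance-analysis technology in the Linux kernel. It lets developers dynamically load, update, and run user-defined code in the kernel while the kernel is running.</p>
<p>runqlat is an eBPF tool for analyzing scheduling performance on Linux. Specifically, it measures how long a task waits in a run queue before it is scheduled onto a CPU. This information is useful for identifying performance bottlenecks and for improving the overall scheduling efficiency of a Linux system.</p>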
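<h2 id="runqlat-原理"><a class="header" href="#runqlat-原理">How runqlat works</a></h2>
<p>This tutorial is part nine of the eBPF Tutorial by Example series, and its topic is capturing process scheduling latency. It introduces a program called runqlat, which records scheduling latency as a histogram.</p>
<p>Linux uses processes to carry out all system and user tasks. A process may be blocked, killed, running, or waiting to run. The number of processes in the last two states (running or waiting to run) determines the length of the CPU run queue.</p>
<p>A process can be in one of several states, for example:</p>
<ul>
<li>Runnable or running</li>
<li>Interruptible sleep</li>
<li>Uninterruptible sleep</li>
<li>Stopped</li>
<li>Zombie</li>
</ul>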
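<p>A process that is waiting for a resource or for a signal is in interruptible or uninterruptible sleep: it is put to sleep until the resource it needs becomes available. Then, depending on the type of sleep, the process either moves to the runnable state or stays asleep.</p>
<p>Even when a process has every resource it needs, it does not start running immediately. It moves to the runnable state and queues up with the other processes in the same state. The CPU may execute these processes within the next few milliseconds or seconds. The scheduler orders the processes for the CPU and decides which one runs next.</p>
<p>Depending on the system's hardware configuration, this queue of runnable processes (called the CPU run queue) can be short or long. A short run queue indicates that the CPU is not being fully utilized. A long run queue, on the other hand, may mean that the CPU is not powerful enough to execute all the processes, or that there are not enough CPU cores. At ideal CPU utilization, the run queue length is roughly equal to the number of cores in the system.</p>
<p>Process scheduling latency, also called "run queue latency", is the time from when a thread becomes runnable (for example, it receives an interrupt that prompts it to process more work) until it actually runs on a CPU. Under CPU saturation, threads naturally have to wait their turn. But this latency can also show up in other, less obvious scenarios, and in some cases it can be reduced by tuning, which improves the performance of the whole system.</p>
<p>Let us illustrate how to use runqlat with an example. This is a very heavily loaded system:</p>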
<pre><code class="language-shell"># runqlat
Tracing run queue latency... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 233      |***********                             |
         2 -> 3          : 742      |************************************    |
         4 -> 7          : 203      |**********                              |
         8 -> 15         : 173      |********                                |
        16 -> 31         : 24       |*                                       |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 30       |*                                       |
       128 -> 255        : 6        |                                        |
       256 -> 511        : 3        |                                        |
       512 -> 1023       : 5        |                                        |
      1024 -> 2047       : 27       |*                                       |
      2048 -> 4095       : 30       |*                                       |
      4096 -> 8191       : 20       |                                        |
      8192 -> 16383      : 29       |*                                       |
     16384 -> 32767      : 809      |****************************************|
     32768 -> 65535      : 64       |***                                     |
</code></pre>
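<p>This output shows a bimodal distribution, with one mode between 0 and 15 microseconds and another between 16 and 65 milliseconds. The modes appear as spikes in the distribution column, which is simply a visual representation of the "count" column. As an example of reading a row: during tracing, 809 events fell into the 16384 to 32767 microsecond range (16 to 32 milliseconds).</p>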
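<p>The rows follow powers of two: slot k holds values from 2^k up to 2^(k+1) - 1, and slot 0 also holds 0. The following minimal user-space sketch reproduces that mapping, assuming the same floor(log2) bucketing the histogram uses. The names <code>log2_slot</code>, <code>slot_low</code>, and <code>slot_high</code> are hypothetical helpers for illustration, not functions from runqlat:</p>
<pre><code class="language-c">#include <stdint.h>

/* Hypothetical helper: floor(log2(v)); 0 and 1 both map to slot 0,
 * matching the power-of-two rows printed by runqlat. */
static unsigned int log2_slot(uint64_t v)
{
	unsigned int slot = 0;

	while (v >>= 1) /* count how many times v can be halved */
		slot++;
	return slot;
}

/* Smallest value that lands in slot k: 0 for slot 0, otherwise 2^k. */
static uint64_t slot_low(unsigned int k)
{
	return k == 0 ? 0 : (1ULL << k);
}

/* Largest value that lands in slot k: 2^(k+1) - 1. */
static uint64_t slot_high(unsigned int k)
{
	return (1ULL << (k + 1)) - 1;
}
</code></pre>
<p>For example, a wait of 20000 microseconds gives <code>log2_slot(20000) == 14</code>, and slot 14 spans 16384 to 32767, which is the row holding 809 events in the output above.</p>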
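<p>In later tutorials, we will look more closely at how to use eBPF to trace and analyze metrics like this in depth, so that we can better understand and optimize system performance. We will also learn more about the Linux kernel scheduler, interrupt handling, and CPU saturation.</p>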
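<p>runqlat is implemented as an eBPF program that attaches to the kernel's scheduler tracepoints (as raw tracepoints) to measure how long processes spend in the run queue. When a task is enqueued, the trace_enqueue function records a timestamp in a map. When the task is switched onto a CPU, the handle_switch function looks up that timestamp and computes the difference between the current time and the enqueue time. This difference (delta) is added to a histogram that records the distribution of run queue latency; depending on the options there is one histogram for the whole system, or one per process, thread, or PID namespace. The histogram can then be used to analyze the scheduling performance of the Linux kernel.</p>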
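<p>The enqueue/switch bookkeeping can be modeled in ordinary user-space C to make the sequence of events concrete. This is a simplified sketch under stated assumptions: a plain array indexed by pid stands in for the <code>start</code> hash map, the caller passes timestamps instead of calling <code>bpf_ktime_get_ns()</code>, and <code>sim_enqueue</code> and <code>sim_switch</code> are hypothetical names. Like handle_switch, it re-enqueues a <code>prev</code> task that is still runnable and deletes the timestamp once <code>next</code> has been measured:</p>
<pre><code class="language-c">#include <stdbool.h>
#include <stdint.h>

#define SIM_MAX_PID 1024 /* hypothetical bound for this model */

/* Stand-in for the "start" map: pid -> time the task became runnable (ns). */
static uint64_t start_ts[SIM_MAX_PID];
static bool has_ts[SIM_MAX_PID];

/* Like trace_enqueue(): remember when a task became runnable. */
static void sim_enqueue(uint32_t pid, uint64_t now_ns)
{
	if (pid == 0 || pid >= SIM_MAX_PID) /* pid 0 is the idle task */
		return;
	start_ts[pid] = now_ns;
	has_ts[pid] = true;
}

/* Like handle_switch(): the CPU switches from prev to next at now_ns.
 * Returns true and stores next's run queue latency in *delta_ns when a
 * timestamp for next exists. */
static bool sim_switch(uint32_t prev, bool prev_still_runnable,
		       uint32_t next, uint64_t now_ns, uint64_t *delta_ns)
{
	bool found = false;

	/* A task switched out while still runnable re-enters the run queue. */
	if (prev_still_runnable)
		sim_enqueue(prev, now_ns);

	if (next >= SIM_MAX_PID || !has_ts[next])
		return false;
	if (now_ns >= start_ts[next]) { /* runqlat skips negative deltas */
		*delta_ns = now_ns - start_ts[next];
		found = true;
	}
	has_ts[next] = false; /* like bpf_map_delete_elem(&start, &pid) */
	return found;
}
</code></pre>
<p>Returning a flag plus an out-parameter keeps "no timestamp recorded" separate from a genuine latency of zero.</p>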
<h2 id="runqlat-代码实现"><a class="header" href="#runqlat-代码实现">runqlat implementation</a></h2>
<h3 id="runqlatbpfc"><a class="header" href="#runqlatbpfc">runqlat.bpf.c</a></h3>
<p>First, we write the kernel-side source file runqlat.bpf.c:</p>
<pre><code class="language-c">// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2020 Wenbo Zhang
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_tracing.h>
#include "runqlat.h"
#include "bits.bpf.h"       /* log2l() */
#include "maps.bpf.h"       /* bpf_map_lookup_or_try_init() */
#include "core_fixes.bpf.h" /* get_task_state() */

#define MAX_ENTRIES 10240
#define TASK_RUNNING 0

/* Read-only options; user space sets them before the program is loaded. */
const volatile bool filter_cg = false;
const volatile bool targ_per_process = false;
const volatile bool targ_per_thread = false;
const volatile bool targ_per_pidns = false;
const volatile bool targ_ms = false;
const volatile pid_t targ_tgid = 0;

/* The cgroup to filter on when filter_cg is set. */
struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
	__type(key, u32);
	__type(value, u32);
	__uint(max_entries, 1);
} cgroup_map SEC(".maps");

/* pid -> time (ns) at which the task was enqueued. */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, MAX_ENTRIES);
	__type(key, u32);
	__type(value, u64);
} start SEC(".maps");

/* All-zero template used to initialize new histograms. */
static struct hist zero;

/// @sample {"interval": 1000, "type" : "log2_hist"}
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, MAX_ENTRIES);
	__type(key, u32);
	__type(value, struct hist);
} hists SEC(".maps");

/* Record the time at which task `pid` became runnable. */
static int trace_enqueue(u32 tgid, u32 pid)
{
	u64 ts;

	if (!pid) /* skip the idle task */
		return 0;
	if (targ_tgid && targ_tgid != tgid)
		return 0;

	ts = bpf_ktime_get_ns();
	bpf_map_update_elem(&start, &pid, &ts, BPF_ANY);
	return 0;
}

static unsigned int pid_namespace(struct task_struct *task)
{
	struct pid *pid;
	unsigned int level;
	struct upid upid;
	unsigned int inum;

	/* get the pid namespace by following task_active_pid_ns(),
	 * pid->numbers[pid->level].ns
	 */
	pid = BPF_CORE_READ(task, thread_pid);
	level = BPF_CORE_READ(pid, level);
	bpf_core_read(&upid, sizeof(upid), &pid->numbers[level]);
	inum = BPF_CORE_READ(upid.ns, ns.inum);

	return inum;
}

static int handle_switch(bool preempt, struct task_struct *prev, struct task_struct *next)
{
	struct hist *histp;
	u64 *tsp, slot;
	u32 pid, hkey;
	s64 delta;

	if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
		return 0;

	/* prev is still runnable (e.g. preempted): it re-enters the run queue now */
	if (get_task_state(prev) == TASK_RUNNING)
		trace_enqueue(BPF_CORE_READ(prev, tgid), BPF_CORE_READ(prev, pid));

	/* next is about to run: how long did it wait? */
	pid = BPF_CORE_READ(next, pid);

	tsp = bpf_map_lookup_elem(&start, &pid);
	if (!tsp)
		return 0;
	delta = bpf_ktime_get_ns() - *tsp;
	if (delta < 0)
		goto cleanup;

	/* choose the histogram this sample belongs to */
	if (targ_per_process)
		hkey = BPF_CORE_READ(next, tgid);
	else if (targ_per_thread)
		hkey = pid;
	else if (targ_per_pidns)
		hkey = pid_namespace(next);
	else
		hkey = -1;
	histp = bpf_map_lookup_or_try_init(&hists, &hkey, &zero);
	if (!histp)
		goto cleanup;
	if (!histp->comm[0])
		bpf_probe_read_kernel_str(&histp->comm, sizeof(histp->comm),
					  next->comm);
	/* ns -> ms or us, then bucket by log2 */
	if (targ_ms)
		delta /= 1000000U;
	else
		delta /= 1000U;
	slot = log2l(delta);
	if (slot >= MAX_SLOTS)
		slot = MAX_SLOTS - 1;
	__sync_fetch_and_add(&histp->slots[slot], 1);

cleanup:
	bpf_map_delete_elem(&start, &pid);
	return 0;
}

/* A sleeping task is woken up. */
SEC("raw_tp/sched_wakeup")
int BPF_PROG(handle_sched_wakeup, struct task_struct *p)
{
	if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
		return 0;

	return trace_enqueue(BPF_CORE_READ(p, tgid), BPF_CORE_READ(p, pid));
}

/* A newly created task is woken up for the first time. */
SEC("raw_tp/sched_wakeup_new")
int BPF_PROG(handle_sched_wakeup_new, struct task_struct *p)
{
	if (filter_cg && !bpf_current_task_under_cgroup(&cgroup_map, 0))
		return 0;

	return trace_enqueue(BPF_CORE_READ(p, tgid), BPF_CORE_READ(p, pid));
}

/* Context switch from prev to next. */
SEC("raw_tp/sched_switch")
int BPF_PROG(handle_sched_switch, bool preempt, struct task_struct *prev, struct task_struct *next)
{
	return handle_switch(preempt, prev, next);
}

char LICENSE[] SEC("license") = "GPL";
</code></pre>
<p>The program defines some constants and global variables that are used to filter the tracing targets:</p>
<pre><code class="language-c">#define MAX_ENTRIES 10240
#define TASK_RUNNING 0

const volatile bool filter_cg = false;
const volatile bool targ_per_process = false;
const volatile bool targ_per_thread = false;
const volatile bool targ_per_pidns = false;
const volatile bool targ_ms = false;
const volatile pid_t targ_tgid = 0;
</code></pre>
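<p>These cover the maximum number of map entries, the task state value, the cgroup filter option, and the target options. Because the globals are declared <code>const volatile</code>, they are placed in the read-only <code>.rodata</code> section: a user-space loader sets them before the program is loaded to customize its behavior, and the verifier then treats them as constants. With eunomia-bpf's ecli, each of them is exposed as a command-line flag such as <code>--targ_ms</code>.</p>
<p>Next, the program defines several eBPF maps:</p>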
<pre><code class="language-c">struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
	__type(key, u32);
	__type(value, u32);
	__uint(max_entries, 1);
} cgroup_map SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, MAX_ENTRIES);
	__type(key, u32);
	__type(value, u64);
} start SEC(".maps");

static struct hist zero;

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, MAX_ENTRIES);
	__type(key, u32);
	__type(value, struct hist);
} hists SEC(".maps");
</code></pre>
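<p>These maps are:</p>
<ul>
<li>cgroup_map: holds the cgroup used for cgroup filtering;</li>
<li>start: stores the timestamp at which each task was enqueued, keyed by pid;</li>
<li>hists: stores the histogram data that records scheduling latency, keyed by process, thread, PID namespace, or a single global key, depending on the options.</li>
</ul>
<p>Next come some helper functions.</p>
<p>The trace_enqueue function records a task's timestamp when it is enqueued:</p>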
<pre><code class="language-c">static int trace_enqueue(u32 tgid, u32 pid)
{
	u64 ts;

	if (!pid)
		return 0;
	if (targ_tgid && targ_tgid != tgid)
		return 0;

	ts = bpf_ktime_get_ns();
	bpf_map_update_elem(&start, &pid, &ts, BPF_ANY);
	return 0;
}
</code></pre>
<p>The pid_namespace function returns the inode number that identifies the PID namespace a task belongs to:</p>
<pre><code class="language-c">static unsigned int pid_namespace(struct task_struct *task)
{
	struct pid *pid;
	unsigned int level;
	struct upid upid;
	unsigned int inum;

	/* get the pid namespace by following task_active_pid_ns(),
	 * pid->numbers[pid->level].ns
	 */
	pid = BPF_CORE_READ(task, thread_pid);
	level = BPF_CORE_READ(pid, level);
	bpf_core_read(&upid, sizeof(upid), &pid->numbers[level]);
	inum = BPF_CORE_READ(upid.ns, ns.inum);

	return inum;
}
</code></pre>
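<p>To relate <code>--targ_per_pidns</code> histogram keys to real namespaces, the same number can be read from user space: the inode number of the file /proc/PID/ns/pid is the PID namespace's <code>ns.inum</code>, which is also what <code>readlink</code> shows inside the brackets of <code>pid:[...]</code>. A minimal sketch, where <code>pidns_inum</code> is a hypothetical helper name:</p>
<pre><code class="language-c">#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns the inode number of /proc/<pid>/ns/pid,
 * which identifies the task's PID namespace (the value pid_namespace()
 * reads as ns.inum), or 0 if it cannot be read.
 * `pid` is a decimal pid string or "self". */
static unsigned long pidns_inum(const char *pid)
{
	char path[64];
	struct stat st;

	snprintf(path, sizeof(path), "/proc/%s/ns/pid", pid);
	if (stat(path, &st) != 0) /* stat() follows the ns link */
		return 0;
	return (unsigned long)st.st_ino;
}
</code></pre>
<p>For example, <code>pidns_inum("self")</code> returns the namespace of the calling process; with <code>--targ_per_pidns</code>, a histogram whose key equals that value belongs to tasks in the same PID namespace.</p>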
<p>handle_switch is the core of the program. It handles scheduler switch events, computes the scheduling latency, and updates the histogram data:</p>
<pre><code class="language-c">static int handle_switch(bool preempt, struct task_struct *prev, struct task_struct *next)
{
	/* ... body elided here; see the full runqlat.bpf.c listing above ... */
}
</code></pre>
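<p>First, the function checks filter_cg to decide whether to filter by cgroup. Then, if the previous task (prev) is still in the TASK_RUNNING state, it was switched out while still runnable (for example, it was preempted) and goes straight back into the run queue, so the function calls trace_enqueue to record its enqueue time. Next, the function looks up the enqueue timestamp of the next task; if there is none, it returns immediately. Otherwise it computes the scheduling latency (delta) and chooses the histogram key (hkey) according to the options (targ_per_process, targ_per_thread, targ_per_pidns), falling back to the single key -1 when none is set. It then looks up or initializes the histogram for that key, converts delta to milliseconds or microseconds depending on targ_ms, increments the corresponding log2 slot (clamped to MAX_SLOTS - 1), and finally deletes the task's enqueue timestamp.</p>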
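<p>The key selection and bucketing steps can be mirrored in plain C to check which histogram and which row a sample lands in. This is a sketch under assumptions: <code>struct sim_opts</code> stands in for the <code>const volatile</code> globals, and <code>pick_hkey</code> and <code>delta_to_slot</code> are hypothetical names; the branch order, unit conversion, and clamping follow handle_switch:</p>
<pre><code class="language-c">#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 26 /* same value as in runqlat.h */

/* Hypothetical stand-in for the targ_* globals. */
struct sim_opts {
	bool per_process, per_thread, per_pidns, ms;
};

/* Same precedence as handle_switch(): process, then thread, then PID
 * namespace; otherwise all samples share the key (u32)-1. */
static uint32_t pick_hkey(const struct sim_opts *o, uint32_t tgid,
			  uint32_t pid, uint32_t pidns_inum)
{
	if (o->per_process)
		return tgid;
	if (o->per_thread)
		return pid;
	if (o->per_pidns)
		return pidns_inum;
	return (uint32_t)-1; /* 4294967295, the "key" printed by ecli */
}

/* Convert a latency in ns to ms or us, take floor(log2), and clamp. */
static unsigned int delta_to_slot(uint64_t delta_ns, bool ms)
{
	uint64_t v = ms ? delta_ns / 1000000U : delta_ns / 1000U;
	unsigned int slot = 0;

	while (v >>= 1)
		slot++;
	if (slot >= MAX_SLOTS)
		slot = MAX_SLOTS - 1;
	return slot;
}
</code></pre>
<p>For instance, a 20 ms wait falls into slot 14 (16384 -> 32767) in the default microsecond mode, but into slot 4 (16 -> 31) with <code>--targ_ms</code>.</p>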
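<p>Next come the entry points of the eBPF program. The program uses three entry points to capture different scheduling events:</p>
<ul>
<li>handle_sched_wakeup: handles the sched_wakeup event, triggered when a task is woken up from sleep.</li>
<li>handle_sched_wakeup_new: handles the sched_wakeup_new event, triggered when a newly created task is woken up for the first time.</li>
<li>handle_sched_switch: handles the sched_switch event, triggered when the scheduler switches a CPU from one task to another.</li>
</ul>
<p>The two wakeup entry points apply the cgroup filter and call trace_enqueue to record when the task became runnable. Only handle_sched_switch calls handle_switch, which computes the scheduling latency of the task being switched in and updates the histogram data.</p>
<p>Finally, the program contains a license declaration:</p>
<pre><code class="language-c">char LICENSE[] SEC("license") = "GPL";
</code></pre>
<p>This declaration specifies the eBPF program's license, here "GPL". It is required because many kernel helpers are GPL-only: the verifier refuses to load a program that calls them unless the program declares a GPL-compatible license.</p>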
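<h3 id="runqlath"><a class="header" href="#runqlath">runqlat.h</a></h3>
<p>Next, we define a header file <code>runqlat.h</code>. It describes the histogram data (<code>struct hist</code>) that the kernel side fills in and that user space reads and prints:</p>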
<pre><code class="language-c">/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
#ifndef __RUNQLAT_H
#define __RUNQLAT_H

#define TASK_COMM_LEN 16
#define MAX_SLOTS 26

struct hist {
	__u32 slots[MAX_SLOTS];
	char comm[TASK_COMM_LEN];
};

#endif /* __RUNQLAT_H */
</code></pre>
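<p>On the user-space side, each <code>struct hist</code> value is turned into rows like the ones in this tutorial's outputs. The sketch below is an illustration under assumptions, not the eunomia-bpf or libbpf-tools printer: it redeclares <code>struct hist</code> with <code>uint32_t</code> in place of <code>__u32</code>, and scales each bar to a 40-character width relative to the largest bucket. <code>stars_for</code> and <code>print_hist</code> are hypothetical names:</p>
<pre><code class="language-c">#include <stdint.h>
#include <stdio.h>

#define TASK_COMM_LEN 16
#define MAX_SLOTS 26
#define BAR_WIDTH 40

/* Same layout as runqlat.h, with uint32_t in place of __u32. */
struct hist {
	uint32_t slots[MAX_SLOTS];
	char comm[TASK_COMM_LEN];
};

/* Number of '*' for a bucket, scaled so the largest bucket fills the bar. */
static int stars_for(uint32_t count, uint32_t max)
{
	if (max == 0)
		return 0;
	return (int)((uint64_t)count * BAR_WIDTH / max);
}

/* Print "low -> high : count |bar|" rows up to the last non-empty slot. */
static void print_hist(const struct hist *h, const char *unit)
{
	static const char bar[] = "****************************************";
	uint32_t max = 0;
	int last = -1;

	for (int i = 0; i < MAX_SLOTS; i++) {
		if (h->slots[i] > max)
			max = h->slots[i];
		if (h->slots[i])
			last = i;
	}
	printf("comm = %.*s\n", TASK_COMM_LEN, h->comm);
	printf("%*s%-19s : count    distribution\n", 5, "", unit);
	for (int i = 0; i <= last; i++) {
		unsigned long long low = i == 0 ? 0 : 1ULL << i;
		unsigned long long high = (1ULL << (i + 1)) - 1;
		int n = stars_for(h->slots[i], max);

		printf("%10llu -> %-10llu : %-8u|%.*s%*s|\n", low, high,
		       (unsigned)h->slots[i], n, bar, BAR_WIDTH - n, "");
	}
}
</code></pre>
<p>With slot 14 set to 809 and slot 15 set to 64, the 16384 -> 32767 row gets a full 40-star bar and the 32768 -> 65535 row gets 3 stars, matching the first output in this tutorial.</p>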
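<h2 id="编译运行"><a class="header" href="#编译运行">Compile and run</a></h2>
<p>eunomia-bpf is an open-source eBPF dynamic loading runtime and development toolchain combined with Wasm. Its goal is to simplify the development, building, distribution, and running of eBPF programs. See <a href="https://github.com/eunomia-bpf/eunomia-bpf">https://github.com/eunomia-bpf/eunomia-bpf</a> to download and install the ecc compiler toolchain and the ecli runtime. We use eunomia-bpf to compile and run this example.</p>
<p>Compile:</p>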
<pre><code class="language-shell">docker run -it -v `pwd`/:/src/ ghcr.io/eunomia-bpf/ecc-`uname -m`:latest
</code></pre>
<p>Or:</p>
<pre><code class="language-console">$ ecc runqlat.bpf.c runqlat.h
Compiling bpf object...
Generating export types...
Packing ebpf object and config into package.json...
</code></pre>
<p>Run:</p>
<pre><code class="language-console">$ sudo ecli run examples/bpftools/runqlat/package.json -h
Usage: runqlat_bpf [--help] [--version] [--verbose] [--filter_cg] [--targ_per_process] [--targ_per_thread] [--targ_per_pidns] [--targ_ms] [--targ_tgid VAR]

A simple eBPF program

Optional arguments:
  -h, --help            shows help message and exits
  -v, --version         prints version information and exits
  --verbose             prints libbpf debug information
  --filter_cg           set value of bool variable filter_cg
  --targ_per_process    set value of bool variable targ_per_process
  --targ_per_thread     set value of bool variable targ_per_thread
  --targ_per_pidns      set value of bool variable targ_per_pidns
  --targ_ms             set value of bool variable targ_ms
  --targ_tgid           set value of pid_t variable targ_tgid

Built with eunomia-bpf framework.
See https://github.com/eunomia-bpf/eunomia-bpf for more information.

$ sudo ecli run examples/bpftools/runqlat/package.json
key = 4294967295
comm = rcu_preempt

     (unit)              : count    distribution
         0 -> 1          : 9       |****                                    |
         2 -> 3          : 6       |**                                      |
         4 -> 7          : 12      |*****                                   |
         8 -> 15         : 28      |*************                           |
        16 -> 31         : 40      |*******************                     |
        32 -> 63         : 83      |****************************************|
        64 -> 127        : 57      |***************************             |
       128 -> 255        : 19      |*********                               |
       256 -> 511        : 11      |*****                                   |
       512 -> 1023       : 2       |                                        |
      1024 -> 2047       : 2       |                                        |
      2048 -> 4095       : 0       |                                        |
      4096 -> 8191       : 0       |                                        |
      8192 -> 16383      : 0       |                                        |
     16384 -> 32767      : 1       |                                        |

$ sudo ecli run examples/bpftools/runqlat/package.json --targ_per_process
key = 3189
comm = cpptools

     (unit)              : count    distribution
         0 -> 1          : 0       |                                        |
         2 -> 3          : 0       |                                        |
         4 -> 7          : 0       |                                        |
         8 -> 15         : 1       |***                                     |
        16 -> 31         : 2       |*******                                 |
        32 -> 63         : 11      |****************************************|
        64 -> 127        : 8       |*****************************           |
       128 -> 255        : 3       |**********                              |
</code></pre>
<p>The complete source code is available at <a href="https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/9-runqlat">https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/9-runqlat</a></p>
<p>References:</p>
<ul>
<li><a href="https://www.brendangregg.com/blog/2016-10-08/linux-bcc-runqlat.html">https://www.brendangregg.com/blog/2016-10-08/linux-bcc-runqlat.html</a></li>
<li><a href="https://github.com/iovisor/bcc/blob/master/libbpf-tools/runqlat.c">https://github.com/iovisor/bcc/blob/master/libbpf-tools/runqlat.c</a></li>
</ul>
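<h2 id="总结"><a class="header" href="#总结">Summary</a></h2>
<p>runqlat is a Linux kernel BPF program that summarizes scheduler run queue latency as a histogram, showing how long tasks wait before they run on a CPU. The program can be compiled with the ecc tool and run with the ecli command.</p>
<p>runqlat is a tool for monitoring process scheduling latency in the Linux kernel. It shows how long processes wait in the kernel before they get to execute, and you can use this information to tune process scheduling and improve system performance. The original source code can be found in libbpf-tools: <a href="https://github.com/iovisor/bcc/blob/master/libbpf-tools/runqlat.bpf.c">https://github.com/iovisor/bcc/blob/master/libbpf-tools/runqlat.bpf.c</a></p>
<p>For more examples and a detailed development guide, see the official eunomia-bpf documentation: <a href="https://github.com/eunomia-bpf/eunomia-bpf">https://github.com/eunomia-bpf/eunomia-bpf</a></p>
<p>If you want to learn more about eBPF and put it into practice, visit our tutorial repository <a href="https://github.com/eunomia-bpf/bpf-developer-tutorial">https://github.com/eunomia-bpf/bpf-developer-tutorial</a> for more examples and the complete tutorial.</p>
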
</main>

<nav class="nav-wrapper" aria-label="Page navigation">
<!-- Mobile navigation buttons -->
<a rel="prev" href="../8-exitsnoop/index.html" class="mobile-nav-chapters previous" title="Previous chapter" aria-label="Previous chapter" aria-keyshortcuts="Left">
<i class="fa fa-angle-left"></i>
</a>

<a rel="next" href="../10-hardirqs/index.html" class="mobile-nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right">
<i class="fa fa-angle-right"></i>
</a>

<div style="clear: both"></div>
</nav>
</div>
</div>

<nav class="nav-wide-wrapper" aria-label="Page navigation">
<a rel="prev" href="../8-exitsnoop/index.html" class="nav-chapters previous" title="Previous chapter" aria-label="Previous chapter" aria-keyshortcuts="Left">
<i class="fa fa-angle-left"></i>
</a>

<a rel="next" href="../10-hardirqs/index.html" class="nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right">
<i class="fa fa-angle-right"></i>
</a>
</nav>

</div>

<script>
window.playground_copyable = true;
</script>

<script src="../elasticlunr.min.js"></script>
<script src="../mark.min.js"></script>
<script src="../searcher.js"></script>

<script src="../clipboard.min.js"></script>
<script src="../highlight.js"></script>
<script src="../book.js"></script>

<!-- Custom JS scripts -->

</div>
</body>
</html>