mirror of
https://github.com/eunomia-bpf/bpf-developer-tutorial.git
synced 2026-02-04 02:34:16 +08:00
Deploying to gh-pages from @ eunomia-bpf/bpf-developer-tutorial@483f2fc223 🚀
@@ -166,74 +166,74 @@
|
||||
|
||||
<div id="content" class="content">
|
||||
<main>
|
||||
<h1 id="bcc-python-developer-tutorial"><a class="header" href="#bcc-python-developer-tutorial">bcc Python Developer Tutorial</a></h1>
|
||||
<p>This tutorial is about developing <a href="https://github.com/iovisor/bcc">bcc</a> tools and programs using the Python interface. There are two parts: observability then networking. Snippets are taken from various programs in bcc: see their files for licences.</p>
|
||||
<p>Also see the bcc developer's <a href="reference_guide.html">reference_guide.md</a>, and a tutorial for end-users of tools: <a href="tutorial.html">tutorial.md</a>. There is also a lua interface for bcc.</p>
|
||||
<h2 id="observability"><a class="header" href="#observability">Observability</a></h2>
|
||||
<p>This observability tutorial contains 17 lessons, and 46 enumerated things to learn.</p>
|
||||
<h3 id="lesson-1-hello-world"><a class="header" href="#lesson-1-hello-world">Lesson 1. Hello World</a></h3>
|
||||
<p>Start by running <a href="https://github.com/iovisor/bcc/tree/master/examples/hello_world.py">examples/hello_world.py</a>, while running some commands (eg, "ls") in another session. It should print "Hello, World!" for new processes. If not, start by fixing bcc: see <a href="https://github.com/iovisor/bcc/tree/master/INSTALL.md">INSTALL.md</a>.</p>
|
||||
<h1 id="bcc-python-开发者教程"><a class="header" href="#bcc-python-开发者教程">bcc Python Developer Tutorial</a></h1>
<p>This tutorial covers developing <a href="https://github.com/iovisor/bcc">bcc</a> tools and programs using the Python interface. It has two parts: observability, then networking. Snippets are taken from various programs in bcc; see their files for license details.</p>
<p>See also the bcc developer's <a href="reference_guide.html">reference guide</a>, and the tutorial for end users of the tools: <a href="tutorial.html">tutorial</a>. There is also a Lua interface for bcc.</p>
<h2 id="可观测性"><a class="header" href="#可观测性">Observability</a></h2>
<p>This observability tutorial contains 17 lessons and 46 enumerated things to learn.</p>
<h3 id="第1课-你好世界"><a class="header" href="#第1课-你好世界">Lesson 1. Hello World</a></h3>
<p>Start by running <a href="https://github.com/iovisor/bcc/tree/master/examples/hello_world.py">examples/hello_world.py</a> while running some commands (e.g., "ls") in another session. It should print "Hello, World!" for new processes. If it does not, fix bcc first: see <a href="https://github.com/iovisor/bcc/tree/master/INSTALL.md">INSTALL.md</a>.</p>
|
||||
<pre><code class="language-sh"># ./examples/hello_world.py
|
||||
bash-13364 [002] d... 24573433.052937: : Hello, World!
|
||||
bash-13364 [003] d... 24573436.642808: : Hello, World!
|
||||
[...]
|
||||
</code></pre>
|
||||
<p>Here's the code for hello_world.py:</p>
|
||||
<p>Here's the code for hello_world.py:</p>
|
||||
<pre><code class="language-Python">from bcc import BPF
|
||||
BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()
|
||||
</code></pre>
|
||||
<p>There are six things to learn from this:</p>
|
||||
<p>There are six things to learn from this:</p>
|
||||
<ol>
|
||||
<li>
|
||||
<p><code>text='...'</code>: This defines a BPF program inline. The program is written in C.</p>
|
||||
<p><code>text='...'</code>: This defines a BPF program inline. The program is written in C.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><code>kprobe__sys_clone()</code>: This is a short-cut for kernel dynamic tracing via kprobes. If the C function begins with <code>kprobe__</code>, the rest is treated as a kernel function name to instrument, in this case, <code>sys_clone()</code>.</p>
|
||||
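<p><code>kprobe__sys_clone()</code>: This is a shortcut for kernel dynamic tracing via kprobes. If the C function name begins with <code>kprobe__</code>, the rest is treated as the name of the kernel function to instrument, in this case <code>sys_clone()</code>.</p>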
|
||||
</li>
|
||||
<li>
|
||||
<p><code>void *ctx</code>: ctx has arguments, but since we aren't using them here, we'll just cast it to <code>void *</code>.</p>
|
||||
<p><code>void *ctx</code>: ctx carries the arguments, but since we aren't using them here, we simply declare it as <code>void *</code>.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><code>bpf_trace_printk()</code>: A simple kernel facility for printf() to the common trace_pipe (/sys/kernel/debug/tracing/trace_pipe). This is ok for some quick examples, but has limitations: 3 args max, 1 %s only, and trace_pipe is globally shared, so concurrent programs will have clashing output. A better interface is via BPF_PERF_OUTPUT(), covered later.</p>
|
||||
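<p><code>bpf_trace_printk()</code>: A simple kernel facility for printf() to the common trace_pipe (/sys/kernel/debug/tracing/trace_pipe). This is fine for quick examples, but it has limitations: 3 args max, only one %s, and trace_pipe is globally shared, so concurrent programs will have clashing output. A better interface is BPF_PERF_OUTPUT(), covered later.</p>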
|
||||
</li>
|
||||
<li>
|
||||
<p><code>return 0;</code>: Necessary formality (if you want to know why, see <a href="https://github.com/iovisor/bcc/issues/139">#139</a>).</p>
|
||||
<p><code>return 0;</code>: A necessary formality (if you want to know why, see <a href="https://github.com/iovisor/bcc/issues/139">#139</a>).</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><code>.trace_print()</code>: A bcc routine that reads trace_pipe and prints the output.</p>
|
||||
<p><code>.trace_print()</code>: A bcc routine that reads trace_pipe and prints the output.</p>
|
||||
</li>
|
||||
</ol>
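<p>To make the trace_pipe output in items 4 and 6 more concrete, here is a rough pure-Python sketch of how one line in the format of the sample output above could be split into fields. The regular expression and the <code>parse_trace_line</code> helper are illustrative assumptions, not bcc's own parser.</p>

```python
import re

# Hypothetical pattern for lines such as:
#   bash-13364 [002] d... 24573433.052937: : Hello, World!
TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<flags>\S+)\s+(?P<ts>\d+\.\d+):\s*[^:]*:\s?(?P<msg>.*)$"
)

def parse_trace_line(line):
    """Return (task, pid, cpu, flags, ts, msg), or None if the line does not match."""
    m = TRACE_LINE.match(line)
    if m is None:
        return None
    return (m.group("task"), int(m.group("pid")), int(m.group("cpu")),
            m.group("flags"), float(m.group("ts")), m.group("msg"))

fields = parse_trace_line("bash-13364 [002] d... 24573433.052937: : Hello, World!")
print(fields)
```

<p>Because trace_pipe is shared system-wide, a parser like this would also see lines written by other tracing programs, which is one reason the tutorial later moves to BPF_PERF_OUTPUT().</p>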
|
||||
<h3 id="lesson-2-sys_sync"><a class="header" href="#lesson-2-sys_sync">Lesson 2. sys_sync()</a></h3>
|
||||
<p>Write a program that traces the sys_sync() kernel function. Print "sys_sync() called" when it runs. Test by running <code>sync</code> in another session while tracing. The hello_world.py program has everything you need for this.</p>
|
||||
<p>Improve it by printing "Tracing sys_sync()... Ctrl-C to end." when the program first starts. Hint: it's just Python.</p>
|
||||
<h3 id="lesson-3-hello_fieldspy"><a class="header" href="#lesson-3-hello_fieldspy">Lesson 3. hello_fields.py</a></h3>
|
||||
<p>This program is in <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/hello_fields.py">examples/tracing/hello_fields.py</a>. Sample output (run commands in another session):</p>
|
||||
<h3 id="第二课-sys_sync"><a class="header" href="#第二课-sys_sync">Lesson 2. sys_sync()</a></h3>
<p>Write a program that traces the sys_sync() kernel function and prints "sys_sync() called" when it runs. Test it by running <code>sync</code> in another session while tracing. The hello_world.py program has everything you need for this.</p>
<p>Improve it by printing "Tracing sys_sync()... Ctrl-C to end." when the program first starts. Hint: it's just Python.</p>
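<p>One possible answer to Lesson 2 is sketched below; try the exercise yourself first. It assumes bcc is installed and that the script is run as root. The names <code>PROG</code>, <code>trace_sync</code> and <code>main</code> are made up for this sketch, and it attaches via <code>get_syscall_fnname()</code> (used in the next lesson) rather than the <code>kprobe__</code> shortcut, since the kernel symbol for the syscall may differ between kernel versions.</p>

```python
# A sketch of a Lesson 2 solution. Needs bcc installed and root privileges.
PROG = r"""
int trace_sync(void *ctx) {
    bpf_trace_printk("sys_sync() called\n");
    return 0;
}
"""

def main():
    # Imported lazily so this module can be loaded on machines without bcc.
    from bcc import BPF
    b = BPF(text=PROG)
    # get_syscall_fnname() resolves the kernel's symbol for the sync syscall.
    b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="trace_sync")
    print("Tracing sys_sync()... Ctrl-C to end.")
    b.trace_print()
```

<p>Call <code>main()</code> as root, then run <code>sync</code> in another session to see the message.</p>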
|
||||
<h3 id="第三课-hello_fieldspy"><a class="header" href="#第三课-hello_fieldspy">Lesson 3. hello_fields.py</a></h3>
<p>This program is in <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/hello_fields.py">examples/tracing/hello_fields.py</a>. Sample output (run commands in another session):</p>
|
||||
<pre><code class="language-sh"># examples/tracing/hello_fields.py
|
||||
TIME(s) COMM PID MESSAGE
|
||||
24585001.174885999 sshd 1432 Hello, World!
|
||||
24585001.195710000 sshd 15780 Hello, World!
|
||||
24585001.991976000 systemd-udevd 484 Hello, World!
|
||||
24585002.276147000 bash 15787 Hello, World!
|
||||
TIME(s) COMM PID MESSAGE
24585001.174885999 sshd 1432 Hello, World!
24585001.195710000 sshd 15780 Hello, World!
24585001.991976000 systemd-udevd 484 Hello, World!
24585002.276147000 bash 15787 Hello, World!
|
||||
</code></pre>
|
||||
<p>Code:</p>
|
||||
<p>Code:</p>
|
||||
<pre><code class="language-Python">from bcc import BPF
|
||||
|
||||
# define BPF program
|
||||
# define BPF program
|
||||
prog = """
|
||||
int hello(void *ctx) {
|
||||
bpf_trace_printk("Hello, World!\\n");
|
||||
bpf_trace_printk("Hello, World!\\n");
|
||||
return 0;
|
||||
}
|
||||
"""
|
||||
|
||||
# load BPF program
|
||||
# load BPF program
|
||||
b = BPF(text=prog)
|
||||
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
|
||||
|
||||
# header
|
||||
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
|
||||
# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
|
||||
|
||||
# format output
|
||||
# format output
|
||||
while 1:
|
||||
try:
|
||||
(task, pid, cpu, flags, ts, msg) = b.trace_fields()
|
||||
@@ -241,34 +241,34 @@ while 1:
|
||||
continue
|
||||
print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
|
||||
</code></pre>
|
||||
<p>This is similar to hello_world.py, and traces new processes via sys_clone() again, but has a few more things to learn:</p>
|
||||
<p>This is similar to hello_world.py, and again traces new processes via sys_clone(), but has a few more things to learn:</p>
|
||||
<ol>
|
||||
<li>
|
||||
<p><code>prog =</code>: This time we declare the C program as a variable, and later refer to it. This is useful if you want to add some string substitutions based on command line arguments.</p>
|
||||
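<p><code>prog =</code>: This time we declare the C program as a variable, and refer to it later. This is useful if you want to add string substitutions based on command line arguments.</p>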
|
||||
</li>
|
||||
<li>
|
||||
<p><code>hello()</code>: Now we're just declaring a C function, instead of using the <code>kprobe__</code> shortcut. We'll refer to this later. All C functions declared in the BPF program are expected to be executed on a probe, hence they all need to take a <code>struct pt_regs *ctx</code> as the first argument. If you need to define a helper function that will not be executed on a probe, it needs to be defined as <code>static inline</code> so that the compiler inlines it. Sometimes you will also need to add the <code>__always_inline</code> function attribute to it.</p>
|
||||
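<p><code>hello()</code>: Now we're just declaring a C function, instead of using the <code>kprobe__</code> shortcut. We'll refer to this later. All C functions declared in the BPF program are expected to be executed on a probe, hence they all need to take a <code>struct pt_regs *ctx</code> as the first argument. If you need to define a helper function that will not be executed on a probe, it needs to be defined as <code>static inline</code> so that the compiler inlines it. Sometimes you will also need to add the <code>__always_inline</code> function attribute to it.</p>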
|
||||
</li>
|
||||
<li>
|
||||
<p><code>b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")</code>: Creates a kprobe for the kernel clone system call function, which will execute our defined hello() function. You can call attach_kprobe() more than once, and attach your C function to multiple kernel functions.</p>
|
||||
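<p><code>b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")</code>: Creates a kprobe for the kernel clone system call function, which will execute our hello() function. You can call attach_kprobe() more than once, and attach your C function to multiple kernel functions.</p>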
|
||||
</li>
|
||||
<li>
|
||||
<p><code>b.trace_fields()</code>: Returns a fixed set of fields from trace_pipe. Similar to trace_print(), this is handy for hacking, but for real tooling we should switch to BPF_PERF_OUTPUT().</p>
|
||||
<p><code>b.trace_fields()</code>: Returns a fixed set of fields from trace_pipe. Like trace_print(), this is handy for hacking, but for real tooling we should switch to BPF_PERF_OUTPUT().</p>
|
||||
</li>
|
||||
</ol>
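<p>Item 1 mentions string substitutions driven by command line arguments. A minimal sketch of that idea follows, in plain Python so it runs without bcc. The <code>MESSAGE</code> marker, the <code>--message</code> option and the <code>build_prog</code> helper are all hypothetical names chosen for illustration.</p>

```python
import argparse

# BPF C text with a placeholder. MESSAGE is a hypothetical marker that is
# replaced before the text would be handed to BPF(text=...).
PROG_TEMPLATE = r"""
int hello(void *ctx) {
    bpf_trace_printk("MESSAGE\n");
    return 0;
}
"""

def build_prog(argv):
    """Return the C program text with the marker filled in from argv."""
    parser = argparse.ArgumentParser(description="hello with a custom message")
    parser.add_argument("-m", "--message", default="Hello, World!",
                        help="text printed on each clone()")
    args = parser.parse_args(argv)
    return PROG_TEMPLATE.replace("MESSAGE", args.message)

prog = build_prog(["--message", "clone seen"])
print(prog)
```

<p>The resulting string would then be passed as <code>BPF(text=prog)</code>. Note that this sketch does no escaping, so a message containing a double quote or a <code>%</code> would produce broken C.</p>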
|
||||
<h3 id="lesson-4-sync_timingpy"><a class="header" href="#lesson-4-sync_timingpy">Lesson 4. sync_timing.py</a></h3>
|
||||
<p>Remember the days of sysadmins typing <code>sync</code> three times on a slow console before <code>reboot</code>, to give the first asynchronous sync time to complete? Then someone thought <code>sync;sync;sync</code> was clever, to run them all on one line, which became industry practice despite defeating the original purpose! And then sync became synchronous, so more reasons it was silly. Anyway.</p>
|
||||
<p>The following example times how quickly the <code>do_sync</code> function is called, and prints output if it has been called more recently than one second ago. A <code>sync;sync;sync</code> will print output for the 2nd and 3rd sync's:</p>
|
||||
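<p>Remember the days when sysadmins typed <code>sync</code> three times on a slow console before rebooting? Then someone thought <code>sync;sync;sync</code> was clever, running them all on one line, even though that defeats the original purpose! And then sync became synchronous, which made it sillier still. Anyway.</p>
<p>The following example times how quickly the <code>do_sync</code> function is called, and prints output if it was last called less than one second ago. A <code>sync;sync;sync</code> will print output for the 2nd and 3rd syncs:</p>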
|
||||
<pre><code class="language-sh"># examples/tracing/sync_timing.py
|
||||
Tracing for quick sync's... Ctrl-C to end
|
||||
At time 0.00 s: multiple syncs detected, last 95 ms ago
|
||||
At time 0.10 s: multiple syncs detected, last 96 ms ago
|
||||
Tracing for quick sync's... Ctrl-C to end
|
||||
</code></pre>
|
||||
<p>This program is <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/sync_timing.py">examples/tracing/sync_timing.py</a>:</p>
|
||||
<p>At time 0.00 s: multiple syncs detected, last 95 ms ago
At time 0.10 s: multiple syncs detected, last 96 ms ago</p>
<p>This program is <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/sync_timing.py">examples/tracing/sync_timing.py</a>:</p>
|
||||
<pre><code class="language-Python">from __future__ import print_function
|
||||
from bcc import BPF
|
||||
|
||||
# load BPF program
|
||||
# load BPF program
|
||||
b = BPF(text="""
|
||||
#include <uapi/linux/ptrace.h>
|
||||
|
||||
@@ -277,18 +277,18 @@ BPF_HASH(last);
|
||||
int do_trace(struct pt_regs *ctx) {
|
||||
u64 ts, *tsp, delta, key = 0;
|
||||
|
||||
// attempt to read stored timestamp
|
||||
// attempt to read stored timestamp
|
||||
tsp = last.lookup(&key);
|
||||
if (tsp != NULL) {
|
||||
delta = bpf_ktime_get_ns() - *tsp;
|
||||
if (delta < 1000000000) {
|
||||
// output if time is less than 1 second
|
||||
// output if time is less than 1 second
|
||||
bpf_trace_printk("%d\\n", delta / 1000000);
|
||||
}
|
||||
last.delete(&key);
|
||||
}
|
||||
|
||||
// update stored timestamp
|
||||
// update stored timestamp
|
||||
ts = bpf_ktime_get_ns();
|
||||
last.update(&key, &ts);
|
||||
return 0;
|
||||
@@ -296,44 +296,43 @@ int do_trace(struct pt_regs *ctx) {
|
||||
""")
|
||||
|
||||
b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="do_trace")
|
||||
print("Tracing for quick sync's... Ctrl-C to end")
|
||||
print("Tracing for quick sync's... Ctrl-C to end")
|
||||
|
||||
# format output
|
||||
# format output
|
||||
start = 0
|
||||
while 1:
|
||||
(task, pid, cpu, flags, ts, ms) = b.trace_fields()
|
||||
if start == 0:
|
||||
start = ts
|
||||
ts = ts - start
|
||||
print("At time %.2f s: multiple syncs detected, last %s ms ago" % (ts, ms))
|
||||
print("At time %.2f s: multiple syncs detected, last %s ms ago" % (ts, ms))
|
||||
</code></pre>
|
||||
<p>Things to learn:</p>
|
||||
<p>Things to learn:</p>
|
||||
<ol>
|
||||
<li><code>bpf_ktime_get_ns()</code>: Returns the time as nanoseconds.</li>
|
||||
<li><code>BPF_HASH(last)</code>: Creates a BPF map object that is a hash (associative array), called "last". We didn't specify any further arguments, so it defaults to key and value types of u64.</li>
|
||||
<li><code>key = 0</code>: We'll only store one key/value pair in this hash, where the key is hardwired to zero.</li>
|
||||
<li><code>last.lookup(&key)</code>: Look up the key in the hash, and return a pointer to its value if it exists, else NULL. We pass the key by address, as a pointer.</li>
|
||||
<li><code>if (tsp != NULL) {</code>: The verifier requires that pointer values derived from a map lookup must be checked for a null value before they can be dereferenced and used.</li>
|
||||
<li><code>last.delete(&key)</code>: Delete the key from the hash. This is currently required because of <a href="https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=a6ed3ea65d9868fdf9eff84e6fe4f666b8d14b02">a kernel bug in <code>.update()</code></a> (fixed in 4.8.10).</li>
|
||||
<li><code>last.update(&key, &ts)</code>: Associate the value in the 2nd argument to the key, overwriting any previous value. This records the timestamp.</li>
|
||||
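<li><code>bpf_ktime_get_ns()</code>: Returns the time in nanoseconds.</li>
<li><code>BPF_HASH(last)</code>: Creates a BPF map object that is a hash (associative array), called "last". We didn't specify any further arguments, so it defaults to key and value types of u64.</li>
<li><code>key = 0</code>: We'll only store one key/value pair in this hash, where the key is hardwired to zero.</li>
<li><code>last.lookup(&key)</code>: Look up the key in the hash, and return a pointer to its value if it exists, else NULL. We pass the key by address, as a pointer.</li>
<li><code>if (tsp != NULL) {</code>: The verifier requires that pointer values derived from a map lookup be checked for NULL before they are dereferenced and used.</li>
<li><code>last.delete(&key)</code>: Delete the key from the hash. This is currently required because of <a href="https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=a6ed3ea65d9868fdf9eff84e6fe4f666b8d14b02">a kernel bug in <code>.update()</code></a> (fixed in 4.8.10).</li>
<li><code>last.update(&key, &ts)</code>: Associate the value in the 2nd argument with the key, overwriting any previous value. This records the timestamp.</li>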
|
||||
</ol>
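<p>The lookup, NULL check, delete and update steps above can be followed in a small user-space model. This is only a simulation under stated assumptions: a Python dict stands in for <code>BPF_HASH(last)</code>, timestamps are passed in by hand instead of coming from <code>bpf_ktime_get_ns()</code>, and the function returns the value that the BPF program would print.</p>

```python
# User-space model of do_trace(); a dict stands in for BPF_HASH(last).
NS_PER_SEC = 1_000_000_000

def do_trace(last, now_ns):
    """Return the delta in ms if the previous sync was under 1 s ago, else None."""
    key = 0
    result = None
    tsp = last.get(key)             # models last.lookup(&key)
    if tsp is not None:             # models the NULL check the verifier requires
        delta = now_ns - tsp
        if delta < NS_PER_SEC:
            result = delta // 1_000_000
        del last[key]               # models last.delete(&key)
    last[key] = now_ns              # models last.update(&key, &ts)
    return result

last = {}
# Three syncs: at 0 ms, 95 ms and 191 ms.
results = [do_trace(last, t) for t in (0, 95_000_000, 191_000_000)]
print(results)
```

<p>The first call finds no stored timestamp and reports nothing; the second and third report 95 and 96 ms, matching the shape of the sample output for <code>sync;sync;sync</code>.</p>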
|
||||
<h3 id="lesson-5-sync_countpy"><a class="header" href="#lesson-5-sync_countpy">Lesson 5. sync_count.py</a></h3>
|
||||
<p>Modify the sync_timing.py program (prior lesson) to store the count of all kernel sync system calls (both fast and slow), and print it with the output. This count can be recorded in the BPF program by adding a new key index to the existing hash.</p>
|
||||
<h3 id="lesson-6-disksnooppy"><a class="header" href="#lesson-6-disksnooppy">Lesson 6. disksnoop.py</a></h3>
|
||||
<p>Browse the <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/disksnoop.py">examples/tracing/disksnoop.py</a> program to see what is new. Here is some sample output:</p>
|
||||
<h3 id="第5课-sync_countpy"><a class="header" href="#第5课-sync_countpy">Lesson 5. sync_count.py</a></h3>
<p>Modify the sync_timing.py program (prior lesson) to store the count of all kernel sync system calls (both fast and slow), and print it with the output. This count can be recorded in the BPF program by adding a new key index to the existing hash.</p>
<h3 id="第6课-disksnooppy"><a class="header" href="#第6课-disksnooppy">Lesson 6. disksnoop.py</a></h3>
<p>Browse the <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/disksnoop.py">examples/tracing/disksnoop.py</a> program to see what is new. Here is some sample output:</p>
|
||||
<pre><code class="language-sh"># disksnoop.py
|
||||
TIME(s) T BYTES LAT(ms)
|
||||
TIME(s) T BYTES LAT(ms)
|
||||
16458043.436012 W 4096 3.13
|
||||
16458043.437326 W 4096 4.44
|
||||
16458044.126545 R 4096 42.82
|
||||
16458044.129872 R 4096 3.24
|
||||
[...]
|
||||
</code></pre>
|
||||
<p>And a code snippet:</p>
|
||||
<p>And a code snippet:</p>
|
||||
<pre><code class="language-Python">[...]
|
||||
REQ_WRITE = 1 # from include/linux/blk_types.h
|
||||
REQ_WRITE = 1 # from include/linux/blk_types.h
|
||||
|
||||
# load BPF program
|
||||
# load BPF program
|
||||
b = BPF(text="""
|
||||
#include <uapi/linux/ptrace.h>
|
||||
#include <linux/blk-mq.h>
|
||||
@@ -341,59 +340,57 @@ b = BPF(text="""
|
||||
BPF_HASH(start, struct request *);
|
||||
|
||||
void trace_start(struct pt_regs *ctx, struct request *req) {
|
||||
// stash start timestamp by request ptr
|
||||
u64 ts = bpf_ktime_get_ns();
|
||||
// stash start timestamp by request ptr
|
||||
u64 ts = bpf_ktime_get_ns();
|
||||
|
||||
start.update(&req, &ts);
|
||||
start.update(&req, &ts);
|
||||
}
|
||||
|
||||
void trace_completion(struct pt_regs *ctx, struct request *req) {
|
||||
u64 *tsp, delta;
|
||||
u64 *tsp, delta;
|
||||
|
||||
tsp = start.lookup(&req);
|
||||
if (tsp != 0) {
|
||||
delta = bpf_ktime_get_ns() - *tsp;
|
||||
bpf_trace_printk("%d %x %d\\n", req->__data_len,
|
||||
req->cmd_flags, delta / 1000);
|
||||
start.delete(&req);
|
||||
}
|
||||
tsp = start.lookup(&req);
|
||||
if (tsp != 0) {
|
||||
delta = bpf_ktime_get_ns() - *tsp;
|
||||
bpf_trace_printk("%d %x %d\\n", req->__data_len,
|
||||
req->cmd_flags, delta / 1000);
|
||||
start.delete(&req);
|
||||
}
|
||||
}
|
||||
""")
|
||||
if BPF.get_kprobe_functions(b'blk_start_request'):
|
||||
b.attach_kprobe(event="blk_start_request", fn_name="trace_start")
|
||||
b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_start")
|
||||
if BPF.get_kprobe_functions(b'__blk_account_io_done'):
|
||||
b.attach_kprobe(event="__blk_account_io_done", fn_name="trace_completion")
|
||||
else:
|
||||
b.attach_kprobe(event="blk_account_io_done", fn_name="trace_completion")
|
||||
[...]
|
||||
b.attach_kprobe(event="__blk_account_io_done", fn_name="trace_completion")
else:
b.attach_kprobe(event="blk_account_io_done", fn_name="trace_completion")
|
||||
[...]
|
||||
</code></pre>
|
||||
<p>Things to learn:</p>
|
||||
<p>Things to learn:</p>
|
||||
<ol>
|
||||
<li><code>REQ_WRITE</code>: We're defining a kernel constant in the Python program because we'll use it there later. If we were using REQ_WRITE in the BPF program, it should just work (without needing to be defined) with the appropriate #includes.</li>
|
||||
<li><code>trace_start(struct pt_regs *ctx, struct request *req)</code>: This function will later be attached to kprobes. The arguments to kprobe functions are <code>struct pt_regs *ctx</code>, for registers and BPF context, and then the actual arguments to the function. We'll attach this to blk_start_request(), where the first argument is <code>struct request *</code>.</li>
|
||||
<li><code>start.update(&req, &ts)</code>: We're using the pointer to the request struct as a key in our hash. What? This is commonplace in tracing. Pointers to structs turn out to be great keys, as they are unique: two structs can't have the same pointer address. (Just be careful about when it gets freed and reused.) So what we're really doing is tagging the request struct, which describes the disk I/O, with our own timestamp, so that we can time it. There are two common keys used for storing timestamps: pointers to structs, and thread IDs (for timing function entry to return).</li>
|
||||
<li><code>req->__data_len</code>: We're dereferencing members of <code>struct request</code>. See its definition in the kernel source for what members are there. bcc actually rewrites these expressions to be a series of <code>bpf_probe_read_kernel()</code> calls. Sometimes bcc can't handle a complex dereference, and you need to call <code>bpf_probe_read_kernel()</code> directly.</li>
|
||||
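<li><code>REQ_WRITE</code>: We're defining a kernel constant in the Python program because we'll use it there later. If we were using REQ_WRITE in the BPF program, it should just work (without needing to be defined) with the appropriate <code>#includes</code>.</li>
<li><code>trace_start(struct pt_regs *ctx, struct request *req)</code>: This function will later be attached to kprobes. The arguments to kprobe functions are <code>struct pt_regs *ctx</code>, for registers and BPF context, and then the actual arguments to the function. We'll attach this to blk_start_request(), where the first argument is <code>struct request *</code>.</li>
<li><code>start.update(&req, &ts)</code>: We're using the pointer to the request struct as a key in our hash. This is commonplace in tracing. Pointers to structs make great keys, as they are unique: two structs can't have the same pointer address. (Just be careful about when the pointer gets freed and reused.) So what we're really doing is tagging the request struct, which describes the disk I/O, with our own timestamp, so that we can time it. There are two common keys used for storing timestamps: pointers to structs, and thread IDs (for timing function entry to return).</li>
<li><code>req->__data_len</code>: We're dereferencing members of <code>struct request</code>. See its definition in the kernel source for which members are available. bcc actually rewrites these expressions into a series of <code>bpf_probe_read_kernel()</code> calls. Sometimes bcc can't handle a complex dereference, and you need to call <code>bpf_probe_read_kernel()</code> directly.</li>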
|
||||
</ol>
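<p>The snippet above elides the Python loop that consumes the trace output. As a hedged sketch of what that side might do with the <code>"%d %x %d"</code> message and the <code>REQ_WRITE</code> constant, here is a pure-Python formatter. The <code>format_event</code> helper and its sample inputs are assumptions for illustration, not the elided code itself, and how the write flag is encoded in <code>cmd_flags</code> can vary between kernel versions.</p>

```python
REQ_WRITE = 1  # from include/linux/blk_types.h, as in the snippet above

def format_event(ts, msg):
    """Format one event; msg is the "%d %x %d" text from bpf_trace_printk()."""
    bytes_s, flags_s, us_s = msg.split()
    # cmd_flags was printed with %x, so parse it as hexadecimal.
    rw = "W" if int(flags_s, 16) & REQ_WRITE else "R"
    ms = int(us_s) / 1000.0     # the BPF side reported microseconds
    return "%-18.9f %-2s %-7s %8.2f" % (ts, rw, bytes_s, ms)

line = format_event(16458043.436012, "4096 1 3130")
print(line)
```

<p>In a real tool, <code>ts</code> and <code>msg</code> would come from <code>b.trace_fields()</code>, as in hello_fields.py.</p>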
|
||||
<p>This is a pretty interesting program, and if you can understand all the code, you'll understand many important basics. We're still using the bpf_trace_printk() hack, so let's fix that next.</p>
|
||||
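<p>This is a pretty interesting program, and if you can understand all the code, you'll understand many important basics. We're still using the <code>bpf_trace_printk()</code> hack, so let's fix that next.</p>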
|
||||
<h3 id="lesson-7-hello_perf_outputpy"><a class="header" href="#lesson-7-hello_perf_outputpy">Lesson 7. hello_perf_output.py</a></h3>
|
||||
<p>Let's finally stop using bpf_trace_printk() and use the proper BPF_PERF_OUTPUT() interface. This also means we stop getting the free trace_fields() members like PID and timestamp, and will need to fetch them directly. Sample output while commands are run in another session:</p>
|
||||
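<p>Let's finally stop using bpf_trace_printk() and use the proper BPF_PERF_OUTPUT() interface. This also means we stop getting the free trace_fields() members like PID and timestamp, and will need to fetch them directly. Sample output while commands are run in another session:</p>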
|
||||
<pre><code class="language-sh"># hello_perf_output.py
|
||||
TIME(s) COMM PID MESSAGE
|
||||
0.000000000 bash 22986 Hello, perf_output!
|
||||
0.021080275 systemd-udevd 484 Hello, perf_output!
|
||||
0.021359520 systemd-udevd 484 Hello, perf_output!
|
||||
0.021590610 systemd-udevd 484 Hello, perf_output!
|
||||
0.000000000 bash 22986 Hello, perf_output!
0.021080275 systemd-udevd 484 Hello, perf_output!
0.021359520 systemd-udevd 484 Hello, perf_output!
0.021590610 systemd-udevd 484 Hello, perf_output!
|
||||
[...]
|
||||
</code></pre>
|
||||
<p>Code is <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/hello_perf_output.py">examples/tracing/hello_perf_output.py</a>:</p>
|
||||
<p>Code is <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/hello_perf_output.py">examples/tracing/hello_perf_output.py</a>:</p>
|
||||
<pre><code class="language-Python">from bcc import BPF
|
||||
|
||||
# define BPF program
|
||||
# define BPF program
|
||||
prog = """
|
||||
#include <linux/sched.h>
|
||||
|
||||
// define output data structure in C
|
||||
// define output data structure in C
|
||||
struct data_t {
|
||||
u32 pid;
|
||||
u64 ts;
|
||||
@@ -414,14 +411,14 @@ int hello(struct pt_regs *ctx) {
|
||||
}
|
||||
"""
|
||||
|
||||
# load BPF program
|
||||
# load BPF program
|
||||
b = BPF(text=prog)
|
||||
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
|
||||
|
||||
# header
|
||||
# header
|
||||
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
|
||||
|
||||
# process event
|
||||
# process event
|
||||
start = 0
|
||||
def print_event(cpu, data, size):
|
||||
global start
|
||||
@@ -429,50 +426,47 @@ def print_event(cpu, data, size):
|
||||
if start == 0:
|
||||
start = event.ts
|
||||
time_s = (float(event.ts - start)) / 1000000000
|
||||
print("%-18.9f %-16s %-6d %s" % (time_s, event.comm, event.pid,
|
||||
"Hello, perf_output!"))
|
||||
print("%-18.9f %-16s %-6d %s" % (time_s, event.comm, event.pid, "Hello, perf_output!"))
|
||||
|
||||
# loop with callback to print_event
|
||||
# loop with callback to print_event
|
||||
b["events"].open_perf_buffer(print_event)
|
||||
while 1:
|
||||
b.perf_buffer_poll()
|
||||
</code></pre>
|
||||
<p>Things to learn:</p>
|
||||
<p>Things to learn:</p>
|
||||
<ol>
|
||||
<li><code>struct data_t</code>: This defines the C struct we'll use to pass data from kernel to user space.</li>
|
||||
<li><code>BPF_PERF_OUTPUT(events)</code>: This names our output channel "events".</li>
|
||||
<li><code>struct data_t data = {};</code>: Create an empty data_t struct that we'll then populate.</li>
|
||||
<li><code>bpf_get_current_pid_tgid()</code>: Returns the process ID in the lower 32 bits (kernel's view of the PID, which in user space is usually presented as the thread ID), and the thread group ID in the upper 32 bits (what user space often thinks of as the PID). By directly setting this to a u32, we discard the upper 32 bits. Should you be presenting the PID or the TGID? For a multi-threaded app, the TGID will be the same, so you need the PID to differentiate them, if that's what you want. It's also a question of expectations for the end user.</li>
|
||||
<li><code>bpf_get_current_comm()</code>: Populates the first argument address with the current process name.</li>
|
||||
<li><code>events.perf_submit()</code>: Submit the event for user space to read via a perf ring buffer.</li>
|
||||
<li><code>def print_event()</code>: Define a Python function that will handle reading events from the <code>events</code> stream.</li>
|
||||
<li><code>b["events"].event(data)</code>: Now get the event as a Python object, auto-generated from the C declaration.</li>
|
||||
<li><code>b["events"].open_perf_buffer(print_event)</code>: Associate the Python <code>print_event</code> function with the <code>events</code> stream.</li>
|
||||
<li><code>while 1: b.perf_buffer_poll()</code>: Block waiting for events.</li>
|
||||
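<li><code>struct data_t</code>: This defines the C struct we'll use to pass data from kernel to user space.</li>
<li><code>BPF_PERF_OUTPUT(events)</code>: This names our output channel "events".</li>
<li><code>struct data_t data = {};</code>: Create an empty <code>data_t</code> struct that we'll then populate.</li>
<li><code>bpf_get_current_pid_tgid()</code>: Returns the process ID in the lower 32 bits (the kernel's view of the PID, which user space usually presents as the thread ID), and the thread group ID in the upper 32 bits (what user space often thinks of as the PID). By assigning this directly to a <code>u32</code>, we discard the upper 32 bits. Should you present the PID or the TGID? For a multi-threaded app, the TGID will be the same for every thread, so you need the PID to tell them apart, if that's what you want. It's also a question of what the end user expects.</li>
<li><code>bpf_get_current_comm()</code>: Populates the address given as the first argument with the current process name.</li>
<li><code>events.perf_submit()</code>: Submit the event for user space to read via a perf ring buffer.</li>
<li><code>def print_event()</code>: Define a Python function that will handle reading events from the <code>events</code> stream.</li>
<li><code>b["events"].event(data)</code>: Now get the event as a Python object, auto-generated from the C declaration.</li>
<li><code>b["events"].open_perf_buffer(print_event)</code>: Associate the Python <code>print_event</code> function with the <code>events</code> stream.</li>
<li><code>while 1: b.perf_buffer_poll()</code>: Block waiting for events.</li>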
|
||||
</ol>
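<p>The PID versus TGID split described for <code>bpf_get_current_pid_tgid()</code> is plain bit arithmetic, which can be tried out in Python. The helper name <code>split_pid_tgid</code> and the sample value are made up for this sketch.</p>

```python
def split_pid_tgid(pid_tgid):
    """Split the u64 from bpf_get_current_pid_tgid() into (tgid, pid)."""
    tgid = pid_tgid >> 32            # upper 32 bits: what user space calls the PID
    pid = pid_tgid & 0xFFFFFFFF      # lower 32 bits: kernel PID, i.e. the thread ID
    return tgid, pid

# A made-up value: thread 22990 inside process (thread group) 22986.
value = (22986 << 32) | 22990
print(split_pid_tgid(value))
```

<p>Assigning the 64-bit value to a <code>u32</code> field, as the example program does, keeps only the second element of this pair.</p>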
|
||||
<h3 id="lesson-8-sync_perf_outputpy"><a class="header" href="#lesson-8-sync_perf_outputpy">Lesson 8. sync_perf_output.py</a></h3>
|
||||
<p>Rewrite sync_timing.py, from a prior lesson, to use <code>BPF_PERF_OUTPUT</code>.</p>
|
||||
<h3 id="lesson-9-bitehistpy"><a class="header" href="#lesson-9-bitehistpy">Lesson 9. bitehist.py</a></h3>
|
||||
<p>The following tool records a histogram of disk I/O sizes. Sample output:</p>
|
||||
<h3 id="第八课-sync_perf_outputpy"><a class="header" href="#第八课-sync_perf_outputpy">Lesson 8. sync_perf_output.py</a></h3>
<p>Rewrite sync_timing.py, from a prior lesson, to use <code>BPF_PERF_OUTPUT</code>.</p>
<h3 id="第九课-bitehistpy"><a class="header" href="#第九课-bitehistpy">Lesson 9. bitehist.py</a></h3>
<p>The following tool records a histogram of disk I/O sizes. Sample output:</p>
|
||||
<pre><code class="language-sh"># bitehist.py
|
||||
Tracing... Hit Ctrl-C to end.
|
||||
Tracing... Hit Ctrl-C to end.
|
||||
^C
|
||||
kbytes : count distribution
|
||||
0 -> 1 : 3 | |
|
||||
2 -> 3 : 0 | |
|
||||
4 -> 7 : 211 |********** |
|
||||
8 -> 15 : 0 | |
|
||||
16 -> 31 : 0 | |
|
||||
32 -> 63 : 0 | |
|
||||
16 -> 31 : 0 | |
32 -> 63 : 0 | |
|
||||
64 -> 127 : 1 | |
|
||||
128 -> 255 : 800 |**************************************|
|
||||
</code></pre>
|
||||
<p>Code is <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/bitehist.py">examples/tracing/bitehist.py</a>:</p>
|
||||
<p>Code is <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/bitehist.py">examples/tracing/bitehist.py</a>:</p>
|
||||
<pre><code class="language-Python">from __future__ import print_function
|
||||
from bcc import BPF
|
||||
from time import sleep
|
||||
|
||||
# load BPF program
|
||||
# load BPF program
|
||||
b = BPF(text="""
|
||||
#include <uapi/linux/ptrace.h>
|
||||
#include <linux/blkdev.h>
|
||||
@@ -481,49 +475,47 @@ BPF_HISTOGRAM(dist);
|
||||
|
||||
int kprobe__blk_account_io_done(struct pt_regs *ctx, struct request *req)
|
||||
{
|
||||
dist.increment(bpf_log2l(req->__data_len / 1024));
|
||||
return 0;
|
||||
dist.increment(bpf_log2l(req->__data_len / 1024));
|
||||
return 0;
|
||||
}
|
||||
""")
|
||||
|
||||
# header
|
||||
print("Tracing... Hit Ctrl-C to end.")
|
||||
# header
print("Tracing... Hit Ctrl-C to end.")
|
||||
|
||||
# trace until Ctrl-C
|
||||
# trace until Ctrl-C
|
||||
try:
|
||||
sleep(99999999)
|
||||
sleep(99999999)
|
||||
except KeyboardInterrupt:
|
||||
print()
|
||||
print()
|
||||
|
||||
# output
|
||||
# output
|
||||
b["dist"].print_log2_hist("kbytes")
|
||||
</code></pre>
|
||||
<p>A recap from earlier lessons:</p>
|
||||
<p>A recap from earlier lessons:</p>
|
||||
<ul>
|
||||
<li><code>kprobe__</code>: This prefix means the rest will be treated as a kernel function name that will be instrumented using kprobe.</li>
|
||||
<li><code>struct pt_regs *ctx, struct request *req</code>: Arguments to kprobe. The <code>ctx</code> is registers and BPF context, the <code>req</code> is the first argument to the instrumented function: <code>blk_account_io_done()</code>.</li>
|
||||
<li><code>req->__data_len</code>: Dereferencing that member.</li>
|
||||
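<li><code>kprobe__</code>: This prefix means the rest will be treated as the name of a kernel function to be instrumented using kprobe.</li>
<li><code>struct pt_regs *ctx, struct request *req</code>: Arguments to kprobe. The <code>ctx</code> is registers and BPF context; the <code>req</code> is the first argument to the instrumented function, <code>blk_account_io_done()</code>.</li>
<li><code>req->__data_len</code>: Dereferencing that member.</li>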
|
||||
</ul>
|
||||
<p>New things to learn:</p>
|
||||
<p>New things to learn:</p>
|
||||
<ol>
|
||||
<li><code>BPF_HISTOGRAM(dist)</code>: Defines a BPF map object that is a histogram, and names it "dist".</li>
|
||||
<li><code>dist.increment()</code>: Increments the histogram bucket index provided as first argument by one by default. Optionally, custom increments can be passed as the second argument.</li>
|
||||
<li><code>bpf_log2l()</code>: Returns the log-2 of the provided value. This becomes the index of our histogram, so that we're constructing a power-of-2 histogram.</li>
|
||||
<li><code>b["dist"].print_log2_hist("kbytes")</code>: Prints the "dist" histogram as power-of-2, with a column header of "kbytes". The only data transferred from kernel to user space is the bucket counts, making this efficient.</li>
|
||||
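<li><code>BPF_HISTOGRAM(dist)</code>: Defines a BPF map object that is a histogram, and names it "dist".</li>
<li><code>dist.increment()</code>: Increments the histogram bucket at the index given as the first argument, by one by default. Optionally, a custom increment can be passed as the second argument.</li>
<li><code>bpf_log2l()</code>: Returns the log-2 of the provided value. This becomes the index of our histogram, so that we're constructing a power-of-2 histogram.</li>
<li><code>b["dist"].print_log2_hist("kbytes")</code>: Prints the "dist" histogram as power-of-2, with a column header of "kbytes". The only data transferred from kernel to user space is the bucket counts, making this efficient.</li>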
|
||||
</ol>
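<p>To see why a log-2 index yields the ranges shown in the sample output (0 -> 1, 2 -> 3, 4 -> 7, and so on), here is a simplified user-space model of power-of-2 bucketing. It is an illustration only and does not reproduce how bcc implements <code>bpf_log2l()</code> or <code>print_log2_hist()</code>; the <code>bucket_range</code> helper and the I/O sizes are made up.</p>

```python
from collections import Counter

def bucket_range(value):
    """Return the (low, high) power-of-2 bucket that holds a non-negative int."""
    if value <= 1:
        return (0, 1)
    k = value.bit_length() - 1      # floor(log2(value))
    return (1 << k, (1 << (k + 1)) - 1)

# Hypothetical I/O sizes in bytes, converted to kbytes as in the BPF program.
sizes = [4096, 4096, 131072, 512, 65536]
hist = Counter(bucket_range(size // 1024) for size in sizes)
for (low, high), count in sorted(hist.items()):
    print("%6d -> %-6d : %d" % (low, high, count))
```

<p>Only the per-bucket counts need to be kept, which is why a histogram stays cheap to transfer no matter how many I/Os are traced.</p>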
|
||||
<h3 id="lesson-10-disklatencypy"><a class="header" href="#lesson-10-disklatencypy">Lesson 10. disklatency.py</a></h3>
|
||||
<p>Write a program that times disk I/O, and prints a histogram of their latency. Disk I/O instrumentation and timing can be found in the disksnoop.py program from a prior lesson, and histogram code can be found in bitehist.py from a prior lesson.</p>
|
||||
<h3 id="lesson-11-vfsreadlatpy"><a class="header" href="#lesson-11-vfsreadlatpy">Lesson 11. vfsreadlat.py</a></h3>
|
||||
<p>This example is split into separate Python and C files. Example output:</p>
|
||||
<h3 id="lesson-10-disklatencypy-lesson-11-vfsreadlatpy"><a class="header" href="#lesson-10-disklatencypy-lesson-11-vfsreadlatpy">Lesson 10. disklatency.py and Lesson 11. vfsreadlat.py</a></h3>
|
||||
<p>这个例子分为独立的Python和C文件。示例输出:</p>
|
||||
<pre><code class="language-sh"># vfsreadlat.py 1
|
||||
Tracing... Hit Ctrl-C to end.
|
||||
usecs : count distribution
|
||||
跟踪中... 按Ctrl-C停止。
|
||||
微秒 : 数量 分布
|
||||
0 -> 1 : 0 | |
|
||||
2 -> 3 : 2 |*********** |
|
||||
4 -> 7 : 7 |****************************************|
|
||||
8 -> 15 : 4 |********************** |
|
||||
|
||||
usecs : count distribution
|
||||
微秒 : 数量 分布
|
||||
0 -> 1 : 29 |****************************************|
|
||||
2 -> 3 : 28 |************************************** |
|
||||
4 -> 7 : 4 |***** |
|
||||
@@ -538,8 +530,7 @@ Tracing... Hit Ctrl-C to end.
|
||||
2048 -> 4095 : 0 | |
|
||||
4096 -> 8191 : 4 |***** |
|
||||
8192 -> 16383 : 6 |******** |
|
||||
16384 -> 32767 : 9 |************ |
|
||||
32768 -> 65535 : 6 |******** |
|
||||
16384 -> 32767 : 9 |************ |
|
||||
32768 -> 65535 : 6 |******** |
|
||||
65536 -> 131071 : 2 |** |
|
||||
|
||||
usecs : count distribution
|
||||
@@ -551,14 +542,15 @@ Tracing... Hit Ctrl-C to end.
|
||||
32 -> 63 : 2 |******* |
|
||||
[...]
|
||||
</code></pre>
|
||||
<p>Browse the code in <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/vfsreadlat.py">examples/tracing/vfsreadlat.py</a> and <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/vfsreadlat.c">examples/tracing/vfsreadlat.c</a>. Things to learn:</p>
|
||||
<p>浏览 <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/vfsreadlat.py">examples/tracing/vfsreadlat.py</a> 和 <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/vfsreadlat.c">examples/tracing/vfsreadlat.c</a> 中的代码。</p>
|
||||
<p>学习的内容:</p>
|
||||
<ol>
|
||||
<li><code>b = BPF(src_file = "vfsreadlat.c")</code>: Read the BPF C program from a separate source file.</li>
|
||||
<li><code>b.attach_kretprobe(event="vfs_read", fn_name="do_return")</code>: Attaches the BPF C function <code>do_return()</code> to the return of the kernel function <code>vfs_read()</code>. This is a kretprobe: instrumenting the return from a function, rather than its entry.</li>
|
||||
<li><code>b["dist"].clear()</code>: Clears the histogram.</li>
|
||||
<li><code>b = BPF(src_file = "vfsreadlat.c")</code>: 从单独的源代码文件中读取 BPF C 程序。</li>
|
||||
<li><code>b.attach_kretprobe(event="vfs_read", fn_name="do_return")</code>: 将 BPF C 函数 <code>do_return()</code> 链接到内核函数 <code>vfs_read()</code> 的返回值上。这是一个 kretprobe:用于检测函数返回值,而不是函数的入口。</li>
|
||||
<li><code>b["dist"].clear()</code>: 清除直方图。</li>
|
||||
</ol>
|
||||
<h3 id="lesson-12-urandomreadpy"><a class="header" href="#lesson-12-urandomreadpy">Lesson 12. urandomread.py</a></h3>
|
||||
<p>Tracing while a <code>dd if=/dev/urandom of=/dev/null bs=8k count=5</code> is run:</p>
|
||||
<p>当运行 <code>dd if=/dev/urandom of=/dev/null bs=8k count=5</code> 时进行跟踪:</p>
|
||||
<pre><code class="language-sh"># urandomread.py
|
||||
TIME(s) COMM PID GOTBITS
|
||||
24652832.956994001 smtp 24690 384
|
||||
@@ -568,11 +560,11 @@ TIME(s) COMM PID GOTBITS
|
||||
24652837.728294998 dd 24692 65536
|
||||
24652837.728888001 dd 24692 65536
|
||||
</code></pre>
|
||||
<p>Hah! I caught smtp by accident. Code is <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/urandomread.py">examples/tracing/urandomread.py</a>:</p>
|
||||
<pre><code class="language-Python">from __future__ import print_function
|
||||
<p>哈!我意外地捕捉到了 smtp。代码在 <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/urandomread.py">examples/tracing/urandomread.py</a> 中:</p>
|
||||
<pre><code class="language-Python">from __future__ import print_function
|
||||
from bcc import BPF
|
||||
|
||||
# load BPF program
|
||||
# 加载BPF程序
|
||||
b = BPF(text="""
|
||||
TRACEPOINT_PROBE(random, urandom_read) {
|
||||
// args is from /sys/kernel/debug/tracing/events/random/urandom_read/format
|
||||
@@ -592,34 +584,34 @@ while 1:
|
||||
continue
|
||||
print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
|
||||
</code></pre>
|
||||
<p>Things to learn:</p>
|
||||
<p>要学到的东西:</p>
|
||||
<ol>
|
||||
<li><code>TRACEPOINT_PROBE(random, urandom_read)</code>: Instrument the kernel tracepoint <code>random:urandom_read</code>. Tracepoints have a stable API, and are therefore recommended over kprobes wherever possible. You can run <code>perf list</code> for a list of tracepoints. Linux >= 4.7 is required to attach BPF programs to tracepoints.</li>
|
||||
<li><code>args->got_bits</code>: <code>args</code> is auto-populated to be a structure of the tracepoint arguments. The comment above says where you can see that structure. Eg:</li>
|
||||
<li><code>TRACEPOINT_PROBE(random, urandom_read)</code>: 对内核跟踪点 <code>random:urandom_read</code> 进行注入。这些具有稳定的API,因此在可能的情况下建议使用它们来代替kprobe。您可以运行 <code>perf list</code> 来获取跟踪点列表。至少需要 Linux 版本 4.7 来将 BPF 程序附加到跟踪点上。</li>
|
||||
<li><code>args->got_bits</code>: <code>args</code> 是自动填充的跟踪点参数结构。上面的注释指出了可以查看这个结构的位置。例如:</li>
|
||||
</ol>
|
||||
<pre><code class="language-sh"># cat /sys/kernel/debug/tracing/events/random/urandom_read/format
|
||||
name: urandom_read
|
||||
ID: 972
|
||||
format:
|
||||
field:unsigned short common_type; offset:0; size:2; signed:0;
|
||||
field:unsigned char common_flags; offset:2; size:1; signed:0;
|
||||
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
|
||||
field:int common_pid; offset:4; size:4; signed:1;
|
||||
field:unsigned short common_type; offset:0; size:2; signed:0;
|
||||
field:unsigned char common_flags; offset:2; size:1; signed:0;
|
||||
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
|
||||
field:int common_pid; offset:4; size:4; signed:1;
|
||||
|
||||
field:int got_bits; offset:8; size:4; signed:1;
|
||||
field:int pool_left; offset:12; size:4; signed:1;
|
||||
field:int input_left; offset:16; size:4; signed:1;
|
||||
field:int got_bits; offset:8; size:4; signed:1;
|
||||
field:int pool_left; offset:12; size:4; signed:1;
|
||||
field:int input_left; offset:16; size:4; signed:1;
|
||||
|
||||
print fmt: "got_bits %d nonblocking_pool_entropy_left %d input_entropy_left %d", REC->got_bits, REC->pool_left, REC->input_left
|
||||
</code></pre>
|
||||
<p>In this case, we were printing the <code>got_bits</code> member.</p>
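<p>When choosing which <code>args-></code> member to read, it can help to list the fields of a tracepoint programmatically. The following is a minimal sketch, assuming the format text looks like the listing above; the helper name <code>parse_format()</code> and the regular expression are illustrative only and are not part of bcc. It parses an abridged copy of that listing rather than reading the file, so it runs without root access.</p>

```python
import re

# Abridged copy of the format listing shown above.
SAMPLE_FORMAT = """\
name: urandom_read
ID: 972
format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:int got_bits; offset:8; size:4; signed:1;
    field:int pool_left; offset:12; size:4; signed:1;
    field:int input_left; offset:16; size:4; signed:1;

print fmt: "got_bits %d", REC->got_bits
"""

# One "field:<type> <name>; offset:N; size:N; signed:N;" entry per match.
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);\s*"
    r"size:(?P<size>\d+);\s*signed:(?P<signed>[01]);"
)


def parse_format(text):
    """Return a list of dicts describing each field in a format listing."""
    fields = []
    for m in FIELD_RE.finditer(text):
        # Split "unsigned short common_type" into type and name.
        ctype, _, name = m.group("decl").strip().rpartition(" ")
        fields.append({
            "name": name,
            "type": ctype,
            "offset": int(m.group("offset")),
            "size": int(m.group("size")),
            "signed": m.group("signed") == "1",
        })
    return fields


if __name__ == "__main__":
    for f in parse_format(SAMPLE_FORMAT):
        # Fields prefixed with common_ are shared by all tracepoints.
        if not f["name"].startswith("common_"):
            print("%-12s %-6s offset=%d size=%d"
                  % (f["name"], f["type"], f["offset"], f["size"]))
```

<p>On a live system, the same function could be given the contents of the <code>format</code> file named in the comment of the BPF program, provided that tracepoint exists on the running kernel.</p>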
|
||||
<h3 id="lesson-13-disksnooppy-fixed"><a class="header" href="#lesson-13-disksnooppy-fixed">Lesson 13. disksnoop.py fixed</a></h3>
|
||||
<p>Convert disksnoop.py from a previous lesson to use the <code>block:block_rq_issue</code> and <code>block:block_rq_complete</code> tracepoints.</p>
|
||||
<h3 id="lesson-14-strlen_countpy"><a class="header" href="#lesson-14-strlen_countpy">Lesson 14. strlen_count.py</a></h3>
|
||||
<p>This program instruments a user-level function, the <code>strlen()</code> library function, and frequency counts its string argument. Example output:</p>
|
||||
<p>在这种情况下,我们正在打印 <code>got_bits</code> 成员。</p>
|
||||
<h3 id="第13课-disksnooppy已修复"><a class="header" href="#第13课-disksnooppy已修复">第13课. disksnoop.py已修复</a></h3>
|
||||
<p>将上一课的 disksnoop.py 修改为使用 <code>block:block_rq_issue</code> 和 <code>block:block_rq_complete</code> 跟踪点。</p>
|
||||
<h3 id="第14课-strlen_countpy"><a class="header" href="#第14课-strlen_countpy">Lesson 14. strlen_count.py</a></h3>
|
||||
<p>This program instruments a user-level function, the <code>strlen()</code> library function, and counts how often each string argument occurs. Example output:</p>
|
||||
<pre><code class="language-sh"># strlen_count.py
|
||||
Tracing strlen()... Hit Ctrl-C to end.
|
||||
^C COUNT STRING
|
||||
跟踪 strlen()... 按 Ctrl-C 结束。
|
||||
^C 数量 字符串
|
||||
1 " "
|
||||
1 "/bin/ls"
|
||||
1 "."
|
||||
@@ -637,13 +629,13 @@ Tracing strlen()... Hit Ctrl-C to end.
|
||||
70 "#%^,~:-=?+/}"
|
||||
340 "\x01\x1b]0;root@bgregg-test: ~\x07\x02root@bgregg-test:~# "
|
||||
</code></pre>
|
||||
<p>These are various strings that are being processed by this library function while tracing, along with their frequency counts. <code>strlen()</code> was called on "LC_ALL" 12 times, for example.</p>
|
||||
<p>Code is <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/strlen_count.py">examples/tracing/strlen_count.py</a>:</p>
|
||||
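<p>These are the various strings processed by this library function while tracing, along with their frequency counts. For example, <code>strlen()</code> was called on "LC_ALL" 12 times.</p>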
|
||||
<p>代码在 <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/strlen_count.py">examples/tracing/strlen_count.py</a> 中:</p>
|
||||
<pre><code class="language-Python">from __future__ import print_function
|
||||
from bcc import BPF
|
||||
from time import sleep
|
||||
|
||||
# load BPF program
|
||||
# 载入 BPF 程序
|
||||
b = BPF(text="""
|
||||
#include <uapi/linux/ptrace.h>
|
||||
|
||||
@@ -660,7 +652,7 @@ int count(struct pt_regs *ctx) {
|
||||
u64 zero = 0, *val;
|
||||
|
||||
bpf_probe_read_user(&key.c, sizeof(key.c), (void *)PT_REGS_PARM1(ctx));
|
||||
// could also use `counts.increment(key)`
|
||||
// 也可以使用 `counts.increment(key)`
|
||||
val = counts.lookup_or_try_init(&key, &zero);
|
||||
if (val) {
|
||||
(*val)++;
|
||||
@@ -670,35 +662,34 @@ int count(struct pt_regs *ctx) {
|
||||
""")
|
||||
b.attach_uprobe(name="c", sym="strlen", fn_name="count")
|
||||
|
||||
# header
|
||||
print("Tracing strlen()... Hit Ctrl-C to end.")
|
||||
# 头部
|
||||
print("跟踪 strlen()... 按 Ctrl-C 结束。")
|
||||
|
||||
# sleep until Ctrl-C
|
||||
# 睡眠直到按下 Ctrl-C
|
||||
try:
|
||||
sleep(99999999)
|
||||
except KeyboardInterrupt:
|
||||
pass
|
||||
|
||||
# print output
|
||||
print("%10s %s" % ("COUNT", "STRING"))
|
||||
# 打印输出
|
||||
print("%10s %s" % ("数量", "字符串"))
|
||||
counts = b.get_table("counts")
|
||||
for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
|
||||
print("%10d \"%s\"" % (v.value, k.c.encode('string-escape')))
|
||||
</code></pre>
|
||||
<p>Things to learn:</p>
|
||||
<p>Things to learn:</p>
|
||||
<ol>
|
||||
<li><code>PT_REGS_PARM1(ctx)</code>: This fetches the first argument to <code>strlen()</code>, which is the string.</li>
|
||||
<li><code>PT_REGS_PARM1(ctx)</code>: This fetches the first argument to <code>strlen()</code>, which is the string.</li>
|
||||
<li><code>b.attach_uprobe(name="c", sym="strlen", fn_name="count")</code>: Attach to library "c" (if this is the main program, use its pathname), instrument the user-level function <code>strlen()</code>, and on execution call our C function <code>count()</code>.</li>
|
||||
<li><code>b.attach_uprobe(name="c", sym="strlen", fn_name="count")</code>: 附加到库 "c"(如果这是主程序,则使用其路径名),对用户级函数 <code>strlen()</code> 进行插装,并在执行时调用我们的 C 函数 <code>count()</code>。</li>
|
||||
</ol>
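<p>The counting pattern used in the BPF program (look up a key, initialise it to zero if it is missing, then increment it) and the sorted report printed afterwards can be sketched in plain Python. This is only an illustration of the logic: a <code>dict</code> stands in for the BPF hash, the function names <code>count_strings()</code> and <code>report()</code> are hypothetical, and the input strings are made up rather than traced.</p>

```python
def count_strings(observed):
    """Frequency-count strings, mirroring lookup_or_try_init() + (*val)++."""
    counts = {}
    for s in observed:
        # Fetch the current count, initialising it to zero if the key is new.
        val = counts.setdefault(s, 0)
        counts[s] = val + 1
    return counts


def report(counts):
    """Return report lines sorted by ascending count, as the example prints."""
    lines = ["%10s %s" % ("COUNT", "STRING")]
    for key, value in sorted(counts.items(), key=lambda kv: kv[1]):
        lines.append('%10d "%s"' % (value, key))
    return lines


if __name__ == "__main__":
    # Made-up stand-ins for the strings strlen() would be called on.
    observed = ["LC_ALL", "/bin/ls", "LC_ALL", ".", "LC_ALL"]
    for line in report(count_strings(observed)):
        print(line)
```

<p>Sorting by ascending count places the most frequent strings at the bottom of the output, where they remain visible in a terminal after a long listing.</p>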
|
||||
<h3 id="lesson-15-nodejs_http_serverpy"><a class="header" href="#lesson-15-nodejs_http_serverpy">Lesson 15. nodejs_http_server.py</a></h3>
|
||||
<p>This program instruments a user statically-defined tracing (USDT) probe, which is the user-level version of a kernel tracepoint. Sample output:</p>
|
||||
<h3 id="第15课nodejs_http_serverpy"><a class="header" href="#第15课nodejs_http_serverpy">Lesson 15. nodejs_http_server.py</a></h3>
|
||||
<p>本程序会对用户静态定义的跟踪 (USDT) 探测点进行插装,这是内核跟踪点的用户级版本。示例输出:</p>
|
||||
<pre><code class="language-sh"># nodejs_http_server.py 24728
|
||||
TIME(s) COMM PID ARGS
|
||||
24653324.561322998 node 24728 path:/index.html
|
||||
24653335.343401998 node 24728 path:/images/welcome.png
|
||||
24653340.510164998 node 24728 path:/images/favicon.png
|
||||
</code></pre>
|
||||
<p>Relevant code from <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/nodejs_http_server.py">examples/tracing/nodejs_http_server.py</a>:</p>
|
||||
<p>来自 <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/nodejs_http_server.py">examples/tracing/nodejs_http_server.py</a> 的相关代码:</p>
|
||||
<pre><code class="language-Python">from __future__ import print_function
|
||||
from bcc import BPF, USDT
|
||||
import sys
|
||||
@@ -732,26 +723,17 @@ if debug:
|
||||
# initialize BPF
|
||||
b = BPF(text=bpf_text, usdt_contexts=[u])
|
||||
</code></pre>
|
||||
<p>Things to learn:</p>
|
||||
<p>学习内容:</p>
|
||||
<ol>
|
||||
<li><code>bpf_usdt_readarg(6, ctx, &addr)</code>: Read the address of argument 6 from the USDT probe into <code>addr</code>.</li>
|
||||
<li><code>bpf_probe_read_user(&path, sizeof(path), (void *)addr)</code>: Now read the string that <code>addr</code> points to into our <code>path</code> variable.</li>
|
||||
<li><code>u = USDT(pid=int(pid))</code>: Initialize USDT tracing for the given PID.</li>
|
||||
<li><code>u.enable_probe(probe="http__server__request", fn_name="do_trace")</code>: Attach our <code>do_trace()</code> BPF C function to the Node.js <code>http__server__request</code> USDT probe.</li>
|
||||
<li><code>b = BPF(text=bpf_text, usdt_contexts=[u])</code>: Need to pass in our USDT object, <code>u</code>, to BPF object creation.</li>
|
||||
<li><code>bpf_usdt_readarg(6, ctx, &addr)</code>: 从 USDT 探测点中读取参数 6 的地址到 <code>addr</code>。</li>
|
||||
<li><code>bpf_probe_read_user(&path, sizeof(path), (void *)addr)</code>: Now read the string that <code>addr</code> points to into our <code>path</code> variable.</li>
|
||||
<li><code>u = USDT(pid=int(pid))</code>: Initialize USDT tracing for the given PID.</li>
|
||||
<li><code>u.enable_probe(probe="http__server__request", fn_name="do_trace")</code>: Attach our <code>do_trace()</code> BPF C function to the Node.js <code>http__server__request</code> USDT probe.</li>
|
||||
<li><code>b = BPF(text=bpf_text, usdt_contexts=[u])</code>: 需要将我们的 USDT 对象 <code>u</code> 传递给 BPF 对象的创建。</li>
|
||||
</ol>
|
||||
<h3 id="lesson-16-task_switchc"><a class="header" href="#lesson-16-task_switchc">Lesson 16. task_switch.c</a></h3>
|
||||
<p>This is an older tutorial included as a bonus lesson. Use this for recap and to reinforce what you've already learned.</p>
|
||||
<p>This is a slightly more complex tracing example than Hello World. This program
|
||||
will be invoked for every task change in the kernel, and record in a BPF map
|
||||
the new and old pids.</p>
|
||||
<p>The C program below introduces a new concept: the prev argument. This
|
||||
argument is treated specially by the BCC frontend, such that accesses
|
||||
to this variable are read from the saved context that is passed by the
|
||||
kprobe infrastructure. The prototype of the args starting from
|
||||
position 1 should match the prototype of the kernel function being
|
||||
kprobed. If done so, the program will have seamless access to the
|
||||
function parameters.</p>
|
||||
<h3 id="第16课-task_switchc"><a class="header" href="#第16课-task_switchc">第16课. task_switch.c</a></h3>
|
||||
<p>这是一个早期的教程,作为额外的课程包含其中。用它来复习和加深你已经学到的内容。</p>
|
||||
<p>这是一个比 Hello World 更复杂的示例程序。该程序将在内核中每次任务切换时被调用,并在一个 BPF 映射中记录新旧进程的 pid。</p>
|
||||
<p>下面的 C 程序引入了一个新的概念:prev 参数。BCC 前端会特殊处理这个参数,从而使得对这个变量的访问从由 kprobe 基础设施传递的保存上下文中进行读取。从位置1开始的参数的原型应该与被 kprobed 的内核函数的原型匹配。如果这样做,程序就可以无缝访问函数参数。</p>
|
||||
<pre><code class="language-c">#include <uapi/linux/ptrace.h>
|
||||
#include <linux/sched.h>
|
||||
|
||||
@@ -776,30 +758,27 @@ int count_sched(struct pt_regs *ctx, struct task_struct *prev) {
|
||||
return 0;
|
||||
}
|
||||
</code></pre>
|
||||
<p>The userspace component loads the file shown above, and attaches it to the
|
||||
<code>finish_task_switch</code> kernel function.
|
||||
The <code>[]</code> operator of the BPF object gives access to each BPF_HASH in the
|
||||
program, allowing pass-through access to the values residing in the kernel. Use
|
||||
the object as you would any other Python dict object: reads, updates, and deletes
|
||||
are all allowed.</p>
|
||||
<p>用户空间组件加载上面显示的文件,并将其附加到 <code>finish_task_switch</code> 内核函数上。
|
||||
BPF 对象的 <code>[]</code> 运算符允许访问程序中的每个 BPF_HASH,允许对内核中的值进行通行访问。可以像使用任何其他 python dict 对象一样使用该对象:读取、更新和删除操作都是允许的。</p>
|
||||
<pre><code class="language-python">from bcc import BPF
|
||||
from time import sleep
|
||||
|
||||
b = BPF(src_file="task_switch.c")
|
||||
b = BPF(src_file="task_switch.c")
|
||||
b.attach_kprobe(event="finish_task_switch", fn_name="count_sched")
|
||||
|
||||
# generate many schedule events
|
||||
# 生成多个调度事件
|
||||
for i in range(0, 100): sleep(0.01)
|
||||
|
||||
for k, v in b["stats"].items():
|
||||
print("task_switch[%5d->%5d]=%u" % (k.prev_pid, k.curr_pid, v.value))
|
||||
</code></pre>
|
||||
<p>These programs can be found in the files <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/task_switch.c">examples/tracing/task_switch.c</a> and <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/task_switch.py">examples/tracing/task_switch.py</a> respectively.</p>
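<p>The dict-style access described for this lesson can be tried without loading a BPF program. In the sketch below, a plain Python <code>dict</code> stands in for <code>b["stats"]</code>; the names <code>SwitchKey</code>, <code>record_switch()</code> and <code>format_stats()</code> are hypothetical, and the PIDs are made up. With a real bcc table the values are objects read through <code>.value</code>, as in the loop of the Python program for this lesson, rather than plain integers.</p>

```python
from collections import namedtuple

# Hypothetical stand-in for the key struct used by the BPF program.
SwitchKey = namedtuple("SwitchKey", ["prev_pid", "curr_pid"])


def record_switch(stats, prev_pid, curr_pid):
    """Count one task switch from prev_pid to curr_pid."""
    key = SwitchKey(prev_pid, curr_pid)
    stats[key] = stats.get(key, 0) + 1


def format_stats(stats):
    """Format entries in the same layout as the lesson's print statement."""
    return ["task_switch[%5d->%5d]=%d" % (k.prev_pid, k.curr_pid, v)
            for k, v in sorted(stats.items())]


if __name__ == "__main__":
    stats = {}  # plain dict standing in for b["stats"]
    running = [0, 100, 0, 200, 0, 100]  # made-up sequence of running PIDs
    for prev_pid, curr_pid in zip(running, running[1:]):
        record_switch(stats, prev_pid, curr_pid)

    print(stats[SwitchKey(0, 100)])   # read one entry
    stats[SwitchKey(0, 100)] = 0      # update one entry
    del stats[SwitchKey(200, 0)]      # delete one entry

    for line in format_stats(stats):
        print(line)
```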
|
||||
<h3 id="lesson-17-further-study"><a class="header" href="#lesson-17-further-study">Lesson 17. Further Study</a></h3>
|
||||
<p>For further study, see Sasha Goldshtein's <a href="https://github.com/goldshtn/linux-tracing-workshop">linux-tracing-workshop</a>, which contains additional labs. There are also many tools in bcc /tools to study.</p>
|
||||
<p>Please read <a href="https://github.com/iovisor/bcc/tree/master/CONTRIBUTING-SCRIPTS.md">CONTRIBUTING-SCRIPTS.md</a> if you wish to contribute tools to bcc. At the bottom of the main <a href="https://github.com/iovisor/bcc/tree/master/README.md">README.md</a>, you'll also find methods for contacting us. Good luck, and happy tracing!</p>
|
||||
<h2 id="networking"><a class="header" href="#networking">Networking</a></h2>
|
||||
<p>To do.</p>
|
||||
<p>这些程序可以在文件 <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/task_switch.c">examples/tracing/task_switch.c</a> 和 <a href="https://github.com/iovisor/bcc/tree/master/examples/tracing/task_switch.py">examples/tracing/task_switch.py</a> 中找到。</p>
|
||||
<h3 id="第17课-进一步研究"><a class="header" href="#第17课-进一步研究">第17课. 进一步研究</a></h3>
|
||||
<p>要进行进一步研究,请参阅 Sasha Goldshtein 的 <a href="https://github.com/goldshtn/linux-tracing-workshop">linux-tracing-workshop</a>,其中包含了额外的实验。bcc/tools 中还有许多工具可供研究。</p>
|
||||
<p>如果您希望为 bcc 贡献工具,请阅读 <a href="https://github.com/iovisor/bcc/tree/master/CONTRIBUTING-SCRIPTS.md">CONTRIBUTING-SCRIPTS.md</a>。在主要的 <a href="https://github.com/iovisor/bcc/tree/master/README.md">README.md</a> 的底部,您还会找到与我们联系的方法。祝您好运,祝您成功追踪!</p>
|
||||
<h2 id="网络"><a class="header" href="#网络">网络</a></h2>
|
||||
<p>TODO</p>
|
||||
|
||||
</main>
|
||||
|
||||
|
||||