mirror of
https://github.com/eunomia-bpf/bpf-developer-tutorial.git
synced 2026-02-04 18:54:35 +08:00
Deploying to gh-pages from @ eunomia-bpf/bpf-developer-tutorial@483f2fc223 🚀
<div id="content" class="content">
<main>
<h1 id="bcc-tutorial"><a class="header" href="#bcc-tutorial">bcc Tutorial</a></h1>
<p>This tutorial covers how to use <a href="https://github.com/iovisor/bcc">bcc</a> tools to quickly solve performance, troubleshooting, and networking issues. If you want to develop new bcc tools, see <a href="tutorial_bcc_python_developer.html">tutorial_bcc_python_developer.md</a> instead.</p>
<p>This tutorial assumes that bcc is already installed and that you can run tools like execsnoop successfully; see <a href="https://github.com/iovisor/bcc/tree/master/INSTALL.md">INSTALL.md</a>. bcc relies on enhancements added in the Linux 4.x series.</p>
<h2 id="observability"><a class="header" href="#observability">Observability</a></h2>
<p>Some quick wins.</p>
<h3 id="0-before-bcc"><a class="header" href="#0-before-bcc">0. Before bcc</a></h3>
<p>Before using bcc, you should start with the Linux basics. One reference is the <a href="https://netflixtechblog.com/linux-performance-analysis-in-60-000-milliseconds-accc10403c55">Linux Performance Analysis in 60,000 Milliseconds</a> post, which covers these commands:</p>
<ol>
<li>uptime</li>
<li>dmesg | tail</li>
<li>vmstat 1</li>
<li>mpstat -P ALL 1</li>
<li>pidstat 1</li>
<li>iostat -xz 1</li>
<li>free -m</li>
<li>sar -n DEV 1</li>
<li>sar -n TCP,ETCP 1</li>
<li>top</li>
</ol>
<h3 id="1-general-performance"><a class="header" href="#1-general-performance">1. General Performance</a></h3>
<p>Here is a generic checklist for performance investigations with bcc, first as a list, then in detail:</p>
<ol>
<li>execsnoop</li>
<li>opensnoop</li>
<li>ext4slower (or btrfs*, xfs*, zfs*)</li>
<li>biolatency</li>
<li>biosnoop</li>
<li>cachestat</li>
<li>tcpconnect</li>
<li>tcpaccept</li>
<li>tcpretrans</li>
<li>runqlat</li>
<li>profile</li>
</ol>
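<p>The checklist above can be scripted as a first pass over a system. The following is a minimal sketch rather than part of bcc: <code>run_checklist</code> and its arguments are hypothetical. It assumes that the tools are installed as executables in a single directory, that it runs as root, and that each tool prints its summary when it receives SIGINT, as it does when you press Ctrl-C.</p>

```shell
# Run each checklist tool for a fixed number of seconds, saving its output
# to OUTDIR/TOOL.txt. Hypothetical helper: the tool directory, duration, and
# output layout are illustrative choices. Stopping with SIGINT mimics pressing
# Ctrl-C, which makes histogram tools such as biolatency print their summaries.
run_checklist() {
    tooldir=$1      # e.g. /usr/share/bcc/tools
    secs=$2         # how long to run each tool
    outdir=$3       # where to save the output
    mkdir -p "$outdir" || return 1
    for tool in execsnoop opensnoop ext4slower biolatency biosnoop cachestat \
                tcpconnect tcpaccept tcpretrans runqlat profile; do
        if [ -x "$tooldir/$tool" ]; then
            # timeout exits with 124 when it had to stop the tool; that is expected.
            timeout -s INT "$secs" "$tooldir/$tool" > "$outdir/$tool.txt" 2>&1 || true
        else
            echo "skipping $tool: not found in $tooldir" >&2
        fi
    done
}
```

<p>For example, <code>run_checklist /usr/share/bcc/tools 10 /tmp/bcc-checklist</code> spends about two minutes collecting output. Running the tools one at a time keeps overhead low, but their outputs then cover different time windows.</p>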
<p>These tools may be installed on your system under /usr/share/bcc/tools, or you can run them from the bcc GitHub repository under /tools, where they have a .py extension. Browse the 50+ available tools for more analysis options.</p>
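<p>Because the same tool can live in different places, a small lookup helper can save some typing. This is a sketch that relies on a few assumptions. <code>find_bcc_tool</code> and the <code>BCC_REPO</code> variable (the path of a bcc git checkout) are hypothetical names. The last location reflects the Debian/Ubuntu <code>bpfcc-tools</code> packaging, which installs the tools with a <code>-bpfcc</code> suffix (for example, <code>execsnoop-bpfcc</code>).</p>

```shell
# Print the path of a bcc tool, trying (in this assumed order):
#   1. /usr/share/bcc/tools/NAME   default bcc install location
#   2. $BCC_REPO/tools/NAME.py     a bcc git checkout (BCC_REPO is hypothetical)
#   3. NAME-bpfcc on $PATH         Debian/Ubuntu bpfcc-tools naming
# Returns non-zero if the tool is not found.
find_bcc_tool() {
    name=$1
    if [ -x "/usr/share/bcc/tools/$name" ]; then
        echo "/usr/share/bcc/tools/$name"
    elif [ -n "${BCC_REPO:-}" ] && [ -f "$BCC_REPO/tools/$name.py" ]; then
        echo "$BCC_REPO/tools/$name.py"
    elif command -v "$name-bpfcc" >/dev/null 2>&1; then
        command -v "$name-bpfcc"
    else
        return 1
    fi
}
```

<p>For example: <code>sudo "$(find_bcc_tool execsnoop)"</code>.</p>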
<h4 id="11-execsnoop"><a class="header" href="#11-execsnoop">1.1. execsnoop</a></h4>
<pre><code class="language-sh"># ./execsnoop
PCOMM PID RET ARGS
[...]
mkdir 9662 0 /bin/mkdir -p ./main
run 9663 0 ./run
[...]
</code></pre>
<p>execsnoop prints one line of output for each new process. Check for short-lived processes: they can consume CPU resources but do not show up in most monitoring tools, which periodically take snapshots of the processes that are running.</p>
<p>It works by tracing exec(), not fork(), so it catches many types of new processes but not all. For example, it won't see an application launching worker processes that don't exec() anything else.</p>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/execsnoop_example.txt">examples</a>.</p>
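<p>To see the exec()/fork() distinction for yourself, run a small workload in one terminal while execsnoop traces in another. This is a sketch: <code>exec_vs_fork_demo</code> is a hypothetical name. It assumes a shell such as bash or dash, where a <code>( : )</code> subshell is a fork() without an exec(). Each <code>/bin/sh -c :</code> call, by contrast, forks and then execs a new program, so only those calls should appear in execsnoop's output.</p>

```shell
# Hypothetical workload for trying execsnoop. For each iteration:
#   /bin/sh -c :   fork() + exec() of /bin/sh   -> expected to appear in execsnoop
#   ( : )          fork() of a subshell only    -> expected NOT to appear
exec_vs_fork_demo() {
    n=${1:-3}
    i=0
    while [ "$i" -lt "$n" ]; do
        /bin/sh -c :
        ( : )
        i=$((i + 1))
    done
    echo "ran $n exec'd commands and $n fork-only subshells"
}
```

<p>With <code>exec_vs_fork_demo 3</code> running, execsnoop should print three <code>sh</code> lines and nothing for the three subshells.</p>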
<h4 id="12-opensnoop"><a class="header" href="#12-opensnoop">1.2. opensnoop</a></h4>
<pre><code class="language-sh"># ./opensnoop
PID COMM FD ERR PATH
[...]
1603 snmpd 11 0 /proc/sys/net/ipv6/conf/eth0/forwarding
[...]
</code></pre>
<p>opensnoop prints one line of output for each open() syscall, with details such as the process, the returned file descriptor, the error code, and the path.</p>
<p>The files an application opens can tell you a lot about how it works: they identify its data files, config files, and log files. Sometimes applications misbehave and perform poorly because they constantly attempt to read files that do not exist. opensnoop gives you a quick look.</p>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/opensnoop_example.txt">examples</a>.</p>
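<p>When the output is long, it helps to save it to a file and summarize the failed opens. This minimal sketch assumes the default column layout shown above (PID COMM FD ERR PATH). It also assumes that a failed open() is reported with FD -1 and a non-zero errno in ERR, where 2 is ENOENT ("No such file or directory"). <code>failed_opens</code> is a hypothetical helper name. Recent opensnoop versions can also show only failed opens themselves (an <code>-x</code> option); check <code>./opensnoop -h</code> on your system.</p>

```shell
# Count failed opens per (process name, path) in saved opensnoop output,
# most frequent first. Rows are recognized by a numeric PID in column 1;
# paths containing spaces would be truncated at the first space.
failed_opens() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 5 && $4 != 0 { n[$2 " " $5]++ }
         END { for (k in n) print n[k], k }' "$@" | sort -rn
}
```

<p>For example, run <code>./opensnoop > opens.txt</code> (stop with Ctrl-C), then <code>failed_opens opens.txt | head</code>.</p>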
<h4 id="13-ext4slower-or-btrfs-xfs-zfs"><a class="header" href="#13-ext4slower-or-btrfs-xfs-zfs">1.3. ext4slower (or btrfs*, xfs*, zfs*)</a></h4>
<pre><code class="language-sh"># ./ext4slower
Tracing ext4 operations slower than 10 ms
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
06:35:01 cron 16464 R 1249 0 16.05 common-auth
06:35:01 cron 16463 R 1249 0 16.04 common-auth
06:35:01 cron 16465 R 1249 0 16.03 common-auth
06:35:01 cron 16465 R 4096 0 10.62 login.defs
06:35:01 cron 16464 R 4096 0 10.61 login.defs
</code></pre>
<p>ext4slower traces the ext4 file system, times common operations, and prints only those that exceed a threshold.</p>
<p>This is great for identifying or exonerating one type of performance issue: individually slow disk I/O via the file system. Disks process I/O asynchronously, and it can be difficult to associate latency at that layer with the latency applications experience. Tracing higher up in the kernel stack, at the VFS -> file system interface, more closely matches what an application suffers. Use this tool to identify whether file system latency exceeds a given threshold.</p>
<p>Similar tools exist in bcc for other file systems: btrfsslower, xfsslower, and zfsslower. There is also fileslower, which works at the VFS layer and traces everything (although at somewhat higher overhead).</p>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/ext4slower_example.txt">examples</a>.</p>
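<p>Saved ext4slower output can be condensed per file, which makes repeat offenders such as <code>common-auth</code> above stand out. This sketch assumes the default column layout shown above (TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME) and filenames without spaces. <code>summarize_slow_files</code> is a hypothetical name. ext4slower also accepts the threshold in milliseconds as an argument (for example, <code>./ext4slower 1</code>); check <code>./ext4slower -h</code>.</p>

```shell
# Summarize saved ext4slower output per file: the number of slow operations
# and the worst latency (ms). Data rows are recognized by the HH:MM:SS time
# in column 1; latency is column 7 and the filename is column 8.
summarize_slow_files() {
    awk '$1 ~ /^[0-9]+:[0-9]+:[0-9]+$/ && NF >= 8 {
             lat = $7 + 0; f = $8
             n[f]++
             if (!(f in max) || lat > max[f]) max[f] = lat
         }
         END { for (f in n) printf "%s %d %.2f\n", f, n[f], max[f] }' "$@" | sort
}
```

<p>For example, run <code>./ext4slower 1 > slow.txt</code> (stop with Ctrl-C), then <code>summarize_slow_files slow.txt</code>.</p>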
<h4 id="14-biolatency"><a class="header" href="#14-biolatency">1.4. biolatency</a></h4>
<pre><code class="language-sh"># ./biolatency
Tracing block device I/O... Hit Ctrl-C to end.
^C
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
[...]
2048 -> 4095 : 47 |********************************** |
4096 -> 8191 : 52 |**************************************|
8192 -> 16383 : 36 |************************** |
16384 -> 32767 : 15 |********** |
32768 -> 65535 : 2 |* |
65536 -> 131071 : 2 |* |
</code></pre>
<p>biolatency traces disk I/O latency (time from device issue to completion), and when the tool ends (Ctrl-C, or a given interval), it prints a histogram summary of the latency.</p>
<p>This is great for understanding disk I/O latency beyond the average times given by tools like iostat. I/O latency outliers will be visible at the end of the distribution, as will multimodal distributions.</p>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/biolatency_example.txt">examples</a>.</p>
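<p>The histogram rows are power-of-two buckets. After the first row, which holds both 0 and 1, each row counts latencies from a power of two up to one less than the next power of two. The sketch below reproduces that bucketing in shell arithmetic. It illustrates the idea behind the output and is not taken from biolatency's source; <code>log2_bucket</code> and <code>latency_histogram</code> are hypothetical names.</p>

```shell
# Print the "low -> high" histogram row that a non-negative integer latency
# (e.g. in microseconds) falls into: 0 and 1 share the first row, and after
# that each row spans one power of two, as in the output above.
log2_bucket() {
    v=$1
    if [ "$v" -le 1 ]; then
        echo "0 -> 1"
        return 0
    fi
    low=2
    while [ $((low * 2)) -le "$v" ]; do
        low=$((low * 2))
    done
    echo "$low -> $((low * 2 - 1))"
}

# Build a small histogram (count, then row) from latencies read one per line.
latency_histogram() {
    while read -r lat; do
        log2_bucket "$lat"
    done | sort -n | uniq -c
}
```

<p>For example, <code>log2_bucket 5000</code> prints <code>4096 -> 8191</code>, the row with the highest count above.</p>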
<h4 id="15-biosnoop"><a class="header" href="#15-biosnoop">1.5. biosnoop</a></h4>
<pre><code class="language-sh"># ./biosnoop
TIME(s) COMM PID DISK T SECTOR BYTES LAT(ms)
[...]
1.022568002 supervise 1950 xvda1 W 13188496 4096 0.93
[...]
</code></pre>
<p>biosnoop prints a line of output for each disk I/O, with details including latency (time from device issue to completion).</p>
<p>This allows you to examine disk I/O in more detail and look for time-ordered patterns (e.g., reads queueing behind writes). Note that the output will be verbose if your system performs disk I/O at a high rate.</p>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/biosnoop_example.txt">examples</a>.</p>
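<p>Because the output can be verbose, a per-type summary is a useful first step before reading it line by line. This sketch assumes the default columns shown above (TIME(s) COMM PID DISK T SECTOR BYTES LAT(ms)). <code>summarize_bio</code> is a hypothetical name; if your biosnoop version prints extra columns, adjust the field numbers.</p>

```shell
# Condense saved biosnoop output into per-type (R/W) counts and average
# latency in ms. Data rows are recognized by a numeric timestamp in column 1;
# the type is column 5 and the latency is column 8.
summarize_bio() {
    awk '$1 ~ /^[0-9.]+$/ && NF >= 8 { n[$5]++; sum[$5] += $8 }
         END { for (t in n) printf "%s %d %.2f\n", t, n[t], sum[t] / n[t] }' "$@" | sort
}
```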
<h4 id="16-cachestat"><a class="header" href="#16-cachestat">1.6. cachestat</a></h4>
<pre><code class="language-sh"># ./cachestat
HITS MISSES DIRTIES READ_HIT% WRITE_HIT% BUFFERS_MB CACHED_MB
1074 44 13 94.9% 2.9% 1 223
2195 170 8 92.5% 6.8% 1 143
182 53 56 53.6% 1.3% 1 143
62480 40960 20480 40.6% 19.8% 1 223
7 2 5 22.2% 22.2% 1 223
348 0 0 100.0% 0.0% 1 223
[...]
</code></pre>
<p>cachestat prints a one-line summary every second (or every custom interval) showing statistics from the file system cache.</p>
<p>Use this to identify a low cache hit ratio and a high rate of misses, which gives one lead for performance tuning.</p>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/cachestat_example.txt">examples</a>.</p>
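<p>To flag intervals with a low hit ratio in saved output, you can compute a plain ratio, HITS / (HITS + MISSES), per line. This is a sketch: <code>cache_hit_ratio</code> and its default threshold of 90% are hypothetical choices. cachestat computes its READ_HIT% and WRITE_HIT% columns differently, so the numbers will not match them. For the first row above, the sketch reports 96.1%, while the tool shows 94.9%.</p>

```shell
# Print HITS / (HITS + MISSES) as a percentage for each data line of saved
# cachestat output read from stdin, appending " LOW" when it is below MIN
# (default 90). Note: this is not how cachestat computes READ_HIT%/WRITE_HIT%.
cache_hit_ratio() {
    awk -v min="${1:-90}" '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
        total = $1 + $2
        if (total == 0) next
        pct = 100 * $1 / total
        printf "%.1f%%%s\n", pct, (pct < min + 0 ? " LOW" : "")
    }'
}
```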
<h4 id="17-tcpconnect"><a class="header" href="#17-tcpconnect">1.7. tcpconnect</a></h4>
<pre><code class="language-sh"># ./tcpconnect
PID COMM IP SADDR DADDR DPORT
[...]
2015 ssh 6 fe80::2000:bff:fe82:3ac fe80::2000:bff:fe82:3ac 22
[...]
</code></pre>
<p>tcpconnect prints one line of output for every active TCP connection (e.g., via connect()), with details including source and destination addresses.</p>
<p>Look for unexpected connections that may point to inefficiencies in application configuration, or to an intruder.</p>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/tcpconnect_example.txt">examples</a>.</p>
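<p>One way to spot unexpected connections is to compare destination ports against the ports an application is supposed to use. This sketch assumes the default columns shown above (PID COMM IP SADDR DADDR DPORT). <code>unexpected_connects</code> and its default port list are hypothetical; set the list for your own hosts.</p>

```shell
# Print rows of saved tcpconnect output (read from stdin) whose destination
# port (DPORT, column 6) is not in ALLOWED, a space-separated list of ports.
# The default list is only an example.
unexpected_connects() {
    awk -v allowed="${1:-22 80 443}" '
        BEGIN { n = split(allowed, ports, " "); for (i = 1; i <= n; i++) ok[ports[i]] = 1 }
        $1 ~ /^[0-9]+$/ && NF >= 6 && !($6 in ok)'
}
```

<p>For example, run <code>./tcpconnect > connects.txt</code> (stop with Ctrl-C), then <code>unexpected_connects "22 443" < connects.txt</code>.</p>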
<h4 id="18-tcpaccept"><a class="header" href="#18-tcpaccept">1.8. tcpaccept</a></h4>
<pre><code class="language-sh"># ./tcpaccept
PID COMM IP RADDR LADDR LPORT
[...]
5389 perl 6 1234:ab12:2040:5020:2299:0:5:0 1234:ab12:2040:5020:2299:0:5:0 7001
[...]
</code></pre>
<p>tcpaccept prints one line of output for every passive TCP connection (e.g., via accept()), with details including the remote and local addresses.</p>
<p>Look for unexpected connections that may point to inefficiencies in application configuration, or to an intruder.</p>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/tcpaccept_example.txt">examples</a>.</p>
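<p>For accepted connections, counting by client makes an unexpected peer or an unusually busy one easy to see. This sketch assumes the default columns shown above (PID COMM IP RADDR LADDR LPORT); <code>accepts_by_client</code> is a hypothetical name.</p>

```shell
# Count accepted connections per remote address (RADDR, column 4) and local
# port (LPORT, column 6) in saved tcpaccept output, most frequent first.
accepts_by_client() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 6 { n[$4 " -> port " $6]++ }
         END { for (k in n) print n[k], k }' "$@" | sort -rn
}
```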
<h4 id="19-tcpretrans"><a class="header" href="#19-tcpretrans">1.9. tcpretrans</a></h4>
<pre><code class="language-sh"># ./tcpretrans
TIME PID IP LADDR:LPORT T> RADDR:RPORT STATE
01:55:05 0 4 10.153.223.157:22 R> 69.53.245.40:34619 ESTABLISHED
01:55:05 0 4 10.153.223.157:22 R> 69.53.245.40:34619 ESTABLISHED
01:55:17 0 4 10.153.223.157:22 R> 69.53.245.40:22957 ESTABLISHED
[...]
</code></pre>
<p>tcpretrans prints one line of output for every TCP retransmit packet, with details including source and destination addresses and the kernel state of the TCP connection.</p>
<p>TCP retransmissions cause latency and throughput issues. For retransmits in the ESTABLISHED state, look for patterns related to the network. For retransmits in the SYN_SENT state, the cause may be CPU saturation in the target's kernel and kernel packet drops.</p>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/tcpretrans_example.txt">examples</a>.</p>
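<p>Grouping retransmits by remote host and state helps separate a problem on one network path from a general one. This sketch assumes the default columns shown above (TIME PID IP LADDR:LPORT T> RADDR:RPORT STATE); <code>retrans_by_peer</code> is a hypothetical name.</p>

```shell
# Count retransmits per remote host and TCP state in saved tcpretrans output,
# most frequent first. Data rows are recognized by the HH:MM:SS time in
# column 1; RADDR:RPORT is column 6 and STATE is column 7.
retrans_by_peer() {
    awk '$1 ~ /^[0-9]+:[0-9]+:[0-9]+$/ && NF >= 7 {
             peer = $6
             sub(/:[0-9]+$/, "", peer)     # drop the remote port
             n[peer " " $7]++
         }
         END { for (k in n) print n[k], k }' "$@" | sort -rn
}
```

<p>Applied to the three lines above, it prints <code>3 69.53.245.40 ESTABLISHED</code>.</p>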
<h4 id="110-runqlat"><a class="header" href="#110-runqlat">1.10. runqlat</a></h4>
<pre><code class="language-sh"># ./runqlat
Tracing run queue latency... Hit Ctrl-C to end.
^C
usecs : count distribution
0 -> 1 : 233 |*********** |
2 -> 3 : 742 |************************************ |
4 -> 7 : 203 |********** |
[...]
1024 -> 2047 : 27 |* |
2048 -> 4095 : 30 |* |
4096 -> 8191 : 20 | |
8192 -> 16383 : 29 |* |
16384 -> 32767 : 809 |****************************************|
32768 -> 65535 : 64 |*** |
</code></pre>
<p>runqlat times how long threads were waiting on the CPU run queues, and prints this as a histogram.</p>
<p>This can help quantify time lost waiting for a turn on CPU during periods of CPU saturation.</p>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/runqlat_example.txt">examples</a>.</p>
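<p>To turn a histogram like this into a single number, you can compute what share of events fell into rows at or above some latency. This sketch reads the "low -> high : count" rows shown above; <code>share_at_or_above</code> is a hypothetical name, and the 16384 µs cutoff in the usage note is only an example.</p>

```shell
# Print the percentage of events whose histogram row starts at or above MIN,
# reading saved runqlat (or biolatency) output in the "low -> high : count"
# format from stdin. Header and other lines are ignored.
share_at_or_above() {
    awk -v min="$1" '$2 == "->" && $4 == ":" {
                         total += $5
                         if ($1 + 0 >= min + 0) slow += $5
                     }
                     END { if (total > 0) printf "%.1f%%\n", 100 * slow / total }'
}
```

<p>For example, <code>share_at_or_above 16384 < runqlat.txt</code> reports the share of run queue waits of roughly 16 ms or more.</p>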
<h4 id="111-profile"><a class="header" href="#111-profile">1.11. profile</a></h4>
<pre><code class="language-sh"># ./profile
Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
^C
00007f31d76c3251 [unknown]
47a2c1e752bf47f7 [unknown]
- sign-file (8877)
1

[...]
0000000000400542 func_a
0000000000400598 main
00007f12a133e830 __libc_start_main
083e258d4c544155 [unknown]
- func_ab (13549)
5

[...]
- swapper/1 (0)
75
</code></pre>
<p>profile is a CPU profiler: it samples stack traces at timed intervals and prints a summary of the unique stack traces along with a count of their occurrences.</p>
<p>Use this tool to understand the code paths that are consuming CPU resources.</p>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/profile_example.txt">examples</a>.</p>
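<p>For longer profiles, the folded output mode (<code>-f</code>) is easier to post-process: each line holds one unique stack as semicolon-separated frames, followed by its sample count. The sketch below totals samples by the last frame on each line, which is assumed here to be the leaf (on-CPU) function. <code>top_leaf_functions</code> is a hypothetical name. Frame order and the kernel/user delimiter depend on the options you pass, so check a few lines of your own output first.</p>

```shell
# Sum samples per leaf frame in folded profile output
# ("frame;frame;...;leaf COUNT"), highest first. Treating the last frame
# as the leaf is an assumption about the format.
top_leaf_functions() {
    awk 'NF >= 2 && $NF ~ /^[0-9]+$/ {
             count = $NF
             stack = $0
             sub(/ [0-9]+$/, "", stack)
             nframes = split(stack, frames, ";")
             leaf[frames[nframes]] += count
         }
         END { for (f in leaf) print leaf[f], f }' "$@" | sort -rn
}
```

<p>For example, <code>./profile -f 30 > out.folded</code> samples for 30 seconds (the trailing number is the duration); then run <code>top_leaf_functions out.folded | head</code>.</p>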
<h3 id="2-observability-with-generic-tools"><a class="header" href="#2-observability-with-generic-tools">2. Observability with Generic Tools</a></h3>
<p>In addition to the above tools for performance tuning, below is a checklist of bcc generic tools, first as a list, then in detail:</p>
<ol>
<li>trace</li>
<li>argdist</li>
<li>funccount</li>
</ol>
<p>These generic tools can provide the visibility you need to solve your specific problems.</p>
<h4 id="21-trace"><a class="header" href="#21-trace">2.1. trace</a></h4>
<h5 id="example-1"><a class="header" href="#example-1">Example 1</a></h5>
<p>Suppose you want to track file ownership changes. Users can change file ownership with three syscalls: <code>chown</code>, <code>fchown</code>, and <code>lchown</code>. The corresponding syscall entry points are <code>SyS_[f|l]chown</code>. On x86-64 kernels since 4.17 these entry points are named differently (for example, <code>__x64_sys_chown</code>), so adjust the probe names for your kernel. The following command prints the syscall parameters and the user ID of the calling process. You can use the <code>id</code> command to find the UID of a particular user.</p>
<pre><code class="language-sh">$ trace.py \
'p::SyS_chown "file = %s, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid' \
'p::SyS_fchown "fd = %d, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid' \
'p::SyS_lchown "file = %s, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid'
PID TID COMM FUNC -
1269442 1269442 zstd SyS_chown file = /tmp/dotsync-gzp413o_/dotsync-package.zst, to_uid = 128203, to_gid = 100, from_uid = 128203
1269255 1269255 python3.6 SyS_lchown file = /tmp/dotsync-whx4fivm/tmp/.bash_profile, to_uid = 128203, to_gid = 100, from_uid = 128203
</code></pre>
<h5 id="example-2"><a class="header" href="#example-2">Example 2</a></h5>
<p>Suppose you want to count nonvoluntary context switches (<code>nvcsw</code>) in your BPF-based performance monitoring tools, and you do not know the proper method. <code>/proc/&lt;pid&gt;/status</code> already tells you the number (<code>nonvoluntary_ctxt_switches</code>) for a PID, and you can use <code>trace.py</code> to run a quick experiment to verify your method. In the kernel source code, <code>nvcsw</code> is counted in function <code>__schedule</code> in file <code>linux/kernel/sched/core.c</code>, under the condition</p>
<pre><code class="language-c">!(!preempt && prev->state) // i.e., preempt || !prev->state
</code></pre>
<p>The <code>__schedule</code> function is marked as <code>notrace</code>, so the best place to evaluate the above condition seems to be the <code>sched/sched_switch</code> tracepoint, which is called inside <code>__schedule</code> and defined in <code>linux/include/trace/events/sched.h</code>. <code>trace.py</code> already provides <code>args</code> as a pointer to the tracepoint's <code>TP_STRUCT__entry</code>. The above condition in function <code>__schedule</code> can be represented as</p>
<pre><code class="language-c">args->prev_state == TASK_STATE_MAX || args->prev_state == 0
</code></pre>
<p>The command below counts involuntary context switches (per process or per PID). Compare the result with <code>/proc/&lt;pid&gt;/status</code> or <code>/proc/&lt;pid&gt;/task/&lt;task_id&gt;/status</code> to check correctness; this comparison is practical because involuntary context switches are not very common in typical cases.</p>
<pre><code class="language-sh">$ trace.py -p 1134138 't:sched:sched_switch (args->prev_state == TASK_STATE_MAX || args->prev_state == 0)'
PID TID COMM FUNC
1134138 1134140 contention_test sched_switch
1134138 1134140 contention_test sched_switch
...
</code></pre>
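<p>For the comparison described above, you can read the kernel's counters before and after the trace. This sketch relies only on the <code>nonvoluntary_ctxt_switches</code> field of the status files mentioned above; <code>nvcsw_report</code> is a hypothetical helper name. Take one reading before starting <code>trace.py</code> and one after stopping it. If the traced condition is right for your kernel, the increase for each thread should roughly match the number of <code>sched_switch</code> lines printed for that TID.</p>

```shell
# Print nonvoluntary_ctxt_switches from the status file of PID and from the
# per-thread status file of each of its tasks (task/TID/status under /proc).
nvcsw_report() {
    pid=$1
    printf 'pid %s: %s\n' "$pid" \
        "$(awk '/^nonvoluntary_ctxt_switches:/ { print $2 }' "/proc/$pid/status")"
    for status in /proc/"$pid"/task/*/status; do
        tid=${status%/status}     # strip the trailing /status
        tid=${tid##*/}            # keep the last path component: the TID
        printf 'tid %s: %s\n' "$tid" \
            "$(awk '/^nonvoluntary_ctxt_switches:/ { print $2 }' "$status")"
    done
}
```

<p>For example, <code>nvcsw_report 1134138</code> for the process traced above.</p>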
<h5 id="example-3"><a class="header" href="#example-3">Example 3</a></h5>
<p>This example is related to issues <a href="https://github.com/iovisor/bcc/issues/1231">1231</a> and <a href="https://github.com/iovisor/bcc/issues/1516">1516</a>, where uprobes do not work at all in certain cases. First, you can run <code>strace</code> as below:</p>
<pre><code class="language-sh">$ strace trace.py 'r:bash:readline "%s", retval'
...
perf_event_open(0x7ffd968212f0, -1, 0, -1, 0x8 /* PERF_FLAG_??? */) = -1 EIO (Input/output error)
...
</code></pre>
<p>The <code>perf_event_open</code> syscall returns <code>-EIO</code>. Searching the kernel's uprobe-related code in the <code>/kernel/trace</code> and <code>/kernel/events</code> directories for <code>EIO</code> shows that the function <code>uprobe_register</code> is the most suspicious. Let us find out whether this function is called and, if so, what its return value is. In one terminal, use the following command to print the return value of <code>uprobe_register</code>:</p>
<pre><code class="language-sh">$ trace.py 'r::uprobe_register "ret = %d", retval'
</code></pre>
<p>In another terminal, run the same bash uretprobe tracing example. The first terminal should then print:</p>
<pre><code class="language-sh">$ trace.py 'r::uprobe_register "ret = %d", retval'
PID TID COMM FUNC -
1041401 1041401 python2.7 uprobe_register ret = -5
</code></pre>
<p>The <code>-5</code> error code is EIO. This confirms that the following code in function <code>uprobe_register</code> is the most likely culprit.</p>
<pre><code class="language-c">    if (!inode->i_mapping->a_ops->readpage && !shmem_mapping(inode->i_mapping))
        return -EIO;
</code></pre>
<p>The <code>shmem_mapping</code> function is defined as</p>
<pre><code class="language-c">bool shmem_mapping(struct address_space *mapping)
{
    return mapping->a_ops == &shmem_aops;
}
</code></pre>
<p>To confirm the theory, find out what <code>inode->i_mapping->a_ops</code> is with the following command:</p>
<pre><code class="language-sh">$ trace.py -I 'linux/fs.h' 'p::uprobe_register(struct inode *inode) "a_ops = %llx", inode->i_mapping->a_ops'
PID TID COMM FUNC -
814288 814288 python2.7 uprobe_register a_ops = ffffffff81a2adc0
^C$ grep ffffffff81a2adc0 /proc/kallsyms
ffffffff81a2adc0 R empty_aops
</code></pre>
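<p>The <code>grep</code> above matches the address anywhere on a line. A slightly stricter variant compares only the address column. This is a sketch: <code>ksym_of</code> is a hypothetical helper, and its optional second argument (a file in kallsyms format) exists only to make it easy to test. Unprivileged readers usually see zeroed addresses in /proc/kallsyms, depending on <code>kernel.kptr_restrict</code>, so run it as root.</p>

```shell
# Print the symbol name for a kernel address, matching only the address
# column. FILE defaults to /proc/kallsyms. Returns non-zero if not found.
ksym_of() {
    awk -v a="$1" 'tolower($1) == tolower(a) { print $3; found = 1; exit }
                   END { exit !found }' "${2:-/proc/kallsyms}"
}
```

<p>For example, <code>sudo sh -c '. ./ksym.sh; ksym_of ffffffff81a2adc0'</code>, assuming the function was saved in a hypothetical file <code>ksym.sh</code>.</p>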
<p>The kernel symbol <code>empty_aops</code> does not define <code>readpage</code>, so the suspicious condition above is true. Further examination of the kernel source code shows that <code>overlayfs</code> does not provide its own <code>a_ops</code>, while some other file systems (e.g., ext4) define their own <code>a_ops</code> (e.g., <code>ext4_da_aops</code>), and <code>ext4_da_aops</code> defines <code>readpage</code>. Hence, uprobes work fine on ext4 but not on overlayfs.</p>
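<p>Given that conclusion, it can be worth checking which file system holds a binary before attaching a uprobe to it, for example when the binary comes from a container image on overlayfs. This sketch uses GNU coreutils <code>stat -f</code>; <code>fs_type_of</code> is a hypothetical name. The printed type string varies: GNU stat may report ext4 as <code>ext2/ext3</code>, because they share a magic number.</p>

```shell
# Print the type of the file system that holds FILE (symlinks resolved
# first), e.g. to check whether a uprobe target lives on overlayfs.
fs_type_of() {
    stat -f -c %T "$(readlink -f "$1")"
}
```

<p>For example, <code>fs_type_of /bin/bash</code>. If it prints <code>overlayfs</code>, the uprobe registration failure described above may apply on your kernel.</p>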
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/trace_example.txt">examples</a>.</p>
<h4 id="22-argdist"><a class="header" href="#22-argdist">2.2. argdist</a></h4>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/argdist_example.txt">examples</a>.</p>
<h4 id="23-funccount"><a class="header" href="#23-funccount">2.3. funccount</a></h4>
<p>More <a href="https://github.com/iovisor/bcc/tree/master/tools/funccount_example.txt">examples</a>.</p>
<h2 id="networking"><a class="header" href="#networking">Networking</a></h2>
<p>To do.</p>
</main>