diff --git a/src/47-cuda-events/README.md b/src/47-cuda-events/README.md
index 4bd54dc..1927b93 100644
--- a/src/47-cuda-events/README.md
+++ b/src/47-cuda-events/README.md
@@ -469,7 +469,7 @@ cudaMemcpyD2H:        383.66 µs
 cudaFree:               0.00 µs
 ```
 
-The tracer adds about 2us overhead to each CUDA API call, which is negligible for most cases.
+The tracer adds about 2 µs of overhead to each CUDA API call, which is negligible in most cases. To reduce the overhead further, you can try using the [bpftime](https://github.com/eunomia-bpf/bpftime) userspace runtime to optimize the eBPF program.
 
 ## Command Line Options
 
diff --git a/src/47-cuda-events/README.zh.md b/src/47-cuda-events/README.zh.md
index 2fc6d4f..0d85fd9 100644
--- a/src/47-cuda-events/README.zh.md
+++ b/src/47-cuda-events/README.zh.md
@@ -469,7 +469,7 @@ cudaMemcpyD2H:        383.66 µs
 cudaFree:               0.00 µs
 ```
 
-追踪器为每个CUDA API调用增加了约2微秒的开销，这对大多数情况来说是可以忽略不计的。
+追踪器为每个CUDA API调用增加了约2微秒的开销，这对大多数情况来说是可以忽略不计的。为了进一步减少开销，你可以尝试使用[bpftime](https://github.com/eunomia-bpf/bpftime)用户空间运行时来优化eBPF程序。
 
 ## 命令行选项