Apr 13, 2022 9 min read ebpf

How eBPF user-space probes work

Exploring how uprobes work through eBPF

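User-space probes, or Uprobes for short, can dynamically break into any function of an application and collect debugging and performance information without disrupting it. There are currently two types of user-space probes: uprobes and uretprobes (also called return probes). A uprobe can be inserted on any instruction in the application's virtual address space; a uretprobe fires when a user function returns.

The information needed to insert a uprobe, such as the process, the insertion point, and the probe handler, is specified through a registration function such as register_uprobe(). A uprobe can be driven from a kernel module, from eBPF events, and so on. Later sections try to summarize how eBPF uses uprobes; before that, let's look at how uprobes themselves work.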

How `uprobe` works

uprobe

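Once a uprobe is registered, Uprobes makes a copy of the probed instruction, stops the probed application, replaces the first byte of the probed instruction with a breakpoint instruction (int3 on i386 and x86_64), and then lets the application continue. (When inserting the breakpoint, Uprobes uses the same copy-on-write mechanism that ptrace uses, so the breakpoint affects only that process and not other processes running the same program. This holds even when the probed instruction is in a shared library.)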
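
The insertion step can be modeled in a few lines of Go. This is only a user-space simulation under stated assumptions, not kernel code: the "page" is a byte slice, copy-on-write is modeled by cloning the slice, and the names `probe` and `insertBreakpoint` are hypothetical. The instruction length is passed in by the caller, whereas a real implementation has to decode the instruction.

```go
package main

import "fmt"

// int3 is the one-byte x86 breakpoint opcode.
const int3 = 0xCC

// probe remembers what was displaced at one probe point (hypothetical type).
type probe struct {
	offset int    // offset of the probed instruction within the page
	saved  []byte // copy of the original instruction bytes
}

// insertBreakpoint models the copy-on-write step: it returns a private copy of
// the page in which the first byte of the probed instruction is int3, and
// leaves the shared page untouched so other processes are unaffected.
func insertBreakpoint(shared []byte, offset, insnLen int) ([]byte, probe) {
	private := make([]byte, len(shared))
	copy(private, shared)

	saved := append([]byte(nil), shared[offset:offset+insnLen]...)
	private[offset] = int3
	return private, probe{offset: offset, saved: saved}
}

func main() {
	// push %rbp; mov %rsp,%rbp; ret
	text := []byte{0x55, 0x48, 0x89, 0xE5, 0xC3}
	private, p := insertBreakpoint(text, 1, 3)
	fmt.Printf("shared page:  % x\n", text)
	fmt.Printf("private page: % x\n", private)
	fmt.Printf("saved copy at offset %d: % x\n", p.offset, p.saved)
}
```

Only the private copy carries the breakpoint, which mirrors why other processes mapping the same file never see it.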

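When the CPU hits the breakpoint instruction, a trap occurs, the CPU's user-mode registers are saved, and a SIGTRAP signal is generated. Uprobes intercepts the SIGTRAP and finds the associated uprobe. It then calls the handler associated with that uprobe, passing it the uprobe struct and the address of the saved registers. The handler may block, but keep in mind that the probed thread stays stopped while the handler runs.

Next, Uprobes single-steps its copy of the probed instruction and then resumes the probed program at the instruction following the probe point. (Single-stepping the original instruction in place would be simpler, but Uprobes would then have to remove the breakpoint instruction first. In a multithreaded application this causes problems: it opens a time window in which another thread can run straight past the probe point.)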

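The instruction copies that get single-stepped are stored in a per-process "single-step out of line (SSOL) area", a small VM area that Uprobes creates in the address space of each probed process.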
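
The hit path described above (find the probe, run the handler, step the copy out of line, resume after the probe point) can be sketched as a small Go simulation. Everything here is an illustrative assumption: `regs`, `uprobe`, and `onBreakpoint` are hypothetical names, addresses are plain integers, and the single-step is only recorded, not executed. A real implementation also has to fix up instructions whose effect depends on their own address.

```go
package main

import "fmt"

// regs is a tiny stand-in for the saved user-mode registers.
type regs struct {
	ip uint64 // instruction pointer
}

// uprobe describes one registered probe point (hypothetical type).
type uprobe struct {
	addr    uint64        // address of the probed instruction
	insnLen uint64        // length of the original instruction
	ssol    uint64        // address of the instruction copy in the SSOL area
	handler func(r *regs) // callback run while the thread is stopped
}

// onBreakpoint models the trap path. It returns the SSOL address at which the
// instruction copy would be single-stepped, and false if no probe owns the
// trapping address. On success r.ip is set to the instruction after the probe.
func onBreakpoint(probes map[uint64]*uprobe, r *regs) (steppedAt uint64, ok bool) {
	// On x86 the trap is reported with ip already past the 1-byte int3.
	p, found := probes[r.ip-1]
	if !found {
		return 0, false
	}
	p.handler(r)              // the thread stays stopped while this runs
	r.ip = p.addr + p.insnLen // resume after the original instruction
	return p.ssol, true
}

func main() {
	probes := map[uint64]*uprobe{
		0x401000: {
			addr: 0x401000, insnLen: 3, ssol: 0x7f0000,
			handler: func(r *regs) { fmt.Printf("handler: trap at ip=%#x\n", r.ip) },
		},
	}
	r := &regs{ip: 0x401001}
	if at, ok := onBreakpoint(probes, r); ok {
		fmt.Printf("single-stepped copy at %#x, resuming at %#x\n", at, r.ip)
	}
}
```

Because the original bytes are never restored, the breakpoint stays armed for every other thread while one thread is stepping its private copy.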

utrace

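For a process that has uprobes registered, Uprobes uses Utrace to set up a tracing "engine" for each thread in the process. Uprobes uses Utrace's "quiesce" mechanism to stop all threads before inserting or removing a breakpoint. Throughout the lifetime of the probed process (fork, clone, exec, exit), Utrace notifies Uprobes of breakpoint and single-step traps and of other events of interest. When a probe is registered or unregistered, the breakpoint is inserted or removed only after Utrace has stopped every thread in the process, and the register/unregister function returns only after the breakpoint has been inserted or removed.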

uretprobe

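To use a uretprobe, call register_uretprobe(). Uprobes then places a uprobe at the entry of the function. When the probed function is called and this probe is hit, Uprobes saves a copy of the return address and replaces the return address with the address of a "trampoline", a short piece of code containing a breakpoint instruction. The trampoline is stored in the SSOL area.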

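When the probed function executes its return instruction, control passes to the trampoline and the breakpoint is hit. Uprobes' trampoline handler calls the handler associated with the uretprobe, then sets the saved instruction pointer to the saved return address, and execution resumes there once the trap returns.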

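Multithreading support

Uprobes supports probing multithreaded applications and places no limit on the number of threads in a probed application. All threads in a process share the same process resources (context), so every probe in a process affects all of its threads. Each thread, however, hits probe points (and runs handlers) independently, and several threads may run the same handler at the same time. If you want a handler to run only for a particular thread or group of threads, the handler should check current or current->pid to determine which thread hit the probe point. When a process clones a new thread, that thread automatically shares all probes created for the process.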
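
The per-thread check can be expressed as a wrapper around the handler. The Go sketch below is a user-space analogy only: `hit` stands in for what a kernel handler would read from `current`, and `filterByThread` is a hypothetical helper.

```go
package main

import "fmt"

// hit carries the identity of the thread that reached the probe point; it
// stands in for what a kernel handler would read from current->pid.
type hit struct {
	pid int // ID of the thread that hit the probe
}

// filterByThread wraps a handler so that it runs only for the listed thread
// IDs. The lookup set is built once and only read afterwards, so the returned
// function adds no shared mutable state of its own.
func filterByThread(tids []int, next func(hit)) func(hit) {
	want := make(map[int]bool, len(tids))
	for _, tid := range tids {
		want[tid] = true
	}
	return func(h hit) {
		if want[h.pid] {
			next(h)
		}
	}
}

func main() {
	handler := filterByThread([]int{101}, func(h hit) {
		fmt.Println("probe hit by thread", h.pid)
	})
	handler(hit{pid: 101}) // matches the filter
	handler(hit{pid: 202}) // ignored
}
```

Since several threads can be inside the handler at once, any state the wrapped callback touches still needs its own synchronization.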

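uprobe and eBPF

uprobes are a kernel framework for collecting run-time information from user-space programs. Using them used to require kernel module development, mainly to write the handler (the callback). eBPF has since redefined how kernel-side code is developed, so this section tries to sort out how to use eBPF together with uprobes to dynamically trace a user process and collect information from it.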

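How eBPF listens for uprobe events

eBPF uses hooks to pick out the events it is interested in, such as system call events. In simple terms, a hook works like this:

A function compiled for hooking starts with the 5-byte instruction callq function+0x5. Rewriting those first 5 bytes at the function entry to jmp Hook_ptr turns the entry into an event trigger point. (Hook_ptr is not the address of the hook function itself, because the displaced bytes have to be preserved and the stack kept balanced.)

For uprobes the idea is the same: replace the function entry with a breakpoint instruction (int3), call the custom listener from the breakpoint handler, and then run the original code. That is how eBPF listens for uprobe events.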

How eBPF sets up a uprobe

The following tool uses a uprobe to trace goroutine creation:

package main

import (
	"encoding/binary"
	"flag"
	"fmt"
	"os"
	"os/signal"

	"github.com/iovisor/gobpf/bcc"
)

const bpfProgram = `
#include <uapi/linux/ptrace.h>

BPF_PERF_OUTPUT(trace);

typedef struct {
	int num;
	long fn_ptr;
}newproc_args;

// This function will be registered to be called everytime
// runtime.newproc is called.
inline int newprocCalled(struct pt_regs *ctx) {
  // function address
  long val = ctx->ax;
  trace.perf_submit(ctx, &val, sizeof(val));

  /*
  void* stackAddr = (void*)ctx->sp;
  newproc_args event = {};
  bpf_probe_read(&event.num, sizeof(event.num), stackAddr+8);
  bpf_probe_read(&event.fn_ptr, sizeof(event.fn_ptr), stackAddr+16);

  trace.perf_submit(ctx, &event, sizeof(event));
  */

  return 0;
}
`

var binaryProg string

func init() {
	flag.StringVar(&binaryProg, "binary", "", "The binary to probe")
}

func main() {
	flag.Parse()
	if len(binaryProg) == 0 {
		panic("Argument --binary needs to be specified")
	}

	bccMod := bcc.NewModule(bpfProgram, []string{})
	uprobeFD, err := bccMod.LoadUprobe("newprocCalled")
	if err != nil {
		panic(err)
	}

	// Attach the uprobe so it is called every time runtime.newproc is called.
	// We need to specify the path to the binary so it can be patched.
	err = bccMod.AttachUprobe(binaryProg, "runtime.newproc", uprobeFD, -1)
	if err != nil {
		panic(err)
	}

	// Create the output table named "trace" that the BPF program writes to.
	table := bcc.NewTable(bccMod.TableId("trace"), bccMod)
	ch := make(chan []byte)

	pm, err := bcc.InitPerfMap(table, ch, nil)
	if err != nil {
		panic(err)
	}

	// Watch Ctrl-C so we can quit this program.
	intCh := make(chan os.Signal, 1)
	signal.Notify(intCh, os.Interrupt)

	pm.Start()
	defer pm.Stop()

	for {
		select {
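		case <-intCh:
			fmt.Println("Terminating")
			// Return instead of calling os.Exit so the deferred pm.Stop() runs.
			return
		case v := <-ch:
			// The BPF program submits a single 8-byte long (ctx->ax), so
			// decode the payload as a little-endian uint64.
			fmt.Println("get perf event ", v)
			if len(v) < 8 {
				continue // too short to hold the value; skip it
			}
			d := binary.LittleEndian.Uint64(v)
			fmt.Printf("Value = %x\n", d)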
		}
	}
}

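The overall flow is roughly:

Write the probe callback, i.e. the BPF program, which extracts the data of interest from the event and submits it to a data channel (for example a perf buffer)
Load the BPF program and attach it with attach_uprobe (LoadUprobe and AttachUprobe in the Go code above), specifying the function symbol of interest (e.g. runtime.newproc) and the callback (e.g. newprocCalled)
Listen on the perf buffer to receive the corresponding event output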
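
For the last step, the tool above decodes a single 8-byte value. If the commented-out `newproc_args` event were enabled instead, the payload would need a struct-aware decoder. The sketch below assumes the x86_64 layout of that struct (a 4-byte `int`, 4 bytes of padding, then an 8-byte `long`) and little-endian byte order; `newprocArgs` and `decodeNewprocArgs` are hypothetical names, not part of gobpf.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// newprocArgs mirrors the C struct { int num; long fn_ptr; } from the BPF
// program, assuming the x86_64 layout described in the text above.
type newprocArgs struct {
	Num   int32
	FnPtr int64
}

// decodeNewprocArgs decodes one raw perf event payload. The buffer may be
// longer than the struct, so only a minimum length is required.
func decodeNewprocArgs(raw []byte) (newprocArgs, error) {
	const size = 16 // 4 (num) + 4 (padding) + 8 (fn_ptr)
	if len(raw) < size {
		return newprocArgs{}, errors.New("perf event payload shorter than newproc_args")
	}
	return newprocArgs{
		Num:   int32(binary.LittleEndian.Uint32(raw[0:4])),
		FnPtr: int64(binary.LittleEndian.Uint64(raw[8:16])),
	}, nil
}

func main() {
	// Build a sample payload by hand: num = 8, fn_ptr = 0x45f1a0.
	raw := make([]byte, 16)
	binary.LittleEndian.PutUint32(raw[0:4], 8)
	binary.LittleEndian.PutUint64(raw[8:16], 0x45f1a0)

	ev, err := decodeNewprocArgs(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("num=%d fn_ptr=%#x\n", ev.Num, ev.FnPtr)
}
```

Decoding field by field keeps the padding explicit, which avoids silently misreading `fn_ptr` if the struct is ever reordered on the C side.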

The details are shown in the attach_uprobe code below:

StatusTuple BPF::attach_uprobe(const std::string& binary_path,
                               const std::string& symbol,
                               const std::string& probe_func,
                               uint64_t symbol_addr,
                               bpf_probe_attach_type attach_type, pid_t pid,
                               uint64_t symbol_offset,
                               uint32_t ref_ctr_offset) {

  if (symbol_addr != 0 && symbol_offset != 0)
    return StatusTuple(-1,
             "Attachng uprobe with addr %lx and offset %lx is not supported",
             symbol_addr, symbol_offset);

  std::string module;
  uint64_t offset;
  TRY2(check_binary_symbol(binary_path, symbol, symbol_addr, module, offset,
                           symbol_offset));

  std::string probe_event = get_uprobe_event(module, offset, attach_type, pid);
  if (uprobes_.find(probe_event) != uprobes_.end())
    return StatusTuple(-1, "uprobe %s already attached", probe_event.c_str());

  int probe_fd;
  TRY2(load_func(probe_func, BPF_PROG_TYPE_KPROBE, probe_fd));

  int res_fd = bpf_attach_uprobe(probe_fd, attach_type, probe_event.c_str(),
                                 binary_path.c_str(), offset, pid,
                                 ref_ctr_offset);

  if (res_fd < 0) {
    TRY2(unload_func(probe_func));
    return StatusTuple(
        -1,
        "Unable to attach %suprobe for binary %s symbol %s addr %lx "
        "offset %lx using %s\n",
        attach_type_debug(attach_type).c_str(), binary_path.c_str(),
        symbol.c_str(), symbol_addr, symbol_offset, probe_func.c_str());
  }

  open_probe_t p = {};
  p.perf_event_fd = res_fd;
  p.func = probe_func;
  uprobes_[probe_event] = std::move(p);
  return StatusTuple::OK();
}
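
The `check_binary_symbol` call above turns a symbol name into a module path and an offset. One common way to compute such an offset for an ELF binary is to map the symbol's virtual address through the loadable segment that contains it. The Go sketch below illustrates that mapping with the standard `debug/elf` package; it is not bcc's code, and `segment`, `fileOffset`, and `symbolOffset` are hypothetical names.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// segment holds the fields of a PT_LOAD program header needed for the mapping.
type segment struct {
	vaddr, off, filesz uint64
}

// fileOffset translates a virtual address into an offset within the file by
// locating the segment that contains it.
func fileOffset(addr uint64, segs []segment) (uint64, bool) {
	for _, s := range segs {
		if addr >= s.vaddr && addr < s.vaddr+s.filesz {
			return addr - s.vaddr + s.off, true
		}
	}
	return 0, false
}

// symbolOffset looks up symbol in the ELF file at path and returns the file
// offset of its first instruction. It needs a symbol table, so it fails on
// stripped binaries.
func symbolOffset(path, symbol string) (uint64, error) {
	f, err := elf.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	var segs []segment
	for _, p := range f.Progs {
		if p.Type == elf.PT_LOAD && p.Flags&elf.PF_X != 0 {
			segs = append(segs, segment{p.Vaddr, p.Off, p.Filesz})
		}
	}
	syms, err := f.Symbols()
	if err != nil {
		return 0, err
	}
	for _, s := range syms {
		if s.Name != symbol {
			continue
		}
		if off, ok := fileOffset(s.Value, segs); ok {
			return off, nil
		}
		return 0, fmt.Errorf("symbol %q is not inside an executable segment", symbol)
	}
	return 0, fmt.Errorf("symbol %q not found", symbol)
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: symoff <binary> <symbol>")
		return
	}
	off, err := symbolOffset(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s:%#x\n", os.Args[1], off)
}
```

Run against the probed Go binary with `runtime.newproc` as the symbol, this prints a path and offset pair of the kind a uprobe is attached to.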

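bpf_attach_uprobe delegates to bpf_attach_probe, which first tries to create the probe event through the perf_event_open API and, if that fails, falls back to configuring the uprobe by reading and writing the tracing debugfs interface, as the following code shows: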

// config1 could be either kprobe_func or uprobe_path,
// see bpf_try_perf_event_open_with_probe().
static int bpf_attach_probe(int progfd, enum bpf_probe_attach_type attach_type,
                            const char *ev_name, const char *config1, const char* event_type,
                            uint64_t offset, pid_t pid, int maxactive,
                            uint32_t ref_ctr_offset)
{
  int kfd, pfd = -1;
  char buf[PATH_MAX], fname[256];
  bool is_kprobe = strncmp("kprobe", event_type, 6) == 0;

  if (maxactive <= 0)
    // Try create the [k,u]probe Perf Event with perf_event_open API.
    pfd = bpf_try_perf_event_open_with_probe(config1, offset, pid, event_type,
                                             attach_type != BPF_PROBE_ENTRY,
                                             ref_ctr_offset);

  // If failed, most likely Kernel doesn't support the perf_kprobe PMU
  // (e12f03d "perf/core: Implement the 'perf_kprobe' PMU") yet.
  // Try create the event using debugfs.
  if (pfd < 0) {
    if (create_probe_event(buf, ev_name, attach_type, config1, offset,
                           event_type, pid, maxactive) < 0)
      goto error;

    // If we're using maxactive, we need to check that the event was created
    // under the expected name.  If debugfs doesn't support maxactive yet
    // (kernel < 4.12), the event is created under a different name; we need to
    // delete that event and start again without maxactive.
    if (is_kprobe && maxactive > 0 && attach_type == BPF_PROBE_RETURN) {
      if (snprintf(fname, sizeof(fname), "%s/id", buf) >= sizeof(fname)) {
        fprintf(stderr, "filename (%s) is too long for buffer\n", buf);
        goto error;
      }
      if (access(fname, F_OK) == -1) {
        // Deleting kprobe event with incorrect name.
        kfd = open("/sys/kernel/debug/tracing/kprobe_events",
                   O_WRONLY | O_APPEND, 0);
        if (kfd < 0) {
          fprintf(stderr, "open(/sys/kernel/debug/tracing/kprobe_events): %s\n",
                  strerror(errno));
          return -1;
        }
        snprintf(fname, sizeof(fname), "-:kprobes/%s_0", ev_name);
        if (write(kfd, fname, strlen(fname)) < 0) {
          if (errno == ENOENT)
            fprintf(stderr, "cannot detach kprobe, probe entry may not exist\n");
          else
            fprintf(stderr, "cannot detach kprobe, %s\n", strerror(errno));
          close(kfd);
          goto error;
        }
        close(kfd);

        // Re-creating kprobe event without maxactive.
        if (create_probe_event(buf, ev_name, attach_type, config1,
                               offset, event_type, pid, 0) < 0)
          goto error;
      }
    }
  }
  // If perf_event_open succeeded, bpf_attach_tracing_event will use the created
  // Perf Event FD directly and buf would be empty and unused.
  // Otherwise it will read the event ID from the path in buf, create the
  // Perf Event event using that ID, and updated value of pfd.
  if (bpf_attach_tracing_event(progfd, buf, pid, &pfd) == 0)
    return pfd;

error:
  bpf_close_perf_event_fd(pfd);
  return -1;
}
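
In the debugfs fallback, the probe is defined by writing one text line to the tracing `uprobe_events` file. The Go sketch below only builds such lines, following the kernel's documented `p:GROUP/EVENT PATH:OFFSET` and `r:GROUP/EVENT PATH:OFFSET` syntax; it does not write to debugfs. The helper names and the sample event name are hypothetical, and the caller is assumed to pass an event name made of letters, digits, and underscores.

```go
package main

import "fmt"

// uprobeEventLine builds the line that defines a uprobe ('p') or uretprobe
// ('r') in the "uprobes" group of the tracing uprobe_events file.
func uprobeEventLine(isReturn bool, event, path string, offset uint64) string {
	kind := 'p'
	if isReturn {
		kind = 'r'
	}
	return fmt.Sprintf("%c:uprobes/%s %s:0x%x", kind, event, path, offset)
}

// removeEventLine builds the line that deletes a previously defined event.
func removeEventLine(event string) string {
	return "-:uprobes/" + event
}

func main() {
	fmt.Println(uprobeEventLine(false, "newproc_entry", "/tmp/app", 0x45f1a0))
	fmt.Println(uprobeEventLine(true, "newproc_return", "/tmp/app", 0x45f1a0))
	fmt.Println(removeEventLine("newproc_entry"))
}
```

Once an event is defined this way, the tracing filesystem exposes its ID, which is what the code above reads before opening a perf event and attaching the BPF program to it.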
