Skip to main content

BTFGen: One Step Closer to Truly Portable eBPF Programs

Mauricio Vásquez Bernal
· 16 min read

eBPF is a widely known technology used in the observability, networking and security landscapes. The Linux operating system provides a virtual machine that can run eBPF programs in a secure and efficient way. Those programs are attached to different hooks exposed by the operating system to be able to filter and extract the information of interest when a given event happens in the kernel.

In this blog post, we'll present BTFGen, a tool that helps to make eBPF programs portable, and how it can be integrated in your project. We'll explain the challenges of running eBPF programs on different target machines, how this has been traditionally solved by compiling them before loading, the problems it presents and how different mechanisms and tools like CO-RE (Compile Once – Run Everywhere) and BTFHub try to solve them.

eBPF programs access the kernel structures to gather the data they need; hence they are dependent on the layout of such structures and a program compiled for a given kernel version usually doesn't work on another one because the layout of their structures changes: a field is added, removed, its type changed. Even a change to the kernel configuration would change the whole structure layout. For instance, disabling CONFIG_THREAD_INFO_IN_TASK changes the offsets of all members of task_struct:

struct task_struct {
#ifdef CONFIG_THREAD_INFO_IN_TASK
/*
* For reasons of header soup (see current_thread_info()), this
* must be the first element of task_struct.
*/
struct thread_info thread_info;
#endif
unsigned int __state;

#ifdef CONFIG_PREEMPT_RT
/* saved state for "spinlock sleepers" */
unsigned int saved_state;
#endif
...

This problem has usually been overcome by using the kernel headers of the target machine to compile the programs before loading them. This is the approach used by BPF Compiler Collection (BCC).

This approach has multiple drawbacks: (1) a heavy compiler chain must be shipped to the target machine to compile the program, (2) compiling the programs requires some resources and could affect the workload scheduling in some scenarios, (3) compiling takes a considerable amount of time and hence the first events arrive with some delay and (4) the kernel headers need to be available on the target machine.

CO-RE (Compile Once – Run Everywhere)

The CO-RE mechanism presents a solution to these problems. In this case, the eBPF program is compiled once, then it's patched at runtime to update its instructions according to the layout of the kernel structures of machine where the program is going to run in. The BPF CO-RE (Compile Once – Run Everywhere) blog post presents all the details and components behind this technology. What is important to understand for the scope of this blog post is that CO-RE requires to have the BTF (BPF Type Format) information of the target kernel. This is provided by the kernel itself when it's compiled with CONFIG_DEBUG_INFO_BTF=y. This option was introduced in Linux 5.2 and many popular distributions only enabled it some versions after that. This means that there are a lot of users running kernels that don't expose BTF information and hence it's not possible to use CO-RE in those cases.

BTFHub

BTFHub is a project from the Aqua Security folks that provides BTF information for the released kernels of the most popular distributions that don't expose BTF information. The BTF file for the target kernel can be downloaded at runtime and then fed to the loader library (libbpf, cilium/ebpf or others), that will use it to patch the program accordingly.

Even if BTFHub is a great improvement, it still presents some challenges: each BTF file is a few MBs big, hence it's not possible to ship the BTF files for all the kernels together with the application as it'll require some GBs of space. The alternative is to download the BTF for the current kernel at runtime, which also presents some problems: it delays the starting of the eBPF program and in some scenarios it's not possible to reach an external host to download such a file.

BTFGen

It's not needed to provide the full BTF file describing all the kernel types to perform the program patching as an eBPF program usually accesses only a few of them. A "reduced" BTF with only the information about the types used will suffice. This is where BTFGen comes into play: it's a tool that generates a reduced BTF file with only the types that are needed by a set of eBPF programs. The generated BTF files are very small (few KBs) and hence it's possible to ship them together with the application.

BTFGen is not designed to be used alone. It needs the source BTF files with all the kernel types for different Linux distributions (provided by BTFHub) and the CO-RE mechanism (in libbpf, the Linux kernel or another loader library) is still used to patch the program when loading it.

The workflow of BTFGen is the following:

  1. Developers write their eBPF programs using CO-RE and compile them with llvm/clang to generate the object files
  2. The source BTF files for different distributions are gathered from BTFHub or another source
  3. BTFGen is used to generate a reduced version of the BTF files
  4. The reduced BTF files are distributed with the application

Image showing the BTFGen workflow as described above

Internal Details

BTFGen is implemented in bpftool and it uses the libbpf CO-RE logic to solve relocations. With this information it's able to pick the types involved in a relocation to generate the "reduced" BTF file. The goal of this post is not to explain all the internal implementation details. If you want to know more, you can check this documentation on the BTFHub repository or the patch where it was implemented.

How to Use?

This section provides more details about the usage of this utility. In this example we'll use BTFGen to enable running some of the BCC libbpf-tools on machines without CONFIG_DEBUG_INFO_BTF enabled. A very similar approach can be used to integrate with other eBPF applications.

In this demonstration we will go through the following steps:

  1. Download, compile and install a version of bpftool with BTFGen support
  2. Download BTF files from BTFHub
  3. Download and compile BCC tools
  4. Use BTFGen to generate "reduced" BTF files for some BCC tools
  5. Modify those tools to load the BTF information from a custom path
  6. Try it out

As a first step, let's create an empty folder for this demonstration:

$ mkdir /tmp/btfgendemo

Installing bpftool

BTFGen was just merged into bpftool. Until it's included in the packages of the different distributions, we need to compile it from source.

$ cd /tmp/btfgendemo
$ git clone --recurse-submodules https://github.com/libbpf/bpftool.git
$ cd bpftool/src
$ make
$ sudo make install

Get BTF files for different kernels from BFThub

In this case we use BTFHub and we only consider Ubuntu Focal for brevity but exactly the same approach is used to generate files for other distributions supported by BTFHub.

$ cd /tmp/btfgendemo
$ git clone https://github.com/aquasecurity/btfhub-archive
$ cd btfhub-archive/ubuntu/focal/x86_64/
$ for f in *.tar.xz; do tar -xf "$f"; done
$ ls -lhn *.btf | head
-rw-r----- 1 1000 1000 4,5M Sep 29 13:36 5.11.0-1007-azure.btf
-rw-r----- 1 1000 1000 4,8M Aug 10 23:33 5.11.0-1009-aws.btf
-rw-r----- 1 1000 1000 4,8M Jan 22 12:29 5.11.0-1009-gcp.btf
-rw-r----- 1 1000 1000 4,5M Sep 29 13:38 5.11.0-1012-azure.btf
-rw-r----- 1 1000 1000 4,5M Sep 29 13:40 5.11.0-1013-azure.btf
-rw-r----- 1 1000 1000 4,8M Aug 10 23:39 5.11.0-1014-aws.btf
-rw-r----- 1 1000 1000 4,8M Jan 22 12:32 5.11.0-1014-gcp.btf
-rw-r----- 1 1000 1000 4,5M Sep 29 13:43 5.11.0-1015-azure.btf
-rw-r----- 1 1000 1000 4,8M Sep 7 22:52 5.11.0-1016-aws.btf
-rw-r----- 1 1000 1000 4,8M Sep 7 22:57 5.11.0-1017-aws.btf

As you can see, the size of the BTF file for each kernel is around 4 MB.

$ find . -name "*.btf" | xargs du -ch | tail -n 1
944M total

And the total size of the files, only for Ubuntu Focal, is ~944MB, which makes it infeasible to ship them together with the application.

Download, modify and compile BCC libbpf tools

Let's clone the BCC repository at the v0.24.0 tag.

$ cd /tmp/btfgendemo
$ git clone https://github.com/iovisor/bcc -b v0.24.0 --recursive

By default, the different BCC tools try to load BTF information from a well-known list of directories. We shouldn't overwrite those files in our system, as they might be needed by other tools. Instead, we can modify some of the BCC tools to load the BTF information from a custom path. We can use LIBBPF_OPTS() to declare a bpf_object_open_opts struct with the btf_custom_path field set to the path where the BTF resides and pass it to the TOOL_bpf__open_opts() function. You can apply the following patch to modify the opensnoop, execsnoop and bindsnoop tools.

# /tmp/btfgendemo/bcc.patch
diff --git a/libbpf-tools/bindsnoop.c b/libbpf-tools/bindsnoop.c
index 5d87d484..a336747e 100644
--- a/libbpf-tools/bindsnoop.c
--- b/libbpf-tools/bindsnoop.c
@@ -187,7 +187,8 @@ int main(int argc, char **argv)
libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
libbpf_set_print(libbpf_print_fn);

- obj: bindsnoop_bpf__open();
+ LIBBPF_OPTS(bpf_object_open_opts, opts, .btf_custom_path: "/tmp/vmlinux.btf");
+ obj: bindsnoop_bpf__open_opts(&opts);
if (!obj) {
warn("failed to open BPF object\n");
return 1;
diff --git a/libbpf-tools/execsnoop.c b/libbpf-tools/execsnoop.c
index 38294816..9bd0d077 100644
--- a/libbpf-tools/execsnoop.c
--- b/libbpf-tools/execsnoop.c
@@ -274,7 +274,8 @@ int main(int argc, char **argv)
libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
libbpf_set_print(libbpf_print_fn);

- obj: execsnoop_bpf__open();
+ LIBBPF_OPTS(bpf_object_open_opts, opts, .btf_custom_path: "/tmp/vmlinux.btf");
+ obj: execsnoop_bpf__open_opts(&opts);
if (!obj) {
fprintf(stderr, "failed to open BPF object\n");
return 1;
diff --git a/libbpf-tools/opensnoop.c b/libbpf-tools/opensnoop.c
index 557a63cd..cf2c5db6 100644
--- a/libbpf-tools/opensnoop.c
--- b/libbpf-tools/opensnoop.c
@@ -231,7 +231,8 @@ int main(int argc, char **argv)
libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
libbpf_set_print(libbpf_print_fn);

- obj: opensnoop_bpf__open();
+ LIBBPF_OPTS(bpf_object_open_opts, opts, .btf_custom_path: "/tmp/vmlinux.btf");
+ obj: opensnoop_bpf__open_opts(&opts);
if (!obj) {
fprintf(stderr, "failed to open BPF object\n");
return 1;
$ cd bcc
$ git apply /tmp/btfgendemo/bcc.patch
$ cd libbpf-tools/
$ make -j$(nproc)

Generate "reduced" BTF

In this step, we'll use the bpftool gen min_core_btf command to generate the reduced BTF files for the bindsnoop, execsnoop and opensnoop BCC tools. The following command invokes bpftool for each BTF file present in the directory.

$ OBJ1=/tmp/btfgendemo/bcc/libbpf-tools/.output/bindsnoop.bpf.o
$ OBJ2=/tmp/btfgendemo/bcc/libbpf-tools/.output/execsnoop.bpf.o
$ OBJ3=/tmp/btfgendemo/bcc/libbpf-tools/.output/opensnoop.bpf.o

$ mkdir -p /tmp/btfgendemo/btfs
$ cd /tmp/btfgendemo/btfhub-archive/ubuntu/focal/x86_64/

$ for f in *.btf; do bpftool gen min_core_btf "$f" \
/tmp/btfgendemo/btfs/$(basename "$f") $OBJ1 $OBJ2 $OBJ3; \
done

$ ls -lhn /tmp/btfgendemo/btfs | head
total 864K
-rw-r--r-- 1 1000 1000 1,1K Feb 8 14:46 5.11.0-1007-azure.btf
-rw-r--r-- 1 1000 1000 1,1K Feb 8 14:46 5.11.0-1009-aws.btf
-rw-r--r-- 1 1000 1000 1,1K Feb 8 14:46 5.11.0-1009-gcp.btf
-rw-r--r-- 1 1000 1000 1,1K Feb 8 14:46 5.11.0-1012-azure.btf
-rw-r--r-- 1 1000 1000 1,1K Feb 8 14:46 5.11.0-1013-azure.btf
-rw-r--r-- 1 1000 1000 1,1K Feb 8 14:46 5.11.0-1014-aws.btf
-rw-r--r-- 1 1000 1000 1,1K Feb 8 14:46 5.11.0-1014-gcp.btf
-rw-r--r-- 1 1000 1000 1,1K Feb 8 14:46 5.11.0-1015-azure.btf
-rw-r--r-- 1 1000 1000 1,1K Feb 8 14:46 5.11.0-1016-aws.btf

Each generated BTF file is around 1.1KB and the size of all generated files for Ubuntu Focal is 864KB, which makes it possible to ship them together with the applications.

We can even optimize it more if we compress the generated files:

$ cd /tmp/btfgendemo/btfs
$ tar cvfJ compressed.tar.xz *.btf
$ ls -lhn compressed.tar.xz
-rw-r--r-- 1 1000 1000 2,5K Feb 17 15:19 compressed.tar.xz

The compression rate is so high because many of the generated files are identical, we will discuss it in further detail below.

Try it out

In order to try it out we need to run a machine with Ubuntu Focal. The following Vagrantfile can be used to create a VM with it. Please note that Ubuntu Focal has enabled BTF support starting from kernel version 5.4.0-92-generic, so we need to run a previous version to make sure this example makes sense. We use the 202012.21.0 version of the bento/ubuntu-20.04 Vagrant box that has kernel 5.4.0-58-generic.

This uses sshfs to share files between the host and the VM, be sure that you have the vagrant-sshfs plugin installed.

# /tmp/btfgendemo/Vagrantfile
Vagrant.configure("2") do | config |
config.vm.box: "bento/ubuntu-20.04"
config.vm.box_version: "= 202012.21.0"
config.vm.synced_folder "/tmp/btfgendemo", "/btfgendemo", type: "sshfs"

config.vm.provider "virtualbox" do | vb |
vb.gui: false
vb.cpus: 4
vb.memory: "4096"
end
end

Bring the VM up and SSH into it:

$ vagrant up
$ vagrant ssh

The following commands must be executed inside the VM.

Check the kernel version:

$ uname -a
Linux vagrant 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Let's check that the kernel doesn't have CONFIG_DEBUG_INFO_BTF enabled.

$ cat /boot/config-$(uname -r) | grep CONFIG_DEBUG_INFO_BTF
CONFIG_DEBUG_INFO_BTF is not set

Let's try to run some of the tools before copying the BTF file to the correct path.

$ sudo /btfgendemo/bcc/libbpf-tools/execsnoop
libbpf: failed to parse target BTF: -2
libbpf: failed to perform CO-RE relocations: -2
libbpf: failed to load object 'execsnoop_bpf'
libbpf: failed to load BPF skeleton 'execsnoop_bpf': -2
failed to load BPF object: -2

As expected, the tool is failing because it's not able to find the BTF information required to perform the CO-RE relocations.

Let's copy the the BTF file corresponding to this kernel version.

$ cp /btfgendemo/btfs/$(uname -r).btf /tmp/vmlinux.btf

After this, the tools work fine:

$ sudo /btfgendemo/bcc/libbpf-tools/execsnoop
PCOMM PID PPID RET ARGS
^C

$ sudo /btfgendemo/bcc/libbpf-tools/bindsnoop
PID COMM RET PROTO OPTS IF PORT ADDR
^C

$ sudo /btfgendemo/bcc/libbpf-tools/opensnoop
PID COMM FD ERR PATH
^C

Of course this is just an example showing the general workflow of the tool. A real integration should take care of automatically providing the right BTF file based on the kernel version of the host. The following section shows two different examples of this integration.

Example of integrations

In this section we'll cover how BTFGen is being used by the Inspektor Gadget and Tracee projects.

Inspektor Gadget

Inspektor Gadget is a collection of tools to debug and inspect Kubernetes resources and applications. Since it's distributed as a container image, we chose to ship the BTF files for the different Linux distributions within it. We added a step to the Dockerfile to generate the BTF files when building the container image:

RUN set -ex; \
if [ "$ENABLE_BTFGEN": true ]; then \
cd /btf-tools && \
LIBBPFTOOLS=/objs BTFHUB=/tmp/btfhub INSPEKTOR_GADGET=/gadget ./btfgen.sh; \
fi

btfgen.sh is a helper script that invokes bpftool generating BTF files for all kernels supported by BTFHub.

Then, we modified the entrypoint script to install the right BTF file on the container filesystem, so the different gadgets have it available. Inspektor Gadget is designed to always run in a container, hence we can install the BTF file in a system path (/boot/vmlinux-$(uname –r)) without affecting the host. By doing so we also avoid modifying the source code of the different BCC tools (as we did in the example above):

echo "Kernel provided BTF is not available: Trying shipped BTF files"
SOURCE_BTF=/btfs/$ID/$VERSION_ID/$ARCH/$KERNEL.btf
if [-f $SOURCE_BTF]; then
objcopy --input binary --output elf64-little --rename-section .data=.BTF $SOURCE_BTF /boot/vmlinux-$KERNEL
echo "shipped BTF available. Installed at /boot/vmlinux-$KERNEL"
else
...

The whole PR introducing this support was https://github.com/inspektor-gadget/inspektor-gadget/pull/387.

Tracee

Tracee is a Runtime Security and forensics tool for Linux. In this case the generated BTF files are embedded within the application binary. The Makefile has a btfhub target that invokes btfhub.sh. This script clones the BTFHub repository and calls btfgen.sh to generate the BTF files. Those files are moved to the ./dist/btfhub folder.

# generate tailored BTFs

[ ! -f ./tools/btfgen.sh ] && die "could not find btfgen.sh"
./tools/btfgen.sh -a ${ARCH} -o $TRACEE_BPF_CORE

# move tailored BTFs to dist

[ ! -d ${BASEDIR}/dist ] && die "could not find dist directory"
[ ! -d ${BASEDIR}/dist/btfhub ] && mkdir ${BASEDIR}/dist/btfhub

rm -rf ${BASEDIR}/dist/btfhub/*
mv ./custom-archive/* ${BASEDIR}/dist/btfhub

Then, they are embedded in the go binary using go:embed here.

//go:build ebpf
// +build ebpf

package tracee

import (
"embed"
)

//go:embed "dist/tracee.bpf.core.o"
//go:embed "dist/btfhub/*"

var BPFBundleInjected embed.FS

At runtime, the BTF file for the current kernel is unpacked and its path passed to libbpfgo to be used for the CO-RE relocations.

Limitations

The BTF support in the kernel is not only about exposing the BTF types. Some kind of eBPF programs like fentry/fexit and LSM hooks require the kernel to expose BTF information. Those programs won't work with BTFGen and the only alternative is to have a kernel with CONFIG_DEBUG_INFO_BTF enabled.

Future Ahead / Next Steps

We are aware that BTFGen is a temporary solution until most systems are updated to a kernel version that exposes BTF information. However, we think it's going to take some years and, in the meanwhile, BTFGen can help to fill that gap.

The following are some improvements / next steps that we could consider soon.

Integration with other projects

Some projects like BCC and its libbpf-based tools could benefit a lot from an integration with BTFGen. We opened a PR to make those tools available to a wider range of Linux distributions by using BTFGen.

Deduplication of generated files

eBPF programs usually access few kernel types, hence there is a high chance that the generated files for two different kernel versions are identical, this is especially true for different minor releases of kernels of the same Linux distribution. A further improvement to BTFGen would be to take advantage of this to avoid creating duplicated files by using a symbolic link or a similar approach.

This could also be done directly on BTFHub where some source BTF files are duplicated like indicated in this issue, even if in this case the chance of having duplicated files is lower.

Online API

BTFHub is a big repository, and its size continues to increase because new kernels are released over time. The folks from Seekret created an API that uses BTFGen and BTFHub to generate on demand the "reduced" BTF files for eBPF objects provided by a user.

More Information

The following links are useful if you want to learn more about eBPF, BTF, CO-RE, BTFHub and BTFGen.

Thanks

This feature took inspiration from already existing projects and was implemented as a joint effort by different companies. First, we'd like to thank the Aqua Security team for their amazing work on BTFHub, that's the base project we took inspiration from. Secondly, we'd like to thank the different people that contributed during the development of this feature: Rafael David Tinoco from Aqua Security and Lorenzo Fontana and Leonardo Di Donato from Elastic. Finally, the maintainers of libbpf, Andrii Nakryiko and Alexei Starovoitov and of bpftool, Quentin Monnet, who provided a lot of valuable feedback and guidance to implement it.