Porting an eBPF-based application to arm64: our experience with Inspektor Gadget
Nowadays, the arm architecture is everywhere, from your pocket to Mars.
In particular, its 8th version, ARMv8-A, made it possible to use it in more and more Systems on Chip, which can be found in single board computers (SBC), tablets, laptops and even virtual machines in the cloud.
Developers, from arm64-based laptop users to enthusiasts with dozens of SBCs, have embraced this architecture, particularly for cloud native workloads.
Inspektor Gadget is a collection of tools (or gadgets) that facilitate cloud native development, originally developed for amd64.
With the growing adoption of arm64, it became clear to us that porting it to that architecture was a necessary step in our journey to empower developers to achieve more.
However, Inspektor Gadget is not your typical user-space application.
Due to its extensive use of low-level kernel concepts including eBPF, porting Inspektor Gadget to arm64 presented some unique challenges.
This blog post describes those technical challenges and how we resolved them.
bcc gadgets are wrappers around iovisor/bcc tools.
In this case, the eBPF code is compiled on each target machine just before it gets executed.
Consequently, we only needed to add arm64 versions of clang/llvm to our container images.
The CO-RE gadgets
CO-RE stands for "Compile Once - Run Everywhere". Contrary to standard gadgets, they are not compiled just before being run. Instead, the BPF object files are shipped and directly used with some address relocation.
In Inspektor Gadget's build process, we use bpf2go to generate the BPF object files.
This CLI tool lets us specify the target platform with the -target option.
The eBPF instruction set is architecture independent as the kernel translates eBPF instructions to machine instructions.
Nonetheless, we have to handle two different cases depending on whether the gadget includes the bpf_tracing header file:
- If the gadget does not include this file, we can use bpfel (i.e. "BPF endianness little") as target and the same BPF object file will be used for both amd64 and arm64, as these two architectures are little endian.
- For gadgets including this file, we have to compile a specific BPF object file for each architecture. Indeed, this header file defines architecture specific macros used to access syscall arguments [1] based on the value of __TARGET_ARCH_.
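The two cases above can be sketched as a small decision function. This is only an illustration, not code from Inspektor Gadget: the `gadget` type, its `usesTracingHeader` field and the gadget names are hypothetical, and the sketch assumes bpf2go accepts a comma-separated list of targets for -target.

```go
package main

import (
	"fmt"
	"strings"
)

// gadget is a hypothetical description of a CO-RE gadget, used only for this sketch.
type gadget struct {
	name              string
	usesTracingHeader bool // true when the eBPF code includes the bpf_tracing header
}

// bpf2goTarget returns the value to pass to bpf2go's -target option.
func bpf2goTarget(g gadget, arches []string) string {
	if !g.usesTracingHeader {
		// One little-endian object file serves every little-endian architecture.
		return "bpfel"
	}
	// The header expands to architecture specific macros,
	// so one object file per architecture is needed.
	return strings.Join(arches, ",")
}

func main() {
	arches := []string{"amd64", "arm64"}
	gadgets := []gadget{
		{name: "gadget-without-header", usesTracingHeader: false},
		{name: "gadget-with-header", usesTracingHeader: true},
	}
	for _, g := range gadgets {
		fmt.Printf("%s: -target %s\n", g.name, bpf2goTarget(g, arches))
	}
}
```

The point of the split is that the generic bpfel target keeps a single artifact whenever the code allows it, and only gadgets reading syscall arguments pay the cost of per-architecture builds.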
Thanks to the Go build tags present at the top of the files generated by bpf2go, the BPF object files corresponding to the target architecture will be used at build time.
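To illustrate how build tags pick one file per architecture, here is a minimal sketch using the standard go/build/constraint package. The constraint lines are illustrative assumptions and are not copied from the files bpf2go generates; the `selected` helper is hypothetical.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// selected reports whether a file carrying the given build constraint line
// would be compiled when building for goarch.
func selected(line, goarch string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	// Only the architecture tag is considered satisfied in this sketch.
	return expr.Eval(func(tag string) bool { return tag == goarch }), nil
}

func main() {
	// Illustrative constraint lines: one per architecture, one shared.
	lines := []string{
		"//go:build amd64",
		"//go:build arm64",
		"//go:build amd64 || arm64",
	}
	for _, goarch := range []string{"amd64", "arm64"} {
		for _, line := range lines {
			ok, err := selected(line, goarch)
			if err != nil {
				panic(err)
			}
			fmt.Printf("GOARCH=%s %q compiled: %v\n", goarch, line, ok)
		}
	}
}
```

Because the Go toolchain evaluates these constraints itself, no runtime check is needed: a binary built for arm64 simply never contains the amd64-specific object file.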
The specific case of trace exec
trace exec is a gadget which monitors calls to the exec syscall family.
It reports each time a new application is run on the system.
Even with its BPF object file built for arm64, this gadget was not working.
After some investigation, we realized the kernel code in charge of starting a new thread on arm64 omitted the syscall number.
Without the syscall number, the tracing cannot occur.
We fixed this behavior in the upstream kernel.
However, that does not solve the problem for existing kernels in the field, so we also switched the tracing mechanism of trace exec for available kernels [2].
The specific case of traceloop
At the time of writing, the traceloop gadget hard-codes system call numbers for amd64.
We plan to rewrite this gadget and make it work on arm64 as part of future work.
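To see why hard-coded numbers do not carry over, consider the sketch below. It is not taken from traceloop: the table and the `syscallName` helper are hypothetical, and only a handful of syscalls are listed, with their Linux numbers on each architecture.

```go
package main

import "fmt"

// syscallNumbers maps an architecture to a few Linux syscall numbers.
// The same syscall has a different number on amd64 and arm64.
var syscallNumbers = map[string]map[string]int{
	"amd64": {"read": 0, "write": 1, "execve": 59, "openat": 257},
	"arm64": {"read": 63, "write": 64, "execve": 221, "openat": 56},
}

// syscallName resolves a syscall number for the given architecture.
// It returns false when the number is not in this (partial) table.
func syscallName(goarch string, nr int) (string, bool) {
	for name, n := range syscallNumbers[goarch] {
		if n == nr {
			return name, true
		}
	}
	return "", false
}

func main() {
	// 59 is execve on amd64, but not on arm64.
	for _, goarch := range []string{"amd64", "arm64"} {
		name, ok := syscallName(goarch, 59)
		fmt.Printf("%s: syscall 59 -> %q (known: %v)\n", goarch, name, ok)
	}
}
```

A tool that decodes syscall 59 as execve is therefore correct on amd64 only; on arm64 the lookup has to go through an architecture specific table.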
Speeding up the container images build
So far, we have container images built for both amd64 and arm64 with the corresponding BPF object files.
The image built for arm64 relies on docker buildx, which by default emulates other architectures using qemu.
This was working well, but the emulation made the build slow (35 minutes compared to 10 minutes before).
Consequently, we decided to rev it up using cross compilation in docker buildx.
Our docker image build has the following stages:
- There is one builder stage where we build the gadgettracermanager binary as well as other binaries.
- The other stage consists of using another image as "main" image and some tweaking.
So, instead of emulating the builder stage, we used cross compilation.
This process relies mainly on variables defined by docker buildx, like BUILDPLATFORM and TARGETARCH.
In our case, the builder stage runs on the BUILDPLATFORM (which is amd64 for us) and generates binaries for the TARGETARCH (which are amd64 and arm64).
As a result, the container images build time decreased (from around 35
minutes to 15 minutes).
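The idea behind the builder stage can be sketched as follows: the Go toolchain runs natively on the build platform and is told which architecture to produce through GOARCH. This is a hedged illustration rather than our actual build script. The `crossBuildCmd` helper and the package path are hypothetical, and the sketch assumes a pure Go build (a cgo build would also need a C cross toolchain).

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// crossBuildCmd prepares, without running it, a "go build" invocation that
// produces a linux binary for targetArch, whatever the host architecture is.
func crossBuildCmd(targetArch, output, pkg string) *exec.Cmd {
	cmd := exec.Command("go", "build", "-o", output, pkg)
	// GOOS and GOARCH select the target; the compiler itself runs natively.
	cmd.Env = append(os.Environ(), "GOOS=linux", "GOARCH="+targetArch)
	return cmd
}

func main() {
	// docker buildx exposes TARGETARCH to the build; default to arm64 here.
	targetArch := os.Getenv("TARGETARCH")
	if targetArch == "" {
		targetArch = "arm64"
	}
	// "./path/to/gadgettracermanager" is a placeholder package path.
	cmd := crossBuildCmd(targetArch, "gadgettracermanager", "./path/to/gadgettracermanager")
	fmt.Println("would run:", cmd.Args, "with GOARCH="+targetArch)
}
```

Since no instruction of the compiler is emulated, the only extra cost compared to a single-architecture build is compiling the binaries once per target.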
Demonstration of Inspektor Gadget on an AKS arm64 cluster
We presented the work we did to port Inspektor Gadget to the arm64 architecture; we will now see it in action on an AKS cluster.
As arm64 clusters are in preview in AKS, you will first need to install the aks-preview Azure CLI extension and register the AKSARM64Preview preview feature.
Now, let's create the cluster:
$ resource_group=inspektor-gadget-arm64-group
$ kubernetes_cluster=inspektor-gadget-arm64-cluster
$ az group create --name $resource_group --location westeurope -o none
# All the available arm64 sizes are listed there:
# https://azure.microsoft.com/en-us/blog/now-in-preview-azure-virtual-machines-with-ampere-altra-armbased-processors/
$ az aks create --resource-group $resource_group \
    --name $kubernetes_cluster \
    --node-count 2 \
    --generate-ssh-keys \
    -s 'Standard_E2ps_v5'
The behavior of this command has been altered by the following extension: aks-preview
...
$ az aks get-credentials --resource-group $resource_group \
    --name $kubernetes_cluster
The behavior of this command has been altered by the following extension: aks-preview
Merged "inspektor-gadget-arm64-cluster" as current context in /home/you/.kube/config
$ kubectl gadget deploy
...
Inspektor Gadget successfully deployed
$ kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Namespace
metadata:
  name: test-namespace
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: test-namespace
spec:
  containers:
  - name: test-container
    image: debian
    command: ["/bin/sh"]
    args: ["-c", "apt-get update && apt-get install -qy python2; while true; do python2.7 -c \"exec'()'*7**6\"; sleep 1; done"]
EOF
namespace/test-namespace created
pod/test-pod created
$ kubectl gadget trace signal -n test-namespace
# Node name was changed to make it shorter
NODE       NAMESPACE       POD       CONTAINER       PID   COMM       SIGNAL   TPID  RET
aks-node1  test-namespace  test-pod  test-container  5371  python2.7  SIGSEGV  5371  0
aks-node1  test-namespace  test-pod  test-container  5371  python2.7  SIGPIPE  5371  0
aks-node1  test-namespace  test-pod  test-container  5454  python2.7  SIGSEGV  5454  0
aks-node1  test-namespace  test-pod  test-container  5454  python2.7  SIGPIPE  5454  0
^C
Terminating...
# Clean up everything
$ kubectl delete ns test-namespace
namespace "test-namespace" deleted
$ kubectl gadget undeploy
...
Inspektor Gadget successfully removed
$ az group delete --no-wait --resource-group $resource_group
Are you sure you want to perform this operation? (y/n): y
As you can see, there is no difference between using Inspektor Gadget on an amd64 cluster and on an arm64 cluster.
Known limitation
The trace open gadget does not display the opened file path on kernels older than version 5.5, as they lack an upstream commit fixing a bug when reading user space values from the kernel.
Conclusions
Inspektor Gadget is now available on arm64!
It will enable you to debug your Kubernetes cluster running on this architecture, whether it is a thousand-node cluster hosted in the cloud or a cluster running locally on your arm64-based laptop or SBC.
We will also keep an eye on RISC-V development as we may port Inspektor Gadget to that architecture in the future!
Footnotes
1. The way syscall arguments are passed is architecture dependent and part of the Application Binary Interface. For example, register rdi stores the first syscall argument on amd64 while it is x0 on arm64. So, the bpf_tracing header defines cross-platform macros to get syscall parameter values.
2. Instead of using tracepoint, we use kprobe for trace exec on arm64. Sadly, with kprobe it is not possible to get the command line arguments of the application.