traceloop
The traceloop gadget captures system calls in real time, acting as a flight recorder for your applications.
Requirements
- Minimum Kernel Version: 5.10+
- Dependencies: [Inspektor Gadget installed](https://inspektor-gadget.io/docs/latest/quick-start)
- Access: kubectl access to your cluster (for Kubernetes) or container runtime access (for standalone containers)
Getting Started
Running the gadget:
With kubectl gadget:
$ kubectl gadget run ghcr.io/inspektor-gadget/gadget/traceloop:v0.44.1 [flags]
With ig:
$ sudo ig run ghcr.io/inspektor-gadget/gadget/traceloop:v0.44.1 [flags]
Guide
System calls are the foundation of how applications interact with the operating system. When debugging production issues, understanding the exact sequence of system calls before a crash or failure can be invaluable. The traceloop gadget continuously captures these system calls, allowing you to "rewind" and see exactly what your application was doing when something went wrong.
Think of traceloop as a flight recorder for your applications - it's always recording, and when something goes wrong, you can review the exact sequence of events that led to the issue.
Flow of System Call Recording in Kubernetes
Before diving into scenarios, it's important to understand how traceloop captures system calls across different components:
Application Pods: The primary targets that generate system calls through normal operation.
eBPF Probes: Kernel-level probes that capture system calls in real-time.
Traceloop Recorder: The component that buffers and formats system call data.
Output Stream: Real-time display of captured system calls with context.
The overall flow of system call capture is as follows: the application pod issues system calls, the eBPF probes capture them in the kernel, the traceloop recorder buffers and formats the events, and the output stream displays them with their Kubernetes context.
Tracing a Specific Application Pod
Let's walk through capturing system calls for a specific pod.
With kubectl gadget, first create a namespace for testing:
$ kubectl create ns test-traceloop-ns
Expected output:
namespace/test-traceloop-ns created
Then create a pod that runs sleep inf, which keeps the container sleeping indefinitely:
$ kubectl run -n test-traceloop-ns --image busybox test-traceloop-pod --command -- sleep inf
Expected output:
pod/test-traceloop-pod created
With ig, start a local busybox container instead:
$ docker run -it --rm --name test-traceloop busybox /bin/sh
Then, let's run the gadget:
With kubectl gadget:
$ kubectl gadget run traceloop:v0.44.1 --namespace test-traceloop-ns
K8S.NODE K8S.NAMESPACE K8S.PODNAME K8S.CONTAINERNAME CPU PID COMM SYSCALL PARAMETERS RET
With ig:
$ sudo ig run traceloop:v0.44.1 --containername test-traceloop
RUNTIME.CONTAINERNAME CPU PID COMM SYSCALL PARAMETERS RET
Simulating and Capturing Issues
Now, let's generate some events. Inside the pod, perform some operations that will generate interesting system calls.
With kubectl gadget, open a shell in the pod and run a command:
$ kubectl exec -ti -n test-traceloop-ns test-traceloop-pod -- /bin/hush
/ # ls
With ig, run a command in the shell started by the docker run command above:
/ # ls
To collect the captured system calls, press Ctrl+C in the terminal running the gadget. The output looks like the following.
With kubectl gadget:
$ kubectl gadget run traceloop:v0.44.1 --namespace test-traceloop-ns
K8S.NODE K8S.NAMESPACE K8S.PODNAME K8S.CONTAINERNAME CPU PID COMM SYSCALL PARAMETERS RET
^C
...
minikube-docker test-traceloop-ns test-traceloop-pod test-traceloop-pod 2 95419 ls brk brk=0 94032…
minikube-docker test-traceloop-ns test-traceloop-pod test-traceloop-pod 2 95419 ls mmap addr=0, len… 14008…
minikube-docker test-traceloop-ns test-traceloop-pod test-traceloop-pod 2 95419 ls access filename="/… -1 (P…
...
minikube-docker test-traceloop-ns test-traceloop-pod test-traceloop-pod 2 95419 ls write fd=1, buf="… 201
minikube-docker test-traceloop-ns test-traceloop-pod test-traceloop-pod 2 95419 ls exit_group error_code=0 X
With ig:
$ sudo ig run traceloop:v0.44.1 --containername test-traceloop
RUNTIME.CONTAINERNAME CPU PID COMM SYSCALL PARAMETERS RET
^C
...
test-traceloop 5 58054 sh execve filename="/bin/ls", a… 0
test-traceloop 5 58054 ls brk brk=0 102559763509…
test-traceloop 5 58054 ls mmap addr=0, len=8192, pro… 123786398932…
test-traceloop 5 58054 ls access filename="/etc/ld.so.… -1 (Permissi…
...
test-traceloop 5 58054 ls write fd=1, buf="\x1b[1;34m… 201
test-traceloop 5 58054 ls exit_group error_code=0 X
...
The trace shows an ls command that allocated memory, encountered a permission error while trying to access a file or directory, successfully wrote its output to the terminal, and exited cleanly.
Finally, clean up the test resources.
With kubectl gadget:
$ kubectl delete ns test-traceloop-ns
namespace "test-traceloop-ns" deleted
With ig:
$ docker rm -f test-traceloop
Advanced Tracing Scenarios
Built-in System Call Filtering
For more efficient filtering, traceloop supports built-in syscall filtering at the eBPF level using --syscall-filters. This reduces overhead compared to userspace filtering with --filter or --filter-expr, because the filtering happens directly in the kernel before data reaches userspace.
With kubectl gadget:
# Filter for specific syscalls
$ kubectl gadget run traceloop:v0.44.1 -n test-ns --podname my-pod --syscall-filters openat,write,read
# Focus on file operations only
$ kubectl gadget run traceloop:v0.44.1 -n test-ns --syscall-filters openat,close,read,write,unlink
With ig:
# Filter for specific syscalls (using -c shorthand)
$ sudo ig run traceloop:v0.44.1 -c test-container --syscall-filters openat,write
# Focus on file operations only
$ sudo ig run traceloop:v0.44.1 -c test-container --syscall-filters openat,close,read,write,unlink
Benefits of built-in filtering:
- Reduced overhead: Filtering happens at the kernel level, not in userspace
- Cleaner output: Only relevant syscalls are captured and displayed
- Better performance: Less CPU and memory usage compared to grep filtering
- Focused debugging: Easier to spot patterns in specific syscall categories
- Improved ring buffer efficiency: Filtering occurs before events are added to the ring buffer, preserving space for relevant syscalls and extending historical visibility
💡 Pro tip: you can filter for specific syscalls with sudo ig run traceloop -c [container] --syscall-filters openat,write. This reduces noise and makes it easier to quickly spot problems when debugging file I/O issues.
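To see the difference in practice, you can compare built-in filtering against userspace post-processing of the same trace (a rough sketch; it assumes the test-traceloop container from the guide above and that you only care about openat and write):
# Userspace post-processing: every syscall still reaches userspace before grep drops it
$ sudo ig run traceloop:v0.44.1 -c test-traceloop | grep -E 'openat|write'
# Built-in filtering: unwanted syscalls are dropped in the kernel, preserving ring buffer space
$ sudo ig run traceloop:v0.44.1 -c test-traceloop --syscall-filters openat,write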
Common filtering patterns:
# Comprehensive file I/O testing
--syscall-filters open,openat,close,read,write
# Memory operations
--syscall-filters mmap,munmap,brk,mprotect
# Process management
--syscall-filters fork,execve,exit,exit_group,wait4
# Network debugging
--syscall-filters socket,bind,listen,accept,connect,sendto,recvfrom
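As an illustration, a complete invocation using the network pattern might look like the following (myapp and my-web-pod are placeholder names; adjust the pattern to your application):
$ kubectl gadget run traceloop:v0.44.1 -n myapp --podname my-web-pod --syscall-filters socket,bind,listen,accept,connect,sendto,recvfrom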
Real-World Scenarios
Scenario 1: Application System Call Monitoring
When monitoring application activity, traceloop shows:
- System call sequence - The chronological order of system calls (brk, mmap, access, write, exit_group)
- Process details - Process ID, command name, and CPU core information
- Return values - Success codes and failure codes with error types (like permission errors)
- File operations - File access attempts with specific file descriptors and buffer information
- Kubernetes context - Node, namespace, pod, and container identification
- Process termination - Exit codes showing how processes end (error_code=0 for clean exit)
The trace captures basic application behavior, showing what system calls an application makes during normal operation, whether those calls succeed or fail, and how processes terminate.
Scenario 2: Configuration Issues
When troubleshooting application problems, traceloop shows:
- Permission issues with specific resources - failed file or directory access attempts with permission errors, such as access filename="/..." -1 (P...)
The trace captures when applications encounter permission-related failures while trying to access files, directories, or other system resources, helping identify access control problems.
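For example, you could narrow a trace to the calls that typically surface such failures (a sketch using the flags shown above; myapp and problematic-pod are placeholders, and the syscall list is just one reasonable starting point):
$ kubectl gadget run traceloop:v0.44.1 -n myapp --podname problematic-pod --syscall-filters access,open,openat,stat
# ... reproduce the failing operation in the pod, then press Ctrl+C and look for negative return values such as -1 (Permission denied)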
Integration with kubectl Workflows
Traceloop integrates seamlessly with standard Kubernetes debugging:
# Standard kubectl debugging
kubectl get pods -n myapp
kubectl describe pod problematic-pod -n myapp
kubectl logs problematic-pod -n myapp
# Enhanced with traceloop for system-level insight
kubectl gadget run traceloop:v0.44.1 --namespace myapp --podname problematic-pod
# ... reproduce the issue ...
# ^C to capture the complete system call timeline
Best Practices
Development Environment
- Use traceloop during local testing with Docker containers to monitor basic application behavior
- Capture system call sequences to understand normal application operation patterns
- Use built-in syscall filtering (--syscall-filters) to focus on specific operation types during testing (see the example after this list)
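For instance, a minimal local testing loop with ig and a throwaway container could look like this (dev-app is a placeholder container name; swap busybox for your own image):
$ docker run -d --name dev-app busybox sleep inf
$ sudo ig run traceloop:v0.44.1 -c dev-app --syscall-filters openat,read,write
# ... exercise the application, then press Ctrl+C to review the captured syscalls
$ docker rm -f dev-app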
Staging Environment
- Test syscall filtering patterns before production use
- Validate that traceloop can capture the specific system calls your application makes
- Practice using targeted filtering to reduce overhead
Production Environment
- Use targeted tracing with specific namespace/pod filters (--namespace, --podname)
- Apply syscall filtering (e.g. --syscall-filters open,openat,close,read,write) to reduce overhead and focus on relevant system calls
- Keep traces focused by filtering for specific containers or pods to minimize performance impact
- Integrate with standard kubectl workflows: use alongside kubectl get pods, kubectl describe, and kubectl logs for comprehensive debugging
General Usage
- Use the built-in filtering capabilities rather than post-processing with grep for better performance
- Apply common filtering patterns based on your debugging needs (file I/O, memory operations, process management, or network debugging)
- Clean up test environments after tracing sessions
Common Troubleshooting
No Events Captured
- Check kernel version: uname -r (must be 5.10+)
- Verify target is running: ensure the pod/container is active and executing commands
Excessive Output
- Use namespace filtering: --namespace specific-namespace
- Filter by pod: --podname specific-pod
- Use built-in syscall filtering: --syscall-filters open,openat,close,read,write
- Apply specific filtering patterns: use targeted filters like --syscall-filters mmap,munmap,brk,mprotect for memory operations
Integration Issues
- Use proper targeting: combine traceloop with standard kubectl commands (kubectl get pods, kubectl describe, kubectl logs)
- Clean up resources: remember to delete test namespaces and containers after tracing sessions
Limitations
Timestamps are not filled on kernels older than 5.7.