
traceloop

The traceloop gadget is used to capture system calls in real-time, acting as a flight recorder for your applications.

Requirements

Getting Started

Running the gadget:

$ kubectl gadget run ghcr.io/inspektor-gadget/gadget/traceloop:v0.44.1 [flags]

Guide

System calls are the foundation of how applications interact with the operating system. When debugging production issues, understanding the exact sequence of system calls before a crash or failure can be invaluable. The traceloop gadget continuously captures these system calls, allowing you to "rewind" and see exactly what your application was doing when something went wrong.

info

Think of traceloop as a flight recorder for your applications - it's always recording, and when something goes wrong, you can review the exact sequence of events that led to the issue.

Flow of System Call Recording in Kubernetes

Before diving into scenarios, it's important to understand how traceloop captures system calls across different components:

Application Pods: The primary targets that generate system calls through normal operation.

eBPF Probes: Kernel-level probes that capture system calls in real-time.

Traceloop Recorder: The component that buffers and formats system call data.

Output Stream: Real-time display of captured system calls with context.

The overall flow of system call capture: the application pod issues system calls, the eBPF probes capture them in the kernel, the traceloop recorder buffers and formats the events, and the output stream displays them together with their Kubernetes context.

Tracing a Specific Application Pod

Let's walk through capturing system calls for a specific pod.

First, create a namespace for testing:

$ kubectl create ns test-traceloop-ns

Expected output:

namespace/test-traceloop-ns created

Create a pod that runs the sleep inf command, which keeps the container sleeping indefinitely:

$ kubectl run -n test-traceloop-ns --image busybox test-traceloop-pod --command -- sleep inf

Expected output:

pod/test-traceloop-pod created
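
Optionally, confirm the pod is up before starting the trace (this is just a sanity check, not required by traceloop):

$ kubectl get pod -n test-traceloop-ns test-traceloop-pod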

Then, let's run the gadget:

$ kubectl gadget run traceloop:v0.44.1 --namespace test-traceloop-ns
K8S.NODE K8S.NAMESPACE K8S.PODNAME K8S.CONTAINERNAME CPU PID COMM SYSCALL PARAMETERS RET

Now, let's generate some events.

Simulating and Capturing Issues

Inside the pod, perform some operations that will generate interesting system calls:

$ kubectl exec -ti -n test-traceloop-ns test-traceloop-pod -- /bin/hush
/ # ls
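# (Illustrative extra steps, not part of the original guide.) A successful read and a failing one:
/ # cat /etc/hostname
/ # cat /does-not-exist

The failing cat shows up in the trace as an openat call returning an error, alongside the usual read and write calls.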

To collect the captured syscalls, press Ctrl+C in the terminal where the gadget is running.

The output from the traceloop gadget looks like the following:

$ kubectl gadget run traceloop:v0.44.1 --namespace test-traceloop-ns
K8S.NODE K8S.NAMESPACE K8S.PODNAME K8S.CONTAINERNAME CPU PID COMM SYSCALL PARAMETERS RET
^C
...
minikube-docker test-traceloop-ns test-traceloop-pod test-traceloop-pod 2 95419 ls brk brk=0 94032
minikube-docker test-traceloop-ns test-traceloop-pod test-traceloop-pod 2 95419 ls mmap addr=0, len… 14008
minikube-docker test-traceloop-ns test-traceloop-pod test-traceloop-pod 2 95419 ls access filename="/… -1 (P…
...
minikube-docker test-traceloop-ns test-traceloop-pod test-traceloop-pod 2 95419 ls write fd=1, buf="201
minikube-docker test-traceloop-ns test-traceloop-pod test-traceloop-pod 2 95419 ls exit_group error_code=0 X

The trace shows an ls command that allocated memory, encountered a permission error while trying to access a file or directory, successfully wrote its output to the terminal, and exited cleanly.

Finally, clean the system:

$ kubectl delete ns test-traceloop-ns
namespace "test-traceloop-ns" deleted

Advanced Tracing Scenarios

Built-in System Call Filtering

For more efficient filtering, traceloop supports built-in syscall filtering at the eBPF level using --syscall-filters. This reduces overhead compared to userspace filtering (--filter or --filter-expr) because the filtering happens directly in the kernel, before data reaches userspace.

# Filter for specific syscalls
$ kubectl gadget run traceloop:v0.44.1 -n test-ns --podname my-pod --syscall-filters openat,write,read
# Focus on file operations only
$ kubectl gadget run traceloop:v0.44.1 -n test-ns --syscall-filters openat,close,read,write,unlink

Benefits of built-in filtering:

  • Reduced overhead: Filtering happens at the kernel level, not in userspace
  • Cleaner output: Only relevant syscalls are captured and displayed
  • Better performance: Less CPU and memory usage compared to grep filtering
  • Focused debugging: Easier to spot patterns in specific syscall categories
  • Improved ring buffer efficiency: Filtering occurs before events are added to the ring buffer, preserving space for relevant syscalls and extending historical visibility
info

💡 Pro tip: You can filter specific syscalls with sudo ig run traceloop -c [container] --syscall-filters openat,write. This reduces noise by limiting the output to the selected syscalls, making it much easier to spot failed calls when debugging file I/O issues.

Common filtering patterns:

# File I/O operations
--syscall-filters open,openat,close,read,write

# Memory operations
--syscall-filters mmap,munmap,brk,mprotect

# Process management
--syscall-filters fork,execve,exit,exit_group,wait4

# Network debugging
--syscall-filters socket,bind,listen,accept,connect,sendto,recvfrom
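
As an example, one of these patterns can be combined with pod targeting in a single invocation (the namespace and pod name below are placeholders):

# Trace only network-related syscalls for one pod
$ kubectl gadget run traceloop:v0.44.1 -n my-namespace --podname my-pod --syscall-filters socket,bind,listen,accept,connect,sendto,recvfrom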

Real-World Scenarios

Scenario 1: Application System Call Monitoring

When monitoring application activity, traceloop shows:

  • System call sequence - The chronological order of system calls (brk, mmap, access, write, exit_group)
  • Process details - Process ID, command name, and CPU core information
  • Return values - Success codes and failure codes with error types (like permission errors)
  • File operations - File access attempts with specific file descriptors and buffer information
  • Kubernetes context - Node, namespace, pod, and container identification
  • Process termination - Exit codes showing how processes end (error_code=0 for clean exit)

The trace captures basic application behavior, showing what system calls an application makes during normal operation, whether those calls succeed or fail, and how processes terminate.

Scenario 2: Configuration Issues

When troubleshooting application problems, traceloop shows:

  • Permission issues with specific resources - Failed file or directory access attempts with permission errors like access filename="/..." -1 (P...)

The trace captures when applications encounter permission-related failures while trying to access files, directories, or other system resources, helping identify access control problems.
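
To narrow a capture to this class of failure, you can restrict the trace to file-access syscalls (the namespace and pod name below are placeholders):

$ kubectl gadget run traceloop:v0.44.1 -n my-namespace --podname problematic-pod --syscall-filters access,open,openat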

Integration with kubectl Workflows

Traceloop integrates seamlessly with standard Kubernetes debugging:

# Standard kubectl debugging
kubectl get pods -n myapp
kubectl describe pod problematic-pod -n myapp
kubectl logs problematic-pod -n myapp

# Enhanced with traceloop for system-level insight
kubectl gadget run traceloop:v0.44.1 --namespace myapp --podname problematic-pod
# ... reproduce the issue ...
# ^C to capture the complete system call timeline

Best Practices

Development Environment

  • Use traceloop during local testing with Docker containers to monitor basic application behavior
  • Capture system call sequences to understand normal application operation patterns
  • Use built-in syscall filtering (--syscall-filters) to focus on specific operation types during testing (see the example after this list)
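
For local testing with Docker containers, the same capture can be done with the ig CLI mentioned in the pro tip above (the container name below is a placeholder):

$ sudo ig run traceloop -c my-local-container --syscall-filters openat,read,write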

Staging Environment

  • Test syscall filtering patterns before production use
  • Validate that traceloop can capture the specific system calls your application makes
  • Practice using targeted filtering to reduce overhead

Production Environment

  • Use targeted tracing with specific namespace/pod filters (--namespace, --podname)
  • Apply syscall filtering to reduce overhead and focus on relevant system calls (for example, --syscall-filters open,openat,close,read,write; see the example after this list)
  • Keep traces focused by filtering for specific containers or pods to minimize performance impact
  • Integrate with standard kubectl workflows - use alongside kubectl get pods, kubectl describe, and kubectl logs for comprehensive debugging
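
For example, a focused production capture that targets a single pod, filters at the eBPF level, and keeps a copy of the output for later review might look like this (the namespace, pod name, and file name are placeholders; the redirection is plain shell, not a gadget feature):

$ kubectl gadget run traceloop:v0.44.1 --namespace prod-ns --podname critical-pod --syscall-filters openat,read,write > traceloop-capture.txt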

General Usage

  • Use the built-in filtering capabilities rather than post-processing with grep for better performance
  • Apply common filtering patterns based on your debugging needs (file I/O, memory operations, process management, or network debugging)
  • Clean up test environments after tracing sessions

Common Troubleshooting

No Events Captured

  • Check kernel version: uname -r (must be 5.10+; see the command after this list)
  • Verify target is running: Ensure the pod/container is active and executing commands
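
A quick way to check the kernel version on every node, rather than running uname -r on each one, is the KERNEL-VERSION column of kubectl get nodes:

$ kubectl get nodes -o wide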

Excessive Output

  • Use namespace filtering: --namespace specific-namespace
  • Filter by pod: --podname specific-pod
  • Use built-in syscall filtering: --syscall-filters open,openat,close,read,write
  • Apply specific filtering patterns: Use targeted filters like --syscall-filters mmap,munmap,brk,mprotect for memory operations

Integration Issues

  • Use proper targeting: Combine traceloop with standard kubectl commands (kubectl get pods, kubectl describe, kubectl logs)
  • Clean up resources: Remember to delete test namespaces and containers after tracing sessions

Limitations

Timestamps are not filled on kernels older than 5.7.