Version: main

top_cuda_memory

The top_cuda_memory gadget periodically reports CUDA memory allocation and free activity per process, split by library (libcuda Driver API vs libcudart Runtime API) and by memory type (GPU device memory vs host pinned memory).

It traces alloc/free calls in both libraries independently because the CUDA Runtime (libcudart) internally calls the Driver API (libcuda), and the runtime may cache memory rather than forwarding every free to the driver. Tracking both libraries gives an accurate picture of memory usage from each perspective.

Requirements

Minimum Kernel Version: 5.12 (requires BPF CMPXCHG atomics)
CUDA workload using libcuda.so (Driver API) and/or libcudart.so (Runtime API)

Getting started

Running the gadget:

kubectl gadget:

$ kubectl gadget run ghcr.io/inspektor-gadget/gadget/top_cuda_memory:latest [flags]

ig:

$ sudo ig run ghcr.io/inspektor-gadget/gadget/top_cuda_memory:latest [flags]

Guide

This example shows the gadget running alongside an application that allocates GPU and host-pinned memory using the CUDA Driver and Runtime APIs.

Start the gadget:

kubectl gadget:

$ kubectl gadget run top_cuda_memory:latest --pod mypod
K8S.NODE         K8S.NAMESPACE   K8S.PODNAME   K8S.CONTAINERNAME        PID COMM              MEM_ALLOC_BYTES   MEM_FREE_BYTES  MEM_ALLOC_CALLS   MEM_FREE_CALLS  MEM_IMPLICIT_FREE_BYTES  MEM_IMPLICIT_FREE_CALLS HOST

ig:

$ sudo ig run top_cuda_memory:latest
RUNTIME.CONTAINERNAME               PID COMM              MEM_ALLOC_BYTES   MEM_FREE_BYTES  MEM_ALLOC_CALLS   MEM_FREE_CALLS  MEM_IMPLICIT_FREE_BYTES  MEM_IMPLICIT_FREE_CALLS HOST

Once a CUDA process starts allocating memory, the gadget will show output like:

kubectl gadget:

K8S.NODE         K8S.NAMESPACE   K8S.PODNAME   K8S.CONTAINERNAME        PID COMM              MEM_ALLOC_BYTES   MEM_FREE_BYTES  MEM_ALLOC_CALLS   MEM_FREE_CALLS  MEM_IMPLICIT_FREE_BYTES  MEM_IMPLICIT_FREE_CALLS HOST
minikube         default         mypod         mypod                1234567 cuda_app               2147483648       1073741824              128               64                        0                        0 DEVICE
minikube         default         mypod         mypod                1234567 cuda_app                134217728        134217728               16               16                        0                        0 HOST

ig:

RUNTIME.CONTAINERNAME               PID COMM              MEM_ALLOC_BYTES   MEM_FREE_BYTES  MEM_ALLOC_CALLS   MEM_FREE_CALLS  MEM_IMPLICIT_FREE_BYTES  MEM_IMPLICIT_FREE_CALLS HOST
my-cuda-container               1234567 cuda_app               2147483648       1073741824              128               64                        0                        0 DEVICE
my-cuda-container               1234567 cuda_app                134217728        134217728               16               16                        0                        0 HOST

The HOST column distinguishes memory type:

DEVICE — GPU device memory (allocated with cuMemAlloc, cudaMalloc, etc.)
HOST — Page-locked host memory (allocated with cuMemAllocHost, cudaMallocHost, cudaHostAlloc, etc.)
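To read a row as a single net figure, one can subtract the freed bytes (explicit and implicit) from the allocated bytes. The following is a minimal sketch, not part of the gadget: `net_cuda_bytes` is a hypothetical helper that assumes the default column order shown above, a COMM value without spaces, and counters that all cover the same reporting window. It indexes fields from the right, so the same function should handle both the kubectl gadget and ig layouts.

```shell
#!/bin/sh
# Hypothetical helper (not part of Inspektor Gadget): reads top_cuda_memory
# column output on stdin and prints "COMM TYPE NET_BYTES" for each data row,
# where NET_BYTES = MEM_ALLOC_BYTES - MEM_FREE_BYTES - MEM_IMPLICIT_FREE_BYTES.
# Assumes the default column order and a COMM value without spaces.
net_cuda_bytes() {
  awk '
    # Skip header rows and lines too short to be data rows.
    /MEM_ALLOC_BYTES/ || NF < 9 { next }
    {
      # Index from the right so kubectl gadget and ig layouts both work.
      alloc    = $(NF-6)
      freed    = $(NF-5)
      implicit = $(NF-2)
      # %.0f keeps large byte counts intact in awk variants with 32-bit %d.
      printf "%s %s %.0f\n", $(NF-7), $NF, alloc - freed - implicit
    }'
}

# Sample rows taken from the ig output shown above.
net_cuda_bytes <<'EOF'
RUNTIME.CONTAINERNAME PID COMM MEM_ALLOC_BYTES MEM_FREE_BYTES MEM_ALLOC_CALLS MEM_FREE_CALLS MEM_IMPLICIT_FREE_BYTES MEM_IMPLICIT_FREE_CALLS HOST
my-cuda-container 1234567 cuda_app 2147483648 1073741824 128 64 0 0 DEVICE
my-cuda-container 1234567 cuda_app 134217728 134217728 16 16 0 0 HOST
EOF
```

On the two sample rows this prints `cuda_app DEVICE 1073741824` (1 GiB of device memory not yet freed) and `cuda_app HOST 0`.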

The gadget reports two separate datasources:

libcuda_mem_stats — Driver API (libcuda.so) view
libcudart_mem_stats — Runtime API (libcudart.so) view

For applications like ollama that never call cuMemFree explicitly, memory is bulk-freed when the CUDA context is destroyed. In that case MEM_IMPLICIT_FREE_BYTES and MEM_IMPLICIT_FREE_CALLS will reflect the released memory:

kubectl gadget:

K8S.NODE         K8S.NAMESPACE   K8S.PODNAME   K8S.CONTAINERNAME        PID COMM              MEM_ALLOC_BYTES   MEM_FREE_BYTES  MEM_ALLOC_CALLS   MEM_FREE_CALLS  MEM_IMPLICIT_FREE_BYTES  MEM_IMPLICIT_FREE_CALLS HOST
minikube         default         mypod         mypod                9876543 ollama                17179869184                0             1024                0              17179869184                        1 DEVICE

ig:

RUNTIME.CONTAINERNAME               PID COMM              MEM_ALLOC_BYTES   MEM_FREE_BYTES  MEM_ALLOC_CALLS   MEM_FREE_CALLS  MEM_IMPLICIT_FREE_BYTES  MEM_IMPLICIT_FREE_CALLS HOST
ollama-container                9876543 ollama                17179869184                0             1024                0              17179869184                        1 DEVICE
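Rows of this kind can be picked out mechanically. As a rough sketch, the hypothetical `implicit_only` filter below (not part of the gadget) prints the PID, COMM, and implicitly freed bytes for rows that report no explicit free calls but a nonzero MEM_IMPLICIT_FREE_BYTES. It assumes the default column order and a COMM value without spaces.

```shell
#!/bin/sh
# Hypothetical filter (not part of Inspektor Gadget): reads top_cuda_memory
# column output on stdin and prints "PID COMM IMPLICIT_BYTES" for rows where
# MEM_FREE_CALLS is 0 and MEM_IMPLICIT_FREE_BYTES is greater than 0.
implicit_only() {
  awk '
    # Skip header rows and lines too short to be data rows.
    /MEM_ALLOC_BYTES/ || NF < 9 { next }
    # Fields are indexed from the right: NF-3 is MEM_FREE_CALLS,
    # NF-2 is MEM_IMPLICIT_FREE_BYTES, NF-8 is PID, NF-7 is COMM.
    ($(NF-3) + 0) == 0 && ($(NF-2) + 0) > 0 {
      printf "%s %s %.0f\n", $(NF-8), $(NF-7), $(NF-2)
    }'
}

# Sample rows taken from the ig outputs in this guide.
implicit_only <<'EOF'
ollama-container 9876543 ollama 17179869184 0 1024 0 17179869184 1 DEVICE
my-cuda-container 1234567 cuda_app 2147483648 1073741824 128 64 0 0 DEVICE
EOF
```

Only the ollama row passes the filter, since cuda_app reports 64 explicit free calls.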

By default the gadget prints a summary every second. Use --map-fetch-count to set the number of summaries and --map-fetch-interval to set the time between them:

kubectl gadget:

$ kubectl gadget run top_cuda_memory:latest --map-fetch-count 5 --map-fetch-interval 2s

ig:

$ sudo ig run top_cuda_memory:latest --map-fetch-count 5 --map-fetch-interval 2s
