top_cuda_memory
The top_cuda_memory gadget periodically reports CUDA memory allocation and free activity per
process, split by library (libcuda Driver API vs libcudart Runtime API) and by memory type (GPU
device memory vs host pinned memory).
It traces alloc/free calls in both libraries independently because the CUDA Runtime (libcudart)
internally calls the Driver API (libcuda), and the runtime may cache memory rather than forwarding
every free to the driver. Tracking both libraries gives an accurate picture of memory usage from
each perspective.
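The difference between the two views can be illustrated with a toy model. This is only a sketch: the `Driver` and `CachingRuntime` classes and the caching policy are hypothetical stand-ins, not how libcuda or libcudart are actually implemented. It shows how a runtime-level cache can make the call counts seen at each layer diverge.

```python
class Driver:
    """Hypothetical stand-in for libcuda: counts every call it receives."""

    def __init__(self):
        self.alloc_calls = 0
        self.free_calls = 0

    def alloc(self, size):
        self.alloc_calls += 1
        return size

    def free(self, size):
        self.free_calls += 1


class CachingRuntime:
    """Hypothetical stand-in for libcudart that keeps freed blocks for reuse."""

    def __init__(self, driver):
        self.driver = driver
        self.cache = []  # sizes of freed blocks held back from the driver
        self.alloc_calls = 0
        self.free_calls = 0

    def malloc(self, size):
        self.alloc_calls += 1
        if size in self.cache:
            self.cache.remove(size)  # reuse a cached block, driver not called
        else:
            self.driver.alloc(size)
        return size

    def free(self, size):
        self.free_calls += 1
        self.cache.append(size)  # cached, so the free is not forwarded


driver = Driver()
runtime = CachingRuntime(driver)
for _ in range(4):
    block = runtime.malloc(1 << 20)
    runtime.free(block)

print("runtime alloc/free calls:", runtime.alloc_calls, runtime.free_calls)
print("driver  alloc/free calls:", driver.alloc_calls, driver.free_calls)
```

In this model the runtime layer sees four allocations and four frees, while the driver layer sees a single allocation and no frees, which is why counters from one library alone can be misleading.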
Requirements
- Minimum Kernel Version: 5.12 (requires BPF CMPXCHG atomics)
- CUDA workload using libcuda.so (Driver API) and/or libcudart.so (Runtime API)
Getting started
Running the gadget:
- kubectl gadget
$ kubectl gadget run ghcr.io/inspektor-gadget/gadget/top_cuda_memory:v0.53.0 [flags]
- ig
$ sudo ig run ghcr.io/inspektor-gadget/gadget/top_cuda_memory:v0.53.0 [flags]
Guide
This example shows the gadget running alongside an application that allocates GPU and host-pinned memory using the CUDA Driver and Runtime APIs.
Start the gadget:
- kubectl gadget
$ kubectl gadget run top_cuda_memory:v0.53.0 --pod mypod
K8S.NODE K8S.NAMESPACE K8S.PODNAME K8S.CONTAINERNAME PID COMM MEM_ALLOC_BYTES MEM_FREE_BYTES MEM_ALLOC_CALLS MEM_FREE_CALLS MEM_IMPLICIT_FREE_BYTES MEM_IMPLICIT_FREE_CALLS HOST
- ig
$ sudo ig run top_cuda_memory:v0.53.0
RUNTIME.CONTAINERNAME PID COMM MEM_ALLOC_BYTES MEM_FREE_BYTES MEM_ALLOC_CALLS MEM_FREE_CALLS MEM_IMPLICIT_FREE_BYTES MEM_IMPLICIT_FREE_CALLS HOST
Once a CUDA process starts allocating memory, the gadget will show output like:
- kubectl gadget
K8S.NODE K8S.NAMESPACE K8S.PODNAME K8S.CONTAINERNAME PID COMM MEM_ALLOC_BYTES MEM_FREE_BYTES MEM_ALLOC_CALLS MEM_FREE_CALLS MEM_IMPLICIT_FREE_BYTES MEM_IMPLICIT_FREE_CALLS HOST
minikube default mypod mypod 1234567 cuda_app 2147483648 1073741824 128 64 0 0 DEVICE
minikube default mypod mypod 1234567 cuda_app 134217728 134217728 16 16 0 0 HOST
- ig
RUNTIME.CONTAINERNAME PID COMM MEM_ALLOC_BYTES MEM_FREE_BYTES MEM_ALLOC_CALLS MEM_FREE_CALLS MEM_IMPLICIT_FREE_BYTES MEM_IMPLICIT_FREE_CALLS HOST
my-cuda-container 1234567 cuda_app 2147483648 1073741824 128 64 0 0 DEVICE
my-cuda-container 1234567 cuda_app 134217728 134217728 16 16 0 0 HOST
The HOST column distinguishes memory type:
- DEVICE: GPU device memory (allocated with cuMemAlloc, cudaMalloc, etc.)
- HOST: Page-locked host memory (allocated with cuMemAllocHost, cudaMallocHost, cudaHostAlloc, etc.)
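To read the sample rows per memory type, a short script can split the column output and subtract freed bytes from allocated bytes. This is a minimal sketch: `parse_rows` and `net_bytes_by_type` are hypothetical helpers, the input is the `ig` sample output from this guide, and splitting on whitespace assumes that no column value contains a space.

```python
SAMPLE = """\
RUNTIME.CONTAINERNAME PID COMM MEM_ALLOC_BYTES MEM_FREE_BYTES MEM_ALLOC_CALLS MEM_FREE_CALLS MEM_IMPLICIT_FREE_BYTES MEM_IMPLICIT_FREE_CALLS HOST
my-cuda-container 1234567 cuda_app 2147483648 1073741824 128 64 0 0 DEVICE
my-cuda-container 1234567 cuda_app 134217728 134217728 16 16 0 0 HOST
"""


def parse_rows(text):
    """Turn whitespace-separated column output into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def net_bytes_by_type(rows):
    """Sum (allocated - explicitly freed) bytes per value of the HOST column."""
    totals = {}
    for row in rows:
        net = int(row["MEM_ALLOC_BYTES"]) - int(row["MEM_FREE_BYTES"])
        totals[row["HOST"]] = totals.get(row["HOST"], 0) + net
    return totals


rows = parse_rows(SAMPLE)
print(net_bytes_by_type(rows))  # {'DEVICE': 1073741824, 'HOST': 0}
```

For the sample rows, device memory shows a net 1 GiB allocated, while every byte of pinned host memory that was allocated was also freed.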
The gadget reports two separate datasources:
- libcuda_mem_stats: Driver API (libcuda.so) view
- libcudart_mem_stats: Runtime API (libcudart.so) view
For applications like ollama that never call cuMemFree explicitly, memory is bulk-freed when
the CUDA context is destroyed. In that case MEM_IMPLICIT_FREE_BYTES and MEM_IMPLICIT_FREE_CALLS
will reflect the released memory:
- kubectl gadget
K8S.NODE K8S.NAMESPACE K8S.PODNAME K8S.CONTAINERNAME PID COMM MEM_ALLOC_BYTES MEM_FREE_BYTES MEM_ALLOC_CALLS MEM_FREE_CALLS MEM_IMPLICIT_FREE_BYTES MEM_IMPLICIT_FREE_CALLS HOST
minikube default mypod mypod 9876543 ollama 17179869184 0 1024 0 17179869184 1 DEVICE
- ig
RUNTIME.CONTAINERNAME PID COMM MEM_ALLOC_BYTES MEM_FREE_BYTES MEM_ALLOC_CALLS MEM_FREE_CALLS MEM_IMPLICIT_FREE_BYTES MEM_IMPLICIT_FREE_CALLS HOST
ollama-container 9876543 ollama 17179869184 0 1024 0 17179869184 1 DEVICE
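The arithmetic behind the ollama row can be checked with a few lines. This is a sketch only: `net_bytes` is a hypothetical helper, and it assumes that all counters in a row cover the same reporting period.

```python
GIB = 1 << 30  # bytes per GiB


def net_bytes(alloc_bytes, free_bytes, implicit_free_bytes=0):
    """Bytes still allocated after explicit and implicit frees are subtracted."""
    return alloc_bytes - free_bytes - implicit_free_bytes


# Values taken from the ollama sample row above.
alloc, free, implicit = 17179869184, 0, 17179869184

print(net_bytes(alloc, free) / GIB)            # 16.0 if implicit frees are ignored
print(net_bytes(alloc, free, implicit) / GIB)  # 0.0 once they are counted
```

Looking at MEM_FREE_BYTES alone would suggest 16 GiB was never released; including MEM_IMPLICIT_FREE_BYTES shows it was returned when the context was destroyed.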
By default the gadget prints a summary each second. Both the number of summaries and the interval
can be customized with --map-fetch-count and --map-fetch-interval:
- kubectl gadget
$ kubectl gadget run top_cuda_memory:v0.53.0 --map-fetch-count 5 --map-fetch-interval 2s
- ig
$ sudo ig run top_cuda_memory:v0.53.0 --map-fetch-count 5 --map-fetch-interval 2s
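When the gadget is launched from a script, the two flags can be assembled programmatically. The sketch below uses hypothetical helpers, `build_ig_command` and `approx_duration_seconds`, and only builds the argument list; it does not run anything. The duration estimate assumes one summary per interval and ignores startup time.

```python
import shlex


def build_ig_command(count, interval_seconds, image="top_cuda_memory:v0.53.0"):
    """Build the argv for `ig run` with the two map-fetch flags shown above."""
    return [
        "sudo", "ig", "run", image,
        "--map-fetch-count", str(count),
        "--map-fetch-interval", f"{interval_seconds}s",
    ]


def approx_duration_seconds(count, interval_seconds):
    """Rough run length, assuming one summary is printed per interval."""
    return count * interval_seconds


cmd = build_ig_command(5, 2)
print(shlex.join(cmd))
print(approx_duration_seconds(5, 2))  # 10
```

The resulting list can be passed to `subprocess.run` on a host where `ig` is installed.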