Secure Accelerator Access
Kubernetes enforces GPU isolation through the NVIDIA device plugin. Only pods that explicitly request an nvidia.com/gpu resource are granted access to GPU devices. This page demonstrates that isolation through three tests: a pod without a GPU request, a pod with a GPU request, and an attempt by a second pod to claim an already-allocated GPU.
Test 1 — Pod Without GPU Request
Deploy a pod that does not request a GPU:
apiVersion: v1
kind: Pod
metadata:
  name: no-gpu-pod
  namespace: default
spec:
  containers:
  - name: test
    image: nvcr.io/nvidia/cuda:13.0.1-base-ubi9
    command: ["/bin/sh", "-c"]
    args:
    - |
      echo "=== GPU Device Access ==="
      ls /dev/nvidia* 2>/dev/null && echo "GPU devices found" || echo "No GPU devices - access denied"
      echo "=== NVIDIA_VISIBLE_DEVICES ==="
      echo "NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES:-not set}"
      echo "=== nvidia-smi ==="
      nvidia-smi -L 2>/dev/null || echo "nvidia-smi not available"
      sleep 3600
Test 2 — Pod With GPU Request
Deploy a pod that requests a GPU:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  namespace: default
spec:
  containers:
  - name: test
    image: nvcr.io/nvidia/cuda:13.0.1-base-ubi9
    command: ["/bin/sh", "-c"]
    args:
    - |
      echo "=== GPU Device Access ==="
      ls /dev/nvidia* 2>/dev/null && echo "GPU devices found" || echo "No GPU devices"
      echo "=== NVIDIA_VISIBLE_DEVICES ==="
      echo "NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES:-not set}"
      echo "=== nvidia-smi ==="
      nvidia-smi -L 2>/dev/null || echo "nvidia-smi not available"
      sleep 3600
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
Apply both manifests with kubectl apply -f, then wait for both pods to start.
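The apply-and-wait step might be scripted roughly as follows. This is a minimal sketch, not part of the original procedure: the file names no-gpu-pod.yaml and gpu-pod.yaml, the helper name deploy_and_wait, and the 120-second timeout are all assumptions, and kubectl is assumed to be configured for the target cluster.

```shell
# Hypothetical helper: applies both test manifests, then blocks until both pods are Ready.
# Assumes the manifests above were saved as no-gpu-pod.yaml and gpu-pod.yaml.
deploy_and_wait() {
  kubectl apply -f no-gpu-pod.yaml -f gpu-pod.yaml || return 1
  # The timeout is an arbitrary choice; allow longer if the image pull is slow.
  kubectl wait --for=condition=Ready pod/no-gpu-pod pod/gpu-pod \
    --namespace default --timeout=120s
}
```

Calling deploy_and_wait from the directory that holds both files returns a non-zero status if either step fails, for example when a pod does not become Ready before the timeout.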
Results
Check the logs of the pod that requested a GPU, for example with kubectl logs gpu-pod.
Expected output:
=== GPU Device Access ===
/dev/nvidia-modeset
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvidia0
/dev/nvidiactl
GPU devices found
=== NVIDIA_VISIBLE_DEVICES ===
NVIDIA_VISIBLE_DEVICES=void
=== nvidia-smi ===
GPU 0: Tesla T4 (UUID: GPU-bc1545c8-4103-eb2b-b25d-cea283d0a7d4)
Check the logs of the pod that did not request a GPU, for example with kubectl logs no-gpu-pod.
Expected output:
=== GPU Device Access ===
No GPU devices - access denied
=== NVIDIA_VISIBLE_DEVICES ===
NVIDIA_VISIBLE_DEVICES=all
=== nvidia-smi ===
nvidia-smi not available
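The pod with the GPU request has /dev/nvidia0 mounted and can enumerate the device with nvidia-smi. The pod without the request sees no GPU device nodes, and nvidia-smi is unavailable. The NVIDIA_VISIBLE_DEVICES value alone does not indicate access: the pod without a request reports all, most likely inherited from the image's default environment, yet it has no devices.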
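The comparison between the two logs can also be automated. The sketch below is one possible approach, assuming that the presence of a /dev/nvidiaN device node in the log is a sufficient signal; the helper name classify_gpu_access is hypothetical.

```shell
# Hypothetical helper: reads pod log text on stdin and prints "gpu-visible" when at
# least one /dev/nvidiaN device node is listed, "gpu-hidden" otherwise.
classify_gpu_access() {
  # Redirect instead of using -q so grep consumes all of its input.
  if grep -E '^/dev/nvidia[0-9]+$' >/dev/null; then
    echo "gpu-visible"
  else
    echo "gpu-hidden"
  fi
}

# Usage (assumes kubectl is configured for the test cluster):
#   kubectl logs gpu-pod | classify_gpu_access
#   kubectl logs no-gpu-pod | classify_gpu_access
```

Given the log output shown above, the first call should print gpu-visible and the second gpu-hidden. Matching on the numbered device node rather than on NVIDIA_VISIBLE_DEVICES keeps the check tied to what the container can actually open.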
Test 3 — GPU Exclusivity
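A GPU allocated to one pod cannot be allocated to a second pod on the same node. First, find the name of the GPU worker node, for example by listing the nodes with kubectl get nodes and picking one that advertises the nvidia.com/gpu resource.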
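One way to pick out GPU nodes is to print each node's allocatable nvidia.com/gpu count and keep the rows with a positive value. The following is a sketch under that assumption; the helper name filter_gpu_nodes and the two-column layout are illustrative choices, not something the original page prescribes.

```shell
# Hypothetical helper: reads a "NAME GPU" table on stdin (header on the first line)
# and prints the names of nodes whose GPU column is a positive integer.
filter_gpu_nodes() {
  awk 'NR > 1 && $2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}

# Usage (assumes kubectl is configured for the target cluster):
#   kubectl get nodes \
#     -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
#     | filter_gpu_nodes
```

Nodes that do not advertise the resource appear with a non-numeric placeholder in the GPU column, which the numeric test filters out.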
Deploy two pods to the same node, each requesting the single available GPU. Replace <gpu-node-name> with the node name returned above:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-1
  namespace: default
spec:
  nodeSelector:
    kubernetes.io/hostname: <gpu-node-name>
  containers:
  - name: test
    image: nvcr.io/nvidia/cuda:13.0.1-base-ubi9
    command: ["/bin/sh", "-c", "nvidia-smi -L && sleep 3600"]
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-2
  namespace: default
spec:
  nodeSelector:
    kubernetes.io/hostname: <gpu-node-name>
  containers:
  - name: test
    image: nvcr.io/nvidia/cuda:13.0.1-base-ubi9
    command: ["/bin/sh", "-c", "nvidia-smi -L && sleep 3600"]
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
Results
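Check the pod statuses, for example with kubectl get pods gpu-pod-1 gpu-pod-2.
Expected result: gpu-pod-1 acquires the GPU and reaches Running, while gpu-pod-2 remains Pending because the node's only GPU is already allocated. Describing the pending pod with kubectl describe pod gpu-pod-2 confirms the reason.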
The relevant section of the output:
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  27s   default-scheduler  0/3 nodes are available: 1 Insufficient nvidia.com/gpu, 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had untolerated taint(s). no new claims to deallocate, preemption: 0/3 nodes are available: 1 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
The 1 Insufficient nvidia.com/gpu entry confirms that the one node matching the pod's node selector has no unallocated GPU left, so the scheduler cannot place the second pod.
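The same check can be scripted by scanning the describe output for a FailedScheduling event that mentions the GPU shortage. This is a minimal sketch; the helper name blocked_by_gpu_shortage is hypothetical, and it assumes the event type and message appear on a single line, as in the output above.

```shell
# Hypothetical helper: reads `kubectl describe pod` output on stdin and succeeds only
# if a FailedScheduling event line cites a shortage of nvidia.com/gpu.
blocked_by_gpu_shortage() {
  awk '/FailedScheduling/ && /Insufficient nvidia\.com\/gpu/ { found = 1 }
       END { exit !found }'
}

# Usage (assumes kubectl is configured for the test cluster):
#   kubectl describe pod gpu-pod-2 | blocked_by_gpu_shortage \
#     && echo "gpu-pod-2 is blocked: no free GPU on the selected node"
```

Requiring both strings on the same line avoids a false positive from an unrelated event that merely mentions the resource name.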