Secure Accelerator Access

Kubernetes enforces GPU isolation through the NVIDIA device plugin: only pods that explicitly request an nvidia.com/gpu resource are granted access to GPU devices. This page demonstrates that isolation through three tests: a pod without a GPU request, a pod with a GPU request, and an attempt by a second pod to claim an already-allocated GPU.

Test 1 — Pod Without GPU Request

Deploy a pod that does not request a GPU:

apiVersion: v1
kind: Pod
metadata:
  name: no-gpu-pod
  namespace: default
spec:
  containers:
  - name: test
    image: nvcr.io/nvidia/cuda:13.0.1-base-ubi9
    command: ["/bin/sh", "-c"]
    args:
      - |
        echo "=== GPU Device Access ==="
        ls /dev/nvidia* 2>/dev/null && echo "GPU devices found" || echo "No GPU devices - access denied"
        echo "=== NVIDIA_VISIBLE_DEVICES ==="
        echo "NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES:-not set}"
        echo "=== nvidia-smi ==="
        nvidia-smi -L 2>/dev/null || echo "nvidia-smi not available"
        sleep 3600

Test 2 — Pod With GPU Request

Deploy a pod that requests a GPU:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  namespace: default
spec:
  containers:
  - name: test
    image: nvcr.io/nvidia/cuda:13.0.1-base-ubi9
    command: ["/bin/sh", "-c"]
    args:
      - |
        echo "=== GPU Device Access ==="
        ls /dev/nvidia* 2>/dev/null && echo "GPU devices found" || echo "No GPU devices"
        echo "=== NVIDIA_VISIBLE_DEVICES ==="
        echo "NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES:-not set}"
        echo "=== nvidia-smi ==="
        nvidia-smi -L 2>/dev/null || echo "nvidia-smi not available"
        sleep 3600
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

Wait for both pods to reach the Running state (press Ctrl+C to stop watching):

kubectl get pods -w
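
For scripted runs, blocking until the pods are ready may be more convenient than watching interactively. The following is a minimal sketch using kubectl wait; the helper name wait_for_test_pods and the 120s default timeout are assumptions for illustration, not values from this page.

```shell
# Hypothetical helper: block until both test pods report Ready.
# The default timeout (120s) is an assumed value; raise it if the
# CUDA image takes longer to pull on your nodes.
wait_for_test_pods() {
  kubectl wait --for=condition=Ready \
    pod/no-gpu-pod pod/gpu-pod \
    --namespace default \
    --timeout="${1:-120s}"
}

# Usage (requires cluster access):
#   wait_for_test_pods 180s
```

The command exits non-zero if either pod fails to become Ready within the timeout, which makes it suitable as a gate before reading the logs.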

Results

Check the logs of the pod that requested a GPU:

kubectl logs gpu-pod

Expected output:

=== GPU Device Access ===
/dev/nvidia-modeset
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvidia0
/dev/nvidiactl
GPU devices found
=== NVIDIA_VISIBLE_DEVICES ===
NVIDIA_VISIBLE_DEVICES=void
=== nvidia-smi ===
GPU 0: Tesla T4 (UUID: GPU-bc1545c8-4103-eb2b-b25d-cea283d0a7d4)

Check the logs of the pod that did not request a GPU:

kubectl logs no-gpu-pod

Expected output:

=== GPU Device Access ===
No GPU devices - access denied
=== NVIDIA_VISIBLE_DEVICES ===
NVIDIA_VISIBLE_DEVICES=all
=== nvidia-smi ===
nvidia-smi not available

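The pod with the GPU request has /dev/nvidia0 mounted, and nvidia-smi can enumerate the device. The pod without the request sees no GPU device nodes, and nvidia-smi is unavailable. The NVIDIA_VISIBLE_DEVICES value is not a reliable indicator of access: the pod without the request reports all, which is the default set in the CUDA base image, yet its device listing is empty.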
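
The comparison above can also be checked mechanically. The sketch below classifies a pod's log by whether it lists a numbered GPU device node such as /dev/nvidia0; the helper name gpu_visibility is hypothetical, and the check assumes the log format produced by the test pods on this page.

```shell
# Hypothetical helper: reads pod logs on stdin and prints "visible" if a
# numbered GPU device node (/dev/nvidia0, /dev/nvidia1, ...) is listed,
# otherwise "isolated".
gpu_visibility() {
  if grep -Eq '^/dev/nvidia[0-9]+$'; then
    echo "visible"
  else
    echo "isolated"
  fi
}

# Usage (requires cluster access):
#   kubectl logs gpu-pod    | gpu_visibility
#   kubectl logs no-gpu-pod | gpu_visibility
```

Matching on the device node rather than on NVIDIA_VISIBLE_DEVICES avoids relying on an environment variable that the container image may set regardless of allocation.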

Test 3 — GPU Exclusivity

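A GPU allocated to one pod cannot be allocated to a second pod at the same time. If the pods from Tests 1 and 2 are still running, delete them first (kubectl delete pod gpu-pod no-gpu-pod) so that the GPU is free and the output below matches. Then find the name of the GPU worker node: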

kubectl get nodes -l nvidia.com/gpu.present=true \
  -o jsonpath='{.items[0].metadata.name}'
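
This test relies on the node exposing exactly one GPU, so it may be worth confirming that before deploying. A minimal sketch, assuming the device plugin advertises the resource under the node's allocatable field; the helper name gpu_allocatable is hypothetical.

```shell
# Hypothetical helper: print how many nvidia.com/gpu resources a node
# advertises as allocatable. The dot in the resource name is escaped
# so that JSONPath treats it as part of the key.
gpu_allocatable() {
  kubectl get node "$1" \
    -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
}

# Usage (requires cluster access; substitute the node name found above):
#   gpu_allocatable <gpu-node-name>
```

If the value is greater than 1, both pods in this test can be scheduled, and more pods than GPUs are needed to observe the Pending state.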

Deploy two pods to the same node, each requesting the single available GPU. Replace <gpu-node-name> with the node name returned above:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-1
  namespace: default
spec:
  nodeSelector:
    kubernetes.io/hostname: <gpu-node-name>
  containers:
  - name: test
    image: nvcr.io/nvidia/cuda:13.0.1-base-ubi9
    command: ["/bin/sh", "-c", "nvidia-smi -L && sleep 3600"]
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-2
  namespace: default
spec:
  nodeSelector:
    kubernetes.io/hostname: <gpu-node-name>
  containers:
  - name: test
    image: nvcr.io/nvidia/cuda:13.0.1-base-ubi9
    command: ["/bin/sh", "-c", "nvidia-smi -L && sleep 3600"]
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

Results

Check the pod statuses:

kubectl get pods | grep gpu-pod

Expected output:

gpu-pod-1   1/1     Running   0          15s
gpu-pod-2   0/1     Pending   0          14s
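
The same status can be read without parsing table output. The sketch below queries a pod's phase directly; the helper name pod_phase is hypothetical.

```shell
# Hypothetical helper: print the phase (Pending, Running, ...) of a pod
# in the current namespace.
pod_phase() {
  kubectl get pod "$1" -o jsonpath='{.status.phase}'
}

# Usage (requires cluster access):
#   pod_phase gpu-pod-1
#   pod_phase gpu-pod-2
```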

gpu-pod-1 acquires the GPU and runs. gpu-pod-2 remains Pending because the GPU is already allocated. Describing the pending pod confirms the reason:

kubectl describe pod/gpu-pod-2

The relevant section of the output:

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  27s   default-scheduler  0/3 nodes are available: 1 Insufficient nvidia.com/gpu, 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had untolerated taint(s). no new claims to deallocate, preemption: 0/3 nodes are available: 1 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.

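The Insufficient nvidia.com/gpu reason confirms that the only node matching the pod's node selector has no unallocated GPU left, so the scheduler cannot place the second pod. The other two nodes in this cluster are ruled out by the node selector and by an untolerated taint.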
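
To check for this condition in a script rather than by eye, the describe output can be filtered for the scheduling event. A minimal sketch, assuming the event message format shown above; the helper name blocked_on_gpu is hypothetical.

```shell
# Hypothetical helper: reads `kubectl describe pod` output on stdin and
# succeeds only if a FailedScheduling event cites insufficient GPUs.
# -F matches the strings literally, so the dot in nvidia.com is not a wildcard.
blocked_on_gpu() {
  grep -F 'FailedScheduling' | grep -Fq 'Insufficient nvidia.com/gpu'
}

# Usage (requires cluster access):
#   kubectl describe pod/gpu-pod-2 | blocked_on_gpu && echo "waiting for a free GPU"
```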