Dynamic Resource Allocation (DRA)

Dynamic Resource Allocation (DRA) is a Kubernetes mechanism for requesting and sharing hardware resources such as GPUs across pods and containers. DRA is generally available and enabled by default as of Kubernetes 1.34, where its API is served as resource.k8s.io/v1. For more details, see the Kubernetes DRA documentation.

Prerequisites

Before proceeding, make sure you have a Kubernetes cluster created from an NVIDIA cluster template, with a GPU-enabled worker node attached. See Creating a Kubernetes cluster with GPUs for instructions.

Once the cluster is ready, download the kubeconfig and make it available locally:

openstack --os-cloud breqwatr coe cluster config <cluster-name>

Then either copy the file to ~/.kube/config or export the path:

export KUBECONFIG=<path-to-kubeconfig>
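
Before installing anything, it can help to confirm that kubectl actually picks up the downloaded kubeconfig. The following is a minimal sketch, not part of kubectl or the OpenStack CLI: the helper name check_kubeconfig is hypothetical, and it assumes KUBECONFIG holds a single file path rather than a colon-separated list.

```shell
# Hypothetical helper (the name is an assumption, not an existing tool).
# Assumes KUBECONFIG names a single file; falls back to kubectl's default path.
check_kubeconfig() {
  cfg="${KUBECONFIG:-$HOME/.kube/config}"
  if [ ! -r "$cfg" ]; then
    echo "kubeconfig not readable: $cfg" >&2
    return 1
  fi
  # List the nodes; the GPU-enabled worker should be among them.
  kubectl get nodes -o wide
}
```

Run check_kubeconfig after the export. If it lists the cluster nodes, kubectl is talking to the right cluster.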

Install the NVIDIA GPU Operator

Add the NVIDIA Helm repository:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

Install the GPU Operator. Set driver.enabled=false because the NVIDIA driver is already baked into the Kubernetes node image, and devicePlugin.enabled=false because DRA replaces the legacy device plugin:

helm install nvidia-gpu-operator nvidia/gpu-operator \
  --version v26.3.1 \
  -n nvidia-gpu-operator \
  --set devicePlugin.enabled=false \
  --set driver.enabled=false \
  --create-namespace

Wait for the GPU Operator pods to reach the Running state. The nvidia-cuda-validator pod is the exception: it runs once and ends in Completed:

watch kubectl get pods -n nvidia-gpu-operator

Expected output once completed:

NAME                                                              READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-5rbc7                                       1/1     Running     0          19h
gpu-operator-66f98555c5-rjdxp                                     1/1     Running     0          19h
nvidia-container-toolkit-daemonset-xvb4g                          1/1     Running     0          19h
nvidia-cuda-validator-m2t87                                       0/1     Completed   0          19h
nvidia-dcgm-exporter-gc2mh                                        1/1     Running     0          19h
nvidia-gpu-operator-node-feature-discovery-gc-66bf76fb94-j4l97    1/1     Running     0          19h
nvidia-gpu-operator-node-feature-discovery-master-64db499drkwpb   1/1     Running     0          19h
nvidia-gpu-operator-node-feature-discovery-worker-bwfh8           1/1     Running     0          19h
nvidia-gpu-operator-node-feature-discovery-worker-mz99t           1/1     Running     0          19h
nvidia-operator-validator-xh2wz                                   1/1     Running     0          19h
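
For scripted installs, watching the pod list by hand is awkward, and the Completed validator pod means a plain "everything is Running" check would never pass. The sketch below filters the pod list instead. The helper name pods_not_settled is hypothetical, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper. Reads `kubectl get pods --no-headers` output on stdin.
# Prints the name of every pod that is neither Completed nor Running with all
# of its containers ready. Exit status is 0 only if at least one pod was
# listed and none were printed.
pods_not_settled() {
  awk '
    {
      split($2, ready, "/")
      settled = ($3 == "Completed") || ($3 == "Running" && ready[1] == ready[2])
      if (!settled) { print $1; bad = 1 }
    }
    END { exit ((bad || NR == 0) ? 1 : 0) }
  '
}
```

Usage: kubectl get pods -n nvidia-gpu-operator --no-headers | pods_not_settled. An empty result with exit status 0 means the namespace has settled. The same filter works for any other namespace.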

Verify the DRA API

Confirm the DRA API resources are available in the cluster:

kubectl api-resources | grep resource.k8s.io

Expected output:

deviceclasses                                                resource.k8s.io/v1                  false        DeviceClass
resourceclaims                                               resource.k8s.io/v1                  true         ResourceClaim
resourceclaimtemplates                                       resource.k8s.io/v1                  true         ResourceClaimTemplate
resourceslices                                               resource.k8s.io/v1                  false        ResourceSlice
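
The same check can be automated. The sketch below reports which of the four DRA resource types are not served at resource.k8s.io/v1. The helper name missing_dra_resources is hypothetical, and the input is assumed to be the tabular output of kubectl api-resources.

```shell
# Hypothetical helper. Reads `kubectl api-resources --no-headers` output on
# stdin and prints each expected DRA resource that is NOT listed with
# API version resource.k8s.io/v1. Exit status is 0 when nothing is missing.
missing_dra_resources() {
  awk '
    BEGIN {
      n = split("deviceclasses resourceclaims resourceclaimtemplates resourceslices", want, " ")
    }
    {
      # The API version column shifts if a resource has short names,
      # so scan every field after the resource name.
      for (i = 2; i <= NF; i++) if ($i == "resource.k8s.io/v1") seen[$1] = 1
    }
    END {
      for (i = 1; i <= n; i++) if (!(want[i] in seen)) { print want[i]; bad = 1 }
      exit (bad ? 1 : 0)
    }
  '
}
```

Usage: kubectl api-resources --api-group=resource.k8s.io --no-headers | missing_dra_resources. Any name it prints is either absent or only served at an older API version.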

Install the NVIDIA DRA Driver

Install the NVIDIA DRA driver Helm chart:

helm install nvidia-dra-driver nvidia/nvidia-dra-driver-gpu \
  --version 25.12.0 \
  -n nvidia-dra-driver \
  --create-namespace \
  --set gpuResourcesEnabledOverride=true

Wait for the DRA driver pods to reach a running state:

watch kubectl get pods -n nvidia-dra-driver

Expected output once completed:

NAME                                                READY   STATUS    RESTARTS   AGE
nvidia-dra-driver-gpu-controller-67b55db587-2vpkr   1/1     Running   0          19h
nvidia-dra-driver-gpu-kubelet-plugin-6chfs          2/2     Running   0          19h

Verify the DRA Setup

Check that resource slices have been created:

kubectl get resourceslices

Expected output:

NAME                                                              NODE                                          DRIVER                      POOL                                          AGE
kube-qz0e0-default-worker-6w8ws-kcbtn-fbvfb-compute-domaintsmhd   kube-qz0e0-default-worker-6w8ws-kcbtn-fbvfb   compute-domain.nvidia.com   kube-qz0e0-default-worker-6w8ws-kcbtn-fbvfb   19h
kube-qz0e0-default-worker-6w8ws-kcbtn-fbvfb-gpu.nvidia.comgpkbl   kube-qz0e0-default-worker-6w8ws-kcbtn-fbvfb   gpu.nvidia.com              kube-qz0e0-default-worker-6w8ws-kcbtn-fbvfb   19h
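
The table shows that slices exist, but not which devices they advertise. As a rough sketch, the helper below (gpu_slice_devices is a hypothetical name) uses kubectl's JSONPath output to print the device names published by the gpu.nvidia.com driver, one line per slice. It assumes the resource.k8s.io/v1 ResourceSlice layout, where device names sit under spec.devices.

```shell
# Hypothetical helper. Prints "<node>: <device names>" for every ResourceSlice
# published by the gpu.nvidia.com driver.
gpu_slice_devices() {
  kubectl get resourceslices \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\t"}{.spec.driver}{"\t"}{.spec.devices[*].name}{"\n"}{end}' |
    awk -F '\t' '$2 == "gpu.nvidia.com" { print $1 ": " $3 }'
}
```

Each GPU worker node should produce one line. No output suggests the kubelet plugin has not published its devices yet.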

Check that device classes are available:

kubectl get deviceclasses

Expected output:

NAME                                        AGE
compute-domain-daemon.nvidia.com            19h
compute-domain-default-channel.nvidia.com   19h
gpu.nvidia.com                              19h
mig.nvidia.com                              19h
vfio.gpu.nvidia.com                         19h

Test GPU Access with DRA

Create a ResourceClaimTemplate that requests a single GPU. Save the manifest to a file and apply it with kubectl apply -f <file>:

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    devices:
      requests:
      - name: single-gpu
        exactly:
          deviceClassName: gpu.nvidia.com
          allocationMode: ExactCount
          count: 1

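Create a test Deployment that claims the GPU. The pod-level resourceClaims entry instantiates the template, and the container references that entry by name under resources.claims: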

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dra-gpu-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dra-gpu-test
  template:
    metadata:
      labels:
        app: dra-gpu-test
    spec:
      runtimeClassName: nvidia
      resourceClaims:
        - name: single-gpu
          resourceClaimTemplateName: gpu-claim-template
      containers:
        - name: cuda-test
          image: nvidia/cuda:13.0.1-base-ubi9
          command: ["/bin/sh", "-c"]
          args:
            - nvidia-smi && sleep 3600
          resources:
            claims:
              - name: single-gpu
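
For a scripted run, both manifests can be applied in order and the rollout awaited in one step. This is a sketch under stated assumptions: the helper name deploy_dra_gpu_test and the default file names gpu-claim-template.yaml and dra-gpu-test.yaml are hypothetical, so substitute whatever names the manifests were saved under. The 300-second timeout is also an arbitrary choice.

```shell
# Hypothetical helper. File names and the timeout are assumptions.
deploy_dra_gpu_test() {
  template_file="${1:-gpu-claim-template.yaml}"
  deployment_file="${2:-dra-gpu-test.yaml}"

  # Apply the template first so it exists when the Deployment's pod is created.
  kubectl apply -f "$template_file" || return 1
  kubectl apply -f "$deployment_file" || return 1

  # Block until the Deployment's single replica is available, or time out.
  kubectl rollout status deployment/dra-gpu-test --timeout=300s || return 1

  # List the ResourceClaim generated from the template for the pod.
  kubectl get resourceclaims
}
```

Usage: deploy_dra_gpu_test, or deploy_dra_gpu_test <template-file> <deployment-file>. A non-zero exit status means one of the steps failed or the rollout timed out.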

Once the deployment is running, check the logs to confirm the GPU is accessible:

kubectl logs deploy/dra-gpu-test

Expected output:

Tue May  5 14:48:07 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.71.05              Driver Version: 595.71.05      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:05.0 Off |                    0 |
| N/A   34C    P8             15W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
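
Since the claim asks for exactly one GPU, it is also worth checking that the container sees one device and no more. The sketch below counts the entries printed by nvidia-smi -L, which lists one "GPU <index>: ..." line per visible GPU. The helper name expect_one_gpu is hypothetical.

```shell
# Hypothetical helper. Reads `nvidia-smi -L` output on stdin and succeeds only
# if exactly one GPU is listed.
expect_one_gpu() {
  # grep -c prints the match count; it exits non-zero on zero matches,
  # which "|| true" tolerates so the count can still be inspected.
  n=$(grep -c '^GPU [0-9][0-9]*:' || true)
  if [ "$n" -eq 1 ]; then
    echo "ok: 1 GPU visible"
  else
    echo "unexpected GPU count: $n" >&2
    return 1
  fi
}
```

Usage: kubectl exec deploy/dra-gpu-test -- nvidia-smi -L | expect_one_gpu.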