Cluster Autoscaling

The Kubernetes Cluster Autoscaler adds worker nodes when pods can't be scheduled for lack of resources, and removes nodes once they're no longer needed. Breqwatr Cloud clusters wire it up at create time via Magnum labels on the worker node group; the autoscaler itself runs as a Deployment inside the cluster.

This is a CLI-only flow today. See "Use the OpenStack CLI" for the clouds.yaml setup that the --os-cloud breqwatr selector reads from.

Create a Cluster with Autoscaling Enabled

Pass the auto_scaling_enabled=true label when creating the cluster. Use min_node_count and max_node_count to set the autoscaling bounds:

openstack --os-cloud breqwatr coe cluster create cluster-autoscaling \
  --cluster-template <cluster-template> \
  --keypair <keypair> \
  --master-count 1 \
  --node-count 1 \
  --master-flavor <master-flavor> \
  --flavor <worker-flavor> \
  --labels boot_volume_size=50,boot_volume_type=ceph,kube_tag=v1.34.3,auto_scaling_enabled=true,min_node_count=1,max_node_count=3
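
Cluster creation is asynchronous, so the command returns before the cluster is usable. As a minimal sketch, the helper below polls the cluster status until it reaches CREATE_COMPLETE. The function name wait_for_cluster and the 30-second default interval are illustrative choices, not part of the Breqwatr tooling; the sketch relies only on the OpenStack client's standard -f value -c status output options.

```shell
# wait_for_cluster NAME [INTERVAL_SECONDS]
# Hypothetical helper: polls Magnum until the cluster is created or fails.
wait_for_cluster() {
  local name="$1"
  local interval="${2:-30}"
  local status
  while true; do
    # -f value -c status prints only the status field
    status="$(openstack --os-cloud breqwatr coe cluster show "$name" -f value -c status)"
    case "$status" in
      CREATE_COMPLETE) echo "cluster $name is ready"; return 0 ;;
      *_FAILED)        echo "cluster $name failed: $status" >&2; return 1 ;;
      *)               sleep "$interval" ;;
    esac
  done
}
```

For the cluster created above, this would be called as wait_for_cluster cluster-autoscaling.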

Once the cluster reaches CREATE_COMPLETE, verify the autoscaling labels are present on the worker node group:

openstack --os-cloud breqwatr coe nodegroup show cluster-autoscaling default-worker

Expected output:

+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field              | Value                                                                                                                                                       |
+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| uuid               | ef7916ff-b26a-46dd-9177-8b4af466b121                                                                                                                        |
| name               | default-worker                                                                                                                                              |
| cluster_id         | 2fc40fe9-da4f-4fe9-9446-ebe14a6b09a6                                                                                                                        |
| project_id         | bcb98105f709408ba9122cf8d8055d6f                                                                                                                            |
| docker_volume_size | None                                                                                                                                                        |
| labels             | {'boot_volume_size': '50', 'boot_volume_type': 'ceph', 'kube_tag': 'v1.34.3', 'auto_scaling_enabled': 'true', 'min_node_count': '1', 'max_node_count': '3'} |
| labels_overridden  | {}                                                                                                                                                          |
| labels_skipped     | {}                                                                                                                                                          |
| labels_added       | {}                                                                                                                                                          |
| flavor_id          | v1.c4r8                                                                                                                                                     |
| image_id           | rockylinux-9-kube-v1.34.3.qcow2                                                                                                                             |
| node_addresses     | []                                                                                                                                                          |
| node_count         | 1                                                                                                                                                           |
| role               | worker                                                                                                                                                      |
| max_node_count     | None                                                                                                                                                        |
| min_node_count     | 1                                                                                                                                                           |
| is_default         | True                                                                                                                                                        |
| stack_id           | kube-xbytb                                                                                                                                                  |
| status             | CREATE_COMPLETE                                                                                                                                             |
| status_reason      | None                                                                                                                                                        |
+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+

The labels field confirms that autoscaling is enabled with a minimum of 1 and maximum of 3 worker nodes.

Test Scale-Up

Deploy a workload with 5 replicas, each requesting 1 CPU and 2Gi of memory. This exceeds the capacity of a single worker node, triggering the autoscaler to provision additional nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: autoscale-test
spec:
  replicas: 5
  selector:
    matchLabels:
      app: autoscale-test
  template:
    metadata:
      labels:
        app: autoscale-test
    spec:
      containers:
      - name: stress
        image: busybox
        command: ["sh", "-c", "sleep 3600"]
        resources:
          requests:
            cpu: "1000m"
            memory: "2Gi"
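
The manifest has to be applied before any pods appear. A minimal sketch, assuming it was saved as autoscale-test.yaml (a hypothetical file name) and that kubectl already points at the new cluster; apply_and_wait is an illustrative helper, not an existing command:

```shell
# apply_and_wait MANIFEST DEPLOYMENT [TIMEOUT]
# Hypothetical helper: applies a manifest, then waits for the rollout.
apply_and_wait() {
  local manifest="$1"
  local deployment="$2"
  local timeout="${3:-15m}"
  kubectl apply -f "$manifest" || return 1
  # Blocks until every replica is available or the timeout expires,
  # which includes the time the autoscaler needs to add nodes.
  kubectl rollout status "deploy/$deployment" --timeout="$timeout"
}
```

Calling apply_and_wait autoscale-test.yaml autoscale-test blocks while the autoscaler works, so kubectl get pods is best run from a second terminal to observe the intermediate states.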

Initially, 2 pods will be in Pending state because the single worker node does not have enough capacity:

kubectl get pods

Expected output:

NAME                              READY   STATUS              RESTARTS   AGE
autoscale-test-6bbd775d5f-244c8   0/1     ContainerCreating   0          4s
autoscale-test-6bbd775d5f-bjwds   0/1     ContainerCreating   0          4s
autoscale-test-6bbd775d5f-cbt5v   0/1     Pending             0          4s
autoscale-test-6bbd775d5f-lppbm   0/1     ContainerCreating   0          4s
autoscale-test-6bbd775d5f-rh5gd   0/1     Pending             0          4s
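
The split of three scheduled pods and two pending ones comes down to per-node capacity. A rough sketch of the arithmetic, assuming hypothetical allocatable figures of 3500m CPU and 7000Mi memory for one worker (the real values appear under Allocatable in kubectl describe node) and ignoring system pods already running on the node:

```shell
# pods_per_node ALLOC_CPU_M ALLOC_MEM_MI REQ_CPU_M REQ_MEM_MI
# Pods that fit on one node: the tighter of the CPU and memory limits.
pods_per_node() {
  local by_cpu=$(( $1 / $3 ))
  local by_mem=$(( $2 / $4 ))
  if [ "$by_cpu" -lt "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
}

# nodes_needed REPLICAS PODS_PER_NODE
# Ceiling division: nodes required to place every replica.
nodes_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# With the assumed figures and the manifest's requests (1000m CPU, 2Gi = 2048Mi):
#   pods_per_node 3500 7000 1000 2048   # prints 3
#   nodes_needed 5 3                    # prints 2
```

Under these assumptions three pods fit per worker, so 5 replicas need 2 workers, which matches the behaviour shown here.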

The autoscaler detects the unschedulable pods and provisions a second worker node. After a few minutes, all pods will be running:

kubectl get nodes

Expected output:

NAME                                          STATUS   ROLES                  AGE   VERSION
kube-jyex6-2dnh2-n4zhx                        Ready    control-plane,master   26m   v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-frkz4   Ready    worker                 12m   v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-lfrjf   Ready    worker                 20m   v1.34.3

kubectl get pods

Expected output:

NAME                              READY   STATUS    RESTARTS   AGE
autoscale-test-6bbd775d5f-244c8   1/1     Running   0          15m
autoscale-test-6bbd775d5f-bjwds   1/1     Running   0          15m
autoscale-test-6bbd775d5f-cbt5v   1/1     Running   0          15m
autoscale-test-6bbd775d5f-lppbm   1/1     Running   0          15m
autoscale-test-6bbd775d5f-rh5gd   1/1     Running   0          15m

Scale the deployment to 10 replicas:

kubectl scale deploy/autoscale-test --replicas=10

Again, some pods will be in Pending state while the autoscaler provisions two more nodes:

kubectl get pods

Expected output:

NAME                              READY   STATUS    RESTARTS   AGE
autoscale-test-6bbd775d5f-244c8   1/1     Running   0          21m
autoscale-test-6bbd775d5f-7qxlb   0/1     Pending   0          3m28s
autoscale-test-6bbd775d5f-bjwds   1/1     Running   0          21m
autoscale-test-6bbd775d5f-cbt5v   1/1     Running   0          21m
autoscale-test-6bbd775d5f-hpgkk   0/1     Pending   0          3m28s
autoscale-test-6bbd775d5f-lppbm   1/1     Running   0          21m
autoscale-test-6bbd775d5f-ltjmj   1/1     Running   0          3m28s
autoscale-test-6bbd775d5f-mjwkl   0/1     Pending   0          3m28s
autoscale-test-6bbd775d5f-rh5gd   1/1     Running   0          21m
autoscale-test-6bbd775d5f-zlqdj   0/1     Pending   0          3m28s

After a few minutes, two more worker nodes are added and all pods reach Running state:

kubectl get nodes

Expected output:

NAME                                          STATUS   ROLES                  AGE     VERSION
kube-jyex6-2dnh2-n4zhx                        Ready    control-plane,master   35m     v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-5smw7   Ready    worker                 3m45s   v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-8n76c   Ready    worker                 3m36s   v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-frkz4   Ready    worker                 21m     v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-lfrjf   Ready    worker                 29m     v1.34.3

kubectl get pods

Expected output:

NAME                              READY   STATUS    RESTARTS   AGE
autoscale-test-6bbd775d5f-244c8   1/1     Running   0          24m
autoscale-test-6bbd775d5f-7qxlb   1/1     Running   0          7m6s
autoscale-test-6bbd775d5f-bjwds   1/1     Running   0          24m
autoscale-test-6bbd775d5f-cbt5v   1/1     Running   0          24m
autoscale-test-6bbd775d5f-hpgkk   1/1     Running   0          7m6s
autoscale-test-6bbd775d5f-lppbm   1/1     Running   0          24m
autoscale-test-6bbd775d5f-ltjmj   1/1     Running   0          7m6s
autoscale-test-6bbd775d5f-mjwkl   1/1     Running   0          7m6s
autoscale-test-6bbd775d5f-rh5gd   1/1     Running   0          24m
autoscale-test-6bbd775d5f-zlqdj   1/1     Running   0          7m6s

Hitting the Maximum Node Limit

Scaling beyond what max_node_count can accommodate leaves the excess pods in Pending state indefinitely, since the autoscaler cannot add more nodes than max_node_count allows:

kubectl scale deploy/autoscale-test --replicas=15
kubectl get pods | grep -i pend

Expected output:

autoscale-test-6bbd775d5f-h2484   0/1     Pending   0          59s
autoscale-test-6bbd775d5f-h5rnw   0/1     Pending   0          59s
autoscale-test-6bbd775d5f-x7csc   0/1     Pending   0          59s

The cluster is at its max_node_count limit and cannot scale further. The 3 remaining pods stay Pending until either the replica count is reduced or max_node_count is increased.
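
Filtering with grep -i pend matches on any text in the line, including pod names. As an alternative sketch, the first helper below counts pods by matching the STATUS column exactly, and the second lists the NotTriggerScaleUp events that the Cluster Autoscaler records when adding a node would not help a pod. Both function names are illustrative:

```shell
# Prints the number of pods whose STATUS column is exactly "Pending".
count_pending() {
  kubectl get pods --no-headers | awk '$3 == "Pending" { n++ } END { print n + 0 }'
}

# Lists autoscaler events for pods it could not scale up for.
scale_up_blocked_events() {
  kubectl get events --field-selector reason=NotTriggerScaleUp
}
```

A count that stays above zero while scale_up_blocked_events keeps reporting the same pods suggests the node limit has been reached rather than a node still being provisioned.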

Test Scale-Down

Scale the deployment down to 3 replicas:

kubectl scale deploy/autoscale-test --replicas=3

All 3 pods consolidate onto a single worker node. The autoscaler detects that the other nodes are no longer needed and removes them approximately 10 minutes after they become idle:

kubectl get pods -o wide

Expected output:

NAME                              READY   STATUS    RESTARTS   AGE   IP               NODE                                          NOMINATED NODE   READINESS GATES
autoscale-test-6bbd775d5f-244c8   1/1     Running   0          44m   10.100.112.129   kube-jyex6-default-worker-zmb5k-td6wh-lfrjf   <none>           <none>
autoscale-test-6bbd775d5f-bjwds   1/1     Running   0          44m   10.100.112.131   kube-jyex6-default-worker-zmb5k-td6wh-lfrjf   <none>           <none>
autoscale-test-6bbd775d5f-wvtx8   1/1     Running   0          72s   10.100.112.132   kube-jyex6-default-worker-zmb5k-td6wh-lfrjf   <none>           <none>

kubectl get nodes

Expected output:

NAME                                          STATUS   ROLES                  AGE   VERSION
kube-jyex6-2dnh2-n4zhx                        Ready    control-plane,master   54m   v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-lfrjf   Ready    worker                 49m   v1.34.3

The cluster is back to its minimum of 1 worker node, respecting the min_node_count=1 label set at creation time.
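
Because scale-down lags by several minutes, it can help to wait for it rather than re-run kubectl get nodes by hand. A minimal sketch, assuming the ROLES column reads exactly worker as in the output above; count_workers and wait_for_workers are illustrative names:

```shell
# Prints the number of nodes whose ROLES column is exactly "worker".
count_workers() {
  kubectl get nodes --no-headers | awk '$3 == "worker" { n++ } END { print n + 0 }'
}

# wait_for_workers TARGET [INTERVAL_SECONDS]
# Polls until the worker count equals TARGET.
wait_for_workers() {
  local target="$1"
  local interval="${2:-60}"
  while [ "$(count_workers)" -ne "$target" ]; do
    sleep "$interval"
  done
  echo "worker count is $target"
}
```

After scaling the deployment to 3 replicas, wait_for_workers 1 returns once the cluster is back at its minimum.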


GPU Cluster Autoscaling

The same autoscaling mechanism works with GPU worker nodes. When a pod requests a GPU and none is free on the existing nodes, the autoscaler provisions a new GPU worker node to satisfy the demand.

Create a GPU Cluster with Autoscaling Enabled

Use an NVIDIA cluster template and a GPU-enabled worker flavor. Set min_node_count and max_node_count as before:

openstack --os-cloud breqwatr coe cluster create cluster-autoscaling \
  --cluster-template <nvidia-cluster-template> \
  --keypair <keypair> \
  --master-count 1 \
  --node-count 1 \
  --master-flavor <master-flavor> \
  --flavor <worker-gpu-flavor> \
  --labels boot_volume_size=50,boot_volume_type=ceph,kube_tag=v1.34.3,auto_scaling_enabled=true,min_node_count=1,max_node_count=4

Install the NVIDIA GPU Operator

Install the GPU operator so that GPU resources are advertised on worker nodes. The driver.enabled=false value is set because the NVIDIA driver is already baked into the node image:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --version v26.3.1 \
  -n nvidia-gpu-operator \
  --create-namespace \
  --set driver.enabled=false

Wait for all GPU operator pods to reach Running state (the nvidia-cuda-validator pod finishes as Completed):

kubectl get pods -n nvidia-gpu-operator

Expected output:

NAME                                                          READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-k2hbk                                   1/1     Running     0          3m59s
gpu-operator-69b958b7fc-fslc8                                 1/1     Running     0          4m24s
gpu-operator-node-feature-discovery-gc-8fb8d5d8d-745hz        1/1     Running     0          4m24s
gpu-operator-node-feature-discovery-master-6758c899fb-z8q9n   1/1     Running     0          4m24s
gpu-operator-node-feature-discovery-worker-6v5cm              1/1     Running     0          4m24s
gpu-operator-node-feature-discovery-worker-cd5v7              1/1     Running     0          4m24s
nvidia-container-toolkit-daemonset-pl8m5                      1/1     Running     0          4m1s
nvidia-cuda-validator-t4jdp                                   0/1     Completed   0          2m36s
nvidia-dcgm-exporter-b989l                                    1/1     Running     0          3m59s
nvidia-device-plugin-daemonset-plqjl                          1/1     Running     0          4m
nvidia-operator-validator-r2wgl                               1/1     Running     0          4m1s

Verify the GPU is visible as a schedulable resource on the worker node:

kubectl get node <worker-node-name> \
  -o jsonpath='{.status.capacity}' | python3 -m json.tool | grep nvidia

Expected output:

    "nvidia.com/gpu": "1",

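To check every node at once rather than one named worker, a custom-columns query can list GPUs per node. This is a sketch under two assumptions: it reads status.allocatable (what the scheduler uses) instead of status.capacity, and the helper names gpus_per_node and total_gpus are illustrative:

```shell
# Prints "NODE GPUS" for every node; nodes without GPUs show <none>.
gpus_per_node() {
  kubectl get nodes --no-headers \
    -o 'custom-columns=NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}

# Sums the allocatable GPUs across the cluster, skipping non-numeric cells.
total_gpus() {
  gpus_per_node | awk '$2 ~ /^[0-9]+$/ { sum += $2 } END { print sum + 0 }'
}
```

Comparing total_gpus with the replica count of a GPU workload gives a quick indication of whether a scale-up is to be expected.
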
Test GPU Scale-Up

Deploy a workload that requests a GPU. The first replica will schedule on the existing worker node:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-autoscale-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-autoscale-test
  template:
    metadata:
      labels:
        app: gpu-autoscale-test
    spec:
      containers:
      - name: cuda-test
        image: nvcr.io/nvidia/cuda:13.0.1-base-ubi9
        command: ["/bin/sh", "-c"]
        args:
          - nvidia-smi && sleep 3600
        resources:
          limits:
            nvidia.com/gpu: 1
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule

Verify the first pod is running:

kubectl get pods

Expected output:

NAME                                  READY   STATUS    RESTARTS   AGE
gpu-autoscale-test-5455857fc7-9qqtk   1/1     Running   0          60s

Scale to 2 replicas. Since each GPU worker node has only 1 GPU, the second pod cannot schedule on the existing node and stays Pending until the autoscaler adds a node:

kubectl scale deploy/gpu-autoscale-test --replicas=2
kubectl get pods

Expected output:

NAME                                  READY   STATUS    RESTARTS   AGE
gpu-autoscale-test-5455857fc7-9qqtk   1/1     Running   0          2m1s
gpu-autoscale-test-5455857fc7-qd6jj   0/1     Pending   0          11s

The autoscaler detects the unschedulable pod and provisions a new GPU worker node. Once the node is ready, the GPU operator deploys its components onto it:

kubectl get nodes

Expected output:

NAME                                          STATUS   ROLES                  AGE     VERSION
kube-h1qtv-default-worker-vfkbs-886wk-9tnxl   Ready    worker                 23m     v1.34.3
kube-h1qtv-default-worker-vfkbs-886wk-m4hmr   Ready    worker                 3m41s   v1.34.3
kube-h1qtv-sxt6v-qjb4x                        Ready    control-plane,master   31m     v1.34.3

kubectl get pods -n nvidia-gpu-operator

Expected output:

NAME                                                          READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-6pdcn                                   1/1     Running     0          3m7s
gpu-feature-discovery-k2hbk                                   1/1     Running     0          17m
gpu-operator-69b958b7fc-fslc8                                 1/1     Running     0          17m
gpu-operator-node-feature-discovery-gc-8fb8d5d8d-745hz        1/1     Running     0          17m
gpu-operator-node-feature-discovery-master-6758c899fb-z8q9n   1/1     Running     0          17m
gpu-operator-node-feature-discovery-worker-6ckvl              1/1     Running     0          3m50s
gpu-operator-node-feature-discovery-worker-6v5cm              1/1     Running     0          17m
gpu-operator-node-feature-discovery-worker-cd5v7              1/1     Running     0          17m
nvidia-container-toolkit-daemonset-nvg9v                      1/1     Running     0          3m7s
nvidia-container-toolkit-daemonset-pl8m5                      1/1     Running     0          17m
nvidia-cuda-validator-dgsrl                                   0/1     Completed   0          59s
nvidia-cuda-validator-t4jdp                                   0/1     Completed   0          16m
nvidia-dcgm-exporter-b989l                                    1/1     Running     0          17m
nvidia-dcgm-exporter-vwdx2                                    0/1     Running     0          3m8s
nvidia-device-plugin-daemonset-cf7rl                          1/1     Running     0          3m7s
nvidia-device-plugin-daemonset-plqjl                          1/1     Running     0          17m
nvidia-operator-validator-r2wgl                               1/1     Running     0          17m
nvidia-operator-validator-v95gh                               1/1     Running     0          3m8s

Once the GPU operator components are running on the new node, the pending pod is scheduled and both replicas are running:

kubectl get pods

Expected output:

NAME                                  READY   STATUS    RESTARTS   AGE
gpu-autoscale-test-5455857fc7-9qqtk   1/1     Running   0          13m
gpu-autoscale-test-5455857fc7-qd6jj   1/1     Running   0          11m

Verify GPU access by checking the logs of a running pod:

kubectl logs pod/gpu-autoscale-test-5455857fc7-9qqtk

Expected output:

Thu May  7 19:17:53 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.71.05              Driver Version: 595.71.05      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:05.0 Off |                    0 |
| N/A   34C    P8             15W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
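
The log above covers only the first pod, which runs on the original node. To confirm that the replica on the newly provisioned node also sees a GPU, a minimal sketch can loop over every pod with the deployment's app=gpu-autoscale-test label and run nvidia-smi -L (list GPUs) in each. The helper name check_gpu_pods is illustrative:

```shell
# Runs nvidia-smi -L in each pod of the test deployment.
# Stops with a non-zero status at the first pod where the command fails.
check_gpu_pods() {
  local pod
  for pod in $(kubectl get pods -l app=gpu-autoscale-test -o name); do
    echo "== $pod"
    kubectl exec "$pod" -- nvidia-smi -L || return 1
  done
}
```

Each pod should report one GPU line under its name if the GPU operator finished setting up the node it runs on.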

Test GPU Scale-Down

Scale the deployment back to 1 replica:

kubectl scale deploy/gpu-autoscale-test --replicas=1

The second GPU worker node becomes idle and is removed approximately 10 minutes after it is no longer needed, returning the cluster to its minimum node count:

kubectl get nodes

Expected output:

NAME                                          STATUS   ROLES                  AGE   VERSION
kube-h1qtv-default-worker-vfkbs-886wk-9tnxl   Ready    worker                 40m   v1.34.3
kube-h1qtv-sxt6v-qjb4x                        Ready    control-plane,master   49m   v1.34.3