Cluster Autoscaling
The Kubernetes Cluster Autoscaler adds worker nodes when pods can't be scheduled for lack of resources, and removes nodes once they're no longer needed. Breqwatr Cloud clusters wire it up at create time via Magnum labels on the worker node group; the autoscaler itself runs as a Deployment inside the cluster (installed below).
This is a CLI-only flow today. See "Use the OpenStack CLI" for the clouds.yaml setup that the --os-cloud breqwatr selector reads from.
Create a Cluster with Autoscaling Enabled
Pass the auto_scaling_enabled=true label when creating the cluster. Use min_node_count and max_node_count to set the autoscaling bounds:
openstack --os-cloud breqwatr coe cluster create cluster-autoscaling \
--cluster-template <cluster-template> \
--keypair <keypair> \
--master-count 1 \
--node-count 1 \
--master-flavor <master-flavor> \
--flavor <worker-flavor> \
--labels boot_volume_size=50,boot_volume_type=ceph,kube_tag=v1.34.3,auto_scaling_enabled=true,min_node_count=1,max_node_count=3
Once the cluster reaches CREATE_COMPLETE, verify the autoscaling labels are present on the worker node group:
Expected output:
+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| uuid | ef7916ff-b26a-46dd-9177-8b4af466b121 |
| name | default-worker |
| cluster_id | 2fc40fe9-da4f-4fe9-9446-ebe14a6b09a6 |
| project_id | bcb98105f709408ba9122cf8d8055d6f |
| docker_volume_size | None |
| labels | {'boot_volume_size': '50', 'boot_volume_type': 'ceph', 'kube_tag': 'v1.34.3', 'auto_scaling_enabled': 'true', 'min_node_count': '1', 'max_node_count': '3'} |
| labels_overridden | {} |
| labels_skipped | {} |
| labels_added | {} |
| flavor_id | v1.c4r8 |
| image_id | rockylinux-9-kube-v1.34.3.qcow2 |
| node_addresses | [] |
| node_count | 1 |
| role | worker |
| max_node_count | None |
| min_node_count | 1 |
| is_default | True |
| stack_id | kube-xbytb |
| status | CREATE_COMPLETE |
| status_reason | None |
+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
The labels field confirms that autoscaling is enabled with a minimum of 1 and maximum of 3 worker nodes.
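The check above can be scripted with the Magnum node group command. The following is a minimal sketch, not the only way to do it: show_worker_nodegroup is a hypothetical helper name, and it assumes the worker node group is called default-worker, as in the output above.

```shell
# Hypothetical helper (not part of the OpenStack CLI): show a cluster's
# worker node group, including its labels and min/max node counts.
# $1 = cluster name, $2 = node group name (assumed default: default-worker)
show_worker_nodegroup() {
  openstack --os-cloud breqwatr coe nodegroup show "$1" "${2:-default-worker}"
}
```

For the cluster created above, the call would be `show_worker_nodegroup cluster-autoscaling`.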
Test Scale-Up
Deploy a workload with 5 replicas, each requesting 1 CPU and 2Gi of memory. This exceeds the capacity of a single worker node, triggering the autoscaler to provision additional nodes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: autoscale-test
spec:
  replicas: 5
  selector:
    matchLabels:
      app: autoscale-test
  template:
    metadata:
      labels:
        app: autoscale-test
    spec:
      containers:
        - name: stress
          image: busybox
          command: ["sh", "-c", "sleep 3600"]
          resources:
            requests:
              cpu: "1000m"
              memory: "2Gi"
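To create the workload, the manifest has to be applied to the cluster. A minimal sketch, assuming the manifest above has been saved to a local file (the filename in the usage note is hypothetical, as is the helper name):

```shell
# Hypothetical helper: apply a manifest file, then list the pods that carry
# the app=autoscale-test label defined in the manifest above.
# $1 = path to the manifest file
deploy_autoscale_test() {
  kubectl apply -f "$1" && kubectl get pods -l app=autoscale-test
}
```

For example: `deploy_autoscale_test autoscale-test.yaml`.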
Initially, 2 pods will be in Pending state because the single worker node does not have enough capacity:
Expected output:
NAME READY STATUS RESTARTS AGE
autoscale-test-6bbd775d5f-244c8 0/1 ContainerCreating 0 4s
autoscale-test-6bbd775d5f-bjwds 0/1 ContainerCreating 0 4s
autoscale-test-6bbd775d5f-cbt5v 0/1 Pending 0 4s
autoscale-test-6bbd775d5f-lppbm 0/1 ContainerCreating 0 4s
autoscale-test-6bbd775d5f-rh5gd 0/1 Pending 0 4s
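Rather than reading the listing by eye, the number of Pending pods can be counted directly. This is a sketch under the assumption that the pods carry the label from the manifest; count_pending_pods is a hypothetical helper name.

```shell
# Hypothetical helper: count pods matching a label selector that are still
# in the Pending phase. Prints 0 when no pods match.
# $1 = label selector, e.g. app=autoscale-test
count_pending_pods() {
  kubectl get pods -l "$1" \
    --field-selector=status.phase=Pending \
    --no-headers 2>/dev/null | wc -l | tr -d ' '
}
```

Running `count_pending_pods app=autoscale-test` repeatedly shows the count dropping as new nodes join.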
The autoscaler detects the unschedulable pods and provisions a second worker node. After a few minutes, all pods will be running:
Expected output:
NAME STATUS ROLES AGE VERSION
kube-jyex6-2dnh2-n4zhx Ready control-plane,master 26m v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-frkz4 Ready worker 12m v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-lfrjf Ready worker 20m v1.34.3
Expected output:
NAME READY STATUS RESTARTS AGE
autoscale-test-6bbd775d5f-244c8 1/1 Running 0 15m
autoscale-test-6bbd775d5f-bjwds 1/1 Running 0 15m
autoscale-test-6bbd775d5f-cbt5v 1/1 Running 0 15m
autoscale-test-6bbd775d5f-lppbm 1/1 Running 0 15m
autoscale-test-6bbd775d5f-rh5gd 1/1 Running 0 15m
Scale the deployment to 10 replicas:
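One way to perform this step is with kubectl scale. The sketch below wraps it in a hypothetical scale_deployment helper so the later scaling steps on this page can reuse the same call:

```shell
# Hypothetical helper: set the replica count of a Deployment.
# $1 = deployment name, $2 = desired replica count
scale_deployment() {
  kubectl scale deployment "$1" --replicas="$2"
}
```

For this step the call would be `scale_deployment autoscale-test 10`.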
Again, some pods will be in Pending state while the autoscaler provisions two more nodes:
Expected output:
NAME READY STATUS RESTARTS AGE
autoscale-test-6bbd775d5f-244c8 1/1 Running 0 21m
autoscale-test-6bbd775d5f-7qxlb 0/1 Pending 0 3m28s
autoscale-test-6bbd775d5f-bjwds 1/1 Running 0 21m
autoscale-test-6bbd775d5f-cbt5v 1/1 Running 0 21m
autoscale-test-6bbd775d5f-hpgkk 0/1 Pending 0 3m28s
autoscale-test-6bbd775d5f-lppbm 1/1 Running 0 21m
autoscale-test-6bbd775d5f-ltjmj 1/1 Running 0 3m28s
autoscale-test-6bbd775d5f-mjwkl 0/1 Pending 0 3m28s
autoscale-test-6bbd775d5f-rh5gd 1/1 Running 0 21m
autoscale-test-6bbd775d5f-zlqdj 0/1 Pending 0 3m28s
After a few minutes, two more worker nodes are added and all pods reach Running state:
Expected output:
NAME STATUS ROLES AGE VERSION
kube-jyex6-2dnh2-n4zhx Ready control-plane,master 35m v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-5smw7 Ready worker 3m45s v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-8n76c Ready worker 3m36s v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-frkz4 Ready worker 21m v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-lfrjf Ready worker 29m v1.34.3
Expected output:
NAME READY STATUS RESTARTS AGE
autoscale-test-6bbd775d5f-244c8 1/1 Running 0 24m
autoscale-test-6bbd775d5f-7qxlb 1/1 Running 0 7m6s
autoscale-test-6bbd775d5f-bjwds 1/1 Running 0 24m
autoscale-test-6bbd775d5f-cbt5v 1/1 Running 0 24m
autoscale-test-6bbd775d5f-hpgkk 1/1 Running 0 7m6s
autoscale-test-6bbd775d5f-lppbm 1/1 Running 0 24m
autoscale-test-6bbd775d5f-ltjmj 1/1 Running 0 7m6s
autoscale-test-6bbd775d5f-mjwkl 1/1 Running 0 7m6s
autoscale-test-6bbd775d5f-rh5gd 1/1 Running 0 24m
autoscale-test-6bbd775d5f-zlqdj 1/1 Running 0 7m6s
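The node counts seen so far follow from simple arithmetic. In the first listing, three of the five pods were scheduled on the single worker before the rest went Pending, which suggests that about three of these 1 CPU / 2Gi pods fit on one worker once system reservations are taken into account. Treating that figure as an assumption inferred from the output rather than a guarantee, the expected worker count is a ceiling division (nodes_needed is a hypothetical helper name):

```shell
# Hypothetical helper: estimate how many worker nodes a replica count needs.
# $1 = replica count, $2 = pods that fit on one worker (assumed value)
nodes_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))   # integer ceiling of $1 / $2
}

nodes_needed 5 3    # prints 2
nodes_needed 10 3   # prints 4
```

Both results match the two and four worker nodes shown in the listings above.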
Hitting the Maximum Node Limit
Scaling beyond what max_node_count can accommodate leaves the excess pods in Pending state indefinitely, because the autoscaler cannot add more nodes than max_node_count allows:
Expected output:
autoscale-test-6bbd775d5f-h2484 0/1 Pending 0 59s
autoscale-test-6bbd775d5f-h5rnw 0/1 Pending 0 59s
autoscale-test-6bbd775d5f-x7csc 0/1 Pending 0 59s
The cluster has reached its max_node_count limit and cannot scale further. The 3 remaining pods stay Pending until either the replica count is reduced or max_node_count is increased.
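To see why a pod is stuck, the scheduler's events are a useful starting point. A minimal sketch, assuming the pods live in the current namespace; pending_pod_events is a hypothetical helper name, and FailedScheduling is the event reason the Kubernetes scheduler records for pods it cannot place:

```shell
# Hypothetical helper: list events with a given reason, oldest first.
# $1 = event reason (defaults to FailedScheduling)
pending_pod_events() {
  kubectl get events \
    --field-selector "reason=${1:-FailedScheduling}" \
    --sort-by=.lastTimestamp
}
```

The event messages typically state which resource was insufficient on the existing nodes.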
Test Scale-Down
Scale the deployment down to 3 replicas:
All 3 pods consolidate onto a single worker node. The autoscaler detects that the other nodes are no longer needed and removes them approximately 10 minutes after they become idle:
Expected output:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
autoscale-test-6bbd775d5f-244c8 1/1 Running 0 44m 10.100.112.129 kube-jyex6-default-worker-zmb5k-td6wh-lfrjf <none> <none>
autoscale-test-6bbd775d5f-bjwds 1/1 Running 0 44m 10.100.112.131 kube-jyex6-default-worker-zmb5k-td6wh-lfrjf <none> <none>
autoscale-test-6bbd775d5f-wvtx8 1/1 Running 0 72s 10.100.112.132 kube-jyex6-default-worker-zmb5k-td6wh-lfrjf <none> <none>
Expected output:
NAME STATUS ROLES AGE VERSION
kube-jyex6-2dnh2-n4zhx Ready control-plane,master 54m v1.34.3
kube-jyex6-default-worker-zmb5k-td6wh-lfrjf Ready worker 49m v1.34.3
The cluster is back to its minimum of 1 worker node, respecting the min_node_count=1 label set at creation time.
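Because scale-down takes several minutes, it can be convenient to poll until the node count settles. The following is a sketch only: wait_for_node_count is a hypothetical helper, its default timeout and interval are arbitrary choices, and the count includes the control-plane node (so the target after this test is 2).

```shell
# Hypothetical helper: poll until the cluster has exactly the target number
# of nodes, or give up after a timeout.
# $1 = target node count, $2 = timeout in seconds, $3 = poll interval in seconds
wait_for_node_count() {
  local target="$1" timeout="${2:-1200}" interval="${3:-30}"
  local elapsed=0 count=0
  while [ "$elapsed" -le "$timeout" ]; do
    count=$(kubectl get nodes --no-headers 2>/dev/null | wc -l | tr -d ' ')
    if [ "$count" -eq "$target" ]; then
      echo "node count is $count"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out waiting for $target nodes (currently $count)" >&2
  return 1
}
```

For example, `wait_for_node_count 2` waits up to 20 minutes for the cluster to return to one control-plane node and one worker.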
GPU Cluster Autoscaling
The same autoscaling mechanism works with GPU worker nodes. When a pod requests a GPU and none is free on the existing nodes, the autoscaler provisions a new GPU worker node to satisfy the demand.
Create a GPU Cluster with Autoscaling Enabled
Use an NVIDIA cluster template and a GPU-enabled worker flavor. Set min_node_count and max_node_count as before:
openstack --os-cloud breqwatr coe cluster create cluster-autoscaling \
--cluster-template <nvidia-cluster-template> \
--keypair <keypair> \
--master-count 1 \
--node-count 1 \
--master-flavor <master-flavor> \
--flavor <worker-gpu-flavor> \
--labels boot_volume_size=50,boot_volume_type=ceph,kube_tag=v1.34.3,auto_scaling_enabled=true,min_node_count=1,max_node_count=4
Install the NVIDIA GPU Operator
Install the GPU operator so that GPU resources are advertised on worker nodes. The driver.enabled=false flag is set because the NVIDIA driver is already baked into the node image:
helm install gpu-operator nvidia/gpu-operator \
--version v26.3.1 \
-n nvidia-gpu-operator \
--create-namespace \
--set driver.enabled=false
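The chart reference nvidia/gpu-operator in the command above presumes that NVIDIA's Helm repository is already registered locally under the name nvidia. If it is not, a sketch of the setup step looks like this (add_nvidia_helm_repo is a hypothetical helper name):

```shell
# Hypothetical helper: register NVIDIA's Helm repository as "nvidia" and
# refresh the local chart index. Run this before "helm install".
add_nvidia_helm_repo() {
  helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
}
```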
Wait for all GPU operator pods in the nvidia-gpu-operator namespace to reach Running; the nvidia-cuda-validator pod runs once and ends in Completed:
Expected output:
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-k2hbk 1/1 Running 0 3m59s
gpu-operator-69b958b7fc-fslc8 1/1 Running 0 4m24s
gpu-operator-node-feature-discovery-gc-8fb8d5d8d-745hz 1/1 Running 0 4m24s
gpu-operator-node-feature-discovery-master-6758c899fb-z8q9n 1/1 Running 0 4m24s
gpu-operator-node-feature-discovery-worker-6v5cm 1/1 Running 0 4m24s
gpu-operator-node-feature-discovery-worker-cd5v7 1/1 Running 0 4m24s
nvidia-container-toolkit-daemonset-pl8m5 1/1 Running 0 4m1s
nvidia-cuda-validator-t4jdp 0/1 Completed 0 2m36s
nvidia-dcgm-exporter-b989l 1/1 Running 0 3m59s
nvidia-device-plugin-daemonset-plqjl 1/1 Running 0 4m
nvidia-operator-validator-r2wgl 1/1 Running 0 4m1s
Verify the GPU is visible as a schedulable resource on the worker node:
kubectl get node <worker-node-name> \
-o jsonpath='{.status.capacity}' | python3 -m json.tool | grep nvidia
Expected output:
Test GPU Scale-Up
Deploy a workload that requests a GPU. The first replica will schedule on the existing worker node:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-autoscale-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-autoscale-test
  template:
    metadata:
      labels:
        app: gpu-autoscale-test
    spec:
      containers:
        - name: cuda-test
          image: nvcr.io/nvidia/cuda:13.0.1-base-ubi9
          command: ["/bin/sh", "-c"]
          args:
            - nvidia-smi && sleep 3600
          resources:
            limits:
              nvidia.com/gpu: 1
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
Verify the first pod is running:
Expected output:
Scale to 2 replicas. Since each GPU worker node has only 1 GPU, the second pod cannot schedule on the existing node and will remain Pending:
Expected output:
NAME READY STATUS RESTARTS AGE
gpu-autoscale-test-5455857fc7-9qqtk 1/1 Running 0 2m1s
gpu-autoscale-test-5455857fc7-qd6jj 0/1 Pending 0 11s
The autoscaler detects the unschedulable pod and provisions a new GPU worker node. Once the node is ready, the GPU operator deploys its components onto it:
Expected output:
NAME STATUS ROLES AGE VERSION
kube-h1qtv-default-worker-vfkbs-886wk-9tnxl Ready worker 23m v1.34.3
kube-h1qtv-default-worker-vfkbs-886wk-m4hmr Ready worker 3m41s v1.34.3
kube-h1qtv-sxt6v-qjb4x Ready control-plane,master 31m v1.34.3
Expected output:
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-6pdcn 1/1 Running 0 3m7s
gpu-feature-discovery-k2hbk 1/1 Running 0 17m
gpu-operator-69b958b7fc-fslc8 1/1 Running 0 17m
gpu-operator-node-feature-discovery-gc-8fb8d5d8d-745hz 1/1 Running 0 17m
gpu-operator-node-feature-discovery-master-6758c899fb-z8q9n 1/1 Running 0 17m
gpu-operator-node-feature-discovery-worker-6ckvl 1/1 Running 0 3m50s
gpu-operator-node-feature-discovery-worker-6v5cm 1/1 Running 0 17m
gpu-operator-node-feature-discovery-worker-cd5v7 1/1 Running 0 17m
nvidia-container-toolkit-daemonset-nvg9v 1/1 Running 0 3m7s
nvidia-container-toolkit-daemonset-pl8m5 1/1 Running 0 17m
nvidia-cuda-validator-dgsrl 0/1 Completed 0 59s
nvidia-cuda-validator-t4jdp 0/1 Completed 0 16m
nvidia-dcgm-exporter-b989l 1/1 Running 0 17m
nvidia-dcgm-exporter-vwdx2 0/1 Running 0 3m8s
nvidia-device-plugin-daemonset-cf7rl 1/1 Running 0 3m7s
nvidia-device-plugin-daemonset-plqjl 1/1 Running 0 17m
nvidia-operator-validator-r2wgl 1/1 Running 0 17m
nvidia-operator-validator-v95gh 1/1 Running 0 3m8s
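Instead of re-listing the pods, the rollout of the device plugin onto the new node can be awaited directly. A minimal sketch, assuming the DaemonSet is named nvidia-device-plugin-daemonset (inferred from the pod names above) and lives in the nvidia-gpu-operator namespace used at install time; wait_for_device_plugin is a hypothetical helper name and the default timeout is an arbitrary choice.

```shell
# Hypothetical helper: block until the device plugin DaemonSet has a ready
# pod on every eligible node, or until the timeout expires.
# $1 = timeout (defaults to 600s)
wait_for_device_plugin() {
  kubectl -n nvidia-gpu-operator rollout status \
    daemonset/nvidia-device-plugin-daemonset --timeout="${1:-600s}"
}
```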
Once the GPU operator components are running on the new node, the pending pod is scheduled and both replicas are running:
Expected output:
NAME READY STATUS RESTARTS AGE
gpu-autoscale-test-5455857fc7-9qqtk 1/1 Running 0 13m
gpu-autoscale-test-5455857fc7-qd6jj 1/1 Running 0 11m
Verify GPU access by checking the logs of a running pod:
Expected output:
Thu May 7 19:17:53 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.71.05 Driver Version: 595.71.05 CUDA Version: 13.2 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:00:05.0 Off | 0 |
| N/A 34C P8 15W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
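Output like the above can be retrieved with kubectl logs. The sketch below addresses the Deployment rather than a specific pod, in which case kubectl picks one of its pods; gpu_test_logs is a hypothetical helper name.

```shell
# Hypothetical helper: print the logs of one pod belonging to a Deployment.
# $1 = deployment name (defaults to gpu-autoscale-test from the manifest above)
gpu_test_logs() {
  kubectl logs "deployment/${1:-gpu-autoscale-test}"
}
```

To inspect a particular replica instead, pass its pod name to `kubectl logs` directly.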
Test GPU Scale-Down
Scale the deployment back to 1 replica:
The second GPU worker node becomes idle and is removed approximately 10 minutes after it is no longer needed, returning the cluster to its minimum node count:
Expected output: