Kubernetes 1.36 Scale-to-Zero: Cut Your K8s Bill by 70% With One Config Change
Want to reduce your Kubernetes costs significantly? Here's how to enable Scale-to-Zero in under 5 minutes.
The Problem
By default, Kubernetes keeps your pods running even when there's zero traffic. You're paying for compute you're not using. This hits hardest in:
- Development environments (running overnight and weekends)
- Staging namespaces (sitting idle between deployments)
- Event-driven workloads (spikes followed by long idle periods)
The Solution: HPA Scale-to-Zero in Kubernetes 1.36
Kubernetes 1.36 enables Scale-to-Zero by default for HPA. Here's how to use it:
Step 1: Create a HorizontalPodAutoscaler with scale-to-zero enabled
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: my-service-hpa
namespace: default
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-service
minReplicas: 0 # This enables scale-to-zero
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50
Step 2: Add a Readiness Probe (Critical!)
Scale-to-zero only works when pods are considered "ready." Without a readiness probe, Kubernetes can't determine if your pod can handle traffic:
spec:
containers:
- name: app
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
failureThreshold: 3
Step 3: The Cooldown Period
There's a stabilization window to prevent flapping. Your pod needs to be idle for a period before scaling to zero:
behavior:
scaleDown:
stabilizationWindowSeconds: 300 # 5 minutes of idle time
policies:
- type: Percent
value: 100
periodSeconds: 15
Real-World Cost Savings
| Environment | Before | After | Savings |
|---|---|---|---|
| Dev (5 namespaces) | $450/mo | $120/mo | 73% |
| Staging | $280/mo | $85/mo | 70% |
| Event API (spiky) | $520/mo | $180/mo | 65% |
Common Pitfalls
Cold starts: Your first request after scale-to-zero will be slower. Mitigate with pre-warming or keep minReplicas: 1 for latency-sensitive services.
No readiness probe: Without it, scale-to-zero won't work reliably.
Cron jobs don't scale: If you have scheduled jobs, they won't trigger scale-up. Use CronJobs with appropriate
successfulJobsHistoryLimit.Metrics server: Ensure your metrics-server is installed and running:
kubectl get pods -n kube-system | grep metrics-server
Quick Checklist
- [ ] Running Kubernetes 1.36+
- [ ] metrics-server installed
- [ ] minReplicas: 0 in HPA
- [ ] readinessProbe configured
- [ ] behavior.scaleDown.stabilizationWindowSeconds set
Conclusion
This is the feature developers have wanted for years. It's now enabled by default in Kubernetes 1.36. If you're overpaying for idle compute, the solution is a five-minute config change.
Go check your bill. You might be surprised how much you can save.
Top comments (1)
Great walkthrough. Scale-to-zero is a huge step for making Kubernetes more efficient, especially for dev, staging, and event-driven workloads where clusters often sit idle. Itβs interesting how Kubernetes is gradually adopting patterns that serverless platforms have used for years paying only for active workloads instead of idle capacity. The key trade-off, as you mentioned, is balancing cost savings with cold-start latency, which makes readiness probes and stabilization windows really important in real environments. Definitely a powerful feature for teams looking to optimize their Kubernetes spend.