DEV Community

AttractivePenguin
Kubernetes 1.36 Scale-to-Zero: Cut Your K8s Bill by 70% With One Config Change


Want to reduce your Kubernetes costs significantly? Here's how to enable Scale-to-Zero in under 5 minutes.

The Problem

By default, Kubernetes keeps your pods running even when there's zero traffic. You're paying for compute you're not using. This hits hardest in:

  • Development environments (running overnight and weekends)
  • Staging namespaces (sitting idle between deployments)
  • Event-driven workloads (spikes followed by long idle periods)

The Solution: HPA Scale-to-Zero in Kubernetes 1.36

Kubernetes 1.36 enables Scale-to-Zero by default for HPA; in earlier releases, minReplicas: 0 was hidden behind the alpha HPAScaleToZero feature gate. Here's how to use it:

Step 1: Create a HorizontalPodAutoscaler with scale-to-zero enabled

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 0  # This enables scale-to-zero
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
One caveat: once a Deployment is at zero replicas, no pods are reporting CPU, so a Resource metric alone can't signal the scale-up from zero. In practice, pair minReplicas: 0 with an Object or External metric (for example, request rate from your ingress); the CPU target above then governs scaling between one and ten replicas.
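Assuming the manifest above is saved as hpa.yaml (the filename is an assumption), applying and watching it looks like:

```shell
kubectl apply -f hpa.yaml
# Watch current/desired replicas; TARGETS shows CPU utilization vs the 50% goal
kubectl get hpa my-service-hpa --watch
```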

Step 2: Add a Readiness Probe (Critical!)

Scale-to-zero means cold starts: when traffic returns, a fresh pod is created, and Kubernetes needs a way to know when that pod can actually handle requests. Without a readiness probe, traffic can be routed to a pod that is still booting:

# In the Deployment's pod template:
spec:
  containers:
  - name: app
    readinessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3

Step 3: The Cooldown Period

To prevent flapping, the HPA's behavior field sets a scale-down stabilization window: the workload must stay idle for the full window before it scales to zero. Add this under the HPA's spec:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # 5 minutes of idle time
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
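To build intuition for why a brief dip in traffic doesn't immediately kill your pods, here is a rough sketch (in Python, not the actual controller code) of how scale-down stabilization behaves: the HPA acts on the highest replica recommendation seen within the window.

```python
def stabilized_replicas(history, now, window=300):
    """Sketch of HPA scale-down stabilization (not the real controller).

    history: list of (timestamp_seconds, recommended_replicas).
    The HPA scales down only to the highest recommendation seen
    inside the stabilization window, so a momentary dip to zero
    is ignored until the window has fully elapsed.
    """
    recent = [r for t, r in history if 0 <= now - t <= window]
    return max(recent, default=0)

# Traffic stopped at t=60; recommendations drop to 0 from then on.
history = [(0, 3), (60, 0), (120, 0), (180, 0), (240, 0), (301, 0)]
print(stabilized_replicas(history, now=240))  # 3: the old recommendation is still inside the 300s window
print(stabilized_replicas(history, now=301))  # 0: a full window has passed since the last nonzero recommendation
```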

Real-World Cost Savings

Environment          Before     After     Savings
Dev (5 namespaces)   $450/mo    $120/mo   73%
Staging              $280/mo    $85/mo    70%
Event API (spiky)    $520/mo    $180/mo   65%
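The savings column is just (before - after) / before. A quick sanity check of the table's arithmetic:

```python
def savings_pct(before, after):
    # Percentage saved when monthly spend drops from `before` to `after`.
    return round((before - after) / before * 100)

print(savings_pct(450, 120))  # 73  (Dev)
print(savings_pct(280, 85))   # 70  (Staging)
print(savings_pct(520, 180))  # 65  (Event API)
```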

Common Pitfalls

  1. Cold starts: Your first request after scale-to-zero will be slower. Mitigate with pre-warming or keep minReplicas: 1 for latency-sensitive services.

  2. No readiness probe: Without it, scale-to-zero won't work reliably.

  3. Scheduled work inside your pods stops at zero: anything driven by an in-process scheduler disappears with the last replica, and nothing will trigger a scale-up for it. Move scheduled tasks into Kubernetes CronJobs, which create their own pods independently of the HPA.

  4. Metrics server: Ensure your metrics-server is installed and running:

   kubectl get pods -n kube-system | grep metrics-server
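For pitfall 1, one pre-warming option is a CronJob that scales the Deployment up shortly before expected traffic and then lets the HPA take over. A sketch only; the schedule, service account, and image tag are assumptions, and the service account needs RBAC permission to scale deployments:

```yaml
# Hypothetical pre-warm job: bring up one replica before business hours
# so the first real request doesn't pay the cold-start penalty.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-service-prewarm
spec:
  schedule: "50 8 * * 1-5"   # 08:50 on weekdays (assumed traffic pattern)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: prewarm   # assumed SA with scale permissions
          restartPolicy: Never
          containers:
          - name: prewarm
            image: bitnami/kubectl:1.36   # image tag is an assumption
            command: ["kubectl", "scale", "deployment/my-service", "--replicas=1"]
```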

Quick Checklist

  • [ ] Running Kubernetes 1.36+
  • [ ] metrics-server installed
  • [ ] minReplicas: 0 in HPA
  • [ ] readinessProbe configured
  • [ ] behavior.scaleDown.stabilizationWindowSeconds set
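Each checklist item can be spot-checked from the command line. A sketch, assuming kubectl access and the names from the steps above (my-service, my-service-hpa, and a metrics-server Deployment in kube-system):

```shell
# Cluster version: this post assumes 1.36+
kubectl version

# Is metrics-server running?
kubectl get deployment metrics-server -n kube-system

# HPA floor and scale-down window
kubectl get hpa my-service-hpa -o jsonpath='{.spec.minReplicas}{"\n"}'
kubectl get hpa my-service-hpa \
  -o jsonpath='{.spec.behavior.scaleDown.stabilizationWindowSeconds}{"\n"}'

# Readiness probe present on the target Deployment?
kubectl get deployment my-service \
  -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}{"\n"}'
```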

Conclusion

This is the feature developers have wanted for years. It's now enabled by default in Kubernetes 1.36. If you're overpaying for idle compute, the solution is a five-minute config change.

Go check your bill. You might be surprised how much you can save.

Top comments (1)

vandana.platform

Great walkthrough. Scale-to-zero is a huge step for making Kubernetes more efficient, especially for dev, staging, and event-driven workloads where clusters often sit idle. It’s interesting how Kubernetes is gradually adopting patterns that serverless platforms have used for years: paying only for active workloads instead of idle capacity. The key trade-off, as you mentioned, is balancing cost savings with cold-start latency, which makes readiness probes and stabilization windows really important in real environments. Definitely a powerful feature for teams looking to optimize their Kubernetes spend.