What We'll Build
By the end of this workshop, you'll have a concrete cost audit of your Kubernetes cluster — actual utilization numbers, a cost-per-value ratio, and a clear action plan. You'll either right-size what you have or know it's time to migrate to something simpler.
Let me show you a pattern I use in every infrastructure engagement: the three-signal framework that tells you whether your cluster is earning its keep.
Prerequisites
- A running Kubernetes cluster (EKS, GKE, or AKS)
- kubectl configured and pointed at your cluster
- Access to your cloud billing dashboard
- ~30 minutes of uninterrupted time
Step 1: Measure Actual Resource Utilization
Run this right now:
```shell
kubectl top nodes
```
Write down the CPU and memory percentages for each node. If your average CPU utilization is below 30%, you're over-provisioned. Most startup clusters I audit sit at 12–18%. That means you're paying for five nodes and using one.
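Averaging by hand gets tedious once you have more than a few nodes. Below is a minimal sketch of a helper, avg_node_util (a hypothetical name, not part of kubectl), that averages the CPU% and memory% columns of kubectl top nodes --no-headers and applies the 30% threshold. It assumes the default column order (name, CPU cores, CPU%, memory bytes, memory%). It also skips rows whose CPU% column is not a percentage, such as nodes reporting <unknown>.

```shell
# Averages CPU% and memory% across nodes (hypothetical helper).
# Input: output of `kubectl top nodes --no-headers` on stdin.
avg_node_util() {
  awk '
    $3 !~ /%$/ { next }  # skip rows without a CPU% value, e.g. <unknown>
    {
      gsub(/%/, "", $3); gsub(/%/, "", $5)
      cpu += $3; mem += $5; n++
    }
    END {
      if (n == 0) { print "no node metrics found"; exit 1 }
      printf "nodes=%d avg_cpu=%.1f%% avg_mem=%.1f%%\n", n, cpu / n, mem / n
      if (cpu / n < 30) print "verdict: over-provisioned (avg CPU below 30%)"
      else print "verdict: avg CPU at or above 30%"
    }'
}

# Usage:
#   kubectl top nodes --no-headers | avg_node_util
```

kubectl top reports current usage, so a single reading is only a snapshot. Running the helper at a few different times of day gives a fairer average.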
Step 2: Calculate Your Cost-Per-Value Ratio
Pull your monthly infrastructure bill and compare it against revenue. Here is the minimal framework:
| Signal | Threshold | You've Crossed the Cliff If... |
|---|---|---|
| Infra cost vs. revenue | >15% | Your $4K cluster eats into $8K MRR |
| Ops hours vs. feature hours | >1:1 | More Helm charts than product code |
| Node CPU utilization | <30% | Paying for capacity you don't use |
If two of three signals are red, keep reading — the next steps will save you real money.
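To make the table mechanical, here is a minimal sketch that scores the three signals from numbers you supply: monthly infra cost, MRR, ops hours, feature hours, and average node CPU%. The helper name cliff_signals is hypothetical. The thresholds are the ones in the table above. Zero feature hours is treated as a red signal, which is an assumption.

```shell
# Scores the three cliff signals (hypothetical helper).
# Usage: cliff_signals <monthly_infra_usd> <mrr_usd> <ops_hours> <feature_hours> <avg_cpu_pct>
cliff_signals() {
  awk -v infra="$1" -v mrr="$2" -v ops="$3" -v feat="$4" -v cpu="$5" 'BEGIN {
    if (mrr <= 0) { print "MRR must be greater than 0"; exit 1 }

    # Signal 1: infra cost as a share of revenue, red above 15%
    share = infra / mrr * 100
    s1 = (share > 15) ? "RED" : "ok"

    # Signal 2: ops hours vs feature hours, red above 1:1 (zero feature hours counts as red)
    s2 = (feat == 0 || ops / feat > 1) ? "RED" : "ok"

    # Signal 3: average node CPU utilization, red below 30%
    s3 = (cpu < 30) ? "RED" : "ok"

    red = (s1 == "RED") + (s2 == "RED") + (s3 == "RED")
    printf "infra/revenue: %.1f%% [%s]\n", share, s1
    printf "ops:feature: %s:%s h [%s]\n", ops, feat, s2
    printf "avg node CPU: %s%% [%s]\n", cpu, s3
    printf "red signals: %d of 3\n", red
  }'
}
```

For example, a $4K cluster against $8K MRR, 30 ops hours against 20 feature hours, and 15% average CPU gives cliff_signals 4000 8000 30 20 15. That reports a 50.0% infra share and three red signals out of three.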
Step 3: Install Namespace-Level Cost Attribution
You can't cut what you can't measure. Install OpenCost to attribute cost to every workload by namespace, deployment, and label:
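```shell
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace
```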
In one engagement, this step alone revealed a forgotten staging deployment consuming 35% of cluster resources. Check for zombie workloads before optimizing anything else.
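While OpenCost collects data, you can get a rough zombie check from kubectl alone. Below is a minimal sketch, using a hypothetical helper named cpu_share_by_namespace, that sums current pod CPU usage per namespace from kubectl top pods -A --no-headers and prints each namespace's share, largest first. It assumes the column order namespace, pod, CPU, memory. It measures live usage rather than requests or cost, so treat it as a pointer, not a bill.

```shell
# Prints each namespace's share of current pod CPU usage (hypothetical helper).
# Input: output of `kubectl top pods -A --no-headers` on stdin.
cpu_share_by_namespace() {
  awk '
    {
      cpu = $3
      # Values are usually millicores ("250m"); treat bare numbers as whole cores
      if (cpu ~ /m$/) { sub(/m$/, "", cpu); cpu += 0 } else { cpu = cpu * 1000 }
      ns[$1] += cpu
      total += cpu
    }
    END {
      if (total == 0) exit 1
      for (n in ns) printf "%s %.1f\n", n, ns[n] / total * 100
    }' | sort -k2,2 -rn
}

# Usage:
#   kubectl top pods -A --no-headers | cpu_share_by_namespace
```

A staging or abandoned namespace near the top of this list is the first place to look.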
Step 4: Add Spot Instance Node Pools
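Spot instances are priced 60–90% below on-demand. Here is a minimal eksctl config to get this working on EKS:
```yaml
# Minimal eksctl ClusterConfig; cluster name and region are placeholders.
# Apply to an existing cluster with: eksctl create nodegroup --config-file=cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
managedNodeGroups:
  - name: spot-pool
    # Several similar instance types give spot more capacity pools to draw from
    instanceTypes: ["m5.large", "m5a.large", "m5d.large"]
    spot: true
    minSize: 1
    maxSize: 5
    desiredCapacity: 2
  - name: on-demand-baseline
    instanceType: m5.large
    minSize: 1
    maxSize: 2
```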
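Run stateless workloads on spot and keep databases and stateful services on on-demand nodes. Because only part of the fleet runs on spot, the blended saving is lower than the per-instance discount. This typically reduces total compute spend by 40–55%.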
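Two small helpers make this step concrete. Both are a sketch, and the function names are hypothetical. spot_pin_patch prints a strategic-merge patch that adds a nodeSelector for spot nodes. This assumes EKS managed node groups, which label nodes with eks.amazonaws.com/capacityType set to SPOT or ON_DEMAND. blended_savings_pct shows why a 60–90% spot discount turns into a smaller overall saving: the saving is the fraction of compute hours on spot multiplied by the spot discount.

```shell
# Prints a strategic-merge patch pinning a Deployment's pods to spot nodes.
# Assumes EKS managed node groups and their eks.amazonaws.com/capacityType label.
spot_pin_patch() {
  printf '%s\n' '{"spec":{"template":{"spec":{"nodeSelector":{"eks.amazonaws.com/capacityType":"SPOT"}}}}}'
}

# Blended compute saving (%) = fraction of hours on spot x spot discount.
# $1 = fraction of compute hours on spot (0-1)
# $2 = spot discount vs on-demand (0-1)
blended_savings_pct() {
  awk -v f="$1" -v d="$2" 'BEGIN { printf "%.1f\n", f * d * 100 }'
}

# Usage (deployment name is a placeholder):
#   kubectl get nodes -L eks.amazonaws.com/capacityType
#   kubectl patch deployment my-app --patch "$(spot_pin_patch)"
#   blended_savings_pct 0.7 0.7   # 70% of hours on spot at a 70% discount
```

With 70% of hours on spot at a 70% discount, the blended saving is 49%, inside the 40–55% range above. A hard nodeSelector leaves pods Pending when no spot capacity is available. Preferred node affinity is the softer alternative if that risk matters.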
Step 5: Deploy the Vertical Pod Autoscaler
Most teams set resource requests based on guesswork. VPA watches actual usage and recommends right-sized values:
```shell
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh  # installs the VPA CRDs, RBAC, recommender, updater, and admission controller
```
Then create a VPA resource for each deployment:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Start with recommendations only
```
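Writing that manifest by hand for every deployment gets repetitive. Below is a minimal sketch with hypothetical helpers vpa_manifest and vpa_for_all. It generates one recommendation-only VPA per Deployment name piped in from kubectl get deployments -o name. It assumes each VPA should be named after its Deployment with a -vpa suffix, as in the example above.

```shell
# Prints a recommendation-only VPA for one Deployment (hypothetical helper).
vpa_manifest() {
  local deploy="$1"
  cat <<EOF
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ${deploy}-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${deploy}
  updatePolicy:
    updateMode: "Off"
EOF
}

# Reads names such as deployment.apps/my-app from stdin, one per line,
# and emits a VPA manifest for each.
vpa_for_all() {
  local deploy
  while IFS= read -r deploy || [ -n "$deploy" ]; do
    deploy="${deploy#deployment.apps/}"
    if [ -n "$deploy" ]; then
      vpa_manifest "$deploy"
    fi
  done
}

# Usage (namespace is a placeholder):
#   kubectl get deployments -n my-namespace -o name | vpa_for_all | kubectl apply -n my-namespace -f -
```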
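Set updateMode: "Off" first. Review the recommendations for a week before enabling auto-updates; kubectl describe vpa my-app-vpa shows the target requests for each container. I've seen memory requests drop from 512Mi to 90Mi. Across 20 pods, that frees about 8 GiB of requested memory, an entire m5.large.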
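The arithmetic behind that claim is worth running against your own numbers. Here is a minimal sketch with a hypothetical memory_freed helper. It multiplies the per-pod drop in memory requests by the pod count and compares the result with one node's memory. The default of 8192 MiB is an m5.large's 8 GiB. Allocatable memory is somewhat lower after system reservations, so the real node-equivalent is slightly higher.

```shell
# Estimates memory requests freed by right-sizing (hypothetical helper).
# $1 = old request (MiB), $2 = new request (MiB), $3 = pod count,
# $4 = node memory in MiB (defaults to 8192, an m5.large)
memory_freed() {
  awk -v old="$1" -v new="$2" -v pods="$3" -v node="${4:-8192}" 'BEGIN {
    freed = (old - new) * pods
    printf "freed %d MiB (%.1f GiB), about %.2f nodes\n", freed, freed / 1024, freed / node
  }'
}
```

For the example above, memory_freed 512 90 20 prints "freed 8440 MiB (8.2 GiB), about 1.03 nodes".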
Step 6: Decide — Right-Size or Retreat
If your team has fewer than 10 engineers, your traffic is under 1,000 RPS, and you have no dedicated platform engineers, consider leaving K8s entirely. Here is the decision matrix:
| | Kubernetes | Cloud Run | Fly.io |
|---|---|---|---|
| Monthly cost (3 services) | $2,600–$4,000 | $50–$300 | $30–$150 |
| Ops overhead | High | Near-zero | Low |
| Deployment | Helm/ArgoCD | gcloud run deploy | fly deploy |
If you do migrate, go service by service: extract databases to managed services first, move your lowest-traffic service as a proof of concept, then cut the cluster only after production validation.
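The three retreat conditions at the top of this step can be encoded directly. Here is a minimal sketch with a hypothetical retreat_check helper that takes whole-number inputs. The thresholds are the ones stated in this step, not universal cutoffs.

```shell
# Applies the Step 6 retreat criteria (hypothetical helper).
# $1 = engineers on the team, $2 = requests per second,
# $3 = dedicated platform engineers (0 if none)
retreat_check() {
  local engineers="$1" rps="$2" platform="$3"
  if [ "$engineers" -lt 10 ] && [ "$rps" -lt 1000 ] && [ "$platform" -eq 0 ]; then
    echo "consider migrating off Kubernetes"
  else
    echo "right-size the cluster"
  fi
}
```

If any one condition fails, the sketch recommends right-sizing, matching the "all three" wording above.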
Gotchas
- kubectl top requires metrics-server. If you get errors, install it first: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
- VPA and HPA conflict on the same metric. Don't set VPA to auto-update CPU if HPA is also scaling on CPU utilization.
- Spot interruptions are real. Always run at least one on-demand node as a baseline, and set Pod Disruption Budgets for critical services.
- OpenCost needs Prometheus. If you don't already run Prometheus, install it before OpenCost (the OpenCost install docs use the prometheus-community Helm chart), and watch the resource footprint of the monitoring stack itself.
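For the spot-interruption gotcha, here is a minimal sketch of a Pod Disruption Budget generator. The pdb_manifest helper is hypothetical, and it assumes your pods carry an app: <name> label. The manifest uses the policy/v1 PodDisruptionBudget API with minAvailable.

```shell
# Prints a PodDisruptionBudget for pods labeled app=<name> (hypothetical helper).
# $1 = app label value, $2 = minAvailable (defaults to 1)
pdb_manifest() {
  local app="$1" min="${2:-1}"
  cat <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ${app}-pdb
spec:
  minAvailable: ${min}
  selector:
    matchLabels:
      app: ${app}
EOF
}

# Usage (app name is a placeholder):
#   pdb_manifest my-app 1 | kubectl apply -f -
```

Pair this with at least two replicas. A PDB whose minAvailable equals the replica count blocks voluntary evictions, so node drains stall.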
Conclusion
The lesson that will save you hours: most startups don't have a scaling problem; they have a spending problem. Run through these six steps today. You'll either cut your K8s bill by 40–60% with spot instances and VPA, or you'll have the data to confidently migrate to Cloud Run or Fly.io and redirect that budget into building product.