SoftwareDevs mvpfactory.io

Posted on Mar 9 • Originally published at mvpfactory.io

Workshop: Auditing Your Kubernetes Costs in 30 Minutes

What We'll Build

By the end of this workshop, you'll have a concrete cost audit of your Kubernetes cluster — actual utilization numbers, a cost-per-value ratio, and a clear action plan. You'll either right-size what you have or know it's time to migrate to something simpler.

Let me show you a pattern I use in every infrastructure engagement: the three-signal framework that tells you whether your cluster is earning its keep.

Prerequisites

A running Kubernetes cluster (EKS, GKE, or AKS)
kubectl configured and pointed at your cluster
Access to your cloud billing dashboard
~30 minutes of uninterrupted time

Step 1: Measure Actual Resource Utilization

Run this right now:

kubectl top nodes

Write down the CPU% and MEMORY% values for each node. This is a point-in-time reading, so take it a few times under normal load. If your average CPU utilization is below 30%, you're over-provisioned. Most startup clusters I audit sit at 12–18%, which is roughly paying for five to eight nodes' worth of capacity and using one.
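If you'd rather not average the numbers by hand, here is a minimal sketch that parses the text output of kubectl top nodes. The node names and values in SAMPLE are made up for illustration, and the function name is hypothetical; it only assumes the usual column layout where CPU% is the third column.

```python
# Hypothetical sample of `kubectl top nodes` output; your values will differ.
SAMPLE = """\
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-a   240m         12%    2100Mi          28%
node-b   300m         15%    2500Mi          33%
node-c   360m         18%    1900Mi          25%
"""


def average_cpu_percent(top_output: str) -> float:
    """Average the CPU% column (third field) across all node rows."""
    percents = []
    for line in top_output.splitlines():
        fields = line.split()
        # The header row ("CPU%") and any non-numeric values are skipped.
        if len(fields) >= 3 and fields[2].endswith("%") and fields[2][:-1].isdigit():
            percents.append(int(fields[2][:-1]))
    if not percents:
        raise ValueError("no node rows with a numeric CPU% found")
    return sum(percents) / len(percents)


print(average_cpu_percent(SAMPLE))  # 15.0
```

To run it against a live cluster, pass in the captured output of kubectl top nodes instead of SAMPLE. An average of 15.0 falls well under the 30% threshold.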

Step 2: Calculate Your Cost-Per-Value Ratio

Pull your monthly infrastructure bill and compare it against revenue. Here is the minimal framework:

Signal	Threshold	You've Crossed the Cliff If...
Infra cost vs. revenue	>15%	Your $4K cluster eats into $8K MRR
Ops hours vs. feature hours	>1:1	More Helm charts than product code
Node CPU utilization	<30%	Paying for capacity you don't use

If two of three signals are red, keep reading — the next steps will save you real money.
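The three checks in the table can be sketched as a small function. The class and field names are hypothetical, and the example numbers reuse the $4K cluster and $8K MRR from the table; the ops and feature hours are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class AuditInputs:
    monthly_infra_cost: float  # from your cloud bill
    monthly_revenue: float     # e.g. MRR
    ops_hours: float           # hours spent on infrastructure per month
    feature_hours: float       # hours spent on product work per month
    avg_cpu_percent: float     # from `kubectl top nodes`


def red_signals(a: AuditInputs) -> list:
    """Return the names of the signals that cross the thresholds in the table."""
    red = []
    if a.monthly_infra_cost > 0.15 * a.monthly_revenue:  # infra cost > 15% of revenue
        red.append("infra cost vs. revenue")
    if a.ops_hours > a.feature_hours:                    # ops : feature ratio above 1:1
        red.append("ops hours vs. feature hours")
    if a.avg_cpu_percent < 30:                           # node CPU utilization below 30%
        red.append("node CPU utilization")
    return red


example = AuditInputs(4000, 8000, ops_hours=60, feature_hours=100, avg_cpu_percent=15)
signals = red_signals(example)
print(signals, "-> keep reading" if len(signals) >= 2 else "-> you're fine")
```

In this example two signals are red (cost and utilization), so the remaining steps apply.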

Step 3: Install Namespace-Level Cost Attribution

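You can't cut what you can't measure. Install OpenCost to attribute spend to each namespace and workload. Add the chart repository first, otherwise Helm cannot resolve opencost/opencost:

helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace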

In one engagement, this step alone revealed a forgotten staging deployment consuming 35% of cluster resources. Check for zombie workloads before optimizing anything else.

Step 4: Add Spot Instance Node Pools

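Spot instances are typically discounted 60–90% against on-demand pricing. Here is a minimal eksctl setup for EKS. It is the managedNodeGroups section of a ClusterConfig file, so it goes alongside your own cluster's metadata:

managedNodeGroups:
  # Spot pool: listing several similar instance types gives the pool more capacity to draw from
  - name: spot-pool
    instanceTypes: ["m5.large", "m5a.large", "m5d.large"]
    spot: true
    minSize: 1
    maxSize: 5
    desiredCapacity: 2
  # On-demand baseline for stateful and critical workloads
  - name: on-demand-baseline
    instanceType: m5.large
    minSize: 1
    maxSize: 2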

Run stateless workloads on spot. Keep databases and stateful services on on-demand. Because only part of the cluster moves to spot, the overall reduction in compute spend is typically 40–55%.
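To see how a 60–90% per-instance discount turns into a 40–55% overall reduction, here is a rough back-of-the-envelope sketch. The function name and the 70% figures are illustrative assumptions, not measurements: it simply multiplies the share of compute spend you move to spot by the discount you get on that share.

```python
def blended_savings(spot_share: float, spot_discount: float) -> float:
    """Overall fraction of compute spend saved.

    spot_share:    fraction of on-demand compute spend moved to spot (0..1)
    spot_discount: discount on that share relative to on-demand (0..1)
    """
    if not (0 <= spot_share <= 1 and 0 <= spot_discount <= 1):
        raise ValueError("both arguments must be fractions between 0 and 1")
    return spot_share * spot_discount


# Assumed example: 70% of compute moves to spot at a 70% discount.
print(f"{blended_savings(0.7, 0.7):.0%}")  # 49%
```

The on-demand baseline and stateful services keep paying full price, which is why the blended number lands below the headline spot discount.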

Step 5: Deploy the Vertical Pod Autoscaler

Most teams set resource requests based on guesswork. VPA watches actual usage and recommends right-sized values. The CRDs and RBAC rules alone produce no recommendations; the recommender component has to be running too. The install script in the kubernetes/autoscaler repository deploys all of it:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Then create a VPA resource for each deployment:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Start with recommendations only

Start with updateMode: "Off". Review the recommendations (kubectl describe vpa my-app-vpa) for a week before enabling auto-updates. I've seen memory requests drop from 512Mi to 90Mi. Across 20 pods, that frees about 8 GiB, roughly the memory of an entire m5.large node.
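The arithmetic behind that last claim, as a quick sketch. The function name is hypothetical; the 8 GiB figure is the memory of an m5.large, the instance type used in the Step 4 node groups, and the allocatable memory on a real node is somewhat lower than that.

```python
def freed_memory_mib(old_request_mib: int, new_request_mib: int, pods: int) -> int:
    """Memory released from scheduling when every pod's request is lowered."""
    return (old_request_mib - new_request_mib) * pods


M5_LARGE_MEMORY_GIB = 8  # m5.large: 2 vCPU, 8 GiB

freed = freed_memory_mib(512, 90, 20)
print(f"{freed} MiB = {freed / 1024:.2f} GiB")  # 8440 MiB = 8.24 GiB
print(freed / 1024 >= M5_LARGE_MEMORY_GIB)      # True
```

Requests, not actual usage, are what the scheduler reserves, so lowering them is what lets the cluster fit on fewer nodes.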

Step 6: Decide — Right-Size or Retreat

If your team is under 10 engineers, your traffic is under 1,000 RPS, and you lack dedicated platform engineering — consider leaving K8s entirely. Here is the decision matrix:

	Kubernetes	Cloud Run	Fly.io
Monthly cost (3 services)	$2,600–$4,000	$50–$300	$30–$150
Ops overhead	High	Near-zero	Low
Deployment	Helm/ArgoCD	`gcloud run deploy`	`fly deploy`

If you do migrate, go service by service: extract databases to managed services first, move your lowest-traffic service as a proof of concept, then cut the cluster only after production validation.

Gotchas

It's easy to miss, but kubectl top requires metrics-server. If the command returns an error, install it first: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
VPA and HPA conflict on the same metric. Don't set VPA to auto-update CPU if HPA is also scaling on CPU utilization.
Spot interruptions are real. Always run at least one on-demand node as a baseline, and set Pod Disruption Budgets for critical services.
OpenCost needs Prometheus. If you don't already run Prometheus, install it before OpenCost (the OpenCost documentation installs it from the prometheus-community Helm charts), and watch the resource footprint of the monitoring stack itself.
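The VPA and HPA gotcha is easy to check mechanically before you flip updateMode away from "Off". The sketch below is a simplified check on manifests loaded as plain dictionaries. The function name is hypothetical, and it assumes the VPA controls both CPU and memory (it ignores any resourcePolicy that narrows this), so treat a True result as "look closer", not as a verdict.

```python
def vpa_hpa_conflict(vpa: dict, hpa: dict) -> bool:
    """True if a VPA that applies updates and an HPA scaling on a CPU or memory
    resource metric both target the same workload."""
    vpa_spec = vpa.get("spec", {})
    hpa_spec = hpa.get("spec", {})

    vpa_target = vpa_spec.get("targetRef", {})
    hpa_target = hpa_spec.get("scaleTargetRef", {})
    same_target = (
        vpa_target.get("name") is not None
        and vpa_target.get("kind") == hpa_target.get("kind")
        and vpa_target.get("name") == hpa_target.get("name")
    )

    # VPA applies changes unless updateMode is "Off"; the default mode is "Auto".
    mode = vpa_spec.get("updatePolicy", {}).get("updateMode", "Auto")
    if not same_target or mode == "Off":
        return False

    # autoscaling/v2 HPA: resource metrics look like
    # {"type": "Resource", "resource": {"name": "cpu", ...}}
    hpa_resources = {
        m.get("resource", {}).get("name")
        for m in hpa_spec.get("metrics", [])
        if m.get("type") == "Resource"
    }
    return bool(hpa_resources & {"cpu", "memory"})


vpa = {"spec": {"targetRef": {"kind": "Deployment", "name": "my-app"},
                "updatePolicy": {"updateMode": "Auto"}}}
hpa = {"spec": {"scaleTargetRef": {"kind": "Deployment", "name": "my-app"},
                "metrics": [{"type": "Resource", "resource": {"name": "cpu"}}]}}
print(vpa_hpa_conflict(vpa, hpa))  # True
```

With updateMode: "Off", as in the Step 5 manifest, the same pair returns False because the VPA only recommends.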

Conclusion

The takeaway that will save you the most: most startups don't have a scaling problem; they have a spending problem. Run through these six steps today. You'll either cut your K8s bill by 40–60% with spot instances and VPA, or you'll have the data to confidently migrate to Cloud Run or Fly.io and redirect that budget into building product.
