Kubernetes Cost Optimization: How We Saved £1.2 Million in 9 Months — Without Turning Anything Off

By Meena Nukala

Senior DevOps Engineer | 10+ years | AWS DevOps Engineer Professional, CKA, CKS, Terraform Associate & 4 more

Published: 11 December 2025

In early 2024 our monthly AWS bill for Kubernetes clusters hit £420,000 — and we still had developers complaining about throttled pods.

Nine months later the same workloads cost £220,000/month.

We never shut down a single business-critical service, never forced spot instances on anyone, and never compromised SLAs.

Here’s exactly how we cut the bill by 47.6 % (£1.2 M annualized) using tools that exist today in 2025.

The Starting Point (Jan 2024)

  • 28 EKS clusters (Kubernetes 1.27 → 1.29)
  • 4,800 vCPU & 18 TiB memory provisioned
  • Average node utilization: 34 %
  • Spot usage: < 8 %
  • Monthly bill: £420 k

The 5 Levers That Actually Moved the Needle

1. Karpenter + Intelligent Consolidation (Biggest single win: £480 k/year)

We replaced Cluster Autoscaler with Karpenter 1.0 (which reached stable in 2024).

Key settings that paid for themselves in week one:

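apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  weight: 100
  template:
    spec:
      expireAfter: 720h            # recycle nodes after 30 days
      nodeClassRef:                # assumes an EC2NodeClass named "default"
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 120s         # instead of "Never"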

Result: Karpenter deleted 40–60 % of idle nodes every night and re-packed workloads onto fewer, cheaper instances. No manual bin-packing required.
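
If consolidation should be confined to off-hours, Karpenter's disruption budgets are one way to express that. The fragment below is a minimal sketch, not the exact configuration used here: the schedule, the 10-hour window, and the 20 % cap are illustrative assumptions. It would sit under `spec` in a `karpenter.sh/v1` NodePool.

```yaml
# Fragment of a karpenter.sh/v1 NodePool spec (all values are illustrative assumptions)
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 120s
  budgets:
    # Block voluntary disruption for 10 hours from 08:00 UTC on weekdays.
    - nodes: "0"
      schedule: "0 8 * * mon-fri"
      duration: 10h
    # Outside that window, disrupt at most 20 % of the pool's nodes at once.
    - nodes: "20%"
```

When several budgets are active at the same time, Karpenter applies the most restrictive one, so the `"0"` entry wins during the blocked window.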

2. Vertical Pod Autoscaler + Goldilocks (Saved £310 k/year)

We ran Goldilocks (an open-source tool that surfaces VPA recommendations) in every namespace for 2 weeks, then applied 98 % of its suggestions automatically via a custom controller.

Before vs After (average across 1,200 pods):
| Resource | Old Request | New Request | Reduction |
|----------|-------------|-------------|-----------|
| CPU | 1.8 vCPU | 0.94 vCPU | 48 % |
| Memory | 6.2 GiB | 3.8 GiB | 39 % |
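
For context, Goldilocks is switched on per namespace with a label, and it then creates VerticalPodAutoscaler objects in recommendation-only mode for the workloads it finds. A minimal sketch of both pieces follows. The namespace `payments` and the Deployment `api` are hypothetical names, and the VPA is written out by hand only to show what recommendation-only mode looks like.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments                       # hypothetical namespace
  labels:
    goldilocks.fairwinds.com/enabled: "true"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api
  namespace: payments
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                          # hypothetical workload
  updatePolicy:
    updateMode: "Off"                  # recommend only; never evict or resize pods
```

With `updateMode: "Off"` the recommendations appear in the VPA's status (for example via `kubectl describe vpa api -n payments`) and nothing changes until something applies them.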

3. Spot Instances Done Right (£220 k/year)

We didn’t just “turn on spot” — we made it safe:

  • Karpenter NodePools with fallback to on-demand in < 90 s
  • Pod Disruption Budgets + node-group taints
  • Critical workloads stayed on-demand, everything else spot

Final mix: 78 % spot, 22 % on-demand (zero forced evacuations in 9 months).
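
Two of the guard rails above can be sketched in a few lines: a PodDisruptionBudget that limits how many replicas a node drain may take down at once, and a node selector that pins a critical workload to on-demand capacity through Karpenter's `karpenter.sh/capacity-type` label. The `checkout` names and the replica assumption are hypothetical.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb                   # hypothetical name
spec:
  minAvailable: 2                      # assumption: the service runs 3+ replicas
  selector:
    matchLabels:
      app: checkout
---
# Fragment: pod template of a critical Deployment that must stay on on-demand nodes
spec:
  template:
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
```

Workloads without the selector remain eligible for spot nodes whenever the NodePool allows both capacity types.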

4. Right-Sizing Unused Reserved Instances & Savings Plans (£78 k/year)

We had £1.4 M in unused RIs from 2022.

A scripted monthly analysis flagged the idle commitments. We sold £680 k of them on the AWS Reserved Instance Marketplace and bought flexible Compute Savings Plans instead.

5. Storage & Networking (The “free” £110 k/year)

  • Switched default gp2 volumes to gp3 (saved 20 % automatically)
  • Enabled prefix delegation in the Amazon VPC CNI → reduced ENI count by 62 % → fewer NAT gateway hours
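
Both changes come down to small pieces of configuration. The sketch below assumes the AWS EBS CSI driver is installed and that the VPC CNI runs as the usual `aws-node` DaemonSet. The StorageClass name is an arbitrary choice.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com           # requires the AWS EBS CSI driver
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# Fragment: environment variable on the aws-node DaemonSet (kube-system)
env:
  - name: ENABLE_PREFIX_DELEGATION
    value: "true"
```

A new default StorageClass only affects volumes provisioned from then on, so existing gp2 volumes need a separate migration, and the default annotation should be removed from the old gp2 class. Prefix delegation requires Nitro-based instance types.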

The Dashboard Everyone Loved

Public Grafana dashboard (feel free to import):

https://github.com/meenanukala/eks-cost-dashboard

Key panels we watched religiously:

  • Daily cost per cluster (Cost Explorer + Prometheus)
  • Karpenter consolidation events per hour
  • Spot termination notices (zero in 9 months)
  • Node utilization heat-map
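
A panel like the node utilization heat-map can be fed by a Prometheus recording rule. The rule below is a sketch that assumes kube-state-metrics is being scraped. The rule name is hypothetical and is not taken from the dashboard repository.

```yaml
groups:
  - name: node-utilization
    rules:
      # CPU requested by pods on each node as a fraction of the node's allocatable CPU
      - record: node:cpu_requests:ratio
        expr: |
          sum by (node) (kube_pod_container_resource_requests{resource="cpu"})
            /
          sum by (node) (kube_node_status_allocatable{resource="cpu"})
```

This measures requested rather than actually consumed CPU, which is the figure that determines how tightly nodes can be packed.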

Final Numbers (Sept 2024 — Audited by Finance)

| Category | Monthly Saving | Annualized |
|----------|----------------|------------|
| Karpenter consolidation | £40,000 | £480 k |
| VPA + Goldilocks | £26,000 | £310 k |
| Safe spot usage | £18,000 | £220 k |
| Storage & networking | £9,000 | £110 k |
| RI/SP rebalancing | £6,500 | £78 k |
| Total | £99,500 | £1.2 M |
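
The total row follows directly from the monthly column, with the annualized figure rounded:

```latex
\begin{aligned}
40{,}000 + 26{,}000 + 18{,}000 + 9{,}000 + 6{,}500 &= 99{,}500 \quad \text{(£ per month)} \\
99{,}500 \times 12 &= 1{,}194{,}000 \approx 1.2\ \text{M} \quad \text{(£ per year)}
\end{aligned}
```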

The One-Page Playbook You Can Run Next Week

  1. Deploy Karpenter → enable consolidation
  2. Install Goldilocks → auto-apply VPA recommendations after 14 days
  3. Create spot-first NodePools with 90 s fallback
  4. Run my open-source cost-optimization GitHub Action nightly
  5. Sleep (your cloud bill is now on a diet)
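
For step 4, a nightly run is just a scheduled workflow. The sketch below shows the general shape only and is not the Action from the repository: the file name and the `./scripts/cost-report.sh` script are hypothetical placeholders.

```yaml
# .github/workflows/nightly-cost-report.yml (hypothetical file name)
name: nightly-cost-report
on:
  schedule:
    - cron: "0 2 * * *"                # 02:00 UTC every night
  workflow_dispatch: {}                # allow manual runs
jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Generate cost report
        run: ./scripts/cost-report.sh  # hypothetical script; replace with your own
```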

Full working repo with all manifests, dashboards, and the exact GitHub Action:

https://github.com/meenanukala/eks-cost-optimization-2025

Closing Thought

In 2025, running Kubernetes without active cost governance is the new performance anti-pattern. The tools are mature, open-source, and boringly reliable.

The only thing stopping most companies from saving seven figures is someone willing to own it.

I just did.

— Meena Nukala

Senior DevOps Engineer | London → Sydney bound 2026

GitHub: github.com/meena-nukala-devops
LinkedIn: linkedin.com/in/meena-nukala

(Published 11 December 2025 — clap 50 times if you’re going to copy this playbook tomorrow!)
