The bill arrived on a Tuesday, up 38% on last month, and finance wanted to know why by Friday. The attachment was 2,400 lines of m5.2xlarge, persistent-disk-ssd, network-inter-region-egress. Nothing that mapped to a namespace, a team, or anything anyone in the room recognised.
If you run Kubernetes in production you've had this week, or you will. And the thing that makes it hard isn't the size of the number. It's that the cloud bill and the cluster are describing two entirely different systems in two entirely different languages, and the translation is your problem.
Right-sizing requests is where most of the advice lands, and it's not wrong, just downstream. You can't fix what you can't attribute. So before getting into tuning, it's worth looking at where the money actually goes, and why each category tends to quietly grow without anyone noticing.
Compute is the obvious one, usually 60 to 75% of the bill, and almost none of it is spent on the thing you'd expect. The mental model most people start with is that pods cost money, but pods don't appear on any invoice. Nodes do. A 16-vCPU node running at 40% average utilisation costs exactly the same as one running at 95%, and the cluster autoscaler will happily keep it alive as long as something on it has a reservation. Requests, in other words, are what you're really paying for, not usage.
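That flat-rate-per-node economics can be sketched in a few lines. The rates here are made up for illustration, not any provider's real pricing; the point is that the cost of a *used* vCPU-hour scales inversely with utilisation while the invoice stays constant:

```python
# Back-of-envelope: what a vCPU-hour actually costs at different utilisations.
# NODE_HOURLY_COST is a hypothetical on-demand rate, not a real price.
NODE_VCPUS = 16
NODE_HOURLY_COST = 0.77  # USD/hour, illustrative

def effective_cost_per_used_vcpu_hour(avg_utilisation: float) -> float:
    """The node bills the same regardless of how busy it is,
    so the cost per vCPU you actually use rises as utilisation falls."""
    used_vcpus = NODE_VCPUS * avg_utilisation
    return NODE_HOURLY_COST / used_vcpus

low = effective_cost_per_used_vcpu_hour(0.40)
high = effective_cost_per_used_vcpu_hour(0.95)
print(f"at 40% util: ${low:.3f} per used vCPU-hour")
print(f"at 95% util: ${high:.3f} per used vCPU-hour")
```

At these illustrative rates the 40% node is charging you roughly 2.4x more per unit of real work than the 95% one, for the identical line on the invoice.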
This is where the familiar failures live. Someone sized a deployment during a load test six months ago, and the peak was real, but the other 99% of the time the pod uses a tenth of what it reserves. Every replica now holds ten times what it needs, and the autoscaler keeps the nodes warm to honour it. Then the node pools fragment: one for GPUs, one for high-memory workloads, one for ARM, each with its own minimum size and its own idle headroom, and you're paying the tax on all of them. Bin-packing looks fine on paper because bin-packing reads requests, not reality, and if the requests are fiction then so is the packing. Scale-up is fast, scale-down is deliberately slow, and a 9am spike can leave you holding nodes until lunchtime.
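The load-test-sizing failure above is easy to put a number on. A minimal sketch, assuming a hypothetical blended rate per vCPU-hour and the ten-to-one request-to-usage gap from the example:

```python
# Sketch: monthly cost of the gap between what pods request and what they use.
# COST_PER_VCPU_HOUR is an assumed blended rate, not a real price.
COST_PER_VCPU_HOUR = 0.048

def monthly_reservation_waste(replicas, cpu_request, cpu_used_p50, hours=730):
    """Reserved-but-idle CPU that the autoscaler still keeps nodes warm for."""
    idle_per_replica = max(cpu_request - cpu_used_p50, 0.0)
    return replicas * idle_per_replica * COST_PER_VCPU_HOUR * hours

# The load test said 2 cores; steady state uses ~0.2.
waste = monthly_reservation_waste(replicas=12, cpu_request=2.0, cpu_used_p50=0.2)
print(f"~${waste:,.0f}/month reserved but unused, for one deployment")
```

One modest deployment, sized once and never revisited, and the idle reservation alone runs to hundreds of dollars a month at these rates. Multiply by the number of deployments that were sized the same way.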
The honest answer to "why is compute so high" is almost never "we have too much traffic". It's that reservations exceed usage across fragmented pools, and the scheduler can only be as good as the numbers you hand it.
Storage is a different shape of problem, because storage doesn't spike, it accretes. It looks cheap per GB on the pricing page and it is, individually, and that's exactly why nobody notices when it quietly doubles. Scaling a StatefulSet from ten replicas to three doesn't delete anything by default: the seven orphaned PVCs stay Bound, the disks underneath them keep billing, and they'll still be there next year unless someone goes looking. Volumes whose reclaim policy is Retain linger the same way even after their claims are deleted. Snapshots compound it: a Helm chart installed in 2022 set up daily snapshots with a comment in the readme saying "configure retention in production", and of course nobody did, and now there are fourteen hundred of them. Log volumes behave the same. Retention configured at the application layer doesn't shrink the disk underneath, and disks can only grow, not shrink, on most providers. A Postgres instance given 500GB "to be safe" that uses 40GB will stay at 500GB forever.
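Finding the leftovers is mostly a filtering exercise. A sketch of the Retain case, run against a hand-written stand-in for what the Kubernetes API would give you (in a real cluster you'd feed it the parsed output of `kubectl get pv -o json`; the volume names here are invented):

```python
# Sketch: retained volumes that nothing claims any more.
# A PV in phase "Released" has lost its claim; with reclaimPolicy "Retain"
# the disk under it keeps billing anyway.
def orphaned_volumes(pvs):
    return [
        pv["name"]
        for pv in pvs
        if pv["phase"] == "Released" and pv["reclaimPolicy"] == "Retain"
    ]

pvs = [
    {"name": "data-db-0", "phase": "Bound",    "reclaimPolicy": "Retain"},
    {"name": "data-db-7", "phase": "Released", "reclaimPolicy": "Retain"},
    {"name": "data-db-8", "phase": "Released", "reclaimPolicy": "Retain"},
    {"name": "scratch-1", "phase": "Released", "reclaimPolicy": "Delete"},
]
print(orphaned_volumes(pvs))
```

The scaled-down StatefulSet case is similar but one step earlier: the PVCs are still Bound, so you filter for claims whose names reference replica ordinals higher than the current replica count.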
Compute spikes get noticed because someone gets paged. Storage bloat is invisible until someone actually opens the console, which is rarely.
Then there's egress, which is the line that causes the most genuine surprise when it arrives, because it's the one least visible from inside the cluster. Prometheus has no idea what AWS is going to charge you for a cross-AZ hop. And cross-AZ is the big one: frontends in one zone, caches in another, databases in a third, with every request making two or three zone crossings both ways, each direction billed separately on most providers. A service doing ten thousand requests a second with a 2KB response between zones moves about 53TB a month, which at a typical $0.01/GB works out to over $500 a month per billed direction for the responses alone, and there's rarely just one service doing this.
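The arithmetic behind that figure, assuming the commonly cited $0.01/GB cross-AZ rate per direction (check your own provider's pricing, the rate is the whole game here):

```python
# Back-of-envelope for the cross-AZ example above.
RATE_PER_GB = 0.01            # USD per GB, per billed direction; assumed rate
SECONDS_PER_MONTH = 30 * 24 * 3600

def monthly_cross_az_cost(requests_per_sec, payload_bytes, directions=1):
    gb_per_month = requests_per_sec * payload_bytes * SECONDS_PER_MONTH / 1e9
    return gb_per_month * RATE_PER_GB * directions

cost = monthly_cross_az_cost(10_000, 2_048)
print(f"~${cost:,.0f}/month, one direction, one service")
```

Note what's missing from the calculation: the request bytes, the second billed direction, and every other service making the same hops. The headline number is the floor, not the estimate.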
Inter-region replication is the other classic. A DR plan was drafted, replication was set up, the plan was quietly shelved six months later, and the replication kept running. Image pulls do it too, in a smaller way, especially during cluster upgrades or autoscale churn where the same 2GB image gets pulled onto a fresh node for the fifteenth time that week. None of this shows up anywhere until the invoice lands.
The long tail is a thousand small cuts. A Service of type LoadBalancer provisions a dedicated cloud LB at fifteen to twenty-five dollars a month baseline, and teams create them freely, so the cluster ends up with eighty of them where an ingress controller would have covered sixty. Managed databases sit at minimum spec for services that got deprecated eighteen months ago, holding four megabytes of data on four gigabytes of instance. NAT gateways charge per GB processed, which means every byte leaving a private subnet, every image pull, every telemetry payload to a SaaS vendor, gets billed twice: once through the NAT, once as egress. And none of this is on any cluster dashboard, because none of it is a cluster concept.
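The long tail adds up faster than its per-item prices suggest. A sketch using the illustrative rates from the text (the NAT and egress figures are assumptions, not a specific provider's pricing):

```python
# Sketch of the long tail, with assumed rates.
LB_BASELINE_MONTHLY = 20.0   # dedicated cloud LB, mid-range of the $15-25 cited
NAT_PER_GB = 0.045           # hypothetical NAT processing rate
EGRESS_PER_GB = 0.09         # hypothetical internet egress rate

# Eighty load balancers where an ingress controller covers sixty of them.
lb_count, covered_by_ingress = 80, 60
lb_waste = covered_by_ingress * LB_BASELINE_MONTHLY

# A byte leaving a private subnet is billed twice: NAT processing, then egress.
telemetry_gb = 500
double_billed = telemetry_gb * (NAT_PER_GB + EGRESS_PER_GB)

print(f"redundant load balancers: ${lb_waste:,.0f}/month")
print(f"500GB of telemetry via NAT: ${double_billed:,.2f}/month")
```

Neither number would trip an alert on its own, which is rather the point.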
Which is the thing worth sitting with, really, because it's the whole problem in miniature. The cloud bill speaks in SKUs: n2-standard-8, pd-ssd, inter-zone-egress. The cluster speaks in namespaces, workloads, teams, environments. Out of the box, nothing translates between them. There's no native way to answer "how much did the payments team spend last month" without a lot of manual correlation work, and that's not a tooling gap you can close by tuning CPU limits. Right-sizing is a real 15% win on the wrong axis. The bigger win, the one that changes how the conversation with finance goes, is knowing which 15% of workloads generate 60% of the spend, who owns them, and whether they're actually earning their keep. Once that's visible, the decisions get a lot easier. Until it is, you're arguing about requests in a vacuum.
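The missing translation layer is conceptually simple even where it's operationally painful: join billing data to pod requests and split each node's cost across namespaces in proportion to what they reserve. A minimal sketch of that allocation (namespace names and the node price are invented; real cost tools do the same basic join against live data):

```python
# Sketch: allocate a node's monthly cost to namespaces by CPU requested.
# Whatever nobody requested lands in an "_idle" bucket nobody owns --
# which is usually the most interesting line in the report.
from collections import defaultdict

def allocate(node_monthly_cost, node_cpu, pods):
    """pods: list of (namespace, cpu_request) tuples scheduled on the node."""
    shares = defaultdict(float)
    for namespace, cpu_request in pods:
        shares[namespace] += cpu_request / node_cpu * node_monthly_cost
    shares["_idle"] = node_monthly_cost - sum(shares.values())
    return dict(shares)

bill = allocate(560.0, 16, [("payments", 6.0), ("payments", 2.0), ("search", 4.0)])
print(bill)
```

Run across every node, this is the report that answers "how much did the payments team spend last month", and the `_idle` line is where the fragmented-pool tax from earlier becomes a visible number instead of a vague suspicion.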
I've been working on this problem for a while, which is what this site is about. Happy to answer specific questions in the comments.