Muskan

Posted on • Originally published at zop.dev

A Kubernetes cluster is one line on your bill, so you cannot see which namespace burns the money

Your cloud bill shows a Kubernetes cluster as a single number. EKS, GKE, and AKS all roll node compute into one figure. Finance sees the total. Nobody sees which namespace or deployment produced it.

We call this the cluster line-item problem. The cluster total tells you that spending exists. It does not tell you where the spending comes from, so you cannot assign it, defend it, or cut it. ZopNight v1.18.0 breaks that single line down per workload across AWS, GCP, and Azure: spend attributed to each namespace and each deployment, plus how much of it is idle waste you pay for but never use.

Per-namespace breakdown answers the question the cluster total cannot

A cluster total answers one question: how much did the cluster cost. That is the only question it can answer.

Push the same spend down to the namespace and you answer a different question: which team or product owns the cost. Push it down to the deployment and you answer a sharper one: which workload is the cost, and is it running hot or sitting idle. Each level of attribution unlocks a decision the level above cannot support.

| Granularity | What it tells you | What you can act on |
| --- | --- | --- |
| Cluster bill (one line) | Total spend exists | Approve or question the invoice |
| Per-namespace | Which team or product owns the spend | Assign cost, set a budget, run cloud cost allocation |
| Per-deployment | Which workload drives the spend | Right-size, scale down, or kill it |
| Idle waste per workload | What you pay for but never run | Reclaim the gap between request and use |
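
The levels in the table are the same numbers viewed at different zoom levels. A minimal sketch of the rollup, assuming you already have a cost per deployment (the namespaces, deployments, and dollar figures below are made up for illustration):

```python
from collections import defaultdict

# Hypothetical per-deployment monthly costs in USD: (namespace, deployment, cost).
deployment_costs = [
    ("checkout", "api", 1200.0),
    ("checkout", "worker", 300.0),
    ("search", "indexer", 900.0),
    ("search", "query", 600.0),
]


def rollup_by_namespace(rows):
    """Sum deployment-level costs into one total per namespace."""
    totals = defaultdict(float)
    for namespace, _deployment, cost in rows:
        totals[namespace] += cost
    return dict(totals)


namespace_costs = rollup_by_namespace(deployment_costs)

# The cluster line item is just the sum of the namespace totals.
cluster_total = sum(namespace_costs.values())

for namespace, cost in sorted(namespace_costs.items()):
    print(f"{namespace}: ${cost:,.2f}")
print(f"cluster: ${cluster_total:,.2f}")
```

Going the other way is the hard part: the cloud bill only gives you the last number, which is why attribution has to start from workload data rather than from the invoice.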

This is the same discipline that FinOps for engineering teams depends on. You cannot hold a team accountable for a number they cannot see. The breakdown turns a shared, anonymous total into named, ownable lines.

Idle waste is the gap between what you request and what you actually run

Idle waste is not a vague inefficiency. It is a measurable gap with a clear mechanism.

A deployment sets CPU and memory requests. The Kubernetes scheduler reserves that capacity on a node and refuses to hand it to anything else. If the deployment requests 4 vCPU and uses 1, the other 3 sit reserved and unused. You paid for the node, so you paid for those 3.

That waste happens because requests are a promise, not a measurement. Teams pad requests to avoid throttling and out-of-memory kills, which is rational, but the padding becomes permanent. The per-workload breakdown surfaces this gap as a line you can see, the same way Kubernetes resource requests drain budgets when nobody audits them. Idle waste is the difference between provisioned cost and used cost, attributed to the deployment that caused it.
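
The 4 vCPU example above can be turned into a number. This is a rough sketch that looks at CPU only and assumes a flat price per vCPU-hour; the rate and the hours are placeholders, not real cloud prices, and a fuller version would treat memory the same way.

```python
def idle_waste(requested_vcpu, used_vcpu, cost_per_vcpu_hour, hours):
    """Return (provisioned_cost, used_cost, idle_cost) for one deployment.

    Usage above the request is capped at the request, so bursting past
    the reservation never shows up as negative waste.
    """
    billable_use = min(used_vcpu, requested_vcpu)
    provisioned_cost = requested_vcpu * cost_per_vcpu_hour * hours
    used_cost = billable_use * cost_per_vcpu_hour * hours
    return provisioned_cost, used_cost, provisioned_cost - used_cost


# The deployment from the text: requests 4 vCPU, uses 1.
# 0.05 USD per vCPU-hour and 100 hours are assumed values for illustration.
provisioned, used, idle = idle_waste(
    requested_vcpu=4, used_vcpu=1, cost_per_vcpu_hour=0.05, hours=100
)
idle_share = idle / provisioned  # fraction of the reservation that sat unused

print(f"provisioned=${provisioned:.2f} used=${used:.2f} idle=${idle:.2f}")
print(f"idle share: {idle_share:.0%}")
```

Three of the four reserved vCPUs are unused, so three quarters of what this deployment costs is idle waste, whatever the actual rate turns out to be.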

The breakdown sums back to the discounted VM bill, so nothing double-counts

Here is the honesty detail that makes the breakdown safe to trust. Workload cost is not a new charge. It is a split of the VM spend you already pay.

ZopNight takes the node and VM cost the cluster already incurs, with reservations, savings plans, and spot discounts already applied, then divides that exact amount across the workloads running on those nodes. The numbers you see per namespace and per deployment add back up to the cluster total on your cloud bill. Summaries never invent a second charge on top of the one you already have.

This matters because naive attribution tools double-count. They price each workload at on-demand rates, ignore your commitments, and report a figure larger than the actual invoice. That breaks trust the first time finance reconciles it. Tying the breakdown to the discounted bill is the same reconciliation rigor behind cloud cost anomaly detection: the number has to match reality or it is noise.
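
To make the reconciliation point concrete, here is a small sketch of a proportional split. It assumes node cost is divided by each workload's share of CPU requests; that weighting is an assumption for illustration, not ZopNight's documented method, and the workload names and prices are hypothetical.

```python
def split_node_cost(discounted_node_cost, requests_by_workload):
    """Divide an already-discounted node cost by share of requested vCPU.

    Because every share is a fraction of the same billed amount,
    the shares add back up to the bill.
    """
    total_requests = sum(requests_by_workload.values())
    if total_requests <= 0:
        raise ValueError("no requests to attribute cost to")
    return {
        name: discounted_node_cost * requested / total_requests
        for name, requested in requests_by_workload.items()
    }


# Hypothetical node: 100 USD at on-demand rates, 60 USD after commitments.
on_demand_node_cost = 100.0
discounted_node_cost = 60.0
requests = {"checkout/api": 4, "search/indexer": 2, "search/query": 2}

shares = split_node_cost(discounted_node_cost, requests)
attributed_total = sum(shares.values())

# A naive tool prices the same requests at on-demand rates instead.
naive_total = sum(split_node_cost(on_demand_node_cost, requests).values())

print(shares)
print(f"attributed: {attributed_total:.2f}, billed: {discounted_node_cost:.2f}")
print(f"naive on-demand total: {naive_total:.2f}")
```

The attributed total equals the billed amount, while the on-demand version reports more than the invoice. That gap is the figure finance cannot reconcile.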

Start by attributing the most expensive cluster first

Do not boil the ocean. Point the per-workload breakdown at your single most expensive cluster, find the top three namespaces, and ask each owning team to defend its number.
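
As a sketch of that first pass, assuming you have per-namespace costs for each cluster (the cluster names, namespaces, and figures here are invented):

```python
def top_namespaces(cluster_costs, n=3):
    """Pick the most expensive cluster and return its n costliest namespaces.

    cluster_costs maps cluster name -> {namespace: cost}.
    """
    # Most expensive cluster = largest sum of its namespace costs.
    cluster = max(cluster_costs, key=lambda c: sum(cluster_costs[c].values()))
    ranked = sorted(
        cluster_costs[cluster].items(), key=lambda item: item[1], reverse=True
    )
    return cluster, ranked[:n]


# Hypothetical monthly costs in USD.
cluster_costs = {
    "prod-eks": {"checkout": 1500, "search": 900, "kube-system": 200, "batch": 400},
    "staging-gke": {"checkout": 300, "search": 100},
}

cluster, top_three = top_namespaces(cluster_costs)
print(cluster)
for namespace, cost in top_three:
    print(f"  {namespace}: ${cost}")
```

The output is a short list of named owners to talk to, which is the whole point of starting with one cluster.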

This works when your workloads set requests and your nodes carry meaningful spend. It breaks when a cluster runs almost no real load, because dividing a small bill across tiny workloads produces noise, not insight. It also breaks when shared overhead (system namespaces and unused node headroom) dwarfs application spend. In that case, fix node packing first by right-sizing your node groups.

The breakdown runs the same way across EKS, GKE, and AKS, because the mechanism is provider-agnostic: take the discounted VM spend, attribute it down to namespace and deployment, expose idle waste, and reconcile the sum back to the bill. The cluster line-item problem is solved the moment a single number becomes a list you can name.

Frequently Asked Questions

Q: In practice, how does the per-namespace breakdown answer the question the cluster total cannot?

See the section above titled "Per-namespace breakdown answers the question the cluster total cannot" for the full breakdown with examples.

Q: In practice, how is idle waste measured as the gap between what you request and what you actually run?

See the section above titled "Idle waste is the gap between what you request and what you actually run" for the full breakdown with examples.

Q: In practice, how does the breakdown sum back to the discounted VM bill without double-counting?

See the section above titled "The breakdown sums back to the discounted VM bill, so nothing double-counts" for the full breakdown with examples.

Q: In practice, why start by attributing the most expensive cluster first?

See the section above titled "Start by attributing the most expensive cluster first" for the full breakdown with examples.


