Puneetha Jalagam

Posted on Jun 25

Why Your Kubernetes Cluster Is Probably Bigger Than It Needs to Be

#aws #kubernetes #cloud #devops

You set up Kubernetes, your apps are running, and everything looks fine. But every month, the cloud bill is higher than expected. You add a few more nodes to stay safe, and the cycle continues.

Here's the thing: you're probably not short on resources. You likely have too many.

Most Kubernetes clusters are over-provisioned. Not because engineers are careless, but because the defaults, habits, and pressures of day-to-day work quietly push clusters to grow larger than they actually need to be.

Let's break down why that happens and what you can do about it.

First, What Does Oversized Mean?

Simple: your cluster is oversized when you're regularly paying for capacity it doesn't use. A lot of reserved capacity sits idle while the bill keeps coming.

Why Does This Happen?

1. You're Reserving Way More Than You're Using

Every app running in Kubernetes declares in advance how much CPU and memory it needs. This declaration is called a resource request. Kubernetes uses it to decide where to place the app, based not on what the app actually uses but on what it claimed it would use.

Here's the problem: most developers set these numbers too high. They're being careful, which makes sense. Nobody wants their app to crash because it ran out of memory. So they add extra buffer. Then a little more. And then their colleague copies those numbers for the next service.

Before long, you have a cluster where every app has reserved far more than it ever actually uses. The scheduler treats those reservations as taken, so new apps stop fitting on the existing machines and the cluster keeps adding more, even though the existing machines are mostly sitting idle.

If your apps are typically using 20–30% of what they've reserved, that's a red flag.
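A rough way to spot this is to compare each workload's typical usage against its request. The sketch below is a minimal Python example, assuming you have already exported per-workload CPU requests and typical usage (in millicores) from whatever monitoring you use; the `Workload` type, the sample names, and the 30% cutoff are illustrative choices, not part of any Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cpu_request_millicores: int    # what the pod spec reserves
    cpu_usage_millicores: float    # typical observed usage (assumed exported from your metrics)


def request_utilization(w: Workload) -> float:
    """Fraction of the reserved CPU that the workload actually uses."""
    if w.cpu_request_millicores <= 0:
        raise ValueError(f"{w.name} has no CPU request set")
    return w.cpu_usage_millicores / w.cpu_request_millicores


def flag_overreserved(workloads: list[Workload], threshold: float = 0.30) -> list[str]:
    """Names of workloads using less than `threshold` of what they reserved."""
    return [w.name for w in workloads if request_utilization(w) < threshold]


# Hypothetical sample data
workloads = [
    Workload("checkout", 1000, 200),  # uses 20% of its request
    Workload("search", 500, 400),     # uses 80% of its request
    Workload("reports", 2000, 300),   # uses 15% of its request
]
print(flag_overreserved(workloads))  # ['checkout', 'reports']
```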

2. Everything Runs at Full Size, All the Time

Traffic is not constant. Most apps are busier during the day and quiet at night. But many clusters run the same number of app instances around the clock, regardless of actual demand.

That means you're paying full price at 3am for capacity you don't need until noon.

Kubernetes has tools to fix this. Autoscalers can automatically increase the number of running instances when traffic picks up, and reduce them when things quiet down. But a surprising number of teams either haven't set these up or have them configured in ways that don't actually help.
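For the Horizontal Pod Autoscaler, the Kubernetes documentation describes the core calculation as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no change made when that ratio is within a tolerance of 1.0 (0.1 by default). The Python function below is a simplified sketch of that rule; it leaves out stabilization windows, pod readiness handling, and missing-metric logic, and the min/max bounds and sample numbers are made up for illustration.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA-style replica calculation (no stabilization window)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: leave it alone
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds
    return max(min_replicas, min(max_replicas, desired))


# Target: 60% average CPU, between 2 and 10 replicas (illustrative values)
print(desired_replicas(4, 90, 60, min_replicas=2))  # busy afternoon -> 6
print(desired_replicas(4, 15, 60, min_replicas=2))  # quiet night -> 2
print(desired_replicas(4, 63, 60, min_replicas=2))  # within tolerance -> 4
```

The `min_replicas` floor is what keeps a quiet night from scaling an app all the way down, so it is worth choosing deliberately rather than copying.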

Without autoscaling, you're sizing your cluster for your busiest moment and paying for that 24 hours a day.
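To put numbers on that, the sketch below compares a cluster sized for its peak all day against one that follows demand hour by hour. The demand profile and the 4-CPU node size are hypothetical, and it assumes perfect packing and instant scaling, so treat the result as an upper bound on what autoscaling could save for that profile.

```python
import math


def node_hours(hourly_cpu_demand: list[float], cpu_per_node: float, autoscaled: bool) -> int:
    """Node-hours needed over the period, assuming perfect packing."""
    nodes_each_hour = [math.ceil(d / cpu_per_node) for d in hourly_cpu_demand]
    if autoscaled:
        return sum(nodes_each_hour)  # pay only for what each hour needs
    return max(nodes_each_hour) * len(nodes_each_hour)  # peak size, all day


# Hypothetical 24-hour CPU demand in cores: quiet night, busy midday, evening tail
demand = [2] * 8 + [6] * 4 + [12] * 4 + [6] * 6 + [2] * 2

fixed = node_hours(demand, cpu_per_node=4, autoscaled=False)
scaled = node_hours(demand, cpu_per_node=4, autoscaled=True)
print(fixed, scaled)                        # 72 42
print(f"saving: {1 - scaled / fixed:.0%}")  # saving: 42%
```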

3. The Cluster Isn't Allowed to Shrink

With a cluster autoscaler installed, Kubernetes can also automatically add and remove the underlying machines (called nodes) based on how much the running apps have reserved. When demand drops, it's supposed to remove nodes you no longer need.

But this often doesn't happen. Sometimes scale-down is turned off entirely. Sometimes the rules are too strict — apps are flagged as "can't be moved," which prevents machines from being safely emptied and removed.

The result: your cluster grows when traffic spikes but never shrinks when things calm down. It just stays large. And you keep paying for machines that have nothing meaningful running on them.
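The open-source Cluster Autoscaler, for example, only treats a node as removable when the pods on it request less than a utilization threshold of its allocatable capacity (its `--scale-down-utilization-threshold` flag, 0.5 by default) and every pod on it can be moved elsewhere. A pod annotated `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"`, or one not managed by a controller, blocks removal. The sketch below is a heavily simplified model of those two checks using CPU only; the real autoscaler also weighs memory, PodDisruptionBudgets, and other rules, and the pod and node data here are invented.

```python
from dataclasses import dataclass, field

SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


@dataclass
class Pod:
    name: str
    cpu_request: float                   # cores
    has_controller: bool = True          # owned by a Deployment, ReplicaSet, etc.
    annotations: dict[str, str] = field(default_factory=dict)


@dataclass
class Node:
    name: str
    allocatable_cpu: float
    pods: list[Pod]


def scale_down_blockers(node: Node, threshold: float = 0.5) -> list[str]:
    """Reasons this node can't be removed; an empty list means it's a candidate.

    Simplified: CPU only, ignores PodDisruptionBudgets, memory, and other rules.
    """
    requested = sum(p.cpu_request for p in node.pods)
    if requested / node.allocatable_cpu >= threshold:
        return [f"requests at {requested / node.allocatable_cpu:.0%} of allocatable"]
    reasons = []
    for p in node.pods:
        evict = p.annotations.get(SAFE_TO_EVICT)
        if evict == "true":
            continue  # explicitly marked movable
        if evict == "false":
            reasons.append(f"{p.name}: annotated safe-to-evict=false")
        elif not p.has_controller:
            reasons.append(f"{p.name}: not managed by a controller")
    return reasons


quiet_node = Node("node-a", 4.0, [
    Pod("web-1", 0.5),
    Pod("batch-1", 0.25, annotations={SAFE_TO_EVICT: "false"}),
    Pod("debug-shell", 0.1, has_controller=False),
])
print(scale_down_blockers(quiet_node))
# ['batch-1: annotated safe-to-evict=false', 'debug-shell: not managed by a controller']
```

Here the node is only about 21% requested, yet two small pods keep it alive indefinitely.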

4. Old, Forgotten Workloads Are Still Running

This one catches almost every team eventually.

Someone spins up a test environment. A short-term project gets deployed. A proof of concept runs for a week. Then the work moves on — but nobody deletes those deployments.

They just sit there. Still reserving resources. Still keeping nodes alive. Still adding to your bill. Kubernetes doesn't clean these up automatically. If you don't delete them, they stay forever.

A quick monthly audit of what's actually running — and whether it should be — can free up more capacity than you'd expect.
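One way to make that audit routine is to script a first pass over the output of `kubectl get deployments --all-namespaces -o json` and flag likely leftovers for a human to review. The sketch below assumes a hypothetical team convention where temporary work lives in namespaces whose names contain `test`, `poc`, `demo`, or `tmp`, and uses a 30-day age cutoff; both are placeholders to adapt, and age alone doesn't prove a deployment is unused.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical naming convention for short-lived namespaces
TEMPORARY_HINTS = ("test", "poc", "demo", "tmp")


def stale_candidates(deployments_json: str, now: datetime, max_age_days: int = 30) -> list[str]:
    """Flag old deployments in namespaces that look temporary, for manual review."""
    items = json.loads(deployments_json)["items"]
    flagged = []
    for item in items:
        meta = item["metadata"]
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        looks_temporary = any(h in meta["namespace"] for h in TEMPORARY_HINTS)
        if looks_temporary and now - created > timedelta(days=max_age_days):
            flagged.append(f'{meta["namespace"]}/{meta["name"]}')
    return flagged


# Trimmed-down sample in the shape kubectl returns (invented data)
sample = json.dumps({"items": [
    {"metadata": {"name": "indexer", "namespace": "poc-search",
                  "creationTimestamp": "2024-01-05T10:00:00Z"}},
    {"metadata": {"name": "api", "namespace": "prod",
                  "creationTimestamp": "2023-06-01T09:00:00Z"}},
    {"metadata": {"name": "mock", "namespace": "test-payments",
                  "creationTimestamp": "2024-03-20T12:00:00Z"}},
]})
print(stale_candidates(sample, now=datetime(2024, 4, 1, tzinfo=timezone.utc)))
# ['poc-search/indexer']
```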

5. Your Machine Sizes Don't Match Your Workloads

Think of fitting boxes into a truck. If your boxes are small but your truck is enormous, you'll never fill it efficiently. You end up with a lot of unused space that you're still paying to haul around.

The same thing happens in Kubernetes. If your apps are small but your nodes are very large, the scheduler can't pack them efficiently. Big sections of each machine go unused, but you're paying for the whole machine.

Getting the right balance between app size and machine size makes a real difference in how much capacity actually gets used.
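The effect is easiest to see with a toy bin-packing model. The sketch below places pod CPU requests on identical nodes using first-fit decreasing, a classic packing heuristic rather than what the Kubernetes scheduler actually runs, and it ignores memory, DaemonSets, and system-reserved capacity. The pod and node sizes are hypothetical.

```python
def first_fit_decreasing(requests: list[float], node_cpu: float) -> list[float]:
    """Pack CPU requests onto identical nodes; returns CPU requested per node."""
    nodes: list[float] = []
    for r in sorted(requests, reverse=True):
        if r > node_cpu:
            raise ValueError(f"a {r}-CPU request can't fit on a {node_cpu}-CPU node")
        for i, used in enumerate(nodes):
            if used + r <= node_cpu:  # first node with room wins
                nodes[i] = used + r
                break
        else:
            nodes.append(r)  # nothing had room: add a node
    return nodes


def packing_report(requests: list[float], node_cpu: float) -> tuple[int, float]:
    """Node count and the fraction of provisioned CPU that is actually requested."""
    nodes = first_fit_decreasing(requests, node_cpu)
    return len(nodes), sum(requests) / (len(nodes) * node_cpu)


pods = [0.5] * 20  # twenty small pods, 10 CPUs requested in total
for size in (64, 4):
    count, fill = packing_report(pods, size)
    print(f"{size}-CPU nodes: {count} node(s), {fill:.0%} of provisioned CPU requested")
# 64-CPU nodes: 1 node(s), 16% of provisioned CPU requested
# 4-CPU nodes: 3 node(s), 83% of provisioned CPU requested
```

In this toy model the waste on large nodes comes from coarse granularity: capacity can only be added or removed one big machine at a time, so a small workload leaves most of that machine empty.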

A Quick Example

A startup runs five services on Kubernetes. Each service has four copies running, and each copy has claimed one full CPU.

That's 20 CPUs' worth of reservations, requiring 5 large machines to accommodate them.

But when they check the actual usage, each copy is only using about 15% of one CPU during normal hours. Their real CPU usage is closer to 3 cores. They're paying for 20.

By adjusting their reservations to reflect reality, turning on autoscaling, and letting the cluster shrink overnight, they get down to 2 machines for most of the day, scaling up only when traffic actually demands it. Their bill drops by more than 60%.
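The arithmetic behind this example is easy to check. The sketch below reproduces it in millicores, assuming each machine offers 4 CPUs (20 reserved CPUs spread over 5 machines) and ignoring the system overhead that makes real allocatable capacity slightly smaller; the 1.5x headroom is the rule of thumb recommended further down in this post.

```python
import math

SERVICES = 5
REPLICAS_PER_SERVICE = 4
REQUEST_PER_REPLICA_M = 1000   # 1 full CPU, in millicores
USAGE_PER_REPLICA_M = 150      # ~15% of one CPU during normal hours
NODE_CPU_M = 4000              # assumption: 4-CPU machines (20 CPUs / 5 machines)
HEADROOM = 1.5                 # "about 1.5x your typical usage"

replicas = SERVICES * REPLICAS_PER_SERVICE           # 20 copies
reserved_m = replicas * REQUEST_PER_REPLICA_M        # 20,000m = 20 CPUs
used_m = replicas * USAGE_PER_REPLICA_M              # 3,000m = 3 CPUs
right_sized_m = math.ceil(used_m * HEADROOM)         # 4,500m = 4.5 CPUs

nodes_before = math.ceil(reserved_m / NODE_CPU_M)    # 5
nodes_after = math.ceil(right_sized_m / NODE_CPU_M)  # 2
print(nodes_before, nodes_after, f"{1 - nodes_after / nodes_before:.0%}")  # 5 2 60%
```

At steady state that is a 60% cut in machines; the overall change in the bill then depends on how many hours the cluster spends scaled up for peaks versus scaled down overnight, which this sketch doesn't model.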

That's not a special case. That's what most teams find when they look closely.

What You Can Actually Do

Look at what's really being used. Before changing anything, check actual usage against what's been reserved. Most cloud platforms show this. If you see apps using 20% or less of their reservations, that's where to start.

Bring reservations closer to reality. You don't need to cut everything to the bone. Set reservations to about 1.5x your typical usage: enough breathing room without massive waste.
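As a starting point, that rule can be applied to a window of usage samples. The sketch below takes the median as "typical" usage and multiplies it by 1.5; the median choice, the 50-millicore floor, and the sample values are illustrative assumptions, not Kubernetes defaults.

```python
import math
import statistics


def recommend_cpu_request(usage_samples_m: list[int], headroom: float = 1.5,
                          floor_m: int = 50) -> int:
    """Suggest a CPU request (millicores) at `headroom` times typical usage."""
    typical = statistics.median(usage_samples_m)  # "typical" = median (an assumption)
    return max(floor_m, math.ceil(typical * headroom))


# Hypothetical per-minute CPU usage for one pod, in millicores
samples = [120, 140, 150, 160, 400]
print(recommend_cpu_request(samples))        # 225
print(recommend_cpu_request([10, 20, 30]))   # 50 (the floor)
```

For memory, a median is riskier: running short on memory gets a container killed or evicted rather than merely slowed, so it is usually safer to size memory requests from a high percentile of observed usage.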

Turn on autoscaling. Let the number of running instances grow with traffic and shrink when things slow down. Then check that your cluster is also allowed to remove idle machines, not just add new ones.

Clean up what's not being used. Set aside an hour once a month to look at what's running across your cluster. Delete anything that's leftover from old projects or testing. It adds up.

Match machine sizes to your workloads. If your apps are mostly small, use smaller machines. You'll fill them more efficiently and waste less capacity.

Common Mistakes to Avoid

Copying resource values from tutorials without checking if they match your app's actual needs
Turning off scale-down "just to be safe": that's exactly when waste builds up
Forgetting about dev and staging clusters, which are often idle most of the time but running at full size anyway
Thinking bigger clusters mean more reliability — reliability comes from good design, not extra machines

Conclusion

Kubernetes clusters don't get oversized overnight. It happens gradually: a generous reservation here, a forgotten deployment there, autoscaling that never actually scales down. The costs compound quietly until someone finally looks closely at the bill.

The good news: none of this is hard to fix. You don't need a major migration or a weekend of downtime. You just need visibility into what's actually happening in your cluster and the habit of checking it regularly.

Key Takeaways

Kubernetes reserves resources based on what apps claim they need, not what they actually use — so inflated reservations waste real capacity
Running the same number of instances at 3am as at noon means paying peak prices all day
Clusters that can add nodes but never remove them will only ever grow
Forgotten test deployments and old projects quietly consume real resources
Machine sizes that don't match workload sizes lead to poor packing and wasted space
Fixing oversizing doesn't require downtime — it starts with just looking at your actual usage

FAQ

1. How do I know if my cluster is oversized?
Check average node utilization. If your machines are consistently running below 40–50% usage, you're paying for more than you need.

2. What is a resource request in Kubernetes?
It's the amount of CPU or memory an app claims it needs. Kubernetes uses this number, not actual usage, to decide where to place the app and whether to add more machines.

3. What happens if I lower my resource requests too much?
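Kubernetes may pack too many apps onto one machine. Your app can then be starved of CPU when that machine is busy, and it becomes a likely target for eviction or an out-of-memory kill when memory runs short. The goal is to match requests to realistic usage, not to go as low as possible.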

4. What's autoscaling and why does it matter?
Autoscaling automatically adjusts how many app instances are running based on actual demand. Without it, you're running the same number of instances regardless of whether anyone is using them.

5. Does Kubernetes clean up old deployments automatically?
No. Whatever you deploy stays running until someone manually deletes it. Regular cleanup is a team responsibility, not something Kubernetes handles for you.

6. Do I need to right-size my dev and staging clusters too?
Yes. Non-production clusters are often the worst offenders, running 24/7 even when nobody is using them during nights and weekends.

7. How quickly can I expect to see savings?
Once reservations are adjusted and autoscaling is enabled, many teams see a noticeable difference within the first monthly billing cycle.

8. What's a realistic utilization target for cluster nodes?
Aim for 60–80% average utilization. Below 50% means you're carrying excess capacity. Above 85% means you're cutting it close during traffic spikes.
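These bands are simple to turn into a check. The sketch below computes average utilization across nodes as total usage over total allocatable CPU, then maps it to the ranges given above; the labels for the 50–60% and 80–85% gaps are this sketch's own, and the node figures are hypothetical.

```python
def average_utilization_pct(nodes: list[tuple[float, float]]) -> float:
    """Average utilization across nodes, given (cpu_used, cpu_allocatable) pairs."""
    used = sum(u for u, _ in nodes)
    allocatable = sum(a for _, a in nodes)
    return 100 * used / allocatable


def classify_utilization(pct: float) -> str:
    """Map average node utilization to the bands described in this FAQ."""
    if pct < 50:
        return "excess capacity"
    if pct < 60:
        return "below target"
    if pct <= 80:
        return "on target"
    if pct <= 85:
        return "above target"
    return "cutting it close"


# Hypothetical three-node cluster: 4 CPUs used out of 12 allocatable
pct = average_utilization_pct([(1.2, 4.0), (0.8, 4.0), (2.0, 4.0)])
print(f"{pct:.0f}% -> {classify_utilization(pct)}")  # 33% -> excess capacity
```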

9. How often should I audit my cluster?
Once a month is a solid starting point. Check what's running, whether it should be, and how usage compares to what's been reserved.

10. Is right-sizing a one-time task?
No, it's an ongoing habit. As apps change and teams add new workloads, the same patterns of waste tend to creep back in without regular review.

Optimize Smarter with EcoScale

Reduce cloud costs, eliminate resource waste, and improve Kubernetes efficiency with actionable optimization insights.
Learn more at https://ecoscale.dev

DEV Community