<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tal Shafir</title>
    <description>The latest articles on DEV Community by Tal Shafir (@tal_shafir_49b67973e9d3b4).</description>
    <link>https://dev.to/tal_shafir_49b67973e9d3b4</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3367011%2F9f68fb5b-a368-4a15-9c14-444f865fe456.png</url>
      <title>DEV Community: Tal Shafir</title>
      <link>https://dev.to/tal_shafir_49b67973e9d3b4</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tal_shafir_49b67973e9d3b4"/>
    <language>en</language>
    <item>
      <title>10 Mistakes You're Making in Kubernetes That Cost You Money</title>
      <dc:creator>Tal Shafir</dc:creator>
      <pubDate>Sat, 14 Feb 2026 17:55:44 +0000</pubDate>
      <link>https://dev.to/tal_shafir_49b67973e9d3b4/10-mistakes-youre-making-in-kubernetes-that-cost-you-money-48c5</link>
      <guid>https://dev.to/tal_shafir_49b67973e9d3b4/10-mistakes-youre-making-in-kubernetes-that-cost-you-money-48c5</guid>
      <description>&lt;p&gt;Kubernetes is an incredible tool, but it's also a complex tool with a lot of buttons and levers for you to tweak.&lt;br&gt;
It's easy to make a mistake and end up paying more than you should.&lt;/p&gt;

&lt;p&gt;In this article, I'll try to list some of the most common mistakes I've seen teams make and how to avoid them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Not upgrading your cluster
&lt;/h2&gt;

&lt;p&gt;This point mainly concerns managed Kubernetes services like GKE, EKS, AKS, etc.&lt;br&gt;
Once your Kubernetes version is old enough, you either upgrade your cluster or get moved onto "extended support".&lt;/p&gt;

&lt;p&gt;The Kubernetes community supports minor versions for approximately 14 months.&lt;br&gt;
Old clusters aren't just a security risk. They are a hidden tax.&lt;br&gt;
Once a version reaches its end-of-life, cloud providers force you into "extended support" to keep running it securely.&lt;br&gt;
The cost for this is high - often jumping from &lt;strong&gt;$0.10/hr&lt;/strong&gt; to &lt;strong&gt;$0.60/hr&lt;/strong&gt; for the control plane.&lt;/p&gt;
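&lt;p&gt;As a rough sketch of what that jump means over a year (using the example rates above; actual control-plane pricing varies by provider):&lt;/p&gt;

```python
# Yearly cost of letting a control plane sit in extended support,
# using the example rates from the text: $0.10/hr standard vs $0.60/hr extended.
HOURS_PER_YEAR = 24 * 365

def control_plane_cost(rate_per_hour):
    """Yearly control-plane cost at a given hourly rate."""
    return rate_per_hour * HOURS_PER_YEAR

standard = control_plane_cost(0.10)
extended = control_plane_cost(0.60)
print(f"standard: ${standard:,.0f}/yr, extended: ${extended:,.0f}/yr, "
      f"extra: ${extended - standard:,.0f}/yr per cluster")
```

&lt;p&gt;That's roughly an extra $4,400 per year, per cluster, just for postponing an upgrade.&lt;/p&gt;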

&lt;h2&gt;
  
  
  Relying on the 'Power of 2' Instinct
&lt;/h2&gt;

&lt;p&gt;We are wired to love powers of 2. It feels right. But in Kubernetes, this instinct is killing your cluster utilization.&lt;br&gt;
It's a relic of a time when we chose specific VMs for our workloads. In a containerized world, it's a costly habit.&lt;/p&gt;

&lt;p&gt;"How much resources does my app need?" - "Well, I'm not sure, but let's go with 2 vCPUs and 4GiB of RAM."&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the problem, you ask?
&lt;/h3&gt;

&lt;p&gt;Let's think about our example with common cloud providers' VM sizes.&lt;br&gt;
(For this example I chose AWS's latest generation of AMD-based instances from the most common families: C / M / R.)&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;InstanceType&lt;/th&gt;
&lt;th&gt;vCPUs&lt;/th&gt;
&lt;th&gt;RAM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;c8a.medium&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;m8a.medium&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;4 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;r8a.medium&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;8 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;c8a.large&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;m8a.large&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;8 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;r8a.large&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;16 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;c8a.xlarge&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;8 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;m8a.xlarge&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;16 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;r8a.xlarge&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;32 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;c8a.2xlarge&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;16 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;m8a.2xlarge&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;32 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;r8a.2xlarge&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;64 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You're probably thinking: what's the problem? I'll let the cluster choose from &lt;code&gt;c8a.large&lt;/code&gt; and &lt;code&gt;c8a.xlarge&lt;/code&gt; and get 100% utilization - that would be awesome, right?&lt;/p&gt;

&lt;p&gt;Not exactly.&lt;/p&gt;

&lt;p&gt;We're forgetting 2 things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;DaemonSets - your cluster definitely has some DaemonSets running, like node-exporter, kube-proxy, etc. They need resources too.&lt;/li&gt;
&lt;li&gt;Node allocation overhead - the values in the table are the node's capacity, not its allocatable resources (see &lt;a href="https://github.com/awslabs/amazon-eks-ami/blob/main/templates/al2/runtime/bootstrap.sh#L267-L302" rel="noopener noreferrer"&gt;how EKS calculates allocatable resources&lt;/a&gt; or check out this &lt;a href="https://learnkube.com/allocatable-resources" rel="noopener noreferrer"&gt;great article about allocatable resources&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's assume for a second we kept only &lt;code&gt;c8a.large&lt;/code&gt; and &lt;code&gt;c8a.xlarge&lt;/code&gt; and that our DaemonSets need around 100m cpu and 256MiB of memory.&lt;br&gt;
&lt;code&gt;c8a.large&lt;/code&gt;, even when ignoring allocatable vs capacity, is not enough for a single pod.&lt;br&gt;
&lt;code&gt;c8a.xlarge&lt;/code&gt; is enough for a single pod, but not for 2 - so we'll end up with 1 pod on a &lt;code&gt;c8a.xlarge&lt;/code&gt; node with utilization of 53% CPU / 60% Memory (2,100m / 3,910m and 4,352MiB / ~7,168 MiB).&lt;/p&gt;

&lt;p&gt;We'll be wasting almost half of our node.&lt;br&gt;
Going for a larger instance can reduce the waste: if we went with &lt;code&gt;c8a.2xlarge&lt;/code&gt; we'd end up with 3 pods on a &lt;code&gt;c8a.2xlarge&lt;/code&gt; node with utilization of 77% CPU / 81% Memory (6,100m / 7,900m and 12,544MiB / 15,360 MiB).&lt;/p&gt;
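&lt;p&gt;The arithmetic above can be sketched in a few lines (a toy bin-packing check; the allocatable figures are the ones quoted in the text, not official AWS numbers):&lt;/p&gt;

```python
# Back-of-the-envelope bin-packing check for the example above.
# Pods request 2 vCPU / 4 GiB; DaemonSets take ~100m CPU / 256 MiB.
def utilization(alloc_cpu_m, alloc_mem_mib, pods, pod_cpu_m=2000,
                pod_mem_mib=4096, ds_cpu_m=100, ds_mem_mib=256):
    """Fraction of allocatable CPU/memory used by N pods plus DaemonSets."""
    used_cpu = pods * pod_cpu_m + ds_cpu_m
    used_mem = pods * pod_mem_mib + ds_mem_mib
    return used_cpu / alloc_cpu_m, used_mem / alloc_mem_mib

# c8a.xlarge fits only 1 pod: ~53% CPU / ~60% memory
cpu, mem = utilization(3910, 7168, pods=1)
print(f"c8a.xlarge:  {cpu:.0%} CPU / {mem:.0%} memory")

# c8a.2xlarge fits 3 pods: ~77% CPU / ~81% memory
cpu, mem = utilization(7900, 15360, pods=3)
print(f"c8a.2xlarge: {cpu:.0%} CPU / {mem:.0%} memory")
```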

&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;Resist the power-of-2 instinct and think about where your workloads will actually run.&lt;br&gt;
If, for example, we needed 3 pods of 2 vCPU and 4 GiB of RAM, then unless we have a good reason for that specific size, choosing smaller or bigger pods and adjusting the replica count accordingly can save us a lot of money.&lt;br&gt;
For example, reducing the request to &lt;strong&gt;1.8 vCPU&lt;/strong&gt; and &lt;strong&gt;3 GiB RAM&lt;/strong&gt; allows the pod to fit efficiently on &lt;code&gt;c8a.large&lt;/code&gt; with &lt;strong&gt;95% CPU / 98% Memory&lt;/strong&gt; utilization, or 2 pods on &lt;code&gt;c8a.xlarge&lt;/code&gt; with &lt;strong&gt;94% CPU / 89% Memory&lt;/strong&gt; utilization.&lt;/p&gt;

&lt;p&gt;Specific usage numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;c8a.large&lt;/code&gt;: 1,900m / 2,000m used.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;c8a.xlarge&lt;/code&gt;: 3,700m / 3,910m used.&lt;/li&gt;
&lt;/ul&gt;
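&lt;p&gt;A quick check of those CPU numbers (assuming the allocatable CPU values quoted above: 2,000m for &lt;code&gt;c8a.large&lt;/code&gt; and 3,910m for &lt;code&gt;c8a.xlarge&lt;/code&gt;):&lt;/p&gt;

```python
# CPU usage for the rightsized 1.8 vCPU pods (DaemonSets still ~100m).
DS_CPU_M = 100
POD_CPU_M = 1800

large_used = 1 * POD_CPU_M + DS_CPU_M   # 1,900m on c8a.large
xlarge_used = 2 * POD_CPU_M + DS_CPU_M  # 3,700m on c8a.xlarge
print(f"c8a.large:  {large_used / 2000:.1%} CPU")
print(f"c8a.xlarge: {xlarge_used / 3910:.1%} CPU")
```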

&lt;p&gt;In this example we made the pods smaller, but sometimes it makes more sense to go larger — requesting 3.8 vCPU instead of 2 vCPU, for example, can pack better on certain node sizes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x8g5ote3hu45wh1owtdy.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx8g5ote3hu45wh1owtdy.png" alt="Power of 2 Bin Packing Efficiency" width="800" height="466"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;💡 Tip:&lt;/strong&gt; Avoid "over-fitting" your resource requests to specific node sizes or DaemonSet configurations—these can change over time. Aim for small improvements (like avoiding exact powers of 2) rather than calculating exact values to maximize utilization.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Not rightsizing your resources
&lt;/h2&gt;

&lt;p&gt;In the previous section, we saw that small tweaks to manual pod sizing can make a big difference.&lt;br&gt;
In general, we want our resource requests to be accurate as well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Allocating too much
&lt;/h3&gt;

&lt;p&gt;For example, requesting 3 GiB of memory for a workload that rarely uses more than 2 GiB is a waste of resources. The same goes for CPU.&lt;/p&gt;

&lt;h3&gt;
  
  
  Allocating too little
&lt;/h3&gt;

&lt;p&gt;Requesting too few resources can result either in failures in the under-allocated workloads or in something harder to detect: noisy neighbors.&lt;br&gt;
If many of your workloads are using 110% of their allocation, it's likely that some of them won't have resources available on the actual node.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;Don't guess your workloads' resource requirements, measure them.&lt;br&gt;
Usually you should start with higher requests than you think are needed, then measure the actual usage after a while and adjust accordingly.&lt;/p&gt;

&lt;p&gt;There are tools for doing that automatically or semi-automatically - like &lt;a href="https://github.com/robusta-dev/krr" rel="noopener noreferrer"&gt;krr&lt;/a&gt; or &lt;a href="https://kubernetes.io/docs/tasks/run-application/vertical-pod-autoscaler/" rel="noopener noreferrer"&gt;VPA&lt;/a&gt; and other commercial tools.&lt;/p&gt;
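&lt;p&gt;To give a rough feel for what such tools do under the hood, here's a toy rightsizing sketch: recommend CPU from a usage percentile and memory from the observed peak plus a buffer. The percentile and buffer values here are made-up illustration numbers, not any tool's actual defaults:&lt;/p&gt;

```python
import math

def recommend_requests(cpu_samples_m, mem_samples_mib,
                       cpu_pct=0.90, mem_buffer=1.15):
    """Toy rightsizing: CPU request from a usage percentile, memory from
    the observed peak plus a safety buffer. Illustrative only."""
    cpu_sorted = sorted(cpu_samples_m)
    idx = min(len(cpu_sorted) - 1, math.ceil(cpu_pct * len(cpu_sorted)) - 1)
    cpu_request = cpu_sorted[idx]
    mem_request = math.ceil(max(mem_samples_mib) * mem_buffer)
    return cpu_request, mem_request

# Invented usage samples (millicores and MiB) with one CPU spike
cpu_usage = [120, 150, 180, 200, 450, 210, 190, 170, 160, 140]
mem_usage = [900, 950, 1100, 1200, 1150, 1000, 980, 1050, 990, 1020]
print(recommend_requests(cpu_usage, mem_usage))
```

&lt;p&gt;Note how the percentile ignores the one 450m spike instead of sizing the whole workload around it.&lt;/p&gt;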

&lt;p&gt;Personally, I think that for the real-time needs of your workloads, horizontal scaling should be preferred over vertical scaling - but now that &lt;a href="https://kubernetes.io/blog/2025/12/19/kubernetes-v1-35-in-place-pod-resize-ga/" rel="noopener noreferrer"&gt;in-place pod resize&lt;/a&gt; has reached stable in Kubernetes v1.35 and many of its earlier limitations have been removed, vertical scaling has become a much more viable option for certain use-cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the wrong instance types
&lt;/h2&gt;

&lt;p&gt;Teams often restrict their clusters to a limited set of instance types.&lt;br&gt;
This was very common during the early days of Cluster Autoscaler.&lt;/p&gt;

&lt;h3&gt;
  
  
  Not allowing for enough sizes
&lt;/h3&gt;

&lt;p&gt;Let's take our example from the previous section: a 2 vCPU and 4 GiB of RAM workload.&lt;br&gt;
If we only allowed &lt;code&gt;c8a.large&lt;/code&gt; and &lt;code&gt;c8a.xlarge&lt;/code&gt;, we would end up with 1 pod on a &lt;code&gt;c8a.xlarge&lt;/code&gt; node at around 50% utilization, but by also allowing &lt;code&gt;c8a.2xlarge&lt;/code&gt; we would &lt;em&gt;sometimes&lt;/em&gt; end up with 3 pods and closer to 80% utilization.&lt;/p&gt;

&lt;p&gt;Generally speaking, larger nodes are more cost-efficient: you only need one set of DaemonSets per node, and the relative kubelet/system overhead shrinks as the node grows - for CPU, about 6% of the first core is reserved, while only 0.25% is reserved for each core beyond the fourth.&lt;/p&gt;
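&lt;p&gt;The reservation tiers follow a formula like the one GKE documents (6% of the first core, 1% of the second, 0.5% of cores 3-4, 0.25% of every core above four); exact numbers vary per provider, but the shape is the same:&lt;/p&gt;

```python
def reserved_cpu_millicores(cores):
    """CPU reserved for kubelet/system under the tiered formula used by
    several managed providers (GKE documents these exact tiers; others differ):
    6% of core 1, 1% of core 2, 0.5% of cores 3-4, 0.25% of cores 5+."""
    tiers = [(1, 0.06), (1, 0.01), (2, 0.005)]  # (core count, reserved fraction)
    reserved, remaining = 0.0, cores
    for count, frac in tiers:
        take = min(remaining, count)
        reserved += take * frac * 1000
        remaining -= take
    reserved += remaining * 0.0025 * 1000
    return reserved

for cores in (2, 4, 8, 16):
    r = reserved_cpu_millicores(cores)
    print(f"{cores:>2} vCPU node: {r:.0f}m reserved ({r / (cores * 10):.2f}%)")
```

&lt;p&gt;The reserved share per vCPU keeps dropping as the node grows, which is exactly why fewer, bigger nodes tend to waste less.&lt;/p&gt;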

&lt;h3&gt;
  
  
  Not mixing instance families
&lt;/h3&gt;

&lt;p&gt;Cloud providers group instance types into families; for example, AWS has the &lt;code&gt;c&lt;/code&gt;, &lt;code&gt;m&lt;/code&gt;, and &lt;code&gt;r&lt;/code&gt; families - compute optimized, general purpose, and memory optimized, respectively.&lt;br&gt;
Depending on your use-case, some families may suit your workload better than others.&lt;/p&gt;

&lt;p&gt;I recall a specific case where a workload needed to be scaled up due to memory issues. Our DevOps team was concerned that the new instance size would be too large - some of the largest in our fleet. &lt;br&gt;
Upon investigation, we found that our cluster was mostly configured with &lt;code&gt;c&lt;/code&gt; and &lt;code&gt;m&lt;/code&gt; families. We weren't utilizing the CPU we already had; we just needed more RAM. Switching to &lt;code&gt;r&lt;/code&gt; family instances allowed us to use a smaller tier (e.g., replacing &lt;code&gt;c&lt;/code&gt;/&lt;code&gt;m&lt;/code&gt; 8xlarge with &lt;code&gt;r&lt;/code&gt; 4xlarge), effectively solving the issue while saving money.&lt;/p&gt;

&lt;h3&gt;
  
  
  Not allowing for instance variants
&lt;/h3&gt;

&lt;p&gt;Within each family, there are multiple variants. For example, within the &lt;code&gt;c&lt;/code&gt; (compute) family, we have for the latest generation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;c8a&lt;/li&gt;
&lt;li&gt;c8g&lt;/li&gt;
&lt;li&gt;c8gb&lt;/li&gt;
&lt;li&gt;c8gd&lt;/li&gt;
&lt;li&gt;c8gn&lt;/li&gt;
&lt;li&gt;c8i-flex&lt;/li&gt;
&lt;li&gt;c8i&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Usually the "special" instance types are more expensive than the "regular" ones, but can offer better performance in some areas (like better EBS performance for &lt;code&gt;c8gb&lt;/code&gt; or better network performance for &lt;code&gt;c8gn&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Not preparing for ARM instances
&lt;/h3&gt;

&lt;p&gt;All the instance types that end with &lt;code&gt;g&lt;/code&gt; (&lt;code&gt;c8g&lt;/code&gt; for example) are ARM instances based on AWS Graviton.&lt;br&gt;
They are usually cheaper than their Intel/AMD-based counterparts and offer better price-to-performance.&lt;br&gt;
Unlike the previous recommendation, this is usually not just "plug and play": your workloads need to support ARM (though in a lot of languages and tools it's very easy).&lt;/p&gt;

&lt;h2&gt;
  
  
  Lack of visibility
&lt;/h2&gt;

&lt;p&gt;It can be tough in large environments to find where you're less efficient or where most of the money goes.&lt;br&gt;
Optimizing by hunch often fails. You might spend a week improving a service by &lt;strong&gt;80%&lt;/strong&gt;, only to realize the total cost was just $200/month. Meanwhile, a &lt;strong&gt;5%&lt;/strong&gt; improvement on another area could have saved you $5,000/month.&lt;br&gt;
There are some great Open-source tools built for that, like &lt;a href="https://opencost.io/" rel="noopener noreferrer"&gt;OpenCost&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Not considering Spot Instances and Reserved Capacity
&lt;/h2&gt;

&lt;p&gt;My view on this one may be a bit skewed, as I worked on Spot Ocean for 4 years.&lt;br&gt;
At least some parts of your workload can run on Spot Instances. This too is not a "plug and play" solution - not all workloads can run on Spot Instances.&lt;/p&gt;

&lt;p&gt;One thing that surprised me when I started is that Spot was fully committed to "drinking your own champagne": most of our workloads ran on Spot Instances and on Spot's own products (Spot Elastigroup when I started; during my time there we migrated to Spot Ocean). I'd say over 90% of our workloads ran on Spot Instances, excluding managed databases and a few very specific services/databases.&lt;/p&gt;

&lt;p&gt;Spot Instances can be a great way to save A LOT of money, but it should be handled with care.&lt;/p&gt;

&lt;p&gt;For workloads that are stable and predictable, don't forget about &lt;strong&gt;Reserved Instances&lt;/strong&gt; (or &lt;strong&gt;Savings Plans&lt;/strong&gt; on AWS / &lt;strong&gt;Committed Use Discounts&lt;/strong&gt; on GCP). If you know you'll be running a baseline capacity 24/7, committing to 1-3 years can save 30-60% compared to on-demand pricing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Not thinking about Network costs
&lt;/h2&gt;

&lt;p&gt;Network cost is something that is often overlooked, but can be a significant cost.&lt;br&gt;
For example in AWS:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Traffic Type&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Internet → EC2 (inbound)&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EC2 → Internet (outbound)&lt;/td&gt;
&lt;td&gt;Free first 100 GB, then $0.09/GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EC2 → Another AWS region&lt;/td&gt;
&lt;td&gt;$0.02/GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Within same AZ&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Between AZs (same region)&lt;/td&gt;
&lt;td&gt;$0.02/GB ($0.01 in + $0.01 out)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a typical cluster, traffic to the outside world is not "negotiable" (it's what you return to your customers), but traffic within the same region is.&lt;br&gt;
Placing your workloads closer to each other can save you a lot of money and improve performance.&lt;br&gt;
There are Kubernetes features you can use to optimize this, like &lt;a href="https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/" rel="noopener noreferrer"&gt;topology-aware routing&lt;/a&gt; and &lt;a href="https://kubernetes.io/docs/concepts/services-networking/service/#traffic-distribution" rel="noopener noreferrer"&gt;service traffic distribution&lt;/a&gt;.&lt;/p&gt;
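&lt;p&gt;To get a feel for the numbers, here's the cross-AZ rate from the table applied to some made-up daily volumes:&lt;/p&gt;

```python
# What cross-AZ chatter costs at the $0.02/GB rate from the table above.
# Traffic volumes are invented illustration numbers.
CROSS_AZ_PER_GB = 0.02  # $0.01 in + $0.01 out

def monthly_cross_az_cost(gb_per_day):
    """Approximate monthly bill for a steady cross-AZ traffic rate."""
    return gb_per_day * 30 * CROSS_AZ_PER_GB

for gb_per_day in (100, 1_000, 10_000):
    cost = monthly_cross_az_cost(gb_per_day)
    print(f"{gb_per_day:>6} GB/day cross-AZ -> ${cost:,.0f}/month")
```

&lt;p&gt;A chatty service pair pushing 1 TB/day across zones quietly costs hundreds of dollars a month.&lt;/p&gt;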

&lt;h2&gt;
  
  
  Not using Auto-scaling
&lt;/h2&gt;

&lt;p&gt;Auto-scaling can save you money when done right. There are two "layers" to it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Workload auto-scaling - you can use tools like &lt;a href="https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/" rel="noopener noreferrer"&gt;HPA&lt;/a&gt; or &lt;a href="https://keda.sh/" rel="noopener noreferrer"&gt;KEDA&lt;/a&gt; to scale your workloads based on metrics or events.&lt;/li&gt;
&lt;li&gt;Node auto-scaling - you can use tools like &lt;a href="https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler" rel="noopener noreferrer"&gt;Cluster Autoscaler&lt;/a&gt;, &lt;a href="https://karpenter.sh/" rel="noopener noreferrer"&gt;Karpenter&lt;/a&gt; or &lt;a href="https://spot.io/product/ocean/" rel="noopener noreferrer"&gt;Spot Ocean&lt;/a&gt; to scale your nodes to your workload's requirements.&lt;/li&gt;
&lt;/ol&gt;
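&lt;p&gt;For workload auto-scaling, the core rule HPA applies (as documented in the Kubernetes docs) is simple enough to sketch:&lt;/p&gt;

```python
import math

def hpa_desired_replicas(current_replicas, current_value, target_value,
                         tolerance=0.1):
    """The core HPA scaling rule from the Kubernetes docs:
    desired = ceil(current * currentMetric / targetMetric),
    with no change while the ratio is within the tolerance band
    (0.1 is the documented default)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target, don't thrash
    return math.ceil(current_replicas * ratio)

print(hpa_desired_replicas(4, current_value=90, target_value=60))  # scale up
print(hpa_desired_replicas(6, current_value=30, target_value=60))  # scale down
```

&lt;p&gt;The tolerance band is precisely what protects you from the oscillation the warning below describes: without it, every small metric wiggle would trigger a resize.&lt;/p&gt;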

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;⚠️ Warning:&lt;/strong&gt; Be careful with aggressive HPA configurations on latency-sensitive workloads. Too-tight thresholds can cause oscillation—rapidly scaling up and down—which wastes resources and can actually hurt performance. Test your scaling behavior under realistic load before going to production.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When discussing cost, it's obvious that we want our compute capacity to fit our workloads' requirements with as little waste as possible. However, our workloads' demands change over time with load and other factors (holidays, for example).&lt;br&gt;
Paying for peak capacity while your customers are asleep is simply burning money.&lt;/p&gt;

&lt;p&gt;Beyond standard auto-scalers, specialized tools can help with niche use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.vcluster.com/" rel="noopener noreferrer"&gt;Vcluster&lt;/a&gt;&lt;/strong&gt;: Runs virtual clusters inside a host cluster. Perfect for isolating multi-tenant dev/test environments without paying for multiple control planes (among other use cases).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/moonorange/snorlax" rel="noopener noreferrer"&gt;Snorlax&lt;/a&gt;&lt;/strong&gt;: Automatically sleeps selected workloads on nights and weekends, ensuring you only pay for resources when developers are actually working.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or if you have serverless workloads, you can use a tool like &lt;strong&gt;&lt;a href="https://knative.dev/" rel="noopener noreferrer"&gt;Knative&lt;/a&gt;&lt;/strong&gt; which offers the ability to scale to zero and scale up to the required capacity. This can be great for cases where you can "afford" cold starts. &lt;/p&gt;

&lt;h2&gt;
  
  
  Mismanaging Node Pools
&lt;/h2&gt;

&lt;p&gt;Most node auto-scaling tools group different "kinds" of nodes into pools: Karpenter calls them Node Pools, Spot Ocean calls them Virtual Node Groups.&lt;/p&gt;

&lt;p&gt;These are similar in concept and let you create a group of nodes with the same configuration: you can restrict instance types and set different behaviors (sets of labels/taints, how often nodes can be scaled down, how many at a time, etc.).&lt;/p&gt;

&lt;h3&gt;
  
  
  Separating too much
&lt;/h3&gt;

&lt;p&gt;One of Kubernetes' main cost advantages is resource sharing. Workloads with different needs (e.g., CPU-intensive vs. Memory-intensive) can run on the same node, maximizing overall utilization.&lt;/p&gt;

&lt;p&gt;However, teams often over-segment their node pools, isolating workloads and losing this efficiency. A common mistake is creating separate pools for Spot and On-Demand instances. This can lead to situations where Spot pods trigger a new Spot node while your existing On-Demand nodes have ample free space. The cheapest instance is no instance at all.&lt;/p&gt;

&lt;p&gt;Instead, consider combining them. By using Pod PriorityClasses, you can allow Spot workloads to run on "spare" On-Demand capacity. If a critical On-Demand pod needs that space later, the scheduler will simply evict the lower-priority Spot pod to make room.&lt;/p&gt;

&lt;h3&gt;
  
  
  Not separating enough
&lt;/h3&gt;

&lt;p&gt;A common case where you probably should separate your node pools is when you have workloads that shouldn't be scaled down, such as long-running batch jobs or StatefulSets with slow recovery times.&lt;br&gt;
A dedicated pool helps you avoid ending up with a very large node that made sense when it was scaled up, but that now runs only a single pod that can't be scaled down - while you keep paying for the whole node.&lt;/p&gt;

&lt;p&gt;It's important to note that you can make this scenario uncommon even without a dedicated node pool but it requires some more work and planning.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Cost of Rapid Scale-Down
&lt;/h2&gt;

&lt;p&gt;Karpenter, for example, will by default scale down nodes as soon as they are underutilized, but you can configure it to wait for a certain amount of time before scaling a node down.&lt;br&gt;
Now you're probably wondering: why would I do that? Isn't that wasting money?&lt;/p&gt;

&lt;p&gt;Technically, yes. However, since developers often lack access to configure node pools, they tend to react with the only tools they have: locking down workloads. They might set restrictive PDBs or annotations to prevent eviction, which ironically leads to worse bin-packing.&lt;/p&gt;

&lt;p&gt;Depending on your use case, even a few minutes of consolidation delay can make a huge difference in developer experience without significantly impacting costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Kubernetes is powerful, but without proper planning, it can become a financial black hole.&lt;/p&gt;

&lt;p&gt;Go check your node utilization right now. I bet you'll find a 'perfect' 4 GiB pod sitting on a partially empty node.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Sunk Cost Fallacy in Software: How to Recognize It and What to Do About It</title>
      <dc:creator>Tal Shafir</dc:creator>
      <pubDate>Sat, 29 Nov 2025 12:58:12 +0000</pubDate>
      <link>https://dev.to/tal_shafir_49b67973e9d3b4/the-sunk-cost-fallacy-in-software-how-to-recognize-it-and-what-to-do-about-it-3l3a</link>
      <guid>https://dev.to/tal_shafir_49b67973e9d3b4/the-sunk-cost-fallacy-in-software-how-to-recognize-it-and-what-to-do-about-it-3l3a</guid>
      <description>&lt;h2&gt;
  
  
  What is The Sunk Cost Fallacy?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The sunk cost fallacy is our tendency to follow through with something that we've already invested heavily in (be it time, money, effort, or emotional energy), even when giving up is clearly a better idea.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;An everyday example: imagine you buy a $20 movie ticket, and halfway through the movie you realize you're not enjoying it. The rational choice would be to leave and use your time more enjoyably - but many people stay just to "get their money's worth."&lt;br&gt;
That's the sunk cost fallacy in action.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does it relate to Software?
&lt;/h2&gt;

&lt;p&gt;We've all most likely been there: we made a decision that made sense at the time, sometimes for external reasons.&lt;/p&gt;

&lt;p&gt;I bet at least some of these quotes will sound familiar:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We need to move fast, no time for tests / documentation / design&lt;/li&gt;
&lt;li&gt;We don't have capacity to learn to use X for this new use case, let's keep using Y that we're already familiar with&lt;/li&gt;
&lt;li&gt;We don't have enough DevOps to manage another DB, let's use the same DB for all these microservices&lt;/li&gt;
&lt;li&gt;Let's use language X - most of our founding team already knows it, and we'll just implement anything missing in its ecosystem ourselves.&lt;/li&gt;
&lt;li&gt;We must use this new cool technology, it will scale us great when we reach thousands of customers (only 982 more customers to go until we reach the first thousand)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Often these are good reasons early on, especially at small scale: you don't always have the time or capacity to engineer a solution for 1000x the customers and load - and a lot of the time you shouldn't.&lt;/p&gt;

&lt;p&gt;Ultimately, delivering value to your customers through your product is far more important than using perfect tools under the hood. The ocean floor is littered with well-designed ships that never left port.&lt;/p&gt;

&lt;p&gt;It's usually a balancing act that we almost definitely will miss at least sometimes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Consequences of Past Decisions
&lt;/h2&gt;

&lt;p&gt;Looking back at those examples, it seems inevitable that we'll eventually make decisions we regret. How can we know that we've reached the point that we need to do something about it?&lt;/p&gt;

&lt;p&gt;Usually there are some signs that we've reached a point that we already start feeling the cost of previous decisions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Over-engineered systems and processes&lt;/li&gt;
&lt;li&gt;Framework lock-in or bad platform choices&lt;/li&gt;
&lt;li&gt;Investing a lot of time and effort on stuff that is not your core business&lt;/li&gt;
&lt;li&gt;Slowed feature development&lt;/li&gt;
&lt;li&gt;Performance Issues&lt;/li&gt;
&lt;li&gt;System Instability&lt;/li&gt;
&lt;li&gt;Increasing maintenance effort&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These are all common in the modern lifecycle of software products. The industry has learned that building the "perfect" system from the get-go is nearly impossible: requirements change, scale changes, and it's almost impossible to predict those changes without getting your product into the hands of your customers.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Recognize When You're Falling for It
&lt;/h2&gt;

&lt;p&gt;It's not the past decisions that hurt us, it's ignoring them and delaying action that causes real damage.&lt;br&gt;
It's tempting to band-aid these issues instead of addressing root causes.&lt;br&gt;
Performance problems? Throw more compute at them. &lt;br&gt;
Database struggling? Scale it up.&lt;/p&gt;

&lt;p&gt;The more we push forward with our mitigations and workarounds, the more sunk cost we accumulate. If we'd left the 3-hour movie after the first hour, we would've lost $20 and an hour of our lives; if we wait 2 hours, we lose $20 and 2 hours.&lt;/p&gt;

&lt;p&gt;Scale this up to a software company: it's not $20, it's likely $20,000+ in cloud bills. It's not an hour - it's hundreds of person-hours and significant opportunity cost.&lt;/p&gt;

&lt;p&gt;A good sign is when you start hearing phrases like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;We've invested too much to change it now.&lt;/li&gt;
&lt;li&gt;The rewrite will take too long.&lt;/li&gt;
&lt;li&gt;We'll lose political capital if we admit it didn't work.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Ask yourself:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Are we defending this decision only because of past effort?&lt;/li&gt;
&lt;li&gt;Is this architecture still serving us today?&lt;/li&gt;
&lt;li&gt;What would we do if we started from scratch?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What to Do Instead: Strategies to Move Forward
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfjjaptuabg3lw0uda4d.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcfjjaptuabg3lw0uda4d.jpg" alt="I'm in this photo and I don't like it" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ok, we've started to fall for it, what can we do?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run cost-benefit analyses&lt;/strong&gt; (fresh, not based on past costs). Compare "cost to migrate away" vs. "cost to keep maintaining" without factoring in what you've already spent.&lt;/p&gt;
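&lt;p&gt;A sunk-cost-free analysis compares only future costs. With invented numbers, the break-even point is a one-liner:&lt;/p&gt;

```python
# A sunk-cost-free comparison: only future costs matter.
# All dollar amounts here are invented illustration numbers.
def break_even_months(migration_cost, monthly_cost_now, monthly_cost_after):
    """Months until a migration pays for itself, ignoring everything
    already spent on the current system."""
    savings = monthly_cost_now - monthly_cost_after
    if savings <= 0:
        return float("inf")  # the migration never pays off
    return migration_cost / savings

# e.g. a $60k migration, with maintenance dropping from $15k to $7k per month
print(f"break-even in {break_even_months(60_000, 15_000, 7_000):.1f} months")
```

&lt;p&gt;Notice that what you already spent building the current system appears nowhere in the formula - that's the whole point.&lt;/p&gt;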

&lt;p&gt;&lt;strong&gt;Small rewrites over big-bang&lt;/strong&gt;. Instead of rewriting the entire legacy authentication system, start by replacing it piece by piece in new services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build confidence with spikes and proof of concepts&lt;/strong&gt;. Before committing to a database migration, run a 2-week spike to validate the approach works for your use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Communicate tradeoffs openly with stakeholders&lt;/strong&gt;. "Migrating off this framework will slow feature development for 2 quarters but reduce maintenance overhead by 40%" is better than silence and bailing mid-work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embrace "build to replace" design patterns&lt;/strong&gt;. Write new code assuming it might be replaced, making it easier to incrementally modernize.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invest in modularization to isolate painful components&lt;/strong&gt;. If you're working on a component you know will need to be replaced, abstract it so you can upgrade it independently.&lt;/p&gt;

&lt;p&gt;It's not always possible to handle these cases immediately but detecting and reacting fast can mitigate a lot of the cost of these cases. The solution is to stay vigilant: document decisions and their rationale, hold regular architecture reviews, and foster open discussions that help you catch issues early.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Can You Make a Change
&lt;/h3&gt;

&lt;p&gt;Always surface the issues you find. Silence helps no one - problems can't get solved if decision-makers don't know they exist.&lt;br&gt;
Before raising concerns, keep these important principles in mind. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Remember you're on the same team&lt;/strong&gt;, even when you disagree. The goal isn't to annoy people into agreement - it's to align on how addressing this issue helps everyone reach shared goals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learn to take "no" gracefully&lt;/strong&gt;. They're saying no to your idea, not to you personally. Don't take it personally, and understand that there are factors outside your scope. For instance, "if we don't do X, we won't scale to 100x customers" may not be a valid concern if there are urgent issues that could end your runway before you reach 1.5x customers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Come with a plan&lt;/strong&gt;, not just cool-sounding ideas. It's much easier for stakeholders to agree to a reasonable, well-thought-out plan than to a half-baked idea that may not be realistic.&lt;/p&gt;

&lt;p&gt;Finally, &lt;strong&gt;filter your ideas carefully&lt;/strong&gt;. Not every interesting technology you read about is worth implementing. Focus on real pain points you're encountering, not solutions designed for companies at the scale of Google or Netflix. "Let's use Chaos Monkey, Chaos Gorilla, and Chaos Kong in production" is probably not what a small company providing non-critical services in a single region should invest time in.&lt;/p&gt;

&lt;p&gt;I must admit that I've fallen into some, if not all, of these pitfalls at some point.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Leadership Angle: Creating Space for Rational Change
&lt;/h3&gt;

&lt;p&gt;The people who feel these issues most are those in the trenches: the developers who keep writing the same boilerplate, or the PMs who keep hearing "we can't do it in this timeframe" or "we can't do it, the DB won't handle it".&lt;/p&gt;

&lt;p&gt;It's important to build an environment where raising these issues is rewarded. It's equally important to act on them: it's easy for teams to become numb and feel that no matter how hard they advocate, nothing will change - or worse, that they'll be framed as troublemakers, the developers who cried legacy.&lt;/p&gt;

&lt;p&gt;Managers and architects should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reward truth-seeking behavior and try to act on it&lt;/li&gt;
&lt;li&gt;Protect engineers who raise hard truths&lt;/li&gt;
&lt;li&gt;Avoid framing rewrites as personal failures&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Case Studies: Breaking Free from Sunk Cost
&lt;/h2&gt;

&lt;p&gt;After discussing the theory, let's look at some well-known scenarios where companies pivoted - reworking decisions they had already heavily invested in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Prime Video&lt;/strong&gt; &lt;a href="http://archive.today/2023.05.21-052950/https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90" rel="noopener noreferrer"&gt;migrated their monitoring service&lt;/a&gt; from serverless distributed services to a monolithic application, helping them achieve higher scale and resilience, while reducing their costs by 90%.&lt;br&gt;
(For some reason, the original blog post link doesn't work anymore, but the internet doesn't forget.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slack&lt;/strong&gt; has pivoted more than once!&lt;br&gt;
Slack originated from the gaming company &lt;em&gt;Tiny Speck&lt;/em&gt;, which created an online multiplayer game called Glitch. The company developed an internal communication tool for game development, which ultimately proved more valuable than the game itself.&lt;br&gt;
You can read more in their &lt;a href="https://buildingslack.com/the-death-of-glitch-the-birth-of-slack/" rel="noopener noreferrer"&gt;blog post&lt;/a&gt; about it 12 years later.&lt;/p&gt;

&lt;p&gt;Since its inception, Slack used MySQL as its storage engine, managing sharding and data access directly within their monolithic application. They maintained their product this way for &lt;strong&gt;years&lt;/strong&gt;.&lt;br&gt;
In 2017, they began migrating to &lt;a href="https://vitess.io/" rel="noopener noreferrer"&gt;Vitess&lt;/a&gt;, a horizontal scaling system for MySQL. For more details about this journey and the decisions behind it, see this &lt;a href="https://slack.engineering/scaling-datastores-at-slack-with-vitess/" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;.&lt;br&gt;
Here are some key highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The migration took about &lt;strong&gt;three years&lt;/strong&gt;, all while their system continued to function as usual.&lt;/li&gt;
&lt;li&gt;Fortuitously, they completed most of the migration just before the COVID-19 pandemic, when many businesses suddenly transitioned to remote work. During this period, they saw query rates increase by 50% in a single week - without Vitess, scaling to meet that demand would likely have caused downtime.&lt;/li&gt;
&lt;/ul&gt;
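&lt;p&gt;To make "managing sharding directly within the application" concrete, it typically means routing logic like the following living in application code (a hypothetical sketch, not Slack's actual implementation) - exactly the coupling that a system like Vitess lifts out of the app:&lt;/p&gt;

```python
import hashlib

# Hypothetical shard map; in a real system this would be config-driven.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(team_id: str) -> str:
    # Deterministic hash routing: every code path that touches the
    # database must agree on this function, and resharding means
    # changing it everywhere data is read or written.
    h = int(hashlib.md5(team_id.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

print(shard_for("T123"))
```

&lt;p&gt;Routing is deterministic and simple, but the application owns it - which is manageable for years, until growth turns resharding into a constant, risky chore.&lt;/p&gt;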

&lt;p&gt;Both of these examples demonstrate that, while such changes can be costly, wise choices and careful planning can yield substantial long-term benefits.&lt;/p&gt;

&lt;p&gt;Of course, we can't ignore that there are likely many more examples where such attempts have failed. Often, it's difficult to determine whether the failures stemmed from the attempts themselves - or simply from acting too late.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing Progress Over Pride
&lt;/h2&gt;

&lt;p&gt;Encountering issues and reaching the limits of past decisions is part of our professional lives. It's nothing to be ashamed of - if anything, it's a badge of honor: we did such a good job that we now need to scale up.&lt;/p&gt;

&lt;p&gt;Recognizing the sunk cost fallacy is a sign of engineering maturity; being able to act on it is a sign of a healthy engineering environment.&lt;/p&gt;

&lt;p&gt;Sometimes, the best path forward is letting go of what got you here. Start small: audit one decision that may be influenced by sunk costs.&lt;/p&gt;

</description>
      <category>management</category>
      <category>productivity</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
