The first time things broke in production, I didn’t even know where to look.
Not “figuratively lost.” I mean literally.
Pods were restarting. Logs were half there. Metrics were… somewhere.
And I was SSH’ing into nodes like it was 2016, trying to understand why a simple Node.js service had decided to disappear mid-traffic.
That was the moment I quietly said to myself:
“Maybe choosing EKS wasn’t the flex I thought it was.”
A few months earlier, the decision felt obvious.
We were starting fresh. New service. Decent traffic expectations. Nothing massive, but not trivial either.
Someone asked:
“ECS or EKS?”
And I jumped in a bit too quickly:
“Let’s go with Kubernetes. Future-proof. Industry standard.”
That word, future-proof, has cost me time more than once.
What I thought I was choosing
At the time, I framed it like this:
- ECS → simple, but limiting
- EKS → powerful, flexible, scalable
So naturally… I chose “powerful.”
What I didn’t realize then is:
Power comes with a tax. And it’s not always visible upfront.
The first few weeks felt fine
Honestly, the initial setup wasn’t even that bad.
- Cluster came up
- Services deployed
- ALB hooked in
- Things were… running
I remember feeling slightly proud. Like I had unlocked some next level infrastructure badge 😅
But that feeling didn’t last long.
The slow creep of complexity
It didn’t hit all at once.
It showed up in small, annoying ways.
Logs weren’t where I expected
With ECS, logs just kind of… show up in CloudWatch.
With EKS, I had to think about:
- Fluent Bit / Fluentd
- Sidecars vs DaemonSets
- Log routing
Nothing impossible. Just… extra decisions.
And every decision added surface area for mistakes.
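To make that concrete, here’s the shape of the usual answer: a Fluent Bit DaemonSet that runs one pod per node, tails container logs, and ships them to CloudWatch. This is a minimal sketch, not our actual config; the namespace, service account, and image tag are all illustrative:

```yaml
# Minimal Fluent Bit DaemonSet sketch (illustrative names, not our real setup).
# One pod per node tails /var/log and forwards to CloudWatch.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit  # assumed to have IAM access to CloudWatch Logs
      containers:
        - name: fluent-bit
          image: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
```

And that’s before the actual Fluent Bit config: parsers, filters, the output block. Each one is another decision.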
Deployments became “events”
A simple deploy wasn’t simple anymore.
One deploy failed because:
- the readiness probe was too aggressive
- pods got restarted before they finished warming up
- the rollout got stuck
The fix was easy after I understood what happened.
But getting there? That took time.
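For what it’s worth, the eventual fix had roughly this shape: give the pod room to warm up before traffic hits it. A minimal sketch of the container’s probe, with an illustrative path, port, and timings (the real numbers depend entirely on your app’s startup behavior):

```yaml
# Sketch of the probe fix; values are illustrative, not what we actually shipped.
readinessProbe:
  httpGet:
    path: /healthz           # assumes the service exposes a health endpoint here
    port: 3000
  initialDelaySeconds: 15    # let the Node.js app finish warming up first
  periodSeconds: 5
  failureThreshold: 3        # tolerate a couple of slow responses before marking unready
```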
What I didn’t realize at the time:
Kubernetes doesn’t fail loudly. It fails… descriptively. And you have to know where to look.
Node-level issues became my problem
This one annoyed me the most.
I had to think about:
- node scaling
- resource fragmentation
- pod scheduling
At one point, we had enough CPU overall… but pods still couldn’t schedule.
Because no single node had enough free capacity to fit one more pod.
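Here’s a made-up version of that math. Say you have three nodes with 4 vCPU each, and each node already runs workloads that leave 1.5 vCPU free. That’s 4.5 vCPU spare across the cluster. But a pod with this (illustrative) request can’t land anywhere, because scheduling happens per node:

```yaml
# Illustrative resource request; with only 1.5 vCPU free on any single node,
# this pod stays Pending even though the cluster has 4.5 vCPU free in total.
resources:
  requests:
    cpu: "2"
    memory: 1Gi
```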
That’s the kind of problem I never had with ECS Fargate.
And honestly… I didn’t want to have it.
The incident that changed my mind
We had a traffic spike. Nothing crazy. Maybe 3x normal load.
Autoscaling kicked in… kind of.
New nodes were coming up. Pods were pending. Some were stuck.
Meanwhile:
- existing pods were overloaded
- latency increased
- a few endpoints started timing out
And I was watching this cascade happen in slow motion.
The worst part?
Everything looked “configured correctly.”
In hindsight, the problem was:
- cluster autoscaler lag
- node provisioning delay
- pod scheduling constraints
Individually, each makes sense.
Together… they create friction exactly when you don’t want it.
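There are known mitigations. The classic one (which we weren’t running, so this is a sketch of the pattern rather than our setup) is overprovisioning: a low-priority deployment of placeholder pods that reserve headroom, and that the scheduler preempts instantly when real pods need the space:

```yaml
# Overprovisioning sketch; names and sizes are illustrative.
# Real pods (priority 0 by default) preempt these placeholders immediately,
# instead of waiting minutes for a new node to come up.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: headroom
spec:
  replicas: 2
  selector:
    matchLabels:
      app: headroom
  template:
    metadata:
      labels:
        app: headroom
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
```

It works. But notice the shape of the solution: the fix for a Kubernetes scaling problem was more Kubernetes.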
That day, I kept thinking:
ECS would’ve handled this more predictably.
Maybe not perfectly. But predictably.
What I misunderstood about ECS
I had dismissed ECS too quickly.
I assumed:
- it wouldn’t scale as well
- it was less flexible
- it was somehow “less serious”
That was ego talking, not experience.
Because in reality:
ECS (especially Fargate) removes entire categories of problems:
- no node management
- no scheduler tuning
- no cluster-level debugging
You give up control, yes.
But you also give up responsibility.
And sometimes that’s exactly what you want.
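To show what I mean, here’s a minimal CloudFormation sketch of a Fargate task definition. Everything in it is illustrative: the account, image, log group, and execution role are assumptions, not our real stack. The point is what’s absent: no node groups, no DaemonSets, no log pipeline to build:

```yaml
# Fargate task definition sketch; all names and values are illustrative.
Parameters:
  TaskExecutionRoleArn:
    Type: String              # assumes an execution role already exists
Resources:
  AppTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      RequiresCompatibilities: [FARGATE]
      NetworkMode: awsvpc
      Cpu: "512"
      Memory: "1024"
      ExecutionRoleArn: !Ref TaskExecutionRoleArn
      ContainerDefinitions:
        - Name: app
          Image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest
          PortMappings:
            - ContainerPort: 3000
          LogConfiguration:
            LogDriver: awslogs          # logs "just show up" in CloudWatch
            Options:
              awslogs-group: /ecs/app
              awslogs-region: us-east-1
              awslogs-stream-prefix: app
```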
The trade-offs are real (and uncomfortable)
I’m not saying EKS is bad. It’s not.
In fact, there are cases where I’d absolutely choose it again:
- multi-cloud strategy
- heavy Kubernetes ecosystem usage
- custom controllers / operators
- deep networking requirements
But here’s the uncomfortable truth:
Most projects don’t need that level of control.
Mine didn’t.
And I paid for that mismatch with time, complexity, and a few stressful evenings.
What I’d do differently now
If I were making that decision again, I’d ask a very different question.
Not:
“What’s more powerful?”
But:
“What problems do I actually want to own?”
Because that’s what this decision really is.
With EKS, you own:
- cluster behavior
- scheduling quirks
- scaling edge cases
With ECS, you give that up.
And honestly… I’d start with ECS now.
Especially if:
- the team is small
- infra isn’t the product
- speed matters more than flexibility
I’d move to EKS only when ECS starts getting in the way.
Not before.
The part I didn’t expect
The hardest part wasn’t learning Kubernetes.
It was unlearning the idea that “more control = better engineering.”
It doesn’t.
Sometimes better engineering is:
- fewer moving parts
- fewer decisions
- fewer things that can break at 2 AM
Final thought
I don’t regret learning EKS.
But I do regret choosing it too early.
There’s a difference.
And if you’re at that crossroads right now, trying to decide…
Just remember:
You’re not choosing a tool.
You’re choosing a set of problems.
Pick carefully 🙂