Originally published on graycloudarch.com.
The CTO wanted to know why the platform team had picked EKS for their new environment. They'd been running ECS for two years without issues. The team lead explained they needed GitOps, better autoscaling, and "industry-standard tooling."
Three months later, they were debugging a cert-manager webhook failure at 11am. Two engineers had spent 30 hours the previous month on cluster operations. They hadn't shipped a net-new feature in six weeks.
EKS wasn't wrong for them. The timing was. They had three engineers, twelve services, and no one who'd operated a Kubernetes cluster in production before. The ecosystem they wanted required them to operate it first.
This is the ECS vs EKS conversation most teams don't have until after they've made the choice.
The Actual Decision Axis
Feature comparisons miss the point. Both ECS and EKS run containers reliably. The real question is: what does your team have to operate to make that happen — and what's the cost of getting it wrong?
Two axes matter:
Operational capacity: How much complexity can your team absorb while still shipping product? A 3-engineer platform team and a 15-engineer platform team are not playing the same game.
Kubernetes maturity: Have your engineers operated k8s in production under pressure? "We've done some k8s" and "we've debugged etcd under load" are not the same thing.
The right answer today often changes within 18 months. A team that's right for ECS now may be right for EKS after their platform engineers have shipped 6 months of Kubernetes work. Building with that arc in mind matters.
What ECS Actually Gives You
No control plane to operate. That's the headline. AWS runs the ECS scheduler for you, and with Fargate there are no nodes to patch, no node groups to right-size, no kubelet to troubleshoot. AWS manages the underlying compute entirely.
The IAM model is simpler by design. Task roles attach directly to task definitions — no service accounts, no IRSA, no Web Identity tokens to wire up. For engineers coming from EC2-era IAM, this maps cleanly to what they already know.
ECS Fargate has no cluster fixed cost. EKS charges $0.10/hr per cluster — $72/month whether you're running one service or fifty. At low service counts or in non-production environments, that difference is real.
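The arithmetic behind that number is worth seeing once, because it multiplies with every cluster you add. A minimal sketch, assuming a 30-day month (720 hours) and the $0.10/hr rate quoted above; the function name and the three-environment layout are illustrative, not from any AWS tooling:

```python
EKS_CLUSTER_HOURLY_USD = 0.10  # per-cluster control plane fee quoted above


def eks_control_plane_monthly(clusters: int, hours_per_month: int = 720) -> float:
    """Fixed EKS control plane cost in USD, before any compute is added."""
    return round(clusters * EKS_CLUSTER_HOURLY_USD * hours_per_month, 2)


# One cluster, then a hypothetical dev/staging/prod split.
print(eks_control_plane_monthly(1))
print(eks_control_plane_monthly(3))
```

The same services on ECS Fargate carry no equivalent line item; you pay only for the tasks that run.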
AWS integrations are first-class rather than plugged in. ALB target group registration, Cloud Map service discovery, and Secrets Manager injection via ECS container secrets all work without Helm charts or CRDs. The AWS API surface and the ECS API surface are the same surface.
The internal tools team: 3 engineers, zero Kubernetes background, 8 services. ECS Fargate with a shared Terraform module got them to production in three weeks. No platform team required.
What EKS Actually Gives You
Ecosystem depth that ECS simply doesn't have. Karpenter for bin-packing and just-in-time node provisioning. KEDA for event-driven autoscaling off SQS, Kafka, or custom metrics. Argo CD or Flux for GitOps with real reconciliation loops. External Secrets Operator, cert-manager, Prometheus Operator: the tooling is mature, battle-tested, and actively maintained.
ECS has no equivalent. The closest alternatives are either AWS-native (EventBridge Pipes, Application Auto Scaling) and less flexible, or custom-built and unmaintained after the engineer who wrote them leaves.
Karpenter in particular changes the EC2 cost math at scale. Intelligent bin-packing and spot interruption handling can cut compute costs 30-50% compared to fixed node groups. Below 20-30 nodes the savings often don't justify the operational overhead. Above that, it's hard to ignore.
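One way to sanity-check that threshold for your own fleet is a rough break-even calculation. This is a sketch only: every input below (node price, hours of Karpenter upkeep, engineer rate) is a hypothetical placeholder, not a measurement, and the savings fraction is simply the low end of the 30-50% range above.

```python
def karpenter_net_monthly_savings(
    nodes: int,
    node_monthly_usd: float,
    savings_fraction: float,
    ops_hours: float,
    hourly_rate_usd: float,
) -> float:
    """Compute savings minus the engineering time spent operating Karpenter."""
    compute_savings = nodes * node_monthly_usd * savings_fraction
    ops_cost = ops_hours * hourly_rate_usd
    return round(compute_savings - ops_cost, 2)


# Hypothetical inputs: $150/node/month, 30% savings, 10 ops hours at $100/hr.
for nodes in (10, 40):
    net = karpenter_net_monthly_savings(nodes, 150, 0.30, 10, 100)
    print(nodes, net)
```

Plug in your own numbers. The point is that the ops cost is roughly flat while the savings scale with node count, which is why small fleets tend to come out negative.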
Multi-cloud portability is real if you actually need it. Kubernetes manifests transfer to GKE or AKS. ECS task definitions do not. If "running this workload outside AWS" is a real scenario — not just theoretical — that matters.
The data platform I worked on: mixed batch and streaming workloads, KEDA scaling on SQS queue depth. ECS autoscaling would have required custom CloudWatch metrics and polling-based triggers. KEDA handled it natively in 20 lines of YAML. That alone settled the decision.
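For context, the arithmetic a queue-depth scaler performs is simple: divide the backlog by a per-replica target and clamp the result. The sketch below is a simplified illustration of that idea, not KEDA's actual code; it ignores HPA tolerance and stabilization windows, and the function name and default limits are assumptions for the example.

```python
import math


def desired_replicas(
    queue_depth: int,
    target_per_replica: int,
    min_replicas: int = 0,
    max_replicas: int = 50,
) -> int:
    """Replicas needed so each one handles about target_per_replica messages."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_depth / target_per_replica)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(42, 5))      # backlog of 42, 5 messages per replica
print(desired_replicas(0, 5))       # empty queue scales to the floor
print(desired_replicas(10_000, 5))  # large backlog hits the ceiling
```

On EKS that logic lives in the scaler and you only declare the target. On ECS you end up publishing the equivalent backlog-per-task figure as a custom CloudWatch metric yourself.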
The Decision Tree
Walk through these in order. First yes wins.
Zero Kubernetes experience on the team? → ECS. The operational cost of learning k8s while building product is real and usually underestimated. The 30-hour/month cluster ops tax from the story above was paid by a team that had some k8s experience. Zero experience is worse.
Migrating from an existing ECS platform? → ECS. Rewriting and replatforming at the same time fails more often than it succeeds. Stabilize on ECS, migrate later when the workload is boring.
Need KEDA, custom-metric HPA, or Karpenter? → EKS. ECS autoscaling is Application Auto Scaling against CloudWatch metrics. It works, but the ceiling is lower and the custom metric path is significantly more work.
Need GitOps with Argo CD or Flux? → EKS. ECS has no native GitOps story. You can build one — CodePipeline + ECS deployment, Terraform-driven deployments — but you're building it. The operational difference is significant.
Five or more services sharing infrastructure? → EKS. The cluster's fixed cost is spread across enough services to justify it; shared node pools improve utilization; the per-service overhead of ECS task definitions multiplies fast.
Default → ECS Fargate. Simpler, cheaper to start, and the migration path to EKS is well-understood.
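The tree is short enough to write down as a function, which makes the "first yes wins" ordering explicit. A minimal sketch; the `Team` fields and `choose_platform` name are made up for this example, and the thresholds are the ones from the list above.

```python
from dataclasses import dataclass


@dataclass
class Team:
    has_k8s_experience: bool
    migrating_from_existing_ecs: bool
    needs_k8s_autoscaling: bool  # KEDA, custom-metric HPA, or Karpenter
    needs_gitops: bool           # Argo CD or Flux
    shared_services: int


def choose_platform(team: Team) -> str:
    """Evaluate the questions in order; the first match decides."""
    if not team.has_k8s_experience:
        return "ECS"
    if team.migrating_from_existing_ecs:
        return "ECS"
    if team.needs_k8s_autoscaling:
        return "EKS"
    if team.needs_gitops:
        return "EKS"
    if team.shared_services >= 5:
        return "EKS"
    return "ECS Fargate"


# The team from the opening story: wanted GitOps, but had no k8s experience.
story_team = Team(False, True, True, True, 12)
print(choose_platform(story_team))
```

Note that the ordering does the work: the team in the opening story answers yes to the GitOps and autoscaling questions, but never reaches them.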
ECS Anywhere: The Third Option
ECS Anywhere gets overlooked in most comparisons because it doesn't fit neatly into a "cloud vs cloud" framing. It should be in the decision tree.
ECS Anywhere lets you register non-AWS compute — on-premises servers, VMs in other clouds, edge devices — as ECS external instances. Your task definitions, IAM roles, and tooling stay the same. The ECS control plane in AWS manages scheduling. The compute runs wherever you've registered it.
Where this actually wins:
Regulated environments with data residency requirements. If certain workloads must stay on-premises for compliance, ECS Anywhere lets you run them with the same tooling as your AWS workloads. On the GovCloud platform I built, we had ground system software that had to process flight data on local hardware before transmission. ECS Anywhere would have let us manage those workloads from the same ECS cluster as our cloud services — same Terraform modules, same IAM patterns, same observability pipeline.
Brownfield migration. If you're moving workloads from on-premises to AWS and want a consistent deployment target during the migration, ECS Anywhere gives you that. Register the on-prem servers, migrate task by task, deregister when done.
Edge compute. Consistent deployment tooling across dozens of edge nodes without running a k8s control plane at each site.
The constraint: ECS Anywhere instances are external infrastructure you own and patch. Fargate's "no nodes to manage" advantage disappears. The tradeoff is deliberate — you're accepting node management in exchange for placement control.
The Migration Path
ECS → EKS migration is well-understood and not particularly risky if the IaC is clean.
Container images move without changes. The two meaningful changes are IAM (task roles become IRSA service accounts, which is mechanical, not complex) and networking (ALB target group registration becomes an Ingress or Service, also mechanical).
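To show how mechanical the IAM change is: the role's permissions policy stays as it is, and only the trust policy changes shape. The sketch below builds both trust policies as plain dictionaries. The account ID, OIDC provider ID, namespace, and service account name are placeholders, and the ARN assumes the standard `aws` partition (GovCloud uses a different one).

```python
import json


def ecs_task_trust_policy() -> dict:
    """Trust policy for a role assumed by ECS tasks."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Trust policy for the same role assumed via an EKS service account."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                f"{oidc_provider}:aud": "sts.amazonaws.com",
            }},
        }],
    }


# Placeholder values for illustration only.
policy = irsa_trust_policy(
    "111122223333",
    "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE",
    "payments",
    "payments-api",
)
print(json.dumps(policy, indent=2))
```

If each service's role already lives in its own Terraform module, this becomes a per-module edit to the assume-role policy rather than a redesign.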
What breaks the migration is task definitions in CloudFormation or hand-managed console resources. If your ECS deployment is 100% Terraform with a module per service, the migration is boring. If it's six engineers' worth of one-off console configurations, it's archaeology.
Build ECS as if you'll migrate it. Keep task definitions in Terraform modules, service definitions composable, networking configuration explicit. The Jira ticket for "migrate from ECS to EKS" should feel like plumbing work, not a project.
Mistakes I See Repeatedly
Choosing EKS because it's "industry standard." Industry standard at Stripe is not industry standard at a 40-person SaaS company. The operational tax is the same either way, but the team available to pay it is not.
Choosing ECS without accounting for the autoscaling ceiling. For workloads with bursty, event-driven traffic patterns, ECS autoscaling requires CloudWatch custom metrics and Application Auto Scaling policies that are genuinely annoying to tune. Know the ceiling before you hit it.
Single-cluster EKS for two services. The fixed cost of the control plane ($72/month), the operational overhead of running Kubernetes, and the learning curve are all real. For two or three services, this almost never makes sense.
Underestimating the Helm/CRD surface area. When a Helm-managed CRD conflicts with another controller at 2am, you need someone on the team who can debug it. "We'll figure it out" is not a plan.
Building a new platform or rearchitecting an existing container environment? The choice between ECS, EKS, and ECS Anywhere usually comes down to where your team is on the Kubernetes maturity curve and what your autoscaling requirements actually are — not which technology is more capable. Get in touch if you're working through this decision — it's a conversation I have with platform teams regularly, and the right answer depends on specifics that don't fit in a blog post.
