EKS vs k3s on AWS for startups: cost, complexity, and when to choose each

Originally published here: EKS vs k3s on AWS for startups

You have an app to ship. Maybe a few. You're on AWS because that's where the credits are and where the auditors want your data. Somebody on the team said "Kubernetes" out loud and now you're trying to decide between EKS and k3s before the week is out.

We'll save you the essay. Here's our take:

  • If you're an indie builder or a 1 to 2 person team just trying the product out, run k3s on a single EC2 box. Don't even open the EKS console. The $73/month control plane fee alone is larger than your entire compute bill, and none of the things EKS is good at are things you need yet.
  • If you're a small team (up to 20 engineers) without a dedicated platform owner, start on k3s. Ship product. You'll know when to graduate.
  • If you're 20+ engineers with production traffic, multi-AZ requirements, or real compliance pressure, pick EKS. Not as a hedge, as a commitment. At that size the managed control plane, IRSA, and AWS-native autoscaling pay for themselves.

EKS is usually the right destination, not the right starting point, and the reason has very little to do with Kubernetes itself.

Here's what actually matters when you make this call.

The thing people get wrong

Both are Kubernetes. Same API, same manifests, same kubectl apply. If someone hands you a chart that works on EKS, it will almost certainly work on k3s without modification, and vice versa. This isn't a technology choice. It's an operations choice.

What you're really picking is who owns the control plane and how much of AWS you need to glue in.

  • EKS: AWS runs the control plane. You pay $73/month per cluster for it, and you glue in IRSA, VPC CNI, the AWS Load Balancer Controller, EBS CSI, and whichever flavor of autoscaler you prefer. Upgrades happen on AWS's calendar.
  • k3s: You run the control plane on an EC2 instance. It starts in under a minute, ships with Traefik and a working storage class, and upgrades when you decide. Nothing AWS-specific unless you want it.

The rest of this post is the operational fallout of that choice.

Side-by-side, the useful version

| | EKS | k3s on EC2 |
| --- | --- | --- |
| Control plane cost | $73/month/cluster | $0 |
| Minimum viable footprint | 1 control plane + 2 nodes in 2 AZs | 1 EC2 instance |
| Time to a working cluster | 15-25 min with eksctl, longer first time | Under 60 seconds |
| Networking | VPC CNI (real VPC IPs, counts against subnet) | Flannel VXLAN (overlay, doesn't touch VPC IPs) |
| Ingress | ALB Controller + one ALB per group | Traefik built in |
| Storage | EBS / EFS / FSx CSI, IAM required | local-path out of the box; add EBS CSI if you want |
| Pod IAM | IRSA (clean, audited) | Instance profile, or you bring a solution |
| Upgrades | AWS-driven, rolling managed node groups | You pick the hour; risk is yours |
| Biggest sharp edge | VPC CNI IP exhaustion, IRSA permission dance | Embedded etcd quorum loss, single-node backups |
| Honest team size | 15+ engineers, platform owner forming | 2-20 engineers, nobody owns infra full-time |

Anything not on this table is a rounding error for a startup.

A note on ECS

ECS often shows up as "Kubernetes but simpler." The simplicity is real; the tradeoff is lock-in. Task definitions, services, and the deployment model are AWS-proprietary: no portable API, no Helm, no ecosystem that transfers. The day you want to run the same workloads elsewhere, you rewrite every manifest. Both EKS and k3s give you the same portable Kubernetes API; ECS gives you a one-way door.

What the bill really looks like

Take a workload we see often: one web service, two workers, Postgres on RDS, Redis on ElastiCache, a staging environment, and room for a few PR preview environments. Call it ~8 vCPU / 16GB of production pods plus ~4 vCPU / 8GB for staging and previews.

All prices below are approximate on-demand rates in us-east-1 as of April 2026. Your numbers will vary with region, reserved capacity, and traffic profile.

k3s on EC2

  • 3 x t3.xlarge on-demand (4 vCPU / 16GB each): $299
  • 300GB EBS gp3: $24
  • 1 NAT gateway: $32 + traffic
  • 1 ALB in front of Traefik: $22
  • Data transfer (moderate): $20

~$400/month for plenty of headroom. You can fit 5-10 services, full staging, and rotating previews on that without thinking about capacity.

EKS

  • Control plane: $73
  • 2 x t3.large prod nodes (for the 2-AZ story): $121
  • 1 x t3.medium staging node: $30
  • 1 x t3.large for previews: $60
  • 300GB EBS gp3: $24
  • 1 NAT gateway: $32 + traffic
  • 2-3 ALBs (the controller creates one per IngressGroup; 2-3 groups is typical): $44-$66
  • Data transfer (higher, because VPC CNI loves inter-AZ chatter): $25

~$410-$440/month.

Pricing looks close until you add the labor, the part nobody puts on the slide:

  • First EKS setup: 1 to 3 engineer-days.
  • Every EKS upgrade: a half-day of drain and verify, quarterly.
  • First VPC CNI IP exhaustion: half a day figuring out that t3.large nodes only get 35 pod IPs.
  • ALB controller version pinning across upgrades: a ticket, every time.

That's real salary spent on AWS glue rather than product. On k3s, the comparable surface area is "keep an AMI up to date" and "snapshot etcd nightly."
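
For reference, "snapshot etcd nightly" on k3s is a few lines of server config, assuming you run embedded etcd rather than the single-node SQLite default. These keys are k3s server flags in config-file form; the bucket name is a placeholder:

```yaml
# /etc/rancher/k3s/config.yaml on server nodes (embedded etcd only)
etcd-snapshot-schedule-cron: "0 3 * * *"   # nightly at 03:00; k3s defaults to every 12h
etcd-snapshot-retention: 7                 # keep a week of local snapshots
etcd-s3: true                              # also copy each snapshot off the box
etcd-s3-bucket: my-k3s-snapshots           # hypothetical bucket name
etcd-s3-region: us-east-1
```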

Getting from zero to shipping

EKS (plan on a week if nobody's done this before):

  1. VPC with public/private subnets across 2 to 3 AZs, plus NAT and route tables.
  2. IAM roles for the cluster, node group, and IRSA.
  3. eksctl create cluster, then wait 15 to 25 minutes.
  4. Install the AWS Load Balancer Controller, EBS CSI driver, and Karpenter (each its own IAM dance).
  5. Tune CoreDNS, metrics-server, and VPC CNI (prefix delegation, warm pool settings).
  6. Wire up Ingress with ACM and Route53.
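
Steps 2 through 5 mostly collapse into one eksctl config file. A hedged sketch, with the cluster name, version, and node sizes as placeholders; the Load Balancer Controller and Karpenter remain separate Helm installs with their own IAM roles:

```yaml
# cluster.yaml -- eksctl create cluster -f cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: startup            # hypothetical cluster name
  region: us-east-1
  version: "1.31"
iam:
  withOIDC: true           # creates the OIDC provider that IRSA depends on (step 2)
managedNodeGroups:
  - name: general
    instanceType: t3.large
    desiredCapacity: 2
    privateNetworking: true
addons:
  - name: coredns
  - name: aws-ebs-csi-driver   # the CSI controller still needs its own IAM role
  - name: vpc-cni
    configurationValues: |-
      env:
        ENABLE_PREFIX_DELEGATION: "true"   # head off the per-node IP ceiling early
```

By default eksctl also builds the VPC from step 1; hand it existing subnets if you already have a network you like.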

The piece that bites first-timers is always IRSA and the OIDC provider. One typo in a trust policy and pods silently fail to assume roles.
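
The Kubernetes half of that handshake is a single annotation; the IAM half is a trust policy that must name the cluster's OIDC provider and this exact namespace/name pair. A minimal sketch, with a placeholder account ID and role:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web                  # hypothetical service account
  namespace: default
  annotations:
    # The role's trust policy must allow sts:AssumeRoleWithWebIdentity from the
    # cluster's OIDC provider, with a condition like
    #   <oidc-provider>:sub = system:serviceaccount:default:web
    # Any mismatch in namespace, name, or audience surfaces only as AccessDenied.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/web-irsa
```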

k3s (under a day):

  1. Launch an EC2 instance.
  2. curl -sfL https://get.k3s.io | sh -.
  3. Point DNS at it.
  4. Deploy.

Not a marketing simplification. Traefik is running, there's a default StorageClass, kubeconfig is at /etc/rancher/k3s/k3s.yaml. You can be serving traffic in a lunch break. For production, add an ASG of three k3s servers with embedded etcd for HA, an NLB out front, and a nightly etcd snapshot.
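
A sketch of that HA shape in k3s's config file, with the NLB hostname and join token as placeholders:

```yaml
# /etc/rancher/k3s/config.yaml -- first server in the ASG
cluster-init: true                  # bootstrap embedded etcd
token: "CHANGE-ME"                  # shared join secret
tls-san:
  - k3s.internal.example.com        # the NLB's DNS name, so the API cert stays valid

# Servers 2 and 3 drop cluster-init and join through the NLB instead:
# server: https://k3s.internal.example.com:6443
# token: "CHANGE-ME"
```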

The failures you'll actually hit

Forget feature checklists. These are the incidents that will eat your weekend.

On EKS, the usual suspects:

  • Pods stuck Pending with no IP addresses available. VPC CNI assigns real VPC IPs to every pod. On a t3.large that's 35 IPs max. You hit it during an autoscaling event, not during testing. Fix is prefix delegation, which requires a node recycle.
  • IRSA silently not working. Pod annotation, service account annotation, trust policy, OIDC provider, role policy. Five things have to line up. One off-by-one and you get AccessDenied with no obvious source.
  • ALB controller version skew after an EKS upgrade. The ALB controller has its own compatibility matrix. Forget to bump it and ingress just stops reconciling.
  • Node group upgrade drains in the wrong order. PDBs not set, pods evicted faster than they start elsewhere. 30-second outage during a "safe" upgrade.
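
That last one is the cheapest to prevent. A minimal PDB for a two-replica service (names are hypothetical) makes the drain wait instead of evicting both pods at once:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
spec:
  minAvailable: 1          # a managed node group drain respects this
  selector:
    matchLabels:
      app: web
```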

On k3s, the usual suspects:

  • Embedded etcd quorum loss. You were running HA on three t3.medium servers. Two got replaced by the ASG inside five minutes. Cluster is read-only. Recovery is k3s server --cluster-reset from a known-good snapshot. You want that snapshot script working before you need it.
  • Local-path PVs disappearing with the node. The default StorageClass is per-node local disk. Great for caches, terrible for your single-replica Postgres. Switch stateful workloads to RDS or add EBS CSI.
  • k3s version upgrade breaking Traefik. k3s bundles Traefik, and major k3s upgrades can bump Traefik's CRDs. Pin the Traefik Helm values (a sketch follows this list) or disable the bundled version and run your own.
  • Single-node cluster dies with the instance. If you started all-in-one to move fast and forgot to migrate to HA, a spot interruption or AZ blip is a full outage. Migrate before you're depending on it in production.
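
The Traefik pinning mentioned above uses the overlay mechanism k3s provides for its bundled charts; the tag is a placeholder you'd match to your tested version:

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system    # k3s watches here for overrides to bundled charts
spec:
  valuesContent: |-
    image:
      tag: "2.11.0"         # hypothetical; pin to the version you've tested
```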

Both sets are learnable. The EKS failures are more about fighting AWS primitives. The k3s failures are more about owning the operational basics yourself.

When k3s is enough

Start here if most of these are true:

  • Under 20 engineers and nobody's job title is "platform."
  • Stateless web services and workers, with state in Postgres that something else babysits (an in-cluster operator, or RDS if you prefer AWS-managed), plus ElastiCache and S3 as needed.
  • Single-region is fine for now.
  • Compliance doesn't demand AWS-managed control plane components.
  • You'd rather spend the next two sprints on product than on Kubernetes.

k3s is not a toy. It's CNCF-certified Kubernetes, it powers Rancher's own product, and there are companies running it on bare metal fleets larger than most SaaS startups will ever see. Using it isn't a compromise; it's picking the distribution that doesn't punish small teams.

When EKS earns its keep

Move to EKS (or start there, if you're already past the line) when any of these are real:

  • Audit pressure. If SOC 2 or HIPAA readiness hinges on "AWS patches the control plane," use EKS.
  • Fine-grained pod IAM. Per-pod credentials for S3, SQS, Bedrock are much cleaner with IRSA than instance profiles or sidecars.
  • ~50+ services or ~300+ pods. k3s handles it, but upgrades and capacity get real. Karpenter on EKS is genuinely better at that scale.
  • Multi-AZ or multi-region HA as a hard requirement. k3s HA is possible; EKS HA is the default.
  • A dedicated platform hire or team. Once someone owns infra full-time, they'll want managed node groups and IRSA.
  • GPU pools, Graviton spot fleets, Bottlerocket, Windows nodes. EKS wires these in natively.

If none of these describe your next 12 months, you're paying EKS tax for a future you might not have.

The migration nobody sells you

Here's what actually happens when you outgrow k3s and move to EKS.

Your Deployments, Services, ConfigMaps, Secrets, Jobs, CronJobs: unchanged. Your Helm charts: unchanged. The diff is at the edges:

  • Ingress: Traefik annotations become ALB controller annotations (10-30 lines of YAML per service; see the before/after sketch below).
  • Storage: local-path PVCs move to gp3; stateful workloads you already had on RDS need no change.
  • IAM: instance-profile or sidecar-based access becomes IRSA. A real piece of work, but mechanical.
  • Autoscaling: single-ASG k3s becomes Cluster Autoscaler or Karpenter.
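
To make the Ingress diff concrete, here's a before/after sketch for a hypothetical web service; hostnames and names are placeholders, and TLS details are omitted:

```yaml
# Before: k3s with the bundled Traefik
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
  ingressClassName: traefik
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
---
# After: EKS with the AWS Load Balancer Controller; spec.rules is unchanged
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip       # route straight to pod IPs
    alb.ingress.kubernetes.io/group.name: shared    # services sharing one ALB
spec:
  ingressClassName: alb
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
```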

Plan a week or two of cleanup, not a rewrite. And critically, k3s-to-EKS is a much shorter migration than "no Kubernetes to EKS" would have been if you'd held off for 18 months. For a deeper walkthrough of the k3s-on-AWS setup most startups land on, see How to deploy on AWS without hiring a DevOps engineer.

The real recommendation

If you're choosing today on AWS with under 20 engineers, start on k3s. Ship product. Re-evaluate when a specific thing on the EKS list above becomes true, not sooner. You'll save money and keep your team's attention on the business, and you won't paint yourself into a corner because the migration path is honest and well-worn.

Ownkube runs on top of either cluster and focuses on the part neither EKS nor k3s gives you out of the box: the Heroku-style developer flow. Git push to deploy, a preview environment per PR, one-click Postgres and Redis in your VPC, plain-English pod crash explanations, and automatic right-sizing. The idea is simple: start on k3s in your own AWS account, switch to EKS when your business asks for it, and keep the same workflow the whole way through.

If you're evaluating this path, the original article includes more OwnKube context and related AWS deployment guides: EKS vs k3s on AWS for startups.
