Introduction
If you've ever tried scaling a deployment past 1000 pods on EKS and watched everything just... stop, with no errors, no warnings, and pods that look healthy but never actually receive traffic — this one's for you.
I ran into this exact situation on a client's EKS cluster. The HPA was configured to scale well beyond 1000 replicas, and it did spin up the pods. They started fine, containers were healthy, but something was off: the readiness probes weren't even being evaluated. The pods were stuck in a limbo where they existed but didn't really exist as far as the load balancer was concerned.
The cluster was running Kubernetes 1.33 with the AWS Load Balancer Controller v2.12, using IP-mode target registration behind an Application Load Balancer. That last detail turned out to matter a lot.
Debugging
The fact that exactly 1000 targets were registered normally — and none above — made me think about some kind of quota or limit being reached at the EKS or AWS level. A quick check confirmed two relevant AWS quotas: Targets per Target Group per Region and Targets per Application Load Balancer, both with a default value of 1000. Raising these was a necessary first step.
But even after raising the quotas, the pods were not marked as ready, and they were not showing up as new targets. From a container perspective, everything looked fine — processes were starting correctly, no crazy delays with the readiness probes.
The nodes were fine too. Kubelets weren't starved of resources, there weren't too many pods scheduled per node, and IPs were correctly assigned. Nothing pointed to anything being off in that direction.
The next thing I dug into was the internal networking of Kubernetes and how the pods get registered as targets for the target groups. And that's when I found the culprit: the AWS Load Balancer Controller and the way it uses the Endpoints API to manage target registration.
The controller monitors several Kubernetes resources — Services, Ingresses, Pods, Nodes, and critically, Endpoints/EndpointSlices — to determine which backends should be registered as targets in AWS Elastic Load Balancing target groups.
This behaviour depends on the target type configured for the controller. In my case, it was set to IP, which heavily relies on Endpoints/EndpointSlices.
In IP mode, the controller registers individual pod IPs directly into the target group:
- The controller's reconciliation loop watches for changes to the Endpoints objects for the relevant Service.
- When an EndpointSlice is created or updated, the controller extracts the list of ready pod IP addresses and their associated ports.
- It compares this set against the currently registered targets in the AWS target group.
- It calls `RegisterTargets` for any new pod IPs and `DeregisterTargets` for any stale ones.
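The reconciliation steps above boil down to a set difference between what the EndpointSlices report and what the target group contains. A minimal sketch in Python (the function name and shapes are mine for illustration; the real controller is written in Go):

```python
def reconcile_targets(ready_endpoint_ips: set[str],
                      registered_target_ips: set[str]) -> tuple[set[str], set[str]]:
    """Compute which pod IPs to register and which to deregister.

    ready_endpoint_ips: pod IPs the controller read from EndpointSlices.
    registered_target_ips: IPs currently registered in the AWS target group.
    """
    to_register = ready_endpoint_ips - registered_target_ips    # new pods
    to_deregister = registered_target_ips - ready_endpoint_ips  # stale pods
    return to_register, to_deregister


# Example: one new pod appeared, one old pod went away.
new, stale = reconcile_targets({"10.0.1.5", "10.0.1.6"},
                               {"10.0.1.5", "10.0.2.9"})
print(new)    # {'10.0.1.6'}
print(stale)  # {'10.0.2.9'}
```

The key consequence: if the endpoint list the controller sees is incomplete, the diff is empty for the missing pods and no registration calls are ever made.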
After understanding this, the problem was obvious: my controller was configured to use the Endpoints API for IP registration, and according to the Kubernetes docs, each Endpoints object created for a Service is capped at 1000 endpoints.
The Endpoints object for my Service was truncated at that limit, so no changes were detected for the newly added pods, and no reconciliation was triggered on the controller side to register the new targets.
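You can actually see this condition on the object itself: when the Endpoints controller truncates the list, it annotates the object with `endpoints.kubernetes.io/over-capacity: truncated`. A small sketch of the check, operating on the JSON you'd get from `kubectl get endpoints <svc> -o json` (the sample object below is made up):

```python
def is_truncated(endpoints_obj: dict) -> bool:
    """True if this Endpoints object was capped at 1000 addresses."""
    annotations = endpoints_obj.get("metadata", {}).get("annotations", {})
    return annotations.get("endpoints.kubernetes.io/over-capacity") == "truncated"


# Abridged Endpoints object for an over-capacity Service.
sample = {
    "metadata": {
        "name": "my-service",  # placeholder name
        "annotations": {"endpoints.kubernetes.io/over-capacity": "truncated"},
    },
}
print(is_truncated(sample))  # True
```

Had I known to look for that annotation up front, this would have been a five-minute diagnosis instead of a multi-layer dig.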
But wait — why did the readiness probes look weird?
There was still a nagging question: why did the pods look ready at a glance, even though the readiness probe seemingly never ran?
This is where the pod readiness gate comes in. The AWS LB Controller uses a custom readiness condition (`target-health.elbv2.k8s.aws/targetgroupbinding`) that works like this:
- A pod starts and gets an IP, but that custom condition starts as `False`.
- The controller registers the pod IP in the target group, polls the AWS health check, and only patches the condition to `True` once AWS reports the target as healthy.

Until that happens, the pod isn't truly "Ready" from the EndpointSlice's perspective, meaning kube-proxy won't route traffic to it either.
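For reference, this is what the gate looks like in a pod spec. The controller's webhook injects it automatically when the namespace carries the label `elbv2.k8s.aws/pod-readiness-gate-inject: enabled`; everything here except the condition type is a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod            # placeholder name
spec:
  readinessGates:
    - conditionType: target-health.elbv2.k8s.aws/targetgroupbinding
  containers:
    - name: app
      image: my-app:latest    # placeholder image
```

With a readiness gate present, `kubectl get pod -o wide` will show the pod as not ready until every listed condition, including this custom one, is `True`.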
It's actually a nice mechanism — it prevents dropped connections during rollouts by ensuring the load balancer marks a target as healthy before Kubernetes starts sending traffic to it.
But here's the catch: if the pod never gets registered in the target group in the first place (because the Endpoint object was truncated at 1000), the controller never even starts that health check dance. The readiness gate stays False forever, silently. No error, no event, nothing in the logs saying "hey, I skipped this pod." It just doesn't happen.
That's what made this so tricky to debug. The symptom looked like a readiness probe issue, but the actual cause was three layers deeper.
Solution
After understanding how the AWS LB Controller worked and how it was configured, the solution was straightforward: configure it to use the EndpointSlices API instead of the limited Endpoints one. EndpointSlices don't have the same 1000-entry cap on a single object — each slice holds a bounded number of endpoints (100 by default), and the control plane creates as many slices as needed, so the controller can see all of your pod IPs regardless of scale.
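In my case that meant flipping a flag on the controller deployment. A sketch of the relevant excerpt, assuming the `--enable-endpoint-slices` controller flag (exposed as `enableEndpointSlices` in the Helm chart); the cluster name is a placeholder:

```yaml
# Excerpt of the aws-load-balancer-controller Deployment's container args
spec:
  containers:
    - name: aws-load-balancer-controller
      args:
        - --cluster-name=my-cluster        # placeholder
        - --enable-endpoint-slices=true    # read EndpointSlices instead of Endpoints
```

Check your controller version's documentation for the exact flag name and default, since defaults have shifted across releases.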
Combined with the AWS quota increases mentioned earlier, this got the deployment scaling well beyond the 1000 pod barrier.
Conclusion
I wrote this up for two reasons. The obvious one: if you're hitting this exact wall, I hope this saves you the hours I spent staring at perfectly healthy pods that refused to serve traffic.
The less obvious one is a reminder — mostly to myself — about what happens when we work on top of deep abstraction layers day after day. The Endpoint API silently truncating at 1000 entries, with no warning and no error, buried under controllers, CRDs, and cloud provider integrations — that's the kind of thing you can only debug if you understand what's actually happening beneath the tools you're using. Abstractions are great, they're how we make progress, but when they break, they break quietly. And the only thing that helps at that point is knowing what's underneath.
