alok shankar
EKS Ingress Address Not Assigned (Application Outage): Incident & Resolution Guide

1. Introduction
In Kubernetes, applications are typically exposed internally using Services (ClusterIP, NodePort). However, for exposing applications externally in a scalable, secure, and cloud‑native manner, Kubernetes provides the concept of Ingress.

What is Ingress?
Ingress is a Kubernetes API object that manages external HTTP/HTTPS access to services within a cluster. It provides:

  • Layer‑7 routing (path‑based, host‑based)
  • TLS termination
  • Centralized traffic management

Ingress works in conjunction with an Ingress Controller, which implements the actual traffic routing logic.

Why Ingress instead of NodePort / LoadBalancer?

A NodePort exposes a high-numbered port on every node, and a Service of type LoadBalancer provisions one load balancer per Service, which becomes costly and hard to manage as the number of applications grows. Ingress consolidates many services behind a single Layer‑7 entry point with host- and path-based routing and TLS termination.

In AWS EKS, the recommended production approach is Ingress backed by an AWS Application Load Balancer (ALB), managed by the AWS Load Balancer Controller.
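As a minimal sketch, an ALB-backed Ingress looks like the following (the resource and Service names here are illustrative, not taken from the incident):

```yaml
# Minimal ALB-backed Ingress (names are illustrative placeholders)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: ep-apps
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing   # provision a public ALB
    alb.ingress.kubernetes.io/target-type: ip           # register Pod IPs directly in the Target Group
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
```

When the AWS Load Balancer Controller reconciles this object successfully, it provisions the ALB and publishes its DNS name into the Ingress ADDRESS field — the very field that stayed empty in this incident.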

Incident Overview
I encountered an issue in the EKS environment where an application became inaccessible from outside the cluster.
Although the application pods were running and Kubernetes services were healthy, external users were unable to access the application URL.

Upon investigation, it was observed that the Ingress resource was created successfully, but the ADDRESS field of the Ingress remained empty (null).
As a result, no valid Load Balancer endpoint was available to route external traffic to the application.
This issue closely resembled a production outage scenario, as it directly impacted external traffic routing despite the application itself being operational.

[ec2-user@ip-xx-xxx-xxx-xx ~]$ kubectl get ingress app-ingress -n ep-apps -o wide
NAME          CLASS   HOSTS   ADDRESS   PORTS   AGE
app-ingress   alb     *                 80      10h


Impact

  1. External users could not access the application.
  2. No ALB DNS was available from the Ingress.
  3. Target Group showed 0 registered targets.
  4. Application health appeared normal internally, which made the issue non-obvious at first glance.

Use Case (Business Scenario)

  1. Application deployed in EKS.
  2. Needs to be exposed externally over HTTPS.
  3. Uses path-based routing.
  4. Requires container-level health checks.
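These requirements map onto AWS Load Balancer Controller annotations roughly as follows. This is a hedged sketch, not the exact manifest from the incident: the certificate ARN is a placeholder, and the path and Service name are assumptions.

```yaml
# Illustrative Ingress covering HTTPS, path-based routing, and container health checks
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: ep-apps
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    # HTTPS exposure: listen on 443 and redirect HTTP to HTTPS
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    # Placeholder ARN; must reference an ISSUED certificate in the ALB's region
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:111111111111:certificate/EXAMPLE
    # Container-level health checks
    alb.ingress.kubernetes.io/healthcheck-path: /api/health
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "30"
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
```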

2. Architecture Overview
High-Level Flow

User request → DNS (external-dns / Route 53) → Application Load Balancer → Target Group (Pod IPs) → Application Pods

3. Timeline of Events:

  1. The application was deployed successfully in the EKS cluster.
  2. Pods were in Running state and passing readiness and liveness probes.
  3. Kubernetes Service (ClusterIP) showed valid endpoints.
  4. An ALB-backed Ingress was created to expose the application externally.
  5. Despite successful Ingress creation, the Ingress ADDRESS field remained empty.
  6. AWS Console showed an ALB and Target Group created, but the Target Group had zero registered targets.
  7. Because the Ingress did not publish an ADDRESS, application traffic could not reach the cluster.
  8. This resulted in an outage-like situation where the application was “up” internally but unreachable externally.

4. Initial Observation
At a high level, everything appeared correct:

  1. Pods were healthy.
[ec2-user@ip-xx-xxx-xxx-xx ~]$ kubectl get pods -A
NAMESPACE           NAME                                                              READY   STATUS    RESTARTS   AGE
amazon-cloudwatch   amazon-cloudwatch-observability-controller-manager-586c44c2cclk   1/1     Running   0          7h6m
amazon-cloudwatch   cloudwatch-agent-xxx                                              1/1     Running   0          6h41m
amazon-cloudwatch   cloudwatch-agent-xxxx                                             1/1     Running   0          6h41m
amazon-cloudwatch   fluent-bit-xxxx                                                   1/1     Running   0          6h41m
amazon-cloudwatch   fluent-bit-xxxx                                                   1/1     Running   0          6h41m
external-dns        external-dns-75f7b59749-dfkgn                                     1/1     Running   0          24h
ep-apps             condition-service-96475888c-bdmdn                                 1/1     Running   0          22h
ep-apps             web-query-service-78b5d4dcb7-nms56                                1/1     Running   0          23h
ep-apps             web-query-service-78b5d4dcb7-xlfj9                                1/1     Running   0          23h
ep-apps             web-apps-59658b6868-fkwvp                                         1/1     Running   0          22h
kube-system         aws-node-4xrsc                                                    2/2     Running   0          24h
kube-system         aws-load-balancer-controller-78bddb649b-w56d5                     1/1     Running   0          24h
kube-system         aws-load-balancer-controller-78bddb649b-z5s5g                     1/1     Running   0          24h
kube-system         aws-node-ncp5f                                                    2/2     Running   0          24h
  2. Service endpoints existed.
  3. Ingress configuration looked valid.
  4. ALB resources were present in AWS.
  5. However, traffic was not flowing due to the missing Ingress ADDRESS, indicating a failure in Ingress‑to‑ALB reconciliation.

5. Root Cause Analysis (What Went Wrong)

This issue was not a single problem, but a chain of configuration gaps.
Root Causes Identified

5.1 Ingress Group Conflict

  • The TEST ingress was using the DEV ingress group name (app-dev).
  • This caused an ALB ownership conflict between environments.
[ec2-user@ip-xx-xxx-xxx-xxx ~]$ kubectl describe ingress app-ingress -n ep-apps
Name:             app-ingress
Labels:           app=xxx
                  app.kubernetes.io/name=app-ingress
                  app.kubernetes.io/part-of=ep
Namespace:        ep-apps
Address:
Ingress Class:    alb
Default backend:  <default>
Rules:
  Host        Path  Backends
  ----        ----  --------
  *
              /   app:80 (xx.xx.xx.xxx:xxxx,xxx.xx.xx.xx.xxx:xxxx)
Annotations:  alb.ingress.kubernetes.io/group.name: app-dev
              alb.ingress.kubernetes.io/group.order: 100
              alb.ingress.kubernetes.io/healthcheck-interval-seconds: 30
              alb.ingress.kubernetes.io/healthcheck-path: /api/health


5.2 ACM Certificate Issue

  • The attached certificate was in PENDING_VALIDATION state.
  • ALB HTTPS listener creation therefore failed.

5.3 Subnet Tagging Missing

  • Public subnets lacked required tags
  • ALB could not discover subnets correctly

5.4 Broken ALB Controller Webhook

  • aws-load-balancer-webhook service had no endpoints
  • Blocked creation of TargetGroupBinding
  • Prevented Pod IP registration

5.5 Ingress Finalizer Stuck

  • Failed reconciliation added finalizer
  • Controller unable to clean up state

6. Solution Applied

Step-by-Step Resolution

6.1 Correct Ingress Group

[ec2-user@ip-xx-xxx-xxx-xx ~]$ kubectl annotate ingress app-ingress -n ep-apps \
  alb.ingress.kubernetes.io/group.name=app-test \
  --overwrite
ingress.networking.k8s.io/app-ingress annotated

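The same fix can also be made declaratively in the Ingress manifest, so the environment-specific group survives redeploys (group names here follow this incident's dev/test convention; this is a fragment, not a full manifest):

```yaml
# Fragment of the Ingress metadata: one ingress group per environment.
# A TEST ingress must never join the DEV group, or ALB ownership conflicts.
metadata:
  annotations:
    alb.ingress.kubernetes.io/group.name: app-test
    alb.ingress.kubernetes.io/group.order: "100"
```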

6.2 Use a Valid (ISSUED) ACM Certificate

[ec2-user@ip-xx-xxx-xxx-xx ~]$ kubectl annotate ingress app-ingress -n ep-apps \
  alb.ingress.kubernetes.io/certificate-arn=arn:aws:acm:us-west-2:xxxxxxxxxxx:certificate/xxxxxxxxxxxxxxxxxxxxxxx \
  --overwrite
ingress.networking.k8s.io/app-ingress annotated


6.3 Tag Public Subnets (Mandatory)

kubernetes.io/role/elb = 1
kubernetes.io/cluster/<cluster-name> = shared

6.4 Allow ALB → Node Traffic (Critical for IP Mode)

Add an inbound rule on the worker node security group allowing traffic from the ALB security group on the container port. Without it, ALB health checks to the Pod IPs fail and targets never become healthy.

6.5 Remove Broken ALB Webhook

Check the controller logs:

kubectl logs -n kube-system deployment/aws-load-balancer-controller --tail=200
{"level":"error","ts":"2026-04-15T04:30:28Z","msg":"Reconciler error","controller":"ingress","object":{"name":"ep-test"},"namespace":"","name":"ep-test","reconcileID":"7ea1f646-368e-473f-b6b4-cc0a76cf4785","error":"Internal error occurred: failed calling webhook \"mtargetgroupbinding.elbv2.k8s.aws\": failed to call webhook: Post \"https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=10s\": context deadline exceeded"}
{"level":"error","ts":"2026-04-15T04:30:32Z","msg":"Reconciler error","controller":"ingress","object":{"name":"search-query-service","namespace":"ep-apps"},"namespace":"ep-apps","name":"search-query-service","reconcileID":"970d799a-1982-4e78-9791-76daa6a54d4d","error":"Internal error occurred: failed calling webhook \"mtargetgroupbinding.elbv2.k8s.aws\": failed to call webhook: Post \"https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=10s\": context deadline exceeded"}

Solution:

[ec2-user@ip-xxx-xxxx-xxxx ~]$ kubectl get mutatingwebhookconfigurations
NAME                                                             WEBHOOKS   AGE
amazon-cloudwatch-observability-mutating-webhook-configuration   5          18h
aws-load-balancer-webhook                                        6          11h
pod-identity-webhook                                             1          6d21h
vpc-resource-mutating-webhook                                    1          6d21h
[ec2-user@ip-xxx-xxxx-xxxx ~]$ kubectl delete mutatingwebhookconfiguration aws-load-balancer-webhook
mutatingwebhookconfiguration.admissionregistration.k8s.io "aws-load-balancer-webhook" deleted

6.6 Restart the Controller Deployment & Recreate the Ingress

Restart the ALB controller.
✅ This forces the controller to:

  • Re‑build the model
  • Create TargetGroupBinding
  • Register pod IPs
  • Update ingress status
[ec2-user@ip-xx-xxx-xxx ~]$ kubectl rollout restart deployment aws-load-balancer-controller -n kube-system
deployment.apps/aws-load-balancer-controller restarted
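Once reconciliation succeeds, the controller creates a TargetGroupBinding for each backend Service. A healthy binding looks roughly like the following (the name, Service, and ARN below are illustrative placeholders, not values from this incident):

```yaml
# Illustrative TargetGroupBinding created by the AWS Load Balancer Controller
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: app-tgb
  namespace: ep-apps
spec:
  serviceRef:
    name: app        # backend Service referenced by the Ingress rule
    port: 80
  targetGroupARN: arn:aws:elasticloadbalancing:us-west-2:111111111111:targetgroup/example/0123456789abcdef
  targetType: ip     # Pod IPs registered directly in the Target Group
```

`kubectl get targetgroupbinding -n ep-apps` should list one binding per exposed Service; an empty list points back at the webhook or reconciliation failures described above.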

7. Final Verification

[ec2-user@ip-xx-xxx-xx-xx ~]$ kubectl get ingress app-ingress -n ep-apps -o wide
NAME          CLASS   HOSTS   ADDRESS                                                        PORTS   AGE
app-ingress   alb     *       k8s-eptest-erfs423536-xxxxxxxxxx.us-west-2.elb.amazonaws.com   80      10h

8. Validation Commands

kubectl get ingress -A
kubectl get endpoints -A
kubectl logs -n kube-system deployment/aws-load-balancer-controller
kubectl get targetgroupbinding -A


9. Final Outcome

✅ ALB created successfully
✅ Target Group registered Pod IPs
✅ Health checks passed
✅ Ingress ADDRESS populated

✅ Application accessible externally over HTTPS

10. Best Practices Checklist (Must Follow Every Time)

✅ Ingress Configuration Checklist

  • Environment-specific ingress group (dev/test/prod)
  • Valid target-type (ip or instance)
  • Correct service name and port
  • Health check path works from Pod

✅ ACM Certificate Checklist

  • Certificate status = ISSUED
  • Cert region = same as ALB
  • Domain matches DNS

✅ Subnet Checklist (CRITICAL)

For an internet-facing ALB:

  • Public subnets
  • Route to an Internet Gateway
  • Tags: kubernetes.io/role/elb=1 and kubernetes.io/cluster/<cluster-name>=shared

✅ Security Group Checklist (IP Mode)

  • ALB SG allows inbound 80/443
  • Node SG allows inbound from ALB SG on container port
  • No restrictive NACLs

✅ Controller Health Checklist

  • aws-load-balancer-controller pods Running
  • No webhook timeouts in controller logs
  • TargetGroupBinding objects created

11. Key Learnings

  • ALB IP mode requires explicit SG permissions
  • Broken webhooks can silently block target registration
  • Ingress ADDRESS updates only after full reconciliation
  • Always validate subnet tags before troubleshooting ALB

12. Conclusion

Ingress with ALB provides a powerful, scalable, and production-ready way to expose applications in EKS.
However, it relies on tight integration between Kubernetes and AWS infrastructure, and misalignment at any layer can lead to hard‑to‑debug issues.

Following the checklists and best practices above will ensure:

  • Faster deployments
  • Predictable behavior
  • Reduced downtime
  • Easier troubleshooting

Happy Learning & Reliable Kubernetes! 🚀

Follow me on LinkedIn: www.linkedin.com/in/alok-shankar-55b94826
