alok shankar
EKS Ingress Address Not Assigned (Application Outage): Incident & Resolution Guide

1. Introduction
In Kubernetes, applications are typically exposed internally using Services (ClusterIP, NodePort). However, for exposing applications externally in a scalable, secure, and cloud‑native manner, Kubernetes provides the concept of Ingress.

What is Ingress?
Ingress is a Kubernetes API object that manages external HTTP/HTTPS access to services within a cluster. It provides:

  • Layer‑7 routing (path‑based, host‑based)
  • TLS termination
  • Centralized traffic management

Ingress works in conjunction with an Ingress Controller, which implements the actual traffic routing logic.

Why Ingress instead of NodePort / LoadBalancer?

A NodePort exposes a high-numbered port on every node, and a Service of type LoadBalancer provisions one load balancer per Service, which becomes costly and hard to manage as the number of applications grows. Ingress consolidates many services behind a single Layer‑7 entry point with host- and path-based routing and TLS termination.

In AWS EKS, the recommended production approach is Ingress backed by an AWS Application Load Balancer (ALB), managed by the AWS Load Balancer Controller.
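As a minimal sketch, an ALB-backed Ingress looks like the following (the resource and Service names here are illustrative, not taken from the incident):

```yaml
# Minimal ALB-backed Ingress (names are illustrative placeholders)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: ep-apps
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing   # provision a public ALB
    alb.ingress.kubernetes.io/target-type: ip           # register Pod IPs directly in the Target Group
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
```

When the AWS Load Balancer Controller reconciles this object successfully, it provisions the ALB and publishes its DNS name into the Ingress ADDRESS field — the very field that stayed empty in this incident.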

Incident Overview
I encountered an issue in the EKS environment where an application became inaccessible from outside the cluster.
Although the application pods were running and Kubernetes services were healthy, external users were unable to access the application URL.

Upon investigation, it was observed that the Ingress resource was created successfully, but the ADDRESS field of the Ingress remained empty (null).
As a result, no valid Load Balancer endpoint was available to route external traffic to the application.
This issue closely resembled a production outage scenario, as it directly impacted external traffic routing despite the application itself being operational.

[ec2-user@ip-xx-xxx-xxx-xx ~]$ kubectl get ingress app-ingress -n ep-apps -o wide
NAME          CLASS   HOSTS   ADDRESS   PORTS   AGE
app-ingress   alb     *                 80      10h


Impact

  1. External users could not access the application.
  2. No ALB DNS was available from the Ingress.
  3. Target Group showed 0 registered targets.
  4. Application health appeared normal internally, which made the issue non-obvious at first glance.

Use Case (Business Scenario)

  1. Application deployed in EKS.
  2. Needs to be exposed externally over HTTPS.
  3. Uses path-based routing.
  4. Requires container-level health checks.
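These requirements map onto AWS Load Balancer Controller annotations roughly as follows. This is a hedged sketch, not the exact manifest from the incident: the certificate ARN is a placeholder, and the path and Service name are assumptions.

```yaml
# Illustrative Ingress covering HTTPS, path-based routing, and container health checks
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: ep-apps
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    # HTTPS exposure: listen on 443 and redirect HTTP to HTTPS
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    # Placeholder ARN; must reference an ISSUED certificate in the ALB's region
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:111111111111:certificate/EXAMPLE
    # Container-level health checks
    alb.ingress.kubernetes.io/healthcheck-path: /api/health
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "30"
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
```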

2. Architecture Overview
High-Level Flow

User request → DNS (external-dns / Route 53) → Application Load Balancer → Target Group (Pod IPs) → Application Pods

3. Timeline of Events:

  1. The application was deployed successfully in the EKS cluster.
  2. Pods were in Running state and passing readiness and liveness probes.
  3. Kubernetes Service (ClusterIP) showed valid endpoints.
  4. An ALB-backed Ingress was created to expose the application externally.
  5. Despite successful Ingress creation, the Ingress ADDRESS field remained empty.
  6. AWS Console showed an ALB and Target Group created, but the Target Group had zero registered targets.
  7. Because the Ingress did not publish an ADDRESS, application traffic could not reach the cluster.
  8. This resulted in an outage-like situation where the application was “up” internally but unreachable externally.

4. Initial Observation
At a high level, everything appeared correct:

  1. Pods were healthy.
[ec2-user@ip-xx-xxx-xxx-xx ~]$ kubectl get pods -A
NAMESPACE           NAME                                                              READY   STATUS    RESTARTS   AGE
amazon-cloudwatch   amazon-cloudwatch-observability-controller-manager-586c44c2cclk   1/1     Running   0          7h6m
amazon-cloudwatch   cloudwatch-agent-xxx                                              1/1     Running   0          6h41m
amazon-cloudwatch   cloudwatch-agent-xxxx                                             1/1     Running   0          6h41m
amazon-cloudwatch   fluent-bit-xxxx                                                   1/1     Running   0          6h41m
amazon-cloudwatch   fluent-bit-xxxx                                                   1/1     Running   0          6h41m
external-dns        external-dns-75f7b59749-dfkgn                                     1/1     Running   0          24h
ep-apps             condition-service-96475888c-bdmdn                                 1/1     Running   0          22h
ep-apps             web-query-service-78b5d4dcb7-nms56                                1/1     Running   0          23h
ep-apps             web-query-service-78b5d4dcb7-xlfj9                                1/1     Running   0          23h
ep-apps             web-apps-59658b6868-fkwvp                                         1/1     Running   0          22h
kube-system         aws-node-4xrsc                                                    2/2     Running   0          24h
kube-system         aws-load-balancer-controller-78bddb649b-w56d5                     1/1     Running   0          24h
kube-system         aws-load-balancer-controller-78bddb649b-z5s5g                     1/1     Running   0          24h
kube-system         aws-node-ncp5f                                                    2/2     Running   0          24h
  2. Service endpoints existed.
  3. Ingress configuration looked valid.
  4. ALB resources were present in AWS.
  5. However, traffic was not flowing due to the missing Ingress ADDRESS, indicating a failure in Ingress‑to‑ALB reconciliation.

5. Root Cause Analysis (What Went Wrong)

This issue was not a single problem, but a chain of configuration gaps.
Root Causes Identified

5.1 Ingress Group Conflict

  • The TEST ingress was using the DEV ingress group name (app-dev).
  • This caused an ALB ownership conflict between environments.
[ec2-user@ip-xx-xxx-xxx-xxx ~]$ kubectl describe ingress app-ingress -n ep-apps
Name:             app-ingress
Labels:           app=xxx
                  app.kubernetes.io/name=app-ingress
                  app.kubernetes.io/part-of=ep
Namespace:        ep-apps
Address:
Ingress Class:    alb
Default backend:  <default>
Rules:
  Host        Path  Backends
  ----        ----  --------
  *
              /   app:80 (xx.xx.xx.xxx:xxxx,xxx.xx.xx.xx.xxx:xxxx)
Annotations:  alb.ingress.kubernetes.io/group.name: app-dev
              alb.ingress.kubernetes.io/group.order: 100
              alb.ingress.kubernetes.io/healthcheck-interval-seconds: 30
              alb.ingress.kubernetes.io/healthcheck-path: /api/health


5.2 ACM Certificate Issue

  • The attached certificate was in PENDING_VALIDATION state.
  • ALB HTTPS listener creation therefore failed.

5.3 Subnet Tagging Missing

  • Public subnets lacked required tags
  • ALB could not discover subnets correctly

5.4 Broken ALB Controller Webhook

  • aws-load-balancer-webhook service had no endpoints
  • Blocked creation of TargetGroupBinding
  • Prevented Pod IP registration

5.5 Ingress Finalizer Stuck

  • Failed reconciliation added finalizer
  • Controller unable to clean up state

6. Solution Applied

Step-by-Step Resolution

6.1 Correct Ingress Group

[ec2-user@ip-xx-xxx-xxx-xx ~]$ kubectl annotate ingress app-ingress -n ep-apps \
  alb.ingress.kubernetes.io/group.name=app-test \
  --overwrite
ingress.networking.k8s.io/app-ingress annotated

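The same fix can also be made declaratively in the Ingress manifest, so the environment-specific group survives redeploys (group names here follow this incident's dev/test convention; this is a fragment, not a full manifest):

```yaml
# Fragment of the Ingress metadata: one ingress group per environment.
# A TEST ingress must never join the DEV group, or ALB ownership conflicts.
metadata:
  annotations:
    alb.ingress.kubernetes.io/group.name: app-test
    alb.ingress.kubernetes.io/group.order: "100"
```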

6.2 Use a Valid (ISSUED) ACM Certificate

[ec2-user@ip-xx-xxx-xxx-xx ~]$ kubectl annotate ingress app-ingress -n ep-apps \
  alb.ingress.kubernetes.io/certificate-arn=arn:aws:acm:us-west-2:xxxxxxxxxxx:certificate/xxxxxxxxxxxxxxxxxxxxxxx \
  --overwrite
ingress.networking.k8s.io/app-ingress annotated


6.3 Tag Public Subnets (Mandatory)

kubernetes.io/role/elb = 1
kubernetes.io/cluster/<cluster-name> = shared

6.4 Allow ALB → Node Traffic (Critical for IP Mode)

Add an inbound rule on the worker node security group allowing traffic from the ALB security group on the container port. Without it, ALB health checks to the Pod IPs fail and targets never become healthy.

6.5 Remove Broken ALB Webhook

Check the controller logs:

kubectl logs -n kube-system deployment/aws-load-balancer-controller --tail=200
{"level":"error","ts":"2026-04-15T04:30:28Z","msg":"Reconciler error","controller":"ingress","object":{"name":"ep-test"},"namespace":"","name":"ep-test","reconcileID":"7ea1f646-368e-473f-b6b4-cc0a76cf4785","error":"Internal error occurred: failed calling webhook \"mtargetgroupbinding.elbv2.k8s.aws\": failed to call webhook: Post \"https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=10s\": context deadline exceeded"}
{"level":"error","ts":"2026-04-15T04:30:32Z","msg":"Reconciler error","controller":"ingress","object":{"name":"search-query-service","namespace":"ep-apps"},"namespace":"ep-apps","name":"search-query-service","reconcileID":"970d799a-1982-4e78-9791-76daa6a54d4d","error":"Internal error occurred: failed calling webhook \"mtargetgroupbinding.elbv2.k8s.aws\": failed to call webhook: Post \"https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=10s\": context deadline exceeded"}

Solution:

[ec2-user@ip-xxx-xxxx-xxxx ~]$ kubectl get mutatingwebhookconfigurations
NAME                                                             WEBHOOKS   AGE
amazon-cloudwatch-observability-mutating-webhook-configuration   5          18h
aws-load-balancer-webhook                                        6          11h
pod-identity-webhook                                             1          6d21h
vpc-resource-mutating-webhook                                    1          6d21h
[ec2-user@ip-xxx-xxxx-xxxx ~]$ kubectl delete mutatingwebhookconfiguration aws-load-balancer-webhook
mutatingwebhookconfiguration.admissionregistration.k8s.io "aws-load-balancer-webhook" deleted

6.6 Restart the Controller Deployment & Recreate the Ingress

Restart the ALB controller.
✅ This forces the controller to:

  • Re‑build the model
  • Create TargetGroupBinding
  • Register pod IPs
  • Update ingress status
[ec2-user@ip-xx-xxx-xxx ~]$ kubectl rollout restart deployment aws-load-balancer-controller -n kube-system
deployment.apps/aws-load-balancer-controller restarted
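Once reconciliation succeeds, the controller creates a TargetGroupBinding for each backend Service. A healthy binding looks roughly like the following (the name, Service, and ARN below are illustrative placeholders, not values from this incident):

```yaml
# Illustrative TargetGroupBinding created by the AWS Load Balancer Controller
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: app-tgb
  namespace: ep-apps
spec:
  serviceRef:
    name: app        # backend Service referenced by the Ingress rule
    port: 80
  targetGroupARN: arn:aws:elasticloadbalancing:us-west-2:111111111111:targetgroup/example/0123456789abcdef
  targetType: ip     # Pod IPs registered directly in the Target Group
```

`kubectl get targetgroupbinding -n ep-apps` should list one binding per exposed Service; an empty list points back at the webhook or reconciliation failures described above.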

7. Final Verification

[ec2-user@ip-xx-xxx-xx-xx ~]$ kubectl get ingress app-ingress -n ep-apps -o wide
NAME          CLASS   HOSTS   ADDRESS                                                        PORTS   AGE
app-ingress   alb     *       k8s-eptest-erfs423536-xxxxxxxxxx.us-west-2.elb.amazonaws.com   80      10h

8. Validation Commands

kubectl get ingress -A
kubectl get endpoints -A
kubectl logs -n kube-system deployment/aws-load-balancer-controller
kubectl get targetgroupbinding -A


9. Final Outcome

✅ ALB created successfully
✅ Target Group registered Pod IPs
✅ Health checks passed
✅ Ingress ADDRESS populated

✅ Application accessible externally over HTTPS

10. Best Practices Checklist (Must Follow Every Time)

✅ Ingress Configuration Checklist

  • Environment-specific ingress group (dev/test/prod)
  • Valid target-type (ip or instance)
  • Correct service name and port
  • Health check path works from Pod

✅ ACM Certificate Checklist

  • Certificate status = ISSUED
  • Cert region = same as ALB
  • Domain matches DNS

✅ Subnet Checklist (CRITICAL)

For an internet-facing ALB:

  • Public subnets
  • Route to an Internet Gateway
  • Tags: kubernetes.io/role/elb=1 and kubernetes.io/cluster/<cluster-name>=shared

✅ Security Group Checklist (IP Mode)

  • ALB SG allows inbound 80/443
  • Node SG allows inbound from ALB SG on container port
  • No restrictive NACLs

✅ Controller Health Checklist

  • aws-load-balancer-controller pods Running
  • No webhook timeouts in controller logs
  • TargetGroupBinding objects created

11. Key Learnings

  • ALB IP mode requires explicit SG permissions
  • Broken webhooks can silently block target registration
  • Ingress ADDRESS updates only after full reconciliation
  • Always validate subnet tags before troubleshooting ALB

12. Conclusion

Ingress with ALB provides a powerful, scalable, and production-ready way to expose applications in EKS.
However, it relies on tight integration between Kubernetes and AWS infrastructure, and misalignment at any layer can lead to hard‑to‑debug issues.

Following the checklists and best practices above will ensure:

  • Faster deployments
  • Predictable behavior
  • Reduced downtime
  • Easier troubleshooting

Happy Learning & Reliable Kubernetes! 🚀

Follow me on LinkedIn: www.linkedin.com/in/alok-shankar-55b94826
