1. Introduction
In Kubernetes, applications are typically exposed using Services (ClusterIP for internal access, NodePort for basic external access). However, for exposing applications externally in a scalable, secure, and cloud-native manner, Kubernetes provides the concept of Ingress.
What is Ingress?
Ingress is a Kubernetes API object that manages external HTTP/HTTPS access to services within a cluster. It provides:
- Layer‑7 routing (path‑based, host‑based)
- TLS termination
- Centralized traffic management
Ingress works in conjunction with an Ingress Controller, which implements the actual traffic routing logic.
Why Ingress instead of NodePort / LoadBalancer?
- NodePort exposes a high port on every node and offers no Layer-7 routing or TLS termination.
- A LoadBalancer Service provisions one cloud load balancer per Service, which becomes expensive and hard to manage as the number of services grows.
- Ingress consolidates many services behind a single load balancer with host- and path-based rules.
In AWS EKS, the recommended production approach is Ingress with an AWS Application Load Balancer (ALB), provisioned by the AWS Load Balancer Controller.
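As a reference point, a minimal ALB-backed Ingress looks roughly like this (the names and annotation values here are illustrative, not the exact manifest from this incident):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: ep-apps
  annotations:
    # internet-facing ALB; "ip" mode registers Pod IPs directly in the Target Group
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
```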
Incident Overview
I encountered an issue in the EKS environment where an application became inaccessible from outside the cluster.
Although the application pods were running and Kubernetes services were healthy, external users were unable to access the application URL.
Upon investigation, it was observed that the Ingress resource was created successfully, but the ADDRESS field of the Ingress remained empty (null).
As a result, no valid Load Balancer endpoint was available to route external traffic to the application.
This issue closely resembled a production outage scenario, as it directly impacted external traffic routing despite the application itself being operational.
```
[ec2-user@ip-xx-xxx-xxx-xx ~]$ kubectl get ingress app-ingress -n ep-apps -o wide
NAME          CLASS   HOSTS   ADDRESS   PORTS   AGE
app-ingress   alb     *                 80      10h
```
Impact
- External users could not access the application.
- No ALB DNS was available from the Ingress.
- Target Group showed 0 registered targets.
- Application health appeared normal internally, which made the issue non-obvious at first glance.
Use Case
Business Scenario
- Application deployed in EKS.
- Needs to be exposed externally over HTTPS
- Uses path-based routing
- Requires container-level health checks.
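A sketch of an Ingress matching this use case (HTTPS, path-based routing, container-level health checks); the certificate ARN, paths, and service names are placeholders, not the production values:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: ep-apps
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    # HTTPS listener backed by an ACM certificate (placeholder ARN)
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:<account-id>:certificate/<cert-id>
    # Container-level health check performed by the ALB
    alb.ingress.kubernetes.io/healthcheck-path: /api/health
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          # Path-based routing: /api traffic to one service, the rest to another
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: web-query-service
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-apps
                port:
                  number: 80
```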
2. Architecture Overview
High-Level Flow
External Client → DNS record (managed by external-dns) → ALB (provisioned by the AWS Load Balancer Controller) → Target Group (registered Pod IPs, target-type: ip) → Application Pods
3. Timeline of Events
- The application was deployed successfully in the EKS cluster.
- Pods were in Running state and passing readiness and liveness probes.
- Kubernetes Service (ClusterIP) showed valid endpoints.
- An ALB-backed Ingress was created to expose the application externally.
- Despite successful Ingress creation, the Ingress ADDRESS field remained empty.
- AWS Console showed an ALB and Target Group created, but the Target Group had zero registered targets.
- Because the Ingress did not publish an ADDRESS, application traffic could not reach the cluster.
- This resulted in an outage-like situation where the application was “up” internally but unreachable externally.
4. Initial Observation
At a high level, everything appeared correct:
- Pods were healthy.
```
[ec2-user@ip-xx-xxx-xxx-xx ~]$ kubectl get pods -A
NAMESPACE           NAME                                                              READY   STATUS    RESTARTS   AGE
amazon-cloudwatch   amazon-cloudwatch-observability-controller-manager-586c44c2cclk   1/1     Running   0          7h6m
amazon-cloudwatch   cloudwatch-agent-xxx                                              1/1     Running   0          6h41m
amazon-cloudwatch   cloudwatch-agent-xxxx                                             1/1     Running   0          6h41m
amazon-cloudwatch   fluent-bit-xxxx                                                   1/1     Running   0          6h41m
amazon-cloudwatch   fluent-bit-xxxx                                                   1/1     Running   0          6h41m
external-dns        external-dns-75f7b59749-dfkgn                                     1/1     Running   0          24h
ep-apps             condition-service-96475888c-bdmdn                                 1/1     Running   0          22h
ep-apps             web-query-service-78b5d4dcb7-nms56                                1/1     Running   0          23h
ep-apps             web-query-service-78b5d4dcb7-xlfj9                                1/1     Running   0          23h
ep-apps             web-apps-59658b6868-fkwvp                                         1/1     Running   0          22h
kube-system         aws-node-4xrsc                                                    2/2     Running   0          24h
kube-system         aws-load-balancer-controller-78bddb649b-w56d5                     1/1     Running   0          24h
kube-system         aws-load-balancer-controller-78bddb649b-z5s5g                     1/1     Running   0          24h
kube-system         aws-node-ncp5f                                                    2/2     Running   0          24h
```
- Service endpoints existed.
- Ingress configuration looked valid.
- ALB resources were present in AWS.
- However, traffic was not flowing due to the missing Ingress ADDRESS, indicating a failure in Ingress‑to‑ALB reconciliation.
5. Root Cause Analysis (What Went Wrong)
This issue was not a single problem, but a chain of configuration gaps.
Root Causes Identified
5.1 Ingress Group Conflict
- TEST ingress was using DEV group name.
- Caused ALB ownership conflict.
```
[ec2-user@ip-xx-xxx-xxx-xxx ~]$ kubectl describe ingress app-ingress -n ep-apps
Name:             app-ingress
Labels:           app=xxx
                  app.kubernetes.io/name=app-ingress
                  app.kubernetes.io/part-of=ep
Namespace:        ep-apps
Address:
Ingress Class:    alb
Default backend:  <default>
Rules:
  Host        Path  Backends
  ----        ----  --------
  *
              /     app:80 (xx.xx.xx.xxx:xxxx,xxx.xx.xx.xx.xxx:xxxx)
Annotations:  alb.ingress.kubernetes.io/group.name: app-dev   <-- DEV group name on the TEST ingress
              alb.ingress.kubernetes.io/group.order: 100
              alb.ingress.kubernetes.io/healthcheck-interval-seconds: 30
              alb.ingress.kubernetes.io/healthcheck-path: /api/health
```
5.2 ACM Certificate Issue
- The attached certificate was in PENDING_VALIDATION state.
- As a result, the ALB HTTPS listener could not be created.
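The certificate status can be checked directly with the AWS CLI (the region and ARN below are placeholders):

```shell
# Expect "ISSUED"; a PENDING_VALIDATION certificate blocks the HTTPS listener
aws acm describe-certificate \
  --region us-west-2 \
  --certificate-arn arn:aws:acm:us-west-2:<account-id>:certificate/<cert-id> \
  --query 'Certificate.Status' --output text
```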
5.3 Subnet Tagging Missing
- Public subnets lacked required tags
- ALB could not discover subnets correctly
5.4 Broken ALB Controller Webhook
- aws-load-balancer-webhook service had no endpoints
- Blocked creation of TargetGroupBinding
- Prevented Pod IP registration
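A quick way to confirm this symptom is to check whether the webhook Service has any endpoints (namespace and service name as used by the controller's default install):

```shell
# An empty ENDPOINTS column means admission calls to the webhook will time out
kubectl get endpoints aws-load-balancer-webhook-service -n kube-system
```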
5.5 Ingress Finalizer Stuck
- Failed reconciliation added finalizer
- Controller unable to clean up state
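The stuck finalizer can be inspected, and if the controller cannot recover it can be removed manually. Note this is a last-resort sketch: clearing a finalizer bypasses the controller's own cleanup.

```shell
# Inspect finalizers on the stuck Ingress
kubectl get ingress app-ingress -n ep-apps -o jsonpath='{.metadata.finalizers}'

# Last resort only: remove the finalizer so the resource can be reconciled/deleted
kubectl patch ingress app-ingress -n ep-apps --type=merge \
  -p '{"metadata":{"finalizers":[]}}'
```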
6. Solution Applied
Step-by-Step Resolution
6.1 Correct the Ingress Group
```
[ec2-user@ip-xx-xxx-xxx-xx ~]$ kubectl annotate ingress app-ingress -n ep-apps \
    alb.ingress.kubernetes.io/group.name=app-test \
    --overwrite
ingress.networking.k8s.io/app-ingress annotated
```
6.2 Use a Valid (ISSUED) ACM Certificate
```
[ec2-user@ip-xx-xxx-xxx-xx ~]$ kubectl annotate ingress app-ingress -n ep-apps \
    alb.ingress.kubernetes.io/certificate-arn=arn:aws:acm:us-west-2:xxxxxxxxxxx:certificate/xxxxxxxxxxxxxxxxxxxxxxx \
    --overwrite
ingress.networking.k8s.io/app-ingress annotated
```
6.3 Tag Public Subnets (Mandatory)
```
kubernetes.io/role/elb = 1
kubernetes.io/cluster/<cluster-name> = shared
```
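The tags can be applied via the AWS CLI; the subnet IDs below are placeholders:

```shell
# Tag the public subnets so the controller can discover them for an
# internet-facing ALB
aws ec2 create-tags \
  --resources subnet-aaaa subnet-bbbb \
  --tags Key=kubernetes.io/role/elb,Value=1 \
         Key=kubernetes.io/cluster/<cluster-name>,Value=shared
```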
6.4 Allow ALB → Node Traffic (Critical for IP Mode)
Add an inbound rule on the worker node security group allowing traffic from the ALB security group on the container port.
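As a sketch, the rule can be added with the AWS CLI (all IDs and the port are placeholders for your environment):

```shell
# Allow the ALB's security group to reach the container port on the nodes;
# required when target-type is "ip" and the ALB talks to Pod IPs directly
aws ec2 authorize-security-group-ingress \
  --group-id <node-sg-id> \
  --protocol tcp --port <container-port> \
  --source-group <alb-sg-id>
```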
6.5 Remove Broken ALB Webhook
Check the controller logs:
```
kubectl logs -n kube-system deployment/aws-load-balancer-controller --tail=200
```
```
{"level":"error","ts":"2026-04-15T04:30:28Z","msg":"Reconciler error","controller":"ingress","object":{"name":"ep-test"},"namespace":"","name":"ep-test","reconcileID":"7ea1f646-368e-473f-b6b4-cc0a76cf4785","error":"Internal error occurred: failed calling webhook \"mtargetgroupbinding.elbv2.k8s.aws\": failed to call webhook: Post \"https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=10s\": context deadline exceeded"}
{"level":"error","ts":"2026-04-15T04:30:32Z","msg":"Reconciler error","controller":"ingress","object":{"name":"search-query-service","namespace":"ep-apps"},"namespace":"ep-apps","name":"search-query-service","reconcileID":"970d799a-1982-4e78-9791-76daa6a54d4d","error":"Internal error occurred: failed calling webhook \"mtargetgroupbinding.elbv2.k8s.aws\": failed to call webhook: Post \"https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=10s\": context deadline exceeded"}
```
Solution:
```
[ec2-user@ip-xxx-xxxx-xxxx ~]$ kubectl get mutatingwebhookconfigurations
NAME                                                             WEBHOOKS   AGE
amazon-cloudwatch-observability-mutating-webhook-configuration   5          18h
aws-load-balancer-webhook                                        6          11h
pod-identity-webhook                                             1          6d21h
vpc-resource-mutating-webhook                                    1          6d21h

[ec2-user@ip-xxx-xxxx-xxxx ~]$ kubectl delete mutatingwebhookconfiguration aws-load-balancer-webhook
mutatingwebhookconfiguration.admissionregistration.k8s.io "aws-load-balancer-webhook" deleted
```
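Note that deleting the MutatingWebhookConfiguration is a recovery step, not a permanent fix. Assuming the controller was installed via its Helm chart, re-running the upgrade recreates the webhook with a working certificate (cluster name and values are placeholders):

```shell
helm upgrade -i aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=<cluster-name> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller
```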
6.6 Restart the Controller Deployment & Recreate the Ingress
Restart the ALB controller:
✅ This forces the controller to:
- Re‑build the model
- Create TargetGroupBinding
- Register pod IPs
- Update ingress status
```
[ec2-user@ip-xx-xxx-xxx ~]$ kubectl rollout restart deployment aws-load-balancer-controller -n kube-system
deployment.apps/aws-load-balancer-controller restarted
```
✅ 7. FINAL VERIFICATION
```
[ec2-user@ip-xx-xxx-xx-xx ~]$ kubectl get ingress app-ingress -n ep-apps -o wide
NAME          CLASS   HOSTS   ADDRESS                                                        PORTS   AGE
app-ingress   alb     *       k8s-eptest-erfs423536-xxxxxxxxxx.us-west-2.elb.amazonaws.com   80      10h
```
✅ 8. Validation Commands
```
kubectl get ingress -A
kubectl get endpoints -A
kubectl logs -n kube-system deployment/aws-load-balancer-controller
kubectl get targetgroupbinding -A
```
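To check the specific symptom from this incident, the Ingress status can be queried directly; an empty result means the ADDRESS has not been published yet (names match the Ingress used in this article):

```shell
# Prints the ALB hostname once reconciliation has completed
kubectl get ingress app-ingress -n ep-apps \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```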
9. Final Outcome
✅ ALB created successfully
✅ Target Group registered Pod IPs
✅ Health checks passed
✅ Ingress ADDRESS populated
✅ Application accessible externally over HTTPS
10. Best Practices Checklist (Must Follow Every Time)
✅ Ingress Configuration Checklist
- Environment-specific ingress group (dev/test/prod)
- Valid target-type (ip or instance)
- Correct service name and port
- Health check path works from Pod
✅ ACM Certificate Checklist
- Certificate status = ISSUED
- Cert region = same as ALB
- Domain matches DNS
✅ Subnet Checklist (CRITICAL)
For an internet-facing ALB:
- Public subnets
- Route to an Internet Gateway
- Tags: kubernetes.io/role/elb=1 and kubernetes.io/cluster/<cluster-name>=shared
✅ Security Group Checklist (IP Mode)
- ALB SG allows inbound 80/443
- Node SG allows inbound from ALB SG on container port
- No restrictive NACLs
✅ Controller Health Checklist
- aws-load-balancer-controller pods Running
- No webhook timeouts in controller logs
- TargetGroupBinding objects created
11. Key Learnings
- ALB IP mode requires explicit SG permissions
- Broken webhooks can silently block target registration
- Ingress ADDRESS updates only after full reconciliation
- Always validate subnet tags before troubleshooting ALB
12. Conclusion
Ingress with ALB provides a powerful, scalable, and production-ready way to expose applications in EKS.
However, it relies on tight integration between Kubernetes and AWS infrastructure, and misalignment at any layer can lead to hard‑to‑debug issues.
Following the checklists and best practices above will ensure:
- Faster deployments
- Predictable behavior
- Reduced downtime
- Easier troubleshooting
Happy Learning & Reliable Kubernetes! 🚀
Follow me on LinkedIn: www.linkedin.com/in/alok-shankar-55b94826




