Let’s remove the training wheels. We are going to break down exactly how these concepts operate mechanically under the hood, how bytes physically travel through an AWS environment, and what a Senior DevOps Engineer (6+ Years Experience) writes, debugs, and architects daily in production.
You cannot understand EKS Services without understanding how Pods get their IPs. Many self-managed Kubernetes clusters use an "overlay network" (such as Flannel, or Calico in VXLAN mode) that encapsulates Pod packets inside node-to-node packets. EKS does not do this by default. It uses the Amazon VPC CNI plugin, which works with native VPC networking.
Under the Hood
Every Pod is a first-class citizen in your AWS VPC. It gets a real, routable IP address pulled directly from your AWS Subnet's CIDR block.
- The aws-node DaemonSet: Runs on every worker node. It consists of two components: the CNI Plugin (which wires up network interfaces) and the IPAMD (IP Address Management Daemon).
- Warm Pools: ipamd keeps a pool of Elastic Network Interfaces (ENIs) and secondary IPv4 addresses pre-attached to your EC2 worker nodes so that when a Pod schedules, it gets an IP instantly.
+--------------------------------------------------------------+
| Worker Node (EC2 Instance) |
| [Primary ENI (eth0)] -> Host Node IP (10.0.1.50) |
| |
| [Secondary ENI (eth1)] |
| |-- Secondary IP 1 -> Assigned to Pod A (10.0.1.61) |
| |-- Secondary IP 2 -> Assigned to Pod B (10.0.1.62) |
+--------------------------------------------------------------+
Senior Architectural Engineering: The IP Exhaustion Problem
Every EC2 instance size has a hard limit on how many ENIs and secondary IPs it can host. For example, a t3.medium can attach 3 ENIs, and each ENI can hold 6 IPs.
$$\text{Max Pods} = (\text{ENIs} \times (\text{IPs per ENI} - 1)) + 2$$
A t3.medium maxes out at 17 Pods. If your subnet is small (e.g., a /24), a few large nodes will completely consume your subnet's IP addresses, preventing scaling.
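The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, the t3.medium figures come from the text, and the subnet estimate assumes the worst case where every node fills every slot on every ENI.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5

def max_pods(enis, ips_per_eni):
    """Max Pods = ENIs * (IPs per ENI - 1) + 2, as in the formula above."""
    # One address per ENI is the ENI's own primary IP and is not given to a Pod.
    return enis * (ips_per_eni - 1) + 2

def nodes_until_subnet_full(subnet_cidr, enis, ips_per_eni):
    """Worst-case count of nodes a subnet can hold if each node fills every ENI slot."""
    usable = ipaddress.ip_network(subnet_cidr).num_addresses - AWS_RESERVED_PER_SUBNET
    per_node = enis * ips_per_eni  # primary addresses included
    return usable // per_node

print(max_pods(3, 6))                                # 17
print(nodes_until_subnet_full("10.0.1.0/24", 3, 6))  # 13
```

Under these assumptions a /24 is exhausted by roughly a dozen t3.medium nodes, and by fewer nodes of any larger instance type.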
Senior Solutions implemented at 6+ years:
- Prefix Delegation: Instead of allocating individual secondary /32 IPs, ipamd allocates entire /28 blocks (16 IPs) to the ENI. This increases pod density per node dramatically (up to the K8s recommended 110 pods per node).
- Custom Networking: You configure the VPC CNI to assign Pod IPs from a separate secondary VPC CIDR block (e.g., 100.64.0.0/16 from the CGNAT range) that is not routed on the corporate network, saving your primary corporate subnet IPs for the actual EC2 nodes.
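To see why prefix delegation changes the density picture, the sketch below reuses the earlier Max Pods formula but treats each ENI slot as holding a /28 (16 addresses) instead of a single IP. The function names are hypothetical, the 110 cap is the per-node figure quoted above, and the exact limit AWS recommends for a given instance type should be checked against its own tooling.

```python
import ipaddress

PREFIX_SIZE = 16       # addresses in one /28 prefix
RECOMMENDED_CAP = 110  # per-node Pod figure quoted in the text

def raw_capacity_with_prefixes(enis, ips_per_eni):
    """Same shape as the Max Pods formula, with each slot holding a /28."""
    return enis * (ips_per_eni - 1) * PREFIX_SIZE + 2

def max_pods_with_prefixes(enis, ips_per_eni, cap=RECOMMENDED_CAP):
    """Address capacity is no longer the bottleneck, so the cap applies."""
    return min(raw_capacity_with_prefixes(enis, ips_per_eni), cap)

# A /24 subnet contains sixteen /28 blocks that ipamd could be handed.
blocks = list(ipaddress.ip_network("10.0.1.0/24").subnets(new_prefix=28))

print(raw_capacity_with_prefixes(3, 6))  # 242
print(max_pods_with_prefixes(3, 6))      # 110
print(len(blocks), blocks[0])            # 16 10.0.1.0/28
```

Note the trade-off visible in the last line: a /24 only has sixteen /28 blocks, so prefix delegation raises per-node density but does not create more subnet address space.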
2. ClusterIP & kube-proxy Core Mechanics
When you define a ClusterIP service, Kubernetes creates a stable virtual IP address. But this IP does not exist on any physical network card. It is a ghost IP.
The Linux Kernel Data Path (iptables vs IPVS)
Every node runs a daemon called kube-proxy. It watches the Kubernetes API server for new Services and EndpointSlices (the real IPs of the backend pods matching your service selector).
[Pod A] ---> Tries to talk to ClusterIP (10.100.0.15:80)
|
(Linux Kernel intercept via Netfilter)
|
[iptables / IPVS Rules Engine]
|
(Changes Destination IP via DNAT)
|
v
[Pod B Real IP (10.0.1.62:8080)]
- iptables Mode (Default): kube-proxy writes rules that are evaluated sequentially, O(N), inside the Linux kernel's Netfilter stack. When a packet leaves a pod targeting a ClusterIP, the kernel intercepts it, executes a DNAT (Destination Network Address Translation), and swaps the ClusterIP with a randomly selected healthy Pod IP.
- The 6-Year Gotcha: At large scales (over 5,000 services), iptables causes massive CPU overhead because the first packet of every new connection must traverse a massive, sequential list of rules (later packets reuse the connection-tracking entry).
- Production Fix: Senior engineers switch kube-proxy to IPVS (IP Virtual Server) mode. IPVS uses an in-kernel hash table with O(1) lookups, allowing lookup times to remain essentially flat regardless of how many thousands of microservices exist in the cluster.
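The difference between the two modes is essentially a list scan versus a dictionary lookup. The toy model below is not how kube-proxy or the kernel is implemented (real iptables chains are nested, and the Service IPs and backend names here are invented); it only illustrates how the number of rules checked grows in one case and stays constant in the other.

```python
def linear_match(rules, dst):
    """Walk the rules in order, like a sequential chain; return (backend, rules_checked)."""
    for checked, (cluster_ip, backend) in enumerate(rules, start=1):
        if cluster_ip == dst:
            return backend, checked
    return None, len(rules)

def hash_match(table, dst):
    """Single hash-table lookup, the IPVS-style O(1) case."""
    return table.get(dst), 1

# 5,000 made-up Services, each mapped to one made-up backend.
rules = [(f"10.100.{i // 256}.{i % 256}", f"backend-{i}") for i in range(5000)]
table = dict(rules)
target = rules[-1][0]  # worst case for the linear scan: the last rule

print(linear_match(rules, target))  # ('backend-4999', 5000)
print(hash_match(table, target))    # ('backend-4999', 1)
```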
3. NodePort: The Multi-Hop Bridge
A NodePort service opens the same port, taken from the default range 30000-32767, on every worker node.
The Hidden Packet Flow
If an external client hits Node-1-IP:32145, the traffic path looks like this:
- Packet arrives at Node 1.
- Node 1's iptables catches port 32145 and maps it internally to the corresponding ClusterIP.
- The rule randomly selects a backend pod. If that pod happens to live on Node 2, Node 1 performs an SNAT (Source NAT) and forwards the packet across the AWS network to Node 2.
- Node 2 delivers it to the Pod.
Senior Structural Problem: externalTrafficPolicy
Notice the extra network hop between Node 1 and Node 2. This increases latency and erases the client's real IP address (the pod sees Node 1's IP as the source).
Senior engineers modify the service manifest:
spec:
  type: NodePort
  externalTrafficPolicy: Local # <--- CRITICAL
- Local policy: Forces the node that receives the traffic to only route it to pods living on that exact same node. If no local pods exist, the packet is dropped. This preserves the original Client IP and removes the inter-node network hop.
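The behavioural difference between the two policies can be modelled in a few lines. This is a simplified sketch, not kube-proxy's logic: the node names, node IPs, and the route_nodeport helper are hypothetical, and the default policy is modelled the way the text describes it, with the Pod seeing the receiving node's IP as the source.

```python
import random

# Hypothetical node names and IPs, used only for this illustration.
NODE_IPS = {"node-1": "10.0.1.50", "node-2": "10.0.2.50"}

def route_nodeport(receiving_node, pods_by_node, client_ip, policy="Cluster"):
    """Return (pod, source_ip_seen_by_pod), or None when the packet is dropped."""
    if policy == "Local":
        local_pods = pods_by_node.get(receiving_node, [])
        if not local_pods:
            return None  # no backend on this node: the packet is dropped
        return random.choice(local_pods), client_ip  # client IP preserved
    # Default policy: any backend in the cluster may be chosen, and the Pod
    # sees the receiving node as the source because of SNAT.
    all_pods = [pod for pods in pods_by_node.values() for pod in pods]
    if not all_pods:
        return None
    return random.choice(all_pods), NODE_IPS[receiving_node]

pods = {"node-1": [], "node-2": ["pod-b"]}
print(route_nodeport("node-1", pods, "203.0.113.7"))           # ('pod-b', '10.0.1.50')
print(route_nodeport("node-1", pods, "203.0.113.7", "Local"))  # None
print(route_nodeport("node-2", pods, "203.0.113.7", "Local"))  # ('pod-b', '203.0.113.7')
```

The second call is the operational catch: with Local, whatever sits in front of the nodes must health-check them so that traffic is only sent to nodes that actually host a backend Pod.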
4. Ingress & AWS Load Balancer Controller (Enterprise Tier)
An Ingress is a collection of Layer 7 (Application Layer) routing rules. In EKS, you deploy the AWS Load Balancer Controller, an open-source operator that sits in your cluster, watches for Ingress objects, and calls AWS APIs to create an Application Load Balancer.
Architectural Deep Dive: Target-Type Modes
A Senior Engineer carefully chooses between two design modes using annotations:
alb.ingress.kubernetes.io/target-type: instance
The ALB targets the EC2 worker nodes using a NodePort.
- Path: Client $\rightarrow$ ALB $\rightarrow$ NodePort (EC2 Instance) $\rightarrow$ kube-proxy (iptables) $\rightarrow$ Pod IP.
- Cons: Double hopping, higher latency, complex health checking.
alb.ingress.kubernetes.io/target-type: ip
The ALB bypasses the EC2 instances completely and targets the Pods directly. This is only possible because the AWS VPC CNI gives Pods real VPC IPs.
- Path: Client $\rightarrow$ ALB $\rightarrow$ Pod IP directly.
- Pros: Blazing fast, zero kube-proxy interference, cleaner health checks, direct traffic pattern.
            [Internet Client]
                    |
                    v
                [AWS ALB]
                    |
        +-----------+-----------+   (Target Type: IP)
        |                       |
        v                       v
[Pod 1 (10.0.1.61)]     [Pod 2 (10.0.1.62)]
5. Egress Architecture & Security Boundaries
Managing outbound traffic is a massive part of auditing and compliance (PCI-DSS, SOC2).
The Infrastructure Layer
Pods live on nodes inside Private Subnets. When they call an external API (e.g., Salesforce, GitHub), the traffic passes from the Pod $\rightarrow$ ENI $\rightarrow$ Private Subnet Route Table $\rightarrow$ AWS NAT Gateway (living in a Public Subnet) $\rightarrow$ Internet.
- The NAT Gateway maps the internal IP to a single public Elastic IP (EIP).
The Senior Level Layer-7 Security Problem
Standard Kubernetes NetworkPolicies operate at Layer 3/4 (IP and Port). They cannot inspect domain names. If a malicious dependency slips into your application code, it can easily exfiltrate data to a domain like malicious-attacker.com over standard port 443, bypassing standard network policies.
Senior Design Implementations:
- Deploy Cilium utilizing eBPF to implement L7 network policies:
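apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: restrict-egress-to-stripe
spec:
  endpointSelector:
    matchLabels:
      app: payment-processor
  egress:
  # toFQDNs relies on Cilium's DNS proxy observing lookups, so DNS egress
  # to kube-dns must be allowed with an L7 dns rule.
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
  - toFQDNs:
    - matchName: "api.stripe.com" # <--- Only allow out to this domain
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP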
6. Real-World Troubleshooting Playbook for a Senior Engineer
When an application times out inside EKS, a Senior Engineer does not guess; they trace the network stack systematically.
              [Is DNS Resolving?]
                /             \
          (No) /               \ (Yes)
              v                 v
  Check CoreDNS Logs        [Can Pod contact ClusterIP?]
  Verify NodeLocal Cache        /             \
                          (No) /               \ (Yes)
                              v                 v
              Check kube-proxy rules    Check Ingress / ALB Targets
              Verify EndpointSlices     Verify Security Groups
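The decision tree above can also be written down as a small function, which is handy if you want to embed the same triage order in a runbook script. The function name and its two boolean inputs are illustrative; how you actually determine each answer (for example with nslookup or curl from inside the Pod) is left to you.

```python
def next_checks(dns_resolves: bool, clusterip_reachable: bool) -> list[str]:
    """Return the next things to inspect, following the tree top-down."""
    if not dns_resolves:
        return ["Check CoreDNS logs", "Verify NodeLocal cache"]
    if not clusterip_reachable:
        return ["Check kube-proxy rules", "Verify EndpointSlices"]
    return ["Check Ingress / ALB targets", "Verify Security Groups"]

print(next_checks(dns_resolves=False, clusterip_reachable=False))
print(next_checks(dns_resolves=True, clusterip_reachable=False))
print(next_checks(dns_resolves=True, clusterip_reachable=True))
```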
1. "My Ingress returns a 502 Bad Gateway"
- Senior Action: Check the AWS ALB Target Group status via the AWS console or CLI. If targets are unhealthy, check the Kubernetes Pod readiness probes. When a container's readiness probe fails, the AWS Load Balancer Controller deregisters the Pod IP from the ALB Target Group; requests that reach a target while it is shutting down or closing connections surface as 502s. (A target group with no registered targets returns 503 instead.)
- Security Group Check: Ensure the security group on the worker nodes or Pods allows inbound traffic from the ALB's security group on the application port.
2. "Intermittent DNS Resolution Timeouts (5-second delays)"
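- Senior Action: This is usually the well-known Linux connection-tracking (conntrack) race rather than an application bug. glibc sends the A and AAAA queries in parallel over the same UDP socket, the kernel occasionally drops one of them while creating the conntrack entry, and the resolver waits out its 5-second timeout before retrying. The default ndots:5 setting makes it worse by multiplying the number of lookups per name.
- Resolution: Deploy NodeLocal DNSCache as a DaemonSet to answer DNS lookups locally on the node via a link-local address (169.254.20.10), which avoids the DNAT and connection-tracking path where the race occurs.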
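The ndots setting mentioned above matters because it decides how many names the resolver tries before the one you asked for. The sketch below mimics that ordering in plain Python; the search list is a typical Pod resolv.conf for the default namespace and is an assumption, as is the helper name.

```python
def candidate_queries(name, search, ndots=5):
    """Names a stub resolver would try, in order, for a lookup of `name`."""
    if name.endswith("."):
        return [name]  # already fully qualified: no search list applied
    searched = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + searched  # enough dots: try the name as-is first
    return searched + [absolute]      # too few dots: walk the search list first

# Assumed search list for a Pod in the "default" namespace.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for query in candidate_queries("api.stripe.com", SEARCH):
    print(query)
print(candidate_queries("api.stripe.com.", SEARCH))  # ['api.stripe.com.']
```

With these assumptions an external name such as api.stripe.com produces four candidate names, each queried for both A and AAAA records, so every extra lookup is another chance to hit the timeout. Writing the name with a trailing dot skips the search list entirely.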
3. "The Pod can't connect to an AWS RDS Database outside the cluster"
- Senior Action:
  - Run kubectl get pod -o wide to determine the Pod's actual IP.
  - Check the AWS Security Group assigned to the RDS instance. Ensure it allows ingress from the Pod's specific IP block (or the Security Group assigned directly to the Pod if using Security Groups for Pods via Branch ENIs).
  - Verify that the route tables for the EKS nodes' subnets have a route to the VPC subnets hosting the database.
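Once you have the Pod IP and the CIDR ranges allowed by the RDS security group, the membership check itself is easy to script with Python's standard ipaddress module. The helper name and the example addresses are illustrative; in practice you would paste in the values from kubectl and from the security group's inbound rules.

```python
import ipaddress

def pod_ip_allowed(pod_ip, allowed_cidrs):
    """True if the Pod IP falls inside any CIDR the security group allows."""
    ip = ipaddress.ip_address(pod_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)

# Example values only: a Pod in the node subnet, and a Pod in a secondary CIDR.
print(pod_ip_allowed("10.0.1.61", ["10.0.1.0/24"]))   # True
print(pod_ip_allowed("100.64.3.7", ["10.0.1.0/24"]))  # False
```

The second case is the common trap with custom networking: Pods draw addresses from the secondary CIDR, so a rule that only allows the node subnets silently excludes them.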