Understand how ClusterIP routes traffic in Kubernetes, learn from real-world production failures, and fix them like a pro. This guide covers everything from kube-proxy internals to service misconfigurations and observability.
ClusterIP is the default service type in Kubernetes. It enables intra-cluster communication by exposing a virtual IP address that distributes traffic to backend pods.
But when things break in production, understanding how ClusterIP works internally is essential to resolving outages fast. This guide walks you through the ClusterIP traffic flow and shares two real-world production issues, including how they were detected and fixed.
How ClusterIP Services Route Traffic Internally
Here’s the step-by-step path of a packet routed through a ClusterIP:
1. Client (e.g., frontend pod or app)
2. DNS Query → CoreDNS responds with ClusterIP
3. Client sends request to ClusterIP:Port
4. Kube-proxy intercepts traffic (iptables/ipvs)
5. Kube-proxy chooses Pod IP from Endpoints list
6. Packet routed via node bridge (e.g., cni0)
7. Traffic enters pod’s veth interface
8. Container receives traffic on targetPort
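To make steps 3 and 8 concrete, here is a minimal sketch of a ClusterIP Service. The name, label, and port numbers are illustrative, not taken from a real cluster:

```yaml
# Minimal ClusterIP Service sketch; names, labels, and ports are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: my-app               # DNS name: my-app.<namespace>.svc.cluster.local
spec:
  type: ClusterIP            # the default type; shown explicitly for clarity
  selector:
    app: my-app              # must match the pod labels (see the case studies below)
  ports:
    - protocol: TCP
      port: 80               # step 3: clients send requests to ClusterIP:80
      targetPort: 8080       # step 8: the container receives traffic on this port
```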
This routing works seamlessly—until it doesn’t.
Real-World Debug Case Study #1: The Mystery of the Missing Traffic
The Problem
A development team deployed a Node.js app and exposed it via a ClusterIP service.
DNS was resolving.
Pods were healthy.
But traffic wasn’t reaching the app.
Investigation
$ kubectl get svc my-app
NAME TYPE CLUSTER-IP PORT(S) AGE
my-app ClusterIP 10.96.0.10 80/TCP 3m
$ kubectl get endpoints my-app
NAME ENDPOINTS AGE
my-app <none> 3m
Endpoints were empty!
Let’s inspect pod labels:
$ kubectl get pods --show-labels
my-app-5db77c68c5-x8b4z Running ... app=node-service
But the Service was expecting:
selector:
  app: my-app
Root Cause
Label mismatch. The service selector didn’t match any pods, so no endpoints were created, and traffic had nowhere to go.
Fix
kubectl label pod my-app-5db77c68c5-x8b4z app=my-app --overwrite
Now the Endpoints object populated and traffic began flowing.
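Keep in mind that labeling a pod by hand is only a stopgap: pods created by a Deployment take their labels from the pod template, so the next rollout would recreate the mismatch. The durable fix is to keep the Service selector and the Deployment's pod template labels aligned. Here is a sketch with illustrative names and a placeholder image:

```yaml
# Service selector and Deployment pod-template labels kept in sync (illustrative names).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app            # the Deployment's own selector
  template:
    metadata:
      labels:
        app: my-app          # must match the Service selector below
    spec:
      containers:
        - name: my-app
          image: node:20-alpine   # placeholder image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app              # matches the pod template labels above
  ports:
    - port: 80
      targetPort: 8080
```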
Real-World Debug Case Study #2: Backend Unreachable
Symptom
Frontend service couldn’t reach the backend service via ClusterIP.
Investigation
kubectl get svc backend-svc -o wide
kubectl get endpoints backend-svc
Root Cause
# Service selector:
selector:
  app: backend

# Pod label:
app: backend-v2
The mismatch meant the Endpoints object had no IPs, so kube-proxy had no targets to route to.
Fix
kubectl label pods <backend-pod-name> app=backend
Once fixed, traffic routed correctly again.
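Once the labels line up, the Endpoints object should list real pod IPs. Roughly what a healthy object looks like if you run `kubectl get endpoints backend-svc -o yaml` (the IPs and port here are made up):

```yaml
# Sketch of a populated Endpoints object; addresses are illustrative.
apiVersion: v1
kind: Endpoints
metadata:
  name: backend-svc
subsets:
  - addresses:
      - ip: 10.244.1.23      # pod IP of a ready backend pod
      - ip: 10.244.2.17
    ports:
      - port: 8080           # the Service's targetPort
        protocol: TCP
```

On newer clusters kube-proxy consumes EndpointSlices, but `kubectl get endpoints` is still a quick sanity check that the selector matches something.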
Why Do ClusterIP Issues Arise?
| Cause | Impact |
| ------------------- | ------------------------------------------------------------------ |
| ❌ Label mismatch | No endpoints are created, so traffic has nowhere to go |
| 🔁 Rolling updates | New pods can receive traffic before they can serve unless a readinessProbe is set |
| 🛑 NetworkPolicies | Can silently drop ClusterIP traffic |
| ⚠️ Hairpin Mode | A pod calling its own Service's ClusterIP may fail without hairpin mode |
| 🚨 kube-proxy crash | Traffic is never routed; iptables/IPVS rules go stale |
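On the NetworkPolicy row: if a default-deny policy is in place, ClusterIP traffic is dropped unless an allow rule matches the backend pods, because policies are evaluated on the pod-to-pod traffic after kube-proxy's DNAT. A minimal sketch of an allow rule, with illustrative labels and port:

```yaml
# Allow frontend pods to reach backend pods on the Service's targetPort.
# All names, labels, and ports here are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend           # the policy applies to the backend pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080         # the targetPort the container listens on
```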
Observability: How to Monitor ClusterIP Behavior
Use Prometheus metrics to detect issues like traffic drops or missing endpoints:
- kube_endpoint_address_not_ready
- kube_service_spec_type
- kubeproxy_sync_proxy_rules_duration_seconds
- container_network_transmit_errors_total
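As one way to turn these metrics into alerts, here is a sketch of a Prometheus rule that fires when an Endpoints object has not-ready addresses. It assumes kube-state-metrics is being scraped; metric names and thresholds vary by version, so adjust to your setup:

```yaml
# Sketch of a Prometheus alerting rule (assumes kube-state-metrics; adjust names/thresholds).
groups:
  - name: clusterip-health
    rules:
      - alert: EndpointsNotReady
        expr: kube_endpoint_address_not_ready > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Endpoints {{ $labels.endpoint }} in {{ $labels.namespace }} has not-ready addresses"
```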
Grafana Panels:
1. Endpoint readiness timeline
2. CoreDNS response latency
3. Kube-proxy rule sync duration histogram
Best Practices to Avoid ClusterIP Failures
| Practice | Benefit |
| --------------------------------- | ------------------------------------- |
| ✔️ Use `readinessProbe` | Avoid routing traffic to unready pods |
| ✔️ Monitor `Endpoints` objects | Detect misconfigurations quickly |
| ✔️ Audit NetworkPolicies | Prevent accidental traffic drops |
| ✔️ Enable kube-proxy alerts | Spot sync failures or crash loops |
| ✔️ Align pod labels and selectors | Prevent broken service routing |
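For the readinessProbe row, a minimal sketch of a container spec fragment: the health path, port, and image are assumptions about your app, so point them at a real health endpoint.

```yaml
# Fragment of a pod spec with a readinessProbe (path, port, and image are illustrative).
containers:
  - name: my-app
    image: node:20-alpine    # placeholder image
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /healthz       # hypothetical health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
```

Until the probe passes, the pod's IP stays out of the Endpoints object, so kube-proxy never routes traffic to it.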
How to Simulate a ClusterIP Failure (For Testing)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: bad-service
spec:
  selector:
    app: non-existent
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
EOF
kubectl get svc bad-service
kubectl get endpoints bad-service
You'll see output like this:
NAME ENDPOINTS AGE
bad-service <none> 5s
Perfect for training and testing alerting pipelines.
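To then watch the Endpoints object go from `<none>` to populated, you could run a pod that carries the matching label; the pod name and image below are placeholders.

```yaml
# Hypothetical follow-up: a pod whose label matches bad-service's selector.
# The Endpoints object populates based on labels and pod readiness, even though
# nothing here actually listens on targetPort 8080.
apiVersion: v1
kind: Pod
metadata:
  name: bad-service-backend
  labels:
    app: non-existent        # matches the selector of bad-service
spec:
  containers:
    - name: web
      image: nginx:1.25      # placeholder image
```

Re-running `kubectl get endpoints bad-service` should now show the pod's IP, which also lets you verify that your alert clears.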
ClusterIP Traffic Flow Diagram
[1] User/Pod/App
|
| --> Makes request to service DNS name (e.g., my-service.default.svc.cluster.local)
v
[2] CoreDNS
|
| --> Resolves DNS name to ClusterIP
| (Metric: coredns_dns_request_duration_seconds)
v
[3] ClusterIP Service
|
| --> Packet is addressed to the virtual IP:port
| --> No process listens on this IP; kube-proxy rules on the node handle it
v
[4] kube-proxy (running on node)
|
| --> Uses iptables/IPVS DNAT to redirect traffic to one of the pod IPs
| (Metric: kubeproxy_sync_proxy_rules_duration_seconds)
v
[5] CNI Plugin (e.g., Calico, Flannel, Cilium)
|
| --> Handles pod networking
| --> Sends packet through veth pair to correct pod
v
[6] Pod
|
| --> Container receives packet via eth0
| --> Application logic handles the request
v
[7] Response path (reversed) back to the client
ClusterIP might look like a magical black box—but it’s not. By understanding how services, kube-proxy, endpoints, and pod networking work together, you’ll debug traffic issues with confidence.