Understand how ClusterIP routes traffic in Kubernetes, learn from real-world production failures, and fix them like a pro. This guide covers everything from kube-proxy internals to service misconfigurations and observability.
ClusterIP is the default service type in Kubernetes. It enables intra-cluster communication by exposing a virtual IP address that distributes traffic to backend pods.
But when things break in production, understanding how ClusterIP works internally is essential to resolving outages fast. This guide walks you through the ClusterIP traffic flow and shares two real-world production issues, including how they were detected and fixed.
How ClusterIP Services Route Traffic Internally
Here’s the step-by-step path of a packet routed through a ClusterIP:
1. Client (e.g., frontend pod or app)
2. DNS Query → CoreDNS responds with ClusterIP
3. Client sends request to ClusterIP:Port
4. Kube-proxy intercepts traffic (iptables/ipvs)
5. Kube-proxy chooses Pod IP from Endpoints list
6. Packet routed via node bridge (e.g., cni0)
7. Traffic enters pod’s veth interface
8. Container receives traffic on targetPort
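To make steps 3 and 8 concrete, here is a minimal sketch of a ClusterIP Service. The name, label, and port numbers are illustrative, not taken from a real cluster:

```yaml
# Minimal ClusterIP Service sketch; names, labels, and ports are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: my-app               # DNS name: my-app.<namespace>.svc.cluster.local
spec:
  type: ClusterIP            # the default type; shown explicitly for clarity
  selector:
    app: my-app              # must match the pod labels (see the case studies below)
  ports:
    - protocol: TCP
      port: 80               # step 3: clients send requests to ClusterIP:80
      targetPort: 8080       # step 8: the container receives traffic on this port
```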
This routing works seamlessly—until it doesn’t.
Real-World Debug Case Study #1: The Mystery of the Missing Traffic
The Problem
A development team deployed a Node.js app and exposed it via a ClusterIP service.
DNS was resolving.
Pods were healthy.
But traffic wasn’t reaching the app.
Investigation
$ kubectl get svc my-app
NAME TYPE CLUSTER-IP PORT(S) AGE
my-app ClusterIP 10.96.0.10 80/TCP 3m
$ kubectl get endpoints my-app
NAME ENDPOINTS AGE
my-app <none> 3m
Endpoints were empty!
Let’s inspect pod labels:
$ kubectl get pods --show-labels
my-app-5db77c68c5-x8b4z Running ... app=node-service
But the Service was expecting:
selector:
  app: my-app
Root Cause
Label mismatch. The service selector didn’t match any pods, so no endpoints were created, and traffic had nowhere to go.
Fix
kubectl label pod my-app-5db77c68c5-x8b4z app=my-app --overwrite
Now the Endpoints object populated and traffic began flowing.
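Keep in mind that labeling a pod by hand is only a stopgap: pods created by a Deployment take their labels from the pod template, so the next rollout would recreate the mismatch. The durable fix is to keep the Service selector and the Deployment's pod template labels aligned. Here is a sketch with illustrative names and a placeholder image:

```yaml
# Service selector and Deployment pod-template labels kept in sync (illustrative names).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app            # the Deployment's own selector
  template:
    metadata:
      labels:
        app: my-app          # must match the Service selector below
    spec:
      containers:
        - name: my-app
          image: node:20-alpine   # placeholder image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app              # matches the pod template labels above
  ports:
    - port: 80
      targetPort: 8080
```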
Real-World Debug Case Study #2: Backend Unreachable
Symptom
Frontend service couldn’t reach the backend service via ClusterIP.
Investigation
kubectl get svc backend-svc -o wide
kubectl get endpoints backend-svc
Root Cause
# Service selector:
selector:
  app: backend

# Pod label:
app: backend-v2
The mismatch meant the Endpoints object had no IPs, so kube-proxy had no targets to route to.
Fix
kubectl label pods <backend-pod-name> app=backend
Once fixed, traffic routed correctly again.
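Once the labels line up, the Endpoints object should list real pod IPs. Roughly what a healthy object looks like if you run `kubectl get endpoints backend-svc -o yaml` (the IPs and port here are made up):

```yaml
# Sketch of a populated Endpoints object; addresses are illustrative.
apiVersion: v1
kind: Endpoints
metadata:
  name: backend-svc
subsets:
  - addresses:
      - ip: 10.244.1.23      # pod IP of a ready backend pod
      - ip: 10.244.2.17
    ports:
      - port: 8080           # the Service's targetPort
        protocol: TCP
```

On newer clusters kube-proxy consumes EndpointSlices, but `kubectl get endpoints` is still a quick sanity check that the selector matches something.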
Why Do ClusterIP Issues Arise?
| Cause | Impact |
| ------------------- | ------------------------------------------------------------------ |
| ❌ Label mismatch | No endpoints are created, so traffic has nowhere to go |
| 🔁 Rolling updates | New pods can receive traffic before they can serve unless a readinessProbe is set |
| 🛑 NetworkPolicies | Can silently drop ClusterIP traffic |
| ⚠️ Hairpin Mode | A pod calling its own Service's ClusterIP may fail without hairpin mode |
| 🚨 kube-proxy crash | Traffic is never routed; iptables/IPVS rules go stale |
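On the NetworkPolicy row: if a default-deny policy is in place, ClusterIP traffic is dropped unless an allow rule matches the backend pods, because policies are evaluated on the pod-to-pod traffic after kube-proxy's DNAT. A minimal sketch of an allow rule, with illustrative labels and port:

```yaml
# Allow frontend pods to reach backend pods on the Service's targetPort.
# All names, labels, and ports here are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend           # the policy applies to the backend pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080         # the targetPort the container listens on
```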
Observability: How to Monitor ClusterIP Behavior
Use Prometheus metrics to detect issues like traffic drops or missing endpoints:
- kube_endpoint_address_not_ready
- kube_service_spec_type
- kubeproxy_sync_proxy_rules_duration_seconds
- container_network_transmit_errors_total
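As one way to turn these metrics into alerts, here is a sketch of a Prometheus rule that fires when an Endpoints object has not-ready addresses. It assumes kube-state-metrics is being scraped; metric names and thresholds vary by version, so adjust to your setup:

```yaml
# Sketch of a Prometheus alerting rule (assumes kube-state-metrics; adjust names/thresholds).
groups:
  - name: clusterip-health
    rules:
      - alert: EndpointsNotReady
        expr: kube_endpoint_address_not_ready > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Endpoints {{ $labels.endpoint }} in {{ $labels.namespace }} has not-ready addresses"
```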
Grafana Panels:
1. Endpoint readiness timeline
2. CoreDNS response latency
3. Kube-proxy rule sync duration histogram
Best Practices to Avoid ClusterIP Failures
| Practice | Benefit |
| --------------------------------- | ------------------------------------- |
| ✔️ Use `readinessProbe` | Avoid routing traffic to unready pods |
| ✔️ Monitor `Endpoints` objects | Detect misconfigurations quickly |
| ✔️ Audit NetworkPolicies | Prevent accidental traffic drops |
| ✔️ Enable kube-proxy alerts | Spot sync failures or crash loops |
| ✔️ Align pod labels and selectors | Prevent broken service routing |
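For the readinessProbe row, a minimal sketch of a container spec fragment: the health path, port, and image are assumptions about your app, so point them at a real health endpoint.

```yaml
# Fragment of a pod spec with a readinessProbe (path, port, and image are illustrative).
containers:
  - name: my-app
    image: node:20-alpine    # placeholder image
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /healthz       # hypothetical health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
```

Until the probe passes, the pod's IP stays out of the Endpoints object, so kube-proxy never routes traffic to it.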
How to Simulate a ClusterIP Failure (For Testing)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: bad-service
spec:
  selector:
    app: non-existent
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
EOF
kubectl get svc bad-service
kubectl get endpoints bad-service
You'll see output like this:
NAME ENDPOINTS AGE
bad-service <none> 5s
Perfect for training and testing alerting pipelines.
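To then watch the Endpoints object go from `<none>` to populated, you could run a pod that carries the matching label; the pod name and image below are placeholders.

```yaml
# Hypothetical follow-up: a pod whose label matches bad-service's selector.
# The Endpoints object populates based on labels and pod readiness, even though
# nothing here actually listens on targetPort 8080.
apiVersion: v1
kind: Pod
metadata:
  name: bad-service-backend
  labels:
    app: non-existent        # matches the selector of bad-service
spec:
  containers:
    - name: web
      image: nginx:1.25      # placeholder image
```

Re-running `kubectl get endpoints bad-service` should now show the pod's IP, which also lets you verify that your alert clears.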
ClusterIP Traffic Flow Diagram
[1] User/Pod/App
|
| --> Makes request to service DNS name (e.g., my-service.default.svc.cluster.local)
v
[2] CoreDNS
|
| --> Resolves DNS name to ClusterIP
| (Metric: coredns_dns_request_duration_seconds)
v
[3] ClusterIP Service
|
| --> Packet is addressed to the virtual IP:port
| --> No process listens on this IP; kube-proxy rules on the node handle it
v
[4] kube-proxy (running on node)
|
| --> Uses iptables/IPVS DNAT to redirect traffic to one of the pod IPs
| (Metric: kubeproxy_sync_proxy_rules_duration_seconds)
v
[5] CNI Plugin (e.g., Calico, Flannel, Cilium)
|
| --> Handles pod networking
| --> Sends packet through veth pair to correct pod
v
[6] Pod
|
| --> Container receives packet via eth0
| --> Application logic handles the request
v
[7] Response path (reversed) back to the client
ClusterIP might look like a magical black box—but it’s not. By understanding how services, kube-proxy, endpoints, and pod networking work together, you’ll debug traffic issues with confidence.