DEV Community

Aisalkyn Aidarova

Kubernetes Networking — DevOps Production Guide

Big Picture (Traffic Reality)

User / Browser
      ↓
Ingress (optional)
      ↓
Service
      ↓
Endpoints
      ↓
Pod
      ↓
Container

If traffic fails, it always fails at one of these layers.


1️⃣ Pod Networking (Lowest Level)

What a Pod Is (Networking Perspective)

  • Each Pod gets one IP
  • Containers inside a Pod share:

    • Network namespace
    • localhost
  • Containers communicate via localhost

Key Rule

Pods are ephemeral. Their IPs must never be used directly.


How Pod Networking Breaks

Common Failures

  • Container not listening on correct port
  • App listening on 127.0.0.1 instead of 0.0.0.0
  • Readiness probe fails → Pod is removed from the Service's endpoints
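
The 127.0.0.1 versus 0.0.0.0 failure above is easy to reproduce outside a cluster. Here is a minimal Python sketch, assuming a plain TCP server (the `open_listener` helper is hypothetical, not part of any framework). A socket bound to 127.0.0.1 only accepts connections from inside the Pod's own network namespace, while one bound to 0.0.0.0 also accepts traffic that arrives on the Pod IP.

```python
import socket


def open_listener(host: str, port: int = 0) -> socket.socket:
    """Return a TCP socket listening on (host, port); port 0 lets the OS pick one."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


# Loopback only: reachable from containers in the same Pod, not via the Pod IP.
local_only = open_listener("127.0.0.1")
# All interfaces: reachable via the Pod IP, so a Service can forward traffic to it.
all_interfaces = open_listener("0.0.0.0")

print(local_only.getsockname()[0])
print(all_interfaces.getsockname()[0])

local_only.close()
all_interfaces.close()
```

Most web frameworks expose the same choice as a host or bind setting, so it is worth checking that setting before looking at the Service.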

Troubleshooting

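kubectl get pod                            # Pod status and restart count
kubectl describe pod <pod>                 # events, probe failures, container ports
kubectl logs <pod>                         # application output
kubectl exec -it <pod> -- netstat -tuln    # listening sockets (needs netstat in the image; ss -tuln is a common alternative)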

Interview Answer

Q: Can we access a Pod directly in production?
A: No. Pod IPs are dynamic and change whenever a Pod is recreated. Services provide a stable address in front of them.


2️⃣ ClusterIP Service (Most Used in Production)

What It Does

  • Internal load balancing
  • Stable virtual IP
  • DNS name inside cluster

Example DNS:

web-svc.default.svc.cluster.local

When It Is Used

  • Backend services
  • Microservice-to-microservice communication
  • Internal APIs

This is the default Service type and the most common one in production.


How It Works Internally

Service
  ↓ selector
Endpoints
  ↓
Pod IPs

Common Failure Scenarios

❌ Service Exists but No Traffic

kubectl get svc
kubectl get endpoints web-svc

If endpoints are empty → usually a selector mismatch (or no matching Pod is Ready)
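
The selector step can be sketched in a few lines of Python. This is a simplified model, assuming equality-based selectors with at least one key. The Pod names, IPs, and labels are hypothetical, and readiness is ignored here.

```python
def matches(selector: dict, labels: dict) -> bool:
    """A Pod matches when every selector key/value pair appears in its labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints_for(selector: dict, pods: list) -> list:
    """Return the IPs of Pods whose labels satisfy the selector."""
    return [pod["ip"] for pod in pods if matches(selector, pod["labels"])]


# Hypothetical Pods; the second one carries a typo in its label value.
pods = [
    {"name": "web-1", "ip": "10.0.0.11", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-2", "ip": "10.0.0.12", "labels": {"app": "wbe"}},
]

print(endpoints_for({"app": "web"}, pods))      # ['10.0.0.11']
print(endpoints_for({"app": "website"}, pods))  # []
```

Extra labels on a Pod do no harm. What matters is that every key/value pair in the selector is present on the Pod, which is why a single typo empties the endpoints list.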


❌ Pod Running but Not Reachable

Causes:

  • Wrong targetPort
  • App not listening
  • Readiness probe failing

How to Fix

  • Check labels:
kubectl get pods --show-labels
  • Compare with Service selector
  • Fix labels or selector

Pros / Cons

Pros

  • Stable
  • Internal only
  • Scales well

Cons

  • Not accessible outside cluster

Interview Answers

Q: Why is ClusterIP preferred in production?
A: It enforces internal-only access and works with Ingress for controlled exposure.


3️⃣ NodePort Service (Debug / Rare Production Use)

What It Does

  • Opens a port on every node
  • Traffic goes:
NodeIP:NodePort → Service → Pod

When It Is Used

  • Debugging
  • Learning
  • Temporary access

Rarely used in real production


Problems with NodePort

❌ Security Risk

  • Port open on all nodes

❌ Unstable

  • Node IPs change
  • Manual management

Troubleshooting NodePort

kubectl get svc
kubectl describe svc
kubectl get nodes -o wide

Test:

curl http://<NodeIP>:<NodePort>

Pros / Cons

Pros

  • Simple
  • No cloud dependency

Cons

  • Not scalable
  • Poor security
  • No TLS termination (the app must handle it)
  • No routing rules (no host or path routing)

Interview Answers

Q: Why not use NodePort in production?
A: It exposes every node directly, lacks routing and security controls, and doesn’t scale.


4️⃣ LoadBalancer Service (Cloud Managed Entry)

What It Does

  • Creates cloud load balancer (AWS / GCP / Azure)
  • Provides external IP
User → Cloud LB → Service → Pod

When It Is Used

  • Simple external exposure
  • Legacy systems
  • Small setups

Problems in Production

  • One LB per service (expensive)
  • No path-based routing
  • Limited flexibility

Troubleshooting

kubectl get svc
kubectl describe svc

Check cloud console:

  • Health checks
  • Security groups

Pros / Cons

Pros

  • Simple external access
  • Cloud-managed

Cons

  • Expensive
  • Limited routing
  • Not flexible

Interview Answers

Q: Why use Ingress instead of LoadBalancer?
A: Ingress provides routing, TLS, and multiple services behind one entry point.


5️⃣ Ingress (Real Production Standard)

What Ingress Is

  • HTTP/HTTPS routing layer
  • Requires Ingress Controller
  • One entry point for many services

Traffic Flow

User
 ↓
Ingress Controller (NGINX / ALB)
 ↓
Service
 ↓
Pod

What Ingress Solves

  • Path-based routing
  • Host-based routing
  • TLS termination
  • Canary / Blue-Green (controller-dependent, typically via annotations)
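
Host-based and path-based routing can be illustrated with a small Python model. This is only a sketch of prefix matching on whole path elements with the longest path winning. Real controllers differ in details, and the hostnames and Service names below are hypothetical.

```python
def path_matches(rule_path: str, request_path: str) -> bool:
    """Prefix match on whole path elements: /api matches /api and /api/v1, not /apiv1."""
    rule_parts = [part for part in rule_path.split("/") if part]
    request_parts = [part for part in request_path.split("/") if part]
    return request_parts[: len(rule_parts)] == rule_parts


def route(rules: list, host: str, path: str):
    """Return the backend Service for the longest matching path on this host, or None."""
    candidates = [
        rule for rule in rules
        if rule["host"] == host and path_matches(rule["path"], path)
    ]
    if not candidates:
        return None  # no rule matched: the typical source of a 404 from the controller
    return max(candidates, key=lambda rule: len(rule["path"]))["service"]


# Hypothetical rules: two Services under one domain, plus a second host.
rules = [
    {"host": "shop.example.com", "path": "/", "service": "web-svc"},
    {"host": "shop.example.com", "path": "/api", "service": "api-svc"},
    {"host": "admin.example.com", "path": "/", "service": "admin-svc"},
]

print(route(rules, "shop.example.com", "/api/orders"))  # api-svc
print(route(rules, "shop.example.com", "/cart"))        # web-svc
print(route(rules, "other.example.com", "/"))           # None
```

One entry point serves several Services because the routing decision is made per request, on host and path, before traffic reaches any Service.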

Common Failures

❌ Ingress Exists but No Traffic

Causes:

  • Ingress Controller not installed
  • Wrong service name
  • Wrong port

Check:

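kubectl get pods -n ingress-nginx    # assumes the ingress-nginx controller in its default namespace
kubectl describe ingress             # check rules, backends, and events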

❌ 404 from Ingress

  • Path mismatch
  • Service backend incorrect

Pros / Cons

Pros

  • Production-ready
  • Secure
  • Flexible
  • Scales well

Cons

  • Steeper learning curve
  • Controller dependency

Interview Answers

Q: How do you expose multiple services under one domain?
A: Use Ingress with path or host-based routing.


6️⃣ DNS in Kubernetes

How DNS Works

Service:

<service>.<namespace>.svc.cluster.local

DNS Failures

  • CoreDNS down
  • Service not created
  • Wrong namespace

Check:

kubectl get pods -n kube-system | grep dns

Interview Answers

Q: How do Pods find each other?
A: Through Kubernetes DNS, which resolves a Service name to its ClusterIP.
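
The naming scheme and the wrong-namespace failure can be sketched in Python. This is a simplified model, assuming the usual Pod search list (`<namespace>.svc.cluster.local`, `svc.cluster.local`, `cluster.local`) and the default `cluster.local` domain. The record table and ClusterIP are hypothetical.

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def search_candidates(name: str, caller_namespace: str, cluster_domain: str = "cluster.local") -> list:
    """Names a Pod's resolver would try, in order, for a non-absolute name."""
    suffixes = [
        f"{caller_namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    return [f"{name}.{suffix}" for suffix in suffixes] + [name]


def resolve(name: str, caller_namespace: str, records: dict):
    """Return the ClusterIP for the first candidate found in records, or None."""
    for candidate in search_candidates(name, caller_namespace):
        if candidate in records:
            return records[candidate]
    return None


# Hypothetical cluster state: one Service in the "default" namespace.
records = {service_fqdn("web-svc", "default"): "10.96.0.15"}

print(resolve("web-svc", "default", records))           # 10.96.0.15
print(resolve("web-svc", "payments", records))          # None (wrong namespace)
print(resolve("web-svc.default", "payments", records))  # 10.96.0.15
```

A short name only resolves from inside the Service's own namespace. Callers in other namespaces need at least `service.namespace`.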


7️⃣ Endpoint Failures (Most Common Production Bug)

Empty Endpoints = No Traffic

Reasons:

  • Selector mismatch
  • Pod not Ready
  • Label typo

Check:

kubectl get endpoints

8️⃣ How DevOps Responds to Network Incidents

Standard Debug Order

  1. Ingress
  2. Service
  3. Endpoints
  4. Pod
  5. Container

Never jump randomly.
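
The debug order above amounts to a top-down walk that stops at the first failing layer. Here is a minimal Python sketch of that walk. The check functions are hypothetical placeholders, and in practice each would wrap one of the kubectl checks from the earlier sections.

```python
LAYERS = ["Ingress", "Service", "Endpoints", "Pod", "Container"]


def first_failing_layer(checks: dict):
    """Walk the layers in order and return the first whose check fails, or None."""
    for layer in LAYERS:
        if not checks[layer]():
            return layer  # stop here: lower layers are not inspected yet
    return None


# Hypothetical incident: everything looks fine except the endpoints list is empty.
checks = {
    "Ingress": lambda: True,
    "Service": lambda: True,
    "Endpoints": lambda: False,
    "Pod": lambda: True,
    "Container": lambda: True,
}

print(first_failing_layer(checks))  # Endpoints
```

Stopping at the first failure keeps the investigation focused on one layer before moving further down.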


Example Incident

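Symptom: Pods are Running, but the browser shows a blank page
Root Cause: Service selector does not match the Pod labels, so the endpoints are empty
Fix: Align the Pod labels with the Service selector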


9️⃣ Interview Cheat Sheet (Quick)

Question              | Correct Answer
----------------------|----------------------------
Most used Service     | ClusterIP
External prod traffic | Ingress
Why not NodePort      | Security & scalability
Pod IP usage          | Never directly
Empty endpoints       | Selector or readiness issue

Final Reality

Kubernetes networking failures are:

  • Not magic
  • Always observable
  • Always layered

Understanding traffic flow is the difference between:

  • Junior DevOps
  • Production-ready DevOps
