Aisalkyn Aidarova

Posted on Jun 15

Lecture: Architectural Masterclass — Kubernetes Networking In-Depth

Kubernetes networking operates on a fundamental principle: every Pod gets its own IP address, unique within the cluster and reachable from every other Pod. In traditional infrastructure, multiple applications on a single server share one IP address and compete for ports (e.g., App A uses port 8080, so App B must use 8081). In Kubernetes, every Pod behaves like a distinct physical server or virtual machine on the network, which eliminates port conflicts between Pods.

1. The Four Layers of Kubernetes Networking

To understand how data flows through a cluster, we must break networking down into four distinct communication boundaries.

Layer 1: Container-to-Container Networking (Within the Same Pod)

The Mechanism: All containers inside a single Pod share the exact same network namespace. This means they share the same IP address, MAC address, and port space.
How they talk: They communicate with each other over the local loopback interface (localhost).
Real-World Use Case: A frontend application container talks to a local logging sidecar container on localhost:9000.
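
The shared network namespace can be illustrated with a two-container Pod. This is a minimal sketch: the Pod name, image names, and the LOG_ENDPOINT variable are hypothetical placeholders, and only the port 9000 comes from the use case above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend-with-logger        # hypothetical name
spec:
  containers:
  - name: frontend
    image: example.com/frontend:1.0 # hypothetical image
    env:
    # Both containers share one network namespace, so the sidecar
    # is reachable on localhost rather than through a Pod IP.
    - name: LOG_ENDPOINT            # hypothetical variable read by the app
      value: "http://localhost:9000"
  - name: log-sidecar
    image: example.com/log-agent:1.0 # hypothetical image
    ports:
    - containerPort: 9000           # the sidecar listens here
```

Because the port space is shared, the two containers cannot both bind port 9000; the conflict-free guarantee applies between Pods, not between containers inside one Pod.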

Layer 2: Pod-to-Pod Networking (Same Node vs. Across Nodes)

The foundational rule of Kubernetes networking is that any Pod must be able to communicate with any other Pod without utilizing Network Address Translation (NAT), regardless of which machine they live on.

On the Same Node: The virtual network interfaces (veth pairs) of the Pods are plugged into a local virtual bridge (like cbr0 or docker0) running on the host OS. Traffic flows across the bridge directly from one Pod's virtual interface to another.
Across Different Nodes: The Container Network Interface (CNI) plugin makes Pod IPs reachable between nodes in one of two ways. It either builds an overlay network (an encapsulated tunnel using protocols like VXLAN or Geneve; encapsulation alone does not encrypt traffic) or routes Pod traffic natively, for example by advertising Pod subnets over BGP. In overlay mode, when Pod A on Node 1 sends a packet to Pod B on Node 2, the packet is wrapped inside an ordinary host-to-host packet, carried across the underlying data center network, and unwrapped on the destination node. In native routing mode, the packet travels unencapsulated and the underlying network routes it by its Pod IP.

2. Deep Dive into the Container Network Interface (CNI)

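Kubernetes does not have a built-in network provider. Instead, it relies on plugins that implement the Container Network Interface (CNI), a specification maintained as a separate CNCF project. When a Pod is created, the kubelet asks the container runtime (such as containerd or CRI-O) to set up the Pod sandbox, and the runtime invokes the configured CNI plugin to provision a virtual network interface and assign an IP address.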

As an engineer, choosing the right CNI determines cluster performance, security, and scalability:

Flannel: A lightweight overlay network provider. It uses VXLAN encapsulation by default. It is simple to configure but does not enforce Network Policies.
Calico: An enterprise-grade provider that can route packets natively at Layer 3 using BGP (Border Gateway Protocol), avoiding encapsulation overhead; it also supports IP-in-IP and VXLAN encapsulation where the underlying network cannot carry Pod routes. It features a robust implementation of Kubernetes Network Policies.
Cilium: A widely adopted eBPF-based provider. Cilium uses eBPF (Extended Berkeley Packet Filter) programs inside the Linux kernel for its datapath and can replace kube-proxy's iptables rules entirely. This avoids long iptables chain traversals and provides flow-level observability (through Hubble), Layer 7-aware policies, and optional transparent encryption between nodes.
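
The choice between encapsulation and native routing is often a single configuration switch. As a hedged illustration, the following Calico IPPool sketch disables both encapsulation modes so that Pod traffic is routed natively. The pool name and CIDR are assumptions, and the resource is applied with Calico's own tooling (calicoctl or the Calico API server), not as a core Kubernetes object.

```yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool   # assumed pool name
spec:
  cidr: 10.244.0.0/16         # assumed Pod CIDR; must match your cluster
  ipipMode: Never             # no IP-in-IP encapsulation
  vxlanMode: Never            # no VXLAN encapsulation
  natOutgoing: true           # SNAT traffic leaving the Pod network
```

Native routing only works if the underlying network can carry Pod routes (for example via BGP peering); otherwise setting vxlanMode or ipipMode to Always or CrossSubnet is the usual fallback.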

3. Layer 3: Pod-to-Service Networking (The East-West Traffic Engine)

Because Pods are ephemeral, relying on individual Pod IPs for long-term internal routing is impractical. If a Pod crashes, its replacement usually gets a different IP.

A Service provides a stable IP address and DNS name, valid for the lifetime of the Service, that fronts the set of Pods matched by its label selector.
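
A minimal Service manifest makes the idea concrete. This is a sketch under assumptions: the name postgres-db, the production namespace, and the app: backend-db label are illustrative and must match the labels on your own Pods.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-db        # assumed Service name
  namespace: production    # assumed namespace
spec:
  # type defaults to ClusterIP when omitted
  selector:
    app: backend-db        # traffic goes to Ready Pods carrying this label
  ports:
  - name: postgres
    protocol: TCP
    port: 5432             # port exposed on the Service's virtual IP
    targetPort: 5432       # port the container actually listens on
```

Clients connect to the Service's IP or DNS name; Pods behind the selector can come and go without clients noticing.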

The Role of Kube-Proxy

Services are abstractions; a ClusterIP is not bound to any physical network interface. On each node they are implemented by kube-proxy, a network daemon that normally runs on every node in the cluster (some CNI plugins, such as Cilium, can take over this role).

When a Service is created, the control plane assigns it a virtual IP called a ClusterIP. kube-proxy watches the API server for Service and EndpointSlice objects and programs forwarding rules into the node's kernel. Three modes matter historically on Linux (newer releases also offer an nftables mode):

IPVS Mode (IP Virtual Server): An alternative aimed at large clusters. It is built on netfilter hooks inside the Linux kernel and implements Layer 4 load balancing with hash table lookups that are effectively $O(1)$ in the number of Services, so it scales to tens of thousands of Services with little added per-connection cost. It requires the IPVS kernel modules on every node.
IPTables Mode: The long-standing default on Linux. kube-proxy writes netfilter rules for every Service and endpoint in the cluster. Rule matching is largely linear ($O(N)$), but it applies mainly to the first packet of each connection, because connection tracking handles the rest. In clusters with thousands of Services, the resulting tens of thousands of rules make both new-connection latency and rule updates noticeably slower and more CPU-intensive.
Userspace Mode: The legacy model. Traffic is routed out of kernel space, up into the kube-proxy process in user space, and back down to the kernel. This extra round trip adds significant latency; the mode was removed in Kubernetes v1.26.
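
The proxy mode is selected in kube-proxy's configuration file. The fragment below is a sketch of that configuration, assuming the IPVS kernel modules are loaded on the nodes; the round-robin scheduler is just one possible choice.

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"          # "iptables" selects the iptables mode instead
ipvs:
  scheduler: "rr"     # round-robin across the Service's endpoints
```

On kubeadm-provisioned clusters this configuration usually lives in the kube-proxy ConfigMap in the kube-system namespace, and the kube-proxy Pods need a restart to pick up a change.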

CoreDNS: Service Discovery

Every time a Service is provisioned, the internal cluster DNS engine (CoreDNS) automatically registers a DNS entry mapped directly to that Service's virtual ClusterIP:

$$\texttt{<service-name>.<namespace>.svc.cluster.local}$$

This allows an application to reliably communicate with an internal database using a static hostname like postgres-db.production instead of tracking shifting IP addresses.
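
In practice the hostname is usually handed to the application through configuration. The following sketch assumes a Service named postgres-db exists in the production namespace; the Pod name, the staging namespace, the image, and the DATABASE_HOST and DATABASE_PORT variables are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server                  # hypothetical name
  namespace: staging                # a different namespace than the database
  labels:
    app: api-server
spec:
  containers:
  - name: api
    image: example.com/api-server:1.0   # hypothetical image
    env:
    - name: DATABASE_HOST           # hypothetical variable read by the app
      # The fully qualified name resolves from any namespace.
      value: postgres-db.production.svc.cluster.local
    - name: DATABASE_PORT
      value: "5432"
```

The shorter form postgres-db.production also resolves, because the Pod's DNS search domains append svc.cluster.local; a bare postgres-db would only resolve from Pods inside the production namespace.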

4. Layer 4: Exposing Workloads to the Outside World (North-South Traffic)

To accept external requests from clients sitting outside the cluster boundary, you must transition from internal networking to external publishing models.

Service Types

ClusterIP: The default mode. It exposes the service on an internal-only cluster IP. This means the service is completely unreachable from outside the cluster network.
NodePort: Opens a dedicated high-numbered port (by default in the range 30000-32767) on every node in the cluster. External clients can connect to any node's reachable IP address on that port, and the traffic is forwarded to the target application Pods, even if they run on a different node.
LoadBalancer: The usual choice for cloud-based clusters. It causes the cloud controller manager to call your cloud provider's API (such as AWS, Google Cloud, or Azure) and provision an external load balancer for the Service. The cloud load balancer forwards external traffic to the Service's NodePorts or, with some provider integrations, directly to the Pod IPs.
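
The difference between the types comes down to the type field. The two manifests below are a sketch; the Service names, the app: user-service label, and the port numbers are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: user-service-nodeport   # hypothetical name
spec:
  type: NodePort
  selector:
    app: user-service           # assumed Pod label
  ports:
  - port: 8080                  # ClusterIP port inside the cluster
    targetPort: 8080            # container port
    nodePort: 30080             # must fall in the node port range; omit to auto-assign
---
apiVersion: v1
kind: Service
metadata:
  name: user-service-public     # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: user-service
  ports:
  - port: 80                    # port exposed by the cloud load balancer
    targetPort: 8080            # container port
```

A LoadBalancer Service only receives an external address when the cluster has a load balancer integration; on a bare cluster without one, its external IP stays pending.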

5. Ingress Controllers and the Gateway API

While a Cloud LoadBalancer service works well, creating a unique cloud load balancer for every individual microservice becomes highly cost-prohibitive and complicated to manage. This is where edge routing abstractions step in.

The Ingress Architecture

An Ingress Controller typically runs inside your cluster as an application-layer (Layer 7) reverse proxy and load balancer. You provision a single Cloud Load Balancer pointing to your Ingress Controller (e.g., ingress-nginx or Traefik); some controllers, such as the AWS Load Balancer Controller, instead configure a cloud-managed Layer 7 load balancer on your behalf.

The Ingress Controller reads incoming HTTP requests, looks at the host header and URI path, and performs intelligent routing based on declarative rules:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-routing-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: company.com
    http:
      paths:
      - path: /users
        pathType: Prefix
        backend:
          service:
            name: user-service
            port:
              number: 8080
      - path: /billing
        pathType: Prefix
        backend:
          service:
            name: billing-service
            port:
              number: 9000

The Next Generation: Gateway API

As clusters scale across large engineering organizations, the traditional Ingress resource breaks down because infrastructure configuration (like TLS certificates) and routing rules are tightly coupled in a single resource, often extended through controller-specific annotations.

The Gateway API splits this monolithic structure into modular roles:

GatewayClass: Provided by the infrastructure provider or cluster administrators to define which controller implements Gateways of that class (e.g., an Envoy-based or Istio implementation).
Gateway: Managed by the cluster operations team to define the public-facing entry points, listening ports, and TLS certificates.
HTTPRoute / TCPRoute: Managed independently by software development teams to map their specific microservice endpoints behind the pre-established Gateway.
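
The role split can be sketched with one Gateway owned by the operations team and one HTTPRoute owned by an application team. All names here (shared-gateway, the infra and users namespaces, example-gateway-class, company-com-tls) are hypothetical, and the example assumes the Gateway API CRDs and a controller implementing them are installed.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway                     # hypothetical, owned by operations
  namespace: infra
spec:
  gatewayClassName: example-gateway-class  # hypothetical GatewayClass
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: company.com
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: company-com-tls              # hypothetical TLS Secret
    allowedRoutes:
      namespaces:
        from: All                          # let other namespaces attach routes
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: users-route                        # hypothetical, owned by the app team
  namespace: users
spec:
  parentRefs:
  - name: shared-gateway
    namespace: infra
  hostnames:
  - company.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /users
    backendRefs:
    - name: user-service                   # Service in the route's own namespace
      port: 8080
```

The application team never touches the certificate or the listener, and the operations team never edits per-service paths, which is the decoupling the Ingress resource lacks.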

6. Securing the Network via Network Policies

By default, Kubernetes network design assumes complete trust: any Pod can send traffic to any other Pod in any namespace. To build a secure enterprise environment, you should move toward a zero-trust model using Network Policies. Network Policies act as stateful, Pod-level firewalls, controlling traffic at Layer 3 and Layer 4 (IP addresses, protocols, and ports).

⚠️ Crucial Operational Warning: Network Policies are purely declarative specifications. If your cluster is running a CNI plugin that does not support network policies (such as raw Flannel), your Network Policy manifests will be successfully saved to the API server, but they will be completely ignored, leaving your cluster entirely wide open.

📄 Production Example: Isolating a Database Pod

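This policy isolates any Pod labeled app: backend-db in the production namespace. Because it selects those Pods and lists Ingress under policyTypes, all incoming traffic to them is denied except what the rules allow: TCP connections on port 5432 from Pods labeled app: api-server in the same namespace. Outgoing (Egress) traffic from the database Pods is not restricted by this policy.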

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-security-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend-db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api-server
    ports:
    - protocol: TCP
      port: 5432
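
A common companion to targeted policies is a namespace-wide default deny, so that any Pod without an explicit allow rule is isolated. The sketch below assumes the production namespace; the policy name is arbitrary.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all     # arbitrary name
  namespace: production
spec:
  podSelector: {}            # empty selector matches every Pod in the namespace
  policyTypes:
  - Ingress                  # no ingress rules listed, so all inbound is denied
  - Egress                   # no egress rules listed, so all outbound is denied
```

Denying egress also blocks DNS lookups, so a policy like this is normally paired with a rule that allows traffic to the cluster DNS on port 53.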

7. The Engineer's Guide to Network Troubleshooting

When an internal application fails to communicate over the network, a professional DevOps engineer uses a systematic triage strategy to isolate the root cause.

Step 1: Validate Pod-to-Pod Connectivity

Verify if the low-level overlay network is functional by passing traffic directly between Pod IPs.

# Get the IP addresses of your source and destination Pods
kubectl get pods -o wide

# Run curl from inside the source Pod against the destination Pod IP
# (the source container image must include curl)
kubectl exec <source-pod-name> -- curl -sS --max-time 5 http://<destination-pod-ip>:<port>/healthz

If this fails: The problem is likely in the CNI data path (overlay encapsulation or routing between nodes), a Network Policy is blocking the path, or the destination application is not listening on that port.
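
Application images often ship without curl or other network tools. One workaround is a short-lived debug Pod, sketched below; the Pod name and namespace are assumptions, and busybox is used only because it bundles wget, nslookup, and ping.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: net-debug            # hypothetical name
  namespace: production      # run it in the namespace you are debugging
spec:
  restartPolicy: Never
  containers:
  - name: tools
    image: busybox:1.36
    # Keep the container alive for an hour so commands can be exec'd into it.
    command: ["sleep", "3600"]
```

After applying the manifest, a request can be issued with kubectl exec net-debug -n production -- wget -qO- -T 5 http://<destination-pod-ip>:<port>/healthz. Keep in mind that the debug Pod carries different labels than the real client, so Network Policies may treat it differently.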

Step 2: Validate Service VIP Transformation

Verify if kube-proxy is properly load balancing requests over the Service abstraction layer.

# Fetch the virtual ClusterIP of your service
kubectl get svc

# Attempt to communicate directly with the ClusterIP on the Service port
kubectl exec <source-pod-name> -- curl -sS --max-time 5 http://<cluster-ip>:<service-port>/healthz

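If Step 1 succeeds but Step 2 fails: First confirm that the Service has endpoints (kubectl get endpoints <service-name>); an empty list means the Service selector does not match the Pod labels or the Pods are not Ready. Also check that the Service's targetPort matches the container port. If the endpoints look correct, the kube-proxy daemon on that node may be unhealthy, or its local IPTables/IPVS rules may be out of sync. Check the health of the kube-proxy Pods inside the kube-system namespace.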

Step 3: Validate DNS Service Discovery

Verify if CoreDNS is successfully translating network hostnames to virtual cluster IPs.

# Run a DNS lookup from inside the source Pod (the image must include nslookup)
kubectl exec <source-pod-name> -- nslookup postgres-service.production.svc.cluster.local

If Step 2 succeeds but Step 3 fails: The internal CoreDNS deployment is misconfigured or overwhelmed, or DNS traffic to it is blocked (for example by a Network Policy that denies egress on port 53). Check the deployment logs using kubectl logs -n kube-system deployment/coredns.

DEV Community