Akash for MechCloud Academy

Part 9: Are You Okay? Health Probes for Resilient Applications

We have a stateful, configurable application running in our cluster. Kubernetes already provides a degree of resilience through self-healing—if a Pod disappears, the Deployment replaces it.

But this raises a critical question: what does it mean for an application to be "healthy"?

Right now, Kubernetes only knows if the container process has started. It has no idea if the application inside the container is actually working. What if your app has started but is stuck in an infinite loop? What if it's frozen and can't respond to requests? What if it needs 30 seconds to warm up and load data before it's ready to serve traffic?

To Kubernetes, a running but broken application looks the same as a healthy one. This can lead to traffic being sent to Pods that can't handle it.

To solve this, Kubernetes provides Health Probes, which give the cluster deep insight into your application's true state.

The Three Types of Probes

There are three distinct probes you can configure, each answering a different question.

  1. Liveness Probe: "Are you alive?"

    • Purpose: This probe checks if your application is running and responsive.
    • Action: If the liveness probe fails a certain number of times, Kubernetes concludes the container is deadlocked or unhealthy and restarts the container.
    • Analogy: This is the paramedic checking for a pulse. No pulse? Time for a defibrillator (a restart).
  2. Readiness Probe: "Are you ready to serve traffic?"

    • Purpose: This probe checks if your application is ready to accept new connections. An application might be "alive" but still busy starting up, loading data, or running migrations.
    • Action: If the readiness probe fails, Kubernetes knows the Pod is not ready for traffic. It removes the Pod's IP address from the Service's endpoint list. Traffic will no longer be sent to it until the probe succeeds again. The container is not restarted.
    • Analogy: This is the restaurant host asking the kitchen, "Is table 5 ready for new customers?" If the kitchen says no, the host won't seat anyone there, but they don't tear down the kitchen.
  3. Startup Probe: "How long do you need to start up?"

    • Purpose: This probe is specifically for slow-starting applications. It disables the liveness and readiness checks until the startup probe succeeds, preventing the app from being killed before it's even ready.
    • Action: Once the startup probe succeeds, Kubernetes hands off control to the liveness and readiness probes. If the startup probe never succeeds within its failure threshold, the container is restarted. (A minimal startup probe sketch follows this list.)
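
Our Nginx deployment below only needs liveness and readiness probes, but for a slow-starting application a startup probe might look roughly like this (the port 8080 and /healthz path are illustrative placeholders, not part of our deployment):

# Hypothetical snippet for a slow-starting container.
# The startup probe allows up to 30 * 10 = 300 seconds for the app to come up;
# only then does the liveness probe take over.
startupProbe:
  httpGet:
    path: /healthz   # placeholder health endpoint
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 20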

How Probes Work

A probe can check a container's health in three ways:

  • httpGet: It makes an HTTP GET request to a specific path and port. A response code between 200-399 is a success. (Perfect for web servers).
  • tcpSocket: It tries to open a TCP connection to a specific port. If it can connect, it's a success. (Good for non-HTTP services).
  • exec: It runs a command inside the container. An exit code of 0 is a success. (Sketches of the tcpSocket and exec forms follow this list.)
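
For comparison, here is roughly what the other two mechanisms look like inside a probe definition (the port 5432 and the /tmp/healthy file are illustrative, not from our deployment):

# tcpSocket probe: succeeds if a TCP connection to the port can be opened
readinessProbe:
  tcpSocket:
    port: 5432        # e.g. a database container
  periodSeconds: 10

# exec probe: succeeds if the command exits with status 0
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy    # a file the app is expected to create and maintain
  periodSeconds: 10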

Adding Probes to Our Deployment

Let's add Liveness and Readiness probes to our Nginx deployment. Nginx serves a default page on the root path (/), which is perfect for an httpGet probe.

We will update our deployment.yaml to include a livenessProbe and a readinessProbe.

Here is the complete, updated deployment.yaml:

# deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
      - name: nginx-storage
        persistentVolumeClaim:
          claimName: nginx-pvc
      containers:
      - name: nginx-web-server
        image: nginx
        volumeMounts:
        - name: nginx-storage
          mountPath: /usr/share/nginx/html
        ports: # It's good practice to declare the container port
        - containerPort: 80
        # --- NEW SECTION: Liveness Probe ---
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 15
          periodSeconds: 20
        # --- NEW SECTION: Readiness Probe ---
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        envFrom:
        - configMapRef:
            name: app-config
        - secretRef:
            name: app-secret

Let's look at the new probe parameters:

  • httpGet: We're telling both probes to check the / path on port 80.
  • initialDelaySeconds: The liveness probe waits 15 seconds after the container starts before performing its first check; the readiness probe waits 5 seconds. This gives the app time to begin starting up.
  • periodSeconds: The liveness probe runs every 20 seconds, and the readiness probe every 10 seconds. (A few additional tuning fields, left at their defaults here, are sketched after this list.)
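
Probes accept a few more tuning fields that we are leaving alone. For reference, the values below are the Kubernetes defaults, so adding them would not change the behaviour of our deployment:

livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 15
  periodSeconds: 20
  timeoutSeconds: 1      # how long to wait for a response before counting a failure
  failureThreshold: 3    # consecutive failures before the probe is considered failed
  successThreshold: 1    # consecutive successes to pass again (must be 1 for liveness)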

Apply and Verify

Apply the updated deployment manifest. Kubernetes will trigger a rolling update to create a new Pod with the probes configured.

kubectl apply -f deployment.yaml
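
If you'd like to wait for the rolling update to finish before inspecting anything, kubectl can block until the new Pod is ready:

# Optional: wait until the updated Deployment has rolled out
kubectl rollout status deployment/hello-nginx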

To see the probes in action, the kubectl describe command is your best friend. Get your new Pod's name and then run:

# Replace <your-pod-name> with the name of the new Pod
kubectl describe pod <your-pod-name>

Scroll down to the Containers section and look under nginx-web-server. You'll see both the Liveness and Readiness probes defined. Then scroll to the Events section at the very bottom, where you'll see messages from the kubelet (the agent on the node) confirming the container was created and started. Successful probes don't generate events, so a quiet Events list here is a good sign; probe failures would show up as Warning events.

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  2m    default-scheduler  Successfully assigned default/hello-nginx...
  Normal  Pulled     119s  kubelet            Container image "nginx" already present...
  Normal  Created    119s  kubelet            Created container nginx-web-server
  Normal  Started    118s  kubelet            Started container nginx-web-server
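
The readiness probe also drives the READY column of kubectl get pods: a Pod only shows 1/1 once its readiness probe has passed. The Pod name below is illustrative; yours will have a different hash:

kubectl get pods

# NAME                           READY   STATUS    RESTARTS   AGE
# hello-nginx-5d59d67564-abcde   1/1     Running   0          2m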

Now your cluster won't just restart a dead container; it will also intelligently route traffic away from applications that are temporarily busy or not fully initialized.
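
If you ever want to watch a probe fail on purpose, a harmless experiment is to change the livenessProbe path to one Nginx doesn't serve (for example /does-not-exist), re-apply the manifest, and watch the restart count climb:

# Nginx answers the bogus path with a 404, so the liveness probe fails
# and the kubelet restarts the container after three consecutive failures
kubectl get pods -w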

What's Next

We have now built a robust, single application. It's configured, stateful, and resilient with detailed health checks.

But most real-world systems aren't just one application. They are a collection of microservices—a frontend, a user API, a billing API, etc.—all running in the same cluster. How do you manage traffic coming from the internet and route it to the correct service based on the URL? Exposing every service with its own LoadBalancer would be expensive and unwieldy.

In the next part, we will solve this by introducing the Ingress resource, the smart traffic cop and front door for your entire cluster.
