"Deploying a service that crashes during its warm-up window is a guaranteed outage until your liveness probes detect and restart it."
What We're Building
We are implementing a Go backend service that adheres to strict observability standards within a Kubernetes environment. The goal is to distinguish three states: a process that is still starting up, one that is running but cannot yet accept traffic, and one that has deadlocked and must be restarted. We will use three distinct HTTP endpoints to signal these states to the orchestrator. This architecture prevents traffic from hitting a pod that cannot serve requests, while allowing the orchestrator to restart a frozen process without dropping in-flight connections.
Startup Probe:   [--- Initializing ---] -> Checks enabled
Readiness Probe: [--- Not Ready ---] <-> [--- Serving ---]
Liveness Probe:  [--- Alive ---] -> [--- Deadlocked: Restart ---]
Step 1 — Startup Probe Implementation
The startup probe tells the orchestrator to hold off the liveness and readiness checks until the application has initialized core components like the signal handler and logging. We give it a generous failure budget (period times failure threshold) to match the heavy initialization cost.
In Go, we simulate a cold start with a boolean flag, ready, initialized to false, that flips only after the heavy work completes.
var ready = false // guard with sync/atomic if checked concurrently

func main() {
	// Simulate heavy initialization. The startup probe simply fails
	// (connection refused) until the server below starts listening.
	time.Sleep(10 * time.Second)
	ready = true

	http.HandleFunc("/health/ready", readinessHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
This matters because, while the startup probe is still failing, the orchestrator suspends the liveness check, so a slow boot is never mistaken for a dead process.
Step 2 — Readiness Probe Implementation
The readiness probe validates whether the application can actually handle real traffic. This check must verify database connections and other external dependencies before returning a 200 status.
func readinessHandler(w http.ResponseWriter, r *http.Request) {
	// dbIsConnected is maintained by the dependency-validation logic in Step 4.
	if !dbIsConnected {
		http.Error(w, "Dependencies not ready", http.StatusServiceUnavailable)
		return
	}
	if ready {
		w.WriteHeader(http.StatusOK)
	} else {
		w.WriteHeader(http.StatusServiceUnavailable)
	}
}
This separation ensures that the liveness loop does not kill a process that is merely waiting for a connection pool to fill.
Step 3 — Liveness Probe Implementation
The liveness probe detects if the process is stuck in a bad state, such as a deadlock or excessive garbage collection pauses. If this check fails, the orchestrator must restart the container.
We keep this handler extremely lightweight to avoid increasing latency during the check window.
func livenessHandler(w http.ResponseWriter, r *http.Request) {
	// If the process deadlocks, this handler never gets scheduled,
	// the probe times out, and the orchestrator restarts the container.
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("alive"))
}
A heavy check here would defeat the purpose: querying the database from a liveness handler can turn a transient dependency outage into a restart storm. This endpoint monitors process survival and nothing else.
Step 4 — Dependency Validation Logic
Readiness probes often fail when external services become unreachable. We implement a retry loop in the startup logic to prevent rapid restart loops.
func connectToDB() error {
	// Retry with exponential backoff: 1s, 2s, 4s, 8s, 16s.
	for i := 0; i < 5; i++ {
		if _, err := db.New(); err == nil {
			dbIsConnected = true // flips the readiness gate in Step 2
			return nil
		}
		time.Sleep(time.Duration(1<<i) * time.Second)
	}
	return errors.New("db unreachable")
}
This prevents a pod from entering a crash loop if the database is temporarily overloaded or undergoing migration.
Step 5 — Configuration and Timing Tuning
Finally, we expose these checks via configuration. We define the intervals and thresholds in the deployment manifest rather than hardcoding them.
spec:
  containers:
  - name: api
    startupProbe:
      httpGet:
        path: /health/startup
        port: 8080
      periodSeconds: 5
      failureThreshold: 12  # allows up to 60s of initialization
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
This approach allows DevOps to tune stability parameters without recompiling the application binary.
Key Takeaways
- Startup Probes should delay traffic routing until the process has finished heavy initialization tasks.
- Readiness Probes must verify external connectivity before accepting live user requests.
- Liveness Probes are strictly for detecting process deadlocks and triggering restarts.
- Graceful Shutdowns must drain in-flight connections before terminating, typically by failing the readiness probe first so traffic is routed away.
- Failure Thresholds define how many consecutive failures trigger an action without false positives.
What's Next
You can now integrate these patterns into your CI/CD pipelines. Consider adding custom metrics to track probe latency over time.
Further Reading
- Designing Data-Intensive Applications (Kleppmann) — covers reliability engineering and the failure detection patterns that underpin health check design.
- A Philosophy of Software Design (Ousterhout) — invaluable guide for keeping probe logic simple and avoiding the complexity that leads to false positives.
Part of the Architecture Patterns series.