myougaTheAxo
Health Checks with Claude Code: Kubernetes Readiness/Liveness Probes and Dependency Validation

/health returning 200 while the database is down is worse than returning 503. The load balancer thinks your service is healthy and keeps sending traffic — into a broken process that can't handle it.

Claude Code, guided by a well-written CLAUDE.md, generates correct health check endpoints from the start: lightweight for LB polling, thorough for Kubernetes readiness, and minimal for liveness.


CLAUDE.md: Health Check Rules

## Health Check Rules

- GET /health — lightweight (load balancer use, no DB check)
- GET /ready — all dependency checks (K8s readinessProbe)
- GET /live — process alive check only (K8s livenessProbe)

Readiness checks must include:
  - Database: SELECT 1 query
  - Redis: PING command
  - External APIs: lightweight GET (if used)

- If any check fails → respond 503 (not 200)
- 3 second timeout per individual check
- Response shape: { status: "ok" | "degraded", checks: {} }

Three endpoints, clear rules for each. Claude Code generates the correct structure for every service that defines this in CLAUDE.md.
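The response shape from the rules above can be pinned down as a type. A sketch, assuming TypeScript — the names `HealthResponse` and `degraded` are illustrative, not part of the CLAUDE.md contract:

```typescript
// Illustrative type for the response shape in the rules above.
interface HealthResponse {
  status: "ok" | "degraded";
  checks: Record<
    string,
    { status: "ok" | "error"; responseTimeMs?: number; error?: string }
  >;
}

// Example: a degraded response when Redis hits the 3-second timeout.
const degraded: HealthResponse = {
  status: "degraded",
  checks: {
    db: { status: "ok", responseTimeMs: 4 },
    cache: { status: "error", responseTimeMs: 3000, error: "timeout" },
  },
};
```

Writing the shape down once means every service's /ready response is machine-checkable, not just convention.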


Dependency Check Functions

import { db } from "./db";
import { redis } from "./redis";

interface CheckResult {
  status: "ok" | "error";
  responseTimeMs: number;
  error?: string;
}

async function checkDatabase(): Promise<CheckResult> {
  const start = Date.now();
  try {
    await Promise.race([
      db.query("SELECT 1"),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error("timeout")), 3000)
      ),
    ]);
    return { status: "ok", responseTimeMs: Date.now() - start };
  } catch (err) {
    return {
      status: "error",
      responseTimeMs: Date.now() - start,
      error: err instanceof Error ? err.message : "unknown",
    };
  }
}

async function checkRedis(): Promise<CheckResult> {
  const start = Date.now();
  try {
    await Promise.race([
      redis.ping(),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error("timeout")), 3000)
      ),
    ]);
    return { status: "ok", responseTimeMs: Date.now() - start };
  } catch (err) {
    return {
      status: "error",
      responseTimeMs: Date.now() - start,
      error: err instanceof Error ? err.message : "unknown",
    };
  }
}

Promise.race with a 3-second timeout prevents a slow DB from hanging the health check indefinitely. responseTimeMs gives you latency data for alerting.
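Both check functions repeat the same race pattern. One way to factor it out — a sketch, where `withTimeout` is a hypothetical helper, not part of the code above — also clears the timer so it doesn't linger for the full 3 seconds after a fast response:

```typescript
// Hypothetical helper: race a promise against a timeout,
// clearing the timer once either side settles.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer!));
}

// Usage inside a check function:
//   await withTimeout(db.query("SELECT 1"), 3000);
//   await withTimeout(redis.ping(), 3000);
```

With this in place, each check function shrinks to its try/catch and result shaping, and the timeout budget lives in one spot.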


The Three Endpoints

import express from "express";
const router = express.Router();

// Lightweight — load balancer polling (no DB)
router.get("/health", (_req, res) => {
  res.json({
    status: "ok",
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  });
});

// Full dependency check — K8s readinessProbe
router.get("/ready", async (_req, res) => {
  const [dbResult, redisResult] = await Promise.allSettled([
    checkDatabase(),
    checkRedis(),
  ]);

  const db =
    dbResult.status === "fulfilled"
      ? dbResult.value
      : { status: "error" as const, responseTimeMs: 0, error: "check threw" };
  const cache =
    redisResult.status === "fulfilled"
      ? redisResult.value
      : { status: "error" as const, responseTimeMs: 0, error: "check threw" };

  const allOk = db.status === "ok" && cache.status === "ok";

  res.status(allOk ? 200 : 503).json({
    status: allOk ? "ok" : "degraded",
    checks: { db, cache },
  });
});

// Process alive — K8s livenessProbe
router.get("/live", (_req, res) => {
  res.json({ status: "ok" });
});

Promise.allSettled (not Promise.all) is the defensive choice here. The check functions catch their own errors, so they normally resolve either way — but if one ever throws unexpectedly, allSettled still delivers the other check's result, whereas Promise.all would reject immediately and you'd lose that diagnostic data.
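The difference is easy to see in isolation. A standalone sketch, separate from the service code:

```typescript
// Promise.allSettled never rejects: each entry records its own outcome.
async function demo(): Promise<string[]> {
  const results = await Promise.allSettled([
    Promise.resolve("db ok"),
    Promise.reject(new Error("redis timeout")),
  ]);
  // results[0] is { status: "fulfilled", value: "db ok" }
  // results[1] is { status: "rejected", reason: Error("redis timeout") }
  return results.map((r) => r.status);
}

// Promise.all with the same input would reject immediately,
// discarding the successful "db ok" result.
```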


Kubernetes Deployment Configuration

# deployment.yaml
spec:
  containers:
    - name: api
      image: your-api:latest
      ports:
        - containerPort: 3000

      livenessProbe:
        httpGet:
          path: /live
          port: 3000
        initialDelaySeconds: 10
        periodSeconds: 10
        failureThreshold: 3

      readinessProbe:
        httpGet:
          path: /ready
          port: 3000
        initialDelaySeconds: 15
        periodSeconds: 5
        failureThreshold: 3
        successThreshold: 1

Why different settings?

  • livenessProbe uses /live — if this fails 3 times, the container is restarted. Checking DB here would cause unnecessary restarts during DB maintenance windows.
  • readinessProbe uses /ready — if this fails, the pod is removed from the load balancer's target group but NOT restarted. Traffic stops; the pod waits for dependencies to recover.

This distinction matters in production. A DB restart should make pods temporarily unreachable, not trigger a cascade of container restarts.


Summary

| Endpoint | Purpose | Checks | On Failure |
| --- | --- | --- | --- |
| /health | Load balancer polling | None (process only) | N/A |
| /ready | K8s readinessProbe | DB + Redis (+ external) | 503, removed from LB |
| /live | K8s livenessProbe | Process alive | Container restart |

CLAUDE.md + 3 endpoints + Promise.allSettled + K8s probes — define the rules once, and Claude Code generates the correct health check structure for every service that needs it.

The common mistake is a single /health endpoint that does everything. The correct pattern is three endpoints with different responsibilities, timeout budgets, and failure behaviors.


Want Claude Code to audit whether your health check endpoints follow these patterns — including timeout coverage and probe configuration?

Code Review Pack (¥980) — A Claude Code custom skill that reviews health check design, Kubernetes probe configuration, and dependency validation logic in your codebase.

Available on prompt-works.jp
