Health Checks in Production APIs: Beyond Returning 200 OK
A /health endpoint that just returns { "status": "ok" } is close to useless. It tells you the process is running. It does not tell you whether the app can actually serve requests.
Liveness vs Readiness
- Liveness: Is the process alive? If not, restart it.
- Readiness: Can it handle traffic? If not, stop sending requests.
app.get("/health/live", (_, res) => res.status(200).json({ status: "alive" }));

app.get("/health/ready", async (_, res) => {
  const checks = await Promise.allSettled([
    checkDatabase(),
    checkRedis(),
    checkExternalAPI(),
  ]);
  const results = {
    database: checks[0].status === "fulfilled",
    redis: checks[1].status === "fulfilled",
    externalAPI: checks[2].status === "fulfilled",
  };
  const healthy = Object.values(results).every(Boolean);
  res.status(healthy ? 200 : 503).json({ status: healthy ? "ready" : "not ready", checks: results });
});
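The boolean map in the readiness handler tells you which check failed but not why. A minimal sketch, assuming a hypothetical helper named `summarizeChecks` (it is not part of the original code), that keeps the rejection reason from `Promise.allSettled` next to each check name:

```typescript
// Hypothetical helper: turns Promise.allSettled results into a named report.
type CheckReport = { ok: boolean; error?: string };

function summarizeChecks(
  names: string[],
  settled: PromiseSettledResult<unknown>[],
): { healthy: boolean; checks: Record<string, CheckReport> } {
  const checks: Record<string, CheckReport> = {};
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      checks[names[i]] = { ok: true };
    } else {
      // Keep the failure reason so the response (or a log line) explains the 503.
      const reason = result.reason instanceof Error ? result.reason.message : String(result.reason);
      checks[names[i]] = { ok: false, error: reason };
    }
  });
  const healthy = Object.values(checks).every((check) => check.ok);
  return { healthy, checks };
}
```

In the handler this could replace the hand-written `results` object, for example `summarizeChecks(["database", "redis", "externalAPI"], checks)`. Since health endpoints are usually unauthenticated, it may be safer to log the error text and return only the booleans.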
Dependency Check Pattern
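async function checkDatabase(): Promise<void> {
  const start = Date.now();
  // Cheapest possible round trip: proves a connection can be acquired and used.
  await pool.query("SELECT 1");
  const latency = Date.now() - start;
  // This only flags a slow query after it returns; it does not bound how long the check can run.
  if (latency > 5000) throw new Error(`DB latency too high: ${latency}ms`);
}

async function checkRedis(): Promise<void> {
  // ping() rejects if the connection is down, which marks the check as failed.
  await redis.ping();
}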
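The readiness handler also calls `checkExternalAPI()`, which is not shown. A minimal sketch of what it might look like, assuming Node 18+ (global `fetch` and `AbortController`), a hypothetical status URL, and an injectable fetch function so the check can be exercised without a network:

```typescript
// Assumption: a minimal fetch-like signature so a fake can be passed in tests.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ ok: boolean; status: number }>;

// Hypothetical URL; replace with the dependency's own status endpoint.
const EXTERNAL_API_HEALTH_URL = "https://api.example.com/health";

async function checkExternalAPI(
  fetchFn: FetchLike = fetch,
  timeoutMs = 1000,
): Promise<void> {
  // Abort the request if the dependency does not answer in time.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchFn(EXTERNAL_API_HEALTH_URL, { signal: controller.signal });
    // A non-2xx answer counts as a failed check.
    if (!res.ok) throw new Error(`External API returned HTTP ${res.status}`);
  } finally {
    clearTimeout(timer);
  }
}
```

Whether a third-party outage should make your own service unready is a judgment call; if the app can degrade gracefully without that API, it may be better to report the failure without returning 503.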
Kubernetes Integration
livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 5
Do Not
- Do not check every downstream service in liveness. If Redis is down, restarting your app will not fix Redis.
- Do not let health checks take more than 2s total. Use timeouts.
- Do not require authentication on health endpoints. Load balancers need to reach them.
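The timeout rule can be sketched with `Promise.race`. This assumes a hypothetical helper named `withTimeout` (not part of the original code) wrapped around each dependency check:

```typescript
// Hypothetical helper: rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way so it cannot linger.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Inside the readiness handler, each entry passed to `Promise.allSettled` could then become something like `withTimeout(checkDatabase(), 1500, "database")`, which keeps the whole endpoint under the 2s budget because the checks run in parallel. Note that this only stops waiting; it does not cancel the underlying query or request.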
Part of my Production Backend Patterns series.