Zero-downtime deployments are non-negotiable for production APIs. Yet one of the most common causes of dropped requests and 502 errors during deployments is something deceptively simple: your Node.js process doesn't know how to die gracefully.
When Kubernetes sends a `SIGTERM` to your pod, or Docker stops a container, your API has a window to finish in-flight requests, close database connections, flush queues, and exit cleanly. Without a proper shutdown handler, requests get silently dropped, transactions are left open, and Redis connections leak — all while your users experience mysterious errors during what should be a seamless deploy.
This guide walks through building a production-grade graceful shutdown system for Node.js APIs in 2026, covering Express, Fastify, Hono, and Kubernetes-specific patterns.
## Why Graceful Shutdown Matters in 2026
Modern deployment pipelines run rolling updates continuously. A typical Kubernetes rolling update sends SIGTERM to the old pod while simultaneously routing new traffic to the new pod. The gap between these two events — the termination grace period — is your window to clean up.
| Scenario | Without Graceful Shutdown | With Graceful Shutdown |
|---|---|---|
| Rolling deploy | Requests dropped, 502 errors | Zero dropped requests |
| Scale-down event | Connections terminated | Connections drained |
| Pod eviction | Open transactions, data risk | Clean commit or rollback |
| Health check transition | Traffic sent to dying pod | Removed from load balancer first |
In Kubernetes, the default `terminationGracePeriodSeconds` is 30 seconds. That's your budget. Use it wisely.
## Understanding the Shutdown Signal Chain
Before writing code, understand what actually happens when your pod terminates:
1. Kubernetes sends `SIGTERM` to PID 1 in your container
2. Kubernetes simultaneously removes your pod from the `Endpoints` list (this can take 2–10 seconds to propagate through kube-proxy/iptables)
3. After `terminationGracePeriodSeconds`, Kubernetes sends `SIGKILL` (no escape)
The critical gap is step 2. Traffic may still arrive at your pod for several seconds after it receives `SIGTERM`. This is why a naive `process.exit(0)` on `SIGTERM` still drops requests.
The solution: add a pre-stop sleep and stop accepting new connections gradually.
## Step 1: Basic Shutdown Handler
Let's start with the fundamentals — a minimal but correct shutdown handler for any Node.js HTTP server:
```javascript
// server.js
import express from 'express';

const app = express();
let isShuttingDown = false;

// Middleware: reject new requests during shutdown
app.use((req, res, next) => {
  if (isShuttingDown) {
    res.set('Connection', 'close');
    return res.status(503).json({ error: 'Server is shutting down' });
  }
  next();
});

app.get('/health', (req, res) => {
  if (isShuttingDown) return res.status(503).json({ status: 'shutting_down' });
  res.json({ status: 'ok' });
});

app.get('/api/data', async (req, res) => {
  // Simulate async work
  await new Promise(resolve => setTimeout(resolve, 200));
  res.json({ data: 'response' });
});

const server = app.listen(3000, () => {
  console.log('Server listening on port 3000');
});

// Track active connections for forced drain
const connections = new Set();
server.on('connection', socket => {
  connections.add(socket);
  socket.on('close', () => connections.delete(socket));
});

async function shutdown(signal) {
  if (isShuttingDown) return; // ignore repeated signals
  console.log(`[Shutdown] Received ${signal}`);
  isShuttingDown = true;

  // Stop accepting new connections
  server.close(err => {
    if (err) {
      console.error('[Shutdown] Error:', err);
      process.exit(1);
    }
    console.log('[Shutdown] All connections closed. Exiting.');
    process.exit(0);
  });

  // Force-close remaining connections after 30s
  setTimeout(() => {
    console.warn('[Shutdown] Timeout hit, destroying remaining connections');
    for (const socket of connections) socket.destroy();
    process.exit(1);
  }, 30_000);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
```
This handles the basics: the `isShuttingDown` flag prevents new work from entering, and `server.close()` waits for in-flight requests before exiting. The 30-second hard exit is your safety net.
## Step 2: The Pre-Stop Sleep (Kubernetes-Critical Pattern)
Because Kubernetes propagates endpoint removal asynchronously, you need to delay the actual shutdown start by a few seconds after receiving SIGTERM. This prevents 502s from traffic that's still being routed to the terminating pod.
There are two ways to implement this:
### Option A: Kubernetes `preStop` hook (recommended)

```yaml
# deployment.yaml
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 10"]
```
The `preStop` hook runs before `SIGTERM` is sent. Setting it to `sleep 10` gives kube-proxy 10 seconds to drain existing connections before your app even starts shutting down. Your `terminationGracePeriodSeconds` must be greater than the `preStop` duration plus your app's shutdown time.
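That timing inequality is easy to get wrong when the three values live in different files. A small helper can assert it at startup or in CI — a sketch with illustrative names (`validateShutdownBudget` is not a Kubernetes API):

```javascript
// Sanity-check the shutdown timing budget: the grace period must cover
// the preStop sleep plus the app's own worst-case shutdown time.
// All names here are illustrative, not part of any library.
function validateShutdownBudget({ preStopSeconds, appShutdownSeconds, gracePeriodSeconds }) {
  const required = preStopSeconds + appShutdownSeconds;
  if (gracePeriodSeconds <= required) {
    throw new Error(
      `terminationGracePeriodSeconds (${gracePeriodSeconds}s) must exceed ` +
      `preStop (${preStopSeconds}s) + shutdown (${appShutdownSeconds}s) = ${required}s`
    );
  }
  return { required, headroomSeconds: gracePeriodSeconds - required };
}

// With the values used in this guide (grace 60s, preStop 10s, 30s hard timeout):
// 60 > 10 + 30, leaving 20 seconds of headroom.
console.log(validateShutdownBudget({ preStopSeconds: 10, appShutdownSeconds: 30, gracePeriodSeconds: 60 }));
```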
### Option B: Sleep inside the shutdown handler

```javascript
async function shutdown(signal) {
  console.log(`[Shutdown] Received ${signal}`);
  isShuttingDown = true;

  // Give the load balancer time to route traffic away (5–15 seconds)
  const DRAIN_DELAY = parseInt(process.env.SHUTDOWN_DRAIN_DELAY ?? '10', 10) * 1000;
  console.log(`[Shutdown] Waiting ${DRAIN_DELAY}ms for traffic drain...`);
  await new Promise(resolve => setTimeout(resolve, DRAIN_DELAY));

  // Now stop accepting connections
  server.close(() => {
    console.log('[Shutdown] Server closed. Exiting.');
    process.exit(0);
  });

  // Hard exit if draining takes too long
  setTimeout(() => process.exit(1), 30_000);
}
```
Option A is preferred in Kubernetes because it keeps your app logic clean and makes the delay configurable per environment without code changes.
## Step 3: Production-Grade Shutdown Manager

Real APIs have more than just HTTP connections to clean up: PostgreSQL pools, Redis connections, open file handles, message queue consumers. Here's a `ShutdownManager` class that handles all of them:
```javascript
// shutdown.js
export class ShutdownManager {
  #isShuttingDown = false;
  #cleanupHandlers = [];
  #timeout;

  constructor({ timeoutMs = 30_000 } = {}) {
    this.#timeout = timeoutMs;
  }

  /** Register a named cleanup handler */
  register(name, fn) {
    this.#cleanupHandlers.push({ name, fn });
    return this; // chainable
  }

  /** Call from your server setup */
  listen(server) {
    const connections = new Set();
    server.on('connection', s => {
      connections.add(s);
      s.on('close', () => connections.delete(s));
    });

    const handler = async signal => {
      if (this.#isShuttingDown) return;
      this.#isShuttingDown = true;
      console.log(`\n[Shutdown] Signal: ${signal}`);

      // Hard timeout as last resort
      const forceExit = setTimeout(() => {
        console.error('[Shutdown] Timeout exceeded — forcing exit');
        process.exit(1);
      }, this.#timeout);
      forceExit.unref(); // Don't prevent clean exit

      // 1. Stop HTTP server
      await new Promise(resolve => server.close(resolve));
      console.log('[Shutdown] HTTP server closed');

      // 2. Run cleanup handlers in order
      for (const { name, fn } of this.#cleanupHandlers) {
        try {
          await fn();
          console.log(`[Shutdown] ✓ ${name}`);
        } catch (err) {
          console.error(`[Shutdown] ✗ ${name}:`, err.message);
        }
      }

      // 3. Force-close any remaining sockets
      for (const s of connections) s.destroy();

      clearTimeout(forceExit);
      console.log('[Shutdown] Complete. Goodbye.');
      process.exit(0);
    };

    process.on('SIGTERM', handler);
    process.on('SIGINT', handler);
  }

  get isShuttingDown() { return this.#isShuttingDown; }
}
```
Usage with real resources:
```javascript
// app.js
import express from 'express';
import { Pool } from 'pg';
import { Redis } from 'ioredis';
import { ShutdownManager } from './shutdown.js';

const app = express();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
const redis = new Redis(process.env.REDIS_URL);

const shutdown = new ShutdownManager({ timeoutMs: 30_000 });

shutdown
  .register('PostgreSQL pool', () => pool.end())
  .register('Redis client', () => redis.quit())
  .register('Flush metrics', () => metrics.flush()); // optional — assumes you have a metrics client

const server = app.listen(3000);
shutdown.listen(server);
```
Each cleanup handler runs sequentially, with individual error isolation — a failed Redis close won't prevent the PostgreSQL pool from shutting down.
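That error-isolation property is worth verifying on its own. Here's a minimal, dependency-free sketch of the same cleanup loop, with stubbed handlers standing in for real connections:

```javascript
// The sequential, error-isolated cleanup loop from ShutdownManager,
// extracted so it runs without a server. Handler names are stubs.
async function runCleanup(handlers) {
  const results = [];
  for (const { name, fn } of handlers) {
    try {
      await fn();
      results.push({ name, ok: true });
    } catch (err) {
      results.push({ name, ok: false, error: err.message });
    }
  }
  return results;
}

const results = await runCleanup([
  // A failed Redis close must not stop the PostgreSQL handler from running
  { name: 'Redis client', fn: async () => { throw new Error('ECONNRESET'); } },
  { name: 'PostgreSQL pool', fn: async () => {} },
]);
console.log(results); // Redis marked failed, PostgreSQL still ran and succeeded
```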
## Step 4: Health Check Integration
Your Kubernetes readiness probe needs to know when to stop sending traffic before SIGTERM arrives. Update your health endpoint:
```javascript
app.get('/health/ready', (req, res) => {
  if (shutdown.isShuttingDown) {
    // Return 503 → Kubernetes removes the pod from endpoints
    return res.status(503).json({
      status: 'shutting_down',
      message: 'Pod is draining'
    });
  }
  res.json({ status: 'ready' });
});

app.get('/health/live', (req, res) => {
  // Liveness probe stays 200 until we're fully done
  // (returning 500 triggers a pod restart, not a clean shutdown)
  res.json({ status: 'alive' });
});
```
Corresponding Kubernetes probes:
```yaml
readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 1   # Remove from rotation on first failure
livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
```
With `failureThreshold: 1` on the readiness probe, Kubernetes removes the pod from the load balancer as soon as it returns 503 on `/health/ready` — closing most of the race window between `SIGTERM` and endpoint propagation.
## Step 5: Fastify and Hono Patterns
The same principles apply to modern frameworks, with minor differences.
### Fastify (with built-in `fastify.close()`)

```javascript
import Fastify from 'fastify';

const fastify = Fastify({ logger: true });
let isShuttingDown = false;

fastify.addHook('onRequest', async (request, reply) => {
  if (isShuttingDown) {
    reply.header('Connection', 'close');
    // Return the reply to stop the request lifecycle here
    return reply.status(503).send({ error: 'shutting down' });
  }
});

const shutdown = async signal => {
  fastify.log.info({ signal }, 'Shutdown initiated');
  isShuttingDown = true;
  await fastify.close(); // waits for in-flight requests + runs onClose hooks
  process.exit(0);
};

process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);
```
Fastify's `fastify.close()` drains connections internally and fires any handlers registered with `fastify.addHook('onClose', ...)` — making cleanup registration more ergonomic.
### Hono on Node.js

```javascript
import { serve } from '@hono/node-server';
import { Hono } from 'hono';

const app = new Hono();
let isShuttingDown = false;

app.use('*', async (c, next) => {
  if (isShuttingDown) {
    return c.json({ error: 'shutting down' }, 503);
  }
  return next();
});

const server = serve({ fetch: app.fetch, port: 3000 });

process.on('SIGTERM', () => {
  isShuttingDown = true;
  server.close(() => process.exit(0));
  // Hard exit if draining takes too long
  setTimeout(() => process.exit(1), 30_000);
});
```
## Step 6: Handling BullMQ Workers

If your API runs BullMQ background workers, graceful shutdown is critical — abruptly stopping a worker leaves jobs in the `active` state, and they won't be retried until the lock expires (default: 30 seconds).
```javascript
import { Worker } from 'bullmq';

// processJob, redis, and the shutdown manager from Step 3 are defined elsewhere
const worker = new Worker('emails', processJob, {
  connection: redis,
  lockDuration: 30_000,
});

shutdown.register('BullMQ worker', async () => {
  await worker.close(); // waits for active jobs to finish
  console.log('Worker drained');
});
```
`worker.close()` stops the worker from picking up new jobs, waits for active jobs to complete, then closes its Redis connections. For jobs that run longer than `lockDuration`, BullMQ renews locks automatically in the background, so a clean close is still safe.
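If a job can run longer than your termination budget, you may want to cap the drain time so one stuck job can't consume the whole grace period. A generic sketch using `Promise.race` — `closeWithTimeout` is a name introduced here, not a BullMQ API:

```javascript
// Race a close() promise against a deadline. On timeout the resource is
// abandoned (its lock will eventually expire and the job will be retried).
async function closeWithTimeout(closeFn, ms, label = 'resource') {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} close timed out after ${ms}ms`)),
      ms
    );
  });
  try {
    return await Promise.race([closeFn(), deadline]);
  } finally {
    clearTimeout(timer); // don't leave a live timer holding the event loop
  }
}
```

Registered with the manager from Step 3, it might look like `shutdown.register('BullMQ worker', () => closeWithTimeout(() => worker.close(), 20_000, 'worker'))` — keeping the worker drain comfortably inside `terminationGracePeriodSeconds`.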
## Complete Deployment Checklist
Here's the full picture for zero-downtime Kubernetes deployments in 2026:
```yaml
# deployment.yaml (production template)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:            # selector/labels included so the manifest is valid to apply
    matchLabels:
      app: api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0   # Never reduce capacity during rollout
  template:
    metadata:
      labels:
        app: api
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: my-api:latest
          ports:
            - containerPort: 3000
          env:
            - name: SHUTDOWN_DRAIN_DELAY
              value: "10"
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
            periodSeconds: 5
            failureThreshold: 1
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3000
            periodSeconds: 10
            failureThreshold: 3
```
Checklist:
- [ ] `process.on('SIGTERM')` handler registered
- [ ] `process.on('SIGINT')` handler registered
- [ ] `isShuttingDown` flag rejects new requests with 503
- [ ] `/health/ready` returns 503 when shutting down
- [ ] Database pool properly closed on shutdown
- [ ] Redis/cache client properly closed on shutdown
- [ ] BullMQ workers drained before exit
- [ ] Hard timeout (`process.exit(1)`) as last resort
- [ ] `preStop` hook adds drain delay in Kubernetes
- [ ] `terminationGracePeriodSeconds` > `preStop` + shutdown time
## Testing Your Shutdown Handler
Don't assume it works — test it:
```bash
# Start your server
node server.js &
SERVER_PID=$!

# Send some requests in a loop
for i in $(seq 1 20); do
  curl -s http://localhost:3000/api/data &
done

# Send SIGTERM mid-flight
sleep 0.1 && kill -TERM $SERVER_PID

# Wait and check — no 502s, no lost responses
wait
echo "All requests completed cleanly"
```
For load testing under shutdown, use k6:
```javascript
// k6-shutdown-test.js
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 50,
  duration: '30s',
};

export default function () {
  const res = http.get('http://localhost:3000/api/data');
  check(res, {
    'status is 200 or 503': r => r.status === 200 || r.status === 503,
    'never 502': r => r.status !== 502,
  });
}
```
A 503 during shutdown is acceptable (it's intentional). A 502 means a connection was dropped — that's a bug.
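To make that pass/fail rule explicit in your own scripts, a tiny classifier over the observed status codes can gate a CI job — a hypothetical helper, not part of k6:

```javascript
// Classify status codes observed during a shutdown load test:
// 200 = served, 503 = intentional drain rejection (fine), 502 = dropped connection (bug).
function auditShutdownStatuses(statuses) {
  const summary = { served: 0, drained: 0, dropped: 0, other: 0 };
  for (const s of statuses) {
    if (s === 200) summary.served++;
    else if (s === 503) summary.drained++;
    else if (s === 502) summary.dropped++;
    else summary.other++;
  }
  summary.pass = summary.dropped === 0; // any 502 fails the deploy check
  return summary;
}

console.log(auditShutdownStatuses([200, 200, 503, 503]));
// → { served: 2, drained: 2, dropped: 0, other: 0, pass: true }
```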
## Summary
Graceful shutdown is one of those things that seems optional until production punishes you for ignoring it. In a world of continuous deployment, Kubernetes rolling updates, and auto-scaling, every restart is a potential incident without it.
The pattern in 2026 is clear:
- Handle `SIGTERM` and `SIGINT`
- Set an `isShuttingDown` flag immediately and return 503 on readiness
- Stop accepting new connections with `server.close()`
- Run cleanup handlers (DB, Redis, queues) sequentially
- Force exit after 30 seconds as a last resort
- Add a `preStop` sleep in Kubernetes to absorb the endpoint propagation delay
With these patterns in place, your API can deploy dozens of times a day without a single dropped request.
Originally published at 1xAPI.com