DEV Community

1xApi

Posted on • Originally published at 1xapi.com

How to Implement Graceful Shutdown in Node.js APIs for Zero-Downtime Deployments (2026 Guide)

Zero-downtime deployments are non-negotiable for production APIs. Yet one of the most common causes of dropped requests and 502 errors during deployments is something deceptively simple: your Node.js process doesn't know how to die gracefully.

When Kubernetes sends a SIGTERM to your pod, or Docker stops a container, your API has a window to finish in-flight requests, close database connections, flush queues, and exit cleanly. Without a proper shutdown handler, requests get silently dropped, transactions are left open, and Redis connections leak — all while your users experience mysterious errors during what should be a seamless deploy.

This guide walks through building a production-grade graceful shutdown system for Node.js APIs in 2026, covering Express, Fastify, Hono, and Kubernetes-specific patterns.


Why Graceful Shutdown Matters in 2026

Modern deployment pipelines run rolling updates continuously. A typical Kubernetes rolling update sends SIGTERM to the old pod while simultaneously routing new traffic to the new pod. The gap between these two events — the termination grace period — is your window to clean up.

| Scenario | Without Graceful Shutdown | With Graceful Shutdown |
| --- | --- | --- |
| Rolling deploy | Requests dropped, 502 errors | Zero dropped requests |
| Scale-down event | Connections terminated | Connections drained |
| Pod eviction | Open transactions, data risk | Clean commit or rollback |
| Health check transition | Traffic sent to dying pod | Removed from load balancer first |

In Kubernetes, the default terminationGracePeriodSeconds is 30 seconds. That's your budget. Use it wisely.


Understanding the Shutdown Signal Chain

Before writing code, understand what actually happens when your pod terminates:

  1. Kubernetes sends SIGTERM to PID 1 in your container
  2. Kubernetes simultaneously removes your pod from the Endpoints list (this can take 2–10 seconds to propagate through kube-proxy/iptables)
  3. After terminationGracePeriodSeconds, Kubernetes sends SIGKILL (no escape)

The critical gap is step 2. Traffic may still arrive at your pod for several seconds after it receives SIGTERM. This is why a naive process.exit(0) on SIGTERM still drops requests.

The solution: add a pre-stop sleep and stop accepting new connections gradually.
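The failure mode is easy to sketch. Below is a minimal illustration (not the full handler, which Step 1 builds out): the naive exit drops whatever arrives during the propagation window, while flipping a flag first lets the readiness probe and middleware start refusing work before the process goes away.

```javascript
// Anti-pattern: exiting the moment SIGTERM arrives drops any request
// still being routed to this pod during the propagation window:
//   process.on('SIGTERM', () => process.exit(0)); // DON'T do this

// Better: record that we're draining first; close/cleanup follows later.
let draining = false;

process.on('SIGTERM', () => {
  draining = true; // readiness checks and middleware key off this flag
  // ...then server.close(), resource cleanup, and finally process.exit(0)
});
```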


Step 1: Basic Shutdown Handler

Let's start with the fundamentals — a minimal but correct shutdown handler for any Node.js HTTP server:

// server.js
import express from 'express';

const app = express();
let isShuttingDown = false;

// Middleware: reject new requests during shutdown
app.use((req, res, next) => {
  if (isShuttingDown) {
    res.set('Connection', 'close');
    return res.status(503).json({ error: 'Server is shutting down' });
  }
  next();
});

app.get('/health', (req, res) => {
  if (isShuttingDown) return res.status(503).json({ status: 'shutting_down' });
  res.json({ status: 'ok' });
});

app.get('/api/data', async (req, res) => {
  // Simulate async work
  await new Promise(resolve => setTimeout(resolve, 200));
  res.json({ data: 'response' });
});

const server = app.listen(3000, () => {
  console.log('Server listening on port 3000');
});

// Track active connections for forced drain
const connections = new Set();
server.on('connection', socket => {
  connections.add(socket);
  socket.on('close', () => connections.delete(socket));
});

async function shutdown(signal) {
  console.log(`[Shutdown] Received ${signal}`);
  isShuttingDown = true;

  // Stop accepting new connections
  server.close(err => {
    if (err) {
      console.error('[Shutdown] Error:', err);
      process.exit(1);
    }
    console.log('[Shutdown] All connections closed. Exiting.');
    process.exit(0);
  });

  // Force-destroy any remaining connections after 30s
  setTimeout(() => {
    console.warn('[Shutdown] Timeout hit, destroying remaining connections');
    for (const socket of connections) socket.destroy();
    process.exit(1);
  }, 30_000);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT',  () => shutdown('SIGINT'));

This handles the basics: the isShuttingDown flag prevents new work from entering, and server.close() waits for in-flight requests before exiting. The 30-second hard exit is your safety net.
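As an aside: on Node 18.2+, the hand-tracked connections Set can be replaced by the server's built-in helpers. A sketch, assuming a plain node:http server (the same methods exist on the server returned by Express's app.listen):

```javascript
import http from 'node:http';

const server = http.createServer((req, res) => res.end('ok'));
server.listen(0); // ephemeral port, for illustration

function shutdown() {
  // Stop accepting new connections; wait for in-flight responses
  server.close(() => process.exit(0));
  // Immediately drop keep-alive sockets that have no active request,
  // so close() isn't held open by idle clients
  server.closeIdleConnections();
  // Last resort after 30s: destroy everything still open
  setTimeout(() => {
    server.closeAllConnections();
    process.exit(1);
  }, 30_000).unref();
}

process.on('SIGTERM', shutdown);
```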


Step 2: The Pre-Stop Sleep (Kubernetes Critical Pattern)

Because Kubernetes propagates endpoint removal asynchronously, you need to delay the actual shutdown start by a few seconds after receiving SIGTERM. This prevents 502s from traffic that's still being routed to the terminating pod.

There are two ways to implement this:

Option A: Kubernetes preStop hook (recommended)

# deployment.yaml
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: api
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 10"]

The preStop hook runs before SIGTERM is sent. Setting it to sleep 10 gives kube-proxy 10 seconds to drain existing connections before your app even starts shutting down. Your terminationGracePeriodSeconds must be greater than preStop duration + your app's shutdown time.
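That inequality is worth encoding in a pre-deploy check. A tiny helper sketch (validateGracePeriod is a hypothetical name, not part of kubectl or any Kubernetes API):

```javascript
// Returns whether the grace period leaves headroom beyond
// preStop sleep + the app's own worst-case shutdown time.
function validateGracePeriod({ terminationGracePeriodSeconds, preStopSeconds, shutdownSeconds }) {
  const needed = preStopSeconds + shutdownSeconds;
  return {
    ok: terminationGracePeriodSeconds > needed,
    headroom: terminationGracePeriodSeconds - needed, // seconds to spare
  };
}
```

With the values used in this guide (60s grace period, 10s preStop, a 30s shutdown budget), that leaves 20 seconds of headroom.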

Option B: Sleep inside the shutdown handler

async function shutdown(signal) {
  console.log(`[Shutdown] Received ${signal}`);
  isShuttingDown = true;

  // Give load balancer time to route traffic away (5–15 seconds)
  const DRAIN_DELAY = parseInt(process.env.SHUTDOWN_DRAIN_DELAY ?? '10', 10) * 1000;
  console.log(`[Shutdown] Waiting ${DRAIN_DELAY}ms for traffic drain...`);
  await new Promise(resolve => setTimeout(resolve, DRAIN_DELAY));

  // Now stop accepting connections
  server.close(() => {
    console.log('[Shutdown] Server closed. Exiting.');
    process.exit(0);
  });

  setTimeout(() => process.exit(1), 30_000);
}

Option A is preferred in Kubernetes because it keeps your app logic clean and the delay configurable per-environment without code changes.


Step 3: Production-Grade Shutdown Manager

Real APIs have more than just HTTP connections to clean up: PostgreSQL pools, Redis connections, open file handles, message queue consumers. Here's a ShutdownManager class that handles all of them:

// shutdown.js
export class ShutdownManager {
  #isShuttingDown = false;
  #cleanupHandlers = [];
  #timeout;

  constructor({ timeoutMs = 30_000 } = {}) {
    this.#timeout = timeoutMs;
  }

  /** Register a named cleanup handler */
  register(name, fn) {
    this.#cleanupHandlers.push({ name, fn });
    return this; // chainable
  }

  /** Call from your server setup */
  listen(server) {
    const connections = new Set();
    server.on('connection', s => {
      connections.add(s);
      s.on('close', () => connections.delete(s));
    });

    const handler = async signal => {
      if (this.#isShuttingDown) return;
      this.#isShuttingDown = true;
      console.log(`\n[Shutdown] Signal: ${signal}`);

      // Hard timeout as last resort
      const forceExit = setTimeout(() => {
        console.error('[Shutdown] Timeout exceeded — forcing exit');
        process.exit(1);
      }, this.#timeout);
      forceExit.unref(); // Don't prevent clean exit

      // 1. Stop HTTP server; closeIdleConnections() (Node 18.2+) drops
      // keep-alive sockets with no request in flight so close() can finish
      server.closeIdleConnections?.();
      await new Promise(resolve => server.close(resolve));
      console.log('[Shutdown] HTTP server closed');

      // 2. Run cleanup handlers in order
      for (const { name, fn } of this.#cleanupHandlers) {
        try {
          await fn();
          console.log(`[Shutdown] ✓ ${name}`);
        } catch (err) {
          console.error(`[Shutdown] ✗ ${name}:`, err.message);
        }
      }

      // 3. Force-close any remaining sockets
      for (const s of connections) s.destroy();

      clearTimeout(forceExit);
      console.log('[Shutdown] Complete. Goodbye.');
      process.exit(0);
    };

    process.on('SIGTERM', handler);
    process.on('SIGINT',  handler);
  }

  get isShuttingDown() { return this.#isShuttingDown; }
}

Usage with real resources:

// app.js
import { Pool } from 'pg';
import { Redis } from 'ioredis';
import { ShutdownManager } from './shutdown.js';

const pool  = new Pool({ connectionString: process.env.DATABASE_URL });
const redis = new Redis(process.env.REDIS_URL);

const shutdown = new ShutdownManager({ timeoutMs: 30_000 });

shutdown
  .register('PostgreSQL pool', () => pool.end())
  .register('Redis client',    () => redis.quit())
  .register('Flush metrics',   () => metrics.flush()); // optional

const server = app.listen(3000);
shutdown.listen(server);

Each cleanup handler runs sequentially, with individual error isolation — a failed Redis close won't prevent the PostgreSQL pool from shutting down.
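The error-isolation behavior is easy to verify in isolation. Here's a stripped-down version of the same loop, runnable without a server (handler names are just examples):

```javascript
// Runs handlers in order; a throwing handler is recorded but
// never prevents the handlers after it from running.
async function runCleanup(handlers) {
  const results = [];
  for (const { name, fn } of handlers) {
    try {
      await fn();
      results.push({ name, ok: true });
    } catch (err) {
      results.push({ name, ok: false, error: err.message });
    }
  }
  return results;
}

// Example: a failing Redis close still lets the pool handler run.
runCleanup([
  { name: 'Redis client', fn: async () => { throw new Error('ECONNRESET'); } },
  { name: 'PostgreSQL pool', fn: async () => {} },
]).then(results => console.log(results));
```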


Step 4: Health Check Integration

Your Kubernetes readiness probe needs to know when to stop sending traffic before SIGTERM arrives. Update your health endpoint:

app.get('/health/ready', (req, res) => {
  if (shutdown.isShuttingDown) {
    // Return 503 → Kubernetes removes pod from endpoints immediately
    return res.status(503).json({
      status: 'shutting_down',
      message: 'Pod is draining'
    });
  }
  res.json({ status: 'ready' });
});

app.get('/health/live', (req, res) => {
  // Liveness probe stays 200 until we're fully done
  // (returning 500 triggers a pod restart, not a clean shutdown)
  res.json({ status: 'alive' });
});

Corresponding Kubernetes probes:

readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 1   # Remove from rotation immediately on first failure

livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3

With failureThreshold: 1 on the readiness probe, as soon as your pod returns 503 on /health/ready, Kubernetes removes it from the load balancer — eliminating the race condition entirely.


Step 5: Fastify and Hono Patterns

The same principles apply to modern frameworks, with minor differences.

Fastify (with built-in fastify.close())

import Fastify from 'fastify';

const fastify = Fastify({ logger: true });
let isShuttingDown = false;

fastify.addHook('onRequest', async (request, reply) => {
  if (isShuttingDown) {
    reply.header('Connection', 'close');
    // Returning the reply stops the request lifecycle here
    return reply.status(503).send({ error: 'shutting down' });
  }
});

const shutdown = async signal => {
  fastify.log.info({ signal }, 'Shutdown initiated');
  isShuttingDown = true;

  await fastify.close(); // waits for in-flight requests + runs onClose hooks
  process.exit(0);
};

process.on('SIGTERM', shutdown);
process.on('SIGINT',  shutdown);

Fastify's fastify.close() handles the connection draining internally and fires any registered fastify.addHook('onClose', ...) handlers — making cleanup registration more ergonomic.

Hono on Node.js

import { serve } from '@hono/node-server';
import { Hono } from 'hono';

const app = new Hono();
let isShuttingDown = false;

app.use('*', async (c, next) => {
  if (isShuttingDown) {
    return c.json({ error: 'shutting down' }, 503);
  }
  return next();
});

const server = serve({ fetch: app.fetch, port: 3000 });

process.on('SIGTERM', async () => {
  isShuttingDown = true;
  server.close(() => process.exit(0));
  setTimeout(() => process.exit(1), 30_000);
});

Step 6: Handling BullMQ Workers

If your API runs BullMQ background workers, graceful shutdown is critical — abruptly stopping a worker leaves jobs in the active state and they won't be retried until the lock expires (default: 30 seconds).

import { Worker } from 'bullmq';

const worker = new Worker('emails', processJob, {
  connection: redis,
  lockDuration: 30_000,
});

shutdown.register('BullMQ worker', async () => {
  await worker.close();           // waits for current job to finish
  console.log('Worker drained');
});

worker.close() stops the worker from picking up new jobs, waits for the active job to complete, then closes the connection. For jobs that run longer than lockDuration, no extra wiring is needed: BullMQ renews the lock automatically while the processor is still running.


Complete Deployment Checklist

Here's the full picture for zero-downtime Kubernetes deployments in 2026:

# deployment.yaml (production template)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0       # Never reduce capacity during rollout
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: api
        image: my-api:latest
        ports:
        - containerPort: 3000
        env:
        - name: SHUTDOWN_DRAIN_DELAY
          value: "10"
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 3000
          periodSeconds: 5
          failureThreshold: 1
        livenessProbe:
          httpGet:
            path: /health/live
            port: 3000
          periodSeconds: 10
          failureThreshold: 3

Checklist:

  • [ ] process.on('SIGTERM') handler registered
  • [ ] process.on('SIGINT') handler registered
  • [ ] isShuttingDown flag rejects new requests with 503
  • [ ] /health/ready returns 503 when shutting down
  • [ ] Database pool properly closed on shutdown
  • [ ] Redis/cache client properly closed on shutdown
  • [ ] BullMQ workers drained before exit
  • [ ] Hard timeout (process.exit(1)) as last resort
  • [ ] preStop hook adds drain delay in Kubernetes
  • [ ] terminationGracePeriodSeconds > preStop + shutdown time

Testing Your Shutdown Handler

Don't assume it works — test it:

# Start your server
node server.js &
SERVER_PID=$!

# Send some requests in a loop
for i in $(seq 1 20); do
  curl -s http://localhost:3000/api/data &
done

# Send SIGTERM mid-flight
sleep 0.1 && kill -TERM $SERVER_PID

# Wait and check — no 502s, no lost responses
wait
echo "All requests completed cleanly"

For load testing under shutdown, use k6:

// k6-shutdown-test.js
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 50,
  duration: '30s',
};

export default function () {
  const res = http.get('http://localhost:3000/api/data');
  check(res, {
    'status is 200 or 503': r => r.status === 200 || r.status === 503,
    'never 502': r => r.status !== 502,
  });
}

A 503 during shutdown is acceptable (it's intentional). A 502 means a connection was dropped — that's a bug.


Summary

Graceful shutdown is one of those things that seems optional until production punishes you for ignoring it. In a world of continuous deployment, Kubernetes rolling updates, and auto-scaling, every restart is a potential incident without it.

The pattern in 2026 is clear:

  1. Handle SIGTERM and SIGINT
  2. Set an isShuttingDown flag immediately and return 503 on readiness
  3. Stop accepting new connections with server.close()
  4. Run cleanup handlers (DB, Redis, queues) sequentially
  5. Force exit after 30 seconds as a last resort
  6. Add a preStop sleep in Kubernetes to absorb the endpoint propagation delay

With these patterns in place, your API can deploy dozens of times a day without a single dropped request.


