Introduction
When a service goes down for a deploy, users feel the impact instantly: slow pages, broken flows, or outright errors. For modern SaaS products, even a few seconds of unavailability can erode trust and hurt revenue. This checklist walks a DevOps lead through a practical, Docker‑centric workflow for zero‑downtime deployments behind an Nginx reverse proxy. Every step is actionable, version‑controlled, and observable.
Why Zero‑Downtime Matters
- User experience: Seamless upgrades mean users never see dropped requests or error pages mid‑session.
- Business continuity: SLAs often demand 99.9%+ uptime.
- Rollback safety: If something goes wrong, you can instantly revert without affecting live traffic.
Achieving this requires a combination of container orchestration, smart proxying, and robust health‑checking. The checklist below assumes you already run Docker Engine (or Docker Desktop) on a Linux host and use Nginx as a front‑end load balancer.
Prerequisites
Docker Engine
- Docker ≥ 20.10 installed.
- `docker compose` (v2) available on the PATH.
- A private registry (Docker Hub, GitHub Packages, or self‑hosted) for versioned images.
Nginx Reverse Proxy
- Nginx ≥ 1.21 compiled with the `stream` and `http` modules.
- TLS certificates in place (Let’s Encrypt or internal PKI).
- Access to edit `/etc/nginx/conf.d/` and reload the service without dropping connections.
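A quick sanity check before you start; the `grep` pattern below assumes Nginx was built with the usual `--with-*` configure flags, so adjust it to your build:

```bash
# Confirm tool versions and compiled-in Nginx modules
docker --version            # expect 20.10 or newer
docker compose version      # expect v2.x
nginx -V 2>&1 | grep -oE 'nginx/[0-9.]+|--with-stream|--with-http_ssl_module'
```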
The Checklist
| ✅ | Item | Why it matters |
|---|---|---|
| 1 | Versioned Docker images | Guarantees repeatable rollouts; you can pin a specific SHA. |
| 2 | Immutable infrastructure | Containers never mutate at runtime; configuration lives in code. |
| 3 | Blue‑green service definitions | Two parallel stacks (blue & green) let you switch traffic atomically. |
| 4 | Health‑check endpoints | Nginx only routes to containers that report healthy. |
| 5 | Graceful shutdown hooks | Allows in‑flight requests to finish before containers stop. |
| 6 | Zero‑downtime Nginx reload | `nginx -s reload` swaps configs without closing sockets. |
| 7 | Observability stack | Metrics & logs confirm the new version is healthy before cut‑over. |
| 8 | Rollback plan | A one‑command revert if health checks fail after the traffic switch. |
Below each item is a concrete action you can copy‑paste into your repo.
1. Build and Tag a Release‑Ready Image
```bash
# Build the app image and tag it with the git SHA
GIT_SHA=$(git rev-parse --short HEAD)
docker build -t myorg/api:$GIT_SHA .

# Push to registry for later pull by the green stack
docker push myorg/api:$GIT_SHA
```
Tip: Automate this in your CI pipeline and store the SHA as an environment variable for the next steps.
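As a sketch, here is what that automation could look like in GitHub Actions; the workflow name, secret names, and trigger branch are assumptions, so adapt them to your registry and repo:

```yaml
# .github/workflows/release.yml — hypothetical pipeline; secret names are placeholders
name: build-and-push
on:
  push:
    branches: [main]
jobs:
  image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Log in to the registry
        run: echo "${{ secrets.REGISTRY_TOKEN }}" | docker login -u "${{ secrets.REGISTRY_USER }}" --password-stdin
      - name: Build, push, and record the SHA
        run: |
          GIT_SHA=$(git rev-parse --short HEAD)
          docker build -t myorg/api:$GIT_SHA .
          docker push myorg/api:$GIT_SHA
          echo "GIT_SHA=$GIT_SHA" >> "$GITHUB_ENV"
```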
2. Define Blue & Green Stacks in Docker Compose
```yaml
version: "3.9"
services:
  api-blue:
    image: myorg/api:${BLUE_SHA}
    restart: always
    ports: [ "8001:80" ]
    healthcheck:
      test: [ "CMD", "curl", "-f", "http://localhost/health" ]
      interval: 10s
      timeout: 2s
      retries: 3
  api-green:
    image: myorg/api:${GREEN_SHA}
    restart: always
    ports: [ "8002:80" ]
    healthcheck:
      test: [ "CMD", "curl", "-f", "http://localhost/health" ]
      interval: 10s
      timeout: 2s
      retries: 3
```
Only one stack is active at a time; the other stays idle (or runs the previous version), ready for a quick switch.
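For example, on the very first rollout you might start only the blue stack (the SHA below is a placeholder):

```bash
# First rollout: blue is the live stack; green stays idle until the next release
export BLUE_SHA=1234abc     # placeholder — SHA of the release going live
export GREEN_SHA=$BLUE_SHA  # keeps compose variable interpolation happy for now
docker compose -f docker-compose.yml up -d api-blue
```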
3. Wire Nginx to Both Stacks
Create `/etc/nginx/conf.d/api_upstream.conf`:
```nginx
upstream api_backend {
    # Blue stack – default traffic
    server 127.0.0.1:8001 max_fails=3 fail_timeout=30s;
    # Green stack – initially marked as backup
    server 127.0.0.1:8002 backup;
}

server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate     /etc/ssl/certs/example.crt;
    ssl_certificate_key /etc/ssl/private/example.key;

    location / {
        proxy_pass http://api_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```
- The `backup` flag tells Nginx to use the green server only when the blue server is marked unhealthy.
- After a successful green deployment you will edit the file to make green primary and reload Nginx.
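Whenever you create or edit this file, validate it before reloading; a malformed config is the easiest way to cause exactly the downtime you're trying to avoid:

```bash
# Check syntax and referenced files without touching the running process
sudo nginx -t
```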
4. Deploy the Green Stack
```bash
# Export the new SHA (set by CI)
export GREEN_SHA=abcdef1

# Bring up the green stack without touching the blue one
docker compose -f docker-compose.yml up -d api-green
```
Verify health:
```bash
docker exec $(docker ps -q -f name=api-green) curl -s http://localhost/health
```

If the endpoint returns `200 OK`, the green containers are ready.
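You can also probe the green stack from the host through its published port; a small retry loop avoids racing the container start (the ten-attempt, three-second budget is an arbitrary choice):

```bash
# Poll the green stack's published port until it reports healthy (max ~30 s)
for attempt in $(seq 1 10); do
  if curl -fsS http://127.0.0.1:8002/health >/dev/null; then
    echo "green stack healthy after $attempt attempt(s)"
    break
  fi
  sleep 3
done
```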
5. Switch Traffic Atomically
- Edit the upstream config: remove `backup` from the green server and add it to blue (see the reference snippet below).
- Reload Nginx without dropping connections:

```bash
sudo nginx -s reload
```

Nginx now routes new connections to the green stack while existing sessions on blue finish gracefully.
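For reference, this is what the upstream block should look like after the edit (the rest of the file is unchanged):

```nginx
# api_upstream.conf after the switch: green is primary, blue is the standby
upstream api_backend {
    # Blue stack – previous release, kept as backup for fast rollback
    server 127.0.0.1:8001 backup;
    # Green stack – now receiving all traffic
    server 127.0.0.1:8002 max_fails=3 fail_timeout=30s;
}
```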
6. Graceful Shutdown of the Blue Stack
```bash
# Send SIGTERM so the app stops accepting new work and drains in-flight requests
docker compose -f docker-compose.yml stop -t 30 api-blue

# Once the containers have stopped, remove them
docker compose -f docker-compose.yml rm -f api-blue
```

The `-t 30` flag gives containers a 30‑second window to finish in‑flight requests before Docker sends SIGKILL; this only works if the application handles SIGTERM gracefully (see the sketch below).
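The article's API could be written in any language, so here is a minimal Python sketch of the application side of checklist items 4 and 5: a `/health` endpoint plus a SIGTERM handler that lets in-flight requests finish before the process exits:

```python
# app.py — illustrative sketch only; any stack works as long as it serves
# /health and drains in-flight requests on SIGTERM (what `docker stop` sends).
import signal
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"OK")
        else:
            self.send_response(404)
            self.end_headers()

class DrainingServer(ThreadingMixIn, HTTPServer):
    daemon_threads = False  # non-daemon request threads are joined on server_close()

server = DrainingServer(("0.0.0.0", 80), Handler)

def graceful_exit(signum, frame):
    # shutdown() blocks until serve_forever() exits, so run it off the main thread
    threading.Thread(target=server.shutdown).start()

signal.signal(signal.SIGTERM, graceful_exit)
server.serve_forever()  # returns once graceful_exit triggers shutdown()
server.server_close()   # waits for in-flight request threads to finish
```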
7. Verify Observability
- Metrics: Hook Prometheus up to scrape `/metrics` from both stacks (a sample scrape job follows this list), and compare latency and error rates before and after the switch.
- Logs: Ship container logs to Loki or Elastic via the Docker logging driver, and search for `ERROR` or `panic` after the cut‑over.
- Alerting: Configure an alert that fires if the green stack's health check fails for more than two consecutive intervals.
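A minimal Prometheus scrape job covering both stacks might look like this; it assumes the app exposes `/metrics` on the same published ports as the API, and the `color` labels are illustrative:

```yaml
# prometheus.yml — scrape both colors so dashboards can compare them side by side
scrape_configs:
  - job_name: api
    metrics_path: /metrics
    static_configs:
      - targets: ["127.0.0.1:8001"]
        labels:
          color: blue
      - targets: ["127.0.0.1:8002"]
        labels:
          color: green
```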
8. Rollback (If Needed)
If the green stack shows anomalies, revert in seconds:

```bash
# Make blue primary again by stripping its backup flag…
sudo sed -i 's/server 127.0.0.1:8001 backup;/server 127.0.0.1:8001 max_fails=3 fail_timeout=30s;/' /etc/nginx/conf.d/api_upstream.conf
# …and demote green back to backup (matches the line with or without extra parameters)
sudo sed -i 's/server 127.0.0.1:8002.*;/server 127.0.0.1:8002 backup;/' /etc/nginx/conf.d/api_upstream.conf
sudo nginx -s reload
```

Traffic instantly flows back to the known‑good version, and you can investigate the green release offline. Note that this assumes the blue stack is still running: hold off on step 6 until the green release has proven itself, or bring blue back first with `docker compose up -d api-blue`.
Wrap‑Up
Zero‑downtime deployments are not a magic button; they are a disciplined set of practices that you can codify and version. By keeping Docker images immutable, using a blue‑green pattern, and letting Nginx handle graceful traffic switches, you eliminate the most common sources of downtime. Pair this workflow with health checks, observability, and a clear rollback path, and you’ll meet even the strictest SLA requirements.
If you need help shipping this, the team at https://ramerlabs.com can help.