## Introduction
As a DevOps lead, you’ve probably felt the panic of a deployment that takes your service offline for even a few seconds. In a world where users expect instant availability, a brief outage can translate into lost revenue, damaged brand trust, and a flood of support tickets. This checklist walks you through a practical, end‑to‑end process for achieving zero‑downtime deployments using Docker containers behind an Nginx reverse‑proxy. The steps are deliberately granular so you can copy‑paste them into your own runbooks.
## 1. Prep Your Build Environment

- **Pin base images** – Use immutable tags like `python:3.11-slim` instead of `latest`.
- **Enable BuildKit** – Faster, reproducible builds. Add `export DOCKER_BUILDKIT=1` to your CI shell.
- **Run static analysis** – Tools like `hadolint` catch Dockerfile anti-patterns early.

```bash
# Example: lint a Dockerfile in CI
hadolint Dockerfile
```
## 2. Create a Reproducible Docker Image
Your Dockerfile should be minimal and layered for cache efficiency. Below is a solid starter for a Node.js API, but the same principles apply to any language.
```dockerfile
# syntax=docker/dockerfile:1.4
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
RUN npm run build

FROM node:20-alpine AS runtime
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 3000
CMD ["node", "dist/index.js"]
```
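Because `COPY . .` invalidates the build cache whenever any file in the context changes, it helps to exclude artifacts and local clutter from the context. A minimal `.dockerignore` sketch (the entries are examples; adjust to your repo):

```
node_modules
dist
.git
*.log
.env
```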
Key points:

- Multi-stage build keeps the final image tiny.
- `--omit=dev` ensures dev dependencies never make it to production.
- Expose only the port you need.
## 3. Define a Blue-Green Architecture in Nginx

Nginx will act as a router that can switch traffic between two upstream groups: `blue` (current) and `green` (new). Store the config in a version-controlled file (e.g., `nginx.conf`).
```nginx
http {
    upstream blue {
        server 127.0.0.1:3001 max_fails=3 fail_timeout=30s;
    }

    upstream green {
        server 127.0.0.1:3002 max_fails=3 fail_timeout=30s;
    }

    # Map a per-request variable to the active upstream name. Every request
    # falls through to the default value, which names the live color.
    map $host $target_upstream {
        default "blue";  # default traffic goes to blue
    }

    server {
        listen 80;

        location / {
            # The variable resolves to a named upstream block,
            # so no DNS resolver is required.
            proxy_pass http://$target_upstream;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```
When you’re ready to promote, you simply flip the `$target_upstream` default (or reload a tiny snippet) without touching the client-facing endpoint.
## 4. Spin Up Containers with Docker Compose

Use a `docker-compose.yml` that defines both blue and green services. Assign static host ports that Nginx knows about.
```yaml
version: "3.9"

services:
  api-blue:
    image: myorg/api:{{CURRENT_TAG}}
    container_name: api_blue
    ports:
      - "3001:3000"
    restart: always
    healthcheck:
      # node:20-alpine ships busybox wget but not curl
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
      interval: 10s
      timeout: 2s
      retries: 3

  api-green:
    image: myorg/api:{{NEW_TAG}}
    container_name: api_green
    ports:
      - "3002:3000"
    restart: always
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
      interval: 10s
      timeout: 2s
      retries: 3
```
Checklist items for this step:

- Replace `{{CURRENT_TAG}}` and `{{NEW_TAG}}` with CI variables.
- Ensure health checks are strict; a container that reports unhealthy should never be promoted to receive traffic.
- Keep `restart: always` to let Docker auto-recover from crashes.
## 5. Validate the Green Stack Before Switchover

- **Run integration tests** against the green endpoint (`http://localhost:3002`).
- **Check logs** – `docker logs api_green` should show a clean start.
- **Confirm health** – `docker exec api_green wget -qO- http://localhost:3000/health` (the alpine base image has busybox wget, not curl).
If any test fails, abort the promotion and investigate before proceeding.
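In CI, the health check above can race a container that is still booting, so it is worth wrapping in a retry loop. A sketch with the probe command parameterized (`PROBE_CMD`, the URL, and the retry counts are illustrative):

```bash
#!/usr/bin/env bash
# Hypothetical poller: run the probe command against a URL until it succeeds
# or the retry budget is exhausted. PROBE_CMD is an array, e.g. (curl -fsS)
# or (wget -qO-), so the loop is independent of which client is installed.
wait_healthy() {
  local url="$1" retries="${2:-5}" delay="${3:-2}" i
  for ((i = 1; i <= retries; i++)); do
    if "${PROBE_CMD[@]}" "$url" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example: PROBE_CMD=(curl -fsS); wait_healthy "http://localhost:3002/health" 10 3
```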
## 6. Switch Traffic Atomically

The actual cut-over can be performed with a single Nginx reload that swaps the upstream mapping. Store the mapping in a small snippet (`upstream_switch.conf`) that you edit on the fly.
```nginx
# upstream_switch.conf – generated by CI
map $host $target_upstream {
    default "green";  # flip to green for zero-downtime
}
```
Run the following commands:

```bash
# Update the snippet with the new target (green)
printf 'map $host $target_upstream {\n    default "green";\n}\n' > /etc/nginx/conf.d/upstream_switch.conf

# Validate the config, then reload Nginx without dropping connections
nginx -t && nginx -s reload
```
Because Nginx reloads its configuration gracefully, existing connections finish on the blue containers while new requests flow to green.
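The one remaining race is a half-written snippet file. One way to avoid it, sketched below, is to write to a temp file and rename, which is atomic on a single filesystem (the helper name and paths are illustrative):

```bash
#!/usr/bin/env bash
# Hypothetical helper: atomically (re)write the switch snippet for a color.
write_switch_conf() {
  local color="$1" snippet="$2" tmp
  case "$color" in blue|green) ;; *) echo "unknown color: $color" >&2; return 1 ;; esac
  tmp="$(mktemp "${snippet}.XXXXXX")"
  printf 'map $host $target_upstream {\n    default "%s";\n}\n' "$color" > "$tmp"
  mv "$tmp" "$snippet"   # rename(2) is atomic on a single filesystem
}

# Promotion (or rollback, by passing "blue") then becomes:
# write_switch_conf green /etc/nginx/conf.d/upstream_switch.conf && nginx -t && nginx -s reload
```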
## 7. Observe, Verify, and Roll Back if Needed

- **Metrics** – Monitor latency and error rates via Prometheus or Datadog. A spike after the switch indicates a problem.
- **Logs** – Centralize container logs with Loki or Elasticsearch; search for `ERROR` within the first 5 minutes.
- **Rollback** – If anything looks off, revert the snippet back to `default "blue";` and reload Nginx. The blue containers are still running, so traffic instantly returns to the stable version.
## 8. Clean Up Old Images

After a successful rollout and a safe observation window (usually 15‑30 minutes), prune the old containers and images to free resources.

```bash
# Stop and remove the blue containers
docker-compose stop api-blue && docker-compose rm -f api-blue

# Remove the old image tag
docker image rm myorg/api:{{CURRENT_TAG}}
```
Automate this step in your CI pipeline so you never accumulate stale layers.
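When this cleanup runs from CI, the pipeline usually knows only which color it just promoted. A tiny helper, sketched below, derives the stack to retire (the names follow this article's `api-blue`/`api-green` convention):

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the color that now serves traffic,
# return the color that can be stopped and pruned.
other_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

# Example CI step:
# old="$(other_color green)"
# docker-compose stop "api-$old" && docker-compose rm -f "api-$old"
```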
## 9. Document the Process

Even the most reliable automation can fail when a human intervenes. Keep a markdown runbook that mirrors this checklist, version-controlled alongside your infrastructure code. Include:

- CI variable names (`CURRENT_TAG`, `NEW_TAG`).
- Where the Nginx snippet lives.
- Contact points for on-call engineers.
## 10. Final Thoughts
Zero‑downtime deployments aren’t a magic button; they’re a disciplined series of checks, health‑guards, and graceful hand‑offs. By treating the blue and green stacks as first‑class citizens and letting Nginx handle the traffic switch, you can push updates without ever dropping a client request.
If you need help shipping this, the team at https://ramerlabs.com can help.