Ramer Labs
The Ultimate Checklist for Zero‑Downtime Deploys with Docker & Nginx

Introduction

Zero‑downtime deployments are a non‑negotiable expectation for modern services. As a DevOps lead, you’ve probably seen the panic when a new release briefly knocks the site offline. In this practical guide we’ll walk through a Docker‑centric, Nginx‑backed blue‑green deployment pattern that keeps users blissfully unaware of any change. By the end you’ll have a ready‑to‑run CI/CD pipeline, a reversible Nginx config, and a short checklist you can embed in your sprint retrospectives.


Prerequisites

Before diving into the steps, make sure you have the following in place:

  • A Docker‑compatible host (Linux VM, AWS EC2, or a local dev machine).
  • Nginx installed as a reverse proxy on the same host or a separate bastion.
  • Access to a Git repository (GitHub, GitLab, Bitbucket) where the application code lives.
  • Basic familiarity with Dockerfiles, docker compose, and a CI platform (GitHub Actions, GitLab CI, or CircleCI).

Tip: Keep your Docker Engine version ≥ 20.10 and Nginx ≥ 1.21 for the best compatibility with the snippets below.


Blueprint Overview

The core idea is simple: run two identical environments, blue (current production) and green (next version). Nginx routes traffic to the active environment, and a CI job flips the upstream target once the green containers pass health checks.

Docker Image Versioning

Every commit that touches the application should produce a semantic tag (v1.2.3). Declare the Git SHA and tag as build arguments and embed them with the LABEL instruction – this makes roll‑backs traceable.

# Dockerfile
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
EXPOSE 3000
# Declare the build args so the LABEL lines below can resolve them
ARG GIT_TAG
ARG GIT_SHA
LABEL org.opencontainers.image.version="${GIT_TAG}" \
      org.opencontainers.image.revision="${GIT_SHA}" \
      org.opencontainers.image.source="https://github.com/yourorg/yourapp"
CMD ["node", "dist/index.js"]

The GIT_TAG and GIT_SHA build arguments are injected by the CI pipeline (see the workflow below). Tagging ensures you can pull yourapp:1.4.0 for green while keeping yourapp:1.3.9 around for blue.
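For local experiments you can mirror what the pipeline does by hand. A minimal sketch, where the fallback values are placeholders (in a real checkout they come from git, as the comments show):

```shell
# Derive the image tag the way the pipeline does (sketch).
GIT_TAG=${GIT_TAG:-v1.4.0}            # normally: git describe --tags --abbrev=0
GIT_SHA=${GIT_SHA:-deadbeef}          # normally: git rev-parse HEAD
IMAGE="yourorg/yourapp:${GIT_TAG#v}"  # Docker tags conventionally drop the leading "v"
echo "$IMAGE"
# docker build --build-arg GIT_TAG="$GIT_TAG" --build-arg GIT_SHA="$GIT_SHA" -t "$IMAGE" .
```

The ${GIT_TAG#v} expansion strips the leading "v" so the Docker tag stays purely numeric.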

Nginx Reverse Proxy Config

Below is a minimal yet production‑ready Nginx snippet that defines two upstream blocks – blue and green – and a variable $upstream that decides which one receives traffic.

# /etc/nginx/conf.d/app.conf
upstream blue {
    server 127.0.0.1:3001;   # Docker container listening on host port 3001
}

upstream green {
    server 127.0.0.1:3002;   # Docker container listening on host port 3002
}

# Default to blue; the CI job toggles this line (e.g. via sed or envsubst)
map $http_x_deploy_target $upstream {
    default blue;
    "green" green;
}

server {
    listen 80;
    server_name example.com;

    location / {
        proxy_pass http://$upstream;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

When the CI job finishes a successful deployment, it rewrites the map's default to green and reloads Nginx; rolling back is as simple as flipping the default back to blue. The X-Deploy-Target header gives you a side door: curl -H 'X-Deploy-Target: green' https://example.com/ lets you smoke‑test green before the switch. Be aware that any client can send that header, so strip it at the edge if green must stay private until cut‑over.
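The config comment mentions envsubst; if you prefer rendering a template over editing the live file with sed, here is a sketch. It assumes a hypothetical app.conf.template that is a copy of the config above with "default ${DEPLOY_TARGET};" in place of "default blue;":

```shell
# Render the Nginx config from a template. Restricting envsubst to
# ${DEPLOY_TARGET} leaves Nginx's own $variables ($host, $upstream, ...) untouched.
render_conf() {
  DEPLOY_TARGET="$1" envsubst '${DEPLOY_TARGET}' < "$2"
}

# Example: render_conf green app.conf.template > /etc/nginx/conf.d/app.conf
```

Pair it with nginx -t before reloading so a bad render never takes the site down.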


CI/CD Pipeline Steps

Below is a GitHub Actions workflow that automates the entire cycle. The same steps translate directly to GitLab CI or CircleCI.

name: Deploy Blue‑Green
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      image_tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      # push: true below requires registry credentials; secret names are yours to choose
      - name: Log in to the registry
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.REGISTRY_USERNAME }}
          password: ${{ secrets.REGISTRY_TOKEN }}
      - name: Extract metadata (tags, labels)
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: yourorg/yourapp
          tags: |
            type=semver,pattern={{version}}
            type=sha,format=long
      - name: Build and push image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          build-args: |
            GIT_TAG=${{ steps.meta.outputs.version }}
            GIT_SHA=${{ github.sha }}

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: SSH into host
        uses: appleboy/ssh-action@v0.1.7
        with:
          host: ${{ secrets.HOST }}
          username: ${{ secrets.USER }}
          key: ${{ secrets.SSH_KEY }}
          script: |
            # Pull new image and start it as green (remove any stale green first)
            docker rm -f green 2>/dev/null || true
            docker pull yourorg/yourapp:${{ needs.build.outputs.image_tag }}
            docker run -d --name green -p 3002:3000 \
              yourorg/yourapp:${{ needs.build.outputs.image_tag }}
            # Health check loop (max 30s); abort before switching if green never comes up
            healthy=false
            for i in $(seq 1 30); do
              if curl -sSf http://localhost:3002/health > /dev/null; then healthy=true; break; fi
              sleep 1
            done
            if [ "$healthy" != "true" ]; then
              echo "green failed its health check; leaving blue live" >&2
              docker stop green && docker rm green
              exit 1
            fi
            # Switch Nginx upstream to green
            sudo sed -i 's/default blue;/default green;/' /etc/nginx/conf.d/app.conf
            sudo nginx -s reload
            # Optional: keep blue running for rollback window (5 min)
            sleep 300
            # Stop blue container
            docker stop blue && docker rm blue
            # Rename green → blue for next cycle
            docker rename green blue

Key points:

  • The build job tags the image with both a semantic version and the commit SHA.
  • The deploy job pulls the image, runs it on a dedicated host port (3002), performs a health check, then updates Nginx and reloads it.
  • A short grace period keeps the old container alive, giving you a manual rollback window.
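One caveat: as written, the script performs a single blue→green flip. On the next release green is already live on port 3002, so the sed would not match and the new container would collide with the old one. A small sketch (the detect_target helper is hypothetical) that reads the live Nginx config and derives the idle color and port, so each deploy alternates automatically:

```shell
# Hypothetical helper: read which color Nginx currently serves and
# print the idle color plus its host port (the next deploy target).
detect_target() {
  case $(grep -m1 'default ' "$1") in
    *green*) echo "blue 3001" ;;   # green is live -> deploy to blue
    *)       echo "green 3002" ;;  # blue is live  -> deploy to green
  esac
}

# Demo against a throwaway copy of the map block
tmp=$(mktemp)
printf 'map $http_x_deploy_target $upstream {\n    default blue;\n}\n' > "$tmp"
detect_target "$tmp"   # prints "green 3002"
rm -f "$tmp"
```

The deploy script would then substitute $TARGET and $TARGET_PORT wherever green and 3002 are hard‑coded above.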

Blue‑Green Switch with Nginx

Health Checks

Your application should expose a lightweight /health endpoint that returns 200 OK only when all internal dependencies (DB, cache, external APIs) are reachable. Note that open‑source Nginx only does passive health checks (max_fails / fail_timeout); active health checks are an Nginx Plus feature. Either way, a pre‑deployment curl loop (as shown above) catches failures before any traffic is switched.
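The retry loop in the workflow can also be collapsed into a single curl invocation. A sketch, assuming curl ≥ 7.71 (older versions lack --retry-all-errors and only retry on transient HTTP errors):

```shell
# Compact health gate: curl retries until /health answers 200 or retries run out.
# The curl command exits non-zero on failure, which the if-branch turns into a verdict.
HEALTH_URL=${HEALTH_URL:-http://localhost:3002/health}
if curl -sSf --retry 3 --retry-delay 1 --retry-all-errors "$HEALTH_URL" > /dev/null 2>&1; then
  echo "healthy"
else
  echo "unhealthy"
fi
```

Tune --retry and --retry-delay to your application's startup time (the workflow above allows ~30s).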

Rolling Back

If the green deployment fails after the switch, you have two options:

  1. Manual rollback – SSH back into the host, revert the Nginx config line to default blue;, reload Nginx, and restart the old container.
  2. Automated rollback – Extend the GitHub Actions workflow with a post step that monitors the first 5 minutes of traffic (e.g., via Prometheus alerts) and triggers the revert automatically.
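Option 1 is worth scripting ahead of time so nobody types sed commands under pressure. A sketch, where the rollback and flip_to_blue helpers are hypothetical and rollback must run on the host as root:

```shell
# Hypothetical rollback helpers. flip_to_blue only touches the config file,
# so it can be exercised without a live Nginx.
flip_to_blue() {
  sed -i 's/default green;/default blue;/' "$1"
}

rollback() {
  flip_to_blue /etc/nginx/conf.d/app.conf
  nginx -t && nginx -s reload            # reload, never restart: no dropped connections
  docker start blue 2>/dev/null || true  # revive blue if the grace window already closed
  docker stop green && docker rm green
}
```

Keeping the config rewrite in its own function makes the risky part independently testable.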

Observability & Logging

Zero‑downtime isn’t just about traffic routing; you need visibility into what’s happening behind the scenes.

  • Structured logs – Use a JSON logger (e.g., pino for Node.js) and ship logs to a centralized system like Loki or Elastic Stack.
  • Metrics – Export Prometheus metrics from both containers (/metrics) and configure Grafana dashboards that compare blue vs green response times.
  • Tracing – If you have a distributed tracing stack (Jaeger or OpenTelemetry), tag spans with the deployment tag (v1.4.0) to correlate latency spikes with releases.

Sample docker-compose.yml snippet that runs the app alongside Loki. Note that the json-file driver alone ships nothing: pair it with a collector such as Promtail (or use Grafana's Loki Docker logging driver) to actually forward container logs.

services:
  green:
    image: yourorg/yourapp:${IMAGE_TAG}
    ports:
      - "3002:3000"
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
  loki:
    image: grafana/loki:2.9.1
    ports:
      - "3100:3100"

The Ultimate Checklist

  • Infrastructure
    • ✅ Docker Engine ≥ 20.10 on host.
    • ✅ Nginx with upstream blocks for blue & green.
    • ✅ SSH keys stored securely in CI secrets.
  • CI/CD
    • ✅ Semantic versioning in Docker tags.
    • ✅ Automated health‑check script before traffic switch.
    • ✅ Grace period for manual rollback.
  • Nginx
    • ✅ map directive toggles upstream based on a single variable.
    • ✅ nginx -s reload used, not full restart.
  • Observability
    • ✅ /health endpoint returns 200 only when all deps are healthy.
    • ✅ Logs shipped to a central store.
    • ✅ Metrics exported and visualized.
  • Rollback Plan
    • ✅ Documented manual steps.
    • ✅ Automated rollback trigger (optional).

Cross‑checking this list before each merge to main will keep your production traffic humming without a single user‑visible hiccup.


Closing Thoughts

Zero‑downtime deployments become a repeatable habit when you treat the process as code: versioned Docker images, declarative Nginx configs, and an automated CI pipeline that does the heavy lifting. Keep the checklist handy, monitor health metrics, and you’ll reduce emergency rollbacks to near zero.

If you need a hand shipping this, the team at https://ramerlabs.com is happy to help.
