Introduction
Zero‑downtime deployments are a non‑negotiable expectation for modern services. As a DevOps lead, you’re tasked with keeping traffic flowing while you swap out code, containers, and infrastructure. This checklist walks you through a pragmatic, Docker‑centric workflow that uses Nginx as a lightweight reverse proxy for blue‑green releases. The steps are deliberately concrete so you can copy‑paste them into your own pipelines.
1️⃣ Planning the Release
Before you touch a single line of code, answer three questions:
- What is the target version? Use semantic versioning and tag Docker images accordingly (e.g., myapp:1.4.0).
- How will traffic be split? Decide on a 100% → 0% cut‑over or a gradual 10% ramp‑up.
- What rollback criteria exist? Define health‑check thresholds and a timeout for automatic rollback.
Having these answers documented in a release-plan.md file keeps the whole team aligned.
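A minimal template might look like this – the exact fields are up to your team, this sketch just mirrors the three questions above:

cat > release-plan.md <<'EOF'
# Release 1.4.0
- Target image: myorg/myapp:1.4.0
- Traffic split: 100% cut-over after smoke tests pass
- Rollback: /health failing for 60 s, or success rate < 99.5%, reverts to green
EOF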
2️⃣ Build Immutable Docker Images
2.1 Dockerfile Best Practices
# syntax=docker/dockerfile:1.4
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies – dev dependencies are needed for the build step
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine AS runtime
WORKDIR /app
# Only production dependencies end up in the runtime image
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/index.js"]
- Multi‑stage builds keep the final image tiny.
- Pin the base image (node:20-alpine) to avoid surprise updates – pinning by digest is stricter still, as sketched below.
- Expose only the needed port – Nginx will forward to it.
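Resolving the tag to a digest makes the pin fully immutable. A quick sketch (the printed digest will differ in your environment):

docker pull node:20-alpine
docker inspect --format='{{index .RepoDigests 0}}' node:20-alpine
# prints node@sha256:… – paste that into the FROM line to pin exactly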
2.2 Tagging & Pushing
# Build the image with a semantic tag
docker build -t myorg/myapp:1.4.0 .
# Push to your registry (Docker Hub, ECR, GCR …)
docker push myorg/myapp:1.4.0
Store the image digest (docker inspect --format='{{.RepoDigests}}' …) for later verification.
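For example, a small script can capture the digest and append it to the release plan – a sketch, reusing the hypothetical release-plan.md from step 1:

DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' myorg/myapp:1.4.0)
echo "Pushed digest: $DIGEST" >> release-plan.md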
3️⃣ Blue‑Green Architecture with Nginx
Nginx will sit in front of two upstream groups – green (current) and blue (candidate). Swapping traffic is just a matter of repointing proxy_pass at the other upstream and reloading Nginx.
3.1 Nginx Upstream Configuration
# /etc/nginx/conf.d/upstreams.conf
upstream green {
    server 10.0.1.10:3000;  # current version
    server 10.0.1.11:3000;
}

upstream blue {
    server 10.0.2.10:3000;  # new version (initially down)
    server 10.0.2.11:3000;
}

server {
    listen 80;

    location / {
        proxy_pass http://green;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
When the blue containers pass health checks, you simply change proxy_pass to http://blue; and reload Nginx:
sudo nginx -s reload
Because Nginx reloads gracefully, existing connections finish on the green upstream while new connections start hitting blue.
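In practice you’ll want the switch to be a single scripted step. A minimal sketch, assuming the config path shown above:

#!/usr/bin/env bash
set -euo pipefail

# Point traffic at the blue upstream
sudo sed -i 's|proxy_pass http://green;|proxy_pass http://blue;|' /etc/nginx/conf.d/upstreams.conf

# Validate the config before reloading – a bad config would be rejected here
# instead of taking the proxy down
sudo nginx -t
sudo nginx -s reload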
4️⃣ CI/CD Pipeline Steps
Below is a minimal GitHub Actions workflow that automates the checklist.
name: Deploy

on:
  push:
    tags:
      - 'v*.*.*'

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build & push image
        run: |
          IMAGE=ghcr.io/${{ github.repository }}:${{ github.ref_name }}
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
          # Make the image name available to later steps
          echo "IMAGE=$IMAGE" >> "$GITHUB_ENV"

      - name: Deploy blue stack
        run: |
          # Pull the new image on the blue hosts, replacing any old container
          ssh user@10.0.2.10 "docker pull $IMAGE && (docker rm -f myapp || true) && docker run -d --name myapp -p 3000:3000 $IMAGE"
          ssh user@10.0.2.11 "docker pull $IMAGE && (docker rm -f myapp || true) && docker run -d --name myapp -p 3000:3000 $IMAGE"

      - name: Smoke test blue
        run: |
          curl -sSf http://10.0.2.10:3000/health || exit 1
          curl -sSf http://10.0.2.11:3000/health || exit 1

      - name: Switch traffic
        run: |
          # The reload must run on the proxy host (replace nginx-host with
          # yours), not on the CI runner
          ssh user@nginx-host "sudo sed -i 's|proxy_pass http://green;|proxy_pass http://blue;|' /etc/nginx/conf.d/upstreams.conf && sudo nginx -t && sudo nginx -s reload"

      - name: Verify green is drained
        timeout-minutes: 5
        run: |
          # Green containers keep running for instant rollback; just give
          # in-flight connections time to finish and record green's status
          sleep 30
          curl -s -o /dev/null -w "%{http_code}\n" http://10.0.1.10:3000/health || true
Key points:
- Tag‑triggered builds guarantee version traceability.
- Separate SSH steps let you target the blue hosts without disturbing green.
- Smoke tests run before the traffic switch.
- Graceful Nginx reload performs the cut‑over.
5️⃣ Observability & Automated Rollback
5.1 Metrics to Watch
| Metric | Threshold | Action |
|---|---|---|
| request_success_rate | < 99.5% | Trigger rollback |
| latency_p95 | > 300 ms | Alert & investigate |
| cpu_usage (blue) | > 80% for 5 min | Scale out |
Export these via Prometheus and set up Alertmanager rules.
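To make the rollback decision scriptable, you can query Prometheus’ HTTP API directly. A sketch, assuming a Prometheus server reachable at prometheus:9090, the standard http_requests_total metric, and jq installed – your metric names and labels may differ:

RATE=$(curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(http_requests_total{status!~"5.."}[5m])) / sum(rate(http_requests_total[5m]))' \
  | jq -r '.data.result[0].value[1]')
echo "5-minute success rate: $RATE"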
5.2 One‑Click Rollback
If a health check fails after the switch, revert by pointing Nginx back to the green upstream and reloading:
sudo sed -i 's|proxy_pass http://blue;|proxy_pass http://green;|' /etc/nginx/conf.d/upstreams.conf
sudo nginx -t && sudo nginx -s reload
Because the green containers are still running, the rollback is instantaneous.
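To automate this, a small watchdog can poll the health endpoint after the switch and run the rollback when it fails. A sketch, assuming it runs on the proxy host with the /health endpoint and config path used throughout – tune the loop count and sleep to your rollback timeout:

for i in $(seq 1 10); do
  if ! curl -sSf http://localhost/health > /dev/null; then
    echo "Health check failed – rolling back to green"
    sudo sed -i 's|proxy_pass http://blue;|proxy_pass http://green;|' /etc/nginx/conf.d/upstreams.conf
    sudo nginx -t && sudo nginx -s reload
    exit 1
  fi
  sleep 6
done
echo "Blue is healthy – release complete"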
6️⃣ Security & Secrets Management
- Never bake secrets into Docker images. Use Docker secrets or a vault (e.g., HashiCorp Vault, AWS Secrets Manager) and inject them at runtime – see the sketch at the end of this section.
- TLS termination should happen at the Nginx layer. Store certificates in a secure location and reload Nginx without dropping connections:
sudo cp /etc/letsencrypt/live/example.com/fullchain.pem /etc/nginx/ssl/
sudo cp /etc/letsencrypt/live/example.com/privkey.pem /etc/nginx/ssl/
sudo nginx -s reload
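For runtime secret injection, one pattern is to read the secret from the vault just before starting the container. A sketch using the HashiCorp Vault CLI – the secret path and variable name are placeholders:

DB_PASSWORD=$(vault kv get -field=password secret/myapp/db)
docker run -d --name myapp -p 3000:3000 \
  -e DB_PASSWORD="$DB_PASSWORD" \
  myorg/myapp:1.4.0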
Conclusion
Zero‑downtime deployments become repeatable when you treat the process as a checklist: immutable images, blue‑green Nginx routing, health‑gated traffic switches, and observability‑driven rollbacks. By codifying each step in CI/CD, you eliminate human error and keep your services available during every release. If you need a hand shipping this, the team at RamerLabs can help.