Introduction
Deploying new versions of a web service without interrupting users is a classic challenge for any DevOps lead. With Docker handling containerization and Nginx acting as a reliable reverse proxy, you can achieve true zero‑downtime releases. This checklist walks you through the essential steps—from image building to traffic shifting—so you can ship features confidently.
1. Pre‑flight Planning
- Define a versioning strategy – semantic versioning (`v1.2.3`) works well with Docker tags.
- Identify health‑check endpoints – `/healthz` should return `200 OK` only when the app is ready (a quick way to verify this is sketched after this list).
- Set up a separate staging environment – mirror the production config but isolate traffic.
- Document rollback criteria – e.g., an error rate above 2% over 5 minutes triggers a revert.
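During planning it helps to pin down the readiness contract concretely. A minimal check, assuming the app listens on port 3000 as in the Dockerfile below:

```bash
# Should print 200 only once the service is actually ready to take traffic
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:3000/healthz
```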
2. Build a Reproducible Docker Image
A deterministic Dockerfile eliminates “it works on my machine” surprises.
```dockerfile
# Dockerfile
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/index.js"]
```
- Pin base image versions (`node:20-alpine`).
- Leverage multi‑stage builds to keep the final image small.
- Run `docker build` with `--pull` to ensure you have the latest base.

```bash
docker build -t myservice:1.2.3 --pull .
```
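If you publish the image to a registry, pushing the exact tag you built keeps staging and production on identical bytes. A sketch, with the registry host as a placeholder:

```bash
# Re-tag for the registry and push; registry.example.com is illustrative
docker tag myservice:1.2.3 registry.example.com/myservice:1.2.3
docker push registry.example.com/myservice:1.2.3
```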
3. Nginx as a Smart Load Balancer
Configure Nginx to route traffic to two upstream groups – `blue` (current) and `green` (new).
```nginx
# /etc/nginx/conf.d/myservice.conf
upstream blue {
    server 127.0.0.1:3001;
}

upstream green {
    server 127.0.0.1:3002;
}

server {
    listen 80;

    location / {
        proxy_pass http://blue;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /healthz {
        proxy_pass http://green/healthz;
    }
}
```
- Keep `proxy_pass` pointing to `blue` initially.
- Expose a separate health endpoint that checks the `green` container.
- Reload Nginx gracefully with `nginx -s reload` – no dropped connections (a guarded reload is sketched below).
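A malformed config file is the easiest way to turn a graceful reload into an outage, so it is worth validating first. A minimal guard:

```bash
# Only reload if the new configuration parses cleanly
nginx -t && nginx -s reload
```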
4. Blue‑Green Deployment Workflow
| Step | Action |
|------|--------|
| 1 | Deploy the new container on a different port (e.g., `3002`). |
| 2 | Run health checks until `/healthz` reports success. |
| 3 | Update the Nginx upstream from `blue` to `green`. |
| 4 | Monitor metrics for a short stabilization window. |
| 5 | Decommission the old container (`blue`). |
4.1 Deploy the Green Container
```bash
docker run -d --name myservice-green -p 3002:3000 \
  -e NODE_ENV=production \
  myservice:1.2.3
```

- Use `--restart unless-stopped` for resilience.
- Attach a health‑check script that polls `/healthz` every 5 seconds (a sketch follows below).
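A minimal polling script, assuming the green container is published on port 3002 as above and `curl` is available on the host (the two‑minute timeout is an arbitrary choice):

```bash
#!/usr/bin/env bash
# healthcheck-green.sh - poll the green container until it reports ready
URL="http://127.0.0.1:3002/healthz"
for i in $(seq 1 24); do              # ~2 minutes at 5-second intervals
  if curl -fsS -o /dev/null "$URL"; then
    echo "green is healthy"
    exit 0
  fi
  echo "waiting for green ($i/24)..."
  sleep 5
done
echo "green never became healthy" >&2
exit 1
```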
4.2 Switch Traffic
```bash
# Update the upstream in the running config (you can use envsubst or a templating tool)
# Target only the proxy_pass line so the upstream block names stay intact
sed -i 's|proxy_pass http://blue;|proxy_pass http://green;|' /etc/nginx/conf.d/myservice.conf
nginx -s reload
```
Because Nginx reloads workers gracefully, existing connections finish on the old upstream while new requests flow to `green`.
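Wrapping the switch in a small script makes it repeatable and adds a sanity check after the reload. A sketch under the same assumptions (config path and ports as above):

```bash
#!/usr/bin/env bash
# switch-to-green.sh - illustrative wrapper around the steps above
set -euo pipefail

CONF=/etc/nginx/conf.d/myservice.conf

# Point the main location block at the green upstream
sed -i 's|proxy_pass http://blue;|proxy_pass http://green;|' "$CONF"

# Validate, then reload gracefully; old workers finish in-flight requests
nginx -t
nginx -s reload

# Confirm the service still answers through Nginx
curl -fsS -o /dev/null http://127.0.0.1/healthz && echo "switch complete"
```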
5. Observability & Alerting
- Metrics: Export Prometheus counters for request latency, error rates, and container restarts.
- Logs: Centralize Docker logs with Loki or Elasticsearch; tag them with `service=myservice` and `deployment=green`.
- Alert thresholds:
  - 5xx rate > 1% for 2 minutes.
  - Container restart count > 3 within 5 minutes.
Example Prometheus rule:
```yaml
# alerts.yml
groups:
  - name: myservice
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5..",service="myservice"}[1m]))
            / sum(rate(http_requests_total{service="myservice"}[1m])) > 0.01
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High 5xx error rate on myservice"
          description: "Error rate exceeded 1% for the last 2 minutes."
```
6. Automated Rollback Plan
Even with thorough testing, things can go sideways. Keep a one‑click rollback script ready:
```bash
#!/usr/bin/env bash
# rollback.sh – revert to the previous blue deployment
# Target only the proxy_pass line so the upstream block names stay intact
sed -i 's|proxy_pass http://green;|proxy_pass http://blue;|' /etc/nginx/conf.d/myservice.conf
nginx -s reload

# Stop and remove the green container
docker stop myservice-green && docker rm myservice-green

# Restart blue if it was stopped
docker start myservice-blue
```
- Store the script in version control alongside your deployment repo.
- Pair it with a PagerDuty or OpsGenie trigger for rapid manual execution.
7. Security Hardening Checklist
- Run containers as non‑root – add `USER node` in the Dockerfile.
- Limit capabilities – `docker run --cap-drop ALL` (a combined hardened run command is sketched after this list).
- TLS termination – let Nginx handle HTTPS with a strong cipher suite.
- Secret management – inject API keys via Docker secrets or Kubernetes `Secret` objects, never hard‑code them.
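Pulling several of these flags together, a hardened version of the green deployment from section 4.1 might look like the sketch below. The `--read-only` and `--tmpfs` flags are extra assumptions (they only work if the app writes nothing outside `/tmp`), and running as non‑root relies on `USER node` being present in the Dockerfile:

```bash
# Illustrative hardened run command; adjust flags to your app's needs
docker run -d --name myservice-green \
  -p 3002:3000 \
  --restart unless-stopped \
  --cap-drop ALL \
  --read-only \
  --tmpfs /tmp \
  -e NODE_ENV=production \
  myservice:1.2.3
```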
8. Final Verification Checklist
- [ ] Docker image built with an immutable tag (`myservice:1.2.3`).
- [ ] Health endpoint returns `200` within 30 seconds.
- [ ] Nginx config points to `blue` before the switch.
- [ ] Green container runs on an isolated port and logs to the central store.
- [ ] Traffic switched via Nginx reload; no 502/504 observed.
- [ ] Prometheus alerts are silent for 5 minutes post‑switch.
- [ ] Rollback script tested in staging.
- [ ] All secrets loaded from secure store.
Cross‑checking each bullet reduces the chance of a silent failure slipping into production.
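Several of these checks are easy to script. A minimal post‑switch smoke test, assuming the service is reachable through Nginx on port 80:

```bash
#!/usr/bin/env bash
# smoke-test.sh - illustrative post-switch verification
set -euo pipefail

BASE_URL="http://127.0.0.1"

# Health endpoint must answer 200
code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL/healthz")
[ "$code" = "200" ] || { echo "health check failed ($code)" >&2; exit 1; }

# Sample a handful of requests and fail on any 502/504
for i in $(seq 1 20); do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL/")
  case "$code" in
    502|504) echo "bad gateway on request $i ($code)" >&2; exit 1 ;;
  esac
done
echo "smoke test passed"
```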
Conclusion
Zero‑downtime deployments become routine once you embed these steps into your CI/CD pipeline. Automate image builds, health checks, and Nginx reloads, and you’ll spend more time delivering value than firefighting releases. If you need help shipping this, the team at https://ramerlabs.com can help.