Introduction
Zero‑downtime deployments are a non‑negotiable expectation for modern services. As a DevOps lead, you’ve probably seen the panic when a new release briefly knocks the site offline. In this practical guide we’ll walk through a Docker‑centric, Nginx‑backed blue‑green deployment pattern that keeps users blissfully unaware of any change. By the end you’ll have a ready‑to‑run CI/CD pipeline, a reversible Nginx config, and a short checklist you can embed in your sprint retrospectives.
Prerequisites
Before diving into the steps, make sure you have the following in place:
- A Docker‑compatible host (Linux VM, AWS EC2, or a local dev machine).
- Nginx installed as a reverse proxy on the same host or a separate bastion.
- Access to a Git repository (GitHub, GitLab, Bitbucket) where the application code lives.
- Basic familiarity with Dockerfiles, docker compose, and a CI platform (GitHub Actions, GitLab CI, or CircleCI).
Tip: Keep your Docker Engine version ≥ 20.10 and Nginx ≥ 1.21 for the best compatibility with the snippets below.
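A quick way to confirm both on the host before you start (the version strings shown in the comments are only illustrative):
# Check the Docker Engine and Nginx versions installed on the host
docker --version   # e.g. Docker version 24.0.x
nginx -v           # e.g. nginx version: nginx/1.25.x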
Blueprint Overview
The core idea is simple: two identical environments – blue (current production) and green (next version). Nginx routes traffic to the active environment, and a CI job flips the upstream target once the green containers pass health checks.
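Concretely, on a single host this can be as simple as two containers published on different ports, with Nginx listening in front of both. The names, ports, and tags below are the conventions used throughout this post:
# Blue: the version currently serving production traffic
docker run -d --name blue  -p 3001:3000 yourorg/yourapp:1.3.9
# Green: the candidate release, only reachable directly until Nginx is switched over
docker run -d --name green -p 3002:3000 yourorg/yourapp:1.4.0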
Docker Image Versioning
Every commit that touches the Dockerfile should produce a semantic tag (e.g. v1.2.3). Embed the Git SHA as an OCI label, either with the --label flag at build time or via build arguments and the LABEL instruction shown below; this makes roll‑backs traceable.
# Dockerfile
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
EXPOSE 3000
# GIT_TAG and GIT_SHA are supplied by the CI pipeline as build arguments
ARG GIT_TAG
ARG GIT_SHA
LABEL org.opencontainers.image.version="${GIT_TAG}" \
      org.opencontainers.image.revision="${GIT_SHA}" \
      org.opencontainers.image.source="https://github.com/yourorg/yourapp"
CMD ["node", "dist/index.js"]
The GIT_TAG and GIT_SHA variables are injected by the CI pipeline (see later). Tagging ensures you can pull yourapp:1.4.0 for green and keep yourapp:1.3.9 for blue.
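If you ever need to build outside CI, a minimal local sketch of that injection might look like this (the yourorg/yourapp name follows the examples in this post, and it assumes an annotated tag exists on HEAD):
# Derive version metadata from Git
GIT_TAG=$(git describe --tags --abbrev=0)
GIT_SHA=$(git rev-parse HEAD)
# Pass both values through as build args and tag the image with the semantic version
docker build \
  --build-arg GIT_TAG="$GIT_TAG" \
  --build-arg GIT_SHA="$GIT_SHA" \
  -t yourorg/yourapp:"${GIT_TAG#v}" .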
Nginx Reverse Proxy Config
Below is a minimal yet production‑ready Nginx snippet that defines two upstream blocks – blue and green – and a variable $upstream that decides which one receives traffic.
# /etc/nginx/conf.d/app.conf
upstream blue {
    server 127.0.0.1:3001;   # Docker container listening on host port 3001
}
upstream green {
    server 127.0.0.1:3002;   # Docker container listening on host port 3002
}
# Default to blue; the CI job toggles this default in place (e.g. with sed)
map $http_x_deploy_target $upstream {
    default blue;
    "green" green;
}
server {
    listen 80;
    server_name example.com;
    location / {
        proxy_pass http://$upstream;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
When the CI job finishes a successful deployment, it rewrites the map block's default from blue to green and reloads Nginx, moving all traffic to the new containers. The X-Deploy-Target request header lets you preview the green environment before the cut‑over, and rolling back is as simple as flipping the default back to blue.
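A rough sketch of that manual flow on the host, using the same config path and hostname assumed above:
# Preview green without moving real traffic
curl -sSf -H "X-Deploy-Target: green" http://example.com/health
# Promote green to the default upstream, then reload Nginx gracefully
sudo sed -i 's/default blue;/default green;/' /etc/nginx/conf.d/app.conf
sudo nginx -t && sudo nginx -s reload
# Roll back by flipping the default again
sudo sed -i 's/default green;/default blue;/' /etc/nginx/conf.d/app.conf
sudo nginx -t && sudo nginx -s reload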
CI/CD Pipeline Steps
Below is a GitHub Actions workflow that automates the entire cycle. Port the jobs.deploy steps if you prefer GitLab CI or CircleCI.
name: Deploy Blue-Green
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      image_tag: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Extract metadata (tags, labels)
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: yourorg/yourapp
          tags: |
            type=semver,pattern={{version}}
            type=sha,format=long
      - name: Build and push image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: SSH into host
        uses: appleboy/ssh-action@v0.1.7
        with:
          host: ${{ secrets.HOST }}
          username: ${{ secrets.USER }}
          key: ${{ secrets.SSH_KEY }}
          script: |
            # Pull the new image and start it in the green slot (host port 3002)
            docker pull yourorg/yourapp:${{ needs.build.outputs.image_tag }}
            docker rm -f green 2>/dev/null || true
            docker run -d --name green -p 3002:3000 \
              yourorg/yourapp:${{ needs.build.outputs.image_tag }}
            # Health check loop (max 30s); abort before switching traffic if it never passes
            healthy=0
            for i in {1..30}; do
              if curl -sSf http://localhost:3002/health; then healthy=1; break; fi
              sleep 1
            done
            if [ "$healthy" -ne 1 ]; then
              echo "Green failed its health check, aborting deployment" >&2
              docker rm -f green
              exit 1
            fi
            # Switch Nginx upstream to green
            sudo sed -i 's/default blue;/default green;/' /etc/nginx/conf.d/app.conf
            sudo nginx -s reload
            # Optional: keep blue running for a rollback window (5 min)
            sleep 300
            # Promote the new version into the blue slot (host port 3001) for the next cycle
            docker rm -f blue 2>/dev/null || true
            docker run -d --name blue -p 3001:3000 \
              yourorg/yourapp:${{ needs.build.outputs.image_tag }}
            for i in {1..30}; do
              if curl -sSf http://localhost:3001/health; then break; fi
              sleep 1
            done
            # Point traffic back at the blue slot (now running the new version) and retire green
            sudo sed -i 's/default green;/default blue;/' /etc/nginx/conf.d/app.conf
            sudo nginx -s reload
            docker rm -f green
Key points:
- The build job tags the image with both a semantic version and the commit SHA.
- The deploy job pulls the image, runs it on a dedicated host port (3002), performs a health check, then updates Nginx and reloads it.
- A short grace period keeps the old container alive, giving you a manual rollback window.
Blue‑Green Switch with Nginx
Health Checks
Your application should expose a lightweight /health endpoint that returns 200 OK when all internal dependencies (DB, cache, external APIs) are reachable. NGINX Plus (or a third‑party module) can also perform active health checks against the upstreams, but a pre‑deployment curl loop (as shown above) catches failures before traffic is switched.
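If you prefer that gate as a reusable script on the host, a minimal sketch could look like the following (the port and /health path match the conventions used above; the wait-healthy.sh name is just an example):
#!/usr/bin/env bash
# wait-healthy.sh <port> – poll the /health endpoint until it returns 200 or we time out
set -euo pipefail
PORT="${1:-3002}"
for _ in $(seq 1 30); do
  if curl -sSf "http://localhost:${PORT}/health" > /dev/null; then
    echo "healthy on port ${PORT}"
    exit 0
  fi
  sleep 1
done
echo "health check timed out on port ${PORT}" >&2
exit 1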
Rolling Back
If the green deployment fails after the switch, you have two options:
- Manual rollback – SSH back into the host, revert the Nginx default back to blue (default blue;), reload Nginx, and restart the old container (see the sketch below).
- Automated rollback – Extend the GitHub Actions workflow with a follow‑up step that monitors the first 5 minutes of traffic (e.g., via Prometheus alerts) and triggers the revert automatically.
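As a rough illustration of the manual path, using the container names, ports, and image tags assumed earlier in this post:
# Point Nginx back at the blue upstream
sudo sed -i 's/default green;/default blue;/' /etc/nginx/conf.d/app.conf
sudo nginx -t && sudo nginx -s reload
# Restart the previous container, or recreate it from the last known-good tag
docker start blue 2>/dev/null || \
  docker run -d --name blue -p 3001:3000 yourorg/yourapp:1.3.9
# Remove the faulty green container once traffic has drained
docker rm -f green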
Observability & Logging
Zero‑downtime isn’t just about traffic routing; you need visibility into what’s happening behind the scenes.
- Structured logs – Use a JSON logger (e.g., pino for Node.js) and ship logs to a centralized system like Loki or the Elastic Stack.
- Metrics – Export Prometheus metrics from both containers (/metrics) and configure Grafana dashboards that compare blue vs green response times.
- Tracing – If you have a distributed tracing stack (Jaeger or OpenTelemetry), tag spans with the deployment tag (v1.4.0) to correlate latency spikes with releases.
Sample docker-compose.yml snippet for running Loki alongside the app (ship the JSON logs into it with Promtail or the Loki Docker logging driver):
services:
  green:
    image: yourorg/yourapp:${IMAGE_TAG}
    ports:
      - "3002:3000"
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
  loki:
    image: grafana/loki:2.9.1
    ports:
      - "3100:3100"
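If you want containers to push logs straight to Loki instead of writing local JSON files, one option is the Loki Docker logging driver plugin; a sketch, assuming the Loki address from the compose file above:
# Install the Loki logging driver plugin once per Docker host
docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
# Run the green container with logs pushed directly to Loki
docker run -d --name green -p 3002:3000 \
  --log-driver=loki \
  --log-opt loki-url="http://localhost:3100/loki/api/v1/push" \
  yourorg/yourapp:${IMAGE_TAG}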
The Ultimate Checklist
- Infrastructure
  - ✅ Docker Engine ≥ 20.10 on the host.
  - ✅ Nginx with upstream blocks for blue & green.
  - ✅ SSH keys stored securely in CI secrets.
- CI/CD
  - ✅ Semantic versioning in Docker tags.
  - ✅ Automated health‑check script before the traffic switch.
  - ✅ Grace period for manual rollback.
- Nginx
  - ✅ map directive toggles the upstream based on a single variable.
  - ✅ nginx -s reload used, not a full restart.
- Observability
  - ✅ /health endpoint returns 200 only when all deps are healthy.
  - ✅ Logs shipped to a central store.
  - ✅ Metrics exported and visualized.
- Rollback Plan
  - ✅ Documented manual steps.
  - ✅ Automated rollback trigger (optional).
Cross‑checking this list before each merge to main will keep your production traffic humming without a single user‑visible hiccup.
Closing Thoughts
Zero‑downtime deployments become a repeatable habit when you treat the process as code: versioned Docker images, declarative Nginx configs, and an automated CI pipeline that does the heavy lifting. Keep the checklist handy, monitor health metrics, and you’ll reduce emergency rollbacks to near zero.
If you need help shipping this, the team at https://ramerlabs.com can help.