Introduction
Zero‑downtime deployments are a non‑negotiable expectation for modern services. As a DevOps lead, you’re tasked with keeping traffic flowing while you swap out code, containers, and infrastructure. This checklist walks you through a pragmatic, Docker‑centric workflow that uses Nginx as a lightweight reverse proxy for blue‑green releases. The steps are deliberately concrete so you can copy‑paste them into your own pipelines.
1️⃣ Planning the Release
Before you touch a single line of code, answer three questions:
- What is the target version? Use semantic versioning and tag Docker images accordingly (e.g., myapp:1.4.0).
- How will traffic be split? Decide on a 100% → 0% cut‑over or a gradual 10% ramp‑up.
- What rollback criteria exist? Define health‑check thresholds and a timeout for automatic rollback.
Having these answers documented in a release-plan.md file keeps the whole team aligned.
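A minimal template might look like this – the exact fields are up to your team, this sketch just mirrors the three questions above:

cat > release-plan.md <<'EOF'
# Release 1.4.0
- Target image: myorg/myapp:1.4.0
- Traffic split: 100% cut-over after smoke tests pass
- Rollback: /health failing for 60 s, or success rate < 99.5%, reverts to green
EOF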
2️⃣ Build Immutable Docker Images
2.1 Dockerfile Best Practices
# syntax=docker/dockerfile:1.4
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies – dev dependencies are needed for the build step
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine AS runtime
WORKDIR /app
# Only production dependencies end up in the runtime image
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/index.js"]
- Multi‑stage builds keep the final image tiny.
- Pin the base image (node:20-alpine) to avoid surprise updates – pinning by digest is stricter still, as sketched below.
- Expose only the needed port – Nginx will forward to it.
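Resolving the tag to a digest makes the pin fully immutable. A quick sketch (the printed digest will differ in your environment):

docker pull node:20-alpine
docker inspect --format='{{index .RepoDigests 0}}' node:20-alpine
# prints node@sha256:… – paste that into the FROM line to pin exactly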
2.2 Tagging & Pushing
# Build the image with a semantic tag
docker build -t myorg/myapp:1.4.0 .
# Push to your registry (Docker Hub, ECR, GCR …)
docker push myorg/myapp:1.4.0
Store the image digest (docker inspect --format='{{.RepoDigests}}' …) for later verification.
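For example, a small script can capture the digest and append it to the release plan – a sketch, reusing the hypothetical release-plan.md from step 1:

DIGEST=$(docker inspect --format='{{index .RepoDigests 0}}' myorg/myapp:1.4.0)
echo "Pushed digest: $DIGEST" >> release-plan.md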
3️⃣ Blue‑Green Architecture with Nginx
Nginx will sit in front of two upstream groups – green (current) and blue (candidate). Swapping traffic is just a matter of repointing proxy_pass at the other upstream and reloading Nginx.
3.1 Nginx Upstream Configuration
# /etc/nginx/conf.d/upstreams.conf
upstream green {
    server 10.0.1.10:3000;  # current version
    server 10.0.1.11:3000;
}

upstream blue {
    server 10.0.2.10:3000;  # new version (initially down)
    server 10.0.2.11:3000;
}

server {
    listen 80;

    location / {
        proxy_pass http://green;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
When the blue containers pass health checks, you simply change proxy_pass to http://blue; and reload Nginx:
sudo nginx -s reload
Because Nginx reloads gracefully, existing connections finish on the green upstream while new connections start hitting blue.
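In practice you’ll want the switch to be a single scripted step. A minimal sketch, assuming the config path shown above:

#!/usr/bin/env bash
set -euo pipefail

# Point traffic at the blue upstream
sudo sed -i 's|proxy_pass http://green;|proxy_pass http://blue;|' /etc/nginx/conf.d/upstreams.conf

# Validate the config before reloading – a bad config would be rejected here
# instead of taking the proxy down
sudo nginx -t
sudo nginx -s reload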
4️⃣ CI/CD Pipeline Steps
Below is a minimal GitHub Actions workflow that automates the checklist.
name: Deploy

on:
  push:
    tags:
      - 'v*.*.*'

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build & push image
        run: |
          IMAGE=ghcr.io/${{ github.repository }}:${{ github.ref_name }}
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
          # Make the image name available to later steps
          echo "IMAGE=$IMAGE" >> "$GITHUB_ENV"

      - name: Deploy blue stack
        run: |
          # Pull the new image on the blue hosts, replacing any old container
          ssh user@10.0.2.10 "docker pull $IMAGE && (docker rm -f myapp || true) && docker run -d --name myapp -p 3000:3000 $IMAGE"
          ssh user@10.0.2.11 "docker pull $IMAGE && (docker rm -f myapp || true) && docker run -d --name myapp -p 3000:3000 $IMAGE"

      - name: Smoke test blue
        run: |
          curl -sSf http://10.0.2.10:3000/health || exit 1
          curl -sSf http://10.0.2.11:3000/health || exit 1

      - name: Switch traffic
        run: |
          # The reload must run on the proxy host (replace nginx-host with
          # yours), not on the CI runner
          ssh user@nginx-host "sudo sed -i 's|proxy_pass http://green;|proxy_pass http://blue;|' /etc/nginx/conf.d/upstreams.conf && sudo nginx -t && sudo nginx -s reload"

      - name: Verify green is drained
        timeout-minutes: 5
        run: |
          # Green containers keep running for instant rollback; just give
          # in-flight connections time to finish and record green's status
          sleep 30
          curl -s -o /dev/null -w "%{http_code}\n" http://10.0.1.10:3000/health || true
Key points:
- Tag‑triggered builds guarantee version traceability.
- Separate SSH steps let you target the blue hosts without disturbing green.
- Smoke tests run before the traffic switch.
- Graceful Nginx reload performs the cut‑over.
5️⃣ Observability & Automated Rollback
5.1 Metrics to Watch
| Metric | Threshold | Action |
|---|---|---|
| request_success_rate | < 99.5% | Trigger rollback |
| latency_p95 | > 300 ms | Alert & investigate |
| cpu_usage (blue) | > 80% for 5 min | Scale out |
Export these via Prometheus and set up Alertmanager rules.
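To make the rollback decision scriptable, you can query Prometheus’ HTTP API directly. A sketch, assuming a Prometheus server reachable at prometheus:9090, the standard http_requests_total metric, and jq installed – your metric names and labels may differ:

RATE=$(curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(http_requests_total{status!~"5.."}[5m])) / sum(rate(http_requests_total[5m]))' \
  | jq -r '.data.result[0].value[1]')
echo "5-minute success rate: $RATE"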
5.2 One‑Click Rollback
If a health check fails after the switch, revert by pointing Nginx back to the green upstream and reloading:
sudo sed -i 's|proxy_pass http://blue;|proxy_pass http://green;|' /etc/nginx/conf.d/upstreams.conf
sudo nginx -t && sudo nginx -s reload
Because the green containers are still running, the rollback is instantaneous.
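To automate this, a small watchdog can poll the health endpoint after the switch and run the rollback when it fails. A sketch, assuming it runs on the proxy host with the /health endpoint and config path used throughout – tune the loop count and sleep to your rollback timeout:

for i in $(seq 1 10); do
  if ! curl -sSf http://localhost/health > /dev/null; then
    echo "Health check failed – rolling back to green"
    sudo sed -i 's|proxy_pass http://blue;|proxy_pass http://green;|' /etc/nginx/conf.d/upstreams.conf
    sudo nginx -t && sudo nginx -s reload
    exit 1
  fi
  sleep 6
done
echo "Blue is healthy – release complete"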
6️⃣ Security & Secrets Management
- Never bake secrets into Docker images. Use Docker secrets or a vault (e.g., HashiCorp Vault, AWS Secrets Manager) and inject them at runtime – see the sketch at the end of this section.
- TLS termination should happen at the Nginx layer. Store certificates in a secure location and reload Nginx without dropping connections:
sudo cp /etc/letsencrypt/live/example.com/fullchain.pem /etc/nginx/ssl/
sudo cp /etc/letsencrypt/live/example.com/privkey.pem /etc/nginx/ssl/
sudo nginx -s reload
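For runtime secret injection, one pattern is to read the secret from the vault just before starting the container. A sketch using the HashiCorp Vault CLI – the secret path and variable name are placeholders:

DB_PASSWORD=$(vault kv get -field=password secret/myapp/db)
docker run -d --name myapp -p 3000:3000 \
  -e DB_PASSWORD="$DB_PASSWORD" \
  myorg/myapp:1.4.0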
Conclusion
Zero‑downtime deployments become repeatable when you treat the process as a checklist: immutable images, blue‑green Nginx routing, health‑gated traffic switches, and observability‑driven rollbacks. By codifying each step in CI/CD, you eliminate human error and keep your services available during every release. If you need a hand shipping this, the team at RamerLabs can help.