Introduction
Zero‑downtime deployments are no longer a nice‑to‑have; they’re a baseline expectation for modern services. As a DevOps lead, you’re probably juggling Docker containers, Nginx reverse proxies, and a CI/CD pipeline that must keep the lights on while you push new code. This checklist walks you through the practical steps to achieve seamless rollouts, from image building to observability, without sacrificing safety.
1. Prepare Your Docker Images
1.1 Use Multi‑Stage Builds
Multi‑stage Dockerfiles keep your final image lean and free of build‑time dependencies.
```dockerfile
# ---- Build Stage ----
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# ---- Production Stage ----
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY package*.json ./
RUN npm ci --omit=dev   # npm 8+ spelling of the deprecated --production flag
EXPOSE 3000
CMD ["node", "dist/index.js"]
```
1.2 Tag Images with Immutable Versions
Never push `latest` to production. Tag each build with a Git SHA or semantic version, e.g., `myapp:1.4.2-a1b2c3`.
```bash
docker build -t myregistry.com/myapp:1.4.2-a1b2c3 .
```
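In CI you can derive the tag automatically. A minimal sketch, where `1.4.2` stands in for whatever release version you track:

```bash
# Immutable tag: release version plus the short Git SHA of the commit being built
GIT_SHA=$(git rev-parse --short HEAD)
docker build -t "myregistry.com/myapp:1.4.2-${GIT_SHA}" .
docker push "myregistry.com/myapp:1.4.2-${GIT_SHA}"
```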
2. Blueprint Your Nginx Proxy
2.1 Separate Upstream Blocks per Release
Define distinct upstream groups for the current and candidate containers. This makes traffic shifting painless.
```nginx
upstream app_current {
    server 127.0.0.1:8081;
}

upstream app_candidate {
    server 127.0.0.1:8082;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_current;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
2.2 Enable Graceful Shutdowns
Add `proxy_next_upstream` and reasonable timeouts (including `keepalive_timeout`) so Nginx retries failed requests against a healthy upstream and finishes in‑flight requests before a container is dropped.
```nginx
proxy_next_upstream error timeout http_502 http_503;
proxy_connect_timeout 5s;
proxy_read_timeout 30s;
proxy_send_timeout 30s;
keepalive_timeout 65s;
```
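On the container side, Nginx draining only helps if the app itself exits cleanly: Docker sends SIGTERM on `docker stop` and force‑kills after a grace period, so your Node process should trap SIGTERM and close its server. A minimal sketch, assuming the old container is named `myapp_current` as in the CI job in section 3.2:

```bash
# Give the old container up to 30s to finish in-flight work after SIGTERM
# (the app must handle SIGTERM, e.g. by calling server.close() in Node)
docker stop --time 30 myapp_current
```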
3. Adopt a Blue‑Green Deployment Strategy
3.1 Define the Flow
- Deploy candidate – spin up a new container on the candidate port (e.g., 8082).
- Health‑check – run automated smoke tests against the candidate upstream.
- Swap traffic – update the Nginx config to point `proxy_pass` from `app_current` to `app_candidate` and reload (a minimal swap script follows this list).
- Monitor – watch error rates, latency, and logs for a brief period.
- Retire old – once confidence is high, stop the old container and rename the candidate to current.
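If you drive the swap by hand, a minimal sketch (assuming the conf‑file layout used in the CI job below, and keeping a copy for the rollback plan in section 5) might look like:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Keep the live config around for rollback (see section 5)
sudo cp /etc/nginx/conf.d/app.conf /etc/nginx/conf.d/app_previous.conf

# Promote the candidate config, then reload only if it validates
sudo cp /etc/nginx/conf.d/app_candidate.conf /etc/nginx/conf.d/app.conf
sudo nginx -t && sudo nginx -s reload
```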
3.2 Automate with a CI/CD Job
Here’s a minimal GitHub Actions workflow that orchestrates the above steps.
```yaml
name: Blue-Green Deploy

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    runs-on: ubuntu-latest
    # Minimal example: assumes the runner can reach the deploy host's Docker
    # daemon and ports; in practice, wrap those steps in ssh like the switch step.
    steps:
      - uses: actions/checkout@v3

      - name: Build Docker Image
        run: |
          # A real pipeline would `docker login` first, using repository secrets
          IMAGE_TAG=${{ github.sha }}
          docker build -t myregistry.com/myapp:$IMAGE_TAG .
          docker push myregistry.com/myapp:$IMAGE_TAG

      - name: Deploy Candidate
        run: |
          docker run -d --name myapp_candidate -p 8082:3000 myregistry.com/myapp:${{ github.sha }}

      - name: Health Check
        run: |
          curl -f http://localhost:8082/health || exit 1

      - name: Switch Nginx Traffic
        run: |
          ssh user@host "sudo cp /etc/nginx/conf.d/app_candidate.conf /etc/nginx/conf.d/app.conf && sudo nginx -t && sudo nginx -s reload"

      - name: Observe
        run: |
          sleep 30  # give traffic time to settle before checking metrics
          curl -s http://localhost/metrics | grep error_rate || exit 1

      - name: Cleanup Old
        run: |
          docker stop myapp_current && docker rm myapp_current
          docker rename myapp_candidate myapp_current
```
4. Observability & Logging
4.1 Centralize Logs
Send container stdout/stderr to a log aggregator (e.g., Loki, ELK). The `json-file` driver below doesn't ship logs anywhere by itself, but capping its size keeps local disks safe while an agent tails the files. In Docker Compose, add a `logging` section:
```yaml
services:
  myapp:
    image: myregistry.com/myapp:1.4.2-a1b2c3
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
```
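To ship logs to Loki directly, one option is Grafana's Loki logging driver plugin. A hedged sketch, where the Loki URL is a placeholder for your own endpoint:

```bash
# Install the Loki log driver plugin once per host
docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions

# Run the app with logs shipped to Loki (URL is a placeholder)
docker run -d --name myapp_candidate -p 8082:3000 \
  --log-driver=loki \
  --log-opt loki-url="http://loki.internal:3100/loki/api/v1/push" \
  myregistry.com/myapp:1.4.2-a1b2c3
```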
4.2 Export Metrics
Expose Prometheus‑compatible metrics from your app, and let Prometheus scrape them through Nginx while restricting the endpoint to trusted addresses. (If you also want proxy‑level metrics, run the Nginx Prometheus exporter alongside.)
```nginx
location /metrics {
    proxy_pass http://app_current/metrics;
    allow 127.0.0.1;
    deny all;
}
```
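A quick sanity check that the allowlist holds (`$PUBLIC_HOST` is a placeholder for your server's public address):

```bash
# Run on the server itself: the source address is 127.0.0.1, so this succeeds
curl -fsS http://127.0.0.1/metrics | head -n 3

# Run from any other machine: expect HTTP 403 Forbidden
curl -is "http://$PUBLIC_HOST/metrics" | head -n 1
```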
4.3 Alert on Anomalies
Create alerts for:
- Error rate > 1% over a 5‑minute window.
- Latency P95 > 300ms.
- Container restarts > 2 in the last hour.
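As a concrete example, the first threshold could be expressed as a Prometheus alerting rule. This is only a sketch: the metric names (`http_requests_total` with a `status` label) are placeholders for whatever your app actually exports, and the rules path depends on your Prometheus setup.

```bash
cat > /etc/prometheus/rules/myapp-alerts.yml <<'EOF'
groups:
  - name: myapp
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Error rate above 1% for 5 minutes"
EOF
```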
5. Rollback Plan
Even with thorough checks, things can go sideways. Keep a one‑click rollback:
- Re‑apply the previous Nginx config (`app_previous.conf`).
- Reload Nginx.
- Verify health endpoints.
- If needed, spin up the older Docker image again.
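Scripted with the file names used above, the first three steps might look like:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Restore the previous release's config and reload only if it validates
sudo cp /etc/nginx/conf.d/app_previous.conf /etc/nginx/conf.d/app.conf
sudo nginx -t && sudo nginx -s reload

# Confirm the previous release is serving again
curl -fsS http://localhost/health
```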
Document the rollback steps in your runbook and test them quarterly.
6. Checklist Summary
- [ ] Immutable image tags – no `latest` in prod.
- [ ] Multi‑stage Dockerfile – smallest possible runtime.
- [ ] Separate Nginx upstreams for current and candidate.
- [ ] Graceful timeout settings in Nginx.
- [ ] Automated health checks before traffic switch.
- [ ] CI/CD job that builds, deploys, validates, and swaps.
- [ ] Centralized logging and metrics export.
- [ ] Alert thresholds for error rate, latency, restarts.
- [ ] Documented rollback procedure and periodic drills.
Following this checklist will give you confidence that each push lands without a hiccup, keeping users happy and your team stress‑free.
Closing Thoughts
Zero‑downtime deployments are a combination of disciplined image management, smart proxy configuration, and robust observability. By treating each release as a candidate rather than an overwrite, you gain the safety net needed for fast iteration. If you need help shipping this, the team at https://ramerlabs.com can help.