## Introduction
As a DevOps lead, you’ve probably felt the panic of a deployment that takes your service offline for even a few seconds. In a world where users expect instant availability, a brief outage can translate into lost revenue, damaged brand trust, and a flood of support tickets. This checklist walks you through a practical, end‑to‑end process for achieving zero‑downtime deployments using Docker containers behind an Nginx reverse‑proxy. The steps are deliberately granular so you can copy‑paste them into your own runbooks.
## 1. Prep Your Build Environment

- **Pin base images** – Use immutable tags like `python:3.11-slim` instead of `latest`.
- **Enable BuildKit** – Faster, reproducible builds. Add `export DOCKER_BUILDKIT=1` to your CI shell.
- **Run static analysis** – Tools like `hadolint` catch Dockerfile anti-patterns early.

```bash
# Example: lint a Dockerfile in CI
hadolint Dockerfile
```
## 2. Create a Reproducible Docker Image
Your Dockerfile should be minimal and layered for cache efficiency. Below is a solid starter for a Node.js API, but the same principles apply to any language.
```dockerfile
# syntax=docker/dockerfile:1.4
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
RUN npm run build

FROM node:20-alpine AS runtime
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 3000
CMD ["node", "dist/index.js"]
```
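Because `COPY . .` invalidates the build cache whenever any file in the context changes, it helps to exclude artifacts and local clutter from the context. A minimal `.dockerignore` sketch (the entries are examples; adjust to your repo):

```
node_modules
dist
.git
*.log
.env
```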
Key points:

- Multi-stage build keeps the final image tiny.
- `--omit=dev` ensures dev dependencies never make it to production.
- Expose only the port you need.
## 3. Define a Blue-Green Architecture in Nginx

Nginx will act as a router that can switch traffic between two upstream groups: `blue` (current) and `green` (new). Store the config in a version-controlled file (e.g., `nginx.conf`).
```nginx
http {
    upstream blue {
        server 127.0.0.1:3001 max_fails=3 fail_timeout=30s;
    }

    upstream green {
        server 127.0.0.1:3002 max_fails=3 fail_timeout=30s;
    }

    # Map a per-request variable to the active upstream name. Every request
    # falls through to the default value, which names the live color.
    map $host $target_upstream {
        default "blue";  # default traffic goes to blue
    }

    server {
        listen 80;

        location / {
            # The variable resolves to a named upstream block,
            # so no DNS resolver is required.
            proxy_pass http://$target_upstream;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```
When you’re ready to promote, you simply flip the `$target_upstream` default (or reload a tiny snippet) without touching the client-facing endpoint.
## 4. Spin Up Containers with Docker Compose

Use a `docker-compose.yml` that defines both blue and green services. Assign static host ports that Nginx knows about.
```yaml
version: "3.9"

services:
  api-blue:
    image: myorg/api:{{CURRENT_TAG}}
    container_name: api_blue
    ports:
      - "3001:3000"
    restart: always
    healthcheck:
      # node:20-alpine ships busybox wget but not curl
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
      interval: 10s
      timeout: 2s
      retries: 3

  api-green:
    image: myorg/api:{{NEW_TAG}}
    container_name: api_green
    ports:
      - "3002:3000"
    restart: always
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
      interval: 10s
      timeout: 2s
      retries: 3
```
Checklist items for this step:

- Replace `{{CURRENT_TAG}}` and `{{NEW_TAG}}` with CI variables.
- Ensure health checks are strict; a container that reports unhealthy should never be promoted to receive traffic.
- Keep `restart: always` to let Docker auto-recover from crashes.
## 5. Validate the Green Stack Before Switchover

- **Run integration tests** against the green endpoint (`http://localhost:3002`).
- **Check logs** – `docker logs api_green` should show a clean start.
- **Confirm health** – `docker exec api_green wget -qO- http://localhost:3000/health` (the alpine base image has busybox wget, not curl).
If any test fails, abort the promotion and investigate before proceeding.
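In CI, the health check above can race a container that is still booting, so it is worth wrapping in a retry loop. A sketch with the probe command parameterized (`PROBE_CMD`, the URL, and the retry counts are illustrative):

```bash
#!/usr/bin/env bash
# Hypothetical poller: run the probe command against a URL until it succeeds
# or the retry budget is exhausted. PROBE_CMD is an array, e.g. (curl -fsS)
# or (wget -qO-), so the loop is independent of which client is installed.
wait_healthy() {
  local url="$1" retries="${2:-5}" delay="${3:-2}" i
  for ((i = 1; i <= retries; i++)); do
    if "${PROBE_CMD[@]}" "$url" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example: PROBE_CMD=(curl -fsS); wait_healthy "http://localhost:3002/health" 10 3
```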
## 6. Switch Traffic Atomically

The actual cut-over can be performed with a single Nginx reload that swaps the upstream mapping. Store the mapping in a small snippet (`upstream_switch.conf`) that you edit on the fly.
```nginx
# upstream_switch.conf – generated by CI
map $host $target_upstream {
    default "green";  # flip to green for zero-downtime
}
```
Run the following commands:

```bash
# Update the snippet with the new target (green)
printf 'map $host $target_upstream {\n    default "green";\n}\n' > /etc/nginx/conf.d/upstream_switch.conf

# Validate the config, then reload Nginx without dropping connections
nginx -t && nginx -s reload
```
Because Nginx reloads its configuration gracefully, existing connections finish on the blue containers while new requests flow to green.
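The one remaining race is a half-written snippet file. One way to avoid it, sketched below, is to write to a temp file and rename, which is atomic on a single filesystem (the helper name and paths are illustrative):

```bash
#!/usr/bin/env bash
# Hypothetical helper: atomically (re)write the switch snippet for a color.
write_switch_conf() {
  local color="$1" snippet="$2" tmp
  case "$color" in blue|green) ;; *) echo "unknown color: $color" >&2; return 1 ;; esac
  tmp="$(mktemp "${snippet}.XXXXXX")"
  printf 'map $host $target_upstream {\n    default "%s";\n}\n' "$color" > "$tmp"
  mv "$tmp" "$snippet"   # rename(2) is atomic on a single filesystem
}

# Promotion (or rollback, by passing "blue") then becomes:
# write_switch_conf green /etc/nginx/conf.d/upstream_switch.conf && nginx -t && nginx -s reload
```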
## 7. Observe, Verify, and Roll Back if Needed

- **Metrics** – Monitor latency and error rates via Prometheus or Datadog. A spike after the switch indicates a problem.
- **Logs** – Centralize container logs with Loki or Elasticsearch; search for `ERROR` within the first 5 minutes.
- **Rollback** – If anything looks off, revert the snippet back to `default "blue";` and reload Nginx. The blue containers are still running, so traffic instantly returns to the stable version.
## 8. Clean Up Old Images

After a successful rollout and a safe observation window (usually 15‑30 minutes), prune the old containers and images to free resources.

```bash
# Stop and remove the blue containers
docker-compose stop api-blue && docker-compose rm -f api-blue

# Remove the old image tag
docker image rm myorg/api:{{CURRENT_TAG}}
```
Automate this step in your CI pipeline so you never accumulate stale layers.
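When this cleanup runs from CI, the pipeline usually knows only which color it just promoted. A tiny helper, sketched below, derives the stack to retire (the names follow this article's `api-blue`/`api-green` convention):

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the color that now serves traffic,
# return the color that can be stopped and pruned.
other_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

# Example CI step:
# old="$(other_color green)"
# docker-compose stop "api-$old" && docker-compose rm -f "api-$old"
```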
## 9. Document the Process

Even the most reliable automation can fail when a human intervenes. Keep a markdown runbook that mirrors this checklist, version-controlled alongside your infrastructure code. Include:

- CI variable names (`CURRENT_TAG`, `NEW_TAG`).
- Where the Nginx snippet lives.
- Contact points for on-call engineers.
## 10. Final Thoughts
Zero‑downtime deployments aren’t a magic button; they’re a disciplined series of checks, health‑guards, and graceful hand‑offs. By treating the blue and green stacks as first‑class citizens and letting Nginx handle the traffic switch, you can push updates without ever dropping a client request.
If you need help shipping this, the team at https://ramerlabs.com can help.