Ifeanyi Nworji

Blue/Green Deployment with Nginx Upstreams Using Docker Compose

Auto-Failover, Zero Downtime, and Manual Traffic Switching

One of the core responsibilities of a DevOps engineer is ensuring application availability in the presence of failure. Downtime is rarely caused by deployments themselves, but by how traffic is handled when something goes wrong.

In Stage 2 of my DevOps internship, I implemented a Blue/Green deployment architecture using Nginx upstreams and Docker Compose, focusing on:

  • Zero failed client requests during outages
  • Automatic failover within a single request
  • Manual traffic switching without restarting containers
  • No application code changes
  • No image rebuilds

This article is a beginner-friendly but production-accurate walkthrough of the solution, explaining both the configuration and the runtime behavior in detail.

Problem Overview

We are provided with two identical Node.js services packaged as pre-built Docker images:

  • Blue — primary (active)
  • Green — backup

Each service exposes the following endpoints:

  • GET /version - returns JSON and response headers
  • GET /healthz - liveness check
  • POST /chaos/start - simulates failure
  • POST /chaos/stop - restores the service

The task is to place Nginx in front of both services and guarantee:

  • All traffic goes to Blue by default
  • On Blue failure, Nginx automatically switches to Green
  • No client request returns non-200 during failover
  • Application response headers are forwarded unchanged
  • Traffic can be manually toggled between Blue and Green

Architecture Overview
The final architecture is intentionally simple and production-aligned:

Client --> Nginx (8080) --> Blue App (8081) OR failover to Green App (8082)

Key characteristics:

  • Nginx is the single public entrypoint
  • Blue/Green run simultaneously
  • Docker Compose orchestrates everything
  • No Kubernetes, no service mesh, no rebuilds

Environment-Driven Configuration

All behavior is controlled via environment variables, making the setup CI-friendly and reproducible.

Key variables:

  • BLUE_IMAGE, GREEN_IMAGE
  • ACTIVE_POOL (blue or green)
  • RELEASE_ID_BLUE, RELEASE_ID_GREEN
  • PORT, BLUE_PORT, GREEN_PORT
  • NGINX_PORT

This design ensures:

  • No hardcoded values
  • Safe traffic switching
  • Easy automated verification
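
For illustration, a minimal .env might look like the sketch below. The host ports match the architecture diagram above; the image references, release IDs, and internal app port are placeholder values, not the ones used in the actual task.

# .env (example values; adjust to your own images and ports)
BLUE_IMAGE=ghcr.io/example/app:blue      # placeholder image reference
GREEN_IMAGE=ghcr.io/example/app:green    # placeholder image reference
ACTIVE_POOL=blue                         # which pool Nginx treats as primary
RELEASE_ID_BLUE=release-blue-001         # placeholder release identifier
RELEASE_ID_GREEN=release-green-001       # placeholder release identifier
PORT=3000                                # assumed internal app port
BLUE_PORT=8081                           # direct host port for Blue (chaos testing)
GREEN_PORT=8082                          # direct host port for Green
NGINX_PORT=8080                          # public entrypoint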

Docker Compose: Service Breakdown

Blue Application Service

app_blue:
  image: ${BLUE_IMAGE}
  container_name: app_blue
  restart: always
  environment:
    - PORT=${PORT}
    - RELEASE_ID=${RELEASE_ID_BLUE}
    - APP_POOL=blue
  expose:
    - "${PORT}"
  ports:
    - "${BLUE_PORT}:${PORT}"
  healthcheck:
    # Minimal liveness check: confirms the Node runtime inside the container can execute
    test: ["CMD-SHELL", "node -e \"process.exit(0)\""]
    interval: 5s
    timeout: 2s
    retries: 3

What this achieves:

  • Runs the provided Blue image without modification
  • Injects runtime metadata used in response headers
  • Exposes the service internally to Nginx
  • Maps a direct port (8081) for chaos testing
  • Keeps the container healthy and restartable

The Green service is identical, differing only in image, release ID, and port.
This symmetry is critical for Blue/Green deployments.
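
To see this symmetry at runtime, you can query each pool directly on its mapped host port (8081 for Blue, 8082 for Green) and compare the identifying headers:

# Hit each app directly, bypassing Nginx, and print its pool/release headers
curl -si http://localhost:8081/version | grep -iE 'x-app-pool|x-release-id'
curl -si http://localhost:8082/version | grep -iE 'x-app-pool|x-release-id'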

Nginx Reverse Proxy

nginx:
  image: nginx:latest
  ports:
    - "${NGINX_PORT}:80"
  volumes:
    - ./nginx/nginx.tmpl:/etc/nginx/templates/default.conf.template:ro
    - ./nginx/entrypoint.sh:/docker-entrypoint.d/10-envsubst.sh:ro
    - ./nginx-logs:/var/log/nginx
  environment:
    - ACTIVE_POOL=${ACTIVE_POOL}
    - PORT=${PORT}

Key decisions:

  • Nginx is the only public interface
  • Configuration is templated, not static
  • Logs are persisted for inspection and alerting
  • No container restarts needed for traffic switching
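
Because the log directory is bind-mounted to the host, failover behavior can be watched without entering the container. Assuming the default nginx log file names:

# Follow the proxy's access and error logs from the host
tail -f nginx-logs/access.log nginx-logs/error.log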

Nginx Upstreams: Blue/Green Routing

The heart of the solution lies in the Nginx upstream configuration.

Timeout and Retry Configuration

proxy_connect_timeout 1s;
proxy_read_timeout 5s;
proxy_send_timeout 3s;

proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
proxy_next_upstream_tries 2;
proxy_next_upstream_timeout 8s;

These values ensure:

  • Failures are detected quickly
  • Retries happen automatically
  • Total request time remains under 10 seconds
  • Clients never see partial or failed responses
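
A quick way to sanity-check that budget is to time a request through Nginx while Blue is in chaos mode; with the settings above, the retried request should still return 200 well inside the window:

# Print the status code and total time for one request through the proxy
curl -s -o /dev/null -w 'status=%{http_code} time=%{time_total}s\n' http://localhost:8080/version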

Primary / Backup Upstreams

upstream blue {
  server app_blue:${PORT} max_fails=1 fail_timeout=5s;
  server app_green:${PORT} backup;
}

upstream green {
  server app_green:${PORT} max_fails=1 fail_timeout=5s;
  server app_blue:${PORT} backup;
}

Why this works:

  • max_fails=1 marks the primary unhealthy after a single failure
  • fail_timeout=5s enables fast recovery
  • backup ensures Green is only used when Blue fails
  • The same config supports both active pools

Deep Dive: Request Flow During Failure
This is the most important part of the system.

Normal Operation

  1. Client sends:

GET http://localhost:8080/version

  2. Nginx forwards to Blue
  3. Blue responds with 200
  4. Headers returned:

X-App-Pool: blue
X-Release-Id: <RELEASE_ID_BLUE>
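
This exchange can be observed directly from the terminal; the status line and the two headers identify the serving pool:

# Normal operation: expect HTTP 200 served by the Blue pool
curl -i http://localhost:8080/version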

Failure Scenario (Blue Down)

Chaos is induced directly on Blue:

POST http://localhost:8081/chaos/start?mode=error

Now let’s trace a single client request.

Step 1: Request Hits Nginx

The client is unaware of Blue or Green.

Step 2: Nginx Proxies to Blue

Blue is the primary upstream.

Step 3: Blue Fails

Blue returns a 5xx or times out.

Step 4: Nginx Intercepts the Failure

Because of:

proxy_next_upstream error timeout http_500 http_502 http_503 http_504;

Nginx does not forward the failure to the client.

Step 5: Immediate Retry to Green

Within the same client request, Nginx retries the request to Green.

Step 6: Green Responds Successfully

Green returns:

HTTP 200
X-App-Pool: green
X-Release-Id: <RELEASE_ID_GREEN>

Result:
The client sees HTTP 200, even though Blue failed.
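
The whole trace can be reproduced with three commands, using the chaos endpoints provided by the task images:

# 1. Break Blue directly on its mapped port
curl -s -X POST 'http://localhost:8081/chaos/start?mode=error'

# 2. A request through Nginx still returns 200, now served by Green
curl -si http://localhost:8080/version | grep -iE '^HTTP|x-app-pool'

# 3. Restore Blue
curl -s -X POST http://localhost:8081/chaos/stop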

Why Proxy Buffering Matters

proxy_buffering on;

With buffering enabled, Nginx reads the upstream response into its own buffers rather than streaming it straight through to the client.
If Blue fails before anything has been sent to the client, Nginx can still retry the request against Green without exposing the error.

Header Preservation

Each application response includes:

  • X-App-Pool
  • X-Release-Id

Nginx forwards these headers unchanged:

proxy_pass_header X-App-Pool;
proxy_pass_header X-Release-Id;

This allows:

  • CI validation
  • Runtime verification
  • Clear observability of which pool served the request

Manual Blue/Green Switching

Traffic switching is handled by configuration templating.

Entrypoint Script

envsubst '$ACTIVE_POOL $PORT $RELEASE_ID_BLUE $RELEASE_ID_GREEN' \
  < default.conf.template > default.conf

This allows:

  • Changing ACTIVE_POOL=green
  • Regenerating the Nginx config
  • Reloading Nginx without downtime

No containers are restarted; one possible switch flow is sketched below.
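
A minimal switch flow might look like the sketch below. It assumes your .env values are exported in the current shell and that the template renders to /etc/nginx/conf.d/default.conf (the stock nginx template location); adapt the paths and variables to your own layout.

# Re-render the config inside the running nginx container with the new pool,
# validate it, then hot-reload. Nothing is restarted or recreated.
# (export the .env values first, e.g. set -a; . ./.env; set +a)
docker compose exec \
  -e ACTIVE_POOL=green -e PORT="$PORT" \
  -e RELEASE_ID_BLUE="$RELEASE_ID_BLUE" -e RELEASE_ID_GREEN="$RELEASE_ID_GREEN" \
  nginx sh -c "envsubst '\$ACTIVE_POOL \$PORT \$RELEASE_ID_BLUE \$RELEASE_ID_GREEN' \
    < /etc/nginx/templates/default.conf.template \
    > /etc/nginx/conf.d/default.conf \
    && nginx -t && nginx -s reload"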

Stability Under Sustained Failure
During a sustained ~10-second request loop while Blue is in chaos mode:

  • Zero non-200 responses
  • ≥95% responses from Green
  • Blue remains isolated until healthy

This satisfies all grader stability requirements.
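
A rough version of that check can be scripted with a short shell loop; this is a sketch of the idea, not the actual grader:

# Hammer the public endpoint for ~10 seconds, counting status codes and serving pools
end=$((SECONDS + 10)); total=0; non200=0; green=0
while [ "$SECONDS" -lt "$end" ]; do
  out=$(curl -si http://localhost:8080/version)
  code=$(printf '%s\n' "$out" | head -n1 | awk '{print $2}')
  pool=$(printf '%s\n' "$out" | grep -i '^x-app-pool:' | awk '{print tolower($2)}' | tr -d '\r')
  total=$((total + 1))
  [ "$code" != "200" ] && non200=$((non200 + 1))
  [ "$pool" = "green" ] && green=$((green + 1))
done
echo "total=$total non200=$non200 green=$green"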

Key Takeaways

This project demonstrates:

  • Blue/Green deployment without Kubernetes
  • Auto-failover within a single HTTP request
  • Resilience implemented at the proxy layer
  • Environment-driven infrastructure design
  • Production-grade reliability using simple tools

Conclusion

High availability is not about avoiding failure—it’s about handling failure correctly.

By combining Nginx upstreams, Docker Compose, and strict timeout and retry controls, we achieve:

  • Zero downtime
  • Safe rollbacks
  • Transparent failover
  • CI-ready verification

This approach mirrors real production systems and is an excellent foundation for any DevOps engineer.

If you’re learning DevOps, mastering patterns like this matters far more than chasing tools. Reliability is a design choice.
Explore the code here
