How I built a CLI tool that generates, manages, and monitors a full containerised stack from a single YAML manifest — and everything that broke along the way.
Introduction
Most DevOps tasks ask you to configure infrastructure manually. You write an Nginx config. You write a Docker Compose file. You run commands. You check if things are healthy. You repeat this every time you spin up a new service.
SwiftDeploy flips that script entirely.
The premise is simple but powerful: one YAML file describes your entire deployment, and a CLI tool derives everything else from it. No handwritten Nginx configs. No manually crafted Docker Compose files. No guessing at container states. You edit the manifest, you run the CLI, and your stack is live.
This article is a complete technical walkthrough of how I built SwiftDeploy for the HNG DevOps Internship Stage 4A task — covering the architecture, every component, every design decision, the bugs I hit (including a particularly nasty WSL2 healthcheck issue), and the debugging process that resolved them.
By the end, you'll understand:
- How to build a declarative infrastructure tool from scratch
- How template-based config generation works
- How to write a multi-subcommand Python CLI
- How Docker healthchecks work and why they fail in unexpected ways
- How nginx reverse proxying, canary deployments, and chaos engineering fit together
Let's get into it.
The Architecture
Before writing a single line of code, I mapped out how the pieces would connect.
        You (the human)
              │
              │  edit only this
              ▼
        manifest.yaml
              │
              ▼
       swiftdeploy (CLI)
              │
       ┌──────┴──────────────┐
       ▼                     ▼
   nginx.conf        docker-compose.yml
(generated file)      (generated file)
       │                     │
       └──────────┬──────────┘
                  ▼
     Docker starts 2 containers
       ┌──────────┴──────────┐
       ▼                     ▼
[nginx container]     [app container]
port 8080 (public)   port 3000 (private)
       │                     │
       └─── nginx proxies ───┘

Internet → port 8080 → nginx → port 3000 → Python app
The key architectural decisions:
- The manifest is the single source of truth. Everything — nginx timeouts, container ports, network names, deployment mode — lives in manifest.yaml. The CLI reads it and generates everything else.
- The app is never exposed directly. All traffic flows through nginx on port 8080. The app container only uses expose (internal Docker network), never ports (host-facing).
- Generated files are gitignored. nginx.conf and docker-compose.yml are outputs, not inputs. They're always regeneratable from the manifest. The grader tests this explicitly — deleting generated files and re-running init.
- The CLI is self-contained. One executable Python script, no framework, handles five subcommands.
The Project Structure
swiftdeploy/
├── manifest.yaml ← the ONLY file you edit
├── swiftdeploy ← CLI executable
├── Dockerfile ← app image definition
├── app/
│ └── main.py ← Python HTTP service
├── templates/
│ ├── nginx.conf.tmpl ← nginx template
│ └── docker-compose.yml.tmpl ← compose template
├── nginx.conf ← generated by init (gitignored)
├── docker-compose.yml ← generated by init (gitignored)
├── .gitignore
└── README.md
Component 1: The Manifest
manifest.yaml is the brain of the entire system. Every other component reads from it either directly or via the generated files.
services:
  image: swift-deploy-1-node:latest
  port: 3000
  mode: stable
  version: "1.0.0"
  restart_policy: unless-stopped
  log_volume: swiftdeploy-logs

nginx:
  image: nginx:latest
  port: 8080
  proxy_timeout: 30

network:
  name: swiftdeploy-net
  driver_type: bridge

contact: "ops@swiftdeploy.local"
The design intention is that this reads like infrastructure-as-documentation. You can look at this file and understand the entire deployment without reading a single generated config.
What each field controls
- services.mode — controls whether the app runs in stable or canary mode. The CLI's promote subcommand updates this in-place.
- nginx.proxy_timeout — propagates into the nginx config as proxy_connect_timeout, proxy_send_timeout, and proxy_read_timeout.
- contact — injected into nginx's custom JSON error bodies for 502/503/504 responses.
- log_volume — a named Docker volume shared between the app container and nginx, so both write logs to the same persistent location.
Component 2: The Python HTTP Service
The app is a from-scratch HTTP server built on Python's http.server stdlib — no Flask, no FastAPI, no external dependencies. This keeps the Docker image small and the container startup fast.
The server setup
import os
import time
import random
import threading
import json
from http.server import HTTPServer, BaseHTTPRequestHandler
MODE = os.environ.get("MODE", "stable")
APP_VERSION = os.environ.get("APP_VERSION", "1.0.0")
APP_PORT = int(os.environ.get("APP_PORT", "3000"))
START_TIME = time.time()
Configuration comes entirely from environment variables injected by Docker Compose at runtime. The defaults exist only as fallbacks for local development.
START_TIME is captured at module load — this is how /healthz calculates uptime without a database or external state store.
Thread-safe chaos state
chaos_lock = threading.Lock()
chaos_state = {"mode": None, "duration": None, "rate": None}

def get_chaos():
    with chaos_lock:
        return dict(chaos_state)

def set_chaos(state):
    with chaos_lock:
        chaos_state.update(state)
Python's http.server handles each request in the same thread by default (it's not async), but I added explicit thread safety here anyway — the Lock ensures that if you ever extend this to a threaded server, chaos state reads and writes remain atomic. dict(chaos_state) returns a copy, preventing the caller from holding a reference to the mutable internal state.
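The excerpts here never show the entrypoint, so here's a minimal sketch of what it looks like (assuming the Handler class defined next — the real main.py may differ). Swapping HTTPServer for ThreadingHTTPServer would be the one-line change that makes the chaos lock strictly necessary.

# Hedged sketch of the server entrypoint — not shown in the excerpts above.
if __name__ == "__main__":
    server = HTTPServer(("0.0.0.0", APP_PORT), Handler)
    # ThreadingHTTPServer(("0.0.0.0", APP_PORT), Handler) would make it concurrent
    server.serve_forever()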
The request handler
class Handler(BaseHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress default logging — nginx handles access logs

    def send_json(self, code, body, extra_headers=None):
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("X-Deployed-By", "swiftdeploy")
        if MODE == "canary":
            self.send_header("X-Mode", "canary")
        if extra_headers:  # optional per-route headers
            for name, value in extra_headers.items():
                self.send_header(name, value)
        self.end_headers()
        self.wfile.write(payload)
send_json is a helper that consolidates all the boilerplate of setting response codes, content type, and custom headers in one place. Every route calls it — this is the DRY principle applied to HTTP handlers.
Suppressing log_message is intentional. The default Python HTTP server writes its own access log to stdout, which would duplicate what nginx already logs in the structured format we defined.
The three routes
GET / — welcome endpoint
if self.path == "/":
    self.send_json(200, {
        "message": "Welcome to SwiftDeploy API",
        "mode": MODE,
        "version": APP_VERSION,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    })
GET /healthz — liveness check
elif self.path == "/healthz":
    uptime = round(time.time() - START_TIME, 2)
    self.send_json(200, {
        "status": "ok",
        "mode": MODE,
        "version": APP_VERSION,
        "uptime_seconds": uptime,
    })
The /healthz endpoint does three jobs simultaneously: it proves the server is alive (Docker healthcheck), it reports the current mode (so promote can confirm the switch happened), and it reports uptime (useful for debugging restart loops).
POST /chaos — chaos injection (canary only)
def do_POST(self):
    if self.path == "/chaos":
        if MODE != "canary":
            self.send_json(403, {"error": "chaos endpoint only available in canary mode"})
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        data = json.loads(body)
        mode = data.get("mode")
        if mode == "slow":
            set_chaos({"mode": "slow", "duration": data.get("duration", 2), "rate": None})
        elif mode == "error":
            set_chaos({"mode": "error", "duration": None, "rate": data.get("rate", 0.5)})
        elif mode == "recover":
            set_chaos({"mode": None, "duration": None, "rate": None})
Reading Content-Length before calling rfile.read() is standard HTTP protocol — you must know exactly how many bytes to read, otherwise the read blocks waiting for more data that never comes.
The chaos modes:
- slow — injects time.sleep(N) before responding, simulating a slow upstream
- error — uses random.random() < rate to return 500 on a configurable percentage of requests
- recover — clears all chaos state, returning to normal behaviour
This is real chaos engineering in miniature — the same concept used by tools like Chaos Monkey, just scoped to a single service.
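The excerpts don't show where the chaos state is actually consumed, so here's a hedged sketch of how the GET path can apply it before answering. apply_chaos is an illustrative name; the real handler may inline this logic instead.

# Sketch: consult the chaos state before serving a normal GET response.
def apply_chaos(handler):
    chaos = get_chaos()
    if chaos["mode"] == "slow":
        time.sleep(chaos["duration"])  # simulate a slow upstream
    elif chaos["mode"] == "error" and random.random() < chaos["rate"]:
        handler.send_json(500, {"error": "chaos-injected failure"})
        return True   # response already sent, caller should stop
    return False      # no chaos triggered, proceed normally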
Component 3: The Dockerfile
FROM python:3.12-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
COPY app/main.py .
RUN chown -R appuser:appgroup /app
USER appuser
ENV MODE=stable
ENV APP_VERSION=1.0.0
ENV APP_PORT=3000
EXPOSE 3000
HEALTHCHECK --interval=10s --timeout=5s --start-period=15s --retries=5 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:3000/healthz', timeout=4)" || exit 1
CMD ["python", "main.py"]
Why Alpine?
python:3.12-alpine is approximately 60MB. python:3.12 (Debian-based) is approximately 1GB. The task requires images under 300MB. Alpine gets us there with room to spare.
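If you want to check those numbers on your own machine, docker images reports sizes directly:

docker images python:3.12-alpine --format '{{.Size}}'
docker images swift-deploy-1-node:latest --format '{{.Size}}'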
Why non-root?
The addgroup / adduser pattern is a security baseline. If someone exploits a vulnerability in the app, they get a user with zero privileges — no ability to write to system directories, install packages, or escalate. Running as root inside a container means a container escape gives the attacker root on the host.
The healthcheck evolution
The Dockerfile healthcheck went through several iterations during development. The original:
# BROKEN — env vars don't expand in exec (JSON array) form
HEALTHCHECK CMD ["wget", "-qO-", "http://localhost:${APP_PORT}/healthz"]

${APP_PORT} doesn't expand inside a Dockerfile HEALTHCHECK written in exec (JSON array) form — no shell is involved, so nothing performs the substitution, and wget literally tries to connect to http://localhost:${APP_PORT}. Fixed by hardcoding the port:
# BROKEN on WSL2 — localhost doesn't resolve inside Alpine healthcheck context
HEALTHCHECK CMD wget -qO- http://localhost:3000/healthz || exit 1
This also failed on WSL2 + Docker Desktop. The wget inside Alpine couldn't resolve localhost to 127.0.0.1 in the healthcheck execution context (a known WSL2 networking quirk). Switching to 127.0.0.1 still failed because of how Docker Desktop on WSL2 handles the network namespace for healthcheck processes.
The final working solution:
# WORKS — uses Python's urllib, no external tool, no DNS resolution
HEALTHCHECK --interval=10s --timeout=5s --start-period=15s --retries=5 \
CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:3000/healthz', timeout=4)" || exit 1
Using Python's own urllib sidesteps the wget/DNS issue entirely. Python's socket layer handles 127.0.0.1 directly without going through the system resolver.
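When iterating on healthchecks like this, docker inspect is invaluable — Docker records recent probe results, including each probe's exit code and output:

docker inspect --format '{{json .State.Health}}' swiftdeploy-app
# prints JSON with Status, FailingStreak, and a Log of recent probes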
Component 4: The Templates
Templates are the bridge between the manifest and the generated configs. They contain placeholders in {{ key }} format that the CLI replaces with real values.
nginx.conf.tmpl
upstream app_backend {
    server app:{{ service_port }};
    keepalive 32;
}

log_format swiftdeploy '$time_iso8601 | $status | ${request_time}s | $upstream_addr | $request';

server {
    listen {{ nginx_port }};
    server_name _;

    access_log /var/log/nginx/access.log swiftdeploy;

    proxy_connect_timeout {{ proxy_timeout }}s;
    proxy_send_timeout    {{ proxy_timeout }}s;
    proxy_read_timeout    {{ proxy_timeout }}s;

    add_header X-Deployed-By swiftdeploy always;
    proxy_pass_header X-Mode;

    location / {
        proxy_pass http://app_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    error_page 502 = @error502;
    error_page 503 = @error503;
    error_page 504 = @error504;

    location @error502 {
        default_type application/json;
        add_header X-Deployed-By swiftdeploy always;
        return 502 '{"error":"Bad Gateway","code":502,"service":"app","contact":"{{ contact }}"}';
    }

    # ... 503, 504 same pattern
}
Design decisions worth noting:
- Custom log format: $time_iso8601 | $status | ${request_time}s | $upstream_addr | $request gives you timestamp, HTTP status, response time in seconds, the upstream container IP, and the full request line — everything you need for debugging in one line.
- JSON error bodies: rather than nginx's default HTML error pages, we return structured JSON. This is essential for APIs — clients expect JSON and need machine-readable error codes.
- proxy_pass_header X-Mode: nginx hides a handful of upstream response headers by default. This directive explicitly forwards the app's X-Mode: canary header through to the client, so callers can identify which mode they're talking to.
- keepalive 32: keeps up to 32 idle connections open to the upstream per worker, reducing connection-setup overhead under load.
docker-compose.yml.tmpl
services:
  app:
    image: {{ service_image }}
    container_name: swiftdeploy-app
    environment:
      MODE: "{{ mode }}"
      APP_VERSION: "{{ version }}"
      APP_PORT: "{{ service_port }}"
    networks:
      - {{ network_name }}
    volumes:
      - {{ log_volume }}:/app/logs
    restart: {{ restart_policy }}
    user: "appuser"
    cap_drop:
      - NET_ADMIN
      - SYS_ADMIN
    healthcheck:
      test: ["CMD-SHELL", "python -c 'import urllib.request; urllib.request.urlopen(\"http://127.0.0.1:3000/healthz\", timeout=4)'"]
      interval: 10s
      timeout: 5s
      start_period: 15s
      retries: 5
    expose:
      - "{{ service_port }}"

  nginx:
    image: {{ nginx_image }}
    container_name: swiftdeploy-nginx
    ports:
      - "{{ nginx_port }}:{{ nginx_port }}"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
      - {{ log_volume }}:/var/log/nginx
    networks:
      - {{ network_name }}
    restart: {{ restart_policy }}
    depends_on:
      app:
        condition: service_healthy

# top-level definitions the services reference — network_driver is
# supplied by the context builder shown later
networks:
  {{ network_name }}:
    driver: {{ network_driver }}

volumes:
  {{ log_volume }}:
The expose vs ports distinction is critical:
- expose: ["3000"] — makes port 3000 reachable between containers on the same Docker network. Not published to the host.
- ports: ["8080:8080"] — publishes port 8080 to the host machine and the outside world.
The app uses expose only. There is no way to reach port 3000 from outside Docker. All traffic must enter through nginx on 8080.
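You can verify the boundary from the host (the exact "connection refused" wording varies by curl version):

curl -s http://localhost:8080/healthz   # answered — nginx's published port
curl -s http://localhost:3000/healthz   # fails — the app's port is never published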
depends_on: condition: service_healthy means nginx won't start until the app container's healthcheck passes. This prevents nginx from starting and immediately returning 502s because the upstream isn't ready yet.
Component 5: The CLI
The swiftdeploy CLI is a single Python script with five subcommands. Here's how each works internally.
The template engine
def render_template(tmpl_path, context):
    with open(tmpl_path) as f:
        content = f.read()
    for key, val in context.items():
        content = content.replace("{{ " + key + " }}", str(val))
    return content
Five lines. No Jinja2. No external library. Simple string replacement. This is intentional — the templates are straightforward enough that a minimal custom engine is cleaner than pulling in a dependency.
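A quick self-contained demo of the engine — the template content here is invented for illustration:

import tempfile, os

tmpl = tempfile.NamedTemporaryFile("w", suffix=".tmpl", delete=False)
tmpl.write("listen {{ nginx_port }};\nproxy_read_timeout {{ proxy_timeout }}s;\n")
tmpl.close()
print(render_template(tmpl.name, {"nginx_port": 8080, "proxy_timeout": 30}))
os.unlink(tmpl.name)
# listen 8080;
# proxy_read_timeout 30s;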
The context builder
def build_context(manifest):
    svc = manifest["services"]
    ngx = manifest["nginx"]
    net = manifest["network"]
    return {
        "service_image": svc["image"],
        "service_port": svc["port"],
        "mode": svc.get("mode", "stable"),
        "version": svc.get("version", "1.0.0"),
        "restart_policy": svc.get("restart_policy", "unless-stopped"),
        "log_volume": svc.get("log_volume", "swiftdeploy-logs"),
        "nginx_image": ngx["image"],
        "nginx_port": ngx["port"],
        "proxy_timeout": ngx.get("proxy_timeout", 30),
        "network_name": net["name"],
        "network_driver": net["driver_type"],
        "contact": manifest.get("contact", "ops@swiftdeploy.local"),
    }
This translates the nested YAML structure into the flat dictionary that matches the {{ placeholders }} in the templates. It's the glue layer between manifest and generated configs.
init subcommand
def cmd_init():
    manifest = load_manifest()
    ctx = build_context(manifest)
    nginx_conf = render_template(NGINX_TMPL, ctx)
    with open(NGINX_OUT, "w") as f:
        f.write(nginx_conf)
    compose_conf = render_template(COMPOSE_TMPL, ctx)
    with open(COMPOSE_OUT, "w") as f:
        f.write(compose_conf)
Straightforward: load manifest → build context → render both templates → write files. The grader deletes the generated files and re-runs this to verify regeneration.
validate subcommand — 5 pre-flight checks
# Check 1: manifest.yaml exists and is valid YAML
try:
    manifest = load_manifest()
    ok("manifest.yaml found and parsed successfully")
except yaml.YAMLError as e:
    fail(f"Invalid YAML: {e}")

# Check 2: required fields present and non-empty
required = {
    "services.image": manifest.get("services", {}).get("image"),
    "services.port": manifest.get("services", {}).get("port"),
    "nginx.image": manifest.get("nginx", {}).get("image"),
    "nginx.port": manifest.get("nginx", {}).get("port"),
    "network.name": manifest.get("network", {}).get("name"),
    "network.driver_type": manifest.get("network", {}).get("driver_type"),
}

# Check 3: Docker image exists locally
result = run(f"docker image inspect {image}", capture=True, check=False)
# exit code 0 = found, non-zero = not found

# Check 4: nginx port not already bound
result = run(f"ss -tlnp | grep ':{nginx_port} '", capture=True, check=False)
# stdout non-empty = port in use

# Check 5: nginx.conf syntactically valid
result = subprocess.run(
    ["docker", "run", "--rm",
     "-v", f"{test_conf_path}:/etc/nginx/conf.d/default.conf:ro",
     "nginx:latest", "nginx", "-t"],
    capture_output=True, text=True,
)
combined = result.stdout + result.stderr
if "successful" in combined:
    ok("nginx.conf syntax is valid")
Check 5 is the most interesting. Running nginx -t in an isolated container is elegant — it validates syntax without needing nginx installed on the host. However, we hit a complication: in an isolated container, app:3000 (the upstream hostname) can't be resolved because there's no Docker network. nginx refuses to start if it can't resolve upstream hostnames, even for a syntax check.
The fix: before handing the config to the test container, swap server app: with server 127.0.0.1: in a temporary copy. 127.0.0.1 always resolves, so nginx validates the rest of the syntax (listen ports, timeouts, location blocks, error pages) correctly. The actual nginx.conf on disk is untouched.
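A sketch of that swap — make_testable_conf is an illustrative name, and the CLI's actual helper may be structured differently:

import tempfile

def make_testable_conf(nginx_conf_path):
    # Copy the generated config, pointing the upstream at 127.0.0.1 so
    # `nginx -t` can resolve it inside an isolated container. The real
    # nginx.conf on disk is never touched.
    with open(nginx_conf_path) as f:
        conf = f.read()
    conf = conf.replace("server app:", "server 127.0.0.1:")
    tmp = tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False)
    tmp.write(conf)
    tmp.close()
    return tmp.name  # mount this path into the nginx:latest test container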
deploy subcommand
def cmd_deploy():
    cmd_init()
    run(compose_cmd("up -d --build"))
    deadline = time.time() + 60
    healthy = False
    while time.time() < deadline:
        try:
            url = f"http://localhost:{nginx_port}/healthz"
            with urllib.request.urlopen(url, timeout=3) as resp:
                body = json.loads(resp.read())
                if body.get("status") == "ok":
                    healthy = True
                    break
        except Exception:
            pass
        time.sleep(2)
    if not healthy:
        fail("Health checks did not pass within 60 seconds")
        sys.exit(1)
The polling loop is the key part. Containers don't start instantly. docker compose up -d returns as soon as the containers are created, not when they're healthy. The loop hits /healthz through nginx every 2 seconds for up to 60 seconds. Connection refused, timeout, bad JSON — all exceptions are caught and ignored. Only {"status": "ok"} breaks the loop successfully.
promote subcommand
def cmd_promote(target_mode):
    # 1. Update manifest in-place using regex
    with open(MANIFEST_PATH) as f:
        content = f.read()
    content = re.sub(r"(mode:\s*)(\S+)", f"\\g<1>{target_mode}", content, count=1)
    with open(MANIFEST_PATH, "w") as f:
        f.write(content)

    # 2. Regenerate docker-compose.yml only
    manifest = load_manifest()
    ctx = build_context(manifest)
    compose_conf = render_template(COMPOSE_TMPL, ctx)
    with open(COMPOSE_OUT, "w") as f:
        f.write(compose_conf)

    # 3. Restart app container only — nginx stays up
    run(compose_cmd("up -d --no-deps app"))

    # 4. Confirm mode via /healthz (url mirrors the deploy loop)
    url = f"http://localhost:{nginx_port}/healthz"
    confirmed = False
    deadline = time.time() + 30
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=3) as resp:
                body = json.loads(resp.read())
                if body.get("mode") == target_mode and body.get("status") == "ok":
                    confirmed = True
                    break
        except Exception:
            pass
        time.sleep(2)
--no-deps in docker compose up -d --no-deps app is the rolling restart mechanism. It tells Compose to restart only the app service without touching nginx. Since nginx is already running and healthy, there's zero downtime at the proxy level — nginx continues serving requests while the app container restarts with the new mode.
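One way to convince yourself the restart really is zero-downtime at the proxy: hammer the published port from a second shell while the promote runs (the interval and output format here are arbitrary):

while true; do
  curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/healthz
  sleep 0.5
done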
The regex re.sub(r"(mode:\s*)(\S+)", f"\\g<1>{target_mode}", content, count=1) uses a backreference \\g<1> to preserve the mode: prefix and only replace the value. count=1 ensures only the first occurrence is replaced.
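Here's that behaviour in isolation:

import re

snippet = 'services:\n  mode: stable\n  version: "1.0.0"\n'
print(re.sub(r"(mode:\s*)(\S+)", r"\g<1>canary", snippet, count=1))
# services:
#   mode: canary
#   version: "1.0.0"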
The Debugging Saga: WSL2 + Docker Desktop Healthchecks
This section documents the most painful part of the build — a cascade of healthcheck failures that took multiple debugging rounds to resolve.
Failure 1: ${APP_PORT} not expanding
wget: can't connect to remote host: Connection refused
Root cause: environment variables don't expand in an exec-form (JSON array) Dockerfile HEALTHCHECK — no shell runs, so nothing substitutes them, and ${APP_PORT} was passed literally to wget. Fix: hardcode 3000.
Failure 2: localhost not resolving in Alpine
wget: can't connect to remote host: Connection refused
Same error, different cause. Inside the Alpine container's healthcheck execution context on WSL2 + Docker Desktop, localhost wasn't resolving to 127.0.0.1. Fix: use 127.0.0.1 explicitly.
Failure 3: wget still failing with 127.0.0.1
wget: can't connect to remote host: Connection refused
Confirmed the server was listening:
docker exec swiftdeploy-app ss -tlnp
# tcp LISTEN 0.0.0.0:3000
docker exec swiftdeploy-app python -c "import urllib.request; print(urllib.request.urlopen('http://127.0.0.1:3000/healthz').read())"
# b'{"status": "ok", ...}'
The server was reachable via docker exec but not from the healthcheck process. This is a known WSL2 + Docker Desktop network namespace issue — the healthcheck runs in a slightly different network context than docker exec. Fix: replace wget with Python's urllib:
CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:3000/healthz', timeout=4)" || exit 1
Failure 4: Docker cache serving old image
After fixing the Dockerfile, the healthcheck was still running wget. The container was using a cached image layer. Fix:
docker rmi swift-deploy-1-node:latest
docker build --no-cache -t swift-deploy-1-node:latest .
Failure 5: docker-compose.yml overriding the Dockerfile healthcheck
Even after fixing the image, the container was still running wget. Discovery:
grep -A5 "healthcheck" docker-compose.yml
# test: ["CMD", "wget", "-qO-", "http://localhost:3000/healthz"]
The docker-compose.yml template had its own healthcheck definition that was overriding the Dockerfile's. Docker Compose healthcheck always wins over Dockerfile HEALTHCHECK. Fixed the template to use the Python urllib approach and correct YAML quoting:
healthcheck:
  test: ["CMD-SHELL", "python -c 'import urllib.request; urllib.request.urlopen(\"http://127.0.0.1:3000/healthz\", timeout=4)'"]
  interval: 10s
  timeout: 5s
  start_period: 15s
  retries: 5
Failure 6: YAML indentation error
The healthcheck block was indented with 3 spaces while its sibling keys used 4 — a single space of inconsistency that broke YAML parsing entirely:

   healthcheck:    # ← 3 spaces — YAML parse error
    healthcheck:   # ← 4 spaces — correct
After all six of these were resolved, the deploy succeeded:
✔ Container swiftdeploy-app Healthy 6.8s
✔ Container swiftdeploy-nginx Started 7.2s
✔ Stack is healthy! Listening on port 8080
Deployment Walkthrough
1. Build the image
docker build -t swift-deploy-1-node:latest .
2. Validate
./swiftdeploy validate
[1/5] manifest.yaml exists and is valid YAML
✔ manifest.yaml found and parsed successfully
[2/5] Required fields are present and non-empty
✔ services.image = swift-deploy-1-node:latest
✔ services.port = 3000
✔ nginx.image = nginx:latest
✔ nginx.port = 8080
✔ network.name = swiftdeploy-net
✔ network.driver_type = bridge
[3/5] Docker image exists locally
✔ Image found: swift-deploy-1-node:latest
[4/5] Nginx port not already bound on host
✔ Port 8080 is free
[5/5] Generated nginx.conf is syntactically valid
✔ nginx.conf syntax is valid
ALL CHECKS PASSED ✔
3. Deploy
./swiftdeploy deploy
✔ Container swiftdeploy-app Healthy
✔ Container swiftdeploy-nginx Started
✔ Stack is healthy! Listening on port 8080
deploy complete.
4. Test endpoints
curl http://localhost:8080/
# {"message": "Welcome to SwiftDeploy API", "mode": "stable", "version": "1.0.0", "timestamp": "..."}
curl http://localhost:8080/healthz
# {"status": "ok", "mode": "stable", "version": "1.0.0", "uptime_seconds": 12.4}
5. Promote to canary
./swiftdeploy promote canary
curl http://localhost:8080/healthz
# {"status": "ok", "mode": "canary", ...}
curl -I http://localhost:8080/
# X-Mode: canary
# X-Deployed-By: swiftdeploy
6. Test chaos (canary mode only)
# Slow mode — 3 second delay
curl -X POST http://localhost:8080/chaos \
-H "Content-Type: application/json" \
-d '{"mode": "slow", "duration": 3}'
time curl http://localhost:8080/
# real 0m3.012s
# Error mode — 50% 500 errors
curl -X POST http://localhost:8080/chaos \
-H "Content-Type: application/json" \
-d '{"mode": "error", "rate": 0.5}'
# Recover
curl -X POST http://localhost:8080/chaos \
-H "Content-Type: application/json" \
-d '{"mode": "recover"}'
7. View nginx access logs
docker logs swiftdeploy-nginx
2026-05-03T14:23:01+00:00 | 200 | 0.002s | 172.18.0.3:3000 | GET / HTTP/1.1
2026-05-03T14:23:05+00:00 | 200 | 0.001s | 172.18.0.3:3000 | GET /healthz HTTP/1.1
2026-05-03T14:23:12+00:00 | 500 | 0.001s | 172.18.0.3:3000 | GET / HTTP/1.1
8. Teardown
./swiftdeploy teardown --clean
Key Learnings
1. Docker Compose healthcheck overrides Dockerfile HEALTHCHECK. This is documented but easy to miss. If both exist, Compose wins. Always check your generated compose file when healthchecks misbehave.
2. WSL2 + Docker Desktop has a quirky network namespace for healthchecks. docker exec and HEALTHCHECK run in slightly different contexts. If something works via exec but not via healthcheck, it's almost always a network namespace or tool availability issue. Python's stdlib is more portable than wget in this environment.
3. --no-cache is essential when debugging Dockerfile changes. Docker's layer caching is aggressive. If you change a HEALTHCHECK line but the layers above it are cached, Docker will use the old healthcheck. Always docker rmi and --no-cache when debugging image-level issues.
4. expose vs ports in Docker Compose is a security boundary, not just documentation. expose is container-to-container only. ports is host-facing. Using expose for the app and ports only for nginx enforces the proxy pattern at the infrastructure level.
5. YAML indentation is unforgiving. A single space difference between 3 and 4 spaces of indentation produces a cryptic parse error. Always use a YAML linter or at minimum python3 -c "import yaml; yaml.safe_load(open('file.yml'))" to validate before deploying.
6. Declarative infrastructure pays off immediately. The grader deletes generated files and re-runs init — because we built it right, this is a non-issue. The manifest is always there, the templates are always there, and regeneration is instantaneous.
Conclusion
SwiftDeploy started as a task requirement and ended up being a genuinely useful mental model for how declarative infrastructure tools work. Tools like Terraform, Helm, and Pulumi are all variations of the same core idea: describe what you want, let the tool figure out how to get there.
Building this from scratch — the template engine, the CLI subcommands, the healthcheck polling loop, the rolling restart — gives you an appreciation for what those tools are doing under the hood at scale.
The debugging journey through six layers of healthcheck failures was frustrating in the moment but valuable in retrospect. Every failure taught something concrete about how Docker, WSL2, Alpine Linux, nginx, and Python's network stack interact.
The full source code is available at: https://github.com/AirFluke/hng-swiftdeploy
Built for HNG DevOps Internship — Stage 4A: SwiftDeploy
Tags: #devops #docker #nginx #python #infrastructure #hng