Gideon Bature

Building the Tool That Builds the Stack: HNG DevOps Stage 4a

This is part of my HNG DevOps internship series. Follow along as I document every stage.
Previous articles:
Stage 0: How I Secured a Linux Server from Scratch
Stage 1: Build, Deploy and Reverse Proxy a Rust API
Stage 2: Containerizing a Microservices App with Docker and CI/CD
Stage 3: Building a Real-Time DDoS Detection Engine from Scratch


A Quick Recap

Stage 0 was server hardening. Stage 1 was deploying an API. Stage 2 was containerization and CI/CD. Stage 3 was building a real-time DDoS detection engine. Stage 4a is different again.

This time the task was not to deploy infrastructure manually. It was to build the tool that does it for you.

The repository is here: https://github.com/GideonBature/hng-stage4a


The Task

The task introduced a concept called a declarative manifest. Instead of manually writing Nginx configs, Docker Compose files, and environment variables, you describe what you want in a single YAML file and a CLI tool called swiftdeploy generates everything else from it.

Here is the core idea:

manifest.yaml (the only file you edit)
        |
        v
swiftdeploy init
        |
        |--- generates nginx.conf
        |--- generates docker-compose.yml
        |
        v
swiftdeploy deploy
        |
        |--- builds Docker image
        |--- brings up the stack
        |--- waits for health checks

The grader tests this by deleting the generated files and re-running swiftdeploy init. If the tool is correct, the files come back exactly as they should be. If the tool is broken, the stack breaks. The manifest is the single source of truth.


What I Built

Five components make up the project:

manifest.yaml         # describes the entire stack
swiftdeploy           # the CLI tool (Python)
app/
  main.py             # FastAPI service with stable and canary modes
  Dockerfile
  requirements.txt
templates/
  nginx.conf.j2       # Jinja2 template for Nginx
  docker-compose.yml.j2

Step 1: The Manifest

The manifest.yaml file is the only thing you are allowed to edit. Everything else is derived from it programmatically.

services:
  image: swift-deploy-1-node:latest
  port: 3000
  mode: stable
  version: "1.0.0"
  replicas: 1
  restart_policy: unless-stopped
  log_volume: app-logs

nginx:
  image: nginx:latest
  port: 8080
  proxy_timeout: 30

network:
  name: swiftdeploy-net
  driver_type: bridge

Every value in this file feeds into the Jinja2 templates that generate the actual infrastructure configs. Change a port here and it changes everywhere. Change the mode here and the container restarts in the new mode. Nothing is hardcoded anywhere else.


Step 2: The Jinja2 Templates

Jinja2 is a Python templating engine. It works like a fill-in-the-blanks document where the blanks are filled from a Python dictionary.
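
To make the idea concrete, here is a minimal, self-contained example of that fill-in-the-blanks behaviour (illustrative only; the real tool loads its templates from the templates/ directory):

from jinja2 import Template

# Render a template string from a plain dictionary of values.
template = Template("listen {{ nginx.port }};")
print(template.render(nginx={"port": 8080}))  # prints: listen 8080;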

The Nginx template looks like this:

log_format swiftdeploy '$time_iso8601 | $status | $request_time s | $upstream_addr | $request';

upstream app_backend {
    server {{ service_name }}:{{ services.port }};
}

server {
    listen {{ nginx.port }};
    server_name _;

    access_log /var/log/nginx/access.log swiftdeploy;
    add_header X-Deployed-By swiftdeploy always;
    proxy_pass_header X-Mode;

    proxy_connect_timeout {{ nginx.proxy_timeout }}s;
    proxy_send_timeout {{ nginx.proxy_timeout }}s;
    proxy_read_timeout {{ nginx.proxy_timeout }}s;

    location / {
        proxy_pass http://app_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    error_page 502 = @error502;
    location @error502 {
        default_type application/json;
        return 502 '{"error":"Bad Gateway","code":502,"service":"{{ service_name }}","contact":"ops@swiftdeploy.io"}';
    }
}

The {{ service_name }}, {{ services.port }}, and {{ nginx.port }} placeholders get replaced with real values when swiftdeploy init runs. The result is a complete, valid nginx.conf file with no manual editing required.

The Docker Compose template works the same way:

services:
  {{ service_name }}:
    image: {{ services.image }}
    environment:
      - MODE={{ services.mode }}
      - APP_VERSION={{ services.version }}
      - APP_PORT={{ services.port }}
    expose:
      - "{{ services.port }}"
    cap_drop:
      - ALL
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:{{ services.port }}/healthz')"]

Step 3: The API Service

The service is a FastAPI application that runs in either stable or canary mode, controlled entirely by the MODE environment variable injected from the manifest at startup.

import os
import time
from datetime import datetime, timezone

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()
START_TIME = time.time()  # process start, used for uptime reporting

MODE = os.environ.get("MODE", "stable").lower()
APP_VERSION = os.environ.get("APP_VERSION", "1.0.0")
APP_PORT = int(os.environ.get("APP_PORT", "3000"))

@app.get("/")
async def root():
    return JSONResponse(content={
        "message": "Welcome to SwiftDeploy service",
        "mode": MODE,
        "version": APP_VERSION,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

@app.get("/healthz")
async def healthz():
    uptime = int(time.time() - START_TIME)
    return JSONResponse(content={
        "status": "ok",
        "mode": MODE,
        "version": APP_VERSION,
        "uptime_seconds": uptime,
    })

Canary mode adds X-Mode: canary to every response via middleware and activates the /chaos endpoint. The chaos endpoint accepts three modes:

# Delay all responses by N seconds
curl -X POST http://localhost:8080/chaos \
  -d '{"mode": "slow", "duration": 3}'

# Return 500 on ~50% of requests
curl -X POST http://localhost:8080/chaos \
  -d '{"mode": "error", "rate": 0.5}'

# Cancel all chaos
curl -X POST http://localhost:8080/chaos \
  -d '{"mode": "recover"}'
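
Under the hood, the canary header and chaos behaviour could be wired together roughly like this. This is a hedged sketch based on the behaviour described above, not the repository's actual code, and the names are my own:

import asyncio
import random

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
chaos = {"mode": None, "duration": 0, "rate": 0.0}  # in-memory chaos state

@app.middleware("http")
async def canary_middleware(request: Request, call_next):
    if request.url.path != "/chaos":                      # never break /chaos itself
        if chaos["mode"] == "slow":
            await asyncio.sleep(chaos["duration"])        # delay every response
        if chaos["mode"] == "error" and random.random() < chaos["rate"]:
            return JSONResponse(status_code=500, content={"error": "chaos"})
    response = await call_next(request)
    response.headers["X-Mode"] = "canary"                 # canary marker header
    return response

@app.post("/chaos")
async def set_chaos(body: dict):
    if body.get("mode") == "recover":
        chaos.update(mode=None, duration=0, rate=0.0)     # cancel all chaos
    else:
        chaos.update(mode=body.get("mode"),
                     duration=body.get("duration", 0),
                     rate=body.get("rate", 0.0))
    return {"chaos": chaos}

Excluding /chaos from the slow and error paths means a recover request always gets through, even while the service is misbehaving.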

This is a real pattern in production engineering. Canary deployments let you route a small percentage of traffic to a new version and test it under real conditions before promoting it fully. The chaos endpoint simulates what happens when things go wrong.


Step 4: The CLI Tool

The swiftdeploy script is a Python executable with five subcommands. Here is how each one works.
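
Before walking through them, here is a rough sketch of how the subcommand wiring could look with argparse; the dispatch structure is an assumption, not a copy of the actual script:

import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="swiftdeploy")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("init")        # render templates from manifest.yaml
    sub.add_parser("validate")    # run the pre-flight checks
    sub.add_parser("deploy")      # init + build + up + health check
    promote = sub.add_parser("promote")
    promote.add_argument("mode", choices=["stable", "canary"])
    teardown = sub.add_parser("teardown")
    teardown.add_argument("--clean", action="store_true")
    return parser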

init

Reads manifest.yaml, builds a template context dictionary, and renders both Jinja2 templates to disk:

def cmd_init(args):
    manifest = load_manifest()
    context = get_template_context(manifest)

    nginx_content = render_template("nginx.conf.j2", context)
    with open("nginx.conf", "w") as f:
        f.write(nginx_content)

    compose_content = render_template("docker-compose.yml.j2", context)
    with open("docker-compose.yml", "w") as f:
        f.write(compose_content)
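
The load_manifest and render_template helpers are not shown above; assuming PyYAML and Jinja2, they plausibly look something like this:

import yaml
from jinja2 import Environment, FileSystemLoader

TEMPLATES_DIR = "templates"

def load_manifest(path="manifest.yaml"):
    with open(path) as f:
        return yaml.safe_load(f)  # the manifest becomes a nested dict

def render_template(name, context):
    env = Environment(loader=FileSystemLoader(TEMPLATES_DIR))
    return env.get_template(name).render(**context)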

validate

Runs five pre-flight checks before any deployment and exits with a non-zero code if any check fails, so the CI/CD pipeline or an operator knows something is wrong before containers start:

# Check 1: manifest.yaml exists and is valid YAML
# Check 2: all required fields present and non-empty
# Check 3: Docker image exists locally
# Check 4: nginx port not already bound by another process
# Check 5: generated nginx.conf is syntactically valid

For Check 5, nginx syntax validation is done by running the nginx Docker image with the generated config mounted in. A quirk here: the upstream hostname swiftdeploy-app does not exist outside of Docker networking, so the validator temporarily substitutes 127.0.0.1 in the config just for the syntax check. The actual config is unchanged.
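
One plausible implementation of Check 5, following that description; the temp-file path and the exact docker invocation are my assumptions:

import subprocess

def nginx_syntax_ok(conf_path="nginx.conf"):
    # Swap the Docker-only upstream hostname for a resolvable address,
    # purely for the syntax check; the file on disk is left untouched.
    with open(conf_path) as f:
        conf = f.read().replace("swiftdeploy-app", "127.0.0.1")
    with open("/tmp/nginx-check.conf", "w") as f:
        f.write(conf)
    result = subprocess.run(
        ["docker", "run", "--rm",
         "-v", "/tmp/nginx-check.conf:/etc/nginx/conf.d/default.conf:ro",
         "nginx:latest", "nginx", "-t"],
        capture_output=True,
    )
    return result.returncode == 0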

deploy

Chains everything together:

def cmd_deploy(args):
    cmd_init(args)                              # generate configs
    run(["docker", "build", "-t", image, "app/"])  # build image
    run(["docker", "compose", "up", "-d"])          # start stack
    healthy = wait_for_health(nginx_port)           # poll health
    if not healthy:
        sys.exit(1)

The health check polls /healthz through Nginx every 2 seconds for up to 60 seconds. If the service never becomes healthy within that window, deploy exits with a non-zero code.
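
A sketch of that polling loop, using the requests library installed later in the setup step (assumed implementation):

import time
import requests

def wait_for_health(port, timeout=60, interval=2):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            r = requests.get(f"http://localhost:{port}/healthz", timeout=5)
            if r.status_code == 200:
                return True    # service answered through Nginx
        except requests.RequestException:
            pass               # stack still coming up; keep polling
        time.sleep(interval)
    return False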

promote

This is the most interesting subcommand. It switches deployment mode with a rolling restart:

def cmd_promote(args):
    manifest["services"]["mode"] = target_mode
    save_manifest(manifest)                    # update manifest.yaml in-place
    render_compose_template(manifest)          # regenerate docker-compose.yml
    run(["docker", "compose", "up", "-d",
         "--no-deps", "--force-recreate",
         SERVICE_NAME])                        # restart service only, not nginx
    healthy = wait_for_health(nginx_port)
    if not healthy:
        rollback()                             # automatic rollback on failure

The key flags are --no-deps and --force-recreate. Together they restart only the app container without touching Nginx. Traffic continues flowing through the existing Nginx instance during the restart, so there is no downtime.

If the health check fails after the restart, the promote command automatically rolls back to the previous mode, updates the manifest, and restarts the container again.
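
The rollback itself could look something like this, reusing the helpers already visible in the promote snippet (a hedged sketch, not the repository's exact code):

def rollback(previous_mode):
    # Revert the manifest, regenerate the compose file, and recreate the
    # service container: the forward promote path, run in reverse.
    manifest = load_manifest()
    manifest["services"]["mode"] = previous_mode
    save_manifest(manifest)
    render_compose_template(manifest)
    run(["docker", "compose", "up", "-d",
         "--no-deps", "--force-recreate", SERVICE_NAME])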

teardown

def cmd_teardown(args):
    run(["docker", "compose", "down", "--volumes", "--remove-orphans"])
    if args.clean:
        os.remove("nginx.conf")
        os.remove("docker-compose.yml")

The --clean flag removes the generated files. This resets the project to its source state, where only manifest.yaml and the templates exist.


Step 5: Setting Up and Running

I pushed the code to GitHub and cloned it on the same Oracle Cloud server I have been using throughout the internship:

git clone https://github.com/GideonBature/hng-stage4a.git
cd hng-stage4a
pip3 install pyyaml jinja2 requests
chmod +x swiftdeploy

Then deployed:

./swiftdeploy deploy

Output:

Running deploy...
Running init...
  Generated nginx.conf
  Generated docker-compose.yml
Init complete.
Building image swift-deploy-1-node:latest...
Image built successfully.
Starting stack...
Waiting for health checks on port 8080...
Stack is healthy. Service available at http://localhost:8080

Verified the endpoints:

curl http://localhost:8080/
# {"message":"Welcome to SwiftDeploy service","mode":"stable","version":"1.0.0","timestamp":"..."}

curl http://localhost:8080/healthz
# {"status":"ok","mode":"stable","version":"1.0.0","uptime_seconds":70}

curl -I http://localhost:8080/
# X-Deployed-By: swiftdeploy

The Problems I Hit

The healthcheck relied on wget, which Alpine ships only as a stripped-down busybox applet. The Dockerfile healthcheck used wget --spider, but the busybox wget behaved differently inside the container. The fix was switching to Python's built-in urllib.request:

HEALTHCHECK --interval=10s --timeout=5s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:3000/healthz')" || exit 1

The same fix had to go into the Docker Compose template as well, since a Compose healthcheck overrides the one in the Dockerfile.

The generated nginx.conf contained double semicolons. Jinja2's trim_blocks=True setting was collapsing newlines after block tags in the nginx template, producing invalid config full of ;; and };. The fix was removing trim_blocks and lstrip_blocks from the Jinja2 environment:

env = Environment(
    loader=FileSystemLoader(TEMPLATES_DIR),
    # removed trim_blocks and lstrip_blocks
)

The nginx access log was not being written to a file. The official nginx Docker image symlinks access.log to /dev/stdout, so logs stream to Docker's log buffer rather than to a file on disk. To see the access logs you use:

docker compose logs nginx | grep "GET\|POST"

Which produces the required format:

2026-05-04T03:14:59+00:00 | 200 | 0.001 s | 172.18.0.2:3000 | GET / HTTP/1.1

The log_format directive was inside the server block. Nginx requires log_format to be defined at the http block level, not inside a server block. Since our template file is loaded as a conf.d include, everything in it is already inside the http block. Moving log_format to the top of the template file fixed it.


Testing promote

# Switch to canary
./swiftdeploy promote canary

Output:

Promoting to canary mode...
  Updated manifest.yaml: mode = canary
  Regenerated docker-compose.yml
  Restarting swiftdeploy-app container...
Promotion successful. Service is running in canary mode.
  /healthz response: {'status': 'ok', 'mode': 'canary', 'version': '1.0.0', 'uptime_seconds': 0}

Confirmed canary headers:

curl -I http://localhost:8080/
# x-mode: canary
# X-Deployed-By: swiftdeploy

Switch back:

./swiftdeploy promote stable
# Promotion successful. Service is running in stable mode.

The Big Picture

Stage 4a took the infrastructure-as-code pattern one step further, into infrastructure from code. Instead of writing config files by hand and hoping they stay consistent, you write one source of truth and generate everything else programmatically.

What we built, and why it matters:

Declarative manifest: a single place to change anything about the deployment.
Jinja2 templates: generated configs are always consistent with the manifest.
validate subcommand: catches problems before containers start, not after.
Rolling promote: mode switches with zero downtime and automatic rollback.
Teardown with --clean: a full reset in one command, reproducible from scratch.

The grader's test is simple but revealing: delete the generated files, run swiftdeploy init, verify they came back correctly. If your tool is solid, the stack rebuilds itself. If your tool has a bug, it fails in a way that is immediately clear.

That is what real infrastructure tooling looks like. The config files are not the source of truth. The tool that generates them is.


Stage 4b is next. Follow along as I keep documenting the journey.

Find me on Dev.to | GitHub
