Most deployment tools ask you to configure infrastructure manually. This one writes it for you — and refuses to deploy if it is not safe.
The Problem I Set Out to Solve
Every time I deployed a new service I found myself doing the same things:
- Writing a Docker Compose file
- Writing an Nginx config
- Hoping both were consistent with each other
- Manually checking if the server had enough resources
- Deploying and hoping for the best
There had to be a better way. What if a single file described everything — and a tool generated all the configs, checked all the policies, and deployed the stack automatically?
That is what SwiftDeploy does.
What Is SwiftDeploy?
SwiftDeploy is a CLI tool built in Python that:
- Reads a single manifest.yaml file
- Generates nginx.conf and docker-compose.yml from templates
- Asks OPA (Open Policy Agent) if it is safe to deploy
- Brings up the stack and waits for health checks
- Lets you promote between stable and canary modes — but only if the canary is healthy
- Records every decision in an audit trail
- Shows you a live dashboard of what is happening
The manifest is the only file you ever edit. Everything else is generated.
Part 1 — The Design: A Tool That Writes Its Own Infrastructure
The Manifest
Here is what manifest.yaml looks like:
services:
  image: swift-deploy-1-node:latest
  port: 3000
  mode: stable
  version: v1
nginx:
  image: nginx:latest
  port: 8080
  proxy_timeout: 60
network:
  name: swiftdeploy-net
  driver_type: bridge
That is the entire configuration. One file. Everything else is derived from it.
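Because every generated file derives from this one manifest, it can help to check it before rendering anything. The following is a minimal sketch of such a check, assuming the manifest has already been parsed into a dict by a YAML loader. `REQUIRED_KEYS` and `validate_manifest` are hypothetical names for illustration, not part of SwiftDeploy.

```python
# Hypothetical helper: checks a parsed manifest dict for the keys shown above.
REQUIRED_KEYS = {
    "services": ("image", "port"),
    "nginx": ("image", "port"),
    "network": ("name",),
}
ALLOWED_MODES = ("stable", "canary")


def validate_manifest(manifest):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        block = manifest.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if key not in block:
                problems.append(f"missing key: {section}.{key}")
    services = manifest.get("services")
    if isinstance(services, dict):
        # mode is optional and defaults to "stable", as in the init command
        mode = services.get("mode", "stable")
        if mode not in ALLOWED_MODES:
            problems.append(f"unknown mode: {mode}")
    return problems


manifest = {
    "services": {"image": "swift-deploy-1-node:latest", "port": 3000, "mode": "stable"},
    "nginx": {"image": "nginx:latest", "port": 8080},
    "network": {"name": "swiftdeploy-net", "driver_type": "bridge"},
}
print(validate_manifest(manifest) or "manifest looks valid")
```

Returning a list of problems rather than raising on the first one lets a CLI report every mistake in a single run.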
The Templates
The init command reads the manifest and fills in template files:
def init():
    # Load the manifest, the single source of truth for every generated file
    manifest = load_manifest()
    with open("templates/docker-compose.yml.tpl", "r") as f:
        compose_tpl = f.read()
    # Substitute manifest values into the template placeholders
    compose_out = compose_tpl.replace("{{ app_image }}", manifest["services"]["image"])
    compose_out = compose_out.replace("{{ mode }}", manifest["services"].get("mode", "stable"))
    with open("docker-compose.yml", "w") as f:
        f.write(compose_out)
If you delete your configs, run init and you get the exact same stack back. No guessing. No inconsistency.
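The chained replace calls in init work for a handful of placeholders. As the templates grow, one possible alternative is a single pass that fills every `{{ name }}` placeholder and fails loudly on names the manifest does not provide. This is a sketch under that assumption. `render` and the sample template are illustrative, not SwiftDeploy's actual code.

```python
import re

# Matches placeholders such as {{ app_image }} or {{mode}}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, values):
    """Replace every {{ name }} placeholder; raise on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"template references unknown value: {name}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


template = "image: {{ app_image }}\nenvironment:\n  MODE: {{ mode }}\n"
values = {"app_image": "swift-deploy-1-node:latest", "mode": "stable"}
rendered = render(template, values)
print(rendered)
```

Rendering is a pure function of the template and the manifest values, which is what makes regenerating the configs repeatable.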
Why This Matters
In most projects configs drift over time. Someone edits docker-compose.yml directly. Someone else edits nginx.conf. After six months nobody knows what the source of truth is.
With SwiftDeploy the source of truth is always manifest.yaml. If it is not in the manifest it does not exist.
Part 2 — The Guardrails: Policy Enforcement with OPA
Why OPA?
I could have written the policy checks directly in Python. But the task required something more important — separation of concerns.
The CLI should not decide what is safe. That decision should live in a separate system that can be updated independently. That system is OPA — Open Policy Agent.
OPA runs as a separate container. The CLI sends data to OPA and OPA sends back a decision. The CLI just follows orders.
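The exchange with OPA can be sketched with the standard library alone. OPA's Data API accepts a POST to /v1/data/<package path> with a JSON body of the form {"input": ...} and answers with {"result": ...}. The base URL (port 8181 is OPA's default), the helper names, and the choice to treat a missing result as a block are assumptions for illustration, not SwiftDeploy's actual code.

```python
import json
import urllib.request


def build_opa_request(base_url, package, input_data):
    """Build a POST to OPA's Data API: /v1/data/<package> with {"input": ...}."""
    url = f"{base_url.rstrip('/')}/v1/data/{package}"
    body = json.dumps({"input": input_data}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_opa_response(raw_body):
    """Extract allow/reason from the {"result": {...}} envelope, failing closed."""
    result = json.loads(raw_body).get("result", {})
    return {
        # Anything other than an explicit true counts as a block
        "allow": result.get("allow", False) is True,
        "reason": result.get("reason", ""),
    }


def ask_opa(base_url, package, input_data, timeout=5):
    """Send the input to OPA and return its decision (needs a running OPA)."""
    request = build_opa_request(base_url, package, input_data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_opa_response(response.read())


# Offline demonstration of the two pure helpers
req = build_opa_request("http://localhost:8181", "infra", {"disk_free_gb": 5.2})
print(req.get_method(), req.full_url)
print(parse_opa_response(b'{"result": {"allow": false, "reason": "Disk space too low"}}'))
```

In the same spirit, a caller would probably want to treat a connection error or timeout from `ask_opa` as a blocked deployment rather than an allowed one.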
Infrastructure Policy
Before deploying, the CLI collects host statistics and sends them to OPA:
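import shutil

import psutil

def get_host_stats():
    disk = shutil.disk_usage("/")
    disk_free_gb = disk.free / (1024 ** 3)
    # A short interval avoids the meaningless 0.0 psutil returns on a first non-blocking call
    cpu_load = psutil.cpu_percent(interval=1) / 100
    return {
        "disk_free_gb": round(disk_free_gb, 2),
        "cpu_load": round(cpu_load, 2),
    }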
OPA evaluates the infrastructure policy:
package infra

default allow := false

allow := true if {
    input.disk_free_gb >= 10
    input.cpu_load <= 2.0
}

reason := "Disk space too low" if {
    input.disk_free_gb < 10
}
If free disk space is below 10 GB or the CPU load is above 2.0, the deployment is blocked:
Running pre-deploy policy check...
Disk free: 5.2GB | CPU: 0.3 | Memory: 45%
Infrastructure policy: BLOCKED
Reason: Disk space too low
Canary Safety Policy
Before promoting to canary mode, the CLI scrapes the /metrics endpoint and calculates the error rate and P99 latency:
def calc_error_rate(metrics):
    # Only count samples of the request counter, for both the total and the 5xx errors
    requests = {k: v for k, v in metrics.items() if k.startswith("http_requests_total")}
    total = sum(requests.values())
    errors = sum(v for k, v in requests.items() if 'status_code="5' in k)
    return round((errors / total) * 100, 2) if total > 0 else 0.0
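calc_error_rate expects a dict keyed by the full sample name including its labels. A minimal sketch of how a scraped /metrics body in Prometheus text format could be turned into that dict follows. It assumes samples carry no trailing timestamp, and `parse_metrics` is a hypothetical helper name.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition into {"name{labels}": value}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments
        if not line or line.startswith("#"):
            continue
        # The sample value is the last space-separated field on the line
        key, _, value = line.rpartition(" ")
        if not key:
            continue
        try:
            metrics[key.strip()] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return metrics


sample = """# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{status_code="200"} 98
http_requests_total{status_code="500"} 2
"""
metrics = parse_metrics(sample)
print(metrics)
```

Feeding this sample dict to calc_error_rate gives 2 errors out of 100 requests, an error rate of 2.0 percent, which is above the 1.0 percent canary threshold.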
OPA evaluates the canary safety policy:
package canary

default allow := false

allow := true if {
    input.error_rate <= 1.0
    input.p99_latency_ms <= 500
}

reason := "P99 latency too high (must be <= 500ms)" if {
    input.p99_latency_ms > 500
}
If the canary is unhealthy the promotion is blocked:
Running pre-promote policy check...
Error rate: 0.0% | P99 latency: 2100.0ms
Canary safety policy: BLOCKED
Reason: P99 latency too high (must be <= 500ms)
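The P99 value sent to the canary policy has to be derived from the scraped metrics, and the article does not show how SwiftDeploy does it. One common approach, sketched here as an assumption, is to read a Prometheus histogram and take the upper bound of the first bucket that covers 99 percent of requests. The metric name `http_request_duration_seconds_bucket` and the function `p99_latency_ms` are hypothetical. This is a coarse upper-bound estimate, not an interpolated quantile.

```python
import re

# Extracts the bucket upper bound from a label such as le="0.5" or le="+Inf"
LE_LABEL = re.compile(r'le="([^"]+)"')


def p99_latency_ms(metrics, prefix="http_request_duration_seconds_bucket"):
    """Estimate P99 in milliseconds from cumulative histogram buckets."""
    by_bound = {}
    for key, count in metrics.items():
        match = LE_LABEL.search(key)
        if key.startswith(prefix) and match:
            bound = float(match.group(1))  # float("+Inf") is infinity
            # Sum buckets that share a bound across other label sets
            by_bound[bound] = by_bound.get(bound, 0.0) + count
    if not by_bound:
        return 0.0
    buckets = sorted(by_bound.items())
    total = buckets[-1][1]  # the largest bucket holds the total count
    if total <= 0:
        return 0.0
    target = 0.99 * total
    for upper_bound, cumulative in buckets:
        if cumulative >= target:
            return upper_bound * 1000  # seconds to milliseconds
    return float("inf")


healthy = {
    'http_request_duration_seconds_bucket{le="0.1"}': 99,
    'http_request_duration_seconds_bucket{le="0.5"}': 100,
    'http_request_duration_seconds_bucket{le="2.5"}': 100,
    'http_request_duration_seconds_bucket{le="+Inf"}': 100,
}
slow = {
    'http_request_duration_seconds_bucket{le="0.1"}': 0,
    'http_request_duration_seconds_bucket{le="0.5"}': 0,
    'http_request_duration_seconds_bucket{le="2.5"}': 50,
    'http_request_duration_seconds_bucket{le="+Inf"}': 50,
}
print(p99_latency_ms(healthy), p99_latency_ms(slow))
```

Against the 500 ms threshold in the canary policy, the first sample would pass and the second would be blocked.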
Why Isolation Matters
OPA runs as a separate container and is only reachable by the CLI — not through Nginx. This means:
- No external actor can query or manipulate policy decisions
- Policies can be updated without touching the CLI code
- Each domain (infrastructure, canary) owns exactly one question
Part 3 — The Chaos: What Happened When Things Broke
Injecting Slow Chaos
The API exposes a /chaos endpoint that simulates degraded behaviour:
curl -X POST http://localhost:8080/chaos \
  -H "Content-Type: application/json" \
  -d '{"mode": "slow", "duration": 2}'
This makes every request sleep for 2 seconds before responding. The metrics immediately reflect the change — P99 latency spikes.
The Status View Catches It
Running swiftdeploy status shows the live state:
--- Scrape @ Fri May 15 12:38:05 2026 ---
Mode: canary
Uptime: 115s
Error rate: 0.0%
P99 latency: 2100.0ms
Chaos: active
Policy Compliance:
Infrastructure: PASS
Canary safety: FAIL - P99 latency too high
The Promotion Is Blocked
When I tried to promote:
Running pre-promote policy check...
Error rate: 0.0% | P99 latency: 2100.0ms
Canary safety policy: BLOCKED
Reason: P99 latency too high (must be <= 500ms)
The system worked exactly as designed. The broken canary could not be promoted.
Recovery
curl -X POST http://localhost:8080/chaos \
  -H "Content-Type: application/json" \
  -d '{"mode": "recover"}'
Latency dropped back to normal and the next promote attempt passed.
Part 4 — The Audit Trail
Every action is recorded in history.jsonl:
{"event": "deploy", "status": "success", "timestamp": 1778794519.2}
{"event": "pre_promote_check", "result": {"allow": false, "reason": "P99 latency too high"}, "timestamp": 1778799306.5}
{"event": "promote", "mode": "canary", "status": "success", "timestamp": 1778799535.0}
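An append-only JSON Lines file like this is simple to produce. The sketch below shows one way to write it, assuming one JSON object per line and a Unix timestamp per entry as in the sample above. `format_event` and `record_event` are hypothetical names, not SwiftDeploy's actual functions.

```python
import json
import time


def format_event(event, timestamp, **fields):
    """Serialize one audit entry as a single JSON line (without the newline)."""
    entry = {"event": event, **fields, "timestamp": round(timestamp, 1)}
    return json.dumps(entry)


def record_event(path, event, **fields):
    """Append the entry to the history file; earlier lines are never rewritten."""
    line = format_event(event, time.time(), **fields)
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    return line


# Formatting only, so the demonstration does not touch the filesystem
print(format_event("deploy", 1778794519.2, status="success"))
print(format_event("promote", 1778799535.0, mode="canary", status="success"))
```

Opening the file in append mode means a crash mid-run can lose at most the entry being written, never the history before it.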
Running swiftdeploy audit generates audit_report.md:
## Timeline
| Time | Event | Details |
|---|---|---|
| Fri May 15 12:36:48 | deploy | status=success |
| Fri May 15 12:40:17 | pre_promote_check | BLOCKED reason=P99 latency too high |
| Fri May 15 12:44:50 | promote | mode=canary status=success |
## Policy Violations
| Time | Check | Reason |
|---|---|---|
| Fri May 15 12:40:17 | pre_promote_check | P99 latency too high |
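Turning the history into a report like this is mostly a matter of reading the lines back and formatting rows. The sketch below assumes the entry shapes shown in history.jsonl above. `build_report` and its helpers are hypothetical names, the PASSED wording for an allowed check is an assumption, and times are rendered in UTC here to keep the example deterministic.

```python
import json
import time


def format_time(timestamp):
    """Render a Unix timestamp as e.g. 'Thu Jan 01 00:00:00' (UTC)."""
    return time.strftime("%a %b %d %H:%M:%S", time.gmtime(timestamp))


def is_violation(entry):
    """A policy check whose result did not allow the action."""
    result = entry.get("result")
    return isinstance(result, dict) and not result.get("allow")


def describe(entry):
    """Build the Details column for one timeline row."""
    result = entry.get("result")
    if isinstance(result, dict):
        if result.get("allow"):
            return "PASSED"  # assumed wording for an allowed check
        return f"BLOCKED reason={result.get('reason', '')}"
    extras = {k: v for k, v in entry.items() if k not in ("event", "timestamp")}
    return " ".join(f"{k}={v}" for k, v in extras.items())


def build_report(jsonl_text):
    """Convert JSON Lines history into a Markdown report string."""
    entries = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    lines = ["## Timeline", "| Time | Event | Details |", "|---|---|---|"]
    for e in entries:
        lines.append(f"| {format_time(e['timestamp'])} | {e['event']} | {describe(e)} |")
    lines += ["## Policy Violations", "| Time | Check | Reason |", "|---|---|---|"]
    for e in filter(is_violation, entries):
        reason = e["result"].get("reason", "")
        lines.append(f"| {format_time(e['timestamp'])} | {e['event']} | {reason} |")
    return "\n".join(lines) + "\n"


history = "\n".join([
    '{"event": "deploy", "status": "success", "timestamp": 0}',
    '{"event": "pre_promote_check", "result": {"allow": false, "reason": "P99 latency too high"}, "timestamp": 60}',
    '{"event": "promote", "mode": "canary", "status": "success", "timestamp": 120}',
])
report = build_report(history)
print(report)
```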
You can always answer the question "what happened and when" with a single command.
Lessons Learned
1. Declarative infrastructure is worth the investment
Writing templates takes time upfront but saves enormous time later. When something breaks you regenerate from the manifest and you know the configs are correct.
2. Policies should be external
Keeping policy logic in OPA means you can update thresholds without touching the CLI code. Many production systems separate policy from application code in the same way.
3. Metrics drive decisions — not just monitoring
I used to think metrics were for dashboards. Now I use them to gate deployments. If the canary is unhealthy the metrics prove it and the policy enforces the consequence.
4. Audit trails matter more than you think
During debugging I could look at history.jsonl and see exactly what happened and in what order. Without it I would have been guessing.
5. The CLI is just an orchestrator
SwiftDeploy does not make decisions. It collects data, asks OPA, and follows the answer. This separation makes the system trustworthy and testable.
The Final Result
A complete declarative deployment system that:
- Generates infrastructure from a single manifest
- Validates pre-flight conditions before deploying
- Enforces infrastructure and canary safety policies via OPA
- Tracks metrics in Prometheus format
- Shows a live dashboard of system state and policy compliance
- Records every decision in a structured audit trail
- Generates a clean audit report in GitHub-flavored Markdown
Full source code: https://github.com/asanteedith/swiftdeploy-project
Written by Edith Asante, Cloud & DevOps Engineer