Building swiftdeploy: A Policy-Gated Deployment CLI

In a world of "one-click" managed solutions, it's easy to let the underlying mechanics of infrastructure become a mystery. But for those of us who want production-grade control and a deep understanding of the "why" behind the "how", managed platforms can feel like a black box: magical until something breaks at 2am and you have no mental model of what's actually happening under the hood.
So I built swiftdeploy — a deployment CLI that does the same job as those platforms, but entirely in code I wrote, understand, and can reason about at any layer.
This is the story of how it works, why I made the choices I did, and the one technical problem that turned out to be far more interesting than I expected.
The Philosophy: Own Your Abstractions
There's a version of this project where I reach for an existing solution. Argo CD, Flux — both excellent tools. But using them at this stage would have given me a working deployment pipeline and almost no understanding of what a deployment pipeline actually is.
The constraint I set myself was simple: if I can't explain exactly what happens between typing a command and traffic reaching my app, I don't get to use that tool.
This meant writing my own template engine around Jinja2, my own nginx config generator, my own health-check loop, and eventually my own policy engine integration. Every layer I owned became a layer I understood. That understanding compounds.
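To give a flavour of what owning the templating layer means, here is a minimal sketch of rendering an nginx config with Jinja2. The template path and manifest fields are placeholders for illustration, not swiftdeploy's actual schema:

```python
# Minimal sketch: render nginx.conf from a manifest dict via Jinja2.
# "templates/nginx.conf.j2" and the manifest keys are hypothetical placeholders.
from pathlib import Path
from jinja2 import Environment, FileSystemLoader, StrictUndefined

def render_nginx_conf(manifest: dict, out_path: str = "nginx.conf") -> None:
    env = Environment(
        loader=FileSystemLoader("templates"),
        undefined=StrictUndefined,  # fail loudly if the manifest is missing a key
    )
    Path(out_path).write_text(env.get_template("nginx.conf.j2").render(**manifest))

render_nginx_conf({"app_name": "web", "listen_port": 80, "upstream_port": 8000})
```

The point is less the dozen lines of code and more that every failure mode — a missing key, a bad template — is now something you can see and debug directly.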
The engineering philosophy here isn't "reinvent everything." It's reinvent the things that teach you something. Deployment orchestration teaches you an enormous amount about networking, process lifecycle, and operational trust. That's worth the friction.
What swiftdeploy Actually Does
At its core, swiftdeploy is a CLI that manages a Docker Compose stack with a twist: nothing happens without a policy check first.

swiftdeploy init     # generate nginx.conf + docker-compose.yaml from your manifest
swiftdeploy deploy   # policy check → start stack → health check loop
swiftdeploy promote  # scrape metrics → policy check → switch canary/stable mode
swiftdeploy status   # live terminal dashboard with real-time policy compliance
swiftdeploy audit    # parse history.jsonl → generate audit_report.md
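To make the deploy step's health-check loop concrete, here is a minimal sketch of the kind of polling involved; the /healthz path and the timings are assumptions, not swiftdeploy's actual defaults:

```python
# Sketch of a post-deploy health-check loop: poll an HTTP endpoint until it
# answers 200 or a deadline passes. URL and timings are illustrative only.
import time
import urllib.error
import urllib.request

def wait_until_healthy(url: str = "http://127.0.0.1/healthz",
                       timeout_s: float = 60.0,
                       interval_s: float = 2.0) -> bool:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # the stack may still be starting; keep polling
        time.sleep(interval_s)
    return False
```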

Every deploy is gated. Every promotion is evidence-based. Every decision is logged.
The Policy Sidecar: OPA as the Brain
The most deliberate architectural choice in this project is that the CLI never makes allow/deny decisions itself. All decision logic lives in Open Policy Agent, running as a sidecar container.
The CLI's job is to collect facts and ask questions. OPA's job is to answer them.
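In practice that conversation is a single HTTP call to OPA's standard Data API: POST the facts as `input`, get a `result` document back. The policy path and input fields below are illustrative, but the request/response shape is how OPA actually works:

```python
# Sketch of the CLI asking OPA a question via the Data API.
# The policy path ("swiftdeploy/deploy") and input fields are hypothetical.
import json
import urllib.request

OPA_URL = "http://127.0.0.1:8181"  # loopback-bound host port, see below

def ask_opa(policy_path: str, facts: dict) -> dict:
    req = urllib.request.Request(
        f"{OPA_URL}/v1/data/{policy_path}",
        data=json.dumps({"input": facts}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp).get("result", {})

decision = ask_opa("swiftdeploy/deploy", {"image_tag": "web:1.4.2", "replicas": 2})
if not decision.get("allow", False):
    raise SystemExit(f"blocked by policy: {decision.get('violations', [])}")
```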
The Architecture


Here's something that initially seems like a minor detail but is actually load-bearing: nginx must never be able to reach OPA.
If nginx could reach the policy engine, a malformed request from the internet could theoretically influence policy evaluation. The trust boundary would be blurred. So OPA runs on an internal-only Docker network.
The CLI reaches OPA via a loopback-bound host port. Only processes running directly on the host can touch it. A container can't reach a loopback port on the host by default.
This isn't theoretical hardening. It's a concrete, verifiable guarantee that public traffic can never trigger a policy evaluation.
The Most Interesting Technical Challenge: P99 From a Histogram
The canary promotion gate blocks you from promoting if P99 latency exceeds 500ms. Simple requirement. Surprisingly interesting implementation.
Prometheus doesn't give you a P99 directly. It gives you a histogram — a set of cumulative bucket counts with upper bounds. To get a percentile, you have to interpolate across those buckets.
The naive approach — just read the bucket that contains the 99th percentile — gives you a ceiling, not a value. If the 99th-percentile observation falls somewhere inside the 0.25s bucket, you'd report 250ms regardless of whether the true value was 80ms or 249ms.
The correct approach is linear interpolation within the containing bucket. Find where the target rank (99% × total count) falls, identify which bucket it lands in, then interpolate based on how far through that bucket's count range you are.
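Here is a minimal sketch of that interpolation, assuming the histogram has already been parsed into (upper_bound, cumulative_count) pairs ending with the +Inf bucket; the helper is mine, not swiftdeploy's internals:

```python
# Estimate a quantile from Prometheus-style cumulative histogram buckets:
# sorted (upper_bound_seconds, cumulative_count) pairs, +Inf bucket last.
import math

def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    total = buckets[-1][1]            # the +Inf bucket holds the total count
    if total == 0:
        return float("nan")
    rank = q * total                  # e.g. 0.99 * total for the P99
    prev_bound, prev_cum = 0.0, 0.0
    for bound, cum in buckets:
        if cum >= rank:
            in_bucket = cum - prev_cum
            if math.isinf(bound) or in_bucket == 0:
                return prev_bound     # can't interpolate past the last finite bound
            # Linear interpolation: how far through this bucket the rank falls.
            return prev_bound + (bound - prev_bound) * (rank - prev_cum) / in_bucket
        prev_bound, prev_cum = bound, cum
    return prev_bound

# Example: the 990th of 1000 observations lands inside the 0.25s bucket → ≈0.246s.
p99 = histogram_quantile(0.99, [(0.1, 800.0), (0.25, 995.0), (0.5, 999.0), (math.inf, 1000.0)])
```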
But there's a second problem: you can't just read a snapshot. Prometheus counters are cumulative and monotonically increasing. If you scrape once and see 1000 requests in the 0.25 bucket, that's every request since the process started — not the last 30 seconds.
The solution is to take two scrapes separated by a time window and compute deltas. The bucket counts in the second scrape minus the first give you the distribution for only that window. Then you run the interpolation on those deltas.
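A sketch of that windowing, with `scrape_buckets()` standing in for whatever parses the raw `/metrics` text — again an illustration, not swiftdeploy's actual code:

```python
# Turn two cumulative scrapes into a distribution for just the window between
# them, then reuse histogram_quantile() from the sketch above. scrape_buckets()
# is a hypothetical helper returning {upper_bound: cumulative_count}.
import time

def windowed_p99(scrape_buckets, window_s: float = 30.0) -> float:
    first = scrape_buckets()
    time.sleep(window_s)
    second = scrape_buckets()
    # Counters only ever increase, so the difference isolates this window's traffic.
    deltas = [(bound, second[bound] - first.get(bound, 0.0))
              for bound in sorted(second)]
    return histogram_quantile(0.99, deltas)
```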
This is what makes the promote gate actually meaningful: it's not checking the lifetime health of the service, it's checking the last 30 seconds of traffic before you ask it to carry more.
The Audit Trail
Every status scrape appends a JSON record to history.jsonl. Running swiftdeploy audit parses that file and produces a markdown report with metrics trends and a dedicated violations section.
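A minimal sketch of that append-and-replay pattern; the record fields and report layout here are illustrative, not swiftdeploy's exact format:

```python
# Append one JSON object per status scrape, then replay the file into markdown.
# Field names and report layout are illustrative only.
import json
from datetime import datetime, timezone
from pathlib import Path

HISTORY = Path("history.jsonl")

def record_status(p99_s: float, compliant: bool, violations: list[str]) -> None:
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "p99_seconds": p99_s,
        "compliant": compliant,
        "violations": violations,
    }
    with HISTORY.open("a") as f:
        f.write(json.dumps(entry) + "\n")     # one record per line, append-only

def write_audit_report(out: str = "audit_report.md") -> None:
    entries = [json.loads(line) for line in HISTORY.read_text().splitlines() if line.strip()]
    lines = ["# Audit Report", "", "| time | p99 (s) | compliant |", "|---|---|---|"]
    lines += [f"| {e['ts']} | {e['p99_seconds']:.3f} | {e['compliant']} |" for e in entries]
    lines += ["", "## Violations", ""]
    lines += [f"- {e['ts']}: {v}" for e in entries for v in e["violations"]] or ["- none"]
    Path(out).write_text("\n".join(lines) + "\n")
```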
The design principle here is that observability is not optional. The history file survives stack restarts. The audit report gives you a narrative of exactly what your system was doing and what the policy engine was saying about it at every point.
In Conclusion
This project taught me a lot of things that are usually hidden under abstractions. The metrics interpolation was the most satisfying problem: it looks like a small utility function, but it's actually the difference between a gate that measures something real and one that just performs measurement.
And the network isolation detail — the one that's easy to skip — is the one that determines whether your security model is real or decorative.
That's the thing about owning your abstractions. The interesting problems are hiding inside the details that pre-built platforms quietly handle for you.
