Building a Policy-Gated Deployment System with Observability (SwiftDeploy Stage 4B)

Edith Asante — Wed, 06 May 2026 19:59:23 +0000

Introduction

In Stage 4A, I built a CLI tool (swiftdeploy) that generates infrastructure from a single file (manifest.yaml).
In Stage 4B, I extended it to include:

Observability (metrics)
Policy enforcement (OPA)
Auditing (history + reports)

The goal was simple but strict:

The system must refuse to deploy or promote if it is unsafe.

This meant moving from just “running containers” to building a system that can think and decide before acting.

⸻

Architectural Overview

manifest.yaml

↓

swiftdeploy CLI

↓

docker-compose + nginx

↓

Docker Network

↓

[ NGINX ] → [ APP (/metrics) ]

At a high level:

manifest.yaml is the single source of truth
swiftdeploy CLI reads it and generates:
- docker-compose.yml
- nginx.conf
Docker runs:
- API service
- Nginx (reverse proxy)
- OPA (policy engine)

Decision flow:
CLI → collect data → send to OPA → receive decision → deploy or block

The Design: A Tool That Writes Its Own Infrastructure

The core idea was:

I don’t manually write configs — I generate them.

Instead of editing multiple files, I only update:
manifest.yaml

then:
python swiftdeploy.py init

This generates:

docker-compose.yml
nginx.conf

Why this matters

Reduces manual errors
Keeps configuration consistent
Makes the system reproducible

If I delete my generated configs, I can regenerate everything from the manifest.
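To make the generation step concrete, here is a minimal sketch of how a manifest could be turned into both files. It is not the actual swiftdeploy code: the manifest keys (`service`, `nginx`, `listen`), the image name, and the `render` helper are all hypothetical, and the manifest is shown as an already-parsed dictionary rather than being read from manifest.yaml.

```python
from string import Template

# Hypothetical, already-parsed manifest. The real tool reads manifest.yaml;
# these keys and values are assumptions made for illustration only.
manifest = {
    "service": {"name": "api", "image": "swiftdeploy-api:latest", "port": 3000},
    "nginx": {"listen": 8080},
}

COMPOSE_TEMPLATE = Template("""\
services:
  $name:
    image: $image
    expose:
      - "$port"
  nginx:
    image: nginx:alpine
    ports:
      - "$listen:$listen"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - $name
""")

NGINX_TEMPLATE = Template("""\
events {}
http {
  server {
    listen $listen;
    location / {
      proxy_pass http://$name:$port;
    }
  }
}
""")


def render(manifest):
    """Return a mapping of file name to generated file content."""
    values = {
        "name": manifest["service"]["name"],
        "image": manifest["service"]["image"],
        "port": manifest["service"]["port"],
        "listen": manifest["nginx"]["listen"],
    }
    return {
        "docker-compose.yml": COMPOSE_TEMPLATE.substitute(values),
        "nginx.conf": NGINX_TEMPLATE.substitute(values),
    }


if __name__ == "__main__":
    for filename, content in render(manifest).items():
        print(f"--- {filename} ---")
        print(content)
```

Because every value comes from one dictionary, the two files cannot drift apart: changing the port in the manifest changes it in both outputs.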

Observability: Adding the “Eyes” (/metrics)

I added a /metrics endpoint to the API in Prometheus format.

It tracks:

Throughput & Errors
- http_requests_total{method, path, status_code}
Latency
- http_request_duration_seconds_bucket
Application State
- app_uptime_seconds
- app_mode (0=stable, 1=canary)
- chaos_active
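As an illustration of what the endpoint returns, the sketch below renders the request counter and the three application-state gauges in the Prometheus text format. The function names and the in-memory counter are assumptions, and the latency histogram is left out to keep the example short.

```python
from collections import Counter

# In-memory request counter keyed by (method, path, status_code).
# Hypothetical storage; a real service might use a metrics library instead.
requests_total = Counter()


def record_request(method, path, status_code):
    """Count one finished request."""
    requests_total[(method, path, str(status_code))] += 1


def render_metrics(uptime_seconds, mode, chaos_active):
    """Render counters and gauges as Prometheus text exposition lines."""
    lines = ["# TYPE http_requests_total counter"]
    for (method, path, status), count in sorted(requests_total.items()):
        lines.append(
            f'http_requests_total{{method="{method}",path="{path}",'
            f'status_code="{status}"}} {count}'
        )
    lines += [
        "# TYPE app_uptime_seconds gauge",
        f"app_uptime_seconds {uptime_seconds}",
        "# TYPE app_mode gauge",
        f"app_mode {1 if mode == 'canary' else 0}",  # 0=stable, 1=canary
        "# TYPE chaos_active gauge",
        f"chaos_active {1 if chaos_active else 0}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    record_request("GET", "/", 200)
    record_request("GET", "/", 500)
    print(render_metrics(uptime_seconds=12.5, mode="canary", chaos_active=False))
```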

The Guardrails: Policy Enforcement with OPA

Instead of writing the decision logic inside the CLI, I used Open Policy Agent (OPA).

Key Rule:

The CLI must NOT decide anything — OPA decides everything.

🔹 Infrastructure Policy (Pre-Deploy)

Checks:

Disk space
CPU load

Example rules:
Deny if disk_free < 10GB
Deny if cpu_load > 2.0
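The CLI side of this check could look roughly like the sketch below: it only gathers facts and forwards them, leaving the thresholds to the policy. The request shape (`{"input": ...}` posted to OPA's `/v1/data/...` endpoint) follows OPA's REST API, but the address, the policy path `swiftdeploy/infra`, and the field names `disk_free_gb` and `cpu_load` are assumptions, not taken from the project.

```python
import json
import os
import shutil
import urllib.request

# Assumed address and policy path; adjust to wherever OPA actually listens.
OPA_URL = "http://localhost:8181/v1/data/swiftdeploy/infra"


def collect_host_facts(path="/"):
    """Gather the raw facts the policy needs. No thresholds are applied here."""
    usage = shutil.disk_usage(path)
    load_1m = os.getloadavg()[0]  # 1-minute load average (Unix only)
    return {
        "disk_free_gb": round(usage.free / 1024**3, 2),
        "cpu_load": round(load_1m, 2),
    }


def build_opa_request(facts, url=OPA_URL):
    """Wrap the facts in OPA's {"input": ...} envelope as a POST request."""
    body = json.dumps({"input": facts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_opa(facts):
    """Send the facts to OPA and return the policy document under "result"."""
    with urllib.request.urlopen(build_opa_request(facts), timeout=5) as resp:
        return json.load(resp).get("result", {})


if __name__ == "__main__":
    print(json.dumps(collect_host_facts(), indent=2))
```

Keeping `collect_host_facts` free of comparisons is what keeps the CLI from deciding anything: changing a threshold means editing the policy, not the tool.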

If I artificially reduce the free disk space, the deployment is refused:

BLOCKED: Disk below threshold

👉 This satisfies the Hard Gate requirement

⸻

🔹 Canary Safety Policy (Pre-Promote)

Before promoting, the CLI:

Scrapes /metrics
Calculates:
- Error rate
- P99 latency
Sends to OPA

Policy:
Deny if error_rate > 1%

Deny if p99_latency > 500ms
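The two numbers can be derived from the scraped text alone. The sketch below is one way to do it, under a few assumptions: an "error" means a 5xx status code, the P99 is reported as the upper bound of the first histogram bucket that covers 99% of requests (a coarser estimate than Prometheus's interpolated `histogram_quantile`), and the output field names are hypothetical.

```python
import re

SAMPLE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Parse Prometheus text lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL_RE.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


def error_rate(samples):
    """Fraction of requests with a 5xx status code (assumed definition)."""
    total = errors = 0.0
    for name, labels, value in samples:
        if name == "http_requests_total":
            total += value
            if labels.get("status_code", "").startswith("5"):
                errors += value
    return errors / total if total else 0.0


def p99_latency_seconds(samples, quantile=0.99):
    """Upper bound of the first cumulative bucket reaching the quantile."""
    buckets = {}
    for name, labels, value in samples:
        if name == "http_request_duration_seconds_bucket":
            bound = float(labels["le"])  # "+Inf" parses to infinity
            buckets[bound] = buckets.get(bound, 0.0) + value
    total = max(buckets.values(), default=0.0)  # equals the +Inf bucket count
    if total == 0:
        return 0.0
    for bound in sorted(buckets):
        if buckets[bound] >= quantile * total:
            return bound
    return float("inf")


def canary_input(metrics_text):
    """Build the (hypothetical) input document sent to the canary policy."""
    samples = parse_samples(metrics_text)
    return {
        "error_rate_percent": error_rate(samples) * 100,
        "p99_latency_ms": p99_latency_seconds(samples) * 1000,
    }
```

Note that the units are converted before sending, so the policy can compare directly against 1% and 500ms.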

⸻

Why Isolation Matters

OPA runs as a separate container and:

Is reachable by the CLI
Is NOT exposed through Nginx

👉 This ensures:

No external access to policy engine
Clear separation of responsibilities

This satisfies the “No Leakage” requirement

⸻

🧪 The Chaos: Testing Failure Scenarios

I implemented a /chaos endpoint:

Modes:

slow → delays responses
error → randomly returns 500
recover → resets system

Example request body:

{ "mode": "slow", "duration": 2 }
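A small state object is enough to model these three modes. The sketch below is hypothetical: it assumes `duration` is the per-request delay in seconds for slow mode and that error mode fails half of the requests; neither detail is specified above.

```python
import random
import time


class ChaosState:
    """Holds the current chaos mode and applies it to each request."""

    def __init__(self):
        self.mode = None      # None, "slow", or "error"
        self.duration = 0.0   # assumed: delay in seconds for slow mode

    def configure(self, payload):
        """Apply a request body such as {"mode": "slow", "duration": 2}."""
        mode = payload.get("mode")
        if mode == "recover":
            self.mode, self.duration = None, 0.0
        elif mode in ("slow", "error"):
            self.mode = mode
            self.duration = float(payload.get("duration", 0))
        else:
            raise ValueError(f"unknown chaos mode: {mode!r}")

    @property
    def active(self):
        """True while any chaos mode is switched on (feeds chaos_active)."""
        return self.mode is not None

    def apply(self, sleep=time.sleep, rng=random.random, error_probability=0.5):
        """Return the HTTP status code to send for the current request."""
        if self.mode == "slow":
            sleep(self.duration)
        elif self.mode == "error" and rng() < error_probability:
            return 500
        return 200
```

Passing `sleep` and `rng` as parameters keeps the behaviour testable without real delays or real randomness.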

What Happened

When I injected chaos:

Latency increased
Error rate increased
Metrics reflected the change

When I tried to promote:
BLOCKED: Latency too high

👉 This confirmed:
The system reacts to real runtime conditions, not assumptions

⸻

The Eyes: swiftdeploy status

The status command:
python swiftdeploy.py status

It:

Continuously scrapes /metrics
Displays live system state
Logs everything to history.jsonl
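JSON Lines suits this kind of log because each event is one self-contained line that can be appended without rewriting the file. A minimal sketch of the append step, with a hypothetical record shape (`ts`, `event`, plus free-form fields):

```python
import json
import time


def append_history(path, event, **fields):
    """Append one event as a single JSON line and return the record."""
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),  # UTC timestamp
        "event": event,
        **fields,
    }
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    append_history("history.jsonl", "status", error_rate_percent=0.0, chaos_active=0)
```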

The Memory: Audit System

From the logs in history.jsonl, I generate a report with:

python swiftdeploy.py audit

This creates:
audit_report.md

Contents:

Timeline of events
Policy violations

👉 The report renders cleanly in GitHub Markdown
(Satisfies submission requirement)
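Turning the log into a report is mostly string formatting. The sketch below shows one possible shape, assuming each history record carries `ts`, `event`, and `detail` keys and that violations are logged with the event name `policy_violation`; those names are illustrative, not the project's actual schema.

```python
import json


def load_history(lines):
    """Parse JSON Lines input, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def render_audit(records):
    """Render a timeline table and a violations list as GitHub Markdown."""
    out = [
        "# Audit Report",
        "",
        "## Timeline",
        "",
        "| Time | Event | Detail |",
        "| --- | --- | --- |",
    ]
    for record in records:
        out.append(
            f"| {record.get('ts', '')} | {record.get('event', '')} "
            f"| {record.get('detail', '')} |"
        )
    violations = [r for r in records if r.get("event") == "policy_violation"]
    out += ["", "## Policy Violations", ""]
    if violations:
        out += [f"- {r.get('ts', '')}: {r.get('detail', '')}" for r in violations]
    else:
        out.append("No violations recorded.")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = [
        '{"ts": "2026-05-06T19:00:00Z", "event": "deploy", "detail": "allowed"}',
        '{"ts": "2026-05-06T19:05:00Z", "event": "policy_violation", "detail": "Latency too high"}',
    ]
    print(render_audit(load_history(sample)))
```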

⸻

Lessons Learned

This stage changed how I think about DevOps:

Deployment is not just execution

It’s decision-making

⸻

Policies should be external

Keeping logic in OPA:

makes it reusable
avoids tightly coupled code

⸻

Metrics are not just for monitoring

They actively drive decisions

⸻

Debugging is part of the process

I faced:

YAML errors
Docker rebuild issues
Nginx misconfigurations
OPA connection failures

Fixing them helped me understand the system deeply.

⸻

✅ Final Checklist (Submission Criteria)

✔ manifest.yaml is the only edited file
✔ Deployment blocked when disk is low
✔ OPA not exposed via Nginx
✔ Metrics fully implemented
✔ Audit report generated and readable
✔ Blog includes architecture diagram

⸻

Conclusion

This project helped me move from:

running commands → building systems that enforce rules

I now better understand how:

observability
policy
infrastructure

work together in real-world systems.

⸻

If you’re learning DevOps, my biggest takeaway is:

Don’t just deploy — build systems that decide when deployment is safe.

DEV Community: Edith Asante