<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Edith Asante</title>
    <description>The latest articles on DEV Community by Edith Asante (@edith_asante_799bd09cf9c1).</description>
    <link>https://dev.to/edith_asante_799bd09cf9c1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3901913%2Ff989e0fc-e130-4ca5-a86b-35ae5199e0b8.png</url>
      <title>DEV Community: Edith Asante</title>
      <link>https://dev.to/edith_asante_799bd09cf9c1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/edith_asante_799bd09cf9c1"/>
    <language>en</language>
    <item>
      <title>Building a Policy-Gated Deployment System with Observability (SwiftDeploy Stage 4B)</title>
      <dc:creator>Edith Asante</dc:creator>
      <pubDate>Wed, 06 May 2026 19:59:23 +0000</pubDate>
      <link>https://dev.to/edith_asante_799bd09cf9c1/building-a-policy-gated-deployment-system-with-observability-swiftdeploy-stage-4b-4od2</link>
      <guid>https://dev.to/edith_asante_799bd09cf9c1/building-a-policy-gated-deployment-system-with-observability-swiftdeploy-stage-4b-4od2</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Stage 4A, I built a CLI tool (swiftdeploy) that generates infrastructure from a single file (manifest.yaml).&lt;br&gt;
In Stage 4B, I extended it to include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Observability (metrics)&lt;/li&gt;
&lt;li&gt;Policy enforcement (OPA)&lt;/li&gt;
&lt;li&gt;Auditing (history + reports)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal was simple but strict:&lt;/p&gt;

&lt;p&gt;The system must refuse to deploy or promote if it is unsafe.&lt;/p&gt;

&lt;p&gt;This meant moving from just “running containers” to building a system that can think and decide before acting.&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;Architectural Overview &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;manifest.yaml
      ↓
swiftdeploy CLI
      ↓
docker-compose + nginx
      ↓
Docker Network
      ↓
[ NGINX ] → [ APP (/metrics) ]
                    ↓
                 metrics
                    ↓
                   CLI
                    ↓
                   OPA
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At a high level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;manifest.yaml is the single source of truth&lt;/li&gt;
&lt;li&gt;swiftdeploy CLI reads it and generates:

&lt;ul&gt;
&lt;li&gt;docker-compose.yml&lt;/li&gt;
&lt;li&gt;nginx.conf&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Docker runs:

&lt;ul&gt;
&lt;li&gt;API service&lt;/li&gt;
&lt;li&gt;Nginx (reverse proxy)&lt;/li&gt;
&lt;li&gt;OPA (policy engine)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Flow:&lt;br&gt;
CLI → collect data → send to OPA → receive decision → deploy or block&lt;/p&gt;

&lt;p&gt;The Design: A Tool That Writes Its Own Infrastructure&lt;/p&gt;

&lt;p&gt;The core idea was:&lt;/p&gt;

&lt;p&gt;I don’t manually write configs — I generate them.&lt;/p&gt;

&lt;p&gt;Instead of editing multiple files, I only update:&lt;br&gt;
manifest.yaml&lt;/p&gt;

&lt;p&gt;Then I run:&lt;br&gt;
python swiftdeploy.py init&lt;/p&gt;

&lt;p&gt;This generates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;docker-compose.yml&lt;/li&gt;
&lt;li&gt;nginx.conf&lt;/li&gt;
&lt;/ul&gt;
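&lt;p&gt;To make the generation step concrete, here is a minimal sketch of how one manifest can drive both output files. The manifest keys (service_name, image, app_port, nginx_port) and the templates are assumptions for illustration, not the real manifest.yaml schema, and the manifest is a plain dict so the sketch needs no YAML library.&lt;/p&gt;

```python
from pathlib import Path
from string import Template

# Hypothetical manifest contents; the real manifest.yaml schema is not shown here.
MANIFEST = {
    "service_name": "api",
    "image": "swiftdeploy-api:latest",
    "app_port": 3000,
    "nginx_port": 8080,
}

# Illustrative templates only; a real setup would carry more settings.
COMPOSE_TEMPLATE = Template("""\
services:
  $service_name:
    image: $image
    expose:
      - "$app_port"
  nginx:
    image: nginx:alpine
    ports:
      - "$nginx_port:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
""")

NGINX_TEMPLATE = Template("""\
events {}
http {
  server {
    listen 80;
    location / {
      proxy_pass http://$service_name:$app_port;
    }
  }
}
""")


def render_configs(manifest):
    """Return a mapping of output filename to rendered file contents."""
    return {
        "docker-compose.yml": COMPOSE_TEMPLATE.substitute(manifest),
        "nginx.conf": NGINX_TEMPLATE.substitute(manifest),
    }


def write_configs(configs, out_dir="."):
    """Write every rendered config into out_dir, overwriting older copies."""
    for filename, content in configs.items():
        Path(out_dir, filename).write_text(content, encoding="utf-8")


configs = render_configs(MANIFEST)
```

&lt;p&gt;Because both files come from the same dict, a port change in the manifest cannot drift between docker-compose.yml and nginx.conf.&lt;/p&gt;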

&lt;p&gt;Why this matters&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces manual errors&lt;/li&gt;
&lt;li&gt;Keeps configuration consistent&lt;/li&gt;
&lt;li&gt;Makes the system reproducible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I delete my configs, I can regenerate everything from the manifest.&lt;/p&gt;

&lt;p&gt;Observability: Adding the “Eyes” (/metrics)&lt;/p&gt;

&lt;p&gt;I added a /metrics endpoint to the API in Prometheus format.&lt;/p&gt;

&lt;p&gt;It tracks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Throughput &amp;amp; Errors&lt;br&gt;
http_requests_total{method, path, status_code}&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Latency&lt;br&gt;
http_request_duration_seconds_bucket&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Application State&lt;br&gt;
app_uptime_seconds&lt;br&gt;
app_mode (0=stable, 1=canary)&lt;br&gt;
chaos_active&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
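&lt;p&gt;As a rough illustration of what the endpoint returns, the sketch below renders the request counter and the three state gauges in the Prometheus text exposition format. The function name and its arguments are hypothetical, and the latency histogram is left out to keep the example short.&lt;/p&gt;

```python
def render_metrics(request_counts, uptime_seconds, app_mode, chaos_active):
    """Render counters and gauges as Prometheus text exposition lines.

    request_counts maps (method, path, status_code) to a running total.
    """
    lines = ["# TYPE http_requests_total counter"]
    for (method, path, status_code), count in sorted(request_counts.items()):
        labels = f'method="{method}",path="{path}",status_code="{status_code}"'
        lines.append(f"http_requests_total{{{labels}}} {count}")
    lines += [
        "# TYPE app_uptime_seconds gauge",
        f"app_uptime_seconds {uptime_seconds:.0f}",
        "# TYPE app_mode gauge",
        f"app_mode {app_mode}",  # 0=stable, 1=canary
        "# TYPE chaos_active gauge",
        f"chaos_active {chaos_active}",
    ]
    return "\n".join(lines) + "\n"


sample = render_metrics({("GET", "/", 200): 5, ("GET", "/", 500): 1}, 42.4, 1, 0)
```

&lt;p&gt;The handler behind /metrics would return this string with a text/plain content type.&lt;/p&gt;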

&lt;p&gt;The Guardrails: Policy Enforcement with OPA&lt;/p&gt;

&lt;p&gt;Instead of writing logic inside the CLI, I used Open Policy Agent.&lt;/p&gt;

&lt;p&gt;Key Rule:&lt;/p&gt;

&lt;p&gt;The CLI must NOT decide anything — OPA decides everything.&lt;/p&gt;

&lt;p&gt;🔹 Infrastructure Policy (Pre-Deploy)&lt;/p&gt;

&lt;p&gt;Checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Disk space&lt;/li&gt;
&lt;li&gt;CPU load&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example rules:&lt;br&gt;
Deny if disk_free &amp;lt; 10GB&lt;br&gt;
Deny if cpu_load &amp;gt; 2.0&lt;/p&gt;

&lt;p&gt;If I artificially reduce disk space:&lt;/p&gt;

&lt;p&gt;BLOCKED: Disk below threshold&lt;/p&gt;

&lt;p&gt;👉 This satisfies the Hard Gate requirement&lt;/p&gt;
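&lt;p&gt;A minimal sketch of the CLI side of this gate, assuming OPA listens on its default port 8181 and serves the policy through its Data API. The policy path (swiftdeploy/infra), the input field names, and the allow/deny response shape are assumptions, not taken from the real project. The CLI only gathers facts and relays the verdict; the thresholds stay in the policy.&lt;/p&gt;

```python
import json
import os
import shutil
import urllib.request

# Assumed address and policy path; adjust to wherever the policy is loaded.
OPA_URL = "http://localhost:8181/v1/data/swiftdeploy/infra"


def collect_host_facts(path="/"):
    """Gather raw host facts; no thresholds are applied on the CLI side."""
    usage = shutil.disk_usage(path)
    load_1m = os.getloadavg()[0]  # Unix only
    return {
        "disk_free_gb": round(usage.free / 1024**3, 2),
        "cpu_load": round(load_1m, 2),
    }


def build_opa_request(facts, url=OPA_URL):
    """Wrap the facts in the {"input": ...} envelope that the OPA Data API expects."""
    body = json.dumps({"input": facts}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def interpret_decision(response_body):
    """Return (allowed, reasons) from an assumed {"result": {"allow", "deny"}} reply.

    A missing result (undefined policy) is treated as a refusal, so the gate fails closed.
    """
    result = json.loads(response_body).get("result", {})
    reasons = list(result.get("deny", []))
    allowed = bool(result.get("allow", False)) and not reasons
    return allowed, reasons


def check_infra():
    """Ask OPA for a verdict on the current host state."""
    request = build_opa_request(collect_host_facts())
    with urllib.request.urlopen(request, timeout=5) as response:
        return interpret_decision(response.read())
```

&lt;p&gt;A deploy command could call check_infra() first and print each reason prefixed with BLOCKED: when the first value is False.&lt;/p&gt;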

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;🔹 Canary Safety Policy (Pre-Promote)&lt;/p&gt;

&lt;p&gt;Before promoting, the CLI:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scrapes /metrics&lt;/li&gt;
&lt;li&gt;Calculates:

&lt;ul&gt;
&lt;li&gt;Error rate&lt;/li&gt;
&lt;li&gt;P99 latency&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Sends to OPA&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Policy:&lt;br&gt;
Deny if error_rate &amp;gt; 1%&lt;br&gt;
Deny if p99_latency &amp;gt; 500ms&lt;/p&gt;
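&lt;p&gt;The two inputs can be derived from a /metrics scrape roughly as follows. This is a sketch under assumptions: it treats 5xx status codes as errors, estimates P99 by linear interpolation inside the cumulative histogram buckets (the same idea Prometheus uses for histogram_quantile), and the helper names and input field names are hypothetical.&lt;/p&gt;

```python
import math
import re

SAMPLE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Parse Prometheus text lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks plus HELP and TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL_RE.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


def error_rate(samples):
    """Fraction of all counted requests that ended with a 5xx status."""
    total = errors = 0.0
    for name, labels, value in samples:
        if name == "http_requests_total":
            total += value
            if labels.get("status_code", "").startswith("5"):
                errors += value
    return errors / total if total else 0.0


def p99_latency(samples, quantile=0.99):
    """Estimate a latency quantile in seconds from cumulative buckets."""
    buckets = {}
    for name, labels, value in samples:
        if name == "http_request_duration_seconds_bucket":
            bound = float(labels["le"])  # float("+Inf") gives infinity
            buckets[bound] = buckets.get(bound, 0.0) + value
    bounds = sorted(buckets)
    if not bounds or buckets[bounds[-1]] == 0:
        return 0.0
    rank = quantile * buckets[bounds[-1]]
    prev_bound, prev_count = 0.0, 0.0
    for bound in bounds:
        count = buckets[bound]
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # quantile lies in the open-ended bucket
            span = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * span
        prev_bound, prev_count = bound, count
    return prev_bound


def build_canary_input(metrics_text):
    """Shape the values for OPA; the field names are illustrative."""
    samples = parse_samples(metrics_text)
    return {
        "error_rate": error_rate(samples),
        "p99_latency_ms": p99_latency(samples) * 1000,
    }
```

&lt;p&gt;These values are computed from counters accumulated since the app started; a stricter gate would compare two scrapes and work on the difference.&lt;/p&gt;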

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;Why Isolation Matters&lt;/p&gt;

&lt;p&gt;OPA runs as a separate container and:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is reachable by the CLI&lt;/li&gt;
&lt;li&gt;Is NOT exposed through Nginx&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 This ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No external access to policy engine&lt;/li&gt;
&lt;li&gt;Clear separation of responsibilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This satisfies the “No Leakage” requirement&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;🧪 The Chaos: Testing Failure Scenarios&lt;/p&gt;

&lt;p&gt;I implemented a /chaos endpoint with three modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;slow → delays responses&lt;/li&gt;
&lt;li&gt;error → randomly returns 500&lt;/li&gt;
&lt;li&gt;recover → resets system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example request body:&lt;br&gt;
{ "mode": "slow", "duration": 2 }&lt;/p&gt;
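&lt;p&gt;One way the three modes could be tracked inside the app is a small state object that every request consults. This is a hypothetical sketch, not the actual handler: it assumes "duration" means seconds of delay added to each response and that error mode fails half of the requests.&lt;/p&gt;

```python
import random
import time


class ChaosState:
    """Holds the active chaos mode and applies it to each request."""

    def __init__(self, error_probability=0.5):
        self.mode = None
        self.delay_seconds = 0.0
        self.error_probability = error_probability  # assumed failure share

    @property
    def active(self):
        """True while any chaos mode is on; could back the chaos_active gauge."""
        return self.mode is not None

    def apply(self, payload):
        """Update the state from a /chaos request body."""
        mode = payload.get("mode")
        if mode == "recover":
            self.mode, self.delay_seconds = None, 0.0
        elif mode == "slow":
            self.mode = "slow"
            self.delay_seconds = float(payload.get("duration", 1))
        elif mode == "error":
            self.mode, self.delay_seconds = "error", 0.0
        else:
            raise ValueError(f"unknown chaos mode: {mode!r}")

    def before_request(self, rng=random.random, sleep=time.sleep):
        """Return a status code to force, or None to handle the request normally."""
        if self.mode == "slow":
            sleep(self.delay_seconds)
        elif self.mode == "error" and self.error_probability > rng():
            return 500
        return None
```

&lt;p&gt;Passing rng and sleep as arguments keeps the behaviour easy to test without real delays or randomness.&lt;/p&gt;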

&lt;p&gt;What Happened&lt;/p&gt;

&lt;p&gt;When I injected chaos:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Latency increased&lt;/li&gt;
&lt;li&gt;Error rate increased&lt;/li&gt;
&lt;li&gt;Metrics reflected the change&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I tried to promote:&lt;br&gt;
BLOCKED: Latency too high&lt;/p&gt;

&lt;p&gt;👉 This confirmed:&lt;br&gt;
The system reacts to real runtime conditions, not assumptions&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;The Eyes: swiftdeploy status&lt;/p&gt;

&lt;p&gt;This command:&lt;br&gt;
python swiftdeploy.py status&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continuously scrapes /metrics&lt;/li&gt;
&lt;li&gt;Displays live system state&lt;/li&gt;
&lt;li&gt;Logs everything to history.jsonl&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Memory: Audit System&lt;/p&gt;

&lt;p&gt;From the logs, I generate a report:&lt;br&gt;
python swiftdeploy.py audit&lt;/p&gt;

&lt;p&gt;This creates:&lt;br&gt;
audit_report.md&lt;/p&gt;

&lt;p&gt;Contents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Timeline of events&lt;/li&gt;
&lt;li&gt;Policy violations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 The report renders cleanly in GitHub Markdown&lt;br&gt;
(Satisfies submission requirement)&lt;/p&gt;
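&lt;p&gt;A sketch of how the report could be assembled from the history lines, assuming each record carries timestamp, event, decision, and reason fields (hypothetical names), with a decision of "blocked" marking a policy violation:&lt;/p&gt;

```python
import json


def build_audit_report(history_lines):
    """Turn JSON Lines records into a Markdown timeline plus a violations list."""
    events = [json.loads(line) for line in history_lines if line.strip()]
    violations = [e for e in events if e.get("decision") == "blocked"]

    report = [
        "# SwiftDeploy Audit Report",
        "",
        "## Timeline",
        "",
        "| Time | Event | Decision |",
        "| --- | --- | --- |",
    ]
    for e in events:
        row = (e.get("timestamp", ""), e.get("event", ""), e.get("decision", ""))
        report.append("| {} | {} | {} |".format(*row))

    report += ["", "## Policy Violations", ""]
    if violations:
        for e in violations:
            reason = e.get("reason", "no reason recorded")
            report.append(f"- {e.get('timestamp', '')}: {reason}")
    else:
        report.append("No violations recorded.")
    return "\n".join(report) + "\n"
```

&lt;p&gt;Writing the returned string to audit_report.md gives a table and a bullet list, both of which GitHub Markdown renders.&lt;/p&gt;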

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;Lessons Learned&lt;/p&gt;

&lt;p&gt;This stage changed how I think about DevOps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Deployment is not just execution. It’s decision-making.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Policies should be external. Keeping logic in OPA:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;makes it reusable&lt;/li&gt;
&lt;li&gt;avoids tightly coupled code&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Metrics are not just for monitoring. They actively drive decisions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Debugging is part of the process. I faced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;YAML errors&lt;/li&gt;
&lt;li&gt;Docker rebuild issues&lt;/li&gt;
&lt;li&gt;Nginx misconfigurations&lt;/li&gt;
&lt;li&gt;OPA connection failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fixing them helped me understand the system deeply.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;✅ Final Checklist (Submission Criteria)&lt;/p&gt;

&lt;p&gt;✔ manifest.yaml is the only edited file&lt;br&gt;
✔ Deployment blocked when disk is low&lt;br&gt;
✔ OPA not exposed via Nginx&lt;br&gt;
✔ Metrics fully implemented&lt;br&gt;
✔ Audit report generated and readable&lt;br&gt;
✔ Blog includes architecture diagram&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;Conclusion&lt;/p&gt;

&lt;p&gt;This project helped me move from:&lt;/p&gt;

&lt;p&gt;running commands → building systems that enforce rules&lt;/p&gt;

&lt;p&gt;I now better understand how:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;policy&lt;/li&gt;
&lt;li&gt;infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;work together in real-world systems.&lt;/p&gt;

&lt;p&gt;⸻&lt;/p&gt;

&lt;p&gt;If you’re learning DevOps, my biggest takeaway is:&lt;/p&gt;

&lt;p&gt;Don’t just deploy — build systems that decide when deployment is safe.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>automation</category>
      <category>cloud</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
