<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hezekiah Umoh</title>
    <description>The latest articles on DEV Community by Hezekiah Umoh (@hezekiah_umoh).</description>
    <link>https://dev.to/hezekiah_umoh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875823%2Fbe07e4f3-136f-4324-beff-45cc88fe02d7.jpg</url>
      <title>DEV Community: Hezekiah Umoh</title>
      <link>https://dev.to/hezekiah_umoh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hezekiah_umoh"/>
    <language>en</language>
    <item>
      <title>Building TheEpicBook: A Deep Dive into a Node.js Monolithic Web Application</title>
      <dc:creator>Hezekiah Umoh</dc:creator>
      <pubDate>Tue, 26 May 2026 00:53:16 +0000</pubDate>
      <link>https://dev.to/hezekiah_umoh/building-theepicbook-a-deep-dive-into-a-nodejs-monolithic-web-application-3ca1</link>
      <guid>https://dev.to/hezekiah_umoh/building-theepicbook-a-deep-dive-into-a-nodejs-monolithic-web-application-3ca1</guid>
      <description>&lt;h1&gt;
  
  
  Building TheEpicBook: A Deep Dive into a Node.js Monolithic Web Application
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;By a Full-Stack Developer | May 2026&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In an era where microservices and serverless architectures dominate tech conversations, there is still a strong case to be made for the classic monolithic application. TheEpicBook is a full-stack bookstore web application built as a monolith — a single, unified codebase that handles everything from serving HTML pages to managing a relational database. This post walks through the architecture, the tech stack, the challenges faced during deployment, and the lessons learned along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is TheEpicBook?
&lt;/h2&gt;

&lt;p&gt;TheEpicBook is an online bookstore application that allows users to browse a curated collection of books, view book details, add items to a shopping cart, and proceed through a checkout flow. The app greets visitors with the tagline &lt;em&gt;"Discover Your Next Great Read"&lt;/em&gt; and delivers a clean, responsive UI backed by a real relational database.&lt;/p&gt;

&lt;p&gt;At its core, TheEpicBook is a traditional server-rendered web application — no separate frontend framework calling a REST API, no independent microservices. Everything lives in one place, and the server does it all.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Tech Stack
&lt;/h2&gt;

&lt;p&gt;TheEpicBook is built on a straightforward but powerful stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node.js + Express&lt;/strong&gt; — the backbone of the application, handling routing, middleware, and HTTP request/response logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Express-Handlebars&lt;/strong&gt; — a server-side templating engine that renders dynamic HTML views on the server before sending them to the browser&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sequelize ORM&lt;/strong&gt; — an abstraction layer over the MySQL database that lets the app interact with data using JavaScript models rather than raw SQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MySQL&lt;/strong&gt; — the relational database storing Authors, Books, Carts, Checkouts, and their relationships&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nginx&lt;/strong&gt; — a reverse proxy sitting in front of the Node.js server, handling incoming traffic on port 80 and forwarding it to the app on port 8080&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS EC2&lt;/strong&gt; — the cloud infrastructure running the entire application on an Ubuntu server&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Application Architecture
&lt;/h2&gt;

&lt;p&gt;The monolithic architecture means every concern — routing, templating, business logic, and database access — lives within a single deployable unit. Here is how the key layers fit together:&lt;/p&gt;

&lt;h3&gt;
  
  
  Models
&lt;/h3&gt;

&lt;p&gt;Sequelize models define the database schema and relationships in JavaScript. TheEpicBook has five core models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Author&lt;/strong&gt; — stores author first and last names&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Book&lt;/strong&gt; — stores title, genre, publication year, price, inventory count, and description, with a foreign key linking to Author&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cart&lt;/strong&gt; — tracks quantity and price for a shopping session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkout&lt;/strong&gt; — stores shipping address and subtotal, linked to a Cart&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cartbook&lt;/strong&gt; — a junction table managing the many-to-many relationship between Books and Carts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On startup, Sequelize syncs these models with the database, automatically creating tables if they do not exist.&lt;/p&gt;

&lt;h3&gt;
  
  
  Routes
&lt;/h3&gt;

&lt;p&gt;Express routes define the URL structure of the application. Each route handler fetches data from the database via Sequelize and passes it to a Handlebars template for rendering.&lt;/p&gt;

&lt;h3&gt;
  
  
  Views
&lt;/h3&gt;

&lt;p&gt;Handlebars templates receive data from route handlers and produce the final HTML sent to the browser. This server-side rendering approach means the browser receives fully-formed pages — no client-side data fetching required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Static Assets
&lt;/h3&gt;

&lt;p&gt;CSS, images, and client-side JavaScript are served as static files from the &lt;code&gt;public&lt;/code&gt; directory via Express's built-in static middleware.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment on AWS EC2
&lt;/h2&gt;

&lt;p&gt;Deploying TheEpicBook to a live Ubuntu server on AWS involved several real-world challenges worth documenting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Permissions Issues
&lt;/h3&gt;

&lt;p&gt;After removing &lt;code&gt;node_modules&lt;/code&gt; to resolve an npm rename conflict, the directory ended up owned by root due to a prior &lt;code&gt;sudo npm install&lt;/code&gt;. Running &lt;code&gt;sudo chown -R ubuntu:ubuntu /home/ubuntu/theepicbook&lt;/code&gt; restored correct ownership and allowed npm to install cleanly. The lesson: never run &lt;code&gt;npm install&lt;/code&gt; with &lt;code&gt;sudo&lt;/code&gt; inside a project directory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nginx as a Reverse Proxy
&lt;/h3&gt;

&lt;p&gt;The server ships with Nginx pre-installed, which intercepts traffic on port 80. Rather than expose the Node.js process directly to the internet, Nginx is configured as a reverse proxy — forwarding requests from port 80 to the app running on port 8080. This is best practice for production Node.js apps, providing a clean entry point and making it easy to add SSL termination later.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Groups
&lt;/h3&gt;

&lt;p&gt;On AWS, inbound traffic is controlled by Security Groups at the network level. Port 8080 must be explicitly opened in the EC2 inbound rules for direct access, and port 80 for Nginx proxied access. Missing this step is a common gotcha — the app runs fine on the server, but the browser simply times out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Database Seeding
&lt;/h3&gt;

&lt;p&gt;Sequelize creates the schema automatically on app startup, but an empty database shows no books. TheEpicBook ships with SQL seed files — &lt;code&gt;author_seed.sql&lt;/code&gt; and &lt;code&gt;books_seed.sql&lt;/code&gt; — that populate the database with initial data using a simple &lt;code&gt;mysql&lt;/code&gt; import command.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Case for Monoliths
&lt;/h2&gt;

&lt;p&gt;TheEpicBook is a great example of why monolithic applications remain relevant and valuable, especially for smaller projects and early-stage products:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity&lt;/strong&gt; — one codebase, one deployment, one process to manage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easier debugging&lt;/strong&gt; — the entire request lifecycle is traceable within a single application&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster development&lt;/strong&gt; — no API contracts to maintain between services, no network calls between components&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower operational overhead&lt;/strong&gt; — no service mesh, no inter-service authentication, no distributed tracing needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The trade-offs come at scale: the components of a monolith cannot be scaled independently of one another, and a bug in one module can affect the whole app. But for a bookstore with a well-defined domain and a small team, the monolith is the right tool for the job.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next for TheEpicBook
&lt;/h2&gt;

&lt;p&gt;There are several natural next steps to evolve the application:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Process management with PM2&lt;/strong&gt; — keep the Node.js process alive across server restarts and crashes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSL with Let's Encrypt&lt;/strong&gt; — add HTTPS via Certbot and Nginx for secure connections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User authentication&lt;/strong&gt; — session-based login so users can track their own carts and order history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Admin dashboard&lt;/strong&gt; — a protected route for adding, editing, and removing books without touching the database directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extracting an API layer&lt;/strong&gt; — a first step toward a more modular architecture, serving JSON alongside the server-rendered views&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;TheEpicBook demonstrates that a well-structured monolithic application can be a robust, maintainable, and deployable product. Built with Node.js, Express, Sequelize, and MySQL, and deployed on AWS EC2 behind Nginx, it covers the full stack from database to browser in a single cohesive codebase. The deployment journey — from permissions errors to Security Group configurations — reflects the real-world experience of shipping a web application to a live server.&lt;/p&gt;

&lt;p&gt;Sometimes the right architecture is the simple one. TheEpicBook is proof of that.&lt;/p&gt;

&lt;p&gt;Want to follow along and implement this project yourself? The source is here: &lt;a href="https://github.com/ntonous/theepicbook.git" rel="noopener noreferrer"&gt;https://github.com/ntonous/theepicbook.git&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxnxvss0wsgkt5qtisw5c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxnxvss0wsgkt5qtisw5c.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have questions about the stack or the deployment process? Drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mysql</category>
      <category>nginx</category>
      <category>aws</category>
      <category>node</category>
    </item>
    <item>
      <title>How I Built a Miniature Heroku with Chaos Engineering — And Fought Azure to Deploy It</title>
      <dc:creator>Hezekiah Umoh</dc:creator>
      <pubDate>Mon, 11 May 2026 15:51:01 +0000</pubDate>
      <link>https://dev.to/hezekiah_umoh/how-i-built-a-miniature-heroku-with-chaos-engineering-and-fought-azure-to-deploy-it-3aa4</link>
      <guid>https://dev.to/hezekiah_umoh/how-i-built-a-miniature-heroku-with-chaos-engineering-and-fought-azure-to-deploy-it-3aa4</guid>
      <description>&lt;p&gt;A self-service DevOps sandbox platform with auto-destroying environments, dynamic Nginx routing, and a chaos engineering toggle — plus every painful deployment war story.&lt;/p&gt;

&lt;p&gt;The Challenge&lt;br&gt;
Imagine you're on a DevOps team and every developer needs their own isolated environment to test their code. Spinning them up manually is slow. Forgetting to tear them down wastes resources. And nobody ever tests what happens when things actually break in production.&lt;br&gt;
That was the problem I set out to solve.&lt;br&gt;
The result? A fully self-service DevOps Sandbox Platform — a miniature internal Heroku where environments are short-lived by design, chaos is a feature, and everything cleans itself up automatically.&lt;br&gt;
What I didn't plan for was the deployment war that followed.&lt;/p&gt;

&lt;p&gt;What I Built&lt;br&gt;
The platform lets any user:&lt;/p&gt;

&lt;p&gt;Spin up an isolated environment with one command&lt;br&gt;
Deploy an app into it automatically&lt;br&gt;
Monitor its health every 30 seconds&lt;br&gt;
Simulate outages — crashes, network failures, CPU stress&lt;br&gt;
Auto-destroy everything when the TTL expires&lt;/p&gt;

&lt;p&gt;All of this runs on a single Linux VM and starts with one command: make up.&lt;/p&gt;

&lt;p&gt;The Architecture&lt;br&gt;
Everything lives inside one Azure VM:&lt;br&gt;
Client → Nginx (port 80) → App Containers&lt;br&gt;
              ↑&lt;br&gt;
         Auto-generated&lt;br&gt;
         conf.d/*.conf&lt;/p&gt;

&lt;p&gt;API (port 5000) → Bash Scripts → Docker&lt;br&gt;
                                   ↓&lt;br&gt;
                            envs/*.json (state)&lt;br&gt;
                            logs// (logs)&lt;/p&gt;

&lt;p&gt;Background: Health Monitor (30s) + Cleanup Daemon (60s)&lt;br&gt;
Five core components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Nginx — The Front Door
Every environment gets its own config file auto-written to nginx/conf.d/. When a new environment is created, the script writes the config and runs nginx -s reload. Traffic is routed by hostname.&lt;/li&gt;
&lt;li&gt;FastAPI Control API
Seven REST endpoints wrapping all the bash scripts. Create, list, destroy, fetch logs, check health, trigger outages. Swagger docs at /docs.&lt;/li&gt;
&lt;li&gt;The Bash Engine
Four scripts power everything:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;create_env.sh — spins up container, network, Nginx config, log shipping&lt;br&gt;
destroy_env.sh — tears everything down cleanly, archives logs&lt;br&gt;
simulate_outage.sh — chaos engineering with crash/pause/network/recover/stress&lt;br&gt;
cleanup_daemon.sh — runs every 60 seconds, auto-destroys expired environments&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Health Monitor
Python script polling every active environment's /health endpoint every 30 seconds. Three consecutive failures mark the environment as "degraded."&lt;/li&gt;
&lt;li&gt;State Management
JSON files in envs/ written atomically using temp-file + mv to prevent corruption.&lt;/li&gt;
&lt;/ol&gt;
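
&lt;p&gt;The temp-file + mv pattern for state files can be sketched in Python as follows. This is an illustration only, not the platform's actual code (which does this in bash): the function name and the example state fields are hypothetical, and os.replace stands in for mv.&lt;/p&gt;

```python
import json
import os
import tempfile


def write_state_atomically(path, state):
    """Write `state` as JSON to `path` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the final rename stays
    # on one filesystem, which is what makes the swap atomic on POSIX.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # Equivalent of `mv tmp_path path`: the old file is swapped in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

&lt;p&gt;A reader that opens the state file sees either the old contents or the new contents, never a half-written JSON document.&lt;/p&gt;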

&lt;p&gt;Building It Was the Easy Part&lt;br&gt;
The platform came together cleanly. One command to start everything, environments spinning up in seconds, chaos simulation working perfectly. make up → make create → make simulate → everything worked.&lt;br&gt;
Then came deployment.&lt;/p&gt;

&lt;p&gt;The Azure Deployment Wars&lt;br&gt;
Battle 1 — The SSH Key That Didn't Exist&lt;br&gt;
First attempt to SSH into the VM:&lt;br&gt;
Warning: Identity file azureuser_key.pem not accessible: No such file or directory&lt;br&gt;
Permission denied (publickey)&lt;br&gt;
The .pem file path was wrong. Classic. Found the actual file — hng5-vm_key.pem in Downloads — and fixed the path. But then:&lt;br&gt;
Permission denied (publickey)&lt;br&gt;
Still failing. The key didn't match the VM. Had to reset the SSH key directly in Azure portal → Connect → Reset SSH public key. Twenty minutes lost.&lt;br&gt;
Lesson: Always verify your SSH key matches the VM it was created with. Azure makes it easy to reset but you lose time.&lt;/p&gt;

&lt;p&gt;Battle 2 — The Azure Firewall That Silently Blocked Everything&lt;br&gt;
Platform was running. API was live on port 5000 inside the VM. But the browser couldn't reach it.&lt;br&gt;
ERR_CONNECTION_REFUSED&lt;br&gt;
Added inbound port rules in Azure NSG for port 5000. Still refused. Added them again. Still refused.&lt;br&gt;
Tried routing through Nginx on port 8080. Tried a proxy container. Tried 172.17.0.1. Tried 127.0.0.1. Every attempt returned:&lt;br&gt;
502 Bad Gateway&lt;br&gt;
The real problem? The NSG rules were saving but Azure has an additional firewall layer that was silently dropping traffic. Port 5000 was listening perfectly inside the VM — ss -tlnp confirmed it — but nothing from outside could reach it.&lt;br&gt;
The fix that actually worked: Run the API container with --network host mode instead of the default bridge network:&lt;br&gt;
docker run -d \&lt;br&gt;
  --name sandbox-api \&lt;br&gt;
  --network host \&lt;br&gt;
  -v $(pwd):/app \&lt;br&gt;
  -v /var/run/docker.sock:/var/run/docker.sock \&lt;br&gt;
  -w /app \&lt;br&gt;
  devops-sandbox-api \&lt;br&gt;
  python3 platform/api.py&lt;br&gt;
Host network mode binds directly to the VM's network interface, bypassing Docker's bridge entirely. Suddenly port 5000 was reachable from outside.&lt;br&gt;
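Lesson: When NSG rules look correct but a containerized service is still unreachable, look at the Docker networking layer too. A refused connection, as opposed to a timeout, usually means traffic reached the VM and nothing was listening on that address and port, which points at container port publishing or the bind address rather than at Azure. Host network mode is your escape hatch.&lt;/p&gt;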

&lt;p&gt;Battle 3 — platform Is a Reserved Python Name&lt;br&gt;
With host networking, the API still wouldn't start:&lt;br&gt;
ERROR: Could not import module "platform.api"&lt;br&gt;
platform is a built-in Python standard library module. Uvicorn was trying to import Python's built-in platform module instead of our platform/api.py file.&lt;br&gt;
The fix: Run the API directly as a Python script instead of through uvicorn's module import:&lt;br&gt;
python3 platform/api.py&lt;br&gt;
instead of:&lt;br&gt;
uvicorn platform.api:app&lt;br&gt;
Lesson: Never name your application directory after a Python standard library module. Names like platform, json, os, and sys are not reserved words, but the standard library versions can win the import lookup and shadow your code. Use a name such as app, api, or src instead.&lt;/p&gt;

&lt;p&gt;Battle 4 — The EOF That Kept Disappearing&lt;br&gt;
Writing Nginx config files directly in the terminal kept producing malformed output. The heredoc EOF delimiter was being swallowed or the content was getting duplicated:&lt;br&gt;
# This kept failing silently:&lt;br&gt;
cat &amp;gt; nginx/conf.d/api.conf &amp;lt;&amp;lt; 'EOF'&lt;br&gt;
server {&lt;br&gt;
    ...&lt;br&gt;
}&lt;br&gt;
EOF  # ← this was the problem line&lt;br&gt;
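The shell was not recognizing EOF as the closing delimiter. A heredoc only ends at a line containing the delimiter and nothing else, so any trailing whitespace or other characters on that line cause it to be read as more content.&lt;br&gt;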
The fix: Use tee instead of cat redirection:&lt;br&gt;
tee nginx/conf.d/api.conf &amp;gt; /dev/null &amp;lt;&amp;lt; 'EOF'&lt;br&gt;
server {&lt;br&gt;
    listen 8080;&lt;br&gt;
    location / {&lt;br&gt;
        proxy_pass http://172.17.0.1:5000/;&lt;br&gt;
    }&lt;br&gt;
}&lt;br&gt;
EOF&lt;br&gt;
Then verify immediately:&lt;br&gt;
cat nginx/conf.d/api.conf&lt;br&gt;
Lesson: Always verify config files after writing them in the terminal. One malformed line silently breaks everything downstream.&lt;/p&gt;

&lt;p&gt;The Moment It Worked&lt;br&gt;
After hours of SSH key resets, firewall rules, proxy containers, network debugging, and Python module conflicts — the browser finally loaded:&lt;br&gt;
&lt;a href="http://20.121.185.0:5000/docs" rel="noopener noreferrer"&gt;http://20.121.185.0:5000/docs&lt;/a&gt;&lt;br&gt;
DevOps Sandbox API — 1.0.0 — OAS 3.1&lt;br&gt;
All 7 endpoints. Live. Publicly accessible. 🔥&lt;/p&gt;

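&lt;p&gt;The API&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;/envs&lt;/td&gt;
&lt;td&gt;Create environment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;/envs&lt;/td&gt;
&lt;td&gt;List all + TTL remaining&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DELETE&lt;/td&gt;
&lt;td&gt;/envs/:id&lt;/td&gt;
&lt;td&gt;Destroy environment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;/envs/:id/logs&lt;/td&gt;
&lt;td&gt;Last 100 lines of logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;/envs/:id/health&lt;/td&gt;
&lt;td&gt;Last 10 health checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;/envs/:id/outage&lt;/td&gt;
&lt;td&gt;Trigger simulation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;/health&lt;/td&gt;
&lt;td&gt;API health check&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;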

&lt;p&gt;The Chaos Engineering Toggle&lt;br&gt;
make simulate ENV=env-demo-123 MODE=crash    # Kill container&lt;br&gt;
make simulate ENV=env-demo-123 MODE=pause    # Freeze processes&lt;br&gt;
make simulate ENV=env-demo-123 MODE=network  # Cut the network&lt;br&gt;
make simulate ENV=env-demo-123 MODE=stress   # Spike CPU&lt;br&gt;
make simulate ENV=env-demo-123 MODE=recover  # Fix everything&lt;br&gt;
The health monitor flags a crash within about 90 seconds: three consecutive failed checks at the 30-second polling interval. Watch it live:&lt;br&gt;
tail -f logs/env-demo-123/health.log&lt;/p&gt;
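
&lt;p&gt;The detection window follows from the monitor's rule of three consecutive failures at a 30-second interval. A minimal sketch of that rule is below. The class and the "healthy" status name are hypothetical, and resetting to healthy on the first passing check is an assumption; only the threshold of three and the "degraded" label come from the description above.&lt;/p&gt;

```python
FAILURE_THRESHOLD = 3  # three consecutive failed polls mark an env as degraded


class HealthTracker:
    """Tracks consecutive failed health checks for one environment."""

    def __init__(self, threshold=FAILURE_THRESHOLD):
        self.threshold = threshold
        self.consecutive_failures = 0
        self.status = "healthy"  # hypothetical name for the non-degraded state

    def record(self, check_passed):
        """Record one poll result and return the resulting status."""
        if check_passed:
            # Assumption: a single passing check clears the failure streak.
            self.consecutive_failures = 0
            self.status = "healthy"
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.status = "degraded"
        return self.status
```

&lt;p&gt;Counting consecutive failures rather than total failures keeps a single slow response from flipping an otherwise healthy environment to degraded.&lt;/p&gt;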

&lt;p&gt;What I Learned&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The platform code was the easy part.
Bash scripts, Docker networking, Python APIs — all of that came together in hours. The deployment took longer than the build.&lt;/li&gt;
&lt;li&gt;Azure's firewall has layers.
NSG rules are not the only thing blocking traffic. Docker bridge networking adds another layer. When in doubt, use --network host to isolate the variable.&lt;/li&gt;
&lt;li&gt;platform is taken.
Never name your directory after a Python standard library module. It will bite you at the worst possible moment — right before a deadline.&lt;/li&gt;
&lt;li&gt;Always verify file writes.
cat yourfile after every tee or heredoc. One silent corruption cascades into hours of 502 errors.&lt;/li&gt;
&lt;li&gt;Chaos engineering is a mindset.
Building the outage simulator forced me to think about every failure mode before they happened in production. The deployment battle was unplanned chaos engineering on the platform itself.&lt;/li&gt;
&lt;li&gt;Deadlines are the best debugging tool.
Nothing focuses the mind like a submission deadline. Every error gets solved eventually — you just move faster when the clock is running.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Try It Yourself&lt;br&gt;
GitHub: github.com/ntonous/devops-sandbox&lt;br&gt;
Live API: &lt;a href="http://20.121.185.0:5000/docs" rel="noopener noreferrer"&gt;http://20.121.185.0:5000/docs&lt;/a&gt;&lt;br&gt;
Clone it, spin it up, break something, watch it recover. That's the whole point.&lt;/p&gt;

&lt;p&gt;Built and deployed as part of the HNG14 DevOps track — Stage 5 task. Special thanks to every Azure error message that taught me something.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>docker</category>
      <category>nginx</category>
      <category>python</category>
    </item>
    <item>
      <title>Building SwiftDeploy: A Self-Writing Infrastructure Tool with OPA Policy Enforcement and Prometheus Observability</title>
      <dc:creator>Hezekiah Umoh</dc:creator>
      <pubDate>Thu, 07 May 2026 10:01:48 +0000</pubDate>
      <link>https://dev.to/hezekiah_umoh/building-swiftdeploy-a-self-writing-infrastructure-tool-with-opa-policy-enforcement-and-prometheus-2dm1</link>
      <guid>https://dev.to/hezekiah_umoh/building-swiftdeploy-a-self-writing-infrastructure-tool-with-opa-policy-enforcement-and-prometheus-2dm1</guid>
      <description>&lt;h1&gt;
  
  
  Building SwiftDeploy: A Self-Writing Infrastructure Tool with OPA Policy Enforcement and Prometheus Observability
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;What if your deployment tool could refuse to deploy when your disk is full? What if it could block a canary promotion when error rates spike — automatically, based on policy — without a single hardcoded &lt;code&gt;if&lt;/code&gt; statement in the CLI?&lt;/p&gt;

&lt;p&gt;That's exactly what I built for Stage 4b of the HNG14 DevOps track. In this post I'll walk through the full journey: from a manifest-driven deployment engine to a policy-enforced, fully observable stack with a live terminal dashboard and audit trail.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture at a Glance
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;manifest.yaml  (single source of truth)
      |
      v
swiftdeploy CLI
      |
      +-- Jinja2 templates --&amp;gt; docker-compose.yml + nginx.conf
      |
      +-- OPA policy check --&amp;gt; allow / block + reason
      |
      v
Docker Compose Stack
  ├── app (FastAPI + /metrics)
  ├── nginx (public ingress on swiftdeploy-net)
  └── opa (isolated on opa-internal, queried via docker exec)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The core principle: &lt;strong&gt;&lt;code&gt;manifest.yaml&lt;/code&gt; is the only file a human ever edits.&lt;/strong&gt; Everything else — config files, policy decisions, audit reports — is generated.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 4a Recap: The Engine
&lt;/h2&gt;

&lt;p&gt;In Stage 4a I built the foundation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;code&gt;manifest.yaml&lt;/code&gt; that describes the entire stack (image, port, mode, network)&lt;/li&gt;
&lt;li&gt;A Python CLI (&lt;code&gt;swiftdeploy&lt;/code&gt;) that reads the manifest and renders Jinja2 templates into &lt;code&gt;docker-compose.yml&lt;/code&gt; and &lt;code&gt;nginx.conf&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Subcommands: &lt;code&gt;init&lt;/code&gt;, &lt;code&gt;validate&lt;/code&gt;, &lt;code&gt;deploy&lt;/code&gt;, &lt;code&gt;promote&lt;/code&gt;, &lt;code&gt;teardown&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A FastAPI service with &lt;code&gt;/&lt;/code&gt;, &lt;code&gt;/healthz&lt;/code&gt;, and &lt;code&gt;/chaos&lt;/code&gt; endpoints&lt;/li&gt;
&lt;li&gt;Canary/stable mode switching via &lt;code&gt;promote&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight: the CLI never writes config by hand — it always renders from templates. Change one field in &lt;code&gt;manifest.yaml&lt;/code&gt;, re-run &lt;code&gt;init&lt;/code&gt;, and the entire stack config regenerates consistently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stage 4b: The Eyes and the Brain
&lt;/h2&gt;

&lt;p&gt;Stage 4b adds three major capabilities:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Eyes&lt;/strong&gt; — Prometheus &lt;code&gt;/metrics&lt;/code&gt; endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Brain&lt;/strong&gt; — OPA policy sidecar enforcing deploy/promote gates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Memory&lt;/strong&gt; — audit trail and report generation&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. Instrumentation: The /metrics Endpoint
&lt;/h2&gt;

&lt;p&gt;The FastAPI service now exposes a &lt;code&gt;/metrics&lt;/code&gt; endpoint in Prometheus text format. I implemented the metrics collector entirely in Python without any external library — just a middleware that intercepts every request and records it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;metrics_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call_next&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call_next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;dur&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/metrics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;record_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dur&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Five metrics are exposed:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;http_requests_total&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;counter&lt;/td&gt;
&lt;td&gt;Requests by method, path, status_code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;http_request_duration_seconds&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;histogram&lt;/td&gt;
&lt;td&gt;Latency with 11 standard buckets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;app_uptime_seconds&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;gauge&lt;/td&gt;
&lt;td&gt;Seconds since process start&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;app_mode&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;gauge&lt;/td&gt;
&lt;td&gt;0=stable, 1=canary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;chaos_active&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;gauge&lt;/td&gt;
&lt;td&gt;0=none, 1=slow, 2=error&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The histogram uses the standard Prometheus buckets (0.005s through 10s), so P99 latency can be estimated from the cumulative bucket counts with no extra libraries.&lt;/p&gt;
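
&lt;p&gt;As a rough illustration of how a percentile falls out of bucket counts, here is a minimal sketch that interpolates linearly inside the matching bucket, similar in spirit to Prometheus's histogram_quantile. The function name and input shape are assumptions for this example, not SwiftDeploy's actual code.&lt;/p&gt;

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, with the last entry being the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile lies beyond the largest finite bucket, so the
                # best available answer is that bucket's upper bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (upper_bound - prev_bound) * fraction
        prev_bound, prev_count = upper_bound, count
    return prev_bound
```

&lt;p&gt;The result is an estimate: its accuracy depends on how finely the bucket boundaries divide the range where most requests land.&lt;/p&gt;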




&lt;h2&gt;
  
  
  2. The Policy Sidecar: OPA
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why OPA?
&lt;/h3&gt;

&lt;p&gt;The spec had a critical requirement: &lt;strong&gt;the CLI must not make any allow/deny decision itself.&lt;/strong&gt; All decision logic lives exclusively in OPA. This is the separation of concerns that makes the system auditable and extensible — you can change policy without touching the CLI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Isolation Architecture
&lt;/h3&gt;

&lt;p&gt;OPA runs as a sidecar in Docker Compose but on a &lt;strong&gt;completely separate network&lt;/strong&gt; from nginx:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;swiftdeploy-net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    &lt;span class="c1"&gt;# nginx + app live here&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bridge&lt;/span&gt;
  &lt;span class="na"&gt;opa-internal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;       &lt;span class="c1"&gt;# OPA lives here, isolated&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bridge&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;nginx&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;swiftdeploy-net&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;   &lt;span class="c1"&gt;# can NOT reach OPA&lt;/span&gt;
  &lt;span class="na"&gt;opa&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;opa-internal&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;      &lt;span class="c1"&gt;# can NOT be reached via nginx&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means there is zero path from the public port 8081 to the OPA API. The &lt;code&gt;No "Leakage"&lt;/code&gt; requirement from the spec is satisfied architecturally, not just by configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Domain-Isolated Policies
&lt;/h3&gt;

&lt;p&gt;I wrote two completely independent Rego policies, each owning exactly one domain:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;policies/infrastructure.rego&lt;/code&gt;&lt;/strong&gt; — answers: &lt;em&gt;Is this host safe to deploy onto?&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rego"&gt;&lt;code&gt;&lt;span class="ow"&gt;package&lt;/span&gt; &lt;span class="n"&gt;swiftdeploy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;infrastructure&lt;/span&gt;

&lt;span class="ow"&gt;default&lt;/span&gt; &lt;span class="n"&gt;allow&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="n"&gt;allow&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;violations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;violations&lt;/span&gt; &lt;span class="n"&gt;contains&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;disk_free_gb&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;infrastructure&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min_disk_free_gb&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Disk free (%.1f GB) is below minimum threshold (%.1f GB)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;disk_free_gb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;infrastructure&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min_disk_free_gb&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;violations&lt;/span&gt; &lt;span class="n"&gt;contains&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpu_load&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;infrastructure&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_cpu_load&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"CPU load (%.2f) exceeds maximum threshold (%.2f)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpu_load&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;infrastructure&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_cpu_load&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;policies/canary.rego&lt;/code&gt;&lt;/strong&gt; — answers: &lt;em&gt;Is the canary safe to promote?&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rego"&gt;&lt;code&gt;&lt;span class="ow"&gt;package&lt;/span&gt; &lt;span class="n"&gt;swiftdeploy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;canary&lt;/span&gt;

&lt;span class="ow"&gt;default&lt;/span&gt; &lt;span class="n"&gt;allow&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="n"&gt;allow&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;violations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;violations&lt;/span&gt; &lt;span class="n"&gt;contains&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error_rate_percent&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;canary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_error_rate_percent&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Error rate (%.2f%%) exceeds maximum threshold (%.2f%%) over the observation window"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error_rate_percent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;canary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_error_rate_percent&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Crucially, all threshold values live in &lt;code&gt;policies/data.json&lt;/code&gt; — not in the Rego files:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"infrastructure"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"min_disk_free_gb"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max_cpu_load"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"min_mem_free_percent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"canary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max_error_rate_percent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"max_p99_latency_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To change the disk threshold from 10 GB to 20 GB, you edit only &lt;code&gt;data.json&lt;/code&gt;; the Rego files never need to change. That makes &lt;code&gt;data.json&lt;/code&gt; the &lt;strong&gt;single source of truth for policy thresholds&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  OPA Never Returns a Bare Boolean
&lt;/h3&gt;

&lt;p&gt;Every OPA decision carries the reasoning behind it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"violations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Error rate (46.94%) exceeds maximum threshold (1.00%) over the observation window"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI surfaces this directly to the operator — no cryptic error codes, just a plain English explanation of why deployment was blocked.&lt;/p&gt;
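
&lt;p&gt;A minimal sketch of how a CLI might surface such a decision. It assumes the output shape of &lt;code&gt;opa eval --format json&lt;/code&gt; (a &lt;code&gt;result&lt;/code&gt; list whose first expression carries the value); the helper names &lt;code&gt;extract_decision&lt;/code&gt; and &lt;code&gt;render_decision&lt;/code&gt; are hypothetical. Note that it only formats what OPA returned and never re-evaluates a threshold.&lt;/p&gt;

```python
import json


def extract_decision(opa_stdout):
    """Pull the decision object out of `opa eval --format json` output.

    Returns None when the query was undefined (no result rows).
    """
    payload = json.loads(opa_stdout)
    rows = payload.get("result") or []
    if not rows:
        return None
    return rows[0]["expressions"][0]["value"]


def render_decision(domain, decision):
    """Turn a decision into operator-facing lines without judging it."""
    if decision.get("allow") is True:
        return [f"+ [OPA/{domain}] Policy passed"]
    lines = [f"x [OPA/{domain}] Policy FAILED"]
    lines.extend(f"    - {message}" for message in sorted(decision.get("violations", [])))
    return lines


# Hypothetical OPA output for a query against data.swiftdeploy.canary.
sample_stdout = json.dumps({
    "result": [{"expressions": [{
        "text": "data.swiftdeploy.canary",
        "value": {
            "allow": False,
            "violations": [
                "Error rate (46.94%) exceeds maximum threshold (1.00%) over the observation window"
            ],
        },
    }]}]
})
for line in render_decision("CANARY", extract_decision(sample_stdout)):
    print(line)
```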




&lt;h2&gt;
  
  
  3. The CLI: Gated Lifecycle
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pre-Deploy Check
&lt;/h3&gt;

&lt;p&gt;Before bringing up the stack, &lt;code&gt;swiftdeploy deploy&lt;/code&gt; collects host stats and sends them to OPA:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;swiftdeploy deploy
&lt;span class="go"&gt;  Checking infrastructure policy...
&lt;/span&gt;&lt;span class="gp"&gt;  &amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Host -&amp;gt; disk: 328.6 GB | CPU: 0.20 | mem free: 37.4%
&lt;span class="go"&gt;  + [OPA/INFRASTRUCTURE] Policy passed — proceeding
&lt;/span&gt;&lt;span class="gp"&gt;  &amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Bringing up the stack...
&lt;span class="gp"&gt;  + Stack healthy -&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;http://localhost:8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If free disk space fell below 10 GB, the output would instead show:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;  x [OPA/INFRASTRUCTURE] Policy FAILED — blocked
              - Disk free (3.2 GB) is below minimum threshold (10.0 GB)
  x Deployment blocked by policy.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
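
&lt;p&gt;For illustration, collecting the host facts could look roughly like the sketch below, using only the Python standard library. The keys &lt;code&gt;disk_free_gb&lt;/code&gt; and &lt;code&gt;cpu_load&lt;/code&gt; match the policy input shown earlier, while the function name &lt;code&gt;collect_host_stats&lt;/code&gt; is hypothetical. Free memory is left out because the standard library has no portable way to read it; a package such as &lt;code&gt;psutil&lt;/code&gt; would be one option.&lt;/p&gt;

```python
import os
import shutil


def collect_host_stats(path="."):
    """Gather the facts the infrastructure policy needs; no thresholds live here."""
    usage = shutil.disk_usage(path)
    stats = {"disk_free_gb": round(usage.free / 1024**3, 1)}
    # os.getloadavg does not exist on Windows, so report None there and let
    # the caller obtain CPU load some other way.
    try:
        stats["cpu_load"] = round(os.getloadavg()[0], 2)
    except (AttributeError, OSError):
        stats["cpu_load"] = None
    return stats


print(collect_host_stats())
```

&lt;p&gt;The function only measures. Whether the numbers are acceptable stays entirely with OPA.&lt;/p&gt;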



&lt;h3&gt;
  
  
  Pre-Promote Check (The Chaos Test)
&lt;/h3&gt;

&lt;p&gt;This is where it gets interesting. Before promoting a canary to stable, the CLI scrapes &lt;code&gt;/metrics&lt;/code&gt;, calculates error rate and P99 latency, and sends them to OPA.&lt;/p&gt;
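
&lt;p&gt;A minimal sketch of the scrape-and-calculate step for the error rate. It assumes a request counter named &lt;code&gt;http_requests_total&lt;/code&gt; with a &lt;code&gt;status&lt;/code&gt; label, which is a hypothetical name used here for illustration, and it handles only the simple form of the Prometheus text format (no timestamps, no braces inside label values).&lt;/p&gt;

```python
import re

# name, optional {labels}, then the sample value
SAMPLE_LINE = re.compile(r'^(\w+)(?:\{([^}]*)\})?\s+(\S+)$')
LABEL_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL_PAIR.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


def error_rate_percent(samples, counter="http_requests_total"):
    """Share of requests with a 5xx status, as a percentage of all requests."""
    total = sum(value for name, _, value in samples if name == counter)
    errors = sum(
        value for name, labels, value in samples
        if name == counter and labels.get("status", "").startswith("5")
    )
    return 0.0 if total == 0 else 100.0 * errors / total


# Hypothetical scrape body.
scrape = """
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/",status="200"} 26
http_requests_total{method="GET",path="/",status="500"} 23
"""
print(f"error rate: {error_rate_percent(parse_metrics(scrape)):.2f}%")
```

&lt;p&gt;Because counters are cumulative since process start, a rate computed this way covers the whole observation window, not just the most recent requests.&lt;/p&gt;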

&lt;p&gt;I injected an 80% error rate using the chaos endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Invoke-RestMethod&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Method&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Post&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Uri&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;http://localhost:8081/chaos&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-ContentType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"application/json"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Body&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;'{"mode":"error","rate":0.8}'&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then tried to promote:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;swiftdeploy promote stable
&lt;span class="go"&gt;  Checking canary health policy...
&lt;/span&gt;&lt;span class="gp"&gt;  &amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Canary -&amp;gt; error rate: 46.94% | P99: 10 ms
&lt;span class="go"&gt;  x [OPA/CANARY] Policy FAILED — blocked
              - Error rate (46.94%) exceeds maximum threshold (1.00%) over the observation window
  x Promotion blocked — canary is not healthy enough.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The canary policy gate caught an error rate roughly 47 times the 1% threshold and blocked the promotion.&lt;/strong&gt; This is exactly the kind of automated safety net that prevents bad canaries from reaching production.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The Status Dashboard
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;swiftdeploy status&lt;/code&gt; runs a live-refreshing terminal dashboard that scrapes &lt;code&gt;/metrics&lt;/code&gt; every 5 seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;------------------------------------------------------------
  SwiftDeploy Status              2026-05-07 09:54:00
------------------------------------------------------------

  Mode: canary   Chaos: none   Uptime: 3420s

  Metric                           Value
  --------------------------------------------
  Throughput (req/s)               2.40
  Error Rate                       0.00%
  P99 Latency                      10 ms

  Policy Compliance
  --------------------------------------------
  [+]  Infra: Disk &amp;gt;= 10 GB
  [+]  Infra: CPU load &amp;lt;= 2.0
  [+]  Infra: Mem free &amp;gt;= 10%
  [+]  Canary: Error rate &amp;lt;= 1%
  [+]  Canary: P99 latency &amp;lt;= 500ms

  Refreshing every 5s — Ctrl+C to exit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every scrape is appended to &lt;code&gt;history.jsonl&lt;/code&gt; — a newline-delimited JSON file that forms the audit trail.&lt;/p&gt;
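
&lt;p&gt;An append-only writer for such a file can be very small. The sketch below is an assumption about the shape, not the actual implementation: the function name &lt;code&gt;append_event&lt;/code&gt; and the field names (&lt;code&gt;ts&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, and the metric keys) are hypothetical.&lt;/p&gt;

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def append_event(path, event_type, **fields):
    """Append one event as a single JSON line; existing lines are never rewritten."""
    record = {"ts": datetime.now(timezone.utc).isoformat(), "type": event_type, **fields}
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


# Usage with a throwaway file; a real CLI would pass "history.jsonl".
with tempfile.TemporaryDirectory() as workdir:
    history = os.path.join(workdir, "history.jsonl")
    append_event(history, "scrape", error_rate_percent=0.0, p99_ms=10)
    append_event(history, "policy_check", domain="canary", allow=True)
    with open(history, encoding="utf-8") as handle:
        print(sum(1 for _ in handle), "events recorded")
```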




&lt;h2&gt;
  
  
  5. The Audit Report
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;swiftdeploy audit&lt;/code&gt; parses &lt;code&gt;history.jsonl&lt;/code&gt; and generates &lt;code&gt;audit_report.md&lt;/code&gt; with four sections:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Timeline&lt;/strong&gt; — every deploy, promote, teardown, and policy check with timestamps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mode Changes&lt;/strong&gt; — when the stack switched between stable and canary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy Violations&lt;/strong&gt; — every time a check failed, with the full violation message&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics Summary&lt;/strong&gt; — min/max/avg of error rate, P99 latency, and throughput&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The report renders cleanly as GitHub Flavored Markdown.&lt;/p&gt;
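
&lt;p&gt;As a sketch of the Metrics Summary section, the following reads JSON lines and emits a small Markdown table. The field names (&lt;code&gt;error_rate_percent&lt;/code&gt;, &lt;code&gt;p99_ms&lt;/code&gt;, &lt;code&gt;throughput_rps&lt;/code&gt;) and both function names are hypothetical; the real &lt;code&gt;history.jsonl&lt;/code&gt; schema may differ.&lt;/p&gt;

```python
import json
from statistics import mean


def summarize(lines, fields=("error_rate_percent", "p99_ms", "throughput_rps")):
    """Compute min/max/avg per metric, ignoring events that lack the field."""
    rows = [json.loads(line) for line in lines if line.strip()]
    summary = {}
    for field in fields:
        values = [row[field] for row in rows if isinstance(row.get(field), (int, float))]
        if values:
            summary[field] = {"min": min(values), "max": max(values), "avg": mean(values)}
    return summary


def to_markdown(summary):
    """Render the summary as a GitHub Flavored Markdown table."""
    out = ["| Metric | Min | Max | Avg |", "| --- | --- | --- | --- |"]
    for field, stats in summary.items():
        out.append(f"| {field} | {stats['min']:.2f} | {stats['max']:.2f} | {stats['avg']:.2f} |")
    return "\n".join(out)


# Hypothetical history entries; the deploy event carries no metrics.
history_lines = [
    '{"type": "scrape", "error_rate_percent": 0.0, "p99_ms": 10, "throughput_rps": 2.4}',
    '{"type": "scrape", "error_rate_percent": 46.94, "p99_ms": 10, "throughput_rps": 2.1}',
    '{"type": "deploy"}',
]
print(to_markdown(summarize(history_lines)))
```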




&lt;h2&gt;
  
  
  The Windows Challenge: OPA Port Binding
&lt;/h2&gt;

&lt;p&gt;This section is for anyone running Docker Desktop on Windows — I hit a wall that took significant debugging to solve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; OPA's port &lt;code&gt;8181&lt;/code&gt; was correctly configured in &lt;code&gt;docker-compose.yml&lt;/code&gt; as &lt;code&gt;"0.0.0.0:8181:8181"&lt;/code&gt;, and &lt;code&gt;docker inspect&lt;/code&gt; confirmed the binding was set. But &lt;code&gt;netstat&lt;/code&gt; showed nothing listening on 8181, and &lt;code&gt;curl http://localhost:8181/health&lt;/code&gt; failed with connection refused.&lt;/p&gt;

&lt;p&gt;This matches a known class of Docker Desktop + WSL2 issues in which port forwarding from containers to the Windows host is unreliable for certain port ranges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution:&lt;/strong&gt; Instead of querying OPA via HTTP from the host, I switched to &lt;code&gt;docker exec&lt;/code&gt; with the OPA CLI directly inside the container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;

&lt;span class="c1"&gt;# An argument list avoids shell=True and any quoting of the query path&lt;/span&gt;
&lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;"docker"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"exec"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"-i"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opa_container&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"opa"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"eval"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"--data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"/policies"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"--stdin-input"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;"--format"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;opa_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# input_json is piped to OPA on stdin; give up after 10 seconds&lt;/span&gt;
&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;input_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                   &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bypasses host port binding entirely&lt;/li&gt;
&lt;li&gt;Works identically on Linux, macOS, and Windows&lt;/li&gt;
&lt;li&gt;Proved more reliable in practice, because no host port forwarding is involved at all&lt;/li&gt;
&lt;li&gt;Satisfies the isolation requirement (OPA is still on its own network, nginx can't reach it)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The lesson: when Docker networking misbehaves on Windows, &lt;code&gt;docker exec&lt;/code&gt; is your escape hatch.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Separation of concerns is worth the complexity.&lt;/strong&gt; Having OPA own all policy decisions and the CLI own only orchestration made both parts easier to test and reason about independently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Thresholds in data, logic in code.&lt;/strong&gt; Putting OPA thresholds in &lt;code&gt;data.json&lt;/code&gt; instead of hardcoding them in Rego files means ops teams can tune policy without touching code or redeploying anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Every failure mode needs a distinct message.&lt;/strong&gt; The spec said "every distinct failure mode must produce a different, human-readable outcome." I ended up with five distinct OPA error states (unreachable, timeout, malformed JSON, undefined result, policy failed), each producing a clear, actionable message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Platform-specific bugs are real.&lt;/strong&gt; The Docker Desktop port binding issue cost hours. The fix (docker exec) is actually cleaner than HTTP anyway — but you only find that out after hitting the wall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. The audit trail is free if you build it from the start.&lt;/strong&gt; Appending JSON to &lt;code&gt;history.jsonl&lt;/code&gt; on every event costs almost nothing at runtime and yields a complete forensic history.&lt;/p&gt;
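
&lt;p&gt;To make lesson 3 concrete, here is one way the five OPA error states could be told apart. This is a sketch under assumptions, not the actual CLI code: the function name &lt;code&gt;classify_opa_result&lt;/code&gt;, the state labels, and the message wording are all hypothetical, and the query runner is injected so the logic can be exercised without Docker.&lt;/p&gt;

```python
import json
import subprocess


def classify_opa_result(run):
    """Run an OPA query and map every failure mode to its own message.

    run is any zero-argument callable that returns a
    subprocess.CompletedProcess, for example a wrapper around subprocess.run.
    """
    try:
        proc = run()
    except subprocess.TimeoutExpired:
        return "timeout", "OPA did not answer in time."
    except OSError as exc:
        return "unreachable", f"Could not start the OPA query: {exc}"
    if proc.returncode != 0:
        return "unreachable", f"OPA query failed: {proc.stderr.strip()}"
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return "malformed", "OPA returned output that is not valid JSON."
    if not isinstance(payload, dict):
        return "malformed", "OPA returned JSON in an unexpected shape."
    rows = payload.get("result") or []
    if not rows:
        return "undefined", "OPA returned no result; check the query path."
    decision = rows[0]["expressions"][0]["value"]
    if decision.get("allow") is True:
        return "allowed", "Policy passed."
    return "denied", "; ".join(sorted(decision.get("violations", [])))


# Simulated OPA run with an empty result set.
empty = subprocess.CompletedProcess(args=[], returncode=0, stdout='{"result": []}', stderr="")
print(classify_opa_result(lambda: empty))
```

&lt;p&gt;Returning a state label alongside the message keeps the operator-facing text separate from any control flow that depends on the outcome.&lt;/p&gt;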




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;SwiftDeploy Stage 4b is a deployment tool that can see (metrics), think (OPA policy), remember (audit trail), and refuse (policy gates). The entire stack — from &lt;code&gt;/metrics&lt;/code&gt; to &lt;code&gt;audit_report.md&lt;/code&gt; — is driven by a single &lt;code&gt;manifest.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The code is available at: &lt;strong&gt;[your GitHub repo URL here]&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're building something similar, the key takeaways are: isolate your policy engine, never return bare booleans from policy checks, and always give operators a human-readable reason when you block them.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built for HNG14 DevOps Track — Stage 4b&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>docker</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built a Tool That Builds My Infrastructure — Here's How It Went</title>
      <dc:creator>Hezekiah Umoh</dc:creator>
      <pubDate>Tue, 05 May 2026 21:20:09 +0000</pubDate>
      <link>https://dev.to/hezekiah_umoh/i-built-a-tool-that-builds-my-infrastructure-heres-how-it-went-12om</link>
      <guid>https://dev.to/hezekiah_umoh/i-built-a-tool-that-builds-my-infrastructure-heres-how-it-went-12om</guid>
      <description>&lt;h1&gt;
  
  
  I Built a Tool That Builds My Infrastructure — Here's How It Went
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;A brutally honest account of building SwiftDeploy for the HNG14 Stage 4A DevOps challenge&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;When I first read the Stage 4A task brief, one line jumped out at me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Most DevOps tasks ask you to configure infrastructure manually — this one asks you to build the tool that does it for you."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That single sentence changed how I approached the entire challenge. This wasn't about setting up servers or writing config files by hand. It was about building something that does all of that for you, from a single source of truth.&lt;/p&gt;

&lt;p&gt;This is the story of how I built &lt;strong&gt;SwiftDeploy&lt;/strong&gt; — and every wall I hit along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is SwiftDeploy?
&lt;/h2&gt;

&lt;p&gt;SwiftDeploy is a declarative deployment CLI tool. You describe your entire infrastructure in a single &lt;code&gt;manifest.yaml&lt;/code&gt; file, and the tool generates your Nginx config, Docker Compose file, manages your container lifecycle, and keeps your stack healthy.&lt;/p&gt;

&lt;p&gt;The stack consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;FastAPI&lt;/strong&gt; Python service that runs in either &lt;code&gt;stable&lt;/code&gt; or &lt;code&gt;canary&lt;/code&gt; mode&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;Nginx&lt;/strong&gt; reverse proxy that routes all traffic, logs every request, and returns JSON error responses&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;CLI tool&lt;/strong&gt; written in Python with five subcommands: &lt;code&gt;init&lt;/code&gt;, &lt;code&gt;validate&lt;/code&gt;, &lt;code&gt;deploy&lt;/code&gt;, &lt;code&gt;promote&lt;/code&gt;, and &lt;code&gt;teardown&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Everything generated from &lt;strong&gt;Jinja2 templates&lt;/strong&gt; — no manually written config files allowed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The grader would delete my generated files and re-run &lt;code&gt;swiftdeploy init&lt;/code&gt; to verify everything regenerates correctly. If the tool broke, the stack broke. No shortcuts.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Before writing a single line of code, I mapped out how everything would connect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;manifest.yaml  →  swiftdeploy init  →  nginx.conf + docker-compose.yml
                                              ↓
                                    docker-compose up
                                              ↓
                              [nginx:8080] → [app:3000]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;manifest.yaml&lt;/code&gt; is the only file a human ever edits. Everything else is derived from it. That constraint is what makes the tool interesting — and what made debugging it so painful at times.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building the API Service
&lt;/h2&gt;

&lt;p&gt;The API service is a FastAPI application with three endpoints:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GET /&lt;/strong&gt; returns a welcome message including the current mode, version, and server timestamp.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GET /healthz&lt;/strong&gt; returns a liveness check with process uptime in seconds — used by Docker's health check system to determine if the container is ready to serve traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;POST /chaos&lt;/strong&gt; is the interesting one. It's only active in canary mode and lets you simulate degraded behaviour: slow responses, random 500 errors, or a full recovery. This is the kind of endpoint that makes canary deployments genuinely useful — you can test how your system behaves under failure before rolling it out to everyone.&lt;/p&gt;

&lt;p&gt;Canary mode also adds an &lt;code&gt;X-Mode: canary&lt;/code&gt; header to every response, so you can always tell which mode the service is running in just by inspecting the headers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building the CLI
&lt;/h2&gt;

&lt;p&gt;The CLI is a single Python script with no external framework — just &lt;code&gt;argparse&lt;/code&gt;-style argument handling, PyYAML for parsing the manifest, and Jinja2 for rendering templates.&lt;/p&gt;

&lt;p&gt;The five subcommands each have a clear responsibility:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;init&lt;/strong&gt; reads the manifest and renders both templates. Simple, fast, deterministic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;validate&lt;/strong&gt; runs five pre-flight checks before anything is deployed. It checks that the manifest exists and is valid YAML, that all required fields are present, that the Docker image exists locally, that the Nginx port is free on the host, and that the generated nginx.conf passes a syntax check.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;deploy&lt;/strong&gt; chains init and validate together, then brings up the stack and blocks until health checks pass — or times out after 60 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;promote&lt;/strong&gt; is the most complex command. It updates the mode field in &lt;code&gt;manifest.yaml&lt;/code&gt; in-place, regenerates &lt;code&gt;docker-compose.yml&lt;/code&gt; with the new MODE environment variable, restarts only the app container (not nginx), and then confirms the new mode is active by hitting &lt;code&gt;/healthz&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;teardown&lt;/strong&gt; brings everything down cleanly. With &lt;code&gt;--clean&lt;/code&gt;, it also deletes the generated config files.&lt;/p&gt;
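
&lt;p&gt;The blocking behaviour of &lt;code&gt;deploy&lt;/code&gt; can be sketched as a polling loop with a deadline. This is an illustration only: it assumes the CLI probes &lt;code&gt;/healthz&lt;/code&gt; over HTTP, and the names &lt;code&gt;wait_until_healthy&lt;/code&gt; and &lt;code&gt;http_probe&lt;/code&gt; are hypothetical. The real tool may instead rely on Docker's own health status.&lt;/p&gt;

```python
import time
import urllib.error
import urllib.request


def http_probe(url):
    """Build a probe that reports True when the URL answers with HTTP 200."""
    def probe():
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                return response.status == 200
        except (urllib.error.URLError, OSError):
            return False
    return probe


def wait_until_healthy(probe, timeout=60.0, interval=2.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it returns True or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated probe that succeeds on the third attempt. A real call might be
# wait_until_healthy(http_probe("http://localhost:8080/healthz")).
attempts = iter([False, False, True])
print(wait_until_healthy(lambda: next(attempts), timeout=5, interval=0))
```

&lt;p&gt;Using a monotonic clock for the deadline means a system clock adjustment during startup cannot shorten or extend the 60-second window.&lt;/p&gt;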




&lt;h2&gt;
  
  
  The Challenges — And There Were Many
&lt;/h2&gt;

&lt;p&gt;I want to be honest here. This project did not go smoothly. Here is every wall I hit, in order.&lt;/p&gt;

&lt;h3&gt;
  
  
  The folder was named wrong
&lt;/h3&gt;

&lt;p&gt;My templates folder was named &lt;code&gt;template&lt;/code&gt; — without the &lt;strong&gt;s&lt;/strong&gt;. The CLI was looking for &lt;code&gt;templates/nginx.conf.j2&lt;/code&gt; and kept throwing a &lt;code&gt;TemplateNotFound&lt;/code&gt; error. I spent more time than I'd like to admit staring at that error before noticing the missing letter.&lt;/p&gt;

&lt;h3&gt;
  
  
  The file was named wrong too
&lt;/h3&gt;

&lt;p&gt;With the folder name fixed, the next problem surfaced: the nginx config template was named &lt;code&gt;nginx.config.j2&lt;/code&gt; instead of &lt;code&gt;nginx.conf.j2&lt;/code&gt;. Config versus conf: two extra letters making the whole thing fail silently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Windows doesn't have chmod
&lt;/h3&gt;

&lt;p&gt;Running &lt;code&gt;chmod +x swiftdeploy&lt;/code&gt; in PowerShell throws an error. On Windows, you just run &lt;code&gt;python swiftdeploy &amp;lt;command&amp;gt;&lt;/code&gt; directly, with no executable bit to set. This caught me off guard because the instructions assumed a Linux environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Dockerfile wasn't saving
&lt;/h3&gt;

&lt;p&gt;This one was the most frustrating. I edited the Dockerfile in VSCode multiple times, but the changes weren't persisting. The tab showed unsaved changes that I kept missing. Every &lt;code&gt;docker build&lt;/code&gt; was using the old version of the file, and &lt;code&gt;pip install&lt;/code&gt; was installing nothing because the COPY paths were wrong.&lt;/p&gt;

&lt;p&gt;The fix was bypassing VSCode entirely and writing the file content directly from the PowerShell terminal using &lt;code&gt;Out-File&lt;/code&gt;. Once the file was written programmatically, the builds started working correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  app/requirements.txt was empty
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;requirements.txt&lt;/code&gt; inside the &lt;code&gt;app/&lt;/code&gt; folder had been created but was completely empty. Because &lt;code&gt;pip install -r&lt;/code&gt; on an empty requirements file succeeds without error, the container built cleanly but had no packages installed. &lt;code&gt;fastapi&lt;/code&gt; and &lt;code&gt;uvicorn&lt;/code&gt; were both missing, and the container crashed on startup with &lt;code&gt;No module named uvicorn&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I only caught this by running &lt;code&gt;docker run --rm swift-deploy-1-node:latest pip list&lt;/code&gt; and seeing nothing but &lt;code&gt;pip&lt;/code&gt; in the output. The fix was writing the dependencies directly from the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="s2"&gt;"fastapi==0.111.0&lt;/span&gt;&lt;span class="se"&gt;`n&lt;/span&gt;&lt;span class="s2"&gt;uvicorn[standard]==0.29.0"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Out-File&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-FilePath&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;app\requirements.txt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Encoding&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;utf8&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
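&lt;p&gt;Because an empty requirements file installs cleanly, a check has to look at the content, not just the exit code. Here is a minimal sketch of such a check; &lt;code&gt;has_requirements&lt;/code&gt; is a hypothetical helper, not something SwiftDeploy ships.&lt;/p&gt;

```python
from pathlib import Path

def has_requirements(path):
    """Return True if the file lists at least one requirement."""
    # utf-8-sig also accepts files that start with a UTF-8 byte order mark.
    text = Path(path).read_text(encoding="utf-8-sig")
    lines = (line.strip() for line in text.splitlines())
    # Blank lines and comment lines do not count as requirements.
    return any(line and not line.startswith("#") for line in lines)
```

&lt;p&gt;Running something like this before &lt;code&gt;docker build&lt;/code&gt; would have flagged the empty &lt;code&gt;app/requirements.txt&lt;/code&gt; before the container ever started.&lt;/p&gt;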



&lt;h3&gt;
  
  
  Docker kept caching broken layers
&lt;/h3&gt;

&lt;p&gt;Even after fixing the Dockerfile and the requirements file, Docker kept serving the old cached image. The fix was force-removing the image entirely and rebuilding with &lt;code&gt;--no-cache&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;docker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;rmi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-f&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;swift-deploy-1-node:latest&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;docker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--no-cache&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;swift-deploy-1-node:latest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Port 3000 was already allocated
&lt;/h3&gt;

&lt;p&gt;My Stage 2 project containers were still running in the background and had port 3000 allocated. Every attempt to test the app container on port 3000 failed with &lt;code&gt;Bind for 0.0.0.0:3000 failed: port is already allocated&lt;/code&gt;. The fix was simply stopping the Stage 2 frontend container temporarily.&lt;/p&gt;
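&lt;p&gt;A port conflict like this can be detected before Docker is involved at all. The sketch below shows one common way to do it with the standard &lt;code&gt;socket&lt;/code&gt; module. The function name is an assumption, and the real check inside SwiftDeploy may work differently.&lt;/p&gt;

```python
import socket

def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Another process (or container port mapping) already holds it.
            return False
    return True
```

&lt;p&gt;With the Stage 2 frontend still running, &lt;code&gt;port_is_free(3000)&lt;/code&gt; would be expected to return &lt;code&gt;False&lt;/code&gt;, which is a friendlier signal than a failed container start.&lt;/p&gt;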

&lt;h3&gt;
  
  
  Nginx upstream validation broke pre-deployment
&lt;/h3&gt;

&lt;p&gt;The validate command tests nginx syntax by spinning up a temporary nginx container. But before the stack is running, the &lt;code&gt;app&lt;/code&gt; hostname doesn't exist on any Docker network, so nginx reports &lt;code&gt;host not found in upstream&lt;/code&gt;. This looks like a failure but is completely expected — the config is syntactically correct, the hostname just doesn't resolve yet.&lt;/p&gt;

&lt;p&gt;The fix was updating the validate logic to treat this specific error as a pass, not a failure. Any other nginx error would still cause validation to fail.&lt;/p&gt;
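&lt;p&gt;The rule "this one error is fine, everything else fails" can be sketched as a small pure function. This is an illustration, not the actual SwiftDeploy code: the function name is hypothetical, and it assumes nginx marks fatal configuration errors with &lt;code&gt;[emerg]&lt;/code&gt; in its output.&lt;/p&gt;

```python
EXPECTED_ERROR = "host not found in upstream"

def nginx_check_passes(returncode, stderr):
    """Decide whether an `nginx -t` style result should count as a pass."""
    if returncode == 0:
        return True
    # Keep only the lines nginx marks as fatal configuration errors.
    fatal = [line for line in stderr.splitlines() if "[emerg]" in line]
    # Pass only when every fatal line is the expected unresolved upstream.
    return bool(fatal) and all(EXPECTED_ERROR in line for line in fatal)
```

&lt;p&gt;Keeping the decision separate from the code that launches the temporary container makes it easy to test with captured output strings.&lt;/p&gt;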




&lt;h2&gt;
  
  
  The Moment It Worked
&lt;/h2&gt;

&lt;p&gt;After all of that, here is what the final deploy output looked like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;▶  swiftdeploy deploy
  ✔  nginx.conf generated
  ✔  docker-compose.yml generated
  ✔  manifest.yaml exists and is valid YAML
  ✔  All required manifest fields present and non-empty
  ✔  Docker image exists locally: swift-deploy-1-node:latest
  ✔  Nginx port 8080 is free
  ✔  nginx.conf is syntactically valid
✔  All checks passed — stack is ready to deploy
  ➜  Bringing up the stack…
  ✔  Container swiftdeploy-app-1    Healthy
  ✔  Container swiftdeploy-nginx-1  Started
  ✔  Stack is healthy → http://localhost:8080
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And hitting &lt;code&gt;http://localhost:8080&lt;/code&gt; in the browser returned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Welcome to SwiftDeploy API — running in stable mode"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stable"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.0.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-02T17:44:59.804140+00:00"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then promoting to canary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;▶  swiftdeploy promote canary
  ✔  manifest.yaml updated → mode: canary
  ✔  docker-compose.yml regenerated
  ➜  Restarting app container…
  ✔  Service healthy after promote → http://localhost:8080/healthz
  ➜  Active mode confirmed: canary
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That moment — seeing &lt;code&gt;Active mode confirmed: canary&lt;/code&gt; in the terminal — felt genuinely satisfying after everything it took to get there.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Declarative infrastructure is powerful but unforgiving.&lt;/strong&gt; When the manifest is the single source of truth, every typo and every wrong path has consequences. But when it works, the elegance is undeniable — one file describes everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Always verify your file saves.&lt;/strong&gt; On Windows especially, VSCode unsaved changes are easy to miss. When something isn't working despite your edits, verify the file content from the terminal before assuming the code is wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker caching is a double-edged sword.&lt;/strong&gt; It speeds up builds dramatically, but when you're debugging image content, it can hide your fixes behind stale layers. &lt;code&gt;--no-cache&lt;/code&gt; should be your first instinct when something is inexplicably wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Empty files fail silently.&lt;/strong&gt; An empty &lt;code&gt;requirements.txt&lt;/code&gt; is not an error — it's a valid file with no dependencies. Always verify what's actually inside your files, not just that they exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pre-flight validation saves deployments.&lt;/strong&gt; The five checks in the validate command caught real problems before they reached production. The nginx upstream check in particular required nuanced handling — not every nginx error is a real error.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;SwiftDeploy is not a perfect tool. But it works. It deploys a full stack from a single manifest, handles canary deployments with a single command, and validates itself before touching anything.&lt;/p&gt;

&lt;p&gt;More importantly, every challenge I hit while building it taught me something real about how infrastructure tools work — and why the details matter so much.&lt;/p&gt;

&lt;p&gt;If you're working through HNG14 or any similar programme, my advice is simple: document your failures as carefully as your successes. The graders can tell the difference between someone who got lucky and someone who actually understands what they built.&lt;/p&gt;

&lt;p&gt;Good luck out there. 🚀&lt;/p&gt;




&lt;p&gt;Here's the repo in case you'd like to clone it and replicate the setup:&lt;br&gt;
GitHub: &lt;a href="https://github.com/ntonous/hng14-stage4-taask.git" rel="noopener noreferrer"&gt;https://github.com/ntonous/hng14-stage4-taask.git&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Built with Python, FastAPI, Nginx, Docker, Jinja2, and a lot of patience.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;HNG14 Stage 4A — DevOps Track&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>fastapi</category>
      <category>nginx</category>
      <category>docker</category>
    </item>
    <item>
      <title>How I Built a Real-Time DDoS Detection Engine from Scratch</title>
      <dc:creator>Hezekiah Umoh</dc:creator>
      <pubDate>Tue, 05 May 2026 20:25:39 +0000</pubDate>
      <link>https://dev.to/hezekiah_umoh/how-i-built-a-real-time-ddos-detection-engine-from-scratch-11ei</link>
      <guid>https://dev.to/hezekiah_umoh/how-i-built-a-real-time-ddos-detection-engine-from-scratch-11ei</guid>
      <description>&lt;h1&gt;
  
  
  How I Built a Real-Time DDoS Detection Engine from Scratch (No Fail2Ban, No Libraries)
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;A beginner-friendly walkthrough of how I built a system that watches live web traffic, learns what "normal" looks like, and automatically blocks attackers — all from scratch using Python.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Project Exists
&lt;/h2&gt;

&lt;p&gt;Imagine you run a cloud storage platform. Thousands of users upload and download files every day. Then one morning, a single IP address starts sending &lt;strong&gt;500 requests per second&lt;/strong&gt; to your server — way more than any normal user would ever send.&lt;/p&gt;

&lt;p&gt;Your server starts slowing down. Real users can't log in. Files won't upload. Your platform is under attack.&lt;/p&gt;

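&lt;p&gt;This is a &lt;strong&gt;denial-of-service attack&lt;/strong&gt;. When the flood comes from many machines at once instead of a single IP, it is called a &lt;strong&gt;DDoS attack&lt;/strong&gt; (Distributed Denial of Service). Either way, the goal is simple: flood your server with so much traffic that it can't serve real users anymore.&lt;/p&gt;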

&lt;p&gt;My job in this project was to build a tool that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Watches all incoming traffic in real time&lt;/li&gt;
&lt;li&gt;Learns what normal traffic looks like&lt;/li&gt;
&lt;li&gt;Detects when something is wrong&lt;/li&gt;
&lt;li&gt;Automatically blocks the attacker&lt;/li&gt;
&lt;li&gt;Sends a Slack alert so the team knows what happened&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And I had to do it &lt;strong&gt;without&lt;/strong&gt; using Fail2Ban or any rate-limiting library. Everything had to be built from scratch.&lt;/p&gt;

&lt;p&gt;Let's walk through how it works — step by step.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Big Picture
&lt;/h2&gt;

&lt;p&gt;Before diving into code, here's what the system looks like at a high level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Internet Traffic
      ↓
   Nginx (reverse proxy)
      ↓ writes JSON logs
   /var/log/nginx/hng-access.log
      ↓ tailed continuously
   monitor.py (sliding windows)
      ↓ feeds counts
   baseline.py (learns normal)
      ↓ compares
   detector.py (flags anomalies)
      ↓ if anomaly found
   blocker.py → iptables DROP rule
   notifier.py → Slack alert
   audit.py → audit log
      ↓ always running
   dashboard.py → live web UI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every component runs as a &lt;strong&gt;daemon&lt;/strong&gt; — a background process that never stops. It's not a cron job that runs once a minute. It's always watching, always learning.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: Watching the Logs (monitor.py)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Nginx doing?
&lt;/h3&gt;

&lt;p&gt;Nginx is a web server that sits in front of our Nextcloud application. Every time someone makes a request — loading a page, uploading a file, logging in — Nginx writes a line to an &lt;strong&gt;access log&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I configured Nginx to write logs in JSON format so they're easy to parse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source_ip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"45.33.10.5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2024-01-15T12:34:56+00:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/index.php"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"response_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4521&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line per request. Millions of lines per day on a busy server.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do we read the log in real time?
&lt;/h3&gt;

&lt;p&gt;You know how &lt;code&gt;tail -f&lt;/code&gt; in Linux shows you new lines as they appear in a file? That's exactly what &lt;code&gt;monitor.py&lt;/code&gt; does — but in Python.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tail_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seek&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# jump to end of file — skip old history
&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readline&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_line&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;   &lt;span class="c1"&gt;# send to main loop
&lt;/span&gt;            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# no new data, wait a moment
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key line is &lt;code&gt;fh.seek(0, 2)&lt;/code&gt; — this moves our reading position to the &lt;strong&gt;end&lt;/strong&gt; of the file when we start. We don't want to process yesterday's logs, just new traffic from this moment forward.&lt;/p&gt;

&lt;p&gt;Then we loop forever: read a line, parse it, yield the result. The &lt;code&gt;yield&lt;/code&gt; makes this a &lt;strong&gt;generator&lt;/strong&gt; — it produces one request at a time for the main detection loop to process.&lt;/p&gt;
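&lt;p&gt;The &lt;code&gt;parse_line&lt;/code&gt; helper is not shown in this post. A minimal sketch of what it might look like is below, assuming the JSON log format from earlier. The list of required fields is my own guess for illustration, not the project's actual rule.&lt;/p&gt;

```python
import json

# Assumed minimum set of fields; the real parser may require others.
REQUIRED_FIELDS = ("source_ip", "timestamp", "status")

def parse_line(line):
    """Return the log entry as a dict, or None if the line is unusable."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        # Partially written or corrupted lines are skipped, not fatal.
        return None
    if not isinstance(entry, dict):
        return None
    if any(field not in entry for field in REQUIRED_FIELDS):
        return None
    return entry
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for bad input fits the &lt;code&gt;if parsed:&lt;/code&gt; check in &lt;code&gt;tail_log&lt;/code&gt;, so one malformed line never stops the daemon.&lt;/p&gt;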

&lt;h3&gt;
  
  
  The Sliding Window — tracking who's doing what
&lt;/h3&gt;

&lt;p&gt;Now here's where it gets interesting. For every request that comes in, we need to answer: "How many requests has this IP sent in the last 60 seconds?"&lt;/p&gt;

&lt;p&gt;The naive approach would be to count all requests and reset every minute. But that has a problem: what if someone sends 100 requests at 11:59:59 and 100 more at 12:00:00? A per-minute counter would show 100 for each minute and miss the fact that 200 requests arrived within about a second.&lt;/p&gt;

&lt;p&gt;The right approach is a &lt;strong&gt;sliding window&lt;/strong&gt; using a &lt;code&gt;deque&lt;/code&gt; (double-ended queue).&lt;/p&gt;

&lt;p&gt;Think of a deque like a conveyor belt. New requests go on the right. Old requests fall off the left. The length of the belt is always exactly 60 seconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;deque&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;

&lt;span class="n"&gt;WINDOW&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;  &lt;span class="c1"&gt;# seconds
&lt;/span&gt;
&lt;span class="n"&gt;global_window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;deque&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;              &lt;span class="c1"&gt;# all requests
&lt;/span&gt;&lt;span class="n"&gt;ip_windows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deque&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# per-IP requests
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Add this request to the right of both deques
&lt;/span&gt;    &lt;span class="n"&gt;global_window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ip_windows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Evict entries older than 60 seconds from the left
&lt;/span&gt;    &lt;span class="n"&gt;cutoff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;WINDOW&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;global_window&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;global_window&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;cutoff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;global_window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;popleft&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;dq&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ip_windows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;dq&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;dq&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;cutoff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;dq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;popleft&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every entry in the deque is just a &lt;strong&gt;timestamp&lt;/strong&gt;, so the current rate (requests per 60-second window) is simply the length of the deque:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ip_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip_windows&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;45.33.10.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;   &lt;span class="c1"&gt;# requests from this IP in last 60s
&lt;/span&gt;&lt;span class="n"&gt;global_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;global_window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# all requests in last 60s
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No division needed. No rounding errors. Just count how many timestamps are still in the window.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 2: Learning What "Normal" Looks Like (baseline.py)
&lt;/h2&gt;

&lt;p&gt;Here's a critical insight: &lt;strong&gt;you can't hardcode what "too many requests" means&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;At 3am, getting 5 requests per second might be unusual. At noon, getting 50 requests per second might be perfectly normal. If you hardcode a threshold of "more than 20 req/s = attack", you'll get false alarms every day at noon and miss attacks at night.&lt;/p&gt;

&lt;p&gt;The solution is a &lt;strong&gt;rolling baseline&lt;/strong&gt; — the system learns what normal looks like by watching recent traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  How the baseline is calculated
&lt;/h3&gt;

&lt;p&gt;Every second, we record how many requests came in that second:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
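&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;deque&lt;/span&gt;

&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;deque&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="c1"&gt;# stores (timestamp, count, error_count)
&lt;/span&gt;&lt;span class="n"&gt;_current_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;    &lt;span class="c1"&gt;# requests seen in the current second
&lt;/span&gt;&lt;span class="n"&gt;_current_errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;   &lt;span class="c1"&gt;# error responses seen in the current second
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;record_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Module-level counters must be declared global before reassignment
&lt;/span&gt;    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_current_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_current_errors&lt;/span&gt;
    &lt;span class="n"&gt;_current_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;is_error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;_current_errors&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;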
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every second, we flush the current count into our history:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
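&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_flush&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_current_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_current_errors&lt;/span&gt;
    &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_current_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_current_errors&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="c1"&gt;# Reset the counters so the next second starts from zero
&lt;/span&gt;    &lt;span class="n"&gt;_current_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_current_errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;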

    &lt;span class="c1"&gt;# Remove data older than 30 minutes
&lt;/span&gt;    &lt;span class="n"&gt;cutoff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1800&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;cutoff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;popleft&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every 60 seconds, we recalculate the baseline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
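&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqrt&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_compute&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="c1"&gt;# Nothing recorded yet: keep the previous baseline and avoid dividing by zero
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;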

    &lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;variance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;variance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;baseline&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# never go below floor value
&lt;/span&gt;    &lt;span class="n"&gt;baseline&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;std&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# never go below floor value
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
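&lt;p&gt;As a quick sanity check of the math, here is the same calculation written with the standard &lt;code&gt;statistics&lt;/code&gt; module and run on two made-up samples. &lt;code&gt;compute_baseline&lt;/code&gt; and the sample lists are illustrative names and values; the floors are the ones used above.&lt;/p&gt;

```python
from statistics import fmean, pstdev

MEAN_FLOOR = 1.0  # same floors as in the snippet above
STD_FLOOR = 0.5

def compute_baseline(counts):
    """Return (mean, std) of per-second counts, with the floors applied."""
    mean = fmean(counts)
    std = pstdev(counts)  # population std: divides by len(counts)
    return max(mean, MEAN_FLOOR), max(std, STD_FLOOR)

busy = [10, 12, 8, 10]   # hypothetical per-second request counts
quiet = [0, 0, 1, 0]     # a nearly idle server

print(compute_baseline(busy))
print(compute_baseline(quiet))
```

&lt;p&gt;The busy sample gives a mean of 10 and a standard deviation of about 1.41. The quiet sample would produce 0.25 and roughly 0.43, so both values get clamped to the floors of 1.0 and 0.5. Without the floors, a single request on an idle server could look like a huge anomaly.&lt;/p&gt;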



&lt;h3&gt;
  
  
  The per-hour slot trick
&lt;/h3&gt;

&lt;p&gt;Traffic patterns change throughout the day. Morning rush hour is different from midnight. So instead of one global rolling average, we keep &lt;strong&gt;per-hour slots&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;hourly&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# { hour_of_day -&amp;gt; [counts] }
&lt;/span&gt;
&lt;span class="c1"&gt;# When adding a sample:
&lt;/span&gt;&lt;span class="n"&gt;hour&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;localtime&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;tm_hour&lt;/span&gt;
&lt;span class="n"&gt;hourly&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When computing the baseline, we prefer the current hour's data if it has enough samples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;current_hour&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;localtime&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;tm_hour&lt;/span&gt;
&lt;span class="n"&gt;hour_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hourly&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_hour&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hour_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hour_data&lt;/span&gt;        &lt;span class="c1"&gt;# use today's 2pm data to judge 2pm traffic
&lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;full_30min_window&lt;/span&gt;   &lt;span class="c1"&gt;# not enough hour data yet, use rolling window
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means that at 2pm, the baseline reflects what 2pm traffic normally looks like, not the quiet 3am traffic from 11 hours earlier.&lt;/p&gt;
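
&lt;p&gt;To make the selection concrete, here is a minimal, self-contained sketch of how the baseline could be computed from the hourly slots. The function name &lt;code&gt;compute_baseline&lt;/code&gt;, its parameters, and the choice of the population standard deviation are assumptions for illustration. The floors (1.0 and 0.5) and the 10-sample cutoff come from the snippets above.&lt;/p&gt;

```python
from collections import defaultdict
from statistics import fmean, pstdev

MIN_HOUR_SAMPLES = 10   # cutoff used in the snippet above
MEAN_FLOOR = 1.0        # floors from the baseline snippet
STD_FLOOR = 0.5


def compute_baseline(hourly, rolling_window, current_hour):
    """Return (mean, std), preferring the current hour's samples."""
    hour_data = hourly.get(current_hour, [])
    data = hour_data if len(hour_data) >= MIN_HOUR_SAMPLES else list(rolling_window)
    if not data:
        # No traffic seen yet: fall back to the floors.
        return MEAN_FLOOR, STD_FLOOR
    return max(fmean(data), MEAN_FLOOR), max(pstdev(data), STD_FLOOR)


hourly = defaultdict(list)
hourly[14] = [10, 12, 11, 9, 10, 13, 10, 11, 12, 10]   # ten samples for 2pm
rolling = [2, 3, 2, 4]                                  # quieter recent window

print(compute_baseline(hourly, rolling, 14))   # uses the 2pm slot
print(compute_baseline(hourly, rolling, 3))    # too few 3am samples: rolling window
```

&lt;p&gt;Passing the hour in as an argument, rather than calling &lt;code&gt;time.localtime()&lt;/code&gt; inside the function, keeps the sketch easy to test.&lt;/p&gt;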




&lt;h2&gt;
  
  
  Part 3: Detecting Attacks (detector.py)
&lt;/h2&gt;

&lt;p&gt;Now we have two inputs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;current_rate&lt;/code&gt; — how many requests this IP sent in the last 60 seconds&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;baseline_mean&lt;/code&gt; and &lt;code&gt;baseline_std&lt;/code&gt; — what normal looks like&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question is: how different does the current rate need to be before we call it an attack?&lt;/p&gt;

&lt;h3&gt;
  
  
  Z-score: the statistical approach
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;z-score&lt;/strong&gt; tells you how many standard deviations away from the mean a value is. The formula is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_value&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;standard_deviation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mean = 10 req/s, Std = 2 req/s&lt;/li&gt;
&lt;li&gt;Current rate = 16 req/s&lt;/li&gt;
&lt;li&gt;Z-score = (16 - 10) / 2 = &lt;strong&gt;3.0&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

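&lt;p&gt;A z-score of 3.0 means the value is 3 standard deviations above the mean. If request rates were normally distributed, a value that far above the mean would occur by chance only about 0.13% of the time (about 0.27% if you count both tails). That's suspicious.&lt;br&gt;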
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;detect_ip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip_rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ip_error_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;baseline_error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip_rate&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;

    &lt;span class="c1"&gt;# Check z-score first
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;z-score=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Also check raw multiplier (catches big jumps when a large std keeps z low)
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ip_rate&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ip_rate&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;req/s &amp;gt; 5x baseline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use &lt;strong&gt;two conditions&lt;/strong&gt; because they catch different attack patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The z-score flags rates that are unusual relative to normal variance, so it is most sensitive when traffic is steady&lt;/li&gt;
&lt;li&gt;The 5x multiplier flags large jumps even when variance is high and the z-score stays below 3.0&lt;/li&gt;
&lt;/ul&gt;
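
&lt;p&gt;A small worked example may help show which rule fires in which traffic regime. The helper &lt;code&gt;which_rule_fires&lt;/code&gt; is hypothetical and exists only for this illustration. The limits (3.0 and 5.0) are the ones used in &lt;code&gt;detect_ip&lt;/code&gt; above.&lt;/p&gt;

```python
def which_rule_fires(ip_rate, mean, std, z_limit=3.0, mult=5.0):
    """Return the names of the rules that would flag this rate."""
    fired = []
    if (ip_rate - mean) / std > z_limit:
        fired.append("z-score")
    if ip_rate > mean * mult:
        fired.append("multiplier")
    return fired


# Steady traffic (small std): a modest rise already has z = 4.0.
print(which_rule_fires(ip_rate=16, mean=10, std=1.5))   # ['z-score']

# Bursty traffic (large std): z = 2.25, but the rate is over 5x the mean.
print(which_rule_fires(ip_rate=55, mean=10, std=20))    # ['multiplier']
```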

&lt;h3&gt;
  
  
  Error surge tightening
&lt;/h3&gt;

&lt;p&gt;Here's a clever trick: if an IP is generating lots of 404 errors or failed login attempts (4xx/5xx responses), it's probably a scanner or brute-force attack. We tighten the thresholds automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# If IP's error rate &amp;gt; 3x the baseline error rate...
&lt;/span&gt;&lt;span class="n"&gt;error_surge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ip_error_rate&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;baseline_error_rate&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;error_surge&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;z_limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;    &lt;span class="c1"&gt;# tighter threshold (was 3.0)
&lt;/span&gt;    &lt;span class="n"&gt;mult&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;3.0&lt;/span&gt;    &lt;span class="c1"&gt;# tighter multiplier (was 5.0)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means suspicious IPs get caught faster, even if their total request rate isn't extreme yet.&lt;/p&gt;
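
&lt;p&gt;One way the tightened limits could feed into detection is sketched below. This is not the project's actual code. The &lt;code&gt;thresholds&lt;/code&gt; helper is hypothetical, and the error rates are assumed to be plain fractions of requests that returned 4xx/5xx, since the post does not show how they are computed.&lt;/p&gt;

```python
def thresholds(ip_error_rate, baseline_error_rate):
    """Return (z_limit, mult), tightened when the IP's error rate surges."""
    error_surge = ip_error_rate > 3 * baseline_error_rate
    return (2.0, 3.0) if error_surge else (3.0, 5.0)


def detect_ip(ip_rate, mean, std, ip_error_rate=0.0, baseline_error_rate=0.0):
    z_limit, mult = thresholds(ip_error_rate, baseline_error_rate)
    z = (ip_rate - mean) / std
    if z > z_limit:
        return True, f"z-score={z:.2f}>{z_limit}"
    if ip_rate > mean * mult:
        return True, f"{ip_rate:.1f}req/s > {mult}x baseline"
    return False, None


# 15 req/s against mean 10 and std 2 gives z = 2.5.
print(detect_ip(15, 10, 2))                      # under the normal limit of 3.0
print(detect_ip(15, 10, 2, ip_error_rate=0.6,
                baseline_error_rate=0.1))        # flagged at the tighter limit of 2.0
```

&lt;p&gt;Note that with a baseline error rate of 0, any error at all counts as a surge in this sketch, so a small floor on the baseline error rate may be worth adding.&lt;/p&gt;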




&lt;h2&gt;
  
  
  Part 4: Blocking the Attacker (blocker.py)
&lt;/h2&gt;

&lt;p&gt;Once we detect an anomaly, we need to block the IP within &lt;strong&gt;10 seconds&lt;/strong&gt;. We use &lt;code&gt;iptables&lt;/code&gt; — Linux's built-in firewall.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;block_ip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;baseline_mean&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Add a DROP rule at the top of the INPUT chain
&lt;/span&gt;    &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iptables&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-I&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INPUT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# source IP
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-j&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DROP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;     &lt;span class="c1"&gt;# drop all packets from this IP
&lt;/span&gt;    &lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



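&lt;p&gt;The &lt;code&gt;-I INPUT 1&lt;/code&gt; means "insert at position 1", the very top of the &lt;code&gt;INPUT&lt;/code&gt; chain. Because iptables evaluates rules in order, the DROP rule matches before any existing ACCEPT rule, so the block applies to every subsequent packet from that IP.&lt;/p&gt;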

&lt;h3&gt;
  
  
  The backoff schedule
&lt;/h3&gt;

&lt;p&gt;We don't permanently ban IPs on the first offense — they might be a misconfigured bot, not a malicious attacker. Instead, we use a &lt;strong&gt;backoff schedule&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Offense&lt;/th&gt;
&lt;th&gt;Ban Duration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1st&lt;/td&gt;
&lt;td&gt;10 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2nd&lt;/td&gt;
&lt;td&gt;30 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3rd&lt;/td&gt;
&lt;td&gt;2 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4th+&lt;/td&gt;
&lt;td&gt;Permanent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;BAN_SCHEDULE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="c1"&gt;# seconds (-1 = permanent)
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_duration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;offense_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ban_count&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;offense_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BAN_SCHEDULE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BAN_SCHEDULE&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;ban_count&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;offense_count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a ban expires, &lt;code&gt;unblock_expired()&lt;/code&gt; removes the iptables rule automatically.&lt;/p&gt;
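
&lt;p&gt;The post does not show &lt;code&gt;unblock_expired()&lt;/code&gt; itself, so the following is only a sketch of what it might look like. The &lt;code&gt;banned&lt;/code&gt; dictionary, the &lt;code&gt;now&lt;/code&gt; and &lt;code&gt;run&lt;/code&gt; parameters, and the return value are assumptions. The one real piece is &lt;code&gt;iptables -D&lt;/code&gt;, which deletes the rule matching the given specification.&lt;/p&gt;

```python
import subprocess
import time

# ip -> expiry timestamp (None = permanent). Hypothetical bookkeeping.
banned = {}


def unblock_expired(now=None, run=subprocess.run):
    """Delete DROP rules whose ban has expired; return the IPs released."""
    now = time.time() if now is None else now
    released = [ip for ip, expiry in banned.items()
                if expiry is not None and now >= expiry]
    for ip in released:
        # -D deletes the rule that matches this exact specification.
        run(["iptables", "-D", "INPUT", "-s", ip, "-j", "DROP"], check=False)
        del banned[ip]
    return released


# Dry run with a fake runner, so no firewall is touched.
calls = []
banned.update({"203.0.113.7": 100.0, "198.51.100.9": None})
print(unblock_expired(now=150.0, run=lambda cmd, **kw: calls.append(cmd)))
print(calls)
```

&lt;p&gt;Injecting the runner makes it possible to exercise the expiry logic without root privileges. The permanent ban (expiry &lt;code&gt;None&lt;/code&gt;) is never released.&lt;/p&gt;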




&lt;h2&gt;
  
  
  Part 5: Slack Alerts (notifier.py)
&lt;/h2&gt;

&lt;p&gt;The team needs to know when something happens. We send structured Slack messages for every ban, unban, and global anomaly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_ban&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;condition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;baseline_mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:rotating_light: *IP BANNED*&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*IP:* `&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;`&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*Condition:* &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;condition&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*Rate:* &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;rate&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; req/s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*Baseline:* &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;baseline_mean&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; req/s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*Duration:* &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*Time:* &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# timeout keeps a slow Slack call from stalling the detection loop
&lt;/span&gt;    &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WEBHOOK_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The webhook URL is stored as an &lt;strong&gt;environment variable&lt;/strong&gt; — never hardcoded in source code. This is important for security: if you accidentally push your code to GitHub, your webhook won't be exposed.&lt;/p&gt;
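
&lt;p&gt;Reading the webhook from the environment can be as small as the sketch below. The variable name &lt;code&gt;SLACK_WEBHOOK_URL&lt;/code&gt; and the helper &lt;code&gt;get_webhook_url&lt;/code&gt; are assumptions, not names taken from the project.&lt;/p&gt;

```python
import os


def get_webhook_url(env=os.environ):
    """Return the Slack webhook URL, or None when alerts are not configured."""
    # SLACK_WEBHOOK_URL is an assumed variable name, not taken from the project.
    url = env.get("SLACK_WEBHOOK_URL", "").strip()
    return url or None


print(get_webhook_url({"SLACK_WEBHOOK_URL": "https://example.invalid/hook"}))
print(get_webhook_url({}))   # None: the caller can skip sending alerts
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; instead of raising lets the detector keep blocking attackers even when Slack alerts are not set up.&lt;/p&gt;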




&lt;h2&gt;
  
  
  Part 6: The Live Dashboard (dashboard.py)
&lt;/h2&gt;

&lt;p&gt;The dashboard is a Flask web app that shows live metrics and refreshes every 3 seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;home&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    &amp;lt;html&amp;gt;
    &amp;lt;head&amp;gt;&amp;lt;meta http-equiv=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refresh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; content=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&amp;lt;/head&amp;gt;
    &amp;lt;body&amp;gt;
        &amp;lt;h1&amp;gt;Global Req/s: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;get_global_rate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/h1&amp;gt;
        &amp;lt;h2&amp;gt;Baseline Mean: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;baseline&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/h2&amp;gt;
        &amp;lt;h2&amp;gt;Banned IPs: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;get_blocked_list&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/h2&amp;gt;
        &amp;lt;!-- ... more stats ... --&amp;gt;
    &amp;lt;/body&amp;gt;
    &amp;lt;/html&amp;gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;&amp;lt;meta http-equiv="refresh" content="3"&amp;gt;&lt;/code&gt; tag makes the browser automatically reload every 3 seconds — no JavaScript needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 7: Putting It All Together (main.py)
&lt;/h2&gt;

&lt;p&gt;The main loop ties everything together. It's beautifully simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Start background threads
&lt;/span&gt;&lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;baseline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;daemon&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dashboard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;daemon&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Main detection loop
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;tail_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_file&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Add to sliding windows
&lt;/span&gt;    &lt;span class="nf"&gt;add_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;baseline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Unban expired IPs
&lt;/span&gt;    &lt;span class="n"&gt;blocker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unblock_expired&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Get current stats
&lt;/span&gt;    &lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;baseline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;baseline&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;baseline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;baseline&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;std&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;ip_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_ip_rate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 4. Check for IP anomaly
&lt;/span&gt;    &lt;span class="n"&gt;anomaly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;detect_ip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip_rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;anomaly&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;blocker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;block_ip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ip_rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 5. Check for global anomaly
&lt;/span&gt;    &lt;span class="n"&gt;global_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_global_rate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;g_anomaly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g_reason&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;detect_global&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;global_rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;g_anomaly&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;notifier&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_global_alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g_reason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;global_rate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The loop runs once for every request that Nginx logs, usually in milliseconds, and checks whether that request is part of an attack.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deploying with Docker
&lt;/h2&gt;

&lt;p&gt;The entire stack runs in Docker containers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;nextcloud&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    &lt;span class="c1"&gt;# the actual app (pre-built image, not modified)&lt;/span&gt;
  &lt;span class="na"&gt;nginx&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;        &lt;span class="c1"&gt;# reverse proxy + JSON logging&lt;/span&gt;
  &lt;span class="na"&gt;detector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;     &lt;span class="c1"&gt;# our Python daemon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Nginx logs are shared via a &lt;strong&gt;named Docker volume&lt;/strong&gt; called &lt;code&gt;HNG-nginx-logs&lt;/code&gt;. Nginx writes to it, and our detector reads from it — even though they're in separate containers.&lt;/p&gt;

&lt;p&gt;The detector runs with &lt;code&gt;network_mode: host&lt;/code&gt; and &lt;code&gt;privileged: true&lt;/code&gt; so that iptables commands affect the actual host machine's firewall, not just the container's network namespace.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Never hardcode thresholds.&lt;/strong&gt; What's "too many requests" depends entirely on your traffic patterns. Build a system that learns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Deques are perfect for sliding windows.&lt;/strong&gt; Python's &lt;code&gt;collections.deque&lt;/code&gt; gives O(1) appends and pops at both ends. Note that &lt;code&gt;maxlen&lt;/code&gt; caps the number of items, not their age, so a time-based window needs manual eviction of expired timestamps from the left.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Two detection methods are better than one.&lt;/strong&gt; The z-score catches rates that are unusual relative to normal variance. The rate multiplier catches large jumps that a high variance would otherwise hide. Together they cover more attack patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Store secrets in environment variables.&lt;/strong&gt; Never commit API keys, webhook URLs, or passwords to git. Use &lt;code&gt;.env&lt;/code&gt; files that are gitignored.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Daemons beat cron jobs.&lt;/strong&gt; A continuously running daemon reacts as soon as a log line arrives. A cron job that runs once a minute may not look at a 30-second attack until it is already over.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;After all this work, here's what the live dashboard looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Global Req/s updating in real time&lt;/li&gt;
&lt;li&gt;Baseline Mean and Std Dev learned from actual traffic&lt;/li&gt;
&lt;li&gt;Active bans with conditions and durations&lt;/li&gt;
&lt;li&gt;Top 10 source IPs&lt;/li&gt;
&lt;li&gt;Audit log showing every ban, unban, and baseline recalculation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When an attack comes in, the sequence is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Request arrives → sliding window updated&lt;/li&gt;
&lt;li&gt;Z-score computed → exceeds 3.0&lt;/li&gt;
&lt;li&gt;iptables rule added within 10 seconds&lt;/li&gt;
&lt;li&gt;Slack alert sent to team&lt;/li&gt;
&lt;li&gt;Audit log entry written&lt;/li&gt;
&lt;li&gt;Dashboard updates to show new ban&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All of this happens automatically, 24/7, without any human intervention.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.python.org/3/library/collections.html#collections.deque" rel="noopener noreferrer"&gt;Python deque documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.frozentux.net/iptables-tutorial/iptables-tutorial.html" rel="noopener noreferrer"&gt;iptables beginner guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.statisticshowto.com/probability-and-statistics/z-score/" rel="noopener noreferrer"&gt;Z-score explained simply&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://flask.palletsprojects.com/en/3.0.x/quickstart/" rel="noopener noreferrer"&gt;Flask quickstart&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.docker.com/compose/" rel="noopener noreferrer"&gt;Docker Compose documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built for HNG Internship Stage 3 — DevOps Track&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://github.com/ntonous/hng14-stage3-ddos-detector.git" rel="noopener noreferrer"&gt;https://github.com/ntonous/hng14-stage3-ddos-detector.git&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>python</category>
      <category>security</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
