<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Frank</title>
    <description>The latest articles on DEV Community by Frank (@frank_shadow2).</description>
    <link>https://dev.to/frank_shadow2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3903334%2F6b014329-16c0-42eb-bf2d-3ea6c33cb66b.png</url>
      <title>DEV Community: Frank</title>
      <link>https://dev.to/frank_shadow2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/frank_shadow2"/>
    <language>en</language>
    <item>
      <title>I Built a DevOps Tool That Thinks: Adding "Eyes" and a "Brain" to SwiftDeploy</title>
      <dc:creator>Frank</dc:creator>
      <pubDate>Wed, 06 May 2026 20:35:31 +0000</pubDate>
      <link>https://dev.to/frank_shadow2/i-built-a-devops-tool-that-thinks-adding-eyes-and-a-brain-to-swiftdeploy-4hi</link>
      <guid>https://dev.to/frank_shadow2/i-built-a-devops-tool-that-thinks-adding-eyes-and-a-brain-to-swiftdeploy-4hi</guid>
      <description>&lt;p&gt;Most DevOps tasks start with a manual checklist: Is the disk full? Is the latency too high? Should we promote this Canary? In my latest project for the HNG Internship, I decided that "manual" wasn't fast enough. I didn't just want to deploy code; I wanted to build a tool that protects itself.&lt;/p&gt;

&lt;p&gt;I upgraded my CLI tool, swiftdeploy, from a simple script to a policy-driven engine with its own "Eyes" (Metrics) and "Brain" (Open Policy Agent). Here is how I did it.&lt;/p&gt;

&lt;p&gt;The Architecture: A Single Source of Truth&lt;br&gt;
The core of the project is the manifest.yaml. I wanted to follow the Declarative Infrastructure philosophy, where I describe what I want and the tool figures out how to build it.&lt;/p&gt;

&lt;p&gt;My tool takes this manifest and programmatically generates the docker-compose.yml and nginx.conf. No more hand-writing configs or fixing typos in Nginx blocks.&lt;/p&gt;
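
&lt;p&gt;The generation step can be sketched roughly as follows. This is a minimal illustration, not swiftdeploy's actual code: the manifest keys (service, image, port, nginx_port) and the use of plain string templates are assumptions, and the Nginx output is only a single server block.&lt;/p&gt;

```python
from string import Template

# Hypothetical manifest, already parsed from manifest.yaml into a dict.
# The key names are assumptions made for this illustration.
manifest = {"service": "engine", "image": "engine:1.0", "port": 8000, "nginx_port": 8080}

COMPOSE_TEMPLATE = Template("""\
services:
  $service:
    image: $image
    expose:
      - "$port"
  nginx:
    image: nginx:stable
    ports:
      - "$nginx_port:80"
""")

NGINX_TEMPLATE = Template("""\
server {
    listen 80;
    location / {
        proxy_pass http://$service:$port;
    }
}
""")


def render_configs(manifest):
    """Return (docker-compose.yml text, nginx.conf text) for one manifest."""
    return COMPOSE_TEMPLATE.substitute(manifest), NGINX_TEMPLATE.substitute(manifest)


compose_text, nginx_text = render_configs(manifest)
print(compose_text)
print(nginx_text)
```

&lt;p&gt;Because both files come from the same dict, a port change in the manifest cannot leave the two configs out of sync.&lt;/p&gt;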

&lt;ol&gt;
&lt;li&gt;Giving it "Eyes" (Instrumentation)
You can't manage what you can't see. I instrumented my API service (the engine) to expose a /metrics endpoint in Prometheus format.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I focused on three signals, loosely modeled on the Golden Signals:&lt;/p&gt;

&lt;p&gt;Throughput: Tracking every request and status code.&lt;/p&gt;

&lt;p&gt;Latency: Using histograms to calculate P99 latency. (Because if 1% of your users are waiting 5 seconds, your app is broken, even if the average is fine).&lt;/p&gt;

&lt;p&gt;Health: Tracking uptime and whether Chaos Mode was active.&lt;/p&gt;
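
&lt;p&gt;To make the P99 idea concrete, here is a small sketch of estimating a quantile from cumulative histogram buckets using linear interpolation inside the matching bucket. The bucket boundaries and counts are made-up sample data, and the function name is hypothetical; it ignores the +Inf bucket a real Prometheus histogram would carry.&lt;/p&gt;

```python
def estimate_quantile(buckets, q):
    """Estimate a quantile from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count), sorted by bound.
    Interpolates linearly inside the bucket that contains the target rank.
    """
    total = buckets[-1][1]
    if total == 0:
        return 0.0
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            in_bucket = cumulative - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return buckets[-1][0]


# Made-up sample data: 1000 requests, 20 of them slower than 0.5s.
sample = [(0.1, 900), (0.5, 980), (2.0, 995), (5.0, 1000)]
p99 = estimate_quantile(sample, 0.99)
print(f"P99 is roughly {p99:.2f}s")
```

&lt;p&gt;In this sample 90% of requests finish within 0.1s, yet the P99 estimate lands well above 0.5s, which is exactly the kind of tail an average would hide.&lt;/p&gt;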

&lt;ol start="2"&gt;
&lt;li&gt;Giving it a "Brain" (The OPA Sidecar)
This was the biggest challenge. I integrated Open Policy Agent (OPA) as a sidecar container.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of hardcoding "if" statements in my Python/Bash script, I moved all the decision-making logic into Rego files.&lt;/p&gt;

&lt;p&gt;Why decoupling matters:&lt;br&gt;
If I want to change the "Safety Standard" (e.g., changing the allowed error rate from 1% to 0.5%), I don't touch my CLI code. I just update the .rego policy.&lt;/p&gt;

&lt;p&gt;I implemented two core policies:&lt;/p&gt;

&lt;p&gt;Infra Policy: Denies deployment if the host has less than 10GB of disk space.&lt;/p&gt;

&lt;p&gt;Canary Safety Policy: Denies promotion if the Canary's P99 Latency is over 500ms or error rates spike.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;The "Gated" Lifecycle: Look Before You Leap
I updated the swiftdeploy CLI to be "Gated."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Before the promote command actually switches traffic from Canary to Stable, it does a Pre-Promote Check:&lt;/p&gt;

&lt;p&gt;It scrapes the /metrics from the running Canary.&lt;/p&gt;

&lt;p&gt;It sends that data to OPA.&lt;/p&gt;

&lt;p&gt;OPA evaluates the data against the Rego policies.&lt;/p&gt;

&lt;p&gt;If OPA says "Deny," the CLI stops the deployment and explains exactly why (e.g., "Error rate too high").&lt;/p&gt;
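
&lt;p&gt;The four steps above can be sketched like this. OPA's Data API takes a POST to /v1/data/ followed by the policy path, with the payload wrapped in an "input" key, and replies with a "result" key. Everything else here is an assumption for illustration: the host and port, the swiftdeploy/canary policy path, and the allow/reasons shape of the result.&lt;/p&gt;

```python
import json
import urllib.request

# Assumptions for this sketch: OPA listens on localhost:8181, the policy
# package is exposed at swiftdeploy/canary, and it returns a document shaped
# like {"allow": bool, "reasons": [str, ...]}. All three are hypothetical.
OPA_URL = "http://localhost:8181/v1/data/swiftdeploy/canary"


def build_opa_request(metrics):
    """Wrap scraped metrics in the {"input": ...} envelope OPA's Data API expects."""
    body = json.dumps({"input": metrics}).encode("utf-8")
    return urllib.request.Request(
        OPA_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def interpret_decision(response_body):
    """Turn OPA's JSON reply into (allowed, reasons); a missing result means deny."""
    result = json.loads(response_body).get("result", {})
    return bool(result.get("allow", False)), list(result.get("reasons", []))


def query_opa(metrics):
    """Send the metrics to OPA and return (allowed, reasons). Needs a running OPA."""
    with urllib.request.urlopen(build_opa_request(metrics), timeout=5) as resp:
        return interpret_decision(resp.read())


# Offline demonstration with a canned reply instead of a live OPA server.
canned = '{"result": {"allow": false, "reasons": ["P99 latency is 2000ms (threshold: 500ms)"]}}'
allowed, reasons = interpret_decision(canned)
if not allowed:
    print("Promotion blocked:", "; ".join(reasons))
```

&lt;p&gt;Treating a missing result as a denial is a deliberate choice in this sketch: if the policy path is wrong or the policy is not loaded, the gate stays closed rather than silently open.&lt;/p&gt;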

&lt;ol start="4"&gt;
&lt;li&gt;Testing with Chaos
To prove it worked, I had to break things. I used a /chaos endpoint to inject a "slow" state into the Canary.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When I ran swiftdeploy status, my real-time dashboard showed the P99 latency shooting up. When I tried to promote that "sick" Canary to production, the CLI refused.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;CLI Output: Promotion Blocked: P99 Latency is 2000ms (Threshold: 500ms).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the moment I knew the "Brain" was working.&lt;/p&gt;

&lt;p&gt;Lessons Learned&lt;br&gt;
Fail Fast: Pre-flight validation is a lifesaver. My tool checks if the Nginx port is already taken before it even tries to start a container.&lt;/p&gt;

&lt;p&gt;Observability is not optional: Without the /metrics endpoint, I would have been flying blind.&lt;/p&gt;

&lt;p&gt;Policy as Code: OPA makes infrastructure audit-friendly and incredibly flexible.&lt;/p&gt;

&lt;p&gt;Final Thought&lt;br&gt;
Most DevOps tasks ask you to configure infrastructure. This one asked me to build the tool that manages the infrastructure. It’s been an intense journey from writing basic PHP/MySQL apps to building self-healing DevOps CLI tools, but the control you gain is worth every line of code.&lt;br&gt;
What’s your favorite tool for enforcing deployment policies? Let me know in the comments!&lt;/p&gt;

&lt;p&gt;#DevOps #CloudEngineering #OpenPolicyAgent #Docker #HNG&lt;/p&gt;

</description>
      <category>automation</category>
      <category>devops</category>
      <category>showdev</category>
      <category>tooling</category>
    </item>
    <item>
      <title>How I Built a Real-Time DDoS Detection Engine from Scratch</title>
      <dc:creator>Frank</dc:creator>
      <pubDate>Wed, 29 Apr 2026 01:11:25 +0000</pubDate>
      <link>https://dev.to/frank_shadow2/how-i-built-a-real-time-ddos-detection-engine-from-scratch-94i</link>
      <guid>https://dev.to/frank_shadow2/how-i-built-a-real-time-ddos-detection-engine-from-scratch-94i</guid>
      <description>&lt;h1&gt;
  
  
  How I Built a Real-Time DDoS Detection Engine from Scratch
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Imagine you own a popular website. Thousands of people visit every day. &lt;br&gt;
Then one morning, an attacker floods your server with millions of fake requests, &lt;br&gt;
sent from many machines at once, to try to crash it. This is called a &lt;strong&gt;DDoS attack&lt;/strong&gt; &lt;br&gt;
(Distributed Denial of Service).&lt;/p&gt;

&lt;p&gt;For HNG Stage 3, I was tasked with building a system that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Watches all incoming web traffic in real time&lt;/li&gt;
&lt;li&gt;Learns what "normal" traffic looks like&lt;/li&gt;
&lt;li&gt;Automatically detects and blocks attackers&lt;/li&gt;
&lt;li&gt;Sends instant Slack alerts&lt;/li&gt;
&lt;li&gt;Shows everything on a live dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's exactly how I built it — explained simply enough that &lt;br&gt;
a complete beginner can follow along.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Architecture — How Everything Connects
&lt;/h2&gt;

&lt;p&gt;Think of the system like a &lt;strong&gt;security team for a building&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Internet → Nginx (doorman) → Nextcloud (the building)
                ↓
Access Log (visitor diary)
                ↓
Python Daemon (security guard reading the diary)
                ↓
┌──────────────────────────────┐
│  Detect attack → Ban IP      │
│  Send Slack alert            │
│  Show on live dashboard      │
└──────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Nginx&lt;/strong&gt; sits in front of everything. Every single request that &lt;br&gt;
comes in, whether from a legitimate user or an attacker, passes through Nginx first. &lt;br&gt;
Nginx writes a JSON log entry for every request containing the IP &lt;br&gt;
address, timestamp, URL, and status code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our Python daemon&lt;/strong&gt; reads those log entries in real time, &lt;br&gt;
learns what normal traffic looks like, and fires when something &lt;br&gt;
looks wrong.&lt;/p&gt;


&lt;h2&gt;
  
  
  How the Sliding Window Works
&lt;/h2&gt;

&lt;p&gt;Here's the core question our system needs to answer at any moment:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"How many requests did this IP make in the last 60 seconds?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We use a data structure called a &lt;strong&gt;deque&lt;/strong&gt; (double-ended queue) &lt;br&gt;
to answer this efficiently.&lt;/p&gt;

&lt;p&gt;Think of it like a &lt;strong&gt;conveyor belt&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New items (request timestamps) come in from the right&lt;/li&gt;
&lt;li&gt;Old items (timestamps older than 60 seconds) are popped off the left each time a new request arrives
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;deque&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;

&lt;span class="n"&gt;ip_window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;deque&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip_window&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Add new request timestamp to RIGHT
&lt;/span&gt;    &lt;span class="n"&gt;ip_window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Remove old timestamps from LEFT
&lt;/span&gt;    &lt;span class="n"&gt;cutoff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;ip_window&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ip_window&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;cutoff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;ip_window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;popleft&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Length = requests in last 60 seconds
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ip_window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;code&gt;popleft()&lt;/code&gt; is O(1): it removes the front item in constant time. &lt;br&gt;
This is why we use a deque instead of a regular list, where &lt;code&gt;pop(0)&lt;/code&gt; &lt;br&gt;
is O(n) because every remaining item has to shift down.&lt;/p&gt;

&lt;p&gt;We maintain two kinds of window:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One &lt;strong&gt;per IP&lt;/strong&gt; — catches single aggressive attackers&lt;/li&gt;
&lt;li&gt;One &lt;strong&gt;global&lt;/strong&gt; — catches distributed attacks from many IPs&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  How The Baseline Learns From Traffic
&lt;/h2&gt;

&lt;p&gt;Knowing the current rate isn't enough. We need to know if &lt;br&gt;
that rate is &lt;strong&gt;normal or abnormal&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We solve this with a &lt;strong&gt;rolling 30-minute baseline&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Every second, we record how many requests arrived in that second. &lt;br&gt;
We keep a 30-minute history of these per-second counts. &lt;br&gt;
Every 60 seconds, we calculate:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mean&lt;/strong&gt; — the average requests per second:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Standard Deviation&lt;/strong&gt; — how much the traffic usually varies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;

&lt;span class="n"&gt;variance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;stddev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;variance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We apply &lt;strong&gt;floor values&lt;/strong&gt; to both: mean never drops below 1.0 &lt;br&gt;
and stddev never drops below 0.5. This keeps a near-zero stddev from turning &lt;br&gt;
tiny fluctuations into false alarms when traffic is extremely stable.&lt;/p&gt;

&lt;p&gt;We also store baselines in &lt;strong&gt;per-hour slots&lt;/strong&gt;. Traffic at 3pm &lt;br&gt;
looks different from traffic at 3am — so we prefer the current &lt;br&gt;
hour's baseline when making decisions.&lt;/p&gt;
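
&lt;p&gt;Putting the pieces of this section together, a baseline tracker might look roughly like the sketch below. The floors (1.0 and 0.5) and the 30-minute history come from the description above; the function names, the dict of hourly slots, and the fallback to the rolling history when an hour has no slot yet are assumptions.&lt;/p&gt;

```python
import math
from collections import deque

HISTORY_SECONDS = 30 * 60   # keep 30 minutes of per-second counts
MEAN_FLOOR = 1.0
STDDEV_FLOOR = 0.5

history = deque(maxlen=HISTORY_SECONDS)  # oldest counts fall out automatically
hourly_baselines = {}                     # hour of day (0-23) -> (mean, stddev)


def compute_baseline(counts):
    """Return (mean, stddev) of per-second counts with the floors applied."""
    if not counts:
        return MEAN_FLOOR, STDDEV_FLOOR
    mean = sum(counts) / len(counts)
    variance = sum((x - mean) ** 2 for x in counts) / len(counts)
    return max(mean, MEAN_FLOOR), max(math.sqrt(variance), STDDEV_FLOOR)


def refresh_baseline(hour):
    """Recompute the baseline from history and store it in that hour's slot."""
    hourly_baselines[hour] = compute_baseline(history)
    return hourly_baselines[hour]


def baseline_for(hour):
    """Prefer the slot for this hour; fall back to the rolling history."""
    return hourly_baselines.get(hour) or compute_baseline(history)


history.extend([4, 6, 5, 5])          # four seconds of made-up traffic
print(refresh_baseline(hour=15))
```

&lt;p&gt;Using &lt;code&gt;deque(maxlen=...)&lt;/code&gt; means the history needs no manual trimming: appending the 1801st count silently discards the oldest one.&lt;/p&gt;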


&lt;h2&gt;
  
  
  How The Detection Logic Makes A Decision
&lt;/h2&gt;

&lt;p&gt;With the current rate and the baseline established, we calculate &lt;br&gt;
a &lt;strong&gt;z-score&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;z = (current_rate - baseline_mean) / baseline_stddev&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The z-score answers: &lt;strong&gt;"How many standard deviations above &lt;br&gt;
normal is this?"&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Z-score&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;Slightly above normal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;td&gt;Noticeably above normal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3.0&lt;/td&gt;
&lt;td&gt;Very unusual: under a normal distribution, only about 0.3% of samples lie this far from the mean&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10.0+&lt;/td&gt;
&lt;td&gt;Almost certainly an attack&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We flag an IP as anomalous if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;z-score &amp;gt; 3.0&lt;/strong&gt; (statistical threshold), OR&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;rate &amp;gt; 5x the baseline mean&lt;/strong&gt; (simple multiplier)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whichever fires first wins. This dual-trigger approach catches &lt;br&gt;
both gradual ramp-up attacks (caught by z-score) and sudden &lt;br&gt;
flood attacks (caught by the multiplier).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error surge detection:&lt;/strong&gt; If an IP is generating a lot of &lt;br&gt;
4xx/5xx errors — like trying hundreds of wrong passwords — &lt;br&gt;
we tighten its detection thresholds by 30%. It's already &lt;br&gt;
behaving suspiciously, so we watch it more closely.&lt;/p&gt;
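
&lt;p&gt;The decision rule described in this section can be sketched as a single function. The 3.0 and 5x thresholds come from the text; reading "tighten by 30%" as multiplying both thresholds by 0.7 is an assumption, as is passing the error surge in as a ready-made flag rather than showing how it is detected.&lt;/p&gt;

```python
Z_THRESHOLD = 3.0
RATE_MULTIPLIER = 5.0
ERROR_TIGHTENING = 0.7   # assumed meaning of "tighten by 30%"


def is_anomalous(rate, mean, stddev, error_surge=False):
    """Return (flagged, reason) for one IP's current request rate."""
    factor = ERROR_TIGHTENING if error_surge else 1.0
    z_limit = Z_THRESHOLD * factor
    rate_limit = RATE_MULTIPLIER * factor * mean

    z = (rate - mean) / stddev  # stddev is floored upstream, so never zero
    if z > z_limit:
        return True, f"z-score {z:.1f} above {z_limit:.1f}"
    if rate > rate_limit:
        return True, f"rate {rate:.1f} above {rate_limit:.1f}"
    return False, "within normal range"


print(is_anomalous(12, 10, 2))                      # ordinary traffic
print(is_anomalous(60, 10, 20))                     # noisy baseline, huge rate
print(is_anomalous(15, 10, 2, error_surge=True))    # same rate, stricter limits
```

&lt;p&gt;The second call shows why the multiplier is worth having: with a very noisy baseline (stddev of 20) a rate of 60 only scores z = 2.5, so the z-score alone would let it through.&lt;/p&gt;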


&lt;h2&gt;
  
  
  How iptables Blocks An IP
&lt;/h2&gt;

&lt;p&gt;When an IP is flagged, we run this Linux firewall command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;iptables &lt;span class="nt"&gt;-I&lt;/span&gt; INPUT &lt;span class="nt"&gt;-s&lt;/span&gt; 1.2.3.4 &lt;span class="nt"&gt;-j&lt;/span&gt; DROP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Breaking it down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;iptables&lt;/code&gt;: the command-line tool for managing the Linux kernel's packet filter rules&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-I INPUT&lt;/code&gt;: INSERT a rule at the top of the INPUT chain (incoming traffic)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-s 1.2.3.4&lt;/code&gt;: match packets from this SOURCE IP&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;-j DROP&lt;/code&gt;: silently DROP all matching packets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;DROP&lt;/strong&gt; means the attacker gets absolutely no response. &lt;br&gt;
Their packets just disappear. They don't even know they've &lt;br&gt;
been blocked — they just stop getting responses.&lt;/p&gt;

&lt;p&gt;We call this from Python using subprocess:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;

&lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;iptables&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;INPUT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-s&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-j&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DROP&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;returncode&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Successfully blocked &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Progressive ban schedule&lt;/strong&gt; — repeat offenders get longer bans:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1st offence: 10 minutes&lt;/li&gt;
&lt;li&gt;2nd offence: 30 minutes
&lt;/li&gt;
&lt;li&gt;3rd offence: 2 hours&lt;/li&gt;
&lt;li&gt;4th+ offence: Permanent&lt;/li&gt;
&lt;/ul&gt;
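
&lt;p&gt;The schedule above maps naturally onto a small lookup. In this sketch the in-memory &lt;code&gt;offence_counts&lt;/code&gt; dict and the use of &lt;code&gt;None&lt;/code&gt; to mean "permanent" are assumptions; the durations are the ones listed.&lt;/p&gt;

```python
from datetime import timedelta

# Ban length by offence number; None means the ban never expires.
BAN_SCHEDULE = [timedelta(minutes=10), timedelta(minutes=30), timedelta(hours=2)]

offence_counts = {}  # ip -> number of times it has been banned so far


def next_ban_duration(ip):
    """Record another offence for this IP and return how long to ban it."""
    offence_counts[ip] = offence_counts.get(ip, 0) + 1
    offence = offence_counts[ip]
    if offence > len(BAN_SCHEDULE):
        return None  # fourth offence onwards: permanent
    return BAN_SCHEDULE[offence - 1]


durations = [next_ban_duration("1.2.3.4") for _ in range(4)]
print(durations)
```

&lt;p&gt;Because the counts live only in memory here, a restart would reset every IP to a first offence; persisting them would be needed for the permanent tier to mean anything across restarts.&lt;/p&gt;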

&lt;p&gt;When a ban expires, we delete the rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;iptables &lt;span class="nt"&gt;-D&lt;/span&gt; INPUT &lt;span class="nt"&gt;-s&lt;/span&gt; 1.2.3.4 &lt;span class="nt"&gt;-j&lt;/span&gt; DROP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Live Dashboard
&lt;/h2&gt;

&lt;p&gt;The dashboard is a Flask web server running in a background thread. &lt;br&gt;
It serves an HTML page that calls a &lt;code&gt;/api/stats&lt;/code&gt; endpoint every &lt;br&gt;
3 seconds and updates the display with fresh data.&lt;/p&gt;

&lt;p&gt;It shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Global requests per second&lt;/li&gt;
&lt;li&gt;Current baseline mean and stddev&lt;/li&gt;
&lt;li&gt;All banned IPs with ban details&lt;/li&gt;
&lt;li&gt;Top 10 source IPs by request rate&lt;/li&gt;
&lt;li&gt;CPU and memory usage&lt;/li&gt;
&lt;li&gt;System uptime&lt;/li&gt;
&lt;li&gt;Hourly baseline slots&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Async Python is powerful&lt;/strong&gt; — running log monitoring, &lt;br&gt;
baseline calculation, ban checking, and serving a dashboard &lt;br&gt;
simultaneously with asyncio.gather() is elegant and efficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Read the logs&lt;/strong&gt; — when the Nextcloud container had issues, &lt;br&gt;
the logs told us exactly what was wrong and how to fix it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Never hardcode secrets&lt;/strong&gt; — GitHub Push Protection caught &lt;br&gt;
our Slack webhook URL in the code. Always use environment &lt;br&gt;
variables for secrets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Docker volumes are the glue&lt;/strong&gt; — the named HNG-nginx-logs &lt;br&gt;
volume is what allows Nginx and our detector (in separate &lt;br&gt;
containers) to share log files seamlessly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Z-scores are surprisingly simple&lt;/strong&gt; — statistical anomaly &lt;br&gt;
detection sounds intimidating but the math is just subtraction &lt;br&gt;
and division.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building this system taught me that security tooling isn't magic — &lt;br&gt;
it's just careful observation, smart math, and fast response. &lt;br&gt;
Similar principles, applied at far greater scale, underpin the DDoS &lt;br&gt;
protection offered by providers like Cloudflare and AWS.&lt;/p&gt;

&lt;p&gt;The full source code is available at:&lt;br&gt;
&lt;a href="https://github.com/Frank363-hash/hng-anomaly-detector" rel="noopener noreferrer"&gt;https://github.com/Frank363-hash/hng-anomaly-detector&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you have questions or suggestions, drop them in the comments!&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>cybersecurity</category>
      <category>monitoring</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
