How I Built a Real-Time DDoS Detection Engine from Scratch (No Fail2Ban)

#beginners #security #showdev #tutorial

Have you ever wondered how a website "knows" it's being attacked and automatically pulls the plug on the attacker?

I recently built an anomaly detection engine from scratch. It’s a live system that watches incoming traffic, learns what "normal" looks like, and automatically blocks suspicious IPs using Linux firewall rules.

In this post, I'll walk you through how it works in plain English. No prior security experience required. Let's get into it.

🛠 What the Project Does (and Why It Matters)
Imagine a popular restaurant. Usually, customers walk in, order, and eat. But what if 500 people suddenly rushed in at once, stood at the counter, and ordered nothing? The staff would be so overwhelmed they couldn't serve real customers.

That is a DDoS (Distributed Denial of Service) attack.

The challenge is that you can't just say "block anyone who sends more than 100 requests." A busy server might normally get 200, while a quiet one gets 5. A hardcoded limit would either block real fans or miss real attackers.

The Solution: Build a system that learns your server's "rhythm" and flags anything that breaks it.

The Bird's Eye View
Here is how the data flows:

Internet Traffic hits the Nginx server.

Nginx writes logs to a shared folder.

My Detector Daemon (Python) reads those logs in real-time.

It calculates a Baseline, detects Anomalies, and executes a Ban.

It sends a Slack Alert and updates a Live Dashboard.
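To make the "reads those logs" step concrete, here is a minimal sketch of turning one log line into data the detector can use. It assumes Nginx is writing its default "combined" log format; `LOG_PATTERN` and `parse_line` are illustrative names, not taken from the project's source.

```python
import re

# Assumption: Nginx's default "combined" access log layout.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return (ip, status) for a combined-format line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return match.group("ip"), int(match.group("status"))

sample = '1.2.3.4 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"'
print(parse_line(sample))  # ('1.2.3.4', 200)
```

Returning None for lines that don't match lets the daemon skip malformed entries instead of crashing mid-stream.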

1. The Sliding Window: "Forgetting" the Past
To know how busy the server is right now, you can't look at all traffic since the beginning of time. You need a Sliding Window.

Think of a sliding window like a 60-second video clip. Every second, the "window" moves forward. It forgets the oldest second and adds the newest one.

In Python, I used a deque (a double-ended queue) to handle this efficiently:

from collections import deque
import time

# Timestamps of recent requests, oldest on the left
ip_window = deque()

def record_request(window):
    now = time.time()
    window.append(now)

    # EVICT OLD: Remove anything older than 60 seconds
    cutoff = now - 60 
    while window and window[0] < cutoff:
        window.popleft()

This is the beauty: memory stays bounded. The deque only ever holds timestamps from the last 60 seconds, so old data literally "falls off" the conveyor belt, leaving you with a fresh count of exactly what happened in the last minute.
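To block individual IPs, each one needs its own window. Here is one way to extend the idea, as a sketch: the `windows` dictionary and the explicit `now` parameter are my assumptions for illustration (passing the time in makes the function easy to test).

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60

# One deque of request timestamps per client IP
windows = defaultdict(deque)

def record_request(ip, now):
    """Record a request from `ip` at time `now`; return its requests/sec over the window."""
    window = windows[ip]
    window.append(now)

    # Evict timestamps older than the window
    cutoff = now - WINDOW_SECONDS
    while window and window[0] < cutoff:
        window.popleft()

    return len(window) / WINDOW_SECONDS
```

In a live daemon you would call `record_request(ip, time.time())` for every parsed log line.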

2. The Baseline: Learning What’s "Normal"
The sliding window tells us the current speed, but the Baseline tells us the "Speed Limit."

The engine keeps a 30-minute history of traffic. Every minute, it calculates the Average (Mean) and the Standard Deviation (how much the traffic usually fluctuates).

Quiet Morning: Average might be 2 requests/sec.

Busy Afternoon: Average might climb to 40 requests/sec.

Because the baseline is always recalculating, the system adapts. If your site gets a permanent boost in popularity, the "security guard" doesn't panic; it just learns the new normal. It is the Darwin of this security architecture.


Just like the X-Men's Darwin, who grows gills when submerged in water, this baseline evolves based on the "pressure" of the traffic. If the traffic stays high, the baseline grows to accommodate it. If it stays low, it tightens up. It adapts so it never has to panic.
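The baseline calculation itself is small. This sketch assumes the history is stored as one request count per second (the post doesn't say which granularity the engine uses), and `history` and `compute_baseline` are hypothetical names.

```python
from collections import deque
from statistics import mean, pstdev

# Assumption: one sample per second, kept for 30 minutes
HISTORY_SECONDS = 30 * 60

# Oldest samples drop off automatically once maxlen is reached
history = deque(maxlen=HISTORY_SECONDS)

def compute_baseline(samples):
    """Return (mean, standard deviation) of the per-second request counts."""
    if not samples:
        return 0.0, 0.0
    return mean(samples), pstdev(samples)
```

Using `deque(maxlen=...)` gives the "adapting" behavior for free: as new samples arrive, the oldest ones fall out, so the mean and deviation drift toward the current traffic level.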

3. The Math: Z-Scores and Multipliers
How do we actually trigger a ban? We use two "sniff tests":

Test A: The Z-Score (The Statistical Freak-out)
A Z-score measures how many "standard deviations" a value is from the average.

Z-Score of 1: Totally normal.

Z-Score of 3+: This is statistically "weird." If traffic followed a normal distribution, a value this far above the average would show up only about 0.13% of the time. Verdict: Blocked.

Test B: The Multiplier (The "Common Sense" Rule)
If the baseline is very quiet (e.g., 0.1 requests/sec), the Z-score can get jumpy. So we add a backup: Is the current rate more than 5x the average? If yes: Verdict: Blocked.
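Both tests fit in a few lines. This is a sketch under assumptions: the thresholds come from the text above, while the function names and the handling of a zero standard deviation are my own choices.

```python
Z_THRESHOLD = 3.0       # Test A: Z-score of 3 or more
RATE_MULTIPLIER = 5.0   # Test B: more than 5x the average

def z_score(rate, mean, std):
    """How many standard deviations `rate` sits above the baseline mean."""
    if std == 0:
        # Assumption: with no variance yet, defer to the multiplier test
        return 0.0
    return (rate - mean) / std

def is_anomalous(rate, mean, std):
    """True if either the Z-score test or the multiplier test fires."""
    if z_score(rate, mean, std) >= Z_THRESHOLD:
        return True
    return mean > 0 and rate > RATE_MULTIPLIER * mean
```

For example, a rate of 50 req/s against a mean of 5 and a standard deviation of 10 gives a Z-score of (50 - 5) / 10 = 4.5, which trips Test A. A rate of 0.6 req/s against a mean of 0.1 and a deviation of 1.0 has a Z-score of only 0.5, but Test B still catches it.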

4. The Hammer: iptables
Once we catch a "bad actor," we have to stop them. We use iptables, the command-line tool for configuring the Linux kernel's built-in firewall (netfilter).

When we detect an anomaly, the Python script runs a system command to DROP all traffic from that specific IP:

# What the code tells the Linux Kernel:
iptables -I INPUT -s 1.2.3.4 -j DROP

This is incredibly powerful. The traffic is blocked at the "front door" (the kernel level). It never even reaches the web server, saving your CPU and RAM for real users.
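Here is roughly how a Python script can issue that command safely. It is a sketch, not the project's code: `build_ban_command` and `ban_ip` are hypothetical helpers, and running the rule for real requires root.

```python
import ipaddress
import subprocess

def build_ban_command(ip):
    """Return the iptables argument list that drops all traffic from `ip`."""
    # Validate first so malformed log data never reaches the command line
    address = ipaddress.ip_address(ip)
    if address.version != 4:
        raise ValueError("iptables handles IPv4 only; IPv6 needs ip6tables")
    return ["iptables", "-I", "INPUT", "-s", str(address), "-j", "DROP"]

def ban_ip(ip):
    """Insert the DROP rule. Requires root privileges."""
    subprocess.run(build_ban_command(ip), check=True)
```

Passing the command as a list (rather than a shell string) means the IP is never interpreted by a shell, which matters because it ultimately comes from attacker-controlled traffic.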

The "Backoff" Schedule
We aren't monsters! Sometimes a user just refreshes too fast. We use an escalating "Three Strikes" system: three temporary bans, then a permanent one.

1st Offense: 10-minute ban.

2nd Offense: 30-minute ban.

3rd Offense: 2-hour ban.

4th Offense: Permanent block.

Because if an IP hasn't learned by now, it's not a visitor, it's a threat.
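The schedule above maps neatly onto a lookup table. A minimal sketch, where `BAN_SCHEDULE` and `ban_duration` are illustrative names and None stands for a permanent block:

```python
# Ban durations in seconds for the 1st, 2nd, and 3rd offense
BAN_SCHEDULE = [10 * 60, 30 * 60, 2 * 60 * 60]

def ban_duration(offense_count):
    """Return the ban length in seconds, or None for a permanent block."""
    if offense_count < 1:
        raise ValueError("offense_count starts at 1")
    if offense_count > len(BAN_SCHEDULE):
        return None
    return BAN_SCHEDULE[offense_count - 1]
```

Keeping the durations in a list means the schedule can be tuned without touching the logic.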

📢 Real-Time Alerts
Security is only good if you know it's working. Every time a ban happens, the system shoots a message to Slack:

🚨 IP BANNED
IP: 1.2.3.4
Reason: Z-Score 4.5 (Way above normal!)
Rate: 50 req/s (Baseline: 5 req/s)
Duration: 600s
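A message like the one above can be sent through a Slack incoming webhook, which accepts a JSON body with a `text` field. This sketch uses only the standard library; `format_alert` and `send_alert` are hypothetical names, and the webhook URL is something you create in your own Slack workspace.

```python
import json
import urllib.request

def format_alert(ip, z, rate, baseline, duration):
    """Build the alert text in the same layout as the example message."""
    return (
        "🚨 IP BANNED\n"
        f"IP: {ip}\n"
        f"Reason: Z-Score {z:.1f} (Way above normal!)\n"
        f"Rate: {rate:g} req/s (Baseline: {baseline:g} req/s)\n"
        f"Duration: {duration}s"
    )

def send_alert(webhook_url, text):
    """POST the message to a Slack incoming webhook as JSON."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request, timeout=5)
```

Splitting formatting from sending keeps the message easy to test without touching the network.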

💡 Wrapping Up
By building this from scratch, with no pre-made tools like Fail2Ban, I learned that security isn't just about "locking doors." It's about observation, statistics, and automation.

The beauty of this engine is that it doesn't care if you're a tiny blog or a massive store; it watches your traffic, learns your baseline, and protects you accordingly.

🔗 Resources
Source Code: [https://github.com/Valescaray/hng-stage-3]

Live Dashboard: [https://monitor.oppsdev.xyz]

What do you think? Would you trust an automated math equation to protect your server? Let me know in the comments!