Imagine you run a cloud storage platform serving thousands of users. One day, an attacker floods your server with millions of requests per second. Your server crashes. Real users can't access their files. You lose money and trust.
This is a DDoS attack — Distributed Denial of Service. The goal of this project was to build a tool that detects these attacks automatically and blocks them before they cause damage.
No off-the-shelf tools. No Fail2Ban. Pure Python, built from scratch.
What the Tool Does
Here's the full picture of what I built:
```
Nginx (logs every request as JSON)
        ↓
Detector daemon reads logs in real time
        ↓
Sliding window tracks request rates
        ↓
Baseline learns what normal traffic looks like
        ↓
Anomaly detector compares current rate to baseline
        ↓
If anomalous → block IP with iptables + send Slack alert
        ↓
Auto-unban after cooldown period
```
Everything runs continuously as a background service on a Linux server.
Part 1: Reading Nginx Logs in Real Time
The first challenge was getting the tool to watch incoming traffic live. Nginx was configured to write every HTTP request as a JSON line to a log file:
```json
{
  "source_ip": "102.91.99.217",
  "timestamp": "2026-04-27T02:31:00+00:00",
  "method": "GET",
  "path": "/",
  "status": 200,
  "response_size": 6674
}
```
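For reference, JSON lines like this can be produced with nginx's `escape=json` log format. This is a sketch of how such a directive might look, not the project's exact config, and the variable-to-field mapping is my assumption:

```nginx
# Hypothetical sketch: a log_format that emits one JSON object per request.
# escape=json (nginx 1.11.8+) safely escapes values for JSON output.
log_format json_access escape=json
    '{'
        '"source_ip":"$remote_addr",'
        '"timestamp":"$time_iso8601",'
        '"method":"$request_method",'
        '"path":"$uri",'
        '"status":$status,'
        '"response_size":$body_bytes_sent'
    '}';

access_log /var/log/nginx/access.json json_access;
```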
To read this in real time, I used a technique called log tailing — the same thing `tail -f` does in Linux. The program opens the file, jumps to the end, and then sits in a loop reading new lines as they appear:
```python
import time

def follow(log_path):
    # Generator: yields parsed requests as Nginx writes them
    with open(log_path, "r") as f:
        f.seek(0, 2)                # jump to end of file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)     # wait for new data
                continue
            yield parse_line(line)  # process the line
```
Every time Nginx writes a new request, the detector picks it up within 100 milliseconds.
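The `parse_line` helper isn't shown above. A minimal sketch, assuming each log line is a single JSON object, could be:

```python
import json

def parse_line(line):
    """Parse one JSON log line; return None for partial or garbled lines."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Nginx may be mid-write when we read; skip and catch up on the next loop.
        return None
```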
Part 2: The Sliding Window
Now that we're reading requests in real time, we need to know how fast each IP is sending requests. This is where the sliding window comes in.
A sliding window answers the question: "How many requests has this IP sent in the last 60 seconds?"
The naive approach would be a counter that resets every minute. But that's inaccurate: an attacker could send 1,000 requests in the last 10 seconds of one minute and another 1,000 in the first 10 seconds of the next, a 2,000-request burst in 20 seconds that neither counter would ever flag.
Instead, I used Python's `collections.deque`, a double-ended queue that lets us append to one end and pop from the other in constant time.
Here's how it works:
```python
import time
from collections import defaultdict, deque

# One deque of timestamps per source IP
windows = defaultdict(deque)

def record_request(ip):
    now = time.time()
    cutoff = now - 60                # 60-second window
    window = windows[ip]

    # Add current timestamp
    window.append(now)

    # Evict timestamps older than 60 seconds from the left
    while window and window[0] < cutoff:
        window.popleft()

    # Rate = number of requests in window / window size
    return len(window) / 60
```
Every time a request comes in, we add its timestamp. Every time we check the rate, we first remove any timestamps older than 60 seconds from the left side of the deque. The rate is simply the count of remaining timestamps divided by 60.
This gives us an accurate, always up-to-date requests-per-second count for every IP on the server.
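As a quick usage sketch with the `record_request` above:

```python
# Three requests from one IP inside the window
for _ in range(3):
    rate = record_request("102.91.99.217")

print(f"{rate:.3f} req/s")   # 3 requests / 60 s = 0.050 req/s
```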
Part 3: The Baseline — Teaching the Tool What "Normal" Looks Like
Knowing the current rate isn't enough. We need to know if that rate is unusual.
At 3am, 5 requests per second might be suspicious. At noon, it might be completely normal. The tool needs to learn from actual traffic patterns — not from hardcoded values.
This is the rolling baseline. Here's how it works:
- Every second, we record how many requests the server received that second
- We keep a 30-minute history of these per-second counts
- Every 60 seconds, we calculate the mean (average) and standard deviation of these counts
```python
import math

# rolling_window holds (timestamp, per_second_count) pairs from the last 30 minutes
samples = [count for _, count in rolling_window]
mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
stddev = math.sqrt(variance)
```
The mean tells us what a typical second looks like. The standard deviation tells us how much variation is normal.
I also maintain per-hour slots — separate baselines for each hour of the day. If the current hour has enough data (at least 5 samples), I prefer that over the general baseline. This means the tool naturally adapts to rush hours vs quiet hours.
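A sketch of how that preference might look in code (the names here are illustrative, not the project's actual ones):

```python
import math
from datetime import datetime, timezone

hourly_counts = {h: [] for h in range(24)}   # per-second request counts, keyed by hour of day

def pick_baseline(general_mean, general_stddev):
    hour = datetime.now(timezone.utc).hour
    samples = hourly_counts[hour]
    if len(samples) >= 5:                    # enough data for this hour of day
        mean = sum(samples) / len(samples)
        variance = sum((x - mean) ** 2 for x in samples) / len(samples)
        return mean, math.sqrt(variance)
    return general_mean, general_stddev      # fall back to the general baseline
```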
To prevent the tool from failing on a fresh start with no data, I set floor values:
- Minimum mean: 0.1 req/s
- Minimum stddev: 0.1
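Applied after each recalculation, the floors are a pair of `max()` calls (a sketch, continuing from the baseline code above):

```python
MIN_MEAN, MIN_STDDEV = 0.1, 0.1

mean = max(mean, MIN_MEAN)        # never let the baseline collapse to zero
stddev = max(stddev, MIN_STDDEV)  # avoids division by zero in the z-score
```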
Part 4: Detecting Anomalies
With a baseline established, detection becomes a statistical question: "Is this IP's current rate unusually high compared to normal?"
I use two detection methods — whichever fires first:
Method 1: Z-Score
The z-score measures how many standard deviations above the mean a value is:
```python
z_score = (current_rate - baseline_mean) / baseline_stddev
```
If the z-score exceeds 1.5, the IP is anomalous. A z-score of 1.5 means the rate is 1.5 standard deviations above normal — statistically unusual.
For example:
- Baseline mean: 1.0 req/s
- Baseline stddev: 0.5
- Current rate: 2.5 req/s
- Z-score: (2.5 - 1.0) / 0.5 = 3.0 → anomalous!
Method 2: Rate Multiplier
Sometimes the stddev is very small and the z-score math doesn't capture obvious spikes. So I also check if the rate is more than 1.5x the baseline mean:
```python
if current_rate > 1.5 * baseline_mean:
    ...  # flag as anomalous
```
Error Rate Tightening
If an IP is sending a lot of 4xx or 5xx errors (bad requests, unauthorized attempts), I automatically tighten the thresholds by 30%. An IP probing for vulnerabilities gets less tolerance.
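A sketch of what that tightening could look like. The 30% factor comes from the text; the trigger ratio below is my assumption:

```python
ERROR_RATIO_TRIGGER = 0.5   # assumed: fraction of 4xx/5xx responses that counts as "a lot"
TIGHTEN = 0.7               # thresholds shrink by 30%

def effective_thresholds(z_threshold, rate_multiplier, error_ratio):
    if error_ratio > ERROR_RATIO_TRIGGER:
        return z_threshold * TIGHTEN, rate_multiplier * TIGHTEN
    return z_threshold, rate_multiplier
```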
Part 5: Blocking with iptables
When an IP is flagged as anomalous, we block it at the kernel level using iptables. This is more powerful than blocking at the application level because the packets are dropped before they even reach Nginx.
```python
import subprocess

def ban_ip(ip):
    subprocess.run(
        ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"],
        check=True,  # raise if the rule could not be inserted
    )
```
`-I INPUT` inserts the rule at the top of the INPUT chain, and `-j DROP` silently discards all packets from that IP. The attacker's packets are dropped by the kernel before Nginx ever sees them.
You can verify bans are active with:
```bash
sudo iptables -L INPUT -n
```
Auto-Unban with Backoff Schedule
Permanent bans aren't always appropriate — the IP might be a legitimate user who got flagged by mistake. So I implemented an automatic unban system with a backoff schedule:
- First offence: banned for 10 minutes
- Second offence: banned for 30 minutes
- Third offence: banned for 2 hours
- Fourth offence: permanently banned
Each time an IP is unbanned, a Slack notification is sent with the next ban duration if they reoffend.
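A sketch of the escalating schedule and the unban side of the iptables call (the data structures here are illustrative):

```python
import subprocess

BACKOFF_SECONDS = [600, 1800, 7200, None]   # 10 min, 30 min, 2 h; None = permanent

offenses = {}   # ip -> how many times it has been banned

def next_ban_duration(ip):
    count = offenses.get(ip, 0)
    offenses[ip] = count + 1
    return BACKOFF_SECONDS[min(count, len(BACKOFF_SECONDS) - 1)]

def unban_ip(ip):
    # -D deletes the matching DROP rule that ban_ip inserted earlier
    subprocess.run(["iptables", "-D", "INPUT", "-s", ip, "-j", "DROP"], check=True)
```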
Part 6: Slack Alerts
Every ban and unban sends an immediate Slack notification via webhook:
```python
import requests

# webhook_url, ip, rate, mean, and duration come from the detector's context
requests.post(webhook_url, json={
    "text": f"🚨 IP BANNED: {ip}\nRate: {rate} req/s\nBaseline: {mean} req/s\nDuration: {duration}"
})
```
The alert includes the condition that fired, the current rate, the baseline, and the ban duration — everything needed to understand what happened without digging through logs.
Part 7: The Live Dashboard
The tool serves a web dashboard on port 8080 that refreshes every 3 seconds showing:
- Global requests per second
- Current baseline mean and stddev
- List of banned IPs with reasons
- Top 10 source IPs
- CPU and memory usage
Built with Python's standard-library `http.server` — no frameworks needed.
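A minimal sketch in the same spirit, serving stats as JSON with the standard library (the real dashboard also renders HTML that polls every 3 seconds):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class DashboardHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        stats = {"global_rps": 0.0, "banned_ips": []}   # placeholder values
        body = json.dumps(stats).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("0.0.0.0", 8080), DashboardHandler).serve_forever()
```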
What I Learned
Building this from scratch taught me things no tutorial ever could:
- Statistical anomaly detection is surprisingly approachable once you understand z-scores
- deque is one of the most useful Python data structures for time-based problems
- iptables is incredibly powerful — blocking at kernel level is orders of magnitude more efficient than application-level blocking
- Baselines must be dynamic — hardcoded thresholds always fail in production because traffic patterns change by hour, day, and season
- A daemon is not a cron job — continuous processing requires careful thought about memory, threading, and graceful shutdown
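For that last point, a minimal sketch of the graceful-shutdown pattern such a daemon needs:

```python
import signal
import threading

shutdown = threading.Event()

def request_shutdown(signum, frame):
    shutdown.set()   # let the main loop finish its current iteration and exit cleanly

signal.signal(signal.SIGTERM, request_shutdown)
signal.signal(signal.SIGINT, request_shutdown)

while not shutdown.is_set():
    ...   # tail logs, update windows, recalculate the baseline
```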
The Stack
- Python 3.12 — detector daemon
- Docker + Docker Compose — Nextcloud and Nginx deployment
- Nginx — reverse proxy with JSON access logging
- iptables — kernel-level IP blocking
- Slack webhooks — real-time alerts
- systemd — keeps the daemon running persistently
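For illustration, a minimal systemd unit for such a daemon might look like this (the unit name and paths are hypothetical, not the project's actual ones):

```ini
# /etc/systemd/system/ddos-detector.service  (hypothetical name and paths)
[Unit]
Description=DDoS detection daemon
After=network.target

[Service]
ExecStart=/usr/bin/python3 /opt/detector/main.py
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```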
Repository
The full source code is available at:
https://github.com/Hacker-Dark/hng-stage3-devops
Built as part of the HNG DevOps Internship Stage 3 task.