How I Built a Real-Time DDoS Detection Engine from Scratch
Introduction
Imagine you own a popular website. Thousands of people visit every day.
Then one morning, a hacker sends millions of fake requests to your server
all at once — trying to crash it. This is called a Distributed
Denial of Service (DDoS) attack.
For HNG Stage 3, I was tasked with building a system that:
- Watches all incoming web traffic in real time
- Learns what "normal" traffic looks like
- Automatically detects and blocks attackers
- Sends instant Slack alerts
- Shows everything on a live dashboard
Here's exactly how I built it — explained simply enough that
a complete beginner can follow along.
The Architecture — How Everything Connects
Think of the system like a security team for a building:
Internet → Nginx (doorman) → Nextcloud (the building)
↓
Access Log (visitor diary)
↓
Python Daemon (security guard reading the diary)
↓
┌──────────────────────────────┐
│ Detect attack → Ban IP │
│ Send Slack alert │
│ Show on live dashboard │
└──────────────────────────────┘
Nginx sits in front of everything. Every single request that
comes in — legitimate user or attacker — passes through Nginx first.
Nginx writes a JSON log entry for every request containing the IP
address, timestamp, URL, and status code.
Our Python daemon reads those log entries in real time,
learns what normal traffic looks like, and fires when something
looks wrong.
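To make that log format concrete, here's a minimal sketch of parsing one of those JSON entries. The field names (`remote_addr`, `time_iso8601`, `request_uri`, `status`) are assumptions — match them to whatever your Nginx `log_format` directive actually emits:

```python
import json

# Field names below are assumptions -- align them with your Nginx log_format.
def parse_log_line(line):
    """Parse one JSON-formatted Nginx access log entry into a flat dict."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None  # skip malformed or partial lines
    return {
        "ip": entry.get("remote_addr"),
        "timestamp": entry.get("time_iso8601"),
        "url": entry.get("request_uri"),
        "status": int(entry.get("status", 0)),
    }

sample = ('{"remote_addr": "1.2.3.4", "time_iso8601": "2024-01-01T00:00:00+00:00",'
          ' "request_uri": "/index.php", "status": 200}')
print(parse_log_line(sample)["ip"])  # 1.2.3.4
```

Returning `None` for unparseable lines lets the daemon skip truncated entries (e.g. a line still being written) without crashing.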
How the Sliding Window Works
Here's the core question our system needs to answer at any moment:
"How many requests did this IP make in the last 60 seconds?"
We use a data structure called a deque (double-ended queue)
to answer this efficiently.
Think of it like a conveyor belt:
- New items (request timestamps) come in from the right
- Old items (timestamps older than 60 seconds) fall off the left automatically
```python
from collections import deque
from datetime import timedelta

ip_window = deque()

def add_request(ip_window, timestamp):
    # Add new request timestamp to the RIGHT
    ip_window.append(timestamp)
    # Remove timestamps older than 60 seconds from the LEFT
    cutoff = timestamp - timedelta(seconds=60)
    while ip_window and ip_window[0] < cutoff:
        ip_window.popleft()
    # Length = requests in the last 60 seconds
    return len(ip_window)
```
popleft() is O(1) — it removes from the front instantly.
This is why we use a deque instead of a regular list — removing
from the front of a list is O(n), because every remaining element
has to shift down one position.
We maintain two of these windows:
- One per IP — catches single aggressive attackers
- One global — catches distributed attacks from many IPs
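A minimal sketch of maintaining both windows at once, assuming a `defaultdict` keyed by IP (the daemon's actual structure may differ):

```python
from collections import defaultdict, deque
from datetime import timedelta

WINDOW = timedelta(seconds=60)

ip_windows = defaultdict(deque)   # one sliding window per source IP
global_window = deque()           # every request, regardless of IP

def record(ip, timestamp):
    """Record a request; return (per-IP rate, global rate) over the last 60s."""
    for window in (ip_windows[ip], global_window):
        window.append(timestamp)
        cutoff = timestamp - WINDOW
        while window and window[0] < cutoff:
            window.popleft()
    return len(ip_windows[ip]), len(global_window)
```

Both windows share the same append/expire logic, so a single helper keeps them consistent.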
How The Baseline Learns From Traffic
Knowing the current rate isn't enough. We need to know if
that rate is normal or abnormal.
We solve this with a rolling 30-minute baseline:
Every second, we record how many requests arrived in that second.
We keep a 30-minute history of these per-second counts.
Every 60 seconds, we calculate:
Mean — the average requests per second:
mean = sum(counts) / len(counts)
Standard Deviation — how much the traffic usually varies:
variance = sum((x - mean) ** 2 for x in counts) / len(counts)
stddev = math.sqrt(variance)
We apply floor values to both — mean never drops below 1.0
and stddev never drops below 0.5. This prevents false alarms
when traffic is extremely stable.
We also store baselines in per-hour slots. Traffic at 3pm
looks different from traffic at 3am — so we prefer the current
hour's baseline when making decisions.
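Putting those pieces together, here's a sketch of the baseline tracker. The class name and structure are mine, not the project's, but the floors and per-hour slots follow the description above:

```python
import math
from collections import deque

class Baseline:
    """Rolling 30-minute baseline of per-second counts, in per-hour slots."""
    MEAN_FLOOR = 1.0
    STDDEV_FLOOR = 0.5

    def __init__(self):
        self.counts = deque(maxlen=30 * 60)  # last 30 minutes of seconds
        self.hourly = {}                     # hour (0-23) -> (mean, stddev)

    def record_second(self, count):
        self.counts.append(count)

    def recompute(self, hour):
        """Run every 60 seconds: recompute mean/stddev, applying the floors."""
        if not self.counts:
            return
        mean = sum(self.counts) / len(self.counts)
        variance = sum((x - mean) ** 2 for x in self.counts) / len(self.counts)
        stddev = math.sqrt(variance)
        self.hourly[hour] = (max(mean, self.MEAN_FLOOR),
                             max(stddev, self.STDDEV_FLOOR))

    def get(self, hour):
        """Prefer the current hour's slot; fall back to any stored baseline."""
        if hour in self.hourly:
            return self.hourly[hour]
        if self.hourly:
            return next(iter(self.hourly.values()))
        return (self.MEAN_FLOOR, self.STDDEV_FLOOR)
```

The `maxlen=30 * 60` deque gives the 30-minute history for free: old per-second counts fall off automatically as new ones arrive.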
How The Detection Logic Makes A Decision
With the current rate and the baseline established, we calculate
a z-score:
z = (current_rate - baseline_mean) / baseline_stddev
The z-score answers: "How many standard deviations above
normal is this?"
| Z-score | Meaning |
|---|---|
| 1.0 | Slightly above normal |
| 2.0 | Noticeably above normal |
| 3.0 | Very unusual — only 0.3% of traffic |
| 10.0+ | Almost certainly an attack |
We flag an IP as anomalous if:
- z-score > 3.0 (statistical threshold), OR
- rate > 5x the baseline mean (simple multiplier)
Either trigger alone is enough to flag the IP. This dual-trigger
approach catches both gradual ramp-up attacks (caught by the
z-score) and sudden flood attacks (caught by the multiplier).
Error surge detection: If an IP is generating a lot of
4xx/5xx errors — like trying hundreds of wrong passwords —
we tighten its detection thresholds by 30%. It's already
behaving suspiciously, so we watch it more closely.
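The decision above can be sketched as a single function. The constant names and the exact way the 30% tightening is applied are assumptions:

```python
Z_THRESHOLD = 3.0
MULTIPLIER = 5.0
ERROR_SURGE_TIGHTEN = 0.7  # thresholds reduced by 30% for error-heavy IPs

def is_anomalous(rate, mean, stddev, error_surge=False):
    """Dual-trigger check: statistical z-score OR simple multiplier."""
    z_limit = Z_THRESHOLD
    mult_limit = MULTIPLIER
    if error_surge:
        z_limit *= ERROR_SURGE_TIGHTEN
        mult_limit *= ERROR_SURGE_TIGHTEN
    z = (rate - mean) / stddev
    return z > z_limit or rate > mult_limit * mean
```

With the stddev floor of 0.5 in place, the division is always safe, and very quiet baselines don't produce absurd z-scores.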
How iptables Blocks An IP
When an IP is flagged, we run this Linux firewall command:
iptables -I INPUT -s 1.2.3.4 -j DROP
Breaking it down:
- `iptables` — the Linux kernel firewall tool
- `-I INPUT` — INSERT a rule into the INPUT chain (incoming traffic)
- `-s 1.2.3.4` — match packets from this SOURCE IP
- `-j DROP` — silently DROP all matching packets
DROP means the attacker gets absolutely no response.
Their packets just disappear. They don't even know they've
been blocked — they just stop getting responses.
We call this from Python using subprocess:

```python
import subprocess

cmd = ['iptables', '-I', 'INPUT', '-s', ip, '-j', 'DROP']
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode == 0:
    print(f"Successfully blocked {ip}")
Progressive ban schedule — repeat offenders get longer bans:
- 1st offence: 10 minutes
- 2nd offence: 30 minutes
- 3rd offence: 2 hours
- 4th+ offence: Permanent
When a ban expires, we delete the rule:
iptables -D INPUT -s 1.2.3.4 -j DROP
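A sketch of the ban schedule and the matching unban call — the helper names are mine, not the project's:

```python
import subprocess

# Offence count -> ban duration in seconds (None = permanent)
BAN_SCHEDULE = {1: 10 * 60, 2: 30 * 60, 3: 2 * 60 * 60}

def ban_duration(offence_count):
    """Return the ban length for this offence, or None for a permanent ban."""
    return BAN_SCHEDULE.get(offence_count) if offence_count < 4 else None

def unban(ip):
    """Delete the DROP rule when a ban expires (requires root)."""
    cmd = ['iptables', '-D', 'INPUT', '-s', ip, '-j', 'DROP']
    return subprocess.run(cmd, capture_output=True, text=True).returncode == 0
```

Keeping the schedule in a dict makes the escalation policy a one-line config change rather than a logic change.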
The Live Dashboard
The dashboard is a Flask web server running in a background thread.
It serves an HTML page that calls a /api/stats endpoint every
3 seconds and updates the display with fresh data.
It shows:
- Global requests per second
- Current baseline mean and stddev
- All banned IPs with ban details
- Top 10 source IPs by request rate
- CPU and memory usage
- System uptime
- Hourly baseline slots
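A minimal sketch of that setup — a Flask app in a daemon thread serving a `/api/stats` endpoint. The payload fields here are placeholders standing in for the detector's shared state:

```python
import threading
from flask import Flask, jsonify

app = Flask(__name__)

# Placeholder stats -- the real daemon populates these from shared state.
stats = {
    "global_rps": 0.0,
    "baseline_mean": 1.0,
    "baseline_stddev": 0.5,
    "banned_ips": [],
    "top_ips": [],
}

@app.route("/api/stats")
def api_stats():
    return jsonify(stats)

def start_dashboard(port=8080):
    """Run Flask in a daemon thread so it doesn't block the detector loop."""
    t = threading.Thread(
        target=lambda: app.run(host="0.0.0.0", port=port),
        daemon=True,
    )
    t.start()
    return t
```

Marking the thread as a daemon means the dashboard dies automatically when the main detector process exits.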
Key Lessons Learned
1. Async Python is powerful — running log monitoring,
baseline calculation, ban checking, and serving a dashboard
simultaneously with asyncio.gather() is elegant and efficient.
2. Read the logs — when the Nextcloud container had issues,
the logs told us exactly what was wrong and how to fix it.
3. Never hardcode secrets — GitHub Push Protection caught
our Slack webhook URL in the code. Always use environment
variables for secrets.
4. Docker volumes are the glue — the named HNG-nginx-logs
volume is what allows Nginx and our detector (in separate
containers) to share log files seamlessly.
5. Z-scores are surprisingly simple — statistical anomaly
detection sounds intimidating but the math is just subtraction
and division.
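Lesson 1's asyncio.gather() pattern, reduced to a runnable skeleton — the task names are stand-ins mirroring the daemon's components, not the project's actual coroutines:

```python
import asyncio

async def monitor_logs():
    await asyncio.sleep(0)      # stand-in for tailing the access log
    return "logs"

async def update_baseline():
    await asyncio.sleep(0)      # stand-in for the 60-second recompute loop
    return "baseline"

async def check_bans():
    await asyncio.sleep(0)      # stand-in for expiring old bans
    return "bans"

async def main():
    # All three loops run concurrently on one event loop
    return await asyncio.gather(monitor_logs(), update_baseline(), check_bans())

print(asyncio.run(main()))  # ['logs', 'baseline', 'bans']
```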
Conclusion
Building this system taught me that security tooling isn't magic —
it's just careful observation, smart math, and fast response.
The same principles used here are what power enterprise security
tools at companies like Cloudflare and AWS.
The full source code is available at:
https://github.com/Frank363-hash/hng-anomaly-detector
If you have questions or suggestions, drop them in the comments!