How I Built a Real-Time DDoS Detection Engine from Scratch
Introduction
Imagine you own a popular website. Thousands of people visit every day.
Then one morning, a hacker sends millions of fake requests to your server
all at once — trying to crash it. This is called a Distributed
Denial of Service (DDoS) attack.
For HNG Stage 3, I was tasked with building a system that:
- Watches all incoming web traffic in real time
- Learns what "normal" traffic looks like
- Automatically detects and blocks attackers
- Sends instant Slack alerts
- Shows everything on a live dashboard
Here's exactly how I built it — explained simply enough that
a complete beginner can follow along.
The Architecture — How Everything Connects
Think of the system like a security team for a building:
Internet → Nginx (doorman) → Nextcloud (the building)
↓
Access Log (visitor diary)
↓
Python Daemon (security guard reading the diary)
↓
┌──────────────────────────────┐
│ Detect attack → Ban IP │
│ Send Slack alert │
│ Show on live dashboard │
└──────────────────────────────┘
Nginx sits in front of everything. Every single request that
comes in — legitimate user or attacker — passes through Nginx first.
Nginx writes a JSON log entry for every request containing the IP
address, timestamp, URL, and status code.
Our Python daemon reads those log entries in real time,
learns what normal traffic looks like, and fires when something
looks wrong.
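To make that log format concrete, here's a minimal sketch of parsing one of those JSON entries. The field names (`remote_addr`, `time_iso8601`, `request_uri`, `status`) are assumptions — match them to whatever your Nginx `log_format` directive actually emits:

```python
import json

# Field names below are assumptions -- align them with your Nginx log_format.
def parse_log_line(line):
    """Parse one JSON-formatted Nginx access log entry into a flat dict."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None  # skip malformed or partial lines
    return {
        "ip": entry.get("remote_addr"),
        "timestamp": entry.get("time_iso8601"),
        "url": entry.get("request_uri"),
        "status": int(entry.get("status", 0)),
    }

sample = ('{"remote_addr": "1.2.3.4", "time_iso8601": "2024-01-01T00:00:00+00:00",'
          ' "request_uri": "/index.php", "status": 200}')
print(parse_log_line(sample)["ip"])  # 1.2.3.4
```

Returning `None` for unparseable lines lets the daemon skip truncated entries (e.g. a line still being written) without crashing.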
How the Sliding Window Works
Here's the core question our system needs to answer at any moment:
"How many requests did this IP make in the last 60 seconds?"
We use a data structure called a deque (double-ended queue)
to answer this efficiently.
Think of it like a conveyor belt:
- New items (request timestamps) come in from the right
- Old items (timestamps older than 60 seconds) fall off the left automatically
```python
from collections import deque
from datetime import timedelta

ip_window = deque()

def add_request(ip_window, timestamp):
    # Add new request timestamp to the RIGHT
    ip_window.append(timestamp)
    # Remove timestamps older than 60 seconds from the LEFT
    cutoff = timestamp - timedelta(seconds=60)
    while ip_window and ip_window[0] < cutoff:
        ip_window.popleft()
    # Length = requests in the last 60 seconds
    return len(ip_window)
```
popleft() is O(1) — it removes from the front instantly.
This is why we use a deque instead of a regular list — removing
from the front of a list is O(n), because every remaining element
has to shift down one position.
We maintain two of these windows:
- One per IP — catches single aggressive attackers
- One global — catches distributed attacks from many IPs
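A minimal sketch of maintaining both windows at once, assuming a `defaultdict` keyed by IP (the daemon's actual structure may differ):

```python
from collections import defaultdict, deque
from datetime import timedelta

WINDOW = timedelta(seconds=60)

ip_windows = defaultdict(deque)   # one sliding window per source IP
global_window = deque()           # every request, regardless of IP

def record(ip, timestamp):
    """Record a request; return (per-IP rate, global rate) over the last 60s."""
    for window in (ip_windows[ip], global_window):
        window.append(timestamp)
        cutoff = timestamp - WINDOW
        while window and window[0] < cutoff:
            window.popleft()
    return len(ip_windows[ip]), len(global_window)
```

Both windows share the same append/expire logic, so a single helper keeps them consistent.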
How The Baseline Learns From Traffic
Knowing the current rate isn't enough. We need to know if
that rate is normal or abnormal.
We solve this with a rolling 30-minute baseline:
Every second, we record how many requests arrived in that second.
We keep a 30-minute history of these per-second counts.
Every 60 seconds, we calculate:
Mean — the average requests per second:
mean = sum(counts) / len(counts)
Standard Deviation — how much the traffic usually varies:
variance = sum((x - mean) ** 2 for x in counts) / len(counts)
stddev = math.sqrt(variance)
We apply floor values to both — mean never drops below 1.0
and stddev never drops below 0.5. This prevents false alarms
when traffic is extremely stable.
We also store baselines in per-hour slots. Traffic at 3pm
looks different from traffic at 3am — so we prefer the current
hour's baseline when making decisions.
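Putting those pieces together, here's a sketch of the baseline tracker. The class name and structure are mine, not the project's, but the floors and per-hour slots follow the description above:

```python
import math
from collections import deque

class Baseline:
    """Rolling 30-minute baseline of per-second counts, in per-hour slots."""
    MEAN_FLOOR = 1.0
    STDDEV_FLOOR = 0.5

    def __init__(self):
        self.counts = deque(maxlen=30 * 60)  # last 30 minutes of seconds
        self.hourly = {}                     # hour (0-23) -> (mean, stddev)

    def record_second(self, count):
        self.counts.append(count)

    def recompute(self, hour):
        """Run every 60 seconds: recompute mean/stddev, applying the floors."""
        if not self.counts:
            return
        mean = sum(self.counts) / len(self.counts)
        variance = sum((x - mean) ** 2 for x in self.counts) / len(self.counts)
        stddev = math.sqrt(variance)
        self.hourly[hour] = (max(mean, self.MEAN_FLOOR),
                             max(stddev, self.STDDEV_FLOOR))

    def get(self, hour):
        """Prefer the current hour's slot; fall back to any stored baseline."""
        if hour in self.hourly:
            return self.hourly[hour]
        if self.hourly:
            return next(iter(self.hourly.values()))
        return (self.MEAN_FLOOR, self.STDDEV_FLOOR)
```

The `maxlen=30 * 60` deque gives the 30-minute history for free: old per-second counts fall off automatically as new ones arrive.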
How The Detection Logic Makes A Decision
With the current rate and the baseline established, we calculate
a z-score:
z = (current_rate - baseline_mean) / baseline_stddev
The z-score answers: "How many standard deviations above
normal is this?"
| Z-score | Meaning |
|---|---|
| 1.0 | Slightly above normal |
| 2.0 | Noticeably above normal |
| 3.0 | Very unusual — only 0.3% of traffic |
| 10.0+ | Almost certainly an attack |
We flag an IP as anomalous if:
- z-score > 3.0 (statistical threshold), OR
- rate > 5x the baseline mean (simple multiplier)
Either trigger alone is enough to flag the IP. This dual-trigger
approach catches both gradual ramp-up attacks (caught by the
z-score) and sudden flood attacks (caught by the multiplier).
Error surge detection: If an IP is generating a lot of
4xx/5xx errors — like trying hundreds of wrong passwords —
we tighten its detection thresholds by 30%. It's already
behaving suspiciously, so we watch it more closely.
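The decision above can be sketched as a single function. The constant names and the exact way the 30% tightening is applied are assumptions:

```python
Z_THRESHOLD = 3.0
MULTIPLIER = 5.0
ERROR_SURGE_TIGHTEN = 0.7  # thresholds reduced by 30% for error-heavy IPs

def is_anomalous(rate, mean, stddev, error_surge=False):
    """Dual-trigger check: statistical z-score OR simple multiplier."""
    z_limit = Z_THRESHOLD
    mult_limit = MULTIPLIER
    if error_surge:
        z_limit *= ERROR_SURGE_TIGHTEN
        mult_limit *= ERROR_SURGE_TIGHTEN
    z = (rate - mean) / stddev
    return z > z_limit or rate > mult_limit * mean
```

With the stddev floor of 0.5 in place, the division is always safe, and very quiet baselines don't produce absurd z-scores.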
How iptables Blocks An IP
When an IP is flagged, we run this Linux firewall command:
iptables -I INPUT -s 1.2.3.4 -j DROP
Breaking it down:
- `iptables` — the Linux kernel firewall tool
- `-I INPUT` — INSERT a rule into the INPUT chain (incoming traffic)
- `-s 1.2.3.4` — match packets from this SOURCE IP
- `-j DROP` — silently DROP all matching packets
DROP means the attacker gets absolutely no response.
Their packets just disappear. They don't even know they've
been blocked — they just stop getting responses.
We call this from Python using subprocess:

```python
import subprocess

cmd = ['iptables', '-I', 'INPUT', '-s', ip, '-j', 'DROP']
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode == 0:
    print(f"Successfully blocked {ip}")
Progressive ban schedule — repeat offenders get longer bans:
- 1st offence: 10 minutes
- 2nd offence: 30 minutes
- 3rd offence: 2 hours
- 4th+ offence: Permanent
When a ban expires, we delete the rule:
iptables -D INPUT -s 1.2.3.4 -j DROP
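A sketch of the ban schedule and the matching unban call — the helper names are mine, not the project's:

```python
import subprocess

# Offence count -> ban duration in seconds (None = permanent)
BAN_SCHEDULE = {1: 10 * 60, 2: 30 * 60, 3: 2 * 60 * 60}

def ban_duration(offence_count):
    """Return the ban length for this offence, or None for a permanent ban."""
    return BAN_SCHEDULE.get(offence_count) if offence_count < 4 else None

def unban(ip):
    """Delete the DROP rule when a ban expires (requires root)."""
    cmd = ['iptables', '-D', 'INPUT', '-s', ip, '-j', 'DROP']
    return subprocess.run(cmd, capture_output=True, text=True).returncode == 0
```

Keeping the schedule in a dict makes the escalation policy a one-line config change rather than a logic change.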
The Live Dashboard
The dashboard is a Flask web server running in a background thread.
It serves an HTML page that calls a /api/stats endpoint every
3 seconds and updates the display with fresh data.
It shows:
- Global requests per second
- Current baseline mean and stddev
- All banned IPs with ban details
- Top 10 source IPs by request rate
- CPU and memory usage
- System uptime
- Hourly baseline slots
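A minimal sketch of that setup — a Flask app in a daemon thread serving a `/api/stats` endpoint. The payload fields here are placeholders standing in for the detector's shared state:

```python
import threading
from flask import Flask, jsonify

app = Flask(__name__)

# Placeholder stats -- the real daemon populates these from shared state.
stats = {
    "global_rps": 0.0,
    "baseline_mean": 1.0,
    "baseline_stddev": 0.5,
    "banned_ips": [],
    "top_ips": [],
}

@app.route("/api/stats")
def api_stats():
    return jsonify(stats)

def start_dashboard(port=8080):
    """Run Flask in a daemon thread so it doesn't block the detector loop."""
    t = threading.Thread(
        target=lambda: app.run(host="0.0.0.0", port=port),
        daemon=True,
    )
    t.start()
    return t
```

Marking the thread as a daemon means the dashboard dies automatically when the main detector process exits.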
Key Lessons Learned
1. Async Python is powerful — running log monitoring,
baseline calculation, ban checking, and serving a dashboard
simultaneously with asyncio.gather() is elegant and efficient.
2. Read the logs — when the Nextcloud container had issues,
the logs told us exactly what was wrong and how to fix it.
3. Never hardcode secrets — GitHub Push Protection caught
our Slack webhook URL in the code. Always use environment
variables for secrets.
4. Docker volumes are the glue — the named HNG-nginx-logs
volume is what allows Nginx and our detector (in separate
containers) to share log files seamlessly.
5. Z-scores are surprisingly simple — statistical anomaly
detection sounds intimidating but the math is just subtraction
and division.
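Lesson 1's asyncio.gather() pattern, reduced to a runnable skeleton — the task names are stand-ins mirroring the daemon's components, not the project's actual coroutines:

```python
import asyncio

async def monitor_logs():
    await asyncio.sleep(0)      # stand-in for tailing the access log
    return "logs"

async def update_baseline():
    await asyncio.sleep(0)      # stand-in for the 60-second recompute loop
    return "baseline"

async def check_bans():
    await asyncio.sleep(0)      # stand-in for expiring old bans
    return "bans"

async def main():
    # All three loops run concurrently on one event loop
    return await asyncio.gather(monitor_logs(), update_baseline(), check_bans())

print(asyncio.run(main()))  # ['logs', 'baseline', 'bans']
```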
Conclusion
Building this system taught me that security tooling isn't magic —
it's just careful observation, smart math, and fast response.
The same principles used here are what power enterprise security
tools at companies like Cloudflare and AWS.
The full source code is available at:
https://github.com/Frank363-hash/hng-anomaly-detector
If you have questions or suggestions, drop them in the comments!