How I Built a Real-Time DDoS Detection Engine for Nextcloud from Scratch
Introduction
Imagine you're running a cloud storage platform used by thousands of people around the world. One day, your boss walks in and says: "We've been seeing suspicious traffic. Build something that detects and blocks attacks automatically."
That's exactly the challenge I faced. In this post, I'll walk you through how I built a real-time anomaly detection engine that watches HTTP traffic, learns what normal looks like, and automatically blocks attackers — all without using any third-party rate-limiting libraries.
By the end of this post, you'll understand:
- How sliding windows track request rates in real time
- How a baseline learns from your own traffic patterns
- How z-score math decides if something is an attack
- How iptables drops malicious IPs at the kernel level
What Does the Project Do?
The system sits alongside a Nextcloud instance and does five things continuously:
- Reads Nginx access logs in real time, line by line
- Tracks request rates using sliding windows (per IP and globally)
- Learns what normal traffic looks like using a rolling baseline
- Detects anomalies when traffic deviates significantly from normal
- Blocks suspicious IPs automatically using iptables
Here's the overall architecture:
Internet → Nginx (JSON logs) → Nextcloud
                ↓
        Log Volume (shared)
                ↓
          Python Daemon
          ├── Monitor (reads logs)
          ├── Baseline (learns traffic)
          ├── Detector (flags anomalies)
          ├── Blocker (iptables rules)
          ├── Unbanner (auto-release)
          ├── Notifier (Slack alerts)
          └── Dashboard (live metrics)
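A minimal sketch of the Monitor stage might look like the following. It assumes Nginx writes one JSON object per line; the function names `parse_line` and `follow` and the polling interval are illustrative, not taken from the project.

```python
import json
import time


def parse_line(line):
    """Parse one JSON access-log line; return None for blank or malformed lines."""
    line = line.strip()
    if not line:
        return None
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None
    # A valid JSON scalar or list is still not a log record
    if not isinstance(entry, dict):
        return None
    return entry


def follow(path, poll_interval=0.5):
    """Yield parsed entries as new lines are appended to the log file."""
    with open(path, "r") as f:
        f.seek(0, 2)  # jump to the end of the file: only new traffic matters
        while True:
            line = f.readline()
            if not line:
                time.sleep(poll_interval)  # nothing new yet, wait and retry
                continue
            entry = parse_line(line)
            if entry is not None:
                yield entry
```

This sketch does not handle log rotation, and a line that is read while Nginx is still writing it will fail to parse and be skipped.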
How the Sliding Window Works
A sliding window is a way of asking: "How many requests happened in the last 60 seconds?"
The naive approach would be to count requests in fixed one-minute buckets, but that gives you a snapshot that can be up to a minute stale, and it misses bursts that straddle a bucket boundary.
Instead, I used Python's collections.deque — a double-ended queue. Here's the idea:
from collections import defaultdict, deque
import time

WINDOW_SECONDS = 60

# One deque of request timestamps per IP
ip_windows = defaultdict(deque)

def record_request(ip):
    now = time.time()
    cutoff = now - WINDOW_SECONDS
    window = ip_windows[ip]

    # Add current timestamp to the right
    window.append(now)

    # Evict old timestamps from the left
    while window and window[0] < cutoff:
        window.popleft()

    # Current rate = requests in last 60 seconds, expressed per second
    return len(window) / WINDOW_SECONDS
Every time a request comes in, its timestamp is appended to the right. Old timestamps (older than 60 seconds) are removed from the left using popleft(). The current rate is simply the length of the deque divided by 60.
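This gives us an exact count of the requests seen in the last 60 seconds. Appends are O(1) and evictions are amortized O(1), because each timestamp is added once and removed at most once. No databases, no counters that reset at fixed intervals.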
I maintain two windows:
- Per-IP window: one deque per IP address
- Global window: one deque for all traffic combined
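Here is one way the two windows could live side by side. This is a sketch, not the project's code: the class name `SlidingWindows`, its method names, and the injectable `clock` parameter (added so the class can be tested without waiting) are all assumptions.

```python
from collections import defaultdict, deque
import time


class SlidingWindows:
    """Per-IP and global sliding windows over the last `window` seconds."""

    def __init__(self, window=60, clock=time.time):
        self.window = window
        self.clock = clock                  # injectable for testing
        self.per_ip = defaultdict(deque)    # ip -> deque of timestamps
        self.global_window = deque()        # timestamps of all requests

    def _evict(self, dq, now):
        cutoff = now - self.window
        while dq and dq[0] < cutoff:
            dq.popleft()

    def record(self, ip):
        now = self.clock()
        self.per_ip[ip].append(now)
        self.global_window.append(now)

    def ip_rate(self, ip):
        dq = self.per_ip.get(ip)  # .get() avoids creating an empty deque
        if dq is None:
            return 0.0
        self._evict(dq, self.clock())
        if not dq:
            del self.per_ip[ip]   # forget idle IPs so memory stays bounded
            return 0.0
        return len(dq) / self.window

    def global_rate(self):
        self._evict(self.global_window, self.clock())
        return len(self.global_window) / self.window
```

Evicting on read as well as on write means an IP that has gone quiet reports a falling rate instead of keeping its last value.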
How the Baseline Learns from Traffic
The baseline answers the question: "What does normal traffic look like on this server?"
Instead of hardcoding a threshold like "flag anything above 10 req/s", I compute the mean and standard deviation from actual recent traffic.
Here's how it works:
Rolling 30-minute window:
Every second, I record how many requests arrived that second. I keep a rolling window of the last 30 minutes (1800 seconds) of these per-second counts.
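The per-second bookkeeping could be sketched like this. The class name `PerSecondCounts` and the zero-filling of idle seconds are assumptions; the (second, count) pair layout is chosen to match the `for _, c in self.window` loop used in the recalculation step.

```python
from collections import deque


class PerSecondCounts:
    """Rolling (second, count) buckets covering the last `horizon` seconds."""

    def __init__(self, horizon=1800):
        self.horizon = horizon
        self.window = deque()  # oldest bucket on the left

    def add(self, timestamp):
        second = int(timestamp)
        if self.window:
            last = self.window[-1][0]
            second = max(second, last)  # late lines count toward the newest bucket
            if second == last:
                self.window[-1] = (last, self.window[-1][1] + 1)
                self._evict(second)
                return
            # Zero-fill idle seconds so quiet periods lower the mean,
            # but never fill further back than the horizon.
            start = max(last + 1, second - self.horizon + 1)
            for idle in range(start, second):
                self.window.append((idle, 0))
        self.window.append((second, 1))
        self._evict(second)

    def _evict(self, newest):
        cutoff = newest - self.horizon
        while self.window and self.window[0][0] <= cutoff:
            self.window.popleft()

    def counts(self):
        return [c for _, c in self.window]
```

In this sketch the buckets only advance when a request arrives, so a trailing idle stretch is recorded as zeros once the next request shows up.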
Recalculation every 60 seconds:
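import math
counts = [c for _, c in self.window]  # second item of each pair is the per-second count
if counts:  # nothing to compute until the first sample arrives
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)  # population variance
    stddev = math.sqrt(variance)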
Per-hour slots:
The baseline stores separate mean/stddev values for each hour of the day. If your server is busier at 9am than at 3am, each hour is judged against its own normal instead of a single all-day average.
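A sketch of how per-hour slots might be stored follows. The class name `HourlyBaseline`, the use of UTC hours, and the fallback to (0.1, 0.1) for an hour with no data yet are assumptions made for illustration.

```python
import math
from datetime import datetime, timezone


class HourlyBaseline:
    """Keeps one (mean, stddev) pair per hour of day, 0 through 23."""

    def __init__(self):
        self.slots = {}  # hour -> (mean, stddev)

    @staticmethod
    def _hour(timestamp):
        # Assumption: slots are keyed by UTC hour
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).hour

    def update(self, counts, timestamp):
        """Recompute the slot for the hour that contains `timestamp`."""
        if not counts:
            return
        mean = sum(counts) / len(counts)
        variance = sum((c - mean) ** 2 for c in counts) / len(counts)
        self.slots[self._hour(timestamp)] = (mean, math.sqrt(variance))

    def get(self, timestamp, default=(0.1, 0.1)):
        """Return the baseline for the hour of `timestamp`, or `default` if unseen."""
        return self.slots.get(self._hour(timestamp), default)
```

Usage: call `update()` with the current per-second counts each time the baseline is recalculated, and `get()` with the request time when scoring traffic.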
Floor values:
To avoid division by zero on idle servers, I set minimum values:
floor_mean = 0.1 req/s
floor_stddev = 0.1
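Applying the floors can be as small as a pair of `max()` calls. The helper name `apply_floors` is hypothetical; the two constants are the values given above.

```python
FLOOR_MEAN = 0.1     # req/s, value from the post
FLOOR_STDDEV = 0.1   # value from the post


def apply_floors(mean, stddev):
    """Clamp baseline statistics so later divisions stay well defined."""
    return max(mean, FLOOR_MEAN), max(stddev, FLOOR_STDDEV)


# Idle server: every per-second count is zero
mean, stddev = apply_floors(0.0, 0.0)
z = (1.0 - mean) / stddev  # a single 1 req/s client now scores about 9
```

The idle-server case shows the trade-off: the division is safe, but with floors this low even light traffic produces a large z-score, so the floor values are worth tuning for your own server.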
How the Detection Logic Makes a Decision
Once I have the baseline mean and stddev, I use a z-score to decide whether a rate is anomalous.
The z-score measures how many standard deviations a value is from the mean:
z = (current_rate - mean) / stddev
If z > 3.0, the current rate is more than 3 standard deviations above normal. If traffic were normally distributed, a value that high would occur by chance only about 0.13% of the time.
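To make the numbers concrete, here is a small worked example. The baseline values (mean 2.0 req/s, stddev 0.5) are made up for illustration, and `upper_tail` assumes a normal distribution, which real traffic only approximates.

```python
import math


def zscore(rate, mean, stddev):
    """Number of standard deviations `rate` sits above (or below) `mean`."""
    return (rate - mean) / stddev


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical baseline: mean 2.0 req/s, stddev 0.5
z = zscore(4.0, 2.0, 0.5)
print(z)                # 4.0
print(upper_tail(3.0))  # roughly 0.00135
```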
I also add a rate multiplier check as a backup:
def check_ip(self, ip):
    mean, stddev = self.baseline.get_baseline()
    rate = self.get_ip_rate(ip)

    # Z-score check
    if stddev > 0:
        zscore = (rate - mean) / stddev
        if zscore > 3.0:
            return True, f"zscore={zscore:.2f}"

    # Rate multiplier check
    if rate > 5 * mean:
        return True, f"rate={rate:.2f} > 5x mean"

    return False, None
Error surge detection:
If an IP's 4xx/5xx error rate is more than 3x the baseline error rate, the detection thresholds for that IP are automatically tightened by 30%. This catches attackers who probe for vulnerabilities before launching a full attack.
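One possible reading of "tightened by 30%" is that both thresholds are multiplied by 0.7. The sketch below takes that interpretation; the function name, its parameters, and the decision to skip tightening when the baseline error rate is zero are all assumptions.

```python
def effective_thresholds(ip_error_rate, baseline_error_rate,
                         z_threshold=3.0, rate_multiplier=5.0,
                         surge_factor=3.0, tighten_by=0.30):
    """Return (z_threshold, rate_multiplier), tightened during an error surge."""
    # Assumption: with no baseline errors recorded there is nothing to compare to
    surging = (baseline_error_rate > 0
               and ip_error_rate > surge_factor * baseline_error_rate)
    if surging:
        scale = 1.0 - tighten_by
        return z_threshold * scale, rate_multiplier * scale
    return z_threshold, rate_multiplier
```

With the defaults, a surging IP is flagged at a z-score of about 2.1 or at 3.5x the mean, instead of 3.0 and 5x.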
How iptables Blocks an IP
When an IP is flagged as anomalous, blocking happens at the kernel level using iptables — before traffic even reaches Nginx or Nextcloud.
import subprocess

def ban(self, ip):
    # check=True raises CalledProcessError if iptables fails (e.g. not running as root)
    subprocess.run([
        'iptables', '-I', 'INPUT',
        '-s', ip, '-j', 'DROP'
    ], check=True)
The -I INPUT inserts the rule at the top of the INPUT chain. -s specifies the source IP. -j DROP silently drops all packets from that IP.
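Because the IP comes from log data, it is worth validating before it reaches the command line, and the same rule has to be deleted again on unban. The helpers below are a sketch with hypothetical names; `-C` (check whether a rule exists) and `-D` (delete a rule) are standard iptables flags, and running any of this requires root.

```python
import ipaddress
import subprocess


def drop_rule(action, ip):
    """Build the iptables argv for a DROP rule. action is '-I', '-C' or '-D'."""
    if action not in ("-I", "-C", "-D"):
        raise ValueError(f"unsupported action: {action}")
    addr = ipaddress.ip_address(ip)  # raises ValueError for anything that is not an IP
    if addr.version != 4:
        raise ValueError("iptables handles IPv4 only; IPv6 needs ip6tables")
    return ["iptables", action, "INPUT", "-s", str(addr), "-j", "DROP"]


def is_banned(ip):
    # -C exits with status 0 when a matching rule already exists
    return subprocess.run(drop_rule("-C", ip), capture_output=True).returncode == 0


def ban(ip):
    if not is_banned(ip):  # avoid stacking duplicate rules
        subprocess.run(drop_rule("-I", ip), check=True)


def unban(ip):
    if is_banned(ip):
        subprocess.run(drop_rule("-D", ip), check=True)
```

Passing the command as a list (no shell) plus the `ipaddress` check keeps a malformed log field from turning into an injected argument.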
This is extremely efficient — the kernel drops packets without any application-level processing.
Auto-unban with backoff schedule:
Bans don't last forever. The unbanner releases IPs on a backoff schedule:
- 1st ban: 10 minutes
- 2nd ban: 30 minutes
- 3rd ban: 2 hours
- 4th ban and later: permanent
unban_schedule = [600, 1800, 7200, -1] # seconds, -1 = permanent
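Looking up the duration for a given offence could be sketched as follows. The schedule values are the ones above; the function names `ban_duration` and `unban_time` are hypothetical.

```python
UNBAN_SCHEDULE = [600, 1800, 7200, -1]  # seconds, -1 = permanent


def ban_duration(ban_count):
    """Duration for the nth ban (1-based); anything past the list stays permanent."""
    if ban_count < 1:
        raise ValueError("ban_count starts at 1")
    index = min(ban_count, len(UNBAN_SCHEDULE)) - 1
    return UNBAN_SCHEDULE[index]


def unban_time(banned_at, ban_count):
    """Epoch time at which the ban expires, or None for a permanent ban."""
    duration = ban_duration(ban_count)
    return None if duration == -1 else banned_at + duration
```

Clamping the index means a 5th or 10th ban reuses the last entry, so the list never has to grow.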
The Live Dashboard
The system serves a live dashboard at a public domain showing:
- Global request rate vs baseline
- Banned IPs and their ban counts
- Top 10 source IPs by request rate
- CPU and memory usage
- Uptime
It is built with Flask and auto-refreshes every 3 seconds.
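The data behind such a dashboard can be assembled in one plain function that a Flask route then returns as JSON. The sketch below covers only that assembly step; the function name `build_snapshot` and the key names are assumptions, and CPU and memory figures are left out because they need an extra library.

```python
import heapq


def build_snapshot(ip_rates, global_rate, baseline_mean, ban_counts,
                   started_at, now, top_n=10):
    """Collect the dashboard numbers into a JSON-serializable dict."""
    # nlargest avoids sorting every IP just to show the top few
    top = heapq.nlargest(top_n, ip_rates.items(), key=lambda kv: kv[1])
    return {
        "global_rate": global_rate,
        "baseline_mean": baseline_mean,
        "banned": dict(ban_counts),  # ip -> number of bans so far
        "top_ips": [{"ip": ip, "rate": rate} for ip, rate in top],
        "uptime_seconds": int(now - started_at),
    }
```

Keeping this function free of Flask makes it easy to test without starting a web server.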
Slack Alerts
Every ban, unban, and global anomaly sends a Slack notification with:
- The IP address
- The condition that fired (z-score or rate multiplier)
- Current rate vs baseline
- Ban duration
- Timestamp
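A notification like this can be sent with the standard library alone. The sketch assumes a Slack incoming webhook, which accepts a JSON body with a `text` field; the function names and the message layout are illustrative.

```python
import json
import urllib.request
from datetime import datetime, timezone


def format_alert(ip, condition, rate, baseline_mean, ban_seconds, timestamp):
    """Render one ban event as a single line of text."""
    duration = "permanent" if ban_seconds == -1 else f"{ban_seconds // 60} min"
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC")
    return (f"Banned {ip} | {condition} | rate={rate:.2f} req/s "
            f"(baseline {baseline_mean:.2f}) | duration={duration} | {when}")


def send_alert(webhook_url, text, timeout=5):
    """POST the message to a Slack incoming webhook; returns the HTTP status."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The timeout matters here: without it, a slow Slack response could stall the detection loop if alerts are sent from the same thread.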
What I Learned
Building this from scratch taught me:
- Deques are powerful — simple data structures can solve complex real-time problems elegantly
- Statistics beat hardcoded thresholds — a z-score adapts to your actual traffic patterns
- Kernel-level blocking is fast — iptables drops packets before your app even sees them
- Baselines need floors — always handle the idle server case
Source Code
The full source code is available on GitHub:
👉 https://github.com/IamHendy/hng-anomaly-detector
The live dashboard is running at:
👉 http://hendyogema.mooo.com
Built as part of the HNG DevSecOps track — Stage 3 challenge.