How I Built a Real-Time DDoS Detection Engine for Nextcloud from Scratch

Introduction

Imagine you're running a cloud storage platform used by thousands of people around the world. One day, your boss walks in and says: "We've been seeing suspicious traffic. Build something that detects and blocks attacks automatically."

That's exactly the challenge I faced. In this post, I'll walk you through how I built a real-time anomaly detection engine that watches HTTP traffic, learns what normal looks like, and automatically blocks attackers — all without using any third-party rate-limiting libraries.

By the end of this post, you'll understand:

  • How sliding windows track request rates in real time
  • How a baseline learns from your own traffic patterns
  • How z-score math decides if something is an attack
  • How iptables drops malicious IPs at the kernel level

What Does the Project Do?

The system sits alongside a Nextcloud instance and does five things continuously:

  1. Reads Nginx access logs in real time, line by line
  2. Tracks request rates using sliding windows (per IP and globally)
  3. Learns what normal traffic looks like using a rolling baseline
  4. Detects anomalies when traffic deviates significantly from normal
  5. Blocks suspicious IPs automatically using iptables

Here's the overall architecture:
Internet → Nginx (JSON logs) → Nextcloud
                │
        Log volume (shared)
                │
        Python daemon
        ├── Monitor (reads logs)
        ├── Baseline (learns traffic)
        ├── Detector (flags anomalies)
        ├── Blocker (iptables rules)
        ├── Unbanner (auto-release)
        ├── Notifier (Slack alerts)
        └── Dashboard (live metrics)
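The first component, the monitor, tails the access log much like tail -f does. Here's a minimal sketch of that loop, assuming Nginx writes one JSON object per line (the exact field names depend on your log_format directive):

import json
import time

def follow(path):
    # Yield parsed log entries as Nginx appends them to the file
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file, like tail -f
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)  # nothing new yet; poll again shortly
                continue
            yield json.loads(line)

A production version would also need to handle log rotation, but this captures the core idea: the daemon sees each request within fractions of a second of Nginx logging it.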

How the Sliding Window Works

A sliding window is a way of asking: "How many requests happened in the last 60 seconds?"

The naive approach would be to count all requests every minute — but that gives you a stale snapshot, not a real-time view.

Instead, I used Python's collections.deque — a double-ended queue. Here's the idea:

from collections import defaultdict, deque
import time

# One deque of timestamps per IP
ip_windows = defaultdict(deque)

def record_request(ip):
    now = time.time()
    cutoff = now - 60  # 60-second window
    window = ip_windows[ip]

    # Add current timestamp to the right
    window.append(now)

    # Evict timestamps older than the cutoff from the left
    while window and window[0] < cutoff:
        window.popleft()

    # Current rate = requests in last 60 seconds
    return len(window) / 60

Every time a request comes in, its timestamp is appended to the right of that IP's deque. Timestamps older than 60 seconds are evicted from the left using popleft(). The current rate is simply the length of the deque divided by 60.

This gives us a perfectly accurate rolling count with O(1) insertions and amortized O(1) evictions. No databases, no counters that reset at fixed intervals.

I maintain two windows (a combined sketch follows the list):

  • Per-IP window: one deque per IP address
  • Global window: one deque for all traffic combined
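As a rough sketch (not the project's exact code), both windows can live in one tracker:

from collections import defaultdict, deque
import time

class RateTracker:
    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.per_ip = defaultdict(deque)  # one deque of timestamps per IP
        self.global_window = deque()      # one deque for all traffic

    def record(self, ip):
        now = time.time()
        self.per_ip[ip].append(now)
        self.global_window.append(now)

    def rate(self, window):
        # Evict stale timestamps, then compute requests per second
        now = time.time()
        while window and window[0] < now - self.window_seconds:
            window.popleft()
        return len(window) / self.window_seconds

Calling tracker.rate(tracker.per_ip[ip]) gives a single IP's rate, while tracker.rate(tracker.global_window) gives the server-wide rate.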

How the Baseline Learns from Traffic

The baseline answers the question: "What does normal traffic look like on this server?"

Instead of hardcoding a threshold like "flag anything above 10 req/s", I compute the mean and standard deviation from actual recent traffic.

Here's how it works:

Rolling 30-minute window:
Every second, I record how many requests arrived that second. I keep a rolling window of the last 30 minutes (1800 seconds) of these per-second counts.
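Here's a sketch of how those per-second counts might be recorded (record_second is a hypothetical helper name; the full class isn't shown in this post):

import time

WINDOW_SECONDS = 1800  # 30 minutes

def record_second(self, count):
    now = time.time()
    self.window.append((now, count))  # (timestamp, requests in that second)
    # Drop entries that have fallen out of the 30-minute window
    while self.window and self.window[0][0] < now - WINDOW_SECONDS:
        self.window.popleft()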

Recalculation every 60 seconds:

# Pull the per-second counts out of the (timestamp, count) window
counts = [c for _, c in self.window]
mean = sum(counts) / len(counts)
# Population variance of the per-second counts
variance = sum((c - mean) ** 2 for c in counts) / len(counts)
stddev = math.sqrt(variance)

Per-hour slots:
The baseline stores separate mean/stddev values per hour. This means if your server is busier at 9am than at 3am, the baseline adapts automatically.

Floor values:
To avoid division by zero on idle servers, I set minimum values (see the sketch after this list):

  • floor_mean = 0.1 req/s
  • floor_stddev = 0.1
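Putting the hourly slots and the floors together, a baseline lookup might look like this (self.hourly is an illustrative dict mapping hour to (mean, stddev); the project's actual attribute names may differ):

from datetime import datetime

FLOOR_MEAN = 0.1    # req/s
FLOOR_STDDEV = 0.1

def get_baseline(self):
    hour = datetime.now().hour  # 0-23 slot for the current hour
    mean, stddev = self.hourly.get(hour, (FLOOR_MEAN, FLOOR_STDDEV))
    # Clamp to the floors so idle hours never produce a zero stddev
    return max(mean, FLOOR_MEAN), max(stddev, FLOOR_STDDEV)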

How the Detection Logic Makes a Decision

Once I have the baseline mean and stddev, I use z-score to decide if a rate is anomalous.

The z-score measures how many standard deviations a value is from the mean:
z = (current_rate - mean) / stddev

If z > 3.0, the current rate is more than 3 standard deviations above normal, which is statistically very unlikely under normal conditions.

I also add a rate multiplier check as a backup:

def check_ip(self, ip):
    mean, stddev = self.baseline.get_baseline()
    rate = self.get_ip_rate(ip)

    # Z-score check
    if stddev > 0:
        zscore = (rate - mean) / stddev
        if zscore > 3.0:
            return True, f"zscore={zscore:.2f}"

    # Rate multiplier check
    if rate > 5 * mean:
        return True, f"rate={rate:.2f} > 5x mean"

    return False, None

Error surge detection:
If an IP's 4xx/5xx error rate is 3x higher than the baseline error rate, the thresholds are automatically tightened by 30%. This catches attackers who probe for vulnerabilities before launching a full attack.
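Here's a sketch of how that tightening could plug into the detector (error_rate and baseline_error_rate are illustrative names, not taken from the project's code):

BASE_ZSCORE = 3.0

def zscore_threshold(self, ip):
    # If this IP's 4xx/5xx rate is 3x the baseline error rate,
    # tighten the z-score threshold by 30%
    if self.error_rate(ip) > 3 * self.baseline_error_rate():
        return BASE_ZSCORE * 0.7
    return BASE_ZSCORE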


How iptables Blocks an IP

When an IP is flagged as anomalous, blocking happens at the kernel level using iptables — before traffic even reaches Nginx or Nextcloud.

import subprocess

def ban(self, ip):
    # Requires root (or CAP_NET_ADMIN) to modify the firewall
    subprocess.run([
        'iptables', '-I', 'INPUT',
        '-s', ip, '-j', 'DROP'
    ], check=True)  # raise if iptables fails

The -I INPUT inserts the rule at the top of the INPUT chain. -s specifies the source IP. -j DROP silently drops all packets from that IP.

This is extremely efficient — the kernel drops packets without any application-level processing.

Auto-unban with backoff schedule:
Bans don't last forever. The unbanner releases IPs on a backoff schedule:

  • 1st ban: 10 minutes
  • 2nd ban: 30 minutes
  • 3rd ban: 2 hours
  • 4th ban+: permanent

unban_schedule = [600, 1800, 7200, -1]  # seconds, -1 = permanent
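The unbanner itself can be a simple periodic sweep. A rough sketch, assuming bans are tracked in a dict of ip → (banned_at, ban_count):

import subprocess
import time

def check_unbans(self):
    now = time.time()
    for ip, (banned_at, ban_count) in list(self.bans.items()):
        # Cap the index at the last schedule entry (permanent)
        idx = min(ban_count - 1, len(self.unban_schedule) - 1)
        duration = self.unban_schedule[idx]
        if duration != -1 and now - banned_at >= duration:
            # -D deletes the matching rule from the INPUT chain
            subprocess.run(['iptables', '-D', 'INPUT', '-s', ip, '-j', 'DROP'])
            del self.bans[ip]

A real implementation would also keep the ban count around after release, so a repeat offender moves up the backoff schedule.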

The Live Dashboard

The system serves a live dashboard at a public domain showing:

  • Global request rate vs baseline
  • Banned IPs and their ban counts
  • Top 10 source IPs by request rate
  • CPU and memory usage
  • Uptime

It's built with Flask and auto-refreshes every 3 seconds.
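In Flask terms, the core of such a dashboard is just an endpoint that serializes the engine's current state. A minimal sketch (engine is a placeholder handle, not the project's actual object):

from flask import Flask, jsonify

app = Flask(__name__)
engine = ...  # placeholder: handle to the running detection engine

@app.route('/metrics')
def metrics():
    return jsonify({
        'global_rate': engine.global_rate(),
        'baseline': engine.baseline.get_baseline(),
        'banned_ips': engine.blocker.active_bans(),
    })

A small HTML page polling this endpoint every 3 seconds is all the "live" dashboard needs.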


Slack Alerts

Every ban, unban, and global anomaly sends a Slack notification with:

  • The IP address
  • The condition that fired (z-score or rate multiplier)
  • Current rate vs baseline
  • Ban duration
  • Timestamp
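Slack's incoming webhooks make this a single HTTP POST with a JSON payload. A minimal sketch (the message format here is illustrative, not the project's exact wording):

import requests

def notify_ban(webhook_url, ip, reason, rate, baseline_mean, duration):
    text = (
        f":no_entry: Banned {ip}\n"
        f"Trigger: {reason}\n"
        f"Rate: {rate:.2f} req/s vs baseline {baseline_mean:.2f} req/s\n"
        f"Duration: {duration}s"
    )
    # Slack incoming webhooks accept a JSON body with a "text" field
    requests.post(webhook_url, json={'text': text}, timeout=5)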

What I Learned

Building this from scratch taught me:

  1. Deques are powerful — simple data structures can solve complex real-time problems elegantly
  2. Statistics beat hardcoded thresholds — a z-score adapts to your actual traffic patterns
  3. Kernel-level blocking is fast — iptables drops packets before your app even sees them
  4. Baselines need floors — always handle the idle server case

Source Code

The full source code is available on GitHub:
👉 https://github.com/IamHendy/hng-anomaly-detector

The live dashboard is running at:
👉 http://hendyogema.mooo.com


Built as part of the HNG DevSecOps track — Stage 3 challenge.
