
Nehemiah

Building a Real-Time Anomaly Detection Daemon for a Live Cloud Storage Platform

We built a real-time traffic defense system for a public Nextcloud instance that could detect abnormal request behavior, block abusive clients in seconds, and keep the service available under unpredictable attack traffic.
The Problem
A public Nextcloud instance is always exposed. It accepts traffic continuously, stores user files, and presents a login surface to the internet at all times. That makes it a natural target for abuse: credential stuffing, path scanning, burst traffic, and opportunistic denial-of-service attempts.
The challenge was to keep the service live and responsive without knowing what attack traffic would look like in advance. We did not know when traffic would spike, how many source IPs would be involved, or whether the traffic would arrive as a flood, a scan, or a distributed burst. Any defensive system had to be deployed before the attack, adapt to live traffic patterns, and respond automatically in real time.
Static defenses were not enough. Hardcoded rate limits fail when normal traffic fluctuates. Static IP blocklists do nothing against unknown sources. Cron-based monitoring is too slow to matter when abuse happens in seconds. This required a streaming system that could observe traffic as it happened, learn what normal looked like, and react immediately when behavior deviated from it.
The Solution
We built a stateful Python daemon that continuously tails Nginx access logs, learns normal request patterns from live traffic, detects anomalies in real time, and responds automatically by blocking abusive IPs at the firewall.
The system runs as a real-time streaming pipeline. Nginx emits structured JSON logs, the daemon ingests each request as it is written, updates in-memory traffic windows, compares current behavior against a learned baseline, and triggers automated response when traffic becomes abnormal.
Because the daemon runs continuously, it keeps all of its working state in memory: recent request history, baseline statistics, active bans, escalation tiers, and audit state. That makes it fast enough to detect and respond within seconds, without the delay or overhead of periodic polling.
Architecture Overview
The system is built as a lightweight streaming pipeline running entirely on the VPS.
Traffic flows through five stages:
Nginx JSON logs → log watcher → sliding windows → anomaly detector → automated response
Nginx writes each request as newline-delimited JSON. A watchdog-based file observer tails the access log in real time and reads new entries as they are written. Each log line is parsed into a structured event containing source IP, timestamp, request path, method, status code, and response size.
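A minimal sketch of that stage, assuming the watchdog library and an Nginx JSON log format; the log path and JSON key names (remote_addr, request_uri, and so on) are illustrative rather than the project's actual configuration:

```python
import json
import os
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

ACCESS_LOG = "/var/log/nginx/access.json.log"   # hypothetical path

class LogTailer(FileSystemEventHandler):
    """Tails the access log and emits one parsed event per request."""
    # reopening the file after log rotation is omitted here for brevity

    def __init__(self, path, on_event):
        self.path = path
        self.on_event = on_event
        self._fh = open(path, "r")
        self._fh.seek(0, os.SEEK_END)            # only process new requests

    def on_modified(self, event):
        if event.src_path != self.path:
            return
        while True:
            line = self._fh.readline()
            if not line:
                break                            # caught up with the writer
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue                         # skip partially written lines
            self.on_event({
                "ip": entry.get("remote_addr"),
                "path": entry.get("request_uri"),
                "method": entry.get("request_method"),
                "status": int(entry.get("status", 0)),
                "bytes": int(entry.get("body_bytes_sent", 0)),
                "ts": time.time(),
            })

def start_tailer(on_event):
    observer = Observer()
    observer.schedule(LogTailer(ACCESS_LOG, on_event), path="/var/log/nginx")
    observer.start()
    return observer
```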
Those events are pushed into in-memory sliding windows that maintain a rolling view of traffic over the last sixty seconds. The detector evaluates those windows once per second, compares live traffic against a learned baseline, and decides whether behavior is normal or anomalous.
When a per-IP anomaly is detected, the daemon inserts a firewall rule to drop traffic from that source, sends an alert to Slack, and records the event in an audit log. A background unban loop later removes expired bans and notifies operators automatically.
Detection Model
The detector uses adaptive thresholds instead of static rules.
It maintains rolling sixty-second windows for:
global request traffic
per-IP request traffic
global error traffic (4xx/5xx)
per-IP error traffic
These windows are stored in memory using deques, which allow constant-time inserts, expirations, and reads under sustained load.
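A minimal sketch of those windows, assuming sixty-second horizons and deque-backed storage; the class and field names are illustrative:

```python
import time
from collections import deque, defaultdict

WINDOW_SECONDS = 60   # sixty-second rolling view, per the design above

class SlidingWindow:
    """Rolling window of event timestamps; stale entries expire on read."""

    def __init__(self, horizon=WINDOW_SECONDS):
        self.horizon = horizon
        self.events = deque()                 # O(1) append and popleft

    def add(self, ts=None):
        self.events.append(time.time() if ts is None else ts)

    def count(self, now=None):
        cutoff = (now or time.time()) - self.horizon
        while self.events and self.events[0] < cutoff:
            self.events.popleft()             # drop entries older than the window
        return len(self.events)

# one window per traffic view the detector evaluates
global_requests = SlidingWindow()
global_errors = SlidingWindow()
per_ip_requests = defaultdict(SlidingWindow)
per_ip_errors = defaultdict(SlidingWindow)

def record(event):
    global_requests.add(event["ts"])
    per_ip_requests[event["ip"]].add(event["ts"])
    if event["status"] >= 400:                # 4xx/5xx counts as error traffic
        global_errors.add(event["ts"])
        per_ip_errors[event["ip"]].add(event["ts"])
```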
Every second, the daemon computes current request rates and compares them against a rolling baseline built from recent traffic. That baseline is learned continuously from per-second request counts and expressed as a mean and standard deviation.
An anomaly is triggered when traffic meets either of two conditions:
its z-score rises significantly above normal behavior
its request rate spikes far beyond its learned baseline
This dual-threshold model catches both gradual abuse and sudden bursts. The z-score identifies statistically abnormal traffic even when the increase is moderate. The hard spike threshold catches aggressive surges even when historical variance is high.
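A minimal sketch of that dual check; the z-score threshold, spike multiplier, and baseline horizon are illustrative values, not the project's tuned parameters:

```python
import statistics
from collections import deque

BASELINE_HORIZON = 300      # seconds of per-second counts kept; illustrative
Z_THRESHOLD = 4.0           # illustrative, not the project's tuned value
SPIKE_MULTIPLIER = 5.0      # illustrative hard-spike factor

class Baseline:
    """Rolling baseline of per-second request counts."""

    def __init__(self):
        self.samples = deque(maxlen=BASELINE_HORIZON)

    def update(self, count_this_second):
        self.samples.append(count_this_second)

    def stats(self):
        if len(self.samples) < 30:                 # not enough history yet
            return None
        return statistics.fmean(self.samples), statistics.pstdev(self.samples)

def is_anomalous(current_rate, baseline):
    stats = baseline.stats()
    if stats is None:
        return False                                # still learning
    mean, stdev = stats
    z = (current_rate - mean) / stdev if stdev > 0 else 0.0
    if z > Z_THRESHOLD:                             # statistically abnormal
        return True
    if current_rate > mean * SPIKE_MULTIPLIER:      # hard spike far above baseline
        return True
    return False
```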
The detector also tracks error-heavy behavior. If a source begins generating abnormal volumes of 4xx or 5xx responses, its thresholds are tightened automatically. This makes the system more sensitive to scanning and credential-stuffing patterns before they become full attacks.
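A small sketch of how such tightening could look; the error-ratio trigger and scaling factor are illustrative assumptions:

```python
ERROR_RATIO_TRIGGER = 0.5    # illustrative: half of an IP's requests are 4xx/5xx
TIGHTEN_FACTOR = 0.5         # illustrative: halve the effective thresholds

def effective_thresholds(requests_60s, errors_60s,
                         z_threshold=4.0, spike_multiplier=5.0):
    """Tighten per-IP thresholds when a source is error-heavy."""
    if requests_60s > 0 and errors_60s / requests_60s >= ERROR_RATIO_TRIGGER:
        return z_threshold * TIGHTEN_FACTOR, spike_multiplier * TIGHTEN_FACTOR
    return z_threshold, spike_multiplier
```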
Automated Response
Response behavior is intentionally asymmetric.
When a single IP becomes anomalous, the daemon responds automatically (sketched below):
inserts an iptables DROP rule
sends a Slack alert
records the event in the audit log
starts a timed ban
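A minimal sketch of that response path; the Slack webhook environment variable, audit log path, and ban bookkeeping are illustrative assumptions rather than the project's actual interfaces:

```python
import json
import os
import subprocess
import threading
import time
import urllib.request

SLACK_WEBHOOK = os.environ.get("SLACK_WEBHOOK_URL")    # hypothetical env var
AUDIT_LOG = "/var/log/anomaly-daemon/audit.jsonl"       # hypothetical path

active_bans = {}   # ip -> unban timestamp

def ban_ip(ip, duration_seconds, reason):
    # drop all traffic from the offending source at the host firewall
    subprocess.run(["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"], check=True)
    active_bans[ip] = time.time() + duration_seconds

    event = {"ts": time.time(), "ip": ip, "duration": duration_seconds, "reason": reason}
    with open(AUDIT_LOG, "a") as fh:                     # append-only audit trail
        fh.write(json.dumps(event) + "\n")

    # alert asynchronously so a slow or failed webhook never blocks enforcement
    threading.Thread(target=_notify_slack, args=(event,), daemon=True).start()

def _notify_slack(event):
    if not SLACK_WEBHOOK:
        return
    payload = json.dumps(
        {"text": f"Banned {event['ip']} for {event['duration']}s: {event['reason']}"}
    ).encode()
    req = urllib.request.Request(SLACK_WEBHOOK, data=payload,
                                 headers={"Content-Type": "application/json"})
    try:
        urllib.request.urlopen(req, timeout=5)
    except Exception:
        pass                                             # alert failure must not break detection

def unban_expired():
    # called from the background unban loop; operator notification omitted here
    now = time.time()
    for ip, expiry in list(active_bans.items()):
        if expiry <= now:
            subprocess.run(["iptables", "-D", "INPUT", "-s", ip, "-j", "DROP"], check=False)
            del active_bans[ip]
```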
When traffic is globally anomalous across many IPs, the system alerts but does not block automatically. A global spike may indicate abuse, but it may also reflect legitimate load. Blocking globally without human review risks taking down real users, so global anomalies are escalated to an operator instead of enforced automatically.
Per-IP bans escalate with repeat offenses. A first violation receives a short temporary ban. Repeated offenses receive progressively longer bans, eventually ending in a permanent block. This prevents repeated low-grade abuse from cycling indefinitely through short ban windows.
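A small sketch of such an escalation ladder; the tier durations are illustrative, since the article does not specify them:

```python
from collections import defaultdict

# escalation ladder; the actual durations are not given in the article,
# so these values are purely illustrative
BAN_TIERS = [60, 600, 3600, None]   # 1 min, 10 min, 1 hour, then permanent

offense_counts = defaultdict(int)

def next_ban_duration(ip):
    """Return the ban duration for this IP's next offense (None = permanent)."""
    tier = min(offense_counts[ip], len(BAN_TIERS) - 1)
    offense_counts[ip] += 1
    return BAN_TIERS[tier]
```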
All alerts are dispatched asynchronously so notification failures cannot block detection or enforcement.
Hardest Engineering Problem
The most difficult part of the system was not detection. It was enforcement.
The detector runs in Docker, but the firewall rules needed to affect traffic arriving at the host’s public interface. On modern Debian, Docker containers and the host can appear to share iptables while actually writing to different packet-filtering backends under nftables compatibility mode. The result is deceptive: a rule inserted inside the container appears valid locally, but has no effect on incoming host traffic.
The solution was to run the detector on the host network rather than in an isolated container network, so its iptables rules were written to the same backend that filters traffic on the host's public interface.
Without that fix, detection worked and enforcement appeared successful, but abusive traffic was never actually blocked.
Observability
To make the system operationally usable, we paired the detector with a live monitoring dashboard.
A lightweight FastAPI service (sketched below) exposes real-time metrics for:
global request rate
anomaly state
top source IPs
active bans
baseline mean and deviation
CPU and memory usage
service uptime
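A minimal sketch of such an endpoint, assuming FastAPI and psutil; the route, field names, and the stubbed detector state are illustrative rather than the project's actual API:

```python
import os
import time

import psutil                     # assumed here for CPU / memory metrics
from fastapi import FastAPI

app = FastAPI()
START_TIME = time.time()

@app.get("/metrics")
def metrics():
    # detector_state would be shared with the daemon in the real service;
    # it is stubbed here to keep the sketch self-contained
    detector_state = {
        "request_rate": 0.0,
        "anomaly": False,
        "top_ips": [],
        "active_bans": [],
        "baseline_mean": 0.0,
        "baseline_stdev": 0.0,
    }
    proc = psutil.Process(os.getpid())
    return {
        **detector_state,
        "cpu_percent": psutil.cpu_percent(interval=None),
        "memory_mb": proc.memory_info().rss / 1_048_576,
        "uptime_seconds": time.time() - START_TIME,
    }
```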
The frontend polls these metrics every few seconds and renders a live incident view without requiring page reloads. This gave operators immediate visibility into what the daemon was seeing and how it was responding during active tests.
Results
The system met the operational goal it was designed for.
It continuously monitored live traffic, learned a working baseline from real request behavior, and reacted automatically when that behavior changed.
In testing, it:
detected anomalous request bursts within minutes
enforced per-IP bans within the required response window
handled sustained synthetic traffic spikes without service interruption
recovered cleanly from log rotation without daemon restart
maintained real-time operator visibility through the dashboard
The final system ran as a single stateful Python service with no external queue, broker, or stream processor. For the scale and latency requirements of a single VPS, that design kept the system fast, simple, and reliable.
Key Lessons
A streaming detector is fundamentally different from periodic monitoring. Once detection depends on timing, polling becomes the bottleneck.
Adaptive thresholds are more reliable than static rate limits when traffic patterns are unknown in advance.
Per-IP enforcement is safe to automate. Global enforcement usually is not.
In containerized environments, successful firewall writes do not guarantee effective firewall enforcement.
The hardest production problems were not statistical. They were operational.
Conclusion
This system was built to solve a practical problem: protect a public-facing Nextcloud instance from unpredictable attack traffic without relying on static assumptions or manual intervention.
It does not attempt deep traffic inspection or sophisticated behavioral modeling. What it does provide is continuous monitoring, adaptive anomaly detection, and automated response fast enough to matter during a live attack.
The most important design choice was treating traffic as a stream instead of a periodic metric. Once every request became an event to process in real time, the rest of the system followed naturally: rolling state, adaptive baselines, immediate detection, and automated enforcement.
