A beginner-friendly guide to building an anomaly detection engine that watches web traffic in real time and automatically blocks attackers.
What This Project Does (And Why It Matters)
Imagine you run a website. Most days, you get maybe 50 visitors per hour. One afternoon, a single computer starts sending 10,000 requests per second. Your server slows to a crawl. Real users can't load the page. This is a DDoS attack — Distributed Denial of Service — and it's one of the most common threats on the internet.
My project is a daemon (a program that runs continuously in the background) that watches all incoming HTTP traffic to a Nextcloud server, learns what "normal" looks like, and automatically blocks IPs that behave abnormally. Think of it like a bouncer at a club who knows how busy a normal Friday night is and starts turning people away when the crowd gets suspiciously large — or kicks out the one person who keeps trying to barge through the door 100 times a minute.
The entire system is built in Go and runs alongside the web server inside Docker containers.
The Architecture at a Glance
Here's how the pieces fit together:
Internet Traffic
        |
        v
+-------------------+
|       Nginx       | ──> writes JSON access log
|  (reverse proxy)  |     to /var/log/nginx/hng-access.log
+---------+---------+
          |
          v
+-------------------+
|     Nextcloud     | (the actual web app)
+-------------------+
Meanwhile, running alongside...
+------------------------------------------+
|         Anomaly Detector Daemon          |
|                                          |
|  1. monitor.go   → reads the log file    |
|  2. detector.go  → counts requests       |
|  3. baseline.go  → learns "normal"       |
|  4. blocker.go   → bans bad IPs          |
|  5. unbanner.go  → lifts bans later      |
|  6. notifier.go  → alerts via Slack      |
|  7. dashboard.go → live web UI           |
+------------------------------------------+
Nginx writes every request as a JSON line to a log file. My detector tails that file (like running tail -f in a terminal), parses each line, and feeds it into the detection engine. Let me walk through each core concept.
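For context, here is a sketch of what parsing one of those JSON lines looks like in Go. The field names depend on the Nginx log_format directive, so the struct tags below are illustrative rather than the project's actual schema:

// LogEntry holds the fields the detector cares about. The JSON tags are
// assumptions for illustration; they must match your Nginx log_format.
type LogEntry struct {
    RemoteAddr string  `json:"remote_addr"`
    Request    string  `json:"request"`
    Status     int     `json:"status"`
    Timestamp  float64 `json:"msec"`
}

// parseLine turns one raw log line into a LogEntry. Requires "encoding/json".
func parseLine(line []byte) (LogEntry, error) {
    var entry LogEntry
    err := json.Unmarshal(line, &entry)
    return entry, err
}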
How the Sliding Window Works
The first question the detector needs to answer is: how fast is this IP sending requests right now?
To answer that, I use a sliding window — a list that only keeps events from the last 60 seconds. Here's how it works in plain language:
- Every time a request comes in from an IP, I record the timestamp.
- Before counting, I throw away any timestamps older than 60 seconds.
- The number of timestamps left = the number of requests in the last minute.
- Divide by 60 = requests per second.
In code, this looks like:
type SlidingWindow struct {
    windowSeconds int
    events        []float64 // timestamps
}

func (sw *SlidingWindow) Add(timestamp float64) {
    sw.events = append(sw.events, timestamp)
    sw.evict(timestamp) // remove old entries
}

func (sw *SlidingWindow) evict(now float64) {
    cutoff := now - float64(sw.windowSeconds)
    i := 0
    for i < len(sw.events) && sw.events[i] < cutoff {
        i++
    }
    if i > 0 {
        sw.events = sw.events[i:] // chop off the old ones
    }
}

func (sw *SlidingWindow) Rate(now float64) float64 {
    sw.evict(now)
    if len(sw.events) == 0 {
        return 0
    }
    return float64(len(sw.events)) / float64(sw.windowSeconds)
}
I maintain two sliding windows:
- Per-IP: Each IP address gets its own window. This catches a single attacker.
- Global: One window for all traffic combined. This catches distributed attacks where many IPs flood you at once.
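Wiring the two levels together can be as simple as keeping a map of per-IP windows next to one global window. The Detector type below is an illustrative sketch, not the project's actual code:

// Detector keeps one window per source IP plus one global window.
// Field and method names here are assumptions for illustration.
type Detector struct {
    global *SlidingWindow
    perIP  map[string]*SlidingWindow
}

// Observe records a request and returns both rates for the decision logic.
func (d *Detector) Observe(ip string, ts float64) (ipRate, globalRate float64) {
    d.global.Add(ts)
    w, ok := d.perIP[ip]
    if !ok {
        w = &SlidingWindow{windowSeconds: 60}
        d.perIP[ip] = w
    }
    w.Add(ts)
    return w.Rate(ts), d.global.Rate(ts)
}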
Why not just count requests per minute? Because a per-minute counter resets at minute boundaries. If an attacker sends 500 requests in the last 5 seconds of one minute and 500 in the first 5 seconds of the next, a per-minute counter shows an unremarkable 500 in each minute. The sliding window sees all 1,000 requests land within a single 60-second span, which is much more suspicious.
How the Baseline Learns From Traffic
Knowing the current rate isn't enough. Is 10 requests per second a lot? For Google, that's nothing. For a personal blog, that's an attack. The detector needs to learn what your normal traffic looks like.
This is where the rolling baseline comes in.
Every second, I record the total request count. I keep 30 minutes of these per-second counts. Every 60 seconds, I calculate two numbers:
- Mean: The average requests per second over the last 30 minutes.
- Standard deviation (stddev): How much the traffic varies from that average.
If your server normally gets 2 requests per second, with occasional bumps to 5, the mean might be 2.5 and the stddev might be 1.2.
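The math behind those two numbers is short. Here is a sketch of computing the mean and standard deviation over the stored per-second counts (not the project's exact baseline code):

// meanAndStddev computes the average and population standard deviation
// of the per-second request counts. Requires "math".
func meanAndStddev(counts []float64) (mean, stddev float64) {
    if len(counts) == 0 {
        return 0, 0
    }
    for _, c := range counts {
        mean += c
    }
    mean /= float64(len(counts))
    var variance float64
    for _, c := range counts {
        variance += (c - mean) * (c - mean)
    }
    variance /= float64(len(counts))
    return mean, math.Sqrt(variance)
}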
Here's the clever part: I also maintain per-hour slots. Traffic at 3am is very different from traffic at 3pm. If I have enough data for the current hour, I use that hour's baseline instead of the global 30-minute window. This prevents the detector from thinking normal afternoon traffic is an attack just because the morning was quiet.
// Prefer current hour's slot if it has enough data
if slot, ok := b.hourlySlots[currentHour]; ok && slot.Size() >= 10 {
    rawMean = slot.Mean()
    rawStddev = slot.StdDev()
} else if len(b.window) >= 10 {
    // Fall back to full rolling window
    // ... calculate from all data
}
I also enforce floor values — the mean never goes below 1.0 and stddev never below 0.5. Without these floors, a server with zero traffic would have a mean of 0, and any single request would look like an anomaly. That would be useless.
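In code, enforcing the floors is just a pair of comparisons; a minimal sketch:

// applyFloors clamps the baseline so an idle server still has a sane "normal".
func applyFloors(mean, stddev float64) (float64, float64) {
    if mean < 1.0 {
        mean = 1.0
    }
    if stddev < 0.5 {
        stddev = 0.5
    }
    return mean, stddev
}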
How the Detection Logic Makes a Decision
Now we have two pieces of information:
- Current rate for an IP (from the sliding window)
- Normal rate for the server (from the baseline)
The detector flags traffic as anomalous if either of these conditions is true:
Condition 1: Z-Score > 3.0
The z-score answers: "How many standard deviations away from normal is this?"
z-score = (current_rate - mean) / stddev
If the mean is 2.5 req/s and stddev is 1.2, and an IP is sending 8 req/s:
z-score = (8 - 2.5) / 1.2 = 4.58
That's way above 3.0 — anomaly detected.
For normally distributed data, a z-score above 3 puts a value in roughly the top 0.1%, so it almost certainly isn't normal traffic.
Condition 2: Rate > 5x the Baseline Mean
This is a simpler check. If normal traffic is 2 req/s and an IP is sending 15 req/s, that's 7.5x the baseline — clearly suspicious.
Why have both? Because if the stddev is very low (traffic is extremely consistent), the z-score threshold becomes very sensitive. The 5x multiplier acts as a sanity check.
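Put together, the decision fits in one small function. The thresholds come straight from the two conditions above; the function itself is an illustrative sketch:

// isAnomalous flags traffic when either detection condition fires.
// Dividing by stddev is safe because of the 0.5 floor described earlier.
func isAnomalous(rate, mean, stddev float64) bool {
    zScore := (rate - mean) / stddev
    return zScore > 3.0 || rate > 5.0*mean
}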
Bonus: Error Surge Tightening
If an IP is generating a lot of 4xx and 5xx errors (failed login attempts, scanning for vulnerabilities), the detector automatically halves the thresholds. Now the z-score threshold drops from 3.0 to 1.5, and the rate multiplier drops from 5x to 2.5x. This catches attackers who are probing your server even if their request rate isn't extremely high.
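One way to express that tightening is shown below. How "a lot of errors" is measured is an implementation detail, so the errorSurge flag here is just a stand-in:

// thresholdsFor halves both thresholds when an IP is generating an error surge.
func thresholdsFor(errorSurge bool) (zThreshold, rateMultiplier float64) {
    zThreshold, rateMultiplier = 3.0, 5.0
    if errorSurge {
        zThreshold /= 2     // 3.0 -> 1.5
        rateMultiplier /= 2 // 5.0 -> 2.5
    }
    return zThreshold, rateMultiplier
}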
How iptables Blocks an IP
When the detector identifies a bad IP, it needs to actually stop that traffic from reaching the server. This is where iptables comes in.
iptables is the Linux firewall. It's a set of rules that tell the operating system what to do with incoming network packets. The two commands that matter:
Ban an IP:
iptables -A INPUT -s 203.0.113.42 -j DROP
This says: "Append a rule to the INPUT chain: any packet from source 203.0.113.42, DROP it." The traffic never reaches Nginx, never reaches Nextcloud. It's as if that IP doesn't exist.
Unban an IP:
iptables -D INPUT -s 203.0.113.42 -j DROP
Same rule, but -D (delete) instead of -A (append).
In Go, I run these commands using os/exec:
func addIptablesRule(ip string) bool {
    cmd := exec.Command("iptables", "-A", "INPUT", "-s", ip, "-j", "DROP")
    _, err := cmd.CombinedOutput()
    return err == nil
}
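The unban path is the mirror image. Assuming the daemon simply swaps -A for -D, a matching helper might look like this:

// removeIptablesRule deletes the DROP rule for an IP. Like addIptablesRule,
// it shells out to iptables via "os/exec".
func removeIptablesRule(ip string) bool {
    cmd := exec.Command("iptables", "-D", "INPUT", "-s", ip, "-j", "DROP")
    _, err := cmd.CombinedOutput()
    return err == nil
}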
The Ban Escalation Schedule
Not every attacker deserves a permanent ban. Maybe it was a misconfigured bot, not a malicious attack. So bans follow an escalating schedule:
| Ban | Duration |
|---|---|
| 1st | 2 minutes |
| 2nd | 30 minutes |
| 3rd | 2 hours |
| 4th+ | Permanent |
An unbanner goroutine checks every 10 seconds for expired bans and removes them. If the same IP triggers detection again, the next ban is longer. By the 4th offense, they're blocked permanently.
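Here is a sketch of both pieces, the schedule lookup and the unban loop. The ban bookkeeping (a plain map here) is simplified; a real version needs synchronization with the detector:

// banDuration maps the offense count to how long the ban lasts.
// A zero duration stands for a permanent ban. Requires "time".
func banDuration(offense int) time.Duration {
    switch offense {
    case 1:
        return 2 * time.Minute
    case 2:
        return 30 * time.Minute
    case 3:
        return 2 * time.Hour
    default:
        return 0 // permanent
    }
}

// unbanLoop wakes every 10 seconds and lifts bans whose expiry has passed.
func unbanLoop(expiries map[string]time.Time) {
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()
    for range ticker.C {
        for ip, expiry := range expiries {
            if !expiry.IsZero() && time.Now().After(expiry) {
                removeIptablesRule(ip)
                delete(expiries, ip)
            }
        }
    }
}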
The Dashboard
Everything above runs silently. To see what's happening in real time, the daemon serves a web dashboard that auto-refreshes every 3 seconds. It shows:
- Global requests per second — how busy the server is right now
- Banned IPs — who's blocked and for how long
- Top 10 source IPs — who's sending the most traffic
- CPU and memory usage — system health
- Baseline stats — the current effective mean and standard deviation
- Hourly slots — how traffic patterns differ by hour
The dashboard is built with Go's built-in net/http server and plain HTML with inline CSS. No JavaScript frameworks, no build tools. The auto-refresh is a single HTML meta tag:
<meta http-equiv="refresh" content="3">.
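A stripped-down version of that approach might look like the sketch below, assuming a currentGlobalRate() helper that reads the global sliding window (the real dashboard renders far more):

// serveDashboard serves a tiny auto-refreshing status page.
// Requires "fmt", "log", and "net/http"; currentGlobalRate() is a placeholder.
func serveDashboard() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, `<html><head><meta http-equiv="refresh" content="3"></head>
<body><h1>Anomaly Detector</h1><p>Global rate: %.2f req/s</p></body></html>`,
            currentGlobalRate())
    })
    log.Fatal(http.ListenAndServe(":5000", nil))
}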
What I Learned
Building this project taught me several things:
- Statistics has practical uses. Mean, standard deviation, and z-score aren't just textbook concepts; they're the foundation of anomaly detection in production systems.
- Go's concurrency model is elegant. Each component (log monitor, unbanner, dashboard) runs in its own goroutine. They communicate through channels. No complex threading code needed (see the sketch after this list).
- Docker networking is tricky. The detector needs network_mode: host to run iptables on the host's network stack. Without it, iptables rules only affect the container's isolated network — useless for blocking real attackers.
- Disk space matters. Access logs grow fast under heavy traffic. I learned this the hard way when my server ran out of space multiple times during testing.
- Don't hardcode baselines. A static threshold like "ban anyone over 100 req/s" would either miss slow attacks or block legitimate traffic spikes. The rolling baseline adapts to whatever your actual traffic looks like.
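To make the concurrency point concrete, here is roughly what that wiring looks like. monitorLog, handleEntry, and the expiries map are placeholders; unbanLoop and serveDashboard are the sketches from earlier sections:

// Each long-running component gets its own goroutine; parsed log entries
// flow to the detection loop over a channel.
entries := make(chan LogEntry, 1024)

go monitorLog("/var/log/nginx/hng-access.log", entries) // tail and parse the access log
go unbanLoop(expiries)                                   // lift expired bans
go serveDashboard()                                      // live web UI

for entry := range entries {
    handleEntry(entry) // sliding windows, baseline, detection, banning
}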
Try It Yourself
The full source code is available on GitHub: https://github.com/ik-alex/HNG14-Stage3-Devops.git
The live dashboard is at: http://ikalex.duckdns.org:5000
If you want to understand security tooling, I'd recommend starting with just the sliding window. Write a small program that counts events per second using a deque. Once that clicks, everything else builds on top of it.