# How I Built a Real-Time HTTP Anomaly Detector for cloud.ng with Python, Nginx, Docker, and iptables

When a platform is public and always online, one of the biggest security questions is:

How do you know when traffic is normal, and when something suspicious is happening?

That was the goal of this project.

I built a real-time anomaly detection engine for cloud.ng, a cloud storage platform powered by Nextcloud, that watches incoming
HTTP traffic, learns what normal traffic looks like, detects unusual behavior, and reacts automatically.

If one IP becomes abusive, the system blocks it with iptables. If the whole platform suddenly gets a global traffic spike, the
system sends an alert to Slack. It also provides a live dashboard so you can watch traffic behavior in real time.

In this post, I’ll explain how I built it in a beginner-friendly way.

———

## What this project does

At a high level, the system works like this:

  1. A user sends an HTTP request
  2. Nginx receives the request first
  3. Nginx forwards it to Nextcloud
  4. Nginx writes the request into a JSON access log
  5. A Python detector daemon reads that log continuously
  6. The detector compares live traffic against a learned baseline
  7. If traffic becomes abnormal, it blocks the IP or sends a Slack alert

So instead of using a fixed hardcoded limit like “100 requests per minute,” this project tries to learn what normal looks like
first.

———

## Why this matters

A fixed limit is easy to write, but not always smart.

Traffic at 2 a.m. is usually different from traffic at 2 p.m. Some endpoints naturally get bursts. Some spikes are harmless,
and some are not.

If your threshold is too low:

  • you block legitimate users

If your threshold is too high:

  • suspicious traffic slips through

That’s why I used a rolling baseline instead of a static number.

———

## The stack I used

This project uses:

  • Docker Compose
  • Nextcloud
  • Nginx
  • Python
  • iptables
  • Slack webhook
  • a live metrics dashboard

The Nextcloud image came straight from Docker Hub and was used as-is, without modification.

———

## Architecture overview

The traffic flow looks like this:

```
Internet Clients
        |
        v
Nginx Reverse Proxy
        |
        +--> Nextcloud
        |
        +--> JSON access logs
                  |
                  v
        Python Detector Daemon
          |       |       |
          v       v       v
      iptables  Slack  Dashboard
```

Nginx and the detector share a Docker volume so the detector can read the live access log without modifying the application
container.

———

## Step 1: Logging traffic with Nginx

The detector needs reliable traffic data before it can make decisions.

So I configured Nginx to log every request in JSON format with fields like:

  • source IP
  • timestamp
  • method
  • path
  • status code
  • response size

A simplified example looks like this:

```json
{
  "source_ip": "203.0.113.10",
  "timestamp": "2026-04-27T09:25:51+00:00",
  "method": "GET",
  "path": "/",
  "status": "200",
  "response_size": "612"
}
```

Structured logs are much easier to parse safely than plain text.

I also configured Nginx to trust and forward the real client IP using X-Forwarded-For, so the detector sees the actual request
source.
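To give a rough idea of what that configuration can look like, here is a simplified sketch. The format name, log path, and the `nextcloud` upstream are placeholders, not my exact production config:

```nginx
# Sketch only: field names, log path, and upstream name are illustrative.
log_format json_combined escape=json
  '{'
    '"source_ip":"$remote_addr",'
    '"timestamp":"$time_iso8601",'
    '"method":"$request_method",'
    '"path":"$uri",'
    '"status":"$status",'
    '"response_size":"$body_bytes_sent"'
  '}';

server {
    # The detector reads this file through the shared Docker volume.
    access_log /var/log/nginx/access_json.log json_combined;

    location / {
        proxy_pass http://nextcloud:80;
        # Forward the real client address so logs show the actual source.
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
    }
}
```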

———

## Step 2: Continuously reading logs with Python

The detector is not a cron job and not a one-time script.

It runs as a long-lived daemon and continuously tails the Nginx access log file.

For every new line, it:

  • parses the JSON
  • extracts the traffic fields
  • updates request windows
  • updates baselines
  • checks whether the traffic looks anomalous

That means detection happens in near real time.
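Here is a minimal sketch of that tail-and-parse loop. The log path and JSON field names are assumptions for illustration:

```python
import json
import time

LOG_PATH = "/var/log/nginx/access_json.log"  # assumed path on the shared volume

def follow(path):
    """Yield new lines appended to the file, like `tail -f`."""
    with open(path, "r") as f:
        f.seek(0, 2)  # start at the end so only new traffic is processed
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)  # nothing new yet
                continue
            yield line

for raw in follow(LOG_PATH):
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        continue  # skip malformed lines instead of crashing the daemon
    ip = event.get("source_ip")
    status = int(event.get("status", 0))
    # ...update sliding windows and baselines, then run the anomaly checks
```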

———

## Step 3: Using a sliding window with deques

One of the most important parts of this project is the 60-second sliding window.

I used Python deque objects because they are excellent for “keep the latest items, remove the oldest items” logic.

### What I tracked

I kept:

  • one global request deque
  • one per-IP request deque
  • one per-IP error deque for 4xx and 5xx responses

### How it works

When a request arrives:

  1. append the current timestamp to the relevant deque
  2. remove any timestamps older than 60 seconds

This gives a true moving view of the latest traffic.
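Sketched with hypothetical names, the per-IP bookkeeping looks roughly like this:

```python
from collections import defaultdict, deque
import time

WINDOW = 60  # seconds

per_ip_requests = defaultdict(deque)  # ip -> timestamps of recent requests
per_ip_errors = defaultdict(deque)    # ip -> timestamps of recent 4xx/5xx responses

def record(ip, status, now=None):
    now = now if now is not None else time.time()
    per_ip_requests[ip].append(now)
    if status >= 400:
        per_ip_errors[ip].append(now)
    # Drop anything older than the 60-second window.
    cutoff = now - WINDOW
    for dq in (per_ip_requests[ip], per_ip_errors[ip]):
        while dq and dq[0] < cutoff:
            dq.popleft()
```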

### Why this matters

A simple “requests per minute” counter resets at fixed minute boundaries, which can hide short bursts.

A sliding window answers the better question:

How much traffic happened in the last 60 seconds right now?

That is much better for anomaly detection.

———

## Step 4: Teaching the baseline to learn from traffic

A sliding window shows what is happening now, but it does not tell us whether that traffic is unusual.

For that, I built a rolling baseline manager.

### What the baseline tracks

The baseline stores:

  • per-second request counts
  • per-second error counts
  • a rolling 30-minute history
  • hourly traffic slots

### What gets recalculated

Every 60 seconds, the detector recalculates:

  • mean requests per second
  • standard deviation
  • error rate

### Why idle seconds matter

One important detail was making sure quiet seconds are also included.

If you only record seconds where traffic exists, the average becomes artificially high. Then the system thinks normal traffic
is busier than it really is.

So I made sure the baseline includes:

  • active seconds
  • idle seconds with zero traffic

That makes the learned average much more realistic.
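As a rough sketch, assuming the detector keeps a map of per-second request counts, the recalculation can look like this:

```python
import statistics

def recalc_baseline(per_second_counts, window_start, window_end):
    """per_second_counts maps an epoch second to the request count for that second."""
    samples = []
    for second in range(window_start, window_end):
        # Idle seconds contribute a 0, so quiet periods pull the average down.
        samples.append(per_second_counts.get(second, 0))
    if not samples:
        return 0.0, 0.0
    return statistics.mean(samples), statistics.pstdev(samples)
```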

### Hour-slot preference

Traffic usually changes throughout the day.

So I added a rule:

  • if the current hour has enough samples, use the current hour’s baseline
  • otherwise, fall back to the rolling 30-minute baseline

This helps the detector adapt to time-of-day behavior.
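As a small sketch with hypothetical data structures, the preference rule reads roughly like this:

```python
MIN_HOUR_SAMPLES = 300  # placeholder: data an hour slot needs before it is trusted

def effective_baseline(hour_slots, rolling, hour):
    slot = hour_slots.get(hour)
    # Prefer the current hour's statistics once it has seen enough traffic.
    if slot and slot["samples"] >= MIN_HOUR_SAMPLES:
        return slot["mean"], slot["stdev"]
    return rolling["mean"], rolling["stdev"]
```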

———

## Step 5: How the detector makes decisions

Once I had:

  • a live request rate
  • a learned baseline

I needed a way to decide whether traffic is abnormal.

I used two checks.

### 1. Z-score

The z-score answers:

How far is the current traffic from the normal average, measured in standard deviations?

A high z-score means traffic is statistically unusual.

### 2. Rate multiplier

I also added a simpler check:

Is the current rate more than N times the learned average?

That catches obvious spikes even when the z-score is not dramatic yet.

### Detection rule

A request pattern is considered anomalous if either condition fires (see the sketch below):

  • z-score exceeds threshold
  • current rate exceeds multiplier of baseline mean

I used this logic for:

  • per-IP traffic
  • global traffic
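Putting the two checks together, a simplified decision function looks roughly like this. The thresholds are placeholders, and the same function can run against both the per-IP and the global rate:

```python
Z_THRESHOLD = 3.0      # placeholder: how many standard deviations counts as unusual
RATE_MULTIPLIER = 5.0  # placeholder: "N times the learned average"

def is_anomalous(current_rate, baseline_mean, baseline_stdev):
    # Check 1: z-score, the distance from the learned mean in standard deviations.
    if baseline_stdev > 0:
        z = (current_rate - baseline_mean) / baseline_stdev
        if z > Z_THRESHOLD:
            return True
    # Check 2: rate multiplier, which catches obvious spikes early.
    if baseline_mean > 0 and current_rate > RATE_MULTIPLIER * baseline_mean:
        return True
    return False
```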

———

## Step 6: Tightening thresholds when errors surge

Not all suspicious behavior is about volume alone.

Sometimes an IP causes a lot of:

  • 401
  • 403
  • 404
  • 500

That often suggests scanning, brute force, or probing.

So I added an error surge rule.

If an IP’s 4xx/5xx rate becomes much worse than its normal baseline, the detector automatically tightens its thresholds.

That way, a suspicious IP gets less tolerance than a normal user.
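One simple way to express that tightening, as a sketch with placeholder ratios:

```python
def effective_thresholds(error_rate, baseline_error_rate,
                         z_threshold=3.0, multiplier=5.0):
    # If this IP's error rate is far above its own baseline, demand less
    # evidence before calling its traffic anomalous.
    if baseline_error_rate > 0 and error_rate > 3 * baseline_error_rate:
        return z_threshold * 0.5, multiplier * 0.5
    return z_threshold, multiplier
```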

———

## Step 7: Blocking bad IPs with iptables

When a per-IP anomaly is confirmed, the detector blocks the source IP using Linux iptables.

The command is:

iptables -I INPUT -s -j DROP

For example:

iptables -I INPUT -s 203.0.113.10 -j DROP

### What this means

  • -I INPUT inserts the rule at the top of the INPUT chain
  • -s matches the source IP
  • -j DROP silently drops all packets from that IP

In simple terms:

“If traffic comes from this IP, ignore it.”

This is useful because it stops abusive traffic at the firewall level.
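Inside the detector, a rule like that can be applied with a small subprocess call. This is only a sketch; a real setup needs the right privileges and input validation:

```python
import subprocess

def ban_ip(ip):
    # Insert a DROP rule at the top of the INPUT chain for this source IP.
    subprocess.run(["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"], check=True)

def unban_ip(ip):
    # Remove the matching rule again when the ban expires.
    subprocess.run(["iptables", "-D", "INPUT", "-s", ip, "-j", "DROP"], check=True)
```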

———

## Step 8: Automatically unbanning IPs

Blocking forever on a first offense is not always ideal.

So I added a backoff-based unban system:

  1. first ban: 10 minutes
  2. second ban: 30 minutes
  3. third ban: 2 hours
  4. fourth offense onward: permanent

A background unban loop checks whether each active ban has expired.

If a ban expires:

  • the firewall rule is removed
  • the audit log records the release
  • Slack gets an unban notification

———

## Step 9: Sending Slack alerts

The detector sends Slack notifications for:

  • per-IP bans
  • unbans
  • global anomaly alerts

Each alert includes:

  • the condition that fired
  • the current rate
  • the baseline
  • the timestamp
  • the ban duration if applicable

That makes each notification immediately useful.
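Sending those alerts only needs an HTTP POST to the Slack incoming-webhook URL. Here is a minimal sketch using the requests library, with a placeholder webhook URL and illustrative message fields:

```python
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def notify_slack(event, ip, current_rate, baseline, duration=None):
    lines = [
        f"*{event}* for {ip}",
        f"current rate: {current_rate:.2f} req/s, baseline: {baseline:.2f} req/s",
    ]
    if duration is not None:
        lines.append(f"ban duration: {duration} seconds")
    # Slack incoming webhooks accept a JSON payload with a "text" field.
    requests.post(SLACK_WEBHOOK_URL, json={"text": "\n".join(lines)}, timeout=5)
```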

———

## Step 10: Building the live dashboard

I also built a live dashboard that shows:

  • global requests per second
  • top source IPs
  • currently banned IPs
  • CPU usage
  • memory usage
  • uptime
  • effective baseline values
  • baseline graph over time

This made testing much easier, because I could see how the detector was behaving without constantly reading raw logs.

———

## Example: sliding window idea in Python

Here is the basic idea behind the 60-second deque window:

```python
from collections import deque
import time

requests = deque()

def add_request():
    now = time.time()
    requests.append(now)

    # Drop timestamps that fell out of the 60-second window.
    cutoff = now - 60
    while requests and requests[0] < cutoff:
        requests.popleft()

    # Average requests per second over the last minute.
    return len(requests) / 60
```

That tiny pattern is the core of the live request-rate logic.

———

## Problems I ran into

This project also taught me that detection logic is only half the job. The other half is operational reliability.

### 1. The baseline can learn the wrong thing

If you attack too early, the detector can start treating attack traffic as normal.

The fix:

  • warm the system with light traffic first
  • wait for a baseline recalculation
  • then run the burst

### 2. Too much per-request logging

Logging every request at INFO created too much output during heavy bursts.

The fix:

  • make request-by-request logging configurable
  • keep it off by default
  • keep audit events on

### 3. Blocking my own SSH session

At one point, I attacked from the same IP I used for SSH, and the detector correctly blocked that IP.

The fix:

  • add a whitelist for admin IPs
  • use a separate IP for attack traffic

### 4. Capturing iptables state at the right time

Sometimes the ban happened correctly, but the live iptables state was hard to catch.

The fix:

  • automatically write iptables snapshots during BAN and UNBAN

———

## What I learned

This project helped me understand that security tooling is not just about rules.

It is also about:

  • observability
  • realistic baselines
  • good logging
  • safe testing
  • automated response
  • collecting proof that your system actually worked

It also showed me how simple data structures like a deque can be powerful when used carefully.

———

## Final thoughts

In the end, I built a system that can:

  • monitor HTTP traffic in real time
  • learn what normal looks like
  • detect per-IP anomalies
  • detect global anomalies
  • block abusive IPs with iptables
  • notify Slack
  • automatically unban IPs
  • expose live metrics in a dashboard

For a beginner-friendly DevSecOps project, this was a great way to connect traffic monitoring, anomaly detection, alerting, and
response in one real system.

If you are learning security engineering or DevSecOps, this kind of project is a very practical way to understand how defensive
controls work in production-style environments.

———

## Project links
