# How I Built a Real-Time HTTP Anomaly Detector for cloud.ng with Python, Nginx, Docker, and iptables

When a platform is public and always online, one of the biggest security questions is:

How do you know when traffic is normal, and when something suspicious is happening?

That was the goal of this project.

I built a real-time anomaly detection engine for cloud.ng, a cloud storage platform powered by Nextcloud, that watches incoming
HTTP traffic, learns what normal traffic looks like, detects unusual behavior, and reacts automatically.

If one IP becomes abusive, the system blocks it with iptables. If the whole platform suddenly gets a global traffic spike, the
system sends an alert to Slack. It also provides a live dashboard so you can watch traffic behavior in real time.

In this post, I’ll explain how I built it in a beginner-friendly way.

———

## What this project does

At a high level, the system works like this:

  1. A user sends an HTTP request
  2. Nginx receives the request first
  3. Nginx forwards it to Nextcloud
  4. Nginx writes the request into a JSON access log
  5. A Python detector daemon reads that log continuously
  6. The detector compares live traffic against a learned baseline
  7. If traffic becomes abnormal, it blocks the IP or sends a Slack alert

So instead of using a fixed hardcoded limit like “100 requests per minute,” this project tries to learn what normal looks like
first.

———

## Why this matters

A fixed limit is easy to write, but not always smart.

Traffic at 2 a.m. is usually different from traffic at 2 p.m. Some endpoints naturally get bursts. Some spikes are harmless,
and some are not.

If your threshold is too low:

  • you block legitimate users

If your threshold is too high:

  • suspicious traffic slips through

That’s why I used a rolling baseline instead of a static number.

———

## The stack I used

This project uses:

  • Docker Compose
  • Nextcloud
  • Nginx
  • Python
  • iptables
  • Slack webhook
  • a live metrics dashboard

The Nextcloud image came straight from Docker Hub and was used as-is, without modification.

———

## Architecture overview

The traffic flow looks like this:

```
Internet Clients
        |
        v
Nginx Reverse Proxy
        |
        +--> Nextcloud
        |
        +--> JSON access logs
                  |
                  v
        Python Detector Daemon
          |       |       |
          v       v       v
      iptables  Slack  Dashboard
```

Nginx and the detector share a Docker volume so the detector can read the live access log without modifying the application
container.

———

## Step 1: Logging traffic with Nginx

The detector needs reliable traffic data before it can make decisions.

So I configured Nginx to log every request in JSON format with fields like:

  • source IP
  • timestamp
  • method
  • path
  • status code
  • response size

A simplified example looks like this:

```json
{
  "source_ip": "203.0.113.10",
  "timestamp": "2026-04-27T09:25:51+00:00",
  "method": "GET",
  "path": "/",
  "status": "200",
  "response_size": "612"
}
```

Structured logs are much easier to parse safely than plain text.

I also configured Nginx to trust and forward the real client IP using X-Forwarded-For, so the detector sees the actual request
source.
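To give a rough idea of what that configuration can look like, here is a simplified sketch. The format name, log path, and the `nextcloud` upstream are placeholders, not my exact production config:

```nginx
# Sketch only: field names, log path, and upstream name are illustrative.
log_format json_combined escape=json
  '{'
    '"source_ip":"$remote_addr",'
    '"timestamp":"$time_iso8601",'
    '"method":"$request_method",'
    '"path":"$uri",'
    '"status":"$status",'
    '"response_size":"$body_bytes_sent"'
  '}';

server {
    # The detector reads this file through the shared Docker volume.
    access_log /var/log/nginx/access_json.log json_combined;

    location / {
        proxy_pass http://nextcloud:80;
        # Forward the real client address so logs show the actual source.
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
    }
}
```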

———

## Step 2: Continuously reading logs with Python

The detector is not a cron job and not a one-time script.

It runs as a long-lived daemon and continuously tails the Nginx access log file.

For every new line, it:

  • parses the JSON
  • extracts the traffic fields
  • updates request windows
  • updates baselines
  • checks whether the traffic looks anomalous

That means detection happens in near real time.
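Here is a minimal sketch of that tail-and-parse loop. The log path and JSON field names are assumptions for illustration:

```python
import json
import time

LOG_PATH = "/var/log/nginx/access_json.log"  # assumed path on the shared volume

def follow(path):
    """Yield new lines appended to the file, like `tail -f`."""
    with open(path, "r") as f:
        f.seek(0, 2)  # start at the end so only new traffic is processed
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)  # nothing new yet
                continue
            yield line

for raw in follow(LOG_PATH):
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        continue  # skip malformed lines instead of crashing the daemon
    ip = event.get("source_ip")
    status = int(event.get("status", 0))
    # ...update sliding windows and baselines, then run the anomaly checks
```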

———

## Step 3: Using a sliding window with deques

One of the most important parts of this project is the 60-second sliding window.

I used Python deque objects because they are excellent for “keep the latest items, remove the oldest items” logic.

### What I tracked

I kept:

  • one global request deque
  • one per-IP request deque
  • one per-IP error deque for 4xx and 5xx responses

### How it works

When a request arrives:

  1. append the current timestamp to the relevant deque
  2. remove any timestamps older than 60 seconds

This gives a true moving view of the latest traffic.
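Sketched with hypothetical names, the per-IP bookkeeping looks roughly like this:

```python
from collections import defaultdict, deque
import time

WINDOW = 60  # seconds

per_ip_requests = defaultdict(deque)  # ip -> timestamps of recent requests
per_ip_errors = defaultdict(deque)    # ip -> timestamps of recent 4xx/5xx responses

def record(ip, status, now=None):
    now = now if now is not None else time.time()
    per_ip_requests[ip].append(now)
    if status >= 400:
        per_ip_errors[ip].append(now)
    # Drop anything older than the 60-second window.
    cutoff = now - WINDOW
    for dq in (per_ip_requests[ip], per_ip_errors[ip]):
        while dq and dq[0] < cutoff:
            dq.popleft()
```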

### Why this matters

A simple “requests per minute” counter resets at fixed minute boundaries, which can hide short bursts.

A sliding window answers the better question:

How much traffic happened in the last 60 seconds right now?

That is much better for anomaly detection.

———

## Step 4: Teaching the baseline to learn from traffic

A sliding window shows what is happening now, but it does not tell us whether that traffic is unusual.

For that, I built a rolling baseline manager.

### What the baseline tracks

The baseline stores:

  • per-second request counts
  • per-second error counts
  • a rolling 30-minute history
  • hourly traffic slots

### What gets recalculated

Every 60 seconds, the detector recalculates:

  • mean requests per second
  • standard deviation
  • error rate

### Why idle seconds matter

One important detail was making sure quiet seconds are also included.

If you only record seconds where traffic exists, the average becomes artificially high. Then the system thinks normal traffic
is busier than it really is.

So I made sure the baseline includes:

  • active seconds
  • idle seconds with zero traffic

That makes the learned average much more realistic.
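As a rough sketch, assuming the detector keeps a map of per-second request counts, the recalculation can look like this:

```python
import statistics

def recalc_baseline(per_second_counts, window_start, window_end):
    """per_second_counts maps an epoch second to the request count for that second."""
    samples = []
    for second in range(window_start, window_end):
        # Idle seconds contribute a 0, so quiet periods pull the average down.
        samples.append(per_second_counts.get(second, 0))
    if not samples:
        return 0.0, 0.0
    return statistics.mean(samples), statistics.pstdev(samples)
```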

### Hour-slot preference

Traffic usually changes throughout the day.

So I added a rule:

  • if the current hour has enough samples, use the current hour’s baseline
  • otherwise, fall back to the rolling 30-minute baseline

This helps the detector adapt to time-of-day behavior.
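As a small sketch with hypothetical data structures, the preference rule reads roughly like this:

```python
MIN_HOUR_SAMPLES = 300  # placeholder: data an hour slot needs before it is trusted

def effective_baseline(hour_slots, rolling, hour):
    slot = hour_slots.get(hour)
    # Prefer the current hour's statistics once it has seen enough traffic.
    if slot and slot["samples"] >= MIN_HOUR_SAMPLES:
        return slot["mean"], slot["stdev"]
    return rolling["mean"], rolling["stdev"]
```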

———

## Step 5: How the detector makes decisions

Once I had:

  • a live request rate
  • a learned baseline

I needed a way to decide whether traffic is abnormal.

I used two checks.

### 1. Z-score

The z-score answers:

How far is the current traffic from the normal average, measured in standard deviations?

A high z-score means traffic is statistically unusual.

### 2. Rate multiplier

I also added a simpler check:

Is the current rate more than N times the learned average?

That catches obvious spikes even when the z-score is not dramatic yet.

### Detection rule

A request pattern is considered anomalous if either condition fires (see the sketch below):

  • z-score exceeds threshold
  • current rate exceeds multiplier of baseline mean

I used this logic for:

  • per-IP traffic
  • global traffic
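Putting the two checks together, a simplified decision function looks roughly like this. The thresholds are placeholders, and the same function can run against both the per-IP and the global rate:

```python
Z_THRESHOLD = 3.0      # placeholder: how many standard deviations counts as unusual
RATE_MULTIPLIER = 5.0  # placeholder: "N times the learned average"

def is_anomalous(current_rate, baseline_mean, baseline_stdev):
    # Check 1: z-score, the distance from the learned mean in standard deviations.
    if baseline_stdev > 0:
        z = (current_rate - baseline_mean) / baseline_stdev
        if z > Z_THRESHOLD:
            return True
    # Check 2: rate multiplier, which catches obvious spikes early.
    if baseline_mean > 0 and current_rate > RATE_MULTIPLIER * baseline_mean:
        return True
    return False
```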

———

## Step 6: Tightening thresholds when errors surge

Not all suspicious behavior is about volume alone.

Sometimes an IP causes a lot of:

  • 401
  • 403
  • 404
  • 500

That often suggests scanning, brute force, or probing.

So I added an error surge rule.

If an IP’s 4xx/5xx rate becomes much worse than its normal baseline, the detector automatically tightens its thresholds.

That way, a suspicious IP gets less tolerance than a normal user.
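One simple way to express that tightening, as a sketch with placeholder ratios:

```python
def effective_thresholds(error_rate, baseline_error_rate,
                         z_threshold=3.0, multiplier=5.0):
    # If this IP's error rate is far above its own baseline, demand less
    # evidence before calling its traffic anomalous.
    if baseline_error_rate > 0 and error_rate > 3 * baseline_error_rate:
        return z_threshold * 0.5, multiplier * 0.5
    return z_threshold, multiplier
```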

———

## Step 7: Blocking bad IPs with iptables

When a per-IP anomaly is confirmed, the detector blocks the source IP using Linux iptables.

The command is:

iptables -I INPUT -s -j DROP

For example:

iptables -I INPUT -s 203.0.113.10 -j DROP

### What this means

  • -I INPUT inserts the rule at the top of the INPUT chain
  • -s matches the source IP
  • -j DROP silently drops all packets from that IP

In simple terms:

“If traffic comes from this IP, ignore it.”

This is useful because it stops abusive traffic at the firewall level.
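Inside the detector, a rule like that can be applied with a small subprocess call. This is only a sketch; a real setup needs the right privileges and input validation:

```python
import subprocess

def ban_ip(ip):
    # Insert a DROP rule at the top of the INPUT chain for this source IP.
    subprocess.run(["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"], check=True)

def unban_ip(ip):
    # Remove the matching rule again when the ban expires.
    subprocess.run(["iptables", "-D", "INPUT", "-s", ip, "-j", "DROP"], check=True)
```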

———

## Step 8: Automatically unbanning IPs

Blocking forever on a first offense is not always ideal.

So I added a backoff-based unban system:

  1. first ban: 10 minutes
  2. second ban: 30 minutes
  3. third ban: 2 hours
  4. fourth offense onward: permanent

A background unban loop checks whether each active ban has expired.

If a ban expires:

  • the firewall rule is removed
  • the audit log records the release
  • Slack gets an unban notification

———

## Step 9: Sending Slack alerts

The detector sends Slack notifications for:

  • per-IP bans
  • unbans
  • global anomaly alerts

Each alert includes:

  • the condition that fired
  • the current rate
  • the baseline
  • the timestamp
  • the ban duration if applicable

That makes each notification immediately useful.
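Sending those alerts only needs an HTTP POST to the Slack incoming-webhook URL. Here is a minimal sketch using the requests library, with a placeholder webhook URL and illustrative message fields:

```python
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def notify_slack(event, ip, current_rate, baseline, duration=None):
    lines = [
        f"*{event}* for {ip}",
        f"current rate: {current_rate:.2f} req/s, baseline: {baseline:.2f} req/s",
    ]
    if duration is not None:
        lines.append(f"ban duration: {duration} seconds")
    # Slack incoming webhooks accept a JSON payload with a "text" field.
    requests.post(SLACK_WEBHOOK_URL, json={"text": "\n".join(lines)}, timeout=5)
```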

———

## Step 10: Building the live dashboard

I also built a live dashboard that shows:

  • global requests per second
  • top source IPs
  • currently banned IPs
  • CPU usage
  • memory usage
  • uptime
  • effective baseline values
  • baseline graph over time

This made testing much easier, because I could see how the detector was behaving without constantly reading raw logs.

———

## Example: sliding window idea in Python

Here is the basic idea behind the 60-second deque window:

```python
from collections import deque
import time

requests = deque()

def add_request():
    now = time.time()
    requests.append(now)

    # Drop timestamps that fell out of the 60-second window.
    cutoff = now - 60
    while requests and requests[0] < cutoff:
        requests.popleft()

    # Average requests per second over the last minute.
    return len(requests) / 60
```

That tiny pattern is the core of the live request-rate logic.

———

## Problems I ran into

This project also taught me that detection logic is only half the job. The other half is operational reliability.

### 1. The baseline can learn the wrong thing

If you attack too early, the detector can start treating attack traffic as normal.

The fix:

  • warm the system with light traffic first
  • wait for a baseline recalculation
  • then run the burst

### 2. Too much per-request logging

Logging every request at INFO created too much output during heavy bursts.

The fix:

  • make request-by-request logging configurable
  • keep it off by default
  • keep audit events on

### 3. Blocking my own SSH session

At one point, I attacked from the same IP I used for SSH, and the detector correctly blocked that IP.

The fix:

  • add a whitelist for admin IPs
  • use a separate IP for attack traffic

### 4. Capturing iptables state at the right time

Sometimes the ban happened correctly, but the live iptables state was hard to catch.

The fix:

  • automatically write iptables snapshots during BAN and UNBAN

———

## What I learned

This project helped me understand that security tooling is not just about rules.

It is also about:

  • observability
  • realistic baselines
  • good logging
  • safe testing
  • automated response
  • collecting proof that your system actually worked

It also showed me how simple data structures like a deque can be powerful when used carefully.

———

## Final thoughts

In the end, I built a system that can:

  • monitor HTTP traffic in real time
  • learn what normal looks like
  • detect per-IP anomalies
  • detect global anomalies
  • block abusive IPs with iptables
  • notify Slack
  • automatically unban IPs
  • expose live metrics in a dashboard

For a beginner-friendly DevSecOps project, this was a great way to connect traffic monitoring, anomaly detection, alerting, and
response in one real system.

If you are learning security engineering or DevSecOps, this kind of project is a very practical way to understand how defensive
controls work in production-style environments.

———

## Project links
