When a platform is public and always online, one of the biggest security questions is:
How do you know when traffic is normal, and when something suspicious is happening?
That was the goal of this project.
I built a real-time anomaly detection engine for cloud.ng, a cloud storage platform powered by Nextcloud, that watches incoming
HTTP traffic, learns what normal traffic looks like, detects unusual behavior, and reacts automatically.
If one IP becomes abusive, the system blocks it with iptables. If the whole platform suddenly gets a global traffic spike, the
system sends an alert to Slack. It also provides a live dashboard so you can watch traffic behavior in real time.
In this post, I’ll explain how I built it in a beginner-friendly way.
———
## What this project does
At a high level, the system works like this:
- A user sends an HTTP request
- Nginx receives the request first
- Nginx forwards it to Nextcloud
- Nginx writes the request into a JSON access log
- A Python detector daemon reads that log continuously
- The detector compares live traffic against a learned baseline
- If traffic becomes abnormal, it blocks the IP or sends a Slack alert
So instead of using a fixed hardcoded limit like “100 requests per minute,” this project tries to learn what normal looks like
first.
———
## Why this matters
A fixed limit is easy to write, but not always smart.
Traffic at 2 a.m. is usually different from traffic at 2 p.m. Some endpoints naturally get bursts. Some spikes are harmless,
and some are not.
If your threshold is too low:
- you block legitimate users
If your threshold is too high:
- suspicious traffic slips through
That’s why I used a rolling baseline instead of a static number.
———
## The stack I used
This project uses:
- Docker Compose
- Nextcloud
- Nginx
- Python
- iptables
- Slack webhook
- a live metrics dashboard
The Nextcloud image came straight from Docker Hub and was used as-is, without modification.
———
## Architecture overview
The traffic flow looks like this:
```
Internet Clients
        |
        v
Nginx Reverse Proxy
        |
        +--> Nextcloud
        |
        +--> JSON access logs
                   |
                   v
        Python Detector Daemon
         |        |        |
         v        v        v
     iptables   Slack   Dashboard
```
Nginx and the detector share a Docker volume so the detector can read the live access log without modifying the application
container.
———
## Step 1: Logging traffic with Nginx
The detector needs reliable traffic data before it can make decisions.
So I configured Nginx to log every request in JSON format with fields like:
- source IP
- timestamp
- method
- path
- status code
- response size
A simplified example looks like this:
```json
{
  "source_ip": "203.0.113.10",
  "timestamp": "2026-04-27T09:25:51+00:00",
  "method": "GET",
  "path": "/",
  "status": "200",
  "response_size": "612"
}
```
Structured logs are much easier to parse safely than plain text.
I also configured Nginx to trust and forward the real client IP using X-Forwarded-For, so the detector sees the actual request
source.
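A `log_format` along these lines produces JSON like the sample above. This is a minimal sketch, not the project's exact config; the format name and log path are assumptions, and `escape=json` needs Nginx 1.11.8 or newer.

```nginx
log_format json_combined escape=json
  '{'
    '"source_ip":"$remote_addr",'
    '"timestamp":"$time_iso8601",'
    '"method":"$request_method",'
    '"path":"$uri",'
    '"status":"$status",'
    '"response_size":"$body_bytes_sent"'
  '}';

access_log /var/log/nginx/access.json json_combined;
```

When Nginx sits behind another proxy, `$remote_addr` can be replaced via the `realip` module so the logged IP is the real client, not the upstream hop.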
———
## Step 2: Continuously reading logs with Python
The detector is not a cron job and not a one-time script.
It runs as a long-lived daemon and continuously tails the Nginx access log file.
For every new line, it:
- parses the JSON
- extracts the traffic fields
- updates request windows
- updates baselines
- checks whether the traffic looks anomalous
That means detection happens in near real time.
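The tail-and-parse loop can be sketched like this (function names are my own, not the project's):

```python
import json
import time

def parse_line(line):
    """Parse one JSON log line; return None for malformed input."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None

def follow(path):
    """Yield parsed records as new lines are appended to the log."""
    with open(path) as log:
        log.seek(0, 2)  # jump to the end of the file: only new requests matter
        while True:
            line = log.readline()
            if not line:
                time.sleep(0.1)  # nothing new yet; avoid busy-waiting
                continue
            record = parse_line(line)
            if record is not None:
                yield record  # hand the parsed request to the detector
```

A real daemon also needs to handle log rotation (the file being replaced underneath it), which this sketch ignores.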
———
## Step 3: Using a sliding window with deques
One of the most important parts of this project is the 60-second sliding window.
I used Python deque objects because they are excellent for “keep the latest items, remove the oldest items” logic.
### What I tracked
I kept:
- one global request deque
- one per-IP request deque
- one per-IP error deque for 4xx and 5xx responses
### How it works
When a request arrives:
- append the current timestamp to the relevant deque
- remove any timestamps older than 60 seconds
This gives a true moving view of the latest traffic.
### Why this matters
A simple “requests per minute” counter resets at fixed minute boundaries, which can hide short bursts.
A sliding window answers the better question:
How much traffic happened in the last 60 seconds right now?
That is much better for anomaly detection.
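The per-IP bookkeeping described above can be sketched with a `defaultdict` of deques (variable names are my own):

```python
from collections import defaultdict, deque

WINDOW = 60  # sliding window length in seconds

ip_requests = defaultdict(deque)  # per-IP request timestamps
ip_errors = defaultdict(deque)    # per-IP 4xx/5xx timestamps

def record(ip, status, now):
    """Record one request; return this IP's request count in the last WINDOW seconds."""
    ip_requests[ip].append(now)
    if status >= 400:
        ip_errors[ip].append(now)
    # evict timestamps older than the window from both deques
    for dq in (ip_requests[ip], ip_errors[ip]):
        while dq and dq[0] < now - WINDOW:
            dq.popleft()
    return len(ip_requests[ip])
```

Because eviction happens on every append, each deque only ever holds the last 60 seconds of activity for that IP.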
———
## Step 4: Teaching the baseline to learn from traffic
A sliding window shows what is happening now, but it does not tell us whether that traffic is unusual.
For that, I built a rolling baseline manager.
### What the baseline tracks
The baseline stores:
- per-second request counts
- per-second error counts
- a rolling 30-minute history
- hourly traffic slots
### What gets recalculated
Every 60 seconds, the detector recalculates:
- mean requests per second
- standard deviation
- error rate
### Why idle seconds matter
One important detail was making sure quiet seconds are also included.
If you only record seconds where traffic exists, the average becomes artificially high. Then the system thinks normal traffic
is busier than it really is.
So I made sure the baseline includes:
- active seconds
- idle seconds with zero traffic
That makes the learned average much more realistic.
### Hour-slot preference
Traffic usually changes throughout the day.
So I added a rule:
- if the current hour has enough samples, use the current hour’s baseline
- otherwise, fall back to the rolling 30-minute baseline
This helps the detector adapt to time-of-day behavior.
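The idle-second fix boils down to padding the per-second counts with zeros before averaging. A minimal sketch, assuming the detector keeps a list of counts for seconds that saw traffic:

```python
from statistics import mean, pstdev

def baseline_stats(per_second_counts, window_seconds):
    """Mean and std dev of requests/second over the whole window,
    counting idle seconds as zero instead of skipping them."""
    idle = window_seconds - len(per_second_counts)
    counts = list(per_second_counts) + [0] * idle
    return mean(counts), pstdev(counts)
```

Without the zero-padding, `baseline_stats([6, 4], 4)` would report a mean of 5 requests/second; with it, the mean correctly drops to 2.5 because two of the four seconds were silent.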
———
## Step 5: How the detector makes decisions
Once I had:
- a live request rate
- a learned baseline
I needed a way to decide whether traffic is abnormal.
I used two checks.
### 1. Z-score
The z-score answers:
How far is the current traffic from the normal average, measured in standard deviations?
A high z-score means traffic is statistically unusual.
### 2. Rate multiplier
I also added a simpler check:
Is the current rate more than N times the learned average?
That catches obvious spikes even when the z-score is not dramatic yet.
### Detection rule
A request pattern is considered anomalous if either condition fires:
- z-score exceeds threshold
- current rate exceeds multiplier of baseline mean
I used this logic for:
- per-IP traffic
- global traffic
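Combined, the two checks look roughly like this (the threshold values here are illustrative, not the project's tuned numbers):

```python
def is_anomalous(rate, mean_rate, std_rate,
                 z_threshold=3.0, multiplier=5.0):
    """Flag traffic that is statistically unusual (z-score) or an
    obvious spike (rate far above the learned mean)."""
    # z-score check: how many standard deviations above normal?
    if std_rate > 0 and (rate - mean_rate) / std_rate > z_threshold:
        return True
    # multiplier check: catches spikes even when std dev is tiny or zero
    if mean_rate > 0 and rate > multiplier * mean_rate:
        return True
    return False
```

The same function serves both scopes: call it with a per-IP rate and that IP's baseline, or with the global rate and the global baseline.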
———
## Step 6: Tightening thresholds when errors surge
Not all suspicious behavior is about volume alone.
Sometimes an IP causes a lot of:
- 401
- 403
- 404
- 500
That often suggests scanning, brute force, or probing.
So I added an error surge rule.
If an IP’s 4xx/5xx rate becomes much worse than its normal baseline, the detector automatically tightens its thresholds.
That way, a suspicious IP gets less tolerance than a normal user.
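One simple way to express that tightening: when an IP's error rate is well above its own baseline, shrink its thresholds. The factors below are my own illustrative constants:

```python
def effective_thresholds(error_rate, baseline_error_rate,
                         z_threshold=3.0, multiplier=5.0,
                         surge_factor=3.0, tighten=0.5):
    """Halve the anomaly thresholds for an IP whose 4xx/5xx rate
    is far above its own baseline; otherwise keep the defaults."""
    if baseline_error_rate > 0 and error_rate > surge_factor * baseline_error_rate:
        return z_threshold * tighten, multiplier * tighten
    return z_threshold, multiplier
```

The result feeds straight into the anomaly check from the previous step, so a scanning IP trips the detector at half the traffic a normal user would need.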
———
## Step 7: Blocking bad IPs with iptables
When a per-IP anomaly is confirmed, the detector blocks the source IP using Linux iptables.
The command is:

```
iptables -I INPUT -s <source-ip> -j DROP
```

For example:

```
iptables -I INPUT -s 203.0.113.10 -j DROP
```

### What this means
- `-I INPUT` inserts the rule at the top of the INPUT chain
- `-s` matches the source IP
- `-j DROP` silently drops all packets from that IP
In simple terms:
“If traffic comes from this IP, ignore it.”
This is useful because it stops abusive traffic at the firewall level.
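From Python, the ban is just a `subprocess` call. A minimal sketch (requires root to actually execute, so the function below defaults to building the command without running it):

```python
import subprocess

def ban_command(ip):
    """Build the iptables DROP rule for this source IP."""
    return ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"]

def ban_ip(ip, execute=False):
    """Insert the firewall rule; keep execute=False unless running as root."""
    cmd = ban_command(ip)
    if execute:
        subprocess.run(cmd, check=True)  # raises if iptables fails
    return cmd
```

Passing the arguments as a list (rather than a shell string) also avoids shell-injection risk if an attacker ever controls the logged IP field.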
———
## Step 8: Automatically unbanning IPs
Blocking forever on a first offense is not always ideal.
So I added a backoff-based unban system:
- first ban: 10 minutes
- second ban: 30 minutes
- third ban: 2 hours
- fourth offense onward: permanent
A background unban loop checks whether each active ban has expired.
If a ban expires:
- the firewall rule is removed
- the audit log records the release
- Slack gets an unban notification
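The backoff schedule and expiry check reduce to a small lookup (durations taken from the list above; function names are my own):

```python
BAN_STEPS = [600, 1800, 7200]  # 10 min, 30 min, 2 h, in seconds

def ban_duration(offense_count):
    """Return the ban length in seconds, or None for a permanent ban."""
    if offense_count <= len(BAN_STEPS):
        return BAN_STEPS[offense_count - 1]
    return None  # fourth offense onward: permanent

def is_expired(banned_at, offense_count, now):
    """True once a timed ban has run out; permanent bans never expire."""
    duration = ban_duration(offense_count)
    return duration is not None and now >= banned_at + duration
```

The background unban loop just calls `is_expired` for each active ban and, when it returns True, removes the matching iptables rule and fires the Slack notification.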
———
## Step 9: Sending Slack alerts
The detector sends Slack notifications for:
- per-IP bans
- unbans
- global anomaly alerts
Each alert includes:
- the condition that fired
- the current rate
- the baseline
- the timestamp
- the ban duration if applicable
That makes each notification immediately useful.
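Slack incoming webhooks accept a JSON payload with a `text` field, so an alert can be built and posted with just the standard library. The message layout below is my own sketch, not the project's exact format:

```python
import json
from urllib import request

def build_alert(kind, rate, baseline_mean, timestamp, ban_seconds=None):
    """Compose the Slack message body for one alert."""
    lines = [
        f"*{kind}*",
        f"current rate: {rate:.2f} req/s (baseline {baseline_mean:.2f})",
        f"time: {timestamp}",
    ]
    if ban_seconds is not None:
        lines.append(f"ban duration: {ban_seconds // 60} min")
    return {"text": "\n".join(lines)}

def send_alert(webhook_url, payload):
    """POST the payload to a Slack incoming webhook."""
    req = request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.status  # Slack returns 200 on success
```

Keeping the webhook URL in an environment variable rather than in code is the usual practice, since anyone holding the URL can post to the channel.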
———
## Step 10: Building the live dashboard
I also built a live dashboard that shows:
- global requests per second
- top source IPs
- currently banned IPs
- CPU usage
- memory usage
- uptime
- effective baseline values
- baseline graph over time
This made testing much easier, because I could see how the detector was behaving without constantly reading raw logs.
———
## Example: sliding window idea in Python
Here is the basic idea behind the 60-second deque window:
```python
from collections import deque
import time

requests = deque()

def add_request():
    now = time.time()
    requests.append(now)
    # drop timestamps that have fallen out of the 60-second window
    cutoff = now - 60
    while requests and requests[0] < cutoff:
        requests.popleft()
    # average requests per second over the last minute
    return len(requests) / 60
```
That tiny pattern is the core of the live request-rate logic.
———
## Problems I ran into
This project also taught me that detection logic is only half the job. The other half is operational reliability.
### 1. The baseline can learn the wrong thing
If you attack too early, the detector can start treating attack traffic as normal.
The fix:
- warm the system with light traffic first
- wait for a baseline recalculation
- then run the burst
### 2. Too much per-request logging
Logging every request at INFO created too much output during heavy bursts.
The fix:
- make request-by-request logging configurable
- keep it off by default
- keep audit events on
### 3. Blocking my own SSH session
At one point, I attacked from the same IP I used for SSH, and the detector correctly blocked that IP.
The fix:
- add a whitelist for admin IPs
- use a separate IP for attack traffic
### 4. Capturing iptables state at the right time
Sometimes the ban happened correctly, but the live iptables state was hard to catch.
The fix:
- automatically write iptables snapshots during BAN and UNBAN
———
## What I learned
This project helped me understand that security tooling is not just about rules.
It is also about:
- observability
- realistic baselines
- good logging
- safe testing
- automated response
- collecting proof that your system actually worked
It also showed me how simple data structures like a deque can be powerful when used carefully.
———
## Final thoughts
In the end, I built a system that can:
- monitor HTTP traffic in real time
- learn what normal looks like
- detect per-IP anomalies
- detect global anomalies
- block abusive IPs with iptables
- notify Slack
- automatically unban IPs
- expose live metrics in a dashboard
For a beginner-friendly DevSecOps project, this was a great way to connect traffic monitoring, anomaly detection, alerting, and
response in one real system.
If you are learning security engineering or DevSecOps, this kind of project is a very practical way to understand how defensive
controls work in production-style environments.
———
## Project links
- Live dashboard: http://54.90.137.142:8081
- GitHub repository: https://github.com/Patrickmbaza/hng14-stage3-devops-