When I first saw the HNG Stage 3 task, it looked intimidating.
The assignment was not just asking me to deploy an application. It wanted me to build a small security system around the application.
The system needed to:
- deploy a Nextcloud app,
- place Nginx in front of it,
- read Nginx access logs,
- detect abnormal traffic,
- block aggressive IP addresses,
- send Slack alerts,
- show live metrics on a dashboard,
- and keep audit logs as proof of what happened.
At first, this felt like many separate things. But after breaking it down, I understood that the project was really about one simple idea:
Watch traffic, learn what is normal, and react when something becomes suspicious.
This blog post explains the project in simple terms.
What the Project Does
The project protects a Nextcloud application using a custom Python detector.
The traffic flow looks like this:
User traffic
↓
Nginx reverse proxy
↓
Nextcloud container
Nginx is the first thing users touch. It receives public traffic and forwards it to Nextcloud.
At the same time, Nginx writes a log for every request.
Those logs are written in JSON format, like this:
```json
{
  "source_ip": "102.91.92.195",
  "timestamp": "2026-04-29T13:30:00+00:00",
  "method": "GET",
  "path": "/",
  "status": 200,
  "response_size": 5321
}
```
Then the Python detector reads those logs continuously.
The monitoring flow looks like this:
Nginx JSON logs
↓
Python detector daemon
↓
Sliding window + baseline + anomaly detection
↓
Slack alerts / iptables block / dashboard / audit log
So the detector does not receive web traffic directly. It watches the log file and makes decisions from what it sees.
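To make this concrete, here is a minimal sketch of how a detector can tail a JSON log file. The log path is illustrative, and only the field names from the example entry above are assumed; the project's actual code may differ.

```python
import json
import time

LOG_PATH = "/var/log/nginx/access.json"  # illustrative path, not necessarily the project's

def tail_json_log(path):
    """Yield parsed JSON entries as Nginx appends them, like `tail -f`."""
    with open(path, "r") as f:
        f.seek(0, 2)  # jump to the end of the file; only watch new lines
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.2)  # no new data yet; wait briefly
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partial or malformed lines

# for entry in tail_json_log(LOG_PATH):
#     print(entry["source_ip"], entry["status"])
```

Note that this sketch does not handle log rotation; a production detector would need to reopen the file when Nginx rotates it.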
Why This Project Matters
This project matters because servers are always exposed to traffic from the internet.
Some traffic is normal. For example:
- A user opens the login page.
- A user refreshes the app.
- A user uploads a file.
But some traffic can be suspicious. For example:
- One IP sends hundreds of requests quickly.
- Many bad requests hit invalid pages.
- The whole server suddenly gets much more traffic than usual.
If we do not monitor these patterns, the application can become slow or unavailable.
This project teaches an important DevOps/security lesson:
Logs are not just for debugging. Logs can also be used to detect attacks.
By reading Nginx logs, the detector can understand what is happening and respond automatically.
Main Parts of the System
The project has a few important parts.
1. Nextcloud
Nextcloud is the application being protected; it is the app users actually access.
In this project, I did not build Nextcloud myself. I deployed the provided Docker image.
2. Nginx
Nginx is used as a reverse proxy.
A reverse proxy is a server that receives requests first, then forwards them to another application behind it.
In this project:
User → Nginx → Nextcloud
Nginx also writes JSON access logs.
That is very important because the detector depends on those logs.
3. Python Detector
The detector is the main custom part of the project.
It does these things:
- Reads Nginx logs
- Counts recent traffic
- Learns what normal traffic looks like
- Detects abnormal traffic
- Blocks aggressive IPs
- Sends Slack alerts
- Updates dashboard metrics
- Writes audit logs
- Automatically lifts temporary bans
I used Python because it is easy to read and well suited to this type of project. It also has useful tools for JSON parsing, background threads, HTTP requests, and running Linux commands.
How the Sliding Window Works
The first thing the detector needs to know is:
How many requests happened recently?
For that, I used a 60-second sliding window.
A sliding window means the detector always looks at the most recent 60 seconds of traffic.
Example:
Current time: 10:01:00
Window: 10:00:00 → 10:01:00
Ten seconds later:
Current time: 10:01:10
Window: 10:00:10 → 10:01:10
The window moves forward as time moves forward.
The detector tracks:
- Total requests in the last 60 seconds
- Requests per IP in the last 60 seconds
- Errors in the last 60 seconds
- Top source IPs
This allows the detector to answer questions like:
- How many requests did this IP make recently?
- How many total requests did the server receive recently?
- Which IPs are the busiest right now?
In Python, I used deque for this.
A deque is like a list that is efficient when adding items to one end and removing old items from the other end.
For every request, the detector adds a timestamp. When a timestamp becomes older than 60 seconds, it is removed.
That way, the count always stays fresh.
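As a sketch, the idea can look like this. Class and method names are illustrative, not the project's actual code:

```python
from collections import deque
import time

WINDOW_SECONDS = 60

class SlidingWindow:
    """Count events that happened within the most recent 60 seconds."""

    def __init__(self):
        self.timestamps = deque()

    def add(self, ts=None):
        """Record one request at time ts (defaults to now)."""
        self.timestamps.append(ts if ts is not None else time.time())

    def count(self, now=None):
        """Evict timestamps older than the window, then return the count."""
        now = now if now is not None else time.time()
        while self.timestamps and self.timestamps[0] < now - WINDOW_SECONDS:
            self.timestamps.popleft()
        return len(self.timestamps)
```

The detector presumably keeps one such window for the whole server and one per source IP, so both global and per-IP rates come from the same mechanism.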
How the Baseline Learns Traffic
Counting current traffic is not enough.
The detector also needs to know what is normal.
For example, 100 requests may be normal for a busy server, but suspicious for a small server.
So I added a rolling baseline.
The baseline looks at the last 30 minutes of traffic and calculates:
- Average request rate
- Standard deviation
- Normal error rate
- Current hour traffic pattern
The detector stores traffic as per-second counts.
Example:
10:00:01 → 2 requests
10:00:02 → 0 requests
10:00:03 → 1 request
10:00:04 → 5 requests
From these numbers, it calculates the average.
The average is called the mean.
If the counts are:
2, 0, 1, 5
The mean is:
(2 + 0 + 1 + 5) / 4 = 2
So the detector learns that normal traffic is around 2 requests per second.
What Standard Deviation Means
Standard deviation tells the detector how much traffic normally changes.
If traffic is always like this:
2, 2, 2, 2, 2
Then the traffic is very stable.
If traffic is like this:
0, 5, 1, 8, 2
Then the traffic jumps around a lot.
This matters because a small spike may be suspicious on a very quiet server, but normal on a server that already has unstable traffic.
So the detector uses both the mean and the standard deviation to understand what normal traffic looks like.
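Python's standard library can compute both values directly. Here is a small sketch using the per-second counts from the example above; whether the real detector uses the population or sample standard deviation is my assumption:

```python
import statistics

# Per-second request counts (the example above)
per_second_counts = [2, 0, 1, 5]

baseline_mean = statistics.mean(per_second_counts)      # 2.0
baseline_stddev = statistics.pstdev(per_second_counts)  # population std dev, about 1.87

print(f"mean={baseline_mean:.4f} stddev={baseline_stddev:.4f}")
```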
How Detection Decisions Are Made
The detector compares the current request rate with the learned baseline.
It uses two main rules.
Rule 1: Z-score
The z-score checks how far the current traffic is from normal.
The formula is:
z = (current_rate - baseline_mean) / baseline_stddev
If the z-score is greater than 3.0, the traffic is treated as suspicious.
In simple terms:
If traffic is much higher than what the server normally sees, raise an alert.
Rule 2: 5x Baseline Rule
The detector also checks if the current rate is more than 5 times the baseline mean.
Example:
Baseline mean: 1 request/sec
Current rate: 6 requests/sec
That is more than 5 times the baseline.
So the detector treats it as suspicious.
This rule is useful because it is simple and easy to understand.
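Put together, the two rules can be expressed in a few lines. This is a simplified sketch; function and parameter names are illustrative:

```python
def is_suspicious(current_rate, baseline_mean, baseline_stddev,
                  z_threshold=3.0, multiplier_threshold=5.0):
    """Apply the two detection rules described above."""
    # Rule 1: z-score, i.e. how far above normal is the current rate?
    if baseline_stddev > 0:
        z = (current_rate - baseline_mean) / baseline_stddev
        if z > z_threshold:
            return True
    # Rule 2: simple multiplier, i.e. more than 5x the baseline mean?
    if baseline_mean > 0 and current_rate > multiplier_threshold * baseline_mean:
        return True
    return False

# Baseline of 1 req/s, current rate 6 req/s: more than 5x, so suspicious
print(is_suspicious(6.0, 1.0, 0.2))  # True
```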
Error Surge Detection
The detector also watches error responses.
Errors are HTTP status codes like:
- 404 Not Found
- 403 Forbidden
- 500 Internal Server Error
If one IP produces too many errors, it may be scanning or attacking invalid paths.
The detector checks:
IP error rate > 3x baseline error rate
If that happens, the detector tightens the thresholds for that IP.
Normal thresholds:
- z-score > 3.0
- rate > 5x baseline
Tightened thresholds:
- z-score > 2.0
- rate > 3x baseline
This means the detector becomes stricter when an IP is already behaving suspiciously.
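In code, this can be as simple as picking a threshold pair per IP. A sketch with illustrative names:

```python
def thresholds_for_ip(ip_error_rate, baseline_error_rate):
    """Pick (z_threshold, rate_multiplier): stricter for IPs producing many errors."""
    if baseline_error_rate > 0 and ip_error_rate > 3 * baseline_error_rate:
        return 2.0, 3.0  # tightened thresholds
    return 3.0, 5.0      # normal thresholds
```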
Per-IP Anomaly vs Global Anomaly
The detector handles two types of anomalies.
Per-IP Anomaly
This happens when one IP is sending too much traffic.
Example:
One IP sends 500 requests quickly.
The detector responds by blocking that IP.
Global Anomaly
This happens when the whole server receives too much traffic.
Example:
The total server traffic suddenly becomes much higher than normal.
For global anomalies, the detector sends a Slack alert but does not block everyone.
This is important because a global spike might be caused by:
- Many normal users
- A test
- A campaign
- A real attack
Blocking everyone would be dangerous.
So global anomaly only sends an alert.
How iptables Blocks an IP
To block aggressive IPs, the detector uses iptables.
iptables is a Linux firewall tool.
A firewall decides whether traffic should be accepted or dropped.
The blocking command looks like this:
iptables -I DOCKER-USER -s BAD_IP -j DROP
This means:
If traffic comes from BAD_IP, drop it.
Once the rule is active, traffic from that IP should no longer reach Nginx or Nextcloud.
The flow becomes:
Blocked IP
↓
iptables
↓
DROP
The request does not reach the application.
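From Python, the detector can run this command with subprocess. A minimal sketch, assuming the process has root privileges (which iptables requires):

```python
import subprocess

def block_ip(ip):
    """Insert a DROP rule for this IP at the top of the DOCKER-USER chain."""
    subprocess.run(
        ["iptables", "-I", "DOCKER-USER", "-s", ip, "-j", "DROP"],
        check=True,  # raise if the command fails (e.g. missing privileges)
    )
```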
Why I Used the DOCKER-USER Chain
At first, I thought blocking in the INPUT chain was enough.
But Nginx runs inside Docker, and Docker manages its own iptables rules. Traffic to Docker-published ports is forwarded to the container through the FORWARD chain, so a rule in the INPUT chain does not reliably block it.
So I used the DOCKER-USER chain instead.
Docker evaluates DOCKER-USER before its own forwarding rules, which makes it the recommended place for custom firewall rules on a Docker host.
The command is:
iptables -I DOCKER-USER -s BAD_IP -j DROP
To inspect the rules:
sudo iptables -L DOCKER-USER -n --line-numbers
Example output:
```
Chain DOCKER-USER (1 references)
num  target  prot  opt  source          destination
1    DROP    all   --   102.91.92.195   0.0.0.0/0
```
This proves the IP is blocked.
Auto-Unban Logic
Blocking forever is not always the best idea.
So I added an auto-unban system.
The ban schedule is:
- 1st offense → 10 minutes
- 2nd offense → 30 minutes
- 3rd offense → 2 hours
- 4th offense → permanent
The detector stores ban state in:
detector/state.json
This file remembers:
- Which IP is banned
- How many times it has been banned
- When it was banned
- When it should be unbanned
- Whether the ban is permanent
The auto-unbanner checks the state file regularly.
If the ban time has expired, it removes the iptables rule:
iptables -D DOCKER-USER -s IP -j DROP
Then it sends a Slack message and writes an audit log entry.
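A sketch of that unban loop is below. The shape of state.json here is my assumption, not the project's actual schema:

```python
import json
import subprocess
import time

STATE_PATH = "detector/state.json"

def unban_expired():
    """Remove iptables rules for bans whose time has expired."""
    with open(STATE_PATH) as f:
        # Assumed shape: {"bans": [{"ip": ..., "unban_at": ..., "permanent": ...}]}
        state = json.load(f)
    now = time.time()
    still_banned = []
    for ban in state.get("bans", []):
        if not ban.get("permanent") and now >= ban["unban_at"]:
            # Delete the matching DROP rule from the DOCKER-USER chain.
            subprocess.run(
                ["iptables", "-D", "DOCKER-USER", "-s", ban["ip"], "-j", "DROP"],
                check=False,  # the rule may already be gone
            )
        else:
            still_banned.append(ban)
    state["bans"] = still_banned
    with open(STATE_PATH, "w") as f:
        json.dump(state, f, indent=2)
```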
Slack Alerts
Slack alerts are used so that I do not have to watch the terminal all the time.
The detector sends Slack alerts for:
- IP bans
- IP unbans
- Global anomalies
- Error surges
The Slack webhook is not stored directly inside config.yaml.
Instead, config.yaml points to an environment variable:
```yaml
slack:
  enabled: true
  webhook_url_env: "SLACK_WEBHOOK_URL"
```
The real webhook is stored in .env:
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
The .env file is ignored by Git so the secret is not exposed.
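Sending the alert itself only needs the standard library. A minimal sketch; Slack incoming webhooks accept a JSON body with a "text" field:

```python
import json
import os
import urllib.request

def send_slack_alert(text):
    """Post a message to the Slack webhook read from the environment."""
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if not webhook_url:
        return  # alerting is effectively disabled if the variable is missing
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)

# send_slack_alert("BAN 102.91.92.195 | rate exceeded 5x baseline | 10 minutes")
```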
Live Metrics Dashboard
The project also has a live dashboard.
The dashboard shows:
- Banned IPs
- Global requests per second
- Top 10 source IPs
- CPU usage
- Memory usage
- Effective mean
- Effective standard deviation
- Uptime
The dashboard refreshes at least every 3 seconds.
This helps the reviewer see what the detector is doing live.
The dashboard is served through a DuckDNS subdomain, while Nextcloud remains accessible by server IP only.
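The post does not show the dashboard code, but one simple way to build such a dashboard is a small HTTP endpoint serving a JSON snapshot that the page polls every few seconds. A hypothetical sketch, not the project's actual implementation:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical shared snapshot, updated by the detector's threads.
metrics = {
    "banned_ips": [],
    "requests_per_second": 0.0,
    "top_ips": [],
    "effective_mean": 0.0,
    "effective_stddev": 0.0,
}

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the current metrics snapshot as JSON for the front end to poll.
        body = json.dumps(metrics).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("0.0.0.0", 8080), MetricsHandler).serve_forever()
```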
Audit Logs
The audit log is the permanent record of what happened.
It is stored at:
detector/audit.log
The format is:
[timestamp] ACTION ip | condition | rate | baseline | duration
Examples:
[2026-04-29T13:00:00+00:00] BASELINE - | recalculated using rolling-window | - | mean=0.1000,stddev=0.1000,error_rate=0.0100,samples=60,hour=13 | -
[2026-04-29T13:05:00+00:00] GLOBAL_ANOMALY - | z-score 8.50 exceeded threshold 3.00 | 30.00/s | mean=2.0000,stddev=1.0000,z=28.00,multiplier=15.00x | -
[2026-04-29T13:06:00+00:00] BAN 102.91.92.195 | rate exceeded 5x baseline | 5.00/s | mean=0.5000,stddev=0.2000,z=22.50,multiplier=10.00x,tightened=False | 10 minutes
[2026-04-29T13:16:00+00:00] UNBAN 102.91.92.195 | previous ban expired | - | - | 10 minutes
This log proves that the detector is not just running, but actually making decisions and recording them.
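Writing such a line is straightforward. Here is a sketch that reproduces the format above; the function name is illustrative:

```python
from datetime import datetime, timezone

AUDIT_PATH = "detector/audit.log"

def write_audit(action, ip="-", condition="-", rate="-", baseline="-", duration="-"):
    """Append one line in the audit format shown above."""
    ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(AUDIT_PATH, "a") as f:
        f.write(f"[{ts}] {action} {ip} | {condition} | {rate} | {baseline} | {duration}\n")

# write_audit("BAN", "102.91.92.195", "rate exceeded 5x baseline",
#             "5.00/s", "mean=0.5000,stddev=0.2000", "10 minutes")
```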
What I Learned
This project helped me understand how monitoring and security automation can work together.
The most important things I learned were:
- Nginx can be used as both a reverse proxy and a source of logs.
- Logs are not just for debugging; they can be used to detect suspicious behavior.
- A sliding window helps track recent traffic.
- A baseline helps define what "normal" means.
- iptables can block traffic at the Linux firewall level.
- Docker networking affects where firewall rules should be placed.
- Slack alerts make the system easier to monitor.
- Audit logs are important for proof and debugging.
I also learned that building a system like this is easier when broken into small parts.
Instead of trying to build everything at once, I built it step by step:
- Deploy app
- Add logs
- Read logs
- Count traffic
- Learn baseline
- Detect anomaly
- Send alert
- Block IP
- Auto-unban
- Show dashboard
- Write audit logs
That made the project easier to understand.
Conclusion
This project is a small but practical example of how a server can monitor and defend itself.
It is not a full enterprise security system, but it shows the foundation of one:
- observe traffic
- learn normal behavior
- detect abnormal behavior
- respond automatically
- record what happened
For me, the biggest lesson is that DevOps is not only about deploying applications. It is also about understanding how applications behave in real conditions and building systems that can respond when something goes wrong.