When I first saw the HNG Stage 3 task, it looked intimidating.
The assignment was not just asking me to deploy an application. It wanted me to build a small security system around the application.
The system needed to:
- deploy a Nextcloud app,
- place Nginx in front of it,
- read Nginx access logs,
- detect abnormal traffic,
- block aggressive IP addresses,
- send Slack alerts,
- show live metrics on a dashboard,
- and keep audit logs as proof of what happened.
At first, this felt like many separate things. But after breaking it down, I understood that the project was really about one simple idea:
Watch traffic, learn what is normal, and react when something becomes suspicious.
This blog post explains the project in simple terms.
What the Project Does
The project protects a Nextcloud application using a custom Python detector.
The traffic flow looks like this:
User traffic
↓
Nginx reverse proxy
↓
Nextcloud container
Nginx is the first thing users touch. It receives public traffic and forwards it to Nextcloud.
At the same time, Nginx writes a log for every request.
Those logs are written in JSON format, like this:
```json
{
  "source_ip": "102.91.92.195",
  "timestamp": "2026-04-29T13:30:00+00:00",
  "method": "GET",
  "path": "/",
  "status": 200,
  "response_size": 5321
}
```
Then the Python detector reads those logs continuously.
The monitoring flow looks like this:
Nginx JSON logs
↓
Python detector daemon
↓
Sliding window + baseline + anomaly detection
↓
Slack alerts / iptables block / dashboard / audit log
So the detector does not receive web traffic directly. It watches the log file and makes decisions from what it sees.
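To make this concrete, here is a minimal sketch of how a detector can tail a JSON log file. The log path is illustrative, and only the field names from the example entry above are assumed; the project's actual code may differ.

```python
import json
import time

LOG_PATH = "/var/log/nginx/access.json"  # illustrative path, not necessarily the project's

def tail_json_log(path):
    """Yield parsed JSON entries as Nginx appends them, like `tail -f`."""
    with open(path, "r") as f:
        f.seek(0, 2)  # jump to the end of the file; only watch new lines
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.2)  # no new data yet; wait briefly
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partial or malformed lines

# for entry in tail_json_log(LOG_PATH):
#     print(entry["source_ip"], entry["status"])
```

Note that this sketch does not handle log rotation; a production detector would need to reopen the file when Nginx rotates it.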
Why This Project Matters
This project matters because servers are always exposed to traffic from the internet.
Some traffic is normal. For example:
- A user opens the login page.
- A user refreshes the app.
- A user uploads a file.
But some traffic can be suspicious. For example:
- One IP sends hundreds of requests quickly.
- Many bad requests hit invalid pages.
- The whole server suddenly gets much more traffic than usual.
If we do not monitor these patterns, the application can become slow or unavailable.
This project teaches an important DevOps/security lesson:
Logs are not just for debugging. Logs can also be used to detect attacks.
By reading Nginx logs, the detector can understand what is happening and respond automatically.
Main Parts of the System
The project has a few important parts.
1. Nextcloud
Nextcloud is the application being protected; it is the app users actually access.
In this project, I did not build Nextcloud myself. I deployed the provided Docker image.
2. Nginx
Nginx is used as a reverse proxy.
A reverse proxy is a server that receives requests first, then forwards them to another application behind it.
In this project:
User → Nginx → Nextcloud
Nginx also writes JSON access logs.
That is very important because the detector depends on those logs.
3. Python Detector
The detector is the main custom part of the project.
It does these things:
- Reads Nginx logs
- Counts recent traffic
- Learns what normal traffic looks like
- Detects abnormal traffic
- Blocks aggressive IPs
- Sends Slack alerts
- Updates dashboard metrics
- Writes audit logs
- Automatically lifts temporary bans
I used Python because it is easy to read and well suited to this type of project. It also has useful tools for JSON parsing, background threads, HTTP requests, and running Linux commands.
How the Sliding Window Works
The first thing the detector needs to know is:
How many requests happened recently?
For that, I used a 60-second sliding window.
A sliding window means the detector always looks at the most recent 60 seconds of traffic.
Example:
Current time: 10:01:00
Window: 10:00:00 → 10:01:00
Ten seconds later:
Current time: 10:01:10
Window: 10:00:10 → 10:01:10
The window moves forward as time moves forward.
The detector tracks:
- Total requests in the last 60 seconds
- Requests per IP in the last 60 seconds
- Errors in the last 60 seconds
- Top source IPs
This allows the detector to answer questions like:
- How many requests did this IP make recently?
- How many total requests did the server receive recently?
- Which IPs are the busiest right now?
In Python, I used deque for this.
A deque is like a list that is efficient when adding items to one end and removing old items from the other end.
For every request, the detector adds a timestamp. When a timestamp becomes older than 60 seconds, it is removed.
That way, the count always stays fresh.
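As a sketch, the idea can look like this. Class and method names are illustrative, not the project's actual code:

```python
from collections import deque
import time

WINDOW_SECONDS = 60

class SlidingWindow:
    """Count events that happened within the most recent 60 seconds."""

    def __init__(self):
        self.timestamps = deque()

    def add(self, ts=None):
        """Record one request at time ts (defaults to now)."""
        self.timestamps.append(ts if ts is not None else time.time())

    def count(self, now=None):
        """Evict timestamps older than the window, then return the count."""
        now = now if now is not None else time.time()
        while self.timestamps and self.timestamps[0] < now - WINDOW_SECONDS:
            self.timestamps.popleft()
        return len(self.timestamps)
```

The detector presumably keeps one such window for the whole server and one per source IP, so both global and per-IP rates come from the same mechanism.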
How the Baseline Learns Traffic
Counting current traffic is not enough.
The detector also needs to know what is normal.
For example, 100 requests may be normal for a busy server, but suspicious for a small server.
So I added a rolling baseline.
The baseline looks at the last 30 minutes of traffic and calculates:
- Average request rate
- Standard deviation
- Normal error rate
- Current hour traffic pattern
The detector stores traffic as per-second counts.
Example:
10:00:01 → 2 requests
10:00:02 → 0 requests
10:00:03 → 1 request
10:00:04 → 5 requests
From these numbers, it calculates the average.
The average is called the mean.
If the counts are:
2, 0, 1, 5
The mean is:
(2 + 0 + 1 + 5) / 4 = 2
So the detector learns that normal traffic is around 2 requests per second.
What Standard Deviation Means
Standard deviation tells the detector how much traffic normally changes.
If traffic is always like this:
2, 2, 2, 2, 2
Then the traffic is very stable.
If traffic is like this:
0, 5, 1, 8, 2
Then the traffic jumps around a lot.
This matters because a small spike may be suspicious on a very quiet server, but normal on a server that already has unstable traffic.
So the detector uses both the mean and the standard deviation to understand what normal traffic looks like.
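Python's standard library can compute both values directly. Here is a small sketch using the per-second counts from the example above; whether the real detector uses the population or sample standard deviation is my assumption:

```python
import statistics

# Per-second request counts (the example above)
per_second_counts = [2, 0, 1, 5]

baseline_mean = statistics.mean(per_second_counts)      # 2.0
baseline_stddev = statistics.pstdev(per_second_counts)  # population std dev, about 1.87

print(f"mean={baseline_mean:.4f} stddev={baseline_stddev:.4f}")
```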
How Detection Decisions Are Made
The detector compares the current request rate with the learned baseline.
It uses two main rules.
Rule 1: Z-score
The z-score checks how far the current traffic is from normal.
The formula is:
z = (current_rate - baseline_mean) / baseline_stddev
If the z-score is greater than 3.0, the traffic is treated as suspicious.
In simple terms:
If traffic is much higher than what the server normally sees, raise an alert.
Rule 2: 5x Baseline Rule
The detector also checks if the current rate is more than 5 times the baseline mean.
Example:
Baseline mean: 1 request/sec
Current rate: 6 requests/sec
That is more than 5 times the baseline.
So the detector treats it as suspicious.
This rule is useful because it is simple and easy to understand.
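Put together, the two rules can be expressed in a few lines. This is a simplified sketch; function and parameter names are illustrative:

```python
def is_suspicious(current_rate, baseline_mean, baseline_stddev,
                  z_threshold=3.0, multiplier_threshold=5.0):
    """Apply the two detection rules described above."""
    # Rule 1: z-score, i.e. how far above normal is the current rate?
    if baseline_stddev > 0:
        z = (current_rate - baseline_mean) / baseline_stddev
        if z > z_threshold:
            return True
    # Rule 2: simple multiplier, i.e. more than 5x the baseline mean?
    if baseline_mean > 0 and current_rate > multiplier_threshold * baseline_mean:
        return True
    return False

# Baseline of 1 req/s, current rate 6 req/s: more than 5x, so suspicious
print(is_suspicious(6.0, 1.0, 0.2))  # True
```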
Error Surge Detection
The detector also watches error responses.
Errors are HTTP status codes like:
- 404 Not Found
- 403 Forbidden
- 500 Internal Server Error
If one IP produces too many errors, it may be scanning or attacking invalid paths.
The detector checks:
IP error rate > 3x baseline error rate
If that happens, the detector tightens the thresholds for that IP.
Normal thresholds:
- z-score > 3.0
- rate > 5x baseline
Tightened thresholds:
- z-score > 2.0
- rate > 3x baseline
This means the detector becomes stricter when an IP is already behaving suspiciously.
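In code, this can be as simple as picking a threshold pair per IP. A sketch with illustrative names:

```python
def thresholds_for_ip(ip_error_rate, baseline_error_rate):
    """Pick (z_threshold, rate_multiplier): stricter for IPs producing many errors."""
    if baseline_error_rate > 0 and ip_error_rate > 3 * baseline_error_rate:
        return 2.0, 3.0  # tightened thresholds
    return 3.0, 5.0      # normal thresholds
```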
Per-IP Anomaly vs Global Anomaly
The detector handles two types of anomalies.
Per-IP Anomaly
This happens when one IP is sending too much traffic.
Example:
One IP sends 500 requests quickly.
The detector responds by blocking that IP.
Global Anomaly
This happens when the whole server receives too much traffic.
Example:
The total server traffic suddenly becomes much higher than normal.
For global anomalies, the detector sends a Slack alert but does not block everyone.
This is important because a global spike might be caused by:
- Many normal users
- A test
- A campaign
- A real attack
Blocking everyone would be dangerous.
So global anomaly only sends an alert.
How iptables Blocks an IP
To block aggressive IPs, the detector uses iptables.
iptables is a Linux firewall tool.
A firewall decides whether traffic should be accepted or dropped.
The blocking command looks like this:
iptables -I DOCKER-USER -s BAD_IP -j DROP
This means:
If traffic comes from BAD_IP, drop it.
Once the rule is active, traffic from that IP should no longer reach Nginx or Nextcloud.
The flow becomes:
Blocked IP
↓
iptables
↓
DROP
The request does not reach the application.
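From Python, the detector can run this command with subprocess. A minimal sketch, assuming the process has root privileges (which iptables requires):

```python
import subprocess

def block_ip(ip):
    """Insert a DROP rule for this IP at the top of the DOCKER-USER chain."""
    subprocess.run(
        ["iptables", "-I", "DOCKER-USER", "-s", ip, "-j", "DROP"],
        check=True,  # raise if the command fails (e.g. missing privileges)
    )
```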
Why I Used the DOCKER-USER Chain
At first, I thought blocking in the INPUT chain was enough.
But Nginx runs inside Docker, and Docker manages its own iptables rules. Traffic to Docker-published ports is forwarded to the container through the FORWARD chain, so a rule in the INPUT chain does not reliably block it.
So I used the DOCKER-USER chain instead.
Docker evaluates DOCKER-USER before its own forwarding rules, which makes it the recommended place for custom firewall rules on a Docker host.
The command is:
iptables -I DOCKER-USER -s BAD_IP -j DROP
To inspect the rules:
sudo iptables -L DOCKER-USER -n --line-numbers
Example output:
```
Chain DOCKER-USER (1 references)
num  target  prot  opt  source          destination
1    DROP    all   --   102.91.92.195   0.0.0.0/0
```
This proves the IP is blocked.
Auto-Unban Logic
Blocking forever is not always the best idea.
So I added an auto-unban system.
The ban schedule is:
- 1st offense → 10 minutes
- 2nd offense → 30 minutes
- 3rd offense → 2 hours
- 4th offense → permanent
The detector stores ban state in:
detector/state.json
This file remembers:
- Which IP is banned
- How many times it has been banned
- When it was banned
- When it should be unbanned
- Whether the ban is permanent
The auto-unbanner checks the state file regularly.
If the ban time has expired, it removes the iptables rule:
iptables -D DOCKER-USER -s IP -j DROP
Then it sends a Slack message and writes an audit log entry.
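A sketch of that unban loop is below. The shape of state.json here is my assumption, not the project's actual schema:

```python
import json
import subprocess
import time

STATE_PATH = "detector/state.json"

def unban_expired():
    """Remove iptables rules for bans whose time has expired."""
    with open(STATE_PATH) as f:
        # Assumed shape: {"bans": [{"ip": ..., "unban_at": ..., "permanent": ...}]}
        state = json.load(f)
    now = time.time()
    still_banned = []
    for ban in state.get("bans", []):
        if not ban.get("permanent") and now >= ban["unban_at"]:
            # Delete the matching DROP rule from the DOCKER-USER chain.
            subprocess.run(
                ["iptables", "-D", "DOCKER-USER", "-s", ban["ip"], "-j", "DROP"],
                check=False,  # the rule may already be gone
            )
        else:
            still_banned.append(ban)
    state["bans"] = still_banned
    with open(STATE_PATH, "w") as f:
        json.dump(state, f, indent=2)
```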
Slack Alerts
Slack alerts are used so that I do not have to watch the terminal all the time.
The detector sends Slack alerts for:
- IP bans
- IP unbans
- Global anomalies
- Error surges
The Slack webhook is not stored directly inside config.yaml.
Instead, config.yaml points to an environment variable:
```yaml
slack:
  enabled: true
  webhook_url_env: "SLACK_WEBHOOK_URL"
```
The real webhook is stored in .env:
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
The .env file is ignored by Git so the secret is not exposed.
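Sending the alert itself only needs the standard library. A minimal sketch; Slack incoming webhooks accept a JSON body with a "text" field:

```python
import json
import os
import urllib.request

def send_slack_alert(text):
    """Post a message to the Slack webhook read from the environment."""
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if not webhook_url:
        return  # alerting is effectively disabled if the variable is missing
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)

# send_slack_alert("BAN 102.91.92.195 | rate exceeded 5x baseline | 10 minutes")
```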
Live Metrics Dashboard
The project also has a live dashboard.
The dashboard shows:
- Banned IPs
- Global requests per second
- Top 10 source IPs
- CPU usage
- Memory usage
- Effective mean
- Effective standard deviation
- Uptime
The dashboard refreshes at least every 3 seconds.
This helps the reviewer see what the detector is doing live.
The dashboard is served through a DuckDNS subdomain, while Nextcloud remains accessible by server IP only.
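The post does not show the dashboard code, but one simple way to build such a dashboard is a small HTTP endpoint serving a JSON snapshot that the page polls every few seconds. A hypothetical sketch, not the project's actual implementation:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical shared snapshot, updated by the detector's threads.
metrics = {
    "banned_ips": [],
    "requests_per_second": 0.0,
    "top_ips": [],
    "effective_mean": 0.0,
    "effective_stddev": 0.0,
}

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the current metrics snapshot as JSON for the front end to poll.
        body = json.dumps(metrics).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("0.0.0.0", 8080), MetricsHandler).serve_forever()
```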
Audit Logs
The audit log is the permanent record of what happened.
It is stored at:
detector/audit.log
The format is:
[timestamp] ACTION ip | condition | rate | baseline | duration
Examples:
[2026-04-29T13:00:00+00:00] BASELINE - | recalculated using rolling-window | - | mean=0.1000,stddev=0.1000,error_rate=0.0100,samples=60,hour=13 | -
[2026-04-29T13:05:00+00:00] GLOBAL_ANOMALY - | z-score 8.50 exceeded threshold 3.00 | 30.00/s | mean=2.0000,stddev=1.0000,z=28.00,multiplier=15.00x | -
[2026-04-29T13:06:00+00:00] BAN 102.91.92.195 | rate exceeded 5x baseline | 5.00/s | mean=0.5000,stddev=0.2000,z=22.50,multiplier=10.00x,tightened=False | 10 minutes
[2026-04-29T13:16:00+00:00] UNBAN 102.91.92.195 | previous ban expired | - | - | 10 minutes
This log proves that the detector is not just running, but actually making decisions and recording them.
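Writing such a line is straightforward. Here is a sketch that reproduces the format above; the function name is illustrative:

```python
from datetime import datetime, timezone

AUDIT_PATH = "detector/audit.log"

def write_audit(action, ip="-", condition="-", rate="-", baseline="-", duration="-"):
    """Append one line in the audit format shown above."""
    ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(AUDIT_PATH, "a") as f:
        f.write(f"[{ts}] {action} {ip} | {condition} | {rate} | {baseline} | {duration}\n")

# write_audit("BAN", "102.91.92.195", "rate exceeded 5x baseline",
#             "5.00/s", "mean=0.5000,stddev=0.2000", "10 minutes")
```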
What I Learned
This project helped me understand how monitoring and security automation can work together.
The most important things I learned were:
- Nginx can be used as both a reverse proxy and a source of logs.
- Logs are not just for debugging; they can be used to detect suspicious behavior.
- A sliding window helps track recent traffic.
- A baseline helps define what "normal" means.
- iptables can block traffic at the Linux firewall level.
- Docker networking affects where firewall rules should be placed.
- Slack alerts make the system easier to monitor.
- Audit logs are important for proof and debugging.
I also learned that building a system like this is easier when broken into small parts.
Instead of trying to build everything at once, I built it step by step:
- Deploy app
- Add logs
- Read logs
- Count traffic
- Learn baseline
- Detect anomaly
- Send alert
- Block IP
- Auto-unban
- Show dashboard
- Write audit logs
That made the project easier to understand.
Conclusion
This project is a small but practical example of how a server can monitor and defend itself.
It is not a full enterprise security system, but it shows the foundation of one:
- observe traffic
- learn normal behavior
- detect abnormal behavior
- respond automatically
- record what happened
For me, the biggest lesson is that DevOps is not only about deploying applications. It is also about understanding how applications behave in real conditions and building systems that can respond when something goes wrong.