A lightweight, containerized anomaly detection system that monitors traffic in real time, detects abuse patterns, and automatically blocks malicious IPs at the host firewall level.
I built a real-time anomaly detection system that monitors nginx access logs, computes adaptive rolling baselines per time window, detects traffic anomalies using statistical methods (z-score + spike multipliers), and automatically blocks malicious IPs using host-level iptables rules. The system includes Slack alerts and a live dashboard for observability and debugging.
## Background / Motivation
Modern systems face constant threats such as:
- DDoS attacks
- Credential stuffing
- API abuse and scraping bots
- Sudden traffic spikes that degrade service
Most production solutions rely on expensive managed WAFs or cloud security tools. I wanted to build a low-cost, self-hosted anomaly detection engine that runs entirely on a VPS using logs, statistics, and system-level enforcement.
Constraints:
- Must be containerized (Docker-based)
- Must run on low-cost VPS infrastructure
- Must use logs (not packet inspection tools)
- Must enforce bans at host level (not only inside containers)
- Must provide real-time visibility and debugging
## What I Built
A full-stack anomaly detection pipeline composed of:
- Detector Service (Python)
- Baseline Engine (rolling statistical model)
- Blocker Service (iptables enforcement on host)
- Dashboard (real-time monitoring UI)
- Slack Alerting System (incident notifications)
## How It Works
Nginx logs every request as structured JSON:

```json
{
  "ip": "1.2.3.4",
  "endpoint": "/",
  "status": 200,
  "timestamp": 1710000000
}
```
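One way to produce entries in this shape is a custom `log_format` in the nginx config. This is a sketch, not the repo's exact configuration; the format name `json_combined` and the log path are assumptions:

```nginx
# Emit one JSON object per request. escape=json (nginx >= 1.11.8)
# ensures values are safely escaped for JSON.
log_format json_combined escape=json
  '{'
    '"ip":"$remote_addr",'
    '"endpoint":"$request_uri",'
    '"status":$status,'
    '"timestamp":$msec'
  '}';

access_log /var/log/nginx/access.json json_combined;
```

Note that `$msec` yields a float (seconds with millisecond precision), which the detector can truncate to an integer second.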
## From Logs to Detection
Once nginx writes request logs, the detector continuously processes them in real time.
Each incoming log entry goes through the following pipeline:
- Parse JSON log entry
- Extract IP, timestamp, and status code
- Update per-second counters
- Feed values into rolling baseline engine
- Evaluate anomaly conditions
This pipeline runs continuously with minimal latency, ensuring near real-time detection.
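The per-entry pipeline above can be sketched in a few lines of Python. The function and counter names here are illustrative, not the repo's actual API:

```python
import json
from collections import defaultdict

# Per-second counters keyed by epoch second.
per_second = defaultdict(int)
errors_per_second = defaultdict(int)

def process_line(line: str):
    """Parse one JSON log entry and update per-second counters."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None                       # skip malformed lines
    sec = int(entry["timestamp"])
    per_second[sec] += 1                  # total request rate
    if entry["status"] >= 500:
        errors_per_second[sec] += 1       # server-error rate
    return entry

entry = process_line(
    '{"ip":"1.2.3.4","endpoint":"/","status":200,"timestamp":1710000000}'
)
```

In the real system these counters would then be fed into the rolling baseline engine each second.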
## Rolling Baseline Behavior
The system does not rely on fixed thresholds. Instead, it learns traffic behavior over time.
For each time window, the baseline tracks:
- Average request rate (mean)
- Standard deviation (traffic variability)
- Traffic distribution per second
This allows the system to adapt dynamically to traffic changes.
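A minimal rolling baseline can be sketched with a fixed-size window of per-second counts; the window size of 60 seconds here is an assumption, not the repo's actual setting:

```python
from collections import deque
import statistics

class RollingBaseline:
    """Tracks mean and standard deviation over the last N per-second counts."""

    def __init__(self, window: int = 60):
        self.counts = deque(maxlen=window)  # old values fall off automatically

    def update(self, count: int) -> None:
        self.counts.append(count)

    @property
    def mean(self) -> float:
        return statistics.fmean(self.counts) if self.counts else 0.0

    @property
    def stdev(self) -> float:
        return statistics.pstdev(self.counts) if len(self.counts) > 1 else 0.0

baseline = RollingBaseline(window=5)
for count in [10, 12, 11, 9, 10]:
    baseline.update(count)
```

Because the deque has a fixed `maxlen`, the baseline naturally "forgets" old traffic, which is what lets it adapt to gradual changes while still flagging sudden spikes.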
Example behavior:
- Normal traffic period → stable baseline
- Gradual increase → baseline adjusts slightly
- Sudden spike → deviation becomes statistically significant
## Anomaly Decision Process
Every second, the detector evaluates:
- Current request rate vs baseline mean
- z-score deviation
- Spike multiplier threshold
- Error rate deviation
If any condition exceeds configured thresholds, the IP or system state is flagged.
This ensures:
- Low false positives during normal usage
- Fast reaction to sudden abuse patterns
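The decision logic can be sketched as a single check combining both rules. The thresholds here (z > 3, 5× spike) are illustrative defaults, not the project's configured values:

```python
def is_anomalous(rate: float, mean: float, stdev: float,
                 z_threshold: float = 3.0,
                 spike_multiplier: float = 5.0) -> bool:
    """Flag a per-second rate as anomalous against the rolling baseline."""
    # Spike rule: catches abrupt bursts even while stdev is still small.
    if mean > 0 and rate >= spike_multiplier * mean:
        return True
    # Z-score rule: statistically significant deviation from the baseline.
    if stdev > 0 and (rate - mean) / stdev > z_threshold:
        return True
    return False
```

Combining both rules is what keeps false positives low: the z-score needs a mature baseline, while the spike multiplier reacts instantly to order-of-magnitude bursts.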
## Blocking Execution Flow
When an anomaly is confirmed, the system does not block immediately inside the application layer.
Instead, it uses a decoupled enforcement pipeline:
- IP is added to a shared ban queue
- Host worker process reads queue
- Firewall rule is applied at kernel level
This ensures:
- Separation of detection and enforcement
- Reliability even if app crashes
- Immediate packet-level blocking
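The decoupled enforcement pipeline can be sketched as a queue plus a host-side worker. The in-process queue below is for illustration only; a real deployment would use a shared queue (e.g. a file or Redis list) to cross the container boundary, and the worker would need root on the host:

```python
import queue
import subprocess

ban_queue: "queue.Queue[str]" = queue.Queue()
already_banned = set()

def enqueue_ban(ip: str) -> None:
    """Detector side: queue an IP once (duplicate ban suppression)."""
    if ip not in already_banned:
        already_banned.add(ip)
        ban_queue.put(ip)

def iptables_cmd(ip: str):
    # Insert at the top of the DOCKER-USER chain so packets are dropped
    # before they reach the container network stack.
    return ["iptables", "-I", "DOCKER-USER", "-s", ip, "-j", "DROP"]

def worker_step(dry_run: bool = True):
    """Host side: drain one IP from the queue and apply the firewall rule."""
    try:
        ip = ban_queue.get_nowait()
    except queue.Empty:
        return None
    cmd = iptables_cmd(ip)
    if not dry_run:
        subprocess.run(cmd, check=True)   # requires root on the host
    return cmd
```

Because the detector only writes to the queue, a detector crash never leaves the firewall half-applied, and a worker crash never loses detection state.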
## Why Host-Level Blocking Matters
Blocking inside containers or application code is not sufficient because:
- Traffic may already be routed through Docker bridge
- App-level blocking still consumes resources
- Reverse proxies may already forward requests
Inserting DROP rules into the iptables `DOCKER-USER` chain ensures traffic is dropped before it ever reaches the container network stack, which makes enforcement fast and reliable.
## Observability Layer
To ensure visibility, the system exposes:
- Live request rate graphs
- Current baseline values
- Active banned IP list
- Recent anomaly events
The dashboard updates in real time based on detector outputs.
## Testing Strategy (k6)
The system is validated using controlled load testing:
- Gradual ramp-up tests
- Sudden spike injection
- Sustained high traffic simulation
This ensures:
- Baseline accuracy
- Proper z-score calibration
- Reliable ban triggering
## System Reliability Design
Several mechanisms improve stability:
- Warm-up period (prevents early noise)
- Duplicate ban suppression
- Rolling window smoothing
- Queue-based enforcement (decoupled architecture)
These ensure the system remains stable under continuous load.
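The warm-up mechanism, for example, can be sketched as a simple gate that suppresses anomaly decisions until the baseline has had time to settle. The 120-second default is an assumption:

```python
import time

class WarmupGate:
    """Suppress anomaly decisions until the baseline has warmed up."""

    def __init__(self, warmup_seconds: int = 120):
        self.started = time.monotonic()
        self.warmup_seconds = warmup_seconds

    def ready(self) -> bool:
        return time.monotonic() - self.started >= self.warmup_seconds
```

Without this gate, the first few window updates would produce a near-zero mean and standard deviation, and ordinary startup traffic would look like a massive spike.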
## Summary of Flow
- Nginx logs requests
- Detector parses logs
- Baseline is updated
- Anomaly detected using statistical rules
- IP is queued for blocking
- Host worker applies firewall rule
- Slack alert is sent
- Dashboard reflects updated state
Full code and workflow: https://github.com/izzyjosh/cloud-anomaly-detector