A lightweight, containerized anomaly detection system that monitors traffic in real time, detects abuse patterns, and automatically blocks malicious IPs at the host firewall level.
I built a real-time anomaly detection system that monitors nginx access logs, computes adaptive rolling baselines per time window, detects traffic anomalies using statistical methods (z-score + spike multipliers), and automatically blocks malicious IPs using host-level iptables rules. The system includes Slack alerts and a live dashboard for observability and debugging.
## Background / Motivation
Modern systems face constant threats such as:
- DDoS attacks
- Credential stuffing
- API abuse and scraping bots
- Sudden traffic spikes that degrade service
Most production solutions rely on expensive managed WAFs or cloud security tools. I wanted to build a low-cost, self-hosted anomaly detection engine that runs entirely on a VPS using logs, statistics, and system-level enforcement.
Constraints:
- Must be containerized (Docker-based)
- Must run on low-cost VPS infrastructure
- Must use logs (not packet inspection tools)
- Must enforce bans at host level (not only inside containers)
- Must provide real-time visibility and debugging
## What I Built
A full-stack anomaly detection pipeline composed of:
- Detector Service (Python)
- Baseline Engine (rolling statistical model)
- Blocker Service (iptables enforcement on host)
- Dashboard (real-time monitoring UI)
- Slack Alerting System (incident notifications)
## How It Works
Nginx logs every request as structured JSON:

```json
{
  "ip": "1.2.3.4",
  "endpoint": "/",
  "status": 200,
  "timestamp": 1710000000
}
```
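One way to produce entries in this shape is a custom `log_format` in the nginx config. This is a sketch, not the repo's exact configuration; the format name `json_combined` and the log path are assumptions:

```nginx
# Emit one JSON object per request. escape=json (nginx >= 1.11.8)
# ensures values are safely escaped for JSON.
log_format json_combined escape=json
  '{'
    '"ip":"$remote_addr",'
    '"endpoint":"$request_uri",'
    '"status":$status,'
    '"timestamp":$msec'
  '}';

access_log /var/log/nginx/access.json json_combined;
```

Note that `$msec` yields a float (seconds with millisecond precision), which the detector can truncate to an integer second.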
## From Logs to Detection
Once nginx writes request logs, the detector continuously processes them in real time.
Each incoming log entry goes through the following pipeline:
- Parse JSON log entry
- Extract IP, timestamp, and status code
- Update per-second counters
- Feed values into rolling baseline engine
- Evaluate anomaly conditions
This pipeline runs continuously with minimal latency, ensuring near real-time detection.
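The per-entry pipeline above can be sketched in a few lines of Python. The function and counter names here are illustrative, not the repo's actual API:

```python
import json
from collections import defaultdict

# Per-second counters keyed by epoch second.
per_second = defaultdict(int)
errors_per_second = defaultdict(int)

def process_line(line: str):
    """Parse one JSON log entry and update per-second counters."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None                       # skip malformed lines
    sec = int(entry["timestamp"])
    per_second[sec] += 1                  # total request rate
    if entry["status"] >= 500:
        errors_per_second[sec] += 1       # server-error rate
    return entry

entry = process_line(
    '{"ip":"1.2.3.4","endpoint":"/","status":200,"timestamp":1710000000}'
)
```

In the real system these counters would then be fed into the rolling baseline engine each second.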
## Rolling Baseline Behavior
The system does not rely on fixed thresholds. Instead, it learns traffic behavior over time.
For each time window, the baseline tracks:
- Average request rate (mean)
- Standard deviation (traffic variability)
- Traffic distribution per second
This allows the system to adapt dynamically to traffic changes.
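A minimal rolling baseline can be sketched with a fixed-size window of per-second counts; the window size of 60 seconds here is an assumption, not the repo's actual setting:

```python
from collections import deque
import statistics

class RollingBaseline:
    """Tracks mean and standard deviation over the last N per-second counts."""

    def __init__(self, window: int = 60):
        self.counts = deque(maxlen=window)  # old values fall off automatically

    def update(self, count: int) -> None:
        self.counts.append(count)

    @property
    def mean(self) -> float:
        return statistics.fmean(self.counts) if self.counts else 0.0

    @property
    def stdev(self) -> float:
        return statistics.pstdev(self.counts) if len(self.counts) > 1 else 0.0

baseline = RollingBaseline(window=5)
for count in [10, 12, 11, 9, 10]:
    baseline.update(count)
```

Because the deque has a fixed `maxlen`, the baseline naturally "forgets" old traffic, which is what lets it adapt to gradual changes while still flagging sudden spikes.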
Example behavior:
- Normal traffic period → stable baseline
- Gradual increase → baseline adjusts slightly
- Sudden spike → deviation becomes statistically significant
## Anomaly Decision Process
Every second, the detector evaluates:
- Current request rate vs baseline mean
- z-score deviation
- Spike multiplier threshold
- Error rate deviation
If any condition exceeds configured thresholds, the IP or system state is flagged.
This ensures:
- Low false positives during normal usage
- Fast reaction to sudden abuse patterns
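The decision logic can be sketched as a single check combining both rules. The thresholds here (z > 3, 5× spike) are illustrative defaults, not the project's configured values:

```python
def is_anomalous(rate: float, mean: float, stdev: float,
                 z_threshold: float = 3.0,
                 spike_multiplier: float = 5.0) -> bool:
    """Flag a per-second rate as anomalous against the rolling baseline."""
    # Spike rule: catches abrupt bursts even while stdev is still small.
    if mean > 0 and rate >= spike_multiplier * mean:
        return True
    # Z-score rule: statistically significant deviation from the baseline.
    if stdev > 0 and (rate - mean) / stdev > z_threshold:
        return True
    return False
```

Combining both rules is what keeps false positives low: the z-score needs a mature baseline, while the spike multiplier reacts instantly to order-of-magnitude bursts.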
## Blocking Execution Flow
When an anomaly is confirmed, the system does not block immediately inside the application layer.
Instead, it uses a decoupled enforcement pipeline:
- IP is added to a shared ban queue
- Host worker process reads queue
- Firewall rule is applied at kernel level
This ensures:
- Separation of detection and enforcement
- Reliability even if app crashes
- Immediate packet-level blocking
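The decoupled enforcement pipeline can be sketched as a queue plus a host-side worker. The in-process queue below is for illustration only; a real deployment would use a shared queue (e.g. a file or Redis list) to cross the container boundary, and the worker would need root on the host:

```python
import queue
import subprocess

ban_queue: "queue.Queue[str]" = queue.Queue()
already_banned = set()

def enqueue_ban(ip: str) -> None:
    """Detector side: queue an IP once (duplicate ban suppression)."""
    if ip not in already_banned:
        already_banned.add(ip)
        ban_queue.put(ip)

def iptables_cmd(ip: str):
    # Insert at the top of the DOCKER-USER chain so packets are dropped
    # before they reach the container network stack.
    return ["iptables", "-I", "DOCKER-USER", "-s", ip, "-j", "DROP"]

def worker_step(dry_run: bool = True):
    """Host side: drain one IP from the queue and apply the firewall rule."""
    try:
        ip = ban_queue.get_nowait()
    except queue.Empty:
        return None
    cmd = iptables_cmd(ip)
    if not dry_run:
        subprocess.run(cmd, check=True)   # requires root on the host
    return cmd
```

Because the detector only writes to the queue, a detector crash never leaves the firewall half-applied, and a worker crash never loses detection state.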
## Why Host-Level Blocking Matters
Blocking inside containers or application code is not sufficient because:
- Traffic may already be routed through Docker bridge
- App-level blocking still consumes resources
- Reverse proxies may already forward requests
Inserting DROP rules into the iptables `DOCKER-USER` chain ensures traffic is dropped before it ever reaches the container network stack, which makes enforcement fast and reliable.
## Observability Layer
To ensure visibility, the system exposes:
- Live request rate graphs
- Current baseline values
- Active banned IP list
- Recent anomaly events
The dashboard updates in real time based on detector outputs.
## Testing Strategy (k6)
The system is validated using controlled load testing:
- Gradual ramp-up tests
- Sudden spike injection
- Sustained high traffic simulation
This ensures:
- Baseline accuracy
- Proper z-score calibration
- Reliable ban triggering
## System Reliability Design
Several mechanisms improve stability:
- Warm-up period (prevents early noise)
- Duplicate ban suppression
- Rolling window smoothing
- Queue-based enforcement (decoupled architecture)
These ensure the system remains stable under continuous load.
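The warm-up mechanism, for example, can be sketched as a simple gate that suppresses anomaly decisions until the baseline has had time to settle. The 120-second default is an assumption:

```python
import time

class WarmupGate:
    """Suppress anomaly decisions until the baseline has warmed up."""

    def __init__(self, warmup_seconds: int = 120):
        self.started = time.monotonic()
        self.warmup_seconds = warmup_seconds

    def ready(self) -> bool:
        return time.monotonic() - self.started >= self.warmup_seconds
```

Without this gate, the first few window updates would produce a near-zero mean and standard deviation, and ordinary startup traffic would look like a massive spike.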
## Summary of Flow
- Nginx logs requests
- Detector parses logs
- Baseline is updated
- Anomaly detected using statistical rules
- IP is queued for blocking
- Host worker applies firewall rule
- Slack alert is sent
- Dashboard reflects updated state
Full code and workflow: https://github.com/izzyjosh/cloud-anomaly-detector