When evaluating a Web Application Firewall (WAF), detection accuracy is only half the story.
The other half — often overlooked — is performance.
How much traffic can it realistically handle?
Where are the bottlenecks?
And what kind of tuning is required to avoid the WAF itself becoming the weakest link?
To answer these questions, a developer conducted a hands-on stress test of SafeLine, a free and self-hosted WAF, focusing on its real-world performance characteristics under load.
Background
SafeLine is an open-source WAF built around semantic analysis rather than traditional rule-based matching.
It has gained attention for offering:
- High detection accuracy
- Low false-positive rates
- Protection against unknown (zero-day–style) payloads
- A fully self-hosted deployment model
But while detection capabilities are frequently discussed, hard performance data is rarely shared.
The goal of this test was simple:
Measure how SafeLine behaves under sustained load, identify bottlenecks, and determine how much traffic it can realistically inspect per CPU core.
Test Environment
All tests were performed on a single machine to eliminate network variability.
Hardware & Software
- CPU: Intel i7-12700
- Memory: 64 GB DDR5 (4800 MHz)
- Kernel: 5.17.15
- Docker: 20.10.21
- SafeLine version: 1.3.0
Deployment Model
- SafeLine deployed via Docker
- SafeLine reverse proxy in front of a local Nginx “business server”
- Backend server returns a simple 200 OK for all requests
This setup isolates WAF inspection overhead from application logic.
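For reference, a backend like this takes only a few lines of Nginx config. The sketch below is illustrative rather than the tester's exact setup; the file name, port, and the nginx:stable image are assumptions.

```bash
# Minimal "business server" that answers 200 OK to everything.
# File name and port are hypothetical; adjust to your environment.
cat > backend.conf <<'EOF'
server {
    listen 8080;
    location / {
        return 200 "OK";
    }
}
EOF

# Run it with the stock nginx image, mounting the config read-only.
docker run -d --name backend \
  -v "$PWD/backend.conf:/etc/nginx/conf.d/default.conf:ro" \
  -p 8080:8080 nginx:stable
```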
Understanding SafeLine’s Architecture
Before testing performance, the tester first identified which internal services actually scale with traffic.
Three containers showed load directly correlated with QPS:
- safeline-tengine
  - Based on Tengine (an Nginx variant)
  - Acts as the reverse proxy and traffic entry point
- safeline-detector
  - Core detection engine
  - Performs semantic inspection of HTTP requests
- safeline-mario
  - Responsible for processing and persisting detection logs
  - Becomes relevant under sustained high traffic
Understanding these roles was critical for interpreting bottlenecks later.
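To reproduce this kind of per-container observation, plain `docker stats` is sufficient. The container names below assume a default SafeLine Docker deployment:

```bash
# Watch CPU/memory of the three traffic-scaling containers while load runs.
# Container names assume SafeLine's default compose deployment.
docker stats safeline-tengine safeline-detector safeline-mario
```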
Testing Methodology
Tools Used
- wrk — for maximum throughput testing
- wrk2 — for fixed-QPS testing and stability validation
Request Types
Two request patterns were tested:
- Simple requests
  - GET requests without a body
- Complex requests
  - GET requests with a 1 KB JSON body
Since WAFs operate at the HTTP layer, QPS (Queries Per Second) was chosen as the primary metric, rather than raw bandwidth.
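For the simple-request case, a maximum-throughput run is a one-liner with wrk. Thread count, connection count, duration, and URL below are illustrative, not the tester's exact parameters:

```bash
# Maximum-throughput run for simple GET requests (no body).
# 1 wrk thread, 100 connections, 60 seconds; URL is illustrative.
wrk -t1 -c100 -d60s http://127.0.0.1/
```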
Baseline Test: Simple Requests
To establish a baseline, all SafeLine services were limited to one CPU core each.
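One way to impose such a limit is Docker's cpuset controls. The core indices and container names in this sketch are assumptions:

```bash
# Pin each SafeLine service to a single, distinct physical core.
# Core indices and container names are illustrative.
docker update --cpuset-cpus="0" safeline-tengine
docker update --cpuset-cpus="1" safeline-detector
docker update --cpuset-cpus="2" safeline-mario
```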
Initial Results
- Maximum QPS: ~4,175
- First bottleneck: safeline-detector
CPU profiling revealed that the detector (snserver) runs as a multi-threaded process, even when constrained to a single core.
This caused excessive context switching and reduced efficiency.
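Per-thread context switching of this kind can be confirmed from the host with pidstat (part of the sysstat package). The process name snserver comes from the article; the match pattern may need adjusting on other builds:

```bash
# Report per-thread voluntary/involuntary context switches every second
# for the detector process. Adjust the pattern if the name differs.
pidstat -w -t -p "$(pgrep -f snserver | head -n1)" 1
```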
Thread Tuning
Two settings were manually reduced:
- Detector thread count → 1
- Nginx worker processes → 1
With these changes, performance improved dramatically.
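The Nginx half of this tuning is a standard worker_processes change; the detector's thread count is a SafeLine-internal setting with no published knob, so the sketch below covers only the Nginx side and flags the detector as out of scope:

```bash
# Inside the tengine container: force a single worker process.
# The config path is the usual Nginx default and may differ in SafeLine;
# this also assumes the nginx binary is on PATH inside the container.
docker exec safeline-tengine \
  sed -i 's/^worker_processes.*/worker_processes 1;/' /etc/nginx/nginx.conf
docker exec safeline-tengine nginx -s reload

# Detector thread count: a SafeLine-specific setting; consult its docs.
# The exact mechanism used in the test is not published in the article.
```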
Optimized Result
- Over 17,000 QPS on a single core for simple GET requests
However, a new bottleneck appeared.
Logging Becomes the Bottleneck
At high QPS:
- safeline-mario hit 100% CPU
- Memory usage grew continuously beyond 2 GB
- CPU usage stayed high even after traffic stopped
This behavior indicated that log processing could not keep up, causing internal queues to back up.
Key takeaway:
For extremely high QPS with simple requests, logging throughput — not detection — becomes the limiting factor.
Complex Requests: More Realistic Traffic
Next, the tester switched to requests with a 1 KB JSON payload, closer to real-world API traffic.
Results
- Maximum QPS: ~10,000 per core
- Bottleneck: safeline-detector, as expected
Key observations:
- Detection cost increases significantly with request complexity
- tengine and mario CPU usage dropped due to lower QPS
- Detector throughput became the dominant constraint
This aligns with how semantic inspection engines behave in practice.
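Runs like these can be approximated with wrk2 and a small Lua body script. The payload contents and target rate are illustrative, and the snippet assumes the wrk2 binary is installed under that name:

```bash
# Attach a ~1 KB JSON body to GET requests, as in the test.
cat > body.lua <<'EOF'
wrk.method = "GET"
wrk.headers["Content-Type"] = "application/json"
-- Pad a JSON object out to roughly 1 KB.
wrk.body = '{"data":"' .. string.rep("x", 1000) .. '"}'
EOF

# Fixed-rate run at 10k req/s to validate stability (wrk2's -R flag).
wrk2 -t1 -c100 -d60s -R10000 --latency -s body.lua http://127.0.0.1/
```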
Per-Service Capacity Summary
With 1 KB request bodies, the observed per-core limits were approximately:
| Component | Approx. Capacity (single core unless noted) |
|---|---|
| Detector | ~10,000 QPS |
| Tengine (Nginx) | ~28,000 QPS |
| Mario (logging) | ~11,000 QPS (with 2 cores) |
This makes the detector the primary scaling unit for SafeLine deployments.
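Those figures translate directly into capacity planning. Here is a back-of-the-envelope sketch using the measured per-core limits, with a made-up target load:

```bash
# Rough core budget for a target load, using the measured limits above.
target_qps=50000
detector_cores=$(( (target_qps + 9999) / 10000 ))   # ~10k QPS per core
tengine_cores=$(( (target_qps + 27999) / 28000 ))   # ~28k QPS per core
echo "detector: $detector_cores cores, tengine: $tengine_cores cores"
```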
What This Means for Real Deployments
From this real-world test, several practical conclusions emerge:
- SafeLine can comfortably handle 10k QPS per CPU core for realistic HTTP traffic
- Performance scales linearly with CPU when detector and logging resources are tuned correctly
- Default thread settings may not be optimal for constrained environments
- Log processing can become a hidden bottleneck at high traffic volumes
In other words:
SafeLine’s detection engine is fast — but production deployments must account for logging and threading behavior.
Final Thoughts
This test demonstrates that SafeLine is not just “good for a free WAF” — it is genuinely performant when properly configured.
For developers running self-hosted services, APIs, or internal platforms:
- It offers strong protection without relying on external SaaS
- Performance characteristics are predictable and tunable
- Scaling behavior is transparent and engineering-friendly
SafeLine may not eliminate the need for architectural planning, but it proves that self-hosted WAFs can scale to real traffic without becoming the bottleneck.
If you’re evaluating SafeLine, test it the same way you’d test your own services — under load, with realistic traffic, and with monitoring enabled. The results might surprise you.
