When evaluating a Web Application Firewall (WAF), detection accuracy is only half the story.
The other half — often overlooked — is performance.
How much traffic can it realistically handle?
Where are the bottlenecks?
And what kind of tuning is required to avoid the WAF itself becoming the weakest link?
To answer these questions, a developer conducted a hands-on stress test of SafeLine, a free and self-hosted WAF, focusing on its real-world performance characteristics under load.
Background
SafeLine is an open-source WAF built around semantic analysis rather than traditional rule-based matching.
It has gained attention for offering:
- High detection accuracy
- Low false-positive rates
- Protection against unknown (zero-day–style) payloads
- A fully self-hosted deployment model
But while detection capabilities are frequently discussed, hard performance data is rarely shared.
The goal of this test was simple:
Measure how SafeLine behaves under sustained load, identify bottlenecks, and determine how much traffic it can realistically inspect per CPU core.
Test Environment
All tests were performed on a single machine to eliminate network variability.
Hardware & Software
- CPU: Intel i7-12700
- Memory: 64 GB DDR5 (4800 MHz)
- Kernel: 5.17.15
- Docker: 20.10.21
- SafeLine version: 1.3.0
Deployment Model
- SafeLine deployed via Docker
- SafeLine reverse proxy in front of a local Nginx “business server”
- Backend server returns a simple 200 OK for all requests
This setup isolates WAF inspection overhead from application logic.
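For reference, a backend like this takes only a few lines of Nginx config. The sketch below is illustrative rather than the tester's exact setup; the file name, port, and the nginx:stable image are assumptions.

```bash
# Minimal "business server" that answers 200 OK to everything.
# File name and port are hypothetical; adjust to your environment.
cat > backend.conf <<'EOF'
server {
    listen 8080;
    location / {
        return 200 "OK";
    }
}
EOF

# Run it with the stock nginx image, mounting the config read-only.
docker run -d --name backend \
  -v "$PWD/backend.conf:/etc/nginx/conf.d/default.conf:ro" \
  -p 8080:8080 nginx:stable
```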
Understanding SafeLine’s Architecture
Before testing performance, the tester first identified which internal services actually scale with traffic.
Three containers showed load directly correlated with QPS:
- safeline-tengine
  - Based on Tengine (an Nginx variant)
  - Acts as the reverse proxy and traffic entry point
- safeline-detector
  - Core detection engine
  - Performs semantic inspection of HTTP requests
- safeline-mario
  - Responsible for processing and persisting detection logs
  - Becomes relevant under sustained high traffic
Understanding these roles was critical for interpreting bottlenecks later.
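To reproduce this kind of per-container observation, plain `docker stats` is sufficient. The container names below assume a default SafeLine Docker deployment:

```bash
# Watch CPU/memory of the three traffic-scaling containers while load runs.
# Container names assume SafeLine's default compose deployment.
docker stats safeline-tengine safeline-detector safeline-mario
```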
Testing Methodology
Tools Used
- wrk — for maximum throughput testing
- wrk2 — for fixed-QPS testing and stability validation
Request Types
Two request patterns were tested:
- Simple requests
  - GET requests without a body
- Complex requests
  - GET requests with a 1 KB JSON body
Since WAFs operate at the HTTP layer, QPS (Queries Per Second) was chosen as the primary metric, rather than raw bandwidth.
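For the simple-request case, a maximum-throughput run is a one-liner with wrk. Thread count, connection count, duration, and URL below are illustrative, not the tester's exact parameters:

```bash
# Maximum-throughput run for simple GET requests (no body).
# 1 wrk thread, 100 connections, 60 seconds; URL is illustrative.
wrk -t1 -c100 -d60s http://127.0.0.1/
```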
Baseline Test: Simple Requests
To establish a baseline, all SafeLine services were limited to one CPU core each.
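One way to impose such a limit is Docker's cpuset controls. The core indices and container names in this sketch are assumptions:

```bash
# Pin each SafeLine service to a single, distinct physical core.
# Core indices and container names are illustrative.
docker update --cpuset-cpus="0" safeline-tengine
docker update --cpuset-cpus="1" safeline-detector
docker update --cpuset-cpus="2" safeline-mario
```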
Initial Results
- Maximum QPS: ~4,175
- First bottleneck: safeline-detector
CPU profiling revealed that the detector (snserver) runs as a multi-threaded process, even when constrained to a single core.
This caused excessive context switching and reduced efficiency.
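Per-thread context switching of this kind can be confirmed from the host with pidstat (part of the sysstat package). The process name snserver comes from the article; the match pattern may need adjusting on other builds:

```bash
# Report per-thread voluntary/involuntary context switches every second
# for the detector process. Adjust the pattern if the name differs.
pidstat -w -t -p "$(pgrep -f snserver | head -n1)" 1
```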
Thread Tuning
Two settings were manually reduced:
- Detector thread count → 1
- Nginx worker processes → 1
With these changes, performance improved dramatically.
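The Nginx half of this tuning is a standard worker_processes change; the detector's thread count is a SafeLine-internal setting with no published knob, so the sketch below covers only the Nginx side and flags the detector as out of scope:

```bash
# Inside the tengine container: force a single worker process.
# The config path is the usual Nginx default and may differ in SafeLine;
# this also assumes the nginx binary is on PATH inside the container.
docker exec safeline-tengine \
  sed -i 's/^worker_processes.*/worker_processes 1;/' /etc/nginx/nginx.conf
docker exec safeline-tengine nginx -s reload

# Detector thread count: a SafeLine-specific setting; consult its docs.
# The exact mechanism used in the test is not published in the article.
```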
Optimized Result
- Over 17,000 QPS on a single core for simple GET requests
However, a new bottleneck appeared.
Logging Becomes the Bottleneck
At high QPS:
- safeline-mario hit 100% CPU
- Memory usage grew continuously beyond 2 GB
- CPU usage stayed high even after traffic stopped
This behavior indicated that log processing could not keep up, causing internal queues to back up.
Key takeaway:
For extremely high QPS with simple requests, logging throughput — not detection — becomes the limiting factor.
Complex Requests: More Realistic Traffic
Next, the tester switched to requests with a 1 KB JSON payload, closer to real-world API traffic.
Results
- Maximum QPS: ~10,000 per core
- Bottleneck: safeline-detector, as expected
Key observations:
- Detection cost increases significantly with request complexity
- tengine and mario CPU usage dropped due to lower QPS
- Detector throughput became the dominant constraint
This aligns with how semantic inspection engines behave in practice.
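Runs like these can be approximated with wrk2 and a small Lua body script. The payload contents and target rate are illustrative, and the snippet assumes the wrk2 binary is installed under that name:

```bash
# Attach a ~1 KB JSON body to GET requests, as in the test.
cat > body.lua <<'EOF'
wrk.method = "GET"
wrk.headers["Content-Type"] = "application/json"
-- Pad a JSON object out to roughly 1 KB.
wrk.body = '{"data":"' .. string.rep("x", 1000) .. '"}'
EOF

# Fixed-rate run at 10k req/s to validate stability (wrk2's -R flag).
wrk2 -t1 -c100 -d60s -R10000 --latency -s body.lua http://127.0.0.1/
```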
Per-Service Capacity Summary
With 1 KB request bodies, the observed per-core limits were approximately:
| Component | Approx. Capacity (single core unless noted) |
|---|---|
| Detector | ~10,000 QPS |
| Tengine (Nginx) | ~28,000 QPS |
| Mario (logging) | ~11,000 QPS (with 2 cores) |
This makes the detector the primary scaling unit for SafeLine deployments.
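Those figures translate directly into capacity planning. Here is a back-of-the-envelope sketch using the measured per-core limits, with a made-up target load:

```bash
# Rough core budget for a target load, using the measured limits above.
target_qps=50000
detector_cores=$(( (target_qps + 9999) / 10000 ))   # ~10k QPS per core
tengine_cores=$(( (target_qps + 27999) / 28000 ))   # ~28k QPS per core
echo "detector: $detector_cores cores, tengine: $tengine_cores cores"
```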
What This Means for Real Deployments
From this real-world test, several practical conclusions emerge:
- SafeLine can comfortably handle 10k QPS per CPU core for realistic HTTP traffic
- Performance scales linearly with CPU when detector and logging resources are tuned correctly
- Default thread settings may not be optimal for constrained environments
- Log processing can become a hidden bottleneck at high traffic volumes
In other words:
SafeLine’s detection engine is fast — but production deployments must account for logging and threading behavior.
Final Thoughts
This test demonstrates that SafeLine is not just “good for a free WAF” — it is genuinely performant when properly configured.
For developers running self-hosted services, APIs, or internal platforms:
- It offers strong protection without relying on external SaaS
- Performance characteristics are predictable and tunable
- Scaling behavior is transparent and engineering-friendly
SafeLine may not eliminate the need for architectural planning, but it proves that self-hosted WAFs can scale to real traffic without becoming the bottleneck.
If you’re evaluating SafeLine, test it the same way you’d test your own services — under load, with realistic traffic, and with monitoring enabled. The results might surprise you.
