Bhavy Yadav

Posted on May 31 • Edited on Jun 2

What Happens in 2 Milliseconds: Anatomy of a Single HTTP Request Through a Production WAF

#webdev #programming #backend #go

The rule engine is not the hard part. Everyone builds a rule engine. The hard part is deciding what order the checks run in — because the difference between a hash map lookup and a regex match is two orders of magnitude, and you're doing this on every single request.

Six-stage pipeline. Production. 50+ client websites, 100K+ daily requests. I'll trace one request through all of it.

POST /api/login HTTP/1.1
Host: client-website.com
User-Agent: python-requests/2.28.0
Content-Type: application/json
X-Forwarded-For: 185.220.101.45

{"username":"admin' OR '1'='1' --","password":"anything"}

Four problems: a Tor exit node IP, an automation-library User-Agent, no Accept header, and a SQL injection payload. The rule that kills it fires at stage 4, and stage 5 turns that match into a 403. But all six stages matter.

The Pipeline

func (waf *WAF) Handle(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        ctx := &RequestContext{
            IP:    extractIP(r),
            Start: time.Now(),
        }

        // Stage 1: IP reputation — cheapest check, runs first
        ipScore := waf.reputation.Score(ctx.IP)
        ctx.Score += ipScore
        if ctx.Score >= 100 {
            waf.block(w, r, ctx, Decision{Code: 403, Reason: "blocklist"})
            return
        }

        // Stage 2: Rate limiting
        if allowed := waf.limiter.Allow(ctx.IP); !allowed {
            ctx.Score += 25
            ctx.RateLimited = true
        }

        // Stage 3: Header inspection
        headerScore, hardBlock := waf.inspectHeaders(r)
        ctx.Score += headerScore
        if hardBlock != "" {
            waf.block(w, r, ctx, Decision{Code: 400, Reason: hardBlock})
            return
        }

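        // Stage 4: Rule engine (most expensive, runs last)
        // Cap the read so an oversized body can't exhaust memory.
        // maxBodyBytes is an assumed constant, e.g. 1 << 20.
        body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, maxBodyBytes))
        if err != nil {
            waf.block(w, r, ctx, Decision{Code: 400, Reason: "unreadable body"})
            return
        }
        r.Body = io.NopCloser(bytes.NewReader(body))
        ctx.Matches = waf.rules.Evaluate(r, body)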

        // Stage 5: Decision
        if d := waf.decide(ctx); d.Block {
            waf.block(w, r, ctx, d)
            return
        }

        next.ServeHTTP(w, r)
    })
}

Cheap checks first, expensive last. If IP reputation kills the request, the rule engine never runs. At 100K req/day that ordering shows up measurably in CPU.

Stage 1: IP Reputation [0ms → 0.2ms]

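Everything lives in memory. Three hash maps with O(1) lookups (listed below), plus a slice of datacenter CIDR ranges that is scanned linearly, so that one check is O(n) in the number of ranges: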

Hard blocklist — permanently banned IPs. Hash map, no expiry.
Tor exit nodes — refreshed every 15 minutes via Onionoo. ~1,500 active exit IPs at any given time.
Threat score map — IPs we've observed before, with accumulated scores that decay over time.

type IPReputation struct {
    mu        sync.RWMutex
    blocklist  map[string]struct{}
    torExits   map[string]struct{}
    cidrs      []*net.IPNet // datacenter/hosting ranges
    scores     map[string]scoredIP
}

type scoredIP struct {
    score    int
    lastSeen time.Time
}

func (ipr *IPReputation) Score(ip string) int {
    ipr.mu.RLock()
    defer ipr.mu.RUnlock()

    if _, ok := ipr.blocklist[ip]; ok {
        return 100
    }
    if _, ok := ipr.torExits[ip]; ok {
        return 70
    }

    parsed := net.ParseIP(ip)
    for _, cidr := range ipr.cidrs {
        if cidr.Contains(parsed) {
            return 55 // datacenter origin — not necessarily malicious, but not a browser
        }
    }

    if entry, ok := ipr.scores[ip]; ok {
        days := time.Since(entry.lastSeen).Hours() / 24
        // Score halves every 24h
        return int(float64(entry.score) * math.Pow(0.5, days))
    }

    return 0
}

Score decays at a 24-hour half-life. IPs rotate. Cloud provider ranges get reassigned. Treating a month-old signal the same as a current one tanks precision — false positives climb until the system is more noise than signal.

We should have used MaxMind's GeoLite2 ASN database from day one instead of maintaining a CIDR list manually. We added hosting ranges reactively, after seeing attacks rather than before, and missed several in the first few months. Proper ASN lookups would have caught those automatically. That gap cost us a few weeks of noisier detection early on.

Left alone, the scores map grows without bound, so a background goroutine runs every hour and evicts entries whose decayed score has dropped below 5. In production the map stabilized around 40-50k entries.

185.220.101.45 matches the Tor exit list. Score: 70. Continue.

Stage 2: Rate Limiter [0.2ms → 0.5ms]

Sliding window, not fixed. Fixed windows have a boundary exploit: 59 requests at :59, 59 more at :00 — 118 requests through a 60-request limit. The sliding window always covers the last N seconds. There's no boundary to game.
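The boundary exploit is easy to reproduce. This is a small simulation, not the production limiter: both types are simplified, single-IP, and take the timestamp as an argument so the bursts can be placed exactly. All names here are illustrative.

```go
package main

// fixedWindow counts requests per aligned window and resets at each boundary.
type fixedWindow struct {
	limit       int
	windowSecs  int64
	currentSlot int64
	count       int
}

func (f *fixedWindow) allow(unixSec int64) bool {
	slot := unixSec / f.windowSecs
	if slot != f.currentSlot {
		f.currentSlot = slot
		f.count = 0
	}
	if f.count >= f.limit {
		return false
	}
	f.count++
	return true
}

// slidingWindow keeps the timestamps that fall inside the trailing window.
type slidingWindow struct {
	limit      int
	windowSecs int64
	stamps     []int64
}

func (s *slidingWindow) allow(unixSec int64) bool {
	cutoff := unixSec - s.windowSecs
	n := 0
	for _, t := range s.stamps {
		if t > cutoff {
			s.stamps[n] = t
			n++
		}
	}
	s.stamps = s.stamps[:n]
	if len(s.stamps) >= s.limit {
		return false
	}
	s.stamps = append(s.stamps, unixSec)
	return true
}

// burst sends n requests at time t and returns how many were allowed.
func burst(allow func(int64) bool, t int64, n int) int {
	ok := 0
	for i := 0; i < n; i++ {
		if allow(t) {
			ok++
		}
	}
	return ok
}
```

With a limit of 60 per 60 seconds, two bursts of 59 at t=119 and t=120 straddle a window boundary. The fixed window lets all 118 through; the sliding window allows 60 and rejects the rest.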

type SlidingWindow struct {
    mu      sync.Mutex
    entries map[string]*ipWindow
    limit   int
    window  time.Duration
}

type ipWindow struct {
    timestamps []int64 // nanoseconds — 8 bytes vs 24 for time.Time
}

func (sw *SlidingWindow) Allow(ip string) bool {
    sw.mu.Lock()
    defer sw.mu.Unlock()

    now := time.Now().UnixNano()
    cutoff := now - sw.window.Nanoseconds()

    w := sw.entries[ip]
    if w == nil {
        w = &ipWindow{}
        sw.entries[ip] = w
    }

    // Prune in-place — avoids allocating a new slice on every call
    n := 0
    for _, t := range w.timestamps {
        if t > cutoff {
            w.timestamps[n] = t
            n++
        }
    }
    w.timestamps = w.timestamps[:n]

    if len(w.timestamps) >= sw.limit {
        return false
    }

    w.timestamps = append(w.timestamps, now)
    return true
}

Threshold: 60 requests per 10-second window. This attacker sent 847.

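Exceeding the rate limit doesn't block at this stage. It adds +25 to the score, sets a flag, and the request continues. A misconfigured load balancer looks identical to a rate violation: same IP, high request count. The system needs the full picture before making a hard call, so the verdict is deferred to stage 5. There, a rate-limited request with no rule match gets a 429, and rate limit plus anything else usually crosses the block threshold.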

Score: 70 + 25 = 95. Continue.

Stage 3: Header Inspection [0.5ms → 0.8ms]

Real browsers are consistent. They send Accept, Accept-Language, Accept-Encoding. Their User-Agent follows recognizable patterns. Automation libraries don't replicate this — not because attackers are careless, but because python-requests, httpx, go-http-client don't send browser headers by default, and most attackers don't bother faking them.

var automationSignatures = []string{
    "python-requests", "python-urllib", "go-http-client",
    "libwww-perl", "java/", "curl/", "wget/",
    "sqlmap", "nikto", "masscan", "zgrab", "scrapy",
    "aiohttp", "httpx", "mechanize",
}

func (waf *WAF) inspectHeaders(r *http.Request) (score int, hardBlock string) {
    // Header injection is a hard block regardless of score, so check it
    // first. In the earlier ordering, the empty User-Agent early return
    // skipped this check entirely.
    for _, values := range r.Header {
        for _, v := range values {
            if strings.ContainsAny(v, "\r\n") {
                return 0, "header injection"
            }
        }
    }

    ua := r.Header.Get("User-Agent")
    if ua == "" {
        return 40, ""
    }

    uaLow := strings.ToLower(ua)
    for _, sig := range automationSignatures {
        if strings.Contains(uaLow, sig) {
            score += 30
            break
        }
    }

    if r.Header.Get("Accept") == "" {
        score += 15
    }

    // POST from a browser almost always carries a Referer
    if r.Method == http.MethodPost && r.Header.Get("Referer") == "" {
        score += 10
    }

    return score, ""
}

Header injection is the only hard block at this stage. \r\n in a header value is never legitimate — it can split HTTP responses and poison downstream caches. Everything else is scored and accumulated.

We evaluated TLS fingerprinting (JA3): hashing the TLS version, cipher suites, and extension order from the ClientHello, which stay stable for a given browser build and usually look different for scripting libraries. Decided against it. It requires TLS termination at the WAF layer or integration with a third-party nginx fingerprinting module, and it's brittle across library versions. The coupling cost wasn't worth it at our traffic volume. Worth revisiting at scale.
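For context on what we passed on: a JA3 fingerprint is the MD5 of a comma-separated string of five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats), with the values inside each field written in decimal and joined by dashes. The sketch below only does the string building and hashing. The function names are mine, the inputs are assumed to be already parsed out of the handshake, and it does not filter GREASE values, which JA3 ignores.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"strconv"
	"strings"
)

// joinU16 renders values as a dash-separated decimal list, e.g. "4865-4866".
func joinU16(vals []uint16) string {
	parts := make([]string, len(vals))
	for i, v := range vals {
		parts[i] = strconv.Itoa(int(v))
	}
	return strings.Join(parts, "-")
}

// ja3String builds the pre-hash JA3 string from already-parsed ClientHello fields.
// GREASE filtering is intentionally left out of this sketch.
func ja3String(version uint16, ciphers, extensions, curves []uint16, pointFormats []uint8) string {
	pf := make([]uint16, len(pointFormats))
	for i, v := range pointFormats {
		pf[i] = uint16(v)
	}
	fields := []string{
		strconv.Itoa(int(version)),
		joinU16(ciphers),
		joinU16(extensions),
		joinU16(curves),
		joinU16(pf),
	}
	return strings.Join(fields, ",")
}

// ja3Hash returns the hex-encoded MD5 of the JA3 string.
func ja3Hash(s string) string {
	sum := md5.Sum([]byte(s))
	return hex.EncodeToString(sum[:])
}
```

The hard part is not this code. It is getting the raw ClientHello to the process that makes the decision, which is the coupling cost described above.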

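python-requests/2.28.0: +30. No Accept: +15. No Referer on POST: +10. Score: 95 + 55 = 150, capped at 100 (the cap is not shown in the handler above). The 100-point hard block is only checked right after stage 1, so the request continues.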

Stage 4: Rule Engine [0.8ms → 1.5ms]

Most expensive stage. Runs last.

Pre-compile at startup. regexp.MustCompile is not free. Calling it per request at 100K req/day is burning CPU for no reason. All patterns compile once on server start, stored as *regexp.Regexp struct fields, reused across every request.
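A minimal side-by-side of the two approaches, using the union/select pattern as an example. Function names are illustrative. Both return the same result; the second one just pays the compile cost on every call.

```go
package main

import "regexp"

// unionSelect is compiled once, when the package is initialized.
// A *regexp.Regexp is safe for concurrent use by multiple goroutines.
var unionSelect = regexp.MustCompile(`\bunion\b.{0,30}\bselect\b`)

// matchPrecompiled reuses the compiled pattern on every call.
func matchPrecompiled(input []byte) bool {
	return unionSelect.Match(input)
}

// matchPerRequest recompiles the pattern on every call.
// Same answer, wasted work: this is the version to avoid.
func matchPerRequest(input []byte) bool {
	return regexp.MustCompile(`\bunion\b.{0,30}\bselect\b`).Match(input)
}
```

If you want numbers for your own patterns, wrapping both functions in a standard testing.B benchmark is the quickest way to see the gap.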

Normalize before matching. Attackers rarely send a raw OR '1'='1'. They URL-encode it, double-encode it, or split it across fields. A rule engine that only looks at the raw payload misses most real attacks.

func normalize(input []byte) []byte {
    // First pass
    s, err := url.QueryUnescape(string(input))
    if err != nil {
        s = string(input)
    }
    // Second pass — catches double-encoding
    s2, err := url.QueryUnescape(s)
    if err != nil {
        s2 = s
    }
    return []byte(strings.ToLower(s2))
}
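One caveat with the version above: url.QueryUnescape returns an error on any malformed escape, so a single stray %zz makes the whole payload fall back to its undecoded form. A possible alternative, sketched here under my own names (decodeOnce, normalizeTolerant), decodes valid %XX sequences and copies malformed ones through untouched, repeating until the string stops changing or a pass limit is hit. Unlike QueryUnescape it does not turn + into a space.

```go
package main

import "strings"

// unhex converts one hex digit to its value.
func unhex(c byte) (byte, bool) {
	switch {
	case '0' <= c && c <= '9':
		return c - '0', true
	case 'a' <= c && c <= 'f':
		return c - 'a' + 10, true
	case 'A' <= c && c <= 'F':
		return c - 'A' + 10, true
	}
	return 0, false
}

// decodeOnce decodes valid %XX escapes and leaves malformed ones as-is.
func decodeOnce(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for i := 0; i < len(s); i++ {
		if s[i] == '%' && i+2 < len(s) {
			hi, ok1 := unhex(s[i+1])
			lo, ok2 := unhex(s[i+2])
			if ok1 && ok2 {
				b.WriteByte(hi<<4 | lo)
				i += 2
				continue
			}
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

// normalizeTolerant decodes up to maxPasses times, stops early once the
// string is stable, then lowercases the result.
func normalizeTolerant(input string, maxPasses int) string {
	s := input
	for p := 0; p < maxPasses; p++ {
		next := decodeOnce(s)
		if next == s {
			break
		}
		s = next
	}
	return strings.ToLower(s)
}
```

With three passes, %2527%20OR%zz comes out as ' or%zz: the double-encoded quote is fully decoded and the malformed escape no longer shields the rest of the payload.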

Then the rules:

type Rule struct {
    ID       string
    Pattern  *regexp.Regexp
    Severity int    // 1–4; severity 4 = block unconditionally regardless of score
    Target   Target // Body, URL, or both
}

// Compiled once at package initialization, never at request time
var coreRules = []*Rule{
    {
        ID:       "SQLI-001",
        Pattern:  regexp.MustCompile(`\bor\b\s+['"]?\w+['"]?\s*=\s*['"]?\w+['"]?`),
        Severity: 4,
        Target:   TargetBody,
    },
    {
        ID:       "SQLI-002",
        Pattern:  regexp.MustCompile(`(--|#|/\*)`),
        Severity: 3,
        Target:   TargetBody,
    },
    {
        ID:       "SQLI-003",
        Pattern:  regexp.MustCompile(`\bunion\b.{0,30}\bselect\b`),
        Severity: 4,
        Target:   TargetBody | TargetURL,
    },
    {
        ID:       "XSS-001",
        Pattern:  regexp.MustCompile(`<script[\s/>]|javascript\s*:`),
        Severity: 4,
        Target:   TargetBody | TargetURL,
    },
    {
        ID:       "PATH-001",
        Pattern:  regexp.MustCompile(`(\.\.[\\/]){2,}`),
        Severity: 3,
        Target:   TargetURL,
    },
    {
        ID:       "CMD-001",
        Pattern:  regexp.MustCompile(`[;|&]\s*(cat|ls|whoami|id|wget|curl)\b`),
        Severity: 4,
        Target:   TargetBody | TargetURL,
    },
}

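func (e *RuleEngine) Evaluate(r *http.Request, body []byte) []*Match {
    normBody := normalize(body)
    normURL := normalize([]byte(r.URL.RawQuery + r.URL.Path))

    var matches []*Match
    for _, rule := range e.rules {
        // Target is a bitmask. A rule tagged TargetBody | TargetURL has to be
        // checked against both inputs; an if/else would skip the URL for it.
        if rule.Target&TargetBody != 0 {
            if m := rule.Pattern.Find(normBody); m != nil {
                matches = append(matches, &Match{Rule: rule, At: m})
                continue // one match per rule is enough
            }
        }
        if rule.Target&TargetURL != 0 {
            if m := rule.Pattern.Find(normURL); m != nil {
                matches = append(matches, &Match{Rule: rule, At: m})
            }
        }
    }
    return matches
}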

After normalization, the body reads as: {"username":"admin' or '1'='1' --","password":"anything"}.

SQLI-001 fires on or '1'='1'. SQLI-002 fires on --. Two matches. SQLI-001 is severity 4. Score is irrelevant — block unconditionally.
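You can check those two matches in isolation. The snippet below copies the SQLI-001 and SQLI-002 patterns from the rule list and runs them against an already-normalized body; firedRules is a throwaway helper for this example, not part of the engine.

```go
package main

import "regexp"

var (
	sqli001 = regexp.MustCompile(`\bor\b\s+['"]?\w+['"]?\s*=\s*['"]?\w+['"]?`)
	sqli002 = regexp.MustCompile(`(--|#|/\*)`)
)

// firedRules returns the IDs of the rules that match an already-normalized body.
func firedRules(normBody string) []string {
	var ids []string
	if sqli001.MatchString(normBody) {
		ids = append(ids, "SQLI-001")
	}
	if sqli002.MatchString(normBody) {
		ids = append(ids, "SQLI-002")
	}
	return ids
}
```

The traced payload fires both rules, and a plain login body fires neither. A password containing # fires SQLI-002 on its own, which is a good illustration of why that rule is severity 3 and needs a high score behind it before it blocks.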

Stage 5: Decision [1.5ms → 1.8ms]

Thin layer. Accumulated context in, decision out. It stays thin on purpose: complexity here is where subtle edge cases live, and edge cases are exactly what attackers find by probing.

func (waf *WAF) decide(ctx *RequestContext) Decision {
    // Severity-4 match: score doesn't matter
    for _, m := range ctx.Matches {
        if m.Rule.Severity == 4 {
            return Decision{Block: true, Code: 403, Reason: m.Rule.ID}
        }
    }

    // High score + any rule match: block
    if ctx.Score >= 80 && len(ctx.Matches) > 0 {
        return Decision{Block: true, Code: 403, Reason: "score+rules"}
    }

    // Rate limited, no rule match: 429, not 403
    if ctx.RateLimited && len(ctx.Matches) == 0 {
        return Decision{Block: true, Code: 429, Reason: "rate-limit"}
    }

    return Decision{Block: false}
}

The 403 vs 429 distinction is operational. Repeated 429s from the same IP often turn out to be misconfigured clients or internal tooling; 403s with rule matches are almost always actual attacks. The alerting pipeline treats them differently, which matters at 2am when you're deciding whether to page someone.
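Because the branch order matters, it helps to tabulate a few inputs. This is a stripped-down mirror of decide() as shown above, using a hypothetical summary type in place of RequestContext and returning the status code (0 means allowed).

```go
package main

// summary is a hypothetical, minimal stand-in for RequestContext.
type summary struct {
	score       int
	rateLimited bool
	severities  []int // severity of each matched rule
}

// decideCode follows the same branch order as decide() in this post.
// It returns the HTTP status used to block, or 0 when the request is allowed.
func decideCode(s summary) int {
	// Any severity-4 match blocks regardless of score.
	for _, sev := range s.severities {
		if sev == 4 {
			return 403
		}
	}
	// High score plus any rule match.
	if s.score >= 80 && len(s.severities) > 0 {
		return 403
	}
	// Rate limited with no rule match.
	if s.rateLimited && len(s.severities) == 0 {
		return 429
	}
	return 0
}
```

The traced request (severity 4 match) gets 403, a score of 90 with a severity-3 match gets 403, and a rate-limited request with no matches gets 429. Two rows may be worth a second look: as written, a very high score with no rule match and no rate-limit flag is allowed, and so is a rate-limited request with a severity-3 match and a score under 80. Whether that is intended depends on policy not covered in this post.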

Verdict: Block, 403, SQLI-001.

Stage 6: Logging [1.8ms → 2ms]

Response goes out before logging. Logging is I/O. I/O is slow. Those two facts mean the log write cannot touch the response path.

type WAF struct {
    logCh chan IncidentLog // buffered
}

func NewWAF(cfg Config) *WAF {
    w := &WAF{
        logCh: make(chan IncidentLog, 4096),
    }
    go w.logWorker()
    return w
}

func (waf *WAF) logWorker() {
    for entry := range waf.logCh {
        waf.sink.Write(entry) // JSON to disk + forward to alert pipeline
    }
}

func (waf *WAF) block(w http.ResponseWriter, r *http.Request, ctx *RequestContext, d Decision) {
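    // Response first. The body carries only the generic status text for the
    // code (403 -> "Forbidden", 429 -> "Too Many Requests"), never the rule.
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(d.Code)
    fmt.Fprintf(w, `{"error":%q}`, http.StatusText(d.Code))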

    // Log asynchronously — non-blocking send
    select {
    case waf.logCh <- IncidentLog{
        Timestamp:   time.Now().UTC(),
        IP:          ctx.IP,
        Method:      r.Method,
        Path:        r.URL.Path,
        Score:       ctx.Score,
        Matches:     ctx.Matches,
        RateLimited: ctx.RateLimited,
        Decision:    d,
        LatencyMs:   float64(time.Since(ctx.Start).Microseconds()) / 1000,
    }:
    default:
        // Channel full — drop the entry, track the drop count separately
        waf.metrics.LogDropped.Inc()
    }

    go waf.reputation.Increment(ctx.IP, 20)
}

The select with default is intentional. If the log channel fills — writer goroutine falling behind, usually disk I/O saturation during a large attack — drop the log entry rather than stall HTTP responses. Track the drop counter as a separate metric and alert on it. In 8 months of production this happened once, during a coordinated multi-client attack that was also saturating the disk writer. Logging should never affect response latency, even under that load.
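The drop-on-full behavior is small enough to isolate. A minimal sketch with made-up names (asyncLog, tryLog): a buffered channel, a non-blocking send, and an atomic counter for drops. It needs Go 1.19 or later for atomic.Int64.

```go
package main

import "sync/atomic"

// entry is a placeholder for IncidentLog.
type entry struct{ ip string }

// asyncLog pairs a buffered channel with a drop counter.
type asyncLog struct {
	ch      chan entry
	dropped atomic.Int64
}

func newAsyncLog(capacity int) *asyncLog {
	return &asyncLog{ch: make(chan entry, capacity)}
}

// tryLog never blocks: it enqueues the entry if there is room,
// otherwise it counts the drop and returns false.
func (l *asyncLog) tryLog(e entry) bool {
	select {
	case l.ch <- e:
		return true
	default:
		l.dropped.Add(1)
		return false
	}
}
```

With a capacity of 2 and no consumer draining the channel, five calls to tryLog enqueue two entries and count three drops. The caller returns immediately in every case, which is the property that matters on the response path.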

The attacker gets:

HTTP/1.1 403 Forbidden
Content-Type: application/json

{"error":"Forbidden"}

No indication of which rule fired. Nothing actionable. The less information a 403 carries, the harder the system is to probe.

Production Numbers

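At peak (~180 req/s across all clients), the WAF added a median 0.8ms latency to allowed requests. p99: 3.2ms. Blocked requests averaged 1.9ms, even though many of them exit the pipeline early. That figure is a mean, and the LatencyMs logged in block() is taken after the rejection response is written, so it is not directly comparable to the median above. Memory at steady state: ~90MB for the reputation map, rate limiter state, and rule engine combined.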

Over 8 months: 25% reduction in breach incidents across client websites, 35% faster detection from attack onset to alert. The detection improvement came almost entirely from centralized structured logging — correlating patterns across 50+ clients simultaneously instead of treating each site's logs as a separate silo.

Two things I'd rebuild differently. First: MaxMind's GeoLite2 ASN database for ASN-level blocking from the start. Maintaining a CIDR list manually is reactive by nature and you're always a step behind. Second: weight rule matches by position in the payload. A pattern found deep inside a multi-part encoded body is more likely to be deliberate evasion than one sitting in a raw field. That distinction should influence severity scoring, and currently it doesn't.
