We Rewrote Our Webhook Platform from Go to Rust — Here's What Happened

#webdev #backend #go #rust

Background

Six months ago, our webhook delivery platform was running on Go. It worked fine for the first few months — we were processing around 50K webhooks per day, and Go's goroutines made concurrent delivery feel almost too easy.

But then we started hitting walls.

The first real problem showed up at around 200K daily deliveries. Our Go service was eating 2.8GB of memory just sitting idle, and during traffic spikes it would jump to 6GB+. We tried profiling, we tried optimizing, but every fix felt like a band-aid.

The second problem was trickier: race conditions. Go makes concurrency easy to start but hard to get right. We had a bug where webhook deliveries were being marked as "success" before the response body was fully read. Took us two weeks to find it. The worst part? It only manifested under load, and Go's race detector didn't catch it because the shared state was hidden behind an interface.

That's when we decided to try Rust.

Why Rust Specifically?

Three reasons:

1. Memory safety without garbage collection.

Go's GC is fine for most things, but when you're processing thousands of HTTP requests per second with large payloads, GC pauses become noticeable. Our p99 latency was 340ms on Go. We suspected a chunk of that was GC-related.

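Rust's ownership model means no GC, no GC pauses, and more predictable memory usage. The compiler also rejects data races at compile time, instead of relying on a runtime detector that only sees the code paths a test run happens to exercise.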

2. Performance headroom.

Webhook delivery is fundamentally I/O bound, but we do a lot of work around payload validation, HMAC signature computation, and JSON transformation. Rust's zero-cost abstractions mean we can write clean code that compiles down to tight machine code.

3. The type system.

This one surprised me. Rust's type system isn't just about safety — it's a design tool. Sum types, pattern matching, and the Result/Option types force you to handle every error path. In Go, we had if err != nil everywhere but still missed edge cases. In Rust, the compiler literally won't let you forget.
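The same compile-time enforcement covers shared state, which is where our Go bug lived. Here is a minimal sketch, not our production code: `DeliveryStats` and `record_concurrently` are hypothetical names made up for illustration. The counters can only be mutated from several threads because they sit behind `Arc<Mutex<...>>`; sharing the struct mutably across threads without the `Mutex` does not compile.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared state: how many deliveries succeeded or failed.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct DeliveryStats {
    succeeded: u64,
    failed: u64,
}

/// Spawns `workers` threads; each records `per_worker` successes and one failure.
fn record_concurrently(workers: u64, per_worker: u64) -> DeliveryStats {
    // Arc gives shared ownership across threads, Mutex gives exclusive access.
    let stats = Arc::new(Mutex::new(DeliveryStats::default()));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let stats = Arc::clone(&stats);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // The lock guard is the only way to reach the data.
                    stats.lock().unwrap().succeeded += 1;
                }
                stats.lock().unwrap().failed += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Copy the totals out while holding the lock, then release it.
    let snapshot = *stats.lock().unwrap();
    snapshot
}
```

This only rules out data races. Logic bugs, such as doing two correct steps in the wrong order, are still on you.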

The Rewrite: What We Changed

We didn't rewrite everything at once. Here's what we did:

Phase 1: Core delivery engine (2 weeks)

The webhook delivery logic (HTTP client, retry logic, request signing) went into Rust first. This is where performance matters most.

// Sends one webhook and classifies the receiver's response.
// Transport failures (DNS, connect, timeout) surface as Err via `?`.
async fn deliver_webhook(
    client: &HttpClient,
    endpoint: &Endpoint,
    payload: &[u8],
    secret: &str,
) -> Result<DeliveryResult, DeliveryError> {
    // Sign the exact bytes we send so the receiver can verify them.
    let signature = compute_hmac_signature(payload, secret);
    let response = client
        .post(&endpoint.url)
        .header("X-HookSniff-Signature", &signature)
        .header("Content-Type", "application/json")
        .body(payload)
        .send()
        .await?;

    match response.status() {
        // Any 2xx counts as delivered, including 204 No Content.
        status if status.is_success() => Ok(DeliveryResult::Success),
        // Rate limiting and transient server errors are worth retrying.
        StatusCode::TOO_MANY_REQUESTS | StatusCode::INTERNAL_SERVER_ERROR
        | StatusCode::BAD_GATEWAY | StatusCode::SERVICE_UNAVAILABLE
        | StatusCode::GATEWAY_TIMEOUT => {
            Ok(DeliveryResult::Retryable(response.status()))
        }
        // Everything else (other 4xx, redirects) is a permanent failure.
        status => Ok(DeliveryResult::Failed(status)),
    }
}

Notice how the return type puts every outcome in front of the caller. Ignoring the returned Result triggers a compiler warning, and a match that skips a DeliveryResult variant doesn't compile at all.
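To show what that looks like from the calling side, here is a rough sketch. The types are stand-ins, not our real definitions: status codes are plain `u16` so the example needs no crates, the `DeliveryError` variants are invented, and `NextAction` and `next_action` are hypothetical names.

```rust
/// Stand-in for the `DeliveryResult` above, with `u16` instead of `StatusCode`.
#[derive(Debug, PartialEq)]
enum DeliveryResult {
    Success,
    Retryable(u16),
    Failed(u16),
}

/// Hypothetical transport-level errors; the real `DeliveryError` is not shown here.
#[derive(Debug, PartialEq)]
enum DeliveryError {
    Timeout,
    ConnectionRefused,
}

/// What the scheduler should do after one delivery attempt.
#[derive(Debug, PartialEq)]
enum NextAction {
    MarkDelivered,
    ScheduleRetry { attempt: u32 },
    GiveUp { last_status: Option<u16> },
}

fn next_action(
    outcome: Result<DeliveryResult, DeliveryError>,
    attempt: u32,
    max_attempts: u32,
) -> NextAction {
    match outcome {
        Ok(DeliveryResult::Success) => NextAction::MarkDelivered,
        // Retryable responses and transport errors get another try,
        // as long as the attempt budget is not used up.
        Ok(DeliveryResult::Retryable(_)) | Err(_) if attempt < max_attempts => {
            NextAction::ScheduleRetry { attempt: attempt + 1 }
        }
        // Out of attempts, or a permanent failure: keep the last status.
        Ok(DeliveryResult::Retryable(status)) | Ok(DeliveryResult::Failed(status)) => {
            NextAction::GiveUp { last_status: Some(status) }
        }
        // Out of attempts after a transport error: there is no status to keep.
        Err(_) => NextAction::GiveUp { last_status: None },
    }
}
```

Delete any one arm and the match stops compiling, because the remaining arms no longer cover every case.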

Phase 2: API layer (1 week)

We moved the REST API to Rust too, using Axum. The routing is clean, middleware is composable, and it plays well with Tokio's async runtime.

Phase 3: Migration (1 week)

We ran Go and Rust side by side for a week, comparing outputs. Any discrepancy was logged and fixed. Then we switched traffic gradually — 10%, 25%, 50%, 100%.
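One way to do a rollout like this is sketched below. It is an illustration under assumptions, not a description of our actual tooling: it assumes the "output" being compared is the final HTTP status, and that traffic is split by hashing a webhook ID into a stable bucket. `bucket`, `choose_backend`, and `compare_outcomes` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq, Clone, Copy)]
enum Backend {
    Go,
    Rust,
}

/// Maps a webhook ID to a bucket in 0..100. `DefaultHasher::new()` uses fixed
/// keys, so the bucket is stable within one build; a real rollout would pick a
/// hash whose output is specified, since this one may change between Rust versions.
fn bucket(webhook_id: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    webhook_id.hash(&mut hasher);
    hasher.finish() % 100
}

/// Routes `rust_percent` percent of IDs to Rust. Because the bucket is stable,
/// an ID that moved to Rust at 10% stays on Rust at 25%, 50%, and 100%.
fn choose_backend(webhook_id: &str, rust_percent: u64) -> Backend {
    if bucket(webhook_id) < rust_percent {
        Backend::Rust
    } else {
        Backend::Go
    }
}

/// Shadow comparison: returns a log line when the two implementations disagree.
fn compare_outcomes(webhook_id: &str, go_status: u16, rust_status: u16) -> Option<String> {
    if go_status == rust_status {
        None
    } else {
        Some(format!("{}: go={} rust={}", webhook_id, go_status, rust_status))
    }
}
```

Hashing the ID instead of rolling a random number per request keeps each webhook on one backend for the whole ramp, which makes discrepancies easier to trace.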

The Results

Here's what changed after the rewrite:

Metric	Go	Rust	Change
Memory (idle)	2.8 GB	380 MB	-86%
Memory (peak)	6.2 GB	1.1 GB	-82%
p50 latency	45ms	12ms	-73%
p99 latency	340ms	38ms	-89%
CPU usage	72%	31%	-57%
Deliveries/sec	2,400	8,200	+242%

The memory numbers are what hit me hardest. We went from needing an 8GB instance to running comfortably on 2GB. That's not just a performance win; it changes your deployment economics completely.
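For anyone checking the table: the Change column is plain relative change, (after - before) / before, rounded to a whole percent. A tiny sketch, where `percent_change` is just a helper name for this post and memory is taken in decimal units (2.8 GB as 2800 MB):

```rust
/// Relative change from `before` to `after`, in percent.
/// Negative means the number went down.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

Plugging in idle memory gives (380 - 2800) / 2800, roughly -86%, and deliveries per second gives (8200 - 2400) / 2400, roughly +242%.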

The Hard Parts

Rust isn't all sunshine. Here's what sucked:

The borrow checker is brutal at first. I've been writing code for 12 years and the first two weeks with Rust felt like being a beginner again. You fight the compiler constantly. But around week three, something clicked. I started thinking in ownership, and the fights became rare.

Async Rust has a learning curve. Tokio is powerful but the ecosystem is fragmented. async/.await is clean, but lifetimes in async contexts can be confusing. We had a few deadlock-like situations early on that took days to debug.

Compile times. Our release build takes 4 minutes. Incremental debug builds are ~15 seconds, which is fine, but full rebuilds during CI are painful. We mitigated this with cargo check in pre-commit hooks and sccache in CI.

The ecosystem is younger. Go has a library for everything. Rust is catching up fast, but we had to write a few things ourselves, like our webhook retry scheduler with exponential backoff and jitter. In Go, there are 10 packages for that. In Rust, we wrote our own in ~200 lines.
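The core of exponential backoff with jitter is small. The sketch below is not our scheduler; it shows the "full jitter" variant under stated assumptions. `BackoffPolicy`, its fields, and the `XorShift64` generator are hypothetical, and the generator is only there to keep the example dependency-free (real code would use a proper RNG crate).

```rust
use std::time::Duration;

/// Hypothetical policy: first retry waits up to `base`, never more than `cap`.
struct BackoffPolicy {
    base: Duration,
    cap: Duration,
}

impl BackoffPolicy {
    /// Upper bound for a 0-based attempt: min(cap, base * 2^attempt).
    fn ceiling(&self, attempt: u32) -> Duration {
        // Saturate instead of overflowing on very large attempt numbers.
        let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
        self.base
            .checked_mul(factor)
            .unwrap_or(self.cap)
            .min(self.cap)
    }

    /// Full jitter: scale the ceiling by a random value in [0, 1) so that
    /// many failing endpoints do not all retry at the same instant.
    fn delay(&self, attempt: u32, unit_random: f64) -> Duration {
        let r = unit_random.clamp(0.0, 1.0);
        self.ceiling(attempt).mul_f64(r)
    }
}

/// Tiny xorshift64 generator; the seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    /// Returns a pseudo-random value in [0, 1).
    fn next_unit(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        // Keep the top 53 bits so the result fits an f64 mantissa exactly.
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}
```

With a 1 second base and a 60 second cap, the ceilings go 1s, 2s, 4s, 8s and so on until they hit 60s, and each actual delay lands somewhere between zero and that ceiling.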

Would I Do It Again?

Absolutely. The performance gains alone justified the rewrite, but the real win is confidence. When our Rust code compiles, I trust it in a way I never trusted our Go code. The type system catches entire categories of bugs that Go's runtime checks miss.

If you're building something I/O heavy with strict reliability requirements — webhook delivery, message queues, API gateways — Rust is worth the learning curve. Just budget extra time for the first month. You'll be fighting the compiler a lot. But once you get past that hump, it's a different way of thinking about code.

Our platform is open source if you want to poke around: HookSniff on GitHub

What's your experience with Rust vs Go for backend services? I'd love to hear how others handled the migration.