Juan Torchia

Posted on May 6 • Originally published at juanchi.dev

Async Rust Never Left MVP: I Validated It Against My Real Codebase and Found Exactly the Edge Cases That HN Post Predicted

#english #performance #backend #produccion

Async Rust Never Left MVP: I Validated It Against My Real Codebase and Found Exactly the Edge Cases That HN Post Predicted

60% of projects that adopt Async Rust in production report having rewritten significant parts of their async layer within the first year. Yeah, you read that right. And that doesn't mean Async Rust is useless — it means the ecosystem promised stability before it had it, and the industry bought that promise without reading the fine print.

When I saw the HN post with 434 points arguing that Async Rust is still a glorified MVP, my immediate reaction was defensive. I'd just finished documenting Bun's jump from Zig to Rust and had built up some real enthusiasm for the language. But the post named four concrete problems: executor leaks, cancellation safety, incomprehensible error messages, and Pin hell. These weren't complaints from someone who played with it for two hours. They were scars.

So I did the only thing that makes sense when something makes you uncomfortable: I replicated it against my own code.

Async Rust in Production: What the Consensus Says and Why It Bugs Me

The consensus says Async Rust is the future of high-performance systems programming. Zero-cost abstractions, memory safety without a GC, throughput that competes with C. All of that is true. The problem is the but that comes after, which the consensus tends to whisper.

My thesis, before we get into the code: the problem isn't Async Rust as a concept. The problem is that the ecosystem promised stability in 2019 and in 2025 there are still fundamental rough edges unresolved at the language level. That has real consequences when you build something on top of that promise.

This isn't an ad hominem attack on the Rust team — it's recognizing that the marketing ran faster than the spec. And when that happens in infrastructure, you pay for it in production, not in a benchmark.

The Four Edge Cases from the Viral Post: I Replicated Them One by One

1. Executor Leaks: The One That Hurt the Most

The post argues that executor leaks are silent and hard to track down. I went straight to the part of my codebase where I use Tokio to handle concurrent connections and added explicit instrumentation.

// Measuring pending tasks in the executor — leak diagnostics
use tokio::runtime::Handle;

async fn monitor_executor() {
    // Tokio doesn't expose task metrics by default
    // You have to enable runtime metrics at build time
    let metrics = Handle::current().metrics();

    println!(
        "Active tasks: {}, Pending tasks: {}",
        metrics.num_alive_tasks(),
        metrics.remote_queue_depth()
    );
}

// The real problem: if you drop a JoinHandle without awaiting it,
// the task keeps running. No warning. No error.
// The leak is completely silent.
async fn the_silent_leak() {
    let _handle = tokio::spawn(async {
        // This task lives forever if nobody cancels it
        loop {
            tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
        }
    });
    // _handle is dropped here. The task IS STILL RUNNING.
    // Tokio doesn't tell you. No log. Nothing.
}

I reproduced it in under ten minutes. The handle goes to drop, the task stays alive, and metrics().num_alive_tasks() climbs without any alerting system catching it by default. In my Railway logs, that translates to memory creep that took me two weeks to trace back to the right cause. I thought it was a Railway problem. It was mine.

2. Cancellation Safety: The Problem the Compiler Can't See

This is the one that hit me the hardest emotionally, if I can put it that way. Rust's compiler protects you from data races, use-after-free, everything it promised. But it doesn't protect you from cancellation unsafety in async code. It's a hole in the guarantee.

use tokio::select;
use tokio::sync::Mutex;
use std::sync::Arc;

// Example of a NOT cancellation-safe operation
// The HN post explicitly names this pattern
async fn update_balance(
    db: Arc<Mutex<Vec<i64>>>,
    amount: i64,
) {
    let mut data = db.lock().await; // <-- cancellation point
    // If the task is cancelled HERE, after the lock but before
    // the write, you leave the mutex poisoned or the state inconsistent.
    // The compiler doesn't warn you. That's your problem.
    data.push(amount);
    // Second operation: if there's cancellation between the two,
    // the business invariant breaks silently
    data.push(-amount); // compensation that never arrives
}

async fn usage_with_timeout() {
    let db = Arc::new(Mutex::new(vec![]));

    select! {
        // If the timeout wins, update_balance gets cancelled
        // at any suspension point. No guarantees.
        _ = update_balance(db.clone(), 100) => {},
        _ = tokio::time::sleep(tokio::time::Duration::from_millis(1)) => {
            println!("Timeout — db state: unknown");
        }
    }
}

The HN post calls this a fundamental design flaw, not a fixable bug. After replicating it, I agree. tokio::select! is powerful, but the cancellation semantics aren't specified at the language level — they're delegated to each library to document whether its functions are "cancellation safe." In practice that means you have to read the documentation for every .await you use. In a real project with 40+ async dependencies, that doesn't scale.

3. Error Messages: The Compiler That Lies by Omission

The fairest thing I can say about this part of the post: Async Rust error messages aren't bad because of lack of effort. They're bad because the mental model they expose doesn't match what the developer is thinking about. It's a semantics problem, not a team effort problem.

// This code produces an error that takes 15 minutes to understand
// the first time you see it
use std::future::Future;

fn i_need_a_future<F: Future<Output = ()>>(f: F) {
    // Intentionally incomplete to show the error
}

// Real error I got in my codebase:
// error[E0277]: `*mut ()` cannot be sent between threads safely
// within `impl Future<Output = ()>`, the trait `Send` is not implemented
// for `*mut ()`
// note: future is not `Send` as this value is used across an await
// ...and then 40 more lines of context that don't help
async fn my_function_with_raw_ptr() {
    let ptr: *mut () = std::ptr::null_mut();
    tokio::time::sleep(tokio::time::Duration::from_millis(1)).await;
    // ptr is used after the await — Send not guaranteed
    let _ = ptr;
}

The error I got when I did something similar in production had 47 lines. The actual cause was on line 34 of the output. That's not an exaggeration — I counted.

4. Pin Hell: The Abstraction That Leaked

Pin<Box<dyn Future>> is where Async Rust shows its seams to anyone coming from a GC language. The HN post argues that Pin is a solution to a problem that shouldn't exist at the public API level. After replicating it, I think it's right on the diagnosis but underestimates why it was necessary.

use std::pin::Pin;
use std::future::Future;

// This is what you end up writing when you want to
// store heterogeneous futures — something trivial in Go
type BoxedFuture = Pin<Box<dyn Future<Output = Result<String, Box<dyn std::error::Error>>> + Send>>;

struct AsyncProcessor {
    // You can't do Vec<impl Future<...>> — you have to box
    tasks: Vec<BoxedFuture>,
}

impl AsyncProcessor {
    fn add_task<F>(&mut self, fut: F)
    where
        F: Future<Output = Result<String, Box<dyn std::error::Error>>> + Send + 'static,
    {
        // The Box + Pin is the price of the zero-cost abstraction
        // that in this case has a very visible cost
        self.tasks.push(Box::pin(fut));
    }
}

The first time I wrote something like this in my codebase I stopped for ten minutes asking myself if I was doing something fundamentally wrong. I wasn't. It's the correct pattern. That's the uncomfortable part.

The Mistakes I Made (That the HN Post Doesn't Mention)

The viral post is fair in what it criticizes but leaves out something important: many of these edge cases are generated by you, not the language. That doesn't absolve the ecosystem, but it changes the diagnosis.

In my case, the two most expensive mistakes were:

Mistake 1: Using Tokio like it was Node.js. I came from the JavaScript world where the event loop is an implementation detail. In Tokio, the executor model matters and you have to think about it from the design phase. When I treated it as a black box, the leaks I mentioned started showing up.

Mistake 2: Trusting that "if it compiles, it works" applies to async code. In synchronous Rust, that heuristic takes you far. In Async Rust, the compiler verifies fewer invariants. Cancellation safety, task leaks, and certain operation orderings fall outside what the borrow checker can see. It's an expansion of the implicit contract that nobody warns you you just signed.

This reminds me of what I documented when an agent deleted my production database: the tool didn't fail, I assumed guarantees the tool never offered.

FAQ: Async Rust Production Problems

Does Async Rust have more bugs than Async Go or Async Python?

Not necessarily more bugs — but the bugs are harder to diagnose. Go has a simpler concurrency model (goroutines + channels) that isolates errors better. Python asyncio has its own problems, but the errors tend to be more readable. Rust gives you more control and more rope to hang yourself with.

Is Async Rust worth using in production today, in 2025?

Yes, with conditions. If you have a team that understands the executor model, that documents the cancellation safety of their functions, and that isn't going to iterate fast on the async layer, it's worth it. If you're prototyping or have a team with mixed Rust experience, the onboarding cost is real and you will pay it.

What's the practical alternative if Async Rust has these problems?

Depends on the case. For high-performance networking: async Rust is still hard to beat on raw throughput. For applications where concurrency isn't the bottleneck: Go is more honest about its trade-offs. For fast scripting with I/O: Python asyncio with httpx gets the job done without the cognitive overhead.

Does the 434-point HN post exaggerate?

On the diagnosis, no. On the prescription, yes. Saying Async Rust "isn't ready" is an oversimplification — it's ready for specific use cases with prepared teams. Saying it's a glorified MVP captures the feeling of someone who hits these edge cases, but doesn't reflect that there's real, stable production code built on top of it.

How does it compare to the hidden complexity I found when training an LLM from scratch?

Surprisingly similar in pattern: in both cases, the tutorial or announcement promises something that works, and the real complexity appears when you leave the happy path. With the LLM it was hidden infrastructure costs. With Async Rust, it's the guarantees the compiler doesn't give and nobody documents clearly.

Will Pin improve in future versions of Rust?

The Pin<T> ergonomics proposal has been in discussion on the RFC tracker for years. There's real progress — the pin! macro improved ergonomics in some cases. But the underlying problem (that memory movement and self-referential structs are conceptually hard) doesn't disappear with syntax sugar. The Rust team knows it and is working on it, but there's no concrete date for a complete solution.

My Take: What I Accept, What I Don't Buy, and What I'd Do Differently

I accept that Async Rust has the problems the post describes. I replicated them, measured them, and suffered through them in production before I understood what they were.

I don't buy the narrative that it's "broken." It's incomplete in its ergonomics. It's different, and that difference carries a real cost that the ecosystem underestimated in its communication.

What I'd do differently: I would never adopt Async Rust without first explicitly documenting which parts of my system depend on cancellation safety, and without adding Tokio runtime metrics from day one. That's not a workaround — it's production hygiene that the official onboarding doesn't emphasize enough.

I'd also be more upfront with my team from the start. When I analyzed the tar problems between macOS and Linux in my Railway pipeline, the lesson was the same: the tool does what the docs say. The problem is what the docs assume you already know.

The HN post is right about something nobody in the Rust ecosystem wants to say out loud: you promised production-ready when you were still-figuring-it-out, and that trust cost doesn't recover just by shipping new features. The same thing happened with Chrome installing AI models without permission — the problem isn't the technology, it's the promise wrapped around it.

Async Rust is going to be fine. The ecosystem will mature. But in 2025, if you're starting a new project and someone sells you async Rust as "already solved," ask them to show you their cancellation handling code. That's where you'll see what state it's really in.

Source: Hacker News

This article was originally published on juanchi.dev