Last month I spent three weeks doing something that sounds simple: making an HTTP client work reliably on a Raspberry Pi 4 running a custom Rust service. The service needed to periodically sync sensor data to a cloud endpoint while also handling local database writes. Nothing fancy — maybe 200 lines of logic.
It took me 2,847 lines of code, 4 different async runtimes, and one very close relationship with my debugger to get it working.
Here's what actually happened.
Attempt 1: Tokio — The Standard Choice
Everyone says "just use Tokio." So I did.
```toml
[dependencies]
tokio = { version = "1", features = ["full"] }
```
On my MacBook, it compiled in 12 seconds and ran perfectly. On the Raspberry Pi? Cross-compilation worked, but the binary was 8.3 MB. For a service that was supposed to be lean and embeddable, that felt wrong.
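That 8.3 MB was with default release settings. Before blaming the runtime entirely, it's worth noting how much of that a few standard Cargo release-profile knobs can claw back — a generic sketch, not measurements from this project:

```toml
[profile.release]
opt-level = "z"   # optimize for size rather than speed
lto = true        # cross-crate link-time optimization
codegen-units = 1 # slower builds, smaller binaries
strip = true      # drop debug symbols from the binary
panic = "abort"   # skip the unwinding machinery
```

Trimming Tokio's `features = ["full"]` down to only what you use (`rt`, `net`, `time`, etc.) helps too.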
But the real problem was memory. Under load (simulating 50 concurrent sensor readings + database writes), the RSS crept up to 45 MB. On a Pi with 4 GB of RAM running other services, that's not catastrophic, but it's not great either.
The worst part: I needed a specific timer implementation that played nice with the Pi's real-time clock, and Tokio's time module had a subtle drift that accumulated over 24 hours. We're talking milliseconds becoming seconds. When you're timestamping sensor events, that matters.
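To put that in perspective, here's the back-of-the-envelope math on how per-tick drift compounds. The 0.5 ms-per-tick figure is my assumption for illustration; I only measured the day-scale total:

```rust
fn main() {
    // Assume a 1-second tick with 0.5 ms of drift per tick
    // (illustrative numbers, not measurements from the service).
    let drift_per_tick_ms = 0.5_f64;
    let ticks_per_day = 24 * 60 * 60; // one tick per second

    // Milliseconds of drift per day, converted to seconds.
    let total_drift_s = drift_per_tick_ms * ticks_per_day as f64 / 1000.0;
    println!("accumulated drift over 24h: {} s", total_drift_s);
    // prints: accumulated drift over 24h: 43.2 s
}
```

Half a millisecond per second sounds negligible until it's three-quarters of a minute per day.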
Verdict: Works, but it's like using a sledgehammer to hang a picture frame.
Attempt 2: async-std — The Alternative
```toml
[dependencies]
async-std = { version = "1", features = ["attributes"] }
```
async-std felt more ergonomic. The API is closer to what you'd expect from Rust's standard library. File I/O felt more natural. The binary was slightly smaller (7.1 MB).
But then I hit the wall: async-std's networking stack had a bug with DNS resolution on ARM64 that caused a hang every ~6 hours. I found an open issue from 18 months ago with 47 upvotes and no resolution.
I tried patching it myself. That's when I realized I'd rather rewrite the whole thing than debug someone else's async DNS resolver.
Verdict: Promising, but production-unsafe on ARM for anything long-running.
Attempt 3: smol — The Minimalist
smol is beautiful in its simplicity. Small binary (4.2 MB), low memory footprint (22 MB RSS under the same load), and the async-io crate underneath is surprisingly robust.
The problem? Dependency hell. smol uses the blocking crate for sync-to-async bridging, and our database library (SQLite, via rusqlite) kept deadlocking in subtle ways when called from multiple async tasks. The blocking crate's thread pool would become exhausted, and then... silence. No error, no panic. Just a service that stopped responding.
I spent two days adding timeout wrappers around every database call before I gave up.
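The wrappers looked roughly like this — a std-only sketch with a placeholder closure standing in for the real rusqlite calls:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run a blocking call on its own thread and give up after `timeout`.
/// If the worker is stuck, we stop waiting but the thread itself leaks,
/// which is exactly why this was a band-aid and not a fix.
fn with_timeout<T, F>(timeout: Duration, f: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(f());
    });
    rx.recv_timeout(timeout).ok()
}

fn main() {
    // Placeholder for a database call that returns in time.
    assert_eq!(with_timeout(Duration::from_millis(500), || 42), Some(42));

    // Placeholder for a call that hangs (simulated with a long sleep).
    let hung = with_timeout(Duration::from_millis(50), || {
        thread::sleep(Duration::from_secs(10));
        42
    });
    assert_eq!(hung, None);
}
```

It works, but every leaked thread still holds its database handle, so under sustained deadlocks you're just rationing the failure.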
Verdict: Perfect if you control every dependency. We didn't.
Attempt 4: embassy — The Embedded Champion
This is where things got interesting. Embassy isn't really an async runtime in the traditional sense — it's an async framework designed for no_std embedded systems.
```toml
[dependencies]
embassy-executor = "0.6"
embassy-time = "0.3"
embassy-net = "0.4"
```
Wait, can you even run Embassy on a Raspberry Pi? Technically, Embassy targets microcontrollers (STM32, nRF, ESP32). But the networking and I/O abstractions work on Linux too, thanks to embassy-net's socket backend.
The binary was 2.8 MB. Memory usage stayed flat at 15 MB under load. The timer was rock-solid (it uses the hardware timer abstraction, and on Linux it maps to the appropriate clock source).
There was one catch: the learning curve. Embassy's model is fundamentally different. You don't spawn tasks the way you do in Tokio — you use the Spawner handed to you by embassy_executor::main. The networking API expects you to think in terms of TcpSocket objects rather than streams. It took me a full day to restructure the code.
But once it compiled? It just worked. No memory leaks, no timer drift, no DNS hangs, no thread pool deadlocks. 72 hours of continuous testing without a single hiccup.
```rust
#[embassy_executor::main]
async fn main(spawner: Spawner) {
    // Simplified: net_config, rng, interface, and db are set up earlier.
    let net = embassy_net::Stack::new(
        &mut net_config,
        &mut rng,
        &mut interface,
    );
    // Fire-and-forget spawns; .ok() discards the SpawnError.
    spawner.spawn(sensor_task(net.clone())).ok();
    spawner.spawn(sync_task(net.clone(), db)).ok();
}
```
The Hard Lesson
Here's what I wish someone had told me before I started:
Binary size matters on edge devices. 8 MB vs 2.8 MB isn't just a number — it's the difference between fitting in a constrained update partition and failing deployment.
Timer accuracy is a silent killer. Most people don't notice until they're correlating events across devices and the timestamps don't line up.
"Standard" runtimes aren't optimized for your hardware. Tokio is amazing for servers. It's not optimized for a $35 ARM board with eMMC storage.
The ecosystem lock-in is real. Your choice of async runtime determines which libraries you can use, how you handle errors, and what your deployment looks like.
What I'm Using Now
For our robotics work at moteDB, we ended up with a hybrid: Embassy for the embedded layer (sensor I/O, real-time control), and a minimal synchronous Rust core for database operations. We intentionally avoided async in the database layer — synchronous code with a dedicated thread is simpler, more debuggable, and has predictable performance characteristics.
Sometimes the best async architecture includes knowing when NOT to be async.
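The synchronous core boils down to one dedicated thread owning the database, fed by a channel. Here's a std-only sketch of the pattern — the DbCommand enum is hypothetical, standing in for our real rusqlite operations:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical command set; the real service sends SQLite operations.
enum DbCommand {
    Insert(String),
    Len(mpsc::Sender<usize>),
    Shutdown,
}

/// Spawn the single dedicated database thread. All writes are serialized
/// through one channel, so there's no cross-task locking to reason about.
fn spawn_db_thread() -> (mpsc::Sender<DbCommand>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut rows: Vec<String> = Vec::new(); // stand-in for SQLite
        for cmd in rx {
            match cmd {
                DbCommand::Insert(row) => rows.push(row),
                DbCommand::Len(reply) => {
                    let _ = reply.send(rows.len());
                }
                DbCommand::Shutdown => break,
            }
        }
    });
    (tx, handle)
}

fn main() {
    let (db, handle) = spawn_db_thread();
    db.send(DbCommand::Insert("temp=21.4".into())).unwrap();
    db.send(DbCommand::Insert("temp=21.6".into())).unwrap();

    // Queries get their answer back over a one-shot reply channel.
    let (reply_tx, reply_rx) = mpsc::channel();
    db.send(DbCommand::Len(reply_tx)).unwrap();
    assert_eq!(reply_rx.recv().unwrap(), 2);

    db.send(DbCommand::Shutdown).unwrap();
    handle.join().unwrap();
}
```

Async tasks that need the database just send a command and await the reply on their side; the database thread never knows or cares which runtime is upstream.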
Has anyone else run into the async runtime choice problem on constrained hardware? I'm curious if there are other options I missed — especially anything that bridges the gap between Tokio's ecosystem and Embassy's efficiency.