I Spent 3 Months Tuning a Tokio Runtime for My Robot — Here's What No Tutorial Tells You
Last November, my robot arm started dropping sensor frames at exactly 47ms intervals. Not randomly — exactly 47ms, like clockwork. It would read joint angles perfectly for a while, then miss a window, then recover. The anomaly detector we'd wired into the control loop kept triggering false positives. My teammate Rui and I spent two full weeks convinced the CAN bus driver was broken.
It wasn't the driver.
It was #[tokio::main].
The Setup
We were building an AI-driven robot arm that does pick-and-place with semantic understanding. The control loop needs:
- 1ms cycle time for joint position updates
- 10ms window to fuse sensor data before inference
- Background persistence — log everything to a local database so we can replay sessions offline
The stack was reasonable: Rust, Tokio for async, a custom message bus, moteDB for embedded storage (vectors + time-series + structured state in one engine). We used #[tokio::main] because that's what every tutorial shows, with a thread pool spawned for the heavy inference work.
It worked great on my laptop. It fell apart on the robot.
What #[tokio::main] Actually Does (And Doesn't Do)
Here is the thing nobody explains in the "Getting Started with Tokio" guides: #[tokio::main] spins up a multi-threaded runtime with a number of worker threads equal to the number of logical CPU cores. On a modern dev machine that's 8-16. On a Raspberry Pi 5? 4 cores — and two of them are already pressured by the camera pipeline and the neural inference engine.
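You can see what the default would be on your target hardware before deploying. Tokio derives its worker count from the logical core count, which on recent toolchains matches what the standard library reports:

```rust
use std::thread;

fn main() {
    // Tokio's multi-threaded runtime defaults to one worker per logical core.
    // On most systems this matches what this std call reports, so it's a
    // quick way to see what #[tokio::main] would give you on the robot.
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    println!("#[tokio::main] would start ~{cores} worker threads here");
}
```

Run this on the Pi, not on your laptop; the mismatch between the two numbers is exactly where our problem hid.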
The bigger problem: Tokio's work-stealing scheduler doesn't know anything about real-time priorities. It will cheerfully preempt your 1ms control loop task to service a database flush, a log write, or a DNS resolution that some library decided to make async under the hood.
That 47ms drop? The Tokio scheduler was occasionally parking our sensor polling task while flushing a batch write to moteDB. The flush was async, perfectly polite, and completely invisible in any standard profiling tool because it showed up as I/O wait rather than CPU time.
The Fix: Surgical Runtime Configuration
Instead of #[tokio::main], we switched to:
```rust
fn main() {
    // Dedicated single-thread runtime for the control loop
    let control_rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(1)
        .thread_name("control-loop")
        // Builder has no priority knob of its own; raise the OS thread
        // priority as each worker starts (here via the `thread_priority` crate)
        .on_thread_start(|| {
            thread_priority::set_current_thread_priority(
                thread_priority::ThreadPriority::Max,
            )
            .expect("failed to raise control-loop thread priority");
        })
        .build()
        .unwrap();

    // Separate runtime for background I/O (storage, logging, telemetry)
    let io_rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .thread_name("background-io")
        .build()
        .unwrap();

    // Spawn the control loop on its dedicated runtime
    control_rt.spawn(async move {
        run_control_loop().await;
    });

    // Run storage + inference on the I/O runtime
    io_rt.block_on(async move {
        run_support_tasks().await;
    });
}
```
Two runtimes. The control loop never shares a thread pool with storage I/O or inference scheduling. After this change, our 47ms drops disappeared entirely.
Three Things I Wish Someone Had Told Me
1. spawn_blocking is not free
Every call to spawn_blocking steals a thread from a shared blocking thread pool (default: 512 threads). If you're calling it in a tight loop for sensor serialization, you will exhaust the pool under load. We switched to dedicated std::thread::spawn for our serialization hot path and kept spawn_blocking only for true one-offs.
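Here is a minimal sketch of the dedicated-thread pattern we landed on. `SensorFrame` and `serialize` are hypothetical stand-ins for our real joint-state struct and serializer; the point is the shape: one long-lived OS thread owns the hot path, fed by a bounded channel, instead of churning the `spawn_blocking` pool.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical sensor frame; stands in for the real joint-state struct.
struct SensorFrame {
    joint_angles: [f64; 6],
}

// Stand-in for the real serializer (bincode, postcard, etc.)
fn serialize(frame: &SensorFrame) -> Vec<u8> {
    frame.joint_angles.iter().flat_map(|a| a.to_le_bytes()).collect()
}

fn main() {
    // Bounded channel: if the serializer falls behind, send() blocks the
    // producer instead of silently growing an unbounded queue.
    let (tx, rx) = mpsc::sync_channel::<SensorFrame>(64);

    // One dedicated OS thread owns the serialization hot path.
    let worker = thread::spawn(move || {
        let mut total = 0usize;
        for frame in rx {
            total += serialize(&frame).len();
        }
        total
    });

    for i in 0..100 {
        tx.send(SensorFrame { joint_angles: [i as f64; 6] }).unwrap();
    }
    drop(tx); // close the channel so the worker loop ends

    let bytes = worker.join().unwrap();
    assert_eq!(bytes, 100 * 6 * 8); // 100 frames, 6 joints, 8 bytes each
    println!("serialized {bytes} bytes on the dedicated thread");
}
```

The bounded `sync_channel` also gives you a natural pressure valve: pick a capacity, and decide explicitly whether the producer should block or drop when it fills.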
2. Async mutex is slower than you think at high frequency
tokio::sync::Mutex parks the task and hands control to the scheduler when it contends. At 1ms cycle time, this is catastrophic. For shared state between the control loop and the storage layer, we used std::sync::Mutex — a blocking primitive — because the lock hold time was microseconds and the task switch overhead of the async version was larger.
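A sketch of the pattern, with a hypothetical `LatestPose` standing in for our shared state. The discipline that makes `std::sync::Mutex` safe here: keep the critical section to a plain memory copy, and never hold the guard across an `.await`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical shared state between the control loop and the storage layer.
#[derive(Default, Clone, Copy)]
struct LatestPose {
    joint_angles: [f64; 6],
    seq: u64,
}

fn main() {
    let shared = Arc::new(Mutex::new(LatestPose::default()));

    // "Control loop" side: publishes a new pose each cycle. The critical
    // section is a memory copy, held for well under a microsecond, so a
    // blocking lock is cheaper than parking an async task on contention.
    let writer = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            for seq in 1..=10_000u64 {
                let mut pose = shared.lock().unwrap();
                pose.joint_angles = [seq as f64; 6];
                pose.seq = seq;
                // guard dropped here; keep this scope tiny
            }
        })
    };

    writer.join().unwrap();

    // "Storage" side: snapshot under the lock, then work on the copy outside it.
    let snapshot = *shared.lock().unwrap();
    assert_eq!(snapshot.seq, 10_000);
    println!("last published seq = {}", snapshot.seq);
}
```

If you do need the lock inside async code, take it, copy out, and drop the guard before any `.await`; holding a blocking mutex across an await point is the classic way to deadlock a small worker pool.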
3. The database write path must not block the runtime
This is where moteDB's design helped us: writes are append-only to a WAL first (sub-microsecond), with the actual B-tree / vector index update deferred to the background runtime. If your embedded database does synchronous index updates on every write, you will feel it in your control loop latency. The write path and the read path need different scheduling contracts.
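moteDB's internals aren't shown here, so this is a generic sketch of the pattern under the stated assumptions: the caller's write is a cheap append (a WAL file plus fsync in a real engine; an in-memory buffer here), and the expensive index update is handed to a background thread over a channel instead of running inline.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical record type; stands in for one sensor log entry.
struct Record {
    key: u64,
    payload: Vec<u8>,
}

fn main() {
    let (index_tx, index_rx) = mpsc::channel::<u64>();

    // Background "index" worker: the expensive structural update (a B-tree
    // or vector index rebuild in a real engine) runs off the write path.
    let indexer = thread::spawn(move || {
        let mut keys: Vec<u64> = index_rx.iter().collect();
        keys.sort_unstable(); // stand-in for the real index maintenance
        keys
    });

    // Write path: append to the WAL and enqueue the index work, then return.
    let mut wal: Vec<u8> = Vec::new();
    for key in (0..1_000u64).rev() {
        let rec = Record { key, payload: vec![0u8; 16] };
        wal.extend_from_slice(&rec.key.to_le_bytes()); // cheap append
        wal.extend_from_slice(&rec.payload);
        index_tx.send(rec.key).unwrap(); // defer the expensive part
    }
    drop(index_tx); // close the channel so the indexer finishes

    let index = indexer.join().unwrap();
    assert_eq!(index.len(), 1_000);
    assert_eq!(wal.len(), 1_000 * (8 + 16));
    println!("WAL bytes: {}, indexed keys: {}", wal.len(), index.len());
}
```

The scheduling contract this buys you: write latency is bounded by the append, not by the index, which is exactly what a 1ms control loop needs from its persistence layer.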
The Bigger Pattern
Embedded AI systems are not web servers. On a web server, a 50ms hiccup on one request is invisible to other requests. On a robot, a 50ms hiccup in your control loop is a dropped object, a wrong turn, or a crash.
The #[tokio::main] default was designed for web services where fairness across tasks is the right trade-off. For real-time embedded work, you need:
- Isolation: critical tasks on dedicated runtimes
- Priority: OS-level thread priorities for the control loop
- Non-blocking storage: a database whose write path does not block the scheduler
We ended up with a three-layer architecture: hard real-time control loop, soft real-time sensor fusion + inference, and best-effort persistence and telemetry. Each layer has its own Tokio runtime, and they communicate via bounded channels (tokio::sync::mpsc with fixed capacity, so a slow consumer produces explicit backpressure instead of an unbounded queue).
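The property that matters between layers is the bounded capacity: the real-time side must never block on a slow consumer, so it uses a non-blocking send and counts the drop. `tokio::sync::mpsc` bounded channels expose this via `try_send`; the sketch below shows the same shape with std's `sync_channel` so it runs without a runtime.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

fn main() {
    // Capacity 4: mimics a small bounded channel between layers.
    let (tx, rx) = sync_channel::<u64>(4);

    let mut sent = 0;
    let mut dropped = 0;

    // Real-time side: try_send and count (or log) the drop rather than
    // stalling the control loop behind a slow consumer.
    for frame in 0..10u64 {
        match tx.try_send(frame) {
            Ok(()) => sent += 1,
            Err(TrySendError::Full(_)) => dropped += 1, // consumer is behind
            Err(TrySendError::Disconnected(_)) => break,
        }
    }

    // Nothing drained during the burst, so only the first 4 frames fit.
    assert_eq!(sent, 4);
    assert_eq!(dropped, 6);

    // Best-effort side drains at its own pace.
    let received: Vec<u64> = rx.try_iter().collect();
    assert_eq!(received, vec![0, 1, 2, 3]);
    println!("sent {sent}, dropped {dropped}");
}
```

Dropping a frame of telemetry is recoverable; a stalled control loop is not, which is why the drop policy lives on the producer side.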
The 47ms drops are gone. We have been running stable for three months.
TL;DR
- #[tokio::main] is fine for most things. For embedded real-time, it is a footgun.
- Use separate tokio::runtime::Builder instances to isolate critical paths.
- std::sync::Mutex beats tokio::sync::Mutex when lock hold time is microseconds.
- Make sure your storage layer (whatever it is) has non-blocking write semantics.
Has anyone else hit scheduler interference issues in embedded Rust? I'm curious whether the community has converged on better patterns here — or whether this is still a "figure it out yourself" problem.