I built a daemon in Rust that supervises multiple AI coding agents. My first instinct was tokio. Two weeks later, I ripped it out and replaced it with a synchronous loop and std::thread::sleep().
The daemon has been running in production ever since. Here's why.
What the daemon does
Batty supervises AI coding agents running in tmux panes. Every 5 seconds, it:
- Polls each pane to detect agent state (idle, working, dead)
- Delivers queued messages to agents via tmux paste-buffer
- Dispatches tasks from a Markdown kanban board
- Runs test suites on completed work
- Serializes merges across git worktrees
- Nudges idle agents
Each of these is a shell command (tmux capture-pane, cargo test, git merge) or a file operation. None of them benefits from concurrent I/O. The daemon spends more than 99% of its time sleeping.
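As a rough illustration of the polling step, here is a minimal sketch of how one pane might be captured and classified. The tmux invocation (capture-pane -p -t) is standard tmux. The AgentState enum, the function names, and the prompt-based heuristic are assumptions made for this example, not Batty's actual detection logic.

```rust
use std::io;
use std::process::Command;

/// The three agent states named in the list above.
#[derive(Debug, PartialEq, Clone, Copy)]
enum AgentState {
    Idle,
    Working,
    Dead,
}

/// Capture the visible text of one tmux pane.
/// `output()` blocks until the tmux process has exited.
fn capture_pane(target: &str) -> io::Result<String> {
    let out = Command::new("tmux")
        .args(["capture-pane", "-p", "-t", target])
        .output()?;
    if !out.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "tmux capture-pane failed"));
    }
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

/// Hypothetical heuristic: no capture means the pane is gone (Dead),
/// a last non-blank line ending in '>' looks like a waiting prompt (Idle),
/// and anything else is treated as Working.
fn classify(pane_text: Option<&str>) -> AgentState {
    match pane_text {
        None => AgentState::Dead,
        Some(text) => match text.lines().rev().find(|l| !l.trim().is_empty()) {
            Some(last) if last.trim_end().ends_with('>') => AgentState::Idle,
            _ => AgentState::Working,
        },
    }
}

/// One poll: capture the pane, then classify whatever came back.
fn poll_pane(target: &str) -> AgentState {
    classify(capture_pane(target).ok().as_deref())
}
```

Keeping classify() separate from the tmux call means the heuristic can be exercised with plain strings, without a tmux session.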
The async version
My first implementation used tokio:
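use std::time::Duration;
// poll_agents, deliver_messages, and check_tests are async fns defined elsewhere.
#[tokio::main]
async fn main() {
    loop {
        tokio::join!(
            poll_agents(),
            deliver_messages(),
            check_tests(),
        );
        tokio::time::sleep(Duration::from_secs(5)).await;
    }
}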
This looked clean until the state management arrived. The daemon tracks per-agent lifecycle state, active tasks, delivery retries, nudge schedules, and failure patterns. All mutable, all interdependent.
In async, shared mutable state means Arc<Mutex<_>> everywhere:
use std::sync::Arc;
use tokio::sync::Mutex; // async mutex: lock() returns a future
let state = Arc::new(Mutex::new(DaemonState::new()));
let s1 = state.clone();
tokio::spawn(async move {
    let mut guard = s1.lock().await;
    guard.poll_agents(); // borrows mut state
    // guard dropped here
});
let s2 = state.clone();
tokio::spawn(async move {
    let mut guard = s2.lock().await;
    guard.deliver_messages(); // also borrows mut state
});
Every step needs the lock. The steps are interdependent — delivering a message might change agent state, which affects whether we should dispatch a new task. The "concurrent" async tasks were actually serialized behind a single mutex. I was paying the complexity cost of async with none of the concurrency benefit.
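The serialization effect is easy to reproduce without tokio. The sketch below is an analogue using std threads and std::sync::Mutex rather than the daemon's async code, and run_behind_one_lock is a hypothetical helper written for this example. Each worker holds the one shared lock for the whole duration of its "step", so the total wall-clock time is at least the sum of the steps, however many workers run "concurrently".

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Spawn `workers` threads that each hold the shared lock for `hold`,
/// and return the wall-clock time until all of them have finished.
fn run_behind_one_lock(workers: usize, hold: Duration) -> Duration {
    let state = Arc::new(Mutex::new(0u32)); // stand-in for DaemonState
    let start = Instant::now();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let state = Arc::clone(&state);
            thread::spawn(move || {
                let mut guard = state.lock().unwrap();
                thread::sleep(hold); // the step runs while the lock is held
                *guard += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    start.elapsed()
}
```

With three workers and a 50 ms step, the elapsed time cannot drop below 150 ms: the threads exist, but the lock lets only one of them make progress at a time.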
The synchronous version
fn main() {
    let mut state = DaemonState::new();
    loop {
        state.poll_agents();
        state.deliver_messages();
        state.check_tests();
        state.dispatch_tasks();
        state.nudge_idle_agents();
        thread::sleep(Duration::from_secs(5));
    }
}
No Arc. No Mutex. No .await. No runtime. State is a single struct with methods called in order. The borrow checker is happy because there's exactly one owner.
What I gained
Readable stack traces. When something panics in async Rust, the backtrace is a maze of tokio internals. In synchronous code, the backtrace shows the exact call chain.
Predictable execution order. Steps run in the order they appear in the code. If dispatch happens before nudging, it's because line 7 comes before line 8. No task scheduling surprises.
Simpler testing. Unit tests call methods on a struct. No need for #[tokio::test], no test runtime, no async setup.
Fewer dependencies. Removing tokio removed a significant chunk of the dependency tree. The binary got smaller. Compile times got shorter.
Easier reasoning about state. When deliver_messages() modifies state, check_tests() immediately sees the new state on the next line. No race conditions, no stale reads, no lock ordering bugs.
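To make the testing and state-visibility points concrete, here is a minimal sketch of a single-owner state struct. The fields and the dispatch rule are invented for illustration and are far simpler than the real DaemonState. The point is only that a change made by one step is visible to the next step, and that the test is an ordinary #[test] function.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum AgentState {
    Idle,
    Working,
}

/// Hypothetical, heavily simplified daemon state.
struct DaemonState {
    agents: Vec<AgentState>,
    queued_messages: Vec<usize>,    // index of the agent each message is for
    backlog: Vec<String>,           // tasks waiting to be dispatched
    assigned: Vec<(usize, String)>, // (agent index, task) pairs
}

impl DaemonState {
    /// Delivering a message puts the receiving agent to work.
    fn deliver_messages(&mut self) {
        for idx in self.queued_messages.drain(..) {
            self.agents[idx] = AgentState::Working;
        }
    }

    /// Only agents that are still idle receive a task from the backlog.
    fn dispatch_tasks(&mut self) {
        for (idx, agent) in self.agents.iter_mut().enumerate() {
            if *agent == AgentState::Idle {
                if let Some(task) = self.backlog.pop() {
                    *agent = AgentState::Working;
                    self.assigned.push((idx, task));
                }
            }
        }
    }
}

#[test]
fn dispatch_sees_what_delivery_changed() {
    let mut state = DaemonState {
        agents: vec![AgentState::Idle, AgentState::Idle],
        queued_messages: vec![0],
        backlog: vec!["fix-bug".to_string()],
        assigned: Vec::new(),
    };
    state.deliver_messages(); // agent 0 becomes Working
    state.dispatch_tasks();   // so the task goes to agent 1
    assert_eq!(state.assigned, vec![(1, "fix-bug".to_string())]);
}
```

Nothing here needs a runtime or a lock, because every method takes &mut self and the loop is the only caller.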
What I gave up
Sub-second responsiveness. The daemon only reacts every 5 seconds. If an agent finishes a task 1 second into the sleep, the daemon does not notice for another 4 seconds. The worst case is a full 5-second interval plus however long the loop body takes.
For this use case, that's fine. Agents take minutes per task. A 5-second detection delay is noise. If I were building a web server handling thousands of requests per second, async would be the obvious choice.
Concurrent I/O. If the daemon needed to poll 50 agents simultaneously over a network, synchronous iteration would be too slow. At 3-5 local tmux panes, each poll takes 1-5ms, so a full pass takes 3-25ms. Async would save a few milliseconds per 5-second cycle at best.
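The arithmetic behind that estimate can be written down directly. The sketch below uses the figures from the paragraph above (3 to 5 panes, 1 to 5 ms per poll, a 5-second sleep); the function names are made up for this example. It assumes the idealized case where fully concurrent polling would take only as long as the slowest single poll.

```rust
/// Best and worst case, in milliseconds, for polling panes one after another.
/// Both arguments are (min, max) ranges.
fn sequential_ms(panes: (u64, u64), per_poll_ms: (u64, u64)) -> (u64, u64) {
    (panes.0 * per_poll_ms.0, panes.1 * per_poll_ms.1)
}

/// Most that idealized concurrent polling could save in the worst case:
/// the sequential total minus the slowest single poll.
fn max_saving_ms(panes_max: u64, per_poll_max_ms: u64) -> u64 {
    panes_max * per_poll_max_ms - per_poll_max_ms
}

/// Share of one loop iteration spent polling, in percent.
fn busy_percent(poll_ms: u64, sleep_ms: u64) -> f64 {
    100.0 * poll_ms as f64 / (poll_ms + sleep_ms) as f64
}
```

With these inputs, sequential_ms((3, 5), (1, 5)) gives a range of 3 to 25 ms, max_saving_ms(5, 5) gives 20 ms, and busy_percent(25, 5000) stays below 0.5%. Even the worst-case pass is a rounding error next to the sleep.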
When async is the right call
Async Rust shines when you have:
- Many concurrent connections (web servers, proxies, chat systems)
- I/O-bound work where you're waiting on network responses
- High throughput requirements where sleeping for 5 seconds is unacceptable
Async is not the right call when you have:
- A poll loop with a multi-second interval — you're mostly sleeping
- Heavily interdependent mutable state — the mutex serializes everything anyway
- Shell command execution as the primary I/O — Command::new() is inherently blocking
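To see what blocking means for the last point, here is a small sketch using std::process::Command. The run_blocking helper is hypothetical, and it assumes a POSIX sh is available on the PATH. The call to output() does not return until the child process has exited, so the calling thread simply waits for it.

```rust
use std::io;
use std::process::Command;
use std::time::{Duration, Instant};

/// Run a shell script and return its trimmed stdout together with the
/// time the calling thread spent waiting inside `output()`.
fn run_blocking(script: &str) -> io::Result<(String, Duration)> {
    let start = Instant::now();
    let out = Command::new("sh").args(["-c", script]).output()?;
    let text = String::from_utf8_lossy(&out.stdout).trim().to_string();
    Ok((text, start.elapsed()))
}
```

Calling run_blocking("sleep 1; echo done") returns only after the script has finished, with the elapsed time at or above one second. In a loop that runs its steps in order anyway, that wait is exactly the behavior wanted.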
The uncomfortable question
Before reaching for #[tokio::main], ask: what concurrent I/O am I actually doing?
If the answer is "none, I'm polling state and sleeping," a synchronous loop is simpler, easier to debug, and equally correct. The Rust ecosystem has a strong async culture, and tokio is excellent software. But not every daemon needs it.
My daemon runs a loop with sleep(5). It's ~200 lines. It's been stable for months. And I don't miss the Arc<Mutex<_>>.
The code: github.com/battysh/batty | Architecture deep dive: Dev.to article