DEV Community

Denis Tumakov

How I tested 16 processes racing for the same task in 200 lines of Rust

I shipped a small tool last week called coord — a local daemon that lets parallel AI coding agents (Claude Code, Cursor, Codex, all running side by side) coordinate through a shared bulletin board. The whole product is one Rust binary, MIT, single SQLite file for state.

The interesting part isn't the daemon. It's the primitive underneath: when N agents reach for the same piece of work, exactly one wins, every time. No queue, no actor system, no distributed lock. One SQLite UPDATE and a thin Rust wrapper.

This post is about what that primitive looks like and how I convinced myself it's correct — including the test that proves it survives 16 independent OS processes hammering the same daemon over real HTTP.

The problem, one paragraph

You run two AI coding tabs side by side. They post tasks to a shared bulletin: bug reports, ack notes, "please pick this up." Both tabs scan the bulletin every turn. Both tabs sometimes see the same pending bug at the same instant. Without a coordination primitive, both tabs grab it, both write fixes, both push different commits. The merge conflict is the cheapest failure mode; the silent "two tabs running expensive parallel work, only one of them needed" is worse.

The job of coord is to make exactly one of those tabs the winner of the claim, and to tell the other tab "you lost; pick something else." Determinism, not optimism.

Why a queue or a mutex doesn't cut it

The first idea most engineers have is "stick a Mutex<Vec<Task>> behind an RPC call." It works inside one process. It does not survive process boundaries — and the whole point of the system is that the agents are different processes (different IDE plugins, different language runtimes, sometimes different machines on the same loopback).

The second idea is a queue (Redis, RabbitMQ, NATS). A queue does solve the claim race, but it brings a lot of baggage:

  • A second daemon to install, monitor, and back up.
  • A protocol the IDE side has to learn (or you end up writing a queue-to-MCP adapter).
  • Ordering and delivery guarantees you don't actually need for "is this task claimed yet?" semantics.

The third idea is a distributed lock (e.g. Redis SETNX, Postgres advisory locks). Closer, but still a separate piece of infrastructure for what is essentially a per-developer-laptop tool.

For a tool that wants to be brew install-and-go, the right primitive is the smallest one that does the job. SQLite already lives inside the daemon binary. SQLite already serializes writes. The job becomes "ask SQLite to tell me whether I won."
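The post never shows the table itself, so here is a minimal schema that would support the claim primitive — column names inferred from the UPDATE statement, a sketch rather than the repo's actual DDL:

```sql
CREATE TABLE tasks (
    id         TEXT PRIMARY KEY,          -- UUID, stored as text
    state      TEXT NOT NULL DEFAULT 'pending',  -- 'pending' | 'claimed' | ...
    claimed_by TEXT,                      -- agent_id of the winner, NULL while pending
    updated_at TEXT NOT NULL,             -- RFC 3339 timestamp
    payload    TEXT                       -- task body, JSON as text
);
```

Everything the race needs is the `state` column; the rest is bookkeeping.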

The whole primitive

This is the entire claim function:

pub fn claim_task(&self, id: Uuid, agent_id: &str) -> Result<Option<Task>> {
    let now = Utc::now();
    let conn = self.conn.lock();
    let updated = conn.execute(
        "UPDATE tasks SET state = 'claimed', claimed_by = ?1, updated_at = ?2
         WHERE id = ?3 AND state = 'pending'",
        params![agent_id, now.to_rfc3339(), id.to_string()],
    )?;
    if updated == 0 {
        Ok(None) // somebody else got there first
    } else {
        // re-read to get the canonical row, including the winner.
        Ok(Some(conn.query_row(
            "SELECT ... FROM tasks WHERE id = ?1",
            params![id.to_string()],
            row_to_task,
        )?))
    }
}

The whole correctness argument is in the WHERE id = ?3 AND state = 'pending' clause. Two clients may issue the same UPDATE simultaneously; SQLite serializes the writes; the second one finds state is no longer 'pending' and updates 0 rows. updated == 0 is the loser signal, propagated up the stack as Ok(None). The winner gets a fresh Some(Task). That's it.

There is no lock object, no compare-and-swap dance, no explicit transaction in user code. The transaction is the single statement. SQLite's WAL mode lets concurrent readers proceed without blocking the writer.
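For reference, the WAL behavior comes down to a couple of pragmas at connection open — `busy_timeout` and its value are my assumption, not confirmed from the repo:

```sql
PRAGMA journal_mode = WAL;   -- readers and the writer no longer block each other
PRAGMA busy_timeout = 5000;  -- on a contended write, wait up to 5s instead of erroring
```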

The reason I'm comfortable putting this in a critical path is that it's the same primitive underneath every database-backed work queue you've ever used. Postgres SELECT ... FOR UPDATE SKIP LOCKED, Redis SETNX, all the big managed queue services — at the bottom they're some flavor of "atomic compare-and-set on a row." This one just doesn't dress it up.

Test 1: in-process, exhaustive

The first test hammers the primitive directly without going through the network at all. The intent: catch correctness regressions in the SQL or the Rust wrapper, not the HTTP layer.

#[test]
fn one_claimer_per_task_under_concurrent_load() {
    let store = Arc::new(Store::open(&db_path).unwrap());

    const TASKS: usize = 200;
    const CLAIMERS_PER_TASK: usize = 8;

    let task_ids: Vec<_> = (0..TASKS)
        .map(|i| store.create_task(&format!("race-{i}"), json!({"i": i}))
                  .unwrap().id)
        .collect();

    let mut handles = Vec::new();
    for task_id in &task_ids {
        for c in 0..CLAIMERS_PER_TASK {
            let store = store.clone();
            let id = *task_id;
            let agent = format!("claimer-{c}");
            handles.push(thread::spawn(move || {
                store.claim_task(id, &agent).unwrap().is_some()
            }));
        }
    }

    let wins: usize = handles.into_iter()
        .map(|h| h.join().unwrap() as usize)
        .sum();

    assert_eq!(wins, TASKS,
        "expected exactly one win per task (got {wins} wins for {TASKS} tasks)");
}

200 tasks × 8 claimers = 1,600 simultaneous claim attempts. The assertion is strict: the total number of Ok(Some(_)) returns across all threads must equal exactly 200. If any task is claimed twice, the count becomes 201 or more and the test fails. If any task is missed entirely (e.g. a row-level lock dropped a write silently), the count becomes 199 and the test fails. The only way the assertion holds is if every single attempt is unambiguously a winner or a loser, with no ambiguous middle.

It runs in about 70 milliseconds in CI. I run it on every commit.

Test 2: real processes, real HTTP

The in-process test is reassuring but it has a giant asterisk: it doesn't prove the daemon survives multiple HTTP clients. SQLite serializing writes inside one Rust process is one thing; an axum server holding a single Mutex<Connection> while 16 reqwest clients hit it concurrently is another. The behavior I cared about — "16 IDE tabs in different OS processes, one daemon, exactly one winner per task" — has to be tested end-to-end.

So the second test spawns the actual daemon binary, lets it bind a free port, and fires 16 independent client processes at it:

const NUM_CLIENTS: usize = 16;

#[test]
fn many_concurrent_clients_share_one_daemon_safely() {
    // 1. Start a real `coord serve` on a free localhost port.
    let mut daemon = Command::new(&coord)
        .arg("serve").arg("--addr").arg(format!("127.0.0.1:{port}"))
        .arg("--db").arg(&db).spawn().unwrap();

    // 2. Seed N pending tasks via the CLI (separate process).
    let task_ids: Vec<String> = (0..NUM_CLIENTS).map(|i| {
        // ... shells out to `coord send` and parses the returned UUID
    }).collect();

    // 3. Spawn N independent shell scripts, each racing to claim
    //    every task in the list.
    let mut children = Vec::new();
    for c in 0..NUM_CLIENTS {
        let agent = format!("multi-agent-{c}");
        let mut script = String::new();
        for tid in &task_ids {
            script.push_str(&format!(
                "{coord} --url {url} claim {tid} --as {agent} \
                 >/dev/null 2>&1 && echo {tid}\n"));
        }
        children.push(Command::new("sh").arg("-c").arg(&script)
            .stdout(Stdio::piped()).spawn().unwrap());
    }

    // 4. Collect every (winner, task_id) pair.
    let mut all_wins = Vec::new();
    for (idx, child) in children.into_iter().enumerate() {
        let out = child.wait_with_output().unwrap();
        for line in String::from_utf8_lossy(&out.stdout).lines() {
            all_wins.push((idx, line.trim().to_string()));
        }
    }

    // 5. Assert: every task has exactly one winner.
    let mut by_task: HashMap<String, Vec<usize>> = Default::default();
    for (idx, id) in &all_wins {
        by_task.entry(id.clone()).or_default().push(*idx);
    }
    assert_eq!(by_task.len(), NUM_CLIENTS);
    for (_id, winners) in &by_task {
        assert_eq!(winners.len(), 1, "race lost: {winners:?}");
    }
}

A few things worth pointing out about this test design:

  • The clients are real processes, not threads. Command::new("sh") forks. Each one gets its own coord CLI invocation, its own reqwest client, its own TCP connection. There is no shared memory between them and no shared Tokio runtime. That's the actual deployment shape.
  • Each script tries to claim every task. With 16 clients and 16 tasks, that's 256 individual claim attempts hitting the daemon. The daemon has to reject 240 of them and accept 16 of them, with no overlap.
  • The assertion is again a counting argument. by_task.len() == NUM_CLIENTS says "every task got at least one winner;" the per-task winners.len() == 1 says "no task got more than one winner." Together they're a strict bijection between tasks and winners.
  • It runs in about 1.3 seconds, including spawning the daemon, waiting for it to bind, seeding tasks, racing, killing the daemon. Slow enough to be annoying as a unit test, fast enough to keep in CI.

This test is the one that gives me confidence to claim "coord scales to N agents and N apps" in the README. Not because 16 is a large number, but because the failure modes for 16 are the failure modes for 1,600 — if it works at this scale and the design hasn't introduced an O(N) lock contention path, it works at the next scale up.

What I didn't need to build

The motivation for writing this post is half technical and half cultural. There is a strong gravity in 2026 engineering culture toward immediately reaching for distributed primitives — actors, channels, CRDTs, lock services — when the actual workload fits comfortably on one machine.

For coord's actual workload (a developer's laptop, 2–10 agents, a few thousand tasks per day) the right answer is a single SQLite file and a single statement. The work isn't "design a queue;" the work is "trust the database to do its job, and verify it with a test that would catch a regression."

Things I considered and threw out:

  • A tokio::sync::Mutex<HashMap<Uuid, Task>> — works in-process, dies the moment the daemon restarts.
  • An actor per task — neat but every task lives only seconds; the spawn cost dominates.
  • Optimistic concurrency at the application level (read row, compare version, write) — that's just reimplementing what the SQL WHERE state = 'pending' clause already does for free, with more chances to get it wrong.
  • A pluggable storage backend — overkill until somebody asks. The whole product is one binary right now.

The shape of the test suite reinforces the design. If the primitive were doing more, the tests would have to be more complicated to constrain its behaviour. They aren't, because it isn't.

What you should take from this

If you're building anything where two agents (humans, processes, or LLM tabs) might reach for the same piece of work at once, you probably do not need a queue. You need a single atomic compare-and-set, plus a test that proves you can issue 1,600 of them concurrently and get exactly the right number of winners.

The technique generalizes far past AI agents:

  • Job runners (one task, one worker)
  • Lease/lock servers (one holder, until release)
  • Reservation systems (one seat, one buyer)
  • Cron deduplication (one runner per scheduled instant)

In each case the loop is the same: a WHERE state = 'available' clause in an UPDATE, the rowcount as your winner signal, and a counting-argument test that asserts the total number of winners across all attempts equals the number of available slots. That last test, the counting argument, is what turns "I think this is right" into "I know this is right." It catches things eyeballed code reviews don't — the clever-but-wrong refactor, the changed default isolation level, the silent race introduced by a connection pool tweak.

If you want to see the full machinery, coord is open-source: github.com/DmarshalTU/coord. The two tests above live in tests/race.rs and tests/multi_client.rs. Together they're under 200 lines and they're the load-bearing proof for everything else the project promises.


coord runs locally as a single Rust binary. It's MIT-licensed and installable with brew tap dmarshaltu/coord && brew install coord. The repo is github.com/DmarshalTU/coord.
