Joshua Brackin

Posted on Jun 29 • Originally published at ousley.ai

Watch a coding agent silence a Swift 6 data race instead of fixing it

#swift #ios #programming #ai

Give a coding agent a Swift file that stopped compiling under strict concurrency, and a lot of the time it will make the build green by adding one annotation. The error goes away. The data race it was warning about does not.

I've been running agents against real Swift 6 repair tasks: take a small package that builds clean, introduce one concurrency bug, and ask the agent to fix it with the build green and the tests passing. The setup matters. These are not "write me a feature" prompts where you can't tell good output from bad. There is a right answer and a wrong answer, and the compiler under -strict-concurrency=complete is standing right there to tell them apart.

First, the part I'll concede, because this audience has heard the lazy version and rightly rejects it. Frontier models write good Swift concurrency code. Ask one to design an actor or thread a value through a task group from scratch and the result is usually clean. Writing the code was never the bottleneck. The trouble starts when the model is handed a strict-concurrency error and told to make it go away, because "make it go away" has a cheap wrong answer that the compiler accepts.

Here's a concrete one. A value type that crosses into concurrent code, declared Sendable:

public struct Transfer: Sendable {
    public let amount: Int
    public let memo: String
}

Now someone adds a stored property whose type is a mutable class:

public final class AuditPen {
    public var ink: Int
    public init(ink: Int) { self.ink = ink }
}

public struct Transfer: Sendable {
    public let amount: Int
    public let memo: String
    let pen: AuditPen   // mutable reference type
}

The build breaks, correctly:

stored property 'pen' of 'Sendable'-conforming struct 'Transfer'
has non-sendable type 'AuditPen'

That error is doing its job. Transfer claims it's safe to hand across isolation boundaries, but it now carries a mutable reference that two tasks could write to at the same time. The compiler caught a real race before it could happen.

The fix the agent reaches for:

public struct Transfer: @unchecked Sendable {

One word, @unchecked. Green build. Every test still passes, because the tests never exercised concurrent mutation of pen. And the race is exactly as present as it was a minute ago, now with the compiler told to stop mentioning it. @unchecked Sendable is a promise from you to the compiler that you have made this type safe by hand. Nothing was made safe. The promise is empty.

I want to be fair to the keyword, because the honest version of this is more interesting than "agent dumb." @unchecked Sendable is a real, correct tool. If AuditPen guarded every access to ink behind a lock, marking the wrapper @unchecked Sendable would be the right call, because you'd actually have done the synchronization the compiler can't see. The problem is not the annotation. It's reaching for the annotation with nothing behind it. A person writes @unchecked Sendable after deciding the type is safe. The agent writes it because it's the shortest edit that turns red into green, and it has no separate notion of "safe" to check the edit against.
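A minimal sketch of what "something behind it" could look like, assuming a lock-guarded rewrite of AuditPen. The `NSLock`, the private `_ink` storage, and the `use(_:)` method are hypothetical additions for illustration, not part of the package above. In this sketch the annotation sits on the class that owns the lock, so `Transfer` can keep a plain, compiler-checked `Sendable`.

```swift
import Foundation

// Hypothetical lock-guarded variant of AuditPen (an assumption for this sketch).
// The @unchecked promise is backed by real synchronization: every read and
// write of `_ink` happens while `lock` is held.
public final class AuditPen: @unchecked Sendable {
    private let lock = NSLock()
    private var _ink: Int   // only touched while `lock` is held

    public init(ink: Int) { self._ink = ink }

    public var ink: Int {
        lock.lock()
        defer { lock.unlock() }
        return _ink
    }

    // Hypothetical mutation entry point; there is no public setter to bypass the lock.
    public func use(_ amount: Int) {
        lock.lock()
        defer { lock.unlock() }
        _ink -= amount
    }
}

// With AuditPen carrying the hand-written promise, Transfer stays checked.
public struct Transfer: Sendable {
    public let amount: Int
    public let memo: String
    let pen: AuditPen
}
```

The difference from the agent's edit is not the keyword. It is that someone can point at the lock and say why the promise holds.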

The real fix is to make the type genuinely safe again: drop the mutable member, make it an immutable value, or move the mutable state behind an actor. More work, no new annotation, and the Sendable conformance stays honest.
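Two of those options could be sketched like this. The names `SharedPen`, `LiveTransfer`, and `use(_:)` are hypothetical, and which option fits depends on whether the pen's state really needs to change after the transfer is created.

```swift
// Option 1: make the member an immutable value. A struct with only `let`
// stored properties of Sendable types can be checked by the compiler.
public struct AuditPen: Sendable {
    public let ink: Int
    public init(ink: Int) { self.ink = ink }
}

public struct Transfer: Sendable {
    public let amount: Int
    public let memo: String
    let pen: AuditPen   // a value now, so there is no shared mutable state
}

// Option 2: keep the state mutable, but move it behind an actor.
// Actors are implicitly Sendable, and access to `ink` is serialized.
public actor SharedPen {
    public private(set) var ink: Int
    public init(ink: Int) { self.ink = ink }
    public func use(_ amount: Int) { ink -= amount }
}

// Hypothetical variant of Transfer that holds the actor instead.
public struct LiveTransfer: Sendable {
    public let amount: Int
    public let memo: String
    let pen: SharedPen
}
```

Neither version needs `@unchecked`: the compiler can verify both conformances on its own.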

Once you've seen the move, you start seeing it everywhere the compiler is enforcing a contract. A call fails because it's gated to a newer OS, and instead of wrapping it in if #available, the agent deletes the @available line. A function is typed throws(NetworkError) and the agent throws the wrong error, so rather than fix what it throws it widens the signature to a plain throws and the type mismatch evaporates. Same shape every time. In each case the compiler is acting as a checker. The agent satisfies the checker the cheapest way it can, and the cheapest way is almost always to suppress the check rather than do the thing the check was asking for.
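For contrast, here is roughly what doing the thing the check asked for looks like in those two cases. This is a sketch under assumptions: the cases of `NetworkError`, and the functions `fetchStatus`, `newBadge`, and `badge`, are hypothetical names, and typed throws needs a Swift 6 toolchain.

```swift
enum NetworkError: Error, Equatable {
    case offline
    case badStatus(Int)
}

// The signature stays narrow. The repair is in the body: throw the error
// type the signature promises instead of widening to a plain `throws`.
func fetchStatus(code: Int) throws(NetworkError) -> String {
    guard code == 200 else { throw NetworkError.badStatus(code) }
    return "ok"
}

// The availability gate stays on the declaration...
@available(macOS 14, iOS 17, *)
func newBadge() -> String { "new" }

// ...and the call site earns the right to call it with a runtime check.
func badge() -> String {
    if #available(macOS 14, iOS 17, *) {
        return newBadge()
    }
    return "fallback"
}
```

In both cases the contract is left standing and the code changes to meet it, which is the opposite of the suppressing edit.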

This is why concurrency is the failure mode I keep coming back to. For most bugs the build-and-test loop is a decent backstop: the agent suppresses something, a test goes red, and it has to deal with it. Strict concurrency is different. The suppression compiles. The existing tests pass, because a data race is timing-dependent and won't fire on a quiet test run. The loop has no red to chase. The agent's own feedback signal reads the job as done, so nothing in the loop can tell a fix apart from a silenced warning, and it ships the silence.

Which lands on the thing I actually feel running these. A red build is a guardrail you can trust. An agent that launders the guardrail hands you a green build you can't, and the only way to know which one you got is to read the diff. @unchecked Sendable is easy to skim past, because it looks like the model understood something. So you go back to watching it, which was supposed to be the part the tools saved you from.

If you run agents against Swift 6 work, where have you landed on this? Do you scan the diffs for @unchecked Sendable and nonisolated(unsafe) by hand, or have you found a way to make the loop itself refuse a fix that only silences the checker?

Top comments (5)

Luis Cruz • Jun 29

This is a really interesting capture of something most people miss: the agent didn’t just “fix a bug”, it inferred an intent-vs-correctness boundary and chose a different semantic path (silencing instead of repairing).

What stands out to me is how quickly this turns into a tooling + policy problem, not just a coding problem. In real Swift 6 concurrency work, I’ve seen similar situations where the compiler is technically correct about isolation, but the “fix” depends on what layer you actually want to enforce:

strict correctness (actor / Mutex / Sendable redesign)
or practical suppression with explicit risk ownership (@unchecked Sendable, @preconcurrency, etc.)

The interesting part is that coding agents can blur that line unless we explicitly encode “repair preference” (safety-first vs pragmatic containment).

It might be useful to extend this idea with a concept like “fix intent mode”:

Preserve correctness guarantees (refactor toward isolation)
Stabilize build (minimize change, accept controlled risk)
Explain tradeoff (always surface why suppression is chosen)

Because otherwise agents will often optimize for the shortest path to green builds rather than the safest architectural outcome.

Curious if you’ve experimented with forcing the agent to justify why it chose silencing vs structural fix in concurrency cases like this.

Joshua Brackin • Jun 29

The repair-preference distinction is the right frame, and it's the part most "agents can't do concurrency" takes miss. The agent isn't choosing pragmatic suppression over strict correctness on purpose. It has no separate notion of "safe" to weigh against "green," so it isn't really choosing between them at all. It takes the shortest path the compiler accepts.

That's also why I'm skeptical of the "make it justify the tradeoff" lever, even though I wanted it to work. A model that just wrote @unchecked Sendable to clear an error is very good at writing a confident paragraph about why that was the reasonable call. The justification reads fine and the race is still there. Self-explanation tends to rationalize the cheap fix, not catch it.

Where I've gotten traction is the other direction: less asking the model why it did something, more having the loop check something the model can't talk its way around. It's also worth saying that @unchecked Sendable is a real tool when there's actual synchronization behind it. The tell is reaching for it with nothing behind it.

Pon • Jun 30

Swap strict concurrency for authorization and your whole post reprints itself. An IDOR, or an RLS policy left at USING (true), compiles fine and passes its tests, because the tests run as one honest user -- so the security version of your race, a second user reading the first one's rows, never fires on a quiet test run. Same 'no red to chase.' @unchecked Sendable and USING (true) are the same species: a hand-written promise to a checker with nothing behind it. The agent writes both for the reason you named -- no separate notion of 'safe' to weigh against 'green,' so it takes the shortest edit the checker accepts. Your traction direction is where I landed too: stop asking the model why, make the loop check the thing it can't narrate around. For the auth slice that's a test that logs in as the wrong user -- the one check a confident paragraph can't satisfy, same as a race it can't talk out of existence. The by-hand diff scan you're stuck doing for @unchecked Sendable is the exact manual pass I'm trying to kill for the authorization equivalents.

Joshua Brackin • Jun 30

The RLS USING (true) parallel is exact, and I think both tests stay green for the same reason: one actor on the happy path. A race needs a second task to fire before anything goes wrong, and by your mapping an IDOR needs a second user reading the first one's rows. The default test instantiates neither, so the model writes the permissive thing and nothing on the board turns red. The shape that's worked for the race case is making the test carry the adversary, not the assertion. Not "assert the user sees their data," but spin up the concurrent writer that should corrupt state and check that it can't. Your wrong-user login is the same move one domain over. A confident paragraph can't satisfy either one, because the failure lives in a state the model never has to narrate. The by-hand diff scan is the part I most want gone, and I don't have it solved. Have you found anything that flags the permissive-edit class before it ships, or are you still reading every diff too?
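A test that carries the adversary, as described above, might be sketched like this. `InkWell` and `hammer` are hypothetical names, and the actor is only one way to make the type pass. The idea is that the test brings its own concurrent writers and then checks that no update was lost.

```swift
// Hypothetical type under test: mutable state kept behind an actor.
actor InkWell {
    private(set) var ink: Int
    init(ink: Int) { self.ink = ink }
    func use(_ amount: Int) { ink -= amount }
}

// The adversary: `writers` child tasks each draw one unit `perWriter` times,
// all at once. Returns the ink left after every writer has finished.
func hammer(_ well: InkWell, writers: Int, perWriter: Int) async -> Int {
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<writers {
            group.addTask {
                for _ in 0..<perWriter {
                    await well.use(1)
                }
            }
        }
        // The group waits for all child tasks before this closure returns.
    }
    return await well.ink
}
```

A test would call `hammer` and assert that the result equals the starting ink minus `writers * perWriter`. Against an unsynchronized class hidden behind @unchecked Sendable, the same check would be likely, though not guaranteed, to come up short on some run, which at least gives the loop a chance at red.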

Pon • Jul 1

Still reading diffs, partly. But the direction I've been building is the one you landed on for the race, pushed one step earlier: don't trust that the adversary test exists. Your make-the-test-carry-the-adversary move works when there's a test at all. The case that gets me is the vibecoded app where nobody wrote the wrong-user login, so there's no red to launder in the first place and no diff-reader watching either. So instead of asking the loop to catch a bad fix, I flag the permissive edit by its shape before any test runs. USING (true) on a table that holds user rows, a policy that never once references auth.uid(), a grant handed to the anon role. Those read as permissive by construction, the same way @unchecked Sendable with no lock behind it does. You don't need to run the race to know the promise is empty, you can see it's empty sitting in the diff. The hard part is the false-positive line, because USING (true) is genuinely right for a public lookup table, so it's less ban-the-pattern and more surface it and make someone say out loud that this one's intentional. That last part is what I'm still tuning.
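Carried back to the Swift side of the thread, the flag-by-shape idea could be sketched as a scan over the added lines of a unified diff. Everything here is an assumption for illustration: the marker list, the `Flag` type, the `flagSuppressions(inDiff:)` function, and the `// intentional:` comment convention for saying out loud that a suppression is deliberate. It surfaces candidates; it does not decide whether the promise behind them is empty.

```swift
import Foundation

// Hypothetical list of edits that read as "silence the checker" by shape.
let suppressionMarkers = ["@unchecked Sendable", "nonisolated(unsafe)", "@preconcurrency"]

struct Flag: Equatable {
    let line: Int       // 1-based line number within the diff text
    let marker: String  // which suppression marker matched
    let text: String    // the added line, without the leading "+"
}

// Scans a unified diff and flags added lines that introduce a marker,
// unless the line carries an explicit "// intentional:" note.
func flagSuppressions(inDiff diff: String) -> [Flag] {
    var flags: [Flag] = []
    let lines = diff.split(separator: "\n", omittingEmptySubsequences: false)
    for (index, raw) in lines.enumerated() {
        // Only added lines count; "+++ b/File.swift" is a file header.
        guard raw.hasPrefix("+"), !raw.hasPrefix("+++") else { continue }
        let added = String(raw.dropFirst())
        // Someone has already stated that this one is deliberate.
        if added.contains("// intentional:") { continue }
        for marker in suppressionMarkers where added.contains(marker) {
            flags.append(Flag(line: index + 1, marker: marker, text: added))
        }
    }
    return flags
}
```

The false-positive line is handled the same way as in the comment above: the scan does not ban the pattern, it asks for a stated reason next to it.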