AI Coding Agents: The Expensive Part Isn't the Agents

#ai #productivity #devtools #programming

I ran three AI coders in parallel last week, each on its own file. Three pull requests came back in minutes.

Then I checked the bill. It had barely moved.

We'd expected running several AI agents at once to burn a fair bit. The numbers said otherwise, and once we dug into why, we hit the thing that changed how we think about putting AI to work.

1. The coders aren't the real cost

Most people picture running a lot of AI codegen as a per-token bill that climbs and climbs, because the mental image is a metered API. But the coders we run don't connect that way. We run them through an agent framework that decides which engine drives each coder, where the engine is just the model doing the real work behind it. The engine we plug in logs in through a monthly subscription we already pay for, not an API key billed per token. So whether we launch one coder or three at once, the cost per round barely changes.

So where does the real cost go? To us. Early on we babysat the coders, ssh-ing in to check status every few seconds. That back-and-forth was what ate the resources, not the coders. The fix wasn't fewer coders, it was to stop babysitting and let a dispatcher hand out the work and close it off automatically. People set the problem and make the calls; people don't sit and stare.

Once you see that, running several coders in parallel stops being the extravagance you feared. The coders are cheap and replaceable. The expensive thing is human time and attention.

2. Cheap and parallel means you need a checker who isn't the writer

Once coders are cheap and you can launch several at once, the risk moves. It isn't about money anymore, it's about quality. Output that comes fast and in volume slips defects through just as easily if nobody checks first.

The path we chose: everything a coder writes goes through a separate reviewer first. The reviewer has one job, to read and flag, never to write. Because the moment the writer and the checker are the same agent, it glosses over the exact spots it just missed. Splitting the reviewer out gives you a second pair of eyes that isn't attached to what was just written.

What we'd like to do better: the reviewer should run a different engine from the writer, so it doesn't share the same blind spots. We tried switching the reviewer to a different engine, but it hung silently inside the system and never produced a single review, so we fell back to the engine we knew worked. A different engine is the right target, but something that actually works now beats an ideal that isn't stable yet.

3. A human keeps merge authority, and the work stays isolated

Coders running in parallel, a reviewer in between, and there's still one last line to draw: who presses merge into the main code. Our answer is a person, not any AI. The coder writes, the reviewer flags, but a human decides whether it actually goes in. We call that last gate Gate A, not because we distrust the AI, but because taking code into the main line is a hard-to-reverse decision, and someone should own that spot every time.

The other thing to set up the moment you run several at once: each coder needs its own workspace, not all writing over the same files. Let several coders edit the same space at once and the work tangles. That's the dispatcher's job, to fence off space for each one up front, not to patch collisions after the fact.

Start small: the shape you can copy

If you take one thing away, take this: running a fleet of AI coders isn't as expensive as you fear, because the coders aren't the real cost. The expensive things are human time and the defects that escape, and both are solved by the same shape.

Start small. You don't need the whole fleet on day one.

One coder, running on a subscription you already pay for, instead of a per-token API.
One reviewer that isn't the writer, reading and flagging first, every time.
A human holding merge authority, with a dispatcher handing out the rest of the work instead of you watching it.
Once that shape is stable, add coders one at a time. Because what lets you run a whole fleet without losing sleep isn't smarter coders, it's the checker and the human who still holds the decision.

Written from real work. The full version, with a Thai edition, lives at productize.life.

Top comments (1)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.