They Spent $81,267 By Accident...

#ai #webdev #programming #claude

...and I Spent $1.39 On Purpose

Last week, fintech startup Slash told its whole company to lean into AI coding. One employee took that note personally, sat down with Claude, and built a video game.

Then the bill showed up: $81,267. In one week. On the company card.

The game is called Brainrot Shooter. It's a bare-bones, blocky, Minecraft-looking first-person shooter where you run around blasting characters named after viral memes (Skibidi Toilet, Tung Tung Tung Sahur, the whole brainrot pantheon). Slash handled the receipt the only sane way: they posted it to the internet and begged people to play the game so they could write it off as a marketing expense. It went viral. The dumb little game actually found an audience.

The employee summed it up himself: "This is actually insane, am I going to become a case study for how AI spend can get out of control."

Yes. You are. And this is the case study. But not the one you think.

Everybody blamed the wrong thing

The internet's takeaway was "see, AI coding is a money pit." That's the lazy read, and it's wrong. The model didn't cost him eighty grand. The way he used it did.

Here is the part nobody on X bothered to explain.

The model behind a coding agent is stateless. It has no memory between turns; whatever "memory" the agent seems to have lives in the harness. So every single time you hit send, the harness re-sends the context the model needs: your system prompt, the conversation so far, and the file contents the model is supposed to look at. The model reads all of it, fresh, every turn. You pay for all of it, fresh, every turn.

Now watch what happens over a full day of active development. Your codebase grows. Your conversation grows. The agent keeps pulling large files into context so it can "look at the whole project and change this one thing." Every one of those turns re-bills everything you already showed it five minutes ago. Do that across hundreds of turns on a swelling codebase and the meter spins like a slot machine that never pays out.

That is not the AI being expensive. That is you paying to make it re-read the same code a few hundred times.
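To make the compounding visible, here is a minimal sketch of that re-send loop. It is a toy model, not a measurement of anyone's real session: the function name and the numbers (300 turns, a 20,000-token starting context, 500 new tokens per turn) are assumptions I picked for illustration.

```python
def cumulative_input_tokens(turns: int, base_context: int, growth_per_turn: int) -> int:
    """Total input tokens billed when the whole context is re-sent every turn."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context             # the full context is billed again this turn
        context += growth_per_turn   # conversation and loaded files grow before the next turn
    return total


# Hypothetical session: every number here is an illustrative assumption.
turns, base, growth = 300, 20_000, 500
final_context = base + growth * (turns - 1)   # context size on the last turn
billed = cumulative_input_tokens(turns, base, growth)

print(f"context on the last turn: {final_context:,} tokens")
print(f"input tokens billed over the session: {billed:,}")
print(f"re-read multiplier: {billed / final_context:.0f}x")
```

The context never gets enormous in this toy session, but the billed total grows roughly with the square of the turn count, because each turn pays again for everything the earlier turns already paid for.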

The tell is in the token split

Here's where it gets concrete, and here's the part a developer can actually use.

Claude Opus 4.8 bills input and output tokens at different rates. As of this writing, straight from Anthropic's own announcement:

Input: $5 per million tokens
Output: $25 per million tokens

Output is five times pricier per token. So your gut says output is where the money goes. In agentic coding, that's backwards. Output tokens are what the model writes, and a model can only write so fast. Input tokens are what the model reads, and there is no ceiling on how many times you can make it re-read the same files. Input is cheaper per token and ruinous in volume, because volume is the thing that compounds turn over turn.
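A quick way to see where the crossover sits, using the two rates quoted above. The helper names and the two example token counts are hypothetical; only the $5 and $25 figures come from the pricing listed here.

```python
INPUT_USD_PER_MTOK = 5.00    # rate quoted above
OUTPUT_USD_PER_MTOK = 25.00  # rate quoted above


def bill(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in dollars."""
    input_cost = input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost, output_cost


def input_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of the total bill that comes from input tokens."""
    input_cost, output_cost = bill(input_tokens, output_tokens)
    return input_cost / (input_cost + output_cost)


# Illustrative token counts, not real usage data.
print(f"write-heavy turn:     {input_share(10_000, 50_000):.1%} of the bill is input")
print(f"reload-heavy session: {input_share(30_000_000, 200_000):.1%} of the bill is input")
```

At these rates, input overtakes output on the bill as soon as the model reads more than five tokens for every token it writes. A session that reloads big files every turn clears that bar easily.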

So when a coding bill goes nuclear, it's almost always input-dominated. I don't have the Slash employee's dashboard, so I won't pretend to quote his exact split. But a five-figure bill from a single week of iterative development is the unmistakable signature of context reloading: the same big files, read again and again, hundreds of turns deep. That is the mechanism. That is what burned the card.

Which gave me an idea for a test.

Same game. One shot. On purpose.

I rebuilt Brainrot Shooter from scratch (mine's called jBrainRot, naturally). Same premise, same blocky world, same meme enemies waddling at you while a combo counter climbs. The difference was in how I asked.

Instead of a day-long conversation where the model re-reads a growing codebase on every turn, I wrote one complete, self-contained prompt. A full spec: the tech constraints, the controls, the enemies, the game feel, all of it, up front. One message. One shot. No "now look at the file again and tweak this" loop, because the loop is the leak.

I also ran jCodeMunch in the middle, an MCP tool I built that trims the dead weight out of context before it ever reaches the model. (Full disclosure: that's my product. Use it, don't use it, the principle stands either way.)

The result was a playable browser game. And here's the receipt:

Input: 11,900 tokens = $0.0595
Output: 53,400 tokens = $1.335
Total: $1.39

Look at that split. It's output-heavy. The model spent its tokens writing the game, not re-reading the game. That is the exact inverse of the reloading trap, and it's why the bill has one digit in front of the decimal point instead of five.

His $81,267 versus my $1.39. That's roughly 58,000 to one, for the same dumb game.
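If you want to check the receipt yourself, the arithmetic fits in a few lines. The token counts and rates are the ones listed above; the variable names are mine.

```python
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00

# Token counts from the receipt above.
input_cost = 11_900 * INPUT_USD_PER_MTOK / 1_000_000
output_cost = 53_400 * OUTPUT_USD_PER_MTOK / 1_000_000
total = input_cost + output_cost

print(f"input:  ${input_cost:.4f}")
print(f"output: ${output_cost:.4f}")
print(f"total:  ${total:.2f}")
print(f"output share of the bill: {output_cost / total:.1%}")
print(f"ratio against $81,267: {81_267 / total:,.0f} to one")
```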

Don't Nick-Up Your Budget

You don't need my tool to avoid this (but it'd really help). You need to stop paying to make the robot re-read. A few rules that actually move the number:

Spec hard, then let it write. The single biggest lever is turning a hundred small "look again and tweak" turns into one well-specified turn. Re-reads are the cost. Front-load the thinking so the model writes instead of re-reads. An output-heavy bill is a healthy bill.
Watch your context window, not just your prompt. The expensive token is the one you send 300 times without noticing. If your agent is loading huge files every turn, that's your leak. Trim it.
Turn on prompt caching. Anthropic's prompt caching bills repeated context at cache-read rates, up to 90% cheaper than fresh input. If you're running long sessions without it, you're paying full freight to re-read static files. This alone would have gutted a bill like Nick's.
Cap your spend, then watch the meter. Set a hard budget limit in your harness before you start, not after the invoice. And put eyes on the live number: something like our free jmunch-console that not only tracks token usage, savings, and throughput, but also fires threshold alerts so a rogue leak trips a wire instead of surfacing as a five-figure surprise. (Disclosure: also mine. The category is the point.) "I underestimated my own ability" is a beautiful sentence and a terrible budgeting strategy.
Mind the defaults. Opus 4.8 defaults to high effort and bills its thinking tokens at output rates. Great for hard problems, wasteful for trivial ones. Match the effort to the task.
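Two of those rules are easy to put numbers on. The sketch below estimates what caching does to a block of static context, and shows the shape of a hard spend cap with an early warning. Treat all of it as assumptions: the 0.10 read multiplier comes from the "up to 90% cheaper" figure above, the 1.25 cache-write premium is my assumption about how the first write is priced, the model pretends the cache never expires mid-session, and `SpendGuard` is a hypothetical class, not the API of any real harness or of jmunch-console.

```python
INPUT_USD_PER_MTOK = 5.00
CACHE_READ_MULTIPLIER = 0.10    # "up to 90% cheaper" than fresh input
CACHE_WRITE_MULTIPLIER = 1.25   # assumption: first write costs a premium over fresh input


def static_context_cost(static_tokens: int, turns: int, cached: bool) -> float:
    """Dollars spent re-sending the same static context on every turn."""
    usd_per_token = INPUT_USD_PER_MTOK / 1_000_000
    if not cached:
        return static_tokens * turns * usd_per_token
    # Idealized: one cache write on the first turn, cache reads on every later turn.
    write = static_tokens * CACHE_WRITE_MULTIPLIER * usd_per_token
    reads = static_tokens * (turns - 1) * CACHE_READ_MULTIPLIER * usd_per_token
    return write + reads


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past the hard cap."""


class SpendGuard:
    """Hypothetical spend tracker: warns once near the cap, refuses charges past it."""

    def __init__(self, cap_usd: float, alert_fraction: float = 0.8) -> None:
        self.cap_usd = cap_usd
        self.alert_at = cap_usd * alert_fraction
        self.spent = 0.0
        self.alerted = False

    def charge(self, cost_usd: float) -> None:
        # Check before spending, so the cap is a wall and not a post-mortem.
        if self.spent + cost_usd > self.cap_usd:
            raise BudgetExceeded(f"${self.spent + cost_usd:.2f} would exceed ${self.cap_usd:.2f}")
        self.spent += cost_usd
        if not self.alerted and self.spent >= self.alert_at:
            self.alerted = True
            print(f"warning: ${self.spent:.2f} of ${self.cap_usd:.2f} budget used")


# Illustrative numbers: 100k tokens of static files re-sent across 300 turns.
uncached = static_context_cost(100_000, 300, cached=False)
cached = static_context_cost(100_000, 300, cached=True)
print(f"uncached: ${uncached:.2f}  cached: ${cached:.2f}")

guard = SpendGuard(cap_usd=10.0)
guard.charge(7.0)
guard.charge(1.5)        # crosses the 80% line and prints the warning
try:
    guard.charge(2.0)    # would go over the cap, so it is refused
except BudgetExceeded as err:
    print(f"blocked: {err}")
```

The guard checks before it adds the charge, which is the whole point: the call that would blow the budget never goes out. In a real harness you would call something like `charge` with the estimated cost of each request before sending it.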

None of this is exotic. It's just the stuff nobody tells you until your CFO takes your nameplate off the wall.

The actual lesson

The viral version of this story is "AI made a guy spend $81,000." The true version is "a guy paid to make a model re-read his code a few hundred times, and nobody had set a limit."

The model isn't the money pit. Your context window is. Watch that, and the same work that torched a five-figure bill costs you next to nothing...

The full one-shot prompt I used and jBrainRot, the game it built, are both free on the repo. More token-efficiency tooling at jcodemunch.com.