DEV Community

Sondre Bjørnvold Bakken


Token Cost Is the New Performance Metric

I watched an agent read forty-seven files looking for a bug last week. Forty-seven. It started in the right place, followed an import into a utility folder, got confused by a generic name, backed out, tried another path, landed in a test helper, read three more files to understand the test helper, then circled back to where it started. Twelve minutes. A few dollars. The bug was a one-line fix in a function called processData.

The function could have been called validateInvoiceLineItems. The agent would have found it in three reads, not forty-seven.

I used to think this was an agent problem. Smarter models, bigger context windows, better tooling. The agent should figure it out. Then I looked at the bill. Navigability has a price tag now, and the meter runs whether the agent is making progress or wandering in circles.

Same bet, new currency

There's a take going around that code quality doesn't matter anymore. AI writes the code. Why invest in architecture? Why use React when you could just write plain HTML and JavaScript? Abstractions are overhead. Let the machine handle the mess.

We've heard a version of this before. "You don't need to write performant code. Hardware gets faster every year."

Niklaus Wirth named the pattern in 1995: software is getting slower more rapidly than hardware is becoming faster. We got faster processors, so we built heavier software. We got more RAM, so we consumed more of it. The headroom never survived contact with ambition.

Token costs are falling fast. And we're already finding ways to spend every bit of the savings. Agents that used to handle a single file now coordinate across entire repositories. Tasks that used to be one prompt are now multi-step workflows running for minutes. The ambition scales with the budget, same as it always has. The Victorian economist William Stanley Jevons named this paradox in 1865: when steam engines got more efficient, Britain didn't burn less coal. It burned more. Cheaper energy unlocked uses that hadn't been viable before, and total demand outran the savings.

We've made this bet before. We lost.

The interface shifted

We keep debating whether abstractions are worth the overhead. It's the wrong question. LLMs are language models. They process natural language. And so do you. That's not a coincidence. It's the whole point.

The reason abstractions work for human developers isn't arbitrary ceremony. Abstractions let you focus on the problem you're solving instead of the mechanics of getting the machine to execute. They create vocabulary. They compress complexity into names that carry meaning. They let you reason at the right level of detail without holding the entire system in your head.

All of that applies to an LLM working in your codebase. Whatever reduces cognitive overhead for a human developer reduces token overhead and context pollution for the model. A well-named function is a signpost. A well-scoped module is a boundary. A clear import structure is a map.

Strip those away, flatten everything into procedural code, and you haven't simplified anything. You've just removed the navigational aids and forced whoever reads it next, human or model, to reconstruct the map from scratch every single time. And when that reader is an agent billing you per token, "from scratch every single time" has a dollar amount attached to it.
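The signpost idea is concrete enough to show in code. Here is a minimal before/after sketch; the function and field names are invented for illustration, not taken from any real codebase.

```python
# Hypothetical before/after: identical logic, very different navigability.

# Before: the generic name tells an agent nothing. It has to read the body
# (and probably the call sites) to learn what "data" and "process" mean.
def process_data(data):
    return [d for d in data if d["qty"] > 0 and d["price"] >= 0]

# After: the name alone answers "is this relevant to my invoice bug?"
# An agent scanning an import list or a grep result can decide without
# spending tokens on the body at all.
def validate_invoice_line_items(line_items):
    return [li for li in line_items if li["qty"] > 0 and li["price"] >= 0]
```

The machine executes both versions identically. Only the reader's cost changes.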

Progressive disclosure

If you've followed how Claude Code structures its skills, you'll recognize the concept of progressive disclosure. Show just enough information to orient, then let the reader drill down as needed. Jakob Nielsen popularized the term for UI design in the 1990s. It turns out it describes exactly how AI agents navigate code.

Anthropic's own engineering blog uses the exact term: "Letting agents navigate and retrieve data autonomously enables progressive disclosure, allowing agents to incrementally discover relevant context through exploration. Each interaction yields context that informs the next decision: file sizes suggest complexity; naming conventions hint at purpose."

That's the mechanism. The agent reads your entry point. It looks at the imports. It decides, based on the names it sees, where to go next. It builds a working model of what's relevant one file at a time, following the trail your architecture laid down.
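That navigation loop can be sketched as a toy model. Everything here is an assumption made for illustration: the file names, the crude keyword-matching relevance heuristic, and the idea that an agent greedily reads the most promising unread file. Real agents are far more sophisticated, but the dependence on descriptive names is the same.

```python
# Toy model of progressive disclosure: follow imports from an entry point,
# preferring files whose names match the task. All paths are hypothetical.
REPO = {  # file -> files it imports
    "main.py": ["billing/invoices.py", "utils/helpers.py"],
    "billing/invoices.py": ["billing/validation.py"],
    "billing/validation.py": [],
    "utils/helpers.py": ["utils/misc.py"],
    "utils/misc.py": [],
}

def relevance(path, task_keywords):
    # Crude stand-in for "does this name look related to my task?"
    return sum(kw in path for kw in task_keywords)

def explore(entry, task_keywords, budget):
    """Greedy walk: read the most promising unread file at each step."""
    frontier, read = [entry], []
    while frontier and len(read) < budget:
        frontier.sort(key=lambda p: relevance(p, task_keywords), reverse=True)
        current = frontier.pop(0)
        read.append(current)
        frontier.extend(f for f in REPO[current]
                        if f not in read and f not in frontier)
    return read

# With descriptive paths, a budget of 3 reads lands on the validation code:
# main.py -> billing/invoices.py -> billing/validation.py.
# Rename those files to utils1.py and utils2.py and the same walk
# degenerates into a coin flip at every step.
```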

When the trail is clear, this works remarkably well. When it isn't, the agent wanders. Think about how you onboard a new developer. You don't hand them the entire codebase and say "good luck." You point them to an entry point, explain the key concepts, and let them explore from there. A well-structured codebase does this automatically. A poorly structured one is the equivalent of dropping a new hire into a warehouse full of unlabeled boxes and asking them to find the invoice logic.

Your agent is that new hire. Every single session. It has no memory of yesterday. No institutional knowledge. No "oh right, that lives in the legacy folder." It starts fresh, reads what you give it, and follows the trail your architecture laid down. Bad names, tangled imports, a god file with everything in it. That's not just messy. That's expensive. Every wrong turn is tokens burned. Every ambiguous function name is a coin flip the model takes on your dime.

Clean architecture isn't a nicety. It's the difference between a three-dollar fix and a thirty-dollar fix.
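A back-of-envelope version of that arithmetic, with every number an assumption (tokens per file and per-token price vary widely by model and file size):

```python
# Illustrative cost model. Assumes ~2,000 tokens to read a file and a
# blended price of $5 per million tokens -- both invented round numbers.
PRICE_PER_TOKEN = 5 / 1_000_000
TOKENS_PER_FILE = 2_000

def session_cost(files_read, passes=1):
    # Re-reads count too: context gets rebuilt after every wrong turn.
    return files_read * passes * TOKENS_PER_FILE * PRICE_PER_TOKEN

well_named = session_cost(3)            # straight to the fix
maze = session_cost(47, passes=3)       # wandering, backtracking, re-reading
```

The absolute dollar figures depend entirely on the assumptions, but the ratio doesn't: a 47-file wander costs an order of magnitude more than a 3-file walk, every session, forever.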

Compound debt

"Sure, but AI makes shipping so fast that it doesn't matter."

You've seen this play out. A team adopts AI tooling, ships three features in the time it used to take to ship one, and everyone celebrates. Three months later the codebase is a maze. The agent that built it can no longer navigate it. Every new task takes longer than the last because the context is polluted with the shortcuts from the previous sprint. The speed that felt free turns out to have been a loan, and the interest is compounding.

Carnegie Mellon studied this pattern across hundreds of repositories. By month three, the speed gains had fully reversed. But the complexity those gains introduced was still there, baked into the codebase, waiting for the next agent to trip over it.

Google's 2025 DORA report nailed the dynamic: AI doesn't fix a team, it amplifies what's already there. An amplifier doesn't care about the signal. It just makes it louder.

Lower barrier, higher ceiling

When something becomes easier to do, the field doesn't become easier to win in. It just fills with more contenders, and the competition gets harder.

Anyone can prompt an agent to build a feature. That's the new baseline. It's table stakes. The question that follows is whether the feature is built correctly, cheaply, and without breaking three things downstream. Whether the next agent can find its way through what the first one built. Whether the codebase compounds or collapses.

Give an LLM a clean codebase and it builds confidently in the right direction. Give it a mess and it invents its own interpretation of what the code is supposed to do, confidently, in the wrong direction. The quality of what you give the model directly determines the quality of what comes back. Same dynamic we've always had. Different reader.

The developers who dismiss code quality as an anachronism are confusing a lowered barrier to entry with a lowered ceiling. The barrier dropped. The ceiling didn't. Anyone can get an agent to produce code. Fewer people can structure a system so the agent produces the right code at a reasonable cost. That gap is where the real competition happens.

The architect stays

A lot of people say AI turns every developer into a project manager. Wrong frame. You are now the product owner, domain expert, and architect. The hard thinking, the domain clarity, the structural decisions. That work cannot be delegated to an agent. It is the input the agent depends on.

Kent Beck put it simply: you want tidy code that works, even though you're not typing most of it. You can ignore syntax. You cannot ignore the fundamentals of good software design.

Even Andrej Karpathy walked back "vibe coding" within a year of coining the term, calling his original tweet "a shower of thoughts I just fired off without thinking." By September 2025, he was advocating for "agentic engineering" instead. More oversight. More scrutiny. The founding document of the "code quality doesn't matter" movement was, by its author's own admission, a throwaway thought.

Token cost is falling. Complexity is rising. Same principles, new audience, higher stakes.

Bet accordingly.
