Bill Frank Pougoue

Posted on Jun 1

The cheapest token is the one you never spend

#ai #llm #productivity #softwareengineering

*Microsoft and Uber just learned that AI-assisted codebases are expensive to "operate". I learned a smaller version of the same lesson in 2023, and what I built to cope with it turned out, by accident, to be an answer.*

In the last few weeks, the industry got a bill it hadn't budgeted for. Microsoft, after handing thousands of developers an agentic coding tool late last year, burned through its entire annual AI budget in a few months and started pulling the licenses. Uber's CTO admitted publicly that the company torched its whole 2026 AI coding budget in four months. Individual engineers were running hundreds to thousands of dollars a month in tokens, each one of them productive, none of them wrong to use the tool, and that was exactly the problem. The tools were good enough that people used them constantly, and the constant use is what broke the math.

The reflex reaction is "we need cheaper models." That's true, but the token bill isn't dominated by the model writing one clever isolated function. It's dominated by the agent reading your codebase into its context, again and again, on every run, and by re-deriving the same mechanical scaffolding every time something changes upstream. The expensive thing isn't intelligence. It's "surface area". The real lever is bounding how much of your codebase the AI has to touch and re-understand to get anything done.

I didn't arrive at that idea from a whitepaper. I arrived at it from fear.

A side project, for an unglamorous reason

Back in 2023 I started building a code generator by hand. Not because I had a grand thesis about model-driven engineering (I'd barely have called it that) but because I was competing with faster developers on Fiverr and I needed to ship full-stack apps quicker than I could type them. Every project was the same spine underneath: a data model, the same CRUD plus search, the same forms, the same auth. I was writing it over and over, which made me think: some scripts could write this for me.

It didn't have a name worth saying. It was just a folder called "filegen" with a few scripts in it: two orchestrators, one for the Flutter frontend and one for the backend, and two more files that just held the methods describing the structure of the classes I wanted generated. It was for me, nobody else. Only much later, once I had reorganized the mess into something another person could actually use, did it get a real name: KoloStack. "Kolo" is Cameroonian slang for "a thousand" (as in "generate a thousand"), and "stack" is for the full-stack, multi-target span it covers. The first name described a folder. The second described an ambition.

When AI code generation actually got good, I brought it in to push the generator further. But the core (the architecture this whole article is about) was laid down by hand, before any of that existed. That order matters, and I'll come back to why.

The fear that became an architecture

Here's the embarrassing truth I sat with for a long time.

I was scared to regenerate. I'd generate a project, then spend days customizing it with business logic, the parts that made the app actually "the app", and then the schema would change, and regenerating meant my generator would happily flatten everything I'd added. So I stopped regenerating, which defeated the entire purpose of having a generator. I had built a tool I was afraid to use.

My first fix was the obvious one: an exclusion list. Just don't regenerate the entities I'd already touched. Spoiler alert: it didn't work, and the reason it didn't work is the first half of the whole insight. An exclusion list protects an entity by "freezing" it, so the entities I excluded were exactly the ones I'd customized, i.e. my most important ones, and now those were the ones that stopped tracking the schema. I hadn't solved the problem. I'd traded clobbering for staleness, and aimed it at the code I cared most about. The list worked at the wrong granularity: all-or-nothing, around the whole entity.

So I tried something finer. I added conditions in the generator for the entities that were special, e.g. User, because auth isn't CRUD and gets managed completely differently, and Transaction, because transaction logic isn't CRUD either. The idea was to teach the generator that these ones need different treatment. That failed too, twice over, and turned the code into a real mess. The generated CRUD and my hand-written auth still lived in the "same file", so regenerating still overwrote the file, auth management and all. And worse, special-casing meant my generator now had to "know about my specific project": hardcode that "users" means auth, that "transactions" means something else again, and so on. The next project had different special entities and the whole scheme fell apart. I'd welded one client's domain into a tool that was supposed to be reusable.

Two failed attempts, but together they pointed straight at the answer. The conflict isn't "between" entities (that's what the exclusion list got wrong). And it isn't solved by teaching the generator "which entities" are special (that's what the conditions got wrong). The conflict is "inside every file": part of every artifact is mechanical and should regenerate, part of it is mine and must never be touched, and that's true of User, Transaction and every other entity equally. The seam doesn't go around the entity. It runs "through" the artifact. Once I saw it that way, the design was obvious.

That fear, it turns out, was correct, and so was the path it took to get fixed. Destructive regeneration is "the" unsolved problem in this whole category. The biggest, most mature tool in the space, JHipster, which generates Spring + Angular/React from a domain model, has a long-running community thread literally titled around the customization gap, full of people pointing out that generation is only step one and the real work is everything you add on top. Their official answer is a set of conventions: put your code in a separate package, use this pattern, be disciplined. In other words: please don't get burned. There's no structure stopping you.

So, I built the structure.

Two tiers, one hard line

The fix is almost boringly simple to describe.

Every artifact KoloStack emits is split in two. "Tier 1" is a regenerable base class: schema-derived, framework-owned, overwritten on every single run. It holds the boring stuff: CRUD + search, parsing, scaffolding, the things that "should" track the schema automatically. "Tier 2" is a once-only extension stub that inherits from the Tier 1 base, is written exactly once, and is never touched again. That's where the developer's code or the AI-generated code lives.

// Tier 1 — regenerated every run. The machine owns this.
abstract class BaseGeneratedAccountController extends BaseController { ... }

// Tier 2 — written once, never overwritten. You own this.
class AccountController extends BaseGeneratedAccountController { ... }

Tier 1 races to catch up with the schema; regenerate as often as you like. Tier 2 sits untouched. The writer that puts files on disk (KoloStack) knows the difference between create, overwrite, no-change, and preserve, so the line between machine-owned and human-owned code isn't a convention you're trusted to honor. It's enforced by the tool. When a base class changes underneath a stub that was preserved, you get a warning to review, not a silent clobber.

That's the contribution. Not "a code generator" (there are hundreds of them). It's a "structural" answer to regeneration survival, in a category whose leader still handles the same problem with a style guide.
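
To make the four writer outcomes concrete, here is a minimal sketch in Python. It is not KoloStack's actual code: the names `Outcome`, `write_artifact`, and `regenerate_pair` are hypothetical, and the only assumption carried over from the text is that Tier 1 files are always rewritten while Tier 2 files are written once and then left alone.

```python
from enum import Enum
from pathlib import Path
from typing import List


class Outcome(Enum):
    CREATE = "create"        # file did not exist yet
    OVERWRITE = "overwrite"  # Tier 1 file existed and its content changed
    NO_CHANGE = "no-change"  # Tier 1 file existed with identical content
    PRESERVE = "preserve"    # Tier 2 file existed, so it was left alone


def write_artifact(path: Path, content: str, tier: int) -> Outcome:
    """Write one generated file and report what happened to it."""
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        return Outcome.CREATE
    if tier == 2:
        # Human-owned stub: never rewritten once it exists.
        return Outcome.PRESERVE
    if path.read_text(encoding="utf-8") == content:
        return Outcome.NO_CHANGE
    path.write_text(content, encoding="utf-8")
    return Outcome.OVERWRITE


def regenerate_pair(base: Path, base_src: str,
                    stub: Path, stub_src: str) -> List[str]:
    """Regenerate a Tier 1 / Tier 2 pair and collect review warnings."""
    warnings: List[str] = []
    base_outcome = write_artifact(base, base_src, tier=1)
    stub_outcome = write_artifact(stub, stub_src, tier=2)
    if base_outcome is Outcome.OVERWRITE and stub_outcome is Outcome.PRESERVE:
        # The base moved underneath a preserved stub: flag it, don't touch it.
        warnings.append(f"review {stub.name}: base {base.name} changed")
    return warnings
```

The point of the sketch is that the preserve rule lives in the writer itself, so no template or target can overwrite a stub by mistake.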

It also, not by accident, generates a pair almost nobody serves: Flutter on the frontend, Laravel on the backend, wire-compatible, from one SQL schema. JHipster doesn't go there. The lighter generators make you write the templates yourself. If you live in that stack (and a lot of us building for mobile-first markets do) there hasn't been a tool that treats it as a first-class target.

And the same property that makes regeneration safe is what makes KoloStack's target list open-ended. Every target reads from one neutral, frozen spec, i.e. it never reaches into the others' concerns. So, adding a new one (React Native, Django, whatever you need) is a plugin you install, not a fork you maintain. The contract is the whole design; the targets are just readers of it.
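
One way to picture "targets are just readers of the contract" is a frozen spec plus a registry of target functions. The sketch below is an illustration under assumptions, not KoloStack's plugin API: `EntitySpec`, `register_target`, the output paths, and the one-line templates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str


@dataclass(frozen=True)
class EntitySpec:
    """Neutral description of one entity; targets can read it, not mutate it."""
    name: str
    columns: Tuple[Column, ...]


# A target is any function that maps the spec to {relative_path: source}.
Target = Callable[[EntitySpec], Dict[str, str]]
TARGETS: Dict[str, Target] = {}


def register_target(name: str) -> Callable[[Target], Target]:
    """Decorator that adds a target to the registry under the given name."""
    def decorator(fn: Target) -> Target:
        TARGETS[name] = fn
        return fn
    return decorator


@register_target("laravel")
def laravel_target(spec: EntitySpec) -> Dict[str, str]:
    fillable = ", ".join(f"'{c.name}'" for c in spec.columns)
    return {f"app/Models/Base{spec.name}.php":
            f"protected $fillable = [{fillable}];"}


@register_target("flutter")
def flutter_target(spec: EntitySpec) -> Dict[str, str]:
    fields = " ".join(f"final dynamic {c.name};" for c in spec.columns)
    return {f"lib/models/base_{spec.name.lower()}.dart": fields}


def generate_all(spec: EntitySpec) -> Dict[str, str]:
    """Run every registered target against the same spec."""
    files: Dict[str, str] = {}
    for name in sorted(TARGETS):
        files.update(TARGETS[name](spec))
    return files
```

In this shape, adding a target means registering one more function. None of the existing targets change, because each of them only ever reads the spec.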

Why this is suddenly about money

Now connect it back to the bill nobody budgeted for.

I want to be precise here because the tempting overclaim is wrong. KoloStack would not have saved Microsoft's budget. The money those companies burned went into the day-to-day agentic grind (debugging, refactoring, agents shipping backend changes with no human in the loop), not into initial scaffolding.

But two narrower things are true.

First: "regeneration is a zero-token operation." Add a column, regenerate thirty entities' worth of models, migrations, controllers, forms, and views "deterministically", no model in the loop. The same change handed to an agent ("update everything to match the new schema") is a real, repeated token cost, and it is mechanical work an LLM should never have been paid to do in the first place.
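
A toy version of that claim, in Python: the generator is a pure function from schema to text, so the same schema always yields the same files and a new column shows up without any model call. The schema shape, the `render_base_model` and `regenerate` names, and the output format are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical schema shape: table name -> ordered list of column names.
Schema = Dict[str, List[str]]


def render_base_model(table: str, columns: List[str]) -> str:
    """Render a Tier 1 style base class as plain text from one table."""
    class_name = "BaseGenerated" + table.title().replace("_", "")
    lines = [f"abstract class {class_name} {{"]
    for col in columns:
        lines.append(f"    public ${col};")
    lines.append("}")
    return "\n".join(lines) + "\n"


def regenerate(schema: Schema) -> Dict[str, str]:
    """Deterministically render every table; no model in the loop."""
    return {
        f"{table}.base": render_base_model(table, cols)
        for table, cols in sorted(schema.items())
    }
```

Calling `regenerate` twice on the same schema returns identical output. Appending `"phone"` to a table's column list and calling it again produces the new field, at the cost of a string join rather than a prompt.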

Second, and bigger: "a bounded codebase is cheaper to run an agent against." The Tier 1 boilerplate is correct by construction, so the agent doesn't need to read it, reason about it, or regenerate it. The AI's entire working surface is Tier 2 (the small, intentional part). Less context to load on every run, less surface to re-understand, fewer tokens per task. I didn't build a token-saving tool. I built a codebase architecture that is "cheaper to operate AI on". That's the honest version, and it's enough.
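
The bounded-surface idea can be sketched as a filter applied before anything is handed to an agent. This is a rough illustration, not a measurement: it assumes Tier 1 files can be recognized by the `BaseGenerated` prefix used in the class example earlier, and it uses character counts as a crude stand-in for tokens. The function names are hypothetical.

```python
from typing import Dict

# Assumption: Tier 1 files share the prefix seen in the earlier class example.
TIER1_PREFIX = "BaseGenerated"


def agent_context(files: Dict[str, str]) -> Dict[str, str]:
    """Keep only the human-owned (Tier 2) files for the agent to read."""
    return {
        path: source
        for path, source in files.items()
        if not path.rsplit("/", 1)[-1].startswith(TIER1_PREFIX)
    }


def context_size(files: Dict[str, str]) -> int:
    """Total characters across the files, a rough proxy for token load."""
    return sum(len(source) for source in files.values())
```

Comparing `context_size(files)` with `context_size(agent_context(files))` on a real project gives a first estimate of how much of the codebase the agent can skip. The actual token savings depend on the tokenizer and on how the agent tool loads context.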

The actual point

There's an old discipline coming back into fashion. Earlier generations of developers wrote memory-efficient code because RAM was expensive and they had no choice. Token-efficiency is becoming the same kind of constraint, and the design decisions that produce it aren't tricks you bolt on afterward. They're architectural: deterministic generation, a bounded edit surface, a hard, enforced line between what the machine owns and what you own.

I didn't design KoloStack to be token-efficient. The token economy didn't exist yet when I started. I designed it because I was a freelancer afraid of breaking my own code, and the only way to stop being afraid was to make it structurally impossible for the tool to "hurt me".

That's the part I'd underline if you take one thing from this. The architecture that turns out to matter for operating AI cheaply is the same architecture you'd want anyway if you respected your own hand-written code enough to protect it. I got there in 2023, by hand, out of fear, before AI was good enough to be either the problem or the help. Sometimes the right design shows up early, wearing the wrong clothes, for a reason that has nothing to do with why it ends up mattering.

KoloStack is open source: https://github.com/billfrank904/KoloStack . If you work in Flutter + Laravel, or you've fought the regeneration-survival problem in any generator, I'd like to hear how you've handled it.
