Token Town

#aie #agents #ai

AI Engineer World's Fair Coverage

Many of the sessions from Tuesday, especially on the main stage, revolved around the idea of software factories: agents coordinating with other agents that check even more agents' work, with humans in an overseer position. Currently, many organizations’ systems are not quite factory-ready; agents do the work, but only as directed and subsequently verified by humans. A critical part of the software factory paradigm is its ability to build trust by reviewing its own output. Those larger orchestration and evaluation tasks may need more powerful models in order to produce autonomous output while building the necessary trust with the humans in and above the loop. But do all of the agents in the factory have the same requirements?

Sarah Sachs from Notion gave an incredible talk about just that, and if you missed it, make a note to go watch the recording. The main stage was livestreamed; you could watch it right now. Her main point was this: Most of us don't work for a frontier AI lab. A large fraction of the things we do, both in our own work and in our products, don't require the biggest, hottest, most token-hungry models. If we lock ourselves into one AI vendor or one hard-coded AI model, we're doing ourselves, our shareholders, and our customers a disservice.

At this point, even the models that aren't bleeding edge, from any vendor, are good enough for simple to medium tasks like summarization. Using Opus to summarize a Slack thread is a bad choice. You likely don't need any model to convert a file from one type to another. As AI is getting better, it's also getting more expensive. Price is only going up, as are other social and environmental costs, and "simply throwing more tokens at it" is not a viable move for most companies.
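To make the file-conversion point concrete, here is a minimal sketch of a CSV-to-JSON conversion using only the Python standard library. The choice of formats and the `csv_to_json` name are illustrative assumptions; the talk did not name specific file types.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    # DictReader uses the first row as the keys for every following row.
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


if __name__ == "__main__":
    sample = "name,role\nAda,engineer\nLin,designer\n"
    print(csv_to_json(sample))
```

No tokens are spent here, and the result is deterministic, which a model call cannot promise.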

Sarah's advice was to build smartly and modularly enough to avoid vendor and model lock-in. When the billing shifts, be ready to switch models and vendors on a dime so you can make the right choice for your customers. Try out open-weight models, especially for those midrange tasks where "good enough" is good enough. Balance the needs of all the different agents in your “factory,” make the "tokenomics" work for you, and it will help keep the costs a little more balanced for everybody else, too.
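One way to picture that modularity is a thin routing layer that maps task tiers to interchangeable models. The sketch below is an assumption about how such a layer might look, not anything shown in the talk: `TierRouter`, `ModelOption`, the tier names, the stand-in model functions, and the prices are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ModelOption:
    """One interchangeable model behind a common call signature."""
    name: str                   # hypothetical label, not a real model ID
    usd_per_mtok_out: float     # placeholder price per million output tokens
    call: Callable[[str], str]  # would wrap a vendor SDK call in practice


class TierRouter:
    """Maps task tiers to models so a swap is a config change, not a rewrite."""

    def __init__(self, routes: Dict[str, ModelOption]) -> None:
        self._routes = dict(routes)

    def swap(self, tier: str, option: ModelOption) -> None:
        """Point a tier at a different model or vendor."""
        self._routes[tier] = option

    def run(self, tier: str, prompt: str) -> str:
        """Send the prompt to whichever model currently serves this tier."""
        return self._routes[tier].call(prompt)

    def estimate_output_cost(self, tier: str, output_tokens: int) -> float:
        """Rough output-token cost in USD for this tier's current model."""
        return output_tokens / 1_000_000 * self._routes[tier].usd_per_mtok_out


# Stand-in models that only tag the prompt; real ones would call an API.
def small_open_weight(prompt: str) -> str:
    return f"[small-open-weight] {prompt}"


def frontier(prompt: str) -> str:
    return f"[frontier] {prompt}"


def build_router() -> TierRouter:
    """Example wiring with made-up prices, for illustration only."""
    return TierRouter({
        "midrange": ModelOption("small-open-weight", 0.50, small_open_weight),
        "orchestration": ModelOption("frontier", 10.00, frontier),
    })
```

Callers ask for a tier rather than a model, so `router.swap("midrange", ...)` changes the vendor without touching call sites. The routing itself stays small; checking that output quality holds after a swap is a separate job this sketch does not cover.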

Top comments (4)

Nazar Boyko • Jul 1

Model routing is the easy part of "switch on a dime." The harder, hidden cost is that a prompt tuned to one model's quirks (its formatting, the way it calls tools) often lands differently on another, so a clean swap can quietly drop quality until you tune and re-check for the new one. The router is a weekend of work. Keeping output stable across three vendors is the ongoing tax. The line about not needing any model at all to convert a file is the sharpest point in here: the cheapest token really is the one you never spend.

UnitBuilds • Jul 1

Especially when switching vendors. A Google model is a Google model and they're all pretty uniform, because they were distilled from the same dataset. But switching from Google to Anthropic... It's more than just a payload wrapper that changes.

UnitBuilds • Jul 1

"Price is only going up" is a harsh reality we'll all have to face soon... We think a subscription will stay useful, but remember how heavily it's subsidized just to make it useful. Try running via the raw API for a bit and you'll soon realize it's VERY expensive... Look at Gemini 3.5 Flash, it's $9 per million output tokens, 3.1 Pro is $12... Picture what 3.5 Pro or 4.0 Pro will cost...

Alex Shev • Jul 2

Token economics become real once the agent loop runs all day. The expensive part is often not one prompt, but repeated context reloads, retries, and tool calls that nobody is measuring.