More Watts, Less Light

#ai #productivity #agents #discuss

Token burn and business outcomes are not correlated. More burn means more inefficiency, not more value.

The electricity problem

Imagine you walk into a dark room. Turning on a light helps you see. Turning on every light in the building does not help you see better. It's still the same room. Now every surface is equally lit, the contrast is gone, and you're paying for power you didn't use.

Tokens work the same way. A focused prompt with clear scope is the single overhead light over your desk. A sprawling prompt with unlimited exploration is every light in the building — you're burning power, not producing insight.

Tokens are electricity, not output. More throughput doesn't mean more value.

I've had weeks where I burned through my allocation and looked back at the end to find nothing concrete. Code that worked but went unused. Exploratory branches that dead-ended. Agents that generated plausible-looking output that didn't survive first review. A lot of motion. Not much progress.

The ceiling stops you from doing that indefinitely. It forces a moment of reflection: did this burn produce anything real? If the answer is no, more capacity isn't the fix. More discipline is.

Three patterns I now use instead

I started paying attention to what actually ships versus what just burns context. I gave the patterns names so I could catch myself faster:

RTK — group similar items, deduplicate, truncate redundancy. Before sending a prompt, scan for repeated patterns, similar blocks, or redundant context. Group them, deduplicate, and shorten. Every repeated instruction or redundant context snippet burns tokens without adding signal.
Caveman — compress before you prompt. Strip greetings, filler words ("I think", "basically", "Let me know if that makes sense"), and closing courtesies. Every word in your prompt multiplies across every response token. Less fluff in means less fluff out.
Ponytail — spec the minimum viable solution. "Robust", "scalable", "enterprise-grade", "comprehensive" — these words invite scope creep. Specify the simplest thing that works: "a Map with TTL, not Redis." An agent given clear constraints ships faster than one given permission to over-engineer.

These aren't hacks. They're accepting that the constraint is real and learning to work inside it before asking for more room.

The question the ceiling forces you to answer

From a business perspective, the question isn't "how many tokens did we use" or even "how much code did we generate." It's "what shipped that wouldn't have shipped otherwise, and was it worth what it cost?"

If the answer is unclear, more capacity isn't going to fix it. A bigger budget for waste just means you waste more, faster.

The real question: if you had unlimited tokens tomorrow, what would you do differently? If you can't articulate a concrete answer — not "ship faster" but "ship which thing, that I can't ship now" — then more capacity is just paying for more waste.

Two perspectives, same person

As the person running the agents day to day, I feel the ceiling. I'd like more room to let exploration run longer before cutting it off. That's an honest feeling.

As someone accountable for what actually ships, I need the ceiling to exist. It forces a discipline that unlimited capacity never would. It turns "try everything" into "try the right thing." It turns motion into direction.

The two perspectives aren't in conflict — they're the same person at different zoom levels. The engineering self wants headroom. The business self wants proof that headroom produces value. Both are correct. The tension between them is the whole point.

Not every team needs to raise the ceiling. Some genuinely do — running many parallel experiments, rapid prototyping, exploring multiple approaches. For them, more throughput would produce measurable value. But that's not most teams. Most teams are fine where they are. Their current output meets their current needs. The ceiling is a signal, not a problem to solve by default. A moment to ask: is this constraint stopping something valuable, or just stopping something busy?

The actual skill

I brush against the limit most weeks. Sometimes it's because the work justified every token. Sometimes it's because I let the agent run too long on inefficient prompts.

More electricity doesn't mean better light. More tokens don't mean better outcomes. The skill is knowing the difference.