A few years ago I owned an electric vehicle (EV). Not a Tesla, but a Hyundai Ioniq (which suited my pay grade at the time).
Like many EV owners, I had frequent bouts of "Range Anxiety": that uneasy feeling when the battery runs low and the car may not make it to a charging station.
It was not a great feeling, especially with no charging station in sight.
Today, a similar kind of anxiety is spreading among developers and software builders who use AI and vibe coding tools linked to popular services like Claude, Gemini, and Codex.
This uneasiness surfaces when you see the notification on the website or in the IDE warning that you are about to hit your usage limit, your inference tokens dwindling like sand in an hourglass.
In my use of the three popular AI services (the aforementioned Claude, Gemini, and Codex), the dreaded usage-limit notification seems to appear mostly in Claude, but it still happens a lot when I use these services to build and refactor apps.
A related feeling strikes when the AI-enabled app you are working on simply uses too many tokens on the backend, because it relies on "heavy" AI features that do a lot of processing at runtime.
This led me to call the feeling "Token Consumption Anxiety," riffing on the Range Anxiety familiar to EV owners.
Now that we've named that feeling of uneasiness, what can we do about it?
I propose three things developers and software engineers can do to reduce Token Consumption Anxiety:
Pick the right model - Like a golfer picking their clubs, or a surgeon picking the right instrument, you have to discern which model and tier to use. Not everything has to be Opus or Pro.
Change your system architecture - this applies less to the development side and more to the workload or app you create. Lean on "Build-Time AI": run as much inference as you can at build time, then make lighter AI calls at runtime.
Create your own tooling and optimization - Vibe coding means you can build the micro-tools your workflow is missing. Mine was a model-selection utility (RightModel) that tells me when a cheaper model is sufficient — which it usually is for boilerplate and formatting tasks. A targeted tool like this pays for its build time in a single heavy session.
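The model-selection idea can be sketched as a simple routing heuristic. Everything below is illustrative, not the logic of the actual RightModel utility: the task categories and the tier names are assumptions for the sake of the example.

```python
# A toy model-selection heuristic in the spirit of RightModel.
# Task categories and tier names are illustrative assumptions,
# not the real tool's logic.

CHEAP_TASKS = {"boilerplate", "formatting", "rename", "docstring"}
MID_TASKS = {"refactor", "unit-test", "bugfix"}

def pick_model(task: str) -> str:
    """Return the cheapest model tier likely to handle the task well."""
    if task in CHEAP_TASKS:
        return "haiku"   # small, fast, cheap tier
    if task in MID_TASKS:
        return "sonnet"  # mid tier for everyday coding work
    return "opus"        # reserve the expensive tier for hard problems

print(pick_model("formatting"))  # prints "haiku"
```

Even a lookup this crude captures the point: boilerplate and formatting almost never need the top tier, so routing them down saves tokens on every call.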
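The "Build-Time AI" pattern from point 2 can be sketched as a two-phase pipeline: pay for inference once when you build, then serve cached results at runtime. The `classify_with_llm` function below is a hypothetical stand-in for a real model call, and the file name is arbitrary.

```python
import json

# Build time: run the expensive inference once and cache the results.
# classify_with_llm is a hypothetical placeholder for a real model call.
def classify_with_llm(text: str) -> str:
    return "positive" if "great" in text else "neutral"

def build_step(items, path="labels.json"):
    """Precompute labels for known inputs and persist them."""
    labels = {text: classify_with_llm(text) for text in items}
    with open(path, "w") as f:
        json.dump(labels, f)

# Runtime: a cheap dictionary lookup instead of a live model call.
def runtime_lookup(text, path="labels.json"):
    with open(path) as f:
        labels = json.load(f)
    return labels.get(text, "unknown")  # fall back for unseen input

build_step(["great product", "arrived on time"])
print(runtime_lookup("great product"))  # no tokens spent at runtime
```

The trade-off is freshness: precomputed answers only cover inputs you saw at build time, so a runtime fallback (here just "unknown") is where a lighter model call could slot in.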
I do foresee a time when token consumption will be an afterthought, like electricity or water consumption: both are finite resources, but most of us don't keep a mental meter of the kilowatts or liters we consume.
We're not there yet. For now, it's a real constraint worth managing. Hopefully some of this helps.