DEV Community

Henry Godnick

Token Usage Is the New RAM Usage

There's a generational marker in software. Ask any dev who built things in the early 2000s and they'll tell you: RAM was the thing you watched. Every allocation mattered. Every leak was a crisis.

Now it's tokens.

I've been building solo for about a year, and somewhere in the last six months, the mental model shifted. I stopped thinking about memory budgets and started thinking about token budgets. How much context am I feeding this request? What's the cost of this prompt chain? Why did that workflow chew through 50k tokens when I expected 5k?

It's the same feeling. Just a different resource.

The Invisible Meter

The thing about RAM was you had OS tools for it. Activity Monitor, top, htop — you could see the number climbing in real time. You trained yourself to notice.

With tokens, I had nothing. I'd finish a coding session and open my API dashboard to find a number that didn't match my mental model at all. Sometimes way higher. Sometimes a workflow I thought was "lightweight" had been hammering Claude for 200k tokens over four hours.

I built TokenBar partly out of frustration with this. I wanted that same kind of ambient awareness I used to have with memory. A number sitting in my menu bar that I could glance at without breaking flow. Just: here's where you are, right now.

Why the Analogy Actually Holds

RAM felt infinite until it didn't. You'd write code, everything would be fine, and then one day you'd try to open one more tab or run one more process and the whole machine would grind.

Tokens feel the same way in the early stages of a project. You're experimenting, iterating, building context windows with system prompts and tools and conversation history. It's all cheap. Then you scale, or you automate something that runs hourly, and you wake up to a bill.

The other parallel: RAM leaks were hard to spot. You had to be deliberate about finding them. Token waste is similar — it hides in system prompts you forgot to trim, in tool calls that return huge payloads, in conversation threads you left running overnight.
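The "system prompt you forgot to trim" failure compounds fast, because that prompt is re-sent on every single call. A back-of-the-envelope sketch, with purely illustrative numbers:

```python
# Back-of-the-envelope: a bloated system prompt is re-sent on every call,
# so its waste scales with call volume. Numbers below are illustrative.

system_prompt_tokens = 2_000   # a system prompt you forgot to trim
calls_per_hour = 60            # a chatty automation
hours = 24

wasted = system_prompt_tokens * calls_per_hour * hours
print(wasted)  # 2880000 tokens/day just re-sending the same prompt
```

Nearly three million tokens a day of pure overhead, and none of it shows up as a single scary line item. It just accumulates, like a slow leak.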

The Monitoring Gap

When I started paying real attention to my token usage, a few things surprised me:

Claude is way cheaper than I assumed, until it's not. Individual requests felt cheap. But I was making a lot of them. The cost accumulated in the background, invisibly.

I had no idea which workflows were expensive. I was running maybe eight different automations that used Claude. I assumed I knew which ones were heavy. I was wrong about three of them.

Checking the dashboard broke my flow. I'm a menu-bar-obsessed person. I live in the menu bar. Having to open a browser, navigate to a dashboard, wait for it to load — that friction meant I was only checking billing weekly, at best. Weekly is too late.

Real-time visibility changed my behavior. Not because I'm suddenly budget-obsessed, but because I caught a misconfigured automation early (it was looping on an error condition and hammering the API) and fixed it before it did real damage.

The $8 Lesson

I had one automation that was supposed to run once a day. Due to a bug, it was running on every message in a channel I'd forgotten about — a busy channel. I caught it because I noticed my token counter climbing unusually fast on a Tuesday afternoon.

Cost: about $8. Could have been $80 if I'd let it run until the end of the month.

The lesson isn't about $8. It's that before I had live monitoring, the feedback loop was too long: the API dashboard was my only signal, and by the time you see the monthly summary, the damage is done and the pattern is gone.

Treat It Like a System Resource

If you're using LLMs in your workflow — any LLMs, any provider — treat token usage the way the 2005 version of you treated memory. Watch it. Know your baseline. Notice when something spikes.

You don't need to be cheap about it. You just need to be aware. There's a difference between "I chose to spend 100k tokens on this because it was worth it" and "I had no idea that was happening."

The tools for this kind of ambient monitoring are still pretty sparse — most dashboards are built around billing summaries, not real-time awareness. That's the gap I'm trying to close with TokenBar.

But even without dedicated tooling: set up some logging, check your usage mid-session, build intuition for what heavy versus light actually costs.
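A minimal sketch of what that logging can look like. This `TokenTracker` class is hypothetical (not part of any provider SDK); it just accumulates per-call usage, which most LLM APIs report back in the response, and flags any call that blows past a multiple of your running baseline:

```python
# Hypothetical helper: accumulate per-call token usage and flag spikes
# against a rolling baseline. The spike_factor threshold is arbitrary.

class TokenTracker:
    def __init__(self, spike_factor=3.0):
        self.calls = []              # (input_tokens, output_tokens) per call
        self.spike_factor = spike_factor

    def record(self, input_tokens, output_tokens):
        """Log one call's usage; return True if it spiked vs the baseline."""
        total = input_tokens + output_tokens
        baseline = self.average() if self.calls else None
        self.calls.append((input_tokens, output_tokens))
        return baseline is not None and total > self.spike_factor * baseline

    def average(self):
        return sum(i + o for i, o in self.calls) / len(self.calls)

    def total(self):
        return sum(i + o for i, o in self.calls)


tracker = TokenTracker()
tracker.record(1200, 300)            # normal call
tracker.record(1100, 350)            # normal call
spiked = tracker.record(9000, 2000)  # way past 3x the baseline
print(tracker.total(), spiked)       # 13950 True
```

Even something this crude gives you the Tuesday-afternoon signal: the moment a workflow starts behaving unlike its own history, you know before the bill does.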

Token awareness is now a basic dev skill. The sooner you treat it that way, the fewer surprise bills you'll open.

Top comments (1)

DESIGN-R AI

The RAM analogy is exactly right, and we've lived it. We run multiple LLM agents 24/7 and the token equivalent of a memory leak is a compaction spiral — the agent's context fills up, it writes increasingly degraded working memory, and the next session starts from a worse baseline. Same invisible accumulation pattern as a memory leak, same "everything seemed fine until it wasn't" failure mode.

Your $8 automation bug is our version of an agent that kept re-reading the same PDF pages because its context management didn't track what it had already processed. The image count hit a hard limit and the session became unrecoverable. We now budget image loads the way you'd budget heap allocations.

One thing we've added that might be useful for TokenBar: projected usage per turn, not just cumulative. Knowing you've used 40K tokens is less useful than knowing "at current rate, you'll hit the limit in approximately 6 turns." That's the difference between a fuel gauge and a range estimate — both are useful, but the range estimate changes behaviour earlier.
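The commenter's range-estimate idea can be sketched in a few lines. This is an illustrative helper, not anything shipped in TokenBar; the limit and per-turn figures are made-up examples:

```python
# Sketch of a "range estimate": project turns remaining at the current
# burn rate, given a context limit. All numbers are illustrative.

def turns_remaining(used_tokens, limit_tokens, tokens_per_turn):
    """How many more turns fit before hitting the limit at the current rate."""
    if tokens_per_turn <= 0:
        return float("inf")
    return (limit_tokens - used_tokens) // tokens_per_turn

# 40K used out of a 100K window, averaging ~10K tokens per turn:
print(turns_remaining(40_000, 100_000, 10_000))  # 6
```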