Last month I had a weird realization: I was shipping less, using AI less deliberately, and somehow paying more.
I opened my Claude invoice expecting it to be lower than March. I'd been intentional about coding. Fewer sessions. Shorter conversations. I thought I had it under control.
It was 40% higher.
I stared at it for a solid minute.
The thing I'd been ignoring
I'd been working on a macOS app — TokenBar, a menu bar tool that tracks LLM token usage in real time (yeah, I know). While building it, I got into a habit of leaving long conversations open instead of starting fresh ones.
I'd start a Claude session in the morning to debug a Swift UI issue. Then in the afternoon I'd continue that same thread to ask something totally unrelated. Then I'd ask it to review some copy. Then help with an error message.
Each message in that thread carried the entire conversation history as context. I wasn't paying per session — I was paying per token, and that context window was growing all day.
A 3-message conversation I could've had fresh for maybe $0.04 was now $0.40 because I was dragging around 8,000 tokens of conversation history I didn't need.
Multiply that across a week and you've got your 40% spike.
Less code, more context
Here's the counterintuitive part: I was writing less code because I was using more AI for planning and architecture discussions. Those are long, meandering conversations. Lots of back-and-forth. Lots of context.
Code generation is actually pretty cheap per token — the outputs are short, the questions are specific. But "help me think through the architecture of this feature" conversations balloon fast.
I'd switched from cheap tasks to expensive tasks without realizing it.
What actually helped
Starting fresh conversations for different topics was the obvious fix. I knew this but kept being lazy about it.
The bigger thing was actually being able to see what I was spending in real time. When I finally had TokenBar running on my own Mac (dogfooding your own tool hits different), I could watch the token counter tick up during a session. That little number in my menu bar made the cost tangible in a way the monthly invoice never did.
Same feeling as watching your bank balance while you shop vs. getting the credit card bill a month later. One creates immediate feedback. The other creates regret.
The pattern I see now
I talk to other solo devs who have this same problem. The bill arrives, they're confused, they try to use AI less, and then they're confused again next month when it doesn't change much.
The issue usually isn't frequency — it's context accumulation and task type. A 20-minute architecture discussion in a long thread can cost more than an hour of focused code generation in fresh sessions.
Once I understood that, I stopped trying to "use AI less" and started using it more intentionally. Separate threads per task. Fresh sessions for new topics. And a way to see what's happening in real time instead of waiting for the monthly surprise.
My bill went down the next month. Made more progress too, weirdly enough — shorter, focused sessions are just better anyway.
If you've noticed similar patterns in your own AI spending, I'd be curious what else you've found. The billing models for these APIs are still pretty opaque and I feel like we're all figuring it out in real time.
Top comments (0)