I thought I was being smart.
Every time I opened a new chat with Claude or GPT-4, I'd paste in the entire codebase context. README, relevant files, the works. My reasoning: more context = fewer mistakes = faster development. Right?
Wrong.
Last month I was building features for TokenBar — a macOS menu bar app that shows real-time token usage. The irony of what happened next is not lost on me.
The Setup
I had a pattern. Before any coding session, I'd feed the AI a massive context dump:
- Current file I was working on
- Related files it might need
- The README
- Recent git diff
- Error messages from the last session
Sometimes that context block alone was 8,000-12,000 tokens before I'd even typed my first question. I felt like a power user. I was being thorough.
Then I started actually tracking my usage.
What the Numbers Said
I built TokenBar specifically to watch LLM costs in real time — it sits in my menu bar and shows token counts as I work. When I finally started paying attention to what it was telling me, I felt dumb.
A typical "context dump" session was running 3-4x more tokens than sessions where I just asked the question directly. Not 20% more. Three to four times.
The math: if I'm at ~$0.003 per 1K input tokens with Claude Sonnet, an extra 10K tokens of context per session adds $0.03. Doesn't sound like much. But I was doing 15-20 sessions a day while building. That's $0.45-0.60 extra per day just from unnecessary context. Over a month: $13-18 burned on... padding.
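That back-of-envelope math is easy to sanity-check in a few lines. Here's a quick sketch using the same assumed numbers (~$0.003 per 1K input tokens, 10K extra tokens per session, 20 sessions a day, a 30-day month):

```python
# Sanity check of the cost math above. The price is an assumed
# Claude Sonnet input rate, not an authoritative figure.
PRICE_PER_1K_INPUT = 0.003   # USD per 1,000 input tokens (assumed)
EXTRA_TOKENS = 10_000        # unnecessary context per session
SESSIONS_PER_DAY = 20        # high end of 15-20 sessions
DAYS_PER_MONTH = 30

per_session = EXTRA_TOKENS / 1_000 * PRICE_PER_1K_INPUT
per_day = per_session * SESSIONS_PER_DAY
per_month = per_day * DAYS_PER_MONTH

print(f"${per_session:.2f}/session, ${per_day:.2f}/day, ${per_month:.2f}/month")
# At the low end (15 sessions), the monthly figure lands around $13.50.
```

Small numbers per session, but they compound linearly with session count, which is exactly why the waste was invisible until it was tracked.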
The Trap
Here's what I didn't understand: the context window isn't free storage. Every token you put in costs money, whether the model actually uses it or not.
Worse, I was often re-pasting the same context across sessions. The model doesn't remember your last chat. So every new conversation I was paying input costs again for all that background info it already "knew" from previous sessions — except it didn't, because each session starts fresh.
I was essentially reprinting the entire project brief at the top of every email I sent.
What I Changed
I got surgical about context:
- Before: Paste everything, hope for the best
- After: Paste only the specific function, error, or file directly relevant to the question
For quick questions — "why is this SwiftUI view not updating?" — I stopped including anything except the relevant 20-30 lines. The answers were just as good, often better because the model wasn't distracted by irrelevant code.
For complex refactors, I'd chunk it: ask about one module at a time instead of dumping everything and asking for a global review.
The Actual Lesson
Context isn't a safety net. It's a cost.
Being precise about what you feed an AI saves money and — this surprised me — often gets better results. The model focuses on what's actually relevant instead of searching through noise.
This whole thing drove home why I built TokenBar in the first place. You cannot optimize something you can't see. When I could finally watch my token counts in real time while working, my behavior changed almost immediately. It's the same reason people spend less when they track expenses: visibility creates accountability.
If you're a solo dev using AI tools daily and you haven't looked at what you're actually burning through — try it. The context window trap is real, and it's sneaky.
TokenBar shows live token usage per request in your Mac menu bar. It's $5 one-time at tokenbar.site — no subscription.