DEV Community

Henry Godnick
I Switched AI Models Mid-Project and My Costs Tripled. Here's What I Learned.

Six months ago I was happily using GPT-4 for everything. Code reviews, writing, debugging — it worked fine and my bills were predictable. Usually $30-40 a month. Manageable.

Then I started hearing everyone rave about Claude Sonnet. Better at reasoning, more nuanced responses, handles long context better. So I switched mid-project. No big deal, right?

My next bill was $127.

The one after that: $143.

I had no idea why until I actually sat down and looked at the numbers.

The Problem With Just Switching Models

Here's what I didn't account for when I switched:

Token pricing is wildly different between models. GPT-4o at the time was running me about $5 per million input tokens. Claude Sonnet? More than double that for the tier I was using. Same amount of work, roughly 2.4x the cost.
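That "roughly 2.4x" isn't hand-waving; it falls straight out of the per-million-token rates. Here's a minimal sketch of the arithmetic, with the prices as illustrative placeholders (rate cards change, so check your provider's pricing page rather than trusting these numbers):

```python
# Rough input-token cost comparison between two models.
# Prices are illustrative assumptions, not current rate cards.
GPT4O_INPUT_PER_M = 5.00    # $ per 1M input tokens (assumed)
SONNET_INPUT_PER_M = 12.00  # $ per 1M input tokens (assumed)

def monthly_cost(tokens_per_day: int, days: int, price_per_m: float) -> float:
    """Input-token cost for a month of steady usage."""
    return tokens_per_day * days / 1_000_000 * price_per_m

usage = (250_000, 30)  # e.g. 250k input tokens/day for 30 days
print(f"GPT-4o: ${monthly_cost(*usage, GPT4O_INPUT_PER_M):.2f}")   # $37.50
print(f"Sonnet: ${monthly_cost(*usage, SONNET_INPUT_PER_M):.2f}")  # $90.00
```

Same workload, and the bill more than doubles purely from the rate difference, before you account for any change in how you prompt.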

I was sending longer prompts to Claude. Because it's better at following complex instructions, I started writing more detailed system prompts. Which meant more tokens per request. Which meant higher costs. A virtuous cycle in quality terms, a death spiral in billing terms.

I had no real-time visibility. With GPT, I'd gotten used to a rough mental model of what things cost. With Claude, I was flying blind. I'd finish a coding session and have zero idea if I'd spent $2 or $20.

What I Did About It

The wake-up call was month three. I'd been working on a new macOS app — TokenBar — a real-time token counter that lives in your menu bar. The irony wasn't lost on me: I was building a tool to track AI token usage while bleeding money because I wasn't tracking my own AI token usage.

So I started using it seriously. And a few things became clear immediately:

My longest context windows were my biggest cost centers. I had one particular workflow where I was feeding an entire codebase into Claude for review. That single prompt sometimes hit 80k tokens. That's not free.

Short back-and-forth chats are cheap. Long single-prompt sessions are expensive. The cost isn't really about how long you work with AI — it's about how much text you send in a single shot.

Multi-turn conversations compound. Every message in a conversation includes the full history. By message 20, you're paying for the entire transcript plus your new question.
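The compounding is easy to underestimate, so here's a quick sketch of the math. The per-message token count is a made-up round number, and I'm simplifying by treating every message the same, but the shape of the curve is the point:

```python
# Why multi-turn chats compound: each request re-sends the full
# transcript so far, so input tokens billed grow quadratically
# with conversation length, not linearly.
def total_input_tokens(message_tokens: list[int]) -> int:
    """Total input tokens billed across a conversation, where each
    request includes all previous messages plus the new one."""
    total = 0
    history = 0
    for t in message_tokens:
        history += t      # transcript grows by this message
        total += history  # each request is billed for the whole transcript
    return total

turns = [500] * 20  # twenty messages of ~500 tokens each
print(total_input_tokens(turns))  # 105000 billed -- vs 10000 if each turn stood alone
```

Twenty modest messages end up billing over ten times the tokens you actually typed. That's why starting a fresh chat is often the cheapest optimization available.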

The Actual Fix

I didn't stop using Claude. The quality difference is real and worth paying for on the right tasks. What I changed:

  1. Chunked my big prompts. Instead of feeding 50 files at once, I started feeding 5-10 at a time. Costs dropped significantly with maybe a 10% hit to output quality.

  2. Used GPT-4o for cheap tasks. Syntax fixes, simple refactors, boilerplate generation — these don't need Claude-level intelligence. Route accordingly.

  3. Set daily token budgets and watched them. Having a live counter in my menu bar made me conscious of what I was spending in real time, not after the fact.

  4. Killed idle conversations. I had a habit of leaving long chats open and picking them up hours later with a small new question. That small question was actually huge because of context length. Now I start fresh more often.
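The routing rule in points 2 and 3 can be sketched in a few lines. This is a toy decision function, not a real API — the task labels, model names, and the 30k-token threshold are all hypothetical stand-ins for whatever heuristics fit your workflow:

```python
# Minimal model-routing sketch: cheap mechanical tasks go to the
# cheaper model; oversized prompts get flagged for chunking first.
# All names and thresholds here are hypothetical.
CHEAP_TASKS = {"syntax_fix", "simple_refactor", "boilerplate"}
CHUNK_THRESHOLD = 30_000  # tokens; above this, split the prompt

def pick_model(task_type: str, prompt_tokens: int) -> str:
    if task_type in CHEAP_TASKS:
        return "gpt-4o"  # doesn't need Claude-level reasoning
    if prompt_tokens > CHUNK_THRESHOLD:
        return "claude-sonnet (chunk first)"  # feed 5-10 files at a time
    return "claude-sonnet"

print(pick_model("boilerplate", 1_200))    # gpt-4o
print(pick_model("code_review", 80_000))   # claude-sonnet (chunk first)
```

Even a crude rule like this captures most of the savings; the point is making the routing decision deliberate instead of defaulting everything to the most capable model.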

My last month's bill: $51. Mostly Claude, some GPT-4o. Back in a sane range.

The Bigger Lesson

Switching AI models without changing how you use them is like buying a sports car and driving it the same way you drove your Corolla. You'll notice the difference — mostly in your wallet.

Different models have different cost profiles, different strengths, and different failure modes. Just because something is better doesn't mean you should use it for everything, or use it the same way you used its predecessor.

The developers who are going to win with AI tooling aren't the ones who pick the best model. They're the ones who understand the trade-offs and route work intelligently.

Maybe that's obvious in retrospect. But nobody told me when I made the switch, and the first time I saw that $143 bill, I genuinely thought something was wrong.

Nothing was wrong. I just wasn't paying attention.


If you want to watch your token usage in real time while you work, I built TokenBar for exactly this — it's a $5 menu bar app for macOS that shows you live token counts and cost estimates across models.
