Stop Using Claude Code for Everything: How I Cut My Token Usage by Being Smarter About Which AI Does What

#ai #llm #productivity #tooling

Claude Code is genuinely impressive. It can navigate a large codebase, plan multi-step changes, write and run tests, and iterate on its own output. But it's also expensive to run, and if you're using it the way I was at first, you're probably burning a lot of tokens on tasks that don't need it.

Here's the workflow shift that made the biggest difference for me.
The problem with using one tool for everything
When Claude Code is open in your terminal, it's tempting to just throw everything at it. Rename this variable. Write a quick helper function. Add a comment here. Fix this typo.

The thing is, Claude Code sends your full conversation context with every request. That means even a tiny ask like "rename this variable" is consuming tokens proportional to how long your session has been running. It adds up fast.

Meanwhile, you probably already have a perfectly capable model sitting right there in your editor that's much cheaper to run for simple tasks.
The setup: run Claude Code inside your editor
Before getting into the workflow, there's a small change worth making. If you've been running Claude Code in an external terminal while VSCode or Cursor is open on the same folder, move it into the integrated terminal.
You don't need to configure anything. Just open the terminal panel inside the editor and run Claude Code from there.

What you gain from this is access to the full editor context while Claude is working. The git diff panel shows you exactly what's being changed in real time. You can stage or revert specific hunks without switching windows. And when something breaks, the debugger is right there. You can set breakpoints, inspect variables, and feed that information back to Claude without ever leaving the editor.
It also makes the two-model workflow I'm about to describe much easier to actually use in practice.

The workflow: right tool for the right task
Once Claude Code is running inside your editor, you have two AI tools in the same window:
Claude Code in the terminal, for tasks that need real reasoning. Architecting a new feature. Debugging something complex. Refactoring across multiple files. Anything where you need the model to actually think through a problem.

Cursor composer or GitHub Copilot (depending on your editor), for everything else. Renaming things. Writing a simple util. Adding a docblock. Generating a test for a function that's already clearly defined. These are pattern-matching tasks, and a smaller model handles them just fine.

The practical rule I use: if I can describe the task in one sentence and I already know roughly what the output should look like, it goes to the composer or Copilot. If I'm not sure how to solve it, or it touches more than one or two files, it goes to Claude Code.
Why this actually matters

Claude Code's billing is based on tokens, and context accumulates over a session. The longer you've been working, the more expensive each request gets, even simple ones. By handling small tasks in the composer instead, you're keeping Claude Code sessions shorter and more focused, which means less context bloat and lower costs overall.
There's also a quality argument here. Claude Code does its best work on hard problems. If you're constantly interrupting it with trivial requests, you're not getting the most out of it.
What this looks like in practice
A typical session for me now looks something like this:
I open Cursor with Claude Code running in the integrated terminal. I'm building a new feature. I ask Claude Code to plan the implementation and start on the core logic. While it's working I can watch the diffs in the git panel. If it breaks something I use the debugger to understand what happened and give Claude Code the specific info it needs.
For smaller things that come up along the way, like tweaking a component or writing a quick helper, I switch to the composer. It's faster, it's cheaper, and it doesn't interrupt the Claude Code session or bloat the context.
When I'm done with the feature, I start a fresh Claude Code session for the next task rather than letting the context grow indefinitely.
The short version
Claude Code is a powerful tool but it's not free, and not every task needs it. Running it inside your editor instead of an external terminal makes it easier to use it alongside your editor's built-in AI for simpler tasks. Keep Claude Code for the hard stuff. Let the lighter models handle the rest.
It's a small workflow change but it makes a real difference over a full day of coding.

I'm Amit Raz, a Software Architect specializing in AI and software development based in Haifa, Israel. I build AI-powered products and help businesses integrate AI into their workflows. More at rzailabs.com.

Top comments (1)

Harjot Singh • May 31

"Being smarter about which AI does what" is the whole game, and you arrived at it manually - which is exactly the right instinct, just done by hand. The realization that a frontier model is overkill for renaming variables, writing boilerplate, or reformatting, while you save it for the genuinely hard reasoning, is the single biggest cost lever there is. Most people never make that switch because picking the right model per task is friction.

The natural endpoint of your approach is automating that decision so you don't have to think about it. That's literally what Moonshift does: a multi-agent pipeline (prompt to a shipped SaaS on your own GitHub + Vercel) with a router that does the "which AI does what" call per step automatically - cheap-80/expensive-20 - landing a full build at ~$3 flat. You've basically been hand-running the algorithm. First run's free, no card. Great post - how do you decide the cutoff in practice? I find the hard part is the medium-difficulty tasks where it's not obvious which model is worth it.