DEV Community

Cover image for Stop chatting with Claude Code: 3 rules for cleaner context and lower bills
Olexandr Uvarov
Olexandr Uvarov

Posted on

Stop chatting with Claude Code: 3 rules for cleaner context and lower bills

I corrected Claude in a debugging session. It said "you're right."

Then it spent the next 20 messages quietly building on the bug I had just disproved.

The bug wasn't in the model. It was in how I was using context.


Most developers I know use Claude Code like a chat window. One terminal, one session, all day long. Backend question in the morning, frontend bug after lunch, a quick "what should I order for lunch" mixed in somewhere, and an architecture decision around 4pm.

By the end of the day, Claude is unhelpful in subtle ways. It mixes the codebases up. It forgets corrections you made an hour ago. The answers get longer, the relevant parts shorter. And you're closer to your weekly rate limit than you should be — or, if you're on API access, your bill is bigger than yesterday — for the same amount of actual work.

None of that is a model problem. It's a context problem. The window is not memory — it's a working surface, and you have to manage it like one.

The three symptoms

1. Cross-topic confusion

A long session has no concept of "topic." Every new message ships the entire prior conversation to the model — until auto-compaction kicks in around 95% of the context window, but by then you've already paid the tokens to get there. If auth.users appeared in your prompt at 10am, it's still in the prompt at 4pm — even if you've been working in a completely unrelated repo for the last three hours.

So when you ask "should we cache this with Redis?" at 4pm, you're not asking a clean question. You're asking a question against a 4-hour-old soup of:

  • The 10am backend table schema
  • A 12pm offhand mention of a deadline
  • The 2pm frontend stack discussion
  • That weird tangent about your dev container at 3pm

The model answers based on the whole soup. It pulls patterns from one task into another. It references functions that exist in a different repo. It gets confidently wrong because it has too much context, not too little.

You can write better prompts forever — it won't fix this. The fix is realizing the surface itself is dirty.

2. Post-correction persistence — the "you're right" trap

This is the one that costs me the most time.

I was debugging a component that re-rendered three times on every keystroke last week.

First hypothesis: the parent was re-rendering on each state change. I asked Claude to look at the parent. It agreed — confirmed parent re-renders, here's a fix wrapping the parent in React.memo.

I opened React DevTools Profiler. The parent wasn't re-rendering at all. The child was re-rendering on its own — because a useEffect inside it had an inline object as a dependency:

useEffect(() => {
  fetchData(filter);
}, [{ filter, sort }]);
Enter fullscreen mode Exit fullscreen mode

That object is recreated on every render. The effect "depends on" a brand new reference every time, so it fires every render, which sets state, which re-renders the child. Classic React footgun.

Me: Parent's not rendering — check the profiler. The child has a useEffect with an inline object in its deps. New reference every render.

Claude: You're right. Sorry for the confusion. The cause is the inline object in the useEffect dependency array.

Moved on. Asked for the fix.

const Parent = React.memo(({ filter, sort, items }) => {
  // ...
});
Enter fullscreen mode Exit fullscreen mode

The fix Claude wrote was wrapping the parent in React.memo. The hypothesis we had ruled out two messages earlier.

I corrected it again. "It's the inline object inside the child's useEffect, not the parent."

Claude: Right, the inline object. Sorry.

Then it proposed the same fix. React.memo on the parent.

I /clear'd the session. Started over. Pasted only the child component and the profiler screenshot. New session caught the inline-object dependency in the first answer and proposed the correct fix: pull the object out with useMemo, or destructure into primitive deps.

Why this happens

Look at what's actually in the context window at the moment Claude proposes the wrong fix for the second time:

  1. A fat system prompt and tool definitions — easily 10k+ tokens, often much more if you have MCP servers connected
  2. My first message: "this component re-renders three times on every keystroke"
  3. Claude's confident first hypothesis: parent re-render — explained over a paragraph
  4. My correction with the profiler output
  5. Claude's "you're right, it's the inline object" — one sentence
  6. A few intermediate messages
  7. My request: "give me the fix"

Item 3 is still in the prompt. It's longer than item 5. It's more confidently argued than item 5. When the model generates the next answer, it weighs everything in the prior turns. The parent-re-render framing wins because there's more of it.

The apology doesn't delete the wrong answer. It adds a short retraction at the end of a long argument. The argument keeps winning.

You can think of it like this: in a long enough conversation, your correction is a sticky note attached to a 10-page memo. The model reads the whole memo every time. The sticky note doesn't always get noticed.

Three signs you're in a "you're right" loop

  • You corrected the model and it agreed.
  • Two or more turns later, the original wrong claim resurfaces.
  • You correct again. It agrees again. It does it again.

When you see this pattern, don't reason with it. Don't write a longer correction. Don't paste the same evidence twice. /clear. Start over with the corrected version as your starting prompt.

The reminder you're tempted to write will just be one more short message in a long polluted history. The history already lost that argument once.

3. The token tax

Most developers using Claude Code today are on a subscription — Claude Pro, Claude Max. Your bill doesn't grow at the end of the month. Instead, your weekly quota burns out faster than it should. You hit the rate limit on Wednesday afternoon and wait until Friday to keep working. If you're on API access, the same pressure shows up as a growing invoice. Either way, you pay the same tax — just in a different currency.

Here's where the tax comes from. Every Claude Code request includes, on every single turn:

  • The system prompt with MCP tool definitions (10k+ tokens, often much more)
  • The full prior conversation (until auto-compaction kicks in)
  • Your new message
  • The response it generates

Prompt caching helps when you're sending similar prefixes within a short window — the default TTL is 5 minutes, with a 1-hour tier available at extra cost. The 5-minute timer refreshes on each cache hit, so as long as you're actively typing, the cache stays warm. But cache writes themselves cost 1.25× the base input price, and the moment you pause for longer than the TTL, the full content gets re-uploaded as a fresh cache write. Caching is not free, and it's not magic.

After 50 messages of about 500 tokens each, you've accumulated ~25k tokens of history. Plus 10k+ of system prompt and tools. Every new request you send ships 35k+ input tokens before you've even typed your actual question. On a subscription, those tokens come out of your weekly allowance. On API, they come out of your wallet.

A back-of-the-envelope calculation on Anthropic's current published pricing (Sonnet 4.6 at $3 per million input tokens; Opus 4.7 at $5 — note that Opus used to be 5× the price of Sonnet, but in the 4.5/4.6/4.7 generation it's only about 1.67× more expensive on input):

Scenario Avg input per request 30 requests Tokens shipped Cost (API)
One long session, no /clear (Sonnet) ~35k ~1.05M 1.05M ~$3.15
/clear every 10 messages + notes on disk (Sonnet) ~15k ~450k 450k ~$1.35
Same work, all on Opus, one session ~35k ~1.05M 1.05M ~$5.25
Research on Opus → write to disk → execute on Sonnet mixed mixed ~250k Opus + ~450k Sonnet ~$2.60

These are model calculations from Anthropic's published pricing, not a billing screenshot. Plug your own usage in — the ratio is what matters. You can do the same work for less than half the token cost just by clearing the surface. And you can do it with the better model for roughly half the cost by splitting the workload between Opus and Sonnet.

If you're on a subscription, divide your weekly token allowance by these numbers. The "Claude Max ran out by Wednesday" feeling has a real cause — and it's not your workload, it's your session length.

A long messy session quietly burns through your quota — or your wallet — in the background. The rate-limit message at 4pm Wednesday looks like a usage problem. It's a hygiene problem.

The three rules

Rule 1 — /clear is hygiene, not panic

Treat /clear the way you treat closing a browser tab. It's not failure. It's not "starting over." It's normal.

Things I clear on:

  • Switching from backend to frontend (different mental model).
  • Switching from designing something to implementing it (different decision style).
  • After any "you're right" loop, no matter how confidently the apology was phrased.
  • After a long debugging detour that didn't pay off — clear, restart with what I learned, not with the failed attempts.
  • At the end of any session that lasted more than an hour.

If you have two parallel tasks, open two terminals, not two threads in one session. The model doesn't have tabs. The window doesn't.

The cost of clearing is zero. The cost of not clearing accumulates linearly in tokens and nonlinearly in quality.

Why not /compact?

Claude Code also has /compact, which summarizes the conversation instead of wiping it. It's useful, but it has one problem you should know about: the model summarizes for itself, not for you. It keeps what it thinks mattered. The decisions it confidently got wrong often survive. The corrections you made — short, one-line, easy to drop — often don't.

/compact is fine for "I'm halfway through a task and want to keep going on a smaller surface." It's not fine for "I just spent an hour fighting a bad hypothesis and need a clean restart." For the second case, /clear plus a markdown file on disk beats /compact every time, because you control the summary instead of the model.

Rule 2 — Context lives on disk, not in the window

The window is volatile. It gets polluted, it gets long, you /clear it, it's gone. Even /compact is volatile — the model decides what survives.

The disk is durable. A markdown file you wrote yesterday is still there today, exactly as you left it. Word for word.

So after every non-trivial research session, I write the conclusions out. A plan into plans/, an architecture decision into docs/adr/, a set of patterns into docs/component-patterns.md. The next session reads the file as ground truth. The previous chat history is not invited.

This is the most underrated habit in Claude Code, and it's also the cheapest. A 2k-token markdown file in your context replaces a 25k-token conversation history that the model would otherwise be re-shipping on every turn.

A practical example: I keep a file that lists the patterns used in my codebase — what to do, what we've already tried and abandoned, what's idiomatic here. Every Claude Code session that touches the frontend reads that file first. Without it, the model invents patterns from training data — usually fine, sometimes wrong, occasionally suggesting a component I built six months ago as if it's a new idea. With it, the model is grounded in the codebase as it exists today.

Disk-based context also survives /clear. Window-based context doesn't. That asymmetry is the whole game.

Rule 3 — Use the right model for the right phase

Opus is the deep, expensive model. Sonnet is the fast, cheaper one. In the current Claude 4 generation, Opus 4.7 is roughly 1.67× the price of Sonnet 4.6 on input ($5 vs $3 per million tokens) and the same multiple on output. That gap used to be 5× in earlier generations — Anthropic cut it dramatically when Opus 4.5 launched — but even at 1.67×, the gap matters once you're running hundreds of requests a day.

The instinct is to pick one and use it for everything. That instinct is wrong for two reasons:

  1. If you pick Opus for everything, you pay the premium on every formatting fix, lint error, file shuffle, and "rename this variable." None of that benefits from Opus reasoning.
  2. If you pick Sonnet for everything, your research phase produces a plan that's almost right, you implement it, you hit a wall, you back up, you re-research, you implement again. The cost is real but invisible — it's the rework, not the line items on your bill.

The pattern that wins: research and plan on Opus, then write the plan to disk, then /clear and start a fresh Sonnet session that reads the plan.

In numbers: a research-heavy task done entirely on Opus might run around $5. The same task split — Opus for research, written to a markdown file, Sonnet reading the file and executing — runs closer to $2.50. The split also makes the execution better, because the plan is correct on the first try.

The split takes one extra step: writing your conclusions to a file. That step pays for itself in the first task — not just in dollars, but in not having to redo the implementation when the plan turns out to be wrong.

What I clear and what I keep

To make this concrete, here is what I throw away and what I keep across the boundary of a /clear:

Thrown away (in the window):

  • The chronological back-and-forth.
  • All the wrong hypotheses I explored.
  • Claude's apologies for being wrong.
  • Long tool outputs I needed once.

Kept (on disk):

  • The final correct plan.
  • The architecture decision with the reasoning.
  • The patterns I want the next session to know about.
  • Any "do not do X again" notes from this session.

When I open a new session, the window starts at zero. The disk hands me a clean version of what I learned. The two together are stronger than any long conversation.

Closing

You don't get better answers from Claude by holding a longer conversation with it. You get better answers by managing the surface it works on — what's in it, what's gone from it, and which model is reading it now.

The long session feels productive because the model still talks back. But it's slowly losing the thread, weighing wrong hypotheses against your corrections, and burning through your quota — or your wallet — for the privilege.

/clear. Write things to disk. Switch models on purpose. Treat the context window like a workbench, not a journal.

Context isn't memory. It's a working surface. Keep it clean.

Top comments (0)