Phil Rentier Digital

Posted on • Originally published at rentierdigital.xyz

My Pro Max Plan Lasted 15 Minutes. Then I Ran /context.

A guy* instrumented 858 Claude Code sessions over 33 days. $1,619 on the invoice. 264 million tokens wasted on a single misconfigured setting. 54% of his turns happened after a 5+ minute idle gap, so the cache expired, so the cost got multiplied by 10 for nothing. One file read 33 times in the same session. 19 of his 42 skills almost never called, but loaded at startup. 90% of the waste came from settings HE controlled. Not from Anthropic's billing. Not from the model. From his config.

I ran /context right after reading that. 24,800 tokens loaded before I typed a single character. 5,500 just for my global CLAUDE.md, a file I have been dragging around for months without ever seriously rereading it. Multiply that by my sessions per day, by 20 working days, by a year. The number gets uncomfortable. And you? Do you know how much you load before your first prompt? Probably not. Nobody knows before they look.

TLDR: The cost of a Claude Code session is not the volume of unique tokens. It is that volume multiplied by the number of times it gets reloaded. Cache that expires, sub-agents that reload the parent context in full, plus a dozen other reloads you never see. 15 hacks to attack the multiplier, each with its pro AND its con (because half the "tips" floating around cost you more in time than they save in tokens). Run /context before you finish this article. You will see.

*Diagram: "What you think you pay for" (one bar of unique tokens) vs. "What you actually pay for" (the same bar stacked once per reload).*


Token cost = unique tokens × reload count

Your Cache Expires in 5 Minutes. Your Coffee Break Doesn't.

Anthropic's prompt cache lives 5 minutes. After that, the next turn pays full price to reload everything you already paid for once. The Reddit audit found that 54% of turns happened after a 5+ minute gap. More than half the conversation was effectively cold-cache.

Pro: running /compact or /clear BEFORE you walk away from your machine is the single highest-leverage habit on this list. You announce the break, you collapse the context, you come back to a clean slate that costs almost nothing to warm up. Lunch, a meeting, the kind of break where you go check on the pool and end up fixing a skimmer for 20 minutes.

Con: on a tight task where you are iterating fast, breaking the cache to save 5% of context costs you more in cognitive reload than in tokens. You lose your train of thought, Claude loses the nuance of the last three turns, and you pay in wall-clock time.

Verdict: religious about it before announced breaks. Not for going to the bathroom.
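Back-of-envelope math on what an expired cache does to a single turn. The prices are assumptions (Sonnet-class input at $3 per million tokens, cache reads at 0.1x, cache writes at 1.25x; check Anthropic's current pricing page before trusting the exact ratio):

```python
# Cost of one turn on a 100k-token conversation, warm cache vs expired cache.
# Pricing is an ASSUMPTION (Sonnet-class): $3/M input, cache read 0.1x, cache write 1.25x.
PRICE_INPUT = 3.00 / 1_000_000       # $ per input token
CACHE_READ = 0.1 * PRICE_INPUT       # cache hit: re-read at ~10% of full price
CACHE_WRITE = 1.25 * PRICE_INPUT     # cache miss: pay a premium to re-cache everything

context = 100_000  # tokens already sitting in the conversation

warm = context * CACHE_READ    # you came back within 5 minutes
cold = context * CACHE_WRITE   # you went to fix the skimmer
print(f"warm: ${warm:.2f}  cold: ${cold:.2f}  ratio: {cold / warm:.1f}x")
```

Same turn, same context, roughly an order of magnitude apart. That is the whole section in two numbers.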

Your Global CLAUDE.md Gets Reloaded Forever

5,500 tokens. That is what my global CLAUDE.md weighs. Reloaded on every session. Every project. For as long as I keep that file the way it is. Do the math on a year of sessions and you stop sleeping.

Confession: my own CLAUDE.md violates the 200-line rule I am about to preach. I know. I am part of the problem.

Pro: turn the file into an index that points to specialized arch docs in each project, instead of a global brain dump. Keep only what applies to 100% of your projects (your name, your shell, your 3-4 hard rules). Everything else lives in a docs/ folder that gets read on demand.

Con: a too-skinny CLAUDE.md means re-explaining your conventions every session. Iterations cost more than the startup tokens you saved. Especially painful on projects you touch once a month.

Verdict: index, not junk drawer. Aim for 80 lines, not 800.
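For what it is worth, the index version could look something like this. A sketch, not my actual file, and every docs/ path here is a made-up example:

```markdown
# CLAUDE.md (global) — index, not junk drawer

- I'm Phil. Shell: zsh. Never force-push. Never touch .env files.
- Answer in the language I write in.
- Everything project-specific lives in that project's docs/, read on demand:
  - conventions → docs/conventions.md
  - architecture → docs/architecture.md
  - deploy → docs/deploy.md
```

Ten lines that point somewhere, instead of 500 that get reloaded every session.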

One Line in Your Settings Cut Context in Half

ENABLE_TOOL_SEARCH=true. Copy. Paste. Test. Verify with /context.

That single setting is the one that saved 264M tokens across the Reddit audit. With it on, Claude does not load every tool schema at startup. It searches for the right tool when it needs one. On a setup with 15+ MCP tools the savings are brutal.

Pro: massive context reduction at startup if you have a heavy MCP setup. Instant. Free. One line.

Con: on a setup with fewer than 10 tools, Tool Search adds a round-trip every time Claude needs a tool, which can actually slow you down. There is also a known macOS quirk where the auto-flag does not always stick. Force it manually and verify with /context that the drop happened. Test it in a throwaway session first so you do not nuke a cache mid-task.

Verdict: turn it on if you have 15+ tools. Skip it under 10. Test before committing.
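Given the macOS quirk, forcing the flag beats trusting the auto-detection. One place to pin it, assuming your Claude Code version reads environment variables from the settings file (verify against the docs for your version):

```json
{
  "env": {
    "ENABLE_TOOL_SEARCH": "true"
  }
}
```

Drop that in ~/.claude/settings.json, open a throwaway session, run /context, and confirm the startup total actually fell.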

You're Loading 42 Skills. You Use Six.

The Reddit audit found 19 skills out of 42 that were almost never called. Loaded at startup anyway. Each skill is a schema that costs context whether you invoke it or not.

Pro: quick audit, disable the dormant ones, instant savings on every single session forever. The kind of cleanup that pays back the next morning.

Con: auditing takes 30 minutes and you will end up disabling something you use twice a year, on the exact day you need it. Murphy lives in your skills folder.

Verdict: disable anything you have not touched in a month. Not stricter than that. The 2x-a-year skills are not worth the cleanup pain.

Plan Mode Pays Back in Avoided Rerolls

Everyone sells Plan Mode as a quality-first feature. Look thoughtful, plan carefully, get cleaner code. Sure.

The savings live somewhere else entirely. They live in the iterations you do not have to run. Every time Claude codes the wrong thing and you have to say "no, do it differently", you reload the entire context to re-explain. Plan Mode collapses three rounds of "almost but not quite" into one round of "we agreed before you started". I went deeper on the same idea in the prompt contracts approach I built after enough of these disasters, if you want the full framework.

Pro: mandatory on anything that touches 2+ files. The savings compound on every avoided reroll.

Con: on a trivial task (rename a variable, fix a typo, add a console.log), Plan Mode just adds latency. There was no iteration to avoid in the first place.

Verdict: Plan Mode for multi-file tasks. Skip for surgical fixes.

/clear Is the Cheapest Habit You're Not Doing

Change of task. /clear. That is it. That is the hack.

The reason nobody does it consistently is the same reason nobody flosses. It is too small to feel like a win and too easy to skip "just this once". And then last Tuesday I asked Claude to draft an email and it answered with TypeScript. Because I had been refactoring auth for two hours and never cleared. The previous task was still living rent-free in the context, paying full price on every turn, and now confusing the new one. Twenty minutes later, still untangling.

/clear on every task switch. Two seconds. Free. Done.

The only real risk is hitting it by reflex in the middle of a task and losing context you actually needed. That happens once. You learn fast.

Sub-Agents Cost 7x. Nobody Tells You That.

Every sub-agent reloads the parent context in full. Anthropic's own docs confirm it, the Reddit audit measured it, and the multi-agent pattern that everyone has been hyping for six months is, mechanically, a token trap dressed as a feature.

You spawn 3 sub-agents to "parallelize" a task. Each one inherits the full context. You just paid for that context 4 times instead of 1. Welcome to the club.

Pro: killing the reflex of "I'll send this to a sub-agent" for every small task. The reflex feels productive.

Con: for genuinely parallelizable work (review 5 independent files, run 5 isolated checks), sub-agents are still faster in wall-clock time even if they cost more in tokens. Time vs money tradeoff. Sometimes time wins.

Verdict: sub-agents are a parallelism tool, not a savings tool. Use them when the clock matters more than the bill.

Productivity feels great. The invoice arrives anyway. 💸

Disconnect the MCP Servers You Never Open

Even with Tool Search on, every MCP server has a startup cost. Mine, right now: Gmail, Calendar, Chrome, Context7, YouTube transcript, my own rentierdigital MCP. Honest answer: probably half of those I have not touched this week.

Pro: immediate startup context reduction. The kind of cleanup you can do in 90 seconds.

Con: disconnecting and reconnecting on every task switch is friction you will abandon in 3 days. I tried. I quit on day 4.

Verdict: create 2-3 settings profiles (dev / writing / research) and switch by profile, not by individual MCP. The friction drops to a single command. I went deeper on why CLIs end up cheaper than MCP servers for most agent work if you want the longer take.

/compact at 60%, Not 95%. (Yes, I Know What the Docs Say.)

The docs suggest compacting late, when you are running out of room. Polite, conservative advice. I disagree.

By 95%, the response quality has already started to degrade. Claude is fishing in a saturated context. You feel it before the warning fires. By 60%, you compact while everything is still clean and the next 40% of the session runs at full quality on a much lighter context.

Pro: on day-to-day code ops, compacting at 60% gives you better quality AND lower cost in the same move. Rare combo.

Con: on architecture work or deep debugging where you NEED the full history, compacting early throws away the nuance Claude would have used. You compact away the very thing that was about to help you.

Verdict: 60% for daily code. 85% for architecture and deep debug.

1,122 Redundant File Reads. One Session Read the Same File 33 Times.

That number is from the Reddit audit and it physically hurt me to type it. 33 times. Same file. Same session. Each read paid in full.

The cause is usually /compact wiping the file from working memory, or Claude playing it safe after a long turn and re-fetching to be sure. Either way, you pay.

Pro: add a rule in your CLAUDE.md: "do not re-read files you already have in context unless I ask, or unless the file may have changed since the last read". Massive savings on long sessions.

Con: if you constrain re-reads too hard, Claude misses the edits YOU made between two turns and codes against a stale version. That is a worse bug than the wasted tokens.

Verdict: the rule with the explicit exception is the only version that survives contact with reality.
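The CLAUDE.md wording I would start from, exception included. The phrasing is a suggestion, adjust it to taste:

```markdown
## File reads
- Do not re-read a file that is already in context, UNLESS:
  - I explicitly ask you to, or
  - the file may have changed since your last read (I edited it, or a command wrote to it).
```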

The same file. 33 times. In one session. I am still not over it.

Paste the Function. Not the 1,200-Line File.

Surgery vs grenade. Most devs throw the whole file at Claude because it is faster to copy. Faster to paste, slower to pay.

The default should be: paste the function, not the file. Direct savings, no cognitive cost, works on every prompt. The only time it backfires is when the bug is actually in the interaction between functions, and pasting the isolated piece means Claude misses the root cause. You debug the wrong thing for 20 minutes, then widen the context anyway. Annoying. Still cheaper than pasting the whole file every single time by default.

Start surgical. Widen only if the first pass fails. Most of the time, the first pass works.

Reference @auth.js, Not "The Bug in My Repo"

Targeted reference (@path/to/file.js) means a precise read. Vague reference means Claude runs grep, glob, ls in cascade until it finds what you meant. Each step pumps your context.

Pro: fine control over what gets loaded. You know what you are paying for.

Con: precise reference assumes you know where the bug lives. If you are still searching, vague is necessary.

Verdict: target whenever you can. When you have to search, ask explicitly: "find the file, then stop and show me before reading it." Two-step search. Cheaper than the cascade.

Edit Your Last Message. Don't Send a New One.

Claude gives you a wrong answer. You scroll back, edit your original prompt, hit enter. Claude regenerates from the corrected prompt. The bad reply never enters the context. The thread stays clean.

Versus: typing a new message saying "no, I meant…", which stacks the bad reply, your correction, and the new attempt all in working memory.

Pro: clean thread, smaller context, faster iteration on micro-adjustments. Especially good for fixing typos in your own prompt.

Con: you lose the trace that the first version failed, which is sometimes useful when you are debugging a recurring pattern in how Claude misunderstands you.

Verdict: edit for micro-adjustments. Stack for real reasoning iterations where the failed attempt teaches you something.

A git log Just Ate 8,000 Tokens. You Didn't Notice.

Terminal outputs are the silent vacuum of Claude Code sessions. git log without --oneline -20. npm install with the full dependency resolution dump. tail -f on a server log. All of it lands in the context. You did not see it scroll by because Claude collapsed the output. The tokens are still there.

Pro: add a deny list in CLAUDE.md for known verbose commands. Force --oneline, force tail limits, force log levels. List your repeat offenders.

Con: if you constrain outputs too hard, Claude misses the info needed to diagnose, you run a second command, and now you paid twice.

Verdict: deny list on the known verbose ones. Keep the rest permissive. Refine as you spot new offenders.
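A starting deny list for CLAUDE.md. These are the usual offenders; the exact limits are my guesses, tune them to your repos:

```markdown
## Verbose commands — keep outputs bounded
- `git log` → always `git log --oneline -20`
- `npm install` → add `--no-audit --no-fund --loglevel=error`
- never `tail -f`; use `tail -n 50` instead
- long build/test output → pipe through `| tail -n 40`
```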

You did not see those 8,000 tokens. Your invoice did.

Sonnet by Default. Haiku for Grunt. Opus for Architecture.

The only model rule worth being on this list. Opus on a variable rename is waste. Haiku on an architecture decision is 3x more time spent rolling back.

Pro: right tool for right task. Opus where the reasoning matters, Sonnet for the daily 80%, Haiku for the boring grunt (renames, formatting, doc generation).

Con: switching models mid-session breaks your cache. 4 switches a day and you have lost more than you saved.

Verdict: pick the model at the START of the session based on the task type. No mid-session switching. If you guessed wrong, finish the task on the current model and retune for the next session.


The Cheapest Hack Isn't on This List

All these hacks are guesswork until you measure. Two free commands, already installed on your machine: /context and /cost. The formula fits on one line: tokens loaded at startup × sessions per day × 20 working days. That is what you pay every month just to start. Before the first real prompt.
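The one-line formula, as a sketch you can run with your own numbers from /context. The session count and per-token price here are assumptions; only the startup figure is mine:

```python
# Monthly cost of just STARTING sessions: startup tokens x sessions/day x 20 days.
startup_tokens = 24_800              # what /context showed me before the first prompt
sessions_per_day = 8                 # assumption -- count yours
working_days = 20
price_per_token = 3.00 / 1_000_000   # assumed Sonnet-class input rate

monthly_tokens = startup_tokens * sessions_per_day * working_days
monthly_cost = monthly_tokens * price_per_token
print(f"{monthly_tokens:,} tokens/month before the first prompt: ${monthly_cost:.2f}")
```

And that is the floor. Every mid-session reload multiplies it further.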

The market is starting to react. Hasan Toxr (@hasantoxr on X) released a knowledge graph approach that claims 8x to 49x reduction on code reviews. meta_alchemist maintains ccusage and claude-code-usage-monitor. Dashboards are popping up everywhere. Good sign. But no external tool will tell you what /context tells you in two seconds.

The cheapest hack on this list is not on this list. It is running /context once a week and looking honestly at where the reloads are coming from. Your config, not Anthropic's bill, is the first place to look.

Actually, wait. Let me put it differently. The real hack is admitting that most of us have been flying blind. We optimize for features, for speed, for developer experience. But we never look at the bill until it hurts. Then we blame the model, the company, the pricing structure. Anything except the 20 settings we control.

Tell me which hack was the most useful for you. Or which one I got wrong.


Sources

* The Reddit audit: u/Medium_Island_2795 (aka MunchKunSter), 858 sessions instrumented over 33 days, surfaced via @Simba_crpt and @DAIEvolutionHub on X.
* Hasan Toxr's knowledge graph for code review: @hasantoxr on X.
* ccusage and claude-code-usage-monitor by meta_alchemist.

(*) The cover is AI-generated. Claude wrote the words. A different machine drew the picture.
