Anthropic just dropped 1-million-token context windows for Claude Opus 4.6 and Sonnet 4.6 — generally available, and included in Max plans with no extra cost multiplier.
This is huge. But if you're not careful, it's also an easy way to blow through your quota in a single afternoon.
Here's how I've been approaching large-context sessions without wasting tokens.
The Problem Nobody Talks About
When you go from 200K to 1M context, the natural instinct is to dump everything in. Your entire codebase. All the docs. Every file that might be relevant.
And it works — technically. Claude handles it well. But you're burning 5x the tokens on input for every single response, even when 80% of that context is irrelevant to the current question.
I tracked my Claude Code sessions for a month and found something wild: most of my expensive sessions weren't doing complex work. They were doing simple tasks with massively inflated context.
5 Rules I Follow Now
1. Not Every Task Needs the Big Window
The 1M window is incredible for:
- Full codebase refactors
- Cross-file dependency analysis
- Understanding legacy systems end-to-end
It's overkill for:
- Writing a single function
- Fixing a bug in one file
- Generating tests for a specific module
I default to regular context and only switch to claude-opus-4-6[1m] when I genuinely need the full picture.
2. Track Your Token Usage in Real Time
This was the game-changer. I started running TokenBar in my Mac menu bar — it shows live cost per session as I work. The behavioral shift was immediate.
Before: "I'll just load everything, it's fine."
After: "This session is at $2.40 and I've only asked three questions. Let me trim the context."
You can't optimize what you can't see. Whether you use TokenBar or build your own tracking, having a live cost counter completely changes how you prompt.
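If you want a feel for what a live cost counter is doing under the hood, here's a minimal sketch. The per-token rates are illustrative placeholders (not Anthropic's actual pricing), and this isn't how TokenBar is implemented — it just shows why costs climb fast when a big context is resent on every turn.

```python
# Minimal running-cost tracker for an LLM session.
# RATE_IN / RATE_OUT are illustrative assumptions, not real pricing.
RATE_IN = 15.00   # $ per million input tokens (assumed)
RATE_OUT = 75.00  # $ per million output tokens (assumed)

class SessionTracker:
    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, input_tokens, output_tokens):
        """Log one request/response turn."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def cost(self):
        return (self.input_tokens / 1e6 * RATE_IN
                + self.output_tokens / 1e6 * RATE_OUT)

tracker = SessionTracker()
tracker.record(200_000, 1_500)  # one big-context turn
tracker.record(200_000, 2_000)  # same context resent next turn
print(f"Session cost so far: ${tracker.cost:.2f}")
```

Two turns at 200K input tokens each and the meter is already past $6 — almost all of it input, which is exactly the pattern that watching a live counter trains you out of.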
3. Use the CLAUDE_CODE_AUTO_COMPACT_WINDOW Env Var
Most people don't know this exists. By default, Claude Code compacts context at around 180K tokens. With 1M available, you might want to adjust this:
export CLAUDE_CODE_AUTO_COMPACT_WINDOW=500000
Or disable auto-compaction entirely if you're doing deep analysis:
export CLAUDE_CODE_AUTO_COMPACT=false
The key insight: compaction at the wrong time can actually waste more tokens by forcing the model to re-discover context it already had.
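A back-of-the-envelope sketch makes the timing tradeoff concrete. All numbers here are illustrative assumptions (rates, context sizes, turn counts), but the shape of the math is the point: if the summary drops material the session still needs, re-discovery can cost more than just keeping the context live.

```python
# Why mistimed compaction can lose money. Rates and sizes are
# illustrative assumptions, not measured values.

def input_cost(context_tokens, turns, rate=15.0):
    """Cost of resending `context_tokens` as input for `turns` turns."""
    return context_tokens / 1e6 * rate * turns

# Scenario A: keep 400K of context live for the next 5 turns.
keep = input_cost(400_000, 5)

# Scenario B: compact mid-task. The compaction pass reads the full
# 400K once, then the model re-reads 350K of still-needed files, so
# context is back to ~370K (20K summary + 350K re-read) for 5 turns.
compact = input_cost(400_000, 1) + input_cost(370_000, 5)

print(f"keep: ${keep:.2f}  compact mid-task: ${compact:.2f}")
```

With these numbers, compacting mid-task comes out *more* expensive than not compacting at all. Flip the assumption — compact right after a task finishes, when the dropped context won't be needed — and the comparison inverts. Timing, not compaction itself, is what matters.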
4. Structure Your Prompts for Context Efficiency
Instead of "look at everything and fix the bug," try:
Focus on src/auth/ directory only. The login flow is returning
a 403 when the user has a valid session token. Check the
middleware chain and identify where the token validation
is failing.
Scoped prompts + large context = the model has everything available but knows exactly where to look.
5. Batch Related Tasks Into Single Sessions
Context loading is the expensive part. If you need to work on three related features, do them in one session rather than three separate ones. The 1M window makes this practical now — you can keep the full project loaded and work through multiple tasks without reloading.
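The batching math is easiest to see if you assume prompt caching, where the first (uncached) read of the project context is the expensive part and cached re-reads are much cheaper. The rates and the cache discount below are illustrative assumptions, not Anthropic's actual pricing.

```python
# Why batching related tasks into one session helps, under a
# prompt-caching model. All rates are illustrative assumptions.
RATE = 15.0        # $ per million input tokens (assumed)
CACHED_RATE = 1.5  # $ per million cached input tokens (assumed ~10%)

def session_cost(context_tokens, turns):
    """First turn pays the full rate; later turns hit the cache."""
    mtok = context_tokens / 1e6
    first = mtok * RATE                       # initial uncached load
    rest = mtok * CACHED_RATE * (turns - 1)   # cached re-reads
    return first + rest

# Three related tasks, ~4 turns each, over a 500K-token project:
separate = 3 * session_cost(500_000, 4)  # context reloaded per session
batched = session_cost(500_000, 12)      # loaded once, worked through

print(f"three sessions: ${separate:.2f}  one batched: ${batched:.2f}")
```

The gap is entirely the two extra full-price context loads. The bigger the shared context, the more batching pays off.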
The Deeper Issue: Developer Focus
Here's something I noticed while optimizing my AI workflow: the same problem that causes token waste also causes human productivity waste.
When I'm jumping between Claude sessions, Slack, Twitter, and email, I'm doing the same thing as loading unnecessary context — burning resources on task-switching instead of actual work.
I started using Monk Mode alongside my coding sessions. It blocks the algorithmic feeds on social apps at the system level, so when I'm in a deep coding session with Claude, I'm not getting pulled into Twitter threads every 10 minutes.
The combination of tracking AI costs in real time (TokenBar) and eliminating feed-based distractions (Monk Mode) basically doubled my productive output. Not because either tool is magic, but because visibility + environment design beats willpower every time.
The Numbers
Since switching to this approach:
- Average session cost dropped 40% (from tracking and adjusting in real time)
- Deep work sessions went from ~90 min to 4+ hours (from blocking feed algorithms)
- Context reload frequency dropped 60% (from batching tasks into longer sessions)
TL;DR
1M context is a power tool. Like any power tool, the difference between productive use and expensive waste is awareness and discipline.
Track your tokens. Scope your prompts. And for the love of your own productivity, block the infinite scroll while you're coding.
What's your approach to managing AI costs? Drop your setup in the comments — always looking for new workflows.