DEV Community

Henry Godnick

Claude's 1M Context Window Is Live — Here's How to Actually Use It Without Burning Through Your Quota

Anthropic just dropped 1-million-token context windows for Claude Opus 4.6 and Sonnet 4.6 — generally available and included in Max plans with no extra cost multiplier.

This is huge. But if you're not careful, it's also an easy way to blow through your quota in a single afternoon.

Here's how I've been approaching large-context sessions without wasting tokens.

The Problem Nobody Talks About

When you go from 200K to 1M context, the natural instinct is to dump everything in. Your entire codebase. All the docs. Every file that might be relevant.

And it works — technically. Claude handles it well. But you're burning 5x the tokens on input for every single response, even when 80% of that context is irrelevant to the current question.
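The 5x figure is just the ratio of context sizes, but it compounds because every response re-reads the full context. A quick sketch of the arithmetic — the price here is a placeholder, not Anthropic's actual rate:

```python
PRICE_PER_MTOK = 15.00  # hypothetical dollars per million input tokens

def response_input_cost(context_tokens: int) -> float:
    """Input cost of one response that re-reads the whole context."""
    return context_tokens / 1_000_000 * PRICE_PER_MTOK

small = response_input_cost(200_000)    # regular 200K window
large = response_input_cost(1_000_000)  # full 1M window

print(f"200K context: ${small:.2f} per response")
print(f"1M context:   ${large:.2f} per response ({large / small:.0f}x)")
```

Multiply that per-response difference by a few dozen exchanges in a session and the gap gets painful fast.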

I tracked my Claude Code sessions for a month and found something wild: most of my expensive sessions weren't doing complex work. They were doing simple tasks with massively inflated context.

5 Rules I Follow Now

1. Not Every Task Needs the Big Window

The 1M window is incredible for:

  • Full codebase refactors
  • Cross-file dependency analysis
  • Understanding legacy systems end-to-end

It's overkill for:

  • Writing a single function
  • Fixing a bug in one file
  • Generating tests for a specific module

I default to regular context and only switch to claude-opus-4-6[1m] when I genuinely need the full picture.

2. Track Your Token Usage in Real Time

This was the game-changer. I started running TokenBar in my Mac menu bar — it shows live cost per session as I work. The behavioral shift was immediate.

Before: "I'll just load everything, it's fine."
After: "This session is at $2.40 and I've only asked three questions. Let me trim the context."

You can't optimize what you can't see. Whether you use TokenBar or build your own tracking, having a live cost counter completely changes how you prompt.
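If you want to roll your own counter, the core of it is tiny. This sketch assumes you can read input/output token counts per exchange (e.g. from an API usage field); the class name and prices are mine, not from any tool:

```python
class SessionCostTracker:
    """Running dollar total for one session, given per-exchange token counts."""

    def __init__(self, input_per_mtok: float, output_per_mtok: float):
        self.input_per_mtok = input_per_mtok    # hypothetical $/M input tokens
        self.output_per_mtok = output_per_mtok  # hypothetical $/M output tokens
        self.total = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one exchange and return the running session total."""
        self.total += input_tokens / 1_000_000 * self.input_per_mtok
        self.total += output_tokens / 1_000_000 * self.output_per_mtok
        return self.total

tracker = SessionCostTracker(input_per_mtok=15.0, output_per_mtok=75.0)
tracker.record(400_000, 2_000)
print(f"session so far: ${tracker.total:.2f}")
```

Pipe that total into your status bar or prompt and you get the same "wait, why is this at $2.40 already?" reflex.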

3. Use the CLAUDE_CODE_AUTO_COMPACT_WINDOW Env Var

Most people don't know this exists. By default, Claude Code compacts context at around 180K tokens. With 1M available, you might want to adjust this:

export CLAUDE_CODE_AUTO_COMPACT_WINDOW=500000

Or disable auto-compaction entirely if you're doing deep analysis:

export CLAUDE_CODE_AUTO_COMPACT=false

The key insight: compaction at the wrong time can actually waste more tokens by forcing the model to re-discover context it already had.
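A toy model of that trade-off: compacting saves carrying tokens on every later turn, but if it drops detail the model needs, it has to re-read files to recover it. The numbers below are made up to show the shape of the break-even, nothing more:

```python
def net_savings(tokens_compacted_away: int, remaining_turns: int,
                reread_tokens: int) -> int:
    """Tokens saved by compacting now, minus tokens spent re-discovering context."""
    saved = tokens_compacted_away * remaining_turns  # not re-sent on each later turn
    lost = reread_tokens                             # spent re-reading dropped files
    return saved - lost

# Compacting near the end of a session can be a net loss:
print(net_savings(50_000, remaining_turns=1, reread_tokens=120_000))
# Early in a long session, the same compaction pays off:
print(net_savings(50_000, remaining_turns=10, reread_tokens=120_000))
```

Which is why a fixed threshold is a blunt instrument — raise it when the session is doing sustained deep analysis.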

4. Structure Your Prompts for Context Efficiency

Instead of "look at everything and fix the bug," try:

Focus on src/auth/ directory only. The login flow is returning 
a 403 when the user has a valid session token. Check the 
middleware chain and identify where the token validation 
is failing.

Scoped prompts + large context = the model has everything available but knows exactly where to look.
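If you write scoped prompts often, it helps to force yourself through a scope/symptom/task template. This little helper is my own convention — the field names aren't a Claude Code feature:

```python
def scoped_prompt(scope: str, symptom: str, task: str) -> str:
    """Assemble a prompt that names its scope before the problem and task."""
    return f"Focus on {scope} only. {symptom} {task}"

print(scoped_prompt(
    scope="the src/auth/ directory",
    symptom="The login flow returns a 403 even with a valid session token.",
    task="Check the middleware chain and identify where token validation fails.",
))
```

The template matters more than the code: if you can't fill in the scope field, that's a sign you're about to pay for a whole-project scan.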

5. Batch Related Tasks Into Single Sessions

Context loading is the expensive part. If you need to work on three related features, do them in one session rather than three separate ones. The 1M window makes this practical now — you can keep the full project loaded and work through multiple tasks without reloading.
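Back-of-envelope math for why batching wins — token counts and the price are placeholders, and this ignores prompt caching, which would shift the exact numbers but not the direction:

```python
def session_cost(context_tokens: int, tasks: int, tokens_per_task: int,
                 price_per_mtok: float = 15.0, batched: bool = False) -> float:
    """Input cost when the big context is loaded once (batched) vs. per task."""
    loads = 1 if batched else tasks
    total_input = loads * context_tokens + tasks * tokens_per_task
    return total_input / 1_000_000 * price_per_mtok

separate = session_cost(800_000, tasks=3, tokens_per_task=20_000)
batched = session_cost(800_000, tasks=3, tokens_per_task=20_000, batched=True)
print(f"three separate sessions: ${separate:.2f}")
print(f"one batched session:     ${batched:.2f}")
```

The context load dominates, so amortizing it over three tasks instead of paying it three times is where the savings come from.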

The Deeper Issue: Developer Focus

Here's something I noticed while optimizing my AI workflow: the same problem that causes token waste also causes human productivity waste.

When I'm jumping between Claude sessions, Slack, Twitter, and email, I'm doing the same thing as loading unnecessary context — burning resources on task-switching instead of actual work.

I started using Monk Mode alongside my coding sessions. It blocks the algorithmic feeds on social apps at the system level, so when I'm in a deep coding session with Claude, I'm not getting pulled into Twitter threads every 10 minutes.

The combination of tracking AI costs in real time (TokenBar) and eliminating feed-based distractions (Monk Mode) basically doubled my productive output. Not because either tool is magic, but because visibility + environment design beats willpower every time.

The Numbers

Since switching to this approach:

  • Average session cost dropped 40% (from tracking and adjusting in real time)
  • Deep work sessions went from ~90 min to 4+ hours (from blocking feed algorithms)
  • Context reload frequency dropped 60% (from batching tasks into longer sessions)

TL;DR

1M context is a power tool. Like any power tool, the difference between productive use and expensive waste is awareness and discipline.

Track your tokens. Scope your prompts. And for the love of your own productivity, block the infinite scroll while you're coding.


What's your approach to managing AI costs? Drop your setup in the comments — always looking for new workflows.

Top comments (1)

Apex Stack

Rule 4 is the one that changed everything for me. I manage a codebase with 100k+ generated pages across 12 languages — stock analysis, sector pages, ETF data — and the natural instinct was always to let Claude see the whole project structure. But scoping prompts to specific directories cut my session costs dramatically.

The batching point (Rule 5) is underrated too. I run scheduled agent tasks that touch multiple file types — data pipeline scripts, Astro templates, content generation modules — and doing those in a single session instead of three separate ones saves a ton on context reloading. The trick is structuring your CLAUDE.md with directory-scoped rules so the agent knows which conventions apply where without loading everything.

One thing I'd add: for programmatic SEO projects where you're generating thousands of similar pages, the compaction window setting is critical. I found that letting compaction happen too early during batch content operations meant Claude kept losing track of the template patterns mid-run. Setting a higher threshold for those specific workflows made a noticeable difference in output consistency.

What's your typical session length when working with the full 1M window? I've been finding 2-3 hour sessions hit a sweet spot before context quality starts degrading.