Every model brags about context windows now. 128K tokens. 200K tokens. "Paste your entire codebase!" the marketing says.
I tried it. I pasted 80K tokens of a Node.js project into Claude and asked it to find a bug. It found a bug — in a file I didn't care about, while ignoring the actual issue in the file I mentioned.
Here's what I learned about context windows the hard way.
## The Attention Problem
Large context windows don't mean the model pays equal attention to everything. The "lost in the middle" research (Liu et al., 2023) showed that LLMs disproportionately attend to the beginning and end of the context, with markedly reduced attention in the middle.
In practice, this means:
- File 1 of 50: high attention ✓
- Files 2-49: declining attention ✗
- File 50: high attention ✓
- Your actual question at the end: high attention ✓
So if your bug is in file 27, the model literally pays less attention to it — even though it's "in context."
## The Cost Problem
At $2.50 per million input tokens, 128K input tokens on GPT-4o cost about $0.32. That sounds cheap until you're iterating: every follow-up question resends the full context, so 10 follow-ups with the same 128K context cost $3.20 for one debugging session. Do this daily and you're burning $60+/month on context you mostly don't need.
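The arithmetic above is easy to sketch. This assumes GPT-4o input pricing of $2.50 per 1M tokens (check current rates) and ignores output tokens and prompt caching:

```typescript
// Rough cost of repeatedly resending a large context.
// Assumes GPT-4o input pricing of $2.50 per 1M input tokens (check current rates);
// output tokens and prompt caching are ignored for simplicity.
const PRICE_PER_TOKEN = 2.5 / 1_000_000;

function contextCost(tokens: number, turns: number): number {
  // Each follow-up resends the full context, so cost scales linearly with turns.
  return tokens * PRICE_PER_TOKEN * turns;
}

console.log(contextCost(128_000, 1).toFixed(2));  // "0.32"
console.log(contextCost(128_000, 10).toFixed(2)); // "3.20"
console.log(contextCost(2_000, 10).toFixed(2));   // "0.05"
```

The last line is the whole argument: a 2K context makes the same 10-turn session cost five cents instead of three dollars.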
## The Noise Problem
More context means more potential for the model to get confused. Paste your entire codebase and ask "how does auth work?" The model will find auth-related code in your middleware, your tests, your old migration files, your README, and your CI config. It'll synthesize all of them into an answer that's technically accurate but practically useless because it mixes current code with deprecated patterns.
## What Actually Works: The Context Budget
I now treat context like a budget. Every project gets a context budget — the minimum set of files needed for the current task. Nothing more.
### Rule 1: 3-File Maximum for Bug Fixes
For any bug fix, I include at most 3 files:
- The file with the bug
- The file that calls it
- The test file (if one exists)
That's usually 1-3K tokens. Fast, cheap, and the model's full attention is on the relevant code.
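The 3-file rule can be enforced mechanically. Here's a minimal sketch; the `SourceFile` shape, function name, and prompt layout are all my own invention, not an established API:

```typescript
// Hypothetical prompt builder that enforces Rule 1: at most 3 files
// (the buggy file, its caller, and its test).
interface SourceFile {
  path: string;
  content: string;
}

function buildBugFixPrompt(task: string, files: SourceFile[]): string {
  if (files.length > 3) {
    throw new Error(`Context budget exceeded: ${files.length} files (max 3)`);
  }
  // Label each file so the model can tell where one ends and the next begins.
  const sections = files.map((f) => `--- ${f.path} ---\n${f.content}`);
  return [task, ...sections].join("\n\n");
}
```

The hard cap is the point: making the limit a thrown error, rather than a guideline, forces you to decide which files actually matter before you hit send.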
### Rule 2: Tree Before Files
Before pasting any code, I give the model the project structure:
Here's the project tree (files only, no contents):

```
src/
  auth/
    middleware.ts
    oauth.ts
    session.ts
  api/
    users.ts
    posts.ts
  db/
    schema.prisma
    migrations/
```
Then I ask: "Which files do you need to see to [task]?" The model picks 2-3 files. I paste those. This is dramatically more effective than dumping everything.
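Generating that tree is a few lines of Node. A minimal sketch, filtering out `node_modules` and dotfiles (adjust the skip list for your project):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Minimal project-tree printer: directory and file names only, no contents.
// Skips node_modules and dotfiles; extend the filter for build dirs, etc.
function projectTree(dir: string, indent = ""): string[] {
  const lines: string[] = [];
  for (const name of fs.readdirSync(dir).sort()) {
    if (name === "node_modules" || name.startsWith(".")) continue;
    const full = path.join(dir, name);
    if (fs.statSync(full).isDirectory()) {
      lines.push(`${indent}${name}/`, ...projectTree(full, indent + "  "));
    } else {
      lines.push(`${indent}${name}`);
    }
  }
  return lines;
}

// Usage: console.log(projectTree("src").join("\n")), then paste the output.
```

`tree -I node_modules --dirsfirst` does the same thing from the shell if you have it installed.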
### Rule 3: Summaries Over Source for Large Codebases
For architectural questions, I write a 200-word summary instead of pasting 50 files:
```
## Architecture Summary
- Express API with Prisma ORM, PostgreSQL
- Auth: Google OAuth2, sessions stored in Redis
- 3 main domains: users, posts, comments
- Each domain has: router, service, repository layers
- Tests: Vitest for unit, Playwright for e2e
- Current issue: session expiry not triggering token refresh
```
This gives the model the map without the territory. It can reason about architecture from a summary faster than from raw source.
### Rule 4: Rolling Context for Multi-Step Work
For long sessions, I maintain a "working set" — a running summary of what we've done and decided:
```
## Working Set (updated after each step)
- Fixed: session middleware now checks token expiry
- Changed: added `refreshToken()` call in auth middleware
- Pending: update tests for new refresh flow
- Decision: using sliding window expiry (not fixed)
```
This replaces the growing conversation history. Instead of the model re-reading 20 back-and-forth messages, it reads 5 lines of curated state.
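If you want to keep the working set in code rather than a scratch file, a tiny helper is enough. This class and its category names are illustrative, not a library:

```typescript
// Hypothetical rolling working set: a few curated lines that replace
// the raw conversation history in each new prompt.
type Status = "Fixed" | "Changed" | "Pending" | "Decision";

class WorkingSet {
  private entries: Array<{ status: Status; note: string }> = [];

  add(status: Status, note: string): void {
    this.entries.push({ status, note });
  }

  // Render the summary that gets prepended to the next prompt
  // instead of 20 back-and-forth messages.
  render(): string {
    const lines = this.entries.map((e) => `- ${e.status}: ${e.note}`);
    return ["## Working Set (updated after each step)", ...lines].join("\n");
  }
}

const ws = new WorkingSet();
ws.add("Fixed", "session middleware now checks token expiry");
ws.add("Pending", "update tests for new refresh flow");
```

Each new prompt then starts with `ws.render()` plus the current file, instead of the whole transcript.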
## The Number That Matters
In my experience, the sweet spot is 2,000-8,000 tokens of input context for most coding tasks. That's:
- 3-5 source files, or
- 1 file + project summary + task description, or
- A working set summary + the current file
Above 8K, I start seeing diminishing returns. Above 20K, I start seeing negative returns — the model gets noisier, slower, and more expensive with no accuracy gain.
The 128K window isn't there for you to use all of it. It's there so you can include a large file when you need to. Think of it as cargo capacity, not a target.
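A cheap guardrail: estimate tokens before sending and warn above the 8K ceiling. The 4-characters-per-token ratio is a common rough heuristic for English text and code, not a real tokenizer; use a library like tiktoken for exact counts:

```typescript
// Rough token estimate (~4 chars/token is a common heuristic for English
// and code; it is NOT a real tokenizer, so treat results as approximate).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Budget check against the article's 8K sweet-spot ceiling.
function checkBudget(text: string, maxTokens = 8_000): string {
  const tokens = estimateTokens(text);
  if (tokens <= maxTokens) return `ok: ~${tokens} tokens`;
  return `over budget: ~${tokens} tokens (limit ${maxTokens}); trim the context`;
}
```

Wiring this into whatever assembles your prompt turns "am I pasting too much?" from a gut feeling into a one-line check.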
## Quick Reference
| Task | Context Budget |
|---|---|
| Bug fix | ~2K (3 files max) |
| New feature | ~5K (spec + related files) |
| Architecture question | ~1K (summary only) |
| Code review | ~4K (diff + touched files) |
| Refactoring | ~8K (module + tests + conventions) |
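The table translates directly into a lookup you can pair with a token estimator. The budgets are this article's heuristics, and the key names are my own:

```typescript
// The quick-reference table as a lookup. Budgets are the article's
// heuristics (in tokens); key names are illustrative.
const CONTEXT_BUDGETS: Record<string, number> = {
  bugFix: 2_000,
  newFeature: 5_000,
  architectureQuestion: 1_000,
  codeReview: 4_000,
  refactoring: 8_000,
};

function withinBudget(task: string, tokens: number): boolean {
  // Unknown task names get a zero budget, forcing an explicit choice.
  return tokens <= (CONTEXT_BUDGETS[task] ?? 0);
}
```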
How do you manage context window usage? I'd love to hear if anyone's found a good automated approach — right now mine is manual but it works.