You've hit the wall. Your AI assistant knows everything about your project — and now it's slow, expensive, and occasionally confused. The context window is stuffed, and you're not sure what's helping and what's noise.
The Context Budget pattern fixes this. It's a simple discipline: decide in advance how much context each prompt gets, what goes in, and what stays out.
## Why context budgets matter
LLMs don't have infinite attention. Even models with 128K+ token windows degrade when overloaded:
- **Relevance drops.** Long contexts dilute attention: bury your question in 50 pages of code and the model will overlook or hallucinate details.
- **Latency spikes.** More tokens means slower responses, which means a worse developer experience.
- **Cost climbs.** Token pricing is linear: double the context, double the bill.
A context budget forces you to curate instead of dump.
## The pattern in practice
Before you send a prompt, answer three questions:
### 1. What's the task scope?
Write it in one sentence. "Fix the failing auth test" is a scope. "Help me with my project" is not.
### 2. What context does the model actually need?
For a failing test, the model needs:
- The test file (or the failing test function)
- The function under test
- The error message
It does not need your entire test suite, your README, or your CI config.
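The "only what's needed" rule can be made mechanical. Here's a minimal sketch that assembles a prompt from exactly the three pieces listed above — `buildContext` and its field names are my own illustration, not a real API:

```javascript
// Hypothetical helper: assemble a minimal context bundle for a
// failing-test fix from exactly the three pieces the task needs.
function buildContext({ testSnippet, functionUnderTest, errorMessage }) {
  return [
    "Failing test:",
    testSnippet,
    "Function under test:",
    functionUnderTest,
    "Error output:",
    errorMessage,
  ].join("\n\n");
}
```

Anything that doesn't fit one of those three slots stays out by construction.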
### 3. What's my token target?
Set a rough cap. For most coding tasks:
| Task type | Context budget |
|---|---|
| Single function fix | 1,000–2,000 tokens |
| Feature implementation | 3,000–5,000 tokens |
| Architecture review | 5,000–10,000 tokens |
| Full codebase Q&A | 10,000–20,000 tokens |
These aren't hard limits — they're guardrails that make you think before pasting.
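A budget is only useful if you can estimate against it. A common rough heuristic is ~4 characters per token for English prose and code; real tokenizers vary by model, so treat this as a guardrail, not a meter. Both function names below are my own, not a real library:

```javascript
// Rough estimate: ~4 characters per token is a common heuristic for
// English prose and code. Real tokenizers differ by model.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Sum a set of context snippets and compare against a budget.
function withinBudget(snippets, budget) {
  const total = snippets.reduce((sum, s) => sum + estimateTokens(s), 0);
  return { total, ok: total <= budget };
}
```

For exact counts, use your provider's tokenizer; the heuristic is just enough to catch a 12,000-token paste heading for a 2,000-token budget.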
## A concrete example
**Bad prompt** (12,000 tokens of context):

```text
Here's my entire Express app. The login endpoint returns 401
when it shouldn't. Fix it.

[... 400 lines of code across 8 files ...]
```
**Budgeted prompt** (1,800 tokens):

```text
The POST /login endpoint returns 401 for valid credentials.

Here's the route handler:
[login.js — 40 lines]

Here's the auth middleware it calls:
[auth.js — 30 lines]

Here's the failing test with error output:
[login.test.js — 25 lines, plus the error message]

Find the bug.
```
The second prompt is faster, cheaper, and more likely to get the right answer — because the model can actually focus.
## Making it a habit
Add a context checklist to your workflow:
```markdown
## Context Budget Checklist

- [ ] Task scope: one sentence
- [ ] Included files: only what's needed
- [ ] Excluded files: listed (so I don't add them later)
- [ ] Estimated tokens: under my budget
- [ ] Test: could someone with ONLY this context solve the task?
```
That last question is the key test. If a competent developer couldn't solve it with just the context you're providing, add more. If they could solve it with half, cut.
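The checklist can also double as a pre-flight function. A sketch, reusing the rough 4-characters-per-token heuristic — every name here is hypothetical:

```javascript
// Hypothetical pre-flight check mirroring the checklist.
// Returns a list of problems; an empty list means "go ahead".
function preflightCheck({ scope, includedFiles, contextText, budget }) {
  const issues = [];
  if (!scope || scope.trim().length === 0) {
    issues.push("no one-sentence task scope");
  }
  if (!includedFiles || includedFiles.length === 0) {
    issues.push("no context files listed");
  }
  const estimatedTokens = Math.ceil(contextText.length / 4); // rough heuristic
  if (estimatedTokens > budget) {
    issues.push(`estimated ${estimatedTokens} tokens exceeds budget of ${budget}`);
  }
  return issues;
}
```

The "could someone solve it with ONLY this context" test still needs a human; the function only catches the mechanical failures.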
## When to break the budget
Sometimes you genuinely need a big context window:
- Architecture decisions that span multiple services
- Migration planning where you need before-and-after
- Code review of a large PR
Even then, summarize first. Send a high-level overview in the first message, then drill into specific files as follow-ups. Treat context like memory allocation — request what you need, free what you don't.
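The summarize-first approach can be planned up front as an ordered message list: one overview message, then one drill-down per file. A sketch — the function and message formats are my own illustration:

```javascript
// Plan a summarize-first conversation: the overview goes out first,
// then each file gets its own follow-up message.
function planStagedReview(overview, files) {
  const messages = [`High-level overview:\n\n${overview}`];
  for (const [name, contents] of Object.entries(files)) {
    messages.push(`Drill-down on ${name}:\n\n${contents}`);
  }
  return messages;
}
```

Each follow-up arrives with the conversation history behind it, so the model keeps the overview in scope while you pay for one file at a time.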
## The payoff
Teams that adopt context budgets consistently report:
- 30–50% faster responses (less input to process)
- Lower API costs (fewer tokens per request)
- Better accuracy (the model focuses on what matters)
- More predictable outputs (less noise = less hallucination)
The context window is a resource. Manage it like one.
The Context Budget pattern is part of a series on practical prompt engineering for developers. No frameworks required — just discipline and a checklist.