DEV Community

Nova Elvaris
The Context Budget Pattern: Keep LLMs Fast Without Losing the Plot

You've hit the wall. Your AI assistant knows everything about your project — and now it's slow, expensive, and occasionally confused. The context window is stuffed, and you're not sure what's helping and what's noise.

The Context Budget pattern fixes this. It's a simple discipline: decide in advance how much context each prompt gets, what goes in, and what stays out.

Why context budgets matter

LLMs don't have infinite attention. Even models with 128K+ token windows degrade when overloaded:

  • Relevance drops. The model has to spread its attention across everything you send. Bury your question in 50 pages of code and it will gloss over or hallucinate the details that matter.
  • Latency spikes. More tokens = slower responses = worse developer experience.
  • Cost climbs. Token pricing is linear. Double the context, double the bill.

A context budget forces you to curate instead of dump.
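The linearity is easy to see with a back-of-the-envelope cost function. A minimal sketch; the per-token rate is a placeholder, not any vendor's actual price:

```javascript
// Back-of-the-envelope input cost. The rate below is a PLACEHOLDER,
// not a real vendor price; check your provider's pricing page.
const DOLLARS_PER_1K_INPUT_TOKENS = 0.01;

function inputCost(tokens) {
  // Cost scales linearly with input tokens: double the context, double the bill.
  return (tokens / 1000) * DOLLARS_PER_1K_INPUT_TOKENS;
}
```

Because the function is linear, a 12,000-token dump costs ten times what a 1,200-token curated prompt does, on every single request.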

The pattern in practice

Before you send a prompt, answer three questions:

1. What's the task scope?

Write it in one sentence. "Fix the failing auth test" is a scope. "Help me with my project" is not.

2. What context does the model actually need?

For a failing test, the model needs:

  • The test file (or the failing test function)
  • The function under test
  • The error message

It does not need your entire test suite, your README, or your CI config.

3. What's my token target?

Set a rough cap. For most coding tasks:

| Task type | Context budget |
| --- | --- |
| Single function fix | 1,000–2,000 tokens |
| Feature implementation | 3,000–5,000 tokens |
| Architecture review | 5,000–10,000 tokens |
| Full codebase Q&A | 10,000–20,000 tokens |

These aren't hard limits — they're guardrails that make you think before pasting.
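A quick way to hold yourself to these caps is the rough "one token is about 4 characters of English" heuristic. A minimal sketch, with a budget table mirroring the one above; exact counts require the model's own tokenizer:

```javascript
// Rough token estimate: ~4 characters per token for English text.
// This is a heuristic, not a tokenizer. Use your model's real tokenizer
// (e.g. tiktoken for OpenAI models) when you need exact counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Budget caps mirroring the table above (guardrails, not hard limits).
const BUDGETS = {
  singleFunctionFix: 2000,
  featureImplementation: 5000,
  architectureReview: 10000,
  codebaseQA: 20000,
};

function checkBudget(promptText, taskType) {
  const used = estimateTokens(promptText);
  const cap = BUDGETS[taskType];
  return { used, cap, withinBudget: used <= cap };
}
```

Run `checkBudget` on your assembled prompt before sending; if it comes back over, trim before you paste.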

A concrete example

Bad prompt (12,000 tokens of context):

```
Here's my entire Express app. The login endpoint returns 401
when it shouldn't. Fix it.

[... 400 lines of code across 8 files ...]
```

Budgeted prompt (1,800 tokens):

```
The POST /login endpoint returns 401 for valid credentials.

Here's the route handler:
[login.js — 40 lines]

Here's the auth middleware it calls:
[auth.js — 30 lines]

Here's the failing test with error output:
[login.test.js — 25 lines, plus the error message]

Find the bug.
```

The second prompt is faster, cheaper, and more likely to get the right answer — because the model can actually focus.

Making it a habit

Add a context checklist to your workflow:

```markdown
## Context Budget Checklist
- [ ] Task scope: one sentence
- [ ] Included files: only what's needed
- [ ] Excluded files: listed (so I don't add them later)
- [ ] Estimated tokens: under my budget
- [ ] Test: could someone with ONLY this context solve the task?
```

That last question is the key test. If a competent developer couldn't solve it with just the context you're providing, add more. If they could solve it with half, cut.
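The checklist can even be enforced mechanically. A sketch (the helper's name and shape are my own, not an established API) that builds a prompt from an explicit allow-list of files and refuses to proceed when the estimate blows the budget:

```javascript
// Hypothetical helper: assemble a prompt from an explicit allow-list of
// { path, content } entries. Anything not listed never reaches the model.
function buildBudgetedPrompt({ scope, include, budget }) {
  // Rough heuristic: ~4 characters per token; use a real tokenizer for exact counts.
  const estimateTokens = (text) => Math.ceil(text.length / 4);

  const sections = include.map((f) => `Here's ${f.path}:\n${f.content}`);
  const prompt = [scope, ...sections].join("\n\n");

  const tokens = estimateTokens(prompt);
  if (tokens > budget) {
    throw new Error(
      `Prompt is ~${tokens} tokens, over the ${budget}-token budget; trim the include list.`
    );
  }
  return prompt;
}
```

The `include` list is the checklist's "excluded files" line made executable: omission is the default, and every addition is a deliberate choice.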

When to break the budget

Sometimes you genuinely need a big context window:

  • Architecture decisions that span multiple services
  • Migration planning where you need before-and-after
  • Code review of a large PR

Even then, summarize first. Send a high-level overview in the first message, then drill into specific files as follow-ups. Treat context like memory allocation — request what you need, free what you don't.
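The memory-allocation analogy can be pushed one step further. A sketch of the idea, not a prescribed API: keep attachments in a bounded pool and "free" the oldest ones when a new addition would overflow the budget.

```javascript
// Sketch of context-as-memory-allocation: a bounded pool of attachments.
// The oldest entries are evicted when a new one would exceed the budget.
class ContextPool {
  constructor(budgetTokens) {
    this.budget = budgetTokens;
    this.entries = []; // { name, content, tokens }
  }

  // Rough heuristic (~4 chars/token); swap in a real tokenizer for exact counts.
  static estimateTokens(text) {
    return Math.ceil(text.length / 4);
  }

  used() {
    return this.entries.reduce((sum, e) => sum + e.tokens, 0);
  }

  add(name, content) {
    const tokens = ContextPool.estimateTokens(content);
    // Evict oldest entries until the new one fits. (A single oversized
    // entry will still be admitted once the pool is empty; a real
    // implementation would summarize it instead.)
    while (this.entries.length && this.used() + tokens > this.budget) {
      this.entries.shift();
    }
    this.entries.push({ name, content, tokens });
  }

  render() {
    return this.entries.map((e) => `Here's ${e.name}:\n${e.content}`).join("\n\n");
  }
}
```

In a drill-down conversation, each follow-up file goes through `add`, so stale context drops out automatically instead of accumulating forever.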

The payoff

Teams that adopt context budgets consistently report:

  • 30–50% faster responses (less input to process)
  • Lower API costs (fewer tokens per request)
  • Better accuracy (the model focuses on what matters)
  • More predictable outputs (less noise = less hallucination)

The context window is a resource. Manage it like one.


The Context Budget pattern is part of a series on practical prompt engineering for developers. No frameworks required — just discipline and a checklist.
