Part 2 of Practical AI Engineering: Beyond the Demo
Bigger context windows create a tempting idea:
Put everything in. Let the model work it out.
That is not context engineering. It is moving the junk drawer closer to the model.
An AI system can receive a huge amount of text and still miss the one fact that matters. The problem is not only how much context fits. It is whether the right information reaches the model at the right moment.
Context is an attention budget
A context window is the information available to the model for the current call. It may contain:
- system instructions,
- the user’s task,
- examples,
- retrieved documents,
- tool descriptions,
- tool results,
- conversation history,
- project notes,
- intermediate plans.
All of these compete for attention.
More context can help when it adds missing evidence. It can hurt when it adds stale, repeated, conflicting, or irrelevant material.
The useful question is not:
How much context can this model hold?
It is:
What is the smallest high-signal context that makes the next decision easier?
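One way to make that question concrete is to treat context as a budget and spend it on the highest-signal items first. The sketch below is a minimal illustration, not a recommended implementation: `ContextItem`, `packContext`, the `relevance` score, and the rough four-characters-per-token estimate are all assumptions of mine.

```typescript
// Hypothetical shape for one candidate piece of context.
type ContextItem = {
  id: string;
  text: string;
  relevance: number; // 0..1, assumed to come from your own scoring step
};

// Rough estimate (about 4 characters per token). An assumption, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily admit the most relevant items until the budget is spent.
function packContext(items: ContextItem[], tokenBudget: number): ContextItem[] {
  const ranked = [...items].sort((a, b) => b.relevance - a.relevance);
  const selected: ContextItem[] = [];
  let used = 0;
  for (const item of ranked) {
    const cost = estimateTokens(item.text);
    if (used + cost > tokenBudget) continue; // skip anything that does not fit
    selected.push(item);
    used += cost;
  }
  return selected;
}
```

Greedy packing is the simplest possible policy. It says nothing about how relevance is scored, which is where most of the real work sits.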
The seven layers of useful context
I use this mental model when deciding where information belongs.
1. System instructions: stable behaviour
Put durable rules here:
- safety boundaries,
- output conventions,
- non-negotiable policies,
- the broad responsibility of the assistant.
Do not turn the system prompt into a company wiki. Stable behaviour and changing knowledge have different lifecycles.
2. Task contract: the current job
The task should contain:
- the goal,
- relevant local context,
- constraints,
- expected output,
- acceptance checks.
This layer should answer: What are we trying to accomplish now?
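A task contract can be as small as a typed object that is rendered into the prompt. This is a sketch under my own naming: `TaskContract` and `renderTaskContract` are hypothetical, and the section labels are just one reasonable choice.

```typescript
// Hypothetical task contract mirroring the five fields listed above.
type TaskContract = {
  goal: string;
  localContext: string[];
  constraints: string[];
  expectedOutput: string;
  acceptanceChecks: string[];
};

// Render the contract as labeled plain-text sections for the prompt.
function renderTaskContract(task: TaskContract): string {
  const bullets = (items: string[]) => items.map((item) => `- ${item}`).join("\n");
  return [
    `Goal:\n${task.goal}`,
    `Relevant context:\n${bullets(task.localContext)}`,
    `Constraints:\n${bullets(task.constraints)}`,
    `Expected output:\n${task.expectedOutput}`,
    `Acceptance checks:\n${bullets(task.acceptanceChecks)}`,
  ].join("\n\n");
}
```

Keeping the contract as data rather than free text makes a missing field visible: an empty `acceptanceChecks` array is easy to spot before the call is made.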
3. Examples: the quality bar
Examples are useful when the desired behaviour is easier to show than describe.
One good code-review finding can teach evidence, severity, and scope better than several vague adjectives.
Examples should be representative. A misleading example can anchor the model in the wrong direction.
4. Retrieval: changing evidence
Use retrieval when the answer depends on information that is:
- too large to keep in every prompt,
- frequently updated,
- specific to the current question,
- expected to be cited or verified.
Retrieval is not memory. It is a search step that selects evidence for this task.
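To show retrieval as a selection step, here is a deliberately toy version that ranks documents by how many query terms they share. Real systems usually use embeddings or a search index instead; `Doc`, `tokenize`, and `retrieveTopK` are hypothetical names for illustration only.

```typescript
type Doc = { id: string; text: string };

// Lowercase, split on non-word characters, and de-duplicate.
function tokenize(text: string): string[] {
  const words = text.toLowerCase().split(/\W+/).filter((word) => word.length > 0);
  return Array.from(new Set(words));
}

// Score each document by term overlap with the query, then keep the top k matches.
function retrieveTopK(query: string, docs: Doc[], k: number): Doc[] {
  const queryTerms = tokenize(query);
  return docs
    .map((doc) => {
      const docTerms = new Set(tokenize(doc.text));
      const overlap = queryTerms.filter((term) => docTerms.has(term)).length;
      return { doc, overlap };
    })
    .filter((scored) => scored.overlap > 0) // no shared terms means no evidence
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, k)
    .map((scored) => scored.doc);
}
```

Note that nothing is stored between calls. Each question triggers a fresh search, which is the difference from memory.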
5. Working memory: state that must survive
Long-running work creates facts that are not part of the original documents:
- decisions already made,
- files changed,
- tests that failed,
- unresolved questions,
- the next planned step.
Store these outside the context window, then bring back the relevant parts.
A simple NOTES.md can be more useful than replaying an entire conversation.
6. Tools: possible actions
A tool description is also context. It tells the model:
- what action exists,
- when to use it,
- what arguments are valid,
- what result to expect,
- what side effects it may cause.
Poor tool descriptions create poor tool use. Ten overlapping tools with vague names can be worse than three clear ones.
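The five points above can be turned into a small schema and a check that flags descriptions with blanks. This is a sketch with hypothetical names (`ToolDescription`, `findGaps`); it does not follow any particular provider's tool format.

```typescript
// Hypothetical tool description covering the five points listed above.
type ToolDescription = {
  name: string;
  purpose: string; // what action exists
  whenToUse: string; // when to use it
  parameters: Record<string, string>; // argument name -> what counts as a valid value
  returns: string; // what result to expect
  sideEffects: string; // what it may change; write "none" for read-only tools
};

// List the fields a description leaves blank.
function findGaps(tool: ToolDescription): string[] {
  const gaps: string[] = [];
  if (tool.purpose.trim() === "") gaps.push("purpose");
  if (tool.whenToUse.trim() === "") gaps.push("whenToUse");
  Object.keys(tool.parameters).forEach((key) => {
    const description = tool.parameters[key];
    if (!description || description.trim() === "") gaps.push(`parameters.${key}`);
  });
  if (tool.returns.trim() === "") gaps.push("returns");
  if (tool.sideEffects.trim() === "") gaps.push("sideEffects");
  return gaps;
}
```

A check like this only catches missing text. It cannot tell whether two tools overlap or whether a name is vague, so it complements review rather than replacing it.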
7. Recent history: local continuity
Recent messages can preserve the flow of a task. Old raw history often becomes noise.
Keep decisions and unresolved items. Compress repeated explanations, large tool outputs, and dead ends.
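As a partial illustration, the sketch below handles only one of those cases: large tool outputs in older messages. `Message`, `trimHistory`, and both thresholds are assumptions of mine, and summarising decisions or dead ends would need a separate step.

```typescript
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Keep the last `keepRecent` messages verbatim; cut older tool outputs down to a stub.
function trimHistory(history: Message[], keepRecent: number, maxToolChars: number): Message[] {
  const cutoff = Math.max(0, history.length - keepRecent);
  return history.map((message, index) => {
    const isOld = index < cutoff;
    if (isOld && message.role === "tool" && message.content.length > maxToolChars) {
      const omitted = message.content.length - maxToolChars;
      return {
        ...message,
        content: `${message.content.slice(0, maxToolChars)} [${omitted} characters omitted]`,
      };
    }
    return message; // recent messages and short ones pass through unchanged
  });
}
```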
A simple context compiler
Instead of building one permanent mega-prompt, assemble context for each step.
type AgentTask = {
  goal: string;
  projectId: string;
  query: string;
};

// The four helpers called below are placeholders for your own loaders;
// they are not defined in this snippet.
async function buildContext(task: AgentTask) {
  const policy = await loadStablePolicy(); // stable behaviour
  const projectState = await loadRelevantProjectNotes(task.projectId); // working memory
  const evidence = await retrieveEvidence(task.query); // retrieval
  const recentDecisions = await summarizeRecentDecisions(task.projectId); // compressed history

  return {
    policy,
    task,
    projectState,
    evidence,
    recentDecisions,
  };
}
The important idea is not this exact code. It is that context should be selected, not dumped.
Five common context failures
1. Context omission
The model never receives the fact needed to make the right choice.
Symptom: confident but generic answers.
Fix: trace the answer back to the evidence available at that step.
2. Context pollution
The prompt contains too much low-value material.
Symptom: the model follows an old detail and ignores the current goal.
Fix: remove repeated history, raw logs, and unrelated documents.
3. Context conflict
Two instructions or sources disagree.
Symptom: inconsistent behaviour across similar runs.
Fix: define precedence. Prefer current, authoritative sources and expose the conflict when it cannot be resolved.
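A precedence rule can be written down explicitly. In this sketch I assume authority outranks recency, which is one possible ordering rather than the only sensible one; `Source`, `Resolution`, and `resolveConflict` are hypothetical names.

```typescript
type Source = {
  claim: string;
  authority: number; // higher wins; an assumed ranking, e.g. 2 = official policy, 1 = wiki
  updatedAt: string; // ISO 8601 date (YYYY-MM-DD), so string comparison orders by time
};

type Resolution =
  | { status: "resolved"; winner: Source }
  | { status: "conflict"; candidates: Source[] };

// Prefer the more authoritative source, then the more recent one.
// If neither rule separates them, surface the conflict instead of guessing.
function resolveConflict(a: Source, b: Source): Resolution {
  if (a.claim === b.claim) return { status: "resolved", winner: a };
  if (a.authority !== b.authority) {
    return { status: "resolved", winner: a.authority > b.authority ? a : b };
  }
  if (a.updatedAt !== b.updatedAt) {
    return { status: "resolved", winner: a.updatedAt > b.updatedAt ? a : b };
  }
  return { status: "conflict", candidates: [a, b] };
}
```

The `conflict` branch matters as much as the winners. It gives the caller something to show the user or the model rather than a silently chosen answer.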
4. Context staleness
The retrieved or remembered information is no longer true.
Symptom: a well-written answer based on an old policy or obsolete API.
Fix: attach source dates, versions, and expiry rules. Retrieval needs freshness, not only similarity.
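A freshness check can run after similarity search and before anything reaches the prompt. The sketch below assumes each piece of evidence carries its own expiry rule; the `Evidence` shape and the `maxAgeDays` field are my own illustration.

```typescript
type Evidence = {
  text: string;
  version: string;
  sourceDate: Date;
  maxAgeDays: number; // expiry rule, assumed to be set per source type
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// An item is fresh if its age in days does not exceed its own expiry rule.
function isFresh(item: Evidence, now: Date): boolean {
  const ageDays = (now.getTime() - item.sourceDate.getTime()) / MS_PER_DAY;
  return ageDays <= item.maxAgeDays;
}

// Drop expired evidence before it is added to the context.
function dropStale(items: Evidence[], now: Date): Evidence[] {
  return items.filter((item) => isFresh(item, now));
}
```

Passing `now` in as a parameter keeps the check deterministic, which makes stale-data bugs reproducible in tests.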
5. Context leakage
Sensitive information reaches a model, tool, or sub-agent that does not need it.
Symptom: data is exposed to components that never needed it, so there is more surface to secure.
Fix: apply least privilege to context. Redact, scope, and route sensitive tasks deliberately.
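Least privilege for context can start as an allow-list per recipient. Everything here is hypothetical: the `CustomerRecord` fields, the recipient names, and the choice to give unknown recipients nothing.

```typescript
// Hypothetical record; the fields are for illustration only.
type CustomerRecord = {
  id: string;
  plan: string;
  email: string;
  cardLast4: string;
};

type Field = keyof CustomerRecord;

// Each recipient gets an explicit allow-list. Anything not listed is withheld.
const allowedFields = new Map<string, Field[]>([
  ["billingAgent", ["id", "plan", "cardLast4"]],
  ["summarizer", ["id", "plan"]],
]);

// Copy only the fields the recipient is allowed to see.
function scopeRecord(record: CustomerRecord, recipient: string): Partial<CustomerRecord> {
  const allowed = allowedFields.get(recipient) ?? []; // unknown recipients get nothing
  const scoped: Partial<CustomerRecord> = {};
  for (const field of allowed) {
    scoped[field] = record[field];
  }
  return scoped;
}
```

An allow-list fails closed. A new field added to the record stays hidden until someone decides a recipient needs it.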
Long tasks need context management, not just a larger window
For work that spans many steps, three patterns are especially useful.
Compaction
Summarise the important state and discard bulky history.
Keep:
- architectural decisions,
- current failures,
- requirements,
- next actions.
Drop:
- repeated tool output,
- abandoned paths,
- conversational filler.
Structured notes
Let the agent write durable state outside the context window.
For a coding task, that might be:
Goal:
Migrate the retry path without changing the public API.
Decisions:
- Reuse the original idempotency key.
- Keep the existing timeout value.
Open issues:
- Integration test fails only on Windows.
Next:
- Inspect path handling in test fixtures.
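Notes in that shape are easy to keep as data and update as the task moves. A minimal sketch, assuming the four sections shown above; `TaskNotes` and the helper names are hypothetical, and where the notes are persisted is left to the caller.

```typescript
type TaskNotes = {
  goal: string;
  decisions: string[];
  openIssues: string[];
  next: string[];
};

// Record a new decision without mutating the previous notes.
function addDecision(notes: TaskNotes, decision: string): TaskNotes {
  return { ...notes, decisions: [...notes.decisions, decision] };
}

// Remove an issue once it is resolved.
function resolveIssue(notes: TaskNotes, issue: string): TaskNotes {
  return { ...notes, openIssues: notes.openIssues.filter((item) => item !== issue) };
}

// Render the notes back into the plain-text layout used above.
function renderNotes(notes: TaskNotes): string {
  const bullets = (items: string[]) => items.map((item) => `- ${item}`);
  return [
    "Goal:",
    notes.goal,
    "Decisions:",
    ...bullets(notes.decisions),
    "Open issues:",
    ...bullets(notes.openIssues),
    "Next:",
    ...bullets(notes.next),
  ].join("\n");
}
```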
Context isolation
Use a focused sub-agent for a bounded investigation, then return a short result to the main agent.
The main agent does not need every search query and failed attempt. It needs the conclusion, evidence, and uncertainty.
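The handoff can be enforced by type: the sub-agent may record everything, but only three fields cross the boundary. `SubAgentReport`, `Handoff`, and the default evidence cap of three are assumptions for this sketch.

```typescript
// Everything the sub-agent produced while investigating (hypothetical shape).
type SubAgentReport = {
  conclusion: string;
  evidence: string[];
  uncertainty: string;
  searchQueries: string[];
  failedAttempts: string[];
};

// Only these three fields are returned to the main agent.
type Handoff = Pick<SubAgentReport, "conclusion" | "evidence" | "uncertainty">;

// Build the short result and leave the investigation trace behind.
function toHandoff(report: SubAgentReport, maxEvidence = 3): Handoff {
  return {
    conclusion: report.conclusion,
    evidence: report.evidence.slice(0, maxEvidence),
    uncertainty: report.uncertainty,
  };
}
```

The full report can still be logged for debugging. The point is that it does not enter the main agent's context by default.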
The four-C filter
Before adding any context, ask whether it is:
- Correct: Is it trustworthy?
- Current: Is it still valid?
- Compact: Can it be made smaller without losing meaning?
- Connected: Will it change the next decision?
If it fails the fourth test, it probably does not belong in the current context.
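The filter can be expressed as a gate that reports which tests a candidate failed. In this sketch the judgments themselves (`trusted`, `affectsNextDecision`) are assumed to be supplied by an upstream step; the code only combines them. All names and the character limit are hypothetical.

```typescript
type Candidate = {
  text: string;
  trusted: boolean; // Correct: assumed to be judged upstream
  validUntil: Date; // Current
  affectsNextDecision: boolean; // Connected: assumed to be judged upstream
};

type Verdict = { include: boolean; failed: string[] };

// Apply the four tests and report which ones the candidate failed.
function fourCFilter(candidate: Candidate, now: Date, maxChars: number): Verdict {
  const failed: string[] = [];
  if (!candidate.trusted) failed.push("correct");
  if (candidate.validUntil.getTime() < now.getTime()) failed.push("current");
  if (candidate.text.length > maxChars) failed.push("compact"); // crude proxy for compactness
  if (!candidate.affectsNextDecision) failed.push("connected");
  return { include: failed.length === 0, failed };
}
```

Returning the failed tests, not just a boolean, makes it possible to log why something was left out of the context.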
Context engineering in one sentence
Prompt engineering focuses on what we say to the model.
Context engineering focuses on what the system lets the model know at each step.
That includes retrieval, memory, tools, history, and the rules that decide what gets included.
A large context window is useful capacity. It is not an information architecture.
Where is your current AI workflow using “paste everything” as a substitute for context selection?