Balraj Singh

Posted on Jun 27

Your Context Window Is Not a Knowledge Base

#ai #rag #llm #architecture

Part 2 of Practical AI Engineering: Beyond the Demo

Bigger context windows create a tempting idea:

Put everything in. Let the model work it out.

That is not context engineering. It is moving the junk drawer closer to the model.

An AI system can receive a huge amount of text and still miss the one fact that matters. The problem is not only how much context fits. It is whether the right information reaches the model at the right moment.

Context is an attention budget

The context window holds all the information available to the model for the current call. It may contain:

system instructions,
the user’s task,
examples,
retrieved documents,
tool descriptions,
tool results,
conversation history,
project notes,
intermediate plans.

All of these compete for attention.

More context can help when it adds missing evidence. It can hurt when it adds stale, repeated, conflicting, or irrelevant material.

The useful question is not:

How much context can this model hold?

It is:

What is the smallest high-signal context that makes the next decision easier?
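One way to make that question concrete is to treat the window as a budget and admit only the highest-signal items that fit. The sketch below is illustrative only: `ContextItem`, `selectWithinBudget`, the relevance scores, and the rough four-characters-per-token estimate are all assumptions, not part of any library.

```typescript
type ContextItem = {
  id: string;
  text: string;
  relevance: number; // 0..1, assumed to come from some upstream scorer
};

// Crude size estimate (assumption): about four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the most relevant items that fit in the budget; drop low-signal items outright.
function selectWithinBudget(
  items: ContextItem[],
  maxTokens: number,
  minRelevance = 0.5
): ContextItem[] {
  const ranked = items
    .filter((item) => item.relevance >= minRelevance)
    .sort((a, b) => b.relevance - a.relevance);

  const selected: ContextItem[] = [];
  let used = 0;
  for (const item of ranked) {
    const cost = estimateTokens(item.text);
    if (used + cost > maxTokens) continue; // does not fit, try a smaller item
    selected.push(item);
    used += cost;
  }
  return selected;
}
```

A real system would use the model's own tokenizer and a better scorer, but the shape is the same: rank, then stop when the budget is spent.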

The seven layers of useful context

I use this mental model when deciding where information belongs.

1. System instructions: stable behaviour

Put durable rules here:

safety boundaries,
output conventions,
non-negotiable policies,
the broad responsibility of the assistant.

Do not turn the system prompt into a company wiki. Stable behaviour and changing knowledge have different lifecycles.

2. Task contract: the current job

The task should contain:

the goal,
relevant local context,
constraints,
expected output,
acceptance checks.

This layer should answer: What are we trying to accomplish now?
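As a minimal sketch, the five parts of the task contract can be captured in a type and rendered into one prompt section. The names `TaskContract` and `renderTaskContract` and the section labels are my own illustration, not a standard format.

```typescript
type TaskContract = {
  goal: string;
  localContext: string[];
  constraints: string[];
  expectedOutput: string;
  acceptanceChecks: string[];
};

// Render one "- " bullet per item.
function bullets(items: string[]): string {
  return items.map((item) => `- ${item}`).join("\n");
}

// Turn the contract into a plain-text section, separated by blank lines.
function renderTaskContract(contract: TaskContract): string {
  return [
    `Goal: ${contract.goal}`,
    `Relevant context:\n${bullets(contract.localContext)}`,
    `Constraints:\n${bullets(contract.constraints)}`,
    `Expected output: ${contract.expectedOutput}`,
    `Acceptance checks:\n${bullets(contract.acceptanceChecks)}`,
  ].join("\n\n");
}
```

Writing the contract as a type makes a missing constraint or acceptance check visible before the model ever sees the task.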

3. Examples: the quality bar

Examples are useful when the desired behaviour is easier to show than describe.

One good code-review finding can teach evidence, severity, and scope better than several vague adjectives.

Examples should be representative. A misleading example can anchor the model in the wrong direction.

4. Retrieval: changing evidence

Use retrieval when the answer depends on information that is:

too large to keep in every prompt,
frequently updated,
specific to the current question,
expected to be cited or verified.

Retrieval is not memory. It is a search step that selects evidence for this task.
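To show what "a search step that selects evidence" can look like, here is a toy retriever. It scores documents by keyword overlap, which is a deliberate simplification: production systems typically use embeddings or BM25. `SourceDoc`, `Evidence`, and `retrieveTopK` are hypothetical names.

```typescript
type SourceDoc = { id: string; text: string };
type Evidence = { id: string; text: string; score: number };

// Lowercase, split on non-alphanumerics, and drop duplicate terms.
function uniqueTerms(text: string): string[] {
  const terms: string[] = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return terms.filter((term, index) => terms.indexOf(term) === index);
}

// Return the k documents sharing the most terms with the query.
// The id travels with the text so the answer can cite its source.
function retrieveTopK(query: string, docs: SourceDoc[], k: number): Evidence[] {
  const queryTerms = uniqueTerms(query);
  return docs
    .map((doc) => {
      const docTerms = uniqueTerms(doc.text);
      const score = queryTerms.filter((term) => docTerms.indexOf(term) !== -1).length;
      return { id: doc.id, text: doc.text, score };
    })
    .filter((evidence) => evidence.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Nothing persists between calls: each question triggers a fresh search, which is the difference from memory.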

5. Working memory: state that must survive

Long-running work creates facts that are not part of the original documents:

decisions already made,
files changed,
tests that failed,
unresolved questions,
the next planned step.

Store these outside the context window, then bring back the relevant parts.

A simple NOTES.md can be more useful than replaying an entire conversation.
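A rough sketch of that store-then-recall pattern follows. The in-memory `WorkingMemory` class stands in for a NOTES.md file or a database, and the `MemoryKind` labels are assumptions that mirror the list above.

```typescript
type MemoryKind =
  | "decision"
  | "file_changed"
  | "test_failure"
  | "open_question"
  | "next_step";

type MemoryEntry = { kind: MemoryKind; text: string; step: number };

class WorkingMemory {
  private entries: MemoryEntry[] = [];

  // Write state outside the context window as the work progresses.
  record(kind: MemoryKind, text: string, step: number): void {
    this.entries.push({ kind, text, step });
  }

  // Bring back only the kinds the next step needs, newest first.
  recall(kinds: MemoryKind[], limit: number): MemoryEntry[] {
    return this.entries
      .filter((entry) => kinds.indexOf(entry.kind) !== -1)
      .sort((a, b) => b.step - a.step)
      .slice(0, limit);
  }
}
```

A debugging step might call `recall(["test_failure", "open_question"], 5)` and never see the list of changed files.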

6. Tools: possible actions

A tool description is also context. It tells the model:

what action exists,
when to use it,
what arguments are valid,
what result to expect,
what side effects it may cause.

Poor tool descriptions create poor tool use. Ten overlapping tools with vague names can be worse than three clear ones.
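The five things a tool description should tell the model map naturally onto a type, and a small lint pass can catch the vaguest specs before they ship. This is a sketch under my own assumptions: `ToolSpec`, `lintToolSpecs`, and the 20-character threshold are illustrative, not any framework's schema.

```typescript
type ToolParam = {
  type: "string" | "number" | "boolean";
  description: string;
  required: boolean;
};

type ToolSpec = {
  name: string;
  description: string; // what action exists
  whenToUse: string; // when to use it
  parameters: Record<string, ToolParam>; // what arguments are valid
  returns: string; // what result to expect
  sideEffects: string[]; // empty means read-only
};

// Flag duplicate names and specs too thin to guide the model.
function lintToolSpecs(tools: ToolSpec[]): string[] {
  const problems: string[] = [];
  const seen: string[] = [];
  for (const tool of tools) {
    if (seen.indexOf(tool.name) !== -1) {
      problems.push(`${tool.name}: duplicate name`);
    }
    seen.push(tool.name);
    // Arbitrary threshold (assumption): under 20 characters is rarely enough.
    if (tool.description.trim().length < 20) {
      problems.push(`${tool.name}: description is too short`);
    }
    if (tool.whenToUse.trim() === "") {
      problems.push(`${tool.name}: missing when-to-use guidance`);
    }
    if (tool.returns.trim() === "") {
      problems.push(`${tool.name}: missing result description`);
    }
  }
  return problems;
}
```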

7. Recent history: local continuity

Recent messages can preserve the flow of a task. Old raw history often becomes noise.

Keep decisions and unresolved items. Compress repeated explanations, large tool outputs, and dead ends.

A simple context compiler

Instead of building one permanent mega-prompt, assemble context for each step.

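type AgentTask = {
  goal: string;
  projectId: string;
  query: string;
};

// Hypothetical loaders, one per context layer. The return types are assumed.
declare function loadStablePolicy(): Promise<string>;
declare function loadRelevantProjectNotes(projectId: string): Promise<string>;
declare function retrieveEvidence(query: string): Promise<string[]>;
declare function summarizeRecentDecisions(projectId: string): Promise<string>;

// Assemble the context for one step from separately managed sources.
async function buildContext(task: AgentTask) {
  const policy = await loadStablePolicy(); // stable rules
  const projectState = await loadRelevantProjectNotes(task.projectId); // working memory
  const evidence = await retrieveEvidence(task.query); // changing evidence
  const recentDecisions = await summarizeRecentDecisions(task.projectId); // compressed history

  return {
    policy,
    task,
    projectState,
    evidence,
    recentDecisions,
  };
}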

The important idea is not this exact code. It is that context should be selected, not dumped.

Five common context failures

1. Context omission

The model never receives the fact needed to make the right choice.

Symptom: confident but generic answers.

Fix: trace the answer back to the evidence available at that step.

2. Context pollution

The prompt contains too much low-value material.

Symptom: the model follows an old detail and ignores the current goal.

Fix: remove repeated history, raw logs, and unrelated documents.

3. Context conflict

Two instructions or sources disagree.

Symptom: inconsistent behaviour across similar runs.

Fix: define precedence. Prefer current, authoritative sources and expose the conflict when it cannot be resolved.
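A precedence rule can be as small as the sketch below: higher authority wins, then the more recent source, and a genuine tie is surfaced rather than silently picked. The `authority` ranking and all type names here are assumptions for illustration.

```typescript
type SourcedClaim = {
  value: string;
  source: string;
  authority: number; // higher wins (assumed ranking, e.g. policy doc > wiki)
  updatedAt: string; // ISO 8601 timestamp
};

type Resolution =
  | { status: "resolved"; winner: SourcedClaim }
  | { status: "conflict"; candidates: SourcedClaim[] };

// Resolve competing claims about the same fact.
function resolveClaims(claims: SourcedClaim[]): Resolution {
  if (claims.length === 0) {
    throw new Error("resolveClaims needs at least one claim");
  }
  const ranked = claims
    .slice() // copy so the caller's array is not reordered
    .sort(
      (a, b) =>
        b.authority - a.authority ||
        Date.parse(b.updatedAt) - Date.parse(a.updatedAt)
    );
  const top = ranked[0];
  const tied = ranked.filter(
    (claim) =>
      claim.authority === top.authority &&
      Date.parse(claim.updatedAt) === Date.parse(top.updatedAt)
  );
  // Equal rank but different values: expose the conflict instead of guessing.
  if (tied.some((claim) => claim.value !== top.value)) {
    return { status: "conflict", candidates: tied };
  }
  return { status: "resolved", winner: top };
}
```

Returning an explicit `conflict` status lets the caller decide whether to ask a human or show both sources to the model.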

4. Context staleness

The retrieved or remembered information is no longer true.

Symptom: a well-written answer based on an old policy or obsolete API.

Fix: attach source dates, versions, and expiry rules. Retrieval needs freshness, not only similarity.
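As a sketch of "freshness, not only similarity", the filter below discards expired documents before ranking by similarity. The field names (`updatedAt`, `maxAgeDays`, `similarity`) are hypothetical metadata that your retrieval layer would need to supply.

```typescript
type RetrievedDoc = {
  id: string;
  text: string;
  similarity: number; // assumed to come from the retriever
  version: string;
  updatedAt: string; // ISO 8601 timestamp
  maxAgeDays: number; // expiry rule attached to the source
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// A document is fresh while its age is within its own expiry rule.
function isFresh(doc: RetrievedDoc, now: Date): boolean {
  const ageMs = now.getTime() - Date.parse(doc.updatedAt);
  return ageMs <= doc.maxAgeDays * MS_PER_DAY;
}

// Filter by freshness first, then rank what remains by similarity.
function selectFreshEvidence(
  docs: RetrievedDoc[],
  now: Date,
  k: number
): RetrievedDoc[] {
  return docs
    .filter((doc) => isFresh(doc, now))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, k);
}
```

The order matters: ranking first and filtering second can leave you with fewer than `k` results even when fresh documents exist.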

5. Context leakage

Sensitive information reaches a model, tool, or sub-agent that does not need it.

Symptom: data is exposed more widely than the task requires, which enlarges the attack surface.

Fix: apply least privilege to context. Redact, scope, and route sensitive tasks deliberately.
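Least privilege for context could be sketched as a filter that runs before anything is handed to a tool or sub-agent. The three sensitivity levels, the `Recipient` type, and the email-only redaction are simplifying assumptions; real redaction needs far broader coverage.

```typescript
type Sensitivity = "public" | "internal" | "secret";

type ScopedItem = { label: string; text: string; sensitivity: Sensitivity };
type Recipient = { name: string; clearance: Sensitivity };

const LEVEL: Record<Sensitivity, number> = { public: 0, internal: 1, secret: 2 };

// Illustrative redaction: masks email addresses only.
function redactEmails(text: string): string {
  return text.replace(
    /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
    "[redacted-email]"
  );
}

// Scope first (drop what the recipient may not see), then redact the rest.
function contextFor(recipient: Recipient, items: ScopedItem[]): ScopedItem[] {
  return items
    .filter((item) => LEVEL[item.sensitivity] <= LEVEL[recipient.clearance])
    .map((item) => ({ ...item, text: redactEmails(item.text) }));
}
```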

Long tasks need context management, not just a larger window

For work that spans many steps, three patterns are especially useful.

Compaction

Summarise the important state and discard bulky history.

Keep:

architectural decisions,
current failures,
requirements,
next actions.

Drop:

repeated tool output,
abandoned paths,
conversational filler.
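The keep and drop lists above translate into a simple rule-based pass, sketched here. It assumes each history entry already carries a `kind` label. In practice that labelling, or the summary itself, often comes from a model call, which this example leaves out.

```typescript
type HistoryEntry = {
  kind:
    | "decision"
    | "failure"
    | "requirement"
    | "next_action"
    | "tool_output"
    | "abandoned"
    | "chat";
  text: string;
  resolved?: boolean; // only meaningful for failures
};

type CompactState = {
  decisions: string[];
  currentFailures: string[];
  requirements: string[];
  nextActions: string[];
};

// Append only if the same text has not been kept already.
function addUnique(list: string[], text: string): void {
  if (list.indexOf(text) === -1) list.push(text);
}

function compact(history: HistoryEntry[]): CompactState {
  const state: CompactState = {
    decisions: [],
    currentFailures: [],
    requirements: [],
    nextActions: [],
  };
  for (const entry of history) {
    switch (entry.kind) {
      case "decision":
        addUnique(state.decisions, entry.text);
        break;
      case "failure":
        // Keep only failures that are still open.
        if (!entry.resolved) addUnique(state.currentFailures, entry.text);
        break;
      case "requirement":
        addUnique(state.requirements, entry.text);
        break;
      case "next_action":
        addUnique(state.nextActions, entry.text);
        break;
      default:
        break; // tool output, abandoned paths, and chat are dropped
    }
  }
  return state;
}
```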

Structured notes

Let the agent write durable state outside the context window.

For a coding task, that might be:

Goal:
Migrate the retry path without changing the public API.

Decisions:
- Reuse the original idempotency key.
- Keep the existing timeout value.

Open issues:
- Integration test fails only on Windows.

Next:
- Inspect path handling in test fixtures.
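If the agent keeps its notes as structured data, rendering them into the layout above is a few lines. This is a sketch: the `Notes` type and `renderNotes` function are hypothetical names, and the section titles simply mirror the example.

```typescript
type Notes = {
  goal: string;
  decisions: string[];
  openIssues: string[];
  next: string[];
};

// Render a titled section with one "- " bullet per item.
function section(title: string, items: string[]): string {
  return [`${title}:`, ...items.map((item) => `- ${item}`)].join("\n");
}

// Produce the same plain-text layout as the notes example.
function renderNotes(notes: Notes): string {
  return (
    [
      `Goal:\n${notes.goal}`,
      section("Decisions", notes.decisions),
      section("Open issues", notes.openIssues),
      section("Next", notes.next),
    ].join("\n\n") + "\n"
  );
}
```

Keeping the data structured and the rendering separate means the agent can update one field without rewriting the whole file.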

Context isolation

Use a focused sub-agent for a bounded investigation, then return a short result to the main agent.

The main agent does not need every search query and failed attempt. It needs the conclusion, evidence, and uncertainty.
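One way to enforce that boundary is to give the sub-agent's full trace and its report different types, so the bulky parts cannot leak back by accident. The shapes below are assumptions for illustration, not a particular agent framework's API.

```typescript
// Everything the sub-agent accumulated during its investigation.
type SubAgentTrace = {
  queries: string[];
  failedAttempts: string[];
  conclusion: string;
  evidence: string[];
  confidence: "low" | "medium" | "high";
};

// The only thing the main agent receives.
type SubAgentReport = {
  conclusion: string;
  evidence: string[];
  uncertainty: string;
};

// Reduce a full trace to conclusion, capped evidence, and stated uncertainty.
function toReport(trace: SubAgentTrace, maxEvidence = 3): SubAgentReport {
  return {
    conclusion: trace.conclusion,
    evidence: trace.evidence.slice(0, maxEvidence),
    uncertainty:
      `confidence: ${trace.confidence}; ` +
      `${trace.failedAttempts.length} failed attempt(s) omitted`,
  };
}
```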

The four-C filter

Before adding any context, ask whether it is:

Correct: Is it trustworthy?
Current: Is it still valid?
Compact: Can it be made smaller without losing meaning?
Connected: Will it change the next decision?

If it fails the fourth test, it probably does not belong in the current context, even when it is correct, current, and compact.
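The filter can be expressed as a small gate in front of the context builder. In this sketch, `trustedSource` and `changesNextDecision` are assumed to be judgments made elsewhere (by a reviewer, a source registry, or a separate check); the code only combines them.

```typescript
type Candidate = {
  text: string;
  trustedSource: boolean; // Correct: assumed to be verified elsewhere
  validUntil: string; // Current: ISO 8601 timestamp
  tokens: number; // Compact: estimated size
  changesNextDecision: boolean; // Connected: assumed judgment
};

type FourC = {
  correct: boolean;
  current: boolean;
  compact: boolean;
  connected: boolean;
};

// Evaluate each C separately so a rejection can be explained.
function checkFourC(candidate: Candidate, now: Date, maxTokens: number): FourC {
  return {
    correct: candidate.trustedSource,
    current: Date.parse(candidate.validUntil) > now.getTime(),
    compact: candidate.tokens <= maxTokens,
    connected: candidate.changesNextDecision,
  };
}

// Admit a candidate only when all four checks pass.
function passesFourC(candidate: Candidate, now: Date, maxTokens: number): boolean {
  const result = checkFourC(candidate, now, maxTokens);
  return result.correct && result.current && result.compact && result.connected;
}
```

Returning the individual results, not just a boolean, makes it easy to log why an item was left out.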

Context engineering in one sentence

Prompt engineering focuses on what we say to the model.

Context engineering focuses on what the system lets the model know at each step.

That includes retrieval, memory, tools, history, and the rules that decide what gets included.

A large context window is useful capacity. It is not an information architecture.

Where is your current AI workflow using “paste everything” as a substitute for context selection?

Top comments (3)

Tae Kim • Jun 28

The attention degradation from conflicting context is especially visible when the retrieval layer hasn't resolved entities first. In our news graph pipeline, stuffing the context with raw retrieval results before deduplication caused the model to read about the same real-world entity under 3-4 different name forms; accuracy didn't improve with more context, it actually dropped because the model spent attention reconciling aliases rather than reasoning about substance.

Mike Czerwinski • Jun 28

The part that earns the whole piece is pushing structured notes outside the window. Compaction, external state, sub-agents that return only conclusions. That is the right shape, and most people stop at a bigger window and call it engineering.

Where I would lean on it: your Four-C filter has a soft spot in the first two C's. Correct and Current are categories, not mechanisms. A note passes Correct because it looks trustworthy and passes Current because nothing flagged it, but a note cannot detect its own staleness. It was true when it was written and then it rots quietly while still reading as structured and clean. That is the failure mode that hides inside the cure: the memory ends up auditing its own freshness, which means it audits nothing.

So the question your framework points at but does not answer: who stamps a note as still true? If the answer is the note itself, or the same agent that wrote it, then Current is just a hope with a label. The only version that holds is when something outside the loop reconfirms it against source before the model is allowed to trust it. Otherwise you have not built a knowledge base. You have built a very tidy junk drawer that lies on schedule.

UnitBuilds • Jun 28

Was the case, but turns out it just needed a completely different design in general. My AI doesn't need context, or cache, because the codebase is both.
