What is context engineering?
Prompt engineering is about asking better.
Context engineering is about deciding what the model gets to know while it works.
That sounds like a small shift, but it changes how you build with AI. A prompt can help with a single reply. Context engineering is what you need when the model has to work across tools, files, memory, tests, retries, and several steps of a real task.
Most agent failures do not come from one bad sentence in the prompt. They come from the model seeing the wrong working set: too much history, missing files, stale decisions, huge logs, old failed attempts, or instructions that no longer match the current step.
The context window is not a filing cabinet. It is a workbench. If you pile everything onto it, the useful parts get buried.
The short definition
Context engineering is the practice of choosing, shaping, storing, retrieving, compressing, and isolating the information an AI system receives at each step.
That includes:
- the user request
- system instructions
- relevant files
- retrieved documentation
- examples
- tool definitions
- tool results
- memory
- previous decisions
- error notes
- acceptance criteria
- the current task state
The goal is not to maximize context. The goal is to give the model the right context for the next action.
Sometimes that means adding more information. Sometimes it means removing information. Sometimes it means writing information outside the chat so the next step can retrieve it cleanly.
Why this became a real problem
Early prompt engineering mostly dealt with single calls: write this email, summarize this document, generate this function, explain this error.
Agents are different.
An agent may plan, call tools, inspect files, make edits, run commands, read failures, retry, and prepare a final answer. That creates a long trail of context. Some of it is useful. Some of it is noise. Some of it was useful five minutes ago but is now harmful.
Coding agents make this easy to see.
Imagine an agent working on a medium-sized feature. It starts with a ticket, scans the repo, drafts a plan, edits a few files, runs tests, hits an error, patches the error, runs more tests, finds a second issue, and then tries again.
If every step keeps inheriting the whole transcript, the model starts carrying around:
- old plans
- abandoned approaches
- stack traces from fixed errors
- file contents that no longer match disk
- tool output that was only relevant to one branch of the attempt
- user corrections mixed with model guesses
After a while, the agent is no longer solving the clean task. It is solving the task plus the sediment of every previous attempt.
That is context rot.
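One item on that list, file contents that no longer match disk, can be caught mechanically. The sketch below is one possible approach, assuming the harness records a hash whenever it shows a file to the model. `Snapshot`, `take_snapshot`, and `is_stale` are hypothetical names, not part of any particular agent framework.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


def digest(text: str) -> str:
    """Hash file text so a later read can be compared cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Snapshot:
    """What the model was shown: a path and the hash of its contents then."""
    path: Path
    sha256: str


def take_snapshot(path: Path) -> Snapshot:
    return Snapshot(path=path, sha256=digest(path.read_text(encoding="utf-8")))


def is_stale(snapshot: Snapshot) -> bool:
    """True if the file was deleted or changed since the snapshot was taken."""
    if not snapshot.path.exists():
        return True
    return digest(snapshot.path.read_text(encoding="utf-8")) != snapshot.sha256


def demo():
    """Snapshot a file, edit it, and report staleness before and after."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "app.js"
        target.write_text("console.log('v1');\n", encoding="utf-8")
        snap = take_snapshot(target)
        before = is_stale(snap)
        target.write_text("console.log('v2');\n", encoding="utf-8")
        after = is_stale(snap)
        return before, after
```

A harness could run a check like this before each step and drop or reload any file dump that has gone stale, rather than letting the model reason over an outdated copy.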
Larger context windows do not solve it by themselves
A larger context window gives the model more room. It does not decide what belongs in the room.
This matters because agent context is not just "more text." It is a live bundle of instructions, evidence, state, and tool access. Bad context can pull the model toward the wrong action even when the answer is technically somewhere in the window.
You can see the same thing in human work. Give a developer the current ticket, the target files, and the latest failing test, and they can focus. Give them every Slack message, every old branch, every failed patch, and every log from the last week, and now they have to do archaeology before they can write code.
Models have the same problem, with less judgment.
Context engineering vs prompt engineering
Prompt engineering asks:
- How should I phrase the request?
- What role should the model play?
- What examples should I include?
- What output format do I want?
Context engineering asks:
- What does this step need to know?
- What should be stored outside the chat?
- What should be retrieved now?
- What should be summarized?
- What should be hidden from this step?
- Which tool results are worth keeping?
- When should the agent start fresh?
- Which artifact is the source of truth?
Both matter. Good prompting still helps. But once a system has memory, tools, files, and multiple steps, the prompt is only one part of the environment.
The four practical moves
A useful way to think about context engineering is as four moves: write, select, compress, and isolate.
1. Write important state down
Do not rely on the conversation transcript as the only memory.
Write important state into durable artifacts:
- requirements
- decisions
- plans
- acceptance criteria
- implementation notes
- test results
- failure summaries
- review notes
This gives the system something cleaner than chat history. A later step can load the current PRD or the latest failure note without dragging along every false start that produced it.
For coding agents, this is especially useful because files, specs, and test reports are easier to review than a long model transcript.
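As a rough illustration of writing state down, the sketch below stores a failure note as a JSON file that a later step can load without the transcript. The directory layout, the field names, and the `write_artifact` and `read_artifact` helpers are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def write_artifact(root: Path, name: str, payload: dict) -> Path:
    """Persist one piece of task state as a reviewable JSON file."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{name}.json"
    path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return path


def read_artifact(root: Path, name: str) -> dict:
    """Load the artifact by name; the chat history is never consulted."""
    return json.loads((root / f"{name}.json").read_text(encoding="utf-8"))


def demo() -> dict:
    """Write a failure note in one step and read it back in another."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "artifacts"
        write_artifact(
            root,
            "failure-note",
            {
                "summary": "password reset test creates a user without an email",
                "verify_command": "npm test -- password-reset",
            },
        )
        return read_artifact(root, "failure-note")
```

Plain files are a deliberate choice here: they can be diffed, reviewed, and committed like any other project artifact.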
2. Select only what matters now
Each step should receive the smallest useful working set.
If the agent is planning, it may need the ticket, repo structure, relevant files, and user constraints.
If the agent is editing one small task, it may need only:
- the active task
- the target files
- acceptance criteria
- a compact note from the last failure
- the command it should run to verify the work
The model does not need every planning draft or every old tool call just because those things happened earlier.
Selection is where many agent systems fail. They retrieve too much, retrieve the wrong thing, or treat all previous context as equally important.
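One way to avoid treating all context as equally important is to rank candidates and stop at a budget. The sketch below assumes each item carries a hand-assigned priority and uses character count as a crude stand-in for tokens. `ContextItem` and `select_working_set` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    name: str
    text: str
    priority: int  # lower number = more important for the current step


def select_working_set(items: list[ContextItem], budget_chars: int) -> list[ContextItem]:
    """Take items in priority order, skipping any that would exceed the budget."""
    chosen: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.priority):
        if used + len(item.text) > budget_chars:
            continue  # too large for the remaining budget; leave it out
        chosen.append(item)
        used += len(item.text)
    return chosen


def demo() -> list[str]:
    """Pick a working set for a small editing task under a 500-character budget."""
    items = [
        ContextItem("planning_draft", "x" * 5000, priority=9),
        ContextItem("active_task", "Add empty-state copy to the media manager", priority=0),
        ContextItem("acceptance_criteria", "Upload button remains visible", priority=1),
        ContextItem("failure_note", "Last attempt broke keyboard focus", priority=2),
    ]
    return [item.name for item in select_working_set(items, budget_chars=500)]
```

In this example the old planning draft is left out because it is both low priority and far over budget, while the task, criteria, and failure note all fit.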
3. Compress noisy information
Some information should survive, but not in full.
A 2,000-line test log may contain one useful fact: one assertion failed because the mocked user had no email address. A retry does not need the whole log. It needs the failure note.
Good compression turns messy context into useful working memory:
Previous attempt failed because the password reset test creates a user without an email.
Update the fixture or guard the email access before rerunning:
npm test -- password-reset
Bad compression hides the important detail:
Tests failed. Try again.
The point is not to summarize everything. The point is to preserve what changes the next action.
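A very simple form of this compression can be done without a model at all. The sketch below keeps only the lines that look like failures and attaches the verification command. The marker list and the sample log format are invented for illustration; a real system might use a model or a test-runner report instead of a regular expression.

```python
from __future__ import annotations

import re

# Assumed failure markers; adjust to whatever the test runner actually prints.
FAILURE_MARKERS = re.compile(r"\b(FAIL|ERROR|AssertionError|TypeError)\b")


def compress_log(log: str, verify_command: str, max_lines: int = 3) -> str:
    """Reduce a long log to a short failure note for the next attempt."""
    kept: list[str] = []
    for line in log.splitlines():
        text = line.strip()
        if FAILURE_MARKERS.search(text) and text not in kept:
            kept.append(text)
        if len(kept) == max_lines:
            break
    if not kept:
        return "No failure lines matched; read the full log before retrying."
    return "\n".join(["Previous attempt failed:", *kept, f"Verify with: {verify_command}"])


def sample_log() -> str:
    """Build a made-up 2,000-line log with a single useful line at the end."""
    noise = [f"PASS suite/test_{i}" for i in range(1999)]
    failure = "FAIL password-reset: TypeError: mocked user has no email"
    return "\n".join(noise + [failure])
```

Calling `compress_log(sample_log(), "npm test -- password-reset")` turns the 2,000 lines into a three-line note that still names the missing email.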
4. Isolate phases
Planning, coding, retrying, testing, and reviewing should not all share the same context shape.
A planner benefits from broad context. A coder benefits from narrow context. A reviewer benefits from the final diff, the requirement, and verification results. A retry often benefits from a clean session plus one compact failure note.
This is the part many people miss. Context engineering is not only retrieval. It is also deciding which information should not cross a boundary.
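A phase boundary can be made explicit with an allowlist per phase. The sketch below is a minimal version, assuming artifacts are stored under string keys. The phase names and key names are illustrative, and `cross_boundary` is a hypothetical helper.

```python
from __future__ import annotations

# Which artifacts each phase is allowed to receive (assumed names).
ALLOWED_INPUTS = {
    "planning": {"ticket", "repo_structure", "user_constraints"},
    "coding": {"active_task", "target_files", "acceptance_criteria"},
    "review": {"final_diff", "requirement", "verification_results"},
    "retry": {"active_task", "failure_note"},
}


def cross_boundary(artifacts: dict[str, str], to_phase: str) -> tuple[dict[str, str], list[str]]:
    """Return the artifacts that may enter the phase and the names that were blocked."""
    allowed = ALLOWED_INPUTS[to_phase]
    passed = {name: value for name, value in artifacts.items() if name in allowed}
    blocked = sorted(name for name in artifacts if name not in allowed)
    return passed, blocked


def demo() -> tuple[list[str], list[str]]:
    """Hand a cluttered session over to a retry and see what survives."""
    session = {
        "active_task": "Fix password reset test",
        "failure_note": "Fixture creates a user without an email",
        "old_plan": "First draft of the plan",
        "full_test_log": "2,000 lines of output",
    }
    passed, blocked = cross_boundary(session, "retry")
    return sorted(passed), blocked
```

Returning the blocked names as well makes the boundary auditable: a reviewer can see what was deliberately kept out of the next phase.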
What bad context looks like
Bad context usually feels convenient at first:
Give the model everything.
Keep the whole chat.
Paste the full logs.
Include every file.
Ask it to continue.
That works for a while. Then the agent starts making strange choices.
It edits the wrong file because an old file dump is still in the transcript. It repeats a failed idea because the failed attempt is still nearby. It forgets the acceptance criteria because a huge tool result pushed the actual task out of focus. It treats a plan draft as if it were approved.
The model did not become lazy. The working set became polluted.
What good context looks like
Good context is boring in the best way.
It is explicit:
Current phase: implementation
Active task: add empty-state copy to the media manager
Target files:
- app.js
- styles/managers-media.css
Acceptance criteria:
- Empty state appears only when no media exists.
- Upload button remains visible.
- Layout works on mobile.
Validation:
- node _check_css.js
- manual browser check later
It is narrow:
Do not modify campaign logic.
Do not change media upload behavior.
Do not run full end-to-end tests.
It carries forward only useful failure information:
Last attempt broke keyboard focus because the empty-state container replaced the upload button.
Keep the upload button outside the conditional empty-state block.
This kind of context gives the model less room to improvise in the wrong direction.
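A brief like the one above can be generated from structured fields instead of being typed by hand each time. The sketch below assumes a small `render_brief` helper (a hypothetical name) that refuses to produce a brief without acceptance criteria or a validation step.

```python
from __future__ import annotations


def bullet_list(title: str, entries: list[str]) -> list[str]:
    """Format a titled section as a heading line followed by '- ' bullets."""
    return [f"{title}:"] + [f"- {entry}" for entry in entries]


def render_brief(
    phase: str,
    task: str,
    target_files: list[str],
    acceptance_criteria: list[str],
    validation: list[str],
) -> str:
    """Render an explicit task brief; fail loudly if key sections are empty."""
    if not acceptance_criteria:
        raise ValueError("a brief needs at least one acceptance criterion")
    if not validation:
        raise ValueError("a brief needs at least one validation step")
    lines = [f"Current phase: {phase}", f"Active task: {task}"]
    lines += bullet_list("Target files", target_files)
    lines += bullet_list("Acceptance criteria", acceptance_criteria)
    lines += bullet_list("Validation", validation)
    return "\n".join(lines)


def demo() -> str:
    """Rebuild part of the example brief from the article."""
    return render_brief(
        phase="implementation",
        task="add empty-state copy to the media manager",
        target_files=["app.js", "styles/managers-media.css"],
        acceptance_criteria=["Upload button remains visible."],
        validation=["node _check_css.js"],
    )
```

Failing on a missing section is the point of the design: a brief with no acceptance criteria is exactly the kind of context that invites improvisation.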
Context engineering is not just RAG
Retrieval-augmented generation (RAG) is one technique inside context engineering. It helps the system fetch relevant information from docs, files, memories, or a database.
But context engineering is wider.
It includes tool design, memory design, artifact design, phase boundaries, retry behavior, human approval gates, and the shape of tool results.
For example, an agent with fifty tools may perform worse if every tool definition is always loaded. The system has to decide which tools belong in the current step. The same is true for files, examples, memories, and prior messages.
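Deciding which tools belong in the current step can be as simple as tagging each tool with the phases it is meant for. The sketch below assumes a small in-memory registry. The tool names and the `tools_for_phase` helper are made up for illustration and do not correspond to any specific agent SDK.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    phases: frozenset[str]  # phases in which the model may see this tool


# A hypothetical registry; a real system would hold many more entries.
REGISTRY = [
    Tool("read_file", "Read a file from the repo", frozenset({"planning", "coding", "review"})),
    Tool("write_file", "Write a file in the worktree", frozenset({"coding"})),
    Tool("run_tests", "Run the validation command", frozenset({"coding", "review"})),
    Tool("search_docs", "Search documentation", frozenset({"planning"})),
]


def tools_for_phase(registry: list[Tool], phase: str) -> list[Tool]:
    """Return only the tool definitions that should be loaded for this phase."""
    return [tool for tool in registry if phase in tool.phases]


def tool_names(phase: str) -> list[str]:
    return [tool.name for tool in tools_for_phase(REGISTRY, phase)]
```

With this registry, a planning step sees `read_file` and `search_docs` but has no way to write files, which narrows both the context and the permitted actions.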
RAG answers: what should we fetch?
Context engineering answers: what should this model call know, what should it be allowed to do, and what should survive afterward?
A checklist for agent builders
Before you build a serious agent workflow, ask:
- What is the source of truth for the task?
- Which artifacts should survive between steps?
- Which old messages should be discarded?
- What is the smallest useful context for this phase?
- Which tools should the model see right now?
- How should large tool results be stored?
- What should happen after a failed attempt?
- Can a human review the plan before execution?
- Can a human review the final diff before merge?
If the answer to most of these is "the chat has it," the system is fragile.
How LoopTroop applies context engineering
LoopTroop is built around this problem in AI coding: long tasks get worse when every step lives inside one endless chat.
LoopTroop is a local, open-source GUI app for running coding tickets across local Git repositories. It is not trying to replace every coding agent. It is the orchestration layer around longer work: planning, splitting, execution, retries, logs, review artifacts, and human approval.
The relevant pieces are:
- LLM Council for planning
- PRDs as durable specs
- Beads as small implementation units
- OpenCode as the execution layer
- Git worktrees for isolated repo work
- Ralph Loop retries for failed or stuck beads
- human gates before important transitions
The important context engineering move is that each stage gets a different working set.
The council can look broadly and compare plans. The PRD becomes the source of truth. Beads turn a large task into smaller units with acceptance criteria and validation commands. Execution gets one bead at a time instead of the whole project history. If a bead fails, the Ralph Loop saves a compact note, resets the attempt, and retries with fresh context rather than dragging the polluted session forward.
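The retry idea described above can be sketched generically. This is not LoopTroop's actual implementation, only an illustration of the pattern under simple assumptions: every attempt starts from the clean task brief, and the only thing carried forward is a compact note from the most recent failure. `Attempt`, `run_with_fresh_retries`, and the fake executor are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    ok: bool
    log: str


def run_with_fresh_retries(
    task_brief: str,
    execute: Callable[[str], Attempt],
    summarize: Callable[[str], str],
    max_attempts: int = 3,
) -> Optional[int]:
    """Return the number of the attempt that succeeded, or None if all failed."""
    failure_note: Optional[str] = None
    for attempt_number in range(1, max_attempts + 1):
        # Each attempt starts from the clean brief, never from the old session.
        context = task_brief
        if failure_note is not None:
            context += f"\n\nLast attempt failed: {failure_note}"
        result = execute(context)
        if result.ok:
            return attempt_number
        # Only the compact note survives; the full log is dropped.
        failure_note = summarize(result.log)
    return None


def fake_execute(context: str) -> Attempt:
    """Stand-in for an agent run: succeeds once the context mentions the email bug."""
    if "email" in context:
        return Attempt(ok=True, log="all tests passed")
    return Attempt(ok=False, log="setup ok\nFAIL: mocked user has no email\nteardown ok")


def fake_summarize(log: str) -> str:
    """Stand-in for compression: keep the first FAIL line only."""
    return next(line for line in log.splitlines() if line.startswith("FAIL"))
```

In this toy run the first attempt fails, the note about the missing email is attached to a fresh brief, and the second attempt succeeds without ever seeing the first attempt's log.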
That is the practical version of context engineering: keep the useful state, discard the noise, and give the model a clean job at each phase.
For tiny edits, a direct chat or editor agent is usually enough. For larger tickets, especially the ones that touch several files or need multiple retries, context engineering is what keeps the agent reviewable.
The final output should not be "the AI wrote some code." It should be a plan, a set of small tasks, logs, test commands, diffs, and a result a human can inspect without reading a giant transcript.
That is the lane LoopTroop is trying to own.
Sources
- Anthropic, "Effective context engineering for AI agents": https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- Anthropic, "Effective harnesses for long-running agents": https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
- OpenAI Cookbook, "Context Engineering - Short-Term Memory Management with Sessions": https://developers.openai.com/cookbook/examples/agents_sdk/session_memory
- OpenAI Cookbook, "Context Engineering for Personalization": https://developers.openai.com/cookbook/examples/agents_sdk/context_personalization
- LangChain docs, "Context engineering in agents": https://docs.langchain.com/oss/python/langchain/context-engineering
- LangChain, "Context Engineering": https://www.langchain.com/blog/context-engineering-for-agents
- LoopTroop: https://www.looptroop.ovh/
- LoopTroop GitHub: https://github.com/looptroop-ai/LoopTroop



