If you've been building with LLMs for a while, you've probably experienced the moment when the model feels like more of a hindrance than a help: answer quality degrades, you're constantly reminded to start a new chat window, and constraints keep getting forgotten. These are all symptoms of a context that has become too polluted. When building AI agents, these problems can be the difference between a reliable, production-ready agent and a system that works great in your local tests but breaks in production.
To bridge this gap and build a consistently reliable agentic system, you need to manage context effectively. This is where context engineering comes into play, and it is essential for building reliable agents at scale.
Understanding Context in LLMs
Before we dive into context engineering, let's clarify what we mean by "context" in the world of large language models.
Context is everything the LLM can see and process when handling a task: all the data the model has access to and can reference in order to fulfill the request (see the sketch after this list). This includes things like:
- Your prompts and system instructions
- Previous messages in the conversation
- RAG documents and retrieved information
- Tool call results and responses
- Multimodal inputs like images, audio, or documents
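To make this concrete, here's roughly how those pieces land in an actual request. This is a hedged sketch using an OpenAI-style chat payload; the exact field names and roles vary by provider, and the content strings are made up for illustration.

```python
# Rough sketch of how the pieces of "context" assemble into one request.
messages = [
    # System instructions
    {"role": "system", "content": "You are a support agent. Follow the refund policy strictly."},
    # Previous turns in the conversation
    {"role": "user", "content": "Can I return an opened item?"},
    {"role": "assistant", "content": "Yes, within 30 days with a receipt."},
    # RAG: retrieved policy text injected alongside the new question
    {"role": "user", "content": "Relevant policy excerpt:\n<retrieved text>\n\nQuestion: What about sale items?"},
]
```

Everything the model "knows" for this turn is in that list; nothing else carries over between calls unless you put it there.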
Every LLM has a context window: a hard limit on how much information (measured in tokens) the model can access and reference within a single session. When you exceed this limit, the model either truncates information or refuses to process the request entirely. Depending on the model, the limit can range from hundreds of thousands of tokens to millions. The type of input provided to the LLM (text vs. images, etc.) also has a significant impact on token usage.
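If you want to see how quickly tokens add up, you can count them yourself. Here's a minimal sketch using OpenAI's tiktoken library; the encoding name and the 128k limit are illustrative, so check your own model's documentation.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several OpenAI models

history = "...the full conversation, instructions, and retrieved documents..."
token_count = len(enc.encode(history))
print(f"{token_count} tokens used")

CONTEXT_WINDOW = 128_000  # illustrative limit; varies by model
if token_count > CONTEXT_WINDOW * 0.8:
    print("Approaching the limit: time to compress or trim context")
```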
What Is Context Engineering?
Context engineering, as described by Andrej Karpathy (former Director of AI at Tesla), is "the delicate art and science of filling the context window with just the right information for the next step."
This goes beyond the prompt you provide; it means managing the entire system of information your agent or LLM can access. Think of it as curating what goes into the agent's brain. When a person needs to solve a problem, we draw on various inputs and reference points to discern the right solution. Managing this for an agent is the essence of context engineering.
Context Engineering vs. Prompt Engineering
You may already be familiar with the term prompt engineering, but that is just one piece of building reliable AI systems.
Prompt Engineering focuses on:
- Writing clear, specific instructions
- Using effective prompt templates and formats
- Crafting examples and demonstrations
- Optimizing the immediate request to the model
Context Engineering focuses on:
- Managing the entire information environment over time
- Deciding what information to include or exclude across multiple interactions
- Handling long-running conversations and sessions
- Optimizing for reliability and consistency at scale
Think of it this way: prompt engineering is like perfecting one song on an album. Context engineering is managing the entire tracklist and production.
Why This Matters When Building Agents
When building scalable agents, there are a number of factors to consider. Typically, an agent must do all of the following:
- Maintain long conversations
- Access multiple data sources
- Use various tools
- Remember past decisions
- Handle complex, multi-step tasks
Without effective context engineering, you quickly hit a wall on performance and burn tokens inefficiently, which translates directly into higher costs. Writing better prompts alone will not solve this.
Poor context management can lead to four common context failures:
- Context Poisoning - Incorrect or hallucinated responses interfering with the context.
- Context Clashes - Conflicting information causing inconsistent outputs.
- Context Confusion - Irrelevant data in the context influencing the output.
- Context Distraction - The context overwhelming the model's training.
You can read more about each of these failures here.
The Four Pillars of Context Engineering
To avoid these common context failures, four key principles will help you build more robust, production-ready systems. None of these techniques is foolproof, but each gives you greater control over your context than you'd have without it.
1. Write Context
Storing information outside of the context window for later retrieval and usage.
This is about creating persistent memory systems that maintain important information across sessions and interactions. Memory can live in a file the LLM reads from, or directly in the state managed by the agent. A minimal sketch follows the strategies below.
Key strategies:
- Use file-based memory systems (like CLAUDE.md files or rules files in Cursor)
- Implement embedded document stores for large collections of facts
- Build autonomous memory creation based on user feedback and interactions
- Store structured information that can't fit in context windows
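Here's a minimal sketch of file-based memory. The file name and helper functions are hypothetical; the point is simply that facts persist on disk between sessions and get re-injected into a fresh context later.

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical on-disk memory store

def write_memory(key: str, value: str) -> None:
    """Persist a fact outside the context window for later retrieval."""
    memory = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    memory[key] = value
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

def read_memory() -> dict:
    """Load stored facts so the agent can inject them into a fresh context."""
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}

# Record a decision in one session...
write_memory("preferred_style", "User prefers TypeScript examples with strict types.")
# ...then surface it in the system prompt of the next session.
system_prompt = "Known facts:\n" + "\n".join(f"- {v}" for v in read_memory().values())
```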
2. Select Context
Selecting only the relevant information the model needs in context to accomplish the task at hand.
Smart retrieval ensures you include the right information while excluding the noise that degrades performance. Selecting only what's necessary gives the model more accurate context for solving the problem and reduces token usage, which saves money. A minimal retrieval sketch follows the strategies below.
Key strategies:
- Implement semantic similarity scoring for document retrieval
- Use relevance thresholds to filter out low-quality matches
- Create retrieval mechanisms that consider recency, importance, and similarity
- Build fallback strategies when no highly relevant information is found
- Design retrieval that adapts to the specific task requirements
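Here's a minimal retrieval sketch combining semantic similarity scoring with a relevance threshold. The embed() helper is a placeholder for whatever embedding model you use, and the threshold value is illustrative, not a recommendation.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call your embedding model here."""
    raise NotImplementedError

def select_context(query: str, documents: list[str],
                   top_k: int = 3, threshold: float = 0.75) -> list[str]:
    """Return only documents similar enough to the query to be worth the tokens."""
    q = embed(query)
    scored = []
    for doc in documents:
        d = embed(doc)
        # Cosine similarity between query and document embeddings
        score = float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
        if score >= threshold:  # relevance threshold filters low-quality matches
            scored.append((score, doc))
    scored.sort(reverse=True)
    # Cap how much context we spend; fall back to an empty list if nothing matches
    return [doc for _, doc in scored[:top_k]]
```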
3. Compress Histories
Distilling down to only the required tokens to complete a task.
As conversations and agent trajectories get longer, intelligent compression helps maintain performance. This involves summarizing information and/or removing it where it's no longer needed. A sketch of auto-compaction follows the strategies below.
Key strategies:
- Implement context summarization at natural boundaries (completed phases, tool calls)
- Summarize token-heavy tool call feedback while preserving key insights
- Create checkpoints for important conversation milestones
- Apply auto-compaction when approaching context limits
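Here's a hedged sketch of auto-compaction: when the history approaches a token budget, older turns are summarized and recent ones are kept verbatim. count_tokens uses a crude characters-per-token estimate (swap in a real tokenizer in practice), and summarize() stands in for an actual LLM summarization call.

```python
def count_tokens(messages: list[dict]) -> int:
    """Rough estimate: ~4 characters per token. Use a real tokenizer in production."""
    return sum(len(m["content"]) // 4 for m in messages)

def summarize(messages: list[dict]) -> str:
    """Placeholder for an LLM call: 'Summarize these turns, preserving key decisions.'"""
    raise NotImplementedError

def compact(messages: list[dict], budget: int = 8_000, keep_recent: int = 6) -> list[dict]:
    """When history approaches the budget, summarize older turns, keep recent ones verbatim."""
    if count_tokens(messages) < budget:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)
    return [{"role": "system",
             "content": f"Summary of earlier conversation:\n{summary}"}] + recent
```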
4. Isolate Contexts
Splitting up the context between multiple windows or agents to break down the task.
Sometimes the best context management is context separation, allowing different components to focus on specific aspects.
This principle is demonstrated in Anthropic's multi-agent research system. Instead of cramming everything into one massive context, they use specialized subagents that each operate with their own focused context windows. This allows them to scale beyond what any single agent could handle while maintaining clarity and focus. A minimal orchestration sketch follows the strategies below.
Key strategies:
- Use multiple specialized agents instead of one overpowered agent
- Create sandbox environments that isolate token-heavy objects from the main context
- Design multi-agent systems for easily parallelizable tasks
- Separate concerns across different context windows with clear boundaries
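Here's a minimal sketch of context isolation in the spirit of that multi-agent pattern (not Anthropic's actual implementation): each subagent gets its own fresh context for one subtopic, and the lead agent only ever sees the condensed findings. call_llm() is a placeholder for your model client.

```python
def call_llm(system: str, user: str) -> str:
    """Placeholder for your model client (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def research(question: str, subtopics: list[str]) -> str:
    # Each subagent runs in its own fresh, focused context window.
    findings = [
        call_llm(
            system=f"You are a research subagent. Investigate only: {topic}",
            user=question,
        )
        for topic in subtopics
    ]
    # The lead agent sees compact findings, not every subagent's full trajectory.
    return call_llm(
        system="You are the lead agent. Synthesize the findings into one answer.",
        user="\n\n".join(findings),
    )
```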
Key Takeaways
To wrap up:
- Context engineering is distinct from prompt engineering and focuses on managing the entire information environment.
- The four main problems (poisoning, clashing, confusion, distraction) can break even well-designed systems.
- The four pillars (write, select, compress, isolate) provide a framework for systematic improvement.
- Better context management directly translates to lower costs, higher reliability, and better user experience.
Context engineering is still an evolving field, and best practices are still being discovered through real-world experimentation. The more we share what works (and what doesn't), the faster we'll all build better AI systems.
What context engineering techniques have you tried? What worked, and what didn't? Let me know your experience with this new concept!
Top comments (1)
Reading your post on context engineering felt like watching someone nail the orchestration truth in real time. I can relate deeply because I’ve been down that road, coding with Claude as my “intern,” building my own orchestration layers to manage context flow.
As you point out, the real power comes from managing the broader environment; deciding what context to store, when to retrieve, how to compress history, and how to avoid poisoning the model with noise or contradictions. That four-pillar breakdown of write, select, compress, isolate is what separates brittle demos from production-ready agents.
When I first tackled this, I treated Claude like a magic oracle: throw in the entire project, expect consistency. No surprise, I ended up with code scattered across folders, naming conventions lost, and hallucinations masked as features. It wasn't until I built a structured pipeline (feeding only what's relevant, summarizing history selectively, breaking tasks into atomic steps) that the model started behaving more like a junior dev who actually gets the codebase.
Your framing that crafting the right context is like managing a tracklist, not just a single song, captures it perfectly. In my world, that meant building a custom MCP-like orchestrator: auto-fetching relevant modules, producing spec drafts, and handing Claude only the context it needs in bite-sized pieces.
Without disciplined context, even the smartest agent turns into a gardener who weeds unpredictably instead of one who prunes purposefully.
Really well done. This is the engineering shift from hype to craft.