If you've been building with LLMs for a while, you've probably experienced the moment when the model feels like more of a hindrance than a help: answer quality degrades, you're constantly reminded to start a new chat window, and constraints keep getting forgotten. These are all symptoms of a context that has become too polluted. When building AI agents, these problems can be the difference between a reliable, production-ready agent and a system that works great in your local tests but breaks in production.
To bridge this gap and build a consistently reliable agentic system, you need to manage context effectively. This is where context engineering comes into play, and it is essential for building reliable agents at scale.
Understanding Context in LLMs
Before we dive into context engineering, let's clarify what we mean by "context" in the world of large language models.
Context is everything the LLM can see and process when handling a task: all the data the model has access to and can reference in order to fulfill the request (see the sketch after this list). This includes things like:
- Your prompts and system instructions
- Previous messages in the conversation
- RAG documents and retrieved information
- Tool call results and responses
- Multimodal inputs like images, audio, or documents
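To make this concrete, here's roughly how those pieces land in an actual request. This is a hedged sketch using an OpenAI-style chat payload; the exact field names and roles vary by provider, and the content strings are made up for illustration.

```python
# Rough sketch of how the pieces of "context" assemble into one request.
messages = [
    # System instructions
    {"role": "system", "content": "You are a support agent. Follow the refund policy strictly."},
    # Previous turns in the conversation
    {"role": "user", "content": "Can I return an opened item?"},
    {"role": "assistant", "content": "Yes, within 30 days with a receipt."},
    # RAG: retrieved policy text injected alongside the new question
    {"role": "user", "content": "Relevant policy excerpt:\n<retrieved text>\n\nQuestion: What about sale items?"},
]
```

Everything the model "knows" for this turn is in that list; nothing else carries over between calls unless you put it there.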
Every LLM has a context window: a hard limit on how much information (measured in tokens) the model can access and reference within a single session. When you exceed this limit, the model either truncates information or refuses to process the request entirely. Depending on the model, the limit can range from hundreds of thousands of tokens to millions. The type of input provided to the LLM (text vs. images, etc.) also has a significant impact on token usage.
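If you want to see how quickly tokens add up, you can count them yourself. Here's a minimal sketch using OpenAI's tiktoken library; the encoding name and the 128k limit are illustrative, so check your own model's documentation.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several OpenAI models

history = "...the full conversation, instructions, and retrieved documents..."
token_count = len(enc.encode(history))
print(f"{token_count} tokens used")

CONTEXT_WINDOW = 128_000  # illustrative limit; varies by model
if token_count > CONTEXT_WINDOW * 0.8:
    print("Approaching the limit: time to compress or trim context")
```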
What Is Context Engineering?
Context engineering, as described by Andrej Karpathy (former Director of AI at Tesla), is "the delicate art and science of filling the context window with just the right information for the next step."
This goes beyond the prompt you provide; it means managing the entire system of information your agent or LLM can access. Think of it as curating what goes into the agent's brain. When a person needs to solve a problem, we draw on various inputs and reference points to discern the right solution. Managing this for an agent is the essence of context engineering.
Context Engineering vs. Prompt Engineering
You may already be familiar with the term prompt engineering, but that is just one piece of building reliable AI systems.
Prompt Engineering focuses on:
- Writing clear, specific instructions
- Using effective prompt templates and formats
- Crafting examples and demonstrations
- Optimizing the immediate request to the model
Context Engineering focuses on:
- Managing the entire information environment over time
- Deciding what information to include or exclude across multiple interactions
- Handling long-running conversations and sessions
- Optimizing for reliability and consistency at scale
Think of it this way: prompt engineering is like perfecting one song on an album. Context engineering is managing the entire tracklist and production.
Why This Matters When Building Agents
When building scalable agents, there are a number of factors to consider. Typically, an agent must do all of the following:
- Maintain long conversations
- Access multiple data sources
- Use various tools
- Remember past decisions
- Handle complex, multi-step tasks
Without effective context engineering, you quickly hit a wall on performance and burn tokens inefficiently, which translates directly into higher costs. Writing better prompts alone will not solve this.
Poor context management can lead to four common context failures:
- Context Poisoning - Incorrect or hallucinated responses interfering with the context.
- Context Clashes - Conflicting information causing inconsistent outputs.
- Context Confusion - Irrelevant data in the context influencing the output.
- Context Distraction - The context overwhelming the model's training.
You can read more about each of these failures here.
The Four Pillars of Context Engineering
To avoid these common context failures, four key principles will help you build more robust, production-ready systems. None of these techniques is foolproof, but each gives you greater control over your context than you'd have without it.
1. Write Context
Storing information outside of the context window for later retrieval and usage.
This is about creating persistent memory systems that maintain important information across sessions and interactions. Memory can live in a file the LLM reads from, or directly in the state managed by the agent. A minimal sketch follows the strategies below.
Key strategies:
- Use file-based memory systems (like CLAUDE.md files or rules files in Cursor)
- Implement embedded document stores for large collections of facts
- Build autonomous memory creation based on user feedback and interactions
- Store structured information that can't fit in context windows
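Here's a minimal sketch of file-based memory. The file name and helper functions are hypothetical; the point is simply that facts persist on disk between sessions and get re-injected into a fresh context later.

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical on-disk memory store

def write_memory(key: str, value: str) -> None:
    """Persist a fact outside the context window for later retrieval."""
    memory = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    memory[key] = value
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

def read_memory() -> dict:
    """Load stored facts so the agent can inject them into a fresh context."""
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}

# Record a decision in one session...
write_memory("preferred_style", "User prefers TypeScript examples with strict types.")
# ...then surface it in the system prompt of the next session.
system_prompt = "Known facts:\n" + "\n".join(f"- {v}" for v in read_memory().values())
```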
2. Select Context
Selecting only the relevant information the model needs in context to accomplish the task at hand.
Smart retrieval ensures you include the right information while excluding the noise that degrades performance. Selecting only what's necessary gives the model more accurate context for solving the problem and reduces token usage, which saves money. A minimal retrieval sketch follows the strategies below.
Key strategies:
- Implement semantic similarity scoring for document retrieval
- Use relevance thresholds to filter out low-quality matches
- Create retrieval mechanisms that consider recency, importance, and similarity
- Build fallback strategies when no highly relevant information is found
- Design retrieval that adapts to the specific task requirements
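Here's a minimal retrieval sketch combining semantic similarity scoring with a relevance threshold. The embed() helper is a placeholder for whatever embedding model you use, and the threshold value is illustrative, not a recommendation.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call your embedding model here."""
    raise NotImplementedError

def select_context(query: str, documents: list[str],
                   top_k: int = 3, threshold: float = 0.75) -> list[str]:
    """Return only documents similar enough to the query to be worth the tokens."""
    q = embed(query)
    scored = []
    for doc in documents:
        d = embed(doc)
        # Cosine similarity between query and document embeddings
        score = float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
        if score >= threshold:  # relevance threshold filters low-quality matches
            scored.append((score, doc))
    scored.sort(reverse=True)
    # Cap how much context we spend; fall back to an empty list if nothing matches
    return [doc for _, doc in scored[:top_k]]
```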
3. Compress Histories
Distilling down to only the required tokens to complete a task.
As conversations and agent trajectories get longer, intelligent compression helps maintain performance. This involves summarizing information and/or removing it where it's no longer needed. A sketch of auto-compaction follows the strategies below.
Key strategies:
- Implement context summarization at natural boundaries (completed phases, tool calls)
- Summarize token-heavy tool call feedback while preserving key insights
- Create checkpoints for important conversation milestones
- Apply auto-compaction when approaching context limits
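Here's a hedged sketch of auto-compaction: when the history approaches a token budget, older turns are summarized and recent ones are kept verbatim. count_tokens uses a crude characters-per-token estimate (swap in a real tokenizer in practice), and summarize() stands in for an actual LLM summarization call.

```python
def count_tokens(messages: list[dict]) -> int:
    """Rough estimate: ~4 characters per token. Use a real tokenizer in production."""
    return sum(len(m["content"]) // 4 for m in messages)

def summarize(messages: list[dict]) -> str:
    """Placeholder for an LLM call: 'Summarize these turns, preserving key decisions.'"""
    raise NotImplementedError

def compact(messages: list[dict], budget: int = 8_000, keep_recent: int = 6) -> list[dict]:
    """When history approaches the budget, summarize older turns, keep recent ones verbatim."""
    if count_tokens(messages) < budget:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)
    return [{"role": "system",
             "content": f"Summary of earlier conversation:\n{summary}"}] + recent
```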
4. Isolate Contexts
Splitting up the context between multiple windows or agents to break down the task.
Sometimes the best context management is context separation, allowing different components to focus on specific aspects.
This principle is demonstrated in Anthropic's multi-agent research system. Instead of cramming everything into one massive context, they use specialized subagents that each operate with their own focused context windows. This allows them to scale beyond what any single agent could handle while maintaining clarity and focus. A minimal orchestration sketch follows the strategies below.
Key strategies:
- Use multiple specialized agents instead of one overpowered agent
- Create sandbox environments that isolate token-heavy objects from the main context
- Design multi-agent systems for easily parallelizable tasks
- Separate concerns across different context windows with clear boundaries
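Here's a minimal sketch of context isolation in the spirit of that multi-agent pattern (not Anthropic's actual implementation): each subagent gets its own fresh context for one subtopic, and the lead agent only ever sees the condensed findings. call_llm() is a placeholder for your model client.

```python
def call_llm(system: str, user: str) -> str:
    """Placeholder for your model client (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError

def research(question: str, subtopics: list[str]) -> str:
    # Each subagent runs in its own fresh, focused context window.
    findings = [
        call_llm(
            system=f"You are a research subagent. Investigate only: {topic}",
            user=question,
        )
        for topic in subtopics
    ]
    # The lead agent sees compact findings, not every subagent's full trajectory.
    return call_llm(
        system="You are the lead agent. Synthesize the findings into one answer.",
        user="\n\n".join(findings),
    )
```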
Key Takeaways
To wrap up:
- Context engineering is distinct from prompt engineering and focuses on managing the entire information environment.
- The four main problems (poisoning, clashing, confusion, distraction) can break even well-designed systems.
- The four pillars (write, select, compress, isolate) provide a framework for systematic improvement.
- Better context management directly translates to lower costs, higher reliability, and better user experience.
Context engineering is still an evolving field, and best practices are still being discovered through real-world experimentation. The more we share what works (and what doesn't), the faster we'll all build better AI systems.
What context engineering techniques have you tried? What worked, and what didn't? Let me know your experience with this new concept!
Top comments (1)
Reading your post on context engineering felt like watching someone nail the orchestration truth in real time. I can relate deeply because I’ve been down that road, coding with Claude as my “intern,” building my own orchestration layers to manage context flow.
As you point out, the real power comes from managing the broader environment; deciding what context to store, when to retrieve, how to compress history, and how to avoid poisoning the model with noise or contradictions. That four-pillar breakdown of write, select, compress, isolate is what separates brittle demos from production-ready agents.
When I first tackled this, I treated Claude like a magic oracle: throw in the entire project, expect consistency. No surprise, I ended up with code scattered across folders, naming conventions lost, and hallucinations masked as features. It wasn't until I built a structured pipeline (feeding only what's relevant, summarizing history selectively, breaking tasks into atomic steps) that the model started behaving more like a junior dev who actually gets the codebase.
Your framing that crafting the right context is like managing a tracklist, not just a single song, captures it perfectly. In my world, that meant building a custom MCP-like orchestrator: auto-fetching relevant modules, producing spec drafts, and handing Claude only the context it needs in bite-sized pieces.
Without disciplined context, even the smartest agent turns into a gardener who weeds unpredictably instead of one who prunes purposefully.
Really well done. This is the engineering shift from hype to craft.