Muhammad Mairaj

Context engineering: What, why and how to engineer context

In the past two years, prompt engineering has become a crucial skill for getting the most out of AI systems. But as context windows grow larger, a new discipline is emerging: context engineering.

Today, the underlying models are powerful enough for most tasks. The reason agents fail is usually that they are not given the right context. The bottleneck shifts from model capability to system design, and that's where context engineering comes in.

What the heck is context engineering?

Context engineering is the practice of designing systems that deliver the right context to the LLM. It means filling the LLM's context window with exactly what it needs to succeed at a specific step in a complex workflow - setting the stage for the model to perform effectively.

Context is the complete environment for the model: background, tone, intent, history, tools, and guardrails.

Think of an LLM-based system like an operating system: the model is the CPU, and the context window is its RAM - the model's working memory. Just as an OS curates what fits into RAM, context engineering is the science of filling the context window with just the right amount of the right information.

Enough information to do the task, but not so much that it confuses the LLM or wastes tokens. AI agents get their context from multiple dynamic sources, including:

  • Instructions - prompts, memories, guardrails, and preferences
  • Retrieval - retrieval systems to fetch relevant information
  • Tool outputs - data flowing in from web search and APIs

Together, they form the model's perception of the task at hand. Context engineering is about orchestrating this perception with precision, ensuring that every token counts and every part of the context serves a purpose.
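
To make this concrete, here is a minimal sketch of how an agent might assemble its context window from these sources before a model call. The function and variable names are illustrative assumptions, and the message format mirrors common chat-completion APIs rather than any specific SDK.

```python
# Illustrative sketch: assembling a context window from instructions,
# retrieval, and tool outputs. Names and structures are assumptions,
# not a specific framework's API.

def build_context(instructions: str,
                  memories: list[str],
                  retrieved_docs: list[str],
                  tool_outputs: list[str],
                  user_request: str) -> list[dict]:
    """Combine the three context sources into a chat-style message list."""
    system_parts = [instructions]
    if memories:
        system_parts.append("Relevant memories:\n" + "\n".join(memories))

    context_parts = []
    if retrieved_docs:
        context_parts.append("Retrieved documents:\n" + "\n\n".join(retrieved_docs))
    if tool_outputs:
        context_parts.append("Tool outputs:\n" + "\n\n".join(tool_outputs))

    return [
        {"role": "system", "content": "\n\n".join(system_parts)},
        {"role": "user", "content": "\n\n".join(context_parts + [user_request])},
    ]
```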

The context window

The context window is the amount of information the model sees at a time. It is limited, and there is ample debate around token optimization to best manage the window.

Context engineering is the craft of carefully filling this window with the right information. At any moment there is far more context available than the window can hold. For example, a deep-research agent might retrieve hundreds of pages of content through search tools - far exceeding what the model can fit into its context window.

It's like packing a bag for a hike. Take too little and you're lost. Take too much and you're overwhelmed.

If you provide too little information, the model gives vague answers. If you provide too much information, the context overflows.
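
As a rough sketch of "packing the bag", the snippet below trims ranked passages to a fixed token budget before they reach the model. The token count here is a crude whitespace approximation for illustration; in practice you would count tokens with the tokenizer of the model you are calling.

```python
# Illustrative sketch: fit retrieved passages into a fixed token budget.
# The token estimate is a rough approximation, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return len(text.split())

def pack_context(passages: list[str], budget: int = 4000) -> list[str]:
    """Keep passages (assumed already ranked by relevance) until the budget is spent."""
    packed, used = [], 0
    for passage in passages:
        cost = estimate_tokens(passage)
        if used + cost > budget:
            break  # too much context overwhelms the model and wastes tokens
        packed.append(passage)
        used += cost
    return packed
```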

Why context engineering is important

In multi-agent systems, where a complex task is distributed across several agents, managing context becomes absolutely critical.

Consider a scenario where a task is divided between multiple agents. Each subagent receives a fragment of the overall context along with its subtask. The agent that divided the task holds the full context; the subagents don't. Without that broader context, they can easily miscommunicate and produce results nothing like what the original task required.

This happens often in multi-agent architectures. Agent-to-agent communication protocols may eventually fix this kind of miscommunication, but they are still early. Until they mature, make sure a task is genuinely parallelizable before splitting it across agents.

One way to curb this problem is context compression. Agent interactions can span hundreds of turns and include token-heavy tool calls, which leads to context overflow. At each turn, compress the context so that only high-value tokens are carried forward.
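
One way to picture this: at each turn, older messages are collapsed into a short summary while the most recent turns are kept verbatim. A minimal sketch follows, where the summarizer is a naive word-truncation placeholder - in practice this would typically be a cheap LLM call.

```python
# Illustrative sketch of per-turn context compression.
# summarize() is a placeholder; a real system would use an LLM to condense text.

def summarize(text: str, max_words: int = 60) -> str:
    """Placeholder summarizer: keep the first max_words words."""
    words = text.split()
    return " ".join(words[:max_words]) + (" ..." if len(words) > max_words else "")

def compress_history(messages: list[dict], keep_recent: int = 6) -> list[dict]:
    """Collapse everything except the last `keep_recent` messages into one summary."""
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    digest = summarize("\n".join(m["content"] for m in older))
    return [{"role": "system", "content": f"Summary of earlier turns: {digest}"}] + recent
```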

This is just one technique of context engineering. General principles for building agents are still in their infancy, so there is a lot of ongoing experimentation.

Beyond prompts: A system approach

A prompt is the set of instructions within a single interaction for accomplishing a task. Prompt engineering focuses on the specific phrasings, examples, or formatting that trigger desired responses. Context engineering requires a systematic approach: database design, information architecture, retrieval systems, and knowledge management.

  • Prompt is what you ask. E.g. "Translate this paragraph to Spanish"
  • Context is what the model knows when you ask it. E.g. "User is a South Asian. Previous conversations were about visiting Spain."

The prompt is just the tip of the iceberg. Context is everything underneath that makes it possible.
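
In code, the difference shows up in what surrounds the final user message. A minimal sketch, with invented message contents: the prompt is the last user turn, and the context is everything else the model sees alongside it.

```python
# Illustrative sketch: the same prompt with and without engineered context.

prompt = "Translate this paragraph to Spanish."

bare_call = [
    {"role": "user", "content": prompt},  # prompt only: the model knows nothing else
]

contextual_call = [
    {"role": "system", "content": "The user is South Asian and is planning a trip to Spain."},
    {"role": "user", "content": "What should I see in Madrid?"},
    {"role": "assistant", "content": "The Prado, the Royal Palace, and Retiro Park are good starting points."},
    {"role": "user", "content": prompt},  # same prompt, now grounded in relevant context
]
```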

Engineering context is a much larger undertaking. Designing systems that reliably deliver the right context to agents running in production for months requires real structure. These structures must be scalable and rigorously tested so they don't break real-world production environments.

The shift from prompts to context is not just semantic; it's systematic.

Engineering a great context

To engineer effective context, you have to make decisions in real time - about what to include, exclude and carry forward. Key questions include:

  • What's the broader task at hand?
  • What information does the model need?
  • What should it remember from earlier steps?
  • What should it forget to avoid getting confused?
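
Those questions translate fairly directly into a per-step context policy. Here is a minimal sketch, with illustrative item fields and thresholds, of deciding what to include, carry forward, and forget before each step.

```python
# Illustrative sketch: a per-step context policy answering the questions above.
# The item fields and limits are assumptions for illustration.

def select_context(candidate_items: list[dict], max_items: int = 10) -> list[dict]:
    """Each item looks like {"text": ..., "relevance": 0.0-1.0, "stale": bool}.
    Keep what the model needs for this step, drop what would only confuse it."""
    fresh = [item for item in candidate_items if not item["stale"]]           # what to forget
    ranked = sorted(fresh, key=lambda item: item["relevance"], reverse=True)  # what it needs
    return ranked[:max_items]                                                 # what to carry forward
```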

Context engineering is still an emerging discipline, and best practices continue to evolve. However, it is not an optional skill. It is core to how powerful AI systems will work from here on.
