In the rapidly evolving world of AI agents and large language models, the conversation has moved quickly. It began with prompt engineering, then shifted to context engineering, which focuses on curating what information enters a model’s context window. The topic shows no sign of slowing down: Anthropic recently published a blog post on effective context engineering for AI agents.
Yet as context engineering is discussed everywhere, one fundamental problem keeps being ignored: optimization. Until this basic problem of prompt optimization is solved, curated context will remain brittle whenever models or weights shift. There is an opportunity to explore a new concept, Agent Optimization, which views the agent not as a prompt or a context buffer but as a complete system. It spans prompts, retrieval augmented generation, memory, and tool orchestration. The idea is straightforward: context alone cannot guarantee reliability, especially as models evolve. To build sustainable agents, optimization must occur across the entire pipeline. Is Agent Optimization the next big shift beyond context engineering? Let's dive deeper.
Context Engineering: A Crucial Foundation
Context engineering rose to prominence because it addressed an immediate bottleneck: the finite nature of an LLM’s attention. Models can only process so many tokens effectively, and longer contexts often lead to what Anthropic calls context rot, where recall accuracy decreases as sequence length increases. In its widely read guide, Anthropic described context as “a critical but finite resource for AI agents.” Their strategies included:
Compaction: Summarizing long histories to preserve coherence without exhausting the context window. In Claude Code this technique is used to keep multi hour programming sessions coherent.
Structured Note Taking: Storing important facts outside the context and retrieving them only when necessary, reducing token waste.
Sub Agent Architectures: Delegating tasks to smaller agents with clean contexts and then integrating their outputs.
Just in Time Retrieval: Dynamically fetching information instead of overloading the initial context with every possible detail.
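The compaction strategy above can be illustrated with a toy sketch. Everything here is hypothetical: `summarize` stands in for the LLM call a real system such as Claude Code would make, and the budget is counted in turns rather than tokens for simplicity.

```python
def summarize(turns):
    # Placeholder: a real system would ask the model to summarize these turns.
    return "Summary of %d earlier turns." % len(turns)

def compact(history, keep_recent=4, budget=8):
    """Fold older turns into a summary once the history exceeds the budget,
    keeping the most recent turns verbatim for coherence."""
    if len(history) <= budget:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent

history = ["turn %d" % i for i in range(10)]
compacted = compact(history)
# Ten turns collapse to one summary entry plus the four most recent turns.
```

The key design choice mirrors what Anthropic describes: recent turns stay verbatim because they carry the immediate working state, while older turns are compressed to reclaim the context window.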
These methods improve efficiency, reduce hallucination, and enhance autonomy. Frameworks such as LangChain and DSPy integrate many of these strategies, proving their practical value.
Yet the limitations are clear. Curated context is fragile when models change. Optimizations tuned for Claude may not carry over to future GPT or Llama releases. The transformer architecture imposes strict limits on attention, which means longer context windows do not always translate into better performance. Context engineering answers what information goes in, but not how the agent interprets it or adapts to it.
Why Context Alone Falls Short
Research has shown that scaling context length does not eliminate the problem. Studies on effective context length reveal that many open source models struggle to maintain accuracy beyond a fraction of their advertised window. This means the usable context is often much smaller than the maximum. Another challenge is brittleness across models. A context strategy that works well on one model may degrade when applied to another because of differences in inductive biases. This problem is already visible in enterprise deployments where upgrading models can cause accuracy to drop for downstream tasks.
Most importantly context engineering does not optimize the rest of the agent. Prompt templates, RAG pipelines, memory persistence, and tool usage remain under optimized. Without tuning across these layers agents may sound coherent but fail at reliability.
Agent Optimization: A Holistic Approach
Agent Optimization reframes the challenge. It assumes that agents are systems of systems and that each layer must be optimized for long term robustness.
Prompt Optimization
The GEPA framework has demonstrated that prompts can evolve dynamically by mutating based on execution traces. In evaluations, GEPA achieved up to a 19 percent improvement over static baselines in low rollout environments. Unlike context engineering, this method is robust to model updates since prompts evolve with the system rather than remaining fixed.
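The evolutionary loop behind this idea can be sketched in a few lines. This is not GEPA's actual algorithm, only an illustration of the shape: `mutate` and `score` are placeholders for the trace-driven reflection and task-based fitness a real system would use.

```python
import random

def mutate(prompt, trace):
    # Placeholder: a real optimizer reflects on the execution trace with an
    # LLM; here we just append a hint derived from the failing trace.
    return prompt + " Avoid: " + trace + "."

def score(prompt):
    # Placeholder fitness; a real system scores prompts on held-out tasks.
    return len(prompt)

def evolve(prompt, failure_traces, generations=3):
    """Keep mutating the current best prompt, retaining only improvements."""
    best, best_score = prompt, score(prompt)
    for _ in range(generations):
        candidate = mutate(best, random.choice(failure_traces))
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best
```

The point is architectural: because the prompt is produced by a loop rather than hand-written, it can re-adapt automatically when the underlying model changes.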
Retrieval Augmented Generation Optimization
RAG pipelines are widely used to reduce hallucination but their success depends heavily on retriever quality, embedding choice, ranking depth, and filtering strategies. Research has highlighted failure cases where irrelevant or adversarial passages undermine accuracy in domains such as healthcare. Optimizing retrieval is as critical as optimizing the model itself.
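A minimal sketch of the filtering-and-ranking idea follows. It uses naive term overlap as the relevance score; a production retriever would use embeddings and a learned reranker, but the structure, scoring, filtering out near-irrelevant passages, then truncating, is the same.

```python
def rerank(query_terms, passages, top_k=2, min_overlap=1):
    """Score passages by term overlap with the query, drop passages below a
    relevance floor (e.g. off-topic or adversarial text), keep the top_k."""
    scored = []
    for p in passages:
        overlap = len(set(query_terms) & set(p.lower().split()))
        if overlap >= min_overlap:
            scored.append((overlap, p))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [p for _, p in scored[:top_k]]
```

Even this crude floor removes the failure case the research highlights: a passage with zero relevance never reaches the generator, no matter how few relevant passages exist.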
Tool Calls and Orchestration
Tools extend agent capability but only if designed carefully. Anthropic advises that tools should be minimal, explicit, and non overlapping. Optimizing tool invocation and validation reduces errors and ensures agents use tools effectively.
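One way to enforce that advice is to validate every tool call before it runs. The registry below is a hypothetical sketch, not any framework's real API: each tool declares its required arguments, and the dispatcher rejects unknown tools and missing arguments instead of letting the model's mistake propagate.

```python
TOOLS = {}

def register(name, fn, required_args):
    # Each tool is registered once, with an explicit argument contract.
    TOOLS[name] = (fn, set(required_args))

def call_tool(name, args):
    """Validate a model-proposed tool call before invoking it."""
    if name not in TOOLS:
        raise ValueError("unknown tool: " + name)
    fn, required = TOOLS[name]
    missing = required - set(args)
    if missing:
        raise ValueError("missing args: " + ", ".join(sorted(missing)))
    return fn(**args)

register("get_weather", lambda city: f"sunny in {city}", ["city"])
```

Returning a clear error ("missing args: city") rather than crashing also gives the agent something actionable to self-correct against on the next turn.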
Memory and Persistence
Memory remains a difficult problem. Techniques such as structured note taking help but the harder challenge is deciding what to remember, how to compress it, and when to retrieve it. Adaptive memory systems that evolve with usage are increasingly seen as part of the optimization stack.
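The two decisions named above, what to remember and when to retrieve, can be made concrete with a toy note store. The salience threshold and keyword matching are stand-ins for whatever importance scoring and semantic search a real adaptive memory system would use.

```python
class Memory:
    """Structured note-taking: keep salient facts outside the context window,
    fetch them just in time when a task needs them."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.notes = []

    def remember(self, fact, salience):
        # What to remember: only facts above the salience threshold are kept.
        if salience >= self.threshold:
            self.notes.append(fact)

    def retrieve(self, keyword):
        # When to retrieve: pull only the notes relevant to the current task.
        return [n for n in self.notes if keyword.lower() in n.lower()]
```

Even this skeleton shows why memory belongs in the optimization stack: the threshold and retrieval policy are tunable parameters, not fixed prompt text.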
Frameworks and Programmability
DSPy represents the direction forward. Instead of ad hoc prompt hacking, it provides declarative modules for prompts, retrieval, and memory which can be optimized automatically with algorithms such as GEPA or MIPROv2. In benchmarks like HotPotQA, DSPy raised accuracy from 24 percent to 51 percent, results that context engineering alone could not achieve.
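The declarative idea can be shown without DSPy itself. The sketch below is plain Python, not DSPy's actual API: each module declares a prompt it owns, and "compiling" the pipeline lets an optimizer rewrite every prompt against a metric, without the pipeline code changing.

```python
class Module:
    """A pipeline stage that owns a tunable prompt template."""

    def __init__(self, name, prompt_template):
        self.name = name
        self.prompt = prompt_template  # an optimizer (e.g. GEPA-style) tunes this

    def __call__(self, **inputs):
        return self.prompt.format(**inputs)

class Pipeline:
    def __init__(self, *modules):
        self.modules = modules

    def compile(self, optimizer):
        # The optimizer rewrites each module's prompt; the wiring is untouched.
        for m in self.modules:
            m.prompt = optimizer(m.prompt)
        return self

retrieve = Module("retrieve", "Find passages about: {question}")
answer = Module("answer", "Answer using the retrieved context: {question}")
pipe = Pipeline(retrieve, answer).compile(lambda p: p + " Be concise.")
```

This separation, declared modules plus a pluggable optimizer, is what makes the approach survive model updates: recompiling against a new model re-tunes every prompt at once.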
Why Agent Optimization?
Agent Optimization could make things better because it directly addresses the problems engineers are facing today.
It is adaptable and survives model updates.
It is robust and tunes retrieval, prompts, memory, and tools together.
It is scalable and enables systematic improvement instead of trial and error.
It is the natural next step. Prompt engineering gave way to context engineering. Context engineering might give way to Agent Optimization or Agent Engineering.
Conclusion
Context engineering remains important, but it is no longer the endpoint. The work coming out of Anthropic, GEPA, and DSPy shows that the future lies in Agent Optimization, or Agent Engineering. This holistic approach treats prompts, retrieval augmented generation, memory, and tools as interconnected layers that must be optimized together. The most reliable AI agents of the future will not be those with the best curated context windows but those optimized across the full stack. In 2025 and beyond, Agent Optimization will define the next wave of reliable and adaptive AI systems. What do you think?
References
Anthropic. Effective Context Engineering for AI Agents.
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
Chroma. Context Rot: How Increasing Input Tokens Impacts LLM Performance.
https://research.trychroma.com/context-rot
Huang et al. Why Does the Effective Context Length of LLMs Fall Short? (STRING). arXiv, 2024.
https://arxiv.org/abs/2410.18745
Agrawal et al. GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning. arXiv, 2025.
https://arxiv.org/abs/2507.19457
IBM. What is Model Drift?
https://www.ibm.com/think/topics/model-drift