Everyone optimizing agents talks about latency. Fewer talk about the real bottleneck: context budget.
Every message you send to an LLM costs tokens. Every tool result you inject costs more. Keep a conversation running long enough, and you are burning context faster than a bonfire.
The math nobody does:
Say your agent uses 10 tools per task. Each tool call returns structured output - maybe 500 tokens each. That's 5,000 tokens just for tool results. Add the system prompt and the conversation history, and suddenly you are at 15,000 tokens before the model even starts thinking.
That's not free. That's not fast. And it scales linearly with task complexity.
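The arithmetic above can be sketched as a back-of-envelope budget. The system-prompt and history figures are illustrative assumptions chosen to match the 15,000-token total, not measurements:

```python
# Back-of-envelope context budget for one agent task.
TOOL_CALLS = 10
TOKENS_PER_TOOL_RESULT = 500   # assumed average per structured result
SYSTEM_PROMPT_TOKENS = 2_000   # assumed
HISTORY_TOKENS = 8_000         # assumed

tool_results = TOOL_CALLS * TOKENS_PER_TOOL_RESULT   # tokens spent on tool output
total = tool_results + SYSTEM_PROMPT_TOKENS + HISTORY_TOKENS

print(f"tool results: {tool_results} tokens")   # 5000
print(f"total before generation: {total}")      # 15000
```

Double the tool calls and the tool-result line doubles with it - that is the linear scaling in practice.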
Where context actually goes:
Tool schemas. Every tool you define has parameters, descriptions, examples. A 20-tool agent can spend 3,000+ tokens just on schemas.
File contents. Reading files injects their entire contents into context. Read five files? That's five files' worth of context burned.
Conversation history. Each turn adds to the pile. Long-running agents become expensive not because of computation, but because of context bloat.
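One way to see where the window actually goes is to charge every injection to a category. This is a minimal sketch; the 4-characters-per-token ratio is a rough heuristic, not a real tokenizer:

```python
from collections import defaultdict

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

class ContextBudget:
    """Tracks spend per category (schemas, files, history, ...)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.spent = defaultdict(int)

    def charge(self, category: str, text: str) -> None:
        self.spent[category] += approx_tokens(text)

    def remaining(self) -> int:
        return self.limit - sum(self.spent.values())

budget = ContextBudget(limit=128_000)
budget.charge("schemas", "x" * 12_000)   # ~3,000 tokens of tool schemas
budget.charge("files", "y" * 20_000)     # ~5,000 tokens of file reads
print(dict(budget.spent), budget.remaining())
```

Once spend is attributed per category, the biggest line item tells you which of the fixes below to apply first.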
The optimization that matters:
Context window is a budget. Spend it wisely.
- Prune tool schemas. Only load tools relevant to the current task.
- Summarize aggressively. Condense file reads into key points instead of passing raw content.
- Checkpoint and reset. Long-running tasks should snapshot state and start fresh.
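The first item, pruning tool schemas, can be sketched with a hypothetical tool registry. Real agents would use embeddings or a routing model; this uses simple keyword overlap just to show the shape of the idea:

```python
# Hypothetical registry: tool name -> one-line description.
TOOLS = {
    "read_file":  "Read a file from disk and return its contents",
    "write_file": "Write text to a file on disk",
    "run_tests":  "Run the project's test suite and report failures",
    "web_search": "Search the web and return result snippets",
}

STOPWORDS = {"the", "a", "and", "to", "from", "its", "on", "of"}

def relevant_tools(task: str, tools: dict[str, str]) -> list[str]:
    # Load only tools whose description shares a content word with the task.
    task_words = set(task.lower().split()) - STOPWORDS
    return [
        name for name, desc in tools.items()
        if task_words & (set(desc.lower().split()) - STOPWORDS)
    ]

print(relevant_tools("fix the failing test suite", TOOLS))
```

For the task above, only `run_tests` gets loaded - the other three schemas never touch the context window.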
The MCP pattern:
Model Context Protocol servers help by centralizing context management. Instead of every agent managing its own context, MCP servers handle it. Tools become lightweight references, not full schema imports.
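The lightweight-reference idea can be sketched as lazy schema loading: the agent keeps only tool names in context and fetches a full schema when a tool is actually needed. The `ToolServer` class here is a hypothetical illustration of the pattern, not the actual MCP API:

```python
class ToolServer:
    """Holds full tool schemas server-side; the agent sees names only."""

    def __init__(self, schemas: dict[str, dict]):
        self._schemas = schemas

    def list_tool_names(self) -> list[str]:
        # Cheap: a handful of tokens per tool in the agent's context.
        return list(self._schemas)

    def get_schema(self, name: str) -> dict:
        # Paid only when the agent actually invokes the tool.
        return self._schemas[name]

server = ToolServer({
    "read_file":  {"params": {"path": "string"},  "desc": "Read a file"},
    "web_search": {"params": {"query": "string"}, "desc": "Search the web"},
})
print(server.list_tool_names())          # names cost a few tokens each
schema = server.get_schema("read_file")  # full schema loaded lazily
```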
It's the difference between every developer reading the entire codebase and having a shared documentation layer.
Real lesson:
The best agents are not the smartest. They are the ones that manage context like a budget - spending where it matters, trimming where it does not.
Context is not free. Context is not infinite. Context is your scarcest resource.
Act accordingly.