Originally published at devopsdiary.blog. Post F-AID2 in the "Governing AI in the Enterprise" series.
The advice for the last two years has been simple: give the agent more. More tools. A bigger window. Another rules file. Half the talks at AI Dev 26 argued the reverse, and they brought data.
The pitch sounds right, which is why it spreads. If the agent gave a bad answer, it must have been missing something. So you connect another MCP server, paste in more docs, add a second CLAUDE.md. The output gets worse. Now you're debugging a system that has too much to read, not too little.
The four things people believe that aren't true
A Stripe engineer put up a slide listing four myths about feeding context to agents, and I'd repeated at least two of them out loud the month before.
Naive RAG over your docs is not a context engine. A bigger context window does not fix output quality. Connecting more MCP servers does not get you there. And more rule files definitely don't. Each one feels like progress because it's an action you can take.
None of them touch the actual problem.
Underneath, the problem is that access isn't understanding. We wire agents into the code, the logs, the tickets, the docs, and we assume that putting information nearby is the same as the model comprehending it. The agent produces plausible output that compiles and then fails human review, because the thing it needed was never written down anywhere it could reach.
The slide that stuck was an iceberg. Above the waterline: the code, the docs, the tickets you can point an agent at. Below it: the original intent, the reason the thing was built this way, the migration flag nobody documented, the trade-off that got argued out in Slack eight months ago and then deleted. That submerged part decides whether the output is right. None of it lives somewhere you can index.
More context has a cost, and it compounds
Dumping more context in carries a cost, and CodeRabbit named the four ways it bites: bloat, dilution, conflict and latency.
Bloat is the obvious one. You blow the token budget. Dilution is worse, because the signal the model needed is still in there, buried under noise it now has to wade through to find it. Conflict is the dangerous one. Hand the model two sources that disagree and it picks one, hides the seam, and gives you a confident answer built on the wrong half. Latency is the tax on all of it. Slower, more expensive, every single call.
And the cost isn't flat. It grows the further the agent runs from your keyboard. At tab-complete you catch a bad suggestion in a second. With a background agent running unattended, bad context the first time means a silent failure you find days later in review, or worse, in production.
What actually works is curation
| Path | Process | Outcome |
|---|---|---|
| Dump it all in | Bloat → dilution → conflict → latency | Plausible output that fails review |
| Context engine | Unify, retrieve, rank, resolve, govern, scope | Output that passes review |
Same sources, two strategies. "More context" takes the left path. Curated context takes the right.
Same sources, two strategies. "More context" takes the left path. Curated context takes the right.
The teams that got past this stopped adding and started engineering what the agent sees. Stripe's framing was a context engine with six jobs, and it reads like a platform spec, not a prompting tip:
- merge signals from every source into one view
- retrieve only what the task needs, not everything that might be relevant
- rank and compress so tokens aren't wasted
- resolve conflicts by recency and authority instead of hiding them
- enforce permissions and governance across every system it touches
- scope relevance to your repos, your team, your work history
Read that list again and tell me which part is prompt-craft. None of it. It's retrieval, ranking, conflict resolution, access control and governance. That's infrastructure. It's the work platform teams have been doing for other systems for years, pointed at a new consumer.
Unblocked, who build exactly this for a living, showed a before-and-after on the same prompt and the same model, changing only the context. Without their engine, the output scored 2 to 3.5 out of 10 on things like respecting team conventions and not breaking existing code. With it, 8 to 9.5. Same model. Same prompt. The only variable was what the agent was allowed to see, and how well that slice had been curated.
Why this lands on the platform team
"Context engineering" isn't a prompt-writing trick your senior devs pick up on the side. It's a system somebody builds and owns. Something has to unify the sources, score them, govern access and hand the right slice to the right model at the right moment.
The hype said the bottleneck was model intelligence. Then it said the bottleneck was context. Both diagnoses were right about the symptom. Neither said who fixes it: context has to be assembled, ranked and governed, and that's platform engineering.
So when the next vendor tells you the fix is one more MCP server, ask them what happens to your token budget, your conflicts and your permissions once you've connected forty of them. The goal was always the right context. Building the thing that knows the difference is the job, and it's going to land on the platform team whether they staffed for it or not.
Top comments (0)