Most teams optimize retrieval quality first. But there's a bigger lever: teaching the system when NOT to retrieve.
Here's how the flow works:
Step 1 — Pause before fetching
User query comes in → Agent evaluates intent first. It may rewrite or reframe the question. In many cases, the model already has enough context to respond. Retrieval only triggers when genuinely needed.
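A minimal sketch of that gate, assuming a simple keyword heuristic (in practice the intent check would itself be a model call; the labels and `needs_retrieval` logic here are illustrative, not the real system):

```python
# Sketch of a retrieval gate: decide whether to answer from the model's
# own context or fetch external data. The keyword heuristic is a stand-in
# for a real intent-classification call.
from dataclasses import dataclass

@dataclass
class RoutingDecision:
    rewritten_query: str
    retrieve: bool

def route(query: str) -> RoutingDecision:
    """Evaluate intent before fetching anything."""
    q = query.strip()
    # Toy heuristic: fresh or org-internal topics trigger retrieval;
    # general knowledge does not.
    needs_retrieval = any(
        kw in q.lower() for kw in ("latest", "our policy", "internal")
    )
    # The agent may also reframe the question before retrieval.
    rewritten = q if q.endswith("?") else q + "?"
    return RoutingDecision(rewritten_query=rewritten, retrieve=needs_retrieval)
```

The point is the shape: one explicit decision point before any fetch, so "no retrieval" is a first-class outcome rather than an accident.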
Step 2 — Decouple data access with MCP
Instead of hardcoding a connection to every data source, teams run their own MCP servers:
• HR team owns theirs
• Product owns theirs
• Security rules live at the source, not inside the agent
Adding a new source? Plug in the server. No agent refactor needed.
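The ownership model above can be sketched as a registry the agent consults. The endpoint URLs and domain names are hypothetical; a real setup would use an MCP client library to list and call each server's tools, but the decoupling idea is the same:

```python
# Sketch: data access decoupled behind per-team MCP servers.
# URLs and domain names are made up for illustration.
MCP_SERVERS = {
    "hr": "https://mcp.hr.internal/sse",            # HR team owns this
    "product": "https://mcp.product.internal/sse",  # Product owns this
}

def register_server(domain: str, url: str) -> None:
    """Adding a new source = registering a server. No agent refactor."""
    MCP_SERVERS[domain] = url

def resolve_server(domain: str) -> str:
    # The agent only knows which server to ask. Authorization is
    # enforced server-side, at the data source, not in this code.
    try:
        return MCP_SERVERS[domain]
    except KeyError:
        raise KeyError(f"No MCP server registered for domain {domain!r}")
```

Because the agent holds only a pointer to each server, security rules and schema changes stay with the team that owns the data.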
Step 3 — Rank before generating
Retrieved data gets reranked by a stronger model. We filter noise early, not after generation. Then the answer gets evaluated. Good → send. Weak → loop back with improved query logic.
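The rerank → generate → evaluate → refine loop can be sketched as follows. Here `retrieve`, `rerank`, `generate`, `evaluate`, and `rewrite` stand in for model and tool calls; the threshold and retry count are illustrative assumptions:

```python
# Sketch of Step 3: rerank before generating, evaluate before sending,
# and loop back with a rewritten query when the answer is weak.
from typing import Callable, Sequence

def answer_with_refinement(
    query: str,
    retrieve: Callable[[str], Sequence[str]],
    rerank: Callable[[str, Sequence[str]], Sequence[str]],
    generate: Callable[[str, Sequence[str]], str],
    evaluate: Callable[[str, str], float],
    rewrite: Callable[[str], str],
    threshold: float = 0.7,   # illustrative quality bar
    max_attempts: int = 3,    # illustrative retry budget
) -> str:
    q = query
    answer = ""
    for _ in range(max_attempts):
        docs = retrieve(q)
        top = rerank(q, docs)[:5]       # filter noise before generation
        answer = generate(q, top)
        if evaluate(q, answer) >= threshold:
            return answer               # good -> send
        q = rewrite(q)                  # weak -> loop back, improved query
    return answer                       # best effort after retries
```

Keeping the loop explicit makes the failure path cheap to reason about: a weak answer costs one more retrieval pass, not a redesign.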
Why this matters:
• Every query fetches something → Only fetch when needed
• Hardcoded connections → Standardized MCP servers
• Security baked into agent → Rules at the source
• Dump & generate → Rerank → Review → Refine
What's been your biggest friction point with RAG pipelines? Sharing experiences below helps everyone learn faster.