benoit Pecqueur
Smart MCP

The Problem: Multi-Tool Routing Nobody Talks About
The Model Context Protocol has become the de facto standard for giving AI assistants access to external tools and data sources. Connect your CRM. Connect your analytics. Connect your file storage. In theory, a single AI assistant can now answer "How did last quarter's email campaign affect our Shopify revenue, and does our QuickBooks profit-loss reflect that?" by pulling from three different systems simultaneously.
In practice, this is where most MCP implementations quietly fall apart.
The challenge is not the protocol itself. MCP is well-designed. The challenge is routing: given a user's natural language question, which tools should be called, in what order, with what parameters, and how should the results be synthesized into a coherent answer?

"The question isn't whether your MCP server can access twenty data sources. The question is whether it can figure out which three of those twenty actually matter for the question in front of it — before it wastes 40,000 tokens finding out."

How Today's Platforms Handle Multi-Tool Orchestration
Two patterns dominate the market.
Client-Side Routing (e.g. ChatGPT / Custom GPTs)
The host LLM examines every available tool, decides which to call, executes them, receives all raw results, then synthesizes a response.
Problem: With fifteen connected data sources, you can easily exceed 20,000 tokens of tool result data before the model writes a single word. At scale, this is untenable in both cost and latency.
Search-Based Routing
The model queries for relevant tools using keywords and loads only matched tools. This reduces the tool list in context — but keyword search misses semantic intent.
If a user asks "What's making our revenue drop?" and the relevant tool is named GET_SHOPIFY_ORDER_SUMMARY, the semantic distance between the query vocabulary and the tool name may cause the search to miss it entirely. The host LLM still receives raw tool outputs, so the token cost of results hasn't been addressed — only the cost of tool selection.

The Architecture: An LLM Inside Your MCP Server
The insight at the core of our approach: move the routing intelligence inside the MCP server itself.
Instead of asking the host LLM to figure out which of your twenty tools to call, embed a small, fast, purpose-built language model inside the server that handles this decision transparently.
From the host LLM's perspective, it makes a single tool call. From the user's perspective, they get a synthesized answer that draws from exactly the right data sources. Everything in between — tool selection, parallel execution, result synthesis, error handling — happens inside the MCP server, invisible to the host model.
HOST LAYER
User Query

Host LLM → makes 1 MCP tool call

MCP SERVER LAYER
Embedded Router LLM
→ intent classification
→ tool plan generation

Parallel Execution (Tool A, Tool B, Tool C...)

Synthesis → merge results + format output

Single synthesized response returned

HOST LAYER
Host LLM receives clean answer
(no raw payloads ever exposed)
Key implication: The host LLM's context window is never contaminated with raw tool outputs. It sends a natural language question. It receives a clean, synthesized answer.
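The single-facade pattern above can be sketched in a few lines. This is a toy illustration, not a real MCP SDK: the `route`, `TOOLS`, and `ask` names are hypothetical, and the routing and synthesis steps are trivially stubbed where an embedded LLM would do the work.

```python
def route(query: str) -> list[str]:
    # Stand-in for the embedded router LLM: map a query to a tool plan.
    # A real implementation would classify intent semantically.
    if "revenue" in query.lower():
        return ["get_shopify_order_summary"]
    return []

# Illustrative tool registry; real tools would call external APIs.
TOOLS = {
    "get_shopify_order_summary": lambda: {"revenue": 42000, "orders": 310},
}

def ask(query: str) -> str:
    """The only tool the host LLM ever sees: query in, synthesized answer out."""
    plan = route(query)                                   # routing, inside the server
    results = {name: TOOLS[name]() for name in plan}      # execution
    # Synthesis step (trivially formatted here; in practice the router LLM writes this).
    return "; ".join(f"{k}: {v}" for k, v in results.items())
```

The host model calls `ask` and receives only the final string; the tool plan and raw payloads never leave the server.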

How the Embedded Router Classifies Intent
The embedded router is not a general-purpose LLM. It has one job: read a natural language query, reason about which tools are needed, and produce a structured execution plan. It outputs deterministic JSON. Nothing else.
The router operates against a tool schema registry — a compact representation of every available tool including semantic descriptions of what kinds of questions it can answer and its relationships to other tools.
Example query: "Why did our email revenue drop last month compared to the month before?"
Embedded router reasoning:

Temporal comparison → need date-filtered metrics
"Email revenue" → campaign + flow performance tools
"Drop" implies comparative analysis, not just a snapshot

Router output:
```json
{
  "tools": ["get_klaviyo_campaign_revenue", "get_klaviyo_flow_performance"],
  "params": { "date_range": "last_2_months", "compare": true },
  "synthesis_goal": "revenue_comparison_with_delta"
}
```
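Because the router must output deterministic JSON and nothing else, the server can validate every plan before executing it. A minimal sketch, assuming a simple set-based registry (the `parse_plan` helper and `REGISTRY` contents are illustrative):

```python
import json

# Illustrative registry of known tool names.
REGISTRY = {"get_klaviyo_campaign_revenue", "get_klaviyo_flow_performance"}

def parse_plan(raw: str) -> dict:
    """Parse the router's output and reject anything that isn't a valid plan."""
    plan = json.loads(raw)                      # raises if the router emitted non-JSON
    unknown = set(plan["tools"]) - REGISTRY
    if unknown:
        raise ValueError(f"router selected unknown tools: {unknown}")
    return plan

raw = ('{"tools": ["get_klaviyo_campaign_revenue"], '
       '"params": {"date_range": "last_2_months", "compare": true}, '
       '"synthesis_goal": "revenue_comparison_with_delta"}')
plan = parse_plan(raw)
```

Rejecting malformed or hallucinated plans at this boundary keeps a misbehaving router from ever touching a real API.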
Semantic, Not Keyword
Five different phrasings of the same underlying question:

"Why is revenue dropping?"
"Sales are lower, what happened?"
"Compare this week vs last week"
"Show me the revenue trend"
"Our numbers look off since the email campaign ended"

All collapse to the same intent category and the same tool calls. No query word appears in any tool name. Keyword search cannot reliably achieve this.
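To make the contrast concrete, here is a deliberately toy stand-in for that collapse: a synonym table maps varied user vocabulary onto the schema's intent tags. In production the embedded LLM does this mapping with genuine semantic understanding; the `SYNONYMS` table below is only an illustration of the idea.

```python
import re

# Intent tags per tool, as in the routing-aware schema.
INTENT_TAGS = {
    "get_shopify_order_summary": {"revenue", "sales", "orders", "trend"},
}

# Toy vocabulary collapse; a real router generalizes far beyond a fixed table.
SYNONYMS = {
    "revenue": "revenue", "sales": "sales", "numbers": "revenue",
    "dropping": "trend", "lower": "trend", "trend": "trend",
}

def match_tools(query: str) -> set[str]:
    """Map a query's vocabulary onto intent tags, then tags onto tools."""
    words = re.findall(r"[a-z]+", query.lower())
    tags = {SYNONYMS[w] for w in words if w in SYNONYMS}
    return {tool for tool, t in INTENT_TAGS.items() if tags & t}

queries = [
    "Why is revenue dropping?",
    "Sales are lower, what happened?",
    "Show me the revenue trend",
]
```

Each phrasing resolves to the same tool even though no query word appears in the tool's name.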

Token Economics
Assume 15 connected data sources, 1,500 tokens average per tool call, 5 tools needed for a complex query.
| Approach | Tokens Consumed |
| --- | --- |
| Client-side Orchestration (all 15 tools in context) | 38,000 |
| Search-based Routing | 18,000 |
| Embedded LLM Router | ~4,000 |
89% reduction vs client-side.
The host only receives the final synthesized answer — typically 300–600 tokens regardless of how many tools were called. Token costs grow sub-linearly with the number of connected data sources.
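The back-of-envelope arithmetic behind the headline figure, using the table's numbers:

```python
# Token totals from the comparison table above.
client_side, search_based, embedded = 38_000, 18_000, 4_000

# Fraction of tokens saved by the embedded router vs client-side orchestration.
reduction = 1 - embedded / client_side   # ~0.895, i.e. the "89% reduction" claim
```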
| Characteristic | Client-Side | Search-Based | Embedded Router |
| --- | --- | --- | --- |
| Host LLM context pollution | High | Medium | None |
| Semantic routing accuracy | High | Low | High |
| Round-trips for routing | 1 + parallel | 2–3 | 0 (internal) |
| Cross-tool synthesis quality | High | Medium | High |
| Scales to 20+ tools | Poorly | Moderately | Well |
| Implementation complexity | Low | Medium | High |

Request Lifecycle
| Stage | Timing | What Happens |
| --- | --- | --- |
| Query Ingestion | ~0ms | Host LLM calls MCP server with natural language query |
| Intent Classification | ~200ms | Embedded router reads query + schema registry, generates plan |
| Parallel Execution | ~800ms | Selected tools called concurrently, errors caught per-tool |
| Result Synthesis | ~1,100ms | Router LLM merges all outputs into coherent answer |
| Response Returned | ~1,400ms | Host LLM receives clean structured answer |
Graceful Degradation
When a tool call fails — rate limit, API timeout — the router synthesizes the best possible answer from tools that succeeded and explicitly annotates the gap:
"Revenue data from Shopify is unavailable — API timeout. Klaviyo email attribution and GA4 session data were retrieved successfully."
This is far more useful than a generic error.
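Parallel execution with per-tool error capture can be sketched with `asyncio.gather` and `return_exceptions=True`. The tool functions here are stand-ins (one raises to simulate the Shopify timeout from the example above):

```python
import asyncio

async def call_tool(name: str) -> dict:
    # Stand-ins for real API calls; one fails to simulate an upstream timeout.
    if name == "get_shopify_order_summary":
        raise TimeoutError("Shopify API timeout")
    return {"tool": name, "data": "ok"}

async def execute_plan(tools: list[str]) -> tuple[list[dict], list[str]]:
    """Run all tool calls concurrently; collect successes, annotate failures."""
    outcomes = await asyncio.gather(
        *(call_tool(t) for t in tools), return_exceptions=True
    )
    results, gaps = [], []
    for name, out in zip(tools, outcomes):
        if isinstance(out, Exception):
            gaps.append(f"{name} unavailable: {out}")   # explicit gap annotation
        else:
            results.append(out)
    return results, gaps

results, gaps = asyncio.run(execute_plan(
    ["get_shopify_order_summary", "get_klaviyo_campaign_revenue"]
))
```

The synthesis step then weaves `gaps` into the final answer, so a single failed source degrades the response instead of aborting it.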

Schema Design: The Underappreciated Challenge
Standard MCP tool schemas describe what parameters a function takes. That's inadequate for routing.
Standard schema — adequate for execution, poor for routing:
```json
{
  "name": "get_shopify_order_summary",
  "description": "Get summary of recent orders",
  "inputSchema": { "start_date": "string", "end_date": "string" }
}
```
Routing-aware schema — enables semantic classification:
```json
{
  "name": "get_shopify_order_summary",
  "intent_tags": ["revenue", "sales", "ecommerce", "orders", "trend"],
  "answers_questions_like": [
    "How much did we sell this month?",
    "Why is revenue dropping?",
    "What is our average order value?"
  ],
  "complementary_tools": ["get_klaviyo_campaign_revenue", "run_ga4_report"],
  "returns": "Revenue total, order count, AOV, top products by date range"
}
```
The `complementary_tools` field teaches the router which tools are most useful together, enabling proactive multi-tool execution plans for queries that mention only one domain.
Caution: Schema Drift
Stale schemas are a silent failure mode. The router makes plausible-sounding decisions based on outdated information — no error, no alert, just a wrong answer delivered confidently. Automated schema validation in your CI pipeline is not optional. It's foundational.
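A CI guard against schema drift can be a simple structural check: every routing-aware schema must carry the fields the router depends on, and `complementary_tools` must only reference tools that actually exist in the registry. The field names follow the example schema above; the `validate_registry` helper itself is an assumption, not a standard.

```python
# Fields the embedded router needs for semantic classification.
REQUIRED = {"name", "intent_tags", "answers_questions_like", "returns"}

def validate_registry(schemas: list[dict]) -> list[str]:
    """Return a list of drift errors; an empty list means the registry is clean."""
    names = {s.get("name") for s in schemas}
    errors = []
    for s in schemas:
        missing = REQUIRED - s.keys()
        if missing:
            errors.append(f"{s.get('name', '?')}: missing {sorted(missing)}")
        for dep in s.get("complementary_tools", []):
            if dep not in names:               # dangling cross-reference
                errors.append(f"{s['name']}: unknown complementary tool {dep}")
    return errors

schemas = [
    {"name": "get_shopify_order_summary", "intent_tags": ["revenue"],
     "answers_questions_like": ["How much did we sell?"],
     "returns": "Revenue total, order count",
     "complementary_tools": ["get_klaviyo_campaign_revenue"]},
    {"name": "get_klaviyo_campaign_revenue"},  # drifted: routing fields missing
]
errors = validate_registry(schemas)
```

Failing the build on a non-empty `errors` list turns the silent failure mode into a loud one.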

What This Architecture Unlocks

1. Context Window Preservation: The host LLM's context remains available for the actual conversation — longer histories, more complex instructions, higher-quality responses.
2. Unlimited Tool Scale: Each new tool adds a small entry to the schema registry. The host LLM never sees it unless it's needed. Fifty connected data sources remain performant on simple queries.
3. Cross-Source Synthesis Without Client-Side Complexity: The router recognizes multi-source queries, calls all relevant tools, and returns a single synthesized answer. No client-side orchestration required.
4. Reusable Routing Intelligence Across Host Models: Swap host LLMs — Claude, GPT-4, an open-source model — without rebuilding orchestration. The server presents identical behavior regardless of what's calling it.
5. Observability and Control: Every routing decision is logged. Audit which tools were selected for which queries, identify misrouting patterns, improve schemas, tune behavior. None of this is possible when routing happens inside the host LLM's opaque reasoning.

"The paradox of multi-tool AI is that more capability creates more complexity — unless you have a layer that absorbs that complexity before it reaches the host model. The embedded router is that layer."

Trade-offs and When Not to Use This Pattern
Operational overhead is real. Running an embedded LLM means running inference infrastructure. For low-traffic deployments, this cost may outweigh the token savings. The pattern typically becomes economical once you have more than five connected tools with meaningful daily query volume.
Simpler approaches are better if your MCP server exposes only 2–4 tools covering distinct, non-overlapping domains. A simple rule-based dispatcher is perfectly adequate there.
Schema maintenance is ongoing work. Teams that underinvest in schema quality will find routing accuracy degrading silently.
Pre-synthesis has a ceiling. If your use case requires the host model to reason directly over raw data — statistical analysis, anomaly detection, complex derived metrics — pre-synthesis may discard information the model needs.

Conclusion
The Model Context Protocol has solved the integration problem. The next challenge is the orchestration problem: making multi-tool queries feel effortless, economical, and intelligent.
Embedded LLM routing moves intelligence to where the tools live, keeping the host model's context clean, reducing token costs dramatically, and enabling semantic routing accuracy that keyword search cannot match.
The best AI integrations are ones where the routing infrastructure is invisible. Users ask natural questions and receive coherent answers drawn from exactly the right sources. The complexity lives in the server. The conversation stays clean.

CorpusIQ connects 50+ business tools into a single AI conversation. corpusiq.io
