<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: benoit Pecqueur</title>
    <description>The latest articles on DEV Community by benoit Pecqueur (@benoit_pecqueur_7a5bf1a2f).</description>
    <link>https://dev.to/benoit_pecqueur_7a5bf1a2f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3864765%2F7da723ef-f1f2-4e4d-8ef3-4ca5a2602848.png</url>
      <title>DEV Community: benoit Pecqueur</title>
      <link>https://dev.to/benoit_pecqueur_7a5bf1a2f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/benoit_pecqueur_7a5bf1a2f"/>
    <language>en</language>
    <item>
      <title>Smart MCP</title>
      <dc:creator>benoit Pecqueur</dc:creator>
      <pubDate>Tue, 07 Apr 2026 02:20:38 +0000</pubDate>
      <link>https://dev.to/benoit_pecqueur_7a5bf1a2f/smart-mcp-42p0</link>
      <guid>https://dev.to/benoit_pecqueur_7a5bf1a2f/smart-mcp-42p0</guid>
      <description>&lt;p&gt;The Problem: Multi-Tool Routing Nobody Talks About&lt;br&gt;
The Model Context Protocol has become the de facto standard for giving AI assistants access to external tools and data sources. Connect your CRM. Connect your analytics. Connect your file storage. In theory, a single AI assistant can now answer "How did last quarter's email campaign affect our Shopify revenue, and does our QuickBooks profit-loss reflect that?" by pulling from three different systems simultaneously.&lt;br&gt;
In practice, this is where most MCP implementations quietly fall apart.&lt;br&gt;
The challenge is not the protocol itself. MCP is well-designed. The challenge is routing: given a user's natural language question, which tools should be called, in what order, with what parameters, and how should the results be synthesized into a coherent answer?&lt;/p&gt;

&lt;p&gt;"The question isn't whether your MCP server can access twenty data sources. The question is whether it can figure out which three of those twenty actually matter for the question in front of it — before it wastes 40,000 tokens finding out."&lt;/p&gt;

&lt;p&gt;How Today's Platforms Handle Multi-Tool Orchestration&lt;br&gt;
Two patterns dominate the market.&lt;br&gt;
Client-Side Routing (e.g. ChatGPT / Custom GPTs)&lt;br&gt;
The host LLM examines every available tool, decides which to call, executes them, receives all raw results, then synthesizes a response.&lt;br&gt;
Problem: With fifteen connected data sources, you can easily exceed 20,000 tokens in tool result data before the model writes a single word. At scale, this is economically and latency-wise untenable.&lt;br&gt;
Search-Based Routing&lt;br&gt;
The model queries for relevant tools using keywords and loads only matched tools. This reduces the tool list in context — but keyword search misses semantic intent.&lt;br&gt;
If a user asks "What's making our revenue drop?" and the relevant tool is named GET_SHOPIFY_ORDER_SUMMARY, the semantic distance between the query vocabulary and the tool name may cause the search to miss it entirely. The host LLM still receives raw tool outputs, so the token cost of results hasn't been addressed — only the cost of tool selection.&lt;/p&gt;

&lt;p&gt;The Architecture: An LLM Inside Your MCP Server&lt;br&gt;
The insight at the core of our approach: move the routing intelligence inside the MCP server itself.&lt;br&gt;
Instead of asking the host LLM to figure out which of your twenty tools to call, embed a small, fast, purpose-built language model inside the server that handles this decision transparently.&lt;br&gt;
From the host LLM's perspective, it makes a single tool call. From the user's perspective, they get a synthesized answer that draws from exactly the right data sources. Everything in between — tool selection, parallel execution, result synthesis, error handling — happens inside the MCP server, invisible to the host model.&lt;br&gt;
HOST LAYER&lt;br&gt;
  User Query&lt;br&gt;
    ↓&lt;br&gt;
  Host LLM → makes 1 MCP tool call&lt;br&gt;
    ↓&lt;br&gt;
MCP SERVER LAYER&lt;br&gt;
  Embedded Router LLM&lt;br&gt;
    → intent classification&lt;br&gt;
    → tool plan generation&lt;br&gt;
    ↓&lt;br&gt;
  Parallel Execution (Tool A, Tool B, Tool C...)&lt;br&gt;
    ↓&lt;br&gt;
  Synthesis → merge results + format output&lt;br&gt;
    ↓&lt;br&gt;
  Single synthesized response returned&lt;br&gt;
    ↓&lt;br&gt;
HOST LAYER&lt;br&gt;
  Host LLM receives clean answer&lt;br&gt;
  (no raw payloads ever exposed)&lt;br&gt;
Key implication: The host LLM's context window is never contaminated with raw tool outputs. It sends a natural language question. It receives a clean, synthesized answer.&lt;/p&gt;
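&lt;p&gt;As a rough sketch of that contract — one tool call in, one synthesized answer out — the server-side flow can be approximated in Python. Every name here (the connector functions, route, synthesize, ask) is a hypothetical stand-in for illustration, not part of any real MCP SDK:&lt;/p&gt;

```python
# Minimal sketch of the server-side flow: the host LLM sees one tool
# ("ask"); routing, parallel execution, and synthesis happen internally.
import asyncio

async def get_campaign_revenue(params):   # stand-in for a real connector
    return {"campaign_revenue": 12400}

async def get_order_summary(params):      # stand-in for a real connector
    return {"orders": 312, "revenue": 45800}

TOOLS = {
    "get_klaviyo_campaign_revenue": get_campaign_revenue,
    "get_shopify_order_summary": get_order_summary,
}

def route(query: str) -> dict:
    """Stand-in for the embedded router LLM: returns a structured plan."""
    return {
        "tools": list(TOOLS),
        "params": {"date_range": "last_2_months", "compare": True},
        "synthesis_goal": "revenue_comparison_with_delta",
    }

def synthesize(goal: str, results: list) -> str:
    """Stand-in for the synthesis step: merge raw outputs into one answer."""
    return f"[{goal}] merged {len(results)} tool results"

async def ask(query: str) -> str:
    """The single MCP tool the host LLM calls."""
    plan = route(query)                                 # intent -> plan
    results = await asyncio.gather(                     # parallel execution
        *(TOOLS[name](plan["params"]) for name in plan["tools"])
    )
    return synthesize(plan["synthesis_goal"], results)  # clean answer only

print(asyncio.run(ask("Why did email revenue drop last month?")))
# [revenue_comparison_with_delta] merged 2 tool results
```

&lt;p&gt;Note that raw connector payloads exist only inside ask(); the caller never sees them, which is the whole point of the pattern.&lt;/p&gt;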

&lt;p&gt;How the Embedded Router Classifies Intent&lt;br&gt;
The embedded router is not a general-purpose LLM. It has one job: read a natural language query, reason about which tools are needed, and produce a structured execution plan. It outputs deterministic JSON. Nothing else.&lt;br&gt;
The router operates against a tool schema registry — a compact representation of every available tool including semantic descriptions of what kinds of questions it can answer and its relationships to other tools.&lt;br&gt;
Example query: "Why did our email revenue drop last month compared to the month before?"&lt;br&gt;
Embedded router reasoning:&lt;/p&gt;

&lt;p&gt;Temporal comparison → need date-filtered metrics&lt;br&gt;
"Email revenue" → campaign + flow performance tools&lt;br&gt;
"Drop" implies comparative analysis, not just a snapshot&lt;/p&gt;

&lt;p&gt;Router output:&lt;br&gt;
{&lt;br&gt;
  "tools": ["get_klaviyo_campaign_revenue", "get_klaviyo_flow_performance"],&lt;br&gt;
  "params": { "date_range": "last_2_months", "compare": true },&lt;br&gt;
  "synthesis_goal": "revenue_comparison_with_delta"&lt;br&gt;
}&lt;br&gt;
Semantic, Not Keyword&lt;br&gt;
Five different phrasings of the same underlying question:&lt;/p&gt;

&lt;p&gt;"Why is revenue dropping?"&lt;br&gt;
"Sales are lower, what happened?"&lt;br&gt;
"Compare this week vs last week"&lt;br&gt;
"Show me the revenue trend"&lt;br&gt;
"Our numbers look off since the email campaign ended"&lt;/p&gt;

&lt;p&gt;All collapse to the same intent category and the same tool calls. No query word appears in any tool name. Keyword search cannot reliably achieve this.&lt;/p&gt;
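&lt;p&gt;A toy experiment makes the gap concrete. The substring check against intent tags below is only a crude stand-in for the embedding-based similarity a real router would use, and the tag set is invented for the demo:&lt;/p&gt;

```python
# Toy contrast between keyword routing and intent-based routing.
TOOL = {
    "name": "get_shopify_order_summary",
    "intent_tags": {"revenue", "sales", "drop", "trend", "compare", "numbers"},
}

QUERIES = [
    "Why is revenue dropping?",
    "Sales are lower, what happened?",
    "Compare this week vs last week",
    "Show me the revenue trend",
    "Our numbers look off since the email campaign ended",
]

def keyword_match(query: str) -> bool:
    """Naive search: does any raw query word occur in the tool name?"""
    return any(word in TOOL["name"] for word in query.lower().split())

def intent_match(query: str) -> bool:
    """Tag overlap: does any intent tag occur inside a query word?"""
    words = query.lower().split()
    return any(tag in word for tag in TOOL["intent_tags"] for word in words)

print([keyword_match(q) for q in QUERIES])  # all False: keywords miss
print([intent_match(q) for q in QUERIES])   # all True: intent matches
```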

&lt;p&gt;Token Economics&lt;br&gt;
Assume 15 connected data sources, 1,500 tokens average per tool call, 5 tools needed for a complex query.&lt;br&gt;
Client-side Orchestration (all 15 tools in context): 38,000 tokens&lt;br&gt;
Search-based Routing: 18,000 tokens&lt;br&gt;
Embedded LLM Router: ~4,000 tokens&lt;br&gt;
89% reduction vs client-side.&lt;br&gt;
The host only receives the final synthesized answer — typically 300–600 tokens regardless of how many tools were called. Token costs grow sub-linearly with the number of connected data sources.&lt;br&gt;
Comparison (Client-Side / Search-Based / Embedded Router):&lt;br&gt;
Host LLM context pollution: High / Medium / None&lt;br&gt;
Semantic routing accuracy: High / Low / High&lt;br&gt;
Round-trips for routing: 1 + parallel / 2–3 / 0 (internal)&lt;br&gt;
Cross-tool synthesis quality: High / Medium / High&lt;br&gt;
Scales to 20+ tools: Poorly / Moderately / Well&lt;br&gt;
Implementation complexity: Low / Medium / High&lt;/p&gt;
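&lt;p&gt;The headline percentage follows directly from the post's own figures; only the division is derived here:&lt;/p&gt;

```python
# Reproducing the token-reduction arithmetic from the per-approach totals.
costs = {
    "client_side": 38_000,     # all 15 tool defs + raw results in host context
    "search_based": 18_000,    # matched tools + raw results
    "embedded_router": 4_000,  # query + synthesized answer only
}

reduction = 1 - costs["embedded_router"] / costs["client_side"]
print(f"{reduction:.0%} reduction vs client-side")  # 89% reduction vs client-side
```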

&lt;p&gt;Request Lifecycle&lt;br&gt;
Query Ingestion (~0ms): Host LLM calls MCP server with natural language query&lt;br&gt;
Intent Classification (~200ms): Embedded router reads query + schema registry, generates plan&lt;br&gt;
Parallel Execution (~800ms): Selected tools called concurrently, errors caught per-tool&lt;br&gt;
Result Synthesis (~1,100ms): Router LLM merges all outputs into coherent answer&lt;br&gt;
Response Returned (~1,400ms): Host LLM receives clean structured answer&lt;br&gt;
Graceful Degradation&lt;br&gt;
When a tool call fails — rate limit, API timeout — the router synthesizes the best possible answer from tools that succeeded and explicitly annotates the gap:&lt;br&gt;
"Revenue data from Shopify is unavailable — API timeout. Klaviyo email attribution and GA4 session data were retrieved successfully."&lt;br&gt;
This is far more useful than a generic error.&lt;/p&gt;
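&lt;p&gt;One plausible way to implement this isolation, sketched with asyncio: gather with return_exceptions=True keeps one failing connector from aborting the plan. The connector names and the failure are hypothetical:&lt;/p&gt;

```python
# Per-tool error isolation during parallel execution: failures become
# annotated gaps instead of aborting the whole plan.
import asyncio

async def shopify_revenue():
    raise TimeoutError("Shopify API timeout")   # simulated downstream failure

async def klaviyo_attribution():
    return {"attributed_revenue": 8200}

async def execute_plan():
    names = ["shopify_revenue", "klaviyo_attribution"]
    results = await asyncio.gather(
        shopify_revenue(), klaviyo_attribution(), return_exceptions=True
    )
    answer, gaps = {}, []
    for name, res in zip(names, results):
        if isinstance(res, Exception):
            gaps.append(f"{name} unavailable - {res}")  # annotate the gap
        else:
            answer[name] = res
    return answer, gaps

answer, gaps = asyncio.run(execute_plan())
print(answer)  # data from tools that succeeded
print(gaps)    # explicit annotation of what's missing
```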

&lt;p&gt;Schema Design: The Underappreciated Challenge&lt;br&gt;
Standard MCP tool schemas describe what parameters a function takes. That's inadequate for routing.&lt;br&gt;
Standard schema — adequate for execution, poor for routing:&lt;br&gt;
{&lt;br&gt;
  "name": "get_shopify_order_summary",&lt;br&gt;
  "description": "Get summary of recent orders",&lt;br&gt;
  "inputSchema": { "start_date": "string", "end_date": "string" }&lt;br&gt;
}&lt;br&gt;
Routing-aware schema — enables semantic classification:&lt;br&gt;
{&lt;br&gt;
  "name": "get_shopify_order_summary",&lt;br&gt;
  "intent_tags": ["revenue", "sales", "ecommerce", "orders", "trend"],&lt;br&gt;
  "answers_questions_like": [&lt;br&gt;
    "How much did we sell this month?",&lt;br&gt;
    "Why is revenue dropping?",&lt;br&gt;
    "What is our average order value?"&lt;br&gt;
  ],&lt;br&gt;
  "complementary_tools": ["get_klaviyo_campaign_revenue", "run_ga4_report"],&lt;br&gt;
  "returns": "Revenue total, order count, AOV, top products by date range"&lt;br&gt;
}&lt;br&gt;
The complementary_tools field teaches the router which tools are frequently useful together, enabling proactive multi-tool execution plans for queries that mention only one domain.&lt;br&gt;
Caution: Schema Drift&lt;br&gt;
Stale schemas are a silent failure mode. The router makes plausible-sounding decisions based on outdated information — no error, no alert, just a wrong answer delivered confidently. Automated schema validation in your CI pipeline is not optional. It's foundational.&lt;/p&gt;
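&lt;p&gt;A minimal CI-style check against drift might look like the sketch below: every routing schema must carry the semantic fields the router depends on, and every complementary_tools reference must resolve. The field names follow the schemas shown above; the registry contents are illustrative:&lt;/p&gt;

```python
# Validate the routing schema registry; run in CI and fail on any error.
REGISTRY = {
    "get_shopify_order_summary": {
        "intent_tags": ["revenue", "sales"],
        "answers_questions_like": ["How much did we sell this month?"],
        "complementary_tools": ["get_klaviyo_campaign_revenue"],
        "returns": "Revenue total, order count, AOV",
    },
    "get_klaviyo_campaign_revenue": {
        "intent_tags": ["email", "campaigns", "revenue"],
        "answers_questions_like": ["Which campaign earned the most?"],
        "complementary_tools": [],
        "returns": "Per-campaign attributed revenue",
    },
}
REQUIRED = ("intent_tags", "answers_questions_like", "complementary_tools", "returns")

def validate(registry: dict) -> list:
    errors = []
    for name, schema in registry.items():
        for field in REQUIRED:
            if not isinstance(schema.get(field), (list, str)):
                errors.append(f"{name}: missing or malformed '{field}'")
        for ref in schema.get("complementary_tools", []):
            if ref not in registry:   # dangling reference = stale schema
                errors.append(f"{name}: dangling complementary tool '{ref}'")
    return errors

assert validate(REGISTRY) == []
```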

&lt;p&gt;What This Architecture Unlocks&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Context Window Preservation
The host LLM's context remains available for the actual conversation — longer histories, more complex instructions, higher-quality responses.&lt;/li&gt;
&lt;li&gt;Unlimited Tool Scale
Each new tool adds a small entry to the schema registry. The host LLM never sees it unless it's needed. Fifty connected data sources remain performant on simple queries.&lt;/li&gt;
&lt;li&gt;Cross-Source Synthesis Without Client-Side Complexity
The router recognizes multi-source queries, calls all relevant tools, and returns a single synthesized answer. No client-side orchestration required.&lt;/li&gt;
&lt;li&gt;Reusable Routing Intelligence Across Host Models
Swap host LLMs — Claude, GPT-4, an open-source model — without rebuilding orchestration. The server presents identical behavior regardless of what's calling it.&lt;/li&gt;
&lt;li&gt;Observability and Control
Every routing decision is logged. Audit which tools were selected for which queries, identify misrouting patterns, improve schemas, tune behavior. None of this is possible when routing happens inside the host LLM's opaque reasoning.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;"The paradox of multi-tool AI is that more capability creates more complexity — unless you have a layer that absorbs that complexity before it reaches the host model. The embedded router is that layer."&lt;/p&gt;

&lt;p&gt;Trade-offs and When Not to Use This Pattern&lt;br&gt;
Operational overhead is real. Running an embedded LLM means running inference infrastructure. For low-traffic deployments, this cost may outweigh the token savings. The pattern becomes economical typically when you have more than five connected tools with meaningful daily query volume.&lt;br&gt;
Simpler approaches are better if your MCP server exposes only 2–4 tools covering distinct, non-overlapping domains. A simple rule-based dispatcher is perfectly adequate there.&lt;br&gt;
Schema maintenance is ongoing work. Teams that underinvest in schema quality will find routing accuracy degrading silently.&lt;br&gt;
Pre-synthesis has a ceiling. If your use case requires the host model to reason directly over raw data — statistical analysis, anomaly detection, complex derived metrics — pre-synthesis may discard information the model needs.&lt;/p&gt;

&lt;p&gt;Conclusion&lt;br&gt;
The Model Context Protocol has solved the integration problem. The next challenge is the orchestration problem: making multi-tool queries feel effortless, economical, and intelligent.&lt;br&gt;
Embedded LLM routing moves intelligence to where the tools live, keeping the host model's context clean, reducing token costs dramatically, and enabling semantic routing accuracy that keyword search cannot match.&lt;br&gt;
The best AI integrations are ones where the routing infrastructure is invisible. Users ask natural questions and receive coherent answers drawn from exactly the right sources. The complexity lives in the server. The conversation stays clean.&lt;/p&gt;

&lt;p&gt;CorpusIQ connects 50+ business tools into a single AI conversation. corpusiq.io&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Smart MCP</title>
      <dc:creator>benoit Pecqueur</dc:creator>
      <pubDate>Tue, 07 Apr 2026 02:08:40 +0000</pubDate>
      <link>https://dev.to/benoit_pecqueur_7a5bf1a2f/smart-mcp-2cjl</link>
      <guid>https://dev.to/benoit_pecqueur_7a5bf1a2f/smart-mcp-2cjl</guid>
      <description>&lt;p&gt;T E C H N I C A L D E E P D I V E · M C P  A R C H I T E C T U R E&lt;/p&gt;

&lt;p&gt;Intelligent Request Routing in MCP Servers Using Embedded LLMs&lt;br&gt;
How a small, purpose-built language model inside your MCP server can slash token costs, eliminate client-side orchestration complexity, and deliver dramatically smarter multi-tool responses.&lt;/p&gt;

&lt;p&gt;CorpusIQ Engineering · April 2026 · ~14 min read&lt;/p&gt;

&lt;p&gt;To understand why this matters, we need to look at how the two dominant AI platforms handle multi-tool orchestration today — and why both approaches leave significant problems on the table.&lt;/p&gt;

&lt;p&gt;§02 — Current Approaches&lt;br&gt;
How Today's Platforms Handle Multi-Tool Orchestration&lt;/p&gt;

&lt;p&gt;Before describing what a better solution looks like, it's worth mapping the terrain of existing approaches. Two patterns dominate the market, each reflecting the architectural philosophy of the platform that popularized it.&lt;/p&gt;

&lt;p&gt;Figure 1. Two routing architectures compared. Left: client-side orchestration (e.g. ChatGPT / Custom GPTs) floods the host LLM context with raw tool results. Right: the embedded LLM router handles all orchestration server-side, returning a single synthesized response.&lt;/p&gt;

&lt;p&gt;The Client-Side Approach: Powerful but Expensive&lt;br&gt;
Platforms like OpenAI's Custom GPT framework and many LangChain-based implementations push the orchestration problem to the client — meaning the host LLM itself. When a user asks a complex question, the assistant examines every available tool, decides which to call, executes them in sequence or parallel, receives all the raw results, and then synthesizes a response.&lt;br&gt;
This works. It also consumes enormous amounts of context window. When you have fifteen connected data sources and each tool call returns even a modest payload, you can easily exceed 20,000 tokens just in tool result data before the model writes a single word of response. At scale, this is economically and latency-wise untenable.&lt;br&gt;
The Search-Based Approach: Smarter but Incomplete&lt;br&gt;
Some architectures address this by exposing a search mechanism that lets the model query for relevant tools using keywords, load only the matched tools, and proceed. This reduces the tool list in context — but keyword search misses semantic intent. If a user asks "What's making our revenue drop?" and the relevant tool is named GET_SHOPIFY_ORDER_SUMMARY, the semantic distance between the query vocabulary and the tool name may cause the search to miss it entirely. The host LLM still receives raw tool outputs, so the token cost of results hasn't been addressed — only the cost of tool selection.&lt;/p&gt;

&lt;p&gt;§03 — The Architecture&lt;br&gt;
The Embedded Router: An LLM Inside Your MCP Server&lt;br&gt;
The insight at the core of our approach is simple to state and non-obvious to implement: move the routing intelligence inside the MCP server itself. Instead of asking the host LLM to figure out which of your twenty tools to call, embed a small, fast, purpose-built language model inside the server that handles this decision transparently.&lt;br&gt;
From the host LLM's perspective, it makes a single tool call. From the user's perspective, they get a synthesized answer that draws from exactly the right data sources. Everything in between — tool selection, parallel execution, result synthesis, error handling — happens inside the MCP server, invisible to the host model.&lt;/p&gt;

&lt;p&gt;Figure 2. Full architecture of the embedded LLM routing system, spanning the host layer, the MCP server layer (your implementation), and the data layer. The host LLM makes a single MCP tool call; everything inside the server boundary — intent classification, tool plan generation, parallel execution with error handling, and synthesis — is handled by the server, backed by a tool registry with schema embeddings over connectors such as HubSpot (CRM), Shopify (eCommerce), Klaviyo (Email/SMS), and QuickBooks (Accounting).&lt;/p&gt;

&lt;p&gt;This architecture has a profound implication: the host LLM's context window is never contaminated with raw tool outputs. It sends a natural language question. It receives a clean, synthesized answer. The routing complexity — which was previously the host model's problem — is fully encapsulated: 1 MCP call from the host LLM fans out to N tools called internally (in parallel) and collapses back to 1 synthesized response returned.&lt;/p&gt;

&lt;p&gt;§04 — The Routing Mechanism&lt;br&gt;
How the Embedded Router Classifies Intent&lt;br&gt;
The embedded router is not a general-purpose LLM. It is a small, fast model that has been given a precise and narrow job: read a natural language query, reason about which tools are needed to answer it, and produce a structured execution plan. It does not write prose. It does not explain itself. It outputs a deterministic JSON plan that the execution engine can act on immediately.&lt;br&gt;
The router operates against a tool schema registry — a compact representation of every available tool that includes not just the tool's name and parameters, but semantic descriptions of what kinds of questions it can answer, what data it returns, and what its relationship is to other tools in the system.&lt;/p&gt;

&lt;p&gt;"Why did our email revenue drop last month compared to the month before?"&lt;/p&gt;

&lt;p&gt;Embedded router LLM — reasoning:&lt;/p&gt;

&lt;p&gt;• Temporal comparison → need date-filtered metrics&lt;br&gt;
• "Email revenue" → campaign + flow performance tools&lt;br&gt;
• "Drop" implies comparative analysis, not just a snapshot&lt;/p&gt;

&lt;p&gt;Router output — structured execution plan:&lt;br&gt;
{&lt;br&gt;
  "tools": ["get_klaviyo_campaign_revenue", "get_klaviyo_flow_performance"],&lt;br&gt;
  "params": { "date_range": "last_2_months", "compare": true },&lt;br&gt;
  "synthesis_goal": "revenue_comparison_with_delta"&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;Figure 3. The embedded router analyzes query intent, maps it to semantic tool categories,&lt;br&gt;
and outputs a structured execution plan — all before a single external API call is made.&lt;/p&gt;

&lt;p&gt;Query surface forms:&lt;br&gt;
"Why is revenue dropping?"&lt;br&gt;
"Sales are lower, what happened?"&lt;br&gt;
"Compare this week vs last week"&lt;br&gt;
"Show me the revenue trend"&lt;br&gt;
"Our numbers look off since the email campaign ended"&lt;/p&gt;

&lt;p&gt;All resolve to the intent REVENUE + COMPARE, which maps to get_shopify_order_summary (date_range: "last_2_months"), get_klaviyo_campaign_revenue (compare: true), and get_ga4_realtime (session + conversion data). Semantic, not keyword: no query word appears in any tool name — the match is on intent.&lt;/p&gt;

&lt;p&gt;Figure 4. Semantic intent classification: five different phrasings of the same underlying question all collapse to the same intent category and tool calls — something keyword search cannot reliably achieve.&lt;/p&gt;

&lt;p&gt;§05 — Token Economics&lt;br&gt;
The Token Cost Advantage: A Detailed Breakdown&lt;br&gt;
Let's put concrete numbers to this. Assume a deployment with fifteen connected data sources, where the average tool call returns 1,500 tokens of raw output. A complex multi-source question might require calling five tools.&lt;/p&gt;

&lt;p&gt;Estimated token consumption — multi-source query (5 tools needed):&lt;br&gt;
Client-side Orchestration (all 15 tools in context): 38,000 tokens&lt;br&gt;
Search-based Routing (search + results): 18,000 tokens&lt;br&gt;
Embedded LLM Router: ~4,000 tokens — an 89% reduction vs client-side&lt;/p&gt;

&lt;p&gt;Figure 5. Comparative token consumption across routing approaches for a query requiring 5 tool calls (15 tools connected, average 1,500 tokens per result). The embedded router keeps the host LLM context clean — it only sees the synthesized answer, not raw tool outputs.&lt;/p&gt;

&lt;p&gt;The key insight: in the embedded router approach, raw tool outputs never enter the host LLM's context window. The MCP server consumes them internally. The host only receives the final synthesized answer — typically 300–600 tokens regardless of how many tools were called. Token costs grow sub-linearly with the number of connected data sources.&lt;/p&gt;

&lt;p&gt;Comparison (Client-Side / Search-Based / Embedded Router):&lt;br&gt;
Host LLM context pollution: High — all raw results / Medium — matched results / None — synthesized only&lt;br&gt;
Semantic routing accuracy: High (LLM decides) / Low (keyword match) / High (LLM decides)&lt;br&gt;
Round-trips for routing: 1 (then parallel calls) / 2–3 (search → load → call) / 0 (internal to server)&lt;br&gt;
Cross-tool synthesis quality: High / Medium / High&lt;br&gt;
Scales to 20+ tools: Poorly / Moderately / Well&lt;br&gt;
Implementation complexity: Low (client handles it) / Medium / High (your server must)&lt;/p&gt;

&lt;p&gt;§06 — Request Lifecycle&lt;br&gt;
Anatomy of a Routed Request&lt;br&gt;
Let's trace a complete request from user query to synthesized response, examining each stage and what decisions are made along the way.&lt;/p&gt;

&lt;p&gt;1. Query Ingestion (~0ms): Host LLM calls MCP server endpoint with raw natural language query&lt;br&gt;
2. Intent Classification (~200ms): Embedded router LLM reads query + tool schema registry, generates plan&lt;br&gt;
3. Parallel Execution (~800ms): Selected tools are called concurrently; errors in any single tool are caught without aborting the entire plan&lt;br&gt;
4. Result Synthesis (~1,100ms): Router LLM merges all tool outputs into a single coherent answer&lt;br&gt;
5. Response Returned (~1,400ms): Host LLM receives a clean, structured answer — no raw payloads ever exposed&lt;/p&gt;

&lt;p&gt;Figure 6. Complete request lifecycle. Timing estimates assume average API latency. The parallel execution phase (stage 3) is the dominant cost — running N tools in parallel rather than serially is critical to keeping total latency acceptable.&lt;/p&gt;

&lt;p&gt;Design Principle: Graceful Degradation&lt;br&gt;
When a tool call fails during parallel execution — a rate limit, a downstream API timeout — the embedded router synthesizes the best possible answer from the tools that succeeded. It explicitly annotates the gap: "Revenue data from Shopify is unavailable — API timeout. Klaviyo email attribution and GA4 session data were retrieved successfully." This transparency is far more useful than a generic error.&lt;/p&gt;

&lt;p&gt;§07 — Schema Design&lt;br&gt;
Designing Tool Schemas That the Router Can Reason About&lt;/p&gt;

&lt;p&gt;The quality of routing is only as good as the schema information the router has access to. Writing tool schemas that are simultaneously machine-parseable and semantically rich enough for an LLM to reason about is perhaps the most underappreciated engineering challenge in this architecture.&lt;br&gt;
Standard MCP tool schemas describe what parameters a function takes. Our routing schemas also describe what kinds of questions the tool can answer, which other tools it is complementary to, and what data it returns in plain language.&lt;/p&gt;

&lt;p&gt;Standard MCP schema — adequate for execution, poor for routing:&lt;br&gt;
{&lt;br&gt;
  "name": "get_shopify_order_summary",&lt;br&gt;
  "description": "Get summary of recent orders",&lt;br&gt;
  "inputSchema": { "start_date": "string", "end_date": "string" }&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;Routing-aware schema — enables semantic classification:&lt;br&gt;
{&lt;br&gt;
  "name": "get_shopify_order_summary",&lt;br&gt;
  "intent_tags": ["revenue", "sales", "ecommerce", "orders", "trend"],&lt;br&gt;
  "answers_questions_like": [&lt;br&gt;
    "How much did we sell this month?",&lt;br&gt;
    "Why is revenue dropping?",&lt;br&gt;
    "What is our average order value?"&lt;br&gt;
  ],&lt;br&gt;
  "complementary_tools": ["get_klaviyo_campaign_revenue", "run_ga4_report"],&lt;br&gt;
  "returns": "Revenue total, order count, AOV, top products by date range"&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;The complementary_tools field is particularly powerful. It teaches the router which tools are frequently useful together, enabling it to proactively build multi-tool execution plans for queries that only mention one domain but would benefit from correlated data.&lt;/p&gt;

&lt;p&gt;Caution: Schema Drift&lt;br&gt;
Tool schemas are live documents. When you add a new data source, update an existing tool's return signature, or deprecate a connector, the router's schema registry must be updated in lockstep. Stale schemas are a silent failure mode — the router makes plausible-sounding decisions based on outdated information. Automated schema validation as part of your CI pipeline is not optional; it's foundational.&lt;/p&gt;

&lt;p&gt;§08 — Why It Matters&lt;br&gt;
What This Architecture Unlocks&lt;/p&gt;

&lt;p&gt;The embedded LLM router is not an optimization — it's an architectural shift that changes what's possible in multi-tool AI deployments.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Context Window Preservation: The host LLM's context remains available for the actual conversation. When tool outputs don't consume the context window, the model can maintain longer histories, follow more complex instructions, and produce higher-quality synthesized responses.&lt;/li&gt;
&lt;li&gt;Unlimited Tool Scale: With traditional approaches, adding a fifteenth or twentieth tool increases context window pressure significantly. With embedded routing, each new tool adds a small entry to the schema registry — but the host LLM never sees it unless it's needed. Fifty connected data sources remain performant on simple queries.&lt;/li&gt;
&lt;li&gt;Cross-Source Synthesis Without Client-Side Complexity: Correlating data from Klaviyo, Shopify, and Google Analytics previously required careful client-side orchestration. The embedded router handles this transparently — it recognizes multi-source queries, calls all relevant tools, and returns a single synthesized answer.&lt;/li&gt;
&lt;li&gt;Reusable Routing Intelligence Across Host Models: Because routing logic lives in the MCP server, you can swap host LLMs — Claude, GPT-4, an open-source model — without rebuilding orchestration. The server presents identical behavior regardless of what's calling it.&lt;/li&gt;
&lt;li&gt;Observability and Control: Every routing decision is logged inside the MCP server. You can audit which tools were selected for which queries, identify misrouting patterns, improve schemas, and tune behavior — none of which is possible when routing happens inside the host LLM's opaque reasoning.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;"The paradox of multi-tool AI is that more capability creates more complexity —&lt;br&gt;
unless you have a layer that absorbs that complexity before it reaches the host&lt;br&gt;
model. The embedded router is that layer."&lt;/p&gt;

&lt;p&gt;§09 — Honest Assessment&lt;br&gt;
Trade-offs and When Not to Use This Pattern&lt;br&gt;
Architectural honesty requires acknowledging where this pattern adds cost and complexity. The embedded router is not a free lunch.&lt;/p&gt;

&lt;p&gt;Operational Overhead&lt;br&gt;
Running an embedded LLM inside your MCP server means running inference infrastructure. For low-traffic deployments, this cost may outweigh the token savings from optimized routing. The pattern becomes economical at scale — typically when you have more than five connected tools and handle a meaningful daily query volume.&lt;/p&gt;

&lt;p&gt;When Simpler Approaches Are Better&lt;br&gt;
If your MCP server exposes only 2–4 tools covering distinct, non-overlapping domains, the routing problem is trivial. A simple rule-based dispatcher or exposing all tools to the host LLM is perfectly adequate. The embedded router pays dividends when you have many tools with overlapping semantic domains — where a question about "revenue" might reasonably involve three different systems.&lt;/p&gt;

&lt;p&gt;The schema maintenance burden is also real. Every tool must have a well-written, semantically rich schema. This requires ongoing attention as tools evolve, connectors are added, and the vocabulary of user queries shifts. Teams that underinvest in schema quality will find routing accuracy degrading silently and without obvious error messages.&lt;br&gt;
Finally, pre-synthesis inside the server means you are making opinionated decisions about how to combine and present data before it reaches the host LLM. In most cases this is desirable. But if your use case requires the host model to reason directly over raw data — statistical analysis, anomaly detection, complex derived metrics — pre-synthesis may discard information the model needs.&lt;/p&gt;

&lt;p&gt;§10 — Conclusion&lt;br&gt;
The Next Frontier in MCP Server Design&lt;br&gt;
The Model Context Protocol has solved the integration problem — connecting AI to the tools and data sources it needs. The next challenge is the orchestration problem: making multi-tool queries feel effortless, economical, and intelligent.&lt;/p&gt;

&lt;p&gt;Embedded LLM routing moves intelligence to where the tools live, keeping the host model's context clean, reducing token costs dramatically, and enabling semantic routing accuracy that keyword search cannot match. It is not the only path forward — but it is the pattern that we have found most scalable as the number of connected data sources grows from five to twenty to fifty.&lt;br&gt;
The best AI integrations are ones where the routing infrastructure is invisible. Users ask natural questions and receive coherent answers drawn from exactly the right sources. The complexity lives in the server. The conversation stays clean.&lt;/p&gt;

&lt;p&gt;CORPUSIQ ENGINEERING · APRIL 2026 · ALL ARCHITECTURES DESCRIBED HEREIN ARE PROPRIETARY&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
