A side-by-side look at Classic MCP vs MCP Code Mode on context footprint, token cost, latency, and accuracy, plus how Bifrost runs Code Mode at scale.
Most production agents aren't wired to a single MCP server. A typical stack stitches together search, filesystem, CRM, and several more, with each connected server pushing tool definitions into the model's context window on every turn. Every Classic MCP vs MCP Code Mode conversation eventually lands on the same three questions: when are tools loaded, how are they invoked, and what happens to intermediate results? Bifrost, the open-source AI gateway built by Maxim AI, runs both patterns through its MCP gateway, enabled per client. The comparison below is anchored in the Model Context Protocol specification along with published work from Anthropic and Cloudflare.
How Classic MCP Tool Calling Works
The default execution model in the Model Context Protocol is what most teams call Classic MCP. Discovery happens up front: the client sends tools/list to each connected server, gets back JSON Schema definitions for every tool, and loads those definitions into the model's context. Invocation follows the same loop turn by turn. When the model picks a tool, the client issues a tools/call with the chosen arguments, waits for the result, and writes that result back into the conversation before the next turn. The full discovery-and-invocation sequence is laid out in the MCP tools specification.
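The two JSON-RPC methods that drive this loop come straight from the MCP tools specification; the sketch below shows their shape. The tool name and arguments are invented for illustration.

```python
import json

# tools/list: sent once per server at discovery time; the response
# carries JSON Schema definitions that get loaded into model context.
list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# tools/call: issued once per model turn with the chosen tool and
# arguments; the result is written back into the conversation.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_docs",  # hypothetical tool name
        "arguments": {"query": "quarterly report"},
    },
}

print(json.dumps(list_request))
print(json.dumps(call_request))
```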
With a small tool catalog, the pattern is tidy. As the catalog grows, costs climb quickly. The defining traits:
- Every tool definition stays in context, every turn: all connected servers' tools are loaded before the loop starts and remain resident through the entire agent run.
- Serial tool invocation: each model turn produces exactly one tool call, the client runs it, and the result has to return before the model can pick a second tool.
- The model sees every intermediate payload: results of any size are serialized straight back into the conversation, large ones included.
- Token cost compounds with server count: wire in ten servers of fifteen tools each and the model is carrying 150 tool definitions on every request.
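That last point is simple arithmetic. A rough sketch, assuming an average per-definition size (the 600-token figure is an illustrative assumption, not a measured value):

```python
# Back-of-envelope context cost under Classic MCP: every connected
# server's tool definitions ride along on every request.
TOKENS_PER_TOOL_DEF = 600  # assumed average for one JSON Schema definition

def catalog_tokens(servers: int, tools_per_server: int) -> int:
    # Cost scales linearly with the total tool count.
    return servers * tools_per_server * TOKENS_PER_TOOL_DEF

# Ten servers of fifteen tools each: 150 definitions on every turn.
print(catalog_tokens(10, 15))  # → 90000 tokens before any reasoning happens
```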
The MCP Code Mode Pattern
In MCP Code Mode, the "one call per turn" loop is replaced by a "write code that orchestrates tools" loop. The gateway no longer hands the model a full tool catalog. Instead, it presents a small set of meta-tools that let the model discover what's available on demand and then submit a single script that chains multiple calls together inside a sandbox. Cloudflare introduced the pattern in their Code Mode post, built around a straightforward premise: LLMs have been trained on far more real-world code than synthetic tool-calling sequences, which is why they handle messy multi-step workflows more dependably when asked to write code. Anthropic's engineering team ran a parallel study on code execution with MCP, and their results showed a Google Drive to Salesforce workflow collapsing from about 150,000 tokens to 2,000 under the same approach.
Mechanically, Code Mode rests on four ideas:
- Lazy tool discovery: servers are listed first, and only the tools the model actually plans to call get their compact stub signatures loaded.
- Sandbox-side orchestration: a short script written by the model chains multiple tool calls server-side, keeping the sequence off the conversation.
- Local intermediate results: the model's context only receives the final output; everything between stays inside the sandbox.
- Bounded context footprint: total cost tracks what the model reads, not how large the underlying tool catalog happens to be.
Classic MCP vs Code Mode, Dimension by Dimension
On every axis that matters to production economics, the two patterns diverge. The breakdown below reflects how Bifrost's Code Mode is built along with the public measurements from Anthropic and Cloudflare:
| Dimension | Classic MCP | MCP Code Mode |
|---|---|---|
| Tools loaded into context | Entire catalog, on every turn | Four meta-tools plus stubs fetched on demand |
| Orchestration pattern | Single tool call per turn | One script calling many tools, in one turn |
| Intermediate results | Routed through model context | Kept inside the sandbox |
| Typical round trips (multi-step) | Roughly 6 to 10 turns | Roughly 3 to 4 turns |
| How token cost scales | Linearly with server count | Flat; tied to reads, not catalog size |
| Common failure modes | Tool misselection, context overflow | Script errors, sandbox timeouts |
| Where it fits best | 1 or 2 small servers, direct calls | 3 or more servers, chained workflows |
The gap is narrow at small scale and opens quickly as workloads grow. Controlled MCP gateway benchmarks from Bifrost recorded a 58 percent drop in token usage at 96 tools and a 92 percent drop at 508 tools, while pass rate stayed at 100 percent across all three rounds. On the API side, Cloudflare measured something similar: their Code Mode MCP server exposed 2,500 endpoints in roughly 1,000 tokens, against more than 1.17 million under the classic pattern.

Where Classic MCP Still Makes Sense
Classic MCP has not been retired. For workloads that match its shape, it's often the simpler and faster option:
- A couple of small servers: the fixed cost of spinning through Code Mode's meta-tool cycle isn't justified for a handful of tools.
- One-shot, direct calls: a weather lookup or a single record fetch is exactly one invocation, and code orchestration adds no value there.
- Hard latency budgets: Code Mode is generally faster on multi-step work, but for a simple one-shot call Classic MCP skips the extra parse and sandbox step.
- Workflows that require explicit per-call human approval: Classic MCP lines up cleanly with manual approval gates, without the extra validation that Code Mode layers on.
Small utility servers can stay on Classic MCP while heavier ones move to Code Mode. Because Bifrost enables Code Mode per client rather than globally, that trade-off is made server by server, not once for the whole gateway.
Where MCP Code Mode Pulls Ahead
As the tool surface expands, Code Mode starts earning its added complexity. It becomes the stronger default when:
- Three or more MCP servers are connected at once: under Classic MCP, each added server adds definitions to every request linearly. Code Mode holds the cost flat regardless.
- Workflows chain multiple tools together: a lookup into a join into a filter into a write takes four round trips in Classic MCP and often a single script execution in Code Mode.
- Intermediate payloads are large: reading a document and writing to another is the exact scenario behind Anthropic's 150,000-to-2,000-token benchmark.
- Bills are dominated by token spend: when tool definitions are eating more of the request budget than actual reasoning, Code Mode goes after that waste head-on.
Efficiency does not come at the expense of accuracy here. Pass rate held at 100 percent in Bifrost's controlled benchmarks, with Code Mode on and off, across every tool-count tier tested.
Inside Bifrost's MCP Code Mode Implementation
Code Mode in Bifrost is native to the gateway, not bolted on as a plugin or wrapper. Four meta-tools are exposed to the model:
- listToolFiles: enumerate the virtual .pyi stub files across every connected Code Mode server.
- readToolFile: pull the compact Python function signatures for a chosen server or tool.
- getToolDocs: retrieve full documentation for a single tool when the compact signature isn't sufficient.
- executeToolCode: execute the orchestration script against live tool bindings.
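A typical cycle runs discover, read, execute, with getToolDocs pulled in only when a compact signature isn't enough. A hypothetical trace (server and tool names invented):

```python
# Illustrative trace of the Code Mode meta-tool cycle. The arguments
# and stub file name are assumptions, not Bifrost's exact wire format.
cycle = [
    ("listToolFiles", {}),                          # enumerate .pyi stubs
    ("readToolFile", {"file": "crm_server.pyi"}),   # load compact signatures
    ("executeToolCode", {"code": "result = crm.lookup('acct-42')"}),
]

for name, args in cycle:
    print(name, args)
```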
Execution happens inside an embedded Starlark interpreter, a deterministic Python subset with no imports, no file I/O, and no network access. The constraint is intentional: the sandbox exists to call tools and process their outputs, and nothing else. Bindings can be configured at either the server or tool level, so a single stub per server works for compact discovery, while one stub per tool helps when servers carry dozens of tools and per-read context budgets get tight. Code Mode plays well with the rest of Bifrost's MCP stack, including Agent Mode auto-execution, tool filtering, and per-consumer scoping via virtual keys.
Auto-execution rules are stricter in Code Mode than in Classic MCP. The submitted script is parsed, every tool call is extracted, and each one is checked against the per-server auto-execute allowlist. A single call outside that allowlist routes the whole script to manual approval. This closes the obvious loophole where the sandbox could otherwise be used to run tool invocations that would have been rejected under Classic MCP.
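A minimal sketch of that check, using Python's standard ast module as a stand-in parser (Bifrost's actual implementation walks its own Starlark AST; the allowlist entries here are invented):

```python
import ast

# Hypothetical per-server auto-execute allowlist.
ALLOWLIST = {"crm.lookup", "drive.read"}

def call_names(script: str) -> set:
    """Collect every function call name appearing in the script."""
    names = set()
    for node in ast.walk(ast.parse(script)):
        if isinstance(node, ast.Call):
            f = node.func
            if isinstance(f, ast.Attribute) and isinstance(f.value, ast.Name):
                names.add(f"{f.value.id}.{f.attr}")
            elif isinstance(f, ast.Name):
                names.add(f.id)
    return names

def auto_executable(script: str) -> bool:
    # One call outside the allowlist routes the whole script to approval.
    return call_names(script) <= ALLOWLIST

print(auto_executable("r = crm.lookup('acct-42')"))  # True
print(auto_executable("crm.delete_all()"))           # False -> manual approval
```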
Picking the Right MCP Pattern for Your Agent Stack
The practical read on this Classic MCP vs Code Mode comparison is that the two patterns complement each other rather than compete. For small tool catalogs and one-shot workflows, Classic MCP is still the correct default. Once token cost, latency, and context bloat start to dominate in multi-server agent workflows, MCP Code Mode becomes the better default. Bifrost runs both, which lets teams flip the switch per client and migrate gradually as their MCP footprint keeps growing. Teams evaluating the broader gateway trade-offs alongside this MCP-level choice can walk through the LLM Gateway Buyer's Guide for a full capability matrix.
To watch Bifrost's MCP gateway run Code Mode over your own tool catalog, with access control, audit logging, and per-tool cost tracking in place, book a demo with the Bifrost team.