Classic MCP loads every tool schema on every request, driving up costs. Code Mode in Bifrost cuts input tokens by 92.8% by letting AI orchestrate through Python.
Building AI agents that connect to dozens of external tools creates a hidden cost: each request loaded with 150+ tool schemas, consuming most of the model's token budget just parsing definitions instead of solving the task. This is the central tradeoff between classic MCP and Code Mode. Bifrost, the open-source AI gateway built in Go by Maxim AI, gives teams both options, letting them pick based on deployment scale and workflow complexity. Understanding when each approach makes sense can cut costs by orders of magnitude.
The Default Approach: Direct Tool Calling
Classic MCP tool calling is how most MCP implementations work out of the box. The gateway pushes all connected tool definitions into the model's context before each request arrives, the model generates tool-call suggestions, and the application handles execution separately through an explicit approval step.
Tool execution in Bifrost follows a deliberate, stateless flow: each chat request asks the model what to do, returns the suggestions without running them, lets the application review for safety, and then executes the approved actions via a separate call before resuming conversation. Bifrost simultaneously acts as an MCP client connecting to external servers via STDIO, HTTP, or SSE, and as an MCP server exposing those tools to clients such as Claude Desktop. See the MCP overview for the full mechanics.
This design delivers tight control. Tool execution never happens without a reviewer in the loop, all operations are logged, and the calling code maintains conversation state. The downside emerges at scale: as servers multiply, the model rereads the entire tool list on every turn, and every intermediate step in a multi-tool task has to flow back through the context before the next move.
The Alternative: Code Mode
Code Mode flips the model: instead of injecting all tool definitions, the system shows the AI four small meta-tools and lets it write a Python orchestration script that runs in a controlled sandbox. The model sees compact schemas rather than a catalog of 150 definitions.
When Code Mode is enabled in Bifrost, four lightweight tools appear in every request:
- listToolFiles: reveal available MCP servers and their tools
- readToolFile: fetch Python function signatures for a server as needed
- getToolDocs: retrieve full details for a single tool on demand
- executeToolCode: submit Python code for sandbox execution with full tool bindings
The model discovers what's available, pulls the signatures it needs, and writes a short Python script that Bifrost runs via a Starlark interpreter. Any intermediate outputs from tool calls stay sandboxed and never return to the model; only the final result comes back. This mirrors the pattern described by Anthropic's engineering team, which found that a Google Drive to Salesforce integration shrank from 150,000 tokens down to 2,000 when tool calls gave way to code execution. Bifrost embeds this as a first-class gateway feature, picking Python over JavaScript (models see more Python in training data) and including a dedicated doc-lookup tool to compress context even further.
Comparing the Two Patterns
Both approaches operate through the same MCP gateway. The difference lies in what the model sees, how many interaction cycles happen, and where orchestration logic runs. Switching from one to the other is just a per-client configuration toggle.
| Aspect | Classic MCP | Code Mode |
|---|---|---|
| Tool visibility | Complete list in context, repeated per turn | Four meta-tools; definitions fetched on demand |
| Model interaction cycles | One step per tool invocation | 3 to 4 total across whole workflow |
| Where intermediate outputs go | Returned to model | Processed inside the sandbox |
| Where workflow logic lives | Conversation loop | Python script |
| Cost scaling | Grows linearly with tool count | Capped by actually-used files |
| Right for | Few tools, straightforward calls | Many servers, intricate multi-step work |
Classic MCP's simplicity shines for lean tool sets. Code Mode trades that immediacy for resource efficiency: tools stay behind the meta-tool interface, and the model only reads the stubs and docs it actually uses.
The Token Math as Tool Count Climbs
The efficiency gains from Code Mode multiply as deployments scale. At large tool counts, classic MCP cost balloons because the entire catalog reloads every turn, whereas Code Mode cost plateaus based on which stub files the model opens.
Bifrost ran benchmarks across increasing MCP footprints, comparing Code Mode to classic MCP with the same test queries each time. The benchmark writeup shows the separation widening:
- 96 tools across 6 servers: input token drop of 58.2%, cost drop of 55.7%
- 251 tools across 11 servers: input token drop of 84.5%, cost drop of 83.4%
- 508 tools across 16 servers: input token drop of 92.8% (75.1M falling to 5.4M), cost drop from $377 to $29
The largest test maintained a 100% pass rate (65 of 65 succeeded), showing the token reduction came without sacrificing task completion. Per-query math at 500 tools shows roughly a 14x saving, dropping from 1.15M to 83K tokens. These runs also cut LLM round trips from many calls to 3 to 4 total, and accelerated execution by ~40%. Using published benchmarks, teams can test their own MCP configurations.
How Code Mode Executes, Step by Step
Code Mode replaces the back-and-forth loop with a single discover-then-run sequence. Once activated for an MCP client, that client's tools disappear from direct availability and only become accessible through the four meta-tools.
A typical interaction unfolds like this:
-
Discovery: the model calls
listToolFilesto examine available servers -
Schema loading:
readToolFileretrieves the Python signatures for a relevant server -
Documentation lookup: if needed,
getToolDocspulls deeper docs for one tool -
Execution: the model writes Python code that
executeToolCoderuns in the sandbox, returning a compact outcome
The Starlark sandbox intentionally limits what code can do: no import statements, no filesystem or network access, default 30-second timeout. Tools from Code Mode clients appear as global objects in the sandbox; calls run synchronously, so scripts can chain multiple tool operations and return a single aggregated result. Bifrost offers two ways to organize the stub files: server-level puts all tools from one server in one file, while tool-level creates one file per tool for servers with large schemas.
Code Mode is toggled per MCP client. A single Bifrost instance can run some clients in classic mode and others in Code Mode simultaneously, and adding new servers follows the standard connection flow either way. Complete setup instructions and the full meta-tool API are in the Code Mode docs.
Deciding Between Classic MCP and Code Mode
Code Mode wins for deployments with 3+ MCP servers, workflows that require multiple orchestrated steps, or scenarios where token cost and latency are top concerns. Stick with classic calling for single or dual-server setups with straightforward, one-step tool use. The patterns need not be all-or-nothing: advanced teams use Code Mode for heavy-duty servers like web search or data access, and keep direct calling for quick utilities.
What if I only have one or two MCP servers?
The token savings from Code Mode rarely justify the added conceptual overhead with minimal server counts. Classic calling has lower mental complexity and debugging is simpler. Code Mode's payoff accelerates as the tool count climbs.
Will Code Mode slow my agents down?
No. By consolidating multiple tool steps into one sandboxed script, Code Mode actually cuts the count of LLM interactions by 3 to 4 times and typically improves execution speed by 40% in large deployments. Classic MCP can still be preferable for single, immediate tool calls where latency is critical.
Can I run both at once?
Absolutely. Code Mode is configured individually per MCP client, so a Bifrost setup can have some clients in classic mode and others in Code Mode. This approach lets teams migrate the largest, most expensive servers first while keeping lightweight tools on the simpler path.
How secure is Code Mode auto-execution?
Agent Mode lets Code Mode scripts run without human approval, but with safeguards. Bifrost extracts every tool reference from the code and executes only if all of them are on the allowlist; any off-list calls bounce back for approval. Paired with granular tool filtering per virtual key, this keeps autonomous runs inside configured guardrails. For compliance-heavy teams and regulated industries, Bifrost operates as a full-featured MCP gateway with access logs, cost attribution, and tool-level permissions.
Next Steps
Picking between classic MCP and Code Mode fundamentally depends on scale. For small tool inventories, classic calling is clear and direct. For agents managing many connected servers, Code Mode constrains token consumption and latency even as the catalog expands, while sustaining reliability. Both ride the same infrastructure; toggling Code Mode happens once servers connect via the Bifrost quickstart. The MCP specification provides full protocol details for teams digging deeper into the standard itself.
Ready to optimize MCP orchestration and governance at scale? Schedule a demo with the Bifrost team, or browse the Bifrost resources hub for benchmarks and deployment patterns.
Top comments (0)