Code Mode in Bifrost MCP Gateway has AI agents write Python scripts to orchestrate tools, trimming token usage by up to 92% while fully preserving pass rate.
Rather than injecting every tool definition into the model's prompt on each request, Code Mode in Bifrost MCP Gateway takes a different route to agent execution. It keeps the exposed surface area small: four lightweight meta-tools, plus a short Python (Starlark) script that the model writes to orchestrate the work. Controlled benchmarks covering 500+ tools have shown input token reductions reaching 92.8%, with pass rate holding steady at 100%. For any team operating production AI agents across several Model Context Protocol servers, that gap decides whether the monthly AI bill stays manageable or spirals.
Code Mode in Bifrost MCP Gateway Explained
Code Mode in Bifrost MCP Gateway shifts orchestration from one-shot function calls to model-written Python. Rather than invoking each MCP tool separately through the usual function-calling interface, the model produces a single script that strings the calls together. Bifrost presents the connected MCP servers as a virtual filesystem of Python stub files, using .pyi signatures, which the model browses on demand. After locating only the relevant tools, the model drafts its script, and Bifrost runs it inside a sandboxed Starlark interpreter. The model's context receives only the final output, not the intermediate steps.
Context bloat shows up almost immediately once a team wires more than a few MCP servers into an agent. The conventional MCP flow pushes every tool definition from every connected server into the prompt on every single turn. Do the math for 5 servers with 30 tools apiece and the agent is already carrying 150 schemas before the user's message has even been parsed. Code Mode severs that link: prompt cost scales with what the model actually opens, not with the total size of the tool registry.
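The scaling difference can be sketched in a few lines. The per-schema token counts below are illustrative assumptions, not measured Bifrost numbers; the point is the shape of the two cost curves, not the constants.

```python
# Back-of-envelope model of prompt overhead in each mode.
TOKENS_PER_SCHEMA = 300          # assumed average size of one tool definition
META_TOOL_OVERHEAD = 4 * 150     # four meta-tools, assumed ~150 tokens each

def classic_prompt_tokens(servers, tools_per_server):
    # Classic MCP: every schema from every server rides along on every turn.
    return servers * tools_per_server * TOKENS_PER_SCHEMA

def code_mode_prompt_tokens(tools_actually_read):
    # Code Mode: cost tracks only the stubs the model chooses to open.
    return META_TOOL_OVERHEAD + tools_actually_read * TOKENS_PER_SCHEMA

print(classic_prompt_tokens(5, 30))   # 150 schemas carried on every turn
print(code_mode_prompt_tokens(3))     # 3 stubs read on demand
```

Under these assumptions the classic path pays for 150 schemas per turn while Code Mode pays only for the three stubs the model opened; adding a sixth server changes the first number but not the second.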
The Cost Problem Baked Into Default MCP Execution
The conventional MCP setup asks the gateway to push every available tool schema into every LLM request. That approach works fine for demos and proof-of-concepts. Once it hits production, three failure modes surface:
- Per-server token costs stack. The classic MCP path ships the full tool catalog on each request and on every intermediate turn of the agent loop. Connecting more servers compounds the charge rather than amortizing it.
- Bigger prompts mean slower responses. Extensive tool lists inflate prompt length, which pushes up time-to-first-token and stretches end-to-end request latency.
- Pruning the tool list isn't a real fix. Trimming capability to save tokens just redistributes the problem. Teams wind up managing multiple narrow tool sets across different agents.
Public work has already put numbers on these failures. Anthropic's engineering team documented a workflow that went from 150,000 tokens to 2,000 when tool calls were swapped for code execution on a Google Drive to Salesforce pipeline, and Cloudflare explored a comparable approach using a TypeScript runtime. Code Mode applies the same core idea, baking it directly into the Bifrost MCP gateway with two deliberate design calls: Python rather than JavaScript (LLMs see substantially more Python during training) and a dedicated documentation meta-tool that trims prompt size further.
Inside Code Mode: The Four Meta-Tools That Power It
Turning Code Mode on at the client level triggers Bifrost to attach four generic meta-tools to every request, taking the place of the direct tool schemas that would otherwise show up in context.
| Meta-tool | Purpose |
|---|---|
| `listToolFiles` | Discover which servers and tools are available as virtual `.pyi` stub files |
| `readToolFile` | Load compact Python function signatures for a specific server or tool |
| `getToolDocs` | Fetch detailed documentation for a specific tool before using it |
| `executeToolCode` | Run an orchestration script against the live tool bindings |
Navigation happens on demand: the model lists the stub files, pulls in only the signatures it actually plans to use, optionally reaches for detailed docs on a specific tool, then composes a short Python script that Bifrost runs in the sandbox. Two binding granularities are available, server-level and tool-level; one stub per server keeps discovery compact, while one stub per tool supports more targeted lookups. Both share the same four-tool interface. Configuration details across both modes live in the Code Mode configuration reference.
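The discovery sequence above can be mocked in a few lines. The stub registry, file names, and signatures here are invented for illustration; the real `listToolFiles`, `readToolFile`, and `getToolDocs` live inside Bifrost, not in user code.

```python
# Runnable mock of the on-demand navigation flow: list stubs, read one,
# optionally fetch docs. Server-level binding: one .pyi stub per server.
STUBS = {
    "crm.pyi": "def get_customer(email: str) -> dict: ...\n"
               "def apply_discount(customer_id: int, percent: int) -> dict: ...",
    "search.pyi": "def query(q: str, limit: int = 10) -> list: ...",
}
DOCS = {"crm.get_customer": "Look up a customer record by email address."}

def list_tool_files():
    return sorted(STUBS)              # discovery: which stubs exist

def read_tool_file(name):
    return STUBS[name]                # compact signatures only, no schemas

def get_tool_docs(tool):
    return DOCS.get(tool, "")         # detailed docs, fetched only on demand

# The model lists everything, but loads only the stub it plans to use.
files = list_tool_files()
signatures = read_tool_file("crm.pyi")
doc = get_tool_docs("crm.get_customer")
print(files)
print(doc)
```

Only the one stub the model opened enters its context; the `search.pyi` signatures never get loaded unless a later step needs them.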
Inside the Sandbox: Boundaries of Generated Code
Execution runs inside a Starlark interpreter, a deterministic Python-like language originally built at Google for build system configuration. The sandbox is intentionally narrow:
- No imports
- No file I/O
- No network access
- Only tool calls against the allowed bindings and basic Python-like logic
The result is fast, deterministic execution that is safe to run under Agent Mode with auto-execution on. Because they are read-only, the three meta-tools listToolFiles, readToolFile, and getToolDocs can always be auto-executed. executeToolCode clears the auto-execution bar only when every tool referenced in the generated script appears on the configured allow-list.
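A sketch of what survives those rules, written in the Python-compatible subset Starlark accepts. `fetch_orders` is a stand-in for a tool binding that Bifrost would inject; it is stubbed here so the snippet runs as plain Python.

```python
def fetch_orders(customer_id):
    # In the sandbox this would be a live tool binding, not a local function.
    return [{"id": 1, "total": 40}, {"id": 2, "total": 90}]

# Allowed: tool calls, conditionals, list comprehensions, dict/list logic.
orders = fetch_orders(42)
big = [o for o in orders if o["total"] > 50]
result = {"count": len(big), "grand_total": sum([o["total"] for o in big])}

# Not allowed inside the sandbox (each would be rejected):
#   import json                    -> no imports
#   open("out.txt", "w")           -> no file I/O
#   urllib.request.urlopen(...)    -> no network access
print(result)
```

Note the `sum([...])` form: Starlark supports list comprehensions but not generator expressions, so scripts aimed at the sandbox stay within that subset.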
Code Mode's Token Savings in Real Workloads
Picture a multi-step e-commerce task: pull up a customer, review their order history, apply a discount, and fire off a confirmation. What separates classic MCP from Code Mode isn't only the final output; it's the entire shape of the context the model sees.
Classic MCP flow: Every turn drags along the full tool list. Each intermediate tool result loops back through the model. Once a workload is running 10 MCP servers with 100+ tools, the bulk of every prompt is being spent on tool definitions.
Code Mode flow: The model pulls one stub file, writes a single script that chains the calls together, and the script runs in the Bifrost sandbox. Intermediate results never leave the sandbox. Only the compact final output returns to the model's context.
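The e-commerce flow above might compile into a script like the following. All four tool functions are local stand-ins for MCP tool bindings; their names and shapes are assumptions for illustration, not Bifrost's actual surface.

```python
# Stubbed tool bindings so the sketch runs standalone.
def get_customer(email):
    return {"id": 7, "email": email}

def get_order_history(customer_id):
    return [{"order_id": 101, "total": 250.0}, {"order_id": 102, "total": 90.0}]

def apply_discount(customer_id, percent):
    return {"ok": True, "percent": percent}

def send_confirmation(email, message):
    return {"sent": True, "to": email}

# One script chains all four steps. The raw order history never re-enters
# the model's context; only the compact summary at the end does.
customer = get_customer("sam@example.com")
history = get_order_history(customer["id"])
spend = sum([o["total"] for o in history])
discount = apply_discount(customer["id"], 15 if spend >= 300 else 10)
confirm = send_confirmation(customer["email"],
                            "A %d%% discount was applied." % discount["percent"])
print({"spend": spend, "discount": discount["percent"], "sent": confirm["sent"]})
```

In classic MCP each of those four calls would be a separate model turn, with the full order history echoed back through the prompt; here the whole chain costs one turn.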
Three controlled benchmark rounds were published, toggling Code Mode on and off while scaling tool count between rounds:
| Scenario | Input tokens (off) | Input tokens (on) | Token reduction | Cost reduction |
|---|---|---|---|---|
| 96 tools / 6 servers | 19.9M | 8.3M | -58.2% | -55.7% |
| 251 tools / 11 servers | 35.7M | 5.5M | -84.5% | -83.4% |
| 508 tools / 16 servers | 75.1M | 5.4M | -92.8% | -92.2% |
The gains compound with scale: the classic path reloads every definition on every call, while Code Mode's cost stays bounded by what the model actually reads. Pass rate held firm at 100% across all three rounds, confirming that efficiency came without an accuracy tradeoff. Full methodology and raw numbers sit in the Bifrost MCP Code Mode benchmark report.
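The reduction percentages can be sanity-checked from the token totals in the table. Small drift from the published one-decimal figures is expected because the totals themselves are rounded to 0.1M.

```python
# Recompute the input-token reductions from the table's (off, on) totals.
rounds = {
    "96 tools / 6 servers":   (19.9e6, 8.3e6),
    "251 tools / 11 servers": (35.7e6, 5.5e6),
    "508 tools / 16 servers": (75.1e6, 5.4e6),
}
for name, (off, on) in rounds.items():
    reduction = (off - on) / off * 100
    print("%s: %.1f%% fewer input tokens" % (name, reduction))
```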
How all of this plays out in a live production setting, including cost governance, access control, and per-tool pricing, is covered in the Bifrost MCP Gateway launch post.
What Code Mode Delivers for Enterprise AI Teams
Token cost sits at the top of the list, but it is not the only reason Code Mode earns its place in production. Platform and infrastructure teams running AI agents at scale get a set of operational properties through Code Mode that classic MCP execution simply does not deliver:
- Capability without the cost penalty. Every MCP server a team needs (internal APIs, search, databases, filesystem, CRM) can be connected without paying a per-request token tax for each tool definition.
- Predictable scaling. Bringing a new MCP server online does not balloon the context window of every downstream agent. Per-request cost stays flat.
- Lower end-to-end latency. Fewer, larger model turns with sandboxed orchestration between them cut total response time compared to tool-by-tool multi-turn execution, a pattern consistent with Bifrost's broader performance benchmarks.
- Deterministic workflows. Orchestration logic lives in a deterministic Starlark script instead of being reassembled across several stochastic model turns.
- Auditable execution. Each tool call made from within a Code Mode script is still logged as a first-class event in Bifrost, recording tool name, server, arguments, result, latency, virtual key, and parent LLM request.
Paired with Bifrost's virtual keys and governance, Code Mode slots into the pattern enterprise AI teams have been converging toward for a while: capability, cost control, and centralized AI governance enforced at the infrastructure layer, not stitched onto each individual agent.
Turning On Code Mode for a Bifrost MCP Client
Code Mode operates as a per-client toggle. Any MCP client attached to Bifrost, whether over STDIO, HTTP, SSE, or in-process through the Go SDK, can flip between classic mode and Code Mode on demand, with no redeployment and no schema changes required.
Step 1: Register an MCP Server
Head into the MCP section of the Bifrost dashboard and add a new client. Enter a name, choose the connection type, and provide the endpoint or command. Tool discovery runs automatically, with Bifrost syncing the server's tools on a configurable interval and surfacing each client in the list with a live health indicator. Step-by-step setup is walked through in the connecting to MCP servers guide.
Step 2: Flip the Code Mode Switch
Inside the client's settings, switch Code Mode on. At that moment, Bifrost stops injecting the full tool catalog into context for that specific client. Starting with the next request, the model gets the four meta-tools and browses the tool filesystem on its own. Token usage on agent loops drops from the first call.
Step 3: Set Up Auto-Execution
Out of the box, tool calls need manual approval. To let the agent loop run on its own, allowlist individual tools in the auto-execute settings. Because allowlisting is granular per tool, filesystem_read can run without a prompt while filesystem_write remains behind an approval gate. Under Code Mode, the three read-only meta-tools always run without approval, and executeToolCode qualifies for auto-execution only when every tool that its script touches is on the allow-list.
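The gating rule described above reduces to a simple predicate. This is a minimal sketch, assuming a pre-extracted list of tools the script references; how Bifrost actually extracts those references is not covered in this article.

```python
READ_ONLY_META = {"listToolFiles", "readToolFile", "getToolDocs"}

def can_auto_execute(call_name, referenced_tools=None, allowlist=frozenset()):
    if call_name in READ_ONLY_META:
        return True                      # read-only meta-tools always pass
    if call_name == "executeToolCode":
        # Auto-executes only if every referenced tool is allow-listed.
        return set(referenced_tools or []) <= set(allowlist)
    return call_name in allowlist        # classic per-tool allowlisting

allow = {"filesystem_read"}
print(can_auto_execute("readToolFile"))                                  # True
print(can_auto_execute("executeToolCode", ["filesystem_read"], allow))   # True
print(can_auto_execute("executeToolCode", ["filesystem_write"], allow))  # False
```

A script that mixes one allow-listed tool with one gated tool fails the check as a whole, which is the conservative behavior you want under auto-execution.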
Step 4: Scope Tool Access Through Virtual Keys
Combine Code Mode with virtual keys to scope tool access by consumer. A virtual key issued to a customer-facing agent can be locked to a specific tool subset, while an internal admin key can be granted broader reach. Tools that fall outside the key's scope never appear to the model, which rules out prompt-level attempts to bypass the restriction.
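The scoping behavior can be illustrated as a filter applied before the model sees anything. Key names and tool sets here are invented; the point is that an out-of-scope tool is simply absent, so no prompt can request it.

```python
ALL_TOOLS = {"crm_lookup", "crm_refund", "db_admin_drop", "search_query"}
KEY_SCOPES = {
    "customer-agent-key": {"crm_lookup", "search_query"},  # narrow scope
    "internal-admin-key": ALL_TOOLS,                       # broad reach
}

def visible_tools(virtual_key):
    # Filtering happens gateway-side: the model's stub filesystem only
    # contains tools inside the key's scope.
    return sorted(ALL_TOOLS & KEY_SCOPES.get(virtual_key, set()))

print(visible_tools("customer-agent-key"))   # ['crm_lookup', 'search_query']
print(visible_tools("internal-admin-key"))
```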
Putting Code Mode in Bifrost MCP Gateway to Work
Every team running MCP in production eventually runs into the same question: how do you keep adding capability without watching the token bill compound with every new server? Code Mode in Bifrost MCP Gateway is the pragmatic answer. By relocating orchestration from prompts into sandboxed Python, it brings token cost reductions of up to 92%, faster agent runs, and full auditability together under a single per-client toggle. Any MCP server works; virtual keys and tool groups handle access control; and the whole thing drops into Bifrost's MCP gateway architecture next to its LLM routing, fallback, and observability layers.
To see Code Mode in Bifrost MCP Gateway run against your own agent workloads, book a Bifrost demo with the team.