DEV Community

Kuldeep Paul
Code Mode in Bifrost MCP Gateway: How Sandboxed Python Cuts Token Costs

With Code Mode in Bifrost MCP Gateway, agents orchestrate tools through short Python scripts, trimming token consumption by as much as 92% with no loss of capability.

Code Mode in Bifrost MCP Gateway replaces the conventional execution path, where every tool schema lands in the model's context on every request, with a compact scripting layer. Rather than pushing hundreds of tool definitions into the prompt, Bifrost surfaces four lightweight meta-tools and lets the model assemble a short Python program to coordinate the work. Across controlled benchmarks with more than 500 connected tools, this model-driven scripting approach has cut input tokens by up to 92.8% while keeping pass rate pinned at 100%. For any team operating production AI agents across several Model Context Protocol servers, Code Mode is what separates a predictable AI bill from a runaway one.

A Working Definition of Code Mode in Bifrost MCP Gateway

At its core, Code Mode in Bifrost MCP Gateway is an orchestration mode in which the AI model composes Python to invoke MCP tools, rather than firing them individually through the standard function-calling loop. Connected MCP servers get projected as a virtual filesystem of Python stub files (.pyi signatures), and the model pulls only the tools it actually needs. It then writes a script that wires those tools together, and Bifrost runs that script inside a sandboxed Starlark interpreter. Only the final result gets returned to the model's context.

The design targets the context-bloat problem that surfaces the moment a team hooks up more than a handful of MCP servers. In the classic execution flow, every tool definition from every server is packed into the prompt on every turn. Five servers with thirty tools each means 150 schemas in context before the model has even read the user's message. Code Mode severs that coupling, so context cost is bounded by what the model chooses to read, not by how many tools sit in the registry. Teams evaluating MCP gateway options often hit this ceiling first.
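A quick back-of-envelope calculation makes the bloat concrete. The token-per-schema figure below is an illustrative assumption (real schemas vary widely in size), but the shape of the cost is accurate: definitions times tokens times turns.

```python
# Back-of-envelope estimate of classic-MCP context overhead.
# TOKENS_PER_SCHEMA is an assumption for illustration; real schemas vary.
TOKENS_PER_SCHEMA = 150

def schema_overhead(servers: int, tools_per_server: int, turns: int) -> int:
    """Tokens spent on tool definitions alone across an agent loop."""
    schemas = servers * tools_per_server
    return schemas * TOKENS_PER_SCHEMA * turns

# Five servers with thirty tools each, over a five-turn agent loop:
print(schema_overhead(5, 30, 5))  # 112500 tokens before any user content
```

Under Code Mode, the equivalent overhead is just the four meta-tool schemas plus whatever stubs the model chooses to read, regardless of registry size.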

Where the Default MCP Execution Model Breaks Down on Cost

Standard MCP usage hands the gateway the job of injecting every available tool schema into every LLM call. That works fine for demos and early prototypes. In production, three problems show up:

  • Token spend grows with every connected server. The classic flow transmits the full tool catalog on each request and each intermediate turn of an agent loop. Plugging in more MCP servers makes the situation worse, not better.
  • Latency climbs alongside context size. Longer tool catalogs mean longer prompts, which drive up time-to-first-token and overall request latency.
  • "Just prune the tool list" is a compromise, not a solution. Dropping tools to manage cost means dropping capability. Teams end up juggling separate, artificially narrow tool sets for different agents.

Public engineering work has quantified this pattern. Anthropic's engineering team reported a drop from 150,000 to 2,000 tokens on a Google Drive to Salesforce workflow once tool calls were swapped out for code execution, and Cloudflare explored a parallel approach with a TypeScript runtime. Bifrost's Code Mode applies the same insight directly inside the Bifrost MCP gateway, with two deliberate calls: Python rather than JavaScript (LLMs see considerably more Python in training), and a dedicated documentation meta-tool that squeezes context usage down further.

Inside Code Mode: The Four Meta-Tools

Whenever Code Mode is active on an MCP client, Bifrost automatically injects four generic meta-tools into every request in place of the direct tool schemas that the classic flow would otherwise load.

| Meta-tool | Purpose |
| --- | --- |
| listToolFiles | Discover which servers and tools are available as virtual .pyi stub files |
| readToolFile | Load compact Python function signatures for a specific server or tool |
| getToolDocs | Fetch detailed documentation for a specific tool before using it |
| executeToolCode | Run an orchestration script against the live tool bindings |

Navigation through the tool catalog happens on demand. The model lists stub files, opens only the signatures it needs, optionally pulls detailed docs for a specific tool, and finally emits a short Python script that Bifrost executes in the sandbox. Both server-level and tool-level bindings are supported: one stub per server for compact discovery, or one stub per tool when more granular lookups are needed. The four-tool interface is identical across both modes. Full configuration details live in the Code Mode configuration reference.
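To make the stub-file idea concrete, here is what a server-level stub returned by readToolFile might look like. The server name and function signatures are hypothetical, invented for illustration; actual stubs are generated from the connected server's tool schemas.

```python
# Hypothetical contents of a server-level stub, e.g. "crm.pyi", as the
# model might see it via readToolFile. All names here are illustrative.

def lookup_customer(email: str) -> dict:
    """Return the customer record matching the given email."""
    ...

def get_order_history(customer_id: str, limit: int = 10) -> list:
    """Return the customer's most recent orders, newest first."""
    ...

def apply_discount(order_id: str, percent: float) -> dict:
    """Apply a percentage discount to an open order."""
    ...
```

A handful of compact signatures like these replaces the full JSON schemas that the classic flow would load for the same server.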

What the Sandbox Allows (and Blocks)

Model-generated scripts run inside a Starlark interpreter, a deterministic Python-like language that Google originally built for configuring its Bazel build system. The sandbox is intentionally tight:

  • No imports
  • No file I/O
  • No network access
  • Only tool calls against the permitted bindings and basic Python-like control flow

That scope makes execution fast, deterministic, and safe enough to run under Agent Mode with auto-execution turned on. Because they are read-only, the three meta-tools listToolFiles, readToolFile, and getToolDocs are always auto-executable. executeToolCode becomes auto-executable only once every tool its generated script calls appears on the configured allow-list.
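Bifrost enforces these restrictions by construction, via the Starlark interpreter itself. Purely as an illustration of what the rules reject, the sketch below uses Python's `ast` module to flag the kinds of constructs such a sandbox refuses; it is not Bifrost's actual enforcement mechanism.

```python
import ast

# Illustrative check for the sandbox rules above. A Starlark sandbox
# rejects these constructs by construction; this sketch just shows what
# kind of code falls outside the permitted subset.
BLOCKED_CALLS = {"open", "exec", "eval", "__import__"}

def violates_sandbox(script: str) -> bool:
    tree = ast.parse(script)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return True  # no imports of any kind
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                return True  # no file I/O or dynamic execution
    return False

print(violates_sandbox("import os"))                     # True
print(violates_sandbox("result = lookup_customer(x)"))   # False
```

Note that blocking imports alone already rules out network access in practice, since any HTTP client would have to be imported first.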

How Code Mode Lowers Token Costs in Real Workflows

Take a multi-step e-commerce workflow: look up a customer, pull their order history, apply a discount, then send a confirmation. The gap between classic MCP and Code Mode shows up in the shape of the context, not just in the final output.

Classic MCP flow: Every turn drags the full tool list along with it. Every intermediate tool result flows back through the model. With 10 MCP servers and more than 100 tools, most of each prompt gets spent on tool definitions.

Code Mode flow: The model reads a single stub file, writes one script that chains the calls together, and Bifrost runs that script inside the sandbox. Intermediate results stay in the sandbox. Only the compact final output reaches the model's context.
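The kind of script the model might submit via executeToolCode looks like the sketch below. The tool bindings are hypothetical and stubbed locally so the example runs standalone; inside the sandbox they would be live bindings to the connected MCP servers.

```python
# Stubbed tool bindings so this sketch runs standalone. In Code Mode these
# would be live sandbox bindings; all names here are hypothetical.
def lookup_customer(email):
    return {"id": "cust_1", "email": email}

def get_order_history(customer_id):
    return [{"order_id": "ord_9", "total": 120.0}]

def apply_discount(order_id, percent):
    return {"order_id": order_id, "discount": percent}

def send_confirmation(customer_id, message):
    return {"status": "sent"}

# The orchestration script itself: intermediate results stay local to the
# sandbox, and only the compact summary reaches the model's context.
customer = lookup_customer("jane@example.com")
latest = get_order_history(customer["id"])[0]
apply_discount(latest["order_id"], 10.0)
receipt = send_confirmation(customer["id"], "10% off your latest order")
result = {"customer": customer["id"],
          "order": latest["order_id"],
          "sent": receipt["status"]}
print(result)
```

Four tool calls, one model turn, and the only thing the model ever reads back is the final `result` dictionary.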

Bifrost has published three rounds of controlled benchmarks comparing Code Mode on and off, with tool count scaled up between rounds:

| Scenario | Input tokens (off) | Input tokens (on) | Token reduction | Cost reduction |
| --- | --- | --- | --- | --- |
| 96 tools / 6 servers | 19.9M | 8.3M | -58.2% | -55.7% |
| 251 tools / 11 servers | 35.7M | 5.5M | -84.5% | -83.4% |
| 508 tools / 16 servers | 75.1M | 5.4M | -92.8% | -92.2% |

Savings compound as tool count grows: the classic flow pays for every definition on every call, while Code Mode's bill is bounded by what the model actually reads. Pass rate held at 100% across all three rounds, confirming that efficiency did not come at the cost of accuracy. Bifrost's broader performance benchmarks cover the surrounding architecture, and the complete methodology and results for Code Mode are documented in the Bifrost MCP Code Mode benchmark report.
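The published percentages can be sanity-checked from the table's totals. Since the totals are rounded to 0.1M, recomputed reductions can differ from the published figures by about a tenth of a point, which the check below allows for.

```python
# Recompute the token reductions from the rounded totals in the table.
# Inputs are rounded to 0.1M, so results may differ from the published
# figures by roughly 0.1 percentage points.
rows = [
    ("96 tools / 6 servers", 19.9, 8.3, 58.2),
    ("251 tools / 11 servers", 35.7, 5.5, 84.5),
    ("508 tools / 16 servers", 75.1, 5.4, 92.8),
]
for name, off_m, on_m, published in rows:
    reduction = round((1 - on_m / off_m) * 100, 1)
    assert abs(reduction - published) <= 0.2, name
    print(f"{name}: {reduction}% (published: {published}%)")
```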

How this cascades through production, including cost governance, access control, and per-tool pricing, is covered end-to-end in the Bifrost MCP Gateway launch post.

Why Code Mode Matters for Enterprise AI Teams

Token cost is just one reason Code Mode pays off in production. For platform and infrastructure teams running AI agents at scale, Code Mode opens up a set of operational properties that classic MCP execution cannot match:

  • Capability without a cost penalty. Every MCP server a team needs (internal APIs, search, databases, filesystem, CRM) can be connected without incurring a per-request token tax on each tool definition.
  • Predictable scaling. Adding an MCP server no longer inflates the context window of every downstream agent. Per-request cost stays flat.
  • Quicker execution. Fewer, larger model turns, with sandboxed orchestration between them, cut end-to-end latency compared to turn-by-turn tool invocation.
  • Deterministic workflows. Orchestration logic sits in a deterministic Starlark script instead of being reassembled across several stochastic model turns.
  • Auditable execution. Every tool call inside a Code Mode script still shows up as a first-class log entry in Bifrost, carrying tool name, server, arguments, result, latency, virtual key, and parent LLM request.

Paired with Bifrost's virtual keys and governance, Code Mode slots into the broader pattern enterprise AI teams need: capability, cost control, and governance handled at the infrastructure layer rather than stitched onto each agent. For a wider view of how this pattern extends, Bifrost's governance capabilities cover the full policy surface.

Turning Code Mode On for a Bifrost MCP Client

Code Mode is a per-client toggle. Any MCP client connected to Bifrost (STDIO, HTTP, SSE, or in-process via the Go SDK) can be flipped between classic mode and Code Mode without a redeployment or a schema change.

Step 1: Connect an MCP server

Open the MCP section of the Bifrost dashboard and add a client. Give it a name, choose the connection type, and supply the endpoint or command. Bifrost then discovers the server's tools and keeps them in sync on a configurable interval, with each client appearing in the list alongside a live health indicator. Complete setup instructions are in the connecting to MCP servers guide.

Step 2: Flip on Code Mode

Open the client's settings and turn Code Mode on. From that point, Bifrost stops packing the full tool catalog into context for that client. Starting with the next request, the model receives the four meta-tools and walks the tool filesystem on demand. Token usage on agent loops drops immediately.

Step 3: Set up auto-execution

Tool calls need manual approval by default. To let the agent loop run autonomously, allowlist specific tools under the auto-execute settings. Allowlisting is per-tool, so filesystem_read can auto-execute while filesystem_write stays behind an approval gate. Under Code Mode, the three read-only meta-tools are always auto-executable, and executeToolCode gets auto-execution only when every tool its script invokes sits on the allow-list.
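The auto-execution rule for executeToolCode reduces to a simple subset check, sketched below. Tool names are hypothetical, and extracting the set of tools a script calls is left abstract here; the point is the gating logic itself.

```python
# Sketch of the auto-execution rule described above: executeToolCode runs
# without approval only when every tool its script calls is allowlisted.
# Tool names are hypothetical; call extraction is abstracted away.
READ_ONLY_META_TOOLS = {"listToolFiles", "readToolFile", "getToolDocs"}

def can_auto_execute(called_tools: set, allowlist: set) -> bool:
    """True when every tool the script invokes is on the allow-list."""
    return called_tools <= allowlist

allowlist = {"filesystem_read", "crm_lookup_customer"}
print(can_auto_execute({"filesystem_read"}, allowlist))                      # True
print(can_auto_execute({"filesystem_read", "filesystem_write"}, allowlist))  # False
```

The read-only meta-tools bypass this check entirely, which is why discovery never stalls on approvals even when execution does.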

Step 4: Scope access using virtual keys

Pair Code Mode with virtual keys to scope tool access per consumer. A virtual key tied to a customer-facing agent can be locked down to a specific subset of tools, while an internal admin key gets broader reach. Tools outside a virtual key's scope are invisible to the model, so prompt-level workarounds go away.
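The scoping behavior amounts to an intersection between the full tool registry and each key's grant, sketched below. Key names and tool names are hypothetical; Bifrost applies this filtering at the gateway, so out-of-scope tools never appear in stubs or docs at all.

```python
# Sketch of per-key tool scoping: tools outside a virtual key's scope are
# never surfaced to the model. Keys and tool names are hypothetical.
TOOL_SCOPES = {
    "vk_support_agent": {"crm_lookup_customer", "crm_get_order_history"},
    "vk_internal_admin": {"crm_lookup_customer", "crm_get_order_history",
                          "crm_delete_customer", "filesystem_write"},
}

def visible_tools(virtual_key: str, all_tools: set) -> set:
    """Only the intersection of the registry and the key's scope is exposed."""
    return all_tools & TOOL_SCOPES.get(virtual_key, set())

all_tools = {"crm_lookup_customer", "crm_get_order_history",
             "crm_delete_customer", "filesystem_write"}
print(sorted(visible_tools("vk_support_agent", all_tools)))
```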

Getting Started with Code Mode in Bifrost MCP Gateway

Code Mode is the pragmatic answer to the question every team running MCP in production eventually asks: how do we keep adding capability without watching our token bill go exponential? By pulling orchestration out of prompts and into sandboxed Python, Code Mode in Bifrost MCP Gateway delivers as much as 92% lower token costs, quicker agent execution, and complete auditability, all through a single per-client switch. It works with any MCP server, plugs into virtual keys and tool groups for access control, and fits cleanly into the MCP gateway architecture alongside Bifrost's LLM routing, fallbacks, and observability.

To see what Code Mode in Bifrost MCP Gateway can do on your own agent workloads, book a Bifrost demo with the team.
