Every week, more MCP servers pop up. More tools. More "connect everything to your LLM" demos.
Then you actually plug 8-10 MCP servers into a real product and hit the wall:
- Requests drag
- Bills spike
- The model forgets what the user asked because it's busy reading 150 tool definitions
maximhq / bifrost - Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

Bifrost: the fastest way to build AI applications that never go down. It's a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.

Quick Start - go from zero to a production-ready AI gateway in under a minute.

Step 1: Start the Bifrost gateway

```bash
# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
```

Step 2: Configure via the web UI

```bash
# Open the built-in web interface
open http://localhost:8080
```

Step 3: Make your first API call

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'
```

That's it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…
With Bifrost MCP Code Mode, we asked a simple question:
What if, instead of sending all tools to the model, we just sent three?
The Problem with MCP at Scale
When you start with MCP, it feels great. You declare tools, the LLM calls them. Hook in anything - GitHub, Notion, Google Drive, internal APIs.
The trouble starts when you go from 2-3 servers to 8-10+.
A typical setup:
- 5-10 MCP servers (YouTube, web, Gmail, calendar, docs, internal APIs)
- 10-30 tools per server
You quickly end up with well over 100 tools exposed to the model on every single request. Even for a simple "hi".
Three concrete problems
1. Tool definitions overload the context
On every request, most MCP clients send:
- The user prompt
- System instructions
- All tool definitions (name, description, parameters, schemas) for every MCP server
Cloudflare called this out directly: most agents "use MCP by exposing the tools directly to the LLM," which means the model reads a massive JSON catalog before it even looks at the user's question.
As you add more MCP servers, this catalog dominates your prompt.
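To make this concrete, here's a rough sketch of the payload a typical MCP client assembles per request, in OpenAI-style function-calling shape. The tool shown is a made-up example; the point is that this structure repeats ~150 times:

```typescript
// Illustrative only: the tool catalog a typical MCP client attaches to every request.
// The tool name and schema below are hypothetical, not from any specific MCP server.
const request = {
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "hi" }],
  tools: [
    {
      type: "function",
      function: {
        name: "youtube_listVideos",
        description: "List recent videos for a YouTube channel",
        parameters: {
          type: "object",
          properties: {
            channelId: { type: "string" },
            limit: { type: "number" },
          },
          required: ["channelId"],
        },
      },
    },
    // ...and roughly 150 more entries like this, on every single request
  ],
};
```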
2. Intermediate results burn tokens twice
Anthropic showed a second issue: the way we chain tools forces every intermediate result to travel back through the model even when it's just being passed from tool A to tool B.
Fetch a document, summarize it, use the summary to query another system - each step involves large data blobs going through the LLM again, burning context and latency.
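Illustrated as a message trace (a simplified sketch, not real API output, with hypothetical tool names), the document body enters the model's context as a tool result and then has to be re-emitted by the model as the argument of the next tool call:

```typescript
// Simplified sketch: the same large blob effectively passes through the model twice.
const trace = [
  { role: "assistant", tool_calls: [{ id: "call_1", function: { name: "drive_getDocument", arguments: '{"docId":"abc"}' } }] },
  { role: "tool", tool_call_id: "call_1", content: "<...tens of KB of document text...>" },
  // The model must now re-emit a condensed version of that text, token by token,
  // inside its next tool call:
  { role: "assistant", tool_calls: [{ id: "call_2", function: { name: "notion_createPage", arguments: '{"body":"<...the summary...>"}' } }] },
];
```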
3. LLMs are better at code than JSON tool calls
Both Cloudflare's "Code Mode" and Anthropic's "Code execution with MCP" point at the same reality:
LLMs are very good at writing TypeScript/JavaScript against typed APIs. They're noticeably worse at emitting pristine JSON tool calls and managing multi-step workflows with dozens of separate hops.
So the current pattern is backwards:
We drown the model in tool JSON, then ask it to manually orchestrate everything using the one thing it's worse at.
What Inspired Us
We didn't invent "code-first agents for MCP." Two big pieces influenced Bifrost MCP Code Mode:
Cloudflare's Code Mode
Instead of exposing every MCP tool directly, they:
- Expose a single code execution tool
- Give the sandbox an environment with bindings to MCP servers
- Let the model write TypeScript that calls those bindings directly
Result: The LLM focuses on writing code. Code runs in a sandbox and talks to MCP servers. Tool definitions no longer dominate the prompt.
Anthropic's "Code execution with MCP"
Anthropic's team described a similar approach: present MCP servers as code APIs in a filesystem, then let the model:
- List directories
- Read TypeScript files that describe tools
- Write code that imports and uses those APIs in a code execution environment
This tackles both issues:
- Agents can load only the tools they need, on demand
- They can process data inside the execution environment before sending anything back to the model
What we were hearing from users
Bifrost users kept saying:
- "Can Bifrost just handle all my MCP connections?"
- "I don't want to tune tool lists per model/agent"
- "Just give me a gateway that's fast and cheap"
We were already positioning Bifrost as a low-overhead LLM gateway with minimal latency and a full feature set (routing, observability, policies, MCP).
So the thought was:
If we're already the MCP/LLM gateway, why not bake Code Mode into the gateway itself?
That's what Bifrost MCP Code Mode is: Code Mode as a gateway feature that works across all your MCP servers.
How Bifrost MCP Code Mode Works
Say you have these MCP servers wired into Bifrost:
- `youtube`
- `web`
- `gmail`
- `gmeet`
- `notion`
Each exposes ~20 tools. Normal MCP flow means ~100 tools in context on every request.
With Bifrost MCP Code Mode enabled, the model only sees three tools:
- `mcp_listFiles`
- `mcp_readFile`
- `mcp_executeCode`
Everything else is "hidden behind" those.
A virtual file system of your MCP servers
Internally, Bifrost builds a virtual file system (VFS) representing all code-mode MCP servers.
By default, server-level binding gives each MCP server a single .d.ts file:
```
servers/
  youtube.d.ts
  web.d.ts
  gmail.d.ts
  gmeet.d.ts
  notion.d.ts
```
Each *.d.ts file is a TypeScript declaration file describing that server's tools.
For example, servers/youtube.d.ts might look like:
```typescript
export declare function listChannels(params: {
  search: string;
  limit?: number;
}): Promise<Channel[]>;

export declare function listVideos(params: {
  channelId: string;
  limit?: number;
}): Promise<Video[]>;

export declare function getVideoSummary(params: {
  videoId: string;
}): Promise<VideoSummary>;
```
The model now sees MCP tools as functions instead of opaque "tool JSON."
Alternatively, tool-level binding gets one file per tool:
```
servers/
  youtube/
    listChannels.d.ts
    listVideos.d.ts
    getVideoSummary.d.ts
  web/
    search.d.ts
```
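With tool-level binding, each file carries a single declaration. Based on the server-level example above, servers/youtube/listVideos.d.ts would contain roughly just:

```typescript
// servers/youtube/listVideos.d.ts - tool-level binding: one tool per file
export declare function listVideos(params: {
  channelId: string;
  limit?: number;
}): Promise<Video[]>;
```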
The three generic tools
mcp_listFiles()
- Returns a directory tree of the VFS
- Lets the model discover which servers exist
mcp_readFile(path, fromLine, toLine)
- Reads tool signatures for only the servers actually needed
- Paginates through large files without blowing context
mcp_executeCode(code)
- Runs TypeScript in a sandboxed environment
- Provides bindings that line up with the `.d.ts` files
Inside the sandbox:
```typescript
import * as youtube from "servers/youtube";
import * as web from "servers/web";
import * as gdocs from "servers/gdocs";

// Model writes code against these imports
// Bifrost wires them to real MCP tool calls
```
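For contrast with the 150-tool catalog, the entire tool surface sent to the model up front is roughly the following. The parameter shapes are inferred from the descriptions above, not Bifrost's exact schemas:

```typescript
// Illustrative: the only three tool definitions the model receives per request.
const tools = [
  {
    type: "function",
    function: {
      name: "mcp_listFiles",
      description: "List the virtual file system of available MCP server APIs",
      parameters: { type: "object", properties: {} },
    },
  },
  {
    type: "function",
    function: {
      name: "mcp_readFile",
      description: "Read a .d.ts file (optionally a line range) to learn tool signatures",
      parameters: {
        type: "object",
        properties: {
          path: { type: "string" },
          fromLine: { type: "number" },
          toLine: { type: "number" },
        },
        required: ["path"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "mcp_executeCode",
      description: "Run TypeScript in a sandbox wired to the MCP servers",
      parameters: {
        type: "object",
        properties: { code: { type: "string" } },
        required: ["code"],
      },
    },
  },
];
```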
The actual loop: what the LLM does
From the model's perspective:
1. Inspect the VFS
Call mcp_listFiles() to see what's available:
```json
{
  "files": [
    "servers/youtube.d.ts",
    "servers/web.d.ts",
    "servers/gdocs.d.ts"
  ]
}
```
2. Load relevant APIs
Call mcp_readFile("servers/web.d.ts", 0, 400) to learn how to search. Call mcp_readFile("servers/youtube.d.ts", 0, 400) for YouTube APIs.
3. Write code that orchestrates everything
```typescript
import * as web from "servers/web";
import * as youtube from "servers/youtube";
import * as gdocs from "servers/gdocs";

export default async function main() {
  const companyResult = await web.search({
    query: "Which company launched the Bifrost LLM Gateway?",
    limit: 1,
  });
  const companyName = companyResult[0]?.name;
  if (!companyName) return { error: "No company found" };

  const channels = await youtube.listChannels({
    search: companyName,
    limit: 1,
  });
  if (!channels.length) {
    return { error: `No channels found for ${companyName}` };
  }

  const videos = await youtube.listVideos({
    channelId: channels[0].id,
    limit: 5,
  });

  const doc = await gdocs.createDoc({
    title: `${companyName} - Latest YouTube Report`,
    data: videos.map(v => ({
      id: v.id,
      title: v.title,
      thumbnail: v.thumbnail,
    })),
  });

  return { companyName, docUrl: doc.url };
}
```
4. We execute the code
Bifrost runs this in a sandbox where web.search, youtube.listChannels, etc. are backed by real MCP tool calls.
The full fan-out to MCP servers happens inside the sandbox, not through dozens of LLM turns.
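Conceptually, each sandbox binding is a thin proxy that turns a typed function call into a tool call against the underlying MCP server. A minimal sketch of that idea (not Bifrost's actual implementation; callMcpTool is a hypothetical stand-in for the gateway's MCP transport):

```typescript
// Hypothetical stand-in for the gateway's transport to an MCP server
// (conceptually an MCP tools/call request).
async function callMcpTool(server: string, tool: string, params: unknown): Promise<unknown> {
  // The real gateway forwards this over the MCP protocol and returns the tool result.
  return { server, tool, params };
}

// A server binding: property access becomes a tool name, the call becomes a tool invocation.
function bindServer(serverName: string) {
  return new Proxy({} as Record<string, (params: unknown) => Promise<unknown>>, {
    get: (_target, toolName) => (params: unknown) =>
      callMcpTool(serverName, String(toolName), params),
  });
}

// In the sandbox, `import * as youtube from "servers/youtube"` resolves to something like:
const youtube = bindServer("youtube");
// youtube.listVideos({ channelId: "abc", limit: 5 })
//   -> callMcpTool("youtube", "listVideos", { channelId: "abc", limit: 5 })
```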
5. Compact result back
The model sees:
```json
{
  "companyName": "Example Corp",
  "docUrl": "https://docs.google.com/..."
}
```
And answers the user naturally.
Choosing your binding level
Server-Level (Default)
- One `.d.ts` per MCP server
- Best for: moderate tool counts (5-20 per server)
- Trade-off: larger files, simpler discovery
Tool-Level
- One `.d.ts` per individual tool
- Best for: servers with 30+ tools
- Trade-off: more files, maximum context efficiency
Both use the same three-tool interface, so the LLM adapts automatically.
Mixing Code Mode and classic MCP
Not every server needs Code Mode.
You can:
- Put `web`, `youtube`, `gdocs`, `gmail` into code mode
- Keep small utilities (`datetime`, `math`) as classic tools exposed directly
The LLM sees:
- `mcp_listFiles`, `mcp_readFile`, `mcp_executeCode`
- Plus a small curated set of direct tools
Adopt Code Mode incrementally instead of all-or-nothing.
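Concretely, the split is a per-server setting. The exact configuration lives in Bifrost's UI/config (check the docs for the real schema); the shape of the decision looks roughly like this, with field names invented for illustration:

```typescript
// Hypothetical sketch - field names are illustrative, not Bifrost's actual config schema.
// The point is that code mode is a per-server toggle, not all-or-nothing.
const mcpServers = [
  { name: "web",      codeMode: true  }, // heavy servers go behind the three-tool interface
  { name: "youtube",  codeMode: true  },
  { name: "gdocs",    codeMode: true  },
  { name: "gmail",    codeMode: true  },
  { name: "datetime", codeMode: false }, // small utilities stay as directly exposed classic tools
  { name: "math",     codeMode: false },
];
```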
Example: One Workflow, Two Traces
Task: "Check which company launched Bifrost LLM Gateway and make a summary of their last 5 YouTube videos, create a Google Doc."
Assume:
- 10 MCP servers connected
- ~15 tools each = ~150 total tools
Normal MCP flow
Turn 1: Prompt + 150 tools → LLM calls web.search
Turn 2: Prompt + search result + 150 tools → LLM calls youtube.listChannels
Turn 3: Prompt + results + 150 tools → LLM calls youtube.listVideos
Turn 4: Prompt + results + 150 tools → LLM calls youtube.getVideoSummary 5x
Turn 5: Prompt + summaries + 150 tools → LLM calls gdocs.createDoc
Turn 6: Prompt + doc result + 150 tools → Final answer
Result:
- 6 LLM turns
- 150 tools in context every time
- All intermediate results flow through the model
Bifrost Code Mode flow
Turn 1: Prompt + 3 tools → LLM calls mcp_listFiles
Turn 2: Prompt + listFiles result + 3 tools → LLM calls mcp_readFile for web, youtube, gdocs
Turn 3: Prompt + readFile results + 3 tools → LLM returns code block
We execute that code - it calls web.search, youtube.listChannels, youtube.listVideos, youtube.getVideoSummary, gdocs.createDoc inside the sandbox.
Turn 4: Prompt + code execution result + 3 tools → Final answer
Result:
- 3-4 LLM turns
- Only 3 tools in base context
- Tool definitions loaded on demand
- Intermediate results stay in execution environment
The model spends context on the task, not re-reading tool catalogs.
How It Benefits You
1. Dramatically less token overhead
You're sending only three tools up front. TypeScript definitions load on demand. Intermediate data processes inside the sandbox.
This means:
- Lower cost
- More headroom for actual user context
- Less chance of context overflow
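As rough, illustrative math (the per-tool figure is an assumption, not a measurement): if a tool definition averages ~300 tokens, a 150-tool catalog adds roughly 45,000 tokens to every request before the user says anything, while three generic tools plus one or two on-demand `.d.ts` reads typically stay in the low thousands.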
2. Lower latency
Less prompt bloat means faster model eval.
More importantly: complex multi-step workflows collapse into a single executeCode call instead of 5-10 tool-call hops.
3. Better tool orchestration
By letting the model write code, you get normal programming features:
- Loops over collections
- If/else logic
- Retries and error handling
- Helper functions for data shaping
Cloudflare and others argue this makes agents more capable and reliable than hacking logic into prompt instructions.
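As a sketch (reusing the hypothetical youtube binding from earlier), this is the kind of control flow the model can emit in a single mcp_executeCode call:

```typescript
import * as youtube from "servers/youtube";

// Sketch: loops, retries, and error handling inside one mcp_executeCode call,
// using the hypothetical youtube API described above.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

export default async function main() {
  const channels = await youtube.listChannels({ search: "Bifrost", limit: 3 });

  // Loop over a collection and shape the data before anything reaches the model
  const report = [];
  for (const channel of channels) {
    const videos = await withRetry(() =>
      youtube.listVideos({ channelId: channel.id, limit: 5 })
    );
    report.push({ channelId: channel.id, videoCount: videos.length });
  }

  return report;
}
```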
4. Gateway-level simplicity
You don't have to:
- Build your own code-mode proxy
- Hand-roll sandboxes
- Maintain a separate MCP wrapper deployment
Instead:
- Point your MCP servers at Bifrost
- Flip Code Mode on for servers you want
- Let the gateway handle the VFS, `.d.ts` generation, and sandbox wiring
5. Incremental adoption
Mix code-mode and classic MCP servers:
- Start by putting "heavy" servers (web, docs, file APIs) into Code Mode
- Keep small, trusted tools as direct calls
- Gradually migrate more as you get comfortable
You can also:
- Observe generated code
- Put guardrails around sandbox permissions
- Iterate on schemas without changing your client app
Related Work & Acknowledgments
Bifrost MCP Code Mode stands on the shoulders of:
Cloudflare – "Code Mode: the better way to use MCP"
Introduced the idea of translating MCP into a typed code interface, letting LLMs write TypeScript against bindings in a sandboxed environment.
Anthropic – "Code execution with MCP"
Showed how presenting MCP servers as code APIs in a filesystem with code execution can drastically reduce token usage.
What Bifrost MCP Code Mode adds:
- Gateway-level implementation across all MCP servers connected to Bifrost
- A three-tool interface (`listFiles`, `readFile`, `executeCode`) tailored to that gateway role
- The ability to mix code-mode and classic MCP per server for gradual adoption
Try it: if you're dealing with MCP at scale, Bifrost MCP Code Mode might be worth a look.
