Team Bifrost just released something I'm genuinely excited about - Code Mode for MCP.
maximhq/bifrost: Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.
Bifrost
The fastest way to build AI applications that never go down
Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.
Quick Start
Go from zero to production-ready AI gateway in under a minute.
Step 1: Start Bifrost Gateway
# Install and run locally
npx -y @maximhq/bifrost
# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
Step 2: Configure via Web UI
# Open the built-in web interface
open http://localhost:8080
Step 3: Make your first API call
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'
That's it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…
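Since the gateway speaks the OpenAI API, you can also point an existing SDK at it instead of using curl. Here's a minimal sketch with the official OpenAI Node client; the base URL and model string mirror the curl example above, and the placeholder API key is an assumption on my part (provider credentials are configured in Bifrost itself, so adjust to whatever your setup expects):

import OpenAI from "openai";

// Point the standard OpenAI client at the local Bifrost gateway.
// Provider keys live in Bifrost's own config, so the client-side
// key here is just a placeholder.
const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "placeholder",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello, Bifrost!" }],
});

console.log(response.choices[0].message.content);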
The problem they were trying to solve:
When you connect multiple MCP servers (say 8-10 servers with 100+ tools), every single LLM request carries all of those tool definitions in context. The team kept seeing people burn through tokens just shuttling tool catalogs back and forth.
The classic flow looks like this:
- Turn 1: Prompt + all 100 tool definitions
- Turn 2: First result + all 100 tool definitions again
- Turn 3: Second result + all 100 tool definitions again
- Repeat for every step
The LLM spends more context reading about tools than actually using them.
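To make that concrete, here's roughly what the classic loop looks like against an OpenAI-style chat completions API. The allMcpTools and runMcpTool helpers are hypothetical stand-ins I've made up for illustration; the point is that the full tools array rides along on every request in the loop:

import OpenAI from "openai";

// Hypothetical helpers: a preloaded list of every MCP tool definition,
// and a function that forwards a tool call to the right MCP server.
declare const allMcpTools: { name: string; description: string; inputSchema: Record<string, unknown> }[];
declare function runMcpTool(call: any): Promise<string>;

const client = new OpenAI({ baseURL: "http://localhost:8080/v1", apiKey: "placeholder" });

// The entire catalog of ~100 tool definitions gets attached to every turn.
const tools = allMcpTools.map((t) => ({
  type: "function" as const,
  function: { name: t.name, description: t.description, parameters: t.inputSchema },
}));

const messages: any[] = [{ role: "user", content: "Research a topic and save the results" }];

while (true) {
  const response = await client.chat.completions.create({
    model: "openai/gpt-4o-mini",
    messages,
    tools, // the whole catalog, resent on turn 1, turn 2, turn 3, ...
  });

  const message = response.choices[0].message;
  if (!message.tool_calls) break; // the model answered without calling a tool

  messages.push(message);
  for (const call of message.tool_calls) {
    const result = await runMcpTool(call); // one round trip per tool call
    messages.push({ role: "tool", tool_call_id: call.id, content: result });
  }
}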
What they built:
Instead of exposing 100+ tools directly, Code Mode exposes just 3 meta-tools:
- List available MCP servers
- Read tool definitions on-demand (only what you need)
- Execute TypeScript code in a sandbox
The AI writes a single TypeScript script that orchestrates all the tools it needs, and everything runs inside the sandbox instead of making multiple round trips through the LLM.
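Roughly, the surface the model sees shrinks to something like this. The names and shapes below are my own illustration, not Bifrost's actual schema (the docs linked at the bottom have the real thing):

// Illustrative only: three meta-tools replace the full tool catalog.
interface CodeModeTools {
  // 1. Enumerate the connected MCP servers (a tiny payload).
  listMcpServers(): Promise<string[]>;

  // 2. Fetch definitions for just the tools you're about to use.
  getToolDefinitions(server: string, toolNames?: string[]): Promise<ToolDefinition[]>;

  // 3. Run TypeScript in the sandbox, where every MCP tool is exposed
  //    as an async function.
  executeCode(source: string): Promise<ExecutionResult>;
}

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: unknown; // JSON Schema for the tool's arguments
}

interface ExecutionResult {
  output: string;
  returnValue?: unknown;
}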
The impact:
Early testers report drastically lower token usage and noticeably faster execution. Instead of resending tool definitions on every turn, you load only what's needed once and run everything in a single pass.
When to use it:
Makes sense if you have several MCP servers or complex workflows. For 1-2 simple servers, classic MCP is probably fine.
You can also mix both - enable Code Mode for heavy servers (web search, databases) and keep small utilities as direct tools.
How it works:
The AI discovers available servers, reads the tool definitions it needs (just those specific ones), then writes TypeScript to orchestrate everything. The sandbox has access to all your MCP tools as async functions.
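So instead of several separate tool-call round trips, the model might emit one script like this. The braveSearch, fetcher, and postgres bindings are hypothetical; in practice they'd be whatever your MCP servers actually expose:

// Hypothetical script the model writes and runs in the sandbox in one turn.
// Only the definitions for these three tools were loaded beforehand,
// not the whole 100-tool catalog.
const results = await braveSearch.webSearch({ query: "latest MCP spec changes" });
const topLinks = results.items.slice(0, 5).map((r) => r.url);

for (const url of topLinks) {
  const page = await fetcher.getPage({ url });
  await postgres.query({
    sql: "INSERT INTO research_notes (url, summary) VALUES ($1, $2)",
    params: [url, page.summary],
  });
}

// Only this final summary goes back to the LLM, not every intermediate result.
console.log(JSON.stringify({ saved: topLinks.length }));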
A typical execution flow drops from 6+ LLM calls down to 3-4, with far less context overhead on each call.
Docs: https://docs.getbifrost.ai/features/mcp/code-mode
Curious what people think. If you're dealing with MCP at scale, this might be worth trying out.
