Debby McKinney

They just shipped Code Mode for MCP in Bifrost and it's kind of wild

Team Bifrost just released something I'm genuinely excited about - Code Mode for MCP.

GitHub: maximhq / bifrost

Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

Bifrost


The fastest way to build AI applications that never go down

Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.

Quick Start

Get started

Go from zero to production-ready AI gateway in under a minute.

Step 1: Start Bifrost Gateway

# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost

Step 2: Configure via Web UI

# Open the built-in web interface
open http://localhost:8080

Step 3: Make your first API call

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'

That's it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…
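Since the gateway exposes an OpenAI-compatible API, the same call also works through an existing OpenAI client. Here's a minimal TypeScript sketch, assuming the official openai npm package and the local setup from the steps above; the apiKey value is a placeholder, since provider keys live in your Bifrost configuration:

import OpenAI from "openai";

// Point the standard OpenAI client at the local Bifrost gateway.
const client = new OpenAI({
  baseURL: "http://localhost:8080/v1", // Bifrost's OpenAI-compatible endpoint
  apiKey: "placeholder",               // adjust to whatever auth your gateway setup expects
});

async function main() {
  const response = await client.chat.completions.create({
    model: "openai/gpt-4o-mini", // provider-prefixed model name, as in the curl example
    messages: [{ role: "user", content: "Hello, Bifrost!" }],
  });
  console.log(response.choices[0].message.content);
}

main();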

The problem they were trying to solve:

When you connect multiple MCP servers (say 8-10 servers with 100+ tools), every single LLM request includes all of those tool definitions in context. The team kept seeing people burn through tokens just sending tool catalogs back and forth.

Classic flow looks like:

  • Turn 1: Prompt + all 100 tool definitions
  • Turn 2: First result + all 100 tool definitions again
  • Turn 3: Second result + all 100 tool definitions again
  • Repeat for every step

The LLM spends more context reading about tools than actually using them.
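To make that concrete, here's a rough TypeScript sketch of the classic tool-calling loop from the client side. Everything here (the executeMcpTool helper, the example prompt, the tool catalog) is illustrative rather than Bifrost-specific; the point is just that the tools field carries the entire catalog on every request:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "placeholder",
});

// Hypothetical: the full catalog of tool definitions gathered from 8-10 MCP servers.
const allToolDefinitions: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  /* ~100 entries, each with a name, description, and JSON-schema parameters */
];

// Hypothetical stand-in for however your app dispatches a tool call to an MCP server.
async function executeMcpTool(
  call: OpenAI.Chat.Completions.ChatCompletionMessageToolCall
): Promise<string> {
  return JSON.stringify({ ok: true }); // placeholder result
}

async function classicLoop() {
  const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
    { role: "user", content: "Find last week's report and email me a summary" },
  ];

  while (true) {
    const response = await client.chat.completions.create({
      model: "openai/gpt-4o-mini",
      messages,
      tools: allToolDefinitions, // the whole catalog, resent on every single turn
    });

    const choice = response.choices[0];
    messages.push(choice.message);

    if (!choice.message.tool_calls?.length) break; // final answer, stop looping

    // Execute each tool call, append the result, and go around again --
    // paying for all 100 tool definitions in context every time.
    for (const call of choice.message.tool_calls) {
      messages.push({
        role: "tool",
        tool_call_id: call.id,
        content: await executeMcpTool(call),
      });
    }
  }
}

classicLoop();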

What they built:

Instead of exposing 100+ tools directly, Code Mode exposes just 3 meta-tools:

  1. List available MCP servers
  2. Read tool definitions on-demand (only what you need)
  3. Execute TypeScript code in a sandbox

The AI writes TypeScript once that orchestrates all the tools it needs. Everything runs in the sandbox instead of making multiple round trips through the LLM.
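To picture the difference in surface area: instead of a 100-entry tools array, the model sees something on the order of the three entries below. The names and schemas here are made up for illustration; the actual meta-tool definitions are Bifrost's (see the docs link at the end):

// Illustrative only: the real meta-tool names and schemas come from Bifrost's Code Mode,
// but the shape of the reduced tool surface looks roughly like this.
const codeModeTools = [
  {
    type: "function",
    function: {
      name: "list_mcp_servers", // illustrative name
      description: "List the MCP servers currently connected to the gateway",
      parameters: { type: "object", properties: {} },
    },
  },
  {
    type: "function",
    function: {
      name: "read_tool_definitions", // illustrative name
      description: "Fetch the definitions of specific tools on a server, on demand",
      parameters: {
        type: "object",
        properties: {
          server: { type: "string" },
          tools: { type: "array", items: { type: "string" } },
        },
        required: ["server"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "execute_code", // illustrative name
      description: "Run TypeScript in a sandbox where MCP tools are exposed as async functions",
      parameters: {
        type: "object",
        properties: { code: { type: "string" } },
        required: ["code"],
      },
    },
  },
];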

The impact:

People testing it are seeing drastically lower token usage and noticeably faster execution. Instead of sending tool definitions on every turn, you only load what's needed once and run everything in one go.

When to use it:

Makes sense if you have several MCP servers or complex workflows. For 1-2 simple servers, classic MCP is probably fine.

You can also mix both - enable Code Mode for heavy servers (web search, databases) and keep small utilities as direct tools.

How it works:

The AI discovers available servers, reads the tool definitions it needs (just those specific ones), then writes TypeScript to orchestrate everything. The sandbox has access to all your MCP tools as async functions.
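For a feel of what that generated code might look like, here's a hedged sketch. The bindings (webSearch.search, db.query, email.send) and the report/email scenario are invented for illustration; the real sandbox exposes whatever tools your MCP servers actually provide:

// Illustrative sketch of code the model might write inside the Code Mode sandbox.
// webSearch, db, and email are hypothetical bindings standing in for your real MCP tools.
async function run() {
  // One round of orchestration, no extra LLM turns in between.
  const results = await webSearch.search({ query: "Q3 revenue report Acme Corp" });

  const rows = await db.query({
    sql: "SELECT * FROM reports WHERE quarter = 'Q3' ORDER BY created_at DESC LIMIT 1",
  });

  // Combine intermediate results in plain TypeScript instead of shipping them
  // back through the LLM as extra turns.
  const summary = `Top hit: ${results[0]?.title ?? "n/a"}, latest report id: ${rows[0]?.id ?? "n/a"}`;

  await email.send({ to: "me@example.com", subject: "Q3 summary", body: summary });

  return summary; // only this final value goes back to the model
}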

An example execution flow drops from 6+ LLM calls to 3-4, with far less context overhead on each one.

Docs: https://docs.getbifrost.ai/features/mcp/code-mode

Curious what people think. If you're dealing with MCP at scale, this might be worth trying out.
