I spend a lot of time wiring AI coding tools together: VS Code, Copilot, Claude Code, Codex-style flows, local agents, and MCP servers.
The problem is not the model anymore. It is the plumbing.
Every new provider needs a key. Every MCP server adds another tool catalog. Every tool adds schema, API shapes, and prompt context.
That is where Bifrost caught my attention.
Bifrost is an open-source gateway from Maxim AI. It is not meant to replace models or agents. It is meant to sit between them and make the whole system easier to inspect and control.
This test had a simple goal: start the gateway, route real model traffic through it, attach a filesystem MCP server, enable Code Mode, and see whether a coding agent could actually work through that stack.
I also wanted practical answers:
- Which key did this agent use?
- Which model did it call?
- Which tools could it reach?
- What did the run cost?
- Where could I inspect the logs?
That is the thread through this post. It is not a marketing summary. It is a field test of the local gateway, the CLI, provider routing, MCP setup, Code Mode, and a real coding-agent launch.
Repository:
maximhq/bifrost
Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.
Bifrost AI Gateway
The fastest way to build AI applications that never go down
Bifrost is a high-performance AI gateway that unifies access to 23+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.
Quick Start
Go from zero to production-ready AI gateway in under a minute.
Step 1: Start Bifrost Gateway
# Install and run locally
npx -y @maximhq/bifrost
# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
Step 2: Configure via Web UI
# Open the built-in web interface
open http://localhost:8080
Step 3: Make your first API call
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello, Bifrost!"}]
}'
That's it! Your AI gateway is running with a web interface for visual configuration…
Official setup docs:
Open the official Bifrost gateway setup guide.
What I was actually testing
MCP is useful because it gives AI agents a standard way to call tools. Those tools can be filesystem access, search, databases, internal APIs, browser automation, or anything else exposed through an MCP server.
That sounds good until your tool list gets large.
Classic MCP can put a big tool catalog into the model's context. With a few servers that is okay. With many servers, the model spends tokens just reading what tools exist.
That is the problem I want to avoid.
I do not want my coding agent spending context on every possible tool every time. I want it to discover what it needs and keep the prompt smaller.
Bifrost's Code Mode is designed for that issue.
So the question was not "does Bifrost look useful?" The question was:
Can I install it, wire it into a real project, and see the tool-control layer work end to end?
Why the Bifrost CLI matters
The gateway routes providers. The CLI is what makes the setup usable for coding agents.
I wanted to avoid hand-editing agent config every time I switched models or providers.
The workflow I tested was:
- launch the gateway
- run bifrost-cli
- pick a harness
- pick a model
- launch the agent
The CLI stores state under ~/.bifrost/, including gateway URL, selected model, and harness.
That matters because it keeps the gateway URL, model, and harness choices in one shared place instead of every agent maintaining its own brittle config.
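If you want to check that state from a script, here is a minimal Python sketch. It assumes only the two file names that appear later in this post (config.json and state.json under ~/.bifrost/). The keys inside those files are not documented here, so it returns the parsed JSON as-is instead of guessing field names. The function name is mine, not part of the CLI.

```python
import json
from pathlib import Path
from typing import Optional


def load_bifrost_cli_state(base_dir: Optional[Path] = None) -> dict:
    """Read the Bifrost CLI's config.json and state.json, if they exist.

    Returns {"config.json": <parsed JSON or None>, "state.json": <...>}.
    Field names inside the files are deliberately not assumed.
    """
    base = base_dir if base_dir is not None else Path.home() / ".bifrost"
    result = {}
    for name in ("config.json", "state.json"):
        path = base / name
        # None means the CLI has not written this file yet
        result[name] = json.loads(path.read_text()) if path.is_file() else None
    return result
```

Printing load_bifrost_cli_state() after a CLI run is a quick way to confirm which gateway URL, harness, and model the next agent launch will pick up.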
Supported harnesses in the docs included:
- Claude Code
- Codex CLI
- Gemini CLI
- OpenCode
Installation and integration
I tested both ways to run the gateway.
Local NPX
npx -y @maximhq/bifrost
Docker
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
That gives you a dashboard at:
http://localhost:8080
I always start there. If the dashboard does not respond, nothing else matters.
First smoke test
curl http://localhost:8080
Once the gateway was up, the next step was provider setup, MCP setup, and CLI testing.
First model request
A basic OpenAI-compatible request looks like this:
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [
{ "role": "user", "content": "Hello, Bifrost!" }
]
}'
I did not use RunKit here because RunKit cannot reach localhost:8080.
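Because RunKit cannot reach the local gateway, here is the same request as a small standard-library Python sketch you can run on the machine hosting Bifrost. It assumes the gateway answers in the usual OpenAI chat-completions shape (choices[0].message.content). The helper names are mine, not part of Bifrost.

```python
import json
import urllib.request

GATEWAY = "http://localhost:8080"  # default port used throughout this post


def build_chat_request(model: str, prompt: str) -> dict:
    """OpenAI-compatible chat completion body, same as the curl example."""
    return {
        "model": model,  # "<provider>/<model>", e.g. "openai/gpt-4o-mini"
        "messages": [{"role": "user", "content": prompt}],
    }


def send_chat(body: dict, gateway: str = GATEWAY, timeout: float = 60.0) -> dict:
    """POST the body to the gateway and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{gateway}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def first_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style response."""
    return response["choices"][0]["message"]["content"]
```

Calling first_reply(send_chat(build_chat_request("openai/gpt-4o-mini", "Hello, Bifrost!"))) should print the model's reply, provided the openai provider has a valid key configured.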
Add an MCP client and enable Code Mode
The Bifrost path is:
Dashboard → MCP Gateway → add client → choose STDIO, HTTP, or SSE → configure command or URL → choose tool permissions
For this test, I used a STDIO filesystem MCP server scoped to the donotpush draft folder.
That scope matters. A gateway should not become an excuse to give every agent access to the whole machine.
Once the MCP client exists, Code Mode can be enabled for that client. When Code Mode is on, the model does not get the full tool catalog directly. It gets four meta-tools first, then inspects and executes only what it needs.
Where Code Mode fits in
This was the part I cared about most.
Classic MCP can work fine. It can also force the model to carry a lot of extra tool metadata.
Code Mode is meant to change that.
Bifrost exposes these four meta-tools:
listToolFiles
readToolFile
getToolDocs
executeToolCode
That means the model can ask:
- what tool files are available?
- what is in this stub?
- what are the docs for this tool?
- execute this code against the tool binding
The idea is to give the model an index instead of a giant binder.
Why Code Mode exists
The problem is context bloat. In a small MCP setup, every tool definition may be fine. In a larger setup, it becomes waste.
Bifrost's docs use an example with five MCP servers and around 100 tools. In the classic flow, the model carries that catalog across multiple turns. In Code Mode, it carries only four meta-tools and then loads the specific stubs it needs.
This is why Code Mode uses code instead of a long chain of individual tool calls. The model can orchestrate the workflow in a constrained Starlark interpreter, and intermediate results do not have to get pushed back through the model.
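To make the "intermediate results stay out of the model" point concrete, here is a plain-Python illustration of the pattern. It is not Bifrost's binding syntax: FakeFilesystem is a hypothetical stand-in for a filesystem tool binding (Starlark itself has no classes), and drafts_mentioning plays the role of the script a model would submit through executeToolCode.

```python
class FakeFilesystem:
    """Hypothetical stand-in for a filesystem MCP tool binding."""

    def __init__(self, files: dict):
        self._files = files  # flat mapping: path -> text

    def list_directory(self, path: str) -> list:
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self._files if p.startswith(prefix))

    def read_text_file(self, path: str) -> str:
        return self._files[path]


def drafts_mentioning(fs, folder: str, needle: str) -> list:
    """One 'script' doing list, read, and filter with no model round-trips.

    Only the final short list would go back to the model; the file
    bodies read along the way never do.
    """
    hits = []
    for name in fs.list_directory(folder):
        if not name.endswith(".md"):
            continue
        if needle in fs.read_text_file(f"{folder}/{name}"):
            hits.append(name)
    return hits
```

In a tool-call-per-step flow, every file body would pass through the model. Here only the list of matching names comes back.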
Performance and cost claims
Bifrost's docs claim that for about 100 tools across five MCP servers, Code Mode can shrink the interaction from six LLM turns carrying the full catalog to three or four turns carrying about 50 tool-definition tokens.
Their published benchmark goes further: 55.7% lower estimated cost with 96 tools, 83.4% with 251 tools, and 92.2% with 508 tools.
Those are vendor claims. I am including them here because they explain why the feature is worth testing, not because my one-server filesystem test reproduced them.
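The shape of that claim is easy to see with back-of-envelope arithmetic. The sketch below is illustrative only: 150 tokens per tool definition and a 300-token stub are my assumptions, and the roughly 50-token meta-tool figure comes from the docs claim above, not from a measurement.

```python
def classic_catalog_tokens(turns: int, tools: int, tokens_per_tool: int) -> int:
    """Tool-definition tokens when the full catalog is sent on every turn."""
    return turns * tools * tokens_per_tool


def code_mode_catalog_tokens(turns: int, meta_tool_tokens: int,
                             stub_tokens: int, turns_with_stub: int) -> int:
    """Meta-tools are sent every turn; a stub, once read, stays in history."""
    return turns * meta_tool_tokens + stub_tokens * turns_with_stub


# Illustrative inputs only (assumed, not measured)
classic = classic_catalog_tokens(turns=6, tools=100, tokens_per_tool=150)
code_mode = code_mode_catalog_tokens(turns=4, meta_tool_tokens=50,
                                     stub_tokens=300, turns_with_stub=2)
print(classic, code_mode)  # 90000 800
```

The point is the scaling, not the exact numbers: the classic term grows with tools times turns, while the Code Mode term grows mostly with the stubs actually read.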
When I would enable it
My rule of thumb after this pass:
- keep classic MCP for one or two small servers
- enable Code Mode when you have three or more MCP servers
- enable it for heavier servers like filesystem, search, docs, databases, CRM, or internal APIs
- use tool-level binding when one server has many tools or big schemas
- keep the allowed tool list tight either way
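Written down as code, that rule of thumb looks roughly like this. It is a sketch of my own heuristic, not a Bifrost feature, and the 20-tool cutoff for "many tools" is a hypothetical number because I have not settled on one.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Server kinds treated as "heavy", taken from the list above
HEAVY_KINDS = {"filesystem", "search", "docs", "database", "crm", "internal_api"}
MANY_TOOLS = 20  # hypothetical cutoff for switching to tool-level binding


@dataclass
class McpServer:
    name: str
    kind: str
    tool_count: int


def recommend(servers: List[McpServer]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each server to (mode, binding); binding is None for classic MCP."""
    many_servers = len(servers) >= 3
    plan = {}
    for s in servers:
        if many_servers or s.kind in HEAVY_KINDS:
            binding = "tool" if s.tool_count >= MANY_TOOLS else "server"
            plan[s.name] = ("code_mode", binding)
        else:
            plan[s.name] = ("classic", None)
    return plan
```

For the single filesystem server in this test (14 tools), recommend([McpServer("filesystem_blog", "filesystem", 14)]) returns {"filesystem_blog": ("code_mode", "server")}. The tight allowed-tool list still has to be set on the MCP client itself.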
How it works
Code Mode exposes MCP tools as virtual .pyi stub files. There are two binding levels:
- server-level binding: one stub file per MCP server
- tool-level binding: one stub file per tool
Server-level binding is simpler when a server has a small tool set. Tool-level binding is better when a server has many tools or large schemas.
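One way to picture the two binding levels is to list the virtual files each would expose. The server-level path matches the listToolFiles output shown later in this post (servers/filesystem_blog.pyi). The tool-level layout (servers/<server>/<tool>.pyi) is my guess, not documented behavior, and stub_files is a hypothetical helper.

```python
from typing import Dict, List


def stub_files(catalog: Dict[str, List[str]], binding: str) -> List[str]:
    """List virtual stub paths for a {server: [tool, ...]} catalog."""
    if binding == "server":
        # one stub per server: reading it pulls in every signature at once
        return [f"servers/{server}.pyi" for server in sorted(catalog)]
    if binding == "tool":
        # one stub per tool (hypothetical layout): read only what you need
        return [
            f"servers/{server}/{tool}.pyi"
            for server in sorted(catalog)
            for tool in sorted(catalog[server])
        ]
    raise ValueError(f"unknown binding level: {binding!r}")
```

The trade-off is visible in the output: server-level gives the model one file to read, tool-level gives it more files but lets it skip the schemas it does not need.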
The code runs in Starlark, a deterministic Python-like runtime. Bifrost's docs describe it as intentionally constrained: no imports, no file I/O, no network access, just allowed tool calls and basic Python-like logic. That constraint is what makes Code Mode interesting for agent workflows.
Classic MCP vs Code Mode in plain English
Classic MCP is like handing someone a giant binder full of every tool manual before every task.
Code Mode is more like giving them a small index first. They look up the exact tool they need, read only that tool's instructions, then run the task. That matters when the number of tools grows.
Why that matters for coding agents
Coding agents already use a lot of context.
They read files, inspect diffs, look at errors, review stack traces, and sometimes pull in project history. If they also carry every connected MCP tool definition on every request, the context gets noisy fast.
That can hurt:
- cost
- latency
- tool selection
- reliability
- debugging
For small setups, this may not matter. If you only have one or two MCP servers with a few tools, classic MCP can be fine.
But once you start connecting heavier tools, Code Mode makes more sense.
That is why I tested the flow directly instead of only repeating the docs.
What I tested
My test setup used the tools I actually work with:
- VS Code
- Bifrost gateway
- Bifrost CLI
- Codex CLI and OpenCode
- one filesystem MCP server
- a real local project
The test needed to answer one practical question: can the gateway sit between the agent, the provider, and the MCP server without turning the workflow into a black box?
The flow looked like this:
Start Bifrost gateway
↓
Open Bifrost dashboard
↓
Configure provider
↓
Start Bifrost CLI
↓
Launch coding agent through Bifrost
↓
Run a small repo task
↓
Compare classic MCP flow vs Code Mode flow
The task was intentionally small. I did not want a fake benchmark. I wanted enough proof to answer this:
Can I route a coding agent through Bifrost, expose MCP tools, turn on Code Mode, and see the control layer working?
Everything after this point is evidence for that question.
The small agent task
For the coding-agent check, I used a deliberately small prompt:
Do not edit files. Reply with exactly: opencode through bifrost ok
That prompt is boring on purpose. It does not prove the agent can solve every repo task. It proves the path: agent starts, uses the configured Bifrost provider, calls the model through the gateway, and returns a response.
Then I split the MCP behavior into two questions:
Classic MCP:
- Did the model see the direct filesystem tools?
- Did it produce a normal MCP tool call?
- Could I execute that tool call through Bifrost?
Code Mode:
- Could I enable Code Mode on the MCP client?
- Did the four meta-tools work?
- Could executeToolCode call the underlying filesystem server?
What I am not claiming
This is not a benchmark.
Bifrost publishes token-savings numbers for Code Mode. My test did not try to reproduce those claims.
I only used one filesystem MCP server. With one server, there is not enough tool sprawl to make a serious token-savings claim. The useful result is narrower: these pieces worked together in one path:
- gateway
- provider route
- local Ollama
- Ollama cloud
- MCP server
- Code Mode meta-tools
- coding agent
That is the difference between "I read the docs" and "I ran the workflow."
What I measured
I also have Ollama installed locally, with both local models and Ollama cloud variants available. That let me test:
- local Ollama model routing
- Ollama cloud model routing
- Bifrost provider behavior
- MCP tool discovery
- Code Mode meta-tool behavior
- one coding-agent launch through Bifrost
Here is the checklist from the hands-on pass:
Install/setup:
- [x] NPX starts cleanly
- [x] Docker starts cleanly
- [x] Dashboard available at localhost:8080
- [x] Provider setup persists
- [x] OpenAI config bootstrap tested
- [ ] OpenAI provider request blocked by missing OPENAI_API_KEY
CLI:
- [x] Bifrost CLI detects the gateway
- [x] Bifrost CLI asks for the expected config
- [x] Bifrost CLI shows Codex CLI as installed
- [x] Bifrost CLI lets me set the gateway URL and model
- [x] Config is stored in ~/.bifrost/config.json and ~/.bifrost/state.json
- [x] Bifrost CLI launches Codex CLI
- [x] Bifrost CLI prints the MCP server URL for Codex CLI
- [x] Codex CLI launch works
- [ ] Codex CLI request blocked because current Codex expects Responses API for this route
MCP:
- [x] Add at least one MCP server
- [x] Gateway can see the tools
- [x] Model can produce a classic MCP tool call
- [x] Classic MCP tool call works
- [x] Enable Code Mode
- [x] Code Mode exposes the four meta-tools
- [x] Run the same "list allowed directories" task with classic MCP and Code Mode
- [x] Route local Ollama models through Bifrost
- [x] Route Ollama cloud models through Bifrost
Agent workflow:
- [x] Complete a minimal coding-agent request through Bifrost with OpenCode
- [x] Keep the response understandable
- [x] Confirm tool-governed filesystem access through MCP
- [ ] OpenAI-vs-Ollama provider fallback blocked by missing OpenAI key
- [out of scope] Prove token savings with a large multi-server setup
Cost/context:
- [out of scope] Fewer tokens used in a realistic multi-server setup
- [x] Latency is acceptable for small local tests
- [x] Workflow is inspectable from one gateway dashboard
This is the evidence I care about more than a marketing number: what launched, what routed, what exposed tools, what returned output, and what failed.
The unresolved items were clear: OpenAI fallback needs a real key, Codex needs a Responses-compatible route, and a larger multi-server setup is required for any real token-savings claim.
Hands-on results
The first thing I verified was the gateway.
I started the local gateway and confirmed http://127.0.0.1:8080 was alive.
The dashboard loaded cleanly.
That dashboard is useful because it gives a second source of truth. It shows request volume, token usage, model usage, latency, cache state, and cost.
And the panels were meaningful for this test.
With that working, I moved to provider wiring.
The Bifrost CLI defaulted to Codex CLI. I chose openai-ollama2/gpt-oss:20b-cloud after confirming the custom provider could route Ollama cloud requests.
On disk, the CLI state was exactly where I expected:
- ~/.bifrost/config.json pointed at http://localhost:8080
- ~/.bifrost/state.json stored the selected harness and model
- ./bifrost-data/config.db had the usual config tables
I also checked the repo .env. It had project secrets, but not an OpenAI key for Bifrost.
So I did one more bootstrap test:
- created bifrost-temp/config.json with providers.openai.keys[0].value = "env.OPENAI_API_KEY"
- launched Bifrost on http://127.0.0.1:8081
- the gateway started and wrote state to ./config.db
- the provider row existed, but Bifrost reported "no valid keys found for provider: openai"
That pinned down the failure: OPENAI_API_KEY was not set in the shell that launched Bifrost.
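A preflight check would have caught this before launch. The sketch below walks a Bifrost-style config and reports every "env.NAME" reference whose variable is unset or empty. It assumes only the "env." prefix convention shown in the key value above; missing_env_refs is a hypothetical helper, not a Bifrost command.

```python
import os
from typing import List, Mapping, Optional


def missing_env_refs(config, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return NAMEs referenced as "env.NAME" that are unset or empty in env."""
    env = os.environ if env is None else env
    missing = []

    def walk(node):
        if isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, str) and node.startswith("env."):
            name = node[len("env."):]
            if not env.get(name):  # empty string counts as missing
                missing.append(name)

    walk(config)
    return missing
```

Run against bifrost-temp/config.json in the shell that starts Bifrost, this should have flagged OPENAI_API_KEY here.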
Ollama routing worked
With OpenAI blocked by a missing secret, I tested the local provider I could fully verify: Ollama.
I started ollama serve on http://127.0.0.1:11434 and confirmed the API exposed models such as gemma3:1b, qwen3-coder:30b, tinyllama:latest, and mistral:latest.
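That model check is easy to script. Ollama's native API lists local models at GET /api/tags. This standard-library sketch fetches that list and extracts the names, which are the same model IDs that later get the Bifrost provider prefix (for example openai-ollama2/gpt-oss:20b-cloud). The helper names are mine.

```python
import json
import urllib.request
from typing import List


def fetch_tags(base_url: str = "http://127.0.0.1:11434", timeout: float = 5.0) -> dict:
    """GET Ollama's /api/tags and return the decoded JSON."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def model_names(tags: dict) -> List[str]:
    """Sorted model names from an /api/tags response."""
    return sorted(m["name"] for m in tags.get("models", []))
```

With ollama serve running, model_names(fetch_tags()) prints the installed models.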
I configured a custom Bifrost provider through /api/providers. The right pattern for Ollama was a key with an empty models list, which allows all models through.
The provider config looked like this:
{
"provider": "openai-ollama2",
"keys": [
{
"name": "openai-ollama2-key",
"value": "dummy",
"models": [],
"weight": 1.0
}
],
"network_config": {
"base_url": "http://127.0.0.1:11434",
"default_request_timeout_in_seconds": 60
},
"custom_provider_config": {
"base_provider_type": "openai",
"allowed_requests": {
"chat_completion": true,
"chat_completion_stream": true
}
}
}
With that in place, Bifrost routed both local Ollama and Ollama cloud requests.
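The same provider definition can be built and sent from Python. This is a sketch: the payload mirrors the JSON above, but the HTTP method (POST) for /api/providers is my assumption, so check the Bifrost API docs for your version before relying on it. Both function names are hypothetical.

```python
import json
import urllib.request


def ollama_provider_payload(name: str = "openai-ollama2",
                            base_url: str = "http://127.0.0.1:11434",
                            timeout_s: int = 60) -> dict:
    """Custom OpenAI-type provider pointing Bifrost at a local Ollama."""
    return {
        "provider": name,
        "keys": [{
            "name": f"{name}-key",
            "value": "dummy",   # placeholder value, as in the config above
            "models": [],       # empty list: allow all models through
            "weight": 1.0,
        }],
        "network_config": {
            "base_url": base_url,
            "default_request_timeout_in_seconds": timeout_s,
        },
        "custom_provider_config": {
            "base_provider_type": "openai",
            "allowed_requests": {"chat_completion": True, "chat_completion_stream": True},
        },
    }


def create_provider(payload: dict, gateway: str = "http://127.0.0.1:8080") -> int:
    """Send the payload to /api/providers. POST is assumed, not confirmed."""
    req = urllib.request.Request(
        f"{gateway}/api/providers",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

Keeping the payload in a function makes it easy to register a second Ollama host under a different provider name without copying JSON by hand.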
The request volume panel captured the setup process honestly: successful checks and tuning failures.
The token usage panel was not a benchmark, but it gave me a baseline.
The model usage panel confirmed the requests were hitting the models I expected.
Latency was uneven, which is no surprise for mixed local/cloud routing.
Cache was effectively unused in this small test.
I also verified the Docker path on http://127.0.0.1:8082 with:
docker run -p 8082:8080 -v $(pwd)/bifrost-temp:/app/data maximhq/bifrost
That instance also responded.
At that point the provider layer was proven enough to move on.
Classic MCP versus Code Mode
Next, I added a filesystem MCP server scoped to the donotpush folder.
curl -X POST http://127.0.0.1:8080/api/mcp/client \
-H "Content-Type: application/json" \
-d '{
"name": "filesystem_blog",
"connection_type": "stdio",
"stdio_config": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush"
],
"envs": ["HOME", "PATH"]
},
"tools_to_execute": ["*"],
"is_ping_available": false
}'
Bifrost discovered 14 tools from that server.
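For repeatable setups, the same MCP client body can be generated per folder. This sketch mirrors the curl call above. The helper names are hypothetical, and the comments on envs and tools_to_execute describe my reading of those fields, not documented semantics.

```python
import json
import urllib.request
from typing import Sequence


def stdio_fs_client(name: str, root: str, tools: Sequence[str] = ("*",)) -> dict:
    """Body for POST /api/mcp/client: a filesystem MCP server scoped to root."""
    return {
        "name": name,
        "connection_type": "stdio",
        "stdio_config": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", root],
            "envs": ["HOME", "PATH"],  # variable names, as in the curl call above
        },
        # "*" allows every discovered tool; presumably a list of tool names narrows it
        "tools_to_execute": list(tools),
        "is_ping_available": False,
    }


def add_client(body: dict, gateway: str = "http://127.0.0.1:8080") -> str:
    """POST the client body and return the raw response text."""
    req = urllib.request.Request(
        f"{gateway}/api/mcp/client",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8")
```

Passing the root folder explicitly keeps the scoping decision visible at the call site, which is the part that matters for governance.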
In classic MCP mode, the model produced a direct tool call:
filesystem_blog-list_allowed_directories
I executed it through Bifrost:
curl -X POST http://127.0.0.1:8080/v1/mcp/tool/execute \
-H "Content-Type: application/json" \
-d '{
"id": "call_pjlop9a3",
"type": "function",
"function": {
"name": "filesystem_blog-list_allowed_directories",
"arguments": "{}"
}
}'
Result:
Allowed directories:
/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush
That is the governance point in miniature: the server only saw one draft folder.
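The tool name in that call follows a <client>-<tool> pattern, at least in this setup. A small sketch for building the execute body and splitting the name back apart; both helpers are hypothetical, and split_tool_name assumes the client name itself contains no hyphen.

```python
import json
import uuid
from typing import Optional, Tuple


def tool_call(client: str, tool: str, arguments: Optional[dict] = None,
              call_id: Optional[str] = None) -> dict:
    """Body for POST /v1/mcp/tool/execute in the OpenAI tool-call shape."""
    return {
        "id": call_id or f"call_{uuid.uuid4().hex[:8]}",
        "type": "function",
        "function": {
            "name": f"{client}-{tool}",  # <client>-<tool>, as seen in this test
            # arguments travel as a JSON string, not a nested object
            "arguments": json.dumps(arguments or {}),
        },
    }


def split_tool_name(qualified: str) -> Tuple[str, str]:
    """Split 'filesystem_blog-list_allowed_directories' at the first hyphen."""
    client, sep, tool = qualified.partition("-")
    if not sep or not client or not tool:
        raise ValueError(f"not a <client>-<tool> name: {qualified!r}")
    return client, tool
```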
Then I flipped the same client into Code Mode.
curl -X PUT http://127.0.0.1:8080/api/mcp/client/f771c023-8a16-4b34-b03f-ffbfffd34e4b \
-H "Content-Type: application/json" \
-d '{
"name": "filesystem_blog",
"connection_type": "stdio",
"stdio_config": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush"
],
"envs": ["HOME", "PATH"]
},
"tools_to_execute": ["*"],
"is_ping_available": false,
"is_code_mode_client": true
}'
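Note that the PUT resends the whole client definition with is_code_mode_client added. A sketch of that toggle, assuming (as the call above suggests) that the update replaces the full config instead of patching one field; the helper names are hypothetical.

```python
import copy
import json
import urllib.request


def with_code_mode(client_config: dict, enabled: bool = True) -> dict:
    """Copy of an MCP client config with is_code_mode_client set."""
    updated = copy.deepcopy(client_config)  # leave the caller's dict untouched
    updated["is_code_mode_client"] = enabled
    return updated


def update_client(client_id: str, config: dict,
                  gateway: str = "http://127.0.0.1:8080") -> str:
    """PUT the full config to /api/mcp/client/<id>, as in the curl call."""
    req = urllib.request.Request(
        f"{gateway}/api/mcp/client/{client_id}",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8")
```

Keeping the original config and deriving the Code Mode variant from it makes it easy to flip the same client back and forth when comparing the two flows.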
The four Code Mode meta-tools worked.
listToolFiles returned:
servers/
filesystem_blog.pyi
readToolFile returned a compact stub with signatures like:
def read_text_file(path: str, head: float = None, tail: float = None) -> dict
def list_allowed_directories() -> dict
def list_directory(path: str) -> dict
getToolDocs returned docs for a specific function.
executeToolCode ran the filesystem server from inside the Code Mode layer:
Execution completed successfully.
Return value: "Allowed directories:\n/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush"
So the behavior matched the docs. The model saw a small tool-file surface first, then executed the specific tool binding it needed.
The task was the same in both modes:
List the allowed filesystem directory.
Classic MCP exposed the filesystem tools directly. Code Mode exposed a smaller meta-tool surface.
On this setup, I am not claiming savings. I am claiming behavior verification.
Coding agent results
The final step was the coding-agent launch.
Bifrost CLI launched Codex CLI and showed:
Harness Codex CLI (codex-cli 0.125.0)
Model openai-ollama2/gpt-oss:20b-cloud
It also printed:
MCP: Codex CLI has no native auto-attach yet. Use server URL: http://localhost:8080/mcp
So the launch path worked.
But the actual Codex request did not complete. Codex expects the Responses API, while my Ollama route was using chat completions.
Codex reported:
Error loading config.toml: `wire_api = "chat"` is no longer supported.
How to fix: set `wire_api = "responses"` in your provider config.
That is a real provider compatibility issue, not a gateway failure.
For the working agent test, I used OpenCode, which supports OpenAI-compatible chat completions.
The config was:
{
"$schema": "https://opencode.ai/config.json",
"model": "bifrost/openai-ollama2/gpt-oss:20b-cloud",
"provider": {
"bifrost": {
"npm": "@ai-sdk/openai-compatible",
"name": "Bifrost",
"options": {
"baseURL": "http://127.0.0.1:8080/v1",
"apiKey": "dummy"
},
"models": {
"openai-ollama2/gpt-oss:20b-cloud": {
"name": "Ollama cloud through Bifrost",
"limit": {
"context": 32768,
"output": 4096
}
}
}
}
}
}
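If you switch models often, generating this file beats hand-editing it. A sketch that rebuilds the same config and writes it to disk; the function names are hypothetical, and it relies on OpenCode's provider/model convention, where the top-level model is the provider key ("bifrost") followed by a key under models.

```python
import json
from pathlib import Path


def opencode_config(model_id: str,
                    base_url: str = "http://127.0.0.1:8080/v1",
                    display_name: str = "Ollama cloud through Bifrost",
                    context: int = 32768, output: int = 4096) -> dict:
    """OpenCode config routing one model through Bifrost's OpenAI-compatible API."""
    return {
        "$schema": "https://opencode.ai/config.json",
        # "<provider key>/<model key>" must match the entries below
        "model": f"bifrost/{model_id}",
        "provider": {
            "bifrost": {
                "npm": "@ai-sdk/openai-compatible",
                "name": "Bifrost",
                "options": {"baseURL": base_url, "apiKey": "dummy"},
                "models": {
                    model_id: {
                        "name": display_name,
                        "limit": {"context": context, "output": output},
                    }
                },
            }
        },
    }


def write_config(cfg: dict, path: str) -> Path:
    """Write the config as pretty-printed JSON and return the path."""
    out = Path(path)
    out.write_text(json.dumps(cfg, indent=2) + "\n")
    return out
```

write_config(opencode_config("openai-ollama2/gpt-oss:20b-cloud"), "/tmp/bifrost-opencode.json") produces the file that OPENCODE_CONFIG points at in the command that follows.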
Then I ran:
OPENCODE_CONFIG=/tmp/bifrost-opencode.json opencode run \
--dir /Users/bradleymatera/Desktop/gatsby-starter-minimal-blog \
--format json \
--model bifrost/openai-ollama2/gpt-oss:20b-cloud \
'Do not edit files. Reply with exactly: opencode through bifrost ok'
Result:
opencode through bifrost ok
Usage reported:
17154 total tokens
17100 input tokens
54 output tokens
That was the first real coding-agent request through Bifrost in this test.
Why governance matters here
This setup is not just about making a request succeed.
It is about making the request visible and controllable.
A gateway lets you answer:
- which virtual key was used?
- which provider routed the request?
- which MCP server was available?
- what did the dashboard record?
That matters more than one successful completion.
Bifrost also surfaces:
- virtual keys
- budget controls
- provider routing
- MCP tool governance
- audit logs
- cost tracking
This is not only enterprise language. Even for solo projects, it is useful to know what key is being used, what model is being called, and what tools the agent can reach.
If I am running multiple demos or experiments, I want separate keys, usage tracking, and some protection against accidental spending.
Bifrost as the control plane
Without Bifrost, the setup looks like this:
Agent → Provider
Agent → MCP server
Agent → another provider
Agent → another tool config
Agent → another local config file
With Bifrost:
Agent → Bifrost → Providers
                → MCP tools
                → routing
                → governance
                → logs
                → cost tracking
That second layout is easier to reason about.
It does not make agents perfect, but it does make the system easier to inspect.
Benefits and limitations
What was clear from this pass:
- the gateway is easy to start with NPX or Docker
- the dashboard makes traffic, latency, tokens, and errors visible
- the CLI reduces per-agent setup work
- the CLI can switch between supported agents without hand-editing every config
- virtual keys, audit logs, and budget controls add governance
- Code Mode turns the MCP surface into a smaller discovery-and-execute flow
What was also clear:
- Code Mode is not a magic switch. It is worth validating in every environment.
- Classic MCP can still be the better choice for very small setups.
- Provider compatibility matters. One OpenAI-compatible route is not the same as another.
My local dashboard was on v1.4.24. The docs call out Code Mode from v1.4.0-prerelease1. That is a good reminder: verify the exact version where you run this.
The provider-specific limitation was real. OpenCode worked through Bifrost + Ollama. Codex CLI launched, but the request failed because the route needed Responses compatibility.
That is not a reason to discard the setup. It is a reason to test each agent/provider pair instead of assuming every OpenAI-compatible route behaves identically.
Related reading
Bifrost GitHub repository:
maximhq/bifrost
Bifrost CLI article:
Bifrost MCP Gateway and Code Mode article:
Official Bifrost setup article on Medium:
Final status
Here is the final state of the test:
- [x] Install Bifrost locally with NPX
- [x] Install Bifrost locally with Docker
- [x] Open the dashboard and confirm the UI works
- [x] Configure at least one provider (Ollama)
- [ ] Configure the OpenAI provider (request blocked by missing secret)
- [x] Start Bifrost CLI
- [x] Confirm Bifrost CLI asks for the right config
- [x] Confirm Bifrost CLI config storage
- [x] Launch Codex CLI through Bifrost CLI
- [x] Record the Codex limitation with this route
- [x] Launch one working coding agent through Bifrost with OpenCode
- [x] Add at least one MCP server
- [x] Verify the tool surface
- [x] Verify classic MCP tool execution
- [x] Enable Code Mode
- [x] Verify the Code Mode meta-tools
- [x] Run the same small MCP task with and without Code Mode
- [x] Verify local Ollama routing through Bifrost
- [x] Verify Ollama cloud routing through Bifrost
- [ ] Check provider selection and fallback path (blocked by missing OpenAI key)
- [x] Capture screenshots of the dashboard and terminal results
- [x] Record exact errors, fixes, and commands
- [out of scope] Replace vendor benchmark claims with my own benchmark for a one-server test
This is still not a benchmark. It is a tested setup walkthrough and an initial MCP + Code Mode proof. I do not want to turn a one-server smoke test into a fake performance claim.
Final thought
I am not testing Bifrost because I need another AI tool.
I am testing it because the wiring is getting messy.
Every agent wants a config file. Every provider wants a key. Every MCP server adds tools. Every tool adds context.
Bifrost does not solve all of that, but it does give you one place to inspect and control it.
In this setup, the core pieces worked:
- the gateway installed cleanly
- the dashboard gave me visibility
- Ollama routed through it
- MCP tools were governable
- Code Mode exposed the expected meta-tools
- OpenCode completed a request through the gateway
The limits are also clear:
- OpenAI routing still needs a real OPENAI_API_KEY
- Codex CLI launched, but the request path needs Responses compatibility
- the Code Mode savings claim still needs a larger multi-server test
My takeaway is practical: Bifrost can be a useful control layer for local agent experiments, but the value depends on provider wiring, MCP scoping, and agent-specific testing.