I have been spending a lot of time around AI coding tools lately: VS Code, Copilot, Claude Code, Codex-style workflows, local agents, and MCP servers. The more tools you connect, the more you start running into the same problem.
The agent is not just reading your prompt anymore.
It is reading tool definitions, server descriptions, schemas, API shapes, permissions, and all the extra context that comes with giving an AI system access to real tools.
That is where Bifrost caught my attention.
Bifrost is an open-source AI gateway from Maxim AI. The basic idea is that instead of wiring every model provider and every tool directly into every agent, you put Bifrost in the middle as the control layer.
The purpose of this test was simple: install the gateway, route real model traffic through it, connect an MCP server, turn on Code Mode, and see whether a coding agent can actually work through that setup.
The governance part matters just as much as the routing part. Once an agent can call tools, read files, or reach internal APIs, I want a way to answer basic questions:
- Which key is this agent using?
- Which model did it call?
- Which tools can it reach?
- What did the run cost?
- Where do I inspect the logs when something gets weird?
Repository: maximhq/bifrost, described as the "Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS."
Bifrost AI Gateway
The fastest way to build AI applications that never go down
Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.
Quick Start
Go from zero to production-ready AI gateway in under a minute.
Step 1: Start Bifrost Gateway
# Install and run locally
npx -y @maximhq/bifrost
# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
Step 2: Configure via Web UI
# Open the built-in web interface
open http://localhost:8080
Step 3: Make your first API call
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello, Bifrost!"}]
}'
That's it! Your AI gateway is running with a web interface for visual configuration…
Official setup docs:
Open the official Bifrost gateway setup guide.
The problem I am looking at
MCP is useful because it gives AI agents a standard way to use tools. That can mean filesystem access, search, database calls, internal APIs, browser tools, or anything else exposed through an MCP server.
That sounds great until the tool list gets large.
In classic MCP usage, the model can end up receiving a large set of tool definitions in its context. If you connect a few servers, that might not matter. If you connect a lot of servers, the model starts spending tokens just reading what tools exist before it even gets to the actual task.
That is the part I care about.
I do not want my coding agent wasting context on every possible tool every time. I want it to discover what it needs, use that, and keep the context smaller.
Bifrost Code Mode is designed for that exact issue.
Getting started with Bifrost CLI
Bifrost CLI is the launcher layer for coding agents. Instead of manually wiring environment variables, provider URLs, model names, and MCP config into every tool, the CLI walks you through the setup in a terminal UI.
The official Bifrost CLI docs describe it as a way to launch supported coding agents through the gateway with automatic configuration, model selection, MCP integration, and no manually exported environment variables.
Bifrost CLI page:
Open the Bifrost CLI overview.
Bifrost CLI with Codex CLI:
The quick version is this:
npx -y @maximhq/bifrost
That starts the Bifrost gateway.
Then in another terminal, assuming Node.js 18+ and npm are available:
npx -y @maximhq/bifrost-cli
That starts the interactive CLI.
From there, the CLI asks for things like:
- gateway URL
- virtual key, if you are using one
- coding agent or harness
- model selection
- launch settings, including worktree mode where the selected harness supports it
The current docs show these supported coding-agent harnesses:
- Claude Code
- Codex CLI
- Gemini CLI
- Opencode
The useful part is that the CLI owns the provider-specific wiring. It can fetch available models from the gateway, offer to install missing agents through npm, launch agents inside a persistent tabbed terminal UI, and remember the last gateway, harness, and model selection.
The CLI does create its own state. In my run, that state lived under ~/.bifrost/. The practical point is that I did not have to hand-write each agent's base URL, API key, and model config before launching. Virtual keys are handled separately through the OS keyring instead of being stored as plaintext in the CLI config.
That matters if you move between agents. I can test Codex CLI, then OpenCode, then another harness without rebuilding the whole provider setup from scratch each time.
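If you want to sanity-check what the CLI remembered between runs, the state files are plain JSON. A quick look, using the paths from my run (the layout may change between versions):
# Inspect the Bifrost CLI's saved selections.
cat ~/.bifrost/config.json   # gateway URL and CLI config
cat ~/.bifrost/state.json    # last selected harness and model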
Installation and integration guide
Deploy the gateway
The simplest local setup is NPX:
npx -y @maximhq/bifrost
If you prefer Docker:
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
That maps the gateway to port 8080 and stores Bifrost data in a local data folder so your config does not disappear every time the container stops.
After it starts, the local dashboard should be available here:
http://localhost:8080
The gateway gives you the web UI, real-time monitoring, provider setup, logs, analytics, and governance controls. That is the part that turns this from "a model URL in a config file" into an inspectable control plane.
I checked the gateway first:
curl http://localhost:8080
If the gateway is alive, then I can move on to provider setup, MCP setup, and CLI testing.
First API request through Bifrost
Once a provider is configured, a basic OpenAI-compatible request looks like this:
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "Hello, Bifrost!"
}
]
}'
That example is not meant to run inside DEV. It is meant to run from your own terminal after Bifrost is running locally.
I am intentionally not using a RunKit embed here because RunKit cannot talk to localhost:8080 on your machine. A regular terminal command is the honest example.
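Because the endpoint is OpenAI-compatible, the response should come back in the familiar chat-completions shape. A trimmed illustration of what to expect; the field values here are placeholders, not output from a real run:
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "openai/gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21 }
}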
Add MCP clients and enable Code Mode
The MCP path is:
Dashboard → MCP Gateway → add client → choose STDIO, HTTP, or SSE → configure command or URL → choose tool permissions
For this test, I used a STDIO filesystem MCP server and scoped it to the donotpush draft folder. That matters because the gateway should not be treated as a reason to give every agent access to the entire machine.
Code Mode can be enabled per MCP client from the dashboard or through the API. Once it is enabled, that client's direct tools are no longer the first thing the model sees. The model sees the Code Mode meta-tools first, then reads the specific stub file and docs it needs.
Integrate it with a project
The project flow I would use again is:
Start Bifrost
Configure one provider
Add the MCP server for the project
Launch the coding agent through Bifrost CLI
Run the smallest possible task first
Check the dashboard logs before expanding permissions
For VS Code work, I would keep the gateway running in a separate terminal, launch the agent from the project folder, and scope MCP servers to the current repo or a git worktree. If virtual keys are enabled, I would issue separate keys for separate projects or demos so usage, budgets, and permissions do not get mixed together.
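Condensed into commands, that loop is small. The project path here is illustrative; the two npx commands are the same ones used throughout this post:
# Terminal 1: keep the gateway running
npx -y @maximhq/bifrost

# Terminal 2: launch the coding agent from the project folder
cd ~/projects/my-repo
npx -y @maximhq/bifrost-cli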
Where Code Mode fits in
Bifrost Code Mode is the part I am most interested in.
Code Mode changes how the model interacts with MCP tools. Instead of showing the model every tool definition from every connected MCP server, it exposes a smaller set of meta-tools. The model can inspect available tool files, read the specific tool signatures it needs, get docs for a tool, and then execute code against the tool bindings.
Code Mode docs:
Open the official Code Mode documentation.
Bifrost's own docs describe four core Code Mode tools:
listToolFiles
readToolFile
getToolDocs
executeToolCode
The important idea is simple:
Classic MCP can make the model carry a large tool catalog.
Code Mode lets the model discover and use only what it needs.
Why Code Mode exists
The problem is context bloat. In a small MCP setup, loading every available tool definition may be acceptable. In a larger setup, every model request can carry schemas for tools the model will not use.
Bifrost's docs use the example of five MCP servers with around 100 tools. In the classic flow, the same large tool catalog can be present across multiple turns. In Code Mode, the model sees the four meta-tools, asks for the relevant .pyi stub file, optionally asks for detailed docs, and then runs a short script through executeToolCode.
That is also why Code Mode uses code instead of a long chain of individual tool calls. The model can orchestrate the workflow in a sandboxed Starlark interpreter, and the intermediate tool results do not all have to be pushed back through the model.
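To make that concrete, here is a hypothetical sketch of the kind of script a model could hand to executeToolCode. The filesystem_blog binding name matches the stub file from my test later in this post, but the return shapes and the result convention are my assumptions, not confirmed API:
# Hypothetical Starlark sketch: Python-like, no imports, no file or network I/O.
# Assumes server-level binding exposes the MCP server as `filesystem_blog`
# and that tool calls return dicts, per the .pyi signatures shown later.
dirs = filesystem_blog.list_allowed_directories()
listing = filesystem_blog.list_directory(path="/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush")
# Intermediate results stay inside the sandbox; only the summary goes back to the model.
entries = listing.get("entries", [])
result = {"allowed": dirs, "entry_count": len(entries)}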
Performance and cost claims
Bifrost's docs claim that, for a workflow with about 100 tools across five MCP servers, Code Mode changes the shape from roughly six LLM turns with the full tool catalog repeatedly present to about three or four turns with about 50 tool-definition tokens. Their docs summarize that example as roughly 50% cost reduction and fewer LLM round trips.
Their MCP Gateway benchmark goes further at larger scale. In Maxim's published results, Code Mode reduced estimated cost by 55.7% with 96 tools, 83.4% with 251 tools, and 92.2% with 508 tools.
Those are vendor numbers. I am including them because they explain why this feature is worth testing, not because my one-server filesystem test reproduced them.
When I would enable it
My rule of thumb after this pass is close to the docs:
- use classic MCP for one or two small servers with simple direct tool calls
- enable Code Mode when there are three or more MCP servers
- enable Code Mode for heavier servers like filesystem, search, docs, databases, CRM, or internal APIs
- use tool-level binding when one server has a large or complicated tool surface
- keep the allowed tool list tight either way
How it works
Code Mode exposes MCP tools as virtual .pyi stub files. There are two binding levels:
- server-level binding: one stub file per MCP server
- tool-level binding: one stub file per tool
Server-level binding is simpler when a server has a small tool set. Tool-level binding is better when a server has a lot of tools or large schemas.
The code runs in Starlark, a deterministic Python-like language. Bifrost's docs describe it as intentionally constrained: no imports, no file I/O, no network access, just allowed tool calls and basic Python-like logic. That is the safety property that makes Code Mode interesting for agent workflows.
Classic MCP vs Code Mode in plain English
Classic MCP is like handing someone a giant binder full of every tool manual before every task.
Code Mode is more like giving them a small index first. They look up the exact tool they need, read only that tool's instructions, then run the task. That matters when the number of tools grows.
Why that matters for coding agents
Coding agents already use a lot of context.
They read files, inspect diffs, look at errors, review stack traces, and sometimes pull in project history. If you also make them carry every connected MCP tool definition on every request, the context gets noisy fast.
That can hurt:
- cost
- latency
- tool selection
- reliability
- debugging
For small setups, this might not matter. If you only have one or two MCP servers with a few tools, classic MCP can be fine.
But once you start connecting heavier tools, Code Mode makes more sense.
That is why I tested it with a real coding workflow instead of just repeating the docs.
The workflow I tested
My test setup was based around the tools I actually use:
- VS Code
- Bifrost gateway
- Bifrost CLI
- Codex CLI and OpenCode
- one filesystem MCP server
- a real local project
The basic flow looked like this:
Start Bifrost gateway
↓
Open Bifrost dashboard
↓
Configure provider
↓
Start Bifrost CLI
↓
Launch coding agent through Bifrost
↓
Run a small repo task
↓
Compare classic MCP flow vs Code Mode flow
The task was intentionally small. I did not want a giant fake benchmark. I wanted enough proof to answer the practical question:
Can I route a coding agent through Bifrost, expose MCP tools, turn on Code Mode, and see the control layer working?
That is the first real step toward a working MCP/code-mode workflow.
The small agent task
For the coding-agent check, I used a deliberately boring prompt:
Do not edit files. Reply with exactly: opencode through bifrost ok
That does not prove the agent can solve every repo task through Bifrost. It proves something narrower and more useful: the agent can start, use the configured Bifrost provider, call the model through the gateway, and return a response.
Then I compared the MCP side separately:
Classic MCP:
- Did the model see the direct filesystem tools?
- Did it produce a normal MCP tool call?
- Could I execute that tool call through Bifrost?
Code Mode:
- Could I enable Code Mode on the MCP client?
- Did the four meta-tools work?
- Could executeToolCode call the underlying filesystem server?
What I am not claiming
This is not a benchmark.
Bifrost has published claims around reducing token usage with Code Mode, especially as MCP tool counts grow. I am not replacing those claims with my own numbers yet.
My test only used one filesystem MCP server. With one server, there is not enough tool sprawl to make a serious token-savings claim. The important result here is that the components connected:
- gateway
- provider route
- local Ollama
- Ollama cloud
- MCP server
- Code Mode meta-tools
- coding agent
That is the difference between "I read the docs" and "I actually ran it."
What I measured
I also have Ollama installed locally, with both local models and Ollama cloud variants available. That let me test:
- local Ollama model routing
- Ollama cloud model routing
- Bifrost provider behavior
- MCP tool discovery
- Code Mode meta-tool behavior
- one coding-agent launch through Bifrost
Here is the measurement checklist after the hands-on pass. I am treating each item as resolved one of three ways: done, blocked, or out of scope for this first pass.
Install/setup:
- [x] NPX start cleanly
- [x] Docker start cleanly
- [x] Dashboard available at localhost:8080
- [x] Provider setup persists
- [x] OpenAI config bootstrap tested
- [ ] OpenAI provider request blocked by missing OPENAI_API_KEY
CLI:
- [x] Bifrost CLI detects the gateway
- [x] Bifrost CLI asks for the expected config
- [x] Bifrost CLI shows Codex CLI as installed
- [x] Bifrost CLI lets me set the gateway URL and model
- [x] Config is stored in ~/.bifrost/config.json and ~/.bifrost/state.json
- [x] Bifrost CLI launches Codex CLI
- [x] Bifrost CLI prints the MCP server URL for Codex CLI
- [x] Codex CLI launch works
- [ ] Codex CLI request blocked because current Codex expects Responses API for this route
MCP:
- [x] Add at least one MCP server
- [x] Gateway can see the tools
- [x] Model can produce a classic MCP tool call
- [x] Classic MCP tool call works
- [x] Enable Code Mode
- [x] Code Mode exposes the four meta-tools
- [x] Run the same "list allowed directories" task with classic MCP and Code Mode
- [x] Route local Ollama models through Bifrost
- [x] Route Ollama cloud models through Bifrost
Agent workflow:
- [x] Complete a minimal coding-agent request through Bifrost with OpenCode
- [x] Keep the response understandable
- [x] Confirm tool-governed filesystem access through MCP
- [ ] OpenAI-vs-Ollama provider fallback blocked by missing OpenAI key
- [out of scope] Prove token savings with a large multi-server setup
Cost/context:
- [out of scope] Fewer tokens used in a realistic multi-server setup
- [x] Latency is acceptable for small local tests
- [x] Workflow is inspectable from one gateway dashboard
This is the stuff that matters to me more than a marketing number.
At this point, the unresolved items are not hidden: OpenAI fallback needs an OpenAI key, Codex needs a Responses-compatible route for this specific provider path, and a real token-savings benchmark needs a larger multi-server setup.
Hands-on results
I started with the local gateway and confirmed it is alive on http://127.0.0.1:8080.
The gateway shell is up, and /workspace/dashboard returns the Bifrost dashboard.
The dashboard matters because it gives the run a second source of truth. I can see request volume, token usage, model usage, latency, cache state, and cost in one place instead of piecing the story together only from terminal output.
The same run, condensed into the panels I actually cared about:
The Bifrost CLI resolved the default harness as Codex CLI. I changed the model selection to openai-ollama2/gpt-oss:20b-cloud after confirming that the custom provider could route Ollama cloud requests.
On disk I found the expected Bifrost state:
- ~/.bifrost/config.json is present and points at http://localhost:8080
- ~/.bifrost/state.json stores the selected harness and model
- ./bifrost-data/config.db exists and contains the usual config tables
I also checked the repo .env file. It contains app-specific secrets for this blog project, but it does not contain an OpenAI provider key for Bifrost.
I did one more concrete provider bootstrap test:
- created bifrost-temp/config.json with providers.openai.keys[0].value = "env.OPENAI_API_KEY"
- launched a fresh Bifrost instance on http://127.0.0.1:8081
- the gateway started successfully and wrote the config store state to ./config.db
- the provider row exists, but Bifrost reports "no valid keys found for provider: openai" because OPENAI_API_KEY is not set in the shell
That means the gateway and config-file-based OpenAI bootstrapping work, but the OpenAI provider route is blocked by a missing secret.
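For reference, the minimal config file I used looked roughly like this. The nesting follows the providers.openai.keys[0].value path described above; the other key fields are assumptions borrowed from the provider payload shown later:
{
  "providers": {
    "openai": {
      "keys": [
        {
          "name": "openai-key",
          "value": "env.OPENAI_API_KEY",
          "models": [],
          "weight": 1.0
        }
      ]
    }
  }
}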
Then I moved on to Ollama.
- started ollama serve locally on http://127.0.0.1:11434
- confirmed the Ollama API responds with gemma3:1b, qwen3-coder:30b, tinyllama:latest, mistral:latest, and other models
- configured a custom Bifrost provider for Ollama using the documented /api/providers API
- discovered that a key with an empty models list is the right way to allow all Ollama models through Bifrost
- successfully sent a request through Bifrost to openai-ollama2/gemma3:1b and received a valid completion from local Ollama
- successfully sent a request through Bifrost to openai-ollama2/gpt-oss:20b-cloud and received a valid completion from Ollama cloud
The key config looked like this:
{
"provider": "openai-ollama2",
"keys": [
{
"name": "openai-ollama2-key",
"value": "dummy",
"models": [],
"weight": 1.0
}
],
"network_config": {
"base_url": "http://127.0.0.1:11434",
"default_request_timeout_in_seconds": 60
},
"custom_provider_config": {
"base_provider_type": "openai",
"allowed_requests": {
"chat_completion": true,
"chat_completion_stream": true
}
}
}
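I submitted that payload through the documented providers endpoint. A minimal sketch, assuming POST /api/providers accepts the JSON above verbatim, saved here as a hypothetical ollama-provider.json:
# Register the custom Ollama provider with the gateway.
curl -X POST http://127.0.0.1:8080/api/providers \
  -H "Content-Type: application/json" \
  -d @ollama-provider.json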
That means the gateway can route both a local Ollama model and an Ollama cloud model.
The request volume panel captured the real setup process: both successful checks and the failures that surfaced while I was tuning the route.
Token usage is the panel I care about most for Code Mode testing. This first run was not a benchmark, but it gave me a baseline for what the gateway observed.
Model usage confirmed that requests were not just hitting "some model somewhere." Bifrost could show which routed models were actually being used.
Latency was uneven, which makes sense for a mixed local/cloud setup. That is exactly why having the gateway view is useful.
Cache was effectively unused in this small test, which is fine. I would rather see a boring 0.0% than pretend caching was involved when it was not.
I also verified Docker is installed and now reachable. I successfully launched Bifrost in Docker on http://127.0.0.1:8082, and the gateway responded correctly:
docker run -p 8082:8080 -v $(pwd)/bifrost-temp:/app/data maximhq/bifrost
That is the practical shape of this setup: the gateway can run, the dashboard works, Ollama can route through it, and provider wiring still matters.
Classic MCP versus Code Mode
For MCP, I added a filesystem server pointed only at the draft folder:
curl -X POST http://127.0.0.1:8080/api/mcp/client \
-H "Content-Type: application/json" \
-d '{
"name": "filesystem_blog",
"connection_type": "stdio",
"stdio_config": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush"
],
"envs": ["HOME", "PATH"]
},
"tools_to_execute": ["*"],
"is_ping_available": false
}'
Bifrost connected it and discovered 14 tools.
The classic MCP test worked. The model produced this tool call:
filesystem_blog-list_allowed_directories
Then I executed the tool through Bifrost:
curl -X POST http://127.0.0.1:8080/v1/mcp/tool/execute \
-H "Content-Type: application/json" \
-d '{
"id": "call_pjlop9a3",
"type": "function",
"function": {
"name": "filesystem_blog-list_allowed_directories",
"arguments": "{}"
}
}'
Result:
Allowed directories:
/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush
That is the governance point in miniature. The server was not pointed at my whole machine. It was pointed at one draft folder.
Then I enabled Code Mode on that MCP client:
curl -X PUT http://127.0.0.1:8080/api/mcp/client/f771c023-8a16-4b34-b03f-ffbfffd34e4b \
-H "Content-Type: application/json" \
-d '{
"name": "filesystem_blog",
"connection_type": "stdio",
"stdio_config": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush"
],
"envs": ["HOME", "PATH"]
},
"tools_to_execute": ["*"],
"is_ping_available": false,
"is_code_mode_client": true
}'
After that, the four Code Mode meta-tools worked.
listToolFiles returned:
servers/
filesystem_blog.pyi
readToolFile returned a compact stub file with signatures like:
def read_text_file(path: str, head: float = None, tail: float = None) -> dict
def list_allowed_directories() -> dict
def list_directory(path: str) -> dict
getToolDocs returned detailed docs for a specific tool.
executeToolCode successfully called the filesystem MCP server from inside the Code Mode execution layer:
Execution completed successfully.
Return value: "Allowed directories:\n/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush"
That matches the mental model from the docs: the model can inspect a small tool-file surface first, then use code to call the specific tool binding it needs.
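For the shape of that call: a hedged sketch, assuming executeToolCode runs through the same /v1/mcp/tool/execute endpoint as the classic tool call, with the Starlark source passed as a string argument. The code key is my guess, not confirmed API:
# Assumed request shape for the executeToolCode meta-tool.
curl -X POST http://127.0.0.1:8080/v1/mcp/tool/execute \
  -H "Content-Type: application/json" \
  -d '{
    "id": "call_codemode_demo",
    "type": "function",
    "function": {
      "name": "executeToolCode",
      "arguments": "{\"code\": \"filesystem_blog.list_allowed_directories()\"}"
    }
  }'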
The task was the same in both modes:
List the allowed filesystem directory.
Classic MCP exposed the filesystem tools directly and produced a direct tool call:
filesystem_blog-list_allowed_directories
Code Mode exposed the smaller meta-tool surface first:
listToolFiles
readToolFile
getToolDocs
executeToolCode
On this tiny setup, I am not claiming Code Mode saved money. The useful result is simpler: I verified the behavior difference. Classic MCP presented the actual filesystem tools. Code Mode presented meta-tools, let me inspect the virtual .pyi file, and then executed code against the underlying filesystem binding.
Coding agent results
This is where the test got more interesting.
Bifrost CLI launched Codex CLI and showed the selected model:
Harness Codex CLI (codex-cli 0.125.0)
Model openai-ollama2/gpt-oss:20b-cloud
It also printed this:
MCP: Codex CLI has no native auto-attach yet. Use server URL: http://localhost:8080/mcp
So the CLI launch path works.
But my actual Codex request did not complete against this provider route. The current Codex non-interactive path wanted the Responses API, while this Ollama route through Bifrost was working through chat completions. With a temporary CODEX_HOME, Codex reported:
Error loading config.toml: `wire_api = "chat"` is no longer supported.
How to fix: set `wire_api = "responses"` in your provider config.
That is useful, not a failure of the whole test. It tells me the Codex path needs either a Responses-compatible provider route or a different model/provider setup.
For the real coding-agent request, I used OpenCode because it supports an OpenAI-compatible chat-completions provider cleanly.
Temporary OpenCode config:
{
"$schema": "https://opencode.ai/config.json",
"model": "bifrost/openai-ollama2/gpt-oss:20b-cloud",
"provider": {
"bifrost": {
"npm": "@ai-sdk/openai-compatible",
"name": "Bifrost",
"options": {
"baseURL": "http://127.0.0.1:8080/v1",
"apiKey": "dummy"
},
"models": {
"openai-ollama2/gpt-oss:20b-cloud": {
"name": "Ollama cloud through Bifrost",
"limit": {
"context": 32768,
"output": 4096
}
}
}
}
}
}
Then I ran:
OPENCODE_CONFIG=/tmp/bifrost-opencode.json opencode run \
--dir /Users/bradleymatera/Desktop/gatsby-starter-minimal-blog \
--format json \
--model bifrost/openai-ollama2/gpt-oss:20b-cloud \
'Do not edit files. Reply with exactly: opencode through bifrost ok'
Result:
opencode through bifrost ok
The returned usage was:
17154 total tokens
17100 input tokens
54 output tokens
That is the first real coding-agent request through Bifrost in this test.
The terminal evidence for the provider routes, MCP connection, Code Mode tools, and OpenCode run looked like this:
Governance is another reason this matters
The other part of Bifrost that interests me is governance.
When an AI agent can call tools, there needs to be a control layer. I do not want every agent to have every permission by default.
A gateway can help with that. The dashboard makes that control-plane idea visible instead of leaving it buried across random config files:
Bifrost supports ideas like:
- virtual keys
- budget control
- provider routing
- MCP tool governance
- audit logs
- cost tracking
The MCP gateway resource goes deeper into this:
This is not just enterprise buzzword stuff. Even for solo projects, it is useful to know which key is being used, what model is being called, and what tools the agent can reach.
If I am running multiple demos or client-style experiments, I want the ability to separate keys, track usage, and avoid accidental spending.
The governance surface is right there: virtual keys, users, teams, roles, permissions, audit logs, and guardrails.
Bifrost as the control plane
The way I think about it is:
Without Bifrost:
Agent → Provider
Agent → MCP server
Agent → Another provider
Agent → Another tool config
Agent → Another local config file
With Bifrost:
Agent → Bifrost → Providers
→ MCP tools
→ routing
→ governance
→ logs
→ cost tracking
That second layout is easier to reason about.
It does not magically make agents perfect, but it gives you one place to inspect and control the system.
Benefits and limitations
The benefits that showed up clearly in this pass:
- the gateway is easy to start with NPX or Docker
- the dashboard makes model traffic, errors, latency, token usage, and cost visible
- Bifrost CLI reduces per-agent setup work
- the CLI can switch between supported coding agents without rebuilding every config by hand
- virtual keys, tool groups, audit logs, budget controls, and cost tracking give agent workflows a governance layer
- Code Mode changes the MCP tool surface from direct tool injection to a smaller discovery-and-execution flow
The main limitation is that Code Mode is not something I would turn on blindly and call done. The docs currently call out Code Mode as available in v1.4.0-prerelease1 and above, while my local dashboard showed a newer v1.4.24 build. Either way, I would still verify the exact version and behavior in the environment where it will run.
Classic MCP can also be the better choice for very small setups. If the agent only needs one or two simple servers, the extra Code Mode discovery step may not matter. Code Mode becomes more compelling as the number of MCP servers, tools, and multi-step workflows grows.
The other limitation in my test was provider-specific. OpenCode worked through my Bifrost + Ollama route. Codex CLI launched through Bifrost CLI, but my actual Codex request hit a Responses API expectation that did not match the custom Ollama chat-completions route I had configured. That is not a reason to throw away the setup. It is a reminder to test each agent/provider pair instead of assuming every OpenAI-compatible route behaves the same way.
Related reading
Bifrost GitHub repository:
maximhq/bifrost (repository description and quick start excerpted at the top of this article)
Bifrost CLI article:
Bifrost MCP Gateway and Code Mode article:
Official Bifrost setup article on Medium:
Final status
Here is the final checklist:
- [x] Install Bifrost locally with NPX
- [x] Install Bifrost locally with Docker
- [x] Open the dashboard and confirm the UI works
- [x] Configure at least one provider (Ollama)
- [ ] Configure an OpenAI provider (request blocked by missing OPENAI_API_KEY)
- [x] Start Bifrost CLI
- [x] Confirm Bifrost CLI asks for the right config
- [x] Confirm Bifrost CLI config storage
- [x] Launch Codex CLI through Bifrost CLI
- [x] Record the Codex limitation with this route
- [x] Launch one working coding agent through Bifrost with OpenCode
- [x] Add at least one MCP server
- [x] Verify the tool surface
- [x] Verify classic MCP tool execution
- [x] Enable Code Mode
- [x] Verify the Code Mode meta-tools
- [x] Run the same small MCP task with and without Code Mode
- [x] Verify local Ollama routing through Bifrost
- [x] Verify Ollama cloud routing through Bifrost
- [ ] Check provider selection and fallback path (blocked by missing OpenAI key)
- [x] Capture screenshots of the dashboard and terminal results
- [x] Record exact errors, fixes, and commands
- [out of scope] Replace vendor benchmark claims with my own numbers (a one-server test cannot support that)
The last item matters. This is still not a benchmark. It is a tested setup walkthrough and a first working MCP/code-mode pass. I do not want to turn a one-server smoke test into a fake performance claim.
Final thought
The reason Bifrost interests me is not because I need another shiny AI tool. I already have enough of those.
The reason it interests me is because AI coding workflows are getting messy.
Every agent wants a config file. Every provider wants a key. Every MCP server adds tools. Every tool adds context. Every context increase adds cost and noise.
Bifrost looks like one possible way to bring that under control.
The real test was not whether the homepage sounded good. The real test was whether I could install it, connect it to my normal workflow, and see the control layer working on an actual project.
That part worked.
The next articles I would actually want to read are the deeper ones: clustering, custom plugins, LangChain-style integrations, and a real multi-server Code Mode benchmark with enough tools attached for the token savings claim to matter.
For now, the takeaway is narrower and more useful: Bifrost installed cleanly, the dashboard gave me visibility, Ollama routed through it, MCP tools were governable, Code Mode exposed the expected meta-tools, and at least one coding agent completed a request through the gateway.