
Bradley Matera


Testing Bifrost CLI and Code Mode: What Worked, What Broke, and What I Verified

I spend a lot of time wiring AI coding tools together: VS Code, Copilot, Claude Code, Codex-style flows, local agents, and MCP servers.

The problem is not the model anymore. It is the plumbing.

Every new provider needs a key. Every MCP server adds another tool catalog. Every tool adds schema, API shapes, and prompt context.

That is where Bifrost caught my attention.

Bifrost is an open-source gateway from Maxim AI. It is not meant to replace models or agents. It is meant to sit between them and make the whole system easier to inspect and control.

This test had a simple goal: start the gateway, route real model traffic through it, attach a filesystem MCP server, enable Code Mode, and see whether a coding agent could actually work through that stack.

I also wanted practical answers:

  • Which key did this agent use?
  • Which model did it call?
  • Which tools could it reach?
  • What did the run cost?
  • Where could I inspect the logs?

That is the thread through this post. It is not a marketing summary. It is a field test of the local gateway, the CLI, provider routing, MCP setup, Code Mode, and a real coding-agent launch.

Repository:

maximhq / bifrost

Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

Bifrost AI Gateway


The fastest way to build AI applications that never go down

Bifrost is a high-performance AI gateway that unifies access to 23+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.

Quick Start

Get started

Go from zero to production-ready AI gateway in under a minute.

Step 1: Start Bifrost Gateway

# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost

Step 2: Configure via Web UI

# Open the built-in web interface
open http://localhost:8080

Step 3: Make your first API call

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'

That's it! Your AI gateway is running with a web interface for visual configuration…

Official setup docs:

Open the official Bifrost gateway setup guide.

What I was actually testing

MCP is useful because it gives AI agents a standard way to call tools. Those tools can be filesystem access, search, databases, internal APIs, browser automation, or anything else exposed through an MCP server.

That sounds good until your tool list gets large.

Classic MCP can put a big tool catalog into the model's context. With a few servers that is okay. With many servers, the model spends tokens just reading what tools exist.

That is the problem I want to avoid.

I do not want my coding agent spending context on every possible tool every time. I want it to discover what it needs and keep the prompt smaller.

Bifrost's Code Mode is designed for that issue.

So the question was not "does Bifrost look useful?" The question was:

Can I install it, wire it into a real project, and see the tool-control layer work end to end?

Why the Bifrost CLI matters

The gateway routes providers. The CLI is what makes the setup usable for coding agents.

I wanted to avoid hand-editing agent config every time I switched models or providers.

The workflow I tested was:

  • launch the gateway
  • run bifrost-cli
  • pick a harness
  • pick a model
  • launch the agent

The CLI stores state under ~/.bifrost/, including gateway URL, selected model, and harness.

That matters because it keeps the gateway URL, model, and harness selection in one shared place instead of every agent maintaining a separate, brittle config.

Supported harnesses in the docs included:

  • Claude Code
  • Codex CLI
  • Gemini CLI
  • OpenCode

Installation and integration

I tested both ways to run the gateway.

Local NPX

npx -y @maximhq/bifrost

Docker

docker run -p 8080:8080 -v "$(pwd)/data:/app/data" maximhq/bifrost

That gives you a dashboard at:

http://localhost:8080

I always start there. If the dashboard does not respond, nothing else matters.

First smoke test

curl http://localhost:8080

Once the gateway was up, the next step was provider setup, MCP setup, and CLI testing.

First model request

A basic OpenAI-compatible request looks like this:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      { "role": "user", "content": "Hello, Bifrost!" }
    ]
  }'

I did not use RunKit here because RunKit cannot reach localhost:8080.
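
For scripted checks, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not a Bifrost client: the helper names `build_chat_request` and `send` are my own, and it assumes the gateway is listening on http://localhost:8080 as in the curl example. Building the request is separate from sending it, so the payload can be inspected without a running gateway.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8080"  # assumption: same address as the curl example


def build_chat_request(model: str, prompt: str,
                       base_url: str = GATEWAY_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON reply (needs a running gateway)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only builds the request; call send(req) once the gateway is up.
req = build_chat_request("openai/gpt-4o-mini", "Hello, Bifrost!")
print(req.get_method(), req.full_url)
```

Calling `send(req)` against a live gateway should return the decoded JSON body; I left that call out of the sketch so it runs without network access.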

Add an MCP client and enable Code Mode

The Bifrost path is:

Dashboard → MCP Gateway → add client → choose STDIO, HTTP, or SSE → configure command or URL → choose tool permissions

For this test, I used a STDIO filesystem MCP server scoped to the donotpush draft folder.

That scope matters. A gateway should not become an excuse to give every agent access to the whole machine.

Once the MCP client exists, Code Mode can be enabled for that client. When Code Mode is on, the model does not get the full tool catalog directly. It gets four meta-tools first, then inspects and executes only what it needs.

Where Code Mode fits in

This was the part I cared about most.

Classic MCP can work fine. It can also force the model to carry a lot of extra tool metadata.

Code Mode is meant to change that.

Bifrost exposes these four meta-tools:

listToolFiles
readToolFile
getToolDocs
executeToolCode

That means the model can ask:

  • what tool files are available?
  • what is in this stub?
  • what are the docs for this tool?
  • execute this code against the tool binding

The idea is to give the model an index instead of a giant binder.

Why Code Mode exists

The problem is context bloat. In a small MCP setup, every tool definition may be fine. In a larger setup, it becomes waste.

Bifrost's docs use an example with five MCP servers and around 100 tools. In the classic flow, the model carries that catalog across multiple turns. In Code Mode, it carries only four meta-tools and then loads the specific stubs it needs.

This is why Code Mode uses code instead of a long chain of individual tool calls. The model can orchestrate the workflow in a constrained Starlark interpreter, and intermediate results do not have to get pushed back through the model.
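
To make the context argument concrete, here is a rough back-of-the-envelope sketch. Every number in it is a hypothetical placeholder (tokens per tool definition, tokens per stub, number of turns), not a measurement from my test or a figure from Bifrost's docs. Only the shape of the comparison is the point.

```python
def classic_tool_tokens(num_tools: int, tokens_per_tool: int, turns: int) -> int:
    """Classic MCP: the whole tool catalog rides along on every turn."""
    return num_tools * tokens_per_tool * turns


def code_mode_tool_tokens(meta_tools: int, tokens_per_tool: int, turns: int,
                          stubs_loaded: int, tokens_per_stub: int) -> int:
    """Code Mode: only the meta-tools ride along; stubs are read on demand.

    Simplification: each stub is counted once, even though a loaded stub
    would normally stay in the conversation for later turns.
    """
    return meta_tools * tokens_per_tool * turns + stubs_loaded * tokens_per_stub


# Hypothetical inputs: 100 tools, 150 tokens per definition, 6 turns.
classic = classic_tool_tokens(num_tools=100, tokens_per_tool=150, turns=6)

# Hypothetical inputs: 4 meta-tools, 4 turns, 2 stubs of 400 tokens each.
code_mode = code_mode_tool_tokens(meta_tools=4, tokens_per_tool=150, turns=4,
                                  stubs_loaded=2, tokens_per_stub=400)

print(f"classic: {classic} tokens, code mode: {code_mode} tokens")
```

With a single small server the two numbers land much closer together, which is why I treat my own run as behavior verification only.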

Performance and cost claims

Bifrost's docs claim that for about 100 tools across five MCP servers, Code Mode can cut the interaction shape from six LLM turns with a full catalog to three or four turns with about 50 tool-definition tokens.

Their published benchmark goes further: 55.7% lower estimated cost with 96 tools, 83.4% with 251 tools, and 92.2% with 508 tools.

Those are vendor claims. I am including them here because they explain why the feature is worth testing, not because my one-server filesystem test reproduced them.

When I would enable it

My rule of thumb after this pass:

  • keep classic MCP for one or two small servers
  • enable Code Mode when you have three or more MCP servers
  • enable it for heavier servers like filesystem, search, docs, databases, CRM, or internal APIs
  • use tool-level binding when one server has many tools or big schemas
  • keep the allowed tool list tight either way
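
The rule of thumb above can be written down as a small decision helper. This is a sketch of my own heuristic, not anything Bifrost ships: the `McpSetup` type, the `recommend` function, and the `many_tools_threshold` of 20 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class McpSetup:
    server_count: int
    has_heavy_server: bool          # e.g. filesystem, search, docs, databases
    max_tools_on_one_server: int


def recommend(setup: McpSetup,
              many_tools_threshold: int = 20) -> Tuple[str, Optional[str]]:
    """Return (mode, binding); binding is None when classic MCP is enough."""
    use_code_mode = setup.server_count >= 3 or setup.has_heavy_server
    if not use_code_mode:
        return ("classic", None)
    # Hypothetical cutoff for "many tools"; tune it for your own servers.
    binding = "tool" if setup.max_tools_on_one_server > many_tools_threshold else "server"
    return ("code_mode", binding)


print(recommend(McpSetup(server_count=1, has_heavy_server=False, max_tools_on_one_server=5)))
print(recommend(McpSetup(server_count=4, has_heavy_server=True, max_tools_on_one_server=40)))
```

The allowed tool list is deliberately not part of the helper: keeping it tight applies in both modes.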

How it works

Code Mode exposes MCP tools as virtual .pyi stub files. There are two binding levels:

  • server-level binding: one stub file per MCP server
  • tool-level binding: one stub file per tool

Server-level binding is simpler when a server has a small tool set. Tool-level binding is better when a server has many tools or large schemas.

The code runs in Starlark, a deterministic Python-like runtime. Bifrost's docs describe it as intentionally constrained: no imports, no file I/O, no network access, just allowed tool calls and basic Python-like logic. That constraint is what makes Code Mode interesting for agent workflows.
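
The two binding levels are easier to picture as a mapping from stub files to tools. The sketch below is illustrative only: `stub_files` is a hypothetical helper, the `servers/<server>.pyi` path follows the listing I got back from listToolFiles later in this post, and the tool-level path layout is my guess rather than something I verified.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def stub_files(tools: List[Tuple[str, str]], binding: str) -> Dict[str, List[str]]:
    """Map virtual stub paths to the tool names each file would describe.

    tools is a list of (server_name, tool_name) pairs.
    """
    files: Dict[str, List[str]] = defaultdict(list)
    for server, tool in tools:
        if binding == "server":
            path = f"servers/{server}.pyi"          # one stub per MCP server
        elif binding == "tool":
            path = f"servers/{server}/{tool}.pyi"   # hypothetical tool-level layout
        else:
            raise ValueError(f"unknown binding level: {binding}")
        files[path].append(tool)
    return dict(files)


tools = [
    ("filesystem_blog", "read_text_file"),
    ("filesystem_blog", "list_directory"),
]
print(stub_files(tools, "server"))
print(stub_files(tools, "tool"))
```

With server-level binding the model reads one larger file per server; with tool-level binding it reads many small files and only opens the ones it needs.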

Classic MCP vs Code Mode in plain English

Classic MCP is like handing someone a giant binder full of every tool manual before every task.

Code Mode is more like giving them a small index first. They look up the exact tool they need, read only that tool's instructions, then run the task.

That matters when the number of tools grows.

Why that matters for coding agents

Coding agents already use a lot of context.

They read files, inspect diffs, look at errors, review stack traces, and sometimes pull in project history. If they also carry every connected MCP tool definition on every request, the context gets noisy fast.

That can hurt:

  • cost
  • latency
  • tool selection
  • reliability
  • debugging

For small setups, this may not matter. If you only have one or two MCP servers with a few tools, classic MCP can be fine.

But once you start connecting heavier tools, Code Mode makes more sense.

That is why I tested the flow directly instead of only repeating the docs.

What I tested

My test setup used the tools I actually work with:

  • VS Code
  • Bifrost gateway
  • Bifrost CLI
  • Codex CLI and OpenCode
  • one filesystem MCP server
  • a real local project

The test needed to answer one practical question: can the gateway sit between the agent, the provider, and the MCP server without turning the workflow into a black box?

The flow looked like this:

Start Bifrost gateway
↓
Open Bifrost dashboard
↓
Configure provider
↓
Start Bifrost CLI
↓
Launch coding agent through Bifrost
↓
Run a small repo task
↓
Compare classic MCP flow vs Code Mode flow

The task was intentionally small. I did not want a fake benchmark. I wanted enough proof to answer this:

Can I route a coding agent through Bifrost, expose MCP tools, turn on Code Mode, and see the control layer working?

Everything after this point is evidence for that question.

The small agent task

For the coding-agent check, I used a deliberately small prompt:

Do not edit files. Reply with exactly: opencode through bifrost ok

That prompt is boring on purpose. It does not prove the agent can solve every repo task. It proves the path: agent starts, uses the configured Bifrost provider, calls the model through the gateway, and returns a response.

Then I split the MCP behavior into two questions:

Classic MCP:
- Did the model see the direct filesystem tools?
- Did it produce a normal MCP tool call?
- Could I execute that tool call through Bifrost?

Code Mode:
- Could I enable Code Mode on the MCP client?
- Did the four meta-tools work?
- Could executeToolCode call the underlying filesystem server?

What I am not claiming

This is not a benchmark.

Bifrost publishes token-savings numbers for Code Mode. My test did not try to reproduce those claims.

I only used one filesystem MCP server. With one server, there is not enough tool sprawl to make a serious token-savings claim. The useful result is narrower. I verified that these pieces worked together in one path:

  • gateway
  • provider route
  • local Ollama
  • Ollama cloud
  • MCP server
  • Code Mode meta-tools
  • coding agent

That is the difference between "I read the docs" and "I ran the workflow."

What I measured

I also have Ollama installed locally, with both local models and Ollama cloud variants available. That let me test:

  • local Ollama model routing
  • Ollama cloud model routing
  • Bifrost provider behavior
  • MCP tool discovery
  • Code Mode meta-tool behavior
  • one coding-agent launch through Bifrost

Here is the checklist from the hands-on pass:

Install/setup:
- [x] NPX starts cleanly
- [x] Docker starts cleanly
- [x] Dashboard available at localhost:8080
- [x] Provider setup persists
- [x] OpenAI config bootstrap tested
- [ ] OpenAI provider request blocked by missing OPENAI_API_KEY

CLI:
- [x] Bifrost CLI detects the gateway
- [x] Bifrost CLI asks for the expected config
- [x] Bifrost CLI shows Codex CLI as installed
- [x] Bifrost CLI lets me set the gateway URL and model
- [x] Config is stored in ~/.bifrost/config.json and ~/.bifrost/state.json
- [x] Bifrost CLI launches Codex CLI
- [x] Bifrost CLI prints the MCP server URL for Codex CLI
- [x] Codex CLI launch works
- [ ] Codex CLI request blocked because current Codex expects Responses API for this route

MCP:
- [x] Add at least one MCP server
- [x] Gateway can see the tools
- [x] Model can produce a classic MCP tool call
- [x] Classic MCP tool call works
- [x] Enable Code Mode
- [x] Code Mode exposes the four meta-tools
- [x] Run the same "list allowed directories" task with classic MCP and Code Mode
- [x] Route local Ollama models through Bifrost
- [x] Route Ollama cloud models through Bifrost

Agent workflow:
- [x] Complete a minimal coding-agent request through Bifrost with OpenCode
- [x] Keep the response understandable
- [x] Confirm tool-governed filesystem access through MCP
- [ ] OpenAI-vs-Ollama provider fallback blocked by missing OpenAI key
- [out of scope] Prove token savings with a large multi-server setup

Cost/context:
- [out of scope] Fewer tokens used in a realistic multi-server setup
- [x] Latency is acceptable for small local tests
- [x] Workflow is inspectable from one gateway dashboard

This is the evidence I care about more than a marketing number: what launched, what routed, what exposed tools, what returned output, and what failed.

The unresolved items were clear: OpenAI fallback needs a real key, Codex needs a Responses-compatible route, and a larger multi-server setup is required for any real token-savings claim.

Hands-on results

The first thing I verified was the gateway.

I started the local gateway and confirmed http://127.0.0.1:8080 was alive.

The dashboard loaded cleanly.

Bifrost dashboard after local requests

That dashboard is useful because it gives a second source of truth. It shows request volume, token usage, model usage, latency, cache state, and cost.

Bifrost dashboard main observability grid

And the panels were meaningful for this test.

Bifrost dashboard tour through request volume, token usage, cache, model usage, and latency

With that working, I moved to provider wiring.

The Bifrost CLI defaulted to Codex CLI. I chose openai-ollama2/gpt-oss:20b-cloud after confirming the custom provider could route Ollama cloud requests.

On disk, the CLI state was exactly where I expected:

  • ~/.bifrost/config.json pointed at http://localhost:8080
  • ~/.bifrost/state.json stored the selected harness and model
  • ./bifrost-data/config.db had the usual config tables
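
A quick way to check those two CLI files without opening them by hand is a small loader. This is a minimal sketch under assumptions: it only relies on the file names I saw on disk, it does not assume any particular keys inside the files, and the helper names `load_cli_state` and `summarize` are my own.

```python
import json
from pathlib import Path
from typing import Optional


def load_cli_state(base_dir: Optional[Path] = None) -> dict:
    """Load config.json and state.json; a missing file maps to None."""
    base = base_dir if base_dir is not None else Path.home() / ".bifrost"
    state = {}
    for name in ("config.json", "state.json"):
        path = base / name
        state[name] = (
            json.loads(path.read_text(encoding="utf-8")) if path.is_file() else None
        )
    return state


def summarize(state: dict) -> dict:
    """Reduce each file to its sorted top-level keys so no values are printed."""
    return {
        name: sorted(content) if isinstance(content, dict) else None
        for name, content in state.items()
    }


if __name__ == "__main__":
    print(summarize(load_cli_state()))
```

Printing only the key names keeps anything sensitive in those files out of the terminal history.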

I also checked the repo .env. It had project secrets, but not an OpenAI key for Bifrost.

So I did one more bootstrap test:

  • created bifrost-temp/config.json with providers.openai.keys[0].value = "env.OPENAI_API_KEY"
  • launched Bifrost on http://127.0.0.1:8081
  • the gateway started and wrote state to ./config.db
  • the provider row existed, but Bifrost reported "no valid keys found for provider: openai"

That was the exact failure mode: the key was missing from the shell.

Ollama routing worked

With OpenAI blocked by a missing secret, I tested the local provider I could fully verify: Ollama.

I started ollama serve on http://127.0.0.1:11434 and confirmed the API exposed models such as gemma3:1b, qwen3-coder:30b, tinyllama:latest, and mistral:latest.

I configured a custom Bifrost provider through /api/providers. The right pattern for Ollama was a key with an empty models list, which allows all models through.

The provider config looked like this:

{
  "provider": "openai-ollama2",
  "keys": [
    {
      "name": "openai-ollama2-key",
      "value": "dummy",
      "models": [],
      "weight": 1.0
    }
  ],
  "network_config": {
    "base_url": "http://127.0.0.1:11434",
    "default_request_timeout_in_seconds": 60
  },
  "custom_provider_config": {
    "base_provider_type": "openai",
    "allowed_requests": {
      "chat_completion": true,
      "chat_completion_stream": true
    }
  }
}

With that in place, Bifrost routed both local Ollama and Ollama cloud requests.
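
If you need more than one Ollama-style provider, the same JSON can be generated instead of copied. The sketch below only builds the request body shown above; the function names are hypothetical, and how you submit it to /api/providers (HTTP method, auth) is something to confirm against the Bifrost docs for your version.

```python
import json


def ollama_provider_payload(name: str,
                            base_url: str = "http://127.0.0.1:11434",
                            timeout_seconds: int = 60) -> dict:
    """Build the custom-provider body used in this test for a given name."""
    return {
        "provider": name,
        "keys": [
            {
                "name": f"{name}-key",
                "value": "dummy",   # placeholder key value, as in the config above
                "models": [],       # empty list: allow all models through
                "weight": 1.0,
            }
        ],
        "network_config": {
            "base_url": base_url,
            "default_request_timeout_in_seconds": timeout_seconds,
        },
        "custom_provider_config": {
            "base_provider_type": "openai",
            "allowed_requests": {
                "chat_completion": True,
                "chat_completion_stream": True,
            },
        },
    }


def routed_model(provider: str, model: str) -> str:
    """Model identifier in the provider/model form used throughout this post."""
    return f"{provider}/{model}"


payload = ollama_provider_payload("openai-ollama2")
print(json.dumps(payload, indent=2))
print(routed_model("openai-ollama2", "gpt-oss:20b-cloud"))
```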

The request volume panel captured the setup process honestly: successful checks and tuning failures.

Bifrost request volume panel showing successful and failed local test requests

The token usage panel was not a benchmark, but it gave me a baseline.

Bifrost token usage panel after local and cloud Ollama requests

The model usage panel confirmed the requests were hitting the models I expected.

Bifrost model usage panel showing gemma3 and another routed model

Latency was uneven, which is no surprise for mixed local/cloud routing.

Bifrost latency panel after gateway requests

Cache was effectively unused in this small test.

Bifrost cache hit rate panel showing no cached input tokens in this small test

I also verified the Docker path on http://127.0.0.1:8082 with:

docker run -p 8082:8080 -v "$(pwd)/bifrost-temp:/app/data" maximhq/bifrost

That instance also responded.

At that point the provider layer was proven enough to move on.

Classic MCP versus Code Mode

Next, I added a filesystem MCP server scoped to the donotpush folder.

curl -X POST http://127.0.0.1:8080/api/mcp/client \
  -H "Content-Type: application/json" \
  -d '{
    "name": "filesystem_blog",
    "connection_type": "stdio",
    "stdio_config": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush"
      ],
      "envs": ["HOME", "PATH"]
    },
    "tools_to_execute": ["*"],
    "is_ping_available": false
  }'

Bifrost discovered 14 tools from that server.

In classic MCP mode, the model produced a direct tool call:

filesystem_blog-list_allowed_directories

I executed it through Bifrost:

curl -X POST http://127.0.0.1:8080/v1/mcp/tool/execute \
  -H "Content-Type: application/json" \
  -d '{
    "id": "call_pjlop9a3",
    "type": "function",
    "function": {
      "name": "filesystem_blog-list_allowed_directories",
      "arguments": "{}"
    }
  }'

Result:

Allowed directories:
/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush

That is the governance point in miniature: the server only saw one draft folder.
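
One detail in that execute call is easy to miss: `arguments` is a JSON string, not a nested object. A minimal Python sketch of the same envelope, assuming the endpoint and field names from my curl command and using a hypothetical helper name, looks like this:

```python
import json
import urllib.request


def build_tool_execute_request(call_id: str, tool_name: str, arguments: dict,
                               base_url: str = "http://127.0.0.1:8080"
                               ) -> urllib.request.Request:
    """Build (but do not send) the tool-execute request used in this test."""
    body = {
        "id": call_id,
        "type": "function",
        "function": {
            "name": tool_name,
            # The arguments field carries JSON encoded as a string.
            "arguments": json.dumps(arguments),
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/mcp/tool/execute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_tool_execute_request(
    call_id="call_pjlop9a3",
    tool_name="filesystem_blog-list_allowed_directories",
    arguments={},
)
print(req.data.decode("utf-8"))
```

The call id comes from the model's tool call in my run; in a real loop you would pass through whatever id the model returned.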

Then I flipped the same client into Code Mode.

curl -X PUT http://127.0.0.1:8080/api/mcp/client/f771c023-8a16-4b34-b03f-ffbfffd34e4b \
  -H "Content-Type: application/json" \
  -d '{
    "name": "filesystem_blog",
    "connection_type": "stdio",
    "stdio_config": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush"
      ],
      "envs": ["HOME", "PATH"]
    },
    "tools_to_execute": ["*"],
    "is_ping_available": false,
    "is_code_mode_client": true
  }'

The four Code Mode meta-tools worked.

listToolFiles returned:

servers/
  filesystem_blog.pyi

readToolFile returned a compact stub with signatures like:

def read_text_file(path: str, head: float = None, tail: float = None) -> dict
def list_allowed_directories() -> dict
def list_directory(path: str) -> dict

getToolDocs returned docs for a specific function.

executeToolCode ran the filesystem server from inside the Code Mode layer:

Execution completed successfully.
Return value: "Allowed directories:\n/Users/bradleymatera/Desktop/gatsby-starter-minimal-blog/donotpush"

So the behavior matched the docs. The model saw a small tool-file surface first, then executed the specific tool binding it needed.

The task was the same in both modes:

List the allowed filesystem directory.

Classic MCP exposed the filesystem tools directly. Code Mode exposed a smaller meta-tool surface.

On this setup, I am not claiming savings. I am claiming behavior verification.
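
It is worth noting how small the switch was. The two client payloads I sent differ in one field, which a short sketch can confirm. The helper names are mine, and the directory path is a placeholder rather than the real folder from my machine.

```python
import copy

# Classic MCP client body, with a placeholder path instead of my local folder.
classic_client = {
    "name": "filesystem_blog",
    "connection_type": "stdio",
    "stdio_config": {
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/donotpush"],
        "envs": ["HOME", "PATH"],
    },
    "tools_to_execute": ["*"],
    "is_ping_available": False,
}


def with_code_mode(client: dict, enabled: bool = True) -> dict:
    """Return a copy of the client body with the Code Mode flag set."""
    updated = copy.deepcopy(client)
    updated["is_code_mode_client"] = enabled
    return updated


def changed_keys(before: dict, after: dict) -> set:
    """Top-level keys whose values differ between the two bodies."""
    return {k for k in before.keys() | after.keys() if before.get(k) != after.get(k)}


code_mode_client = with_code_mode(classic_client)
print(changed_keys(classic_client, code_mode_client))
```

Because the scope and allowed tools stay identical, any difference in behavior between the two runs comes from the mode, not from the permissions.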

Coding agent results

The final step was the coding-agent launch.

Bifrost CLI launched Codex CLI and showed:

Harness       Codex CLI (codex-cli 0.125.0)
Model         openai-ollama2/gpt-oss:20b-cloud

It also printed:

MCP: Codex CLI has no native auto-attach yet. Use server URL: http://localhost:8080/mcp

So the launch path worked.

But the actual Codex request did not complete. Codex expects the Responses API, while my Ollama route was using chat completions.

Codex reported:

Error loading config.toml: `wire_api = "chat"` is no longer supported.
How to fix: set `wire_api = "responses"` in your provider config.

That is a real provider compatibility issue, not a gateway failure.

For the working agent test, I used OpenCode, which supports OpenAI-compatible chat completions.

The config was:

{
  "$schema": "https://opencode.ai/config.json",
  "model": "bifrost/openai-ollama2/gpt-oss:20b-cloud",
  "provider": {
    "bifrost": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Bifrost",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1",
        "apiKey": "dummy"
      },
      "models": {
        "openai-ollama2/gpt-oss:20b-cloud": {
          "name": "Ollama cloud through Bifrost",
          "limit": {
            "context": 32768,
            "output": 4096
          }
        }
      }
    }
  }
}
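
The easy mistake in that file is a model string that does not match the provider block. In my config the top-level model is the provider id, a slash, then the model key, so a small pre-flight check can catch typos. This sketch assumes that split-on-first-slash convention based only on the config above; `check_model_reference` is a hypothetical helper, not part of OpenCode.

```python
def check_model_reference(config: dict) -> tuple:
    """Return (provider_id, model_key) if the top-level model is defined."""
    # Split on the first slash only: the model key itself contains a slash.
    provider_id, _, model_key = config["model"].partition("/")
    providers = config.get("provider", {})
    if provider_id not in providers:
        raise KeyError(f"provider not defined: {provider_id}")
    if model_key not in providers[provider_id].get("models", {}):
        raise KeyError(f"model not defined under {provider_id}: {model_key}")
    return provider_id, model_key


# Trimmed copy of the config above, keeping only the fields the check reads.
config = {
    "model": "bifrost/openai-ollama2/gpt-oss:20b-cloud",
    "provider": {
        "bifrost": {
            "models": {
                "openai-ollama2/gpt-oss:20b-cloud": {"name": "Ollama cloud through Bifrost"}
            }
        }
    },
}

print(check_model_reference(config))
```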

Then I ran:

OPENCODE_CONFIG=/tmp/bifrost-opencode.json opencode run \
  --dir /Users/bradleymatera/Desktop/gatsby-starter-minimal-blog \
  --format json \
  --model bifrost/openai-ollama2/gpt-oss:20b-cloud \
  'Do not edit files. Reply with exactly: opencode through bifrost ok'

Result:

opencode through bifrost ok

Usage reported:

17154 total tokens
17100 input tokens
54 output tokens

That was the first real coding-agent request through Bifrost in this test.
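
Those usage numbers are worth a second look, because the prompt was one sentence. A quick sanity check on the reported figures shows where the tokens went. The arithmetic uses only the three numbers above; I did not break down what made up the input side.

```python
# Usage figures as reported for the OpenCode run above.
input_tokens = 17100
output_tokens = 54
total_tokens = 17154

# The reported total should be the sum of the two parts.
consistent = input_tokens + output_tokens == total_tokens

# Share of the total that was input rather than generated output.
input_share = input_tokens / total_tokens

print(f"consistent: {consistent}, input share: {input_share:.1%}")
```

Almost all of the request was input that the agent sent along with my one-line prompt, which is exactly the kind of overhead a gateway dashboard makes visible.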

Bifrost terminal results for provider routing, MCP, Code Mode, and OpenCode

Why governance matters here

This setup is not just about making a request succeed.

It is about making the request visible and controllable.

A gateway lets you answer:

  • which virtual key was used?
  • which provider routed the request?
  • which MCP server was available?
  • what did the dashboard record?

That matters more than one successful completion.

Bifrost control plane sidebar showing observability, models, MCP gateway, governance, and guardrails

Bifrost also surfaces:

  • virtual keys
  • budget controls
  • provider routing
  • MCP tool governance
  • audit logs
  • cost tracking

This is not only enterprise language. Even for solo projects, it is useful to know what key is being used, what model is being called, and what tools the agent can reach.

If I am running multiple demos or experiments, I want separate keys, usage tracking, and some protection against accidental spending.

Bifrost governance navigation showing virtual keys, users, teams, roles, audit logs, and guardrails

Bifrost as the control plane

Without Bifrost, the setup looks like this:

Agent → Provider
Agent → MCP server
Agent → another provider
Agent → another tool config
Agent → another local config file

With Bifrost:

Agent → Bifrost → Providers
              → MCP tools
              → routing
              → governance
              → logs
              → cost tracking

That second layout is easier to reason about.

It does not make agents perfect, but it does make the system easier to inspect.

Benefits and limitations

What was clear from this pass:

  • the gateway is easy to start with NPX or Docker
  • the dashboard makes traffic, latency, tokens, and errors visible
  • the CLI reduces per-agent setup work
  • the CLI can switch between supported agents without hand-editing every config
  • virtual keys, audit logs, and budget controls add governance
  • Code Mode turns the MCP surface into a smaller discovery-and-execute flow

What was also clear:

  • Code Mode is not a magic switch. It is worth validating in every environment.
  • Classic MCP can still be the better choice for very small setups.
  • Provider compatibility matters. One OpenAI-compatible route is not the same as another.

My local dashboard was on v1.4.24. The docs call out Code Mode from v1.4.0-prerelease1. That is a good reminder: verify the exact version where you run this.

The provider-specific limitation was real. OpenCode worked through Bifrost + Ollama. Codex CLI launched, but the request failed because the route needed Responses compatibility.

That is not a reason to discard the setup. It is a reason to test each agent/provider pair instead of assuming every OpenAI-compatible route behaves identically.

Related reading

Bifrost GitHub repository:

maximhq / bifrost


Bifrost CLI article:

Bifrost MCP Gateway and Code Mode article:

Official Bifrost setup article on Medium:

Final status

Here is the final state of the test:

  • [x] Install Bifrost locally with NPX
  • [x] Install Bifrost locally with Docker
  • [x] Open the dashboard and confirm the UI works
  • [x] Configure at least one provider (Ollama)
  • [ ] Configure OpenAI provider (request blocked by missing secret)
  • [x] Start Bifrost CLI
  • [x] Confirm Bifrost CLI asks for the right config
  • [x] Confirm Bifrost CLI config storage
  • [x] Launch Codex CLI through Bifrost CLI
  • [x] Record the Codex limitation with this route
  • [x] Launch one working coding agent through Bifrost with OpenCode
  • [x] Add at least one MCP server
  • [x] Verify the tool surface
  • [x] Verify classic MCP tool execution
  • [x] Enable Code Mode
  • [x] Verify the Code Mode meta-tools
  • [x] Run the same small MCP task with and without Code Mode
  • [x] Verify local Ollama routing through Bifrost
  • [x] Verify Ollama cloud routing through Bifrost
  • [ ] Check provider selection and fallback path (blocked by missing OpenAI key)
  • [x] Capture screenshots of the dashboard and terminal results
  • [x] Record exact errors, fixes, and commands
  • [out of scope] Replace vendor benchmark claims with my own benchmark for a one-server test

This is still not a benchmark. It is a tested setup walkthrough and an initial MCP + Code Mode proof. I do not want to turn a one-server smoke test into a fake performance claim.

Final thought

I am not testing Bifrost because I need another AI tool.

I am testing it because the wiring is getting messy.

Every agent wants a config file. Every provider wants a key. Every MCP server adds tools. Every tool adds context.

Bifrost does not solve all of that, but it does give you one place to inspect and control it.

In this setup, the core pieces worked:

  • the gateway installed cleanly
  • the dashboard gave me visibility
  • Ollama routed through it
  • MCP tools were governable
  • Code Mode exposed the expected meta-tools
  • OpenCode completed a request through the gateway

The limits are also clear:

  • OpenAI routing still needs a real OPENAI_API_KEY
  • Codex CLI launched, but the request path needs Responses compatibility
  • the Code Mode savings claim still needs a larger multi-server test

My takeaway is practical: Bifrost can be a useful control layer for local agent experiments, but the value depends on provider wiring, MCP scoping, and agent-specific testing.
