I've been using Claude Code for a while now, and if you're a developer who has added it to your daily workflow, you probably know the feeling. It's genuinely good. It reads your codebase, runs commands, modifies files, and helps implement features right from your terminal without you having to context-switch constantly.
But at some point, most developers hit the same wall I did: what if I want to use a different model?
What if GPT-4o handles your specific codebase better? What if Gemini's larger context window is exactly what you need for that massive legacy project? What if you're spending more on API calls than you should be, and you know some of those simpler tasks could run on a cheaper model just fine?
Out of the box, Claude Code only talks to Anthropic. That's just how it works. And while Anthropic's models are genuinely strong, being locked into a single provider means you're trading flexibility for convenience. This guide is about getting both.
The Real Friction Points
Before jumping into the solution, it helps to be specific about what problems we're actually solving.
Model flexibility. Different models have different strengths. Claude Sonnet is excellent for most coding tasks, but you can't know whether it's the best tool for every job unless you can test alternatives, and without a gateway, experimenting means switching tools entirely.
Cost management. Claude Code burns through tokens quickly during an active session. Complex architectural work and boilerplate generation are not the same job, and pricing them identically doesn't make much sense. Routing simpler requests to a more affordable model can cut costs significantly without affecting output quality where it matters.
Compliance and data routing. If you work in fintech, healthcare, or any regulated industry, you've likely dealt with requirements around where your data goes. Routing all API traffic through your own infrastructure before it reaches any external provider is often non-negotiable.
Observability. This one gets overlooked a lot. How many tokens does a typical Claude Code session consume? What's your actual cost per feature shipped? Without request logging, you're genuinely guessing.
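The cost-management point above is worth making concrete. The sketch below is a toy heuristic router, not anything Bifrost ships; the model names and the "boilerplate hints" are purely illustrative. It just shows the shape of the idea: classify the request, then pick a price tier.

```python
# Toy routing heuristic -- NOT Bifrost's internal logic, just the idea
# that cheap and expensive models can serve different request classes.
CHEAP_MODEL = "openai/gpt-4o-mini"                    # illustrative names
STRONG_MODEL = "anthropic/claude-sonnet-4-20250514"

def pick_model(prompt: str) -> str:
    """Route short, boilerplate-style prompts to a cheaper model."""
    boilerplate_hints = ("rename", "format", "docstring", "boilerplate")
    if len(prompt) < 200 and any(h in prompt.lower() for h in boilerplate_hints):
        return CHEAP_MODEL
    return STRONG_MODEL

print(pick_model("Add a docstring to this function"))          # cheap tier
print(pick_model("Redesign the event-sourcing architecture"))  # strong tier
```

A real gateway can make this decision centrally, per request, without the client ever knowing.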
Why Bifrost
Bifrost is an open-source LLM gateway built by Maxim AI to route, manage, and optimize requests between your application and multiple model providers. It's Apache 2.0 licensed, self-hostable, and supports 20+ providers including OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure, Mistral, Cohere, Groq, and more.
A few things that make it stand out technically:
Performance that doesn't get in the way. At 5,000 requests per second, Bifrost adds less than 15 microseconds of internal overhead per request. At production scale, that's essentially nothing.
Zero-config startup. A single npx command launches the gateway, and everything else is configurable through a web UI.
Built-in fallbacks and load balancing. If a provider fails or rate-limits you, Bifrost automatically routes to a backup. Traffic can also be distributed across multiple keys or providers using weighted rules.
Semantic caching. Repeated or semantically similar queries can be served from cache, which reduces both latency and cost for workflows with a lot of repetitive prompting.
Full observability out of the box. Prometheus metrics, request tracing, token usage, latency, and a built-in web dashboard are all included.
The architecture is straightforward:
Claude Code --> Bifrost (localhost:8080) --> Any LLM Provider
Claude Code uses an environment variable called ANTHROPIC_BASE_URL to know where to send API requests. Normally it points to https://api.anthropic.com. You point it at Bifrost instead. Bifrost accepts requests in Anthropic's Messages API format, translates them to whichever provider you've configured, and translates the response back. Claude Code never knows the difference.
No code changes. No patching. One environment variable.
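Concretely, what travels over that environment variable is an ordinary Anthropic Messages API payload. The snippet below builds one to show what Bifrost receives and translates; the exact `/anthropic/v1/messages` path is an assumption based on how the Anthropic SDK appends to the base URL, so check the Bifrost docs if a request 404s.

```python
import json

# The body Claude Code sends is a standard Anthropic Messages API payload.
# With ANTHROPIC_BASE_URL pointed at Bifrost, the same payload simply goes
# to localhost instead of api.anthropic.com.
BIFROST_URL = "http://localhost:8080/anthropic/v1/messages"  # assumed path

payload = {
    "model": "openai/gpt-4o",  # provider-prefixed model, resolved by Bifrost
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello there"}],
}
print(json.dumps(payload, indent=2))
```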
What We'll Cover
- Setting up and configuring Bifrost with multiple LLM providers
- Integrating Claude Code with the gateway
- Running Claude Code with any model
- Configuring routing rules, fallbacks, and budgets
- Integrating MCP tools
- Using built-in observability and monitoring
Part 1: Setting Up Bifrost
Step 1: Install Bifrost
Create a project folder, open it in your editor, and run:
npx -y @maximhq/bifrost -app-dir ./my-bifrost-data
The -app-dir flag tells Bifrost where to store all its data. Bifrost will start listening on port 8080.
If you prefer Docker:
docker pull maximhq/bifrost
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
The -v flag mounts a volume so your configuration persists across container restarts.
Step 2: Create Your Config File
Inside your ./my-bifrost-data folder, create a config.json file. This defines which providers Bifrost can route to, enables request logging, and sets up database persistence:
{
"$schema": "https://www.getbifrost.ai/schema",
"client": {
"enable_logging": true,
"disable_content_logging": false,
"drop_excess_requests": false,
"initial_pool_size": 300,
"allow_direct_keys": false
},
"providers": {
"openai": {
"keys": [
{
"name": "openai-primary",
"value": "env.OPENAI_API_KEY",
"models": [],
"weight": 1.0
}
]
},
"anthropic": {
"keys": [
{
"name": "anthropic-primary",
"value": "env.ANTHROPIC_API_KEY",
"models": [],
"weight": 1.0
}
]
},
"gemini": {
"keys": [
{
"name": "gemini-primary",
"value": "env.GEMINI_API_KEY",
"models": [],
"weight": 1.0
}
]
}
},
"config_store": {
"enabled": true,
"type": "sqlite",
"config": {
"path": "./config.db"
}
},
"logs_store": {
"enabled": true,
"type": "sqlite",
"config": {
"path": "./logs.db"
}
}
}
The "value": "env.OPENAI_API_KEY" syntax tells Bifrost to read actual keys from environment variables rather than storing them in the file. Your secrets stay out of version control.
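The `env.` convention is easy to picture as a tiny resolver. This is not Bifrost's source, just a sketch of the contract: values starting with `env.` are looked up in the environment at load time, and anything else passes through as a literal.

```python
import os

# Sketch of the "env." reference convention (not Bifrost's actual code).
def resolve_key(value: str) -> str:
    if value.startswith("env."):
        var = value[len("env."):]
        resolved = os.environ.get(var)
        if resolved is None:
            raise KeyError(f"environment variable {var} is not set")
        return resolved
    return value  # literal keys pass through unchanged

os.environ["OPENAI_API_KEY"] = "sk-example"
print(resolve_key("env.OPENAI_API_KEY"))  # sk-example
```

The practical upshot is that `config.json` can live in version control while the secrets live only in your shell environment or secret manager.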
Step 3: Set Your API Keys
export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"
export GEMINI_API_KEY="your-gemini-api-key"
Step 4: Start the Gateway
Stop any previously running Bifrost instance, then start it again with the app directory flag:
npx -y @maximhq/bifrost -app-dir ./my-bifrost-data
Open http://localhost:8080 in your browser. You'll see the Bifrost dashboard where all configuration and monitoring lives.
Part 2: Connecting Claude Code to Bifrost
Step 1: Install Claude Code
npm install -g @anthropic-ai/claude-code
Step 2: Point Claude Code at Bifrost
Set these two environment variables in the same terminal session where you'll run Claude Code:
export ANTHROPIC_BASE_URL="http://localhost:8080/anthropic"
export ANTHROPIC_API_KEY="dummy-key"
The dummy-key part is a bit counterintuitive at first. Claude Code requires this variable to be set before it will run, but Bifrost handles actual authentication to providers using the keys you configured earlier. You can put any non-empty string here.
Step 3: Run Claude Code with Any Model
Start Claude Code and specify whichever model you want to use:
claude --model openai/gpt-4o
To route to other providers, use the provider prefix pattern:
openai/gpt-4o
openai/gpt-4o-mini
gemini/gemini-2.5-pro
groq/llama-3.1-70b-versatile
mistral/mistral-large-latest
anthropic/claude-sonnet-4-20250514
ollama/llama3
Run a quick sanity check by asking something simple like "Hello there" to confirm requests are flowing through correctly.
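The `provider/model` naming scheme above is simple enough to sketch. This parsing is illustrative rather than Bifrost's real implementation, and the fallback-to-`anthropic` default is my assumption for un-prefixed names.

```python
# Sketch of the provider/model prefix convention used to pick a backend
# (illustrative parsing, not Bifrost's source).
def split_model(model: str) -> tuple[str, str]:
    provider, sep, name = model.partition("/")
    if not sep:
        # No prefix: assume a default provider (here, anthropic).
        return "anthropic", model
    return provider, name

print(split_model("openai/gpt-4o"))         # ('openai', 'gpt-4o')
print(split_model("gemini/gemini-2.5-pro")) # ('gemini', 'gemini-2.5-pro')
```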
Part 3: Routing Rules, Fallbacks, and Budgets
Once Claude Code is connected, you can start using Bifrost's routing features to get more control over how requests are handled.
Weighted Routing Across Providers
Virtual Keys in Bifrost let you define routing logic that applies automatically. Navigate to Governance > Virtual Keys, create a key, and configure your routing weights:
{
"name": "dev-routing",
"budget": {
"max_budget": 100,
"budget_duration": "monthly"
},
"providers": [
{ "provider": "openai", "model": "gpt-4o", "weight": 0.7 },
{ "provider": "anthropic", "model": "claude-sonnet-4-20250514", "weight": 0.3 }
]
}
This routes 70% of requests to GPT-4o and 30% to Claude Sonnet, with a hard monthly cap of $100. Once the budget is exhausted, Bifrost stops routing automatically. For teams, this replaces a lot of manual cost monitoring.
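You can convince yourself what a 70/30 weight actually does with a quick simulation. This is a stand-in for the gateway's selection step, not its real code; over many requests the observed share converges on the configured weight.

```python
import random

# Simulating the 70/30 weighted split from the virtual key above.
routes = [("openai/gpt-4o", 0.7), ("anthropic/claude-sonnet-4-20250514", 0.3)]

random.seed(0)  # deterministic for the example
picks = random.choices(
    [model for model, _ in routes],
    weights=[weight for _, weight in routes],
    k=10_000,
)
share = picks.count("openai/gpt-4o") / len(picks)
print(f"gpt-4o share over 10k simulated requests: {share:.1%}")  # ~70%
```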
Automatic Fallbacks
When a provider goes down or you hit a rate limit, Bifrost works down a fallback list until a request succeeds:
{
"model": "openai/gpt-4o",
"fallbacks": [
{ "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
{ "provider": "gemini", "model": "gemini-2.5-pro" }
]
}
Your coding session continues without any manual intervention when a provider has issues.
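The fallback behavior boils down to "try each provider in order until one answers." Bifrost does this internally; the sketch below, with fake providers, just makes the control flow visible.

```python
# Sketch of gateway fallback: try providers in order until one succeeds.
class ProviderError(Exception):
    pass

def call_with_fallbacks(prompt, providers):
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append((name, str(exc)))  # record and move to the next
    raise RuntimeError(f"all providers failed: {errors}")

# Fake providers: the first is "rate limited", the second is healthy.
def flaky(prompt):
    raise ProviderError("429 rate limited")

def healthy(prompt):
    return f"response to: {prompt}"

used, reply = call_with_fallbacks(
    "hi",
    [("openai/gpt-4o", flaky),
     ("anthropic/claude-sonnet-4-20250514", healthy)],
)
print(used)  # anthropic/claude-sonnet-4-20250514
```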
Part 4: MCP Tool Integration
If you're using Model Context Protocol servers for filesystem access, web search, database queries, or custom integrations, Bifrost supports those too. Configure them once in Bifrost, and they become available to any model routing through it.
Step 1: Add MCP Configuration to Bifrost
Update your config.json to include MCP server definitions. Keep the providers, config_store, and logs_store sections from Part 1; they're omitted below for brevity. Here's an example with filesystem access:
{
"$schema": "https://www.getbifrost.ai/schema",
"client": {
"enable_logging": true,
"disable_content_logging": true,
"drop_excess_requests": false,
"initial_pool_size": 300,
"allow_direct_keys": false
},
"mcp": {
"client_configs": [
{
"name": "filesystem",
"connection_type": "stdio",
"stdio_config": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
},
"tools_to_execute": ["*"],
"tools_to_auto_execute": [
"read_file",
"list_directory",
"create_file",
"delete_file"
]
}
],
"tool_manager_config": {
"max_agent_depth": 10,
"tool_execution_timeout": 300000000000,
"code_mode_binding_level": "server"
}
}
}
Restart Bifrost and navigate to the MCP catalog page in the web UI to confirm the filesystem server shows as connected.
Step 2: Add Bifrost as an MCP Server in Claude Code
claude mcp add --transport http bifrost http://localhost:8080/mcp
Step 3: Verify with a Real Task
Restart Claude Code and try a task that exercises the MCP tools. For example:
Create a simple calculator program in Python.
It should support addition, subtraction, multiplication, and division.
The user should input two numbers and an operation, and the program should print the result.
Then follow up with:
Analyze this repository and create a README.md explaining how the project works.
Include the project architecture and instructions for running it locally.
If the MCP integration is working, Claude Code will read your files, create new ones, and interact with your filesystem through Bifrost's tool injection.
Part 5: Observability and Monitoring
This is the part that surprised me most when I first set it up.
Every request that passes through Bifrost is logged with full detail: the input prompt, the response, which model handled it, latency, and cost. The web interface at http://localhost:8080/logs provides:
- Real-time streaming of requests and responses
- Token usage tracking per request
- Latency measurements
- Filtering by provider, model, or conversation content
- Full request and response inspection
For individual developers, it's useful for understanding your actual usage patterns. For teams, it becomes a proper audit trail. You can see which models are being used most, where the expensive requests are coming from, and whether your routing rules are actually behaving as expected.
Bifrost also exposes Prometheus metrics for teams that want to integrate this data into existing monitoring pipelines.
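If you already run Prometheus, pointing it at the gateway is a small config change. The fragment below assumes Bifrost serves metrics at the conventional `/metrics` path on port 8080; check the Bifrost docs for the exact endpoint before relying on it.

```yaml
# prometheus.yml fragment -- assumes Bifrost exposes metrics at /metrics
# (verify the endpoint path in the Bifrost docs).
scrape_configs:
  - job_name: "bifrost"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8080"]
```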
Is This Worth Setting Up?
If you're a solo developer who uses Claude Code occasionally and doesn't have any compliance or cost concerns, the default setup is probably fine.
But if any of the following are true, a gateway is worth the time:
- You want to test how different models perform on your specific workload
- You're managing API costs across a team
- Your organization has requirements around data routing or infrastructure control
- You want actual visibility into your AI usage rather than end-of-month billing surprises
- You use MCP tools and want them available across multiple model providers without reconfiguring each time
Bifrost being open source and self-hosted means your prompts and responses stay on your own infrastructure. For teams working on proprietary codebases, that's a meaningful difference from routing everything directly to a third-party API.
Get started:
- Website: getmax.im/bifrost
- GitHub: git.new/bifrost
- Docs: getmax.im/bifrostdocs
