Claude Code is one of the best agentic coding tools out there. But it only talks to Anthropic models natively.
That is a problem if you want to route requests to OpenAI, Bedrock, or any other provider. Maybe you want cost control. Maybe you want failover. Maybe you just want options.
We built Bifrost to solve exactly this. It is an open-source LLM gateway written in Go that sits between Claude Code and any LLM provider. It adds 11µs of latency overhead. That is not a typo. Eleven microseconds.
Here is how to set it up.
Why You Need a Gateway for Claude Code
Claude Code sends requests in a specific format to Anthropic's API. Without a proxy, you are locked into one provider, one billing account, one point of failure.
With Bifrost as a proxy, you get:
- Routing to any provider (OpenAI, Anthropic, Bedrock, etc.) using OpenAI-compatible API format
- Budget controls per Virtual Key so Claude Code does not burn through your credits overnight
- Automatic failover if a provider goes down or rate limits you
- Weighted load balancing across multiple providers
- Rate limiting to cap runaway token spend
Bifrost handles all of this at 5,000 RPS sustained throughput. It is 50x faster than Python-based alternatives like LiteLLM, which adds roughly 8ms of overhead per request.
Check the docs for the full feature list.
Step 1: Start Bifrost
Zero-config deployment. One command.
```shell
npx -y @maximhq/bifrost
```
Or if you prefer Docker:
```shell
docker run -p 8080:8080 maximhq/bifrost
```
Bifrost ships with a Web UI for configuration. Once it is running, open the dashboard and configure your providers.
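To point Claude Code at the gateway instead of api.anthropic.com, set Claude Code's base-URL override. A minimal sketch, assuming the default port 8080 and a Virtual Key (covered in Step 4); the exact route Bifrost expects for Claude Code traffic is worth confirming in the Bifrost docs:

```shell
# Route Claude Code through the local Bifrost gateway instead of
# hitting Anthropic's API directly.
export ANTHROPIC_BASE_URL="http://localhost:8080"
# Authenticate with a Bifrost Virtual Key rather than a raw provider key.
export ANTHROPIC_AUTH_TOKEN="your-virtual-key"
```

Run `claude` from the same shell and its requests flow through the gateway.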
Step 2: Configure Providers
You can configure providers through the Web UI or directly in config.json. Here is an example config with Anthropic and OpenAI:
```json
{
  "providers": [
    {
      "name": "anthropic",
      "provider": "anthropic",
      "apiKey": "sk-ant-xxxxx",
      "weight": 70
    },
    {
      "name": "openai",
      "provider": "openai",
      "apiKey": "sk-xxxxx",
      "weight": 30
    }
  ]
}
```
The `weight` field controls load balancing. In this config, roughly 70% of requests go to Anthropic and 30% to OpenAI.
Step 3: Use Model Routing
Bifrost uses the `provider/model` format for routing. When you send a request, specify which provider and model you want:
- `anthropic/claude-3-5-sonnet` routes to Anthropic
- `openai/gpt-4o` routes to OpenAI
- `anthropic/claude-3-opus` routes to Anthropic
All requests use the OpenAI-compatible API format, so Claude Code can connect without any custom adapters.
Here is a curl example:
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-virtual-key" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "Explain TCP handshake in 3 lines"}
    ]
  }'
```
That request goes through Bifrost, gets routed to OpenAI, and the response comes back in the same OpenAI-compatible format.
Step 4: Set Up Virtual Keys for Cost Control
This is where it gets practical. Virtual Keys in Bifrost let you set budget and rate limits per key.
Create a Virtual Key for your Claude Code usage with:
- A daily budget cap (say $50/day)
- Rate limits (requests per minute)
- Model/provider filtering (restrict which models the key can access)
Budget reset frequencies: `1m`, `1h`, `1d`, `1w`, `1M`.
So if you set a $50 daily budget on your Claude Code Virtual Key, once you hit that limit, Bifrost stops routing requests for that key until the budget resets. No surprises on your bill.
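As a sketch, a Virtual Key with the limits above might look like this in config. The field names here are illustrative, not Bifrost's actual schema; configure keys through the Web UI or check the docs for the real shape:

```json
{
  "virtualKeys": [
    {
      "name": "claude-code-dev",
      "budget": { "limit": 50, "resetFrequency": "1d" },
      "rateLimit": { "requestsPerMinute": 60 },
      "allowedModels": ["anthropic/claude-3-5-sonnet", "openai/gpt-4o"]
    }
  ]
}
```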
Step 5: Configure Automatic Failover
Bifrost automatically fails over on network errors, rate limits, 5xx responses, and timeouts.
If Anthropic rate limits you during a heavy Claude Code session, Bifrost routes the next request to OpenAI (or whichever fallback provider you configured). No dropped requests. No manual intervention.
Step 6: Budget Hierarchy for Teams
Bifrost has a four-tier budget hierarchy: Customer, Team, Virtual Key, Provider Config.
If you have a team of developers all using Claude Code, you can set:
- A team-level monthly budget of $500
- Individual Virtual Keys with $50/day limits per developer
- Provider-level caps so no single provider gets more than a set amount
All of this is enforced at the gateway level. No code changes needed in Claude Code.
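The enforcement idea is that a request must fit under every tier's cap at once. A sketch of that check in Go, as an illustration rather than Bifrost's implementation:

```go
package main

import "fmt"

// budget tracks spend against a cap at one tier of the hierarchy
// (customer, team, virtual key, or provider config).
type budget struct {
	name  string
	cap   float64
	spent float64
}

// allow charges a request's cost against every tier, rejecting it if
// any single tier would exceed its cap.
func allow(cost float64, tiers []*budget) (bool, string) {
	for _, t := range tiers {
		if t.spent+cost > t.cap {
			return false, t.name
		}
	}
	for _, t := range tiers {
		t.spent += cost
	}
	return true, ""
}

func main() {
	tiers := []*budget{
		{name: "team", cap: 500},
		{name: "virtual-key", cap: 50},
	}
	ok, _ := allow(45, tiers)      // fits under both caps
	blocked, at := allow(10, tiers) // would push the key past $50
	fmt.Println(ok, blocked, at)    // the second request is blocked at the virtual-key tier
}
```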
Thinking Parameter Support
As of Bifrost v1.3.0, the thinking parameter for Anthropic models is fully supported. If you are using Claude's extended thinking features through Claude Code, Bifrost passes the thinking parameter through correctly.
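For example, an extended-thinking request body sent through the same chat completions route as the earlier curl example might look like this; the `thinking` object follows Anthropic's API shape, and the exact passthrough behavior is worth verifying against the docs:

```json
{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [
    {"role": "user", "content": "Prove that sqrt(2) is irrational"}
  ],
  "thinking": {
    "type": "enabled",
    "budget_tokens": 4096
  }
}
```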
Why Go Matters Here
Python-based gateways add milliseconds of latency per request. When Claude Code makes dozens of API calls per task, that adds up fast: 100 requests at 8ms of overhead each is 800ms spent in the gateway, while the same 100 requests at 11µs each cost about 1ms.
Bifrost is written in Go. 11µs overhead per request. 5,000 RPS sustained. For a tool like Claude Code that makes rapid-fire API calls during agentic coding sessions, that difference is real.
Get Started
Bifrost is fully open source.
- GitHub: https://git.new/bifrost
- Docs: https://getmax.im/bifrostdocs
- Website: https://getmax.im/bifrost-home
One npx command and you have a production-grade gateway between Claude Code and every LLM provider. Budget controls, failover, load balancing, rate limiting. All at 11µs overhead.
If you are using Claude Code in production or across a team, you need a gateway. Bifrost is the fastest one available.
Star the repo: https://git.new/bifrostrepo