DEV Community

Pranay Batta
How to Use Claude Code with Any Model using an AI Gateway

Claude Code is one of the best agentic coding tools out there. But it only talks to Anthropic models natively.

That is a problem if you want to route requests to OpenAI, Bedrock, or any other provider. Maybe you want cost control. Maybe you want failover. Maybe you just want options.

We built Bifrost to solve exactly this. It is an open-source LLM gateway written in Go that sits between Claude Code and any LLM provider. It adds 11µs of latency overhead. That is not a typo. Eleven microseconds.

Here is how to set it up.

Why You Need a Gateway for Claude Code

Claude Code sends requests in a specific format to Anthropic's API. Without a proxy, you are locked into one provider, one billing account, one point of failure.

With Bifrost as a proxy, you get:

  • Routing to any provider (OpenAI, Anthropic, Bedrock, etc.) using OpenAI-compatible API format
  • Budget controls per Virtual Key so Claude Code does not burn through your credits overnight
  • Automatic failover if a provider goes down or rate limits you
  • Weighted load balancing across multiple providers
  • Rate limiting to cap runaway token spend

Bifrost handles all of this at 5,000 RPS sustained throughput. It is 50x faster than Python-based alternatives like LiteLLM, which adds roughly 8ms of overhead per request.

Check the docs for the full feature list.

Step 1: Start Bifrost

Zero-config deployment. One command.

```shell
npx -y @maximhq/bifrost
```

Or if you prefer Docker:

```shell
docker run -p 8080:8080 maximhq/bifrost
```

Bifrost ships with a Web UI for configuration. Once it is running, open the dashboard and configure your providers.
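
Next, point Claude Code at Bifrost instead of api.anthropic.com. Claude Code respects the `ANTHROPIC_BASE_URL` environment variable, so no code changes are needed. One caveat: the `/anthropic` path and the Virtual Key shown below are assumptions for illustration, so check the Bifrost docs for the exact Anthropic-compatible route your version exposes.

```shell
# Route Claude Code's traffic through the local Bifrost instance.
# NOTE: the /anthropic path is an assumption -- confirm the exact
# Anthropic-compatible endpoint in your Bifrost deployment's docs.
export ANTHROPIC_BASE_URL="http://localhost:8080/anthropic"

# Authenticate with a Bifrost Virtual Key instead of a raw provider key,
# so the budget and rate limits described below apply.
export ANTHROPIC_API_KEY="your-virtual-key"

# Then launch Claude Code as usual: claude
```

With those two variables set, every request Claude Code makes flows through the gateway.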

Step 2: Configure Providers

You can configure providers through the Web UI or directly in config.json. Here is an example config with Anthropic and OpenAI:

```json
{
  "providers": [
    {
      "name": "anthropic",
      "provider": "anthropic",
      "apiKey": "sk-ant-xxxxx",
      "weight": 70
    },
    {
      "name": "openai",
      "provider": "openai",
      "apiKey": "sk-xxxxx",
      "weight": 30
    }
  ]
}
```

The `weight` field controls load balancing. With this config, 70% of requests go to Anthropic and 30% to OpenAI.
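
Under the hood, weighted routing is just a biased random draw. Here is a toy shell sketch of the idea for a 70/30 split. This is illustrative only, not Bifrost's actual routing code:

```shell
# Toy 70/30 weighted provider selection, for intuition only.
pick_provider() {
  # Uniform draw in 1..100 (awk keeps this portable across shells).
  n=$(awk 'BEGIN { srand(); print int(rand() * 100) + 1 }')
  if [ "$n" -le 70 ]; then
    echo "anthropic"   # roughly 70% of draws
  else
    echo "openai"      # roughly 30% of draws
  fi
}

pick_provider   # prints either "anthropic" or "openai"
```

Bifrost does this per request, so the split converges on the configured weights over time.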

Step 3: Use Model Routing

Bifrost uses the `provider/model` format for routing. When you send a request, specify which provider and model you want:

  • `anthropic/claude-3-5-sonnet` routes to Anthropic
  • `openai/gpt-4o` routes to OpenAI
  • `anthropic/claude-3-opus` routes to Anthropic

All requests use the OpenAI-compatible API format, so Claude Code can connect without any custom adapters.

Here is a curl example:

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-virtual-key" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "Explain TCP handshake in 3 lines"}
    ]
  }'
```

That request goes through Bifrost, gets routed to OpenAI, and the response comes back in the same OpenAI-compatible format.

Step 4: Set Up Virtual Keys for Cost Control

This is where it gets practical. Virtual Keys in Bifrost let you set budget and rate limits per key.

Create a Virtual Key for your Claude Code usage with:

  • A daily budget cap (say $50/day)
  • Rate limits (requests per minute)
  • Model/provider filtering (restrict which models the key can access)

Budget reset frequencies: `1m`, `1h`, `1d`, `1w`, `1M` (per minute, hour, day, week, and month).

So if you set a $50 daily budget on your Claude Code Virtual Key, once you hit that limit, Bifrost stops routing requests for that key until the budget resets. No surprises on your bill.
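
A Virtual Key definition along these lines would encode those limits. The field names here are illustrative, not the exact schema, so check the Bifrost docs before copying:

```json
{
  "virtualKeys": [
    {
      "name": "claude-code-dev",
      "budget": { "limit": 50, "resetFrequency": "1d" },
      "rateLimit": { "requestsPerMinute": 60 },
      "allowedModels": ["anthropic/claude-3-5-sonnet", "openai/gpt-4o"]
    }
  ]
}
```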

Step 5: Configure Automatic Failover

Bifrost automatically fails over on network errors, rate limits, 5xx responses, and timeouts.

If Anthropic rate limits you during a heavy Claude Code session, Bifrost routes the next request to OpenAI (or whichever fallback provider you configured). No dropped requests. No manual intervention.
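
Conceptually, a request with a fallback chain looks like this. The exact key names are an assumption, so consult the Bifrost configuration reference for your version:

```json
{
  "model": "anthropic/claude-3-5-sonnet",
  "fallbacks": ["openai/gpt-4o"],
  "messages": [
    { "role": "user", "content": "Refactor this module" }
  ]
}
```

If the primary model errors out or gets rate limited, the gateway retries the same request against the next entry in the chain.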

Step 6: Budget Hierarchy for Teams

Bifrost has a four-tier budget hierarchy: Customer, Team, Virtual Key, Provider Config.

If you have a team of developers all using Claude Code, you can set:

  • A team-level monthly budget of $500
  • Individual Virtual Keys with $50/day limits per developer
  • Provider-level caps so no single provider gets more than a set amount

All of this is enforced at the gateway level. No code changes needed in Claude Code.
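
Sketched as config, a team setup like that might look as follows. Again, these field names are hypothetical placeholders for illustration, not Bifrost's exact schema:

```json
{
  "teams": [
    {
      "name": "backend-devs",
      "budget": { "limit": 500, "resetFrequency": "1M" },
      "virtualKeys": [
        { "name": "alice-claude-code", "budget": { "limit": 50, "resetFrequency": "1d" } },
        { "name": "bob-claude-code",   "budget": { "limit": 50, "resetFrequency": "1d" } }
      ]
    }
  ]
}
```

A request is rejected if it would breach any tier, so the team cap holds even if every individual key is under its own limit.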

Thinking Parameter Support

As of Bifrost v1.3.0, the `thinking` parameter for Anthropic models is fully supported. If you are using Claude's extended thinking features through Claude Code, Bifrost passes the `thinking` parameter through correctly.
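
For example, a request body with extended thinking enabled, using the shape Anthropic's Messages API defines for the `thinking` parameter, passes through the gateway unchanged:

```json
{
  "model": "anthropic/claude-3-5-sonnet",
  "max_tokens": 2048,
  "thinking": { "type": "enabled", "budget_tokens": 1024 },
  "messages": [
    { "role": "user", "content": "Walk through the edge cases in this function." }
  ]
}
```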

Why Go Matters Here

Python-based gateways add milliseconds of latency per request. When Claude Code is making dozens of API calls per task, that adds up.

Bifrost is written in Go. 11µs overhead per request. 5,000 RPS sustained. For a tool like Claude Code that makes rapid-fire API calls during agentic coding sessions, that difference is real.

Get Started

Bifrost is fully open source.

One npx command and you have a production-grade gateway between Claude Code and every LLM provider. Budget controls, failover, load balancing, rate limiting. All at 11µs overhead.

If you are using Claude Code in production or across a team, you need a gateway. Bifrost is the fastest one available.

Star the repo: https://git.new/bifrostrepo
