Claude Code is one of the best agentic coding tools out there. But it only talks to Anthropic models natively.
That is a problem if you want to route requests to OpenAI, Bedrock, or any other provider. Maybe you want cost control. Maybe you want failover. Maybe you just want options.
We built Bifrost to solve exactly this. It is an open-source LLM gateway written in Go that sits between Claude Code and any LLM provider. It adds 11µs of latency overhead. That is not a typo. Eleven microseconds.
Here is how to set it up.
Why You Need a Gateway for Claude Code
Claude Code sends requests in a specific format to Anthropic's API. Without a proxy, you are locked into one provider, one billing account, one point of failure.
With Bifrost as a proxy, you get:
- Routing to any provider (OpenAI, Anthropic, Bedrock, etc.) using OpenAI-compatible API format
- Budget controls per Virtual Key so Claude Code does not burn through your credits overnight
- Automatic failover if a provider goes down or rate limits you
- Weighted load balancing across multiple providers
- Rate limiting to cap runaway token spend
Bifrost handles all of this at 5,000 RPS sustained throughput. It is 50x faster than Python-based alternatives like LiteLLM, which adds roughly 8ms of overhead per request.
Check the docs for the full feature list.
Step 1: Start Bifrost
Zero-config deployment. One command.
```shell
npx -y @maximhq/bifrost
```
Or if you prefer Docker:
```shell
docker run -p 8080:8080 maximhq/bifrost
```
Bifrost ships with a Web UI for configuration. Once it is running, open the dashboard and configure your providers.
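To point Claude Code at the gateway instead of api.anthropic.com, set Claude Code's base-URL override. A minimal sketch, assuming the default port 8080 and a Virtual Key (covered in Step 4); the exact route Bifrost expects for Claude Code traffic is worth confirming in the Bifrost docs:

```shell
# Route Claude Code through the local Bifrost gateway instead of
# hitting Anthropic's API directly.
export ANTHROPIC_BASE_URL="http://localhost:8080"
# Authenticate with a Bifrost Virtual Key rather than a raw provider key.
export ANTHROPIC_AUTH_TOKEN="your-virtual-key"
```

Run `claude` from the same shell and its requests flow through the gateway.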
Step 2: Configure Providers
You can configure providers through the Web UI or directly in config.json. Here is an example config with Anthropic and OpenAI:
```json
{
  "providers": [
    {
      "name": "anthropic",
      "provider": "anthropic",
      "apiKey": "sk-ant-xxxxx",
      "weight": 70
    },
    {
      "name": "openai",
      "provider": "openai",
      "apiKey": "sk-xxxxx",
      "weight": 30
    }
  ]
}
```
The `weight` field controls load balancing. In this config, roughly 70% of requests go to Anthropic and 30% to OpenAI.
Step 3: Use Model Routing
Bifrost uses the `provider/model` format for routing. When you send a request, specify which provider and model you want:
- `anthropic/claude-3-5-sonnet` routes to Anthropic
- `openai/gpt-4o` routes to OpenAI
- `anthropic/claude-3-opus` routes to Anthropic
All requests use the OpenAI-compatible API format, so Claude Code can connect without any custom adapters.
Here is a curl example:
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-virtual-key" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "Explain TCP handshake in 3 lines"}
    ]
  }'
```
That request goes through Bifrost, gets routed to OpenAI, and the response comes back in the same OpenAI-compatible format.
Step 4: Set Up Virtual Keys for Cost Control
This is where it gets practical. Virtual Keys in Bifrost let you set budget and rate limits per key.
Create a Virtual Key for your Claude Code usage with:
- A daily budget cap (say $50/day)
- Rate limits (requests per minute)
- Model/provider filtering (restrict which models the key can access)
Budget reset frequencies: `1m`, `1h`, `1d`, `1w`, `1M`.
So if you set a $50 daily budget on your Claude Code Virtual Key, once you hit that limit, Bifrost stops routing requests for that key until the budget resets. No surprises on your bill.
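As a sketch, a Virtual Key with the limits above might look like this in config. The field names here are illustrative, not Bifrost's actual schema; configure keys through the Web UI or check the docs for the real shape:

```json
{
  "virtualKeys": [
    {
      "name": "claude-code-dev",
      "budget": { "limit": 50, "resetFrequency": "1d" },
      "rateLimit": { "requestsPerMinute": 60 },
      "allowedModels": ["anthropic/claude-3-5-sonnet", "openai/gpt-4o"]
    }
  ]
}
```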
Step 5: Configure Automatic Failover
Bifrost automatically fails over on network errors, rate limits, 5xx responses, and timeouts.
If Anthropic rate limits you during a heavy Claude Code session, Bifrost routes the next request to OpenAI (or whichever fallback provider you configured). No dropped requests. No manual intervention.
Step 6: Budget Hierarchy for Teams
Bifrost has a four-tier budget hierarchy: Customer, Team, Virtual Key, Provider Config.
If you have a team of developers all using Claude Code, you can set:
- A team-level monthly budget of $500
- Individual Virtual Keys with $50/day limits per developer
- Provider-level caps so no single provider gets more than a set amount
All of this is enforced at the gateway level. No code changes needed in Claude Code.
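The enforcement idea is that a request must fit under every tier's cap at once. A sketch of that check in Go, as an illustration rather than Bifrost's implementation:

```go
package main

import "fmt"

// budget tracks spend against a cap at one tier of the hierarchy
// (customer, team, virtual key, or provider config).
type budget struct {
	name  string
	cap   float64
	spent float64
}

// allow charges a request's cost against every tier, rejecting it if
// any single tier would exceed its cap.
func allow(cost float64, tiers []*budget) (bool, string) {
	for _, t := range tiers {
		if t.spent+cost > t.cap {
			return false, t.name
		}
	}
	for _, t := range tiers {
		t.spent += cost
	}
	return true, ""
}

func main() {
	tiers := []*budget{
		{name: "team", cap: 500},
		{name: "virtual-key", cap: 50},
	}
	ok, _ := allow(45, tiers)      // fits under both caps
	blocked, at := allow(10, tiers) // would push the key past $50
	fmt.Println(ok, blocked, at)    // the second request is blocked at the virtual-key tier
}
```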
Thinking Parameter Support
As of Bifrost v1.3.0, the thinking parameter for Anthropic models is fully supported. If you are using Claude's extended thinking features through Claude Code, Bifrost passes the thinking parameter through correctly.
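For example, an extended-thinking request body sent through the same chat completions route as the earlier curl example might look like this; the `thinking` object follows Anthropic's API shape, and the exact passthrough behavior is worth verifying against the docs:

```json
{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [
    {"role": "user", "content": "Prove that sqrt(2) is irrational"}
  ],
  "thinking": {
    "type": "enabled",
    "budget_tokens": 4096
  }
}
```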
Why Go Matters Here
Python-based gateways add milliseconds of latency per request. When Claude Code makes dozens of API calls per task, that adds up fast: 100 requests at 8ms of overhead each is 800ms spent in the gateway, while the same 100 requests at 11µs each cost about 1ms.
Bifrost is written in Go. 11µs overhead per request. 5,000 RPS sustained. For a tool like Claude Code that makes rapid-fire API calls during agentic coding sessions, that difference is real.
Get Started
Bifrost is fully open source.
- GitHub: https://git.new/bifrost
- Docs: https://getmax.im/bifrostdocs
- Website: https://getmax.im/bifrost-home
One npx command and you have a production-grade gateway between Claude Code and every LLM provider. Budget controls, failover, load balancing, rate limiting. All at 11µs overhead.
If you are using Claude Code in production or across a team, you need a gateway. Bifrost is the fastest one available.
Star the repo: https://git.new/bifrostrepo