Debby McKinney
Claude Code with Any LLM: A Step-by-Step Guide Using Bifrost

TL;DR: Claude Code is locked to Anthropic models by default. By routing it through Bifrost (an open-source LLM gateway), you can use GPT-4o, Gemini, Llama, Mistral, or any of 20+ providers, all without modifying Claude Code itself. One environment variable change. Full budget controls. This guide walks you through the entire setup in under 10 minutes.

If you're already sold and just want to get started: Bifrost GitHub | Docs | Website


Why Would You Want This?

If you're using Claude Code daily (and honestly, who isn't at this point), you've probably hit one of these situations:

You want to compare models. Claude Sonnet is great for most coding tasks, but maybe GPT-4o handles your particular codebase better. Or maybe Gemini's context window is what you actually need for that massive monorepo. Without a gateway, you can't test this; Claude Code only talks to Anthropic.

You want cost control. Claude Code burns through tokens fast. A heavy coding session can easily run up ₹5,000-10,000 in API costs. What if you could route simpler tasks to GPT-4o-mini or an open-source model and save 60-80% on those calls?

You need compliance. If you're working in fintech or healthcare in India, DPDPA compliance means you might need all API traffic flowing through your own infrastructure, not directly to US-based APIs. A self-hosted gateway gives you that control.

You want observability. What models is Claude Code actually calling? How many tokens per session? What's your cost per feature? Without a gateway, you're flying blind.


Here's What This Means (Architecturally)

The setup is straightforward. Claude Code uses the ANTHROPIC_BASE_URL environment variable to determine where to send API requests. Normally, that points to https://api.anthropic.com.

You're going to point it at Bifrost instead.

Claude Code  -->  Bifrost (localhost:8080)  -->  Any LLM Provider

Bifrost exposes an Anthropic-compatible endpoint at /anthropic. It accepts requests in Anthropic's Messages API format, transforms them to whatever provider format you've configured, sends the request, and transforms the response back to Anthropic format.

Claude Code doesn't know the difference. It thinks it's talking to Anthropic. But Bifrost can route that request to OpenAI, Gemini, Bedrock, Groq, Mistral, Ollama or whatever you've configured.
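That translation layer is the whole trick. A rough, illustrative sketch of the kind of conversion a gateway performs — mapping an Anthropic Messages-style payload to an OpenAI Chat Completions-style one (this is not Bifrost's actual code, just the shape of the idea):

```python
# Illustrative sketch of a gateway's translation step -- NOT Bifrost's
# actual implementation. It converts an Anthropic Messages-style payload
# into an OpenAI Chat Completions-style payload.

def anthropic_to_openai(payload: dict, target_model: str) -> dict:
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first chat message.
    if "system" in payload:
        messages.append({"role": "system", "content": payload["system"]})
    messages.extend(payload.get("messages", []))
    return {
        "model": target_model,
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 1024),
    }

request = {
    "model": "claude-sonnet-4-20250514",
    "system": "You are a coding assistant.",
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Sort a list in Python"}],
}
translated = anthropic_to_openai(request, "gpt-4o")
```

The response goes through the mirror-image transformation on the way back, which is why Claude Code never notices.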

Here's what this means for you:

  • Zero code changes to Claude Code
  • One environment variable to set
  • Full provider flexibility — swap models without restarting Claude Code
  • Budget enforcement — set spending limits per virtual key
  • Complete request logging — every prompt, every response, every cost

Step 1: Install Bifrost

You have two options. Both take about 30 seconds.

Option A: NPX (Recommended for Quick Start)

npx -y @maximhq/bifrost

That's it. Bifrost is now running on http://localhost:8080.

Option B: Docker

docker pull maximhq/bifrost
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost

The volume mount (-v) gives you data persistence across restarts — your config, logs, and cache survive container rebuilds.

Open http://localhost:8080 in your browser. You should see Bifrost's web UI; this is where you'll configure providers visually.


Step 2: Configure Your Providers

Bifrost gives you two ways to add providers: the Web UI (point and click) or a config.json file.

Using the Web UI (Easier)

  1. Open http://localhost:8080
  2. Go to the Providers section
  3. Click "Add Provider"
  4. Select your provider (OpenAI, Anthropic, Gemini, etc.)
  5. Paste your API key
  6. Select which models to enable

You can add multiple providers. Bifrost will use them for routing and fallbacks.

Using config.json (For Automation / GitOps)

Create a config.json in your Bifrost app directory:

{
  "providers": {
    "openai": {
      "keys": [
        {
          "name": "openai-primary",
          "value": "env.OPENAI_API_KEY",
          "models": ["gpt-4o", "gpt-4o-mini"],
          "weight": 1.0
        }
      ]
    },
    "anthropic": {
      "keys": [
        {
          "name": "anthropic-primary",
          "value": "env.ANTHROPIC_API_KEY",
          "models": ["claude-sonnet-4-20250514"],
          "weight": 1.0
        }
      ]
    },
    "gemini": {
      "keys": [
        {
          "name": "gemini-primary",
          "value": "env.GEMINI_API_KEY",
          "models": ["gemini-2.5-pro"],
          "weight": 1.0
        }
      ]
    }
  }
}

Here's what this means: you've told Bifrost about three providers. The "value": "env.OPENAI_API_KEY" syntax means Bifrost reads the key from your environment variable — your actual API keys never sit in a config file.
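To make the env.* convention concrete, here is a tiny hypothetical resolver showing how such a value would be looked up at load time (the function name and error handling are mine, not Bifrost's):

```python
import os

# Hypothetical sketch of the "env.VAR_NAME" convention from the config
# above: values prefixed with "env." are read from the environment
# instead of being stored in the file.

def resolve_key(value: str) -> str:
    if value.startswith("env."):
        var = value[len("env."):]
        resolved = os.environ.get(var)
        if resolved is None:
            raise KeyError(f"environment variable {var} is not set")
        return resolved
    return value  # literal keys pass through unchanged

os.environ["OPENAI_API_KEY"] = "sk-example"  # demo value only
print(resolve_key("env.OPENAI_API_KEY"))     # prints the resolved key
```

The practical upshot: you can commit config.json to version control without ever committing a secret.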


Step 3: Point Claude Code at Bifrost

This is the key step. Set two environment variables:

export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic
export ANTHROPIC_API_KEY=dummy-key

Why dummy-key? Because Bifrost handles the actual authentication to providers. Claude Code needs something in this variable to not throw an error, but Bifrost doesn't validate it — it uses the provider keys you configured in Step 2.

Important note: If you're using Claude Code with a MAX subscription (not API key auth), Bifrost integrates with MAX accounts out of the box. Claude Code's session-based auth works seamlessly through the gateway.

Now launch Claude Code:

claude

That's it. Every request Claude Code makes will now flow through Bifrost.

Step 4: Route to Different Models

Here's where it gets powerful. By default, Claude Code sends requests for Anthropic models. But through Bifrost, you can route these to any provider.

Method 1: Use the Bifrost model prefix

If you're using Claude Code in a context where you can specify the model (like through settings or API calls), prefix the model name with the provider:

openai/gpt-4o
gemini/gemini-2.5-pro
groq/llama-3.1-70b-versatile
mistral/mistral-large-latest
ollama/llama3
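The provider/model convention is easy to mimic in your own tooling. A sketch of the parsing, under the assumption that the first slash separates provider from model (some model IDs can themselves contain slashes, so split only once; how Bifrost interprets the prefix internally is its own business):

```python
# Illustrative parser for "provider/model" strings -- an assumption about
# the convention, not Bifrost's internal routing code.

def split_model(model: str, default_provider: str = "anthropic"):
    """Split 'provider/model' into (provider, model).

    Split on the first slash only, since model IDs may contain slashes.
    Plain model names with no prefix fall back to a default provider.
    """
    if "/" in model:
        provider, name = model.split("/", 1)
        return provider, name
    return default_provider, model

print(split_model("openai/gpt-4o"))             # -> ('openai', 'gpt-4o')
print(split_model("claude-sonnet-4-20250514"))  # falls back to the default
```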

Method 2: Configure routing rules on Virtual Keys

This is the more elegant approach. Create a Virtual Key in Bifrost's governance settings that automatically routes requests:

  1. Go to Bifrost UI > Governance > Virtual Keys
  2. Create a new Virtual Key
  3. Add routing rules — for example, route all requests to openai/gpt-4o by default
  4. Set the x-model-provider header or configure default routing

Through the API:

curl -X POST http://localhost:8080/api/governance/virtual-keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "claude-code-routing",
    "budget": {
      "max_budget": 100,
      "budget_duration": "monthly"
    },
    "providers": [
      {
        "provider": "openai",
        "model": "gpt-4o",
        "weight": 0.7
      },
      {
        "provider": "anthropic",
        "model": "claude-sonnet-4-20250514",
        "weight": 0.3
      }
    ]
  }'

Here's what this means: 70% of requests go to GPT-4o and 30% go to Claude Sonnet, with a hard $100/month budget cap. Bifrost enforces this automatically.
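Under the hood, a 0.7/0.3 split like this boils down to a weighted random choice per request. A sketch (seeded here so the outcome is reproducible; illustrative only, not Bifrost's scheduler):

```python
import random

# Sketch of weighted provider selection, mirroring the 0.7 / 0.3 split
# in the virtual key above. Illustrative only.
targets = [
    ("openai", "gpt-4o", 0.7),
    ("anthropic", "claude-sonnet-4-20250514", 0.3),
]

def pick_target(rng: random.Random):
    providers = [(p, m) for p, m, _ in targets]
    weights = [w for _, _, w in targets]
    return rng.choices(providers, weights=weights, k=1)[0]

rng = random.Random(42)
picks = [pick_target(rng) for _ in range(10_000)]
openai_share = sum(1 for p, _ in picks if p == "openai") / len(picks)
print(round(openai_share, 2))  # close to 0.7 over many requests
```

Over a large number of requests, the observed split converges on the configured weights.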


Step 5: Set Up Budget Controls

If you're managing a team, this is critical. Bifrost's Virtual Keys let you set hard spending limits.

Per-key budgets: Give each developer a virtual key with a monthly cap. ₹10,000/month for junior devs, ₹50,000/month for senior devs. Bifrost stops routing requests once the budget is exhausted.

Rate limits: Prevent runaway scripts from burning through your quota. Set requests-per-minute limits per virtual key.

Model restrictions: Lock a virtual key to only use certain models. Your staging environment doesn't need claude-opus-4-20250514 — restrict it to gpt-4o-mini and save money.
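Conceptually, all three controls reduce to a pre-flight check on every request. A hedged sketch of what per-key enforcement looks like (class and field names are hypothetical; Bifrost's actual accounting lives inside the gateway):

```python
from dataclasses import dataclass, field

# Hypothetical sketch of per-virtual-key budget and model enforcement.
@dataclass
class VirtualKey:
    name: str
    max_budget: float               # in your billing currency
    spent: float = 0.0
    allowed_models: set = field(default_factory=set)

    def authorize(self, model: str, estimated_cost: float) -> bool:
        if self.allowed_models and model not in self.allowed_models:
            return False            # model restriction
        if self.spent + estimated_cost > self.max_budget:
            return False            # budget exhausted
        return True

    def record(self, cost: float):
        self.spent += cost

staging = VirtualKey("staging", max_budget=100.0,
                     allowed_models={"gpt-4o-mini"})
print(staging.authorize("claude-opus-4-20250514", 0.5))  # restricted model
print(staging.authorize("gpt-4o-mini", 0.5))             # allowed
```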


Step 6: Enable Fallbacks

Providers go down. Rate limits get hit. Bifrost handles this automatically.

When you configure fallbacks, Bifrost follows this process:

  1. Try primary provider — send the request to your configured primary
  2. Detect failure — network error, rate limit (429), model unavailable
  3. Try fallbacks in order — each fallback provider gets a fresh attempt with all plugins running
  4. Return success — from whichever provider succeeds first

Configure fallbacks in your request or at the Virtual Key level:

{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Hello"}],
  "fallbacks": [
    {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
    {"provider": "gemini", "model": "gemini-2.5-pro"}
  ]
}

Here's what this means for your Claude Code workflow: if OpenAI is down, your coding session doesn't stop. Bifrost automatically fails over to Anthropic, then Gemini. No manual intervention needed.
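The four-step process above can be sketched as a simple loop. Stub providers stand in for real API calls here (real failure detection also covers timeouts, 429s, and model-unavailable errors):

```python
# Sketch of the try-primary-then-fallbacks loop described above,
# with stub providers standing in for real API calls.

class ProviderError(Exception):
    pass

def call_provider(provider: str, model: str, messages):
    # Stub: pretend OpenAI is down and Anthropic is healthy.
    if provider == "openai":
        raise ProviderError("429 rate limited")
    return f"response from {provider}/{model}"

def complete_with_fallbacks(primary, fallbacks, messages):
    for provider, model in [primary, *fallbacks]:
        try:
            return call_provider(provider, model, messages)
        except ProviderError:
            continue  # try the next target in order
    raise ProviderError("all providers failed")

result = complete_with_fallbacks(
    ("openai", "gpt-4o"),
    [("anthropic", "claude-sonnet-4-20250514"),
     ("gemini", "gemini-2.5-pro")],
    [{"role": "user", "content": "Hello"}],
)
print(result)  # -> response from anthropic/claude-sonnet-4-20250514
```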


Bonus: Get MCP Tools in Claude Code

This is the part that honestly gets me excited.

Bifrost supports full Model Context Protocol (MCP) integration. When you configure MCP servers in Bifrost, those tools automatically become available to any model that routes through it — including Claude Code.

If you've set up MCP tools for filesystem access, web search, database queries, or custom API calls, Claude Code can discover and use them without any extra configuration. Bifrost injects the tools into the request's tools array before forwarding to the model provider.
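Tool injection amounts to merging a gateway-registered tools array into each outgoing request. A minimal sketch of that merge (the tool definition and function name are hypothetical; Bifrost's real MCP integration also handles discovery and execution):

```python
# Hypothetical sketch of injecting gateway-registered tools into an
# outgoing request's "tools" array before it is forwarded.
registered_tools = [
    {"name": "web_search",
     "description": "Search the web",
     "input_schema": {"type": "object",
                      "properties": {"q": {"type": "string"}}}},
]

def inject_tools(request: dict) -> dict:
    merged = dict(request)  # don't mutate the caller's request
    existing = list(request.get("tools", []))
    names = {t["name"] for t in existing}
    # Only add tools the caller didn't already declare.
    merged["tools"] = existing + [t for t in registered_tools
                                  if t["name"] not in names]
    return merged

req = {"model": "claude-sonnet-4-20250514",
       "messages": [{"role": "user", "content": "Find docs"}]}
print(len(inject_tools(req)["tools"]))  # -> 1
```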


Bonus: Enable Semantic Caching

If you're asking similar questions repeatedly (and in coding sessions, you absolutely are), semantic caching saves real money.

Bifrost's semantic cache uses vector similarity search. It doesn't need the exact same prompt — it matches by meaning. "How do I sort an array in Python?" and "Python array sorting" will hit the same cache entry.

Sub-millisecond cache retrieval vs multi-second API calls. The cost? Zero for cache hits.
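To make the idea concrete, here is a toy similarity cache using bag-of-words cosine similarity in place of real embeddings (Bifrost uses proper vector search; this only illustrates why reworded prompts can still hit):

```python
import math
from collections import Counter

# Toy semantic cache: bag-of-words cosine similarity stands in for
# real embeddings. Illustrative of the matching idea only.

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

cache = {}

def lookup(prompt: str, threshold: float = 0.5):
    vec = embed(prompt)
    for cached_prompt, response in cache.items():
        if cosine(vec, embed(cached_prompt)) >= threshold:
            return response  # semantic hit
    return None

cache["how do i sort an array in python"] = "Use sorted() or list.sort()."
print(lookup("sort an array in python how"))  # hit despite reordering
```

The threshold is the knob that matters in practice: too low and unrelated prompts collide, too high and only near-verbatim repeats hit.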


Quick Troubleshooting

Claude Code shows authentication error: Make sure ANTHROPIC_API_KEY is set (even to dummy-key). Claude Code validates this variable exists before making requests.

Requests aren't routing to the right provider: Check your Bifrost provider config. The model prefix (openai/, gemini/, etc.) must match a configured provider.

High latency: If you're running Bifrost and Claude Code on the same machine, the gateway's overhead is negligible (on the order of 11 µs on a decent machine). If Bifrost is on a remote server, factor in network round-trip time.

Provider errors: Check Bifrost's request logs at http://localhost:8080. Every request is logged with full details — input, output, status, latency, cost.


What You've Built

Let's recap what you now have:

  • Claude Code routing through a self-hosted gateway (DPDPA-friendly)
  • Access to 20+ LLM providers through a single endpoint
  • Budget controls that prevent surprise bills
  • Automatic failovers across providers
  • Full request/response logging and cost tracking
  • Semantic caching for repeated queries
  • MCP tool injection for extended capabilities

All from one environment variable change and a 30-second install.


Get started now:

Bifrost is open source (Apache 2.0), part of the Maxim AI platform. Star it on GitHub if this was useful!


Have questions about the setup? Drop a comment below or open an issue on GitHub. Happy to help debug.
