Debby McKinney
Claude Code with Any LLM: A Step-by-Step Guide Using Bifrost

TL;DR: Claude Code is locked to Anthropic models by default. By routing it through Bifrost (an open-source LLM gateway), you can use GPT-4o, Gemini, Llama, Mistral, or any of 20+ providers, all without modifying Claude Code itself. One environment variable change. Full budget controls. This guide walks you through the entire setup in under 10 minutes.

If you're already sold and just want to get started: Bifrost GitHub | Docs | Website


Why Would You Want This?

If you're using Claude Code daily (and honestly, who isn't at this point), you've probably hit one of these situations:

You want to compare models. Claude Sonnet is great for most coding tasks, but maybe GPT-4o handles your particular codebase better. Or maybe Gemini's context window is what you actually need for that massive monorepo. Without a gateway, you can't test this; Claude Code only talks to Anthropic.

You want cost control. Claude Code burns through tokens fast. A heavy coding session can easily run up ₹5,000-10,000 in API costs. What if you could route simpler tasks to GPT-4o-mini or an open-source model and save 60-80% on those calls?

You need compliance. If you're working in fintech or healthcare in India, DPDPA compliance means you might need all API traffic flowing through your own infrastructure, not directly to US-based APIs. A self-hosted gateway gives you that control.

You want observability. What models is Claude Code actually calling? How many tokens per session? What's your cost per feature? Without a gateway, you're flying blind.


Here's What This Means (Architecturally)

The setup is straightforward. Claude Code uses the ANTHROPIC_BASE_URL environment variable to determine where to send API requests. Normally, that points to https://api.anthropic.com.

You're going to point it at Bifrost instead.

Claude Code  -->  Bifrost (localhost:8080)  -->  Any LLM Provider

Bifrost exposes an Anthropic-compatible endpoint at /anthropic. It accepts requests in Anthropic's Messages API format, transforms them to whatever provider format you've configured, sends the request, and transforms the response back to Anthropic format.

Claude Code doesn't know the difference. It thinks it's talking to Anthropic. But Bifrost can route that request to OpenAI, Gemini, Bedrock, Groq, Mistral, Ollama or whatever you've configured.
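That translation layer is the whole trick. A rough, illustrative sketch of the kind of conversion a gateway performs — mapping an Anthropic Messages-style payload to an OpenAI Chat Completions-style one (this is not Bifrost's actual code, just the shape of the idea):

```python
# Illustrative sketch of a gateway's translation step -- NOT Bifrost's
# actual implementation. It converts an Anthropic Messages-style payload
# into an OpenAI Chat Completions-style payload.

def anthropic_to_openai(payload: dict, target_model: str) -> dict:
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first chat message.
    if "system" in payload:
        messages.append({"role": "system", "content": payload["system"]})
    messages.extend(payload.get("messages", []))
    return {
        "model": target_model,
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 1024),
    }

request = {
    "model": "claude-sonnet-4-20250514",
    "system": "You are a coding assistant.",
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Sort a list in Python"}],
}
translated = anthropic_to_openai(request, "gpt-4o")
```

The response goes through the mirror-image transformation on the way back, which is why Claude Code never notices.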

Here's what this means for you:

  • Zero code changes to Claude Code
  • One environment variable to set
  • Full provider flexibility — swap models without restarting Claude Code
  • Budget enforcement — set spending limits per virtual key
  • Complete request logging — every prompt, every response, every cost

Step 1: Install Bifrost

You have two options. Both take about 30 seconds.

Option A: NPX (Recommended for Quick Start)

npx -y @maximhq/bifrost

That's it. Bifrost is now running on http://localhost:8080.

Option B: Docker

docker pull maximhq/bifrost
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost

The volume mount (-v) gives you data persistence across restarts — your config, logs, and cache survive container rebuilds.

Open http://localhost:8080 in your browser. You should see Bifrost's web UI; this is where you'll configure providers visually.


Step 2: Configure Your Providers

Bifrost gives you two ways to add providers: the Web UI (point and click) or a config.json file.

Using the Web UI (Easier)

  1. Open http://localhost:8080
  2. Go to the Providers section
  3. Click "Add Provider"
  4. Select your provider (OpenAI, Anthropic, Gemini, etc.)
  5. Paste your API key
  6. Select which models to enable

You can add multiple providers. Bifrost will use them for routing and fallbacks.

Using config.json (For Automation / GitOps)

Create a config.json in your Bifrost app directory:

{
  "providers": {
    "openai": {
      "keys": [
        {
          "name": "openai-primary",
          "value": "env.OPENAI_API_KEY",
          "models": ["gpt-4o", "gpt-4o-mini"],
          "weight": 1.0
        }
      ]
    },
    "anthropic": {
      "keys": [
        {
          "name": "anthropic-primary",
          "value": "env.ANTHROPIC_API_KEY",
          "models": ["claude-sonnet-4-20250514"],
          "weight": 1.0
        }
      ]
    },
    "gemini": {
      "keys": [
        {
          "name": "gemini-primary",
          "value": "env.GEMINI_API_KEY",
          "models": ["gemini-2.5-pro"],
          "weight": 1.0
        }
      ]
    }
  }
}

Here's what this means: you've told Bifrost about three providers. The "value": "env.OPENAI_API_KEY" syntax means Bifrost reads the key from your environment variable — your actual API keys never sit in a config file.
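To make the env.* convention concrete, here is a tiny hypothetical resolver showing how such a value would be looked up at load time (the function name and error handling are mine, not Bifrost's):

```python
import os

# Hypothetical sketch of the "env.VAR_NAME" convention from the config
# above: values prefixed with "env." are read from the environment
# instead of being stored in the file.

def resolve_key(value: str) -> str:
    if value.startswith("env."):
        var = value[len("env."):]
        resolved = os.environ.get(var)
        if resolved is None:
            raise KeyError(f"environment variable {var} is not set")
        return resolved
    return value  # literal keys pass through unchanged

os.environ["OPENAI_API_KEY"] = "sk-example"  # demo value only
print(resolve_key("env.OPENAI_API_KEY"))     # prints the resolved key
```

The practical upshot: you can commit config.json to version control without ever committing a secret.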


Step 3: Point Claude Code at Bifrost

This is the key step. Set two environment variables:

export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic
export ANTHROPIC_API_KEY=dummy-key

Why dummy-key? Because Bifrost handles the actual authentication to providers. Claude Code needs something in this variable to not throw an error, but Bifrost doesn't validate it — it uses the provider keys you configured in Step 2.

Important note: If you're using Claude Code with a MAX subscription (not API key auth), Bifrost integrates with MAX accounts out of the box. Claude Code's session-based auth works seamlessly through the gateway.

Now launch Claude Code:

claude

That's it. Every request Claude Code makes will now flow through Bifrost.

Step 4: Route to Different Models

Here's where it gets powerful. By default, Claude Code sends requests for Anthropic models. But through Bifrost, you can route these to any provider.

Method 1: Use the Bifrost model prefix

If you're using Claude Code in a context where you can specify the model (like through settings or API calls), prefix the model name with the provider:

openai/gpt-4o
gemini/gemini-2.5-pro
groq/llama-3.1-70b-versatile
mistral/mistral-large-latest
ollama/llama3
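The provider/model convention is easy to mimic in your own tooling. A sketch of the parsing, under the assumption that the first slash separates provider from model (some model IDs can themselves contain slashes, so split only once; how Bifrost interprets the prefix internally is its own business):

```python
# Illustrative parser for "provider/model" strings -- an assumption about
# the convention, not Bifrost's internal routing code.

def split_model(model: str, default_provider: str = "anthropic"):
    """Split 'provider/model' into (provider, model).

    Split on the first slash only, since model IDs may contain slashes.
    Plain model names with no prefix fall back to a default provider.
    """
    if "/" in model:
        provider, name = model.split("/", 1)
        return provider, name
    return default_provider, model

print(split_model("openai/gpt-4o"))             # -> ('openai', 'gpt-4o')
print(split_model("claude-sonnet-4-20250514"))  # falls back to the default
```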

Method 2: Configure routing rules on Virtual Keys

This is the more elegant approach. Create a Virtual Key in Bifrost's governance settings that automatically routes requests:

  1. Go to Bifrost UI > Governance > Virtual Keys
  2. Create a new Virtual Key
  3. Add routing rules — for example, route all requests to openai/gpt-4o by default
  4. Set the x-model-provider header or configure default routing

Through the API:

curl -X POST http://localhost:8080/api/governance/virtual-keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "claude-code-routing",
    "budget": {
      "max_budget": 100,
      "budget_duration": "monthly"
    },
    "providers": [
      {
        "provider": "openai",
        "model": "gpt-4o",
        "weight": 0.7
      },
      {
        "provider": "anthropic",
        "model": "claude-sonnet-4-20250514",
        "weight": 0.3
      }
    ]
  }'

Here's what this means: 70% of requests go to GPT-4o and 30% go to Claude Sonnet, with a hard $100/month budget cap. Bifrost enforces this automatically.
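Under the hood, a 0.7/0.3 split like this boils down to a weighted random choice per request. A sketch (seeded here so the outcome is reproducible; illustrative only, not Bifrost's scheduler):

```python
import random

# Sketch of weighted provider selection, mirroring the 0.7 / 0.3 split
# in the virtual key above. Illustrative only.
targets = [
    ("openai", "gpt-4o", 0.7),
    ("anthropic", "claude-sonnet-4-20250514", 0.3),
]

def pick_target(rng: random.Random):
    providers = [(p, m) for p, m, _ in targets]
    weights = [w for _, _, w in targets]
    return rng.choices(providers, weights=weights, k=1)[0]

rng = random.Random(42)
picks = [pick_target(rng) for _ in range(10_000)]
openai_share = sum(1 for p, _ in picks if p == "openai") / len(picks)
print(round(openai_share, 2))  # close to 0.7 over many requests
```

Over a large number of requests, the observed split converges on the configured weights.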


Step 5: Set Up Budget Controls

If you're managing a team, this is critical. Bifrost's Virtual Keys let you set hard spending limits.

Per-key budgets: Give each developer a virtual key with a monthly cap. ₹10,000/month for junior devs, ₹50,000/month for senior devs. Bifrost stops routing requests once the budget is exhausted.

Rate limits: Prevent runaway scripts from burning through your quota. Set requests-per-minute limits per virtual key.

Model restrictions: Lock a virtual key to only use certain models. Your staging environment doesn't need claude-opus-4-20250514 — restrict it to gpt-4o-mini and save money.
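Conceptually, all three controls reduce to a pre-flight check on every request. A hedged sketch of what per-key enforcement looks like (class and field names are hypothetical; Bifrost's actual accounting lives inside the gateway):

```python
from dataclasses import dataclass, field

# Hypothetical sketch of per-virtual-key budget and model enforcement.
@dataclass
class VirtualKey:
    name: str
    max_budget: float               # in your billing currency
    spent: float = 0.0
    allowed_models: set = field(default_factory=set)

    def authorize(self, model: str, estimated_cost: float) -> bool:
        if self.allowed_models and model not in self.allowed_models:
            return False            # model restriction
        if self.spent + estimated_cost > self.max_budget:
            return False            # budget exhausted
        return True

    def record(self, cost: float):
        self.spent += cost

staging = VirtualKey("staging", max_budget=100.0,
                     allowed_models={"gpt-4o-mini"})
print(staging.authorize("claude-opus-4-20250514", 0.5))  # restricted model
print(staging.authorize("gpt-4o-mini", 0.5))             # allowed
```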


Step 6: Enable Fallbacks

Providers go down. Rate limits get hit. Bifrost handles this automatically.

When you configure fallbacks, Bifrost follows this process:

  1. Try primary provider — send the request to your configured primary
  2. Detect failure — network error, rate limit (429), model unavailable
  3. Try fallbacks in order — each fallback provider gets a fresh attempt with all plugins running
  4. Return success — from whichever provider succeeds first

Configure fallbacks in your request or at the Virtual Key level:

{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Hello"}],
  "fallbacks": [
    {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
    {"provider": "gemini", "model": "gemini-2.5-pro"}
  ]
}

Here's what this means for your Claude Code workflow: if OpenAI is down, your coding session doesn't stop. Bifrost automatically fails over to Anthropic, then Gemini. No manual intervention needed.
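The four-step process above can be sketched as a simple loop. Stub providers stand in for real API calls here (real failure detection also covers timeouts, 429s, and model-unavailable errors):

```python
# Sketch of the try-primary-then-fallbacks loop described above,
# with stub providers standing in for real API calls.

class ProviderError(Exception):
    pass

def call_provider(provider: str, model: str, messages):
    # Stub: pretend OpenAI is down and Anthropic is healthy.
    if provider == "openai":
        raise ProviderError("429 rate limited")
    return f"response from {provider}/{model}"

def complete_with_fallbacks(primary, fallbacks, messages):
    for provider, model in [primary, *fallbacks]:
        try:
            return call_provider(provider, model, messages)
        except ProviderError:
            continue  # try the next target in order
    raise ProviderError("all providers failed")

result = complete_with_fallbacks(
    ("openai", "gpt-4o"),
    [("anthropic", "claude-sonnet-4-20250514"),
     ("gemini", "gemini-2.5-pro")],
    [{"role": "user", "content": "Hello"}],
)
print(result)  # -> response from anthropic/claude-sonnet-4-20250514
```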


Bonus: Get MCP Tools in Claude Code

This is the part that honestly gets me excited.

Bifrost supports full Model Context Protocol (MCP) integration. When you configure MCP servers in Bifrost, those tools automatically become available to any model that routes through it — including Claude Code.

If you've set up MCP tools for filesystem access, web search, database queries, or custom API calls, Claude Code can discover and use them without any extra configuration. Bifrost injects the tools into the request's tools array before forwarding to the model provider.
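Tool injection amounts to merging a gateway-registered tools array into each outgoing request. A minimal sketch of that merge (the tool definition and function name are hypothetical; Bifrost's real MCP integration also handles discovery and execution):

```python
# Hypothetical sketch of injecting gateway-registered tools into an
# outgoing request's "tools" array before it is forwarded.
registered_tools = [
    {"name": "web_search",
     "description": "Search the web",
     "input_schema": {"type": "object",
                      "properties": {"q": {"type": "string"}}}},
]

def inject_tools(request: dict) -> dict:
    merged = dict(request)  # don't mutate the caller's request
    existing = list(request.get("tools", []))
    names = {t["name"] for t in existing}
    # Only add tools the caller didn't already declare.
    merged["tools"] = existing + [t for t in registered_tools
                                  if t["name"] not in names]
    return merged

req = {"model": "claude-sonnet-4-20250514",
       "messages": [{"role": "user", "content": "Find docs"}]}
print(len(inject_tools(req)["tools"]))  # -> 1
```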


Bonus: Enable Semantic Caching

If you're asking similar questions repeatedly (and in coding sessions, you absolutely are), semantic caching saves real money.

Bifrost's semantic cache uses vector similarity search. It doesn't need the exact same prompt — it matches by meaning. "How do I sort an array in Python?" and "Python array sorting" will hit the same cache entry.

Sub-millisecond cache retrieval vs multi-second API calls. The cost? Zero for cache hits.
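To make the idea concrete, here is a toy similarity cache using bag-of-words cosine similarity in place of real embeddings (Bifrost uses proper vector search; this only illustrates why reworded prompts can still hit):

```python
import math
from collections import Counter

# Toy semantic cache: bag-of-words cosine similarity stands in for
# real embeddings. Illustrative of the matching idea only.

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

cache = {}

def lookup(prompt: str, threshold: float = 0.5):
    vec = embed(prompt)
    for cached_prompt, response in cache.items():
        if cosine(vec, embed(cached_prompt)) >= threshold:
            return response  # semantic hit
    return None

cache["how do i sort an array in python"] = "Use sorted() or list.sort()."
print(lookup("sort an array in python how"))  # hit despite reordering
```

The threshold is the knob that matters in practice: too low and unrelated prompts collide, too high and only near-verbatim repeats hit.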


Quick Troubleshooting

Claude Code shows authentication error: Make sure ANTHROPIC_API_KEY is set (even to dummy-key). Claude Code validates this variable exists before making requests.

Requests aren't routing to the right provider: Check your Bifrost provider config. The model prefix (openai/, gemini/, etc.) must match a configured provider.

High latency: If you're running Bifrost and Claude Code on the same machine, the gateway's overhead is negligible (on the order of 11 µs on a decent machine). If Bifrost is on a remote server, factor in network round-trip time.

Provider errors: Check Bifrost's request logs at http://localhost:8080. Every request is logged with full details — input, output, status, latency, cost.


What You've Built

Let's recap what you now have:

  • Claude Code routing through a self-hosted gateway (DPDPA-friendly)
  • Access to 20+ LLM providers through a single endpoint
  • Budget controls that prevent surprise bills
  • Automatic failovers across providers
  • Full request/response logging and cost tracking
  • Semantic caching for repeated queries
  • MCP tool injection for extended capabilities

All from one environment variable change and a 30-second install.


Get started now:

Bifrost is open source (Apache 2.0), part of the Maxim AI platform. Star it on GitHub if this was useful!


Have questions about the setup? Drop a comment below or open an issue on GitHub. Happy to help debug.
