Varshith V Hegde

Using Claude Code with Any LLM: Why a Gateway Changes Everything

I've been using Claude Code for a while now, and if you're a developer who has added it to your daily workflow, you probably know the feeling. It's genuinely good. It reads your codebase, runs commands, modifies files, and helps implement features right from your terminal without you having to context-switch constantly.

But at some point, most developers hit the same wall I did: what if I want to use a different model?

What if GPT-4o handles your specific codebase better? What if Gemini's larger context window is exactly what you need for that massive legacy project? What if you're spending more on API calls than you should be, and you know some of those simpler tasks could run on a cheaper model just fine?

Out of the box, Claude Code only talks to Anthropic. That's just how it works. And while Anthropic's models are genuinely strong, being locked into a single provider means you're trading flexibility for convenience. This guide is about getting both.


The Real Friction Points

Before jumping into the solution, it helps to be specific about what problems we're actually solving.

Model flexibility. Different models have different strengths. Claude Sonnet is excellent for most coding tasks, but you can't know it's the best tool for every job unless you can test alternatives, and without a gateway that means switching tools entirely.

Cost management. Claude Code burns through tokens quickly during an active session. Complex architectural work and boilerplate generation are not the same job, and pricing them identically doesn't make much sense. Routing simpler requests to a more affordable model can cut costs significantly without affecting output quality where it matters.

Compliance and data routing. If you work in fintech, healthcare, or any regulated industry, you've likely dealt with requirements around where your data goes. Routing all API traffic through your own infrastructure before it reaches any external provider is often non-negotiable.

Observability. This one gets overlooked a lot. How many tokens does a typical Claude Code session consume? What's your actual cost per feature shipped? Without request logging, you're genuinely guessing.


Why Bifrost

Bifrost is an open-source LLM gateway built by Maxim AI to route, manage, and optimize requests between your application and multiple model providers. It's Apache 2.0 licensed, self-hostable, and supports 20+ providers including OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure, Mistral, Cohere, Groq, and more.

A few things that make it stand out technically:

Performance that doesn't get in the way. At 5,000 requests per second, Bifrost adds less than 15 microseconds of internal overhead per request. At production scale, that's essentially nothing.

Zero-config startup. A single npx command launches the gateway, and everything else is configurable through a web UI.

Built-in fallbacks and load balancing. If a provider fails or rate-limits you, Bifrost automatically routes to a backup. Traffic can also be distributed across multiple keys or providers using weighted rules.

Semantic caching. Repeated or semantically similar queries can be served from cache, which reduces both latency and cost for workflows with a lot of repetitive prompting.

Full observability out of the box. Prometheus metrics, request tracing, token usage, latency, and a built-in web dashboard are all included.

The architecture is straightforward:

Claude Code  -->  Bifrost (localhost:8080)  -->  Any LLM Provider

Claude Code uses an environment variable called ANTHROPIC_BASE_URL to know where to send API requests. Normally it points to https://api.anthropic.com. You point it at Bifrost instead. Bifrost accepts requests in Anthropic's Messages API format, translates them to whichever provider you've configured, and translates the response back. Claude Code never knows the difference.

No code changes. No patching. One environment variable.
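For the curious, the payload Claude Code sends is standard Anthropic Messages API JSON. A representative sketch of what Bifrost receives on the /anthropic endpoint (illustrative, not a capture of real traffic):

```json
{
  "model": "openai/gpt-4o",
  "max_tokens": 1024,
  "messages": [
    { "role": "user", "content": "Explain this function." }
  ]
}
```

Bifrost reads the provider prefix from the model field, translates the request into that provider's native format, and maps the response back into the Messages schema before returning it.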


What We'll Cover

  • Setting up and configuring Bifrost with multiple LLM providers
  • Integrating Claude Code with the gateway
  • Running Claude Code with any model
  • Configuring routing rules, fallbacks, and budgets
  • Integrating MCP tools
  • Using built-in observability and monitoring

Part 1: Setting Up Bifrost

Step 1: Install Bifrost

Create a project folder, open it in your editor, and run:

npx -y @maximhq/bifrost -app-dir ./my-bifrost-data

The -app-dir flag tells Bifrost where to store all its data. Bifrost will start listening on port 8080.

If you prefer Docker:

docker pull maximhq/bifrost
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost

The -v flag mounts a volume so your configuration persists across container restarts.
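If you'd rather manage it with Docker Compose, an equivalent service definition might look like this (a sketch assuming the same image, port, and mount as the docker run command above):

```yaml
services:
  bifrost:
    image: maximhq/bifrost
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
```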

Step 2: Create Your Config File

Inside your ./my-bifrost-data folder, create a config.json file. This defines which providers Bifrost can route to, enables request logging, and sets up database persistence:

{
  "$schema": "https://www.getbifrost.ai/schema",
  "client": {
    "enable_logging": true,
    "disable_content_logging": false,
    "drop_excess_requests": false,
    "initial_pool_size": 300,
    "allow_direct_keys": false
  },
  "providers": {
    "openai": {
      "keys": [
        {
          "name": "openai-primary",
          "value": "env.OPENAI_API_KEY",
          "models": [],
          "weight": 1.0
        }
      ]
    },
    "anthropic": {
      "keys": [
        {
          "name": "anthropic-primary",
          "value": "env.ANTHROPIC_API_KEY",
          "models": [],
          "weight": 1.0
        }
      ]
    },
    "gemini": {
      "keys": [
        {
          "name": "gemini-primary",
          "value": "env.GEMINI_API_KEY",
          "models": [],
          "weight": 1.0
        }
      ]
    }
  },
  "config_store": {
    "enabled": true,
    "type": "sqlite",
    "config": {
      "path": "./config.db"
    }
  },
  "logs_store": {
    "enabled": true,
    "type": "sqlite",
    "config": {
      "path": "./logs.db"
    }
  }
}

The "value": "env.OPENAI_API_KEY" syntax tells Bifrost to read actual keys from environment variables rather than storing them in the file. Your secrets stay out of version control.

Step 3: Set Your API Keys

export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"
export GEMINI_API_KEY="your-gemini-api-key"

Step 4: Start the Gateway

Stop any previously running Bifrost instance, then start it again with the app directory flag:

npx -y @maximhq/bifrost -app-dir ./my-bifrost-data

Open http://localhost:8080 in your browser. You'll see the Bifrost dashboard where all configuration and monitoring lives.


Part 2: Connecting Claude Code to Bifrost

Step 1: Install Claude Code

npm install -g @anthropic-ai/claude-code

Step 2: Point Claude Code at Bifrost

Set these two environment variables in the same terminal session where you'll run Claude Code:

export ANTHROPIC_BASE_URL="http://localhost:8080/anthropic"
export ANTHROPIC_API_KEY="dummy-key"

The dummy-key part is a bit counterintuitive at first. Claude Code requires this variable to be set before it will run, but Bifrost handles actual authentication to providers using the keys you configured earlier. You can put any non-empty string here.

Step 3: Run Claude Code with Any Model

Start Claude Code and specify whichever model you want to use:

claude --model openai/gpt-4o

To route to other providers, use the provider prefix pattern:

openai/gpt-4o
openai/gpt-4o-mini
gemini/gemini-2.5-pro
groq/llama-3.1-70b-versatile
mistral/mistral-large-latest
anthropic/claude-sonnet-4-20250514
ollama/llama3

Run a quick sanity check by asking something simple like "Hello there" to confirm requests are flowing through correctly.
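The naming convention is easy to reason about: everything before the first slash is the provider, and the rest is passed through as the model ID. A tiny sketch of how such a prefixed name decomposes (illustrative only, not Bifrost's internal parser):

```python
def split_model(name: str) -> tuple[str, str]:
    """Split a 'provider/model' identifier into (provider, model)."""
    provider, sep, model = name.partition("/")
    if not sep:
        raise ValueError(f"expected 'provider/model', got {name!r}")
    return provider, model

# The prefix selects the provider; the remainder is the provider's model ID.
print(split_model("openai/gpt-4o"))          # ('openai', 'gpt-4o')
print(split_model("gemini/gemini-2.5-pro"))  # ('gemini', 'gemini-2.5-pro')
```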


Part 3: Routing Rules, Fallbacks, and Budgets

Once Claude Code is connected, you can start using Bifrost's routing features to get more control over how requests are handled.

Weighted Routing Across Providers

Virtual Keys in Bifrost let you define routing logic that applies automatically. Navigate to Governance > Virtual Keys, create a key, and configure your routing weights:

{
  "name": "dev-routing",
  "budget": {
    "max_budget": 100,
    "budget_duration": "monthly"
  },
  "providers": [
    { "provider": "openai", "model": "gpt-4o", "weight": 0.7 },
    { "provider": "anthropic", "model": "claude-sonnet-4-20250514", "weight": 0.3 }
  ]
}

This routes 70% of requests to GPT-4o and 30% to Claude Sonnet, with a hard monthly cap of $100. Once the budget is exhausted, Bifrost stops routing automatically. For teams, this replaces a lot of manual cost monitoring.
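Conceptually, weighted routing is a weighted random choice over the configured providers. A minimal Python sketch of the behavior (this illustrates the idea, not Bifrost's actual implementation):

```python
import random

# Hypothetical routing table mirroring the virtual key above.
ROUTES = [
    ("openai/gpt-4o", 0.7),
    ("anthropic/claude-sonnet-4-20250514", 0.3),
]

def pick_route(routes):
    """Select a provider/model pair according to its configured weight."""
    models = [m for m, _ in routes]
    weights = [w for _, w in routes]
    return random.choices(models, weights=weights, k=1)[0]

# Over many requests, roughly 70% land on GPT-4o and 30% on Claude Sonnet.
counts = {m: 0 for m, _ in ROUTES}
for _ in range(10_000):
    counts[pick_route(ROUTES)] += 1
```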

Automatic Fallbacks

When a provider goes down or you hit a rate limit, Bifrost works down a fallback list until a request succeeds:

{
  "model": "openai/gpt-4o",
  "fallbacks": [
    { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
    { "provider": "gemini", "model": "gemini-2.5-pro" }
  ]
}

Your coding session continues without any manual intervention when a provider has issues.
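The fallback behavior amounts to trying each entry in order until one succeeds. A simplified sketch of the pattern (illustrative only; Bifrost layers retries, timeouts, and health tracking on top of this):

```python
# Hypothetical stand-ins for real provider calls.
def call_provider(name: str, prompt: str) -> str:
    if name == "openai/gpt-4o":
        raise RuntimeError("rate limited")  # simulate a 429 from the primary
    return f"{name} answered: {prompt!r}"

def complete_with_fallbacks(prompt: str, chain: list[str]) -> str:
    """Try each provider in order; return the first successful response."""
    errors = []
    for name in chain:
        try:
            return call_provider(name, prompt)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

CHAIN = ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514", "gemini/gemini-2.5-pro"]
# The primary fails, so the first fallback serves the request.
result = complete_with_fallbacks("hello", CHAIN)
```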


Part 4: MCP Tool Integration

If you're using Model Context Protocol servers for filesystem access, web search, database queries, or custom integrations, Bifrost supports those too. Configure them once in Bifrost, and they become available to any model routing through it.

Step 1: Add MCP Configuration to Bifrost

Update your config.json to include MCP server definitions. Here's an example with filesystem access:

{
  "$schema": "https://www.getbifrost.ai/schema",
  "client": {
    "enable_logging": true,
    "disable_content_logging": true,
    "drop_excess_requests": false,
    "initial_pool_size": 300,
    "allow_direct_keys": false
  },

  "mcp": {
    "client_configs": [
      {
        "name": "filesystem",
        "connection_type": "stdio",
        "stdio_config": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
        },
        "tools_to_execute": ["*"],
        "tools_to_auto_execute": [
          "read_file",
          "list_directory",
          "create_file",
          "delete_file"
        ]
      }
    ],
    "tool_manager_config": {
      "max_agent_depth": 10,
      "tool_execution_timeout": 300000000000,
      "code_mode_binding_level": "server"
    }
  }
}

Restart Bifrost and navigate to the MCP catalog page in the web UI to confirm the filesystem server shows as connected.

Step 2: Add Bifrost as an MCP Server in Claude Code

claude mcp add --transport http bifrost http://localhost:8080/mcp

Step 3: Verify with a Real Task

Restart Claude Code and try a task that exercises the MCP tools. For example:

Create a simple calculator program in Python.

It should support addition, subtraction, multiplication, and division.
The user should input two numbers and an operation, and the program should print the result.

Then follow up with:

Analyze this repository and create a README.md explaining how the project works.
Include the project architecture and instructions for running it locally.

If the MCP integration is working, Claude Code will read your files, create new ones, and interact with your filesystem through Bifrost's tool injection.


Part 5: Observability and Monitoring

This is the part that surprised me most when I first set it up.

Every request that passes through Bifrost is logged with full detail: the input prompt, the response, which model handled it, latency, and cost. The web interface at http://localhost:8080/logs provides:

  • Real-time streaming of requests and responses
  • Token usage tracking per request
  • Latency measurements
  • Filtering by provider, model, or conversation content
  • Full request and response inspection

For individual developers, it's useful for understanding your actual usage patterns. For teams, it becomes a proper audit trail. You can see which models are being used most, where the expensive requests are coming from, and whether your routing rules are actually behaving as expected.

Bifrost also exposes Prometheus metrics for teams that want to integrate this data into existing monitoring pipelines.
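If you already run Prometheus, pointing it at the gateway is a one-stanza change. A sketch of the scrape config (this assumes the metrics are served on the conventional /metrics path of the same port; adjust to your deployment):

```yaml
scrape_configs:
  - job_name: "bifrost"
    static_configs:
      - targets: ["localhost:8080"]
    # metrics_path defaults to /metrics; override it here if your setup differs
```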


Is This Worth Setting Up?

If you're a solo developer who uses Claude Code occasionally and doesn't have any compliance or cost concerns, the default setup is probably fine.

But if any of the following are true, a gateway is worth the time:

  • You want to test how different models perform on your specific workload
  • You're managing API costs across a team
  • Your organization has requirements around data routing or infrastructure control
  • You want actual visibility into your AI usage rather than end-of-month billing surprises
  • You use MCP tools and want them available across multiple model providers without reconfiguring each time

Bifrost being open source and self-hosted means your prompts and responses stay on your own infrastructure. For teams working on proprietary codebases, that's a meaningful difference from routing everything directly to a third-party API.

