Varshith V Hegde

Using Claude Code with Any LLM: Why a Gateway Changes Everything

I've been using Claude Code for a while now, and if you're a developer who has added it to your daily workflow, you probably know the feeling. It's genuinely good. It reads your codebase, runs commands, modifies files, and helps implement features right from your terminal without you having to context-switch constantly.

But at some point, most developers hit the same wall I did: what if I want to use a different model?

What if GPT-4o handles your specific codebase better? What if Gemini's larger context window is exactly what you need for that massive legacy project? What if you're spending more on API calls than you should be, and you know some of those simpler tasks could run on a cheaper model just fine?

Out of the box, Claude Code only talks to Anthropic. That's just how it works. And while Anthropic's models are genuinely strong, being locked into a single provider means you're trading flexibility for convenience. This guide is about getting both.


The Real Friction Points

Before jumping into the solution, it helps to be specific about what problems we're actually solving.

Model flexibility. Different models have different strengths. Claude Sonnet is excellent for most coding tasks, but you can't know it's the best tool for every job unless you can test alternatives, and without a gateway that means switching tools entirely.

Cost management. Claude Code burns through tokens quickly during an active session. Complex architectural work and boilerplate generation are not the same job, and pricing them identically doesn't make much sense. Routing simpler requests to a more affordable model can cut costs significantly without affecting output quality where it matters.

Compliance and data routing. If you work in fintech, healthcare, or any regulated industry, you've likely dealt with requirements around where your data goes. Routing all API traffic through your own infrastructure before it reaches any external provider is often non-negotiable.

Observability. This one gets overlooked a lot. How many tokens does a typical Claude Code session consume? What's your actual cost per feature shipped? Without request logging, you're genuinely guessing.


Why Bifrost

Bifrost is an open-source LLM gateway built by Maxim AI to route, manage, and optimize requests between your application and multiple model providers. It's Apache 2.0 licensed, self-hostable, and supports 20+ providers including OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure, Mistral, Cohere, Groq, and more.

A few things that make it stand out technically:

Performance that doesn't get in the way. At 5,000 requests per second, Bifrost adds less than 15 microseconds of internal overhead per request. At production scale, that's essentially nothing.

Zero-config startup. A single npx command launches the gateway, and everything else is configurable through a web UI.

Built-in fallbacks and load balancing. If a provider fails or rate-limits you, Bifrost automatically routes to a backup. Traffic can also be distributed across multiple keys or providers using weighted rules.

Semantic caching. Repeated or semantically similar queries can be served from cache, which reduces both latency and cost for workflows with a lot of repetitive prompting.

Full observability out of the box. Prometheus metrics, request tracing, token usage, latency, and a built-in web dashboard are all included.

The architecture is straightforward:

Claude Code  -->  Bifrost (localhost:8080)  -->  Any LLM Provider

Claude Code uses an environment variable called ANTHROPIC_BASE_URL to know where to send API requests. Normally it points to https://api.anthropic.com. You point it at Bifrost instead. Bifrost accepts requests in Anthropic's Messages API format, translates them to whichever provider you've configured, and translates the response back. Claude Code never knows the difference.

No code changes. No patching. One environment variable.
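For the curious, the payload Claude Code sends is standard Anthropic Messages API JSON. A representative sketch of what Bifrost receives on the /anthropic endpoint (illustrative, not a capture of real traffic):

```json
{
  "model": "openai/gpt-4o",
  "max_tokens": 1024,
  "messages": [
    { "role": "user", "content": "Explain this function." }
  ]
}
```

Bifrost reads the provider prefix from the model field, translates the request into that provider's native format, and maps the response back into the Messages schema before returning it.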


What We'll Cover

  • Setting up and configuring Bifrost with multiple LLM providers
  • Integrating Claude Code with the gateway
  • Running Claude Code with any model
  • Configuring routing rules, fallbacks, and budgets
  • Integrating MCP tools
  • Using built-in observability and monitoring

Part 1: Setting Up Bifrost

Step 1: Install Bifrost

Create a project folder, open it in your editor, and run:

npx -y @maximhq/bifrost -app-dir ./my-bifrost-data

The -app-dir flag tells Bifrost where to store all its data. Bifrost will start listening on port 8080.

If you prefer Docker:

docker pull maximhq/bifrost
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost

The -v flag mounts a volume so your configuration persists across container restarts.
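If you'd rather manage it with Docker Compose, an equivalent service definition might look like this (a sketch assuming the same image, port, and mount as the docker run command above):

```yaml
services:
  bifrost:
    image: maximhq/bifrost
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
```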

Step 2: Create Your Config File

Inside your ./my-bifrost-data folder, create a config.json file. This defines which providers Bifrost can route to, enables request logging, and sets up database persistence:

{
  "$schema": "https://www.getbifrost.ai/schema",
  "client": {
    "enable_logging": true,
    "disable_content_logging": false,
    "drop_excess_requests": false,
    "initial_pool_size": 300,
    "allow_direct_keys": false
  },
  "providers": {
    "openai": {
      "keys": [
        {
          "name": "openai-primary",
          "value": "env.OPENAI_API_KEY",
          "models": [],
          "weight": 1.0
        }
      ]
    },
    "anthropic": {
      "keys": [
        {
          "name": "anthropic-primary",
          "value": "env.ANTHROPIC_API_KEY",
          "models": [],
          "weight": 1.0
        }
      ]
    },
    "gemini": {
      "keys": [
        {
          "name": "gemini-primary",
          "value": "env.GEMINI_API_KEY",
          "models": [],
          "weight": 1.0
        }
      ]
    }
  },
  "config_store": {
    "enabled": true,
    "type": "sqlite",
    "config": {
      "path": "./config.db"
    }
  },
  "logs_store": {
    "enabled": true,
    "type": "sqlite",
    "config": {
      "path": "./logs.db"
    }
  }
}

The "value": "env.OPENAI_API_KEY" syntax tells Bifrost to read actual keys from environment variables rather than storing them in the file. Your secrets stay out of version control.

Step 3: Set Your API Keys

export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"
export GEMINI_API_KEY="your-gemini-api-key"

Step 4: Start the Gateway

Stop any previously running Bifrost instance, then start it again with the app directory flag:

npx -y @maximhq/bifrost -app-dir ./my-bifrost-data

Open http://localhost:8080 in your browser. You'll see the Bifrost dashboard where all configuration and monitoring lives.


Part 2: Connecting Claude Code to Bifrost

Step 1: Install Claude Code

npm install -g @anthropic-ai/claude-code

Step 2: Point Claude Code at Bifrost

Set these two environment variables in the same terminal session where you'll run Claude Code:

export ANTHROPIC_BASE_URL="http://localhost:8080/anthropic"
export ANTHROPIC_API_KEY="dummy-key"

The dummy-key part is a bit counterintuitive at first. Claude Code requires this variable to be set before it will run, but Bifrost handles actual authentication to providers using the keys you configured earlier. You can put any non-empty string here.

Step 3: Run Claude Code with Any Model

Start Claude Code and specify whichever model you want to use:

claude --model openai/gpt-4o

To route to other providers, use the provider prefix pattern:

openai/gpt-4o
openai/gpt-4o-mini
gemini/gemini-2.5-pro
groq/llama-3.1-70b-versatile
mistral/mistral-large-latest
anthropic/claude-sonnet-4-20250514
ollama/llama3

Run a quick sanity check by asking something simple like "Hello there" to confirm requests are flowing through correctly.
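The naming convention is easy to reason about: everything before the first slash is the provider, and the rest is passed through as the model ID. A tiny sketch of how such a prefixed name decomposes (illustrative only, not Bifrost's internal parser):

```python
def split_model(name: str) -> tuple[str, str]:
    """Split a 'provider/model' identifier into (provider, model)."""
    provider, sep, model = name.partition("/")
    if not sep:
        raise ValueError(f"expected 'provider/model', got {name!r}")
    return provider, model

# The prefix selects the provider; the remainder is the provider's model ID.
print(split_model("openai/gpt-4o"))          # ('openai', 'gpt-4o')
print(split_model("gemini/gemini-2.5-pro"))  # ('gemini', 'gemini-2.5-pro')
```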


Part 3: Routing Rules, Fallbacks, and Budgets

Once Claude Code is connected, you can start using Bifrost's routing features to get more control over how requests are handled.

Weighted Routing Across Providers

Virtual Keys in Bifrost let you define routing logic that applies automatically. Navigate to Governance > Virtual Keys, create a key, and configure your routing weights:

{
  "name": "dev-routing",
  "budget": {
    "max_budget": 100,
    "budget_duration": "monthly"
  },
  "providers": [
    { "provider": "openai", "model": "gpt-4o", "weight": 0.7 },
    { "provider": "anthropic", "model": "claude-sonnet-4-20250514", "weight": 0.3 }
  ]
}

This routes 70% of requests to GPT-4o and 30% to Claude Sonnet, with a hard monthly cap of $100. Once the budget is exhausted, Bifrost stops routing automatically. For teams, this replaces a lot of manual cost monitoring.
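Conceptually, weighted routing is a weighted random choice over the configured providers. A minimal Python sketch of the behavior (this illustrates the idea, not Bifrost's actual implementation):

```python
import random

# Hypothetical routing table mirroring the virtual key above.
ROUTES = [
    ("openai/gpt-4o", 0.7),
    ("anthropic/claude-sonnet-4-20250514", 0.3),
]

def pick_route(routes):
    """Select a provider/model pair according to its configured weight."""
    models = [m for m, _ in routes]
    weights = [w for _, w in routes]
    return random.choices(models, weights=weights, k=1)[0]

# Over many requests, roughly 70% land on GPT-4o and 30% on Claude Sonnet.
counts = {m: 0 for m, _ in ROUTES}
for _ in range(10_000):
    counts[pick_route(ROUTES)] += 1
```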

Automatic Fallbacks

When a provider goes down or you hit a rate limit, Bifrost works down a fallback list until a request succeeds:

{
  "model": "openai/gpt-4o",
  "fallbacks": [
    { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
    { "provider": "gemini", "model": "gemini-2.5-pro" }
  ]
}

Your coding session continues without any manual intervention when a provider has issues.
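The fallback behavior amounts to trying each entry in order until one succeeds. A simplified sketch of the pattern (illustrative only; Bifrost layers retries, timeouts, and health tracking on top of this):

```python
# Hypothetical stand-ins for real provider calls.
def call_provider(name: str, prompt: str) -> str:
    if name == "openai/gpt-4o":
        raise RuntimeError("rate limited")  # simulate a 429 from the primary
    return f"{name} answered: {prompt!r}"

def complete_with_fallbacks(prompt: str, chain: list[str]) -> str:
    """Try each provider in order; return the first successful response."""
    errors = []
    for name in chain:
        try:
            return call_provider(name, prompt)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

CHAIN = ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514", "gemini/gemini-2.5-pro"]
# The primary fails, so the first fallback serves the request.
result = complete_with_fallbacks("hello", CHAIN)
```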


Part 4: MCP Tool Integration

If you're using Model Context Protocol servers for filesystem access, web search, database queries, or custom integrations, Bifrost supports those too. Configure them once in Bifrost, and they become available to any model routing through it.

Step 1: Add MCP Configuration to Bifrost

Update your config.json to include MCP server definitions. Here's an example with filesystem access:

{
  "$schema": "https://www.getbifrost.ai/schema",
  "client": {
    "enable_logging": true,
    "disable_content_logging": true,
    "drop_excess_requests": false,
    "initial_pool_size": 300,
    "allow_direct_keys": false
  },

  "mcp": {
    "client_configs": [
      {
        "name": "filesystem",
        "connection_type": "stdio",
        "stdio_config": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
        },
        "tools_to_execute": ["*"],
        "tools_to_auto_execute": [
          "read_file",
          "list_directory",
          "create_file",
          "delete_file"
        ]
      }
    ],
    "tool_manager_config": {
      "max_agent_depth": 10,
      "tool_execution_timeout": 300000000000,
      "code_mode_binding_level": "server"
    }
  }
}

Restart Bifrost and navigate to the MCP catalog page in the web UI to confirm the filesystem server shows as connected.

Step 2: Add Bifrost as an MCP Server in Claude Code

claude mcp add --transport http bifrost http://localhost:8080/mcp

Step 3: Verify with a Real Task

Restart Claude Code and try a task that exercises the MCP tools. For example:

Create a simple calculator program in Python.

It should support addition, subtraction, multiplication, and division.
The user should input two numbers and an operation, and the program should print the result.

Then follow up with:

Analyze this repository and create a README.md explaining how the project works.
Include the project architecture and instructions for running it locally.

If the MCP integration is working, Claude Code will read your files, create new ones, and interact with your filesystem through Bifrost's tool injection.


Part 5: Observability and Monitoring

This is the part that surprised me most when I first set it up.

Every request that passes through Bifrost is logged with full detail: the input prompt, the response, which model handled it, latency, and cost. The web interface at http://localhost:8080/logs provides:

  • Real-time streaming of requests and responses
  • Token usage tracking per request
  • Latency measurements
  • Filtering by provider, model, or conversation content
  • Full request and response inspection

For individual developers, it's useful for understanding your actual usage patterns. For teams, it becomes a proper audit trail. You can see which models are being used most, where the expensive requests are coming from, and whether your routing rules are actually behaving as expected.

Bifrost also exposes Prometheus metrics for teams that want to integrate this data into existing monitoring pipelines.
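If you already run Prometheus, pointing it at the gateway is a one-stanza change. A sketch of the scrape config (this assumes the metrics are served on the conventional /metrics path of the same port; adjust to your deployment):

```yaml
scrape_configs:
  - job_name: "bifrost"
    static_configs:
      - targets: ["localhost:8080"]
    # metrics_path defaults to /metrics; override it here if your setup differs
```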


Is This Worth Setting Up?

If you're a solo developer who uses Claude Code occasionally and doesn't have any compliance or cost concerns, the default setup is probably fine.

But if any of the following are true, a gateway is worth the time:

  • You want to test how different models perform on your specific workload
  • You're managing API costs across a team
  • Your organization has requirements around data routing or infrastructure control
  • You want actual visibility into your AI usage rather than end-of-month billing surprises
  • You use MCP tools and want them available across multiple model providers without reconfiguring each time

Bifrost being open source and self-hosted means your prompts and responses stay on your own infrastructure. For teams working on proprietary codebases, that's a meaningful difference from routing everything directly to a third-party API.

