DEV Community

Pranay Batta

Best Claude Code Gateway for Multi-Model Routing

Claude Code is great until you need more than one model. You hit a rate limit on Anthropic, want Gemini for long context, or need GPT-4o for a specific task. The default setup gives you no way to route across providers.

I spent a week testing gateways that sit between Claude Code and LLM providers. The goal was simple: configure multiple models, set routing weights, get automatic failover, and keep Claude Code working normally.

Bifrost was the clear winner. Open-source, written in Go, 11 microsecond overhead per request. Here is how I set up multi-model routing and what I learned.

Why Multi-Model Routing Matters

Different models are good at different things. Claude Sonnet handles tool use well. GPT-4o is strong at certain code generation tasks. Gemini 2.5 Pro handles massive context windows. Using one model for everything means you are leaving performance on the table.

Multi-model routing lets you:

  • Split traffic across providers by weight
  • Fail over automatically when a provider goes down
  • Pin specific models for specific tasks
  • Control costs by routing cheaper models for simpler operations

The problem: Claude Code talks to api.anthropic.com by default. No native multi-model support. You need a gateway.

The Setup: Bifrost as a Claude Code Gateway

Bifrost exposes an Anthropic-compatible endpoint. Claude Code does not know a gateway exists. It sends standard requests, and Bifrost translates and routes them to whatever provider you configure.
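To make the translation step concrete, here is a minimal sketch of the kind of request mapping a gateway performs. This is not Bifrost's actual code; it only covers a system prompt plus plain text turns, while a real gateway also translates tool calls, streaming, and structured content blocks.

```python
def anthropic_to_openai(payload):
    """Translate a minimal Anthropic Messages request into an
    OpenAI-style chat completion request. Illustrative only: a real
    gateway also handles tools, streaming, and content blocks."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first message.
    if "system" in payload:
        messages.append({"role": "system", "content": payload["system"]})
    messages.extend(payload["messages"])
    return {
        "model": payload["model"],  # a gateway would remap this per its routing config
        "max_tokens": payload["max_tokens"],
        "messages": messages,
    }
```

Claude Code keeps emitting Anthropic-format requests; the gateway owns this mapping for every non-Anthropic provider it routes to.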

Full Claude Code integration docs here.

Install and Connect

npx -y @maximhq/bifrost

That starts the gateway locally. The setup guide has the details.

Point Claude Code at Bifrost:

export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic
export ANTHROPIC_API_KEY=your-bifrost-virtual-key

The ANTHROPIC_API_KEY here is a Bifrost virtual key, not your actual Anthropic key. Provider keys live in the Bifrost config. This is a drop-in replacement for the Anthropic API.

Done. Every Claude Code request now flows through Bifrost.

Weighted Routing Configuration

This is the core of multi-model routing. You assign weights to providers, and Bifrost distributes traffic accordingly. Weights are automatically normalized to sum to 1.0, so you can use any positive numbers.

Here is a config that splits traffic between GPT-4o and Claude Sonnet:

accounts:
  - id: "dev-team"
    providers:
      - id: "openai-primary"
        type: "openai"
        api_key: "${OPENAI_API_KEY}"
        model: "gpt-4o"
        weight: 70
      - id: "anthropic-secondary"
        type: "anthropic"
        api_key: "${ANTHROPIC_API_KEY}"
        model: "claude-sonnet-4-20250514"
        weight: 30

70% of requests go to GPT-4o. 30% to Claude Sonnet. I used this to compare output quality across providers in real coding sessions without manually switching anything.
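The normalize-then-sample behavior can be sketched in a few lines. This is a toy model of weighted selection, not Bifrost's implementation; the provider IDs match the config above.

```python
import random

def normalize(providers):
    """Scale raw weights so they sum to 1.0 (any positive numbers work)."""
    total = sum(p["weight"] for p in providers)
    return [{**p, "weight": p["weight"] / total} for p in providers]

def pick(providers, rng=random):
    """Choose one provider according to its normalized weight."""
    r = rng.random()
    cumulative = 0.0
    for p in providers:
        cumulative += p["weight"]
        if r < cumulative:
            return p["id"]
    return providers[-1]["id"]  # guard against floating-point edge cases

providers = normalize([
    {"id": "openai-primary", "weight": 70},
    {"id": "anthropic-secondary", "weight": 30},
])
```

Over many requests, roughly 70% of `pick` calls land on `openai-primary`, which is all the weighting guarantees; any individual request can go to either provider.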

The routing docs cover all the configuration options.

Important detail: cross-provider routing does not happen automatically. You must explicitly configure each provider in your config. Bifrost does not guess or infer routing rules.

Automatic Failover

Weighted routing is useful. Automatic failover is essential. Providers go down. Rate limits hit. You do not want your Claude Code session to break mid-task.

Bifrost sorts providers by weight and retries on failure. If the primary provider fails, the next one picks up the request.

accounts:
  - id: "dev-team"
    providers:
      - id: "openai-primary"
        type: "openai"
        api_key: "${OPENAI_API_KEY}"
        model: "gpt-4o"
        weight: 80
      - id: "gemini-fallback"
        type: "gemini"
        api_key: "${GEMINI_API_KEY}"
        model: "gemini-2.5-pro"
        weight: 15
      - id: "anthropic-fallback"
        type: "anthropic"
        api_key: "${ANTHROPIC_API_KEY}"
        model: "claude-sonnet-4-20250514"
        weight: 5

If OpenAI goes down, Bifrost retries with Gemini. If Gemini also fails, it falls back to Anthropic. My coding session never gets interrupted.
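The retry behavior amounts to walking the provider list in descending weight order. A minimal sketch, assuming a `send` callable that raises on provider failure (the error handling in the real gateway is far more granular):

```python
def route_with_failover(providers, send):
    """Try providers in descending weight order; return the first success."""
    ordered = sorted(providers, key=lambda p: p["weight"], reverse=True)
    last_error = None
    for p in ordered:
        try:
            return send(p["id"])
        except ConnectionError as exc:
            last_error = exc  # provider down or rate-limited; try the next one
    raise RuntimeError("all providers failed") from last_error

providers = [
    {"id": "openai-primary", "weight": 80},
    {"id": "gemini-fallback", "weight": 15},
    {"id": "anthropic-fallback", "weight": 5},
]
```

The key property is that a single provider outage surfaces as a slightly slower request, not an error in the Claude Code session.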

Model Pinning for Bedrock and Vertex AI

If your team uses AWS Bedrock or Google Vertex AI, you can pin specific models directly:

# Bedrock
export ANTHROPIC_DEFAULT_SONNET_MODEL="bedrock/global.anthropic.claude-sonnet-4-6"

# Vertex AI
export ANTHROPIC_DEFAULT_SONNET_MODEL="vertex/claude-sonnet-4-6"

You can also override the model mid-session using the --model flag or the /model command inside Claude Code. Useful when you want to switch between models for different parts of a task. Start with Sonnet for scaffolding, switch to GPT-4o for a tricky implementation, then back again. The gateway handles the translation layer for each provider.

This is one area where the Anthropic SDK compatibility matters. Bifrost maintains full compatibility with the Anthropic message format, so model pinning and switching work without any client-side changes.

Provider configuration docs list all supported providers and model formats.

Budget Controls Across Providers

Once all traffic flows through one gateway, cost management becomes straightforward. Bifrost has a four-tier budget hierarchy: Customer, Team, Virtual Key, Provider Config.

budgets:
  - level: "virtual_key"
    id: "claude-code-dev"
    limit: 200
    period: "monthly"

Set a limit. When it is reached, requests get blocked. No surprise bills from a runaway Claude Code session.
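The enforcement logic is conceptually simple: track spend per key, block once the cap would be exceeded. A toy sketch of that check (not Bifrost's code, and real enforcement also handles the period rollover):

```python
from dataclasses import dataclass

@dataclass
class Budget:
    limit: float        # cap for the period, e.g. 200 dollars monthly
    spent: float = 0.0

    def charge(self, cost: float) -> bool:
        """Record a request's cost; return False (block) once the cap is hit."""
        if self.spent + cost > self.limit:
            return False
        self.spent += cost
        return True
```

Because every provider's traffic passes through the same gateway, one counter covers OpenAI, Anthropic, and Gemini spend together, which is the point of doing budgets at the gateway rather than per provider.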

The full governance layer handles rate limiting, access control, and spend management across all configured providers.

Observability Across All Providers

Every request through Bifrost gets logged: latency, token count, cost, provider used, response status. The observability layer gives you a single view across all providers.

This is particularly useful with multi-model routing. You can see exactly which provider handled each request, compare response times across models, and track per-provider costs. When I was running 70/30 weighted routing between GPT-4o and Claude Sonnet, the observability data showed me exactly how each model performed on real coding tasks. Response times, token consumption, and cost per request, all in one place.

Without centralized logging, you are checking multiple provider dashboards and guessing which model handled what. That is not sustainable when you are running multiple providers through Claude Code daily.
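As a sketch of what that single view buys you: once every request log carries provider, latency, and cost fields (the field names below are illustrative, not Bifrost's export schema), per-provider comparison is a small aggregation.

```python
from collections import defaultdict

# Illustrative request records; real gateway logs also carry token
# counts, response status, and the virtual key that made the call.
logs = [
    {"provider": "openai", "latency_ms": 820, "cost": 0.021},
    {"provider": "anthropic", "latency_ms": 1100, "cost": 0.027},
    {"provider": "openai", "latency_ms": 760, "cost": 0.013},
]

def per_provider_stats(logs):
    """Aggregate request count, average latency, and total cost per provider."""
    agg = defaultdict(lambda: {"requests": 0, "latency_ms": 0, "cost": 0.0})
    for rec in logs:
        a = agg[rec["provider"]]
        a["requests"] += 1
        a["latency_ms"] += rec["latency_ms"]
        a["cost"] += rec["cost"]
    return {
        p: {
            "requests": a["requests"],
            "avg_latency_ms": a["latency_ms"] / a["requests"],
            "total_cost": round(a["cost"], 4),
        }
        for p, a in agg.items()
    }
```

This is the comparison I was doing by eye during the 70/30 experiment; with centralized logs it is one query instead of two dashboards.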

Honest Trade-offs

No tool is perfect. Here is what I found:

OpenRouter streaming limitation. OpenRouter does not stream function call arguments properly. This causes file operation failures in Claude Code. If you use OpenRouter as a provider, expect issues with tool use.

Non-Anthropic model requirements. Any non-Anthropic model you route through must support tool use. Claude Code relies heavily on function calling. Models without proper tool support will fail on file operations, search, and other agent tasks.

Self-hosted only. The open-source version requires you to run and maintain the gateway. There is no managed cloud offering. That means monitoring, updating, and debugging are on you.

Newer project. Bifrost's community is growing but still smaller than older alternatives. Documentation is solid, but edge cases may require digging through issues on GitHub.

Extra hop. You are adding a process between Claude Code and your provider. The 11 microsecond overhead is negligible, but it is one more thing in the chain to keep running.

Performance

I ran benchmarks matching the benchmarking guide. The numbers held up: 11 microseconds of routing overhead, 5,000 requests per second on a single instance. The Go implementation makes a real difference. Python-based gateways I tested added significantly more latency.

For a gateway that sits in the critical path of every LLM call, low overhead matters.

Quick Start Summary

# 1. Start Bifrost
npx -y @maximhq/bifrost

# 2. Configure providers in bifrost.yaml (weighted routing + failover)

# 3. Point Claude Code at Bifrost
export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic
export ANTHROPIC_API_KEY=your-bifrost-virtual-key

# 4. Use Claude Code normally

That is it. Your Claude Code session now routes across multiple models with automatic failover and budget controls.

GitHub | Docs | Website

If you are running Claude Code for real work, multi-model routing is not optional. Single-provider setups break at the worst times. A gateway that handles routing, failover, and cost controls in one place saves hours of debugging and thousands in unexpected spend.

Open an issue on the repo if you run into anything.
