Debby McKinney

Access GPT, Gemini, Claude, Mistral etc. through 1 AI Gateway: Configure Providers in Bifrost

When building AI-powered applications, chances are you don’t want to rely on a single provider:

  • Claude for reasoning-heavy tasks
  • GPT-4o for multimodal inputs
  • Gemini for Google ecosystem integrations
  • Mistral for fast, cost-effective completions

Each provider has different SDKs, auth methods, rate limits, and response formats. Maintaining them quickly becomes messy.

Bifrost solves this by acting as a unified AI gateway: one API surface, multiple providers behind it. In this post, we’ll walk through configuring providers in Bifrost so you can switch (or mix) GPT, Claude, Gemini, and Mistral with almost no extra code.

maximhq / bifrost on GitHub: https://github.com/maximhq/bifrost

Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

The fastest way to build AI applications that never go down

Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.

Quick Start

Go from zero to production-ready AI gateway in under a minute.

Step 1: Start Bifrost Gateway

# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost

Step 2: Configure via Web UI

# Open the built-in web interface
open http://localhost:8080

Step 3: Make your first API call

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'

That's it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…


Why Use a Gateway?

Imagine you’re building an AI support assistant:

  • For routine queries, you want Mistral (cheap + fast).
  • For escalations, you switch to Claude Sonnet (better reasoning).
  • For multimodal inputs, you need GPT-4o.
  • And if you’re on GCP, Gemini integrates best.

Instead of coding against four different SDKs, Bifrost gives you a single /v1/chat/completions API that works across all of them.
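For example, routing a routine query to Mistral versus an escalation to Claude is just a change to the model string in the same request; everything else stays identical (the provider/model naming here follows the convention used in the examples later in this post):

# Same endpoint, only the "model" field changes
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral/mistral-medium", "messages": [{"role": "user", "content": "How do I reset my password?"}]}'

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "anthropic/claude-3-5-sonnet", "messages": [{"role": "user", "content": "Why does this error keep recurring across restarts?"}]}'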


Run Bifrost

Install and run Bifrost using Docker:

# Pull and run Bifrost HTTP API
docker pull maximhq/bifrost
docker run -p 8080:8080 maximhq/bifrost

By default, the dashboard runs at:

👉 http://localhost:8080

Configure Providers

You can add providers via the Web UI, API, or a config.json file. Below are API examples.
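If you prefer file-based setup, the same settings can live in config.json. Here is a minimal sketch; the shape mirrors the providers block shown in the Advanced Routing section below, so check the Bifrost docs for the full schema:

{
  "providers": {
    "openai": {
      "keys": [
        {
          "value": "env.OPENAI_API_KEY",
          "models": ["gpt-4o", "gpt-4o-mini"],
          "weight": 1.0
        }
      ]
    },
    "anthropic": {
      "keys": [
        {
          "value": "env.ANTHROPIC_API_KEY",
          "models": ["claude-3-5-sonnet"],
          "weight": 1.0
        }
      ]
    }
  }
}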

OpenAI (GPT)

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "openai",
  "keys": [
    {
      "value": "env.OPENAI_API_KEY",
      "models": ["gpt-4o", "gpt-4o-mini"],
      "weight": 1.0
    }
  ]
}'

Anthropic (Claude)

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "anthropic",
  "keys": [
    {
      "value": "env.ANTHROPIC_API_KEY",
      "models": ["claude-3-5-sonnet", "claude-3-opus"],
      "weight": 1.0
    }
  ]
}'

Google Vertex (Gemini)

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "vertex",
  "keys": [
    {
      "value": "env.VERTEX_API_KEY",
      "models": ["gemini-pro", "gemini-pro-vision"],
      "weight": 1.0,
      "vertex_key_config": {
        "project_id": "env.VERTEX_PROJECT_ID",
        "region": "us-central1",
        "auth_credentials": "env.VERTEX_CREDENTIALS"
      }
    }
  ]
}'

Mistral

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "mistral",
  "keys": [
    {
      "value": "env.MISTRAL_API_KEY",
      "models": ["mistral-tiny", "mistral-medium"],
      "weight": 1.0
    }
  ]
}'

Make a Request

Once configured, you can query any provider through the same endpoint:

curl --location 'http://localhost:8080/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [
    {"role": "user", "content": "Summarize this log file in 3 bullet points"}
  ]
}'

Bifrost handles the provider-specific API calls and returns a normalized response.
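For reference, a trimmed response might look something like this. The top-level shape follows the OpenAI-compatible chat completions schema, and the extra_fields block mirrors the raw-response example later in this post (values are illustrative):

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "• ...\n• ...\n• ..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 31,
    "total_tokens": 73
  },
  "extra_fields": {
    "provider": "anthropic"
  }
}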


Advanced Routing

Say you want to split load between two OpenAI keys (70/30):

{
  "providers": {
    "openai": {
      "keys": [
        {
          "value": "env.OPENAI_API_KEY_1",
          "weight": 0.7
        },
        {
          "value": "env.OPENAI_API_KEY_2",
          "weight": 0.3
        }
      ]
    }
  }
}

This is useful for rate limit management or cost control across accounts.
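If you'd rather do this through the providers API instead of config.json, the same split can be expressed with the request shape used in the earlier examples (a sketch; field names are assumed to match):

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "openai",
  "keys": [
    {
      "value": "env.OPENAI_API_KEY_1",
      "models": ["gpt-4o"],
      "weight": 0.7
    },
    {
      "value": "env.OPENAI_API_KEY_2",
      "models": ["gpt-4o"],
      "weight": 0.3
    }
  ]
}'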


Managing Retries Gracefully

Retries are tricky: too aggressive and you waste tokens and money; too conservative and users see errors. The example below sets up exponential backoff with up to 5 retries, starting at a 1 ms delay and capping at 10 seconds, which is well suited to transient network issues.

Example:

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "openai",
    "keys": [
        {
            "value": "env.OPENAI_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "network_config": {
        "max_retries": 5,
        "retry_backoff_initial_ms": 1,
        "retry_backoff_max_ms": 10000
    }
}'

Concurrency and Buffer Size

When you scale from dozens to thousands of requests, concurrency control saves you from provider bans.

This example gives OpenAI higher limits (100 concurrent workers, a 500-request queue) for high throughput, while Anthropic gets more conservative settings to stay within its rate limits.

# OpenAI with high throughput settings
curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "openai",
    "keys": [
        {
            "value": "env.OPENAI_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "concurrency_and_buffer_size": {
        "concurrency": 100,
        "buffer_size": 500
    }
}'

# Anthropic with conservative settings
curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "anthropic",
    "keys": [
        {
            "value": "env.ANTHROPIC_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "concurrency_and_buffer_size": {
        "concurrency": 25,
        "buffer_size": 100
    }
}'

Think of it as a circuit breaker for LLM traffic.


Setting Up a Proxy

Route requests through proxies for compliance, security, or geographic requirements. The examples below show an HTTP proxy for OpenAI and an authenticated SOCKS5 proxy for Anthropic, which is useful in corporate environments or for regional access.

# HTTP proxy for OpenAI
curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "openai",
    "keys": [
        {
            "value": "env.OPENAI_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "proxy_config": {
        "type": "http",
        "url": "http://localhost:8000"
    }
}'

# SOCKS5 proxy with authentication for Anthropic
curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "anthropic",
    "keys": [
        {
            "value": "env.ANTHROPIC_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "proxy_config": {
        "type": "socks5",
        "url": "http://localhost:8000",
        "username": "user",
        "password": "password"
    }
}'

All LLM calls for these providers will now be routed through the proxy you've specified.


Returning Raw Responses

By default, Bifrost normalizes responses across providers into a common schema (/v1/chat/completions).

But sometimes you want the raw response (for logging, debugging, or preserving model-specific metadata).

You can request raw output like this:

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "openai",
    "keys": [
        {
            "value": "env.OPENAI_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "send_back_raw_response": true
}'

When enabled, the raw provider response appears in extra_fields.raw_response:

{
    "choices": [...],
    "usage": {...},
    "extra_fields": {
        "provider": "openai",
        "raw_response": {
            // Original OpenAI response here
        }
    }
}

Putting It Together: Multi-Model AI Support Assistant

With this setup, your support assistant can:

  • Use Mistral for 80% of queries
  • Escalate tricky ones to Claude Sonnet
  • Handle screenshots via GPT-4o
  • Run sensitive workloads on Gemini if hosted on GCP

All through one gateway - consistent API, retries, observability, and proxy support out of the box.
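Here's a rough sketch of what that routing could look like from the application side. The model identifiers match the providers configured above (the vertex/gemini-pro name is assumed from the Vertex config, so adjust it to whatever your dashboard shows), and QUERY_TYPE and USER_MESSAGE are placeholder variables:

#!/usr/bin/env bash
# Pick a model per request type, then hit the same Bifrost endpoint for all of them.
case "$QUERY_TYPE" in
  routine)    MODEL="mistral/mistral-medium" ;;
  escalation) MODEL="anthropic/claude-3-5-sonnet" ;;
  multimodal) MODEL="openai/gpt-4o" ;;
  sensitive)  MODEL="vertex/gemini-pro" ;;
esac

curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"$USER_MESSAGE\"}]}"

Note that the JSON escaping here is naive; in a real service you'd build the payload with your HTTP client rather than shell string interpolation.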

Bifrost makes it possible to plug GPT, Claude, Gemini, and Mistral into your app in minutes, without juggling multiple SDKs.

Top comments

Urvisha Maniar:

Really useful breakdown! One of the biggest hurdles when working with multiple LLMs is switching contexts and handling different APIs. A unified config and provider layer like this feels like it could remove a lot of friction for real projects.
