Debby McKinney

Access GPT, Gemini, Claude, Mistral etc. through 1 AI Gateway: Configure Providers in Bifrost

When building AI-powered applications, chances are you don’t want to rely on a single provider:

  • Claude for reasoning-heavy tasks
  • GPT-4o for multimodal inputs
  • Gemini for Google ecosystem integrations
  • Mistral for fast, cost-effective completions

Each provider has different SDKs, auth methods, rate limits, and response formats. Maintaining them quickly becomes messy.

Bifrost solves this by acting as a unified AI gateway: one API surface, multiple providers behind it. In this post, we’ll walk through configuring providers in Bifrost so you can switch (or mix) GPT, Claude, Gemini, and Mistral with almost no extra code.

maximhq / bifrost on GitHub: https://github.com/maximhq/bifrost

Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

The fastest way to build AI applications that never go down

Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.

Quick Start

Go from zero to production-ready AI gateway in under a minute.

Step 1: Start Bifrost Gateway

# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost

Step 2: Configure via Web UI

# Open the built-in web interface
open http://localhost:8080

Step 3: Make your first API call

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'

That's it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…


Why Use a Gateway?

Imagine you’re building an AI support assistant:

  • For routine queries, you want Mistral (cheap + fast).
  • For escalations, you switch to Claude Sonnet (better reasoning).
  • For multimodal inputs, you need GPT-4o.
  • And if you’re on GCP, Gemini integrates best.

Instead of coding against four different SDKs, Bifrost gives you a single /v1/chat/completions API that works across all of them.
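For example, routing a routine query to Mistral versus an escalation to Claude is just a change to the model string in the same request; everything else stays identical (the provider/model naming here follows the convention used in the examples later in this post):

# Same endpoint, only the "model" field changes
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral/mistral-medium", "messages": [{"role": "user", "content": "How do I reset my password?"}]}'

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "anthropic/claude-3-5-sonnet", "messages": [{"role": "user", "content": "Why does this error keep recurring across restarts?"}]}'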


Run Bifrost

Install and run Bifrost using Docker:

# Pull and run Bifrost HTTP API
docker pull maximhq/bifrost
docker run -p 8080:8080 maximhq/bifrost

By default, the dashboard runs at:

👉 http://localhost:8080

Configure Providers

You can add providers via the Web UI, API, or a config.json file. Below are API examples.
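If you prefer file-based setup, the same settings can live in config.json. Here is a minimal sketch; the shape mirrors the providers block shown in the Advanced Routing section below, so check the Bifrost docs for the full schema:

{
  "providers": {
    "openai": {
      "keys": [
        {
          "value": "env.OPENAI_API_KEY",
          "models": ["gpt-4o", "gpt-4o-mini"],
          "weight": 1.0
        }
      ]
    },
    "anthropic": {
      "keys": [
        {
          "value": "env.ANTHROPIC_API_KEY",
          "models": ["claude-3-5-sonnet"],
          "weight": 1.0
        }
      ]
    }
  }
}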

OpenAI (GPT)

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "openai",
  "keys": [
    {
      "value": "env.OPENAI_API_KEY",
      "models": ["gpt-4o", "gpt-4o-mini"],
      "weight": 1.0
    }
  ]
}'

Anthropic (Claude)

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "anthropic",
  "keys": [
    {
      "value": "env.ANTHROPIC_API_KEY",
      "models": ["claude-3-5-sonnet", "claude-3-opus"],
      "weight": 1.0
    }
  ]
}'

Google Vertex (Gemini)

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "vertex",
  "keys": [
    {
      "value": "env.VERTEX_API_KEY",
      "models": ["gemini-pro", "gemini-pro-vision"],
      "weight": 1.0,
      "vertex_key_config": {
        "project_id": "env.VERTEX_PROJECT_ID",
        "region": "us-central1",
        "auth_credentials": "env.VERTEX_CREDENTIALS"
      }
    }
  ]
}'

Mistral

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "mistral",
  "keys": [
    {
      "value": "env.MISTRAL_API_KEY",
      "models": ["mistral-tiny", "mistral-medium"],
      "weight": 1.0
    }
  ]
}'

Make a Request

Once configured, you can query any provider through the same endpoint:

curl --location 'http://localhost:8080/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "anthropic/claude-3-5-sonnet",
  "messages": [
    {"role": "user", "content": "Summarize this log file in 3 bullet points"}
  ]
}'

Bifrost handles the provider-specific API calls and returns a normalized response.
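For reference, a trimmed response might look something like this. The top-level shape follows the OpenAI-compatible chat completions schema, and the extra_fields block mirrors the raw-response example later in this post (values are illustrative):

{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "• ...\n• ...\n• ..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 31,
    "total_tokens": 73
  },
  "extra_fields": {
    "provider": "anthropic"
  }
}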


Advanced Routing

Say you want to split load between two OpenAI keys (70/30):

{
  "providers": {
    "openai": {
      "keys": [
        {
          "value": "env.OPENAI_API_KEY_1",
          "weight": 0.7
        },
        {
          "value": "env.OPENAI_API_KEY_2",
          "weight": 0.3
        }
      ]
    }
  }
}

This is useful for rate limit management or cost control across accounts.
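If you'd rather do this through the providers API instead of config.json, the same split can be expressed with the request shape used in the earlier examples (a sketch; field names are assumed to match):

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "openai",
  "keys": [
    {
      "value": "env.OPENAI_API_KEY_1",
      "models": ["gpt-4o"],
      "weight": 0.7
    },
    {
      "value": "env.OPENAI_API_KEY_2",
      "models": ["gpt-4o"],
      "weight": 0.3
    }
  ]
}'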


Managing Retries Gracefully

Retries are tricky: too aggressive and you waste tokens and money; too conservative and users see errors. The example below sets up exponential backoff with up to 5 retries, starting at a 1 ms delay and capping at 10 seconds, which is well suited to transient network issues.

Example:

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "openai",
    "keys": [
        {
            "value": "env.OPENAI_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "network_config": {
        "max_retries": 5,
        "retry_backoff_initial_ms": 1,
        "retry_backoff_max_ms": 10000
    }
}'

Concurrency and Buffer Size

When you scale from dozens to thousands of requests, concurrency control saves you from provider bans.

This example gives OpenAI higher limits (100 concurrent workers, a 500-request queue) for high throughput, while Anthropic gets more conservative settings to stay within its rate limits.

# OpenAI with high throughput settings
curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "openai",
    "keys": [
        {
            "value": "env.OPENAI_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "concurrency_and_buffer_size": {
        "concurrency": 100,
        "buffer_size": 500
    }
}'

# Anthropic with conservative settings
curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "anthropic",
    "keys": [
        {
            "value": "env.ANTHROPIC_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "concurrency_and_buffer_size": {
        "concurrency": 25,
        "buffer_size": 100
    }
}'

Think of it as a circuit breaker for LLM traffic.


Setting Up a Proxy

Route requests through proxies for compliance, security, or geographic requirements. The examples below show an HTTP proxy for OpenAI and an authenticated SOCKS5 proxy for Anthropic, which is useful in corporate environments or for regional access.

# HTTP proxy for OpenAI
curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "openai",
    "keys": [
        {
            "value": "env.OPENAI_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "proxy_config": {
        "type": "http",
        "url": "http://localhost:8000"
    }
}'

# SOCKS5 proxy with authentication for Anthropic
curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "anthropic",
    "keys": [
        {
            "value": "env.ANTHROPIC_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "proxy_config": {
        "type": "socks5",
        "url": "http://localhost:8000",
        "username": "user",
        "password": "password"
    }
}'

All LLM calls for these providers will now be routed through the proxy you've specified.


Returning Raw Responses

By default, Bifrost normalizes responses across providers into a common schema (/v1/chat/completions).

But sometimes you want the raw response (for logging, debugging, or preserving model-specific metadata).

You can request raw output like this:

curl --location 'http://localhost:8080/api/providers' \
--header 'Content-Type: application/json' \
--data '{
    "provider": "openai",
    "keys": [
        {
            "value": "env.OPENAI_API_KEY",
            "models": [],
            "weight": 1.0
        }
    ],
    "send_back_raw_response": true
}'

When enabled, the raw provider response appears in extra_fields.raw_response:

{
    "choices": [...],
    "usage": {...},
    "extra_fields": {
        "provider": "openai",
        "raw_response": {
            // Original OpenAI response here
        }
    }
}

Putting It Together: Multi-Model AI Support Assistant

With this setup, your support assistant can:

  • Use Mistral for 80% of queries
  • Escalate tricky ones to Claude Sonnet
  • Handle screenshots via GPT-4o
  • Run sensitive workloads on Gemini if hosted on GCP

All through one gateway - consistent API, retries, observability, and proxy support out of the box.
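Here's a rough sketch of what that routing could look like from the application side. The model identifiers match the providers configured above (the vertex/gemini-pro name is assumed from the Vertex config, so adjust it to whatever your dashboard shows), and QUERY_TYPE and USER_MESSAGE are placeholder variables:

#!/usr/bin/env bash
# Pick a model per request type, then hit the same Bifrost endpoint for all of them.
case "$QUERY_TYPE" in
  routine)    MODEL="mistral/mistral-medium" ;;
  escalation) MODEL="anthropic/claude-3-5-sonnet" ;;
  multimodal) MODEL="openai/gpt-4o" ;;
  sensitive)  MODEL="vertex/gemini-pro" ;;
esac

curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"$USER_MESSAGE\"}]}"

Note that the JSON escaping here is naive; in a real service you'd build the payload with your HTTP client rather than shell string interpolation.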

Bifrost makes it possible to plug GPT, Claude, Gemini, and Mistral into your app in minutes, without juggling multiple SDKs.

Top comments

Urvisha Maniar:

Really useful breakdown! One of the biggest hurdles when working with multiple LLMs is switching contexts and handling different APIs. A unified config and provider layer like this feels like it could remove a lot of friction for real projects.
