TL;DR: Gemini CLI works with Google's models out of the box. But if you want to route requests through multiple providers, add failover, or track costs, you can point Gemini CLI at Bifrost. One config change. Every model available through a single endpoint.
The Problem with Single-Provider CLI Tools
Gemini CLI connects to Google's Generative AI API. That is fine if you only use Gemini models. But most production setups involve multiple providers. OpenAI for some tasks. Anthropic for others. Maybe a local Ollama instance for development.
Switching between CLIs and API keys for each provider gets old fast.
I tested Bifrost, an open-source LLM gateway written in Go, as a unified routing layer for Gemini CLI. The setup took about 5 minutes.
How Bifrost Works with Gemini CLI
Bifrost exposes a fully Google GenAI-compatible endpoint:
http://localhost:8080/genai/v1beta/models/{model}:generateContent
This means Gemini CLI can talk to Bifrost without any code changes. Just point the base URL to your Bifrost instance.
Bifrost then routes the request to whatever provider and model you specify. OpenAI, Anthropic, Vertex AI, Bedrock, Groq, Ollama. All through the same endpoint.
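To see what that looks like on the wire, here is a sketch of a raw request against the GenAI-compatible surface. It assumes a local Bifrost instance on port 8080; the model string and prompt are placeholders, and the URL follows Google's `:generateContent` convention.

```shell
# Compose the GenAI-style URL; the provider prefix in the model name
# tells Bifrost where to route the request.
BIFROST_URL="http://localhost:8080"
MODEL="openai/gpt-4o"
URL="$BIFROST_URL/genai/v1beta/models/$MODEL:generateContent"
echo "POST $URL"

# Send the request (requires a running Bifrost instance):
# curl -s -X POST "$URL" \
#   -H "Content-Type: application/json" \
#   -d '{"contents": [{"parts": [{"text": "Hello through Bifrost"}]}]}'
```

Swap the provider prefix and nothing else changes: the request body stays in Google's GenAI format regardless of which provider ends up serving it.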
Setup
Step 1: Install Bifrost
npx @maximhq/bifrost
Zero config. Starts on port 8080 by default.
Step 2: Configure Providers
Add your provider keys to the config:
{
  "providers": {
    "openai": {
      "keys": [{"name": "openai-1", "value": "env.OPENAI_API_KEY", "models": ["gpt-4o", "gpt-4o-mini"], "weight": 1.0}]
    },
    "anthropic": {
      "keys": [{"name": "anthropic-1", "value": "env.ANTHROPIC_API_KEY", "models": ["claude-sonnet-4-20250514"], "weight": 1.0}]
    },
    "gemini": {
      "keys": [{"name": "gemini-1", "value": "env.GEMINI_API_KEY", "models": ["gemini-2.5-flash"], "weight": 1.0}]
    }
  }
}
Step 3: Point Gemini CLI to Bifrost
Set the base URL to your Bifrost instance:
export GEMINI_API_BASE_URL=http://localhost:8080
Now every request from Gemini CLI goes through Bifrost. You can target any provider using the provider-prefixed model format:
- gemini/gemini-2.5-flash → Google Gemini
- openai/gpt-4o → OpenAI
- anthropic/claude-sonnet-4-20250514 → Anthropic
- vertex/gemini-pro → Vertex AI
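Put together, switching providers becomes a one-flag change. This sketch assumes Gemini CLI's -m/--model and -p/--prompt flags and a Bifrost instance running locally; the prompts are placeholders.

```shell
# Same CLI, different backends: only the model string changes.
export GEMINI_API_BASE_URL=http://localhost:8080
echo "base URL set to $GEMINI_API_BASE_URL"

# Uncomment to run (requires Bifrost up and provider keys configured):
# gemini -m "gemini/gemini-2.5-flash" -p "Summarize README.md"            # Google
# gemini -m "openai/gpt-4o" -p "Summarize README.md"                      # OpenAI
# gemini -m "anthropic/claude-sonnet-4-20250514" -p "Summarize README.md" # Anthropic
```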
What You Get
Multi-Provider Routing
One CLI, every model. No more switching between tools or managing separate API keys per provider.
Automatic Failover
Set up fallback chains in your requests. If Gemini is rate-limited, the request goes to OpenAI. If OpenAI is down, it goes to Anthropic. Each fallback is a fresh request. All plugins still run.
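As a sketch, a request-level fallback chain can be expressed in the request body. Treat the exact field names below as an assumption based on Bifrost's request-level fallbacks feature; check the docs for your version.

```json
{
  "model": "gemini/gemini-2.5-flash",
  "fallbacks": ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"],
  "messages": [{"role": "user", "content": "Hello"}]
}
```

The order matters: Bifrost tries the primary model first, then walks the fallbacks list until one succeeds.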
Budget Controls
Bifrost has a four-tier budget hierarchy: Customer, Team, Virtual Key, and Provider Config. Set a monthly spending cap on your Virtual Key. When it is hit, the gateway stops routing to paid providers. Your local Ollama instance can serve as the fallback.
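A monthly cap on a Virtual Key might look something like this. The field names here are illustrative, not the exact Bifrost schema; they show the shape of the idea (a spend limit plus a reset window), not a copy-paste config.

```json
{
  "virtual_keys": [
    {
      "name": "dev-key",
      "budget": {"max_limit": 50.0, "reset_duration": "1M"}
    }
  ]
}
```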
Cost Tracking
Every request is logged with token counts and cost calculations. The Model Catalog tracks pricing across all providers automatically.
Performance
I ran 1,000 requests through Bifrost targeting Gemini models. The gateway adds 11µs of overhead per request. At 5,000 RPS sustained throughput, the bottleneck is always the provider, never the gateway. That is 50x faster than Python-based alternatives like LiteLLM.
The Overhead Question
The concern with adding a proxy layer is always latency. In practice, LLM API calls take 500ms to 5 seconds depending on the model and prompt. An 11µs gateway overhead is invisible.
The semantic caching layer (currently Weaviate-backed) can actually reduce latency for repeated queries by serving cached responses instead of hitting the provider again.
When This Makes Sense
This setup is useful if you:
- Use Gemini CLI but also need OpenAI or Anthropic models
- Want failover so your workflow does not break during provider outages
- Need to track costs across providers in one place
- Want to set budget limits so you do not get surprise bills
If you only use Gemini models and do not care about failover or cost tracking, direct connection is fine. The gateway adds value when you are working across providers.