TL;DR: Gemini CLI works with Google's models out of the box. But if you want to route requests through multiple providers, add failover, or track costs, you can point Gemini CLI at Bifrost. One config change. Every model available through a single endpoint.
The Problem with Single-Provider CLI Tools
Gemini CLI connects to Google's Generative AI API. That is fine if you only use Gemini models. But most production setups involve multiple providers. OpenAI for some tasks. Anthropic for others. Maybe a local Ollama instance for development.
Switching between CLIs and API keys for each provider gets old fast.
I tested Bifrost, an open-source LLM gateway written in Go, as a unified routing layer for Gemini CLI. The setup took about 5 minutes.
How Bifrost Works with Gemini CLI
Bifrost exposes a fully Google GenAI-compatible endpoint:
http://localhost:8080/genai/v1beta/models/{model}:generateContent
This means Gemini CLI can talk to Bifrost without any code changes. Just point the base URL to your Bifrost instance.
Bifrost then routes the request to whatever provider and model you specify. OpenAI, Anthropic, Vertex AI, Bedrock, Groq, Ollama. All through the same endpoint.
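To see what that looks like on the wire, here is a sketch of a raw request against the GenAI-compatible surface. It assumes a local Bifrost instance on port 8080; the model string and prompt are placeholders, and the URL follows Google's `:generateContent` convention.

```shell
# Compose the GenAI-style URL; the provider prefix in the model name
# tells Bifrost where to route the request.
BIFROST_URL="http://localhost:8080"
MODEL="openai/gpt-4o"
URL="$BIFROST_URL/genai/v1beta/models/$MODEL:generateContent"
echo "POST $URL"

# Send the request (requires a running Bifrost instance):
# curl -s -X POST "$URL" \
#   -H "Content-Type: application/json" \
#   -d '{"contents": [{"parts": [{"text": "Hello through Bifrost"}]}]}'
```

Swap the provider prefix and nothing else changes: the request body stays in Google's GenAI format regardless of which provider ends up serving it.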
Setup
Step 1: Install Bifrost
npx @maximhq/bifrost
Zero config. Starts on port 8080 by default.
Step 2: Configure Providers
Add your provider keys to the config:
{
  "providers": {
    "openai": {
      "keys": [{"name": "openai-1", "value": "env.OPENAI_API_KEY", "models": ["gpt-4o", "gpt-4o-mini"], "weight": 1.0}]
    },
    "anthropic": {
      "keys": [{"name": "anthropic-1", "value": "env.ANTHROPIC_API_KEY", "models": ["claude-sonnet-4-20250514"], "weight": 1.0}]
    },
    "gemini": {
      "keys": [{"name": "gemini-1", "value": "env.GEMINI_API_KEY", "models": ["gemini-2.5-flash"], "weight": 1.0}]
    }
  }
}
Step 3: Point Gemini CLI to Bifrost
Set the base URL to your Bifrost instance:
export GEMINI_API_BASE_URL=http://localhost:8080
Now every request from Gemini CLI goes through Bifrost. You can target any provider using the provider-prefixed model format:
- gemini/gemini-2.5-flash → Google Gemini
- openai/gpt-4o → OpenAI
- anthropic/claude-sonnet-4-20250514 → Anthropic
- vertex/gemini-pro → Vertex AI
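Put together, switching providers becomes a one-flag change. This sketch assumes Gemini CLI's -m/--model and -p/--prompt flags and a Bifrost instance running locally; the prompts are placeholders.

```shell
# Same CLI, different backends: only the model string changes.
export GEMINI_API_BASE_URL=http://localhost:8080
echo "base URL set to $GEMINI_API_BASE_URL"

# Uncomment to run (requires Bifrost up and provider keys configured):
# gemini -m "gemini/gemini-2.5-flash" -p "Summarize README.md"            # Google
# gemini -m "openai/gpt-4o" -p "Summarize README.md"                      # OpenAI
# gemini -m "anthropic/claude-sonnet-4-20250514" -p "Summarize README.md" # Anthropic
```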
What You Get
Multi-Provider Routing
One CLI, every model. No more switching between tools or managing separate API keys per provider.
Automatic Failover
Set up fallback chains in your requests. If Gemini is rate-limited, the request goes to OpenAI. If OpenAI is down, it goes to Anthropic. Each fallback is a fresh request. All plugins still run.
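As a sketch, a request-level fallback chain can be expressed in the request body. Treat the exact field names below as an assumption based on Bifrost's request-level fallbacks feature; check the docs for your version.

```json
{
  "model": "gemini/gemini-2.5-flash",
  "fallbacks": ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"],
  "messages": [{"role": "user", "content": "Hello"}]
}
```

The order matters: Bifrost tries the primary model first, then walks the fallbacks list until one succeeds.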
Budget Controls
Bifrost has a four-tier budget hierarchy: Customer, Team, Virtual Key, and Provider Config. Set a monthly spending cap on your Virtual Key. When it is hit, the gateway stops routing to paid providers. Your local Ollama instance can serve as the fallback.
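A monthly cap on a Virtual Key might look something like this. The field names here are illustrative, not the exact Bifrost schema; they show the shape of the idea (a spend limit plus a reset window), not a copy-paste config.

```json
{
  "virtual_keys": [
    {
      "name": "dev-key",
      "budget": {"max_limit": 50.0, "reset_duration": "1M"}
    }
  ]
}
```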
Cost Tracking
Every request is logged with token counts and cost calculations. The Model Catalog tracks pricing across all providers automatically.
Performance
I ran 1,000 requests through Bifrost targeting Gemini models. The gateway adds 11µs of overhead per request. At 5,000 RPS sustained throughput, the bottleneck is always the provider, never the gateway. That is 50x faster than Python-based alternatives like LiteLLM.
The Overhead Question
The concern with adding a proxy layer is always latency. In practice, LLM API calls take 500ms to 5 seconds depending on the model and prompt. An 11µs gateway overhead is invisible.
The semantic caching layer (currently Weaviate-backed) can actually reduce latency for repeated queries by serving cached responses instead of hitting the provider again.
When This Makes Sense
This setup is useful if you:
- Use Gemini CLI but also need OpenAI or Anthropic models
- Want failover so your workflow does not break during provider outages
- Need to track costs across providers in one place
- Want to set budget limits so you do not get surprise bills
If you only use Gemini models and do not care about failover or cost tracking, direct connection is fine. The gateway adds value when you are working across providers.