Google’s Gemini CLI has quickly become a popular terminal-based coding agent thanks to its strong reasoning performance and deep integration with the Google ecosystem. However, real-world engineering workflows rarely stay inside a single provider environment. Different development tasks benefit from different models: high‑reasoning models for architecture, low‑latency models for rapid edit loops, and low‑cost models for repetitive generation. By default, Gemini CLI only communicates with Google’s own API, which limits flexibility for teams working across providers.
Bifrost removes this limitation by acting as an open‑source AI gateway that sits between Gemini CLI and downstream model providers. Instead of being locked to Google, requests sent by Gemini CLI can be translated into the native format required by OpenAI, Anthropic, Groq, Mistral, Ollama, and many others. Setup is handled through the interactive Bifrost CLI, which eliminates manual environment configuration and lets developers switch models without rewriting scripts.
Why Multi‑Provider Support Matters for Terminal Coding Agents
Modern engineering teams rarely rely on a single LLM vendor. A typical workflow might use Anthropic for review, OpenAI for documentation, and an open‑source model on Groq for fast iteration during debugging. Each provider exposes different authentication methods, endpoints, and SDK conventions.
Terminal agents such as Gemini CLI make this fragmentation more visible because they are designed around one provider’s API schema. Changing providers often requires editing environment variables, updating keys, and verifying tool‑calling compatibility. While manageable for a single developer, this becomes difficult to maintain for teams that need consistent configuration, usage limits, and centralized visibility.
Bifrost solves this by acting as a translation and routing layer. Requests generated in the Google GenAI format are accepted by the gateway and forwarded to the configured provider, with authentication, automatic failover, and load balancing handled internally. The gateway adds only about 11 microseconds of overhead even at high throughput, so interactive coding sessions remain responsive.
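Concretely, routing is driven by the provider configuration held by the gateway (providers and keys can also be added through the web UI). The fragment below is a hypothetical sketch of what such a configuration could look like; the field names are illustrative, and the real schema is defined in the Bifrost documentation.

```json
{
  "providers": {
    "openai":    { "keys": [{ "value": "env.OPENAI_API_KEY" }] },
    "anthropic": { "keys": [{ "value": "env.ANTHROPIC_API_KEY" }] },
    "groq":      { "keys": [{ "value": "env.GROQ_API_KEY" }] }
  }
}
```

Once a provider is configured here, any of its models becomes addressable from Gemini CLI through the gateway.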
Step‑by‑Step: Connecting Gemini CLI to Bifrost
The full setup takes roughly a minute and requires two terminal windows.
Start the gateway:
npx -y @maximhq/bifrost
This launches the gateway at http://localhost:8080 and enables the built‑in web UI where providers, keys, and traffic can be monitored.
Open a second terminal and run:
npx -y @maximhq/bifrost-cli
The CLI launches an interactive wizard with four steps:
- Gateway URL — confirm the endpoint (default http://localhost:8080)
- Virtual key — optionally supply a virtual key for authentication and governance
- Agent selection — choose Gemini CLI (auto‑install available if missing)
- Model selection — pick any model from the list of configured providers
After confirmation, Gemini CLI starts with the correct base URL, API key, and model settings already applied. No shell exports or config edits are required.
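Under the hood, this amounts to pointing Gemini CLI at the gateway instead of Google's API. For those who prefer manual setup, the rough equivalent is sketched below; the exact environment variable names and route path that Gemini CLI and Bifrost use are assumptions here and can change between versions, so verify them against the respective docs.

```shell
# Illustrative manual equivalent of what the wizard configures.
# Variable names and the route path are assumptions; verify in the docs.
export GOOGLE_GEMINI_BASE_URL="http://localhost:8080"   # point the CLI at Bifrost
export GEMINI_API_KEY="bifrost-virtual-key"             # a Bifrost virtual key, if configured
gemini -m anthropic/claude-sonnet-4-5-20250929
```

The wizard removes exactly this kind of per-shell bookkeeping, which is why no exports are needed after it completes.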
Using Gemini CLI with Non‑Google Models
Because Bifrost converts request formats, any model configured in the gateway can be used from Gemini CLI via the provider/model syntax.
Examples:
- Anthropic → gemini -m anthropic/claude-sonnet-4-5-20250929
- OpenAI → gemini -m openai/gpt-5
- Groq → gemini -m groq/llama-3.3-70b-versatile
- Mistral → gemini -m mistral/mistral-large-latest
- xAI → gemini -m xai/grok-3
- Ollama → gemini -m ollama/llama3
The list of supported providers includes more than twenty options such as Azure, AWS Bedrock, Vertex AI, Cerebras, Cohere, Replicate, vLLM, and SGL.
For agentic workflows, the selected model must support function calling, since Gemini CLI relies on tool use for file edits, shell commands, and code operations.
Routing Vertex AI Traffic Through Bifrost
Teams using Gemini through Google Cloud can still place Bifrost in front of Vertex AI to gain governance and observability without changing providers.
The Vertex AI configuration requires setting GOOGLE_GENAI_USE_VERTEXAI=true while pointing the base URL to Bifrost. The gateway manages authentication, routing, and project configuration automatically.
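A minimal shell setup for this mode could look like the following. The GOOGLE_GENAI_USE_VERTEXAI variable comes from the configuration described above; the base-URL variable name and the model identifier are illustrative assumptions, so check the Gemini CLI documentation for the exact names your version reads.

```shell
# Route Gemini CLI's Vertex AI traffic through Bifrost (illustrative sketch).
export GOOGLE_GENAI_USE_VERTEXAI=true                   # use the Vertex AI backend
export GOOGLE_GEMINI_BASE_URL="http://localhost:8080"   # variable name assumed; point at Bifrost
gemini -m gemini-2.5-pro                                # model id illustrative
```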
For enterprise environments, the in‑VPC deployment option keeps all traffic inside the private network. Credentials can be stored using vault integrations with HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault, while audit logs provide compliance tracking for SOC 2, HIPAA, and GDPR.
Running Multiple Agent Sessions in One Terminal
The tabbed session UI in Bifrost CLI allows several agent sessions to run simultaneously.
Each tab shows a status indicator and can run a different model or even a different agent.
Common shortcuts:
- n → open a new session
- h / l → switch tabs
- 1–9 → jump to a tab
- x → close the current tab
This makes it possible to run a heavy reasoning model in one tab and a fast open‑source model in another, with both routed through the gateway independently.
Budget Controls, Rate Limits, and Observability
When multiple developers use Gemini CLI, cost and usage visibility become critical. Bifrost adds a governance layer on top of the API.
- Per‑user limits using virtual keys
- Hierarchical limits via budget and rate controls
- Metrics and traces through Prometheus and OpenTelemetry
- Lower cost responses using semantic caching
These features allow teams to monitor token usage, latency, and provider health across all active sessions.
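As an illustration, a virtual key carrying a budget and a rate limit might be defined along these lines. The field names here are hypothetical, chosen only to show the shape of the policy; the actual schema (or the equivalent web‑UI form) is in the Bifrost governance documentation.

```json
{
  "virtual_key": "team-frontend",
  "budget":     { "max_usd": 200, "reset": "monthly" },
  "rate_limit": { "requests_per_minute": 60 }
}
```

Handing each developer a virtual key like this, instead of a raw provider key, is what makes per-user accounting and revocation possible.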
Automatic Failover Between Providers
If a provider becomes unavailable, Bifrost can redirect traffic using its fallback system. For example, a primary model can be configured with a backup provider so that requests continue without interruption. The enterprise edition adds adaptive load balancing, which distributes traffic based on real‑time health signals.
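Conceptually, a fallback chain is an ordered list of models that the gateway walks until one succeeds. The fragment below sketches that idea; the exact field names and where the chain is declared (per request, per key, or in the gateway config) should be taken from the Bifrost fallback documentation rather than from this example.

```json
{
  "model": "openai/gpt-5",
  "fallbacks": [
    "anthropic/claude-sonnet-4-5-20250929",
    "groq/llama-3.3-70b-versatile"
  ]
}
```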
Getting Started
Bifrost is open source on GitHub and can connect Gemini CLI to any provider using only two commands. Teams that need governance, failover, SSO integration, and private deployments can book a Bifrost demo to evaluate the gateway in production workflows.