OpenAI's Codex CLI has established itself as a formidable terminal-based coding agent, offering robust code generation and completion features. While it ships with native integration for GPT models and ChatGPT OAuth—making it a go-to for OpenAI ecosystem loyalists—modern AI development often requires the flexibility to benchmark and deploy models from various competing providers without switching tools.
A developer might need Claude's superior reasoning for a complex architectural decision, switch to a Groq-hosted Llama model for rapid unit test iterations, and finally employ Gemini 2.5 Pro for drafting documentation. By default, Codex CLI is tethered to OpenAI’s API. Achieving such a multi-model workflow traditionally involves juggling separate terminal agents or manually reconfiguring environment variables between sessions.
Bifrost completely removes this limitation. As an open-source AI gateway written in Go, Bifrost intercepts requests from Codex CLI and dynamically routes them to 20+ supported providers, handling API translation in real-time. When paired with Bifrost CLI, an interactive launcher that automates configuration, developers can benchmark, compare, and swap models inside Codex CLI without ever touching an environment variable.
Transforming Codex CLI into a Provider-Agnostic Agent
In a standard setup, Codex CLI directs all traffic to OpenAI servers via OPENAI_BASE_URL. This confines users to GPT-family models, leaves no built-in failover during API degradation, and offers no centralized mechanism for tracking token spend or enforcing team-wide usage policies.
Routing traffic through Bifrost fundamentally changes this dynamic:
- Seamless Model Switching: Use the `--model` flag or the `/model` command mid-session to transition between providers. Start a session with Claude using `codex --model anthropic/claude-sonnet-4-5-20250929` and switch to Gemini mid-conversation via `/model gemini/gemini-2.5-pro` without restarting.
- Transparent API Translation: Bifrost automatically converts Codex CLI's native OpenAI-format requests into the target provider's schema. The developer experience remains consistent whether the backend is GPT, Claude, Gemini, Mistral, or a self-hosted Llama instance.
- Resilient Infrastructure: Bifrost's fallback system ensures that if OpenAI's API encounters rate limits or downtime, traffic is rerouted to a backup provider automatically.
- Unified Observability: All requests flow through a single gateway with built-in logging and monitoring accessible at `http://localhost:8080/logs`. Users can filter by provider or model to analyze performance.
Crucially, the gateway adds only 11 microseconds of overhead per request at 5,000 RPS, ensuring coding latency remains unaffected.
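The fallback behavior described above can be pictured as a simple try-next loop. The sketch below is an illustration of the pattern only, not Bifrost's actual implementation; the model names and the `send` transport are placeholders:

```python
def complete_with_fallback(models, send):
    """Try each provider in order; on failure, fall back to the next.
    Bifrost applies this pattern inside the gateway, so Codex CLI never sees the retry."""
    last_err = None
    for model in models:
        try:
            return model, send(model)
        except Exception as err:
            last_err = err
    raise last_err

# Simulated transport: the primary provider is rate-limited, the fallback succeeds.
def fake_send(model):
    if model.startswith("openai/"):
        raise RuntimeError("429: rate limited")
    return "ok"

used, result = complete_with_fallback(
    ["openai/gpt-5", "anthropic/claude-sonnet-4-5-20250929"], fake_send
)
print(used, result)
```

Because the retry happens at the gateway, the agent's request and response formats are unchanged; only the backing model differs.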
From Installation to First Prompt in 90 Seconds
Bifrost CLI removes the manual friction typically associated with multi-provider setups. The process requires just two terminal windows.
Terminal 1: Initialize the Gateway
npx -y @maximhq/bifrost
This command launches the Bifrost gateway at http://localhost:8080, complete with a web UI for provider management and live traffic visualization.
Terminal 2: Launch the Interactive CLI
npx -y @maximhq/bifrost-cli
The CLI guides the user through four simple steps:
1. Gateway URL: Confirm the address (default: `http://localhost:8080`).
2. Virtual Key: Optionally provide a Bifrost virtual key for governance. Keys are stored securely in the OS keyring, never in plaintext.
3. Agent Harness: Select Codex CLI. The tool will automatically install it via `npm` if missing.
4. Model Selection: Choose from a searchable list of models retrieved via the gateway's `/v1/models` endpoint.
Upon completion, Codex CLI launches with OPENAI_BASE_URL and OPENAI_API_KEY preconfigured. Future runs remember these settings, allowing for immediate re-launch or adjustments via keyboard shortcuts.
Practical Multi-Model Workflows
The true advantage of routing Codex CLI through Bifrost lies in the ability to match specific tasks with the most suitable models.
Scenario 1: Benchmarking Code Generation
Developers can evaluate how different models handle the same problem. Generate an initial implementation with GPT-5, then switch mid-session to compare results:
codex --model openai/gpt-5
# Generate implementation
/model anthropic/claude-sonnet-4-5-20250929
# Refine or critique with a different model
The /model command facilitates an instant mid-session switch, carrying the conversation context forward so the new model can build upon previous outputs.
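At the API level, carrying context forward amounts to resending the accumulated message history with only the `model` field of the request changed. The sketch below illustrates that idea; Codex CLI manages this internally, and the history contents here are made up:

```python
# Conversation history accumulated while working with the first model.
history = [
    {"role": "user", "content": "Implement an LRU cache."},
    {"role": "assistant", "content": "class LRUCache: ..."},
]

def switch_model(history, new_model, prompt):
    """Build the next OpenAI-format request after a /model switch:
    same messages, new model identifier."""
    return {
        "model": new_model,
        "messages": history + [{"role": "user", "content": prompt}],
    }

request = switch_model(
    history, "anthropic/claude-sonnet-4-5-20250929", "Critique the code above."
)
print(request["model"], len(request["messages"]))
```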
Scenario 2: Cost-Optimized Iteration Loops
For rapid debug-test-fix cycles where speed is paramount, route requests to a high-speed inference provider:
codex --model groq/llama-3.3-70b-versatile
Groq’s LPU-accelerated inference offers lower latency, ideal for tight loops. When deeper analysis is required, switching to a more powerful model takes just one command.
Scenario 3: Parallel Sessions
Bifrost CLI maintains a persistent tabbed terminal interface. Developers can run multiple tabs, each targeting a different model:
- Tab 1: `openai/gpt-5` for main development.
- Tab 2: `anthropic/claude-sonnet-4-5-20250929` for code review.
- Tab 3: `groq/llama-3.3-70b-versatile` for utility scripts.
Navigate using Ctrl+B for tab mode, with visual status badges indicating active, idle, or alert states.
Authentication: OAuth, API Keys, and Virtual Keys
Bifrost supports the full range of Codex CLI authentication methods.
ChatGPT OAuth
Developers with ChatGPT subscriptions can route traffic through Bifrost by setting the base URL:
export OPENAI_BASE_URL=http://localhost:8080/openai
codex
Select "Sign in with ChatGPT" to authenticate via browser, with all traffic automatically proxied.
API Key Authentication
Both OpenAI console keys and Bifrost virtual keys are supported:
export OPENAI_API_KEY=your-api-key
export OPENAI_BASE_URL=http://localhost:8080/openai
codex
When using the Bifrost CLI launcher, these variables are set automatically.
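Any OpenAI-compatible client can target the same gateway with these two variables. The sketch below builds such a request with the Python standard library; it assumes the OpenAI-compatible route lives under the base URL shown above (the exact path suffix may differ), and the fallback values are illustrative:

```python
import json
import os
import urllib.request

# The same variables the Bifrost CLI launcher sets; defaults here are illustrative.
base_url = os.environ.get("OPENAI_BASE_URL", "http://localhost:8080/openai")
api_key = os.environ.get("OPENAI_API_KEY", "bifrost-virtual-key")

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-format chat completion request aimed at the gateway."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("anthropic/claude-sonnet-4-5-20250929", "Hello")
print(req.full_url)
```

Note that the model field uses the `provider/model` prefix; the client itself stays provider-agnostic.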
Supported Providers and Tool-Use Requirements
Bifrost supports a wide array of providers via the provider/model-name format, including openai, azure, gemini, vertex, bedrock, mistral, groq, cerebras, cohere, perplexity, xai, ollama, openrouter, huggingface, nebius, parasail, replicate, vllm, and sgl.
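How the gateway interprets these identifiers can be pictured as a single prefix split. This is a sketch of the naming convention, not Bifrost's source; note that some model names (e.g. Hugging Face repos) themselves contain `/`:

```python
def parse_model_id(model_id: str) -> tuple[str, str]:
    """Split a 'provider/model-name' identifier into its routing components.
    Split only on the first '/', since model names may contain slashes too."""
    provider, _, name = model_id.partition("/")
    return provider, name

print(parse_model_id("groq/llama-3.3-70b-versatile"))
print(parse_model_id("huggingface/meta-llama/Llama-3.3-70B-Instruct"))
```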
Critical Constraint: Non-OpenAI models must support tool-use capabilities. Because Codex CLI relies on function calling for file and terminal operations, models lacking this feature will fail during agentic tasks.
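Concretely, "tool use" means the model must accept OpenAI-format function definitions and respond with structured tool calls. The tool name and schema below are hypothetical (Codex CLI's actual tools are internal), but any backing model must be able to consume a definition of this shape:

```python
# An OpenAI-format function-calling tool definition of the kind an agent sends
# for terminal access. Name and schema are illustrative, not Codex CLI's own.
shell_tool = {
    "type": "function",
    "function": {
        "name": "run_shell_command",
        "description": "Execute a shell command in the project workspace.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to run."},
            },
            "required": ["command"],
        },
    },
}
print(shell_tool["function"]["name"])
```

A model without function-calling support will ignore or garble such definitions, which is why it cannot drive Codex CLI's file and terminal operations.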
Enterprise Governance and Observability
As Codex CLI usage scales across engineering teams, unmanaged access creates blind spots in spending and compliance. Bifrost addresses this at the gateway level.
- Scoped Permissions: Issue virtual keys with specific model access rules and spend limits to control who can use which models.
- Budget Controls: Implement hierarchical policies to prevent cost overruns.
- Deep Observability: Ship Prometheus metrics and OpenTelemetry traces to dashboards like Grafana or Datadog for centralized visibility.
- Semantic Caching: Serve cached responses for semantically duplicate queries, reducing latency and costs.
- Compliance: Bifrost Enterprise provides audit logs for SOC 2 and HIPAA, vault-based key management, and in-VPC deployments for strict data residency.
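The semantic-caching idea above can be sketched in a few lines: embed each prompt, and serve a stored response when a new prompt's embedding is close enough to a cached one. Everything here is illustrative; Bifrost uses real embedding models and its own similarity threshold, and the toy two-dimensional vectors stand in for those embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cache = []  # list of (prompt_embedding, cached_response) pairs

def lookup(embedding, threshold=0.95):
    """Return a cached response whose prompt is semantically close enough."""
    for cached_emb, response in cache:
        if cosine(cached_emb, embedding) >= threshold:
            return response
    return None  # cache miss: forward to the provider, then store the result

cache.append(([1.0, 0.0], "cached answer"))
print(lookup([0.99, 0.05]))  # near-duplicate query
print(lookup([0.0, 1.0]))    # unrelated query
```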
Getting Started
Bifrost is open source on GitHub and connects Codex CLI to any LLM provider in just two commands. For engineering teams requiring enterprise governance, adaptive failover, and private cloud deployments for their terminal workflows, book a Bifrost demo to see the platform in action.