Codex CLI is OpenAI's terminal-based coding agent that runs entirely in your shell. It reads your codebase, proposes changes, runs commands, and writes code. Solid tool. One problem: it only talks to OpenAI by default.
I wanted to route Codex CLI through an AI gateway so I could use Claude Sonnet, Gemini 2.5 Pro, Mistral, and others without switching tools. I tested a few options. Bifrost worked best: open-source, written in Go, with about 11 microseconds of routing overhead. Here is exactly how I set it up and what I found.
Why Route Codex CLI Through an AI Gateway
Codex CLI sends requests to OpenAI's API. That is fine until you need something else. Maybe Claude Sonnet handles your refactoring tasks better. Maybe Gemini's context window fits your monorepo. Maybe you want automatic failover when OpenAI rate limits you mid-session.
An AI gateway sits between Codex CLI and your providers. It translates requests, routes traffic, and handles failures. You configure it once and Codex CLI does not know the difference.
Without a gateway, your options are:
- Stick with OpenAI only (no routing, no failover, no cost tracking)
- Manually swap API keys and base URLs every time you want a different model
Neither scales.
Setting Up Bifrost for Codex CLI
Bifrost exposes an OpenAI-compatible endpoint. Codex CLI connects to it like it would connect to OpenAI directly. Full Codex CLI integration docs here.
Install Bifrost
```shell
npx -y @maximhq/bifrost
```
That starts the gateway locally. The setup guide has the full walkthrough.
The OAuth Gotcha
This one tripped me up. Codex CLI always prefers OAuth authentication over custom API keys. If you have previously logged in with OpenAI, Codex CLI will ignore your custom `OPENAI_API_KEY` entirely.
Fix: run `/logout` inside Codex CLI before configuring Bifrost. Without this step, your gateway config will be silently bypassed.
Configure Codex CLI to Use Bifrost
Set your environment variables:
```shell
export OPENAI_API_KEY=bifrost_virtual_key
export OPENAI_BASE_URL=http://localhost:8080/openai/v1
```
Or add it to your codex.toml:
```toml
[auth]
api_key = "bifrost_virtual_key"

[network]
openai_base_url = "http://localhost:8080/openai/v1"
```
The `OPENAI_API_KEY` here is a Bifrost virtual key. Your actual provider keys live in the Bifrost config.
Done. Every Codex CLI request now flows through Bifrost.
Routing Codex CLI to Any Model
This is the core use case. Configure multiple providers in Bifrost, and route Codex CLI traffic however you want. Bifrost uses the provider/model-name format for cross-provider routing:
```yaml
accounts:
  - id: "codex-dev"
    providers:
      - id: "claude-primary"
        type: "anthropic"
        api_key: "${ANTHROPIC_API_KEY}"
        model: "anthropic/claude-sonnet-4-5-20250929"
        weight: 60
      - id: "gemini-secondary"
        type: "gemini"
        api_key: "${GEMINI_API_KEY}"
        model: "gemini/gemini-2.5-pro"
        weight: 25
      - id: "openai-fallback"
        type: "openai"
        api_key: "${OPENAI_API_KEY}"
        model: "openai/gpt-4o"
        weight: 15
```
60% of requests go to Claude Sonnet. 25% to Gemini. 15% to GPT-4o. Weights auto-normalise, so use any numbers.
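The weight math is easy to model. Here is a minimal, illustrative Python sketch of weighted selection with normalisation — not Bifrost's actual implementation, just the idea behind it:

```python
import random

def normalize(weights):
    """Scale arbitrary weights so they sum to 1.0."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def pick_provider(weights, rng=random):
    """Select one provider at random, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

weights = {"claude-primary": 60, "gemini-secondary": 25, "openai-fallback": 15}
print(normalize(weights))  # 0.60 / 0.25 / 0.15
```

Because the weights are normalised, 60/25/15 behaves identically to 12/5/3 or 0.6/0.25/0.15.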
I ran this for a week. Claude Sonnet handled tool-heavy refactoring better. Gemini was faster on large context reads. GPT-4o was solid as a fallback. The routing docs cover all configuration options.
Other providers you can route to: Mistral, Groq, Cerebras, Cohere, Perplexity. All via the same provider/model-name format.
Can You Use Codex CLI with Non-OpenAI Models?
Yes. That is exactly what this setup does. Bifrost translates the OpenAI-format requests from Codex CLI into whatever format each provider expects. Codex CLI thinks it is talking to OpenAI. Bifrost handles the rest.
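To make the translation concrete, here is a simplified sketch (not Bifrost's code) of how an OpenAI-style chat request maps onto Anthropic's Messages format: the system prompt moves to a dedicated `system` field and `max_tokens`, which Anthropic requires, is made explicit:

```python
def openai_to_anthropic(req):
    """Sketch: reshape an OpenAI chat.completions body into the rough
    shape of an Anthropic Messages request. Illustrative only."""
    system_parts = [m["content"] for m in req["messages"] if m["role"] == "system"]
    chat = [m for m in req["messages"] if m["role"] != "system"]
    out = {
        "model": req["model"].split("/", 1)[-1],  # strip the provider/ prefix
        "messages": chat,
        "max_tokens": req.get("max_tokens", 1024),  # Anthropic requires this field
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)  # system prompt is a top-level field
    return out

req = {
    "model": "anthropic/claude-sonnet-4-5-20250929",
    "messages": [
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "Refactor utils.py"},
    ],
}
print(openai_to_anthropic(req)["system"])  # You are a coding agent.
```

A real gateway also translates tool definitions, streaming chunks, and error shapes, but the principle is the same.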
Critical requirement: non-OpenAI models must support tool use. Codex CLI relies on function calling for file operations, terminal commands, and code editing. If a model does not support tools, it will break on anything beyond simple chat.
Automatic Failover
Provider outages are inevitable. Bifrost sorts providers by weight and retries on failure: if Claude goes down, Gemini picks up; if Gemini fails, traffic falls back to OpenAI. Your Codex CLI session carries on uninterrupted.
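The failover behaviour can be sketched in a few lines: try providers from highest weight to lowest and return the first success. This is an illustrative toy, not Bifrost's actual retry logic:

```python
def call_with_failover(providers, request):
    """Try providers from highest weight to lowest; return the first success.

    `providers` maps name -> (weight, callable). Illustrative sketch only.
    """
    errors = {}
    for name, (_, call) in sorted(providers.items(), key=lambda kv: -kv[1][0]):
        try:
            return name, call(request)
        except Exception as exc:  # a real gateway would only retry retryable errors
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")

def down(_):
    """Stub simulating a provider outage."""
    raise ConnectionError("provider unavailable")

providers = {
    "claude-primary": (60, down),
    "gemini-secondary": (25, lambda r: f"gemini:{r}"),
    "openai-fallback": (15, lambda r: f"openai:{r}"),
}
print(call_with_failover(providers, "hello"))  # falls through to gemini-secondary
```

With the primary stubbed out as down, the request lands on the second-highest-weight provider.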
The failover docs explain the retry logic in detail.
Comparison: AI Gateway Options for Codex CLI
| Feature | Bifrost | LiteLLM | Direct API |
|---|---|---|---|
| Language | Go | Python | N/A |
| Routing overhead | 11 microseconds | ~8 milliseconds | 0 |
| Weighted routing | Yes | Yes | No |
| Automatic failover | Yes | Yes | No |
| Budget controls | 4-tier hierarchy | Basic | No |
| Semantic caching | Yes | No | No |
| Self-hosted | Yes | Yes | N/A |
| Codex CLI compatible | Yes | Yes | Default |
LiteLLM works as a proxy for Codex CLI, but the Python runtime adds measurable latency. When every Codex CLI request goes through the gateway, those milliseconds compound: over a 500-request session, ~8 ms per hop adds roughly 4 seconds of pure gateway latency, versus about 5.5 ms total at 11 microseconds. For a tool sitting in the critical path of your coding workflow, overhead matters.
How Do You Route Codex CLI Through an AI Gateway?
Three steps:
- Start Bifrost (`npx -y @maximhq/bifrost`)
- Run `/logout` in Codex CLI to clear OAuth
- Set `OPENAI_API_KEY` and `OPENAI_BASE_URL` to point at Bifrost
That is it. Configure your providers in the Bifrost config, and Codex CLI routes to any model you specify.
Budget and Observability
Once all Codex CLI traffic flows through Bifrost, you get cost controls and logging for free. The four-tier budget hierarchy lets you cap spend at the virtual key, team, or provider level.
```yaml
budgets:
  - level: "virtual_key"
    id: "codex-cli-dev"
    limit: 150
    period: "monthly"
```
The observability layer logs every request: latency, tokens, cost, which provider handled it. When you are routing across three providers, this data tells you exactly where your money goes and which model performs best for your tasks.
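If you export those logs, a per-provider spend breakdown is a few lines of Python. The record fields below are hypothetical, for illustration only — not Bifrost's actual log schema:

```python
from collections import defaultdict

# Hypothetical log records; field names are illustrative, not Bifrost's schema.
logs = [
    {"provider": "anthropic", "cost": 0.042, "latency_ms": 1800, "tokens": 2100},
    {"provider": "anthropic", "cost": 0.051, "latency_ms": 2100, "tokens": 2600},
    {"provider": "gemini", "cost": 0.012, "latency_ms": 900, "tokens": 1900},
    {"provider": "openai", "cost": 0.030, "latency_ms": 1500, "tokens": 1700},
]

# Sum cost per provider to see where the money goes.
spend = defaultdict(float)
for rec in logs:
    spend[rec["provider"]] += rec["cost"]

total = sum(spend.values())
for provider, cost in sorted(spend.items(), key=lambda kv: -kv[1]):
    print(f"{provider}: ${cost:.3f} ({cost / total:.0%} of spend)")
```

The same loop works for latency or token counts — swap the field you aggregate.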
Semantic caching also helps. Repeated or similar queries hit the cache instead of the provider. Cuts both cost and latency for common operations.
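The idea behind semantic caching: embed the prompt, and if a stored prompt's embedding is close enough, serve the stored response instead of calling the provider. A toy sketch with hand-rolled cosine similarity and made-up vectors (a real gateway uses a proper embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        for stored, response in self.entries:
            if cosine(stored, embedding) >= self.threshold:
                return response  # cache hit: skip the provider call
        return None  # cache miss: call the provider, then put()

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache()
cache.put([1.0, 0.0, 0.2], "cached answer")
print(cache.get([0.99, 0.01, 0.21]))  # near-identical embedding -> hit
print(cache.get([0.0, 1.0, 0.0]))    # unrelated embedding -> miss (None)
```

The threshold is the knob: too loose and unrelated prompts get stale answers, too tight and only exact repeats hit.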
Honest Trade-offs
OAuth quirk is easy to miss. If you skip the `/logout` step, Codex CLI silently ignores your gateway config. There is no error. It just routes to OpenAI directly. I lost an hour to this before checking the docs.
Tool use is non-negotiable. Not every model supports function calling well enough for Codex CLI. Stick to models with solid tool use: Claude Sonnet, GPT-4o, Gemini 2.5 Pro. Smaller or older models may fail on file operations.
Self-hosted only. You run and maintain the gateway. No managed cloud version for the open-source release. The governance layer helps with access control, but ops is on you.
Extra hop. One more process in the chain. The 11 microsecond overhead is negligible, but it is still something to keep running.
Quick Start
```shell
# 1. Start Bifrost
npx -y @maximhq/bifrost

# 2. Logout from OpenAI OAuth in Codex CLI
#    (inside Codex CLI, run: /logout)

# 3. Point Codex CLI at Bifrost
export OPENAI_API_KEY=bifrost_virtual_key
export OPENAI_BASE_URL=http://localhost:8080/openai/v1

# 4. Use Codex CLI normally - it routes through Bifrost
```
If you are using Codex CLI for real work, routing through an AI gateway gives you model flexibility, failover, and cost visibility that you cannot get from a single provider. I benchmarked the performance and the overhead is genuinely negligible.
Open an issue on the repo if you run into anything.