Owen

Posted on May 19 • Originally published at ofox.ai

How to Use Any Model with Codex CLI: Custom OAI-Compatible Provider Setup

#ai #codexcli #openai #coding

TL;DR

Codex CLI includes one built-in endpoint for OpenAI. The [model_providers.<id>] block in ~/.codex/config.toml enables declaring additional providers with the correct wire_api, allowing seamless switching between GPT-5.3 Codex, Claude Sonnet 4.6, and DeepSeek V3.2 from one terminal without environment variable gymnastics. This guide covers the configuration-file approach with multiple providers, profile-based switching, and common implementation challenges.

Why the env-var trick stops working

The shell configuration shortcut combining OPENAI_API_KEY and OPENAI_BASE_URL functions adequately for single endpoints but fails when you need to:

Maintain both OpenAI direct and OpenAI-compatible gateway access simultaneously
Run different projects against distinct models without re-sourcing configuration
Supply non-standard authentication headers like X-Project-Id or rotating Bearer tokens
Configure request_max_retries individually per provider to prevent upstream failures from affecting defaults

These requirements demand a proper configuration file. Codex CLI reads ~/.codex/config.toml at each invocation, with [model_providers.<id>] tables designated for custom endpoints.

Anatomy of a model_providers block

The complete table supports approximately a dozen keys, though five prove essential for most implementations:

[model_providers.ofox]
name = "ofox.ai gateway"
base_url = "https://api.ofox.ai/v1"
env_key = "OFOX_API_KEY"
wire_api = "chat"
request_max_retries = 4

base_url — references the API root, concluding with /v1 for OpenAI-compatible gateways without trailing slashes. The endpoint Codex appends depends on wire_api selection.
env_key — identifies the environment variable containing the Bearer token at runtime. Never embed keys directly in TOML.
wire_api — "responses" directs Codex to POST at /responses (OpenAI's newer endpoint). "chat" sends requests to /chat/completions. Third-party OpenAI-compatible gateways standardly implement the latter, making it appropriate for ofox.ai, OpenRouter, DeepSeek direct, and similar services.
http_headers — merges static headers into all requests for organization scoping or regional routing.
env_http_headers — retrieves header values from environment variables at request execution. Use for tokens requiring rotation.

Two important considerations:

The identifiers openai, ollama, and lmstudio are reserved—custom providers require different names.
requires_openai_auth = false disables Codex's validation that key prefixes match sk-. Most gateways need this explicitly set.

A working ofox.ai setup (copy this)

Create or edit ~/.codex/config.toml:

model = "openai/gpt-5.3-codex"
model_provider = "ofox"

[model_providers.ofox]
name = "ofox.ai"
base_url = "https://api.ofox.ai/v1"
env_key = "OFOX_API_KEY"
wire_api = "chat"
requires_openai_auth = false

Export your key once:

export OFOX_API_KEY=<your-ofox-key>

Verify with a simple invocation:

codex "list every TODO in src/ and group them by file"

A successful model response indicates completion. A 404 Not Found error suggests incorrect wire_api configuration—either /responses targeted at a gateway serving only /chat/completions, or vice versa.

Swapping the model per command

The model key at the configuration top establishes the default. Override per invocation:

codex --model anthropic/claude-sonnet-4.6 "review this PR for race conditions"
codex --model deepseek/deepseek-v3.2 "translate this Bash script to Python"
codex --model openai/gpt-5.4-pro "design a Postgres schema for an audit log"

This works because ofox.ai routes according to model string within a single OpenAI-compatible endpoint—Codex remains unaware it communicates with three distinct vendors. Verify model identifiers in ofox's catalog before use, as vendors update naming conventions regularly.

Profiles: the cleanest multi-stack pattern

While --model switches function for occasional use, profiles bundle model, provider, reasoning effort, and sandbox policy under single names for frequent combinations:

[profiles.codex-fast]
model = "openai/gpt-5.3-codex"
model_provider = "ofox"
model_reasoning_effort = "low"

[profiles.review]
model = "anthropic/claude-sonnet-4.6"
model_provider = "ofox"
model_reasoning_effort = "high"

[profiles.bulk]
model = "deepseek/deepseek-v3.2"
model_provider = "ofox"

Then execute:

codex --profile codex-fast "generate unit tests for utils/parse_url.go"
codex --profile review "audit src/auth/ for token leakage"
codex --profile bulk "rewrite README in plain English"

Consider profiles as "complete stacks," while --model represents "single parameter adjustment." This eliminates repeated flag entry for each invocation.

Multiple providers in one config

Declaring several providers simultaneously is permitted. A practical setup preserves OpenAI direct access for sensitive operations while using ofox.ai for routine tasks:

[model_providers.ofox]
name = "ofox.ai"
base_url = "https://api.ofox.ai/v1"
env_key = "OFOX_API_KEY"
wire_api = "chat"
requires_openai_auth = false

[model_providers.openai-direct]
name = "OpenAI direct"
base_url = "https://api.openai.com/v1"
env_key = "OPENAI_API_KEY"
wire_api = "responses"

Switch with --config:

codex --config model_provider=openai-direct --model gpt-5.4 "..."
codex --config model_provider=ofox --model deepseek/deepseek-v3.2 "..."

The identical approach supports self-hosted vLLM instances at http://10.0.0.5:8000/v1 (wire_api = "chat", no authentication) for locally-restricted operations.

Auth that isn't a static Bearer

For gateways dispensing short-lived tokens, the static env_key model proves inadequate. Codex supports an auth sub-table executing a token-fetching command on designated refresh intervals:

[model_providers.corp]
name = "Internal proxy"
base_url = "https://llm.corp.internal/v1"
wire_api = "chat"

[model_providers.corp.auth]
command = "/usr/local/bin/corp-token"
args = ["--audience", "codex"]
timeout_ms = 5000
refresh_interval_ms = 300000

Codex re-executes the command every five minutes, using standard output as the Bearer token. This architecture accommodates AWS SigV4, Azure managed identity, or OIDC bridges. Avoid implementing with env_http_headers and scheduled tasks—the dedicated mechanism exists for this purpose.

The five mistakes I keep seeing

Trailing slash on base_url. https://api.ofox.ai/v1/ functions inconsistently depending on gateway behavior; the specification mandates no trailing slash. Follow documentation precisely.
wire_api = "responses" against Chat-only gateways. Results in 404 /responses not found. Configure it to "chat".
Omitting requires_openai_auth = false. Codex validates key prefixes and rejects gateway prefixes like ofox- or or-. Disable this validation explicitly.
Reusing the openai provider identifier. This identifier is reserved. Select an alternative name.
Embedding keys directly in TOML. Avoid this practice. env_key exists specifically for this—storing secrets in checked-in dotfiles creates recurring security incidents.

Where this fits in the bigger Codex picture

Custom-provider configuration represents one of three typical implementation steps:

Installation (see the complete official Codex CLI installation guide)
Routing (covered in this article)
Day-to-day usage patterns (see the real-world Codex CLI workflow)

For those evaluating Codex CLI against alternatives initially, the comparison between Claude Code, Codex CLI, Cursor, and DeepSeek TUI serves as the appropriate starting point. If the gateway question itself remains unresolved, guidance on LLM API gateway usage and selection addresses the underlying rationale before configuration.

Closing

Codex CLI's custom-provider functionality previously lacked documentation, existing as undocumented convention. In 2026, it represents a first-class configuration component deserving formal study—particularly when managing multiple API keys. When juggling several credentials, environment variables transform into friction. Forty lines of TOML configuration establishes a coding stack where the model functions as a command-line flag rather than operational overhead.

Originally published on ofox.ai/blog.

DEV Community