TL;DR
Codex CLI includes one built-in endpoint for OpenAI. The [model_providers.<id>] block in ~/.codex/config.toml lets you declare additional providers, each with the correct wire_api, so you can switch between GPT-5.3 Codex, Claude Sonnet 4.6, and DeepSeek V3.2 from one terminal without environment-variable gymnastics. This guide covers the configuration-file approach with multiple providers, profile-based switching, and the most common configuration mistakes.
Why the env-var trick stops working
The shell configuration shortcut combining OPENAI_API_KEY and OPENAI_BASE_URL functions adequately for single endpoints but fails when you need to:
- Maintain both OpenAI direct and OpenAI-compatible gateway access simultaneously
- Run different projects against distinct models without re-sourcing configuration
- Supply non-standard authentication headers like X-Project-Id or rotating Bearer tokens
- Configure request_max_retries individually per provider to prevent upstream failures from affecting defaults
These requirements demand a proper configuration file. Codex CLI reads ~/.codex/config.toml at each invocation, with [model_providers.<id>] tables designated for custom endpoints.
Anatomy of a model_providers block
The complete table supports approximately a dozen keys, though five prove essential for most implementations:
[model_providers.ofox]
name = "ofox.ai gateway"
base_url = "https://api.ofox.ai/v1"
env_key = "OFOX_API_KEY"
wire_api = "chat"
request_max_retries = 4
- base_url: references the API root, ending in /v1 for OpenAI-compatible gateways, with no trailing slash. The endpoint Codex appends depends on the wire_api selection.
- env_key: identifies the environment variable containing the Bearer token at runtime. Never embed keys directly in TOML.
- wire_api: "responses" directs Codex to POST at /responses (OpenAI's newer endpoint). "chat" sends requests to /chat/completions. Third-party OpenAI-compatible gateways typically implement the latter, making it appropriate for ofox.ai, OpenRouter, DeepSeek direct, and similar services.
- http_headers: merges static headers into all requests for organization scoping or regional routing.
- env_http_headers: retrieves header values from environment variables at request execution. Use for tokens requiring rotation.
Two important considerations:
- The identifiers openai, ollama, and lmstudio are reserved, so custom providers require different names.
- requires_openai_auth = false disables Codex's validation that key prefixes match sk-. Most gateways need this explicitly set.
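The two header keys are described above but not shown. As a minimal sketch, the function below prints a provider fragment that combines a static header with an environment-sourced one. The header names, the demo-project value, and the OFOX_ROTATING_TOKEN variable are placeholders invented for illustration; check your gateway's documentation for the headers it actually expects.

```shell
# Print a provider fragment that extends the [model_providers.ofox] example with headers.
# Header names, the project value, and OFOX_ROTATING_TOKEN are hypothetical placeholders.
ofox_headers_fragment() {
  cat <<'EOF'
[model_providers.ofox]
name = "ofox.ai gateway"
base_url = "https://api.ofox.ai/v1"
env_key = "OFOX_API_KEY"
wire_api = "chat"
request_max_retries = 4
# Static value, sent unchanged on every request.
http_headers = { "X-Project-Id" = "demo-project" }
# Left: header name. Right: environment variable that holds the header's value.
env_http_headers = { "X-Gateway-Token" = "OFOX_ROTATING_TOKEN" }
EOF
}

ofox_headers_fragment
```

Printing the fragment rather than writing it keeps the sketch from touching ~/.codex/config.toml; paste the output into the file by hand once the names match your gateway.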
A working ofox.ai setup (copy this)
Create or edit ~/.codex/config.toml:
model = "openai/gpt-5.3-codex"
model_provider = "ofox"
[model_providers.ofox]
name = "ofox.ai"
base_url = "https://api.ofox.ai/v1"
env_key = "OFOX_API_KEY"
wire_api = "chat"
requires_openai_auth = false
Export your key once:
export OFOX_API_KEY="<your-ofox-key>"
Verify with a simple invocation:
codex "list every TODO in src/ and group them by file"
If the model responds, the setup works. A 404 Not Found error suggests the wrong wire_api: either Codex is sending /responses to a gateway that serves only /chat/completions, or the reverse.
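To check which route a gateway serves without involving Codex, a rough sketch like the one below can help. endpoint_url mirrors the base_url plus wire_api rule described earlier; probe_endpoint is a hypothetical helper that sends an empty POST with curl. How a given gateway answers an empty body is an assumption here: many return 400, 401, or 422 for an existing route and 404 for a missing one, but yours may differ.

```shell
# Build the URL Codex would call: base_url plus a path chosen by wire_api.
endpoint_url() {
  base="${1%/}"   # tolerate one stray trailing slash
  case "$2" in
    responses) printf '%s/responses\n' "$base" ;;
    chat)      printf '%s/chat/completions\n' "$base" ;;
    *)         printf 'unknown wire_api: %s\n' "$2" >&2; return 1 ;;
  esac
}

# Print the HTTP status of an empty POST to that URL. Needs network access,
# so it is defined here but only meant to be called by hand.
probe_endpoint() {
  url="$(endpoint_url "$1" "$2")" || return 1
  curl -s -o /dev/null -w '%{http_code}\n' -X POST \
    -H "Authorization: Bearer ${OFOX_API_KEY:-}" \
    -H 'Content-Type: application/json' \
    -d '{}' "$url"
}

endpoint_url "https://api.ofox.ai/v1" chat
endpoint_url "https://api.ofox.ai/v1" responses
```

Running probe_endpoint "https://api.ofox.ai/v1" chat and then the same with responses shows which of the two routes comes back as 404.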
Swapping the model per command
The top-level model key sets the default. Override it per invocation:
codex --model anthropic/claude-sonnet-4.6 "review this PR for race conditions"
codex --model deepseek/deepseek-v3.2 "translate this Bash script to Python"
codex --model openai/gpt-5.4-pro "design a Postgres schema for an audit log"
This works because ofox.ai routes according to model string within a single OpenAI-compatible endpoint—Codex remains unaware it communicates with three distinct vendors. Verify model identifiers in ofox's catalog before use, as vendors update naming conventions regularly.
Profiles: the cleanest multi-stack pattern
The --model flag is fine for occasional switches. For combinations you use often, a profile bundles model, provider, reasoning effort, and sandbox policy under a single name:
[profiles.codex-fast]
model = "openai/gpt-5.3-codex"
model_provider = "ofox"
model_reasoning_effort = "low"
[profiles.review]
model = "anthropic/claude-sonnet-4.6"
model_provider = "ofox"
model_reasoning_effort = "high"
[profiles.bulk]
model = "deepseek/deepseek-v3.2"
model_provider = "ofox"
Then execute:
codex --profile codex-fast "generate unit tests for utils/parse_url.go"
codex --profile review "audit src/auth/ for token leakage"
codex --profile bulk "rewrite README in plain English"
Think of a profile as a complete stack and of --model as a single-parameter adjustment. A profile saves retyping the same flags on every invocation.
Multiple providers in one config
Declaring several providers simultaneously is permitted. A practical setup preserves OpenAI direct access for sensitive operations while using ofox.ai for routine tasks:
[model_providers.ofox]
name = "ofox.ai"
base_url = "https://api.ofox.ai/v1"
env_key = "OFOX_API_KEY"
wire_api = "chat"
requires_openai_auth = false
[model_providers.openai-direct]
name = "OpenAI direct"
base_url = "https://api.openai.com/v1"
env_key = "OPENAI_API_KEY"
wire_api = "responses"
Switch with --config:
codex --config model_provider=openai-direct --model gpt-5.4 "..."
codex --config model_provider=ofox --model deepseek/deepseek-v3.2 "..."
The same approach supports a self-hosted vLLM instance at http://10.0.0.5:8000/v1 (wire_api = "chat", no authentication) for work that must stay on the local network.
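The vLLM case is mentioned without a config, so here is one possible shape. The provider id vllm-local and the display name are made up for this sketch, and leaving out env_key rests on the assumption that Codex then sends no Authorization header.

```shell
# Print a provider table for the self-hosted vLLM instance mentioned above.
# "vllm-local" and the display name are hypothetical; env_key is omitted on the
# assumption that no Authorization header is needed.
vllm_provider_fragment() {
  cat <<'EOF'
[model_providers.vllm-local]
name = "Self-hosted vLLM"
base_url = "http://10.0.0.5:8000/v1"
wire_api = "chat"
EOF
}

vllm_provider_fragment
```

Once the fragment is in ~/.codex/config.toml, the --config model_provider=vllm-local switch works like the two examples above, with --model set to whatever name your vLLM server exposes.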
Auth that isn't a static Bearer
For gateways dispensing short-lived tokens, the static env_key model proves inadequate. Codex supports an auth sub-table executing a token-fetching command on designated refresh intervals:
[model_providers.corp]
name = "Internal proxy"
base_url = "https://llm.corp.internal/v1"
wire_api = "chat"
[model_providers.corp.auth]
command = "/usr/local/bin/corp-token"
args = ["--audience", "codex"]
timeout_ms = 5000
refresh_interval_ms = 300000
Codex re-executes the command every five minutes and uses its standard output as the Bearer token. This design accommodates AWS SigV4, Azure managed identity, or OIDC bridges. Avoid emulating it with env_http_headers and scheduled tasks; the dedicated mechanism exists for this purpose.
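The corp-token binary is left to the reader. A stand-in could look like the sketch below, written as a shell function so it is easy to try out. Everything in it is hypothetical: the CORP_TOKEN_FILE variable, the fallback path under /run, and the idea that another tool keeps that file fresh. The one property taken from the text above is that stdout carries the token and nothing else, so diagnostics go to stderr.

```shell
# Hypothetical stand-in for /usr/local/bin/corp-token.
# stdout: the token only. stderr: diagnostics. Non-zero exit on any failure.
corp_token() {
  audience=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --audience)
        [ "$#" -ge 2 ] || { echo 'corp-token: --audience needs a value' >&2; return 2; }
        audience="$2"
        shift 2 ;;
      *)
        printf 'corp-token: unknown argument: %s\n' "$1" >&2
        return 2 ;;
    esac
  done
  [ -n "$audience" ] || { echo 'corp-token: --audience is required' >&2; return 2; }

  # Invented convention: a file that some other tool refreshes with the current token.
  token_file="${CORP_TOKEN_FILE:-/run/corp-token/$audience}"
  [ -r "$token_file" ] || { printf 'corp-token: cannot read %s\n' "$token_file" >&2; return 1; }

  # Strip all whitespace so stdout is exactly the token.
  tr -d '[:space:]' < "$token_file"
}
```

Called as corp_token --audience codex, it matches the command and args shown in the [model_providers.corp.auth] table; a real helper would replace the file read with a call to your identity provider.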
The five mistakes I keep seeing
- Trailing slash on base_url. https://api.ofox.ai/v1/ functions inconsistently depending on gateway behavior; the specification mandates no trailing slash. Follow documentation precisely.
- wire_api = "responses" against Chat-only gateways. Results in 404 /responses not found. Configure it to "chat".
- Omitting requires_openai_auth = false. Codex validates key prefixes and rejects gateway prefixes like ofox- or or-. Disable this validation explicitly.
- Reusing the openai provider identifier. This identifier is reserved. Select an alternative name.
- Embedding keys directly in TOML. Avoid this practice. env_key exists specifically for this; storing secrets in checked-in dotfiles creates recurring security incidents.
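Three of these mistakes can be caught mechanically. The function below is a rough grep-based sketch, not a TOML parser: it flags a base_url that ends in a slash, a provider table that reuses a reserved id, and quoted strings that look like inline keys. The lint_codex_config name and the 16-character threshold for a suspected key are arbitrary choices made for this example.

```shell
# Flag three of the mistakes above; the exit status is the number of checks that fired.
lint_codex_config() {
  file="$1"
  findings=0

  # base_url whose quoted value ends in a slash.
  if grep -nE '^[[:space:]]*base_url[[:space:]]*=[[:space:]]*"[^"]*/"' "$file"; then
    echo 'warning: base_url has a trailing slash' >&2
    findings=$((findings + 1))
  fi

  # A provider table that reuses a reserved id.
  if grep -nE '^\[model_providers\.(openai|ollama|lmstudio)\]' "$file"; then
    echo 'warning: reserved provider id' >&2
    findings=$((findings + 1))
  fi

  # Heuristic: a long quoted string starting with sk-, ofox-, or or- is probably a key.
  if grep -nE '"(sk|ofox|or)-[A-Za-z0-9_-]{16,}"' "$file"; then
    echo 'warning: possible API key embedded in the file' >&2
    findings=$((findings + 1))
  fi

  return "$findings"
}
```

Run it as lint_codex_config ~/.codex/config.toml. Matching lines are printed with their line numbers, and a zero exit status means none of the three checks fired.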
Where this fits in the bigger Codex picture
Custom-provider configuration represents one of three typical implementation steps:
- Installation (see the complete official Codex CLI installation guide)
- Routing (covered in this article)
- Day-to-day usage patterns (see the real-world Codex CLI workflow)
If you are still evaluating Codex CLI against alternatives, start with the comparison between Claude Code, Codex CLI, Cursor, and DeepSeek TUI. If the gateway question itself is unresolved, the guide to LLM API gateway usage and selection covers the rationale before any configuration.
Closing
Codex CLI's custom-provider support used to be an undocumented convention. In 2026 it is a first-class part of the configuration and worth learning properly, particularly once you juggle several API keys, where environment variables turn into friction. Forty lines of TOML give you a coding stack where the model is a command-line flag rather than operational overhead.
Originally published on ofox.ai/blog.