OpenAI's Codex CLI is one of the best terminal-based coding agents available. It reads your codebase, runs commands, edits files, and iterates on code -- all from your terminal.
But here is what most developers miss: you are not locked into GPT models. Codex CLI supports custom OpenAI-compatible API endpoints, which means you can route it through any provider that speaks the OpenAI wire protocol. Claude, DeepSeek, Gemini, Mistral -- all fair game.
This guide shows you exactly how to set it up.
Why Use Other Models with Codex?
Different models have different strengths. Sticking to one model for every task leaves performance (and money) on the table:
- Claude Sonnet 4.6 / Opus 4.7 -- Superior at multi-step reasoning, complex refactors, and understanding large codebases. Fewer hallucinated function calls.
- DeepSeek V3 -- Extremely cost-effective for bulk operations: test generation, boilerplate, documentation, translations. ~90% cheaper than GPT-5.5.
- Gemini 2.5 Pro -- Strong at multimodal tasks and long-context analysis. 1M+ token context window.
- GPT-5.5 -- Still the default and a solid all-rounder. Best Codex integration since it is the native model.
The play: use a gateway that gives you one API key for all models, then swap models in Codex depending on the task.
Setup: Three Methods
Method 1: Environment Variables (Quick)
The fastest way. Set two environment variables and launch Codex:
# In your ~/.zshrc or ~/.bashrc
export OPENAI_API_KEY="your-gateway-api-key"
export OPENAI_BASE_URL="https://api.futurmix.ai/v1"
Reload your shell and run Codex with a specific model:
source ~/.zshrc
codex --model claude-sonnet-4-6 "refactor this function to use async/await"
That is it. Codex sends requests to your gateway instead of OpenAI directly.
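Before launching Codex, it is worth a quick smoke test to confirm the gateway actually answers. A minimal sketch, assuming the gateway implements the standard OpenAI-compatible /v1/chat/completions route and exposes the model ID shown:
# Send one tiny request through the gateway; a JSON completion back means routing works
curl -s https://api.futurmix.ai/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-6", "messages": [{"role": "user", "content": "ping"}]}'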
Method 2: config.toml (Recommended for Multiple Providers)
For a more permanent setup, edit ~/.codex/config.toml. This lets you define named providers and switch between them:
# ~/.codex/config.toml
# Default model
model = "claude-sonnet-4-6"
# Custom provider pointing to your gateway
model_provider = "gateway"
[model_providers.gateway]
name = "FuturMix Gateway"
base_url = "https://api.futurmix.ai/v1"
wire_api = "responses"
env_key = "FUTURMIX_API_KEY"
Then set the API key in your shell:
export FUTURMIX_API_KEY="sk-your-key-here"
Now Codex uses Claude Sonnet 4.6 by default. Override per-session with --model:
codex --model deepseek-chat "generate unit tests for src/utils/"
codex --model claude-opus-4-7 "find and fix the race condition in the worker pool"
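Named providers can coexist, which is the point of this method. A sketch of adding a second one; the backup endpoint and env var name here are hypothetical, and the -c key=value override assumes a recent Codex release:
# Append a second provider block (hypothetical endpoint and key name)
cat >> ~/.codex/config.toml <<'EOF'
[model_providers.backup]
name = "Backup Gateway"
base_url = "https://backup-gateway.example.com/v1"
wire_api = "responses"
env_key = "BACKUP_API_KEY"
EOF
# Select it for a single session via a config override
codex -c model_provider=backup --model deepseek-chat "summarize the latest diff"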
Method 3: Quick Override Without Editing Config
If you just want to try it once without changing any config files:
OPENAI_BASE_URL="https://api.futurmix.ai/v1" \
OPENAI_API_KEY="sk-your-key" \
codex --model claude-sonnet-4-6 "explain this codebase"
Best Models for Different Codex Tasks
Not every task needs the most expensive model. Here is a practical breakdown:
| Task | Recommended Model | Input/Output Cost | Why |
|---|---|---|---|
| Complex refactoring | Claude Opus 4.7 | $4.50 / $22.50 | Best multi-step reasoning |
| General coding | Claude Sonnet 4.6 | $2.70 / $13.50 | Strong balance of speed + quality |
| Quick fixes, linting | Claude Haiku 4.5 | $0.90 / $4.50 | Fast and cheap |
| Bulk test generation | DeepSeek V3 | $0.19 / $0.77 | 90%+ cheaper, good enough quality |
| Boilerplate / docs | DeepSeek V3 | $0.19 / $0.77 | No need to pay premium for templates |
| Code review | GPT-5.5 | $2.10 / $8.40 | Solid all-rounder |
| Long file analysis | Gemini 2.5 Pro | Varies | 1M+ context window |
Prices shown are per million tokens through a gateway (discounted).
Cost Comparison: Direct vs. Gateway
Using models through a gateway like FuturMix is cheaper than going direct to each provider. Here is the math:
| Model | Direct (In/Out per 1M) | Gateway (In/Out per 1M) | Savings |
|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 / $15.00 | $2.70 / $13.50 | 10% off |
| Claude Opus 4.7 | $5.00 / $25.00 | $4.50 / $22.50 | 10% off |
| Claude Haiku 4.5 | $1.00 / $5.00 | $0.90 / $4.50 | 10% off |
| GPT-5.5 | $3.00 / $12.00 | $2.10 / $8.40 | 30% off |
| DeepSeek V3 | $0.27 / $1.10 | $0.19 / $0.77 | 30% off |
On a typical coding session burning 500K input + 100K output tokens, switching from GPT-5.5 direct ($1.50 + $1.20 = $2.70) to DeepSeek V3 via gateway ($0.095 + $0.077 = $0.17) saves you 94%.
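You can reproduce that arithmetic in one line if you want to check other token mixes:
# cost = (input_tokens / 1M) * input_price + (output_tokens / 1M) * output_price
awk 'BEGIN {
  gpt = 0.5 * 3.00 + 0.1 * 12.00;  # GPT-5.5 direct: 500K in, 100K out
  ds  = 0.5 * 0.19 + 0.1 * 0.77;   # DeepSeek V3 via gateway
  printf "direct: $%.2f  gateway: $%.2f  savings: %.0f%%\n", gpt, ds, (1 - ds / gpt) * 100
}'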
Pro Tips for Cost Optimization
1. Use model aliases in your shell
# Add to ~/.zshrc
alias codex-cheap='codex --model deepseek-chat'
alias codex-smart='codex --model claude-sonnet-4-6'
alias codex-max='codex --model claude-opus-4-7'
Now run codex-cheap "add docstrings to all functions in src/" for bulk tasks.
2. Match model to task complexity
Do not use Opus for generating boilerplate. Do not use DeepSeek for complex architectural decisions. The order-of-magnitude price difference exists for a reason.
3. Use sandbox mode for safety
When running with less-tested models, tighten the sandbox:
codex --model deepseek-chat --sandbox read-only "analyze this codebase"
4. Set a budget-friendly default
In config.toml, set your default to a mid-tier model and only escalate when needed:
model = "claude-sonnet-4-6"
Works With Other AI Coding Tools Too
The same gateway setup works across the entire AI coding tool ecosystem. One API key, every tool:
| Tool | Config Method | What to Set |
|---|---|---|
| Codex CLI | config.toml or env vars | OPENAI_BASE_URL + OPENAI_API_KEY |
| Aider | --openai-api-base flag | OPENAI_API_BASE env var |
| Claude Code | Direct API key | ANTHROPIC_API_KEY + ANTHROPIC_BASE_URL |
| Cursor | Settings > Models | Custom OpenAI-compatible endpoint |
| Continue | config.json provider block | apiBase field in provider config |
| Roo Code | Settings > Provider | Custom API URL + key |
| Cline | Settings > API Provider | OpenAI-compatible endpoint |
Set up the gateway once, use it everywhere.
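Concretely, that can be one block in your shell rc. The variable names follow the table above; whether ANTHROPIC_BASE_URL can point at the same gateway depends on it also speaking the Anthropic wire protocol, which is an assumption here:
# One gateway key, exported under each tool's expected variable name
export FUTURMIX_API_KEY="sk-your-key"
export OPENAI_API_KEY="$FUTURMIX_API_KEY"             # Codex CLI, Continue
export OPENAI_BASE_URL="https://api.futurmix.ai/v1"   # Codex CLI
export OPENAI_API_BASE="$OPENAI_BASE_URL"             # Aider
export ANTHROPIC_API_KEY="$FUTURMIX_API_KEY"          # Claude Code
export ANTHROPIC_BASE_URL="https://api.futurmix.ai"   # Claude Code, if the gateway supports it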
Troubleshooting
"Model not found" error
The model name you pass to --model must match the gateway's model ID exactly. Check your provider's model list. Common mistake: using claude-3.5-sonnet instead of the correct identifier like claude-sonnet-4-6.
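Most OpenAI-compatible gateways expose the standard model listing, so you can read the exact IDs instead of guessing (jq is optional here):
# Print every model ID the gateway exposes
curl -s -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.futurmix.ai/v1/models | jq -r '.data[].id'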
"Authentication failed"
Make sure OPENAI_API_KEY (or the env_key you defined in config.toml) is set and exported in your current shell session. Run echo $OPENAI_API_KEY to verify.
Responses API vs. Chat Completions
Codex CLI prefers the Responses API (/v1/responses). If your gateway only supports Chat Completions, set wire_api = "chat" in your provider config:
[model_providers.gateway]
base_url = "https://api.futurmix.ai/v1"
wire_api = "chat"
env_key = "FUTURMIX_API_KEY"
Slow responses with large codebases
Some models have lower throughput than GPT. If Codex feels slow, try a faster model for the initial scan and switch to a smarter model for the actual edit.
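A concrete version of that two-stage pattern, reusing the flags shown earlier:
# Cheap, read-only pass for orientation; stronger model for the actual change
codex --model deepseek-chat --sandbox read-only "map the modules involved in request handling"
codex --model claude-opus-4-7 "fix the race condition in the worker pool"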
Config not loading
Codex reads config from ~/.codex/config.toml. Make sure the directory exists:
mkdir -p ~/.codex
Configuration priority: CLI flags > profile settings > config.toml defaults.
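Those profile settings live in the same config.toml. A sketch, assuming your Codex build supports [profiles] sections (recent releases do); the profile names are invented:
# Define two profiles that pin a model + provider combination
cat >> ~/.codex/config.toml <<'EOF'
[profiles.cheap]
model = "deepseek-chat"
model_provider = "gateway"

[profiles.max]
model = "claude-opus-4-7"
model_provider = "gateway"
EOF
# Select one at launch
codex --profile cheap "add docstrings to all functions in src/"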
Get Started
FuturMix gives you one API key for 22+ models -- Claude, GPT, DeepSeek, Gemini, Mistral, and more. OpenAI-compatible endpoint, so it works with Codex CLI out of the box. Models are 10-30% cheaper than going direct.
- Sign up at futurmix.ai
- Grab your API key
- Set OPENAI_BASE_URL=https://api.futurmix.ai/v1 and your key
- Run codex --model claude-sonnet-4-6 "your task here"
Stop paying full price for one model. Use the right model for every task.