OpenAI's Codex CLI ships with a great editor-agent UX: shell tool, apply_patch, plan tracking, the lot. The catch — as of February 2026 it only speaks the OpenAI Responses API. Chat Completion support was dropped (codex-rs/model-provider-info/src/lib.rs: the WireApi enum has one variant, Responses). If you wanted to point it at a Chat-Completion-only endpoint — Ollama, LM Studio, your favorite Llama runner — you're out of luck.
But Codex CLI is happy to talk to any server that speaks Responses. It has a model_provider config block exactly for that. So if you can stand up a Responses-shaped HTTP endpoint backed by the model of your choice, Codex becomes a generic front-end and you choose the brain.
Here's the trick I've been using: a 50-line C# script that runs as both an OpenAI Chat Completion server and a Responses API server, on top of Microsoft.Extensions.AI's vendor-neutral IChatClient abstraction. I then point it at OpenRouter — one API key, hundreds of models including Claude, Gemini, Llama, GPT, you name it — and tell Codex to talk to my local script instead of OpenAI.
End result: OpenAI Codex CLI running on Anthropic's Claude 3.5 Sonnet (or whichever model I'm feeling like that day).
The pieces
I'm using Cadenza.Agent, an MSBuild SDK I ship that turns a single .cs file into a runnable agent server. It's part of a small family of single-file scripting SDKs for .NET 10's file-based programs — same idea as dotnet run script.cs but with a richer Tier-1 API (Tool, UseOllama, UseOpenAi, Run, etc.). The Agent variant exposes:
-
POST /v1/chat/completions— for Aider / Continue / Cursor / Copilot BYOK / sgpt -
POST /v1/responses— for Codex CLI
Both are backed by the same IChatClient you configure. Switch the backend and the wire-format stays.
For the LLM I'm using OpenRouter, which speaks OpenAI's Chat Completion wire format with a different base URL — perfect for Microsoft.Extensions.AI.OpenAI's drop-in ChatClient. One env var, any model.
For Codex's configuration I'm using its CODEX_HOME environment variable trick: instead of editing ~/.codex/config.toml, you point Codex at a sample-local directory and it loads a fresh config.toml from there. Means I can ship a self-contained sample that never touches the user's global config.
The script
The entire backend, in one file:
#!/usr/bin/env dotnet run
#:sdk Cadenza.Agent@1.0.14
using System.ClientModel;
using OpenAI;
var apiKey = Env.Get("OPENROUTER_API_KEY")
?? throw new InvalidOperationException("OPENROUTER_API_KEY env var missing");
var model = Env.Get("OPENROUTER_MODEL") ?? "anthropic/claude-3.5-sonnet";
ServedModelName = "cadenza-codex-openrouter";
// Generate a sample-local Codex home directory.
var codexHome = Path.Combine(Env.Cwd, ".cadenza-codex-openrouter");
MakeDir(codexHome);
var catalogPath = Path.Combine(codexHome, "cadenza-catalog.json").Replace('\\', '/');
var configToml = $"""
model = "cadenza-codex-openrouter"
model_provider = "cadenza"
model_catalog_json = "{catalogPath}"
[model_providers.cadenza]
name = "Cadenza.Agent (OpenRouter-backed)"
base_url = "http://localhost:8080/v1"
wire_api = "responses"
env_key = "CADENZA_API_KEY"
stream_idle_timeout_ms = 300000
""";
WriteText(Path.Combine(codexHome, "config.toml"), configToml);
// Catalog JSON: declares the served model id to Codex so it stops printing
// "Defaulting to fallback metadata". Fields match codex-rs/protocol/src/
// openai_models.rs ModelInfo schema — every key is required.
var catalogJson = """
{
"models": [{
"slug": "cadenza-codex-openrouter",
"display_name": "Cadenza (OpenRouter)",
"description": "OpenRouter-backed agent served by Cadenza.Agent",
"supported_reasoning_levels": [],
"shell_type": "default",
"visibility": "list",
"supported_in_api": true,
"priority": 50,
"availability_nux": null,
"upgrade": null,
"base_instructions": "",
"supports_reasoning_summaries": false,
"support_verbosity": false,
"default_verbosity": null,
"apply_patch_tool_type": "freeform",
"truncation_policy": { "mode": "tokens", "limit": 8192 },
"supports_parallel_tool_calls": true,
"context_window": 200000,
"max_context_window": 200000,
"auto_compact_token_limit": 180000,
"effective_context_window_percent": 95,
"experimental_supported_tools": []
}]
}
""";
WriteText(Path.Combine(codexHome, "cadenza-catalog.json"), catalogJson);
WriteLine($"Codex config generated at: {codexHome}");
WriteLine("In another terminal, run:");
WriteLine($" $env:CODEX_HOME = \"{codexHome}\"");
WriteLine($" $env:CADENZA_API_KEY = \"any-non-empty-string\"");
WriteLine($" codex");
// Wire up OpenRouter as the LLM backend.
var openAiOptions = new OpenAIClientOptions { Endpoint = new Uri("https://openrouter.ai/api/v1") };
var chatClient = new OpenAI.Chat.ChatClient(model, new ApiKeyCredential(apiKey), openAiOptions)
.AsIChatClient();
UseChatClient(chatClient);
await Run();
That's it. No project file, no .csproj, no Program.cs. The #:sdk directive at the top tells the .NET 10 file-based program system to use Cadenza.Agent as the SDK, which pulls in the HTTP server, the Responses wire format, all the package references — and exposes Tool, UseOllama, UseChatClient, Run as bare names you can call directly.
Running it
Save the script as agent-codex-openrouter.cs and:
# Terminal 1 — start the agent server
$env:OPENROUTER_API_KEY = "sk-or-v1-..."
$env:OPENROUTER_MODEL = "anthropic/claude-3.5-sonnet" # or any OpenRouter slug
dotnet run agent-codex-openrouter.cs
The first run pulls dependencies — Microsoft.Extensions.AI, the OpenAI SDK, ASP.NET Core. After that it boots in well under a second. The script prints exactly what you need in the second terminal:
Codex config generated at: D:\work\.cadenza-codex-openrouter
In another terminal, run:
$env:CODEX_HOME = "D:\work\.cadenza-codex-openrouter"
$env:CADENZA_API_KEY = "any-non-empty-string"
codex
Paste those into another terminal, run codex, and you're chatting with Claude 3.5 Sonnet (or whichever OpenRouter model you picked) through the Codex UX. Tools like shell and apply_patch are sent by Codex itself in every request; the agent forwards them to the model and streams the model's function_call outputs back so Codex executes them locally.
What's happening behind the scenes
When Codex sends POST /v1/responses, the agent does this:
-
Parse the Responses input. Codex sends a
message/function_call/function_call_outputarray; we flatten it intoMicrosoft.Extensions.AI'sIList<ChatMessage>shape. -
Honor
previous_response_id. Codex chains turns with this id rather than re-sending the full history; the agent keeps a bounded in-memory dictionary of past turns so it can reconstruct context. -
Pass through Codex's tools. Codex's
shell,apply_patch,update_planarrive as raw schemas. We declare them to the model asPassthroughFunctioninstances that have a JSON schema but no real handler — the function-invocation middleware is bypassed for this endpoint, so any function call the model emits streams straight back to Codex. -
Call
IChatClient.GetStreamingResponseAsync. This dispatches to whichever backend you configured — OpenRouter, Ollama, OpenAI, Anthropic, Azure OpenAI. -
Re-emit as Responses SSE. The
ChatResponseUpdatestream gets translated into the ~15 SSE event types Codex expects:response.created,response.in_progress,response.output_item.added,response.output_text.delta,response.function_call_arguments.delta,response.completed, and friends.
The IChatClient abstraction is the trick that makes this composable. Cadenza.Agent doesn't care that OpenRouter is "really" Anthropic-this-time-Claude-next-time-Llama; it sees a chat client, calls it, and serializes whatever comes back into the wire format Codex wants.
The CODEX_HOME pattern
I want to stop and praise this. Codex CLI honors a CODEX_HOME environment variable that overrides where it looks for config.toml — instead of ~/.codex/, it reads from whatever directory you point at. The sample uses this to its full effect: it generates a sample-local directory with its own config.toml and cadenza-catalog.json, and prints the exact $env:CODEX_HOME = ... line to paste.
The result: your global ~/.codex/config.toml stays untouched. Different samples — Ollama backend, OpenRouter backend, gpt-5 reasoning effort tweaks — get their own isolated directories. You can have ten of them and they don't interfere. Want to share the setup with a teammate? Hand them the .cs file; their codex command points at the local directory the script generated.
Silencing the "Defaulting to fallback metadata" warning
If you point Codex at a model id it doesn't recognize, it falls back to default metadata for context window and output limits — and prints a warning every turn:
⚠ Model metadata for `cadenza-codex-openrouter` not found.
Defaulting to fallback metadata; this can degrade performance and cause issues.
This is suppressed by the model_catalog_json config key pointing at a JSON file that declares your slug. The schema is codex-rs/protocol/src/openai_models.rs::ModelInfo — 17 required fields. The sample includes a complete catalog entry; if you swap to a model with a smaller context window (e.g. openai/gpt-4o-mini at 128K), lower the context_window and max_context_window accordingly. Codex truncates prompts to this number, so over-declaring causes silent token overflows on the backing model.
Note also: model_catalog_json replaces Codex's bundled catalog rather than merging. If you want gpt-5-codex to keep working alongside your custom slug, include it in your JSON too.
One footgun I hit (and fixed)
The first time I ran this, Codex refused to start:
Error loading configuration: failed to parse model_catalog_json path
`...\cadenza-catalog.json` as JSON: expected value at line 1 column 1
The cause was a BOM. .NET's Encoding.UTF8 is the BOM-emitting variant, so File.WriteAllText(path, content, Encoding.UTF8) prepends EF BB BF before your data. Rust's serde_json (which Codex uses) rejects this — strict spec compliance: RFC 8259 says JSON implementations MUST NOT add a BOM.
Cadenza's Fs.WriteText had inherited that BOM-emitting default. Fixed by switching to new UTF8Encoding(encoderShouldEmitUTF8Identifier: false) and shipping the SDK as 1.0.14. The same fix applies to Console.OutputEncoding — without it, dotnet-script | jq would corrupt the pipe.
Worth checking your own .NET code that writes files for strict parsers: if it goes through File.WriteAllText(path, text, Encoding.UTF8), you're emitting a BOM. The fix is one line:
File.WriteAllText(path, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));
Why I think this pattern matters
Codex CLI's tool loop is genuinely useful. The Responses API lock-in feels like the kind of vendor coupling that, left unchecked, kills the open-tool ecosystem. The model_providers config + wire_api = "responses" escape hatch is OpenAI explicitly saying "we accept you might want this elsewhere" — and the right move is to take them up on it.
Once you have a Responses server you control, the ecosystem opens up. Want Codex on a $0/month local Ollama model for offline work? Swap UseChatClient for UseOllama — same script, same Codex config, different brain. Want to inject a project-pinned system prompt every Codex session sees? Add it before Run(). Want to log every Codex turn for audit? Wrap the IChatClient with your own middleware. Want to round-robin between OpenRouter and a local model based on prompt size? Write the logic in C# and serve through the same endpoint.
The single-file format is what makes it sustainable. There's no project to maintain, no SDK to manage, no separate binary to ship — just a .cs file you copy into your repo. If dotnet run script.cs is available (it is on .NET 10+), the script runs.
Try it
Install .NET 10, then:
dotnet new install Cadenza.Templates
dotnet new cadenza-agent -n my-codex-backend -o ./my-codex-backend
cd my-codex-backend
# Edit my-codex-backend.cs to use the OpenRouter pattern above
$env:OPENROUTER_API_KEY = "sk-or-v1-..."
dotnet run my-codex-backend.cs
Or grab the ready-to-run sample from the Cadenza repository — agent-codex-openrouter.cs is the version above. The repo also has agent-codex-backend.cs (Ollama variant) and agent-openrouter.cs (Chat Completion variant for Aider / Continue / Cursor).
If this is useful, let me know what backend you wire up. I'm curious whether anyone gets Codex running on a fine-tuned local model with a local fallback for offline coding — that's the next experiment on my list.
Cadenza is MIT-licensed. Source: https://github.com/rkttu/cadenza. The Cadenza.Agent package ships at 1.0.14 as of writing.
Top comments (0)