DEV Community

Cover image for Run OpenAI Codex CLI on Claude, Gemini, or Llama — in 50 lines of C#
Jung Hyun, Nam
Jung Hyun, Nam

Posted on

Run OpenAI Codex CLI on Claude, Gemini, or Llama — in 50 lines of C#

OpenAI's Codex CLI ships with a great editor-agent UX: shell tool, apply_patch, plan tracking, the lot. The catch — as of February 2026 it only speaks the OpenAI Responses API. Chat Completion support was dropped (codex-rs/model-provider-info/src/lib.rs: the WireApi enum has one variant, Responses). If you wanted to point it at a Chat-Completion-only endpoint — Ollama, LM Studio, your favorite Llama runner — you're out of luck.

But Codex CLI is happy to talk to any server that speaks Responses. It has a model_provider config block exactly for that. So if you can stand up a Responses-shaped HTTP endpoint backed by the model of your choice, Codex becomes a generic front-end and you choose the brain.

Here's the trick I've been using: a 50-line C# script that runs as both an OpenAI Chat Completion server and a Responses API server, on top of Microsoft.Extensions.AI's vendor-neutral IChatClient abstraction. I then point it at OpenRouter — one API key, hundreds of models including Claude, Gemini, Llama, GPT, you name it — and tell Codex to talk to my local script instead of OpenAI.

End result: OpenAI Codex CLI running on Anthropic's Claude 3.5 Sonnet (or whichever model I'm feeling like that day).

The pieces

I'm using Cadenza.Agent, an MSBuild SDK I ship that turns a single .cs file into a runnable agent server. It's part of a small family of single-file scripting SDKs for .NET 10's file-based programs — same idea as dotnet run script.cs but with a richer Tier-1 API (Tool, UseOllama, UseOpenAi, Run, etc.). The Agent variant exposes:

  • POST /v1/chat/completions — for Aider / Continue / Cursor / Copilot BYOK / sgpt
  • POST /v1/responses — for Codex CLI

Both are backed by the same IChatClient you configure. Switch the backend and the wire-format stays.

For the LLM I'm using OpenRouter, which speaks OpenAI's Chat Completion wire format with a different base URL — perfect for Microsoft.Extensions.AI.OpenAI's drop-in ChatClient. One env var, any model.

For Codex's configuration I'm using its CODEX_HOME environment variable trick: instead of editing ~/.codex/config.toml, you point Codex at a sample-local directory and it loads a fresh config.toml from there. Means I can ship a self-contained sample that never touches the user's global config.

The script

The entire backend, in one file:

#!/usr/bin/env dotnet run
#:sdk Cadenza.Agent@1.0.14

using System.ClientModel;
using OpenAI;

var apiKey = Env.Get("OPENROUTER_API_KEY")
    ?? throw new InvalidOperationException("OPENROUTER_API_KEY env var missing");
var model = Env.Get("OPENROUTER_MODEL") ?? "anthropic/claude-3.5-sonnet";

ServedModelName = "cadenza-codex-openrouter";

// Generate a sample-local Codex home directory.
var codexHome = Path.Combine(Env.Cwd, ".cadenza-codex-openrouter");
MakeDir(codexHome);

var catalogPath = Path.Combine(codexHome, "cadenza-catalog.json").Replace('\\', '/');
var configToml = $"""
    model          = "cadenza-codex-openrouter"
    model_provider = "cadenza"
    model_catalog_json = "{catalogPath}"

    [model_providers.cadenza]
    name     = "Cadenza.Agent (OpenRouter-backed)"
    base_url = "http://localhost:8080/v1"
    wire_api = "responses"
    env_key  = "CADENZA_API_KEY"
    stream_idle_timeout_ms = 300000
    """;
WriteText(Path.Combine(codexHome, "config.toml"), configToml);

// Catalog JSON: declares the served model id to Codex so it stops printing
// "Defaulting to fallback metadata". Fields match codex-rs/protocol/src/
// openai_models.rs ModelInfo schema — every key is required.
var catalogJson = """
    {
      "models": [{
        "slug": "cadenza-codex-openrouter",
        "display_name": "Cadenza (OpenRouter)",
        "description": "OpenRouter-backed agent served by Cadenza.Agent",
        "supported_reasoning_levels": [],
        "shell_type": "default",
        "visibility": "list",
        "supported_in_api": true,
        "priority": 50,
        "availability_nux": null,
        "upgrade": null,
        "base_instructions": "",
        "supports_reasoning_summaries": false,
        "support_verbosity": false,
        "default_verbosity": null,
        "apply_patch_tool_type": "freeform",
        "truncation_policy": { "mode": "tokens", "limit": 8192 },
        "supports_parallel_tool_calls": true,
        "context_window": 200000,
        "max_context_window": 200000,
        "auto_compact_token_limit": 180000,
        "effective_context_window_percent": 95,
        "experimental_supported_tools": []
      }]
    }
    """;
WriteText(Path.Combine(codexHome, "cadenza-catalog.json"), catalogJson);

WriteLine($"Codex config generated at: {codexHome}");
WriteLine("In another terminal, run:");
WriteLine($"  $env:CODEX_HOME      = \"{codexHome}\"");
WriteLine($"  $env:CADENZA_API_KEY = \"any-non-empty-string\"");
WriteLine($"  codex");

// Wire up OpenRouter as the LLM backend.
var openAiOptions = new OpenAIClientOptions { Endpoint = new Uri("https://openrouter.ai/api/v1") };
var chatClient = new OpenAI.Chat.ChatClient(model, new ApiKeyCredential(apiKey), openAiOptions)
    .AsIChatClient();

UseChatClient(chatClient);

await Run();
Enter fullscreen mode Exit fullscreen mode

That's it. No project file, no .csproj, no Program.cs. The #:sdk directive at the top tells the .NET 10 file-based program system to use Cadenza.Agent as the SDK, which pulls in the HTTP server, the Responses wire format, all the package references — and exposes Tool, UseOllama, UseChatClient, Run as bare names you can call directly.

Running it

Save the script as agent-codex-openrouter.cs and:

# Terminal 1 — start the agent server
$env:OPENROUTER_API_KEY = "sk-or-v1-..."
$env:OPENROUTER_MODEL   = "anthropic/claude-3.5-sonnet"  # or any OpenRouter slug
dotnet run agent-codex-openrouter.cs
Enter fullscreen mode Exit fullscreen mode

The first run pulls dependencies — Microsoft.Extensions.AI, the OpenAI SDK, ASP.NET Core. After that it boots in well under a second. The script prints exactly what you need in the second terminal:

Codex config generated at: D:\work\.cadenza-codex-openrouter

In another terminal, run:
  $env:CODEX_HOME      = "D:\work\.cadenza-codex-openrouter"
  $env:CADENZA_API_KEY = "any-non-empty-string"
  codex
Enter fullscreen mode Exit fullscreen mode

Paste those into another terminal, run codex, and you're chatting with Claude 3.5 Sonnet (or whichever OpenRouter model you picked) through the Codex UX. Tools like shell and apply_patch are sent by Codex itself in every request; the agent forwards them to the model and streams the model's function_call outputs back so Codex executes them locally.

What's happening behind the scenes

When Codex sends POST /v1/responses, the agent does this:

  1. Parse the Responses input. Codex sends a message / function_call / function_call_output array; we flatten it into Microsoft.Extensions.AI's IList<ChatMessage> shape.
  2. Honor previous_response_id. Codex chains turns with this id rather than re-sending the full history; the agent keeps a bounded in-memory dictionary of past turns so it can reconstruct context.
  3. Pass through Codex's tools. Codex's shell, apply_patch, update_plan arrive as raw schemas. We declare them to the model as PassthroughFunction instances that have a JSON schema but no real handler — the function-invocation middleware is bypassed for this endpoint, so any function call the model emits streams straight back to Codex.
  4. Call IChatClient.GetStreamingResponseAsync. This dispatches to whichever backend you configured — OpenRouter, Ollama, OpenAI, Anthropic, Azure OpenAI.
  5. Re-emit as Responses SSE. The ChatResponseUpdate stream gets translated into the ~15 SSE event types Codex expects: response.created, response.in_progress, response.output_item.added, response.output_text.delta, response.function_call_arguments.delta, response.completed, and friends.

The IChatClient abstraction is the trick that makes this composable. Cadenza.Agent doesn't care that OpenRouter is "really" Anthropic-this-time-Claude-next-time-Llama; it sees a chat client, calls it, and serializes whatever comes back into the wire format Codex wants.

The CODEX_HOME pattern

I want to stop and praise this. Codex CLI honors a CODEX_HOME environment variable that overrides where it looks for config.toml — instead of ~/.codex/, it reads from whatever directory you point at. The sample uses this to its full effect: it generates a sample-local directory with its own config.toml and cadenza-catalog.json, and prints the exact $env:CODEX_HOME = ... line to paste.

The result: your global ~/.codex/config.toml stays untouched. Different samples — Ollama backend, OpenRouter backend, gpt-5 reasoning effort tweaks — get their own isolated directories. You can have ten of them and they don't interfere. Want to share the setup with a teammate? Hand them the .cs file; their codex command points at the local directory the script generated.

Silencing the "Defaulting to fallback metadata" warning

If you point Codex at a model id it doesn't recognize, it falls back to default metadata for context window and output limits — and prints a warning every turn:

⚠ Model metadata for `cadenza-codex-openrouter` not found.
   Defaulting to fallback metadata; this can degrade performance and cause issues.
Enter fullscreen mode Exit fullscreen mode

This is suppressed by the model_catalog_json config key pointing at a JSON file that declares your slug. The schema is codex-rs/protocol/src/openai_models.rs::ModelInfo — 17 required fields. The sample includes a complete catalog entry; if you swap to a model with a smaller context window (e.g. openai/gpt-4o-mini at 128K), lower the context_window and max_context_window accordingly. Codex truncates prompts to this number, so over-declaring causes silent token overflows on the backing model.

Note also: model_catalog_json replaces Codex's bundled catalog rather than merging. If you want gpt-5-codex to keep working alongside your custom slug, include it in your JSON too.

One footgun I hit (and fixed)

The first time I ran this, Codex refused to start:

Error loading configuration: failed to parse model_catalog_json path
`...\cadenza-catalog.json` as JSON: expected value at line 1 column 1
Enter fullscreen mode Exit fullscreen mode

The cause was a BOM. .NET's Encoding.UTF8 is the BOM-emitting variant, so File.WriteAllText(path, content, Encoding.UTF8) prepends EF BB BF before your data. Rust's serde_json (which Codex uses) rejects this — strict spec compliance: RFC 8259 says JSON implementations MUST NOT add a BOM.

Cadenza's Fs.WriteText had inherited that BOM-emitting default. Fixed by switching to new UTF8Encoding(encoderShouldEmitUTF8Identifier: false) and shipping the SDK as 1.0.14. The same fix applies to Console.OutputEncoding — without it, dotnet-script | jq would corrupt the pipe.

Worth checking your own .NET code that writes files for strict parsers: if it goes through File.WriteAllText(path, text, Encoding.UTF8), you're emitting a BOM. The fix is one line:

File.WriteAllText(path, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));
Enter fullscreen mode Exit fullscreen mode

Why I think this pattern matters

Codex CLI's tool loop is genuinely useful. The Responses API lock-in feels like the kind of vendor coupling that, left unchecked, kills the open-tool ecosystem. The model_providers config + wire_api = "responses" escape hatch is OpenAI explicitly saying "we accept you might want this elsewhere" — and the right move is to take them up on it.

Once you have a Responses server you control, the ecosystem opens up. Want Codex on a $0/month local Ollama model for offline work? Swap UseChatClient for UseOllama — same script, same Codex config, different brain. Want to inject a project-pinned system prompt every Codex session sees? Add it before Run(). Want to log every Codex turn for audit? Wrap the IChatClient with your own middleware. Want to round-robin between OpenRouter and a local model based on prompt size? Write the logic in C# and serve through the same endpoint.

The single-file format is what makes it sustainable. There's no project to maintain, no SDK to manage, no separate binary to ship — just a .cs file you copy into your repo. If dotnet run script.cs is available (it is on .NET 10+), the script runs.

Try it

Install .NET 10, then:

dotnet new install Cadenza.Templates
dotnet new cadenza-agent -n my-codex-backend -o ./my-codex-backend
cd my-codex-backend
# Edit my-codex-backend.cs to use the OpenRouter pattern above
$env:OPENROUTER_API_KEY = "sk-or-v1-..."
dotnet run my-codex-backend.cs
Enter fullscreen mode Exit fullscreen mode

Or grab the ready-to-run sample from the Cadenza repositoryagent-codex-openrouter.cs is the version above. The repo also has agent-codex-backend.cs (Ollama variant) and agent-openrouter.cs (Chat Completion variant for Aider / Continue / Cursor).

If this is useful, let me know what backend you wire up. I'm curious whether anyone gets Codex running on a fine-tuned local model with a local fallback for offline coding — that's the next experiment on my list.


Cadenza is MIT-licensed. Source: https://github.com/rkttu/cadenza. The Cadenza.Agent package ships at 1.0.14 as of writing.

Cover Image Credit: Lukas from Unsplash

Top comments (0)