mufeng

Posted on Jun 19

Two Ways Claude Code Calls Codex: One-Shot Subprocess vs. Persistent App Server

#codex #claudecode

"Claude Code calls Codex" sounds like one feature. It's at least two different process models, and they have almost nothing in common past the name.

The first spawns a one-shot subprocess with codex exec. You hand it one explicit instruction, it produces a file or a structured result, and it exits. The second runs a persistent runtime with codex app-server and talks to it over JSON-RPC, managing threads, turns, reviews, and interrupts for work that needs to carry state across rounds.

Both let Claude Code borrow Codex. They differ on startup cost, protocol, permissions, error recovery, and the kind of task they fit. Get the distinction wrong and you either over-engineer a one-shot job or reach for a stateless call on work that needs to resume.

The conclusion first: two architectures, not two commands

| Dimension | `codex exec` one-shot subprocess | `codex app-server` persistent service |
| --- | --- | --- |
| Reference implementation | baoyu `codex-imagegen` backend | OpenAI Codex Plugin for Claude Code |
| Process shape | Spawned per task, exits when done | Long-running, reused within a session |
| Transport | Launch args, stdin, JSONL event stream | JSON-RPC requests and notifications |
| State model | Single run, no dependence on the last | Thread holds multiple turns, can resume |
| Permission posture | The example uses `danger-full-access` | Review is read-only; task can switch to `workspace-write` |
| Typical task | Image gen, file generation, single deterministic op | Code review, long delegated tasks, multi-turn work |
| Main risk | Full-access child, cold start every time | More protocol and lifecycle complexity |

The one-line test:

If you need to run once and get a single verifiable artifact, reach for codex exec.
If you need ongoing collaboration, retained context, and the ability to cancel or resume, reach for codex app-server.

Version scope: keep the numbers honest

The first thing this writeup exposed wasn't architecture. It was version accounting. I had carried over the original draft's phrasing about "the current local version," and only after checking the install records did I confirm that the marketplace source and the active plugin were not the same snapshot.

Local commands and plugin records show:

Codex CLI is 0.140.0.
The OpenAI Codex Plugin for Claude Code is 1.0.4, commit 807e03a.
The baoyu-skills marketplace source snapshot is 2.5.1, commit 441ca30.
But Claude Code's installed-plugin record still points baoyu-skills at the earlier 1.111.1 snapshot.

So the accurate way to state the baoyu-codex-imagegen analysis below is this:

It's based on the baoyu-skills v2.5.1 source snapshot in the local marketplace, not a claim that the active plugin has been upgraded to v2.5.1.

This is easy to miss. The marketplace source, the cached snapshot, and the active version can all be different commits. Read the directory name or the changelog alone and you'll write "the version I read" when you mean "the version actually running."
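One way to keep that distinction from slipping is to record both facts side by side and generate the scope statement from them. The sketch below is only an illustration: `PluginProvenance` and `describeScope` are hypothetical names, and the values are the ones listed above.

```typescript
// Hypothetical record: what was read versus what the install record points at.
interface PluginProvenance {
  marketplaceVersion: string; // version of the source snapshot that was read
  marketplaceCommit: string;  // commit of that snapshot
  installedVersion: string;   // version named in the installed-plugin record
}

// Produce a scope statement that never confuses "read" with "running".
function describeScope(name: string, p: PluginProvenance): string {
  if (p.marketplaceVersion === p.installedVersion) {
    return `${name}: source read and installed record both at v${p.installedVersion}`;
  }
  return (
    `${name}: analysis based on marketplace source v${p.marketplaceVersion} ` +
    `(${p.marketplaceCommit}); installed record points at v${p.installedVersion}`
  );
}

const scope = describeScope("baoyu-skills", {
  marketplaceVersion: "2.5.1",
  marketplaceCommit: "441ca30",
  installedVersion: "1.111.1",
});
console.log(scope);
```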

Path one: `codex exec`, Codex as a one-shot operator

What it solves

The baoyu-codex-imagegen skill has a narrow job: let a non-Codex host like Claude Code call the image_gen tool built into the Codex CLI, and save the result to a chosen path.

Tasks like that share a shape:

Clear input boundary, usually a prompt, an aspect ratio, and an output path.
Clear result boundary, usually one file and one line of structured status.
No need for multiple rounds, and no need to restore prior context.

So it skips a persistent service and spawns directly:

codex exec \
  --json \
  --sandbox danger-full-access \
  --skip-git-repo-check \
  -

If a reference image exists, it appends one or more --image arguments.

Why each flag is there

exec runs non-interactively for scripting. OpenAI's CLI docs position it as the execution path for automation and CI: run, return a result, done.

--json turns process output into line-delimited JSON events, or JSONL. The caller doesn't parse terminal display text; it reads structured events for the thread, tool calls, usage, and the final message.
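Reading JSONL from a child process mostly comes down to buffering partial chunks and parsing one line at a time. Here is a minimal sketch, assuming stdout arrives as arbitrary string chunks. `JsonlBuffer` is a hypothetical helper, and the event shapes in the usage lines are illustrative rather than the CLI's actual schema.

```typescript
// Minimal JSONL reader: buffers partial chunks and returns one parsed event per complete line.
type JsonlEvent = Record<string, unknown>;

class JsonlBuffer {
  private pending = "";

  // Feed a raw stdout chunk; returns every complete event it finished.
  push(chunk: string): JsonlEvent[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line, or "" if the chunk ended on a newline.
    this.pending = lines.pop() ?? "";
    const events: JsonlEvent[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        const parsed: unknown = JSON.parse(trimmed);
        if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
          events.push(parsed as JsonlEvent);
        }
      } catch {
        // Skip lines that are not JSON instead of failing the whole run.
      }
    }
    return events;
  }
}

// Illustrative events only: the second one is split across two chunks.
const buffer = new JsonlBuffer();
const first = buffer.push('{"type":"example.started","id":"t1"}\n{"type":"exam');
const second = buffer.push('ple.done"}\n');
console.log(first.length, second.length);
```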

--sandbox danger-full-access is there because this implementation has Codex copy the image from its default generation directory to an arbitrary target path chosen by the caller, and that copy needs full file access.

That is not a general best practice. OpenAI's docs recommend workspace-write for automation and say to avoid unnecessary full access unless the runtime is already isolated.

--skip-git-repo-check lets Codex run outside a Git repo, since image jobs may launch from a temp or plugin directory rather than a trusted repository.

The trailing - tells Codex to read the instruction from stdin. The wrapper writes the task contract with child.stdin.write(instruction) and then closes stdin.
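Put together, the spawn side could look roughly like the following. This is a sketch under assumptions, not the skill's actual spawn.ts: `buildExecArgs` and `runCodexOnce` are hypothetical names, and placing `--image` before the trailing `-` is a guess about argument order.

```typescript
import { spawn } from "node:child_process";

// Build the argument list described above.
// Assumption: --image arguments go before the trailing "-".
function buildExecArgs(referenceImages: string[] = []): string[] {
  const args = ["exec", "--json", "--sandbox", "danger-full-access", "--skip-git-repo-check"];
  for (const image of referenceImages) {
    args.push("--image", image);
  }
  args.push("-"); // read the instruction from stdin
  return args;
}

// Spawn Codex once, send the instruction over stdin, and collect raw stdout.
function runCodexOnce(
  instruction: string,
  referenceImages: string[] = [],
): Promise<{ code: number | null; stdout: string }> {
  return new Promise((resolve, reject) => {
    const child = spawn("codex", buildExecArgs(referenceImages), {
      stdio: ["pipe", "pipe", "inherit"],
    });
    let stdout = "";
    child.stdout.on("data", (chunk: Buffer) => {
      stdout += chunk.toString("utf8");
    });
    child.on("error", reject); // for example ENOENT when codex is not on PATH
    child.on("close", (code) => resolve({ code, stdout }));
    child.stdin.on("error", () => {
      // Ignore write errors such as EPIPE if the child exits early.
    });
    child.stdin.write(instruction);
    child.stdin.end(); // closing stdin marks the end of the instruction
  });
}
```

The collected stdout is still raw JSONL at this point; parsing and verification happen in the caller.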

The task contract is the real work

This path doesn't pass the user prompt straight through. It wraps a strict instruction, roughly:

TASK:
Generate an image and save it to the given path.

STEPS:
1. You must call the built-in image_gen.
2. Copy the result to the target path.
3. Check that the target file exists.
4. Return one line of JSON only.

HARD CONSTRAINTS:
- Do not call an external image API.
- Do not fake the image with a script.
- You must use image_gen to produce real pixels.

This is the "sub-agent as operator" design:

Fixed input structure.
Fixed set of allowed tools.
Fixed file side effects.
Fixed output format.
Explicit prohibitions.

For an automated pipeline, the constraints matter more than the phrasing. The caller wants a verifiable result, not an open conversation.
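A contract like this is easiest to keep fixed when it is built by a function rather than typed by hand. The sketch below paraphrases the contract above. `ImageTask` and `buildInstruction` are hypothetical names, and quoting the prompt with `JSON.stringify` is a choice made here so that newlines in user text can't break the structure.

```typescript
// Hypothetical task description for a one-shot image job.
interface ImageTask {
  prompt: string;
  aspectRatio: string;
  outputPath: string;
}

// Wrap the user prompt in a fixed contract instead of passing it straight through.
function buildInstruction(task: ImageTask): string {
  return [
    "TASK:",
    `Generate an image and save it to ${task.outputPath}.`,
    `Prompt: ${JSON.stringify(task.prompt)}`,
    `Aspect ratio: ${task.aspectRatio}`,
    "",
    "STEPS:",
    "1. You must call the built-in image_gen.",
    "2. Copy the result to the target path.",
    "3. Check that the target file exists.",
    "4. Return one line of JSON only.",
    "",
    "HARD CONSTRAINTS:",
    "- Do not call an external image API.",
    "- Do not fake the image with a script.",
    "- You must use image_gen to produce real pixels.",
  ].join("\n");
}

const instruction = buildInstruction({
  prompt: "a lighthouse at dusk\nwide shot",
  aspectRatio: "16:9",
  outputPath: "/tmp/example/lighthouse.png",
});
console.log(instruction);
```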

Don't trust self-reported success: three checks

The engineering detail worth keeping is that this implementation does not call the job done just because Codex replied "success."

It checks, in order:

Whether the JSONL events contain a thread ID.
Whether an image actually appears under $CODEX_HOME/generated_images/{threadId}/ or, if that directory check fails, whether the tool calls include a cp or mv from the generation directory to the target path.
Whether the target file actually exists and has a byte count above zero.
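The order of those checks can be sketched as a pure function over evidence the wrapper has already gathered. The `Evidence` fields and the `failedCheck` labels below are hypothetical; in a Node wrapper they would be filled from the parsed events and from filesystem calls.

```typescript
// Hypothetical evidence gathered by the wrapper after the child exits.
interface Evidence {
  threadId: string | null;        // taken from the JSONL events
  generatedImageCount: number;    // files found under the thread's generation directory
  sawCopyOrMoveToTarget: boolean; // a cp/mv tool call from that directory to the target
  targetFileBytes: number | null; // null when the target file does not exist
}

type FailedCheck = "thread_id" | "image_evidence" | "target_file";
type Verdict = { ok: true; threadId: string } | { ok: false; failedCheck: FailedCheck };

// Apply the checks in order and stop at the first one that fails.
function verifyRun(e: Evidence): Verdict {
  if (e.threadId === null || e.threadId === "") {
    return { ok: false, failedCheck: "thread_id" };
  }
  // The cp/mv tool call only counts as a fallback when the directory check finds nothing.
  if (e.generatedImageCount === 0 && !e.sawCopyOrMoveToTarget) {
    return { ok: false, failedCheck: "image_evidence" };
  }
  if (e.targetFileBytes === null || e.targetFileBytes <= 0) {
    return { ok: false, failedCheck: "target_file" };
  }
  return { ok: true, threadId: e.threadId };
}

const verdict = verifyRun({
  threadId: "t-example",
  generatedImageCount: 1,
  sawCopyOrMoveToTarget: false,
  targetFileBytes: 2048,
});
console.log(verdict);
```

Nothing in this function reads the agent's final message, which is the point: the verdict depends only on events and files.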

Failure becomes a structured error:

agent_refused
no_image_gen_tool_use
timeout
codex_not_installed
spawn_failed
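A closed set of codes like this maps naturally onto a union type. The sketch below covers only the process-level codes. Mapping ENOENT to codex_not_installed, and every other spawn error to spawn_failed, is an assumption of this example rather than something confirmed from the skill's source. `RunFailure` and `classifyProcessFailure` are hypothetical names.

```typescript
// The five codes listed above, as a closed union.
type ErrorCode =
  | "agent_refused"
  | "no_image_gen_tool_use"
  | "timeout"
  | "codex_not_installed"
  | "spawn_failed";

interface RunFailure {
  ok: false;
  code: ErrorCode;
  detail: string;
}

// Classify failures that happen before any agent output can be inspected.
// Assumption: ENOENT from spawn means the codex binary is missing.
function classifyProcessFailure(
  timedOut: boolean,
  spawnError: { code?: string; message: string } | null,
): RunFailure | null {
  if (timedOut) {
    return { ok: false, code: "timeout", detail: "wrapper deadline exceeded" };
  }
  if (spawnError === null) return null; // the process ran; inspect its events instead
  if (spawnError.code === "ENOENT") {
    return { ok: false, code: "codex_not_installed", detail: spawnError.message };
  }
  return { ok: false, code: "spawn_failed", detail: spawnError.message };
}

const failure = classifyProcessFailure(false, { code: "ENOENT", message: "spawn codex ENOENT" });
console.log(JSON.stringify(failure));
```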

The point isn't the image. It's a general principle:

An agent's natural-language reply is a claim. Files, events, and repeatable checks are evidence.

Where it fits and where it doesn't

Good fit:

Single image or file generation.
A code transform with clear boundaries.
One-off analysis that returns structured JSON.
Automation that doesn't need inherited context.

Limits:

Every run pays process and model cold-start cost.
No cross-run state by default.
With danger-full-access, the trust boundary is very wide.
Timeout, cancellation, and recovery usually fall to the wrapper to build.

Path two: `codex app-server`, Codex as a stateful service

The OpenAI Codex Plugin for Claude Code does not re-run codex exec per command. It starts codex app-server and manages an ongoing session over JSON-RPC.

OpenAI's docs define the App Server's core abstraction in three layers:

Thread: a conversation that persists.
Turn: one round of user input and agent execution inside a thread.
Item: events inside a turn, such as messages, reasoning, commands, and file edits.
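To make the nesting concrete, the three layers can be modeled as plain types. The field names below are hypothetical and much simpler than whatever the App Server actually sends. The sketch only shows that items live inside turns and turns live inside a thread.

```typescript
// Hypothetical, simplified model of the three layers.
type Item =
  | { kind: "message"; text: string }
  | { kind: "reasoning"; text: string }
  | { kind: "command"; command: string; exitCode: number | null }
  | { kind: "fileEdit"; path: string };

interface Turn {
  id: string;
  input: string; // the user input that started this round
  items: Item[]; // everything the agent did in response
}

interface Thread {
  id: string;
  turns: Turn[]; // persists across rounds, which is what makes resume possible
}

// Walk every turn in a thread and collect the paths of edited files.
function editedPaths(thread: Thread): string[] {
  const paths: string[] = [];
  for (const turn of thread.turns) {
    for (const item of turn.items) {
      if (item.kind === "fileEdit") paths.push(item.path);
    }
  }
  return paths;
}

const exampleThread: Thread = {
  id: "thread-1",
  turns: [
    { id: "turn-1", input: "fix the bug", items: [{ kind: "fileEdit", path: "src/a.ts" }] },
    {
      id: "turn-2",
      input: "add a test",
      items: [
        { kind: "command", command: "npm test", exitCode: 0 },
        { kind: "fileEdit", path: "test/a.test.ts" },
      ],
    },
  ],
};
console.log(editedPaths(exampleThread));
```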

Direct connection and broker

The plugin supports two connection modes.

Direct:

Claude Code
    |
    | stdin/stdout JSONL
    v
codex app-server

The client starts codex app-server itself and sends line-delimited JSON-RPC over stdio.
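Line-delimited framing is simple enough to sketch in a few lines: one JSON object per line, with requests carrying an `id` and notifications not. The helper names are hypothetical, and whether the wire format also carries a `jsonrpc` version field is left out here and should be checked against the App Server docs.

```typescript
interface RpcRequest {
  id: number;
  method: string;
  params?: unknown;
}

interface RpcNotification {
  method: string;
  params?: unknown;
}

type Outgoing = RpcRequest | RpcNotification;

// One message per line. JSON.stringify escapes newlines inside strings,
// so the only raw "\n" in the frame is the terminator.
function frame(message: Outgoing): string {
  return JSON.stringify(message) + "\n";
}

// Requests carry an id and expect a reply; notifications do not.
function expectsReply(message: Outgoing): message is RpcRequest {
  return "id" in message;
}

const framed = frame({ id: 1, method: "thread/start", params: { note: "line one\nline two" } });
const notified = frame({ method: "initialized" });
console.log(framed.length, notified);
```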

Broker:

Claude Code command
    |
    | Unix socket
    v
Broker
    |
    | reuse
    v
codex app-server

The plugin stores the broker endpoint in CODEX_COMPANION_APP_SERVER_ENDPOINT so review, rescue, and status commands in the same Claude Code session share one Codex runtime.

If the broker returns the busy error -32001, or the connection hits ENOENT or ECONNREFUSED, the plugin drops the broker and starts an App Server directly to retry.
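The fallback rule reduces to a small predicate. This is a sketch of the decision only, with hypothetical names (`BrokerFailure`, `shouldFallBackToDirect`); the plugin's real code may structure it differently.

```typescript
const BROKER_BUSY = -32001;

// Hypothetical shape: either a JSON-RPC error code from the broker
// or an errno string from the socket connection attempt.
interface BrokerFailure {
  rpcCode?: number;
  errno?: string;
}

// True when the client should give up on the broker and start its own App Server.
function shouldFallBackToDirect(failure: BrokerFailure): boolean {
  if (failure.rpcCode === BROKER_BUSY) return true;
  return failure.errno === "ENOENT" || failure.errno === "ECONNREFUSED";
}

console.log(shouldFallBackToDirect({ rpcCode: -32001 }));
console.log(shouldFallBackToDirect({ errno: "ETIMEDOUT" }));
```

Any other failure is left to surface as an error, so the fallback doesn't hide problems it wasn't designed for.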

That's one more layer than a one-shot subprocess, and it buys:

Runtime reuse within a session.
Thread persistence.
Background task management.
Cancel and resume.
Permission isolation between review and task.

Handshake: initialize first

Once the App Server connection is up, the client sends initialize, then an initialized notification.

The plugin passes this client identity:

{
  "title": "Codex Plugin",
  "name": "Claude Code",
  "version": "1.0.4"
}

It also uses optOutNotificationMethods to unsubscribe from some token-level delta events, keeping the structured notifications that are worth more to the caller and cutting noise.
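As a rough sketch, the two handshake messages might be built as follows. Nesting `optOutNotificationMethods` under `capabilities` is an assumption here, and the opted-out method name in the usage lines is illustrative. Check both against the App Server docs before relying on them. `buildHandshake` is a hypothetical helper.

```typescript
interface ClientInfo {
  title: string;
  name: string;
  version: string;
}

interface InitializeRequest {
  id: number;
  method: "initialize";
  params: {
    clientInfo: ClientInfo;
    // Assumption: the opt-out list is nested under capabilities.
    capabilities: { optOutNotificationMethods: string[] };
  };
}

interface InitializedNotification {
  method: "initialized";
}

// Build the request and the follow-up notification, in the order they are sent.
function buildHandshake(
  clientInfo: ClientInfo,
  optOut: string[],
): [InitializeRequest, InitializedNotification] {
  return [
    {
      id: 1,
      method: "initialize",
      params: { clientInfo, capabilities: { optOutNotificationMethods: optOut } },
    },
    { method: "initialized" },
  ];
}

const [initRequest, initNotification] = buildHandshake(
  { title: "Codex Plugin", name: "Claude Code", version: "1.0.4" },
  ["item/agentMessage/delta"], // illustrative method name
);
console.log(JSON.stringify(initRequest));
console.log(JSON.stringify(initNotification));
```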

Session model: threads and turns

The key RPC methods the plugin uses:

| Method | Purpose |
| --- | --- |
| `thread/start` | Create a new thread |
| `thread/name/set` | Name a thread |
| `thread/resume` | Resume an existing thread |
| `thread/list` | Query past threads |
| `turn/start` | Start a turn in a thread |
| `review/start` | Start a code review |
| `turn/interrupt` | Interrupt a running turn |

So the App Server isn't a single-round wrapper that "sends a prompt and waits." It's a managed session runtime.
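What "managed session runtime" means for the client is mostly bookkeeping: match replies to requests by id, and route everything without an id to a notification handler. Here is a minimal sketch of that bookkeeping. `RpcSession` is a hypothetical class, and the payloads and the notification name in the usage lines are made up.

```typescript
type RpcError = { code: number; message: string };
type RpcReply = { result: unknown; error: RpcError | undefined };

class RpcSession {
  private nextId = 1;
  private pending = new Map<number, (reply: RpcReply) => void>();
  private send: (line: string) => void;
  private onNotification: (method: string, params: unknown) => void;

  constructor(
    send: (line: string) => void,
    onNotification: (method: string, params: unknown) => void,
  ) {
    this.send = send;
    this.onNotification = onNotification;
  }

  // Send a request and remember who is waiting for the reply.
  request(method: string, params: unknown, onReply: (reply: RpcReply) => void): number {
    const id = this.nextId++;
    this.pending.set(id, onReply);
    this.send(JSON.stringify({ id, method, params }) + "\n");
    return id;
  }

  // Route one incoming line to the matching request or to the notification handler.
  receive(line: string): void {
    const msg = JSON.parse(line) as {
      id?: number;
      method?: string;
      params?: unknown;
      result?: unknown;
      error?: RpcError;
    };
    if (typeof msg.id === "number" && msg.method === undefined) {
      const onReply = this.pending.get(msg.id);
      this.pending.delete(msg.id);
      if (onReply) onReply({ result: msg.result, error: msg.error });
    } else if (typeof msg.method === "string" && msg.id === undefined) {
      this.onNotification(msg.method, msg.params);
    }
    // Server-initiated requests (both id and method) are ignored in this sketch.
  }
}

const sent: string[] = [];
const notifications: string[] = [];
const session = new RpcSession((line) => sent.push(line), (method) => notifications.push(method));
let threadResult: unknown = null;
const requestId = session.request("thread/start", {}, (reply) => {
  threadResult = reply.result;
});
session.receive(JSON.stringify({ id: requestId, result: { threadId: "hypothetical-thread" } }));
session.receive(JSON.stringify({ method: "example/notification", params: {} }));
console.log(sent[0], JSON.stringify(threadResult), notifications);
```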

Review and task have different permissions

The plugin keeps the two actions separate.

Review runs read-only, on a temporary thread, through review/start. It returns findings and does not touch code.

Task defaults to read-only. Pass --write and it switches to workspace-write. It can save the thread, and it can continue prior work with --resume or --resume-last.
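The permission split can be written down as a single mapping from action to sandbox mode. This is an illustrative sketch with hypothetical type names. It only encodes the rules stated above.

```typescript
type Sandbox = "read-only" | "workspace-write";

// Review never writes; task writes only when --write was passed.
type Action = { kind: "review" } | { kind: "task"; write: boolean };

function sandboxFor(action: Action): Sandbox {
  if (action.kind === "review") return "read-only";
  return action.write ? "workspace-write" : "read-only";
}

console.log(sandboxFor({ kind: "review" }));
console.log(sandboxFor({ kind: "task", write: false }));
console.log(sandboxFor({ kind: "task", write: true }));
```

Because review has no `write` field at all, a caller can't accidentally request a writable review.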

This is closer to what an engineering system's default should look like than "run everything with full access." Set the minimum permission by the nature of the task, then decide whether to widen write scope.

Hooks wire Codex into the Claude Code lifecycle

The plugin registers three Claude Code hooks:

SessionStart: prepare the shared runtime.
SessionEnd: clean up the broker and session resources.
Stop: an optional stop-gate review.

With the review gate on, every time Claude Code is about to stop, it can have Codex check whether the last round has a blocking problem.

The value isn't "one more model." It's putting a second model inside the delivery flow:

Claude makes a change
    |
    v
Codex reviews independently
    |
    +-- ALLOW: stop is permitted
    |
    +-- BLOCK: return findings, keep working

It has a cost. The official plugin README warns that the review gate can create long Claude/Codex loops and burn through usage fast, so don't turn it on unconditionally.
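A stop gate with a loop guard could be sketched like this. It assumes the Claude Code Stop hook contract, where the hook input carries a `stop_hook_active` flag and a `{"decision": "block", "reason": ...}` output keeps Claude working. Confirm both against the Claude Code hooks docs. The guard is a design choice in this sketch, not a description of what the plugin does, and `decideStop` is a hypothetical name.

```typescript
// Assumed subset of the Stop hook input.
interface StopHookInput {
  stop_hook_active?: boolean;
}

type Review = { verdict: "ALLOW" } | { verdict: "BLOCK"; findings: string[] };

// An empty object lets the stop proceed; a block decision sends findings back.
type StopHookOutput = { decision: "block"; reason: string } | Record<string, never>;

function decideStop(input: StopHookInput, review: Review): StopHookOutput {
  // Loop guard: if this stop already follows a blocked stop, let it through.
  if (input.stop_hook_active === true) return {};
  if (review.verdict === "BLOCK") {
    return { decision: "block", reason: review.findings.join("\n") };
  }
  return {};
}

const blocked = decideStop({}, { verdict: "BLOCK", findings: ["missing null check in parse()"] });
console.log(JSON.stringify(blocked));
```

With this guard, a single blocked stop can trigger at most one more round before Claude is allowed to stop. That trades review strictness for bounded usage.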

How to choose

When `codex exec` fits

Use a one-shot subprocess when most of these hold:

The task is a single round.
The result can be verified by a file or JSON.
You don't need to restore prior context.
Cold-start cost is acceptable.
The caller can handle timeout and retry on its own.

Examples: generate an image, convert input to a fixed format, run one analysis on a file, run a check once in CI.

When `codex app-server` fits

Use the persistent service when you need:

Multiple rounds of conversation.
Thread resumption.
Background runs and status queries.
Interruption of a running task.
Separate review and write permissions.
Integration with Claude Code's session lifecycle.

Examples: review a branch continuously, delegate a long investigation, let Codex change code and then add tests, or run an automatic second-model gate before stopping.
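The two checklists collapse into a short decision function. The trait names below are hypothetical labels for the criteria in this section. The rule is deliberately blunt: any need for state or lifecycle control selects the persistent service.

```typescript
// Hypothetical labels for the criteria listed in this section.
interface TaskTraits {
  multiTurn: boolean;
  needsResume: boolean;
  needsInterrupt: boolean;
  needsBackgroundStatus: boolean;
  needsSessionLifecycleHooks: boolean;
}

type CodexPath = "codex exec" | "codex app-server";

// Any stateful or lifecycle requirement tips the choice to the app server.
function choosePath(traits: TaskTraits): CodexPath {
  const needsService =
    traits.multiTurn ||
    traits.needsResume ||
    traits.needsInterrupt ||
    traits.needsBackgroundStatus ||
    traits.needsSessionLifecycleHooks;
  return needsService ? "codex app-server" : "codex exec";
}

const imageJob = choosePath({
  multiTurn: false,
  needsResume: false,
  needsInterrupt: false,
  needsBackgroundStatus: false,
  needsSessionLifecycleHooks: false,
});
const branchReview = choosePath({
  multiTurn: true,
  needsResume: true,
  needsInterrupt: true,
  needsBackgroundStatus: false,
  needsSessionLifecycleHooks: true,
});
console.log(imageJob, branchReview);
```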

How this was verified

This published version doesn't lean on the draft's description. I redid a minimal verification.

The steps I ran:

Read the draft and listed every factual claim about versions, commands, RPC methods, and permissions.
Ran codex --version, codex exec --help, and codex app-server --help to confirm the current CLI's commands and flags.
Checked the OpenAI plugin manifest, install records, app-server.mjs, codex.mjs, and the hook config.
Checked spawn.ts, main.ts, the version file, and the Git commit in the baoyu marketplace source.
Cross-checked against the OpenAI Codex CLI, App Server, Codex Plugin, and Claude Code Hooks docs.
Recorded "current active version" and "source snapshot I actually read" separately.

The mistake and the lesson

I first took the draft's baoyu-skills v2.5.1 as "the current local version." On further checking, the v2.5.1 marketplace source does exist locally, but Claude Code's installed-plugin record still points at an earlier snapshot.

Without checking the install record, that phrasing looks reasonable and is wrong.

The lesson:

When you analyze a local plugin, record at least the marketplace HEAD, the install cache path, the plugin manifest, and the commit. No single one of those stands in for "the version actually running."

Practical advice

One-shot tasks: hardcode the output contract

Don't write "generate an image for me" or "check my code." An automation prompt should include at least:

Goal
Allowed tools
Input and output paths
Prohibitions
Verification steps
Final return format

That cuts the uncertainty of an agent improvising, and it lets the caller judge success or failure.

Long tasks: resume with the delta only

When you resume a thread, send only what changed:

Continue the last task. Apply the first fix and add the matching test.

There's no reason to re-paste the whole background. Repeating context adds noise and can make the model misread the task boundary.

Review tasks: bind every finding to evidence

Whether you run a standard review or an adversarial one, require each finding to carry:

The file or diff actually examined.
A reproducible failure path.
A clear risk level.
A split between fact, inference, and open question.
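Those four requirements can be enforced mechanically before a finding reaches a human. The shape below is a hypothetical schema, not the plugin's review output format.

```typescript
type Risk = "low" | "medium" | "high";
type Basis = "fact" | "inference" | "open-question";

// Hypothetical finding schema carrying the four pieces of evidence.
interface Finding {
  summary: string;
  examined: string;     // the file or diff actually examined
  reproduction: string; // a reproducible failure path
  risk: Risk;
  basis: Basis;
}

// Return the evidence fields a finding is missing; an empty list means it is usable.
function missingEvidence(finding: Partial<Finding>): (keyof Finding)[] {
  const required: (keyof Finding)[] = ["examined", "reproduction", "risk", "basis"];
  return required.filter((key) => {
    const value = finding[key];
    return value === undefined || value.trim() === "";
  });
}

const vague = missingEvidence({ summary: "might be a problem", risk: "medium" });
const grounded = missingEvidence({
  summary: "parse() throws on empty input",
  examined: "src/parse.ts",
  reproduction: "call parse('') from the CLI entry point",
  risk: "high",
  basis: "fact",
});
console.log(vague, grounded);
```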

A "might be a problem" with no evidence rarely makes it into an engineering decision.

Permissions: start at the smallest scope

The order of preference:

read-only
    |
    v
workspace-write
    |
    v
danger-full-access

Widen only when the task genuinely needs a larger file scope and the runtime is trusted.
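The escalation rule is small enough to state as code. This is a sketch with hypothetical names: widening moves one step at a time along the order above, and only when both conditions hold.

```typescript
const SANDBOX_ORDER = ["read-only", "workspace-write", "danger-full-access"] as const;
type SandboxLevel = (typeof SANDBOX_ORDER)[number];

// Move one step wider only if the task needs it and the runtime is trusted.
function widen(
  current: SandboxLevel,
  needsLargerScope: boolean,
  runtimeTrusted: boolean,
): SandboxLevel {
  if (!needsLargerScope || !runtimeTrusted) return current;
  const index = SANDBOX_ORDER.indexOf(current);
  const next = Math.min(index + 1, SANDBOX_ORDER.length - 1);
  return SANDBOX_ORDER[next] ?? current;
}

console.log(widen("read-only", true, true));
console.log(widen("workspace-write", true, false));
```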

Closing

"Claude Code calls Codex" is not one calling convention.

codex exec is a one-shot, stateless subprocess that's easy to wrap. It fits single tasks with clear boundaries and verifiable results.

codex app-server is a stateful, resumable, manageable agent service. It fits code review, task delegation, and complex work that needs ongoing collaboration.

The real selection criteria aren't "which is more advanced." They are:

Does the task need state?
Can the result be verified in one shot?
Do you need interruption, resume, and background management?
Can permissions be graded by action?
Is the extra protocol complexity worth it?

Simple tasks get a simple process. Ongoing collaboration gets a stateful service. Draw that line clearly and the system gets easier to understand and to maintain.
