DEV Community

KingGyu

I open-sourced Codex Spark: traceable UI delegation for Codex

I open-sourced Codex Spark, a Codex plugin for delegating concrete Computer Use and Browser Use tasks to GPT-5.3 Codex Spark subagents.

Repo: https://github.com/KingGyuSuh/awesome-codex-spark

The Problem

As Codex sessions get better at long-horizon reasoning, the bottleneck is not always "can the model click the button?"

Often the better question is:

Should the most reasoning-heavy model spend its context and tokens on mechanical UI work?

If the parent session is doing architecture, code review, release verification, or product reasoning, I want it focused there. A visible-world task like opening a page, reading UI state, pasting approved content, or filling one approved form should be delegated as bounded execution.

The Pattern

Codex Spark uses one skill: $codex-spark-delegate.

The parent session remains responsible for:

  • understanding the user request;
  • choosing exactly one surface: Computer Use or Browser Use;
  • confirming exact side effects;
  • setting the target, content, limits, and verification criteria;
  • reading the returned trace and deciding recovery.

The Spark child is not a planner. It is an executor.
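
To make the parent's side of that contract concrete, here is a minimal sketch of what a delegation request could look like. The names `Surface`, `DelegationRequest`, and every field below are hypothetical, chosen for illustration from the list above; they are not taken from the plugin itself.

```python
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    """The two surfaces named in the post; the parent picks exactly one."""
    COMPUTER_USE = "computer_use"
    BROWSER_USE = "browser_use"


@dataclass(frozen=True)
class DelegationRequest:
    """Hypothetical shape of what the parent hands to the Spark child."""
    surface: Surface                          # exactly one surface
    target: str                               # e.g. a URL or an app window
    content: str                              # approved content to enter
    confirmed_side_effects: tuple[str, ...]   # side effects the user confirmed
    max_steps: int                            # limit on the child's work
    verification: tuple[str, ...]             # criteria the child must check

    def __post_init__(self) -> None:
        # Reject requests that leave the child room to plan on its own.
        if not isinstance(self.surface, Surface):
            raise TypeError("surface must be exactly one Surface value")
        if self.max_steps <= 0:
            raise ValueError("max_steps must be positive")
        if not self.verification:
            raise ValueError("at least one verification criterion is required")


# Example values are made up for illustration.
request = DelegationRequest(
    surface=Surface.BROWSER_USE,
    target="https://example.com/settings",
    content="Display name: KingGyu",
    confirmed_side_effects=("update display name",),
    max_steps=10,
    verification=("settings page shows the new display name",),
)
```

The point of validating up front is that an underspecified request fails in the parent, before any UI is touched, instead of forcing the executor to guess.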

The Trace Is The Interface

The child must return:

  • status;
  • trace id;
  • tool surface;
  • target;
  • model config;
  • steps;
  • observations;
  • verification;
  • artifacts;
  • blockers;
  • next step.

That trace matters because UI work fails in partial ways. A form might submit but not visibly persist. A rich-text editor might accept pasted text but corrupt non-ASCII characters. A browser tool might be unavailable. The parent needs evidence, not a vague "done."
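
As a rough illustration of why the trace fields matter, the sketch below models the trace as a dataclass and shows a parent deciding recovery from the verification evidence instead of the status word. The field types, the status strings ("done", "blocked"), and the `decide` function are assumptions made for this example, not the plugin's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical container mirroring the fields the child must return."""
    status: str                  # assumed values: "done", "partial", "blocked"
    trace_id: str
    tool_surface: str
    target: str
    model_config: dict
    steps: list = field(default_factory=list)
    observations: list = field(default_factory=list)
    verification: dict = field(default_factory=dict)  # criterion -> passed?
    artifacts: list = field(default_factory=list)
    blockers: list = field(default_factory=list)
    next_step: str = ""


def decide(trace: Trace) -> str:
    """Parent-side recovery decision based on evidence in the trace."""
    if trace.status == "blocked" or trace.blockers:
        return "escalate"   # the child could not proceed; the parent takes over
    if not trace.verification:
        return "recheck"    # nothing was verified, so "done" proves nothing
    if all(trace.verification.values()):
        return "accept"
    return "retry"          # something ran, but a criterion failed


# A form that submitted but did not visibly persist (illustrative values).
partial = Trace(
    status="done",
    trace_id="t-1",
    tool_surface="browser_use",
    target="https://example.com/form",
    model_config={},
    steps=["open form", "fill fields", "submit"],
    verification={"form submitted": True, "value persisted after reload": False},
)
print(decide(partial))
```

Here the child says "done", yet the parent still chooses to retry because one verification criterion failed, which is the partial-failure case described above.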

What It Does Not Do

Codex Spark is intentionally narrow.

It does not ship X, Reddit, Gmail, or other domain-specific executors. Those belong in separate plugins. It also does not silently replace Browser Use with HTTP scraping or another automation surface. If the requested surface is unavailable, the child reports a blocked status rather than falling back to something else.
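
The no-silent-fallback rule can be sketched in a few lines. The function name `run_on_surface`, the surface strings, and the returned dictionary keys are hypothetical and only illustrate the behavior described here.

```python
def run_on_surface(requested: str, available: set[str]) -> dict:
    """Hypothetical executor entry point that never substitutes a surface."""
    if requested not in available:
        # Report the blocker and stop; choosing a fallback is the parent's call.
        return {
            "status": "blocked",
            "tool_surface": requested,
            "blockers": [f"{requested} is unavailable"],
            "next_step": "parent decides how to recover",
        }
    return {"status": "ready", "tool_surface": requested, "blockers": []}


# Browser Use was requested, but only Computer Use is available.
result = run_on_surface("browser_use", {"computer_use"})
print(result["status"])
```

Note that the blocked result still names the surface that was requested, so the trace never implies that a different tool did the work.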

Why I Built It

The useful split is:

  • reasoning-heavy parent model, for example GPT-5.5 xhigh, handles judgment;
  • Codex Spark handles bounded visible-world execution;
  • the trace is the join point.

That split keeps the strongest reasoning model focused on design, code, and verification while Spark handles the mechanical UI and browser work.

Repo: https://github.com/KingGyuSuh/awesome-codex-spark
