The backstory: why I wanted multi-agent delegation
I have Pro plans for all three — Claude Code, Codex (via ChatGPT), and Gemini CLI (via a family storage plan). Each one is good at different things:
- Claude Code — the smartest orchestrator. Deep reasoning, great at multi-file refactoring, and its plugin/agent system is the most mature.
- Codex — fast, focused, and increasingly capable. Good for targeted fixes and iteration.
- Gemini CLI — 1M token context window. When I need to investigate an entire codebase at once, this is what I reach for.
For months, I've been running Claude Code as the "engineering lead" — it decides what needs to happen, and delegates specific tasks to Codex or Gemini based on the nature of the work. Long-context investigation? Gemini. Creative design, focused fix? Codex. Orchestration, architecture decisions, multi-file changes? Claude stays in the driver's seat.
I initially did this with custom Claude Code skills — markdown files that instructed Claude on how to shell out to the other CLIs in non-interactive mode. It worked, but it was fragile. There was no proper session management, no job tracking, no background execution. It was duct tape.
Then OpenAI released codex-plugin-cc.
What codex-plugin-cc does right
OpenAI's plugin is well-engineered. It gives Claude Code a set of slash commands — /codex:review, /codex:adversarial-review, /codex:rescue — that delegate work to Codex through a proper protocol. It has:
- Background job execution with status tracking
- State persistence across commands (job IDs, results, thread history)
- A review gate that can automatically block Claude's output if Codex finds issues
- Subagent delegation where Claude can hand off an entire task to Codex
Under the hood, it talks to Codex via the App Server Protocol (ASP) — an HTTP REST API with SSE streaming. You start Codex with codex --app-server, and the plugin makes HTTP requests to control threads, send prompts, and stream responses.
I wanted the exact same thing for Gemini. So I built it.
Building gemini-plugin-cc in one night
I sat down one evening and ported the entire plugin using Claude Code (which felt appropriately recursive). The goal was a 1:1 port — same commands, same job tracking, same review logic, same state persistence. The only thing that needed to change was the transport layer.
The result: gemini-plugin-cc
Same slash commands:
/gemini:review # Gemini reviews your current changes
/gemini:adversarial-review # Challenges your design decisions
/gemini:rescue # Delegates a task to Gemini
/gemini:status # Check on background jobs
/gemini:result # Get output from finished jobs
/gemini:cancel # Stop a running job
/gemini:setup # Verify install + toggle review gate
The interesting part was adapting between two fundamentally different communication protocols.
The protocol difference: ASP vs ACP
This is where it gets technical.
Codex: App Server Protocol (HTTP + SSE)
Codex uses a custom HTTP-based protocol. The plugin starts codex --app-server, which spins up a local HTTP server. Communication looks like standard REST:
Plugin ──[HTTP POST /thread/start]──> Codex App Server
Plugin <──[SSE event stream]────────── Codex App Server
Plugin ──[HTTP POST /thread/cancel]──> Codex App Server
Threads are the unit of work. You start a thread with a prompt, stream results via SSE, and cancel or query threads by ID. Write control is managed with a sandbox parameter — "workspace-write" lets Codex modify files, "read-only" restricts it to analysis.
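As a rough sketch of what the plugin side of that flow looks like — the endpoint path mirrors the /thread/start diagram above, but the port and the request body field names (prompt, sandbox) are illustrative assumptions, not the documented Codex API:

```javascript
// Sketch of building an ASP thread-start request. ASP_BASE and the
// body field names are assumptions for illustration only.
const ASP_BASE = "http://127.0.0.1:8080"; // hypothetical local port

function buildThreadStart(prompt, sandbox = "read-only") {
  return {
    url: `${ASP_BASE}/thread/start`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, sandbox }),
  };
}

// A review-style call keeps Codex read-only; a rescue-style task
// would pass "workspace-write" so it can edit files.
const req = buildThreadStart("Review this diff for issues...", "read-only");
console.log(req.url); // → http://127.0.0.1:8080/thread/start
```

The plugin would POST this and then consume the SSE stream until the thread completes.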
Gemini: Agent Client Protocol (JSON-RPC over stdio)
Gemini CLI uses ACP — the Agent Client Protocol. It's a JSON-RPC 2.0 protocol over stdio, designed as an open standard for AI agent communication (similar in spirit to LSP for language servers).
You start Gemini with gemini --acp, and it communicates entirely through stdin/stdout:
Plugin ──[JSON-RPC over stdin]──> gemini --acp
Plugin <──[JSON-RPC over stdout]── gemini --acp
Instead of threads, ACP uses sessions. Instead of an HTTP server, there's a persistent child process. The protocol is session-oriented:
// Initialize the connection
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{
  "protocolVersion": 1,
  "clientCapabilities": {
    "fs": { "readTextFile": true, "writeTextFile": true }
  }
}}

// Create a new session
{"jsonrpc":"2.0","id":2,"method":"session/new","params":{
  "cwd": "/path/to/project",
  "mcpServers": []
}}

// Send a prompt
{"jsonrpc":"2.0","id":3,"method":"session/prompt","params":{
  "sessionId": "sess-abc123",
  "prompt": [{"type":"text","text":"Review this diff for issues..."}]
}}
Responses stream back as JSON-RPC notifications:
{"jsonrpc":"2.0","method":"session/update","params":{
  "sessionId": "sess-abc123",
  "update": {
    "sessionUpdate": "agent_message_chunk",
    "content": {"type":"text","text":"I found 3 issues..."}
  }
}}
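Handling this traffic comes down to one distinction: JSON-RPC responses carry the id of the request they answer, while streamed updates are notifications with a method but no id. A minimal routing sketch (not the plugin's actual code):

```javascript
// Route one incoming JSON-RPC message: resolve the pending request it
// answers, or hand it to the notification handler if it has no id.
function routeMessage(msg, pending, onNotification) {
  if (msg.id !== undefined && pending.has(msg.id)) {
    // A response: settle the matching in-flight request.
    const { resolve, reject } = pending.get(msg.id);
    pending.delete(msg.id);
    msg.error ? reject(msg.error) : resolve(msg.result);
  } else if (msg.method) {
    // A notification, e.g. session/update with a text chunk.
    onNotification(msg);
  }
}
```

Every ACP client in the plugin is built around this loop: a Map of pending requests keyed by id, plus a notification callback for session/update.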
The mapping
Here's how the concepts translate:
| Concept | Codex (ASP) | Gemini (ACP) |
|---|---|---|
| Connection | HTTP server (codex --app-server) | stdio child process (gemini --acp) |
| Unit of work | Thread (thread/start, thread/cancel) | Session (session/new, session/set_mode) |
| Write control | sandbox: "workspace-write" vs "read-only" | approvalMode: "auto_edit" vs "default" |
| Streaming | SSE events from HTTP endpoint | JSON-RPC notifications over stdout |
| Thinking budget | --effort parameter (none → xhigh) | Not available via ACP (use --model instead) |
Architecture: the broker pattern
One key design decision was the broker. You don't want to spawn a new gemini --acp process for every slash command — startup is expensive. So I built a persistent broker that keeps a single Gemini process alive for the entire Claude Code session.
Claude Code ──[Bash]──> gemini-companion.mjs ──[Unix socket]──> ACP Broker
                                                                    |
                                                                    v
                                                              gemini --acp
                                                              (persistent)
The components:
- gemini-companion.mjs — The main CLI entry point. Handles all slash command parsing and routing.
- acp-broker.mjs — A daemon that spawns gemini --acp once and multiplexes JSON-RPC requests from the companion script via a Unix socket. Automatically cleaned up when the Claude Code session ends.
- acp-client.mjs — Client library with a broker-first strategy: try the Unix socket first, fall back to spawning a direct gemini --acp process if the broker isn't running.
The broker listens on a Unix socket and forwards JSON-RPC messages to the persistent Gemini process's stdin, then routes responses back. This means the first /gemini:review might take a second to start the broker, but subsequent commands are near-instant.
Here's the core of acp-broker.mjs — how it spawns the Gemini process and handles the initialize handshake:
function spawnAcpProcess(cwd) {
  const child = spawn("gemini", ["--acp"], {
    cwd,
    stdio: ["pipe", "pipe", "pipe"],
    env: process.env
  });
  const rl = readline.createInterface({ input: child.stdout });
  rl.on("line", (line) => handleAcpLine(line));
  child.stderr?.resume(); // Drain stderr to prevent back-pressure.
  child.on("exit", (code) => {
    acpProcess = null;
    acpReady = false;
    // Reject all pending requests (the internal initialize entry has no client socket).
    for (const [, pending] of pendingRequests) {
      if (!pending.clientSocket) continue;
      send(pending.clientSocket, {
        jsonrpc: "2.0", id: pending.clientId,
        error: { code: -32000, message: "ACP process exited unexpectedly." }
      });
    }
    pendingRequests.clear();
  });
  acpProcess = child;
  // Send the initialize handshake — first response marks the broker as ready.
  const initId = nextRpcId++;
  sendToAcp({
    jsonrpc: "2.0", id: initId, method: "initialize",
    params: { clientInfo: { name: "gemini-plugin-cc-broker", version: "1.0.0" } }
  });
  pendingRequests.set(initId, { clientSocket: null, clientId: null });
  return child;
}
And how client requests get forwarded to the ACP process — note the "busy" check that prevents concurrent requests from corrupting the JSON-RPC stream:
function handleClientMessage(socket, line) {
  let message;
  try {
    message = JSON.parse(line.trim());
  } catch {
    return; // Ignore malformed lines from the client.
  }
  // Broker handles initialize directly — no need to forward.
  if (message.method === "initialize") {
    send(socket, { jsonrpc: "2.0", id: message.id, result: {
      capabilities: {},
      serverInfo: { name: "gemini-plugin-cc-broker", version: "1.0.0" }
    }});
    return;
  }
  // Only one client can talk to the ACP process at a time.
  if (activeClient && activeClient !== socket) {
    send(socket, { jsonrpc: "2.0", id: message.id,
      error: { code: -32001, message: "Broker is busy with another request." }
    });
    return;
  }
  // Forward to gemini --acp with a broker-internal ID.
  activeClient = socket;
  const brokerId = nextRpcId++;
  pendingRequests.set(brokerId, { clientSocket: socket, clientId: message.id });
  sendToAcp({ jsonrpc: "2.0", id: brokerId, method: message.method, params: message.params ?? {} });
}
The broker remaps request IDs — the client sends id: 5, the broker forwards it as id: 17 (its internal counter), and when the response comes back, it maps 17 back to 5 before sending it to the client. This prevents ID collisions when multiple clients connect.
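The response side of that remapping might look like the following — a sketch, since the actual function isn't in the excerpt above (names mirror the broker code):

```javascript
// Given a line from gemini --acp's stdout, look up the broker-internal
// id and restore the client's original id before forwarding.
function remapResponse(line, pendingRequests) {
  const msg = JSON.parse(line);
  if (msg.id === undefined) return { target: "notification", msg }; // no id: a notification
  const pending = pendingRequests.get(msg.id);
  if (!pending) return { target: "drop", msg }; // unknown id: nothing waiting
  pendingRequests.delete(msg.id);
  return {
    target: pending.clientSocket,           // null for the initialize handshake
    msg: { ...msg, id: pending.clientId },  // restore the client's original id
  };
}
```

Notifications pass through untouched; only responses need the id swapped back.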
What stayed the same (and why that matters)
The beauty of the original codex-plugin-cc architecture is that most of it is protocol-agnostic. The command definitions, argument parsing, job tracking, state persistence, background worker management, diff collection, git context gathering, review prompt construction — none of that cares whether the backend is HTTP or JSON-RPC.
Porting required replacing only three things:
- acp-client.mjs — replacing the HTTP client with a JSON-RPC stdio client
- acp-broker.mjs — replacing the app-server broker with a Unix socket daemon
- Execution functions in gemini.mjs — adapting the prompt submission and response parsing
Everything else — the review logic that collects diffs and untracked files, the adversarial review prompts, the rescue subagent configuration, the job state machine — was a direct copy.
Here's the factory in acp-client.mjs that implements the broker-first, direct-fallback strategy:
export class GeminiAcpClient {
  static async connect(cwd, options = {}) {
    let brokerEndpoint = null;
    if (!options.disableBroker) {
      // Try to find or start a broker.
      brokerEndpoint = options.brokerEndpoint
        ?? process.env[BROKER_ENDPOINT_ENV]
        ?? null;
      if (!brokerEndpoint && !options.reuseExistingBroker) {
        const brokerSession = await ensureBrokerSession(cwd, { env: options.env });
        brokerEndpoint = brokerSession?.endpoint ?? null;
      }
    }
    // Attempt broker connection first.
    if (brokerEndpoint) {
      try {
        const client = new BrokerAcpClient(cwd, { ...options, brokerEndpoint });
        await client.initialize();
        return client;
      } catch (error) {
        if (error?.code === BROKER_BUSY_RPC_CODE) {
          process.stderr.write("Broker busy, falling back to direct spawn.\n");
        } else {
          process.stderr.write(`Broker failed (${error?.message}), falling back.\n`);
        }
      }
    }
    // Direct spawn fallback — starts its own gemini --acp process.
    const client = new SpawnedAcpClient(cwd, options);
    await client.initialize();
    return client;
  }
}
BrokerAcpClient talks to the Unix socket. SpawnedAcpClient spawns gemini --acp directly as a child process. Both extend AcpClientBase which handles JSON-RPC request/response matching and notification routing — the caller doesn't need to know which transport is in use.
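A stripped-down sketch of what such a base class does — the real AcpClientBase may differ, but the core idea is a promise per request id over a pluggable writeLine transport:

```javascript
// Illustrative base class: subclasses supply writeLine() (Unix socket
// or child-process stdin), and the base matches responses to requests
// by JSON-RPC id while routing notifications to a callback.
class AcpClientSketch {
  constructor(onNotification = () => {}) {
    this.nextId = 1;
    this.pending = new Map();
    this.onNotification = onNotification;
  }
  // Transport hook: BrokerAcpClient writes to the socket,
  // SpawnedAcpClient writes to the child's stdin.
  writeLine(line) { throw new Error("transport must implement writeLine"); }

  request(method, params = {}) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.writeLine(JSON.stringify({ jsonrpc: "2.0", id, method, params }));
    });
  }
  // Called by the transport for every line read from the other side.
  handleLine(line) {
    const msg = JSON.parse(line);
    if (msg.id !== undefined && this.pending.has(msg.id)) {
      const { resolve, reject } = this.pending.get(msg.id);
      this.pending.delete(msg.id);
      msg.error ? reject(msg.error) : resolve(msg.result);
    } else if (msg.method) {
      this.onNotification(msg);
    }
  }
}
```

With this shape, runAcpPrompt can call client.request("session/prompt", …) without caring which transport is underneath.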
And here's how runAcpPrompt in gemini.mjs uses the client to send a prompt and collect streamed output:
export async function runAcpPrompt(cwd, prompt, options = {}) {
  const textChunks = [];
  const toolCalls = [];
  const fileChanges = [];
  // Collect streamed text from session/update notifications.
  const notificationHandler = (notification) => {
    const update = notification.params?.update;
    if (update?.sessionUpdate === "agent_message_chunk" && update.content?.type === "text") {
      textChunks.push(update.content.text);
    } else if (update?.sessionUpdate === "tool_call") {
      toolCalls.push({ name: update.toolName, arguments: update.arguments });
    } else if (update?.sessionUpdate === "file_change") {
      fileChanges.push({ path: update.path, action: update.action });
    }
  };
  const client = await GeminiAcpClient.connect(cwd, {
    env: options.env, onNotification: notificationHandler
  });
  try {
    // Create or resume a session.
    let sessionId = options.sessionId ?? null;
    if (sessionId) {
      await client.request("session/load", { sessionId, cwd, mcpServers: [] });
    } else {
      const session = await client.request("session/new", { cwd, mcpServers: [] });
      sessionId = session?.sessionId;
    }
    // Set approval mode and model.
    await client.request("session/set_mode", { sessionId, modeId: "autoEdit" });
    if (options.model) {
      await client.request("session/set_model", { sessionId, modelId: options.model });
    }
    // Send the prompt — text streams back via notifications, not the response.
    await client.request("session/prompt", {
      sessionId,
      prompt: [{ type: "text", text: prompt }]
    });
    return { sessionId, text: textChunks.join(""), toolCalls, fileChanges, error: null };
  } finally {
    await client.close();
  }
}
Notice how the prompt response itself is mostly metadata — the actual text streams in through session/update notifications that fire as Gemini generates output. This is a key difference from a typical request/response pattern.
What's different from the Codex plugin
A few things I couldn't port 1:1:
No --effort parameter. Codex supports --effort to control how much thinking budget it uses (from none to xhigh). Gemini CLI doesn't expose an equivalent via ACP. Instead, you pick a model — pro for heavy lifting, flash for speed, flash-lite for lightweight tasks.
/gemini:rescue --model pro investigate the flaky integration test
/gemini:rescue --model flash fix the issue quickly
Session vs thread semantics. Codex threads are fire-and-forget — you start one, it runs, you get results. Gemini sessions are more persistent. You can resume a session later with gemini --resume <session-id>, which means work delegated from Claude Code can be continued directly in Gemini CLI. This is actually an advantage I plan to lean into more.
ACP is a thinner protocol surface. Codex's App Server Protocol has dedicated endpoints for review, thread management, and configuration. ACP is more generic — review, for example, is implemented on top of the standard prompt flow rather than being a native protocol primitive. This means more logic lives in the plugin.
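In practice that means review reduces to prompt construction on the plugin side. A simplified sketch — the real plugin also gathers untracked files and richer git context, and buildReviewPrompt is an illustrative name, not the actual function:

```javascript
// Build a review prompt from a diff. The plugin would first collect the
// diff (e.g. via `git diff <base>`), then send the result through the
// ordinary session/prompt flow — ACP has no native review primitive.
function buildReviewPrompt(diff, base = "main") {
  return [
    `Review the following changes against ${base}.`,
    "Report concrete issues (bugs, security problems, missing tests) with file and line.",
    "",
    diff,
  ].join("\n");
}

// The result becomes the text content block of a session/prompt request:
// [{ type: "text", text: buildReviewPrompt(diff, "main") }]
```

The upside of this thinness is that review behavior is fully under the plugin's control; the downside is that nothing enforces a structured review result on the protocol level.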
Trying it out
If you use both Claude Code and Gemini CLI:
# Add the marketplace
claude /plugin marketplace add sakibsadmanshajib/gemini-plugin-cc
# Install
claude /plugin install gemini@google-gemini
# Reload and verify
/reload-plugins
/gemini:setup
If you don't have Gemini CLI yet, /gemini:setup will offer to install it. The free tier (sign in with Google) gives you 60 requests/minute and 1,000/day — plenty for code review and task delegation.
Quick workflow examples
Review before shipping:
/gemini:review --base main
Pressure-test a design decision:
/gemini:adversarial-review challenge whether this caching strategy handles invalidation correctly
Delegate a long investigation:
/gemini:rescue --background investigate why the integration tests are flaky on CI
/gemini:status # check progress later
Related work
Others have built similar tools — abiswas97/gemini-plugin-cc also adapted the codex-plugin-cc architecture for Gemini's ACP, and thepushkarp/cc-gemini-plugin takes a different approach focused on Gemini's long-context capabilities. The multi-agent CLI space is moving fast.
What's next
This is a 1:1 port right now. The command interface is faithful to the original, and the transport layer works. But there's room to go further:
- Leverage Gemini's 1M context window more aggressively — sending entire directory trees for holistic analysis rather than just diffs
- Better session continuity — making it seamless to pick up a delegated task in Gemini CLI directly
- Gemini-specific review modes — taking advantage of capabilities that Codex doesn't have
This is early. I want to know what breaks, what's confusing, and what features would make this useful for your workflow.
Repo: github.com/sakibsadmanshajib/gemini-plugin-cc
Issues, feedback, and PRs are all welcome.