I got tired of debugging my AI tooling by reading HTTP traces instead of writing code.
Claude Code wanted Anthropic Messages. Codex CLI wanted OpenAI Responses, and sometimes its own internal /backend-api/codex/responses path. Gemini CLI wanted Google's v1beta/models/* endpoints. Every tool acted like its protocol was the normal one.
The annoying part was not auth. It was compatibility.
If I wanted one local gateway for all three tools, I needed to solve three problems at once:
- different request schemas
- different streaming formats
- different assumptions about images, tools, and model names
The Before State
Before this project, "use one local proxy for all my AI coding tools" sounded simpler than it was.
Claude Code expects POST /v1/messages.
Codex CLI can hit:
```
POST /v1/responses
POST /backend-api/codex/responses
```
Gemini CLI expects routes like:
```
POST /v1beta/models/{model}:generateContent
POST /v1beta/models/{model}:streamGenerateContent
```
That means you cannot just point everything at the same upstream and hope for the best. Even if the target model is conceptually the same, the payloads and streams are not.
What I Built
I built the compatibility layer into CliGate, a local Node.js proxy and dashboard that sits on localhost:8081.
The idea is straightforward:
- Let each tool keep speaking its native protocol.
- Detect which protocol arrived.
- Translate the request into the upstream format that the selected provider actually understands.
- Stream the response back in the format the original tool expects.
From the repo's architecture docs, the public surfaces look like this:
```
Claude Code -> /v1/messages
Codex CLI   -> /v1/responses
Codex CLI   -> /backend-api/codex/responses
Gemini CLI  -> /v1beta/models/*
```
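In practice, "detect which protocol arrived" is mostly routing: the path each tool calls already identifies its dialect. A rough sketch of that mapping, not the repo's actual code:

```js
// Sketch only: map an incoming path to a protocol family.
// The real routing lives in src/routes/*, one handler per surface.
function detectProtocol(path) {
  if (path === '/v1/messages') return 'anthropic-messages';   // Claude Code
  if (path.endsWith('/responses')) return 'openai-responses'; // Codex CLI
  if (path.startsWith('/v1beta/models/')) return 'gemini';    // Gemini CLI
  return 'unknown';
}
```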
The server boot path is intentionally simple:
```js
app.post('/responses', handleResponses);
app.post('/v1/responses', handleResponses);

app.use(express.json({ limit: '10mb' }));
registerApiRoutes(app, { port });
```
That ordering matters: Codex sends request bodies that express.json() must not touch before the raw handlers do.
The First Problem: Codex Doesn't Behave Like Normal JSON
The most practical surprise was Codex CLI.
In this repo, src/routes/responses-route.js handles /responses and /v1/responses before express.json(), because Codex can send compressed request bodies. The route collects the raw bytes, then conditionally decompresses them:
```js
function decompressZstd(buf) {
  // Prefer Node's native zstd support when available; otherwise fall back to fzstdDecompress.
  if (typeof zlib.zstdDecompressSync === 'function') {
    return zlib.zstdDecompressSync(buf);
  }
  return Buffer.from(fzstdDecompress(buf));
}
```
That sounds small, but it changes the whole route design. If the proxy eagerly assumes JSON too early, it breaks one of the main clients it claims to support.
So the code path, sketched below, is:
- read raw body
- detect content-encoding
- decompress if needed
- extract model and request summary
- forward or translate from there
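Put together, that path looks roughly like this. A minimal sketch under my own simplifications, not the repo's exact code; decompressZstd is the helper above:

```js
// Collect raw bytes before any body parser runs, undo zstd if the client
// sent it, then parse JSON and pull out what routing needs.
async function readCodexBody(req) {
  const chunks = [];
  for await (const chunk of req) chunks.push(chunk);

  let raw = Buffer.concat(chunks);
  const encoding = (req.headers['content-encoding'] || '').toLowerCase();
  if (encoding === 'zstd') {
    raw = decompressZstd(raw);
  }

  const body = JSON.parse(raw.toString('utf8'));
  return { model: body.model, stream: body.stream === true, body };
}
```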
That is the kind of detail that decides whether "multi-tool proxy" is real or just README-level real.
The Second Problem: Streaming Is Not One Thing
Request translation is manageable. Streaming is where proxy projects usually get messy.
Claude Code expects Anthropic-style SSE events like message_start, content_block_delta, and message_stop.
OpenAI Responses streams a different event model. Gemini uses its own shape again.
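For a concrete sense of the target format, here is roughly what an Anthropic-style SSE stream looks like, abridged and with most fields trimmed:

```
event: message_start
data: {"type":"message_start","message":{"role":"assistant","content":[], ...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"}, ...}

event: message_stop
data: {"type":"message_stop"}
```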
CliGate solves that with dedicated translators under src/translators/. The OpenAI Responses SSE bridge is a good example. It reads Responses events, tracks block state, and re-emits them as Anthropic events:
```js
if (item?.type === 'function_call') {
  currentBlockType = 'tool_use';
  yield buildContentBlockStart({
    index: blockIndex,
    contentBlock: {
      type: 'tool_use',
      id: currentBlockId,
      name: item.name,
      input: {}
    }
  });
}
```
That translator also maps:
- text deltas
- reasoning deltas
- tool-call argument deltas
- stop reasons like tool_use and max_tokens
- usage metadata
So Claude Code can talk to an upstream that never spoke Anthropic natively, and still receive a stream it understands.
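The simplest case, plain text deltas, gives a feel for the whole bridge. A hedged sketch of that one case, not the repo's code; the real translator also tracks block IDs and indexes across events:

```js
// Sketch: turn an OpenAI Responses text delta into an Anthropic
// content_block_delta. Event names follow OpenAI's Responses streaming;
// the emit shape here is simplified.
function* translateTextDelta(event, blockIndex) {
  if (event.type === 'response.output_text.delta') {
    yield {
      event: 'content_block_delta',
      data: {
        type: 'content_block_delta',
        index: blockIndex,
        delta: { type: 'text_delta', text: event.delta }
      }
    };
  }
}
```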
The Third Problem: "Compatible" Usually Falls Apart on Images and Tools
A lot of projects are compatible until the first image, file, or tool call shows up.
This repo has explicit normalizers for multimodal and tool payloads. For example, Anthropic image blocks are normalized into OpenAI-style input_image parts, and rich tool_result payloads keep their structured content instead of getting flattened into plain text.
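As a rough illustration of the image case, here is the idea in miniature; this is my sketch of the two formats, not the repo's normalizer, which also handles URL sources, files, and structured tool_result content:

```js
// Sketch: Anthropic base64 image block -> OpenAI Responses input_image part.
function anthropicImageToInputImage(block) {
  const { media_type: mediaType, data } = block.source; // e.g. 'image/png' plus base64 payload
  return {
    type: 'input_image',
    image_url: `data:${mediaType};base64,${data}`
  };
}
```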
The corresponding unit tests are the part I trust most, because they encode the ugly edge cases:
```js
assert.equal(result.toolResults[0].output[1].type, 'input_image');
assert.equal(result.fileParts[0].type, 'input_file');
assert.equal(result.unsupportedTools[0].hostedType, 'web_search_20250305');
```
There is also a strict compatibility mode on the Anthropic route. If translation would silently downgrade unsupported tools, the proxy can reject the request instead of pretending everything is fine.
That tradeoff matters. Fake compatibility is worse than an honest 400.
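A sketch of what that rejection can look like on the Anthropic route; the unsupportedTools and hostedType names come from the tests above, everything else is assumed:

```js
// Sketch only: in strict compatibility mode, refuse requests whose hosted
// tools the upstream cannot actually run instead of silently dropping them.
if (strictCompatibility && normalized.unsupportedTools.length > 0) {
  const names = normalized.unsupportedTools.map((t) => t.hostedType).join(', ');
  return res.status(400).json({
    type: 'error',
    error: {
      type: 'invalid_request_error',
      message: `Unsupported tools for this upstream: ${names}`
    }
  });
}
```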
What the Translator Layer Looks Like
The translator code is not buried inside one giant route file anymore. The repo splits it into request, response, normalizer, and capability pieces:
```
src/translators/
  request/
  response/
  normalizers/
  shared/
```
One request path converts Anthropic Messages into OpenAI Responses input while preserving metadata like model, instructions, tool choice, and request options:
```js
const request = {
  model: anthropicRequest.model || context.defaultModel || 'gpt-5.2-codex',
  input: convertAnthropicMessagesToResponsesInput(anthropicRequest.messages || []),
  tools,
  tool_choice: toolChoice,
  ...requestOptions,
  stream: context.stream ?? anthropicRequest.stream ?? true
};
```
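For the plain-text case, the message conversion boils down to something like this; the shapes are my approximation of the two formats, not the repo's implementation:

```js
// Sketch: Anthropic messages -> OpenAI Responses input items.
// User text becomes input_text parts; prior assistant text becomes output_text.
function convertAnthropicMessagesToResponsesInput(messages) {
  return messages.map((message) => {
    const blocks = Array.isArray(message.content)
      ? message.content
      : [{ type: 'text', text: message.content }];

    return {
      role: message.role,
      content: blocks
        .filter((block) => block.type === 'text')
        .map((block) => ({
          type: message.role === 'assistant' ? 'output_text' : 'input_text',
          text: block.text
        }))
    };
  });
}
```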
That separation is what made the project easier to extend. New providers and routes do not have to reinvent the whole compatibility story each time.
The Part I Think More Proxy Projects Should Copy
The project does not stop at unit tests for the pure translators. It also has protocol scenarios under tests/e2e/ that hit the same endpoints the real tools use.
The testing docs describe three layers:
- unit and protocol-conversion tests
- protocol scenario tests
- CLI smoke tests
The scenario runner even refuses to mutate settings on an actively used live service unless explicitly allowed. I like that detail because it treats the proxy as something that may already be serving real traffic, not just a toy test server.
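If you want to poke at a running instance the same way, a minimal scenario-style check looks roughly like this; my sketch against a local proxy, not the repo's runner, and the model name is a placeholder:

```js
// Sketch: hit the Anthropic surface the way Claude Code would and check
// the response shape. The real tests/e2e/ runner adds live-service guards.
const test = require('node:test');
const assert = require('node:assert');

test('POST /v1/messages answers in Anthropic shape', async () => {
  const res = await fetch('http://localhost:8081/v1/messages', {
    method: 'POST',
    headers: { 'content-type': 'application/json', 'x-api-key': 'any-key' },
    body: JSON.stringify({
      model: 'placeholder-model',
      max_tokens: 64,
      messages: [{ role: 'user', content: 'ping' }]
    })
  });

  assert.equal(res.status, 200);
  const body = await res.json();
  assert.equal(body.type, 'message');
});
```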
A Real Setup Looks Like This
Once the compatibility layer exists, the actual user-facing setup gets pleasantly boring.
Start the proxy:
```bash
npx cligate@latest start
```
Point the tools at localhost:
```bash
# Claude Code
export ANTHROPIC_BASE_URL=http://localhost:8081
export ANTHROPIC_API_KEY=any-key
```

```toml
# Codex CLI
chatgpt_base_url = "http://localhost:8081/backend-api/"
openai_base_url = "http://localhost:8081"
```
After that, the proxy decides whether the request goes to an account pool, an API key provider, a local runtime, or another upstream bridge.
The tool does not need to know.
What I Learned
The hard part of "one local gateway for many AI tools" is not the dashboard.
It is the protocol surface area:
- raw vs parsed request bodies
- compressed vs plain payloads
- SSE event semantics
- tool-call shape differences
- multimodal input handling
- deciding when to reject lossy translation
Once I treated the proxy as a compatibility product instead of a credential router, the architecture got much clearer.
That is also why the project's most interesting code is in src/routes/* and src/translators/*, not in the settings screens.
If you're building AI tooling infrastructure right now, I'm curious where you're drawing the line between:
- "compatible enough"
- "fully translated"
- "reject the request because lying would be worse"
CliGate is here on GitHub if you want to inspect the implementation.