PicklePixel
How I Made Claude Code Agent Teams Work With Any Model

Claude Code Agent Teams is the most capable multi-agent coding system I've used. You tell it to refactor your auth module, it spawns three teammates, they read files, write code, run tests, coordinate through task lists, and report back. Each teammate is a full Claude Code instance with 15+ tools. It's genuinely impressive.

There's one problem: every single agent has to be Claude. Your lead runs Opus at $15/M tokens. Your researcher runs Sonnet. Your reviewer runs Sonnet. A four-agent team working on a refactor can easily burn $5-10 in one session.

I wanted to keep the lead on Claude Opus and swap the teammates' brains to GPT. Honestly, I just wanted to stop burning money on tasks that don't need a frontier model.

The Wrong Approach (First)

My first instinct was to build a full custom agent framework. Agent Runtime. Universal Tool System. Provider Adapters. Coordination Layer. Spawner. I designed the whole thing. Around 2,000 lines of TypeScript, reinventing everything Claude Code already does perfectly.

Then it clicked: Claude Code IS the agent runtime. I don't need to rebuild it. I just need to change where it sends its API calls.

Every Claude Code teammate process communicates with its LLM through one endpoint: POST /v1/messages. It sends tool definitions, message history, system prompts. It expects back SSE-streamed responses with text and tool_use blocks.

The teammate never validates who is on the other end. It doesn't check if the responses actually come from Claude. It just sends Anthropic-format requests and executes whatever tool calls come back.

The hook is one environment variable: ANTHROPIC_BASE_URL. Set it to http://localhost:3456 and every API call goes to your proxy instead of Anthropic.

I confirmed this by pointing it at localhost:9999 with nothing listening. Claude Code hung waiting for a connection. It respects the override completely.

So instead of building a framework, I built a translation proxy. Two API formats that do the same thing, just structured differently. The proxy sits in the middle and translates in real time.

What the Proxy Actually Does

Lead Agent (Claude Opus)
    |
    | ANTHROPIC_BASE_URL=http://localhost:3456
    |
Teammate Process (Claude Code CLI)
    |  -- thinks it's calling Anthropic --
    |
HydraProxy (localhost:3456)
    |  -- translates API format --
    |
GPT-5.3 Codex (or whatever model you want)

The teammate is still a full Claude Code instance with every tool. Read, Write, Edit, Bash, Glob, Grep, Git. It just doesn't know its brain is GPT instead of Claude.

The translation has two parts: requests going out, and responses coming back.

Requests

Anthropic and OpenAI structure things differently, but it's mostly a reshuffling:

  • Anthropic puts the system prompt as a top-level system field. OpenAI puts it as the first message with role: "system".
  • Anthropic defines tools as { name, input_schema }. OpenAI wraps them in { type: "function", function: { name, parameters } }.
  • Tool calls in Anthropic are tool_use content blocks inside a message. OpenAI puts them in a tool_calls array on the assistant message.
  • Tool results in Anthropic are tool_result blocks in user messages. OpenAI uses separate { role: "tool" } messages.

Pretty mechanical once you see the pattern.
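The reshuffling can be sketched roughly like this (the `AnthropicRequest` shape is trimmed down for illustration, not the full schema, and real message translation also has to unpack tool_use/tool_result blocks):

```typescript
// Minimal Anthropic -> OpenAI Chat Completions request translation (trimmed shapes).
interface AnthropicTool { name: string; description?: string; input_schema: object; }
interface AnthropicRequest {
  system?: string;
  messages: { role: "user" | "assistant"; content: any }[];
  tools?: AnthropicTool[];
  max_tokens: number;
}

function toChatCompletions(req: AnthropicRequest) {
  const messages: any[] = [];
  // Top-level `system` field becomes the first message with role "system".
  if (req.system) messages.push({ role: "system", content: req.system });
  for (const m of req.messages) {
    // Plain text messages pass through with the same role; a real translator
    // also splits tool_result blocks out into { role: "tool" } messages.
    messages.push({ role: m.role, content: m.content });
  }
  return {
    messages,
    // { name, input_schema } wraps into { type: "function", function: {...} }.
    tools: req.tools?.map((t) => ({
      type: "function",
      function: { name: t.name, description: t.description, parameters: t.input_schema },
    })),
    max_tokens: req.max_tokens,
  };
}
```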

SSE Streams (The Hard Part)

Both APIs stream via Server-Sent Events, but the event structure is completely different.

OpenAI gives you flat chunks:

data: {"choices":[{"delta":{"content":"Hello "}}]}
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_123","function":{"name":"Read"}}]}}]}
data: [DONE]

Claude Code expects this:

event: message_start
event: content_block_start  (index 0, type "text")
event: content_block_delta  (text_delta: "Hello ")
event: content_block_stop
event: content_block_start  (index 1, type "tool_use", name "Read")
event: content_block_delta  (input_json_delta: partial JSON...)
event: content_block_stop
event: message_delta        (stop_reason)
event: message_stop

The proxy maintains a state machine that tracks block indexes, active tool calls, and whether a text block has been started. Each OpenAI chunk gets translated into the corresponding Anthropic event and written to the response stream. The model name gets spoofed too. Claude Code validates model names internally, so the proxy reports claude-sonnet-4-5-20250929 regardless of what's actually answering.
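The state machine can be sketched like this (heavily simplified: a single text block, with tool_use blocks and input_json_delta handling omitted):

```typescript
// Sketch of the streaming state machine: each flat OpenAI chunk becomes
// zero or more Anthropic-style SSE events.
type AnthropicEvent = { event: string; data: object };

function makeStreamTranslator() {
  let textBlockOpen = false;
  let index = 0;

  return function translateChunk(chunk: any): AnthropicEvent[] {
    const events: AnthropicEvent[] = [];
    const choice = chunk.choices?.[0];
    if (choice?.delta?.content) {
      // First text delta opens a content block.
      if (!textBlockOpen) {
        events.push({ event: "content_block_start",
          data: { index, content_block: { type: "text", text: "" } } });
        textBlockOpen = true;
      }
      events.push({ event: "content_block_delta",
        data: { index, delta: { type: "text_delta", text: choice.delta.content } } });
    }
    if (choice?.finish_reason) {
      // Close any open block, then emit the message-level terminators.
      if (textBlockOpen) {
        events.push({ event: "content_block_stop", data: { index } });
        textBlockOpen = false;
        index++;
      }
      events.push({ event: "message_delta", data: { delta: { stop_reason: "end_turn" } } });
      events.push({ event: "message_stop", data: {} });
    }
    return events;
  };
}
```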

The Debugging Gauntlet

The architecture was clean. Reality was messier. Five bugs, each discovered sequentially because the previous one masked the next.

Query parameters. Claude Code sends POST /v1/messages?beta=true. My proxy matched on exact URL "/v1/messages". No match. Zero requests got through. Spent longer than I'd like to admit staring at an empty terminal before checking the actual URL.
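The fix is to match on the parsed pathname instead of the raw URL string, along these lines:

```typescript
// Match on the pathname, not the raw URL, so "?beta=true" doesn't break routing.
function isMessagesEndpoint(rawUrl: string): boolean {
  // Node's req.url is path + query; a dummy base makes it parseable by URL.
  const { pathname } = new URL(rawUrl, "http://localhost");
  return pathname === "/v1/messages";
}
```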

Token counting. Claude Code sends 10+ POST /v1/messages/count_tokens requests on startup. The proxy returned 404 for all of them. Added a handler that returns estimated counts.
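A rough estimator is enough to satisfy those startup calls. The ~4-characters-per-token heuristic below is my assumption for illustration, not necessarily the constant the proxy ships with:

```typescript
// Rough token count for /v1/messages/count_tokens: ~4 chars per token is a
// common heuristic; Claude Code only needs a plausible number here.
function estimateTokens(body: object): { input_tokens: number } {
  return { input_tokens: Math.ceil(JSON.stringify(body).length / 4) };
}
```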

max_tokens overflow. Claude Code requests max_tokens: 32000. GPT-4o caps at 16384. OpenAI returned 400. Added a model-specific lookup table with clamping.
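The clamping is a small lookup table; the entries below are illustrative and need extending for whichever models you actually route to:

```typescript
// Model-specific output caps (16384 for the gpt-4o family; extend as needed).
const MAX_OUTPUT_TOKENS: Record<string, number> = {
  "gpt-4o": 16384,
  "gpt-4o-mini": 16384,
};

function clampMaxTokens(model: string, requested: number): number {
  const cap = MAX_OUTPUT_TOKENS[model];
  // Unknown models pass through unchanged.
  return cap === undefined ? requested : Math.min(requested, cap);
}
```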

Non-streaming warmup. Claude Code sends a haiku warmup request with stream: undefined. Not false, not true. The proxy always set stream: true on the upstream call. The non-streaming response format is completely different from SSE. Had to detect and handle both paths.

Rate limits. Two teammates running GPT-4o-mini simultaneously blew through the 200K TPM limit in seconds. Added retry logic with exponential backoff.
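The retry wrapper is generic; the attempt count and delays here are assumptions for the sketch, not the proxy's exact values:

```typescript
// Retry with exponential backoff for retryable failures (e.g. HTTP 429).
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  attempts = 5,
  baseDelayMs = 1000,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i >= attempts - 1 || !isRetryable(err)) throw err;
      // 1s, 2s, 4s, 8s ... between attempts.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
}
```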

After fixing all five:

$ ANTHROPIC_BASE_URL=http://localhost:3456 claude --print "what model are you?"

Response: "I am Claude, an AI model developed by Anthropic..."

GPT-4o, pretending to be Claude, running through the full pipeline. It even maintained the Claude persona from the system prompt. But ask it about DALL-E and the GPT personality leaks through.

Then the real test: full agentic tool loops. A teammate spawned through the proxy successfully used Glob and Read tools across four round trips with 31 tool definitions. It searched files, read code, and reported back to the lead. GPT-4o-mini doing Claude Code's job at a fraction of the cost.

Mixed Teams: Lead on Claude, Teammates on GPT

The next challenge was routing. I wanted the lead on real Claude Opus (my subscription) and only the teammates going through the proxy. But all Claude Code processes have ANTHROPIC_BASE_URL set, so they all hit the proxy.

I tried three approaches:

Model name routing didn't work because teammates sometimes request claude-opus-4-6 too.

Tool count heuristic worked briefly. The lead had 31 tools (Claude Code's 15+ plus my MCP tools), teammates had 23. Route on count >= 28. Then I realized that adding or removing one MCP tool breaks the whole thing.

System prompt marker was the winner. I added <!-- hydra:lead --> as an HTML comment to my project's CLAUDE.md file. Claude Code injects CLAUDE.md into the system prompt. The proxy checks the system prompt for the marker. Found means passthrough to real Anthropic. Not found means translate to GPT.

Teammates don't get the CLAUDE.md from the main project. They get their own system prompt without the marker. Clean routing, zero false positives.
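The routing check itself is a substring test on the system prompt (a sketch with trimmed request shapes; Anthropic's `system` field can be a plain string or an array of text blocks, so both are handled):

```typescript
// Route on the CLAUDE.md marker injected into the lead's system prompt.
const LEAD_MARKER = "<!-- hydra:lead -->";

function isLeadRequest(body: { system?: string | { text: string }[] }): boolean {
  // `system` may be a string or an array of text blocks.
  const text = typeof body.system === "string"
    ? body.system
    : (body.system ?? []).map((b) => b.text).join("\n");
  // Marker present -> passthrough to real Anthropic; absent -> translate to GPT.
  return text.includes(LEAD_MARKER);
}
```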

For the passthrough, the proxy just relays the original auth headers from Claude Code to the real Anthropic API. No API key needed for the lead. You use your subscription as-is.

The Subscription Hack: Zero-Cost Teammates

The proxy worked with OpenAI API keys. But API keys cost money. I already pay for ChatGPT Plus. Can I use that?

Turns out, yes. OpenAI's Codex CLI authenticates via ~/.codex/auth.json, an OAuth token. That token works with a different endpoint than the standard API:

POST https://chatgpt.com/backend-api/codex/responses

This uses the Responses API format, which is different from both Chat Completions and the standard OpenAI API. Auth is a Bearer token plus a Chatgpt-Account-Id header extracted from the JWT.

I tested every model name I could think of. Found 9+ working models on ChatGPT Plus at zero additional cost:

Model                Type
gpt-5-codex          Full
gpt-5.1-codex        Full
gpt-5.2-codex        Full
gpt-5.3-codex        Full (latest)
gpt-5-codex-mini     Mini
gpt-5.1-codex-mini   Mini

This meant building a second translation layer though. The Responses API has its own request and response format. So I wrote another pair of translators:

  • Request: Anthropic messages become input items with function_call and function_call_output types instead of tool_calls. System prompt becomes instructions. Must include store: false. Cannot include max_output_tokens or temperature (the backend rejects both, learned that the hard way).
  • Response: Different SSE events. response.output_text.delta becomes content_block_delta. response.function_call_arguments.delta becomes input_json_delta. And so on.

The proxy auto-reads ~/.codex/auth.json, decodes the JWT, extracts the account ID from a custom claim. No manual configuration. Just codex --login once and the proxy handles the rest.
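Decoding the payload doesn't require verifying the signature, since we only need to read a claim out of it. A sketch (the claim name `acct` below is a stand-in for illustration; the real account-id claim is OpenAI-specific):

```typescript
// Decode a JWT's payload segment without signature verification.
// JWTs are three base64url segments: header.payload.signature.
function decodeJwtPayload(token: string): Record<string, any> {
  const payloadB64 = token.split(".")[1];
  // Node's Buffer handles base64url directly (Node 15+).
  return JSON.parse(Buffer.from(payloadB64, "base64url").toString("utf8"));
}
```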

node dist/index.js --model gpt-5.3-codex --provider chatgpt --port 3456 --passthrough lead

Claude Code teammates powered by GPT-5.3-codex through a ChatGPT Plus subscription. The lead runs on Claude Opus through my Claude subscription. Total additional API cost: $0.

The Final Stack

Nine TypeScript files. Zero runtime dependencies. Just Node.js builtins.

src/
├── index.ts                    Entry point
├── proxy.ts                    HTTP server, 3-way routing
├── config.ts                   CLI args, codex JWT auth
└── translators/
    ├── types.ts                TypeScript interfaces
    ├── request.ts              Anthropic → Chat Completions
    ├── messages.ts             Message history translation
    ├── response.ts             Chat Completions SSE → Anthropic SSE
    ├── request-responses.ts    Anthropic → Responses API
    └── response-responses.ts   Responses API SSE → Anthropic SSE

Three routing paths:

  1. Lead requests (hydra:lead marker found) pass through to real Anthropic
  2. Teammate requests with --provider openai translate to Chat Completions
  3. Teammate requests with --provider chatgpt translate to the Responses API

What I Learned

I originally designed a 2,000-line framework. What shipped was a translation proxy. Same result, fraction of the complexity. The best agent framework already existed. I just needed to make it talk to different backends.

The translation layer itself is honestly not that interesting. Two APIs that do the same thing, structured differently. The interesting part is what it enables: heterogeneous teams where each agent runs on whatever model makes sense for its task. Your lead on Opus because it needs strong reasoning. Your file searcher on GPT-4o-mini because it just needs to grep and summarize. Your code reviewer on GPT-5.3-codex because it's free through your subscription.

The real insight is that Claude Code Agent Teams is undervalued infrastructure. It's a complete multi-agent system with coordination, task management, messaging, plan approval, and graceful shutdown. Everyone's trying to build agent frameworks from scratch. The smart play is to extend the ones that already work.

The Repo

HydraTeams on GitHub

MIT licensed. If you have a ChatGPT Plus subscription and want free agent teammates, this is your move.
