I built a 3D shooting game using multi-agent collaboration - Claude 4.5 Opus handling strategy and code review, GPT-5.1 handling implementation. Here’s how I orchestrated different models based on their strengths.
The Inspiration
While watching a conference talk, I heard Jason Kim from Anthropic mention that they internally run 4-10 Claude instances in parallel for complex tasks, with each instance handling a different role autonomously.
This got me thinking: what if I combined different models based on their strengths?
Why Claude 4.5 Opus × GPT-5.1?
Each model has different strengths. Looking at the benchmarks (as of November 2025):
| Metric | Claude 4.5 Opus | GPT-5.1 Codex |
|---|---|---|
| SWE-bench Verified | 80.9% | 76.3% |
| API price (input / output, per 1M tokens) | $5 / $25 | $1.25 / $10 |
Claude 4.5 Opus excels at complex reasoning, but its output tokens cost 2.5x more (and its input tokens 4x more). So I designed this role split:
| Agent | Role | Reasoning |
|---|---|---|
| Claude 4.5 Opus | Strategy, planning, code review | Best at complex reasoning |
| GPT-5.1 Codex | Implementation, testing, refactoring | Cost-efficient, optimized for coding |
“Brain” and “Hands” separation - minimize token usage on the expensive model while leveraging the cheaper model’s coding efficiency.
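To put rough numbers on that, here's a back-of-the-envelope estimate. Only the per-1M-token prices come from the table above; the token counts are hypothetical and exist purely to illustrate the split:

```typescript
// Back-of-the-envelope cost comparison: "everything on Opus" vs. brain/hands split.
// Prices are USD per 1M tokens (from the table above); token counts are hypothetical.
const price = {
  opus:  { input: 5.0,  output: 25.0 },
  codex: { input: 1.25, output: 10.0 },
};

// Hypothetical workload: review reads a lot but writes little;
// implementation generates most of the output tokens.
const review         = { input: 2_000_000, output:   200_000 };
const implementation = { input: 3_000_000, output: 1_500_000 };

const cost = (p: { input: number; output: number }, t: { input: number; output: number }) =>
  (p.input * t.input + p.output * t.output) / 1_000_000;

const allOpus = cost(price.opus, review) + cost(price.opus, implementation);  // $67.50
const split   = cost(price.opus, review) + cost(price.codex, implementation); // $33.75

console.log({ allOpus, split });
```

The exact numbers don't matter; the point is that output-heavy implementation work dominates the bill, so routing it to the cheaper model is where the savings come from.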
The Tool: cccc
Running multiple coding agents in parallel is easy (just open multiple terminals). But making them collaborate is the hard part.
I found cccc, an orchestrator tool that coordinates multiple coding agents like Claude Code and Codex CLI.
My Configuration
| Role | Agent | Responsibility |
|---|---|---|
| PeerA (Brain) | Claude 4.5 Opus | Strategy, design decisions, code review |
| PeerB (Executor) | GPT-5.1 Codex | Implementation, testing, file operations |
| Aux (Brainstorm) | Claude 4.5 Opus | On-demand brainstorming partner for PeerA |
Setting Up the Project
Installation
```bash
# Install tmux
brew install tmux      # macOS
sudo apt install tmux  # Ubuntu

# Install cccc
pipx install cccc-pair

# Initialize and run
cd your-project
cccc init
cccc run
```
Key Configuration Files
1. PROJECT.md - Injected into agent system prompts. Defines project goals, role responsibilities, and coding guidelines.
2. docs/por/POR.md - “Plan of Record” - tracks progress and next tasks.
The key insight: give Claude 4.5 Opus strict instructions to save tokens:
```markdown
### PeerA (Claude 4.5 Opus) - Strategic Leader

⚠️ Token-saving mode (you are expensive):
- Keep reviews to ≤3 bullet points
- Delegate ALL implementation to PeerB
- Just say "LGTM" if code is good
- Use /aux for brainstorming when stuck

❌ Never do (delegate to PeerB):
- Code implementation
- Test writing
- File operations
```
Watching the Agents Collaborate
After sending the initial prompt, the agents started working autonomously:

*Left: PeerA (Claude) giving “LGTM” approval. Right: PeerB (Codex) implementing Player.ts*
Real Communication Examples
cccc logs all agent communications. Here are some highlights:
PeerB requesting design decisions:
{"from": "PeerB", "kind": "event-ask",
"text": "Can you decide camera angle and control scheme via /aux?",
"to": "peerA"}
PeerA catching a P0 bug:
{"from": "PeerA", "kind": "event-risk", "sev": "high",
"text": "`Player.ts:92` - `new Vector3()` called every frame - performance issue"}
PeerA making strategic pivots:
{"from": "PeerA", "kind": "event-counter", "tag": "roadmap.pivot",
"text": "Phase 3 Shop/Upgrade is complex. First complete core game loop (shoot→score→die→restart)."}
PeerA enforcing guardrails:
{"from": "PeerA", "kind": "event-counter",
"text": "Violation: 7h without commit & Phase 2 started early. Quality OK but commit immediately going forward."}
The Result
Without writing a single line of code myself, the agents produced a working 3D shooting game:

*Boss “DREADNOUGHT” battle with HP bar and bullet patterns*

*Parts shop for upgrading your ship*
Is it perfect? No. The game ends after the first boss, and upgrades don’t change the ship’s appearance. But as a foundation built entirely by AI agents? Pretty impressive.
Key Takeaways
- Role separation works - Using expensive models for thinking and cheaper models for doing is cost-effective
- Guardrails are essential - Without rules (commit frequency, review requirements), agents can go off track
- Orchestration tools matter - cccc’s message passing and logging made collaboration possible
- Human oversight is still needed - I had to restart sessions and provide course corrections
Try It Yourself
- cccc: https://github.com/ChesterRa/cccc
- The game I built: https://github.com/hayato-kishikawa/3d-shooting-game
If you try this approach, I’d love to hear about your results!
This article was originally written in Japanese and translated with AI assistance. The original version with more details is available here.
I’m Hayato Kishikawa, an AI Engineer at a Japanese company working on multi-agent systems for enterprise applications. I also contribute to Microsoft’s Agent Framework. Find me on LinkedIn.

