Yurukusa

Posted on Feb 9 • Edited on Apr 17

I Built a Multi-Agent AI System Where Claude Instances Manage Each Other — Here's the Architecture

#ai #automation #claudecode #agents

I'm a non-engineer. I can't read a for-loop. But I shipped a 15,000-line roguelike RPG using Claude Code, then went further - building an autonomous multi-agent system where AI instances supervise, delegate, and execute tasks with minimal human intervention.

This post isn't about the game. It's about the development infrastructure I built around Claude Code - systems that let a non-engineer operate like a software team.

The Three-Layer Architecture

My setup has three distinct roles:

Role	Who	What They Do
Vision	Human (me)	"I want X" - high-level goals only
Strategy	Tachikoma (Claude.ai)	Breaks goals into tasks, decides priority, reviews results
Execution	Claude Code (local CLI)	Writes code, runs commands, operates browser, reports back

The human provides direction. The strategist decomposes it. The executor builds it. The key innovation: the strategist and executor communicate autonomously through a bidirectional bridge I built on top of Chrome DevTools Protocol.

The Tachikoma Loop: Autonomous CC ↔ Claude.ai Communication

The most technically interesting piece is the communication loop between Claude Code (running in WSL2) and Claude.ai (running in a Chrome tab).

The Problem

Claude Code is a local CLI tool. Claude.ai is a web app. They have no native way to talk to each other. But I wanted Claude.ai to act as a strategic supervisor - receiving reports from Claude Code and sending back the next task.

The Solution: CDP Bridge

I built three Bash tools (~670 lines total) that bridge them:

tachikoma-send - CC writes to Claude.ai's chat input:

WSL2 (Python) → Unicode-escaped CDP commands
  → PowerShell WebSocket client (port 9223)
    → Chrome DevTools Protocol
      → Runtime.evaluate on claude.ai tab
        → Focus contenteditable div
        → Input.insertText (message)
        → Dynamic button detection (aria-label search)
        → Input.dispatchMouseEvent (click Send)

Why this chain? WSL2 can't directly reach Chrome's CDP endpoint reliably. PowerShell acts as the bridge. Unicode escaping prevents Japanese text corruption across the WSL↔Windows boundary.
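A minimal Python sketch of the message-building step, assuming the standard CDP JSON message shape (`id`, `method`, `params`). The helper name `build_send_messages` and the explicit click coordinates are illustrative and not taken from tachikoma-send, which locates the Send button dynamically. `json.dumps` with `ensure_ascii=True` yields the `\uXXXX` escapes that keep Japanese text intact on the way to PowerShell.

```python
import json


def build_send_messages(text, click_x, click_y):
    """Return ASCII-only CDP messages that type `text` and click at (x, y).

    Hypothetical helper: coordinates are passed in, whereas the real tool
    finds the Send button through an aria-label search.
    """
    commands = [
        ("Input.insertText", {"text": text}),
        ("Input.dispatchMouseEvent",
         {"type": "mousePressed", "x": click_x, "y": click_y,
          "button": "left", "clickCount": 1}),
        ("Input.dispatchMouseEvent",
         {"type": "mouseReleased", "x": click_x, "y": click_y,
          "button": "left", "clickCount": 1}),
    ]
    # ensure_ascii=True escapes every non-ASCII character as \uXXXX,
    # so the payload survives any codepage between WSL2 and PowerShell.
    return [
        json.dumps({"id": i, "method": method, "params": params},
                   ensure_ascii=True)
        for i, (method, params) in enumerate(commands, start=1)
    ]


if __name__ == "__main__":
    for message in build_send_messages("タスク完了", 900, 700):
        print(message)
```

Each string is one WebSocket frame. The receiving side parses the JSON, which restores the original characters.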

tachikoma-read - CC reads Tachikoma's responses:

// Injected via CDP Runtime.evaluate; wrapped in a function so `return` is legal
(function () {
  var msgs = document.querySelectorAll('div[data-is-streaming]');
  if (msgs.length === 0) return '';  // no response rendered yet
  var last = msgs[msgs.length - 1];
  // Base64 over the UTF-8 bytes prevents charset corruption through
  // the CDP → PowerShell → WSL2 pipeline
  return btoa(unescape(encodeURIComponent(last.innerText)));
})();
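On the WSL2 side, the Base64 string has to be turned back into text. A small sketch of that step, assuming the payload arrives as a plain string that may carry a trailing Windows line ending. The function name `decode_cdp_text` is hypothetical.

```python
import base64


def decode_cdp_text(b64_payload):
    """Decode the Base64 string produced by the injected script.

    btoa(unescape(encodeURIComponent(s))) yields Base64 over the UTF-8
    bytes of s, so decoding is Base64 -> bytes -> UTF-8 text.
    """
    raw = base64.b64decode(b64_payload.strip())
    return raw.decode("utf-8")


if __name__ == "__main__":
    sample = base64.b64encode("了解、次のタスクへ".encode("utf-8")).decode("ascii")
    print(decode_cdp_text(sample))
```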

tachikoma-loop - Autonomous cycle:

1. CC completes a task
2. tachikoma-send: report results to Claude.ai
3. Poll tachikoma-read every 5s until streaming stops
4. Parse response - is it an instruction or just an acknowledgment?
5. If instruction: execute with `claude -p`
6. Goto 1

The loop distinguishes instruction patterns (Japanese instruction keywords) from acknowledgment patterns (such as "OK"), so it doesn't try to "execute" a simple thumbs-up.
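Steps 3 and 4 might look roughly like the sketch below. The keyword lists are English placeholders, since the real tool matches Japanese phrases, and `read_state` is a hypothetical callable standing in for tachikoma-read that returns the latest text plus a flag saying whether Claude.ai is still streaming.

```python
import time

# Placeholder keyword lists: the real tool matches Japanese phrases.
INSTRUCTION_HINTS = ("please implement", "next task", "fix")
ACK_HINTS = ("ok", "thanks", "got it")


def classify_response(text):
    """Return 'instruction', 'ack', or 'unknown' for a supervisor reply."""
    lowered = text.strip().lower()
    # Check instructions first so "OK, next task: ..." is still executed.
    if any(hint in lowered for hint in INSTRUCTION_HINTS):
        return "instruction"
    if any(lowered.startswith(hint) for hint in ACK_HINTS):
        return "ack"
    return "unknown"


def wait_for_reply(read_state, interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll until streaming stops; return the final text, or None on timeout.

    read_state is a hypothetical callable returning (text, is_streaming).
    """
    for _ in range(max_polls):
        text, is_streaming = read_state()
        if text and not is_streaming:
            return text
        sleep(interval)
    return None
```

Only a reply classified as `instruction` would be handed to `claude -p`. Treating `unknown` as "do nothing" is the safer default for an unattended loop.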

Post-Send Hooks

Every tachikoma-send automatically triggers:

Timestamp + summary appended to status.md
Dashboard HTML updated with "LAST SYNC" time
No manual bookkeeping needed
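The status.md hook can be as small as the sketch below. The line format and the function name `append_status` are assumptions, as the post does not show the actual file layout.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def append_status(status_path, summary, now=None):
    """Append one timestamped summary line to the status file."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    # Collapse newlines so each send produces exactly one log line.
    one_line = " ".join(summary.split())
    line = f"- [{stamp}] {one_line}\n"
    with Path(status_path).open("a", encoding="utf-8") as handle:
        handle.write(line)
    return line


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(append_status(Path(folder) / "status.md", "Sent build report"))
```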

Kings Landing: 6-Agent Hierarchical Orchestration

For complex tasks, I run Kings Landing - a tmux-based multi-agent system with 6 Claude Code instances in a strict hierarchy:

User (The Lord)
    ↓
Tyrion (Command Layer)
    ↓
Varys · Stannis  (Expert Layer)
    ↓
Gendry · Podrick · Davos  (Execution Layer)

Strict Role Enforcement

This isn't cosmetic naming. Each agent has forbidden actions enforced through their SKILL.md:

Tyrion (commander): Makes decisions, approves plans. Cannot write code. Must delegate to Gendry.
Varys (investigator): Searches codebases, reads files, researches. Cannot edit files.
Stannis (quality): Reviews code, runs tests, validates. Cannot implement features.
Gendry (lead dev): Implements features, refactors. Must get Stannis review before completion.

Why enforce this? Without role constraints, every agent tries to do everything - investigating, implementing, reviewing their own work. You get the "god agent" anti-pattern where quality degrades because there's no separation of concerns.
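In the system described here, the constraints live in each agent's SKILL.md as instructions. The same rules can also be written as a lookup table, which is a different mechanism shown only to make the separation concrete. The action labels below are hypothetical.

```python
# Forbidden actions per role, transcribed from the list above.
# The action names ("write_code", ...) are hypothetical labels.
FORBIDDEN = {
    "tyrion": {"write_code"},
    "varys": {"edit_file"},
    "stannis": {"implement_feature"},
    "gendry": {"complete_without_review"},
}


def is_allowed(role, action):
    """Return True unless the role is explicitly barred from the action."""
    return action not in FORBIDDEN.get(role.lower(), set())


def first_violation(role, actions):
    """Return the first forbidden action in a planned sequence, or None."""
    for action in actions:
        if not is_allowed(role, action):
            return action
    return None
```

A wrapper script could call `first_violation` on an agent's plan before letting it run, as a second check behind the prompt-level rules.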

The 2-Split Send Rule

Inter-agent communication uses tmux send-keys, but there's a nasty race condition:

# BROKEN - Enter gets eaten or duplicated
tmux send-keys -t kings-landing:0.1 'message text' Enter

# WORKS - 2 separate Bash calls
tmux send-keys -t kings-landing:0.1 'message text'  # Call 1
tmux send-keys -t kings-landing:0.1 Enter            # Call 2

This took hours to debug. Messages were silently corrupted because tmux's input buffer handles text + Enter differently in a single call. Every agent system I build now uses this pattern.
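The same rule expressed in Python, for anyone driving tmux from a script. This is a sketch: the helper names are hypothetical, and the `-l` flag (send the text literally) is an addition that the Bash example above does not use.

```python
import subprocess


def build_send_commands(target, message):
    """Return the two argv lists for a split send: text first, then Enter."""
    return [
        # -l sends the text literally, so words like "Enter" inside the
        # message are not interpreted as key names.
        ["tmux", "send-keys", "-t", target, "-l", message],
        ["tmux", "send-keys", "-t", target, "Enter"],
    ]


def send_to_pane(target, message, run=subprocess.run):
    """Run the two commands as separate tmux invocations."""
    for argv in build_send_commands(target, message):
        run(argv, check=True)
```

Calling `send_to_pane("kings-landing:0.1", "message text")` issues two independent tmux processes, which mirrors the two separate Bash calls.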

Approval Levels

Level	What	Who Approves
L0	Read files	Nobody
L1	Edit existing files	Stannis
L2	Create/delete files	Tyrion
L3	Git push, external API	Tyrion + User
L4	Production deploy	User explicit
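The table translates directly into a small gate function. A sketch, with hypothetical function names and lowercase approver identifiers chosen for illustration:

```python
# Approvers required at each level, transcribed from the table above.
REQUIRED_APPROVERS = {
    "L0": set(),
    "L1": {"stannis"},
    "L2": {"tyrion"},
    "L3": {"tyrion", "user"},
    "L4": {"user"},
}


def missing_approvals(level, granted):
    """Return the approvers still needed before an action may proceed."""
    required = REQUIRED_APPROVERS[level]
    return required - {name.lower() for name in granted}


def may_proceed(level, granted):
    """Return True once every required approver has signed off."""
    return not missing_approvals(level, granted)
```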

Shogun System: Parallel Execution Army

When I need raw parallelism - processing 8 tasks simultaneously - I adopted the Shogun system, an open-source multi-agent orchestration framework created by yohey-w. It deploys 10 Claude Code agents in a feudal Japanese military structure:

将軍 (Shogun) - Strategic planner
  → 家老 (Karo) - Task decomposer & distributor
    → 足軽 1-8 (Ashigaru) - Parallel workers

I integrated Shogun alongside Kings Landing to handle tasks that benefit from massive parallelism. While Kings Landing is my original system designed for hierarchical quality control, Shogun excels at distributing independent tasks across many workers simultaneously.

Credit: Shogun was designed and built by yohey-w. I installed and configured it to complement Kings Landing. For more details, see yohey-w's Zenn article (Japanese) and the GitHub repository.

Communication happens through dedicated YAML files per worker (not a shared queue), preventing race conditions entirely. Karo writes queue/tasks/ashigaru1.yaml through ashigaru8.yaml, each worker reads only their own file.
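A sketch of the one-file-per-worker idea. Only the `queue/tasks/ashigaruN.yaml` path comes from the description above. The field names (`task_id`, `description`, `status`) and the write-then-rename step are assumptions, not Shogun's actual schema or code.

```python
import json
import tempfile
from pathlib import Path


def write_task(queue_dir, worker_index, task_id, description):
    """Write one worker's task file; each worker reads only its own file."""
    if not 1 <= worker_index <= 8:
        raise ValueError("worker_index must be between 1 and 8")
    path = Path(queue_dir) / "tasks" / f"ashigaru{worker_index}.yaml"
    path.parent.mkdir(parents=True, exist_ok=True)
    # json.dumps emits a double-quoted scalar, which is also valid YAML,
    # so no third-party YAML library is needed for this sketch.
    body = (
        f"task_id: {json.dumps(task_id)}\n"
        f"description: {json.dumps(description, ensure_ascii=False)}\n"
        "status: assigned\n"
    )
    # Write to a temp file, then rename, so a worker never sees a partial file.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(body, encoding="utf-8")
    tmp.replace(path)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as queue:
        print(write_task(queue, 1, "t-001", "Refactor save system").read_text())
```

Because no two writers ever target the same file, there is nothing to lock.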

Wake-up uses tmux send-keys - no polling, no wasted API calls.

CDP Stealth: Bypassing Bot Detection

Browser automation triggers bot detection on many modern websites. I built a 372-line JavaScript stealth payload that patches 12 detection vectors:

navigator.webdriver → undefined (normally true under automation)
chrome.runtime/app/csi/loadTimes → mock Chrome extension APIs
navigator.plugins → fake PDF plugin array (empty in headless)
WebGL vendor/renderer → "Intel Inc." / "Intel Iris" (hides SwiftShader)
window.outerHeight → adds ~85px toolbar offset (headless has equal inner/outer)
User-Agent → strips "HeadlessChrome"
Image dimensions → broken images return 20x20 instead of 0x0
Media codecs → fake H.264/AAC support
iframe contentWindow.chrome → patched on dynamically created iframes

Injected via Page.addScriptToEvaluateOnNewDocument, the payload executes before any page JavaScript, so detection scripts never observe the unpatched values.
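For reference, the CDP message that registers such a script has a simple shape. The sketch below builds it with a single illustrative patch; the one-line `STEALTH_SOURCE` is a stand-in, not an excerpt from the 372-line payload, and `build_inject_message` is a hypothetical name.

```python
import json

# One illustrative patch only; the real payload covers far more.
STEALTH_SOURCE = (
    "Object.defineProperty(navigator, 'webdriver', "
    "{ get: () => undefined });"
)


def build_inject_message(source, message_id=1):
    """Build the CDP message that registers `source` for every new document."""
    return json.dumps({
        "id": message_id,
        "method": "Page.addScriptToEvaluateOnNewDocument",
        "params": {"source": source},
    })


if __name__ == "__main__":
    print(build_inject_message(STEALTH_SOURCE))
```

The registered script applies to documents loaded after the call, so the target page has to be navigated or reloaded afterwards.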

Combined with an hCaptcha solver workflow:

stealth inject → hcaptcha prepare → Claude visually analyzes screenshot
→ hcaptcha click-coords 120,80 250,90 → hcaptcha submit

Claude's multimodal vision identifies the correct tiles. The non-engineer operates the entire flow from the command line.

claude-watch: File-Watching Auto-Execution

The simplest but most useful tool. A 606-line Bash daemon that polls instructions.md every 5 seconds:

Edit instructions.md → claude-watch detects mtime change
  → extracts new instruction (after <!-- claude-watch-end --> marker)
  → checks instruction-id UUID (prevents duplicate execution)
  → runs: echo "$instruction" | claude -p --permission-mode bypassPermissions
  → appends completion marker to instructions.md
  → sends Windows notification (VBS popup + ascending chime)

Why polling instead of inotify? WSL2 doesn't reliably detect Windows filesystem changes through inotify. The 5-second latency is acceptable.
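The detection and de-duplication steps can be sketched in Python as follows. The real daemon is Bash. The `<!-- instruction-id: ... -->` comment format is an assumption, since the post only says a UUID is checked, and both function names are hypothetical.

```python
import os
import re
import tempfile

END_MARKER = "<!-- claude-watch-end -->"
# Assumed ID format; the source only says an instruction-id UUID is checked.
ID_PATTERN = re.compile(r"<!--\s*instruction-id:\s*([0-9a-fA-F-]{36})\s*-->")


def extract_pending(text, seen_ids):
    """Return (instruction_id, instruction) for new work, else None."""
    # Only the text after the last end marker is a candidate instruction.
    _, _, tail = text.rpartition(END_MARKER)
    match = ID_PATTERN.search(tail)
    if match is None:
        return None
    instruction_id = match.group(1).lower()
    if instruction_id in seen_ids:
        return None  # already executed; skip duplicates
    instruction = ID_PATTERN.sub("", tail).strip()
    if not instruction:
        return None
    return instruction_id, instruction


def poll_once(path, last_mtime, seen_ids):
    """Check the file once; return (new_mtime, pending_or_None)."""
    mtime = os.stat(path).st_mtime
    if mtime == last_mtime:
        return last_mtime, None
    with open(path, encoding="utf-8") as handle:
        return mtime, extract_pending(handle.read(), seen_ids)


if __name__ == "__main__":
    uid = "123e4567-e89b-12d3-a456-426614174000"
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "instructions.md")
        with open(target, "w", encoding="utf-8") as handle:
            handle.write(f"{END_MARKER}\n<!-- instruction-id: {uid} -->\nAdd a pause menu\n")
        print(poll_once(target, None, set()))
```

A driver loop would call `poll_once` every 5 seconds, pipe any pending instruction into `claude -p`, add its ID to `seen_ids`, and then append the completion marker.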

What This Means for Non-Engineers

I've essentially built a software team simulation:

Product owner (me): describes what I want
Tech lead (Tachikoma/Tyrion): decomposes into tasks
Senior devs (Gendry + workers): implement
QA (Stannis): reviews
DevOps (claude-watch, CDP tools): automates deployment

The game (DUNG: Azure Flame) was the proof of concept. The infrastructure is the real product. A non-engineer in Japan, earning in dollars, operating a multi-agent AI development pipeline that ships real software.

Key Takeaways for Developers

Enforce role separation in multi-agent systems. Without it, agents become generalists that do everything poorly.
tmux is underrated as an agent orchestration layer. Visual debugging, manual intervention, persistent sessions - all free.
CDP is a dependable automation protocol for Chromium-based browsers. If you need to bridge local tools and web apps, CDP over WebSocket is a strong option.
WSL2 → PowerShell → Chrome is a viable automation pipeline. Encoding is hell, but solvable with Unicode escaping + Base64.
File-based communication beats message queues for AI agents. YAML files, markdown files, JSON - human-readable, git-trackable, debuggable.

Try It

The game: DUNG: Azure Flame on itch.io ($2 Early Access)
Source code: GitHub
Follow the build: @yurukusa_dev on X

Built with Claude Code by Anthropic. The multi-agent infrastructure described here runs on a single Windows laptop with WSL2 and a $200/month Claude subscription.

Update: We open-sourced the core hooks that make autonomous Claude Code sessions possible: Claude Code Ops Starter - context monitoring, autonomous mode, syntax check, and decision guards. One-command install, MIT licensed.

Not sure where to start? Check your setup safety in 10 seconds - free, runs locally, no signup.

For the complete autonomous operations setup, see the CC-Codex Ops Kit (pay what you want) - 22 files, 15-minute setup.

Free Tools for Claude Code Operators

Tool	What it does
cc-health-check	20-check setup diagnostic (CLI + web)
cc-session-stats	Usage analytics from session data
cc-audit-log	Human-readable audit trail
cc-cost-check	Cost per commit calculator

Interactive: Are You Ready for an AI Agent? - 10-question readiness quiz | 50 Days of AI - the raw data

More tools: Dev Toolkit - 440+ free browser-based tools for developers. JSON, regex, colors, CSS, SQL, and more. All single HTML files, no signup.

Make Claude Code safe: npx cc-safe-setup — 8 hooks, 10 seconds, zero config. GitHub
