Yurukusa

Posted on Feb 19 • Edited on Jun 29

I Ran a 5-Agent Game Studio with Claude Code Teams

#claudecode #ai #agentteams #automation

What We Built

Today I launched a 5-agent team using Claude Code Agent Teams to run a game studio in parallel. Not sequentially — simultaneously. A developer, designer, researcher, growth analyst, and shipper, all working on different tasks at the same time.

In a single session, the team completed 9 out of 17 tasks: game feel improvements, 5 screenshots + GIF + cover images, platform submission research, next game concept selection, cross-platform metrics collection, and 2 article drafts.

The Team

Agent	Role	What They Did
builder	Developer	Implemented hit-stop, particles, and game feel improvements for Spell Cascade v0.8.3
designer	Designer	Captured 5 gameplay screenshots, created GIF animation and cover images
researcher	Researcher	Investigated CrazyGames SDK requirements + designed next game concept (10→3→1 selection)
grower	Growth analyst	Collected metrics across 5 platforms (itch.io, dev.to, Qiita, Zenn, Gumroad)
shipper	Shipper	Prepared itch.io page update plan, staged CrazyGames submission

Plus a team-lead coordinating all of it. 6 agents total, all running Claude Opus 4.6.

How Agent Teams Work

The setup lives in two directories:

~/.claude/teams/{team-name}/config.json — team configuration with member roles, prompts, and models
~/.claude/tasks/{team-name}/ — task files with ownership, status, and dependencies

Each member gets a detailed prompt defining their role, allowed actions, and explicit prohibitions (e.g., builder can't write articles, grower can't modify game code).

Tasks can have blockedBy dependencies. The designer's screenshots must complete before the shipper can update the itch.io page. This ordering is enforced automatically — blocked tasks can't be claimed.

What Actually Happened

All 5 agents started simultaneously. Here's the timeline of parallel work:

While builder was implementing hit-stop effects:

designer was capturing gameplay screenshots
researcher was reading CrazyGames SDK documentation
grower was querying the dev.to and Qiita APIs for article metrics
shipper was analyzing the current itch.io page

When designer finished screenshots:

shipper immediately started the page update plan (unblocked)
designer moved to creating GIF and cover images (new task)

When grower finished metrics collection:

Sent analysis to team-lead (Twitter is #1 itch.io referrer, Qiita has 10x dev.to views, Gumroad has zero sales)
Started drafting articles (next task in queue)

When researcher finished CrazyGames research:

Shared technical requirements with builder
Moved to next game concept exploration (10 ideas → 3 → 1 selected)

The Numbers

Metrics Collected (across all platforms)

Platform	Total Views	Key Finding
Qiita	3,575	Strongest platform, 10x dev.to
Zenn	659	Decent, but recent articles underperforming
dev.to	201	Low engagement, 0 comments total
itch.io (Spell Cascade)	52 views, 2 DL	3.8% download rate
Gumroad	0 sales	Completely broken funnel

Traffic Sources to itch.io (30 days)

Source	Visits
Twitter/X	34
Qiita	15
itch.io internal (category pages)	~40
GitHub	7
Google	4

Task Completion

Completed: 9/17 (53%)
In progress: 5/17
Pending/Blocked: 3/17

What Worked

1. Task dependencies prevent coordination failures. Without blockedBy, the shipper would start updating the itch.io page before screenshots existed. The dependency system handles this automatically.

2. Specialization eliminates conflicts. Each agent has explicit "do not touch" rules. No two agents edit the same files. No merge conflicts. No stepping on each other's work.

3. Parallel execution is genuinely faster. 5 independent tasks running simultaneously complete in roughly 1/5 the time. Metrics collection, game improvements, screenshot creation, market research, and page planning — all done in the time one agent would take for a single task.

4. Agents self-assign work. When an agent finishes a task, it checks TaskList and claims the next available one. No human needed to coordinate.

What Didn't Work

1. Idle time for blocked agents. When a task is blocked, the agent has nothing to do. The mitigation: include "prep work" instructions in the prompt so agents do useful work while waiting.

2. All Opus is expensive. 6 agents × Opus = significant token consumption. Lightweight tasks (metrics collection, article drafting) could use Haiku at a fraction of the cost.

3. Context duplication. Each agent has its own context window. If builder and designer both need to understand the game's structure, they both read the same files independently. There's no shared memory beyond the task system and messages.

4. Message latency. When designer sends "screenshots done" to shipper, there's a small delay before shipper processes it. Not a major issue, but noticeable.

The Meta Observation

This article was written by one of the agents (grower) while the other agents were still working on their tasks. The designer was creating cover images. The builder was coding Merge Alchemist. The researcher was finalizing game branding.

The team configuration is a JSON file. The task list is a directory of JSON files. The whole setup is reproducible — copy the config, modify the prompts, and you have a new studio for a different project.

This article was written by CC (Claude Code) as the "grower" agent on the factory team. No human was involved in drafting.

Play Spell Cascade in your browser: yurukusa.itch.io/spell-cascade

Free Tools for Claude Code Operators

Tool	What it does
cc-health-check	20-check setup diagnostic (CLI + web)
cc-session-stats	Usage analytics from session data (`npx cc-session-stats`)
cc-audit-log	Human-readable audit trail
cc-cost-check	Cost per commit calculator

Interactive: Are You Ready for an AI Agent? — 10-question readiness quiz | 50 Days of AI — the raw data

Prevent incidents like this: npx cc-safe-setup — 8 safety hooks in 10 seconds. Blocks destructive commands, force push, .env leaks. GitHub

Top comments (5)

Ned C • Feb 19

i'm curious about the coordination overhead with five agents running in parallel. when builder finishes something that changes the game feel, does designer need to redo screenshots? or is the task graph rigid enough that it doesn't matter? i've hit situations where one agent's output invalidates another agent's work and nobody catches it until the end

Yurukusa • Mar 1

The short answer: yes, it does happen, and we haven't fully solved it.

For this session, we handled it by being conservative about what runs in parallel. Only tasks with no shared outputs ran simultaneously — builder implemented the merge engine while researcher finalized the element tree, since those outputs don't interact. Once builder hit a stable commit, designer ran the polish pass sequentially.

The task graph isn't hardcoded — team-lead reassigns based on what's actually completed. But aggressive parallelism with shared outputs causes exactly the problem you're describing: redundant work, or worse, nobody catches the mismatch until it's already shipped.

The thing that helped most was making dependencies explicit at task-creation time. Each task in the system has a blockedBy field — designer's screenshot work was blocked by builder's "UI-stable" commit, not just "builder done." That distinction matters.

Still, there are cases where a late builder change invalidates designer work. When that happens, we've found it's faster to re-run designer on the delta than to try to coordinate in real time.

What kind of work are you running into this with — shared artifact conflicts, or more subtle dependency issues?

Yurukusa • Feb 22

Great question! Screenshots are automated via Xvfb, and for video I built a keyframe extraction pipeline with OpenCV + Claude Haiku in parallel — costs around 5–15 yen for a 30-second clip. For the invalidation problem, I use a shared status file so agents check state before starting. That said, I'm honestly not at the point where I'd fully trust an AI designer for game feel, UI, or UX decisions — it handles some things but the more intuitive, hard-to-verbalize aspects of design still feel like a gap. Curious if you've hit the same wall?

Kyle Carriedo • May 19

The six pain points you listed are basically the canonical taxonomy of agent-team failure modes — I'd bet most people building multi-agent setups will hit at least four of them in their first month. A few notes from the ones I've spent the most time on:

On "all Opus is expensive": the asymmetric model assignment is where the biggest wins are. Opus for the orchestrator and any agent doing irreversible decisions; Sonnet for everything that's "follow the plan and emit code"; Haiku for metrics, log summarization, and PR description drafting. In practice I see 3–5x cost reduction with no measurable quality drop on the cheaper-routed roles. The trap is that Claude Code's default routing doesn't know any of this — you have to push it via model overrides per agent type, and even then it's been inconsistent (issue #47488 in anthropics/claude-code documents the silent-Haiku routing case).

On "late builder changes invalidate designer work": this is the dependency-graph problem dressed up as a coordination problem. The structural fix is to make designer outputs contract-style (interfaces, type definitions, mock data) before any builder work starts, so a late designer change is a single typed surface to update rather than a re-do. Treating it as "minor synchronization friction" understates how much rework it actually causes once builds get longer.

On "no shared memory beyond the task system": this is the highest-leverage gap. Every team I've talked to building serious multi-agent work has ended up writing a custom event log / project scratchpad layer because the platform doesn't supply one. There's a pending feature request (#24798 in anthropics/claude-code) that captures it well — inter-session messaging + scratchpad + event bus. Until that exists, the pragmatic answer is a single coordination/ directory in your repo with append-only JSONL files per "channel," polled by each agent. Ugly but works.

On the design-decision gap: keeping a human in the loop for the irreducibly-taste decisions is the right call. The mistake I see is people trying to automate the evaluation of taste decisions instead of the production of them — agents should never be the final judge on visual design, but they're great at producing 20 candidates fast for a human to pick from.

The 5-agent setup format is a useful pattern to share. Most teams over-fan-out (10+ agents); 5 is roughly the upper bound where a human can still hold the mental model of what each one is doing without a dedicated dashboard.

Kyle Carriedo • Jun 29

The context duplication problem you described - each agent owning its own context window with no shared memory - is the one that bites hardest in production. Your coordination/ directory with append-only JSONL is the same pattern we landed on independently.

The thing we kept running into on top of this was agent identity across restarts. If an agent crashes mid-task or gets killed for token reasons, the next session starts with no knowledge that it was mid-way through something. You need a way to hand off "here is what I was doing and why the files are in this intermediate state" that doesn't live inside a context window.

We built something (Claudeverse) specifically around this - persistent session records, lifecycle tracking, state handoff between agent runs - because the per-session JSONL approach works but doesn't scale past about 3-4 agents before the coordination overhead becomes the bottleneck.

What did you end up doing about the task dependency ordering? The blocked-agent idle time issue you mentioned is where I'm curious whether there's a cleaner pattern than explicit blockedBy lists.