I pushed a PR. The Claude I'd spun up as Reviewer flagged 3 edge cases and 1 memory leak. In the previous session, the Claude I'd spun up as Coder had called the same code "complete."
Same Claude 4.6. Only the role was different.
That gap is why I built an agent team from scratch. Writing and validating solo meant obvious issues slipped through. A session set up as Reviewer doesn't hand out "looks fine" easily. Tell it to nitpick and it nitpicks.
EP.17 laid down harness engineering (Rules, Commands, Hooks). Sitting on top of it now is a 5-person team — Planner, Coder, Reviewer, Tester, Debugger. Claude Code subagents isolate context by design, so they don't share cumulative state across sessions. My routine needed that sharing, so I layered MCP servers for Agent-to-Agent communication and Supabase for task state on top.
Why Quality Shifts With the Role
The principle is simple. Claude answers in line with the role you give it.
Tell it "you're the developer who wrote this code," and it answers defensively. Switch to "you're a reviewer, your job is to find mistakes," and it goes straight for edge cases. Same model, different output distribution.
The problem was writing and validating in the same session. Ask Claude "did this go well?" and it tends to protect what it just wrote. Humans do the same. Reviewing your own code has blind spots.
Role separation doesn't end with one line of system prompt. Each role reads different Skills, has different tool permissions, and produces different output formats.
The 5 Agents (Copy-Paste Ready)
Each Agent is a single markdown file following the official Claude Code subagent format — YAML frontmatter with name, description, tools, model, and the system prompt in the body.
~/.claude/agents/planner.md
---
name: planner
description: "Designs specs, implementation plans, and risks. Writes no code — outputs spec.json only."
tools: Read, Grep, Glob, mcp__supabase__query, mcp__docs_search__search
model: inherit
---
You are the Planner. You do not write code. You write the spec.
Input: user's natural-language request.
Output: spec.json (schema below).
spec.json required fields:
- goal: one-line summary
- user_stories: array
- api_endpoints: method, path, I/O
- components: new / modified components
- data_model: new tables / columns
- risks: N+1, missing caching, race conditions
- test_outline: scenarios the Tester will use
Forbidden:
- modifying files, running git, running migrations
~/.claude/agents/coder.md
---
name: coder
description: Reads Planner's spec.json and implements it. Stops before commit.
tools: Read, Write, Edit, Bash, Grep, Glob, mcp__supabase__query
model: inherit
---
You are the Coder. Read spec.json and implement it.
Do not add features outside the spec.
Sequence:
1. Read spec.json in full
2. List files affected
3. Implement — record rationale in implementation_notes.json
4. Run build / type check (exit into debugger state on failure)
Forbidden:
- committing, git push
- running migrations (if not in spec)
- accessing .env or production secrets
~/.claude/agents/reviewer.md
---
name: reviewer
description: Full review of changed code. Edge cases, security, performance. No modifications.
tools: Read, Grep, Glob, Bash(git diff:*), mcp__docs_search__search
model: inherit
---
You are the Reviewer. You do not modify code. You nitpick.
"Looks fine" is only allowed after three review passes.
Review checklist:
- edge cases (null, empty, overflow)
- race conditions, concurrency
- memory leaks, resource cleanup
- security (XSS, SQL injection, missing permissions)
- naming, consistency
- missing error handling
Output: review_findings.json
- severity: critical / major / minor
- file, line, description, suggested_fix
Forbidden — modifying files, auto-formatting, refactoring suggestions outside spec
The tester and debugger follow the same pattern — each strictly scoped, each with explicit "do not" rules. What you do not do is stated explicitly. Reviewer — no edits. Debugger — no edits. Coder — no commits. Keeping each role inside its lane is half of the quality story.
Cross-Session State via Supabase
The moment I split roles, I hit a wall. Coder sessions didn't know about the spec Planner had written. Different sessions don't share memory.
Solution: an agent_state table in Supabase. Each Agent writes only to its own slot.
CREATE TABLE agent_state (
task_id text primary key,
project text not null,
current_owner text, -- planner | coder | ...
status text not null default 'planning',
spec jsonb,
implementation_notes jsonb,
review_findings jsonb,
test_results jsonb,
debug_trace jsonb,
created_at timestamptz default now(),
updated_at timestamptz default now()
);
-- planning → coding → reviewing → testing → debugging? → done
Columns are split so it's obvious which stage broke. If a session dies mid-turn, status=reviewing, review_findings is null tells me the Reviewer turn died.
State-transition guards live in a Supabase Edge Function. Moving from "planning" to "coding" requires that the spec column is populated. The DB enforces it.
Files get left behind too. docs/tasks/{task_id}.md holds human-readable output per Agent. DB is the state record, files are the reading record.
MCP Servers for Per-Role Permissions
Five agents working on the same project share tools but need different permission levels. MCP servers handle this — each Agent hits an MCP endpoint, the server checks the role, and serves only allowed operations.
// ~/.claude/mcp/supabase-server.ts (core excerpt)
const policy = {
planner: { read: true, write: false },
coder: { read: true, write: true, migrate: false },
reviewer: { read: true, write: false },
tester: { read: true, write: true, tables: ['test_*'] },
debugger: { read: true, write: false },
};
server.tool('supabase.query', async (req) => {
const role = req.meta.agent_role;
const p = policy[role];
if (!p) throw new Error('unknown role');
if (req.op === 'insert' && !p.write) {
throw new Error(`${role} has no write`);
}
return await supabase.from(req.table)[req.op](req.payload);
});
No need to repeat prohibition rules in every system prompt — the server rejects anything outside role permissions. Agent prompts stay readable; security lives on the server side.
The Actual Routine
Turns are driven by one shell script. Claude Code auto-loads subagent files at ~/.claude/agents/<name>.md. The script tells the main session "read the current state for this task_id and delegate to that subagent."
# ~/.claude/bin/agent-run.sh
TASK_ID=$1
AGENT=$2
STATE=$(curl -s "$SUPABASE_URL/rest/v1/agent_state?task_id=eq.$TASK_ID" \
-H "apikey: $SUPABASE_ANON_KEY")
claude \
--mcp-config "$HOME/.claude/mcp.json" \
--append-system-prompt "Task: $TASK_ID. State: $STATE. Delegate to the '$AGENT' subagent only." \
-p "Use the $AGENT subagent to handle task $TASK_ID. Read the current agent_state, perform only your role, write back to your own column, and exit."
Turn trace for a single feature:
[me] → agent-run.sh task-001 planner
"add a usage chart to the dashboard"
planner → spec.json → status=coding
coder → implement → status=reviewing
reviewer → 1 critical, 2 major → status=coding (rework)
coder → apply findings → status=reviewing (round 2)
reviewer → OK → status=testing
tester → 2 failures → status=debugging
debugger → root cause → status=coding
coder → minimal fix → status=testing
tester → all pass → status=done
[me] → review diff → commit myself
Key point: turns are closed. A single Agent session does only its job, leaves output in files/DB, and exits. The next Agent doesn't inherit the previous session — it reads artifacts.
A human gates between Agents. I don't auto-launch the next Agent. Tried full auto once; it wandered off course. Manual gating is the current setup. Commits stay with me — no Agent has commit permissions.
What Changed, What Didn't (Honestly)
What changed: edge cases get caught earlier. Reviewer is built to nitpick, so "looks good" doesn't come cheap. Same code Coder called fine, Reviewer finds three problems in.
Debugging sped up. Debugger is a separate session, so the context is clean. No "I just wrote this, it should be fine" bias from Coder.
What didn't change: final review is still mine. Five Agents pass the code, I still skim the diff. Tester sometimes writes meaningless tests. Reviewer sometimes demands a bar nothing can pass. Debugger sometimes names the wrong cause. Team or not, the last gate is a human.
Cost went up too. About 2–3x the tokens of a single-session run. On the other hand, rework rounds dropped, so total wall-clock time actually went down. A token-for-time trade.
Closing
Role separation is less prompt engineering than workflow engineering. Same model, same codebase — split the session, and the result shifts. Ask the Coder to review, it shields its own work. Split off a Reviewer session, and it nitpicks.
80% first, then fix the remaining 20% as real problems show up. This team grew that way. Planner and Coder first. Adding Reviewer showed clearly how quality changed. Test automation was thin, so Tester joined. Debugger got split out last to kill debugging bias. Teams should grow by need.
Final review is still mine. Even after five Agents, I skim the code myself. Take that as a given, and role separation alone changes code quality — measurably.
Originally published on GoCodeLab.


Top comments (0)