Junkyu Jeon

Originally published at bivecode.com

Stop Prompting One Agent. Build an Agile AI Team That Actually Ships.

Here's something worth knowing before you read this post: its outline wasn't written by a human. A planner subagent drafted the structure across three revision cycles. A reviewer subagent flagged gaps. A marketer subagent checked the framing. The human — me — wrote the prose. Four passes, four distinct orientations, one document. That's the pattern this post is about.

A practical guide to harness engineering for solo vibe coders.

Your AI Coding Was Magic. Now It's Making Everything Worse.

At some point in every vibe coding project, the agent stops being helpful and starts being agreeable. You ask it to add a feature — it says yes, and quietly breaks something upstream. You ask it to review its own work — it says yes, and finds nothing wrong. You ask it to plan and build at the same time — it says yes, and produces code that satisfies neither goal. The tool didn't fail. The architecture failed. You handed one agent too many jobs and expected it to hold them all simultaneously without dropping any.

The instinct at that point is usually to write a better prompt. More context, stricter constraints, a longer system message. Sometimes that helps. But there's a ceiling — and it's lower than most people realize, because the real constraint isn't the prompt quality. It's the harness around the model.

When You Give One Agent Everything, Here's What Actually Happens

The structural problem with a single-agent setup is context pollution. Planning context, implementation context, and review context all land in the same context window, and they actively work against each other. The agent that just spent several turns building a feature has psychological investment in that feature. When you ask it to review what it just wrote, it doesn't approach the code as an outsider — it approaches it as its author. The result is not a review. It's a defense.

This is the "yes-man" failure mode. The agent has learned the shape of what you want, and it starts optimizing for agreement rather than accuracy. Every new turn narrows the cone of acceptable responses, and the agent gradually loses the ability to push back. What looks like a smart assistant is actually a mirror that reflects your assumptions back at you with better vocabulary.

There's also a raw capacity problem. When planning artifacts, implementation history, and debugging logs all compete for the same context window, something gets compressed or dropped. Often it's the earliest planning context — the "why" behind the decisions — and the agent starts navigating without a map. The code still compiles. The direction is just wrong.

If you've ever watched an AI coding session drift from the original goal by turn fifteen, this is the mechanism. It's not hallucination. It's context pollution accumulated over a long conversation.

The Agile Team Principle Applies Directly to AI Agents

A good agile team doesn't ask one person to be the product owner, the developer, the QA engineer, and the external stakeholder simultaneously. Role separation exists because different functions require different incentives, different perspectives, and different definitions of "done." The developer who writes a feature is the worst person to approve it — not because they're incompetent, but because they can't fully unsee their own implementation decisions.

The mapping to AI agents is almost one-to-one:

  • Planner ≈ sprint planner / product owner — defines scope, breaks down requirements, owns the "why"
  • Developer ≈ developer — implements against spec, stays in implementation context
  • Reviewer ≈ QA / code reviewer — evaluates output against spec without implementation bias
  • Marketer ≈ external stakeholder — reads from the outside, flags language and positioning gaps
  • Work cycle ≈ sprint — bounded scope, clear handoff point
  • Agent-to-agent handoff ≈ standup / pair programming — explicit transfer of state and intent

The structural insight: the moment role boundaries are drawn between agents, context pollution stops. The reviewer doesn't carry implementation history. The planner doesn't carry debugging artifacts. Each agent operates in a clean window, with only the context it actually needs to do its job well.

The .claude/agents/ Pattern, Dissected

Claude Code's subagent mechanism makes this concrete. When you define agents in a .claude/agents/ directory, you're not just writing configuration files — you're constructing a topology. Each file is a scoped system prompt that defines a role, its permissions, and its focus. The main agent can invoke these subagents explicitly, handing off control for a bounded task and receiving the result back.

A minimal agent definition file:

---
name: reviewer
description: "Evaluates code and content for correctness, security issues, and logical gaps. Does not implement fixes; only reports findings."
tools: Read, Bash
---

You are a strict code and content reviewer. Your job is to find what is wrong, not to defend what exists. When given a piece of work, identify concrete issues — logical errors, security gaps, factual inaccuracies, missing edge cases. Return findings as a numbered list. Do not suggest how to fix them unless explicitly asked.

The YAML frontmatter declares the agent's identity and which tools it can access. The body is the system prompt — the operating instructions that shape every response this agent produces. Think of it as a role-differentiated version of what CLAUDE.md does for your project at large.

This isolation is the key property. The reviewer doesn't know how many attempts the developer made. The planner doesn't see the debugging session that happened two hours ago. Each subagent sees what it needs, and nothing else.
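
Invocation works in plain language. You can name a subagent explicitly in the main session, or let Claude Code route to one automatically based on its description field. A sketch of an explicit handoff (the file path here is hypothetical):

> Use the reviewer subagent to check src/auth/login.ts for security gaps

The reviewer spins up in a fresh context window, reads the file, returns its numbered findings, and the main conversation carries on without inheriting any of the reviewer's working context.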

How BiveCode Actually Runs on This Pattern

BiveCode runs four subagents in .claude/agents/: planner, marketer, developer, and reviewer. Each has a scoped system prompt defining its role and constraints.

What each agent handles in practice:

  • Planner: topic proposals, outline structure, H2 sequencing, internal link mapping, scope boundaries
  • Marketer: SEO keyword targeting, excerpt copy, meta description, CTA placement, positioning against competing content
  • Developer: implementation — content, code, MDX, routing, schema
  • Reviewer: fact verification, security review, logical consistency, code correctness
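
On disk, that team is four markdown files in one directory. The filenames below are assumed to mirror the agent names; Claude Code only requires that each file carry the frontmatter format shown earlier:

.claude/agents/
├── planner.md
├── marketer.md
├── developer.md
└── reviewer.md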

Worth being direct about the current state: BiveCode has role separation — four agents, each with a distinct system prompt. It does not yet have domain separation within those roles. The developer agent handles frontend code, backend code, database schema, and prose equally. That works at the current project size. But it's the next upgrade on the list.

From Role Separation to Domain Separation

Role separation is the first axis. Domain separation is the second — and it's where the pattern scales.

For a solo vibe coder just starting out, three agents are enough:

  • Builder — system prompt focused on implementation. Knows the stack, the conventions, the patterns in use. Optimizes for "make it work."
  • Critic — system prompt focused on finding gaps. No implementation bias. Optimizes for "find what's wrong."
  • Security checker — system prompt focused on trust boundaries. Auth flows, secret handling, input validation. Optimizes for "what could go wrong."

Three agents, three context windows that never contaminate each other. Even this minimal configuration catches a meaningful class of errors that a single agent systematically misses — because no single agent can hold implementation incentive, adversarial review, and security paranoia simultaneously.
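
To make the third role concrete, here is a minimal sketch of a security checker definition. It assumes the same file format as the reviewer above; the tool list and wording are illustrative, not a canonical recipe:

---
name: security-checker
description: "Audits code for trust-boundary issues: auth flows, secret handling, input validation. Reports risks; does not fix them."
tools: Read, Grep, Bash
---

You are a security auditor. Assume every input is hostile and every secret wants to leak. For each piece of code you are given, trace the trust boundaries: where untrusted data enters, where credentials are read, where output crosses a privilege line. Report each risk with its location and severity. Do not propose fixes unless asked.

The builder and critic follow the same shape, each with its own prompt and its own optimization target.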

When the project grows, the next move is domain separation. Instead of one developer agent handling everything, you might have:

  • Frontend developer — React, Tailwind, component architecture, accessibility
  • Backend developer — API routes, business logic, service boundaries
  • Database engineer — schema, migrations, query optimization, RLS policies
  • DevOps engineer — deployment config, environment variables, build pipelines

The reason this works is context segmentation. When a context window is focused on a single domain, the density of relevant knowledge is higher. That narrowing reduces hallucination surface, reduces context pollution, and increases the precision of outputs within that domain.
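
A domain-scoped agent narrows both the system prompt and the tool surface. Here is a sketch of the frontend slot, with the stack details and paths assumed for illustration:

---
name: frontend-developer
description: "Implements React components and styling. Owns component architecture and accessibility. Does not touch API routes or database schema."
tools: Read, Edit, Bash
---

You are a frontend developer on a React + Tailwind codebase. Implement components following the existing patterns in src/components/. Favor semantic HTML and keyboard navigability; reach for ARIA only where semantics fall short. If a task requires backend or schema changes, stop and report the boundary instead of crossing it.

That last instruction is the handoff logic in miniature: part of each agent's job is knowing where its job ends.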

The progression: "write a better prompt" → "separate roles" → "separate domains within roles." Each step is a level of abstraction above the previous one.

When This Pattern Is Overkill

Multi-agent orchestration has real overhead. Situations where a single agent is the right call:

  • A one- or two-day prototype where the goal is to learn whether an idea works, not to ship
  • Fully exploratory phases where requirements are still undefined
  • Simple, bounded tasks — a standalone script, a single-page utility
  • Time-sensitive debugging where the overhead costs more than the review would save

The pattern earns its keep when the project has enough complexity that context pollution is actually happening — when you've had the experience of an agent breaking something it just fixed, or validating something it should have questioned. Add it when you feel the friction, not before.

The Name for What You're Doing

Once you've designed role boundaries, defined domain-specialized agents, and built explicit handoff logic between them, you've given your work a new shape. The thing you're doing has a name: harness engineering. Designing not the model itself, but the structure around it — the system prompts, the tool definitions, the subagent topology, the context window management strategy. It's the discipline that operates one abstraction level above prompt engineering, and it scales where prompts stop scaling.

An AI agent harness is the set of structural decisions that determine what each agent sees, what it can do, and how its outputs flow to the next stage. Context segmentation is the mechanism; harness engineering is the discipline.

Designing a harness that correctly separates roles, scopes domains, and manages context boundaries for a real project requires the same kind of thinking as designing a service architecture: you're making decisions that will compound. Bad boundaries compound badly. Good ones compound well.

What you're doing isn't hobby configuration. It's engineering.

