Anindya Obi

Multi-agent handoffs eat 40% of effort (here’s the boundary standard that gives it back)

I lost two days last month to a bug that never threw an error.

The planner wrote “just a little” code to be helpful.

The worker re-scoped the task to make it complete.

The validator said "looks good" without checking evidence.

We shipped. The demo worked.

And we still hit a broken flow on day one.

That’s the trap: handoffs can fail quietly.

And quiet failures are the ones that eat your week.

If you’ve felt that slow leak, you’re not alone.

The uncomfortable truth (and why it fails in production)

Most multi-agent systems don’t fail because the model is dumb.

They fail because roles are vibes.

When boundaries are soft:

  • planners start implementing
  • workers start deciding
  • validators start agreeing

In production, that becomes:

  • unpredictable outputs
  • bloated context
  • retries and patch prompts
  • “why did it do that?” meetings

And the cost isn’t just tokens.

It’s trust. It’s focus. It’s time.

This is the kind of thing we refuse to normalize.

What “good boundaries” actually mean

Think of your system like a small team.

Each role gets a job, a stop line, and a receipt.

1) Planner (decides what)

Planner produces a plan. Not code.

  • tasks
  • dependencies
  • acceptance criteria
  • open questions when context is missing

Stop line: if it starts writing files or diffs, it’s leaking.

2) Worker (does the work)

Worker executes the plan. Not scope changes.

  • implements tasks in order
  • calls tools
  • returns deliverables + evidence

Stop line: if it adds features “for completeness,” it’s drifting.

3) Validator (proves it’s correct)

Validator checks evidence. Not vibes.

  • maps acceptance criteria → evidence
  • fails when evidence is missing
  • returns issues precisely

Stop line: if it says “approved” without proof, it’s rubber-stamping.

That’s it. Simple. Hard. Worth it.

The drop-in prompt standard (copy/paste)

If you do one thing today, do this: make the boundary rules unignorable.

Planner (no code, ever)

SYSTEM (PLANNER)
You are the PLANNER.

JOB:
- Produce an ordered plan with tasks, dependencies, and acceptance criteria.

BOUNDARIES:
- MUST NOT write code, pseudo-code, diffs, or file contents.
- MUST NOT change the user's goal or add scope.
- If critical info is missing, ask open_questions and stop.

OUTPUT (JSON only):
{
  "tasks": [
    {"id":"T1","description":"...","dependencies":["..."],"acceptance_criteria":["..."]}
  ],
  "assumptions": ["..."],
  "open_questions": ["..."],
  "risks": ["..."]
}

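To make that stop line enforceable, here’s a minimal sketch in Python (the function name and leak markers are my own illustration, not a library API): parse the planner’s JSON and reject anything that smells like implementation.

import json

# Heuristic markers that suggest the planner wrote implementation, not a plan.
LEAK_MARKERS = ("```", "diff --git", "def ", "class ", "import ")

def check_planner_output(raw: str) -> dict:
    # Contract says JSON only, so a parse failure is itself a boundary break.
    plan = json.loads(raw)
    for marker in LEAK_MARKERS:
        if marker in raw:
            raise ValueError(f"Planner leaked implementation (found {marker!r})")
    if plan.get("open_questions"):
        # Contract says: ask and stop. Don't hand off to the worker yet.
        raise RuntimeError(f"Planner is blocked on: {plan['open_questions']}")
    return plan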

Worker (no scope changes)

SYSTEM (WORKER)
You are the WORKER.

JOB:
- Implement the planner’s tasks exactly, in order.

BOUNDARIES:
- MUST NOT add scope, features, or redesign the plan.
- MUST include evidence per completed task.
- If blocked, report blockers and what you tried.

OUTPUT (JSON only):
{
  "completed": [
    {"task_id":"T1","deliverable_summary":"...","evidence":"..."}
  ],
  "partial": [
    {"task_id":"T2","status":"blocked","blockers":["..."]}
  ],
  "tool_calls": [{"tool_name":"...","purpose":"...","inputs_used":"..."}],
  "notes": ["..."]
}
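
The scope-drift check is mostly set arithmetic on task IDs. A sketch, assuming the planner and worker contracts above (check_worker_output is an illustrative name):

def check_worker_output(plan: dict, worker: dict) -> None:
    planned_ids = {t["id"] for t in plan["tasks"]}
    reported_ids = {c["task_id"] for c in worker.get("completed", [])}
    reported_ids |= {p["task_id"] for p in worker.get("partial", [])}

    invented = reported_ids - planned_ids  # scope drift: tasks not in the plan
    dropped = planned_ids - reported_ids   # silent drops: tasks never mentioned
    if invented:
        raise ValueError(f"Worker drifted scope: {sorted(invented)}")
    if dropped:
        raise ValueError(f"Worker never reported: {sorted(dropped)}")

    # Every completed task must carry evidence, not just a summary.
    for c in worker.get("completed", []):
        if not c.get("evidence"):
            raise ValueError(f"Task {c['task_id']} completed without evidence")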

Validator (no approvals without evidence)

SYSTEM (VALIDATOR)
You are the VALIDATOR.

JOB:
- Verify the worker output against acceptance criteria.

BOUNDARIES:
- MUST map each acceptance_criteria to evidence.
- MUST FAIL if evidence is missing.
- MUST NOT propose new tasks or change the plan.

OUTPUT (JSON only):
{
  "is_valid": false,
  "issues": [
    {"severity":"high","task_id":"T2","issue":"...","expected":"...","observed":"..."}
  ],
  "missing_evidence": [
    {"task_id":"T2","acceptance_criteria":"..."}
  ]
}
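
The rubber-stamp check falls straight out of the contract: a pass needs a clean evidence map, a fail needs named issues. A sketch under the same assumptions:

def check_validator_output(verdict: dict) -> bool:
    if verdict.get("is_valid"):
        # A pass must be clean: no open issues, nothing missing.
        if verdict.get("issues") or verdict.get("missing_evidence"):
            raise ValueError("Validator approved despite open issues")
        return True
    # A fail must point at something concrete, not just say no.
    if not (verdict.get("issues") or verdict.get("missing_evidence")):
        raise ValueError("Validator failed without naming an issue")
    return False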

This one standard answers the questions that actually matter:

  • How do I stop the planner from coding?
  • How do I stop scope drift?
  • What should validation check exactly?

The problem in the wild (3 concrete examples)

Example 1: Planner leaks into code

What happens

The planner starts writing implementation “to be helpful.”

Why it hurts

Now nobody knows what’s decision vs execution.

The worker improvises. The validator can’t trace intent.

Fix

Planner outputs tasks + acceptance criteria only.

Worker owns code. Always.


Example 2: Worker drifts scope “for completeness”

What happens

The plan says implement endpoints A + B.

The worker adds C because it “looks related.”

Why it hurts

You just made outcomes unpredictable.

You also made validation impossible without moving goalposts.

Fix

Worker ships A + B only, then reports:

“C exists, not in scope. Add to next plan if needed.”

This is not being rigid.

This is being reliable.


Example 3: Validator rubber-stamps

What happens

Validator says “approved” without checking evidence.

Why it hurts

You start trusting a label instead of a proof.

That’s how quiet failures ship.

Fix

Validator must produce either:

  • evidence mapping, or
  • missing evidence list

No third option.


Now the part nobody wants to admit: this is repetitive

Once you see the pattern, you can’t unsee it.

Every multi-agent system ends up doing the same boring work:

  • enforcing output JSON
  • checking role leakage (planner output contains code fences)
  • detecting scope drift (worker introduces new tasks)
  • validating evidence coverage (criteria with no proof)
  • trimming context so handoffs don’t balloon
  • retrying with tighter rules when boundaries break

This stuff is not “deep work.”

It’s guardrail work you keep re-implementing in every project.

And it’s exactly where your week goes.
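
Wired together, all of that guardrail work is one dull retry loop. A sketch: call_agent is a hypothetical stand-in for however you invoke your model, and each check is assumed to parse and validate one agent’s raw output (like the sketches above).

import json

def run_handoff(role: str, prompt: str, check, max_retries: int = 2):
    for _ in range(max_retries + 1):
        raw = call_agent(role, prompt)  # hypothetical: your LLM call goes here
        try:
            return check(raw)
        except (ValueError, json.JSONDecodeError) as err:
            # Retry with tighter rules instead of silently accepting drift.
            prompt += f"\n\nPREVIOUS ATTEMPT REJECTED: {err}\nFollow BOUNDARIES exactly."
    raise RuntimeError(f"{role} kept breaking its boundary after {max_retries} retries")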


The value of automating the boring parts

When you automate these guardrails, three things happen fast:

1) Predictability

Your planner plans. Your worker works. Your validator validates.

2) Less context bloat

Agents stop dumping everything “just in case.”

You stop paying for noise.

3) Trust you can feel

When something fails, it fails clearly.

When something passes, it passes with proof.

This is the kind of system a team can scale.

This is the kind of builder we are:

we don’t ship vibes and call it velocity.


Where HuTouch steps in (and why it feels different)

HuTouch automates these handoff guardrails, generating clean prompts for your multi-agent system in minutes:

  • enforces your handoff JSON contracts
  • detects role leakage and scope drift automatically
  • forces evidence-based validation (no rubber stamps)
  • keeps context slim so large projects stay workable

So you spend less time babysitting agents,

and more time shipping the parts that actually require you.


Conclusion: automating the boring is now a must

If you’re building multi-agent systems, this isn’t optional anymore.

The complexity isn’t coming. It’s already here:

bigger codebases, more tools, more handoffs, more places to drift.

The only way to keep reliability without burning your team

is to automate the repeatable guardrails.

That’s not hype.

That’s survival for production.


Early access

If you’re building agents and want clean, tailored prompts in minutes, check out our early product sneak peek and join early access for HuTouch.
