Gabriel Anhaia
Multi-Agent Handoff With Ownership Boundaries Nobody Crosses


A team I talked to had two agents in a loop for 14 minutes before the budget guard killed the run. The research agent kept handing back to the writer with status: "needs_more_sources". The writer kept handing back to the research agent with status: "draft_incomplete". Neither agent had a definition of "done" the other one trusted. The orchestrator logged 47 handoffs and a $9.12 token bill before it tripped the circuit breaker.

The fix was a contract, not a smarter prompt. Two agents, two narrow jobs, one schema between them, one explicit "I'm done" signal, and a span at the boundary so you can see the handoff in your trace.

The failure mode you are trying to avoid

The OpenAI Agents SDK and LangGraph both push you toward the same shape: agents transfer control through an explicit handoff primitive instead of calling each other directly (OpenAI Agents SDK docs, LangChain on Command). Two anti-patterns show up before teams adopt that shape.

Orchestrator-as-god. A single planner LLM holds the whole state, calls every sub-agent, and reasons about their output. Latency stacks. The planner becomes a bottleneck. One bad token in its context poisons the rest of the run.

Every-agent-can-call-every-agent. Each agent has tools that invoke any other agent. The graph becomes a mesh. There is no single place to log a handoff, no single place to enforce a budget, and the loop above is one prompt regression away.

The pattern below sits between them. Two agents, one direction of flow, one terminal signal.

The shared state schema

State is a typed object both agents read and write. Same shape on both sides. No hidden fields.

from dataclasses import dataclass, field
from typing import Literal
from uuid import uuid4

Status = Literal[
    "research_in_progress",
    "research_done",
    "writing_in_progress",
    "writing_done",
    "failed",
]

@dataclass
class Source:
    url: str
    quote: str
    confidence: float  # 0.0 - 1.0

A Source is the unit the research agent produces and the writer consumes. The writer never fetches its own sources. The research agent never writes prose.

@dataclass
class RunState:
    run_id: str = field(default_factory=lambda: str(uuid4()))
    topic: str = ""
    sources: list[Source] = field(default_factory=list)
    draft: str = ""
    status: Status = "research_in_progress"
    owner: Literal["research", "writer", "done"] = "research"
    handoff_count: int = 0

owner is the load-bearing field. At any moment exactly one agent owns the state. The other one is not allowed to mutate it. handoff_count is your loop guard. Bound it and you bound the failure above.

The handoff contract

Two rules, written down so both agents see the same thing.

  1. The owner is the only writer. Everyone else gets a read-only view.
  2. A handoff is a single function call that returns a payload, not a tool call into the other agent.
@dataclass
class HandoffPayload:
    from_agent: str
    to_agent: str
    reason: str
    state: RunState

def handoff(
    state: RunState,
    to: Literal["research", "writer", "done"],
    reason: str,
) -> HandoffPayload:
    state.handoff_count += 1
    if state.handoff_count > 4:
        state.status = "failed"
        state.owner = "done"
        raise RuntimeError(
            f"handoff budget exceeded: {state.handoff_count}"
        )
    payload = HandoffPayload(
        from_agent=state.owner,
        to_agent=to,
        reason=reason,
        state=state,
    )
    state.owner = to
    return payload

Four handoffs is a lot for two agents. If you need more, you have the wrong decomposition or the wrong "done" signal. The RuntimeError is intentional. Failing loud beats burning tokens.
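Rule 1 is enforceable in code, not just in prompts. A minimal sketch, assuming a hypothetical `read_only_view` helper built on `MappingProxyType` (the helper and the trimmed `DemoState` are illustrative, not part of the schema above):

```python
from dataclasses import asdict, dataclass
from types import MappingProxyType

@dataclass
class DemoState:
    # Stand-in for RunState, trimmed to two fields for the example.
    topic: str = ""
    handoff_count: int = 0

def read_only_view(state) -> MappingProxyType:
    # Shallow immutable snapshot: the non-owner can read, not write.
    return MappingProxyType(asdict(state))

view = read_only_view(DemoState(topic="agent handoffs"))
assert view["topic"] == "agent handoffs"
try:
    view["handoff_count"] = 99  # mutation by the non-owner fails loudly
except TypeError:
    blocked = True
```

Handing the non-owner a snapshot instead of the live object turns "please don't mutate" into a `TypeError` you can see in a trace.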

The research agent

The research agent has one job: produce at least N high-confidence sources for the topic, then hand off. It does not write. It does not edit. It does not call the writer.

from openai import OpenAI

client = OpenAI()  # used inside fetch_sources; swap in any retrieval client
MIN_SOURCES = 3
MIN_CONFIDENCE = 0.6

def research_agent(state: RunState) -> HandoffPayload:
    assert state.owner == "research", "not my turn"

    new_sources = fetch_sources(state.topic)
    state.sources.extend(
        s for s in new_sources if s.confidence >= MIN_CONFIDENCE
    )

    if len(state.sources) >= MIN_SOURCES:
        state.status = "research_done"
        return handoff(state, to="writer", reason="sources ready")

    state.status = "failed"
    state.owner = "done"
    raise RuntimeError("could not gather enough sources")

The "done" signal is a check on len(state.sources) against a threshold the team agreed to. When the check passes, ownership moves. When it fails, the agent fails the run instead of bouncing the work back.

fetch_sources is your retrieval call: a search API, a vector store, a scraper. The agent's only loop is inside its own turn. It does not loop with the writer.
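A deterministic stub makes the pipeline testable before the real retrieval exists. A sketch, assuming canned results (the stub body and its URLs are illustrative, not a real API):

```python
from dataclasses import dataclass

@dataclass
class Source:
    # Mirrors the schema from earlier in the article.
    url: str
    quote: str
    confidence: float

MIN_CONFIDENCE = 0.6

def fetch_sources(topic: str) -> list[Source]:
    # Canned results standing in for a search API, vector store, or scraper.
    return [
        Source(f"https://example.com/{topic}/1", "primary claim", 0.9),
        Source(f"https://example.com/{topic}/2", "weak forum post", 0.4),
        Source(f"https://example.com/{topic}/3", "vendor docs", 0.7),
    ]

kept = [s for s in fetch_sources("handoffs") if s.confidence >= MIN_CONFIDENCE]
# Only the sources at or above the confidence threshold survive the filter.
```

Wiring the stub in lets you exercise the handoff budget and the "done" check end to end without spending a token.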

The writer agent

The writer reads state.sources and produces state.draft. It cannot mutate sources. If the sources are bad, it fails. It does not bounce work back to research with a vague complaint.

def writer_agent(state: RunState) -> HandoffPayload:
    assert state.owner == "writer", "not my turn"
    assert state.status == "research_done", "input not ready"

    state.status = "writing_in_progress"
    state.draft = generate_draft(state.topic, state.sources)

    if len(state.draft) < 200:
        state.status = "failed"
        state.owner = "done"
        raise RuntimeError("draft too short")

    state.status = "writing_done"
    return handoff(state, to="done", reason="draft ready")

The writer's terminal handoff goes to the literal string "done", not back to research. There is no reverse edge in this graph. The flow is research → writer → done, and that is the whole topology.
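generate_draft is the writer's single model call. A minimal sketch, assuming an injectable `complete` callable so the prompt assembly can be tested without an API key (the parameter and prompt wording are assumptions, not the article's contract):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Source:
    url: str
    quote: str
    confidence: float

def generate_draft(
    topic: str,
    sources: list[Source],
    complete: Callable[[str], str],
) -> str:
    # One prompt, one call: the writer sees only what research handed over.
    bullets = "\n".join(f"- {s.quote} ({s.url})" for s in sources)
    prompt = (
        f"Write a short article about {topic}. "
        f"Cite only these sources:\n{bullets}"
    )
    return complete(prompt)

# Fake completion that echoes the prompt, to check the assembly.
draft = generate_draft(
    "handoffs",
    [Source("https://example.com/a", "key quote", 0.9)],
    complete=lambda p: p,
)
```

In production `complete` wraps your model client; in tests it is a lambda, which is what keeps the writer's contract checkable in CI.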

If you find yourself wanting to add a writer → research edge for "needs better sources," resist it. Add a critic agent after the writer instead, or raise the bar inside the research agent. A reverse edge is the loop in disguise.
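What the critic alternative looks like: a third forward-only node after the writer that either approves the draft or fails the run. A sketch with an illustrative citation-count heuristic (the threshold and check are assumptions, not the article's contract):

```python
def critic_agent(draft: str, min_citations: int = 2) -> str:
    # Forward-only: approve or fail. There is still no edge back to research.
    citations = draft.count("http")
    if citations < min_citations:
        raise RuntimeError(
            f"draft cites {citations} sources, need {min_citations}"
        )
    return "approved"

verdict = critic_agent("Per https://a.example and https://b.example, ...")
```

The critic keeps the topology acyclic: research → writer → critic → done, with every failure surfacing as a loud terminal error instead of a bounce.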

The boundary span

The handoff is the most interesting line in the trace. Instrument it. Both OpenAI's SDK and LangGraph emit handoff events you can pick up; if you are rolling your own, OpenTelemetry plus a context manager covers the shape.

from opentelemetry import trace

tracer = trace.get_tracer("agents")

def run(topic: str) -> RunState:
    state = RunState(topic=topic)

    with tracer.start_as_current_span("agent.research") as span:
        span.set_attribute("run_id", state.run_id)
        payload = research_agent(state)
        span.set_attribute("sources.count", len(state.sources))

    with tracer.start_as_current_span("agent.handoff") as span:
        span.set_attribute("from", payload.from_agent)
        span.set_attribute("to", payload.to_agent)
        span.set_attribute("reason", payload.reason)
        span.set_attribute("count", state.handoff_count)

    with tracer.start_as_current_span("agent.writer") as span:
        span.set_attribute("run_id", state.run_id)
        writer_agent(state)
        span.set_attribute("draft.len", len(state.draft))

    return state

The boundary span is what lets you answer "where did this run go wrong" without re-reading every prompt. Loops show up as handoff_count climbing in span attributes. A stall is one span much wider than the rest, and a wrong payload shows in reason.

If you are on Langfuse, Arize, Braintrust, or W&B Weave, agent handoffs are a first-class span type these days. Use it.

Why this is not orchestrator-as-god

There is no planner LLM in the loop above. The graph is hard-coded: run calls research, then handoff, then writer. The agents are LLM-backed; the topology is not.

This is the shape OpenAI's SDK calls a "handoff graph" and LangGraph implements with Command(goto=...) (How Agent Handoffs Work in Multi-Agent Systems). Each node decides where control goes next based on its own state check, not on a god-planner's reasoning. You get the dynamism of LLM agents without paying for an extra LLM turn at every junction.

The trade is honest: you give up the planner's flexibility for predictable cost, predictable latency, and a graph you can draw on a whiteboard. For a two-agent pipeline, that trade always wins.

Closer

When two agents loop, the bug is rarely in either prompt. It is in the gap between them. The loop your team is fighting next quarter is probably one handoff_count guard away from being a 90-second incident instead of a 14-minute one.


If this was useful

The handoff pattern above is one of about a dozen the AI Agents Pocket Guide walks through with the same level of code: supervisor, swarm, planner-executor, critic loops, tool-routing, and the failure modes each one ships with. If you are putting two or more agents in front of users this quarter, it is the kind of book you read in an afternoon and reach for in code review.

AI Agents Pocket Guide
