DEV Community: Alex

The Missing Piece in Multi-Agent Coordination: Who Tells the Agent How to Use Your Service?

Alex — Fri, 05 Jun 2026 09:53:44 +0000

In my previous article, I described AgentNexus — a document exchange center that coordinates LLM agents at the service granularity rather than the role granularity. The core idea: services publish versioned Markdown documents, subscribe to each other's changes, and receive diff-aware notifications when something upstream changes.

That architecture works. It's been running in production for over a month coordinating two services. But as I tried to onboard new agents to it, I ran into a different problem — one that turned out to be more general than AgentNexus itself.

The Bootstrapping Problem

When a new agent connects to an MCP server, it knows one thing: the endpoint URL. Everything else — what tools are available, what workflows to follow, what conventions to respect — has to come from somewhere.

For AgentNexus specifically, an agent needs to know:

How to resolve its project_id at startup
That it should call get_my_updates_with_context before doing anything else
What the doc_id format looks like ({project_id}/{doc_type})
When to push documents and how to acknowledge updates

None of this is in the MCP tool descriptions. It's workflow knowledge, not API knowledge.
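As a small illustration of what "workflow knowledge" means here, the doc_id convention can be written down as code. This is a minimal sketch, not AgentNexus's implementation: the helper names `make_doc_id` and `parse_doc_id` are hypothetical, and the rule that neither part may contain a slash is an assumption made so the format stays unambiguous.

```python
from typing import NamedTuple


class DocId(NamedTuple):
    project_id: str
    doc_type: str


def make_doc_id(project_id: str, doc_type: str) -> str:
    """Join the two parts using the {project_id}/{doc_type} convention."""
    # Assumption: neither part may be empty or contain '/'.
    if not project_id or not doc_type or "/" in project_id or "/" in doc_type:
        raise ValueError("project_id and doc_type must be non-empty and contain no '/'")
    return f"{project_id}/{doc_type}"


def parse_doc_id(doc_id: str) -> DocId:
    """Split a doc_id back into its project_id and doc_type parts."""
    project_id, sep, doc_type = doc_id.partition("/")
    if not sep or not project_id or not doc_type or "/" in doc_type:
        raise ValueError(f"not a valid doc_id: {doc_id!r}")
    return DocId(project_id, doc_type)
```

An agent that has never seen this convention has no way to infer it from a tool's parameter list, which is why it has to be delivered some other way.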

The current ecosystem's answer to this problem is a collection of static Markdown files that humans write and commit to repositories: AGENTS.md (stewarded by the Agentic AI Foundation under the Linux Foundation), CLAUDE.md (Claude Code), steering files (Kiro), rules (Cursor). These all solve the same problem, injecting workflow context into an agent at startup, but they're authored by humans, maintained manually, and in most cases specific to a single client tool.

That model has a flaw that only becomes visible when the service is the one that needs to tell the agent what to do.

The Flaw

Here's the situation: I'm building an MCP service. Someone else is running an agent that connects to my service. I need that agent to follow specific workflows to use my service correctly. How do I communicate those workflows?

With the current model, I can't — not reliably. I could write documentation and hope the agent author reads it. I could put instructions in tool descriptions and hope the agent reads those carefully. But there's no protocol for a service to actively bootstrap an agent with the knowledge it needs.

This is the gap that SDAOP (Service-Driven Agent Onboarding Protocol) addresses.

What SDAOP Does

The idea is simple: the service generates the instruction file, not the human.

AgentNexus exposes a generate_instruction_file(project_name, project_space_id, client_type) tool. An agent calls it once at setup, gets back a file path and file content, writes that file to its workspace, and from that point on loads it automatically at every session start.

// Agent calls:
generate_instruction_file(
  project_name="search-admin-frontend",
  project_space_id="prod-space-id",
  client_type="kiro"
)

// Gets back:
{
  "steering_file_path": ".kiro/steering/doc-exchange.md",
  "steering_file_content": "---\ninclusion: auto\n---\n\n# Doc Exchange Center\n\n...",
  "instruction": "Write steering_file_content to steering_file_path."
}

The agent writes the file. Every subsequent session, Kiro loads it automatically. The agent now knows exactly how to use the service — and the service is the source of that knowledge, not a human maintaining a separate document.
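The agent-side half of that exchange is just a file write. The sketch below assumes the response has the three keys shown above; the function name `write_instruction_file`, the path-safety check, and the shortened example content are my own illustrative choices, not part of AgentNexus.

```python
import tempfile
from pathlib import Path


def write_instruction_file(response: dict, workspace: Path) -> Path:
    """Write the generated instruction file into the agent's workspace."""
    relative = Path(response["steering_file_path"])
    # Assumption: refuse absolute paths and '..' so the service
    # cannot make the agent write outside its own workspace.
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"unsafe path from service: {relative}")
    target = workspace / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(response["steering_file_content"], encoding="utf-8")
    return target


# Placeholder response; the real content is much longer.
example_response = {
    "steering_file_path": ".kiro/steering/doc-exchange.md",
    "steering_file_content": "---\ninclusion: auto\n---\n\n# Doc Exchange Center\n",
    "instruction": "Write steering_file_content to steering_file_path.",
}

with tempfile.TemporaryDirectory() as tmp:
    written = write_instruction_file(example_response, Path(tmp))
    written_text = written.read_text(encoding="utf-8")
    relative_written = written.relative_to(tmp).as_posix()
```

Validating the returned path before writing is worth the few extra lines, since the agent is acting on content it did not author.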

Why This Matters Beyond AgentNexus

The reason I think this is worth formalizing as a protocol rather than just an implementation detail is that every non-trivial MCP service has this problem.

Any service that expects agents to follow a specific workflow — check for updates before starting, use a particular naming convention, call tools in a specific order — is currently relying on out-of-band documentation or hoping agents figure it out from tool descriptions alone. Neither works reliably.

The proliferation of instruction file formats makes this worse. The same workflow knowledge needs to be expressed differently for Kiro, Claude Code, Codex, and Cursor. A service author who wants to support all four environments today has to maintain four separate files by hand.

SDAOP makes the service the authoritative source of onboarding knowledge, and the client-specific file a derived artifact. One canonical document, multiple adapters:

Service (authoritative source)
    ↓ generate_instruction_file(client_type=...)
    ├─ Kiro     → .kiro/steering/doc-exchange.md  (inclusion: auto frontmatter)
    ├─ Claude   → CLAUDE.md                        (plain markdown)
    ├─ Codex    → AGENTS.md                        (plain markdown)
    └─ Cursor   → .cursor/rules/doc-exchange.mdc   (alwaysApply frontmatter)

This is the same pattern OpenAPI uses: one contract, multiple SDK bindings. The contract doesn't change when the client ecosystem evolves; only the adapter layer changes.
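A minimal sketch of that adapter layer, assuming the four targets in the diagram. The function name `render_instruction_file` and its reduced signature are hypothetical, the `client_type` values other than "kiro" are guesses, and the exact Cursor frontmatter (`alwaysApply: true`) is an assumption based on the diagram's note rather than something validated against a live client.

```python
CANONICAL = "# Doc Exchange Center\n\n(canonical onboarding text goes here)\n"

# client_type -> (target path, frontmatter prepended to the canonical text)
ADAPTERS = {
    "kiro": (".kiro/steering/doc-exchange.md", "---\ninclusion: auto\n---\n\n"),
    "claude": ("CLAUDE.md", ""),
    "codex": ("AGENTS.md", ""),
    "cursor": (".cursor/rules/doc-exchange.mdc", "---\nalwaysApply: true\n---\n\n"),
}


def render_instruction_file(client_type: str, canonical: str = CANONICAL) -> dict:
    """Derive a client-specific instruction file from one canonical document."""
    try:
        path, frontmatter = ADAPTERS[client_type]
    except KeyError:
        raise ValueError(f"unsupported client_type: {client_type!r}") from None
    return {
        "steering_file_path": path,
        "steering_file_content": frontmatter + canonical,
        "instruction": "Write steering_file_content to steering_file_path.",
    }
```

Supporting a new client means adding one entry to the table; the canonical text is untouched.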

How It Fits Into the Broader Stack

It's worth being precise about where SDAOP sits relative to existing standards.

MCP handles tool invocation — how an agent calls a service and gets results back. It doesn't say anything about what workflows agents should follow or how they should be configured.

AGENTS.md / CLAUDE.md / Kiro steering handle context injection — giving an agent background knowledge about a project. They're authored by humans and committed to repositories alongside code.

SDAOP handles onboarding — the moment when an agent first connects to a service and needs to acquire service-specific workflow knowledge. The service is the author; the instruction file is the output.

These three layers are complementary. SDAOP doesn't replace AGENTS.md — it provides a mechanism for services to generate AGENTS.md (or its equivalent) programmatically, making it a first-class service artifact rather than a manually maintained document.

The Analogy That Clicked for Me

When a human developer joins a new team, they don't read a generic role-playing script. They read the project's onboarding documentation — written and maintained by the project itself. If the project changes its deployment process, the onboarding docs get updated, and the next developer who joins learns the new process.

SDAOP applies this same principle to agents. The service maintains its own onboarding documentation. Agents call generate_instruction_file to get it. When the service's conventions change, the service updates the canonical document, and agents can refresh by calling the tool again.

The difference from the human analogy is that SDAOP makes this a protocol operation — something any MCP client can invoke, any MCP service can implement, and any agent can follow without IDE-specific configuration.

What's Still Open

The current implementation in AgentNexus supports Kiro as the primary client, validated in production. The adapter pattern for Claude Code, Codex, and Cursor is defined in the v3 paper but not yet fully validated against live clients.

There's also an open question about versioning: if a service updates its canonical onboarding document (adds new tools, changes workflow steps), how should connected agents be notified to refresh their instruction files? The current answer is "re-call generate_instruction_file manually." A more complete protocol would include a versioned onboarding document that agents can subscribe to — which, conveniently, is exactly what AgentNexus's document subscription system already supports.
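One way the subscription-based refresh could look on the agent side is sketched below. It is not implemented anywhere yet: the doc ID `agent-nexus/onboarding` and the function `should_refresh` are hypothetical, and the update shape is borrowed from the notification payload that get_my_updates_with_context returns.

```python
# Hypothetical doc ID under which a service would publish its onboarding document.
ONBOARDING_DOC_ID = "agent-nexus/onboarding"


def should_refresh(update: dict, installed_version: int) -> bool:
    """Return True when an update announces a newer onboarding document."""
    return (
        update["doc_id"] == ONBOARDING_DOC_ID
        and update["new_version"] > installed_version
    )


# The agent remembers which version its local instruction file was generated from.
installed_version = 3
incoming = {"doc_id": ONBOARDING_DOC_ID, "new_version": 4}
refresh_needed = should_refresh(incoming, installed_version)
```

When the check passes, the agent would call generate_instruction_file again and overwrite its local file.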

Try It

The implementation is part of AgentNexus:

git clone https://github.com/dugubuyan/agent-nexus
cd agent-nexus
pip install -e ".[dev]"
python -m alembic upgrade head
python src/main.py

Connect from any MCP client and call generate_instruction_file with your project details and client type. The research paper is on Zenodo: doi.org/10.5281/zenodo.19692217 (currently v2; v3, which contains the formal SDAOP definition, is forthcoming).

The core shift in thinking: MCP solves "how does an agent call a service." SDAOP solves "how does a service teach an agent to use it." Both questions matter, and only one has a standard answer today.

Why I Stopped Organizing AI Agents by Role (and Built a Document Exchange Center Instead)

Alex — Mon, 01 Jun 2026 09:46:12 +0000

Most multi-agent frameworks for software development organize agents around roles: a product manager agent, a developer agent, a tester agent. ChatDev and MetaGPT pioneered this approach, and it works well for monolithic tasks.

But I ran into a wall when I tried to apply it to a real system with multiple independently-deployed services.

The Problem with Role-Based Coordination

Imagine you have a backend search service and a frontend management console. The backend team implements a new API endpoint. The frontend needs to adapt.

In a role-based framework, there's no natural mechanism for this. Both agents are "developers" in the same simulated organization. There's no concept of service boundaries, no versioned contracts, no way to say "the backend changed, and the frontend needs to know exactly what changed."

The coordination problem in multi-service development isn't "which role should handle this task" — it's "which service needs to know about this change, and what exactly changed."

That reframing led me to build something different.

AgentNexus: Coordinating Agents at the Service Granularity

AgentNexus is a document exchange center that treats each service as a first-class citizen. Instead of roles, it uses service boundaries as the coordination primitive.

Here's how it works:

Each service registers as a sub-project with its own document namespace
Services publish versioned Markdown documents: requirements, design specs, API docs, config
Services subscribe to documents from other services they depend on
When a subscribed document changes, the subscriber receives a diff-aware notification containing both the structured diff and the full latest content
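The subscription step can be sketched as a small matching function. This assumes subscriptions target either an exact doc ID or a doc type, and that a service is never notified about its own documents; the `Subscription` class, `subscribers_for`, and the service name `ops-dashboard` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Subscription:
    subscriber: str
    doc_id: Optional[str] = None    # exact match, e.g. "search-service/api"
    doc_type: Optional[str] = None  # type-wide match, e.g. "api"

    def matches(self, changed_doc_id: str) -> bool:
        if self.doc_id is not None:
            return self.doc_id == changed_doc_id
        changed_type = changed_doc_id.partition("/")[2]
        return self.doc_type == changed_type


def subscribers_for(changed_doc_id: str, subscriptions: Iterable[Subscription]) -> List[str]:
    """Return the services that should be notified about a changed document."""
    owner = changed_doc_id.partition("/")[0]
    # Assumption: the publishing service does not receive its own change.
    return sorted(
        {s.subscriber for s in subscriptions if s.matches(changed_doc_id) and s.subscriber != owner}
    )


subs = [
    Subscription("search-admin-frontend", doc_id="search-service/api"),
    Subscription("ops-dashboard", doc_type="api"),
    Subscription("search-service", doc_type="api"),
]
notified = subscribers_for("search-service/api", subs)
```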

The whole thing is exposed as an MCP (Model Context Protocol) server running in streamable-HTTP mode, so multiple agents can connect simultaneously from different machines.

The Diff-Aware Update Protocol

This is the part I'm most proud of. When an agent calls get_my_updates_with_context, it gets back:

{
  "update_id": "...",
  "doc_id": "backend-service/api",
  "doc_type": "api",
  "new_version": 5,
  "diff": "@@ -42,6 +42,12 @@\n+## PUT /admin/docs/{doc_id}\n+\n+Update a document in-place...",
  "latest_content": "# API Spec\n\n..."
}

The agent gets both what changed (to perform targeted modifications) and the full current state (to maintain correct context). Providing only the diff risks missing context; providing only the full document makes it hard to identify what needs to change in the code.
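A payload of that shape can be produced with Python's standard difflib. The sketch below is an illustration of the idea, not the AgentNexus code: `build_update` is a hypothetical name, and deriving doc_type from the part of doc_id after the slash is an assumption.

```python
import difflib
import uuid


def build_update(doc_id: str, old_content: str, new_content: str, new_version: int) -> dict:
    """Package a document change as a diff plus the full latest content."""
    diff_lines = difflib.unified_diff(
        old_content.splitlines(keepends=True),
        new_content.splitlines(keepends=True),
        fromfile=f"{doc_id}@v{new_version - 1}",
        tofile=f"{doc_id}@v{new_version}",
    )
    return {
        "update_id": str(uuid.uuid4()),
        "doc_id": doc_id,
        "doc_type": doc_id.partition("/")[2],  # assumption: type is the part after '/'
        "new_version": new_version,
        "diff": "".join(diff_lines),
        "latest_content": new_content,
    }


old_spec = "# API Spec\n\n## GET /search\n"
new_spec = old_spec + "\n## PUT /admin/docs/{doc_id}\n"
update = build_update("backend-service/api", old_spec, new_spec, new_version=5)
```

The diff tells the subscriber where to look; the latest_content field lets it check the surrounding spec without a second round trip.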

A Concrete Example

Here's the end-to-end flow when the backend implements a new endpoint:

Backend agent updates search-service/api via push_document
AgentNexus computes the diff and generates a notification for search-admin-frontend
Frontend agent calls get_my_updates_with_context, receives the diff showing the new endpoint
Frontend agent removes the mock implementation and integrates the real endpoint
Frontend agent updates its own requirement document to remove the "backend not yet implemented" annotation
Frontend agent calls ack_update to mark the notification as read

No human coordination required beyond the initial subscription configuration.
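The subscriber's side of that flow reduces to a fetch, handle, acknowledge loop. In this sketch `InMemoryExchange` is a hypothetical stand-in for the MCP server that only mimics the two calls used here, and the argument lists are assumptions; the real tools may take additional parameters such as a project ID.

```python
from typing import Callable, Dict, List


class InMemoryExchange:
    """Stand-in for the server; holds pending updates for one subscriber."""

    def __init__(self, pending: List[dict]) -> None:
        self._pending: Dict[str, dict] = {u["update_id"]: u for u in pending}

    def get_my_updates_with_context(self) -> List[dict]:
        return list(self._pending.values())

    def ack_update(self, update_id: str) -> None:
        self._pending.pop(update_id)


def process_updates(exchange: InMemoryExchange, handle: Callable[[dict], None]) -> int:
    """Handle every pending update, acknowledging each one only after it succeeds."""
    handled = 0
    for update in exchange.get_my_updates_with_context():
        handle(update)  # e.g. replace the mock with the real endpoint
        exchange.ack_update(update["update_id"])
        handled += 1
    return handled


exchange = InMemoryExchange(
    [{"update_id": "u1", "doc_id": "search-service/api", "new_version": 5}]
)
seen: List[dict] = []
handled_count = process_updates(exchange, seen.append)
```

Acknowledging after the handler returns means a crashed agent sees the same update again on its next session instead of silently losing it.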

Lifecycle Stage as a First-Class Entity

One more thing that bothered me about role-playing frameworks: they have no concept of where a service is in its development lifecycle.

AgentNexus tracks each service's stage explicitly: design → development → testing → deployment → upgrade. Stage transitions are real operations that:

Create immutable milestone snapshots of all published documents
Generate cross-service notifications
Produce stage-switch tasks for affected services

When a service transitions from development to testing, that's a meaningful event — not just a prompt instruction to an agent playing the role of "scrum master."
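A stage transition with a milestone snapshot can be sketched in a few lines. This assumes strictly forward, one-step transitions in the order listed above, and it leaves out the notification and task generation; the `Service` class and `advance` function are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum
from types import MappingProxyType
from typing import Dict, List, Mapping


class Stage(Enum):
    DESIGN = "design"
    DEVELOPMENT = "development"
    TESTING = "testing"
    DEPLOYMENT = "deployment"
    UPGRADE = "upgrade"


ORDER = list(Stage)  # Enum iteration preserves definition order


@dataclass
class Service:
    name: str
    stage: Stage = Stage.DESIGN
    documents: Dict[str, str] = field(default_factory=dict)
    milestones: List[Mapping[str, str]] = field(default_factory=list)


def advance(service: Service) -> Stage:
    """Move a service to the next stage and freeze its documents as a milestone."""
    index = ORDER.index(service.stage)
    if index == len(ORDER) - 1:
        raise ValueError(f"{service.name} is already in the final stage")
    # Copy, then wrap read-only, so later edits cannot alter the snapshot.
    service.milestones.append(MappingProxyType(dict(service.documents)))
    service.stage = ORDER[index + 1]
    return service.stage


svc = Service("search-service", documents={"api": "v1 spec"})
new_stage = advance(svc)
svc.documents["api"] = "v2 spec"  # does not affect the stored milestone
```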

What I Built

The implementation is in Python using FastMCP, SQLAlchemy/SQLite, and watchdog. It runs as a persistent MCP server that listens on 0.0.0.0:10086 and serves the /mcp path (http://localhost:10086/mcp from the same machine).

Key MCP tools:

push_document / patch_document — publish full or incremental updates
get_my_updates_with_context — one-call update check with diff + full content
add_subscription — subscribe by exact doc ID or doc type
get_my_tasks — retrieve pending tasks generated by document changes
generate_steering_file — generate IDE agent instruction files automatically

The patch_document tool is worth calling out: instead of sending the full document content on every update, agents can send a unified diff patch. This avoids hitting tool-call payload size limits in IDEs like Kiro or Cursor when documents get large.
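To give a rough sense of the payload savings, the sketch below builds a synthetic 200-section document, appends one section, and compares the size of a unified diff with the size of the full text. The document content and the `make_patch` helper are made up for illustration, and the format patch_document actually expects may differ in detail.

```python
import difflib


def make_patch(old: str, new: str) -> str:
    """Produce a unified diff that turns `old` into `new`."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="a",
            tofile="b",
        )
    )


# Synthetic document: 200 short sections, then one new section appended.
old_doc = "".join(
    f"## Endpoint {i}\n\nDescription of endpoint {i}.\n\n" for i in range(200)
)
new_doc = old_doc + "## PUT /admin/docs/{doc_id}\n\nUpdate a document in-place.\n"

patch = make_patch(old_doc, new_doc)
full_size = len(new_doc)
patch_size = len(patch)
```

Because a unified diff carries only the changed lines plus a few lines of context, its size tracks the size of the edit rather than the size of the document.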

The test suite contains 250 tests (unit tests plus property-based tests written with Hypothesis). The system has been running in production coordinating two services for over a month.

Comparison with Role-Centric Frameworks

Dimension	ChatDev / MetaGPT	AgentNexus
Coordination unit	Agent role	Service (sub-project)
Lifecycle tracking	Implicit in workflow	Explicit stage per service
Change propagation	Shared context	Pub-sub with versioned diff
Service boundaries	Not enforced	First-class namespace
Multi-codebase	Single codebase assumed	Native multi-repo

Try It

git clone https://github.com/dugubuyan/agent-nexus
cd agent-nexus
pip install -e ".[dev]"
python -m alembic upgrade head
python src/main.py
# MCP server running at http://0.0.0.0:10086/mcp

Connect from any MCP client:

{
  "mcpServers": {
    "doc-exchange": {
      "url": "http://localhost:10086/mcp"
    }
  }
}

The accompanying research paper is on Zenodo: doi.org/10.5281/zenodo.19692217

If you're building multi-service systems with LLM agents and running into coordination problems, I'd love to hear what you're working on.