Aviad Rozenhek

Communication Protocols for AI Agents That Can't Talk to Each Other

4 iterations to get file-based messaging working when you can't use Slack

Part 5 of the Multi-Agent Development Series

  • Part 1: Can 5 Claude Code Agents Work Independently?
  • Part 2: The Reality of "Autonomous" Multi-Agent Development
  • Part 3: Property-Based Testing with Hypothesis
  • Part 4: Zero-Conflict Architecture

TL;DR

The Problem: 5 AI agents in isolated context windows need to coordinate work. No shared state. No real-time chat. Different tool environments (Web vs CLI).

What we tried:

  1. GitHub PR comments (agents in Web can't read them)
  2. File-based messages (agents didn't understand them)
  3. Clear action items (still too vague)
  4. FEEDBACK-PR-X.md + explicit instructions + GitHub redundancy (finally worked!)

The lesson: Communication protocol design is hard. What seems obvious to humans ("check your PR comments") isn't obvious to agents in different environments. A successful protocol needs:

  • Explicit instructions (not assumptions)
  • Multiple channels (redundancy)
  • Clear format (markdown structure)
  • Async design (file-based, not real-time)

Final solution: Git as the communication bus, markdown files as the message format.


The Challenge

The Setup

  • 5 work stream agents (PR-1 through PR-5)
  • 1 integration agent (PR-6)
  • Each in an isolated Claude Code session (separate context windows)
  • No shared state between sessions
  • Need to communicate: Status updates, test results, bug reports, action items

The Constraints

Technical constraints:

  1. Isolated contexts: Each agent can't see other agents' conversations
  2. Different environments: Some agents in Web (no gh CLI), some in CLI
  3. No real-time: Agents don't run continuously, can't push notifications
  4. No shared memory: Can't use global variables or shared state

Operational constraints:

  1. Async-first: Agents work at different speeds
  2. Human-in-the-loop: Human orchestrates, can't relay every message
  3. Must scale: 5 agents × 2 channels (send/receive) = 10 communication paths

What we needed:

  • Integration agent tells work stream agents: "Your PR merged, verify tests pass"
  • Work stream agents tell integration: "Tests verified, all good" or "Found issues, need help"
  • Persistent communication trail (for debugging)

Iteration 1: GitHub PR Comments (FAILED)

The Plan

Seemed obvious:

# Integration posts on PR-2:
@claude-agent-pr-2

Your PR has been merged into integration branch!

Action items:
1. Verify all tests still pass
2. Check for integration issues
3. Report back via PR comment

Thanks!

Expected workflow:

  1. Integration agent posts comment
  2. PR-2 agent checks PR, reads comment
  3. PR-2 agent responds with results

Seemed foolproof, right?


What Actually Happened

Integration agent:

$ gh pr comment 123 --body "Your PR merged, please verify..."
Comment created successfully!

PR-2 agent (in Claude Code Web):

> Check your PR for integration status

Agent: "Let me check the PR comments..."
Agent: "I'll use gh CLI to read comments"
System: Error - 'gh' command not found
Agent: "I can't access GitHub CLI in this environment"

Root cause: Claude Code Web doesn't have gh CLI access. Can't read PR comments.


Why It Failed

Assumptions we made:

  • ✅ GitHub is accessible (true)
  • ✅ PR comments are visible (true via UI)
  • ❌ Agents can READ PR comments programmatically (FALSE in Web environment)

The gap: Web agents have no tool to fetch PR comments. They can't even curl the GitHub API without auth tokens.

Result: One-way communication. Integration can WRITE comments, agents can't READ them.
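
The failure came from assuming a tool existed rather than checking for it. A minimal sketch of probing the environment before choosing a channel is shown here; the function name `pick_read_channel` and the channel labels are hypothetical, invented for illustration.

```python
import shutil
from typing import Callable, Optional


def pick_read_channel(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return a label for the channel this environment can read from.

    The labels ("pr-comments", "git-files", "none") are illustrative only.
    `which` is injectable so the logic can be exercised without touching PATH.
    """
    if which("gh") is not None:
        return "pr-comments"  # gh CLI is on PATH, so PR comments may be readable
    if which("git") is not None:
        return "git-files"    # fall back to message files on a branch
    return "none"             # nothing usable: a human has to relay the message


if __name__ == "__main__":
    print(pick_read_channel())
```

Finding `gh` on PATH does not prove it is authenticated, so a real check would probably also need to attempt a read and handle failure.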


Iteration 2: File-Based Messages (PARTIAL SUCCESS)

The Plan

Git as message bus:

Integration branch (claude/integrate-...):
  FEEDBACK-PR-2.md  # Message for PR-2 agent
  FEEDBACK-PR-3.md  # Message for PR-3 agent
  ...

Work stream branch (claude/budget-allocation-...):
  INTEGRATION-RESPONSE.md  # Response to integration

Workflow:

# Integration writes message:
$ echo "Your PR merged..." > FEEDBACK-PR-2.md
$ git add FEEDBACK-PR-2.md
$ git commit -m "feedback: PR-2 integration complete"
$ git push

# PR-2 agent reads message:
$ git fetch origin claude/integrate-...
$ git show origin/claude/integrate-...:FEEDBACK-PR-2.md
# Reads the message!

# PR-2 agent responds:
$ echo "Tests verified..." > INTEGRATION-RESPONSE.md
$ git add INTEGRATION-RESPONSE.md
$ git commit -m "response: Tests passing"
$ git push

# Integration reads response:
$ git fetch origin claude/budget-allocation-...
$ git show origin/claude/budget-allocation-...:INTEGRATION-RESPONSE.md
# Reads the response!

Benefits:

  • ✅ Pure git operations (works in all environments)
  • ✅ Persistent (messages in git history)
  • ✅ Async (write anytime, read anytime)
  • ✅ No special tools needed
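
The read side of this workflow can be wrapped in a small helper. The sketch below assumes `git` is on PATH and that the remote is named `origin`; the function names are hypothetical, and the two command builders are kept separate from the subprocess call so they can be inspected without a repository.

```python
import subprocess
from typing import List


def fetch_cmd(branch: str, remote: str = "origin") -> List[str]:
    """Build the command that fetches one branch from the remote."""
    return ["git", "fetch", remote, branch]


def show_cmd(branch: str, path: str, remote: str = "origin") -> List[str]:
    """Build the command that prints a file from the fetched branch.

    "<remote>/<branch>:<path>" addresses the file without checking the branch out.
    """
    return ["git", "show", f"{remote}/{branch}:{path}"]


def read_message(branch: str, path: str, remote: str = "origin") -> str:
    """Fetch the branch and return the message file's contents (raises on git errors)."""
    subprocess.run(fetch_cmd(branch, remote), check=True)
    result = subprocess.run(
        show_cmd(branch, path, remote), check=True, capture_output=True, text=True
    )
    return result.stdout
```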

What Actually Happened

Integration wrote:

# FEEDBACK-PR-2.md

Your PR has been merged.

Please verify tests and report back.

PR-2 agent:

> Check for messages from integration

Agent: "Let me fetch the integration branch"
$ git fetch origin claude/integrate-...

Agent: "Let me read the feedback file"
$ git show origin/claude/integrate-...:FEEDBACK-PR-2.md

# FEEDBACK-PR-2.md
Your PR has been merged.
Please verify tests and report back.

Agent: "I see a message. It says my PR merged and to verify tests."
Agent: "But what exactly should I do? Run which tests? What format for response?"

The problem: Vague instructions. Agent understood there was a message but didn't know the concrete actions.


Why It Partially Worked

What worked:

  • ✅ Message delivery (agent can read the file)
  • ✅ Persistence (message in git)
  • ✅ Cross-environment (works in Web and CLI)

What didn't work:

  • ❌ Vague instructions ("verify tests" - which tests? how?)
  • ❌ No clear response format (how to report back?)
  • ❌ No deadline/urgency indication
  • ❌ No context (what was merged? what changed?)

Agent was willing but unclear on next steps.


Iteration 3: Clear Action Items (BETTER)

The Improvement

Added explicit instructions:

# FEEDBACK-PR-2.md

## Summary
✅ Your PR #123 (Budget Allocation Tests) was successfully merged into integration branch `claude/integrate-test-improvements-011CUrkV7Q9B7UpLgPpwL1K8` at 2025-11-06 14:30.

## What Changed
- Added 14 budget allocation unit tests
- Modified `video_moderator.py` with rolling budget accumulator
- All tests passed in isolation on your branch

## Required Action

**Please verify that your tests still pass after integration:**

1. Fetch the integration branch:

       git fetch origin claude/integrate-test-improvements-011CUrkV7Q9B7UpLgPpwL1K8

2. Run your specific tests:

       uv run pytest tests/test_video_moderation/unit/test_budget_allocation.py -v

3. Verify all 14 tests pass

4. Create response file on YOUR branch:

       echo "See response below" > INTEGRATION-RESPONSE.md
       git add INTEGRATION-RESPONSE.md
       git commit -m "@integration - verification complete"
       git push

5. Use this response format:

       # INTEGRATION-RESPONSE.md

       Agent: PR-2
       Status: [ALL PASSING / ISSUES FOUND]
       Date: YYYY-MM-DD HH:MM

       ## Test Results
       [Paste pytest output here]

       ## Issues (if any)
       [Describe any problems]

## Timeline
Please complete within 4 hours if possible (not blocking other PRs).

## Questions?
If you encounter issues, create a file called INTEGRATION-QUESTIONS.md on your branch.

What Actually Happened

PR-2 agent:

> Check for integration feedback

$ git fetch origin claude/integrate-...
$ git show origin/claude/integrate-...:FEEDBACK-PR-2.md

Agent: "I see detailed instructions!"
Agent: "Step 1: Fetch integration branch"
$ git fetch origin claude/integrate-test-improvements-011CUrkV7Q9B7UpLgPpwL1K8
Done.

Agent: "Step 2: Run tests"
$ uv run pytest tests/test_video_moderation/unit/test_budget_allocation.py -v

... tests run ...
14/14 passing ✅

Agent: "Step 3: Create response"
$ echo "Results below..." > INTEGRATION-RESPONSE.md
$ git add INTEGRATION-RESPONSE.md
$ git commit -m "@integration - all tests passing"
$ git push

# INTEGRATION-RESPONSE.md created with:
**Status**: ALL PASSING
**Date**: 2025-11-06 15:00

## Test Results
test_tier1_critical_urgency: PASSED
test_tier2_never_moderated: PASSED
... (14/14 tests)

SUCCESS! Agent followed instructions completely.


Why It Worked Better

Improvements:

  • ✅ Explicit commands (copy-paste bash commands)
  • ✅ Clear success criteria (all 14 tests pass)
  • ✅ Response template (exact format specified)
  • ✅ Timeline (4 hours)
  • ✅ Escape hatch (INTEGRATION-QUESTIONS.md if stuck)

Agent had everything needed: What to do, how to do it, what to report, when to do it by.


Iteration 4: Dual-Channel Redundancy (FINAL)

The Refinement

Problem: What if agent doesn't check the file? No notification mechanism.

Solution: Dual-channel communication

Channel 1: File-based (detailed instructions)

  • FEEDBACK-PR-X.md on integration branch
  • Full context, commands, expected results
  • Permanent record

Channel 2: GitHub PR comment (notification + summary)

  • Posted on the actual PR
  • Brief summary + pointer to detailed file
  • Notification mechanism (shows up in GitHub UI)

Implementation

Integration agent workflow:

# 1. Create detailed feedback file
#    (quoted 'EOF' keeps the shell from expanding backticks and $ inside the markdown)
cat > FEEDBACK-PR-2.md <<'EOF'
# Detailed instructions as shown in Iteration 3
EOF

git add FEEDBACK-PR-2.md
git commit -m "feedback: PR-2 integration complete"
git push

# 2. Post GitHub notification (for human visibility)
gh pr comment 123 --body "@claude-agent

PR-2 has been integrated!

📋 **Detailed instructions**: See FEEDBACK-PR-2.md on integration branch

**Quick summary**:
- Your PR merged successfully ✅
- Please verify tests still pass
- Respond via INTEGRATION-RESPONSE.md on your branch

**To read detailed instructions**:
\`\`\`bash
git fetch origin claude/integrate-test-improvements-011CUrkV7Q9B7UpLgPpwL1K8
git show origin/claude/integrate-...:FEEDBACK-PR-2.md
\`\`\`

Timeline: 4 hours (not blocking)
"

Why Dual-Channel?

Redundancy benefits:

  1. Humans can see progress (GitHub PR comments visible in UI)
  2. Agents have detailed instructions (FEEDBACK file)
  3. Notification layer (PR comment draws attention)
  4. Debugging trail (both channels logged)

Real-world benefit:

  • Human could monitor progress via GitHub web UI
  • Agents had clear instructions via git files
  • If agent missed file, human could prompt: "Check FEEDBACK-PR-2.md"

The Final Protocol

Message Types

1. FEEDBACK-PR-X.md (Integration → Work Stream)

Purpose: Tell work stream agent about integration status, request actions

Template:

# FEEDBACK-PR-X.md

## Summary
[One-line status: merged successfully / issues found / waiting]

## What Changed
[What was merged, what's new in integration branch]

## Required Actions
1. [Specific action with bash command]
2. [Another action with bash command]
...

## Expected Results
[What "success" looks like]

## Response Format
[Template for INTEGRATION-RESPONSE.md]

## Timeline
[Deadline if any]

## Questions?
[How to ask for help]
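
Generating the file from structured fields is one way to make sure no section of the template is forgotten. This is a minimal sketch under that assumption; the `Feedback` class and its field names are hypothetical, and only the section headings come from the template above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feedback:
    """Structured form of a FEEDBACK-PR-X.md message (field names are illustrative)."""
    pr: str
    summary: str
    what_changed: List[str]
    actions: List[str]
    expected: str
    response_format: str
    timeline: str = "Not blocking"
    questions: str = "Create INTEGRATION-QUESTIONS.md on your branch."

    def render(self) -> str:
        """Render the message as markdown, sections in the template's order."""
        lines = [f"# FEEDBACK-{self.pr}.md", "", "## Summary", self.summary, ""]
        lines += ["## What Changed"] + [f"- {item}" for item in self.what_changed] + [""]
        lines += ["## Required Actions"]
        lines += [f"{i}. {action}" for i, action in enumerate(self.actions, start=1)]
        lines += ["", "## Expected Results", self.expected, ""]
        lines += ["## Response Format", self.response_format, ""]
        lines += ["## Timeline", self.timeline, ""]
        lines += ["## Questions?", self.questions, ""]
        return "\n".join(lines)
```

Because every field without a default is required by the constructor, a message with no actions or no expected results fails at creation time instead of confusing the receiving agent later.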

2. INTEGRATION-RESPONSE.md (Work Stream → Integration)

Purpose: Report back on verification status

Template:

# INTEGRATION-RESPONSE.md

**Agent**: PR-X
**Status**: ALL PASSING | ISSUES FOUND | NEED HELP
**Date**: YYYY-MM-DD HH:MM

## Test Results

    $ pytest ...
    [Full output]

## Issues (if any)
[Describe problems encountered]

## Questions (if any)
[Ask integration agent for clarification]

3. STATUS-PR-X.md (Work Stream → Integration, Optional)

Purpose: Progress updates during long-running work

Template:

# STATUS-PR-X.md

**Last Updated**: YYYY-MM-DD HH:MM
**Current Activity**: [What agent is doing now]
**Progress**: X / Y tasks complete

## Completed
- [x] Task 1
- [x] Task 2

## In Progress
- [ ] Task 3 (current)

## Blocked
[Any blockers encountered]

## ETA
[Estimated completion time]
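
The `Last Updated` field also makes a rough liveness check possible. The sketch below is an assumption about how that could work, not something we ran: the function names are hypothetical, and the 4-hour default simply reuses the timeline from our feedback messages.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Matches "**Last Updated**: 2025-11-06 15:00", with or without the bold markers.
_LAST_UPDATED = re.compile(
    r"^\*{0,2}Last Updated\*{0,2}:[ \t]*(\d{4}-\d{2}-\d{2} \d{2}:\d{2})",
    re.MULTILINE,
)


def last_updated(status_text: str) -> Optional[datetime]:
    """Extract the Last Updated timestamp, or None if the field is missing."""
    match = _LAST_UPDATED.search(status_text)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M")


def is_stale(status_text: str, now: datetime,
             max_age: timedelta = timedelta(hours=4)) -> bool:
    """Treat a status file as stale if it has no timestamp or the timestamp is too old."""
    stamp = last_updated(status_text)
    return stamp is None or now - stamp > max_age
```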

Communication Workflow

┌─────────────────┐
│  Integration    │
│     Agent       │
└────────┬────────┘
         │
         │ 1. Merge PR-2
         │ 2. Create FEEDBACK-PR-2.md
         │ 3. Post GitHub comment
         │
         ↓
    ╔════════════════════════╗
    ║  Integration Branch    ║
    ║  FEEDBACK-PR-2.md      ║
    ╚════════════════════════╝
         │
         │ 4. PR-2 agent fetches
         │
         ↓
┌─────────────────┐
│    PR-2 Agent   │
│  (Work Stream)  │
└────────┬────────┘
         │
         │ 5. Reads FEEDBACK-PR-2.md
         │ 6. Executes actions
         │ 7. Creates INTEGRATION-RESPONSE.md
         │ 8. Pushes to PR-2 branch
         │
         ↓
    ╔════════════════════════╗
    ║  PR-2 Branch           ║
    ║  INTEGRATION-RESPONSE  ║
    ╚════════════════════════╝
         │
         │ 9. Integration fetches
         │
         ↓
┌─────────────────┐
│  Integration    │
│     Agent       │
└─────────────────┘
    Reads response,
    takes next action

What We Learned About Agent Communication

1. Explicitness Over Cleverness

❌ Don't:

Please verify your changes integrated correctly.

✅ Do:

Run this exact command:

    uv run pytest tests/test_video_moderation/unit/test_budget_allocation.py -v

Expected output: All 14 tests should PASS.

Why: Agents are literal. "Verify" is vague. "Run this command" is clear.


2. Templates Over Freeform

❌ Don't:

Report back with your results.

✅ Do:

Create INTEGRATION-RESPONSE.md with this exact format:

    Status: [ALL PASSING / ISSUES FOUND]
    Date: YYYY-MM-DD

    ## Test Results
    [Paste output here]

Why: Templates reduce ambiguity. Agent knows exactly what format to use.


3. Commands Over Descriptions

❌ Don't:

Check the integration branch for changes.

✅ Do:

$ git fetch origin claude/integrate-test-improvements-011CUrkV7Q9B7UpLgPpwL1K8
$ git log origin/claude/integrate-...

Why: Copy-paste commands are foolproof. No interpretation needed.


4. Async Over Real-Time

❌ Don't:

Respond immediately via Slack/chat.

✅ Do:

Create response file within 4 hours (not blocking other work).

Why: Agents don't run continuously. Async file-based messaging works across time zones (metaphorically speaking).


5. Redundancy Over Single Channel

❌ Don't:

Only post GitHub comment OR only create file

✅ Do:

Create detailed FEEDBACK file + post GitHub summary comment

Why:

  • File: Detailed, persistent, git-tracked
  • Comment: Notification, human-visible
  • Both: Redundancy if one fails

Edge Cases We Hit

Edge Case 1: Agent Didn't Check Messages

Scenario: PR-3 agent never fetched integration branch, didn't see FEEDBACK file.

Solution: Human intervention

> PR-3 agent, please check for feedback from integration
> Check integration branch for feedback

Agent: [fetches and reads FEEDBACK-PR-3.md]
Agent: "I see the feedback now! Working on it..."

Lesson: No automatic polling mechanism. Humans must prompt agents to check.

Ideal solution (not implemented):

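# Hypothetical: agent runs on a schedule (both callables are placeholders)
import time

def poll_for_feedback(check_for_feedback, process_and_respond, interval_s=3600):
    while True:
        feedback = check_for_feedback()  # expected to return None when nothing is new
        if feedback is not None:
            process_and_respond(feedback)
        time.sleep(interval_s)  # check hourly by default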

Reality: Human orchestrates the "check now" trigger.


Edge Case 2: Agent Misunderstood Template

Scenario: PR-4 agent created response but used wrong format.

Expected:

**Status**: ALL PASSING
**Date**: 2025-11-06

Actual:

Status: All tests passing ✅
Date: November 6th, 2025

Solution: Integration agent parsed it anyway (flexible)

Lesson: Even with templates, agents interpret slightly differently. Build parsers with flexibility.
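
A tolerant parser for the two variants above might look like the following. This is a sketch, not the parser our integration agent used: the function names, the keyword matching, and the set of accepted date formats are all assumptions.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# Accepts "Status: x", "**Status**: x" and "**Status:** x" (same for Date and Agent).
_FIELD = re.compile(
    r"^[ \t]*\**[ \t]*(Status|Date|Agent)[ \t]*\**[ \t]*:[ \t]*\**[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def normalize_status(raw: str) -> Optional[str]:
    """Map free-form status text onto the template's three values."""
    text = raw.upper()
    if "ISSUE" in text or "FAIL" in text:
        return "ISSUES FOUND"
    if "HELP" in text:
        return "NEED HELP"
    if "PASS" in text:
        return "ALL PASSING"
    return None


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, accepting a few common spellings."""
    cleaned = re.sub(r"(\d)(st|nd|rd|th)\b", r"\1", raw.strip())  # "6th" -> "6"
    for fmt in ("%Y-%m-%d %H:%M", "%Y-%m-%d", "%B %d, %Y"):
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_response(text: str) -> Dict[str, Optional[str]]:
    """Extract agent, status and date from an INTEGRATION-RESPONSE.md body."""
    fields = {m.group(1).lower(): m.group(2).strip("* ") for m in _FIELD.finditer(text)}
    return {
        "agent": fields.get("agent"),
        "status": normalize_status(fields.get("status", "")),
        "date": normalize_date(fields.get("date", "")),
    }
```

Checking for "ISSUE" before "PASS" is deliberate: a status that mentions both is safer to treat as a problem report. One consequence is that an unfilled placeholder such as `[ALL PASSING / ISSUES FOUND]` is also read as a problem.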


Edge Case 3: Circular Waiting

Scenario:

  • Integration waiting for PR-4 response
  • PR-4 waiting for PR-2 to finish (thought there was dependency)
  • Neither progressing

Solution: Human detected deadlock, clarified

> PR-4, you don't need to wait for PR-2. Please proceed independently.

Lesson: Make dependencies explicit in FEEDBACK files

## Dependencies
This task has NO dependencies. Proceed immediately.

OR

## Dependencies
Wait for PR-2 to complete before starting. You'll receive another FEEDBACK when ready.
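
Once dependencies are declared, a stall like this one can be spotted by comparing what each agent is waiting on with what it was told to wait on. The sketch below assumes someone (human or coordinator) records both as simple mappings; `unjustified_waits` and the data layout are hypothetical.

```python
from typing import Dict, Set


def unjustified_waits(declared: Dict[str, Set[str]],
                      waiting_on: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per agent, the waits that no declared dependency justifies.

    declared:   agent -> agents it was explicitly told to wait for
    waiting_on: agent -> agents it is actually waiting for right now
    """
    result: Dict[str, Set[str]] = {}
    for agent, targets in waiting_on.items():
        extra = targets - declared.get(agent, set())
        if extra:
            result[agent] = extra
    return result


if __name__ == "__main__":
    declared = {"integration": {"PR-4"}, "PR-4": set()}
    waiting = {"integration": {"PR-4"}, "PR-4": {"PR-2"}}
    print(unjustified_waits(declared, waiting))
```

In the scenario above, integration's wait on PR-4 is legitimate, while PR-4's wait on PR-2 is the one that gets flagged.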

Edge Case 4: Message File Overwritten

Scenario: Integration sent FEEDBACK-PR-2.md twice (updated instructions). PR-2 only saw the second version, missed the first.

Solution: Git history preserves both

$ git log --all -- FEEDBACK-PR-2.md
# Shows both versions

$ git show commit1:FEEDBACK-PR-2.md  # First version
$ git show commit2:FEEDBACK-PR-2.md  # Second version

Lesson: Git history is valuable. Don't delete or overwrite messages; append to them or version them.

Better approach:

FEEDBACK-PR-2-v1.md  (initial message)
FEEDBACK-PR-2-v2.md  (update)
FEEDBACK-PR-2.md     (copy of latest; a symlink would make `git show <branch>:<path>` print only the link target)
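
With versioned names, the reader can also pick the newest message itself instead of relying on a pointer file. A minimal sketch, assuming the `FEEDBACK-PR-N-vM.md` naming shown above; the function name is hypothetical.

```python
import re
from typing import Iterable, Optional

_VERSIONED = re.compile(r"^FEEDBACK-(PR-\d+)-v(\d+)\.md$")


def latest_feedback(filenames: Iterable[str], pr: str) -> Optional[str]:
    """Return the highest-versioned feedback file for `pr`, or None if there is none.

    Versions are compared as integers, so v10 sorts after v2.
    """
    best_version = -1
    best_name: Optional[str] = None
    for name in filenames:
        match = _VERSIONED.match(name)
        if match and match.group(1) == pr and int(match.group(2)) > best_version:
            best_version = int(match.group(2))
            best_name = name
    return best_name
```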

Alternative Protocols We Considered

Alternative 1: Shared Database

Idea: All agents read/write to shared Postgres/Redis

Pros:

  • Real-time updates
  • Queryable state
  • Structured data

Cons:

  • Requires external service
  • Authentication complexity
  • Not git-tracked (no history)
  • Claude Code doesn't have DB clients built-in

Verdict: Too complex for our use case.


Alternative 2: GitHub Issues as Messages

Idea: Create GitHub issue per agent, use comments for communication

Pros:

  • Native GitHub UI
  • Notifications built-in
  • Searchable/linkable

Cons:

  • Web agents can't read issues (same gh CLI problem)
  • Clutters issue tracker
  • Not suitable for rapid back-and-forth

Verdict: Same problem as PR comments.


Alternative 3: Shared Google Doc

Idea: All agents edit shared Google Doc with sections per PR

Pros:

  • Real-time collaboration
  • Human-readable
  • Version history

Cons:

  • Requires Google API auth
  • Claude Code can't edit Google Docs
  • Not in git (separate system)
  • Race conditions if concurrent edits

Verdict: Doesn't work with Claude Code constraints.


Alternative 4: Kafka/Message Queue

Idea: Agents publish/subscribe to Kafka topics

Pros:

  • Designed for async messaging
  • Durable, scalable
  • Structured events

Cons:

  • Massive overkill for 6 agents
  • Requires Kafka cluster
  • Claude Code doesn't have Kafka client
  • No persistent file-based history

Verdict: Way too complex.


Why Git-Based Messaging Won

Git as communication bus wins because:

  1. Already there: Every PR has a git branch
  2. Universal: Works in Web and CLI environments
  3. Persistent: Complete history in git log
  4. Async-native: Fetch/push anytime
  5. No external dependencies: Just git
  6. Debuggable: Can inspect messages anytime
  7. Human-readable: Markdown files anyone can read

The downside:

  • No real-time notifications (have to poll)
  • Requires explicit fetch commands
  • File-based (not structured data)

But the upsides far outweighed the downsides.


Recommendations for Multi-Agent Communication

✅ Design Principles

1. Async-first

  • Agents work at different speeds
  • Messages must work without real-time synchronization
  • File-based > real-time chat

2. Explicit over clever

  • Provide exact bash commands
  • Use templates for responses
  • Don't assume agents will "figure it out"

3. Redundant channels

  • Primary: Detailed file (FEEDBACK-PR-X.md)
  • Secondary: Notification (GitHub comment, Slack, email)
  • Humans monitor both channels

4. Self-contained messages

  • Each message includes full context
  • Don't reference previous messages (agent may not have seen them)
  • Include commands, expected results, templates

5. Git as source of truth

  • All communication in git-tracked files
  • Permanent history
  • Inspectable by humans anytime

✅ Message Design Checklist

Before sending a message, verify:

  • [ ] Clear action items (numbered steps)
  • [ ] Exact bash commands (copy-paste ready)
  • [ ] Expected results (what success looks like)
  • [ ] Response template (format specified)
  • [ ] Timeline (deadline or "not blocking")
  • [ ] Escape hatch (how to ask for help)
  • [ ] Context (what changed, why agent should care)
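
Most of this checklist can be checked mechanically before a message is pushed. The sketch below is one possible lint, assuming messages use the section headings from the final FEEDBACK template; the `CHECKS` table and `missing_items` are hypothetical names, and the "exact bash commands" item is left out because it is hard to detect reliably.

```python
import re
from typing import List

# (checklist item, pattern that must appear somewhere in the message)
CHECKS = [
    ("numbered action items", r"^\s*\d+\.\s+\S"),
    ("expected results", r"^#+\s*Expected Results"),
    ("response template", r"^#+\s*Response (Format|Template)"),
    ("timeline", r"^#+\s*Timeline"),
    ("escape hatch", r"^#+\s*(Questions|Need Help)"),
    ("context", r"^#+\s*(Summary|What Changed)"),
]


def missing_items(message: str) -> List[str]:
    """Return the checklist items the message does not appear to cover."""
    flags = re.MULTILINE | re.IGNORECASE
    return [label for label, pattern in CHECKS if not re.search(pattern, message, flags)]
```

Run against the vague Iteration 2 message, every item comes back as missing, which matches what the PR-2 agent told us in its own words.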

Example passing checklist:

## Action Items
1. Fetch integration branch:

       git fetch origin claude/integrate-...

2. Run tests:

       uv run pytest tests/... -v

3. Create response:

       cat > INTEGRATION-RESPONSE.md <<EOF
       Status: ALL PASSING
       Date: $(date +%Y-%m-%d)
       EOF
       git add INTEGRATION-RESPONSE.md && git commit -m "response" && git push

## Expected Results
All 14 tests should PASS. If any fail, report in response.

## Response Template
[Template here]

## Timeline
Within 4 hours (not blocking other PRs)

## Need Help?
Create INTEGRATION-QUESTIONS.md on your branch

❌ Common Pitfalls

1. Vague instructions

❌ "Please check if everything works"
✅ "Run: uv run pytest tests/test_video_moderation/unit/test_budget_allocation.py -v
    Expected: All 14 tests PASS"

2. Assuming tool availability

❌ "Use gh CLI to check PR status"
✅ "Use git to fetch the branch:
    git fetch origin <branch>
    git show origin/<branch>:FILE.md"

3. No response format

❌ "Let me know the results"
✅ "Create INTEGRATION-RESPONSE.md with:
    **Status**: [ALL PASSING / ISSUES FOUND]
    **Test Output**: [paste here]"

4. Missing context

❌ "Your PR merged, please verify"
✅ "PR #123 (Budget Tests) merged at 14:30.
    Changes: Added 14 tests to test_budget_allocation.py
    Please verify tests still pass after integration"

5. Unclear timeline

❌ "Please respond ASAP"
✅ "Please respond within 4 hours (not blocking other work)"

Scaling Communication

Our experiment: 1 integration agent + 5 work stream agents

Communication paths:

  • Integration → PR-1, PR-2, PR-3, PR-4, PR-5 (5 outgoing)
  • PR-1, PR-2, PR-3, PR-4, PR-5 → Integration (5 incoming)
  • Total: 10 message channels

Manageable!

What if we scale to 10 work streams?

  • Integration → 10 agents (10 outgoing)
  • 10 agents → Integration (10 incoming)
  • Total: 20 message channels

Still manageable with file-based approach:

FEEDBACK-PR-1.md
FEEDBACK-PR-2.md
...
FEEDBACK-PR-10.md

What if agents need to talk to EACH OTHER?

  • 10 agents × 9 other agents = 90 communication paths
  • Not manageable without hierarchy

Solution: Hub-and-spoke

   Integration (hub)
      /    |    \
    PR-1  PR-2  PR-3 ... (spokes)

Agents only talk to integration, not to each other.

This is what we did, and it worked.
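
The arithmetic behind these counts is small enough to write down. The two helpers below are hypothetical names that restate the numbers in this section: two directed paths per work stream in hub-and-spoke, and one directed path per ordered pair of agents in a full mesh.

```python
def hub_and_spoke_paths(work_streams: int) -> int:
    """One outgoing and one incoming path between the hub and each work stream."""
    return 2 * work_streams


def full_mesh_paths(agents: int) -> int:
    """Every agent can send to every other agent: n * (n - 1) directed paths."""
    return agents * (agents - 1)


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, hub_and_spoke_paths(n), full_mesh_paths(n))
```

Hub-and-spoke grows linearly with the number of work streams, while the full mesh grows quadratically, which is why the hierarchy stays manageable as agents are added.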


Real-World Applicability

Use Case 1: CI/CD Pipeline Agents

Scenario: Multiple agents handling build, test, deploy stages

Communication:

build-agent creates: BUILD-RESULTS.md
test-agent reads: BUILD-RESULTS.md
test-agent creates: TEST-RESULTS.md
deploy-agent reads: TEST-RESULTS.md
deploy-agent creates: DEPLOY-STATUS.md

Protocol:

  • Each agent writes status file
  • Next agent in pipeline reads it
  • Git commits track full pipeline history

Use Case 2: Code Review Agents

Scenario: Multiple specialized review agents (security, performance, style)

Communication:

security-agent creates: SECURITY-REVIEW.md
performance-agent creates: PERFORMANCE-REVIEW.md
style-agent creates: STYLE-REVIEW.md

coordinator-agent reads all three, creates: REVIEW-SUMMARY.md

Protocol:

  • Parallel review agents
  • Each writes findings to separate file
  • Coordinator aggregates
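
The coordinator's aggregation step could be as simple as concatenating the review files under one heading each. This is a sketch of that idea only; `build_review_summary` is a hypothetical name, and the input is assumed to be a mapping from review filename to file contents that the coordinator has already read.

```python
from typing import Dict


def build_review_summary(reviews: Dict[str, str]) -> str:
    """Combine review files into one REVIEW-SUMMARY.md body, one section per file.

    Files are emitted in sorted filename order so the output is deterministic.
    """
    lines = ["# REVIEW-SUMMARY.md", ""]
    for filename in sorted(reviews):
        body = reviews[filename].strip()
        lines += [f"## {filename}", body if body else "(no findings reported)", ""]
    return "\n".join(lines)
```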

Use Case 3: Documentation Agents

Scenario: Agents generating API docs, tutorials, changelog

Communication:

api-doc-agent creates: docs/API.md
tutorial-agent creates: docs/TUTORIAL.md
changelog-agent creates: CHANGELOG.md

reviewer-agent creates: DOCUMENTATION-REVIEW.md

Protocol:

  • Each agent owns its documentation domain
  • Reviewer checks consistency across all docs
  • All tracked in git

Conclusion

What we proved:

  • ✅ File-based async messaging works for multi-agent coordination
  • ✅ Git is an excellent communication bus
  • ✅ Explicit instructions beat clever assumptions

What we learned:

  • Iteration is required (took 4 tries to get protocol right)
  • Redundancy helps (dual-channel: files + GitHub comments)
  • Templates reduce ambiguity (specify exact format)
  • Human orchestration still needed (agents don't poll automatically)

The final protocol:

1. Integration creates FEEDBACK-PR-X.md (detailed instructions)
2. Integration posts GitHub comment (notification)
3. Human prompts agent: "Check for feedback"
4. Agent fetches integration branch
5. Agent reads FEEDBACK-PR-X.md
6. Agent executes actions
7. Agent creates INTEGRATION-RESPONSE.md
8. Agent pushes response to their branch
9. Integration fetches and reads response
10. Integration takes next action

Would we do it again? Yes! File-based messaging worked well despite initial struggles.

Next time we'd improve:

  1. Start with templates from day 1 (don't iterate to them)
  2. Add STATUS files for long-running work
  3. Implement HEARTBEAT mechanism (liveness checks)
  4. Create checklist for message format compliance

What's Next

In the final article:

  • Article 6: The Budget Calculator Paradox: When Tests Don't Match Reality
    • Flip-flopping 8 times on a simple formula
    • Build the calculator first, use it everywhere
    • Cycle quantization and margin requirements

Tags: #multi-agent #communication #protocols #git #async-messaging #coordination #distributed-systems



Discussion: How do your agents communicate? File-based, API-based, or something else? What challenges have you faced with agent coordination? Share in the comments!
