How do 20+ AI agents talk to each other? A lightweight message bus design and lessons from real-world operation.
The Problem: How Do Agents Communicate?
When you have a single AI assistant, communication isn't a problem. But when you scale to 10+ agents distributed across multiple servers, a fundamental challenge emerges: how do agents communicate with each other?
Our environment runs 20+ agents spread across 9 nodes, each responsible for different domains. They frequently need to:
- Delegate tasks: A manager agent assigns sub-tasks to specialist agents
- Sync state: An agent notifies others after completing a task
- Request information: Agent A queries knowledge held by Agent B
- Broadcast: System-wide announcements
Why Not Use an Off-the-Shelf Message Queue?
RabbitMQ, Redis Pub/Sub, or NATS would be overkill for this workload. Our agent-to-agent traffic looks like this:
- Low message frequency (tens to hundreds per day)
- Small message bodies (text instructions)
- Inbox semantics required (messages persist while agent is offline)
- Simple sender/receiver identification sufficient
We chose: HTTP API service + SQLite storage.
Architecture
┌───────────┐    HTTP POST     ┌────────────────┐
│  Agent A  │ ────────────────→│  Message Bus   │
└───────────┘    /api/send     │  (Node.js +    │
┌───────────┐    HTTP GET      │   SQLite)      │
│  Agent B  │ ←────────────────│                │
└───────────┘  /api/inbox/:id  └────────────────┘
API Design
# Send a message
curl -X POST http://bus-server:8091/api/send \
  -H 'Content-Type: application/json' \
  -H 'X-Bus-Token: <TOKEN>' \
  -d '{"from":"joe","to":"jack","subject":"Task","body":"Handle blog publishing"}'

# Pull inbox (URL quoted so the shell does not treat ? as a glob character)
curl 'http://bus-server:8091/api/inbox/jack?mark_read=true' \
  -H 'X-Bus-Token: <TOKEN>'
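For agents written in Python, the two calls could be wrapped roughly as follows. This is a sketch using only the standard library, not the client we ship: the function names and the BUS_URL constant are illustrative, and the assumption that the bus answers with a JSON body is ours, since the response format is not shown here.

```python
import json
import urllib.parse
import urllib.request

BUS_URL = "http://bus-server:8091"  # host and port taken from the curl examples


def build_send_request(token, sender, recipient, subject, body):
    """Build the POST /api/send request without sending it."""
    payload = {"from": sender, "to": recipient, "subject": subject, "body": body}
    return urllib.request.Request(
        f"{BUS_URL}/api/send",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Bus-Token": token},
        method="POST",
    )


def build_inbox_request(token, agent_id, mark_read=True):
    """Build the GET /api/inbox/:id request without sending it."""
    url = f"{BUS_URL}/api/inbox/{urllib.parse.quote(agent_id, safe='')}"
    if mark_read:
        url += "?mark_read=true"
    return urllib.request.Request(url, headers={"X-Bus-Token": token}, method="GET")


def call(request, timeout=10):
    """Send a prepared request. Assumes (hypothetically) a JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from the network call keeps the wire format easy to inspect and unit-test without a running bus.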
Key Design Decisions
1. Pull Over Push
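AI agents are only intermittently online: Heartbeat fires every 5 minutes, and Cron triggers at specific times. Pulling fits that pattern. Each agent fetches its inbox when it wakes up, so the bus has no connections to track and no disconnects to handle. A pull without mark_read=true can also be repeated freely, because messages simply wait in the inbox.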
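On the agent side, one heartbeat can be sketched as a single function. The names heartbeat_tick, fetch_inbox, and handle are hypothetical; fetch_inbox stands in for whatever performs the GET /api/inbox/:id call.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def heartbeat_tick(
    fetch_inbox: Callable[[], List[Message]],
    handle: Callable[[Message], None],
) -> int:
    """Run one heartbeat: pull whatever is waiting, handle it, and return.

    If the bus is unreachable, the tick does nothing and the next heartbeat
    simply tries again. There is no connection to restore.
    """
    try:
        messages = fetch_inbox()
    except OSError:  # covers connection errors and urllib's URLError
        return 0
    for message in messages:
        handle(message)
    return len(messages)
```

Because all state lives in the bus, an agent that was offline for an hour behaves exactly like one that was offline for five minutes: its next tick just returns a longer list.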
2. Message Format
{"from":"joe","to":"jack","subject":"...","body":"...","timestamp":"..."}
This is the classic email quartet (from, to, subject, body) plus a timestamp. It is intuitive and sufficient: no priority, tags, or threads. Agents don't need Gmail.
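In Python, the format could be modeled as a small dataclass. This is an illustrative sketch: the class and method names are hypothetical, the fields are called sender and recipient because from is a Python keyword, and the ISO 8601 UTC timestamp is an assumption, since the timestamp encoding is not specified here.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

REQUIRED_FIELDS = ("from", "to", "subject", "body")


def _utc_now() -> str:
    # Assumption: ISO 8601 in UTC. The real bus may use another format.
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Message:
    sender: str
    recipient: str
    subject: str
    body: str
    timestamp: str = field(default_factory=_utc_now)

    def to_json(self) -> str:
        """Serialize using the wire field names from the format above."""
        return json.dumps({
            "from": self.sender,
            "to": self.recipient,
            "subject": self.subject,
            "body": self.body,
            "timestamp": self.timestamp,
        })

    @classmethod
    def from_json(cls, raw: str) -> "Message":
        """Parse a wire message, rejecting ones with missing or empty fields."""
        data = json.loads(raw)
        missing = [key for key in REQUIRED_FIELDS if not data.get(key)]
        if missing:
            raise ValueError(f"missing fields: {', '.join(missing)}")
        return cls(
            sender=data["from"],
            recipient=data["to"],
            subject=data["subject"],
            body=data["body"],
            timestamp=data.get("timestamp") or _utc_now(),
        )
```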
3. Broadcast
Setting "to" to "ALL" copies the message into every registered agent's inbox.
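The fan-out itself is a few lines. The sketch below is hypothetical (fan_out and the registry argument are illustrative names) and returns one (inbox owner, message) pair per copy, leaving the actual storage to the caller.

```python
from typing import Dict, Iterable, List, Tuple

Message = Dict[str, str]
BROADCAST = "ALL"


def fan_out(message: Message, registered_agents: Iterable[str]) -> List[Tuple[str, Message]]:
    """Return one (inbox_owner, message_copy) pair per destination inbox."""
    if message["to"] != BROADCAST:
        return [(message["to"], message)]
    # One independent copy per registered agent, so each inbox can be
    # marked read separately.
    return [(agent_id, dict(message)) for agent_id in registered_agents]
```

Copying at send time costs one row per agent per broadcast, which is negligible at this message volume and keeps the inbox query a plain lookup by recipient.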
4. Persistence
SQLite. The whole store is a single file that is easy to back up, our write concurrency is low, and SQL gives us all the query capability we need.
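A minimal sketch of what such a store could look like, written with Python's built-in sqlite3 module rather than the Node.js the bus actually uses. The table layout, column names, and function names are assumptions for illustration, not our production schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    sender    TEXT NOT NULL,
    recipient TEXT NOT NULL,
    subject   TEXT NOT NULL,
    body      TEXT NOT NULL,
    timestamp TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    is_read   INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_inbox ON messages (recipient, is_read);
"""


def open_store(path):
    """Open (or create) the single-file message store."""
    db = sqlite3.connect(path)
    db.row_factory = sqlite3.Row
    db.executescript(SCHEMA)
    return db


def store(db, sender, recipient, subject, body):
    """Persist one message and return its row id."""
    with db:  # commits on success, rolls back on error
        cursor = db.execute(
            "INSERT INTO messages (sender, recipient, subject, body) VALUES (?, ?, ?, ?)",
            (sender, recipient, subject, body),
        )
    return cursor.lastrowid


def pull_inbox(db, agent_id, mark_read=False):
    """Return unread messages for agent_id, oldest first."""
    with db:
        rows = db.execute(
            "SELECT id, sender, subject, body, timestamp FROM messages "
            "WHERE recipient = ? AND is_read = 0 ORDER BY id",
            (agent_id,),
        ).fetchall()
        if mark_read and rows:
            db.executemany(
                "UPDATE messages SET is_read = 1 WHERE id = ?",
                [(row["id"],) for row in rows],
            )
    return [dict(row) for row in rows]
```

Backing up the bus then amounts to copying one file, and messages written before a restart are still in the inbox afterwards.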
Real-World Patterns
Task Delegation Chain
User → Joe (manager) → Jack (blog) → completion → Joe → User
Cross-Agent Knowledge Query
Investment Agent → Learning Agent: "Any quant trading notes?"
Learning Agent: "Factor IC/IR notes at /shared/knowledge/quant/"
Periodic Status Reports
Each agent sends status summaries to the manager during Heartbeat.
Lessons Learned
- Message Loss: Without persistence, messages were gone on restart. SQLite fixed this. For AI agents, losing a message is worse than delivering it late.
- Broadcast Storm: An agent got stuck in a loop sending broadcasts, flooding 20+ inboxes. We added a rate limit of 10 messages per minute per sender.
- Single Point of Failure: When the bus server is down, there is no communication. Mitigation: agents degrade to local-only mode.
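The per-sender rate limit from the broadcast storm fix can be sketched as a sliding-window counter. The class name, the sliding-window approach, and the injectable clock are assumptions for illustration; only the limit of 10 messages per minute per sender comes from our setup.

```python
import time
from collections import defaultdict, deque


class SenderRateLimiter:
    """Allow at most `limit` messages per sender within a sliding time window."""

    def __init__(self, limit=10, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = defaultdict(deque)  # sender -> timestamps of accepted messages

    def allow(self, sender):
        """Return True and record the send if the sender is under the limit."""
        now = self.clock()
        recent = self._sent[sender]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

The send handler would call allow(sender) before storing a message and reject the request when it returns False. How the rejection is reported to the agent is left open here.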
Conclusion
The core need is simple, reliable async messaging. HTTP API + SQLite beats enterprise MQs for this use case.
Key principles:
- Pull over Push — matches intermittent online behavior
- Inbox semantics — no loss, no duplicates
- Minimal complexity — email-style format is enough
- Network-layer security — trusted internal network
20+ agents, ~2 months stable, ~200 msgs/day, zero loss. Sometimes the simplest solution really is the best one.