linou518

Posted on Apr 3

Designing a Message Bus for AI Agents — Lightweight Communication for 20+ Autonomous Agents

#ai #programming #automation #architecture

How do 20+ AI agents talk to each other? A lightweight message bus design and lessons from real-world operation.

The Problem: How Do Agents Communicate?

When you have a single AI assistant, communication isn't a problem. But when you scale to 10+ agents distributed across multiple servers, a fundamental challenge emerges: how do agents communicate with each other?

Our environment runs 20+ agents spread across 9 nodes, each responsible for different domains. They frequently need to:

Delegate tasks: A manager agent assigns sub-tasks to specialist agents
Sync state: An agent notifies others after completing a task
Request information: Agent A queries knowledge held by Agent B
Broadcast: System-wide announcements

Why Not Use an Off-the-Shelf Message Queue?

RabbitMQ, Redis Pub/Sub, or NATS would be overkill for this workload. Our agent-to-agent traffic looks like this:

Low message frequency (tens to hundreds per day)
Small message bodies (text instructions)
Inbox semantics required (messages persist while agent is offline)
Simple sender/receiver identification sufficient

We chose: HTTP API service + SQLite storage.

Architecture

┌───────────┐    HTTP POST     ┌────────────────┐
│ Agent A   │ ────────────────→│  Message Bus   │
└───────────┘  /api/send       │  (Node.js +    │
┌───────────┐    HTTP GET      │   SQLite)      │
│ Agent B   │ ←────────────────│                │
└───────────┘  /api/inbox/:id  └────────────────┘

API Design

# Send a message
curl -X POST http://bus-server:8091/api/send \
  -H 'Content-Type: application/json' \
  -H 'X-Bus-Token: <TOKEN>' \
  -d '{"from":"joe","to":"jack","subject":"Task","body":"Handle blog publishing"}'

# Pull inbox
curl 'http://bus-server:8091/api/inbox/jack?mark_read=true' \
  -H 'X-Bus-Token: <TOKEN>'
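For agents that call the bus from code rather than from a shell, the same two requests can be built with the standard library. This is only a sketch in Python (the bus itself is Node.js). The helper names `build_send_request` and `build_inbox_request` are hypothetical, while the URL, paths, and headers are copied from the curl examples above.

```python
import json
import urllib.request

BUS_URL = "http://bus-server:8091"  # host and port as shown in the curl examples


def build_send_request(token, sender, recipient, subject, body):
    """Build (but do not send) the POST /api/send request."""
    payload = json.dumps(
        {"from": sender, "to": recipient, "subject": subject, "body": body}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BUS_URL}/api/send",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json", "X-Bus-Token": token},
    )


def build_inbox_request(token, agent_id, mark_read=True):
    """Build (but do not send) the GET /api/inbox/:id request."""
    url = f"{BUS_URL}/api/inbox/{agent_id}"
    if mark_read:
        # Only the mark_read=true form appears in the post, so that is
        # the only query string this sketch ever produces.
        url += "?mark_read=true"
    return urllib.request.Request(url, method="GET", headers={"X-Bus-Token": token})
```

Passing either object to `urllib.request.urlopen` performs the call. The response format is not described in this post, so parsing it is left out.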

Key Design Decisions

1. Pull Over Push

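AI agents are only intermittently online: Heartbeat fires every 5 minutes, and Cron triggers at specific times. With pull, an agent collects whatever is waiting each time it wakes up, so the bus never has to track connections or handle disconnects. A pull without mark_read is also idempotent and safe to retry.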
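On the agent side, the pull model reduces to one small routine that runs on every wake-up. The Python sketch below is illustrative only: `on_heartbeat`, `fake_fetch`, and the in-memory `pending` list are hypothetical stand-ins, with `fake_fetch` playing the role of a GET to /api/inbox/:id with mark_read=true.

```python
from typing import Callable, Iterable


def on_heartbeat(
    fetch_inbox: Callable[[], Iterable[dict]],
    handle: Callable[[dict], None],
) -> int:
    """Run once per wake-up: pull everything waiting and process it in order."""
    processed = 0
    for message in fetch_inbox():
        handle(message)
        processed += 1
    return processed


# Messages that piled up while the agent was offline (stand-in for the bus).
pending = [
    {"from": "joe", "to": "jack", "subject": "Task", "body": "Handle blog publishing"}
]


def fake_fetch() -> list:
    """Stand-in for the inbox pull: return what is waiting, then clear it."""
    batch = list(pending)
    pending.clear()
    return batch
```

Calling `on_heartbeat(fake_fetch, print)` twice in a row handles the queued message on the first call and nothing on the second. That is the behavior an agent relies on when it wakes up every few minutes.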

2. Message Format

{"from":"joe","to":"jack","subject":"...","body":"...","timestamp":"..."}

The classic email quartet (from, to, subject, body) plus a timestamp. Intuitive and sufficient. No priority, tags, or threads: agents don't need Gmail.
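A format this small is easy to validate at the edge of the bus. The following Python sketch is not the author's implementation. The `Message` class and `parse_message` are hypothetical, and two points are assumptions: the bus fills in the timestamp when the sender omits it, and the timestamp is an ISO 8601 UTC string (the post does not specify a format).

```python
from dataclasses import dataclass
from datetime import datetime, timezone

REQUIRED_FIELDS = ("from", "to", "subject", "body")


@dataclass(frozen=True)
class Message:
    sender: str      # JSON field "from" ("from" is a Python keyword)
    recipient: str   # JSON field "to"
    subject: str
    body: str
    timestamp: str   # assumed ISO 8601, UTC


def parse_message(raw: dict) -> Message:
    """Validate an incoming JSON object and turn it into a Message."""
    missing = [
        name for name in REQUIRED_FIELDS
        if not isinstance(raw.get(name), str) or not raw[name]
    ]
    if missing:
        raise ValueError(f"missing or empty fields: {', '.join(missing)}")
    # Assumption: stamp the message on arrival if no timestamp was supplied.
    timestamp = raw.get("timestamp") or datetime.now(timezone.utc).isoformat()
    return Message(raw["from"], raw["to"], raw["subject"], raw["body"], timestamp)
```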

3. Broadcast

Setting "to" to "ALL" copies the message into every registered agent's inbox.
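Broadcast by copying can be sketched as a simple fan-out step. This Python version is an illustration only. `fan_out` and the `registered` list are hypothetical names, and the sketch follows the description literally by delivering to every registered agent, sender included. Whether the real bus skips the sender is not stated here.

```python
BROADCAST = "ALL"


def fan_out(message: dict, registered: list) -> list:
    """Return one copy of the message per recipient inbox."""
    if message["to"] == BROADCAST:
        recipients = list(registered)
    else:
        recipients = [message["to"]]
    # Each copy gets its own concrete "to", so inbox lookups stay uniform.
    return [{**message, "to": agent} for agent in recipients]
```

Storing one row per recipient keeps read tracking independent: one agent marking its copy as read does not affect anyone else's inbox.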

4. Persistence

SQLite. Backup is a single file, concurrent writes are rare, and SQL covers every query we need.
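To make the inbox semantics concrete, here is a minimal sketch of a SQLite-backed store, written in Python with the standard `sqlite3` module rather than the Node.js the bus actually uses. The table layout, column names, and the `open_store`, `send`, and `pull_inbox` helpers are all assumptions for illustration. So is the choice to return only unread messages on a pull.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    sender    TEXT NOT NULL,
    recipient TEXT NOT NULL,
    subject   TEXT NOT NULL,
    body      TEXT NOT NULL,
    timestamp TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
    is_read   INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_inbox ON messages (recipient, is_read);
"""


def open_store(path=":memory:"):
    """Open (or create) the message store; pass a file path for persistence."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def send(conn, sender, recipient, subject, body):
    """Persist one message and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO messages (sender, recipient, subject, body) VALUES (?, ?, ?, ?)",
            (sender, recipient, subject, body),
        )
    return cur.lastrowid


def pull_inbox(conn, agent_id, mark_read=False):
    """Return unread messages for agent_id, oldest first."""
    with conn:
        rows = conn.execute(
            "SELECT id, sender, subject, body, timestamp FROM messages "
            "WHERE recipient = ? AND is_read = 0 ORDER BY id",
            (agent_id,),
        ).fetchall()
        if mark_read and rows:
            conn.executemany(
                "UPDATE messages SET is_read = 1 WHERE id = ?",
                [(row[0],) for row in rows],
            )
    return rows
```

With a file path instead of `:memory:`, messages sent while an agent is offline are still there after the bus restarts, which is the whole point of the inbox.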

Real-World Patterns

Task Delegation Chain

User → Joe (manager) → Jack (blog) → completion → Joe → User

Cross-Agent Knowledge Query

Investment Agent → Learning Agent: "Any quant trading notes?"
Learning Agent: "Factor IC/IR notes at /shared/knowledge/quant/"

Periodic Status Reports

Each agent sends status summaries to the manager during Heartbeat.

Lessons Learned

Message Loss: Without persistence, messages were gone after a restart. SQLite fixed this. For AI agents, losing a message is worse than delivering it late.
Broadcast Storm: An agent got stuck in a loop sending broadcasts and flooded 20+ inboxes. We added a rate limit of 10 messages per minute per sender.
Single Point of Failure: If the bus server is down, there is no communication. Mitigation: agents degrade to local-only mode.
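The rate limit from the broadcast storm lesson could be enforced with a sliding window per sender. The Python sketch below is one possible approach, not necessarily how the bus does it. `SenderRateLimiter` is a hypothetical name, and the defaults mirror the 10 messages per minute per sender figure above. Whether a broadcast counts as one message or as one per recipient is left open.

```python
import time
from collections import defaultdict, deque


class SenderRateLimiter:
    """Sliding-window limiter: at most `limit` sends per `window_seconds`."""

    def __init__(self, limit=10, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = defaultdict(deque)  # sender -> timestamps of accepted sends

    def allow(self, sender):
        """Return True and record the send if the sender is under the limit."""
        now = self.clock()
        history = self._sent[sender]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.limit:
            return False
        history.append(now)
        return True
```

A send handler would call `allow(message["from"])` before storing anything and reject the request when it returns False, so a looping agent is cut off after its tenth message in any one-minute window.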

Conclusion

The core need is simple, reliable async messaging. HTTP API + SQLite beats enterprise MQs for this use case.

Key principles:

Pull over Push — matches intermittent online behavior
Inbox semantics — no loss, no duplicates
Minimal complexity — email-style format is enough
Network-layer security — trusted internal network

20+ agents, ~2 months stable, ~200 msgs/day, zero loss. Sometimes the simplest solution really is the best one.

DEV Community