How We Cut Agent-to-Agent Message Latency from 30 Minutes to 1 Second
TL;DR
We run 19 AI agents across 9 mini-PCs using OpenClaw. Agent-to-agent message delivery was taking up to 30 minutes — we got it down to ~1 second using a lightweight SSE + systemd bridge architecture. Here's how.
The Problem: Heartbeat-Driven Polling
OpenClaw agents are event-driven by design. They respond to user messages instantly — but inter-agent communication is a different story.
In our setup, we run a custom message bus: a simple Flask + Gunicorn HTTP API where agents post messages and recipients poll for them. The polling happens via OpenClaw's cron.wake heartbeat.
The heartbeat interval maxes out at 30 minutes. This means:
- Agent A posts a message → 0 seconds
- Agent B's next heartbeat fires → up to 30 minutes later
- B reads and processes the message → a few more seconds
For real-time coordination tasks, this was a dealbreaker.
First Attempt: sessions_send (Didn't Work)
OpenClaw has a sessions_send API for injecting messages directly into another session:

```python
sessions_send(sessionKey="agent:some-agent:main", message="New task for you")
```
This looked perfect — messages delivered instantly! But there was a catch.
sessions_send only works for main/webchat sessions. Our agents primarily run on Telegram sessions. Messages injected this way were silently ignored by the agents.
Back to the drawing board.
The Solution: SSE + bus-watcher Bridge
We flipped the approach: instead of agents polling the bus, the bus pushes events to a lightweight watcher process running on each node.
Architecture
```
[Agent A] → POST /api/send → [Message Bus] → SSE /api/stream
                                                  ↓
                                          [bus-watcher.py]
                                                  ↓
                                         cron.wake(mode=now)
                                                  ↓
                                          [Agent B wakes]
                                                  ↓
                                     heartbeat → GET /api/inbox
                                                  ↓
                                        [Message processed]
```
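From the sender's side, the whole flow starts with a single POST to the bus. A minimal client sketch, assuming the bus accepts JSON at /api/send with the to_agent field used elsewhere in this post (the body field and send_message name are illustrative):

```python
import json
import urllib.request


def send_message(bus_url, to_agent, body):
    """POST a message to the bus; delivery to the recipient is push-driven from there."""
    payload = json.dumps({"to_agent": to_agent, "body": body}).encode()
    req = urllib.request.Request(
        f"{bus_url}/api/send",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Once the bus stores the message, the SSE stream and the watcher below take over; the sender never waits on the recipient's heartbeat.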
Step 1: Add SSE Endpoint to the Message Bus
We added /api/stream to the Flask app — a persistent connection that pushes new messages in real time:
```python
import json
import time

from flask import Flask, Response

app = Flask(__name__)

@app.route('/api/stream')
def stream():
    def generate():
        last_id = 0
        while True:
            new_msgs = get_messages_after(last_id)  # messages with id > last_id
            for msg in new_msgs:
                yield f"data: {json.dumps(msg)}\n\n"
                last_id = msg['id']
            time.sleep(1)
    return Response(generate(), mimetype='text/event-stream')
```
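The get_messages_after helper isn't shown in the post. A minimal sketch, assuming messages live in a SQLite table with an autoincrement id (the DB_PATH, table, and column names here are hypothetical):

```python
import sqlite3

DB_PATH = "bus.db"  # hypothetical location of the bus database


def get_messages_after(last_id):
    """Return all messages with id > last_id, oldest first, as dicts."""
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    try:
        rows = conn.execute(
            "SELECT id, to_agent, body FROM messages "
            "WHERE id > ? ORDER BY id",
            (last_id,),
        ).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()
```

Because the stream loop remembers last_id, each subscriber only ever sees messages it hasn't yielded yet.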
Gotcha — Gunicorn worker count: We initially ran with 2 workers, which caused SSE subscribers to be spread across workers. A message arriving at worker 1 wouldn't reach a subscriber on worker 2. Switching to a single gevent worker fixed this.
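For reference, a launch command along these lines is what we mean by "single gevent worker" (the module path app:app and bind address are illustrative):

```shell
# One gevent worker: all SSE subscribers share a single process,
# so every subscriber sees every message as it arrives.
gunicorn --worker-class gevent --workers 1 --bind 0.0.0.0:8091 app:app
```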
Step 2: bus-watcher.py on Each Node
A minimal Python script subscribes to the SSE stream and triggers cron.wake when a message arrives for a local agent:
```python
#!/usr/bin/env python3
"""SSE → cron.wake bridge"""
import json
import subprocess
import urllib.request

# Agents hosted on this node (names are illustrative)
LOCAL_AGENTS = {"agent-a", "agent-b"}

def watch():
    url = "http://192.168.x.x:8091/api/stream"  # internal message bus
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # line-based reading keeps the stream unbuffered
            line = line.decode().strip()
            if line.startswith("data:"):
                msg = json.loads(line[5:])
                if msg["to_agent"] in LOCAL_AGENTS:
                    subprocess.run([
                        "openclaw", "cron", "wake",
                        msg["to_agent"], "--mode=now"
                    ])

if __name__ == "__main__":
    watch()
```
Gotcha — urllib buffering: Calling resp.read() buffers the response until the connection closes, so events never arrived in real time. Iterating over the response object directly (readline-based) delivers each SSE line as it comes in.
Step 3: systemd Service for Reliability
We deployed bus-watcher.service on every node for auto-start and auto-reconnect:
```ini
[Unit]
Description=Message Bus Watcher
After=network.target

[Service]
ExecStart=/usr/bin/python3 /path/to/bus-watcher.py
Restart=always
RestartSec=5

[Install]
WantedBy=default.target
```
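Installing it is the usual systemd routine; a sketch assuming a system-level unit (adjust paths to your layout):

```shell
# Install, register, and start the watcher on boot.
sudo cp bus-watcher.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now bus-watcher.service
```

Restart=always plus RestartSec=5 is what gives us auto-reconnect: if the SSE connection drops or the bus restarts, systemd respawns the watcher within five seconds.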
Deployed to all 7 nodes, tested with all 19 agents.
Results
| Metric | Before | After |
|---|---|---|
| Message delivery latency | Up to 30 min | ~1 second |
| Additional infrastructure | None | SSE endpoint + lightweight watcher |
| CPU/Memory overhead | — | Nearly zero |
| New dependencies | — | None (stdlib only) |
Watching agents respond to each other in real time for the first time was genuinely exciting. Multiple agents firing off replies in rapid succession — it finally felt like a live agent network.
Key Takeaways
- Know sessions_send's limits: OpenClaw session injection is channel-aware. It's not a universal delivery mechanism.
- SSE is underrated: Far simpler than WebSockets for this use case, and more than sufficient.
- Gunicorn + SSE = watch your worker count: Single gevent worker is the right setup for SSE.
- urllib buffering bites: For streaming, always iterate line by line rather than calling read().
- cron.wake --mode=now is powerful: OpenClaw's hidden gem for instant agent activation without waiting for the next heartbeat.
Wrap-Up
You don't need Redis, RabbitMQ, or any heavy message queue to build real-time inter-agent communication. SSE + a few dozen lines of Python got the job done.
In a multi-agent system, communication latency defines the responsiveness of the whole network. The gap between 30 minutes and 1 second isn't just a performance metric — it's the difference between a batch system and a live collaborative agent team.
Tags: #OpenClaw #MultiAgent #SSE #Python #Infrastructure #RealTime #MessageBus #systemd