DEV Community

linou518

How We Cut Agent-to-Agent Message Latency from 30 Minutes to 1 Second


TL;DR

We run 19 AI agents across 9 mini-PCs using OpenClaw. Agent-to-agent message delivery was taking up to 30 minutes — we got it down to ~1 second using a lightweight SSE + systemd bridge architecture. Here's how.

The Problem: Heartbeat-Driven Polling

OpenClaw agents are event-driven by design. They respond to user messages instantly — but inter-agent communication is a different story.

In our setup, we run a custom message bus: a simple Flask + Gunicorn HTTP API where agents post messages and recipients poll for them. The polling happens via OpenClaw's cron.wake heartbeat.

The heartbeat interval maxes out at 30 minutes. This means:

  • Agent A posts a message → 0 seconds
  • Agent B's next heartbeat fires → up to 30 minutes later
  • B reads and processes the message → a few more seconds

For real-time coordination tasks, this was a dealbreaker.
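The arithmetic behind that dealbreaker is simple: with a fixed polling interval and messages arriving at random times, the expected extra wait is half the interval, and the worst case is the full interval. A one-liner makes the numbers concrete:

```python
# Extra delivery latency introduced by heartbeat polling, assuming
# messages arrive at uniformly random times within the interval.
HEARTBEAT_INTERVAL_S = 30 * 60  # OpenClaw's maximum heartbeat interval

worst_case_s = HEARTBEAT_INTERVAL_S      # posted just after a heartbeat fired
expected_s = HEARTBEAT_INTERVAL_S / 2    # average under uniform arrivals
```

So even the *average* case adds 15 minutes per hop, and multi-hop agent conversations compound it.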

First Attempt: sessions_send (Didn't Work)

OpenClaw has a sessions_send API for injecting messages directly into another session:

```python
sessions_send(sessionKey="agent:some-agent:main", message="New task for you")
```

This looked perfect — messages delivered instantly! But there was a catch.

sessions_send only works for main/webchat sessions. Our agents primarily run on Telegram sessions. Messages injected this way were silently ignored by the agents.

Back to the drawing board.

The Solution: SSE + bus-watcher Bridge

We flipped the approach: instead of agents polling the bus, the bus pushes events to a lightweight watcher process running on each node.

Architecture

```
[Agent A] → POST /api/send → [Message Bus] → SSE /api/stream
                                                     ↓
                                             [bus-watcher.py]
                                                     ↓
                                           cron.wake(mode=now)
                                                     ↓
                                               [Agent B wakes]
                                                     ↓
                                          heartbeat → GET /api/inbox
                                                     ↓
                                             [Message processed]
```
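For context, here's a minimal sketch of the two bus endpoints the diagram references. The handler shapes and payload fields (`to_agent`, auto-incremented `id`) are assumptions for illustration; the real bus persists messages rather than keeping them in a list:

```python
# Minimal sketch of the bus's send/inbox endpoints (illustrative only).
from flask import Flask, request, jsonify

app = Flask(__name__)
MESSAGES = []  # in-memory store; the real bus persists messages

@app.route('/api/send', methods=['POST'])
def send():
    msg = request.get_json()
    msg['id'] = len(MESSAGES) + 1  # monotonically increasing id for SSE cursors
    MESSAGES.append(msg)
    return jsonify({"id": msg['id']}), 201

@app.route('/api/inbox/<agent>')
def inbox(agent):
    # Everything addressed to this agent; the real bus also marks-as-read
    return jsonify([m for m in MESSAGES if m.get('to_agent') == agent])
```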

Step 1: Add SSE Endpoint to the Message Bus

We added /api/stream to the Flask app — a persistent connection that pushes new messages in real time:

```python
from flask import Response
import json, time

@app.route('/api/stream')
def stream():
    def generate():
        last_id = 0
        while True:
            # get_messages_after: our helper returning messages with id > last_id
            new_msgs = get_messages_after(last_id)
            for msg in new_msgs:
                yield f"data: {json.dumps(msg)}\n\n"
                last_id = msg['id']
            time.sleep(1)
    return Response(generate(), mimetype='text/event-stream')
```

Gotcha — Gunicorn worker count: We initially ran with 2 workers, which caused SSE subscribers to be spread across workers. A message arriving at worker 1 wouldn't reach a subscriber on worker 2. Switching to a single gevent worker fixed this.
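Concretely, the fix was an invocation along these lines (the `bus:app` module path and port are assumptions from our setup; the gevent worker class requires the `gevent` package):

```shell
# One async worker so every SSE subscriber lives in the same process.
gunicorn --worker-class gevent --workers 1 --bind 0.0.0.0:8091 bus:app
```

A single gevent worker handles many concurrent long-lived connections, which is exactly the SSE workload; multiple sync workers are the wrong shape for it.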

Step 2: bus-watcher.py on Each Node

A minimal Python script subscribes to the SSE stream and triggers cron.wake when a message arrives for a local agent:

```python
#!/usr/bin/env python3
"""SSE → cron.wake bridge"""
import urllib.request, json, subprocess

# Agents hosted on this node (placeholder names)
LOCAL_AGENTS = {"agent-a", "agent-b"}

def watch():
    url = "http://192.168.x.x:8091/api/stream"  # internal message bus
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # line-based reading; resp.read() would buffer
            line = line.decode().strip()
            if line.startswith("data:"):
                msg = json.loads(line[5:])
                if msg["to_agent"] in LOCAL_AGENTS:
                    subprocess.run([
                        "openclaw", "cron", "wake",
                        msg["to_agent"], "--mode=now"
                    ])

if __name__ == "__main__":
    watch()
```

Gotcha — urllib buffering: Using resp.read() buffered the stream and events didn't arrive in real time. Switching to readline()-based iteration (iterating over the response object directly) solved it.
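The line-based approach is easy to verify in isolation. This self-contained sketch runs the same `data:` parsing the watcher uses, with `io.BytesIO` standing in for the HTTP response object:

```python
import io, json

def parse_sse(stream):
    """Yield JSON payloads from the 'data:' lines of an SSE byte stream."""
    for raw in stream:  # line-by-line, exactly as the watcher iterates resp
        line = raw.decode().strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):])

# Fake stream: two SSE events, each terminated by a blank line
fake = io.BytesIO(
    b'data: {"id": 1, "to_agent": "researcher"}\n'
    b'\n'
    b'data: {"id": 2, "to_agent": "writer"}\n'
    b'\n'
)
events = list(parse_sse(fake))
```

Because the generator yields per line, each event is available the moment its line arrives, instead of after the connection closes.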

Step 3: systemd Service for Reliability

We deployed bus-watcher.service on every node for auto-start and auto-reconnect:

```ini
[Unit]
Description=Message Bus Watcher
After=network.target

[Service]
ExecStart=/usr/bin/python3 /path/to/bus-watcher.py
Restart=always
RestartSec=5

[Install]
WantedBy=default.target
```
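Rolling it out on a node is the usual systemd routine (paths assumed; adjust for your layout):

```shell
# Install, reload unit files, then start and enable in one step
sudo cp bus-watcher.service /etc/systemd/system/bus-watcher.service
sudo systemctl daemon-reload
sudo systemctl enable --now bus-watcher.service

# Confirm it's running; Restart=always + RestartSec=5 handles reconnects
systemctl status bus-watcher.service
```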

Deployed to all 7 nodes, tested with all 19 agents.

Results

| Metric | Before | After |
| --- | --- | --- |
| Message delivery latency | Up to 30 min | ~1 second |
| Additional infrastructure | None | SSE endpoint + lightweight watcher |
| CPU/Memory overhead | N/A | Nearly zero |
| New dependencies | N/A | None (stdlib only) |

Watching agents respond to each other in real time for the first time was genuinely exciting. Multiple agents firing off replies in rapid succession — it finally felt like a live agent network.

Key Takeaways

  1. Know sessions_send's limits: OpenClaw session injection is channel-aware. It's not a universal delivery mechanism.
  2. SSE is underrated: Far simpler than WebSockets for this use case, and more than sufficient.
  3. Gunicorn + SSE = watch your worker count: Single gevent worker is the right setup for SSE.
  4. urllib buffering bites: For streaming, always iterate line-by-line rather than calling read().
  5. cron.wake --mode=now is powerful: OpenClaw's hidden gem for instant agent activation without waiting for the next heartbeat.

Wrap-Up

You don't need Redis, RabbitMQ, or any heavy message queue to build real-time inter-agent communication. SSE + a few dozen lines of Python got the job done.

In a multi-agent system, communication latency defines the responsiveness of the whole network. The gap between 30 minutes and 1 second isn't just a performance metric — it's the difference between a batch system and a live collaborative agent team.


Tags: #OpenClaw #MultiAgent #SSE #Python #Infrastructure #RealTime #MessageBus #systemd
