Lars Winstand

Posted on May 21 • Originally published at standardcompute.com

I thought multi-agent orchestration meant agents should talk more — 2 Reddit threads convinced me the opposite is usually better


I used to assume the “advanced” version of multi-agent orchestration was obvious:

More agents. More channels. More back-and-forth.

If one GPT-5 agent is useful, then surely two GPT-5 agents debating in Discord is better. Add Claude for review, maybe another model for cleanup, and now you’ve got a tiny AI company running inside your workflow.

That sounds smart right up until you try to supervise it.

Then the agents start echoing each other. The transcript gets huge. Nobody knows which answer is final. You spend more time reading bot chatter than shipping output.

While researching OpenClaw workflows, I found two Reddit threads that changed my mind:

one about getting multiple OpenClaw agents to collaborate in Telegram
another about long-running OpenClaw workflows becoming hard to supervise

The useful takeaway was not “make agents chat better.”

It was: use structured handoffs, fresh review, and explicit checkpoints.

That is the pattern I’d recommend to anyone building AI agents in OpenClaw, n8n, Make, Zapier, or custom automations.

The Reddit comment that got it right

In a thread on r/openclaw, someone asked about making two OpenClaw agents collaborate in a Telegram group.

The best reply was basically this:

Works better if they don't actually chat in real time. Have Agent 1 write a structured note, then trigger Agent 2 to review it fresh with no shared conversation history.

That should be the default design for a lot of multi-agent systems.

Why? Because shared live context creates fast agreement, not necessarily good critique.

If Agent 2 sees the entire conversation, it tends to inherit Agent 1’s framing. It becomes a collaborator in the same mistake instead of a reviewer.
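Here is what "review it fresh" can look like in code. This is a minimal TypeScript sketch, assuming an OpenAI-style chat message format. `ReviewBrief`, `buildReviewerMessages`, and the prompt wording are hypothetical. The important part is what is missing: the worker's transcript is not an input at all.

```typescript
// Minimal sketch: the reviewer's context is built only from a short brief and
// the artifact. The worker's chat transcript is deliberately not a parameter,
// so it cannot leak into the review. All names here are hypothetical.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ReviewBrief {
  goal: string;
  assumptions: string[];
  openQuestions: string[];
}

function bulletList(items: string[]): string {
  return items.map((item) => `- ${item}`).join("\n");
}

function buildReviewerMessages(brief: ReviewBrief, artifact: string): ChatMessage[] {
  const system =
    "You are a skeptical reviewer. Check the artifact against the stated goal " +
    "and challenge every assumption instead of taking it on trust.";
  const user = [
    `Goal: ${brief.goal}`,
    `Assumptions:\n${bulletList(brief.assumptions)}`,
    `Open questions:\n${bulletList(brief.openQuestions)}`,
    `Artifact:\n${artifact}`,
  ].join("\n\n");
  // Two messages and no prior turns: the reviewer starts from a clean slate.
  return [
    { role: "system", content: system },
    { role: "user", content: user },
  ];
}

const reviewerMessages = buildReviewerMessages(
  {
    goal: "Generate a patch for the failing webhook retry logic",
    assumptions: ["network timeouts are retryable"],
    openQuestions: ["Should 408 be grouped with network errors?"],
  },
  "(patch text goes here)",
);
console.log(reviewerMessages.length); // 2
```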

OpenClaw already nudges you toward isolation

This is what made the advice click for me.

OpenClaw is not really designed around one giant immortal shared chat. A lot of its execution contexts are already isolated.

For example, OpenClaw documents session scoping like this:

{ "session": { "dmScope": "per-channel-peer" } }

And cron jobs start a fresh session per run.

That matters.

A fresh session means the next agent or next run does not inherit a giant blob of stale context by default. That is usually a feature, not a limitation.

OpenClaw also resets sessions daily: by default, a new session starts at 4:00 AM local time on the gateway host. Again, this is not a framework optimized for endless context accumulation.

It’s a framework that assumes boundaries are healthy.
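If you want the same kind of boundary in your own orchestration, the check is small. A minimal sketch, assuming a fixed daily reset hour in the host's local time (4:00 AM, matching the default above); `lastResetBoundary` and `needsFreshSession` are hypothetical helpers, not OpenClaw's implementation.

```typescript
// Sketch of a daily reset boundary, assuming resets happen at a fixed local
// hour. This mimics the idea only; it is not OpenClaw's actual code.

function lastResetBoundary(now: Date, resetHour = 4): Date {
  // Today's reset time in the host's local timezone.
  const boundary = new Date(now.getFullYear(), now.getMonth(), now.getDate(), resetHour, 0, 0, 0);
  // Before today's reset hour, the most recent boundary was yesterday's.
  if (now.getTime() < boundary.getTime()) {
    boundary.setDate(boundary.getDate() - 1);
  }
  return boundary;
}

function needsFreshSession(sessionStartedAt: Date, now: Date, resetHour = 4): boolean {
  // A session is stale if it started before the most recent reset boundary.
  return sessionStartedAt.getTime() < lastResetBoundary(now, resetHour).getTime();
}

// Local-time constructor: months are 0-based, so 4 means May.
const started = new Date(2025, 4, 20, 23, 30); // May 20, 11:30 PM
console.log(needsFreshSession(started, new Date(2025, 4, 21, 3, 59))); // false
console.log(needsFreshSession(started, new Date(2025, 4, 21, 4, 1))); // true
```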

Why real-time agent chat looks smart but performs dumb

Live agent chat feels like progress because there are lots of messages.

But a busy transcript is not the same thing as a reliable workflow.

The common failure modes are boring and predictable:

agents repeat each other
agents converge too early
weak assumptions get reinforced
supervision gets harder over time
the final artifact is less clear than the conversation that produced it

If your goal is critique, you want distance.

You want Agent 2 to arrive slightly skeptical, with a clean starting point.

That is why I think the supervisor/reviewer pattern beats free-form bot banter in most production workflows.

The real problem is drift

The second Reddit thread was about long-running OpenClaw workflows getting harder to supervise.

That is the real operational problem.

Not sociability. Drift.

If you’ve run serious agent workflows, this probably feels familiar:

one agent is writing code
another is summarizing research
another is running automations
one task failed quietly
one task is technically still running but no longer useful
you come back later and can’t tell which state is trustworthy

At that point, more chat is the last thing you need.

You need:

the latest approved artifact
the current task state
a reviewer that can compare output against something stable
bounded retries instead of endless loops

One commenter in that thread put it bluntly: drift happens quickly, and markdown notes alone are not enough.

I agree.

This is a checkpoints problem.
A verification problem.
A state management problem.

It is not a “put GPT-5 and Claude in Telegram and let them vibe” problem.
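One way to make "compare output against something stable" concrete is a checkpoint that stores a fingerprint of the last approved artifact. A minimal sketch under that assumption; the `Checkpoint` shape and the function names are hypothetical, and the hash is a toy stand-in for something like SHA-256.

```typescript
// Sketch of an explicit checkpoint: store a fingerprint of the last approved
// artifact, then verify later output against it instead of trusting chat
// history. The hash is djb2 (xor variant), a tiny non-cryptographic hash used
// only to keep this dependency-free; a real system would likely use SHA-256.

interface Checkpoint {
  taskId: string;
  approvedFingerprint: string;
  approvedAt: string; // ISO 8601 timestamp
}

function fingerprint(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    // hash * 33 XOR the next UTF-16 code unit, kept as an unsigned 32-bit value
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return hash.toString(16);
}

function makeCheckpoint(taskId: string, approvedArtifact: string, at: Date): Checkpoint {
  return { taskId, approvedFingerprint: fingerprint(approvedArtifact), approvedAt: at.toISOString() };
}

function matchesCheckpoint(checkpoint: Checkpoint, currentArtifact: string): boolean {
  return fingerprint(currentArtifact) === checkpoint.approvedFingerprint;
}

const checkpoint = makeCheckpoint("task-142", "patch v1", new Date());
console.log(matchesCheckpoint(checkpoint, "patch v1")); // true
console.log(matchesCheckpoint(checkpoint, "patch v2")); // false
```

A drifted artifact then shows up as a fingerprint mismatch, which is a fact a supervisor can act on, rather than a vibe someone has to notice in a transcript.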

What I’d build instead

If I were wiring up a production multi-agent workflow today, I’d use explicit artifacts and fresh sessions.

Something like this:

Worker agent does the first pass.
Worker agent writes a structured handoff note.
Reviewer agent gets the artifact plus the handoff note.
Reviewer agent does not get the full chat transcript unless absolutely necessary.
Reviewer approves, rejects, or requests one bounded revision.
A supervisor step checks output against a known-good state.

That handoff note should be boring and explicit.

For example:

{
  "goal": "Generate a patch for the failing webhook retry logic",
  "inputs": ["src/webhooks/retry.ts", "error logs from last 24h"],
  "assumptions": [
    "429s should back off exponentially",
    "network timeouts are retryable",
    "4xx validation errors are not retryable"
  ],
  "proposed_output": "Patch + tests + migration note",
  "open_questions": [
    "Should 408 be grouped with network errors?",
    "Do we cap retries at 5 or 7?"
  ],
  "failure_risks": [
    "Could duplicate webhook delivery on timeout edge cases",
    "May break current metrics labels"
  ]
}

That is much easier to review than 200 messages of agent chatter.
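A handoff note only helps if it is complete, so it is worth validating before the reviewer runs. A sketch, assuming the note arrives as parsed JSON; the type mirrors the example fields, while the validation rules and function names are hypothetical.

```typescript
// Sketch: a typed handoff note and a validator that rejects half-empty notes
// before a reviewer ever sees them. Field names mirror the JSON example; the
// specific rules (e.g. "at least one assumption") are assumptions.

interface HandoffNote {
  goal: string;
  inputs: string[];
  assumptions: string[];
  proposed_output: string;
  open_questions: string[];
  failure_risks: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

function validateHandoff(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return ["note must be a JSON object"];
  const note = raw as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of ["goal", "proposed_output"]) {
    const value = note[key];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`${key} must be a non-empty string`);
    }
  }
  for (const key of ["inputs", "assumptions", "open_questions", "failure_risks"]) {
    if (!isStringArray(note[key])) problems.push(`${key} must be an array of strings`);
  }
  // An empty assumptions list usually means the worker's framing is hidden.
  const assumptions = note["assumptions"];
  if (isStringArray(assumptions) && assumptions.length === 0) {
    problems.push("assumptions must not be empty");
  }
  return problems;
}

function isHandoffNote(raw: unknown): raw is HandoffNote {
  return validateHandoff(raw).length === 0;
}

const sampleNote: unknown = JSON.parse(
  '{"goal": "Generate a patch", "inputs": [], "assumptions": [], ' +
    '"proposed_output": "Patch + tests", "open_questions": [], "failure_risks": []}',
);
console.log(validateHandoff(sampleNote)); // flags the empty assumptions list
```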

A practical file-based pattern

OpenClaw users mentioned coordinating agents internally through things like session_send() and file-based handoffs across workspaces.

That makes sense to me.

A simple filesystem-based pattern is often enough:

/workspace
  /tasks
    task-142.json
  /artifacts
    task-142.patch
  /reviews
    task-142.handoff.json
    task-142.review.json
  /state
    approved.json

Example task file:

{
  "task_id": "task-142",
  "status": "review_pending",
  "owner": "worker-agent",
  "artifact": "artifacts/task-142.patch",
  "reviewer": "review-agent",
  "retry_count": 0
}
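The `status` field earns its keep when transitions are explicit. A sketch, assuming a small hypothetical set of statuses around `review_pending`; the transition table and the `transition` helper are illustrations, not an OpenClaw feature.

```typescript
// Sketch of explicit task state. Only "review_pending" comes from the post;
// the other statuses and the transition table are hypothetical.

type TaskStatus = "in_progress" | "review_pending" | "changes_requested" | "approved" | "failed";

interface TaskFile {
  task_id: string;
  status: TaskStatus;
  owner: string;
  artifact: string;
  reviewer: string;
  retry_count: number;
}

// Allowed next states. Anything else is rejected, so a task cannot jump
// straight from "in_progress" to "approved" without passing review.
const ALLOWED_NEXT: Record<TaskStatus, TaskStatus[]> = {
  in_progress: ["review_pending", "failed"],
  review_pending: ["approved", "changes_requested", "failed"],
  changes_requested: ["in_progress", "failed"],
  approved: [],
  failed: [],
};

function transition(task: TaskFile, next: TaskStatus): TaskFile {
  if (ALLOWED_NEXT[task.status].indexOf(next) === -1) {
    throw new Error(`${task.task_id}: illegal transition ${task.status} -> ${next}`);
  }
  // Going back to work after a review counts as one retry.
  const isRetry = task.status === "changes_requested" && next === "in_progress";
  // Return a new object so the previous state can still be logged or diffed.
  return { ...task, status: next, retry_count: isRetry ? task.retry_count + 1 : task.retry_count };
}

const task142: TaskFile = {
  task_id: "task-142",
  status: "review_pending",
  owner: "worker-agent",
  artifact: "artifacts/task-142.patch",
  reviewer: "review-agent",
  retry_count: 0,
};
console.log(transition(task142, "changes_requested").status); // changes_requested
```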

Example review output:

{
  "task_id": "task-142",
  "decision": "changes_requested",
  "issues": [
    "Missing test for timeout retry path",
    "Backoff jitter not applied"
  ],
  "approved_artifact": null,
  "next_action": "worker-revise"
}

That is orchestration.

Not a room full of bots pretending to be coworkers.
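Before a supervisor acts on a review, it can check that the review is internally consistent. A sketch under assumed rules: only `changes_requested` comes from the example above, while `approved`, `rejected`, and the `reviewProblems` helper are hypothetical.

```typescript
// Sketch: sanity-check a review file before the supervisor acts on it.
// The decision values other than "changes_requested" and the consistency
// rules below are assumptions.

type Decision = "approved" | "changes_requested" | "rejected";

interface Review {
  task_id: string;
  decision: Decision;
  issues: string[];
  approved_artifact: string | null;
  next_action: string;
}

function reviewProblems(review: Review): string[] {
  const problems: string[] = [];
  if (review.decision === "approved" && review.approved_artifact === null) {
    problems.push("an approved review must name the approved artifact");
  }
  if (review.decision !== "approved" && review.approved_artifact !== null) {
    problems.push("only an approved review may set approved_artifact");
  }
  if (review.decision === "changes_requested" && review.issues.length === 0) {
    problems.push("changes_requested needs at least one concrete issue");
  }
  return problems;
}

const review142: Review = {
  task_id: "task-142",
  decision: "changes_requested",
  issues: ["Missing test for timeout retry path", "Backoff jitter not applied"],
  approved_artifact: null,
  next_action: "worker-revise",
};
console.log(reviewProblems(review142).length); // 0
```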

When real-time agent chat is actually useful

I’m not saying agent-to-agent chat is always bad.

There are a few cases where it makes sense.

1. Brainstorming

If you want divergent ideas fast, shared chat can help.

For example: product naming, rough architecture exploration, or generating lots of candidate approaches.

2. Human-adjacent workflows

If agents need to operate in Discord or Telegram because humans are already there, fine. That can be useful at the edges.

But I still would not make that the core execution layer.

3. Demos

A room full of agents talking looks impressive.

It demos well.

It also tends to be much less reliable than a boring artifact pipeline.

Token cost is not the only cost

This part matters for anyone running automations at scale.

Shared channels do not just increase noise. They increase consumption.

If agents are watching entire chat threads, they ingest irrelevant context constantly.

That means more tokens, more latency, and more supervision overhead.

For teams running n8n, Make, Zapier, OpenClaw, or custom agent systems, this is where pricing starts to matter a lot.

If every workflow turns into agents repeatedly reading giant transcripts, per-token billing gets painful fast.
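A rough model shows why transcript reading gets expensive. This sketch assumes equal-sized messages and that each turn in a shared channel re-reads everything posted before it; the token counts are illustrative inputs, not measurements.

```typescript
// Back-of-the-envelope model, not a measurement. Assumptions: every chat
// message has the same size, and in a shared channel each turn re-reads the
// whole transcript so far. A handoff step reads one fixed-size note + artifact.

function sharedChatTokens(turns: number, tokensPerMessage: number): number {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    total += (turn - 1) * tokensPerMessage; // turn k reads the k - 1 earlier messages
  }
  return total; // equals tokensPerMessage * turns * (turns - 1) / 2
}

function handoffTokens(steps: number, tokensPerHandoff: number): number {
  return steps * tokensPerHandoff; // linear: context does not accumulate
}

for (const n of [20, 40]) {
  console.log(n, sharedChatTokens(n, 500), handoffTokens(n, 2000));
}
// 20 95000 40000
// 40 390000 80000
```

Doubling the number of turns roughly quadruples what the shared channel ingests (95,000 to 390,000 tokens here), while the handoff pipeline only doubles, because re-reading grows with the square of the conversation length.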

That’s one reason I think flat-rate infrastructure is a better fit for serious automation work. If your agents are running all day, you do not want your architecture decisions distorted by token anxiety.

Standard Compute is interesting here because it gives you an OpenAI-compatible API with flat monthly pricing instead of per-token billing. So if you’re experimenting with reviewer loops, supervisor agents, retries, or multi-step automations, you can optimize for reliability first instead of constantly asking whether each extra pass is too expensive.

That matters even more when you’re routing across different models for different roles.

For example:

GPT-5.4 for implementation
Claude Opus 4.6 for review
Grok 4.20 for alternate reasoning or edge-case checks

That kind of setup is useful, but only if the cost model doesn’t punish every extra step.
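Role-based routing can stay as boring as a lookup table. A sketch that builds OpenAI-style chat completions request bodies; the model ID strings are placeholders based on the names above (the exact identifiers a provider accepts may differ), and `buildRequest` is hypothetical. Sending the HTTP request is left out.

```typescript
// Sketch: map each pipeline role to a model and build an OpenAI-style chat
// completions request body. The model IDs are placeholders derived from the
// names in the post; check your provider's model list for the real strings.

type AgentRole = "implementation" | "review" | "edge_cases";

const MODEL_FOR_ROLE: Record<AgentRole, string> = {
  implementation: "gpt-5.4", // placeholder ID
  review: "claude-opus-4.6", // placeholder ID
  edge_cases: "grok-4.20", // placeholder ID
};

interface ChatCompletionRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
}

function buildRequest(role: AgentRole, instructions: string, input: string): ChatCompletionRequest {
  // Each call is a fresh two-message conversation; no shared transcript.
  return {
    model: MODEL_FOR_ROLE[role],
    messages: [
      { role: "system", content: instructions },
      { role: "user", content: input },
    ],
  };
}

const reviewRequest = buildRequest(
  "review",
  "You are a strict code reviewer.",
  "Review artifacts/task-142.patch against reviews/task-142.handoff.json",
);
console.log(JSON.stringify(reviewRequest, null, 2));
```

Keeping the mapping in one table means swapping the reviewer model is a one-line change, and every role still gets an isolated conversation.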

A simple reviewer loop you can actually use

Here’s a minimal pattern I’d trust more than live bot chat.

Step 1: generate artifact

node worker.js --task tasks/task-142.json > artifacts/task-142.patch

Step 2: create handoff note

node create-handoff.js \
  --task tasks/task-142.json \
  --artifact artifacts/task-142.patch \
  > reviews/task-142.handoff.json

Step 3: run fresh review

node review.js \
  --handoff reviews/task-142.handoff.json \
  --artifact artifacts/task-142.patch \
  > reviews/task-142.review.json

Step 4: enforce bounded retry

node supervisor.js --task tasks/task-142.json --review reviews/task-142.review.json

Supervisor logic should be strict:

if approved, publish artifact
if changes requested and retry_count < max_retries, revise once
otherwise fail closed and escalate

That gives you a workflow you can inspect later.
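The supervisor rules above fit in one small function. A sketch, assuming the review decision arrives as a string and that `max_retries` defaults to one revision; the action shapes and the `supervise` name are hypothetical.

```typescript
// Sketch of the strict supervisor rules. The default of one revision matches
// "revise once"; the action shapes are hypothetical. Anything unexpected
// fails closed and escalates instead of looping.

type SupervisorAction =
  | { kind: "publish"; artifact: string }
  | { kind: "revise"; nextRetryCount: number }
  | { kind: "escalate"; reason: string };

function supervise(
  decision: string, // e.g. "approved" or "changes_requested" from the review file
  artifact: string,
  retryCount: number,
  maxRetries = 1,
): SupervisorAction {
  if (decision === "approved") {
    return { kind: "publish", artifact };
  }
  if (decision === "changes_requested" && retryCount < maxRetries) {
    return { kind: "revise", nextRetryCount: retryCount + 1 };
  }
  // Fail closed: retries exhausted, outright rejection, or an unknown decision.
  return { kind: "escalate", reason: `decision=${decision}, retries=${retryCount}/${maxRetries}` };
}

const patchPath = "artifacts/task-142.patch";
console.log(supervise("changes_requested", patchPath, 0).kind); // revise
console.log(supervise("changes_requested", patchPath, 1).kind); // escalate
console.log(supervise("approved", patchPath, 1).kind); // publish
```

Treating any unrecognized decision as an escalation is the "fail closed" part: a typo in a review file stops the pipeline instead of quietly looping.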

Quick comparison

| Approach | What actually happens |
| --- | --- |
| Real-time agent chat in Discord or Telegram | Shared live history, faster convergence, more supervision overhead, more irrelevant context |
| Structured reviewer handoff | Clear artifact, fresh review context, better critique, easier audit trail |
| File-based or session-based coordination in OpenClaw | Lower platform friction, deterministic state, easier retries and checkpointing |

My default rule now

If the job needs reliability, don’t start with agent chat.

Start with:

explicit artifacts
structured handoff notes
isolated review passes
checkpointed state
bounded retries

That pattern is less flashy than a multi-agent group chat.

It is also much easier to operate.

The best multi-agent systems I can imagine do not act like a group chat.
They act more like a newsroom or a code review pipeline.

One agent files a draft.
Another reviews it fresh.
A supervisor checks whether it meets the bar.
The result gets approved, rejected, or revised with a clear paper trail.

That is the design lesson I’d steal from those OpenClaw threads.

Don’t optimize for sociable agents.
Optimize for legible handoffs and clean review.

Because the common failure mode in multi-agent work is not silence.

It’s two agents confidently talking each other into the same mistake.
