I used to assume the “advanced” version of multi-agent orchestration was obvious:
More agents. More channels. More back-and-forth.
If one GPT-5 agent is useful, then surely two GPT-5 agents debating in Discord is better. Add Claude for review, maybe another model for cleanup, and now you’ve got a tiny AI company running inside your workflow.
That sounds smart right up until you try to supervise it.
Then the agents start echoing each other. The transcript gets huge. Nobody knows which answer is final. You spend more time reading bot chatter than shipping output.
While researching OpenClaw workflows, I found two Reddit threads that changed my mind:
- one about getting multiple OpenClaw agents to collaborate in Telegram
- another about long-running OpenClaw workflows becoming hard to supervise
The useful takeaway was not “make agents chat better.”
It was: use structured handoffs, fresh review, and explicit checkpoints.
That is the pattern I’d recommend to anyone building AI agents in OpenClaw, n8n, Make, Zapier, or custom automations.
The Reddit comment that got it right
In a thread on r/openclaw, someone asked about making two OpenClaw agents collaborate in a Telegram group.
The best reply was basically this:
Works better if they don't actually chat in real time. Have Agent 1 write a structured note, then trigger Agent 2 to review it fresh with no shared conversation history.
That should be the default design for a lot of multi-agent systems.
Why? Because shared live context creates fast agreement, not necessarily good critique.
If Agent 2 sees the entire conversation, it tends to inherit Agent 1’s framing. It becomes a collaborator in the same mistake instead of a reviewer.
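This is a minimal sketch of what "review it fresh with no shared conversation history" can look like in code. It assumes an OpenAI-style `messages` array. `buildReviewerMessages` and the prompt text are hypothetical and are not an OpenClaw API. The important part is the signature: the worker's transcript is not a parameter, so it cannot leak into the review.

```javascript
// Hypothetical helper: build the reviewer's input from the handoff note and the
// artifact only. The worker's chat transcript is deliberately not a parameter,
// so the reviewer always starts from a clean context.
function buildReviewerMessages(handoffNote, artifactText) {
  return [
    {
      role: "system",
      content:
        "You are a skeptical reviewer. You have not seen how this artifact was produced. " +
        "Check it against the stated goal and assumptions. " +
        "Respond with approved, rejected, or changes_requested, plus specific issues.",
    },
    {
      role: "user",
      content:
        "Handoff note:\n" + JSON.stringify(handoffNote, null, 2) +
        "\n\nArtifact:\n" + artifactText,
    },
  ];
}

// The worker's conversation exists, but nothing passes it to the reviewer.
const workerTranscript = ["...200 messages of back-and-forth..."];
const messages = buildReviewerMessages(
  { goal: "Generate a patch for the failing webhook retry logic" },
  "diff --git a/src/webhooks/retry.ts b/src/webhooks/retry.ts"
);
console.log(messages.length); // 2
```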
OpenClaw already nudges you toward isolation
This is what made the advice click for me.
OpenClaw is not really designed around one giant immortal shared chat. A lot of its execution contexts are already isolated.
For example, OpenClaw documents session scoping like this:
```json
{ "session": { "dmScope": "per-channel-peer" } }
```
And cron jobs start a fresh session per run.
That matters.
A fresh session means the next agent or next run does not inherit a giant blob of stale context by default. That is usually a feature, not a limitation.
OpenClaw also resets sessions daily. By default, a new session starts at 4:00 AM local time on the gateway host. Again, this is not a framework optimized for endless context accumulation.
It’s a framework that assumes boundaries are healthy.
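Here is a toy model of why a fixed reset hour creates natural boundaries. It is not OpenClaw's implementation. It assumes a session counts as stale once the most recent 4:00 AM boundary (host local time) falls after the session's last activity. `lastResetBoundary` and `isSessionStale` are hypothetical names.

```javascript
// Toy model (an assumption, not OpenClaw's code): a session is stale once a daily
// reset boundary (default 4:00 AM, host local time) has passed since its last activity.
const RESET_HOUR = 4;

// Most recent reset boundary at or before `now`, in local time.
function lastResetBoundary(now, resetHour = RESET_HOUR) {
  const boundary = new Date(now);
  boundary.setHours(resetHour, 0, 0, 0);
  if (boundary > now) {
    // Today's reset has not happened yet, so the latest boundary was yesterday.
    boundary.setDate(boundary.getDate() - 1);
  }
  return boundary;
}

function isSessionStale(lastActivity, now, resetHour = RESET_HOUR) {
  return lastActivity < lastResetBoundary(now, resetHour);
}

// Last activity at 11 PM, checked at 9 AM the next day: the 4 AM boundary was crossed.
const lastActivity = new Date(2025, 0, 10, 23, 0);
const now = new Date(2025, 0, 11, 9, 0);
console.log(isSessionStale(lastActivity, now)); // true
```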
Why real-time agent chat looks smart but performs dumb
Live agent chat feels like progress because there are lots of messages.
But a busy transcript is not the same thing as a reliable workflow.
The common failure modes are boring and predictable:
- agents repeat each other
- agents converge too early
- weak assumptions get reinforced
- supervision gets harder over time
- the final artifact is less clear than the conversation that produced it
If your goal is critique, you want distance.
You want Agent 2 to arrive slightly skeptical, with a clean starting point.
That is why I think the supervisor/reviewer pattern beats free-form bot banter in most production workflows.
The real problem is drift
The second Reddit thread was about long-running OpenClaw workflows getting harder to supervise.
That is the real operational problem.
Not sociability. Drift.
If you’ve run serious agent workflows, this probably feels familiar:
- one agent is writing code
- another is summarizing research
- another is running automations
- one task failed quietly
- one task is technically still running but no longer useful
- you come back later and can’t tell which state is trustworthy
At that point, more chat is the last thing you need.
You need:
- the latest approved artifact
- the current task state
- a reviewer that can compare output against something stable
- bounded retries instead of endless loops
One commenter in that thread put it bluntly: drift happens quickly, and markdown notes alone are not enough.
I agree.
This is a checkpoints problem.
A verification problem.
A state management problem.
It is not a “put GPT-5 and Claude in Telegram and let them vibe” problem.
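Treating this as a state problem can start with something as small as an explicit table of legal status transitions, so a task cannot silently jump to a state it should not reach. This is a minimal sketch. The status names are hypothetical, except `review_pending`, which mirrors the example task file later in this post.

```javascript
// Hypothetical task states and the transitions allowed out of each one.
const TRANSITIONS = {
  pending: ["in_progress"],
  in_progress: ["review_pending", "failed"],
  review_pending: ["approved", "changes_requested", "failed"],
  changes_requested: ["in_progress", "escalated"],
  approved: [],  // terminal: this is the latest approved artifact
  failed: ["escalated"],
  escalated: [], // terminal: a human takes over
};

function transition(task, nextStatus) {
  const allowed = TRANSITIONS[task.status] || [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(`illegal transition ${task.status} -> ${nextStatus}`);
  }
  // Return a new object so earlier states can be kept as an audit trail.
  return { ...task, status: nextStatus };
}

let task = { task_id: "task-142", status: "pending" };
task = transition(task, "in_progress");
task = transition(task, "review_pending");
console.log(task.status); // "review_pending"
// transition(task, "pending") would throw: a task under review cannot silently restart.
```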
What I’d build instead
If I were wiring up a production multi-agent workflow today, I’d use explicit artifacts and fresh sessions.
Something like this:
- Worker agent does the first pass.
- Worker agent writes a structured handoff note.
- Reviewer agent gets the artifact plus the handoff note.
- Reviewer agent does not get the full chat transcript unless absolutely necessary.
- Reviewer approves, rejects, or requests one bounded revision.
- A supervisor step checks output against a known-good state.
That handoff note should be boring and explicit.
For example:
```json
{
  "goal": "Generate a patch for the failing webhook retry logic",
  "inputs": ["src/webhooks/retry.ts", "error logs from last 24h"],
  "assumptions": [
    "429s should back off exponentially",
    "network timeouts are retryable",
    "4xx validation errors are not retryable"
  ],
  "proposed_output": "Patch + tests + migration note",
  "open_questions": [
    "Should 408 be grouped with network errors?",
    "Do we cap retries at 5 or 7?"
  ],
  "failure_risks": [
    "Could duplicate webhook delivery on timeout edge cases",
    "May break current metrics labels"
  ]
}
```
That is much easier to review than 200 messages of agent chatter.
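Because the note is structured, the pipeline can reject a malformed one before any reviewer sees it. This is a minimal sketch. The field names come from the example note above, and the validation rules themselves are assumptions.

```javascript
// Minimal structural check for a handoff note shaped like the example above.
const REQUIRED_STRINGS = ["goal", "proposed_output"];
const REQUIRED_LISTS = ["inputs", "assumptions", "open_questions", "failure_risks"];

function validateHandoff(note) {
  const problems = [];
  for (const key of REQUIRED_STRINGS) {
    if (typeof note[key] !== "string" || note[key].trim() === "") {
      problems.push(`${key} must be a non-empty string`);
    }
  }
  for (const key of REQUIRED_LISTS) {
    if (!Array.isArray(note[key])) {
      problems.push(`${key} must be an array`);
    }
  }
  // A note with no inputs gives the reviewer nothing concrete to check against.
  if (Array.isArray(note.inputs) && note.inputs.length === 0) {
    problems.push("inputs must not be empty");
  }
  return problems;
}

const note = {
  goal: "Generate a patch for the failing webhook retry logic",
  inputs: ["src/webhooks/retry.ts"],
  assumptions: ["429s should back off exponentially"],
  proposed_output: "Patch + tests + migration note",
  open_questions: [],
  failure_risks: [],
};
console.log(validateHandoff(note)); // []
console.log(validateHandoff({ goal: "" }).length); // 6
```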
A practical file-based pattern
OpenClaw users mentioned internal coordination through things like session_send() and file-based handoffs across workspaces.
That makes sense to me.
A simple filesystem-based pattern is often enough:
```text
/workspace
  /tasks
    task-142.json
  /artifacts
    task-142.patch
  /reviews
    task-142.handoff.json
    task-142.review.json
  /state
    approved.json
```
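The `state/approved.json` file is a natural place for a known-good checkpoint. One way to make "checks output against a known-good state" concrete is to record a SHA-256 fingerprint of the approved artifact and verify it before publishing. This is a minimal Node.js sketch. The record shape and function names are hypothetical.

```javascript
const { createHash } = require("node:crypto");

// Fingerprint an artifact so later steps can prove they are handling
// exactly the bytes that were reviewed.
function fingerprint(content) {
  return createHash("sha256").update(content).digest("hex");
}

// Hypothetical checkpoint record, e.g. what state/approved.json could contain.
function makeCheckpoint(taskId, content) {
  return { task_id: taskId, sha256: fingerprint(content), approved_at: new Date().toISOString() };
}

// Before publishing, confirm the artifact has not drifted since approval.
function matchesCheckpoint(checkpoint, content) {
  return checkpoint.sha256 === fingerprint(content);
}

const checkpoint = makeCheckpoint("task-142", "patch v2");
console.log(matchesCheckpoint(checkpoint, "patch v2"));        // true
console.log(matchesCheckpoint(checkpoint, "patch v2 + edit")); // false
```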
Example task file:
```json
{
  "task_id": "task-142",
  "status": "review_pending",
  "owner": "worker-agent",
  "artifact": "artifacts/task-142.patch",
  "reviewer": "review-agent",
  "retry_count": 0
}
```
Example review output:
```json
{
  "task_id": "task-142",
  "decision": "changes_requested",
  "issues": [
    "Missing test for timeout retry path",
    "Backoff jitter not applied"
  ],
  "approved_artifact": null,
  "next_action": "worker-revise"
}
```
That is orchestration.
Not a room full of bots pretending to be coworkers.
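File handoffs have one sharp edge: a reader can pick up a file while the writer is still halfway through it. A common fix is to write to a temporary file and rename it into place. This is a Node.js sketch that assumes the agents share a local filesystem. `writeJsonAtomic` is a hypothetical helper, not part of OpenClaw.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Write JSON so a reader polling the directory never sees a half-written file:
// write to a temp file in the same directory, then rename it over the target
// (a rename within one filesystem replaces the target atomically on POSIX).
function writeJsonAtomic(filePath, data) {
  const tmp = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(data, null, 2));
  fs.renameSync(tmp, filePath);
}

function readJson(filePath) {
  return JSON.parse(fs.readFileSync(filePath, "utf8"));
}

// Usage against a throwaway workspace that mirrors the layout above.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "workspace-"));
fs.mkdirSync(path.join(root, "reviews"));
const reviewPath = path.join(root, "reviews", "task-142.review.json");
writeJsonAtomic(reviewPath, { task_id: "task-142", decision: "changes_requested" });
console.log(readJson(reviewPath).decision); // "changes_requested"
```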
When real-time agent chat is actually useful
I’m not saying agent-to-agent chat is always bad.
There are a few cases where it makes sense.
1. Brainstorming
If you want divergent ideas fast, shared chat can help.
For example: product naming, rough architecture exploration, or generating lots of candidate approaches.
2. Human-adjacent workflows
If agents need to operate in Discord or Telegram because humans are already there, fine. That can be useful at the edges.
But I still would not make that the core execution layer.
3. Demos
A room full of agents talking looks impressive.
It demos well.
It also tends to be much less reliable than a boring artifact pipeline.
Token cost is not the only cost
This part matters for anyone running automations at scale.
Shared channels do not just increase noise. They increase consumption.
If agents are watching entire chat threads, they ingest irrelevant context constantly.
That means more tokens, more latency, and more supervision overhead.
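A rough back-of-the-envelope model shows why this compounds. Suppose every agent re-reads the whole transcript on each turn and each message is about m tokens. Then one agent ingests m·(1 + 2 + … + T) = m·T(T+1)/2 tokens over T turns, which grows quadratically. A handoff pass reads only a note and an artifact. All numbers below are made up for illustration.

```javascript
// Shared channel: on turn t the transcript holds t messages, and every agent reads it.
function sharedChatInputTokens({ agents, turns, tokensPerMessage }) {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    total += agents * turn * tokensPerMessage;
  }
  return total;
}

// Handoff pipeline: each review pass reads only the note plus the artifact.
function handoffInputTokens({ reviewPasses, noteTokens, artifactTokens }) {
  return reviewPasses * (noteTokens + artifactTokens);
}

// Hypothetical workload: 2 agents, 200 messages of ~150 tokens each,
// versus 2 review passes over a ~400-token note and ~3,000-token patch.
console.log(sharedChatInputTokens({ agents: 2, turns: 200, tokensPerMessage: 150 })); // 6030000
console.log(handoffInputTokens({ reviewPasses: 2, noteTokens: 400, artifactTokens: 3000 })); // 6800
```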
For teams running n8n, Make, Zapier, OpenClaw, or custom agent systems, this is where pricing starts to matter a lot.
If every workflow turns into agents repeatedly reading giant transcripts, per-token billing gets painful fast.
That’s one reason I think flat-rate infrastructure is a better fit for serious automation work. If your agents are running all day, you do not want your architecture decisions distorted by token anxiety.
Standard Compute is interesting here because it gives you an OpenAI-compatible API with flat monthly pricing instead of per-token billing. So if you’re experimenting with reviewer loops, supervisor agents, retries, or multi-step automations, you can optimize for reliability first instead of constantly asking whether each extra pass is too expensive.
That matters even more when you’re routing across different models for different roles.
For example:
- GPT-5.4 for implementation
- Claude Opus 4.6 for review
- Grok 4.20 for alternate reasoning or edge-case checks
That kind of setup is useful, but only if the cost model doesn’t punish every extra step.
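With an OpenAI-compatible API, routing by role can be as simple as a lookup table that fills in the `model` field of a chat-completions request body (`{ model, messages }`). This is a sketch only. The model identifier strings are placeholders for whatever your provider actually exposes, and sending the request to the provider's chat-completions endpoint is left out.

```javascript
// Hypothetical role-to-model routing table; the identifiers are placeholders.
const MODEL_BY_ROLE = {
  worker: "gpt-5.4",
  reviewer: "claude-opus-4.6",
  edge_case_checker: "grok-4.20",
};

// Build a chat-completions-style request body for one role.
// Each call gets its own messages array, so roles never share context.
function buildRequest(role, messages) {
  const model = MODEL_BY_ROLE[role];
  if (!model) throw new Error(`no model configured for role: ${role}`);
  return { model, messages };
}

const body = buildRequest("reviewer", [
  { role: "user", content: "Review this patch against the handoff note." },
]);
console.log(body.model); // "claude-opus-4.6"
```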
A simple reviewer loop you can actually use
Here’s a minimal pattern I’d trust more than live bot chat.
Step 1: generate artifact
```bash
node worker.js --task tasks/task-142.json > artifacts/task-142.patch
```
Step 2: create handoff note
```bash
node create-handoff.js \
  --task tasks/task-142.json \
  --artifact artifacts/task-142.patch \
  > reviews/task-142.handoff.json
```
Step 3: run fresh review
```bash
node review.js \
  --handoff reviews/task-142.handoff.json \
  --artifact artifacts/task-142.patch \
  > reviews/task-142.review.json
```
Step 4: enforce bounded retry
```bash
node supervisor.js --task tasks/task-142.json --review reviews/task-142.review.json
```
Supervisor logic should be strict:
- if approved, publish artifact
- if changes requested and retry_count < max_retries, revise once
- otherwise fail closed and escalate
That gives you a workflow you can inspect later.
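The supervisor rules above fit in one pure function, which makes them easy to test. This is a minimal sketch of what `supervisor.js` might decide. The field names follow the example task and review files. `MAX_RETRIES = 1` (matching "revise once"), the `"approved"` decision value, and the action names are assumptions.

```javascript
const MAX_RETRIES = 1;

function decide(task, review, maxRetries = MAX_RETRIES) {
  if (review.task_id !== task.task_id) {
    // Never act on a review that belongs to a different task.
    return { action: "escalate", reason: "task_id mismatch", task: { ...task, status: "escalated" } };
  }
  if (review.decision === "approved") {
    return { action: "publish", task: { ...task, status: "approved" } };
  }
  if (review.decision === "changes_requested" && task.retry_count < maxRetries) {
    return {
      action: "revise",
      task: { ...task, status: "in_progress", retry_count: task.retry_count + 1 },
    };
  }
  // Rejected, retries exhausted, or an unrecognized decision: fail closed.
  return { action: "escalate", task: { ...task, status: "escalated" } };
}

const task = { task_id: "task-142", status: "review_pending", retry_count: 0 };
const review = { task_id: "task-142", decision: "changes_requested" };
console.log(decide(task, review).action); // "revise"
console.log(decide({ ...task, retry_count: 1 }, review).action); // "escalate"
```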
Quick comparison
| Approach | What actually happens |
|---|---|
| Real-time agent chat in Discord or Telegram | Shared live history, faster convergence, more supervision overhead, more irrelevant context |
| Structured reviewer handoff | Clear artifact, fresh review context, better critique, easier audit trail |
| File-based or session-based coordination in OpenClaw | Lower platform friction, explicit and inspectable state, easier retries and checkpointing |
My default rule now
If the job needs reliability, don’t start with agent chat.
Start with:
- explicit artifacts
- structured handoff notes
- isolated review passes
- checkpointed state
- bounded retries
That pattern is less flashy than a multi-agent group chat.
It is also much easier to operate.
The best multi-agent systems I can imagine do not act like a group chat.
They act more like a newsroom or a code review pipeline.
One agent files a draft.
Another reviews it fresh.
A supervisor checks whether it meets the bar.
The result gets approved, rejected, or revised with a clear paper trail.
That is the design lesson I’d steal from those OpenClaw threads.
Don’t optimize for sociable agents.
Optimize for legible handoffs and clean review.
Because the common failure mode in multi-agent work is not silence.
It’s two agents confidently talking each other into the same mistake.