
Behram

I Built a 5-Agent AI Collaborative Operating System with OpenClaw: A Full Technical Breakdown!!!

I spent a significant amount of time transforming OpenClaw from a single assistant into a multi-role collaborative operating system. This isn't just "running a few bots that chat independently."

5 AI roles share a single gateway, operate across Discord and Telegram channels, and have clear divisions of labor, routing, memory isolation, and collaboration rules. They work together like a relay team.

In this post, I am breaking down the entire building process, the design decisions at every layer, specific configurations, and the pitfalls I encountered.

If you are using OpenClaw or are interested in how to make multiple AI agents truly collaborate, this guide should help you avoid many detours.

The Conclusion: This is an "Agent OS" under a Single Gateway, not just "Multiple Bots"

When people hear "5 AI roles," their first reaction is often: "You're just running 5 independent bots, right?"

Yes and no. My architecture is designed like this:

  • 1 Gateway Process: Unifies all channel access and routing.
  • 5 Independent Agents: Commander, Strategist, Engineer, Creator, and Think Tank.
  • Independent Workspaces: Each agent has its own isolated workspace (personality, rules, memory, and sessions are all separated).
  • Dual-Channel (Discord + Telegram): They run on both simultaneously. I use Discord as the primary workspace, using "bindings" to precisely distribute messages.
  • Private vs. Group Chat: They use completely different mechanisms for DMs and group interactions.

Analogy: This isn't just hiring 5 people and throwing them in a room to do whatever they want. This is building a company—with an organizational structure, job descriptions, communication protocols, independent offices, and meeting rules.

OpenClaw itself is an open-source personal AI assistant framework supporting multiple platforms (Discord, Telegram, WhatsApp, etc.) and models (Claude, GPT, Gemini, etc.), with data stored 100% locally.

Its multi-agent capabilities are the core reason I chose it—native support for independent agent workspaces and "bindings" routing allowed me to build a true collaborative system on top of it.

I. Architecture: Single Gateway + Multi-Agent + Multi-Workspace + Multi-Channel

Let's discuss the foundational architectural decisions.

1) Unified Hosting via Single Gateway

Currently, one OpenClaw Gateway process carries everything: message access, routing, session management, tool calling, memory indexing, and state management.

Why not run a separate service for each role? Three reasons:

  • Centralized Maintenance: I only maintain one Gateway instead of five independent services.
  • Unified Configuration: One master configuration manages global strategy, making monitoring and troubleshooting easier.
  • Collaboration Foundation: For roles to collaborate, they must be in the same runtime for efficient communication.

2) 5 Parallel Agents (Not "Loose" Bots)

My five fixed roles:

  • Commander: Global situational awareness, task decomposition, assignment, correction, and closing.
  • Strategist: Strategic analysis, scheme evaluation, and risk prediction.
  • Engineer: Technical execution, code implementation, and system maintenance.
  • Creator: Content creation, expression optimization, and external output.
  • Think Tank: Knowledge auditing, quality control, and compliance checks.

Every agent has its own workspace (e.g., workspace-engineer, workspace-strategist). Personality files, rule files, memory files, and script assets are all independent and never "pollute" each other.

3) Multi-Channel Access: Discord + Telegram

The same Gateway connects to Discord and Telegram simultaneously. Each role has "accountId" level bindings on both channels. You could use this same config to connect to Feishu, WeChat, etc.

This isn't "redundant deployment" across platforms; it's "one brain cluster, different access layers." I have configured Discord as the primary collaborative battlefield.

If you want multiple agents to cooperate in a group, Discord is the best choice. I've tried other platforms, and none fit this use case as well.

II. Routing Layer: Using "Bindings" to Map "Accounts" to "Roles"


This is the entry logic of the entire system. I use an explicit binding strategy: channel + accountId -> agentId.

Specifically:

  • discord + account_commander -> commander
  • discord + account_engineer -> engineer
  • telegram + account_creator -> creator
  • ...Totaling 10 mappings (5 roles × 2 channels).

Why do this? Because the system decides "who should handle this message" at the entry layer, rather than letting all agents hear it and scramble to answer. If this step isn't done well, all subsequent collaboration becomes chaotic.

Bindings act as the "triage desk" of the system. When a message comes in, the system checks the channel and account ID and routes it directly to the correct role.
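The triage logic above can be sketched in a few lines. This is an illustrative Python model, not OpenClaw's actual config schema; the account IDs for the Strategist and Think Tank are my own placeholders, since the post only names three of them.

```python
# Explicit binding table: (channel, accountId) -> agentId.
# Account IDs beyond the three named in the post are hypothetical.
BINDINGS = {
    ("discord", "account_commander"): "commander",
    ("discord", "account_strategist"): "strategist",
    ("discord", "account_engineer"): "engineer",
    ("discord", "account_creator"): "creator",
    ("discord", "account_thinktank"): "thinktank",
    ("telegram", "account_commander"): "commander",
    ("telegram", "account_strategist"): "strategist",
    ("telegram", "account_engineer"): "engineer",
    ("telegram", "account_creator"): "creator",
    ("telegram", "account_thinktank"): "thinktank",
}

def route(channel: str, account_id: str) -> str:
    """Triage desk: resolve an incoming message to exactly one agent."""
    agent = BINDINGS.get((channel, account_id))
    if agent is None:
        raise LookupError(f"no binding for {channel}/{account_id}")
    return agent
```

The point of the explicit table is that routing is decided before any model runs: a message either maps to exactly one role or is rejected, so agents never "scramble to answer."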

III. Session Isolation: No Mixing in DMs, No Mess in Groups


This is one of the most critical engineering points in my system. The core configuration is: session.dmScope = per-account-channel-peer.

This parameter means private chat context is isolated by three dimensions: "Account + Channel + Peer User."

Why choose this?

  • If the same person contacts the same role via Discord and Telegram, the contexts won't mix.
  • If different users contact the same role, their contexts are completely isolated.
  • In multi-agent/multi-account scenarios, the risk of "cross-contamination" is minimized.

In other words, I didn't just create "multiple roles"; I engineered a "context isolation strategy." Many people build multi-agent systems where roles are clear, but context management is a mess—User A's private chat ends up in User B's reply, or Discord dialogue memory pollutes Telegram context. per-account-channel-peer is the official recommended strategy for multi-account scenarios in OpenClaw, and it has proven to be the most stable choice.
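The per-account-channel-peer scope is easiest to see as a composite session key. A minimal sketch, assuming sessions are keyed by the three dimensions described above (the key format is mine, not OpenClaw's internal one):

```python
def session_key(account: str, channel: str, peer: str) -> str:
    """per-account-channel-peer: one DM context per unique triple,
    so the same user reaching the same role on two channels gets
    two fully separate sessions."""
    return f"{account}:{channel}:{peer}"
```

Because every component participates in the key, neither a channel switch nor a different user can ever land in an existing context.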

IV. Group Chat Orchestration: Rule-Driven Collaboration, Not Free Chat


This is the most interesting part—and where the most pitfalls are. The core strategy: Commander global monitoring + Other roles triggered by @mentions.

My Discord group chat strategy is:

  • Commander: requireMention = false (Global Listening)
    • Can see all messages in the group by default.
    • Responsible for capturing the global situation, judging if collaboration is needed, decomposing tasks, and assigning work.
  • Other 4 Roles: requireMention = true (@mention Trigger)
    • Only act when explicitly @mentioned.
    • This reduces noise and prevents agents from talking over each other.
  • Mention Patterns: Every role has mentionPatterns configured.
    • For example, the Engineer can be triggered by @Engineer or @engineer. This makes "summoning" in the group stable and predictable.

The essence of this setup:

  • The Commander "sees the whole picture," like a Project Manager.
  • Specialized roles are "triggered on demand," like subject matter experts.
  • Group discussions change from "free-form scattering" to "controlled relay."

The result: When you ask a question in the group, the Commander first judges the task type, then @mentions the corresponding role. Once the role finishes, the Commander closes the loop. It feels like a real team meeting.
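The gating rules above reduce to a small predicate per agent. A hedged sketch, with the policy table and pattern strings invented for illustration (OpenClaw's real requireMention/mentionPatterns live in its config, not in code like this):

```python
import re

# Hypothetical per-agent group-chat policy mirroring the rules above.
GROUP_POLICY = {
    "commander": {"require_mention": False, "patterns": []},
    "engineer": {"require_mention": True, "patterns": [r"@[Ee]ngineer\b"]},
    "strategist": {"require_mention": True, "patterns": [r"@[Ss]trategist\b"]},
}

def should_respond(agent: str, message: str) -> bool:
    """Commander listens globally; everyone else is mention-gated."""
    policy = GROUP_POLICY[agent]
    if not policy["require_mention"]:
        return True
    return any(re.search(p, message) for p in policy["patterns"])
```

Only one agent listens unconditionally, so at most one "project manager" reacts to ambient chatter, and specialists stay silent until summoned.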

V. Discord vs. Telegram: Why Discord is the Primary Battlefield

Strictly speaking, it's not that "only Discord can collaborate." It's just that Discord is currently the best fit for multi-role public collaboration orchestration.

Specific reasons:

  1. Discord allows 5 accounts to run in parallel with clear @mention mechanisms.
  2. Role identities, dialogue chains, and the relay process are all visible—making it look like a team discussion.
  3. The "Commander listens/Others mention-gate" strategy is more intuitive in group chats.
  4. My groupPolicy for Discord is set to open, offering higher flexibility.

On Telegram, my strategy is more restricted (allowlist + mention gate), making it a more "controlled production channel." Discord is the stage for collaboration.

VI. Config Layer + Prompt Layer: Dual-Track Governance


This is the biggest difference between this system and "just playing around." I don't rely only on configuration or only on prompts. I use both.

A. Configuration Track (Platform-Level Control)

These are hard configurations at the OpenClaw platform level:

  • Channel Policy: groupPolicy, dmPolicy control basic group/DM behavior.
  • requireMention: Defines who responds by default vs. who needs an @mention.
  • Bindings: Message routing mappings.
  • dmScope: Session isolation granularity.
  • Agent-to-Agent Ping-Pong Limits: I set this to 0 to suppress meaningless back-and-forth between agents.
    • This is crucial. Without this, two AI agents might get stuck in an infinite loop of "Thank you" and "You're welcome." Setting it to 0 tells the system: "Don't automatically ping other agents."
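Pulled together, the configuration track looks roughly like the structure below. This is a Python rendering of the settings named above for readability; OpenClaw's actual config file format and key nesting may differ.

```python
# Illustrative shape of the platform-level settings discussed above.
GATEWAY_CONFIG = {
    "session": {"dmScope": "per-account-channel-peer"},
    "channels": {
        "discord": {"groupPolicy": "open", "dmPolicy": "open"},
        "telegram": {"groupPolicy": "allowlist", "dmPolicy": "allowlist"},
    },
    "agents": {
        "commander": {"requireMention": False},
        "engineer": {"requireMention": True},
    },
    # 0 = never auto-ping another agent, suppressing thank-you loops
    "agentToAgent": {"pingPongLimit": 0},
}
```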

B. Rule Track (Behavior-Level Control)

These are rule files I wrote inside each workspace:

  • SOUL.md: The "soul" of the role—personality, tone, responsibilities, and quality floor.
  • AGENTS.md: Operational manual—collaboration check processes, memory read/write standards, and lazy-loading strategies.
  • ROLE-COLLAB-RULES.md: Role-specific collaboration boundaries and "red lines."
  • TEAM-RULEBOOK.md: Unified hard rules for the team (shared by all roles).
  • TEAM-DIRECTORY.md: Mapping roles to real IDs to prevent @mentioning the wrong person.

The effect: The platform layer limits flow, while the behavior layer constrains action. We don't just hope the model "behaves." Models drift and forget rules, so you need hard constraints in the config layer and soft guidance in the prompts. Double insurance.

VII. Workspace File System: "Independent Offices" for Each Role

Every workspace has a nearly identical file skeleton. This standardization is key.

| File | Purpose |
| --- | --- |
| SOUL.md | Role soul: personality, behavior patterns, quality floor. |
| AGENTS.md | Manual: collaboration processes, memory standards, checklists. |
| ROLE-COLLAB-RULES.md | Boundaries: what this role can and cannot do. |
| IDENTITY.md | Definition: name, positioning, scope of ability. |
| USER.md | User profile: preferences, goals, taboos, terminology. |
| TOOLS.md | Tool list: permitted tools and permission boundaries. |
| MEMORY.md | Long-term memory: stable preferences, decisions, reusable experience. |
| GROUP_MEMORY.md | Group memory: reusable and safe information for group context. |
| HEARTBEAT.md | Heartbeat: periodic self-checks, recovery, state maintenance. |
| memory/YYYY-MM-DD*.md | Daily logs: tasks, context fragments, on-site decisions for the day. |
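Because the skeleton is identical for every role, scaffolding a new workspace is mechanical. A small helper I would write for this (the `workspace-<role>` naming follows the examples earlier in the post; the function itself is mine, not part of OpenClaw):

```python
from pathlib import Path

SKELETON = [
    "SOUL.md", "AGENTS.md", "ROLE-COLLAB-RULES.md", "IDENTITY.md",
    "USER.md", "TOOLS.md", "MEMORY.md", "GROUP_MEMORY.md", "HEARTBEAT.md",
]

def scaffold(root: str, role: str) -> Path:
    """Create workspace-<role> with the standard file skeleton
    plus an empty memory/ directory for dated daily logs."""
    ws = Path(root) / f"workspace-{role}"
    (ws / "memory").mkdir(parents=True, exist_ok=True)
    for name in SKELETON:
        (ws / name).touch(exist_ok=True)
    return ws
```

Standardization pays off later: tooling, audits, and archiving scripts can assume the same layout in every office.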

VIII. Memory System: Lazy Loading + Layering + Archiving

Memory management is the most overlooked part of multi-agent systems, yet it's the easiest to break. My strategy is "quality over quantity," using a tiered approach:

  1. Short-term Logs (Daily Memory): Records daily task processes and context. Files are named by date, creating a natural timeline.
  2. Long-term Memory (MEMORY.md): Consolidates stable preferences and decisions. Only verified, stable information is written here.
  3. Group Long-term Memory (GROUP_MEMORY.md): Keeps reusable and safe information from groups. Private content is never mixed in—this is a privacy red line.
  4. Cold Archiving (Archive): Old data is archived periodically to prevent the active context from bloating. It’s moved to low-priority storage rather than deleted.
  5. Retrieval Mechanism (memory_search + memory_get): Semantic recall followed by precise reading. This avoids loading everything at once—context windows are limited resources.

Core value of this system:

  • Private chat quality isn't polluted by group history.
  • Group collaboration isn't interfered with by private context.
  • Context is "loaded on demand" rather than "poured in full."

I treat context budget as a resource management problem. Tokens are limited; every memory added occupies reasoning space. You have to be precise.
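The two-step recall pattern can be modeled in miniature. The real memory_search/memory_get are OpenClaw tools with semantic retrieval; the word-overlap scoring and sample entries below are stand-ins to show the shape of "search for candidates, then read only the top hit":

```python
# Toy daily-log store keyed by date, mimicking memory/YYYY-MM-DD files.
MEMORY = {
    "2025-01-10": "decided: deploys go through the Engineer role",
    "2025-01-11": "user prefers concise answers in DMs",
}

def memory_search(query: str, k: int = 1) -> list[str]:
    """Cheap stand-in for semantic recall: rank entries by word overlap."""
    q = set(query.lower().split())
    return sorted(MEMORY, key=lambda d: -len(q & set(MEMORY[d].lower().split())))[:k]

def memory_get(doc_id: str) -> str:
    """Precise read: load one entry into context, nothing else."""
    return MEMORY[doc_id]
```

The design choice is the split itself: search returns identifiers, and only the entries you actually need get loaded, so the context window never pays for the whole archive.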

IX. DM Mode vs. Group Mode: One Role, Two Strategies

One thing people often miss: A role should behave differently in a DM than in a group chat. I've defined this in every SOUL.md:

DM Mode:

  • Each role acts as an end-to-end expert solving user problems.
  • No collaboration process needed; provide the full answer directly.
  • The standard is "one person can handle it."

Group Mode:

  • Follow team collaboration protocols for incremental relay.
  • Each role only handles its area of expertise.
  • The Commander is responsible for stringing it together and closing.

Specific Role Behaviors:

Commander: Stays silent in groups by default and intervenes only when necessary, to avoid talking over others.
  • Engineer: Deliverables must be executable, verifiable, and rollback-capable—not just an "idea."
  • Strategist: Conclusions must include hypotheses and verification paths—not just a "guess."
  • Think Tank: Audits must provide problem classification + repair plans—not just saying "there's a problem."
  • Creator: Expression must not sacrifice authenticity or executability—it shouldn't just "look good."

The difference in behavior comes from the rule files, not from the model's own judgment.
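The mode split can be expressed as a tiny dispatch. In practice this logic lives in each SOUL.md as prose rules, not code; the function below is only an illustration of the two strategies described above:

```python
def behavior(agent: str, mode: str) -> str:
    """One role, two strategies: end-to-end in DMs, controlled relay in groups."""
    if mode == "dm":
        # DM standard: "one person can handle it", full answer directly.
        return f"{agent}: answer end-to-end, no relay"
    if agent == "commander":
        return "commander: monitor, decompose, assign, close the loop"
    # Specialists in groups: mention-gated, scoped to their expertise.
    return f"{agent}: wait for @mention, handle own specialty only"
```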

Final Words

A multi-agent setup isn't just about spinning up more bots. It's a complete engineering system: architecture, routing, isolation, memory, and governance.

OpenClaw provides a great foundation, but the journey from "it runs" to "it runs well" involves more engineering than most imagine.

If you're building something similar, I hope this helps. This is just the beginning; I'll share more specific deep-dives soon. 🥤
