DEV Community

zac

Posted on • Originally published at remoteopenclaw.com

How to Set Up Hermes Multi-Agent: Step-by-Step Guide


Hermes Agent's multi-agent system, introduced in v0.6.0, uses an orchestrator-worker pattern where a main agent decomposes complex tasks and spawns specialist subagents that run in parallel with isolated conversation threads. Unlike OpenClaw's multi-agent setup (which routes messages to separate, fully isolated agents), Hermes multi-agent is built for collaboration: the orchestrator delegates, workers execute, and structured result objects flow back for synthesis.

Key Takeaways

  • Hermes multi-agent uses an orchestrator-worker pattern where one agent coordinates and spawns specialist subagents for parallel task execution.
  • Each subagent gets its own conversation thread, terminal, and API calls — fully isolated from other workers but coordinated by the orchestrator.
  • Workers exchange typed result objects with the orchestrator, not raw conversation summaries, preventing context degradation across agents.
  • Configurable concurrency limits prevent runaway agent fleets from exhausting API rate limits or VPS memory.
  • Community best practice: use a cheap, fast model (Kimi K2.5) for the orchestrator and a capable model (Claude) for validation.

In this guide

  1. How Hermes Multi-Agent Works
  2. Hermes vs OpenClaw Multi-Agent
  3. Step 1: Configure the Orchestrator
  4. Step 2: Spawn and Manage Subagents
  5. Step 3: Set Up Communication Patterns
  6. Step 4: Shared Skills and Memory
  7. Real-World Multi-Agent Configurations
  8. Limitations and Tradeoffs
  9. FAQ

How Hermes Multi-Agent Works

Hermes multi-agent orchestration implements a hierarchical task decomposition model. The orchestrator (main agent) analyzes a complex task, identifies the optimal work breakdown structure, and spawns specialist worker agents with tailored context.

Each worker receives only the task-relevant subset of the full conversation context, plus any shared skills or knowledge the orchestrator decides to pass. As of April 2026, multi-agent is a core feature in Hermes v0.8.0+ with resource-aware scheduling and configurable concurrency limits built in.

The execution model supports both synchronous and streaming results. The orchestrator can fire multiple workers simultaneously and wait for all results before proceeding, or begin synthesis as early outputs arrive. This flexibility lets you build anything from simple parallel research tasks to complex sequential pipelines.
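The two modes can be sketched with generic asyncio primitives. This is an illustration of the pattern only, not Hermes internals: the worker names, delays, and result shapes below are invented for the example.

```python
import asyncio

async def worker(name: str, delay: float) -> dict:
    # Stand-in for a subagent call; returns a structured result object.
    await asyncio.sleep(delay)
    return {"agent_id": name, "status": "complete"}

async def fan_in_all() -> list:
    # Synchronous mode: wait for every worker, then synthesize once.
    return await asyncio.gather(worker("research", 0.02), worker("writing", 0.01))

async def stream_results() -> list:
    # Streaming mode: begin synthesis as each result arrives.
    tasks = [asyncio.create_task(worker("research", 0.02)),
             asyncio.create_task(worker("writing", 0.01))]
    done = []
    for fut in asyncio.as_completed(tasks):
        done.append(await fut)  # earliest finisher first
    return done

all_results = asyncio.run(fan_in_all())
streamed = asyncio.run(stream_results())
```

In the streaming variant, the faster "writing" worker lands first, so synthesis on its output can start while "research" is still running.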

For the foundational single-agent setup, see our Hermes Agent setup guide.


Hermes vs OpenClaw Multi-Agent

Hermes and OpenClaw take fundamentally different approaches to running multiple agents. The distinction matters when choosing which platform to use for multi-agent workflows.

| Feature | Hermes Agent | OpenClaw |
| --- | --- | --- |
| Architecture | Orchestrator-worker (collaborative) | Isolated agents with message routing |
| Agent communication | Typed result objects via structured message passing | No cross-agent communication by design |
| Spawning method | Natural language ("spawn a research agent") or CLI | Pre-configured in openclaw.json |
| Memory sharing | Shared skills directory; PLUR plugin for engram sharing | Fully isolated workspaces per agent |
| Model per agent | Yes — different models per orchestrator and worker | Yes — model field per agent definition |
| Concurrency control | Configurable limits to prevent over-spawning | Limited by hardware resources |
| Best for | Complex tasks requiring coordination | Separate roles with data isolation |

OpenClaw multi-agent excels at giving different people or roles their own isolated agent with separate credentials and personality. Hermes multi-agent excels at breaking one complex task into parallel workstreams that collaborate. For the full OpenClaw approach, see our OpenClaw multi-agent setup guide.


Step 1: Configure the Orchestrator

The orchestrator is your main Hermes instance — the agent that receives incoming tasks, decides how to decompose them, and coordinates worker output. Configure it in your ~/.hermes/config.yaml file.

# ~/.hermes/config.yaml
agent:
  name: "orchestrator"
  model: "nous/hermes-3-405b"  # or a fast model like kimi/k2.5
  multi_agent:
    enabled: true
    max_concurrent_workers: 4
    worker_timeout: 300  # seconds per worker task
    result_format: "structured"  # typed result objects

The max_concurrent_workers setting prevents runaway spawning. Set it based on your API rate limits and available memory. Most operators run 2-4 concurrent workers comfortably on a standard VPS.

A common optimization from the Hermes community: use a cheap, fast model for the orchestrator (it only needs to decompose tasks and route results) and reserve expensive models for workers that do the actual reasoning.
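Conceptually, max_concurrent_workers acts like a semaphore on worker slots: spawns beyond the cap queue until a slot frees up. A minimal sketch of that behavior (the cap value and sleep-based worker are illustrative, not Hermes code):

```python
import asyncio

MAX_CONCURRENT_WORKERS = 2  # mirrors the config cap; value is illustrative

async def capped_worker(task_id: int, sem: asyncio.Semaphore, counts: dict) -> int:
    async with sem:  # acquire a worker slot; blocks once the cap is hit
        counts["active"] += 1
        counts["peak"] = max(counts["peak"], counts["active"])
        await asyncio.sleep(0.01)  # stand-in for the subagent's API calls
        counts["active"] -= 1
    return task_id

async def main() -> tuple:
    sem = asyncio.Semaphore(MAX_CONCURRENT_WORKERS)
    counts = {"active": 0, "peak": 0}
    # Six tasks are requested, but at most two ever run at once.
    results = await asyncio.gather(*(capped_worker(i, sem, counts) for i in range(6)))
    return results, counts["peak"]

results, peak = asyncio.run(main())
```

All six tasks complete, but the peak concurrency never exceeds the configured cap, which is exactly the property that protects rate limits and memory.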


Step 2: Spawn and Manage Subagents

Hermes subagents can be spawned through natural language commands or the CLI. Each subagent gets its own conversation thread, its own terminal session, and makes its own API calls independently.

Natural Language Spawning

During a conversation with your orchestrator, simply ask it to delegate:

# In conversation with the orchestrator:
"Research the top 5 competitors in the CRM space in a subagent"
"Spawn a coding agent to refactor the authentication module"
"Run a research agent and a writing agent in parallel"

CLI Management

# List active agents
hermes agents list

# Switch to a specific agent
hermes agents use research-agent

# Stop a running worker
hermes agents stop research-agent

Workers inherit the orchestrator's API credentials by default but can be configured with separate keys for billing isolation.



Step 3: Set Up Communication Patterns

Agent-to-agent communication in Hermes flows through a structured message passing layer. Workers do not share memory or conversation context directly. Instead, they exchange typed result objects that the orchestrator validates, transforms, and routes.

This design prevents the "telephone game" degradation common in naive multi-agent setups, where agents summarize each other's summaries and information degrades with each hop. The orchestrator receives raw structured outputs and handles synthesis itself.

Result Object Structure

{
  "agent_id": "research-agent-01",
  "task": "Analyze competitor pricing in CRM market",
  "status": "complete",
  "result": {
    "findings": [...],
    "sources": [...],
    "confidence": 0.87
  },
  "duration_seconds": 42
}
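The JSON above maps naturally onto a typed structure the orchestrator can validate before routing. A sketch in Python, with field names taken from the example; the set of status values and the validation rules are assumptions, not documented Hermes behavior:

```python
from dataclasses import dataclass, field

@dataclass
class WorkerResult:
    agent_id: str
    task: str
    status: str  # assumed states: "complete" | "failed" | "timeout"
    result: dict = field(default_factory=dict)
    duration_seconds: float = 0.0

    def validate(self) -> bool:
        # Orchestrator-side sanity check before routing the result onward.
        if self.status not in {"complete", "failed", "timeout"}:
            return False
        if self.status == "complete" and not self.result:
            return False  # a "complete" result must carry a payload
        return True

payload = {
    "agent_id": "research-agent-01",
    "task": "Analyze competitor pricing in CRM market",
    "status": "complete",
    "result": {"findings": ["..."], "sources": ["..."], "confidence": 0.87},
    "duration_seconds": 42,
}
res = WorkerResult(**payload)
```

Validating at the boundary like this is what lets the orchestrator reject malformed worker output instead of passing degraded information down the pipeline.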

Communication Patterns

  • Fan-out / Fan-in: Orchestrator spawns multiple workers in parallel, waits for all results, then synthesizes. Best for research and data gathering.
  • Pipeline: Output from one worker feeds into the next. Best for sequential workflows like research-then-write.
  • Validator loop: A dedicated validator agent reviews worker output and sends it back for revision if quality is insufficient.
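The validator loop is the least obvious of the three, so here is a minimal sketch of its control flow. The draft and validate functions are stand-ins for worker and validator agents; the "add sources" feedback is an invented example of a revision note:

```python
def draft(revision_notes: str = "") -> str:
    # Stand-in for a worker producing output, optionally from feedback.
    return "a short report" + (" with sources" if revision_notes else "")

def validate(output: str) -> tuple:
    # Stand-in for a validator agent's quality check.
    if "sources" not in output:
        return False, "add sources"
    return True, ""

def validator_loop(max_rounds: int = 3) -> str:
    notes = ""
    for _ in range(max_rounds):
        output = draft(notes)
        ok, notes = validate(output)
        if ok:
            return output
    raise RuntimeError("quality bar not met after max_rounds")

final = validator_loop()
```

The bounded round count matters in practice: without it, a worker and validator that disagree can loop forever, burning API spend.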

Step 4: Shared Skills and Memory

By default, each Hermes agent has isolated memory. However, skills stored in the shared ~/.hermes/skills/ directory are accessible to all agents on the same machine, giving workers access to learned procedures without manual copying.

Enabling Shared Memory with PLUR

For true shared memory between agents, the PLUR community plugin enables engram sharing — corrections and learnings from one agent propagate to others working on the same project. Install it as a plugin:

hermes plugins install plur-memory

With PLUR enabled, when your coding agent learns a project convention (like a preferred naming pattern), that knowledge propagates to the research agent working on the same codebase. For more on how Hermes handles memory, see our Hermes Agent memory system guide.


Real-World Multi-Agent Configurations

Three production patterns have emerged from the Hermes community as the most effective multi-agent setups.

Content Production Pipeline

A social media manager agent writes posts, a visual agent generates branded images, and an ads creator agent consumes both outputs — all coordinated by a Kimi-powered orchestrator with a MiniMax validator reviewing quality before anything ships.

Development Workflow

An orchestrator receives a feature request, spawns a research agent to analyze the codebase, a coding agent to implement changes, and a testing agent to verify the output. The orchestrator reviews all results before presenting the final implementation.

Research and Analysis

Multiple research agents investigate different aspects of a topic in parallel. Each returns structured findings. The orchestrator synthesizes a comprehensive report, resolving contradictions across sources and highlighting consensus points.


Limitations and Tradeoffs

Hermes multi-agent is powerful but has clear constraints that affect production use.

  • API cost multiplication. Each subagent makes its own API calls. A 4-worker setup can cost 4-5x a single-agent workflow. Use cheap models for orchestration and limit concurrency to control spend.
  • No persistent worker state. Workers are ephemeral by default. They do not retain memory between tasks unless you explicitly configure shared skills or PLUR. Each spawn is a fresh agent.
  • Orchestrator bottleneck. The orchestrator is a single point of coordination. If the orchestrator model is slow or makes poor decomposition decisions, the entire pipeline suffers. This is why a fast model matters here.
  • Debugging complexity. When something goes wrong in a 4-agent pipeline, tracing the issue requires inspecting multiple conversation threads. Use hermes agents list and structured logging (available since v0.8.0) to diagnose.
  • Not suitable for simple tasks. If a single agent can handle the task in one pass, multi-agent adds latency and cost without benefit. Use multi-agent when parallelism or specialization genuinely improves the outcome.


FAQ

How many Hermes subagents can run at the same time?

The practical limit depends on your API rate limits and server resources. Most operators run 2-4 concurrent workers. The max_concurrent_workers config setting enforces a hard cap to prevent runaway spawning that exhausts memory or API quotas.

Can Hermes subagents use different AI models?

Yes. Each subagent can run a different model. The recommended pattern is a cheap, fast model (like Kimi K2.5) for the orchestrator and a more capable model (like Claude Opus) for workers handling complex reasoning or validation tasks.

Do Hermes agents share memory with each other?

By default, no. Each agent has isolated memory. Skills in the shared ~/.hermes/skills/ directory are accessible to all agents. For true cross-agent memory sharing, install the PLUR community plugin, which propagates learnings between agents working on the same project.

How is Hermes multi-agent different from OpenClaw multi-agent?

OpenClaw multi-agent routes messages to separate, fully isolated agents — each with its own workspace and no cross-agent communication. Hermes multi-agent uses an orchestrator-worker pattern where agents actively collaborate through structured message passing. OpenClaw is better for role separation; Hermes is better for task coordination.

What happens if a Hermes subagent fails?

The orchestrator receives a failure status in the result object. Depending on configuration, it can retry the task, reassign it to a different worker, or escalate to the human operator. The worker_timeout setting prevents workers from running indefinitely.
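The retry-or-escalate decision described above can be sketched as a simple orchestrator-side policy. This is an illustration of the logic, not the actual Hermes implementation; the action names and retry budget are invented for the example:

```python
def handle_result(result: dict, retries_left: int = 2) -> str:
    # Orchestrator policy: synthesize good results, retry failed work,
    # and escalate to the human operator once the retry budget is spent.
    if result["status"] == "complete":
        return "synthesize"
    if retries_left > 0:
        return "retry"
    return "escalate_to_operator"

action_ok = handle_result({"status": "complete"})
action_retry = handle_result({"status": "failed"}, retries_left=1)
action_escalate = handle_result({"status": "timeout"}, retries_left=0)
```

Timeouts surface here the same way as failures: once worker_timeout fires, the worker's result object arrives with a non-complete status and flows through the same policy.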
