I Built a Multi-Agent Research Team in OpenClaw — Here's the Config That Made It Work

#openclaw #ai #agents #productivity

I needed to research a new LLM provider last Tuesday. Normally that means opening tabs, reading for an hour, taking notes, losing the thread, starting over. Instead, I spawned three subagents in OpenClaw, gave them distinct roles, and had a draft comparison on my desk in eighteen minutes.

This is what that setup actually looked like — not a toy demo, not a theoretical architecture. A real delegation hierarchy I now use every week.

The Problem With Single-Agent Research

When you run one agent on a research task, you hit a ceiling fast. The agent can either explore broadly (shallow on everything) or dig deep on one thing (misses context elsewhere). You end up babysitting it: re-prompting, asking for more detail, asking for less detail.

The fix isn't a smarter agent. It's a structure that lets multiple agents be dumb and focused.

OpenClaw's subagent system lets you spin up isolated sessions that can each own a slice of work. Here's the config I landed on after a few iterations.

The Three-Agent Setup

{
  "research_team": {
    "scout": {
      "role": "gatherer",
      "prompt": "You are a research scout. Find 5-7 relevant articles, docs, or benchmarks for {topic}. Return a markdown list with titles, URLs, and one-line descriptions. Be selective — only include sources that add distinct information.",
      "model": "claude-sonnet",
      "tools": ["web_search", "read"]
    },
    "analyst": {
      "role": "synthesizer", 
      "prompt": "You are a critical analyst. Review the gathered sources and identify: (1) consensus points, (2) conflicting claims, (3) gaps or unanswered questions. Write 300-400 words of structured analysis in markdown.",
      "model": "claude-sonnet",
      "tools": ["read"]
    },
    "writer": {
      "role": "communicator",
      "prompt": "You are a technical writer. Using the analysis provided, write a first-person research briefing. Include: topic summary, key findings, what it means for practical use, and a recommendation. Write in a direct, specific voice. No hedging.",
      "model": "claude-opus",
      "tools": ["write"]
    }
  }
}

The scout gathers. The analyst distills. The writer packages. Each agent gets a narrow task, which means they do it fast and do it well.

How the Delegation Works in Practice

I trigger this with a single cron job that runs every Tuesday at 10 AM. The cron spawns an isolated session that runs this sequence:

# Spawn the research team
openclaw agents spawn --name research-scout --role gatherer --topic "$TOPIC"
openclaw agents spawn --name research-analyst --role synthesizer --sources "$SCOUT_OUTPUT"  
openclaw agents spawn --name research-writer --role communicator --analysis "$ANALYST_OUTPUT"

# Collect and assemble
openclaw agents collect --session research-writer --output research-briefing.md

The key insight: each agent's output becomes the next agent's input. The scout doesn't just dump links — it returns structured markdown that the analyst can immediately consume. The analyst doesn't summarize — it structures contradictions and gaps that the writer can address directly.

The model choice matters here. The scout and analyst run on claude-sonnet (fast, good enough for gathering and analysis). The writer runs on claude-opus (slower, better at coherent long-form). This keeps the pipeline cheap: I'm not paying opus rates for link gathering.

What Goes Wrong (And How I Fixed It)

Problem 1: The analyst would sometimes re-read the sources and draw different conclusions than the scout gathered.

Fix: I added an explicit sources field to the analyst prompt that must be cited inline. If the analyst cites something the scout didn't provide, I flag it in the review.

Problem 2: The writer would occasionally hallucinate a finding that wasn't in the analysis.

Fix: I added a "claim verification" step — the writer must tag each substantive claim with [source: analyst] or [source: scout]. Any untagged claim gets flagged.

Problem 3: Context bleeding between agents.

Fix: Each subagent runs in an isolated session with lightContext: true. They get only the payload I send them, not the full conversation history. This was the single biggest reliability improvement.

Results After Two Months

My research pipeline went from ~90 minutes (start to finished draft) to ~20 minutes. The quality is actually higher because each agent is optimized for its step rather than trying to be good at everything.

The other benefit people don't talk about: delegation changes your relationship with the task. Instead of being the researcher, you're the editor. You catch bad calls the agents make, but you make fewer of them because you're not deep in the weeds.

What I Learned

The biggest surprise: simpler agents outperform complex ones when working together. My first attempt had a "super-agent" that tried to do gathering, analysis, and writing in one pass. It was slower and noisier. Breaking it into three specialized agents — even though there's overhead in the handoffs — produced better output faster.

The second lesson: the model tier matters less than the prompt structure. A well-structured Sonnet outperforms a poorly-structured Opus. The delegation hierarchy forces you to structure your prompts because each agent has a specific, narrow input and output.

If you're running OpenClaw and not using subagents, you're leaving the most powerful orchestration feature on the table. Start small — one delegation, one handoff — and build up from there.

Questions about the setup? The configs above are stripped from my actual working files. Reach out if you want the full cron job payload.