Your OpenClaw Slack Agent Is Burning Money on Sub-Agents
We added sub-agents to our Slack setup in February. By the end of the first week, our daily token spend had tripled. Not because sub-agents are expensive in theory, but because nobody had thought about what happens when an autonomous agent decides to spawn other autonomous agents inside a busy Slack workspace.
The bill was $47 on Tuesday. It had been $14 the previous Tuesday. Same workspace, same number of users, same channels being monitored. The only change was enabling sub-agent spawning.
Here's what went wrong, and what I'd do differently now.
How Sub-Agent Spawning Works in Slack
OpenClaw's sub-agent system lets your main agent delegate tasks to child agents. Each sub-agent gets its own context window, its own tool access, and its own token budget. The main agent acts as an orchestrator; it receives a request, decides it's complex enough to warrant delegation, and spawns a sub-agent to handle part of it.
In a Slack context, this looks like: someone asks "what's the status of the billing migration?" The main agent decides it needs to check GitHub PRs, Linear issues, and recent Slack messages. Instead of doing all three sequentially, it spawns three sub-agents to run in parallel.
Sounds efficient. And it is, if you control it. We didn't.
Where the Money Goes
Each sub-agent starts with its own system prompt. That's input tokens. Each one loads the tool descriptions for whatever MCP servers it has access to. More input tokens. Each one gets a copy of whatever context the main agent decides is relevant. Even more input tokens.
A single user question that spawns three sub-agents doesn't cost 3x what a single-agent response costs. It costs 5-7x, because each sub-agent has overhead that the main agent already paid for. System prompts get re-processed. Tool descriptions get re-loaded. Context gets duplicated.
Our agent had access to six MCP servers. Each tool description runs 200-500 tokens. Six servers with 4-5 tools each is roughly 25 tool descriptions; at an average of ~250 tokens apiece, that's roughly 6,000 tokens loaded into every sub-agent context before it does anything useful. Multiply by three sub-agents per complex query, and you're spending 18,000 tokens on tool descriptions alone. That's before the actual question gets processed.
We were averaging 12 sub-agent spawns per hour during business hours. At roughly 25,000 tokens per spawn (prompt + tools + context + response), that's 300,000 tokens per hour. About $3/hour on input tokens alone, during a workday. Times 8 hours, $24/day just in sub-agent overhead. Our base agent cost $14/day before sub-agents.
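The back-of-envelope math is easy to check in a few lines. One caveat: the $10-per-million-token input rate below is an assumption chosen to match the ~$3/hour figure, not a quoted price.

```python
# Back-of-envelope sub-agent overhead, using the figures from the text.
SPAWNS_PER_HOUR = 12
TOKENS_PER_SPAWN = 25_000       # prompt + tools + context + response
HOURS_PER_DAY = 8
INPUT_RATE_PER_MTOK = 10.0      # assumed blended $/1M input tokens

tokens_per_hour = SPAWNS_PER_HOUR * TOKENS_PER_SPAWN
cost_per_hour = tokens_per_hour / 1_000_000 * INPUT_RATE_PER_MTOK
cost_per_day = cost_per_hour * HOURS_PER_DAY

print(tokens_per_hour, cost_per_hour, cost_per_day)
```

Run it and you get 300,000 tokens per hour of pure overhead, roughly $24/day on top of the base agent.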
The Model Routing Mistake
Here's what made it worse: we were running sub-agents on the same model as the main agent. Claude Opus for everything, because that's what the config defaulted to.
Most sub-agent tasks don't need Opus. Checking a GitHub PR status? Sonnet handles that fine. Searching Slack messages for mentions of a keyword? Haiku could do it. Summarising a Linear issue? Sonnet, easily.
OpenClaw lets you set a default model for sub-agents separately from the main agent. We changed ours to Sonnet and saw an immediate 60% cost reduction on sub-agent operations. The quality drop was negligible for retrieval tasks. For complex reasoning, the main agent handles that itself before spawning.
The config is one line: agents.defaults.subagents.model. If you haven't set it, you're paying premium rates for grunt work.
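For reference, here is what that looks like as a config fragment. This is a sketch reconstructed from the dotted key path above; the surrounding structure and the exact model identifier may differ by OpenClaw version, so check your schema.

```yaml
# Sketch only — key path inferred from agents.defaults.subagents.model;
# exact schema and model name may differ by version.
agents:
  defaults:
    subagents:
      model: sonnet   # cheaper default for delegated grunt work
```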
The Spawning Trigger Problem
The bigger issue was how often the agent decided to spawn. OpenClaw's default behaviour is... generous. The main agent has a lot of latitude in deciding when a task warrants delegation. In our experience, it was spawning sub-agents for tasks that a single agent could handle in one pass.
"What's the deploy status?" doesn't need three parallel sub-agents checking different sources. It needs one API call. But our agent's system prompt said "use sub-agents for complex queries," and the agent interpreted "complex" very broadly.
We rewrote the spawning guidance to be explicit: "Only spawn sub-agents when a task requires data from 3+ independent sources AND the user is waiting for a response AND parallel retrieval would save more than 10 seconds." That cut our spawn rate from 12/hour to 3/hour.
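If you want the same rule enforced in code rather than prose, the three conditions reduce to a small predicate. This is a sketch, not an OpenClaw API; the function and parameter names are made up.

```python
def should_spawn_subagents(independent_sources: int,
                           user_waiting: bool,
                           est_parallel_savings_s: float) -> bool:
    """Hard gate mirroring the prompt rule: spawn only when the task
    needs 3+ independent sources AND a user is waiting AND parallel
    retrieval would save more than 10 seconds."""
    return (independent_sources >= 3
            and user_waiting
            and est_parallel_savings_s > 10)
```

"What's the deploy status?" is one source, so it fails the gate and stays with the main agent.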
The lesson: LLMs are bad at self-regulating resource usage. They'll use every tool you give them. You need hard constraints, not guidelines.
Depth Limits Are Non-Negotiable
OpenClaw supports nested spawning: a sub-agent can spawn its own sub-agents. The default maxSpawnDepth is... well, it used to be unlimited. That's fixed now, but if you're on an older version, check your config.
We had a sub-agent spawn a child to "verify" a result by querying the same data source from a different angle. The child spawned its own child to "cross-reference." This recursive spawning burned through $8 in about 90 seconds before the timeout killed the chain.
Set maxSpawnDepth: 1 unless you have a specific orchestration pattern that requires nested delegation. Depth 2 is occasionally useful for coordinator patterns. Depth 3+ is almost never worth the cost.
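On older versions without a built-in cap, a depth guard in your own spawn path is a few lines. This is a sketch that assumes you control the spawn call; `guarded_spawn` and `SpawnDepthExceeded` are hypothetical names, not OpenClaw APIs.

```python
class SpawnDepthExceeded(RuntimeError):
    pass

MAX_SPAWN_DEPTH = 1  # mirrors maxSpawnDepth: 1

def guarded_spawn(task: str, current_depth: int, spawn_fn):
    """Refuse to spawn past the cap instead of letting a chain recurse."""
    if current_depth >= MAX_SPAWN_DEPTH:
        raise SpawnDepthExceeded(
            f"spawn at depth {current_depth} blocked (cap {MAX_SPAWN_DEPTH})")
    # The child runs at current_depth + 1, so it cannot spawn again.
    return spawn_fn(task, depth=current_depth + 1)
```

With a cap of 1, the "verify" child that spawned its own "cross-reference" grandchild would have been refused at the second hop.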
What Actually Works for Slack
After a month of tuning, here's our working pattern:
Main agent on Opus, sub-agents on Sonnet. The main agent handles reasoning, conversation, and deciding what to delegate. Sub-agents handle retrieval and simple tasks. The cost difference is 5x per token, and sub-agents rarely need the reasoning depth.
Strict spawning criteria. We moved from "spawn when it seems useful" to explicit conditions in the system prompt. Sub-agents are for parallel retrieval across independent sources. Everything else, the main agent does itself.
Shared tool results. Instead of each sub-agent loading all tool descriptions, we scope sub-agents to only the tools they need. A GitHub sub-agent gets GitHub tools. A Slack search sub-agent gets Slack tools. No 6,000-token tool description overhead per spawn.
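Scoping can be as simple as a role-to-tools map consulted at spawn time, so only the relevant descriptions ever enter the sub-agent's context. The role names and tool identifiers below are illustrative, not real MCP tool names.

```python
# Illustrative role -> tool-name scoping. Only these descriptions get
# loaded into the sub-agent's context, not all six servers' worth.
TOOL_SCOPES: dict[str, list[str]] = {
    "github": ["github.get_pr", "github.list_checks"],
    "slack_search": ["slack.search_messages"],
    "linear": ["linear.get_issue"],
}

def tools_for(role: str) -> list[str]:
    """Return the minimal tool set for a sub-agent role. Unknown roles
    get nothing rather than everything."""
    return TOOL_SCOPES.get(role, [])
```

Defaulting unknown roles to an empty list is deliberate: an unscoped sub-agent is how the 6,000-token overhead creeps back in.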
Token budgets per sub-agent. OpenClaw doesn't enforce this natively yet, but you can set it in your system prompt: "Each sub-agent response should be under 500 tokens. Return data, not analysis." This prevents sub-agents from writing essays when you just need a PR status.
Batch over spawn. If three people ask similar questions within a few minutes, don't spawn three independent sub-agents to query the same data. Cache the first result and reuse it. We built a simple TTL cache for MCP tool results — 60-second expiry. Cut redundant spawns by about 40%.
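The cache itself is simple enough to sketch in full. A minimal version with the 60-second expiry from above; the injectable clock is only there to make it testable.

```python
import time

class TTLCache:
    """Tiny TTL cache for MCP tool results: the first caller pays for
    the query, repeats within the window get the cached value."""

    def __init__(self, ttl_s: float = 60.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict stale entries
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self.ttl_s, value)
```

Key entries on (server, tool, canonical arguments) so that three people asking about the same deploy status within a minute hit the same entry.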
The Managed Platform Advantage
I'll be transparent: SlackClaw was purpose-built to solve exactly these problems. The platform handles model routing for sub-agents automatically, enforces spawn limits and depth caps by default, and gives you per-channel cost dashboards that break down main agent vs sub-agent spend.
When we were self-hosting, figuring out that sub-agents were the cost driver took us three days of manual log analysis. On SlackClaw, you'd see it in the dashboard on day one.
That said, the patterns above work for self-hosted setups too. You just have to build the instrumentation yourself, which we covered in a previous post about observability for Slack agents.
The Numbers After Optimisation
Before tuning: $47/day, 12 spawns/hour, all Opus, no depth limits.
After tuning: $18/day, 3 spawns/hour, Sonnet for sub-agents, depth capped at 1, scoped tools.
That's a 62% reduction. The agent answers the same questions with the same quality. It just doesn't burn tokens on unnecessary overhead.
If you haven't checked your sub-agent costs recently, you probably should. The number might surprise you the way it surprised us.
Helen Mireille is chief of staff at an early-stage tech startup. She writes about the gap between AI agent demos and production economics.