Lars Winstand

Posted on May 28 • Originally published at standardcompute.com

I thought the OpenClaw ADHD thread was a joke, then it taught me when agent delegation is worth 5x cost

#ai #automation #agents #openai

I clicked the “I gave OpenClaw ADHD” post expecting a meme.

Instead, I found one of the clearest explanations of why so many coding agents still fail in production.

The thread was on r/openclaw, and the idea was simple: stop forcing OpenClaw to think in one straight line. Let it generate multiple candidate thoughts, run a critic over them, and keep the promising branches.

That sounds goofy until you realize it’s basically Tree of Thoughts with a better Reddit title.

And the most useful part wasn’t the cleverness. It was the tradeoff.

The author said the approach increased cost by around 5x and latency by around 10x.

That’s not a footnote. That’s the whole design discussion.

If you’re building agents with OpenClaw, LangGraph, AutoGen, n8n, Make, or your own OpenAI-compatible stack, this is the real question:

When should an agent branch and explore, and when should it stop thinking and hand execution to deterministic code?

The production lesson hidden inside a joke post

The best line in that thread was this:

“Limitation : It shoots cost by ~5x and time to output by ~10x but enables instant novel thinking. Good for brainstorming and planning, not for coding.”

That is more useful than most polished agent demos.

Because now we have an actual rule:

branching helps when the task is ambiguous
branching hurts when the task is obvious and execution-heavy

So the right architecture is not “always use more agents.”

It’s “branch selectively.”

That sounds obvious, but a lot of agent systems still do the opposite. They treat every task like a reasoning benchmark and then wonder why cost explodes and reliability collapses.

What actually breaks in agent loops

Most agent failures are not about raw model intelligence.

They’re about premature commitment.

The agent picks the first plausible plan and starts digging.

You’ve seen this if you’ve watched GPT-5, Claude Opus, or Grok inside an agent loop:

It picks one interpretation of the task
It chooses one query, one file, one API path
It commits too early
You discover 20 steps later that step 2 was wrong

A longer prompt does not fix that.

A bigger context window does not fix that.

“Think carefully” definitely does not fix that.

What helps is giving the runtime permission to do what good engineers do naturally:

Generate a few plausible approaches
Score them
Kill weak options early
Commit only when one path clearly wins

That’s the useful part of Tree of Thoughts for real systems.

Linear loops vs branching runtimes

Here’s the practical version.

Approach	What it’s good at
Linear agent loop	Low cost, low latency, straightforward execution
Controlled branching	Exploring alternatives, planning, research, ambiguous tasks
Deterministic tools and services	Retries, batching, pagination, permissions, long-running operations

These are not competing ideas.

They should be stacked.

Use branching for thinking.

Use deterministic infrastructure for doing.

A simple mental model

I’ve started using this rule:

Branch for thinking. Constrain for execution.

That one sentence explains a lot of agent failures.

If your agent is trying to do planning, execution, retries, pagination, permissions, and recovery inside one giant prompt-shaped blob, it will look smart in a demo and flaky in production.

Where branching helps

Branching is useful when the first answer is often wrong.

Good examples:

architecture decisions
migration planning
root-cause investigation
research tasks
safety analysis
early design before implementation

Example pseudo-flow:

candidates = planner.generate_candidates(task, n=4)
scored = [critic.score(c) for c in candidates]
best = select_top(scored, threshold=0.8)
return executor.run(best)

That’s a lot better than pretending one pass through a giant prompt will always produce the best plan.

Where branching is a terrible idea

There are tasks where extra reasoning is mostly waste.

Bad places to branch:

sending 1000 BoldSign documents
paginating through an API
retrying failed webhooks
bulk transcription
applying a known code change across many files

For those, the model should choose the action, then get out of the way.

For example, this should not be “reasoned through” 1000 times:

for page in $(seq 1 100); do
  curl -s "https://api.vendor.com/items?page=$page" >> items.json
  sleep 0.2
done

That belongs in code.

Not in a loop where Claude or GPT-5 decides every page turn.

The best OpenClaw insight wasn’t about OpenClaw

While reading OpenClaw threads, I found another useful point from teams using it in enterprise workflows.

Their argument was basically this:

Generic MCP wrappers around vendor APIs don’t scale if the model has to do all the integration work itself.

That means the model ends up handling:

endpoint selection
pagination
retries
looping
error handling
permission-sensitive actions

That is exactly where many “agent framework” projects go wrong.

They push operational logic into the model because it feels flexible.

But flexibility is not the same thing as good architecture.

A saner shape looks more like this:

Agent planner -> tool router -> deterministic service -> result

Or, if you want to be more explicit:

OpenClaw -> runtime/orchestrator -> locked-down worker

The planner decides.

The worker executes.

The runtime handles retries, batching, timeouts, and state.

That’s delegation.

Example: branch on planning, not on execution

Here’s a pattern I’d actually ship.

Step 1: branch only for plan generation

{
  "task": "migrate a webhook ingestion service from polling to event-driven processing",
  "mode": "branch",
  "max_branches": 3,
  "critic": true
}

Step 2: select one plan

{
  "selected_plan": "use queue-backed ingestion with idempotent consumers and replay support"
}

Step 3: switch to deterministic execution

./run_migration_checklist.sh
terraform apply -auto-approve
python scripts/verify_replay.py

That split matters.

If you keep branching all the way through execution, you pay for lots of extra thought where thought no longer helps.

Cost is the part people avoid talking about

Branching is intellectually attractive and economically annoying.

The OpenClaw post got attention because people recognized the behavior immediately. But they also recognized the bill.

A 5x cost increase and 10x latency increase is not “slightly more expensive.” It’s a different product decision.

And this is where infrastructure starts to matter more than prompting.

If you’re paying per token, branching gets painful fast.

Every uncertain step becomes a mini fan-out event:

3 candidate plans
3 critiques
1 synthesis
maybe another branch after that

That’s how a simple workflow turns into a surprise invoice.

For teams running agents all day in n8n, Make, Zapier, OpenClaw, or custom automations, this is exactly the kind of thing that kills experimentation. You stop asking the agent to explore because every branch feels expensive.

That’s one reason unlimited compute is such a big deal for agent workflows.

If your API is a drop-in replacement for the OpenAI API and you’re on a flat monthly plan instead of per-token billing, you can afford to use branching where it actually helps instead of designing around token anxiety.

That’s the interesting part of Standard Compute for agent builders: it makes selective branching much easier to use in practice because you’re not constantly asking whether one more planning pass is worth the cost.

You still need discipline. Unlimited compute does not mean unlimited bad architecture.

But it does mean useful patterns like:

branch on high-uncertainty tasks
cap branch width
prune aggressively
route execution to deterministic tools
keep agents running 24/7 without obsessing over token spend

A practical routing policy

If you’re building an agent runtime today, this is a decent starting point.

routing:
  planning:
    strategy: branch
    max_branches: 3
    critic: true
    timeout_seconds: 20
  execution:
    strategy: deterministic
    retries: 3
    batch_size: 100
  fallback:
    strategy: linear
    timeout_seconds: 10

That policy is boring, which is exactly why it works.

My opinionated take

Most multi-agent hype is fake separation.

Researcher agent. Critic agent. Planner agent. Reviewer agent.

A lot of the time it’s just one model wearing different hats in the same mushy loop.

What actually matters is separation of concerns.

Real delegation means:

one component explores options
one component scores or selects
one component executes with narrow permissions
one component handles retries, batching, and recovery

That’s architecture.

Not prompt cosplay.

What I’d build right now

If I were designing an agent stack for production use, I’d do this:

Use a fast OpenAI-compatible endpoint for standard agent calls
Enable branching only on tasks marked high-uncertainty
Keep branch width small
Use deterministic services for API-heavy operations
Add hard timeouts and retry policies outside the model
Track where branching improves outcomes and where it just burns latency

And if I expected the system to run continuously, I’d strongly prefer flat-rate compute over per-token pricing.

Because once you let agents branch, retry, and run long workflows, token-based pricing stops feeling like billing and starts feeling like a brake pedal.

The actual takeaway

The “I gave OpenClaw ADHD” thread was funny.

It was also right.

The next step in agent design is probably not “write an even smarter prompt.”

It’s this:

branch when the problem is uncertain
constrain when the job is operational
delegate execution to deterministic systems
don’t let pricing punish the architecture you actually want to use

That’s the useful lesson hiding inside the joke.

And honestly, it’s better than most serious agent advice I’ve read lately.

DEV Community