Lars Winstand

Posted on May 20 • Originally published at standardcompute.com

I read the OpenClaw thread everyone shared — these 5 fixes cut agent costs to one-third and stopped the loops

#agents #ai #automation #openclaw

I clicked into a popular r/openclaw thread expecting the usual advice: tweak the prompt, pick a smarter model, maybe add more context.

Instead, the OP described the exact failure mode a lot of us hit when we move from demos to always-on agents:

Claude Opus 4.6 handling cheap background work
vague completion criteria
retries with no hard stop
state living inside prompts instead of durable storage
loops burning money while doing almost nothing useful

The useful part was that this wasn’t one silver bullet. It was a stack of practical fixes.

And the biggest one was brutally simple:

stop sending cheap work to expensive models

According to the thread, moving heartbeat checks, cron pings, and other low-value supervision off Claude Opus cut spend to about one-third.

That tracks with what I keep seeing in OpenClaw, n8n, Make, Zapier, and custom worker setups. The expensive part usually isn’t the main reasoning step. It’s the invisible scaffolding around it.

If you’re building long-running agents, these 5 fixes are worth stealing.

The pattern: cost problems start as reliability problems

Agents rarely become expensive because one prompt was huge.

They become expensive because a workflow can’t confidently tell whether it succeeded.

Then it retries.

Then it retries again.

Then it does all of that on Claude Opus 4.6.

That’s how you end up paying premium-model rates for what is basically daemon maintenance.

A rough version of the bad pattern looks like this:

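let done = false

// Anti-pattern: every poll is a premium-model call carrying the full context,
// and nothing bounds how many times the loop runs.
while (!done) {
  const result = await callModel({
    model: "claude-opus-4-6",
    prompt: `Check whether the job completed. If not, decide what to do next. Context: ${hugeContext}`
  })

  if (result.saysDone) {
    done = true
  } else {
    await sleep(30000) // poll again in 30 seconds, forever
  }
}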

This looks fine in testing.

It gets ugly when it runs 24/7.
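To see why, it helps to count the calls. This is a minimal sketch, assuming the 30-second sleep from the loop above and a hypothetical 4,000-token context per call (the thread gave no token figures). It ignores model latency, so treat the result as an upper bound.

```typescript
// Back-of-the-envelope call volume for a polling loop.
// contextTokens is a hypothetical figure for illustration, not a number from the thread.
function pollingLoad(sleepMs: number, contextTokens: number) {
  const msPerDay = 24 * 60 * 60 * 1000
  const callsPerDay = Math.floor(msPerDay / sleepMs)
  return { callsPerDay, inputTokensPerDay: callsPerDay * contextTokens }
}

const pollLoad = pollingLoad(30_000, 4_000)
console.log(pollLoad.callsPerDay)       // 2880
console.log(pollLoad.inputTokensPerDay) // 11520000
```

Under those assumptions, a single idle job sends millions of input tokens a day to the most expensive model just to ask "are we done yet?"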

Fix 1: Stop using Claude Opus for heartbeat checks and cron pings

This was the clearest lesson from the thread.

Claude Opus 4.6 is great for hard reasoning. It is a bad choice for cheap supervision.

Tasks that usually should not hit your most expensive model:

heartbeat checks
cron-trigger validation
retry bookkeeping
simple routing
status classification
watchdog logic
"did this step finish?" checks

If the task is basically classification or state inspection, use a cheaper layer.

A cleaner architecture looks more like this:

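// Route each task to the cheapest handler that can do the job.
async function routeTask(task: Task) {
  if (task.type === "heartbeat") {
    return lightweightCheck(task) // plain code, no model call
  }

  if (task.type === "status_check") {
    return gpt54StatusCheck(task)
  }

  if (task.type === "deep_reasoning") {
    return claudeOpusDecision(task)
  }

  if (task.type === "synthesis") {
    return grok420Synthesis(task)
  }

  // Fail loudly on unrecognized types instead of silently returning undefined.
  throw new Error(`Unknown task type: ${task.type}`)
}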

That’s the right mental model: model triage.

Not loyalty.

Not “send everything to the smartest model.”

Just match cost to task difficulty.

My take

The loser here is the all-Claude-Opus architecture. It feels elegant until you realize your agent is using a premium model to narrate its own retries.

If a task could be implemented as a boolean check, a rules engine, or a cheap classifier, don’t wrap it in expensive reasoning.
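As an illustration of the boolean-check case, here is a minimal sketch of a heartbeat check that never touches a model. The `Heartbeat` shape, the `checkHeartbeat` name, and the 90-second threshold are all hypothetical choices, not something the thread specified.

```typescript
// A heartbeat check that needs no model call: compare timestamps.
// Heartbeat, HeartbeatStatus, and the 90-second default are hypothetical.
interface Heartbeat {
  workerId: string
  lastSeenMs: number // epoch milliseconds of the last ping
}

type HeartbeatStatus = "alive" | "stale"

function checkHeartbeat(hb: Heartbeat, nowMs: number, maxSilenceMs = 90_000): HeartbeatStatus {
  return nowMs - hb.lastSeenMs <= maxSilenceMs ? "alive" : "stale"
}

const currentMs = Date.now()
console.log(checkHeartbeat({ workerId: "w1", lastSeenMs: currentMs - 10_000 }, currentMs))  // "alive"
console.log(checkHeartbeat({ workerId: "w2", lastSeenMs: currentMs - 300_000 }, currentMs)) // "stale"
```

Only a "stale" result would need to go anywhere near a model, and often not even then.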

Fix 2: Add explicit success criteria or the agent will loop forever

A lot of agent loops are just weak definitions of done.

Bad:

“make sure the sync worked”
“confirm the task completed”
“retry if needed”

Better:

file exists at expected path
API returned HTTP 200
row count increased by 1
webhook delivered with matching job ID
CRM record status changed to processed

The thread’s OP improved reliability by making completion verifiable instead of interpretive.

That’s the difference between an agent that finishes and an agent that keeps thinking out loud.

Example:

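async function verifyJobComplete(jobId: string) {
  const res = await fetch(`https://api.example.com/jobs/${jobId}`)

  // A non-2xx response is not proof of completion, so treat it as "not done".
  if (!res.ok) return false

  const job = await res.json()

  // Done means both a terminal status and an actual output to point at.
  return job.status === "completed" && job.output_url != null
}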

Then your loop becomes:

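async function runWithVerification(jobId: string) {
  // Bounded loop: at most 5 attempts, each followed by a verifiable check.
  for (let attempt = 1; attempt <= 5; attempt++) {
    await runStep(jobId)

    const ok = await verifyJobComplete(jobId)
    if (ok) return { success: true }

    await sleep(5000)
  }

  return { success: false, reason: "verification_failed_after_5_attempts" }
}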

That’s boring code.

Boring is good.

Boring code is cheaper than “agent intuition.”
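The "better" criteria listed earlier can also be written as named predicates, so a failed run reports exactly which condition was not met. This is a sketch only: the `JobSnapshot` fields and check names are hypothetical, and you would fill the snapshot from your own API, database, and CRM lookups.

```typescript
// Named, verifiable completion checks evaluated together.
// JobSnapshot and the check names are hypothetical.
interface JobSnapshot {
  httpStatus: number
  rowCountBefore: number
  rowCountAfter: number
  crmStatus: string
}

interface Check {
  name: string
  passed: (s: JobSnapshot) => boolean
}

const completionChecks: Check[] = [
  { name: "api_returned_200", passed: s => s.httpStatus === 200 },
  { name: "row_count_increased_by_1", passed: s => s.rowCountAfter === s.rowCountBefore + 1 },
  { name: "crm_status_processed", passed: s => s.crmStatus === "processed" },
]

// Returns the names of failed checks; an empty array means the step is done.
function failedChecks(snapshot: JobSnapshot): string[] {
  return completionChecks.filter(c => !c.passed(snapshot)).map(c => c.name)
}

// Example: the API call succeeded but no row was written.
const failures = failedChecks({
  httpStatus: 200,
  rowCountBefore: 41,
  rowCountAfter: 41,
  crmStatus: "processed",
})
console.log(failures) // contains only "row_count_increased_by_1"
```

A list of failed check names also makes a far better retry reason than "the model thought it wasn't done."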

Fix 3: Put anti-loop rules in code, not just prompts

If your only loop prevention is “please do not retry excessively,” you do not have loop prevention.

You have wishful thinking.

Hard limits matter:

max retries per step
max retries per job
cooldown windows
duplicate action detection
dead-letter queue for stuck runs
escalation path to human review

A practical pattern:

const MAX_STEP_RETRIES = 3
const MAX_JOB_RETRIES = 10

async function shouldRetry(state: WorkflowState) {
  if (state.stepRetries >= MAX_STEP_RETRIES) return false
  if (state.jobRetries >= MAX_JOB_RETRIES) return false
  if (state.lastError === "invalid_input") return false
  return true
}

And log retry reasons explicitly:

{
  "jobId": "job_123",
  "step": "sync_customer",
  "retry": 2,
  "reason": "webhook_timeout",
  "nextAttemptInSeconds": 30
}

This is where a lot of teams get lazy. They let the model decide whether another retry “feels right.”

Don’t do that.

Retries are control flow. Control flow belongs in code.
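Two of the other hard limits, cooldown windows and duplicate action detection, are just as small in code. The sketch below assumes capped exponential backoff and an in-memory guard keyed on job, action, and payload. The delays, the cap, and the key format are hypothetical, and a real worker would keep the seen-keys in durable storage rather than in process memory.

```typescript
// Capped exponential backoff for cooldown windows.
// baseMs and capMs are hypothetical defaults.
function cooldownMs(attempt: number, baseMs = 5_000, capMs = 300_000): number {
  // attempt 1 -> base, attempt 2 -> 2x base, attempt 3 -> 4x base, up to the cap
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1))
}

// In-memory duplicate-action detection (hypothetical; swap in durable storage for real use).
class DuplicateGuard {
  private seen: { [key: string]: boolean } = {}

  // Returns true the first time an action is seen for a job, false on repeats.
  // Note: JSON.stringify is key-order sensitive, so build payloads consistently.
  firstTime(jobId: string, action: string, payload: unknown): boolean {
    const key = `${jobId}:${action}:${JSON.stringify(payload)}`
    if (this.seen[key]) return false
    this.seen[key] = true
    return true
  }
}

const dupGuard = new DuplicateGuard()
console.log(cooldownMs(1), cooldownMs(3), cooldownMs(10))            // 5000 20000 300000
console.log(dupGuard.firstTime("job_123", "sync_customer", { id: 7 })) // true
console.log(dupGuard.firstTime("job_123", "sync_customer", { id: 7 })) // false
```

When `firstTime` returns false, the worker can skip the action or route the run to a dead-letter queue instead of asking a model what to do.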

Fix 4: Store state in Redis or Postgres instead of re-prompting old context

This one matters a lot for long-running OpenClaw jobs.

If an agent made a decision, store it somewhere durable.

Don’t keep shoving the same history back into the prompt and hope compaction preserves the important part.

That approach breaks down fastest when your workflow crosses tools.

A realistic automation might look like this:

OpenClaw decides to start a task
n8n waits for a webhook
Make transforms the payload
Zapier updates Salesforce or HubSpot
the agent wakes up six minutes later and needs to resume

If the only memory is inside a shrinking prompt window, drift is inevitable.

If the state is in Redis or Postgres, the agent can resume from facts.

Redis example

import Redis from "ioredis"

const redis = new Redis(process.env.REDIS_URL!)

async function saveWorkflowState(jobId: string, state: object) {
  await redis.set(`workflow:${jobId}`, JSON.stringify(state), "EX", 86400)
}

async function loadWorkflowState(jobId: string) {
  const raw = await redis.get(`workflow:${jobId}`)
  return raw ? JSON.parse(raw) : null
}

Postgres example

create table workflow_state (
  job_id text primary key,
  status text not null,
  last_decision jsonb not null,
  retry_count integer not null default 0,
  updated_at timestamptz not null default now()
);
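Writing to that table is a single upsert per state change. The sketch below only builds the parameterized query, so it runs without a database. The `{ text, values }` shape matches the query config object that node-postgres (`pg`) accepts, while `WorkflowRow` and `buildUpsert` are hypothetical names.

```typescript
// Builds a parameterized upsert for the workflow_state table above.
// WorkflowRow and buildUpsert are hypothetical names.
interface WorkflowRow {
  jobId: string
  status: string
  lastDecision: object
  retryCount: number
}

function buildUpsert(row: WorkflowRow): { text: string; values: unknown[] } {
  return {
    text: `
      insert into workflow_state (job_id, status, last_decision, retry_count)
      values ($1, $2, $3::jsonb, $4)
      on conflict (job_id) do update
        set status = excluded.status,
            last_decision = excluded.last_decision,
            retry_count = excluded.retry_count,
            updated_at = now()`,
    // last_decision is serialized here and cast to jsonb in the SQL.
    values: [row.jobId, row.status, JSON.stringify(row.lastDecision), row.retryCount],
  }
}

const upsertQuery = buildUpsert({
  jobId: "job_123",
  status: "awaiting_webhook",
  lastDecision: { action: "wait for provider callback" },
  retryCount: 1,
})
// With pg this would be executed as: await pool.query(upsertQuery)
console.log(upsertQuery.values)
```

Because the row is keyed on `job_id`, replaying the same write after a crash leaves the table in the same state.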

Then your agent prompt can stay small and focused:

Job status: awaiting_webhook
Last decision: wait for provider callback
Retry count: 1
Next action options: [poll_status, mark_failed, continue_waiting]

That’s much better than pasting 4,000 tokens of historical narration back into every call.
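Generating that prompt from stored state takes a few lines. This is a sketch with hypothetical names (`StoredState`, `renderPrompt`); the field labels mirror the example prompt above.

```typescript
// Renders the compact prompt from stored state rather than from chat history.
// StoredState and renderPrompt are hypothetical names.
interface StoredState {
  status: string
  lastDecision: string
  retryCount: number
  nextActions: string[]
}

function renderPrompt(s: StoredState): string {
  return [
    `Job status: ${s.status}`,
    `Last decision: ${s.lastDecision}`,
    `Retry count: ${s.retryCount}`,
    `Next action options: [${s.nextActions.join(", ")}]`,
  ].join("\n")
}

console.log(
  renderPrompt({
    status: "awaiting_webhook",
    lastDecision: "wait for provider callback",
    retryCount: 1,
    nextActions: ["poll_status", "mark_failed", "continue_waiting"],
  })
)
```

The prompt stays the same size on the hundredth wake-up as on the first, because it is rebuilt from facts instead of accumulated.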

My take

A lot of teams pay premium model costs to compensate for weak state handling.

That’s backwards.

Better state is cheaper than better prompting.

Fix 5: Separate orchestration from reasoning

This is the architectural version of the first four fixes.

Use code for orchestration.
Use models for reasoning.

Not the other way around.

Your worker should own:

retries
scheduling
idempotency
state transitions
timeout handling
webhook correlation
rate limiting

Your model should own:

ambiguous classification
planning when rules are insufficient
summarization
extraction when structure is messy
non-trivial decision-making

A simple split:

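async function processJob(job: Job) {
  const state = await loadWorkflowState(job.id)

  // loadWorkflowState returns null for unknown or expired jobs; fail before dereferencing.
  if (!state) {
    throw new Error(`No stored state for job: ${job.id}`)
  }

  // Code picks the next step from stored state; models only run inside a step.
  switch (state.status) {
    case "awaiting_classification":
      return classifyWithGPT54(job)

    case "awaiting_complex_decision":
      return decideWithClaudeOpus(job)

    case "awaiting_status_check":
      return pollProviderAPI(job) // plain API call, no model

    case "awaiting_synthesis":
      return synthesizeWithGrok(job)

    default:
      throw new Error(`Unknown state: ${state.status}`)
  }
}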

This is less magical than “autonomous agent does everything.”

It’s also much more reliable.
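One way to keep state transitions in the worker is an explicit transition table that code checks before any status change is saved. The sketch below reuses the status names from the switch above. The `done` and `failed` states and the specific allowed transitions are hypothetical, so adjust them to your own workflow.

```typescript
// An explicit transition table so only code can move a job between states.
// "done", "failed", and the allowed edges below are hypothetical.
type Status =
  | "awaiting_classification"
  | "awaiting_complex_decision"
  | "awaiting_status_check"
  | "awaiting_synthesis"
  | "done"
  | "failed"

const allowedTransitions: Record<Status, Status[]> = {
  awaiting_classification: ["awaiting_complex_decision", "awaiting_status_check", "failed"],
  awaiting_complex_decision: ["awaiting_status_check", "failed"],
  awaiting_status_check: ["awaiting_status_check", "awaiting_synthesis", "failed"],
  awaiting_synthesis: ["done", "failed"],
  done: [],
  failed: [],
}

// Returns the new status, or throws if the move is not in the table.
function transition(from: Status, to: Status): Status {
  if (allowedTransitions[from].indexOf(to) === -1) {
    throw new Error(`Illegal transition: ${from} -> ${to}`)
  }
  return to
}

console.log(transition("awaiting_status_check", "awaiting_synthesis")) // "awaiting_synthesis"
```

If a model output ever suggests jumping from `done` back to `awaiting_synthesis`, the worker rejects it instead of starting another loop.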

What changed after these fixes

The thread’s reported result was the kind of improvement that actually changes workflow design:

spend dropped to about one-third
loops were reduced
reliability improved
long-running jobs stopped losing the plot

That sequence makes sense.

First, move cheap recurring work off expensive models.
Then define what success actually means.
Then stop retries from becoming infinite.
Then give the agent durable state.
Finally, keep orchestration in code so the model only handles reasoning.

Once you do that, you stop paying for confusion.

The practical checklist

If you’re running OpenClaw agents or similar automations, here’s the checklist I’d use:

Fix	What to do
Model triage	Keep Claude Opus 4.6 for hard reasoning. Use GPT-5.4 or cheaper logic for status checks, routing, and supervision.
Verifiable completion	End every important step with a testable success condition.
Anti-loop controls	Set max retries, cooldowns, duplicate detection, and dead-letter handling in code.
Durable state	Store decisions in Redis, Postgres, or OpenClaw memory features instead of bloating prompts.
Orchestration split	Let code manage workflow control flow; let models handle actual reasoning.

Why this matters more under per-token billing

This is the part people notice late.

Per-token pricing punishes exactly the kind of behavior serious automations need:

watchdog checks
retries
polling
long-running supervision
cross-tool coordination

In a chat app, one bad retry is annoying.

In OpenClaw, n8n, Make, Zapier, or a custom queue, one bad retry pattern can run every few minutes forever.

That’s why predictable pricing matters more as agents get more useful.

The more background calls your system needs, the worse token anxiety gets.

If you’re running agents continuously, a flat-cost API setup is often a better fit than metering every tiny supervision call. Standard Compute is interesting here because it keeps the OpenAI-compatible API shape developers already use, but swaps per-token pricing for a predictable monthly cost. That makes a lot more sense for always-on automations than staring at usage charts and hoping your watchdog logic behaves.

Final thought

The best part of that OpenClaw thread was that it didn’t pretend the answer was “just use a smarter model.”

It was the opposite.

Use Claude Opus 4.6 when the task deserves Claude Opus 4.6.
Use GPT-5.4 for lighter decisions.
Use Grok 4.20 when synthesis is the actual job.
And don’t ask premium models to babysit your infrastructure.

If a workflow can’t prove it finished, it will eventually loop.
If state only lives in prompts, it will eventually drift.
If retries are controlled by vibes, they will eventually get expensive.

That’s not just an OpenClaw lesson.

That’s the operating manual for any long-running AI automation.

If you’re building one right now, start by auditing every model call that happens when nothing interesting is happening.

That’s usually where the money is going.
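A first pass at that audit can be a tally of logged model calls by task type. This is a sketch under assumptions: the `CallLog` shape is hypothetical, and the sample numbers are made up purely to show the output format.

```typescript
// Tallies logged model calls by task type to show where tokens go.
// CallLog is a hypothetical shape; adapt it to whatever your worker records.
interface CallLog {
  taskType: string
  model: string
  tokens: number
}

// Returns each task type's share of total tokens, as a fraction between 0 and 1.
function tokenShareByTask(logs: CallLog[]): { [taskType: string]: number } {
  let total = 0
  const byTask: { [taskType: string]: number } = {}
  for (const log of logs) {
    total += log.tokens
    byTask[log.taskType] = (byTask[log.taskType] || 0) + log.tokens
  }

  const share: { [taskType: string]: number } = {}
  for (const taskType of Object.keys(byTask)) {
    share[taskType] = byTask[taskType] / total
  }
  return share
}

// Made-up sample data, for illustration only.
const sampleLogs: CallLog[] = [
  { taskType: "heartbeat", model: "claude-opus-4-6", tokens: 6000 },
  { taskType: "status_check", model: "claude-opus-4-6", tokens: 3000 },
  { taskType: "deep_reasoning", model: "claude-opus-4-6", tokens: 1000 },
]
console.log(tokenShareByTask(sampleLogs))
```

If supervision task types dominate the result, that is the work to move off the premium model first.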

Top comments (1)

Harpinder • May 22

this matches what i've been seeing too.

one thing i'd add: for inbox/calendar-style background work, the answer isn't always "use a cheaper model for cron". a cleaner split is often to move the trigger/filter out of the agent loop entirely, then wake the agent only when a matching event exists.

small founder disclosure: i'm building Watchline for this pattern, including a first-party OpenClaw plugin. start_watch registers the future event/filter, then a pull channel delivers only matched events back into the local OpenClaw session. that feels a lot closer to how these systems should run than asking the agent to keep checking the world every few minutes.