Lars Winstand

Originally published at standardcompute.com

I kept seeing the same OpenClaw mistake: one expensive model for every job

I kept running into the same OpenClaw setup mistake: people pick one expensive model, wire it in as the default, and then let it handle everything.

Heartbeat checks.
Cron pings.
Inbox triage.
"Nothing changed" loops.
Low-stakes tagging.

That is not a clever agent architecture. That is just an expensive default.

While researching OpenClaw setups, I found a thread on r/openclaw where someone said the quiet part out loud:

“Stop using opus for everything. seriously. i was running it on heartbeat checks and cron pings which is just lighting money on fire. glm-5.1 handles all that stuff fine. i only use sonnet 4.6 now when the task actually needs reasoning and my token costs are like a third of what they were”

That is the right lesson.

Not just for OpenClaw.
For n8n, Make, Zapier, custom Python workers, and basically any agent setup that runs on a schedule.

If you are still using one premium model for every task, you do not have a model strategy. You have a billing strategy you forgot to review.

The actual takeaway: route by task, not by brand loyalty

A lot of developers still treat model selection like a global app setting.

Pick GPT-5.4.
Or Claude Opus 4.6.
Or Gemini 3.5 Flash.
Done.

That works for a demo.
It falls apart in production.

Real agent systems do different kinds of work:

  • cheap classification
  • extraction
  • tagging
  • summarization
  • retries
  • memory maintenance
  • occasional hard reasoning
  • occasional high-risk decisions

Those should not all hit the same model.

The boring jobs should be cheap.
The hard jobs should get the expensive model.
The dangerous jobs should get the model that passes your evals.

That is model routing.
And honestly, it is just basic engineering once your workflows run all day.
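A minimal sketch of what routing by task could look like in a custom Python worker. The tier names, the task kinds, and the `pick_tier` helper are hypothetical and are not part of OpenClaw; the mapping simply mirrors the list of work types above.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"
    MID = "mid"
    PREMIUM = "premium"


# Hypothetical task kinds mapped to tiers; adjust to your own workload.
TIER_BY_TASK = {
    "classification": Tier.CHEAP,
    "extraction": Tier.CHEAP,
    "tagging": Tier.CHEAP,
    "memory_maintenance": Tier.CHEAP,
    "summarization": Tier.MID,
    "hard_reasoning": Tier.PREMIUM,
    "high_risk_decision": Tier.PREMIUM,
}


def pick_tier(task_kind: str) -> Tier:
    """Return the tier for a task kind; unknown kinds go to the premium tier."""
    return TIER_BY_TASK.get(task_kind, Tier.PREMIUM)
```

Sending unknown task kinds to the premium tier is a deliberate assumption here: a new, unclassified job costs more until someone reviews it, rather than silently running on a model nobody tested it against.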

OpenClaw already nudges you toward this

One thing I like about OpenClaw is that the config shape already hints at the right mental model.

You can define a primary model and ordered fallbacks.
You can also split out image, PDF, and image generation models.

That is not accidental.
That is the product telling you different tasks deserve different models.

Example:

agents:
  defaults:
    model:
      primary: openai/gpt-5.4-mini
      fallbacks:
        - anthropic/claude-sonnet-4.6
        - google/gemini-3.5-flash
    imageModel: google/gemini-3.5-flash
    pdfModel: openai/gpt-5.4
    imageGenerationModel: openai/gpt-image-1

Once you look at OpenClaw this way, the Reddit advice stops sounding like a hack.
It starts sounding like the intended operating model.

Why people keep wasting frontier models on tiny jobs

Because "agent work" sounds smarter than it usually is.

A heartbeat check feels sophisticated because an agent is doing it.
A cron-triggered inbox review feels important because it uses AI.
But a lot of recurring automation work is just:

  • classify this
  • summarize that
  • compare two notes
  • tag a ticket
  • decide whether anything changed
  • move on

That is exactly where smaller, cheaper models win.

One commenter in the same thread said it perfectly:

“No reason to burn opus tokens on a cron check that runs every 10 minutes.”

Yep.

If a task runs every 10 minutes, you are not choosing a model once.
You are choosing it 144 times per day.
Then you multiply that by every queue, retry loop, mailbox, and background task you forgot was still running.
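The arithmetic is worth writing down once. This is a small sketch; the `jobs` parameter is an illustrative stand-in for however many queues, mailboxes, or background tasks share the same schedule.

```python
MINUTES_PER_DAY = 24 * 60


def calls_per_day(interval_minutes: int, jobs: int = 1) -> int:
    """Model calls per day for jobs that each fire once per interval."""
    return (MINUTES_PER_DAY // interval_minutes) * jobs


print(calls_per_day(10))          # 144
print(calls_per_day(10, jobs=6))  # 864
```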

The pricing spread is big enough that bad defaults compound fast

This is where the mistake stops being theoretical.

Here is the rough shape of the cost difference across common automation-friendly models:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | What it means for automation work |
|---|---|---|---|
| GPT-5.4 | $2.50 | $15.00 | Best kept for hard reasoning and high-value steps |
| GPT-5.4-mini | $0.75 | $4.50 | Solid default for routine transforms and summaries |
| GPT-5.4-nano | $0.20 | $1.25 | Strong candidate for heartbeat checks, classifiers, and cron work |
| Gemini 3.5 Flash | $1.50 | $9.00 | Usable for recurring admin tasks and batch workflows |

The exact numbers will change over time.
The important part is the spread.

If your default is a frontier model, every low-value task inherits premium pricing.
And retries make it worse.
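To make the spread concrete, here is a rough cost sketch using the prices from the table above. The token counts (2,000 input and 200 output per call) and the 30-day month are assumptions picked for illustration, not measurements.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
    "gemini-3.5-flash": (1.50, 9.00),
}


def monthly_cost(model, input_tokens, output_tokens, calls_per_day, days=30):
    """Estimated monthly spend for one recurring task on one model."""
    in_price, out_price = PRICES[model]
    per_call = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_call * calls_per_day * days


# Assumed workload: one check every 10 minutes (144 calls per day).
for model in PRICES:
    cost = monthly_cost(model, input_tokens=2_000, output_tokens=200, calls_per_day=144)
    print(f"{model:18s} ${cost:.2f}")
```

Under those assumptions, a single 10-minute check comes out to about $34.56 per month on GPT-5.4 and about $2.81 on GPT-5.4-nano, before any retries. Plug in your own token counts; the ratio matters more than the absolute numbers.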

OpenClaw memory makes this even more obvious

OpenClaw’s memory model is one of the more practical parts of the system.

Instead of pretending memory is some magical hidden state, it writes durable state to files like:

  • MEMORY.md
  • memory/YYYY-MM-DD.md
  • optional DREAMS.md

That means a lot of recurring "agentic" work is really just file maintenance plus lightweight judgment.

Things like:

  • checking whether today’s notes contain anything worth promoting
  • summarizing a session into MEMORY.md
  • tagging daily notes
  • triaging low-priority email
  • deciding whether something needs escalation
  • confirming a scheduled task completed normally

That does not automatically require Claude Opus 4.6 or GPT-5.4.

If the model is reading yesterday’s note, reading today’s note, and deciding whether to append one sentence, you probably want cheap and consistent, not frontier-tier reasoning.

Retries and fallbacks are where expensive defaults get really dumb

OpenClaw supports failover and fallback chains.
That is good.
You want that.

But fallback logic changes the economics.

If your default model is expensive, you do not just overpay once.
You overpay on:

  • the initial call
  • the retry
  • the fallback attempt
  • the loop you forgot to cap

That is why background jobs are dangerous.
They are easy to ignore, and they quietly multiply usage.

A cron task every 30 minutes does not feel expensive.
A cron task every 30 minutes for weeks, with retries, definitely is.
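A quick upper-bound sketch of how retries and fallbacks multiply calls. The chain length and retry count are assumed values, and the formula is the worst case where every attempt fails until the last one.

```python
def worst_case_calls(runs_per_day: int, models_in_chain: int, retries_per_model: int) -> int:
    """Upper bound on daily calls if every attempt fails until the last one."""
    attempts_per_model = 1 + retries_per_model
    return runs_per_day * models_in_chain * attempts_per_model


runs = 24 * 60 // 30  # a cron task every 30 minutes is 48 runs per day
print(worst_case_calls(runs, models_in_chain=1, retries_per_model=0))  # 48
print(worst_case_calls(runs, models_in_chain=3, retries_per_model=2))  # 432
```

Most days will land far below the worst case, but the bound is what your budget has to survive during an outage.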

A sane routing policy for OpenClaw

This is the practical version.

Use smaller models for:

  • heartbeat checks
  • cron pings
  • simple classification
  • tagging
  • deduping
  • queue cleanup
  • low-risk summarization

Use mid-tier models for:

  • routine transforms
  • memory promotion drafts
  • support triage with some ambiguity
  • structured extraction with moderate complexity

Use premium models for:

  • hard reasoning
  • ambiguous multi-step tool use
  • sensitive customer-facing responses
  • compliance-sensitive decisions
  • destructive actions

Here is a simple default stack:

agents:
  defaults:
    model:
      primary: openai/gpt-5.4-nano
      fallbacks:
        - openai/gpt-5.4-mini
        - anthropic/claude-sonnet-4.6
        - openai/gpt-5.4

Then override specific tasks that actually need the heavier model.

That setup is less pretty than "we use Claude for everything" or "we standardized on GPT-5.4."

It is also much more competent.

Don’t route by price alone

Cheap routing can absolutely backfire.

A task can look simple but still be failure-sensitive.

Examples:

  • approving refunds
  • sending customer-facing messages
  • deciding whether to escalate a compliance issue
  • triggering a destructive action in a tool chain

Those should not be assigned by vibes.

Two rules help a lot:

1. Route by consequence, not just complexity

A simple classifier can still be dangerous.
If the output controls money, customer trust, or irreversible actions, treat it as high-risk.
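One way to encode that rule, as a sketch: the `Task` fields and the `tier_for` helper are hypothetical names, and mapping every high-consequence task straight to the premium tier is an assumption you should check against your own evals.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    complexity: str               # "cheap", "mid", or "premium": how hard the reasoning is
    controls_money: bool = False
    customer_facing: bool = False
    irreversible: bool = False


def tier_for(task: Task) -> str:
    """High-consequence tasks get the premium tier regardless of complexity."""
    if task.controls_money or task.customer_facing or task.irreversible:
        return "premium"
    return task.complexity


print(tier_for(Task("tag_ticket", "cheap")))                           # cheap
print(tier_for(Task("approve_refund", "cheap", controls_money=True)))  # premium
```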

2. Route by evals, not marketing

A cheaper model that fails your real prompts is not cheaper.
It is just a slower way to ship bugs.

If Gemini 3.5 Flash, GLM-5.1, GPT-5.4-nano, or Claude Sonnet 4.6 passes your actual eval set for a task, great.
Use it.
If it fails, move up.

That is routing.
Not ideology.
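The second rule can be sketched as "pick the cheapest model that clears your pass-rate threshold." The `pass_rate` callable stands in for whatever eval harness you run, the 0.95 threshold is an assumption, and the pass rates in the example are made up purely to show the control flow.

```python
from typing import Callable, Optional, Sequence


def cheapest_passing(
    models_cheapest_first: Sequence[str],
    pass_rate: Callable[[str], float],
    threshold: float = 0.95,
) -> Optional[str]:
    """Return the first (cheapest) model whose eval pass rate meets the threshold."""
    for model in models_cheapest_first:
        if pass_rate(model) >= threshold:
            return model
    return None  # nothing passed: fix the prompt or the evals before shipping


# Made-up pass rates for illustration only; real numbers come from your eval set.
fake_results = {"gpt-5.4-nano": 0.91, "gpt-5.4-mini": 0.97, "gpt-5.4": 0.99}
print(cheapest_passing(list(fake_results), fake_results.__getitem__))  # gpt-5.4-mini
```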

Quick way to audit your current setup

If you already have OpenClaw running, here is a dead simple audit process.

1. List every recurring task

Make a table for:

  • task name
  • trigger frequency
  • current model
  • failure impact
  • average prompt size
  • average output size

2. Find the obviously overpriced jobs

Look for tasks that are:

  • frequent
  • repetitive
  • low-risk
  • easy to evaluate

Those are your first routing wins.
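Steps 1 and 2 fit in a few lines once the table exists. This is a sketch with a hypothetical inventory; the field names follow the table columns above, and the filter only covers "frequent" and "low-risk" because "repetitive" and "easy to evaluate" need human judgment.

```python
from dataclasses import dataclass


@dataclass
class RecurringTask:
    name: str
    runs_per_day: int
    current_model: str
    failure_impact: str      # "low", "medium", or "high"
    avg_prompt_tokens: int
    avg_output_tokens: int


def routing_candidates(tasks, premium_models, min_runs_per_day=24):
    """Frequent, low-risk tasks that currently run on a premium model."""
    hits = [
        t for t in tasks
        if t.runs_per_day >= min_runs_per_day
        and t.failure_impact == "low"
        and t.current_model in premium_models
    ]
    # Largest daily token volume first, since that is where the savings are.
    return sorted(
        hits,
        key=lambda t: t.runs_per_day * (t.avg_prompt_tokens + t.avg_output_tokens),
        reverse=True,
    )


# Hypothetical inventory for illustration.
tasks = [
    RecurringTask("heartbeat", 144, "claude-opus-4.6", "low", 800, 50),
    RecurringTask("refund_approval", 20, "claude-opus-4.6", "high", 3000, 400),
    RecurringTask("ticket_tagging", 300, "gpt-5.4-nano", "low", 600, 30),
]
for t in routing_candidates(tasks, premium_models={"claude-opus-4.6", "gpt-5.4"}):
    print(t.name)  # heartbeat
```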

3. Create a cheap-first policy

Start with a small model for low-risk jobs.
Escalate only if evals say you need to.

4. Cap loops and retries

If a job can retry forever, your pricing model is already broken.
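A minimal sketch of a hard cap, assuming your model call is wrapped in a plain Python callable. `call_with_cap` and `RetryBudgetExceeded` are illustrative names, not part of OpenClaw or any client library.

```python
import time


class RetryBudgetExceeded(Exception):
    """Raised when a job has used up all of its allowed attempts."""


def call_with_cap(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() up to max_attempts times, then give up instead of looping."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:  # narrow this to your client's error types
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts") from last_error
```

The point of raising instead of retrying again is that a failed job becomes a visible error rather than a silent stream of billable calls.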

5. Measure before and after

Even a rough comparison is enough:

# pseudo-checklist
# before
# - model: claude-opus-4.6
# - task frequency: every 10 minutes
# - retries: 2
# - monthly usage: painful

# after
# - default: gpt-5.4-nano
# - escalate only on low confidence / failed eval cases
# - monthly usage: much less painful

If you’re building agents at scale, per-token pricing becomes a workflow problem

This is the part people eventually learn the hard way.

Per-token billing is annoying enough in interactive chat apps.
In automations, it is worse.

Because the expensive calls are often not the flashy ones.
They are the boring background ones:

  • scheduled checks
  • agent loops
  • retries
  • nightly summaries
  • queue maintenance
  • memory updates

That is exactly why routing matters.
And it is also why a flat-cost API setup is appealing for teams running lots of recurring agent work.

If your agents are constantly working through n8n, Make, Zapier, OpenClaw, or custom queues, the real pain is not just token price.
It is the constant need to babysit usage.

That is the problem Standard Compute is aimed at.

It gives you an OpenAI-compatible API with flat monthly pricing, so you can keep the routing mindset without getting punished every time your automations actually run.

The useful combo is:

  • route small jobs to cheaper models
  • reserve bigger models for hard steps
  • stop treating every cron task like it deserves frontier pricing
  • stop watching token spend like a stress dashboard

More here if that sounds familiar: https://standardcompute.com

The real tell that someone is new to agents

They brag about which model they use.

People who have actually operated automations for weeks brag about which tasks they stopped wasting expensive models on.

That is why that OpenClaw thread stuck with me.
The useful lesson was not "GLM-5.1 is secretly amazing" or "Claude Sonnet 4.6 is enough."

It was the shift underneath:

Your agent is a workflow, not a shrine to your favorite model.

Once you see that, model routing stops looking like an optimization trick.
It starts looking like basic competence.

If a heartbeat check is hitting Claude Opus 4.6 every 10 minutes, that is not sophistication.
It is a leak.

And if your setup still uses one expensive model for everything, you probably do not need a better prompt first.

You need a routing policy.
