<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lars Winstand</title>
    <description>The latest articles on DEV Community by Lars Winstand (@lars_winstand).</description>
    <link>https://dev.to/lars_winstand</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3908932%2Feb8bc1ff-405f-4ef0-8204-ba1ed7caa59f.jpeg</url>
      <title>DEV Community: Lars Winstand</title>
      <link>https://dev.to/lars_winstand</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lars_winstand"/>
    <language>en</language>
    <item>
      <title>r/openclaw had 40 comments about “better alternatives” and the mods are only half right</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Fri, 22 May 2026 19:30:50 +0000</pubDate>
      <link>https://dev.to/lars_winstand/ropenclaw-had-40-comments-about-better-alternatives-and-the-mods-are-only-half-right-150m</link>
      <guid>https://dev.to/lars_winstand/ropenclaw-had-40-comments-about-better-alternatives-and-the-mods-are-only-half-right-150m</guid>
      <description>&lt;p&gt;I found a thread on r/openclaw with 14 upvotes and 40 comments asking a simple question: why are people not allowed to mention “better alternatives” to OpenClaw?&lt;/p&gt;

&lt;p&gt;At first glance, this looks like standard product-community drama.&lt;/p&gt;

&lt;p&gt;Read the comments, though, and it turns into something more useful for anyone building agents, automations, or LLM workflows.&lt;/p&gt;

&lt;p&gt;My take: the mods are right about spam. They’re wrong about trust.&lt;/p&gt;

&lt;p&gt;And the bigger lesson has almost nothing to do with subreddit rules.&lt;/p&gt;

&lt;p&gt;It’s about what happens when an agent stack is expensive, unstable, and hard to reason about.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem was probably spam, not competition
&lt;/h2&gt;

&lt;p&gt;The highest-signal comment in the thread said the subreddit had been flooded with Hermes spam for months.&lt;/p&gt;

&lt;p&gt;Another commenter said bots were posting low-value competitor mentions and derailing support threads.&lt;/p&gt;

&lt;p&gt;If you’ve ever moderated a technical community, that part is easy to believe.&lt;/p&gt;

&lt;p&gt;A support thread starts like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: Why did /think stop working after the update?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And five replies later it becomes this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;just switch to Hermes
use Codex Desktop instead
OpenClaw is dead
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s not comparison. That’s thread hijacking.&lt;/p&gt;

&lt;p&gt;So yes, I get why mods would clamp down.&lt;/p&gt;

&lt;p&gt;If every OpenClaw bug report turns into a migration ad, the subreddit stops being useful for actual OpenClaw users.&lt;/p&gt;

&lt;h2&gt;
  
  
  But the user frustration is also real
&lt;/h2&gt;

&lt;p&gt;The anti-alternative rule would feel reasonable if OpenClaw were boring and reliable.&lt;/p&gt;

&lt;p&gt;That is not the vibe I got from nearby posts.&lt;/p&gt;

&lt;p&gt;Users were complaining about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;regressions between versions&lt;/li&gt;
&lt;li&gt;missing UI elements after updates&lt;/li&gt;
&lt;li&gt;behavior changes without much warning&lt;/li&gt;
&lt;li&gt;agent quality getting worse after upgrades&lt;/li&gt;
&lt;li&gt;surprisingly high API costs for mediocre output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one matters a lot.&lt;/p&gt;

&lt;p&gt;One user described a cron job summarizing email and spending around &lt;code&gt;$0.25&lt;/code&gt; on Claude 4.6 Sonnet to summarize 10 messages, with output they still thought was low quality.&lt;/p&gt;

&lt;p&gt;That’s the moment when “what’s a better alternative?” stops being tribalism and starts being architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hidden argument: people aren’t comparing apps, they’re comparing failure modes
&lt;/h2&gt;

&lt;p&gt;Most of these threads pretend the debate is:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Better or worse?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hermes&lt;/td&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex Desktop&lt;/td&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That’s too shallow.&lt;/p&gt;

&lt;p&gt;What people are actually comparing is this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;What they really mean&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Is OpenClaw bad?&lt;/td&gt;
&lt;td&gt;Is the workflow unreliable?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Is Hermes better?&lt;/td&gt;
&lt;td&gt;Is it cheaper or less annoying?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Should I switch?&lt;/td&gt;
&lt;td&gt;Can I get acceptable output with fewer moving parts?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That’s why these discussions get heated. People say “tool choice,” but they mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model quality&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;API cost&lt;/li&gt;
&lt;li&gt;update stability&lt;/li&gt;
&lt;li&gt;how much babysitting the workflow needs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of “this agent framework sucks” is really “my model routing is bad and I’m paying too much for weak results.”&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenClaw may not be the main bottleneck
&lt;/h2&gt;

&lt;p&gt;This was the most interesting part of the whole thing.&lt;/p&gt;

&lt;p&gt;One commenter basically said OpenClaw itself has little to do with the reasoning quality.&lt;/p&gt;

&lt;p&gt;I think that’s mostly correct.&lt;/p&gt;

&lt;p&gt;For many agent workflows, the real bottlenecks are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the model you picked&lt;/li&gt;
&lt;li&gt;the latency budget you can tolerate&lt;/li&gt;
&lt;li&gt;whether the task should be an agent at all&lt;/li&gt;
&lt;li&gt;whether your API bill makes the whole thing stupid&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s a practical way to think about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before blaming the framework, test the workflow shape
&lt;/h2&gt;

&lt;p&gt;If you’re building something like email triage, lead enrichment, or a Telegram assistant, don’t start with “which agent framework wins?”&lt;/p&gt;

&lt;p&gt;Start with this checklist.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Can this be a deterministic workflow?
&lt;/h3&gt;

&lt;p&gt;A lot of “agent” tasks should really be a pipeline.&lt;/p&gt;

&lt;p&gt;For example, this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fetch unread emails -&amp;gt; summarize -&amp;gt; classify -&amp;gt; send digest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;is often better as n8n or Make than a freeform autonomous loop.&lt;/p&gt;

&lt;p&gt;Example pseudo-flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cron -&amp;gt; fetch emails -&amp;gt; batch messages -&amp;gt; summarize -&amp;gt; store result -&amp;gt; notify Slack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the task has a fixed sequence, use a fixed sequence.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Is the model too expensive for the job?
&lt;/h3&gt;

&lt;p&gt;If you’re spending premium-model money on low-value summarization, you may not have a framework problem.&lt;/p&gt;

&lt;p&gt;You may have a routing problem.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Bad routing:
- Claude Opus / Sonnet for every summary
- GPT-5 for every classification
- no batching

Better routing:
- cheaper model for triage
- stronger model only for ambiguous items
- batch related prompts together
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is exactly why flat-rate compute is becoming more attractive for automation teams. Once you have cron jobs, background agents, retries, and multi-step workflows, per-token pricing starts punishing experimentation.&lt;/p&gt;

&lt;p&gt;That’s the part a lot of these subreddit fights miss.&lt;/p&gt;

&lt;h2&gt;
  
  
  The alternatives are not obviously better either
&lt;/h2&gt;

&lt;p&gt;This is where the “just switch” crowd loses me.&lt;/p&gt;

&lt;p&gt;Hermes gets recommended constantly, but enough people complained about spammy promotion that it triggered a moderation rule.&lt;/p&gt;

&lt;p&gt;Codex Desktop gets mentioned as a simpler option, especially for coding-heavy tasks, but it’s narrower than a general-purpose agent stack.&lt;/p&gt;

&lt;p&gt;Some users say goclaw feels lighter than OpenClaw. Fair. Lighter is good.&lt;/p&gt;

&lt;p&gt;But “lighter” is not the same as “better for production automation.”&lt;/p&gt;

&lt;p&gt;Here’s the more honest comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it seems best at&lt;/th&gt;
&lt;th&gt;Main tradeoff&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;Broad agent workflows and ambitious setups&lt;/td&gt;
&lt;td&gt;Users report regressions, complexity, and API cost pain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hermes&lt;/td&gt;
&lt;td&gt;Frequently recommended as an alternative&lt;/td&gt;
&lt;td&gt;Reputation gets hurt by spammy promotion and mixed results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex Desktop&lt;/td&gt;
&lt;td&gt;Simpler coding-focused workflows&lt;/td&gt;
&lt;td&gt;Narrower scope than a general agent orchestration stack&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;There is no magic winner here.&lt;/p&gt;

&lt;p&gt;A bad model choice can make every one of these look dumb.&lt;/p&gt;

&lt;h2&gt;
  
  
  The best comment nobody quite made: reliability beats ambition
&lt;/h2&gt;

&lt;p&gt;One nearby post described a “perfect agent system” as a Telegram butler named &lt;code&gt;Alfred&lt;/code&gt; coordinating specialist agents.&lt;/p&gt;

&lt;p&gt;Something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Alfred
├── coder_agent
├── email_agent
└── notion_agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sounds great.&lt;/p&gt;

&lt;p&gt;It probably demos great too.&lt;/p&gt;

&lt;p&gt;But if it breaks every other update, the architecture stops mattering.&lt;/p&gt;

&lt;p&gt;This is the thing agent builders need to hear more often:&lt;/p&gt;

&lt;p&gt;The killer feature is not multi-agent orchestration.&lt;/p&gt;

&lt;p&gt;The killer feature is reliability on a random Tuesday.&lt;/p&gt;

&lt;p&gt;If your workflow survives version bumps, handles retries, stays within budget, and produces consistent output, people will forgive a lot.&lt;/p&gt;

&lt;p&gt;If it doesn’t, they start shopping.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the mods should probably do instead
&lt;/h2&gt;

&lt;p&gt;Blanket bans on mentioning alternatives are too blunt.&lt;/p&gt;

&lt;p&gt;They solve the moderation problem by creating a credibility problem.&lt;/p&gt;

&lt;p&gt;A better rule set would look like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;no drive-by “use Hermes” replies&lt;/li&gt;
&lt;li&gt;no bot posting or affiliate-style promotion&lt;/li&gt;
&lt;li&gt;alternatives allowed when directly relevant to debugging, architecture, or cost&lt;/li&gt;
&lt;li&gt;side-by-side comparisons go in dedicated threads&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That keeps support threads usable without pretending OpenClaw exists in a vacuum.&lt;/p&gt;

&lt;p&gt;Because it doesn’t.&lt;/p&gt;

&lt;p&gt;Anyone building real automations is already comparing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw vs Hermes&lt;/li&gt;
&lt;li&gt;agent vs workflow engine&lt;/li&gt;
&lt;li&gt;GPT-5 vs Claude Opus vs cheaper models&lt;/li&gt;
&lt;li&gt;per-token APIs vs flat-rate compute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That comparison is not disloyalty. It’s engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical takeaway for developers building agents
&lt;/h2&gt;

&lt;p&gt;If your team is evaluating agent stacks, don’t ask only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Which framework is best?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ask this instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What is the cheapest reliable architecture that gets this job done?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That usually means testing four things separately:&lt;/p&gt;

&lt;h3&gt;
  
  
  Framework
&lt;/h3&gt;

&lt;p&gt;Can OpenClaw, Hermes, or Codex Desktop actually execute the workflow cleanly?&lt;/p&gt;

&lt;h3&gt;
  
  
  Model
&lt;/h3&gt;

&lt;p&gt;Does this task really need a top-tier model every time?&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost
&lt;/h3&gt;

&lt;p&gt;Will this still make sense when it runs 24/7?&lt;/p&gt;

&lt;h3&gt;
  
  
  Operations
&lt;/h3&gt;

&lt;p&gt;What happens after updates, retries, rate limits, and bad outputs?&lt;/p&gt;

&lt;p&gt;A quick evaluation matrix helps.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What to test&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Workflow shape&lt;/td&gt;
&lt;td&gt;deterministic pipeline vs autonomous agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model choice&lt;/td&gt;
&lt;td&gt;premium model vs cheaper router path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost profile&lt;/td&gt;
&lt;td&gt;per-run cost, retry cost, monthly ceiling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stability&lt;/td&gt;
&lt;td&gt;update regressions, latency spikes, failure recovery&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you skip any of those, you can easily blame the wrong thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Standard Compute fits into this
&lt;/h2&gt;

&lt;p&gt;The reason this OpenClaw thread matters is that it exposes a pattern I keep seeing across agent communities:&lt;/p&gt;

&lt;p&gt;people think they are arguing about tools, but they are actually arguing about compute economics.&lt;/p&gt;

&lt;p&gt;If your automation stack is built on per-token billing, every bad retry, long context window, and overpowered model choice becomes a tax on experimentation.&lt;/p&gt;

&lt;p&gt;That’s brutal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;n8n agents&lt;/li&gt;
&lt;li&gt;Make automations&lt;/li&gt;
&lt;li&gt;Zapier AI steps&lt;/li&gt;
&lt;li&gt;OpenClaw workflows&lt;/li&gt;
&lt;li&gt;custom cron-driven agent systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Standard Compute is interesting because it attacks that specific pain point.&lt;/p&gt;

&lt;p&gt;It gives you an OpenAI-compatible API with flat monthly pricing instead of per-token billing, plus routing across models like GPT-5.4, Claude Opus 4.6, and Grok 4.20.&lt;/p&gt;

&lt;p&gt;So if your real problem is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my agent works, but every test run feels like I'm lighting money on fire
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;that’s a different class of fix than switching from OpenClaw to Hermes.&lt;/p&gt;

&lt;p&gt;You can keep your existing SDKs and clients and swap the economics underneath.&lt;/p&gt;

&lt;p&gt;That matters more than people admit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;The mods are probably right that r/openclaw needed spam control.&lt;/p&gt;

&lt;p&gt;They’re wrong if they think banning mention of alternatives restores confidence.&lt;/p&gt;

&lt;p&gt;Confidence comes from stable releases, reliable workflows, sane costs, and honest comparisons.&lt;/p&gt;

&lt;p&gt;Once users are paying too much for brittle automations, the moderation fight is already downstream of the real issue.&lt;/p&gt;

&lt;p&gt;By that point, nobody is asking what they’re allowed to say.&lt;/p&gt;

&lt;p&gt;They’re asking what still works.&lt;/p&gt;

&lt;p&gt;And usually, the answer depends less on subreddit rules than on architecture, model routing, and whether your compute pricing punishes real-world automation.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>automation</category>
      <category>api</category>
    </item>
    <item>
      <title>I keep seeing people build an AI lead processing agent when they really need a 6-step rules engine</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Fri, 22 May 2026 11:31:04 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-keep-seeing-people-build-an-ai-lead-processing-agent-when-they-really-need-a-6-step-rules-engine-p2f</link>
      <guid>https://dev.to/lars_winstand/i-keep-seeing-people-build-an-ai-lead-processing-agent-when-they-really-need-a-6-step-rules-engine-p2f</guid>
      <description>&lt;p&gt;I knew this was worth writing when I saw a Reddit thread describing an “AI lead processing agent” for underwriting.&lt;/p&gt;

&lt;p&gt;The job sounded fancy until you translated it into actual steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Watch an inbox&lt;/li&gt;
&lt;li&gt;Extract business name + monthly deposits&lt;/li&gt;
&lt;li&gt;Check Salesforce, HubSpot, or a custom CRM/CMR&lt;/li&gt;
&lt;li&gt;See whether the lead already exists&lt;/li&gt;
&lt;li&gt;Route to new banks if needed&lt;/li&gt;
&lt;li&gt;Assign a rep only if deposits are over $30,000&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is not an agent problem.&lt;/p&gt;

&lt;p&gt;That is workflow logic with one messy-input step.&lt;/p&gt;

&lt;p&gt;And a commenter in r/openclaw said the quiet part out loud:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Don't use AI for deterministic processing. You can write a simple script for this and it will be much more reliable and cheaper.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I think that’s exactly right.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mistake: using an LLM as a decision engine
&lt;/h2&gt;

&lt;p&gt;A lot of teams are building “AI lead gen automation” that should really be split into two pieces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fuzzy extraction&lt;/li&gt;
&lt;li&gt;deterministic state transitions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are not the same thing.&lt;/p&gt;

&lt;p&gt;If the input is ugly — forwarded email chains, scanned PDFs, weird broker notes, inconsistent merchant statements — then yes, use Claude, GPT-5, or Qwen to extract fields.&lt;/p&gt;

&lt;p&gt;But once you have the fields, stop asking the model to make business decisions that can be expressed as code.&lt;/p&gt;

&lt;p&gt;Bad pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Figure out whether this is a duplicate”&lt;/li&gt;
&lt;li&gt;“Decide whether to assign a rep”&lt;/li&gt;
&lt;li&gt;“Determine which bank should receive this”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Better pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model extracts &lt;code&gt;business_name&lt;/code&gt;, &lt;code&gt;monthly_deposits&lt;/code&gt;, &lt;code&gt;contact_email&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;code checks CRM state&lt;/li&gt;
&lt;li&gt;code applies explicit rules&lt;/li&gt;
&lt;li&gt;code writes the result atomically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That split matters a lot in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture I’d actually ship
&lt;/h2&gt;

&lt;p&gt;If I were building underwriting intake or lead routing, I’d use this shape:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trigger on inbound email/webhook&lt;/li&gt;
&lt;li&gt;Parse sender/subject/attachments deterministically&lt;/li&gt;
&lt;li&gt;Send only messy text to an LLM for strict extraction&lt;/li&gt;
&lt;li&gt;Normalize extracted values&lt;/li&gt;
&lt;li&gt;Check CRM using normalized identifiers&lt;/li&gt;
&lt;li&gt;In one locked step, decide duplicate/new/assignment&lt;/li&gt;
&lt;li&gt;Only after the write succeeds, trigger downstream actions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That gives you a small LLM boundary and a deterministic core.&lt;/p&gt;

&lt;h2&gt;
  
  
  Put the LLM in a tiny box
&lt;/h2&gt;

&lt;p&gt;The safest contract is something boring like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"business_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Blue Lantern LLC"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"monthly_deposits"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;35000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"contact_email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ops@bluelantern.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"requested_amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.91&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s a good use of GPT-5, Claude Sonnet, or Qwen.&lt;/p&gt;

&lt;p&gt;What I would not do is this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read the email, decide if the lead is a duplicate, determine whether it qualifies for rep assignment, and choose which bank should receive it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That prompt looks convenient right up until you need consistency, auditability, and duplicate prevention.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually breaks first in production
&lt;/h2&gt;

&lt;p&gt;Not the prompt.&lt;/p&gt;

&lt;p&gt;Concurrency.&lt;/p&gt;

&lt;p&gt;This is where agent demos usually lie to you. They work great with one email.&lt;/p&gt;

&lt;p&gt;Then two brokers forward the same merchant 20 seconds apart.&lt;/p&gt;

&lt;p&gt;Now both workers do this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;query CRM&lt;/li&gt;
&lt;li&gt;see no assigned record yet&lt;/li&gt;
&lt;li&gt;decide the lead is new&lt;/li&gt;
&lt;li&gt;route it&lt;/li&gt;
&lt;li&gt;create duplicate work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not an AI failure. That’s a race condition.&lt;/p&gt;

&lt;p&gt;And no amount of “reasoning” fixes a missing transaction boundary.&lt;/p&gt;

&lt;h2&gt;
  
  
  The rule that matters more than your prompt
&lt;/h2&gt;

&lt;p&gt;If your flow includes duplicate checks, threshold-based assignment, or bank routing, the critical part is the write path.&lt;/p&gt;

&lt;p&gt;This logic should be explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_lead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crm_record_exists&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_banks_available&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monthly_deposits&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;crm_record_exists&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;new_banks_available&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;route_to_new_banks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mark_duplicate_internal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;monthly_deposits&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assign_rep_and_send_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mark_low_revenue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the final decision should happen inside one transaction or locked operation.&lt;/p&gt;

&lt;p&gt;For example, in PostgreSQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;leads&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;normalized_business_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- if exists, update state&lt;/span&gt;
&lt;span class="c1"&gt;-- if not, insert new row with unique constraint protection&lt;/span&gt;

&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or with an upsert:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;leads&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalized_business_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contact_email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monthly_deposits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;CONFLICT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normalized_business_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;RETURNING&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the part worth obsessing over.&lt;/p&gt;

&lt;p&gt;Not whether your agent sounds confident while making inconsistent choices.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical hybrid design
&lt;/h2&gt;

&lt;p&gt;This is the version I’d recommend to most teams using n8n, Make, Zapier, OpenClaw, or custom Python/TypeScript workers.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What it should do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Trigger/orchestration&lt;/td&gt;
&lt;td&gt;Watch inboxes, webhooks, retries, notifications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM step&lt;/td&gt;
&lt;td&gt;Extract fields from messy email/PDF text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Normalization step&lt;/td&gt;
&lt;td&gt;Clean business names, parse currency, standardize email/domain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rules engine&lt;/td&gt;
&lt;td&gt;Apply deposit thresholds, duplicate policy, assignment logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transaction-safe write&lt;/td&gt;
&lt;td&gt;Insert/update CRM state atomically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Downstream actions&lt;/td&gt;
&lt;td&gt;Send docs, notify reps, route to banks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is what “AI where it helps, code where it matters” actually looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: n8n + Python worker
&lt;/h2&gt;

&lt;p&gt;If I wanted to move fast, I’d use n8n for orchestration and a small Python service for the transaction-sensitive part.&lt;/p&gt;

&lt;h3&gt;
  
  
  n8n flow
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;IMAP Email Trigger or webhook&lt;/li&gt;
&lt;li&gt;extract attachments/text&lt;/li&gt;
&lt;li&gt;LLM node for structured extraction&lt;/li&gt;
&lt;li&gt;HTTP request to internal worker&lt;/li&gt;
&lt;li&gt;Slack/email notification after successful write&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Python worker sketch
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LeadPayload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;business_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;monthly_deposits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;contact_email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;requested_amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/process-lead&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_lead&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LeadPayload&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;normalized_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;business_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# pseudo-code for transaction-safe logic
&lt;/span&gt;    &lt;span class="c1"&gt;# begin transaction
&lt;/span&gt;    &lt;span class="c1"&gt;# lock matching lead row or rely on unique constraint
&lt;/span&gt;    &lt;span class="c1"&gt;# check duplicate/new bank state
&lt;/span&gt;    &lt;span class="c1"&gt;# apply 30k threshold
&lt;/span&gt;    &lt;span class="c1"&gt;# write final status
&lt;/span&gt;    &lt;span class="c1"&gt;# commit transaction
&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;monthly_deposits&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assign_rep_and_send_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mark_low_revenue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you a workflow people can reason about.&lt;/p&gt;

&lt;p&gt;It also makes debugging possible when something goes wrong at 9:12 a.m. on a Monday.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where AI does help
&lt;/h2&gt;

&lt;p&gt;I’m not arguing against LLMs here.&lt;/p&gt;

&lt;p&gt;I’m arguing against giving them the wrong job.&lt;/p&gt;

&lt;p&gt;Good uses in this flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extracting fields from ugly broker emails&lt;/li&gt;
&lt;li&gt;parsing scanned PDFs or OCR output&lt;/li&gt;
&lt;li&gt;summarizing long email threads for a rep&lt;/li&gt;
&lt;li&gt;drafting a reply asking for missing docs&lt;/li&gt;
&lt;li&gt;flagging low-confidence extractions for human review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad uses in this flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;duplicate detection when the criteria are known&lt;/li&gt;
&lt;li&gt;rep assignment when the threshold is explicit&lt;/li&gt;
&lt;li&gt;bank routing when policy is fixed&lt;/li&gt;
&lt;li&gt;deciding whether CRM state “probably means” something&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The moment a rule can be written down, it should stop being an LLM decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost problem gets ugly fast
&lt;/h2&gt;

&lt;p&gt;There’s another reason to avoid agent-first design: cost creep.&lt;/p&gt;

&lt;p&gt;I saw another Reddit comment from someone using OpenClaw who said summarizing the last 10 emails with Claude 4.6 Sonnet cost about $0.25.&lt;/p&gt;

&lt;p&gt;That sounds tiny.&lt;/p&gt;

&lt;p&gt;Until your “agent” is doing that kind of work all day across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inbox triage&lt;/li&gt;
&lt;li&gt;CRM re-checks&lt;/li&gt;
&lt;li&gt;duplicate review&lt;/li&gt;
&lt;li&gt;status summaries&lt;/li&gt;
&lt;li&gt;follow-up drafts&lt;/li&gt;
&lt;li&gt;lead routing decisions that should have been simple SQL or code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s how teams end up saying their agent stack burns tokens faster than expected.&lt;/p&gt;

&lt;p&gt;The model is doing office work your rules engine should be doing for free.&lt;/p&gt;

&lt;p&gt;This is exactly why predictable pricing matters if you’re running automations 24/7. If your workflows call models constantly, per-token billing turns every design mistake into a monthly surprise. Standard Compute is interesting here because it gives you an OpenAI-compatible API with flat monthly pricing, so you can afford to use models for the messy extraction layer without constantly watching token spend. That doesn’t mean you should waste LLM calls on deterministic routing. It means you can use AI where it actually helps and keep the rest of the pipeline boring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent-first vs automation-first
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;What usually happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Automation-first&lt;/td&gt;
&lt;td&gt;Deterministic branching, explicit thresholds, atomic writes, easier debugging&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent-first&lt;/td&gt;
&lt;td&gt;More token usage, inconsistent decisions, harder audits, race-condition blind spots&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hybrid&lt;/td&gt;
&lt;td&gt;LLM for extraction/summaries, code for rules and state transitions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you remember one thing, make it this:&lt;/p&gt;

&lt;p&gt;Using an LLM for extraction is not the same as handing control to an agent.&lt;/p&gt;

&lt;p&gt;Those are completely different design choices.&lt;/p&gt;

&lt;h2&gt;
  
  
  A concrete test
&lt;/h2&gt;

&lt;p&gt;Before you add an AI agent to a workflow, ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What part of this flow is genuinely ambiguous?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the answer is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“the email is messy”&lt;/li&gt;
&lt;li&gt;“the PDF format is inconsistent”&lt;/li&gt;
&lt;li&gt;“the broker note is hard to parse”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use Claude, GPT-5, Grok, or Qwen for extraction.&lt;/p&gt;

&lt;p&gt;If the answer is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“check the CRM”&lt;/li&gt;
&lt;li&gt;“apply the $30k rule”&lt;/li&gt;
&lt;li&gt;“avoid duplicates”&lt;/li&gt;
&lt;li&gt;“assign the right rep”&lt;/li&gt;
&lt;li&gt;“route to the right bank”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You do not need autonomy.&lt;/p&gt;

&lt;p&gt;You need explicit logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  My opinionated version
&lt;/h2&gt;

&lt;p&gt;Most underwriting intake automations are not agent problems.&lt;/p&gt;

&lt;p&gt;They are data integrity problems wearing an AI costume.&lt;/p&gt;

&lt;p&gt;The messy-input layer is where AI earns its keep.&lt;/p&gt;

&lt;p&gt;The state-transition layer is where software engineering still wins.&lt;/p&gt;

&lt;p&gt;So if you’re building this in n8n, Make, Zapier, OpenClaw, or custom code, keep the model on a short leash:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extract&lt;/li&gt;
&lt;li&gt;classify uncertainty&lt;/li&gt;
&lt;li&gt;draft summaries&lt;/li&gt;
&lt;li&gt;stop there&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then let your rules engine do the real work.&lt;/p&gt;

&lt;p&gt;That may be less exciting than saying you built an autonomous underwriting agent.&lt;/p&gt;

&lt;p&gt;It also sounds a lot more like something I’d trust with real leads.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>devops</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I finally get why every serious browser agent demo looks a little cursed</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Thu, 21 May 2026 19:32:24 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-finally-get-why-every-serious-browser-agent-demo-looks-a-little-cursed-4eko</link>
      <guid>https://dev.to/lars_winstand/i-finally-get-why-every-serious-browser-agent-demo-looks-a-little-cursed-4eko</guid>
      <description>&lt;p&gt;A browser agent is suddenly useful for one very specific reason: it can work where no usable API exists.&lt;/p&gt;

&lt;p&gt;That sounds obvious, but it took me a while to stop evaluating browser agents like bad API clients.&lt;/p&gt;

&lt;p&gt;They are not competing with Stripe, HubSpot, or Salesforce APIs.&lt;br&gt;
They are competing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vendor dashboards with no export endpoint&lt;/li&gt;
&lt;li&gt;internal tools built in 2017&lt;/li&gt;
&lt;li&gt;partner portals that hate automation&lt;/li&gt;
&lt;li&gt;Android apps used by ops teams&lt;/li&gt;
&lt;li&gt;admin panels where the UI is the only interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That shift matters.&lt;/p&gt;

&lt;p&gt;A few weeks ago I was digging through browser agent workflows and found a thread on r/openclaw from someone trying to pull social media analytics for 15+ client accounts across Instagram, TikTok, YouTube, and LinkedIn into a spreadsheet.&lt;/p&gt;

&lt;p&gt;That post explained the market better than most landing pages do.&lt;/p&gt;

&lt;p&gt;Because if you build automations for real businesses, you eventually run into the same wall:&lt;/p&gt;

&lt;p&gt;the work that matters is often trapped in a screen, not exposed through a clean REST API.&lt;/p&gt;

&lt;p&gt;And once you accept that, the interesting question stops being:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;are browser agents better than APIs?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They are not.&lt;/p&gt;

&lt;p&gt;The real question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;when is browser automation worth the pain?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  APIs still win
&lt;/h2&gt;

&lt;p&gt;Let me say the unfashionable thing first.&lt;/p&gt;

&lt;p&gt;If your workflow can be done with a direct API integration, use the API.&lt;/p&gt;

&lt;p&gt;Every time.&lt;/p&gt;

&lt;p&gt;If you're moving data between Zendesk and HubSpot, syncing Stripe invoices into NetSuite, or pulling Salesforce leads into a warehouse, API-first automation is still the adult choice.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;structured data&lt;/li&gt;
&lt;li&gt;explicit auth&lt;/li&gt;
&lt;li&gt;better logs&lt;/li&gt;
&lt;li&gt;easier testing&lt;/li&gt;
&lt;li&gt;fewer weird failures&lt;/li&gt;
&lt;li&gt;less latency&lt;/li&gt;
&lt;li&gt;less ambiguity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A browser agent does not improve any of that.&lt;br&gt;
It adds more moving parts.&lt;/p&gt;

&lt;p&gt;If a button moves, a modal appears, the session expires, or the site decides your cloud IP looks suspicious, your flow gets weird fast.&lt;/p&gt;

&lt;p&gt;So no, this is not an "APIs are dead" post.&lt;/p&gt;

&lt;p&gt;It's the opposite.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why browser agents suddenly matter anyway
&lt;/h2&gt;

&lt;p&gt;For years, GUI automation had a demo problem.&lt;/p&gt;

&lt;p&gt;You'd see a slick video of an agent ordering groceries or filling out a form, and the only question that mattered was:&lt;/p&gt;

&lt;p&gt;"Cool, but does it still work on Tuesday?"&lt;/p&gt;

&lt;p&gt;Now we at least have benchmark numbers instead of vibes.&lt;/p&gt;

&lt;p&gt;OpenAI's Computer-Using Agent has published results like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;38.1% on OSWorld&lt;/li&gt;
&lt;li&gt;58.1% on WebArena&lt;/li&gt;
&lt;li&gt;87% on WebVoyager&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those numbers are not impressive if you expect deterministic software.&lt;/p&gt;

&lt;p&gt;They are impressive if you understand what they mean:&lt;/p&gt;

&lt;p&gt;browser agents crossed the line from party trick to plausible under supervision.&lt;/p&gt;

&lt;p&gt;That's the whole story.&lt;/p&gt;

&lt;p&gt;Not autonomous back office.&lt;br&gt;
Not "replace your ops team."&lt;br&gt;
But definitely "this can probably handle repetitive dashboard work if you wrap it in retries, checkpoints, and approval gates."&lt;/p&gt;

&lt;p&gt;That is a much bigger market than people expected.&lt;/p&gt;
&lt;h2&gt;
  
  
  The hard part was never just clicking
&lt;/h2&gt;

&lt;p&gt;The hard part of browser automation was never getting a model to click a button.&lt;/p&gt;

&lt;p&gt;The hard part was everything around it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can you run it repeatedly?&lt;/li&gt;
&lt;li&gt;Can you inspect what happened?&lt;/li&gt;
&lt;li&gt;Can you retry failed steps?&lt;/li&gt;
&lt;li&gt;Can you keep session state?&lt;/li&gt;
&lt;li&gt;Can you scale it beyond one laptop and one tab?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's why Browser Use is more interesting than it first looks.&lt;/p&gt;

&lt;p&gt;It isn't just "LLM clicks browser."&lt;br&gt;
It gives you an actual developer surface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;open-source library&lt;/li&gt;
&lt;li&gt;hosted cloud browsers&lt;/li&gt;
&lt;li&gt;Python API&lt;/li&gt;
&lt;li&gt;CLI&lt;/li&gt;
&lt;li&gt;benchmarking on real browser tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the entry point is pretty lightweight.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;browser_use&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatBrowserUse&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Browser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find the number of stars of the browser-use repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ChatBrowserUse&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setup is also straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv init
uv add browser-use
uv &lt;span class="nb"&gt;sync
&lt;/span&gt;uvx browser-use &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a very different world from the old stack of Selenium + Playwright + OCR + screenshots + prayer.&lt;/p&gt;

&lt;p&gt;Still, the real question isn't whether you can get a demo working.&lt;/p&gt;

&lt;p&gt;It's whether the workflow deserves this level of complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  My rule: use a browser agent only when the interface is the integration
&lt;/h2&gt;

&lt;p&gt;This is the rule I keep coming back to:&lt;/p&gt;

&lt;p&gt;Use a browser agent only when the interface is the integration.&lt;/p&gt;

&lt;p&gt;Teams ignore this all the time.&lt;/p&gt;

&lt;p&gt;They reach for an agent because it feels modern, when what they actually need is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one webhook&lt;/li&gt;
&lt;li&gt;one cron job&lt;/li&gt;
&lt;li&gt;one decent API client&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Browser automation becomes worth it when all three are true:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The work is trapped in a UI.&lt;/li&gt;
&lt;li&gt;The task is repetitive enough to justify retries and supervision.&lt;/li&gt;
&lt;li&gt;The business value is high enough that brittle automation beats manual labor.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The social analytics example is perfect.&lt;/p&gt;

&lt;p&gt;Pulling metrics across Instagram, TikTok, YouTube, and LinkedIn for 15+ client accounts sounds simple until you try to operationalize it.&lt;/p&gt;

&lt;p&gt;Then you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;different permissions&lt;/li&gt;
&lt;li&gt;different export formats&lt;/li&gt;
&lt;li&gt;changing layouts&lt;/li&gt;
&lt;li&gt;inconsistent dashboards&lt;/li&gt;
&lt;li&gt;random login prompts&lt;/li&gt;
&lt;li&gt;occasional rate limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not a clean API integration problem.&lt;/p&gt;

&lt;p&gt;That is browser-agent territory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical split: API vs browser agent vs app-surface agent
&lt;/h2&gt;

&lt;p&gt;Here is the version that actually holds up in production.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Where it wins&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direct API integration&lt;/td&gt;
&lt;td&gt;Best for stable structured systems like CRM, ERP, billing, and helpdesk APIs. Highest reliability, lowest ambiguity, easiest to test.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser agent&lt;/td&gt;
&lt;td&gt;Best for web dashboards, partner portals, and brittle internal tools with no useful API. Flexible, but needs retries, supervision, and state management.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;App-surface agent&lt;/td&gt;
&lt;td&gt;Best when the work lives in native desktop or mobile apps. Highest flexibility and highest fragility.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That last category matters more than people admit.&lt;/p&gt;

&lt;p&gt;A lot of real operations work does not happen in a browser.&lt;br&gt;
It happens in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Android apps in warehouses&lt;/li&gt;
&lt;li&gt;field-service apps used by contractors&lt;/li&gt;
&lt;li&gt;legacy Windows apps in VDI sessions&lt;/li&gt;
&lt;li&gt;internal tools nobody wants to rebuild&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly why computer-use models are getting attention.&lt;/p&gt;

&lt;p&gt;They are not replacing clean integrations.&lt;br&gt;
They are reaching work developers were previously locked out of.&lt;/p&gt;
&lt;h2&gt;
  
  
  The part people skip: this gets operationally expensive fast
&lt;/h2&gt;

&lt;p&gt;This is where the hype usually gets dishonest.&lt;/p&gt;

&lt;p&gt;Browser agents unlock trapped work.&lt;br&gt;
They also create a lot more operational drag than API-only flows.&lt;/p&gt;

&lt;p&gt;You get more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;state handling&lt;/li&gt;
&lt;li&gt;session issues&lt;/li&gt;
&lt;li&gt;screenshots&lt;/li&gt;
&lt;li&gt;logs&lt;/li&gt;
&lt;li&gt;failure modes&lt;/li&gt;
&lt;li&gt;anti-bot weirdness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're using something like OpenClaw or building your own orchestration, the architecture starts to look a lot less magical and a lot more like normal distributed systems work.&lt;/p&gt;

&lt;p&gt;You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scheduling&lt;/li&gt;
&lt;li&gt;durable task records&lt;/li&gt;
&lt;li&gt;resumable workflows&lt;/li&gt;
&lt;li&gt;checkpoints&lt;/li&gt;
&lt;li&gt;human approval for risky actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A tiny scheduled task can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw cron add &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"Reminder"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--at&lt;/span&gt; &lt;span class="s2"&gt;"2026-02-01T16:00:00Z"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--session&lt;/span&gt; main &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--system-event&lt;/span&gt; &lt;span class="s2"&gt;"Reminder: check the cron docs draft"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--wake&lt;/span&gt; now &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--delete-after-run&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That command is boring.&lt;/p&gt;

&lt;p&gt;That's the point.&lt;/p&gt;

&lt;p&gt;Serious agent workflows need boring infrastructure around the weird part.&lt;/p&gt;

&lt;p&gt;The pattern I trust looks more like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;deterministic scheduler&lt;/li&gt;
&lt;li&gt;durable task record&lt;/li&gt;
&lt;li&gt;browser or app-surface step for the ugly part&lt;/li&gt;
&lt;li&gt;screenshot or structured checkpoint&lt;/li&gt;
&lt;li&gt;human approval if money, compliance, or customer output is involved&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is much less cinematic than the demos.&lt;/p&gt;

&lt;p&gt;It is also how these systems survive contact with production.&lt;/p&gt;

&lt;h2&gt;
  
  
  A minimal Python pattern for supervised browser work
&lt;/h2&gt;

&lt;p&gt;If I were wiring this into a real automation stack, I'd think in terms of checkpoints instead of full autonomy.&lt;/p&gt;

&lt;p&gt;Something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;browser_use&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatBrowserUse&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Browser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ChatBrowserUse&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Pseudocode: persist result, screenshot, and next action
&lt;/span&gt;    &lt;span class="c1"&gt;# save_task_run(result)
&lt;/span&gt;    &lt;span class="c1"&gt;# request_human_approval_if_needed(result)
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Log into LinkedIn campaign manager and collect spend + impressions for the last 7 days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;run_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important thing is not the call to &lt;code&gt;agent.run()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The important thing is everything around it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;where state is stored&lt;/li&gt;
&lt;li&gt;how retries work&lt;/li&gt;
&lt;li&gt;how approvals happen&lt;/li&gt;
&lt;li&gt;how you detect drift&lt;/li&gt;
&lt;li&gt;how you recover from expired sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is where production browser automation lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost is now part of the architecture
&lt;/h2&gt;

&lt;p&gt;There is one more thing people avoid talking about.&lt;/p&gt;

&lt;p&gt;These workflows can burn a lot of model calls.&lt;/p&gt;

&lt;p&gt;If you're running browser agents across dashboards all day, or chaining them inside n8n, Make, Zapier, OpenClaw, or custom workers, per-token pricing gets annoying fast.&lt;/p&gt;

&lt;p&gt;Not because one run is expensive.&lt;br&gt;
Because the architecture itself creates lots of small, repeated, hard-to-predict calls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;page re-reads&lt;/li&gt;
&lt;li&gt;intermediate reasoning steps&lt;/li&gt;
&lt;li&gt;checkpoint summaries&lt;/li&gt;
&lt;li&gt;extraction passes&lt;/li&gt;
&lt;li&gt;fallback runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's exactly the kind of workload where teams start caring less about model purity and more about predictable spend.&lt;/p&gt;

&lt;p&gt;If you're building agents that run constantly, flat-rate compute is a lot easier to operationalize than watching token usage spike because one dashboard changed and your workflow started retrying five times.&lt;/p&gt;

&lt;p&gt;That's the appeal of something like Standard Compute.&lt;/p&gt;

&lt;p&gt;It gives you an OpenAI-compatible API, but with unlimited compute on a flat monthly plan instead of per-token billing. For agentic workflows, especially the messy ones, that changes the math.&lt;/p&gt;

&lt;p&gt;You can keep the orchestration you already have and stop treating every retry like a tiny finance event.&lt;/p&gt;

&lt;h2&gt;
  
  
  My actual take
&lt;/h2&gt;

&lt;p&gt;The surprise is not that browser agents got good.&lt;/p&gt;

&lt;p&gt;The surprise is that they got good enough right when businesses ran out of patience waiting for proper integrations.&lt;/p&gt;

&lt;p&gt;And "good enough under supervision" is a real market.&lt;/p&gt;

&lt;p&gt;If you have a stable back-office flow, use the API.&lt;/p&gt;

&lt;p&gt;If the work is trapped in TikTok analytics, LinkedIn campaign screens, YouTube Studio, vendor portals, internal admin tools, or Android apps, then a browser agent or app-surface agent may be the only realistic option.&lt;/p&gt;

&lt;p&gt;Not the prettiest option.&lt;br&gt;
Not the simplest option.&lt;br&gt;
Definitely not the most elegant option.&lt;/p&gt;

&lt;p&gt;But realistic beats elegant when the work has to get done.&lt;/p&gt;

&lt;p&gt;That's why every serious browser agent demo looks a little cursed.&lt;/p&gt;

&lt;p&gt;It is solving cursed problems.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>api</category>
      <category>developers</category>
    </item>
    <item>
      <title>I thought multi-agent orchestration meant agents should talk more — 2 Reddit threads convinced me the opposite is usually better</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Thu, 21 May 2026 06:58:22 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-thought-multi-agent-orchestration-meant-agents-should-talk-more-2-reddit-threads-convinced-me-1000</link>
      <guid>https://dev.to/lars_winstand/i-thought-multi-agent-orchestration-meant-agents-should-talk-more-2-reddit-threads-convinced-me-1000</guid>
      <description>&lt;p&gt;I used to assume the “advanced” version of multi-agent orchestration was obvious:&lt;/p&gt;

&lt;p&gt;More agents. More channels. More back-and-forth.&lt;/p&gt;

&lt;p&gt;If one GPT-5 agent is useful, then surely two GPT-5 agents debating in Discord is better. Add Claude for review, maybe another model for cleanup, and now you’ve got a tiny AI company running inside your workflow.&lt;/p&gt;

&lt;p&gt;That sounds smart right up until you try to supervise it.&lt;/p&gt;

&lt;p&gt;Then the agents start echoing each other. The transcript gets huge. Nobody knows which answer is final. You spend more time reading bot chatter than shipping output.&lt;/p&gt;

&lt;p&gt;While researching OpenClaw workflows, I found two Reddit threads that changed my mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one about getting multiple OpenClaw agents to collaborate in Telegram&lt;/li&gt;
&lt;li&gt;another about long-running OpenClaw workflows becoming hard to supervise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The useful takeaway was not “make agents chat better.”&lt;/p&gt;

&lt;p&gt;It was: use structured handoffs, fresh review, and explicit checkpoints.&lt;/p&gt;

&lt;p&gt;That is the pattern I’d recommend to anyone building AI agents in OpenClaw, n8n, Make, Zapier, or custom automations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reddit comment that got it right
&lt;/h2&gt;

&lt;p&gt;In a thread on r/openclaw, someone asked about making two OpenClaw agents collaborate in a Telegram group.&lt;/p&gt;

&lt;p&gt;The best reply was basically this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Works better if they don't actually chat in real time. Have Agent 1 write a structured note, then trigger Agent 2 to review it fresh with no shared conversation history.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That should be the default design for a lot of multi-agent systems.&lt;/p&gt;

&lt;p&gt;Why? Because shared live context creates fast agreement, not necessarily good critique.&lt;/p&gt;

&lt;p&gt;If Agent 2 sees the entire conversation, it tends to inherit Agent 1’s framing. It becomes a collaborator in the same mistake instead of a reviewer.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenClaw already nudges you toward isolation
&lt;/h2&gt;

&lt;p&gt;This is what made the advice click for me.&lt;/p&gt;

&lt;p&gt;OpenClaw is not really designed around one giant immortal shared chat. A lot of its execution contexts are already isolated.&lt;/p&gt;

&lt;p&gt;For example, OpenClaw documents session scoping like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"session"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"dmScope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"per-channel-peer"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And cron jobs start a fresh session per run.&lt;/p&gt;

&lt;p&gt;That matters.&lt;/p&gt;

&lt;p&gt;A fresh session means the next agent or next run does not inherit a giant blob of stale context by default. That is usually a feature, not a limitation.&lt;/p&gt;

&lt;p&gt;There’s also a daily session reset behavior in OpenClaw, with a default new session time of 4:00 AM local time on the gateway host. Again: this is not a framework optimized for endless context accumulation.&lt;/p&gt;

&lt;p&gt;It’s a framework that assumes boundaries are healthy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why real-time agent chat looks smart but performs dumb
&lt;/h2&gt;

&lt;p&gt;Live agent chat feels like progress because there are lots of messages.&lt;/p&gt;

&lt;p&gt;But a busy transcript is not the same thing as a reliable workflow.&lt;/p&gt;

&lt;p&gt;The common failure modes are boring and predictable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agents repeat each other&lt;/li&gt;
&lt;li&gt;agents converge too early&lt;/li&gt;
&lt;li&gt;weak assumptions get reinforced&lt;/li&gt;
&lt;li&gt;supervision gets harder over time&lt;/li&gt;
&lt;li&gt;the final artifact is less clear than the conversation that produced it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your goal is critique, you want distance.&lt;/p&gt;

&lt;p&gt;You want Agent 2 to arrive slightly skeptical, with a clean starting point.&lt;/p&gt;

&lt;p&gt;That is why I think the supervisor/reviewer pattern beats free-form bot banter in most production workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem is drift
&lt;/h2&gt;

&lt;p&gt;The second Reddit thread was about long-running OpenClaw workflows getting harder to supervise.&lt;/p&gt;

&lt;p&gt;That is the real operational problem.&lt;/p&gt;

&lt;p&gt;Not sociability. Drift.&lt;/p&gt;

&lt;p&gt;If you’ve run serious agent workflows, this probably feels familiar:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one agent is writing code&lt;/li&gt;
&lt;li&gt;another is summarizing research&lt;/li&gt;
&lt;li&gt;another is running automations&lt;/li&gt;
&lt;li&gt;one task failed quietly&lt;/li&gt;
&lt;li&gt;one task is technically still running but no longer useful&lt;/li&gt;
&lt;li&gt;you come back later and can’t tell which state is trustworthy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, more chat is the last thing you need.&lt;/p&gt;

&lt;p&gt;You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the latest approved artifact&lt;/li&gt;
&lt;li&gt;the current task state&lt;/li&gt;
&lt;li&gt;a reviewer that can compare output against something stable&lt;/li&gt;
&lt;li&gt;bounded retries instead of endless loops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One commenter in that thread put it bluntly: drift happens quickly, and markdown notes alone are not enough.&lt;/p&gt;

&lt;p&gt;I agree.&lt;/p&gt;

&lt;p&gt;This is a checkpoints problem.&lt;br&gt;
A verification problem.&lt;br&gt;
A state management problem.&lt;/p&gt;

&lt;p&gt;It is not a “put GPT-5 and Claude in Telegram and let them vibe” problem.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I’d build instead
&lt;/h2&gt;

&lt;p&gt;If I were wiring up a production multi-agent workflow today, I’d use explicit artifacts and fresh sessions.&lt;/p&gt;

&lt;p&gt;Something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Worker agent does the first pass.&lt;/li&gt;
&lt;li&gt;Worker agent writes a structured handoff note.&lt;/li&gt;
&lt;li&gt;Reviewer agent gets the artifact plus the handoff note.&lt;/li&gt;
&lt;li&gt;Reviewer agent does not get the full chat transcript unless absolutely necessary.&lt;/li&gt;
&lt;li&gt;Reviewer approves, rejects, or requests one bounded revision.&lt;/li&gt;
&lt;li&gt;A supervisor step checks output against a known-good state.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That handoff note should be boring and explicit.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"goal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Generate a patch for the failing webhook retry logic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"inputs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"src/webhooks/retry.ts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"error logs from last 24h"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"assumptions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"429s should back off exponentially"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"network timeouts are retryable"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"4xx validation errors are not retryable"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"proposed_output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Patch + tests + migration note"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"open_questions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Should 408 be grouped with network errors?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Do we cap retries at 5 or 7?"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"failure_risks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Could duplicate webhook delivery on timeout edge cases"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"May break current metrics labels"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is much easier to review than 200 messages of agent chatter.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical file-based pattern
&lt;/h2&gt;

&lt;p&gt;OpenClaw users mentioned internal coordination through things like &lt;code&gt;session_send()&lt;/code&gt; and file-based handoffs across workspaces.&lt;/p&gt;

&lt;p&gt;That makes sense to me.&lt;/p&gt;

&lt;p&gt;A simple filesystem-based pattern is often enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/workspace
  /tasks
    task-142.json
  /artifacts
    task-142.patch
  /reviews
    task-142.review.json
  /state
    approved.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example task file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"task-142"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"review_pending"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"owner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"worker-agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"artifact"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"artifacts/task-142.patch"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reviewer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"review-agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"retry_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example review output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"task-142"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"changes_requested"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"issues"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Missing test for timeout retry path"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Backoff jitter not applied"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"approved_artifact"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"next_action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"worker-revise"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is orchestration.&lt;/p&gt;

&lt;p&gt;Not a room full of bots pretending to be coworkers.&lt;/p&gt;

&lt;h2&gt;
  
  
  When real-time agent chat is actually useful
&lt;/h2&gt;

&lt;p&gt;I’m not saying agent-to-agent chat is always bad.&lt;/p&gt;

&lt;p&gt;There are a few cases where it makes sense.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Brainstorming
&lt;/h3&gt;

&lt;p&gt;If you want divergent ideas fast, shared chat can help.&lt;/p&gt;

&lt;p&gt;For example: product naming, rough architecture exploration, or generating lots of candidate approaches.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Human-adjacent workflows
&lt;/h3&gt;

&lt;p&gt;If agents need to operate in Discord or Telegram because humans are already there, fine. That can be useful at the edges.&lt;/p&gt;

&lt;p&gt;But I still would not make that the core execution layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Demos
&lt;/h3&gt;

&lt;p&gt;A room full of agents talking looks impressive.&lt;/p&gt;

&lt;p&gt;It demos well.&lt;/p&gt;

&lt;p&gt;It also tends to be much less reliable than a boring artifact pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Token cost is not the only cost
&lt;/h2&gt;

&lt;p&gt;This part matters for anyone running automations at scale.&lt;/p&gt;

&lt;p&gt;Shared channels do not just increase noise. They increase consumption.&lt;/p&gt;

&lt;p&gt;If agents are watching entire chat threads, they ingest irrelevant context constantly.&lt;/p&gt;

&lt;p&gt;That means more tokens, more latency, and more supervision overhead.&lt;/p&gt;

&lt;p&gt;For teams running n8n, Make, Zapier, OpenClaw, or custom agent systems, this is where pricing starts to matter a lot.&lt;/p&gt;

&lt;p&gt;If every workflow turns into agents repeatedly reading giant transcripts, per-token billing gets painful fast.&lt;/p&gt;

&lt;p&gt;That’s one reason I think flat-rate infrastructure is a better fit for serious automation work. If your agents are running all day, you do not want your architecture decisions distorted by token anxiety.&lt;/p&gt;

&lt;p&gt;Standard Compute is interesting here because it gives you an OpenAI-compatible API with flat monthly pricing instead of per-token billing. So if you’re experimenting with reviewer loops, supervisor agents, retries, or multi-step automations, you can optimize for reliability first instead of constantly asking whether each extra pass is too expensive.&lt;/p&gt;

&lt;p&gt;That matters even more when you’re routing across different models for different roles.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-5.4 for implementation&lt;/li&gt;
&lt;li&gt;Claude Opus 4.6 for review&lt;/li&gt;
&lt;li&gt;Grok 4.20 for alternate reasoning or edge-case checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That kind of setup is useful, but only if the cost model doesn’t punish every extra step.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple reviewer loop you can actually use
&lt;/h2&gt;

&lt;p&gt;Here’s a minimal pattern I’d trust more than live bot chat.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: generate artifact
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node worker.js &lt;span class="nt"&gt;--task&lt;/span&gt; tasks/task-142.json &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; artifacts/task-142.patch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: create handoff note
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node create-handoff.js &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--task&lt;/span&gt; tasks/task-142.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--artifact&lt;/span&gt; artifacts/task-142.patch &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; reviews/task-142.handoff.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: run fresh review
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node review.js &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--handoff&lt;/span&gt; reviews/task-142.handoff.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--artifact&lt;/span&gt; artifacts/task-142.patch &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; reviews/task-142.review.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: enforce bounded retry
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node supervisor.js &lt;span class="nt"&gt;--task&lt;/span&gt; tasks/task-142.json &lt;span class="nt"&gt;--review&lt;/span&gt; reviews/task-142.review.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supervisor logic should be strict:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;if approved, publish artifact&lt;/li&gt;
&lt;li&gt;if changes requested and retry_count &amp;lt; max_retries, revise once&lt;/li&gt;
&lt;li&gt;otherwise fail closed and escalate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives you a workflow you can inspect later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;What actually happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Real-time agent chat in Discord or Telegram&lt;/td&gt;
&lt;td&gt;Shared live history, faster convergence, more supervision overhead, more irrelevant context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured reviewer handoff&lt;/td&gt;
&lt;td&gt;Clear artifact, fresh review context, better critique, easier audit trail&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File-based or session-based coordination in OpenClaw&lt;/td&gt;
&lt;td&gt;Lower platform friction, deterministic state, easier retries and checkpointing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  My default rule now
&lt;/h2&gt;

&lt;p&gt;If the job needs reliability, don’t start with agent chat.&lt;/p&gt;

&lt;p&gt;Start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;explicit artifacts&lt;/li&gt;
&lt;li&gt;structured handoff notes&lt;/li&gt;
&lt;li&gt;isolated review passes&lt;/li&gt;
&lt;li&gt;checkpointed state&lt;/li&gt;
&lt;li&gt;bounded retries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That pattern is less flashy than a multi-agent group chat.&lt;/p&gt;

&lt;p&gt;It is also much easier to operate.&lt;/p&gt;

&lt;p&gt;The best multi-agent systems I can imagine do not act like a group chat.&lt;br&gt;
They act more like a newsroom or a code review pipeline.&lt;/p&gt;

&lt;p&gt;One agent files a draft.&lt;br&gt;
Another reviews it fresh.&lt;br&gt;
A supervisor checks whether it meets the bar.&lt;br&gt;
The result gets approved, rejected, or revised with a clear paper trail.&lt;/p&gt;

&lt;p&gt;That is the design lesson I’d steal from those OpenClaw threads.&lt;/p&gt;

&lt;p&gt;Don’t optimize for sociable agents.&lt;br&gt;
Optimize for legible handoffs and clean review.&lt;/p&gt;

&lt;p&gt;Because the common failure mode in multi-agent work is not silence.&lt;/p&gt;

&lt;p&gt;It’s two agents confidently talking each other into the same mistake.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>automation</category>
      <category>openai</category>
    </item>
    <item>
      <title>I read the r/openclaw thread asking if anyone has a fully working setup and the answer is weirdly yes</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Thu, 21 May 2026 05:07:56 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-read-the-ropenclaw-thread-asking-if-anyone-has-a-fully-working-setup-and-the-answer-is-weirdly-25aa</link>
      <guid>https://dev.to/lars_winstand/i-read-the-ropenclaw-thread-asking-if-anyone-has-a-fully-working-setup-and-the-answer-is-weirdly-25aa</guid>
      <description>&lt;p&gt;A few days ago I was digging into why some OpenClaw setups look incredibly capable and others look like a lab accident with Slack access.&lt;/p&gt;

&lt;p&gt;That led me to this r/openclaw thread:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://reddit.com/r/openclaw/comments/1tiicaq/anyone_else_have_a_fully_working_oc/" rel="noopener noreferrer"&gt;“Anyone else have a fully working OC ?”&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It had 22 upvotes and 30 comments, which is honestly the perfect size for this kind of post. Big enough to get real operators. Small enough that nobody is pretending.&lt;/p&gt;

&lt;p&gt;My main takeaway: yes, some people absolutely have a fully working OpenClaw setup.&lt;/p&gt;

&lt;p&gt;But they did not get there by installing OpenClaw, pointing it at a random cheap model, and hoping autonomy would sort itself out.&lt;/p&gt;

&lt;p&gt;They got there with guardrails, pinned versions, backups, and realistic expectations about what models can handle.&lt;/p&gt;

&lt;p&gt;That distinction matters if you run agents in production, especially if they touch Slack, Discord, Telegram, cron jobs, memory, files, or external tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenClaw is not "just a chatbot"
&lt;/h2&gt;

&lt;p&gt;A lot of people talk about OpenClaw like it is ChatGPT with extra tabs.&lt;/p&gt;

&lt;p&gt;It is not.&lt;/p&gt;

&lt;p&gt;OpenClaw is a self-hosted gateway for AI agents that can connect to real channels like Slack, Discord, Telegram, WhatsApp, Microsoft Teams, Signal, Matrix, Google Chat, iMessage, and Zalo.&lt;/p&gt;

&lt;p&gt;That means you are not debugging a prompt box. You are operating an always-on agent runtime with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;channel auth&lt;/li&gt;
&lt;li&gt;session state&lt;/li&gt;
&lt;li&gt;memory&lt;/li&gt;
&lt;li&gt;cron jobs&lt;/li&gt;
&lt;li&gt;model routing&lt;/li&gt;
&lt;li&gt;permissions&lt;/li&gt;
&lt;li&gt;persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you frame it that way, a lot of the Reddit drama makes more sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  The people saying "it works" were not casual users
&lt;/h2&gt;

&lt;p&gt;The original poster was already using OpenClaw daily.&lt;/p&gt;

&lt;p&gt;They wrote:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I have had openclaw for 4 weeks now, it has helped me In so many ways, all projects are flying, memory is superb, full access to all systems, security hardened (by itself) on all system, doing regular routine work.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is not a toy setup.&lt;/p&gt;

&lt;p&gt;And they were specific about the model too: Qwen 3.6 27B, quantized to q4 or q6 depending on task complexity.&lt;/p&gt;

&lt;p&gt;Another commenter mentioned buying RTX 3090 cards for $550 each and 128 GB DDR5 for $500 a couple of years ago to support local model usage.&lt;/p&gt;

&lt;p&gt;That is the first useful reality check.&lt;/p&gt;

&lt;p&gt;When people say they have a “fully working” OpenClaw setup, they usually mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it works for the workflows they designed&lt;/li&gt;
&lt;li&gt;it works with the versions they pinned&lt;/li&gt;
&lt;li&gt;it works with the model they tested&lt;/li&gt;
&lt;li&gt;it works with the channel integrations they actually configured&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a very different claim from “OpenClaw is universally reliable under any conditions.”&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually breaks OpenClaw
&lt;/h2&gt;

&lt;p&gt;The most useful replies in the thread were not victory laps. They were basically postmortems.&lt;/p&gt;

&lt;p&gt;The common pattern was simple:&lt;/p&gt;

&lt;p&gt;OpenClaw gets unstable when people give agents too much freedom, too many permissions, weak task boundaries, and a model that is not good enough for long-running agent work.&lt;/p&gt;

&lt;p&gt;That combination creates chaos fast.&lt;/p&gt;

&lt;p&gt;Think of it like this:&lt;/p&gt;

&lt;p&gt;If you give an intern root access, vague instructions, and a live Slack workspace, you did not create autonomy. You created incident response.&lt;/p&gt;

&lt;p&gt;The boring engineering answer is still the right one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;constrain autonomy&lt;/li&gt;
&lt;li&gt;pin versions&lt;/li&gt;
&lt;li&gt;back up state&lt;/li&gt;
&lt;li&gt;isolate integrations&lt;/li&gt;
&lt;li&gt;choose models for reliability, not just price&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cheap models do not just give worse answers
&lt;/h2&gt;

&lt;p&gt;They can make the whole stack feel broken.&lt;/p&gt;

&lt;p&gt;The post referenced ClawBench V2 numbers from 2026-05-20. It is not a pure OpenClaw benchmark, but it is still useful for understanding model capability gaps in agent-style tasks.&lt;/p&gt;

&lt;p&gt;Here is the rough picture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Snapshot takeaway&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;claude-opus-4-7&lt;/td&gt;
&lt;td&gt;Best score in the cited snapshot, but expensive per task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gpt-5.5&lt;/td&gt;
&lt;td&gt;Lower score than Opus, much cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;deepseek-v4-pro&lt;/td&gt;
&lt;td&gt;Competitive enough to be interesting on cost/performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;deepseek-v4-flash:free&lt;/td&gt;
&lt;td&gt;Basically unusable for serious agent work in that snapshot&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That last point explains a lot of "OpenClaw is unusable" complaints.&lt;/p&gt;

&lt;p&gt;If you connect a weak model to persistent workflows, channel routing, memory, and tool calls, OpenClaw will not feel cheap.&lt;/p&gt;

&lt;p&gt;It will feel haunted.&lt;/p&gt;

&lt;p&gt;This is exactly where cost starts mattering in a practical way.&lt;/p&gt;

&lt;p&gt;Teams want capable models for agents, but per-token billing makes people downshift into weaker models or over-optimize prompts just to control spend.&lt;/p&gt;

&lt;p&gt;That is bad engineering pressure.&lt;/p&gt;

&lt;p&gt;If you are running agents in n8n, Make, Zapier, OpenClaw, or custom automations, the real requirement is predictable access to strong models without having to meter every call like it is a taxi.&lt;/p&gt;

&lt;p&gt;That is the whole reason products like &lt;a href="https://standardcompute.com" rel="noopener noreferrer"&gt;Standard Compute&lt;/a&gt; are interesting: flat monthly pricing changes the model-selection decision. You can route agent workloads across stronger models without turning every automation into a cost-anxiety exercise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The most mature comment in the thread was about backups
&lt;/h2&gt;

&lt;p&gt;This was my favorite reply:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I also back up the memory and files of my agent every hour. So if something goes wrong or if i do something crazy with it, i just restore the memory and everything is back on track.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the mindset difference right there.&lt;/p&gt;

&lt;p&gt;That person is not treating OpenClaw like a demo. They are treating it like production software.&lt;/p&gt;

&lt;p&gt;And OpenClaw is built for persistence. Its docs support scheduling and long-running automation through cron inside the gateway.&lt;/p&gt;

&lt;p&gt;The docs mention job state here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/.openclaw/cron/jobs.json
~/.openclaw/cron/jobs-state.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a command like this is a totally normal example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw cron add &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"Reminder"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--at&lt;/span&gt; &lt;span class="s2"&gt;"2026-02-01T16:00:00Z"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--session&lt;/span&gt; main &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--system-event&lt;/span&gt; &lt;span class="s2"&gt;"Reminder: check the cron docs draft"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--wake&lt;/span&gt; now &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--delete-after-run&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not chat UI territory anymore.&lt;/p&gt;

&lt;p&gt;That is persistent agent operations.&lt;/p&gt;

&lt;p&gt;If your agent can wake up later, remember context, touch files, and post into channels, restore points are not optional.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three commands I would run before blaming the model
&lt;/h2&gt;

&lt;p&gt;If I were debugging an OpenClaw deployment, these would be near the top of the list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw status
openclaw status &lt;span class="nt"&gt;--all&lt;/span&gt;
openclaw status &lt;span class="nt"&gt;--deep&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sounds obvious, but a lot of people jump straight to prompt tweaking when the actual problem is one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the gateway is unhealthy&lt;/li&gt;
&lt;li&gt;a channel integration is degraded&lt;/li&gt;
&lt;li&gt;session state is stale&lt;/li&gt;
&lt;li&gt;a release introduced regressions&lt;/li&gt;
&lt;li&gt;the model provider is timing out or truncating requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You want to isolate the layer that is failing before you decide the whole system is bad.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sometimes the bug is not OpenClaw
&lt;/h2&gt;

&lt;p&gt;One commenter in the thread said this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What got me was buggy versions. 2026.5.16 has been working so far. .12 had all kinds of issues with longer prompts going to OpenRouter. IIRC, I was on .4 and chat integration was broken (both Slack and Discord).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a huge clue.&lt;/p&gt;

&lt;p&gt;A lot of “OpenClaw is broken” reports are really one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a bad OpenClaw release&lt;/li&gt;
&lt;li&gt;an OpenRouter issue&lt;/li&gt;
&lt;li&gt;a Slack integration issue&lt;/li&gt;
&lt;li&gt;a Discord integration issue&lt;/li&gt;
&lt;li&gt;a Telegram visibility/config issue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are not the same class of problem.&lt;/p&gt;

&lt;p&gt;Here is the practical version:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Integration&lt;/th&gt;
&lt;th&gt;What makes it tricky&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Slack&lt;/td&gt;
&lt;td&gt;Socket Mode vs HTTP mode, token setup, signing secret, public URL differences&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Telegram&lt;/td&gt;
&lt;td&gt;Long polling vs webhook, pairing-based DM access, privacy mode, mention behavior, group admin settings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Discord&lt;/td&gt;
&lt;td&gt;Bot permissions, message intent settings, release-specific regressions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model provider layer&lt;/td&gt;
&lt;td&gt;Prompt length handling, timeout behavior, retries, truncation, routing bugs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Telegram in particular has enough edge cases to waste a whole afternoon.&lt;/p&gt;

&lt;p&gt;A config like this is not weird at all:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"channels"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"telegram"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"botToken"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123:abc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"dmPolicy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pairing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"groups"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"requireMention"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If one user says OpenClaw is amazing and another says it cannot reliably answer in group chat, they may not be testing comparable systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the stable users seemed to have in common
&lt;/h2&gt;

&lt;p&gt;The thread is obviously biased toward success stories.&lt;/p&gt;

&lt;p&gt;So no, you cannot use it to estimate the overall OpenClaw success rate.&lt;/p&gt;

&lt;p&gt;But you can use it to identify patterns among the people getting good results.&lt;/p&gt;

&lt;p&gt;The patterns were pretty consistent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;They limit autonomy instead of maximizing it.&lt;/li&gt;
&lt;li&gt;They pin known-good versions instead of chasing every release.&lt;/li&gt;
&lt;li&gt;They back up memory and files.&lt;/li&gt;
&lt;li&gt;They treat Slack, Discord, and Telegram as operational systems, not just chat windows.&lt;/li&gt;
&lt;li&gt;They use models that can survive multi-step agent work.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last point is the big one for anyone building automations at scale.&lt;/p&gt;

&lt;p&gt;A surprising amount of “agent framework instability” is actually “wrong model plus unpredictable cost constraints.”&lt;/p&gt;

&lt;p&gt;If every extra token feels expensive, teams start making bad tradeoffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;using weaker models than the workflow needs&lt;/li&gt;
&lt;li&gt;overcompressing prompts&lt;/li&gt;
&lt;li&gt;avoiding retries&lt;/li&gt;
&lt;li&gt;disabling useful context&lt;/li&gt;
&lt;li&gt;limiting agent loops for cost reasons instead of safety reasons&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not a technical limitation. That is pricing leaking into architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  My take
&lt;/h2&gt;

&lt;p&gt;After reading the whole thread, I think both camps are telling the truth.&lt;/p&gt;

&lt;p&gt;The “OpenClaw is broken” camp is discovering that persistent agents are hard.&lt;/p&gt;

&lt;p&gt;The “mine works great” camp already accepted that and engineered around it.&lt;/p&gt;

&lt;p&gt;If I had to summarize the real lesson in one sentence, it would be this:&lt;/p&gt;

&lt;p&gt;OpenClaw works when you treat it like infrastructure.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;narrow task scope&lt;/li&gt;
&lt;li&gt;choose a competent model&lt;/li&gt;
&lt;li&gt;pin a stable release&lt;/li&gt;
&lt;li&gt;monitor the gateway&lt;/li&gt;
&lt;li&gt;expect integration weirdness&lt;/li&gt;
&lt;li&gt;plan for recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your setup has broad permissions, no backups, flaky chat integrations, and a bargain-bin model, do not say OpenClaw cannot work.&lt;/p&gt;

&lt;p&gt;Say you built a distributed failure demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical checklist for a sane OpenClaw setup
&lt;/h2&gt;

&lt;p&gt;If you are building or stabilizing an OpenClaw deployment, this is the checklist I would start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Verify runtime health&lt;/span&gt;
openclaw status &lt;span class="nt"&gt;--deep&lt;/span&gt;

&lt;span class="c"&gt;# 2. Pin a known-good version&lt;/span&gt;
&lt;span class="c"&gt;# example only: use your package manager / deployment method&lt;/span&gt;

&lt;span class="c"&gt;# 3. Snapshot memory + job state&lt;/span&gt;
rsync &lt;span class="nt"&gt;-av&lt;/span&gt; ~/.openclaw/ /backups/openclaw-&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%F-%H%M&lt;span class="si"&gt;)&lt;/span&gt;/

&lt;span class="c"&gt;# 4. Test channels independently&lt;/span&gt;
&lt;span class="c"&gt;# Slack, Discord, Telegram should each get isolated smoke tests&lt;/span&gt;

&lt;span class="c"&gt;# 5. Run short constrained tasks before long autonomous loops&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And at the architecture level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Good:
- narrow task scope
- explicit tools
- bounded permissions
- stable model choice
- backup + restore path

Bad:
- vague goals
- broad permissions
- cheapest available model
- no state recovery
- multiple live integrations changed at once
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Where Standard Compute fits
&lt;/h2&gt;

&lt;p&gt;If you are running OpenClaw, n8n, Make, Zapier, or custom agents, the hardest part is often not getting a model call to work.&lt;/p&gt;

&lt;p&gt;It is keeping capable model usage affordable enough that you do not sabotage your own system design.&lt;/p&gt;

&lt;p&gt;That is why I think the pricing model matters almost as much as the benchmark.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://standardcompute.com" rel="noopener noreferrer"&gt;Standard Compute&lt;/a&gt; is interesting because it is a drop-in OpenAI-compatible API with flat monthly pricing, so you can run agent and automation workloads without per-token billing pressure. It also routes across models like GPT-5.4, Claude Opus 4.6, and Grok 4.20.&lt;/p&gt;

&lt;p&gt;For this kind of workload, that matters.&lt;/p&gt;

&lt;p&gt;Because once you stop optimizing every workflow around token anxiety, you can make better engineering choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pick stronger models when reliability matters&lt;/li&gt;
&lt;li&gt;let automations run continuously&lt;/li&gt;
&lt;li&gt;avoid weird prompt-minimization hacks&lt;/li&gt;
&lt;li&gt;build around throughput and outcomes instead of token panic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That does not magically fix OpenClaw.&lt;/p&gt;

&lt;p&gt;But it does remove one of the most common reasons people deploy agents with the wrong model in the first place.&lt;/p&gt;

&lt;p&gt;And based on that Reddit thread, wrong model choice is a lot of the story.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;That thread did not prove OpenClaw is universally stable.&lt;/p&gt;

&lt;p&gt;It proved something more useful:&lt;/p&gt;

&lt;p&gt;Fully working setups exist, and they are engineered into existence.&lt;/p&gt;

&lt;p&gt;That is a much better answer than hype.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>devops</category>
    </item>
    <item>
      <title>I read the 33-comment Reddit fight about Google Spark vs OpenClaw and the real debate is way weirder</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Wed, 20 May 2026 19:41:55 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-read-the-33-comment-reddit-fight-about-google-spark-vs-openclaw-and-the-real-debate-is-way-weirder-2261</link>
      <guid>https://dev.to/lars_winstand/i-read-the-33-comment-reddit-fight-about-google-spark-vs-openclaw-and-the-real-debate-is-way-weirder-2261</guid>
      <description>&lt;p&gt;The 33-comment &lt;a href="https://reddit.com/r/openclaw/comments/1ti9oyj/google_spark_vs_openclaw/" rel="noopener noreferrer"&gt;r/openclaw thread about Google Spark vs OpenClaw&lt;/a&gt; looks like a model comparison.&lt;/p&gt;

&lt;p&gt;It isn’t.&lt;/p&gt;

&lt;p&gt;The actual argument is about who controls the workflow surface area.&lt;/p&gt;

&lt;p&gt;One commenter nailed it: &lt;a href="https://reddit.com/r/openclaw/comments/1ti9oyj/google_spark_vs_openclaw/" rel="noopener noreferrer"&gt;“the real competition isn’t ‘whose AI is smarter’ — it’s: who owns the workflow surface area”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That’s the whole thing.&lt;/p&gt;

&lt;p&gt;Google owns Gmail, Docs, Calendar, Drive, Meet, Search, and Android. OpenClaw gives you local models, Markdown memory, and the ability to wire an agent into your own machine and your own stack.&lt;/p&gt;

&lt;p&gt;Those are not competing on the same axis.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real split: convenience vs control
&lt;/h2&gt;

&lt;p&gt;If Google Spark can act across Gmail, Google Docs, Google Calendar, Google Drive, and Android with one identity layer, that’s an absurd distribution advantage.&lt;/p&gt;

&lt;p&gt;For mainstream users, that matters more than whether one model scores 3 points higher on some benchmark.&lt;/p&gt;

&lt;p&gt;But the OpenClaw crowd is optimizing for something else entirely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local model support&lt;/li&gt;
&lt;li&gt;self-hosted or semi-self-hosted workflows&lt;/li&gt;
&lt;li&gt;inspectable memory&lt;/li&gt;
&lt;li&gt;tool use outside a SaaS sandbox&lt;/li&gt;
&lt;li&gt;control over long-running agent behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why one of the cleanest comments in the thread was: &lt;a href="https://reddit.com/r/openclaw/comments/1ti9oyj/google_spark_vs_openclaw/" rel="noopener noreferrer"&gt;“Can Google Spark use local models? If not, different use case in my mind.”&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Exactly.&lt;/p&gt;

&lt;p&gt;Same category label. Different species.&lt;/p&gt;

&lt;h2&gt;
  
  
  What OpenClaw is actually good at
&lt;/h2&gt;

&lt;p&gt;OpenClaw makes local and custom backends a first-class feature, not a checkbox.&lt;/p&gt;

&lt;p&gt;It supports things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LM Studio&lt;/li&gt;
&lt;li&gt;Ollama&lt;/li&gt;
&lt;li&gt;MLX&lt;/li&gt;
&lt;li&gt;vLLM&lt;/li&gt;
&lt;li&gt;SGLang&lt;/li&gt;
&lt;li&gt;LiteLLM&lt;/li&gt;
&lt;li&gt;custom OpenAI-compatible endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters because developers don’t just want “bring your own API key.”&lt;/p&gt;

&lt;p&gt;They want this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# local model server&lt;/span&gt;
ollama serve

&lt;span class="c"&gt;# local OpenAI-compatible endpoint&lt;/span&gt;
curl http://127.0.0.1:11434/api/tags
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# LM Studio local server&lt;/span&gt;
curl http://127.0.0.1:1234/v1/models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:1234/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not-needed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize this repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s a very different product philosophy from a cloud assistant that mostly wants to keep you inside one vendor’s ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The local model dream is real. So is the hardware bill.
&lt;/h2&gt;

&lt;p&gt;This is where the Reddit threads got honest.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://reddit.com/r/openclaw/comments/1thyofj/mac_studio_ultra_192gb_for_local_ai_can_you/" rel="noopener noreferrer"&gt;another r/openclaw post&lt;/a&gt;, someone said they were spending about $280/month on Claude and Codex for browser automation workflows and were considering a Mac Studio M4 Ultra 192GB to reduce recurring cost.&lt;/p&gt;

&lt;p&gt;That is the real buyer question for agent builders:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep paying cloud rent?&lt;/li&gt;
&lt;li&gt;Buy hardware and absorb setup pain?&lt;/li&gt;
&lt;li&gt;Trade convenience for control?&lt;/li&gt;
&lt;li&gt;Optimize for demos or for agents running all day?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also where token pricing quietly wrecks good automation ideas.&lt;/p&gt;

&lt;p&gt;A workflow looks cheap when you test it 10 times.&lt;/p&gt;

&lt;p&gt;Then you put it into n8n, Make, Zapier, OpenClaw, or a custom worker loop and suddenly you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;summarization passes&lt;/li&gt;
&lt;li&gt;browser actions&lt;/li&gt;
&lt;li&gt;memory writes&lt;/li&gt;
&lt;li&gt;planning loops&lt;/li&gt;
&lt;li&gt;tool calls&lt;/li&gt;
&lt;li&gt;background runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And now cost isn’t a line item. It’s architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why memory is the sneaky important part
&lt;/h2&gt;

&lt;p&gt;The most underrated OpenClaw feature is not local inference.&lt;/p&gt;

&lt;p&gt;It’s memory that developers can inspect.&lt;/p&gt;

&lt;p&gt;OpenClaw stores memory in plain Markdown files. There’s a daily log and optionally a long-term &lt;code&gt;MEMORY.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That means your agent memory is not trapped behind a mystery UI.&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;version it with Git&lt;/li&gt;
&lt;li&gt;back it up&lt;/li&gt;
&lt;li&gt;audit it&lt;/li&gt;
&lt;li&gt;edit it manually&lt;/li&gt;
&lt;li&gt;replace the strategy entirely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a huge difference from assistants where memory exists somewhere behind the curtain and you’re expected to trust the vibe.&lt;/p&gt;

&lt;p&gt;Example of the kind of tooling OpenClaw exposes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;memory_search
memory_get
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the compaction setup is even more interesting because it assumes long-running sessions are normal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"compaction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"reserveTokensFloor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"memoryFlush"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"softThresholdTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"systemPrompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Session nearing compaction. Store durable memories now."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That tells you OpenClaw expects agents to run long enough that context management becomes a systems problem.&lt;/p&gt;

&lt;p&gt;That’s not a toy use case.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://reddit.com/r/openclaw/comments/1tiw94x/how_do_you_keep_long_sessions_from_eating_the/" rel="noopener noreferrer"&gt;another thread about long sessions eating context&lt;/a&gt;, one user said native compaction felt too late and too all-or-nothing, so they added a Plugin SDK hook on &lt;code&gt;before_prompt_build&lt;/code&gt; to gradually compress older turns.&lt;/p&gt;

&lt;p&gt;That’s peak OpenClaw.&lt;/p&gt;

&lt;p&gt;If the runtime doesn’t behave the way you want, patch the runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hard line: can it touch your machine?
&lt;/h2&gt;

&lt;p&gt;This was the cleanest product boundary in the whole Reddit debate.&lt;/p&gt;

&lt;p&gt;In a related thread about Google launching its own version of OpenClaw, a top comment said: &lt;a href="https://reddit.com/r/openclaw/comments/1ti2bkp/google_is_launching_its_own_version_of_openclaw/" rel="noopener noreferrer"&gt;“This isn’t their version of openclaw, it’s cloud based. It can’t sysadmin your linux machine.”&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That sounds snarky, but it’s also accurate.&lt;/p&gt;

&lt;p&gt;For a lot of developers, an assistant is only interesting if it can operate across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local files&lt;/li&gt;
&lt;li&gt;terminal commands&lt;/li&gt;
&lt;li&gt;browser automation&lt;/li&gt;
&lt;li&gt;internal services&lt;/li&gt;
&lt;li&gt;remote APIs&lt;/li&gt;
&lt;li&gt;local model servers&lt;/li&gt;
&lt;li&gt;self-hosted tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a different category from “help me inside Gmail and Docs.”&lt;/p&gt;

&lt;p&gt;And yes, there are users already doing this for real. In &lt;a href="https://reddit.com/r/openclaw/comments/1tiicaq/anyone_else_have_a_fully_working_oc/" rel="noopener noreferrer"&gt;“Anyone else have a fully working OC?”&lt;/a&gt;, the original poster said OpenClaw had been helping for 4 weeks with projects, memory, full system access, and routine work using a local Qwen 3.6 27B model.&lt;/p&gt;

&lt;p&gt;That’s not hypothetical.&lt;/p&gt;

&lt;h2&gt;
  
  
  So who wins?
&lt;/h2&gt;

&lt;p&gt;My take is simple.&lt;/p&gt;

&lt;p&gt;Google Spark will probably win the most users.&lt;/p&gt;

&lt;p&gt;OpenClaw will probably win the most demanding users.&lt;/p&gt;

&lt;p&gt;Those are different contests.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;What it’s really optimizing for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Google Spark&lt;/td&gt;
&lt;td&gt;Tight Google ecosystem integration, cloud-first convenience, built-in distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;Local model support, hackable workflows, inspectable Markdown memory, self-hostable components&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw + Ollama or LM Studio + Qwen&lt;/td&gt;
&lt;td&gt;More setup work, lower potential recurring cost, maximum control over runtime and model behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If your work already lives in Gmail, Docs, Calendar, Drive, and Android, Google has a massive advantage.&lt;/p&gt;

&lt;p&gt;If your definition of success includes local models, OpenAI-compatible endpoints, custom memory behavior, or agents that can operate outside a SaaS sandbox, then Google Spark is not replacing OpenClaw.&lt;/p&gt;

&lt;p&gt;It’s solving a different problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part developers should care about most: cost changes behavior
&lt;/h2&gt;

&lt;p&gt;The bluntest comment in the thread was probably the most useful: “Ultimately the cheaper option will win.”&lt;/p&gt;

&lt;p&gt;That sounds simplistic until you run real agents in production.&lt;/p&gt;

&lt;p&gt;Once your automations are active all day, pricing changes everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;whether you use Claude Opus or a smaller model&lt;/li&gt;
&lt;li&gt;whether you let the model think longer&lt;/li&gt;
&lt;li&gt;whether you keep memory verbose or compress aggressively&lt;/li&gt;
&lt;li&gt;whether you retry failed steps&lt;/li&gt;
&lt;li&gt;whether you build the workflow at all&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly why flat-rate inference is becoming more interesting for agent builders.&lt;/p&gt;

&lt;p&gt;If you’re wiring LLMs into n8n, Make, Zapier, OpenClaw, or custom workers, per-token billing creates weird incentives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cut context too early&lt;/li&gt;
&lt;li&gt;avoid retries&lt;/li&gt;
&lt;li&gt;downgrade models aggressively&lt;/li&gt;
&lt;li&gt;add cost guards everywhere&lt;/li&gt;
&lt;li&gt;hesitate to let agents run 24/7&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s bad for product design.&lt;/p&gt;

&lt;p&gt;It’s one reason services like &lt;a href="https://standardcompute.com" rel="noopener noreferrer"&gt;Standard Compute&lt;/a&gt; are worth paying attention to.&lt;/p&gt;

&lt;p&gt;Standard Compute is a drop-in OpenAI API replacement with flat monthly pricing, so you can run automations and agents without obsessing over every token. It works with existing OpenAI-compatible SDKs and HTTP clients, which is the practical part developers actually care about.&lt;/p&gt;

&lt;p&gt;A minimal swap looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.standardcompute.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a coding assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review this pull request summary.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That matters if your real problem is not “which model won Twitter this week?” but “how do I keep my agents running without turning token spend into a full-time job?”&lt;/p&gt;

&lt;h2&gt;
  
  
  My practical takeaway
&lt;/h2&gt;

&lt;p&gt;If you want a polished assistant inside the Google ecosystem, Google Spark is the obvious bet.&lt;/p&gt;

&lt;p&gt;If you want an agent runtime you can inspect, rewire, point at Ollama, LM Studio, LiteLLM, or a custom proxy, OpenClaw is still playing a different and much more interesting game.&lt;/p&gt;

&lt;p&gt;And if you’re building serious automations, the missing variable in most of these debates is cost.&lt;/p&gt;

&lt;p&gt;Not benchmark cost.&lt;/p&gt;

&lt;p&gt;Operational cost.&lt;/p&gt;

&lt;p&gt;That’s what decides whether the workflow survives first contact with production.&lt;/p&gt;

&lt;p&gt;This 33-comment Reddit fight wasn’t really Spark vs OpenClaw.&lt;/p&gt;

&lt;p&gt;It was managed convenience vs user sovereignty.&lt;/p&gt;

&lt;p&gt;And for developers building long-running AI automations, that argument is only getting louder.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>automation</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I think the real AI agent war is who owns your inbox, browser, and calendar</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Wed, 20 May 2026 11:43:02 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-think-the-real-ai-agent-war-is-who-owns-your-inbox-browser-and-calendar-jgg</link>
      <guid>https://dev.to/lars_winstand/i-think-the-real-ai-agent-war-is-who-owns-your-inbox-browser-and-calendar-jgg</guid>
      <description>&lt;p&gt;I went into a Reddit rabbit hole expecting the usual argument:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-5 vs Claude Opus&lt;/li&gt;
&lt;li&gt;Gemini vs DeepSeek&lt;/li&gt;
&lt;li&gt;hosted vs local&lt;/li&gt;
&lt;li&gt;benchmark chart of the week&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not what serious agent users are arguing about.&lt;/p&gt;

&lt;p&gt;After reading a big r/openclaw thread and digging through real setups, I think the actual battle is much simpler:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who owns the workflow surface area?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not who has the smartest model on a leaderboard.&lt;/p&gt;

&lt;p&gt;Who controls the places where work actually happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inbox&lt;/li&gt;
&lt;li&gt;calendar&lt;/li&gt;
&lt;li&gt;docs&lt;/li&gt;
&lt;li&gt;browser&lt;/li&gt;
&lt;li&gt;chat&lt;/li&gt;
&lt;li&gt;internal tools&lt;/li&gt;
&lt;li&gt;task state&lt;/li&gt;
&lt;li&gt;permissions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you build agents for real work, that question matters more than another 2% on an eval.&lt;/p&gt;

&lt;h2&gt;
  
  
  The useful mental model: models are commodities, surfaces are moats
&lt;/h2&gt;

&lt;p&gt;A model can be swapped.&lt;/p&gt;

&lt;p&gt;A workflow surface is much harder to displace.&lt;/p&gt;

&lt;p&gt;If an agent already has trusted access to Gmail, Google Docs, Calendar, Drive, Meet, Search, and Android, it starts with a huge advantage. It can read, draft, schedule, search, notify, and follow up without asking the user to glue together 12 APIs.&lt;/p&gt;

&lt;p&gt;That is why Google is dangerous in agents.&lt;/p&gt;

&lt;p&gt;Not because Gemini is magical.&lt;/p&gt;

&lt;p&gt;Because Google already sits where work happens.&lt;/p&gt;

&lt;p&gt;On the other side, tools like OpenClaw are winning a different way: not by owning the surface, but by giving you access to more of it.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Telegram as the control plane&lt;/li&gt;
&lt;li&gt;browser automation for messy tasks&lt;/li&gt;
&lt;li&gt;local model support&lt;/li&gt;
&lt;li&gt;direct access to internal dashboards&lt;/li&gt;
&lt;li&gt;persistent memory across projects&lt;/li&gt;
&lt;li&gt;custom routing between multiple models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a very different product strategy.&lt;/p&gt;

&lt;p&gt;And honestly, for a lot of developers, it is the more interesting one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I saw in real user setups
&lt;/h2&gt;

&lt;p&gt;The best part of these Reddit threads was that people were not talking in abstractions.&lt;/p&gt;

&lt;p&gt;They were posting actual stacks.&lt;/p&gt;

&lt;p&gt;One user described a setup on a Mac Mini M4 with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-5 via OAuth&lt;/li&gt;
&lt;li&gt;Telegram as the primary interface&lt;/li&gt;
&lt;li&gt;memory and workflow routing&lt;/li&gt;
&lt;li&gt;project-specific threads&lt;/li&gt;
&lt;li&gt;a second framework running as a sandbox&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not a “which model is smartest?” setup.&lt;/p&gt;

&lt;p&gt;That is an orchestration setup.&lt;/p&gt;

&lt;p&gt;The clever detail was using one Telegram group with just the user and bot, then creating a topic per project so each conversation becomes its own working session.&lt;/p&gt;

&lt;p&gt;That is scrappy, weird, and very good.&lt;/p&gt;

&lt;p&gt;Big vendors usually do not ship weird-first.&lt;/p&gt;

&lt;p&gt;Open ecosystems do.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing that breaks first in production is not intelligence
&lt;/h2&gt;

&lt;p&gt;It is cost and reliability.&lt;/p&gt;

&lt;p&gt;That was the most practical takeaway from the threads.&lt;/p&gt;

&lt;p&gt;People were not mainly complaining about model quality.&lt;/p&gt;

&lt;p&gt;They were complaining about agents wasting money on dumb work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;heartbeat checks&lt;/li&gt;
&lt;li&gt;cron-triggered polling&lt;/li&gt;
&lt;li&gt;status checks&lt;/li&gt;
&lt;li&gt;retries after browser failures&lt;/li&gt;
&lt;li&gt;re-reading context that did not need premium reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where a lot of “agent demos” fall apart when you run them all day.&lt;/p&gt;

&lt;p&gt;The expensive part is often not the hard reasoning step.&lt;/p&gt;

&lt;p&gt;It is the junk around the edges.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple routing rule beats a stronger model used everywhere
&lt;/h2&gt;

&lt;p&gt;A lot of teams still treat model selection like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-opus&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is easy.&lt;/p&gt;

&lt;p&gt;It is also how you end up with a giant bill for work that did not need Claude Opus.&lt;/p&gt;

&lt;p&gt;A more realistic setup looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;agent_routing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;heartbeat_checks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;glm-5.1&lt;/span&gt;
  &lt;span class="na"&gt;cron_pings&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;glm-5.1&lt;/span&gt;
  &lt;span class="na"&gt;browser_research&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-sonnet-4.6&lt;/span&gt;
  &lt;span class="na"&gt;hard_reasoning&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-5.4&lt;/span&gt;
  &lt;span class="na"&gt;local_private_tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;qwen-3.6-27b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That routing layer is boring compared to model launch drama.&lt;/p&gt;

&lt;p&gt;It is also where your economics live.&lt;/p&gt;

&lt;p&gt;If your agent runs 24/7, the difference between “best model everywhere” and “best model where it matters” is the difference between a toy and a system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closed ecosystems vs open ecosystems
&lt;/h2&gt;

&lt;p&gt;I do not think one side wins completely.&lt;/p&gt;

&lt;p&gt;I think the stack is splitting into two categories.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;What wins&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Closed ecosystem agents&lt;/td&gt;
&lt;td&gt;Native access, trust, convenience, polished UX, enterprise-friendly permissions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open ecosystem agents&lt;/td&gt;
&lt;td&gt;Browser control, local models, custom workflows, internal tools, weird glue code, faster experimentation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Durable advantage&lt;/td&gt;
&lt;td&gt;Integrations, memory, routing, retries, permissions, and action surfaces&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gmail&lt;/li&gt;
&lt;li&gt;Calendar&lt;/li&gt;
&lt;li&gt;Docs&lt;/li&gt;
&lt;li&gt;Meet&lt;/li&gt;
&lt;li&gt;Android notifications&lt;/li&gt;
&lt;li&gt;admin-friendly controls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then a Google-style ecosystem is hard to beat.&lt;/p&gt;

&lt;p&gt;If you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;browser automation&lt;/li&gt;
&lt;li&gt;Telegram control&lt;/li&gt;
&lt;li&gt;local Qwen or Llama fallback&lt;/li&gt;
&lt;li&gt;weird CRM integrations&lt;/li&gt;
&lt;li&gt;direct database access&lt;/li&gt;
&lt;li&gt;internal admin panels&lt;/li&gt;
&lt;li&gt;custom long-running workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then the open side gets very attractive very quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why developers should care about “workflow surface area”
&lt;/h2&gt;

&lt;p&gt;Because this changes how you should build agents.&lt;/p&gt;

&lt;p&gt;If you are still thinking mostly in terms of “pick one best model,” you are optimizing the wrong layer.&lt;/p&gt;

&lt;p&gt;The real architecture questions are more like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Where does the agent live?&lt;/li&gt;
&lt;li&gt;What systems can it touch?&lt;/li&gt;
&lt;li&gt;What state does it persist?&lt;/li&gt;
&lt;li&gt;Which steps need premium reasoning?&lt;/li&gt;
&lt;li&gt;Which steps can be cheap?&lt;/li&gt;
&lt;li&gt;What happens when browser automation fails?&lt;/li&gt;
&lt;li&gt;How do you keep it running without babysitting cost?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the practical agent stack.&lt;/p&gt;

&lt;p&gt;Not just prompts. Not just evals. Not just model preference.&lt;/p&gt;

&lt;h2&gt;
  
  
  A more realistic agent architecture
&lt;/h2&gt;

&lt;p&gt;This is closer to what production looks like than most AI demos:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Interface: Telegram / Slack / email
Memory: shared vector store + task state
Routing: cheap model for routine, premium model for edge cases
Actions: browser, docs, calendar, CRM, internal tools
Fallbacks: local Qwen or Llama when cloud access is blocked
Observability: logs, retries, alerts, usage tracking
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not a chatbot.&lt;/p&gt;

&lt;p&gt;That is an operating layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost problem nobody wants to admit
&lt;/h2&gt;

&lt;p&gt;A lot of teams are quietly discovering that agent economics are still ugly.&lt;/p&gt;

&lt;p&gt;A Zapier AI agent that runs a few times a day can survive sloppy orchestration.&lt;/p&gt;

&lt;p&gt;A real always-on agent cannot.&lt;/p&gt;

&lt;p&gt;If your system is constantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;polling APIs&lt;/li&gt;
&lt;li&gt;checking inboxes&lt;/li&gt;
&lt;li&gt;re-summarizing threads&lt;/li&gt;
&lt;li&gt;retrying browser steps&lt;/li&gt;
&lt;li&gt;escalating everything to the most expensive model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then your architecture is broken, even if your demo looked great.&lt;/p&gt;

&lt;p&gt;This is exactly why predictable compute matters more for agents than for casual chat.&lt;/p&gt;

&lt;p&gt;Agents are persistent. They loop. They retry. They watch things. They perform background work.&lt;/p&gt;

&lt;p&gt;That means per-token billing gets painful fast.&lt;/p&gt;

&lt;p&gt;Not because one single request is expensive.&lt;/p&gt;

&lt;p&gt;Because the system never really stops.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would build differently after reading all this
&lt;/h2&gt;

&lt;p&gt;If I were designing an agent stack today, I would optimize in this order:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Own the interface
&lt;/h3&gt;

&lt;p&gt;Pick the place users already live.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slack&lt;/li&gt;
&lt;li&gt;Telegram&lt;/li&gt;
&lt;li&gt;email&lt;/li&gt;
&lt;li&gt;browser extension&lt;/li&gt;
&lt;li&gt;internal ops dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Minimize premium model usage
&lt;/h3&gt;

&lt;p&gt;Use the strongest model only where failure is expensive.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;contract review&lt;/li&gt;
&lt;li&gt;ambiguous browser recovery&lt;/li&gt;
&lt;li&gt;long-horizon planning&lt;/li&gt;
&lt;li&gt;code generation with side effects&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Treat orchestration as the product
&lt;/h3&gt;

&lt;p&gt;Your moat is not just the model.&lt;/p&gt;

&lt;p&gt;It is the combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory&lt;/li&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;permissions&lt;/li&gt;
&lt;li&gt;tool access&lt;/li&gt;
&lt;li&gt;state management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Design for long-running economics
&lt;/h3&gt;

&lt;p&gt;Assume the agent runs all day.&lt;/p&gt;

&lt;p&gt;If the cost model only works in a demo, it does not work.&lt;/p&gt;

&lt;h2&gt;
  
  
  A concrete dev setup
&lt;/h2&gt;

&lt;p&gt;If you are experimenting with agents right now, a practical stack might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# example services&lt;/span&gt;
agent-api
worker
browser-runner
memory-store
postgres
redis
telegram-bot
observability
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And your routing layer might be as simple as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;pickModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;taskType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;heartbeat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;polling&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;glm-5.1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;browser_research&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-sonnet-4.6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hard_reasoning&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;critical_planning&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5.4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;private_local_task&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;qwen-3.6-27b&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5.4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That one function can matter more than a week of benchmark discourse.&lt;/p&gt;

&lt;h2&gt;
  
  
  The boring infrastructure question that becomes the whole business
&lt;/h2&gt;

&lt;p&gt;Once agents move from demos to production, you start caring about things that are not fun to tweet about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request routing&lt;/li&gt;
&lt;li&gt;concurrency&lt;/li&gt;
&lt;li&gt;throttling&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;batching&lt;/li&gt;
&lt;li&gt;model fallback&lt;/li&gt;
&lt;li&gt;cost ceilings&lt;/li&gt;
&lt;li&gt;always-on usage patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also where a lot of teams hit a wall with standard API pricing.&lt;/p&gt;

&lt;p&gt;Per-token billing is tolerable when humans are manually prompting.&lt;/p&gt;

&lt;p&gt;It gets much worse when software is prompting constantly.&lt;/p&gt;

&lt;p&gt;That is why “unlimited compute for agents” is not just a pricing gimmick. It changes what you are willing to automate.&lt;/p&gt;

&lt;p&gt;If your n8n, Make, Zapier, OpenClaw, or custom agent workflow has to think about token spend on every loop, you end up designing around fear.&lt;/p&gt;

&lt;p&gt;If the cost is predictable, you design around throughput and reliability instead.&lt;/p&gt;

&lt;p&gt;That is a much better place to build from.&lt;/p&gt;

&lt;p&gt;For teams running long-lived automations, this is the part worth paying attention to: a drop-in OpenAI-compatible API with flat monthly pricing means you can keep the orchestration logic you already have, but stop treating every background task like a billing risk.&lt;/p&gt;

&lt;p&gt;That is the practical appeal of Standard Compute.&lt;/p&gt;

&lt;h2&gt;
  
  
  My take
&lt;/h2&gt;

&lt;p&gt;The next AI agent war is not primarily Gemini vs GPT-5 vs Claude.&lt;/p&gt;

&lt;p&gt;That fight matters, but it is downstream.&lt;/p&gt;

&lt;p&gt;The upstream fight is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who gets to sit on top of your daily workflow surfaces?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If Google owns the trusted surfaces, it will be brutally strong.&lt;/p&gt;

&lt;p&gt;If tools like OpenClaw keep owning the weird workflows, they will keep attracting the most inventive users.&lt;/p&gt;

&lt;p&gt;And the teams that win will not just have good models.&lt;/p&gt;

&lt;p&gt;They will have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the best access to real work surfaces&lt;/li&gt;
&lt;li&gt;the best orchestration layer&lt;/li&gt;
&lt;li&gt;the best routing logic&lt;/li&gt;
&lt;li&gt;the best memory and tool control&lt;/li&gt;
&lt;li&gt;a cost structure that lets agents stay on all day&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last part is still underrated.&lt;/p&gt;

&lt;p&gt;Model intelligence matters.&lt;/p&gt;

&lt;p&gt;But for real agents, &lt;strong&gt;workflow gravity&lt;/strong&gt; matters more.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>automation</category>
      <category>devops</category>
    </item>
    <item>
      <title>I thought we needed another agent framework — turns out we needed a job_id and a boring config folder</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Wed, 20 May 2026 08:40:25 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-thought-we-needed-another-agent-framework-turns-out-we-needed-a-jobid-and-a-boring-config-4hfk</link>
      <guid>https://dev.to/lars_winstand/i-thought-we-needed-another-agent-framework-turns-out-we-needed-a-jobid-and-a-boring-config-4hfk</guid>
      <description>&lt;p&gt;A lot of agent engineering advice still sounds like framework shopping.&lt;/p&gt;

&lt;p&gt;Should you use OpenClaw or n8n?&lt;br&gt;
Is LiteLLM enough?&lt;br&gt;
Do you need LangGraph, an MCP server, or a custom Rust runtime with a dashboard that looks like Mission Control?&lt;/p&gt;

&lt;p&gt;After reading a bunch of real production threads, I think most teams are solving the wrong problem.&lt;/p&gt;

&lt;p&gt;They think they need a better framework.&lt;/p&gt;

&lt;p&gt;What they actually need is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a shared config layer for prompts, tools, and policies&lt;/li&gt;
&lt;li&gt;explicit model routing&lt;/li&gt;
&lt;li&gt;run-level tracing with a stable &lt;code&gt;job_id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;one place to see what happened across retries, tool calls, fallbacks, and provider swaps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the boring part of agent systems.&lt;/p&gt;

&lt;p&gt;It’s also the part that keeps long-running automations from turning into folklore.&lt;/p&gt;
&lt;h2&gt;
  
  
  The pattern I kept seeing
&lt;/h2&gt;

&lt;p&gt;I kept running into Reddit posts from people who said they wanted an agent framework comparison.&lt;/p&gt;

&lt;p&gt;But when you read closely, they were describing operations problems.&lt;/p&gt;

&lt;p&gt;One thread on r/openclaw was from someone running OpenClaw in production on a Mac Mini M4 with 16GB RAM, using GPT-5.5 via OAuth, Telegram as the interface, memory, workflow routing, and a side-by-side sandbox for testing a second framework.&lt;/p&gt;

&lt;p&gt;The key line was this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Building a portable 'brain' layer (prompts, memory, workflows, routing rules) that can eventually work across multiple frameworks&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is not a framework problem.&lt;/p&gt;

&lt;p&gt;That is the adult version of agent engineering.&lt;/p&gt;

&lt;p&gt;Another thread described an API gateway with a Rust correlator where every run gets a &lt;code&gt;job_id&lt;/code&gt; and that ID follows the run across LLM calls and tool invocations.&lt;/p&gt;

&lt;p&gt;That’s the layer most teams are missing.&lt;/p&gt;

&lt;p&gt;Not another runtime.&lt;/p&gt;

&lt;p&gt;A durable operational spine.&lt;/p&gt;
&lt;h2&gt;
  
  
  What actually breaks first in long-running agents?
&lt;/h2&gt;

&lt;p&gt;Not intelligence.&lt;/p&gt;

&lt;p&gt;Operations.&lt;/p&gt;

&lt;p&gt;The first failures are usually boring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;runaway loops&lt;/li&gt;
&lt;li&gt;fallback confusion&lt;/li&gt;
&lt;li&gt;stale memory&lt;/li&gt;
&lt;li&gt;duplicated retry logic&lt;/li&gt;
&lt;li&gt;expensive models handling cheap tasks&lt;/li&gt;
&lt;li&gt;no way to explain one bad run end-to-end&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One OpenClaw user said they burned through tokens their first week because the agent looped on heartbeat checks and cron pings.&lt;/p&gt;

&lt;p&gt;That should sound familiar to anyone who has let an automation run overnight.&lt;/p&gt;

&lt;p&gt;The fix was not a better prompt.&lt;/p&gt;

&lt;p&gt;The fix was routing policy.&lt;/p&gt;

&lt;p&gt;They moved routine work to cheaper models and kept stronger reasoning models for the hard parts.&lt;/p&gt;

&lt;p&gt;That’s the move.&lt;/p&gt;

&lt;p&gt;Not “make the agent smarter.”&lt;/p&gt;

&lt;p&gt;Make the default path cheaper and easier to debug.&lt;/p&gt;
&lt;h2&gt;
  
  
  Cheap defaults beat clever prompts
&lt;/h2&gt;

&lt;p&gt;If your agent is doing background work like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;heartbeat checks&lt;/li&gt;
&lt;li&gt;cron pings&lt;/li&gt;
&lt;li&gt;email triage&lt;/li&gt;
&lt;li&gt;status polling&lt;/li&gt;
&lt;li&gt;repetitive browser steps&lt;/li&gt;
&lt;li&gt;simple classification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...then sending every step to Claude Opus or GPT-5 is just expensive laziness.&lt;/p&gt;

&lt;p&gt;Use the expensive model when the run has earned it.&lt;/p&gt;

&lt;p&gt;A simple routing policy gets you further than another week of prompt tuning:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;TASK_TO_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;heartbeat_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fast-cheap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cron_ping&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fast-cheap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_triage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fast-cheap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status_poll&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fast-cheap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classification&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mid-tier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;browser_exception&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strong-reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complex_reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strong-reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;pick_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TASK_TO_MODEL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mid-tier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you’re running agents in n8n, Make, Zapier, OpenClaw, or custom workers, this matters a lot more than people admit.&lt;/p&gt;

&lt;p&gt;Most runaway cost comes from boring background work nobody classified.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one thing I’d add before adopting another framework
&lt;/h2&gt;

&lt;p&gt;Before you migrate anything, add a &lt;code&gt;job_id&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Not request IDs.&lt;/p&gt;

&lt;p&gt;Run IDs.&lt;/p&gt;

&lt;p&gt;A single long-running automation can touch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-5.4&lt;/li&gt;
&lt;li&gt;Claude Opus 4.6&lt;/li&gt;
&lt;li&gt;Grok 4.20&lt;/li&gt;
&lt;li&gt;browser tools&lt;/li&gt;
&lt;li&gt;webhooks&lt;/li&gt;
&lt;li&gt;approval steps&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;queues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your observability stops at request logs, you don’t really have observability.&lt;/p&gt;

&lt;p&gt;You have receipts.&lt;/p&gt;

&lt;p&gt;What you need is a story for one run.&lt;/p&gt;

&lt;p&gt;Here’s the minimum useful pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_job&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;job_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="n"&gt;job_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;start_job&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x-job-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x-agent-name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;support-triage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# pass these headers into every LLM request, tool call, and webhook
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then aggregate by &lt;code&gt;job_id&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model used at each step&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;tool calls&lt;/li&gt;
&lt;li&gt;fallbacks&lt;/li&gt;
&lt;li&gt;token usage&lt;/li&gt;
&lt;li&gt;cost&lt;/li&gt;
&lt;li&gt;human interventions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you do that, incident review gets much easier.&lt;/p&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why is the dashboard weird?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What happened in job_123?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s a much better question.&lt;/p&gt;

&lt;h2&gt;
  
  
  The repo shape tells you whether a team gets agent ops
&lt;/h2&gt;

&lt;p&gt;The healthiest setups I’ve seen all converge on the same basic shape.&lt;/p&gt;

&lt;p&gt;Keep the durable stuff separate from the replaceable stuff.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agents/
  openclaw-prod/
    .env
    workflows/
    runtime/
  sandbox-framework/
    .env
    workflows/
    runtime/
shared-brain/
  prompts/
  tools/
  policies/
  memory-schema.json
  routing.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That layout says:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompts are portable&lt;/li&gt;
&lt;li&gt;tool contracts are portable&lt;/li&gt;
&lt;li&gt;policies are portable&lt;/li&gt;
&lt;li&gt;memory schema is portable&lt;/li&gt;
&lt;li&gt;runtimes are disposable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s what you want.&lt;/p&gt;

&lt;p&gt;Because OpenClaw might change.&lt;br&gt;
Your n8n flow might become a Python worker.&lt;br&gt;
Your memory layer might move to a Cloudflare Worker exposed over MCP.&lt;br&gt;
Your provider mix might change next month.&lt;/p&gt;

&lt;p&gt;If your prompts, policies, and memory schema are trapped inside one framework’s opinionated format, every migration becomes painful for no good reason.&lt;/p&gt;
&lt;h2&gt;
  
  
  A practical routing config beats framework magic
&lt;/h2&gt;

&lt;p&gt;I’d rather have a plain YAML file I can inspect than hidden routing logic buried in a framework abstraction.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;default_model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-5.4-mini&lt;/span&gt;
&lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;heartbeat_check&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-5.4-mini&lt;/span&gt;
  &lt;span class="na"&gt;cron_ping&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-5.4-mini&lt;/span&gt;
  &lt;span class="na"&gt;email_triage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-5.4-mini&lt;/span&gt;
  &lt;span class="na"&gt;browser_automation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-opus-4.6&lt;/span&gt;
  &lt;span class="na"&gt;research_synthesis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-5.4&lt;/span&gt;
  &lt;span class="na"&gt;fallback_reasoning&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grok-4.20&lt;/span&gt;
&lt;span class="na"&gt;budgets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;max_cost_per_job_usd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.75&lt;/span&gt;
  &lt;span class="na"&gt;max_llm_calls_per_job&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;40&lt;/span&gt;
&lt;span class="na"&gt;fallbacks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-opus-4.6&lt;/span&gt;
    &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-5.4&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-5.4&lt;/span&gt;
    &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grok-4.20&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your routing policy is visible.&lt;/p&gt;

&lt;p&gt;You can diff it.&lt;br&gt;
You can review it in PRs.&lt;br&gt;
You can compare behavior across frameworks.&lt;/p&gt;

&lt;p&gt;That is a lot more useful than another demo of an autonomous agent planning vacation itineraries.&lt;/p&gt;
&lt;h2&gt;
  
  
  Framework choice still matters, just less than people think
&lt;/h2&gt;

&lt;p&gt;To be fair: framework choice is not fake.&lt;/p&gt;

&lt;p&gt;It matters if you care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;built-in memory models&lt;/li&gt;
&lt;li&gt;local model support for Qwen or Llama&lt;/li&gt;
&lt;li&gt;UI ergonomics&lt;/li&gt;
&lt;li&gt;tool ecosystem&lt;/li&gt;
&lt;li&gt;workflow authoring style&lt;/li&gt;
&lt;li&gt;MCP support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But once agents become operationally important, framework choice stops being the center of gravity.&lt;/p&gt;

&lt;p&gt;The real questions become:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can I move prompts and policies without rewriting everything?&lt;/li&gt;
&lt;li&gt;Can I compare Claude, GPT-5, and Grok on the same job type?&lt;/li&gt;
&lt;li&gt;Can I see cost, latency, retries, and tool calls in one run view?&lt;/li&gt;
&lt;li&gt;Can I stop silent fallback behavior before it burns budget?&lt;/li&gt;
&lt;li&gt;Can I swap runtimes without losing my memory schema?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s agent ops.&lt;/p&gt;

&lt;p&gt;It’s less glamorous than framework demos.&lt;/p&gt;

&lt;p&gt;It’s also what survives six months of production use.&lt;/p&gt;
&lt;h2&gt;
  
  
  The tradeoff, plainly
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;What happens over time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Framework-centric setup&lt;/td&gt;
&lt;td&gt;Fast to start, but prompts, memory, and workflow logic get tightly coupled to one runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API gateway plus portable config&lt;/td&gt;
&lt;td&gt;Better visibility, easier provider swaps, cleaner routing control, but requires discipline around schemas and metadata&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Direct provider integrations in each workflow&lt;/td&gt;
&lt;td&gt;Fine for small projects, but routing, observability, and fallback logic get duplicated everywhere&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you are a solo builder with one short-lived agent, don’t build a giant control plane.&lt;/p&gt;

&lt;p&gt;That’s overkill.&lt;/p&gt;

&lt;p&gt;But if you have multiple workflows, long-running jobs, or agents running 24/7, the framework-first setup starts rotting from the edges.&lt;/p&gt;

&lt;p&gt;Every workflow invents its own retry logic.&lt;br&gt;
Every prompt drifts.&lt;br&gt;
Every dashboard tells a different partial truth.&lt;/p&gt;

&lt;p&gt;That’s usually when teams start looking for an OpenAI API alternative.&lt;/p&gt;

&lt;p&gt;And honestly, what they often want is not just lower pricing.&lt;/p&gt;

&lt;p&gt;They want one consistent execution layer where routing, budgets, and visibility are not reinvented inside every single agent.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why this connects directly to cost
&lt;/h2&gt;

&lt;p&gt;This is the part people miss.&lt;/p&gt;

&lt;p&gt;Agent ops is cost control.&lt;/p&gt;

&lt;p&gt;If you can’t see a run end-to-end, you can’t answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why one workflow got expensive&lt;/li&gt;
&lt;li&gt;which model handled each step&lt;/li&gt;
&lt;li&gt;whether fallback increased cost&lt;/li&gt;
&lt;li&gt;whether retries multiplied spend&lt;/li&gt;
&lt;li&gt;whether background tasks should be routed to cheaper models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why flat, predictable AI compute is interesting for automation teams.&lt;/p&gt;

&lt;p&gt;Not because pricing is a nice spreadsheet feature.&lt;/p&gt;

&lt;p&gt;Because per-token billing punishes exactly the kind of experimentation and long-running execution that agent systems need.&lt;/p&gt;

&lt;p&gt;If you’re building automations that run all day in n8n, Make, Zapier, OpenClaw, or custom workers, token anxiety becomes an architecture problem.&lt;/p&gt;

&lt;p&gt;You start avoiding useful checks.&lt;br&gt;
You under-instrument jobs.&lt;br&gt;
You hesitate to add retries.&lt;br&gt;
You route too much logic through one provider because cost modeling is annoying.&lt;/p&gt;

&lt;p&gt;That’s backwards.&lt;/p&gt;

&lt;p&gt;The infrastructure should make long-running jobs easier to operate, not harder to justify.&lt;/p&gt;

&lt;p&gt;This is a big part of why services like Standard Compute are interesting to teams building agents and automations.&lt;/p&gt;

&lt;p&gt;You keep the OpenAI-compatible API surface, but you get predictable monthly pricing, dynamic routing across models like GPT-5.4, Claude Opus 4.6, and Grok 4.20, and you stop treating every extra automation step like a billing event you need to babysit.&lt;/p&gt;

&lt;p&gt;That changes how people build.&lt;/p&gt;

&lt;p&gt;Especially once jobs run 24/7.&lt;/p&gt;
&lt;h2&gt;
  
  
  My practical recommendation
&lt;/h2&gt;

&lt;p&gt;If your first instinct is to adopt another framework, stop for a minute.&lt;/p&gt;

&lt;p&gt;Do these four things first:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Add a shared config layer
&lt;/h3&gt;

&lt;p&gt;Put prompts, policies, tool definitions, and memory schema outside the runtime.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Add explicit routing rules
&lt;/h3&gt;

&lt;p&gt;Don’t let model selection happen implicitly.&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Add a &lt;code&gt;job_id&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Trace one run across every LLM call, tool call, retry, and fallback.&lt;/p&gt;
&lt;h3&gt;
  
  
  4. Add budget controls outside the framework
&lt;/h3&gt;

&lt;p&gt;Make spend limits and fallback policy visible and editable without rewriting workflow code.&lt;/p&gt;

&lt;p&gt;If you want a tiny starting point, even this is enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; shared-brain/&lt;span class="o"&gt;{&lt;/span&gt;prompts,tools,policies&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="nb"&gt;touch &lt;/span&gt;shared-brain/memory-schema.json
&lt;span class="nb"&gt;touch &lt;/span&gt;shared-brain/routing.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then wire your runtime to read from it.&lt;/p&gt;

&lt;p&gt;That one decision will age better than most framework migrations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The boring layer is the real product
&lt;/h2&gt;

&lt;p&gt;The cleanest mental model I’ve found is to separate three things:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The brain
&lt;/h3&gt;

&lt;p&gt;Prompts, policies, workflow definitions, tool contracts, memory references.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The runtime
&lt;/h3&gt;

&lt;p&gt;OpenClaw, n8n, a Python worker, a Rust gateway, a Cloudflare Worker, whatever runs the job today.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The ops layer
&lt;/h3&gt;

&lt;p&gt;Routing, budgets, tracing, correlation, failover rules, reporting.&lt;/p&gt;

&lt;p&gt;If those are fused together, every change becomes political.&lt;/p&gt;

&lt;p&gt;Switching providers feels risky.&lt;br&gt;
Testing a second framework feels expensive.&lt;br&gt;
Debugging a bad run feels like archaeology.&lt;/p&gt;

&lt;p&gt;If those layers are separate, your system gets boring in the best possible way.&lt;/p&gt;

&lt;p&gt;And boring is exactly what you want when an agent has been running for eight hours, touched email, Telegram, browser automation, and background jobs, and now somebody wants to know why it made one weird decision at 3:14 AM.&lt;/p&gt;

&lt;p&gt;My takeaway is simple.&lt;/p&gt;

&lt;p&gt;Most teams do not need another agent framework.&lt;/p&gt;

&lt;p&gt;They need a shared config folder, explicit routing rules, and a &lt;code&gt;job_id&lt;/code&gt; that can explain what their agent did all night.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>devops</category>
      <category>automation</category>
    </item>
    <item>
      <title>I read the OpenClaw thread everyone shared — these 5 fixes cut agent costs to one-third and stopped the loops</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Wed, 20 May 2026 04:15:07 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-read-the-openclaw-thread-everyone-shared-these-5-fixes-cut-agent-costs-to-one-third-and-stopped-2792</link>
      <guid>https://dev.to/lars_winstand/i-read-the-openclaw-thread-everyone-shared-these-5-fixes-cut-agent-costs-to-one-third-and-stopped-2792</guid>
      <description>&lt;h1&gt;
  
  
  I read the OpenClaw thread everyone shared — these 5 fixes cut agent costs to one-third and stopped the loops
&lt;/h1&gt;

&lt;p&gt;I clicked into a popular r/openclaw thread expecting the usual advice: tweak the prompt, pick a smarter model, maybe add more context.&lt;/p&gt;

&lt;p&gt;Instead, the OP described the exact failure mode a lot of us hit when we move from demos to always-on agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Opus 4.6 handling cheap background work&lt;/li&gt;
&lt;li&gt;vague completion criteria&lt;/li&gt;
&lt;li&gt;retries with no hard stop&lt;/li&gt;
&lt;li&gt;state living inside prompts instead of durable storage&lt;/li&gt;
&lt;li&gt;loops burning money while doing almost nothing useful&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The useful part was that this wasn’t one silver bullet. It was a stack of practical fixes.&lt;/p&gt;

&lt;p&gt;And the biggest one was brutally simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;stop sending cheap work to expensive models&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;According to the thread, moving heartbeat checks, cron pings, and other low-value supervision off Claude Opus cut spend to about one-third.&lt;/p&gt;

&lt;p&gt;That tracks with what I keep seeing in OpenClaw, n8n, Make, Zapier, and custom worker setups. The expensive part usually isn’t the main reasoning step. It’s the invisible scaffolding around it.&lt;/p&gt;

&lt;p&gt;If you’re building long-running agents, these 5 fixes are worth stealing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern: cost problems start as reliability problems
&lt;/h2&gt;

&lt;p&gt;Agents rarely become expensive because one prompt was huge.&lt;/p&gt;

&lt;p&gt;They become expensive because a workflow can’t confidently tell whether it succeeded.&lt;/p&gt;

&lt;p&gt;Then it retries.&lt;/p&gt;

&lt;p&gt;Then it retries again.&lt;/p&gt;

&lt;p&gt;Then it does all of that on Claude Opus 4.6.&lt;/p&gt;

&lt;p&gt;That’s how you end up paying premium-model rates for what is basically daemon maintenance.&lt;/p&gt;

&lt;p&gt;A rough version of the bad pattern looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callModel&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Check whether the job completed. If not, decide what to do next. Context: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;hugeContext&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;saysDone&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;done&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks fine in testing.&lt;/p&gt;

&lt;p&gt;It gets ugly when it runs 24/7.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fix 1: Stop using Claude Opus for heartbeat checks and cron pings
&lt;/h2&gt;

&lt;p&gt;This was the clearest lesson from the thread.&lt;/p&gt;

&lt;p&gt;Claude Opus 4.6 is great for hard reasoning. It is a bad choice for cheap supervision.&lt;/p&gt;

&lt;p&gt;Tasks that usually should not hit your most expensive model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;heartbeat checks
n- cron-trigger validation&lt;/li&gt;
&lt;li&gt;retry bookkeeping&lt;/li&gt;
&lt;li&gt;simple routing&lt;/li&gt;
&lt;li&gt;status classification&lt;/li&gt;
&lt;li&gt;watchdog logic&lt;/li&gt;
&lt;li&gt;"did this step finish?" checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the task is basically classification or state inspection, use a cheaper layer.&lt;/p&gt;

&lt;p&gt;A cleaner architecture looks more like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;routeTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;heartbeat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;lightweightCheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;status_check&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;gpt54StatusCheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deep_reasoning&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;claudeOpusDecision&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;synthesis&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;grok420Synthesis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the right mental model: model triage.&lt;/p&gt;

&lt;p&gt;Not loyalty.&lt;/p&gt;

&lt;p&gt;Not “send everything to the smartest model.”&lt;/p&gt;

&lt;p&gt;Just match cost to task difficulty.&lt;/p&gt;

&lt;h3&gt;
  
  
  My take
&lt;/h3&gt;

&lt;p&gt;The loser here is the all-Claude-Opus architecture. It feels elegant until you realize your agent is using a premium model to narrate its own retries.&lt;/p&gt;

&lt;p&gt;If a task could be implemented as a boolean check, a rules engine, or a cheap classifier, don’t wrap it in expensive reasoning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fix 2: Add explicit success criteria or the agent will loop forever
&lt;/h2&gt;

&lt;p&gt;A lot of agent loops are just weak definitions of done.&lt;/p&gt;

&lt;p&gt;Bad:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“make sure the sync worked”&lt;/li&gt;
&lt;li&gt;“confirm the task completed”&lt;/li&gt;
&lt;li&gt;“retry if needed”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Better:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;file exists at expected path&lt;/li&gt;
&lt;li&gt;API returned HTTP 200&lt;/li&gt;
&lt;li&gt;row count increased by 1&lt;/li&gt;
&lt;li&gt;webhook delivered with matching job ID&lt;/li&gt;
&lt;li&gt;CRM record status changed to &lt;code&gt;processed&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The thread’s OP improved reliability by making completion verifiable instead of interpretive.&lt;/p&gt;

&lt;p&gt;That’s the difference between an agent that finishes and an agent that keeps thinking out loud.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;verifyJobComplete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://api.example.com/jobs/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_url&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then your loop becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runStep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;verifyJobComplete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;verification_failed_after_5_attempts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s boring code.&lt;/p&gt;

&lt;p&gt;Boring is good.&lt;/p&gt;

&lt;p&gt;Boring code is cheaper than “agent intuition.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Fix 3: Put anti-loop rules in code, not just prompts
&lt;/h2&gt;

&lt;p&gt;If your only loop prevention is “please do not retry excessively,” you do not have loop prevention.&lt;/p&gt;

&lt;p&gt;You have wishful thinking.&lt;/p&gt;

&lt;p&gt;Hard limits matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;max retries per step&lt;/li&gt;
&lt;li&gt;max retries per job&lt;/li&gt;
&lt;li&gt;cooldown windows&lt;/li&gt;
&lt;li&gt;duplicate action detection&lt;/li&gt;
&lt;li&gt;dead-letter queue for stuck runs&lt;/li&gt;
&lt;li&gt;escalation path to human review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A practical pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MAX_STEP_RETRIES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MAX_JOB_RETRIES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;shouldRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stepRetries&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;MAX_STEP_RETRIES&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;jobRetries&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;MAX_JOB_RETRIES&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastError&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;invalid_input&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And log retry reasons explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jobId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"job_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"step"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sync_customer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"retry"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"webhook_timeout"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"nextAttemptInSeconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where a lot of teams get lazy. They let the model decide whether another retry “feels right.”&lt;/p&gt;

&lt;p&gt;Don’t do that.&lt;/p&gt;

&lt;p&gt;Retries are control flow. Control flow belongs in code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fix 4: Store state in Redis or Postgres instead of re-prompting old context
&lt;/h2&gt;

&lt;p&gt;This one matters a lot for long-running OpenClaw jobs.&lt;/p&gt;

&lt;p&gt;If an agent made a decision, store it somewhere durable.&lt;/p&gt;

&lt;p&gt;Don’t keep shoving the same history back into the prompt and hope compaction preserves the important part.&lt;/p&gt;

&lt;p&gt;That approach fails first when your workflow crosses tools.&lt;/p&gt;

&lt;p&gt;A realistic automation might look like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;OpenClaw decides to start a task&lt;/li&gt;
&lt;li&gt;n8n waits for a webhook&lt;/li&gt;
&lt;li&gt;Make transforms the payload&lt;/li&gt;
&lt;li&gt;Zapier updates Salesforce or HubSpot&lt;/li&gt;
&lt;li&gt;the agent wakes up six minutes later and needs to resume&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the only memory is inside a shrinking prompt window, drift is inevitable.&lt;/p&gt;

&lt;p&gt;If the state is in Redis or Postgres, the agent can resume from facts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Redis example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Redis&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ioredis&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;REDIS_URL&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;saveWorkflowState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`workflow:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;EX&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;loadWorkflowState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`workflow:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;jobId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Postgres example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;workflow_state&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;job_id&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;last_decision&lt;/span&gt; &lt;span class="n"&gt;jsonb&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;retry_count&lt;/span&gt; &lt;span class="nb"&gt;integer&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then your agent prompt can stay small and focused:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Job status: awaiting_webhook
Last decision: wait for provider callback
Retry count: 1
Next action options: [poll_status, mark_failed, continue_waiting]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s much better than pasting 4,000 tokens of historical narration back into every call.&lt;/p&gt;

&lt;h3&gt;
  
  
  My take
&lt;/h3&gt;

&lt;p&gt;A lot of teams pay premium model costs to compensate for weak state handling.&lt;/p&gt;

&lt;p&gt;That’s backwards.&lt;/p&gt;

&lt;p&gt;Better state is cheaper than better prompting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fix 5: Separate orchestration from reasoning
&lt;/h2&gt;

&lt;p&gt;This is the architectural version of the first four fixes.&lt;/p&gt;

&lt;p&gt;Use code for orchestration.&lt;br&gt;
Use models for reasoning.&lt;/p&gt;

&lt;p&gt;Not the other way around.&lt;/p&gt;

&lt;p&gt;Your worker should own:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;scheduling&lt;/li&gt;
&lt;li&gt;idempotency&lt;/li&gt;
&lt;li&gt;state transitions&lt;/li&gt;
&lt;li&gt;timeout handling&lt;/li&gt;
&lt;li&gt;webhook correlation&lt;/li&gt;
&lt;li&gt;rate limiting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your model should own:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ambiguous classification&lt;/li&gt;
&lt;li&gt;planning when rules are insufficient&lt;/li&gt;
&lt;li&gt;summarization&lt;/li&gt;
&lt;li&gt;extraction when structure is messy&lt;/li&gt;
&lt;li&gt;non-trivial decision-making&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simple split:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;loadWorkflowState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;awaiting_classification&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;classifyWithGPT54&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;awaiting_complex_decision&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;decideWithClaudeOpus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;awaiting_status_check&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;pollProviderAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;awaiting_synthesis&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;synthesizeWithGrok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Unknown state: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is less magical than “autonomous agent does everything.”&lt;/p&gt;

&lt;p&gt;It’s also much more reliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed after these fixes
&lt;/h2&gt;

&lt;p&gt;The thread’s reported result was the kind of improvement that actually changes workflow design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;spend dropped to about one-third&lt;/li&gt;
&lt;li&gt;loops were reduced&lt;/li&gt;
&lt;li&gt;reliability improved&lt;/li&gt;
&lt;li&gt;long-running jobs stopped losing the plot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That sequence makes sense.&lt;/p&gt;

&lt;p&gt;First, move cheap recurring work off expensive models.&lt;br&gt;
Then define what success actually means.&lt;br&gt;
Then stop retries from becoming infinite.&lt;br&gt;
Then give the agent durable state.&lt;/p&gt;

&lt;p&gt;Once you do that, you stop paying for confusion.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical checklist
&lt;/h2&gt;

&lt;p&gt;If you’re running OpenClaw agents or similar automations, here’s the checklist I’d use:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;th&gt;What to do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model triage&lt;/td&gt;
&lt;td&gt;Keep Claude Opus 4.6 for hard reasoning. Use GPT-5.4 or cheaper logic for status checks, routing, and supervision.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verifiable completion&lt;/td&gt;
&lt;td&gt;End every important step with a testable success condition.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anti-loop controls&lt;/td&gt;
&lt;td&gt;Set max retries, cooldowns, duplicate detection, and dead-letter handling in code.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Durable state&lt;/td&gt;
&lt;td&gt;Store decisions in Redis, Postgres, or OpenClaw memory features instead of bloating prompts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration split&lt;/td&gt;
&lt;td&gt;Let code manage workflow control flow; let models handle actual reasoning.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why this matters more under per-token billing
&lt;/h2&gt;

&lt;p&gt;This is the part people notice late.&lt;/p&gt;

&lt;p&gt;Per-token pricing punishes exactly the kind of behavior serious automations need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;watchdog checks&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;polling&lt;/li&gt;
&lt;li&gt;long-running supervision&lt;/li&gt;
&lt;li&gt;cross-tool coordination&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a chat app, one bad retry is annoying.&lt;/p&gt;

&lt;p&gt;In OpenClaw, n8n, Make, Zapier, or a custom queue, one bad retry pattern can run every few minutes forever.&lt;/p&gt;

&lt;p&gt;That’s why predictable pricing matters more as agents get more useful.&lt;/p&gt;

&lt;p&gt;The more background calls your system needs, the worse token anxiety gets.&lt;/p&gt;

&lt;p&gt;If you’re running agents continuously, a flat-cost API setup is often a better fit than metering every tiny supervision call. Standard Compute is interesting here because it keeps the OpenAI-compatible API shape developers already use, but swaps per-token pricing for a predictable monthly cost. That makes a lot more sense for always-on automations than staring at usage charts and hoping your watchdog logic behaves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;The best part of that OpenClaw thread was that it didn’t pretend the answer was “just use a smarter model.”&lt;/p&gt;

&lt;p&gt;It was the opposite.&lt;/p&gt;

&lt;p&gt;Use Claude Opus 4.6 when the task deserves Claude Opus 4.6.&lt;br&gt;
Use GPT-5.4 for lighter decisions.&lt;br&gt;
Use Grok 4.20 when synthesis is the actual job.&lt;br&gt;
And don’t ask premium models to babysit your infrastructure.&lt;/p&gt;

&lt;p&gt;If a workflow can’t prove it finished, it will eventually loop.&lt;br&gt;
If state only lives in prompts, it will eventually drift.&lt;br&gt;
If retries are controlled by vibes, they will eventually get expensive.&lt;/p&gt;

&lt;p&gt;That’s not just an OpenClaw lesson.&lt;/p&gt;

&lt;p&gt;That’s the operating manual for any long-running AI automation.&lt;/p&gt;

&lt;p&gt;If you’re building one right now, start by auditing every model call that happens when nothing interesting is happening.&lt;/p&gt;

&lt;p&gt;That’s usually where the money is going.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>automation</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>I kept seeing the same OpenClaw mistake: one expensive model for every job</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Tue, 19 May 2026 19:42:33 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-kept-seeing-the-same-openclaw-mistake-one-expensive-model-for-every-job-5fmp</link>
      <guid>https://dev.to/lars_winstand/i-kept-seeing-the-same-openclaw-mistake-one-expensive-model-for-every-job-5fmp</guid>
      <description>&lt;p&gt;I kept running into the same OpenClaw setup mistake over and over:&lt;/p&gt;

&lt;p&gt;people pick one expensive model, wire it in as the default, and then let it handle everything.&lt;/p&gt;

&lt;p&gt;Heartbeat checks.&lt;br&gt;
Cron pings.&lt;br&gt;
Inbox triage.&lt;br&gt;
"Nothing changed" loops.&lt;br&gt;
Low-stakes tagging.&lt;/p&gt;

&lt;p&gt;That is not a clever agent architecture. That is just an expensive default.&lt;/p&gt;

&lt;p&gt;While researching OpenClaw setups, I found a thread on r/openclaw where someone said the quiet part out loud:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Stop using opus for everything. seriously. i was running it on heartbeat checks and cron pings which is just lighting money on fire. glm-5.1 handles all that stuff fine. i only use sonnet 4.6 now when the task actually needs reasoning and my token costs are like a third of what they were”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the right lesson.&lt;/p&gt;

&lt;p&gt;Not just for OpenClaw.&lt;br&gt;
For n8n, Make, Zapier, custom Python workers, and basically any agent setup that runs on a schedule.&lt;/p&gt;

&lt;p&gt;If you are still using one premium model for every task, you do not have a model strategy. You have a billing strategy you forgot to review.&lt;/p&gt;
&lt;h2&gt;
  
  
  The actual takeaway: route by task, not by brand loyalty
&lt;/h2&gt;

&lt;p&gt;A lot of developers still treat model selection like a global app setting.&lt;/p&gt;

&lt;p&gt;Pick GPT-5.4.&lt;br&gt;
Or Claude Opus 4.6.&lt;br&gt;
Or Gemini 3.5 Flash.&lt;br&gt;
Done.&lt;/p&gt;

&lt;p&gt;That works for a demo.&lt;br&gt;
It falls apart in production.&lt;/p&gt;

&lt;p&gt;Real agent systems do different kinds of work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cheap classification&lt;/li&gt;
&lt;li&gt;extraction&lt;/li&gt;
&lt;li&gt;tagging&lt;/li&gt;
&lt;li&gt;summarization&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;memory maintenance&lt;/li&gt;
&lt;li&gt;occasional hard reasoning&lt;/li&gt;
&lt;li&gt;occasional high-risk decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those should not all hit the same model.&lt;/p&gt;

&lt;p&gt;The boring jobs should be cheap.&lt;br&gt;
The hard jobs should get the expensive model.&lt;br&gt;
The dangerous jobs should get the model that passes your evals.&lt;/p&gt;

&lt;p&gt;That is model routing.&lt;br&gt;
And honestly, it is just basic engineering once your workflows run all day.&lt;/p&gt;
&lt;h2&gt;
  
  
  OpenClaw already nudges you toward this
&lt;/h2&gt;

&lt;p&gt;One thing I like about OpenClaw is that the config shape already hints at the right mental model.&lt;/p&gt;

&lt;p&gt;You can define a primary model and ordered fallbacks.&lt;br&gt;
You can also split out image, PDF, and image generation models.&lt;/p&gt;

&lt;p&gt;That is not accidental.&lt;br&gt;
That is the product telling you different tasks deserve different models.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;primary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai/gpt-5.4-mini&lt;/span&gt;
      &lt;span class="na"&gt;fallbacks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;anthropic/claude-sonnet-4.6&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;google/gemini-3.5-flash&lt;/span&gt;
    &lt;span class="na"&gt;imageModel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;google/gemini-3.5-flash&lt;/span&gt;
    &lt;span class="na"&gt;pdfModel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai/gpt-5.4&lt;/span&gt;
    &lt;span class="na"&gt;imageGenerationModel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai/gpt-image-1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you look at OpenClaw this way, the Reddit advice stops sounding like a hack.&lt;br&gt;
It starts sounding like the intended operating model.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why people keep wasting frontier models on tiny jobs
&lt;/h2&gt;

&lt;p&gt;Because "agent work" sounds smarter than it usually is.&lt;/p&gt;

&lt;p&gt;A heartbeat check feels sophisticated because an agent is doing it.&lt;br&gt;
A cron-triggered inbox review feels important because it uses AI.&lt;br&gt;
But a lot of recurring automation work is just:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classify this&lt;/li&gt;
&lt;li&gt;summarize that&lt;/li&gt;
&lt;li&gt;compare two notes&lt;/li&gt;
&lt;li&gt;tag a ticket&lt;/li&gt;
&lt;li&gt;decide whether anything changed&lt;/li&gt;
&lt;li&gt;move on&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly where smaller, cheaper models win.&lt;/p&gt;

&lt;p&gt;One commenter in the same thread said it perfectly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“No reason to burn opus tokens on a cron check that runs every 10 minutes.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yep.&lt;/p&gt;

&lt;p&gt;If a task runs every 10 minutes, you are not choosing a model once.&lt;br&gt;
You are choosing it 144 times per day.&lt;br&gt;
Then you multiply that by every queue, retry loop, mailbox, and background task you forgot was still running.&lt;/p&gt;
&lt;h2&gt;
  
  
  The pricing spread is big enough that bad defaults compound fast
&lt;/h2&gt;

&lt;p&gt;This is where the mistake stops being theoretical.&lt;/p&gt;

&lt;p&gt;Here is the rough shape of the cost difference across common automation-friendly models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;What it means for automation work&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4&lt;/td&gt;
&lt;td&gt;$2.50 input / $15.00 output per 1M tokens; best kept for hard reasoning and high-value steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4-mini&lt;/td&gt;
&lt;td&gt;$0.75 input / $4.50 output per 1M tokens; solid default for routine transforms and summaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4-nano&lt;/td&gt;
&lt;td&gt;$0.20 input / $1.25 output per 1M tokens; strong candidate for heartbeat checks, classifiers, and cron work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3.5 Flash&lt;/td&gt;
&lt;td&gt;$1.50 input / $9.00 output per 1M tokens; usable for recurring admin tasks and batch workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The exact numbers will change over time.&lt;br&gt;
The important part is the spread.&lt;/p&gt;

&lt;p&gt;If your default is a frontier model, every low-value task inherits premium pricing.&lt;br&gt;
And retries make it worse.&lt;/p&gt;
&lt;h2&gt;
  
  
  OpenClaw memory makes this even more obvious
&lt;/h2&gt;

&lt;p&gt;OpenClaw’s memory model is one of the more practical parts of the system.&lt;/p&gt;

&lt;p&gt;Instead of pretending memory is some magical hidden state, it writes durable state to files like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;MEMORY.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;memory/YYYY-MM-DD.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;optional &lt;code&gt;DREAMS.md&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means a lot of recurring "agentic" work is really just file maintenance plus lightweight judgment.&lt;/p&gt;

&lt;p&gt;Things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;checking whether today’s notes contain anything worth promoting&lt;/li&gt;
&lt;li&gt;summarizing a session into &lt;code&gt;MEMORY.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;tagging daily notes&lt;/li&gt;
&lt;li&gt;triaging low-priority email&lt;/li&gt;
&lt;li&gt;deciding whether something needs escalation&lt;/li&gt;
&lt;li&gt;confirming a scheduled task completed normally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That does not automatically require Claude Opus 4.6 or GPT-5.4.&lt;/p&gt;

&lt;p&gt;If the model is reading yesterday’s note, reading today’s note, and deciding whether to append one sentence, you probably want cheap and consistent, not frontier-tier reasoning.&lt;/p&gt;
&lt;h2&gt;
  
  
  Retries and fallbacks are where expensive defaults get really dumb
&lt;/h2&gt;

&lt;p&gt;OpenClaw supports failover and fallback chains.&lt;br&gt;
That is good.&lt;br&gt;
You want that.&lt;/p&gt;

&lt;p&gt;But fallback logic changes the economics.&lt;/p&gt;

&lt;p&gt;If your default model is expensive, you do not just overpay once.&lt;br&gt;
You overpay on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the initial call&lt;/li&gt;
&lt;li&gt;the retry&lt;/li&gt;
&lt;li&gt;the fallback attempt&lt;/li&gt;
&lt;li&gt;the loop you forgot to cap&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why background jobs are dangerous.&lt;br&gt;
They are easy to ignore, and they quietly multiply usage.&lt;/p&gt;

&lt;p&gt;A cron task every 30 minutes does not feel expensive.&lt;br&gt;
A cron task every 30 minutes for weeks, with retries, definitely is.&lt;/p&gt;
&lt;h2&gt;
  
  
  A sane routing policy for OpenClaw
&lt;/h2&gt;

&lt;p&gt;This is the practical version.&lt;/p&gt;
&lt;h3&gt;
  
  
  Use smaller models for:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;heartbeat checks&lt;/li&gt;
&lt;li&gt;cron pings&lt;/li&gt;
&lt;li&gt;simple classification&lt;/li&gt;
&lt;li&gt;tagging&lt;/li&gt;
&lt;li&gt;deduping&lt;/li&gt;
&lt;li&gt;queue cleanup&lt;/li&gt;
&lt;li&gt;low-risk summarization&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Use mid-tier models for:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;routine transforms&lt;/li&gt;
&lt;li&gt;memory promotion drafts&lt;/li&gt;
&lt;li&gt;support triage with some ambiguity&lt;/li&gt;
&lt;li&gt;structured extraction with moderate complexity&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Use premium models for:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;hard reasoning&lt;/li&gt;
&lt;li&gt;ambiguous multi-step tool use&lt;/li&gt;
&lt;li&gt;sensitive customer-facing responses&lt;/li&gt;
&lt;li&gt;compliance-sensitive decisions&lt;/li&gt;
&lt;li&gt;destructive actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is a simple default stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;primary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai/gpt-5.4-nano&lt;/span&gt;
      &lt;span class="na"&gt;fallbacks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;openai/gpt-5.4-mini&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;anthropic/claude-sonnet-4.6&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;openai/gpt-5.4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then override specific tasks that actually need the heavier model.&lt;/p&gt;

&lt;p&gt;That setup is less pretty than "we use Claude for everything" or "we standardized on GPT-5.4."&lt;/p&gt;

&lt;p&gt;It is also much more competent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Don’t route by price alone
&lt;/h2&gt;

&lt;p&gt;Cheap routing can absolutely backfire.&lt;/p&gt;

&lt;p&gt;A task can look simple but still be failure-sensitive.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;approving refunds&lt;/li&gt;
&lt;li&gt;sending customer-facing messages&lt;/li&gt;
&lt;li&gt;deciding whether to escalate a compliance issue&lt;/li&gt;
&lt;li&gt;triggering a destructive action in a tool chain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those should not be assigned by vibes.&lt;/p&gt;

&lt;p&gt;Two rules help a lot:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Route by consequence, not just complexity
&lt;/h3&gt;

&lt;p&gt;A simple classifier can still be dangerous.&lt;br&gt;
If the output controls money, customer trust, or irreversible actions, treat it as high-risk.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Route by evals, not marketing
&lt;/h3&gt;

&lt;p&gt;A cheaper model that fails your real prompts is not cheaper.&lt;br&gt;
It is just a slower way to ship bugs.&lt;/p&gt;

&lt;p&gt;If Gemini 3.5 Flash, GLM-5.1, GPT-5.4-nano, or Claude Sonnet 4.6 passes your actual eval set for a task, great.&lt;br&gt;
Use it.&lt;br&gt;
If it fails, move up.&lt;/p&gt;

&lt;p&gt;That is routing.&lt;br&gt;
Not ideology.&lt;/p&gt;
&lt;h2&gt;
  
  
  Quick way to audit your current setup
&lt;/h2&gt;

&lt;p&gt;If you already have OpenClaw running, here is a dead simple audit process.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. List every recurring task
&lt;/h3&gt;

&lt;p&gt;Make a table for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;task name&lt;/li&gt;
&lt;li&gt;trigger frequency&lt;/li&gt;
&lt;li&gt;current model&lt;/li&gt;
&lt;li&gt;failure impact&lt;/li&gt;
&lt;li&gt;average prompt size&lt;/li&gt;
&lt;li&gt;average output size&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  2. Find the obviously overpriced jobs
&lt;/h3&gt;

&lt;p&gt;Look for tasks that are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;frequent&lt;/li&gt;
&lt;li&gt;repetitive&lt;/li&gt;
&lt;li&gt;low-risk&lt;/li&gt;
&lt;li&gt;easy to evaluate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are your first routing wins.&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Create a cheap-first policy
&lt;/h3&gt;

&lt;p&gt;Start with a small model for low-risk jobs.&lt;br&gt;
Escalate only if evals say you need to.&lt;/p&gt;
&lt;h3&gt;
  
  
  4. Cap loops and retries
&lt;/h3&gt;

&lt;p&gt;If a job can retry forever, your pricing model is already broken.&lt;/p&gt;
&lt;h3&gt;
  
  
  5. Measure before and after
&lt;/h3&gt;

&lt;p&gt;Even a rough comparison is enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# pseudo-checklist&lt;/span&gt;
&lt;span class="c"&gt;# before&lt;/span&gt;
&lt;span class="c"&gt;# - model: claude-opus-4.6&lt;/span&gt;
&lt;span class="c"&gt;# - task frequency: every 10 minutes&lt;/span&gt;
&lt;span class="c"&gt;# - retries: 2&lt;/span&gt;
&lt;span class="c"&gt;# - monthly usage: painful&lt;/span&gt;

&lt;span class="c"&gt;# after&lt;/span&gt;
&lt;span class="c"&gt;# - default: gpt-5.4-nano&lt;/span&gt;
&lt;span class="c"&gt;# - escalate only on low confidence / failed eval cases&lt;/span&gt;
&lt;span class="c"&gt;# - monthly usage: much less painful&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  If you’re building agents at scale, per-token pricing becomes a workflow problem
&lt;/h2&gt;

&lt;p&gt;This is the part people eventually learn the hard way.&lt;/p&gt;

&lt;p&gt;Per-token billing is annoying enough in interactive chat apps.&lt;br&gt;
In automations, it is worse.&lt;/p&gt;

&lt;p&gt;Because the expensive calls are often not the flashy ones.&lt;br&gt;
They are the boring background ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scheduled checks&lt;/li&gt;
&lt;li&gt;agent loops&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;nightly summaries&lt;/li&gt;
&lt;li&gt;queue maintenance&lt;/li&gt;
&lt;li&gt;memory updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly why routing matters.&lt;br&gt;
And it is also why a flat-cost API setup is appealing for teams running lots of recurring agent work.&lt;/p&gt;

&lt;p&gt;If your agents are constantly working through n8n, Make, Zapier, OpenClaw, or custom queues, the real pain is not just token price.&lt;br&gt;
It is the constant need to babysit usage.&lt;/p&gt;

&lt;p&gt;That is the problem Standard Compute is aimed at.&lt;/p&gt;

&lt;p&gt;It gives you an OpenAI-compatible API with flat monthly pricing, so you can keep the routing mindset without getting punished every time your automations actually run.&lt;/p&gt;

&lt;p&gt;The useful combo is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route small jobs to cheaper models&lt;/li&gt;
&lt;li&gt;reserve bigger models for hard steps&lt;/li&gt;
&lt;li&gt;stop treating every cron task like it deserves frontier pricing&lt;/li&gt;
&lt;li&gt;stop watching token spend like a stress dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More here if that sounds familiar: &lt;a href="https://standardcompute.com" rel="noopener noreferrer"&gt;https://standardcompute.com&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The real tell that someone is new to agents
&lt;/h2&gt;

&lt;p&gt;They brag about which model they use.&lt;/p&gt;

&lt;p&gt;People who have actually operated automations for weeks brag about which tasks they stopped wasting expensive models on.&lt;/p&gt;

&lt;p&gt;That is why that OpenClaw thread stuck with me.&lt;br&gt;
The useful lesson was not "GLM-5.1 is secretly amazing" or "Claude Sonnet 4.6 is enough."&lt;/p&gt;

&lt;p&gt;It was the shift underneath:&lt;/p&gt;

&lt;p&gt;Your agent is a workflow, not a shrine to your favorite model.&lt;/p&gt;

&lt;p&gt;Once you see that, model routing stops looking like an optimization trick.&lt;br&gt;
It starts looking like basic competence.&lt;/p&gt;

&lt;p&gt;If a heartbeat check is hitting Claude Opus 4.6 every 10 minutes, that is not sophistication.&lt;br&gt;
It is a leak.&lt;/p&gt;

&lt;p&gt;And if your setup still uses one expensive model for everything, you probably do not need a better prompt first.&lt;/p&gt;

&lt;p&gt;You need a routing policy.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>automation</category>
      <category>devops</category>
    </item>
    <item>
      <title>Anthropic changed the rules on June 15 and exposed the biggest lie in agent pricing</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Tue, 19 May 2026 14:19:45 +0000</pubDate>
      <link>https://dev.to/lars_winstand/anthropic-changed-the-rules-on-june-15-and-exposed-the-biggest-lie-in-agent-pricing-29hf</link>
      <guid>https://dev.to/lars_winstand/anthropic-changed-the-rules-on-june-15-and-exposed-the-biggest-lie-in-agent-pricing-29hf</guid>
      <description>&lt;p&gt;Anthropic’s June 15 change clarified something a lot of people in agent land were trying not to say out loud:&lt;/p&gt;

&lt;p&gt;If your workflow depends on one provider behaving like an unlimited subscription, you do not have durable infrastructure. You have a temporary pricing loophole.&lt;/p&gt;

&lt;p&gt;I found this while going down an AI API pricing comparison rabbit hole and landing on a thread in r/openclaw:&lt;br&gt;
&lt;a href="https://reddit.com/r/openclaw/comments/1tgt1yi/anthropic_is_limiting_openclaw_again_and_honestly/" rel="noopener noreferrer"&gt;https://reddit.com/r/openclaw/comments/1tgt1yi/anthropic_is_limiting_openclaw_again_and_honestly/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At first it looked like a normal pricing complaint.&lt;/p&gt;

&lt;p&gt;It wasn’t.&lt;/p&gt;

&lt;p&gt;It was an architecture post disguised as a billing post.&lt;/p&gt;
&lt;h2&gt;
  
  
  What changed on June 15
&lt;/h2&gt;

&lt;p&gt;The short version:&lt;/p&gt;

&lt;p&gt;Programmatic Claude usage through the Agent SDK, &lt;code&gt;claude -p&lt;/code&gt;, OpenClaw, Zed, and custom scripts now sits behind a separate monthly credit pool.&lt;/p&gt;

&lt;p&gt;Those credits do not roll over.&lt;/p&gt;

&lt;p&gt;When they run out, your automation either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stops&lt;/li&gt;
&lt;li&gt;degrades&lt;/li&gt;
&lt;li&gt;or falls back to standard API billing if you explicitly allow it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That sounds like a billing detail until you picture it happening mid-run.&lt;/p&gt;

&lt;p&gt;Your OpenClaw loop is halfway through a browser task.&lt;br&gt;
Your retry worker is still active.&lt;br&gt;
Your background monitor is polling.&lt;br&gt;
Then the credits hit zero.&lt;/p&gt;

&lt;p&gt;That is not a pricing annoyance.&lt;br&gt;
That is a production failure mode.&lt;/p&gt;
&lt;h2&gt;
  
  
  The real problem is not Anthropic
&lt;/h2&gt;

&lt;p&gt;Anthropic is just the latest company to remind developers that heavy programmatic usage and consumer-style subscriptions were never the same thing.&lt;/p&gt;

&lt;p&gt;One commenter in that thread put it perfectly:&lt;/p&gt;

&lt;p&gt;“This is a subsidized race to market share and lock-in. Take advantage of the competitive dynamics all you can...”&lt;/p&gt;

&lt;p&gt;That’s the whole story.&lt;/p&gt;

&lt;p&gt;A lot of agent stacks were built during a weird honeymoon period where AI pricing felt soft, generous, and kind of fake. Bundles were vague. Limits were fuzzy. Everybody acted like heavy automation could live forever inside a subscription-shaped box.&lt;/p&gt;

&lt;p&gt;But serious workloads always run into quota math.&lt;/p&gt;

&lt;p&gt;If you have ever seen one of these, you already know the pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;openai api quota exceeded&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;rate limit spikes&lt;/li&gt;
&lt;li&gt;token-per-minute caps&lt;/li&gt;
&lt;li&gt;request-per-minute caps&lt;/li&gt;
&lt;li&gt;org-level usage limits&lt;/li&gt;
&lt;li&gt;acceleration limits after sudden traffic bursts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is normal API behavior.&lt;/p&gt;

&lt;p&gt;Not villain behavior.&lt;/p&gt;

&lt;p&gt;Normal.&lt;/p&gt;

&lt;p&gt;And that makes the lesson more uncomfortable:&lt;/p&gt;

&lt;p&gt;If one provider policy change can freeze your agent stack, you never really had an architecture.&lt;br&gt;
You had a discount.&lt;/p&gt;
&lt;h2&gt;
  
  
  What breaks first when credits run out
&lt;/h2&gt;

&lt;p&gt;Not the cool demo.&lt;/p&gt;

&lt;p&gt;The boring glue.&lt;/p&gt;

&lt;p&gt;That’s what makes agent systems dangerous to operate. They usually fail in the background jobs you forgot existed.&lt;/p&gt;
&lt;h2&gt;
  
  
  The silent token burners
&lt;/h2&gt;

&lt;p&gt;While reading more OpenClaw discussions, I found another useful thread:&lt;br&gt;
&lt;a href="https://reddit.com/r/openclaw/comments/1thlo6s/stuff_i_figured_out_after_3_weeks_with_openclaw/" rel="noopener noreferrer"&gt;https://reddit.com/r/openclaw/comments/1thlo6s/stuff_i_figured_out_after_3_weeks_with_openclaw/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One user admitted they burned through tokens in week one for a dumb reason:&lt;/p&gt;

&lt;p&gt;They were using premium models for junk work.&lt;/p&gt;

&lt;p&gt;Their fix was simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stop using Claude Opus for heartbeat checks&lt;/li&gt;
&lt;li&gt;stop using expensive models for cron pings&lt;/li&gt;
&lt;li&gt;move routine tasks to cheaper models&lt;/li&gt;
&lt;li&gt;keep premium models only for tasks that actually need reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They switched routine work to GLM-5.1, kept Claude Sonnet 4.6 for real reasoning, and said cost dropped to roughly one-third.&lt;/p&gt;

&lt;p&gt;That is not a micro-optimization.&lt;/p&gt;

&lt;p&gt;That is a different architecture.&lt;/p&gt;

&lt;p&gt;And once you see it, you can’t unsee it.&lt;/p&gt;

&lt;p&gt;A huge amount of agent spend comes from jobs that absolutely do not need Claude Opus, GPT-5, or any premium reasoning model.&lt;/p&gt;

&lt;p&gt;Typical waste buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;browser loops&lt;/li&gt;
&lt;li&gt;screenshot checks&lt;/li&gt;
&lt;li&gt;wait/retry cycles&lt;/li&gt;
&lt;li&gt;health checks&lt;/li&gt;
&lt;li&gt;cron-triggered pings&lt;/li&gt;
&lt;li&gt;simple extraction&lt;/li&gt;
&lt;li&gt;low-stakes classification&lt;/li&gt;
&lt;li&gt;summarizing already structured data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That work can often go to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a cheaper cloud model&lt;/li&gt;
&lt;li&gt;a local Gemma model&lt;/li&gt;
&lt;li&gt;a Qwen variant&lt;/li&gt;
&lt;li&gt;a Llama variant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If quality allows it, keep expensive reasoning models focused on work that deserves them.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why are people still building single-provider stacks?
&lt;/h2&gt;

&lt;p&gt;Because it’s easy.&lt;/p&gt;

&lt;p&gt;A single-provider stack is clean right up until it isn’t.&lt;/p&gt;

&lt;p&gt;It feels simple in the same way plugging your whole desk into one cheap power strip feels simple.&lt;/p&gt;

&lt;p&gt;Then one thing fails and everything goes dark.&lt;/p&gt;

&lt;p&gt;Here’s the actual tradeoff:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stack design&lt;/th&gt;
&lt;th&gt;What you get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single-provider subscription stack&lt;/td&gt;
&lt;td&gt;Lowest setup complexity, but high exposure to quota changes, policy shifts, and ugly cost surprises once programmatic usage gets segmented&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-provider routed stack&lt;/td&gt;
&lt;td&gt;Better resilience, better cost control, and easier failover, but more operational complexity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local + cloud hybrid stack&lt;/td&gt;
&lt;td&gt;Better control and cheap handling for repetitive work, with cloud reserved for hard tasks, but requires local hardware and model-quality tradeoffs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The single-provider version feels great until the provider changes one rule.&lt;br&gt;
Then it fails all at once.&lt;/p&gt;
&lt;h2&gt;
  
  
  Routing is not optional anymore
&lt;/h2&gt;

&lt;p&gt;This is the part that should be boring by now.&lt;/p&gt;

&lt;p&gt;Provider routing and failover are not advanced features anymore. They are table stakes.&lt;/p&gt;
&lt;h3&gt;
  
  
  OpenRouter example
&lt;/h3&gt;

&lt;p&gt;OpenRouter already supports provider order, fallback behavior, and sorting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai/gpt-4.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ping"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"allow_fallbacks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sort"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"price"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  LiteLLM router example
&lt;/h3&gt;

&lt;p&gt;LiteLLM gives you router-level fallbacks and load balancing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;litellm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Router&lt;/span&gt;

&lt;span class="n"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model_list&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;litellm_params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;azure/&amp;lt;your-deployment-name&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your-azure-endpoint&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your-azure-api-key&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rpm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;litellm_params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;azure/gpt-4-ca&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://my-endpoint-canada-berri992.openai.azure.com/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your-azure-api-key&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rpm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="n"&gt;fallbacks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-3.5-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  OpenAI-compatible HTTP matters more than people think
&lt;/h3&gt;

&lt;p&gt;This is especially true if you are using automation tools like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;n8n&lt;/li&gt;
&lt;li&gt;Make&lt;/li&gt;
&lt;li&gt;Zapier&lt;/li&gt;
&lt;li&gt;OpenClaw&lt;/li&gt;
&lt;li&gt;custom internal workers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of these systems already assume an OpenAI-style API shape.&lt;/p&gt;

&lt;p&gt;That means backend swaps get much easier if your app speaks OpenAI-compatible HTTP instead of a provider-specific SDK with weird assumptions baked in.&lt;/p&gt;

&lt;p&gt;This is one reason Standard Compute is interesting for automation-heavy teams: it works as a drop-in OpenAI API replacement, so existing SDKs and HTTP clients usually need minimal change.&lt;/p&gt;

&lt;p&gt;That matters when your real problem is not prompt quality.&lt;br&gt;
It’s keeping agents running without rewriting your whole integration every time pricing or quotas move.&lt;/p&gt;
&lt;h2&gt;
  
  
  A practical way to structure an agent stack
&lt;/h2&gt;

&lt;p&gt;This is the version I think makes sense for most teams.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Split models by job
&lt;/h3&gt;

&lt;p&gt;Use premium models like Claude Sonnet 4.6 or GPT-5 for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hard reasoning&lt;/li&gt;
&lt;li&gt;coding&lt;/li&gt;
&lt;li&gt;planning&lt;/li&gt;
&lt;li&gt;ambiguous decision-making&lt;/li&gt;
&lt;li&gt;complex tool selection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use cheaper models for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;heartbeat checks&lt;/li&gt;
&lt;li&gt;cron pings&lt;/li&gt;
&lt;li&gt;extraction&lt;/li&gt;
&lt;li&gt;classification&lt;/li&gt;
&lt;li&gt;browser state verification&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;summarization of structured outputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A good rule:&lt;/p&gt;

&lt;p&gt;If assigning the task to Claude Opus would feel embarrassing in a design review, don’t assign it to Claude Opus.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Add failover before you need it
&lt;/h2&gt;

&lt;p&gt;If Anthropic is your preferred provider, fine.&lt;br&gt;
If OpenAI is your preferred provider, also fine.&lt;/p&gt;

&lt;p&gt;Just do not make preferred mean only.&lt;/p&gt;

&lt;p&gt;If you are using OpenClaw, basic operational visibility helps a lot when debugging provider issues versus your own code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw status
openclaw status &lt;span class="nt"&gt;--all&lt;/span&gt;
openclaw status &lt;span class="nt"&gt;--deep&lt;/span&gt;
openclaw gateway status
openclaw logs &lt;span class="nt"&gt;--follow&lt;/span&gt;
openclaw doctor
openclaw health &lt;span class="nt"&gt;--json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal is simple:&lt;/p&gt;

&lt;p&gt;One provider should be able to degrade without taking your whole workflow down.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Keep repetitive work cheap
&lt;/h2&gt;

&lt;p&gt;A lot of teams still do the opposite.&lt;/p&gt;

&lt;p&gt;They spend premium tokens on repetitive mechanical work, then act surprised when costs explode or usage caps show up.&lt;/p&gt;

&lt;p&gt;Cheap work should stay cheap.&lt;/p&gt;

&lt;p&gt;That can mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;smaller hosted models&lt;/li&gt;
&lt;li&gt;local inference for repetitive tasks&lt;/li&gt;
&lt;li&gt;aggressive routing rules&lt;/li&gt;
&lt;li&gt;batching where possible&lt;/li&gt;
&lt;li&gt;throttling background jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also where a flat-rate system can be useful.&lt;/p&gt;

&lt;p&gt;Standard Compute routes across models like GPT-5.4, Claude Opus 4.6, and Grok 4.20 behind an OpenAI-compatible API, with batching and adaptive throttling built in. For teams running automations all day, the appeal is obvious: predictable monthly cost instead of constant per-token monitoring.&lt;/p&gt;

&lt;p&gt;That’s not magic.&lt;br&gt;
It’s just a better fit for people running agents like infrastructure instead of using AI like a chat app.&lt;/p&gt;
&lt;h2&gt;
  
  
  4. Keep some work local if it’s repetitive enough
&lt;/h2&gt;

&lt;p&gt;One OpenClaw user described a setup with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a Mac mini M4 Pro&lt;/li&gt;
&lt;li&gt;a remote PC with 2 RTX 5090s&lt;/li&gt;
&lt;li&gt;local Gemma models&lt;/li&gt;
&lt;li&gt;a GPT subscription&lt;/li&gt;
&lt;li&gt;additional machines for research workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That sounds extreme until you realize what they were actually doing:&lt;/p&gt;

&lt;p&gt;They stopped assuming one vendor and one billing model should carry every workload.&lt;/p&gt;

&lt;p&gt;That is the adult version of agent infrastructure.&lt;/p&gt;

&lt;p&gt;Hybrid stacks are messy, but they are honest.&lt;/p&gt;
&lt;h2&gt;
  
  
  A simple resilience pattern
&lt;/h2&gt;

&lt;p&gt;If you want something concrete, this is a decent baseline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;TASK_MODEL_MAP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;heartbeat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cheap-model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;classification&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cheap-model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;browser_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cheap-model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mid-model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;planning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;premium-model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;premium-model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;pick_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TASK_MODEL_MAP&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mid-model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add provider fallback at the transport layer.&lt;/p&gt;

&lt;p&gt;Pseudo-flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Route cheap tasks to low-cost model
2. Route hard tasks to premium model
3. If provider A fails, retry on provider B
4. If premium path is unavailable, decide whether to degrade or queue
5. Log usage by task type so you know what is burning budget
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not glamorous.&lt;/p&gt;

&lt;p&gt;It is also the difference between a toy agent stack and a production one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real AI API pricing comparison is not token price
&lt;/h2&gt;

&lt;p&gt;This is the trap developers keep falling into.&lt;/p&gt;

&lt;p&gt;They compare token rates and stop there.&lt;/p&gt;

&lt;p&gt;Example questions people ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is Claude Sonnet API pricing?&lt;/li&gt;
&lt;li&gt;Is GPT-5 cheaper than Claude Opus?&lt;/li&gt;
&lt;li&gt;Which model has the lowest cost per million tokens?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are fine questions.&lt;/p&gt;

&lt;p&gt;They are not the important questions.&lt;/p&gt;

&lt;p&gt;The important questions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What happens when one provider changes policy?&lt;/li&gt;
&lt;li&gt;What happens when rate limits tighten after a usage spike?&lt;/li&gt;
&lt;li&gt;What happens when your agent burns premium tokens on low-value loops?&lt;/li&gt;
&lt;li&gt;What happens when your automation assumes one provider will always behave the same way?&lt;/li&gt;
&lt;li&gt;What happens on Sunday night when billing assumptions change before your code does?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer is “everything stops,” then price per token is not your core problem.&lt;/p&gt;

&lt;p&gt;Your architecture is.&lt;/p&gt;

&lt;h2&gt;
  
  
  My take
&lt;/h2&gt;

&lt;p&gt;The June 15 change did not prove Anthropic is evil.&lt;/p&gt;

&lt;p&gt;It proved something more useful:&lt;/p&gt;

&lt;p&gt;Serious agent workloads need to be designed like infrastructure, not treated like a generous app subscription.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route across providers&lt;/li&gt;
&lt;li&gt;tier models by task&lt;/li&gt;
&lt;li&gt;keep cheap work cheap&lt;/li&gt;
&lt;li&gt;use OpenAI-compatible interfaces where possible&lt;/li&gt;
&lt;li&gt;assume quotas and pricing will change&lt;/li&gt;
&lt;li&gt;build for failover before you need it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the default now.&lt;br&gt;
Not the edge case.&lt;/p&gt;

&lt;p&gt;If you are running agents in n8n, Make, Zapier, OpenClaw, or custom internal workflows, this is the mindset shift that matters most.&lt;/p&gt;

&lt;p&gt;The winning setup is not the one with the prettiest model demo.&lt;/p&gt;

&lt;p&gt;It is the one that keeps working when your favorite vendor says no.&lt;/p&gt;

&lt;p&gt;If you want the simplest version of that idea, Standard Compute is worth a look:&lt;br&gt;
&lt;a href="https://standardcompute.com" rel="noopener noreferrer"&gt;https://standardcompute.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Flat monthly pricing, OpenAI-compatible API, and no per-token anxiety is a much better fit for always-on automations than pretending a chat-style subscription was infrastructure all along.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>api</category>
      <category>devops</category>
    </item>
    <item>
      <title>I thought my bot needed a better prompt, but TikTok’s 20 RPM limit and WhatsApp’s service window were the real problem</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Tue, 19 May 2026 11:48:21 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-thought-my-bot-needed-a-better-prompt-but-tiktoks-20-rpm-limit-and-whatsapps-service-window-m22</link>
      <guid>https://dev.to/lars_winstand/i-thought-my-bot-needed-a-better-prompt-but-tiktoks-20-rpm-limit-and-whatsapps-service-window-m22</guid>
      <description>&lt;p&gt;I kept seeing the same failure mode in social reply bots:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;someone wires up GPT-5 or Claude&lt;/li&gt;
&lt;li&gt;the first few demos look great&lt;/li&gt;
&lt;li&gt;then production gets weird&lt;/li&gt;
&lt;li&gt;and everybody blames the prompt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I thought that too.&lt;/p&gt;

&lt;p&gt;Then I read a thread on &lt;a href="https://reddit.com/r/openclaw/comments/1th15qf/can_i_safely_build_an_openclawbased_bot_that/" rel="noopener noreferrer"&gt;r/openclaw&lt;/a&gt; asking the exact question a lot of teams want answered:&lt;/p&gt;

&lt;p&gt;Can you build one OpenClaw-based bot that safely auto-replies across TikTok, Instagram, Facebook, and WhatsApp without getting banned?&lt;/p&gt;

&lt;p&gt;The most honest reply in that thread was also the most useful:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"It's impossible to build this fully ban proof. Social media sites are getting stricter every day around bots and automated tools."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s the real starting point.&lt;/p&gt;

&lt;p&gt;The bottleneck usually isn’t text generation anymore. It’s orchestration.&lt;/p&gt;

&lt;p&gt;If you’re building multi-agent systems for comments, DMs, and support flows, the hard part is not getting a model to write a decent reply. The hard part is deciding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;should this message be sent automatically?&lt;/li&gt;
&lt;li&gt;should it become a draft?&lt;/li&gt;
&lt;li&gt;should a human review it?&lt;/li&gt;
&lt;li&gt;are we even allowed to send it on this channel right now?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not prompt engineering. That’s control-plane engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  The model is rarely the thing breaking first
&lt;/h2&gt;

&lt;p&gt;For social auto-replies, text generation is mostly solved.&lt;/p&gt;

&lt;p&gt;GPT-5 can write a refund response.&lt;br&gt;
Claude can draft a cautious moderation reply.&lt;br&gt;
Smaller models can classify FAQ vs escalation pretty well.&lt;/p&gt;

&lt;p&gt;If your only problem were wording, this category would already be boring.&lt;/p&gt;

&lt;p&gt;But Instagram, WhatsApp, Facebook, and TikTok do not behave like one unified inbox. They behave like four different policy environments with different APIs, different rate limits, different review requirements, and different definitions of acceptable automation.&lt;/p&gt;

&lt;p&gt;That means your architecture matters more than your prompt.&lt;/p&gt;
&lt;h2&gt;
  
  
  What actually breaks first
&lt;/h2&gt;

&lt;p&gt;Usually:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;unofficial integrations&lt;/li&gt;
&lt;li&gt;polling-heavy designs&lt;/li&gt;
&lt;li&gt;missing review gates&lt;/li&gt;
&lt;li&gt;channel-specific policy mismatches&lt;/li&gt;
&lt;li&gt;rate limits nobody modeled up front&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Another comment in that same &lt;a href="https://reddit.com/r/openclaw/comments/1th15qf/can_i_safely_build_an_openclawbased_bot_that/" rel="noopener noreferrer"&gt;r/openclaw thread&lt;/a&gt; said it even more bluntly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"If the site does have an official API and you have proper access to it (not resold by third party) then you are generally okay. But for sites that don't (which is most social media sites) you are risking a ban"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That lines up with the docs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Channel&lt;/th&gt;
&lt;th&gt;What you actually get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Instagram Platform&lt;/td&gt;
&lt;td&gt;Official comment reply/moderation endpoints, webhook support, Meta app permissions and access requirements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WhatsApp Business Platform&lt;/td&gt;
&lt;td&gt;Service-window rules, template-message requirements outside allowed windows, business policy enforcement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TikTok Content Posting API&lt;/td&gt;
&lt;td&gt;Strong publishing workflow docs, explicit rate limits like 20 requests/minute on creator info query, but a much narrower official surface than most teams assume&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That table is the whole story.&lt;/p&gt;

&lt;p&gt;A lot of teams say they want “one bot for every channel.”&lt;/p&gt;

&lt;p&gt;What they actually want is one abstraction over four incompatible rule systems.&lt;/p&gt;
&lt;h2&gt;
  
  
  OpenClaw helps, but it does not make the channel risk disappear
&lt;/h2&gt;

&lt;p&gt;I like OpenClaw for this kind of stack because it gives you orchestration primitives instead of pretending the internet is uniform.&lt;/p&gt;

&lt;p&gt;OpenClaw gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hooks&lt;/li&gt;
&lt;li&gt;background tasks&lt;/li&gt;
&lt;li&gt;scheduled jobs&lt;/li&gt;
&lt;li&gt;standing orders&lt;/li&gt;
&lt;li&gt;multi-agent routing&lt;/li&gt;
&lt;li&gt;gateway support for channels like WhatsApp via Baileys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is useful.&lt;/p&gt;

&lt;p&gt;But useful is not the same as safe.&lt;/p&gt;

&lt;p&gt;If you run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw onboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;you are not getting a ban-proof social automation layer.&lt;/p&gt;

&lt;p&gt;You are getting the pieces to build your own supervisor.&lt;/p&gt;

&lt;p&gt;That distinction matters.&lt;/p&gt;

&lt;p&gt;The default local gateway dashboard and canvas ports in the docs make that clear too. This is orchestration software, not magic social middleware.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Gateway dashboard: http://127.0.0.1:18789/
Canvas host:       http://127.0.0.1:18793/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your architecture needs routing, state, retries, review queues, and compliance rules, OpenClaw is useful.&lt;br&gt;
If your architecture depends on a platform tolerating unofficial automation forever, OpenClaw cannot save that.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Instagram is the easiest place to start
&lt;/h2&gt;

&lt;p&gt;Instagram is one of the few channels where the official path is clear enough to build something real.&lt;/p&gt;

&lt;p&gt;Example comment fetch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; GET &lt;span class="s2"&gt;"https://&amp;lt;HOST_URL&amp;gt;/v25.0/&amp;lt;IG_MEDIA_ID&amp;gt;/comments"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example reply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"https://&amp;lt;HOST_URL&amp;gt;/v25.0/&amp;lt;IG_COMMENT_ID&amp;gt;/replies"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That matters a lot.&lt;/p&gt;

&lt;p&gt;Meta also recommends webhooks over polling to reduce rate-limit pressure.&lt;/p&gt;

&lt;p&gt;That tells you what a production architecture should look like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;event-driven intake&lt;/li&gt;
&lt;li&gt;low polling footprint&lt;/li&gt;
&lt;li&gt;explicit permissions&lt;/li&gt;
&lt;li&gt;narrow automation scope&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A practical Instagram routing pattern
&lt;/h3&gt;

&lt;p&gt;For Instagram comments, this is the split I’d actually ship:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto-send: basic FAQ, store hours, shipping ETA, product availability&lt;/li&gt;
&lt;li&gt;Draft for review: refunds, damaged orders, account issues, edge-case complaints&lt;/li&gt;
&lt;li&gt;Human-only: harassment, legal threats, medical claims, regulated products, PR-risk situations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the useful role for an LLM here.&lt;/p&gt;

&lt;p&gt;The model writes the draft.&lt;br&gt;
The supervisor decides whether the draft is allowed to exist.&lt;/p&gt;
&lt;h2&gt;
  
  
  WhatsApp is where timing beats prompting
&lt;/h2&gt;

&lt;p&gt;WhatsApp Business is where teams finally realize the model is not in charge.&lt;/p&gt;

&lt;p&gt;The same user intent may require different outbound behavior depending on whether you are inside or outside the customer service window.&lt;/p&gt;

&lt;p&gt;Inside the window, you may be able to send a free-form message.&lt;br&gt;
Outside it, you typically need an approved template.&lt;/p&gt;

&lt;p&gt;So before the model output matters, your system needs to answer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Are we inside the service window?&lt;/li&gt;
&lt;li&gt;If not, do we have an approved template for this intent?&lt;/li&gt;
&lt;li&gt;If not, should we wait for a human?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is routing logic.&lt;/p&gt;

&lt;p&gt;Not prompt logic.&lt;/p&gt;

&lt;p&gt;You can have the best prompt in the world and still be blocked by policy.&lt;/p&gt;
&lt;h2&gt;
  
  
  TikTok is where the “one bot for everything” idea usually falls apart
&lt;/h2&gt;

&lt;p&gt;TikTok is the channel people tend to over-assume.&lt;/p&gt;

&lt;p&gt;Because TikTok is huge, teams assume there must be a broad official API for full comment and DM automation.&lt;/p&gt;

&lt;p&gt;But the public docs are much narrower than that fantasy.&lt;/p&gt;

&lt;p&gt;The Content Posting API is real and useful. It documents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;publishing workflows&lt;/li&gt;
&lt;li&gt;creator capability checks&lt;/li&gt;
&lt;li&gt;media transfer rules&lt;/li&gt;
&lt;li&gt;rate limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, the &lt;code&gt;/v2/post/publish/creator_info/query/&lt;/code&gt; endpoint is limited to 20 requests per minute per user access token.&lt;/p&gt;

&lt;p&gt;That is a real production constraint.&lt;/p&gt;

&lt;p&gt;If you’re polling capability or eligibility too aggressively across many accounts, you’ll hit it.&lt;/p&gt;

&lt;p&gt;A simple guard might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TokenBucket&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;lastRefill&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nx"&gt;refillPerMinute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastRefill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;elapsedMinutes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastRefill&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;60000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;refill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;elapsedMinutes&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;refillPerMinute&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;refill&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastRefill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="nx"&gt;count&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;creatorInfoLimiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TokenBucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;creatorInfoLimiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;take&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;TikTok creator_info/query rate limit reached&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That handles one narrow API behavior.&lt;/p&gt;

&lt;p&gt;It does not magically give you broad, safe, official automation for every TikTok interaction surface.&lt;/p&gt;

&lt;p&gt;That’s the gap teams keep underestimating.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to build instead: a supervisor, not a universal auto-replier
&lt;/h2&gt;

&lt;p&gt;This is the architecture I trust most.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Channel-specific event intake
&lt;/h3&gt;

&lt;p&gt;Use the official event surface where it exists.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instagram: webhooks&lt;/li&gt;
&lt;li&gt;WhatsApp Business API events&lt;/li&gt;
&lt;li&gt;TikTok: only the approved official endpoints and event flows you actually have access to&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Policy checks before generation
&lt;/h3&gt;

&lt;p&gt;Before asking any model to write, check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;permissions&lt;/li&gt;
&lt;li&gt;message window rules&lt;/li&gt;
&lt;li&gt;endpoint availability&lt;/li&gt;
&lt;li&gt;quotas&lt;/li&gt;
&lt;li&gt;account state&lt;/li&gt;
&lt;li&gt;category risk&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Supervisor classification
&lt;/h3&gt;

&lt;p&gt;Every inbound message should land in one of three buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;auto_send&lt;/li&gt;
&lt;li&gt;draft_review&lt;/li&gt;
&lt;li&gt;human_only&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;auto_send&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;draft_review&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;human_only&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;instagram&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;whatsapp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tiktok&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;facebook&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;riskScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;insideServiceWindow&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;hasApprovedTemplate&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}):&lt;/span&gt; &lt;span class="nx"&gt;Decision&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;riskScore&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;human_only&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;whatsapp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;insideServiceWindow&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hasApprovedTemplate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;human_only&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;refund&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;legal&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;medical&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;account_access&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;draft_review&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;auto_send&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Background enrichment
&lt;/h3&gt;

&lt;p&gt;Run the expensive stuff asynchronously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CRM lookup&lt;/li&gt;
&lt;li&gt;prior conversation summary&lt;/li&gt;
&lt;li&gt;abuse detection&lt;/li&gt;
&lt;li&gt;urgency classification&lt;/li&gt;
&lt;li&gt;draft generation with GPT-5, Claude, Qwen, or another model&lt;/li&gt;
&lt;li&gt;dedupe against recent replies&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Compliance rules before outbound send
&lt;/h3&gt;

&lt;p&gt;Before the final send, enforce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prohibited-claim checks&lt;/li&gt;
&lt;li&gt;burst throttling&lt;/li&gt;
&lt;li&gt;retry limits&lt;/li&gt;
&lt;li&gt;escalation on repeated failures&lt;/li&gt;
&lt;li&gt;channel-specific formatting and policy rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s where orchestration frameworks earn their keep.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this gets expensive fast under per-token pricing
&lt;/h2&gt;

&lt;p&gt;A lot of teams estimate cost based on the visible reply.&lt;/p&gt;

&lt;p&gt;That’s the wrong unit.&lt;/p&gt;

&lt;p&gt;The expensive part in agentic automation is all the invisible work around the reply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classification&lt;/li&gt;
&lt;li&gt;moderation&lt;/li&gt;
&lt;li&gt;summarization&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;fallback generation&lt;/li&gt;
&lt;li&gt;audit logging&lt;/li&gt;
&lt;li&gt;routing decisions&lt;/li&gt;
&lt;li&gt;review queue handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single customer message can trigger multiple model calls before anything gets sent.&lt;/p&gt;

&lt;p&gt;That’s why flat monthly compute is a much better fit for orchestration-heavy systems than per-token billing.&lt;/p&gt;

&lt;p&gt;If you’re building agents that run all day inside n8n, Make, Zapier, OpenClaw, or custom workers, per-token pricing punishes exactly the safety layers you should be adding.&lt;/p&gt;

&lt;p&gt;That’s the pitch for &lt;a href="https://standardcompute.com" rel="noopener noreferrer"&gt;Standard Compute&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unlimited AI compute&lt;/li&gt;
&lt;li&gt;flat monthly pricing&lt;/li&gt;
&lt;li&gt;OpenAI-compatible API&lt;/li&gt;
&lt;li&gt;works with existing SDKs and HTTP clients&lt;/li&gt;
&lt;li&gt;useful for agent workflows that make lots of small routing and safety calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your stack is doing classification, moderation, drafting, retries, and escalation all day, predictable cost is a real architecture advantage, not just a pricing preference.&lt;/p&gt;

&lt;h2&gt;
  
  
  A minimal implementation sketch
&lt;/h2&gt;

&lt;p&gt;If I were building this today, I would start with something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleInboundEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;normalizeEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runPolicyChecks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;allowed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;queueForHuman&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;riskScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;riskScore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;insideServiceWindow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;insideServiceWindow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;hasApprovedTemplate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hasApprovedTemplate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;human_only&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;queueForHuman&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;policy_or_risk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;draft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateDraft&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;draft_review&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;queueDraft&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;finalComplianceCheck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sendReply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That flow is boring.&lt;/p&gt;

&lt;p&gt;Good.&lt;/p&gt;

&lt;p&gt;Boring is what you want when real accounts, rate limits, and platform policy are involved.&lt;/p&gt;

&lt;h2&gt;
  
  
  When you do not need the full multi-agent circus
&lt;/h2&gt;

&lt;p&gt;To be fair, not every team needs a giant supervisor stack.&lt;/p&gt;

&lt;p&gt;If most of your volume lives inside Meta, and your use case is narrow, a simpler rules-based system may be enough.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instagram comments only&lt;/li&gt;
&lt;li&gt;WhatsApp support only&lt;/li&gt;
&lt;li&gt;clear escalation categories&lt;/li&gt;
&lt;li&gt;small review queue&lt;/li&gt;
&lt;li&gt;no promise of “reply to everything automatically”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That can work.&lt;/p&gt;

&lt;p&gt;But the moment the requirements become:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TikTok too&lt;/li&gt;
&lt;li&gt;full DM automation&lt;/li&gt;
&lt;li&gt;auto-reply everywhere&lt;/li&gt;
&lt;li&gt;ban-proof behavior&lt;/li&gt;
&lt;li&gt;no human review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;you are not building a bot anymore.&lt;/p&gt;

&lt;p&gt;You are building a governor.&lt;/p&gt;

&lt;p&gt;And you should scope it like one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical checklist before you touch the prompt again
&lt;/h2&gt;

&lt;p&gt;If your social reply bot is unstable, answer these first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which channels have official reply APIs for my exact use case?&lt;/li&gt;
&lt;li&gt;Which channels expect webhooks instead of polling?&lt;/li&gt;
&lt;li&gt;Where do messaging windows change what I’m allowed to send?&lt;/li&gt;
&lt;li&gt;What are the hard rate limits per endpoint and token?&lt;/li&gt;
&lt;li&gt;Which messages should never be auto-sent?&lt;/li&gt;
&lt;li&gt;What is my fallback when the official API surface is narrower than the product requirement?&lt;/li&gt;
&lt;li&gt;What gets human review by default?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot answer those, you do not have a prompt problem.&lt;/p&gt;

&lt;p&gt;You have an orchestration problem pretending to be a model problem.&lt;/p&gt;

&lt;p&gt;And honestly, that’s good news.&lt;/p&gt;

&lt;p&gt;Because orchestration is fixable.&lt;/p&gt;

&lt;p&gt;Usually the answer is not “find a smarter model.”&lt;br&gt;
Usually the answer is “build a stricter supervisor.”&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>api</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
