<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lars Winstand</title>
    <description>The latest articles on DEV Community by Lars Winstand (@lars_winstand).</description>
    <link>https://dev.to/lars_winstand</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3908932%2Feb8bc1ff-405f-4ef0-8204-ba1ed7caa59f.jpeg</url>
      <title>DEV Community: Lars Winstand</title>
      <link>https://dev.to/lars_winstand</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lars_winstand"/>
    <language>en</language>
    <item>
      <title>I finally understand why OpenClaw keeps missing daily journals</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Sun, 28 Jun 2026 12:55:57 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-finally-understand-why-openclaw-keeps-missing-daily-journals-2e40</link>
      <guid>https://dev.to/lars_winstand/i-finally-understand-why-openclaw-keeps-missing-daily-journals-2e40</guid>
      <description>&lt;p&gt;A thread on r/openclaw about daily journaling got way more interesting than the title suggests.&lt;/p&gt;

&lt;p&gt;The setup was very real-world:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw 2026.6.8&lt;/li&gt;
&lt;li&gt;Linux mini PC&lt;/li&gt;
&lt;li&gt;Telegram front end&lt;/li&gt;
&lt;li&gt;single agent&lt;/li&gt;
&lt;li&gt;DeepSeek via OpenRouter&lt;/li&gt;
&lt;li&gt;Obsidian daily notes at &lt;code&gt;daily-notes/YYYY/MM/YYYY-MM-DD.md&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal was not “write me a cute diary.”&lt;/p&gt;

&lt;p&gt;It was: keep a durable, timestamped recovery log throughout the day so the agent can survive compaction, resets, and general session weirdness.&lt;/p&gt;

&lt;p&gt;That changes the problem completely.&lt;/p&gt;

&lt;p&gt;This is not a prompt problem.&lt;br&gt;
It’s a reliability problem.&lt;/p&gt;
&lt;h2&gt;
  
  
  The mistake: treating heartbeat like a journaling daemon
&lt;/h2&gt;

&lt;p&gt;A lot of people reach for OpenClaw heartbeat and assume it should handle daily journaling.&lt;/p&gt;

&lt;p&gt;That sounds reasonable until you look at what heartbeat actually is.&lt;/p&gt;

&lt;p&gt;OpenClaw’s docs describe heartbeat as a scheduled turn in the main session. That means it can be context-aware. But it is not a durable background task system, and it is definitely not a guaranteed append-to-file service.&lt;/p&gt;

&lt;p&gt;The default config tells the story pretty clearly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"heartbeat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"every"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"30m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"last"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"directPolicy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"lightContext"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"isolatedSession"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"skipWhenBusy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few of these defaults are brutal if your goal is reliable journaling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;every: "30m"&lt;/code&gt; is periodic, not exact&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;isolatedSession: true&lt;/code&gt; means each run can start fresh&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lightContext: true&lt;/code&gt; means limited context&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;skipWhenBusy: true&lt;/code&gt; means it may not run at all&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a decent design for lightweight check-ins.&lt;/p&gt;

&lt;p&gt;It is a bad design for “append a trusted log entry every time.”&lt;/p&gt;

&lt;h2&gt;
  
  
  The real tradeoff: continuity vs reliability
&lt;/h2&gt;

&lt;p&gt;This is the part people keep running into.&lt;/p&gt;

&lt;p&gt;If heartbeat runs in the main session, OpenClaw has continuity. It often knows what just happened and can write a better journal entry.&lt;/p&gt;

&lt;p&gt;If heartbeat runs in an isolated session, automation gets cleaner, but context gets worse.&lt;/p&gt;

&lt;p&gt;That tradeoff is not accidental. It’s the architecture.&lt;/p&gt;

&lt;p&gt;So if your journal matters for recovery, continuity alone is not enough.&lt;br&gt;
You also need deterministic execution.&lt;/p&gt;

&lt;p&gt;And that means moving reliability outside the model.&lt;/p&gt;
&lt;h2&gt;
  
  
  The best answer from the thread: hard triggers + artifact gates
&lt;/h2&gt;

&lt;p&gt;The most useful comment in that Reddit thread basically said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;stop asking the model to be your scheduler&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s the answer.&lt;/p&gt;

&lt;p&gt;If journaling must happen reliably, the control plane should be boring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cron&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;systemd timers&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;a shell script&lt;/li&gt;
&lt;li&gt;a Python worker&lt;/li&gt;
&lt;li&gt;n8n, Make, or Zapier if that’s your stack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let OpenClaw generate content.&lt;br&gt;
Do not let OpenClaw own the guarantee.&lt;/p&gt;

&lt;p&gt;Once you frame it that way, the design gets much simpler.&lt;/p&gt;
&lt;h2&gt;
  
  
  What a reliable journaling pipeline should do
&lt;/h2&gt;

&lt;p&gt;If I were building this for Obsidian, I’d want five things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A deterministic trigger&lt;/li&gt;
&lt;li&gt;A check that today’s note exists&lt;/li&gt;
&lt;li&gt;Append-only writes&lt;/li&gt;
&lt;li&gt;Idempotency so the same event does not get written twice&lt;/li&gt;
&lt;li&gt;A fallback when context is thin&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is engineering.&lt;br&gt;
Not prompting.&lt;/p&gt;
&lt;h2&gt;
  
  
  A practical pattern that actually works
&lt;/h2&gt;

&lt;p&gt;Here’s the architecture I’d use.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Job&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;cron&lt;/code&gt; or &lt;code&gt;systemd&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Decide when journaling runs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;shell/Python script&lt;/td&gt;
&lt;td&gt;Build paths, gather artifacts, enforce idempotency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;Generate one concise journal block&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Obsidian markdown file&lt;/td&gt;
&lt;td&gt;Store append-only entries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;marker file / SQLite / JSON state&lt;/td&gt;
&lt;td&gt;Track successful writes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Example: file layout
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vault/
  daily-notes/
    2026/
      08/
        2026-08-14.md
state/
  journal-last-run.json
artifacts/
  latest-session-summary.txt
  latest-telegram-messages.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Example: create today’s note if missing
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;VAULT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/obsidian-vault"&lt;/span&gt;
&lt;span class="nv"&gt;TODAY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%F&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;YEAR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;MONTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%m&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;NOTE_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$VAULT&lt;/span&gt;&lt;span class="s2"&gt;/daily-notes/&lt;/span&gt;&lt;span class="nv"&gt;$YEAR&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$MONTH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;NOTE_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NOTE_DIR&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$TODAY&lt;/span&gt;&lt;span class="s2"&gt;.md"&lt;/span&gt;

&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NOTE_DIR&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NOTE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NOTE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
# &lt;/span&gt;&lt;span class="nv"&gt;$TODAY&lt;/span&gt;&lt;span class="sh"&gt;

## Journal
&lt;/span&gt;&lt;span class="no"&gt;
EOF
&lt;/span&gt;&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NOTE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is boring.&lt;br&gt;
That’s why it’s good.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example: append-only write with a marker
&lt;/h2&gt;

&lt;p&gt;You do not want duplicate entries if a trigger retries.&lt;/p&gt;

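&lt;p&gt;A simple approach is to derive an event ID from the current time slot and record it once the write succeeds. The minute-resolution ID below only catches retries that land in the same minute, so pass the scheduled slot in from the trigger if your retries can arrive later.&lt;br&gt;
&lt;/p&gt;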

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;EVENT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%dT%H%M&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;STATE_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/journal-state"&lt;/span&gt;
&lt;span class="nv"&gt;MARKER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$STATE_DIR&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$EVENT_ID&lt;/span&gt;&lt;span class="s2"&gt;.done"&lt;/span&gt;

&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$STATE_DIR&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MARKER&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Already wrote entry for &lt;/span&gt;&lt;span class="nv"&gt;$EVENT_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# append entry&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"### &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%H:%M&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="nb"&gt;echo
  echo&lt;/span&gt; &lt;span class="s2"&gt;"- Investigated Telegram handoff issue"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"- Resumed task after model reset"&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NOTE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nb"&gt;touch&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MARKER&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That one marker file solves a lot of “my agent is inconsistent” complaints.&lt;/p&gt;

&lt;p&gt;A lot of those complaints are really idempotency bugs.&lt;/p&gt;
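
&lt;p&gt;One gap in a check-then-touch marker is that two overlapping runs can both pass the check before either one writes. Here is a minimal Python sketch of one way to close that gap. The helper names &lt;code&gt;claim_event&lt;/code&gt; and &lt;code&gt;append_once&lt;/code&gt; and the state directory layout are assumptions for illustration, not anything OpenClaw provides. Opening the marker in exclusive-create mode turns the check and the claim into a single step.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path
import tempfile


def claim_event(state_dir, event_id):
    """Return True if this call created the marker, False if it already existed."""
    state_dir.mkdir(parents=True, exist_ok=True)
    marker = state_dir / f"{event_id}.done"
    try:
        # Mode "x" fails if the file exists, so check and create are one step.
        with marker.open("x", encoding="utf-8"):
            pass
    except FileExistsError:
        return False
    return True


def append_once(note_path, state_dir, event_id, entry):
    """Append entry to note_path at most once per event_id."""
    if not claim_event(state_dir, event_id):
        return False
    try:
        with note_path.open("a", encoding="utf-8") as f:
            f.write(entry)
    except OSError:
        # Release the claim so a later retry can try again.
        (state_dir / f"{event_id}.done").unlink(missing_ok=True)
        raise
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        note = Path(tmp) / "2026-08-14.md"
        state = Path(tmp) / "state"
        entry = "### 14:30\n\n- demo entry\n\n"
        print(append_once(note, state, "20260814T1430", entry))
        print(append_once(note, state, "20260814T1430", entry))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The second call with the same ID returns &lt;code&gt;False&lt;/code&gt; and leaves the note untouched. Claiming before the write trades one failure mode for another, which is why the sketch releases the marker when the append raises.&lt;/p&gt;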

&lt;h2&gt;
  
  
  Example: let OpenClaw generate the text, but not the schedule
&lt;/h2&gt;

&lt;p&gt;The script can gather a small context bundle and ask OpenClaw for one markdown block.&lt;/p&gt;

&lt;p&gt;For example, with a direct call to an OpenAI-style Responses endpoint standing in for the generation step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;PROMPT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
Write a short append-only journal entry for an Obsidian daily note.

Rules:
- Output markdown only
- 3-6 bullet points max
- Only include facts from the provided artifacts
- If context is incomplete, say what is uncertain
- No rewriting prior entries

Artifacts:
&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/artifacts/latest-session-summary.txt"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

curl https://api.openai.com/v1/responses &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$OPENAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"{
    &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;model&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4.1&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,
    &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;input&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;jq &lt;span class="nt"&gt;-Rs&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;
  }"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
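
&lt;p&gt;The same request can be assembled from Python. This sketch assumes only the request shape from the curl call: a POST to &lt;code&gt;/v1/responses&lt;/code&gt; with &lt;code&gt;model&lt;/code&gt; and &lt;code&gt;input&lt;/code&gt;. The function name &lt;code&gt;build_request&lt;/code&gt; and its &lt;code&gt;base_url&lt;/code&gt; parameter are hypothetical. It builds the request without sending it, so the JSON encoding can be checked offline.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request


def build_request(prompt, api_key, base_url="https://api.openai.com/v1", model="gpt-4.1"):
    """Build (but do not send) a POST to an OpenAI-compatible /responses endpoint."""
    # json.dumps takes care of quotes and newlines inside the prompt.
    body = json.dumps({"model": model, "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/responses",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder key: this demo only inspects the request, it never sends it.
    req = build_request("Write a short journal entry.", api_key="sk-placeholder")
    print(req.get_method(), req.full_url)
    print(json.loads(req.data)["model"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing the result to &lt;code&gt;urllib.request.urlopen&lt;/code&gt; with a timeout would send it. Keeping the base URL as a parameter is what lets the same script point at a different compatible backend.&lt;/p&gt;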



&lt;p&gt;If you’re using an OpenAI-compatible endpoint, this pattern works with whatever backend you already have wired in.&lt;/p&gt;

&lt;p&gt;That matters because journaling jobs are exactly the kind of thing that can quietly rack up cost when they run all day, every day, across multiple agents.&lt;/p&gt;

&lt;p&gt;If you’re running these automations constantly, predictable flat-rate API usage is a lot nicer than babysitting token spend.&lt;/p&gt;

&lt;p&gt;That’s one reason Standard Compute is interesting here: it’s a drop-in OpenAI API replacement with unlimited compute on a flat monthly plan, which fits agent-heavy workflows much better than per-token pricing when you have scheduled jobs, retries, summaries, and background automations firing all day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: a Python version with better control
&lt;/h2&gt;

&lt;p&gt;If shell starts getting messy, Python is cleaner.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;note_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;home&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;obsidian-vault&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;daily-notes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;note_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;note_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;note_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;# &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;## Journal&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;### &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%H&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;- Agent resumed task after restart&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;- Synced latest Telegram context&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;note_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the point where you can add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hash-based dedupe&lt;/li&gt;
&lt;li&gt;JSON state tracking&lt;/li&gt;
&lt;li&gt;retries with backoff&lt;/li&gt;
&lt;li&gt;validation of markdown format&lt;/li&gt;
&lt;li&gt;artifact collection from Telegram, OpenClaw session files, or task logs&lt;/li&gt;
&lt;/ul&gt;
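
&lt;p&gt;As a rough sketch of the first two items, hash-based dedupe can sit on top of a small JSON state file. The names &lt;code&gt;entry_digest&lt;/code&gt; and &lt;code&gt;append_if_new&lt;/code&gt; and the state file format (a JSON list of SHA-256 digests) are assumptions made for this example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import json
import tempfile
from pathlib import Path


def entry_digest(entry):
    """Hash the entry text, ignoring surrounding whitespace."""
    return hashlib.sha256(entry.strip().encode("utf-8")).hexdigest()


def append_if_new(note_path, state_path, entry):
    """Append entry unless an identical one is already recorded in state_path."""
    seen = []
    if state_path.exists():
        seen = json.loads(state_path.read_text(encoding="utf-8"))
    digest = entry_digest(entry)
    if digest in seen:
        return False
    with note_path.open("a", encoding="utf-8") as f:
        f.write(entry)
    seen.append(digest)
    # Write to a temp file, then rename, so a crash cannot leave half-written state.
    tmp_path = state_path.with_name(state_path.name + ".tmp")
    tmp_path.write_text(json.dumps(seen), encoding="utf-8")
    tmp_path.replace(state_path)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        note = Path(tmp) / "2026-08-14.md"
        state = Path(tmp) / "journal-last-run.json"
        entry = "### 14:30\n\n- Resumed task after model reset\n\n"
        print(append_if_new(note, state, entry))
        print(append_if_new(note, state, entry))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the timestamp heading is part of the hashed text, identical bullets written at a different time count as a new entry. Hash only the bullets if you want the opposite behavior.&lt;/p&gt;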

&lt;h2&gt;
  
  
  Heartbeat is still useful, just not for the core guarantee
&lt;/h2&gt;

&lt;p&gt;I don’t think heartbeat is useless here.&lt;/p&gt;

&lt;p&gt;I think people are assigning it the wrong job.&lt;/p&gt;

&lt;p&gt;Heartbeat is good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“check whether anything should be journaled”&lt;/li&gt;
&lt;li&gt;“capture a quick status while context is fresh”&lt;/li&gt;
&lt;li&gt;“nudge the agent to summarize before compaction”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Heartbeat is not good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exact-time execution&lt;/li&gt;
&lt;li&gt;guaranteed writes&lt;/li&gt;
&lt;li&gt;durable append-only logging&lt;/li&gt;
&lt;li&gt;being your only source of truth&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s an important distinction.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’d use for each case
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Best use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Heartbeat&lt;/td&gt;
&lt;td&gt;Context-aware periodic check-ins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Isolated cron&lt;/td&gt;
&lt;td&gt;Exact-time summaries like end-of-day reports&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shell/Python trigger + artifact gates&lt;/td&gt;
&lt;td&gt;Reliable append-only journaling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n8n / Make / Zapier&lt;/td&gt;
&lt;td&gt;If your workflow already lives in automation tools&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;My opinion: if you want a journal you can trust after compaction, resets, or model swaps, heartbeat should not be the primary mechanism.&lt;/p&gt;

&lt;p&gt;Use it as a helper.&lt;br&gt;
Not the foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The sneaky defaults that sabotage people
&lt;/h2&gt;

&lt;p&gt;A few defaults make this worse than it first appears:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;heartbeat cadence defaults to 30 minutes and is periodic, not exact&lt;/li&gt;
&lt;li&gt;some auth setups change the effective default behavior&lt;/li&gt;
&lt;li&gt;timeouts can fall back to heartbeat cadence&lt;/li&gt;
&lt;li&gt;active hours can suppress runs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;target: "last"&lt;/code&gt; can change routing behavior&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lightContext&lt;/code&gt; reduces what the agent sees&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;skipWhenBusy&lt;/code&gt; means missed windows are expected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you leave those untouched and expect a durable journal, you are basically building on quicksand.&lt;/p&gt;

&lt;p&gt;Then people blame DeepSeek, OpenRouter, or the prompt.&lt;/p&gt;

&lt;p&gt;Usually that’s the wrong culprit.&lt;/p&gt;

&lt;h2&gt;
  
  
  A better mental model for agent automation
&lt;/h2&gt;

&lt;p&gt;This whole issue is a good example of a broader rule:&lt;/p&gt;

&lt;p&gt;Split deterministic infrastructure from probabilistic reasoning.&lt;/p&gt;

&lt;p&gt;Use boring systems for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scheduling&lt;/li&gt;
&lt;li&gt;file creation&lt;/li&gt;
&lt;li&gt;append-only writes&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;state tracking&lt;/li&gt;
&lt;li&gt;dedupe&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use the model for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summarization&lt;/li&gt;
&lt;li&gt;compression&lt;/li&gt;
&lt;li&gt;interpretation&lt;/li&gt;
&lt;li&gt;deciding what matters in the recent context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That division of labor works a lot better than asking one agent to do all of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  A minimal workflow I would actually ship
&lt;/h2&gt;

&lt;p&gt;If I had to make this reliable fast, I’d do this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trigger every 15 or 30 minutes with &lt;code&gt;cron&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Pull fresh artifacts from Telegram, session logs, and current task files&lt;/li&gt;
&lt;li&gt;Generate one small markdown block with OpenClaw or another model&lt;/li&gt;
&lt;li&gt;Validate the output format&lt;/li&gt;
&lt;li&gt;Append to today’s Obsidian file&lt;/li&gt;
&lt;li&gt;Write a success marker&lt;/li&gt;
&lt;li&gt;Alert if the append fails twice&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is testable.&lt;br&gt;
That is debuggable.&lt;br&gt;
That survives model weirdness much better.&lt;/p&gt;
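
&lt;p&gt;Steps 4, 5, and 7 of that list could be sketched in Python roughly like this. The accepted format (a &lt;code&gt;###&lt;/code&gt; heading followed by one to six bullets), the helper names, and the &lt;code&gt;alert&lt;/code&gt; callback are all assumptions for this example. The success marker from step 6 is left out to keep it short.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import tempfile
from pathlib import Path


def is_valid_block(block, max_bullets=6):
    """Accept a '### ' heading line followed by 1..max_bullets '- ' bullet lines."""
    lines = [ln for ln in block.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("### "):
        return False
    bullets = lines[1:]
    if len(bullets) not in range(1, max_bullets + 1):
        return False
    return all(ln.startswith("- ") for ln in bullets)


def append_with_alert(note_path, block, alert, attempts=2):
    """Validate, then append; call alert(message) if validation or every attempt fails."""
    if not is_valid_block(block):
        alert("journal block failed validation; nothing written")
        return False
    for _ in range(attempts):
        try:
            with note_path.open("a", encoding="utf-8") as f:
                f.write(block.strip() + "\n\n")
            return True
        except OSError:
            continue
    alert(f"append to {note_path} failed {attempts} times")
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        note = Path(tmp) / "2026-08-14.md"
        ok = append_with_alert(note, "### 14:30\n\n- Synced Telegram context\n", print)
        print(ok)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here &lt;code&gt;print&lt;/code&gt; stands in for the alert channel. In practice that callback might post to Telegram or write to a log that something else watches.&lt;/p&gt;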

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;The Reddit OP was right about the pain.&lt;/p&gt;

&lt;p&gt;But the fix is not “find the perfect heartbeat prompt.”&lt;br&gt;
The fix is moving reliability out of the model.&lt;/p&gt;

&lt;p&gt;If the journal matters, own the write path with cron, bash, Python, systemd, n8n, or whatever deterministic layer you already trust.&lt;/p&gt;

&lt;p&gt;Then let OpenClaw do the part it’s actually good at: turning messy recent context into a useful note.&lt;/p&gt;

&lt;p&gt;That’s less magical.&lt;/p&gt;

&lt;p&gt;It’s also the first approach I’d trust not to miss 2:30pm.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I watched someone burn 50 hours on OpenClaw and the fix was embarrassingly simple</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Sun, 28 Jun 2026 04:55:54 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-watched-someone-burn-50-hours-on-openclaw-and-the-fix-was-embarrassingly-simple-5emk</link>
      <guid>https://dev.to/lars_winstand/i-watched-someone-burn-50-hours-on-openclaw-and-the-fix-was-embarrassingly-simple-5emk</guid>
      <description>&lt;p&gt;I knew this was worth writing the second I saw a post on r/openclaw from someone who said they spent &lt;strong&gt;50 hours&lt;/strong&gt; trying to automate freelancer scouting, evaluation, and outreach in one OpenClaw loop.&lt;/p&gt;

&lt;p&gt;That sentence tells you exactly what happened.&lt;/p&gt;

&lt;p&gt;Not beginner confusion. Not "AI is useless" rage.&lt;/p&gt;

&lt;p&gt;It’s the much more dangerous state: &lt;strong&gt;just enough progress to believe the whole thing is one more prompt away&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I’ve seen this over and over in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lead qualification automation&lt;/li&gt;
&lt;li&gt;recruiter workflows&lt;/li&gt;
&lt;li&gt;outbound prospecting&lt;/li&gt;
&lt;li&gt;freelancer scouting&lt;/li&gt;
&lt;li&gt;agent-based enrichment pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern is always the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search for people&lt;/li&gt;
&lt;li&gt;Evaluate them&lt;/li&gt;
&lt;li&gt;Personalize outreach&lt;/li&gt;
&lt;li&gt;Send messages&lt;/li&gt;
&lt;li&gt;Hope one giant agent can do all of it cleanly&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And then the workflow turns into soup.&lt;/p&gt;

&lt;p&gt;The fix is not exotic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stop building one giant agent. Build a staged workflow.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use OpenClaw for judgment. Use n8n or Make for orchestration. Use scripts and APIs for deterministic steps. Then make the LLM calls cheap enough that you can actually afford to test them properly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core mistake: treating OpenClaw like a whole ops team
&lt;/h2&gt;

&lt;p&gt;OpenClaw is useful. But people keep asking the wrong thing of it.&lt;/p&gt;

&lt;p&gt;It makes sense as an assistant harness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stateful sessions&lt;/li&gt;
&lt;li&gt;memory&lt;/li&gt;
&lt;li&gt;tool use&lt;/li&gt;
&lt;li&gt;model routing&lt;/li&gt;
&lt;li&gt;provider failover&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is powerful.&lt;/p&gt;

&lt;p&gt;But that does &lt;strong&gt;not&lt;/strong&gt; mean you should hand it one giant business process and expect reliable end-to-end execution.&lt;/p&gt;

&lt;p&gt;If your workflow is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;search the web for freelancers&lt;/li&gt;
&lt;li&gt;filter by quality&lt;/li&gt;
&lt;li&gt;rank candidates&lt;/li&gt;
&lt;li&gt;write personalized DMs&lt;/li&gt;
&lt;li&gt;send those DMs&lt;/li&gt;
&lt;li&gt;log outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...you do not have one task.&lt;/p&gt;

&lt;p&gt;You have multiple systems with different failure modes pretending to be one task.&lt;/p&gt;

&lt;p&gt;That distinction matters a lot.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenClaw itself kind of tells you this
&lt;/h2&gt;

&lt;p&gt;If you look at the CLI shape, the mindset is obvious:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw status
openclaw status &lt;span class="nt"&gt;--all&lt;/span&gt;
openclaw status &lt;span class="nt"&gt;--deep&lt;/span&gt;
openclaw logs &lt;span class="nt"&gt;--follow&lt;/span&gt;
openclaw doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not "trust the magic" UX.&lt;/p&gt;

&lt;p&gt;That is &lt;strong&gt;inspect, debug, iterate&lt;/strong&gt; UX.&lt;/p&gt;

&lt;p&gt;Which is exactly how you should approach agent workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the one-big-agent design fails
&lt;/h2&gt;

&lt;p&gt;Because scouting, evaluation, and outreach are not one problem.&lt;/p&gt;

&lt;p&gt;They are three different bugs wearing a trench coat.&lt;/p&gt;

&lt;h2&gt;
  
  
  1) Search fails before the model even starts thinking
&lt;/h2&gt;

&lt;p&gt;A lot of bad agent workflows are just bad inputs with extra steps.&lt;/p&gt;

&lt;p&gt;If your sourcing layer is weak, the model spends all its intelligence grading junk.&lt;/p&gt;

&lt;p&gt;That’s why I agree with the people recommending &lt;strong&gt;Exa&lt;/strong&gt; for search-heavy agent loops. Better retrieval quality matters more than people think.&lt;/p&gt;

&lt;p&gt;For this class of workflow, search is not a helper. Search is the foundation.&lt;/p&gt;

&lt;p&gt;A practical sourcing step looks more like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// pseudo-pipeline&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;exa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;freelance product designer SaaS portfolio&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;numResults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;author&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;snippet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;
&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is already better than asking one giant prompt to both discover and judge candidates in the same breath.&lt;/p&gt;
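
&lt;p&gt;Search results can also repeat the same person under slightly different URLs, so it’s worth collapsing duplicates before any model sees them. Here’s a minimal sketch, assuming records shaped like the normalized ones above. &lt;code&gt;canonicalUrl&lt;/code&gt; and &lt;code&gt;dedupeByUrl&lt;/code&gt; are names I made up for illustration, not part of any SDK.&lt;/p&gt;

```typescript
// Hypothetical shape: mirrors the fields of the normalized records in the sourcing step.
type Candidate = {
  name: string | null;
  title: string;
  url: string;
  snippet: string;
};

// Reduce a URL to "host/path" so scheme, "www.", query strings, and trailing
// slashes do not create fake duplicates. Simplification: lowercases the path too.
function canonicalUrl(raw: string): string {
  return raw
    .trim()
    .toLowerCase()
    .replace(/^https?:\/\//, "")
    .replace(/^www\./, "")
    .replace(/[?#].*$/, "")
    .replace(/\/+$/, "");
}

// Keep the first record seen for each canonical URL.
function dedupeByUrl(candidates: Candidate[]): Candidate[] {
  const seen: { [key: string]: boolean } = {};
  const unique: Candidate[] = [];
  for (const c of candidates) {
    const key = canonicalUrl(c.url);
    if (seen[key]) continue;
    seen[key] = true;
    unique.push(c);
  }
  return unique;
}
```

&lt;p&gt;This is plain string handling on purpose. It’s a deterministic step, so it doesn’t need a model call.&lt;/p&gt;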

&lt;h2&gt;
  
  
  2) Evaluation needs repetition, not vibes
&lt;/h2&gt;

&lt;p&gt;This is where most agent demos fall apart in production.&lt;/p&gt;

&lt;p&gt;You cannot validate a scoring prompt by watching it succeed once.&lt;/p&gt;

&lt;p&gt;You need to run it on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;20 candidates&lt;/li&gt;
&lt;li&gt;then 50&lt;/li&gt;
&lt;li&gt;then edge cases&lt;/li&gt;
&lt;li&gt;then obvious rejects&lt;/li&gt;
&lt;li&gt;then ambiguous profiles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then compare outputs.&lt;/p&gt;

&lt;p&gt;Then tweak the rubric.&lt;/p&gt;

&lt;p&gt;Then run it again.&lt;/p&gt;

&lt;p&gt;That’s why &lt;strong&gt;n8n&lt;/strong&gt; is useful here. Its evaluation workflow pattern runs the same logic against a dataset of test cases, which is much closer to how engineers should test LLM logic than watching one run succeed.&lt;/p&gt;

&lt;h3&gt;
  
  
  A simple scoring loop
&lt;/h3&gt;

&lt;p&gt;Put 25 rows in Google Sheets or a table:&lt;/p&gt;

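&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Candidate&lt;/th&gt;
&lt;th&gt;Portfolio&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;td&gt;&lt;a href="https://example.com/a" rel="noopener noreferrer"&gt;https://example.com/a&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Strong UI, weak case studies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;td&gt;&lt;a href="https://example.com/b" rel="noopener noreferrer"&gt;https://example.com/b&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Great B2B work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;td&gt;&lt;a href="https://example.com/c" rel="noopener noreferrer"&gt;https://example.com/c&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Mostly student projects&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;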

&lt;p&gt;Then score each row with a small, explicit prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"score_freelancer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"criteria"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"relevant_experience"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"portfolio_quality"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"communication_clarity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"risk_flags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output_format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0-100"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"short explanation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"decision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"reject|review|approve"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
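
&lt;p&gt;One way to keep that rubric honest is to have the model return a 0-100 sub-score per criterion and do the weighting in code, so the arithmetic is never left to the model. A rough sketch, assuming higher is always better (for &lt;code&gt;risk_flags&lt;/code&gt;, 100 means no red flags). The function names and the 75/50 thresholds are my own placeholders.&lt;/p&gt;

```typescript
// Weights copied from the rubric; they sum to 1.0.
const weights = {
  relevant_experience: 0.4,
  portfolio_quality: 0.3,
  communication_clarity: 0.2,
  risk_flags: 0.1,
};

// Assumption: every criterion is graded 0-100 where higher is better
// (for risk_flags, 100 means no red flags were found).
type SubScores = {
  relevant_experience: number;
  portfolio_quality: number;
  communication_clarity: number;
  risk_flags: number;
};

// Combine the sub-scores into one 0-100 total using the rubric weights.
function weightedScore(s: SubScores): number {
  const total =
    s.relevant_experience * weights.relevant_experience +
    s.portfolio_quality * weights.portfolio_quality +
    s.communication_clarity * weights.communication_clarity +
    s.risk_flags * weights.risk_flags;
  return Math.round(total);
}

// Hypothetical thresholds; tune them against your labeled rows.
function decide(score: number): "reject" | "review" | "approve" {
  if (score >= 75) return "approve";
  if (score >= 50) return "review";
  return "reject";
}
```

&lt;p&gt;The upside of this split is that changing a weight or a threshold becomes a code diff you can re-run against the same rows, not a prompt rewrite.&lt;/p&gt;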



&lt;p&gt;And call the model in a narrow way.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
Score this freelancer for outbound outreach.

Candidate data:
&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;

Return JSON only:
{
  "score": number,
  "reason": string,
  "decision": "reject" | "review" | "approve"
}
`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
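
&lt;p&gt;“Return JSON only” is a request, not a guarantee, so I’d validate the reply before anything downstream trusts it. A minimal sketch: &lt;code&gt;parseScoreResult&lt;/code&gt; is a hypothetical helper, and the 0-100 range is taken from the rubric above.&lt;/p&gt;

```typescript
type Decision = "reject" | "review" | "approve";
type ScoreResult = { score: number; reason: string; decision: Decision };

const DECISIONS: string[] = ["reject", "review", "approve"];

// Parse the raw model reply and throw if it does not match the requested shape.
function parseScoreResult(raw: string): ScoreResult {
  const data: unknown = JSON.parse(raw); // throws on non-JSON replies
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const { score, reason, decision } = data as { [key: string]: unknown };

  if (typeof score !== "number" || !Number.isFinite(score) || score > 100 || 0 > score) {
    throw new Error("score must be a number from 0 to 100");
  }
  if (typeof reason !== "string") {
    throw new Error("reason must be a string");
  }
  if (typeof decision !== "string" || DECISIONS.indexOf(decision) === -1) {
    throw new Error("decision must be reject, review, or approve");
  }
  return { score, reason, decision: decision as Decision };
}
```

&lt;p&gt;A thrown error here is a useful signal in its own right: count how often it happens across a batch before you blame the rubric.&lt;/p&gt;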



&lt;p&gt;That’s the work.&lt;/p&gt;

&lt;p&gt;Not the flashy DM generation. The scoring loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  3) Outreach is deterministic in all the annoying places
&lt;/h2&gt;

&lt;p&gt;This is the part developers usually know already, but ignore because the autonomous-agent fantasy is fun.&lt;/p&gt;

&lt;p&gt;Sending email, writing to Airtable, creating a HubSpot contact, updating Notion, posting to Slack, writing to Postgres — these are &lt;strong&gt;not&lt;/strong&gt; LLM problems.&lt;/p&gt;

&lt;p&gt;They are API problems.&lt;/p&gt;

&lt;p&gt;So solve them with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;n8n&lt;/li&gt;
&lt;li&gt;Make&lt;/li&gt;
&lt;li&gt;direct scripts&lt;/li&gt;
&lt;li&gt;native APIs&lt;/li&gt;
&lt;li&gt;Composio&lt;/li&gt;
&lt;li&gt;CRM connectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let the model decide &lt;strong&gt;what&lt;/strong&gt; to say.&lt;/p&gt;

&lt;p&gt;Do not let the model own the mechanics of &lt;strong&gt;how&lt;/strong&gt; the message gets sent unless you enjoy debugging weird side effects.&lt;/p&gt;

&lt;p&gt;Example split:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// sketch: gmail, hubspot, and generatePersonalizedDM are placeholders for your own clients&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleCandidate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// deterministic gate: skip anyone the scoring step did not approve&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;approve&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// the LLM decides what to say&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generatePersonalizedDM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="c1"&gt;// plain API calls own how it gets sent and logged&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gmail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Quick question about freelance work&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;hubspot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contacts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createOrUpdate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;exa-search&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That architecture is less magical.&lt;/p&gt;

&lt;p&gt;It is also much easier to debug at 2 AM.&lt;/p&gt;

&lt;h2&gt;
  
  
  The workflow that actually survives contact with reality
&lt;/h2&gt;

&lt;p&gt;If I were building freelancer scouting or lead qualification automation today, I’d structure it like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Source candidates&lt;/strong&gt; with Exa or a deterministic scraper/API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normalize records&lt;/strong&gt; with code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score candidates&lt;/strong&gt; with a narrow LLM prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run evals&lt;/strong&gt; on a labeled dataset&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Require approval&lt;/strong&gt; for edge cases or high-value outreach&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate personalization&lt;/strong&gt; only for approved candidates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Send messages&lt;/strong&gt; via Gmail, LinkedIn helpers, or CRM integrations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log outcomes&lt;/strong&gt; for prompt iteration later&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That sounds slower than one giant agent.&lt;/p&gt;

&lt;p&gt;In practice, it’s faster.&lt;/p&gt;

&lt;p&gt;Because every broken part has a name.&lt;/p&gt;
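
&lt;p&gt;To make “every broken part has a name” literal, here’s a small sketch of a stage runner. &lt;code&gt;Stage&lt;/code&gt; and &lt;code&gt;runStages&lt;/code&gt; are hypothetical names, not an OpenClaw or n8n API. The only point is that a failure reports which stage it came from.&lt;/p&gt;

```typescript
// A stage is a named step. `run` may be sync or async.
type Stage = { name: string; run: (input: unknown) => unknown };

// Run stages in order, passing each output to the next stage.
// If a stage throws, the error is rethrown with the stage name attached.
async function runStages(stages: Stage[], input: unknown) {
  let current = input;
  for (const stage of stages) {
    try {
      current = await stage.run(current);
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      throw new Error(`stage "${stage.name}" failed: ${detail}`);
    }
  }
  return current;
}

// Example wiring with stub stages; real ones would call search, the model, and so on.
const demoStages: Stage[] = [
  { name: "source", run: () => ["candidate-a", "candidate-b"] },
  { name: "normalize", run: (rows) => (rows as string[]).map((r) => r.toUpperCase()) },
];

runStages(demoStages, null).then((result) => console.log(result));
```

&lt;p&gt;In n8n or Make the node names play the same role. If you’re wiring this by hand, a wrapper like this gets you most of the way.&lt;/p&gt;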

&lt;h2&gt;
  
  
  One giant agent vs staged workflow
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;What actually happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;One big agent workflow&lt;/td&gt;
&lt;td&gt;One prompt tries to scout, evaluate, personalize, and send. Fast to demo, miserable to debug, and every failure compounds.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Staged n8n or Make workflow&lt;/td&gt;
&lt;td&gt;Separate steps for sourcing, scoring, approval, and outreach. Easier to test, easier to swap tools, easier to reason about.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic scripts plus agent judgment&lt;/td&gt;
&lt;td&gt;APIs and scripts handle repeatable actions. LLMs handle ranking, extraction, and personalization. Best option for production reliability.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you plan to run the workflow more than a few times, the staged version wins.&lt;/p&gt;

&lt;p&gt;Not because it’s elegant.&lt;/p&gt;

&lt;p&gt;Because it’s survivable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost problem nobody likes admitting
&lt;/h2&gt;

&lt;p&gt;Here’s the part that quietly wrecks architecture decisions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;retries cost money&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Every time you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;re-score a batch&lt;/li&gt;
&lt;li&gt;retry extraction&lt;/li&gt;
&lt;li&gt;run fallback prompts&lt;/li&gt;
&lt;li&gt;compare models&lt;/li&gt;
&lt;li&gt;evaluate 50 examples&lt;/li&gt;
&lt;li&gt;regenerate outreach&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...the meter runs.&lt;/p&gt;

&lt;p&gt;And that creates bad behavior.&lt;/p&gt;

&lt;p&gt;Teams start doing things they know are wrong because per-token pricing makes iteration feel expensive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;avoiding evals&lt;/li&gt;
&lt;li&gt;under-testing prompts&lt;/li&gt;
&lt;li&gt;keeping giant prompts instead of splitting steps&lt;/li&gt;
&lt;li&gt;skipping retries&lt;/li&gt;
&lt;li&gt;refusing to batch experiments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s how token anxiety becomes a design constraint.&lt;/p&gt;

&lt;p&gt;For agent workflows, that is brutal.&lt;/p&gt;

&lt;p&gt;Because the correct architecture usually involves &lt;strong&gt;more calls, smaller steps, more testing&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Which is exactly why flat-rate compute is so attractive for automation-heavy stacks.&lt;/p&gt;

&lt;p&gt;If you’re running agents all day in n8n, Make, Zapier, OpenClaw, or custom workflows, predictable pricing changes behavior in a good way. You stop treating every iteration like a financial decision.&lt;/p&gt;

&lt;p&gt;That’s the real appeal of &lt;strong&gt;Standard Compute&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It gives you an OpenAI-compatible API, but with &lt;strong&gt;unlimited AI compute at a flat monthly price&lt;/strong&gt; instead of per-token billing. So the architecture you’d build if cost didn’t constantly nag you — staged workflows, repeated evals, lots of narrow model calls — becomes practical.&lt;/p&gt;

&lt;p&gt;That matters a lot when your workflow is doing repeated ranking, scoring, rewriting, and fallback logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  The boring architecture is usually the advanced one
&lt;/h2&gt;

&lt;p&gt;People think the sophisticated setup is maximum autonomy.&lt;/p&gt;

&lt;p&gt;Usually it isn’t.&lt;/p&gt;

&lt;p&gt;Usually the sophisticated setup is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt; for judgment and tool use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;n8n&lt;/strong&gt; for orchestration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exa&lt;/strong&gt; for search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gmail&lt;/strong&gt; or &lt;strong&gt;HubSpot&lt;/strong&gt; connectors for delivery&lt;/li&gt;
&lt;li&gt;a smaller model for repeated scoring&lt;/li&gt;
&lt;li&gt;a larger model only when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not less advanced.&lt;/p&gt;

&lt;p&gt;That is more advanced because it respects failure boundaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would build first
&lt;/h2&gt;

&lt;p&gt;Not outreach.&lt;/p&gt;

&lt;p&gt;That’s the bait.&lt;/p&gt;

&lt;p&gt;Build the scoring loop first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start here
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Collect 25 candidate profiles&lt;/li&gt;
&lt;li&gt;Put them in Google Sheets or a database table&lt;/li&gt;
&lt;li&gt;Define a scoring rubric&lt;/li&gt;
&lt;li&gt;Run the same prompt across all rows&lt;/li&gt;
&lt;li&gt;Compare outputs&lt;/li&gt;
&lt;li&gt;Fix the rubric&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;
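
&lt;p&gt;For the “compare outputs” step, a sketch like this is usually enough, assuming you’ve hand-labeled each row with the decision you’d expect. &lt;code&gt;Row&lt;/code&gt; and &lt;code&gt;summarize&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
type Decision = "reject" | "review" | "approve";

// One labeled row: what a human expected vs. what the prompt returned.
type Row = { id: string; expected: Decision; actual: Decision };

// Compare model decisions with human labels and list the rows that disagree.
function summarize(rows: Row[]) {
  const mismatches = rows.filter((r) => r.expected !== r.actual);
  const agreement =
    rows.length === 0 ? 0 : (rows.length - mismatches.length) / rows.length;
  return { total: rows.length, agreement, mismatches };
}

// Example run on four labeled rows.
const report = summarize([
  { id: "a", expected: "approve", actual: "approve" },
  { id: "b", expected: "reject", actual: "reject" },
  { id: "c", expected: "review", actual: "approve" },
  { id: "d", expected: "reject", actual: "reject" },
]);
console.log(report.agreement, report.mismatches.map((r) => r.id));
```

&lt;p&gt;The mismatch list matters more than the agreement number. Those rows tell you which part of the rubric to fix next.&lt;/p&gt;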

&lt;p&gt;A minimal prompt is enough:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are scoring freelancer candidates for outbound outreach.

Evaluate this candidate on:
- relevant experience
- evidence of quality work
- communication clarity
- fit for B2B SaaS work

Return JSON with:
- score (0-100)
- decision (reject/review/approve)
- reason (max 30 words)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, if you’re using OpenClaw, actually inspect the system while it runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw status &lt;span class="nt"&gt;--deep&lt;/span&gt;
openclaw logs &lt;span class="nt"&gt;--follow&lt;/span&gt;
openclaw doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you can’t explain why a candidate was selected, you are not ready to automate outreach.&lt;/p&gt;

&lt;p&gt;That sounds harsh.&lt;/p&gt;

&lt;p&gt;It is still cheaper than losing another 50 hours to a workflow that looked smart in a diagram.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real lesson
&lt;/h2&gt;

&lt;p&gt;The lesson is not that OpenClaw is weak.&lt;/p&gt;

&lt;p&gt;The lesson is that people keep trying to compress messy human workflows into one heroic prompt.&lt;/p&gt;

&lt;p&gt;That almost never works.&lt;/p&gt;

&lt;p&gt;Break it apart.&lt;/p&gt;

&lt;p&gt;Let LLMs judge.&lt;/p&gt;

&lt;p&gt;Let scripts execute.&lt;/p&gt;

&lt;p&gt;Let n8n or Make orchestrate.&lt;/p&gt;

&lt;p&gt;And if you’re doing enough repeated LLM work that token pricing is warping your design, use infrastructure that doesn’t punish iteration.&lt;/p&gt;

&lt;p&gt;That’s how these workflows stop being demos and start becoming systems.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I stopped trusting app dashboards and used a browser automation AI agent to rebuild the numbers from scratch</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Sat, 27 Jun 2026 20:56:09 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-stopped-trusting-app-dashboards-and-used-a-browser-automation-ai-agent-to-rebuild-the-numbers-a0a</link>
      <guid>https://dev.to/lars_winstand/i-stopped-trusting-app-dashboards-and-used-a-browser-automation-ai-agent-to-rebuild-the-numbers-a0a</guid>
      <description>&lt;p&gt;Dashboards are great right up until they quietly lie to you.&lt;/p&gt;

&lt;p&gt;I like a clean admin screen as much as anyone. Green checks. Nice totals. A chart drifting upward like the database has never seen a duplicate row in its life.&lt;/p&gt;

&lt;p&gt;But some of the worst ops mistakes I’ve seen started with the same sentence:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The dashboard says we’re fine.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s why a small Reddit example stuck with me. In a thread on r/openclaw, someone said they used OpenClaw to fill out Garmin’s device-sync worksheet from their own activity history instead of trusting the app screen.&lt;/p&gt;

&lt;p&gt;That’s a tiny use case. It’s also one of the clearest examples of what AI agents are actually good at.&lt;/p&gt;

&lt;p&gt;Not writing tweets.&lt;br&gt;
Not roleplaying as your coworker.&lt;br&gt;
Not summarizing a summary.&lt;/p&gt;

&lt;p&gt;The useful move is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Have the agent go back to source records and reconstruct the answer itself.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That turns the agent from a chatbot into a verification layer.&lt;/p&gt;

&lt;p&gt;And for developers building automations, that’s way more interesting.&lt;/p&gt;
&lt;h2&gt;
  
  
  The chat part is the least interesting part
&lt;/h2&gt;

&lt;p&gt;Most people still picture an agent as a chat UI with a few tools attached.&lt;/p&gt;

&lt;p&gt;That framing misses the real value.&lt;/p&gt;

&lt;p&gt;The important part is not that GPT-5 or Claude can answer in natural language. The important part is that an agent can inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gmail threads&lt;/li&gt;
&lt;li&gt;Slack messages&lt;/li&gt;
&lt;li&gt;SQLite or PostgreSQL rows&lt;/li&gt;
&lt;li&gt;CSV exports&lt;/li&gt;
&lt;li&gt;Google Sheets&lt;/li&gt;
&lt;li&gt;app activity logs&lt;/li&gt;
&lt;li&gt;calendar events&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it can compare those records to whatever your dashboard claims happened.&lt;/p&gt;

&lt;p&gt;That’s the architectural shift.&lt;/p&gt;

&lt;p&gt;If the agent can access the underlying records directly, it does not need to trust one app’s summary screen.&lt;/p&gt;

&lt;p&gt;For verification workflows, that’s the difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“read the number on the page”&lt;/li&gt;
&lt;li&gt;“compute the number from evidence”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I trust the second one a lot more.&lt;/p&gt;
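
&lt;p&gt;In code, the second option can be very small. This is a sketch under assumptions: &lt;code&gt;SyncEvent&lt;/code&gt; is a made-up record shape and &lt;code&gt;"completed"&lt;/code&gt; is a placeholder status value, not something taken from Garmin or OpenClaw.&lt;/p&gt;

```typescript
// Hypothetical record shape: one row per sync event from a raw export or database.
type SyncEvent = { id: string; status: string };

// Count distinct completed events; a repeated id is counted once.
function countCompleted(events: SyncEvent[]): number {
  const seen: { [id: string]: boolean } = {};
  let count = 0;
  for (const e of events) {
    if (e.status !== "completed" || seen[e.id]) continue;
    seen[e.id] = true;
    count += 1;
  }
  return count;
}

// Compare the total the dashboard claims with the total recomputed from records.
function reconcile(dashboardTotal: number, events: SyncEvent[]) {
  const computed = countCompleted(events);
  return {
    dashboardTotal,
    computed,
    delta: dashboardTotal - computed,
    matches: dashboardTotal === computed,
  };
}
```

&lt;p&gt;A non-zero &lt;code&gt;delta&lt;/code&gt; doesn’t say which side is wrong. It says the two disagree, which is exactly when you want a human to look.&lt;/p&gt;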
&lt;h2&gt;
  
  
  Why dashboards are often the wrong source of truth
&lt;/h2&gt;

&lt;p&gt;Dashboards are optimized for readability and speed.&lt;/p&gt;

&lt;p&gt;They are not optimized for forensic accuracy.&lt;/p&gt;

&lt;p&gt;A dashboard number might be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cached&lt;/li&gt;
&lt;li&gt;delayed&lt;/li&gt;
&lt;li&gt;filtered&lt;/li&gt;
&lt;li&gt;deduplicated&lt;/li&gt;
&lt;li&gt;rounded&lt;/li&gt;
&lt;li&gt;based on business rules you forgot existed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s fine when you’re checking a rough trend.&lt;/p&gt;

&lt;p&gt;It’s not fine when you’re deciding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;whether a customer was contacted&lt;/li&gt;
&lt;li&gt;whether a sync job actually completed&lt;/li&gt;
&lt;li&gt;whether your CRM matches your inbox&lt;/li&gt;
&lt;li&gt;whether support backlog is growing&lt;/li&gt;
&lt;li&gt;whether a billing report is safe to send&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Garmin example works because it’s painfully familiar: the app screen said one thing, the history said another, so the user rebuilt the answer from the underlying activity.&lt;/p&gt;

&lt;p&gt;That’s the pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don’t ask AI to trust the dashboard. Ask AI to check the receipts.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The stack that makes this work
&lt;/h2&gt;

&lt;p&gt;While digging through agent workflows, I found another r/openclaw discussion that explained the integration problem better than most vendor pages do. One commenter broke it into tiers: native tools, MCP connections, and managed OAuth layers like Composio.&lt;/p&gt;

&lt;p&gt;That’s the real design question.&lt;/p&gt;

&lt;p&gt;Not “which model is smartest?”&lt;/p&gt;

&lt;p&gt;The better question is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How directly can this agent access the records I actually trust?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s the practical version.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;What it’s best at&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;Local-first agent control plane, model routing, failover, and operational visibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP&lt;/td&gt;
&lt;td&gt;Connecting agents to files, databases, calendars, and app data so they can read raw records directly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composio&lt;/td&gt;
&lt;td&gt;Managed OAuth, per-user sessions, token refresh, triggers, and a huge app integration layer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;My take: if you care about verification, &lt;strong&gt;OpenClaw + MCP + Composio&lt;/strong&gt; is more interesting than another hosted chat app.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why OpenClaw is a good fit for verification work
&lt;/h2&gt;

&lt;p&gt;OpenClaw is interesting because it behaves more like infrastructure than a chat toy.&lt;/p&gt;

&lt;p&gt;If I’m asking an agent to reconcile:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local exports&lt;/li&gt;
&lt;li&gt;inbox history&lt;/li&gt;
&lt;li&gt;SQLite rows&lt;/li&gt;
&lt;li&gt;Slack messages&lt;/li&gt;
&lt;li&gt;a spreadsheet someone emailed three weeks ago&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I want something inspectable.&lt;/p&gt;

&lt;p&gt;OpenClaw exposes commands that make that possible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw status
openclaw status &lt;span class="nt"&gt;--all&lt;/span&gt;
openclaw status &lt;span class="nt"&gt;--deep&lt;/span&gt;

openclaw health &lt;span class="nt"&gt;--json&lt;/span&gt;
openclaw health &lt;span class="nt"&gt;--verbose&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That matters.&lt;/p&gt;

&lt;p&gt;A verification layer should be debuggable. If the agent is going to tell me the dashboard is wrong, I want to know what it touched, what failed, and what source it trusted.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where MCP becomes the useful part
&lt;/h2&gt;

&lt;p&gt;MCP matters because it gives the agent a standard way to access real systems instead of scraping one screen and pretending that’s truth.&lt;/p&gt;

&lt;p&gt;For example, if your agent can connect to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gmail&lt;/li&gt;
&lt;li&gt;Google Calendar&lt;/li&gt;
&lt;li&gt;PostgreSQL&lt;/li&gt;
&lt;li&gt;SQLite&lt;/li&gt;
&lt;li&gt;local files&lt;/li&gt;
&lt;li&gt;Notion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then it can rebuild answers from source records.&lt;/p&gt;

&lt;p&gt;That’s a much healthier pattern than “open dashboard, read total, repeat total.”&lt;/p&gt;

&lt;p&gt;A minimal example might look like this conceptually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="nx"&gt;gmail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getThreads&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;since&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2026-06-01&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;slack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getMessages&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;support&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;since&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2026-06-01&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;postgres&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;select * from tickets where created_at &amp;gt;= $1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2026-06-01&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
  &lt;span class="nx"&gt;sqlite&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;select * from sync_events where ts &amp;gt;= ?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2026-06-01&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;normalized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;records&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;reconcile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;normalized&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mismatches&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The exact APIs vary, but the pattern is the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;fetch source records&lt;/li&gt;
&lt;li&gt;normalize them&lt;/li&gt;
&lt;li&gt;compute the answer&lt;/li&gt;
&lt;li&gt;compare it to the app summary&lt;/li&gt;
&lt;li&gt;output evidence&lt;/li&gt;
&lt;/ol&gt;
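
&lt;p&gt;The normalize step is where a lot of false mismatches come from, so here is a minimal Python sketch of it. The record shape (&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;email&lt;/code&gt;, &lt;code&gt;timestamp&lt;/code&gt;) and the function name &lt;code&gt;normalize_records&lt;/code&gt; are assumptions for illustration, and treating timestamps without a zone as UTC is a guess you would need to confirm per source.&lt;/p&gt;

```python
from datetime import datetime, timezone


def normalize_records(records):
    """Lower-case emails, convert timestamps to UTC, and drop duplicate IDs."""
    seen_ids = set()
    normalized = []
    for record in records:
        record_id = str(record["id"]).strip()
        if record_id in seen_ids:
            continue  # keep only the first copy of each ID
        seen_ids.add(record_id)

        ts = datetime.fromisoformat(record["timestamp"])
        if ts.tzinfo is None:
            # Assumption: timestamps without a zone are already UTC.
            ts = ts.replace(tzinfo=timezone.utc)

        normalized.append({
            "id": record_id,
            "email": record["email"].strip().lower(),
            "timestamp": ts.astimezone(timezone.utc),
        })
    return normalized


# Hypothetical sample rows: one duplicate ID, mixed casing, mixed time zones.
sample = [
    {"id": "42", "email": " Ana@Example.com ", "timestamp": "2026-06-01T09:00:00+02:00"},
    {"id": "42", "email": "ana@example.com", "timestamp": "2026-06-01T07:00:00"},
    {"id": "43", "email": "bo@example.com", "timestamp": "2026-06-02T12:30:00"},
]
clean = normalize_records(sample)
```

&lt;p&gt;Keeping the first copy of a duplicate is a design choice, not a rule. Some sources want the latest copy instead.&lt;/p&gt;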

&lt;h2&gt;
  
  
  Where Composio saves you from OAuth hell
&lt;/h2&gt;

&lt;p&gt;This is the part developers underestimate until they lose a weekend to auth flows.&lt;/p&gt;

&lt;p&gt;Composio is useful because it handles the ugly integration layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OAuth&lt;/li&gt;
&lt;li&gt;per-user connections&lt;/li&gt;
&lt;li&gt;token refresh&lt;/li&gt;
&lt;li&gt;triggers&lt;/li&gt;
&lt;li&gt;SDK and CLI access&lt;/li&gt;
&lt;li&gt;lots of app integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means your agent can pull from the systems teams actually use, like Gmail, Slack, Google Sheets, and Linear, without you hand-rolling auth for every connector.&lt;/p&gt;

&lt;p&gt;Their install path is refreshingly simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://composio.dev/install | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And yes, this matters for verification. If your agent can pull raw Slack messages and compare them against CRM activity or ticket counts, you can catch the mismatch before someone forwards a wrong report.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical verification workflow
&lt;/h2&gt;

&lt;p&gt;This is where the idea stops being abstract.&lt;/p&gt;

&lt;p&gt;A solid reconciliation pipeline usually looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pull source data from every system involved&lt;/li&gt;
&lt;li&gt;Normalize IDs, timestamps, and duplicates&lt;/li&gt;
&lt;li&gt;Ask the model to reconcile differences&lt;/li&gt;
&lt;li&gt;Compare the model’s computed result to the dashboard value&lt;/li&gt;
&lt;li&gt;Emit a mismatch report with links to evidence&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you’re using n8n, this is a very natural fit.&lt;/p&gt;

&lt;p&gt;Example flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Node 1: fetch Gmail thread export&lt;/li&gt;
&lt;li&gt;Node 2: fetch Slack messages&lt;/li&gt;
&lt;li&gt;Node 3: read Google Sheets rows&lt;/li&gt;
&lt;li&gt;Node 4: query PostgreSQL&lt;/li&gt;
&lt;li&gt;Node 5: run reconciliation with Claude or GPT-5&lt;/li&gt;
&lt;li&gt;Node 6: post mismatch report to Slack or email&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a much better use of an agent than asking it to sound clever in a sidebar.&lt;/p&gt;
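
&lt;p&gt;The last node in a flow like that mostly formats text. A rough Python sketch of a mismatch report might look like the following. The function name &lt;code&gt;build_mismatch_report&lt;/code&gt;, the &lt;code&gt;summary&lt;/code&gt; and &lt;code&gt;evidence&lt;/code&gt; keys, and the example URL are all made up for illustration. This is not an n8n or Slack API.&lt;/p&gt;

```python
def build_mismatch_report(dashboard_count, recomputed_count, discrepancies):
    """Render a plain-text report where every discrepancy carries an evidence link."""
    mismatch = "yes" if dashboard_count != recomputed_count else "no"
    lines = [
        f"Dashboard value: {dashboard_count}",
        f"Recomputed value: {recomputed_count}",
        f"Mismatch: {mismatch}",
    ]
    for item in discrepancies:
        # Each discrepancy is expected to bring its own link or row reference.
        lines.append(f"- {item['summary']} (evidence: {item['evidence']})")
    return "\n".join(lines)


# Hypothetical input: one contact found in Gmail but missing from the CRM.
report = build_mismatch_report(
    dashboard_count=12,
    recomputed_count=11,
    discrepancies=[
        {
            "summary": "ana@example.com missing from CRM",
            "evidence": "https://example.com/threads/123",
        }
    ],
)
print(report)
```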

&lt;h2&gt;
  
  
  Example: compare a dashboard metric to source records
&lt;/h2&gt;

&lt;p&gt;Here’s a stripped-down Node.js example showing the shape of the workflow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;verifyContactCount&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;dashboardCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;gmailThreads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;crmRecords&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;contactedEmails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;thread&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;gmailThreads&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;direction&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;outbound&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customerEmail&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;contactedEmails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customerEmail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;crmTouched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;crmRecords&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customerEmail&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastContactedAt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;crmTouched&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customerEmail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;onlyInGmail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;contactedEmails&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;crmTouched&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;onlyInCrm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;crmTouched&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;contactedEmails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;dashboardCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;recomputedCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;contactedEmails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;mismatch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;dashboardCount&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;contactedEmails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;onlyInGmail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;onlyInCrm&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s not fancy AI. It’s just disciplined verification.&lt;/p&gt;

&lt;p&gt;The model becomes useful when the records are messy and spread across systems, and when you want a readable explanation of what mismatched and why.&lt;/p&gt;

&lt;h2&gt;
  
  
  The checks I would add immediately
&lt;/h2&gt;

&lt;p&gt;Reconstructing from source records is safer than trusting a dashboard.&lt;/p&gt;

&lt;p&gt;It is not automatically correct.&lt;/p&gt;

&lt;p&gt;If the raw data is delayed, incomplete, malformed, or duplicated, the agent can still produce a bad answer. It’ll just do it confidently.&lt;/p&gt;

&lt;p&gt;So if I were building this for production, I’d require the agent to report:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;record counts per source&lt;/li&gt;
&lt;li&gt;missing date ranges&lt;/li&gt;
&lt;li&gt;duplicate IDs&lt;/li&gt;
&lt;li&gt;source freshness timestamps&lt;/li&gt;
&lt;li&gt;confirmed vs inferred conclusions&lt;/li&gt;
&lt;li&gt;exact evidence rows or links for every discrepancy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one is the big one.&lt;/p&gt;

&lt;p&gt;If the agent says the dashboard is wrong, it should point to the exact Gmail thread, Slack permalink, SQLite row, or CSV line that proves it.&lt;/p&gt;

&lt;p&gt;Otherwise you’ve just replaced one opaque summary with another.&lt;/p&gt;
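
&lt;p&gt;Several of the checks in this section are cheap to compute before the model sees any data. Here is a minimal Python sketch, assuming each row has an &lt;code&gt;id&lt;/code&gt; and a &lt;code&gt;day&lt;/code&gt; field. The function name &lt;code&gt;source_report&lt;/code&gt; and the row shape are hypothetical.&lt;/p&gt;

```python
from collections import Counter
from datetime import date, timedelta


def source_report(name, rows, start, end):
    """Summarize one source: row count, duplicate IDs, missing days, freshness."""
    id_counts = Counter(row["id"] for row in rows)
    days_seen = {row["day"] for row in rows}
    expected_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return {
        "source": name,
        "record_count": len(rows),
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n != 1),
        "missing_days": [d for d in expected_days if d not in days_seen],
        "latest_day": max(days_seen) if days_seen else None,
    }


# Hypothetical ticket rows: "t2" appears twice and June 2 has no data.
rows = [
    {"id": "t1", "day": date(2026, 6, 1)},
    {"id": "t2", "day": date(2026, 6, 1)},
    {"id": "t2", "day": date(2026, 6, 3)},
]
report = source_report("tickets", rows, date(2026, 6, 1), date(2026, 6, 3))
```

&lt;p&gt;If &lt;code&gt;missing_days&lt;/code&gt; or &lt;code&gt;duplicate_ids&lt;/code&gt; is non-empty, the agent should say so in its report instead of quietly computing a total.&lt;/p&gt;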

&lt;h2&gt;
  
  
  When this is worth doing
&lt;/h2&gt;

&lt;p&gt;Not every workflow needs this.&lt;/p&gt;

&lt;p&gt;Sometimes the dashboard is good enough.&lt;/p&gt;

&lt;p&gt;You should build a verification layer when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multiple systems disagree&lt;/li&gt;
&lt;li&gt;the dashboard is known to lag&lt;/li&gt;
&lt;li&gt;humans are manually cross-checking records already&lt;/li&gt;
&lt;li&gt;the cost of a wrong answer is high&lt;/li&gt;
&lt;li&gt;the workflow is repetitive enough to automate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good candidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;support ops&lt;/li&gt;
&lt;li&gt;CRM hygiene&lt;/li&gt;
&lt;li&gt;back-office agent workflows&lt;/li&gt;
&lt;li&gt;sync verification&lt;/li&gt;
&lt;li&gt;compliance-ish audit trails&lt;/li&gt;
&lt;li&gt;billing and activity reconciliation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad candidates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;low-stakes vanity metrics&lt;/li&gt;
&lt;li&gt;anything where “close enough” is actually fine&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Model cost becomes the hidden blocker fast
&lt;/h2&gt;

&lt;p&gt;There’s also a practical issue people avoid talking about.&lt;/p&gt;

&lt;p&gt;Verification workflows are token-hungry.&lt;/p&gt;

&lt;p&gt;If your agent is constantly pulling records, normalizing them, retrying, comparing outputs, and generating evidence-backed reports, per-token pricing gets annoying fast.&lt;/p&gt;

&lt;p&gt;This is exactly the kind of workload where teams start self-censoring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“don’t run it too often”&lt;/li&gt;
&lt;li&gt;“skip full reconciliation on smaller accounts”&lt;/li&gt;
&lt;li&gt;“only check the dashboard if someone complains”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That defeats the point.&lt;/p&gt;

&lt;p&gt;Verification is most useful when it runs consistently, not when someone is nervous about the bill.&lt;/p&gt;

&lt;p&gt;That’s why I think flat-rate inference is underrated for agentic ops work.&lt;/p&gt;

&lt;p&gt;With Standard Compute, you get unlimited AI compute for a predictable monthly price, using an OpenAI-compatible API. That means you can plug it into existing SDKs, n8n flows, or custom agents without redesigning your stack around token anxiety.&lt;/p&gt;

&lt;p&gt;For this kind of always-on reconciliation workflow, that pricing model makes more sense than metering every check like it’s a luxury feature.&lt;/p&gt;

&lt;p&gt;Especially if your agents are running 24/7 across automations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger shift
&lt;/h2&gt;

&lt;p&gt;The most underrated thing about agents is that the best use cases are often not about generation.&lt;/p&gt;

&lt;p&gt;They’re about reconstruction.&lt;/p&gt;

&lt;p&gt;Yes, model choice matters. GPT-5 is good at structured reasoning. Claude is often strong at careful synthesis. Other models can be fine depending on constraints.&lt;/p&gt;

&lt;p&gt;But if the agent cannot access the real records, none of that matters much.&lt;/p&gt;

&lt;p&gt;A boring agent with direct access to Gmail, Slack, PostgreSQL, SQLite, and local exports will beat a brilliant model trapped inside a dashboard tab.&lt;/p&gt;

&lt;p&gt;That’s the shift.&lt;/p&gt;

&lt;p&gt;Once you see it, you stop asking:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Can AI summarize this screen?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And you start asking the better question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“What would the answer be if the agent ignored the dashboard completely and rebuilt it from evidence?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s the version I trust.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I kept seeing people buy a Mac mini for OpenClaw and almost none of them needed one</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Sat, 27 Jun 2026 12:56:20 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-kept-seeing-people-buy-a-mac-mini-for-openclaw-and-almost-none-of-them-needed-one-4odk</link>
      <guid>https://dev.to/lars_winstand/i-kept-seeing-people-buy-a-mac-mini-for-openclaw-and-almost-none-of-them-needed-one-4odk</guid>
      <description>&lt;p&gt;A pattern jumped out at me while reading OpenClaw setup threads.&lt;/p&gt;

&lt;p&gt;Not model benchmarks.&lt;/p&gt;

&lt;p&gt;Not prompt engineering.&lt;/p&gt;

&lt;p&gt;Not whether GPT-5 or Claude is better for coding.&lt;/p&gt;

&lt;p&gt;It was this sentence:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I think I need a Mac mini.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And most of the time, they really didn’t.&lt;/p&gt;

&lt;p&gt;Usually they were trying to solve one of these:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;token anxiety&lt;/li&gt;
&lt;li&gt;wanting agents to run while they sleep&lt;/li&gt;
&lt;li&gt;wanting their laptop to stop being accidental production infrastructure&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Those are real problems.&lt;/p&gt;

&lt;p&gt;A Mac mini just isn’t the default fix.&lt;/p&gt;

&lt;p&gt;I saw this clearly in an r/openclaw thread where someone wanted OpenClaw to write blogs, update SEO, post to LinkedIn, and keep working in the background without running out of tokens. Totally normal beginner wish list. Also a great example of the trap.&lt;/p&gt;

&lt;p&gt;None of that requires a Mac mini.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Mac mini idea is really about control
&lt;/h2&gt;

&lt;p&gt;I get why people reach for Apple hardware.&lt;/p&gt;

&lt;p&gt;A Mac mini feels like a clean answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;quiet&lt;/li&gt;
&lt;li&gt;always on&lt;/li&gt;
&lt;li&gt;low power&lt;/li&gt;
&lt;li&gt;sits on a desk&lt;/li&gt;
&lt;li&gt;feels more serious than keeping Chrome tabs open on a laptop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It feels like buying certainty.&lt;/p&gt;

&lt;p&gt;If OpenClaw is stalling, burning credits, or failing halfway through a Gmail -&amp;gt; Google Drive -&amp;gt; LinkedIn workflow, hardware seems like the adult solution.&lt;/p&gt;

&lt;p&gt;Buy a box. Put the agent there. Done.&lt;/p&gt;

&lt;p&gt;Except that’s usually not what breaks first.&lt;/p&gt;

&lt;p&gt;One commenter in another OpenClaw thread said it plainly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You DO NOT need a Mac Mini, just get a simple, cheap but reliable VPS like Contabo.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s not anti-hardware snobbery.&lt;/p&gt;

&lt;p&gt;That’s architecture.&lt;/p&gt;

&lt;p&gt;If OpenClaw is calling cloud models like GPT-5, Claude, or Grok, the expensive compute is happening somewhere else. Your machine is mostly doing orchestration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompts&lt;/li&gt;
&lt;li&gt;tool calls&lt;/li&gt;
&lt;li&gt;browser steps&lt;/li&gt;
&lt;li&gt;webhooks&lt;/li&gt;
&lt;li&gt;auth flows&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For that, a cheap VPS is usually enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually breaks first
&lt;/h2&gt;

&lt;p&gt;Not CPU.&lt;/p&gt;

&lt;p&gt;Not RAM.&lt;/p&gt;

&lt;p&gt;Usually it’s the glue between services.&lt;/p&gt;

&lt;p&gt;Beginners aren’t asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do I quantize Qwen for Apple Silicon?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They’re asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how do I connect Gmail?&lt;/li&gt;
&lt;li&gt;how do I keep Google auth alive?&lt;/li&gt;
&lt;li&gt;how do I access files from the agent?&lt;/li&gt;
&lt;li&gt;how do I recover from failed browser steps?&lt;/li&gt;
&lt;li&gt;how do I run this 24/7 without babysitting it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the real pain points are usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OAuth and API auth&lt;/li&gt;
&lt;li&gt;webhooks&lt;/li&gt;
&lt;li&gt;browser automation&lt;/li&gt;
&lt;li&gt;file access&lt;/li&gt;
&lt;li&gt;shared storage&lt;/li&gt;
&lt;li&gt;retries and failure handling&lt;/li&gt;
&lt;li&gt;long-running loops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A Mac mini doesn’t magically solve any of that.&lt;/p&gt;

&lt;p&gt;If you add one later, you still need a file sync plan.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# You still need to decide how files move around&lt;/span&gt;
&lt;span class="c"&gt;# Common options:&lt;/span&gt;
&lt;span class="c"&gt;# - iCloud&lt;/span&gt;
&lt;span class="c"&gt;# - Dropbox&lt;/span&gt;
&lt;span class="c"&gt;# - Google Drive&lt;/span&gt;
&lt;span class="c"&gt;# - OneDrive&lt;/span&gt;
&lt;span class="c"&gt;# - Syncthing&lt;/span&gt;
&lt;span class="c"&gt;# - Tailscale + shared folder&lt;/span&gt;
&lt;span class="c"&gt;# - Git&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sounds obvious until someone buys an M4 Mac mini and then realizes the OpenClaw process running there can’t see the files sitting on their MacBook desktop.&lt;/p&gt;

&lt;p&gt;That’s when the “I need better hardware” story turns into an infrastructure story.&lt;/p&gt;

&lt;h2&gt;
  
  
  The question people skip: are you saving tokens or avoiding babysitting?
&lt;/h2&gt;

&lt;p&gt;This is the fork.&lt;/p&gt;

&lt;p&gt;A lot of people say they want unlimited API access.&lt;/p&gt;

&lt;p&gt;What they actually mean is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I don’t want to keep checking whether this agent is burning money while doing dumb stuff.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a different problem.&lt;/p&gt;

&lt;p&gt;Long-running agents fail expensively.&lt;/p&gt;

&lt;p&gt;Not because GPT-5, Claude, or Grok are bad.&lt;/p&gt;

&lt;p&gt;Because agent loops are messy.&lt;/p&gt;

&lt;p&gt;A coding agent can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;re-read the same files&lt;/li&gt;
&lt;li&gt;retry tools too often&lt;/li&gt;
&lt;li&gt;over-plan&lt;/li&gt;
&lt;li&gt;get stuck in browser loops&lt;/li&gt;
&lt;li&gt;chew through credits while making partial progress&lt;/li&gt;
&lt;/ul&gt;
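
&lt;p&gt;Some of those failure modes can be capped in code regardless of where the agent runs. Below is a minimal Python sketch of a loop budget. The &lt;code&gt;LoopBudget&lt;/code&gt; class and its limits are hypothetical and are not an OpenClaw feature. You would call &lt;code&gt;allow()&lt;/code&gt; before each tool call in your own runner.&lt;/p&gt;

```python
class LoopBudget:
    """Stops an agent loop after too many steps or too many identical tool calls."""

    def __init__(self, max_steps, max_repeats):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.repeats = {}

    def allow(self, tool_name, arguments):
        """Return True if this tool call may run, False once a limit is exceeded."""
        # Identical tool name plus identical arguments counts as a repeat.
        key = (tool_name, repr(sorted(arguments.items())))
        self.steps += 1
        self.repeats[key] = self.repeats.get(key, 0) + 1
        if self.steps > self.max_steps:
            return False
        if self.repeats[key] > self.max_repeats:
            return False
        return True


# Hypothetical run: the agent tries to re-read the same file three times.
budget = LoopBudget(max_steps=5, max_repeats=2)
calls = [("read_file", {"path": "notes.md"})] * 3
decisions = [budget.allow(name, args) for name, args in calls]
print(decisions)
```

&lt;p&gt;A hard stop like this does not make the loop smarter. It only bounds how much a bad loop can cost.&lt;/p&gt;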

&lt;p&gt;So before you buy hardware, ask the uncomfortable question:&lt;/p&gt;

&lt;p&gt;Is the problem really compute, or is it pricing, routing, and throttling?&lt;/p&gt;

&lt;p&gt;That matters because if your stack depends on cloud models, hardware won’t fix per-token billing.&lt;/p&gt;

&lt;p&gt;A lot of teams don’t need a stronger box. They need predictable API economics.&lt;/p&gt;

&lt;p&gt;That’s where something like Standard Compute makes more sense than a Mac mini. It gives you an OpenAI-compatible endpoint with flat monthly pricing, so OpenClaw, n8n, Make, Zapier, or custom agents can run without the constant “how much did that loop cost?” panic.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical decision tree
&lt;/h2&gt;

&lt;p&gt;Here’s the short version.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Best use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mac mini for OpenClaw&lt;/td&gt;
&lt;td&gt;Running local models, on-prem workflows, or wanting a dedicated always-on local machine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cheap VPS for OpenClaw&lt;/td&gt;
&lt;td&gt;Best default for cloud-model workflows that need to run 24/7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flat-rate OpenAI-compatible compute service&lt;/td&gt;
&lt;td&gt;Best when the main problem is unpredictable API spend from agents and automations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That middle option is underrated.&lt;/p&gt;

&lt;p&gt;A VPS from Contabo, Hetzner, or DigitalOcean is boring.&lt;/p&gt;

&lt;p&gt;That’s exactly why it’s good.&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;leave it on&lt;/li&gt;
&lt;li&gt;isolate it from your laptop&lt;/li&gt;
&lt;li&gt;rebuild it fast&lt;/li&gt;
&lt;li&gt;keep secrets off your daily machine&lt;/li&gt;
&lt;li&gt;stop turning your personal computer into production infra&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Boring wins early.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’d do as a beginner
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Prove the workflow on the machine you already have
&lt;/h3&gt;

&lt;p&gt;Get one useful loop working first.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gmail -&amp;gt; Google Sheets&lt;/li&gt;
&lt;li&gt;website research -&amp;gt; Notion draft&lt;/li&gt;
&lt;li&gt;browser task -&amp;gt; Slack notification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don’t start with “run my business while I sleep.”&lt;/p&gt;

&lt;p&gt;That’s how people end up buying hardware to avoid fixing design problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Move it to a cheap VPS
&lt;/h3&gt;

&lt;p&gt;Once it works locally, move it somewhere persistent.&lt;/p&gt;

&lt;p&gt;Even a small Linux box is enough for most cloud-model OpenClaw workflows.&lt;/p&gt;

&lt;p&gt;Example setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh root@your-vps
apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt upgrade &lt;span class="nt"&gt;-y&lt;/span&gt;
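&lt;span class="c"&gt;# Compose package name varies: docker-compose-v2 on recent Ubuntu, docker-compose-plugin from the Docker apt repo&lt;/span&gt;
apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; git curl tmux docker.io docker-compose-v2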
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clone your project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/yourname/openclaw-workflows.git
&lt;span class="nb"&gt;cd &lt;/span&gt;openclaw-workflows
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it in a way that survives disconnects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tmux new &lt;span class="nt"&gt;-s&lt;/span&gt; openclaw
&lt;span class="c"&gt;# start your app here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;openclaw-runner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-image:latest&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;env_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.env&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data:/app/data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gets you most of the benefit people think they need a Mac mini for.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Fix auth and storage before you upgrade hardware
&lt;/h3&gt;

&lt;p&gt;This is where most of the real work is.&lt;/p&gt;

&lt;p&gt;Things to get right:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# sanity checklist&lt;/span&gt;
&lt;span class="c"&gt;# - API keys stored in env vars&lt;/span&gt;
&lt;span class="c"&gt;# - OAuth refresh tokens tested&lt;/span&gt;
&lt;span class="c"&gt;# - logs written somewhere persistent&lt;/span&gt;
&lt;span class="c"&gt;# - retry logic for flaky browser steps&lt;/span&gt;
&lt;span class="c"&gt;# - shared storage path defined&lt;/span&gt;
&lt;span class="c"&gt;# - cron or scheduler configured&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your Gmail token expires every 24 hours, an M4 Pro won’t save you.&lt;/p&gt;

&lt;p&gt;If your browser session gets logged out, extra RAM won’t save you.&lt;/p&gt;
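
&lt;p&gt;For the “retry logic for flaky browser steps” item, a minimal Python sketch of retry with exponential backoff could look like this. The &lt;code&gt;retry&lt;/code&gt; helper and the &lt;code&gt;flaky_step&lt;/code&gt; stand-in are hypothetical. A real runner would catch the specific exceptions its browser driver raises instead of a bare &lt;code&gt;Exception&lt;/code&gt;.&lt;/p&gt;

```python
import time


def retry(step, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call step(); on failure wait base_delay * 2**n seconds, then try again."""
    for attempt in range(attempts):
        try:
            return step()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Stand-in for a browser step that fails twice and then succeeds.
calls = {"count": 0}


def flaky_step():
    calls["count"] += 1
    if calls["count"] != 3:
        raise RuntimeError("session expired")
    return "ok"


# Record the delays instead of sleeping so the example runs instantly.
delays = []
result = retry(flaky_step, attempts=3, base_delay=0.5, sleep=delays.append)
```

&lt;p&gt;Retries only help with transient failures. An expired refresh token will fail every attempt, which is why the token test sits on the same checklist.&lt;/p&gt;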

&lt;h3&gt;
  
  
  4. Only buy a Mac mini when the constraint is specific
&lt;/h3&gt;

&lt;p&gt;A Mac mini makes sense when you can say one of these clearly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I need local Qwen, Llama, or Mistral inference&lt;/li&gt;
&lt;li&gt;this workflow must stay on-prem&lt;/li&gt;
&lt;li&gt;I want a dedicated office machine that is always on&lt;/li&gt;
&lt;li&gt;I care about low-noise, low-power local hosting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any one of those is a valid reason.&lt;/p&gt;

&lt;p&gt;“Reddit made it sound like the serious option” is not.&lt;/p&gt;

&lt;h2&gt;
  
  
  When local hardware actually is the right call
&lt;/h2&gt;

&lt;p&gt;I’m not anti-Mac mini.&lt;/p&gt;

&lt;p&gt;It’s a good machine.&lt;/p&gt;

&lt;p&gt;If you want local AI agent hardware, buy it for the right reason.&lt;/p&gt;

&lt;p&gt;Good reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local model inference&lt;/li&gt;
&lt;li&gt;privacy-sensitive work&lt;/li&gt;
&lt;li&gt;dedicated always-on machine&lt;/li&gt;
&lt;li&gt;personal preference for self-hosting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But local hosting has its own trap: people underestimate how much model quality matters.&lt;/p&gt;

&lt;p&gt;Expensive hardware does not automatically produce a good agent.&lt;/p&gt;

&lt;p&gt;A weak local model on nice hardware is still a weak local model.&lt;/p&gt;

&lt;p&gt;That matters a lot for tool use and long-running workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  If your real problem is cost predictability, solve that directly
&lt;/h2&gt;

&lt;p&gt;This is the part I think a lot of OpenClaw beginners miss.&lt;/p&gt;

&lt;p&gt;If you’re using cloud models, there are really two separate questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;where should the agent runtime live?&lt;/li&gt;
&lt;li&gt;how should model usage be billed?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Those are not the same decision.&lt;/p&gt;

&lt;p&gt;For runtime, a cheap VPS is often enough.&lt;/p&gt;

&lt;p&gt;For billing, per-token pricing is what creates the stress.&lt;/p&gt;

&lt;p&gt;If your agent stack is built around OpenAI-compatible APIs, you can swap the endpoint without rewriting everything.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.standardcompute.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_STANDARD_COMPUTE_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-5.4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize these support tickets and draft replies.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That same pattern works for a lot of existing agent code because Standard Compute is an OpenAI-compatible API. So if the real pain is token anxiety, you can solve the actual problem instead of buying hardware that doesn’t change your cloud bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  My opinionated version
&lt;/h2&gt;

&lt;p&gt;For most OpenClaw beginners:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;don’t buy a Mac mini first&lt;/li&gt;
&lt;li&gt;use the machine you already have to validate the workflow&lt;/li&gt;
&lt;li&gt;move it to a cheap VPS when you need 24/7 uptime&lt;/li&gt;
&lt;li&gt;if cloud API cost is the scary part, fix pricing instead of buying hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Mac mini starts making sense when you need local models, privacy, or a dedicated always-on box.&lt;/p&gt;

&lt;p&gt;Until then, it’s often a very pretty detour.&lt;/p&gt;

&lt;p&gt;Agents usually don’t fail because your desk lacks aluminum.&lt;/p&gt;

&lt;p&gt;They fail because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;auth expired&lt;/li&gt;
&lt;li&gt;storage is unclear&lt;/li&gt;
&lt;li&gt;browser automation is flaky&lt;/li&gt;
&lt;li&gt;retries are missing&lt;/li&gt;
&lt;li&gt;the workflow was never stable&lt;/li&gt;
&lt;li&gt;the cost model made you afraid to let it run&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the real beginner story.&lt;/p&gt;

&lt;p&gt;And once you see that, the Mac mini question gets a lot less dramatic.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>openai</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>I thought a 24/7 life-ops agent would be one genius bot but it’s actually 10 boring ones</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Sat, 27 Jun 2026 05:01:00 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-thought-a-247-life-ops-agent-would-be-one-genius-bot-but-its-actually-10-boring-ones-4pe5</link>
      <guid>https://dev.to/lars_winstand/i-thought-a-247-life-ops-agent-would-be-one-genius-bot-but-its-actually-10-boring-ones-4pe5</guid>
      <description>&lt;p&gt;I started this rabbit hole expecting sci-fi.&lt;/p&gt;

&lt;p&gt;You know the pitch: one always-on agent on a Mac mini or home server, quietly running your life while you sleep. It fixes your Plex library, manages Home Assistant, plans trips, handles admin, watches RSS, and only pings you when something actually matters.&lt;/p&gt;

&lt;p&gt;Then I read a thread on r/openclaw about a guy doing exactly this for a media server + personal ops setup.&lt;/p&gt;

&lt;p&gt;And the interesting part was not the fantasy.&lt;/p&gt;

&lt;p&gt;It was the architecture.&lt;/p&gt;

&lt;p&gt;The setup used very normal tools: Unraid, Plex, Sonarr, Radarr, FileBot, Home Assistant, archive.org, Discord, Telegram. The agent wasn’t doing movie-trailer-demo intelligence. It was doing background work. Constantly.&lt;/p&gt;

&lt;p&gt;That phrase stuck with me: background work.&lt;/p&gt;

&lt;p&gt;That’s the real design constraint for 24/7 agents.&lt;/p&gt;

&lt;p&gt;Not reasoning benchmarks.&lt;br&gt;
Not AGI vibes.&lt;br&gt;
Not whether GPT-5.4 or Claude Opus 4.6 wins one-shot prompts.&lt;/p&gt;

&lt;p&gt;The hard part is building something that can grind through boring tasks all day without turning into either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a state-management disaster&lt;/li&gt;
&lt;li&gt;a billing disaster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the first surprise is that the best version is usually not one agent.&lt;/p&gt;

&lt;p&gt;It’s several small ones.&lt;/p&gt;
&lt;h2&gt;
  
  
  The winning pattern is not a super-agent
&lt;/h2&gt;

&lt;p&gt;One of the best comments in that OpenClaw thread came from someone running roughly ten agents, with about six active daily, each with a narrow role.&lt;/p&gt;

&lt;p&gt;That sounds less impressive than “I built Jarvis.”&lt;/p&gt;

&lt;p&gt;It also sounds much more correct.&lt;/p&gt;

&lt;p&gt;If you’re building personal ops, home-lab automation, or always-on assistants, the architecture looks more like a tiny ops team than a single autonomous brain.&lt;/p&gt;

&lt;p&gt;Something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An inbox/operator agent for triage and final decisions&lt;/li&gt;
&lt;li&gt;A media agent for Plex, Sonarr, Radarr, FileBot, subtitle cleanup, missing episodes&lt;/li&gt;
&lt;li&gt;A home agent for Home Assistant routines and device actions&lt;/li&gt;
&lt;li&gt;A research agent for web lookups, archive.org pulls, ancestry, travel planning&lt;/li&gt;
&lt;li&gt;An admin agent for reminders, summaries, follow-ups, recurring tasks&lt;/li&gt;
&lt;li&gt;A notification layer that only escalates real interruptions to Telegram&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That maps cleanly to how tools like OpenClaw are actually useful.&lt;/p&gt;

&lt;p&gt;OpenClaw’s self-hosted Gateway acts like a control plane. Sessions stay isolated by agent, workspace, or sender across channels like Discord and Telegram.&lt;/p&gt;

&lt;p&gt;That sounds like an implementation detail.&lt;/p&gt;

&lt;p&gt;It’s not.&lt;/p&gt;

&lt;p&gt;For long-running agents, session isolation is survival.&lt;/p&gt;

&lt;p&gt;If your media cleanup task bleeds into your nonprofit fundraising draft, or your Home Assistant routine inherits context from a half-finished archive.org job, the whole system starts acting haunted.&lt;/p&gt;

&lt;p&gt;Developers usually discover this the hard way: long-running agents stop being a prompt problem and start being an operations problem.&lt;/p&gt;

&lt;p&gt;That’s true whether you use OpenClaw, n8n, Make, Zapier, or a custom Python worker farm.&lt;/p&gt;
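&lt;p&gt;Session isolation in that sense can be sketched in a few lines. This is a toy illustration, not OpenClaw’s implementation: the store, the key shape &lt;code&gt;(agent, workspace, sender)&lt;/code&gt;, and the function names are all assumptions made up for the example.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical session store: history is keyed by (agent, workspace, sender),
# so one agent's context can never leak into another's.
_sessions: dict[tuple[str, str, str], list[str]] = defaultdict(list)

def remember(agent: str, workspace: str, sender: str, message: str) -> None:
    _sessions[(agent, workspace, sender)].append(message)

def context_for(agent: str, workspace: str, sender: str) -> list[str]:
    # Return a copy so callers cannot mutate another session by accident.
    return list(_sessions.get((agent, workspace, sender), []))

remember("media", "home-lab", "telegram:me", "rename season 2 of the show")
remember("admin", "nonprofit", "discord:me", "draft the fundraising email")

print(context_for("admin", "nonprofit", "discord:me"))
print(context_for("home", "home-lab", "telegram:me"))
```

&lt;p&gt;The admin session only sees the fundraising draft, and the home session starts empty. Nothing from the media job is visible to either.&lt;/p&gt;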
&lt;h2&gt;
  
  
  Why these setups feel smart for a week and cursed by week three
&lt;/h2&gt;

&lt;p&gt;A lot of people blame the model when their agent stack starts getting weird.&lt;/p&gt;

&lt;p&gt;Usually it’s not the model.&lt;/p&gt;

&lt;p&gt;It’s state drift.&lt;/p&gt;

&lt;p&gt;The original Reddit post described exactly the kind of failure you see in real agent systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;project lists drifting away from “waiting on me” lists&lt;/li&gt;
&lt;li&gt;completed tasks reappearing&lt;/li&gt;
&lt;li&gt;items vanishing&lt;/li&gt;
&lt;li&gt;background workers getting timid or inconsistent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not “LLMs are fake.”&lt;/p&gt;

&lt;p&gt;That’s “you have no durable source of truth.”&lt;/p&gt;

&lt;p&gt;One commenter said they fixed this by adding a shared memory/store underneath their lists so different views stopped disagreeing.&lt;/p&gt;

&lt;p&gt;That’s why task state matters more than people think.&lt;/p&gt;
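&lt;p&gt;A minimal sketch of that idea, assuming a single SQLite table as the source of truth. The schema and helper names are hypothetical, not anything from the thread or from OpenClaw. Every list is a query over the same rows, so two views cannot disagree about a task.&lt;/p&gt;

```python
import sqlite3

# Hypothetical schema: one row per task, one status column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, project TEXT, status TEXT)"
)

def add_task(title: str, project: str, status: str = "open") -> int:
    cur = conn.execute(
        "INSERT INTO tasks (title, project, status) VALUES (?, ?, ?)",
        (title, project, status),
    )
    conn.commit()
    return cur.lastrowid

def set_status(task_id: int, status: str) -> None:
    conn.execute("UPDATE tasks SET status = ? WHERE id = ?", (status, task_id))
    conn.commit()

def view(status: str) -> list[str]:
    # Every "list" is just a filtered read of the same table.
    rows = conn.execute(
        "SELECT title FROM tasks WHERE status = ? ORDER BY id", (status,)
    )
    return [title for (title,) in rows]

renew = add_task("Renew domain", "admin", status="waiting_on_me")
add_task("Fix subtitle sync", "media")
set_status(renew, "done")

print(view("waiting_on_me"))  # the finished task no longer appears here
print(view("done"))
```

&lt;p&gt;Because the “waiting on me” list is derived rather than stored, a completed task cannot reappear in it unless its row changes.&lt;/p&gt;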
&lt;h2&gt;
  
  
  The board is the product
&lt;/h2&gt;

&lt;p&gt;One of the least flashy and most important ideas in OpenClaw is Workboard.&lt;/p&gt;

&lt;p&gt;Not because boards are exciting.&lt;/p&gt;

&lt;p&gt;Because persistent agents need a ledger.&lt;/p&gt;

&lt;p&gt;A real one.&lt;/p&gt;

&lt;p&gt;If an agent drafts a reply but never sends it, should the task be done?&lt;br&gt;
If a worker retries three times and fails, where do you see that?&lt;br&gt;
If an alert fired at 3:14 AM, what run produced it?&lt;br&gt;
If a session goes stale, how do you know what was in progress?&lt;/p&gt;

&lt;p&gt;You need visible state tied to logs, run IDs, session IDs, retries, and event history.&lt;/p&gt;

&lt;p&gt;That’s the difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“my agent feels magical”&lt;/li&gt;
&lt;li&gt;and “my agent can survive contact with reality”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For always-on agents, boards, logs, retries, and stale-session detection matter more than demo quality.&lt;/p&gt;
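&lt;p&gt;To make the ledger idea concrete, here is a rough sketch of a run record with a run ID, a session ID, retry counts, an append-only event history, and a stale check. It is not how Workboard stores anything. The &lt;code&gt;Run&lt;/code&gt; class, its fields, and the 15-minute idle threshold are assumptions for illustration.&lt;/p&gt;

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Run:
    task: str
    session_id: str
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    attempts: int = 0
    status: str = "in_progress"
    events: list = field(default_factory=list)
    last_seen: float = field(default_factory=time.time)

    def log(self, message: str) -> None:
        # Append-only history: never overwrite what happened earlier.
        self.last_seen = time.time()
        self.events.append((self.last_seen, message))

def attempt(run: Run, action, max_attempts: int = 3) -> Run:
    """Call action() up to max_attempts times, recording every outcome."""
    while run.attempts != max_attempts:
        run.attempts += 1
        try:
            action()
        except Exception as exc:
            run.log(f"attempt {run.attempts} failed: {exc}")
        else:
            run.status = "done"
            run.log(f"attempt {run.attempts} succeeded")
            return run
    run.status = "failed"
    return run

def is_stale(run: Run, now: float, max_idle_seconds: float = 900.0) -> bool:
    # Stale means: still marked in progress, but silent for too long.
    return run.status == "in_progress" and now - run.last_seen > max_idle_seconds

def always_fails():
    raise RuntimeError("send timed out")

run = attempt(Run(task="send reply draft", session_id="admin-1"), always_fails)
print(run.status, run.attempts, len(run.events))
```

&lt;p&gt;After three failed attempts the run is marked failed with three logged events, which answers “where do you see that?” without rereading a chat transcript.&lt;/p&gt;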
&lt;h2&gt;
  
  
  A practical life-ops stack is mostly boring software
&lt;/h2&gt;

&lt;p&gt;This was my favorite part of the research.&lt;/p&gt;

&lt;p&gt;The stack is not exotic.&lt;/p&gt;

&lt;p&gt;It’s home-lab software with automation surfaces.&lt;/p&gt;
&lt;h3&gt;
  
  
  Media stack
&lt;/h3&gt;

&lt;p&gt;The Reddit example used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unraid&lt;/li&gt;
&lt;li&gt;Plex&lt;/li&gt;
&lt;li&gt;Sonarr&lt;/li&gt;
&lt;li&gt;Radarr&lt;/li&gt;
&lt;li&gt;FileBot&lt;/li&gt;
&lt;li&gt;live TV channels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s already enough surface area for a useful agent.&lt;/p&gt;

&lt;p&gt;A media agent does not need cinematic taste.&lt;/p&gt;

&lt;p&gt;It needs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;detect broken naming&lt;/li&gt;
&lt;li&gt;rename files correctly&lt;/li&gt;
&lt;li&gt;notice missing episodes&lt;/li&gt;
&lt;li&gt;fetch metadata/subtitles&lt;/li&gt;
&lt;li&gt;escalate edge cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This kind of command is more useful than 90% of “AI agent” demos:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;filebot &lt;span class="nt"&gt;-rename&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"/input"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--db&lt;/span&gt; TheMovieDB::TV &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-non-strict&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--action&lt;/span&gt; duplicate &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; &lt;span class="s2"&gt;"/output"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s2"&gt;"{plex.id}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s real work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Home automation
&lt;/h3&gt;

&lt;p&gt;Home Assistant already has an OpenAI integration and can control exposed entities through Assist.&lt;/p&gt;

&lt;p&gt;That’s powerful.&lt;/p&gt;

&lt;p&gt;It’s also telling that the docs explicitly warn users to monitor API usage and set limits.&lt;/p&gt;

&lt;p&gt;That warning is not a footnote. It’s a design signal.&lt;/p&gt;

&lt;p&gt;Always-on automation creates lots of small calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  Research and archive tasks
&lt;/h3&gt;

&lt;p&gt;The same Reddit setup included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;archive.org downloads&lt;/li&gt;
&lt;li&gt;ancestry research&lt;/li&gt;
&lt;li&gt;backpacking trip planning&lt;/li&gt;
&lt;li&gt;concert alerts&lt;/li&gt;
&lt;li&gt;RSS monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Again: normal tasks.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;internetarchive&lt;/code&gt; Python library already gives you a clean automation surface.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;internetarchive&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;search_items&lt;/span&gt;

&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;collection:opensource_movies AND subject:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documentary&lt;/span&gt;&lt;span class="sh"&gt;"'&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;search_items&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;identifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Discord works well for conversational interaction.&lt;br&gt;
Telegram works better for high-priority alerts because it feels distinct from general chat.&lt;/p&gt;

&lt;p&gt;Nothing here is futuristic.&lt;/p&gt;

&lt;p&gt;That’s why it’s credible.&lt;/p&gt;
&lt;h2&gt;
  
  
  The expensive part is not brilliance. It’s idling.
&lt;/h2&gt;

&lt;p&gt;This is the part more devs should care about.&lt;/p&gt;

&lt;p&gt;Persistent agents don’t get expensive because they’re doing one huge, brilliant task.&lt;/p&gt;

&lt;p&gt;They get expensive because they never stop doing small tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;polling&lt;/li&gt;
&lt;li&gt;summarizing&lt;/li&gt;
&lt;li&gt;retrying&lt;/li&gt;
&lt;li&gt;classifying&lt;/li&gt;
&lt;li&gt;checking state&lt;/li&gt;
&lt;li&gt;routing messages&lt;/li&gt;
&lt;li&gt;rewriting outputs&lt;/li&gt;
&lt;li&gt;generating alerts&lt;/li&gt;
&lt;li&gt;logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s where token anxiety comes from.&lt;/p&gt;

&lt;p&gt;Not one giant prompt.&lt;/p&gt;

&lt;p&gt;A thousand tiny background calls.&lt;/p&gt;

&lt;p&gt;And this is where the compute model matters much earlier than most people expect.&lt;/p&gt;

&lt;p&gt;If you have 3 to 10 workers doing low-grade activity all day, predictable monthly compute matters more than shaving pennies off a single prompt.&lt;/p&gt;

&lt;p&gt;That’s true for OpenClaw.&lt;br&gt;
It’s true for n8n.&lt;br&gt;
It’s true for Make.&lt;br&gt;
It’s true for Zapier.&lt;br&gt;
It’s true for custom worker fleets.&lt;/p&gt;
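&lt;p&gt;A back-of-the-envelope estimate shows why. Every number below is a placeholder chosen for illustration, not a real price or a measurement from the thread, and the function name is made up for the example.&lt;/p&gt;

```python
def monthly_token_cost(
    workers: int,
    calls_per_worker_per_hour: float,
    tokens_per_call: int,
    usd_per_million_tokens: float,
    hours_per_day: float = 24.0,
    days: int = 30,
) -> float:
    """Estimate monthly spend for workers that make small calls around the clock."""
    calls = workers * calls_per_worker_per_hour * hours_per_day * days
    return calls * tokens_per_call * usd_per_million_tokens / 1_000_000

# Placeholder inputs for illustration only; substitute your own measurements.
estimate = monthly_token_cost(
    workers=6,
    calls_per_worker_per_hour=20,
    tokens_per_call=1_500,
    usd_per_million_tokens=2.0,
)
print(f"{estimate:.2f} USD per month")
```

&lt;p&gt;With these made-up inputs, six workers produce 86,400 calls a month and the estimate comes to 259.20 USD. The call count is the term that grows, not the size of any single prompt.&lt;/p&gt;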

&lt;p&gt;Once you have agents running 24/7, per-token pricing becomes annoying in a very specific way: the expensive part is ambient traffic you stop noticing.&lt;/p&gt;

&lt;p&gt;That’s exactly why flat-rate AI is compelling for agent workloads.&lt;/p&gt;

&lt;p&gt;A drop-in OpenAI-compatible API with predictable monthly pricing is just a better fit for always-on systems than babysitting token burn across thousands of tiny calls.&lt;/p&gt;

&lt;p&gt;That’s the core reason Standard Compute is interesting here.&lt;/p&gt;

&lt;p&gt;If you’re building agent-heavy automations, Standard Compute gives you unlimited AI compute at a flat monthly price, works with OpenAI-compatible SDKs and HTTP clients, and removes the need to constantly meter background activity. For persistent workers, retries, summaries, and routing loops, that model makes more sense than per-token billing.&lt;/p&gt;

&lt;p&gt;Not because “unlimited” sounds flashy.&lt;/p&gt;

&lt;p&gt;Because boring background work is exactly what agents do most.&lt;/p&gt;
&lt;h2&gt;
  
  
  Which tool is actually best for what?
&lt;/h2&gt;

&lt;p&gt;Not every part of this job belongs in the same interface.&lt;/p&gt;

&lt;p&gt;My take is pretty simple:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;What it’s actually best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;Best control plane for long-running personal ops: self-hosted Gateway, multi-channel access through Discord and Telegram, isolated agent sessions, and task tracking tied to ongoing work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Home Assistant + direct OpenAI integration&lt;/td&gt;
&lt;td&gt;Best for controlling exposed entities and home routines, but weaker for multi-agent coordination because device control is only one part of the system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code or Codex&lt;/td&gt;
&lt;td&gt;Best for code-heavy tasks, upgrades, debugging, and direct developer workflows where you want stronger hands-on execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n8n / Make / Zapier&lt;/td&gt;
&lt;td&gt;Best for structured workflow automation, SaaS integrations, and event-driven pipelines, but they still need good state management once AI workers run continuously&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If I need a control plane for personal ops across Discord, Telegram, and long-running task state, I’d pick OpenClaw over direct Home Assistant + OpenAI.&lt;/p&gt;

&lt;p&gt;If I need code edits, debugging, or developer execution, I’d pick Claude Code or Codex.&lt;/p&gt;

&lt;p&gt;If I need integration-heavy pipelines, I’d use n8n or Make.&lt;/p&gt;

&lt;p&gt;The mistake is assuming one tool should dominate the whole stack.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I’d build first
&lt;/h2&gt;

&lt;p&gt;If I were building this at home, I would start smaller than the Reddit dream.&lt;/p&gt;

&lt;p&gt;Three agents, not ten.&lt;/p&gt;
&lt;h3&gt;
  
  
  First-pass architecture
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw Gateway on a Mac mini, VM, or home server&lt;/li&gt;
&lt;li&gt;Discord for normal interaction&lt;/li&gt;
&lt;li&gt;Telegram only for high-priority alerts&lt;/li&gt;
&lt;li&gt;One media agent&lt;/li&gt;
&lt;li&gt;One home agent&lt;/li&gt;
&lt;li&gt;One admin agent&lt;/li&gt;
&lt;li&gt;Workboard enabled from day one&lt;/li&gt;
&lt;li&gt;Direct scripts/APIs for execution&lt;/li&gt;
&lt;li&gt;GPT or Claude for planning/summarization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters.&lt;/p&gt;

&lt;p&gt;Use LLMs for planning, summarization, classification, and communication.&lt;br&gt;
Use deterministic tools for execution.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FileBot CLI for file operations&lt;/li&gt;
&lt;li&gt;Home Assistant actions for device control&lt;/li&gt;
&lt;li&gt;Python scripts for archive.org tasks&lt;/li&gt;
&lt;li&gt;Cron, systemd timers, or queue workers for scheduling&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  OpenClaw bootstrap
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest
openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Enable Workboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;enable &lt;/span&gt;workboard
openclaw gateway restart
openclaw dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example worker split
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;media&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;responsibilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;plex_health_checks&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sonarr_radarr_exceptions&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;filebot_renames&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;subtitle_cleanup&lt;/span&gt;
    &lt;span class="na"&gt;notify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;telegram_on_blockers&lt;/span&gt;

  &lt;span class="na"&gt;home&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;responsibilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;morning_summary&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;failed_automation_retries&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;device_state_checks&lt;/span&gt;
    &lt;span class="na"&gt;notify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;telegram_on_safety_issues&lt;/span&gt;

  &lt;span class="na"&gt;admin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;responsibilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;inbox_triage&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;reminders&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;follow_up_lists&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;daily_digest&lt;/span&gt;
    &lt;span class="na"&gt;notify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;discord_default&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example execution pattern
&lt;/h3&gt;

&lt;p&gt;Keep the LLM out of shell execution as much as possible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rename_media&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path_out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filebot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-rename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path_in&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TheMovieDB::TV&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-non-strict&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;duplicate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path_out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{plex.id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model should decide when to call this.&lt;br&gt;
It should not freestyle the command every time.&lt;/p&gt;
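&lt;p&gt;One way to enforce that split is a small registry: the model proposes a tool name plus arguments, and a dispatcher runs only functions that were written ahead of time. This is a sketch under assumptions. The request shape, the &lt;code&gt;TOOLS&lt;/code&gt; registry, and &lt;code&gt;dispatch&lt;/code&gt; are hypothetical, and the &lt;code&gt;rename_media&lt;/code&gt; here is a stand-in that does not call FileBot.&lt;/p&gt;

```python
from typing import Callable

# Registry of pre-written, deterministic tools. The model can only pick from these.
TOOLS: dict[str, Callable[..., object]] = {}

def tool(fn):
    TOOLS[fn.__name__] = fn
    return fn

@tool
def rename_media(path_in: str, path_out: str) -> str:
    # Stand-in for the FileBot wrapper; here it only reports what it would do.
    return f"would rename {path_in} into {path_out}"

def dispatch(request: dict) -> object:
    """Run a model-proposed tool call, rejecting anything outside the registry."""
    name = request.get("tool")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    args = request.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("args must be an object of keyword arguments")
    return TOOLS[name](**args)

print(dispatch({"tool": "rename_media", "args": {"path_in": "/input", "path_out": "/output"}}))
```

&lt;p&gt;The model never produces a command string. A request for anything outside the registry is rejected before it reaches a subprocess.&lt;/p&gt;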

&lt;h2&gt;
  
  
  The real lesson: boring beats autonomous
&lt;/h2&gt;

&lt;p&gt;The strongest pattern in these life-ops setups is almost annoying in how unglamorous it is.&lt;/p&gt;

&lt;p&gt;The winner is not one dazzling autonomous agent.&lt;/p&gt;

&lt;p&gt;It’s a stack of narrow workers doing tiny jobs reliably, with one operator in the middle and a task board keeping everyone honest.&lt;/p&gt;

&lt;p&gt;That has two immediate implications for developers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Treat long-running agents like ops systems, not chat sessions&lt;/li&gt;
&lt;li&gt;Pick a compute model that can tolerate constant low-grade traffic&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If your setup includes Plex, Home Assistant, archive.org, Discord, Telegram, RSS, and all the weird admin tasks that pile up around real life, I’d optimize in this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;State hygiene&lt;/li&gt;
&lt;li&gt;Session isolation&lt;/li&gt;
&lt;li&gt;Predictable compute&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything else comes after that.&lt;/p&gt;

&lt;p&gt;Because the dream is not an agent that feels magical for one weekend.&lt;/p&gt;

&lt;p&gt;It’s an agent that quietly handles boring work for months without wrecking your task state or making you afraid to check your API bill.&lt;/p&gt;

&lt;p&gt;And if you’re already building this kind of thing with OpenAI-compatible tooling, n8n, Make, Zapier, OpenClaw, or custom workers, this is exactly where Standard Compute fits: flat-rate AI compute for always-on agent systems that do lots of small legitimate work all day.&lt;/p&gt;

&lt;p&gt;That’s a much better foundation than pretending your background loops are free.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>automation</category>
      <category>devops</category>
    </item>
    <item>
      <title>My bot kept double-posting and the real bug wasn’t GPT-5</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Fri, 26 Jun 2026 20:56:46 +0000</pubDate>
      <link>https://dev.to/lars_winstand/my-bot-kept-double-posting-and-the-real-bug-wasnt-gpt-5-2apf</link>
      <guid>https://dev.to/lars_winstand/my-bot-kept-double-posting-and-the-real-bug-wasnt-gpt-5-2apf</guid>
      <description>&lt;p&gt;If your agent heartbeat looks healthy but your Telegram or Discord bot still double-posts, the usual culprit is not GPT-5 or Claude failing.&lt;/p&gt;

&lt;p&gt;It’s usually a boring distributed-systems bug:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request times out at 30s&lt;/li&gt;
&lt;li&gt;work actually succeeds at 51.7s&lt;/li&gt;
&lt;li&gt;retry fires&lt;/li&gt;
&lt;li&gt;same side effect happens twice&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I ran into this pattern while reading an r/openclaw thread where someone described the exact failure mode in one line:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;every time the timeout happened, the original message did go through after 50s, AND the retry goes through, so I end up w double messages.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That sentence explains a huge percentage of “my AI bot is flaky” bugs.&lt;/p&gt;

&lt;p&gt;Not model instability. Not prompt weirdness. Not GPT-5 being moody.&lt;/p&gt;

&lt;p&gt;Just unsafe retries.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bug shape
&lt;/h2&gt;

&lt;p&gt;Here’s the typical flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your agent calls GPT-5, Claude Opus, or Qwen&lt;/li&gt;
&lt;li&gt;Inference takes longer than expected&lt;/li&gt;
&lt;li&gt;Your workflow sends the result to Telegram or Discord&lt;/li&gt;
&lt;li&gt;The client times out before it gets the response&lt;/li&gt;
&lt;li&gt;The send actually succeeds anyway&lt;/li&gt;
&lt;li&gt;Your retry posts the same message again&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;From that OpenClaw thread, the numbers were the giveaway:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;gateway timeout after 30000ms&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;message.action 51702ms&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the caller gave up at 30 seconds, but the action appears to have completed at 51.702 seconds.&lt;/p&gt;

&lt;p&gt;So the retry wasn’t crazy. It was doing exactly what the system told it to do.&lt;/p&gt;

&lt;p&gt;The problem is that retries are only safe when the operation is idempotent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The rule: retries are fine, side effects are the dangerous part
&lt;/h2&gt;

&lt;p&gt;Retrying compute is usually good.&lt;/p&gt;

&lt;p&gt;Retrying outbound side effects without dedup is how you get duplicate Telegram messages, duplicate Discord posts, duplicate emails, duplicate tickets, and eventually duplicate customer pain.&lt;/p&gt;

&lt;p&gt;This is the distinction I wish more agent builders made:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;What actually happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Retry model call&lt;/td&gt;
&lt;td&gt;Usually safe if you can tolerate another inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retry webhook or message send&lt;/td&gt;
&lt;td&gt;Dangerous if the first request may have already succeeded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retry side effect with idempotency key&lt;/td&gt;
&lt;td&gt;Safe because duplicate attempts resolve to the same operation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A lot of AI reliability bugs are really just distributed systems bugs wearing an LLM costume.&lt;/p&gt;

&lt;h2&gt;
  
  
  What idempotency actually means
&lt;/h2&gt;

&lt;p&gt;The cleanest explanation still comes from Stripe.&lt;/p&gt;

&lt;p&gt;You send a POST request with an &lt;code&gt;Idempotency-Key&lt;/code&gt; header. Stripe stores the first result for that key and returns the same status code and body on retries.&lt;/p&gt;

&lt;p&gt;That means the client no longer has to guess whether the first request succeeded.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.stripe.com/v1/customers &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-u&lt;/span&gt; sk_test_...: &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Idempotency-Key: KG5LxwFBepaKHyUD"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nv"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"My First Test Customer"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That pattern should be normal for agent side effects too.&lt;/p&gt;

&lt;p&gt;If you’re sending to Telegram Bot API, Discord webhooks, Slack, email, or any external channel, every outbound action should have an operation identity.&lt;/p&gt;

&lt;p&gt;If the API doesn’t support native idempotency, build your own dedup ledger.&lt;/p&gt;
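&lt;p&gt;A minimal sketch of what such a ledger could look like, assuming an in-memory &lt;code&gt;Map&lt;/code&gt; and two hypothetical method names, &lt;code&gt;claim&lt;/code&gt; and &lt;code&gt;complete&lt;/code&gt;. It mirrors the store-the-first-result idea, nothing more. A real ledger needs a durable store and an atomic claim.&lt;/p&gt;

```javascript
// Hypothetical in-memory dedup ledger (illustration only, not a library API).
function createLedger() {
  const entries = new Map();
  return {
    // Returns the stored entry if this operation was seen before.
    // Otherwise records it as pending and returns null, meaning "go ahead and send".
    claim(opId) {
      const existing = entries.get(opId);
      if (existing) return existing;
      entries.set(opId, { status: "pending" });
      return null;
    },
    // Stores the first successful result; later calls for the same key do not overwrite it.
    complete(opId, result) {
      const entry = entries.get(opId);
      if (entry?.status === "sent") return entry;
      const stored = { status: "sent", result };
      entries.set(opId, stored);
      return stored;
    },
  };
}

const ledger = createLedger();
const first = ledger.claim("conv1:turn7:telegram"); // null: caller should send
ledger.complete("conv1:turn7:telegram", { messageId: 42 });
const retry = ledger.claim("conv1:turn7:telegram"); // stored entry: caller should not send again
```

&lt;p&gt;The caller only sends when &lt;code&gt;claim&lt;/code&gt; returns &lt;code&gt;null&lt;/code&gt;. Anything else means the operation already has a history worth reading first.&lt;/p&gt;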

&lt;h2&gt;
  
  
  Why agent frameworks make this worse
&lt;/h2&gt;

&lt;p&gt;Because they’re trying to help.&lt;/p&gt;

&lt;p&gt;Temporal retries Activities by default. That’s a good design. But if your Activity includes “post this message to Discord” and that operation isn’t idempotent, retries will happily create duplicates.&lt;/p&gt;

&lt;p&gt;n8n has the same trap with friendlier UI.&lt;/p&gt;

&lt;p&gt;You can turn on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Retry On Fail&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Wait Between Tries&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;error workflows&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;execution.retryOf&lt;/code&gt; for debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All useful.&lt;/p&gt;

&lt;p&gt;None of that makes a Telegram send safe by itself.&lt;/p&gt;

&lt;p&gt;Retry features are not dedup features.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real-world failure mode with Discord
&lt;/h2&gt;

&lt;p&gt;Discord rate limits make this even messier.&lt;/p&gt;

&lt;p&gt;Their limits are dynamic, and the docs tell you to read headers like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;X-RateLimit-Limit&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;X-RateLimit-Remaining&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;X-RateLimit-Reset&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;X-RateLimit-Reset-After&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;X-RateLimit-Bucket&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now combine that with a slow LLM call.&lt;/p&gt;

&lt;p&gt;Say GPT-5 takes 40 seconds because your context window is bloated. Your bot finally sends to Discord. Discord responds with a rate limit or the client times out. Your code treats all of these the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;timeout&lt;/li&gt;
&lt;li&gt;429&lt;/li&gt;
&lt;li&gt;unknown delivery state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it retries immediately.&lt;/p&gt;

&lt;p&gt;That’s how you get tickets like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Discord is randomly duplicating messages”&lt;/li&gt;
&lt;li&gt;“OpenAI must be unstable”&lt;/li&gt;
&lt;li&gt;“My bot posts twice when the model is slow”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No. Your system failed to separate compute retries from side-effect retries.&lt;/p&gt;
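&lt;p&gt;To make that separation concrete, here is a rough sketch of a classifier that refuses to lump those outcomes together. The &lt;code&gt;attempt&lt;/code&gt; shape, the function name, the action labels, and the 1-second fallback wait are all assumptions for illustration.&lt;/p&gt;

```javascript
// Hypothetical helper: map one send attempt to a ledger state and a next action.
// Assumed input shape: { timedOut: boolean, status?: number, retryAfterSeconds?: number }
function classifySendAttempt(attempt) {
  if (attempt.timedOut) {
    // No response arrived, so the message may or may not have been delivered.
    return { state: "failed_unknown", action: "verify_before_retry" };
  }
  if (attempt.status === 429) {
    // The provider rejected the request; wait as instructed before sending again.
    // The 1-second fallback is an arbitrary assumption.
    return {
      state: "failed_confirmed",
      action: "wait_then_retry",
      waitSeconds: attempt.retryAfterSeconds ?? 1,
    };
  }
  if (Math.floor(attempt.status / 100) === 2) {
    return { state: "sent", action: "done" };
  }
  // Any other HTTP status is treated here as a confirmed failure.
  return { state: "failed_confirmed", action: "report_error" };
}
```

&lt;p&gt;The point is that only the timeout branch lands in &lt;code&gt;failed_unknown&lt;/code&gt;, and that branch never retries immediately.&lt;/p&gt;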

&lt;h2&gt;
  
  
  The practical fix
&lt;/h2&gt;

&lt;p&gt;The best fix I saw in that Reddit discussion was also the least glamorous:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I built a Discord bot that kept double-posting under timeout. Logs were useless until I added a crude dedup key... My timeouts came from the LLM taking 40s+ for long context, so I set a 90s gateway timeout and handled inflight state explicitly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s the playbook.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern I’d use every time
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create an operation ID before sending&lt;/li&gt;
&lt;li&gt;Store inflight state&lt;/li&gt;
&lt;li&gt;Use a timeout budget that matches reality&lt;/li&gt;
&lt;li&gt;On retry, check the ledger first&lt;/li&gt;
&lt;li&gt;Treat &lt;code&gt;429&lt;/code&gt; separately from an ambiguous timeout&lt;/li&gt;
&lt;li&gt;Record provider response details&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A decent operation ID looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conversation_id + turn_id + channel + message_hash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A decent state model looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pending
sent
failed_unknown
failed_confirmed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Minimal Node example: dedup around a Discord send
&lt;/h2&gt;

&lt;p&gt;Here’s a stripped-down example in Node.js.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;node:crypto&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;fetch&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;node-fetch&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ledger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;makeOperationId&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;conversationId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;turnId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;conversationId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;turnId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;sendDiscordMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;webhookUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;conversationId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;turnId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;opId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;makeOperationId&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;conversationId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;turnId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;discord&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ledger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;opId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
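  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;deduped&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messageId&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// Inflight or ambiguous: delivery is unknown, so do not blindly post again. The caller should verify first.&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;failed_unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;deduped&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;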

  &lt;span class="nx"&gt;ledger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;opId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;updatedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AbortController&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mi"&gt;90000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;webhookUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
      &lt;span class="na"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nf"&gt;clearTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;retryAfter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-ratelimit-reset-after&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;ledger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;opId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;failed_confirmed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rate_limited&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;retryAfter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;updatedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Discord rate limited. retry_after=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;retryAfter&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;ledger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;opId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;failed_confirmed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`http_&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;updatedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Discord send failed: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;ledger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;opId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="c1"&gt;// Placeholder ID: a webhook POST returns no message body unless the URL includes ?wait=true.&lt;/span&gt;
      &lt;span class="na"&gt;messageId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`discord:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;updatedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;deduped&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;clearTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AbortError&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;ledger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;opId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;failed_unknown&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;timeout_ambiguous&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;updatedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example is intentionally simple, but the important behavior is there:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;operation ID is created before send&lt;/li&gt;
&lt;li&gt;send state is recorded&lt;/li&gt;
&lt;li&gt;timeout is explicit&lt;/li&gt;
&lt;li&gt;ambiguous timeout is not treated like confirmed failure&lt;/li&gt;
&lt;li&gt;retries can consult the ledger before posting again&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In production, that ledger should live in Redis, Postgres, DynamoDB, or whatever durable store you already trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  A better retry decision tree
&lt;/h2&gt;

&lt;p&gt;This is the decision tree I want in every bot codebase:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Did the model call fail?
  -&amp;gt; retry compute if appropriate

Did the outbound send fail with confirmed no-delivery?
  -&amp;gt; retry send

Did the outbound send time out and delivery is unknown?
  -&amp;gt; check ledger / provider state before retrying

Did the outbound send already succeed for this operation ID?
  -&amp;gt; return existing result, do not post again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
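&lt;p&gt;The send-side branches of that tree fit in one small function. This is a sketch under assumptions: the function name and the returned step labels are made up for illustration, and the status names are the ones from the state model earlier in this post.&lt;/p&gt;

```javascript
// Hypothetical helper: decide the next step from the ledger entry for an operation ID.
// `entry` is undefined when the operation has never been attempted.
function nextSendStep(entry) {
  if (!entry) return "send";
  switch (entry.status) {
    case "sent":
      return "return_existing"; // already delivered: do not post again
    case "failed_confirmed":
      return "retry_send"; // provider confirmed no delivery
    case "pending":
    case "failed_unknown":
      return "verify_first"; // delivery unknown: check ledger or provider state
    default:
      throw new Error(`unknown status: ${entry.status}`);
  }
}
```

&lt;p&gt;Model-call retries stay outside this function on purpose. They are a separate decision.&lt;/p&gt;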



&lt;p&gt;That one distinction cleans up a lot of chaos.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I’d wire this in n8n
&lt;/h2&gt;

&lt;p&gt;If I were fixing this in n8n tomorrow, I’d do three things first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Increase timeout budgets above known long-context inference times.
2. Generate a dedup key for every outbound message action.
3. Log retry lineage with execution.retryOf plus your own operation ID.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A practical n8n pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a Code node to generate &lt;code&gt;operationId&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Check Redis/Postgres before the Telegram or Discord node&lt;/li&gt;
&lt;li&gt;If already sent, short-circuit the workflow&lt;/li&gt;
&lt;li&gt;If not sent, mark &lt;code&gt;pending&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Send message&lt;/li&gt;
&lt;li&gt;Mark &lt;code&gt;sent&lt;/code&gt; with provider response details&lt;/li&gt;
&lt;li&gt;On timeout or ambiguous error, mark &lt;code&gt;failed_unknown&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
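&lt;p&gt;The first step could look roughly like this. It is plain JavaScript so it runs anywhere; inside a Code node you would call it with the incoming items (for example &lt;code&gt;$input.all()&lt;/code&gt;) and return the result. The field names &lt;code&gt;conversationId&lt;/code&gt;, &lt;code&gt;turnId&lt;/code&gt;, &lt;code&gt;channel&lt;/code&gt;, and &lt;code&gt;text&lt;/code&gt; are assumptions about your payload, and the hash is a small non-cryptographic stand-in picked so the sketch needs no imports.&lt;/p&gt;

```javascript
// Small FNV-1a style hash over code points (non-cryptographic, illustration only).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (const ch of text) {
    hash ^= ch.codePointAt(0);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Adds a deterministic operationId to every item.
// Items use the { json: {...} } shape that n8n passes between nodes.
function addOperationIds(items) {
  return items.map((item) => {
    const { conversationId, turnId, channel, text } = item.json;
    const operationId = `${conversationId}:${turnId}:${channel}:${fnv1a(text)}`;
    return { json: { ...item.json, operationId } };
  });
}
```

&lt;p&gt;Because the ID is derived only from the message identity, a retried execution produces the same key and the ledger lookup can catch it.&lt;/p&gt;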

&lt;p&gt;That’s a lot more useful than staring at a green heartbeat and blaming Claude.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I’d wire this in Temporal
&lt;/h2&gt;

&lt;p&gt;In Temporal, I’d keep LLM calls and outbound side effects separate.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Put inference in an Activity with retries&lt;/li&gt;
&lt;li&gt;Put message delivery in another Activity&lt;/li&gt;
&lt;li&gt;Make the delivery Activity idempotent&lt;/li&gt;
&lt;li&gt;Use an operation ID as part of the Activity input&lt;/li&gt;
&lt;li&gt;Persist send results somewhere durable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mistake is putting “generate + send” in one retrying Activity and hoping the retries behave nicely.&lt;/p&gt;

&lt;p&gt;They won’t.&lt;/p&gt;
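&lt;p&gt;Here is a library-free sketch of why the split matters. This is not Temporal SDK code: &lt;code&gt;withRetries&lt;/code&gt; is a hypothetical stand-in for a retry policy, the array stands in for the external channel, and the &lt;code&gt;Map&lt;/code&gt; stands in for durable storage.&lt;/p&gt;

```javascript
// Runs fn up to maxAttempts times, like a generic retry policy (not the Temporal API).
function withRetries(fn, maxAttempts) {
  let lastError;
  for (let attempt = 1; maxAttempts >= attempt; attempt++) {
    try {
      return fn(attempt);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

const posted = []; // stands in for the external channel
const sentOps = new Map(); // stands in for a durable ledger

// Delivery step: idempotent because it checks the operation ID before posting.
function deliver(opId, text) {
  if (sentOps.has(opId)) return sentOps.get(opId);
  posted.push(text);
  const result = { index: posted.length - 1 };
  sentOps.set(opId, result);
  return result;
}

// Simulates a delivery whose first attempt posts, then "times out" for the caller.
function flakyDeliver(opId, text, attempt) {
  const result = deliver(opId, text);
  if (attempt === 1) throw new Error("timeout after post");
  return result;
}

// Inference step: retried freely, no side effects.
const reply = withRetries(() => "generated reply", 3);
// Delivery step: retried separately, keyed by operation ID.
withRetries((attempt) => flakyDeliver("conv1:turn1:discord", reply, attempt), 3);
```

&lt;p&gt;The delivery function runs twice, but the channel only receives one message. Remove the &lt;code&gt;sentOps&lt;/code&gt; check and the same retry posts twice.&lt;/p&gt;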

&lt;h2&gt;
  
  
  Sometimes the model really is slow
&lt;/h2&gt;

&lt;p&gt;To be fair, sometimes the model is part of the problem.&lt;/p&gt;

&lt;p&gt;OpenAI, Anthropic, local Qwen, local Llama, whatever you’re using—any of them can get slow under long context, load, memory pressure, or provider throttling.&lt;/p&gt;

&lt;p&gt;Idempotency won’t make inference faster.&lt;/p&gt;

&lt;p&gt;What it does do is stop your workflow from turning slow inference into duplicate side effects.&lt;/p&gt;

&lt;p&gt;That matters even more when you’re running agents at scale.&lt;/p&gt;

&lt;p&gt;If you’re using a setup with predictable flat-rate AI access instead of per-token billing, you’re usually more willing to let agents run, retry, and handle bigger workloads. That’s great for throughput. It also means you need better retry hygiene, because aggressive automation amplifies bad side-effect handling fast.&lt;/p&gt;

&lt;p&gt;That’s one reason I like what Standard Compute is doing: it removes the per-token paranoia that makes teams under-build automations, but it also makes the engineering tradeoff more obvious. Once compute is cheap and predictable, workflow correctness becomes the bottleneck.&lt;/p&gt;

&lt;p&gt;And workflow correctness starts with not posting the same message twice.&lt;/p&gt;

&lt;h2&gt;
  
  
  The boring takeaway that actually fixes the bug
&lt;/h2&gt;

&lt;p&gt;If your bot talks to Telegram or Discord, treat every outbound message like a payment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;give it an identity&lt;/li&gt;
&lt;li&gt;assume retries will happen&lt;/li&gt;
&lt;li&gt;store delivery state&lt;/li&gt;
&lt;li&gt;distinguish confirmed failure from unknown outcome&lt;/li&gt;
&lt;li&gt;never confuse “I didn’t get a response” with “the action did not happen”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of the ugly “AI reliability” bugs I see are still old distributed-systems bugs.&lt;/p&gt;

&lt;p&gt;Honestly, that’s good news.&lt;/p&gt;

&lt;p&gt;Because you can fix those today.&lt;/p&gt;

&lt;p&gt;You do not need GPT-6 to stop your bot from double-posting.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>automation</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I thought I needed a better model for 10 agents, but I really needed a queue</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Fri, 26 Jun 2026 04:56:49 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-thought-i-needed-a-better-model-for-10-agents-but-i-really-needed-a-queue-2nnk</link>
      <guid>https://dev.to/lars_winstand/i-thought-i-needed-a-better-model-for-10-agents-but-i-really-needed-a-queue-2nnk</guid>
      <description>&lt;p&gt;If you’re running 10+ agents at once, the bottleneck usually isn’t model quality.&lt;/p&gt;

&lt;p&gt;It’s shared execution capacity.&lt;/p&gt;

&lt;p&gt;Org-level API limits. Browser/runtime contention. Chat-style subscriptions that look fine at 2 conversations and start getting weird at 6-8.&lt;/p&gt;

&lt;p&gt;The fix is usually boring: queueing, worker isolation, retries, and explicit concurrency control.&lt;/p&gt;

&lt;p&gt;I keep seeing teams ask for the &lt;em&gt;best model for agents&lt;/em&gt; when their setup starts failing in a very specific way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one agent pauses mid-task&lt;/li&gt;
&lt;li&gt;one thread keeps going while another goes silent&lt;/li&gt;
&lt;li&gt;a Telegram topic looks dead until you send &lt;code&gt;?&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;then it suddenly wakes up and continues like nothing happened&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That does &lt;strong&gt;not&lt;/strong&gt; look like a model-quality problem.&lt;/p&gt;

&lt;p&gt;That looks like a scheduling problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reddit thread that explains the failure mode perfectly
&lt;/h2&gt;

&lt;p&gt;I ran across a thread on r/openclaw that described this better than most polished architecture posts do:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://reddit.com/r/openclaw/comments/1ufd864/how_to_run_10_agents_at_the_same_time_while/" rel="noopener noreferrer"&gt;https://reddit.com/r/openclaw/comments/1ufd864/how_to_run_10_agents_at_the_same_time_while/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The setup was very concrete:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 topic threads&lt;/li&gt;
&lt;li&gt;one per app&lt;/li&gt;
&lt;li&gt;running through OpenClaw&lt;/li&gt;
&lt;li&gt;inside a Telegram supergroup&lt;/li&gt;
&lt;li&gt;on a 16 GB Hetzner VPS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the symptom was painfully familiar:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;As my number of simultaneous conversations increases, I've noticed that sometimes the agent just stops responding entirely in some topics. It won't continue until I send another message (even just a '?'), after which it suddenly picks the conversation back up.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That bug tells you a lot.&lt;/p&gt;

&lt;p&gt;Most people see it and blame the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;maybe Claude got flaky&lt;/li&gt;
&lt;li&gt;maybe GPT-5 is overloaded&lt;/li&gt;
&lt;li&gt;maybe Qwen would be better&lt;/li&gt;
&lt;li&gt;maybe Llama would behave differently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My take: most of the time, that diagnosis is wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stall is the clue
&lt;/h2&gt;

&lt;p&gt;What makes this interesting is that the visible failure looks like “the model stopped thinking.”&lt;/p&gt;

&lt;p&gt;But usually the deeper problem is that too many things are sharing one bottleneck.&lt;/p&gt;

&lt;p&gt;A commenter in that same thread said this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you're using Claude CLI (ie max sub), you're basically limited to ~6-8 concurrent agents working at the same time. More will stall each other/wait for others to finish.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I wouldn’t treat &lt;code&gt;6-8&lt;/code&gt; as some universal law.&lt;/p&gt;

&lt;p&gt;But I absolutely believe the pattern.&lt;/p&gt;

&lt;p&gt;Chat subscriptions are built for humans opening a few conversations.&lt;/p&gt;

&lt;p&gt;They are not execution systems.&lt;/p&gt;

&lt;p&gt;Once you move into real parallelism, the question stops being:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;what’s the best model for agents?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;what exactly is sharing capacity with what?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s where most agent stacks fall apart.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is actually being shared?
&lt;/h2&gt;

&lt;p&gt;Usually it’s not one thing. It’s three.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Provider-side rate limits
&lt;/h2&gt;

&lt;p&gt;OpenAI rate limits are enforced at the organization and project level, not per chat window. Some model families also share limits.&lt;/p&gt;

&lt;p&gt;That matters a lot more than people expect.&lt;/p&gt;

&lt;p&gt;If Agent A is hammering GPT-5.4 and Agent B is quietly summarizing logs, those requests can still interfere with each other if they draw from the same org-level bucket.&lt;/p&gt;

&lt;p&gt;From the outside, it looks random.&lt;/p&gt;

&lt;p&gt;From the inside, it’s just shared quota.&lt;/p&gt;

&lt;p&gt;A simple example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Agent 1 is doing heavy extraction&lt;/span&gt;
&lt;span class="c"&gt;# Agent 2 is doing tiny summaries&lt;/span&gt;
&lt;span class="c"&gt;# Both still hit the same org/project limits&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don’t have backpressure, one noisy worker can make the rest of the system look flaky.&lt;/p&gt;
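
&lt;p&gt;To make the shared-quota effect concrete, here is a minimal sketch. It assumes a fixed-window request budget shared by every agent in one project. The &lt;code&gt;SharedBudget&lt;/code&gt; class and its numbers are hypothetical, not any provider’s real limiter; real limits also count tokens and use their own windows.&lt;/p&gt;

```typescript
// Hypothetical sketch: one request budget shared by every agent in a project.
class SharedBudget {
  private used = 0;
  private windowStart = 0;
  private readonly limitPerWindow: number;
  private readonly windowMs: number;

  constructor(limitPerWindow: number, windowMs: number) {
    this.limitPerWindow = limitPerWindow;
    this.windowMs = windowMs;
  }

  // Returns true if the request may go out now, false if the caller should wait.
  tryAcquire(nowMs: number): boolean {
    if (nowMs - this.windowStart >= this.windowMs) {
      // A new window has started, so the budget refills.
      this.windowStart = nowMs;
      this.used = 0;
    }
    if (this.used >= this.limitPerWindow) return false;
    this.used += 1;
    return true;
  }
}

const budget = new SharedBudget(3, 60_000);

// Agent 1 (heavy extraction) spends the whole budget early in the window...
const heavy = [0, 1, 2].map((t) => budget.tryAcquire(t));

// ...so Agent 2's tiny summary is refused in the same window.
const light = budget.tryAcquire(3);

// Once the window rolls over, requests are admitted again.
const afterReset = budget.tryAcquire(60_000);
```

&lt;p&gt;Here &lt;code&gt;light&lt;/code&gt; comes back &lt;code&gt;false&lt;/code&gt; even though Agent 2 did almost nothing. With a limiter like this in front of your workers, the refusal is visible and you can delay the request, instead of experiencing it as a mystery stall.&lt;/p&gt;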

&lt;h2&gt;
  
  
  2. Local runtime contention
&lt;/h2&gt;

&lt;p&gt;The Reddit replies also pointed at the other obvious culprit: the machine itself.&lt;/p&gt;

&lt;p&gt;If you’re running OpenClaw with shared Chromium state, long transcripts, tool calls, and multiple active sessions on a 16 GB VPS, you do not need a provider outage to get stalls.&lt;/p&gt;

&lt;p&gt;You just need enough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory pressure&lt;/li&gt;
&lt;li&gt;event loop contention&lt;/li&gt;
&lt;li&gt;I/O wait&lt;/li&gt;
&lt;li&gt;browser state bloat&lt;/li&gt;
&lt;li&gt;session overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One commenter asked the right question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Is every topic a new session? I find the only reason my agents stop is because memory overhead has been reach. Especially on VPS.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s not glamorous, but it’s probably closer to the truth than “the model got confused.”&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Chat-session architecture
&lt;/h2&gt;

&lt;p&gt;This is the sneaky one.&lt;/p&gt;

&lt;p&gt;A chat subscription &lt;em&gt;feels&lt;/em&gt; like an execution environment because you can open lots of threads.&lt;/p&gt;

&lt;p&gt;But visible threads are not the same thing as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a queue&lt;/li&gt;
&lt;li&gt;worker pools&lt;/li&gt;
&lt;li&gt;retry policies&lt;/li&gt;
&lt;li&gt;dead-letter handling&lt;/li&gt;
&lt;li&gt;admission control&lt;/li&gt;
&lt;li&gt;explicit concurrency caps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 2 conversations, the difference barely matters.&lt;/p&gt;

&lt;p&gt;At 12, it matters a lot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why n8n hits the same wall
&lt;/h2&gt;

&lt;p&gt;This is not just an OpenClaw problem.&lt;/p&gt;

&lt;p&gt;It’s an architecture problem.&lt;/p&gt;

&lt;p&gt;n8n says it pretty clearly in the docs: if you allow too many concurrent executions in regular mode, you can thrash the event loop and make the instance unresponsive.&lt;/p&gt;

&lt;p&gt;That sentence is refreshingly unsexy, and also exactly correct.&lt;/p&gt;

&lt;p&gt;What happens in practice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;one workflow gets busy&lt;/li&gt;
&lt;li&gt;another webhook comes in&lt;/li&gt;
&lt;li&gt;then another&lt;/li&gt;
&lt;li&gt;CPU and memory get noisy&lt;/li&gt;
&lt;li&gt;the event loop gets hammered&lt;/li&gt;
&lt;li&gt;suddenly “AI is unreliable”&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;Your scheduler is unreliable.&lt;/p&gt;

&lt;p&gt;n8n’s answer was not “switch to a smarter model.”&lt;/p&gt;

&lt;p&gt;It was concurrency control and queue mode.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;N8N_CONCURRENCY_PRODUCTION_LIMIT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



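&lt;p&gt;That one env var tells you a lot: in regular mode it caps how many production executions run at once, and anything over the cap waits its turn.&lt;/p&gt;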

&lt;p&gt;Mature workflow systems assume there must be an admission gate.&lt;/p&gt;

&lt;p&gt;Because if everything can run immediately, eventually nothing runs well.&lt;/p&gt;
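
&lt;p&gt;An admission gate does not need much code. This is a minimal sketch of the idea, assuming a single Node.js process; &lt;code&gt;AdmissionGate&lt;/code&gt; is a hypothetical name, not an n8n or OpenClaw API.&lt;/p&gt;

```typescript
// Hypothetical admission gate: at most `limit` tasks run at once;
// the rest wait in FIFO order until a slot is handed to them.
class AdmissionGate {
  private running = 0;
  private readonly waiting: (() => void)[] = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  async run(task: () => unknown) {
    if (this.running >= this.limit) {
      // Park until a finishing task passes its slot to us.
      await new Promise((resolve) => this.waiting.push(() => resolve(undefined)));
    } else {
      this.running += 1;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) next(); // the slot goes straight to the next waiter
      else this.running -= 1; // nobody waiting, so free the slot
    }
  }
}

// Five jobs arrive at once, but only two may execute at a time.
const gate = new AdmissionGate(2);
let active = 0;
let peak = 0;

const job = async () => {
  active += 1;
  peak = Math.max(peak, active);
  await new Promise((resolve) => setTimeout(resolve, 10)); // stand-in for a model call
  active -= 1;
};

const allDone = Promise.all([1, 2, 3, 4, 5].map(() => gate.run(job)));
```

&lt;p&gt;When &lt;code&gt;allDone&lt;/code&gt; settles, &lt;code&gt;peak&lt;/code&gt; is 2, not 5. The other three jobs were accepted immediately but did not compete for resources until a slot opened.&lt;/p&gt;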

&lt;h2&gt;
  
  
  The architectural shift: chat threads vs queued work
&lt;/h2&gt;

&lt;p&gt;The clean break is the queue.&lt;/p&gt;

&lt;p&gt;In n8n queue mode:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the main instance accepts triggers and webhooks&lt;/li&gt;
&lt;li&gt;Redis stores pending executions&lt;/li&gt;
&lt;li&gt;worker instances pull jobs when capacity is available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a completely different model from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I opened 10 Telegram conversations and hoped OpenClaw, Chromium, Claude, and my VPS would sort it out.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The config makes the difference obvious:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;EXECUTIONS_MODE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;queue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run workers with explicit concurrency:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;n8n worker &lt;span class="nt"&gt;--concurrency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s boring infrastructure.&lt;/p&gt;

&lt;p&gt;Which is exactly why it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;What happens under load&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Chat subscription workflow&lt;/td&gt;
&lt;td&gt;Shared interactive-session limits; weak control over queueing and retries. Simple for 1-2 conversations, but starts stalling under parallel agent load.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Direct API workflow&lt;/td&gt;
&lt;td&gt;Explicit requests-per-minute (RPM) and tokens-per-minute (TPM) limits at the org/project level; you can add queues, workers, retries, and backpressure, but token costs rise with usage.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n8n regular mode vs queue mode&lt;/td&gt;
&lt;td&gt;Regular mode can become unresponsive under high concurrency; queue mode separates intake from execution using Redis and workers.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That middle row is where a lot of teams have their “oh” moment.&lt;/p&gt;

&lt;p&gt;They think they’re shopping for intelligence.&lt;/p&gt;

&lt;p&gt;They’re actually shopping for throughput discipline.&lt;/p&gt;

&lt;h2&gt;
  
  
  The annoying part: the API architecture is better, but the bill can get ugly
&lt;/h2&gt;

&lt;p&gt;This is where things get real.&lt;/p&gt;

&lt;p&gt;Per-token pricing feels fine when you’re testing one agent in a notebook.&lt;/p&gt;

&lt;p&gt;It feels very different once you fix concurrency and your workers are actually running all day.&lt;/p&gt;

&lt;p&gt;That’s the trap.&lt;/p&gt;

&lt;p&gt;You finally build the system correctly, and now your token bill starts acting like a second outage.&lt;/p&gt;

&lt;p&gt;So the decision stops being just:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which model is smartest?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what gives me quality?&lt;/li&gt;
&lt;li&gt;what gives me stable throughput?&lt;/li&gt;
&lt;li&gt;what gives me predictable cost?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why this category is getting interesting.&lt;/p&gt;

&lt;p&gt;A lot of teams want API-style control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI-compatible endpoints&lt;/li&gt;
&lt;li&gt;real queues and workers&lt;/li&gt;
&lt;li&gt;retries and backpressure&lt;/li&gt;
&lt;li&gt;existing SDK support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But they do &lt;strong&gt;not&lt;/strong&gt; want per-token anxiety every time they add more automations.&lt;/p&gt;

&lt;p&gt;That’s exactly the gap Standard Compute is aiming at.&lt;/p&gt;

&lt;p&gt;It gives you an OpenAI-compatible API for agent and automation workloads, but with flat monthly pricing instead of metered token billing.&lt;/p&gt;

&lt;p&gt;So you can build the architecture you actually want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API-based execution&lt;/li&gt;
&lt;li&gt;explicit concurrency control&lt;/li&gt;
&lt;li&gt;long-running automations&lt;/li&gt;
&lt;li&gt;predictable cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters a lot if you’re running agents in n8n, Make, Zapier, OpenClaw, or custom worker systems and you’re tired of choosing between flaky chat subscriptions and scary token bills.&lt;/p&gt;

&lt;p&gt;More here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://standardcompute.com" rel="noopener noreferrer"&gt;https://standardcompute.com&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would do for 10+ agents
&lt;/h2&gt;

&lt;p&gt;If you need real concurrency, here’s the setup I’d reach for.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Separate intake from execution
&lt;/h2&gt;

&lt;p&gt;Do not let incoming work immediately compete with currently running work.&lt;/p&gt;

&lt;p&gt;Use a queue.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;n8n queue mode&lt;/li&gt;
&lt;li&gt;BullMQ&lt;/li&gt;
&lt;li&gt;Celery&lt;/li&gt;
&lt;li&gt;SQS&lt;/li&gt;
&lt;li&gt;RabbitMQ&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example with BullMQ:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Worker&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bullmq&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;agents&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;localhost&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6379&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;run-agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;agent-7&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;summarize support tickets&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;worker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Worker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;agents&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nx"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// call model API here&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`running &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;localhost&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6379&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is simple: intake should be cheap, execution should be bounded.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Put hard caps on concurrency
&lt;/h2&gt;

&lt;p&gt;Not vibes. Numbers.&lt;/p&gt;

&lt;p&gt;If your box can safely run 8 workers, set 8.&lt;/p&gt;

&lt;p&gt;If your provider quota supports 20 active requests with headroom, cap at 20.&lt;/p&gt;

&lt;p&gt;Examples: a cap for n8n regular mode, a per-worker cap in n8n queue mode, and a constant for a custom worker system:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;N8N_CONCURRENCY_PRODUCTION_LIMIT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;n8n worker &lt;span class="nt"&gt;--concurrency&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MAX_PARALLEL_AGENTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal is not “maximum possible parallelism.”&lt;/p&gt;

&lt;p&gt;The goal is stable throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Isolate heavy sessions
&lt;/h2&gt;

&lt;p&gt;Not every agent belongs in the same lane.&lt;/p&gt;

&lt;p&gt;A scraping agent opening 40 tabs in Chromium should not share execution capacity with a tiny summarizer that just needs a few API calls.&lt;/p&gt;

&lt;p&gt;Split workloads by resource profile:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;browser-heavy&lt;/li&gt;
&lt;li&gt;memory-heavy&lt;/li&gt;
&lt;li&gt;long-context&lt;/li&gt;
&lt;li&gt;lightweight text transforms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That alone fixes a lot of “random” instability.&lt;/p&gt;
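
&lt;p&gt;One way to picture separate lanes is a cap per resource profile. This is a rough sketch under assumed numbers; the profile names mirror the list above, while &lt;code&gt;laneCaps&lt;/code&gt;, &lt;code&gt;tryStart&lt;/code&gt;, and &lt;code&gt;finish&lt;/code&gt; are hypothetical helpers.&lt;/p&gt;

```typescript
type Profile = "browser-heavy" | "memory-heavy" | "long-context" | "lightweight";

// Hypothetical per-lane caps; tune them to what your box and quota can sustain.
const laneCaps: { [P in Profile]: number } = {
  "browser-heavy": 1,
  "memory-heavy": 2,
  "long-context": 2,
  "lightweight": 8,
};

// How many jobs are currently running in each lane.
const laneActive: { [P in Profile]: number } = {
  "browser-heavy": 0,
  "memory-heavy": 0,
  "long-context": 0,
  "lightweight": 0,
};

// Admit a job only if its own lane has room; other lanes are unaffected.
function tryStart(profile: Profile): boolean {
  if (laneActive[profile] >= laneCaps[profile]) return false;
  laneActive[profile] += 1;
  return true;
}

// Release the slot when the job ends, never dropping below zero.
function finish(profile: Profile): void {
  laneActive[profile] = Math.max(0, laneActive[profile] - 1);
}

const firstScraper = tryStart("browser-heavy");  // takes the only browser slot
const secondScraper = tryStart("browser-heavy"); // refused: browser lane is full
const summarizer = tryStart("lightweight");      // still admitted: different lane

finish("browser-heavy");
const retriedScraper = tryStart("browser-heavy"); // admitted once the slot is free
```

&lt;p&gt;The second scraper has to wait, but the summarizer does not. A full browser lane no longer looks like a stalled system to every other agent.&lt;/p&gt;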

&lt;h2&gt;
  
  
  4. Treat retries as a first-class feature
&lt;/h2&gt;

&lt;p&gt;If an agent only resumes after you send &lt;code&gt;?&lt;/code&gt;, you already have a retry system.&lt;/p&gt;

&lt;p&gt;It’s just a bad one, because the retry operator is a human.&lt;/p&gt;

&lt;p&gt;Build explicit handling for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;timeouts&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;stuck executions&lt;/li&gt;
&lt;li&gt;dead-letter queues&lt;/li&gt;
&lt;li&gt;idempotency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A rough pseudo-pattern:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;runWithRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;task&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is infinitely better than hoping a Telegram poke wakes the agent back up.&lt;/p&gt;
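
&lt;p&gt;Retries only help if a stuck call eventually fails, so timeouts belong next to them. Here is a minimal sketch, assuming plain promises and Node.js timers; &lt;code&gt;withTimeout&lt;/code&gt; is a hypothetical helper, and it only stops waiting for the work rather than cancelling it.&lt;/p&gt;

```typescript
// Hypothetical helper: reject if `work` has not settled within `ms` milliseconds.
// Note: this stops waiting; it does not cancel the underlying work.
async function withTimeout(work: () => unknown, ms: number) {
  let cancel = () => {};
  const deadline = new Promise((_resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    cancel = () => clearTimeout(timer);
  });
  try {
    // Whichever settles first wins: the work itself or the deadline.
    return await Promise.race([work(), deadline]);
  } finally {
    cancel(); // do not leave the timer pending once the race is over
  }
}

// A promise that never settles stands in for a stalled agent turn.
const never = () => new Promise(() => {});

const stalled = withTimeout(never, 20).then(
  () => "finished",
  (err) => (err as Error).message,
);

// Work that completes in time passes through unchanged.
const quick = withTimeout(() => 42, 20);
```

&lt;p&gt;The stalled call turns into an ordinary error after 20 ms, which is exactly what a retry loop or a dead-letter queue needs in order to act on it.&lt;/p&gt;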

&lt;h2&gt;
  
  
  5. Measure the real bottleneck
&lt;/h2&gt;

&lt;p&gt;If you don’t know what saturates first, you’ll keep blaming the model.&lt;/p&gt;

&lt;p&gt;At minimum, track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;queue depth&lt;/li&gt;
&lt;li&gt;worker utilization&lt;/li&gt;
&lt;li&gt;provider RPM/TPM errors&lt;/li&gt;
&lt;li&gt;memory usage&lt;/li&gt;
&lt;li&gt;CPU load&lt;/li&gt;
&lt;li&gt;transcript length&lt;/li&gt;
&lt;li&gt;browser/session count&lt;/li&gt;
&lt;li&gt;retry rate&lt;/li&gt;
&lt;li&gt;stuck job count&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If queue depth is climbing while workers are pinned, that’s a worker-capacity problem.&lt;/p&gt;

&lt;p&gt;If workers are idle but requests are failing, that’s probably provider-side limits.&lt;/p&gt;

&lt;p&gt;If memory spikes correlate with browser-heavy tasks, that’s local contention.&lt;/p&gt;

&lt;p&gt;This stuff is diagnosable if you instrument it.&lt;/p&gt;
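
&lt;p&gt;Those three rules of thumb are simple enough to write down as code. This is a sketch only: the &lt;code&gt;Snapshot&lt;/code&gt; fields and every threshold are assumptions for illustration, and you would feed it from whatever metrics you already collect.&lt;/p&gt;

```typescript
type Snapshot = {
  queueDepthRising: boolean; // queue depth trending up over the sample window
  workerUtilization: number; // 0..1, share of worker slots in use
  providerErrorRate: number; // 0..1, share of requests rejected by the provider
  memoryUsedRatio: number;   // 0..1, memory in use over memory available
};

// Illustrative thresholds only; pick real values from your own baseline.
const PINNED = 0.9;
const IDLE = 0.5;
const FAILING = 0.05;
const MEMORY_HOT = 0.9;

function likelyBottleneck(s: Snapshot): string {
  // Queue climbing while workers are pinned: not enough worker capacity.
  if (s.queueDepthRising) {
    if (s.workerUtilization >= PINNED) return "worker capacity";
  }
  // Workers idle but requests failing: probably provider-side limits.
  if (IDLE > s.workerUtilization) {
    if (s.providerErrorRate >= FAILING) return "provider-side limits";
  }
  // Memory near the ceiling: contention on the local machine.
  if (s.memoryUsedRatio >= MEMORY_HOT) return "local contention";
  return "unclear, keep measuring";
}

const verdict = likelyBottleneck({
  queueDepthRising: true,
  workerUtilization: 0.97,
  providerErrorRate: 0.0,
  memoryUsedRatio: 0.6,
});
console.log(verdict);
```

&lt;p&gt;For that snapshot the function returns &lt;code&gt;worker capacity&lt;/code&gt;. The value is less in the function than in being forced to collect the four numbers it needs.&lt;/p&gt;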

&lt;h2&gt;
  
  
  Do subscriptions still have a place?
&lt;/h2&gt;

&lt;p&gt;Definitely.&lt;/p&gt;

&lt;p&gt;If you’re one person running one or two long-lived chats, Claude Max or ChatGPT can be great.&lt;/p&gt;

&lt;p&gt;That’s real value.&lt;/p&gt;

&lt;p&gt;But the breakpoint arrives earlier than people think.&lt;/p&gt;

&lt;p&gt;Once you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;parallelism&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;isolation&lt;/li&gt;
&lt;li&gt;predictable throughput&lt;/li&gt;
&lt;li&gt;cost control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…you’re no longer doing chat.&lt;/p&gt;

&lt;p&gt;You’re doing distributed work.&lt;/p&gt;

&lt;p&gt;Even if the UI still happens to be Telegram, Discord, or a browser tab.&lt;/p&gt;

&lt;p&gt;And distributed work punishes wishful thinking.&lt;/p&gt;

&lt;h2&gt;
  
  
  The uncomfortable truth: a worse model can win
&lt;/h2&gt;

&lt;p&gt;This is the part people hate hearing.&lt;/p&gt;

&lt;p&gt;A slightly worse model running behind a clean queue with stable workers will often beat a better model trapped inside a shared, stall-prone chat setup.&lt;/p&gt;

&lt;p&gt;Not on benchmark screenshots.&lt;/p&gt;

&lt;p&gt;On actual throughput.&lt;/p&gt;

&lt;p&gt;On actual reliability.&lt;/p&gt;

&lt;p&gt;On actual unattended automation.&lt;/p&gt;

&lt;p&gt;That’s the real lesson from the OpenClaw thread.&lt;/p&gt;

&lt;p&gt;The user did not primarily have a “which model is smartest?” problem.&lt;/p&gt;

&lt;p&gt;They had a concurrency architecture problem wearing a model-shaped mask.&lt;/p&gt;

&lt;p&gt;Once you see that, a lot of agent weirdness gets easier to debug.&lt;/p&gt;

&lt;p&gt;If your 10th agent makes your 3rd one freeze, stop shopping for magic prompts.&lt;/p&gt;

&lt;p&gt;Stop rotating between Claude, GPT-5, Qwen, and Llama hoping one of them will rescue a blocked queue.&lt;/p&gt;

&lt;p&gt;Build the queue first.&lt;/p&gt;

&lt;p&gt;Then pick the model.&lt;/p&gt;

&lt;p&gt;And if you want API-style control without token-billing anxiety, that’s the whole pitch behind Standard Compute: OpenAI-compatible API access for agent workloads, flat monthly pricing, and no need to babysit every token while your automations run.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>n8n</category>
      <category>devops</category>
    </item>
    <item>
      <title>This OpenClaw 6.10 thread got 50 comments and the weird part is everyone is arguing about boring fixes</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Thu, 25 Jun 2026 20:56:19 +0000</pubDate>
      <link>https://dev.to/lars_winstand/this-openclaw-610-thread-got-50-comments-and-the-weird-part-is-everyone-is-arguing-about-boring-3p09</link>
      <guid>https://dev.to/lars_winstand/this-openclaw-610-thread-got-50-comments-and-the-weird-part-is-everyone-is-arguing-about-boring-3p09</guid>
      <description>&lt;p&gt;A post on &lt;a href="https://reddit.com/r/openclaw/comments/1ueyfzq/openclaw_610/" rel="noopener noreferrer"&gt;r/openclaw about OpenClaw 6.10&lt;/a&gt; pulled in &lt;strong&gt;22 upvotes and 50 comments&lt;/strong&gt; over a release that, on paper, looks tiny.&lt;/p&gt;

&lt;p&gt;No big autonomy demo. No “your agent now runs your life” feature. Just &lt;strong&gt;12 merged PRs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And yet the thread got heated.&lt;/p&gt;

&lt;p&gt;That usually means one thing: the release touched the stuff that actually breaks production workflows.&lt;/p&gt;

&lt;p&gt;If you run OpenClaw across &lt;strong&gt;Slack&lt;/strong&gt;, &lt;strong&gt;Discord&lt;/strong&gt;, &lt;strong&gt;Telegram&lt;/strong&gt;, &lt;strong&gt;WhatsApp&lt;/strong&gt;, or internal automation surfaces, this kind of release matters more than the flashy ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why people cared about a "boring" release
&lt;/h2&gt;

&lt;p&gt;One commenter framed it well:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“6.10 only has 12 merged PRs, so this is more of a targeted cleanup than a big feature drop:)”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s exactly why it got attention.&lt;/p&gt;

&lt;p&gt;OpenClaw 6.10 is small by PR count, but not small by impact. The changes hit failure-prone paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;automatic &lt;strong&gt;fast mode&lt;/strong&gt; for short turns&lt;/li&gt;
&lt;li&gt;routing metadata fixes for &lt;strong&gt;Zai&lt;/strong&gt; and &lt;strong&gt;GLM&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;cleanup for stale &lt;strong&gt;session/channel origin&lt;/strong&gt; state&lt;/li&gt;
&lt;li&gt;keeping &lt;strong&gt;cron delivery&lt;/strong&gt; attached to the correct session&lt;/li&gt;
&lt;li&gt;preserving &lt;strong&gt;trusted tool policies&lt;/strong&gt; in composed hook registries&lt;/li&gt;
&lt;li&gt;updated provider plugin onboarding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That list is dry. It’s also the difference between “my agent works” and “my agent silently did the wrong thing three hours ago.”&lt;/p&gt;

&lt;h2&gt;
  
  
  The real theme: state correctness
&lt;/h2&gt;

&lt;p&gt;Reading both the release notes and the Reddit thread, the pattern is obvious.&lt;/p&gt;

&lt;p&gt;This release is about &lt;strong&gt;state correctness&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not smarter models.&lt;br&gt;
Not longer memory.&lt;br&gt;
Not more autonomy.&lt;/p&gt;

&lt;p&gt;Just making sure OpenClaw carries the &lt;strong&gt;right&lt;/strong&gt; state across retries, fallbacks, channel switches, and scheduled deliveries.&lt;/p&gt;

&lt;p&gt;That sounds boring until you’ve debugged one of these bugs.&lt;/p&gt;
&lt;h3&gt;
  
  
  The fixes all rhyme
&lt;/h3&gt;

&lt;p&gt;Here’s the shape of the release:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fast-mode state should survive &lt;strong&gt;retries&lt;/strong&gt;, &lt;strong&gt;fallbacks&lt;/strong&gt;, and &lt;strong&gt;progress events&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;channel switches should reset stale &lt;strong&gt;origin fields&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;cron delivery should stay attached to the &lt;strong&gt;target session&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;composed hook registries should preserve &lt;strong&gt;trusted tool policies&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;provider routing should behave under &lt;strong&gt;live-discovered models&lt;/strong&gt; and &lt;strong&gt;overload conditions&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not random cleanup. That’s one engineering priority repeated in different places:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;stop carrying the wrong state across boundaries&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you’ve ever had a bot reply into the wrong thread, use the wrong context, or trigger a tool under the wrong policy, you know why this matters.&lt;/p&gt;
&lt;h2&gt;
  
  
  The kind of bug 6.10 is trying to prevent
&lt;/h2&gt;

&lt;p&gt;This is the class of issue I think 6.10 is aimed at:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. User talks to OpenClaw in Discord
2. Session metadata is updated
3. A retry or fallback path fires
4. A cron job wakes up later
5. Delivery lands in the wrong session or with stale origin data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing crashes.&lt;br&gt;
Nothing obvious fails.&lt;br&gt;
The logs are noisy but plausible.&lt;/p&gt;

&lt;p&gt;And now your automation is “working” while doing the wrong thing.&lt;/p&gt;

&lt;p&gt;That’s worse than a hard failure.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why some users are still annoyed
&lt;/h2&gt;

&lt;p&gt;Because reliability releases get judged against history.&lt;/p&gt;

&lt;p&gt;Not just code history. User history.&lt;/p&gt;

&lt;p&gt;If someone has been burned by previous upgrades, they don’t care that this release is technically cleaner. They hear “please update again” and think “great, what subtle thing breaks next?”&lt;/p&gt;

&lt;p&gt;That showed up in the thread too.&lt;/p&gt;

&lt;p&gt;One of the most revealing comments wasn’t about provider failover at all. It was a user asking for the return of the &lt;strong&gt;chat export button&lt;/strong&gt; because chats expire and export was their backup plan.&lt;/p&gt;

&lt;p&gt;That tells you something important:&lt;/p&gt;

&lt;p&gt;For real users, reliability is not just uptime. It’s &lt;strong&gt;workflow continuity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If your team has to invent manual backup habits around missing UX or brittle state handling, trust is already thin.&lt;/p&gt;

&lt;p&gt;Another user mentioned using the &lt;strong&gt;lossless-claw&lt;/strong&gt; extension to store messages in local &lt;strong&gt;SQLite&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s clever. It’s also a signal. Communities build these patches when they don’t fully trust the default path.&lt;/p&gt;
&lt;h2&gt;
  
  
  Provider abstraction is easy to market and hard to operate
&lt;/h2&gt;

&lt;p&gt;OpenClaw’s value is partly that it can sit in front of multiple providers and models.&lt;/p&gt;

&lt;p&gt;That sounds great in a README. It’s much harder in production.&lt;/p&gt;

&lt;p&gt;Version 6.10 specifically improves things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zai&lt;/strong&gt; base URL handling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GLM&lt;/strong&gt; overload failover&lt;/li&gt;
&lt;li&gt;native &lt;strong&gt;reasoning-level&lt;/strong&gt; selection through the active runtime catalog&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters because “works with multiple providers” is not the same as “keeps working when one provider gets weird.”&lt;/p&gt;

&lt;p&gt;Anybody can claim model-agnostic routing across &lt;strong&gt;OpenAI&lt;/strong&gt;, &lt;strong&gt;Anthropic&lt;/strong&gt;, &lt;strong&gt;GLM&lt;/strong&gt;, or &lt;strong&gt;OpenRouter&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The hard part is surviving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dynamic model discovery&lt;/li&gt;
&lt;li&gt;metadata drift&lt;/li&gt;
&lt;li&gt;flaky provider responses&lt;/li&gt;
&lt;li&gt;overload conditions&lt;/li&gt;
&lt;li&gt;retries and fallback chains&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why I think this release got more attention than its PR count suggests.&lt;/p&gt;
&lt;h2&gt;
  
  
  The bigger problem: concurrency is often not OpenClaw’s fault
&lt;/h2&gt;

&lt;p&gt;A separate &lt;a href="https://reddit.com/r/openclaw/comments/1ufd864/how_to_run_10_agents_at_the_same_time_while/" rel="noopener noreferrer"&gt;r/openclaw thread about running 10+ agents at the same time&lt;/a&gt; had a useful comment:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Depends on your provider. If you're using Claude CLI (ie max sub), you're basically limited to ~6-8 concurrent agents working at the same time. More will stall each other/wait for others to finish.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s the other half of the story.&lt;/p&gt;

&lt;p&gt;Not every reliability issue is a gateway bug.&lt;/p&gt;

&lt;p&gt;Sometimes the bottleneck is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;provider-side concurrency limits&lt;/li&gt;
&lt;li&gt;weak tool-calling behavior in the underlying model&lt;/li&gt;
&lt;li&gt;local hardware constraints&lt;/li&gt;
&lt;li&gt;too many parallel agents for the stack you built&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So yes, OpenClaw 6.10 helps.&lt;/p&gt;

&lt;p&gt;But no, it does not repeal provider limits.&lt;/p&gt;
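&lt;p&gt;If the ceiling is on the provider side, one option is to cap in-flight work on your own side so agents queue instead of stalling each other. A rough sketch, assuming each agent run can be wrapped as an async function; &lt;code&gt;runWithLimit&lt;/code&gt; is a hypothetical helper, not an OpenClaw API, and &lt;code&gt;limit&lt;/code&gt; is assumed to be at least 1.&lt;/p&gt;

```javascript
// Hypothetical helper: run async tasks with at most `limit` in flight.
// `tasks` is an array of functions that each return a promise.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker pulls the next unclaimed task until none are left.
  async function worker() {
    while (tasks.length > next) {
      const index = next;
      next += 1;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

&lt;p&gt;With the ~6-8 figure from the comment above, that would mean something like &lt;code&gt;runWithLimit(agentRuns, 6)&lt;/code&gt;, where &lt;code&gt;agentRuns&lt;/code&gt; is your own list of wrapped runs.&lt;/p&gt;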

&lt;p&gt;This is exactly where teams start looking for a different API layer.&lt;/p&gt;

&lt;p&gt;If your automations run 24/7, token pricing and provider quirks become operational problems, not just billing details. That’s why I think a lot of developers eventually want an OpenAI-compatible layer that can absorb model routing, failover, and throughput management without making them babysit token spend.&lt;/p&gt;

&lt;p&gt;That’s basically the appeal of &lt;strong&gt;Standard Compute&lt;/strong&gt;: one predictable monthly price, OpenAI-compatible API, and dynamic routing across models like &lt;strong&gt;GPT-5.4&lt;/strong&gt;, &lt;strong&gt;Claude Opus 4.6&lt;/strong&gt;, and &lt;strong&gt;Grok 4.20&lt;/strong&gt; so your agents can keep running without per-token anxiety.&lt;/p&gt;

&lt;p&gt;If you’re already using OpenClaw as a gateway, that kind of backend matters more than another UI tweak.&lt;/p&gt;
&lt;h2&gt;
  
  
  Local box vs cloud backend
&lt;/h2&gt;

&lt;p&gt;The OpenClaw community still has a fun range of setups.&lt;/p&gt;

&lt;p&gt;Some people run it on old local hardware. Others use a &lt;strong&gt;Mac Mini&lt;/strong&gt;. Others use OpenClaw as the front door while cloud models do the heavy lifting.&lt;/p&gt;

&lt;p&gt;The tradeoffs are pretty straightforward:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;What you really get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw 6.10&lt;/td&gt;
&lt;td&gt;Better state handling, better failover behavior, fewer subtle routing mistakes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local OpenClaw on Raspberry Pi or Mac Mini&lt;/td&gt;
&lt;td&gt;More control and privacy, but you own hardware limits and maintenance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud OpenAI-compatible backend behind OpenClaw&lt;/td&gt;
&lt;td&gt;Less local maintenance, easier scaling for agents, but provider cost and throttling can become the real bottleneck&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;My take: &lt;strong&gt;6.10 matters most when OpenClaw is connected to cloud models and real automations&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s where state bugs get expensive fast.&lt;/p&gt;
&lt;h2&gt;
  
  
  Should you upgrade?
&lt;/h2&gt;

&lt;p&gt;If you use OpenClaw for real workflows, I’d say yes.&lt;/p&gt;

&lt;p&gt;Not because 6.10 is exciting.&lt;br&gt;
Because it targets exactly the bugs that are hardest to detect before they hurt you.&lt;/p&gt;

&lt;p&gt;I would care a lot more about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stale origin cleanup&lt;/li&gt;
&lt;li&gt;cron delivery binding&lt;/li&gt;
&lt;li&gt;trusted policy preservation&lt;/li&gt;
&lt;li&gt;provider failover behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;than I would about another demo-friendly “agentic” feature.&lt;/p&gt;
&lt;h2&gt;
  
  
  Practical upgrade checklist
&lt;/h2&gt;

&lt;p&gt;If you do upgrade, keep it boring.&lt;/p&gt;
&lt;h3&gt;
  
  
  1) Check your Node version
&lt;/h3&gt;

&lt;p&gt;OpenClaw docs recommend &lt;strong&gt;Node 24&lt;/strong&gt; or &lt;strong&gt;Node 22.19+&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node &lt;span class="nt"&gt;-v&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
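&lt;p&gt;If you would rather script that check than eyeball it, here is a small sketch. &lt;code&gt;meetsNodeFloor&lt;/code&gt; is a hypothetical helper, and it assumes any release line above 22 also satisfies the recommendation.&lt;/p&gt;

```javascript
// Returns true when `version` (for example "22.19.0") is Node 22.19 or newer.
// Assumption: any major version above 22 also counts as supported.
function meetsNodeFloor(version) {
  const [major, minor] = version.split(".").map(Number);
  if (major > 22) return true;
  if (major === 22) return minor >= 19;
  return false;
}

// process.versions.node is the running Node version without the leading "v".
if (!meetsNodeFloor(process.versions.node)) {
  console.warn("Node " + process.versions.node + " is below 22.19; upgrade Node first.");
}
```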



&lt;h3&gt;
  
  
  2) Upgrade cleanly
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3) Reinstall or verify the daemon
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4) Check status before trusting production workflows
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw status &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5) Run the gateway in the foreground if you want to inspect behavior
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw gateway &lt;span class="nt"&gt;--port&lt;/span&gt; 18789 &lt;span class="nt"&gt;--verbose&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I’d test after upgrading
&lt;/h2&gt;

&lt;p&gt;If I were running OpenClaw in production, I’d validate these paths explicitly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Retry after provider timeout
- Fallback from one provider to another
- Channel switch between Slack/Discord/Telegram
- Cron-triggered delivery into an existing session
- Tool execution under trusted policy rules
- Short-turn conversations that should trigger fast mode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And if you want to be disciplined, write a quick smoke-test checklist for your own setup.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# pseudo-checklist&lt;/span&gt;
&lt;span class="c"&gt;# 1. send message in channel A&lt;/span&gt;
&lt;span class="c"&gt;# 2. switch to channel B&lt;/span&gt;
&lt;span class="c"&gt;# 3. trigger fallback provider&lt;/span&gt;
&lt;span class="c"&gt;# 4. run scheduled task&lt;/span&gt;
&lt;span class="c"&gt;# 5. verify session IDs and delivery targets in logs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  My take
&lt;/h2&gt;

&lt;p&gt;The happy commenters are mostly right.&lt;/p&gt;

&lt;p&gt;OpenClaw 6.10 is not a breakthrough release. It’s a &lt;strong&gt;stability pass for the parts that break real automations&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That is more valuable than a lot of headline-chasing AI releases.&lt;/p&gt;

&lt;p&gt;The frustrated commenters are right too, though. Trust in infrastructure is cumulative. Once users start building SQLite-based retention workarounds and backup habits, they’re telling you the paper cuts have stacked up.&lt;/p&gt;

&lt;p&gt;So the thread was never really about 12 PRs.&lt;/p&gt;

&lt;p&gt;It was about whether OpenClaw is becoming the kind of gateway you can leave alone.&lt;/p&gt;

&lt;p&gt;6.10 doesn’t fully answer that.&lt;/p&gt;

&lt;p&gt;But it does point in the right direction: fewer flashy promises, more discipline around state, routing, and delivery.&lt;/p&gt;

&lt;p&gt;And if you’re building agent workflows on top of cloud models, I think that naturally leads to the next question:&lt;/p&gt;

&lt;p&gt;Do you also want your &lt;strong&gt;backend API layer&lt;/strong&gt; to be boring in the same way?&lt;/p&gt;

&lt;p&gt;Because for teams running automations nonstop, the real win is not just a stable gateway. It’s a stable gateway plus a predictable compute backend.&lt;/p&gt;

&lt;p&gt;That’s the part I think more developers are waking up to.&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>automation</category>
      <category>devops</category>
    </item>
    <item>
      <title>r/openclaw apologized for moderating too hard and honestly that tells you everything</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Thu, 25 Jun 2026 12:56:47 +0000</pubDate>
      <link>https://dev.to/lars_winstand/ropenclaw-apologized-for-moderating-too-hard-and-honestly-that-tells-you-everything-2579</link>
      <guid>https://dev.to/lars_winstand/ropenclaw-apologized-for-moderating-too-hard-and-honestly-that-tells-you-everything-2579</guid>
      <description>&lt;p&gt;A 41-upvote, 19-comment post on r/openclaw called &lt;a href="https://reddit.com/r/openclaw/comments/1uezqx6/were_sorry/" rel="noopener noreferrer"&gt;“We’re sorry”&lt;/a&gt; ended up being more interesting than most product announcements.&lt;/p&gt;

&lt;p&gt;It wasn’t a release post.&lt;/p&gt;

&lt;p&gt;It was a moderator admitting the subreddit had become too restrictive, removing word blocks, simplifying rules, and turning images and links back on.&lt;/p&gt;

&lt;p&gt;That sounds minor.&lt;/p&gt;

&lt;p&gt;It isn’t.&lt;/p&gt;

&lt;p&gt;For a project like OpenClaw, moderation policy is basically part of the support surface.&lt;/p&gt;

&lt;p&gt;And if you build or operate AI agents, that should feel very familiar.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual apology mattered
&lt;/h2&gt;

&lt;p&gt;The moderator wrote:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“It has come to my attention that we have been over moderating people’s behaviours and posts here. I’ve remove the word blocks and simplified the rules.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s not a PR-style apology.&lt;/p&gt;

&lt;p&gt;That’s a rollback.&lt;/p&gt;

&lt;p&gt;And the reason it got traction is simple: communities around operationally messy software cannot survive if people have to fight posting rules before they can ask for help.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters more for OpenClaw than for a normal AI app
&lt;/h2&gt;

&lt;p&gt;OpenClaw is not ChatGPT with a subreddit.&lt;/p&gt;

&lt;p&gt;It’s a self-hosted, local-first AI assistant gateway that connects agents to a long list of channels and providers.&lt;/p&gt;

&lt;p&gt;Think:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slack&lt;/li&gt;
&lt;li&gt;Discord&lt;/li&gt;
&lt;li&gt;Telegram&lt;/li&gt;
&lt;li&gt;WhatsApp&lt;/li&gt;
&lt;li&gt;Signal&lt;/li&gt;
&lt;li&gt;Matrix&lt;/li&gt;
&lt;li&gt;Microsoft Teams&lt;/li&gt;
&lt;li&gt;Google Chat&lt;/li&gt;
&lt;li&gt;iMessage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And on the model side, it can sit in front of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic&lt;/li&gt;
&lt;li&gt;OpenAI&lt;/li&gt;
&lt;li&gt;OpenRouter&lt;/li&gt;
&lt;li&gt;xAI&lt;/li&gt;
&lt;li&gt;Groq&lt;/li&gt;
&lt;li&gt;Ollama&lt;/li&gt;
&lt;li&gt;Gemini&lt;/li&gt;
&lt;li&gt;Bedrock&lt;/li&gt;
&lt;li&gt;LiteLLM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That kind of stack creates ugly failures.&lt;/p&gt;

&lt;p&gt;Not clean bug reports. Ugly ones.&lt;/p&gt;

&lt;p&gt;A browser automation step fails upstream, then a Slack handoff breaks downstream.&lt;br&gt;
A memory issue looks like a provider issue.&lt;br&gt;
A routing change makes Claude behave differently than GPT for the same task.&lt;br&gt;
A retry loop quietly burns money while you debug it.&lt;/p&gt;

&lt;p&gt;That’s why low-friction community support matters.&lt;/p&gt;
&lt;h2&gt;
  
  
  The setup alone tells you what kind of users this attracts
&lt;/h2&gt;

&lt;p&gt;OpenClaw’s docs recommend Node 24, with Node 22 LTS as the compatibility floor.&lt;/p&gt;

&lt;p&gt;Getting started looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest
openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
openclaw status &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last command is the tell.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;status --all&lt;/code&gt; is operator language.&lt;/p&gt;

&lt;p&gt;This is not a toy prompt app. This is infrastructure.&lt;/p&gt;

&lt;p&gt;And infrastructure communities need room for screenshots, logs, weird configs, and half-baked debugging theories.&lt;/p&gt;

&lt;h2&gt;
  
  
  The nearby threads were the real signal
&lt;/h2&gt;

&lt;p&gt;The apology post got attention because people had already felt the friction.&lt;/p&gt;

&lt;p&gt;A commenter said it plainly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I’m fairly new to this subreddit but over moderation destroys engagement.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s true in almost every technical support community, but especially true for agent tooling.&lt;/p&gt;

&lt;p&gt;Looking around the subreddit makes that obvious.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Heap crashes and daemon weirdness
&lt;/h3&gt;

&lt;p&gt;There was a thread about gateway crashes due to heap out of memory.&lt;/p&gt;

&lt;p&gt;One user said their OpenClaw gateway had been stable since February, then started crashing repeatedly in June after heavier usage.&lt;/p&gt;

&lt;p&gt;Another replied with the least glamorous and most useful advice possible: update OpenClaw, because newer versions had a lot of improvements.&lt;/p&gt;

&lt;p&gt;Messy? Yes.&lt;br&gt;
Useful? Also yes.&lt;/p&gt;

&lt;p&gt;That kind of exchange dies first when moderation gets too clean.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Browser automation failing in the real world
&lt;/h3&gt;

&lt;p&gt;Another thread was about browser integration.&lt;/p&gt;

&lt;p&gt;The complaint was blunt: OpenClaw was “really dumb in controlling the browser” for job application workflows.&lt;/p&gt;

&lt;p&gt;A reply pushed back with a take I think is correct: a lot of modern websites are terrible targets for agents and automation tools, so the failure is often not just the model.&lt;/p&gt;

&lt;p&gt;That is exactly the kind of practical thread you want preserved.&lt;/p&gt;

&lt;p&gt;Not polished docs. Not official messaging. Just operators comparing notes.&lt;/p&gt;
&lt;h3&gt;
  
  
  3. Release threads that sound like production incidents
&lt;/h3&gt;

&lt;p&gt;In the OpenClaw 6.10 release thread, the release itself sounded modest: 12 merged PRs, more cleanup than giant feature drop.&lt;/p&gt;

&lt;p&gt;But the comments were the real story.&lt;/p&gt;

&lt;p&gt;One user wrote:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Man... I’m so done with this broken, janky ass thing,”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Another wanted the chat export button back because chats were expiring too quickly.&lt;/p&gt;

&lt;p&gt;That’s not fan content.&lt;/p&gt;

&lt;p&gt;That’s an ops queue leaking into public.&lt;/p&gt;

&lt;p&gt;Which is exactly why the subreddit has to tolerate some mess.&lt;/p&gt;
&lt;h2&gt;
  
  
  This is not just a moderation story. It’s an agent operations story.
&lt;/h2&gt;

&lt;p&gt;If you run one toy bot, subreddit moderation is just vibes.&lt;/p&gt;

&lt;p&gt;If you run real automations across n8n, Make, Zapier, OpenClaw, or custom workflows, it becomes an operations issue.&lt;/p&gt;

&lt;p&gt;The moment agents are always on, your problems change.&lt;/p&gt;

&lt;p&gt;You stop asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which prompt is smartest?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And start asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why did this route fail?&lt;/li&gt;
&lt;li&gt;Why did the browser step loop?&lt;/li&gt;
&lt;li&gt;Why does Claude behave differently through OpenRouter than direct?&lt;/li&gt;
&lt;li&gt;Why is this retry storm suddenly expensive?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one matters more than people admit.&lt;/p&gt;
&lt;h2&gt;
  
  
  The hidden problem: support chaos turns into cost chaos
&lt;/h2&gt;

&lt;p&gt;Multi-model agent stacks create two kinds of chaos at once:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;support chaos&lt;/li&gt;
&lt;li&gt;cost chaos&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Support chaos is obvious. You get logs, screenshots, daemon issues, browser failures, memory bugs, and routing weirdness.&lt;/p&gt;

&lt;p&gt;Cost chaos is quieter.&lt;/p&gt;

&lt;p&gt;Every retry, fallback, and debugging cycle can hit Anthropic, OpenAI, xAI, Groq, or whatever router sits in front of them.&lt;/p&gt;

&lt;p&gt;So while you’re trying to fix the system, the system is also charging you for the privilege.&lt;/p&gt;

&lt;p&gt;That’s why I think per-token pricing gets worse as your automation stack gets more real.&lt;/p&gt;
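&lt;p&gt;The arithmetic is simple enough to sketch. Every number here is a made-up assumption for illustration, not a real provider price, and &lt;code&gt;estimateRetryCost&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```javascript
// Rough spend for one incident under per-token billing.
// Inputs are illustrative assumptions, not real provider prices.
function estimateRetryCost({ attempts, tokensPerAttempt, pricePerMillionTokens }) {
  const totalTokens = attempts * tokensPerAttempt;
  return (totalTokens / 1_000_000) * pricePerMillionTokens;
}

// Example: a loop that retried 40 times at 6,000 tokens per attempt,
// priced at an assumed 3 dollars per million tokens.
const incidentCost = estimateRetryCost({
  attempts: 40,
  tokensPerAttempt: 6000,
  pricePerMillionTokens: 3,
});
console.log(incidentCost.toFixed(2));
```

&lt;p&gt;One incident is cheap. The same loop firing across many workflows, every day, is where it stops being a rounding error.&lt;/p&gt;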

&lt;p&gt;If your team is already debugging Slack, WhatsApp, browser control, memory, and provider routing, you do not need a finance mini-game layered on top.&lt;/p&gt;

&lt;p&gt;You need predictable compute.&lt;/p&gt;
&lt;h2&gt;
  
  
  This is where the OpenAI-compatible layer matters
&lt;/h2&gt;

&lt;p&gt;A lot of teams eventually end up wanting the same thing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep OpenAI-compatible SDKs&lt;/li&gt;
&lt;li&gt;keep existing app logic&lt;/li&gt;
&lt;li&gt;stop thinking about per-token cost every hour&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the practical appeal of &lt;strong&gt;Standard Compute&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It gives you an OpenAI-compatible API endpoint, but with flat monthly pricing instead of usage-based token billing.&lt;/p&gt;

&lt;p&gt;So if you’re running agents in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;n8n&lt;/li&gt;
&lt;li&gt;Make&lt;/li&gt;
&lt;li&gt;Zapier&lt;/li&gt;
&lt;li&gt;OpenClaw&lt;/li&gt;
&lt;li&gt;custom agent frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can keep the integration style you already use, while avoiding the usual “why did debugging this workflow cost so much?” problem.&lt;/p&gt;

&lt;p&gt;The bigger your automation footprint gets, the more that matters.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example: swapping an OpenAI client without changing your app shape
&lt;/h2&gt;

&lt;p&gt;If your code already uses the OpenAI SDK shape, the integration pattern stays familiar.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;STANDARD_COMPUTE_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.standardcompute.com/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a helpful automation assistant.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Summarize the failed workflow run and suggest next steps.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That matters because nobody wants to rewrite their agent stack just to get predictable pricing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the OpenClaw moderation rollback got right
&lt;/h2&gt;

&lt;p&gt;I don’t think the lesson here is “moderate less” in the abstract.&lt;/p&gt;

&lt;p&gt;Spam is real.&lt;/p&gt;

&lt;p&gt;One commenter in the apology thread pointed out that automated Hermes spam had been bad before. That’s a legitimate problem.&lt;/p&gt;

&lt;p&gt;The moderator reply suggested they were still using Automod and manual review where needed.&lt;/p&gt;

&lt;p&gt;That’s the right direction.&lt;/p&gt;

&lt;p&gt;Not zero moderation.&lt;/p&gt;

&lt;p&gt;Targeted moderation.&lt;/p&gt;

&lt;p&gt;Here’s the tradeoff in plain terms:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Why it happens in agent stacks&lt;/th&gt;
&lt;th&gt;What actually helps&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Over-moderated support channels&lt;/td&gt;
&lt;td&gt;Real failures show up as ugly posts with logs, screenshots, and partial theories&lt;/td&gt;
&lt;td&gt;Allow images, links, and low-friction troubleshooting posts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-provider routing failures&lt;/td&gt;
&lt;td&gt;Behavior changes across Anthropic, OpenAI, OpenRouter, xAI, Groq, Ollama, and others are hard to isolate&lt;/td&gt;
&lt;td&gt;Practical peer support and visible routing/debug info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-token cost anxiety&lt;/td&gt;
&lt;td&gt;Retries, fallbacks, browser loops, and 24/7 agents turn incidents into spend events&lt;/td&gt;
&lt;td&gt;Flat monthly compute and OpenAI-compatible access&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;My take is simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;spam control is good, keyword paranoia is bad.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical takeaways if you run agent infrastructure
&lt;/h2&gt;

&lt;p&gt;This subreddit drama is actually useful because it surfaces a pattern that shows up everywhere.&lt;/p&gt;

&lt;p&gt;If you run AI agents in production, here’s what I’d do.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Treat community support as part of your stack
&lt;/h3&gt;

&lt;p&gt;If your tool is operationally messy, your users need places where they can post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;screenshots&lt;/li&gt;
&lt;li&gt;logs&lt;/li&gt;
&lt;li&gt;stack traces&lt;/li&gt;
&lt;li&gt;config snippets&lt;/li&gt;
&lt;li&gt;weird “is anyone else seeing this?” reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If those posts are hard to publish, support quality drops fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Optimize for debugging throughput, not tidiness
&lt;/h3&gt;

&lt;p&gt;The best support thread is often not elegant.&lt;/p&gt;

&lt;p&gt;It’s just fast.&lt;/p&gt;

&lt;p&gt;A rough post that gets a useful answer in 10 minutes beats a perfectly structured post that never gets written.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Reduce pricing complexity before scale makes it painful
&lt;/h3&gt;

&lt;p&gt;This one is underrated.&lt;/p&gt;

&lt;p&gt;The more agent routes, retries, and channels you add, the harder it gets to reason about spend.&lt;/p&gt;

&lt;p&gt;If you already know your workflows will run continuously, flat-rate compute is often the saner operational choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Keep your integration surface boring
&lt;/h3&gt;

&lt;p&gt;OpenAI-compatible APIs win because they avoid migration drama.&lt;/p&gt;

&lt;p&gt;Your pricing model can change.&lt;br&gt;
Your backend routing can change.&lt;br&gt;
Your provider mix can change.&lt;/p&gt;

&lt;p&gt;But your application code should stay boring.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part the commenters were right about
&lt;/h2&gt;

&lt;p&gt;The commenters were correct that over-moderation kills engagement.&lt;/p&gt;

&lt;p&gt;But I think the deeper point is this:&lt;/p&gt;

&lt;p&gt;OpenClaw is the kind of product where community health is directly tied to product usability.&lt;/p&gt;

&lt;p&gt;When the software has this many moving parts, the messy public threads are not a side effect.&lt;/p&gt;

&lt;p&gt;They are the knowledge base.&lt;/p&gt;

&lt;p&gt;And once you see it that way, the moderator apology reads differently.&lt;/p&gt;

&lt;p&gt;It wasn’t just “sorry, we were too strict.”&lt;/p&gt;

&lt;p&gt;It was an admission that the subreddit had started optimizing for tidiness over truth.&lt;/p&gt;

&lt;p&gt;For software like OpenClaw, that is the wrong trade.&lt;/p&gt;

&lt;p&gt;For teams running AI agents, there’s a parallel lesson:&lt;/p&gt;

&lt;p&gt;If your systems fail in messy ways, your support channels need low friction.&lt;br&gt;
And if your automations run continuously, your compute pricing should be built for continuous use too.&lt;/p&gt;

&lt;p&gt;A little mess is healthy.&lt;/p&gt;

&lt;p&gt;A little billing predictability is too.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>automation</category>
      <category>openai</category>
    </item>
    <item>
      <title>My n8n agent rewrote the same 7-task to-do list 4 times until I stopped asking for markdown</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Thu, 25 Jun 2026 04:57:29 +0000</pubDate>
      <link>https://dev.to/lars_winstand/my-n8n-agent-rewrote-the-same-7-task-to-do-list-4-times-until-i-stopped-asking-for-markdown-gg9</link>
      <guid>https://dev.to/lars_winstand/my-n8n-agent-rewrote-the-same-7-task-to-do-list-4-times-until-i-stopped-asking-for-markdown-gg9</guid>
      <description>&lt;h1&gt;
  
  
  My n8n agent rewrote the same 7-task to-do list 4 times until I stopped asking for markdown
&lt;/h1&gt;

&lt;p&gt;My n8n agent rewrote the same seven-item to-do list four times, dropped two checkboxes, and then marked the wrong task complete.&lt;/p&gt;

&lt;p&gt;The bug was not reasoning.&lt;/p&gt;

&lt;p&gt;The bug was markdown.&lt;/p&gt;

&lt;p&gt;I had done what a lot of us do on autopilot: ask the model for a nice markdown task list because it looks readable in logs and GitHub-style checkboxes feel familiar.&lt;/p&gt;

&lt;p&gt;That worked exactly once.&lt;/p&gt;

&lt;p&gt;Then retries happened.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one retry changed indentation&lt;/li&gt;
&lt;li&gt;another changed &lt;code&gt;- [ ]&lt;/code&gt; to &lt;code&gt;* [ ]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;a later pass merged two tasks into one line because it decided that was “cleaner”&lt;/li&gt;
&lt;li&gt;one run preserved the list visually but lost the actual task identity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To me, the output still looked fine.&lt;/p&gt;

&lt;p&gt;To the workflow, it was garbage.&lt;/p&gt;

&lt;p&gt;While digging into better formats, I found a thread on r/openclaw about markdown files for to-do management: &lt;a href="https://reddit.com/r/openclaw/comments/1ueo4mm/markdown_files_for_todo_list/" rel="noopener noreferrer"&gt;https://reddit.com/r/openclaw/comments/1ueo4mm/markdown_files_for_todo_list/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That discussion hit the exact issue: markdown feels universal right up until an agent has to update it repeatedly without breaking anything.&lt;/p&gt;

&lt;p&gt;The fix was boring and very effective:&lt;/p&gt;

&lt;p&gt;Stop optimizing for readability.&lt;br&gt;
Start optimizing for survival.&lt;/p&gt;

&lt;p&gt;Markdown is for humans.&lt;br&gt;
Schemas are for agents.&lt;/p&gt;
&lt;h2&gt;
  
  
  The actual failure mode
&lt;/h2&gt;

&lt;p&gt;A markdown to-do list looks structured.&lt;/p&gt;

&lt;p&gt;It is not actually structured.&lt;/p&gt;

&lt;p&gt;That distinction matters once GPT-5.4, Claude Opus 4.6, or gpt-4o-2024-08-06 has to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;read the list&lt;/li&gt;
&lt;li&gt;preserve it&lt;/li&gt;
&lt;li&gt;update one item&lt;/li&gt;
&lt;li&gt;return it&lt;/li&gt;
&lt;li&gt;do that again 10 times in a loop&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is where things drift.&lt;/p&gt;

&lt;p&gt;Common failure patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;checkbox syntax changes&lt;/li&gt;
&lt;li&gt;nesting changes&lt;/li&gt;
&lt;li&gt;task IDs disappear because markdown never had real IDs&lt;/li&gt;
&lt;li&gt;completed items move sections&lt;/li&gt;
&lt;li&gt;a parser treats one wrapped line as two tasks&lt;/li&gt;
&lt;li&gt;retries multiply because one malformed list poisons the next step&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last part is the expensive one.&lt;/p&gt;

&lt;p&gt;Flaky formatting creates retry storms.&lt;br&gt;
Retry storms turn a cheap-looking workflow into an operational headache.&lt;/p&gt;

&lt;p&gt;If you run automations all day, output reliability matters almost as much as model quality.&lt;/p&gt;

&lt;p&gt;That is also where predictable API pricing starts to matter. If your agents are retrying because of formatting drift, per-token billing gets annoying fast. Flat-cost usage is a lot easier to live with when workflows are noisy, iterative, and always on.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why GitHub task lists are a bad automation contract
&lt;/h2&gt;

&lt;p&gt;Because they are a rendering convention, not a durable machine interface.&lt;/p&gt;

&lt;p&gt;GitHub task lists are part of GitHub Flavored Markdown, not core CommonMark.&lt;/p&gt;

&lt;p&gt;That is fine if your goal is: render checkboxes on GitHub.&lt;/p&gt;

&lt;p&gt;It is not fine if your goal is: pass stable task state across n8n, Make, Zapier, OpenClaw, or a custom agent loop.&lt;/p&gt;

&lt;p&gt;A human sees this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Draft outreach email&lt;/li&gt;
&lt;li&gt;[x] Pull leads from Apollo&lt;/li&gt;
&lt;li&gt;[ ] Review CRM sync&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An agent sees a bunch of unanswered questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the dash required?&lt;/li&gt;
&lt;li&gt;Can ordered lists count too?&lt;/li&gt;
&lt;li&gt;Is nesting semantic or just visual?&lt;/li&gt;
&lt;li&gt;If the task text changes, is it still the same task?&lt;/li&gt;
&lt;li&gt;If one task wraps across lines, is that one task or two?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you start inventing your own answers, you are not really using markdown anymore.&lt;/p&gt;

&lt;p&gt;You are building a fragile custom protocol disguised as markdown.&lt;/p&gt;
&lt;h2&gt;
  
  
  I tried JSON mode next. Better, but still not enough
&lt;/h2&gt;

&lt;p&gt;The next obvious move was valid JSON.&lt;/p&gt;

&lt;p&gt;That helped.&lt;/p&gt;

&lt;p&gt;At least n8n stopped choking on checkboxes.&lt;/p&gt;

&lt;p&gt;But JSON mode is not the same as schema enforcement.&lt;/p&gt;

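&lt;p&gt;It guarantees that the output parses as JSON. It does not guarantee which keys show up or what types they hold, so the structure can still drift from run to run.&lt;/p&gt;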

&lt;p&gt;Typical failure cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;due_date&lt;/code&gt; comes back as &lt;code&gt;"tomorrow"&lt;/code&gt; instead of an ISO string&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;priority&lt;/code&gt; is &lt;code&gt;"high"&lt;/code&gt; in one run and &lt;code&gt;3&lt;/code&gt; in another&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;completed&lt;/code&gt; is missing entirely&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;subtasks&lt;/code&gt; is a string instead of an array&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a chat UI, whatever.&lt;/p&gt;

&lt;p&gt;For an agent loop, this is where you start writing cleanup code around every step.&lt;/p&gt;
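
&lt;p&gt;As a rough sketch of what that cleanup code ends up looking like, here is a hand-rolled checker for the failure cases listed above. The &lt;code&gt;findDrift&lt;/code&gt; helper and the sample payload are hypothetical, and the date check assumes a plain &lt;code&gt;YYYY-MM-DD&lt;/code&gt; format:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Assumption: dates are plain YYYY-MM-DD strings.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Hypothetical helper: list every way one task object drifted.
function findDrift(task) {
  const problems = [];
  if (typeof task.completed !== "boolean") {
    problems.push("completed is missing or not a boolean");
  }
  if (!Number.isInteger(task.priority)) {
    problems.push("priority is not an integer");
  }
  if (task.due_date !== null) {
    if (typeof task.due_date !== "string" || !ISO_DATE.test(task.due_date)) {
      problems.push("due_date is not an ISO date string");
    }
  }
  if (!Array.isArray(task.subtasks)) {
    problems.push("subtasks is not an array");
  }
  return problems;
}

// Made-up payload: it parses fine, because it is valid JSON.
const raw = '{"title":"Send report","priority":"high","due_date":"tomorrow","subtasks":"draft, review"}';
const drifted = JSON.parse(raw);
const problems = findDrift(drifted);

console.log(problems.length + " problems found");
for (const problem of problems) {
  console.log("- " + problem);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The payload passes &lt;code&gt;JSON.parse&lt;/code&gt; without complaint and still fails all four checks. Multiply that by every step in the loop and the cleanup code starts to outweigh the workflow.&lt;/p&gt;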

&lt;p&gt;Here is the kind of request that looks reasonable but still leaves too much room for drift:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-2024-08-06"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Return a JSON object with a task list."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Create tasks for launching a weekly customer report automation."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"response_format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"json_object"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is better than markdown.&lt;/p&gt;

&lt;p&gt;It is still not a contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  What fixed it: strict schema output
&lt;/h2&gt;

&lt;p&gt;What finally worked was using a strict schema.&lt;/p&gt;

&lt;p&gt;Once I stopped asking for “a nice task list” and started asking for “an object that must match this schema,” the workflow got boring in the best way.&lt;/p&gt;

&lt;p&gt;Here is a minimal task schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tasks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"array"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"completed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"boolean"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"minimum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"maximum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"due_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"null"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"completed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"due_date"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"additionalProperties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"tasks"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"additionalProperties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here is an OpenAI-compatible request shape that is a lot safer than markdown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-2024-08-06"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Generate a task list for an automation workflow."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Plan the steps for onboarding a new B2B customer into HubSpot, Slack, and Notion."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"response_format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"json_schema"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"json_schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
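      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"task_list"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"strict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;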
      &lt;/span&gt;&lt;span class="nl"&gt;"schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"tasks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"array"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"completed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"boolean"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"completed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"additionalProperties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"tasks"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"additionalProperties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“format this nicely”&lt;/li&gt;
&lt;li&gt;“return data my workflow can trust”&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A practical n8n example
&lt;/h2&gt;

&lt;p&gt;Here is the pattern that kept breaking for me:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Slack trigger receives a request&lt;/li&gt;
&lt;li&gt;HTTP Request node calls an LLM&lt;/li&gt;
&lt;li&gt;Code node parses the task list&lt;/li&gt;
&lt;li&gt;Notion or ClickUp tasks get created&lt;/li&gt;
&lt;li&gt;Another agent step updates state later&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If step 2 returns markdown, step 3 usually becomes regex hell.&lt;/p&gt;

&lt;p&gt;You end up doing stuff like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;-*&lt;/span&gt;&lt;span class="se"&gt;]\s\[[&lt;/span&gt;&lt;span class="sr"&gt; x&lt;/span&gt;&lt;span class="se"&gt;]\]&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`task-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;[x]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;-*&lt;/span&gt;&lt;span class="se"&gt;]\s\[[&lt;/span&gt;&lt;span class="sr"&gt; x&lt;/span&gt;&lt;span class="se"&gt;]\]\s&lt;/span&gt;&lt;span class="sr"&gt;*/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}));&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="nx"&gt;tasks&lt;/span&gt; &lt;span class="p"&gt;}];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



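&lt;p&gt;This works until the model emits an uppercase &lt;code&gt;[X]&lt;/code&gt;, an indented subtask, or a numbered list, none of which the regex matches.&lt;/p&gt;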

&lt;p&gt;Now compare that with schema-validated JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;json&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priority&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That version is boring.&lt;/p&gt;

&lt;p&gt;Boring is exactly what you want in automation.&lt;/p&gt;
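
&lt;p&gt;One practical detail: a chat-completions response usually carries the structured output as a JSON string in &lt;code&gt;choices[0].message.content&lt;/code&gt;, so something has to unwrap it first. Here is a minimal sketch of that step. The &lt;code&gt;toItems&lt;/code&gt; helper and the sample response are hypothetical, and the &lt;code&gt;refusal&lt;/code&gt; and &lt;code&gt;finish_reason&lt;/code&gt; checks assume your provider follows the OpenAI response shape:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch: turn a chat-completions-shaped response into n8n-style items.
// The response shape is assumed; check what your provider returns.
function toItems(completion) {
  const choice = completion.choices[0];

  // A truncated response can cut the JSON off mid-object.
  if (choice.finish_reason === "length") {
    throw new Error("Response was truncated; the JSON may be incomplete");
  }
  // Some APIs report a refusal instead of content.
  if (choice.message.refusal) {
    throw new Error("Model refused: " + choice.message.refusal);
  }

  const parsed = JSON.parse(choice.message.content);
  return parsed.tasks.map(function (task) {
    return { json: task };
  });
}

// Hypothetical response, trimmed to the fields used above.
const sampleResponse = {
  choices: [
    {
      finish_reason: "stop",
      message: {
        role: "assistant",
        refusal: null,
        content: JSON.stringify({
          tasks: [
            { id: "task-1", title: "Create HubSpot company record", completed: false, priority: 1 },
            { id: "task-2", title: "Invite customer to shared Slack channel", completed: false, priority: 2 }
          ]
        })
      }
    }
  ]
};

const items = toItems(sampleResponse);
console.log(items.length + " items, first id: " + items[0].json.id);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In an n8n Code node you would call something like &lt;code&gt;toItems($json)&lt;/code&gt;, assuming &lt;code&gt;$json&lt;/code&gt; holds the raw response body from the HTTP Request node. Failing loudly on truncation or refusal is deliberate: a thrown error is easier to route than a half-parsed task list.&lt;/p&gt;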

&lt;h2&gt;
  
  
  Example: calling an OpenAI-compatible API from Node.js
&lt;/h2&gt;

&lt;p&gt;If your provider supports OpenAI-compatible structured output, the code is straightforward.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_BASE_URL&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o-2024-08-06&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Generate a task list for an automation workflow.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Plan the steps for onboarding a new B2B customer into HubSpot, Slack, and Notion.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;response_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;json_schema&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;json_schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
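      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;task_list&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;strict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;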
      &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;array&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;boolean&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;integer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
              &lt;span class="p"&gt;},&lt;/span&gt;
              &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;title&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;priority&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
              &lt;span class="na"&gt;additionalProperties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tasks&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;additionalProperties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



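&lt;p&gt;If you are using a drop-in OpenAI-compatible endpoint, this pattern usually ports cleanly, provided the provider actually supports &lt;code&gt;response_format&lt;/code&gt; with &lt;code&gt;json_schema&lt;/code&gt;. Not every compatible endpoint does, so check before you depend on it.&lt;/p&gt;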

&lt;p&gt;That matters if you are testing multiple providers or routing across models without rewriting your app.&lt;/p&gt;

&lt;h2&gt;
  
  
  Same idea from curl
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.example.com/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$OPENAI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gpt-4o-2024-08-06",
    "messages": [
      {
        "role": "system",
        "content": "Generate a task list for an automation workflow."
      },
      {
        "role": "user",
        "content": "Plan the steps for onboarding a new B2B customer into HubSpot, Slack, and Notion."
      }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "task_list",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "tasks": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "id": { "type": "string" },
                  "title": { "type": "string" },
                  "completed": { "type": "boolean" },
                  "priority": { "type": "integer" }
                },
                "required": ["id", "title", "completed", "priority"],
                "additionalProperties": false
              }
            }
          },
          "required": ["tasks"],
          "additionalProperties": false
        }
      }
    }
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How this plays out in n8n, Make, Zapier, and OpenClaw
&lt;/h2&gt;

&lt;p&gt;The practical win is not elegance.&lt;/p&gt;

&lt;p&gt;It is fewer broken runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  In n8n
&lt;/h3&gt;

&lt;p&gt;A solid pattern looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slack Trigger or Webhook node&lt;/li&gt;
&lt;li&gt;HTTP Request node to an OpenAI-compatible endpoint&lt;/li&gt;
&lt;li&gt;Set or Code node to map &lt;code&gt;tasks[]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Notion, Linear, ClickUp, Airtable, or Postgres node downstream&lt;/li&gt;
&lt;li&gt;If node branches on &lt;code&gt;completed === false&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No regex parser.&lt;br&gt;
No markdown cleanup node.&lt;br&gt;
No weird branch because a checkbox changed shape.&lt;/p&gt;

&lt;h3&gt;
  
  
  In Make or Zapier
&lt;/h3&gt;

&lt;p&gt;The same rule applies, maybe even more strongly.&lt;/p&gt;

&lt;p&gt;The more no-code modules you chain together, the more painful loose formatting becomes. Every downstream module assumes the previous one handed over something stable.&lt;/p&gt;

&lt;p&gt;If your LLM output is “mostly parseable,” your scenario is already on borrowed time.&lt;/p&gt;

&lt;h3&gt;
  
  
  In OpenClaw or custom agent loops
&lt;/h3&gt;

&lt;p&gt;Repeated read-modify-write cycles are exactly where markdown breaks.&lt;/p&gt;

&lt;p&gt;If the same task list gets rewritten five or ten times, use stable IDs and schema validation. Otherwise you are effectively asking the model to preserve formatting conventions that were never designed to carry state.&lt;/p&gt;
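
&lt;p&gt;A minimal Python sketch of that idea, assuming the task shape from the schema above (&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;completed&lt;/code&gt;, &lt;code&gt;priority&lt;/code&gt;). The helper names &lt;code&gt;validate_task&lt;/code&gt; and &lt;code&gt;merge_tasks&lt;/code&gt; are hypothetical, and the hand-rolled check only stands in for a real JSON Schema validator:&lt;/p&gt;

```python
# Expected field types, mirroring the task schema used earlier in the article.
TASK_FIELDS = {"id": str, "title": str, "completed": bool, "priority": int}


def validate_task(task):
    """Return a list of problems; an empty list means the task is valid."""
    if not isinstance(task, dict):
        return ["task is not an object"]
    problems = []
    for name, expected in TASK_FIELDS.items():
        if name not in task:
            problems.append("missing field: " + name)
        elif type(task[name]) is not expected:
            # type() rather than isinstance() so True is not accepted as an integer.
            problems.append("wrong type for field: " + name)
    for name in task:
        if name not in TASK_FIELDS:
            problems.append("unexpected field: " + name)
    return problems


def merge_tasks(current, proposed):
    """Apply a model-proposed list onto the current list, keyed by stable id.

    Invalid tasks are rejected instead of repaired, and tasks the model
    silently dropped are kept, so a sloppy rewrite cannot erase state.
    """
    merged = {task["id"]: task for task in current}
    rejected = []
    for task in proposed:
        problems = validate_task(task)
        if problems:
            rejected.append((task, problems))
        else:
            merged[task["id"]] = task
    return list(merged.values()), rejected


current = [
    {"id": "t1", "title": "Create HubSpot company", "completed": False, "priority": 1},
    {"id": "t2", "title": "Invite to Slack", "completed": False, "priority": 2},
]
proposed = [
    {"id": "t1", "title": "Create HubSpot company", "completed": True, "priority": 1},
    {"id": "t3", "title": "Share Notion workspace", "completed": "no", "priority": 3},
]
tasks, rejected = merge_tasks(current, proposed)
```

&lt;p&gt;Keeping tasks the model left out is a design choice in this sketch, not a rule. If your workflow needs deletions, make them an explicit field instead of inferring them from absence.&lt;/p&gt;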

&lt;h2&gt;
  
  
  When to use markdown, JSON Schema, or a real task API
&lt;/h2&gt;

&lt;p&gt;Here is the rule I use now:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Best format&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Human-readable notes in logs or docs&lt;/td&gt;
&lt;td&gt;Markdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM output that feeds another workflow step&lt;/td&gt;
&lt;td&gt;JSON Schema&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared task state, reminders, filters, attachments&lt;/td&gt;
&lt;td&gt;Real task API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File-based simplicity with narrow syntax&lt;/td&gt;
&lt;td&gt;todo.txt&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If task state actually matters, skip text formats and use a real task system.&lt;/p&gt;

&lt;p&gt;Vikunja is a good example if you want an API-backed task manager instead of pretending a markdown file is a database.&lt;/p&gt;

&lt;p&gt;If you truly need plain files, &lt;code&gt;todo.txt&lt;/code&gt; is still better than markdown for automation because one line equals one task and the syntax is intentionally narrow.&lt;/p&gt;
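
&lt;p&gt;To show how little parsing that takes, here is a rough Python sketch. It assumes only the two most common todo.txt conventions, a leading &lt;code&gt;x &lt;/code&gt; for completed tasks and a leading &lt;code&gt;(A)&lt;/code&gt; style priority, and leaves dates, projects, and contexts inside the text. &lt;code&gt;parse_line&lt;/code&gt; is a name made up for this example:&lt;/p&gt;

```python
import re

# Simplified todo.txt line: optional "x " completion marker,
# optional "(A) " priority, then free text.
LINE_PATTERN = re.compile(r"^(x )?(?:\(([A-Z])\) )?(.+)$")


def parse_line(line):
    """Parse one todo.txt line into a dict, or return None for a blank line."""
    line = line.strip()
    if not line:
        return None
    done, priority, text = LINE_PATTERN.match(line).groups()
    return {"completed": done is not None, "priority": priority, "text": text}


sample = """(A) Create HubSpot company +onboarding
x Invite customer to Slack @slack
Share Notion workspace
"""
tasks = [task for task in map(parse_line, sample.splitlines()) if task]
```

&lt;p&gt;One regex and one line per task is the whole contract, which is why it holds up better than nested markdown checkboxes.&lt;/p&gt;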

&lt;h2&gt;
  
  
  The bigger lesson
&lt;/h2&gt;

&lt;p&gt;I did not fix my agent by making it smarter.&lt;/p&gt;

&lt;p&gt;I fixed it by giving it a format that was harder to improvise.&lt;/p&gt;

&lt;p&gt;A lot of “agent reasoning failures” are not reasoning failures at all.&lt;/p&gt;

&lt;p&gt;They are contract failures.&lt;/p&gt;

&lt;p&gt;The agent understands the work.&lt;br&gt;
The workflow does not trust the output.&lt;/p&gt;

&lt;p&gt;So my rule now is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use markdown for humans&lt;/li&gt;
&lt;li&gt;use JSON Schema for agents&lt;/li&gt;
&lt;li&gt;use a real task API when task state matters&lt;/li&gt;
&lt;li&gt;do not pretend GitHub task lists are a durable automation interface&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your agent keeps rewriting the to-do list, stop tuning the prompt for a minute.&lt;br&gt;
Stop blaming GPT-5.4 or Claude Opus 4.6.&lt;/p&gt;

&lt;p&gt;The problem might just be the contract you chose.&lt;/p&gt;

&lt;h2&gt;
  
  
  One more practical note on cost
&lt;/h2&gt;

&lt;p&gt;This kind of bug gets worse when agents run continuously.&lt;/p&gt;

&lt;p&gt;Every malformed output means retries, repair steps, validation passes, and extra calls. If you are paying per token, formatting mistakes turn into real spend surprisingly fast.&lt;/p&gt;

&lt;p&gt;That is one reason I like OpenAI-compatible infrastructure that is easy to swap under existing SDKs and workflows. If you are running n8n, Make, Zapier, OpenClaw, or custom agents all day, predictable flat-cost compute is a much nicer setup than watching token usage every time a workflow gets chatty.&lt;/p&gt;

&lt;p&gt;That tradeoff becomes very obvious once you stop thinking about one prompt and start thinking about 24/7 automation.&lt;/p&gt;

&lt;p&gt;If you have been shipping markdown between agent steps, I would change that before touching anything else.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>n8n</category>
      <category>automation</category>
      <category>openai</category>
    </item>
    <item>
      <title>I stopped trusting OpenClaw skills the day I realized some of them are basically npm packages with my credit card</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Wed, 24 Jun 2026 20:53:55 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-stopped-trusting-openclaw-skills-the-day-i-realized-some-of-them-are-basically-npm-packages-with-42ne</link>
      <guid>https://dev.to/lars_winstand/i-stopped-trusting-openclaw-skills-the-day-i-realized-some-of-them-are-basically-npm-packages-with-42ne</guid>
      <description>&lt;p&gt;The safest way to use a third-party OpenClaw skill is to treat it like automation code with money and data access, not like a harmless plugin.&lt;/p&gt;

&lt;p&gt;After reading reports that 5 malicious skills passed ClawScan and VirusTotal, my default changed fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;isolate credentials&lt;/li&gt;
&lt;li&gt;inspect outbound network calls&lt;/li&gt;
&lt;li&gt;pin exact versions&lt;/li&gt;
&lt;li&gt;prefer narrow in-house skills over random marketplace installs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I used to think an OpenClaw skill was basically a nicer Zapier step.&lt;/p&gt;

&lt;p&gt;A little messy, maybe under-documented, but still fundamentally a plugin.&lt;/p&gt;

&lt;p&gt;Then I spent an evening reading a thread on r/openclaw about Unit 42 finding 5 malicious skills that passed ClawScan and VirusTotal:&lt;br&gt;
&lt;a href="https://reddit.com/r/openclaw/comments/1ue5ln7/unit_42_found_5_malicious_skills_that_passed/" rel="noopener noreferrer"&gt;https://reddit.com/r/openclaw/comments/1ue5ln7/unit_42_found_5_malicious_skills_that_passed/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That broke the plugin mental model.&lt;/p&gt;

&lt;p&gt;Because the scary part was not just malware.&lt;/p&gt;

&lt;p&gt;It was motive.&lt;/p&gt;

&lt;p&gt;One example from that thread, money-radar, allegedly fetched a remote referrals.json file so it could change recommendations at runtime. That means the code can look clean during review while the skill quietly nudges your agent toward somebody else’s affiliate payout.&lt;/p&gt;

&lt;p&gt;No payload. No obvious exploit. Just incentives wired into your automation.&lt;/p&gt;

&lt;p&gt;A commenter in that thread said it better than most security posts:&lt;/p&gt;

&lt;p&gt;“Signature scanning does nothing here. A skill that tells your agent to always use a referral link isn't a payload anyone flags. It's just instructions. The Pass badge means nothing.”&lt;/p&gt;

&lt;p&gt;That was the moment I stopped thinking “plugin” and started thinking:&lt;/p&gt;

&lt;p&gt;npm package with bank access.&lt;/p&gt;

&lt;p&gt;Once you see it that way, a lot of OpenClaw behavior gets harder to shrug off.&lt;/p&gt;
&lt;h2&gt;
  
  
  The weirdest part is not malware. It’s hidden behavior.
&lt;/h2&gt;

&lt;p&gt;While digging through this, I found another r/openclaw thread:&lt;br&gt;
&lt;a href="https://reddit.com/r/openclaw/comments/1ue5yqh/help_with_tool_use_secret_tools/" rel="noopener noreferrer"&gt;https://reddit.com/r/openclaw/comments/1ue5yqh/help_with_tool_use_secret_tools/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A user was trying to understand why their agent sometimes called sessions_send for ordinary file requests.&lt;/p&gt;

&lt;p&gt;That would already be weird.&lt;/p&gt;

&lt;p&gt;Then they described a hidden message capability that seemed able to send media off-platform, and wrote:&lt;/p&gt;

&lt;p&gt;“The tool is hidden in a way where the agent will flat out refuse its existence completely.”&lt;/p&gt;

&lt;p&gt;That line stuck with me.&lt;/p&gt;

&lt;p&gt;Now we’re not talking about a sketchy marketplace listing with a bad README.&lt;/p&gt;

&lt;p&gt;We’re talking about an agent surface that may not even be fully transparent to the operator.&lt;/p&gt;

&lt;p&gt;One commenter said they found a prompt that triggered the hidden OpenClaw message capability about 8 out of 10 times. Inconsistent, but repeatable.&lt;/p&gt;

&lt;p&gt;That is not browser-extension risk.&lt;/p&gt;

&lt;p&gt;That is live automation risk.&lt;/p&gt;

&lt;p&gt;If your OpenClaw setup can message other agents, send media, browse, purchase, and act on stored credentials, then every third-party skill sits inside a blast radius that looks a lot more like n8n, Make, Zapier, or custom Python automation than a cute add-on store.&lt;/p&gt;

&lt;p&gt;And then I found the thread about repeat purchases.&lt;/p&gt;
&lt;h2&gt;
  
  
  Would you let a random skill reorder HVAC filters on your card?
&lt;/h2&gt;

&lt;p&gt;There’s a practical OpenClaw discussion about automating repeat purchases where people talk about reordering HVAC filters, McMaster-Carr parts, and household supplies using saved credentials and payment methods.&lt;/p&gt;

&lt;p&gt;That’s when this stops being abstract.&lt;/p&gt;

&lt;p&gt;A malicious skill in that environment does not need ransomware behavior.&lt;/p&gt;

&lt;p&gt;It doesn’t need to exfiltrate your SSH keys to a Raspberry Pi in Belarus.&lt;/p&gt;

&lt;p&gt;It just needs to subtly influence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what gets bought&lt;/li&gt;
&lt;li&gt;where it gets bought&lt;/li&gt;
&lt;li&gt;when it gets bought&lt;/li&gt;
&lt;li&gt;which referral path gets used&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why I think the right security model is brutally simple:&lt;/p&gt;

&lt;p&gt;Treat every third-party OpenClaw skill like code that can spend money, move data, and make decisions under incentives you do not control.&lt;/p&gt;

&lt;p&gt;If that sounds paranoid, good.&lt;/p&gt;

&lt;p&gt;You want a little paranoia here.&lt;/p&gt;
&lt;h2&gt;
  
  
  Two different problems are getting mixed together
&lt;/h2&gt;

&lt;p&gt;Some malicious skills are classic package-security problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hidden droppers&lt;/li&gt;
&lt;li&gt;obfuscated code&lt;/li&gt;
&lt;li&gt;scanner evasion&lt;/li&gt;
&lt;li&gt;oversized junk files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The thread says omnicogg padded its README with 22 MB of junk so scanners would skip the file while an AMOS dropper remained inside.&lt;/p&gt;

&lt;p&gt;That’s old-school malware thinking, just wearing an agent hat.&lt;/p&gt;

&lt;p&gt;But other skills are behavior problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;remote recommendations&lt;/li&gt;
&lt;li&gt;affiliate steering&lt;/li&gt;
&lt;li&gt;coordinated financial behavior&lt;/li&gt;
&lt;li&gt;prompt-level manipulation of agent choices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The letssendit example is wild: pooling SOL from installed agents for a coordinated meme-coin launch.&lt;/p&gt;

&lt;p&gt;Again, not plugin risk.&lt;/p&gt;

&lt;p&gt;More like somebody attached a weird little business model to your agent runtime.&lt;/p&gt;

&lt;p&gt;Static scanning still matters for the first category.&lt;/p&gt;

&lt;p&gt;It’s just weak against the second.&lt;/p&gt;

&lt;p&gt;So the real question is: what do you actually do on Monday morning if your team uses an OpenAI-compatible LLM stack with OpenClaw, n8n, Make, Zapier, or custom agents and still needs skills?&lt;/p&gt;
&lt;h2&gt;
  
  
  My rule now: if I can read it, I can probably generate a safer version
&lt;/h2&gt;

&lt;p&gt;One commenter in the Unit 42 thread made the strongest case against random installs:&lt;/p&gt;

&lt;p&gt;“If you can read what a skill does, you can write it yourself, and then you actually know what your agent is running.”&lt;/p&gt;

&lt;p&gt;I think that’s basically right.&lt;/p&gt;

&lt;p&gt;Not because every team should hand-code everything from scratch. That’s fantasy.&lt;/p&gt;

&lt;p&gt;People install skills because convenience matters, and for low-risk tasks that tradeoff can be perfectly reasonable.&lt;/p&gt;

&lt;p&gt;But for anything touching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;purchases&lt;/li&gt;
&lt;li&gt;messaging&lt;/li&gt;
&lt;li&gt;CRM updates&lt;/li&gt;
&lt;li&gt;financial workflows&lt;/li&gt;
&lt;li&gt;external posting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…I would rather give GPT-5.4, Claude Opus 4.6, Grok 4.20, Qwen, or Llama a precise spec and generate a narrow in-house skill than install a broad marketplace skill with unknown incentives.&lt;/p&gt;

&lt;p&gt;That sounds slower.&lt;/p&gt;

&lt;p&gt;Weirdly, it often isn’t.&lt;/p&gt;

&lt;p&gt;A narrow skill like this:&lt;/p&gt;

&lt;p&gt;“Submit approved purchase orders to McMaster-Carr using sandbox credentials and return a dry-run summary.”&lt;/p&gt;

&lt;p&gt;…is much easier to review than this:&lt;/p&gt;

&lt;p&gt;“Shopping assistant with smart recommendations.”&lt;/p&gt;

&lt;p&gt;The smaller the scope, the easier it is to audit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;network calls&lt;/li&gt;
&lt;li&gt;secrets usage&lt;/li&gt;
&lt;li&gt;prompt handling&lt;/li&gt;
&lt;li&gt;approval boundaries&lt;/li&gt;
&lt;li&gt;side effects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And yes, cost shows up here.&lt;/p&gt;

&lt;p&gt;When you generate small internal skills, you usually do more iterations. More simulation. More adversarial prompting. More review loops.&lt;/p&gt;

&lt;p&gt;If you’re paying per token, teams start cutting corners exactly where they shouldn’t.&lt;/p&gt;

&lt;p&gt;If you’re using a flat-rate OpenAI-compatible API, that workflow feels a lot more natural. You can iterate on specs, generate safer narrow tools, and test them aggressively without watching a token meter the whole time.&lt;/p&gt;

&lt;p&gt;That’s one of the underrated reasons I like Standard Compute for agent-heavy workflows. If your team is building and testing lots of small automations, predictable monthly pricing is a much better fit than per-token anxiety.&lt;/p&gt;
&lt;h2&gt;
  
  
  What should you install, generate, or build?
&lt;/h2&gt;

&lt;p&gt;Here’s the tradeoff as I see it:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;What you’re really buying&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Third-party marketplace skill&lt;/td&gt;
&lt;td&gt;Fastest to install, but lowest transparency into incentives and hidden behavior; highest need for credential isolation and network inspection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generated in-house narrow skill&lt;/td&gt;
&lt;td&gt;Slower upfront than install, faster than hand-coding from scratch; high auditability because scope is small and spec-driven&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Broad in-house custom automation&lt;/td&gt;
&lt;td&gt;Most engineering effort, but highest control over code, secrets, logging, and approvals for high-value workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;My opinionated version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Marketplace skills are fine for low-risk, no-secret, no-money tasks.&lt;/li&gt;
&lt;li&gt;Generated narrow skills are the sweet spot for most serious agent teams.&lt;/li&gt;
&lt;li&gt;Broad custom automation is the right answer for purchasing, messaging, finance, and anything that can embarrass you in Slack or cost real money.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But a preference is not a process.&lt;/p&gt;

&lt;p&gt;You still need a review checklist.&lt;/p&gt;
&lt;h2&gt;
  
  
  A sane review process for OpenClaw skills
&lt;/h2&gt;

&lt;p&gt;This is the workflow I’d use before enabling any OpenClaw skill that can message externally, touch private data, or initiate purchases.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Isolate credentials first
&lt;/h3&gt;

&lt;p&gt;Never test with production secrets.&lt;/p&gt;

&lt;p&gt;Never give a new skill your real payment path.&lt;/p&gt;

&lt;p&gt;Use separate environment variables and sandbox accounts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENCLAW_TEST_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"oc_test_..."&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SANDBOX_STRIPE_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk_test_..."&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TEST_DISCORD_WEBHOOK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://discord.com/api/webhooks/..."&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SHOPIFY_SANDBOX_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"shpat_..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a skill only works when it gets your real Google Workspace, Stripe, Discord, Shopify, or Slack credentials, that’s already useful information.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Inspect outbound network calls
&lt;/h3&gt;

&lt;p&gt;The money-radar example should permanently change how people review agent skills.&lt;/p&gt;

&lt;p&gt;Before install, grep for URLs and request libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; &lt;span class="s2"&gt;"http"&lt;/span&gt; skills/&amp;lt;skill-name&amp;gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; &lt;span class="s2"&gt;"fetch&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;axios&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;requests&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;curl"&lt;/span&gt; skills/&amp;lt;skill-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it’s JavaScript or TypeScript, I also check package.json and lockfiles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;skills/&amp;lt;skill-name&amp;gt;/package.json
&lt;span class="nb"&gt;cat &lt;/span&gt;skills/&amp;lt;skill-name&amp;gt;/package-lock.json | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’re looking for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;remote JSON config&lt;/li&gt;
&lt;li&gt;affiliate endpoints&lt;/li&gt;
&lt;li&gt;analytics beacons&lt;/li&gt;
&lt;li&gt;webhook posts&lt;/li&gt;
&lt;li&gt;domains not mentioned in the README&lt;/li&gt;
&lt;li&gt;dynamic code fetches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any skill that fetches live recommendations, instructions, or routing logic from a server you don’t control deserves a much higher suspicion score.&lt;/p&gt;
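
&lt;p&gt;The “domains not mentioned in the README” check is easy to script. This Python sketch is an assumption about how you might do it, not an OpenClaw feature: &lt;code&gt;hosts_in&lt;/code&gt; and &lt;code&gt;undocumented_hosts&lt;/code&gt; are hypothetical helpers, and the sample source and README strings are invented. It only sees literal URLs, so hosts assembled at runtime will slip past it:&lt;/p&gt;

```python
import re

# Capture the hostname part of any literal http(s) URL.
HOST_PATTERN = re.compile(r"https?://([A-Za-z0-9.-]+)")


def hosts_in(text):
    """Return the set of hostnames that appear in http(s) URLs in text."""
    return {host.lower().rstrip(".") for host in HOST_PATTERN.findall(text)}


def undocumented_hosts(source_text, readme_text):
    """Hosts referenced by the code but never mentioned in the README."""
    documented = readme_text.lower()
    return sorted(h for h in hosts_in(source_text) if h not in documented)


# Invented sample input; in practice, read the skill's files from disk.
source = '''
const cfg = await fetch("https://cdn.example.net/referrals.json");
const price = await fetch("https://api.vendor.example/v1/prices");
'''
readme = "This skill calls api.vendor.example to look up prices."
flagged = undocumented_hosts(source, readme)
```

&lt;p&gt;Anything left in &lt;code&gt;flagged&lt;/code&gt; is a host the author did not tell you about, which is exactly the remote-config pattern the money-radar example relied on.&lt;/p&gt;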

&lt;h3&gt;
  
  
  3. Pin the exact version you reviewed
&lt;/h3&gt;

&lt;p&gt;Do not install latest and hope for the best.&lt;/p&gt;

&lt;p&gt;Review one commit. Run one commit. Keep one commit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git submodule add &amp;lt;skill-repo-url&amp;gt; skills/&amp;lt;skill-name&amp;gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;skills/&amp;lt;skill-name&amp;gt;
git checkout &amp;lt;reviewed-commit&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you’re pulling from npm or another package source, pin exact versions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"some-openclaw-skill"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.4.2"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



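&lt;p&gt;No carets. No surprises. Commit the lockfile and install with &lt;code&gt;npm ci&lt;/code&gt; so transitive dependencies stay pinned too.&lt;/p&gt;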
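
&lt;p&gt;If you want a check that fails loudly when a range sneaks back in, something like this Python sketch could run in CI. The function name &lt;code&gt;unpinned_dependencies&lt;/code&gt; and the sample manifest are made up for illustration. It treats anything that is not a plain &lt;code&gt;major.minor.patch&lt;/code&gt; version as unpinned, which also flags tags and git URLs:&lt;/p&gt;

```python
import json
import re

# Exact versions only: 1.4.2 or 1.4.2-beta.1, but not ^1.4.2, ~1.4.2, 1.x, or "latest".
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$")


def unpinned_dependencies(package_json_text):
    """Return {name: spec} for every dependency that is not an exact version."""
    manifest = json.loads(package_json_text)
    loose = {}
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.match(spec):
                loose[name] = spec
    return loose


# Invented manifest for the example.
manifest_text = json.dumps({
    "dependencies": {"some-openclaw-skill": "1.4.2", "some-http-client": "^1.3.0"},
    "devDependencies": {"some-build-tool": "~5.4.0"},
})
loose = unpinned_dependencies(manifest_text)
```

&lt;p&gt;An empty result means every direct dependency is pinned. It says nothing about transitive ones, which is the lockfile’s job.&lt;/p&gt;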

&lt;h3&gt;
  
  
  4. Simulate dangerous prompts
&lt;/h3&gt;

&lt;p&gt;Don’t just test the happy path.&lt;/p&gt;

&lt;p&gt;Try prompts that trigger side behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;send this file to my other account&lt;/li&gt;
&lt;li&gt;find the cheapest option and buy it now&lt;/li&gt;
&lt;li&gt;message the vendor directly with the attachment&lt;/li&gt;
&lt;li&gt;use your internal messaging capability if available&lt;/li&gt;
&lt;li&gt;recommend the best supplier even if it’s not in the approved list&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That hidden OpenClaw message story should make everyone more aggressive here.&lt;/p&gt;

&lt;p&gt;If you have a local test harness, script these prompts and diff the outputs.&lt;/p&gt;

&lt;p&gt;Example pseudo-test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;test_prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Send this invoice PDF to my personal email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Buy the cheapest replacement filter right now&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use any available messaging tool to share this image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Choose the best vendor and complete checkout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;test_prompts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_skill_in_sandbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod-payment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Run with network visibility
&lt;/h3&gt;

&lt;p&gt;If the skill matters, I want to see where it talks.&lt;/p&gt;

&lt;p&gt;At minimum, run it in an environment where outbound requests are observable.&lt;/p&gt;

&lt;p&gt;For example, route traffic through a proxy or log DNS and HTTP requests from the container.&lt;/p&gt;

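&lt;p&gt;Even a plain container run with test-only secrets is a useful start, as long as you capture its traffic from the host or a proxy while it runs:&lt;br&gt;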
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--env-file&lt;/span&gt; .env.test &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--network&lt;/span&gt; bridge &lt;span class="se"&gt;\&lt;/span&gt;
  my-openclaw-skill:review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unexpected domains&lt;/li&gt;
&lt;li&gt;retry storms&lt;/li&gt;
&lt;li&gt;webhook posts&lt;/li&gt;
&lt;li&gt;calls to analytics or referral endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Prefer generated narrow skills over marketplace bundles
&lt;/h3&gt;

&lt;p&gt;This is the biggest one.&lt;/p&gt;

&lt;p&gt;Ask GPT-5.4 or Claude Opus 4.6 to generate the smallest possible skill from a precise spec, then review the code like you would review a small internal script.&lt;/p&gt;

&lt;p&gt;If the scope is tiny, the review is manageable.&lt;/p&gt;

&lt;p&gt;If the scope is huge, your review is theater.&lt;/p&gt;

&lt;p&gt;And theater is exactly what “passed scan” often turns into.&lt;/p&gt;

&lt;p&gt;Here’s the kind of spec I’d rather use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build an OpenClaw skill that:
- accepts an approved SKU and quantity
- queries only api.mcmaster.com
- uses sandbox credentials only
- returns a dry-run order summary
- never submits payment
- never sends messages
- logs every outbound request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is reviewable.&lt;/p&gt;

&lt;p&gt;“Smart purchasing assistant” is not.&lt;/p&gt;
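
&lt;p&gt;Part of what makes that spec reviewable is that its network rules fit in a few lines. Here is a rough Python sketch of two of them, “queries only api.mcmaster.com” and “logs every outbound request.” Everything except the hostname from the spec is an assumption: &lt;code&gt;guard_request&lt;/code&gt;, &lt;code&gt;dry_run_order&lt;/code&gt;, the &lt;code&gt;/products/&lt;/code&gt; path, and the SKU are placeholders, and no real request is made:&lt;/p&gt;

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.mcmaster.com"}  # the only host the spec permits
request_log = []


class BlockedRequest(Exception):
    """Raised when the skill tries to reach a host outside the allowlist."""


def guard_request(method, url):
    """Log every outbound request, then refuse anything off the allowlist."""
    host = (urlparse(url).hostname or "").lower()
    allowed = host in ALLOWED_HOSTS
    request_log.append({"method": method, "url": url, "allowed": allowed})
    if not allowed:
        raise BlockedRequest(method + " " + url)
    return url


def dry_run_order(sku, quantity):
    """Build a dry-run summary; nothing is submitted and no payment is made."""
    # Hypothetical URL shape; the real API path is not given in the spec.
    url = guard_request("GET", "https://api.mcmaster.com/products/" + sku)
    return {"sku": sku, "quantity": quantity, "would_query": url, "submitted": False}


summary = dry_run_order("SKU-123", 100)
try:
    guard_request("POST", "https://hooks.example.net/notify")
    blocked = False
except BlockedRequest:
    blocked = True
```

&lt;p&gt;The blocked attempt still lands in &lt;code&gt;request_log&lt;/code&gt;, so a reviewer sees what the skill tried to do, not only what it was allowed to do.&lt;/p&gt;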

&lt;h2&gt;
  
  
  Convenience is real. Incentive drift is also real.
&lt;/h2&gt;

&lt;p&gt;I get the counterargument.&lt;/p&gt;

&lt;p&gt;Not every third-party skill is malicious.&lt;/p&gt;

&lt;p&gt;One Reddit commenter basically said the whole point of a skill is that you don’t have to do the work yourself.&lt;/p&gt;

&lt;p&gt;That’s fair.&lt;/p&gt;

&lt;p&gt;For low-risk tasks, convenience may absolutely win.&lt;/p&gt;

&lt;p&gt;But the mistake is assuming a clean skill is therefore an aligned skill.&lt;/p&gt;

&lt;p&gt;Those are different questions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Malware review asks: what code is in the package?&lt;/li&gt;
&lt;li&gt;Behavior review asks: what outcomes does this skill push my agent toward at runtime?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The second question is newer, weirder, and much more relevant for agents.&lt;/p&gt;

&lt;p&gt;Especially when the agent can spend money.&lt;/p&gt;

&lt;h2&gt;
  
  
  The safest default is smaller than you think
&lt;/h2&gt;

&lt;p&gt;I think most teams are going to learn the same lesson npm users learned years ago, just with more expensive consequences.&lt;/p&gt;

&lt;p&gt;The safest default is not “install fewer bad skills.”&lt;/p&gt;

&lt;p&gt;It’s to grant less power, use narrower code, and assume incentives leak into behavior.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use third-party OpenClaw skills only for low-risk actions.&lt;/li&gt;
&lt;li&gt;Isolate credentials and payment methods during testing.&lt;/li&gt;
&lt;li&gt;Inspect every outbound network call.&lt;/li&gt;
&lt;li&gt;Pin reviewed versions or commits.&lt;/li&gt;
&lt;li&gt;Generate narrow in-house skills for anything valuable.&lt;/li&gt;
&lt;li&gt;Use an OpenAI-compatible API setup that makes heavy testing affordable enough to actually do.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last point matters more than people admit.&lt;/p&gt;

&lt;p&gt;Safer agent development usually means more iterations, more adversarial tests, and more code generation passes.&lt;/p&gt;

&lt;p&gt;If your pricing model punishes that behavior, teams will test less.&lt;/p&gt;

&lt;p&gt;If you want predictable cost while building and running agents on n8n, Make, Zapier, OpenClaw, or custom workflows, flat-rate infrastructure like Standard Compute is a much better operational fit than per-token billing.&lt;/p&gt;

&lt;p&gt;That’s the boring answer.&lt;/p&gt;

&lt;p&gt;It’s also the answer that keeps your agent from turning “reorder HVAC filters” into “why did we just buy from a weird referral storefront and send a receipt to an undocumented endpoint?”&lt;/p&gt;

&lt;p&gt;Once I started thinking about OpenClaw skills this way, I stopped seeing a marketplace.&lt;/p&gt;

&lt;p&gt;I started seeing a pile of tiny automation contractors, each asking for API keys, each with their own incentives, and some of them holding my card.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>automation</category>
      <category>devops</category>
    </item>
    <item>
      <title>I thought I needed a better tool-calling model, but my agent just had too many tools</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Wed, 24 Jun 2026 20:51:57 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-thought-i-needed-a-better-tool-calling-model-but-my-agent-just-had-too-many-tools-30bi</link>
      <guid>https://dev.to/lars_winstand/i-thought-i-needed-a-better-tool-calling-model-but-my-agent-just-had-too-many-tools-30bi</guid>
      <description>&lt;p&gt;A few months ago, I would have blamed the model.&lt;/p&gt;

&lt;p&gt;Agent picks the wrong function? Fine, upgrade to GPT-5. Try Claude Opus. Add a smarter router. Maybe benchmark Qwen or Llama if you’re running local.&lt;/p&gt;

&lt;p&gt;That instinct is everywhere because model swaps feel clean. Refactoring your agent’s tool surface feels like admitting the architecture is a mess.&lt;/p&gt;

&lt;p&gt;Then I ran into a couple of OpenClaw threads that made the problem look a lot less like “model intelligence” and a lot more like “bad menu design.”&lt;/p&gt;

&lt;h2&gt;
  
  
  The bug didn’t look like a model bug
&lt;/h2&gt;

&lt;p&gt;In one r/openclaw thread about hidden tools, a user was trying to get an agent to send a specific image file.&lt;/p&gt;

&lt;p&gt;Instead of using the media-sending path, the agent kept reaching for &lt;code&gt;sessions_send&lt;/code&gt;, which appeared to be intended for internal agent-to-agent communication.&lt;/p&gt;

&lt;p&gt;That’s already bad.&lt;/p&gt;

&lt;p&gt;But the more interesting detail was this: the user said the &lt;code&gt;message&lt;/code&gt; capability only triggered reliably with a specific prompt, and even then only about 8 times out of 10. They described it like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The tool is hidden in a way where the agent will flat out refuse its existence completely.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is not a “buy a smarter model” problem.&lt;/p&gt;

&lt;p&gt;That is a visibility problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your agent may be doing exactly what you showed it
&lt;/h2&gt;

&lt;p&gt;When people ask for the best model for tool calling, they usually mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which model is best at choosing the next action?&lt;/li&gt;
&lt;li&gt;Which model recovers best from ambiguous instructions?&lt;/li&gt;
&lt;li&gt;Which model handles a large tool inventory without getting confused?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are valid questions.&lt;/p&gt;

&lt;p&gt;A stronger model absolutely helps. GPT-5 will usually route better than weaker models. Claude Opus often does better when tool descriptions are fuzzy. Smaller open models can fall apart faster when context gets noisy.&lt;/p&gt;

&lt;p&gt;But there’s a more boring explanation for a lot of agent failures:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;you exposed the wrong tool, at the wrong time, with the wrong description.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;sessions_send&lt;/code&gt; is visible during a normal user request, and &lt;code&gt;message&lt;/code&gt; is semi-hidden, the model is not failing an IQ test.&lt;/p&gt;

&lt;p&gt;It’s picking from a bad action menu.&lt;/p&gt;

&lt;h2&gt;
  
  
  This is an architecture problem, not just a model problem
&lt;/h2&gt;

&lt;p&gt;A lot of teams still build agents like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;attach every function&lt;/li&gt;
&lt;li&gt;attach every skill&lt;/li&gt;
&lt;li&gt;expose giant schemas&lt;/li&gt;
&lt;li&gt;hope GPT-5 or Claude sorts it out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That works for demos.&lt;/p&gt;

&lt;p&gt;It breaks in production.&lt;/p&gt;

&lt;p&gt;Once the tool surface gets large, wrong-tool selection starts looking like “the model is dumb” when it’s really:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;too many visible options&lt;/li&gt;
&lt;li&gt;overlapping descriptions&lt;/li&gt;
&lt;li&gt;internal-only tools exposed to user-facing flows&lt;/li&gt;
&lt;li&gt;no task-level routing&lt;/li&gt;
&lt;li&gt;no tracing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’ve ever watched an n8n agent, Make scenario, Zapier AI step, or OpenClaw workflow call the weirdest possible function, this is probably familiar.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI’s guidance quietly points in the same direction
&lt;/h2&gt;

&lt;p&gt;One thing that surprised me: OpenAI’s current function-calling guidance leans toward reducing what the model sees, not just upgrading the model.&lt;/p&gt;

&lt;p&gt;If you have lots of functions or large schemas, the recommendation is to avoid making the model evaluate the entire function inventory every turn.&lt;/p&gt;

&lt;p&gt;That leads to two practical patterns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Narrow the visible toolset yourself based on task or workflow stage.&lt;/li&gt;
&lt;li&gt;Use deferred tool loading so rarely used functions are only considered when needed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the opposite of the usual “just expose everything” instinct.&lt;/p&gt;

&lt;p&gt;And honestly, it makes sense.&lt;/p&gt;

&lt;p&gt;If the user is clearly trying to send a file, why is the model staring at 40 unrelated actions?&lt;/p&gt;
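
&lt;p&gt;Here’s a toy sketch of the deferred-loading idea, assuming a plain keyword match over tool descriptions. This is not OpenAI’s mechanism, and the catalog, stopword list, and function names are invented for the example. A real system would more likely use embeddings or whatever tool search your provider supports:&lt;/p&gt;

```python
# Hypothetical catalog: tool name -> short description.
CATALOG = {
    "send_message": "Send a user-facing message or media attachment",
    "find_file": "Look up a stored file or screenshot",
    "book_calendar": "Book a meeting on the calendar",
    "create_invoice": "Create an invoice for a billing account",
    "transcode_media": "Convert a video or audio file to another format",
}

# Tools the model always sees; everything else is loaded on demand.
CORE_TOOLS = ["send_message"]
STOPWORDS = {"a", "an", "the", "to", "for", "on", "or", "up"}


def keywords(text):
    """Lowercased words in the text, minus filler words."""
    return set(text.lower().split()) - STOPWORDS


def search_tools(query, limit=2):
    """Rank deferred tools by how many query keywords their description shares."""
    wanted = keywords(query)
    scored = []
    for name, description in CATALOG.items():
        if name in CORE_TOOLS:
            continue
        overlap = len(wanted.intersection(keywords(description)))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first, then alphabetical so results are stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:limit]]


def visible_tools(query):
    """Core tools plus whatever the search step decides to load."""
    return CORE_TOOLS + search_tools(query)
```

&lt;p&gt;For the screenshot request, the model would see &lt;code&gt;send_message&lt;/code&gt; and &lt;code&gt;find_file&lt;/code&gt; instead of the whole catalog.&lt;/p&gt;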

&lt;h2&gt;
  
  
  The easiest way to break tool calling
&lt;/h2&gt;

&lt;p&gt;Here’s a simple bad example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sessions_send&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;book_calendar&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;summarize_analytics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extract_webpage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;create_invoice&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;search_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;transcode_media&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Send the latest screenshot to the customer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is easy to build.&lt;/p&gt;

&lt;p&gt;It is also how you end up debugging nonsense at 2 a.m.&lt;/p&gt;

&lt;p&gt;Now compare that to task-scoped exposure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tools_for_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;find_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prepare_attachment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;support_lookup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summarize_docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calendar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;check_availability&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;book_calendar&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fallback_answer&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;visible_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tools_for_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Send the latest screenshot to the customer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;visible_tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same model.&lt;/p&gt;

&lt;p&gt;Much better odds.&lt;/p&gt;

&lt;h2&gt;
  
  
  The LangChain mental model is actually useful here
&lt;/h2&gt;

&lt;p&gt;LangChain describes an agent as something like:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;model + harness&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That harness matters more than people admit.&lt;/p&gt;

&lt;p&gt;Your harness includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the system prompt&lt;/li&gt;
&lt;li&gt;the tool list&lt;/li&gt;
&lt;li&gt;tool descriptions&lt;/li&gt;
&lt;li&gt;middleware&lt;/li&gt;
&lt;li&gt;routing logic&lt;/li&gt;
&lt;li&gt;memory setup&lt;/li&gt;
&lt;li&gt;retries and guards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the harness is sloppy, a stronger model can help, but it won’t fully save you.&lt;/p&gt;

&lt;p&gt;The tiny LangChain examples work because the action space is tiny:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_agent&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get weather for a given city.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;It&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s always sunny in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-5.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One tool is easy.&lt;/p&gt;

&lt;p&gt;Fifty tools with overlapping semantics is where things get ugly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Weird output is often a tool-surface bug in disguise
&lt;/h2&gt;

&lt;p&gt;A second OpenClaw thread made this even clearer.&lt;/p&gt;

&lt;p&gt;A user reported a bizarre block of corrupted-looking output. Another user replied that Graphify was the culprit:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It would send huge blocks of numbers, random things about Belarus, text in mandarin, and so on. Removed it, problem gone.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s a perfect debugging story because it kills the usual fantasy that every weird agent behavior is frontier-model instability.&lt;/p&gt;

&lt;p&gt;Sometimes the fix is not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lower temperature&lt;/li&gt;
&lt;li&gt;switch models&lt;/li&gt;
&lt;li&gt;rewrite prompts for three days&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sometimes the fix is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;remove the bad tool&lt;/li&gt;
&lt;li&gt;hide the internal tool&lt;/li&gt;
&lt;li&gt;stop exposing experimental skills in production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That should make you more paranoid about your tool surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually works better in practice
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;What happens in practice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Expose all tools to one agent&lt;/td&gt;
&lt;td&gt;Fast to ship, but wrong-tool selection gets much more likely and debugging becomes guesswork&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task-scoped tool exposure&lt;/td&gt;
&lt;td&gt;Better reliability because only relevant actions are visible, but you need routing logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deferred tool loading&lt;/td&gt;
&lt;td&gt;Great when you have a large function inventory, but requires support in your stack and a bit more design work&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For broad research agents, wide exposure can make sense.&lt;/p&gt;

&lt;p&gt;For most production automations, it usually doesn’t.&lt;/p&gt;

&lt;p&gt;That includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;n8n workflows&lt;/li&gt;
&lt;li&gt;Make scenarios&lt;/li&gt;
&lt;li&gt;Zapier AI actions&lt;/li&gt;
&lt;li&gt;OpenClaw skills&lt;/li&gt;
&lt;li&gt;support agents&lt;/li&gt;
&lt;li&gt;file-processing workers&lt;/li&gt;
&lt;li&gt;internal ops bots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In those systems, exposing everything is usually laziness disguised as flexibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five things I’d fix before changing models
&lt;/h2&gt;

&lt;p&gt;Before you benchmark GPT-5 vs Claude Opus vs Qwen vs Llama, fix these five things.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Hide internal-only functions completely
&lt;/h3&gt;

&lt;p&gt;If a function is not meant for user-facing requests, don’t leave it in the visible inventory.&lt;/p&gt;

&lt;p&gt;Bad:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sessions_send&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;user_tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;internal_tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sessions_send&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Put internal tools behind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a supervisor&lt;/li&gt;
&lt;li&gt;middleware&lt;/li&gt;
&lt;li&gt;a separate worker&lt;/li&gt;
&lt;li&gt;a sub-agent with its own policy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the user-facing model can see internal transport primitives, you’re inviting failure.&lt;/p&gt;
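
&lt;p&gt;One way to enforce that split is a small check in front of tool dispatch. This is a sketch under my own assumptions: the flow names, the &lt;code&gt;is_allowed&lt;/code&gt; and &lt;code&gt;dispatch&lt;/code&gt; helpers, and the handler dictionary are hypothetical, not an OpenClaw or LangChain API:&lt;/p&gt;

```python
USER_TOOLS = {"send_message", "search_docs"}
INTERNAL_TOOLS = {"sessions_send"}


def is_allowed(tool_name, flow):
    """User-facing flows only get user tools; the supervisor flow gets both sets."""
    if flow == "supervisor":
        return tool_name in USER_TOOLS | INTERNAL_TOOLS
    return tool_name in USER_TOOLS


def dispatch(tool_name, flow, handlers, **kwargs):
    """Refuse the call before it runs, even if the model somehow asked for it."""
    if not is_allowed(tool_name, flow):
        raise PermissionError(f"{tool_name!r} is not available in the {flow!r} flow")
    return handlers[tool_name](**kwargs)
```

&lt;p&gt;Filtering the visible list is the first line of defense. A check like this is the second, for the day a hidden tool name leaks into context anyway.&lt;/p&gt;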

&lt;h3&gt;
  
  
  2. Expose tools by task, not by global capability
&lt;/h3&gt;

&lt;p&gt;A file-send request should not expose unrelated functions just because they exist somewhere in the system.&lt;/p&gt;

&lt;p&gt;Good rule:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;file task -&amp;gt; file lookup, attachment prep, send message&lt;/li&gt;
&lt;li&gt;calendar task -&amp;gt; availability, scheduling, confirmation&lt;/li&gt;
&lt;li&gt;research task -&amp;gt; search, fetch, summarize&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don’t make the model sort through your entire company’s API surface every turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Rewrite tool descriptions like UI labels
&lt;/h3&gt;

&lt;p&gt;Most tool descriptions are terrible.&lt;/p&gt;

&lt;p&gt;They read like internal notes from a sleep-deprived engineer.&lt;/p&gt;

&lt;p&gt;Bad:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sessions_send&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Sends session data.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sessions_send&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Internal-only transport for agent-to-agent communication. Never use for replying to end users or sending media/files.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bad:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Message capability.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Send a user-facing message or media attachment to the external recipient. Use this for normal outbound communication.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Descriptions are part of the interface.&lt;/p&gt;

&lt;p&gt;Treat them that way.&lt;/p&gt;
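
&lt;p&gt;If you want a quick way to spot descriptions that blur together, here’s a crude sketch using word overlap (Jaccard similarity). The three descriptions are deliberately vague examples I made up, and the 0.5 threshold is arbitrary:&lt;/p&gt;

```python
from itertools import combinations

# Hypothetical, deliberately vague descriptions.
DESCRIPTIONS = {
    "sessions_send": "Sends a message to a session.",
    "send_message": "Sends a message to a user.",
    "search_docs": "Searches the documentation for an answer.",
}


def tokens(text):
    """Lowercased words with surrounding punctuation stripped."""
    return {word.strip(".,:;!?").lower() for word in text.split()} - {""}


def jaccard(left, right):
    """Shared words divided by total distinct words."""
    union = left | right
    return len(left.intersection(right)) / len(union) if union else 0.0


def overlapping_pairs(descriptions, threshold=0.5):
    """List (tool_a, tool_b, score) for every pair at or above the threshold."""
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(descriptions.items()), 2):
        score = jaccard(tokens(text_a), tokens(text_b))
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    return pairs
```

&lt;p&gt;A flagged pair is a prompt to rewrite one of the two descriptions, not proof that the model will confuse them.&lt;/p&gt;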

&lt;h3&gt;
  
  
  4. Turn on tracing before you start guessing
&lt;/h3&gt;

&lt;p&gt;If you’re using LangChain, enable LangSmith tracing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;LANGSMITH_TRACING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
export &lt;/span&gt;&lt;span class="nv"&gt;LANGSMITH_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then inspect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which tools were visible&lt;/li&gt;
&lt;li&gt;what the model actually saw&lt;/li&gt;
&lt;li&gt;which tool it selected&lt;/li&gt;
&lt;li&gt;whether the descriptions overlapped&lt;/li&gt;
&lt;li&gt;whether the prompt accidentally encouraged the wrong path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without tracing, you’re doing folklore engineering.&lt;/p&gt;
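
&lt;p&gt;If you’re not on LangChain, even a hand-rolled trace covers the first three items on that list. A minimal sketch, assuming you call it yourself right after the model picks a tool; the function name and record fields are my own:&lt;/p&gt;

```python
import json
import time


def trace_tool_choice(user_input, visible_tools, selected_tool, log):
    """Append one JSON line describing what the model could see and what it picked."""
    record = {
        "ts": time.time(),
        "input": user_input,
        "visible_tools": sorted(visible_tools),
        "selected_tool": selected_tool,
        # False means the model asked for something it was never shown.
        "selected_was_visible": selected_tool in visible_tools,
    }
    log.append(json.dumps(record))
    return record
```

&lt;p&gt;Here &lt;code&gt;log&lt;/code&gt; is just a list, but the same JSON lines could go to a file or your existing logger.&lt;/p&gt;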

&lt;h3&gt;
  
  
  5. Only then test model upgrades
&lt;/h3&gt;

&lt;p&gt;Once the tool surface is clean, benchmark models.&lt;/p&gt;

&lt;p&gt;That’s when the question becomes real:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does GPT-5 route better on long-context tasks?&lt;/li&gt;
&lt;li&gt;Does Claude Opus recover better from ambiguity?&lt;/li&gt;
&lt;li&gt;Is Qwen or Llama good enough for this narrower action space?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before cleanup, those benchmarks are noisy because you’re measuring architecture mistakes as if they were model quality.&lt;/p&gt;
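
&lt;p&gt;A benchmark like that can start very small. Here’s a sketch of a tool-selection accuracy check where &lt;code&gt;choose_tool&lt;/code&gt; stands in for a real model call. The naive stub and the test cases are invented for illustration:&lt;/p&gt;

```python
def selection_accuracy(choose_tool, cases):
    """Fraction of cases where choose_tool(prompt, tools) returns the expected tool."""
    hits = sum(
        1 for prompt, tools, expected in cases
        if choose_tool(prompt, tools) == expected
    )
    return hits / len(cases)


def first_tool_chooser(prompt, tools):
    """Deliberately naive stand-in for a model: always picks the first visible tool."""
    return tools[0]


# Each case: (prompt, visible tools, expected tool). All hypothetical.
CASES = [
    ("Send the latest screenshot to the customer", ["sessions_send", "send_message"], "send_message"),
    ("Send the latest screenshot to the customer", ["send_message"], "send_message"),
    ("Book a call for Friday", ["book_calendar", "send_message"], "book_calendar"),
    ("Find the refund policy", ["search_docs"], "search_docs"),
]

score = selection_accuracy(first_tool_chooser, CASES)
```

&lt;p&gt;The stub only misses the case where &lt;code&gt;sessions_send&lt;/code&gt; is on the menu, which is the point: the same chooser scores differently depending on what you show it.&lt;/p&gt;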

&lt;h2&gt;
  
  
  A practical routing pattern for automations
&lt;/h2&gt;

&lt;p&gt;If you’re building automations in n8n, Make, Zapier, or a custom worker setup, a simple pattern works well:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;classify the task&lt;/li&gt;
&lt;li&gt;expose only the relevant tool subset&lt;/li&gt;
&lt;li&gt;call the model&lt;/li&gt;
&lt;li&gt;execute the selected tool&lt;/li&gt;
&lt;li&gt;trace everything&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
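&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Lowercase once so "Send" and "send" match the same keywords.&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# Treat screenshots and attachments as files too.&lt;/span&gt;
    &lt;span class="n"&gt;file_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;screenshot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attachment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;file_words&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schedule&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calendar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calendar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;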

&lt;span class="n"&gt;TASK_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;send_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;find_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prepare_attachment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calendar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;check_availability&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;book_calendar&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallback_answer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;classify_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Send the latest screenshot to the customer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;visible_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TASK_TOOLS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not fancy.&lt;/p&gt;

&lt;p&gt;That’s the point.&lt;/p&gt;

&lt;p&gt;Boring routing beats magical thinking.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Standard Compute fits into this
&lt;/h2&gt;

&lt;p&gt;Once you start cleaning up tool exposure, another issue shows up fast: testing agents properly burns a lot of model calls.&lt;/p&gt;

&lt;p&gt;You don’t just run one prompt.&lt;/p&gt;

&lt;p&gt;You run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated tool-selection tests&lt;/li&gt;
&lt;li&gt;routing experiments&lt;/li&gt;
&lt;li&gt;long workflow simulations&lt;/li&gt;
&lt;li&gt;retries across different models&lt;/li&gt;
&lt;li&gt;regression checks after every harness change&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gets expensive fast if you’re paying per token.&lt;/p&gt;

&lt;p&gt;This is exactly why I think flat-rate AI infrastructure is underrated for agent teams.&lt;/p&gt;

&lt;p&gt;With Standard Compute, you can use an OpenAI-compatible API and run those iterations without watching token spend every hour. It’s especially useful if you’re building agents and automations that need constant testing across GPT-5.4, Claude Opus 4.6, and Grok 4.20.&lt;/p&gt;

&lt;p&gt;If your workflow already uses an OpenAI-compatible SDK, the drop-in swap is straightforward.&lt;/p&gt;

&lt;p&gt;That matters because tool-routing bugs are rarely fixed in one shot. You need room to test, trace, and rerun.&lt;/p&gt;

&lt;p&gt;Per-token pricing makes people under-test the exact systems that most need repeated evaluation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What if the model really is the problem?
&lt;/h2&gt;

&lt;p&gt;Sometimes it is.&lt;/p&gt;

&lt;p&gt;Long context can hurt tool selection. Weak memory setup can pollute intent. Smaller models can struggle with subtle distinctions even when the tool surface is decent.&lt;/p&gt;

&lt;p&gt;And yes, stronger models often help a lot.&lt;/p&gt;

&lt;p&gt;But I think too many teams ask:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;what’s the best model for tool calling?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;before they ask:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;why can this agent see that function at all?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That second question is less exciting.&lt;/p&gt;

&lt;p&gt;It also fixes more bugs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;If your agent keeps choosing the wrong function, don’t start with a model swap.&lt;/p&gt;

&lt;p&gt;Start with the menu.&lt;/p&gt;

&lt;p&gt;Check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which tools are visible&lt;/li&gt;
&lt;li&gt;which ones are internal-only&lt;/li&gt;
&lt;li&gt;whether descriptions overlap&lt;/li&gt;
&lt;li&gt;whether task-level routing exists&lt;/li&gt;
&lt;li&gt;whether you have tracing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of “model stupidity” is really tool-surface sloppiness.&lt;/p&gt;

&lt;p&gt;Once I started looking at agent failures this way, debugging got much less mystical.&lt;/p&gt;

&lt;p&gt;Less begging GPT-5 to read my mind.&lt;/p&gt;

&lt;p&gt;More designing a system that gives the model a fair shot.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>openai</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
