<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lars Winstand</title>
    <description>The latest articles on DEV Community by Lars Winstand (@lars_winstand).</description>
    <link>https://dev.to/lars_winstand</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3908932%2Feb8bc1ff-405f-4ef0-8204-ba1ed7caa59f.jpeg</url>
      <title>DEV Community: Lars Winstand</title>
      <link>https://dev.to/lars_winstand</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lars_winstand"/>
    <language>en</language>
    <item>
      <title>I stopped trying to make my agent fully autonomous and made it ask my phone first</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Thu, 11 Jun 2026 17:49:26 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-stopped-trying-to-make-my-agent-fully-autonomous-and-made-it-ask-my-phone-first-40h2</link>
      <guid>https://dev.to/lars_winstand/i-stopped-trying-to-make-my-agent-fully-autonomous-and-made-it-ask-my-phone-first-40h2</guid>
      <description>&lt;h1&gt;
  
  
  I stopped trying to make my agent fully autonomous and made it ask my phone first
&lt;/h1&gt;

&lt;p&gt;The safest pattern I’ve found for agent workflows is not full autonomy.&lt;/p&gt;

&lt;p&gt;It’s a pause-before-action approval step on your phone.&lt;/p&gt;

&lt;p&gt;If the agent wants to charge a card, delete records, send a customer message, or change an account setting, it has to stop and ask first.&lt;/p&gt;

&lt;p&gt;For custom Python agents, LangGraph already gives you this with &lt;code&gt;interrupt()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For automation teams, n8n can do almost the same thing with a Wait node and a unique &lt;code&gt;$execution.resumeUrl&lt;/code&gt; per run.&lt;/p&gt;

&lt;p&gt;That one design choice has done more for production safety than any prompt tweak I’ve tried.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with “fully autonomous” agents
&lt;/h2&gt;

&lt;p&gt;A lot of agent demos look great right up until money or external side effects show up.&lt;/p&gt;

&lt;p&gt;Draft a reply? Fine.&lt;/p&gt;

&lt;p&gt;Summarize tickets? Fine.&lt;/p&gt;

&lt;p&gt;Pull context from Notion, HubSpot, Linear, Gmail, and Slack? Great.&lt;/p&gt;

&lt;p&gt;But the second the agent wants to do one of these, the mood changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;send money&lt;/li&gt;
&lt;li&gt;delete data&lt;/li&gt;
&lt;li&gt;email a customer&lt;/li&gt;
&lt;li&gt;change billing settings&lt;/li&gt;
&lt;li&gt;publish externally&lt;/li&gt;
&lt;li&gt;update production systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not a toy workflow anymore.&lt;/p&gt;

&lt;p&gt;That’s an action with real consequences, and often one you can’t undo.&lt;/p&gt;

&lt;p&gt;And when teams get burned, it usually isn’t because GPT-5, Claude Opus, Grok, or Llama suddenly became useless.&lt;/p&gt;

&lt;p&gt;It’s because the workflow had no clean stop condition.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern: ask a human right before the side effect
&lt;/h2&gt;

&lt;p&gt;I like this pattern because it’s boring.&lt;/p&gt;

&lt;p&gt;Boring is good when the alternative is “the agent deleted the wrong records at 2:14 AM.”&lt;/p&gt;

&lt;p&gt;The idea is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Let the agent do the expensive reasoning.&lt;/li&gt;
&lt;li&gt;Let it gather context, draft the action, and prepare the payload.&lt;/li&gt;
&lt;li&gt;Pause right before the risky tool call.&lt;/li&gt;
&lt;li&gt;Send an approval request to a human.&lt;/li&gt;
&lt;li&gt;Resume only if approved.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That gives you a useful middle ground:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;not full autonomy&lt;/li&gt;
&lt;li&gt;not useless read-only agents&lt;/li&gt;
&lt;li&gt;not “hope the evals catch it”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Just a hard boundary before damage can happen.&lt;/p&gt;
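&lt;p&gt;To make the five steps concrete, here is a minimal framework-free sketch of the gate. Everything in it is hypothetical (&lt;code&gt;PendingAction&lt;/code&gt;, &lt;code&gt;ApprovalGate&lt;/code&gt;, the in-memory dict standing in for a durable store); it only illustrates the shape: prepare, pause, resume on approval.&lt;/p&gt;

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class PendingAction:
    """A prepared side effect waiting for a human decision (hypothetical type)."""
    action: str
    payload: dict
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ApprovalGate:
    """In-memory stand-in for a durable store of paused actions."""

    def __init__(self):
        self._pending = {}

    def pause(self, action: str, payload: dict) -> str:
        """Store the prepared action and return the JSON to send to the approver."""
        pending = PendingAction(action, payload)
        self._pending[pending.request_id] = pending
        return json.dumps(asdict(pending))

    def resume(self, request_id: str, approved: bool, execute):
        """Run the side effect only on approval; each request resolves once."""
        # pop() raises KeyError for unknown or already-resolved requests.
        pending = self._pending.pop(request_id)
        if not approved:
            return {"status": "rejected", "action": pending.action}
        return {"status": "executed", "result": execute(pending.payload)}


gate = ApprovalGate()
message = gate.pause("refund_charge", {"customer_id": "cus_123", "amount": 84.00})
request_id = json.loads(message)["request_id"]
outcome = gate.resume(
    request_id, approved=True, execute=lambda p: f"refunded {p['amount']}"
)
```

&lt;p&gt;The side effect lives entirely inside &lt;code&gt;execute&lt;/code&gt;, so nothing risky can run before &lt;code&gt;resume&lt;/code&gt; is called with an approval, and a second tap on the same request fails instead of running the action twice.&lt;/p&gt;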

&lt;h2&gt;
  
  
  LangGraph makes this surprisingly clean
&lt;/h2&gt;

&lt;p&gt;If you’re building custom agents, LangGraph already has the right primitive: &lt;code&gt;interrupt()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Minimal example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;interrupt&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;approval_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;approved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;interrupt&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refund_charge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;approved&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



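&lt;p&gt;That pauses execution at the &lt;code&gt;interrupt()&lt;/code&gt; call and waits for a human to resume it. Whatever value the resume call supplies becomes the return value of &lt;code&gt;interrupt()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The important part is that LangGraph persists the paused state through a checkpointer, so this is not a hacky sleep loop.&lt;/p&gt;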

&lt;h3&gt;
  
  
  What you actually need for this to work
&lt;/h3&gt;

&lt;p&gt;You need durable state (a checkpointer attached to the graph) and a thread ID, so the resume call can find the paused run.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configurable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refund-ord-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And your interrupt payload should be JSON-serializable if you want to send it to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a mobile app&lt;/li&gt;
&lt;li&gt;Slack&lt;/li&gt;
&lt;li&gt;Telegram&lt;/li&gt;
&lt;li&gt;a custom approval page&lt;/li&gt;
&lt;li&gt;SMS + webhook flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A more realistic sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;interrupt&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;risky_action_gate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;approval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;interrupt&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approval_required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;change_billing_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;account_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;account_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;before&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;after&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;proposed_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requested_by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requested_by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;approval&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rejected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



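&lt;p&gt;Then your external approval handler can resume the graph after the human taps approve or reject, by invoking it again with the same thread ID and a &lt;code&gt;Command(resume=...)&lt;/code&gt; that carries the decision.&lt;/p&gt;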

&lt;h2&gt;
  
  
  Why &lt;code&gt;interrupt()&lt;/code&gt; changes the behavior of the whole agent
&lt;/h2&gt;

&lt;p&gt;Before &lt;code&gt;interrupt()&lt;/code&gt;, the agent is basically trusted to make the final call.&lt;/p&gt;

&lt;p&gt;After &lt;code&gt;interrupt()&lt;/code&gt;, the agent becomes a preparer.&lt;/p&gt;

&lt;p&gt;That’s a much better role for LLMs in risky workflows.&lt;/p&gt;

&lt;p&gt;Let the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;gather context&lt;/li&gt;
&lt;li&gt;decide what action it thinks should happen&lt;/li&gt;
&lt;li&gt;build the draft payload&lt;/li&gt;
&lt;li&gt;explain why&lt;/li&gt;
&lt;li&gt;show a diff&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But let a human own the irreversible yes/no.&lt;/p&gt;

&lt;p&gt;That split is practical: the model does the slow preparation, and the human spends a few seconds on the decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  n8n can do this without custom agent infrastructure
&lt;/h2&gt;

&lt;p&gt;This is the part more teams should pay attention to.&lt;/p&gt;

&lt;p&gt;You do not need to build a full agent runtime to get this pattern.&lt;/p&gt;

&lt;p&gt;In n8n, use the Wait node.&lt;/p&gt;

&lt;p&gt;For approvals, the useful mode is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;On Webhook Call&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The workflow can prepare the action, send the unique &lt;code&gt;{{$execution.resumeUrl}}&lt;/code&gt; somewhere a human can tap from their phone, and then pause at the Wait node until that URL is called.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slack message with Approve / Reject buttons&lt;/li&gt;
&lt;li&gt;Telegram bot message&lt;/li&gt;
&lt;li&gt;email with approval link&lt;/li&gt;
&lt;li&gt;internal mobile-friendly approval page&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  n8n approval flow example
&lt;/h3&gt;

&lt;p&gt;A practical flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trigger from support ticket / CRM event / webhook&lt;/li&gt;
&lt;li&gt;Use OpenAI-compatible chat model to analyze the request&lt;/li&gt;
&lt;li&gt;Build a proposed action&lt;/li&gt;
&lt;li&gt;If action is risky, send &lt;code&gt;{{$execution.resumeUrl}}&lt;/code&gt; to the approver&lt;/li&gt;
&lt;li&gt;Pause at the Wait node until that URL is called&lt;/li&gt;
&lt;li&gt;Continue only if the response is an approval&lt;/li&gt;
&lt;li&gt;Execute Stripe / HubSpot / database / email action&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The nice part is that each execution gets its own unique resume URL.&lt;/p&gt;

&lt;p&gt;That means multiple runs can pause safely at the same time.&lt;/p&gt;

&lt;p&gt;No weird global state.&lt;/p&gt;

&lt;p&gt;No polling mess.&lt;/p&gt;

&lt;p&gt;No “which request was this approval for?” confusion.&lt;/p&gt;
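&lt;p&gt;One resume URL only tells the workflow that someone responded, not what they decided. A minimal sketch of one way to handle that is to derive separate approve and reject links from it. The &lt;code&gt;decision&lt;/code&gt; parameter name and the example URL are placeholders, and this assumes the resumed workflow can read the query string of the resume call, which is worth verifying in your n8n version.&lt;/p&gt;

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def decision_links(resume_url: str) -> dict:
    """Return one link per decision by adding a `decision` query parameter."""
    parts = urlsplit(resume_url)
    base_query = parse_qsl(parts.query)  # keep any parameters already present
    links = {}
    for decision in ("approve", "reject"):
        query = urlencode(base_query + [("decision", decision)])
        links[decision] = urlunsplit(parts._replace(query=query))
    return links


# Placeholder URL; in n8n the real value comes from {{$execution.resumeUrl}}.
links = decision_links("https://n8n.example.com/webhook-waiting/1234")
```

&lt;p&gt;A node after the Wait node would then branch on the received &lt;code&gt;decision&lt;/code&gt; value and treat anything other than an explicit approval as a rejection.&lt;/p&gt;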

&lt;h2&gt;
  
  
  Example: n8n sends a phone approval before a refund
&lt;/h2&gt;

&lt;p&gt;Imagine this workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer asks for a refund&lt;/li&gt;
&lt;li&gt;AI agent reviews order history and policy&lt;/li&gt;
&lt;li&gt;Workflow calculates recommended refund amount&lt;/li&gt;
&lt;li&gt;Human gets a phone approval link&lt;/li&gt;
&lt;li&gt;Stripe refund only happens after approval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The approval message should include real details, not vague summaries.&lt;/p&gt;

&lt;p&gt;Good:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Approve refund?
Customer: cus_123
Order: ord_456
Amount: $84.00
Reason: duplicate charge
Destination: Stripe refund to original payment method
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bad:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Approve customer update?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That second version is how people accidentally approve nonsense.&lt;/p&gt;
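&lt;p&gt;One way to keep vague prompts from ever reaching a phone is to refuse to render them. This is a small sketch, with a hypothetical &lt;code&gt;render_approval&lt;/code&gt; helper and field names chosen to match the good example above:&lt;/p&gt;

```python
REQUIRED_FIELDS = ("action", "customer", "order", "amount", "reason", "destination")


def render_approval(request: dict) -> str:
    """Build the approval text, refusing to render if a concrete detail is missing."""
    missing = [name for name in REQUIRED_FIELDS if not request.get(name)]
    if missing:
        raise ValueError(f"approval request is missing: {', '.join(missing)}")
    return "\n".join([
        f"Approve {request['action']}?",
        f"Customer: {request['customer']}",
        f"Order: {request['order']}",
        f"Amount: {request['amount']}",
        f"Reason: {request['reason']}",
        f"Destination: {request['destination']}",
    ])


text = render_approval({
    "action": "refund",
    "customer": "cus_123",
    "order": "ord_456",
    "amount": "$84.00",
    "reason": "duplicate charge",
    "destination": "Stripe refund to original payment method",
})
```

&lt;p&gt;Calling it with only &lt;code&gt;{"action": "customer update"}&lt;/code&gt; raises a &lt;code&gt;ValueError&lt;/code&gt; that names the missing fields, so the bad prompt fails in the workflow instead of on someone’s lock screen.&lt;/p&gt;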

&lt;h2&gt;
  
  
  A quick implementation sketch for n8n
&lt;/h2&gt;

&lt;p&gt;You can wire this up with something like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI node or HTTP Request node for model output&lt;/li&gt;
&lt;li&gt;IF node to detect risky action&lt;/li&gt;
&lt;li&gt;HTTP Request / Slack / Telegram node to send approval link&lt;/li&gt;
&lt;li&gt;Wait node in webhook mode, placed after the node that sends the link&lt;/li&gt;
&lt;li&gt;downstream action node after resume&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re generating the message in an expression, you might do something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Action: refund_charge
Customer: {{$json.customerId}}
Amount: {{$json.amount}}
Reason: {{$json.reason}}
Approve: {{$execution.resumeUrl}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s enough to get a durable human-in-the-loop gate into a real workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which option should you use?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Best use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph interrupts&lt;/td&gt;
&lt;td&gt;Custom Python agents that need precise pause/resume behavior before dangerous tool calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n8n Wait node approvals&lt;/td&gt;
&lt;td&gt;Low-code or ops-heavy workflows that need a simple phone-friendly approval step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slack escalation / human fallback&lt;/td&gt;
&lt;td&gt;Cases where the model is uncertain and needs review, not necessarily hard approval&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;My take:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you already build agents in Python, use LangGraph.&lt;/li&gt;
&lt;li&gt;If your team lives in n8n, use Wait nodes.&lt;/li&gt;
&lt;li&gt;If the issue is uncertainty rather than danger, use escalation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But for payments, deletes, sends, and account changes, I’d default to explicit approval.&lt;/p&gt;

&lt;h2&gt;
  
  
  Yes, this adds friction. That’s the point.
&lt;/h2&gt;

&lt;p&gt;Developers often treat friction like failure.&lt;/p&gt;

&lt;p&gt;For risky actions, friction is a feature.&lt;/p&gt;

&lt;p&gt;You want a human to see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the exact recipient&lt;/li&gt;
&lt;li&gt;the amount&lt;/li&gt;
&lt;li&gt;the records being deleted&lt;/li&gt;
&lt;li&gt;the before/after diff&lt;/li&gt;
&lt;li&gt;the destination account&lt;/li&gt;
&lt;li&gt;the outbound message text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A 5-second approval pause is much cheaper than a 3-hour cleanup.&lt;/p&gt;

&lt;p&gt;The trick is selective approval, not approval everywhere.&lt;/p&gt;

&lt;p&gt;Do not make a human approve every low-risk classification or summary.&lt;/p&gt;

&lt;p&gt;Do make a human approve things that can cost money, affect users, or break systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  What still goes wrong
&lt;/h2&gt;

&lt;p&gt;This pattern helps a lot, but it doesn’t solve everything.&lt;/p&gt;

&lt;p&gt;The biggest failure mode is bad approval UX.&lt;/p&gt;

&lt;p&gt;If the phone prompt is vague, the human is still approving blind.&lt;/p&gt;

&lt;p&gt;Your approval request should show:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What action will happen&lt;/li&gt;
&lt;li&gt;Who or what it affects&lt;/li&gt;
&lt;li&gt;Any amount, destination, or recipient&lt;/li&gt;
&lt;li&gt;A preview or diff&lt;/li&gt;
&lt;li&gt;Enough context to spot something weird&lt;/li&gt;
&lt;li&gt;A clear audit trail&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the human is approving a fluffy model-generated summary instead of concrete facts, that is not oversight.&lt;/p&gt;

&lt;p&gt;That is just outsourcing the mistake to a smaller screen.&lt;/p&gt;

&lt;h2&gt;
  
  
  This also changes the economics of agent workflows
&lt;/h2&gt;

&lt;p&gt;There’s another effect people miss.&lt;/p&gt;

&lt;p&gt;Once you add a safe approval boundary, teams usually let agents do more real work before the final click.&lt;/p&gt;

&lt;p&gt;That means more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;planning&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;context gathering&lt;/li&gt;
&lt;li&gt;tool orchestration&lt;/li&gt;
&lt;li&gt;draft generation&lt;/li&gt;
&lt;li&gt;verification passes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words: more inference.&lt;/p&gt;

&lt;p&gt;And that’s exactly where per-token pricing starts to feel annoying.&lt;/p&gt;

&lt;p&gt;If your agents are running all day across support, ops, billing, routing, and account workflows, you end up optimizing around cost instead of usefulness.&lt;/p&gt;

&lt;p&gt;That’s why flat-rate compute is a much better fit for this style of automation.&lt;/p&gt;

&lt;p&gt;With Standard Compute, you can run OpenAI-compatible agent workflows without babysitting token spend every time the agent needs another reasoning pass, another tool call, or another approval-prep step.&lt;/p&gt;

&lt;p&gt;For teams building in n8n, Make, Zapier, OpenClaw, or custom stacks, that matters a lot more than people admit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Concrete rule I’d use in production
&lt;/h2&gt;

&lt;p&gt;Here’s the rule:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the task is reversible and low-risk, automate it.&lt;/li&gt;
&lt;li&gt;If the task is high-risk or irreversible, pause for approval.&lt;/li&gt;
&lt;li&gt;If the task is ambiguous, escalate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;Simple rules beat fancy agent philosophy.&lt;/p&gt;
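&lt;p&gt;The rule is small enough to write down as code. This is a sketch under my own assumptions: the three boolean flags are hypothetical inputs you would have to compute yourself, ambiguity is checked first, and anything that is not clearly reversible and low-risk falls through to approval.&lt;/p&gt;

```python
from enum import Enum


class Route(Enum):
    AUTOMATE = "automate"
    APPROVE = "pause_for_approval"
    ESCALATE = "escalate"


def route_task(reversible: bool, high_risk: bool, ambiguous: bool) -> Route:
    """Apply the three rules in order; ambiguity is checked first."""
    if ambiguous:
        return Route.ESCALATE
    if reversible and not high_risk:
        return Route.AUTOMATE
    # Anything else (high-risk, irreversible, or both) waits for a human.
    return Route.APPROVE


summary_route = route_task(reversible=True, high_risk=False, ambiguous=False)
refund_route = route_task(reversible=False, high_risk=True, ambiguous=False)
```

&lt;p&gt;Defaulting the leftover cases to approval is a design choice: when the flags disagree, the cheaper mistake is an unnecessary tap on a phone.&lt;/p&gt;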

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;The best pattern I’ve found for risky agent work is not “make the model smarter until trust appears.”&lt;/p&gt;

&lt;p&gt;It’s “make the boundary sharper.”&lt;/p&gt;

&lt;p&gt;Put the checkpoint right before the side effect.&lt;/p&gt;

&lt;p&gt;Let the agent think.&lt;/p&gt;

&lt;p&gt;Let the human approve.&lt;/p&gt;

&lt;p&gt;If I were designing a production workflow today in LangGraph, n8n, OpenClaw, Make, or Zapier, phone approval for risky actions would be a default primitive.&lt;/p&gt;

&lt;p&gt;Not an enterprise add-on.&lt;/p&gt;

&lt;p&gt;Not a future enhancement.&lt;/p&gt;

&lt;p&gt;A default.&lt;/p&gt;

&lt;p&gt;Because that’s the first agent pattern I’ve used that feels like something I’d actually trust on a Tuesday.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>n8n</category>
      <category>python</category>
    </item>
    <item>
      <title>If your agent touches health data, do the boring part first</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Thu, 11 Jun 2026 09:37:22 +0000</pubDate>
      <link>https://dev.to/lars_winstand/if-your-agent-touches-health-data-do-the-boring-part-first-3h6f</link>
      <guid>https://dev.to/lars_winstand/if-your-agent-touches-health-data-do-the-boring-part-first-3h6f</guid>
      <description>&lt;p&gt;I’ll say it plainly: the first health-adjacent agent workflow I’d trust is not an AI doctor.&lt;/p&gt;

&lt;p&gt;It’s a narrow pipeline that takes 6 months of Apple Watch sleep data, cleans timestamps, maps records into a fixed sleep-diary schema, flags broken rows, and stops for human review before anything reaches a clinician.&lt;/p&gt;

&lt;p&gt;That sounds unsexy.&lt;/p&gt;

&lt;p&gt;Good.&lt;/p&gt;

&lt;p&gt;That’s exactly why it’s the first version I’d trust.&lt;/p&gt;

&lt;p&gt;I landed on this after reading a post on r/openclaw where someone said they had their AI assistant turn months of Apple Watch sleep data into the diary their sleep clinic requested, and the data gotchas were brutal.&lt;/p&gt;

&lt;p&gt;That sentence contains the whole product.&lt;/p&gt;

&lt;p&gt;Not “AI healthcare.”&lt;br&gt;
Not “autonomous wellness.”&lt;br&gt;
Not a GPT-5 wrapper with a soothing UI pretending it understands sleep medicine.&lt;/p&gt;

&lt;p&gt;Just a very practical engineering problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;parse ugly export data&lt;/li&gt;
&lt;li&gt;normalize time boundaries&lt;/li&gt;
&lt;li&gt;fit it into a clinician-friendly format&lt;/li&gt;
&lt;li&gt;fail loudly on bad rows&lt;/li&gt;
&lt;li&gt;require a human to approve it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a real use case.&lt;/p&gt;

&lt;p&gt;And if you build automations in n8n, Make, Zapier, OpenClaw, or Python, it should feel familiar: the hard part is not the final prompt. The hard part is the ugly middle.&lt;/p&gt;
&lt;h2&gt;
  
  
  The hard part is ETL, not reasoning
&lt;/h2&gt;

&lt;p&gt;Most health-agent demos skip the only part that matters.&lt;/p&gt;

&lt;p&gt;They show the polished summary. They show Claude or GPT-5 saying something calm and articulate. They show a dashboard.&lt;/p&gt;

&lt;p&gt;I don’t think that’s the hard part.&lt;/p&gt;

&lt;p&gt;The hard part is ETL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extraction&lt;/li&gt;
&lt;li&gt;transformation&lt;/li&gt;
&lt;li&gt;loading&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For sleep data, that means dealing with stuff like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;timestamps crossing midnight&lt;/li&gt;
&lt;li&gt;timezone normalization&lt;/li&gt;
&lt;li&gt;naps vs overnight sleep&lt;/li&gt;
&lt;li&gt;missing start or end times&lt;/li&gt;
&lt;li&gt;overlapping intervals&lt;/li&gt;
&lt;li&gt;gaps from the device not recording&lt;/li&gt;
&lt;li&gt;clinic-specific diary formats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you get any of that wrong, the model summary at the end is not helpful. It is actively misleading.&lt;/p&gt;

&lt;p&gt;That’s why I think the boring pipeline is the real product.&lt;/p&gt;
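&lt;p&gt;To show how small and fiddly these rules are, here is a sketch of just one of them: deciding which diary night a sleep interval belongs to when it may start after midnight. The noon cutoff, the &lt;code&gt;diary_date&lt;/code&gt; name, and the fixed UTC-5 offset are all assumptions for illustration. A real pipeline would use the clinic’s own convention and a named timezone, and would handle naps separately.&lt;/p&gt;

```python
from datetime import date, datetime, timedelta, timezone, tzinfo


def diary_date(start: datetime, local_tz: tzinfo, cutoff_hour: int = 12) -> date:
    """Assign a sleep interval to the night it belongs to.

    The start is converted to local time; a start before the cutoff hour
    (noon by default) counts toward the previous calendar day's night.
    """
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    local = start.astimezone(local_tz)
    if cutoff_hour > local.hour:
        return (local - timedelta(days=1)).date()
    return local.date()


clinic_tz = timezone(timedelta(hours=-5))  # hypothetical fixed offset
after_midnight = datetime(2026, 3, 2, 5, 30, tzinfo=timezone.utc)   # 00:30 local
before_midnight = datetime(2026, 3, 2, 3, 30, tzinfo=timezone.utc)  # 22:30 local, previous day

same_night = diary_date(after_midnight, clinic_tz) == diary_date(before_midnight, clinic_tz)
```

&lt;p&gt;Both intervals land on the same diary night even though their UTC timestamps share a calendar date that differs from it, which is exactly the kind of detail a model summary would quietly get wrong.&lt;/p&gt;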
&lt;h2&gt;
  
  
  The workflow I’d actually ship
&lt;/h2&gt;

&lt;p&gt;If I had to build this today, I would keep the architecture aggressively narrow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Apple Health export
  -&amp;gt; parse sleep records
  -&amp;gt; validate schema
  -&amp;gt; normalize timestamps and diary dates
  -&amp;gt; map into fixed sleep-diary format
  -&amp;gt; optionally generate plain-language notes
  -&amp;gt; require human approval
  -&amp;gt; export for clinic submission
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;No diagnosis.&lt;br&gt;
No treatment suggestions.&lt;br&gt;
No “you may have a circadian disorder” nonsense.&lt;/p&gt;

&lt;p&gt;Just a structured transformation pipeline with a review gate.&lt;/p&gt;
&lt;h2&gt;
  
  
  A practical implementation shape
&lt;/h2&gt;

&lt;p&gt;Here’s how I’d break it up in code or in an automation builder.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Parse the export deterministically
&lt;/h3&gt;

&lt;p&gt;Don’t ask an LLM to parse Apple Health exports if you can avoid it.&lt;/p&gt;

&lt;p&gt;Use deterministic code first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;csv&lt;/span&gt;

&lt;span class="n"&gt;REQUIRED_FIELDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_sleep_rows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;missing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;REQUIRED_FIELDS&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;row&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing fields: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;row&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bad timestamp: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;row&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end must be after start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is boring code.&lt;/p&gt;

&lt;p&gt;That’s the point.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Normalize diary boundaries explicitly
&lt;/h3&gt;

&lt;p&gt;Sleep data is annoying because humans think in nights, not rows.&lt;/p&gt;

&lt;p&gt;A sleep segment from 11:42 PM to 6:18 AM belongs to one sleep episode, but it spans two calendar dates.&lt;/p&gt;

&lt;p&gt;You need a rule.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;diary_date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Example rule: assign sleep to the date it started,
&lt;/span&gt;    &lt;span class="c1"&gt;# unless start time is before 6 AM, then assign to previous day.
&lt;/span&gt;    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hour&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;date&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You may choose a different rule.&lt;/p&gt;

&lt;p&gt;What matters is that the rule is explicit, testable, and visible to the reviewer.&lt;/p&gt;
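
&lt;p&gt;One way to keep the rule testable is to pin it down with a few assertions. This is a minimal sketch, assuming the same example rule as above; the &lt;code&gt;cutoff_hour&lt;/code&gt; parameter is a hypothetical addition for illustration.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def diary_date(record, cutoff_hour=6):
    """Assign a sleep record to a diary date (hypothetical example rule).

    Sleep that starts before cutoff_hour counts toward the previous
    calendar day; everything else counts toward the day it started.
    """
    start = record["start"]
    if cutoff_hour > start.hour:
        return (start.date() - timedelta(days=1)).isoformat()
    return start.date().isoformat()


# Started late evening: stays on its own calendar date.
evening = {"start": datetime.fromisoformat("2026-02-14T23:42:00")}
# Started after midnight: belongs to the previous night.
after_midnight = {"start": datetime.fromisoformat("2026-02-15T02:07:00")}
# Exactly at the cutoff: treated as a new day.
at_cutoff = {"start": datetime.fromisoformat("2026-02-15T06:00:00")}

assert diary_date(evening) == "2026-02-14"
assert diary_date(after_midnight) == "2026-02-14"
assert diary_date(at_cutoff) == "2026-02-15"
```

&lt;p&gt;If the clinic wants a different cutoff, the change is one argument and one updated assertion, and the reviewer can read both.&lt;/p&gt;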

&lt;h3&gt;
  
  
  Step 3: Validate before any model sees the data
&lt;/h3&gt;

&lt;p&gt;This is where most “agent” demos get sloppy.&lt;/p&gt;

&lt;p&gt;If rows overlap, if timestamps are missing, if timezone conversion changed the diary date, that should be surfaced before GPT-5, Claude, or any other model writes a nice paragraph about it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_intervals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;curr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;curr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;overlap detected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;previous&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;curr&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If validation fails, stop.&lt;/p&gt;

&lt;p&gt;Not “best effort.”&lt;br&gt;
Not “the model can probably infer it.”&lt;/p&gt;

&lt;p&gt;Stop.&lt;/p&gt;
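
&lt;p&gt;A fail-closed gate can be sketched in a few lines. This assumes records shaped like the parsed rows above, with comparable &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt; datetimes; &lt;code&gt;ValidationFailed&lt;/code&gt;, &lt;code&gt;find_overlaps&lt;/code&gt;, and &lt;code&gt;require_clean&lt;/code&gt; are hypothetical names. Unlike a neighbor-only comparison, this variant tracks the latest end seen so far, so a long segment that swallows several later ones is reported against each of them.&lt;/p&gt;

```python
from datetime import datetime


class ValidationFailed(Exception):
    """Raised so the pipeline halts instead of handing bad data to a model."""


def find_overlaps(records):
    """Return one error per record that starts before an earlier one ended."""
    ordered = sorted(records, key=lambda r: r["start"])
    errors = []
    latest = None  # the earlier record with the furthest end so far
    for curr in ordered:
        if latest is not None and latest["end"] > curr["start"]:
            errors.append({"error": "overlap detected",
                           "previous": latest, "current": curr})
        if latest is None or curr["end"] > latest["end"]:
            latest = curr
    return errors


def require_clean(records):
    """Return the records unchanged, or raise if any overlap was found."""
    errors = find_overlaps(records)
    if errors:
        raise ValidationFailed(f"{len(errors)} overlap(s); refusing to continue")
    return records


def seg(start, end):
    """Build a record from two ISO 8601 strings (test helper)."""
    return {"start": datetime.fromisoformat(start),
            "end": datetime.fromisoformat(end)}


night = seg("2026-02-14T23:00:00", "2026-02-15T06:00:00")
nap_a = seg("2026-02-15T00:10:00", "2026-02-15T00:40:00")
nap_b = seg("2026-02-15T02:00:00", "2026-02-15T02:30:00")

# Both short segments sit inside the long one, so both are reported.
assert len(find_overlaps([night, nap_a, nap_b])) == 2

try:
    require_clean([night, nap_a, nap_b])
except ValidationFailed as exc:
    print(f"stopped: {exc}")
```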
&lt;h2&gt;
  
  
  Where an LLM actually helps
&lt;/h2&gt;

&lt;p&gt;I’m not arguing against LLMs.&lt;/p&gt;

&lt;p&gt;I’m arguing for using them in the one place they’re actually useful here: turning already-clean structured data into readable notes.&lt;/p&gt;

&lt;p&gt;Example prompt shape:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"diary_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-14"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sleep_start"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-14T23:42:00-08:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sleep_end"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-15T06:18:00-08:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"total_sleep_minutes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;396&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"awakenings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"missing_fields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then ask the model for something narrow:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write a short plain-language sleep diary note using only the provided fields.
Do not infer diagnosis.
Do not add medical advice.
If fields are missing, say that explicitly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a good LLM task.&lt;/p&gt;

&lt;p&gt;Freeform parsing of raw health exports is not.&lt;/p&gt;
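
&lt;p&gt;To make that boundary concrete, here is a minimal sketch of assembling the request from the validated record. It assumes the chat-style message list that OpenAI-compatible endpoints accept; &lt;code&gt;build_messages&lt;/code&gt; and &lt;code&gt;ALLOWED_FIELDS&lt;/code&gt; are hypothetical names, and no network call is made.&lt;/p&gt;

```python
import json

# Instructions copied from the narrow task above.
INSTRUCTIONS = (
    "Write a short plain-language sleep diary note using only the provided fields.\n"
    "Do not infer diagnosis.\n"
    "Do not add medical advice.\n"
    "If fields are missing, say that explicitly."
)

# Hypothetical allowlist: only these keys ever reach the model.
ALLOWED_FIELDS = (
    "diary_date", "sleep_start", "sleep_end",
    "total_sleep_minutes", "awakenings", "missing_fields",
)


def build_messages(record):
    """Build chat messages from an already-validated diary record."""
    unexpected = sorted(set(record) - set(ALLOWED_FIELDS))
    if unexpected:
        raise ValueError(f"unexpected fields: {unexpected}")
    payload = {key: record.get(key) for key in ALLOWED_FIELDS}
    return [
        {"role": "system", "content": INSTRUCTIONS},
        {"role": "user", "content": json.dumps(payload, indent=2)},
    ]


record = {
    "diary_date": "2026-02-14",
    "sleep_start": "2026-02-14T23:42:00-08:00",
    "sleep_end": "2026-02-15T06:18:00-08:00",
    "total_sleep_minutes": 396,
    "awakenings": 2,
    "missing_fields": [],
}

messages = build_messages(record)
assert json.loads(messages[1]["content"]) == record
```

&lt;p&gt;The allowlist is the design choice that matters here: raw export rows cannot leak into the prompt by accident, because anything outside the schema raises before a request is built.&lt;/p&gt;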

&lt;h2&gt;
  
  
  If you’re building this in n8n, Make, Zapier, or OpenClaw
&lt;/h2&gt;

&lt;p&gt;The pattern is the same no matter what stack you use.&lt;/p&gt;

&lt;h3&gt;
  
  
  n8n shape
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Webhook / File Trigger
-&amp;gt; Code node: parse export
-&amp;gt; IF node: validation errors?
-&amp;gt; Code node: normalize diary schema
-&amp;gt; OpenAI-compatible chat node: plain-language note generation
-&amp;gt; Human approval step
-&amp;gt; Export to CSV / email / EHR-compatible handoff
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Make shape
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Watch files
-&amp;gt; Parse JSON/XML/CSV
-&amp;gt; Router for validation failures
-&amp;gt; Transform records
-&amp;gt; LLM module for summary text
-&amp;gt; Approval scenario
-&amp;gt; Final export
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Custom Python shape
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python ingest.py &lt;span class="nt"&gt;--input&lt;/span&gt; apple_health_export.xml
python validate.py &lt;span class="nt"&gt;--input&lt;/span&gt; parsed_sleep.json
python normalize.py &lt;span class="nt"&gt;--input&lt;/span&gt; validated_sleep.json
python summarize.py &lt;span class="nt"&gt;--input&lt;/span&gt; diary.json
python export.py &lt;span class="nt"&gt;--input&lt;/span&gt; reviewed_diary.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The stack is not the interesting part.&lt;/p&gt;

&lt;p&gt;The boundary design is.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why multi-agent health workflows make me nervous
&lt;/h2&gt;

&lt;p&gt;I like agents. I build with them.&lt;/p&gt;

&lt;p&gt;I still think people reach for multi-agent setups way too early.&lt;/p&gt;

&lt;p&gt;If your workflow touches clinician-facing paperwork, every extra agent is another place for state to drift, retries to multiply, and outputs to become harder to audit.&lt;/p&gt;

&lt;p&gt;For this class of workflow, I want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one parser&lt;/li&gt;
&lt;li&gt;one validator&lt;/li&gt;
&lt;li&gt;one formatter&lt;/li&gt;
&lt;li&gt;one optional LLM summarizer&lt;/li&gt;
&lt;li&gt;one human reviewer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s enough.&lt;/p&gt;

&lt;p&gt;If you need three agents debating whether a 2:07 AM sleep segment belongs to Tuesday or Wednesday, your architecture is already too clever.&lt;/p&gt;

&lt;h2&gt;
  
  
  The cost trap shows up fast
&lt;/h2&gt;

&lt;p&gt;This is also where pricing matters more than people admit.&lt;/p&gt;

&lt;p&gt;A real workflow like this does not run once.&lt;/p&gt;

&lt;p&gt;It gets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tested on partial exports&lt;/li&gt;
&lt;li&gt;rerun after schema fixes&lt;/li&gt;
&lt;li&gt;retried after validation failures&lt;/li&gt;
&lt;li&gt;regenerated after human feedback&lt;/li&gt;
&lt;li&gt;replayed when formatting changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means lots of repeated calls.&lt;/p&gt;

&lt;p&gt;And if you’re paying per token every time the workflow loops, the architecture starts punishing you for being careful.&lt;/p&gt;

&lt;p&gt;That’s one reason I think flat-rate inference is underrated for production automations.&lt;/p&gt;

&lt;p&gt;If you’re using an OpenAI-compatible endpoint from Standard Compute, you can keep the same client setup while routing requests behind the scenes and avoid designing around token anxiety.&lt;/p&gt;

&lt;p&gt;That changes behavior.&lt;/p&gt;

&lt;p&gt;Teams are more willing to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add validation passes&lt;/li&gt;
&lt;li&gt;split deterministic steps from model steps&lt;/li&gt;
&lt;li&gt;retry safely&lt;/li&gt;
&lt;li&gt;keep human review in the loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a better engineering incentive than “please make fewer calls because finance is watching.”&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture rule that matters
&lt;/h2&gt;

&lt;p&gt;Here’s the rule I’d use beyond health data too:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;deterministic preprocessing first, model summarization second&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That applies to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sleep diaries&lt;/li&gt;
&lt;li&gt;invoices&lt;/li&gt;
&lt;li&gt;support tickets&lt;/li&gt;
&lt;li&gt;compliance forms&lt;/li&gt;
&lt;li&gt;CRM cleanup&lt;/li&gt;
&lt;li&gt;any workflow where bad structure upstream creates fake confidence downstream&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The safer pattern is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;parse&lt;/li&gt;
&lt;li&gt;validate&lt;/li&gt;
&lt;li&gt;normalize&lt;/li&gt;
&lt;li&gt;structure&lt;/li&gt;
&lt;li&gt;summarize&lt;/li&gt;
&lt;li&gt;review&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A lot of teams still do the reverse. They dump messy input into a model and hope the model invents structure on the way out.&lt;/p&gt;

&lt;p&gt;That works right up until the workflow matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’d require before calling this safe enough
&lt;/h2&gt;

&lt;p&gt;A few non-negotiables.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Separate source-derived fields from model-generated text
&lt;/h3&gt;

&lt;p&gt;If a timestamp came from Apple Health, label it as source-derived.&lt;/p&gt;

&lt;p&gt;If a sentence came from GPT-5 or Claude, label it as model-generated.&lt;/p&gt;

&lt;p&gt;Those are not the same thing.&lt;/p&gt;
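
&lt;p&gt;One way to enforce that separation is to carry provenance on every value. A minimal sketch, assuming a hypothetical &lt;code&gt;Field&lt;/code&gt; wrapper and two illustrative label strings:&lt;/p&gt;

```python
from dataclasses import dataclass

SOURCE = "source-derived"
MODEL = "model-generated"


@dataclass(frozen=True)
class Field:
    """A single diary value plus where it came from (hypothetical wrapper)."""
    value: str
    provenance: str
    origin: str  # e.g. the export file or the model name


def render(entry):
    """Render each field with its provenance visible to the reviewer."""
    return [
        f"{name}: {field.value} [{field.provenance}, {field.origin}]"
        for name, field in entry.items()
    ]


entry = {
    "sleep_start": Field("2026-02-14T23:42:00-08:00", SOURCE, "Apple Health export"),
    "sleep_end": Field("2026-02-15T06:18:00-08:00", SOURCE, "Apple Health export"),
    "note": Field("Slept about six and a half hours with two awakenings.",
                  MODEL, "LLM summarizer"),
}

for line in render(entry):
    print(line)
```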

&lt;h3&gt;
  
  
  2. Broken rows fail loudly
&lt;/h3&gt;

&lt;p&gt;Missing start time? Reject it.&lt;/p&gt;

&lt;p&gt;Overlapping intervals? Flag them.&lt;/p&gt;

&lt;p&gt;Timezone normalization changed the diary date? Show it.&lt;/p&gt;

&lt;p&gt;Silently smoothing over bad data is exactly how trust gets destroyed.&lt;/p&gt;
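
&lt;p&gt;The timezone case is the easiest one to miss, so here is a minimal sketch of surfacing it. It assumes timestamps carry a UTC offset and uses UTC as the default target zone; &lt;code&gt;date_shift&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
from datetime import datetime, timezone


def date_shift(timestamp, target_tz=timezone.utc):
    """Return a warning dict if converting zones changes the calendar date.

    Returns None when the date is unchanged. Rejects naive timestamps,
    because without an offset there is nothing to normalize from.
    """
    original = datetime.fromisoformat(timestamp)
    if original.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {timestamp}")
    converted = original.astimezone(target_tz)
    if converted.date() == original.date():
        return None
    return {
        "warning": "timezone normalization changed the calendar date",
        "original": original.isoformat(),
        "normalized": converted.isoformat(),
    }


# 11:42 PM at UTC-8 is already the next day in UTC.
shift = date_shift("2026-02-14T23:42:00-08:00")
assert shift is not None
assert shift["normalized"] == "2026-02-15T07:42:00+00:00"

# 6:18 AM at UTC-8 is 2:18 PM UTC on the same date.
assert date_shift("2026-02-15T06:18:00-08:00") is None
```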

&lt;h3&gt;
  
  
  3. Human review is a real gate
&lt;/h3&gt;

&lt;p&gt;Not a decorative checkbox.&lt;/p&gt;

&lt;p&gt;A human should be able to inspect the generated diary against the underlying records before it gets exported.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The workflow must admit uncertainty
&lt;/h3&gt;

&lt;p&gt;Wearable data is messy.&lt;br&gt;
Clinics want different formats.&lt;br&gt;
Some records will be incomplete.&lt;/p&gt;

&lt;p&gt;A good workflow should say “unknown” when something is unknown.&lt;/p&gt;

&lt;p&gt;That is a feature, not a failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The weird part: narrower feels smarter
&lt;/h2&gt;

&lt;p&gt;The more I think about this category, the more I think the best “health agent” barely feels like an agent.&lt;/p&gt;

&lt;p&gt;It feels like a disciplined conveyor belt with one carefully fenced-off language model near the end.&lt;/p&gt;

&lt;p&gt;No fake bedside manner.&lt;br&gt;
No diagnosis theater.&lt;br&gt;
No pretending a summary is the same thing as medical judgment.&lt;/p&gt;

&lt;p&gt;Just a boring pipeline that survives the ugly data problems, produces a structured artifact, and hands it to a human.&lt;/p&gt;

&lt;p&gt;That may sound small.&lt;/p&gt;

&lt;p&gt;I think it’s exactly the right size.&lt;/p&gt;

&lt;p&gt;And honestly, that lesson travels well outside health data too:&lt;/p&gt;

&lt;p&gt;The more sensitive the workflow, the less your automation should improvise.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>healthtech</category>
      <category>devops</category>
    </item>
    <item>
      <title>I thought creative AI needed better prompts, but it actually needed 4-step LLM routing</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Wed, 10 Jun 2026 17:37:44 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-thought-creative-ai-needed-better-prompts-but-it-actually-needed-4-step-llm-routing-5ehc</link>
      <guid>https://dev.to/lars_winstand/i-thought-creative-ai-needed-better-prompts-but-it-actually-needed-4-step-llm-routing-5ehc</guid>
      <description>&lt;p&gt;I keep seeing developers try to build a “creative AI agent” by writing one giant prompt and hoping GPT-5 or Claude Opus can do everything.&lt;/p&gt;

&lt;p&gt;That usually works for 10 minutes.&lt;/p&gt;

&lt;p&gt;Then the real workflow shows up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;research trends&lt;/li&gt;
&lt;li&gt;turn those into a usable brief&lt;/li&gt;
&lt;li&gt;generate mockups&lt;/li&gt;
&lt;li&gt;organize outputs for review&lt;/li&gt;
&lt;li&gt;wait for a human decision&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, the problem is no longer prompting.&lt;br&gt;
It’s routing.&lt;/p&gt;

&lt;p&gt;That clicked for me while reading a small r/openclaw thread from a jewelry designer. The post itself wasn’t huge, but the question was dead-on: they didn’t need more ideas from ChatGPT. They needed an agent that could run more of the workflow.&lt;/p&gt;

&lt;p&gt;That’s the important distinction.&lt;/p&gt;

&lt;p&gt;Most people say they want AI for creativity.&lt;br&gt;
What they actually want is a repeatable pipeline that turns vague inputs into reviewable deliverables.&lt;/p&gt;
&lt;h2&gt;
  
  
  The real gap is not ideation
&lt;/h2&gt;

&lt;p&gt;ChatGPT-style brainstorming feels productive because it gives you instant output.&lt;/p&gt;

&lt;p&gt;Ask for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 product concepts&lt;/li&gt;
&lt;li&gt;a seasonal moodboard direction&lt;/li&gt;
&lt;li&gt;prompt ideas for image generation&lt;/li&gt;
&lt;li&gt;a trend summary from TikTok or Pinterest&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You’ll get something useful.&lt;/p&gt;

&lt;p&gt;But then you still have to do the annoying part:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;check constraints&lt;/li&gt;
&lt;li&gt;create multiple directions&lt;/li&gt;
&lt;li&gt;name files&lt;/li&gt;
&lt;li&gt;sort references&lt;/li&gt;
&lt;li&gt;save assets somewhere sane&lt;/li&gt;
&lt;li&gt;hand the work to a human&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not “creative chatting.”&lt;br&gt;
That is orchestration.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Chat-based brainstorming&lt;/th&gt;
&lt;th&gt;Agent pipeline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Output is mostly ideas&lt;/td&gt;
&lt;td&gt;Output is structured deliverables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State lives in one long conversation&lt;/td&gt;
&lt;td&gt;State lives in tasks, folders, and records&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human role is ad hoc prompting&lt;/td&gt;
&lt;td&gt;Human role is explicit approval&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If a workflow repeats, the answer is usually not “write a better mega-prompt.”&lt;/p&gt;

&lt;p&gt;It’s:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;break the work into stages&lt;/li&gt;
&lt;li&gt;assign the right model to each stage&lt;/li&gt;
&lt;li&gt;make handoffs explicit&lt;/li&gt;
&lt;li&gt;keep a human in the loop&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Why one model keeps disappointing you
&lt;/h2&gt;

&lt;p&gt;Because you’re asking one model to be all of these at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;trend researcher&lt;/li&gt;
&lt;li&gt;creative director&lt;/li&gt;
&lt;li&gt;manufacturing sanity checker&lt;/li&gt;
&lt;li&gt;image prompt writer&lt;/li&gt;
&lt;li&gt;file organizer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not a prompt problem.&lt;br&gt;
That’s bad staffing.&lt;/p&gt;

&lt;p&gt;The useful setup here is model-specific routing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grok for trend search and intake&lt;/li&gt;
&lt;li&gt;Claude Opus for creative reasoning and brief writing&lt;/li&gt;
&lt;li&gt;GPT-5-class image models for mockups&lt;/li&gt;
&lt;li&gt;n8n or Make for storage, naming, and handoff&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A general-purpose model can fake this.&lt;br&gt;
It just tends to do it unevenly and expensively.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Single-model workflow&lt;/th&gt;
&lt;th&gt;Routed workflow&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;One model handles every task&lt;/td&gt;
&lt;td&gt;Each task gets a model that fits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failures are vague&lt;/td&gt;
&lt;td&gt;Failures are isolated by stage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Easy to prototype&lt;/td&gt;
&lt;td&gt;Easier to operate repeatedly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expensive if every step uses the best model&lt;/td&gt;
&lt;td&gt;Cheaper when cheap steps stay cheap&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  My favorite split for this kind of workflow
&lt;/h2&gt;

&lt;p&gt;If I were building this today, I’d split responsibilities like this:&lt;/p&gt;
&lt;h3&gt;
  
  
  1) Grok for trend intake
&lt;/h3&gt;

&lt;p&gt;Use Grok when the task is web-heavy and signal-oriented.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scrape current aesthetic trends&lt;/li&gt;
&lt;li&gt;summarize competitor launches&lt;/li&gt;
&lt;li&gt;collect references from Pinterest/TikTok/blogs&lt;/li&gt;
&lt;li&gt;cluster repeated motifs&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  2) Claude Opus for reasoning and brief writing
&lt;/h3&gt;

&lt;p&gt;Use Claude Opus when the task needs taste, synthesis, and contradiction detection.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;turn trend data into a coherent brief&lt;/li&gt;
&lt;li&gt;identify conflicts like “minimalist but highly ornate”&lt;/li&gt;
&lt;li&gt;map concepts to customer segment or price point&lt;/li&gt;
&lt;li&gt;produce a human-reviewable summary&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3) GPT-5-class image model for visual exploration
&lt;/h3&gt;

&lt;p&gt;Use image generation only after the brief is approved.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;generate prompt variants&lt;/li&gt;
&lt;li&gt;produce mockups for 3-5 directions&lt;/li&gt;
&lt;li&gt;create image batches for review&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  4) n8n or Make for the boring grown-up work
&lt;/h3&gt;

&lt;p&gt;This is where a lot of agent demos fall apart.&lt;/p&gt;

&lt;p&gt;You still need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;file naming&lt;/li&gt;
&lt;li&gt;folder creation&lt;/li&gt;
&lt;li&gt;Airtable or Notion updates&lt;/li&gt;
&lt;li&gt;Google Drive uploads&lt;/li&gt;
&lt;li&gt;Slack notifications&lt;/li&gt;
&lt;li&gt;review gates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is n8n/Make territory, not “just ask the LLM nicely” territory.&lt;/p&gt;
&lt;h2&gt;
  
  
  What the pipeline actually looks like
&lt;/h2&gt;

&lt;p&gt;Here’s the version I’d actually ship.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;main agent
  -&amp;gt; trend search agent (Grok)
  -&amp;gt; brief writer agent (Claude Opus)
  -&amp;gt; constraint checker
  -&amp;gt; image prompt generator
  -&amp;gt; mockup generator (GPT-5-class image model)
  -&amp;gt; output aggregator
  -&amp;gt; n8n/Make workflow for storage and handoff
  -&amp;gt; human approval
  -&amp;gt; optional second pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here’s a more concrete JSON-style representation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"workflow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"step"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trend_search"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"grok"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trend_summary.json"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"step"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"brief_generation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-opus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trend_summary.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"creative_brief.md"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"step"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"constraint_check"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-opus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"creative_brief.md"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"constraints.md"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"step"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mockup_generation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-5-image"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"creative_brief.md"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"constraints.md"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mockups/"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"step"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"handoff"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"n8n"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google_drive + airtable + slack_review"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  OpenClaw for agent loops, n8n for production plumbing
&lt;/h2&gt;

&lt;p&gt;I like OpenClaw for agent delegation and multi-step reasoning.&lt;/p&gt;

&lt;p&gt;I like n8n and Make for deterministic business-process work.&lt;/p&gt;

&lt;p&gt;That split matters.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;OpenClaw-style agent setup&lt;/th&gt;
&lt;th&gt;n8n or Make automation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Best for iterative agent behavior&lt;/td&gt;
&lt;td&gt;Best for explicit workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Good at task delegation&lt;/td&gt;
&lt;td&gt;Good at connectors and state transitions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Great for experimentation&lt;/td&gt;
&lt;td&gt;Better for production handoff&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you try to force OpenClaw to do everything, you end up rebuilding workflow automation badly.&lt;/p&gt;

&lt;p&gt;If you try to force n8n to do all the reasoning, you end up with a brittle maze of prompts.&lt;/p&gt;

&lt;p&gt;Use each tool for what it’s good at.&lt;/p&gt;

&lt;h2&gt;
  
  
  The human has to be in the diagram
&lt;/h2&gt;

&lt;p&gt;This part gets skipped in a lot of “autonomous agent” posts.&lt;/p&gt;

&lt;p&gt;Creative workflows need approval points.&lt;/p&gt;

&lt;p&gt;A human still has to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is this trend relevant to our customer?&lt;/li&gt;
&lt;li&gt;Does this fit the brand?&lt;/li&gt;
&lt;li&gt;Is this manufacturable?&lt;/li&gt;
&lt;li&gt;Which direction deserves another round?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you remove that step, you don’t get autonomy.&lt;br&gt;
You get polished nonsense at scale.&lt;/p&gt;

&lt;p&gt;The right output is not “final design.”&lt;br&gt;
The right output is a clean review package.&lt;/p&gt;

&lt;p&gt;Something like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;trend summary&lt;/li&gt;
&lt;li&gt;design brief&lt;/li&gt;
&lt;li&gt;constraint check&lt;/li&gt;
&lt;li&gt;prompt set&lt;/li&gt;
&lt;li&gt;mockup batch&lt;/li&gt;
&lt;li&gt;organized assets&lt;/li&gt;
&lt;li&gt;human decision&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last step is not failure.&lt;br&gt;
That’s the product.&lt;/p&gt;
&lt;h2&gt;
  
  
  The cost problem is real
&lt;/h2&gt;

&lt;p&gt;This kind of workflow is iterative by default.&lt;/p&gt;

&lt;p&gt;That means cost can explode if every stage uses the most expensive model.&lt;/p&gt;

&lt;p&gt;And this is exactly where teams building agents start feeling token anxiety:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;every retry costs money&lt;/li&gt;
&lt;li&gt;every branch costs money&lt;/li&gt;
&lt;li&gt;every background run costs money&lt;/li&gt;
&lt;li&gt;every automation becomes something you have to monitor financially&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cheap steps should stay cheap.&lt;br&gt;
Expensive models should be reserved for the places where quality actually matters.&lt;/p&gt;

&lt;p&gt;A sane routing pattern looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cheap/local model for classification, labeling, cleanup&lt;/li&gt;
&lt;li&gt;mid-tier model for standard agent tasks&lt;/li&gt;
&lt;li&gt;premium model for synthesis, judgment, or final review&lt;/li&gt;
&lt;/ul&gt;
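
&lt;p&gt;As a rough sketch of that routing pattern in Python: the three tiers come from the list above, while the model names and task labels are placeholders I made up for illustration, not real model IDs.&lt;/p&gt;

```python
# Hypothetical tier-to-model mapping; swap in whatever your provider exposes.
MODEL_BY_TIER = {
    "cheap": "local-small-model",
    "mid": "mid-tier-model",
    "premium": "premium-model",
}

# Task kinds from the routing list, each pinned to the cheapest tier that fits.
TIER_BY_TASK = {
    "classification": "cheap",
    "labeling": "cheap",
    "cleanup": "cheap",
    "agent_task": "mid",
    "synthesis": "premium",
    "judgment": "premium",
    "final_review": "premium",
}


def pick_model(task_kind: str) -> str:
    """Return the model name for a task; unknown tasks fall back to the mid tier."""
    tier = TIER_BY_TASK.get(task_kind, "mid")
    return MODEL_BY_TIER[tier]


print(pick_model("labeling"))
print(pick_model("final_review"))
```

&lt;p&gt;Keeping the mapping in plain data means a pricing change is a one-line edit, not a refactor.&lt;/p&gt;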

&lt;p&gt;That principle matters more than the exact vendor lineup.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example: a practical implementation sketch
&lt;/h2&gt;

&lt;p&gt;Here’s a very stripped-down Python example showing stage routing through an OpenAI-compatible client.&lt;/p&gt;

&lt;p&gt;If you’re using Standard Compute, you keep the OpenAI-compatible API shape and route each stage to a different model, without redesigning your entire app around per-token cost paranoia.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.standardcompute.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_STANDARD_COMPUTE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_trend_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grok-4.20&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find current trend signals and summarize them.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_brief&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trend_summary&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4.6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Turn trend research into a concise creative brief with constraints.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trend_summary&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_mockup_prompts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;brief&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5.4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate image prompts for 4 distinct visual directions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;brief&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
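
&lt;p&gt;One detail worth spelling out: each function above returns the full response object, while the next stage expects plain text. Here’s a minimal sketch of chaining the stages. The helper names (&lt;code&gt;message_text&lt;/code&gt;, &lt;code&gt;run_pipeline&lt;/code&gt;, &lt;code&gt;fake_stage&lt;/code&gt;) are mine, and the fake stages only exist so the chain runs offline without an API key.&lt;/p&gt;

```python
from types import SimpleNamespace


def message_text(response):
    """Pull the assistant text out of a chat completion response."""
    return response.choices[0].message.content


def run_pipeline(topic, search, brief, prompts):
    """Chain the three stages, passing plain text between them."""
    trend_summary = message_text(search(topic))
    creative_brief = message_text(brief(trend_summary))
    return message_text(prompts(creative_brief))


def fake_stage(label):
    """Stand-in for a real API call, mimicking the response shape."""
    def stage(text):
        message = SimpleNamespace(content=f"{label}({text})")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])
    return stage


result = run_pipeline(
    "summer jewelry",
    fake_stage("search"),
    fake_stage("brief"),
    fake_stage("prompts"),
)
print(result)
```

&lt;p&gt;With a real client you would pass &lt;code&gt;run_trend_search&lt;/code&gt;, &lt;code&gt;write_brief&lt;/code&gt;, and &lt;code&gt;generate_mockup_prompts&lt;/code&gt; in place of the fakes.&lt;/p&gt;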



&lt;p&gt;And if you want to test the API with curl:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.standardcompute.com/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$STANDARD_COMPUTE_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "claude-opus-4.6",
    "messages": [
      {"role": "system", "content": "Write a creative brief from trend research."},
      {"role": "user", "content": "Summer jewelry trends: coastal textures, shell forms, brushed silver, soft asymmetry."}
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keeping one API shape across models matters for developers because the best routing strategy is often operationally annoying under normal per-token pricing.&lt;/p&gt;

&lt;p&gt;If your workflow runs every day across n8n, Make, Zapier, OpenClaw, or custom agents, cost predictability becomes part of system design, not just finance.&lt;/p&gt;

&lt;p&gt;That’s the part a lot of AI blog posts skip.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to automate first
&lt;/h2&gt;

&lt;p&gt;Not image generation.&lt;/p&gt;

&lt;p&gt;That’s the flashy trap.&lt;/p&gt;

&lt;p&gt;Start with trend intake and brief generation.&lt;/p&gt;

&lt;p&gt;Why?&lt;br&gt;
Because consistency starts upstream.&lt;/p&gt;

&lt;p&gt;If your inputs are messy, your mockups will just be messy faster and more expensively.&lt;/p&gt;

&lt;p&gt;This is the order I’d use:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;scheduled trend search via Grok or OpenClaw search&lt;/li&gt;
&lt;li&gt;brief generation via Claude Opus&lt;/li&gt;
&lt;li&gt;constraint check against real-world limitations&lt;/li&gt;
&lt;li&gt;prompt set generation for multiple directions&lt;/li&gt;
&lt;li&gt;mockup generation with a GPT-5-class image model&lt;/li&gt;
&lt;li&gt;asset organization in Google Drive, Airtable, or Notion via n8n/Make&lt;/li&gt;
&lt;li&gt;human review gate before second-round exploration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is much less magical than “AI designs my product line.”&lt;/p&gt;

&lt;p&gt;It is also the version that survives contact with production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for developers building agents
&lt;/h2&gt;

&lt;p&gt;The lesson here is bigger than jewelry or design workflows.&lt;/p&gt;

&lt;p&gt;If you’re building AI agents for any repeatable business process, the pattern is the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one model is rarely the best worker for every job&lt;/li&gt;
&lt;li&gt;routing beats mega-prompts&lt;/li&gt;
&lt;li&gt;explicit handoffs beat giant chat histories&lt;/li&gt;
&lt;li&gt;human approval beats fake autonomy&lt;/li&gt;
&lt;li&gt;predictable cost matters if the workflow runs constantly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point is why products like Standard Compute are interesting for agent builders.&lt;/p&gt;

&lt;p&gt;If you’re wiring together OpenClaw, n8n, Make, Zapier, or your own background workers, the hard part is not just getting good outputs.&lt;/p&gt;

&lt;p&gt;It’s getting good outputs repeatedly without turning every automation into a billing event you have to babysit.&lt;/p&gt;

&lt;p&gt;Unlimited AI compute with an OpenAI-compatible API is not just a pricing trick.&lt;br&gt;
It changes what kinds of multi-step agent workflows are practical to run all day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;The useful creative assistant is not the one that gives you more ideas.&lt;/p&gt;

&lt;p&gt;It’s the one that shows up tomorrow with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;research already collected&lt;/li&gt;
&lt;li&gt;a brief already written&lt;/li&gt;
&lt;li&gt;mockups already grouped&lt;/li&gt;
&lt;li&gt;assets already organized&lt;/li&gt;
&lt;li&gt;a clear place for a human to say yes or no&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not better prompting.&lt;br&gt;
That’s better routing.&lt;/p&gt;

&lt;p&gt;And honestly, once you see the difference, it’s hard to go back to one giant chat window pretending to be a workflow.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>automation</category>
      <category>devops</category>
    </item>
    <item>
      <title>I think the best OpenAI API alternative for customer email is a 4-step draft workflow, not an “AI employee”</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Wed, 10 Jun 2026 09:38:26 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-think-the-best-openai-api-alternative-for-customer-email-is-a-4-step-draft-workflow-not-an-ai-e86</link>
      <guid>https://dev.to/lars_winstand/i-think-the-best-openai-api-alternative-for-customer-email-is-a-4-step-draft-workflow-not-an-ai-e86</guid>
      <description>&lt;p&gt;I clicked a Reddit thread because the title was so bad I assumed the post would be useless.&lt;/p&gt;

&lt;p&gt;It was basically: fire your staff, replace them with OpenClaw.&lt;/p&gt;

&lt;p&gt;The post had a score of 0. The comments were roasting it. Fair enough.&lt;/p&gt;

&lt;p&gt;But buried inside the worst possible framing was a genuinely solid pattern for customer email automation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read inbound email&lt;/li&gt;
&lt;li&gt;fetch live pricing / order / inventory data&lt;/li&gt;
&lt;li&gt;draft a reply&lt;/li&gt;
&lt;li&gt;escalate weird cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;Not an AI employee.&lt;br&gt;
Not a digital teammate.&lt;br&gt;
Not a fake support rep with “human-level reasoning.”&lt;/p&gt;

&lt;p&gt;Just a narrow workflow with live system access.&lt;/p&gt;

&lt;p&gt;And honestly, that’s the part a lot of teams miss.&lt;/p&gt;

&lt;p&gt;If you’re building support automation, the useful unit is usually not “replace support.” It’s “automate the boring 30% safely.”&lt;/p&gt;
&lt;h2&gt;
  
  
  The useful idea: shrink the task surface
&lt;/h2&gt;

&lt;p&gt;The line from the thread that actually mattered was this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The hard part is the employee has to look up our system for product pricing, orders, inventory, etc. Now OpenClaw can do all of that with CLI and MCP.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s the whole game.&lt;/p&gt;

&lt;p&gt;The breakthrough is not that OpenClaw became an employee.&lt;br&gt;
The breakthrough is that someone reduced the job to tasks agents can actually do reliably:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classify intent&lt;/li&gt;
&lt;li&gt;look up order status&lt;/li&gt;
&lt;li&gt;look up pricing&lt;/li&gt;
&lt;li&gt;check inventory&lt;/li&gt;
&lt;li&gt;draft a response&lt;/li&gt;
&lt;li&gt;hand off when confidence is low&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That maps cleanly to function calling and MCP.&lt;/p&gt;

&lt;p&gt;It does not map cleanly to “handle customer relationships like a human.”&lt;/p&gt;

&lt;p&gt;That second version is how you get hallucinated discounts, fake shipping updates, and apology emails about orders that never existed.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why MCP makes this much more real
&lt;/h2&gt;

&lt;p&gt;For support, the big failure mode is obvious: the model makes stuff up.&lt;/p&gt;

&lt;p&gt;Pricing is not in the model.&lt;br&gt;
Your ERP is not in the model.&lt;br&gt;
Today’s inventory count is definitely not in the model.&lt;/p&gt;

&lt;p&gt;MCP and function calls fix that by letting the model ask your systems directly.&lt;/p&gt;

&lt;p&gt;For a customer email workflow, that means the model can do this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;read the email&lt;/li&gt;
&lt;li&gt;decide which tools it needs&lt;/li&gt;
&lt;li&gt;call &lt;code&gt;get_order_status(order_id)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;call &lt;code&gt;lookup_price(sku)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;write a draft using actual data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is a completely different reliability profile from “answer from prompt context and vibes.”&lt;/p&gt;
&lt;h2&gt;
  
  
  The safest version does not send anything
&lt;/h2&gt;

&lt;p&gt;This is the rollout pattern I’d recommend to almost everyone:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inbound email comes in&lt;/li&gt;
&lt;li&gt;model classifies it&lt;/li&gt;
&lt;li&gt;tools fetch live data&lt;/li&gt;
&lt;li&gt;model writes a Gmail draft&lt;/li&gt;
&lt;li&gt;human reviews and sends&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stopping at draft creation is the key move.&lt;/p&gt;

&lt;p&gt;It turns a risky automation project into a review workflow.&lt;/p&gt;

&lt;p&gt;That gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;side-by-side comparison with human replies&lt;/li&gt;
&lt;li&gt;an audit trail&lt;/li&gt;
&lt;li&gt;a way to measure accuracy before auto-send&lt;/li&gt;
&lt;li&gt;a clean path to enable automation only for low-risk categories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you skip this phase and go straight to autonomous sending, you’re basically volunteering to debug trust in production.&lt;/p&gt;
&lt;h2&gt;
  
  
  A practical architecture
&lt;/h2&gt;

&lt;p&gt;Here’s the version I’d actually build.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Classify&lt;/td&gt;
&lt;td&gt;Detect intent: order status, pricing question, refund request, inventory check, escalation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Retrieve&lt;/td&gt;
&lt;td&gt;Call Shopify, NetSuite, Postgres, internal APIs, or CLI tools via MCP/function calling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Draft&lt;/td&gt;
&lt;td&gt;Generate a Gmail draft with the retrieved facts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Review / send&lt;/td&gt;
&lt;td&gt;Human approves at first; later auto-send only for safe categories&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
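
&lt;p&gt;Step 4 is the part most worth pinning down in code. A minimal sketch, assuming a hypothetical allowlist of safe categories and made-up action names:&lt;/p&gt;

```python
# Hypothetical allowlist: categories judged safe enough to send without review.
AUTO_SEND_CATEGORIES = {"order_status"}


def next_action(category, auto_send_enabled=False):
    """Decide what happens to a reply once it has been classified and drafted."""
    if category == "escalation":
        # Weird cases always go to a person, whatever the settings say.
        return "handoff_to_human"
    if auto_send_enabled and category in AUTO_SEND_CATEGORIES:
        return "auto_send"
    # Default: stop at the draft and let a human review and send.
    return "save_draft_for_review"


print(next_action("refund_request"))
print(next_action("order_status", auto_send_enabled=True))
```

&lt;p&gt;Auto-send is off by default here, so the draft-only rollout is what you get unless someone deliberately flips the switch.&lt;/p&gt;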

&lt;p&gt;That’s boring.&lt;/p&gt;

&lt;p&gt;Which is why it works.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example: tool-based support drafting
&lt;/h2&gt;

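&lt;p&gt;If you’re using an OpenAI-compatible API, the call shape is straightforward. This sketch follows the Responses-style request (&lt;code&gt;input&lt;/code&gt; plus flat tool entries) and leaves out each tool’s &lt;code&gt;parameters&lt;/code&gt; schema for brevity.&lt;/p&gt;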

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-5.4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_order_status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Return shipping and fulfillment status for an order"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lookup_price"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Return current pricing for a SKU"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"check_inventory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Return current inventory for a SKU"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Customer asks whether order 18422 has shipped and whether SKU-A13 is in stock. Draft a reply."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part is not the exact model name.&lt;/p&gt;

&lt;p&gt;The important part is the contract:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tools return structured facts&lt;/li&gt;
&lt;li&gt;the model drafts from those facts&lt;/li&gt;
&lt;li&gt;the workflow decides whether to escalate&lt;/li&gt;
&lt;/ul&gt;
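
&lt;p&gt;Here’s roughly what that contract looks like on the workflow side. This is a sketch with stubbed lookups: the tool call shape (a name plus a JSON string of arguments) is simplified, and the data, the &lt;code&gt;available&lt;/code&gt; field, and the escalation rule are assumptions for illustration.&lt;/p&gt;

```python
import json


def get_order_status(order_id):
    """Stub lookup; a real version would query the order system."""
    orders = {"18422": {"orderId": "18422", "shipmentStatus": "in_transit"}}
    return orders.get(order_id)


def check_inventory(sku):
    """Stub lookup; a real version would query inventory."""
    stock = {"SKU-A13": {"sku": "SKU-A13", "available": 12}}
    return stock.get(sku)


TOOLS = {"get_order_status": get_order_status, "check_inventory": check_inventory}


def gather_facts(tool_calls):
    """Run each requested tool and collect structured results keyed by tool name."""
    facts = {}
    for call in tool_calls:
        args = json.loads(call["arguments"])
        facts[call["name"]] = TOOLS[call["name"]](**args)
    return facts


def should_escalate(facts):
    """Escalate when any lookup came back empty, so no draft is written from a gap."""
    return any(result is None for result in facts.values())


calls = [
    {"name": "get_order_status", "arguments": json.dumps({"order_id": "18422"})},
    {"name": "check_inventory", "arguments": json.dumps({"sku": "SKU-A13"})},
]
facts = gather_facts(calls)
print(facts, should_escalate(facts))
```

&lt;p&gt;The escalation decision lives in ordinary code, not in the prompt, which is what makes it auditable.&lt;/p&gt;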

&lt;h2&gt;
  
  
  Example function contracts
&lt;/h2&gt;

&lt;p&gt;This is the level of specificity I’d use.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;OrderStatusArgs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;OrderStatusResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;fulfillmentStatus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;unfulfilled&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;partial&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fulfilled&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;shipmentStatus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;not_shipped&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;in_transit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;delivered&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;trackingNumber&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;estimatedDelivery&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;PriceLookupArgs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;customerTier&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;PriceLookupResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;currency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;discountApplied&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your tools return fuzzy text blobs, your drafts will be fuzzy too.&lt;/p&gt;

&lt;p&gt;If your tools return strict structured data, your support pipeline gets much easier to reason about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gmail draft creation is underrated
&lt;/h2&gt;

&lt;p&gt;A lot of teams think support automation has to be all-or-nothing.&lt;/p&gt;

&lt;p&gt;It doesn’t.&lt;/p&gt;

&lt;p&gt;You can create drafts and keep humans in the loop while you evaluate quality.&lt;/p&gt;

&lt;p&gt;Minimal Node example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;google&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;googleapis&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// `auth` is assumed to be an already-authorized OAuth2 client (set up elsewhere) with a Gmail compose scope.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;gmail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gmail&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;toBase64Url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\+&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/=+$/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createDraft&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;`To: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;`Subject: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type: text/plain; charset=utf-8&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;body&lt;/span&gt;
  &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\r\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;toBase64Url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mime&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gmail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;drafts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;me&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;requestBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Writing to Drafts instead of sending changes the rollout strategy completely: nothing reaches a customer until a human has reviewed it.&lt;/p&gt;

&lt;h2&gt;
  
  
  A local service shape I’d actually ship
&lt;/h2&gt;

&lt;p&gt;If I were wiring this up quickly, I’d split it into three services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;support-router/
  src/classify.ts
  src/policy.ts
mcp-tools/
  src/order-status.ts
  src/pricing.ts
  src/inventory.ts
gmail-drafter/
  src/create-draft.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the pipeline becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;incoming email -&amp;gt; classify -&amp;gt; call tools -&amp;gt; generate draft -&amp;gt; human review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;incoming email -&amp;gt; classify -&amp;gt; call tools -&amp;gt; auto-send for safe intents -&amp;gt; escalate everything else
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I would automate first
&lt;/h2&gt;

&lt;p&gt;I would start with the lowest-risk, highest-repeat categories.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Good candidate?&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Order status&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Usually structured, easy to verify, low ambiguity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inventory check&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Pull from a source of truth and answer directly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Basic pricing&lt;/td&gt;
&lt;td&gt;Yes, with guardrails&lt;/td&gt;
&lt;td&gt;Fine if pricing rules are clean and customer-specific exceptions are handled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Refund disputes&lt;/td&gt;
&lt;td&gt;Not first&lt;/td&gt;
&lt;td&gt;Higher risk, policy-heavy, emotional context matters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wholesale account issues&lt;/td&gt;
&lt;td&gt;Not first&lt;/td&gt;
&lt;td&gt;Contract terms and negotiated pricing create failure risk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Angry escalation emails&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Tone and judgment matter more than speed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is the part that gets lost in the “AI employee” pitch.&lt;/p&gt;

&lt;p&gt;Support is not one task.&lt;br&gt;
It’s a pile of tasks with very different risk levels.&lt;/p&gt;

&lt;p&gt;Treating them all the same is a design mistake.&lt;/p&gt;
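&lt;p&gt;One way to encode the table above is a plain lookup that defaults to a human. This is a sketch, not a finished policy engine: the intent labels are hypothetical names for the table rows, and &lt;code&gt;action_for&lt;/code&gt; is an illustrative helper.&lt;/p&gt;

```python
# Hypothetical intent labels; the actions mirror the table of first candidates.
FIRST_ROLLOUT_POLICY = {
    "order_status": "draft",
    "inventory_check": "draft",
    "basic_pricing": "draft",
    "refund_dispute": "escalate",
    "wholesale_account": "escalate",
    "angry_escalation": "escalate",
}


def action_for(intent):
    """Return the action for an intent; anything unlisted goes to a human."""
    return FIRST_ROLLOUT_POLICY.get(intent, "escalate")
```

&lt;p&gt;The default matters more than the entries: an intent the classifier invents, or one nobody has reviewed yet, should never fall through to automation.&lt;/p&gt;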

&lt;h2&gt;
  
  
  The big vendors already chose the boring answer
&lt;/h2&gt;

&lt;p&gt;This is the funny part.&lt;/p&gt;

&lt;p&gt;If you listen to the loudest AI people online, everyone is building autonomous workers.&lt;/p&gt;

&lt;p&gt;If you look at what Intercom and Zendesk actually sell, they’re mostly building scoped support systems with grounding, simulation, and escalation.&lt;/p&gt;

&lt;p&gt;That tells you a lot.&lt;/p&gt;

&lt;p&gt;The market already voted.&lt;/p&gt;

&lt;p&gt;The winning pattern is not “general AI employee.”&lt;br&gt;
It’s “tight workflow with live data and handoff.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost is where DIY gets weird fast
&lt;/h2&gt;

&lt;p&gt;This is also where a lot of agent projects fall apart.&lt;/p&gt;

&lt;p&gt;Support automation sounds cheap until every step hits the most expensive model.&lt;/p&gt;

&lt;p&gt;A sane pipeline uses different models for different jobs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cheap model for intent classification&lt;/li&gt;
&lt;li&gt;retrieval/tool layer for facts&lt;/li&gt;
&lt;li&gt;stronger model for customer-facing drafting&lt;/li&gt;
&lt;li&gt;human review for edge cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s much better than using one giant premium model for every token of every email.&lt;/p&gt;

&lt;p&gt;And this is exactly why OpenAI-compatible routing matters.&lt;/p&gt;

&lt;p&gt;If your code already talks to an OpenAI-style API, you can swap providers or route between models without rebuilding the whole stack.&lt;/p&gt;
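&lt;p&gt;A minimal sketch of what that routing can look like, assuming an OpenAI-style chat completions client. The step names and model identifiers are placeholders I made up for illustration, so swap in whatever your provider actually exposes.&lt;/p&gt;

```python
# Placeholder identifiers: replace with models your provider actually exposes.
MODEL_FOR_STEP = {
    "classify": "small-model-placeholder",
    "draft": "strong-model-placeholder",
}


def model_for(step):
    """Return the model configured for a pipeline step; unknown steps are an error."""
    if step not in MODEL_FOR_STEP:
        raise ValueError(f"no model configured for step {step!r}")
    return MODEL_FOR_STEP[step]


def build_request(step, messages):
    """Build keyword arguments for an OpenAI-style chat completion call."""
    return {"model": model_for(step), "messages": messages}
```

&lt;p&gt;With the OpenAI Python SDK, the result can be passed along as &lt;code&gt;client.chat.completions.create(**build_request("classify", messages))&lt;/code&gt;, so changing which model handles a step is a one-line edit to the mapping.&lt;/p&gt;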

&lt;p&gt;For teams running automations all day in n8n, Make, Zapier, OpenClaw, or custom workers, that flexibility matters a lot.&lt;/p&gt;

&lt;p&gt;Per-token billing punishes experimentation.&lt;br&gt;
It also punishes long-running agent workflows with lots of intermediate steps.&lt;/p&gt;

&lt;p&gt;That’s one reason Standard Compute is interesting here: it gives you an OpenAI-compatible endpoint with flat monthly pricing, so you can run this kind of multi-step workflow without doing token math every five minutes.&lt;/p&gt;

&lt;p&gt;That is a much better fit for agent pipelines than treating every classification, tool call, retry, and draft pass like a billing event you need to babysit.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I’d build with Standard Compute
&lt;/h2&gt;

&lt;p&gt;If I were implementing this stack today with Standard Compute, I’d do something like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep my existing OpenAI SDK client&lt;/li&gt;
&lt;li&gt;point it at Standard Compute’s API&lt;/li&gt;
&lt;li&gt;route cheap classification to a smaller model&lt;/li&gt;
&lt;li&gt;route nuanced draft generation to GPT-5.4 or Claude Opus 4.6&lt;/li&gt;
&lt;li&gt;keep Gmail drafts as the default output&lt;/li&gt;
&lt;li&gt;only auto-send after weeks of parallel review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;predictable monthly cost&lt;/li&gt;
&lt;li&gt;freedom to test routing strategies&lt;/li&gt;
&lt;li&gt;no per-token anxiety while iterating&lt;/li&gt;
&lt;li&gt;compatibility with existing agent/automation code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For support workflows, that’s a real advantage because the architecture is inherently multi-step.&lt;/p&gt;

&lt;h2&gt;
  
  
  A rollout plan that won’t blow up trust
&lt;/h2&gt;

&lt;p&gt;If you want this to work in production, I’d do it in phases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: draft only
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;classify inbound email&lt;/li&gt;
&lt;li&gt;fetch data from systems of record&lt;/li&gt;
&lt;li&gt;create Gmail drafts&lt;/li&gt;
&lt;li&gt;compare draft quality with human replies&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 2: auto-send only for safest intents
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;order status&lt;/li&gt;
&lt;li&gt;inventory checks&lt;/li&gt;
&lt;li&gt;simple pricing questions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 3: confidence-based routing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;auto-send only when tool outputs are complete and confidence is high&lt;/li&gt;
&lt;li&gt;escalate everything else&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 4: continuous evaluation
&lt;/h3&gt;

&lt;p&gt;Track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;draft acceptance rate&lt;/li&gt;
&lt;li&gt;human edit distance&lt;/li&gt;
&lt;li&gt;escalation rate&lt;/li&gt;
&lt;li&gt;incorrect factual statements&lt;/li&gt;
&lt;li&gt;customer satisfaction by category&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you don’t have those metrics, you’re not operating a support automation system.&lt;br&gt;
You’re just hoping.&lt;/p&gt;
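&lt;p&gt;The first two metrics are cheap to approximate once you store each draft next to the reply a human actually sent. The sketch below uses the similarity ratio from Python’s standard &lt;code&gt;difflib&lt;/code&gt; as a rough stand-in for edit distance (it is not Levenshtein distance), and &lt;code&gt;edit_ratio&lt;/code&gt; and &lt;code&gt;summarize&lt;/code&gt; are illustrative names.&lt;/p&gt;

```python
from difflib import SequenceMatcher


def edit_ratio(draft, sent):
    """0.0 means the reply went out unchanged; 1.0 means nothing in common."""
    return 1.0 - SequenceMatcher(None, draft, sent).ratio()


def summarize(pairs):
    """pairs is a list of (draft, sent_reply) tuples, e.g. for one intent category."""
    if not pairs:
        return {"acceptance_rate": 0.0, "mean_edit_ratio": 0.0}
    accepted = sum(1 for draft, sent in pairs if draft == sent)
    total_ratio = sum(edit_ratio(draft, sent) for draft, sent in pairs)
    return {
        "acceptance_rate": accepted / len(pairs),
        "mean_edit_ratio": total_ratio / len(pairs),
    }
```

&lt;p&gt;Here “accepted” means sent with no edits at all, which is a deliberately strict definition. Computing the summary per category is what lets you decide which intents are ready for auto-send.&lt;/p&gt;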

&lt;h2&gt;
  
  
  The main thing I think people get wrong
&lt;/h2&gt;

&lt;p&gt;The mistake is trying to automate all of support at once.&lt;/p&gt;

&lt;p&gt;That usually means automating trust away.&lt;/p&gt;

&lt;p&gt;A customer asking whether order 18422 shipped is not the same problem as a wholesale buyer disputing negotiated pricing.&lt;/p&gt;

&lt;p&gt;One is a retrieval problem.&lt;br&gt;
The other is a judgment problem.&lt;/p&gt;

&lt;p&gt;Good agent systems respect that difference.&lt;br&gt;
Bad ones flatten everything into “the model will handle it.”&lt;/p&gt;

&lt;p&gt;It won’t. Not reliably.&lt;/p&gt;

&lt;h2&gt;
  
  
  My take
&lt;/h2&gt;

&lt;p&gt;The Reddit post had terrible framing.&lt;/p&gt;

&lt;p&gt;But the implementation idea inside it was good.&lt;/p&gt;

&lt;p&gt;The best setup for customer email, whether you stay on OpenAI or move to an OpenAI-compatible alternative, is usually not a full AI employee.&lt;br&gt;
It’s a bounded workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read inbound email&lt;/li&gt;
&lt;li&gt;fetch live data through MCP or function calls&lt;/li&gt;
&lt;li&gt;draft the reply&lt;/li&gt;
&lt;li&gt;escalate edge cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is much less glamorous than “replace your team.”&lt;/p&gt;

&lt;p&gt;It is also much closer to something I’d actually trust in production.&lt;/p&gt;

&lt;p&gt;And if you’re running this as an always-on automation, predictable cost matters almost as much as model quality.&lt;/p&gt;

&lt;p&gt;That’s why the combo I like is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;narrow workflow design&lt;/li&gt;
&lt;li&gt;OpenAI-compatible API calls&lt;/li&gt;
&lt;li&gt;multi-model routing&lt;/li&gt;
&lt;li&gt;draft-first rollout&lt;/li&gt;
&lt;li&gt;flat-cost infrastructure like Standard Compute for the actual agent runtime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not a flashy story.&lt;/p&gt;

&lt;p&gt;It’s just the version that survives contact with real support operations.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>automation</category>
      <category>developers</category>
    </item>
    <item>
      <title>I looked into OpenAI OAuth for OpenClaw, and the scary part isn’t what most people think</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Tue, 09 Jun 2026 17:36:26 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-looked-into-openai-oauth-for-openclaw-and-the-scary-part-isnt-what-most-people-think-39d</link>
      <guid>https://dev.to/lars_winstand/i-looked-into-openai-oauth-for-openclaw-and-the-scary-part-isnt-what-most-people-think-39d</guid>
      <description>&lt;p&gt;A lot of people hit the OpenAI OAuth screen in OpenClaw and immediately assume the worst.&lt;/p&gt;

&lt;p&gt;Reasonable reaction, honestly.&lt;/p&gt;

&lt;p&gt;You see a consent flow, you’re about to wire an agent into real workflows, and your brain jumps straight to: did I just give this thing access to my files, my MCP servers, my apps, my browser, my entire digital life?&lt;/p&gt;

&lt;p&gt;I went down that rabbit hole after seeing a thread on r/openclaw where someone asked basically that exact question: is OpenClaw getting access to everything if I sign in with OpenAI?&lt;/p&gt;

&lt;p&gt;Short version: usually no.&lt;/p&gt;

&lt;p&gt;But the actual problem is still bad enough that teams should care.&lt;/p&gt;

&lt;p&gt;The big mistake is not “OAuth means OpenClaw owns everything.” The big mistake is tying a production agent to a human being’s personal OpenAI identity, billing, and permissions.&lt;/p&gt;

&lt;p&gt;That’s the part I think people are underestimating.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fear is pointed at the wrong layer
&lt;/h2&gt;

&lt;p&gt;When people say “I authenticated OpenClaw with OpenAI,” they often collapse multiple permission layers into one blob.&lt;/p&gt;

&lt;p&gt;That blob usually includes assumptions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw can now access every MCP server I’ve ever configured&lt;/li&gt;
&lt;li&gt;OpenClaw can read local files automatically&lt;/li&gt;
&lt;li&gt;OpenClaw inherits every SaaS connector tied to my account&lt;/li&gt;
&lt;li&gt;OpenClaw can use whatever browser sessions or tools I already have&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s usually not how this works.&lt;/p&gt;

&lt;p&gt;OpenAI OAuth is generally about OpenAI-side identity and model access.&lt;/p&gt;

&lt;p&gt;Tool access is a separate layer.&lt;/p&gt;

&lt;p&gt;If OpenClaw can call a filesystem tool, a browser tool, or an MCP connector, that access usually comes from that tool being configured, exposed, and approved separately.&lt;/p&gt;

&lt;p&gt;So if your concern is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Does signing into OpenAI automatically expose my whole machine and all my connectors?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In most setups, no.&lt;/p&gt;

&lt;p&gt;That’s the good news.&lt;/p&gt;

&lt;h2&gt;
  
  
  What OpenAI OAuth in OpenClaw usually gives you
&lt;/h2&gt;

&lt;p&gt;In practice, the OAuth flow is usually about some combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;your OpenAI identity&lt;/li&gt;
&lt;li&gt;access to a project or account context&lt;/li&gt;
&lt;li&gt;model usage under that account&lt;/li&gt;
&lt;li&gt;billing and rate-limit behavior attached to that account or project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What it usually does &lt;em&gt;not&lt;/em&gt; do by itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;grant local filesystem access&lt;/li&gt;
&lt;li&gt;expose third-party SaaS apps&lt;/li&gt;
&lt;li&gt;inherit every MCP server automatically&lt;/li&gt;
&lt;li&gt;unlock browser sessions&lt;/li&gt;
&lt;li&gt;bypass tool-specific approval layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That distinction matters a lot.&lt;/p&gt;

&lt;p&gt;Model access is one thing.&lt;/p&gt;

&lt;p&gt;Tool execution is another thing.&lt;/p&gt;

&lt;p&gt;If you’re debugging OpenClaw permissions, think in layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Who is paying for and authorizing model calls?&lt;/li&gt;
&lt;li&gt;What tools are exposed to the agent?&lt;/li&gt;
&lt;li&gt;What approvals or guardrails exist for those tools?&lt;/li&gt;
&lt;li&gt;What environment is the agent actually running in?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s a way better mental model than “OAuth = everything.”&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem: personal identity in production
&lt;/h2&gt;

&lt;p&gt;Even if the OAuth scope is narrower than people fear, using your personal OpenAI account for a production agent is still bad architecture.&lt;/p&gt;

&lt;p&gt;Full stop.&lt;/p&gt;

&lt;p&gt;Here’s why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;billing is tied to one person&lt;/li&gt;
&lt;li&gt;usage is mixed with personal experimentation&lt;/li&gt;
&lt;li&gt;rate limits are tied to the wrong identity&lt;/li&gt;
&lt;li&gt;offboarding gets ugly&lt;/li&gt;
&lt;li&gt;password changes can break automation&lt;/li&gt;
&lt;li&gt;ownership is ambiguous when something fails&lt;/li&gt;
&lt;li&gt;debugging becomes political because one person’s account is now infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the actual scary part.&lt;/p&gt;

&lt;p&gt;Not magical permission expansion.&lt;/p&gt;

&lt;p&gt;Operational fragility.&lt;/p&gt;

&lt;p&gt;If your OpenClaw agent is doing customer support, internal research, lead enrichment, ticket triage, or background automations, and it depends on &lt;code&gt;alice@company.com&lt;/code&gt; logging into OpenAI, you don’t have a production system.&lt;/p&gt;

&lt;p&gt;You have a prototype that survived long enough to become a liability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick smell test
&lt;/h2&gt;

&lt;p&gt;If any of these are true, your setup is probably wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;only one employee understands how the agent is authenticated&lt;/li&gt;
&lt;li&gt;the agent breaks if that employee leaves&lt;/li&gt;
&lt;li&gt;billing shows up on someone’s personal card or personal workspace&lt;/li&gt;
&lt;li&gt;nobody can separate test usage from production usage&lt;/li&gt;
&lt;li&gt;you’re afraid to rotate credentials because you’re not sure what else will break&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not an OAuth problem.&lt;/p&gt;

&lt;p&gt;That’s an ownership problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What teams should use instead
&lt;/h2&gt;

&lt;p&gt;If you’re staying on OpenAI directly, use team-grade primitives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;projects&lt;/li&gt;
&lt;li&gt;scoped API keys&lt;/li&gt;
&lt;li&gt;service accounts where possible&lt;/li&gt;
&lt;li&gt;budgets&lt;/li&gt;
&lt;li&gt;rate limits&lt;/li&gt;
&lt;li&gt;restricted permissions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenAI projects already give you better control than “just sign in with my account and ship it.”&lt;/p&gt;

&lt;p&gt;A sane setup looks more like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# bad: one person's personal environment variable on a laptop&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-personal-key-from-someone's-account"&lt;/span&gt;

&lt;span class="c"&gt;# better: project-scoped credential managed for the workflow&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"sk-project-scoped-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And if you’re deploying OpenClaw in containers or automation runners, the credential should belong to the workload, not to a human.&lt;/p&gt;

&lt;p&gt;Example with Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;openclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/example/openclaw:latest&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${OPENAI_API_KEY}&lt;/span&gt;
      &lt;span class="na"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${OPENAI_BASE_URL}&lt;/span&gt;
    &lt;span class="na"&gt;env_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.env&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPENAI_API_KEY=sc_your_workflow_credential_here
OPENAI_BASE_URL=https://api.standardcompute.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the architecture you want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;credential belongs to the workflow&lt;/li&gt;
&lt;li&gt;environment owns the secret&lt;/li&gt;
&lt;li&gt;swapping providers doesn’t require app rewrites&lt;/li&gt;
&lt;li&gt;one employee’s account is no longer your production dependency&lt;/li&gt;
&lt;/ul&gt;
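&lt;p&gt;On the application side, “environment owns the secret” can be as small as a loader that reads both values at startup and refuses to run without them. This is a sketch under the assumption that the two variable names match the &lt;code&gt;.env&lt;/code&gt; example; &lt;code&gt;load_llm_config&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import os

# Variable names follow the .env example; adjust if your deployment differs.
REQUIRED = ("OPENAI_API_KEY", "OPENAI_BASE_URL")


def load_llm_config(env=None):
    """Read credentials from the environment and fail fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return {"api_key": env["OPENAI_API_KEY"], "base_url": env["OPENAI_BASE_URL"]}
```

&lt;p&gt;Failing at startup is the point: a missing or rotated credential shows up as a deploy error instead of a half-working agent, and nothing in the code says whose key it is.&lt;/p&gt;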

&lt;h2&gt;
  
  
  Why cost structure matters more than people admit
&lt;/h2&gt;

&lt;p&gt;There’s a second problem hiding behind the identity issue.&lt;/p&gt;

&lt;p&gt;Per-token billing is a bad fit for agents.&lt;/p&gt;

&lt;p&gt;It’s tolerable for occasional chat.&lt;/p&gt;

&lt;p&gt;It gets weird fast when you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;background loops&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;tool-calling chains&lt;/li&gt;
&lt;li&gt;multi-step reasoning&lt;/li&gt;
&lt;li&gt;scheduled automations&lt;/li&gt;
&lt;li&gt;multiple agents running 24/7&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once an agent becomes useful, it starts consuming more tokens. Success becomes a billing event.&lt;/p&gt;

&lt;p&gt;That’s backwards.&lt;/p&gt;

&lt;p&gt;Developers running OpenClaw, n8n, Make, Zapier, or custom agent stacks usually don’t want to babysit token burn all day. They want a system they can leave running.&lt;/p&gt;

&lt;p&gt;That’s why predictable monthly pricing is operationally cleaner than metered token pricing for automation-heavy workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup I’d actually recommend
&lt;/h2&gt;

&lt;p&gt;If you’re running real automations, I’d split the problem into two decisions:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Fix identity
&lt;/h3&gt;

&lt;p&gt;Use a separate agent or workflow credential.&lt;/p&gt;

&lt;p&gt;Not a personal OpenAI login.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Fix billing
&lt;/h3&gt;

&lt;p&gt;Use an OpenAI-compatible backend with predictable pricing if the workload is continuous.&lt;/p&gt;

&lt;p&gt;That’s where Standard Compute fits well.&lt;/p&gt;

&lt;p&gt;It gives you an OpenAI-compatible endpoint, so existing SDKs and HTTP clients still work, but you’re not tying production agents to per-token billing spikes.&lt;/p&gt;

&lt;p&gt;For teams using OpenClaw, n8n, Make, Zapier, or custom automations, that matters a lot.&lt;/p&gt;

&lt;p&gt;You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;flat monthly pricing&lt;/li&gt;
&lt;li&gt;no per-token billing anxiety&lt;/li&gt;
&lt;li&gt;credentials owned by the workflow, not a person&lt;/li&gt;
&lt;li&gt;compatibility with existing OpenAI client code&lt;/li&gt;
&lt;li&gt;better fit for always-on agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A minimal Python example looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# Read the workflow credential from the environment instead of hardcoding it.&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.standardcompute.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful agent.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize the last 10 support tickets.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the curl version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.standardcompute.com/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$STANDARD_COMPUTE_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Summarize these support tickets"}
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the kind of boring infrastructure choice I like.&lt;/p&gt;

&lt;p&gt;Boring is good.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical checklist for OpenClaw users
&lt;/h2&gt;

&lt;p&gt;If you’re evaluating your current setup, here’s the checklist I’d use.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Good answer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Who owns the credential?&lt;/td&gt;
&lt;td&gt;The team or workflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Who pays for usage?&lt;/td&gt;
&lt;td&gt;A shared project or platform account&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can one employee leaving break the agent?&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Can you rotate the key safely?&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Are tool permissions separate from model auth?&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Are costs predictable for 24/7 automation?&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you answer “one employee” or “not really” to several of those, fix that before you scale usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local testing is different from production
&lt;/h2&gt;

&lt;p&gt;To be fair, local testing is a different standard.&lt;/p&gt;

&lt;p&gt;If you’re just trying OpenClaw on your machine, using the fastest path is fine.&lt;/p&gt;

&lt;p&gt;Prototype quickly.&lt;/p&gt;

&lt;p&gt;Just don’t confuse prototype convenience with production design.&lt;/p&gt;

&lt;p&gt;This is fine for a test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"temporary-dev-key"&lt;/span&gt;
openclaw dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not fine as your long-term deployment story:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# six months later and somehow this is still prod&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"my-personal-key"&lt;/span&gt;
./run-the-agent-that-handles-customer-workflows.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That pattern survives in way too many teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  My takeaway
&lt;/h2&gt;

&lt;p&gt;The OpenAI OAuth screen in OpenClaw is not usually the catastrophe people imagine.&lt;/p&gt;

&lt;p&gt;The bigger issue is simpler and more boring:&lt;/p&gt;

&lt;p&gt;teams are using personal identity where they should be using workload identity.&lt;/p&gt;

&lt;p&gt;And once that agent starts doing real work, per-token pricing becomes the second architectural mistake right behind it.&lt;/p&gt;

&lt;p&gt;So yes, investigate permissions.&lt;/p&gt;

&lt;p&gt;But don’t stop there.&lt;/p&gt;

&lt;p&gt;If you’re serious about OpenClaw in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stop using personal OpenAI logins&lt;/li&gt;
&lt;li&gt;use scoped credentials tied to the workflow&lt;/li&gt;
&lt;li&gt;keep tool permissions separate and explicit&lt;/li&gt;
&lt;li&gt;use predictable-cost infrastructure for always-on agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That advice applies to OpenClaw, n8n, Make, Zapier, and basically every custom agent stack I’ve seen.&lt;/p&gt;

&lt;p&gt;The OAuth screen probably isn’t the thing that should scare you.&lt;/p&gt;

&lt;p&gt;Your architecture probably is.&lt;/p&gt;

</description>
      <category>openai</category>
      <category>oauth</category>
      <category>agents</category>
      <category>devops</category>
    </item>
    <item>
      <title>The moment an OpenClaw prompt should become a skill, script, or n8n job</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Tue, 09 Jun 2026 09:39:15 +0000</pubDate>
      <link>https://dev.to/lars_winstand/the-moment-an-openclaw-prompt-should-become-a-skill-script-or-n8n-job-29oe</link>
      <guid>https://dev.to/lars_winstand/the-moment-an-openclaw-prompt-should-become-a-skill-script-or-n8n-job-29oe</guid>
      <description>&lt;p&gt;I keep seeing the same failure mode in agent builds.&lt;/p&gt;

&lt;p&gt;Someone gets OpenClaw to do something smart once.&lt;/p&gt;

&lt;p&gt;It checks a government page. Classifies a PDF. Rewrites a report. Posts a summary to Discord.&lt;/p&gt;

&lt;p&gt;It works.&lt;/p&gt;

&lt;p&gt;Everyone gets excited.&lt;/p&gt;

&lt;p&gt;Then they leave the whole thing inside one giant prompt for the next three months.&lt;/p&gt;

&lt;p&gt;That’s when the pain starts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the prompt keeps growing&lt;/li&gt;
&lt;li&gt;behavior gets less predictable&lt;/li&gt;
&lt;li&gt;costs stay variable&lt;/li&gt;
&lt;li&gt;debugging gets miserable&lt;/li&gt;
&lt;li&gt;nobody knows which part is actually stable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A lot of agent demos die right there.&lt;/p&gt;

&lt;p&gt;The hard part isn’t getting OpenClaw to do something once.&lt;/p&gt;

&lt;p&gt;The hard part is noticing when the clever prompt should stop being a prompt and become a skill, a script, or an n8n job.&lt;/p&gt;

&lt;p&gt;While researching this, I found a thread on r/openclaw that captured the maturity curve really well. One user described the workflow like this: first prove it’s possible, fumble through it, then turn the lessons into a skill if you need reliability and expect to do it a lot.&lt;/p&gt;

&lt;p&gt;That’s the whole game.&lt;/p&gt;

&lt;h2&gt;
  
  
  My rule of thumb
&lt;/h2&gt;

&lt;p&gt;Use this ladder:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;When to use it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt in chat / system instructions / TOOLS.md&lt;/td&gt;
&lt;td&gt;You’re still discovering the workflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw skill&lt;/td&gt;
&lt;td&gt;You’re repeating the task and want less prompt sprawl&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Script or n8n node&lt;/td&gt;
&lt;td&gt;The step is stable, deterministic, and runs often&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Short version:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prompt when you’re exploring.&lt;/li&gt;
&lt;li&gt;Skill when you’re repeating.&lt;/li&gt;
&lt;li&gt;Code when the behavior is known.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That sounds obvious, but people skip step 2 and delay step 3 for way too long.&lt;/p&gt;

&lt;h2&gt;
  
  
  A prompt is a sketch, not an architecture
&lt;/h2&gt;

&lt;p&gt;A good prompt is a sketch.&lt;/p&gt;

&lt;p&gt;A bad production architecture is also a sketch that nobody admitted was temporary.&lt;/p&gt;

&lt;p&gt;One of the better examples I saw was a workflow that checked fire bans and bulletins from authority websites. That is a perfectly reasonable thing to prototype in OpenClaw chat.&lt;/p&gt;

&lt;p&gt;You need to answer a few questions first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which site matters?&lt;/li&gt;
&lt;li&gt;What page contains the bulletin?&lt;/li&gt;
&lt;li&gt;What counts as a relevant update?&lt;/li&gt;
&lt;li&gt;What output format do you want?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s discovery work. Use the model.&lt;/p&gt;

&lt;p&gt;But once you know the site, the schedule, and the extraction rules, dragging the full reasoning chain through every run is usually the wrong move.&lt;/p&gt;

&lt;p&gt;If the same site gets checked every morning, that’s not a conversation anymore.&lt;/p&gt;

&lt;p&gt;That’s a job.&lt;/p&gt;

&lt;p&gt;And jobs want boring machinery.&lt;/p&gt;

&lt;h2&gt;
  
  
  The first signal: you pasted the same instructions twice
&lt;/h2&gt;

&lt;p&gt;The second I catch myself reusing the same instructions, I consider turning them into an OpenClaw skill.&lt;/p&gt;

&lt;p&gt;Not Python yet.&lt;br&gt;
Not n8n yet.&lt;br&gt;
A skill.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because skills are the middle layer most people underuse.&lt;/p&gt;

&lt;p&gt;They do three useful things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;package repeated behavior&lt;/li&gt;
&lt;li&gt;reduce how much context you resend&lt;/li&gt;
&lt;li&gt;create a reusable interface for a task&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters more than it sounds.&lt;/p&gt;

&lt;p&gt;If you leave repeated instructions in chat or &lt;code&gt;TOOLS.md&lt;/code&gt;, you keep paying context rent. Every run drags the same explanation back into the model.&lt;/p&gt;

&lt;p&gt;A skill narrows that down.&lt;/p&gt;

&lt;p&gt;Instead of this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read the bulletin page. Ignore navigation text. Extract only active fire bans.
Normalize dates to ISO format. If there are no active bans, say NONE.
Return JSON with region, status, effective_date, source_url.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…every single time, you package it once and call the skill.&lt;/p&gt;

&lt;p&gt;That doesn’t make the workflow deterministic, but it does stop the prompt from turning into a landfill.&lt;/p&gt;
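
&lt;p&gt;One way to keep a packaged skill honest is to check its output against the contract in those instructions. Here’s a minimal sketch, assuming the skill returns either the literal string &lt;code&gt;NONE&lt;/code&gt; or a single JSON object with the four fields. &lt;code&gt;parse_skill_output&lt;/code&gt; is a hypothetical helper of mine, not an OpenClaw API, and the sample values are made up.&lt;/p&gt;

```python
import json
from datetime import date

# The four fields named in the instructions above
REQUIRED_FIELDS = ("region", "status", "effective_date", "source_url")


def parse_skill_output(raw):
    """Return None for NONE, otherwise a validated dict of the four fields."""
    text = raw.strip()
    if text == "NONE":
        return None
    data = json.loads(text)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # The instructions ask for ISO dates, so reject anything else early.
    date.fromisoformat(data["effective_date"])
    return {name: data[name] for name in REQUIRED_FIELDS}


# Made-up sample output for illustration
sample = (
    '{"region": "North", "status": "active", '
    '"effective_date": "2026-06-01", "source_url": "https://example.gov/fire-bans"}'
)
print(parse_skill_output(sample))
print(parse_skill_output("NONE"))
```

&lt;p&gt;Anything that fails this check is a signal that the skill still has edge cases to learn, which is exactly the stage skills are meant for.&lt;/p&gt;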

&lt;h2&gt;
  
  
  When a skill is the right answer
&lt;/h2&gt;

&lt;p&gt;I used to think the real choice was prompt vs code.&lt;/p&gt;

&lt;p&gt;I don’t think that anymore.&lt;/p&gt;

&lt;p&gt;OpenClaw skills are useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the task repeats&lt;/li&gt;
&lt;li&gt;the output shape is mostly known&lt;/li&gt;
&lt;li&gt;the input is still messy&lt;/li&gt;
&lt;li&gt;edge cases are still being discovered&lt;/li&gt;
&lt;li&gt;you want lower context overhead without freezing the workflow too early&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the sweet spot for a lot of semi-structured automation work.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extracting fields from ugly PDFs&lt;/li&gt;
&lt;li&gt;classifying inbound support messages&lt;/li&gt;
&lt;li&gt;summarizing inconsistent incident reports&lt;/li&gt;
&lt;li&gt;turning long text into structured JSON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You still want model flexibility.&lt;/p&gt;

&lt;p&gt;You just don’t want to keep re-explaining the job.&lt;/p&gt;

&lt;h2&gt;
  
  
  When code wins
&lt;/h2&gt;

&lt;p&gt;Once a step needs to run the same way every time, I stop being diplomatic.&lt;/p&gt;

&lt;p&gt;Code wins.&lt;/p&gt;

&lt;p&gt;Not because LLMs are bad.&lt;/p&gt;

&lt;p&gt;Because repeated reasoning is wasteful when the rule is already known.&lt;/p&gt;

&lt;p&gt;If your logic is basically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fetch page&lt;/li&gt;
&lt;li&gt;parse HTML&lt;/li&gt;
&lt;li&gt;compare timestamp&lt;/li&gt;
&lt;li&gt;dedupe items&lt;/li&gt;
&lt;li&gt;route based on threshold&lt;/li&gt;
&lt;li&gt;send Slack/Discord/email alert&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…then that’s software, not prompting.&lt;/p&gt;

&lt;p&gt;Here’s a dead simple example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt-shaped solution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Check https://example.gov/fire-bans.
Find the latest active bulletin.
Extract title, date, region, and restriction level.
Compare it with the previous result.
If changed, post a summary to Discord.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Script-shaped solution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bs4&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BeautifulSoup&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timezone&lt;/span&gt;

&lt;span class="n"&gt;URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.gov/fire-bans&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# fail on HTTP errors instead of parsing an error page&lt;/span&gt;
&lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;span class="n"&gt;soup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeautifulSoup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;html.parser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# select_one returns None when a selector stops matching, so a layout change raises AttributeError here&lt;/span&gt;
&lt;span class="n"&gt;bulletin&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;soup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.latest-bulletin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bulletin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;h2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bulletin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;region&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bulletin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bulletin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.restriction-level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;checked_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



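&lt;p&gt;If the page structure is stable, this will beat a prompt on reliability: the same input always produces the same output, and a selector that stops matching raises an error instead of producing a plausible guess.&lt;/p&gt;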
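
&lt;p&gt;The prompt-shaped version also says to compare against the previous result, which the script leaves out. Here’s a minimal sketch of that step, assuming state lives in a local JSON file. The &lt;code&gt;has_changed&lt;/code&gt; helper, the file name, and the sample bulletin are all hypothetical.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path

# Fields that define "the bulletin changed". checked_at is left out on purpose
# because it differs on every run.
COMPARE_KEYS = ("title", "date", "region", "level")


def has_changed(current, state_path):
    """Compare against the last saved result, then save the new one."""
    previous = None
    if state_path.exists():
        previous = json.loads(state_path.read_text(encoding="utf-8"))
    # The very first run has nothing to compare against and counts as a change.
    changed = previous is None or any(
        previous.get(key) != current.get(key) for key in COMPARE_KEYS
    )
    state_path.write_text(json.dumps(current), encoding="utf-8")
    return changed


# Demo with made-up data in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "last_bulletin.json"
    first = {"title": "Total fire ban", "date": "2026-06-01", "region": "North", "level": "3"}
    results = [
        has_changed(first, state_file),                            # no previous state
        has_changed(dict(first, checked_at="later"), state_file),  # same bulletin
        has_changed(dict(first, level="4"), state_file),           # level changed
    ]
    print(results)  # [True, False, True]
```

&lt;p&gt;Only post to Discord when this returns &lt;code&gt;True&lt;/code&gt;, and the model never has to remember what it saw yesterday.&lt;/p&gt;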

&lt;h2&gt;
  
  
  If it runs every day, schedule it
&lt;/h2&gt;

&lt;p&gt;This is another place people make things harder than they need to be.&lt;/p&gt;

&lt;p&gt;They build a workflow that should run every hour, then try to keep an agent alive forever.&lt;/p&gt;

&lt;p&gt;Now they’re dealing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;heartbeats&lt;/li&gt;
&lt;li&gt;session state&lt;/li&gt;
&lt;li&gt;polling loops&lt;/li&gt;
&lt;li&gt;recovery behavior&lt;/li&gt;
&lt;li&gt;weird timeout bugs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a lot of complexity for a job that just needed a schedule.&lt;/p&gt;

&lt;p&gt;If the task is deterministic, scheduling beats perpetual reasoning.&lt;/p&gt;

&lt;h3&gt;
  
  
  n8n already solves this
&lt;/h3&gt;

&lt;p&gt;If you’re using n8n, use &lt;code&gt;Schedule Trigger&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Example workflow shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Schedule Trigger -&amp;gt; HTTP Request -&amp;gt; HTML Extract -&amp;gt; Code -&amp;gt; IF -&amp;gt; Discord
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or if you still need model help for one fuzzy step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Schedule Trigger -&amp;gt; HTTP Request -&amp;gt; LLM extraction -&amp;gt; Code -&amp;gt; Database -&amp;gt; Notification
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A practical cron example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;*&lt;/span&gt;/30 &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That runs every 30 minutes.&lt;/p&gt;

&lt;p&gt;In n8n, you can do the same thing with the built-in trigger UI or a custom cron expression.&lt;/p&gt;

&lt;p&gt;That’s usually better than inventing an always-on agent runtime for no reason.&lt;/p&gt;
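
&lt;p&gt;If you’re unsure what the &lt;code&gt;*/30&lt;/code&gt; minute field expands to: a step value matches every minute divisible by the step. Here’s a tiny sketch of that rule in plain Python. It is not a cron parser, and &lt;code&gt;minutes_matching_step&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
def minutes_matching_step(step):
    """Minutes of the hour matched by a cron minute field of the form */step."""
    return [minute for minute in range(60) if minute % step == 0]


print(minutes_matching_step(30))  # [0, 30]
print(minutes_matching_step(15))  # [0, 15, 30, 45]
```

&lt;p&gt;So the job fires at :00 and :30 of every hour, not 30 minutes after whenever it was installed.&lt;/p&gt;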

&lt;h2&gt;
  
  
  A practical split that actually works
&lt;/h2&gt;

&lt;p&gt;This is the split I recommend for OpenClaw + n8n builds.&lt;/p&gt;

&lt;p&gt;Use the model for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classifying messy text&lt;/li&gt;
&lt;li&gt;extracting data from inconsistent docs&lt;/li&gt;
&lt;li&gt;summarizing unstructured content&lt;/li&gt;
&lt;li&gt;handling edge cases you haven’t fully mapped yet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use code or native nodes for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;date formatting&lt;/li&gt;
&lt;li&gt;validation&lt;/li&gt;
&lt;li&gt;deduplication&lt;/li&gt;
&lt;li&gt;threshold checks&lt;/li&gt;
&lt;li&gt;routing logic&lt;/li&gt;
&lt;li&gt;scheduled polling&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;cleanup you already understand&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you do this well, the model handles ambiguity and the workflow handles everything else.&lt;/p&gt;
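
&lt;p&gt;To make the code side of that split concrete, here’s a minimal sketch of two of those chores, deduplication and date formatting. It assumes each item is a dict with &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;published&lt;/code&gt; keys and that dates arrive as &lt;code&gt;DD/MM/YYYY&lt;/code&gt;. Both field names and the sample data are made up for illustration.&lt;/p&gt;

```python
from datetime import datetime


def normalize_date(value):
    """Convert DD/MM/YYYY to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(value, "%d/%m/%Y").date().isoformat()


def dedupe(items):
    """Keep the first item seen for each id, preserving order."""
    seen = set()
    unique = []
    for item in items:
        if item["id"] in seen:
            continue
        seen.add(item["id"])
        unique.append(item)
    return unique


# Made-up sample data
items = [
    {"id": "a1", "published": "01/06/2026"},
    {"id": "a1", "published": "01/06/2026"},
    {"id": "b2", "published": "09/06/2026"},
]
cleaned = [
    dict(item, published=normalize_date(item["published"]))
    for item in dedupe(items)
]
print(cleaned)
```

&lt;p&gt;None of this needs a model, and none of it will ever behave differently on a Tuesday.&lt;/p&gt;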

&lt;p&gt;That’s a much healthier architecture than asking GPT-5.4 or Claude Opus 4.6 to keep improvising around logic you already know.&lt;/p&gt;

&lt;h2&gt;
  
  
  Structured output is a good bridge
&lt;/h2&gt;

&lt;p&gt;If you’re not ready to move fully to code, at least force structure.&lt;/p&gt;

&lt;p&gt;Schema-constrained output is a good bridge between “LLM did something useful” and “automation can trust this enough to continue.”&lt;/p&gt;

&lt;p&gt;Example pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Bulletin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;effective_date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;source_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-2024-08-06&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract active fire ban details.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...page content here...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Bulletin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
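&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# parsed is a Bulletin instance, or None if the model refused&lt;/span&gt;
&lt;span class="n"&gt;bulletin&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parsed&lt;/span&gt;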
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That won’t replace deterministic logic.&lt;/p&gt;

&lt;p&gt;But it does reduce downstream guesswork and makes it easier to move stable pieces into code later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost matters, but architecture matters more
&lt;/h2&gt;

&lt;p&gt;The obvious argument for moving repeated jobs out of giant prompts is cost.&lt;/p&gt;

&lt;p&gt;And yes, that matters.&lt;/p&gt;

&lt;p&gt;If you’re paying per token, repeated prompt-heavy workflows become something you have to constantly estimate and monitor.&lt;/p&gt;

&lt;p&gt;That’s annoying enough in a prototype.&lt;/p&gt;

&lt;p&gt;It gets worse when the workflow is running all day in n8n, Make, Zapier, OpenClaw, or a custom agent stack.&lt;/p&gt;

&lt;p&gt;At that point, even if caching and batching help, you still have two problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;you’re paying for repeated context&lt;/li&gt;
&lt;li&gt;you’re using a model to guess at steps that are no longer ambiguous&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s why flat-rate, OpenAI-compatible services like Standard Compute are interesting for automation teams.&lt;/p&gt;

&lt;p&gt;You can keep the same SDKs and HTTP clients, but stop treating every scheduled run like a tiny budget event.&lt;/p&gt;

&lt;p&gt;That doesn’t mean “use LLMs for everything.”&lt;/p&gt;

&lt;p&gt;It means when you do need model calls, predictable pricing is a much better fit for always-on automations and agent workflows than constantly watching token burn.&lt;/p&gt;
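
&lt;p&gt;To put rough numbers on the repeated-context problem, here’s a back-of-the-envelope sketch. Every figure in it is an assumption picked for illustration (a 1,500-token prompt resent on a 30-minute schedule), not a measurement from a real workflow.&lt;/p&gt;

```python
# All inputs are illustrative assumptions, not measurements.
PROMPT_TOKENS_PER_RUN = 1_500  # instructions and context resent on every run
RUNS_PER_DAY = 24 * 2          # one run every 30 minutes
DAYS_PER_MONTH = 30

repeated_tokens = PROMPT_TOKENS_PER_RUN * RUNS_PER_DAY * DAYS_PER_MONTH
print(f"{repeated_tokens:,} prompt tokens per month")  # 2,160,000 prompt tokens per month
```

&lt;p&gt;Multiply that by however many workflows share the pattern and by your provider’s input-token price, and you have the part of the bill that buys no new reasoning.&lt;/p&gt;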

&lt;h2&gt;
  
  
  The bookkeeping example is the giveaway
&lt;/h2&gt;

&lt;p&gt;A good boundary test is bookkeeping.&lt;/p&gt;

&lt;p&gt;Bookkeeping is mostly rule-based.&lt;/p&gt;

&lt;p&gt;If OCR fails on a receipt, sure, use a model to classify the merchant or infer a category.&lt;/p&gt;

&lt;p&gt;But once the validation rules, account mappings, and posting logic are known, hiding that logic in prompts is just expensive prose.&lt;/p&gt;

&lt;p&gt;That’s not an agent architecture.&lt;/p&gt;

&lt;p&gt;That’s business logic wearing a chatbot costume.&lt;/p&gt;

&lt;h2&gt;
  
  
  My decision test
&lt;/h2&gt;

&lt;p&gt;When I’m looking at an OpenClaw workflow, I ask four questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Am I still discovering the process?&lt;/li&gt;
&lt;li&gt;Am I repeating the same instructions?&lt;/li&gt;
&lt;li&gt;Does this step need to run the same way every time?&lt;/li&gt;
&lt;li&gt;Does it run on a schedule?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If yes to #1, stay in chat.&lt;/li&gt;
&lt;li&gt;If yes to #2, make a skill.&lt;/li&gt;
&lt;li&gt;If yes to #3, move it toward code.&lt;/li&gt;
&lt;li&gt;If yes to #4, use cron or n8n Schedule Trigger.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the framework.&lt;/p&gt;

&lt;p&gt;Start messy.&lt;br&gt;
Package what repeats.&lt;br&gt;
Code what stabilizes.&lt;/p&gt;
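
&lt;p&gt;If it helps to see the test as code, here’s a minimal sketch that maps the four yes/no answers to the recommendations above. &lt;code&gt;next_step&lt;/code&gt; is a hypothetical name, and the function only restates the list; it adds no logic of its own.&lt;/p&gt;

```python
def next_step(discovering, repeating, must_be_deterministic, scheduled):
    """Map the four yes/no answers to the matching recommendations."""
    advice = []
    if discovering:
        advice.append("stay in chat")
    if repeating:
        advice.append("make a skill")
    if must_be_deterministic:
        advice.append("move it toward code")
    if scheduled:
        advice.append("use cron or n8n Schedule Trigger")
    return advice


# A stable daily check: no longer discovering, same rules every run, on a timer
print(next_step(
    discovering=False,
    repeating=True,
    must_be_deterministic=True,
    scheduled=True,
))
```

&lt;p&gt;More than one answer can be yes at once, which is why this returns a list rather than a single verdict.&lt;/p&gt;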

&lt;h2&gt;
  
  
  Example migration path
&lt;/h2&gt;

&lt;p&gt;Here’s what this looks like in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1: discover in OpenClaw
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Goal: monitor a regulator page and summarize any new enforcement bulletins.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You use OpenClaw chat to figure out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;where the page lives&lt;/li&gt;
&lt;li&gt;what counts as a bulletin&lt;/li&gt;
&lt;li&gt;what fields matter&lt;/li&gt;
&lt;li&gt;what a good summary looks like&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 2: turn repeated extraction into a skill
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Skill: extract_enforcement_bulletin
Input: raw page content
Output: structured bulletin JSON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you’ve reduced prompt sprawl and made the extraction reusable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 3: move stable orchestration into n8n
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Schedule Trigger
-&amp;gt; HTTP Request
-&amp;gt; OpenClaw skill / LLM extraction
-&amp;gt; Code node for dedupe + validation
-&amp;gt; Post to Slack/Discord
-&amp;gt; Save to database
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 4: replace stable model steps with code where possible
&lt;/h3&gt;

&lt;p&gt;If the page format is predictable enough, swap the LLM extraction for deterministic parsing.&lt;/p&gt;

&lt;p&gt;That’s the graduation path.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you only remember one thing
&lt;/h2&gt;

&lt;p&gt;The real skill in agent engineering is not getting OpenClaw to do something impressive once.&lt;/p&gt;

&lt;p&gt;It’s noticing when the impressive part is over.&lt;/p&gt;

&lt;p&gt;That’s the moment to replace prompt cleverness with boring systems on purpose.&lt;/p&gt;

&lt;p&gt;For most OpenClaw + n8n workflows, the path is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use chat to discover the process&lt;/li&gt;
&lt;li&gt;turn repeated work into an OpenClaw skill&lt;/li&gt;
&lt;li&gt;move stable high-frequency steps into Python or an n8n Code node&lt;/li&gt;
&lt;li&gt;schedule the job instead of keeping an agent artificially awake&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once a task runs the same way every day, paying to keep re-explaining it is usually the wrong architecture.&lt;/p&gt;

&lt;p&gt;And if that workflow is running constantly, predictable flat-rate compute is a much better fit than babysitting per-token costs all month.&lt;/p&gt;

&lt;p&gt;That’s the part more teams should optimize for.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>n8n</category>
      <category>devops</category>
    </item>
    <item>
      <title>My Telegram bot stopped replying after OpenClaw 2026.6.1 — it was a full disk, not GPT-5</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Tue, 09 Jun 2026 01:37:12 +0000</pubDate>
      <link>https://dev.to/lars_winstand/my-telegram-bot-stopped-replying-after-openclaw-202661-it-was-a-full-disk-not-gpt-5-41pk</link>
      <guid>https://dev.to/lars_winstand/my-telegram-bot-stopped-replying-after-openclaw-202661-it-was-a-full-disk-not-gpt-5-41pk</guid>
      <description>&lt;p&gt;I love how quickly we all blame the interesting part of the stack.&lt;/p&gt;

&lt;p&gt;Telegram bot goes silent? Must be GPT-5. Or Claude Opus 4.6. Or provider routing. Or some weird prompt regression. Maybe OpenClaw changed how sessions work. Maybe the model had a bad day.&lt;/p&gt;

&lt;p&gt;And then the logs say:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ENOSPC: no space left on device, write
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was the real cause in an OpenClaw 2026.6.1 failure I was looking into this week.&lt;/p&gt;

&lt;p&gt;The visible symptom was classic agent weirdness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Telegram bot not replying&lt;/li&gt;
&lt;li&gt;TUI not producing output&lt;/li&gt;
&lt;li&gt;repeated &lt;code&gt;assistant turn failed before producing content&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;model shown as &lt;code&gt;openai/gpt-5.5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;local runtime at &lt;code&gt;ws://127.0.0.1:18789&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you only looked at the surface, you’d absolutely start by swapping models or debugging prompts.&lt;/p&gt;

&lt;p&gt;Wrong first move.&lt;/p&gt;

&lt;h2&gt;
  
  
  The failure looked like a model problem
&lt;/h2&gt;

&lt;p&gt;The runtime command looked normal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw tui - ws://127.0.0.1:18789 - agent main - session main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The visible error in the UI was vague:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[assistant turn failed before producing content]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the actual failure was much simpler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;run error: ENOSPC: no space left on device, write
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s not GPT-5 failing.&lt;br&gt;
That’s your local runtime hitting the storage layer before the model can return anything.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why this wastes so much debugging time
&lt;/h2&gt;

&lt;p&gt;Agent failures often present at the top of the stack and originate at the bottom.&lt;/p&gt;

&lt;p&gt;When a Telegram bot stops replying, you usually don’t get a nice banner saying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Disk usage: 100%
SQLite writes failing
Session store corrupted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get silence.&lt;/p&gt;

&lt;p&gt;So people do the reasonable thing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retry with GPT-5&lt;/li&gt;
&lt;li&gt;retry with Claude Opus 4.6&lt;/li&gt;
&lt;li&gt;switch providers&lt;/li&gt;
&lt;li&gt;lower temperature&lt;/li&gt;
&lt;li&gt;trim prompts&lt;/li&gt;
&lt;li&gt;blame context windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All valid tests.&lt;br&gt;
Still the wrong first tests if the machine itself is unhealthy.&lt;/p&gt;

&lt;p&gt;Long-running agents are great at slowly creating operational problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;session history grows&lt;/li&gt;
&lt;li&gt;logs grow&lt;/li&gt;
&lt;li&gt;SQLite files grow&lt;/li&gt;
&lt;li&gt;plugin state grows&lt;/li&gt;
&lt;li&gt;Telegram history grows&lt;/li&gt;
&lt;li&gt;local caches grow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re running OpenClaw on a VPS, a tiny cloud box, a home server, or a machine you haven’t checked in months, a full disk is a very normal way to fail.&lt;/p&gt;
&lt;h2&gt;
  
  
  OpenClaw 2026.6.1 seems to expose two classes of problems
&lt;/h2&gt;

&lt;p&gt;The disk issue was the obvious one.&lt;/p&gt;

&lt;p&gt;But it wasn’t the only clue.&lt;/p&gt;

&lt;p&gt;There were also upgrade and state warnings around plugin metadata and SQLite state, including messages like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Left plugin install index in place because shared SQLite state has conflicting plugin install metadata for: codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the kind of warning that tells you freeing disk might not be enough.&lt;/p&gt;

&lt;p&gt;You may also be dealing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;partially migrated local state&lt;/li&gt;
&lt;li&gt;plugin install metadata conflicts&lt;/li&gt;
&lt;li&gt;provider/plugin changes after upgrade&lt;/li&gt;
&lt;li&gt;stale SQLite state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the sequence becomes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;bot stops replying&lt;/li&gt;
&lt;li&gt;you free space&lt;/li&gt;
&lt;li&gt;restart OpenClaw&lt;/li&gt;
&lt;li&gt;it still behaves strangely&lt;/li&gt;
&lt;li&gt;now you blame the model&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Still maybe not the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  The update may not have broken your agent — it may have exposed old mess
&lt;/h2&gt;

&lt;p&gt;One useful detail from OpenClaw 2026.6.1 discussions: provider handling changed from bundled providers to plugins.&lt;/p&gt;

&lt;p&gt;That matters a lot.&lt;/p&gt;

&lt;p&gt;If your config expected one layout and the new version expects plugin installs plus updated config, the symptoms can look like model failure even when the real issue is local runtime setup.&lt;/p&gt;

&lt;p&gt;A practical fix people mentioned was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you upgraded and didn’t run &lt;code&gt;openclaw doctor&lt;/code&gt;, do that before you touch prompts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three boring failures that all look dramatic
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure source&lt;/th&gt;
&lt;th&gt;What it looks like&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage exhaustion (&lt;code&gt;ENOSPC&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Assistant fails before producing content; Telegram goes silent; writes fail in local runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plugin/provider migration issues&lt;/td&gt;
&lt;td&gt;Breakage right after upgrade; doctor warnings; missing plugins; provider config stops matching reality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model/context config mismatch&lt;/td&gt;
&lt;td&gt;Errors like &lt;code&gt;context too large&lt;/code&gt;; execution failures caused by bad config rather than model quality&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is the pattern I think more agent teams need to internalize:&lt;/p&gt;

&lt;p&gt;Check the machine first.&lt;br&gt;
Check local state second.&lt;br&gt;
Check migrations third.&lt;br&gt;
Then start blaming models.&lt;/p&gt;
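&lt;p&gt;That ordering can be written down as a tiny triage function. This is a sketch, not anything OpenClaw ships: the marker strings are the errors quoted in this post, while the class labels and &lt;code&gt;classify_failure&lt;/code&gt; are my own naming.&lt;/p&gt;

```python
# Ordered machine-first: storage, then migration/plugin state, then model config.
# Marker strings come from the errors quoted in this post; the labels are illustrative.
FAILURE_CLASSES = [
    ("storage", ("enospc", "no space left on device")),
    ("migration", ("conflicting plugin install metadata",
                   "missing provider plugin", "migration")),
    ("model_config", ("context too large",)),
]


def classify_failure(log_line):
    """Return the first failure class whose marker appears in the line, else "unknown"."""
    lowered = log_line.lower()
    for label, markers in FAILURE_CLASSES:
        if any(marker in lowered for marker in markers):
            return label
    return "unknown"


if __name__ == "__main__":
    print(classify_failure("run error: ENOSPC: no space left on device, write"))
```

&lt;p&gt;Because the list is checked in order, a line that mentions both a full disk and a migration problem is reported as storage, which is the one to fix first.&lt;/p&gt;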
&lt;h2&gt;
  
  
  What I’d check first when a Telegram bot goes silent
&lt;/h2&gt;

&lt;p&gt;Here’s the order I’d use.&lt;/p&gt;
&lt;h3&gt;
  
  
  1) Check disk space immediately
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;df&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;If you want to find the obvious offenders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;du&lt;/span&gt; &lt;span class="nt"&gt;-sh&lt;/span&gt; ./&lt;span class="k"&gt;*&lt;/span&gt; 2&amp;gt;/dev/null | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or for system-wide pain points:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo du&lt;/span&gt; &lt;span class="nt"&gt;-xh&lt;/span&gt; / | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-h&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-50&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Things worth checking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw session storage&lt;/li&gt;
&lt;li&gt;SQLite database files&lt;/li&gt;
&lt;li&gt;logs&lt;/li&gt;
&lt;li&gt;cache directories&lt;/li&gt;
&lt;li&gt;Telegram-related state&lt;/li&gt;
&lt;li&gt;temp files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you see &lt;code&gt;ENOSPC&lt;/code&gt;, stop debugging prompts. Fix storage first.&lt;/p&gt;
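&lt;p&gt;If you’d rather have the agent host report this itself, the same check fits in a few lines of standard-library Python. This is a sketch: &lt;code&gt;disk_summary&lt;/code&gt; is a hypothetical helper, and which path to inspect depends on where your install keeps its state.&lt;/p&gt;

```python
import shutil


def disk_summary(path="/"):
    """Summarize usage for the filesystem that holds the given path."""
    usage = shutil.disk_usage(path)
    return {
        "used_pct": round(100 * usage.used / usage.total, 1),
        "free_gib": round(usage.free / 2**30, 2),
    }


if __name__ == "__main__":
    # Point this at the directory that holds your agent state if it lives on another volume.
    print(disk_summary("/"))
```

&lt;p&gt;Run it from cron or a health check and alert well before &lt;code&gt;free_gib&lt;/code&gt; gets close to zero.&lt;/p&gt;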

&lt;h3&gt;
  
  
  2) Run OpenClaw doctor
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Especially after upgrading to 2026.6.1 or later.&lt;/p&gt;

&lt;p&gt;If OpenClaw moved providers to plugins and your old config still assumes bundled providers, doctor is likely to tell you faster than trial-and-error will.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) Look for migration and plugin warnings
&lt;/h3&gt;

&lt;p&gt;Search logs for anything involving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;SQLite&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;migration&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;plugin&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;metadata&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;codex&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;provider&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples of the kind of thing that matters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conflicting plugin install metadata
legacy migration behavior
missing provider plugin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If those show up after an upgrade, don’t assume the state store is trustworthy.&lt;/p&gt;
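&lt;p&gt;A small scanner makes that search repeatable. This sketch takes the log as a string because I’m not assuming where your install writes its logs; &lt;code&gt;find_warnings&lt;/code&gt; is a hypothetical helper and the terms are the ones listed above.&lt;/p&gt;

```python
# Search terms taken from the list above; matching is case-insensitive.
TERMS = ("sqlite", "migration", "plugin", "metadata", "codex", "provider")


def find_warnings(log_text, terms=TERMS):
    """Return (line_number, line) pairs for lines that mention any term."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(term in lowered for term in terms):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "gateway ready\nLeft plugin install index in place\nsession main opened"
    for number, line in find_warnings(sample):
        print(number, line)
```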

&lt;h3&gt;
  
  
  4) Verify provider and model config
&lt;/h3&gt;

&lt;p&gt;Make sure the provider plugins you actually installed match what your config references.&lt;/p&gt;

&lt;p&gt;Also verify context settings.&lt;/p&gt;

&lt;p&gt;If OpenClaw thinks a model supports one context size and the provider setup says otherwise, you can get failures that look like model instability but are really config mismatch.&lt;/p&gt;
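&lt;p&gt;The comparison itself is simple once you have both numbers side by side. In this sketch the two dictionaries are stand-ins for "what my config says" and "what the provider reports"; how you obtain them depends on your setup, and the model names and sizes are placeholders.&lt;/p&gt;

```python
def context_mismatches(configured, provider_reported):
    """Return {model: (configured, reported)} for models whose context sizes disagree."""
    return {
        model: (size, provider_reported.get(model))
        for model, size in configured.items()
        if provider_reported.get(model) != size
    }


if __name__ == "__main__":
    # Placeholder names and sizes, purely for illustration.
    configured = {"example/model-a": 200000, "example/model-b": 32768}
    reported = {"example/model-a": 128000, "example/model-b": 32768}
    print(context_mismatches(configured, reported))
```

&lt;p&gt;A model missing from the provider side shows up with &lt;code&gt;None&lt;/code&gt; as its reported size, which is usually the missing-plugin case rather than a sizing problem.&lt;/p&gt;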

&lt;h3&gt;
  
  
  5) Only now test prompts and model selection
&lt;/h3&gt;

&lt;p&gt;Once the machine is healthy and the local state is sane, then it makes sense to compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-5&lt;/li&gt;
&lt;li&gt;Claude Opus 4.6&lt;/li&gt;
&lt;li&gt;Grok 4.20&lt;/li&gt;
&lt;li&gt;Qwen variants&lt;/li&gt;
&lt;li&gt;Llama variants&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also where having an OpenAI-compatible endpoint helps. If your app can switch providers without rewriting your integration, isolating model-vs-runtime issues gets much easier.&lt;/p&gt;
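&lt;p&gt;To make that concrete, here’s a standard-library sketch in which the backend is just a base URL. &lt;code&gt;build_chat_request&lt;/code&gt; is a hypothetical helper, the example URL is a placeholder, and I’m assuming the backend exposes the usual OpenAI-style &lt;code&gt;/chat/completions&lt;/code&gt; route. It builds the request without sending it.&lt;/p&gt;

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) a chat completions request for an OpenAI-compatible backend."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL and key: only these two values change between providers.
    request = build_chat_request("https://api.example.com/v1", "your_key", "some-model", "ping")
    print(request.get_method(), request.full_url)
```

&lt;p&gt;Pass the result to &lt;code&gt;urllib.request.urlopen&lt;/code&gt; when you actually want to send it. If the same request fails against two different backends, the model is probably not your problem.&lt;/p&gt;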

&lt;p&gt;That’s one reason I like the drop-in API approach Standard Compute takes: you can keep your existing OpenAI SDK or HTTP client, swap the backend, and test whether the problem is model routing or your local runtime without rebuilding the app. More importantly, if you’re running agents 24/7, flat-rate compute means you can do that testing without watching token spend every minute.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sometimes it really is the model config
&lt;/h2&gt;

&lt;p&gt;To be fair, not every issue here is disk or migration state.&lt;/p&gt;

&lt;p&gt;There were also reports of &lt;code&gt;context too large&lt;/code&gt; errors after updating.&lt;/p&gt;

&lt;p&gt;That’s real.&lt;/p&gt;

&lt;p&gt;But even then, I’d still classify it as a configuration problem before I’d call it a model problem.&lt;/p&gt;

&lt;p&gt;There’s a big difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Claude got worse”&lt;/li&gt;
&lt;li&gt;“GPT-5 is flaky”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“my runtime registered the wrong context size”&lt;/li&gt;
&lt;li&gt;“my provider plugin setup no longer matches config”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One is model blame.&lt;br&gt;
The other is operations.&lt;/p&gt;

&lt;p&gt;Most of the time, operations wins.&lt;/p&gt;
&lt;h2&gt;
  
  
  Minimal debugging checklist
&lt;/h2&gt;

&lt;p&gt;If I were writing the incident note, it would be this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If Telegram bot stops replying after an OpenClaw update:

1. Check disk space
2. Search logs for ENOSPC
3. Run `openclaw doctor`
4. Inspect migration/plugin warnings
5. Verify provider plugin installation
6. Verify model/context config
7. Only then compare models or prompts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That order saves hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  The unsexy lesson
&lt;/h2&gt;

&lt;p&gt;If your agent dies right after an upgrade, assume boring infrastructure first.&lt;/p&gt;

&lt;p&gt;Not because models never fail.&lt;br&gt;
Because local failures are much more common than people want to admit.&lt;/p&gt;

&lt;p&gt;The smarter the stack gets, the more embarrassing the outages become.&lt;/p&gt;

&lt;p&gt;A Telegram bot running through OpenClaw, talking to &lt;code&gt;openai/gpt-5.5&lt;/code&gt;, connected over &lt;code&gt;ws://127.0.0.1:18789&lt;/code&gt;, can still be taken down by the least glamorous error in computing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;no space left on device
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s good news, honestly.&lt;/p&gt;

&lt;p&gt;Boring problems are fixable.&lt;/p&gt;

&lt;p&gt;And if you’re building long-running agents in OpenClaw, n8n, Make, Zapier, or custom loops, this is the operational habit worth keeping:&lt;/p&gt;

&lt;p&gt;Models second. Machine first.&lt;/p&gt;

&lt;p&gt;If the runtime can’t write to disk, GPT-5 never even gets a chance to be wrong.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ai</category>
      <category>openclaw</category>
      <category>debugging</category>
    </item>
    <item>
      <title>The first browser-agent workflow teams will actually run at scale is way smaller than the demos</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Mon, 08 Jun 2026 17:39:23 +0000</pubDate>
      <link>https://dev.to/lars_winstand/the-first-browser-agent-workflow-teams-will-actually-run-at-scale-is-way-smaller-than-the-demos-i9l</link>
      <guid>https://dev.to/lars_winstand/the-first-browser-agent-workflow-teams-will-actually-run-at-scale-is-way-smaller-than-the-demos-i9l</guid>
      <description>&lt;p&gt;I knew browser-agent demos had a credibility problem the first time I watched one spend four minutes clicking through a dashboard while someone narrated how it was "changing work."&lt;/p&gt;

&lt;p&gt;Nobody could tell if it was impressive or broken.&lt;/p&gt;

&lt;p&gt;That’s the issue.&lt;/p&gt;

&lt;p&gt;The browser-agent workflows teams will actually deploy first are not giant autonomous-employee fantasies. They’re tiny, boring, checkable chores.&lt;/p&gt;

&lt;p&gt;Think:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scan a McDonald’s receipt QR code&lt;/li&gt;
&lt;li&gt;open the survey&lt;/li&gt;
&lt;li&gt;fill the form&lt;/li&gt;
&lt;li&gt;return the coupon code in Telegram&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a real demo.&lt;/p&gt;

&lt;p&gt;Either the coupon code exists or it doesn’t.&lt;/p&gt;

&lt;p&gt;For developers, that matters more than a long reasoning trace.&lt;/p&gt;

&lt;p&gt;Once you notice that, a second thing follows quickly: the first browser agents teams actually run in production are also the ones that make token-metered pricing painful almost immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  The best browser-agent demo I’ve seen is a free burger
&lt;/h2&gt;

&lt;p&gt;While looking through OpenClaw discussions, I found a thread on r/openclaw where someone described their most visually impressive live demo:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Most impressive visually that I've done with my claw? Scan him the QR code on the back of my McDonalds receipt and have him fill out the survey to get me a free burger."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a much better demo than most of the "AI employee" stuff floating around.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because it has the three properties browser-agent demos need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The task is bounded&lt;/li&gt;
&lt;li&gt;The result is instantly checkable&lt;/li&gt;
&lt;li&gt;The failure mode is obvious&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the agent gets stuck on a form, everyone sees it.&lt;/p&gt;

&lt;p&gt;If it succeeds, everyone knows it.&lt;/p&gt;

&lt;p&gt;No benchmark chart required.&lt;/p&gt;

&lt;p&gt;Another commenter in the same thread spelled out the broader lesson:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A fun case, for most lazy onlookers this will generate a wow. Zooming out, u can demo it for any QR code signup/discount code process. Call it the QR Genie and the crowd will go wild"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s not really about fast food.&lt;/p&gt;

&lt;p&gt;It’s about choosing browser tasks that are easy to verify.&lt;/p&gt;

&lt;h2&gt;
  
  
  The wow moment is not the reasoning trace
&lt;/h2&gt;

&lt;p&gt;A lot of AI demos still assume the impressive part is the hidden thinking.&lt;/p&gt;

&lt;p&gt;For browser automation, that’s backwards.&lt;/p&gt;

&lt;p&gt;The browser is brutally honest.&lt;/p&gt;

&lt;p&gt;If OpenClaw clicks the wrong button, you see it.&lt;br&gt;
If OpenAI Operator hits a CAPTCHA, you see it.&lt;br&gt;
If a Browser-use flow loops because a selector changed, you see it.&lt;/p&gt;

&lt;p&gt;That’s why tiny browser chores land so much harder than giant vague workflows.&lt;/p&gt;

&lt;p&gt;The audience can validate the output with their own eyes.&lt;/p&gt;

&lt;p&gt;That’s also why the small demos are the first ones teams trust enough to operationalize.&lt;/p&gt;
&lt;h2&gt;
  
  
  OpenAI basically told us this already
&lt;/h2&gt;

&lt;p&gt;When OpenAI launched Operator on January 23, 2025, it did not lead with "replace your operations team."&lt;/p&gt;

&lt;p&gt;The examples were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;filling out forms&lt;/li&gt;
&lt;li&gt;ordering groceries&lt;/li&gt;
&lt;li&gt;creating memes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And OpenAI emphasized that the user can take over at any point.&lt;/p&gt;

&lt;p&gt;That’s a pretty strong signal.&lt;/p&gt;

&lt;p&gt;If OpenAI wanted to sell a total-autonomy fantasy, it had every opportunity. Instead it framed Operator as supervised browser assistance.&lt;/p&gt;

&lt;p&gt;Later, on July 17, 2025, OpenAI updated the post to say Operator was being integrated into ChatGPT as agent mode. Same message, really: useful browser assistant first, magic robot employee later.&lt;/p&gt;

&lt;p&gt;The benchmark numbers point the same way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;38.1% on OSWorld&lt;/li&gt;
&lt;li&gt;58.1% on WebArena&lt;/li&gt;
&lt;li&gt;87% on WebVoyager&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interesting? Yes.&lt;/p&gt;

&lt;p&gt;A green light to hand over your whole company to autonomous browser agents? Not even close.&lt;/p&gt;
&lt;h2&gt;
  
  
  Anthropic was more honest than most vendors
&lt;/h2&gt;

&lt;p&gt;Anthropic’s October 2024 computer-use announcement called the feature experimental and said it could be cumbersome and error-prone.&lt;/p&gt;

&lt;p&gt;Good. More companies should talk like that.&lt;/p&gt;

&lt;p&gt;Because browser automation is messy in ways text-only agents are not.&lt;/p&gt;

&lt;p&gt;Still, the capability ceiling is real. Anthropic named partners like Asana, Canva, DoorDash, Replit, and The Browser Company. In the same announcement, it reported coding and tool-use benchmark gains for the upgraded Claude 3.5 Sonnet:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SWE-bench Verified: 33.4% -&amp;gt; 49.0%&lt;/li&gt;
&lt;li&gt;TAU-bench retail: 62.6% -&amp;gt; 69.2%&lt;/li&gt;
&lt;li&gt;TAU-bench airline: 36.0% -&amp;gt; 46.0%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So yes, these systems can do more than coupon redemption.&lt;/p&gt;

&lt;p&gt;But the small demos matter because they’re honest about what works now.&lt;/p&gt;
&lt;h2&gt;
  
  
  OpenClaw’s problem is not capability. It’s legibility.
&lt;/h2&gt;

&lt;p&gt;I like OpenClaw a lot.&lt;/p&gt;

&lt;p&gt;The idea is strong: a local-first control plane for agents that can live in WhatsApp, Telegram, Slack, Discord, Signal, iMessage, and other channels, with stateful sessions, memory, tools, and model-agnostic routing.&lt;/p&gt;

&lt;p&gt;That’s powerful.&lt;/p&gt;

&lt;p&gt;It’s also a lot.&lt;/p&gt;

&lt;p&gt;In another r/openclaw thread, one user described the product as both a gift and a curse because it’s so open-ended.&lt;/p&gt;

&lt;p&gt;That feels right.&lt;/p&gt;

&lt;p&gt;Blank-canvas products are powerful for experienced builders and confusing for everyone else.&lt;/p&gt;

&lt;p&gt;Developers don’t just need capability. They need a first win.&lt;/p&gt;

&lt;p&gt;And for browser agents, the best first win is a tiny automation with an undeniable outcome.&lt;/p&gt;

&lt;p&gt;Even the OpenClaw troubleshooting flow tells you this is serious software:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw status
openclaw status &lt;span class="nt"&gt;--all&lt;/span&gt;
openclaw gateway probe
openclaw gateway status
openclaw doctor
openclaw channels status &lt;span class="nt"&gt;--probe&lt;/span&gt;
openclaw logs &lt;span class="nt"&gt;--follow&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s fine. Serious systems need real diagnostics.&lt;/p&gt;

&lt;p&gt;But it also means starter workflows matter a lot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which stack is best for tiny browser chores?
&lt;/h2&gt;

&lt;p&gt;If your goal is a believable browser-agent workflow, I’d split the current options like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it’s best at&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;Best for chat-native demos where progress updates in Telegram, Slack, or Discord are part of the experience&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Operator / ChatGPT agent mode&lt;/td&gt;
&lt;td&gt;Best reference for supervised remote-browser interaction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser-use&lt;/td&gt;
&lt;td&gt;Best fit for developers who want SDK-first, repeatable browser automation with persistence, auth, cookies, and production-oriented ergonomics&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That distinction matters.&lt;/p&gt;

&lt;p&gt;If you want a live Telegram-based demo where the agent narrates progress and returns a coupon code, OpenClaw is very legible.&lt;/p&gt;

&lt;p&gt;If you want the best-known reference point for remote browser interaction, Operator is still the obvious comparison.&lt;/p&gt;

&lt;p&gt;If you want to build repeatable programmatic browser tasks, Browser-use is the most practical starting point right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Browser-use has the right vibe: less magic, more throughput
&lt;/h2&gt;

&lt;p&gt;What I like about Browser-use is that it is trying to finish the task, not perform intelligence theater.&lt;/p&gt;

&lt;p&gt;Its positioning is basically: browser tasks, speed, persistence, lower cost.&lt;/p&gt;

&lt;p&gt;That’s the right posture for developers.&lt;/p&gt;

&lt;p&gt;A minimal example is refreshingly direct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;browser_use&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatBrowserUse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatBrowserUse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find the number 1 post on Show HN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And with the SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;browser_use_sdk.v3&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncBrowserUse&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncBrowserUse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List the top 20 posts on Hacker News today with their points&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install flow is straightforward too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;browser-use browser-use-sdk python-dotenv
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BROWSER_USE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the energy I want from browser automation tooling.&lt;/p&gt;

&lt;p&gt;Not "behold, AGI."&lt;/p&gt;

&lt;p&gt;Just: here’s the task, here’s the API, let’s go.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical rule: demo tasks with hard proof
&lt;/h2&gt;

&lt;p&gt;If you’re building browser-agent workflows, here’s the simplest heuristic I’ve found:&lt;/p&gt;

&lt;p&gt;Pick tasks where success produces an artifact.&lt;/p&gt;

&lt;p&gt;Good examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a coupon code&lt;/li&gt;
&lt;li&gt;a confirmation page&lt;/li&gt;
&lt;li&gt;a booking reference&lt;/li&gt;
&lt;li&gt;a submitted rebate ID&lt;/li&gt;
&lt;li&gt;a completed signup with a visible success state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bad examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"manage my workflow"&lt;/li&gt;
&lt;li&gt;"do research on this site"&lt;/li&gt;
&lt;li&gt;"handle customer ops"&lt;/li&gt;
&lt;li&gt;"run this business process end to end"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those bigger tasks may be real eventually, but they’re weak demos because the audience cannot verify them quickly.&lt;/p&gt;

&lt;p&gt;For DEV readers, I’d frame it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If success is not machine-checkable or human-checkable in under 5 seconds,
it's probably a bad first browser-agent workflow.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
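&lt;p&gt;For the coupon case, "machine-checkable" can be as small as a regex over the agent’s final message. This is a sketch: the code format below is invented for illustration, and &lt;code&gt;extract_coupon&lt;/code&gt; and &lt;code&gt;run_succeeded&lt;/code&gt; are hypothetical helpers.&lt;/p&gt;

```python
import re

# Hypothetical coupon format: real codes vary by vendor, so adjust the pattern per workflow.
COUPON_PATTERN = re.compile(r"\b[A-Z0-9]{4}-[A-Z0-9]{4}\b")


def extract_coupon(agent_output):
    """Return the coupon code found in the agent's final message, or None."""
    match = COUPON_PATTERN.search(agent_output)
    return match.group(0) if match else None


def run_succeeded(agent_output):
    # Success means an artifact exists, not that the agent says it finished.
    return extract_coupon(agent_output) is not None


if __name__ == "__main__":
    print(run_succeeded("Done! Your code is AB12-CD34, enjoy."))
    print(run_succeeded("I filled out the survey successfully."))
```

&lt;p&gt;The second call is the important one: an agent that claims success without producing the artifact counts as a failed run.&lt;/p&gt;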



&lt;h2&gt;
  
  
  Tiny chores are not too small. They’re the wedge.
&lt;/h2&gt;

&lt;p&gt;The obvious pushback is that coupon flows and rebate forms undersell what these systems can eventually do.&lt;/p&gt;

&lt;p&gt;Fair.&lt;/p&gt;

&lt;p&gt;But credibility compounds.&lt;/p&gt;

&lt;p&gt;A workflow like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;read QR code&lt;/li&gt;
&lt;li&gt;open site&lt;/li&gt;
&lt;li&gt;fill form&lt;/li&gt;
&lt;li&gt;survive friction&lt;/li&gt;
&lt;li&gt;return usable output&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;teaches users three important things immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the agent can handle real-world input&lt;/li&gt;
&lt;li&gt;the agent can work through messy browser state&lt;/li&gt;
&lt;li&gt;the agent can return something useful right now&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once people believe those three things, they’re ready for the bigger workflows.&lt;/p&gt;

&lt;p&gt;If you start with "autonomous employee," you lose them before the interesting part.&lt;/p&gt;

&lt;h2&gt;
  
  
  The production problem shows up fast: retries cost money
&lt;/h2&gt;

&lt;p&gt;This is the part almost every browser-agent post skips.&lt;/p&gt;

&lt;p&gt;The tiny chores that make the best demos are also the first chores teams automate at volume.&lt;/p&gt;

&lt;p&gt;A one-off QR survey becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an n8n workflow&lt;/li&gt;
&lt;li&gt;a Make scenario&lt;/li&gt;
&lt;li&gt;a Zapier automation&lt;/li&gt;
&lt;li&gt;an OpenClaw flow&lt;/li&gt;
&lt;li&gt;a custom Python service calling an OpenAI-compatible API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And browser runs are rarely clean.&lt;/p&gt;

&lt;p&gt;They retry.&lt;br&gt;
They take screenshots.&lt;br&gt;
They reread the page.&lt;br&gt;
They narrate progress.&lt;br&gt;
They loop when selectors break.&lt;br&gt;
They recover from validation errors.&lt;/p&gt;

&lt;p&gt;That means the cheap-looking demo can become an expensive production workflow very quickly under per-token pricing.&lt;/p&gt;
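&lt;p&gt;A back-of-the-envelope estimate shows how retries move the bill. Every number below is a made-up placeholder rather than a measured figure or a quoted price, and &lt;code&gt;monthly_token_cost&lt;/code&gt; is a hypothetical helper that assumes a retried step costs as much as the original.&lt;/p&gt;

```python
def monthly_token_cost(runs_per_day, steps_per_run, tokens_per_step,
                       retry_rate, usd_per_million_tokens, days=30):
    """Estimate monthly spend when a fraction of steps is repeated once."""
    # Each retried step is paid for again, so scale steps by (1 + retry_rate).
    effective_steps = steps_per_run * (1 + retry_rate)
    tokens = runs_per_day * days * effective_steps * tokens_per_step
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Placeholder inputs: 200 runs/day, 15 steps/run, 4,000 tokens/step, $3 per million tokens.
    clean = monthly_token_cost(200, 15, 4_000, 0.0, 3.0)
    flaky = monthly_token_cost(200, 15, 4_000, 0.5, 3.0)
    print(round(clean, 2), round(flaky, 2))
```

&lt;p&gt;The estimate is linear in the retry rate, so a flow that repeats half its steps costs 50% more than the clean run you demoed.&lt;/p&gt;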

&lt;p&gt;This is exactly where Standard Compute becomes relevant.&lt;/p&gt;

&lt;p&gt;If you’re running AI agents or browser automations through n8n, Make, Zapier, OpenClaw, or your own code, the question is not just:&lt;/p&gt;

&lt;p&gt;"Can this browser task work?"&lt;/p&gt;

&lt;p&gt;It’s also:&lt;/p&gt;

&lt;p&gt;"Can I afford to let it run all month without babysitting usage?"&lt;/p&gt;

&lt;p&gt;Standard Compute is a drop-in OpenAI API replacement with flat monthly pricing. That matters a lot once browser-agent workflows move from demo to always-on automation.&lt;/p&gt;

&lt;p&gt;You keep the OpenAI-compatible SDK or HTTP client you already use. But instead of watching token usage every time a browser flow gets flaky, you get predictable monthly cost.&lt;/p&gt;

&lt;p&gt;That is a much better fit for agents that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retry often&lt;/li&gt;
&lt;li&gt;run continuously&lt;/li&gt;
&lt;li&gt;fan out across many tasks&lt;/li&gt;
&lt;li&gt;generate lots of intermediate steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For browser automation, cost predictability is not a nice-to-have. Under per-token pricing, every retry, screenshot, and page reread adds billed tokens, so a flaky flow shows up directly in the bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical stack for developers building this now
&lt;/h2&gt;

&lt;p&gt;If I were wiring up a small browser-agent workflow today, I’d think about the stack in layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Recommended approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Browser agent runtime&lt;/td&gt;
&lt;td&gt;Browser-use for SDK-first repeatable automation, or OpenClaw if chat-native orchestration is part of the product&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;n8n, Make, Zapier, or custom Python/Node workers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM endpoint&lt;/td&gt;
&lt;td&gt;OpenAI-compatible API so you can swap providers without rewriting app code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost control&lt;/td&gt;
&lt;td&gt;Flat-rate compute via Standard Compute once the workflow starts running continuously&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That combination gives you something most AI demos don’t:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a believable workflow&lt;/li&gt;
&lt;li&gt;a programmable interface&lt;/li&gt;
&lt;li&gt;a path to production&lt;/li&gt;
&lt;li&gt;a cost model that doesn’t get weird under retries&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I would actually demo to engineers
&lt;/h2&gt;

&lt;p&gt;If the goal is to make developers care, I would not demo the biggest workflow.&lt;/p&gt;

&lt;p&gt;I’d demo the most undeniable one.&lt;/p&gt;

&lt;p&gt;My shortlist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;receipt QR surveys that return a coupon code&lt;/li&gt;
&lt;li&gt;promo-code redemption from email or SMS&lt;/li&gt;
&lt;li&gt;simple rebate submissions with image upload&lt;/li&gt;
&lt;li&gt;appointment confirmations with a visible success page&lt;/li&gt;
&lt;li&gt;account signup flows with a clear completed state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These all have the same advantage:&lt;/p&gt;

&lt;p&gt;The audience does not need to trust your narration.&lt;/p&gt;

&lt;p&gt;They can verify the result themselves.&lt;/p&gt;

&lt;p&gt;That’s the lesson the browser-agent market keeps relearning.&lt;/p&gt;

&lt;p&gt;The best demo is not the hardest-looking one.&lt;/p&gt;

&lt;p&gt;It’s the one with the least room for argument.&lt;/p&gt;

&lt;p&gt;A free burger coupon in Telegram beats a ten-minute speech about autonomous work.&lt;/p&gt;

&lt;p&gt;And once you see that, the rest of the market makes more sense:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI Operator’s examples make sense&lt;/li&gt;
&lt;li&gt;Anthropic’s caution makes sense&lt;/li&gt;
&lt;li&gt;OpenClaw’s need for clearer starter workflows makes sense&lt;/li&gt;
&lt;li&gt;Browser-use’s production focus makes sense&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first browser-agent workflow teams will actually run at scale is not a moonshot.&lt;/p&gt;

&lt;p&gt;It’s a chore.&lt;/p&gt;

&lt;p&gt;And that’s exactly why it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you’re building these workflows
&lt;/h2&gt;

&lt;p&gt;My advice is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with a bounded browser task&lt;/li&gt;
&lt;li&gt;Make success obvious&lt;/li&gt;
&lt;li&gt;Instrument retries early&lt;/li&gt;
&lt;li&gt;Assume selectors will break&lt;/li&gt;
&lt;li&gt;Don’t wait too long to fix the cost model&lt;/li&gt;
&lt;/ol&gt;
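
&lt;p&gt;For the "instrument retries early" step, a minimal sketch could wrap each model or browser call in a helper that counts attempts. The names &lt;code&gt;withRetryCount&lt;/code&gt;, &lt;code&gt;stats&lt;/code&gt;, and &lt;code&gt;maxAttempts&lt;/code&gt; are hypothetical, not part of any tool mentioned here:&lt;/p&gt;

```javascript
// Hypothetical helper: runs `task`, retrying on failure, and records how
// many attempts each labelled step needed so retry volume is visible early.
const stats = new Map(); // label -> { calls, attempts, failures }

async function withRetryCount(label, task, maxAttempts = 3) {
  const entry = stats.get(label) || { calls: 0, attempts: 0, failures: 0 };
  stats.set(label, entry);
  entry.calls += 1;

  let lastError;
  for (let remaining = maxAttempts; remaining > 0; remaining -= 1) {
    entry.attempts += 1;
    try {
      return await task();
    } catch (err) {
      lastError = err; // remember the error and try again
    }
  }
  entry.failures += 1; // every attempt failed
  throw lastError;
}
```

&lt;p&gt;If &lt;code&gt;attempts&lt;/code&gt; grows much faster than &lt;code&gt;calls&lt;/code&gt; for one label, that step is where the retries are going.&lt;/p&gt;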

&lt;p&gt;If your agents are already using an OpenAI-compatible client, Standard Compute is the obvious thing to test when usage starts getting noisy. Flat monthly pricing is a much better match for always-on browser automation than per-token billing.&lt;/p&gt;

&lt;p&gt;That’s not the glamorous part of the stack.&lt;/p&gt;

&lt;p&gt;It’s the part that lets you keep the workflow running after the demo.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>devtools</category>
      <category>api</category>
    </item>
    <item>
      <title>I got excited about free Nemotron and Kimi too, then my 24/7 agent started falling apart</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Mon, 08 Jun 2026 09:38:10 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-got-excited-about-free-nemotron-and-kimi-too-then-my-247-agent-started-falling-apart-1418</link>
      <guid>https://dev.to/lars_winstand/i-got-excited-about-free-nemotron-and-kimi-too-then-my-247-agent-started-falling-apart-1418</guid>
      <description>&lt;p&gt;A few weeks ago I went down a Reddit rabbit hole looking at model options for always-on agents.&lt;/p&gt;

&lt;p&gt;First thread: someone on r/openclaw was hyped that NVIDIA was giving personal users free access to strong models like Nemotron Ultra, DeepSeek, Kimi, GLM, and MiniMax. Their review was basically: fast as hell.&lt;/p&gt;

&lt;p&gt;Fair.&lt;/p&gt;

&lt;p&gt;If you run OpenClaw, n8n, Make, Zapier, or your own agent stack, free access to good models feels amazing for about five minutes. You immediately start doing the math:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;maybe this bot can run 24/7&lt;/li&gt;
&lt;li&gt;maybe I can stop watching token spend&lt;/li&gt;
&lt;li&gt;maybe this whole thing is suddenly cheap enough to leave on&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then I found the other thread.&lt;/p&gt;

&lt;p&gt;Someone had spent 15 days trying to get OpenClaw working properly, added $10 to OpenRouter, and still hit this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;free models on open router not working says all models are temporarily rate limited. Please try again in a few minutes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the entire problem in one error message.&lt;/p&gt;

&lt;p&gt;Free models are great for testing.&lt;/p&gt;

&lt;p&gt;They are often bad infrastructure for automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Interactive chat and automation are different workloads
&lt;/h2&gt;

&lt;p&gt;This gets glossed over constantly.&lt;/p&gt;

&lt;p&gt;If you're manually chatting with Kimi or Nemotron and hit a rate limit, you wait, refresh, switch models, complain a little, move on.&lt;/p&gt;

&lt;p&gt;If your agent is answering users in Slack, Discord, Telegram, WhatsApp, or a support inbox, that same rate limit becomes a production issue.&lt;/p&gt;

&lt;p&gt;One failed request is not one failed request.&lt;br&gt;
It is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a broken workflow&lt;/li&gt;
&lt;li&gt;a retry storm&lt;/li&gt;
&lt;li&gt;a delayed response to a real user&lt;/li&gt;
&lt;li&gt;a support thread you now have to read&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why "works for me" testing is not the same as "works in production".&lt;/p&gt;
&lt;h2&gt;
  
  
  What usually breaks first
&lt;/h2&gt;

&lt;p&gt;Usually not OpenClaw.&lt;br&gt;
Usually not n8n.&lt;br&gt;
Usually not Docker.&lt;br&gt;
Usually not your Raspberry Pi.&lt;/p&gt;

&lt;p&gt;It is the upstream model provider.&lt;/p&gt;

&lt;p&gt;OpenClaw's setup and health tooling actually makes this pretty obvious:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest
openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
openclaw dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And when things start acting weird:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw status
openclaw health &lt;span class="nt"&gt;--json&lt;/span&gt;
openclaw doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those commands help you answer the important question:&lt;/p&gt;

&lt;p&gt;Is my runtime broken, or is Anthropic, OpenAI, OpenRouter, or NVIDIA refusing requests right now?&lt;/p&gt;

&lt;p&gt;A lot of teams debug the wrong layer because the agent framework is the thing they can see.&lt;/p&gt;

&lt;p&gt;But for always-on systems, the real failure domain is often provider availability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;temporary rate limits&lt;/li&gt;
&lt;li&gt;shared free-tier saturation&lt;/li&gt;
&lt;li&gt;model-specific outages&lt;/li&gt;
&lt;li&gt;throttling during bursts&lt;/li&gt;
&lt;li&gt;silent changes in access rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Free pools are where this shows up fastest.&lt;/p&gt;

&lt;h2&gt;
  
  
  "But I'm under the limit" does not save you
&lt;/h2&gt;

&lt;p&gt;This is the sneaky part.&lt;/p&gt;

&lt;p&gt;OpenAI's own rate limit docs note that limits can be quantized, meaning they are enforced over shorter windows than the advertised one. A limit advertised as 60,000 requests per minute may be enforced as 1,000 requests per second.&lt;/p&gt;

&lt;p&gt;So yes, you can be under the documented limit and still get smacked by burst traffic.&lt;/p&gt;
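
&lt;p&gt;A small sketch of that effect, assuming a simple fixed-window limiter (real providers may use other algorithms, and &lt;code&gt;countRejected&lt;/code&gt; is a made-up name). A burst of 1,500 requests inside one second fits easily in a 60,000-per-minute budget, yet 500 of them are rejected when the same limit is enforced as 1,000 per second:&lt;/p&gt;

```javascript
// Illustrative only: counts requests rejected when a limit is enforced in
// fixed windows of `windowMs` milliseconds. `timestampsMs` are send times.
function countRejected(timestampsMs, limitPerWindow, windowMs) {
  const seen = new Map(); // window index -> requests accepted in that window
  let rejected = 0;
  for (const t of timestampsMs) {
    const windowIndex = Math.floor(t / windowMs);
    const used = seen.get(windowIndex) || 0;
    if (used >= limitPerWindow) {
      rejected += 1; // over this window's budget
    } else {
      seen.set(windowIndex, used + 1);
    }
  }
  return rejected;
}

// 1,500 requests spread across the first 900 ms of one minute.
const burst = Array.from({ length: 1500 }, (_, i) => Math.floor((i * 900) / 1500));

console.log(countRejected(burst, 60000, 60000)); // per-minute window: 0
console.log(countRejected(burst, 1000, 1000)); // per-second window: 500
```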

&lt;p&gt;Agents make this worse because they do not behave like one person chatting in one browser tab.&lt;/p&gt;

&lt;p&gt;They:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fan out tool calls&lt;/li&gt;
&lt;li&gt;run parallel sessions&lt;/li&gt;
&lt;li&gt;retry on failures&lt;/li&gt;
&lt;li&gt;wake up on schedules&lt;/li&gt;
&lt;li&gt;process webhook bursts&lt;/li&gt;
&lt;li&gt;chain multiple model calls inside one user action&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means your nice-looking average throughput numbers are lying to you: the average hides the bursts that actually trip the limit.&lt;/p&gt;

&lt;h2&gt;
  
  
  A concrete example: why your test passes and prod fails
&lt;/h2&gt;

&lt;p&gt;Let's say your workflow looks harmless:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Receive webhook&lt;/li&gt;
&lt;li&gt;Summarize payload&lt;/li&gt;
&lt;li&gt;Classify intent&lt;/li&gt;
&lt;li&gt;Generate response&lt;/li&gt;
&lt;li&gt;Retry on timeout&lt;/li&gt;
&lt;/ol&gt;

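&lt;p&gt;In code, steps 1 to 4 can turn into something like this, with the retry from step 5 presumably living inside the &lt;code&gt;llm()&lt;/code&gt; helper, which is not shown:&lt;br&gt;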
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summarize&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;intent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;classify&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;respond&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;intent&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks simple.&lt;/p&gt;

&lt;p&gt;Now add reality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;50 webhook events arrive at once&lt;/li&gt;
&lt;li&gt;each event triggers 3 model calls&lt;/li&gt;
&lt;li&gt;10% of requests retry once&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not 50 requests.&lt;br&gt;
It is 50 × 3 = 150 model calls plus about 15 retries, so closer to 165 requests in a short burst.&lt;/p&gt;
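
&lt;p&gt;The arithmetic behind that number, as a tiny function you can rerun with your own traffic. &lt;code&gt;burstRequests&lt;/code&gt; and its parameter names are illustrative, and it assumes each failed call is retried exactly once:&lt;/p&gt;

```javascript
// Estimates total model requests in one burst.
// Assumes a fixed share of calls fails once and is retried exactly once.
function burstRequests({ events, callsPerEvent, retryRate }) {
  const firstAttempts = events * callsPerEvent;
  const retries = Math.round(firstAttempts * retryRate);
  return firstAttempts + retries;
}

console.log(burstRequests({ events: 50, callsPerEvent: 3, retryRate: 0.1 })); // 165
```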

&lt;p&gt;And if you're on a shared free pool, that burst lands right when everyone else is also trying to use the same model.&lt;/p&gt;

&lt;p&gt;That is how "free and fast" turns into "temporarily rate limited".&lt;/p&gt;
&lt;h2&gt;
  
  
  What to optimize for instead
&lt;/h2&gt;

&lt;p&gt;My opinion: once an agent matters, stop optimizing for free and start optimizing for continuity.&lt;/p&gt;

&lt;p&gt;That does not mean every workflow needs the most expensive model.&lt;br&gt;
It means the stack needs boring reliability features:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A stable runtime layer&lt;/li&gt;
&lt;li&gt;Routing across multiple models/providers&lt;/li&gt;
&lt;li&gt;Fallback behavior&lt;/li&gt;
&lt;li&gt;Predictable pricing&lt;/li&gt;
&lt;li&gt;Retry logic you actually control&lt;/li&gt;
&lt;/ol&gt;
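
&lt;p&gt;For the last item, "retry logic you actually control" can be as small as an exponential backoff helper. This is a sketch under assumptions: &lt;code&gt;retryWithBackoff&lt;/code&gt;, &lt;code&gt;baseMs&lt;/code&gt;, and &lt;code&gt;capMs&lt;/code&gt; are hypothetical names, and the thrown error is assumed to carry a numeric &lt;code&gt;status&lt;/code&gt; property.&lt;/p&gt;

```javascript
// Waits for the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry number `retryIndex` (0-based): base * 2^n, capped,
// with random jitter so parallel workers do not retry in lockstep.
function backoffDelay(retryIndex, baseMs, capMs, random = Math.random) {
  const exponential = Math.min(capMs, baseMs * 2 ** retryIndex);
  return Math.floor(exponential / 2 + random() * (exponential / 2));
}

// Retries `task` only for statuses that are usually transient.
async function retryWithBackoff(task, { maxRetries = 4, baseMs = 250, capMs = 8000 } = {}) {
  const retryable = new Set([408, 429, 500, 502, 503, 504]);
  for (let retryIndex = 0; ; retryIndex += 1) {
    try {
      return await task();
    } catch (err) {
      const outOfRetries = retryIndex >= maxRetries;
      if (outOfRetries || !retryable.has(err.status)) throw err;
      await sleep(backoffDelay(retryIndex, baseMs, capMs));
    }
  }
}
```

&lt;p&gt;Capping the retry count matters as much as the delay: an uncapped retry loop is how one rate limit turns into a retry storm.&lt;/p&gt;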

&lt;p&gt;The real question is not:&lt;/p&gt;

&lt;p&gt;"Can I get Nemotron Ultra or Kimi for free today?"&lt;/p&gt;

&lt;p&gt;The real question is:&lt;/p&gt;

&lt;p&gt;"What happens when that model is rate limited and my workflow still has to run?"&lt;/p&gt;
&lt;h2&gt;
  
  
  The 3 realistic options
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;What happens in practice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free NVIDIA or OpenRouter model access&lt;/td&gt;
&lt;td&gt;Great for testing, demos, and manual use. Weakest option for 24/7 automation because availability changes and shared rate pools get saturated.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Direct paid provider API&lt;/td&gt;
&lt;td&gt;Better than free tiers, but you still deal with provider-specific RPM/TPM limits, burst throttling, and growing token bills.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flat-rate routed API layer&lt;/td&gt;
&lt;td&gt;Better fit for always-on agents because you get one endpoint, predictable monthly cost, and routing/fallback across models.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That last category is where things get interesting for teams running real automations.&lt;/p&gt;

&lt;p&gt;If you're building agents in n8n, Make, Zapier, OpenClaw, or custom code, predictable monthly cost matters almost as much as uptime.&lt;/p&gt;

&lt;p&gt;Because the thing that kills good automations is not just downtime.&lt;br&gt;
It's also cost anxiety.&lt;/p&gt;
&lt;h2&gt;
  
  
  n8n quietly points toward the right architecture
&lt;/h2&gt;

&lt;p&gt;One thing I like in n8n: when the built-in OpenAI node is not enough, the docs basically tell you to use the HTTP Request node and call the API directly.&lt;/p&gt;

&lt;p&gt;That is not a hack.&lt;br&gt;
That is the grown-up path.&lt;/p&gt;

&lt;p&gt;Because once you care about reliability, you usually need things the simple node does not expose cleanly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;custom retry rules&lt;/li&gt;
&lt;li&gt;timeout control&lt;/li&gt;
&lt;li&gt;provider-specific headers&lt;/li&gt;
&lt;li&gt;fallback logic&lt;/li&gt;
&lt;li&gt;circuit breakers&lt;/li&gt;
&lt;li&gt;model switching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A minimal request body, in the shape the OpenAI Responses API expects, looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Summarize this support ticket and classify urgency"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then if that provider is unhappy, your app should be able to switch upstream without rewriting the whole workflow.&lt;/p&gt;

&lt;p&gt;That is why OpenAI-compatible APIs are useful. They reduce migration pain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: simple fallback wrapper
&lt;/h2&gt;

&lt;p&gt;If you are calling an OpenAI-compatible endpoint from Node, the wrapper can stay very small.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;providers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;primary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PRIMARY_BASE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PRIMARY_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4.1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;backup&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BACKUP_BASE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BACKUP_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus-4-1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;callProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/v1/responses`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;input&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; failed: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;callWithFallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;All providers failed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That does not solve every problem.&lt;br&gt;
But it solves the most common bad assumption: one model endpoint will always be there when your automation needs it.&lt;/p&gt;
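
&lt;p&gt;One gap a plain fallback loop leaves open is that every request still tries the failing provider first. A circuit breaker, one of the features listed earlier, skips a provider for a cool-down period after repeated failures. A minimal sketch, where &lt;code&gt;createBreaker&lt;/code&gt;, &lt;code&gt;threshold&lt;/code&gt;, and &lt;code&gt;coolDownMs&lt;/code&gt; are hypothetical names rather than anything from a library:&lt;/p&gt;

```javascript
// Tracks consecutive failures for one provider. After `threshold` failures
// the breaker opens and the provider is skipped until `coolDownMs` passes.
function createBreaker({ threshold = 3, coolDownMs = 30000, now = Date.now } = {}) {
  let failures = 0;
  let openedAt = null;

  return {
    // True when calls to this provider should be attempted.
    canCall() {
      if (openedAt === null) return true;
      if (now() - openedAt >= coolDownMs) {
        openedAt = null; // cool-down over: close the breaker again
        failures = 0;
        return true;
      }
      return false;
    },
    recordSuccess() {
      failures = 0;
    },
    recordFailure() {
      failures += 1;
      if (failures >= threshold) openedAt = now();
    },
  };
}
```

&lt;p&gt;In the wrapper above, each provider would get its own breaker: check &lt;code&gt;canCall()&lt;/code&gt; before &lt;code&gt;callProvider&lt;/code&gt;, then call &lt;code&gt;recordSuccess()&lt;/code&gt; or &lt;code&gt;recordFailure()&lt;/code&gt; depending on the outcome.&lt;/p&gt;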

&lt;h2&gt;
  
  
  Where Standard Compute fits
&lt;/h2&gt;

&lt;p&gt;This is the reason products like Standard Compute exist.&lt;/p&gt;

&lt;p&gt;If your workload is an always-on agent or automation, the value is not just model access. It is operational sanity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one OpenAI-compatible endpoint&lt;/li&gt;
&lt;li&gt;flat monthly pricing instead of per-token surprises&lt;/li&gt;
&lt;li&gt;routing across models like GPT-5.4, Claude Opus 4.6, and Grok 4.20&lt;/li&gt;
&lt;li&gt;better fit for n8n, Make, Zapier, OpenClaw, and custom agent stacks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters when your workflow is not a toy and you do not want to redesign it every time a free tier gets crowded.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free models are not useless. They're just for a different job.
&lt;/h2&gt;

&lt;p&gt;I still use free model access.&lt;/p&gt;

&lt;p&gt;It is genuinely useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt iteration&lt;/li&gt;
&lt;li&gt;model comparisons&lt;/li&gt;
&lt;li&gt;side projects&lt;/li&gt;
&lt;li&gt;one-off experiments&lt;/li&gt;
&lt;li&gt;manual testing&lt;/li&gt;
&lt;li&gt;evaluating quality before committing traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I want to compare Nemotron, Kimi, DeepSeek, GLM, MiniMax, Claude, GPT, or Qwen on a prompt, free access is great.&lt;/p&gt;

&lt;p&gt;If I need a bot to stay online all week, free access is not the thing I want to bet on.&lt;/p&gt;

&lt;p&gt;That is the distinction.&lt;/p&gt;

&lt;h2&gt;
  
  
  My current rule
&lt;/h2&gt;

&lt;p&gt;This is the rule I use now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;free models for evaluation&lt;/li&gt;
&lt;li&gt;direct paid APIs for controlled workloads&lt;/li&gt;
&lt;li&gt;routed, predictable-cost API access for always-on agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That split has saved me a lot of pointless debugging.&lt;/p&gt;

&lt;p&gt;Because the trap is not that free models are bad.&lt;/p&gt;

&lt;p&gt;The trap is assuming a model that works today is the same thing as an automation stack that still works next Tuesday.&lt;/p&gt;

&lt;p&gt;If your agent only needs to impress you for ten minutes, free Nemotron and Kimi are awesome.&lt;/p&gt;

&lt;p&gt;If your agent needs to survive retries, burst traffic, and one provider having a bad afternoon, build for continuity instead.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>devops</category>
      <category>api</category>
    </item>
    <item>
      <title>I finally understood why always-on agents wreck finance workflows when 1 bot can see every account</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Mon, 08 Jun 2026 01:36:57 +0000</pubDate>
      <link>https://dev.to/lars_winstand/i-finally-understood-why-always-on-agents-wreck-finance-workflows-when-1-bot-can-see-every-account-506b</link>
      <guid>https://dev.to/lars_winstand/i-finally-understood-why-always-on-agents-wreck-finance-workflows-when-1-bot-can-see-every-account-506b</guid>
      <description>&lt;p&gt;I read a small r/openclaw thread about a dental practice dashboard and expected bookkeeping drama.&lt;/p&gt;

&lt;p&gt;What it actually contained was a pretty solid systems design lesson:&lt;/p&gt;

&lt;p&gt;If one always-on agent can see your personal account, rental property account, and business account in the same workspace, your finance automation is already on the path to bad decisions.&lt;/p&gt;

&lt;p&gt;Not because OpenClaw is broken.&lt;br&gt;
Not because GPT-5.4 or Claude Opus 4.6 is bad at finance.&lt;br&gt;
Because shared context is the bug.&lt;/p&gt;

&lt;p&gt;The thread started with a familiar failure mode: QuickBooks data plus mixed bank transactions plus one giant table plus an agent trying to force-match invoices to deposits.&lt;/p&gt;

&lt;p&gt;That setup blew up fast.&lt;/p&gt;

&lt;p&gt;What fixed it was much more boring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;define what counts as practice-related&lt;/li&gt;
&lt;li&gt;stop treating unlike records as directly matchable&lt;/li&gt;
&lt;li&gt;isolate domains&lt;/li&gt;
&lt;li&gt;add a human review path for mismatches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not a prompt trick.&lt;br&gt;
That is architecture.&lt;/p&gt;
&lt;h2&gt;
  
  
  The real problem: fake certainty from mixed financial context
&lt;/h2&gt;

&lt;p&gt;One line from the thread stuck with me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;what finally worked was being really specific about what "practice related" means and telling it to flag the mismatches instead of trying to force-reconcile them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is exactly right.&lt;/p&gt;

&lt;p&gt;A lot of agent builders assume finance automation fails because the model gets confused.&lt;/p&gt;

&lt;p&gt;Sometimes it does.&lt;/p&gt;

&lt;p&gt;But more often the model is doing exactly what you asked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;take a pile of semi-related financial records&lt;/li&gt;
&lt;li&gt;pretend they belong to one coherent stream&lt;/li&gt;
&lt;li&gt;produce a clean answer even when the source systems disagree&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is how you get confident nonsense.&lt;/p&gt;

&lt;p&gt;QuickBooks receivables are not the same thing as bank deposits.&lt;/p&gt;

&lt;p&gt;In the dental practice example, QuickBooks tracked what insurance owed. The bank feed tracked what actually landed after adjustments. If your agent treats those as interchangeable, it will happily invent matches that look tidy and are totally wrong.&lt;/p&gt;

&lt;p&gt;Messy output is annoying.&lt;br&gt;
Neat but wrong output is dangerous.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why the single-agent pattern fails in production
&lt;/h2&gt;

&lt;p&gt;The temptation is obvious.&lt;/p&gt;

&lt;p&gt;One workspace.&lt;br&gt;
One memory.&lt;br&gt;
One big prompt.&lt;br&gt;
One OpenAI-compatible endpoint so your existing SDK code still works.&lt;/p&gt;

&lt;p&gt;You tell yourself you will add boundaries later.&lt;/p&gt;

&lt;p&gt;You usually do not.&lt;/p&gt;

&lt;p&gt;Here is what the single finance agent pattern tends to do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It reuses labels from one account in another account.&lt;/li&gt;
&lt;li&gt;It leaks sensitive context into tasks that never needed it.&lt;/li&gt;
&lt;li&gt;It tries to reconcile records from different accounting states.&lt;/li&gt;
&lt;li&gt;It becomes miserable to audit because every decision came from shared memory.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If one agent has seen personal spending, rental income, payroll, and business receivables in the same context window, every downstream classification gets a little worse.&lt;/p&gt;

&lt;p&gt;That is not an LLM issue.&lt;br&gt;
That is a boundary issue.&lt;/p&gt;
&lt;h2&gt;
  
  
  The pattern that actually works: 3 workspaces + 1 orchestrator
&lt;/h2&gt;

&lt;p&gt;The best comment in that thread was basically a mini design doc:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;3 streams: Personal finance, rental property finances, corporation finances. I have a separate agent workspace for each, and keep everything isolated. My main/orchestrating agent has the instructions/smarts to delegate appropriately.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the pattern I would trust.&lt;/p&gt;

&lt;p&gt;Not one omniscient finance bot.&lt;br&gt;
Three bounded workspaces and one orchestrator.&lt;/p&gt;

&lt;p&gt;Like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Workspace A: Personal finance
Workspace B: Rental property finance
Workspace C: Corporation finance
Orchestrator: receives request, identifies domain, delegates to A/B/C
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the routing rule is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If request contains mixed-source financial records:
- identify the financial domain first
- restrict retrieval to that workspace only
- compare only like-for-like records
- flag mismatches for review
- never auto-match across domains
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is less magical than the one-super-agent fantasy.&lt;/p&gt;

&lt;p&gt;It is also much safer.&lt;/p&gt;
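
&lt;p&gt;A minimal sketch of that isolation, assuming hypothetical &lt;code&gt;Workspace&lt;/code&gt; and &lt;code&gt;Orchestrator&lt;/code&gt; classes and made-up sample memos (none of these names come from OpenClaw). The point is only that retrieval happens inside one workspace and that an unrecognized domain goes to review instead of triggering a search across everything.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Holds records for exactly one financial domain (hypothetical structure)."""
    domain: str
    records: list = field(default_factory=list)

    def search(self, keyword: str):
        # Retrieval never leaves this workspace's own records.
        return [r for r in self.records if keyword.lower() in r["memo"].lower()]


class Orchestrator:
    """Routes a request to one workspace; it keeps no records of its own."""

    def __init__(self, workspaces: dict):
        self._workspaces = workspaces

    def delegate(self, domain: str, keyword: str):
        workspace = self._workspaces.get(domain)
        if workspace is None:
            # Unknown or mixed domain: refuse instead of searching everywhere.
            return {"status": "review_required", "results": []}
        return {"status": "ok", "results": workspace.search(keyword)}


# Sample data is invented for illustration.
workspaces = {
    "personal": Workspace("personal", [{"memo": "Grocery store"}]),
    "rental": Workspace("rental", [{"memo": "Tenant deposit, unit 2"}]),
    "business": Workspace("business", [{"memo": "Insurance deposit, batch 14"}]),
}
orchestrator = Orchestrator(workspaces)
rental_hits = orchestrator.delegate("rental", "deposit")
unrouted = orchestrator.delegate("unknown", "deposit")
```

&lt;p&gt;Both the rental and business workspaces contain a "deposit" memo, but the rental request only ever sees the rental one.&lt;/p&gt;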

&lt;h2&gt;
  
  
  The redaction-first step is doing more work than most people realize
&lt;/h2&gt;

&lt;p&gt;Another useful comment in the thread said the first agent should do nothing except redact and label rows before anything touches QuickBooks matching.&lt;/p&gt;

&lt;p&gt;Yes.&lt;/p&gt;

&lt;p&gt;That is the part people skip when they are moving fast.&lt;/p&gt;

&lt;p&gt;Many finance automations fail because the first step tries to do too much:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ingest raw exports&lt;/li&gt;
&lt;li&gt;interpret them&lt;/li&gt;
&lt;li&gt;reconcile them&lt;/li&gt;
&lt;li&gt;explain them&lt;/li&gt;
&lt;li&gt;maybe even draft reviewer notes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is lazy pipeline design.&lt;/p&gt;

&lt;p&gt;A safer version is staged.&lt;/p&gt;

&lt;h2&gt;
  
  
  A safer finance pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;raw export -&amp;gt; redact -&amp;gt; classify -&amp;gt; reconcile -&amp;gt; review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;More explicitly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ingest raw bank or card exports.&lt;/li&gt;
&lt;li&gt;Redact sensitive fields.&lt;/li&gt;
&lt;li&gt;Label rows into exactly one domain.&lt;/li&gt;
&lt;li&gt;Compare only domain-relevant records against QuickBooks.&lt;/li&gt;
&lt;li&gt;Flag mismatches for human review.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That redaction step matters a lot.&lt;/p&gt;

&lt;p&gt;If raw exports include account numbers, personal notes, medical references, spouse purchases, or unrelated business details, those should not be visible to the reconciliation step unless they are absolutely required.&lt;/p&gt;

&lt;p&gt;Once broad raw context enters a shared workspace, you have already lost the clean boundary.&lt;/p&gt;

&lt;p&gt;Now your reconciliation problem is also a privacy problem.&lt;/p&gt;
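
&lt;p&gt;One way to enforce "not visible unless absolutely required" is an allowlist rather than a blocklist. The sketch below assumes a hypothetical set of four fields that reconciliation needs; the field names and the sample row are illustrative, not taken from any real export format.&lt;/p&gt;

```python
# Hypothetical allowlist: the only fields the reconciliation step needs.
RECONCILIATION_FIELDS = ("date", "amount", "domain", "record_type")


def project_for_reconciliation(row: dict, allowed=RECONCILIATION_FIELDS):
    """Return a new row containing only allowlisted fields.

    Unlike masking known-sensitive keys, this drops anything unexpected,
    so a new column in a raw export is hidden by default.
    """
    return {key: row[key] for key in allowed if key in row}


# Invented sample row for illustration.
raw_row = {
    "date": "2026-05-02",
    "amount": "412.50",
    "domain": "business",
    "record_type": "deposit",
    "account_number": "000123456789",
    "notes": "spouse card, ref 99887766",
}
safe_row = project_for_reconciliation(raw_row)
```

&lt;p&gt;The trade-off: an allowlist can hide a field you later turn out to need, but that failure is loud and easy to fix. A blocklist that misses a new sensitive column fails silently.&lt;/p&gt;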

&lt;h2&gt;
  
  
  A concrete implementation sketch
&lt;/h2&gt;

&lt;p&gt;If I were building this in a real workflow today, I would structure it like this.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Orchestrator decides the domain
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;

&lt;span class="n"&gt;Domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;personal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rental&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;business&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quickbooks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;invoice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payroll&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;practice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;insurance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;business&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;property&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lease&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rental&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;groceries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;family&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;personal card&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doctor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;personal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



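&lt;p&gt;In production, use better classification than keyword rules: the substring checks above match "rent" inside "current", and the first matching branch wins even when a request mixes domains.&lt;br&gt;
But the principle stays the same: route first, retrieve later.&lt;/p&gt;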
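
&lt;p&gt;Still keyword-based, but a slightly stricter sketch: match whole words only, and return &lt;code&gt;"unknown"&lt;/code&gt; when a request touches more than one domain. The name &lt;code&gt;route_request_strict&lt;/code&gt; and the &lt;code&gt;DOMAIN_KEYWORDS&lt;/code&gt; table are assumptions for illustration, and multi-word phrases such as "personal card" are left out for brevity.&lt;/p&gt;

```python
import re

# Hypothetical keyword sets; a real system would use a proper classifier.
DOMAIN_KEYWORDS = {
    "business": {"quickbooks", "invoice", "payroll", "practice", "insurance"},
    "rental": {"tenant", "rent", "property", "lease"},
    "personal": {"groceries", "family", "doctor"},
}


def route_request_strict(text: str):
    """Return a single domain, or "unknown" when zero or several domains match."""
    # Split into whole lowercase words so "rent" cannot match inside "current".
    words = set(re.findall(r"[a-z]+", text.lower()))
    matched = [
        domain
        for domain, keywords in DOMAIN_KEYWORDS.items()
        if words.intersection(keywords)
    ]
    # Exactly one matching domain is the only case that is safe to route.
    return matched[0] if len(matched) == 1 else "unknown"


single = route_request_strict("Did the tenant pay rent?")                 # "rental"
mixed = route_request_strict("Match the tenant payment to the invoice")   # "unknown"
no_hit = route_request_strict("Check my current account balance")         # "unknown"
```

&lt;p&gt;Treating a mixed request as &lt;code&gt;"unknown"&lt;/code&gt; is deliberate: it sends the request to a human instead of letting the first keyword decide which workspace gets opened.&lt;/p&gt;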
&lt;h3&gt;
  
  
  2) Redact before reconciliation
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;redact_row&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;redacted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;account_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;redacted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;redacted&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;account_number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[REDACTED]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;redacted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;redacted&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\b\d{4,}\b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[REDACTED]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redacted&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;redacted&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  3) Reconcile only like-for-like records
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;can_compare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record_a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record_b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;record_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;domain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;record_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;domain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt;
        &lt;span class="n"&gt;record_a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;record_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;record_b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;record_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That second check, on &lt;code&gt;record_type&lt;/code&gt;, is where a lot of bad automations go wrong.&lt;/p&gt;

&lt;p&gt;An invoice is not a deposit.&lt;br&gt;
A receivable is not settled cash.&lt;br&gt;
A pending insurance payment is not a bank transaction.&lt;/p&gt;

&lt;p&gt;If your agent skips those distinctions, it will create fake matches.&lt;/p&gt;
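
&lt;p&gt;One edge case worth guarding: with &lt;code&gt;.get()&lt;/code&gt;, two records that both lack a &lt;code&gt;domain&lt;/code&gt; compare as &lt;code&gt;None == None&lt;/code&gt;, so unlabeled rows look comparable. A stricter sketch, under the assumption that every record must carry both labels before it is eligible (&lt;code&gt;can_compare_strict&lt;/code&gt; is a hypothetical name):&lt;/p&gt;

```python
REQUIRED_KEYS = ("domain", "record_type")


def can_compare_strict(record_a: dict, record_b: dict):
    """True only when both records carry the same non-empty domain and record_type."""
    for key in REQUIRED_KEYS:
        value_a, value_b = record_a.get(key), record_b.get(key)
        # Missing or empty labels are never comparable, even to each other.
        if not value_a or not value_b or value_a != value_b:
            return False
    return True


unlabeled = can_compare_strict({}, {})  # False
same_kind = can_compare_strict(
    {"domain": "business", "record_type": "deposit"},
    {"domain": "business", "record_type": "deposit"},
)  # True
unlike = can_compare_strict(
    {"domain": "business", "record_type": "receivable"},
    {"domain": "business", "record_type": "deposit"},
)  # False
```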
&lt;h2&gt;
  
  
  Example: don’t compare QuickBooks invoices directly to bank deposits
&lt;/h2&gt;

&lt;p&gt;This is the exact kind of bug that looks smart in demos and causes pain later.&lt;/p&gt;

&lt;p&gt;Bad logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# wrong: amount-only matching across different record types
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;qb_invoice&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;bank_txn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;matched&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Safer logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reconcile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;qb_record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bank_record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;qb_record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;record_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;receivable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;bank_record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;record_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deposit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# still not enough to auto-match
&lt;/span&gt;    &lt;span class="c1"&gt;# adjustment logic, payment processor mapping, and timing windows belong here
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;review_required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The boring answer is often the correct one:&lt;/p&gt;

&lt;p&gt;If you do not have explicit adjustment logic, do not auto-match.&lt;br&gt;
Send it to review.&lt;/p&gt;
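
&lt;p&gt;For illustration, here is roughly what "explicit adjustment logic" could look like. Everything specific is an assumption: the &lt;code&gt;adjustment&lt;/code&gt; field, amounts stored as strings, dates stored as &lt;code&gt;date&lt;/code&gt; objects, and the 5-day window. The rule it encodes is narrow: auto-match only when a recorded adjustment explains the deposit exactly, otherwise review.&lt;/p&gt;

```python
from datetime import date
from decimal import Decimal


def reconcile_with_adjustment(receivable: dict, deposit: dict, window_days: int = 5):
    """Return "matched" only when explicit adjustment data explains the deposit."""
    adjustment = receivable.get("adjustment")
    if adjustment is None:
        # No recorded adjustment: a human decides, the code does not guess.
        return "review_required"

    # Decimal avoids float rounding when comparing currency amounts.
    expected = Decimal(receivable["amount"]) - Decimal(adjustment)
    days_apart = abs((deposit["date"] - receivable["date"]).days)

    if expected == Decimal(deposit["amount"]) and days_apart in range(window_days + 1):
        return "matched"
    return "review_required"


# Invented sample records for illustration.
receivable = {"amount": "500.00", "adjustment": "87.50", "date": date(2026, 5, 1)}
deposit = {"amount": "412.50", "date": date(2026, 5, 4)}
explained = reconcile_with_adjustment(receivable, deposit)  # "matched"

no_adjustment = {"amount": "500.00", "date": date(2026, 5, 1)}
unexplained = reconcile_with_adjustment(no_adjustment, deposit)  # "review_required"
```

&lt;p&gt;Note that the fallback is always &lt;code&gt;"review_required"&lt;/code&gt;, never a best guess.&lt;/p&gt;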
&lt;h2&gt;
  
  
  What this looks like in n8n or Make
&lt;/h2&gt;

&lt;p&gt;This pattern maps cleanly to automation tools.&lt;/p&gt;
&lt;h3&gt;
  
  
  n8n shape
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Webhook / Schedule
  -&amp;gt; Fetch export
  -&amp;gt; Redaction node
  -&amp;gt; Classification node
  -&amp;gt; Switch by domain
      -&amp;gt; Personal workflow
      -&amp;gt; Rental workflow
      -&amp;gt; Business workflow
  -&amp;gt; Reconciliation node
  -&amp;gt; Human review queue
  -&amp;gt; Notification / report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Make shape
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scheduler
  -&amp;gt; Download CSV / API records
  -&amp;gt; Text parser / code module for redaction
  -&amp;gt; Router by domain
  -&amp;gt; Domain-specific reconciliation scenario
  -&amp;gt; Airtable / Notion / Slack review queue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is exactly where teams start seeing a cost problem too.&lt;/p&gt;

&lt;p&gt;Because the safer design usually means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more classification calls&lt;/li&gt;
&lt;li&gt;more review passes&lt;/li&gt;
&lt;li&gt;more retries&lt;/li&gt;
&lt;li&gt;more delegated agent steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is good architecture.&lt;br&gt;
But per-token pricing punishes it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why cost predictability matters more after you fix the architecture
&lt;/h2&gt;

&lt;p&gt;This is the twist people miss.&lt;/p&gt;

&lt;p&gt;The architecture that reduces financial risk often increases automation activity.&lt;/p&gt;

&lt;p&gt;If you split one risky workflow into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an orchestrator&lt;/li&gt;
&lt;li&gt;3 domain agents&lt;/li&gt;
&lt;li&gt;a redaction step&lt;/li&gt;
&lt;li&gt;a mismatch review loop&lt;/li&gt;
&lt;li&gt;retries for uncertain classifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...you now have a safer system and more LLM calls.&lt;/p&gt;

&lt;p&gt;That is exactly why flat-rate compute is useful for agent workflows.&lt;/p&gt;

&lt;p&gt;If you are running always-on automations through n8n, Make, Zapier, OpenClaw, or custom workers, you do not want engineers second-guessing every extra review step because each call feels billable.&lt;/p&gt;

&lt;p&gt;This is the practical appeal of Standard Compute: it is a drop-in OpenAI API replacement with flat monthly pricing, so you can keep the safer multi-step workflow instead of collapsing everything into one risky prompt just to control token spend.&lt;/p&gt;

&lt;p&gt;You keep the OpenAI-compatible client.&lt;br&gt;
You stop obsessing over every background agent call.&lt;br&gt;
You can afford caution.&lt;/p&gt;

&lt;p&gt;That matters a lot in finance workflows, where the safe system is usually the one with more stages.&lt;/p&gt;
&lt;h2&gt;
  
  
  Quick comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;What actually happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single finance agent with full account access&lt;/td&gt;
&lt;td&gt;Fast to set up, but personal, rental, and business context bleed together and auditing gets ugly fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Separate workspaces plus orchestrator&lt;/td&gt;
&lt;td&gt;Cleaner delegation, lower privacy leakage, better reviewability, and fewer cross-domain mistakes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redaction-first staged pipeline&lt;/td&gt;
&lt;td&gt;Sensitive fields are removed before reconciliation, which is much safer for mixed exports and shared automations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  What I would deploy
&lt;/h2&gt;

&lt;p&gt;Here is the production pattern I would start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;orchestrator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;job&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;route requests by financial domain&lt;/span&gt;
  &lt;span class="na"&gt;can_access&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metadata_only&lt;/span&gt;

&lt;span class="na"&gt;personal_finance_agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redacted_personal_exports&lt;/span&gt;
  &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;personal_only&lt;/span&gt;

&lt;span class="na"&gt;rental_finance_agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redacted_rental_exports&lt;/span&gt;
  &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;rental_only&lt;/span&gt;

&lt;span class="na"&gt;corporation_finance_agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redacted_business_exports&lt;/span&gt;
  &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;business_only&lt;/span&gt;

&lt;span class="na"&gt;reconciliation_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;never match quickbooks receivables directly to bank deposits without adjustment logic&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;flag mismatches for human review&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;require explicit definition of business-related records&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;never auto-match across domains&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not clever.&lt;br&gt;
Reliable.&lt;/p&gt;
&lt;h2&gt;
  
  
  If you want to test this pattern locally
&lt;/h2&gt;

&lt;p&gt;You can stub the workflow with a small Python service and keep your existing OpenAI SDK shape.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;finance-agent-boundaries
&lt;span class="nb"&gt;cd &lt;/span&gt;finance-agent-boundaries
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi uvicorn openai pydantic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one router endpoint&lt;/li&gt;
&lt;li&gt;one redaction worker&lt;/li&gt;
&lt;li&gt;one reconciliation worker per domain&lt;/li&gt;
&lt;li&gt;one review queue sink&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your stack already talks to an OpenAI-compatible API, swapping endpoints is easy. That is useful when you want to keep the SDK code but stop paying per-token for every extra safety step.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do on Monday
&lt;/h2&gt;

&lt;p&gt;If one agent currently touches every finance account you have, do not start with prompt tuning.&lt;/p&gt;

&lt;p&gt;Start with boundaries.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Split personal, rental, and business workflows into separate workspaces.&lt;/li&gt;
&lt;li&gt;Put an orchestrator in front of them.&lt;/li&gt;
&lt;li&gt;Add a redaction-first preprocessing step.&lt;/li&gt;
&lt;li&gt;Treat QuickBooks invoices and bank deposits as different record types, and do not auto-match them unless you have real adjustment logic.&lt;/li&gt;
&lt;li&gt;Tell the agent to flag mismatches instead of forcing a match.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That was the real lesson in that dental-practice thread.&lt;/p&gt;

&lt;p&gt;Not how to do bookkeeping with AI.&lt;/p&gt;

&lt;p&gt;How to keep your finance automation from becoming confidently wrong.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
      <category>devops</category>
    </item>
    <item>
      <title>My fix for hallucinating case notes was weirdly boring: stop stuffing context and split the job in two</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Sun, 07 Jun 2026 17:37:21 +0000</pubDate>
      <link>https://dev.to/lars_winstand/my-fix-for-hallucinating-case-notes-was-weirdly-boring-stop-stuffing-context-and-split-the-job-in-5e2d</link>
      <guid>https://dev.to/lars_winstand/my-fix-for-hallucinating-case-notes-was-weirdly-boring-stop-stuffing-context-and-split-the-job-in-5e2d</guid>
      <description>&lt;p&gt;I keep seeing the same failure mode in note-to-action workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;therapy notes -&amp;gt; action plan&lt;/li&gt;
&lt;li&gt;incident report -&amp;gt; follow-up steps&lt;/li&gt;
&lt;li&gt;HR case log -&amp;gt; recommendation&lt;/li&gt;
&lt;li&gt;intake notes -&amp;gt; classification + priority&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model output sounds polished.&lt;/p&gt;

&lt;p&gt;And then you trace it back to the source notes and realize half the reasoning is mush.&lt;/p&gt;

&lt;p&gt;Not always fully fabricated. Just... not grounded enough to trust.&lt;/p&gt;

&lt;p&gt;I got pulled back into this after reading a thread on r/openclaw from someone trying to turn therapy notes into action plans with OpenClaw. The problem was painfully familiar: the agent could read the notes, but the recommendations drifted away from the actual evidence.&lt;/p&gt;

&lt;p&gt;One reply had the best fix in the whole thread:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Hallucination in data extraction usually happens when the prompt is too open-ended or the context window is crowded. Try implementing a two-step verification process: first, have the agent extract raw quotes from the notes that support the action item, and then have a second pass generate the action plan based only on those quotes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That answer is boring.&lt;/p&gt;

&lt;p&gt;Which is exactly why I trust it.&lt;/p&gt;

&lt;p&gt;The real fix usually is not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a smarter system prompt&lt;/li&gt;
&lt;li&gt;a bigger model&lt;/li&gt;
&lt;li&gt;a 1M-token context window&lt;/li&gt;
&lt;li&gt;a more elaborate agent loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real fix is architectural.&lt;/p&gt;

&lt;p&gt;Split the job in two.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual bug: one prompt doing too many jobs
&lt;/h2&gt;

&lt;p&gt;A single-pass note-to-action prompt usually asks one model call to do all of this at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read messy notes&lt;/li&gt;
&lt;li&gt;decide what matters&lt;/li&gt;
&lt;li&gt;classify the case&lt;/li&gt;
&lt;li&gt;infer missing context&lt;/li&gt;
&lt;li&gt;prioritize risk&lt;/li&gt;
&lt;li&gt;generate recommendations&lt;/li&gt;
&lt;li&gt;explain why&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not one task.&lt;/p&gt;

&lt;p&gt;That is a committee meeting inside a single prompt.&lt;/p&gt;

&lt;p&gt;For sensitive workflows, that’s where things go sideways.&lt;/p&gt;

&lt;p&gt;If you’re using OpenClaw, n8n, Make, Zapier, or a custom agent stack to process notes and trigger downstream actions, you do not want the model improvising across all of those steps in one shot.&lt;/p&gt;

&lt;p&gt;You want a chain of custody for the evidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two-pass pattern
&lt;/h2&gt;

&lt;p&gt;This is the version I’d ship first.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;What actually happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single-pass note-to-action prompt&lt;/td&gt;
&lt;td&gt;One model call does extraction, classification, and recommendations together. Fast to prototype, but recommendations are generated from the full noisy context, so drift is harder to catch.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two-pass grounded workflow&lt;/td&gt;
&lt;td&gt;Pass 1 extracts evidence or quotes with source references; Pass 2 generates recommendations from only approved evidence. More auditable, easier to debug, and much safer for sensitive workflows.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Pass 1: extract evidence only
&lt;/h3&gt;

&lt;p&gt;Do not ask for recommendations.&lt;/p&gt;

&lt;p&gt;Do not ask for synthesis.&lt;/p&gt;

&lt;p&gt;Ask for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;verbatim quotes&lt;/li&gt;
&lt;li&gt;structured facts&lt;/li&gt;
&lt;li&gt;source references&lt;/li&gt;
&lt;li&gt;confidence or ambiguity flags if needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"evidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ev_001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"quote"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Client reported missing two medication doses this week."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"note_2026_06_05"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"line_range"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"14-14"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"adherence_issue"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
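
&lt;p&gt;One way to hold Pass 1 to "verbatim" is to check each quote against the note it claims to come from before Pass 2 ever sees it. A minimal sketch, assuming evidence items shaped like the schema above; &lt;code&gt;isVerbatim&lt;/code&gt;, &lt;code&gt;splitByVerbatim&lt;/code&gt;, and the &lt;code&gt;notesById&lt;/code&gt; lookup are hypothetical names, not part of any SDK:&lt;/p&gt;

```javascript
// Collapse runs of whitespace so line wrapping in the notes does not cause false rejections.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// True only when the quote is a non-empty substring of the note it cites.
function isVerbatim(item, notesById) {
  const sourceText = notesById[item.source];
  if (typeof sourceText !== "string") return false;
  if (typeof item.quote !== "string") return false;
  const quote = normalizeWhitespace(item.quote);
  if (quote.length === 0) return false;
  return normalizeWhitespace(sourceText).includes(quote);
}

// Split Pass 1 output into grounded evidence and items that need a second look.
function splitByVerbatim(evidence, notesById) {
  const grounded = [];
  const rejected = [];
  for (const item of evidence) {
    if (isVerbatim(item, notesById)) {
      grounded.push(item);
    } else {
      rejected.push(item);
    }
  }
  return { grounded, rejected };
}

// Illustrative data only.
const notesById = {
  note_2026_06_05:
    "Client reported missing two medication doses this week.\nClient denied suicidal ideation.",
};

const result = splitByVerbatim(
  [
    { id: "ev_001", quote: "Client reported missing two medication doses this week.", source: "note_2026_06_05" },
    { id: "ev_002", quote: "Client is at high risk of relapse.", source: "note_2026_06_05" },
  ],
  notesById
);

console.log(result.grounded.map((item) => item.id)); // [ 'ev_001' ]
console.log(result.rejected.map((item) => item.id)); // [ 'ev_002' ]
```

&lt;p&gt;Anything that lands in &lt;code&gt;rejected&lt;/code&gt; is a paraphrase or an invention, which is the thing you want to catch before it turns into a recommendation.&lt;/p&gt;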



&lt;h3&gt;
  
  
  Pass 2: generate recommendations from evidence only
&lt;/h3&gt;

&lt;p&gt;Now the model gets a much smaller input.&lt;/p&gt;

&lt;p&gt;No giant note blob. No hidden distractions. Just the extracted evidence.&lt;/p&gt;

&lt;p&gt;Every recommendation should cite evidence IDs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"recommendations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Schedule medication adherence follow-up within 48 hours."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"evidence_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ev_001"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
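
&lt;p&gt;"Should cite" is easy to enforce in code rather than by trust. A rough sketch, assuming the two JSON shapes above; &lt;code&gt;findUncitedRecommendations&lt;/code&gt; is a hypothetical helper name:&lt;/p&gt;

```javascript
// Return every recommendation that cites nothing, or cites an id Pass 1 never produced.
function findUncitedRecommendations(recommendations, evidence) {
  const knownIds = new Set(evidence.map((item) => item.id));
  return recommendations.filter((rec) => {
    const ids = Array.isArray(rec.evidence_ids) ? rec.evidence_ids : [];
    if (ids.length === 0) return true; // no citation at all
    return ids.some((id) => !knownIds.has(id)); // cites unknown evidence
  });
}

// Illustrative data only.
const evidence = [{ id: "ev_001" }];
const recommendations = [
  { action: "Schedule medication adherence follow-up within 48 hours.", priority: "high", evidence_ids: ["ev_001"] },
  { action: "Refer to housing services.", priority: "medium", evidence_ids: ["ev_009"] },
  { action: "Increase session frequency.", priority: "low", evidence_ids: [] },
];

const uncited = findUncitedRecommendations(recommendations, evidence);

// The second and third recommendations are flagged; the first one passes.
console.log(uncited.map((rec) => rec.action));
```

&lt;p&gt;If that list is not empty, drop the flagged items or send the whole plan to review.&lt;/p&gt;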



&lt;h3&gt;
  
  
  Fallback: abstain or escalate
&lt;/h3&gt;

&lt;p&gt;This is the step teams skip.&lt;/p&gt;

&lt;p&gt;If the evidence is weak, conflicting, or incomplete, the workflow should return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;insufficient evidence&lt;/li&gt;
&lt;li&gt;needs human review&lt;/li&gt;
&lt;li&gt;unable to classify&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not failure.&lt;/p&gt;

&lt;p&gt;That is a working safety mechanism.&lt;/p&gt;
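
&lt;p&gt;In code, that can be a small gate that runs before anything downstream is triggered. This is only a sketch: the status strings and the &lt;code&gt;ambiguous&lt;/code&gt; flag are assumptions chosen for illustration, not a fixed schema.&lt;/p&gt;

```javascript
// Decide whether the workflow may proceed, based only on Pass 1 output.
function decideStatus(evidence) {
  if (evidence.length === 0) return "insufficient_evidence";
  // Pass 1 is assumed to set `ambiguous: true` on weak or conflicting items.
  if (evidence.some((item) => item.ambiguous === true)) return "needs_human_review";
  // Items without a type could not be classified during extraction.
  if (evidence.some((item) => !item.type)) return "unable_to_classify";
  return "ok";
}

console.log(decideStatus([])); // insufficient_evidence
console.log(decideStatus([{ id: "ev_001", type: "adherence_issue", ambiguous: true }])); // needs_human_review
console.log(decideStatus([{ id: "ev_001", type: "adherence_issue" }])); // ok
```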

&lt;h2&gt;
  
  
  Why stuffing more context often makes outputs worse
&lt;/h2&gt;

&lt;p&gt;People still talk about long context like it automatically solves grounding.&lt;/p&gt;

&lt;p&gt;It doesn’t.&lt;/p&gt;

&lt;p&gt;A big context window means the model can receive more text. It does not mean the model will reliably use the right text.&lt;/p&gt;

&lt;p&gt;That’s the important distinction.&lt;/p&gt;

&lt;p&gt;The "Lost in the Middle" result (Liu et al., 2023) is still one of the clearest explanations for why giant prompts underperform in practice: models tend to use information near the start or end of a long context more reliably than information buried in the middle.&lt;/p&gt;

&lt;p&gt;That matches what a lot of us see in production.&lt;/p&gt;

&lt;p&gt;You stuff in every note, all prior history, metadata, policy text, and a giant instruction block because it feels safer than leaving anything out.&lt;/p&gt;

&lt;p&gt;But now the important sentence is buried on page 8 between irrelevant details.&lt;/p&gt;

&lt;p&gt;The model has more text.&lt;/p&gt;

&lt;p&gt;It does not have better grounding.&lt;/p&gt;

&lt;p&gt;That is why retrieval and scoped evidence extraction keep beating context stuffing in real systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Long context is useful. Just not for every job.
&lt;/h2&gt;

&lt;p&gt;I’m not anti-long-context.&lt;/p&gt;

&lt;p&gt;Long context is great for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;policy Q&amp;amp;A&lt;/li&gt;
&lt;li&gt;chat-with-documents&lt;/li&gt;
&lt;li&gt;repeated analysis over a stable corpus&lt;/li&gt;
&lt;li&gt;cached prompts over large reference material&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I want Claude or GPT-5 to answer questions about a handbook, a giant cached prompt can be elegant.&lt;/p&gt;

&lt;p&gt;If I want a model to turn sensitive notes into recommendations, I want evidence extraction first.&lt;/p&gt;

&lt;p&gt;Different job. Different failure mode.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical implementation with an OpenAI-compatible client
&lt;/h2&gt;

&lt;p&gt;This is where things get useful for devs.&lt;/p&gt;

&lt;p&gt;If your stack already talks to the OpenAI API, you can usually implement the two-pass pattern without rewriting your app architecture.&lt;/p&gt;

&lt;p&gt;That matters because the best workflow changes are the ones you can actually ship.&lt;/p&gt;

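&lt;p&gt;Here’s a minimal example using the OpenAI Node SDK shape. Treat the model names as placeholders: the second call names a Claude model, which only works when &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt; points at a gateway that routes to it, and that endpoint also has to support the Responses API that &lt;code&gt;client.responses.create&lt;/code&gt; calls.&lt;/p&gt;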

&lt;h3&gt;
  
  
  Pass 1: evidence extraction
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_BASE_URL&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;notes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
Case note 1: Client reported missing two medication doses this week.
Case note 2: Client denied suicidal ideation.
Case note 3: Client requested transportation help for next appointment.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;extraction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5.4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Extract only verbatim evidence from the notes.
Return JSON with this schema:
{
  "evidence": [
    {
      "id": "string",
      "quote": "string",
      "source": "string",
      "type": "string"
    }
  ]
}
Do not generate recommendations.`&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;notes&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;extraction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pass 2: recommendations from extracted evidence only
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;extractedEvidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;extraction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus-4.6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Generate recommendations using only the provided evidence.
Return JSON with this schema:
{
  "recommendations": [
    {
      "action": "string",
      "priority": "low|medium|high",
      "evidence_ids": ["string"]
    }
  ],
  "status": "ok|insufficient_evidence"
}
Every recommendation must cite evidence_ids.
If evidence is insufficient, return status=insufficient_evidence.`&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;extractedEvidence&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few practical notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can use the same model for both passes.&lt;/li&gt;
&lt;li&gt;You can route extraction to a cheaper model and recommendation to a stronger one.&lt;/li&gt;
&lt;li&gt;You can validate JSON between steps.&lt;/li&gt;
&lt;li&gt;You can insert human review after pass 1.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one is huge for sensitive workflows.&lt;/p&gt;
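
&lt;p&gt;The examples above hand &lt;code&gt;extraction.output_text&lt;/code&gt; straight to Pass 2, so malformed output would flow through untouched. A minimal sketch of the JSON validation step, assuming the Pass 1 schema from the prompt; &lt;code&gt;parseEvidence&lt;/code&gt; is a hypothetical helper, and a schema library would do the same job with less code:&lt;/p&gt;

```javascript
// Parse and shape-check Pass 1 output. Returns a result object instead of throwing.
function parseEvidence(outputText) {
  let parsed;
  try {
    parsed = JSON.parse(outputText);
  } catch (error) {
    return { ok: false, reason: "not valid JSON" };
  }
  if (parsed === null || typeof parsed !== "object" || !Array.isArray(parsed.evidence)) {
    return { ok: false, reason: "missing evidence array" };
  }
  const requiredFields = ["id", "quote", "source", "type"];
  const malformed = parsed.evidence.filter(
    (item) =>
      item === null ||
      typeof item !== "object" ||
      requiredFields.some((field) => typeof item[field] !== "string" || item[field].length === 0)
  );
  if (malformed.length > 0) {
    return { ok: false, reason: "evidence items missing required fields" };
  }
  return { ok: true, evidence: parsed.evidence };
}

const good = parseEvidence(
  JSON.stringify({
    evidence: [{ id: "ev_001", quote: "Client denied suicidal ideation.", source: "note_2", type: "risk_screen" }],
  })
);
const chatty = parseEvidence("Sure! Here is the JSON you asked for.");

console.log(good.ok); // true
console.log(chatty.reason); // not valid JSON
```

&lt;p&gt;Only call Pass 2 when &lt;code&gt;ok&lt;/code&gt; is true. Otherwise retry once or route the case to a human.&lt;/p&gt;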

&lt;h2&gt;
  
  
  Why this pattern fits Standard Compute really well
&lt;/h2&gt;

&lt;p&gt;This is exactly the kind of workflow where per-token pricing gets annoying fast.&lt;/p&gt;

&lt;p&gt;Once you split one giant prompt into two or three smaller calls, you usually get better reliability.&lt;/p&gt;

&lt;p&gt;But you also increase call volume.&lt;/p&gt;

&lt;p&gt;That’s the right tradeoff technically. It’s just annoying financially on standard per-token billing.&lt;/p&gt;

&lt;p&gt;This is where Standard Compute makes a lot of sense for agent and automation teams.&lt;/p&gt;

&lt;p&gt;You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unlimited AI compute for a flat monthly price&lt;/li&gt;
&lt;li&gt;an OpenAI-compatible API&lt;/li&gt;
&lt;li&gt;drop-in use with existing SDKs and HTTP clients&lt;/li&gt;
&lt;li&gt;dynamic routing across GPT-5.4, Claude Opus 4.6, and Grok 4.20&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of trying to cram everything into one giant call to save tokens, you can design the workflow the way it should be designed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one pass for extraction&lt;/li&gt;
&lt;li&gt;one pass for recommendation&lt;/li&gt;
&lt;li&gt;maybe one pass for validation or abstention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s especially useful if you’re building in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;n8n&lt;/li&gt;
&lt;li&gt;Make&lt;/li&gt;
&lt;li&gt;Zapier&lt;/li&gt;
&lt;li&gt;OpenClaw&lt;/li&gt;
&lt;li&gt;custom agent frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Flat-rate compute changes behavior.&lt;/p&gt;

&lt;p&gt;People stop asking, “Can we afford one more verification step?”&lt;/p&gt;

&lt;p&gt;And start asking the better question:&lt;/p&gt;

&lt;p&gt;“Does one more verification step make this workflow safer and easier to debug?”&lt;/p&gt;

&lt;p&gt;That is a much healthier way to build.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenClaw setup for local or agent-based workflows
&lt;/h2&gt;

&lt;p&gt;If you’re using OpenClaw as the orchestration layer, splitting responsibilities between agents is straightforward.&lt;/p&gt;

&lt;p&gt;Basic setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest
openclaw onboard &lt;span class="nt"&gt;--install-daemon&lt;/span&gt;
openclaw dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Health checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw status
openclaw status &lt;span class="nt"&gt;--all&lt;/span&gt;
openclaw health &lt;span class="nt"&gt;--json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A nice pattern here is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent 1: retrieve relevant notes/chunks&lt;/li&gt;
&lt;li&gt;Agent 2: extract evidence only&lt;/li&gt;
&lt;li&gt;Agent 3: generate recommendations from approved evidence&lt;/li&gt;
&lt;li&gt;Agent 4: escalate if evidence is incomplete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That beats one overloaded agent trying to do everything in one breath.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retrieval vs stuffing
&lt;/h2&gt;

&lt;p&gt;For tiny corpora, stuffing can be fine.&lt;/p&gt;

&lt;p&gt;For growing corpora, retrieval usually wins.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Tradeoff&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context stuffing&lt;/td&gt;
&lt;td&gt;Simpler for small corpora. But as prompts grow, relevant facts can get buried in the middle and become harder for the model to use correctly.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval + reranking&lt;/td&gt;
&lt;td&gt;More moving parts, but it scales better and is stronger when the right evidence would otherwise be lost inside long context.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
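
&lt;p&gt;To make the retrieval row concrete, here is a deliberately tiny sketch. It scores chunks by keyword overlap, which is a stand-in for the embeddings and reranker a real system would use; &lt;code&gt;tokenize&lt;/code&gt; and &lt;code&gt;retrieveTopK&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```javascript
// Lowercase the text and split it into alphanumeric terms.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Score each chunk by how many distinct query terms it contains, then keep the top k.
function retrieveTopK(query, chunks, k) {
  const queryTerms = new Set(tokenize(query));
  return chunks
    .map((chunk) => {
      const terms = new Set(tokenize(chunk.text));
      let score = 0;
      for (const term of queryTerms) {
        if (terms.has(term)) score += 1;
      }
      return { ...chunk, score };
    })
    .filter((chunk) => chunk.score > 0) // never pass along chunks with no overlap
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Illustrative data only.
const chunks = [
  { id: "c1", text: "Client reported missing two medication doses this week." },
  { id: "c2", text: "Client requested transportation help for next appointment." },
  { id: "c3", text: "Office closed Monday for the holiday." },
];

const top = retrieveTopK("missed medication doses", chunks, 2);
console.log(top.map((chunk) => chunk.id)); // [ 'c1' ]
```

&lt;p&gt;The point is the shape: the extraction pass only ever sees the chunks that survive this filter, not the whole corpus.&lt;/p&gt;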

&lt;p&gt;If your recommendation step starts from the wrong chunks, it does not matter whether you picked GPT-5.4, Claude Opus 4.6, Grok 4.20, Qwen, or Llama.&lt;/p&gt;

&lt;p&gt;The output will still drift because the evidence was wrong or incomplete upstream.&lt;/p&gt;

&lt;p&gt;That’s why model shopping is often a distraction.&lt;/p&gt;

&lt;p&gt;If the architecture is wrong, a better model just gives you more fluent mistakes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The default workflow I’d recommend
&lt;/h2&gt;

&lt;p&gt;If you’re building anything that turns notes into decisions, this is the default I’d start with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieve the smallest useful evidence set&lt;/li&gt;
&lt;li&gt;Extract verbatim quotes and structured facts first&lt;/li&gt;
&lt;li&gt;Attach source references to every extracted item&lt;/li&gt;
&lt;li&gt;Generate recommendations only from extracted evidence&lt;/li&gt;
&lt;li&gt;Allow abstention when evidence is weak or missing&lt;/li&gt;
&lt;li&gt;Add human review where the cost of error is high&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It’s not glamorous.&lt;/p&gt;

&lt;p&gt;It won’t win any prompt engineering beauty contests.&lt;/p&gt;

&lt;p&gt;But it survives contact with reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;When a model hallucinates on case notes, my first question is no longer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Should we switch models?&lt;/li&gt;
&lt;li&gt;Should we increase the context window?&lt;/li&gt;
&lt;li&gt;Should we write a smarter prompt?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My first question is:&lt;/p&gt;

&lt;p&gt;Why did we ask one model call to do three jobs while hiding the evidence in the middle of a giant prompt?&lt;/p&gt;

&lt;p&gt;If you fix that, a lot of the "hallucination" problem gets much less mysterious.&lt;/p&gt;

&lt;p&gt;And if you’re running these workflows at scale, this is also where flat-rate infrastructure becomes practical, not just cheaper. The more your architecture improves, the more multi-step validation you’ll want. Standard Compute is built for exactly that kind of agent workload.&lt;/p&gt;

&lt;p&gt;If your stack already uses an OpenAI-compatible client, it’s a very small implementation change to test this pattern.&lt;/p&gt;

&lt;p&gt;That’s probably the most useful part of this whole idea:&lt;/p&gt;

&lt;p&gt;The fix is boring.&lt;/p&gt;

&lt;p&gt;Which means you can ship it this week.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>openai</category>
      <category>devops</category>
    </item>
    <item>
      <title>My OpenClaw agent started writing nonsense and the real fix was a kill switch, not a better prompt</title>
      <dc:creator>Lars Winstand</dc:creator>
      <pubDate>Sun, 07 Jun 2026 09:38:26 +0000</pubDate>
      <link>https://dev.to/lars_winstand/my-openclaw-agent-started-writing-nonsense-and-the-real-fix-was-a-kill-switch-not-a-better-prompt-4092</link>
      <guid>https://dev.to/lars_winstand/my-openclaw-agent-started-writing-nonsense-and-the-real-fix-was-a-kill-switch-not-a-better-prompt-4092</guid>
      <description>&lt;p&gt;I hit a thread on r/openclaw with the perfect title: &lt;strong&gt;“How to stop an insane model from openclaw.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s the whole post, honestly.&lt;/p&gt;

&lt;p&gt;Because once &lt;code&gt;/abort&lt;/code&gt; stops working, you are not doing prompt engineering anymore. You are doing incident response.&lt;/p&gt;

&lt;p&gt;The original poster was running &lt;strong&gt;OpenClaw&lt;/strong&gt; with &lt;strong&gt;Ollama&lt;/strong&gt; and &lt;strong&gt;Kimi-K2.6:cloud&lt;/strong&gt;. The agent started dumping gibberish. &lt;code&gt;/abort&lt;/code&gt; didn’t help. &lt;code&gt;stop&lt;/code&gt; didn’t help. Restarting Ollama didn’t help.&lt;/p&gt;

&lt;p&gt;That’s the moment where a lot of people reach for a better system prompt.&lt;/p&gt;

&lt;p&gt;I think that’s the wrong instinct.&lt;/p&gt;

&lt;p&gt;If a coding agent has shell access and write access, the real question is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do I make the model behave?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It’s:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do I make a bad run cheap to kill and easy to clean up?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My take: &lt;strong&gt;self-healing agents are only safe if they are easy to contain, supervise, and terminate from outside the chat loop.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So if you’re running OpenClaw, or building similar agent flows in &lt;strong&gt;n8n&lt;/strong&gt;, &lt;strong&gt;Make&lt;/strong&gt;, &lt;strong&gt;Zapier&lt;/strong&gt;, or your own runner, here’s the setup I’d actually trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  The big mistake: treating prompts like a safety system
&lt;/h2&gt;

&lt;p&gt;Prompts matter.&lt;/p&gt;

&lt;p&gt;Good prompts reduce ambiguity. Narrow scopes help. Short tasks are easier to recover from than “go refactor the app.”&lt;/p&gt;

&lt;p&gt;But prompts are steering.&lt;/p&gt;

&lt;p&gt;They are not brakes.&lt;/p&gt;

&lt;p&gt;That distinction gets ignored because the happy path looks great. You point &lt;strong&gt;GPT-5.4&lt;/strong&gt; or &lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; at a coding task, it edits files, runs tests, explains the diff, and everyone starts believing the agent is reliable.&lt;/p&gt;

&lt;p&gt;Then one run goes feral and you remember what this really is: a probabilistic system with tool access.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;/abort&lt;/code&gt; is dead, your prompt is no longer the control plane.&lt;/p&gt;

&lt;p&gt;Your architecture is.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually helped in the OpenClaw thread
&lt;/h2&gt;

&lt;p&gt;The most useful reply in that thread was also the least glamorous.&lt;/p&gt;

&lt;p&gt;A Reddit user basically said: don’t trust that repo path anymore, and run the agent in a &lt;strong&gt;git worktree&lt;/strong&gt; or disposable clone so the blast radius stays contained.&lt;/p&gt;

&lt;p&gt;That is exactly right.&lt;/p&gt;

&lt;p&gt;Not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add another safety reminder&lt;/li&gt;
&lt;li&gt;repeat the task more clearly&lt;/li&gt;
&lt;li&gt;ask the model to be careful&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;isolate the workspace&lt;/li&gt;
&lt;li&gt;gate writes&lt;/li&gt;
&lt;li&gt;supervise liveness&lt;/li&gt;
&lt;li&gt;kill fast when the run degrades&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s how you make agent failures survivable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rule #1: never let a coding agent work directly in your main checkout
&lt;/h2&gt;

&lt;p&gt;If an agent can edit files, the first thing you should control is blast radius.&lt;/p&gt;

&lt;p&gt;The cheapest way to do that for local dev is usually &lt;strong&gt;git worktree&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Create a disposable workspace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git worktree add &lt;span class="nt"&gt;-b&lt;/span&gt; agent-sandbox ../repo-agent-sandbox
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or base the sandbox branch on &lt;code&gt;origin/main&lt;/code&gt; and track it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git worktree add &lt;span class="nt"&gt;--track&lt;/span&gt; &lt;span class="nt"&gt;-b&lt;/span&gt; agent-sandbox ../repo-agent-sandbox origin/main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the agent can do weird stuff in a separate working tree instead of vandalizing your main checkout.&lt;/p&gt;

&lt;p&gt;That one step changes the whole risk profile.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Execution mode&lt;/th&gt;
&lt;th&gt;What happens when the agent goes weird&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direct repo execution&lt;/td&gt;
&lt;td&gt;Highest blast radius. Fastest setup, worst failure mode.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disposable git worktree&lt;/td&gt;
&lt;td&gt;Cheap isolation for normal coding tasks. Easy diff review and cleanup.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disposable clone or container&lt;/td&gt;
&lt;td&gt;Better isolation, more overhead. Best for long-running or less trusted agents.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;My opinion: &lt;strong&gt;git worktree should be the default for coding agents&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Running directly in your main repo should feel sketchy, because it is.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rule #2: approvals belong outside the model
&lt;/h2&gt;

&lt;p&gt;This is where OpenClaw is actually better than people give it credit for.&lt;/p&gt;

&lt;p&gt;OpenClaw’s approvals system gives you host-level policy enforcement. That matters because external policy is real control. Model instructions are not.&lt;/p&gt;

&lt;p&gt;Check the current policy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw exec-policy show
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the cautious preset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw exec-policy preset cautious &lt;span class="nt"&gt;--json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



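&lt;p&gt;Set approvals from stdin. Note that the payload shown here (&lt;code&gt;security: "full"&lt;/code&gt;, &lt;code&gt;ask: "off"&lt;/code&gt;) reads as the permissive end of the policy, so check the approvals documentation and tighten both values before pointing this at a run you don’t fully trust:&lt;br&gt;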
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw approvals &lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;--stdin&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
{ version: 1, defaults: { security: "full", ask: "off" } }
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If I’m letting an agent write code, I want the host policy to be the source of truth.&lt;/p&gt;

&lt;p&gt;Not the prompt.&lt;/p&gt;

&lt;p&gt;Not the model’s self-reported plan.&lt;/p&gt;

&lt;p&gt;The host.&lt;/p&gt;

&lt;h3&gt;
  
  
  My practical approval rules
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read-only analysis&lt;/strong&gt;: mostly fine to automate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code edits inside a sandbox worktree&lt;/strong&gt;: allowed, but review before merge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shell commands that can rewrite state&lt;/strong&gt;: require approval&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anything touching deploys, secrets, infra, or production data&lt;/strong&gt;: separate environment or no autonomous execution&lt;/li&gt;
&lt;/ul&gt;
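
&lt;p&gt;Those four rules fit in a small lookup table. The sketch below is illustrative only: the category names and decision strings are made up for this example, and real enforcement still belongs in the host policy, not in application code.&lt;/p&gt;

```javascript
// Hypothetical mapping from action category to decision, mirroring the rules above.
const POLICY = new Map([
  ["read_only", "allow"],
  ["sandbox_edit", "allow_then_review"],
  ["state_changing_shell", "require_approval"],
  ["production", "deny"],
]);

// Unknown categories fall through to the strictest answer.
function decide(actionCategory) {
  return POLICY.has(actionCategory) ? POLICY.get(actionCategory) : "deny";
}

console.log(decide("read_only")); // allow
console.log(decide("state_changing_shell")); // require_approval
console.log(decide("something_new")); // deny
```

&lt;p&gt;Defaulting to deny matters more than the table itself: a category nobody thought about should never be the one that runs unattended.&lt;/p&gt;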

&lt;p&gt;Yes, this is slower than YOLO mode.&lt;/p&gt;

&lt;p&gt;That’s the point.&lt;/p&gt;

&lt;p&gt;Guardrails are supposed to feel annoying right before they save you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rule #3: if &lt;code&gt;/abort&lt;/code&gt; fails, you need a kill path outside the chat loop
&lt;/h2&gt;

&lt;p&gt;One of the funniest replies in that thread was just:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;ctrl C&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Blunt, but correct.&lt;/p&gt;

&lt;p&gt;If the in-band abort path is broken, you need out-of-band control.&lt;/p&gt;

&lt;p&gt;That means your agent runner should support a hard kill from the supervisor layer, not just from the conversation layer.&lt;/p&gt;

&lt;p&gt;At minimum, I want all of these available:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keyboard interrupt&lt;/li&gt;
&lt;li&gt;process kill&lt;/li&gt;
&lt;li&gt;child process cleanup&lt;/li&gt;
&lt;li&gt;workspace disposal&lt;/li&gt;
&lt;li&gt;retry suppression&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your architecture depends on the model cleanly cooperating with shutdown, you do not have a kill switch.&lt;/p&gt;

&lt;p&gt;You have a suggestion box.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rule #4: add a heartbeat, because zombie sub-agents are real
&lt;/h2&gt;

&lt;p&gt;A separate OpenClaw thread about WhatsApp reliability had a reply that jumped out at me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“It ended up being sub agent that are still running.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s not a prompt problem.&lt;/p&gt;

&lt;p&gt;That’s orchestration drift.&lt;/p&gt;

&lt;p&gt;Now we’re talking about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hung jobs&lt;/li&gt;
&lt;li&gt;child workers that never exit&lt;/li&gt;
&lt;li&gt;retries that stack&lt;/li&gt;
&lt;li&gt;liveness checks that don’t exist&lt;/li&gt;
&lt;li&gt;supervisors that never declare failure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the same class of problem you get in &lt;strong&gt;n8n&lt;/strong&gt;, &lt;strong&gt;Make&lt;/strong&gt;, &lt;strong&gt;Zapier&lt;/strong&gt;, or any custom agent runner. If the workflow can loop unattended, then you need process supervision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Minimum viable heartbeat design
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Liveness timeout&lt;/strong&gt;: if no valid event arrives for 30 to 60 seconds, mark the run unhealthy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sanity check&lt;/strong&gt;: detect repeated malformed tool calls, repeated identical output, or obvious gibberish loops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step budget&lt;/strong&gt;: cap tool invocations and file mutations per task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retry budget&lt;/strong&gt;: one retry max&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard abort&lt;/strong&gt;: kill the process tree and discard the workspace&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is what “self-healing” should mean in practice.&lt;/p&gt;

&lt;p&gt;Not “retry forever until vibes improve.”&lt;/p&gt;

&lt;p&gt;Sometimes the healthy behavior is to declare the run unrecoverable and tear it down.&lt;/p&gt;
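
&lt;p&gt;The first three checks fit in a small monitor object. This is a sketch under my own assumptions: the &lt;code&gt;RunMonitor&lt;/code&gt; name, the default limits, and the idea that your runner calls &lt;code&gt;record()&lt;/code&gt; once per valid agent event are all hypothetical, and “sanity” here only means “not the same output several times in a row.”&lt;/p&gt;

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunMonitor:
    """Tracks liveness, repetition, and step budget for one agent run."""

    liveness_timeout: float = 45.0  # seconds allowed without a valid event
    max_steps: int = 50             # hypothetical cap on recorded events
    max_repeats: int = 3            # identical outputs in a row
    clock: Callable[[], float] = time.monotonic
    steps: int = 0
    repeats: int = 0
    last_output: Optional[str] = None
    last_event_at: float = field(default=0.0, init=False)

    def __post_init__(self):
        self.last_event_at = self.clock()

    def record(self, output: str) -> None:
        """Call once per valid event (tool call or message) from the agent."""
        self.last_event_at = self.clock()
        self.steps += 1
        self.repeats = self.repeats + 1 if output == self.last_output else 1
        self.last_output = output

    def verdict(self) -> Optional[str]:
        """Return a reason to abort, or None while the run looks healthy."""
        if self.clock() - self.last_event_at > self.liveness_timeout:
            return "liveness timeout"
        if self.repeats >= self.max_repeats:
            return "repeated identical output"
        if self.steps > self.max_steps:
            return "step budget exceeded"
        return None
```

&lt;p&gt;The supervisor polls &lt;code&gt;verdict()&lt;/code&gt; on a timer. Any non-empty answer goes straight to the hard abort, not back to the model.&lt;/p&gt;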

&lt;h2&gt;
  
  
  A simple supervisor pattern
&lt;/h2&gt;

&lt;p&gt;Here’s the rough shape I’d use around an OpenClaw run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;WORKTREE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"../repo-agent-&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;BRANCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"agent-run-&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;TIMEOUT_SECONDS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;45

cleanup&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  pkill &lt;span class="nt"&gt;-P&lt;/span&gt; &lt;span class="nv"&gt;$$&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true
  &lt;/span&gt;git worktree remove &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$WORKTREE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="nb"&gt;trap &lt;/span&gt;cleanup EXIT INT TERM

git worktree add &lt;span class="nt"&gt;-b&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BRANCH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$WORKTREE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$WORKTREE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Run the agent under an outer timeout; escalate to SIGKILL if it ignores SIGTERM.&lt;/span&gt;
&lt;span class="nb"&gt;timeout&lt;/span&gt; &lt;span class="nt"&gt;--kill-after&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5 &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TIMEOUT_SECONDS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; openclaw run &lt;span class="s2"&gt;"Update only src/auth/login.ts and run auth tests"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not fancy. That’s why I like it.&lt;/p&gt;

&lt;p&gt;You can make it smarter later with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;event stream monitoring&lt;/li&gt;
&lt;li&gt;output sanity checks&lt;/li&gt;
&lt;li&gt;child process tracking&lt;/li&gt;
&lt;li&gt;model fallback routing&lt;/li&gt;
&lt;li&gt;diff-based rollback rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But even this basic wrapper is better than trusting &lt;code&gt;/abort&lt;/code&gt; to save you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Better prompts still help. They just help in a different way.
&lt;/h2&gt;

&lt;p&gt;I don’t want to overcorrect and pretend prompts don’t matter.&lt;/p&gt;

&lt;p&gt;They do.&lt;/p&gt;

&lt;p&gt;Specific tasks are easier to supervise.&lt;/p&gt;

&lt;p&gt;This is bad:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fix the auth flow, clean up the frontend, and improve error handling.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is much better:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Edit only src/auth/login.ts.
Do not modify any other files.
Run npm test -- auth/login.test.ts.
Stop after reporting the diff and test result.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That kind of prompt improves success rate.&lt;/p&gt;

&lt;p&gt;But the important distinction is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;prompts improve good runs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;guardrails improve bad runs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If I have to choose which one matters more for an unattended coding agent, I’m picking guardrails every time.&lt;/p&gt;
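
&lt;p&gt;A prompt that says “edit only this file” is a request. A guardrail checks it afterwards. Here’s a rough sketch of that check, assuming the run happened in a git worktree. The function names are hypothetical, and the parsing of &lt;code&gt;git status --porcelain&lt;/code&gt; is deliberately naive (it does not handle quoted paths).&lt;/p&gt;

```python
import subprocess


def out_of_scope(changed_paths, allowed_paths):
    """Return the changed paths that the task was not allowed to touch."""
    allowed = set(allowed_paths)
    return sorted(p for p in changed_paths if p not in allowed)


def changed_files(worktree):
    """List modified, staged, and untracked files in a git worktree."""
    out = subprocess.run(
        ["git", "-C", worktree, "status", "--porcelain", "--untracked-files=all"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # Each line is a two-character status, a space, then the path.
    # For renames ("old -> new") keep only the new path.
    return [line[3:].split(" -> ")[-1] for line in out.splitlines() if line]
```

&lt;p&gt;If &lt;code&gt;out_of_scope(changed_files(worktree), ["src/auth/login.ts"])&lt;/code&gt; comes back non-empty, I’d discard the worktree instead of trying to salvage the diff.&lt;/p&gt;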

&lt;h2&gt;
  
  
  The cost problem shows up fast when agents fail badly
&lt;/h2&gt;

&lt;p&gt;There’s also a very practical economics angle here.&lt;/p&gt;

&lt;p&gt;A looping agent is not just a reliability bug. Under per-token pricing, it becomes a billing bug too.&lt;/p&gt;

&lt;p&gt;That’s true whether the loop lives in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw&lt;/li&gt;
&lt;li&gt;n8n&lt;/li&gt;
&lt;li&gt;Make&lt;/li&gt;
&lt;li&gt;Zapier&lt;/li&gt;
&lt;li&gt;a custom worker queue&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A broken retry policy can quietly burn money while doing nothing useful.&lt;/p&gt;

&lt;p&gt;This is exactly why flat-rate compute is a much better fit for always-on agents and automations. If you can route or retry without doing mental token math every time, you can build safer supervisors.&lt;/p&gt;

&lt;p&gt;That’s one of the reasons I like what &lt;strong&gt;Standard Compute&lt;/strong&gt; is doing: it gives you an &lt;strong&gt;OpenAI-compatible API&lt;/strong&gt; with &lt;strong&gt;flat monthly pricing&lt;/strong&gt; instead of per-token billing, and it can dynamically route across models like &lt;strong&gt;GPT-5.4&lt;/strong&gt;, &lt;strong&gt;Claude Opus 4.6&lt;/strong&gt;, and &lt;strong&gt;Grok 4.20&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For agent workflows, that changes behavior.&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retry once without worrying about a surprise bill&lt;/li&gt;
&lt;li&gt;route away from a degraded model stack&lt;/li&gt;
&lt;li&gt;run 24/7 automations without token anxiety&lt;/li&gt;
&lt;li&gt;keep your existing OpenAI SDK setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re building unattended automations, cost predictability is not a nice-to-have. It changes how aggressive you can be with supervision and recovery.&lt;/p&gt;

&lt;h2&gt;
  
  
  The baseline setup I’d use tomorrow
&lt;/h2&gt;

&lt;p&gt;If I were configuring OpenClaw for real work, this would be my default:&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Start every run in a disposable workspace
&lt;/h3&gt;

&lt;p&gt;Use &lt;strong&gt;git worktree&lt;/strong&gt; for normal coding tasks.&lt;/p&gt;

&lt;p&gt;Use a disposable clone or container for higher-risk runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) Put approvals outside the model
&lt;/h3&gt;

&lt;p&gt;Use OpenClaw host approvals as the actual enforcement layer.&lt;/p&gt;

&lt;p&gt;Default to &lt;strong&gt;cautious&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) Add a supervisor heartbeat
&lt;/h3&gt;

&lt;p&gt;If the agent or a sub-agent stops making sane progress for 30 to 60 seconds, kill it.&lt;/p&gt;

&lt;h3&gt;
  
  
  4) Retry once, and narrow the task
&lt;/h3&gt;

&lt;p&gt;Don’t rerun the exact same broken context five times.&lt;/p&gt;

&lt;p&gt;Retry once with a smaller scope, cleaner context, or a different model.&lt;/p&gt;
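
&lt;p&gt;As a sketch, the retry budget can be a short loop. The &lt;code&gt;run&lt;/code&gt; callable and its “returns True on success” contract are assumptions for illustration. Plug in whatever actually launches your agent.&lt;/p&gt;

```python
def run_with_one_retry(run, task, narrowed_task):
    """Try task once; on failure, retry a single time with narrowed_task.

    run is any callable that takes a task string and returns True on success.
    """
    attempts = [task, narrowed_task]
    for attempt_number, current in enumerate(attempts, start=1):
        if run(current):
            return {"ok": True, "attempts": attempt_number, "task": current}
    # Both attempts failed: stop here and let the supervisor tear down.
    return {"ok": False, "attempts": len(attempts), "task": narrowed_task}
```

&lt;p&gt;There is no third attempt by design. A second failure means a human looks at it.&lt;/p&gt;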

&lt;h3&gt;
  
  
  5) Normalize hard aborts
&lt;/h3&gt;

&lt;p&gt;Make these routine, not emergency measures: &lt;code&gt;Ctrl+C&lt;/code&gt;, &lt;code&gt;timeout&lt;/code&gt;, &lt;code&gt;pkill&lt;/code&gt;, child cleanup, sandbox deletion.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;/abort&lt;/code&gt; works, great.&lt;/p&gt;

&lt;p&gt;If it doesn’t, your system should still be safe.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real lesson
&lt;/h2&gt;

&lt;p&gt;What I liked about that OpenClaw thread is that the best replies were not from prompt obsessives.&lt;/p&gt;

&lt;p&gt;They were from people thinking like operators.&lt;/p&gt;

&lt;p&gt;They asked the right questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What if sub-agents are still running?&lt;/li&gt;
&lt;li&gt;What if the workspace is no longer trustworthy?&lt;/li&gt;
&lt;li&gt;What if the in-chat abort path is fake comfort?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the shift.&lt;/p&gt;

&lt;p&gt;Once you build always-on agents, you are not just writing prompts anymore.&lt;/p&gt;

&lt;p&gt;You are designing failure boundaries.&lt;/p&gt;

&lt;p&gt;And when the model starts writing nonsense, the right move is not to beg it to calm down.&lt;/p&gt;

&lt;p&gt;Cut power.&lt;/p&gt;

&lt;p&gt;Contain damage.&lt;/p&gt;

&lt;p&gt;Review the diff.&lt;/p&gt;

&lt;p&gt;Start fresh somewhere disposable.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devops</category>
      <category>automation</category>
      <category>openclaw</category>
    </item>
  </channel>
</rss>
