<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ansh Saxena</title>
    <description>The latest articles on DEV Community by Ansh Saxena (@anshss).</description>
    <link>https://dev.to/anshss</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F924503%2F269dc4de-290a-44fc-a73b-9ad86d4a66f2.jpeg</url>
      <title>DEV Community: Ansh Saxena</title>
      <link>https://dev.to/anshss</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anshss"/>
    <language>en</language>
    <item>
      <title>My agent wasn't flaky. I just couldn't see it looping.</title>
      <dc:creator>Ansh Saxena</dc:creator>
      <pubDate>Sun, 26 Apr 2026 07:36:16 +0000</pubDate>
      <link>https://dev.to/anshss/your-agent-isnt-flaky-youre-blind-4pk3</link>
      <guid>https://dev.to/anshss/your-agent-isnt-flaky-youre-blind-4pk3</guid>
      <description>&lt;p&gt;A lot of what I do is stare at other people’s agent traces, the ones whose print logs say they are fine, but whose users say they are slow. This time, on Wednesday afternoon, I received a ping saying the agent feels “slow.” not broken. just slow. which is the worst kind of bug report because there’s nothing to grep for.&lt;/p&gt;

&lt;p&gt;I open the logs and see this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;[tool] search_knowledge_base called
[tool] search_knowledge_base returned: null
[tool] search_knowledge_base called
[tool] search_knowledge_base returned: null
[tool] search_knowledge_base called
[tool] search_knowledge_base returned: null
[tool] search_knowledge_base called
[tool] search_knowledge_base returned: null
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same call. Same input. Same null. Four times in a row.&lt;/p&gt;

&lt;p&gt;I'd love to say I caught it from the logs, but honestly I just restarted the process and it went away. Blamed the upstream API in my head. Closed the tab. Moved on with my day.&lt;/p&gt;

&lt;p&gt;The tool wasn't broken. The agent was stuck in a loop because it didn't know what to do when a search came back empty, so it just... tried again. And the logs had no opinion about that. They just dutifully printed each attempt as if it were the first.&lt;/p&gt;




&lt;p&gt;Here's the part I wish I'd seen at the time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Alerts fired:
  ALERT retry_loop
  ALERT failure_rate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;retry_loop&lt;/code&gt; trips when the same tool shows up 4+ times in the last 6 spans. &lt;code&gt;failure_rate&lt;/code&gt; trips when more than 20% of recent spans are errors. Both are on by default — I didn't have to pick a threshold or configure anything. The loop would have been visible from span #4, which is roughly 30 seconds before any of my users noticed.&lt;/p&gt;
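&lt;p&gt;For intuition, here’s what those two sliding-window rules look like in plain Python. This is a hypothetical sketch of the idea, not OCW’s actual implementation - the function names are mine:&lt;/p&gt;

```python
from collections import Counter

# Hypothetical sketch of the two default rules described above
# (names are mine, not the library's): the same tool appearing
# 4+ times in the last 6 spans, and more than 20% errors among
# recent spans.

def check_retry_loop(spans, window=6, threshold=4):
    """Return the looping tool name, or None if no loop is detected."""
    recent = spans[-window:]
    if not recent:
        return None
    tool, count = Counter(s["tool"] for s in recent).most_common(1)[0]
    return tool if count >= threshold else None

def check_failure_rate(spans, window=10, max_rate=0.20):
    """True when more than max_rate of the recent spans are errors."""
    recent = spans[-window:]
    if not recent:
        return False
    errors = sum(1 for s in recent if s["error"])
    return errors / len(recent) > max_rate

# The four identical null-returning calls from the log above:
spans = [{"tool": "search_knowledge_base", "error": True}] * 4
print(check_retry_loop(spans))    # search_knowledge_base
print(check_failure_rate(spans))  # True
```

&lt;p&gt;The point isn’t the ten lines of code; it’s that the rule only needs the span stream, which is exactly what a &lt;code&gt;print()&lt;/code&gt; log throws away.&lt;/p&gt;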




&lt;p&gt;I don't think this is a "bad code" problem. Tools return &lt;code&gt;null&lt;/code&gt; sometimes. APIs go down. An agent that retries when it gets nothing back is, in some sense, doing the right thing. It just doesn't have a good intuition for when to stop. Mine certainly didn't.&lt;/p&gt;
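&lt;p&gt;If you want the “when to stop” intuition in code: the fix isn’t “never retry,” it’s “retry a bounded number of times, then say so out loud.” A hypothetical wrapper - not part of OCW, the names are mine - might look like:&lt;/p&gt;

```python
def search_with_cap(query, search_fn, max_attempts=3):
    """Retry an empty result a bounded number of times, then give up
    explicitly instead of looping forever. Hypothetical sketch."""
    for _ in range(max_attempts):
        result = search_fn(query)
        if result is not None:
            return result
    # Surface the emptiness to the agent as data it can reason about,
    # rather than a null it will silently retry on.
    return {"status": "no_results", "query": query, "attempts": max_attempts}

print(search_with_cap("kb lookup", lambda q: None))
```

&lt;p&gt;A structured “no results” answer gives the model something to plan around; a bare &lt;code&gt;null&lt;/code&gt; just invites another identical call.&lt;/p&gt;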

&lt;p&gt;The actual problem is that I had no way to see the loop happening until someone messaged me about it. By then it had been going for a while.&lt;/p&gt;




&lt;p&gt;If you want to see what I mean, this takes about 30 seconds and doesn't need any API keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openclawwatch
ocw demo retry-loop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It shows the same scenario two ways - the &lt;code&gt;print()&lt;/code&gt; version (technically accurate, completely unhelpful) and the OCW version, where the alerts fire on their own a few spans in.&lt;/p&gt;




&lt;p&gt;Wiring it into a real agent is three lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ocw.sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;patch_anthropic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;watch&lt;/span&gt;

&lt;span class="nf"&gt;patch_anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@watch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;  &lt;span class="c1"&gt;# your existing code, unchanged
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;ocw serve&lt;/code&gt; somewhere in the background. Then &lt;code&gt;ocw alerts&lt;/code&gt; tells you what's fired and &lt;code&gt;ocw traces&lt;/code&gt; gives you the full waterfall - every tool call, every latency, in order.&lt;/p&gt;

&lt;p&gt;It's local. No cloud, no signup, no account.&lt;/p&gt;




&lt;p&gt;The thing I keep coming back to is this: there's a real difference between an agent that retried four times because a tool returned null, and one that retried four times for no reason anyone can explain. From the outside they look identical. One is an infrastructure problem you can fix. The other is just... vibes. And vibes don't ship.&lt;/p&gt;

&lt;p&gt;I think people say "you can't trust agents in production" when what they actually mean is "I can't see what mine is doing." Those aren't the same problem. The first one is unsolvable. The second one is a Wednesday afternoon and a missing alert.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ocw demo retry-loop&lt;/code&gt; - go see for yourself.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part of the &lt;a href="https://github.com/Metabuilder-Labs/openclawwatch" rel="noopener noreferrer"&gt;Agent Incident Library&lt;/a&gt; - reproducible scenarios for the failures that don't show up in your logs.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
