<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: keesan.eth</title>
    <description>The latest articles on DEV Community by keesan.eth (@cryptokeesan).</description>
    <link>https://dev.to/cryptokeesan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F918212%2F4a0730ab-6568-4e42-ae73-5088e9b37b59.jpg</url>
      <title>DEV Community: keesan.eth</title>
      <link>https://dev.to/cryptokeesan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cryptokeesan"/>
    <language>en</language>
    <item>
      <title>Why the retry loop is usually the expensive part of agent work</title>
      <dc:creator>keesan.eth</dc:creator>
      <pubDate>Wed, 17 Jun 2026 01:20:28 +0000</pubDate>
      <link>https://dev.to/cryptokeesan/why-the-retry-loop-is-usually-the-expensive-part-of-agent-work-1e35</link>
      <guid>https://dev.to/cryptokeesan/why-the-retry-loop-is-usually-the-expensive-part-of-agent-work-1e35</guid>
      <description>&lt;p&gt;The first failure usually is not the expensive one.&lt;/p&gt;

&lt;p&gt;The expensive part is what happens after the first failure when the system keeps trying, keeps spending, and keeps producing the same outcome because nothing about the situation changed.&lt;/p&gt;

&lt;p&gt;We kept running into a simple pattern: the agent would miss a step, the runtime would retry, the next attempt would see the same state, and the loop would repeat until the cost was visible in the bill or the operator log. At that point the problem stops being a model-quality issue and becomes a control-system issue.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the loop hurts more than the mistake
&lt;/h2&gt;

&lt;p&gt;A single bad step is recoverable. An unbounded retry loop compounds the mistake.&lt;/p&gt;

&lt;p&gt;That is true for token spend, API calls, and operator attention. It is also true for trust. Once a system gets a reputation for wandering, people stop letting it touch real work.&lt;/p&gt;

&lt;p&gt;The failure mode is boring, which is why it gets missed. Nobody looks at a happy-path demo and thinks about what happens after the third identical error. But that is where the real cost lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we tried first
&lt;/h2&gt;

&lt;p&gt;The obvious moves are usually the wrong ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;make the prompt longer&lt;/li&gt;
&lt;li&gt;add a generic retry&lt;/li&gt;
&lt;li&gt;increase the timeout&lt;/li&gt;
&lt;li&gt;let the model reason more&lt;/li&gt;
&lt;li&gt;rerun the same command with slightly different wording&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those changes can make a demo look better, but they do not fix a stuck loop.&lt;/p&gt;

&lt;p&gt;If the environment is unchanged, a retry is often just a second copy of the same mistake.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually worked
&lt;/h2&gt;

&lt;p&gt;The fix was not smarter language. It was stricter boundaries.&lt;/p&gt;

&lt;p&gt;We had to make the runtime answer four questions before it kept going:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What is the budget?&lt;/li&gt;
&lt;li&gt;What counts as success?&lt;/li&gt;
&lt;li&gt;What is the verifier?&lt;/li&gt;
&lt;li&gt;What happens when the same failure repeats?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A small policy block is often enough to make that concrete:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"budget_cap"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"max_attempts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"stop_on_same_error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"require_verifier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"emit_receipt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That does not sound ambitious. That is the point.&lt;/p&gt;

&lt;p&gt;The biggest reliability gain came from refusing to treat repeated failure as progress. Once the runtime could detect the same blocker two or three times in a row, it had permission to stop instead of pretending the next rerun would somehow be different.&lt;/p&gt;
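
&lt;p&gt;As a rough illustration, the stop rule can be sketched as a small bounded loop. Everything named here (&lt;code&gt;Policy&lt;/code&gt;, &lt;code&gt;RunResult&lt;/code&gt;, &lt;code&gt;run_bounded&lt;/code&gt;, and the &lt;code&gt;attempt&lt;/code&gt; and &lt;code&gt;verify&lt;/code&gt; callables) is hypothetical, the budget unit is left unspecified, and "same error" is assumed to mean an identical error string. The verifier is always required in this sketch rather than toggled by a flag.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Policy:
    # Mirrors part of the policy block above; the unit of budget_cap is an assumption.
    budget_cap: float = 250
    max_attempts: int = 3
    stop_on_same_error: bool = True


@dataclass
class RunResult:
    status: str                        # "done" or "stopped"
    reason: str                        # why the loop ended
    spent: float = 0.0
    errors: List[str] = field(default_factory=list)


def run_bounded(
    attempt: Callable[[], Tuple[float, Optional[str]]],
    verify: Callable[[], bool],
    policy: Policy,
) -> RunResult:
    """Run attempt() until verified, over budget, out of attempts, or stuck.

    attempt() returns (cost, error); error is None when the step reports success.
    """
    result = RunResult(status="stopped", reason="max_attempts")
    for _ in range(policy.max_attempts):
        cost, error = attempt()
        result.spent += cost
        if result.spent > policy.budget_cap:
            result.reason = "budget_cap"
            return result
        if error is None:
            if verify():
                result.status, result.reason = "done", "verified"
                return result
            # A self-reported success that fails verification is still a failure.
            error = "verifier_rejected"
        # Same blocker twice in a row: stop instead of rerunning.
        repeated = bool(result.errors) and result.errors[-1] == error
        result.errors.append(error)
        if policy.stop_on_same_error and repeated:
            result.reason = "same_error_repeated"
            return result
    return result
```

&lt;p&gt;With an &lt;code&gt;attempt&lt;/code&gt; that keeps returning the same error, this loop ends after the second attempt rather than using all three, and the returned &lt;code&gt;reason&lt;/code&gt; records why.&lt;/p&gt;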

&lt;h2&gt;
  
  
  Why receipts matter
&lt;/h2&gt;

&lt;p&gt;Receipts turn a run from a vague story into a checkable fact.&lt;/p&gt;

&lt;p&gt;A receipt should show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what the agent tried&lt;/li&gt;
&lt;li&gt;what changed&lt;/li&gt;
&lt;li&gt;what failed&lt;/li&gt;
&lt;li&gt;why the run stopped&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that, a loop can hide inside a confidence-generating summary. With it, you can see the exact stopping point and decide whether the next action should be a human intervention, a different tool, or no action at all.&lt;/p&gt;
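
&lt;p&gt;One possible shape for such a receipt, sketched as a dataclass serialized to a single JSON line. The field names and sample values are illustrative assumptions, one field per bullet above, and not a description of any particular tool's format.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Receipt:
    # Hypothetical field names, one per bullet in the list above.
    tried: List[str] = field(default_factory=list)     # what the agent tried
    changed: List[str] = field(default_factory=list)   # what changed
    failed: List[str] = field(default_factory=list)    # what failed
    stop_reason: str = ""                              # why the run stopped

    def to_json_line(self) -> str:
        """Serialize to one JSON object on one line so receipts can be appended to a log."""
        return json.dumps(asdict(self), sort_keys=True)


# Made-up example values, for illustration only.
receipt = Receipt(
    tried=["run test suite", "patch config loader"],
    changed=["config/loader.py"],
    failed=["test_loader_reads_env"],
    stop_reason="same_error_repeated",
)
line = receipt.to_json_line()
```

&lt;p&gt;Keeping the stop reason as its own field means the question "why did this run end?" can be answered without rereading a transcript.&lt;/p&gt;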

&lt;p&gt;That is also why this kind of work ends up feeling less like prompt engineering and more like operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tradeoff
&lt;/h2&gt;

&lt;p&gt;Stricter control means the system stops earlier.&lt;/p&gt;

&lt;p&gt;That can feel annoying when you want the agent to push through friction. But earlier stopping is cheaper than a long blind retry sequence. More importantly, it preserves operator trust.&lt;/p&gt;

&lt;p&gt;A bounded agent is less flashy than an agent that never gives up. It is also much more usable.&lt;/p&gt;

&lt;p&gt;That is the core of the control-layer approach we keep coming back to in MartinLoop: the runtime should know when to stop, when to ask for help, and when to write down what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we are watching next
&lt;/h2&gt;

&lt;p&gt;The next improvement is not more retries.&lt;/p&gt;

&lt;p&gt;It is better failure classification so the runtime can separate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;missing permission&lt;/li&gt;
&lt;li&gt;stale state&lt;/li&gt;
&lt;li&gt;tool mismatch&lt;/li&gt;
&lt;li&gt;external outage&lt;/li&gt;
&lt;li&gt;real task completion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When those are distinct, the system can choose a better next step instead of recycling the same command.&lt;/p&gt;
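
&lt;p&gt;A minimal sketch of what that separation could look like. The keyword matching is deliberately crude, and both the keywords and the next-step names are assumptions made for illustration. Real task completion is left to a verifier rather than inferred from error text, so it does not appear in the enum.&lt;/p&gt;

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    MISSING_PERMISSION = "missing_permission"
    STALE_STATE = "stale_state"
    TOOL_MISMATCH = "tool_mismatch"
    EXTERNAL_OUTAGE = "external_outage"


# Hypothetical keyword table; a real classifier would need richer signals.
KEYWORDS = [
    (Failure.MISSING_PERMISSION, ("permission denied", "forbidden", "unauthorized")),
    (Failure.STALE_STATE, ("conflict", "out of date", "stale")),
    (Failure.TOOL_MISMATCH, ("command not found", "unknown option")),
    (Failure.EXTERNAL_OUTAGE, ("timed out", "service unavailable", "connection refused")),
]

# Hypothetical mapping from failure class to the next step.
NEXT_STEP = {
    Failure.MISSING_PERMISSION: "ask_human",
    Failure.STALE_STATE: "refresh_state_then_retry",
    Failure.TOOL_MISMATCH: "switch_tool",
    Failure.EXTERNAL_OUTAGE: "back_off_and_stop",
}


def classify(error_text: str) -> Optional[Failure]:
    """Return the first failure class whose keywords appear in the error text."""
    lowered = error_text.lower()
    for failure, needles in KEYWORDS:
        if any(needle in lowered for needle in needles):
            return failure
    return None


def next_step(error_text: str) -> str:
    """Choose a next step; unclassified failures stop and get reported."""
    failure = classify(error_text)
    if failure is None:
        return "stop_and_report"
    return NEXT_STEP[failure]
```

&lt;p&gt;Only one of the four classes maps to a retry here, and only after the state is refreshed, which is the practical payoff of telling the failures apart.&lt;/p&gt;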

&lt;p&gt;That is the line between an agent that looks autonomous and an agent that is actually operable.&lt;/p&gt;

&lt;p&gt;What failure shape are you still letting your runtime retry too many times?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>automation</category>
    </item>
    <item>
      <title>The expensive part of an AI agent failure is usually the retry loop</title>
      <dc:creator>keesan.eth</dc:creator>
      <pubDate>Sat, 13 Jun 2026 01:19:23 +0000</pubDate>
      <link>https://dev.to/cryptokeesan/the-expensive-part-of-an-ai-agent-failure-is-usually-the-retry-loop-245b</link>
      <guid>https://dev.to/cryptokeesan/the-expensive-part-of-an-ai-agent-failure-is-usually-the-retry-loop-245b</guid>
      <description>&lt;p&gt;The first failure usually is not the expensive one.&lt;/p&gt;

&lt;p&gt;The expensive part is what happens after the first failure when the system keeps trying, keeps spending, and keeps producing the same outcome because nothing about the situation changed.&lt;/p&gt;

&lt;p&gt;We kept running into a simple pattern: the agent would miss a step, the runtime would retry, the next attempt would see the same state, and the loop would repeat until the cost was visible in the bill or the operator log. That is the point where the problem stops being a model-quality issue and becomes a control-system issue.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the loop hurts more than the mistake
&lt;/h2&gt;

&lt;p&gt;A single bad step is recoverable. An unbounded retry loop compounds the mistake.&lt;/p&gt;

&lt;p&gt;That is true for token spend, API calls, and operator attention. It is also true for trust. Once a system gets a reputation for wandering, people stop letting it touch real work.&lt;/p&gt;

&lt;p&gt;The failure mode is boring, which is why it gets missed. Nobody looks at a happy-path demo and thinks about what happens after the third identical error. But that is where the real cost lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we tried first
&lt;/h2&gt;

&lt;p&gt;The obvious moves are usually the wrong ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;make the prompt longer&lt;/li&gt;
&lt;li&gt;add a generic retry&lt;/li&gt;
&lt;li&gt;increase the timeout&lt;/li&gt;
&lt;li&gt;let the model "reason more"&lt;/li&gt;
&lt;li&gt;rerun the same command with slightly different wording&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those changes can make a demo look better, but they do not fix a stuck loop.&lt;/p&gt;

&lt;p&gt;If the environment is unchanged, a retry is often just a second copy of the same mistake.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually worked
&lt;/h2&gt;

&lt;p&gt;The fix was not smarter language. It was stricter boundaries.&lt;/p&gt;

&lt;p&gt;We had to make the runtime answer four questions before it kept going:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What is the budget?&lt;/li&gt;
&lt;li&gt;What counts as success?&lt;/li&gt;
&lt;li&gt;What is the verifier?&lt;/li&gt;
&lt;li&gt;What happens when the same failure repeats?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A small policy block is often enough to make this concrete:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"budget_cap"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;250&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"max_attempts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"stop_on_same_error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"require_verifier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"emit_receipt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That does not sound ambitious. That is the point.&lt;/p&gt;

&lt;p&gt;The biggest reliability gain came from refusing to treat repeated failure as progress. Once the runtime could detect the same blocker two or three times in a row, it had permission to stop instead of pretending the next rerun would somehow be different.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why receipts matter
&lt;/h2&gt;

&lt;p&gt;Receipts turn a run from a vague story into a checkable fact.&lt;/p&gt;

&lt;p&gt;A receipt should show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what the agent tried&lt;/li&gt;
&lt;li&gt;what changed&lt;/li&gt;
&lt;li&gt;what failed&lt;/li&gt;
&lt;li&gt;why the run stopped&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that, a loop can hide inside a confidence-generating summary. With it, you can see the exact stopping point and decide whether the next action should be a human intervention, a different tool, or no action at all.&lt;/p&gt;

&lt;p&gt;That is also why this kind of work ends up feeling less like prompt engineering and more like operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tradeoff
&lt;/h2&gt;

&lt;p&gt;Stricter control means the system stops earlier.&lt;/p&gt;

&lt;p&gt;That can feel annoying when you want the agent to push through friction. But earlier stopping is cheaper than a long blind retry sequence. More importantly, it preserves operator trust.&lt;/p&gt;

&lt;p&gt;A bounded agent is less flashy than an agent that "never gives up." It is also much more usable.&lt;/p&gt;

&lt;p&gt;That is the core of the control-layer approach we keep coming back to in MartinLoop: the runtime should know when to stop, when to ask for help, and when to write down what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we are watching next
&lt;/h2&gt;

&lt;p&gt;The next improvement is not more retries.&lt;/p&gt;

&lt;p&gt;It is better failure classification so the runtime can separate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;missing permission&lt;/li&gt;
&lt;li&gt;stale state&lt;/li&gt;
&lt;li&gt;tool mismatch&lt;/li&gt;
&lt;li&gt;external outage&lt;/li&gt;
&lt;li&gt;real task completion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When those are distinct, the system can choose a better next step instead of recycling the same command.&lt;/p&gt;

&lt;p&gt;That is the line between an agent that looks autonomous and an agent that is actually operable.&lt;/p&gt;

&lt;p&gt;What failure shape are you still letting your runtime retry too many times?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The most expensive AI agent failures are boring</title>
      <dc:creator>keesan.eth</dc:creator>
      <pubDate>Fri, 05 Jun 2026 06:07:11 +0000</pubDate>
      <link>https://dev.to/cryptokeesan/the-most-expensive-ai-agent-failures-are-boring-5042</link>
      <guid>https://dev.to/cryptokeesan/the-most-expensive-ai-agent-failures-are-boring-5042</guid>
      <description>&lt;p&gt;Most AI coding agent failures are boring.&lt;/p&gt;

&lt;p&gt;Not dramatic.&lt;br&gt;
Not cinematic.&lt;br&gt;
Just the same wrong step repeated until the bill gets weird and someone asks what happened.&lt;/p&gt;

&lt;p&gt;That is why I think the most important control is not “use a cheaper model.”&lt;br&gt;
It is “before another retry, show what changed.”&lt;/p&gt;

&lt;p&gt;If nothing changed, stop.&lt;/p&gt;

&lt;p&gt;That one rule kills a surprising amount of fake progress.&lt;/p&gt;
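
&lt;p&gt;A small sketch of that rule, assuming the runtime can snapshot whatever it considers retry-relevant state into a JSON-serializable dict. The function names and the choice of what goes into the snapshot are hypothetical.&lt;/p&gt;

```python
import hashlib
import json


def fingerprint(state: dict) -> str:
    """Stable hash of a JSON-serializable snapshot of retry-relevant state."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def may_retry(previous: dict, current: dict) -> bool:
    """Allow another attempt only if something observable changed."""
    return fingerprint(previous) != fingerprint(current)
```

&lt;p&gt;The snapshot might hold file hashes, the last error text, and tool versions. If two consecutive snapshots hash the same, the next attempt would see the same world, so the run stops.&lt;/p&gt;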

&lt;p&gt;The other three controls I would put in early are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;a hard budget cap&lt;/li&gt;
&lt;li&gt;one real verification gate&lt;/li&gt;
&lt;li&gt;a receipt that explains why the run stopped&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the class of problem we have been working on with MartinLoop.&lt;/p&gt;

&lt;p&gt;Not making agents feel magical.&lt;br&gt;
Making them easier to trust when the loop gets messy.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>AI coding agents don't fail because they're dumb. They fail because they don't know when to stop.</title>
      <dc:creator>keesan.eth</dc:creator>
      <pubDate>Wed, 03 Jun 2026 16:05:07 +0000</pubDate>
      <link>https://dev.to/cryptokeesan/ai-coding-agents-dont-fail-because-theyre-dumb-they-fail-because-they-dont-know-when-to-stop-4b7c</link>
      <guid>https://dev.to/cryptokeesan/ai-coding-agents-dont-fail-because-theyre-dumb-they-fail-because-they-dont-know-when-to-stop-4b7c</guid>
      <description>&lt;p&gt;Yesterday we launched MartinLoop on Product Hunt.&lt;/p&gt;

&lt;p&gt;The biggest thing we keep seeing with AI coding agents is simple:&lt;/p&gt;

&lt;p&gt;They do not fail because they are "bad at coding."&lt;/p&gt;

&lt;p&gt;They fail because they do not know when to stop.&lt;/p&gt;

&lt;p&gt;That creates a very specific kind of pain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the same mistake gets retried over and over&lt;/li&gt;
&lt;li&gt;a small bug turns into a weirdly expensive afternoon&lt;/li&gt;
&lt;li&gt;someone still has to explain what happened after the run is over&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the whole reason we built MartinLoop.&lt;/p&gt;

&lt;p&gt;The job is not to make an agent feel smarter.&lt;br&gt;
The job is to give it a budget, a finish line, and a receipt.&lt;/p&gt;

&lt;p&gt;The pattern we keep hearing from teams is basically:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"It wasn't one catastrophic failure. It was 40 small dumb retries that nobody caught fast enough."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is a systems problem, not a prompt problem.&lt;/p&gt;

&lt;p&gt;If you are using coding agents already, the three controls that matter most are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A hard budget cap before the run starts.&lt;/li&gt;
&lt;li&gt;A real verification gate before the run counts as done.&lt;/li&gt;
&lt;li&gt;A receipt you can read later when somebody asks, "why did this cost so much?"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If that pain sounds familiar, that is exactly what we are working on.&lt;/p&gt;

&lt;p&gt;If you want to support the Product Hunt launch, I would appreciate it.&lt;br&gt;
More importantly, I would love to hear the story of the most annoying AI-agent failure you have seen in the wild.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>If your coding agent can retry forever, it will</title>
      <dc:creator>keesan.eth</dc:creator>
      <pubDate>Tue, 02 Jun 2026 04:59:17 +0000</pubDate>
      <link>https://dev.to/cryptokeesan/if-your-coding-agent-can-retry-forever-it-will-2e26</link>
      <guid>https://dev.to/cryptokeesan/if-your-coding-agent-can-retry-forever-it-will-2e26</guid>
      <description>&lt;p&gt;If an AI coding agent can keep retrying with no budget cap, no finish line, and no check before it exits, the problem is not the model.&lt;/p&gt;

&lt;p&gt;The problem is the missing operating system around it.&lt;/p&gt;

&lt;p&gt;Three simple things make a huge difference:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Put a real dollar cap on the run.&lt;/li&gt;
&lt;li&gt;Require one clear verification step before calling it done.&lt;/li&gt;
&lt;li&gt;Keep a receipt of what actually happened.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most teams do not need more autonomy.&lt;br&gt;
They need a clean stop condition.&lt;/p&gt;

&lt;p&gt;That is the difference between a helpful agent and a very expensive loop.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>What Actually Makes Social Automation Reliable</title>
      <dc:creator>keesan.eth</dc:creator>
      <pubDate>Sun, 31 May 2026 20:03:46 +0000</pubDate>
      <link>https://dev.to/cryptokeesan/what-actually-makes-social-automation-reliable-4g1m</link>
      <guid>https://dev.to/cryptokeesan/what-actually-makes-social-automation-reliable-4g1m</guid>
      <description>&lt;p&gt;A reliable social automation stack is not built by stacking more retries on top of brittle behavior.&lt;/p&gt;

&lt;p&gt;The durable pattern is simpler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use official APIs where they exist&lt;/li&gt;
&lt;li&gt;keep browser execution as a controlled fallback&lt;/li&gt;
&lt;li&gt;require both a receipt and a verified postcondition before counting a run&lt;/li&gt;
&lt;li&gt;fail closed when the platform state does not match the reported result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That discipline matters more than raw surface area. A smaller set of lanes with honest verification is worth more than a wider setup that quietly reports false success.&lt;/p&gt;
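
&lt;p&gt;The "receipt plus verified postcondition" rule can be sketched in a few lines. The function name and the shape of the receipt are assumptions. The point is that every uncertain path returns &lt;code&gt;False&lt;/code&gt;.&lt;/p&gt;

```python
from typing import Callable, Optional


def count_run(receipt: Optional[dict], postcondition: Callable[[], bool]) -> bool:
    """Count a run only with a receipt AND a verified postcondition; fail closed otherwise."""
    if not receipt:
        return False
    try:
        return bool(postcondition())
    except Exception:
        # A check that errors out is treated as a failed verification.
        return False
```

&lt;p&gt;In practice &lt;code&gt;postcondition&lt;/code&gt; would re-read the platform state, for example by fetching the item the worker claims to have created, rather than trusting the worker's own report.&lt;/p&gt;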

</description>
      <category>automation</category>
      <category>devtools</category>
      <category>api</category>
    </item>
    <item>
      <title>Receipts beat scheduled optimism</title>
      <dc:creator>keesan.eth</dc:creator>
      <pubDate>Sun, 31 May 2026 20:00:30 +0000</pubDate>
      <link>https://dev.to/cryptokeesan/receipts-beat-scheduled-optimism-1c5d</link>
      <guid>https://dev.to/cryptokeesan/receipts-beat-scheduled-optimism-1c5d</guid>
      <description>&lt;h1&gt;
  
  
  Receipts beat scheduled optimism
&lt;/h1&gt;

&lt;p&gt;The fastest way to lose trust in an automation is to mistake a schedule for a result.&lt;/p&gt;

&lt;p&gt;We have been rebuilding our execution stack around one rule: if a worker cannot show the exact action it took or the exact blocker it hit, it did not finish the job.&lt;/p&gt;

&lt;p&gt;That has forced us to simplify a lot. Fewer lanes. Better proofs. More honest failure states.&lt;/p&gt;

&lt;p&gt;The upside is that the system gets easier to trust once every action has to survive real verification.&lt;/p&gt;

</description>
      <category>devtools</category>
      <category>automation</category>
      <category>opensource</category>
    </item>
    <item>
      <title>MartinLoop: a control plane for AI coding agents</title>
      <dc:creator>keesan.eth</dc:creator>
      <pubDate>Wed, 27 May 2026 01:39:14 +0000</pubDate>
      <link>https://dev.to/cryptokeesan/martinloop-a-control-plane-for-ai-coding-agents-3dg5</link>
      <guid>https://dev.to/cryptokeesan/martinloop-a-control-plane-for-ai-coding-agents-3dg5</guid>
      <description>&lt;h1&gt;
  
  
  MartinLoop
&lt;/h1&gt;

&lt;p&gt;MartinLoop is an open-source control plane for AI coding agents.&lt;/p&gt;

&lt;p&gt;It adds hard budget stops, JSONL run records, and verify-gated completion so autonomous coding stays accountable.&lt;/p&gt;

&lt;p&gt;We built it because agent loops are powerful, but most teams still do not have enough control over cost, retries, or proof of completion.&lt;/p&gt;

&lt;p&gt;If you are using AI coding agents in production, I would love to hear how you are handling governance, cost ceilings, and verification.&lt;/p&gt;

</description>
      <category>devtools</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>AI Coding Agents Are Burning Budgets. The Next Layer Is Control</title>
      <dc:creator>keesan.eth</dc:creator>
      <pubDate>Tue, 12 May 2026 01:08:29 +0000</pubDate>
      <link>https://dev.to/cryptokeesan/ai-coding-agents-are-burning-budgets-the-next-layer-is-control-1eah</link>
      <guid>https://dev.to/cryptokeesan/ai-coding-agents-are-burning-budgets-the-next-layer-is-control-1eah</guid>
      <description>&lt;h2&gt;
  
  
  AI coding agents are becoming useful, but they still burn budgets, loop on bad strategies, and finish without enough evidence. The next layer is trace intelligence, model routing, and control.
&lt;/h2&gt;

&lt;h1&gt;
  
  
  AI Coding Agents Are Burning Budgets. The Next Layer Is Control.
&lt;/h1&gt;

&lt;p&gt;AI coding agents are getting better.&lt;/p&gt;

&lt;p&gt;They can read a repo, edit files, run tests, inspect errors, and try again.&lt;/p&gt;

&lt;p&gt;That is useful.&lt;/p&gt;

&lt;p&gt;But the problem showing up in real workflows is not just whether agents can write code.&lt;/p&gt;

&lt;p&gt;The problem is that agents can spend budget without producing finished work.&lt;/p&gt;

&lt;p&gt;They loop.&lt;/p&gt;

&lt;p&gt;They retry weak strategies.&lt;/p&gt;

&lt;p&gt;They switch files without explaining why.&lt;/p&gt;

&lt;p&gt;They chase unrelated errors.&lt;/p&gt;

&lt;p&gt;They claim completion without enough proof.&lt;/p&gt;

&lt;p&gt;And when the run ends, the human still has to ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What actually happened?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the gap the next generation of agent infrastructure has to solve.&lt;/p&gt;

&lt;p&gt;Not more autonomy first.&lt;/p&gt;

&lt;p&gt;Control first.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wvfdsegi3ko8d4ht9ol.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wvfdsegi3ko8d4ht9ol.jpg" alt=" " width="800" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Is Not Just Bad Code
&lt;/h2&gt;

&lt;p&gt;A bad patch is easy to see.&lt;/p&gt;

&lt;p&gt;A bad agent run is harder.&lt;/p&gt;

&lt;p&gt;The agent may do a lot of work that looks productive:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read many files&lt;/li&gt;
&lt;li&gt;generate a long plan&lt;/li&gt;
&lt;li&gt;edit several modules&lt;/li&gt;
&lt;li&gt;run commands&lt;/li&gt;
&lt;li&gt;inspect failures&lt;/li&gt;
&lt;li&gt;produce a confident summary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But at the end, the task is still not done.&lt;/p&gt;

&lt;p&gt;The budget is gone.&lt;/p&gt;

&lt;p&gt;The repo is messy.&lt;/p&gt;

&lt;p&gt;The logs are unclear.&lt;/p&gt;

&lt;p&gt;The next engineer has to reconstruct the run from fragments.&lt;/p&gt;

&lt;p&gt;This is why agentic coding needs a better unit of accountability.&lt;/p&gt;

&lt;p&gt;Not just the final diff.&lt;/p&gt;

&lt;p&gt;The full trace.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trace Becomes The Product
&lt;/h2&gt;

&lt;p&gt;A coding agent trace should not be an afterthought.&lt;/p&gt;

&lt;p&gt;It should be the primary artifact of the run.&lt;/p&gt;

&lt;p&gt;A useful trace answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What did the agent try first?&lt;/li&gt;
&lt;li&gt;Where did it get stuck?&lt;/li&gt;
&lt;li&gt;Which files did it touch?&lt;/li&gt;
&lt;li&gt;Which commands did it run?&lt;/li&gt;
&lt;li&gt;Which verifier failed?&lt;/li&gt;
&lt;li&gt;Did it repeat the same strategy?&lt;/li&gt;
&lt;li&gt;Did it switch models?&lt;/li&gt;
&lt;li&gt;Did it exceed budget?&lt;/li&gt;
&lt;li&gt;Why did it stop?&lt;/li&gt;
&lt;li&gt;What should a human do next?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what I think of as &lt;strong&gt;trace intelligence&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not just raw logs.&lt;/p&gt;

&lt;p&gt;Not just token usage.&lt;/p&gt;

&lt;p&gt;Not just a transcript.&lt;/p&gt;

&lt;p&gt;Trace intelligence means turning the run into something a human, system, or second agent can reason about.&lt;/p&gt;

&lt;p&gt;The trace should explain the work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Model Routing Matters
&lt;/h2&gt;

&lt;p&gt;Most agent workflows still treat model choice too casually.&lt;/p&gt;

&lt;p&gt;One model may be good at planning.&lt;/p&gt;

&lt;p&gt;Another may be better at code edits.&lt;/p&gt;

&lt;p&gt;Another may be cheaper for search, summarization, or test-output analysis.&lt;/p&gt;

&lt;p&gt;Another may be stronger for final review.&lt;/p&gt;

&lt;p&gt;But without a control layer, model routing becomes guesswork.&lt;/p&gt;

&lt;p&gt;A better system should ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is this step worth a premium model?&lt;/li&gt;
&lt;li&gt;Can a cheaper model classify this failure?&lt;/li&gt;
&lt;li&gt;Should a stronger model review the plan before execution?&lt;/li&gt;
&lt;li&gt;Should the run downgrade when budget is tight?&lt;/li&gt;
&lt;li&gt;Should the run escalate when repeated failures appear?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Model routing should not just optimize quality.&lt;/p&gt;

&lt;p&gt;It should optimize quality within budget.&lt;/p&gt;
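
&lt;p&gt;A toy router makes "quality within budget" concrete. The tier names, prices, step kinds, and the escalation threshold below are all invented for illustration and do not refer to real models or pricing.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_step: float


# Hypothetical tiers and prices.
CHEAP = Tier("cheap", 0.01)
PREMIUM = Tier("premium", 0.20)

# Step kinds assumed to justify a stronger model.
PREMIUM_STEPS = {"plan_review", "final_review"}


def route(step_kind: str, remaining_budget: float, repeated_failures: int) -> Tier:
    """Pick a tier for one step, preferring quality only when the budget allows it."""
    # Escalate for review steps or when the run keeps failing.
    wants_premium = step_kind in PREMIUM_STEPS or repeated_failures >= 2
    # Downgrade when the remaining budget cannot cover a premium step.
    if wants_premium and remaining_budget >= PREMIUM.cost_per_step:
        return PREMIUM
    return CHEAP
```

&lt;p&gt;The same function covers both directions from the questions above: it escalates on repeated failures and downgrades when the budget is tight.&lt;/p&gt;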

&lt;p&gt;That matters because the most painful agent failure is not always wrong code.&lt;/p&gt;

&lt;p&gt;Sometimes it is expensive unfinished work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Headless Agents Need More Guardrails, Not Fewer
&lt;/h2&gt;

&lt;p&gt;Headless coding agents are especially interesting.&lt;/p&gt;

&lt;p&gt;They can run without a constant human in the loop.&lt;/p&gt;

&lt;p&gt;They can process tasks, inspect repos, execute commands, and produce outputs asynchronously.&lt;/p&gt;

&lt;p&gt;That is powerful.&lt;/p&gt;

&lt;p&gt;But headless execution increases the need for control.&lt;/p&gt;

&lt;p&gt;If an agent is running without a developer watching every step, the system needs stronger answers to basic questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is this agent allowed to do?&lt;/li&gt;
&lt;li&gt;What budget can it spend?&lt;/li&gt;
&lt;li&gt;What commands are blocked?&lt;/li&gt;
&lt;li&gt;What verifier defines success?&lt;/li&gt;
&lt;li&gt;When should it stop?&lt;/li&gt;
&lt;li&gt;When should it ask for approval?&lt;/li&gt;
&lt;li&gt;What trace does it leave behind?&lt;/li&gt;
&lt;/ul&gt;
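
&lt;p&gt;Some of these questions can be answered by a small policy check that runs before each command. This is a simplified sketch: the class, the example command sets, and the three outcomes are assumptions, and it inspects only the first token of the command line, so pipes, subshells, and wrappers would need extra handling.&lt;/p&gt;

```python
import shlex
from dataclasses import dataclass, field
from typing import Set


@dataclass
class HeadlessPolicy:
    # Example sets only; a real deployment would choose its own.
    blocked_commands: Set[str] = field(default_factory=lambda: {"rm", "sudo", "curl"})
    needs_approval: Set[str] = field(default_factory=lambda: {"git", "npm"})


def decide(command_line: str, policy: HeadlessPolicy) -> str:
    """Return "block", "ask", or "allow" for one shell command line."""
    parts = shlex.split(command_line)
    if not parts:
        # Nothing to run: refuse rather than guess.
        return "block"
    program = parts[0]
    if program in policy.blocked_commands:
        return "block"
    if program in policy.needs_approval:
        return "ask"
    return "allow"
```

&lt;p&gt;Logging each decision alongside the command gives the trace an entry even for actions that never ran.&lt;/p&gt;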

&lt;p&gt;The more autonomous the workflow becomes, the more important the control layer becomes.&lt;/p&gt;

&lt;p&gt;Autonomy without traceability is not leverage.&lt;/p&gt;

&lt;p&gt;It is invisible execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent Teams Make The Problem Bigger
&lt;/h2&gt;

&lt;p&gt;The next step is not one agent.&lt;/p&gt;

&lt;p&gt;It is teams of agents.&lt;/p&gt;

&lt;p&gt;A planner agent.&lt;/p&gt;

&lt;p&gt;A coding agent.&lt;/p&gt;

&lt;p&gt;A reviewer agent.&lt;/p&gt;

&lt;p&gt;A test agent.&lt;/p&gt;

&lt;p&gt;A documentation agent.&lt;/p&gt;

&lt;p&gt;A security agent.&lt;/p&gt;

&lt;p&gt;A release agent.&lt;/p&gt;

&lt;p&gt;That sounds useful, but it also creates a new coordination problem.&lt;/p&gt;

&lt;p&gt;If one agent produces a bad plan, another may execute it.&lt;/p&gt;

&lt;p&gt;If the reviewer misses the issue, the system may mark the run complete.&lt;/p&gt;

&lt;p&gt;If the test agent checks the wrong verifier, the whole workflow may look successful while still being wrong.&lt;/p&gt;

&lt;p&gt;Agent-to-agent workflows need shared state, shared budgets, shared traces, and shared stop conditions.&lt;/p&gt;
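
&lt;p&gt;One way to picture shared budgets, traces, and stop conditions is a single run object that every agent must charge against. The class and field names are hypothetical, and a real system would also need locking if agents run concurrently.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SharedRun:
    budget_cap: float
    spent: float = 0.0
    stopped: bool = False
    trace: List[Dict[str, object]] = field(default_factory=list)

    def charge(self, agent: str, cost: float, note: str) -> bool:
        """Record one agent step; return False once the whole team must stop."""
        if self.stopped or self.spent + cost > self.budget_cap:
            # The stop condition is shared: one refusal halts every agent.
            self.stopped = True
            self.trace.append({"agent": agent, "event": "refused", "note": note})
            return False
        self.spent += cost
        self.trace.append({"agent": agent, "event": "step", "cost": cost, "note": note})
        return True
```

&lt;p&gt;Because the planner, coder, and reviewer all write to the same &lt;code&gt;trace&lt;/code&gt;, the team leaves one ordered record instead of several partial ones.&lt;/p&gt;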

&lt;p&gt;Otherwise, teams of agents can become teams of budget-burning loops.&lt;/p&gt;

&lt;p&gt;The question becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Who governs the team?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is where a control layer becomes necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  What MartinLoop 360 Is Pointing Toward
&lt;/h2&gt;

&lt;p&gt;The direction I am exploring with MartinLoop is a control layer for agentic coding workflows.&lt;/p&gt;

&lt;p&gt;The current idea is simple:&lt;/p&gt;

&lt;p&gt;Every agent run should be bounded, inspectable, and test-verifiable.&lt;/p&gt;

&lt;p&gt;The next layer expands that into a broader loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trace intelligence&lt;/strong&gt; to understand what happened during a run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model routing&lt;/strong&gt; to choose the right model for the right step&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HeadlessOS&lt;/strong&gt; for controlled background execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MartinLoop 360&lt;/strong&gt; as a higher-level view of agent runs, budgets, traces, policies, and outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to make agents look more magical.&lt;/p&gt;

&lt;p&gt;The goal is to make them easier to trust.&lt;/p&gt;

&lt;p&gt;If an agent burns budget and fails, that should be visible.&lt;/p&gt;

&lt;p&gt;If an agent loops, that should be classified.&lt;/p&gt;

&lt;p&gt;If an agent completes a task, that should be verified.&lt;/p&gt;

&lt;p&gt;If multiple agents collaborate, the team should leave one coherent trace.&lt;/p&gt;
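
&lt;p&gt;One simple way to classify a loop, assuming each step in the trace is stored as an (action, observed state) pair: if the same pair keeps recurring, nothing about the situation is changing. The function name, the labels, and the threshold of three are illustrative assumptions, not an existing classifier.&lt;/p&gt;

```python
from collections import Counter


def classify_run(steps, repeat_threshold=3):
    """Label a run from its trace.

    `steps` is a list of hashable (action, observed_state) pairs.
    """
    if not steps:
        return "empty"
    # Count how often the most frequent (action, state) pair occurs.
    _, top_count = Counter(steps).most_common(1)[0]
    if top_count >= repeat_threshold:
        return "looping"      # same action against the same state, repeatedly
    return "progressing"


trace = [
    ("run tests", "fail: test_login"),
    ("edit auth.py", "fail: test_login"),
    ("run tests", "fail: test_login"),
    ("run tests", "fail: test_login"),
]
print(classify_run(trace))
```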

&lt;h2&gt;
  
  
  The Core Loop
&lt;/h2&gt;

&lt;p&gt;A governed agent workflow should look less like this:&lt;/p&gt;



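&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt → Agent runs → Agent says done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I’m exploring these ideas while building MartinLoop, an open-source control layer for AI coding agents.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Keesan12/Martin-Loop" rel="noopener noreferrer"&gt;https://github.com/Keesan12/Martin-Loop&lt;/a&gt;&lt;br&gt;
Website: &lt;a href="https://martinloop.com" rel="noopener noreferrer"&gt;https://martinloop.com&lt;/a&gt;&lt;/p&gt;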

</description>
      <category>ai</category>
      <category>devops</category>
      <category>opensource</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI coding agents need receipts, not just better prompts</title>
      <dc:creator>keesan.eth</dc:creator>
      <pubDate>Mon, 11 May 2026 17:46:15 +0000</pubDate>
      <link>https://dev.to/cryptokeesan/ai-coding-agents-need-receipts-not-just-better-prompts-838</link>
      <guid>https://dev.to/cryptokeesan/ai-coding-agents-need-receipts-not-just-better-prompts-838</guid>
      <description>&lt;p&gt;AI coding agents are getting good enough to run real engineering tasks, but not safe enough to run without guardrails.&lt;/p&gt;

&lt;p&gt;The failure mode is not always dramatic.&lt;/p&gt;

&lt;p&gt;Sometimes the agent just keeps working.&lt;/p&gt;

&lt;p&gt;It retries.&lt;br&gt;
It rewrites.&lt;br&gt;
It spends tokens.&lt;br&gt;
It changes files.&lt;br&gt;
It says it is done.&lt;/p&gt;

&lt;p&gt;Then another engineer opens the diff and realizes the agent solved the wrong problem.&lt;/p&gt;

&lt;p&gt;That creates a new engineering question:&lt;/p&gt;

&lt;p&gt;Can another engineer audit this run later?&lt;/p&gt;

&lt;p&gt;That is why I’m building MartinLoop.&lt;/p&gt;

&lt;p&gt;MartinLoop is an open-source control plane for AI coding agents. The goal is to make every agent run bounded, inspectable, and test-verifiable.&lt;/p&gt;

&lt;p&gt;The first version focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hard budget caps&lt;/li&gt;
&lt;li&gt;JSONL run records&lt;/li&gt;
&lt;li&gt;audit trails&lt;/li&gt;
&lt;li&gt;failure classification&lt;/li&gt;
&lt;li&gt;test-verified completion&lt;/li&gt;
&lt;li&gt;reproducible agent runs&lt;/li&gt;
&lt;/ul&gt;
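
&lt;p&gt;To show what test-verified completion can mean in practice, here is a minimal Python sketch: the run only counts as done when the agent claims completion and a verifier command exits with status 0. The &lt;code&gt;verified_done&lt;/code&gt; helper and the stand-in commands are assumptions for illustration; a real setup would call the test command of the repo.&lt;/p&gt;

```python
import subprocess
import sys


def verified_done(agent_says_done: bool, verify_cmd: list) -> bool:
    """Accept completion only if the agent claims done AND the verifier exits 0."""
    if not agent_says_done:
        return False
    result = subprocess.run(verify_cmd, capture_output=True, text=True)
    return result.returncode == 0


# Stand-in verifiers so the sketch runs anywhere; swap in a real test command.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]
print(verified_done(True, passing), verified_done(True, failing))
```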

&lt;p&gt;The thesis is simple:&lt;/p&gt;

&lt;p&gt;The next layer of AI coding is not only better prompts.&lt;/p&gt;

&lt;p&gt;It is governance.&lt;/p&gt;

&lt;p&gt;Before agents touch serious repos, teams need receipts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what the agent tried&lt;/li&gt;
&lt;li&gt;what it changed&lt;/li&gt;
&lt;li&gt;how much it spent&lt;/li&gt;
&lt;li&gt;what commands it ran&lt;/li&gt;
&lt;li&gt;what tests passed&lt;/li&gt;
&lt;li&gt;what failed&lt;/li&gt;
&lt;li&gt;why it stopped&lt;/li&gt;
&lt;li&gt;whether a human can resume, revert, or rerun it&lt;/li&gt;
&lt;/ul&gt;
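
&lt;p&gt;To make the list above concrete, here is a minimal Python sketch of a receipt written as one JSONL record. Every field name and sample value is an assumption chosen to mirror the bullets; this is not the MartinLoop schema.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentReceipt:
    """Hypothetical receipt; field names are illustrative only."""
    task: str                 # what the agent tried
    files_changed: list       # what it changed
    spent_usd: float          # how much it spent
    commands: list            # what commands it ran
    tests_passed: list        # what tests passed
    tests_failed: list        # what failed
    stop_reason: str          # why it stopped
    human_actions: list = field(default_factory=list)  # resume, revert, or rerun


def to_jsonl_line(receipt: AgentReceipt) -> str:
    """Serialize one receipt as a single JSON line, ready to append to a .jsonl file."""
    return json.dumps(asdict(receipt), sort_keys=True)


receipt = AgentReceipt(
    task="fix failing login test",
    files_changed=["auth.py"],
    spent_usd=0.42,
    commands=["pytest tests/test_login.py"],
    tests_passed=["test_login"],
    tests_failed=[],
    stop_reason="tests_passed",
    human_actions=["revert", "rerun"],
)
line = to_jsonl_line(receipt)
print(line)
```

&lt;p&gt;One record per line keeps the log append-only and easy to inspect with ordinary text tools, which is the property an audit trail needs most.&lt;/p&gt;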

&lt;p&gt;I’m looking for feedback from developers using Claude Code, Codex, Cursor, Devin-style agents, or custom coding agents in real repos.&lt;/p&gt;

&lt;p&gt;What would you want in the default “agent receipt”?&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Keesan12/Martin-Loop" rel="noopener noreferrer"&gt;https://github.com/Keesan12/Martin-Loop&lt;/a&gt;&lt;br&gt;
Site: &lt;a href="https://martinloop.com" rel="noopener noreferrer"&gt;https://martinloop.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devops</category>
      <category>programming</category>
    </item>
    <item>
      <title>Come Build on Concordium</title>
      <dc:creator>keesan.eth</dc:creator>
      <pubDate>Tue, 30 Aug 2022 19:57:03 +0000</pubDate>
      <link>https://dev.to/cryptokeesan/come-build-on-concordium-2jgh</link>
      <guid>https://dev.to/cryptokeesan/come-build-on-concordium-2jgh</guid>
      <description>&lt;p&gt;Concordium Blockchain is the only public blockchain with a privacy based ID-layer at the protocol level built using RUST. It is the only blockchain with user attributes accessible from Smart Contracts built to be enterprise grade and compliant by nature. &lt;/p&gt;

&lt;p&gt;We welcome all #rustdevs to test us out and help us build out our bounties as we look to create new tools for integrations, interoperability, and Dapps. &lt;/p&gt;

&lt;p&gt;The blockchain for the future has arrived!&lt;/p&gt;

</description>
      <category>rust</category>
      <category>blockchain</category>
      <category>webdev</category>
      <category>bounties</category>
    </item>
  </channel>
</rss>
