<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AttestDojo</title>
    <description>The latest articles on DEV Community by AttestDojo (@attestdojo).</description>
    <link>https://dev.to/attestdojo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3976305%2F229879ea-7771-4ae6-8d2e-b86bfc0f17aa.jpg</url>
      <title>DEV Community: AttestDojo</title>
      <link>https://dev.to/attestdojo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/attestdojo"/>
    <language>en</language>
    <item>
      <title>Kaizen Harness: patterns for making AI agents reliable</title>
      <dc:creator>AttestDojo</dc:creator>
      <pubDate>Tue, 09 Jun 2026 16:02:17 +0000</pubDate>
      <link>https://dev.to/attestdojo/kaizen-harness-patterns-for-making-ai-agents-reliable-3cnb</link>
      <guid>https://dev.to/attestdojo/kaizen-harness-patterns-for-making-ai-agents-reliable-3cnb</guid>
      <description>&lt;p&gt;Kaizen Harness is a set of patterns for the system around an AI model: trajectory logging, verification, self-healing, and multi-model council debates.&lt;/p&gt;

&lt;p&gt;The idea is simple. When an agent makes a mistake, you don't rerun the prompt. You change the system so that class of mistake stops happening.&lt;/p&gt;

&lt;p&gt;I kept hitting the same problem: AI agents fail silently, claim success without evidence, and repeat mistakes they already made an hour ago. The model wasn't the issue. The system around the model was.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four patterns
&lt;/h2&gt;

&lt;p&gt;Each pattern is a single bash script or TypeScript file with a README. No framework, no install, copy what you want.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trajectory logger.&lt;/strong&gt; Append-only JSONL of every action: task, model, tools, success/failure, failure type. Bash + python3, nothing else. This is the memory you need before you can fix anything. Without a log you're guessing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verification.&lt;/strong&gt; Stop trusting the agent's "done." A script that checks exit codes, stdout patterns, and common error signatures (segfaults, connection refused, syntax errors) even when the exit code looks clean.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-healing.&lt;/strong&gt; Reads a failure memory log, fingerprints each failure, auto-patches the classes it recognizes with a post-fix verification, and escalates the ones it doesn't to a council. Rate-limited by fingerprint so it can't get stuck in a patch loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Council debate.&lt;/strong&gt; Fans a hard question out to 3-6 models in parallel, lets them disagree, then synthesizes the tradeoffs. Built around free models (OpenRouter free tier, Groq) and a local Ollama seat. Only the synthesis step touches a paid model, and only if you want it to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's a runnable demo in the README showing self-healing on its own: it reads its failure memory, patches and verifies three known failure classes, and kicks the unknown one to a council. No API keys, no network.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we learned running it
&lt;/h2&gt;

&lt;p&gt;The honest part is in the "What We Learned" sections in each pattern's README:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An early self-heal applied patches and assumed they worked. One bad patch made things worse. Now verification after every fix is mandatory, and a failed fix escalates instead of looping.&lt;/li&gt;
&lt;li&gt;The council talked us out of adding Redis, microservices, and a message queue we didn't need at our scale.&lt;/li&gt;
&lt;li&gt;Without a trajectory log, you're just guessing what went wrong. Structured logging turned "the agent broke" into "the agent broke at step 3 with error class X."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The patterns are framework-agnostic. You can copy one in without adopting the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;GitHub (MIT): &lt;a href="https://github.com/sarichan777/kaizen-harness" rel="noopener noreferrer"&gt;https://github.com/sarichan777/kaizen-harness&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is infrastructure we run, not a research artifact. Feedback and better patterns welcome, especially on the self-healing and verification pieces.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
