<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: DtoTHEmoon</title>
    <description>The latest articles on DEV Community by DtoTHEmoon (@dtothemoon).</description>
    <link>https://dev.to/dtothemoon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3955327%2F7001497f-1267-4c21-8c0d-30c63c86a629.png</url>
      <title>DEV Community: DtoTHEmoon</title>
      <link>https://dev.to/dtothemoon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dtothemoon"/>
    <language>en</language>
    <item>
      <title>RAG vs Agent: The Decision That Broke My System (And How I Now Enforce It Upfront)</title>
      <dc:creator>DtoTHEmoon</dc:creator>
      <pubDate>Mon, 01 Jun 2026 02:14:44 +0000</pubDate>
      <link>https://dev.to/dtothemoon/rag-vs-agent-the-decision-that-broke-my-system-and-how-i-now-enforce-it-upfront-oel</link>
      <guid>https://dev.to/dtothemoon/rag-vs-agent-the-decision-that-broke-my-system-and-how-i-now-enforce-it-upfront-oel</guid>
      <description>&lt;p&gt;Most people treat the RAG-vs-Agent question as a technical preference. Pick whichever feels right, adjust later.&lt;/p&gt;

&lt;p&gt;I did that. It cost me two full rebuilds.&lt;/p&gt;

&lt;p&gt;Here's the decision framework I've landed on — and the tool I built to enforce it before the first line of code gets written.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Mistake: Treating Architecture as Reversible
&lt;/h2&gt;

&lt;p&gt;I was building GrowthOS, a four-module internal talent development platform. When I hit module three — personalized learning path generation — I reached for RAG out of habit. I'd just built a solid RAG knowledge base in module one. The pattern was familiar.&lt;/p&gt;

&lt;p&gt;Six days in, I had a retrieval system that could surface relevant learning materials. What it couldn't do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read an employee's current skill profile&lt;/li&gt;
&lt;li&gt;Analyze which specific gaps needed closing&lt;/li&gt;
&lt;li&gt;Decide the optimal sequencing given available time&lt;/li&gt;
&lt;li&gt;Monitor whether the employee's behavior changed after completing a path&lt;/li&gt;
&lt;li&gt;Trigger re-planning when skills shifted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG returned documents. The task required &lt;em&gt;decisions across time&lt;/em&gt;. I had picked the wrong primitive, and the cost was a rebuild.&lt;/p&gt;

&lt;p&gt;The deeper problem: I had no forcing function that made me answer the architecture question before building.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Decision Framework
&lt;/h2&gt;

&lt;p&gt;After rebuilding twice, I reduced the RAG-vs-Agent decision to three diagnostic questions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question 1: Is this a retrieval task or an execution task?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RAG is fundamentally a retrieval primitive: given a query, find and synthesize relevant content. It's excellent when the output is &lt;em&gt;information&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Agent is an execution primitive: given a goal, take a sequence of actions using tools. It's necessary when the output is &lt;em&gt;a decision or a state change&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The confusion happens because modern RAG pipelines can feel agentic — they chunk, embed, retrieve, rerank, generate. But all of that complexity is still in service of answering a question, not executing a workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question 2: Does the task require maintaining state across multiple steps?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If yes, you need Agent.&lt;/p&gt;

&lt;p&gt;RAG is stateless by design. Each query is independent. You can build workarounds — storing context, chaining queries — but you're fighting the architecture.&lt;/p&gt;

&lt;p&gt;Agent is stateful by design. It maintains context, tracks intermediate results, and can loop back based on what it finds.&lt;/p&gt;

&lt;p&gt;For GrowthOS module three, the path generation workflow looked like this:&lt;br&gt;
read_profile(employee_id)&lt;br&gt;
→ analyze_skill_gap(profile, target_role)&lt;br&gt;
→ search_materials(gap_list)&lt;br&gt;
→ generate_path(gaps, materials, available_time)&lt;br&gt;
→ monitor_progress(employee_id, path)  ← runs continuously&lt;br&gt;
→ trigger_replan(if behavior_signal_detected)&lt;/p&gt;

&lt;p&gt;Each step is a tool call that depends on the result of the previous one. This is Agent territory, not RAG.&lt;/p&gt;
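&lt;p&gt;The Python sketch below makes that dependency chain concrete. It is not GrowthOS code. The tool bodies are stubs, and several pieces are hypothetical stand-ins for real services: the in-memory profile, the role targets, the material catalog, the flat four-hour cost per item, and the boolean &lt;code&gt;behavior_signal&lt;/code&gt;. What the sketch does show is the part RAG lacks. Each call consumes the previous call's output, and the agent keeps intermediate results in &lt;code&gt;state&lt;/code&gt;, so a monitoring pass can trigger a re-plan.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass, field

# Hypothetical in-memory data standing in for a profile service and a content index.
PROFILES = {"e42": {"skills": {"sql": 0.8, "python": 0.3}}}
ROLE_TARGETS = {"data_analyst": {"sql": 0.7, "python": 0.7, "statistics": 0.6}}
MATERIALS = {
    "python": ["Python basics", "pandas workshop"],
    "statistics": ["Intro to stats"],
}
HOURS_PER_ITEM = 4  # assumed flat time cost per learning item


def read_profile(employee_id):
    return PROFILES[employee_id]


def analyze_skill_gap(profile, target_role):
    # Gap = how far each skill sits below the role target (0.0 if already met).
    current = profile["skills"]
    gaps = {s: max(0.0, t - current.get(s, 0.0)) for s, t in ROLE_TARGETS[target_role].items()}
    # Keep only real gaps, largest first.
    return sorted((s for s, g in gaps.items() if g), key=lambda s: -gaps[s])


def search_materials(gap_list):
    # The retrieval step: this is where a RAG lookup would plug in.
    return {skill: MATERIALS.get(skill, []) for skill in gap_list}


def generate_path(gaps, materials, available_hours):
    # Sequence items largest-gap-first and keep only what fits the time budget.
    ordered = [item for skill in gaps for item in materials[skill]]
    return ordered[: available_hours // HOURS_PER_ITEM]


@dataclass
class PathAgent:
    employee_id: str
    target_role: str
    available_hours: int
    state: dict = field(default_factory=dict)  # intermediate results persist across steps

    def plan(self):
        # Each call depends on the previous call's output.
        profile = read_profile(self.employee_id)
        gaps = analyze_skill_gap(profile, self.target_role)
        materials = search_materials(gaps)
        path = generate_path(gaps, materials, self.available_hours)
        self.state.update(profile=profile, gaps=gaps, path=path)
        return path

    def monitor_progress(self, behavior_signal):
        # Meant to run on a schedule; a detected signal triggers re-planning.
        if behavior_signal:
            self.state["replans"] = self.state.get("replans", 0) + 1
            return self.plan()
        return self.state["path"]


agent = PathAgent("e42", "data_analyst", available_hours=8)
print(agent.plan())  # ['Intro to stats', 'Python basics']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In production, &lt;code&gt;monitor_progress&lt;/code&gt; would run on a schedule, and the sequencing would typically come from a model choosing tools. The greedy slice here only stands in for that decision.&lt;/p&gt;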

&lt;p&gt;&lt;strong&gt;Question 3: What is the cost of getting this wrong?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;RAG failure modes are usually visible and recoverable: the answer is wrong or incomplete, the user notices, you fix the retrieval. The cost is time, not catastrophe.&lt;/p&gt;

&lt;p&gt;Agent failure modes can be silent and compounding: the agent takes the wrong action, downstream steps build on that error, you find out six steps later. Or you don't find out until a user hits it in production.&lt;/p&gt;

&lt;p&gt;This asymmetry should directly affect how much upfront rigor you apply to the architecture decision. The higher the cost of failure, the more you need to be certain before you build.&lt;/p&gt;


&lt;h2&gt;
  
  
  The GrowthOS Module Breakdown
&lt;/h2&gt;

&lt;p&gt;Running all four modules through this framework makes the pattern clear:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Task Type&lt;/th&gt;
&lt;th&gt;Stateful?&lt;/th&gt;
&lt;th&gt;Failure Cost&lt;/th&gt;
&lt;th&gt;Decision&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Module 1: Knowledge base&lt;/td&gt;
&lt;td&gt;Answer questions about docs&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Low (visible)&lt;/td&gt;
&lt;td&gt;RAG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Module 2: Skill profiling&lt;/td&gt;
&lt;td&gt;Compute tags from behavior events&lt;/td&gt;
&lt;td&gt;No (batch job)&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Rules engine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Module 3: Learning paths&lt;/td&gt;
&lt;td&gt;Generate + monitor + replan&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;High (silent drift)&lt;/td&gt;
&lt;td&gt;Agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Module 4: Tracking + flywheel&lt;/td&gt;
&lt;td&gt;Detect signals, update weights&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Hybrid&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
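&lt;p&gt;The table can also be read as a small decision function. The sketch below is my own hypothetical encoding, not code from Rein or GrowthOS. The input labels are assumptions, and the module answers are my reading of the table. &lt;code&gt;needs_llm&lt;/code&gt; is false when fixed rules fully determine the output (the module-two case). Failure cost is deliberately not an input, since Question 3 governs how much rigor to apply before building rather than which primitive to pick.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def choose_primitive(output_type, stateful, needs_llm):
    """Map diagnostic answers to a primitive.

    output_type: "information", "decision", or "state_change"
    stateful:    "no", "partial", or "yes"
    needs_llm:   False when fixed rules fully determine the output
    """
    if not needs_llm:
        return "Rules engine"  # deterministic: no model call per event
    if stateful == "yes":
        return "Agent"  # multi-step work with state carried between tool calls
    if stateful == "partial":
        return "Hybrid"  # e.g. deterministic updates plus an LLM step
    if output_type == "information":
        return "RAG"  # stateless: query in, synthesized content out
    return "Agent"  # a one-shot decision or state change still needs tools


# My reading of the module table: (output type, stateful?, needs LLM?)
MODULES = {
    "Module 1: Knowledge base": ("information", "no", True),
    "Module 2: Skill profiling": ("state_change", "no", False),
    "Module 3: Learning paths": ("decision", "yes", True),
    "Module 4: Tracking + flywheel": ("state_change", "partial", True),
}

for name, answers in MODULES.items():
    print(f"{name}: {choose_primitive(*answers)}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;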

&lt;p&gt;The interesting case is module two. You might expect a skill-tagging system to use RAG or Agent, but the task is actually deterministic: behavior events map to skill weights via defined rules, decay runs on a schedule, nothing requires LLM inference. A rules engine with a cron job is more reliable and cheaper than an LLM call for every event.&lt;/p&gt;

&lt;p&gt;In my experience, reaching for AI where deterministic logic is sufficient is one of the most common, and most expensive, mistakes in production systems. The question isn't "can AI do this?" but "does this task actually require AI?"&lt;/p&gt;
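&lt;p&gt;The hypothetical sketch below shows how small the module-two rules engine can be. The event names, the weight deltas in &lt;code&gt;RULES&lt;/code&gt;, the 1.0 cap, and the 30-day half-life are illustrative placeholders, not GrowthOS values. Events map to skill deltas through a lookup table, and a separate function applies exponential decay when the scheduled job runs. No model call is involved anywhere.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

# Hypothetical rule table: event type -> {skill: weight delta}.
RULES = {
    "completed_sql_course": {"sql": 0.2},
    "shipped_dashboard": {"sql": 0.1, "visualization": 0.15},
    "code_review_python": {"python": 0.05},
}
HALF_LIFE_DAYS = 30.0  # assumed decay rate


def apply_events(weights, events):
    """Deterministically fold behavior events into skill weights, capped at 1.0."""
    updated = dict(weights)
    for event in events:
        for skill, delta in RULES.get(event, {}).items():  # unknown events are ignored
            updated[skill] = round(min(1.0, updated.get(skill, 0.0) + delta), 4)
    return updated


def decay(weights, days_elapsed):
    """Exponential decay, applied by a scheduled (e.g. cron) job."""
    factor = math.pow(0.5, days_elapsed / HALF_LIFE_DAYS)
    return {skill: round(w * factor, 4) for skill, w in weights.items()}


w = apply_events({"sql": 0.5}, ["completed_sql_course", "shipped_dashboard", "unknown_event"])
print(w)              # {'sql': 0.8, 'visualization': 0.15}
print(decay(w, 30))   # {'sql': 0.4, 'visualization': 0.075}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the same events always produce the same weights, this module can be tested with plain equality assertions. That is much harder with an LLM call in the loop.&lt;/p&gt;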


&lt;h2&gt;
  
  
  The Enforcement Problem
&lt;/h2&gt;

&lt;p&gt;Knowing the framework doesn't help if you don't apply it at the right moment. The right moment is &lt;em&gt;before you write any code&lt;/em&gt; — at the point where the architecture is still a decision, not a sunk cost.&lt;/p&gt;

&lt;p&gt;In practice, many developers (myself included) only reach the architecture question after they've already started building. The pattern looks like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start implementing a feature&lt;/li&gt;
&lt;li&gt;Realize something isn't working&lt;/li&gt;
&lt;li&gt;Debug for hours&lt;/li&gt;
&lt;li&gt;Eventually diagnose a fundamental architecture mismatch&lt;/li&gt;
&lt;li&gt;Rebuild&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What I needed was something that forced the decision &lt;em&gt;earlier&lt;/em&gt; — ideally the moment I started describing a new module or feature, before the first tool call.&lt;/p&gt;

&lt;p&gt;This is the problem Rein is designed to solve.&lt;/p&gt;


&lt;h2&gt;
  
  
  How Rein Enforces Upfront Architecture Decisions
&lt;/h2&gt;

&lt;p&gt;Rein is an open-source Skill for Claude Code that monitors your development conversations and intervenes at specific diagnostic moments.&lt;/p&gt;

&lt;p&gt;For architecture decisions, Rein's Q1 layer (SPEC) enforces a constraint: before any implementation work begins on a feature involving data retrieval or automated decision-making, the SPEC must answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is the output type? (information vs decision vs state change)&lt;/li&gt;
&lt;li&gt;Does the task require state across multiple steps?&lt;/li&gt;
&lt;li&gt;What is the failure mode and its cost?&lt;/li&gt;
&lt;li&gt;Which primitive does this map to: rules engine / RAG / single Agent / multi-Agent?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you start describing an implementation without these questions answered, Rein surfaces them. Not as a checklist — as targeted questions based on what you've described.&lt;/p&gt;
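&lt;p&gt;Rein raises these as conversational questions, but the underlying check is easy to picture as data. The hypothetical sketch below lists which SPEC questions are still open. It is not Rein's implementation, and the field names and allowed values are my assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical SPEC fields mirroring the four questions above.
REQUIRED = {
    "output_type": "What is the output type? (information vs decision vs state change)",
    "multi_step_state": "Does the task require state across multiple steps?",
    "failure_mode": "What is the failure mode and its cost?",
    "primitive": "Which primitive does this map to?",
}
ALLOWED = {
    "output_type": {"information", "decision", "state_change"},
    "primitive": {"rules_engine", "rag", "single_agent", "multi_agent"},
}


def unanswered(spec):
    """Return the questions a SPEC leaves blank or answers with an unknown value."""
    missing = []
    for key, question in REQUIRED.items():
        value = spec.get(key)
        if value in (None, "") or (key in ALLOWED and value not in ALLOWED[key]):
            missing.append(question)
    return missing


spec = {"output_type": "decision", "multi_step_state": True, "primitive": "rag"}
for question in unanswered(spec):
    print("Unanswered:", question)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;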

&lt;p&gt;The second enforcement point is Q4 (verification scripts). Architecture decisions aren't just written down; they're verified. Before module three was considered "done," verify.sh included:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;check &lt;span class="s2"&gt;"PathAgent tool list matches SPEC"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"grep -c 'def get_employee_profile&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;def analyze_skill_gap&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;def search_learning_materials&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;def generate_learning_path&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;def monitor_progress' agent/path_agent.py | grep -q '^5&lt;/span&gt;&lt;span class="nv"&gt;$'&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

check &lt;span class="s2"&gt;"MonitorAgent runs on schedule"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"grep -q 'monitor_agent&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;schedule&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;cron' backend/main.py"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
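&lt;p&gt;One caveat about the first check: &lt;code&gt;grep -c&lt;/code&gt; counts matching lines. A commented-out &lt;code&gt;# def analyze_skill_gap&lt;/code&gt; or a near-duplicate such as &lt;code&gt;def analyze_skill_gap_v2&lt;/code&gt; can therefore count toward the five, and a failure doesn't say which tool is missing. As an alternative (my substitution, not the author's verify.sh), Python's standard &lt;code&gt;ast&lt;/code&gt; module can parse the file and report missing tools by name. The tool names are copied from the grep above; the script name &lt;code&gt;check_tools.py&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import ast
import sys

# Tool names copied from the grep-based SPEC check above.
SPEC_TOOLS = {
    "get_employee_profile",
    "analyze_skill_gap",
    "search_learning_materials",
    "generate_learning_path",
    "monitor_progress",
}


def defined_functions(source):
    """Names of every function or method defined in the source; comments never match."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def missing_tools(source, spec_tools=SPEC_TOOLS):
    """SPEC tools with no definition in the source, sorted for readable failures."""
    return sorted(spec_tools - defined_functions(source))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 check_tools.py agent/path_agent.py
    with open(sys.argv[1], encoding="utf-8") as f:
        missing = missing_tools(f.read())
    if missing:
        print("FAIL: missing SPEC tools: " + ", ".join(missing))
        sys.exit(1)
    print("PASS: tool list matches SPEC")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Assuming &lt;code&gt;check&lt;/code&gt; treats a non-zero exit status as a failure, the grep line could then become &lt;code&gt;check "PathAgent tool list matches SPEC" "python3 check_tools.py agent/path_agent.py"&lt;/code&gt;.&lt;/p&gt;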



&lt;p&gt;If the implementation drifts from the SPEC, the gate fails. You find out immediately, not in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Silence Rule
&lt;/h2&gt;

&lt;p&gt;One design principle worth noting: Rein is silent when there's nothing to flag.&lt;/p&gt;

&lt;p&gt;This matters because most Harness tooling errs toward verbosity — warning about everything, asking for confirmation constantly, inserting itself into every decision. The overhead degrades the development experience until you start ignoring it.&lt;/p&gt;

&lt;p&gt;Rein's trigger conditions are narrow and specific. For architecture decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trigger: you describe a new feature involving retrieval or automated decisions, without a SPEC that answers the three diagnostic questions&lt;/li&gt;
&lt;li&gt;No trigger: you're implementing a feature with a clear SPEC already written&lt;/li&gt;
&lt;li&gt;No trigger: you're debugging, refactoring, or working on UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my 16-scenario benchmark, Rein triggered on 100% of the cases where intervention was warranted and stayed silent on 100% of the cases where it wasn't. The silence test is as important as the trigger test.&lt;/p&gt;
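&lt;p&gt;The trigger conditions above amount to a narrow predicate. The sketch below is a hypothetical illustration, and the activity labels and flags are my own choices. Rein itself works from conversation patterns, not structured flags. Note that the silent cases are asserted just as explicitly as the trigger case.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass

# Activities that never warrant an architecture prompt (hypothetical labels).
SILENT_ACTIVITIES = {"debugging", "refactoring", "ui"}


@dataclass(frozen=True)
class Turn:
    activity: str  # e.g. "new_feature", "debugging", "refactoring", "ui"
    retrieval_or_decisions: bool  # feature retrieves data or automates decisions?
    spec_answers_questions: bool  # SPEC already answers the diagnostic questions?


def should_intervene(turn):
    """True only for a new retrieval/decision feature without an answered SPEC."""
    if turn.activity in SILENT_ACTIVITIES:  # explicit silence, checked first
        return False
    return (
        turn.activity == "new_feature"
        and turn.retrieval_or_decisions
        and not turn.spec_answers_questions
    )


# One trigger case and three silence cases.
SCENARIOS = [
    (Turn("new_feature", True, False), True),
    (Turn("new_feature", True, True), False),  # clear SPEC already written
    (Turn("debugging", True, False), False),  # debugging stays silent
    (Turn("new_feature", False, False), False),  # no retrieval or decisions
]
assert all(should_intervene(t) == expected for t, expected in SCENARIOS)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;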




&lt;h2&gt;
  
  
  The Practical Takeaway
&lt;/h2&gt;

&lt;p&gt;If you're building an AI system and haven't explicitly answered these three questions for every component, you're accumulating architecture debt that compounds:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Is the output information, or a decision/state change?&lt;/li&gt;
&lt;li&gt;Does the task require state across multiple steps?&lt;/li&gt;
&lt;li&gt;What's the cost if this is wrong?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The answers don't have to be permanent — architectures evolve as requirements change. But they need to exist before you build, not after you've rebuilt twice.&lt;/p&gt;

&lt;p&gt;RAG and Agent are not interchangeable tools on a gradient. They're different primitives for different problem shapes. Getting the match right early is one of the highest-leverage decisions in AI system design.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Rein&lt;/strong&gt; is open source: &lt;a href="https://github.com/DtoTHEmoon/rein-skill" rel="noopener noreferrer"&gt;github.com/DtoTHEmoon/rein-skill&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/DtoTHEmoon/rein-skill.git ~/.claude/skills/rein
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ai</category>
      <category>programming</category>
      <category>claude</category>
      <category>security</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>DtoTHEmoon</dc:creator>
      <pubDate>Thu, 28 May 2026 02:03:01 +0000</pubDate>
      <link>https://dev.to/dtothemoon/-3p6b</link>
      <guid>https://dev.to/dtothemoon/-3p6b</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/dtothemoon/why-your-ai-agent-keeps-making-the-same-mistakes-its-not-the-model-3pl" class="crayons-story__hidden-navigation-link"&gt;Why Your AI Agent Keeps Making the Same Mistakes (It's Not the Model)&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/dtothemoon" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3955327%2F7001497f-1267-4c21-8c0d-30c63c86a629.png" alt="dtothemoon profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/dtothemoon" class="crayons-story__secondary fw-medium m:hidden"&gt;
              DtoTHEmoon
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                DtoTHEmoon
                
              
              &lt;div id="story-author-preview-content-3767092" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/dtothemoon" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3955327%2F7001497f-1267-4c21-8c0d-30c63c86a629.png" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;DtoTHEmoon&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/dtothemoon/why-your-ai-agent-keeps-making-the-same-mistakes-its-not-the-model-3pl" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;May 27&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/dtothemoon/why-your-ai-agent-keeps-making-the-same-mistakes-its-not-the-model-3pl" id="article-link-3767092"&gt;
          Why Your AI Agent Keeps Making the Same Mistakes (It's Not the Model)
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/claude"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;claude&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/agentaichallenge"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;agentaichallenge&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/chatgpt"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;chatgpt&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/dtothemoon/why-your-ai-agent-keeps-making-the-same-mistakes-its-not-the-model-3pl" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;5&lt;span class="hidden s:inline"&gt;&amp;nbsp;reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/dtothemoon/why-your-ai-agent-keeps-making-the-same-mistakes-its-not-the-model-3pl#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              2&lt;span class="hidden s:inline"&gt;&amp;nbsp;comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            3 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Why Your AI Agent Keeps Making the Same Mistakes (It's Not the Model)</title>
      <dc:creator>DtoTHEmoon</dc:creator>
      <pubDate>Wed, 27 May 2026 23:28:49 +0000</pubDate>
      <link>https://dev.to/dtothemoon/why-your-ai-agent-keeps-making-the-same-mistakes-its-not-the-model-3pl</link>
      <guid>https://dev.to/dtothemoon/why-your-ai-agent-keeps-making-the-same-mistakes-its-not-the-model-3pl</guid>
      <description>&lt;p&gt;Does this sound familiar?&lt;/p&gt;

&lt;p&gt;Your AI just fixed a bug. Two weeks later, the exact same bug is back.&lt;/p&gt;

&lt;p&gt;You deploy something, and you have no idea if it actually worked — so you manually test it.&lt;/p&gt;

&lt;p&gt;You've written 100 lines of rules in your config file, but the AI still ignores half of them.&lt;/p&gt;

&lt;p&gt;Every new chat session, you re-explain the same context from scratch.&lt;/p&gt;

&lt;p&gt;I ran into all four of these problems while building an internal AI quoting system for a healthcare company, with no technical background of my own. After months of debugging, I realized: &lt;strong&gt;none of these were model problems. They were Harness problems.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Harness Engineering?
&lt;/h2&gt;

&lt;p&gt;Harness Engineering is the discipline of building the scaffolding around your AI — the rules, constraints, verification scripts, and knowledge structures that make it produce consistent, reliable output.&lt;/p&gt;

&lt;p&gt;Without Harness, even the best model will drift, forget, and repeat the same mistakes.&lt;/p&gt;

&lt;p&gt;Reported figures point the same way: one estimate attributes &lt;strong&gt;80% of Agent quality failures to Harness gaps, not model limitations&lt;/strong&gt;. And in one benchmark, the same 15 models all improved significantly when only the Harness changed, not the models themselves.&lt;/p&gt;

&lt;p&gt;The problem is: most people don't know what their Harness is missing. They just know something feels broken.&lt;/p&gt;




&lt;h2&gt;
  
  
  The framework: two dimensions, not six steps
&lt;/h2&gt;

&lt;p&gt;After studying real production failures and building my own system from scratch, I organized Harness Engineering into two dimensions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertical Quality Layers (Q) — required for every project&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;What it solves&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Q1&lt;/td&gt;
&lt;td&gt;SPEC&lt;/td&gt;
&lt;td&gt;AI knows what to build, what not to, and how to verify&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q2&lt;/td&gt;
&lt;td&gt;Rules + Security&lt;/td&gt;
&lt;td&gt;Hard business limits + security red lines, equally mandatory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q3&lt;/td&gt;
&lt;td&gt;Skills&lt;/td&gt;
&lt;td&gt;Repetitive workflows standardized with counter-examples&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Q4&lt;/td&gt;
&lt;td&gt;Scripts (unified gate)&lt;/td&gt;
&lt;td&gt;Nothing is "done" until scripts pass&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Horizontal Scale Layers (S) — enable only when needed&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;When to enable&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;S1&lt;/td&gt;
&lt;td&gt;Context&lt;/td&gt;
&lt;td&gt;Sessions losing coherence after ~20 turns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S2&lt;/td&gt;
&lt;td&gt;dev-map + Memory&lt;/td&gt;
&lt;td&gt;Project iterating 2+ months, AI re-inventing solutions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;Multi-Agent&lt;/td&gt;
&lt;td&gt;Single agent consistently failing on long task chains&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: &lt;strong&gt;Q4 is not step four. It's the exit gate for every layer.&lt;/strong&gt; Code changes, doc updates, multi-agent outputs — all must pass Q4 before anything counts as done.&lt;/p&gt;

&lt;p&gt;Most people skip Q4 entirely. That's why the same bug keeps coming back.&lt;/p&gt;
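&lt;p&gt;Mechanically, a gate of this kind can be as simple as a list of checks where any single failure blocks "done". The minimal Python sketch below assumes each check is a named command whose exit status means pass (0) or fail. The two placeholder commands just run the Python interpreter and stand in for real checks, such as a service health probe or a business baseline.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import subprocess
import sys


def run_gate(checks):
    """Run every (name, argv) check and return the names that failed.

    The change counts as done only when this list is empty; code changes,
    doc updates, and multi-agent outputs would all go through the same gate.
    """
    failures = []
    for name, argv in checks:
        returncode = subprocess.run(argv, capture_output=True).returncode
        print(("PASS " if returncode == 0 else "FAIL ") + name)
        if returncode != 0:
            failures.append(name)
    return failures


# Placeholder checks: one passes, one fails, so the gate stays closed.
CHECKS = [
    ("service starts", [sys.executable, "-c", "pass"]),
    ("business baseline matches", [sys.executable, "-c", "raise SystemExit(1)"]),
]
print("done" if not run_gate(CHECKS) else "not done")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;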




&lt;h2&gt;
  
  
  What I built: Rein
&lt;/h2&gt;

&lt;p&gt;Rein is an open-source Skill for Claude Code (and any agent supporting the SKILL.md standard) that acts as a silent Harness Engineering advisor throughout your project.&lt;/p&gt;

&lt;p&gt;It watches your conversations for patterns — not keywords — and speaks up only when it detects a real gap. When everything's fine, it stays silent. &lt;strong&gt;Silence is a feature.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What it detects automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repeated failures (same bug fixed twice → missing Rule or regression test)&lt;/li&gt;
&lt;li&gt;Context loss (re-explaining background every session → incomplete project docs)&lt;/li&gt;
&lt;li&gt;Scale shifts (internal tool going external → time to harden your Harness)&lt;/li&gt;
&lt;li&gt;Cost spikes (API bill climbing → identifies token waste sources)&lt;/li&gt;
&lt;li&gt;Over-engineering (more config, slower shipping → tells you what to delete)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Test results: 97% pass rate across 16 scenarios with Rein vs 52% without.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The biggest gap was in root cause diagnosis: 92% accuracy with Rein, 24% without.&lt;/p&gt;
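&lt;p&gt;The first signal in the list, repeated failures, can be illustrated roughly as follows. This is not how Rein works internally, since Rein reads conversations rather than logs. Suppose each fix were recorded with a short bug signature. Any signature that shows up more than once is a bug that came back, which points at a missing Rule or regression test. The log entries below are made up.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter


def repeated_failures(fix_log):
    """Bug signatures 'fixed' more than once, most frequent first."""
    counts = Counter(entry["signature"] for entry in fix_log)
    # Every logged signature occurs at least once, so any count other than 1 is a repeat.
    return [sig for sig, n in counts.most_common() if n != 1]


# Hypothetical fix log.
FIX_LOG = [
    {"date": "2026-05-01", "signature": "pricing: discount applied twice"},
    {"date": "2026-05-03", "signature": "login: session timeout"},
    {"date": "2026-05-15", "signature": "pricing: discount applied twice"},
]

for sig in repeated_failures(FIX_LOG):
    print(f"{sig!r} was fixed more than once: add a Rule or a regression test")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;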




&lt;h2&gt;
  
  
  A real example from my project
&lt;/h2&gt;

&lt;p&gt;My &lt;code&gt;verify.sh&lt;/code&gt; only checked if the service started. It didn't check if the business logic was correct.&lt;/p&gt;

&lt;p&gt;So when the AI "fixed" a pricing calculation bug, it passed my verification — service was running — but the actual calculation was still wrong. Same bug, two weeks later.&lt;/p&gt;

&lt;p&gt;After adding a business baseline check (call a known correct quote request, compare against expected output), that class of bug disappeared entirely.&lt;/p&gt;

&lt;p&gt;This is Q4. Not just "is the service alive?" but "&lt;strong&gt;is the output actually correct?&lt;/strong&gt;"&lt;/p&gt;
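&lt;p&gt;A baseline check along those lines might look like the sketch below. &lt;code&gt;calculate_quote&lt;/code&gt;, its pricing rule, and the expected totals are hypothetical stand-ins for the real quoting service, which would be called over its API. The point is that the assertion is on business output, not on whether the process is up. &lt;code&gt;Decimal&lt;/code&gt; keeps the money arithmetic exact, so equality comparisons are safe.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from decimal import Decimal


def calculate_quote(units, unit_price, discount_rate):
    # Hypothetical pricing rule standing in for a call to the real quoting service.
    subtotal = Decimal(units) * Decimal(unit_price)
    total = subtotal * (1 - Decimal(discount_rate))
    return total.quantize(Decimal("0.01"))  # round to cents


# Known-good requests with totals signed off by the business (illustrative values).
BASELINES = [
    ({"units": 10, "unit_price": "12.50", "discount_rate": "0.10"}, Decimal("112.50")),
    ({"units": 3, "unit_price": "99.99", "discount_rate": "0"}, Decimal("299.97")),
]


def check_baselines():
    """Describe every baseline whose output drifted; an empty list means pass."""
    failures = []
    for request, expected in BASELINES:
        actual = calculate_quote(**request)
        if actual != expected:
            failures.append(f"{request}: expected {expected}, got {actual}")
    return failures


problems = check_baselines()
print("baseline OK" if not problems else "\n".join(problems))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;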




&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/DtoTHEmoon/rein-skill.git ~/.claude/skills/rein
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart your agent. Rein activates automatically — no commands needed.&lt;/p&gt;

&lt;p&gt;Also works with: OpenClaw, Codex CLI, Gemini CLI, Cursor, and any agent supporting SKILL.md.&lt;/p&gt;




&lt;h2&gt;
  
  
  The core philosophy
&lt;/h2&gt;

&lt;p&gt;Start minimal. Add only when you have a real pain point. And know when to subtract — Rein will tell you when your Harness is getting in your own way.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If your scaffolding is slowing you down, it's time to cut.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/DtoTHEmoon/rein-skill" rel="noopener noreferrer"&gt;github.com/DtoTHEmoon/rein-skill&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>agentaichallenge</category>
      <category>chatgpt</category>
    </item>
  </channel>
</rss>
