<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Brady Smith</title>
    <description>The latest articles on DEV Community by Brady Smith (@brady_smith_e32781c702b6c).</description>
    <link>https://dev.to/brady_smith_e32781c702b6c</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838100%2F60cc59e0-1a4d-409b-9ce5-cebd7f79f592.png</url>
      <title>DEV Community: Brady Smith</title>
      <link>https://dev.to/brady_smith_e32781c702b6c</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/brady_smith_e32781c702b6c"/>
    <language>en</language>
    <item>
      <title>Building Self-Improving AI Agent Hierarchies with Paperclip Plugins</title>
      <dc:creator>Brady Smith</dc:creator>
      <pubDate>Wed, 01 Apr 2026 21:58:21 +0000</pubDate>
      <link>https://dev.to/brady_smith_e32781c702b6c/building-self-improving-ai-agent-hierarchies-with-paperclip-plugins-55f</link>
      <guid>https://dev.to/brady_smith_e32781c702b6c/building-self-improving-ai-agent-hierarchies-with-paperclip-plugins-55f</guid>
      <description>&lt;p&gt;If you're running AI agent hierarchies, you've probably noticed the gap: agents complete tasks, but nothing checks if the output is actually good. There's no feedback loop, no auto-retry, and no way to catch performance degradation before it costs you.&lt;/p&gt;

&lt;p&gt;I built four plugins for &lt;a href="https://github.com/paperclipai/paperclip" rel="noopener noreferrer"&gt;Paperclip AI&lt;/a&gt; that add a self-improvement layer to multi-agent setups (they work with Paperclip-managed OpenClaw agent teams too). Here's how the architecture works.&lt;/p&gt;

&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;A typical agent hierarchy looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CEO Agent [Opus] - decomposes goals, delegates
  CTO Agent [Opus] - makes tech decisions, delegates
    Worker Agent [Sonnet] - executes tasks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tasks flow down. Results flow up. But there's no quality layer. If the Worker produces bad output, the CEO doesn't know until a human checks manually.&lt;/p&gt;

&lt;h2&gt;The Solution: An Event-Driven Feedback Loop&lt;/h2&gt;

&lt;p&gt;Four plugins, each handling one part of the loop:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;issue.created   -&amp;gt; Skill Router assigns tools/agent type
issue.completed -&amp;gt; Performance Tracker logs outcome,
                   emits TASK_COMPLETED
TASK_COMPLETED  -&amp;gt; Self-Correction QA checks output
                -&amp;gt; Prompt Evolver stores data for evaluation
On degradation  -&amp;gt; Prompt Evolver proposes new prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
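&lt;p&gt;All four plugins share the same shape: subscribe to an event, do one job, emit a new event. Here's a minimal sketch of that shape, assuming a hypothetical &lt;code&gt;definePlugin&lt;/code&gt; entry point with &lt;code&gt;ctx.on&lt;/code&gt;/&lt;code&gt;ctx.emit&lt;/code&gt; helpers (the actual &lt;code&gt;@paperclipai/plugin-sdk&lt;/code&gt; surface may differ):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Sketch only: definePlugin, ctx.on, and ctx.emit are illustrative
// names, not the documented @paperclipai/plugin-sdk surface.
import { definePlugin } from "@paperclipai/plugin-sdk";

export default definePlugin({
  name: "example-plugin",
  setup(ctx) {
    ctx.on("issue.completed", async (event) =&amp;gt; {
      // Do one job: here, a trivial non-empty-output check.
      const ok = typeof event.result === "string" &amp;amp;&amp;amp; event.result.length &amp;gt; 0;
      // Emit a structured event for the next plugin in the loop.
      ctx.emit("TASK_COMPLETED", { issueId: event.id, agent: event.agent, ok });
    });
  },
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;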



&lt;h3&gt;Performance Tracker&lt;/h3&gt;

&lt;p&gt;Subscribes to &lt;code&gt;issue.completed&lt;/code&gt; events. Logs success/failure outcomes per agent, calculates rolling success rates, and emits a &lt;code&gt;TASK_COMPLETED&lt;/code&gt; event with structured metadata. When an agent's success rate drops below a configurable threshold, it emits a degradation event.&lt;/p&gt;
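&lt;p&gt;The core of this is just a bounded window of pass/fail outcomes per agent. A simplified sketch (the window size and threshold here are illustrative defaults, not the shipped config):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Rolling success rate per agent. Degradation fires only once the
// window is full, so a single early failure doesn't trip the alarm.
class RollingTracker {
  private window: boolean[] = [];
  constructor(private size = 20, private threshold = 0.7) {}

  record(success: boolean): { rate: number; degraded: boolean } {
    this.window.push(success);
    if (this.window.length &amp;gt; this.size) this.window.shift();
    const rate = this.window.filter(Boolean).length / this.window.length;
    const degraded = this.window.length === this.size &amp;amp;&amp;amp; rate &amp;lt; this.threshold;
    return { rate, degraded };
  }
}

// One tracker per agent, keyed by agent id.
const trackers = new Map&amp;lt;string, RollingTracker&amp;gt;();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;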

&lt;h3&gt;Self-Correction&lt;/h3&gt;

&lt;p&gt;Listens for &lt;code&gt;TASK_COMPLETED&lt;/code&gt; from Performance Tracker. Runs a QA check on the output. If the check fails:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retry up to N times (configurable)&lt;/li&gt;
&lt;li&gt;If still failing, escalate to the supervisor agent&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent catches its own mistakes before they propagate up the chain.&lt;/p&gt;
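&lt;p&gt;The retry/escalate policy is a small loop. A sketch, with &lt;code&gt;qaCheck&lt;/code&gt;, &lt;code&gt;rerun&lt;/code&gt;, and &lt;code&gt;escalate&lt;/code&gt; as hypothetical stand-ins for the plugin's real handlers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical handlers; the real plugin wires these to agent calls.
declare function qaCheck(output: string): Promise&amp;lt;boolean&amp;gt;;
declare function rerun(taskId: string): Promise&amp;lt;string&amp;gt;;
declare function escalate(taskId: string): Promise&amp;lt;void&amp;gt;;

async function selfCorrect(
  task: { id: string; output: string },
  maxRetries = 3, // the configurable N
): Promise&amp;lt;void&amp;gt; {
  let output = task.output;
  for (let attempt = 0; attempt &amp;lt;= maxRetries; attempt++) {
    if (await qaCheck(output)) return; // passed QA: nothing to do
    if (attempt &amp;lt; maxRetries) output = await rerun(task.id);
  }
  await escalate(task.id); // still failing: hand off to the supervisor
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;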

&lt;h3&gt;Skill Router&lt;/h3&gt;

&lt;p&gt;Intercepts &lt;code&gt;issue.created&lt;/code&gt; events. Analyzes the task description and assigns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The appropriate agent type (don't send architecture decisions to a Worker)&lt;/li&gt;
&lt;li&gt;The right tool set for the task&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This prevents misrouting before work begins.&lt;/p&gt;
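&lt;p&gt;Routing can be as simple as matching the task description against per-role patterns. A sketch of that classification step (the patterns and tool names below are my assumptions, not the shipped plugin's heuristics):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Keyword-based routing sketch. Patterns and tool names are
// illustrative; the real plugin's heuristics may differ.
type Route = { agentType: "ceo" | "cto" | "worker"; tools: string[] };

function route(description: string): Route {
  const text = description.toLowerCase();
  if (/\b(architecture|stack|migration|schema)\b/.test(text)) {
    return { agentType: "cto", tools: ["web_search", "repo_read"] };
  }
  if (/\b(roadmap|priorit|budget)\b/.test(text)) {
    return { agentType: "ceo", tools: ["web_search"] };
  }
  // Default: concrete execution work goes to a Worker.
  return { agentType: "worker", tools: ["shell", "repo_write"] };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;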

&lt;h3&gt;Prompt Evolver&lt;/h3&gt;

&lt;p&gt;Accumulates outcome data per agent over time. When Performance Tracker emits a degradation event, Prompt Evolver uses the historical data to propose a rewritten prompt for the underperforming agent.&lt;/p&gt;

&lt;p&gt;The agent's instructions improve automatically based on real performance data.&lt;/p&gt;
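&lt;p&gt;In sketch form: keep a per-agent outcome log, and on a degradation event, hand the recent failures plus the current prompt to a model and ask for a rewrite. The &lt;code&gt;rewriteWithLLM&lt;/code&gt; call and the prompt wording below are illustrative, not the plugin's actual internals:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type Outcome = { task: string; success: boolean; note?: string };

// Hypothetical model call; the real plugin goes through the SDK.
declare function rewriteWithLLM(instruction: string): Promise&amp;lt;string&amp;gt;;

const history = new Map&amp;lt;string, Outcome[]&amp;gt;(); // per-agent outcome log

async function proposePrompt(agent: string, currentPrompt: string) {
  const failures = (history.get(agent) ?? [])
    .filter((o) =&amp;gt; !o.success)
    .slice(-10); // most recent failures carry the most signal
  const evidence = failures
    .map((o) =&amp;gt; `- ${o.task}: ${o.note ?? "failed QA"}`)
    .join("\n");
  return rewriteWithLLM(
    "Rewrite this agent prompt to address the recurring failures.\n" +
      `Current prompt:\n${currentPrompt}\n\nRecent failures:\n${evidence}`,
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;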

&lt;h2&gt;Technical Details&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Built with &lt;code&gt;@paperclipai/plugin-sdk&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Each plugin runs as an isolated Node.js child process&lt;/li&gt;
&lt;li&gt;Communication via JSON-RPC 2.0&lt;/li&gt;
&lt;li&gt;TypeScript throughout&lt;/li&gt;
&lt;li&gt;56 tests passing (Vitest)&lt;/li&gt;
&lt;li&gt;Tested against a real 3-agent CEO/CTO/Worker hierarchy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The plugins are event-driven and composable. You can use all four together for the full feedback loop, or drop in individual plugins alongside your existing setup.&lt;/p&gt;
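&lt;p&gt;For a sense of the wire format: each host-to-plugin message is a standard JSON-RPC 2.0 envelope. The &lt;code&gt;jsonrpc&lt;/code&gt;, &lt;code&gt;method&lt;/code&gt;, &lt;code&gt;params&lt;/code&gt;, and &lt;code&gt;id&lt;/code&gt; fields come from the spec; the method name, payload shape, and newline-delimited stdio transport below are my assumptions about the child-process setup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// One event delivery, serialized as a JSON-RPC 2.0 request.
// Method name and payload are illustrative, not the SDK's contract.
const message = {
  jsonrpc: "2.0",
  method: "event.dispatch",
  params: {
    event: "issue.completed",
    payload: { issueId: "T-142", agent: "worker-1", success: true },
  },
  id: 7,
};

// Newline-delimited JSON over the child process's stdin/stdout is a
// common transport choice for this kind of isolation.
process.stdout.write(JSON.stringify(message) + "\n");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;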

&lt;h2&gt;Get It&lt;/h2&gt;

&lt;p&gt;The plugin pack is available on Gumroad with a company template and 10-chapter implementation guide:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://bradytech.gumroad.com/l/ai-companies" rel="noopener noreferrer"&gt;Self-Improving AI Companies - Plugin Pack + Guide&lt;/a&gt;&lt;/strong&gt; - $49&lt;/p&gt;

&lt;p&gt;Happy to answer questions in the comments. I'm deaf, so all communication is text-based.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>agents</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
