If you're running AI agent hierarchies, you've probably noticed the gap: agents complete tasks, but nothing checks if the output is actually good. There's no feedback loop, no auto-retry, and no way to catch performance degradation before it costs you.
I built four plugins for Paperclip AI that add a self-improvement layer to multi-agent setups (they also work with Paperclip-managed OpenClaw agent teams). Here's how the architecture works.
The Problem
A typical agent hierarchy looks like this:
CEO Agent [Opus] - decomposes goals, delegates
CTO Agent [Opus] - makes tech decisions, delegates
Worker Agent [Sonnet] - executes tasks
Tasks flow down. Results flow up. But there's no quality layer. If the Worker produces bad output, the CEO doesn't know until a human checks manually.
The Solution: An Event-Driven Feedback Loop
Four plugins, each handling one part of the loop:
issue.created -> Skill Router assigns tools/agent type
issue.completed -> Performance Tracker logs outcome
-> Self-Correction QA checks output
-> Prompt Evolver stores data for evaluation
-> On degradation: Prompt Evolver proposes new prompt
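The loop above can be sketched as a small event bus that the four plugins subscribe to. This is a minimal illustration of the wiring, not the actual plugin-sdk API; the event names follow the flow diagram, and the handler bodies are placeholders.

```typescript
// Illustrative event bus for the feedback loop; not the real SDK API.
type Event = { type: string; payload: Record<string, unknown> };
type Handler = (e: Event) => void;

class EventBus {
  private handlers = new Map<string, Handler[]>();
  on(type: string, h: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(h);
    this.handlers.set(type, list);
  }
  emit(e: Event): void {
    for (const h of this.handlers.get(e.type) ?? []) h(e);
  }
}

const bus = new EventBus();

// Skill Router intercepts tasks before work begins
bus.on("issue.created", () => { /* assign agent type + tool set */ });

// Performance Tracker logs the outcome, then re-emits structured metadata
bus.on("issue.completed", (e) =>
  bus.emit({ type: "TASK_COMPLETED", payload: e.payload }));

// Self-Correction and Prompt Evolver both consume the tracker's event
bus.on("TASK_COMPLETED", () => { /* QA check; retry or escalate */ });
bus.on("TASK_COMPLETED", () => { /* store outcome for prompt evolution */ });
```

The key design point is that plugins never call each other directly; they only react to events, which is what makes them individually droppable.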
Performance Tracker
Subscribes to issue.completed events. Logs success/failure outcomes per agent, calculates rolling success rates, and emits a TASK_COMPLETED event with structured metadata. When an agent's success rate drops below a configurable threshold, it emits a degradation event.
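A rolling success rate with a degradation threshold can be as simple as a bounded window per agent. This is a sketch under assumed names (`Outcome`, window size, threshold values); the real plugin's configuration surface may differ.

```typescript
// Sketch of rolling success-rate tracking with a degradation threshold.
// Window size and threshold stand in for the plugin's configurable values.
interface Outcome { agentId: string; success: boolean }

class PerformanceTracker {
  private window = 20;      // keep the N most recent outcomes per agent
  private threshold = 0.7;  // below this rate, emit a degradation event
  private history = new Map<string, boolean[]>();

  record(o: Outcome): "ok" | "degraded" {
    const h = this.history.get(o.agentId) ?? [];
    h.push(o.success);
    if (h.length > this.window) h.shift(); // drop the oldest outcome
    this.history.set(o.agentId, h);
    const rate = h.filter(Boolean).length / h.length;
    return rate < this.threshold ? "degraded" : "ok";
  }
}
```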
Self-Correction
Listens for TASK_COMPLETED from Performance Tracker. Runs a QA check on the output. If the check fails:
- Retry up to N times (configurable)
- If still failing, escalate to the supervisor agent
The agent catches its own mistakes before they propagate up the chain.
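The retry-then-escalate policy looks roughly like this. The `qaCheck`, `retryTask`, and `escalate` callbacks are hypothetical host-supplied hooks, not the plugin's actual interface.

```typescript
// Sketch of the retry-then-escalate policy. All callbacks are
// hypothetical hooks supplied by the host, not the real plugin API.
async function selfCorrect(
  output: string,
  qaCheck: (out: string) => Promise<boolean>,
  retryTask: () => Promise<string>,
  escalate: (out: string) => Promise<void>,
  maxRetries = 3, // the configurable N
): Promise<string | null> {
  let current = output;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (await qaCheck(current)) return current; // output passed QA
    if (attempt < maxRetries) current = await retryTask();
  }
  await escalate(current); // still failing: hand off to the supervisor agent
  return null;
}
```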
Skill Router
Intercepts issue.created events. Analyzes the task description and assigns:
- The appropriate agent type (don't send architecture decisions to a Worker)
- The right tool set for the task
This prevents misrouting before work begins.
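As a toy illustration of the routing decision, here is a keyword-based version. The real plugin presumably analyzes the description more robustly (e.g. with a model); the agent types mirror the hierarchy above, and the tool names are made up.

```typescript
// Toy keyword-based routing; the real router's analysis is richer.
// Agent types mirror the CEO/CTO/Worker hierarchy; tool names are invented.
type AgentType = "ceo" | "cto" | "worker";

interface Route { agent: AgentType; tools: string[] }

function routeTask(description: string): Route {
  const d = description.toLowerCase();
  // Architecture decisions go to the CTO, never a Worker
  if (/\b(architecture|stack|framework)\b/.test(d))
    return { agent: "cto", tools: ["design-doc", "decision-record"] };
  // Strategic planning goes to the CEO
  if (/\b(roadmap|hiring|priorit)/.test(d))
    return { agent: "ceo", tools: ["planning"] };
  // Everything else is execution work
  return { agent: "worker", tools: ["code", "shell"] };
}
```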
Prompt Evolver
Accumulates outcome data per agent over time. When Performance Tracker emits a degradation event, Prompt Evolver uses the historical data to propose a rewritten prompt for the underperforming agent.
The agent's instructions improve automatically based on real performance data.
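To make the flow concrete, here is a sketch of how accumulated outcomes could feed a prompt rewrite. In the real plugin an LLM presumably proposes the new prompt; a simple template stands in here so the data flow is visible, and every name is illustrative.

```typescript
// Sketch only: a template stands in for the LLM-driven rewrite.
// Field and function names are illustrative, not the plugin's API.
interface TaskOutcome { success: boolean; failureReason?: string }

function proposePrompt(basePrompt: string, outcomes: TaskOutcome[]): string {
  // Tally the most common failure reasons from historical outcome data
  const counts = new Map<string, number>();
  for (const o of outcomes) {
    if (o.success) continue;
    const r = o.failureReason ?? "unknown";
    counts.set(r, (counts.get(r) ?? 0) + 1);
  }
  const top = [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, 3)
    .map(([reason]) => `- Avoid: ${reason}`);
  // Append the top failure modes as explicit instructions
  return `${basePrompt}\n\nKnown failure modes to avoid:\n${top.join("\n")}`;
}
```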
Technical Details
- Built with @paperclipai/plugin-sdk; each plugin runs as an isolated Node.js child process
- Communication via the JSON-RPC 2.0 protocol
- TypeScript throughout
- 56 tests passing (Vitest)
- Tested against a real 3-agent CEO/CTO/Worker hierarchy
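Since each plugin is a child process speaking JSON-RPC 2.0, the wire format is just framed JSON. Here is a minimal framing sketch; the method name and params are placeholders, not the SDK's actual protocol surface.

```typescript
// Minimal JSON-RPC 2.0 request framing, newline-delimited for stdio.
// Method name and params are placeholders, not the SDK's protocol.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;
function buildRequest(
  method: string,
  params?: Record<string, unknown>,
): string {
  const req: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method, params };
  return JSON.stringify(req) + "\n"; // one message per line
}

// e.g. written to a plugin child process via child.stdin.write(...)
const msg = buildRequest("event.publish", { type: "TASK_COMPLETED" });
```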
The plugins are event-driven and composable. You can use all four together for the full feedback loop, or drop in individual plugins alongside your existing setup.
Get It
The plugin pack is available on Gumroad with a company template and 10-chapter implementation guide:
Self-Improving AI Companies - Plugin Pack + Guide - $49
Happy to answer questions in the comments. I'm deaf, so all communication is text-based.