If you're running AI agent hierarchies, you've probably noticed the gap: agents complete tasks, but nothing checks if the output is actually good. There's no feedback loop, no auto-retry, and no way to catch performance degradation before it costs you.
I built four plugins for Paperclip AI that add a self-improvement layer to multi-agent setups (they also work with Paperclip-managed OpenClaw agent teams). Here's how the architecture works.
The Problem
A typical agent hierarchy looks like this:
CEO Agent [Opus] - decomposes goals, delegates
CTO Agent [Opus] - makes tech decisions, delegates
Worker Agent [Sonnet] - executes tasks
Tasks flow down. Results flow up. But there's no quality layer. If the Worker produces bad output, the CEO doesn't know until a human checks manually.
The Solution: An Event-Driven Feedback Loop
Four plugins, each handling one part of the loop:
issue.created -> Skill Router assigns tools/agent type
issue.completed -> Performance Tracker logs outcome
-> Self-Correction QA checks output
-> Prompt Evolver stores data for evaluation
-> On degradation: Prompt Evolver proposes new prompt
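The loop above can be sketched as a small event bus that the four plugins subscribe to. This is a minimal illustration of the wiring, not the actual plugin-sdk API; the event names follow the flow diagram, and the handler bodies are placeholders.

```typescript
// Illustrative event bus for the feedback loop; not the real SDK API.
type Event = { type: string; payload: Record<string, unknown> };
type Handler = (e: Event) => void;

class EventBus {
  private handlers = new Map<string, Handler[]>();
  on(type: string, h: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(h);
    this.handlers.set(type, list);
  }
  emit(e: Event): void {
    for (const h of this.handlers.get(e.type) ?? []) h(e);
  }
}

const bus = new EventBus();

// Skill Router intercepts tasks before work begins
bus.on("issue.created", () => { /* assign agent type + tool set */ });

// Performance Tracker logs the outcome, then re-emits structured metadata
bus.on("issue.completed", (e) =>
  bus.emit({ type: "TASK_COMPLETED", payload: e.payload }));

// Self-Correction and Prompt Evolver both consume the tracker's event
bus.on("TASK_COMPLETED", () => { /* QA check; retry or escalate */ });
bus.on("TASK_COMPLETED", () => { /* store outcome for prompt evolution */ });
```

The key design point is that plugins never call each other directly; they only react to events, which is what makes them individually droppable.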
Performance Tracker
Subscribes to issue.completed events. Logs success/failure outcomes per agent, calculates rolling success rates, and emits a TASK_COMPLETED event with structured metadata. When an agent's success rate drops below a configurable threshold, it emits a degradation event.
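A rolling success rate with a degradation threshold can be as simple as a bounded window per agent. This is a sketch under assumed names (`Outcome`, window size, threshold values); the real plugin's configuration surface may differ.

```typescript
// Sketch of rolling success-rate tracking with a degradation threshold.
// Window size and threshold stand in for the plugin's configurable values.
interface Outcome { agentId: string; success: boolean }

class PerformanceTracker {
  private window = 20;      // keep the N most recent outcomes per agent
  private threshold = 0.7;  // below this rate, emit a degradation event
  private history = new Map<string, boolean[]>();

  record(o: Outcome): "ok" | "degraded" {
    const h = this.history.get(o.agentId) ?? [];
    h.push(o.success);
    if (h.length > this.window) h.shift(); // drop the oldest outcome
    this.history.set(o.agentId, h);
    const rate = h.filter(Boolean).length / h.length;
    return rate < this.threshold ? "degraded" : "ok";
  }
}
```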
Self-Correction
Listens for TASK_COMPLETED from Performance Tracker. Runs a QA check on the output. If the check fails:
- Retry up to N times (configurable)
- If still failing, escalate to the supervisor agent
The agent catches its own mistakes before they propagate up the chain.
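The retry-then-escalate policy looks roughly like this. The `qaCheck`, `retryTask`, and `escalate` callbacks are hypothetical host-supplied hooks, not the plugin's actual interface.

```typescript
// Sketch of the retry-then-escalate policy. All callbacks are
// hypothetical hooks supplied by the host, not the real plugin API.
async function selfCorrect(
  output: string,
  qaCheck: (out: string) => Promise<boolean>,
  retryTask: () => Promise<string>,
  escalate: (out: string) => Promise<void>,
  maxRetries = 3, // the configurable N
): Promise<string | null> {
  let current = output;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (await qaCheck(current)) return current; // output passed QA
    if (attempt < maxRetries) current = await retryTask();
  }
  await escalate(current); // still failing: hand off to the supervisor agent
  return null;
}
```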
Skill Router
Intercepts issue.created events. Analyzes the task description and assigns:
- The appropriate agent type (don't send architecture decisions to a Worker)
- The right tool set for the task
This prevents misrouting before work begins.
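As a toy illustration of the routing decision, here is a keyword-based version. The real plugin presumably analyzes the description more robustly (e.g. with a model); the agent types mirror the hierarchy above, and the tool names are made up.

```typescript
// Toy keyword-based routing; the real router's analysis is richer.
// Agent types mirror the CEO/CTO/Worker hierarchy; tool names are invented.
type AgentType = "ceo" | "cto" | "worker";

interface Route { agent: AgentType; tools: string[] }

function routeTask(description: string): Route {
  const d = description.toLowerCase();
  // Architecture decisions go to the CTO, never a Worker
  if (/\b(architecture|stack|framework)\b/.test(d))
    return { agent: "cto", tools: ["design-doc", "decision-record"] };
  // Strategic planning goes to the CEO
  if (/\b(roadmap|hiring|priorit)/.test(d))
    return { agent: "ceo", tools: ["planning"] };
  // Everything else is execution work
  return { agent: "worker", tools: ["code", "shell"] };
}
```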
Prompt Evolver
Accumulates outcome data per agent over time. When Performance Tracker emits a degradation event, Prompt Evolver uses the historical data to propose a rewritten prompt for the underperforming agent.
The agent's instructions improve automatically based on real performance data.
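To make the flow concrete, here is a sketch of how accumulated outcomes could feed a prompt rewrite. In the real plugin an LLM presumably proposes the new prompt; a simple template stands in here so the data flow is visible, and every name is illustrative.

```typescript
// Sketch only: a template stands in for the LLM-driven rewrite.
// Field and function names are illustrative, not the plugin's API.
interface TaskOutcome { success: boolean; failureReason?: string }

function proposePrompt(basePrompt: string, outcomes: TaskOutcome[]): string {
  // Tally the most common failure reasons from historical outcome data
  const counts = new Map<string, number>();
  for (const o of outcomes) {
    if (o.success) continue;
    const r = o.failureReason ?? "unknown";
    counts.set(r, (counts.get(r) ?? 0) + 1);
  }
  const top = [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, 3)
    .map(([reason]) => `- Avoid: ${reason}`);
  // Append the top failure modes as explicit instructions
  return `${basePrompt}\n\nKnown failure modes to avoid:\n${top.join("\n")}`;
}
```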
Technical Details
- Built with @paperclipai/plugin-sdk; each plugin runs as an isolated Node.js child process
- Communication via the JSON-RPC 2.0 protocol
- TypeScript throughout
- 56 tests passing (Vitest)
- Tested against a real 3-agent CEO/CTO/Worker hierarchy
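Since each plugin is a child process speaking JSON-RPC 2.0, the wire format is just framed JSON. Here is a minimal framing sketch; the method name and params are placeholders, not the SDK's actual protocol surface.

```typescript
// Minimal JSON-RPC 2.0 request framing, newline-delimited for stdio.
// Method name and params are placeholders, not the SDK's protocol.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;
function buildRequest(
  method: string,
  params?: Record<string, unknown>,
): string {
  const req: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method, params };
  return JSON.stringify(req) + "\n"; // one message per line
}

// e.g. written to a plugin child process via child.stdin.write(...)
const msg = buildRequest("event.publish", { type: "TASK_COMPLETED" });
```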
The plugins are event-driven and composable. You can use all four together for the full feedback loop, or drop in individual plugins alongside your existing setup.
Get It
The plugin pack is available on Gumroad with a company template and 10-chapter implementation guide:
Self-Improving AI Companies - Plugin Pack + Guide - $49
Happy to answer questions in the comments. I'm deaf, so all communication is text-based.