I Built a Self-Improving AI Agent. Here Is What Made It Learn.

#ai #agents #selfimprovement #buildinpublic

Six months ago I started building what I'd call a self-improving AI agent. Not in the academic sense, not with RL loops. Just a practical personal system: my daily driver agent that observes its own mistakes, files them, and by the next morning has already learned from them.

The layer I kept getting wrong was memory. Every time something went sideways, my instinct was to add another memory file. Longer context. More rules. More notes to self.

It did not work. Not because memory is wrong, but because "memory" is one word covering four completely different jobs. Once I separated them, the improvement started showing up in the data.

Here's what's inside this post:

The corrections loop (capture, classify, graduate): The three-stage pipeline I built to ensure no correction ever expires unaddressed. Capture is a single Python call. Classify is a regex map of 7 patterns into 6 kinds. Graduate is a nightly job that picks the right file. The whole thing costs nothing if a session is under pressure -- which matters more than it sounds.

Four memory sinks: Working memory that decays. Lessons that read like engineering postmortems. Per-rule feedback files (one rule, one file, linkable and deduplicatable). Always-loaded top-level RULE lines the agent wakes up next to. Different lifespans, different files, different decay curves.

A Basecamp card for every correction: The human-in-the-loop surface that keeps the model from grading its own homework in private. Every correction lands as a card I can read, push back on, fold into another, or trash. Corrections become reviewable. That is the part I would build first if starting over.

The actual numbers (30-day window): 22 corrections in 30 days, trending to 18 in the last 7. 93.5% task success rate. The analyzer flagged 2 repeating themes -- both of which proposed rules that have already graduated to always-loaded RULE lines. The loop is closing on itself in data I can read off the file.

What would break it: Letting the model grade its own corrections in private. Yohei Nakajima wrote the clearest version of this risk. The failure mode is the model hallucinating bad reflections and reinforcing them. The Behavioral Learning card table is my only guardrail against that.

If you have ever made the same correction to your agent twice, this is the post for it.

Full post: https://thoughts.jock.pl/p/i-built-a-self-improving-ai-agent

Free weekly: https://thoughts.jock.pl