Mirza Iqbal

Posted on Jun 2

The agent setup that updates itself. What self-evolving actually means in practice

#ai #claudecode #agents #automation

[detected]  a library you depend on shipped a breaking change overnight
[adopted]   the old pattern is blocked, the new default is already live

I have not touched a hook in three weeks.

My setup kept getting better anyway.

Most demos of self improving agents are theater.

A model rewrites its own prompt on stage, the room claps, and none of it survives contact with a real codebase on a Tuesday.

Real self evolution is boring.

Forget the agent rewriting itself while you watch. A useful one notices that the world changed, writes down what it learned, and never relearns the same thing twice.

That distinction is everything.

Why your agent config rots

Every agent setup starts decaying the day after you build it.

A tool you depend on ships a new version. A default flips. Some model you call gets retired. One rule you wrote in good faith quietly starts lying, because the thing it described no longer exists.

You do not notice on the day it happens.

You notice three weeks later, when something breaks in front of a customer, and the cause turns out to be a rule that was right when you wrote it and wrong every day since.

Nobody puts that part in the demo.

An agent setup rarely dies in a dramatic crash. Slow rot kills it, and you only feel the wound when the bill comes due.

A loop that earns its keep

My production version has four moves, and none of them are clever.

Detect. Every day it checks the sources it depends on for change. Release notes, version numbers, security advisories, the docs of the tools it leans on.

Classify. For each change it asks one question. Can I use this, does this break me, or do I ignore it.

Codify. Everyone skips this move. Any change worth keeping turns into a durable rule, or a guard that fires on its own from then on. One off fixes you forget by Friday do not count.

Propagate. Then it reaches every place it applies, and the same correction never costs me a second time.

Everything turns on the third move.

Most people patch the symptom in the moment and move on. Their patch lives in someone's head, or in one file, and the next time the same class of problem appears they pay again.

Capturing the correction, rather than only applying it, is the gap between a tool that ages and a tool that builds on itself.

One discipline nobody mentions

Here is the opinion that gets me in trouble.

Such a setup must never rewrite its own rules without a human in the loop.

Let frequency alone decide what becomes permanent and you get garbage. It codifies noise, over generalizes from a single bad day, and contradicts itself across files inside a month.

So it detects and proposes, then waits.

Someone confirms the rule before it becomes law.

That one constraint separates a setup that sharpens over months from one that rots into a pile of half true instructions nobody trusts.

Captured judgment beats autonomy every time.

Where it almost broke

My first version of this loop nearly killed itself.

Detection worked fine. It found changes every day and queued them for review.

Then the queue grew. And grew. Findings piled up faster than anyone drained them.

Run a detection loop that feeds a queue nobody empties, and you end up worse off than with no loop at all, because now you own the illusion of a setup that learns while it quietly stops learning.

More detection would not help here. Three guardrails on the queue did. Findings age out if they sit too long. It screams when it grows past a cap. Stale items archive themselves instead of rotting in place.

A learning system that cannot forget stops learning. It hoards.

What this writeup does not give you

I run a working version of this loop. Its wiring, its accumulated rules, and the playbook that maps a change class to the right kind of guard are what I bring to client work.

Here is the honest reason this post does not paste them.

What I bring is accumulated judgment about which changes deserve a permanent rule and which are noise. That only arrives after months of running the loop against real failures, and it does not fit in a snippet.

You can build the four move loop from this description.

Keeping it useful is the hard part. Letting it grow into the hoarder above is the easy mistake.

Your turn

I have walked enterprise teams through this exact shape, usually from one question. What in your agent setup do you re fix every single month.

That answer is almost always the thing that should have captured itself the first time.

So tell me yours.

What correction do you keep making to your AI tooling that never sticks. Drop it below and I will tell you whether it is a rule, a guard, or a thing to let go of.