I Stopped Writing Better Prompts and Started Counting What My Skills Couple To

#ai #productivity #programming #discuss

Prompts rot. Captured failures compound. Most of the AI skills you are building are mostly prompt, which is why most of them will not survive the year.

Not because the prompts are bad. A skill's value is maybe twenty percent instruction and eighty percent scar tissue, and only that second part lasts. The instruction rots the moment the thing it describes moves. Encode how your team deploys and it works until the pipeline changes. Then you are debugging a prompt at 2am, with less to go on than if you had written the script yourself.

So before you build another one, stop asking whether the prompt is good. Ask what the skill is holding onto, and whether that thing sits still.

A skill rots at the speed of what it touches

A skill rots in proportion to how tightly it is coupled to things that move. Generic scaffolding leans on stable ground like a language or a convention, so it ages slowly. Domain logic wired to a codebase that gets refactored every quarter ages fast, no matter how good the prompt is.

The difference is the dependency count. "Write a unit test in this style" depends on a language and a convention. Both barely move. It keeps working for years because nothing under it shifts.

Real company-specific procedure is the opposite. File layouts. Service contracts. The one edge case in the billing flow. Each detail you pack in is a thread tied to something that gets refactored. Pack in enough of them and the skill is not a tool anymore. It is a liability with good intentions, and it fails silently, because a stale prompt does not throw. It quietly does the wrong thing.

That is what the skill-library pitch gets backwards. Volume is not value. A hundred skills wired to a moving codebase is a hundred things to maintain.

The only part that compounds is the scar

One part of a skill does not rot. The captured failure.

The five-line check you added after a model confidently reported a 41 percent dividend yield. The retry that refuses to fire twice so a flaky webhook cannot double-charge anyone. The guardrail you wrote only because production taught you, the expensive way, that you needed it.

None of that is prompting. Each one is a bug you paid for once and encoded so you never pay again. A prompt that says "always check the yield" rots the moment attention drifts. A five-line script that checks it and fails the run does not. The model reads the verdict; it is not trusted to re-derive the rule. Instructions ask the model to behave. Captured failures make the misbehavior impossible to ship.

That is also why they outlast the model. The failure modes of reality do not expire. Rate limits at 2am, malformed payloads, the off-by-one nobody catches in review. Those keep happening, to every version of the model, forever. A check against them is worth more next year than it is today.

That eighty percent is the only part worth carrying to the next model. The rest you rewrite every time the ground moves.

You can predict the rot before you write it

A skill's future is readable before the first line exists.

For each one, ask two questions:

What does this couple to, and how often does that move?
Which lines are captured failures, and which are decoration that makes the skill look thorough?
Which rules here would survive the model forgetting them? If a rule lives only in the prompt, it rots with the prompt. If a deterministic check enforces it, it compounds.

Then build to the answer. Keep the volatile coupling thin and swappable, so when the pipeline changes you edit one line instead of rereading the whole skill. Let the captured failures accumulate, because that is the part that pays rent. A skill built this way ages in reverse. It gets more useful as it collects more scars.

The skills worth keeping are not the clever ones. They are the ones that remember what broke. The engineering was never in the prompt. It was in the failures you bothered to capture.

Open the skill you reach for most. How much of it is instruction the model could half-guess on its own, and how much is a check that fails the run without asking the model's permission? Which half will still be true after the next model upgrade?

I write field notes from real builds, AI integration, cron-driven automation, and the parts that break in production. New posts every two weeks; if this one was useful, the agent playbook is the companion download.