Capturing review lessons: how I stopped my feedback loop from depending on my memory

#ai #technicalwriting #documentation #devrel

For context: in the previous piece, I took the review comments I kept repeating and turned them into mechanical scans, so the agent stops asking me to catch the same passive-voice slip for the hundredth time. That piece, and the one before it, were the same idea from two angles: move verification out of the agent's head and onto disk, where I can check it without taking the agent's word for anything.

But I ended checkbox theater on a confession. There was one part of the system that still ran on me remembering to run it. This piece is about closing that gap. It is the last of three pieces on gating, and unlike the other two, it has an actual ending.

The thread I left hanging

Here is what I admitted last time. After every review there is a step where you look at what the reviewers caught that your gates didn't, and you write down the lesson so the gate catches it next time. I had that step. I had built the log it writes to. The mechanism that feeds those lessons back into the next review worked.

The problem was the trigger. The "now go look at the reviewer comments and write down what we missed" part was mine. I had to prompt the agent to do it. If I forgot, which I did, the loop didn't run, and the lessons that should have hardened into checks just evaporated.

So I had a feedback loop with a single point of failure, and the single point of failure was a person. Me. The least reliable component in any system I have ever built.

What the gap log actually is

Before the fix, the mechanism. The gap log is one file, append-only, one line per gap:

2026-04-22 | PR-1247 | clarity      | "this powerful feature" not caught            | resolved
2026-05-12 | PR-1268 | completeness | nav entry missing for new section            | mechanized
2026-05-18 | PR-1284 | technical    | dependency claim not cross-checked vs source | open

Three statuses carry the whole thing. open means the gap is logged but nothing automated catches it yet, so the next review reads it and injects it as an extra check. mechanized means a scan or a hook now catches the pattern on its own, and the line can go quiet. resolved means the underlying problem is gone, often because something upstream changed, and no check is needed anymore.

The format is boring on purpose. The value is in what reads it: before the next review starts, the pre-flight step opens this file and turns every open line into an additional check on its dimension. A gap that surfaced once keeps showing up in every subsequent review until it becomes a script or it becomes irrelevant.

That is the part I am proud of. It converts a one-time lesson into infrastructure. The thing I learned reviewing PR-1284 doesn't sit in a Slack message I will lose by Thursday. It sits in the log, and the log gets read.

The part that was still me

And yet. The capture was manual. After a review, I had to be the one to say "walk the comments, log the gaps." The log got written, the injection worked, the lessons carried forward, all of it, but only after I remembered to pull the trigger. A loop that depends on me remembering is not a loop. It is a to-do item wearing a loop costume, and I will eventually drop it.

Cutting myself out

Two moves fixed it.

The first was to stop waiting on me. I had been treating gap capture as something the agent does and then pauses on: find the gap, log it as open, show me, let me decide whether to build the check. That is two places for me to be the bottleneck. Getting asked, and approving. I removed both. The rule now is that for every gap, the agent builds the smallest deterministic check it can, in the same session, and reports it afterward. It does not log the gap and wait. It does not present the analysis and ask permission. It does the work and then tells me what it did.

Not every gap survives that. Some need judgment a regex cannot fake, version-aware reasoning, the difference between a reader and an actor in a sentence. Those still get logged open, with a recorded reason for why a dumb scanner would do more harm than good. The default flipped, though. It used to be "log it and wait for John." Now it is "mechanize it, or justify why you can't."

The second move was the trigger that replaced my memory. Every review now records its findings to a small ledger, pass by pass. When a later pass surfaces a finding that no earlier pass recorded, that is a leak, and the check that detects it exits nonzero. A nonzero exit is the signal to run the whole routine: root cause, re-audit the artifact, close the gap by strengthening the gate. I do not have to notice the miss anymore. A thing the first pass should have caught showing up in the second pass is, mechanically and exactly, what "we missed something" looks like. So the script notices, and the script does not get tired or distracted or called into a meeting.

It normalizes the findings before it compares them, so a shifted line number doesn't fool it. Same issue, different line, still a match.

What it looks like when it works

Three real cases, because the abstract version never lands.

A British spelling slipped through. The automated reviewers passed it, the nine scans in my style gate passed it, and the human reviewers who spotted it held back on purpose, to see whether the gate would flag it on its own. It didn't. There was no spelling scan, so "honour" cleared all nine, and the miss went unactioned until I ran an explicit American-English pass days later. The gap got logged, and inside the same session it became a curated British-spelling scan, tuned to fire on the real offenders and stay silent on the words that only look British. Now it catches the next one before it reaches a human at all.

The same page lived in several parallel versions. A wording fix landed in some of the copies and not the others, so two of them kept the wrong term and nobody noticed they had drifted apart. The lesson became a parity scan: when one copy of a shared page changes, diff the changed lines against its siblings and flag the ones that no longer match. Verified against the exact case that produced it, zero false positives.

A config key got documented that did not exist anywhere in the source. Plausible-looking, well-formatted, completely invented. The fix was a zero-hit search: take every identifier the docs claim is real, search the actual source for it, and flag the ones that come back with nothing. The agent can no longer make up a key and have it sail through, because the gate now asks the source whether the key exists before believing the prose.

Each of those started as a single embarrassing miss and ended as a check that runs forever. That is the entire point of the log. The miss is one-time. The check is permanent.

The principle

Here is the spine of all three pieces, finally connected. Checkbox theater said: do not trust the agent's claim, check the artifact it produced. Mechanical pattern gates said: take the note you keep repeating and turn it into a scan. Both move verification out of the agent's control. This piece moves the improvement out of mine.

Because that was the gap I could not see for too long. I had made the agent's verification mechanical and left the system's self-improvement running on a human prompt, and the human was me. A feedback loop that fires only when I remember to fire it is checkbox theater one level up. The box that said "lessons captured" was getting checked by me deciding I had done it, with nothing on disk proving I had. Same failure mode, one floor higher, harder to see because it was wearing the costume of a process that worked.

The capture has to be as mechanical as the checks it produces. Otherwise the weakest gate in the whole system is the one whose entire job is to make the others stronger.

Where it actually stands

The loop runs without me now. Gaps get caught, the mechanizable ones become checks in the same session, the rest sit logged with a reason, and the leak detector trips when something slips between passes. I am still in it, but as the person who reads what it did, not the person it depends on to remember to do anything.

Plenty is still open, and I am fine with that. The log has open lines I have not cracked because they need judgment I cannot cheaply encode: version-aware checks, telling a reader apart from an actor, the cases where a blunt scanner would flag more good than bad. "Open with a reason" is an honest state to leave something in. The goal was never zero open gaps. It was to stop losing the lessons between the moment I learned them and the moment I forgot.

So that is the gating arc. Three pieces, one idea: the agent does the work, but the work it does, and the work of getting better at the work, both live somewhere I can point to instead of somewhere I have to vouch for. If you are building anything where the same system does the work, audits the work, and improves the work, I would put to you the question it took me three articles to answer for myself. Which of those three still depends on you remembering? That is the one to mechanize next. I would like to hear what you find when you go looking.

I write about AI-assisted documentation workflows, developer experience, and the evolving role of technical writers. If any of this resonates, let's connect on LinkedIn.