Patrick Hughes

Posted on • Originally published at bmdpat.com

When Your Blog Repair Loop Fails 23 Times, Stop Repairing

This morning my brain checker flagged a missed blog post. The publisher had nothing approved. The repair loop had been chewing on the same draft for more than three weeks. 23 attempts. Status: blocked. Reason: "1 paragraph still exceeds the 900-char scan limit."

The draft was a devlog from May 10. The dates inside it said "this week" and "this week's." That was three weeks ago. The repair agent kept trying to fix the paragraph length and never once asked the obvious question: should we still be shipping this?

That is the lesson. I want to write it down before I forget it.

What the loop was doing

The blog repair loop runs at 08:55 CT every morning. It reads the QA reviewer's notes from 08:30, opens the draft, and tries to fix the named issues. If it fails, it tries again tomorrow. It is patient. It is dumb.

For 23 days it tried to split one paragraph under 900 characters. The QA reviewer also asked for absolute dates and an infographic. The repair loop could not rewrite the relative dates because the post was about a specific past week. New "absolute" dates would have made it sound like a museum exhibit. So it picked the paragraph length and chewed on that.

What was actually broken

The draft was not salvageable. Not for a mechanical reason. Because it was old.

A devlog about a week that ended three weeks ago is not a missing-infographic problem. It is a content problem. No amount of paragraph splitting fixes that. The repair loop had no concept of "this draft has gone stale, archive it." So it kept the draft in Queue forever and reported "blocked" every morning. Reporting blocked feels like work. It is not.

The heal path

The brain checker has a separate path for this. If no post went live today and Review is empty, it calls /think-heal-blog. Heal does one thing: ship one good post today. It is allowed to look at Queue, but it is also allowed to ignore it. If the queue is full of unsalvageable drafts, heal writes fresh.
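
The trigger condition is small enough to sketch. This is not the real brain checker. BlogState, its fields, and should_heal are hypothetical names made up for illustration. The only part taken from the setup above is the rule itself: no post live today and Review empty.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BlogState:
    """Hypothetical snapshot of the pipeline; field names are assumptions."""
    last_published: Optional[date]  # date of the most recent live post, if any
    review_count: int               # approved drafts waiting in Review


def should_heal(state: BlogState, today: date) -> bool:
    """True when nothing shipped today and nothing is approved to ship."""
    shipped_today = state.last_published == today
    return not shipped_today and state.review_count == 0
```

Note what the check does not read: Queue and repair_attempts. The heal decision depends only on whether something shipped, so a stuck repair loop cannot block it.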

This is the post heal wrote. The Queue draft is still sitting there. I will archive it manually after this ships.

The pattern

Automation that retries forever lies to you. Every morning the repair status said "blocked, repair_attempts: 23." That is a green-shaped red. The runner fired. The status updated. The report wrote a line. Nothing shipped.

The fix is not a smarter repair loop. The fix is a TTL on drafts and a heal path that bypasses the loop when the loop is stuck. Two structural pieces. Neither one tries to be clever.

For agent systems generally: if a step can fail in a way the next step does not notice, you need a different step. Not a better version of the same one. Same lesson as the silent success trap and the cron-jobs-lie outcome checker. The theme keeps showing up because it is the actual hard part of running agents in production. They will tell you everything is fine right up to the moment you check.
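
One way to picture an outcome check, as a rough sketch only. RunReport and its fields are hypothetical, not the actual report format. The idea is to grade the morning by what shipped, not by whether the runner fired.

```python
from dataclasses import dataclass


@dataclass
class RunReport:
    """Hypothetical daily report; every field name here is an assumption."""
    runner_fired: bool
    status: str
    repair_attempts: int
    posts_shipped: int


def grade(report: RunReport) -> str:
    """Grade by outcome: a run that shipped nothing is red, however busy it was."""
    if report.posts_shipped > 0:
        return "green"
    if report.runner_fired:
        return "red: ran, shipped nothing"
    return "red: did not run"
```

Under this grading, a report with status "blocked" and repair_attempts 23 comes back red on day one, not on day 23.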

What I am changing today

Three things:

  1. Drafts in Queue get a 7-day TTL. Older than that, they go to an archive folder. The repair loop stops touching them.
  2. The QA reviewer gets a "rescue not worth it" verdict. If the changes needed are not mechanical, the reviewer says so and the draft skips the repair loop and lands in archive.
  3. The heal path stays. Today it earned its keep.
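
The first two changes boil down to one routing decision per draft. A minimal sketch, assuming a draft carries a creation date and the reviewer's verdict. The 7-day TTL and the "rescue not worth it" verdict come from the list above. The enum, the function name, and the "archive"/"repair" return values are hypothetical.

```python
from datetime import date, timedelta
from enum import Enum

DRAFT_TTL = timedelta(days=7)  # 7-day TTL, per change 1 above


class Verdict(Enum):
    """Hypothetical QA verdicts; only the second one is named in the post."""
    MECHANICAL_FIX = "mechanical_fix"
    RESCUE_NOT_WORTH_IT = "rescue_not_worth_it"


def route_draft(created: date, verdict: Verdict, today: date) -> str:
    """Decide where a Queue draft goes before the repair loop ever sees it."""
    if today - created > DRAFT_TTL:
        return "archive"  # stale: older than the TTL
    if verdict is Verdict.RESCUE_NOT_WORTH_IT:
        return "archive"  # fresh, but the needed changes are not mechanical
    return "repair"       # fresh and mechanically fixable
```

The age check runs first on purpose. A stale draft goes to archive even if the reviewer thinks the fix is mechanical, which is the case the old loop got stuck on.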

If you are building agents and you have any loop that retries, ask what it does when retrying is wrong. Then add the escape hatch before you need it. I needed mine three weeks ago.

If you want hard budget limits and loop guards for coding agents, start with AgentGuard: https://bmdpat.com/tools/agentguard
