Discussion on: Building a Real AI Harness: Auto-Reviewed PRs, Self-Healing Ops, and Non-Engineer Contributors (Series Intro)

View post

Big concern from experience: self‑healing can create a “heal and hide” pattern where the same issue recurs and is silently fixed without anyone noticing the root cause is still rotting. Do you pair auto‑healing with mandatory root‑cause logging, or does the system just patch and move on? This is where AI goes from toy to force multiplier. Auto‑reviewed PRs and self‑healing ops aren’t sci‑fi anymore; they’re team‑extenders. I like that you even thought about non‑engineer contributors — that’s where real org‑wide impact lies.

What if the self‑healing steps were always proposed as a “suggested action” with a 5‑minute delay, and only auto‑execute if no human overrides? That gives you a semi‑autonomous system that teaches rather than hides.

Ryosuke Tsuji • May 13

Heal-and-hide is the right anti-pattern to name. The way we avoid it: every auto-fix PR carries 3 artifacts beyond the code change.

The fix itself
A pattern doc entry (problem, solution, code example, checklist)
A 1-line invariant added to the agent's persistent rules file, so the next agent run already has the pattern in context

Concrete example from a recent fix: a Cloud Run deploy failed because secretKeyRef version:latest was pointing at a Secret with no versions. The auto-fix PR added the placeholder SecretVersion to unblock the deploy, and the same PR added a 40-line entry to our cloud-run-deploy guideline (problem, solution, code, checklist) plus a 1-line invariant to the agent's rules file. The next agent that touches a Cloud Run service with secretKeyRef sees the pattern before it codes.

Patch-and-move-on is a real risk, agreed. Our answer is making the docs and invariant update part of the PR scope itself. If the agent doesn't generate the learning artifact, the PR isn't ready for review.

On your 5-min delay suggestion: we use the code review window (auto-review agent + human approval) as the override surface instead of a time window. Different trade on the gate, same goal.

HARD IN SOFT OUT • May 13

Solid pattern. Baking the learning artifact into the same PR scope closes the loop cleanly — fix, doc, invariant, all in one atomic commit. That's not self-healing anymore, that's self-documenting TDD for ops.

One edge I'd watch: invariant collision. As the rules file grows, two invariants can silently contradict under a narrow condition. Do you have a reconciliation mechanism, or does the review surface catch that before it commits? Curious if you've considered a periodic "rule set lint" pass — treat invariants like a test suite that can be checked for internal consistency.

Ryosuke Tsuji • May 13

Nice catch. That gap was missing.

Implementing it is cheap in our setup: every doc (including the review guidelines) is ingested into the knowledge graph and the agent reaches it via MCP at any time.
Just opened a PR to add rule set lint to the review guideline. Thanks!