Discussion on: Code Review When Half the Diffs Are From Agents


This is the review problem of the moment. When half your diffs are agent-generated, the bottleneck moves entirely to review, and human review doesn't scale to the rate at which agents produce code. The trap is rubber-stamping plausible-looking diffs because there are too many to scrutinize, which is exactly how subtle bugs ship. What's worked for me: make the agent prove its own diff first (it builds, the tests pass, and it integrates with the existing code) so human review is the second gate, not the only one. Review the intent and the risky parts, and let automation verify the mechanical ones. That generate-then-self-verify split is the core of how Moonshift ships. How's your team adapting: more automated gates, or just more eyes?
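A minimal sketch of that self-verify gate, assuming each mechanical check can be expressed as a command whose exit code signals pass or fail. The names `GateResult`, `run_gate`, and `self_verify` are hypothetical and are not Moonshift's implementation. Checks run in order and stop at the first failure, so only a diff that clears every gate is handed to a human reviewer.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class GateResult:
    """Outcome of one mechanical check (hypothetical structure)."""
    name: str
    passed: bool
    output: str


def run_gate(name: str, cmd: list[str]) -> GateResult:
    """Run one check; exit code 0 counts as a pass."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as exc:
        # A missing tool is treated as a failed gate, not a crash.
        return GateResult(name, False, str(exc))
    return GateResult(name, proc.returncode == 0, proc.stdout + proc.stderr)


def self_verify(gates: dict[str, list[str]]) -> tuple[bool, list[GateResult]]:
    """Run gates in insertion order, stopping at the first failure.

    Returns (ready_for_human_review, results_of_the_gates_that_ran).
    """
    results: list[GateResult] = []
    for name, cmd in gates.items():
        result = run_gate(name, cmd)
        results.append(result)
        if not result.passed:
            return False, results
    return True, results
```

For example, `self_verify({"build": ["python", "-m", "compileall", "-q", "src"], "tests": ["python", "-m", "pytest", "-q"]})` would byte-compile a hypothetical `src` directory and then run pytest (assumed to be installed). Whether the diff actually wires into the existing code still depends on the integration tests a project already has. Stopping early keeps the loop fast for the agent; running every gate instead would give a fuller failure report.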

Ian Johnson • May 31

We’ve been investing heavily in automated gates to keep as much as possible machine-checkable. This takes pressure off the reviewer: since the checks already pass, the engineer can focus on subtler structural and design issues. If we can make something a machine check, we do. Then the reviewer’s job is only the concerns that genuinely need a human: Is this the right abstraction? Is information leaking? Is the design convoluted? Are the business rules implemented correctly? Will this scale under real traffic? Human intuition works best there, not on “I ran the unit tests.”