I Woke Up to 14 PRs I Didn't Write

#ai #programming #productivity #discuss

I woke up to 14 pull requests I didn't write. My AI agent had been churning since midnight, refactoring a module I'd been steering clear of for weeks. Most of it was good. Some of it was wrong in ways I wouldn't have caught without coffee.

This is happening more and more. Karpathy's Autoresearch project spends the night spinning up and tearing down hundreds of ML experiments. Claude Code now has built-in checkpointing, so an agent's bad edits can be rewound. People are literally handing their terminals tomorrow's work before they go to bed.

But here's what no one talks about: the morning review problem.

Reviewing autonomous work is archaeology

When you write the code, you understand every decision. When someone else writes it, you can follow their logic through the diff. When an agent writes 47 commits while you sleep, you're doing archaeology: reverse-engineering intent from output.

This morning I spent an hour and a half reviewing 6 hours of my agent's work. On paper that's a 4x win (6 / 1.5), but only if the work is correct.

And the review was harder than writing the code myself would have been. I wasn't just sanity-checking whether it works. I was evaluating whether it holds up as engineering over the long term:

→ Does it match our abstractions?
→ Would a human engineer make the same tweaks?
→ Am I unknowingly perpetuating technical debt?

The tooling gap

Our tools aren't built for this yet. Git blame is useless when every line traces back to the same agent. PR descriptions tell you what changed, not why the agent picked that approach over the alternatives. You can't replay the author's decision tree.

What actually helps

A few things have helped:

→ Extremely detailed constraints for the agent rather than loosely defined goals
→ A pre-commit hook that runs the full test suite and rejects failing commits, so you're only reviewing green code
→ Handcuffing it to a single file rather than letting it spider out across twenty
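
The last two items can be combined into one Git hook. Here's a minimal sketch, not my exact setup. It's written as a pre-commit hook because that's the hook whose non-zero exit actually blocks a commit; a post-commit hook runs too late to stop one. The allowed path and the pytest command are assumptions for illustration, so swap in your own.

```python
#!/usr/bin/env python3
"""Sketch of a Git pre-commit hook that keeps an agent's commits scoped and green."""
import subprocess
import sys

# Hypothetical: the one file the agent is allowed to touch in this session.
ALLOWED_PATHS = {"src/legacy_module.py"}
# Assumption: the project's full test suite runs under pytest.
TEST_COMMAND = [sys.executable, "-m", "pytest", "-q"]


def staged_files():
    """Return the paths staged for the commit that is about to be created."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def out_of_scope(paths, allowed):
    """Return the staged paths that fall outside the allowed set, sorted."""
    return sorted(p for p in paths if p not in allowed)


def run_test_suite():
    """Run the test command and return its exit code (0 means green)."""
    return subprocess.run(TEST_COMMAND).returncode


def gate(paths, allowed, run_tests):
    """Return 0 to let the commit through, non-zero to reject it."""
    stray = out_of_scope(paths, allowed)
    if stray:
        print("Out-of-scope files staged: " + ", ".join(stray), file=sys.stderr)
        return 1
    # Only run the (slow) test suite once the scope check has passed.
    return run_tests()


def main():
    return gate(staged_files(), ALLOWED_PATHS, run_test_suite)
```

To try it, save it as `.git/hooks/pre-commit`, make it executable, and end the file with `sys.exit(main())`. The scope check runs first because it's cheap; the test suite only runs on commits that stayed inside the fence.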

The real shift

My money is still on this: the big shift isn't AI writing code. We had that two years ago. It's AI writing code uninterrupted for hours. That changes my work as a developer from writing to reviewing, and reviewing autonomous work is a fundamentally different skill from reviewing a human's. 🚀

What's your overnight agent setup? Or are you still doing everything synchronously?