AI-generated code is showing up in pull requests everywhere. Whether it's from Copilot, Claude, or a teammate who used ChatGPT — you need a review strategy that catches the specific failure modes AI code tends to have.
Here's the checklist I use. It's different from how I review human-written code.
The Problem With Default Reviews
Human code has human failure patterns: inconsistent naming, forgotten edge cases, copy-paste errors. You know what to look for because you've made the same mistakes.
AI code has different failure patterns:
- Plausible but wrong logic — it looks correct on first read but handles an edge case backwards
- Hallucinated APIs — function calls to methods that don't exist in your version of the library
- Over-engineering — adds abstraction layers nobody asked for
- Silent behavior changes — refactors adjacent code that wasn't part of the task
- Missing error handling — the happy path works perfectly; everything else crashes
Your review process needs to target these specifically.
The Checklist
1. Scope Check (30 seconds)
Question: Does this PR change only what was asked?
AI loves to "improve" things it wasn't asked to touch. Check the file list first. If the PR was "add input validation to the signup form" and it also refactored the auth middleware — that's a red flag.
Action: If extra files are modified, ask why, or request that the unrelated changes be split out into a separate PR.
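The scope check can be partly mechanized. Here's a minimal sketch that flags changed files outside the PR's expected area, assuming you already have the changed file list (e.g. from `git diff --name-only main...HEAD`); the paths and the allowlist are hypothetical examples:

```python
# Flag files in a PR that fall outside the area the task actually touches.
# The prefixes and file paths below are made up for illustration.

def out_of_scope(changed_files, expected_prefixes):
    """Return changed files outside every expected path prefix."""
    return [f for f in changed_files
            if not any(f.startswith(p) for p in expected_prefixes)]

changed = [
    "src/signup/form.py",
    "src/signup/validate.py",
    "src/auth/middleware.py",   # the unrequested auth refactor
]
flagged = out_of_scope(changed, ["src/signup/"])
# flagged -> ["src/auth/middleware.py"]
```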
2. API Verification (2 minutes)
Question: Do all imported modules, functions, and methods actually exist?
This is the #1 AI-specific failure mode. The code will import `parseAsync` from a library that only exports `parse`. It'll call `response.json({ strict: true })` with an option that doesn't exist.
Action: For any import or method call you don't recognize, check the docs. Don't trust that it exists just because the code looks confident.
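A thirty-second REPL check often beats reading the docs. This sketch uses the real standard-library `json` module, but the pattern applies to any module the PR imports; `loadAsync` is a deliberately invented, plausible-sounding name:

```python
# Quick existence check for a method before trusting AI-generated code
# that calls it. json.loads is real; json.loadAsync is invented.
import json

assert hasattr(json, "loads")           # real function: passes
assert not hasattr(json, "loadAsync")   # hallucinated name: doesn't exist
```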
3. Edge Case Audit (3 minutes)
Question: What happens with empty input? Null? A list with one item? A string with unicode?
AI-generated code almost always handles the happy path correctly. The bugs live in the edges.
Action: Pick the three most likely edge cases for this code and mentally trace through them. If the code doesn't handle them, flag it.
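Here's what that mental trace looks like on a small hypothetical helper. The happy path is fine; empty input was never considered:

```python
# A hypothetical AI-generated helper: correct on the happy path,
# broken on the empty-input edge case.

def average_word_length(text):
    words = text.split()
    return sum(len(w) for w in words) / len(words)  # ZeroDivisionError on ""

average_word_length("hello world")   # 5.0 - happy path works
# average_word_length("")            # crashes: empty input unhandled
# average_word_length("café")        # 4.0 - unicode happens to be fine here
```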
4. Error Path Trace (2 minutes)
Question: What happens when the network call fails? When the file doesn't exist? When the database is down?
Look for try/catch blocks. If there aren't any around I/O operations, that's a bug. If there are, check what happens in the catch — swallowing errors silently is a common AI pattern.
Action: Every I/O call should have explicit error handling. "Log and rethrow" is fine. Silence is not.
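For reference, this is the shape of "log and rethrow" in Python. `fetch_profile` is a stand-in stub for a real network call, so this is a sketch of the pattern, not production code:

```python
# "Log and rethrow": record context in the except block, then re-raise
# so the caller can decide how to recover. Silently swallowing the
# exception here would hide the failure.
import logging

logger = logging.getLogger(__name__)

def fetch_profile(user_id):
    # Stand-in for a real network call; always fails for this sketch.
    raise ConnectionError("upstream down")

def load_profile(user_id):
    try:
        return fetch_profile(user_id)
    except ConnectionError:
        logger.exception("profile fetch failed for user %s", user_id)
        raise  # rethrow - don't swallow the error
```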
5. Test Coverage Check (1 minute)
Question: Are there tests? Do they test failure cases, not just success?
AI-generated tests tend to test that the function returns the right thing when given perfect input. That's the least valuable test. Look for tests that cover: bad input, missing data, network failures, concurrent access.
Action: If the PR has no tests, request them. If it has only happy-path tests, request edge-case tests.
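To make the distinction concrete, here's a happy-path test next to the failure-case tests this step asks for, using a hypothetical `parse_age` helper:

```python
# Hypothetical validation helper plus both kinds of test.

def parse_age(value):
    age = int(value)  # raises ValueError on non-numeric input
    if not 0 <= age <= 150:
        raise ValueError(f"age out of range: {age}")
    return age

# Happy-path test (what AI-generated tests usually cover):
assert parse_age("42") == 42

# Failure-case tests (the ones to look for in review):
for bad in ["", "forty", "-1", "999"]:
    try:
        parse_age(bad)
        assert False, f"expected rejection of {bad!r}"
    except ValueError:
        pass
```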
6. Dependency Check (30 seconds)
Question: Did this PR add any new dependencies?
AI sometimes imports a library to solve a problem that's a three-line function. Check if new packages were added to `package.json` / `requirements.txt` / etc.
Action: If a new dependency was added, check: Is it maintained? Is it necessary? Could this be done without it?
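A typical example of the "three-line function" case: instead of pulling in a slugify package, a standard-library one-liner may be enough. This helper is an illustrative sketch, not a full replacement for a real slugify library (it ignores accent folding, for instance):

```python
# Minimal slug helper using only the standard library - enough for
# simple needs, without a new dependency.
import re

def slug(title):
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

slug("Hello, World! 2024")  # "hello-world-2024"
```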
7. The "Read It Backwards" Pass (2 minutes)
Question: Does each function make sense in isolation?
Start from the bottom of the file and read each function independently. AI-generated code can have a coherent narrative top-to-bottom but individual functions that don't hold up.
Action: If a function's logic doesn't make sense without reading the surrounding code, it might be doing too much or the abstraction is wrong.
Timing
The full checklist takes about 11 minutes for a typical PR. Here's the breakdown:
| Step | Time |
|---|---|
| Scope check | 0:30 |
| API verification | 2:00 |
| Edge case audit | 3:00 |
| Error path trace | 2:00 |
| Test coverage | 1:00 |
| Dependency check | 0:30 |
| Backwards pass | 2:00 |
| Total | ~11 min |
Compare that to the 45 minutes you'll spend debugging a hallucinated API call in production.
When to Be Extra Careful
- The PR is large. AI-generated PRs tend to be bigger than they need to be. If it's over 300 lines, ask if it can be split.
- The PR modifies auth, payments, or data deletion. Extra scrutiny. Run the code locally.
- The commit messages are generic. "Implement feature" or "Add improvements" suggests the whole thing was generated in one shot without iteration.
The Meta-Rule
AI-generated code is optimized to look correct. Your job as reviewer is to verify it is correct. The gap between "looks right" and "is right" is where the bugs live.
Trust the code exactly as much as you'd trust a confident junior developer: probably fine, but check the important parts.
What does your AI code review process look like? I'd be curious to hear what other people check for.