DEV Community

Learn AI Resource

AI Code Review Without the Theatre

I've been running LLM-powered code reviews on our team for three months now. The honest take: it has saved us from shipping bugs, but only because we stopped using it wrong.

Here's what most teams get wrong with AI code review, and how to actually make it work.

The Dumb Way (That Everyone Tries First)

You paste your PR into Claude or ChatGPT and ask "review this code." Then you get back a wall of generic advice: "Consider adding error handling. Make variable names more descriptive. Add docstrings."

Useless. Your code already has those things, or you don't care.

The problem? You're not giving the AI context. It's like showing someone your code with your hands over half of it.

What Actually Works

1. Review by Behavior, Not Style

Instead of "review this code," ask: "What could break this function with these inputs: [examples]?"

// What I used to do
"Review this payment processing function"

// What works
"I'm processing refunds. Edge cases that scare me:
- What if the user's bank rejects the refund?
- What if they request it twice in 5 seconds?
- What if the amount is $0.01?
Here's the function: [code]
Could any of these scenarios cause problems?"

The AI is far more likely to find real issues, because you've told it what to look for.
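If you ask this kind of question often, it may be worth wrapping the pattern in a tiny helper. The sketch below is one way to do it, not something from our actual tooling: `buildBehaviorPrompt` and its parameter names are made up for illustration, and the `processRefund` stub stands in for your real code.

```javascript
// Hypothetical helper: turns named failure scenarios into a behavior-focused prompt.
function buildBehaviorPrompt({ task, scenarios, code }) {
  if (!Array.isArray(scenarios) || scenarios.length === 0) {
    // Without scenarios this collapses back into "review this code".
    throw new Error("Name at least one scenario to review against.");
  }
  const bullets = scenarios.map((s) => `- ${s}`).join("\n");
  return [
    `I'm ${task}. Edge cases that scare me:`,
    bullets,
    "Here's the function:",
    code,
    "Could any of these scenarios cause problems?",
  ].join("\n");
}

const prompt = buildBehaviorPrompt({
  task: "processing refunds",
  scenarios: [
    "What if the user's bank rejects the refund?",
    "What if they request it twice in 5 seconds?",
    "What if the amount is $0.01?",
  ],
  code: "function processRefund(order) { /* your real code here */ }",
});

console.log(prompt);
```

The helper refuses to build a prompt with no scenarios, which is the whole point: if you can't name a failure mode, you're back to the generic review.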

2. Review Actual Diffs, Not Whole Files

Show it what changed, not everything:

- OLD: const price = item.price * quantity;
+ NEW: const price = Math.max(0, item.price * quantity);

Then ask: "Does this handle negative prices correctly in refunds?"

Context is everything. A 50-line diff with a specific question beats a 500-line file review.
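To see why the specific question matters for that diff, here is a small sketch comparing the two versions. It assumes refunds are modeled as a negative quantity, which the diff itself doesn't tell us, and the `item` object is invented for the example.

```javascript
// The old and new lines from the diff, wrapped in functions so they can be compared.
const oldPrice = (item, quantity) => item.price * quantity;
const newPrice = (item, quantity) => Math.max(0, item.price * quantity);

const item = { price: 20 }; // hypothetical item

// Normal purchase: both versions agree.
console.log(oldPrice(item, 2), newPrice(item, 2)); // 40 40

// Refund modeled as a negative quantity (assumption): the new version clamps it to 0.
console.log(oldPrice(item, -1), newPrice(item, -1)); // -20 0
```

If your refunds really do flow through this line as negative amounts, the "fix" silently turns them into zero. That is exactly the kind of thing a targeted question on a small diff can surface.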

3. Review for Your Stack, Not Generic Code

Tell it what matters to you:

"We run this on serverless (cold starts matter). 
Our p99 latency SLA is 200ms. 
We can't use external dependencies without approval.
Review this image processing function for those constraints."

Now it's actually helpful because it knows your constraints.

Real Examples From Our Practice

Example 1: Database Query

Bad: "Review this SQL query"
Good: "This query gets user comments ordered by recency. We have 50M comments and the user IDs are random. Will this be slow? What index would you add?"

The AI caught that we were sorting after filtering instead of letting an index provide the sort order. 🎯

Example 2: React Component

Bad: "Review this form component"
Good: "Users complain this form is slow to type in. Here's the component. Where could it be re-rendering unnecessarily?"

It found three unnecessary state updates. We fixed one and saved 200ms per keystroke. 🎯

Example 3: API Endpoint

Bad: "Review this endpoint"
Good: "This endpoint handles file uploads (up to 50MB). We've had issues with memory spikes. Does this code keep the whole file in memory?"

It did. We fixed it. 🎯
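For readers who haven't hit this one, the difference looks roughly like the sketch below. This is not our endpoint code: the function names and the `fakeUpload` generator are invented, and it relies only on the fact that Node.js readable streams (including incoming HTTP requests) can be consumed with `for await`.

```javascript
// Buffered: collects every chunk, so memory use grows with the upload size.
async function readAllIntoMemory(source) {
  const chunks = [];
  for await (const chunk of source) chunks.push(chunk);
  return chunks;
}

// Streamed: handles one chunk at a time and keeps only a running byte count.
async function processStreaming(source, onChunk) {
  let totalBytes = 0;
  for await (const chunk of source) {
    totalBytes += chunk.length;
    await onChunk(chunk); // e.g. write to disk or forward to storage
  }
  return totalBytes;
}

// Stand-in for an upload: yields fixed-size chunks like a request stream would.
async function* fakeUpload(chunkCount, chunkSize) {
  for (let i = 0; i < chunkCount; i++) yield new Uint8Array(chunkSize);
}

processStreaming(fakeUpload(4, 1024), () => {}).then((total) => {
  console.log(`processed ${total} bytes without holding them all at once`);
});
```

With the buffered version, a 50MB upload means roughly 50MB held per in-flight request. The streamed version only ever holds the current chunk.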

The One Rule That Changes Everything

Ask it to explain its reasoning for every recommendation.

"Here's why this is a problem: [explanation]. Here's how to fix it: [fix]. Here's why that fix matters: [impact]."

Lazy AI reviews will say "consider error handling" and that's it. Force it to justify. If it can't explain why something matters for your specific case, ignore it.
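If you ask for the review as structured output, you can enforce this rule mechanically. The sketch below assumes you've told the model to reply with a JSON array of `{ problem, fix, impact }` objects. That format is my assumption, not a standard, and the sample reply is made up.

```javascript
// Assumed reply format (hypothetical): a JSON array of { problem, fix, impact } objects.
const REQUIRED_FIELDS = ["problem", "fix", "impact"];

// Keeps only recommendations where all three justification fields are non-empty strings.
function keepJustified(replyJson) {
  const items = JSON.parse(replyJson);
  if (!Array.isArray(items)) return [];
  return items.filter(
    (item) =>
      item !== null &&
      typeof item === "object" &&
      REQUIRED_FIELDS.every(
        (field) => typeof item[field] === "string" && item[field].trim() !== ""
      )
  );
}

// Made-up reply: one justified recommendation, one lazy one.
const reply = JSON.stringify([
  {
    problem: "Two refund requests within 5 seconds both pass the status check.",
    fix: "Make the status update conditional on the current status.",
    impact: "Prevents paying out the same refund twice.",
  },
  { problem: "Consider error handling", fix: "", impact: "" },
]);

console.log(keepJustified(reply).length); // 1
```

This only checks that a justification exists, not that it's any good. You still have to read what survives the filter.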

Tools That Make This Easier

  • GitHub Copilot PR review – Built into your workflow, understands context
  • Continue.dev – Bring Claude or other models into your editor, review as you write
  • Custom tools – We built a small script that grabs the PR diff, adds our constraints, and feeds it to the Claude API. Takes 2 minutes to set up.

Don't overthink tooling. Shell script + Claude API works. CLI tool works. Paid service works. Pick one and actually use it.
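For a sense of scale, a custom tool along those lines can be about this small. This is a sketch and not our actual script: `buildReviewRequest` and `requestReview` are invented names, the constraints are the ones from the serverless example earlier, and the model ID is left as a parameter because you should pick whichever one your account uses. The endpoint, headers, and body shape follow Anthropic's Messages API.

```javascript
// Team constraints, reused from the serverless example earlier in the post.
const CONSTRAINTS = [
  "We run this on serverless (cold starts matter).",
  "Our p99 latency SLA is 200ms.",
  "We can't use external dependencies without approval.",
];

// Builds the URL and fetch options for one review request (no network call here).
function buildReviewRequest({ diff, question, apiKey, model }) {
  const prompt = [
    "Constraints:",
    ...CONSTRAINTS.map((c) => `- ${c}`),
    "",
    "Diff under review:",
    diff,
    "",
    question,
    "For every recommendation, explain why it is a problem, how to fix it, and why the fix matters.",
  ].join("\n");

  return {
    url: "https://api.anthropic.com/v1/messages",
    options: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model, // placeholder: pass the model ID you actually use
        max_tokens: 1024,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Sends the request. Needs Node 18+ (global fetch) and a real API key.
async function requestReview(args) {
  const { url, options } = buildReviewRequest(args);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`Review request failed: ${response.status}`);
  const data = await response.json();
  // The reply's content is a list of blocks; keep the text ones.
  return data.content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("\n");
}
```

To use it, pass the output of `git diff` for your branch as `diff`, plus one specific question. Keeping the request builder separate from the network call makes it easy to print the prompt and sanity-check it before spending tokens.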

What This Isn't

AI code review isn't a replacement for humans. It's a filter.

Human review is still for: architecture decisions, API design, big picture concerns.
AI review is for: silly bugs, edge cases you missed, performance gotchas, context-specific issues.

Let the machine be good at what it's good at. Let humans be good at what they're good at.

Next Steps

Start small. Pick one recurring problem in your PRs (memory leaks, SQL N+1s, missing edge cases) and ask the AI specifically to hunt for that. You'll be shocked at what it finds when it knows what to look for.

Then expand. Different app, different concern.


Want more practical AI patterns for developers? Check out the LearnAI Weekly newsletter – real tools and techniques (not fluff).

Happy reviewing.

Top comments (1)

Marcus Kim

The strongest point here is that the review target has to be a behavior, not a file. The refund example with duplicate requests within 5 seconds and the 50MB upload memory spike are exactly the kind of cases where an LLM can be useful because the failure mode is named up front. For a founder/engineering team, I'd treat this as a lightweight risk register: every weird production bug becomes one reusable review question that gets attached to future diffs in that area.