DEV Community

Claude Code Wrote the PR. Here's What the Code Review Actually Caught.

Daniel Nwaneri on June 17, 2026

Everyone is shipping AI-generated code right now. Most of it is going straight to main. Quick verdict: Qodo catches production-grade bugs in AI-g...
 
leob

Yeah that's amazing ... so, would Qodo use different "models" (LLMs), or the same models but trained differently, how does that work?

Daniel Nwaneri • Edited

From what I can tell, they use foundation models (Claude, GPT-5-class). The differentiation is in how they orchestrate multiple specialized agents and index your codebase for context, not in the underlying weights.

leob • Edited

So the difference is not in the underlying capabilities, but really in how you utilize them ...

I'm asking because, if the LLM is able to find those bugs (when orchestrated and directed by Qodo), you'd think that that same LLM should be capable of not making those bugs in the first place when generating the code! :-)

Guess it goes to show that what you ask of an LLM matters a lot in what it does or 'can do', even though the fundamental capabilities are all there ... maybe an LLM can be "really good" at only one clearly defined task at a time, compared to the human brain, which more naturally does multiple things simultaneously?

Daniel Nwaneri

Generation is autocomplete: the model optimizes for the next plausible token. Review is inversion: the model looks for where "plausible" breaks down at runtime. Same weights, opposing objectives.

Your "one task at a time" framing is close but I'd put it differently: it's not capacity, it's optimization direction. A model writing code isn't asking "where could this fail?" It's asking "what comes next?" Switch the prompt, switch the question.

The human parallel holds: same dev, same brain, writes a bug at 2pm and catches it in review at 4pm. The question is whether you'd actually want a generator that paused mid-write to second-guess itself.
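To make "switch the prompt, switch the question" concrete, here is a minimal sketch. Everything in it is hypothetical: the `ModelRequest` type, the model name, and the prompt wording are illustrations, not any vendor's API. It only shows the same model being framed two ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRequest:
    """A request to a hypothetical chat model: same model, different framing."""
    model: str
    system: str
    user: str


# Hypothetical model identifier; any foundation model could stand in here.
MODEL = "some-foundation-model"


def generation_request(task: str) -> ModelRequest:
    """Frame the model as a writer: 'what comes next?'"""
    return ModelRequest(
        model=MODEL,
        system="You are a software engineer. Write code that fulfils the task.",
        user=f"Task: {task}",
    )


def review_request(code: str) -> ModelRequest:
    """Frame the same model as a reviewer: 'where could this fail?'"""
    return ModelRequest(
        model=MODEL,
        system=(
            "You are a code reviewer. Do not rewrite the code. "
            "List concrete inputs or states where it fails at runtime."
        ),
        user=f"Code under review:\n{code}",
    )


gen = generation_request("parse a date string")
rev = review_request("def parse(s): return s.split('-')")
print(gen.model == rev.model)    # True: same weights
print(gen.system == rev.system)  # False: different question
```

Nothing about the model changes between the two requests; only the question it is asked does.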

leob • Edited

Right, so in the end the difference is in the context that you feed into it ...

Still baffles me that we now have these enormous and opaque artificial "brains", and nobody really understands how they're doing their magic, but we're somehow getting good at coaxing them into doing what we want ;-)

P.S. but with something like Qodo, is it only about the different context that they're supplying to the LLM, as in, a clever prompt? ;-)

Or would they do some additional 'training' on the model, creating a new variant of it? (at this point I realize I might be talking total nonsense, lol)

(well I'm asking something which nobody might have the answer to, because Qodo is probably not disclosing their "secret sauce" ...)

To use the "human brain" analogy again:

You might give that (human) developer, who has to do code reviews, but has little experience with it (yeah okay, this is just fictitious ...) a detailed checklist telling him/her how to do code reviews - or, you might send that developer on a 1 week course, to learn best practices and basic principles of doing code reviews - where the checklist is analogous to a "prompt", or 'context', while the 1 week course would be "additional training of the model" (assuming that the latter is even technically possible at all ...)

Daniel Nwaneri

The checklist analogy is closer than you're giving yourself credit for. Most of what tools like Qodo do is retrieval and orchestration: figure out which code is relevant, package it with structured review instructions, dispatch specialized agents per concern. The underlying model stays the same.

Fine-tuning (your "1-week course") is expensive and goes stale fast as codebases evolve. RAG and prompt engineering age better because the context is dynamic. You don't retrain the model; you get better at telling it what to look at and what to ask.

The opaque brain does the same thing with a better briefing packet. That's most of the magic.
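A rough sketch of that retrieve, package, dispatch loop follows. This is not Qodo's implementation (which isn't public); the `CONCERNS` checklist, the identifier-overlap ranking, and every name here are assumptions chosen for illustration. A real tool would use a proper code index and would send each prompt to a model, which is left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewJob:
    concern: str  # which specialized reviewer this job is for
    prompt: str   # instructions + retrieved context + the diff


# Hypothetical per-concern instructions (the "checklist").
CONCERNS = {
    "security": "Look for injection, missing auth checks, and leaked secrets.",
    "error-handling": "Look for unhandled exceptions and ignored return values.",
}


def identifiers(text: str) -> set[str]:
    """Extract identifier-like tokens from a piece of text."""
    return set(re.findall(r"[A-Za-z_]\w*", text))


def retrieve(diff: str, codebase: dict[str, str], limit: int = 1) -> list[str]:
    """Naive retrieval: rank files by identifiers shared with the diff."""
    wanted = identifiers(diff)
    scored = []
    for path, text in codebase.items():
        overlap = len(wanted & identifiers(text))
        if overlap:
            scored.append((overlap, path))
    # Highest overlap first; path breaks ties so the order is stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:limit]]


def build_jobs(diff: str, codebase: dict[str, str], limit: int = 1) -> list[ReviewJob]:
    """Package the diff with retrieved context, one job per concern."""
    context = "\n".join(
        f"# {path}\n{codebase[path]}" for path in retrieve(diff, codebase, limit)
    )
    return [
        ReviewJob(
            concern=concern,
            prompt=f"{instructions}\n\nContext:\n{context}\n\nDiff:\n{diff}",
        )
        for concern, instructions in CONCERNS.items()
    ]


codebase = {
    "auth.py": "def check_token(token): return token in VALID_TOKENS",
    "billing.py": "def charge(user, amount): gateway.pay(user, amount)",
    "README.md": "Project docs",
}
diff = "+ def login(token):\n+     return check_token(token)"

jobs = build_jobs(diff, codebase)
for job in jobs:
    print(job.concern, "->", len(job.prompt), "chars")
```

The model call never appears in the sketch, which is the point: all of the work shown is deciding what the model gets to look at and what it is asked.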

Sloan the DEV Moderator

Hey, this article appears to have been generated with the assistance of ChatGPT or possibly some other AI tool.

We allow our community members to use AI assistance when writing articles as long as they abide by our guidelines. Please review the guidelines and edit your post to add a disclaimer.

Failure to follow these guidelines could result in DEV admin lowering the score of your post, making it less visible to the rest of the community. Or, if upon review we find this post to be particularly harmful, we may decide to unpublish it completely.

We hope you understand and take care to follow our guidelines going forward!

FrancisTRᴅᴇᴠ (っ◔◡◔)っ

Issue resolved. Thanks Daniel!

Benjamin Nguyen

I find your post interesting because I did not know that you had issues with Claude.

Daniel Nwaneri

The opposite, actually. Claude generated the code well. Qodo reviewed it and found six bugs in what Claude produced. The issue was with the generated code, not the tool.