You’re a tech lead in charge of bringing everybody’s code together. You hired a smart, fast-moving team of engineers and you’re moving in six-week cycles, hitting all the goals that matter. Life is grand. High fives all over the place. Things are going well. But suddenly you start falling behind on reviewing everybody’s pull requests. Great problem to have, right? Output is going up and engineers are moving faster. But you’re slowly becoming the bottleneck. You can’t keep up with testing everything on your beta environment anymore, let alone look through all the code in great detail. What’s happening? Are you getting too old? Why is everybody moving so fast? Well... you might be looking at three LLMs in a trenchcoat.
The dawn of a new era
We’ve finally left the era in which you could safely assume code reviews were conducted on hand-crafted, coffee-fueled, and human-tested code. Sometimes it’s very easy to spot code written by LLMs; other times it’s genuinely hard to tell. Either way, the code can be absolutely great or flat-out bad. Does it even matter who wrote it? Shouldn’t the review process be the same regardless? Well, yes and no. Let me explain.
I don’t want to be a slop cop
From the perspective of the developer writing the code, PR reviews can be a great way to learn about coding patterns, the product’s current architecture, security issues you might accidentally introduce, and much more. However, when 99% of the code is AI-generated and you barely gave it a pass, only briefly testing on your local machine that it gets the job done, this doesn’t apply anymore. You’re almost self-sabotaging, because the opportunity to level up never comes. It’s a weird trade-off: you finish tasks faster, while not gaining the knowledge that would make you faster at similar tasks in the future. Maybe that’s fine though?
What’s not fine is when degrees of AI usage and disclosure vary wildly, and the responsibility for deciding what ‘good code’ is suddenly shifts completely to the people reviewing it. As a team, each engineer should still take full responsibility for the code they write (or let an LLM write). In fact, when engineers let LLMs do the coding, their job shifts towards more of a manager’s role, and they have to be the first line of defense against bad code or patterns.
Establishing guidelines
Don’t get me wrong: I love the productivity boost AI gives us these days. I just think there need to be clear rules that an engineering team can agree on, and clear communication about what kind of code should make it into a PR nowadays. ‘It works’ was never really enough, but the responsibility to double-check the code before opening a PR has shifted significantly towards the engineers writing it, and I suppose not everybody likes that. If engineers don’t take that responsibility seriously, tech leads, or whoever reviews the code, will eventually become the bottleneck: final code reviews are still largely manual, even though AI assistance is getting much better in that regard as well.
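To make that concrete, here is one possible shape such an agreement could take: an AI-disclosure checklist baked into the repository’s pull request template. This is just a sketch I made up, not a standard, and every item is up for your team to define:

```markdown
<!-- .github/PULL_REQUEST_TEMPLATE.md (hypothetical example, adapt to your team) -->
## AI usage disclosure

- [ ] Parts of this change were AI-generated (listed under notes below)
- [ ] I have read and understood every line I am submitting
- [ ] I tested this locally beyond a quick "it works" check
- [ ] I could debug this code at 2am without re-prompting

### Notes
<!-- e.g. "CRUD endpoints scaffolded with an LLM, validation logic written and reviewed by hand" -->
```

Nothing here is enforceable by tooling alone, but it moves the disclosure conversation to before the review starts instead of during it, and it makes the responsibility explicit where it belongs: with the person opening the PR.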
So, as somebody who reviews code daily, here are some questions I’ve been asking myself more often lately:
- Does this person really understand what this code does?
- Could they debug this at 2am when it breaks? (Could I?!)
- Is the feedback I’m giving going to help them grow, or just generate a better prompt?
If the answer to all of these is no, or the code simply introduces complexity that will make the next debugging session or iteration a nightmare, I will likely not approve the PR without pushing back.
Also, if somebody just re-prompts when you push back, they’re not leveling up as an engineer; they’re leveling up their prompting game. That’s a skill too, but likely not the one you’re looking to foster, nor the one that will make them a better engineer in the long run. A culture of honesty about what has been prompted is much healthier than people covertly using AI to get the job done faster.
How to fix it
I don’t have the answer. I think there are a few options already, but none of them is perfect yet. As mentioned above, one angle is ‘this is a people problem’: engineers should take full responsibility for the code they write and test it thoroughly. Suddenly everybody’s a manager (of AI agents) and does QA on top of actually ‘writing’ code. Even if that works, though, somebody still has to be responsible for the final code that makes it to production and needs to review it. The output simply exceeds the review capacity of a regular engineering team, where not everybody does code reviews.
I’m also not trying to be the old man yelling at clouds here. AI-assisted development is the future, and so are AI-assisted code reviews. The assisted reviews are certainly getting better, but they might lack context, and the AI won’t get up at 2am when shit hits the fan because a bad merge actually did make it to production. So again, somebody has to be responsible for the quality checks and for what makes it to production. I do think AI assistance will get a lot better in this area over time, but it’s not quite there yet.
The uncomfortable truth
The developers who are going to thrive aren’t the ones who can prompt the best code. They’re the ones who can understand the code regardless of where it came from, adapt it as needed, debug it when it breaks, and take genuine ownership of the systems they build.
And the tech leads and managers who are going to stay sane are the ones who figure out how to set clear expectations, establish common AI usage guidelines, create accountability, and maintain knowledge transfer while plausible-looking code is just one prompt away.
Got any thoughts on the topic? I’d love to read them!
