DEV Community

Dmytro Bieliaiev

The Onion of Criticality: Stop Asking If AI Writes Better Code

You’ve probably heard the claim: “The time saved by writing code with AI will be spent on code review.” That might have been true about a year ago, when models were rawer and hallucinated more. Model quality has since improved significantly, so it’s worth revisiting how accurate the statement still is in 2026.

I’ve been running a test a friend suggested two years ago. From time to time, we feed an LLM the same prompt to generate a high-performance matching engine in Golang, then compare the outputs across models. Some models implemented a red-black tree keyed by floats, others by strings; in my opinion, neither is optimal from a performance standpoint. I treated this as a personal benchmark: once an LLM could produce this code better than I can, the “time saved by using an LLM for backend development” would truly be justified. After two years of working with models, though, I’ve come to understand why this framing isn’t quite right.
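To make the float-vs-string key debate concrete: a common alternative (my own sketch, not any model’s output) is to key price levels by an integer number of ticks, which gives exact comparisons and cheap arithmetic. The `tickSize` value and names here are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// Ticks represents a price as an integer count of minimum price
// increments. Integer keys avoid float64 rounding surprises and the
// allocation/comparison cost of string keys in an order-book tree.
type Ticks int64

// tickSize is an assumed minimum price increment for this sketch.
const tickSize = 0.01

// ToTicks converts a float price to ticks, rounding to the nearest tick.
func ToTicks(price float64) Ticks {
	return Ticks(math.Round(price / tickSize))
}

// ToPrice converts ticks back to a display price.
func (t Ticks) ToPrice() float64 {
	return float64(t) * tickSize
}

func main() {
	a := ToTicks(10.07)
	b := ToTicks(10.08)
	fmt.Println(a, b, b-a) // exact integer arithmetic: 1007 1008 1
}
```

Float keys can make two prices that should be one tick apart compare unpredictably; integer ticks make ordering and equality exact.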

LLMs perform better with more context: ideally, a very detailed description of what you want. But there’s a tradeoff: the more time you spend describing the task, the smaller the net time savings. For low-priority projects, such as a back-office frontend admin panel, that is a poor strategy. It’s also important to note that context is not just the prompt itself; it includes the codebase, comments, organizational knowledge, and so on.

For most business problems, we don’t need an HFT-level matching engine. We need a working solution that is easy to maintain and won’t break in three months, ideally covered by meaningful tests and free of security gaps. So how does the matching engine code differ from something like an API that simply returns data from a database?

I suggest introducing a metric called “design quality.” If we assume that LLMs can already write code, the next question becomes what “quality” actually means and how we measure it. This is where things get interesting. Beyond patterns and best practices, there is a great deal of human experience that you simply cannot pack into a prompt. Only another human can truly evaluate this quality. You can keep asking an LLM to “make it better”, and it may replace a mutex with atomics or make other surface-level improvements. But that still won’t capture many deeper nuances, which is why an experienced developer will ultimately review the code.
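To show what such a surface-level improvement looks like, here is a sketch (names and setup are mine, not from any model) of the kind of mutex-to-atomics swap an LLM might propose. It is correct and marginally faster for a lone counter, yet says nothing about the design of the surrounding system:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// mutexCounter is the straightforward version: a lock around an int.
type mutexCounter struct {
	mu sync.Mutex
	n  int64
}

func (c *mutexCounter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// atomicCounter is the "make it better" rewrite: lock-free for a single
// integer (atomic.Int64 requires Go 1.19+), but the win is narrow.
type atomicCounter struct {
	n atomic.Int64
}

func (c *atomicCounter) Inc() { c.n.Add(1) }

func main() {
	var wg sync.WaitGroup
	var mc mutexCounter
	var ac atomicCounter
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); mc.Inc() }()
		go func() { defer wg.Done(); ac.Inc() }()
	}
	wg.Wait()
	fmt.Println(mc.n, ac.n.Load()) // both counters reach 100
}
```

Both versions are race-free; the swap is a micro-optimization, not a design decision, which is the distinction a human reviewer is there to catch.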

When a senior developer reviews another developer’s PR, they focus primarily on design quality. Ask yourself: which would you review more carefully, line by line, a mission-critical system or a secondary project? Most likely, the closer a project is to being mission-critical, the more important design quality becomes and the more thoroughly you review the PR. In other words, review depth correlates directly with project criticality.

This leads to what I call the “onion of criticality.” It’s a metaphor for my core idea: the closer we are to the core, the less we should rely on LLMs, because the cost of code review will not be justified. The further we move from the core, the more we should use LLMs, because the acceleration benefits increase while design quality becomes less impactful.

In 2026, the question is no longer “Does an LLM write code better than I do?” (maybe yes, maybe no). The real question is: “Can I define tasks in a way that allows the LLM to accelerate me rather than hinder me?” And this is where personal experience may matter more than whatever the LLM can generate.
