AI-generated code is often close to correct. That is exactly what makes it dangerous.
Obviously broken code is easy to reject. Code that compiles, looks reasonable, and passes the happy path is much harder to distrust.
## "Almost right" is not enough
In software, small gaps matter:
- one missing null check
- one unhandled timeout
- one weak authorization condition
- one unsafe default
- one test that only covers the obvious path
AI tools can produce useful code quickly, but they can also create these gaps quickly.
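As an illustration, here is a minimal sketch of how one such gap hides inside reasonable-looking code. The function and field names are invented for this example:

```python
# Hypothetical example: a discount helper that looks finished.
def apply_discount(order: dict) -> float:
    total = order["total"]
    if order["tier"] == "premium":  # gap: KeyError when "tier" is absent
        return total * 0.9
    return total


# Same logic with the assumption handled explicitly.
def apply_discount_safe(order: dict) -> float:
    total = order["total"]
    if order.get("tier") == "premium":  # a missing key falls through safely
        return total * 0.9
    return total
```

Both versions pass the obvious demo input; only the second survives an order without a `tier` field.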
## The common failure pattern
AI coding tools tend to optimize for a plausible solution. That is not the same as a production-ready solution.
You often get:
- clean-looking structure
- reasonable names
- a working example
- incomplete edge cases
- shallow tests
- error handling that looks present but does not help much
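The last symptom is the easiest to miss. A hedged sketch (the `db` object and its `save` method are hypothetical) of error handling that is present but not helpful, next to a version that keeps failures visible:

```python
import logging

logger = logging.getLogger(__name__)


# Looks like error handling, but hides every failure behind one boolean.
def save_user_opaque(db, user) -> bool:
    try:
        db.save(user)
        return True
    except Exception:
        return False  # the caller learns nothing about what went wrong


# Narrow, logged, and re-raised: the failure stays visible.
def save_user_visible(db, user) -> None:
    try:
        db.save(user)
    except ConnectionError:
        logger.exception("database unreachable while saving user %s", user)
        raise
```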
This is why reviewing AI code requires a different kind of attention. You are not only looking for syntax problems; you are looking for missing thought.
## Review the assumptions
A good review question is:
What did the model assume that may not be true?
Examples:
- input is always valid
- the API always responds
- the user has permission
- the array is never empty
- the environment variable exists
- the model response is well-formed
If the code depends on those assumptions, they need to be explicit or handled.
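For the environment-variable case, making the assumption explicit can be as small as this sketch (the variable name is invented for illustration):

```python
import os


# Implicit assumption: os.environ["SERVICE_API_KEY"] exists.
# If it does not, the code fails later with a bare KeyError.

# Explicit version: fail fast at startup with an actionable message.
def load_api_key() -> str:
    key = os.environ.get("SERVICE_API_KEY")
    if not key:
        raise RuntimeError(
            "SERVICE_API_KEY is not set; refusing to start without credentials"
        )
    return key
```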
## Tests matter more, not less
AI can generate tests, but generated tests often mirror the implementation instead of challenging it.
Ask for tests that cover:
- invalid input
- empty states
- missing permissions
- timeouts
- malformed API responses
- duplicate data
- race conditions where relevant
The test suite should make the implementation prove something.
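A sketch of the difference, using an invented parsing function. The first assertion mirrors the implementation; the rest make it defend itself:

```python
# Hypothetical function under test: parses a user-supplied quantity.
def parse_quantity(raw: str) -> int:
    value = int(raw.strip())
    if value < 0:
        raise ValueError("quantity must be non-negative")
    return value


# A mirrored test proves almost nothing:
assert parse_quantity("3") == 3

# Challenging tests cover boundaries and invalid input.
assert parse_quantity("  0  ") == 0  # boundary value plus whitespace
for bad in ["", "abc", "1.5", "-1"]:
    try:
        parse_quantity(bad)
        raise AssertionError(f"expected rejection of {bad!r}")
    except ValueError:
        pass
```

In a real suite these would live in a test framework rather than module-level asserts, but the shape of the questions is the point.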
## Ask the AI to criticize its own code
One useful pattern is a second pass:
> Review this implementation as if you did not write it.
> Look for bugs, missing edge cases, weak error handling, and security issues.
> Do not rewrite the code yet. List findings first.
This does not replace a human review, but it often catches issues in the first draft.
## Keep changes small
The larger the AI-generated diff, the harder it is to review. Small changes keep accountability intact.
Good task shape:
- one function
- one component
- one route
- one failing test
- one refactoring pattern
Bad task shape:
- "rewrite the module"
- "add auth everywhere"
- "modernize the whole app"
- "fix all tests"
The model can produce large changes. That does not mean you should accept them.
## Bottom line
AI coding tools are useful, but the quality bar does not change. Code still needs tests, review, error handling, and security thinking.
Treat AI output as a fast draft, not as a trusted implementation.
This article is based on the German original on KIberblick:
https://kiberblick.de/artikel/grundlagen/ki-code-qualitaet/