Kair Akhmettayev

Posted on May 19

AI Coding Is Fast Now. Engineering Trust Still Has to Be Earned.

#ai #codereview #programming #softwareengineering

AI tools have dramatically increased the speed of software development.

That is a fact.

Today, a model can write a function or method in minutes, sketch out tests, suggest a migration, explain an error, propose a refactoring plan, or draft an initial architecture decision.

This no longer feels like magic.

It is becoming a normal part of engineering work.

But speed has introduced another problem:

we have lost confidence.

And I do not mean only confidence in code quality.

I mean confidence that the code will actually work correctly and reliably.

A team receives an AI-generated answer: confident, coherent, often useful.

But the main question for developers is no longer whether AI can suggest something.

It can.

The real question is different:

Can we trust that suggestion?

The problem is not that AI makes mistakes

Everyone makes mistakes:

people;
tests;
documentation;
static analyzers;
models.

The problem with AI-generated answers is different:

they often make mistakes beautifully.

An answer can look logically structured, well-written, and very convincing. It may use the right terminology, sound professional, and even include code snippets that look completely valid.

But that is not always enough for a reliable engineering decision.

A developer or tech lead still needs to understand:

which files the model actually considered;
which facts from the codebase the answer is based on;
which assumptions were made without evidence;
which hypotheses were considered and rejected;
which checks are still open;
whether this can be merged, or whether it is only a diagnostic conclusion.

Without that visibility, an AI answer becomes a new kind of technical debt.

The model saved time by producing the first version.

But it pushed the verification burden back onto the team: figuring out where the answer contains facts, where it contains assumptions, where the risks are, and where it is simply a confident guess.

A confident answer is not the same as a verified answer

In a regular chat interface, the final answer often looks like the final truth.

The model says:

Here is the root cause.

Here is the fix.

Here are the tests.

And for simple cases, that may be enough.

But in a real project, details matter:

Was a neighboring call site missed?
Did a contract change in another module?
Is the fix based on a file the model never read?
Did the model mix existing code with code it invented itself?
Did it present an assumption as a confirmed fact?
Was important criticism lost on the way to the final answer?

These are not edge cases.

This is everyday engineering work.

That is why the problem with AI coding in teams is not only the quality of the model.

The bigger problem is the lack of a verifiable process around the answer.

What a good AI engineering artifact should contain

If an AI answer is used in engineering work, it should look more like a reviewable artifact than a polished chat message.

A useful artifact should show the following.

1. What is being proposed

Not a vague statement like:

improve validation

But specific files, functions, tests, and the boundaries of the change.

2. What evidence from the codebase supports the answer

The model should show which files or code fragments confirm its conclusions.

3. Which assumptions are still assumptions

If behavior was not confirmed by the code that was actually read, this must be stated clearly.

4. Which hypotheses were rejected

This is just as important as the final conclusion.

A good investigation shows not only what turned out to be true, but also what was checked and ruled out.

5. Which checks remain open

Some things cannot be honestly closed without additional files, tests, running the project, or a human decision.

That is not a failure if the system states it explicitly.

6. Trust status

The result should distinguish between:

this can be considered a patch candidate;
this is useful diagnostics, but not a merge-ready patch.
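As an illustration, the six parts above could be captured in a structure like the one below. This is a minimal Python sketch, not Undes's actual format. `TrustArtifact`, `Claim`, `TrustStatus`, and the example file and function names are hypothetical. The status rule is also an assumption, one possible policy among many: a claim counts as evidence only if every file it cites was actually read. Otherwise it is demoted to an assumption, and any unsupported claim, open check, or unresolved risk keeps the result diagnostic.

```python
from dataclasses import dataclass, field
from enum import Enum


class TrustStatus(Enum):
    PATCH_CANDIDATE = "patch candidate"
    DIAGNOSTIC = "useful diagnostics, not a merge-ready patch"


@dataclass
class Claim:
    text: str
    cited_files: list[str]  # files the claim says it is based on


@dataclass
class TrustArtifact:
    proposal: str             # 1. what is being proposed
    changed_files: list[str]  # 1. boundaries of the change
    files_read: set[str]      # files the model actually read
    claims: list[Claim]       # 2. conclusions plus the evidence cited for them
    assumptions: list[str] = field(default_factory=list)          # 3.
    rejected_hypotheses: list[str] = field(default_factory=list)  # 4.
    open_checks: list[str] = field(default_factory=list)          # 5.
    unresolved_risks: list[str] = field(default_factory=list)

    def is_supported(self, claim: Claim) -> bool:
        # Evidence only if the claim cites files and every cited file was read.
        return bool(claim.cited_files) and set(claim.cited_files) <= self.files_read

    def unsupported_claims(self) -> list[Claim]:
        return [c for c in self.claims if not self.is_supported(c)]

    def all_assumptions(self) -> list[str]:
        # Unsupported claims are demoted to assumptions instead of passing as facts.
        return self.assumptions + [c.text for c in self.unsupported_claims()]

    def trust_status(self) -> TrustStatus:  # 6.
        # No concrete change, missing evidence, or anything still open: diagnostic.
        if (not self.changed_files
                or self.unsupported_claims()
                or self.open_checks
                or self.unresolved_risks):
            return TrustStatus.DIAGNOSTIC
        return TrustStatus.PATCH_CANDIDATE


# Hypothetical example: the second claim cites a file that was never read.
artifact = TrustArtifact(
    proposal="Reject empty email addresses in validate_signup()",
    changed_files=["auth/validators.py", "tests/test_validators.py"],
    files_read={"auth/validators.py", "tests/test_validators.py"},
    claims=[
        Claim("validate_signup() accepts an empty email", ["auth/validators.py"]),
        Claim("No caller relies on empty emails", ["api/signup.py"]),
    ],
)
print(artifact.trust_status().value)  # useful diagnostics, not a merge-ready patch
print(artifact.all_assumptions())     # ['No caller relies on empty emails']
```

The point of keeping `files_read` separate from the cited files is that "which files the model actually considered" becomes something the structure can check, rather than something a reviewer has to reconstruct.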

This kind of format changes the role of an AI answer.

It stops being just generated text and becomes an engineering decision that can be reviewed.

Verification should be part of generation

One might say:

Fine, let the model write the answer first, and then we’ll ask it to check itself.

For small tasks, that works.

Sometimes.

But once the task becomes more serious, after-the-fact verification quickly runs into limitations:

the model may defend its own previous answer;
some evidence may already be lost from the context;
criticism may remain as prose, but never affect the final result;
open checks may be softened to make the final answer look cleaner;
generated code may not make it into the final answer in full.

That is why verification should be part of the process, not an optional step at the end.

And certainly not something a developer remembers only after the problem has already happened.

We need a process where different agents or model roles do different things:

some propose a solution;
others criticize it;
a separate step synthesizes the overall conclusion;
the system checks evidence and open items;
the final answer receives a trust status.

What matters is this:

using multiple AI roles does not automatically make the answer correct.

The value is not in:

models argued, so now it must be right

The value is that the argument, evidence, risks, rejected hypotheses, and limitations do not disappear.

They become part of the final artifact.
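The workflow above can be sketched as a pipeline of steps that share one record. This is a minimal illustration under stated assumptions, not how Undes is implemented. Each role is a stub function standing in for a model call, and the names (`Finding`, `Run`, `run_pipeline`) and example findings are hypothetical. What it shows is the property argued for here: findings are append-only, so a critique or a rejected hypothesis recorded early is still present when the final status is assigned.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Finding:
    role: str  # which role raised it, e.g. "critic" or "evidence-check"
    kind: str  # "risk", "rejected-hypothesis" or "open-check"
    text: str
    resolved: bool = False  # a rejected hypothesis is resolved: checked and ruled out


@dataclass
class Run:
    task: str
    proposal: str = ""
    findings: list[Finding] = field(default_factory=list)
    status: str = ""


def propose(run: Run) -> None:
    # Stub for the proposer role; a real system would call a model here.
    run.proposal = f"Patch for: {run.task}"


def critique(run: Run) -> None:
    # Stub for the critic role: objections are recorded, not folded into prose.
    run.findings.append(Finding("critic", "risk", "A second call site in billing/ was not reviewed"))
    run.findings.append(Finding("critic", "rejected-hypothesis", "Cache race condition", resolved=True))


def synthesize(run: Run) -> None:
    # Synthesis may narrow the proposal, but must not delete findings
    # (run_pipeline enforces this).
    run.proposal += " (scoped to auth/ only)"


def check_evidence(run: Run) -> None:
    # Stub for the evidence check: what it cannot confirm becomes an open check.
    run.findings.append(Finding("evidence-check", "open-check", "Run the signup integration tests"))


def assign_status(run: Run) -> None:
    unresolved = [f for f in run.findings if not f.resolved]
    run.status = "diagnostic" if unresolved else "patch candidate"


PIPELINE: list[Callable[[Run], None]] = [
    propose, critique, synthesize, check_evidence, assign_status,
]


def run_pipeline(task: str) -> Run:
    run = Run(task)
    for step in PIPELINE:
        before = list(run.findings)
        step(run)
        # Findings are append-only: no step may drop or reorder earlier ones.
        if run.findings[: len(before)] != before:
            raise RuntimeError(f"{step.__name__} dropped or reordered findings")
    return run


result = run_pipeline("reject empty email addresses")
for f in result.findings:
    print(f"[{f.role}] {f.kind}: {f.text}")
print("status:", result.status)  # status: diagnostic
```

The design choice is that the status is computed from the recorded findings, not from the wording of the final answer. Softening an open check in prose therefore cannot change the result.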

This is exactly why I am building Undes

Undes is a local-first AI engineering CLI that does not simply generate an engineering answer.

It generates the answer together with verification.

The idea is simple:

AI generates.

Undes verifies.

A single prompt should not produce just “a model answer”.

It should produce a verifiable engineering result:

proposed implementation or diagnostic answer;
evidence from the codebase;
assumptions;
rejected hypotheses;
risks;
open checks;
trust / patch-safety status.

Undes builds a structured workflow around the task:

proposal;
critique;
synthesis;
evidence checks;
risk review;
final artifact.

It is not trying to replace Cursor, Claude Code, Copilot, or other AI coding tools.

Those tools are useful.

They accelerate generation.

Undes focuses on a different layer:

making AI-generated engineering answers more trustworthy and more useful for teams.

Why local-first matters

For an engineering trust tool, it matters where the code lives.

The community version of Undes is designed as a local-first CLI:

the code is read locally;
the user configures access to model providers;
the result stays on the developer’s machine.

This does not mean there are no LLM calls: depending on the configured provider, the models may run in the cloud or locally.

But the process itself runs locally on the developer’s machine.

For many teams, this is an important boundary.

A trust-focused engineering tool should not begin with:

Upload your entire codebase to us.

What Undes does not promise

There is an important point here.

Undes does not promise magical correctness.

It does not turn AI into a formal verifier.

It does not replace:

tests;
code review;
CI;
security review;
engineering responsibility.

In fact, the strength of this approach is honesty:

if there is not enough evidence, the result should be diagnostic;
if there is an unresolved risk, it should be visible;
if generated code is based on an assumption, that should be stated;
if the task requires a human decision, the system should not pretend everything is closed.

For a team, this is more practical than a polished but overconfident answer.

Where this is especially useful

This approach is not needed for every small question.

If you just need to quickly recall syntax or draft a throwaway script, a regular chat is enough.

Undes makes sense where the cost of a mistake is higher:

feature implementation;
bug fixes in an unfamiliar part of the project;
migration planning;
architecture decision review;
incident investigation;
refactoring that may break neighboring contracts;
codebase onboarding, where it is important to separate facts from assumptions.

In these cases, a fast answer is only half of the value.

The other half is understanding how well that answer is supported by evidence.

What should the next step in AI coding look like?

The first wave of AI coding tools made generation accessible.

The next step is to make AI-generated engineering work verifiable.

Not because models are bad.

But because good engineering teams do not trust a result just because it sounds confident.

They look at:

evidence;
risks;
contracts;
tests;
open checks;
boundaries of applicability.

AI tools should help us not only write faster, but also make fewer mistakes.

That is the direction I want to move Undes in.

Try it

I am exploring this direction in the community version of Undes, an experimental local-first AI engineering CLI.

The most useful first test is simple:

take a small real task in your repository and look not only at the final answer, but also at the trust signals around it:

which evidence was used;
which assumptions remain;
which hypotheses were rejected;
which checks are still open;
what trust status the result received.

For me, the most valuable feedback is whether the artifact exposes enough signal for a real engineering review before merge.

Because the goal is not just another polished AI answer.

The goal is an AI-generated engineering answer you can actually trust.

Disclosure: this article is based on my own experience building Undes. I used AI assistance for English translation and editing, and reviewed the final text before publishing.

DEV Community