orenlab

Posted on Jun 10

Code Review Starts Too Late

#ai #python #codequality #devtools

Most tools in AI-assisted development answer questions like:

What changed?
Is the code worse?
Does the patch pass tests?
Did the agent complete the task?

These are useful questions.

But they all share the same assumption:

The review process begins when a diff exists.

That assumption made sense when humans were the primary authors of code.

I’m no longer convinced it does.

The Real Decision Happens Before the Diff

When an experienced engineer receives a task, they usually make several decisions before writing a single line of code:

What should be changed?
What should not be changed?
Which modules are relevant?
Which dependencies are dangerous?
How far should this change spread?

Most of that reasoning happens implicitly.

With AI agents, the situation is different.

An agent receives an objective and immediately begins exploring the codebase.

The first important decision is not:

How should I implement this?

The first important decision is:

What am I allowed to touch?

That decision often determines whether the change remains safe, maintainable, and reviewable.

Yet most tooling only starts paying attention after a patch has already been generated.

AI Broke a Long-Standing Assumption

Traditional code review assumes:

Task ↓ Implementation ↓ Diff ↓ Review

The diff is treated as the beginning of the review process.

For AI agents, the diff is often the end of the decision process.

By the time a reviewer sees the patch:

scope decisions have already been made
dependencies have already been traversed
unrelated files may already have been modified
architectural boundaries may already have been crossed

The patch records the outcome.

It does not explain how the agent arrived there.

Passing Tests Is Not Enough

Imagine a simple task:

Rename one API parameter.

An agent modifies:

the target API
eight dependent modules
three test suites
two utility files that were not part of the request

All tests pass.

Most modern workflows would consider this successful.

But should they?

The implementation may be correct.

The change process may not be.

A passing patch does not automatically mean the agent stayed within a reasonable scope.

The Missing Layer

I increasingly think there is a missing control layer in AI-assisted development.

Something between the task and the diff.

Instead of:

Task ↓ Agent ↓ Diff

The workflow should look more like:

Task ↓ Intent ↓ Blast Radius ↓ Bounded Edit ↓ Verification ↓ Diff

The goal is not to prevent changes.

The goal is to make structural expansion explicit.

If an agent decides to move from one file to twenty files, that decision should be visible.

If a change crosses architectural boundaries, that fact should be visible.

If scope expands, it should be visible before the patch reaches review.

This Is Not About Distrusting Agents

The common reaction is:

We just need better models.

Maybe.

But history suggests a different pattern.

Software engineering rarely becomes more reliable because people become smarter.

It becomes more reliable because systems become more observable.

Version control did not make developers smarter.

Testing did not make developers smarter.

CI did not make developers smarter.

All of them made mistakes easier to detect.

AI development workflows will likely follow the same path.

Reliability will not come solely from stronger foundation models.

Reliability will come from stronger control systems surrounding those models.

Reviewing Decisions Instead of Diffs

For years, software engineering focused on reviewing code.

As agents become primary contributors, a different question emerges:

Should we only review code?
Or should we review the decisions that produced the code?

The diff remains important.

But it may no longer be the beginning of the review process.

It may simply be the final receipt.

⸻

This is the direction I am exploring with CodeClone 2.1.

CodeClone is a deterministic structural review and change-control layer for Python projects. The 2.1 line adds a Structural Change Controller: agents declare intent, inspect blast radius, edit inside explicit boundaries, verify the resulting patch, and leave an auditable receipt.

The goal is not to tell agents how to write code.

The goal is to make agentic changes explicit, bounded, observable, and verifiable before the diff becomes the only evidence we have.