Plan Stack as a Methodology
Imagine a codebase with over 100,000 lines of code.
Six months ago, an AI-generated pull request added several thousand more.
The tests passed. The review was rushed. The change was merged.
Today, you need to modify that area.
You can read the code.
You can see what it does.
But you have no idea why it is the way it is.
No one remembers what constraints existed.
No one knows which alternatives were considered.
And the person who “wrote” the code was an AI.
This situation is no longer rare — and it’s not temporary.
The real problem is not “too much code”
Traditionally, even when code was messy, we could infer intent.
- We knew who wrote it
- We remembered the discussion
- We could reconstruct the reasoning from context
With AI-generated code, intent lives somewhere else.
It depends entirely on what instructions were given to the AI —
instructions that are usually not preserved.
So reviewers are left asking:
- Was this design intentional or accidental?
- Were other options considered?
- What constraints existed at the time?
The code doesn’t answer these questions.
As this repeats at scale, reviews slowly degrade into:
“Does this look obviously broken?”
Eventually, review itself stops scaling.
This is not a review problem — it’s a methodology problem
The real issue is bigger:
How do we keep evolving a 100k- or 1M-line codebase
over years, when most of the code is written by AI?
This is not about productivity.
It’s about control, maintenance, and long-term evolution.
“Just write plans and review them” — that’s only the entrance
Plan Stack is often summarized as:
“Write a plan first, commit it, and review the plan.”
That sounds like a review optimization trick.
It’s not.
That’s just the entrance.
The real value of Plan Stack
Plan Stack provides a structure that makes AI-driven development sustainable at scale.
More precisely, it addresses three needs that large, long-lived codebases
struggle to meet in the AI era.
Scale: letting humans review intent, not output
AI can generate unlimited code.
Humans cannot review unlimited details.
In large systems, the bottleneck is no longer implementation —
it’s human judgment.
By introducing plans as first-class artifacts,
humans review intent instead of raw diffs:
- What is in scope for this change?
- What is explicitly out of scope?
- Which trade-offs are being made this time?
This allows a small number of humans to stay in control,
even as AI-generated output grows by orders of magnitude.
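One way to picture a plan as a first-class, reviewable artifact is a small structured record of intent. This is only a sketch; the `Plan` type and its field names are assumptions for illustration, not part of Plan Stack itself.

```python
from dataclasses import dataclass, field

@dataclass
class Plan:
    """A reviewable statement of intent for one change (illustrative only)."""
    goal: str
    in_scope: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    trade_offs: list[str] = field(default_factory=list)

# A reviewer approves (or pushes back on) this record, not the raw diff.
plan = Plan(
    goal="Add retry logic to the payment client",
    in_scope=["exponential backoff", "idempotency checks"],
    out_of_scope=["circuit breaker (deferred)"],
    trade_offs=["accept duplicated config until the next refactor"],
)
```

The point of the structure is that each field maps to one of the review questions above: scope, exclusions, trade-offs.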
Maintainability: preserving the “why” over time
Six months later, the hardest question is never:
“What does this code do?”
It’s:
“Why is it like this?”
Plan Stack makes that answer explicit.
Not as comments.
Not as detached documentation.
But as a plan that is reviewed, committed,
and permanently associated with the change.
This is what allows future maintainers
— including your future self —
to re-enter the decision context quickly.
Continuous evolution: plans as accumulated knowledge
A plan is not a disposable note.
In a long-lived codebase,
plans accumulate as a decision history:
- Why this abstraction exists
- Why a shortcut was acceptable at the time
- Why a deeper refactor was deferred
When the next change comes,
those past plans become shared context
for both humans and AI.
As the codebase grows to hundreds of thousands or millions of lines,
this accumulated judgment becomes more valuable, not less.
Isn’t this just ADRs or design docs?
This question always comes up.
ADRs and design documents are excellent at capturing
big, infrequent decisions:
- Architecture choices
- Technology selection
- Long-term direction
But AI-driven development explodes the number of
small, local decisions:
- How much abstraction is enough for this change?
- Should we optimize now or accept some debt?
- Which edge cases are intentionally ignored?
These decisions shape the codebase,
but rarely make it into ADRs.
Plan Stack exists exactly at this missing layer.
The key difference: proximity to code
ADRs and design docs are distant from code:
- Rarely updated
- Easy to drift out of sync
- Not part of the PR review loop
Plans, in contrast:
- Are written per PR
- Are reviewed together with code
- Live and die with the change itself
Plans are not documentation.
They are part of the commit.
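As a sketch of what "part of the commit" could mean mechanically: a CI step might reject any change set that does not include a plan file. The `plans/` directory convention and the function name here are hypothetical, not something Plan Stack prescribes.

```python
def changeset_has_plan(changed_files: list[str], plan_dir: str = "plans/") -> bool:
    """Return True if the change set includes at least one plan file.

    `plan_dir` is a hypothetical convention: one markdown plan per PR,
    committed alongside the code it governs.
    """
    return any(
        path.startswith(plan_dir) and path.endswith(".md")
        for path in changed_files
    )

# A change that ships code and its plan together passes the gate.
changed = ["src/payment/client.py", "plans/retry-payment-client.md"]
assert changeset_has_plan(changed)

# A code-only change would be rejected by such a gate.
assert not changeset_has_plan(["src/payment/client.py"])
```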
Is the plan for humans or for AI?
At first glance, plans look like instructions for AI.
They do help AI produce more stable output.
But that’s not the point.
AI can write code without plans.
Humans can’t judge AI-written code without them.
The real beneficiary of Plan Stack is the human.
Plans as externalized human reasoning
AI-generated code removes the human “thinking trace” from the artifact.
Plans restore it.
They externalize:
- What was decided
- What was deferred
- Where compromises were made
This preserved judgment is what makes large AI-written codebases
maintainable instead of merely functional.
Plan as discipline: how humans retake control
This process is not just a workflow improvement.
It is a form of discipline in AI-driven development.
The simple rule —
“don’t let AI write code immediately” —
creates a crucial buffer.
That buffer pulls control back to the human side.
1. Shift-left review: intervene at the cheapest point
In traditional development,
post-implementation code review
was the last line of defense for quality.
In the AI era, reviewing thousands of generated lines after the fact
is not realistic.
Having the AI write a plan first, and letting humans review that plan,
means intervening where mistakes are cheapest to fix.
This prevents development from turning into
a slot machine of “generate → patch → regenerate”.
2. Reducing cognitive load, focusing on decisions
Writing a plan from scratch is cognitively expensive for humans.
But having the AI produce a draft plan changes the equation.
Humans no longer start from a blank page —
they review, adjust, and approve.
This lets humans focus on the highest-value activity:
decision making.
- AI: expands the space — options, dependencies, edge cases
- Humans: narrow it — scope, trade-offs, priorities
3. The plan as a contract
Once a plan is reviewed and agreed upon,
code generation becomes contract execution.
If the output looks wrong, the question becomes clear:
- Was the plan ambiguous?
- Or did the AI fail to execute it?
Spending more time reviewing the plan
often results in less time spent coding and debugging overall.
This is how Plan Stack achieves both scale and control.
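If a plan names which paths are in scope, "contract execution" can even be checked mechanically: flag any changed file the plan never claimed. A minimal sketch, assuming plans declare scope as path prefixes (an illustrative convention, not part of Plan Stack).

```python
def out_of_contract(changed_files: list[str], declared_scope: list[str]) -> list[str]:
    """Return changed files not covered by any declared scope prefix.

    Declaring scope as path prefixes is an assumed convention
    for this sketch.
    """
    return [
        path for path in changed_files
        if not any(path.startswith(prefix) for prefix in declared_scope)
    ]

scope = ["src/payment/", "tests/payment/"]
changed = ["src/payment/client.py", "src/billing/invoice.py"]

# The billing change was never in the plan: either the plan was
# ambiguous, or the AI drifted outside the contract.
print(out_of_contract(changed, scope))  # → ['src/billing/invoice.py']
```

A non-empty result points the review at exactly the question the text raises: was the plan ambiguous, or did the execution drift?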
The future role of humans: the decision-maker
In the AI era, human roles converge toward one thing:
Making decisions under constraints.
- What matters now?
- What can wait?
- What risks are acceptable?
AI executes.
Humans decide.
Ownership without authorship
When AI writes most of the code, authorship becomes blurry.
Ownership doesn’t.
Ownership comes from recorded judgment.
If decisions are preserved,
the codebase remains human-owned —
even when AI-written.
Without plans, the future looks like this
Not “unmaintainable” code — worse.
Code that is untouchable.
No one dares to modify existing logic,
so new behavior gets added next to it.
The same concept ends up implemented three times,
each slightly different,
because no one knows which constraints still matter.
The system grows, but it doesn’t evolve.
Conclusion
AI writes code.
Humans decide.
And decisions must survive longer than any single implementation.
Plan Stack is a minimal structure
that allows humans to remain decision-makers —
even as AI becomes the primary author.
A small invitation to try
If this sounds heavy, start small.
In your next PR, write a short plan first:
what you decided, and what you intentionally didn’t.
It doesn’t need to be perfect.
Just make the judgment explicit — and commit it with the code.
You’ll notice the shift immediately:
AI-assisted development stops being generation
and starts becoming control.