DEV Community

Nhường Phan
We've Been Thinking About AI Code Wrong. Code Isn't Flat — It's Layers.

I've been vibe coding almost every day for the past year. Claude, GPT, Copilot — they're all incredible. I can go from idea to working app in hours instead of weeks.

But I keep hitting the same wall.

And I think the wall isn't AI. The wall is code itself.


The $200 Bug

Last month I was building a payment feature. Minute 1, I told AI:

"Balance can never go negative."

AI wrote the check. Clean code. Looked perfect.

30 minutes later, I asked AI to add a bulk transfer feature. The context window was getting full. AI generated new code — and somewhere in that new code, it created a path where balance could go to -$200.

AI didn't hallucinate. It didn't write bad code. It literally didn't have my constraint in context anymore. The constraint existed in my prompt from 30 minutes ago — which had already scrolled out of the window.

The constraint lived in 1 line of English. Then it disappeared.

I fixed it. Added a test. Moved on. But it kept bugging me.

Then I realized: this isn't an AI problem. This is a code problem.


Why Code Is Broken (For AI)

Think about how your brain works when someone says "build a money transfer feature":

"Transfer money between two users"
  → ok, first validate, then transfer, then save
    → validate means: check users are active, check balance
      → check balance means: sender has enough money
        → that means: if sender.balance < amount, reject

You think in layers. From vague to specific. Each layer only has 3-5 concepts. Your brain naturally decomposes the problem.

But when you write code? All layers collapse into one flat file.

# This is what your brain thought:
#   Layer 0: Transfer money safely
#   Layer 1: Validate → Execute → Save
#   Layer 2: Check sender active, check balance
#   Layer 3: if sender.balance < amount: raise Error
#
# This is what survived:

def transfer_money(sender, receiver, amount):
    if sender.status != "active":
        raise InvalidUserError(user_id=sender.id)
    if sender.balance < amount:
        raise InsufficientBalanceError(...)
    new_balance = sender.balance - amount
    # ... 50 more lines

The why disappeared. Only the how survived.

Layer 0 ("transfer money safely") became a function name.
Layer 1 ("validate, then execute, then save") became... nothing. Maybe a comment if you're lucky.
Layers 2 and 3 became the actual code.

When AI works on line 50, it has no idea what you decided at "layer 0." The architectural intent is gone. The constraints are gone. The reasoning is gone.

Code is a lossy compression of thought.


What If the Layers Don't Disappear?

This is the idea I can't stop thinking about.

What if you write exactly how you think — starting broad, getting more specific — and every layer IS the code?

"Transfer money safely"

  "Validate both users"
    "Sender must be active, otherwise reject"
    "Balance must be sufficient, otherwise reject"

  "Execute the transfer"
    "New sender balance = sender balance minus amount"
    "New receiver balance = receiver balance plus amount"

  "Save and return result"
    "Atomically save both balances to database"
    "Return result with updated users"

Read that again.

  • "Transfer money safely" is what a product owner writes in a ticket.
  • "Sender must be active" and "Balance must be sufficient" are what a business analyst writes in a spec.
  • "New sender balance = sender balance minus amount" is what a developer writes in code.
  • "Atomically save both balances to database" is what a DBA cares about.

Same file. Same syntax. Same language. Every line is English. Every line is code.

The depth is unlimited. Simple feature? 2 layers. Banking compliance feature? 10 layers. Each layer only adds the detail that layer needs.
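To make "every layer IS the code" concrete, here's a toy sketch of how a tool might store a layered description as a tree. Everything here (`Step`, `refine`, `depth`) is my own illustration, not an API from any real system:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One English sentence at one layer of the description."""
    text: str
    children: list["Step"] = field(default_factory=list)

    def refine(self, *texts: str) -> "Step":
        # Add more-specific child steps one layer down
        self.children.extend(Step(t) for t in texts)
        return self

    def depth(self) -> int:
        # A simple feature stays shallow; depth grows only where detail is needed
        return 1 + max((c.depth() for c in self.children), default=0)

spec = Step("Transfer money safely")
spec.children.append(
    Step("Validate both users").refine(
        "Sender must be active, otherwise reject",
        "Balance must be sufficient, otherwise reject",
    )
)

print(spec.depth())  # → 3
```

The point of the tree shape: nothing forces all branches to the same depth, so a trivial step can stop at layer 1 while a compliance-heavy step keeps refining.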


The Key Insight: Constraints Flow Downward

Here's where it gets interesting.

When you write "balance must never be negative" at layer 0 — what happens to that constraint at layer 3, where the arithmetic actually happens? In normal code, it disappears. Someone has to remember it. AI has to have it in context.

But what if the system remembers it?

Layer 0: "Transfer money safely"
         constraint: "balance must never be negative"     ← written once

Layer 1: "Execute the transfer"
         constraint: "balance must never be negative"     ← auto-inherited

Layer 2: "Calculate new sender balance"
         constraint: "balance must never be negative"     ← auto-inherited

Layer 3: new_balance = sender.balance - amount
         system check: can this violate "balance >= 0"?
         → YES, if sender.balance < amount
         → ERROR: "Constraint 'balance never negative' may be violated.
                   Counterexample: balance=100, amount=150 → new_balance=-50"

The constraint flows DOWN through every layer automatically. The system catches violations mathematically — not through tests, not through AI memory, but through structural inheritance.
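A toy Python version of that downward flow might look like this. The names (`Node`, `inherited`, `find_counterexample`) are mine, and the brute-force search is a stand-in for the real mathematical check:

```python
from dataclasses import dataclass, field
from itertools import product

@dataclass
class Node:
    text: str
    constraints: list = field(default_factory=list)

    def inherited(self, parent_constraints=()):
        # DOWN: every ancestor constraint applies here too
        return list(parent_constraints) + self.constraints

never_negative = lambda state: state["balance"] >= 0

root = Node("Transfer money safely", [never_negative])  # written once
leaf = Node("Calculate new sender balance")             # nothing written here
active = leaf.inherited(root.inherited())               # auto-inherited

def find_counterexample(step, constraints, values):
    # Try small concrete inputs; return the first pair that breaks a constraint
    for balance, amount in product(values, repeat=2):
        state = step({"balance": balance, "amount": amount})
        if not all(c(state) for c in constraints):
            return balance, amount
    return None

subtract = lambda s: {**s, "balance": s["balance"] - s["amount"]}
print(find_counterexample(subtract, active, [0, 50, 100, 150]))  # → (0, 50)
```

The leaf never declared anything, yet the check at the leaf still fires, because the constraint list it runs against was assembled from its ancestors.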

I call this Constraint Inheritance Chain (CIC):

  1. DOWN: Parent constraints automatically apply to ALL children at any depth
  2. UP: When a child step is verified correct, its result becomes a fact the parent can use
  3. ACROSS: Output from step 1 is automatically available to step 2
  4. DIAGONAL: Type definitions inject constraints everywhere they're used

That last one is powerful. Watch:

"WalletBalance is a number that is never negative"    ← define once

"User has a balance (WalletBalance)"                  ← use the type

"Transfer money from sender (User)"                   ← use User

The system automatically knows: sender.balance >= 0. Nobody wrote that constraint for transfer_money. It was inferred from the type definition, through the record field, into the function scope.

In my prototype, developers write ~5 explicit constraints, and the system verifies ~16 total — because 69% are auto-inferred from type definitions.
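Here's a minimal sketch of that DIAGONAL inference: a constraint declared on a type once, then collected automatically wherever the type appears through record fields. The registry shape (`TYPE_CONSTRAINTS`, `FIELDS`, `constraints_for`) is purely illustrative:

```python
TYPE_CONSTRAINTS = {}  # type name -> list of constraint descriptions
FIELDS = {}            # record type -> {field name: field type}

def constrained(type_name, rule):
    # "WalletBalance is a number that is never negative" — define once
    TYPE_CONSTRAINTS[type_name] = [rule]

constrained("WalletBalance", "value >= 0")
FIELDS["User"] = {"balance": "WalletBalance"}  # "User has a balance (WalletBalance)"

def constraints_for(param_type):
    """Walk record fields and pull in every type-level constraint."""
    out = list(TYPE_CONSTRAINTS.get(param_type, []))
    for field_name, field_type in FIELDS.get(param_type, {}).items():
        out += [f"{field_name}.{c}" for c in constraints_for(field_type)]
    return out

# transfer_money(sender: User) inherits the rule with no extra annotation:
print(constraints_for("User"))  # → ['balance.value >= 0']
```

Nobody wrote a constraint on `User` or on `transfer_money`; it rode in through the field type, which is exactly the claimed 69% of auto-inferred constraints.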


Same Logic, Three Styles

One thing I'm exploring: what if the same code can be written in multiple styles? Product owners, developers, and AI all think differently. But the underlying logic is the same.

Document style (product owner writes this):

transfer money safely
  given sender is a User, receiver is a User, amount is a PositiveAmount
  producing TransferResult or InsufficientBalanceError

  requires that sender is active
  requires that receiver is active
  must always satisfy balance is non-negative
  guarantees that sender balance decreases by exactly the amount

Code style (developer writes this):

do transfer money safely
  from sender:User, receiver:User, amount:PositiveAmount
  -> TransferResult or InsufficientBalanceError

  promise before: sender.status is "active"
  promise before: receiver.status is "active"
  promise always: sender.balance >= 0
  promise after: result.sender.balance is old(sender.balance) - amount

Same parse tree. Same verification. Same output. The system doesn't care which style you use. AI tends to use document style naturally (because that's how language models think). Developers can use code style if they prefer.
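"Same parse tree" is easy to sketch: both surface styles can normalize to the same internal node before verification ever runs. This `parse_line` is a hypothetical fragment of such a front end, not the prototype's actual parser:

```python
import re

def parse_line(line: str):
    """Map either style's constraint line to a (phase, condition) pair."""
    doc_keywords = {
        "requires that": "before",
        "guarantees that": "after",
        "must always satisfy": "always",
    }
    # Document style: "requires that sender is active"
    for kw, phase in doc_keywords.items():
        if line.startswith(kw):
            return (phase, line[len(kw):].strip())
    # Code style: "promise before: sender.status is 'active'"
    m = re.match(r"promise (before|after|always):\s*(.+)", line)
    if m:
        return (m.group(1), m.group(2))
    return None

a = parse_line("requires that sender is active")
b = parse_line("promise before: sender is active")
print(a == b)  # → True: both yield ('before', 'sender is active')
```

Once both styles collapse to the same `(phase, condition)` node, everything downstream — inheritance, verification, codegen — is style-blind by construction.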


What Gets Generated

When all layers are detailed enough, the system compiles to real, runnable code:

# Auto-generated from the layered description above
# (assumes `db`, `user_repo`, and the domain types are already in scope)
from dataclasses import replace

async def transfer_money_safely(
    sender: User,
    receiver: User,
    amount: PositiveAmount,
) -> TransferResult:
    """Transfer money safely between two users."""

    # --- contracts ---
    assert sender.status == "active", "sender must be active"
    assert receiver.status == "active", "receiver must be active"
    _old_sender_balance = sender.balance

    # --- validate both users ---
    if sender.status != "active":
        raise InvalidUserError(user_id=sender.id)
    if receiver.status != "active":
        raise InvalidUserError(user_id=receiver.id)
    if sender.balance < amount:
        raise InsufficientBalanceError(
            current_balance=sender.balance,
            requested_amount=amount)

    # --- execute the transfer ---
    new_sender_balance = sender.balance - amount
    new_receiver_balance = receiver.balance + amount

    # --- save and return result ---
    async with db.transaction():
        await user_repo.update(sender.id, balance=new_sender_balance)
        await user_repo.update(receiver.id, balance=new_receiver_balance)

    result = TransferResult(
        sender=replace(sender, balance=new_sender_balance),
        receiver=replace(receiver, balance=new_receiver_balance),
        amount=amount)

    assert result.sender.balance == _old_sender_balance - amount
    return result

Plus auto-generated tests:

@pytest.mark.asyncio
async def test_transfer_rejects_inactive_sender():
    """Violate: requires that sender is active"""
    sender = User(status="suspended", balance=1000)
    receiver = User(status="active", balance=0)
    with pytest.raises(InvalidUserError):
        await transfer_money_safely(sender, receiver, 100)

@pytest.mark.asyncio
async def test_transfer_rejects_insufficient_balance():
    """Violate: balance must be sufficient"""
    sender = User(status="active", balance=50)
    receiver = User(status="active", balance=0)
    with pytest.raises(InsufficientBalanceError):
        await transfer_money_safely(sender, receiver, 100)

@pytest.mark.asyncio
async def test_transfer_happy_path():
    """Verify: sender balance decreases by amount"""
    sender = User(status="active", balance=1000)
    receiver = User(status="active", balance=0)
    result = await transfer_money_safely(sender, receiver, 200)
    assert result.sender.balance == 800

From layers → verified correct → Python + tests. Automatically.


Why This Matters For AI Coding

Current AI coding flow:

idea → AI writes code → hope it's correct → test → find bugs → fix → repeat

With layers:

idea → describe in layers → system verifies consistency → generate code

The difference:

  • AI never sees more than 3-5 concepts per layer. No context window overflow.
  • Constraints are structural, not contextual. System remembers them even when AI's context window is full.
  • Verification happens before code generation. Bugs caught at description time, not runtime.
  • Every layer is readable English. Code review = reading a document.

The Uncomfortable Question

Here's what I keep coming back to:

Do we actually want to "write code"?

Or do we want to describe behavior at increasing levels of detail until it's precise enough to run?

Because if it's the second one... then maybe the right tool isn't a better code editor, or a better AI model, or better prompting.

Maybe the right tool is a language where description IS code. Where the document you write first is the source of truth. Where layers don't collapse. Where constraints don't disappear. Where AI and humans work at whatever level of detail they need.

I've been prototyping this — a language where you write structured English that decomposes layer by layer until it's precise enough to compile. Where the system mathematically verifies that every layer is consistent with every other layer. Where the generated code is guaranteed to match your description.

The tagline I keep coming back to:

Write a document. Run it as code. It can't be wrong.

Still early. Still rough. But every time I use it, I can't go back to flat code.

Curious if this resonates — or if I'm overthinking it. Have you hit the "flat code" wall with AI? How do you deal with constraints disappearing mid-session?


If you're interested in following this project, I'll be sharing updates as the prototype matures. Drop a comment or follow — I'd love to hear your experiences with AI coding pain points.
