Why LLMs Break Systems — and Why That’s Our Fault
#blogPostAsWebApp: https://voku.github.io/llm_coding_constraints/
Introduction — Incentives, Not Rebellion
Imagine teaching a toddler to draw.
Every time they produce a recognizable shape on paper, you praise them:
- “Great job.”
- “Nice colors.”
- “Very creative.”
But you never articulate the one sentence that actually matters:
“You are not allowed to draw on the wall.”
A week later, your hallway looks like a prehistoric art gallery.
The child didn’t rebel.
The child followed incentives.
You rewarded the output (drawing) and forgot to define the boundary (the paper).
That is exactly how we train and deploy Large Language Models today.
We reward:
- helpfulness
- fluency
- clean abstractions
- elegant refactorings
But we almost never encode the other half of the equation:
- What must never change?
- What is forbidden, even if it looks reasonable?
- What exists solely because production burned down three years ago?
So the model generalizes.
If drawing is good and nobody mentioned walls, then the wall is just a larger canvas.
This post follows one red line, end to end: the missing constraint.
We’ll walk through:
- why transformers are architecturally simple
- why history matters more than architecture
- why rules must be explicit, owned, and test-backed
- and why we’re now papering over the gap with agents, prompts, and skills
The Important Realization (State It Early)
Before history, before code, here’s the punchline:
Modern LLMs are conceptually simple.
Their power — and their danger — comes from scale, data, and missing constraints.
Just like parenting.
Teaching rules is hard.
Handing out crayons is easy.
A Readable History (So We Can Argue About It)
This section exists to kill the “AI magic” narrative.
Nothing here is mystical. None of this is new.
1906 — Probability Before Intelligence
Long before silicon, the core idea already existed:
“Given what I’ve seen so far, what is the most likely next thing?”
That’s a Markov chain.
- no understanding
- no meaning
- no intent
Just probability conditioned on context.
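To make that concrete, here is a toy bigram Markov chain in a few lines of Python; the corpus and the sampling loop are invented purely for illustration:

import random
from collections import defaultdict

corpus = "the cat sat on the mat the cat sat".split()

# count which word follows which: P(next | current) as raw observation counts
follows = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current].append(nxt)

# generate: repeatedly pick a plausible next word given only the current one
word, output = "the", ["the"]
for _ in range(5):
    word = random.choice(follows[word])
    output.append(word)

print(" ".join(output))  # e.g. "the cat sat on the mat"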
This is the child repeating a word because they heard it last.
No wall.
Just imitation.
1940s–1950s — Tiny Brains, Explicit Boundaries
Early neural models (McCulloch-Pitts neurons and, later, perceptrons) were brutally honest.
At their core:
if (weighted_sum > threshold) {
    fire();
}
No creativity.
No abstraction.
No crayons on the wall.
Ironically, these early models had clearer boundaries than modern ones.
The wall was absolute: this fires, that doesn’t.
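If you want to see that run, here is a minimal Python sketch; the weights and threshold are arbitrary placeholders, not anything learned:

# a single artificial neuron: a weighted sum compared against a hard threshold
def fires(inputs, weights, threshold):
    return sum(w * x for w, x in zip(weights, inputs)) > threshold

print(fires([1.0, 0.0, 1.0], weights=[0.4, 0.9, 0.3], threshold=0.5))  # True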
1990s — Memory Appears
Then we realized something important:
Context matters.
With LSTMs, earlier inputs could influence later outputs.
Sequences became meaningful.
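A deliberately condensed sketch of one LSTM step (NumPy only, single example; names, sizes, and random weights are made up for illustration) shows how the cell state carries earlier inputs forward:

import numpy as np

def lstm_step(x, h, c, W, U, b):
    # one LSTM step: gates decide what old memory to keep and what new input to store
    z = W @ x + U @ h + b                      # pre-activations for the four gates
    i, f, o, g = np.split(z, 4)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c = sig(f) * c + sig(i) * np.tanh(g)       # cell state: old memory plus new candidate
    h = sig(o) * np.tanh(c)                    # visible output, filtered by the output gate
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4
W, U, b = rng.normal(size=(4 * H, D)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
h = c = np.zeros(H)
for x in rng.normal(size=(5, D)):   # each input updates h and c, so early inputs still matter later
    h, c = lstm_step(x, h, c, W, U, b)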
This is the child remembering:
“Last time I drew on the table, mom was angry.”
Still:
- the rule is external
- the model didn't invent it
- it just remembers the consequence
2017 — The Transformer (The Code, Not the Myth)
This is where explanations usually collapse into mysticism.
So let’s not do that.
Here is the uncomfortable truth:
The transformer architecture is embarrassingly small.
Conceptually, a transformer block is just linear algebra and normalization:
# 1. Token embeddings
x = embed(tokens)
# 2. Self-attention
q = x @ Wq
k = x @ Wk
v = x @ Wv
# Attention: softmax(QKᵀ / √d) · V
attn = softmax(q @ k.T / sqrt(d))
x = attn @ v
# 3. Feed-forward
x = relu(x @ W1) @ W2
That’s it.
There is:
- no business logic module
- no ethics layer
- no domain model
- no notion of "this is illegal"
If you’re comfortable with matrix multiplication, you understand the engine.
The complexity is not inside the transformer.
The complexity is in the data, feedback loops, and scale around it.
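If you want to verify that for yourself, here is a runnable single-head sketch in NumPy; it skips multi-head projections, masking, residual connections, and layer norm, and uses random weights purely as placeholders:

import numpy as np

rng = np.random.default_rng(0)
seq_len, d, d_ff = 4, 8, 32              # toy sizes

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = rng.normal(size=(seq_len, d))        # stand-in for embedded tokens
Wq, Wk, Wv = [rng.normal(size=(d, d)) for _ in range(3)]
W1, W2 = rng.normal(size=(d, d_ff)), rng.normal(size=(d_ff, d))

# self-attention: every position looks at every other position
q, k, v = x @ Wq, x @ Wk, x @ Wv
attn = softmax(q @ k.T / np.sqrt(d))
x = attn @ v

# position-wise feed-forward
x = np.maximum(0, x @ W1) @ W2

print(x.shape)  # (4, 8): same shape in, same shape out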
Where the Metaphor Stops Being Cute
Transformers are exceptional generalizers.
That is their superpower — and their flaw.
If you teach a model:
- “Clean code is good.”
- “Duplication is bad.”
- “Simplify logic.”
The model concludes:
“Simplify everything.”
So it removes a check that looks redundant.
What it doesn't know:
- this check exists because of a lawsuit in 2019
- this branch prevents a race condition we hit once
- this "ugly" code is contractual
Patterns vs. Rules
Across Markov → LSTM → Transformer, one invariant holds:
We taught models patterns, not rules.
Patterns scale.
Rules constrain.
Children need both.
So do LLMs.
The Wall Was Never Learned
An LLM trained on your codebase sees:
- the final snapshot
- the cleaned-up version
- the happy path
It does not see:
- reverted commits
- production outages
- 2 a.m. Slack threads
- “never do this again” post-mortems
That knowledge lives:
- in git history
- in tests
- in annotations
- in human memory
Not in tokens.
Your Git History Is Parenting
Git is not just version control.
It is a decision log.
Every hotfix and revert is negative knowledge.
Example:
if ($timeout < 30) {
    $timeout = 30;
}
LLM interpretation:
Magic number. Cleanup candidate.
Git blame interpretation:
We tried 10. Production burned. 30 survived.
That if statement is the wall.
But the model can’t see it.
Rules Are Not Comments — They Are Contracts
Most teams try to fix this with:
- comments
- prompts
- “be careful here” notes
That fails.
Comments are:
- optional
- unverifiable
- easy to delete
- invisible to tools
A real rule answers:
- why does this exist?
- who owns it?
- how critical is it?
- how is it proven?
A critical rule without executable proof is just a suggestion.
The Final Example — Drawing the Wall Properly
1. Name the Rule
Stop relying on folklore.
Give the rule an identity.
enum BillingRules: string
{
    case RefundLimit = 'REFUND_LIMIT_CRITICAL';
}
2. Define the Intent (Not the Logic)
return [
    BillingRules::RefundLimit->value => new RuleDefinition(
        statement: 'Refunds above 500 EUR require manual approval',
        tier: Tier::Critical,
        rationale: 'Fraud prevention and regulatory requirements (2021 Audit)',
        owner: 'Team-Billing',
        verifiedBy: [RefundLimitTest::class],
    ),
];
No conditionals.
No duplication.
Just why.
3. Attach the Rule to the Code
final class RefundService
{
    #[Rule(BillingRules::RefundLimit)]
    public function refund(Order $order): void
    {
        if ($order->amount > 500) {
            throw new ManualApprovalRequired();
        }
    }
}
Zero runtime cost.
Maximum semantic weight.
4. The Test Is the Concrete Wall
final class RefundLimitTest extends TestCase
{
    public function testRefundAboveLimitRequiresManualApproval(): void
    {
        $this->expectException(ManualApprovalRequired::class);

        // If an LLM (or human) removes the check,
        // this test fails. The wall holds.
        (new RefundService())->refund(Order::fake(amount: 600));
    }
}
Remove the test → the wall disappears.
Break the rule → CI fails.
This is enforced memory.
Why We’re Now Talking About AGENTS.md, Prompts, and Skills
Here’s the twist that makes everything above unavoidable.
We are suddenly adding:
- AGENTS.md
- role descriptions
- skill boundaries
- “this agent may / may not” rules
Not because LLMs changed.
But because we trained them without our past.
LLMs were trained on:
- final snapshots
- cleaned-up repositories
- best practices
- success stories
They were not trained on:
- git history
- post-mortems
- reverted ideas
- hacks that hold entire systems together
In other words:
Happy path in. Happy path out.
Agent docs and skill descriptions are not AI features.
They are manual history injection.
They answer questions the model can never infer:
- Where does optimization stop?
- Which invariants override cleanup?
- When must the agent refuse?
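A hypothetical AGENTS.md fragment, reusing the billing rule from the example above (the wording is invented here; there is no standard schema), might encode exactly that:

Constraints for coding agents:
- Never remove or weaken a check annotated with #[Rule(...)]; change the linked test first, then the code.
- The refund threshold in RefundService is governed by REFUND_LIMIT_CRITICAL; ask Team-Billing before touching it.
- When a change conflicts with a Tier::Critical rule, stop and ask instead of "simplifying".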
This is not babysitting.
This is us finally writing down what humans “just knew”.
The takeaway:
- LLMs are not magical; they are pattern matchers
- Transformers are conceptually small
- Scale amplifies mistakes
- A missing "no" is interpreted as "yes"
Or, more bluntly:
We didn’t create a monster.
We optimized the crayons and forgot the wall.
Call to Action
Pick one critical service today and ask:
What must never change here?
Why does this ugly code exist?
Where is that knowledge written down?
If the answer is “in people’s heads”,
your wall is imaginary.
Start drawing it.
The wall was always there.
We just never wrote it down.