DEV Community

keeper
keeper

Posted on

I Spent a Year Building an AI Verification Framework. Then I Found a Hole.

I Spent a Year Building an AI Verification Framework. Then I Found a Hole.


I spent a year writing an AI verification framework: L1 Rules → L2 Feedback → L3 Self-Consistency & Causality → L4 Framework Calibration.

Four layers, stacking up. From "is the output correct" to "is the framework itself reasonable."

I thought that was complete.

Then I read about a case study Ilya Sutskever mentioned in a recent interview. Not a paper. Not a technical talk. A clinical neuroscience case. It made me realize there's a layer underneath everything I built — I'd been checking whether AI produces correct results, but I never asked whether the thing was worth doing in the first place.


The story.

A man suffered brain damage and lost all his emotions. No sadness. No anger. No excitement. Sounds ideal — pure rationality, no emotional bias.

What happened? He spent three hours picking out socks. Lost everything in the stock market. His IQ tests were completely normal — he could compute, reason, analyze — but he couldn't decide.

Damasio's Somatic Marker Hypothesis explains it: your body comes with a pre-installed evaluation system. You see two options, your body reacts first — heart rate shifts, skin conductance flickers, stomach tightens or relaxes — and before you've "started thinking," the options are already tagged: this one's good, that one's not.

The brain damage didn't cut off feeling. It severed the tagging pathway. The patient's body still worked, but the signals couldn't reach the decision center. A and B looked identical — blank white noise. So he had to reason every single thing from scratch.

Your brain never reasons its way through every decision. It runs on "this feels right" and "this feels wrong" — then finds reasons to justify the feeling.

Ilya mapped this onto AI: LLMs have knowledge and reasoning, but they lack a built-in value system.


The missing layer.

My L1 checks rules. L2 checks results. L3 checks logic. L4 checks the framework. All of them check "is it right."

AI can do all of this, often better than humans. But none of them ask "should we" — is this worth doing? Should we go this way? Is this question worth our time?

I'm calling this L0 — The Value Layer. Not below L1-L4. In front of them. "Should we" comes before "is it right."

AI doesn't answer "should we." Not because it doesn't know how. Because it doesn't even perceive the question exists. The next-token prediction paradigm has no dimension for "is this worth doing." If it's not in the paradigm, it won't emerge.

That's why competition-grade AI writes flawless solutions, then makes boneheaded mistakes in real projects. Not a knowledge gap. Not a reasoning gap. There's no "this doesn't feel right" pathway. The knowledge tank is full. The "is it worth it" dimension is empty.


This hole isn't in just one framework.

I went back through everything I'd written. It cuts through every single one.

The Five-Layer OS — every layer needs a value judgment to operate. L0 (embodied) has no somatic markers — knows how to move but not where. L1 (app) can generate features but can't judge whether to build them. L2 (SE) can architect systems but doesn't know if the direction is right. L3 (meta-domain) can analyze but never picks a direction. L4 (meta-cognition) can reflect but doesn't know what to reflect on. The Five-Layer OS maps capability boundaries between humans and AI. What it doesn't show: there's a value-judgment line running through all five layers, and every single one stops at it.

The Mastery Framework (学透) — I explained why most people get stuck at Level 1 ("got it running but no further") using Peck's delayed gratification. This case gave me a deeper answer: deconstructing has no immediate somatic marker reward. Getting it running does. It's not willpower. Your body didn't give the signal.

Three Things AI Cannot Replace — I thought I was listing things AI technically couldn't do. Now I see they share the same structure: all three require a value function to drive them. It's not that AI tech isn't good enough. It's that the architecture has no "worth doing" dimension.


How do you fill this?

Three directions. Not solutions. Research directions.

One: Stage-based developmental training. Not one-shot. Sensitive-period-based. Each layer's teacher signal comes from a different source. L0 from physics. L1 from social feedback. L2 from multi-agent interaction. L3 from meta-learning override. Each layer has its own window. Upper layers can override lower layers but cannot delete them.

Two: Multi-agent persistent environment. Social feedback needs others. MuJoCo can teach walking but not reputation, because simulators have no "they remember you cheated" mechanism. 20-50 agents sharing a space without resets — that's how deception gets a cost and cooperation gets a payoff.

Three: Meta-learning override mechanism. Each value tag carries a counter. Counterexamples accumulate past a threshold → trigger re-evaluation. Not deleting old labels. Adding conditional judgment — "under what conditions does the old intuition no longer apply."


But there's another path.

Facebook's CICERO went a completely different direction — pure RL, no explicit value design. It spontaneously learned cooperation, deception, promise-keeping. Behavior closely matched humans.

So I set a falsification condition:

If by 2028, pure RL builds an equivalent value-judgment system, the conclusion of this essay is invalid.

Not a prediction. A door left open.


This isn't an answer. It's an interruption.

This essay isn't the answer. It's an interruption — I spent a year building capability frameworks, then found a hole. Not "I fixed it." "I found a hole."

AI verification doesn't just need better tools. It needs to know which problems are worth verifying. The Five-Layer OS doesn't just need capability mapping. It needs the one question before every layer starts: "Is this direction right?"

That question comes from L0. And what L0 needs, the current AI architecture can't provide.

I don't have the answer for this layer yet. But at least I know where it is.


This essay is part of the AI Capability Framework series. Other essays:

Top comments (0)