Stop Asking 'Is GAI Here' — Ask 'At What Layer'
The GAI debate has a structural problem.
Someone says "passing this benchmark means GAI." A model passes it. Then they say "that benchmark wasn't hard enough." The goalpost moves.
Someone says "passing the Turing test means GAI." Models pass it. Then they say "the Turing test is too easy." The goalpost moves again.
Someone says "inventing new mathematics means GAI." Models do it. Then they say "that's just pattern matching in disguise." The goalpost moves yet again.
This isn't bad faith. It's a missing layer definition.
We never agreed on what "general" means. Without that, every achievement gets reclassified as "not really general."
I've been working on a framework that might fix this. It started as a capability map. Then I realized: this isn't just a map. It's a GAI maturity model.
The Five Layers
| Layer | Name | Definition |
|---|---|---|
| L0 | Embodied | Perceive and operate in the physical world |
| L1 | Application | Complete single-domain tasks using tools |
| L2 | Engineering | Build and maintain systems |
| L3 | Meta-Domain | Abstract and transfer between unrelated domains |
| L4 | Meta-Cognition | Perceive and control your own thinking process |
The rule: layers cannot be skipped. It's a maturity sequence, not a checklist.
This immediately explains the goalpost problem: some people define GAI as L1. Others define it as L4. They're using different layers for the same word.
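A minimal sketch of the no-skipping rule, assuming a system's results are recorded as a set of passed layers. The `Layer` enum and the `maturity` function are hypothetical names used for illustration, not part of any existing tool:

```python
from enum import IntEnum


class Layer(IntEnum):
    """The five layers from the table above (names are illustrative)."""
    EMBODIED = 0        # L0
    APPLICATION = 1     # L1
    ENGINEERING = 2     # L2
    META_DOMAIN = 3     # L3
    META_COGNITION = 4  # L4


def maturity(passed):
    """Return the highest layer reached without skipping any, or None.

    `passed` is the set of layers a system has demonstrated. Counting stops
    at the first missing layer, so a pass above a gap does not count.
    """
    reached = None
    for layer in Layer:  # iterates in definition order, L0 to L4
        if layer not in passed:
            break
        reached = layer
    return reached


# L0 to L2 demonstrated in order: maturity is L2.
in_order = maturity({Layer.EMBODIED, Layer.APPLICATION, Layer.ENGINEERING})
# L3 demonstrated but L2 missing: maturity stays at L1.
with_gap = maturity({Layer.EMBODIED, Layer.APPLICATION, Layer.META_DOMAIN})
```

Under this reading, a checklist would credit the second system with three layers, while the maturity sequence credits it only up to L1.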
What About Models Without Bodies?
L0 requires embodiment. Text-only models don't have bodies.
The cleanest answer: text-only LLMs have no L0. They start at L1, cognition without embodiment. This isn't a defect. It's an architectural difference.
Humans build up from L0 (a baby senses the world before understanding it). LLMs start at L1 (they understand the world directly, skipping physical experience). The result: humans can "feel" when something is wrong — that's L0 feeding signals up to L4. LLMs don't have this channel.
The framework forced me to face something uncomfortable: human intelligence cannot exist without a body.
Six Models, Five Layers
L0 — Embodied
| Model | Verdict |
|---|---|
| Gemini 3.1 Pro | ✅ Pass |
| GPT-5.5 | ✅ Pass |
| Claude Fable 5 / Mythos 5 | ✅ Pass |
| Claude Opus 4.8 | ✅ Pass |
| DeepSeek V4 Pro | ❌ Fail |
| GLM-5.2 | ❌ Fail |
L1 — Application
Every frontier model is solid at L1. Gaps are within 5% on AIME, GPQA, and HLE. This is not where differentiation lives anymore.
L2 — Engineering
| Model | SWE-bench Pro (%) | Verdict |
|---|---|---|
| Fable 5 / Mythos 5 | 80.3 | Dominant |
| Claude Opus 4.8 | 69.2 | Leading |
| GLM-5.2 | 62.1 | Strong |
| GPT-5.5 | 58.6 | Strong |
| DeepSeek V4 Pro | 55.4 | Good |
| Gemini 3.1 Pro | 54.2 | Good |
Fable 5's 80.3% is 11 points ahead of Opus 4.8. That's not an optimization gap — it's a generation gap.
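The gap arithmetic can be checked directly from the table. This is a small sketch that uses only the scores listed above; the variable names are illustrative:

```python
# SWE-bench Pro scores (%) copied from the L2 table above.
scores = {
    "Fable 5 / Mythos 5": 80.3,
    "Claude Opus 4.8": 69.2,
    "GLM-5.2": 62.1,
    "GPT-5.5": 58.6,
    "DeepSeek V4 Pro": 55.4,
    "Gemini 3.1 Pro": 54.2,
}

# Rank models from highest to lowest score.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
leader_name, leader_score = ranked[0]

# Gap to the leader in percentage points, rounded to one decimal place
# to hide floating-point noise.
gaps = {name: round(leader_score - score, 1) for name, score in ranked[1:]}

for name, gap in gaps.items():
    print(f"{name}: {gap} points behind {leader_name}")
```

The first line printed is the 11.1-point gap to Claude Opus 4.8 that the text rounds to 11.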
L3 — Meta-Domain
There is no benchmark for L3. Mythos 5 shows the strongest signal: autonomous work across three unrelated domains (protein design, genomics, and cybersecurity). Its genomics result outperformed a Science-published model despite being 100x smaller.
The biggest gap isn't model capability. It's that nobody has built a benchmark for L3.
L4 — Meta-Cognition
All models: no evidence. No model can accurately describe its own reasoning process in real time, and nobody in the industry is targeting this capability.
What This Means
If GAI = L1 or L2, we're already there.
If GAI = L3, we don't know — no benchmark exists to verify it.
If GAI = L4, we're not close — and nobody is aiming for it.
The GAI debate isn't one debate. It's people arguing at different layers using the same word.
Next time someone says "GAI is here" or "GAI is nowhere," ask them one question:
At what layer?