Jose Crespo, PhD

LLMs Are Dying - The New AI Is Killing Them

LLMs are already museum pieces

Yes, ChatGPT, Claude, Gemini, all of them. Brilliant fossils of a linguistic age that’s already ending. They’re decomposing in public while billions are still being spent to polish their coffins: bigger models, longer contexts, more hallucinations per watt.

The “predict-the-next-word” LLM era is over.
The new killer on the stage isn’t language, it’s world modeling.
An AI that understands reality like a conceptual puzzle, where you don’t need every piece to see the whole picture.

We just haven’t dragged the old LLM stars off the stage yet.
If you’re still building your career around prompting chatbots, you’re preparing for a world that’s already gone.

We don’t need machines that shovel tokens like brainless parrots. We need machines that carry objects through time, that bind cause to effect, that keep the scene consistent. “Stochastically parroted” was cute for 2023. It’s not intelligence. It’s probability cosplay.

And here’s what many of us have seen before in the tech industry: if AI keeps stitching likelihood instead of reality, the industry that burned a trillion dollars to make chatty cockatoos will see its bubble burst. Soon.

Is JEPA the New AI Hope?

In a galaxy not so far from Menlo Park, a new acronym rises from the ashes of the chatbot empire: JEPA!

The promised one. The architecture said to restore balance to the Force after the LLM wars.

Before we let Meta’s marketing department baptize it as the next Messiah, let’s decode the acronym.
JEPA stands for Joint Embedding Predictive Architecture - four words that sound profound until you realize they describe a fairly old idea wearing a new coat of GPU varnish:

Joint, because it trains two halves of a scene — one visible, one masked — and forces them to agree in a shared latent space.
Embedding, because instead of dealing with raw pixels or words like LLMs do, it operates in dense vector representations: the hidden space where modern AI stores meaning.
Predictive, because its only trick is to predict the missing chunk from the one it still sees.
Architecture, because every new AI concept needs an impressive noun at the end to look academic.
As Yann LeCun - the new high priest of the post-LLM cult - likes to put it, in plain English:

Intelligence isn’t language, nor is it predicting the next word. Intelligence is predicting what will happen in the world.

And that’s exactly what JEPA tries to do.
It learns to guess what’s missing in a world it only half perceives — not by generating text or pixels, but by aligning internal representations so that context can explain absence.

It doesn’t write; it completes.
It doesn’t imagine; it infers.

Too Good to Be True?

But you’re probably wondering whether this is the real deal, or just another half-baked tech product with brighter LEDs and duller intelligence.

Let’s cut through the grand talk for a second.
Meta presents JEPA like a revelation carved in silicon: the dawn of “true intelligence,” the end of chatbots, the start of world-model gods.

And at first glance, it really does look that way. Strip off the marketing halo, though, and you’ll see what’s actually there:
not quite the miracle LeCun sells, but still something valuable. A step in the right direction, with a long way to go.

For now, JEPA lives mostly in videos and physical-world scenarios, while the language layer still leans heavily on the same stochastic-parrot LLMs Meta claims to have surpassed (see chart 1 below).

Of course, Meta just prefers not to mention that part in the brochures. No surprise there.

So yes, on the blueprint JEPA looks clever: new modules, shiny arrows, and a fresh sense of purpose. But underneath, we’re still stirring the same pot.
The word-soup problem just got an upgrade to concept-soup.
Different flavor, same indigestion.

To be fair, JEPA isn’t pure marketing incense.
As the chart below shows (humorously stripped of Meta’s triumphal glow), its context encoder, target encoder, and predictor form a neat little triad that genuinely breaks from the token tyranny of LLMs.

So what can you actually do with JEPA, you’re asking?
Well - you can show it half an image, and it predicts the missing part for you, not by painting pixels, but by reasoning in latent space.
That’s progress: perception over parroting.
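
If you want to see the triad in code rather than arrows, here is a minimal PyTorch-flavored sketch of the idea (not Meta’s implementation, just the shape of it): a context encoder for what’s visible, a separate target encoder for what’s hidden, and a predictor that has to match the target’s latent vector. Every module size and name below is illustrative.

```python
# Minimal JEPA-style sketch (illustrative only, not Meta's code).
# Idea: encode the visible context, encode the hidden target with a separate
# encoder, and train a predictor to match the target's latent vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 768  # latent width, chosen arbitrarily for this example

context_encoder = nn.Sequential(nn.Linear(1024, dim), nn.GELU(), nn.Linear(dim, dim))
target_encoder  = nn.Sequential(nn.Linear(1024, dim), nn.GELU(), nn.Linear(dim, dim))
predictor       = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

visible_patches = torch.randn(8, 1024)   # the half of the scene we can see
masked_patches  = torch.randn(8, 1024)   # the half we hide from the context encoder

with torch.no_grad():                    # target encoder gets no gradient (EMA-updated in practice)
    target_latent = target_encoder(masked_patches)

predicted_latent = predictor(context_encoder(visible_patches))

# The loss lives entirely in latent space: no pixels, no tokens are generated.
loss = F.mse_loss(predicted_latent, target_latent)
loss.backward()
```

The important part is the loss: embeddings are compared to embeddings, never pixels to pixels.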

And here’s the lineup:

I-JEPA recognizes ImageNet objects with only a few labeled samples, which is impressive. But it still fails under fixed distractor noise and never touches language.

Then comes V-JEPA, trained on videos to learn motion intuition: what moves, when, and why. But not real physics; collisions, forces, and contact are still out of reach.

More into robotics? V-JEPA 2 guides robotic arms, predicting how objects behave before acting. But it still trails humans on physical benchmarks and, irony of ironies, needs an LLM to talk about what it sees.

So yes, progress: enough to declare LLM technology a fossil on artificial life support, but still flatland thinking dressed up as revelation.

And that’s the part Meta doesn’t want you to see.

Here’s the fatal secret hiding behind JEPA’s slick diagrams and LeCun’s confident declarations: they solved the linguistic problem by creating an architectural one that’s mathematically far worse.

They escaped the word-prison only to build a concept-prison with thicker walls.

Think about it: LLMs were bad because they treated reality as a linear sequence of tokens: a one-dimensional march through probability space. JEPA says “we’re smarter than that” and jumps into high-dimensional representation space, where concepts float as vectors in a 768-dimensional cloud.

Sounds impressive, right?

Wrong.

They just traded a bad neighborhood for a worse one: the kind where the laws of mathematics themselves guarantee failure.

The Fatal Mathematical Doom of the New AI Architectures: They Still See a Flat World
And now, get ready, dear reader, to learn what you won’t read anywhere else: how the mathematical poison is silently killing every AI architecture, from the LLM dinosaurs to the new kids who swear they’ll fix everything.

Stop. Breathe 😄 This is where the story gets dangerous.

But don’t worry, we have the antidote. And you can use it to your own advantage.

Before you go on, a small recommendation.
If you want to dive deeper - into the gritty, nerdy, beautiful mathematics that prove beyond doubt how the dual-number toroidal model cures hallucinations and the myopic doom haunting today’s AI architectures - I’ve left a few doors open for you:

[JEPA-AI: Core Technologies & Programming Stack]
The Mathematical Myopia of New AI Architectures
Mathematical Core Equations: AI JEPA’s Failures vs. AI-Toroidal Truth

Take your time there if you wish.
If not, stay with us.
We’re about to cross the event horizon.

Now, let’s go on.

Imagine you’re trying to organize your massive music library. You could throw all your songs into one giant folder and hope your computer can tell the difference between “Stairway to Heaven” and “Highway to Hell” based on… vibes? That’s basically what JEPA and the new AI architectures are doing.

Here’s the problem: these systems live in what mathematicians call “Euclidean space”, basically a flat, infinite spreadsheet where everything is a bunch of numbers floating around. Sounds reasonable, right? Wrong.

This is why you’ll find the same mathematical doom baked right into the so-called “next generation” AI systems: the ones sold as the antidote to LLM poison.
They promise salvation but inherit the same broken math.

— Welcome again to the Hall of AI Shame. Here they are. —

The Birthday Party Disaster
You know the birthday paradox? Put 23 random people in a room, and there’s a 50% chance two share a birthday. Now imagine your AI has to store a million concepts in this flat number-space. The math says: collision guaranteed. “White truck” ends up looking almost identical to “bright sky” because they landed in nearly the same spot in this giant number soup, and that’s how you get a self-driving car that mistakes a truck for the sky.

It’s like trying to organize a library by randomly throwing books into a warehouse and hoping you can find them later by remembering their approximate coordinates… you can’t.
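
Here is a quick back-of-the-envelope sketch of that crowding effect, using random unit vectors in a 768-dimensional space (all sizes are arbitrary): as the library grows, the closest pair of “concepts” keeps creeping closer together.

```python
# Birthday-paradox flavour in embedding space (illustrative numbers only):
# the more random "concepts" you pack into a flat 768-d space, the closer
# the most-similar pair becomes.
import numpy as np

rng = np.random.default_rng(0)
dim = 768

for n in (100, 500, 2_000):
    vecs = rng.standard_normal((n, dim))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)   # unit vectors
    sims = vecs @ vecs.T                                   # cosine similarities
    np.fill_diagonal(sims, -1.0)                           # ignore self-similarity
    print(f"{n:>5} concepts -> closest pair cosine similarity: {sims.max():.3f}")
```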

The Gradient Descent Hamster Wheel
Current AI architectures use something called “gradient descent” to find the minimum of an error function, which is a fancy way of saying they stumble around in the dark, poking things with a stick, hoping to eventually find the exit.

The problem? They use fake infinitesimals with point-wise myopic vision. They can’t see the shape of the hill they’re trying to climb down, just one pebble at a time. It’s like trying to navigate San Francisco with a blindfold and a magnifying glass that only shows you one square inch of sidewalk.

But wait, it gets dumber: you have to pick your epsilon (the step size) before you start stumbling. Pick it too big? You’re that drunk guy at a party taking huge steps and crashing into walls. Too small? You’re inching forward like a paranoid snail, and you’ll die of old age before you get anywhere. Yup, this whole mess comes from 19th-century calculus and its epsilon–delta limit formalism.

But the craziest thing of all happens during training: AI runs billions of these tiptoeing optimization steps trying to minimize its loss function. Billions! Each one either jumps off the cliff like a buggy robot or advances like Windows loading at 1%. The computational waste is absolutely bonkers - all because this outdated framework, born in the 19th century, forces you to pick an epsilon value beforehand.
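
A toy example makes the epsilon dilemma painfully visible. Same loss, same gradient; the only thing that changes is the fixed step size we had to guess up front (the numbers are arbitrary):

```python
# Toy gradient descent on f(x) = x^2, whose gradient is 2x.
# The only thing we change is the fixed step size picked before we start.
def run(step_size, steps=50):
    x = 10.0
    for _ in range(steps):
        x -= step_size * 2 * x          # one gradient step
    return x

print("too big   (1.10):", run(1.10))   # overshoots and blows up
print("too small (1e-4):", run(1e-4))   # after 50 steps, barely moved
print("tuned     (0.10):", run(0.10))   # works only because we guessed well
```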

The second error caused by this outdated way of doing infinitesimal calculus is the compounding effect of tiny approximation errors. You start with something like 10^-8 and think, “eh, close enough to zero, right?” Wrong. Square it and you get 10^-16. Still. Not. Zero. After billions of iterations, these pretend infinitesimals pile up like compound interest from hell, spawning numerical explosions, instabilities, and rounding errors that eventually turn into full-blown AI hallucinations.
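
You can watch a “fake infinitesimal” fall apart in four lines: the true derivative of x² at x = 1 is exactly 2, but as the finite-difference step shrinks toward a pretend zero, float64 rounding takes over and the estimate collapses.

```python
# Forward-difference derivative of f(x) = x*x at x = 1 (true value: 2.0).
f = lambda x: x * x

for h in (1e-4, 1e-8, 1e-12, 1e-16):
    approx = (f(1.0 + h) - f(1.0)) / h   # cancellation error grows as h shrinks
    print(f"h = {h:.0e} -> derivative estimate {approx!r}")
```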

Yup, there is an easy solution: switch to a dual-number system that laughs at this entire clown show. No limits. No epsilon guessing games. No billion-step hamster wheel. When ε² = 0 holds by definition (not as an approximation, but as an actual mathematical law), derivatives are exact, and topology just tells you where everything belongs. No stumbling required.

The Attention Apocalypse
Transformers (the tech behind ChatGPT, not the robots) use something called “attention,” where every word looks at every other word. That’s N-squared complexity, which means if you double your text length, the computation goes up 4x.

With 1000 words? That’s a million comparisons. With 10000 words? 100 million comparisons. Your AI is basically reading a book by comparing every single word to every other word simultaneously. Exhausting and expensive.
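
If you want to see the zeros pile up, the count is just arithmetic:

```python
# Quadratic attention cost: every token attends to every other token.
for n_tokens in (1_000, 10_000, 100_000):
    comparisons = n_tokens * n_tokens
    print(f"{n_tokens:>7} tokens -> {comparisons:>15,} pairwise comparisons")
```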

How Our Toroidal Model Fixes the AI Flatland Doom
Instead of a flat spreadsheet, we use a donut (mathematically, a torus). Stay with me here.

On a donut, you can wrap string around it in different ways: around the hole, through the hole, or both. These “winding patterns” give every concept a unique address that cannot collide. It’s not probability, it’s topology. Different winding patterns are as different as a circle and a figure-8. They literally cannot become each other.
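
To make the “winding address” idea concrete, here is a deliberately naive sketch of one possible reading: tag each concept with a pair of integer winding numbers (around the hole, through the hole), and call two concepts the same neighborhood only if the integers match exactly. The class and the addressing rule are illustrative inventions for this post, not a finished implementation of the model.

```python
# Hypothetical sketch of winding-number addressing on a torus.
# A concept's address is a pair of integers (p, q): how many times its loop
# winds around the hole and through the hole. Integers either match or they
# don't, so two different addresses cannot drift into each other.
from dataclasses import dataclass

@dataclass(frozen=True)
class WindingAddress:
    p: int  # windings around the hole
    q: int  # windings through the hole

    def same_class(self, other: "WindingAddress") -> bool:
        return (self.p, self.q) == (other.p, other.q)

white_truck = WindingAddress(p=3, q=1)
bright_sky  = WindingAddress(p=2, q=5)

print(white_truck.same_class(bright_sky))             # False: distinct winding classes
print(white_truck.same_class(WindingAddress(3, 1)))   # True: identical address
```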

The Real Infinitesimals

We use dual numbers where ε² = 0 isn’t an approximation - it’s the definition. This means our layers are separated by actual infinitesimals, not fake ones. No numerical explosions. No gradient descent needed. The topology just… works.
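
Dual numbers are standard machinery in forward-mode automatic differentiation, and a minimal implementation shows the exact-derivative claim on the same f(x) = x² from before (this sketch covers only addition and multiplication):

```python
# Minimal dual-number arithmetic: a + b*eps with eps*eps = 0 by definition.
# Evaluating f(x + 1*eps) returns f(x) plus the exact derivative in the eps slot.
class Dual:
    def __init__(self, real, eps=0.0):
        self.real, self.eps = real, eps

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, because eps*eps = 0
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __rmul__ = __mul__

def f(x):
    return x * x

y = f(Dual(1.0, 1.0))       # seed the eps slot with 1 to ask "df/dx at x = 1"
print(y.real, y.eps)        # 1.0 2.0 -> exact derivative, no step size chosen
```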

Sparse by Design
Our attention mechanism only connects compatible winding patterns. Most connections are exactly zero - not “close to zero” but structurally impossible. This drops complexity from N-squared to linear. That 100 million comparisons? Now it’s 10,000.
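
As a purely illustrative sketch of “sparse by design”: if every token carries a winding address, attention only needs to be computed inside each address class, and everything across classes is structurally zero. The grouping rule below is an assumption made up for this example, not a published mechanism; the exact savings depend on how finely the addresses partition the tokens.

```python
# Hypothetical structural sparsity: tokens attend only to tokens that share
# their winding address, so the comparison count scales with the size of each
# address class instead of with n^2.
import random
from collections import defaultdict

random.seed(0)
n_tokens = 10_000
addresses = [(random.randint(0, 31), random.randint(0, 31)) for _ in range(n_tokens)]

groups = defaultdict(list)              # attention only happens inside a group
for idx, addr in enumerate(addresses):
    groups[addr].append(idx)

sparse_comparisons = sum(len(g) * len(g) for g in groups.values())
dense_comparisons = n_tokens * n_tokens

print(f"dense attention  : {dense_comparisons:,} comparisons")
print(f"sparse-by-address: {sparse_comparisons:,} comparisons")
```

With these made-up numbers the cost drops by roughly three orders of magnitude; the finer the address partition, the closer the scaling gets to linear.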

The Bottom Line
JEPA and the new architectures like it are a step in the right direction, away from our very primitive state and toward true AI. But, like LLMs, they still repeat the same error: trying to navigate a flat world with a bad compass and approximations.

The real leap won’t come from another tweak in parameters.
It will come from changing the space itself.

We need to abandon the Euclidean coffin that traps intelligence in two dimensions, and build in a topology where meaning can breathe.
In our toroidal model, concepts don’t collide or blur. They live in separate, protected neighborhoods: each one infinitesimally distinct, each one safe from the chaos of false merges.

Why Toroidal AI Is Not Being Built — Yet
Now, the sharp reader may ask:
“If this is so obvious, why hasn’t anyone built it yet?”

Fair question - and the answer itself proves the point.

  1. Institutional Inertia
    The entire AI establishment has sunk trillions into Euclidean architectures. Every framework, every GPU kernel, every optimization routine assumes a flat world. Replacing that geometry would mean rebuilding the cathedral from its foundations - and few engineers dare shake the pillars of their own temple.

  2. The Workforce Barrier
    A full generation of machine-learning engineers has been trained to think in gradients, not geometry. Retraining them to reason with curvature, continuity, and dual numbers is not a weekend tutorial - it’s a civilizational shift in mathematical literacy.

  3. Patents and IP Locks
    Big Tech doesn’t innovate; it defends its moat. Any true geometric paradigm shift would invalidate entire layers of existing intellectual property and licensing chains. The system isn’t optimized for truth - it’s optimized for control.

  4. The Sunk-Cost Fallacy
    From cloud infrastructure to AI chips, everything has been built for the flatland paradigm. Even when engineers know it’s broken, the machinery keeps running - because admitting it would collapse too many balance sheets and too many egos.

So yes - it hasn’t been built yet.
Not because it’s wrong.
But because it’s too right, too disruptive, and too costly for a world addicted to its own mistakes.

And that’s precisely why it will happen.
Because math doesn’t care who resists it.
It simply wins - always.

And soon, the new startups will notice the gap. You’ll watch Toroidal AI evolve exactly as every disruptive technology has before it:
First ignored - too threatening to a trillion dollars in sunk investments.
Then ridiculed: “crackpot” accusations from people who don’t understand topology.
And finally, triumphantly accepted: “Of course! We always knew Euclidean space was wrong.”

History doesn’t repeat itself.
It curves. 😁

Top 10 Essential References

On JEPA Architectures:
1.- LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures (September 2025)

2.- ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning (April 2025)

3.- Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud (February 2025)

On Transformer Attention Complexity:
4.- The End of Transformers? On Challenging Attention and the Rise of Sub-Quadratic Architectures (October 2024)

5.- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (October 2023, still widely cited in 2024–2025)

On Toroidal/Topological Neural Networks:
6.- Toroidal Topology of Population Activity in Grid Cells (January 2022, Nature — still foundational for 2024–25 work)

7.- Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry (June 2022)

On Dual Numbers & Automatic Differentiation:
8.- Dual Numbers for Arbitrary Order Automatic Differentiation (January 2025)

On LLM Hallucinations:
9.- Why Language Models Hallucinate (September 2025)

Bonus - Yann LeCun’s Vision:
10.- Navigation World Models (April 2025)
