There's something strange about the way we talk about artificial intelligence right now. We keep hearing that models will soon "take control" of physical machines — robots, self-driving cars, industrial drones. As if AI only needed a little more computing power to get there.
The reality seems more interesting, and harder.
What LLMs actually do
A language model is not an intelligence that reasons about the world. It's an extraordinarily efficient system for reproducing patterns found in text. When it "answers" a question, it's not looking for the truth — it's calculating which sequence of words is most plausible in that context.
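The "most plausible sequence" idea can be shown with a deliberately tiny sketch: a bigram frequency table built from a toy corpus (the corpus and counts are invented for illustration; real models use learned probabilities over tokens, not raw counts).

```python
# Toy sketch of "most plausible next word": count which word most often
# follows a given word, then pick the winner. Purely illustrative.
from collections import Counter

corpus = "the cat sat on the mat the cat ate the fish".split()
bigrams = Counter(zip(corpus, corpus[1:]))

def most_plausible_next(word: str) -> str:
    """Pick the continuation seen most often after `word` in the corpus."""
    candidates = {b: n for (a, b), n in bigrams.items() if a == word}
    return max(candidates, key=candidates.get)

print(most_plausible_next("the"))  # "cat" follows "the" most often here
```

Note what this system does not have: any notion of whether "cat" is true, safe, or appropriate. It only knows what tends to come next.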
This works remarkably well on text-based tasks, precisely because human language is full of regularities. But the same property becomes a problem the moment you step outside the domain of textual prediction: a model can predict the next word in a sentence, but it cannot predict the behavior of an object slipping through a gripper at 40-millisecond intervals.
An LLM looks for the most plausible answer. A physical control system must reach a precise objective within a precise timeframe — even in a situation it has never encountered before.
The real world doesn't always look like training data
A robotic arm picking up an object has to process data in real time, correct micro-errors every millisecond, and adapt to signals the human eye cannot perceive — the slight deformation of a soft material under pressure, the micro-slip of a smooth surface, the change in resistance of a mechanical assembly depending on ambient temperature. These are continuous signals, not words.
This is why robotics and embedded systems have developed radically different approaches: control loops that continuously measure the gap between the actual state and the target, and correct in real time. Not because researchers lacked ambition, but because the problem demands it.
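A minimal sketch of such a control loop, assuming a single scalar state and an illustrative proportional gain (real controllers add integral and derivative terms, sensor filtering, and hard real-time guarantees):

```python
# One-dimensional proportional control loop: each iteration measures the
# gap between the actual state and the target, then corrects a fraction
# of it. Gain and iteration count are illustrative.

def control_step(state: float, target: float, kp: float = 0.5) -> float:
    """One loop iteration: measure the error, apply a proportional correction."""
    error = target - state          # gap between actual state and target
    return state + kp * error       # move a fraction of the way toward target

state = 0.0
for _ in range(20):                 # the loop runs at a fixed rate
    state = control_step(state, target=1.0)
print(round(state, 4))              # converges toward the target
```

The point of the structure is that every cycle starts from a fresh measurement: the loop does not guess what the world looks like, it checks.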
The natural follow-up question is: what if we trained a model on robotic scenarios instead of text? That's exactly what reinforcement learning in robotics does — and it works, in bounded environments. The problem is combinatorics: the number of possible situations in the real world is enormous. You can't cover the whole domain.
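The combinatorics claim is easy to make concrete. Under a crude back-of-the-envelope assumption that each continuous variable is discretized into a fixed number of bins (the numbers below are illustrative, not measured):

```python
# Toy illustration of state-space growth in a robotic control problem,
# assuming each continuous variable is split into a fixed number of bins.

def discretized_states(num_variables: int, bins_per_variable: int) -> int:
    """Count of distinct states with `bins_per_variable` bins per variable."""
    return bins_per_variable ** num_variables

# A 7-joint arm with position + velocity per joint = 14 variables.
# Even a coarse 10-bin grid per variable gives 10^14 distinct states.
print(discretized_states(14, 10))
```

No training regime visits a meaningful fraction of that space, which is why coverage-based generalization breaks down in the physical world.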
Why the illusion persists
The spectacular robot demonstrations that circulate on social media are typically filmed in controlled environments, with scenarios carefully chosen to showcase what works. They don't show the thousands of failures that came before, or the conditions under which the same system breaks down completely.
The deeper problem is that we're very bad at telling the difference between an impressive performance and a general capability. A model that brilliantly explains the mechanics of a liquidity crisis and then cites a nonexistent author in the very next sentence illustrates exactly that gap. The fluency of the response hides the absence of any internal verification.
What this means in practice
This doesn't mean AI has no role in physical systems. Hybrid approaches — where a high-level model plans and a classical system executes — can produce serious results in bounded domains.
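A sketch of that division of labor, with an illustrative planner standing in for the high-level model and a proportional loop standing in for the classical executor (all names and gains are assumptions for the example):

```python
# Hybrid architecture sketch: a high-level planner proposes waypoints,
# a classical low-level control loop executes each one.

def plan(start: float, goal: float, steps: int = 4) -> list:
    """High-level layer: break the task into intermediate waypoints."""
    return [start + (goal - start) * i / steps for i in range(1, steps + 1)]

def execute(state: float, waypoint: float, kp: float = 0.8, iters: int = 50) -> float:
    """Low-level layer: converge on one waypoint with proportional control."""
    for _ in range(iters):
        state += kp * (waypoint - state)
    return state

state = 0.0
for wp in plan(0.0, 10.0):
    state = execute(state, wp)
print(round(state, 3))  # the executor, not the planner, guarantees arrival
```

The design choice matters: the model decides *where* to go, but a deterministic loop with real measurements decides *how* to get there.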
The real question isn't "when will AI be powerful enough to drive machines?" It's "how do we improve the behavior of a system when it encounters conditions it wasn't trained for?"
Conclusion
The more models seem to "understand", the easier it is to believe that the leap to the physical world is only a matter of time.
An LLM that encounters an unknown situation will still produce a response — it will complete, invent, fill the gap.
In the physical world, that behavior becomes dangerous. A robotic system that "improvises" in an unknown situation doesn't produce a bad answer — it produces an unexpected movement, potentially violent, in an environment where humans may be present. This isn't a display bug. It's a real physical risk.
What today's language-model-based systems lack is precisely this: the ability to detect that they're out of their domain, and stop. Not keep acting.
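One way to make "detect and stop" concrete is a crude out-of-domain gate: refuse to act when an input lies far from the distribution seen during training. Everything here, the sensor values, the z-score test, the threshold, is an illustrative assumption, not a production safety mechanism:

```python
# Sketch of an out-of-domain gate: stop instead of improvising when an
# input is far from familiar data. Threshold and readings are illustrative.
import statistics

training_readings = [9.8, 10.1, 9.9, 10.2, 10.0, 9.7, 10.3]
mu = statistics.mean(training_readings)
sigma = statistics.stdev(training_readings)

def act(reading: float, threshold: float = 3.0) -> str:
    """Refuse to act when the reading lies outside the familiar range."""
    z = abs(reading - mu) / sigma
    return "STOP: out of domain" if z > threshold else "proceed"

print(act(10.0))   # familiar input: proceed
print(act(25.0))   # far outside training data: stop
```

The contrast with an LLM is the whole point: where a language model would fill the gap with something plausible, this gate has an explicit "I don't know" path.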