A new startup founded by former DeepMind lead David Silver just raised $1.1B — and it’s not building an LLM.
Instead, it’s pursuing a pure reinforcement learning architecture called a “superlearner.”
Key idea:
Replace static pretraining on human text with continuous learning through environment interaction.
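To make "learning through environment interaction" concrete, here is a minimal sketch using tabular Q-learning on a toy corridor. This is only an illustration of the general idea, not the startup's architecture, which the post does not describe. The `Corridor` environment, the `train` and `greedy_policy` helpers, and all hyperparameters are assumptions made up for this example.

```python
import random


class Corridor:
    """Hypothetical toy environment: start at cell 0, reach the last cell."""

    def __init__(self, length=6):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # the only learning signal is the goal
        return self.state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values purely from interaction; no dataset is involved."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(100):  # cap episode length
            # Explore with probability epsilon, or when the values are tied.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Q-learning update: move the estimate toward the bootstrapped target.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action per non-terminal state (the last state is the goal)."""
    return [0 if left > right else 1 for left, right in q[:-1]]


if __name__ == "__main__":
    q_values = train()
    print(greedy_policy(q_values))  # 1 means "move right"
```

The agent never sees examples of correct behavior. It acts, observes a reward, and updates its estimates, which is the loop the key idea proposes to scale up in place of pretraining on text.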
Why this matters for developers:
Shifts the focus from token prediction to decision-making systems
Could eliminate dependence on scraped datasets
Changes how we think about agents and autonomy: behavior is learned from experience rather than imitated from human text
But major challenges remain:
Generalization beyond structured environments
Reward design at scale
Computational efficiency
If this works, we’re not just upgrading models — we’re replacing the paradigm.