JoeStrout
The Master Algorithm

In 2015, a book by AI researcher Pedro Domingos came out called The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. The author explored the "five tribes" of artificial intelligence (AI), and how each one might develop into the "Master Algorithm" of intelligence itself — the algorithm that could learn to do virtually anything humans and other animals can learn to do. The five tribes are:

  1. inductive reasoning
  2. connectionism (aka neural networks)
  3. evolutionary computation
  4. Bayesian networks
  5. analogical modelling

This was ten years ago. At the time, it was very much not obvious which of these, or what combination of them, or whether something else entirely, would turn out to be the Master Algorithm.

But it's clear now. I'm calling it.

The Master Algorithm is neural networks.

It turns out, the master algorithm is connectionism. A neural network, when it's big enough, structured appropriately, and trained on enough data, can do it all: language, reasoning, translation, programming, answering questions, following instructions, understanding pictures and videos, generating pictures and videos, solving complex math problems, and on and on.

The goalpost-movers like to pick at each of those and say "yes, but, it doesn't do this thing very well and that thing doesn't really count because of the following hand-wavy reasons." But if you read the book, it's obvious that everything we all take for granted nowadays was far-off science fiction in 2015. AI researchers literally dreamed of machines that could do half of what ChatGPT or Claude does before breakfast, or what Gemma does running locally on my cell phone.

The other four approaches? They still have their uses in specialized cases, and they're still doing more or less what they were doing in 2015. None of them turned out to be able to do pretty much everything, nor did any of them turn out to get a lot smarter and more capable when you simply scale them up. (Indeed, most of them explode when you try to do that.)

This was not obvious.

I've considered myself a "connectionist" since middle school, when I did my first neural networks science fair project. (Fun fact: my junior year of high school, I went to the International Science & Engineering Fair with my novel algorithm for recursive neural networks, and won the U.S. Army's top prize and a 2-week trip to Japan... and I still practice Japanese every day today.)

But I did not expect neural networks, on their own, to be the master algorithm. Hardly anyone expected that. Neural networks were very good at perception (identifying/classifying patterns in raw data), and at prediction (which is really the same thing). But it was not at all clear how a neural network would ever be able to carry on a real conversation, reason through problems, etc. I thought you'd probably need some combination of a neural network and symbolic (e.g. inductive) reasoning.

I mean, of course our own brains are made entirely of neurons. So as a raw proof of concept, yes, it was obvious that neural networks could do it all. But I assumed that evolution had equipped our brains with very specialized circuits for language, logic, & reasoning, and that you would not get these facilities out of a general-purpose neural network running some general-purpose learning algorithm.

But nope. It turns out that you do get exactly that.

The moment this happened

In What Is Intelligence? (2025), Google AI researcher Blaise Agüera y Arcas writes about the moment that I would consider the pivotal turning point. They were training a neural network to predict the next word after a bunch of previous words, so that they could make a better autocomplete function. This is a natural and obvious application of neural networks' strengths, and because people are terrible (and terribly slow) at typing messages on cell phones, better autocomplete was worth investing in.

But they had trained a very large model. On a very large amount of text. And the model started talking to them. They discovered that they could ask it questions, and it would answer. They could even ask it things that had certainly never been in its training data — like, how to translate a specific made-up sentence from one language to another — and, with a bit of urging, it would do it, and do it correctly.

I can only imagine the complex emotions they must have felt at that moment. They realized what some deniers still haven't understood today: a sufficiently deep language model isn't just parroting its training data; it is understanding and responding with real intelligence.

Now, a raw language model, without additional reinforcement learning to shape its behavior, is rather schizophrenic; it slips in and out of different personalities, carries on both halves of a conversation all by itself, etc. Few people get to interact with a network in this state, because (as Agüera y Arcas explains) it is quite disturbing. But the behavioral training fixes that, improves reasoning skills, and gives us the AI coworkers we use daily today.

Prediction, all the way down

So how is this possible? How does a neural network act as the master algorithm?

It turns out, intelligence is fundamentally just prediction. It's prediction at all levels: cortical columns predict what their next inputs are going to be; our visual system learns to predict what the world is going to look like in the next moment; our language system learns to predict the next word we are going to hear or say. In a literal sense, LLMs like Claude do nothing but predict the next token, based on all the tokens in the session so far.
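To make "nothing but predict the next token" concrete, here's a minimal sketch of the interface involved. (This toy uses bigram counts rather than a learned neural network — a deliberate simplification — but the contract is the same one an LLM fulfills: context in, next-token prediction out.)

```python
# Toy next-token predictor: counts which token follows which,
# then predicts the most frequent successor. An LLM performs the
# same operation, but with a learned neural network instead of
# a lookup table of counts.
from collections import Counter, defaultdict

def train(tokens):
    """Build a table of next-token counts: counts[cur][nxt]."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most likely token to follow `token`."""
    return counts[token].most_common(1)[0][0]

corpus = "the cat sat on the mat the cat ran".split()
model = train(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

The point of the toy is the shape of the problem, not the method: everything an LLM does — conversation, reasoning, translation — is produced by repeatedly calling the equivalent of `predict_next` and appending the result to the context.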

But prediction is all you need. When Claude generates a string of tokens working through a problem, it is not fundamentally different from the string of thoughts (mostly in the form of words) that you and I would generate working through a similar problem. Reinforcement learning biases the network so that these strings of predictions tend to go in useful directions, ones that were successful during training. In humans (unlike current LLMs), training is an ongoing process, so we are constantly tweaking our own neural networks to make these trains of predictions more successful. But fundamentally, moment to moment, it's still just predicting the next thing, at every level of the network.

We now understand that some problems are what Agüera y Arcas calls AI complete. Solving an AI complete problem requires actually understanding the world that the problem represents; no shallower, surface-features sort of representation can do it. Examples of AI complete problems:

  • Next word prediction: it's impossible to predict the next word in a string of text with any accuracy unless you understand what those words actually mean.
  • Language translation: ditto; you can't just look up individual words in a dictionary.
  • Video next-frame prediction: requires understanding what the objects in the video are, how they behave, what gravity does, etc.
  • Picture or video captioning: requires understanding both what the objects are, and how they are referred to in language.

A sufficiently big neural network, given sufficient training data, will eventually solve almost any problem you give it. So if you give it an AI complete problem, which can only be solved by understanding the world... it will understand the world.

Is its understanding perfect? Of course not. But then, that's true of humans, too.

So that's it for AI research

Haha, just kidding! LLMs are general intelligence, and already smarter than humans in some ways. But they have glaring limitations: in particular, "training" and "inference" are completely separate processes. The LLMs we use every day do not learn from experience, except insofar as that experience can be crammed into the context window.

The addition of continual learning will allow them to get better at tasks with practice, just like we do. But there are a lot of tricky open questions: how do you avoid forgetting too much old knowledge while acquiring new knowledge? How do you keep the AI's personality stable over time? And how do you even learn this new stuff quickly enough to matter?

Moreover, there really are things that neural networks are terrible at. And ironically, those are exactly the things that most AI research has been doing for the last 80 years: game playing, deep search, complex logical reasoning, optimization, etc. We (and other neural networks) can approximately do those things, with a lot of training and practice, but there are classical (GOFAI or "Good Old-Fashioned AI") algorithms that do them much better and faster. We'll continue to need progress on these, and on ways to integrate them with neural networks to get the best of both worlds.
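As one illustration of that classical strength (my example, not from the book): an exact search algorithm like breadth-first search solves a shortest-path problem perfectly and instantly, where a neural network could only approximate the answer after lots of training.

```python
# Classic GOFAI: exact breadth-first search for the shortest path
# through a graph. Guaranteed optimal, no training required.
from collections import deque

def shortest_path(graph, start, goal):
    """Return the shortest path from start to goal, or None."""
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

maze = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(shortest_path(maze, "A", "E"))  # ['A', 'B', 'D', 'E']
```

A hybrid system might use a neural network to decide *which* problems to hand off to exact algorithms like this one — the "best of both worlds" integration mentioned above.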

This is just a random sampling; in fact there are many, many open research directions. We've cracked the code of intelligence; we know now what intelligence is and how to produce it in a machine. But that's only the tip of the iceberg. It is the beginning of wisdom, not the end.

The next 5 or 10 years are going to be very exciting. Hold on tight!
