From OpenAI to nanoGPT: why the training stack should feel legible, not magical.
Originally published on Lei Hua's Substack.
Anchors:
2023-05-23 · State of GPT @ Microsoft Build · https://www.youtube.com/watch?v=bZQun8Y4L2A
2023-11-23 · [1hr Talk] Intro to Large Language Models · https://www.youtube.com/watch?v=zjkBMFhNj_g
Epigraph
"99% of the compute is in pretraining. ... For applications, you want low-stakes things, with humans in the loop. Treat these models like cognitive interns."
— Andrej Karpathy, State of GPT · 2023-05
I. The Return
Between the final weeks of 2022 and the first weeks of 2023, three things happened that reshaped the road Karpathy was about to take.
On November 30, ChatGPT launched. A million users in five days; a cultural phenomenon in two months. It wasn't a researcher's curiosity anymore; it was a daily ritual for ordinary people. The romantic imagination that once compared neural networks to "another kind of intelligence in nature" was, by early 2023, no longer just romantic. It was a chat box that tens of millions of people opened every day.
The second thing: in February 2023, he rejoined OpenAI. He returned as a researcher, but the context had changed. OpenAI was no longer the small team upstairs from a chocolate factory (Stephanie Zhan would later recall, at Sequoia in 2024, that this was OpenAI's original office). It was now the center of everyone's attention. Everyone wanted to know what was being built inside.
The third thing — the subtlest, and arguably the real protagonist of this chapter: he decided to explain the training stack to the public.
II. State of GPT — Unveiling the Black Box
May 2023, Microsoft Build conference. Karpathy stood on stage for about 42 minutes and laid out the entire training pipeline of GPT-class models as a single systematic diagram: pretraining → supervised fine-tuning → reward modeling → reinforcement learning from human feedback. For each stage, he showed what data goes in, how much compute it costs, what trade-offs it involves.
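To make the shape of that diagram concrete, here is a minimal Python sketch of the four stages as he drew them. Everything in it is an illustrative stand-in: the function names, the toy data sizes, and the dictionary "models" are inventions of this post, not code from the talk or from any lab.

```python
# Toy sketch of the four-stage pipeline from State of GPT.
# All names and sizes below are illustrative stand-ins, not any lab's API.

def pretrain(corpus):
    """Stage 1: next-token prediction over internet-scale raw text.
    In the talk's framing, this is where ~99% of the compute goes."""
    return {"weights": f"base model trained on {len(corpus)} documents"}

def supervised_finetune(base_model, demonstrations):
    """Stage 2: continue training on a small set of high-quality
    prompt -> ideal-response pairs written by human labelers."""
    return {**base_model, "sft": f"tuned on {len(demonstrations)} demos"}

def train_reward_model(sft_model, comparisons):
    """Stage 3: fit a scalar scorer from human rankings of candidate answers."""
    return {"rm": f"fit on {len(comparisons)} comparisons"}

def rlhf(sft_model, reward_model, prompts):
    """Stage 4: reinforcement learning against the reward model (e.g. PPO)."""
    return {**sft_model, "rlhf": f"optimized on {len(prompts)} prompts"}

if __name__ == "__main__":
    base = pretrain(corpus=["doc"] * 1_000_000)
    sft = supervised_finetune(base, demonstrations=["(prompt, answer)"] * 10_000)
    rm = train_reward_model(sft, comparisons=["(better, worse)"] * 100_000)
    assistant = rlhf(sft, rm, prompts=["prompt"] * 10_000)
    print(assistant)
```

The lopsidedness is the point: one enormous, expensive stage, followed by three comparatively cheap ones that shape behavior rather than add knowledge.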
The talk would be quoted across the industry for years. Its impact came not from the novelty of any single detail; most of the specifics were already known inside frontier labs. It came from the posture: he chose to treat all of this as public knowledge.
The AI industry at that moment was sliding into a kind of mystification of the training stack. Every frontier lab hinted that their success came from some secret recipe outsiders couldn't see. Karpathy's talk was a quiet refutation: there is no secret. 99% of the compute is in pretraining. The rest is engineering and taste.
Even more worth remembering is the stance toward applications he laid out at the end of that talk. He gave developers a clear judgment: at this stage, build "low-stakes, human-in-the-loop" applications. Treat the model as a cognitive intern, not as an autonomous agent. It was an engineer's caution. Not glamorous, not sexy — but in the years that followed, this exact line would resurface, in different vocabulary, again and again. By the time he said "march of nines" on Dwarkesh two and a half years later, the thread had woven itself deep into his thinking.
III. The Birth of a Metaphor
Half a year later, in November 2023, he recorded an hour-long video for his own YouTube channel, titled Intro to Large Language Models. The talk was originally given at an internal AI safety summit; the response was strong enough that he re-recorded it himself for everyone.
This was the first time he systematically translated the LLM stack for a non-technical audience. And it was at this moment that he publicly introduced a metaphor that would echo for years: the LLM is a new kind of operating system. The LLM is the CPU, the context window is RAM, tool use is peripherals, multimodality is I/O. The entire LLM ecosystem, in his framing, was a new kind of computer still taking shape.
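The analogy is concrete enough to sketch. Below is a toy rendering of it in Python; the names (llm_os_step, call_model, TOOLS) and the tiny context size are invented for this post, not anything Karpathy wrote. The point is the mapping: a fixed-size memory, a core model that does the thinking, and peripherals it can dispatch to.

```python
# Toy rendering of the LLM-OS analogy. All names here are illustrative.
from collections import deque

CONTEXT_WINDOW = 8  # "RAM": a fixed number of message slots

TOOLS = {  # "peripherals": capabilities the core model can dispatch to
    "calculator": lambda expr: str(eval(expr)),  # toy only; eval is unsafe in real code
    "browser": lambda url: f"<contents of {url}>",
}

def call_model(context):
    """Stand-in for the LLM itself, the 'CPU' of the analogy."""
    last = context[-1]
    if last.startswith("tool:calculator "):
        return TOOLS["calculator"](last.split(" ", 1)[1])
    return f"(model reply to: {last})"

def llm_os_step(memory, user_msg):
    """One scheduler tick: load context into 'RAM', run the 'CPU', store output."""
    memory.append(user_msg)
    reply = call_model(list(memory))
    memory.append(reply)
    return reply

if __name__ == "__main__":
    memory = deque(maxlen=CONTEXT_WINDOW)  # old messages fall out, like paging
    print(llm_os_step(memory, "hello"))
    print(llm_os_step(memory, "tool:calculator 6*7"))
```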
The metaphor has a clear place in the evolution of his own thinking — it sits exactly between his 2017 Software 2.0 (the essence of programs shifts from code to weights) and the Software 3.0 he would formally announce in 2025. Three concepts; three escalating answers to the question of what computation is. And in some real sense, from this moment on, Karpathy was no longer just a researcher. He had become a public thinker with his own narrative framework.
But it is worth hearing his tone. At this point, his voice is not yet the sober, sharp-edged register of the Dwarkesh interview in 2025. It is devout — the kind of engineer's devotion that says, this thing is beautiful, let me show you how beautiful it is.
IV. The Emergence of the Educator
He had designed Stanford's first deep learning course, CS231n. In his Tesla years, he all but vanished from public teaching. Then, in September 2022, he started the Neural Networks: Zero to Hero series, building up from micrograd to makemore to Let's build GPT from scratch. Each video, in isolation, is a tutorial. Taken together, they are someone quietly building his next identity.
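For readers who have never opened the series, the flavor of its first stop can be compressed into a few dozen lines. The sketch below is in the spirit of micrograd, not the repository's actual code: a scalar Value that remembers how it was made, so that calling backward() can push gradients back through the graph by the chain rule.

```python
# A micrograd-flavored sketch (not the real repo): scalar autograd in miniature.

class Value:
    """A scalar that records its provenance so gradients can flow backward."""

    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None  # local gradient rule, set by each op

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad            # d(a+b)/da = 1
            other.grad += out.grad           # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply each node's local rule.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._children:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(-3.0)
c = a * b + a        # c.data == -4.0
c.backward()
print(c.data, a.grad, b.grad)   # -4.0, dc/da = b + 1 = -2.0, dc/db = a = 2.0
```

From this seed, the series grows the same machinery into vectors, then language models; nothing fundamentally new is added, only scale.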
By the end of 2023, that identity was clear. He was still at OpenAI, but more and more of his influence was flowing through his own channel and public talks rather than through OpenAI's internal products. He was becoming the AI era's first true public teacher — not a university lecturer, but a YouTube teacher facing a world that suddenly needed to understand AI.
In the Intro to LLMs talk, one detail captures the maturity of this identity: he drew the LLM-OS diagram in a way that's friendly even to viewers with no computing background. CPU, memory, peripherals — these are concepts ordinary users have lived with since the 1980s. He wasn't showing off to peers. He was handing a key to an unprepared world.
V. The Seeds Planted in This Chapter
By the end of 2023, three seeds had been planted in Karpathy's public posture, each of which would germinate in later chapters:
Seed one: the training stack is public knowledge. This will grow, in February 2024, into his critique of tokenization ("legacy tech we should try to escape from"), and by 2025 into the demystifying extremity of nanoGPT's successor nanochat — "the best ChatGPT that $100 can buy."
Seed two: low-stakes plus human-in-the-loop. This will grow, in October 2025, into the "it's slop" line on Dwarkesh — the same engineering caution, just no longer being polite to the code that frontier models produce.
Seed three: his identity as a public teacher. This will bloom within half a year: in February 2024 he leaves OpenAI for the second time, and that July he announces the founding of Eureka Labs. From that moment on, for the first time in his life, his main work would not be research. It would be education.
But none of that had happened yet. The Karpathy of 2023 was still inside OpenAI, still doing research with OpenAI's resources, not yet the independent teacher he would become. What we see in this chapter is an insider who has begun keeping one eye on the door. He was already preparing for what came next, even if he did not yet know it himself.
One Line for This Chapter
If chapter one's Karpathy is an engineer who just let go of the wheel, this chapter's Karpathy is an engineer who has begun drawing maps for the whole world — and hasn't yet realized he is no longer just an engineer.
Sources
- State of GPT, Microsoft Build (2023-05-23) — https://www.youtube.com/watch?v=bZQun8Y4L2A
- [1hr Talk] Intro to Large Language Models (2023-11-23) — https://www.youtube.com/watch?v=zjkBMFhNj_g
- Neural Networks: Zero to Hero series (2022-09 onward) — https://www.youtube.com/@AndrejKarpathy
- Sequoia AI Ascent 2024 transcript (for the chocolate factory office context) — https://aletteraday.substack.com/p/letter-228-andrej-karpathy-and-stephanie
