Fonyuy Gita

Posted on Jun 26

Months Inside Andrej Karpathy's Mind

#ai #deeplearning #machinelearning #andrejkarpathy

A deep dive into the podcasts, papers, tweets, and tutorials of the engineer who made me add a fifth idol to my list.

I have been quiet for a while. Not because I stopped working. Not because I got lazy. Not because I gave up on AI.

It is the opposite, honestly. I have been deep in realistic, hands-on work, the kind that does not photograph well for LinkedIn or fit neatly into a tweet. The kind of work that demands full attention and does not leave much room for performance on social media.

Somewhere along the way I made a decision: step back from the noise and just build. No posts. No updates. No proof of existence.

But I have been thinking. A lot. And what I have been thinking about most is a question that has followed me quietly for years: why am I in tech?

I have four idols in my life. The Blessed Virgin Mary. Muhammad Ali. Michael Jordan. Pep Guardiola.

Not a single engineer in that group.

The Blessed Virgin Mary for her humility, a kind of quiet strength that does not announce itself, that simply holds. Muhammad Ali for his absolute refusal to lose mentally before the first punch was ever thrown. Michael Jordan for what winning cost him, and how he paid it every single day without asking for sympathy. Pep Guardiola for his obsession with perfection, the way he treats a Tuesday training session with the same weight he gives a Champions League final.

None of them are in tech. And for a long time I could not explain why I was studying engineering and not football or boxing. That question genuinely bothered me.

Then I started spending serious time with Andrej Karpathy.

His YouTube tutorials. His podcast appearances. His papers. His threads on X. His blog posts. I went deep, the way you go deep when something stops feeling like content and starts feeling like a mirror.

Here is what I have always felt about the so-called tech gurus: most of them stopped engineering somewhere along the way. They became engineers in suits. They give keynotes, they advise startups, they post takes. But they are no longer in the work. They are commentators on it.

Karpathy is different. He builds and he explains. He writes code that runs and writes essays that teach. He is in the research and in the classroom at the same time. Most engineers choose one lane. He refuses to. And not because he is trying to be impressive, but because he genuinely seems to believe that both things matter.

That is the combination I have always wanted. Engineering and communication, together, at full commitment. Watching someone actually do it, at the highest level, settled something in me.

The list of idols became five.

This blog is what I want to say about that. It is not a biography of Karpathy. It is not a summary of his work. It is a collection of specific moments, things I found in his videos, podcasts, papers, tweets, and blog posts, that caught me and made me think differently. For each one I will tell you what it is, what caught my attention, and give you the link so you can go deeper yourself.

The Unreasonable Effectiveness of Just Building Things
Software Ate Itself: From Code to Weights to Prompts
On Being a "Full Stack" AI Engineer
Neural Networks from Scratch, Not from APIs
The Ghost in the Machine: What Karpathy Says the LLM Actually Is
LLMs Are a New Kind of Operating System
Vibe Coding and the Future of Programming
Sleep, Reading, and the Boring Habits of Great Engineers
The Attention Mechanism, Explained Like You Actually Matter
What Karpathy Taught Me About Teaching
Closing Thoughts

1. The Unreasonable Effectiveness of Just Building Things

What it is: A recurring theme across nearly all of Karpathy's public content, but crystallised best in his Lex Fridman appearance.

What caught my attention:

Karpathy keeps coming back to one idea: the best way to understand something deeply is to build it from scratch, badly, and then fix it. Not read about it. Not watch a video about it. Build it.

He built micrograd, a tiny autograd engine, not because it was production-ready, but because building it forced him to understand backpropagation at a level that reading papers never gave him. He said something in that conversation that stopped me: the goal is not to use the tool, the goal is to understand what the tool is doing so well that you could write it yourself.

That changed how I approach learning. I stopped trying to consume and started trying to reproduce.
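To make "reproduce, don't consume" concrete, here is a minimal sketch of a scalar autograd engine in the spirit of micrograd. It is my own simplified illustration, not Karpathy's code. The `Value` class, its method names and the example numbers are assumptions chosen for clarity. Each operation records its inputs and a small `_backward` function, and `backward()` walks the graph in reverse topological order, applying the chain rule at every node.

```python
import math


class Value:
    """A scalar that remembers how it was computed, so gradients can flow back.
    A simplified sketch in the spirit of micrograd, not Karpathy's code."""

    def __init__(self, data, _children=(), _op=""):
        self.data = data
        self.grad = 0.0                 # d(final output)/d(this value), filled by backward()
        self._backward = lambda: None   # how to push out.grad into this node's inputs
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), "+")

        def _backward():
            # Addition passes the gradient through unchanged to both inputs.
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), "*")

        def _backward():
            # Multiplication scales the gradient by the *other* input.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), "tanh")

        def _backward():
            self.grad += (1 - t ** 2) * out.grad  # d/dx tanh(x) = 1 - tanh(x)^2

        out._backward = _backward
        return out

    def backward(self):
        # Topological sort, so a node's _backward runs only after every node that uses it.
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()


# A single neuron: out = tanh(x1*w1 + x2*w2 + b)
x1, x2 = Value(2.0), Value(0.0)
w1, w2 = Value(-3.0), Value(1.0)
b = Value(6.8813735870195432)
out = (x1 * w1 + x2 * w2 + b).tanh()
out.backward()
print(out.data, w1.grad, w2.grad, x1.grad)  # ≈ 0.7071, 1.0, 0.0, -1.5
```

The bias is picked so the pre-activation is about 0.8814. That makes the output about 0.7071 and the local tanh derivative about 0.5, which are easy numbers to check by hand.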

Resource: Andrej Karpathy on Lex Fridman Podcast #333

2. Software Ate Itself: From Code to Weights to Prompts

What it is: A framework Karpathy introduced in a 2017 Medium essay titled "Software 2.0," later extended into a three-era model as LLMs matured.

What caught my attention:

Karpathy drew a line between two fundamentally different ways of writing software.

Software 1.0 is what most people think of when they think of programming. A human writes explicit instructions. If this, then that. The logic lives in the code and the developer is fully responsible for every decision the program makes.

Software 2.0 is different. You do not write the logic. You define a goal, gather data, and train a neural network. The weights of that network become the program. The developer's job shifts from writing rules to curating datasets, designing loss functions, and managing training pipelines. The network figures out the rest.

That reframing alone was significant. But what struck me harder was its extension to Software 3.0, the era of large language models. Here the program is not even a trained model you own. It is a pretrained foundation model you prompt. The "code" is natural language. The developer is now someone who knows how to communicate intent to a system that already knows an enormous amount about the world.

Three eras. Three completely different skill sets. Three different answers to the question: what does it mean to build software?
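A toy way to see the three eras side by side is to write the same "is this temperature hot?" task three times. This example is my own, not from Karpathy's essay. The 30 degree threshold, the tiny dataset and the prompt wording are all made up. In 1.0 the human writes the rule. In 2.0 the human supplies labelled examples and gradient descent finds the weights (plain logistic regression here). In 3.0 the "program" is a sentence handed to a pretrained model, which this sketch does not actually call.

```python
import math

# --- Software 1.0: a human writes the logic explicitly. ---
def is_hot_v1(temp_c: float) -> bool:
    return temp_c > 30.0


# --- Software 2.0: a human supplies labelled examples and an objective; ---
# --- gradient descent finds the weights, and the weights are the program. ---
TEMPS = [5.0, 10.0, 15.0, 20.0, 40.0, 45.0, 50.0, 55.0]   # made-up training data
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]                          # 1 = "hot"


def train_v2(temps, labels, lr=0.5, steps=2000):
    """Fit 1-D logistic regression by gradient descent on the mean log loss."""
    mean = sum(temps) / len(temps)
    std = math.sqrt(sum((t - mean) ** 2 for t in temps) / len(temps))
    xs = [(t - mean) / std for t in temps]   # standardise the inputs
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid: predicted P(hot)
            grad_w += (p - y) * x / n
            grad_b += (p - y) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, mean, std


W, B, MEAN, STD = train_v2(TEMPS, LABELS)


def is_hot_v2(temp_c: float) -> bool:
    # Nobody typed a threshold here; the decision boundary lives in W, B, MEAN and STD.
    return W * (temp_c - MEAN) / STD + B > 0.0


# --- Software 3.0: the "program" is natural language for a pretrained model. ---
# No model is called in this sketch; the string itself is what a developer writes.
PROMPT_V3 = "Answer yes or no: is {temp} degrees Celsius a hot day?"

for t in [12.0, 35.0]:
    print(t, is_hot_v1(t), is_hot_v2(t), PROMPT_V3.format(temp=t))
```

The 1.0 and 2.0 versions agree on clear-cut inputs, but only one of them was written by hand. In the other, the human's job was choosing the data and the loss.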

For me, coming from a context where teams are small and resources are limited, this framework was genuinely liberating. You do not need to engineer every edge case if the model has already seen the world. What you need is clarity about what you want and the discipline to build the right evaluation around it.

The part that stays with me most: Karpathy was not celebrating this shift uncritically. He was mapping it. He wanted engineers to understand what era they were operating in so they could choose the right tool, not just reach for what was fashionable.

Resources:

Software 2.0 — Karpathy's original Medium essay (2017)

Intro to Large Language Models — where Karpathy extends the framework to Software 3.0

Andrej Karpathy at YC AI Startup School — covers the full arc of all three eras

3. On Being a "Full Stack" AI Engineer

What it is: Karpathy's philosophy, spread across several talks and podcast appearances, about what it means to truly own your stack.

What caught my attention:

In a talk he gave at a Y Combinator event, Karpathy described what he considers a dangerous trend: AI practitioners who only know the API layer. They call GPT-4, they build a wrapper, they ship. But they have no idea what is happening inside. When things break, they are helpless. When they need to go beyond the defaults, they cannot.

He made a distinction between people who use neural networks and people who understand them. He is clearly committed to being the second type, and committed to training others to be the second type too.
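One small example of what lives beneath the API layer is the `temperature` setting that many LLM APIs expose. Conceptually, the model produces a score (a logit) for every candidate next token, and temperature rescales those scores before a softmax turns them into probabilities. The sketch below is a generic illustration under that assumption. The logits are invented, and real providers may add other steps (top-p, penalties) on top.

```python
import math
import random
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities. Temperature below 1 sharpens the
    distribution toward the top choice; above 1 it flattens it."""
    if temperature <= 0:
        # Real APIs often treat 0 as "always pick the top token"; this sketch just rejects it.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Pick the index of the next token at random, weighted by its probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Invented scores for four candidate next tokens.
logits = [2.0, 1.0, 0.5, -1.0]
for temp in (0.2, 1.0, 5.0):
    probs = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs])

rng = random.Random(0)
print(sample_next_token(logits, 0.7, rng))
```

Someone who knows only the API sees a slider. Someone who knows this sees why low temperature makes outputs repetitive and high temperature makes them erratic.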

This resonated hard. At SEED, I have been building AI curriculum for people who come in not knowing what a weight even is. Karpathy's conviction that the fundamentals matter, that the abstraction is not enough, gives me language for why the deep path is worth it.

Resource: Andrej Karpathy at YC AI Startup School

4. Neural Networks from Scratch, Not from APIs

What it is: Karpathy's "Neural Networks: Zero to Hero" YouTube series.

What caught my attention:

I have watched a lot of ML tutorials. Most of them start at the API and go up. Karpathy goes down. He starts with a single neuron. He builds micrograd, a tiny autograd library, in pure Python. He shows you how backpropagation is not magic but math you learned in high school, applied carefully.

The thing that caught me most was his treatment of the derivative. He does not rush past it. He draws it out, manually computes the gradient at each node, and shows you why the chain rule is the heartbeat of all of deep learning. I had read about this many times. Watching him walk through it, step by step on a whiteboard-style video, was the first time I actually felt it.
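Here is what "manually computing the gradient at each node" looks like for a tiny expression, checked against a numerical estimate. It is a sketch in the spirit of the series, not a transcript of it. The expression and input values are my own choices. Two rules do all the work: an addition node passes its incoming gradient through unchanged, and a multiplication node scales it by the other input.

```python
# A tiny expression: L = (a * b + c) * f
def forward(a: float, b: float, c: float, f: float) -> float:
    e = a * b
    d = e + c
    return d * f


a, b, c, f = 2.0, -3.0, 10.0, -2.0

# Forward pass, keeping every intermediate value.
e = a * b          # -6.0
d = e + c          #  4.0
L = d * f          # -8.0

# Backward pass: walk from L back to the inputs, one node at a time.
dL_dL = 1.0
dL_dd = f * dL_dL  # multiplication node: dL/dd is the other factor, f
dL_df = d * dL_dL  # ... and dL/df is d
dL_de = dL_dd      # addition node: the gradient passes through unchanged
dL_dc = dL_dd
dL_da = b * dL_de  # chain rule: dL/da = (dL/de) * (de/da), and de/da = b
dL_db = a * dL_de

analytic = [dL_da, dL_db, dL_dc, dL_df]


def numerical_grad(i: int, h: float = 1e-6) -> float:
    """Nudge input i by a tiny h and measure how much L moves."""
    args = [a, b, c, f]
    bumped = list(args)
    bumped[i] += h
    return (forward(*bumped) - forward(*args)) / h


for i, name in enumerate("abcf"):
    print(f"dL/d{name}: chain rule = {analytic[i]}, numerical ≈ {numerical_grad(i):.4f}")
```

The numerical check is the safety net. If a hand-derived gradient disagrees with the nudge-and-measure estimate, the derivation is wrong.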

For anyone building AI curriculum, this series is the gold standard for how to sequence knowledge without losing the human on the other side.

Resource: Neural Networks: Zero to Hero — Full Playlist

5. The Ghost in the Machine: What Karpathy Says the LLM Actually Is

What it is: A mental model Karpathy has described across several talks and posts, where he frames the LLM not as an intelligent being but as a kind of statistical ghost, a system that has compressed an enormous amount of human knowledge without ever experiencing any of it.

What caught my attention:

Karpathy has a way of saying uncomfortable things calmly. One of them is this: the LLM does not know anything. Not in the way you and I know things. It has no experiences, no continuous memory, no body, no stakes. What it has is a very sophisticated compression of patterns from text produced by billions of humans over decades. It is, in his framing, something like a dream of the internet.

He calls it a "lossy compression" of human knowledge. When you prompt an LLM, you are not talking to an intelligence. You are querying a statistical reconstruction of what humans have written, thought, argued, and published. The model hallucinates not because it is broken but because that is what happens when a lossy compression is asked to reconstruct something it only partially captured.
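A toy way to feel the "lossy compression" idea is a word-level bigram model. It throws away everything about its corpus except which word tends to follow which, then generates text by sampling from those counts. This is a crude analogy of my own and vastly simpler than an LLM, and the two-sentence corpus is invented. Every step it takes is locally plausible, yet it can produce sentences that never appeared in its data. That is a rough picture of how a reconstruction from compressed statistics can be fluent and wrong at the same time.

```python
import random
from collections import defaultdict
from typing import Dict, List

# An invented two-sentence "training set".
CORPUS = ["the cat sat on the mat", "the dog ran to the park"]


def train_bigram(sentences: List[str]) -> Dict[str, Dict[str, int]]:
    """Compress the corpus into next-word counts; everything else is discarded."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts: Dict[str, Dict[str, int]], rng: random.Random, max_words: int = 20) -> str:
    """Reconstruct text by repeatedly sampling a plausible next word."""
    word, out = "<s>", []
    for _ in range(max_words):
        options = counts[word]
        word = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)


model = train_bigram(CORPUS)
rng = random.Random(0)
for _ in range(5):
    s = generate(model, rng)
    print(s, "(seen in training)" if s in CORPUS else "(never seen)")
```

Outputs like "the cat sat on the park" are built entirely from transitions the model saw, yet the sentence itself was never in the data.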

What stopped me was the implication of that framing for how we build with these systems.

If the LLM is a ghost of human knowledge rather than a mind, then the job of the engineer changes completely. You are not collaborating with an intelligent agent. You are designing retrieval systems, evaluation pipelines, grounding mechanisms, and guardrails around a very powerful but fundamentally hollow statistical engine. The intelligence, the judgment, the accountability, that stays with you.

Karpathy is not saying LLMs are useless. He is saying something more important: that understanding what they actually are, rather than what they feel like, is what separates engineers who build reliable systems from engineers who build impressive demos.

This hit me hard in the context of education. I have watched students treat LLM outputs as ground truth. I have watched developers ship products built entirely on the assumption that the model knows what it is doing. Karpathy's ghost framing gives you a way to push back on that without dismissing the technology. The model is powerful. It is also stateless, memoryless, and indifferent. Hold both things at once.

The engineers who will build the most trustworthy AI systems in the next decade are the ones who never forget they are working with a ghost.

Resources:

Intro to Large Language Models — Karpathy's clearest public explanation of what LLMs actually are under the hood

Karpathy on X — where he regularly challenges naive assumptions about LLM cognition

Sparks of AGI or Sparks of Stochastic Parrots — a related community debate Karpathy has engaged with

6. LLMs Are a New Kind of Operating System

What it is: A framing Karpathy introduced in a widely shared 2023 tweet and expanded in his talks.

What caught my attention:

He described the LLM not as a chatbot, not as a search engine, but as a new computing primitive. Just as an operating system abstracts hardware and lets developers build on top of it, the LLM abstracts language and reasoning and lets a new class of applications sit on top.

The analogy has teeth. If the LLM is the OS, then prompt engineering is closer to systems programming than most people realise. Retrieval pipelines are like file systems. Agents are like processes. Tool use is like system calls.
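To make the "tool use is like system calls" analogy concrete, here is a minimal sketch of the runtime side. The model emits a structured request, and ordinary code validates it against a fixed table of allowed tools, runs it, and hands back the result. Everything here is hypothetical: the JSON shape, the tool names and the in-memory document store are my own assumptions. No real model is called, and this is not the Vercel AI SDK's API.

```python
import json
from typing import Any, Callable, Dict

# An in-memory dict standing in for a retrieval store (the "file system").
DOCS = {"refund_policy": "Refunds are accepted within 30 days."}


def read_doc(args: Dict[str, Any]) -> str:
    return DOCS.get(args.get("name", ""), "<not found>")


def add(args: Dict[str, Any]) -> str:
    return str(args["a"] + args["b"])


# The "system call table": the only operations the model may request.
TOOLS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "read_doc": read_doc,
    "add": add,
}


def dispatch(model_output: str) -> str:
    """Handle a tool call the way a kernel handles a syscall:
    parse the request, check it against the table, run it, return the result."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: model output was not valid JSON"
    name = call.get("tool") if isinstance(call, dict) else None
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](call.get("args", {}))


# Hard-coded strings standing in for what a model might emit; no LLM is called.
print(dispatch('{"tool": "read_doc", "args": {"name": "refund_policy"}}'))
print(dispatch('{"tool": "add", "args": {"a": 2, "b": 3}}'))
print(dispatch('{"tool": "delete_everything"}'))
```

As with a kernel, the important design choice is that the model can only *ask*. The runtime decides what actually executes.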

I have been building on the Vercel AI SDK, and this framing clarified why so much of what I was doing felt familiar: the concepts are architecturally parallel, just at a higher level of abstraction.

Resources:

Karpathy's tweet on LLMs as a new OS (2023)

Intro to Large Language Models — Full Talk

7. Vibe Coding and the Future of Programming

What it is: A concept Karpathy introduced in a February 2025 tweet that immediately sparked a global conversation.

What caught my attention:

He described a mode of programming he called "vibe coding," where you essentially surrender to the AI and let it write the code while you steer by feel. You describe what you want, you accept or reject what comes back, you iterate. You stop reading every line. You stop maintaining full comprehension of the system.

People reacted strongly, in both directions. Some saw it as liberating. Some saw it as dangerous.

What caught my attention was not the concept itself but how Karpathy framed it: he was not prescribing it as the future of serious engineering. He was describing a new mode that exists and is already being used, whether we name it or not. The naming forces us to think about when it is appropriate and when it is not.

For the work I do at SEED, this matters enormously. Vibe coding might be fine for a weekend prototype. It is not fine for a health assistant making clinical decisions. Knowing the difference is the skill.

Resource: Karpathy's Original Vibe Coding Tweet — February 2025

8. Sleep, Reading, and the Boring Habits of Great Engineers

What it is: Remarks scattered across several podcast appearances and X posts, where Karpathy talks about his daily habits with unusual candour.

What caught my attention:

He treats sleep as a performance input, not a luxury. He reads widely outside of AI, in history, biology, and physics, and believes cross-domain thinking is underrated in ML research. He runs. He takes long walks when stuck on a problem.

None of this is exotic. That is exactly the point.

There is an expectation in tech culture that elite performance looks dramatic. All-nighters, caffeine, obsessive sprinting. Karpathy's life, at least as he describes it, is measured. Consistent. He protects his attention and his recovery the way a serious athlete would.

Coming from a culture where "hustle" is glorified at the expense of everything else, this was worth sitting with. Output over a long career requires sustainability. That is not a soft take. It is an engineering constraint.

Resource: Karpathy on Dwarkesh Patel Podcast
https://www.youtube.com/watch?v=I2NNxr3WPDo

9. The Attention Mechanism, Explained Like You Actually Matter

What it is: Karpathy's "Let's build GPT from scratch" video on YouTube.

What caught my attention:

There are hundreds of transformer explanations on the internet. Most of them either stay at the surface ("attention lets the model focus on relevant words") or go deep into math in a way that loses the intuition. Karpathy does neither.

He builds a character-level language model, then upgrades it step by step until it becomes a small GPT. Every addition is motivated. You know why you are adding positional encoding. You know why multi-head attention is not just a trick but a feature. You see the problem residual connections solve before he adds them.
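To get a feel for the core step the video builds up to, here is single-head causal self-attention in plain Python, with no framework. The video itself works in PyTorch; this version trades speed for visibility. The token vectors and weight matrices are made-up numbers standing in for values a real model would learn. Each token scores every earlier token, future positions are masked to negative infinity so they get zero weight after the softmax, and the output is a weighted average of the value vectors.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(x - m) for x in row]  # exp(-inf) is 0.0, so masked slots vanish
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head causal self-attention over T token vectors (the rows of x)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    head_size = len(wq[0])
    scores = matmul(q, transpose(k))  # T x T: how strongly token t matches token s
    weights = []
    for t in range(len(x)):
        # Scale by sqrt(head_size); mask future positions (s > t) with -inf.
        row = [scores[t][s] / math.sqrt(head_size) if s <= t else -math.inf
               for s in range(len(x))]
        weights.append(softmax(row))
    return matmul(weights, v), weights


# Three made-up 2-dimensional token vectors and made-up projection weights.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
wq = [[0.5, -0.2], [0.1, 0.4]]
wk = [[0.3, 0.8], [-0.6, 0.2]]
wv = [[1.0, 0.0], [0.0, 2.0]]

out, weights = causal_self_attention(x, wq, wk, wv)
for t, row in enumerate(weights):
    print(f"token {t} attends with weights {[round(w, 3) for w in row]}")
```

The first token can only attend to itself, so its row of weights is [1.0, 0.0, 0.0]. Later tokens spread their attention over everything up to and including themselves, never beyond.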

The phrase that stayed with me: "I want you to be able to look at the code and feel like it makes sense, not just accept it." That is curriculum design philosophy, not just a tutorial choice.

Resource: Let's Build GPT from Scratch, in Code, Spelled Out

10. What Karpathy Taught Me About Teaching

What it is: A synthesis, not a single resource, but a pattern I noticed across everything I consumed.

What caught my attention:

Karpathy never explains something as if it is beneath him. He does not rush past the basics to signal sophistication. He meets the learner where they are, not where he is.

Watch how he handles the backpropagation derivation in the Zero to Hero series. He is one of the most credentialed AI researchers alive. He spent years at OpenAI and Tesla. And he is sitting there, carefully, manually computing the gradient of a single multiplication node, narrating every step, because he knows that is where the real understanding lives.

That is not humility as performance. That is humility as pedagogy. And it is the hardest thing to actually do.

I think about this every time I design a module for SEED. The goal is not to show what I know. The goal is to transfer what I know. Those are very different jobs.

Resource: Karpathy's Teaching Approach Across Zero to Hero

Closing Thoughts

I started this reading and watching project because I wanted to understand Karpathy the engineer. I ended it understanding something more useful: what a complete practitioner actually looks like.

He builds. He explains. He publishes code that runs. He writes posts that teach. He is not famous for being famous in tech. He is famous for doing the work at a level that is genuinely hard to ignore.

The fifth idol did not join the list because he is successful. He joined because he embodies something I want to grow into: the ability to do serious technical work and communicate it with the same seriousness. Engineering and language, together, at full commitment.

There is a lot more in the Karpathy archive. These ten things are just what caught me hardest over the past several months. I will keep going. I recommend you do too.

If you found this useful, share it with someone who is learning AI and needs to see what the ceiling looks like.

Written by Fonyuy Gita fonyuygita.ai