Rohit Ghumare

Posted on May 24

Build It, Then Use It: How I wrote 435 AI engineering lessons from scratch

#ai #opensource #agents #machinelearning

Repo-driven AI learning across 20 phases

The first time I wrote a tokenizer, I did it with a for loop. I counted byte pairs by hand, merged the most common ones, and waited about forty seconds for it to chew through a small corpus. The loop was slow. The output was ugly. The output was correct.

GitHub Repo: https://github.com/rohitg00/ai-engineering-from-scratch

Then I ran the same input through tiktoken and watched it finish in forty milliseconds.

That was the moment tiktoken stopped being magic. It was the same thing I had written the night before, in Rust, with the loop unrolled and the cache warm. It was not a library anymore. It was my code, faster.

That experience is the rule I followed for the next eighteen months. Build the small version by hand. Then run the same thing through the production library. The framework stops being a black box because you already wrote the smaller version.
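A minimal sketch of that for-loop tokenizer, assuming byte-level BPE over UTF-8 input. The function names and the toy corpus are illustrative, not taken from the repo, and nothing here is tuned for speed.

```python
from collections import Counter

def most_common_pair(ids):
    """Count adjacent pairs and return the most frequent one (or None)."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair = most_common_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # ids 0..255 are reserved for raw bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges

def decode(ids, merges):
    """Undo the merges; dict insertion order is the order they were learned."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")

text = "low lower lowest low low"
ids, merges = train_bpe(text, num_merges=3)
print(len(text.encode("utf-8")), "bytes ->", len(ids), "tokens")
print(decode(ids, merges) == text)
```

Every merge shortens the sequence, and decoding replays the merges in reverse to get the original text back. A production tokenizer follows the same idea with a much larger merge table and a far faster implementation.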

I wrote 435 lessons under that rule. They live in a free, MIT-licensed curriculum called ai-engineering-from-scratch. This post is about the rule, why it works, and what fell out when I followed it across twenty phases of AI engineering.

The 18% problem

A survey of CS students that came out last year stuck with me. Around 84% of them use AI tools every day. Around 18% feel ready to ship anything with those tools at work.

That gap is not about access. It is about the shape of what gets taught.

You can fine-tune a model and never write a forward pass. You can wire an agent up to a function and never define attention. You can pip install transformers, ship a demo, and never compute a gradient by hand. Frameworks accept that bargain. The bargain breaks the first time your loss curve diverges, your tokenizer chews ten times as many tokens for Japanese as for English, or your agent ships hallucinations because the context window is half-full of duplicated boilerplate. None of that is in the README of the library you imported.
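One piece of the Japanese-versus-English gap is visible before any merges happen. The sketch below assumes a byte-level tokenizer that starts from UTF-8 bytes, and the sample strings are my own. It only shows that starting point and does not reproduce the tenfold figure, which depends on the merges a particular tokenizer has learned.

```python
def bytes_per_char(text):
    """Average number of UTF-8 bytes per character in `text`."""
    return len(text.encode("utf-8")) / len(text)

english = "hello"
japanese = "こんにちは"  # five hiragana characters

# A byte-level tokenizer sees one starting token per byte.
print(bytes_per_char(english))
print(bytes_per_char(japanese))
```

ASCII letters take one byte each and hiragana take three, so the Japanese string starts with three times as many byte tokens per character. Merges learned mostly from English text then widen the gap.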

Most courses I tried either taught the math without writing a line of code, or taught the code without writing a line of math. The few that did both jumped straight to PyTorch on lesson one. I wanted the in-between, so I wrote it.

The rule

Every algorithm worth knowing gets two halves.

Build It. Numpy and stdlib. No frameworks. You step through the chain rule on paper, then write backprop. You count byte pairs in a loop, then call that a tokenizer. You compose three matrices, then call that attention. The code is slow. The code is short enough to read in one sitting. You can put a print statement anywhere.

Use It. Same algorithm, same data, but through PyTorch or sklearn or tiktoken or whatever the production tool is. You diff the output. You watch the framework hide the noise. The framework stops being a black box because the small version is sitting in the file next to it.

The trick is that the two halves are not optional. The Build It half on its own leaves you with toy code that does not scale. The Use It half on its own leaves you with a library call you cannot debug. Together they leave you with a tool you can ship with and a model in your head of what it is doing.
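To make the "chain rule on paper, then backprop" step concrete, here is a small sketch for a single sigmoid neuron with squared-error loss. The model, the values, and the finite-difference check are illustrative assumptions rather than the lesson's actual code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, t):
    """Squared error of a single sigmoid neuron against target t."""
    return (sigmoid(w * x + b) - t) ** 2

def grads(w, b, x, t):
    """Backprop by hand: apply the chain rule one factor at a time."""
    z = w * x + b            # forward pass, keeping intermediates
    y = sigmoid(z)
    dL_dy = 2.0 * (y - t)    # d/dy of (y - t)^2
    dy_dz = y * (1.0 - y)    # derivative of the sigmoid
    dL_dz = dL_dy * dy_dz
    return dL_dz * x, dL_dz  # dL/dw, dL/db

def numeric_grad(f, v, eps=1e-6):
    """Central finite difference, used only to check the hand-derived gradient."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)

w, b, x, t = 0.5, -0.3, 2.0, 1.0
dw, db = grads(w, b, x, t)
dw_num = numeric_grad(lambda v: loss(v, b, x, t), w)
db_num = numeric_grad(lambda v: loss(w, v, x, t), b)
print(abs(dw - dw_num) < 1e-6, abs(db - db_num) < 1e-6)
```

The finite-difference check plays the same role here that the framework plays in the Use It half: an independent answer to diff against.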

A worked example: attention in 30 lines

Here is the core of the first transformer lesson. No framework. No checkpoint. The same math that runs inside Llama, GPT-class models, and most of the open weights you have heard of.

import numpy as np

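def attention(Q, K, V, mask=None):
    # Scaled dot-product attention; mask is True where attention is allowed
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Masked-out positions get a large negative score, so ~0 weight
        scores = np.where(mask, scores, -1e9)
    # Softmax over keys, subtracting the row max for numerical stability
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values
    return weights @ V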

# Toy run: 4 tokens, 8-dim
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)

That is the whole thing. The numerical-stability trick on the softmax (subtracting the max before exp) is the one part that confuses readers, so the lesson takes a paragraph to derive it. The mask becomes the difference between encoder and decoder attention later in the phase.
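The max-subtraction trick works because softmax is unchanged by shifting every input by the same constant c: the factor exp(-c) appears in both the numerator and the denominator and cancels. A short sketch of that, with illustrative function names and made-up inputs:

```python
import numpy as np

def softmax_naive(x):
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_stable(x):
    # Shift so the largest input is 0; exp() can then never overflow.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

small = np.array([1.0, 2.0, 3.0])
large = small + 999.0  # same differences between entries, much bigger values

# On small inputs the two versions agree.
print(np.allclose(softmax_naive(small), softmax_stable(small)))

# On large inputs exp() overflows to inf and the naive version returns nan.
with np.errstate(over="ignore", invalid="ignore"):
    print(np.isnan(softmax_naive(large)).any())

# The stable version is unaffected by the shift.
print(np.allclose(softmax_stable(large), softmax_stable(small)))
```

Only the differences between scores matter, so shifting by the row maximum costs nothing and keeps every exponent at or below zero.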

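The second half of the lesson runs the same example through torch.nn.MultiheadAttention and checks that the output matches to numerical precision once the head count is set to one. (The module also applies learned input and output projections, so the comparison only holds once those are accounted for, for example by setting them to identity.) Now PyTorch is not a black box. It is your code, compiled for CUDA.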
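Going back to the mask argument for a moment: a sketch of how a causal mask changes the attention weights, using the same convention as the function above (True means "may attend"). The lower-triangular mask is the standard decoder-style choice; I am assuming it here, not quoting the lesson.

```python
import numpy as np

def attention_weights(Q, K, mask=None):
    """Softmax-normalised attention weights, without the final multiply by V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))

# Token i may attend to tokens 0..i only.
causal = np.tril(np.ones((4, 4), dtype=bool))

w_full = attention_weights(Q, K)                 # encoder-style: all positions visible
w_causal = attention_weights(Q, K, mask=causal)  # decoder-style: no looking ahead

print(np.round(w_causal, 2))
```

Everything above the diagonal comes out as zero, each row still sums to one, and the first token can only attend to itself.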

A second example: gradient descent without the framework

Same shape, different layer of the stack. Phase 2 builds a linear regressor with handwritten gradient descent, no optimizer, no autograd.

import numpy as np

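def fit(X, y, lr=0.01, steps=1000):
    # Start from zero weights and bias
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(X)
    for _ in range(steps):
        pred = X @ w + b
        err = pred - y
        # Gradient of the half mean squared error, (1 / 2n) * sum(err ** 2)
        w -= lr * (X.T @ err) / n
        b -= lr * err.sum() / n
    return w, b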

Six lines of math. Three of them are the gradient. When the lesson re-runs the same fit through scikit-learn, the coefficients match to within floating-point tolerance once the loop has converged. When the lesson re-runs the same loop through PyTorch with optim.SGD, the loss curve overlays. Now optim.SGD is no longer something you trust on faith.
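A sketch of that kind of cross-check without installing anything beyond NumPy. As a stand-in for scikit-learn it uses np.linalg.lstsq, which solves the same ordinary least-squares problem in closed form. The synthetic data, the learning rate, and the step count are assumptions chosen so the loop converges on this particular dataset.

```python
import numpy as np

def fit(X, y, lr=0.01, steps=1000):
    """Same handwritten gradient descent as above."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(X)
    for _ in range(steps):
        err = X @ w + b - y
        w -= lr * (X.T @ err) / n
        b -= lr * err.sum() / n
    return w, b

# Synthetic data: known weights plus a little noise (made up for this sketch).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3 + 0.01 * rng.standard_normal(200)

w_gd, b_gd = fit(X, y, lr=0.1, steps=2000)

# Closed-form least squares: append a column of ones to carry the bias.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w_ls, b_ls = coef[:3], coef[3]

print(np.allclose(w_gd, w_ls, atol=1e-6), np.isclose(b_gd, b_ls, atol=1e-6))
```

With the default lr=0.01 and 1000 steps the loop gets close but has not fully converged on this data, which is a useful thing to see for yourself before trusting either answer.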

Stack twenty phases of this and you end up writing a small LLM in Phase 10, then a working agent loop in Phase 14, then a multi-agent system in Phase 19. Every layer of the tower has a hand-built version sitting under the framework version.

The four artifacts

There is one more thing I would not have predicted before I started writing. Every lesson, on top of the code, produces one of four reusable outputs.

A prompt template for a specific task. A skill spec that drops into Claude, Cursor, or Codex. An agent definition with a clear job. An MCP server that exposes the lesson's code as a tool.

By Phase 19 you have hundreds of these. They are not novelties. They are the thing you reach for when a real task lands on your desk and you remember "I wrote a retrieval skill for this in Phase 12." The curriculum is a textbook on the way in and a toolbox on the way out.

How to start

Three ways in, ordered by friction.

Read it in the browser. Open any lesson at aiengineeringfromscratch.com. No setup, no clone.

Clone and run.

git clone https://github.com/rohitg00/ai-engineering-from-scratch.git
cd ai-engineering-from-scratch
python phases/01-math-foundations/01-linear-algebra-intuition/code/vectors.py

Install the skills into your agent. Works in Claude, Cursor, Codex, and a few others.

npx skills add rohitg00/ai-engineering-from-scratch

Then run /find-your-level inside the agent. Ten questions, then the agent picks a phase and gives an estimate in hours. If you have shipped ML before, you might start at Phase 7. If you are coming from a frontend background, Phase 1.

What this is not

Not video lectures. Not copy-paste deploys. Not a five-minute YouTube explainer. Not "ten prompts to land a senior role."

The lessons are dense. The math is real. Everything runs on a laptop. Backprop in Python. Attention in TypeScript. A toy GPU kernel in Rust. A Bayesian sampler in Julia. If you only want to call an API, the curriculum will feel slow. If you want to know why the API works, this is the route.

The closing argument

I wrote this because nothing on the internet was the thing I wished existed when I started. I made it free because the open-source version of me was the one who needed it most.

The rule worked. It is the only reason a curriculum this large held together for eighteen months without contradicting itself. Build the small version first. Then run the same thing through the framework. The framework stops being magic.

If the curriculum helps you, a star on the repo means the next person finds it sooner. If something is missing, open an issue and I will write the lesson.

Read it → GitHub

Top comments (10)

Valentin Monteiro • May 25

The structure looks solid. Curious about one thing though: in client work I keep seeing the same gap, devs know the patterns but skip the eval discipline. Did you build dedicated lessons on writing the test harness BEFORE the agent runs, or is that woven through?

Rohit Ghumare • May 25

Yes, there are 20 lessons on harness engineering.

Valentin Monteiro • May 26

Nice, 20 lessons is more depth than I expected. Curious whether the harness sections cover regression-style evals (replaying past failures against current versions), or whether it's more 'fresh eval at every iteration'? That's the part most people skip even when they have a harness.

Sara Aly • May 25

Looks like I’ll have to free up 270 hours in the next few months. This is epic!! I have been consuming AI for a while, but the thought of creating seemed daunting. Hopefully my rusty math skills and strong Python knowledge can get me through. Plus, this style of learning is right up my alley, and that of many other programmers: getting your hands dirty is the only way! Producing useful output throughout the lessons is an added perk. Super impressive!

Swift • May 25

I tried this myself yesterday after seeing it on HN! I love the idea of repo-driven learning. It's as close to the environment we're trying to replicate as possible. Thanks for building and sharing!

Rohit Ghumare • May 25

Glad you liked it.

Talen • May 25

This is AMAZING! I've been on the lookout for material that teaches the maths behind every code snippet, while actually shipping. Just cloned the repository, can't wait to get stuck in!

Rohit Ghumare • May 25

All the best.

Harjot Singh • May 31

"Build it, then use it" is the most honest way to actually learn something - writing 435 lessons forces you to confront every gap in your own understanding, because you can't teach what you only half-know. The dogfooding angle (use your own material) is the quality gate that keeps it from being 435 plausible-but-shallow entries.

The AI-era twist: you could have generated 435 lessons in an afternoon, but they'd be the average of the internet, not your hard-won understanding. The value is precisely in the parts that came from your experience, not the model's priors - same reason owned, legible output beats a black-box dump. That "the human-judgment 20% is the whole value" theme is core to how I think about Moonshift (prompt to a shipped SaaS on your own GitHub+Vercel). Huge effort; how'd you keep quality consistent across 435 - templates, or self-review pass? (Moonshift's first run's free if useful.)

Maya Andersson • May 27

Build-it-then-use-it is a good frame, especially because most AI eng material teaches the API surface without ever testing whether the reader can apply it. Curious how you plan to measure retention. The honest answer is usually a coding test at week 4 (can the reader build a similar thing without the post in front of them) rather than self-reported recall. Anything like that planned? Otherwise the 435 turns into pleasant background reading that doesn't change behavior.