I'm an AI PhD student, and I have started writing a free, public notebook on how AI models work under the hood.
The goal is to make the mechanics easier to reason about, without hiding everything behind library calls. I am writing the notes I wish I had when I was moving from "I can run the code" to "I understand what the model is doing."
What is inside so far
- Building GPT from scratch in PyTorch: tokenizer, embeddings, masked self-attention, multi-head attention, residual blocks, training loop, and generation.
- Attention explained from scratch: query, key, value vectors, softmax, context vectors, and why the mechanism matters.
- Tensors for deep learning: shapes, dimensions, and why tensor thinking is the language of neural networks.
- Gradient descent intuition: learning rate, derivatives, backpropagation, and the optimization loop.
- Identity-aware negative sampling: a short note from my deepfake-detection research.
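To give a flavor of the attention items above, here is a minimal sketch of causal (masked) scaled dot-product attention in plain Python. It is not taken from the linked walkthroughs, which use PyTorch. The function names `softmax` and `causal_attention` and the toy vectors are assumptions chosen for illustration, and the learned query/key/value projections are left out to keep it short.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """Scaled dot-product attention where position i only sees positions <= i.

    Illustrative helper (hypothetical name), not the notebook's implementation.
    """
    d_k = len(keys[0])
    dim = len(values[0])
    contexts = []
    for i, q in enumerate(queries):
        # Causal mask: score the query only against keys at positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Context vector: weighted average of the visible value vectors.
        context = [
            sum(w * values[j][k] for j, w in enumerate(weights))
            for k in range(dim)
        ]
        contexts.append(context)
    return contexts

# Toy input: three token vectors reused as queries, keys, and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_attention(x, x, x)
```

The first position can only attend to itself, so its context vector equals its own value vector. Later positions blend the values of everything up to and including themselves.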
A few direct links:
- GPT walkthrough: https://insideaimodels.com/blog/building-gpt-from-scratch
- Attention explainer: https://insideaimodels.com/blog/attention-explained
- Tensor primer: https://insideaimodels.com/blog/what-is-a-tensor
- Gradient descent: https://insideaimodels.com/blog/how-gradient-descent-works
There is no paywall, signup, or course funnel. I am sharing it publicly because writing helps me learn, and because practical ML resources are better when they stay open.
If there is a part of modern AI models that usually gets hand-waved in tutorials, I would love to hear what I should cover next.