Free from-scratch deep learning notes: tensors, attention, and a tiny GPT

#tutorial #ai #machinelearning #deeplearning

I'm an AI PhD student, and I've started writing a free, public notebook on how AI models work under the hood.

The goal is to make the mechanics easier to reason about, without hiding everything behind library calls. I'm writing the notes I wish I'd had when I was moving from "I can run the code" to "I understand what the model is doing."

What is inside so far

Building GPT from scratch in PyTorch: tokenizer, embeddings, masked self-attention, multi-head attention, residual blocks, training loop, and generation.
Attention explained from scratch: query, key, value vectors, softmax, context vectors, and why the mechanism matters.
Tensors for deep learning: shapes, dimensions, and why tensor thinking is the language of neural networks.
Gradient descent intuition: learning rate, derivatives, backpropagation, and the optimization loop.
Identity-aware negative sampling: a short note on one direction from my deepfake-detection research.
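To give a flavor of the "from scratch" level these notes aim for, here is a minimal sketch of masked (causal) scaled dot-product attention in plain Python. It is not the code from the notebook: the function names `softmax` and `causal_attention` are hypothetical, and it assumes the query, key, and value vectors have already been produced (in a real GPT they come from learned linear projections of the token embeddings). It uses lists instead of tensors so every step stays visible.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Hypothetical helper: masked scaled dot-product attention on lists.

    queries, keys: one d_k-dimensional vector per position.
    values: one d_v-dimensional vector per position.
    Returns one d_v-dimensional context vector per position.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    contexts = []
    for t, q in enumerate(queries):
        # Causal mask: position t may only attend to positions 0..t.
        visible = range(t + 1)
        # Scaled dot-product scores: (q . k_j) / sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, keys[j])) / math.sqrt(d_k)
            for j in visible
        ]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of the visible value vectors.
        context = [
            sum(w * values[j][i] for j, w in zip(visible, weights))
            for i in range(d_v)
        ]
        contexts.append(context)
    return contexts


# Usage: three positions with 2-dimensional queries, keys, and values.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
for row in causal_attention(q, k, v):
    print([round(x, 3) for x in row])
```

The first position can only see itself, so its context vector is just its own value vector; later positions blend everything up to and including themselves. A PyTorch version would typically compute the same thing for all positions at once with batched matrix multiplications and a lower-triangular mask.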

A few direct links:

GPT walkthrough: https://insideaimodels.com/blog/building-gpt-from-scratch
Attention explainer: https://insideaimodels.com/blog/attention-explained
Tensor primer: https://insideaimodels.com/blog/what-is-a-tensor
Gradient descent: https://insideaimodels.com/blog/how-gradient-descent-works

There is no paywall, signup, or course funnel. I am sharing it publicly because writing helps me learn, and because practical ML resources are better when they stay open.

If there's a part of modern AI models that usually gets hand-waved in tutorials, let me know. I'd love to hear what I should cover next.