Introduction
In Part 1, we built the Tensor class and a computation graph.
In Part 2, we implemented automatic differentiation from scratch.
In this part, we will:
- Use our custom autograd engine (babygrad)
- Build a small neural network
- Train it on the MNIST handwritten digits dataset
Missed Part 1?
Read it here: https://dev.to/zekcrates/lets-build-a-deep-learning-library-from-scratch-using-numpy-part-1-32p9
Want to skip the series and read the full book now?
- Read it for free online: https://zekcrates.quarto.pub/deep-learning-library/
Loading MNIST data
You can download the MNIST data easily; the unpacked files look like this:
# data/
t10k-images-idx3-ubyte.gz
t10k-labels-idx1-ubyte.gz
train-images-idx3-ubyte.gz
train-labels-idx1-ubyte.gz
The image file should parse into a (num_images, 784) array and the label file into a (num_images,) array, where the labels are in the range 0–9.
import struct
import gzip
import numpy as np

def parse_mnist(image_filename, label_filename):
    # Images: 16-byte header (magic, count, rows, cols), then raw pixels.
    with gzip.open(image_filename, 'rb') as f:
        magic, num_images, rows, cols = struct.unpack('>IIII', f.read(16))
        image_data = np.frombuffer(f.read(), dtype=np.uint8)
        images = image_data.reshape(num_images, rows * cols)

    # Labels: 8-byte header (magic, count), then one byte per label.
    with gzip.open(label_filename, "rb") as f:
        magic, num_labels = struct.unpack('>II', f.read(8))
        labels = np.frombuffer(f.read(), dtype=np.uint8)

    # Scale pixel values to [0, 1].
    images = images.astype(np.float32) / 255.0
    return images, labels
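A quick way to sanity-check the parser is shown below. This is only a sketch; the paths are an assumption and should point to wherever you unpacked the archives.

# Load both splits; MNIST has 60,000 training and 10,000 test images.
X_train, y_train = parse_mnist(
    "data/train-images-idx3-ubyte.gz",
    "data/train-labels-idx1-ubyte.gz",
)
X_test, y_test = parse_mnist(
    "data/t10k-images-idx3-ubyte.gz",
    "data/t10k-labels-idx1-ubyte.gz",
)

print(X_train.shape, y_train.shape)  # (60000, 784) (60000,)
print(X_test.shape, y_test.shape)    # (10000, 784) (10000,)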
Now that we have our data, let's create a simple model to train on it.
It will have only 2 weight matrices (W1, W2):
W1 = (784, 100)
W2 = (100, 10)
from babygrad import Tensor, ops

class SimpleNN:
    def __init__(self, input_size, hidden_size, num_classes):
        # Two weight matrices, scaled at init and tracked for gradients.
        self.W1 = Tensor(
            np.random.randn(input_size, hidden_size).astype(np.float32)
            / np.sqrt(hidden_size),
            requires_grad=True
        )
        self.W2 = Tensor(
            np.random.randn(hidden_size, num_classes).astype(np.float32)
            / np.sqrt(num_classes),
            requires_grad=True
        )

    def forward(self, x):
        z1 = x @ self.W1        # (batch, 784) @ (784, 100) -> (batch, 100)
        a1 = ops.relu(z1)       # element-wise ReLU
        logits = a1 @ self.W2   # (batch, 100) @ (100, 10) -> (batch, 10)
        return logits

    def parameters(self):
        return [self.W1, self.W2]
The model takes a batch of images, each of size (784,). Using a batch of 5 images as an example, the batch first goes through W1:

(5, 784) @ (784, 100) -> (5, 100)

x @ self.W1

Note: The @ operator uses our custom matmul op implemented in Part 2.
After passing through W1, the batch has shape (5, 100).
We only have 10 labels (digits 0–9), so the model needs to predict one score per class. We send the hidden activations through W2:

(5, 100) @ (100, 10) -> (5, 10)

logits = a1 @ self.W2

The output of shape (5, 10) contains raw class scores (logits) for each digit.
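Here is a quick sanity check of these shapes. It's only a sketch: the batch of five random "images" is made up for illustration, and it assumes the Tensor class exposes a .shape attribute, as softmax_loss below also relies on.

# Hypothetical shape check with a fake batch of 5 images.
model = SimpleNN(input_size=784, hidden_size=100, num_classes=10)
x = Tensor(np.random.rand(5, 784).astype(np.float32))

logits = model.forward(x)
print(logits.shape)  # (5, 10) -- one raw score per digit class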
But logits alone aren't enough; we need a loss function. The loss is the value we will decrease by updating (W1, W2) using their gradients.
Loss function
def softmax_loss(logits: Tensor, y_true: Tensor) -> Tensor:
    batch_size = logits.shape[0]
    # log(sum(exp(z))) for each row of logits.
    log_sum_exp = ops.log(ops.exp(logits).sum(axes=1))
    # Pick out the logit of the true class (y_true is one-hot).
    z_y = (logits * y_true).sum(axes=1)
    loss = log_sum_exp - z_y
    # Average over the batch to get a single scalar.
    return loss.sum() / batch_size
This gives us a single loss value.
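As a quick sanity check, the same value can be computed directly with NumPy. This is a sketch, not part of the library; it assumes Tensor stores its values in .data, as the training loop below also does.

# Compare softmax_loss against a plain-NumPy cross-entropy computation.
logits_np = np.random.randn(5, 10).astype(np.float32)
y_np = np.random.randint(0, 10, size=5)

# One-hot encode the labels, as the training loop will do.
y_one_hot = np.zeros((5, 10), dtype=np.float32)
y_one_hot[np.arange(5), y_np] = 1

loss = softmax_loss(Tensor(logits_np), Tensor(y_one_hot))

# Reference: mean of log-sum-exp minus the true-class logit.
ref = np.mean(
    np.log(np.exp(logits_np).sum(axis=1)) - logits_np[np.arange(5), y_np]
)
print(loss.data, ref)  # the two numbers should match closely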
We now have everything we need:
- Data
- Model
- Loss function
The only thing left is to train the model on this data. We can do that by adding a training loop.
Training loop
def train_epoch(model, X_train, y_train, lr, batch_size):
    for i in range(0, X_train.shape[0], batch_size):
        x_batch = Tensor(X_train[i:i+batch_size])
        y_batch_np = y_train[i:i+batch_size]

        logits = model.forward(x_batch)

        # One-hot encode the labels for the loss function.
        num_classes = logits.shape[1]
        y_one_hot = np.zeros((y_batch_np.shape[0], num_classes),
                             dtype=np.float32)
        y_one_hot[np.arange(y_batch_np.shape[0]), y_batch_np] = 1
        y_one_hot = Tensor(y_one_hot)

        loss = softmax_loss(logits, y_one_hot)

        # Zero gradients
        for p in model.parameters():
            p.grad = None

        # Backprop: gradients are calculated.
        loss.backward()

        # Parameters (W1, W2) updated using gradients.
        for p in model.parameters():
            p.data -= lr * p.grad

        preds = logits.data.argmax(axis=1)
        acc = np.mean(preds == y_batch_np)
        print(
            f"Batch {i // batch_size}: "
            f"Loss = {loss.data:.4f}, Accuracy = {acc*100:.2f}%"
        )
This loop:
- Builds the computation graph.
- Calls backward().
- Updates parameters using gradients.
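Putting the pieces together, a full run might look like the sketch below. The learning rate, batch size, and number of epochs here are assumptions chosen for illustration, not tuned values.

# Hypothetical end-to-end run; hyperparameters are illustrative only.
X_train, y_train = parse_mnist(
    "data/train-images-idx3-ubyte.gz",
    "data/train-labels-idx1-ubyte.gz",
)

model = SimpleNN(input_size=784, hidden_size=100, num_classes=10)

for epoch in range(3):
    train_epoch(model, X_train, y_train, lr=0.1, batch_size=128)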
After training for a while, the output looks like this:
Batch 13: Loss = 0.2163, Accuracy = 96.09%
Batch 14: Loss = 0.1742, Accuracy = 96.09%
Batch 15: Loss = 0.1630, Accuracy = 96.88%
Batch 16: Loss = 0.1862, Accuracy = 95.31%
Batch 17: Loss = 0.1637, Accuracy = 96.09%
Batch 18: Loss = 0.1812, Accuracy = 95.31%
Batch 19: Loss = 0.2156, Accuracy = 94.53%
Batch 20: Loss = 0.1259, Accuracy = 99.22%
Conclusion
At this point, we’ve successfully trained a neural network on MNIST using an autograd engine built entirely from scratch.
This is the core of every modern deep learning library.
Everything that comes next (optimizers, deeper networks, CNNs) will be built on top of this same foundation.
Want to skip the series and read the full book now?
- Read it for free online: https://zekcrates.quarto.pub/deep-learning-library/