Let’s Build a Deep Learning Library from Scratch Using NumPy (Part 3: Training MNIST)

Introduction

In Part 1, we built the Tensor class and a computation graph.
In Part 2, we implemented automatic differentiation from scratch.

In this part, we will:

  • Use our custom autograd engine (babygrad)
  • Build a small neural network
  • Train it on the MNIST handwritten digits dataset

Missed Part 1?

Read it here: https://dev.to/zekcrates/lets-build-a-deep-learning-library-from-scratch-using-numpy-part-1-32p9

Want to skip the series and read the full book now?

Loading MNIST data

You can easily download the MNIST data; the downloaded files look like this:

# data/ 
t10k-images-idx3-ubyte.gz
t10k-labels-idx1-ubyte.gz
train-images-idx3-ubyte.gz
train-labels-idx1-ubyte.gz


Parsing the images file should give an array of shape (num_images, 784), and parsing the labels file should give an array of shape (num_images,), with labels in the range 0–9.

import struct
import gzip
import numpy as np

def parse_mnist(image_filename, label_filename):
    with gzip.open(image_filename, 'rb') as f:
        # IDX image header: magic number, image count, rows, cols (big-endian uint32s)
        magic, num_images, rows, cols = struct.unpack('>IIII', f.read(16))
        image_data = np.frombuffer(f.read(), dtype=np.uint8)
        # Flatten each 28x28 image into a row of 784 pixels
        images = image_data.reshape(num_images, rows * cols)

    with gzip.open(label_filename, "rb") as f:
        # IDX label header: magic number, label count (big-endian uint32s)
        magic, num_labels = struct.unpack('>II', f.read(8))
        labels = np.frombuffer(f.read(), dtype=np.uint8)

    # Scale pixel values from [0, 255] to [0, 1]
    images = images.astype(np.float32) / 255.0
    return images, labels

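For example, assuming the files sit in a data/ directory laid out as above, loading both splits looks like this (a quick usage sketch, not part of the library itself):

X_train, y_train = parse_mnist(
    "data/train-images-idx3-ubyte.gz",
    "data/train-labels-idx1-ubyte.gz",
)
X_test, y_test = parse_mnist(
    "data/t10k-images-idx3-ubyte.gz",
    "data/t10k-labels-idx1-ubyte.gz",
)

print(X_train.shape, y_train.shape)  # (60000, 784) (60000,)
print(X_test.shape, y_test.shape)    # (10000, 784) (10000,)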

Now that we have our data, let's create a simple model to train on it.

It will have only two weight matrices, W1 and W2:

W1 = (784, 100)
W2 = (100, 10)

from babygrad import Tensor, ops

class SimpleNN:
    def __init__(self, input_size, hidden_size, num_classes):
        # Two weight matrices, scaled down at init so the starting activations stay small
        self.W1 = Tensor(
            np.random.randn(input_size, hidden_size).astype(np.float32)
            / np.sqrt(hidden_size),
            requires_grad=True
        )
        self.W2 = Tensor(
            np.random.randn(hidden_size, num_classes).astype(np.float32)
            / np.sqrt(num_classes),
            requires_grad=True
        )

    def forward(self, x):
        z1 = x @ self.W1        # (batch, 784) @ (784, 100) -> (batch, 100)
        a1 = ops.relu(z1)       # non-linearity between the two layers
        logits = a1 @ self.W2   # (batch, 100) @ (100, 10) -> (batch, 10)
        return logits

    def parameters(self):
        return [self.W1, self.W2]

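As a quick sanity check (a throwaway snippet, not part of the library), we can push a random batch of 5 fake "images" through the model and confirm the output shape:

model = SimpleNN(input_size=784, hidden_size=100, num_classes=10)

# A fake batch of 5 flattened 28x28 images, just to check shapes
x = Tensor(np.random.randn(5, 784).astype(np.float32))
logits = model.forward(x)

print(logits.shape)  # (5, 10)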

The model takes a batch of images, each flattened to shape (784,). With a batch of 5 images, the input first goes through W1:

(5,784) @ (784,100) -> (5,100)
x @ self.W1

Note: The @ operator uses our custom matmul op implemented in Part 2.

After passing through W1 and the ReLU, the activations have shape (5,100).

We only have 10 classes (digits 0–9), so the model needs to output one score per class.

To get there, we send the result through W2:

(5,100) @ (100,10) = (5,10)
logits = a1 @ self.W2

The output of shape (5, 10) contains raw class scores (logits) for each digit.

But logits alone aren't enough; we need a loss function.

This loss is the value we will decrease by updating W1 and W2 using their gradients.

Loss function

def softmax_loss(logits: Tensor, y_true: Tensor) -> Tensor:
    batch_size = logits.shape[0]
    # log(sum_j exp(z_j)) for each example in the batch
    log_sum_exp = ops.log(ops.exp(logits).sum(axes=1))
    # The logit of the true class (y_true is one-hot)
    z_y = (logits * y_true).sum(axes=1)
    # Per-example softmax cross-entropy, averaged over the batch
    loss = log_sum_exp - z_y
    return loss.sum() / batch_size

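In math terms, the per-example quantity this computes is the standard softmax cross-entropy loss for a logits vector z and true class y (a numerically stabilized version would subtract max(z) inside the exponentials; I skip that here to keep the code simple):

$$
\ell(z, y) = \log \sum_{j=0}^{9} e^{z_j} - z_y
$$

The function then averages this value over the batch.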

This gives us a single loss value.

We now have everything we need:

  • Data
  • Model
  • Loss function

The only thing left is to train the model on the data. We can do this with a training loop.

Training loop

def train_epoch(model, X_train, y_train, lr, batch_size):
    for batch_idx, i in enumerate(range(0, X_train.shape[0], batch_size)):
        x_batch = Tensor(X_train[i:i+batch_size])
        y_batch_np = y_train[i:i+batch_size]

        logits = model.forward(x_batch)

        # One-hot encode the labels so they line up with the (batch, 10) logits
        num_classes = logits.shape[1]
        y_one_hot = np.zeros((y_batch_np.shape[0], num_classes),
                             dtype=np.float32)
        y_one_hot[np.arange(y_batch_np.shape[0]), y_batch_np] = 1
        y_one_hot = Tensor(y_one_hot)

        loss = softmax_loss(logits, y_one_hot)

        # Zero gradients from the previous batch
        for p in model.parameters():
            p.grad = None

        # Backprop: gradients are calculated for W1 and W2
        loss.backward()

        # Parameters (W1, W2) updated using their gradients
        for p in model.parameters():
            p.data -= lr * p.grad

        # Batch accuracy: the predicted class is the highest-scoring logit
        preds = logits.data.argmax(axis=1)
        acc = np.mean(preds == y_batch_np)

        print(
            f"Batch {batch_idx:3d}: Loss = {loss.data:.4f}, "
            f"Accuracy = {acc*100:.2f}%"
        )

This loop:

  • Builds the computation graph.
  • Calls backward().
  • Updates parameters using gradients.
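
To run it end to end, a minimal driver could look like the sketch below. The learning rate, batch size, and epoch count are illustrative defaults I picked, not values the library prescribes:

# Assumes X_train and y_train were loaded with parse_mnist as shown earlier
model = SimpleNN(input_size=784, hidden_size=100, num_classes=10)

for epoch in range(3):
    print(f"Epoch {epoch + 1}")
    train_epoch(model, X_train, y_train, lr=0.1, batch_size=128)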

After training for a little while, the output looks like this:

  Batch  13: Loss = 0.2163, Accuracy = 96.09%
  Batch  14: Loss = 0.1742, Accuracy = 96.09%
  Batch  15: Loss = 0.1630, Accuracy = 96.88%
  Batch  16: Loss = 0.1862, Accuracy = 95.31%
  Batch  17: Loss = 0.1637, Accuracy = 96.09%
  Batch  18: Loss = 0.1812, Accuracy = 95.31%
  Batch  19: Loss = 0.2156, Accuracy = 94.53%
  Batch  20: Loss = 0.1259, Accuracy = 99.22%

Conclusion

At this point, we’ve successfully trained a neural network on MNIST using an autograd engine built entirely from scratch.

This is the core of every modern deep learning library.
Everything that comes next (optimizers, deeper networks, CNNs) will be built on top of this same foundation.

Want to skip the series and read the full book now?
