DEV Community: Leo Gau

How I Go From 70 Lines Of Code To Only 26 Using The NumPy Library

Leo Gau — Sat, 03 Apr 2021 17:18:48 +0000

Context

In my previous post I implemented a neural network to understand handwritten digits using matrix math functions I implemented myself.

Luckily, I don't have to do this going forward. The NumPy library will do it all for me. NumPy is one of the foundational libraries in the Python scientific computing ecosystem. It provides a high-performance, multidimensional array object which makes it fast and easy to work with matrices.

In this post, I'll show you how I use NumPy to replace the handwritten math functions I wrote.

The Original Code Using Only Python

Below is the original code with my own matrix multiplication functions.

def flatten_image(image):
    return list(itertools.chain.from_iterable(image))

def weighted_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output

def vector_matrix_multiplication(a, b):
    output = [0 for i in range(10)]
    for i in range(len(output)):
        assert(len(a) == len(b[i]))
        output[i] = weighted_sum(a, b[i])
    return output

def zeros_matrix(rows, cols):
    output = []
    for r in range(rows):
        output.append([0 for col in range(cols)])
    return output

def outer_product(a, b):
    output = zeros_matrix(len(a), len(b))
    for i in range(len(a)):
        for j in range(len(b)):
            output[i][j] = a[i] * b[j]
    return output

class NeuralNet:
    def __init__(self):
        self.weights = [
            [0.0000 for i in range(784)],
            [0.0001 for i in range(784)],
            [0.0002 for i in range(784)],
            [0.0003 for i in range(784)],
            [0.0004 for i in range(784)],
            [0.0005 for i in range(784)],
            [0.0006 for i in range(784)],
            [0.0007 for i in range(784)],
            [0.0008 for i in range(784)],
            [0.0009 for i in range(784)]
        ]
        self.alpha = 0.0000001

    def predict(self, input):
        return vector_matrix_multiplication(input, self.weights)

    def train(self, input, labels, epochs):
        for i in range(epochs):
            for j in range(len(input)):
                pred = self.predict(input[j])

                label = labels[j]
                goal = [0 for k in range(10)]
                goal[label] = 1

                error = [0 for k in range(10)]
                delta = [0 for k in range(10)]

                for a in range(len(goal)):
                    delta[a] = pred[a] - goal[a]
                    error[a] = delta[a] ** 2

                weight_deltas = outer_product(delta, input[j])

                for x in range(len(self.weights)):
                    for y in range(len(self.weights[0])):
                        self.weights[x][y] -= (self.alpha * weight_deltas[x][y])

Implement The Helper Functions With NumPy

To help me better understand how NumPy slots into this code, I'm going to keep my helper functions but implement them using NumPy. For example, I still have a weighted sum function but instead of hand calculating the weighted sum, I use the NumPy dot function.

def flatten_image(image):
    return image.reshape(1, 28*28)

def weighted_sum(a, b):
    return a.dot(b)

def vector_matrix_multiplication(a, b):
    return np.matmul(a, b.T)

def zeros_matrix(rows, cols):
    return np.zeros((rows, cols))

def outer_product(a, b):
    return np.outer(a, b)

class NeuralNet:
    def __init__(self):
        self.weights = np.random.random((10, 28 * 28)) * 0.0001
        self.alpha = 0.0000001

    def predict(self, input):
        return vector_matrix_multiplication(input, self.weights)

    def train(self, input, labels, epochs):
        for i in range(epochs):
            for j in range(len(input)):
                pred = self.predict(input[j])

                label = labels[j]
                goal = np.zeros(10)
                goal[label] = 1

                delta = pred - goal
                error = delta ** 2

                weight_deltas = outer_product(delta, input[j])

                self.weights -= (self.alpha * weight_deltas)
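
As a quick sanity check, here is a minimal sketch comparing hand-written versions of the weighted sum and outer product against np.dot and np.outer. The names weighted_sum_py and outer_product_py and the sample vectors are made up for illustration; they are not part of the code above.

```python
import numpy as np

def weighted_sum_py(a, b):
    # Pure-Python weighted sum, same idea as the original helper
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

def outer_product_py(a, b):
    # Pure-Python outer product: one row per element of a
    return [[x * y for y in b] for x in a]

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Both comparisons should come out true if NumPy computes the same thing
dot_matches = bool(np.isclose(weighted_sum_py(a, b), np.dot(a, b)))
outer_matches = bool(np.allclose(outer_product_py(a, b), np.outer(a, b)))
print(dot_matches, outer_matches)
```

If both values print as True, the NumPy calls are drop-in replacements for the hand-written loops on these inputs.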

Use The NumPy Functions Inline

Already you can see the code is a lot cleaner.

If we remove those helper functions and do everything inline, the code shrinks even more. The original implementation is 70 lines long and this one is only 26 lines.

The NumPy library is doing a lot of work for me.

def flatten_image(image):
    return image.reshape(1, 28*28)

class NeuralNet:
    def __init__(self):
        self.weights = np.random.random((10, 28 * 28)) * 0.0001
        self.alpha = 0.0000001

    def predict(self, input):
        return np.matmul(input, self.weights.T)

    def train(self, input, labels, epochs):
        for i in range(epochs):
            for j in range(len(input)):
                pred = self.predict(input[j])

                label = labels[j]
                goal = np.zeros(10)
                goal[label] = 1

                delta = pred - goal
                error = delta ** 2

                weight_deltas = np.outer(delta, input[j])

                self.weights -= (self.alpha * weight_deltas)

import itertools
import numpy as np

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

images = x_train
labels = y_train

prepared_images = [flatten_image(image) for image in images]
prepared_labels = np.array(labels)

nn = NeuralNet()
nn.train(prepared_images, prepared_labels, 5)

test_set = x_test
test_labels = y_test
num_correct = 0
for i in range(len(test_set)):
    prediction = nn.predict(flatten_image(test_set[i]))
    correct = test_labels[i]
    if np.argmax(prediction) == int(correct):
        num_correct += 1

print(str(num_correct/len(test_set) * 100) + "%")

76.05%
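
One thing that is easy to miss is how the array shapes line up, since flatten_image returns a 1 x 784 array rather than a flat one. The sketch below traces the shapes through one training step. It uses an all-zero stand-in image and a made-up label of 5 instead of real MNIST data, so it only illustrates shapes, not learning.

```python
import numpy as np

# Stand-in for one MNIST image: a 28x28 array of zeros (illustrative only)
image = np.zeros((28, 28))
weights = np.random.random((10, 28 * 28)) * 0.0001

flat = image.reshape(1, 28 * 28)        # shape (1, 784)
pred = np.matmul(flat, weights.T)       # (1, 784) x (784, 10) -> (1, 10)

goal = np.zeros(10)
goal[5] = 1                             # made-up label
delta = pred - goal                     # broadcasting keeps shape (1, 10)

# np.outer flattens its inputs, so the result matches the weights' shape
weight_deltas = np.outer(delta, flat)   # shape (10, 784)

print(flat.shape, pred.shape, delta.shape, weight_deltas.shape)
```

Because weight_deltas ends up with the same shape as the weights, the in-place update with -= works without any extra reshaping.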

So What's Next?

The code is now so efficient I can train on the full dataset and the accuracy gets a little bump.

In the next post we'll see if adding a layer helps improve this accuracy even more.

How I Identify Handwritten Digits Using Only Python

Leo Gau — Tue, 02 Mar 2021 17:22:10 +0000

What I'm Building

In this post I'll show you how I built a neural network which takes an array of numbers representing a handwritten digit and outputs a prediction of what digit it is.

The handwritten digits are from the famous MNIST dataset. The Modified National Institute of Standards and Technology (MNIST) dataset is a collection of small, square 28×28 pixel grayscale images of handwritten single digits between 0 and 9, split into 60,000 training images and 10,000 test images.

The task is to classify a given image into one of the 10 digits.

I’m doing it all in Python.

Let's get started.

The Code

import itertools

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

images = x_train[0:1000]
labels = y_train[0:1000]

def flatten_image(image):
    return list(itertools.chain.from_iterable(image))

def weighted_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output

def vector_matrix_multiplication(a, b):
    output = [0 for i in range(10)]
    for i in range(len(output)):
        assert(len(a) == len(b[i]))
        output[i] = weighted_sum(a, b[i])
    return output

def zeros_matrix(rows, cols):
    output = []
    for r in range(rows):
        output.append([0 for col in range(cols)])
    return output

def outer_product(a, b):
    output = zeros_matrix(len(a), len(b))
    for i in range(len(a)):
        for j in range(len(b)):
            output[i][j] = a[i] * b[j]
    return output

class NeuralNet:
    def __init__(self):
        self.weights = [
            [0.0000 for i in range(784)],
            [0.0001 for i in range(784)],
            [0.0002 for i in range(784)],
            [0.0003 for i in range(784)],
            [0.0004 for i in range(784)],
            [0.0005 for i in range(784)],
            [0.0006 for i in range(784)],
            [0.0007 for i in range(784)],
            [0.0008 for i in range(784)],
            [0.0009 for i in range(784)]
        ]
        self.alpha = 0.0000001

    def predict(self, input):
        return vector_matrix_multiplication(input, self.weights)

    def train(self, input, labels, epochs):
        for i in range(epochs):
            for j in range(len(input)):
                pred = self.predict(input[j])

                label = labels[j]
                goal = [0 for k in range(10)]
                goal[label] = 1

                error = [0 for k in range(10)]
                delta = [0 for k in range(10)]

                for a in range(len(goal)):
                    delta[a] = pred[a] - goal[a]
                    error[a] = delta[a] ** 2

                weight_deltas = outer_product(delta, input[j])

                for x in range(len(self.weights)):
                    for y in range(len(self.weights[0])):
                        self.weights[x][y] -= (self.alpha * weight_deltas[x][y])

# Train on first image
first_image = images[0]
first_label = labels[0]
input = [flatten_image(first_image)]
label = [first_label]

nn = NeuralNet()
nn.train(input, label, 5)

prediction = nn.predict(input[0])
print(prediction)
print("The label is: " + str(label[0]) + ". The prediction is: " + str(prediction.index(max(prediction))))

# Train on full dataset
prepared_images = [flatten_image(image) for image in images]
mm = NeuralNet()
mm.train(prepared_images, labels, 5)

# Test 1 prediction
prediction = mm.predict(prepared_images[3])
print("That image is the number " + str(prediction.index(max(prediction))))

# Calculate accuracy
test_set = x_test[0:100]
test_labels = y_test[0:100]
num_correct = 0
for i in range(len(test_set)):
    prediction = mm.predict(flatten_image(test_set[i]))
    correct = test_labels[i]

    if prediction.index(max(prediction)) == int(correct):
        num_correct += 1

print(str(num_correct/len(test_set) * 100) + "%")

Get Dataset

The keras library helpfully includes the dataset so I can import it from the library.

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

images = x_train[0:1000]
labels = y_train[0:1000]

When I call load_data(), I get back two tuples: a training set and a test set. To successfully finish training on my personal laptop, I had to limit the data to the first 1000 elements. When I tried training on the full data set, it hadn't finished after a full 24 hours and I had to kill the process to use my laptop :D.

With only 1000 images, the best accuracy I achieved was about 75%. Maybe you can tweak the numbers and get something better!

Getting back to the data, if I take a look at one of the images in the training set, I see that it is an array of arrays - a matrix. The numbers range from 0 to 255 - each representing the greyscale value of the pixel at a particular position in the image.

images[0]

array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   3,
         18,  18,  18, 126, 136, 175,  26, 166, 255, 247, 127,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,  30,  36,  94, 154, 170,
        253, 253, 253, 253, 253, 225, 172, 253, 242, 195,  64,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,  49, 238, 253, 253, 253, 253,
        253, 253, 253, 253, 251,  93,  82,  82,  56,  39,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,  18, 219, 253, 253, 253, 253,
        253, 198, 182, 247, 241,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,  80, 156, 107, 253, 253,
        205,  11,   0,  43, 154,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,  14,   1, 154, 253,
         90,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 139, 253,
        190,   2,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  11, 190,
        253,  70,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  35,
        241, 225, 160, 108,   1,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         81, 240, 253, 253, 119,  25,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,  45, 186, 253, 253, 150,  27,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,  16,  93, 252, 253, 187,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0, 249, 253, 249,  64,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,  46, 130, 183, 253, 253, 207,   2,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  39,
        148, 229, 253, 253, 253, 250, 182,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  24, 114, 221,
        253, 253, 253, 253, 201,  78,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,  23,  66, 213, 253, 253,
        253, 253, 198,  81,   2,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,  18, 171, 219, 253, 253, 253, 253,
        195,  80,   9,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,  55, 172, 226, 253, 253, 253, 253, 244, 133,
         11,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0, 136, 253, 253, 253, 212, 135, 132,  16,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0]], dtype=uint8)

If I look at the first label, I see the number five. This means that the collection of numbers in images[0] represents the number 5.

labels[0]

Prepare Data

The matrix math that I implement does not know how to handle an array of arrays, so the first thing I do is prepare the data by flattening the image into a single array.

import itertools

def flatten_image(image):
    return list(itertools.chain.from_iterable(image))

What I'm doing in this function is using the itertools library to flatten the array. Specifically, I'm using the .chain.from_iterable() method to give me one element at a time. Then I use the list() function to create a flat list to return.

When I print the first image, I see that all the numbers are in one flat array.

print(flatten_image(images[0]))

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 18, 18, 18, 126, 136, 175, 26, 166, 255, 247, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 36, 94, 154, 170, 253, 253, 253, 253, 253, 225, 172, 253, 242, 195, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 49, 238, 253, 253, 253, 253, 253, 253, 253, 253, 251, 93, 82, 82, 56, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 219, 253, 253, 253, 253, 253, 198, 182, 247, 241, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 80, 156, 107, 253, 253, 205, 11, 0, 43, 154, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 1, 154, 253, 90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 139, 253, 190, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 190, 253, 70, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 35, 241, 225, 160, 108, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 240, 253, 253, 119, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 186, 253, 253, 150, 27, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 93, 252, 253, 187, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 249, 253, 249, 64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 46, 130, 183, 253, 253, 207, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 39, 148, 229, 253, 253, 253, 250, 182, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 114, 221, 253, 253, 253, 253, 201, 78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 23, 66, 213, 253, 253, 253, 253, 198, 81, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 171, 219, 253, 253, 253, 253, 195, 80, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 55, 172, 226, 253, 253, 253, 253, 244, 133, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 136, 253, 253, 253, 212, 135, 132, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
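
The 784-element output is hard to read, so here is the same function on a tiny made-up "image" of 2 rows and 3 columns. The tiny_image values are invented for illustration.

```python
import itertools

def flatten_image(image):
    # chain.from_iterable walks each row in turn, yielding one pixel at a time
    return list(itertools.chain.from_iterable(image))

tiny_image = [
    [0, 128, 255],
    [64, 0, 32],
]

flat = flatten_image(tiny_image)
print(flat)       # [0, 128, 255, 64, 0, 32]
print(len(flat))  # 6
```

The rows are simply laid end to end, which is why a 28×28 image becomes a list of 784 numbers.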

Matrix Math Helper Functions

Now that I've prepared the data, I can move on to the next step - implement matrix math.

Since I'm working with arrays, I need math functions which understand arrays. You may remember from the previous post that a neural network makes predictions by multiplying the input by the weights. So one thing I need to do now is figure out how to do matrix multiplication.

In order to do matrix multiplication, I need a method to calculate weighted sums.

def weighted_sum(a, b):
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output

The weighted sum function takes two arrays of the same length. It multiplies each number in the same index and adds the result to a running sum. So the weighted sum takes two arrays and gives you back a single number.

The best way to think about what this single number represents is as a score of similarity between two arrays. The higher the weighted sum, the more similar arrays a and b are to each other. Roughly speaking, the neural network will give higher scores to inputs that are more similar to its weights.
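
To make the "similarity score" idea concrete, here is a small sketch with made-up 4-element arrays. The weights, matching, and mismatched values are hypothetical and chosen only so the difference is easy to see.

```python
def weighted_sum(a, b):
    assert len(a) == len(b)
    output = 0
    for i in range(len(a)):
        output += a[i] * b[i]
    return output

weights = [0, 1, 1, 0]      # hypothetical weights that "look for" the middle pixels
matching = [0, 1, 1, 0]     # input lit up in the same places as the weights
mismatched = [1, 0, 0, 1]   # input lit up everywhere else

print(weighted_sum(matching, weights))    # 2
print(weighted_sum(mismatched, weights))  # 0
```

The input that overlaps with the weights gets the higher score, which is the rough sense in which the weighted sum measures similarity.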

def vector_matrix_multiplication(a, b):
    output = [0 for i in range(10)]
    for i in range(len(output)):
        assert(len(a) == len(b[i]))
        output[i] = weighted_sum(a, b[i])
    return output

Next, I have the matrix multiplication method. This calculates the weighted sum between weight and input for each position in the array. When it's done, I get an array of weighted sums.

In my case, the returned output of 10 elements contains a score for each digit (these are raw scores, not true probabilities). Whichever index has the highest number is the prediction for what digit is in the image.
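
Here is a scaled-down sketch of that idea with 3 "pixels" and 3 classes instead of 784 and 10. Unlike my function above, this version does not hardcode 10 outputs; it takes one weighted sum per row of the weights. The pixels and weights values are made up.

```python
def weighted_sum(a, b):
    return sum(x * y for x, y in zip(a, b))

def vector_matrix_multiplication(a, b):
    # One weighted sum per row of b (a variation: the original hardcodes 10 rows)
    return [weighted_sum(a, row) for row in b]

pixels = [1, 0, 1]
weights = [
    [0.1, 0.9, 0.1],  # row for class 0
    [0.8, 0.0, 0.7],  # row for class 1
    [0.2, 0.2, 0.2],  # row for class 2
]

scores = vector_matrix_multiplication(pixels, weights)
predicted = scores.index(max(scores))  # index of the highest score
print(scores, predicted)
```

The row for class 1 lines up best with the input, so index 1 gets the highest score and becomes the prediction.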

I need two other matrix math helpers. These functions will be used to adjust the weights in the right direction.

First, I have a zeros matrix method which creates a matrix filled with zeros.

def zeros_matrix(rows, cols):
    output = []
    for r in range(rows):
        output.append([0 for col in range(cols)])
    return output

This is used to implement a function to calculate the outer product of two arrays.

The outer product multiplies every element of the first array by every element of the second, producing a matrix with one row per element of the first array. This will be used to tell the neural network how to change its weights.

def outer_product(a, b):
    output = zeros_matrix(len(a), len(b))
    for i in range(len(a)):
        for j in range(len(b)):
            output[i][j] = a[i] * b[j]
    return output
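
A tiny worked example may help here. The sketch below uses a compact list-comprehension version of the same function and made-up delta and pixels arrays of lengths 2 and 3.

```python
def outer_product(a, b):
    # Every element of a multiplied by every element of b
    return [[x * y for y in b] for x in a]

delta = [1, -2]      # hypothetical: one value per output
pixels = [3, 0, 5]   # hypothetical: one value per input pixel

for row in outer_product(delta, pixels):
    print(row)
# [3, 0, 5]
# [-6, 0, -10]
```

The result has one row per output and one column per pixel, which is the same layout as the weights.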

Okay. That's a lot of math. Let's find out how these functions are being used in the neural network.

Neural Network

class NeuralNet:
    def __init__(self):
        self.weights = [
            [0.0000 for i in range(784)],
            [0.0001 for i in range(784)],
            [0.0002 for i in range(784)],
            [0.0003 for i in range(784)],
            [0.0004 for i in range(784)],
            [0.0005 for i in range(784)],
            [0.0006 for i in range(784)],
            [0.0007 for i in range(784)],
            [0.0008 for i in range(784)],
            [0.0009 for i in range(784)]
        ]
        self.alpha = 0.0000001

    def predict(self, input):
        return vector_matrix_multiplication(input, self.weights)

    def train(self, input, labels, epochs):
        for i in range(epochs):
            for j in range(len(input)):
                pred = self.predict(input[j])

                label = labels[j]
                goal = [0 for k in range(10)]
                goal[label] = 1

                error = [0 for k in range(10)]
                delta = [0 for k in range(10)]

                for a in range(len(goal)):
                    delta[a] = pred[a] - goal[a]
                    error[a] = delta[a] ** 2

                weight_deltas = outer_product(delta, input[j])

                for x in range(len(self.weights)):
                    for y in range(len(self.weights[0])):
                        self.weights[x][y] -= (self.alpha * weight_deltas[x][y])

This neural network is similar to the one from the previous post. The only real difference is that we're using an array of numbers instead of a single number.

In the initializer, I have the weights and the alpha. I've initialized each weight array to have 784 elements of an initial number. 784 is the number of pixels in the image.

def __init__(self):
    self.weights = [
        [0.0000 for i in range(784)],
        [0.0001 for i in range(784)],
        [0.0002 for i in range(784)],
        [0.0003 for i in range(784)],
        [0.0004 for i in range(784)],
        [0.0005 for i in range(784)],
        [0.0006 for i in range(784)],
        [0.0007 for i in range(784)],
        [0.0008 for i in range(784)],
        [0.0009 for i in range(784)]
    ]
    self.alpha = 0.0000001

The prediction function is again multiplying the input by the weights.

def predict(self, input):
    return vector_matrix_multiplication(input, self.weights)

The training function iterates through the dataset an epoch number of times.

for i in range(epochs):
    for j in range(len(input)):

For each image, it makes a prediction.

pred = self.predict(input[j])

Next we transform the label into a format that the neural network expects.

label = labels[j]
goal = [0 for k in range(10)]
goal[label] = 1

I create an array of ten 0s and then set the index of the goal prediction to 1. So all the wrong answers are 0 and the right answer is 1.
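
This format is often called one-hot encoding. As a small sketch, the same three lines can be wrapped in a helper; the name one_hot and the num_classes parameter are my own and do not appear in the network code.

```python
def one_hot(label, num_classes=10):
    # Hypothetical helper: 1 at the label's index, 0 everywhere else
    goal = [0 for _ in range(num_classes)]
    goal[label] = 1
    return goal

print(one_hot(5))  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```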

Next, I calculate the error and the delta.

error = [0 for k in range(10)]
delta = [0 for k in range(10)]

for a in range(len(goal)):
    delta[a] = pred[a] - goal[a]
    error[a] = delta[a] ** 2

I then calculate the weight deltas by using an outer product between delta and the input.

weight_deltas = outer_product(delta, input[j])

Finally I update all the weights using the weight deltas.

for x in range(len(self.weights)):
    for y in range(len(self.weights[0])):
        self.weights[x][y] -= (self.alpha * weight_deltas[x][y])

The main takeaway here is that this is exactly like the neural network with one digit. The only difference is that the math is done on arrays instead of on single numbers.
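
To watch the update rule work without waiting on 784 pixels, here is a scaled-down sketch with 2 outputs and 3 input pixels. The weights, alpha, pixels, and goal values are made up, and the alpha is much larger than the one in the real network so the effect shows up in three steps.

```python
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # 2 outputs, 3 input pixels
alpha = 0.1
pixels = [1.0, 0.0, 2.0]
goal = [0, 1]  # the correct class is index 1

def predict(pixels, weights):
    return [sum(p * w for p, w in zip(pixels, row)) for row in weights]

errors = []
for step in range(3):
    pred = predict(pixels, weights)
    delta = [p - g for p, g in zip(pred, goal)]
    errors.append(sum(d ** 2 for d in delta))  # total squared error this step
    # Same update as the network: weight -= alpha * (delta x input)
    for x in range(len(weights)):
        for y in range(len(weights[0])):
            weights[x][y] -= alpha * delta[x] * pixels[y]

print(errors)
```

The total error should drop on every step, which is the same behavior the full network shows, just at a smaller scale.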

Training The Network On The First Data Point

Let's put this new network into action. To test it out, I take the first image and the first label. I create a neural network and train it on that first image and label for five epochs. When I predict the digit on that same image, I see the output array is an array of 10 numbers.

first_image = images[0]
first_label = labels[0]
input = [flatten_image(first_image)]
label = [first_label]

nn = NeuralNet()
nn.train(input, label, 5)

prediction = nn.predict(input[0])
print(prediction)
print("The label is: " + str(label[0]) + ". The prediction is: " + str(prediction.index(max(prediction))))

[0.0, 0.03036370905054081, 0.06072741810108162, 0.09109112715162263, 0.12145483620216324, 1.1407872249800253, 0.18218225430324525, 0.21254596335378556, 0.24290967240432648, 0.2732733814548679]
The label is: 5. The prediction is: 5

The number at index five is the greatest, so the network correctly identified the handwritten digit as a 5.

It works on one data point but what about the entire data set?

Let's do that next.

Training The Network On The Whole Dataset

I prepare the images by flattening every image in our data set. Again, this is the first 1000 from the MNIST dataset. I create the neural network, giving it the prepared images and labels.

I run it for 5 epochs. Through trial and error I found that 5 epochs gives me the highest accuracy of just under 75%.

When it's finished, I test the network by making a prediction on a random image. It correctly identified the image.

prepared_images = [flatten_image(image) for image in images]

mm = NeuralNet()
mm.train(prepared_images, labels, 5)

prediction = mm.predict(prepared_images[3])
print("That image is the number " + str(prediction.index(max(prediction))))

That image is the number 1

labels[3]

To test the true accuracy, I use the test data and labels.

I loop through the test set, make a prediction for each image, check whether it is correct, and count the number correct.

test_set = x_test
test_labels = y_test
num_correct = 0
for i in range(len(test_set)):
    prediction = mm.predict(flatten_image(test_set[i]))
    correct = test_labels[i]
    if prediction.index(max(prediction)) == int(correct):
        num_correct += 1

print(str(num_correct/len(test_set) * 100) + "%")

74.47%

In the end, I'm able to correctly predict 3 out of every 4 images in the test set.

So What Did We Do?

This was a fun little exercise to see how neural networks use matrix math to make predictions.

What's Next?

In the next post, I’ll experiment with adding multiple layers to make the network "deep". I'll also swap my handwritten matrix math functions for NumPy functions and see how much easier it makes some of this for me.

See you next time!

How I Implemented The Most Simple Neural Network Using Python

Leo Gau — Mon, 15 Feb 2021 17:02:09 +0000

Context

I've been reading the book Grokking Deep Learning by Andrew W. Trask and instead of summarizing concepts, I want to review them by building a simple neural network. This neural network will use the concepts in the first 4 chapters of the book.

What I'm Building

I'm going to build a neural network which outputs a target number given a specific input number. For example, given the number 5, I want the neural network to output the number 42.

Now I can hear you think to yourself, "That's stupid. How is that better than a function with the line return 42 in the body?"

What's cool about this code is that I didn't type the number 5 or 42 anywhere in the body of the network. Instead, I told the network I wanted it to print 42 when it received 5 as an input and it figured out how to adjust itself to do that.

In fact, I could train the network on any 2 numbers using the same code. Try changing the parameters yourself and test it out!

With that context, let's see what the code looks like for this most simple neural network.

The Code

# A simple neural network class
class SimpleNN:
    def __init__(self):
        self.weight = 1.0
        self.alpha = 0.01

    def train(self, input, goal, epochs):
        for i in range(epochs):
            pred = input * self.weight
            delta = pred - goal
            error = delta ** 2
            derivative = delta * input
            self.weight = self.weight - (self.alpha * derivative)
            print("Error: " + str(error))

    def predict(self, input):
        return input * self.weight

# Create a new SimpleNN
neural_network = SimpleNN()
# Train the SimpleNN 
neural_network.train(input=5, goal=42, epochs=20)

Error: 1369.0
Error: 770.0625
Error: 433.16015625
Error: 243.6525878906251
Error: 137.05458068847665
Error: 77.09320163726807
Error: 43.36492592096329
Error: 24.39277083054185
Error: 13.72093359217979
Error: 7.718025145601132
Error: 4.341389144400637
Error: 2.442031393725358
Error: 1.373642658970514
Error: 0.7726739956709141
Error: 0.43462912256489855
Error: 0.24447888144275018
Error: 0.13751937081154697
Error: 0.07735464608149517
Error: 0.043511988420844
Error: 0.02447549348672308

neural_network.predict(5)

41.88266515825944

After 20 rounds of training, the network's final prediction is off by about 0.12, which is a squared error of about 0.014. Not bad!

Even in this barebones neural network, there's a lot going on. Let's take it line by line.

Neural Networks

A neural network is a collection of weights being used to compute an error function. That's it.

The interesting thing about this statement is that for any error function, no matter how complicated, you can compute the relationship between a weight and the final error of the network. Therefore, after each prediction, we can change each weight in the network to inch the final error towards 0.

Let's take a look at what a neural network needs to make a prediction.

The 2 Things A Neural Network Needs To Make A Prediction

The Weight

self.weight = 1.0

I mentioned before that a neural network is just "a collection of weights". So what are weights?

weight is a number that the neural network stores and remembers. It can be thought of as the "memory" of the network. After each round of training, the network updates the weight to make more accurate predictions.

In our network, I set weight=1.0. I just used trial-and-error to figure out a good starting weight for this problem.

The Input

def train(self, input, goal, epochs):

def predict(self, input):

input is a number that the neural network accepts. This can be thought of as information from the outside world.

In our network, I set input=5 when I start training the network.

So how does this thing learn?

I use a method called Stochastic Gradient Descent to get SimpleNN to learn the training data.

At a high level, the 4 step process is:

Make a prediction using a given input
Calculate the error
Calculate the derivative to tell us how much to adjust the weights by
Adjust the weight and go back to step 1.
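
The four steps above can be sketched as a single loop. This is a minimal sketch assembled from the snippets quoted in this post, not the full SimpleNN class. The function name train_sketch and the errors list are hypothetical, and the defaults (weight 1.0, alpha 0.01, 20 epochs) are the values used in this post.

```python
def train_sketch(input, goal, weight=1.0, alpha=0.01, epochs=20):
    """Hypothetical helper: run the four steps on one weight.

    Returns the trained weight and the squared error seen in each epoch.
    """
    errors = []
    for _ in range(epochs):
        # Step 1: make a prediction using the given input.
        pred = input * weight
        # Step 2: calculate the error (squared distance from the goal).
        delta = pred - goal
        errors.append(delta ** 2)
        # Step 3: calculate how much to adjust the weight by.
        derivative = delta * input
        # Step 4: adjust the weight, then loop back to step 1.
        weight = weight - (alpha * derivative)
    return weight, errors


weight, errors = train_sketch(input=5, goal=42)
print(errors[-1])   # last recorded squared error, roughly 0.024
print(5 * weight)   # prediction after training, roughly 41.88
```

Note that each recorded error is measured before that epoch's weight update, which is why the last error in the list belongs to the prediction made with the weight from the 19th update.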

1. The Prediction

pred = input * self.weight

When the neural network has both an input and a weight, it multiplies them together to make a prediction. Every single neural network, from the simplest to ones with 1000s of layers, works this way.

2. How much are we off by?

delta = pred - goal
error = delta ** 2

So we've seen that the network makes a prediction by multiplying input and weight. After it makes a prediction, the network is able to calculate how much it was off by.

Neural network learning is all about error attribution. How much did each weight contribute to the overall error of the system, and how can we change the weight so that the error is minimized? In our example, it's easy to figure out since there is only 1 weight.

How do we calculate the error? One thing we need to keep in mind is that we want the error to be a positive number. If the error is allowed to be negative, multiple errors might accidentally cancel each other out when averaged together.

In our case, we square the amount we are off by. Why square instead of something straightforward like absolute value? Squaring gives us a sense of importance. Errors larger than 1 are magnified while errors smaller than 1 shrink. Therefore, we can prioritize large errors before small errors. Absolute value doesn't give us this additional sense of importance.
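
To make both points concrete, here is a small sketch with made-up numbers. The deltas list is hypothetical and is not taken from the training run in this post. It shows signed errors cancelling when averaged, and how squaring spreads large and small misses further apart than absolute value does.

```python
deltas = [-4.0, 4.0]  # hypothetical: one prediction too low, one too high

# Averaging the signed amounts hides the fact that both predictions missed.
signed_mean = sum(deltas) / len(deltas)
# Squaring first makes every term positive, so nothing cancels.
squared_mean = sum(d ** 2 for d in deltas) / len(deltas)
print(signed_mean)    # 0.0
print(squared_mean)   # 16.0

# Compare a large miss and a small miss under both measures.
for d in (4.0, 0.5):
    print(d, abs(d), d ** 2)
# 4.0 4.0 16.0
# 0.5 0.5 0.25
```

Under absolute value the large miss is 8 times the small one. After squaring it is 64 times the small one.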

3. Adjusting the weights

derivative = delta * input
self.weight = self.weight - (self.alpha * derivative)

The network figures out how much to adjust the weights by using a derivative. How does a derivative play into this process? A derivative tells us the direction and amount one variable changes when you change a different variable. In our case, the derivative tells us how much the error changes when you change the weight. Given that we want the error to be 0, this is exactly what we need.

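The network calculates derivative by multiplying delta by the weight's input. derivative is the direction and the amount we're going to change the weight by, before alpha scales it down. (Strictly speaking, the derivative of delta ** 2 with respect to the weight is 2 * delta * input. The code drops the constant 2, which only changes the step size and can be thought of as folded into alpha.)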
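
One way to sanity-check this is to estimate the slope of the error numerically and compare it with delta * input. The sketch below is an illustration rather than part of SimpleNN: the helper squared_error and the step size h are assumptions introduced here, while the starting weight, input, and goal are the ones used in this post.

```python
def squared_error(weight, input=5, goal=42):
    """Squared error of the one-weight network for a given weight."""
    return (input * weight - goal) ** 2


weight, input, goal = 1.0, 5, 42
delta = input * weight - goal
derivative = delta * input   # the quantity the post's code uses: -185.0

# Central finite difference: nudge the weight a little in both directions
# and measure how much the error changes per unit of weight.
h = 1e-4
slope = (squared_error(weight + h) - squared_error(weight - h)) / (2 * h)

print(derivative)   # -185.0
print(slope)        # approximately -370, i.e. 2 * delta * input
```

Both values are negative, so both say the same thing: increasing the weight reduces the error. The measured slope is twice delta * input, which matches the calculus result for a squared error.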

self.alpha = 0.01

One bit of nuance is the variable alpha. alpha is a throttle limiting how much we actually adjust the weights by. Determining the appropriate rate of change for the weights of a neural network is a challenge. If the steps are too large, the network will overshoot the error getting to zero and start acting in unpredictable ways. If the steps are too small, the network will take a long time and need a very large number of training cycles.

The solution to this problem is to multiply the derivative by a single number between 0 and 1. This lets us control the rate of change and adjust the learning as needed.

Finding the appropriate alpha is often done through trial and error, so we're just going to hard-code it here.
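
To see the effect of alpha, the sketch below reruns the same training loop with three different values. The helper final_prediction is hypothetical, and the alternative values 0.001 and 0.1 are picked purely for illustration. Only 0.01 comes from this post.

```python
def final_prediction(alpha, input=5, goal=42, weight=1.0, epochs=20):
    """Hypothetical helper: train one weight, return the final prediction."""
    for _ in range(epochs):
        delta = input * weight - goal
        derivative = delta * input
        weight = weight - (alpha * derivative)
    return input * weight


results = {alpha: final_prediction(alpha) for alpha in (0.001, 0.01, 0.1)}
for alpha, pred in results.items():
    print(alpha, pred)
```

With alpha at 0.01 the prediction lands near 41.88. With 0.001 the steps are so small that after 20 epochs the prediction is still only around 20. With 0.1 every update overshoots the goal by more than the previous miss, so the prediction swings further away each round instead of settling.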

4. Training rounds

neural_network.train(input=5, goal=42, epochs=20)

for i in range(epochs):

Finally, there's the concept of epochs. This refers to the number of times the network will go through the entire data set. The appropriate number of epochs for a problem will often be found through trial and error.

I'm using 20 in the example, which I found by running the training with different epochs and picking the lowest one with an acceptable error. Feel free to experiment with the number of epochs and see what happens at different numbers.
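
That experiment can be automated. The sketch below searches for the smallest number of epochs whose squared error falls under a threshold. The helper error_after and the threshold of 0.05 are assumptions made for illustration. The post does not state what error counted as acceptable.

```python
def error_after(epochs, input=5, goal=42, weight=1.0, alpha=0.01):
    """Squared error of the prediction after the given number of updates."""
    for _ in range(epochs):
        delta = input * weight - goal
        weight = weight - (alpha * (delta * input))
    return (input * weight - goal) ** 2


acceptable_error = 0.05  # hypothetical threshold, not taken from the post

# Try 1, 2, 3, ... epochs and stop at the first count that is good enough.
min_epochs = next(n for n in range(1, 101) if error_after(n) < acceptable_error)
print(min_epochs, error_after(min_epochs))
```

Changing acceptable_error shows the trade-off directly: a stricter threshold needs more epochs, a looser one needs fewer.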

So what did I accomplish?

I'm able to give the neural network the number 5, and have it output a number very close to our goal number 42 without putting the number 5 or 42 in the body of the function.

I also learned the basic parts that make up all neural networks and the process by which the network learns.

As we start to move into networks with multiple inputs, multiple outputs, and multiple layers, it's going to get a lot more complicated. However, the mental model stays the same. The network makes a prediction by multiplying the received input with its stored weights. It measures the error, takes the derivative, and adjusts the weights so that error moves towards 0. Then it goes again.

What's next?

I'm going to tackle multiple inputs and multiple outputs. I'll see how matrices come into play and how we can build a simple library to do matrix math.

See you then!