In this article I show how to build a neural network from scratch. The example is short and simple to make it easy to understand, but I haven't taken any shortcuts to hide details.
Looking for the PyTorch version of this tutorial? Go here.
import tensorflow as tf
import matplotlib.pyplot as plt
First we create some input data. x is a single sample with two features (so its shape is (1, 2)), and the model will predict one value y.
x = tf.Variable([[1.,2.]])
x.shape
CONSOLE: TensorShape([1, 2])
y = 5.
The parameters are initialized from a normal distribution with mean 0 and variance 1. (Strictly speaking, the variance argument scales the standard deviation of the samples, but with the default value 1.0 that makes no difference.)
def initialize_parameters(size, variance=1.0):
    return tf.Variable(tf.random.normal(size) * variance)
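As a quick sanity check (my own addition, not needed for training), we can draw a much larger sample and verify that its statistics are roughly mean 0 and standard deviation 1. The exact numbers differ on every run:
sample = initialize_parameters((1000, 1000))
print(float(tf.math.reduce_mean(sample)))  # close to 0.0
print(float(tf.math.reduce_std(sample)))   # close to 1.0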
first_layer_output_size = 3
weights_1 = initialize_parameters((x.shape[1], first_layer_output_size))
weights_1
CONSOLE: <tf.Variable 'Variable:0' shape=(2, 3) dtype=float32,
numpy=array([
[ 0.0535108 , 1.1256728 , 0.19349864],
[-0.8206305 , 1.8411716 , -0.18347588]],
dtype=float32)>
bias_1 = initialize_parameters([1])
bias_1
CONSOLE: <tf.Variable 'Variable:0' shape=(1,) dtype=float32,
numpy=array([-1.7967013], dtype=float32)>
weights_2 = initialize_parameters((first_layer_output_size, 1))
weights_2
CONSOLE: <tf.Variable 'Variable:0' shape=(3, 1) dtype=float32,
numpy=array([[-0.68191385],
[-1.3771404 ],
[-0.59087867]], dtype=float32)>
bias_2 = initialize_parameters([1])
bias_2
CONSOLE: <tf.Variable 'Variable:0' shape=(1,) dtype=float32,
numpy=array([-0.93876433], dtype=float32)>
The neural network consists of two linear layers with one non-linear function (ReLU) between them. Note that each bias here is a single value that is broadcast across all of the layer's outputs, rather than one bias per unit.
def simple_neural_network(xb):
    # linear (1,2 @ 2,3 = 1,3)
    l1 = xb @ weights_1 + bias_1
    # non-linear (ReLU)
    l2 = tf.math.maximum(l1, 0.)
    # linear (1,3 @ 3,1 = 1,1)
    l3 = l2 @ weights_2 + bias_2
    return l3
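To see what the untrained network does, we can run a single forward pass. The predicted value itself is random at this point, but the output shape is always (1, 1): one prediction for our one sample.
simple_neural_network(x).shape
CONSOLE: TensorShape([1, 1])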
The loss function measures how close the predictions are to the real values.
def loss_func(preds, yb):
    # Mean Squared Error (MSE)
    return tf.math.reduce_mean((preds - yb)**2)
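As a concrete example, if the network predicted 4.5 while the real value is 5.0, the loss is (4.5 − 5.0)² = 0.25:
loss_func(tf.constant([[4.5]]), 5.)
CONSOLE: <tf.Tensor: shape=(), dtype=float32, numpy=0.25>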
The learning rate scales the gradient down, making sure the parameters are not changed too much in any single step.
lr = tf.constant([1e-3])
Training consists of three simple steps:
- Make a prediction
- Calculate the loss, i.e. how far the prediction is from the real value (the forward pass is recorded by tf.GradientTape, so the gradients can be computed automatically afterwards)
- Update the parameters by subtracting the gradient times the learning rate (sketched below with a single parameter)
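Here is a minimal sketch of those three steps using a single made-up scalar parameter w and the toy model w * 2, so the update rule is visible without the matrix shapes. The real training loop below does exactly the same thing for all four parameters:
w = tf.Variable(3.)
with tf.GradientTape() as tape:
    pred = w * 2.              # 1. predict -> 6.0
    loss = (pred - 5.)**2      # 2. loss -> 1.0
dw = tape.gradient(loss, w)    # dloss/dw = 2*(pred-5)*2 = 4.0
w.assign_sub(dw * 0.001)       # 3. update: w becomes 2.996, prediction moves towards 5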
The code keeps taking steps until the loss is 0.1 or less. Finally it plots how the loss changed over the steps.
losses = []
while len(losses) == 0 or losses[-1] > 0.1:
    with tf.GradientTape() as tape:
        # 1. predict
        preds = simple_neural_network(x)
        # 2. loss
        loss = loss_func(preds, y)
    dW1, db1, dW2, db2 = tape.gradient(loss,
                                       [weights_1, bias_1,
                                        weights_2, bias_2])
    # 3. update parameters
    weights_1.assign_sub(dW1 * lr)
    bias_1.assign_sub(db1 * lr)
    weights_2.assign_sub(dW2 * lr)
    bias_2.assign_sub(db2 * lr)
    losses.append(float(loss))
plt.plot(list(range(len(losses))), losses)
plt.ylabel('loss (MSE)')
plt.xlabel('steps')
plt.show()
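Because the loop only stops once the MSE is at most 0.1, the final prediction is guaranteed to be within √0.1 ≈ 0.32 of the target:
print(float(simple_neural_network(x)))  # close to 5.0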
How many steps it takes to get the loss under 0.1 varies a lot from run to run, because the parameters start from random values.
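If you want reproducible runs (my suggestion, not part of the original setup), you can fix the random seed before initializing the parameters:
tf.random.set_seed(42)  # makes tf.random.normal deterministic across runs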