In this article I show how to build a neural network from scratch. The example is short and simple to make it easier to understand, but I haven't taken any shortcuts to hide details.
Looking for the PyTorch version of this same tutorial? Go here.
import tensorflow as tf
import matplotlib.pyplot as plt
First we create some data. x is a single sample with two features, and the model will predict one value, y.
x = tf.Variable([[1.,2.]])
x.shape
CONSOLE: TensorShape([1, 2])
y = 5.
The parameters are initialized using a normal distribution with mean 0 and variance 1.
def initialize_parameters(size, variance=1.0):
    return tf.Variable(tf.random.normal(size) * variance)
first_layer_output_size = 3
weights_1 = initialize_parameters((x.shape[1], first_layer_output_size))
weights_1
CONSOLE: <tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[ 0.0535108 ,  1.1256728 ,  0.19349864],
       [-0.8206305 ,  1.8411716 , -0.18347588]], dtype=float32)>
bias_1 = initialize_parameters([1])
bias_1
CONSOLE: <tf.Variable 'Variable:0' shape=(1,) dtype=float32,
numpy=array([-1.7967013], dtype=float32)>
weights_2 = initialize_parameters((first_layer_output_size, 1))
weights_2
CONSOLE: <tf.Variable 'Variable:0' shape=(3, 1) dtype=float32,
numpy=array([[-0.68191385],
[-1.3771404 ],
[-0.59087867]], dtype=float32)>
bias_2 = initialize_parameters([1])
bias_2
CONSOLE: <tf.Variable 'Variable:0' shape=(1,) dtype=float32,
numpy=array([-0.93876433], dtype=float32)>
The neural network contains two linear functions and one non-linear function (ReLU) between them.
def simple_neural_network(xb):
    # linear (1,2 @ 2,3 = 1,3)
    l1 = xb @ weights_1 + bias_1
    # non-linear (ReLU)
    l2 = tf.math.maximum(l1, tf.constant(0.))
    # linear (1,3 @ 3,1 = 1,1)
    l3 = l2 @ weights_2 + bias_2
    return l3
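As a quick sanity check (my addition, not in the original post), we can run one forward pass before training. The exact value depends on the random initialization, but the output shape should always be (1, 1): one prediction for one sample.
preds = simple_neural_network(x)
preds.shape
CONSOLE: TensorShape([1, 1])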
The loss function measures how close the predictions are to the real values.
def loss_func(preds, yb):
    # Mean Squared Error (MSE)
    return tf.math.reduce_mean((preds - yb)**2)
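To make the formula concrete, here is a small hand-checkable example (my addition): with a prediction of 4.5 and target 5.0, the squared error is (4.5 - 5.0)^2 = 0.25, and the mean over a single element is just that value.
loss_func(tf.constant([[4.5]]), y)
CONSOLE: <tf.Tensor: shape=(), dtype=float32, numpy=0.25>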
The learning rate scales the gradient, making sure the parameters are not changed too much in each step.
lr = tf.constant([10E-4])
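For intuition, here is a tiny illustration (hypothetical values, just to show the arithmetic): with this lr of 0.001, a gradient of 2.0 moves a parameter by only 0.002.
w = tf.Variable([1.0])    # hypothetical parameter
g = tf.constant([2.0])    # pretend gradient
w.assign_sub(g * lr)      # w becomes 1.0 - 2.0 * 0.001 = 0.998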
Training contains three simple steps:
- Make a prediction
- Calculate how good the prediction was compared to the real value (GradientTape records the operations while the loss is calculated, so the gradients can be computed automatically)
- Update the parameters by subtracting the gradient times the learning rate
The code keeps taking steps until the loss is less than or equal to 0.1. Finally, it plots how the loss changed over the steps.
losses = []
while len(losses) == 0 or losses[-1] > 0.1:
    with tf.GradientTape() as tape:
        # 1. predict
        preds = simple_neural_network(x)
        # 2. loss
        loss = loss_func(preds, y)
    dW1, db1, dW2, db2 = tape.gradient(loss,
                                       [weights_1, bias_1,
                                        weights_2, bias_2])
    # 3. update parameters
    weights_1.assign_sub(dW1 * lr)
    bias_1.assign_sub(db1 * lr)
    weights_2.assign_sub(dW2 * lr)
    bias_2.assign_sub(db2 * lr)
    losses.append(loss.numpy())
plt.plot(list(range(len(losses))), losses)
plt.ylabel('loss (MSE)')
plt.xlabel('steps')
plt.show()
The number of steps it takes to get the loss under 0.1 varies a lot from run to run, because the parameters are initialized randomly.
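If you want repeatable runs (my addition, not part of the original post), you can fix TensorFlow's global random seed before initializing the parameters:
tf.random.set_seed(42)  # 42 is an arbitrary choice; any fixed seed makes runs reproducible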