Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star git-lrc on GitHub to help more developers discover the project. Do give it a try and share your feedback to help improve the product.
In the previous article, we introduced backpropagation and learned that neural networks improve by reducing prediction errors.
We also saw that backpropagation relies on two fundamental ideas:
- The Chain Rule
- Gradient Descent
But we haven't yet answered an important question:
How do we calculate the weights and biases that decrease the error?
To answer that, let's look at a very small neural network.
A Simple Neural Network
Imagine a neural network similar to the one in the previous example, with:
- One input neuron
- Two hidden neurons
- One output neuron
Calculating the Last Bias in the Last Layer
Let's assume we already have the weights and biases of the hidden layer, and we only want to find the last bias, b3.
From gradient descent, we can update b3 using the partial derivative of the loss with respect to b3.
The error is measured using residuals and the sum of their squares (SSR):
Residual = Observed - Predicted
SSR = Sum of (Observed - Predicted)^2
We take 3 samples for training: the starting, middle, and ending values.
Finally, we calculate the SSR over these samples.
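As a minimal sketch, the residual and SSR formulas above can be written in Python. The `observed` and `predicted` values below are hypothetical, chosen only to keep the arithmetic easy to follow; they are not the training data used in this article.

```python
def residuals(observed, predicted):
    """Residual = Observed - Predicted, one value per sample."""
    return [o - p for o, p in zip(observed, predicted)]


def ssr(observed, predicted):
    """Sum of squared residuals over all samples."""
    return sum(r ** 2 for r in residuals(observed, predicted))


# Hypothetical values for three samples (start, middle, end).
observed = [0.0, 1.0, 0.0]
predicted = [-2.0, -1.0, -2.0]

print(residuals(observed, predicted))  # [2.0, 2.0, 2.0]
print(ssr(observed, predicted))        # 12.0
```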
Use of Chain Rule
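Gradient descent is what updates b3, but it needs the derivative of SSR with respect to b3, and for that we use the chain rule.
The predicted value is built from the outputs of the previous layers (computed from their weights and biases) plus b3: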
Predicted = Top Layer + Bottom Layer + Bias (b3)
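To make the formula above concrete, here is a rough sketch of a forward pass for a network with one input, two hidden neurons, and one output. This article does not name the activation function, so softplus is an assumption here, and the parameter names `w1`, `b1`, `w2`, `b2`, `w3`, `w4` and their values are hypothetical.

```python
import math


def softplus(x):
    """Assumed activation function: log(1 + e^x)."""
    return math.log(1.0 + math.exp(x))


def predict(x, w1, b1, w2, b2, w3, w4, b3):
    # Top path: input -> first hidden neuron -> scaled by w3.
    top = softplus(x * w1 + b1) * w3
    # Bottom path: input -> second hidden neuron -> scaled by w4.
    bottom = softplus(x * w2 + b2) * w4
    # b3 is added last, so top and bottom do not depend on it.
    return top + bottom + b3


# Hypothetical parameters, not taken from the article.
params = dict(w1=1.0, b1=0.0, w2=1.0, b2=0.0, w3=1.0, w4=-1.0)
p0 = predict(0.5, b3=0.0, **params)
p1 = predict(0.5, b3=1.0, **params)
print(p1 - p0)  # changing b3 by 1 shifts the prediction by 1
```

The point of the sketch is only that b3 enters the prediction as a plain additive term, which is why its derivative is so simple.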
Using the chain rule, we can write the derivative of SSR with respect to b3 as:
dssr/db3 = dssr/dpredicted * dpredicted/db3
For dssr/dpredicted, looking at a single sample:
dssr/dpredicted = d((Observed - Predicted)^2)/dpredicted
Since Predicted is not a constant, we apply the chain rule again while differentiating:
dssr/dpredicted = 2*(Observed - Predicted)*(d(Observed - Predicted)/dpredicted)
dssr/dpredicted = 2*(Observed - Predicted)*(-1)
dssr/dpredicted = -2*(Observed - Predicted)
For dpredicted/db3:
Predicted = Top Layer + Bottom Layer + Bias (b3)
Both Top Layer and Bottom Layer are constants for this calculation because they do not depend on b3, so:
dpredicted/db3 = 1
Finally, dssr/db3 = -2*(Observed - Predicted) * 1
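One way to sanity-check this result is to compare it against a numerical derivative. The sketch below does that for a single sample; the values for `observed`, `top`, and `bottom` are hypothetical, and the helper names are made up for illustration.

```python
def ssr_single(observed, top, bottom, b3):
    """Squared residual for one sample."""
    predicted = top + bottom + b3
    return (observed - predicted) ** 2


def analytic_slope(observed, top, bottom, b3):
    """dssr/db3 from the chain rule: -2 * (Observed - Predicted) * 1."""
    predicted = top + bottom + b3
    return -2 * (observed - predicted) * 1


def numeric_slope(observed, top, bottom, b3, eps=1e-6):
    """Central finite difference approximation of dssr/db3."""
    up = ssr_single(observed, top, bottom, b3 + eps)
    down = ssr_single(observed, top, bottom, b3 - eps)
    return (up - down) / (2 * eps)


# Hypothetical single sample: Predicted = -0.5 + -0.5 + 0 = -1.0
observed, top, bottom, b3 = 1.0, -0.5, -0.5, 0.0
print(analytic_slope(observed, top, bottom, b3))  # -4.0
print(numeric_slope(observed, top, bottom, b3))   # very close to -4.0
```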
Slope Calculation and Learning
Now we have 3 predicted values, one for each of the 3 samples, so we sum the derivative over all of them:
dssr/db3 = Σ(-2*(Observed-Predicted))
dssr/db3 = -2 * [(Observed1 - Predicted1) * 1 + (Observed2 - Predicted2) * 1 + (Observed3 - Predicted3) * 1]
dssr/db3 = -2 * [(Residual1) + (Residual2) + (Residual3)]
dssr/db3 = -2 * (ResidualSum)
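The summed slope can be sketched as a small function. The data below is hypothetical and is not the training data behind the numbers in this article.

```python
def slope_b3(observed, predicted):
    """dssr/db3 = -2 * (sum of residuals over all samples)."""
    residual_sum = sum(o - p for o, p in zip(observed, predicted))
    return -2 * residual_sum


# Hypothetical values for three samples.
observed = [0.0, 1.0, 0.0]
predicted = [-2.0, -1.0, -2.0]

print(slope_b3(observed, predicted))  # -12.0
```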
For our training data, with b3 starting at 0, I got slope = -15.7.
step size = slope x learning rate
step size = -15.7 x 0.1 = -1.57
new b3 = old b3 - step size
new b3 = 0 - (-1.57) = 1.57
Recalculating the slope with the new b3, we get:
slope = -6.26
step size = -6.26 x 0.1 = -0.626
new b3 = 1.57 - (-0.626) = 2.196
We repeat this calculation until the step size is close to 0.
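The whole loop can be sketched in a few lines. This version uses the standard gradient descent update, which subtracts the step size from b3, so a negative slope moves b3 upward. The `observed` and `base` values (where `base` stands for Top Layer + Bottom Layer per sample) are hypothetical, and `tolerance` and `max_steps` are assumed stopping settings, not values from this article.

```python
def fit_b3(observed, base, learning_rate=0.1, tolerance=1e-4, max_steps=1000):
    """Find b3 by gradient descent on SSR, starting from b3 = 0."""
    b3 = 0.0
    for _ in range(max_steps):
        # Predicted = Top Layer + Bottom Layer + b3
        predicted = [v + b3 for v in base]
        # dssr/db3 = -2 * (sum of residuals)
        slope = -2 * sum(o - p for o, p in zip(observed, predicted))
        step_size = slope * learning_rate
        if abs(step_size) < tolerance:
            break  # step size is close to 0, so stop
        b3 = b3 - step_size
    return b3


# Hypothetical data: base holds Top Layer + Bottom Layer for each sample.
observed = [0.0, 1.0, 0.0]
base = [-2.0, -1.0, -2.0]

b3 = fit_b3(observed, base)
print(round(b3, 2))  # 2.0
```

With this toy data every residual is 2 when b3 = 0, so the best b3 is 2, which is where the loop settles.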
Final Result
We found the optimal value:
b3 = 2.61
Conclusion
We were able to apply the chain rule, gradient descent, and backpropagation to a very small neural network.
In the next article, we will discuss how to calculate the weights and biases in the same neural network.
Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.


