Ruthvik Raja M.V

Mathematical Formulae behind Optimization Algorithms for Neural Networks

Hello,
The following topics are covered in this blog:-

  • Introduction

  • Optimization Algorithms:

Gradient Descent (GD)

Stochastic Gradient Descent (SGD)

Mini-batch SGD

SGD with Momentum

AdaGrad

AdaDelta & RMSProp

Adam

  • Conclusion

Download the whole blog from the following link:-
https://github.com/ruthvikraja/Optimization-Algorithms.git

Introduction

  • Neural networks are a subset of Machine Learning; they adapt and learn from vast amounts of data.

  • The Neuron is the building block of a Neural network: it takes some input, performs a mathematical computation by multiplying the input values with their corresponding (initially random) weights, and finally produces an output.


  • Each node in the Hidden and Output layers of a Neural network is composed of two functions, namely a linear function and an activation function. In forward propagation, the linear function is computed by summing the products of the outputs of the previously connected nodes and their corresponding weights and then adding a bias, as shown below.

  • After the linear function, an activation function such as Sigmoid, ReLU, Leaky ReLU, Parametric ReLU, Swish or Softplus is applied, chosen based on the problem type and requirements.

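In standard notation, the computation at each node can therefore be written as follows, where a_j are the outputs of the previously connected nodes, w_j the corresponding weights, b the bias and f the chosen activation function:

z = \sum_{j} w_j \, a_j + b, \qquad a = f(z)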

Role of an Optimizer

  • After computing the output at the output layer, the predicted value is compared with the actual value by computing the Loss.

  • The Loss function measures the error between the actual and predicted values. The Optimization algorithm then determines the new weight values, using the gradient of the Loss w.r.t. the weights, to bring the output of the next iteration closer to the actual output.


Gradient Descent

  • The formula to compute new weights using Gradient Descent is as follows:-

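In standard notation, with \eta the learning rate and L the Loss:

w_{new} = w_{old} - \eta \cdot \frac{\partial L}{\partial w_{old}}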

  • The formula to compute Loss using Gradient Descent is as follows:-

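Assuming the usual squared-error Loss, computed over all n training samples (y_i is the actual value and \hat{y}_i the predicted value for the i-th sample):

L = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

Because every update uses the entire dataset, each step is smooth but expensive for large n.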

Stochastic Gradient Descent

  • The formula to compute new weights using Stochastic Gradient Descent is as follows:-

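The update has the same form as in Gradient Descent, but the gradient is now computed from a single, randomly selected training sample per iteration:

w_{new} = w_{old} - \eta \cdot \frac{\partial L}{\partial w_{old}}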

  • The formula to compute Loss using Stochastic Gradient Descent is as follows:-

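Assuming the same squared-error Loss, now evaluated on a single sample i per iteration:

L = (y_i - \hat{y}_i)^2

The updates are cheap but noisy, since each one is based on only one sample.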

Mini-Batch Stochastic Gradient Descent

  • The formula to compute new weights using Mini-Batch Stochastic Gradient Descent is as follows:-

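Again the update rule keeps the same form, but the gradient is computed over a mini-batch of k samples (1 < k < n):

w_{new} = w_{old} - \eta \cdot \frac{\partial L}{\partial w_{old}}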

  • The formula to compute Loss using Mini-Batch Stochastic Gradient Descent is as follows:-

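Assuming a squared-error Loss evaluated over a mini-batch of k samples:

L = \frac{1}{k} \sum_{i=1}^{k} (y_i - \hat{y}_i)^2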

Overall Comparison (GD vs SGD vs Mini-Batch SGD)

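In brief, the three variants differ only in how many samples are used for each gradient step (assuming n training samples and a batch size of k):

  • GD: all n samples per update; smooth convergence, but slow and memory-hungry for large datasets.

  • SGD: a single sample per update; very cheap updates, but a noisy path towards the minimum.

  • Mini-batch SGD: k samples per update (1 < k < n); a balance between the noise of SGD and the cost of GD.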

Stochastic Gradient Descent with Momentum

  • The formula to compute new weights using Stochastic Gradient Descent with Momentum is as follows:-

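A common way to write the update uses an exponentially weighted average V_{dw} of the past gradients, where \beta is the smoothing (momentum) parameter:

V_{dw_t} = \beta \cdot V_{dw_{t-1}} + (1 - \beta) \cdot \frac{\partial L}{\partial w_t}

w_{t+1} = w_t - \eta \cdot V_{dw_t}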

  • The Loss is computed in the same way as in SGD / Mini-batch SGD; Momentum only changes how the weights are updated.


For better illustration, consider the following scenario to calculate Exponential Weighted Average:-

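As a small worked example, assume \beta = 0.9, take V_0 = 0 and let g_1, g_2, g_3 be the gradients at the first three time steps:

V_1 = (1 - \beta) \, g_1 = 0.1 \, g_1

V_2 = \beta V_1 + (1 - \beta) \, g_2 = 0.09 \, g_1 + 0.1 \, g_2

V_3 = \beta V_2 + (1 - \beta) \, g_3 = 0.081 \, g_1 + 0.09 \, g_2 + 0.1 \, g_3

The contribution of older gradients decays geometrically, so V_t mainly follows the recent gradient direction while smoothing out the noise of individual samples.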

  • Therefore, the final updated formulae to calculate new weights & bias are as follows:-

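With V_{dw} and V_{db} denoting the exponentially weighted averages of the gradients w.r.t. the weight and the bias:

w_{t+1} = w_t - \eta \cdot V_{dw_t}

b_{t+1} = b_t - \eta \cdot V_{db_t}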

where,

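V_{dw_t} and V_{db_t} are the exponentially weighted averages of the gradients with respect to the weight and the bias (computed with the recursion shown above), \beta is the smoothing (momentum) parameter (0.9 is a common default) and \eta is the learning rate (standard notation).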

Adaptive Gradient Descent (AdaGrad)

  • The formula to compute new weights using Adaptive Gradient Descent is as follows:-

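In the usual notation, AdaGrad adapts the learning rate for each weight based on the accumulated squared gradients:

w_{t} = w_{t-1} - \eta'_t \cdot \frac{\partial L}{\partial w_{t-1}}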

  • The Loss is computed in the same way as in the earlier methods (for example, squared error over the chosen sample or mini-batch); AdaGrad only changes how the learning rate is adapted for each weight.


where,

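\eta'_t = \frac{\eta}{\sqrt{\alpha_t + \epsilon}} is the adapted learning rate at time step t, \alpha_t = \sum_{i=1}^{t} \left( \frac{\partial L}{\partial w_i} \right)^2 is the accumulated sum of squared gradients, \epsilon is a small constant that avoids division by zero and \eta is the base learning rate (standard AdaGrad notation).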

Adaptive Learning Rate Method (AdaDelta) & Root Mean Squared Propagation (RMSProp)

  • The formula to compute new weights using AdaDelta & RMSProp is as follows:-

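A standard way to write the RMSProp update replaces AdaGrad's ever-growing sum with an exponentially weighted average S_{dw} of the squared gradients (AdaDelta goes one step further and also replaces the fixed learning rate \eta with a running average of past updates):

S_{dw_t} = \beta \cdot S_{dw_{t-1}} + (1 - \beta) \cdot \left( \frac{\partial L}{\partial w_t} \right)^2

w_{t+1} = w_t - \frac{\eta}{\sqrt{S_{dw_t} + \epsilon}} \cdot \frac{\partial L}{\partial w_t}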

  • The Loss is computed in the same way as in the earlier methods; only the weight-update rule changes.


where,

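S_{dw_t} is the exponentially weighted average of the squared gradients, \beta is its smoothing parameter (commonly around 0.9 to 0.95), \epsilon is a small constant that avoids division by zero and \eta is the learning rate (standard notation).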

Adaptive Moment Estimation (Adam)

  • The formulae to compute new weights & bias using Adam are as follows:-

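In standard notation, Adam combines the Momentum and RMSProp ideas by keeping exponentially weighted averages of both the gradients (V) and the squared gradients (S):

V_{dw_t} = \beta_1 \cdot V_{dw_{t-1}} + (1 - \beta_1) \cdot \frac{\partial L}{\partial w_t}

S_{dw_t} = \beta_2 \cdot S_{dw_{t-1}} + (1 - \beta_2) \cdot \left( \frac{\partial L}{\partial w_t} \right)^2

w_{t+1} = w_t - \frac{\eta}{\sqrt{S_{dw_t} + \epsilon}} \cdot V_{dw_t}

The bias b is updated analogously, using V_{db} and S_{db}.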

  • The formulae to compute Loss for Regression & Classification problems using Adam are as follows:-

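Assuming the usual choices, the Loss is the mean squared error for Regression and the (binary) cross-entropy for Classification:

L_{regression} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

L_{classification} = - \frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]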

where,

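V_{dw_t} and S_{dw_t} are the exponentially weighted averages of the gradients and of the squared gradients, \beta_1 and \beta_2 are their smoothing parameters (0.9 and 0.999 are common defaults), \epsilon is a small constant that avoids division by zero and \eta is the learning rate (standard Adam notation).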

  • When utilising Exponential Weighted Averages, there is a process known as Bias correction. It was introduced to obtain better estimates at the initial time steps, when the averages are still biased towards their zero initialisation. The formulae for Bias correction are as follows:-

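In standard notation, the bias-corrected estimates at time step t are:

\hat{V}_{dw_t} = \frac{V_{dw_t}}{1 - \beta_1^{\,t}}, \qquad \hat{S}_{dw_t} = \frac{S_{dw_t}}{1 - \beta_2^{\,t}}

(and likewise \hat{V}_{db_t} and \hat{S}_{db_t} for the bias).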

  • The updated Weight & Bias formulae are as follows:-

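Using the bias-corrected estimates, the updates become:

w_{t+1} = w_t - \frac{\eta}{\sqrt{\hat{S}_{dw_t} + \epsilon}} \cdot \hat{V}_{dw_t}

b_{t+1} = b_t - \frac{\eta}{\sqrt{\hat{S}_{db_t} + \epsilon}} \cdot \hat{V}_{db_t}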

Conclusion

  • In this post, the different Optimization algorithms available in the field of Artificial Intelligence for reducing the Loss function of a Neural Network were discussed in detail.

  • Overall, the Adam optimizer is comparatively better than the other algorithms because it combines the ideas behind Momentum and RMSProp (exponentially weighted averages of both the gradients and the squared gradients, plus bias correction).

  • However, there is no guarantee that the Adam optimizer will outperform the others on every dataset, because performance depends on several factors such as the type of problem, the size of the input data, the number of features, etc.

  • Gradient Descent and SGD work well for small datasets, while Mini-batch SGD, SGD with Momentum & RMSProp can be tried on large datasets.

THANK YOU
