
Super Kai (Kazuya Ito)


Activation functions in PyTorch (1)



An activation function is a function or layer that enables a neural network to learn complex (non-linear) relationships by transforming the output of the previous layer. *Without activation functions, a neural network can only learn linear relationships.
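
As a minimal sketch (not from the original post), the snippet below illustrates that claim: two stacked Linear layers with no activation in between collapse into a single linear map, while inserting a non-linearity would prevent that collapse. The layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked Linear layers with no activation in between (biases omitted
# for simplicity) collapse into one linear map: W2(W1 x) == (W2 W1) x.
l1 = nn.Linear(3, 5, bias=False)
l2 = nn.Linear(5, 2, bias=False)

x = torch.randn(4, 3)  # a batch of 4 samples with 3 features

stacked = l2(l1(x))                        # output of the two-layer "network"
collapsed = x @ (l2.weight @ l1.weight).T  # the single equivalent linear layer

print(torch.allclose(stacked, collapsed, atol=1e-6))  # True
```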

(1) Step function:

  • can convert an input value (x) to 0 or 1. *If x < 0, then 0, while if x >= 0, then 1.
  • is also called Binary step function, Unit step function, Binary threshold function, Threshold function, Heaviside step function or Heaviside function.
  • is heaviside() in PyTorch. *A minimal usage sketch follows the graph below.
  • 's pros:
    • It's simple, only expressing the two values 0 and 1.
    • It avoids the Exploding Gradient Problem.
  • 's cons:
    • It's rarely used in Deep Learning because it has more cons than other activation functions.
    • It can only express the two values 0 and 1, so the resulting model has poor accuracy. *Activation functions which can express a wider range of values can create a model with good accuracy.
    • It causes the Dying ReLU Problem. *Its gradient is 0 everywhere except x = 0, so the weights stop updating during Backpropagation.
    • It's non-differentiable at x = 0. *The gradient of the step function doesn't exist at x = 0, so Backpropagation, which uses differentiation to compute gradients, cannot handle that point.
  • 's graph in Desmos:

[Step function graph]
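
As a minimal usage sketch, heaviside() takes a second tensor argument that supplies the output at x = 0, so passing 1. reproduces the definition above (if x >= 0, then 1). The input values are arbitrary.

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# torch.heaviside(input, values) returns 0 where input < 0, 1 where
# input > 0, and `values` where input == 0.
y = torch.heaviside(x, torch.tensor(1.0))

print(y)  # tensor([0., 0., 1., 1., 1.])
```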

(2) Identity:

  • can just return the same value as the input value (x) without any conversion.
  • 's formula is y = x.
  • is also called Linear function.
  • is Identity() in PyTorch. *A minimal usage sketch follows the graph below.
  • 's pros:
    • It's simple, just returning the same value as an input value.
  • 's cons:
    • It cannot learn complex (non-linear) relationships by itself. *It's linear, so stacking layers with only Identity between them is equivalent to a single linear layer.
  • 's graph in Desmos:

[Identity function graph]
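
As a minimal usage sketch, Identity() passes its input through unchanged; it's mainly useful as a placeholder, e.g. to disable a layer in an existing model without changing the surrounding code. The tensors and layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

identity = nn.Identity()

x = torch.tensor([-2.0, 0.0, 3.5])
print(identity(x))  # tensor([-2.0000,  0.0000,  3.5000])

# As a placeholder: swapping a layer for nn.Identity() keeps the
# surrounding code working while effectively removing that layer.
model = nn.Sequential(nn.Linear(3, 3), nn.Identity())
print(model(torch.randn(2, 3)).shape)  # torch.Size([2, 3])
```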

(3) ReLU(Rectified Linear Unit):

  • can convert an input value (x) to 0 or x. *If x < 0, then 0, while if 0 <= x, then x.
  • 's formula is y = max(0, x).
  • is ReLU() in PyTorch. *A minimal usage sketch follows the graph below.
  • is used in:
    • Binary Classification Model.
    • Multi-Class Classification Model.
    • CNN(Convolutional Neural Network).
    • RNN (Recurrent Neural Network). *RNN() in PyTorch.
    • Transformer. *Transformer() in PyTorch.
    • NLP(Natural Language Processing) based on RNN.
    • GAN(Generative Adversarial Network).
  • 's pros:
    • It mitigates the Vanishing Gradient Problem.
  • 's cons:
    • It causes the Dying ReLU Problem.
    • It's non-differentiable at x = 0.
  • 's graph in Desmos:

[ReLU graph]
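
As a minimal usage sketch, ReLU() applies y = max(0, x) elementwise; the same result is available functionally as torch.relu(). The layer sizes in the small model below are arbitrary.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000])
print(torch.relu(x))  # same result via the functional form

# ReLU is typically placed between Linear layers, e.g. in a small classifier.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 3),
)
print(model(torch.randn(2, 4)).shape)  # torch.Size([2, 3])
```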
