Super Kai (Kazuya Ito)

Activation functions in PyTorch (2)

(1) Leaky ReLU(Leaky Rectified Linear Unit):

  • is an improved ReLU which can mitigate the Dying ReLU Problem.
  • can convert an input value(x) to an output value between ax and x. *Memos:
    • If x < 0, then ax, while if 0 <= x, then x.
    • a is 0.01 by default.
  • 's formula is y = max(ax, x).
  • is also called LReLU.
  • is LeakyReLU() in PyTorch. *See the usage sketch after this section.
  • is used in:
    • GAN.
  • 's pros:
    • It mitigates the Vanishing Gradient Problem.
    • It mitigates the Dying ReLU Problem. *0 is still produced for the input value 0, so the Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's non-differentiable at x = 0.
  • 's graph in Desmos:

[Leaky ReLU graph in Desmos]
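
A minimal usage sketch of LeakyReLU() in PyTorch (the tensor values are just illustrative; negative_slope is the argument that holds a):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

leaky_relu = nn.LeakyReLU()                 # negative_slope(a) is 0.01 by default.
print(leaky_relu(x))
# tensor([-0.0200, -0.0100,  0.0000,  1.0000,  2.0000])

print(nn.LeakyReLU(negative_slope=0.1)(x))  # a custom slope a = 0.1.
# tensor([-0.2000, -0.1000,  0.0000,  1.0000,  2.0000])

print(F.leaky_relu(x, negative_slope=0.1))  # the functional form gives the same result.
# tensor([-0.2000, -0.1000,  0.0000,  1.0000,  2.0000])
```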

(2) PReLU(Parametric Rectified Linear Unit):

  • is an improved Leaky ReLU with one or more learnable parameters which are adjusted during training to improve a model's accuracy and convergence.
  • can convert an input value(x) to an output value between ax and x. *Memos:
    • If x < 0, then ax, while if 0 <= x, then x.
    • a is 0.25 by default. *0.25 is the initial value of each learnable parameter a.
  • 's formula is y = max(0, x) + a*min(0, x). *This equals max(ax, x) as long as a <= 1, but a is learnable, so it can grow past 1.
  • is PReLU() in PyTorch. *See the usage sketch after this section.
  • is used in:
    • SRGAN(Super-Resolution Generative Adversarial Network). *SRGAN is a type of GAN(Generative Adversarial Network).
  • 's pros:
    • It mitigates the Vanishing Gradient Problem.
    • It mitigates the Dying ReLU Problem. *0 is still produced for the input value 0, so the Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's non-differentiable at x = 0. *The gradient of PReLU doesn't exist at x = 0, and Backpropagation needs that derivative to calculate a gradient.
  • 's graph in Desmos:

[PReLU graph in Desmos]
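
A minimal usage sketch of PReLU() in PyTorch (the tensor values are just illustrative; num_parameters and init are the arguments that set the number of learnable parameters and their initial value):

```python
import torch
import torch.nn as nn

prelu = nn.PReLU()     # num_parameters=1, init=0.25 by default.
print(prelu.weight)    # the learnable a, initialized to 0.25.
# Parameter containing:
# tensor([0.2500], requires_grad=True)

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
print(prelu(x))
# tensor([-0.5000, -0.2500,  0.0000,  1.0000,  2.0000], grad_fn=...)

# One learnable a per channel, e.g. for 3-channel inputs, starting from a = 0.1.
prelu_per_channel = nn.PReLU(num_parameters=3, init=0.1)
```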

(3) FReLU(Flexible Rectified Linear Unit):

  • is an improved ReLU with one or more learnable bias parameters which are adjusted during training to improve a model's accuracy and convergence.
  • 's formula is y = ReLU(x) + b. *b is a learnable bias parameter.
  • is also called Funnel Activation.
  • isn't in PyTorch, so you can use frelu.pytorch or FunnelAct_Pytorch. *A sketch of the formula above follows this section.
  • 's pros:
    • It mitigates the Vanishing Gradient Problem.
    • It avoids the Dying ReLU Problem when b < 0 or 0 < b.
  • 's cons:
    • It causes the Dying ReLU Problem when b = 0.
    • It can cause the Exploding Gradient Problem when b gets larger and larger.
    • It's non-differentiable at the corner (x = 0). *The gradient of FReLU doesn't exist there, and Backpropagation needs that derivative to calculate a gradient.
  • 's graph in Desmos:

[FReLU graph in Desmos]
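
FReLU isn't built into PyTorch, so here is a minimal sketch of the formula y = ReLU(x) + b as a custom module. It's only an illustration under my own assumptions (the class name, the per-channel bias shape, and the initial value -1.0 are arbitrary choices here), not the code from frelu.pytorch or FunnelAct_Pytorch:

```python
import torch
import torch.nn as nn

class FReLU(nn.Module):
    # Sketch of y = ReLU(x) + b with one learnable bias b per channel.
    def __init__(self, num_channels, init_b=-1.0):  # init_b=-1.0 is an arbitrary choice.
        super().__init__()
        # Shape (1, C, 1, 1) so b broadcasts over (N, C, H, W) inputs.
        self.b = nn.Parameter(torch.full((1, num_channels, 1, 1), init_b))

    def forward(self, x):
        return torch.relu(x) + self.b

frelu = FReLU(num_channels=3)
x = torch.randn(2, 3, 4, 4)  # (N, C, H, W)
print(frelu(x).shape)        # torch.Size([2, 3, 4, 4])
```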
