
Super Kai (Kazuya Ito)


Activation functions in PyTorch (2)



(1) Leaky ReLU(Leaky Rectified Linear Unit):

  • is an improved version of ReLU that mitigates the Dying ReLU Problem.
  • can convert an input value (x) to an output value between ax and x. *Memos:
    • If x < 0, the output is ax; if 0 <= x, the output is x.
    • a is 0.01 by default.
  • 's formula is y = max(ax, x).
  • is also called LReLU.
  • is LeakyReLU() in PyTorch. *A usage sketch is shown after this list.
  • is used in:
    • GAN.
  • 's pros:
    • It mitigates the Vanishing Gradient Problem.
    • It mitigates the Dying ReLU Problem. *0 is still produced for the input value 0, so the Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's non-differentiable at x = 0.
  • 's graph in Desmos:

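Below is a minimal usage sketch of LeakyReLU(), assuming arbitrary example tensor values and an arbitrary alternative slope of 0.1:

```python
import torch
import torch.nn as nn

x = torch.tensor([-3.0, -1.0, 0.0, 2.0])  # arbitrary example values

leaky_relu = nn.LeakyReLU()  # negative_slope (a) is 0.01 by default
print(leaky_relu(x))
# tensor([-0.0300, -0.0100,  0.0000,  2.0000])

print(nn.LeakyReLU(negative_slope=0.1)(x))  # a = 0.1 instead of the default
# tensor([-0.3000, -0.1000,  0.0000,  2.0000])
```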

(2) PReLU(Parametric Rectified Linear Unit):

  • is an improved version of Leaky ReLU with one or more learnable parameters that are adjusted during training to improve a model's accuracy and convergence.
  • can convert an input value (x) to an output value between ax and x. *Memos:
    • If x < 0, the output is ax; if 0 <= x, the output is x.
    • a is 0.25 by default. *a is the initial value of the learnable parameter(s).
  • 's formula is y = max(ax, x).
  • is PReLU() in PyTorch. *A usage sketch is shown after this list.
  • is used in:
    • SRGAN (Super-Resolution Generative Adversarial Network). *SRGAN is a type of GAN (Generative Adversarial Network).
  • 's pros:
    • It mitigates the Vanishing Gradient Problem.
    • It mitigates the Dying ReLU Problem. *0 is still produced for the input value 0, so the Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's non-differentiable at x = 0. *The gradient for PReLU doesn't exist at x = 0, which matters during backpropagation, where differentiation is used to compute gradients.
  • 's graph in Desmos:

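Below is a minimal usage sketch of PReLU(), assuming arbitrary example tensor values; the 3-channel variant at the end is just an illustration of num_parameters:

```python
import torch
import torch.nn as nn

x = torch.tensor([-3.0, -1.0, 0.0, 2.0])  # arbitrary example values

prelu = nn.PReLU()  # num_parameters=1, init=0.25 by default
print(prelu(x))
# tensor([-0.7500, -0.2500,  0.0000,  2.0000], grad_fn=...)

print(list(prelu.parameters()))  # the learnable a, initialized to 0.25
# [Parameter containing: tensor([0.2500], requires_grad=True)]

# One learnable a per channel, e.g. for a 3-channel input:
prelu_3ch = nn.PReLU(num_parameters=3, init=0.1)
```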

(3) FReLU(Flexible Rectified Linear Unit):

  • is an improved version of ReLU with one or more learnable bias parameters that are adjusted during training to improve a model's accuracy and convergence.
  • 's formula is y = ReLU(x) + b. *b represents one or more learnable bias parameters.
  • is also called Funnel Activation.
  • isn't in PyTorch, so you can use frelu.pytorch or FunnelAct_Pytorch. *A simplified sketch of the formula is shown after this list.
  • 's pros:
    • It mitigates the Vanishing Gradient Problem.
    • It avoids the Dying ReLU Problem when b < 0 or 0 < b.
  • 's cons:
    • It causes the Dying ReLU Problem when b = 0.
    • It can cause the Exploding Gradient Problem as b grows larger than 0.
    • It's non-differentiable at the corner point. *The gradient for FReLU doesn't exist at the corner point, which matters during backpropagation, where differentiation is used to compute gradients.
  • 's graph in Desmos:

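Since FReLU isn't in PyTorch, here is a simplified sketch that only implements the y = ReLU(x) + b formula above with one learnable bias per channel; the module name, channel layout, and initial value are assumptions, and this is not the code from frelu.pytorch or FunnelAct_Pytorch:

```python
import torch
import torch.nn as nn

class FReLU(nn.Module):
    # Sketch of y = ReLU(x) + b with one learnable bias b per channel.
    # num_channels and init_b are illustrative choices, not fixed by FReLU itself.
    def __init__(self, num_channels, init_b=-1.0):
        super().__init__()
        self.b = nn.Parameter(torch.full((num_channels,), init_b))

    def forward(self, x):
        # x is assumed to be (N, C, H, W); b is broadcast over the spatial dims.
        return torch.relu(x) + self.b.view(1, -1, 1, 1)

frelu = FReLU(num_channels=3)
x = torch.randn(2, 3, 4, 4)  # arbitrary example input
print(frelu(x).shape)
# torch.Size([2, 3, 4, 4])
```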
