DEV Community

Super Kai (Kazuya Ito)

Activation functions in PyTorch (3)

*Memos:

  • My post explains PReLU() and ELU().
  • My post explains SELU() and CELU().
  • My post explains Step function, Identity and ReLU.
  • My post explains Leaky ReLU, PReLU and FReLU.
  • My post explains GELU, Mish, SiLU and Softplus.
  • My post explains Tanh, Softsign, Sigmoid and Softmax.
  • My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.
  • My post explains layers in PyTorch.
  • My post explains loss functions in PyTorch.
  • My post explains optimizers in PyTorch.

(1) ELU(Exponential Linear Unit):

  • can convert an input value (x) to an output value of α(e^x - 1) or x: *Memos:
    • If x < 0, then α(e^x - 1), while if 0 <= x, then x.
    • α is 1.0 by default in PyTorch.
  • is ELU() in PyTorch.
  • 's pros:
    • It maps negative input values into the range (-α, 0) instead of zeroing them, which pushes mean activations closer to 0.
    • The convergence with negative input values is stable.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's computationally expensive because of the exponential operation.
    • It's non-differentiable at x = 0 if α is not 1 (the left derivative there is α, while the right derivative is 1).
  • 's graph in Desmos:

[Image: graph of ELU in Desmos]
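The ELU formula and the x = 0 claim above can be checked with a minimal plain-Python sketch. This is not PyTorch's implementation. `elu` and `one_sided_slopes` are hypothetical helpers, and the finite-difference step `h = 1e-6` is an arbitrary choice. The sketch prints how negative inputs approach -α and compares the left and right slopes at x = 0 for α = 0.5. If PyTorch is installed, it also compares the values with `torch.nn.ELU(alpha=1.0)`.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """Hypothetical reference ELU: x if x > 0, else alpha * (e^x - 1)."""
    return x if x > 0 else alpha * math.expm1(x)  # expm1(x) == e^x - 1, computed accurately


def one_sided_slopes(f, x: float = 0.0, h: float = 1e-6) -> tuple[float, float]:
    """Hypothetical helper: backward and forward finite-difference slopes of f at x."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right


# Negative inputs are not zeroed (unlike ReLU); they approach -alpha as x -> -inf.
for x in (-10.0, -1.0, 0.0, 2.0):
    print(f"ELU({x}) = {elu(x):.4f}")

# With alpha = 0.5 the slope is about 0.5 from the left but 1 from the right,
# so ELU is not differentiable at x = 0 unless alpha == 1.
left, right = one_sided_slopes(lambda x: elu(x, alpha=0.5))
print(f"alpha=0.5: left slope ~ {left:.4f}, right slope ~ {right:.4f}")

# Optional cross-check against PyTorch's ELU, skipped if torch is not installed.
try:
    import torch

    xs = torch.tensor([-10.0, -1.0, 0.0, 2.0])
    expected = torch.tensor([elu(v) for v in xs.tolist()])
    assert torch.allclose(torch.nn.ELU(alpha=1.0)(xs), expected)
except ImportError:
    pass
```

Finite differences only approximate the one-sided derivatives, so the printed slopes are close to α and 1 rather than exactly equal to them.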

(2) SELU(Scaled Exponential Linear Unit):

  • can convert an input value (x) to an output value of λα(e^x - 1) or λx: *Memos:
    • If x < 0, then λα(e^x - 1), while if 0 <= x, then λx.
    • λ = 1.0507009873554804934193349852946
    • α = 1.6732632423543772848170429916717
  • is SELU() in PyTorch.
  • 's pros:
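    • It maps negative input values into the range (-λα, 0) (λα ≈ 1.7581) instead of zeroing them. With suitable weight initialization, this lets activations self-normalize toward a mean of 0 and a variance of 1.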
    • The convergence with negative input values is stable.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
  • 's cons:
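    • It may cause the Exploding Gradient Problem because positive input values are multiplied by λ (> 1), which enlarges them.
    • It's computationally expensive because of the exponential operation.
    • It's non-differentiable at x = 0 because α is fixed at about 1.6733, not 1. The left derivative there is λα ≈ 1.7581, while the right derivative is λ ≈ 1.0507.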
  • 's graph in Desmos:

[Image: graph of SELU in Desmos]
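A similar sketch for SELU uses the λ and α listed above (Python floats keep only about 17 significant digits of them). `selu` and `one_sided_slopes` are again hypothetical helpers, and the seed and sample size are arbitrary choices. The sketch shows the lower bound of -λα ≈ -1.7581 and the mismatched slopes at x = 0. It also checks the property these constants are chosen for: when inputs come from a standard normal distribution, SELU outputs have a mean close to 0 and a variance close to 1.

```python
import math
import random
import statistics

LAMBDA = 1.0507009873554804934193349852946  # λ (scale)
ALPHA = 1.6732632423543772848170429916717   # α


def selu(x: float) -> float:
    """Hypothetical reference SELU: λx if x > 0, else λα(e^x - 1)."""
    return LAMBDA * x if x > 0 else LAMBDA * ALPHA * math.expm1(x)


def one_sided_slopes(f, x: float = 0.0, h: float = 1e-6) -> tuple[float, float]:
    """Hypothetical helper: backward and forward finite-difference slopes of f at x."""
    return (f(x) - f(x - h)) / h, (f(x + h) - f(x)) / h


# Large negative inputs approach -λα ≈ -1.7581.
print(f"SELU(-20) = {selu(-20.0):.4f}, -λα = {-LAMBDA * ALPHA:.4f}")

# Left slope at 0 is about λα, right slope is about λ: not differentiable at 0.
left, right = one_sided_slopes(selu)
print(f"left slope ~ {left:.4f}, right slope ~ {right:.4f}")

# Self-normalizing check (arbitrary seed and sample size):
# standard normal inputs -> outputs with mean ~0 and variance ~1.
rng = random.Random(0)
outputs = [selu(rng.gauss(0.0, 1.0)) for _ in range(100_000)]
mean = statistics.fmean(outputs)
variance = statistics.fmean((y - mean) ** 2 for y in outputs)
print(f"mean ~ {mean:.3f}, variance ~ {variance:.3f}")

# Optional cross-check against PyTorch's SELU, skipped if torch is not installed.
try:
    import torch

    xs = torch.tensor([-20.0, -1.0, 0.0, 3.0])
    expected = torch.tensor([selu(v) for v in xs.tolist()])
    assert torch.allclose(torch.nn.SELU()(xs), expected)
except ImportError:
    pass
```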

(3) CELU(Continuously Differentiable Exponential Linear Unit):

  • is an improved ELU that is differentiable at x = 0 even if α is not 1.
  • can convert an input value (x) to an output value of α(e^(x/α) - 1) or x: *Memos:
    • If x < 0, then α(e^(x/α) - 1), while if 0 <= x, then x.
    • α is 1.0 by default in PyTorch.
  • 's formula is: CELU(x) = max(0, x) + min(0, α(e^(x/α) - 1))
  • is CELU() in PyTorch.
  • 's pros:
    • It maps negative input values into the range (-α, 0) instead of zeroing them, which pushes mean activations closer to 0.
    • The convergence with negative input values is stable.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's computationally expensive because of the exponential operation.
  • 's graph in Desmos:

[Image: graph of CELU in Desmos]
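The CELU formula can be sketched the same way. This is again a hypothetical plain-Python reference (`celu` and `one_sided_slopes` are illustrative helpers), not PyTorch's code. The derivative of the negative branch is e^(x/α), which equals 1 at x = 0, so both one-sided slopes come out close to 1 even with α = 0.5. The loop also checks that CELU with α = 1 gives the same values as ELU with α = 1.

```python
import math


def celu(x: float, alpha: float = 1.0) -> float:
    """Hypothetical reference CELU: max(0, x) + min(0, alpha * (e^(x/alpha) - 1))."""
    return max(0.0, x) + min(0.0, alpha * math.expm1(x / alpha))


def one_sided_slopes(f, x: float = 0.0, h: float = 1e-6) -> tuple[float, float]:
    """Hypothetical helper: backward and forward finite-difference slopes of f at x."""
    return (f(x) - f(x - h)) / h, (f(x + h) - f(x)) / h


# Negative inputs approach -alpha, as with ELU.
print(f"CELU(-10, alpha=0.5) = {celu(-10.0, alpha=0.5):.4f}")

# Both slopes at 0 are ~1 even though alpha != 1, so CELU is differentiable there.
left, right = one_sided_slopes(lambda x: celu(x, alpha=0.5))
print(f"alpha=0.5: left slope ~ {left:.4f}, right slope ~ {right:.4f}")

# With alpha = 1, e^(x/alpha) == e^x, so CELU gives the same values as ELU.
for x in (-3.0, -0.5, 0.0, 1.5):
    elu_value = x if x > 0 else math.expm1(x)
    assert math.isclose(celu(x, alpha=1.0), elu_value)

# Optional cross-check against PyTorch's CELU, skipped if torch is not installed.
try:
    import torch

    xs = torch.tensor([-10.0, -1.0, 0.0, 2.0])
    expected = torch.tensor([celu(v, alpha=0.5) for v in xs.tolist()])
    assert torch.allclose(torch.nn.CELU(alpha=0.5)(xs), expected)
except ImportError:
    pass
```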
