Super Kai (Kazuya Ito)

Activation functions in PyTorch (3)

*Memos:

  • My post explains PReLU() and ELU().
  • My post explains SELU() and CELU().
  • My post explains Step function, Identity and ReLU.
  • My post explains Leaky ReLU, PReLU and FReLU.
  • My post explains GELU, Mish, SiLU and Softplus.
  • My post explains Tanh, Softsign, Sigmoid and Softmax.
  • My post explains Vanishing Gradient Problem, Exploding Gradient Problem and Dying ReLU Problem.
  • My post explains layers in PyTorch.
  • My post explains loss functions in PyTorch.
  • My post explains optimizers in PyTorch.

(1) ELU(Exponential Linear Unit):

  • can convert an input value (x) to an output value between ae^x - a and x: *Memos:
    • If x < 0, then ae^x - a, while if 0 <= x, then x.
    • a is 1.0 by default.
  • is ELU() in PyTorch. *A usage sketch follows after this section.
  • 's pros:
    • It normalizes negative input values.
    • The convergence with negative input values is stable.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's computationally expensive because of the exponential operation.
    • It's non-differentiable at x = 0 if a is not 1.
  • 's graph in Desmos:

[Graph of ELU in Desmos]
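Below is a minimal usage sketch of ELU() in PyTorch; the input tensor and printed values are illustrative assumptions, not from the original post.

```python
import torch
from torch import nn

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# Module form: alpha defaults to 1.0, so negative inputs map to e^x - 1.
elu = nn.ELU()  # same as nn.ELU(alpha=1.0)
print(elu(x))   # tensor([-0.8647, -0.6321,  0.0000,  1.0000,  2.0000])

# Functional form produces the same result.
print(nn.functional.elu(x, alpha=1.0))
```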

(2) SELU(Scaled Exponential Linear Unit):

  • can convert an input value (x) to an output value between λ(ae^x - a) and λx: *Memos:
    • If x < 0, then λ(ae^x - a), while if 0 <= x, then λx.
    • λ = 1.0507009873554804934193349852946
    • a = 1.6732632423543772848170429916717
  • is SELU() in PyTorch. *A usage sketch follows after this section.
  • 's pros:
    • It normalizes negative input values.
    • The convergence with negative input values is stable.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It may cause Exploding Gradient Problem because positive input values are scaled up by the multiplication with λ.
    • It's computationally expensive because of the exponential operation.
    • It's non-differentiable at x = 0 because a (≈1.6733) is not 1.
  • 's graph in Desmos:

[Graph of SELU in Desmos]
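A minimal usage sketch of SELU() in PyTorch, under the same illustrative assumptions; note that λ and a are fixed constants inside the function, so there is no alpha argument to set.

```python
import torch
from torch import nn

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# Module form: lambda and a are built into SELU(), not passed as arguments.
selu = nn.SELU()
print(selu(x))  # negative inputs saturate toward -lambda*a (about -1.7581)

# Functional form produces the same result.
print(nn.functional.selu(x))
```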

(3) CELU(Continuously Differentiable Exponential Linear Unit):

  • is an improved ELU that is differentiable at x = 0 even if a is not 1.
  • can convert an input value (x) to an output value between ae^(x/a) - a and x: *Memos:
    • If x < 0, then ae^(x/a) - a, while if 0 <= x, then x.
    • a is 1.0 by default.
  • 's formula is: CELU(x) = max(0, x) + min(0, a(e^(x/a) - 1))
  • is CELU() in PyTorch. *A usage sketch follows after this section.
  • 's pros:
    • It normalizes negative input values.
    • The convergence with negative input values is stable.
    • It mitigates Vanishing Gradient Problem.
    • It mitigates Dying ReLU Problem. *0 is still produced for the input value 0 so Dying ReLU Problem is not completely avoided.
  • 's cons:
    • It's computationally expensive because of the exponential operation.
  • 's graph in Desmos:

[Graph of CELU in Desmos]
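A minimal usage sketch of CELU() in PyTorch, again with an illustrative input tensor; alpha=2.0 is only an example value to show the non-default case.

```python
import torch
from torch import nn

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])

# Module form: with the default alpha=1.0, CELU matches ELU exactly.
celu = nn.CELU()
print(celu(x))  # tensor([-0.8647, -0.6321,  0.0000,  1.0000,  2.0000])

# With alpha != 1 the negative branch is a(e^(x/a) - 1) and stays smooth at x = 0.
celu2 = nn.CELU(alpha=2.0)
print(celu2(x))                          # CELU(-2) = 2*(e^(-1) - 1) ≈ -1.2642
print(nn.functional.celu(x, alpha=2.0))  # same result via the functional form
```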
