
Super Kai (Kazuya Ito)


Activation functions in PyTorch (1)



An activation function is a function or layer that enables a neural network to learn complex (non-linear) relationships by transforming the output of the previous layer. *Without activation functions, a neural network can only learn linear relationships.
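
As a minimal sketch (not from the original post), the snippet below illustrates that claim: two stacked Linear layers with no activation in between collapse into a single linear map, while inserting a non-linearity would prevent that collapse. The layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked Linear layers with no activation in between (biases omitted
# for simplicity) collapse into one linear map: W2(W1 x) == (W2 W1) x.
l1 = nn.Linear(3, 5, bias=False)
l2 = nn.Linear(5, 2, bias=False)

x = torch.randn(4, 3)  # a batch of 4 samples with 3 features

stacked = l2(l1(x))                        # output of the two-layer "network"
collapsed = x @ (l2.weight @ l1.weight).T  # the single equivalent linear layer

print(torch.allclose(stacked, collapsed, atol=1e-6))  # True
```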

(1) Step function:

  • can convert an input value (x) to 0 or 1. *If x < 0, then 0, while if x >= 0, then 1.
  • is also called Binary step function, Unit step function, Binary threshold function, Threshold function, Heaviside step function or Heaviside function.
  • is heaviside() in PyTorch. *A minimal usage sketch follows the graph below.
  • 's pros:
    • It's simple, only expressing the two values 0 and 1.
    • It avoids the Exploding Gradient Problem.
  • 's cons:
    • It's rarely used in Deep Learning because it has more cons than other activation functions.
    • It can only express the two values 0 and 1, so the resulting model has poor accuracy. *Activation functions which can express a wider range of values can create a model with good accuracy.
    • It causes the Dying ReLU Problem. *Its gradient is 0 everywhere except x = 0, so the weights stop updating during Backpropagation.
    • It's non-differentiable at x = 0. *The gradient of the step function doesn't exist at x = 0, so Backpropagation, which uses differentiation to compute gradients, cannot handle that point.
  • 's graph in Desmos:

[Step function graph]
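
As a minimal usage sketch, heaviside() takes a second tensor argument that supplies the output at x = 0, so passing 1. reproduces the definition above (if x >= 0, then 1). The input values are arbitrary.

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# torch.heaviside(input, values) returns 0 where input < 0, 1 where
# input > 0, and `values` where input == 0.
y = torch.heaviside(x, torch.tensor(1.0))

print(y)  # tensor([0., 0., 1., 1., 1.])
```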

(2) Identity:

  • can just return the same value as the input value (x) without any conversion.
  • 's formula is y = x.
  • is also called Linear function.
  • is Identity() in PyTorch. *A minimal usage sketch follows the graph below.
  • 's pros:
    • It's simple, just returning the same value as an input value.
  • 's cons:
    • It cannot learn complex (non-linear) relationships by itself. *It's linear, so stacking layers with only Identity between them is equivalent to a single linear layer.
  • 's graph in Desmos:

[Identity function graph]
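
As a minimal usage sketch, Identity() passes its input through unchanged; it's mainly useful as a placeholder, e.g. to disable a layer in an existing model without changing the surrounding code. The tensors and layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

identity = nn.Identity()

x = torch.tensor([-2.0, 0.0, 3.5])
print(identity(x))  # tensor([-2.0000,  0.0000,  3.5000])

# As a placeholder: swapping a layer for nn.Identity() keeps the
# surrounding code working while effectively removing that layer.
model = nn.Sequential(nn.Linear(3, 3), nn.Identity())
print(model(torch.randn(2, 3)).shape)  # torch.Size([2, 3])
```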

(3) ReLU(Rectified Linear Unit):

  • can convert an input value (x) to 0 or x. *If x < 0, then 0, while if 0 <= x, then x.
  • 's formula is y = max(0, x).
  • is ReLU() in PyTorch. *A minimal usage sketch follows the graph below.
  • is used in:
    • Binary Classification Model.
    • Multi-Class Classification Model.
    • CNN(Convolutional Neural Network).
    • RNN (Recurrent Neural Network). *RNN() in PyTorch.
    • Transformer. *Transformer() in PyTorch.
    • NLP(Natural Language Processing) based on RNN.
    • GAN(Generative Adversarial Network).
  • 's pros:
    • It mitigates the Vanishing Gradient Problem.
  • 's cons:
    • It causes the Dying ReLU Problem.
    • It's non-differentiable at x = 0.
  • 's graph in Desmos:

[ReLU graph]
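
As a minimal usage sketch, ReLU() applies y = max(0, x) elementwise; the same result is available functionally as torch.relu(). The layer sizes in the small model below are arbitrary.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000])
print(torch.relu(x))  # same result via the functional form

# ReLU is typically placed between Linear layers, e.g. in a small classifier.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 3),
)
print(model(torch.randn(2, 4)).shape)  # torch.Size([2, 3])
```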
