activation functions
a neural network without an activation function is just a giant linear regression model, no matter how many layers. an activation function is a non-linear transformation applied to a neuron's weighted sum of inputs before it is passed on to the next layer. a quick sketch of the "just linear" claim follows.
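a minimal sketch of that claim (toy random matrices, purely illustrative): two stacked linear layers with no activation in between collapse into one linear layer.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                 # a small batch of inputs
W1 = rng.normal(size=(3, 5))                # "hidden" layer weights
W2 = rng.normal(size=(5, 2))                # "output" layer weights

two_layers = (x @ W1) @ W2                  # two linear layers, no activation
one_layer = x @ (W1 @ W2)                   # a single equivalent linear layer
print(np.allclose(two_layers, one_layer))   # True: the extra depth bought nothing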
sigmoid (logistic function)
import numpy as np
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
range is (0, 1), so the output reads directly as a probability, which is what makes it a natural fit for binary classification.
flaws:
- vanishing gradient: the derivative never exceeds 0.25 and collapses toward 0 for inputs of large magnitude (demonstrated in the sketch below).
- computationally expensive (exponential calculation).
when to use today:
- never in hidden layers.
- in the output layer for binary classification problems.
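a minimal sketch of the vanishing-gradient point, using the sigmoid defined above: the derivative tops out at 0.25, so stacking many sigmoid layers multiplies tiny factors together.

def sigmoid_grad(x):
    # derivative of sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid_grad(0.0))   # 0.25, the largest it can ever be
print(sigmoid_grad(5.0))   # ~0.0066, almost nothing for large |x|
print(0.25 ** 10)          # ~9.5e-07: the best case across ten layers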
side note: don't be surprised by how the formulas are presented, or think they're AI generated. i have pretty good experience in latex/math pdf editing.
hyperbolic tangent (tanh)
def tanh(x):
    # same as np.tanh(x), written out with numpy for consistency with the rest
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
range is (-1, 1).
improvement: zero-centered output, which leads to faster convergence than sigmoid (compared in the sketch below).
flaw:
- vanishing gradient with inputs of large magnitude.
when to use:
- sometimes in hidden layers of rnns/lstms.
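a quick sketch of the zero-centered point, reusing the sigmoid and tanh defined above: for inputs symmetric around zero, tanh outputs average near 0 while sigmoid outputs average near 0.5.

x = np.linspace(-3, 3, 7)
print(np.round(tanh(x), 3))               # symmetric around 0
print(np.round(sigmoid(x), 3))            # all positive, clustered around 0.5
print(tanh(x).mean(), sigmoid(x).mean())  # ~0.0 vs 0.5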
rectified linear unit (relu)
def relu(x):
    return np.maximum(0, x)
range is [0, ∞).
improvement:
- largely avoids the vanishing gradient problem: the derivative is 1 for x > 0, so the gradient flows through unchanged (see the sketch after this section).
- computation is cheap (just a comparison with zero, no exponentials).
flaw:
- dying relu: a neuron whose pre-activation stays negative outputs 0 for everything, gets zero gradient, and never recovers.
when to use: the default choice for hidden layers in roughly 90% of networks. if it works, don't touch it.
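a tiny sketch of why the gradient flows freely (relu_grad is my own helper here, not part of the snippets above): the subgradient is 1 for every active unit and 0 otherwise, so no shrinking factor is applied.

def relu_grad(x):
    # subgradient of relu: 1 where the unit is active, 0 where it's off
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.5, 3.0])
print(relu(x))        # [0.  0.  0.5 3. ]
print(relu_grad(x))   # [0. 0. 1. 1.]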
leaky relu
def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)
improvement: provides a small, non-zero slope (alpha) for negative inputs, so gradients keep flowing and "dead" neurons can recover.
when to use: if the "dying relu" problem occurs (check activation stats; a rough check is sketched below).
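a rough sketch of what checking activation stats can look like, using the relu and leaky_relu above on made-up pre-activations (a deliberately bad case, not real training data):

rng = np.random.default_rng(1)
pre_act = rng.normal(loc=-2.0, scale=1.0, size=10_000)   # pre-activations shifted far negative

print(np.mean(relu(pre_act) == 0))        # ~0.98: nearly every unit outputs exactly 0
print(np.mean(leaky_relu(pre_act) == 0))  # ~0.0: leaky relu keeps a small signal (and gradient) alive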
exponential linear unit (elu)
def elu(x, alpha=1.0):
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1))
gaussian error linear unit (gelu)
def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))
improvement: the de facto default activation for transformers (e.g., bert and gpt).
side note: i read my second research paper, but it was the first one i read from a learning perspective, so i'm happy about it. the formula above was also copied from a research paper.
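a small sanity check of that copied formula, assuming the exact gelu is 0.5 * x * (1 + erf(x / sqrt(2))) (math.erf is scalar-only, so plain floats here):

import math

def gelu_exact(x):
    # exact gelu for a single scalar, via the error function
    return 0.5 * x * (1 + math.erf(x / math.sqrt(2)))

for v in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(v, round(gelu_exact(v), 5), round(float(gelu(v)), 5))
# the two columns agree to about 3-4 decimal places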
swish (from google brain)
def swish(x, beta=1.0):
    return x * sigmoid(beta * x)
beta is often fixed at 1, in which case swish is the same as silu (x * sigmoid(x)).
when to use: a good drop-in alternative to relu, especially in deeper cnns (e.g., efficientnet).
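a small usage sketch with the swish above: with beta = 1 it matches silu, tracks relu for large positive inputs, and replaces the hard cutoff with a small negative dip.

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(np.round(swish(x), 4))   # [-0.0335 -0.2689  0.      0.7311  4.9665]
print(np.round(relu(x), 4))    # [0. 0. 0. 1. 5.]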
here is the link to the github md file that defines these formulas.