activation functions - day 03 of dl

activation functions

a neural network without an activation function is just a giant linear regression model, no matter how many layers you stack. an activation function is a non-linear transformation applied to the weighted sum of a neuron's inputs (the pre-activation), and that non-linearity is what lets the network learn non-linear patterns.
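
a minimal sketch of that claim (the shapes and values here are made up): two stacked linear layers with no activation in between collapse into a single linear layer.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))    # one sample with 4 features
W1 = rng.normal(size=(4, 8))   # layer 1 weights
W2 = rng.normal(size=(8, 3))   # layer 2 weights

two_layers = x @ W1 @ W2       # "deep" network without activations
one_layer = x @ (W1 @ W2)      # one equivalent linear layer

print(np.allclose(two_layers, one_layer))  # True: stacking added no expressive power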

sigmoid (logistic function)

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

range is (0, 1). perfect for binary classification.

flaws:

  • vanishing gradient: the derivative never exceeds 0.25 and shrinks toward 0 for inputs of large magnitude (see the sketch below).
  • computationally expensive (exponential calculation).
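
a quick sketch of the vanishing gradient flaw, using the standard derivative sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)); it reuses the sigmoid defined above.

def sigmoid_grad(x):
    # derivative of the sigmoid defined above
    s = sigmoid(x)
    return s * (1 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, sigmoid_grad(x))
# 0.0  -> 0.25 (the maximum possible value)
# 2.0  -> ~0.105
# 5.0  -> ~0.0066
# 10.0 -> ~0.000045

multiply a few of these per-layer factors together during backprop and the gradient reaching the early layers is basically gone.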

when to use today:

  • never in hidden layers.
  • use it in the output layer for binary classification problems.

side note: don't be surprised by how the formulas are presented or think this is AI generated. i have pretty good experience in latex/math pdf editing.

hyperbolic tangent (tanh)

def tanh(x):
    # same as np.tanh(x); written out with np.exp to mirror the formula
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

range is (-1, 1).

betterment: zero-centered, leading to faster convergence.
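
a small illustration of the zero-centered point (randomly generated data, numpy already imported above): for inputs symmetric around 0, tanh outputs average near 0, while sigmoid outputs average near 0.5.

x = np.random.default_rng(0).normal(size=100_000)
print(np.tanh(x).mean())              # close to 0
print((1 / (1 + np.exp(-x))).mean())  # close to 0.5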

flaw:

  • vanishing gradient with inputs of large magnitude.

when to use:

  • sometimes in hidden layers of rnns/lstms.

rectified linear unit (relu)

def relu(x):
    return np.maximum(0, x)

range is [0, ∞).

betterment:

  • largely solves the vanishing gradient problem: the derivative is 1 for x > 0, so gradients flow freely.
  • computation is cheap.

flaw:

  • dying relu: a neuron whose pre-activation stays negative always outputs 0, gets zero gradient, and stops learning.

when to use: the default choice for roughly 90% of hidden layers. if it works, don't touch it.
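
a quick look at the gradient, which is behind both the betterment and the flaw above: it is 1 for positive inputs and 0 everywhere else.

def relu_grad(x):
    # 1 where the input is positive, 0 elsewhere (0 at x = 0 by convention)
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu_grad(x))  # [0. 0. 0. 1. 1.]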

leaky relu

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

betterment: provides a small, non-zero slope for negative inputs, so neurons that would otherwise die keep receiving a gradient and can recover.

when to use: if the "dying relu" problem occurs (check activation stats).
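
a rough sketch of what "check activation stats" can mean: look at the fraction of units that output exactly 0 for every sample in a batch. the pre-activation values below are made up and shifted very negative on purpose; in a real model you would grab them from a hidden layer.

pre_acts = np.random.default_rng(0).normal(loc=-3.0, size=(256, 128))  # 256 samples, 128 units
acts = np.maximum(0, pre_acts)

dead_fraction = (acts == 0).all(axis=0).mean()  # units stuck at 0 for every sample
print(f"fraction of dead units: {dead_fraction:.2%}")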

exponential linear unit (elu)

def elu(x, alpha=1.0):
    # identity for x >= 0, alpha * (exp(x) - 1) for x < 0
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1))
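
elu passes positive inputs through unchanged; negative inputs follow a smooth exponential curve that saturates at -alpha, so the output range is (-alpha, ∞). a quick check:

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(elu(x))  # approx [-0.99995, -0.63212, 0., 1., 10.]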

gaussian error linear unit (gelu)

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

betterment: the default activation in state-of-the-art transformer architectures.

side note: i read my second research paper, but it was the first one i read from a learning perspective, so i'm happy about that. the formula above was also copied from a research paper.
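
for reference, the code above is the tanh approximation from the paper; the exact definition is x * Φ(x), where Φ is the standard normal cdf. a quick check that the approximation stays close (this assumes scipy is available for erf):

from scipy.special import erf

def gelu_exact(x):
    # exact gelu: x times the standard normal cdf
    return 0.5 * x * (1 + erf(x / np.sqrt(2)))

x = np.linspace(-5, 5, 101)
print(np.max(np.abs(gelu_exact(x) - gelu(x))))  # small, well under 1e-2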

swish (from google brain)

def swish(x, beta=1.0):
    return x * sigmoid(beta * x)

beta is often set to 1 (swish with beta = 1 is also known as silu).
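
a quick look at what beta does (using the swish defined above): with a very large beta, swish approaches relu; as beta goes toward 0 it approaches the linear function x / 2.

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(swish(x, beta=50.0))   # close to relu(x): roughly [0, 0, 0, 1, 4]
print(swish(x, beta=1e-6))   # close to x / 2:  roughly [-2, -0.5, 0, 0.5, 2]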

when to use: a good alternative to relu; it has shown gains on deep cnn tasks.

here is the link to the github md file which defines these formulas.
