ReLU is the most popular activation function in deep learning because it’s extremely simple and helps networks train much faster.
What it does (main point):
- Positive input? → Passes it exactly as is to the next layer
- Negative or zero input? → Outputs 0 (blocks it, nothing passes)
ReLU(x) = max(0, x)
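The whole function really is just that one-liner. A minimal Python sketch (the function name `relu` is my own choice here):

```python
def relu(x):
    # Positive input passes through unchanged; negative or zero becomes 0
    return max(0.0, x)

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5, 3.0]])  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

In practice libraries apply this element-wise over whole tensors, but the rule per value is exactly this.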
Why this makes AI learn fast:
- No "squashing" like old functions (Sigmoid/Tanh) → gradients don’t vanish
- Very fast to compute (just check if > 0)
- Many neurons turn off (output 0) → less work, faster training
Quick comparison:
- Sigmoid → slow, vanishing gradient
- Tanh → zero-centered, so a bit better, but it still saturates → still slow
- ReLU → fast, no vanishing gradient (for positive values)
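You can see the vanishing-gradient difference with plain arithmetic. Sigmoid’s derivative is σ(x)·(1 − σ(x)), which peaks at 0.25, so chaining it through many layers multiplies small numbers together; ReLU’s derivative is exactly 1 for any positive input. A quick sketch (the 10-layer depth is just an illustrative assumption):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Sigmoid's gradient sigma(x) * (1 - sigma(x)) is at most 0.25.
# Chained through 10 layers, the best-case product is already tiny:
print(0.25 ** 10)  # 9.5367431640625e-07 → the gradient "vanishes"

# ReLU's gradient is exactly 1 for positive inputs,
# so the chained product stays 1 and the learning signal survives:
print(1.0 ** 10)   # 1.0
```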
That's why most modern neural networks use ReLU (or a close relative like GELU, common in Transformers) by default.
One small issue: Sometimes neurons "die" (always output 0 and stop learning).
Solution: Use Leaky ReLU or similar if needed.
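Leaky ReLU fixes dying neurons with one tiny change: negative inputs get a small slope instead of a hard 0, so the neuron still gets a gradient and can recover. A minimal sketch (alpha = 0.01 is a commonly used default, but it’s tunable):

```python
def leaky_relu(x, alpha=0.01):
    # Positive input: unchanged, same as ReLU.
    # Negative input: scaled by a small slope instead of zeroed out,
    # so the gradient is alpha (not 0) and the neuron keeps learning.
    return x if x > 0 else alpha * x

print(leaky_relu(3.0))   # 3.0
print(leaky_relu(-2.0))  # -0.02
```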
Main thing in one line:
ReLU lets only positive signals pass through fully and blocks negative ones → this simple rule makes deep learning train fast and powerful.