Devanshu Biswas

Activation Functions: Why a 100-Layer Network Without Them Is Still One Line

A neuron computes w·x + b, which is just a straight line (strictly, an affine function). The little function applied after it, the activation, is what lets a network learn anything more than that. Day 2 of my DeepLearningFromZero series.

The problem: linear ∘ linear = linear

Stack two linear layers and the math collapses (biases left out for brevity; they fold into a single bias the same way):

layer2(layer1(x)) = W₂(W₁x) = (W₂W₁)x   ← still one linear layer

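So a 100-layer network of purely linear neurons is equivalent to a single linear layer: as a classifier it can only ever draw a flat decision boundary (a line in 2D, a hyperplane in general). That is not enough for images, language, or anything with curves.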
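Here is a quick numeric check of that collapse. It is only a sketch: the helpers `matVec` and `matMul` and the weight values are made up for this example and are not part of the series code.

```javascript
// Multiply a matrix (array of rows) by a vector.
const matVec = (W, x) => W.map(row => row.reduce((s, w, i) => s + w * x[i], 0));

// Multiply two matrices: (A·B)[i][j] = sum over k of A[i][k] * B[k][j].
const matMul = (A, B) =>
  A.map(row => B[0].map((_, j) => row.reduce((s, a, k) => s + a * B[k][j], 0)));

// Illustrative weights and input (arbitrary values).
const W1 = [[1, 2], [3, 4]];
const W2 = [[0, 1], [-1, 2]];
const x = [5, -3];

const twoLayers = matVec(W2, matVec(W1, x)); // layer2(layer1(x))
const oneLayer = matVec(matMul(W2, W1), x);  // (W₂W₁)x

console.log(twoLayers, oneLayer); // both are [3, 7]
```

Whatever weights you pick, the two results match, because the product W₂W₁ can be computed once up front and used as a single layer.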

The fix: a nonlinear bend

const a = relu(dot(w, x) + b);   // relu = the bend

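Now each layer warps space a little, and stacking layers composes those warps into complex shapes. The nonlinearity is what makes the universal approximation results possible: with enough hidden units, a network can approximate any continuous function on a bounded domain as closely as you like.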
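To make the bend concrete, here is a tiny hand-wired sketch. The names `absNet` and `linearNet` and the weights +1 and -1 are my own picks for illustration. Two ReLU neurons are enough to build |x|, a V shape that no stack of linear layers can produce.

```javascript
const relu = z => Math.max(0, z);

// Two hidden ReLU neurons (weights +1 and -1, bias 0), summed by a linear output:
// absNet(x) = relu(x) + relu(-x) = |x|
const absNet = x => relu(1 * x) + relu(-1 * x);

// Same wiring with the activation removed: the two paths cancel out.
const linearNet = x => (1 * x) + (-1 * x);

const inputs = [-2, -1, 0, 1, 2];
console.log(inputs.map(absNet));    // [2, 1, 0, 1, 2]
console.log(inputs.map(linearNet)); // [0, 0, 0, 0, 0]
```

Same weights, same wiring; the only difference is the bend.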

The functions

const relu    = z => Math.max(0, z);          // default for hidden layers
const sigmoid = z => 1 / (1 + Math.exp(-z));   // (0,1) — output probabilities
const tanh    = z => Math.tanh(z);             // (−1,1) — zero-centred
const leaky   = z => z > 0 ? z : 0.01 * z;     // no "dead" neurons

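ReLU is the modern default: it is cheap, and its gradient is exactly 1 for positive inputs, so it doesn't saturate there. Sigmoid's gradient never exceeds 0.25 and shrinks toward 0 for large |z|, so multiplying it across many layers makes the learning signal vanish. That's a big part of why very deep networks became trainable.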
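A rough way to see the difference is to multiply the activation's derivative across several layers. This sketch ignores the weights entirely, and `chain` is a hypothetical helper, so treat it as intuition rather than a real backprop pass.

```javascript
const sigmoid = z => 1 / (1 + Math.exp(-z));

// Derivatives of the two activations.
const dSigmoid = z => sigmoid(z) * (1 - sigmoid(z)); // peaks at 0.25 when z = 0
const dRelu = z => (z > 0 ? 1 : 0);

// Product of `depth` identical activation derivatives (weights ignored).
const chain = (dAct, z, depth) => Math.pow(dAct(z), depth);

console.log(chain(dSigmoid, 0, 10)); // 0.25^10 ≈ 9.5e-7, and that is sigmoid's best case
console.log(chain(dRelu, 1, 10));    // 1
```

Ten layers in, sigmoid has already shrunk the signal by about a million even at its most favourable input, while ReLU passes it through untouched on the positive side.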

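Leaky ReLU addresses the "dying ReLU" problem: a neuron whose pre-activation is negative for every input outputs 0 and gets zero gradient, so its weights stop updating and it is unlikely to recover. A tiny negative slope keeps a trickle of gradient flowing.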
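As a small sketch of that trickle, compare the two derivatives on a neuron that is negative on every sample. The `preActivations` values are invented for the example.

```javascript
// Derivatives with respect to the pre-activation z.
const dRelu = z => (z > 0 ? 1 : 0);
const dLeaky = z => (z > 0 ? 1 : 0.01);

// Hypothetical neuron whose pre-activation is negative on every sample.
const preActivations = [-3.2, -0.7, -1.5];

const reluGrads = preActivations.map(dRelu);
const leakyGrads = preActivations.map(dLeaky);

console.log(reluGrads);  // [0, 0, 0]          -> no update signal at all
console.log(leakyGrads); // [0.01, 0.01, 0.01] -> small, but not zero
```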

How to choose

  • Hidden layers → ReLU (or Leaky ReLU)
  • Binary output → sigmoid
  • Multi-class output → softmax
  • Regression output → none (linear)

That covers the large majority of networks you'll build.
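Softmax is the one function in that list not defined earlier, so here is a minimal sketch of it. Subtracting the largest logit before exponentiating is a common stability trick that I am adding here; it does not change the result.

```javascript
// Softmax: turn a vector of raw scores (logits) into probabilities that sum to 1.
const softmax = logits => {
  const max = Math.max(...logits);                 // shift so the largest exponent is 0
  const exps = logits.map(z => Math.exp(z - max)); // every value is now in (0, 1]
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
};

const probs = softmax([2, 1, 0]);
console.log(probs); // roughly [0.665, 0.245, 0.090]
```

The largest logit gets the largest probability, and the outputs always sum to 1, which is what a multi-class output layer needs.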

📐 Drag the input along each curve: https://dev48v.infy.uk/dl/day2-activations.html

Day 2 of DeepLearningFromZero.
