Sachin Kr. Rajput

The Sigmoid Function: The Story of the World's Most Diplomatic Mathematician

The One-Line Summary: The sigmoid function transforms any number from negative infinity to positive infinity into a probability between 0 and 1, doing so smoothly, symmetrically, and with a mathematically convenient derivative — making it perfect for converting linear predictions into probabilities.


Act I: The Kingdom of Infinite Predictions

Once upon a time, in the Kingdom of Predictionia, there lived a Royal Oracle named Linear.

Oracle Linear was brilliant at seeing patterns. Give her data about a person — their age, income, behavior — and she would proclaim a number representing how likely they were to buy the King's magical potions.

THE ORACLE'S PROCLAMATIONS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Citizen Alice: "Her buying score is +2.3"
Citizen Bob:   "His buying score is -1.7"
Citizen Carol: "Her buying score is +15.8"
Citizen Dave:  "His buying score is -847.2"

The King was confused.

"Oracle Linear," he said, "what does +15.8 mean? Is Carol 15.8% likely to buy? Or 158% likely? And Dave... is he NEGATIVE likely to buy? What does that even mean?!"

Oracle Linear shrugged. "I just find patterns, Your Majesty. I never promised my numbers would make sense as probabilities."

The Kingdom had a problem.


Act II: The Failed Solutions

The King summoned his advisors to solve the probability problem.


Advisor #1: Sir Clip-a-Lot

SIR CLIP-A-LOT'S SOLUTION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

"Simple! If the number is below 0, call it 0.
 If it's above 1, call it 1.
 Clip the extremes!"

Score: -847.2 → Probability: 0.0
Score: -1.7   → Probability: 0.0
Score: +0.3   → Probability: 0.3
Score: +2.3   → Probability: 1.0
Score: +15.8  → Probability: 1.0

The King frowned. "But this means Bob with -1.7 and Dave with -847.2 both get 0% probability? Surely Bob is MORE likely to buy than Dave!"

Sir Clip-a-Lot's solution lost information at the extremes.


Advisor #2: Lady Linear-Scale

LADY LINEAR-SCALE'S SOLUTION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

"Let's linearly scale everything between the 
 minimum and maximum we've seen!"

Scores: -847.2, -1.7, +0.3, +2.3, +15.8
Min: -847.2, Max: +15.8
Range: 863

Scaled:
  -847.2 → 0.00
  -1.7   → 0.98  (squeezed near the top by the outlier!)
  +0.3   → 0.98
  +2.3   → 0.98
  +15.8  → 1.00

The King was furious. "Now everyone except Dave looks identical! One extreme outlier ruined everything!"

Lady Linear-Scale's solution was too sensitive to outliers.


Advisor #3: Duke Threshold

DUKE THRESHOLD'S SOLUTION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

"Forget probabilities. Just say YES or NO.
 Above 0? YES. Below 0? NO."

Score: -847.2 → NO  (0)
Score: -1.7   → NO  (0)
Score: +0.3   → YES (1)
Score: +2.3   → YES (1)
Score: +15.8  → YES (1)

The King sighed. "But I don't want just YES or NO. I want to KNOW how confident we are! Is +0.3 the same as +15.8? Clearly not!"

Duke Threshold's solution destroyed all nuance.


Act III: The Mysterious Mathematician

One day, a mysterious mathematician arrived at the castle. She introduced herself only as σ (Sigma).

"I hear you need to convert any number into a probability," she said softly. "I can help. But I must warn you — I never say 'absolutely certain' or 'absolutely impossible.' I deal only in shades of likelihood."

The King was intrigued. "Show me."

σ smiled and drew a beautiful S-curve:

THE SIGMOID FUNCTION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

                σ(z) = 1 / (1 + e^(-z))

  1.0 │                         ●●●●●●●●●●●●●●●●
      │                      ●●●
      │                    ●●
  0.8 │                  ●●
      │                 ●
      │                ●
  0.6 │               ●
      │              ●
  0.5 │─────────────●─────────────────────────────
      │            ●
  0.4 │           ●
      │          ●
  0.2 │        ●●
      │      ●●
      │   ●●●
  0.0 │●●●
      └───────────────────────────────────────────
       -6  -4  -2   0   2   4   6   8  10
                        z

"No matter what number you give me," said σ,
"I will return a probability between 0 and 1.
 Always. Without exception. Forever."

Act IV: The Five Promises of Sigma

σ made five promises to the King:


Promise #1: "I Will Always Give Valid Probabilities"

σ'S FIRST PROMISE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

"Give me ANY number — positive, negative, huge, tiny.
 I will ALWAYS return something between 0 and 1."

Input               Output
─────────────────────────────
-1,000,000    →     0.0000...  (very close to 0)
-10           →     0.0000454
-2            →     0.119
0             →     0.500
+2            →     0.881
+10           →     0.9999546
+1,000,000    →     0.9999...  (very close to 1)

"But notice — I never actually SAY 0 or 1.
 I approach them infinitely, but never touch.
 There is always a sliver of doubt."
import numpy as np

def sigmoid(z):
    """The sigmoid function: maps any real z into (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Test with extreme values.
# (For z = -1000000, np.exp(-z) overflows and emits a RuntimeWarning,
#  but the division still returns 0.0; a stable variant follows below.)
test_values = [-1000000, -10, -2, 0, 2, 10, 1000000]

print("PROMISE #1: Always between 0 and 1")
print("="*50)
for z in test_values:
    p = sigmoid(z)
    print(f"σ({z:>10}) = {p:.10f}")

Output:

PROMISE #1: Always between 0 and 1
==================================================
σ(  -1000000) = 0.0000000000
σ(       -10) = 0.0000453979
σ(        -2) = 0.1192029220
σ(         0) = 0.5000000000
σ(         2) = 0.8807970780
σ(        10) = 0.9999546021
σ(   1000000) = 1.0000000000
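
A quick aside, not part of σ's tale: for very negative z, the naive formula overflows np.exp (you'll see a RuntimeWarning). Below is a minimal sketch of a numerically stable variant; the stable_sigmoid name and the sign-branching trick are my own illustration, and in practice scipy.special.expit does the same job.

import numpy as np

def stable_sigmoid(z):
    """Numerically stable sigmoid: never exponentiates a large positive number.

    For z >= 0:  1 / (1 + e^(-z))   (e^(-z) <= 1, cannot overflow)
    For z <  0:  e^z / (1 + e^z)    (e^z  <= 1, cannot overflow)
    Both branches are algebraically equal to σ(z).
    """
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1 / (1 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1 + ez)
    return out

print(stable_sigmoid(np.array([-1000000, 0, 1000000])))  # [0.  0.5 1. ], no warnings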

Promise #2: "I Am Perfectly Balanced"

σ'S SECOND PROMISE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

"I am symmetric around the center.
 Whatever I do to positive numbers,
 I do the mirror opposite to negative numbers."

σ(0) = 0.5     (exactly in the middle!)

σ(-2) = 0.119  
σ(+2) = 0.881  → These sum to 1.0!

σ(-5) = 0.0067
σ(+5) = 0.9933 → These sum to 1.0!

The mathematical beauty:
σ(-z) = 1 - σ(z)
print("PROMISE #2: Perfect symmetry")
print("="*50)
for z in [1, 2, 3, 5, 10]:
    pos = sigmoid(z)
    neg = sigmoid(-z)
    print(f"σ({z}) = {pos:.6f}, σ({-z}) = {neg:.6f}, Sum = {pos + neg:.6f}")

Output:

PROMISE #2: Perfect symmetry
==================================================
σ(1) = 0.731059, σ(-1) = 0.268941, Sum = 1.000000
σ(2) = 0.880797, σ(-2) = 0.119203, Sum = 1.000000
σ(3) = 0.952574, σ(-3) = 0.047426, Sum = 1.000000
σ(5) = 0.993307, σ(-5) = 0.006693, Sum = 1.000000
σ(10) = 0.999955, σ(-10) = 0.000045, Sum = 1.000000

Promise #3: "I Transition Smoothly"

σ'S THIRD PROMISE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

"Unlike Duke Threshold who jumps abruptly from 0 to 1,
 I transition gently. Small changes in input cause
 small changes in output. No surprises."


DUKE THRESHOLD (step function):

  1 │          ┌────────────
    │          │
  0 │──────────┘
    └─────────────────────────
              0

σ (sigmoid function):

  1 │              ●●●●●●●●●
    │           ●●●
    │         ●●
    │        ●
    │       ●
  0 │●●●●●●●
    └─────────────────────────
              0

"I am differentiable everywhere — 
 which means I play nicely with calculus,
 which means I can be optimized with gradient descent!"
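
To make "no surprises" concrete, here's a tiny sketch of my own (reusing the sigmoid defined earlier) that nudges the input a hair on either side of zero and watches both functions react:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def step(z):
    # Duke Threshold's rule: hard jump at z = 0
    return 1.0 if z >= 0 else 0.0

# Nudge the input slightly on each side of zero
for z in (-0.01, 0.01):
    print(f"z = {z:+.2f}   step = {step(z):.1f}   sigmoid = {sigmoid(z):.6f}")

Between those two inputs the step function jumps a full 1.0, while the sigmoid moves by only about 0.005.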

Promise #4: "My Derivative Is Beautiful"

σ'S FOURTH PROMISE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

"If you ever need to know how fast I'm changing
 (my derivative), it's elegantly simple:

 σ'(z) = σ(z) × (1 - σ(z))

 I can compute my own derivative using just my output!
 No complicated math needed."

z       σ(z)      σ'(z) = σ(z)×(1-σ(z))
──────────────────────────────────────────
-3      0.047     0.045   (slow change)
-1      0.269     0.197   (medium change)
 0      0.500     0.250   (fastest change!)
 1      0.731     0.197   (medium change)
 3      0.953     0.045   (slow change)

"I change fastest at z=0 (where uncertainty is highest)
 and slowest at the extremes (where I'm already confident)."
def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)

print("PROMISE #4: Beautiful derivative")
print("="*50)
print(f"{'z':<8} {'σ(z)':<12} {'σ´(z)':<12}")
print("-"*32)
for z in [-3, -2, -1, 0, 1, 2, 3]:
    s = sigmoid(z)
    ds = sigmoid_derivative(z)
    print(f"{z:<8} {s:<12.6f} {ds:<12.6f}")

Output:

PROMISE #4: Beautiful derivative
==================================================
z        σ(z)         σ´(z)       
--------------------------------
-3       0.047426     0.045177    
-2       0.119203     0.104994    
-1       0.268941     0.196612    
0        0.500000     0.250000    
1        0.731059     0.196612    
2        0.880797     0.104994    
3        0.952574     0.045177    

Promise #5: "I Represent Log-Odds Linearly"

σ'S FIFTH PROMISE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

"Here's my deepest secret. If p = σ(z), then:

 z = ln(p / (1-p))

 This means z is the LOG-ODDS!

 And the log-odds is a LINEAR function of features.
 So underneath my curved exterior, I'm working with
 good old linear regression — just on a different scale."


If σ(z) = 0.9, what is z?
  z = ln(0.9 / 0.1) = ln(9) = 2.197

Check: σ(2.197) = 0.9 ✓


This is why logistic regression is called "regression"!
The log-odds (z) is being regressed linearly.
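
The round trip is easy to verify in code. A small sketch of my own; logit is the standard name for the inverse sigmoid:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logit(p):
    """Inverse of the sigmoid: recovers the log-odds z from a probability p."""
    return np.log(p / (1 - p))

p = 0.9
z = logit(p)                                   # ln(0.9 / 0.1) = ln(9) ≈ 2.1972
print(f"logit({p}) = {z:.4f}")                 # 2.1972
print(f"sigmoid({z:.4f}) = {sigmoid(z):.4f}")  # 0.9000 — round trip complete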

Act V: Why the Kingdom Chose Sigma

The King was convinced. Here's why σ was perfect:

THE KING'S SUMMARY:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Problem                          σ's Solution
─────────────────────────────────────────────────────
Linear outputs go -∞ to +∞      → Squished to (0,1)
Need valid probabilities        → Always 0 < p < 1
Need smooth transitions         → Infinitely differentiable
Need to optimize with calculus  → Simple derivative: σ(1-σ)
Need symmetric behavior         → σ(-z) = 1 - σ(z)
Need interpretable model        → Log-odds is linear
Need efficient computation      → Just exp() and division

And so, σ the Sigmoid became the Royal Probability Converter, and the Kingdom of Predictionia prospered with sensible predictions forevermore.


The Mathematical Definition

THE SIGMOID FUNCTION:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

             1
σ(z) = ─────────────
        1 + e^(-z)


WHERE:
• z is any real number (the input)
• e is Euler's number (≈ 2.71828)
• σ(z) is always between 0 and 1 (the output)


ALTERNATIVE FORMS:

         e^z
σ(z) = ───────     (multiply top and bottom by e^z)
       1 + e^z


        1
σ(z) = ─ (1 + tanh(z/2))    (relationship to tanh)
        2
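
Both alternative forms can be checked numerically. A quick sketch of my own, reusing the sigmoid definition from earlier:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.linspace(-5, 5, 101)
form_exp  = np.exp(z) / (1 + np.exp(z))   # top and bottom multiplied by e^z
form_tanh = 0.5 * (1 + np.tanh(z / 2))    # via the tanh relationship

print(np.allclose(sigmoid(z), form_exp))   # True
print(np.allclose(sigmoid(z), form_tanh))  # True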

Code: The Complete Sigmoid

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    """The sigmoid function."""
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of sigmoid."""
    s = sigmoid(z)
    return s * (1 - s)

def inverse_sigmoid(p):
    """Inverse sigmoid (logit function)."""
    return np.log(p / (1 - p))

# Demonstrate all properties
print("THE SIGMOID FUNCTION: COMPLETE DEMONSTRATION")
print("="*60)

# Property 1: Always between 0 and 1
print("\n1. BOUNDED OUTPUT (always between 0 and 1):")
extreme_inputs = [-100, -10, -1, 0, 1, 10, 100]
for z in extreme_inputs:
    print(f"   σ({z:>4}) = {sigmoid(z):.10f}")

# Property 2: Symmetry
print("\n2. SYMMETRY (σ(-z) = 1 - σ(z)):")
for z in [1, 2, 5]:
    print(f"   σ({z}) + σ({-z}) = {sigmoid(z):.6f} + {sigmoid(-z):.6f} = {sigmoid(z) + sigmoid(-z):.6f}")

# Property 3: Center point
print("\n3. CENTER POINT:")
print(f"   σ(0) = {sigmoid(0)} (exactly 0.5)")

# Property 4: Derivative
print("\n4. DERIVATIVE (σ'(z) = σ(z) × (1-σ(z))):")
print(f"   Maximum derivative at z=0: σ'(0) = {sigmoid_derivative(0)}")

# Property 5: Inverse
print("\n5. INVERSE (logit function):")
for p in [0.1, 0.5, 0.9]:
    z = inverse_sigmoid(p)
    print(f"   If σ(z) = {p}, then z = {z:.4f}")

Why Sigmoid Over Other Options?

WHY NOT OTHER "SQUISHING" FUNCTIONS?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

OPTION 1: Step Function
         ┌── 1 if z ≥ 0
f(z) = ──┤
         └── 0 if z < 0

❌ Not differentiable (can't use gradient descent)
❌ No nuance (just 0 or 1)


OPTION 2: Linear Clipping
         ┌── 0   if z < 0
f(z) = ──┼── z   if 0 ≤ z ≤ 1  
         └── 1   if z > 1

❌ Not smooth (kinks at 0 and 1)
❌ Derivative is 0 outside [0,1] (vanishing gradient)


OPTION 3: Tanh (Hyperbolic Tangent)
f(z) = (e^z - e^(-z)) / (e^z + e^(-z))

Range: -1 to +1 (not 0 to 1!)
✓ Smooth and differentiable
⚠️ Needs rescaling for probabilities


OPTION 4: Sigmoid ✓
f(z) = 1 / (1 + e^(-z))

✓ Range exactly 0 to 1 (perfect for probabilities)
✓ Smooth and differentiable everywhere
✓ Simple, elegant derivative
✓ Natural probabilistic interpretation (log-odds)
✓ Computationally efficient
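
For a side-by-side feel, here's a sketch of my own that runs the Kingdom's five scores through all four options:

import numpy as np

def sigmoid(z):
    # e^847.2 overflows with a warning; the result still comes out as 0.0
    with np.errstate(over="ignore"):
        return 1 / (1 + np.exp(-z))

scores = np.array([-847.2, -1.7, 0.3, 2.3, 15.8])  # the five citizens

print("step:   ", np.where(scores >= 0, 1.0, 0.0))  # Duke Threshold: all nuance gone
print("clip:   ", np.clip(scores, 0.0, 1.0))        # Sir Clip-a-Lot: extremes collapse
print("tanh:   ", np.round(np.tanh(scores), 4))     # wrong range: (-1, 1)
print("sigmoid:", np.round(sigmoid(scores), 4))     # valid, correctly ordered probabilities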

The Sigmoid Family Portrait

THE SIGMOID AND ITS RELATIVES:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

SIGMOID (Logistic):    σ(z) = 1/(1+e^(-z))
Range: (0, 1)
Use: Binary classification, output layer

TANH:                  tanh(z) = (e^z - e^(-z))/(e^z + e^(-z))
Range: (-1, 1)
Use: Hidden layers (zero-centered)
Relationship: tanh(z) = 2σ(2z) - 1

SOFTMAX:               softmax(zᵢ) = e^(zᵢ) / Σe^(zⱼ)
Range: (0, 1) for each, sum to 1
Use: Multi-class classification
Relationship: Sigmoid is softmax for 2 classes!


THEY'RE ALL RELATED:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

sigmoid(z) = (1 + tanh(z/2)) / 2

softmax([z, 0]) = [sigmoid(z), sigmoid(-z)]
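
All three identities can be confirmed in a few lines. A sketch of my own; the softmax helper is just the standard definition with a max-shift for numerical stability:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = 1.7  # arbitrary test value
print(np.isclose(np.tanh(z), 2 * sigmoid(2 * z) - 1))    # True
print(np.isclose(sigmoid(z), (1 + np.tanh(z / 2)) / 2))  # True
print(np.allclose(softmax(np.array([z, 0.0])),
                  [sigmoid(z), sigmoid(-z)]))            # True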

When Sigmoid Struggles

Even our hero σ has weaknesses:

THE VANISHING GRADIENT PROBLEM:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

When z is very large or very small:
σ'(z) ≈ 0

z = -10:  σ(-10) = 0.0000454, σ'(-10) = 0.0000454
z = +10:  σ(+10) = 0.9999546, σ'(+10) = 0.0000454

The gradient is essentially ZERO!

In deep neural networks, this means:
• Gradients shrink exponentially through layers
• Weights stop updating
• Learning grinds to a halt

THIS IS WHY RELU REPLACED SIGMOID IN HIDDEN LAYERS:
ReLU(z) = max(0, z)
• Gradient is 1 for positive inputs
• No vanishing gradient problem


BUT SIGMOID IS STILL PERFECT FOR:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✓ Output layer for binary classification
✓ Gates in LSTM/GRU (need 0-1 range)
✓ Logistic regression
✓ Any time you need a probability output
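
To see the exponential shrink in numbers, here's a back-of-the-envelope sketch of my own, using only the fact that σ'(z) never exceeds 0.25:

# σ'(z) peaks at 0.25 (at z = 0), so backpropagating through n sigmoid
# layers multiplies the gradient by at most 0.25 per layer: an exponential
# shrink. ReLU's gradient of 1 for positive inputs avoids this entirely.
for n_layers in (1, 5, 10, 20):
    print(f"{n_layers:>2} sigmoid layers: gradient factor <= {0.25 ** n_layers:.2e}")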

Quick Reference Card

THE SIGMOID FUNCTION: QUICK REFERENCE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

FORMULA:        σ(z) = 1 / (1 + e^(-z))

DOMAIN:         All real numbers (-∞, +∞)

RANGE:          (0, 1) — perfect for probabilities!

CENTER:         σ(0) = 0.5

SYMMETRY:       σ(-z) = 1 - σ(z)

DERIVATIVE:     σ'(z) = σ(z) × (1 - σ(z))
                Maximum at z=0, where σ'(0) = 0.25

INVERSE:        z = ln(p / (1-p))    [logit function]

LIMITS:         lim(z→-∞) σ(z) = 0
                lim(z→+∞) σ(z) = 1

SHAPE:          S-curve (hence "sigmoid" = S-shaped)

USE CASES:      • Logistic regression output
                • Neural network output for binary classification
                • LSTM/GRU gates
                • Any probability conversion

WEAKNESS:       Vanishing gradient for extreme inputs
                (don't use in hidden layers of deep networks)

The Story's Moral

THE MORAL OF THE STORY:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The world is full of unbounded quantities:
• Scores that go from -∞ to +∞
• Sums that can be any real number
• Linear combinations without limits

But probabilities must live in [0, 1].

The sigmoid function is the PERFECT TRANSLATOR:
• Takes any real number
• Returns a valid probability
• Does so smoothly and elegantly
• Has beautiful mathematical properties

She never says "impossible" (0) or "certain" (1).
She always leaves room for doubt.
And that humility is what makes her perfect.


In the words of σ herself:
"I transform infinity into certainty,
 yet I never claim to be certain myself."

Key Takeaways

  1. Sigmoid squishes (-∞, +∞) to (0, 1) — Any input becomes a valid probability

  2. σ(z) = 1/(1+e⁻ᶻ) — Simple formula, profound implications

  3. Symmetric around 0.5 — σ(-z) = 1 - σ(z)

  4. Beautiful derivative — σ'(z) = σ(z)(1-σ(z)), computed from output alone

  5. Represents log-odds linearly — Why logistic regression works

  6. Perfect for output layers — When you need probability output

  7. Avoid in hidden layers — Vanishing gradient problem; use ReLU instead

  8. Never touches 0 or 1 — Always maintains a sliver of uncertainty


The One-Sentence Summary

The sigmoid function is the diplomatic mathematician who takes any number from negative infinity to positive infinity and transforms it into a probability between 0 and 1, doing so smoothly, symmetrically, and with a derivative so elegant (σ times 1-σ) that it makes calculus weep with joy — which is why it's the perfect function for converting linear predictions into the probabilities we need for classification.


What's Next?

Now that you understand the sigmoid, explore:

  • Softmax Function — Sigmoid's multiclass cousin
  • Activation Functions — ReLU, Tanh, and beyond
  • Vanishing Gradients — Why deep networks struggled
  • Cross-Entropy Loss — The perfect partner for sigmoid

Follow me for the next article in this series!


Let's Connect!

If the story of σ made the sigmoid click, drop a heart!

Questions? Ask in the comments — I read and respond to every one.

What's your favorite mathematical function? Mine is now sigmoid — she's humble, elegant, and transforms chaos into probability! 🎭


Once upon a time, Oracle Linear gave predictions of +847 and -352, and the King didn't know what to do. Then σ arrived and said, "Let me translate those into probabilities: one vanishingly close to 100%, the other vanishingly close to 0%." And the Kingdom finally had numbers that made sense.


Share this with someone who finds the sigmoid mysterious. After meeting σ, they'll never forget her.

Happy probability converting! 📊
