🤯 Why You Should Care
Neural networks (NNs) power everything from ChatGPT to self-driving cars. But let’s be honest: Using TensorFlow/PyTorch feels like magic—until you realize you don’t know how the wand works.
This post is for you if:
🧠 You want to demystify neural networks (no more black boxes!).
💻 You love coding fundamentals (goodbye model.fit(), hello raw matrices!).
⚡ You crave the satisfaction of "I built this myself!"
Spoiler: By the end, you’ll code a NN that classifies handwritten digits (MNIST) with 90%+ accuracy—using only numpy. Let’s go!
🔥 The Blueprint: How Neural Nets Actually Work
Here’s what we’ll implement:
- Layers: Input → Hidden → Output, with weights and biases (matrix shapes sketched right after this list).
- Activation Functions: ReLU (hidden layer) and Softmax (output).
- Loss: Cross-entropy (because we’re classifying digits).
- Backpropagation: Calculus + chain rule (don’t panic—numpy does the heavy lifting).
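Before writing any code, it helps to pin down the matrix shapes for a batch of m images, using the 784-128-10 layer sizes chosen in the training loop later in the post:

# X  : (784, m)   flattened 28x28 pixels, one column per image
# W1 : (128, 784), b1 : (128, 1)  ->  Z1 = W1 @ X + b1,  A1 = relu(Z1)     -> (128, m)
# W2 : (10, 128),  b2 : (10, 1)   ->  Z2 = W2 @ A1 + b2, A2 = softmax(Z2)  -> (10, m)
# Y  : (10, m)    one-hot labels; loss = cross_entropy_loss(A2, Y)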
💻 Step 1: Coding the Neural Network
1. Initialize Parameters
import numpy as np

def initialize_parameters(input_size, hidden_size, output_size):
    W1 = np.random.randn(hidden_size, input_size) * 0.01
    b1 = np.zeros((hidden_size, 1))
    W2 = np.random.randn(output_size, hidden_size) * 0.01
    b2 = np.zeros((output_size, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
Why? Tiny random weights break the symmetry between hidden units (if every weight started out identical, every unit would compute and learn the same thing). Biases can safely start at zero.
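A quick shape check, using the same 784/128/10 sizes the training loop uses later:

params = initialize_parameters(784, 128, 10)
print(params["W1"].shape, params["b1"].shape)  # (128, 784) (128, 1)
print(params["W2"].shape, params["b2"].shape)  # (10, 128) (10, 1)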
2. Forward Propagation
def relu(Z):
    return np.maximum(0, Z)

def softmax(Z):
    exp = np.exp(Z - np.max(Z, axis=0, keepdims=True))  # subtract the per-column max for numerical stability
    return exp / np.sum(exp, axis=0, keepdims=True)

def forward(X, params):
    Z1 = params["W1"] @ X + params["b1"]   # (hidden_size, m)
    A1 = relu(Z1)
    Z2 = params["W2"] @ A1 + params["b2"]  # (output_size, m)
    A2 = softmax(Z2)                       # column-wise class probabilities
    return A2, (Z1, A1, Z2)
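A minimal smoke test (the batch here is just random noise, not real MNIST data): every column of the softmax output should sum to 1.

X_fake = np.random.randn(784, 5)              # 5 fake "images"
params = initialize_parameters(784, 128, 10)
A2, cache = forward(X_fake, params)
print(A2.shape)                               # (10, 5): one probability column per image
print(A2.sum(axis=0))                         # each column sums to ~1.0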
3. Compute Loss
def cross_entropy_loss(A2, Y):
    m = Y.shape[1]                    # number of examples in the batch
    eps = 1e-12                       # guard against log(0)
    log_probs = np.log(A2 + eps) * Y  # Y is one-hot, so only the true-class term survives
    return -np.sum(log_probs) / m
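Note that the loss assumes Y is one-hot encoded with shape (10, m), one column per image. The post doesn't show that step, so here is one way to build it (the one_hot name and the labels argument are mine, not from the original code):

def one_hot(labels, num_classes=10):
    # labels: 1-D array of integer class ids, e.g. array([5, 0, 4, ...])
    Y = np.zeros((num_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1
    return Y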
4. Backpropagation (The “Aha!” Moment)
def backward(X, Y, params, cache):
    m = Y.shape[1]
    Z1, A1, Z2 = cache
    A2 = softmax(Z2)  # recover the output from the cached Z2 instead of re-running forward
    # Output layer gradient (softmax + cross-entropy simplifies to A2 - Y)
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    # Hidden layer gradient
    dZ1 = (params["W2"].T @ dZ2) * (Z1 > 0)  # ReLU derivative: 1 where Z1 > 0, else 0
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
5. Update Parameters (Gradient Descent)
def update_params(params, grads, learning_rate=0.1):
    params["W1"] -= learning_rate * grads["dW1"]
    params["b1"] -= learning_rate * grads["db1"]
    params["W2"] -= learning_rate * grads["dW2"]
    params["b2"] -= learning_rate * grads["db2"]
    return params
🚂 Training Loop (The Grind)
def train(X, Y, epochs=1000):
    params = initialize_parameters(784, 128, 10)  # MNIST: 28x28=784 pixels
    for i in range(epochs):
        A2, cache = forward(X, params)
        loss = cross_entropy_loss(A2, Y)
        grads = backward(X, Y, params, cache)
        params = update_params(params, grads)
        if i % 100 == 0:
            print(f"Epoch {i}: Loss = {loss:.4f}")
    return params
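To reproduce the numbers below you also need MNIST as a (784, m) matrix, one-hot labels, and an accuracy metric, none of which the snippets above cover. Here is one way to wire it up, using scikit-learn's fetch_openml purely as an example loader (the predict and accuracy helpers are my own additions):

from sklearn.datasets import fetch_openml

def predict(X, params):
    A2, _ = forward(X, params)
    return np.argmax(A2, axis=0)               # class with the highest probability per column

def accuracy(X, labels, params):
    return np.mean(predict(X, params) == labels)

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X_all = mnist.data.T / 255.0                   # (784, 70000), pixels scaled to [0, 1]
labels_all = mnist.target.astype(int)

X_train, X_test = X_all[:, :60000], X_all[:, 60000:]
y_train, y_test = labels_all[:60000], labels_all[60000:]

params = train(X_train, one_hot(y_train))      # full-batch training on 60k images is slow; a subset or mini-batches helps
print("Test accuracy:", accuracy(X_test, y_test, params))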
🎯 Results: 92% Accuracy on MNIST!
After training on 60k MNIST images (and tuning hyperparameters):
Epoch 0: Loss = 2.3026
Epoch 100: Loss = 0.3541
Epoch 200: Loss = 0.2011
...
Final Test Accuracy: 92.3%
Not bad for 150 lines of numpy!
💡 Key Takeaways
- NNs are just math: Matrix multiplications, derivatives, and the chain rule.
- Backpropagation = Loss gradients flowing backward (no magic!).
- You don’t need frameworks to understand the core (but use them for real projects 😉).
👨‍💻 Follow on GitHub
https://github.com/dassomnath99
📣 Share This Post
If you geeked out reading this, share it with a friend and tag #NumpyNN!
💬 Comments
“Wait, backprop is just the chain rule?!” → Drop your reactions below!