Introduction
When learning machine learning, it is easy to rely on powerful frameworks such as PyTorch and TensorFlow.
A few lines of code can create a neural network, train it on thousands of examples, and achieve impressive results.
For a long time, that was how I approached deep learning.
I understood the high-level concepts, but many of the mechanics remained hidden behind library abstractions.
What actually happens during backpropagation?
How are gradients computed?
Why do weights update the way they do?
How does a network gradually learn meaningful patterns from data?
To answer these questions, I decided to build a Multilayer Perceptron (MLP) completely from scratch using NumPy, without relying on any deep learning framework.
The project turned out to be one of the most educational experiences in my machine learning journey.
Why Build an MLP from Scratch?
At first glance, building a neural network manually may seem unnecessary.
Modern frameworks already provide:
Automatic differentiation
Optimized training loops
GPU acceleration
Efficient tensor operations
However, abstractions can sometimes hide understanding.
I wanted to move beyond:
model.fit(X, y)
and understand what happens underneath.
The goal was not performance.
The goal was understanding.
What Is a Multilayer Perceptron?
A Multilayer Perceptron is one of the simplest forms of a neural network.
It consists of:
Input Layer
Hidden Layers
Output Layer
Each layer contains neurons connected through learnable weights.
A simple architecture looks like:
Input Layer
│
▼
Hidden Layer 1
│
▼
Hidden Layer 2
│
▼
Output Layer
Although simple, MLPs contain the fundamental ideas that power modern deep learning systems.
Understanding the Building Blocks
Before writing any code, I broke the network into smaller components.
Weights
Weights determine how much influence one neuron has on another.
Initially, these values are random.
Training gradually adjusts them to improve predictions.
Biases
Biases allow neurons to shift decision boundaries.
Without biases, neural networks become significantly less flexible.
Activation Functions
Activation functions introduce non-linearity.
Without them, multiple layers collapse into a simple linear transformation.
Common choices include:
ReLU
Sigmoid
Tanh
For my implementation, I experimented primarily with ReLU and Sigmoid.
Implementing Forward Propagation
The first step was teaching the network how to make predictions.
Each layer performs:
z = Wx + b
followed by an activation function.
The output becomes the input for the next layer.
The complete forward pass looks like:
Input
↓
Linear Transformation
↓
Activation Function
↓
Hidden Representation
↓
Output Prediction
Writing this manually helped me understand that neural networks are ultimately sequences of matrix operations.
The Moment Everything Clicked: Backpropagation
Forward propagation was straightforward.
Backpropagation was where the real learning began.
Initially, backpropagation felt mysterious.
Most explanations describe it as:
"The network calculates gradients and updates weights."
But where do those gradients come from?
Implementing the algorithm forced me to understand the answer.
The key insight was that every parameter contributes to the final error.
Using the chain rule, we can compute how much each weight influenced the loss.
This influence becomes the gradient.
The network then updates weights in the direction that reduces error.
For the first time, backpropagation stopped feeling like magic and started feeling like mathematics.
Understanding Gradient Descent
Once gradients are available, learning becomes an optimization problem.
The update rule is:
w_{new}=w_{old}-\eta\frac{\partial L}{\partial w}
where:
(w) represents weights
(L) is the loss function
(\eta) is the learning rate
Watching loss decrease over training iterations was incredibly satisfying because every update was now something I had implemented myself.
Challenges I Faced
Building the MLP was not as straightforward as expected.
Several issues repeatedly appeared.
Shape Mismatches
Matrix multiplication requires precise dimensions.
A single incorrect shape can break the entire network.
I spent a surprising amount of time debugging tensor dimensions.
Numerical Stability
Large values sometimes caused exploding outputs.
This taught me why activation choices and initialization strategies matter.
Learning Rate Selection
If the learning rate was too large:
Loss explodes
If it was too small:
Training becomes extremely slow
Finding the right balance helped me understand optimization dynamics.
Gradient Debugging
Incorrect gradients often produced networks that appeared to train but never improved.
Verifying gradient calculations manually was one of the most valuable learning experiences.
What Building It Taught Me
The project fundamentally changed how I think about deep learning.
Neural Networks Are Not Magic
Before this project, neural networks felt complex and mysterious.
After implementing one from scratch, they became sequences of understandable mathematical operations.
Matrix Operations Matter
Deep learning relies heavily on linear algebra.
Understanding matrix multiplication, vectorization, and dimensions became significantly more important than memorizing model architectures.
Backpropagation Is Elegant
Initially, backpropagation seemed intimidating.
After implementing it manually, I realized it is simply an efficient application of calculus.
Frameworks Became Easier to Understand
After building the underlying components myself, PyTorch APIs suddenly made much more sense.
Functions such as:
loss.backward()
optimizer.step()
were no longer black boxes.
I understood exactly what they were doing internally.
Comparing My Implementation with PyTorch
Building from scratch also helped me appreciate modern frameworks.
PyTorch automatically handles:
Gradient computation
Efficient tensor operations
GPU acceleration
Computational graphs
My implementation was slower and less optimized.
However, it provided something more valuable:
Understanding.
Results
The final implementation successfully included:
Multiple hidden layers
Forward propagation
Backpropagation
Gradient descent optimization
Configurable activation functions
Loss calculation
Training and evaluation loops
Most importantly, the network was able to learn meaningful patterns from data entirely through code I had written myself.
Lessons for Anyone Learning Deep Learning
If I could give one recommendation to students learning neural networks, it would be:
Build at least one neural network completely from scratch.
You do not need to build a state-of-the-art model.
You do not need GPUs.
You do not need millions of parameters.
Even a simple MLP can teach:
Linear algebra
Optimization
Gradient computation
Neural network architecture
Training dynamics
better than dozens of tutorials.
Conclusion
Building a Multilayer Perceptron from scratch was one of the most valuable projects in my machine learning journey.
The goal was never to outperform PyTorch or TensorFlow.
The goal was to understand what happens beneath the abstractions.
By implementing forward propagation, backpropagation, gradient descent, and training loops manually, I gained a much deeper appreciation for both the mathematics and engineering behind modern deep learning systems.
Today, when I work with frameworks such as PyTorch, I no longer see a collection of APIs.
I see the mathematical machinery operating underneath them.
And that understanding has made me a better machine learning engineer.
Top comments (0)