I had never used PyTorch or TensorFlow.
My ML background was NumPy and scikit-learn. I could train models, tune parameters, and get reasonable results. But when it came to explaining why things worked, my understanding was shaky.
Backpropagation especially felt like a black box.
I knew the steps at a high level.
I knew gradients were involved.
I knew the library handled it.
But I didn’t feel it.
So I stopped using ML libraries entirely and rebuilt the core of a neural network from scratch in Rust.
That’s when backprop finally made sense.
Removing the Magic
The problem wasn’t NumPy or scikit-learn. They do exactly what they promise. The problem was that they abstract away everything that actually matters for understanding.
So I removed the abstractions.
No autograd.
No tensor libraries.
No hidden memory layouts.
Just flat buffers, explicit indexing, and matrix operations written by hand.
data = [1, 2, 3, 4, 5, 6]
shape = (2 rows, 3 cols)
Logical view:
[ 1 2 3 ]
[ 4 5 6 ]
Memory view:
[1][2][3][4][5][6]
0 1 2 3 4 5
Rust forced me to be precise. You can’t “kind of” do a transpose in Rust; you have to spell out exactly how indices move in memory. You can’t hand-wave gradients; you have to compute and store them explicitly.
index = row * cols + col
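To make that concrete, here is a minimal sketch of a flat-buffer matrix with row-major indexing. The names (Tensor2D, at) are illustrative placeholders, not the actual types from the guide.

// Sketch only: a row-major matrix over a flat buffer.
// "Tensor2D" and "at" are placeholder names, not the guide's real types.
struct Tensor2D {
    data: Vec<f64>, // flat buffer, row-major
    rows: usize,
    cols: usize,
}

impl Tensor2D {
    // index = row * cols + col
    fn at(&self, row: usize, col: usize) -> f64 {
        debug_assert!(row < self.rows && col < self.cols);
        self.data[row * self.cols + col]
    }
}

fn main() {
    // The 2×3 example from above: logical [ 1 2 3 ] / [ 4 5 6 ].
    let t = Tensor2D { data: vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], rows: 2, cols: 3 };
    assert_eq!(t.at(0, 2), 3.0); // 0 * 3 + 2 = buffer index 2
    assert_eq!(t.at(1, 1), 5.0); // 1 * 3 + 1 = buffer index 4
    println!("t[1][1] = {}", t.at(1, 1));
}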
That constraint changed everything.
Where Backprop Clicked
Backprop stopped being mysterious when I had to implement it myself.
Not symbolically.
Not as equations on paper.
But as code that moves numbers through memory.
Once you build it manually, you see that backprop is not magic. It’s structured bookkeeping.
You’re doing three things over and over:
- applying the chain rule
- reusing intermediate values from the forward pass
- pushing gradients backward through matrix operations
Forward pass:
X → [ Linear ] → [ Activation ] → ŷ → Loss
Backward pass:
∂Loss → [ dActivation ] → [ dLinear ] → ∂W, ∂X
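To show what that bookkeeping looks like in code, here is a minimal sketch shrunk down to a single scalar unit with a ReLU activation and a squared-error loss, so the chain rule and the cached forward values stay visible. The names and shapes are mine, not the guide’s actual code.

// Sketch only: one linear unit, ReLU, squared-error loss, all scalars.
// The forward pass caches exactly the values the backward pass reuses.
struct Cache {
    x: f64, // input, needed for ∂Loss/∂w
    z: f64, // pre-activation, needed for the ReLU gradient
    a: f64, // activation, needed for ∂Loss/∂a
}

fn forward(x: f64, w: f64, b: f64, y: f64) -> (f64, Cache) {
    let z = w * x + b;                      // linear
    let a = if z > 0.0 { z } else { 0.0 };  // ReLU
    let loss = 0.5 * (a - y) * (a - y);     // squared error
    (loss, Cache { x, z, a })
}

// Chain rule, one step at a time, reusing the cached forward values.
fn backward(cache: &Cache, w: f64, y: f64) -> (f64, f64, f64) {
    let d_a = cache.a - y;                            // ∂Loss/∂a
    let d_z = if cache.z > 0.0 { d_a } else { 0.0 };  // ∂Loss/∂z through ReLU
    let d_w = d_z * cache.x;                          // ∂Loss/∂w
    let d_b = d_z;                                    // ∂Loss/∂b
    let d_x = d_z * w;                                // ∂Loss/∂x
    (d_w, d_b, d_x)
}

fn main() {
    let (x, w, b, y) = (2.0, 0.5, 0.1, 1.0);
    let (loss, cache) = forward(x, w, b, y);
    let (d_w, d_b, d_x) = backward(&cache, w, y);
    println!("loss={loss:.4} dW={d_w:.4} db={d_b:.4} dX={d_x:.4}");
}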
When you write this by hand, a few things become painfully clear:
- gradients don’t “flow” — they are accumulated
- shape alignment is the real constraint, not calculus
- most bugs come from incorrect assumptions about dimensions, not math
      ┌──► [ w1 ] ──┐
X ────┤             ├──► (+) ──► Loss
      └──► [ w2 ] ──┘
Backward:
∂Loss/∂X = (∂Loss/∂X via the w1 path) + (∂Loss/∂X via the w2 path)
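Here is a small, hedged sketch of that branching graph with made-up numbers, showing the gradient for X being accumulated from both paths rather than flowing through once:

// Sketch only: X feeds two weighted paths that are summed into the loss.
fn main() {
    let (x, w1, w2, y) = (2.0_f64, 0.3, -0.4, 1.0);

    // Forward: the same X is reused by both paths.
    let p1 = w1 * x;
    let p2 = w2 * x;
    let out = p1 + p2;
    let loss = 0.5 * (out - y) * (out - y);

    // Backward: every path that used X adds its contribution into d_x.
    let d_out = out - y;   // ∂Loss/∂out
    let d_p1 = d_out;      // the sum node passes the gradient to each branch
    let d_p2 = d_out;
    let mut d_x = 0.0;
    d_x += d_p1 * w1;      // accumulated, not "flowed": path 1
    d_x += d_p2 * w2;      // accumulated, not "flowed": path 2

    println!("loss={loss:.4}  dX={d_x:.4}");
}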
Backprop felt hard before because I never saw where the numbers actually lived.
Why Rust Helped
Rust isn’t important here because it’s fast. It’s important because it’s unforgiving.
It forces you to confront:
- how tensors are laid out in memory
- when data is copied vs reused (see the sketch after this list)
- which operations allocate new buffers
- which gradients depend on which forward values
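For example, here is a hedged sketch (helper names are mine, not the project’s) of how a transpose over a flat buffer has to allocate and fill a brand-new buffer, while a slice borrow reuses the existing one without copying:

// Sketch only: transpose allocates and fills a new buffer, explicitly;
// a slice borrow reuses the existing buffer without copying anything.
fn transpose(data: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    let mut out = vec![0.0; data.len()];            // fresh allocation
    for r in 0..rows {
        for c in 0..cols {
            out[c * rows + r] = data[r * cols + c]; // element-by-element move
        }
    }
    out
}

fn main() {
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // the same 2×3 matrix
    let second_row: &[f64] = &data[3..6];          // reuse: a borrow, no copy
    let transposed = transpose(&data, 2, 3);       // copy: a brand-new 3×2 buffer
    println!("row 1 view: {second_row:?}");
    println!("transposed: {transposed:?}");
}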
I avoided third-party crates on purpose and used only the standard library. The goal wasn’t elegance or performance. It was transparency.
If something worked, I wanted to be able to explain why it worked at the level of indices and buffers.
What I Built
Step by step, I built:
- a tensor type backed by a flat buffer
- element-wise operations
- transpose, reduction, and matrix multiplication
- linear regression
- backpropagation and gradient updates
- a small neural network trained end-to-end
Nothing is optimized. Everything is explicit.
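As a taste of what “explicit” means here, this is a hedged sketch of the simplest piece, a linear-regression training loop written directly against plain Vecs with a hand-written gradient step; the data and names are illustrative, not lifted from the project:

// Sketch only: fit y ≈ w·x + b by gradient descent on plain Vecs.
// The data is synthetic (y = 2x + 1); names are illustrative.
fn main() {
    let xs = vec![1.0, 2.0, 3.0, 4.0];
    let ys = vec![3.0, 5.0, 7.0, 9.0];
    let (mut w, mut b) = (0.0_f64, 0.0_f64);
    let lr = 0.1;

    for _epoch in 0..1000 {
        // Accumulate gradients over the batch, explicitly.
        let (mut d_w, mut d_b) = (0.0, 0.0);
        for (&x, &y) in xs.iter().zip(ys.iter()) {
            let pred = w * x + b; // forward
            let err = pred - y;   // ∂(½·err²)/∂pred
            d_w += err * x;       // chain rule through the multiply
            d_b += err;
        }
        let n = xs.len() as f64;
        w -= lr * d_w / n;        // gradient update
        b -= lr * d_b / n;
    }

    println!("w ≈ {w:.3}, b ≈ {b:.3}"); // should land near w = 2, b = 1
}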
This is not a framework.
It’s not production-ready.
It’s a learning tool.
Who This Is For
This project is especially suited for:
- Software developers who want to understand neural networks beyond high-level APIs
- Readers learning Rust who want a demanding, systems-oriented project
If backprop still feels like something you “accept” rather than understand, rebuilding it once is worth the time.
The Full Walkthrough
I documented the entire process as a chapter-style guide, starting from tensors in memory and ending with a working neural network.
You can read the full walkthrough here:
https://ai.palashkantikundu.in
Backprop didn’t become simpler.
It became visible.
And that made all the difference.