I had never used PyTorch or TensorFlow.
My ML background was NumPy and scikit-learn. I could train models, tune parameters, and get reasonable results. But when it came to explaining why things worked, my understanding was shaky.
Backpropagation especially felt like a black box.
I knew the steps at a high level.
I knew gradients were involved.
I knew the library handled it.
But I didn’t feel it.
So I stopped using ML libraries entirely and rebuilt the core of a neural network from scratch in Rust.
That’s when backprop finally made sense.
Removing the Magic
The problem wasn’t NumPy or scikit-learn. They do exactly what they promise. The problem was that they abstract away everything that actually matters for understanding.
So I removed the abstractions.
No autograd.
No tensor libraries.
No hidden memory layouts.
Just flat buffers, explicit indexing, and matrix operations written by hand.
data = [1, 2, 3, 4, 5, 6]
shape = (2 rows, 3 cols)
Logical view:
[ 1 2 3 ]
[ 4 5 6 ]
Memory view:
[1][2][3][4][5][6]
0 1 2 3 4 5
Rust forced me to be precise. You can’t “kind of” do a transpose in Rust; you have to spell out exactly how indices move in memory. You can’t hand-wave gradients; you have to compute and store them explicitly.
index = row * cols + col
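To make that concrete, here is a minimal sketch of a flat-buffer matrix with row-major indexing. The names (Tensor2D, at) are illustrative placeholders, not the actual types from the guide.

// Sketch only: a row-major matrix over a flat buffer.
// "Tensor2D" and "at" are placeholder names, not the guide's real types.
struct Tensor2D {
    data: Vec<f64>, // flat buffer, row-major
    rows: usize,
    cols: usize,
}

impl Tensor2D {
    // index = row * cols + col
    fn at(&self, row: usize, col: usize) -> f64 {
        debug_assert!(row < self.rows && col < self.cols);
        self.data[row * self.cols + col]
    }
}

fn main() {
    // The 2×3 example from above: logical [ 1 2 3 ] / [ 4 5 6 ].
    let t = Tensor2D { data: vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], rows: 2, cols: 3 };
    assert_eq!(t.at(0, 2), 3.0); // 0 * 3 + 2 = buffer index 2
    assert_eq!(t.at(1, 1), 5.0); // 1 * 3 + 1 = buffer index 4
    println!("t[1][1] = {}", t.at(1, 1));
}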
That constraint changed everything.
Where Backprop Clicked
Backprop stopped being mysterious when I had to implement it myself.
Not symbolically.
Not as equations on paper.
But as code that moves numbers through memory.
Once you build it manually, you see that backprop is not magic. It’s structured bookkeeping.
You’re doing three things over and over:
- applying the chain rule
- reusing intermediate values from the forward pass
- pushing gradients backward through matrix operations
Forward pass:
X → [ Linear ] → [ Activation ] → ŷ → Loss
Backward pass:
∂Loss → [ dActivation ] → [ dLinear ] → ∂W, ∂X
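To show what that bookkeeping looks like in code, here is a minimal sketch shrunk down to a single scalar unit with a ReLU activation and a squared-error loss, so the chain rule and the cached forward values stay visible. The names and shapes are mine, not the guide’s actual code.

// Sketch only: one linear unit, ReLU, squared-error loss, all scalars.
// The forward pass caches exactly the values the backward pass reuses.
struct Cache {
    x: f64, // input, needed for ∂Loss/∂w
    z: f64, // pre-activation, needed for the ReLU gradient
    a: f64, // activation, needed for ∂Loss/∂a
}

fn forward(x: f64, w: f64, b: f64, y: f64) -> (f64, Cache) {
    let z = w * x + b;                      // linear
    let a = if z > 0.0 { z } else { 0.0 };  // ReLU
    let loss = 0.5 * (a - y) * (a - y);     // squared error
    (loss, Cache { x, z, a })
}

// Chain rule, one step at a time, reusing the cached forward values.
fn backward(cache: &Cache, w: f64, y: f64) -> (f64, f64, f64) {
    let d_a = cache.a - y;                            // ∂Loss/∂a
    let d_z = if cache.z > 0.0 { d_a } else { 0.0 };  // ∂Loss/∂z through ReLU
    let d_w = d_z * cache.x;                          // ∂Loss/∂w
    let d_b = d_z;                                    // ∂Loss/∂b
    let d_x = d_z * w;                                // ∂Loss/∂x
    (d_w, d_b, d_x)
}

fn main() {
    let (x, w, b, y) = (2.0, 0.5, 0.1, 1.0);
    let (loss, cache) = forward(x, w, b, y);
    let (d_w, d_b, d_x) = backward(&cache, w, y);
    println!("loss={loss:.4} dW={d_w:.4} db={d_b:.4} dX={d_x:.4}");
}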
When you write this by hand, a few things become painfully clear:
- gradients don’t “flow” — they are accumulated
- shape alignment is the real constraint, not calculus
- most bugs come from incorrect assumptions about dimensions, not math
      ┌──► [ w1 ] ──┐
X ────┤             ├──► (+) ──► Loss
      └──► [ w2 ] ──┘
Backward:
∂Loss/∂X = (∂Loss/∂X via the w1 path) + (∂Loss/∂X via the w2 path)
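Here is a small, hedged sketch of that branching graph with made-up numbers, showing the gradient for X being accumulated from both paths rather than flowing through once:

// Sketch only: X feeds two weighted paths that are summed into the loss.
fn main() {
    let (x, w1, w2, y) = (2.0_f64, 0.3, -0.4, 1.0);

    // Forward: the same X is reused by both paths.
    let p1 = w1 * x;
    let p2 = w2 * x;
    let out = p1 + p2;
    let loss = 0.5 * (out - y) * (out - y);

    // Backward: every path that used X adds its contribution into d_x.
    let d_out = out - y;   // ∂Loss/∂out
    let d_p1 = d_out;      // the sum node passes the gradient to each branch
    let d_p2 = d_out;
    let mut d_x = 0.0;
    d_x += d_p1 * w1;      // accumulated, not "flowed": path 1
    d_x += d_p2 * w2;      // accumulated, not "flowed": path 2

    println!("loss={loss:.4}  dX={d_x:.4}");
}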
Backprop felt hard before because I never saw where the numbers actually lived.
Why Rust Helped
Rust isn’t important here because it’s fast. It’s important because it’s unforgiving.
It forces you to confront:
- how tensors are laid out in memory
- when data is copied vs reused (see the sketch after this list)
- which operations allocate new buffers
- which gradients depend on which forward values
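For example, here is a hedged sketch (helper names are mine, not the project’s) of how a transpose over a flat buffer has to allocate and fill a brand-new buffer, while a slice borrow reuses the existing one without copying:

// Sketch only: transpose allocates and fills a new buffer, explicitly;
// a slice borrow reuses the existing buffer without copying anything.
fn transpose(data: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    let mut out = vec![0.0; data.len()];            // fresh allocation
    for r in 0..rows {
        for c in 0..cols {
            out[c * rows + r] = data[r * cols + c]; // element-by-element move
        }
    }
    out
}

fn main() {
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // the same 2×3 matrix
    let second_row: &[f64] = &data[3..6];          // reuse: a borrow, no copy
    let transposed = transpose(&data, 2, 3);       // copy: a brand-new 3×2 buffer
    println!("row 1 view: {second_row:?}");
    println!("transposed: {transposed:?}");
}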
I avoided third-party crates on purpose and used only the standard library. The goal wasn’t elegance or performance. It was transparency.
If something worked, I wanted to be able to explain why it worked at the level of indices and buffers.
What I Built
Step by step, I built:
- a tensor type backed by a flat buffer
- element-wise operations
- transpose, reduction, and matrix multiplication
- linear regression
- backpropagation and gradient updates
- a small neural network trained end-to-end
Nothing is optimized. Everything is explicit.
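As a taste of what “explicit” means here, this is a hedged sketch of the simplest piece, a linear-regression training loop written directly against plain Vecs with a hand-written gradient step; the data and names are illustrative, not lifted from the project:

// Sketch only: fit y ≈ w·x + b by gradient descent on plain Vecs.
// The data is synthetic (y = 2x + 1); names are illustrative.
fn main() {
    let xs = vec![1.0, 2.0, 3.0, 4.0];
    let ys = vec![3.0, 5.0, 7.0, 9.0];
    let (mut w, mut b) = (0.0_f64, 0.0_f64);
    let lr = 0.1;

    for _epoch in 0..1000 {
        // Accumulate gradients over the batch, explicitly.
        let (mut d_w, mut d_b) = (0.0, 0.0);
        for (&x, &y) in xs.iter().zip(ys.iter()) {
            let pred = w * x + b; // forward
            let err = pred - y;   // ∂(½·err²)/∂pred
            d_w += err * x;       // chain rule through the multiply
            d_b += err;
        }
        let n = xs.len() as f64;
        w -= lr * d_w / n;        // gradient update
        b -= lr * d_b / n;
    }

    println!("w ≈ {w:.3}, b ≈ {b:.3}"); // should land near w = 2, b = 1
}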
This is not a framework.
It’s not production-ready.
It’s a learning tool.
Who This Is For
This project is especially suited for:
- Software developers who want to understand neural networks beyond high-level APIs
- Readers learning Rust who want a demanding, systems-oriented project
If backprop still feels like something you “accept” rather than understand, rebuilding it once is worth the time.
The Full Walkthrough
I documented the entire process as a chapter-style guide, starting from tensors in memory and ending with a working neural network.
You can read the full walkthrough here:
https://ai.palashkantikundu.in
Backprop didn’t become simpler.
It became visible.
And that made all the difference.