INTRODUCTION
My background? I've spent the past 5 years doing game dev in C#. Last year I tried peeking under the hood of game engines, and that was when my obsession with low-level systems began.
I was craving a challenge, a heavy learning experience, so I decided to build a minimal ML runtime in C++.
I knew it was going to be hard, but I did not expect it to be this hard...
Today was the first day of this project. I tried implementing tensors and the two most basic tensor operations: add and matmul.
INITIAL STRUGGLES
What confused me the most was how differently ML runtimes interpret a tensor compared to someone who only knows the mathematical definition.
Mathematically, tensors are just generalized higher-dimensional matrices, so it was natural for me to think I needed to somehow initialize a multidimensional nested array of floats. And that was when it hit me: how on Earth am I going to do that...?
After a bit of digging into PyTorch's monstrous source code, guess what... I understood nothing. Then I found this goldmine, https://github.com/Adam-Mazur/TinyTensor/blob/main/include/tensor.h, and TinyTensor's Tensor class was not... tiny at all:
I'm not sure if massive classes like these are good from a design perspective, but anyway, the key to my confusion was right at the top:
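Since I can't paste the whole header here, this is a minimal sketch of the kind of members I mean (my own names and simplifications, not the actual TinyTensor code):

```cpp
// Not the actual TinyTensor code: just a minimal sketch of the kind of
// members that sit at the top of these Tensor classes (names are mine).
#include <vector>

struct Tensor {
    std::vector<float> data;   // ONE flat, contiguous buffer with every element
    std::vector<int> shape;    // length along each dimension, e.g. {2, 3, 2}
    std::vector<int> strides;  // elements to skip to step once along each dimension
};
```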
Took me a solid hour to understand how this works but the moment it clicked, everything made sense...
From here I will try my best to explain how multidimensional arrays are actually stored in memory and also talk a bit about matmul.
NESTED ARRAY REPRESENTATION
Computers don’t know what a 4D tensor is; they only know how to store things in a straight line. In my C# days, I might have used a nested array like float[][][], but in high-performance C++ that’s a nightmare: it scatters data all over the heap. Instead, every ML library uses a flat 1D array, which guarantees contiguous memory allocation and significantly improves cache hit rates, plus a shape vector recording the "length" of the tensor along each dimension.
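To make the "flat buffer + shape" idea concrete, here is a rough helper I sketched for this post (my own toy code, not any library's API):

```cpp
// Hypothetical helper: one contiguous allocation sized by the shape vector.
#include <functional>
#include <numeric>
#include <vector>

std::vector<float> make_storage(const std::vector<int>& shape) {
    // Total element count is just the product of every dimension length.
    int total = std::accumulate(shape.begin(), shape.end(), 1,
                                std::multiplies<int>());
    return std::vector<float>(static_cast<size_t>(total), 0.0f);
}

// A {2, 3, 2} tensor is backed by a single run of 12 floats:
// auto storage = make_storage({2, 3, 2});
```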
If I have a tensor A with shape (R, C, D), where R(ows), C(olumns), D(epth) equal 2, 3, 2 respectively, it’s just 12 floats in a row. To find the element at [i, j, k], we don't "index" into a nested structure; we use a simple offset formula:

offset = i * (C * D) + j * D + k
This is called "row-major" ordering (you can also do "column-major" ordering; it's all up to you). The per-dimension multipliers in that formula, here (C * D, D, 1), are called "strides". I'm computing them manually for now, but runtimes usually precompute and store the strides for each dimension in a vector. Here is a small MS Paint illustration of this; hopefully my toddler-level drawing helps you visualize the formula:
The first term (i * C * D) jumps over whole row-slices of C * D elements each, the second term (j * D) jumps over columns of D elements each, and the third term k picks the position along the depth axis. I took a 3D example so that you can visualize higher dimensions more easily.
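In code, both the manual formula and the stride-vector version that runtimes prefer look something like this (illustrative helpers I wrote for this post, not from any particular runtime):

```cpp
#include <cassert>
#include <vector>

// Manual row-major offset for a 3D tensor of shape {R, C, D}.
int offset3d(int i, int j, int k, int C, int D) {
    return i * C * D   // skip i whole row-slices of C * D elements each
         + j * D       // skip j columns of D elements each
         + k;          // pick the position along the depth axis
}

// The general version: with row-major strides (strides[d] is the product of
// all dimension lengths after d), the offset is a simple dot product.
int offset(const std::vector<int>& idx, const std::vector<int>& strides) {
    assert(idx.size() == strides.size());
    int off = 0;
    for (size_t d = 0; d < idx.size(); ++d)
        off += idx[d] * strides[d];
    return off;
}
```

For the {2, 3, 2} example, offset3d(1, 2, 0, 3, 2) lands on element 10 of the 12-float buffer, the same as offset({1, 2, 0}, {6, 2, 1}).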
MATMUL AND TENSOR PRODUCT
So now that you understand how the data is stored, I'll try to explain how matmul works. And please don't confuse it with the tensor product, denoted by A ⊗ B...
If you search for "tensor product", also called the "outer product" for some reason, you get a definition like this (written here for two 2D matrices):

(A ⊗ B)[i, j, k, l] = A[i, j] * B[k, l]
This is carried out by multiplying every single element of A by every single element of B... And that is NOT what we want for ML. In ML runtimes, matmul is basically the classic 2D Matrix Product on steroids.
Mathematically, a tensor product ⊗ is a dimension-exploding machine. If you multiply a 2D matrix by another 2D matrix using a tensor product, you don’t get a matrix back; you get a 4D tensor, since it keeps every single combination of products.
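To see the explosion in code, here's a toy outer product I wrote for this post (my own flat row-major layout, not a library function): two 2D matrices in, a 4D tensor out.

```cpp
// Toy outer product of an (m, n) matrix and a (p, q) matrix, producing an
// (m, n, p, q) tensor stored flat in row-major order. Names are mine.
#include <vector>

std::vector<float> outer2d(const std::vector<float>& A, int m, int n,
                           const std::vector<float>& B, int p, int q) {
    std::vector<float> out(static_cast<size_t>(m) * n * p * q);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < p; ++k)
                for (int l = 0; l < q; ++l)
                    // Every pairwise product is kept: no summing, no shrinking.
                    out[((i * n + j) * p + k) * q + l] =
                        A[i * n + j] * B[k * q + l];
    return out;
}
```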
But in ML, we actually want the opposite: specifically, a "tensor contraction".
Thus, matmul "contracts" the shared dimension: for every row-column pair it multiplies the values elementwise and sums them up, C[i, j] = Σ_k A[i, k] * B[k, j]. This keeps the data manageable and retains most of the information. Hence, matmul is just a contraction along ONE shared dimension.
I know it's hard to visualize, and honestly I can't think of a good way to help you here; you'll be able to feel matmul eventually as you keep playing with examples...
For now, I’m sticking to the naive nested loop, and my matmul only supports 2D matrix multiplication.
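Here is roughly what that looks like, a simplified sketch of my current approach assuming flat row-major storage:

```cpp
// Naive 2D matmul: C[i, j] = sum over k of A[i, k] * B[k, j].
// A is (m, n), B is (n, p), everything stored flat in row-major order.
#include <vector>

std::vector<float> matmul2d(const std::vector<float>& A,
                            const std::vector<float>& B,
                            int m, int n, int p) {
    std::vector<float> C(static_cast<size_t>(m) * p, 0.0f);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < p; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < n; ++k)             // walk the shared dimension...
                acc += A[i * n + k] * B[k * p + j]; // ...and contract it by summing
            C[i * p + j] = acc;
        }
    return C;
}
```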
FINAL WORDS
Looking at today's progress, I've cut down my original scope by like 60% because "smol finished project" > unfinished mess. I just want to get a simple working horizontal slice by the end of March. Will probably write my next blog when I have that ready.
It wasn't a very productive day, but friction at the start is normal when you're pushing the boundaries of your knowledge. Hopefully I'll speed up as time goes on.
Check out the project on my GitHub: https://github.com/Raju1173/ml-runtime-cpp
Shine a little star on it if you'd like to follow along!



