I recently realized that despite my 20+ years of experience in software engineering across very diverse areas, I had always missed one fascinating topic: Machine Learning. (The title of this article is, obviously, rhetorical.)
ML is currently changing the world at a rapid pace, and specifically reshaping the software engineering landscape. Because of this, I decided to dedicate some of my spare time to closing that gap in my expertise. What started as a learning exercise quickly turned into an incredibly interesting project.
Building from Scratch
To kick things off, I decided to write a simple Multi-Layer Perceptron (MLP)—the basic building block for pretty much any ML task. My goal was to write it completely from scratch, without any help from coding agents, just to truly understand the mechanics myself.
I have to give massive credit to Andrej Karpathy. His video gave me an incredible breakdown and immediate insights. Within the first few minutes of his content and after reading through the code of the micrograd framework, I finally grasped the true nature of backpropagation—the core concept behind training an MLP.
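To make that idea concrete, here is a minimal sketch of a micrograd-style scalar. It is not the micrograd source itself: the `Value` class, its attribute names, and the tiny example at the bottom are my own illustration, and it supports only addition, multiplication, and ReLU. Each operation records its local derivative, and `backward()` walks the graph in reverse to apply the chain rule.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def relu(self):
        out = Value(max(0.0, self.data), (self,))

        def _backward():
            # The gradient passes through only where the output is positive.
            self.grad += (1.0 if out.data > 0 else 0.0) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule from output to inputs.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


w, x, b = Value(2.0), Value(3.0), Value(-1.0)
y = (w * x + b).relu()  # relu(2 * 3 - 1)
y.backward()
print(y.data, w.grad, x.grad, b.grad)  # prints 5.0 3.0 2.0 1.0
```

The gradients match what the chain rule gives by hand: the output changes by `x` per unit of `w`, by `w` per unit of `x`, and by 1 per unit of `b`.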
With that understanding, I headed to my first milestone: training a simple MLP to predict the output of a sin(x) function. After a couple of hours, I ended up with a very simple core loop for a basic MLP:
# Note: The training loop and some initial wiring code have been skipped intentionally for simplicity.
import random

# Weights per layer, one list per neuron: 1 input -> 16 neurons -> 16 neurons -> 1 output.
W = [
    [[random.uniform(-0.5, 0.5)] for _ in range(16)],
    [[random.uniform(-0.5, 0.5) for _ in range(16)] for _ in range(16)],
    [[random.uniform(-0.5, 0.5) for _ in range(16)]],
]
# One bias per neuron.
B = [
    [random.uniform(-0.5, 0.5) for _ in range(16)],
    [random.uniform(-0.5, 0.5) for _ in range(16)],
    [random.uniform(-0.5, 0.5)],
]
# wrap_values comes from the skipped wiring code; the .relu() call below
# suggests it wraps each number in a micrograd-style Value.
W_vals = wrap_values(W)
B_vals = wrap_values(B)

def sinNetwork(x):
    x_output = []
    for layer_index, layer in enumerate(W_vals):
        x_output.append([])
        for neuron_index, neuron_inputs in enumerate(layer):
            # Accumulate the weighted sum of this neuron's inputs.
            for input_index, neuron_W_input in enumerate(neuron_inputs):
                if layer_index == 0:
                    input_val = x
                else:
                    input_val = x_output[layer_index-1][input_index]
                if len(x_output[layer_index]) < neuron_index + 1:
                    x_output[layer_index].append(input_val * neuron_W_input)
                else:
                    x_output[layer_index][neuron_index] = (input_val * neuron_W_input) + x_output[layer_index][neuron_index]
            # Add the bias once per neuron, then apply ReLU on every layer except the last.
            x_output[layer_index][neuron_index] = x_output[layer_index][neuron_index] + B_vals[layer_index][neuron_index]
            if layer_index != len(W_vals) - 1:
                x_output[layer_index][neuron_index] = x_output[layer_index][neuron_index].relu()
    return x_output[len(W_vals)-1][0]
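For readers who find the index bookkeeping hard to follow, here is a rough sketch of the same forward pass written as one weighted sum per neuron. It works on plain floats instead of the wrapped values, so it cannot be trained as written, and the names `forward`, `relu`, and `sizes` are mine rather than part of the original code.

```python
import random


def relu(v):
    return max(0.0, v)


def forward(x, weights, biases):
    """Same computation as the nested loops, one layer at a time, on plain floats."""
    activations = [x]  # the input layer holds a single value
    last = len(weights) - 1
    for layer_index, (layer_w, layer_b) in enumerate(zip(weights, biases)):
        outputs = []
        for neuron_w, neuron_b in zip(layer_w, layer_b):
            # Weighted sum of the previous layer's outputs, plus the bias.
            z = sum(w * a for w, a in zip(neuron_w, activations)) + neuron_b
            # ReLU on every layer except the last, as in the original.
            outputs.append(z if layer_index == last else relu(z))
        activations = outputs
    return activations[0]


random.seed(0)
sizes = [1, 16, 16, 1]  # same layout as W and B above
W = [[[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
     for n_in, n_out in zip(sizes, sizes[1:])]
B = [[random.uniform(-0.5, 0.5) for _ in range(n_out)] for n_out in sizes[1:]]
print(forward(0.5, W, B))
```

Seeing each layer as "weighted sum, bias, activation" is also what makes the later move to `nn.Linear` feel natural, since that is the operation it performs for a whole layer at once.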
Yes, the code is ugly and it was slow, but it did the job, even though the results were far from ideal.
Scaling Up with PyTorch and CUDA
To speed up the incredibly slow, CPU-bound training caused by these sequential nested loops, the next natural step was to adopt PyTorch. The framework provides an excellent level of abstraction over basic building blocks while handling low-level optimizations like CUDA execution out of the box.
Migrating this logic to PyTorch + CUDA gave a massive performance boost. It allowed me to train a deeper model and finally achieve a clean output. Instead of waiting 5 to 7 minutes for 2,000 steps on the CPU, CUDA handled roughly 8,000 steps with 5x higher resolution in about 45 seconds.
The model definition in PyTorch is also much more readable:
from torch import nn

class MySinNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(1, 16),
            nn.Sigmoid(),
            nn.Linear(16, 16),
            nn.ReLU(),
            nn.Linear(16, 16),
            nn.ReLU(),
            nn.Linear(16, 1)
        )
        self.init_weights()

    def init_weights(self):
        for layer in self.linear_relu_stack:
            if isinstance(layer, nn.Linear):
                nn.init.uniform_(layer.weight, -0.5, 0.5)
                nn.init.uniform_(layer.bias, -0.5, 0.5)

    def forward(self, x):
        y_predict = self.linear_relu_stack(x)
        return y_predict
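The training loop is not shown here, so the following is only a sketch of what one could look like. To stay self-contained it rebuilds the same layer layout with a plain `nn.Sequential` and PyTorch's default initialization instead of reusing the class above. The loss function, the optimizer, the learning rate, the sample count, the input range, and the step count are all assumptions on my part, not the settings behind the numbers quoted earlier.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU without a GPU

# Stand-in with the same layer layout as the class above (default weight init).
model = nn.Sequential(
    nn.Linear(1, 16), nn.Sigmoid(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 1),
).to(device)

# Assumed training data: 256 evenly spaced points over one period of sin(x).
x = torch.linspace(-math.pi, math.pi, 256, device=device).unsqueeze(1)
y = torch.sin(x)

loss_fn = nn.MSELoss()                                     # assumed loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # assumed optimizer and learning rate

losses = []
for step in range(500):          # assumed step count, kept small for a quick run
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass and error
    loss.backward()              # backpropagation
    optimizer.step()             # parameter update
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Creating the data directly on `device` and moving the model there with `.to(device)` keeps every tensor in the same place, so the same script runs with or without a GPU.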
What's Next?
Now for the exciting part. Having cleared this first challenge, I wanted to find a much more complex training task.
Here is a quick sneak peek at what's coming next: the Polyniod platform and a custom Gen0 robot built around an ESP32 and a bunch of the cheapest parts and hot glue, full of non-linear behaviors and challenges. Stay tuned!



