Rahul Sangamker

Originally published at rs-03.github.io

I Put a Neural Network Inside My Portfolio: No TensorFlow, No Server, 145 KB

Training a network from scratch in raw NumPy, quantizing it to int8, and running it as ~80 lines of dependency-free JavaScript, with a parity test proving the browser matches Python to about 1e-6.

Why bother? MNIST is a solved problem

Digit recognition is the "hello world" of ML, and that's exactly why I used it. The model isn't the point. The point is everything around the model, which happens to be the part that matters in production work too: training without a framework, compressing for deployment, running inference in a constrained environment, and proving the deployed system matches the trained one.

Training: just NumPy and math

The network is a 784→128→64→10 MLP: hand-written forward pass, backpropagation, and Adam optimizer. No autograd, no framework:

# backward pass, by hand
dz3 = (probs - y_batch) / batch_size
grads_w[2] = a2.T @ dz3
da2 = dz3 @ weights[2].T
dz2 = da2 * (z2 > 0)          # ReLU mask
grads_w[1] = a1.T @ dz2
...
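The first line of that backward pass relies on a standard identity: for softmax followed by cross-entropy, the gradient with respect to the logits is `probs - y`. Hand-written backprop is easy to get subtly wrong, so here is a minimal sketch of a finite-difference check for that one identity. It is plain Python with made-up logits, and the function names are illustrative, not taken from the training script.

```python
import math

def softmax(z):
    m = max(z)                                  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, label):
    return -math.log(softmax(z)[label])

def analytic_grad(z, label):
    # d(loss)/d(z_i) = p_i - y_i, where y is the one-hot label
    p = softmax(z)
    return [p_i - (1.0 if i == label else 0.0) for i, p_i in enumerate(p)]

def numeric_grad(z, label, eps=1e-5):
    # Central differences: nudge one logit at a time and watch the loss
    grad = []
    for i in range(len(z)):
        up = z[:i] + [z[i] + eps] + z[i + 1:]
        down = z[:i] + [z[i] - eps] + z[i + 1:]
        grad.append((cross_entropy(up, label) - cross_entropy(down, label)) / (2 * eps))
    return grad

logits = [0.3, -1.2, 2.0, 0.5]                  # made-up values for illustration
label = 2
max_gap = max(
    abs(a - n)
    for a, n in zip(analytic_grad(logits, label), numeric_grad(logits, label))
)
print(f"largest analytic-vs-numeric gap: {max_gap:.2e}")
```

The same nudge-and-compare approach extends to the weight gradients, at the cost of one loss evaluation per perturbed weight.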

[Figure: 784-128-64-10 network trained in raw NumPy, int8-quantized, run in the browser]

One trick that matters for a drawing demo specifically: shift augmentation. MNIST digits are centered; humans draw wherever they like. Training on randomly translated copies makes the model tolerant of sloppy placement. Combined with MNIST-style preprocessing at inference (crop to bounding box, scale into a 20×20 box, center by center-of-mass), real-world doodles classify reliably. Final test accuracy: 98.2%.
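To make the two centering ideas concrete, here is a rough sketch of a random shift for training and a center-of-mass recentering for inference. It is plain Python on lists of rows rather than NumPy arrays. The function names and the `max_shift=2` range are assumptions, since the shift range used in training is not stated here, and the bounding-box crop and 20×20 scaling steps are left out.

```python
import random

def shift_image(img, dx, dy):
    """Translate a 2D image (list of rows) by (dx, dy) pixels, filling with zeros."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out

def random_shift(img, max_shift=2, rng=random):
    # max_shift is an assumed value, not one taken from the article
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return shift_image(img, dx, dy)

def center_of_mass(img):
    """Return the intensity-weighted (row, col) centroid of a non-empty image."""
    total = sum(sum(row) for row in img)
    cy = sum(y * v for y, row in enumerate(img) for v in row) / total
    cx = sum(x * v for row in img for x, v in enumerate(row)) / total
    return cy, cx

def recenter(img):
    """Shift the image so its center of mass lands on the geometric center."""
    h, w = len(img), len(img[0])
    cy, cx = center_of_mass(img)
    return shift_image(img, round((w - 1) / 2 - cx), round((h - 1) / 2 - cy))

# A single bright pixel drawn in the corner ends up in the middle
dot = [[0.0] * 5 for _ in range(5)]
dot[0][0] = 1.0
centered = recenter(dot)
```

Augmentation teaches the model to tolerate offsets, while recentering removes most of the offset before the model sees the drawing, so the two cover for each other.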

Compression: int8 in 15 lines

A float32 weight file would be ~430 KB. Symmetric int8 quantization cuts it ~4×:

scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
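What those two lines buy is a bounded rounding error: each restored weight is off by at most half a quantization step (`scale / 2`). Here is a small sketch of the round trip in plain Python, with made-up weights and hypothetical function names, mirroring the NumPy version above.

```python
def quantize(weights):
    """Symmetric int8 quantization: one scale for the whole list of floats."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.013, 1.0]      # made-up values, not the trained weights
q, scale = quantize(weights)
restored = dequantize(q, scale)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, f"worst error {worst_error:.4f} vs half-step {scale / 2:.4f}")
```

Because the largest weight sets the scale, a single outlier widens the step for every other weight in the layer.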

Each layer gets one scale factor, and the weights are stored as base64 in JSON (base64 adds about a third on top of the raw int8 bytes), for 145 KB total. Quantized test accuracy is identical to float: 98.2%.
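The size figures can be roughly reproduced from the layer shapes alone. The sketch below assumes one byte per quantized weight and each layer base64-encoded separately. Those are assumptions about the file layout, and the helper names are hypothetical.

```python
import base64
import struct

LAYERS = [(784, 128), (128, 64), (64, 10)]     # (inputs, outputs) per layer

n_weights = sum(i * o for i, o in LAYERS)      # 109184
n_biases = sum(o for _, o in LAYERS)           # 202
float32_bytes = 4 * (n_weights + n_biases)     # 437544, i.e. the "~430 KB" figure
int8_bytes = n_weights                         # one byte per quantized weight
# base64 emits 4 ASCII characters for every 3 input bytes, rounded up
base64_chars = sum(4 * ((i * o + 2) // 3) for i, o in LAYERS)   # 145584

def encode_int8(q):
    """Pack signed 8-bit integers and base64-encode them for embedding in JSON."""
    return base64.b64encode(struct.pack(f"{len(q)}b", *q)).decode("ascii")

def decode_int8(text):
    """Inverse of encode_int8: base64 text back to a list of signed bytes."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"{len(raw)}b", raw))

sample = [50, -127, 1, 100]                    # made-up quantized values
roundtrip_ok = decode_int8(encode_int8(sample)) == sample
print(n_weights, float32_bytes, int8_bytes, base64_chars, roundtrip_ok)
```

Under these assumptions the encoded weights alone come to about 145.6 KB, which is where a figure like 145 KB would come from: the 4× saving applies to the raw bytes, and base64 gives some of it back in exchange for a file that is plain JSON.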

Inference: ~80 lines of plain JavaScript

In the browser, the weights are dequantized once on load, and inference is three matrix-vector products with ReLU and a softmax. That comes to ~109K multiply-adds, a microsecond-scale job for any modern device. No TensorFlow.js either: that runtime is megabytes, while the entire model is 145 KB.
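The shape of that inference loop is simple enough to sketch. This is a Python stand-in for the JavaScript engine, not a copy of it: the toy 2→2→2→2 weights are made up, the function names are hypothetical, and storing one weight row per output unit is an assumed layout.

```python
import math

def dequantize(q_rows, scale):
    """Turn int8 rows back into floats; done once, at load time."""
    return [[v * scale for v in row] for row in q_rows]

def matvec(w, b, x):
    # Assumed layout: one row of w per output unit
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(z):
    m = max(z)                                  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(layers, x):
    """layers: list of (weights, biases). ReLU after every layer but the last."""
    for w, b in layers[:-1]:
        x = relu(matvec(w, b, x))
    w, b = layers[-1]
    return softmax(matvec(w, b, x))

# Toy three-layer network with made-up int8 weights (not the article's model)
scale = 1.0 / 127.0
layers = [
    (dequantize([[127, 0], [0, -127]], scale), [0.0, 0.0]),
    (dequantize([[127, 0], [0, 127]], scale), [0.0, 0.0]),
    (dequantize([[127, 0], [0, 127]], scale), [0.0, 0.0]),
]
probs = predict(layers, [2.0, 3.0])
print(probs)
```

Dequantizing up front keeps the hot path in ordinary floating-point arithmetic, so the per-prediction code never touches the int8 representation.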

The part I'd actually show a hiring manager

Deployed-vs-trained drift is a real production failure mode, so the JS engine is tested against the Python model directly. Ten fixture digits and their expected probabilities are exported from training, then asserted in Node:

max prob diff vs Python: 1.14e-6
correct: 10/10
PARITY OK

If I change the inference code and break numerical equivalence, CI knows before a visitor does. That habit, verifying the deployment artifact and not just the training run, is worth more than another accuracy point.
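The comparison at the heart of such a check is small. Here is a sketch in Python rather than Node, with made-up two-class fixtures and an assumed tolerance of `1e-5`. The actual threshold and fixture format used by the parity test are not given in this post, and `parity_report` is a hypothetical name.

```python
def argmax(row):
    return row.index(max(row))

def parity_report(expected, actual, tol=1e-5):
    """Compare two lists of probability vectors, one pair per fixture."""
    max_diff = max(
        abs(e - a)
        for exp_row, act_row in zip(expected, actual)
        for e, a in zip(exp_row, act_row)
    )
    # Count fixtures where both implementations pick the same class
    agree = sum(argmax(e) == argmax(a) for e, a in zip(expected, actual))
    ok = max_diff <= tol and agree == len(expected)
    return max_diff, agree, ok

# Made-up fixtures for illustration; real ones would come from the training export
expected = [[0.9, 0.1], [0.2, 0.8]]
actual = [[0.9000005, 0.0999995], [0.2, 0.8]]
max_diff, agree, ok = parity_report(expected, actual)
status = "PARITY OK" if ok else "PARITY FAILED"
print(f"max prob diff: {max_diff:.2e}, agree: {agree}/{len(expected)}, {status}")
```

Checking the full probability vector, not just the predicted digit, is what catches small numerical drift before it grows large enough to flip a prediction.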

Try it (draw badly, it copes): rs-03.github.io/portfolio-website/demos
Source: github.com/rs-03/portfolio-website (training script, inference engine, and parity test).
