Training a network from scratch in raw NumPy, quantizing it to int8, and running it as ~80 lines of dependency-free JavaScript, with a parity test showing the browser engine matches Python to within about 1e-6.
Why bother? MNIST is a solved problem
Digit recognition is the "hello world" of ML, and that's exactly why I used it. The model isn't the point. The point is everything around the model, which happens to be the part that matters in production work too: training without a framework, compressing for deployment, running inference in a constrained environment, and proving the deployed system matches the trained one.
Training: just NumPy and math
The network is a 784→128→64→10 MLP: hand-written forward pass, backpropagation, and Adam optimizer. No autograd, no framework:
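# backward pass, by hand (softmax + cross-entropy, averaged over the batch)
dz3 = (probs - y_batch) / batch_size  # y_batch is one-hot, shape (batch, 10)
grads_w[2] = a2.T @ dz3               # gradient for the 64→10 weights
da2 = dz3 @ weights[2].T              # back through the output layer
dz2 = da2 * (z2 > 0) # ReLU mask
grads_w[1] = a1.T @ dz2               # gradient for the 128→64 weights
...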
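For reference, the lines above follow from the usual softmax and cross-entropy derivation. This is a sketch that assumes the loss is the mean cross-entropy over a batch of size $B$ with one-hot labels $Y$ (the loss itself is not shown in the snippet), where $P$, $A_2$, $Z_2$, and $W_3$ correspond to `probs`, `a2`, `z2`, and `weights[2]`:

```latex
\begin{aligned}
\mathcal{L} &= -\frac{1}{B}\sum_{n=1}^{B}\sum_{k=1}^{10} Y_{nk}\,\log P_{nk},
  \qquad P = \operatorname{softmax}(Z_3) \\
\frac{\partial \mathcal{L}}{\partial Z_3} &= \frac{P - Y}{B} \\
\frac{\partial \mathcal{L}}{\partial W_3} &= A_2^{\top}\,\frac{\partial \mathcal{L}}{\partial Z_3} \\
\frac{\partial \mathcal{L}}{\partial A_2} &= \frac{\partial \mathcal{L}}{\partial Z_3}\,W_3^{\top} \\
\frac{\partial \mathcal{L}}{\partial Z_2} &= \frac{\partial \mathcal{L}}{\partial A_2}\odot \mathbb{1}[Z_2 > 0]
\end{aligned}
```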
One trick that matters for a drawing demo specifically: shift augmentation. MNIST digits are centered; humans draw wherever they like. Training on randomly translated copies makes the model tolerant of sloppy placement. Combined with MNIST-style preprocessing at inference (crop to bounding box, scale into a 20×20 box, center by center-of-mass), real-world doodles classify reliably. Final test accuracy: 98.2%.
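The center-of-mass step of that preprocessing can be sketched in a few lines of plain JavaScript. This is an illustration, not the code from the repo: `translate` and `centerByMass` are hypothetical names, the image is assumed to be a row-major 28×28 `Float32Array` with ink as positive values, and the bounding-box crop and 20×20 rescale are assumed to have happened already.

```javascript
// Hypothetical helper: translate a row-major size×size image by (dx, dy).
// Pixels shifted in from outside the frame are filled with 0.
function translate(img, size, dx, dy) {
  const out = new Float32Array(size * size);
  for (let y = 0; y < size; y++) {
    const sy = y - dy; // source row
    if (sy < 0 || sy >= size) continue;
    for (let x = 0; x < size; x++) {
      const sx = x - dx; // source column
      if (sx < 0 || sx >= size) continue;
      out[y * size + x] = img[sy * size + sx];
    }
  }
  return out;
}

// Hypothetical helper: shift the image so its intensity-weighted centroid
// lands on the geometric center of the frame (rounded to whole pixels).
function centerByMass(img, size = 28) {
  let total = 0, sumX = 0, sumY = 0;
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      const v = img[y * size + x];
      total += v;
      sumX += v * x;
      sumY += v * y;
    }
  }
  if (total === 0) return Float32Array.from(img); // blank canvas: nothing to center
  const center = (size - 1) / 2; // 13.5 for a 28×28 frame
  const dx = Math.round(center - sumX / total);
  const dy = Math.round(center - sumY / total);
  return translate(img, size, dx, dy);
}
```

The same kind of whole-pixel translation with zero fill is what shift augmentation applies at training time, just with random offsets instead of a computed one.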
Compression: int8 in 15 lines
A float32 weight file would be ~430 KB. Symmetric int8 quantization cuts it ~4×:
scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
One scale factor per layer, weights stored as base64 in JSON. The raw int8 weights are ~109 KB and base64 adds about a third, so the file comes to 145 KB total. Quantized test accuracy is identical to float: 98.2%.
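On the consuming side, unpacking such a file is short. The exact JSON layout isn't shown here, so this sketch assumes each layer is stored as `{ scale, b64 }`, where `b64` is the base64 of the raw int8 bytes; `decodeInt8` and `dequantize` are hypothetical names.

```javascript
// Decode a base64 string into signed 8-bit integers.
// Uses atob where available (browsers, recent Node), else falls back to Buffer.
function decodeInt8(b64) {
  const bin = typeof atob === "function"
    ? atob(b64)
    : Buffer.from(b64, "base64").toString("binary");
  const q = new Int8Array(bin.length);
  for (let i = 0; i < bin.length; i++) {
    // charCodeAt gives 0..255; storing into an Int8Array wraps to -128..127.
    q[i] = bin.charCodeAt(i);
  }
  return q;
}

// Symmetric dequantization: w ≈ q * scale, with one scale per layer.
// `layer` is assumed to look like { scale: number, b64: string }.
function dequantize(layer) {
  const q = decodeInt8(layer.b64);
  const w = new Float32Array(q.length);
  for (let i = 0; i < q.length; i++) w[i] = q[i] * layer.scale;
  return w;
}
```

Because the scale comes from the largest absolute weight, no weight is clipped, and rounding moves each one by at most half a quantization step (`scale / 2`).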
Inference: ~80 lines of plain JavaScript
In the browser, the weights are dequantized once on load, and inference is three matrix-vector products with ReLU and a softmax. That is ~109K multiply-adds (784×128 + 128×64 + 64×10 = 109,184), a microsecond-scale problem for any modern device. No TensorFlow.js (that runtime is megabytes; the entire model is 145 KB).
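A forward pass of that shape fits comfortably in plain JavaScript. The sketch below is not the engine from the repo: the names (`dense`, `softmax`, `predict`) and the layer object `{ W, b, nIn, nOut }` are assumptions, and weights are assumed to be row-major with shape (inputs × outputs), which matches a NumPy `a @ W` layout.

```javascript
// Dense layer: out[j] = b[j] + sum_i x[i] * W[i * nOut + j]
function dense(x, W, b, nIn, nOut) {
  const out = new Float32Array(nOut);
  for (let j = 0; j < nOut; j++) out[j] = b[j];
  for (let i = 0; i < nIn; i++) {
    const xi = x[i];
    if (xi === 0) continue; // blank pixels contribute nothing
    const row = i * nOut;
    for (let j = 0; j < nOut; j++) out[j] += xi * W[row + j];
  }
  return out;
}

// Numerically stable softmax: subtract the max logit before exponentiating.
function softmax(z) {
  let max = -Infinity;
  for (let k = 0; k < z.length; k++) if (z[k] > max) max = z[k];
  const out = new Float32Array(z.length);
  let sum = 0;
  for (let k = 0; k < z.length; k++) {
    out[k] = Math.exp(z[k] - max);
    sum += out[k];
  }
  for (let k = 0; k < z.length; k++) out[k] /= sum;
  return out;
}

// Run all layers: ReLU after every layer except the last, then softmax.
function predict(x, layers) {
  let a = x;
  for (let l = 0; l < layers.length; l++) {
    const { W, b, nIn, nOut } = layers[l];
    a = dense(a, W, b, nIn, nOut);
    if (l < layers.length - 1) {
      for (let j = 0; j < a.length; j++) if (a[j] < 0) a[j] = 0;
    }
  }
  return softmax(a);
}
```

For the network described here, `layers` would hold three entries with sizes 784→128, 128→64, and 64→10, and `x` would be the 784 preprocessed pixel values.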
The part I'd actually show a hiring manager
Deployed-vs-trained drift is a real production failure mode, so the JS engine is tested against the Python model directly. Ten fixture digits and their expected probabilities are exported from training, then asserted in Node:
max prob diff vs Python: 1.14e-6
correct: 10/10
PARITY OK
If I change the inference code and break numerical equivalence, CI knows before a visitor does. That habit, verifying the deployment artifact and not just the training run, is worth more than another accuracy point.
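A check of this kind needs very little code. The following is a sketch rather than the repo's test: the fixture shape `{ input, expected, label }`, the names `argmax` and `checkParity`, and the default tolerance of `1e-5` are all assumptions chosen for illustration.

```javascript
// Index of the largest value in an array-like.
function argmax(v) {
  let best = 0;
  for (let k = 1; k < v.length; k++) if (v[k] > v[best]) best = k;
  return best;
}

// Compare an inference function against probabilities exported from training.
// `fixtures` is assumed to be an array of { input, expected, label } objects.
function checkParity(predict, fixtures, tol = 1e-5) {
  let maxDiff = 0;
  let correct = 0;
  for (const f of fixtures) {
    const probs = predict(f.input);
    if (probs.length !== f.expected.length) {
      throw new Error("output length mismatch");
    }
    for (let k = 0; k < probs.length; k++) {
      maxDiff = Math.max(maxDiff, Math.abs(probs[k] - f.expected[k]));
    }
    if (argmax(probs) === f.label) correct++;
  }
  // Pass only if every probability is within tolerance and every label matches.
  const ok = maxDiff <= tol && correct === fixtures.length;
  return { maxDiff, correct, total: fixtures.length, ok };
}
```

In CI, a wrapper script could print `maxDiff` and `correct`, then exit non-zero when `ok` is false so the build fails.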
Try it (draw badly, it copes): rs-03.github.io/portfolio-website/demos
Source: github.com/rs-03/portfolio-website (training script, inference engine, and parity test).
