<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tomohisa Iura</title>
    <description>The latest articles on DEV Community by Tomohisa Iura (@tomoiura).</description>
    <link>https://dev.to/tomoiura</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3883635%2Fea5dc161-f2af-456c-8d7d-298d7c673a71.png</url>
      <title>DEV Community: Tomohisa Iura</title>
      <link>https://dev.to/tomoiura</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tomoiura"/>
    <language>en</language>
    <item>
      <title>Draw a Digit and Watch the Neural Network Think in Real Time</title>
      <dc:creator>Tomohisa Iura</dc:creator>
      <pubDate>Fri, 17 Apr 2026 04:56:35 +0000</pubDate>
      <link>https://dev.to/tomoiura/draw-a-digit-and-watch-the-neural-network-think-in-real-time-3oe8</link>
      <guid>https://dev.to/tomoiura/draw-a-digit-and-watch-the-neural-network-think-in-real-time-3oe8</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;"A neural network can recognize digits" — but what's actually happening inside?&lt;/p&gt;

&lt;p&gt;I built a tool where you &lt;strong&gt;draw a digit with your finger or mouse, and watch the CNN (Convolutional Neural Network) recognize it in real time&lt;/strong&gt;, with the internal signal flow visualized as it happens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://tomoiura.github.io/digit_recognizer/" rel="noopener noreferrer"&gt;Try the Demo&lt;/a&gt;&lt;/strong&gt; (runs in your browser — no install needed)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3u5y3vradaxau75tyx5d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3u5y3vradaxau75tyx5d.png" alt="screenshot" width="800" height="864"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/Tomoiura/digit_recognizer" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is This?
&lt;/h2&gt;

&lt;p&gt;A tool that lets you see &lt;strong&gt;how a neural network makes its decisions&lt;/strong&gt; as you draw handwritten digits.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three Visualizations
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dial-style Heatmap&lt;/strong&gt; — Digits 0–9 arranged like a phone dial, with color intensity showing confidence in real time. As you draw, you can see the network thinking: "looks like an 8... wait, now it's a 3."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Network Diagram&lt;/strong&gt; — Input → Conv1 → Conv2 → FC → Output nodes and links light up orange based on signal strength. You can trace exactly which pathways the signal took to reach the answer.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CNN Input Preview&lt;/strong&gt; — Shows how your drawing gets downscaled to 28×28 pixels. This is what the network actually "sees."&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Not an Emulation — The Real Thing
&lt;/h3&gt;

&lt;p&gt;This is not a simulation or replay. A &lt;strong&gt;real CNN with 27,690 parameters&lt;/strong&gt; is running in your browser. Every time you draw a stroke, actual convolutions, ReLU activations, max-pooling, and fully-connected layer computations are executed, and the intermediate values are visualized directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;My previous project, &lt;a href="https://github.com/Tomoiura/transformer-emulator" rel="noopener noreferrer"&gt;Transformer Emulator&lt;/a&gt;, visualized the internals of a Transformer. But that was a "watch" experience — replaying pre-computed results.&lt;/p&gt;

&lt;p&gt;This time, I wanted a &lt;strong&gt;"touch" experience&lt;/strong&gt;. You draw a digit, and the network reacts instantly. The probabilities shift as you draw. The moment when "I'm drawing a 3 but the network thinks it's an 8" — that's something no textbook can give you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happens While You Draw
&lt;/h2&gt;

&lt;p&gt;On every &lt;code&gt;pointermove&lt;/code&gt; event while drawing, the following pipeline runs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Canvas → 28×28 downscale&lt;/strong&gt; — Bounding box detection, center-of-mass alignment. Same preprocessing as MNIST.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CNN inference (JavaScript)&lt;/strong&gt; — Conv → ReLU → MaxPool → Conv → ReLU → MaxPool → FC → Softmax. Pure matrix operations in vanilla JavaScript.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualization update&lt;/strong&gt; — Intermediate activations from each layer drive the dial colors and network diagram node/link brightness.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For a CNN this size on 28×28 input, inference completes in &lt;strong&gt;a few milliseconds&lt;/strong&gt; — fast enough to run on every stroke without dropping frames.&lt;/p&gt;
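The ReLU and softmax steps in the pipeline above reduce to a few lines of arithmetic. A minimal Python sketch of just that math (the project itself runs the equivalent in vanilla JavaScript, so this is illustrative, not the actual implementation):

```python
# Illustrative Python versions of the element-wise math in step 2.
import math

def relu(xs):
    # Zero out negative activations.
    return [max(0.0, x) for x in xs]

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(relu([2.0, -1.0, 0.5]))
print(sum(probs))  # the class probabilities always sum to 1
```

The max-subtraction in `softmax` matters even at this scale: exponentiating raw logits can overflow, while shifted logits cannot.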

&lt;h2&gt;
  
  
  Things I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. CNNs "Hesitate"
&lt;/h3&gt;

&lt;p&gt;Watching the probability shift while drawing a "3":&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Drawing stage&lt;/th&gt;
&lt;th&gt;Prediction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Drew a vertical line&lt;/td&gt;
&lt;td&gt;1: 30%, 7: 25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Closed the top curve&lt;/td&gt;
&lt;td&gt;8: 55%, 9: 20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opened the bottom&lt;/td&gt;
&lt;td&gt;3: 60%, 8: 22%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Finished drawing&lt;/td&gt;
&lt;td&gt;3: 92%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The intermediate states genuinely look like an 8. The CNN's "hesitation" matches human intuition — its partial guesses are reasonable given the strokes on the canvas so far.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Preprocessing Makes or Breaks Accuracy
&lt;/h3&gt;

&lt;p&gt;In the first version, a drawn "4" kept getting classified as a "7". The cause was missing preprocessing: MNIST data is center-of-mass aligned, but I was naively downscaling the canvas to 28×28. Adding MNIST-compliant preprocessing (bounding box detection → center alignment → fit into a 20×20 region) fixed it immediately.&lt;/p&gt;
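A minimal sketch of that preprocessing in NumPy, assuming a 2-D grayscale array with ink values greater than zero. The helper name and the nearest-neighbor resize are mine, chosen for brevity, not the project's actual code:

```python
# Crop to the ink's bounding box, resize the crop so it fits a 20x20
# region, then paste into a 28x28 canvas with the center of mass at (14, 14).
import numpy as np

def center_like_mnist(img):
    ys, xs = np.nonzero(img)
    crop = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Scale so the longer side of the crop becomes 20 pixels
    # (nearest-neighbor resampling, for brevity).
    scale = 20.0 / max(crop.shape)
    h = max(1, round(crop.shape[0] * scale))
    w = max(1, round(crop.shape[1] * scale))
    rows = (np.arange(h) / scale).astype(int).clip(0, crop.shape[0] - 1)
    cols = (np.arange(w) / scale).astype(int).clip(0, crop.shape[1] - 1)
    small = crop[np.ix_(rows, cols)]

    # Shift so the ink's center of mass lands on the center pixel.
    ys2, xs2 = np.nonzero(small)
    weights = small[ys2, xs2].astype(float)
    cy = (ys2 * weights).sum() / weights.sum()
    cx = (xs2 * weights).sum() / weights.sum()
    top = min(max(int(round(14 - cy)), 0), 28 - h)
    left = min(max(int(round(14 - cx)), 0), 28 - w)

    out = np.zeros((28, 28), dtype=img.dtype)
    out[top:top + h, left:left + w] = small
    return out
```

With something like this in place, a digit drawn in a corner of the canvas reaches the network in the same centered framing as the MNIST training data.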

&lt;h3&gt;
  
  
  3. 27,690 Parameters, 98% Accuracy
&lt;/h3&gt;

&lt;p&gt;GPT-4 reportedly has ~1.8 trillion parameters. This CNN is about &lt;strong&gt;1/65,000,000th&lt;/strong&gt; of that size. Yet it achieves 98.04% test accuracy. "Choose the right architecture (convolutions) and you can get high accuracy with minimal parameters" — this is the essence of CNNs, and here you can feel it firsthand.&lt;/p&gt;
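The 27,690 figure checks out from the layer sizes (weights plus biases per layer, for the architecture described in the Model Architecture section; the 400 is the flattened 5×5×16 feature map left after the second pooling):

```python
# Parameter count: weights + biases for each trainable layer.
conv1 = 5 * 5 * 1 * 8 + 8       # 208: 5x5 kernels over 1 input channel, 8 filters
conv2 = 3 * 3 * 8 * 16 + 16     # 1,168: 3x3 kernels over 8 channels, 16 filters
fc1   = 400 * 64 + 64           # 25,664: flattened 5x5x16 feature map -> 64 units
fc2   = 64 * 10 + 10            # 650: 64 units -> 10 digit classes
print(conv1 + conv2 + fc1 + fc2)  # -> 27690
```

Notice that the small fully-connected layer still holds over 90% of the parameters; the convolutions do most of the feature extraction almost for free.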

&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Training&lt;/td&gt;
&lt;td&gt;Python / pure NumPy&lt;/td&gt;
&lt;td&gt;No PyTorch; all backpropagation implemented from scratch, for educational purposes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;td&gt;Vanilla JavaScript&lt;/td&gt;
&lt;td&gt;Runs entirely in the browser. No external libraries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visualization&lt;/td&gt;
&lt;td&gt;SVG + Canvas + CSS&lt;/td&gt;
&lt;td&gt;Network diagram in SVG, drawing and preview in Canvas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;td&gt;Single HTML file (~620KB)&lt;/td&gt;
&lt;td&gt;Trained weights embedded as JSON. Easy to distribute&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Model Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Conv(5x5, 8ch) → ReLU → MaxPool(2)    # Detect 8 types of features from the image
Conv(3x3, 16ch) → ReLU → MaxPool(2)   # Combine into 16 higher-level features
Flatten(400) → FC(64) → ReLU          # Integrate all features for judgment
FC(10) → Softmax                       # Output probabilities for 0–9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
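Assuming "valid" convolutions (stride 1, no padding) and non-overlapping 2×2 pooling, the spatial sizes work out to exactly the 400-dimensional flatten above — a quick sanity check:

```python
def conv_out(size, kernel):
    # Output side length of a "valid" convolution with stride 1.
    return size - kernel + 1

def pool_out(size, window=2):
    # Output side length of non-overlapping max-pooling.
    return size // window

side = 28                           # the downscaled input
side = pool_out(conv_out(side, 5))  # Conv(5x5) -> MaxPool(2): 24 -> 12
side = pool_out(conv_out(side, 3))  # Conv(3x3) -> MaxPool(2): 10 -> 5
print(side * side * 16)             # -> 400 (16 channels of 5x5 maps)
```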



&lt;h3&gt;
  
  
  Network Diagram Implementation
&lt;/h3&gt;

&lt;p&gt;Nodes for each layer are placed in SVG, with &lt;code&gt;&amp;lt;line&amp;gt;&lt;/code&gt; elements connecting adjacent layers. During inference, activation values update each node's &lt;code&gt;fill&lt;/code&gt; and each link's &lt;code&gt;stroke-opacity&lt;/code&gt;, making the signal flow visible.&lt;/p&gt;

&lt;p&gt;There are 552 links total, but most have &lt;code&gt;opacity&lt;/code&gt; near 0 — visually, only the active pathways light up.&lt;/p&gt;
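One way to get that "only active pathways light up" effect is to normalize each link's activation by its layer's maximum and snap small values to zero. A hypothetical helper (the names and the floor value are mine, not the project's):

```python
def link_opacity(activation, layer_max, floor=0.05):
    # Normalize by the strongest activation in the layer so opacities
    # span [0, 1]; anything below the floor becomes fully transparent.
    if layer_max <= 0:
        return 0.0
    opacity = max(0.0, activation) / layer_max
    return opacity if opacity >= floor else 0.0

print(link_opacity(0.9, 1.8))   # strong pathway -> 0.5
print(link_opacity(0.01, 1.8))  # weak pathway -> 0.0
```

Flooring to exactly zero, rather than leaving a faint trace, is what keeps a diagram with hundreds of links readable.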

&lt;h2&gt;
  
  
  Multilingual Support
&lt;/h2&gt;

&lt;p&gt;A toggle button next to the title switches between Japanese and English. The initial language is auto-detected from the browser's language setting, and can also be set via URL parameter (&lt;code&gt;?lang=en&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Since there are few text elements, a JS dictionary holds both languages and a button click swaps all text instantly — even mid-drawing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Live Demo
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://tomoiura.github.io/digit_recognizer/" rel="noopener noreferrer"&gt;https://tomoiura.github.io/digit_recognizer/&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Just open it in your browser.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build from Source
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Tomoiura/digit_recognizer.git
&lt;span class="nb"&gt;cd &lt;/span&gt;digit_recognizer
pip &lt;span class="nb"&gt;install &lt;/span&gt;numpy
python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First run downloads MNIST data and trains the model (takes a few minutes). Subsequent runs use cached weights and complete in seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;My previous Transformer Emulator was about "watching AI learn." This project is about "drawing with your own hand and feeling AI react in real time."&lt;/p&gt;

&lt;p&gt;Instead of formulas or diagrams, the answer to "what is a neural network doing?" comes through &lt;strong&gt;touching, seeing, and feeling&lt;/strong&gt;. That's the experience I was aiming for.&lt;/p&gt;

&lt;p&gt;If you find technical errors or have suggestions, &lt;a href="https://github.com/Tomoiura/digit_recognizer" rel="noopener noreferrer"&gt;Issues and PRs&lt;/a&gt; are welcome.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Related&lt;/strong&gt;: &lt;a href="https://github.com/Tomoiura/transformer-emulator" rel="noopener noreferrer"&gt;Transformer Emulator&lt;/a&gt; — Visualize the internals of a Transformer decoder, also running in the browser.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>neuralnetwork</category>
      <category>visualization</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
