Machine learning can sound intimidating, but let’s break it down step by step. In this article, we’ll explore how a neural network can recognize handwritten digits (0–9). Don’t worry if you’re starting with zero knowledge — this guide is designed for you.
✍️ The Problem
We want a computer to look at an image of a handwritten digit and correctly identify it.
Examples of where this is used:
- Reading postal codes on envelopes.
- Recognizing amounts on bank checks.
- Digitizing handwritten notes.
This task is called digit recognition.
🔢 Classification Explained
- Classification = sorting things into categories.
- Example: Is this email "spam" or "not spam"?
- For digit recognition, the categories are the digits `0–9`.
- That means we're solving a multi-class classification problem (10 possible classes).
🖼️ How Computers See Digits
- Images are made of tiny squares called pixels.
- Each pixel has a value (brightness).
- A `28x28` image has 784 pixels.
- The neural network looks at these pixel values to decide which digit it is.
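To make "784 pixels" concrete, here's a minimal sketch using NumPy (an extra library beyond the tools listed below) showing that an image is just a grid of numbers the network can flatten into one long list:

```python
import numpy as np

# A stand-in 28x28 "image": each entry is a brightness from 0 (black) to 255 (white)
image = np.random.randint(0, 256, size=(28, 28))

print(image.shape)            # (28, 28)
print(image.flatten().shape)  # (784,) — the 784 values the network actually sees
```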
🏗️ Anatomy of a Neural Network
- **Input Layer**
  - Takes in pixel values (e.g., 784 inputs for a `28x28` image).
- **Hidden Layers**
  - Transform inputs into meaningful features.
  - Learn shapes like curves, lines, and loops that make up digits.
- **Output Layer**
  - Produces probabilities for each digit (0–9).
  - Example: "This looks 80% like a 3, 15% like an 8, 5% like a 5."
  - The digit with the highest probability is chosen.
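To see that last step in code, here's a rough illustration with plain NumPy (the probabilities are made up to match the example above; the real network computes them itself):

```python
import numpy as np

# Hypothetical output-layer probabilities for digits 0–9
# (80% "3", 5% "5", 15% "8" — positions correspond to digits)
probs = np.array([0.0, 0.0, 0.0, 0.80, 0.0, 0.05, 0.0, 0.0, 0.15, 0.0])

predicted_digit = np.argmax(probs)  # index of the highest probability
print(predicted_digit)              # 3
```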
📚 Training the Neural Network
- Training = teaching the network using examples.
- We show it thousands of digit images with correct answers.
- The network adjusts itself to improve accuracy.
- Eventually, it can recognize digits it has never seen before.
🛠️ Tools You’ll Use
- Python → beginner‑friendly programming language.
- TensorFlow / Keras → libraries to build neural networks.
- MNIST dataset → famous dataset of handwritten digits used for practice.
💻 Hands‑On Example (Python Code with Explanations)
```python
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical
```
**Import libraries:**
- `tensorflow` → the main machine learning library we're using.
- `mnist` → the dataset of handwritten digits.
- `Sequential` → lets us build a neural network layer by layer.
- `Dense`, `Flatten` → types of layers we'll use.
- `to_categorical` → converts labels into one-hot encoding.
```python
(x_train, y_train), (x_test, y_test) = mnist.load_data()
```

**Load the dataset:**
- `x_train` → images used for training.
- `y_train` → correct answers (labels) for training.
- `x_test`, `y_test` → images and labels for testing.
- Each image is `28x28` pixels.
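If you print the shapes after loading, you can confirm what the dataset contains (these are the standard MNIST sizes):

```python
print(x_train.shape)  # (60000, 28, 28) — 60,000 training images
print(x_test.shape)   # (10000, 28, 28) — 10,000 test images
```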
```python
x_train = x_train / 255.0
x_test = x_test / 255.0
```

**Normalize pixel values:**
- Pixel values range from `0` (black) to `255` (white).
- Dividing by 255 scales them to between `0` and `1`.
- This makes training easier and faster because the numbers are small and consistent.
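A quick sanity check, if you want to confirm the scaling worked:

```python
# After normalization, values should span 0.0 to 1.0
print(x_train.min(), x_train.max())  # 0.0 1.0
```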
```python
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```

**Convert labels to one-hot encoding:**
- Original labels are just numbers like `3`, `7`, `9`.
- Neural networks work better when labels are represented as vectors.
- Example:
  - Label `3` → `[0,0,0,1,0,0,0,0,0,0]`
  - Label `7` → `[0,0,0,0,0,0,0,1,0,0]`
- This is called one-hot encoding because only one position is "hot" (set to 1).
- Why? Because the output layer has 10 neurons (one for each digit). The network needs labels in the same format to compare predictions with the correct answer.
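You can see the encoding for yourself by converting a single label:

```python
# Encode the label 3 on its own — the "1" lands at position 3
print(to_categorical(3, 10))  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```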
```python
model = Sequential([
    Flatten(input_shape=(28, 28)),   # Input layer
    Dense(128, activation='relu'),   # Hidden layer
    Dense(10, activation='softmax')  # Output layer
])
```

**Build the neural network:**
- `Flatten` → turns the 28x28 image into a list of 784 numbers.
- `Dense(128, relu)` → hidden layer with 128 neurons; `relu` helps the network learn complex patterns.
- `Dense(10, softmax)` → output layer with 10 neurons (one per digit); `softmax` converts outputs into probabilities that add up to 1.
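If you're curious what you just built, Keras can print the layers and their parameter counts:

```python
# Prints each layer, its output shape, and the number of trainable parameters
model.summary()
```

For this model you should see about 101,770 trainable parameters: 784×128 weights plus 128 biases in the hidden layer, then 128×10 weights plus 10 biases in the output layer.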
```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

**Compile the model:**
- `optimizer='adam'` → decides how the network updates itself during training.
- `loss='categorical_crossentropy'` → measures how far off predictions are.
- `metrics=['accuracy']` → tells us how often the model is correct.
```python
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.1)
```

**Train the model:**
- `epochs=5` → the model sees the entire dataset 5 times.
- `batch_size=32` → processes 32 images at a time before updating itself.
- `validation_split=0.1` → uses 10% of the training data to check progress during training.
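An optional variation: `fit` returns a history object, so if you keep its return value you can inspect how accuracy improved epoch by epoch.

```python
# Same call as above, just keeping the return value
history = model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.1)

# Per-epoch accuracy on the training data and the 10% validation split
print(history.history['accuracy'])
print(history.history['val_accuracy'])
```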
```python
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.2f}")
```

**Evaluate the model:**
- Tests the trained network on unseen data (`x_test`, `y_test`).
- Prints accuracy (e.g., `0.98` → 98% correct predictions).
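Once training is done, you can also ask the model about a single image. A minimal sketch (NumPy assumed for `argmax`):

```python
import numpy as np

# Predict the digit for the first test image
probs = model.predict(x_test[:1])   # shape (1, 10): one probability per digit
predicted_digit = np.argmax(probs[0])

true_digit = np.argmax(y_test[0])   # y_test is one-hot encoded, so decode it
print(f"Predicted: {predicted_digit}, actual: {true_digit}")
```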
🌍 Why This Matters
Digit recognition is a classic beginner project in machine learning because:
- It’s easy to understand.
- It’s visual (you can see the digits).
- It teaches the basics of how neural networks work.
Once you grasp this, you can move on to more complex tasks like recognizing faces, objects, or even handwriting styles.
📝 Key Takeaways
- Neural networks learn patterns from data.
- Digit recognition is a multi‑class classification problem.
- Images are made of pixels, and the network learns features step by step.
- Training requires lots of examples.
- The MNIST dataset is the perfect playground for beginners.
- One‑hot encoding is essential because it matches labels to the output layer format.