Machine learning can sound intimidating, but let’s break it down step by step. In this article, we’ll explore how a neural network can recognize handwritten digits (0–9). Don’t worry if you’re starting with zero knowledge — this guide is designed for you.
✍️ The Problem
We want a computer to look at an image of a handwritten digit and correctly identify it.
Examples of where this is used:
- Reading postal codes on envelopes.
- Recognizing amounts on bank checks.
- Digitizing handwritten notes.
This task is called digit recognition.
🔢 Classification Explained
- Classification = sorting things into categories.
- Example: Is this email "spam" or "not spam"?
- For digit recognition, the categories are the digits `0–9`.
- That means we're solving a multi-class classification problem (10 possible classes).
🖼️ How Computers See Digits
- Images are made of tiny squares called pixels.
- Each pixel has a value (brightness).
- A `28x28` image has 784 pixels.
- The neural network looks at these pixel values to decide which digit it is.
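To make "784 pixels" concrete, here's a minimal sketch using NumPy (an extra library beyond the tools listed below) showing that an image is just a grid of numbers the network can flatten into one long list:

```python
import numpy as np

# A stand-in 28x28 "image": each entry is a brightness from 0 (black) to 255 (white)
image = np.random.randint(0, 256, size=(28, 28))

print(image.shape)            # (28, 28)
print(image.flatten().shape)  # (784,) — the 784 values the network actually sees
```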
🏗️ Anatomy of a Neural Network
- **Input Layer**
  - Takes in pixel values (e.g., 784 inputs for a `28x28` image).
- **Hidden Layers**
  - Transform inputs into meaningful features.
  - Learn shapes like curves, lines, and loops that make up digits.
- **Output Layer**
  - Produces probabilities for each digit (0–9).
  - Example: "This looks 80% like a 3, 15% like an 8, 5% like a 5."
  - The digit with the highest probability is chosen.
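To see that last step in code, here's a rough illustration with plain NumPy (the probabilities are made up to match the example above; the real network computes them itself):

```python
import numpy as np

# Hypothetical output-layer probabilities for digits 0–9
# (80% "3", 5% "5", 15% "8" — positions correspond to digits)
probs = np.array([0.0, 0.0, 0.0, 0.80, 0.0, 0.05, 0.0, 0.0, 0.15, 0.0])

predicted_digit = np.argmax(probs)  # index of the highest probability
print(predicted_digit)              # 3
```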
📚 Training the Neural Network
- Training = teaching the network using examples.
- We show it thousands of digit images with correct answers.
- The network adjusts itself to improve accuracy.
- Eventually, it can recognize digits it has never seen before.
🛠️ Tools You’ll Use
- Python → beginner‑friendly programming language.
- TensorFlow / Keras → libraries to build neural networks.
- MNIST dataset → famous dataset of handwritten digits used for practice.
💻 Hands‑On Example (Python Code with Explanations)
```python
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical
```
**Import libraries:**
- `tensorflow` → the main machine learning library we're using.
- `mnist` → the dataset of handwritten digits.
- `Sequential` → lets us build a neural network layer by layer.
- `Dense`, `Flatten` → types of layers we'll use.
- `to_categorical` → converts labels into one-hot encoding.
```python
(x_train, y_train), (x_test, y_test) = mnist.load_data()
```

**Load the dataset:**
- `x_train` → images used for training.
- `y_train` → correct answers (labels) for training.
- `x_test`, `y_test` → images and labels for testing.
- Each image is `28x28` pixels.
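If you print the shapes after loading, you can confirm what the dataset contains (these are the standard MNIST sizes):

```python
print(x_train.shape)  # (60000, 28, 28) — 60,000 training images
print(x_test.shape)   # (10000, 28, 28) — 10,000 test images
```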
```python
x_train = x_train / 255.0
x_test = x_test / 255.0
```

**Normalize pixel values:**
- Pixel values range from `0` (black) to `255` (white).
- Dividing by 255 scales them to between `0` and `1`.
- This makes training easier and faster because the numbers are small and consistent.
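A quick sanity check, if you want to confirm the scaling worked:

```python
# After normalization, values should span 0.0 to 1.0
print(x_train.min(), x_train.max())  # 0.0 1.0
```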
```python
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```

**Convert labels to one-hot encoding:**
- Original labels are just numbers like `3`, `7`, `9`.
- Neural networks work better when labels are represented as vectors.
- Example:
  - Label `3` → `[0,0,0,1,0,0,0,0,0,0]`
  - Label `7` → `[0,0,0,0,0,0,0,1,0,0]`
- This is called one-hot encoding because only one position is "hot" (set to 1).
- Why? Because the output layer has 10 neurons (one for each digit). The network needs labels in the same format to compare predictions with the correct answer.
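You can see the encoding for yourself by converting a single label:

```python
# Encode the label 3 on its own — the "1" lands at position 3
print(to_categorical(3, 10))  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```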
```python
model = Sequential([
    Flatten(input_shape=(28, 28)),   # Input layer
    Dense(128, activation='relu'),   # Hidden layer
    Dense(10, activation='softmax')  # Output layer
])
```

**Build the neural network:**
- `Flatten` → turns the 28x28 image into a list of 784 numbers.
- `Dense(128, relu)` → hidden layer with 128 neurons; `relu` helps the network learn complex patterns.
- `Dense(10, softmax)` → output layer with 10 neurons (one per digit); `softmax` converts outputs into probabilities that add up to 1.
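If you're curious what you just built, Keras can print the layers and their parameter counts:

```python
# Prints each layer, its output shape, and the number of trainable parameters
model.summary()
```

For this model you should see about 101,770 trainable parameters: 784×128 weights plus 128 biases in the hidden layer, then 128×10 weights plus 10 biases in the output layer.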
```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

**Compile the model:**
- `optimizer='adam'` → decides how the network updates itself during training.
- `loss='categorical_crossentropy'` → measures how far off predictions are.
- `metrics=['accuracy']` → tells us how often the model is correct.
```python
model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.1)
```

**Train the model:**
- `epochs=5` → the model sees the entire dataset 5 times.
- `batch_size=32` → processes 32 images at a time before updating itself.
- `validation_split=0.1` → uses 10% of the training data to check progress during training.
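An optional variation: `fit` returns a history object, so if you keep its return value you can inspect how accuracy improved epoch by epoch.

```python
# Same call as above, just keeping the return value
history = model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.1)

# Per-epoch accuracy on the training data and the 10% validation split
print(history.history['accuracy'])
print(history.history['val_accuracy'])
```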
```python
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.2f}")
```

**Evaluate the model:**
- Tests the trained network on unseen data (`x_test`, `y_test`).
- Prints accuracy (e.g., `0.98` → 98% correct predictions).
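Once training is done, you can also ask the model about a single image. A minimal sketch (NumPy assumed for `argmax`):

```python
import numpy as np

# Predict the digit for the first test image
probs = model.predict(x_test[:1])   # shape (1, 10): one probability per digit
predicted_digit = np.argmax(probs[0])

true_digit = np.argmax(y_test[0])   # y_test is one-hot encoded, so decode it
print(f"Predicted: {predicted_digit}, actual: {true_digit}")
```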
🌍 Why This Matters
Digit recognition is a classic beginner project in machine learning because:
- It’s easy to understand.
- It’s visual (you can see the digits).
- It teaches the basics of how neural networks work.
Once you grasp this, you can move on to more complex tasks like recognizing faces, objects, or even handwriting styles.
📝 Key Takeaways
- Neural networks learn patterns from data.
- Digit recognition is a multi‑class classification problem.
- Images are made of pixels, and the network learns features step by step.
- Training requires lots of examples.
- The MNIST dataset is the perfect playground for beginners.
- One‑hot encoding is essential because it matches labels to the output layer format.