likhitha manikonda
Teaching Computers to Read Handwriting: Neural Networks Made Simple

Machine learning can sound intimidating, but let’s break it down step by step. In this article, we’ll explore how a neural network can recognize handwritten digits (0–9). Don’t worry if you’re starting with zero knowledge — this guide is designed for you.


✍️ The Problem

We want a computer to look at an image of a handwritten digit and correctly identify it.

Examples of where this is used:

  • Reading postal codes on envelopes.
  • Recognizing amounts on bank checks.
  • Digitizing handwritten notes.

This task is called digit recognition.


🔢 Classification Explained

  • Classification = sorting things into categories.
  • Example: Is this email "spam" or "not spam"?
  • For digit recognition, the categories are digits 0–9.
  • That means we’re solving a multi‑class classification problem (10 possible classes).

🖼️ How Computers See Digits

  • Images are made of tiny squares called pixels.
  • Each pixel has a value (brightness).
  • A 28x28 image has 784 pixels.
  • The neural network looks at these pixel values to decide which digit it is.
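
To make this concrete, here's a small sketch (using NumPy, which installs alongside TensorFlow) of how a 28x28 grid of pixel values becomes the flat list of 784 numbers the network reads:

```python
import numpy as np

# A made-up 28x28 grayscale image: one brightness value (0-255) per pixel
image = np.random.randint(0, 256, size=(28, 28))

# Flattening turns the grid into a single row of 784 numbers --
# exactly what the network's input layer receives
flat = image.flatten()
print(flat.shape)  # (784,)
```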

🏗️ Anatomy of a Neural Network

  1. Input Layer

    • Takes in pixel values (e.g., 784 inputs for a 28x28 image).
  2. Hidden Layers

    • Transform inputs into meaningful features.
    • Learn shapes like curves, lines, and loops that make up digits.
  3. Output Layer

    • Produces probabilities for each digit (0–9).
    • Example:
      • "This looks 80% like a 3, 15% like an 8, 5% like a 5."
    • The digit with the highest probability is chosen.
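
Picking the highest probability is just an `argmax` over the output. Here's a quick sketch using the hypothetical percentages from the example above (made-up numbers, not real model output):

```python
import numpy as np

# Hypothetical output-layer probabilities for digits 0-9:
# 80% for "3", 5% for "5", 15% for "8"
probs = np.array([0.0, 0.0, 0.0, 0.80, 0.0, 0.05, 0.0, 0.0, 0.15, 0.0])

# argmax returns the index (here, the digit) with the highest probability
predicted_digit = int(np.argmax(probs))
print(predicted_digit)  # 3
```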

📚 Training the Neural Network

  • Training = teaching the network using examples.
  • We show it thousands of digit images with correct answers.
  • The network adjusts itself to improve accuracy.
  • Eventually, it can recognize digits it has never seen before.
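
The "adjusts itself" idea can be sketched with a toy example: a single weight nudged step by step to shrink its error. This is gradient descent boiled down to one number, not the real network:

```python
# A toy version of "the network adjusts itself": one weight w is nudged
# repeatedly so that w * x moves closer to the correct answer y.
x, y = 2.0, 10.0   # one training example: input 2, correct answer 10
w = 0.0            # the single "knob" the model can turn
lr = 0.05          # learning rate: how big each adjustment is

for _ in range(200):
    error = w * x - y        # how far off the current prediction is
    w -= lr * 2 * error * x  # nudge w in the direction that shrinks the error

print(round(w, 2))  # converges to 5.0, since 5 * 2 = 10
```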

🛠️ Tools You’ll Use

  • Python → beginner‑friendly programming language.
  • TensorFlow / Keras → libraries to build neural networks.
  • MNIST dataset → famous dataset of handwritten digits used for practice.

💻 Hands‑On Example (Python Code with Explanations)

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical
  • Import libraries:
    • tensorflow → the main machine learning library we’re using.
    • mnist → the dataset of handwritten digits.
    • Sequential → lets us build a neural network layer by layer.
    • Dense, Flatten → types of layers we’ll use.
    • to_categorical → converts labels into one‑hot encoding.

(x_train, y_train), (x_test, y_test) = mnist.load_data()
  • Load the dataset:
    • x_train → images used for training.
    • y_train → correct answers (labels) for training.
    • x_test, y_test → images and labels for testing.
    • Each image is 28x28 pixels.

x_train = x_train / 255.0
x_test = x_test / 255.0
  • Normalize pixel values:
    • Pixel values range from 0 (black) to 255 (white).
    • Dividing by 255 scales them to between 0 and 1.
    • This makes training easier and faster because the numbers are small and consistent.

y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
  • Convert labels to one‑hot encoding:
    • Original labels are just numbers like 3, 7, 9.
    • Neural networks work better when labels are represented as vectors.
    • Example:
      • Label 3 → [0,0,0,1,0,0,0,0,0,0]
      • Label 7 → [0,0,0,0,0,0,0,1,0,0]
    • This is called one‑hot encoding because only one position is “hot” (set to 1).
    • Why? Because the output layer has 10 neurons (one for each digit). The network needs labels in the same format to compare predictions with the correct answer.
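
If you're curious what `to_categorical` does under the hood, here's a minimal hand-rolled equivalent (a sketch for illustration, not what Keras actually runs internally):

```python
import numpy as np

def one_hot(label, num_classes=10):
    # A vector of all zeros, with a 1 at the label's position
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

print(one_hot(3))  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```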

model = Sequential([
    Flatten(input_shape=(28, 28)),   # Input layer
    Dense(128, activation='relu'),   # Hidden layer
    Dense(10, activation='softmax')  # Output layer
])
  • Build the neural network:
    • Flatten → turns the 28x28 image into a list of 784 numbers.
    • Dense(128, relu) → hidden layer with 128 neurons.
    • relu helps the network learn complex patterns.
    • Dense(10, softmax) → output layer with 10 neurons (one per digit).
    • softmax converts outputs into probabilities that add up to 1.
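
Softmax itself is simple enough to sketch by hand (a minimal illustration with made-up scores, not the Keras implementation):

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability, exponentiate, then normalize
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw outputs ("logits") for 3 classes
probs = softmax(scores)
print(probs)  # the largest score gets the largest probability; all sum to 1
```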

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
  • Compile the model:
    • optimizer='adam' → controls how the network’s weights are adjusted during training (Adam is a solid default choice).
    • loss='categorical_crossentropy' → measures how far off predictions are.
    • metrics=['accuracy'] → tells us how often the model is correct.

model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.1)
  • Train the model:
    • epochs=5 → the model sees the entire dataset 5 times.
    • batch_size=32 → processes 32 images at a time before updating itself.
    • validation_split=0.1 → uses 10% of training data to check progress during training.

test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.2f}")
  • Evaluate the model:
    • Tests the trained network on unseen data (x_test, y_test).
    • Prints accuracy (e.g., 0.98 → 98% correct predictions).

🌍 Why This Matters

Digit recognition is a classic beginner project in machine learning because:

  • It’s easy to understand.
  • It’s visual (you can see the digits).
  • It teaches the basics of how neural networks work.

Once you grasp this, you can move on to more complex tasks like recognizing faces, objects, or even handwriting styles.


📝 Key Takeaways

  • Neural networks learn patterns from data.
  • Digit recognition is a multi‑class classification problem.
  • Images are made of pixels, and the network learns features step by step.
  • Training requires lots of examples.
  • The MNIST dataset is the perfect playground for beginners.
  • One‑hot encoding is essential because it matches labels to the output layer format.
