DEV Community

Vivan Rajath

ASL Hand Sign Recognition using Neural Networks and Mediapipe

This project uses MediaPipe to extract hand landmarks and a Random Forest model to recognize American Sign Language (ASL) alphabet letters. It also includes real-time sign recognition using your webcam.

Features

  • Uses the ASL Alphabet dataset from Kaggle
  • Extracts hand landmarks using MediaPipe
  • Trains a Random Forest classifier on landmark data
  • Tests accuracy on validation data
  • Predicts ASL hand signs in real-time via webcam

This model supports all 26 ASL letters (A–Z). Note that "J" and "Z" are motion-based signs, so they are recognized from a single representative static pose rather than from the full motion.

Tools & Technologies Used

  • MediaPipe: For extracting 21 hand landmarks (x, y, z) from each hand
  • OpenCV: For webcam input and image processing
  • scikit-learn: For training and evaluating the Random Forest model
  • Python: Language used for scripting and development

How It Works

  • Data Collection: Hand images for each ASL letter are passed through MediaPipe to extract 21 landmarks per hand.
  • Data Formatting: Each landmark includes (x, y, z), resulting in 63 values per frame.
  • Model Training: A Random Forest classifier is trained using this data.
  • Real-Time Prediction: The webcam captures live hand gestures, which are processed and classified into ASL letters.
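The data-formatting step above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the `Landmark` tuple stands in for MediaPipe's landmark objects, which expose the same `x`, `y`, `z` attributes.

```python
from typing import NamedTuple

class Landmark(NamedTuple):
    """Stand-in for a MediaPipe hand landmark with normalized coordinates."""
    x: float
    y: float
    z: float

def flatten_landmarks(landmarks):
    """Flatten 21 (x, y, z) landmarks into one 63-value feature row."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    return [coord for lm in landmarks for coord in (lm.x, lm.y, lm.z)]

# 21 landmarks x 3 coordinates = 63 features per frame
row = flatten_landmarks([Landmark(0.1, 0.2, 0.3)] * 21)
print(len(row))  # 63
```

Each frame therefore becomes a fixed-length row, which is exactly the tabular shape a Random Forest expects.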

Steps to Build It Yourself

Step 1: Collect Data
Use the ASL Alphabet dataset from Kaggle. Extract hand landmarks from each image using MediaPipe and save them in a CSV file for training.
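A sketch of this step is below. The `data/asl_alphabet_train/<LETTER>/` layout and the `landmarks.csv` filename are assumptions about how the Kaggle dataset is unpacked; adjust the paths to your setup.

```python
import csv
import os

def landmarks_to_row(label, coords):
    """Turn a letter label plus 63 flattened coordinates into one CSV row."""
    return [label, *coords]

if __name__ == "__main__":
    import cv2
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)
    with open("landmarks.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for letter in sorted(os.listdir("data/asl_alphabet_train")):
            folder = os.path.join("data/asl_alphabet_train", letter)
            for name in os.listdir(folder):
                image = cv2.imread(os.path.join(folder, name))
                result = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
                if not result.multi_hand_landmarks:
                    continue  # skip images where no hand was detected
                lms = result.multi_hand_landmarks[0].landmark
                coords = [c for lm in lms for c in (lm.x, lm.y, lm.z)]
                writer.writerow(landmarks_to_row(letter, coords))
```

Skipping images where MediaPipe finds no hand keeps the CSV free of empty rows, at the cost of losing a few training samples.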

Step 2: Train the Model
Train a Random Forest classifier using the landmark CSV. This model learns to distinguish between different hand poses.
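With scikit-learn, training looks roughly like this. Synthetic arrays stand in for the real landmark CSV, and the hyperparameters are illustrative defaults rather than the project's actual settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the landmark CSV: 200 rows of 63 features each.
rng = np.random.default_rng(0)
X = rng.random((200, 63))
y = rng.choice(list("ABC"), size=200)  # placeholder letter labels

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.n_features_in_)  # 63 features, one per landmark coordinate
```

In practice you would load `landmarks.csv` into `X` and `y` instead, and persist the fitted model (for example with `joblib.dump`) so the real-time step can reuse it.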

Step 3: Test the Accuracy
Hold out a validation set and measure how well the model performs on data it has never seen. Random Forests usually reach high accuracy on this kind of low-dimensional landmark data.
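A minimal evaluation sketch, again using synthetic data in place of the real CSV (so the printed accuracy here is only chance-level, unlike on real landmarks):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((300, 63))
y = rng.choice(list("ABC"), size=300)  # random labels, so expect ~chance accuracy

# Hold out 20% of the rows as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
acc = accuracy_score(y_val, clf.predict(X_val))
print(f"validation accuracy: {acc:.2f}")
```

`sklearn.metrics.classification_report` is also worth a look, since it shows per-letter precision and recall and quickly reveals which signs the model confuses.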

Step 4: Real-Time Recognition
Use your webcam to capture hand signs in real-time, feed them through MediaPipe, and classify them with the trained model. The predicted letter is shown on screen.
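The real-time loop can be sketched as follows. The `asl_rf.joblib` filename is an assumption about how the trained model was saved; the webcam and MediaPipe work sits under the `__main__` guard, with the pure `flatten` helper pulled out on its own.

```python
def flatten(hand_landmarks):
    """Flatten one MediaPipe hand into the 63-value row the model expects."""
    return [c for lm in hand_landmarks.landmark for c in (lm.x, lm.y, lm.z)]

if __name__ == "__main__":
    import cv2
    import joblib
    import mediapipe as mp

    clf = joblib.load("asl_rf.joblib")  # assumed filename from the training step
    hands = mp.solutions.hands.Hands(max_num_hands=1)
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            row = flatten(result.multi_hand_landmarks[0])
            letter = clf.predict([row])[0]
            cv2.putText(frame, letter, (30, 60),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 3)
        cv2.imshow("ASL", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```

MediaPipe processes RGB frames while OpenCV captures BGR, hence the `cvtColor` call before every prediction.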

Why Use Landmarks Instead of Images?
Using landmark coordinates is far more efficient than training heavy image-based models like CNNs. It reduces training time, improves performance on low-resource devices, and works surprisingly well with static signs.

Final Thoughts
This project is a practical, lightweight introduction to real-time hand gesture recognition. If you're interested in computer vision, sign language, or accessibility tech, this is a great way to dive in. By combining MediaPipe and a simple machine learning model, we’ve built something that can make communication more inclusive — one hand sign at a time.

Check out the full code: https://github.com/VivanRajath/ASL
