
Rudra Sarker


🤟 I Built the Most Comprehensive Open-Source Sign Language Dataset Catalog — 73+ Datasets, 26 Languages

How I curated and verified 73+ sign language datasets covering 26 sign languages to help researchers, developers, and the deaf community build better assistive technology.


Why I Built This

Sign language recognition (SLR) is one of the most impactful applications of computer vision and AI — it directly helps millions of deaf and hard-of-hearing people communicate. But when I started working on assistive technology projects, I hit the same wall every researcher faces: finding reliable, verified datasets is painfully hard.

Most existing lists are:

  • ❌ Full of broken links
  • ❌ Missing sample counts
  • ❌ Lacking license or citation info
  • ❌ Limited to 1-5 sign languages

So I decided to build what I wish existed: a single, verified, comprehensive catalog.

What Is SignLanguage-Dataset-Hub?

SignLanguage-Dataset-Hub is a curated, open-source catalog of 73+ verified sign language datasets covering 26 sign languages. To my knowledge, it is among the most comprehensive collections publicly available.

Key Numbers

| Metric | Count |
| --- | --- |
| Verified Datasets | 73+ |
| Sign Languages Covered | 26 |
| Modality Types | Video, Image, Sensor, Pose, RGB-D, Skeleton, Text |
| URL Verification | 100% (all links checked) |
| Tutorials Included | 9 (beginner to advanced) |

Languages Covered

The catalog spans sign languages from around the world:

  • ASL (American), BSL (British), ISL (Indian), CSL (Chinese), LSM (Mexican)
  • ArSL (Arabic), BdSL (Bangla), JSL (Japanese), KSL (Korean), TİD (Turkish)
  • DGS (German), LSF (French), NGT (Dutch), LIS (Italian), RSL (Russian)
  • Plus 11 more including multilingual corpora and linguistic databases

What Makes It Different

Every Link is Verified

Not just collected, but verified. Every URL resolves to HTTP 200 or to an auth-gated page. No placeholder links. No fabricated data. If I couldn't verify a dataset, I left it out.
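To make the verification rule concrete, here is a minimal sketch of such a check using only the Python standard library. It is not the script used for the catalog: the function names are hypothetical, and treating 401 and 403 as "auth-gated" is my assumption about what that term covers.

```python
import urllib.error
import urllib.request

# Assumption: these status codes mean the page exists but needs a login or access request.
AUTH_GATED = {401, 403}


def classify_status(status: int) -> str:
    """Map an HTTP status code to a verification verdict."""
    if status == 200:
        return "ok"
    if status in AUTH_GATED:
        return "auth-gated"
    return "failed"


def check_url(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and classify the response; any network error counts as a failure."""
    try:
        # Building the Request can raise ValueError for malformed URLs, so it sits inside the try.
        request = urllib.request.Request(url, headers={"User-Agent": "link-check-sketch"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the status code is still available.
        return classify_status(err.code)
    except (OSError, ValueError):
        # URLError, timeouts, and connection errors are all OSError subclasses.
        return "failed"
```

Redirects are followed automatically by `urlopen`, so the verdict reflects the final page. A dataset whose link comes back as "failed" would be the kind of entry that gets excluded.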

From Original Sources

Sample counts come from the original publications. They are not estimated or copied from secondary sources.

Full License & Citation Info

Every dataset includes:

  • Source URL (verified)
  • Sample count
  • License information
  • Proper citation to original creators
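The four required pieces of information above can be pictured as a small record with a completeness check. This is only an illustrative sketch: the class, the field names, and the validation rules are hypothetical and may not match the columns in the actual catalog files.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One catalog row. Field names are hypothetical, not the repo's schema."""
    name: str
    source_url: str
    sample_count: int
    license: str
    citation: str


def missing_fields(entry: CatalogEntry) -> list[str]:
    """Return the names of required fields that are empty or implausible."""
    problems = []
    if not entry.source_url.startswith(("http://", "https://")):
        problems.append("source_url")
    if entry.sample_count <= 0:
        problems.append("sample_count")
    if not entry.license.strip():
        problems.append("license")
    if not entry.citation.strip():
        problems.append("citation")
    return problems


# Placeholder values for illustration only; this is not a real dataset.
entry = CatalogEntry(
    name="Example-Dataset",
    source_url="https://example.org/dataset",
    sample_count=1000,
    license="",
    citation="Placeholder et al.",
)
print(missing_fields(entry))
```

An entry that returns a non-empty list here would be incomplete under the rules described above.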

Working Tools Included

This isn't just a list — it comes with tools:

# Load demo sensor data (4,824 samples)
from scripts.data_loader import BdSLSensorGloveDataset
dataset = BdSLSensorGloveDataset(split='train')
print(f"Loaded {len(dataset)} samples, 36 gesture classes")

# Visualize (run this from a shell, not inside the Python session)
# $ python tools/visualize.py --data data/bdsl/BdSL-Sensor-Glove/

# Query the catalog
import pandas as pd
df = pd.read_csv('datasets_catalog.csv')
asl = df[df['language_code'] == 'ASL']
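As a further example of querying the catalog, the sketch below tallies entries per sign language using only the standard library, which is handy if pandas is not installed. The inline rows are placeholders, not real catalog data, and `language_code` is the only column name taken from the snippet above. The `name` column and the `count_by_language` helper are assumptions for illustration.

```python
import csv
import io
from collections import Counter

# Placeholder rows for illustration only; not real catalog data.
# Only `language_code` appears in the post; the `name` column is assumed.
SAMPLE_CSV = """name,language_code
example-a,ASL
example-b,ASL
example-c,BdSL
"""


def count_by_language(csv_file) -> Counter:
    """Count catalog rows per `language_code` value."""
    reader = csv.DictReader(csv_file)
    return Counter(row["language_code"] for row in reader)


counts = count_by_language(io.StringIO(SAMPLE_CSV))
for code, n in counts.most_common():
    print(code, n)
```

For the real file, pass an open file handle instead, for example `open('datasets_catalog.csv', newline='', encoding='utf-8')`. If the catalog is already loaded in pandas, `df['language_code'].value_counts()` produces the same tally.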

9 Step-by-Step Tutorials

From beginner to advanced:

  1. Introduction to Sign Language Recognition
  2. Loading and Exploring Datasets
  3. Visualizing Sign Language Data
  4. Building Your First Classifier
  5. Hand Pose Estimation with MediaPipe
  6. Data Augmentation Techniques
  7. Real-time Recognition System
  8. Continuous Sign Language Recognition
  9. Multilingual Sign Recognition

Project Structure

SignLanguage-Dataset-Hub/
├── DATASETS.md              # Complete verified catalog (73 datasets)
├── datasets_catalog.csv     # Machine-readable catalog
├── STATISTICS.md            # Detailed statistics & breakdowns
├── data/bdsl/               # Demo sensor dataset (4,824 samples)
├── docs/
│   ├── BENCHMARKS.md        # Published accuracy numbers
│   ├── TUTORIALS.md         # 9 tutorials (beginner to advanced)
│   ├── QUICKSTART.md        # Quick start guide
│   └── LICENSE_ATTRIBUTION.md
├── scripts/
│   ├── data_loader.py       # PyTorch data loaders
│   └── download_datasets.py # Multi-source downloader
└── tools/
    ├── visualize.py         # Sensor data visualization
    └── generate_realistic_data.py

Who Is This For?

  • Researchers — Find datasets for your next SLR paper with proper citations
  • Developers — Build sign language apps with verified, working data sources
  • Students — Learn sign language recognition with guided tutorials
  • The Deaf Community — Better tools start with better data

Why Open Source Matters Here

Sign language technology shouldn't be locked behind paywalls. The deaf community deserves free, accessible tools — and that starts with free, accessible data. This project is licensed under CC BY 4.0 so anyone can use it.

Get Started

git clone https://github.com/rudra496/SignLanguage-Dataset-Hub.git
cd SignLanguage-Dataset-Hub
pip install -r requirements.txt

🌐 Live Demo: rudra496.github.io/SignLanguage-Dataset-Hub
GitHub: github.com/rudra496/SignLanguage-Dataset-Hub

Citation

@misc{signlanguage_dataset_hub,
  title = {Sign Language Dataset Hub: A Verified Catalog of Sign Language Datasets},
  author = {Sarker, Rudra and Contributors},
  year = {2026},
  url = {https://github.com/rudra496/SignLanguage-Dataset-Hub}
}

I'm Rudra Sarker, a student researcher and full-stack developer from Bangladesh, building ethical AI and assistive technology. If you found this useful, ⭐ the repo and share it with someone who needs it!

