How I curated and verified 73+ sign language datasets covering 26 sign languages to help researchers, developers, and the deaf community build better assistive technology.
Why I Built This
Sign language recognition (SLR) is one of the most impactful applications of computer vision and AI: it can directly help millions of deaf and hard-of-hearing people communicate. But when I started working on assistive technology projects, I hit the same wall many researchers face: finding reliable, verified datasets is painfully hard.
Most existing lists are:
- ❌ Full of broken links
- ❌ Missing sample counts
- ❌ Missing license or citation info
- ❌ Limited to 1-5 sign languages
So I decided to build what I wish existed: a single, verified, comprehensive catalog.
What Is SignLanguage-Dataset-Hub?
SignLanguage-Dataset-Hub is a curated, open-source catalog of 73+ verified sign language datasets covering 26 sign languages, which, to my knowledge, makes it one of the most comprehensive collections available.
Key Numbers
| Metric | Value |
|---|---|
| Verified Datasets | 73+ |
| Sign Languages Covered | 26 |
| Modality Types | Video, Image, Sensor, Pose, RGB-D, Skeleton, Text |
| URL Verification | 100% (all links checked) |
| Tutorials Included | 9 (beginner to advanced) |
Languages Covered
The catalog spans sign languages from around the world:
- ASL (American), BSL (British), ISL (Indian), CSL (Chinese), LSM (Mexican)
- ArSL (Arabic), BdSL (Bangla), JSL (Japanese), KSL (Korean), TİD (Turkish)
- DGS (German), LSF (French), NGT (Dutch), LIS (Italian), RSL (Russian)
- Plus 11 more including multilingual corpora and linguistic databases
What Makes It Different
Every Link Is Verified
Not just collected, but verified. Every URL resolves either to an HTTP 200 response or to an authentication-gated page (for datasets that require registration before download). No placeholder links. No fabricated data. If we couldn't verify a dataset, it's excluded.
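The rule above (HTTP 200 or an auth-gated page) can be sketched as a small standard-library link checker. This is a minimal illustration, not the project's actual verification script; in particular, treating "auth-gated" as HTTP 401 or 403 is an assumption, and the function names are hypothetical.

```python
import urllib.error
import urllib.request

# Assumption: an "auth-gated" page answers with HTTP 401 or 403.
AUTH_GATED_STATUSES = {401, 403}


def classify_status(status: int) -> str:
    """Map a final HTTP status code to a verification verdict."""
    if status == 200:
        return "ok"
    if status in AUTH_GATED_STATUSES:
        return "auth-gated"
    return "broken"


def check_url(url: str, timeout: float = 10.0) -> str:
    """Request a URL (urlopen follows redirects) and classify the result."""
    try:
        request = urllib.request.Request(
            url, headers={"User-Agent": "dataset-link-check/0.1"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions but still carry a status code.
        return classify_status(err.code)
    except (OSError, ValueError):
        # DNS failures, refused connections, timeouts, and malformed URLs.
        return "broken"
```

Keeping `classify_status` separate from the network call makes the verdict rule easy to test without touching the network.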
From Original Sources
Sample counts come from the original publications, not estimated or copied from secondary sources.
Full License & Citation Info
Every dataset includes:
- Source URL (verified)
- Sample count
- License information
- Proper citation to original creators
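Because every entry carries the same fields, one catalog entry can be modeled as a small record with a completeness check. This is a sketch under assumptions: only `language_code` matches a column actually used in `datasets_catalog.csv` (see the query example below); the class name and the other field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical record for one dataset in the catalog."""

    name: str
    language_code: str  # e.g. "ASL"
    source_url: str     # verified link
    sample_count: int   # as reported in the original publication
    license: str
    citation: str

    def __post_init__(self) -> None:
        if self.sample_count <= 0:
            raise ValueError(f"{self.name}: sample_count must be positive")

    def missing_fields(self) -> list[str]:
        """Return the names of required text fields that are empty."""
        required = ("source_url", "license", "citation")
        return [field for field in required if not getattr(self, field).strip()]
```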
Working Tools Included
This isn't just a list; it also ships with working tools:
```python
# Load the training split of the bundled BdSL sensor-glove demo data (4,824 samples in total)
from scripts.data_loader import BdSLSensorGloveDataset

dataset = BdSLSensorGloveDataset(split='train')
print(f"Loaded {len(dataset)} training samples (36 gesture classes)")
```
```bash
# Visualize the demo sensor data
python tools/visualize.py --data data/bdsl/BdSL-Sensor-Glove/
```
```python
# Query the machine-readable catalog
import pandas as pd

df = pd.read_csv('datasets_catalog.csv')
asl = df[df['language_code'] == 'ASL']
print(f"{len(asl)} ASL datasets in the catalog")
```
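Beyond filtering one language, the same catalog can be summarized per language with a `groupby`. The sketch below uses a tiny in-memory stand-in for `datasets_catalog.csv` with made-up toy rows; `language_code` comes from the query above, while `name` and `sample_count` are assumed column names that may differ in the real file.

```python
import io

import pandas as pd

# Toy stand-in for datasets_catalog.csv; 'name' and 'sample_count' are
# hypothetical column names, and the rows are placeholders, not real data.
CATALOG_CSV = """name,language_code,sample_count
dataset_a,ASL,1000
dataset_b,ASL,500
dataset_c,BdSL,200
"""

df = pd.read_csv(io.StringIO(CATALOG_CSV))

# Number of datasets and total samples per sign language, most datasets first.
summary = (
    df.groupby("language_code")
    .agg(datasets=("name", "count"), total_samples=("sample_count", "sum"))
    .sort_values("datasets", ascending=False)
)
print(summary)
```

Against the real file, replace `io.StringIO(CATALOG_CSV)` with `'datasets_catalog.csv'` and adjust the column names to match its header.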
9 Step-by-Step Tutorials
From beginner to advanced:
- Introduction to Sign Language Recognition
- Loading and Exploring Datasets
- Visualizing Sign Language Data
- Building Your First Classifier
- Hand Pose Estimation with MediaPipe
- Data Augmentation Techniques
- Real-time Recognition System
- Continuous Sign Language Recognition
- Multilingual Sign Recognition
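As a small taste of the data augmentation topic, here is a sketch of two common augmentations for glove sensor readings: additive Gaussian jitter and random per-channel scaling. It is not the tutorial's code; the function name, the array layout (channels on the last axis), and the default parameters are assumptions.

```python
import numpy as np


def augment_sensor_sample(sample, rng, noise_std=0.01, scale_range=(0.9, 1.1)):
    """Return a jittered, per-channel-rescaled copy of one sensor sample.

    `sample` has shape (channels,) or (timesteps, channels).
    """
    sample = np.asarray(sample, dtype=np.float64)
    # One random gain per channel, broadcast across timesteps
    # (mimics differences in sensor calibration or hand size).
    scale = rng.uniform(scale_range[0], scale_range[1], size=sample.shape[-1])
    # Independent Gaussian noise per reading (mimics sensor noise).
    noise = rng.normal(0.0, noise_std, size=sample.shape)
    return sample * scale + noise
```

Passing an explicit `np.random.default_rng(seed)` keeps augmentation reproducible across training runs.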
Project Structure
```
SignLanguage-Dataset-Hub/
├── DATASETS.md # Complete verified catalog (73 datasets)
├── datasets_catalog.csv # Machine-readable catalog
├── STATISTICS.md # Detailed statistics & breakdowns
├── data/bdsl/ # Demo sensor dataset (4,824 samples)
├── docs/
│ ├── BENCHMARKS.md # Published accuracy numbers
│ ├── TUTORIALS.md # 9 tutorials (beginner to advanced)
│ ├── QUICKSTART.md # Quick start guide
│ └── LICENSE_ATTRIBUTION.md
├── scripts/
│ ├── data_loader.py # PyTorch data loaders
│ └── download_datasets.py # Multi-source downloader
└── tools/
├── visualize.py # Sensor data visualization
└── generate_realistic_data.py
```
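The PyTorch loaders in `scripts/data_loader.py` aren't shown here, so the following is a hypothetical, dependency-free sketch of the map-style dataset protocol (`__len__` plus `__getitem__`) that `torch.utils.data.DataLoader` consumes. The class name and the CSV layout (numeric feature columns plus a `label` column) are assumptions; in a real PyTorch pipeline, `__getitem__` would typically return tensors.

```python
import csv


class SensorCSVDataset:
    """Hypothetical map-style dataset over tabular sensor-glove readings."""

    def __init__(self, rows, label_column="label"):
        rows = list(rows)
        if not rows:
            raise ValueError("no samples provided")
        # Sorted class names give stable label indices across runs.
        self.classes = sorted({row[label_column] for row in rows})
        self.class_to_index = {name: i for i, name in enumerate(self.classes)}
        self.feature_columns = [c for c in rows[0] if c != label_column]
        self.samples = [
            (
                [float(row[c]) for c in self.feature_columns],
                self.class_to_index[row[label_column]],
            )
            for row in rows
        ]

    @classmethod
    def from_csv(cls, csv_path, label_column="label"):
        """Build the dataset from a CSV file with one sample per row."""
        with open(csv_path, newline="") as f:
            return cls(csv.DictReader(f), label_column)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        # Returns (feature list, integer class index).
        return self.samples[index]
```

Separating parsing (`from_csv`) from construction keeps the class testable with in-memory rows.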
Who Is This For?
- Researchers — Find datasets for your next SLR paper with proper citations
- Developers — Build sign language apps with verified, working data sources
- Students — Learn sign language recognition with guided tutorials
- The Deaf Community — Better tools start with better data
Why Open Source Matters Here
Sign language technology shouldn't be locked behind paywalls. The deaf community deserves free, accessible tools, and that starts with free, accessible data. The catalog itself is licensed under CC BY 4.0, so anyone can use, share, and adapt it with attribution; each listed dataset keeps its own license.
Get Started
```bash
git clone https://github.com/rudra496/SignLanguage-Dataset-Hub.git
cd SignLanguage-Dataset-Hub
pip install -r requirements.txt
```
🌐 Live Demo: rudra496.github.io/SignLanguage-Dataset-Hub
⭐ GitHub: github.com/rudra496/SignLanguage-Dataset-Hub
Citation
```bibtex
@misc{signlanguage_dataset_hub,
  title = {Sign Language Dataset Hub: A Verified Catalog of Sign Language Datasets},
  author = {Sarker, Rudra and others},
  year = {2026},
  url = {https://github.com/rudra496/SignLanguage-Dataset-Hub}
}
```
I'm Rudra Sarker, a student researcher and full-stack developer from Bangladesh, building ethical AI and assistive technology. If you found this useful, ⭐ the repo and share it with someone who needs it!
🔗 Connect with me:
- GitHub | LinkedIn | Twitter/X | ResearchGate | Personal Site