DEV Community

Devanshu Biswas

I Built a Webcam Sign-Language Reader in the Browser (No Cloud)

"AI that reads sign language" sounds like it needs a research lab and a GPU cluster. But a genuinely useful starting version runs entirely in your browser, with no uploads and no cloud inference: the camera feed never leaves your machine. Here's how I built a webcam sign reader from scratch.

This is Day 7 of SolveFromZero, where I solve a real, useful problem each day.

The browser can track a hand

You don't need a server or a camera SDK. Google's MediaPipe ships a small hand-tracking model that runs on WebAssembly right in the tab. The page fetches the runtime and the model file once; after that, you hand it a video frame and get back the hand's skeleton, all on-device.

import { HandLandmarker, FilesetResolver } from
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/vision_bundle.mjs";
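// Load the WebAssembly runtime, then the hand-landmark model; both are fetched once, then run locally
const files = await FilesetResolver.forVisionTasks("https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm");
const model = "https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/1/hand_landmarker.task";
const hand = await HandLandmarker.createFromOptions(files, { baseOptions: { modelAssetPath: model }, runningMode: "VIDEO" });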

21 points per hand

For every frame the model returns 21 landmarks: the wrist plus four points per finger (knuckle, two joints, tip). Each is an (x, y, z) point where x and y are normalized to 0–1 by the frame's width and height, and z is depth relative to the wrist. That skeleton is all you need; you never touch raw pixels again.

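const lm = hand.detectForVideo(video, performance.now()).landmarks[0];  // first hand: 21 {x, y, z} points
if (!lm) return;  // no hand in this frame, so skip it (this runs inside the per-frame loop)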
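Every magic number in the snippets here is an index into that 21-point array. A small lookup table makes them readable. The names below follow MediaPipe's hand-landmark documentation, while `LANDMARK`, `FINGERS`, `TIPS` and `PIPS` are illustrative helpers of my own, not library exports:

```javascript
// Landmark names in MediaPipe's order: wrist, then four points per finger, base to tip.
const NAMES = [
  "WRIST",
  "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
  "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
  "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
  "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
  "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
];

// Hypothetical helper: name -> index, e.g. LANDMARK.THUMB_TIP === 4
const LANDMARK = Object.fromEntries(NAMES.map((name, i) => [name, i]));

// Tip and middle-joint (PIP) indices for the four non-thumb fingers
const FINGERS = ["INDEX_FINGER", "MIDDLE_FINGER", "RING_FINGER", "PINKY"];
const TIPS = FINGERS.map(f => LANDMARK[`${f}_TIP`]); // [8, 12, 16, 20]
const PIPS = FINGERS.map(f => LANDMARK[`${f}_PIP`]); // [6, 10, 14, 18]
```

With this in place, `lm[LANDMARK.THUMB_TIP]` says what it means where `lm[4]` does not.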

A finger is "up" if its tip is above its middle joint

Geometry does the recognition. For a roughly upright hand, a finger is extended when its tip is higher on screen (smaller y, since image y grows downward) than its middle joint. Check that for the four fingers (tips 8, 12, 16, 20 against joints 6, 10, 14, 18) and you instantly know how many are raised.

// Index, middle, ring, pinky: is each tip (8, 12, 16, 20) above its middle joint (6, 10, 14, 18)?
const up = [8, 12, 16, 20].map((tip, i) =>
  lm[tip].y < lm[[6, 10, 14, 18][i]].y);
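Pulled out into a standalone function, the same test can be checked without a webcam. `fingersUp` and the fake landmark array are illustrative, not the demo's code; the fake coordinates are invented just to show which way the y axis runs:

```javascript
// Hypothetical helper: which of index, middle, ring and pinky are extended.
// lm is an array of 21 {x, y, z} points; y runs from 0 (top of frame) to 1 (bottom).
function fingersUp(lm) {
  const tips = [8, 12, 16, 20];
  const pips = [6, 10, 14, 18]; // middle joint of each finger
  return tips.map((tip, i) => lm[tip].y < lm[pips[i]].y); // smaller y = higher on screen
}

// Made-up test hand: every point at the centre, then the index and middle tips raised.
const fake = Array.from({ length: 21 }, () => ({ x: 0.5, y: 0.5, z: 0 }));
fake[8].y = 0.3;  // index tip above its middle joint
fake[12].y = 0.3; // middle tip above its middle joint

const up = fingersUp(fake);
const count = up.filter(Boolean).length;
console.log(up, count); // [ true, true, false, false ] 2
```

Because y grows downward, "up" is a less-than comparison; writing it as greater-than is an easy bug to miss.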

The thumb is the awkward one

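The thumb bends sideways, not up, so the tip-above-joint trick fails on it. Instead, measure how far the thumb tip (landmark 4) sits from the base of the index finger (landmark 5): far out means extended. That distance is in normalized image units, so a fixed cutoff like 0.13 shifts in meaning as the hand moves closer to or farther from the camera. Handling the thumb separately is the classic gotcha in gesture code.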

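const dist = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);  // 2-D distance in normalized image units
const thumb = dist(lm[4], lm[5]) > 0.13;  // thumb tip (4) far from index knuckle (5) = extended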
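Because the cutoff is in image units, the same thumb looks "shorter" when the hand moves away from the camera. One possible fix, which is my own variant and not what the demo does, is to compare the thumb's reach with the palm's length (wrist to middle-finger knuckle, landmarks 0 and 9). `thumbExtended`, its 0.5 ratio and `fakeHand` are all hypothetical; the ratio in particular is a placeholder to tune on real data:

```javascript
// 2-D distance between two landmarks, in normalized image units
const dist = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);

// Hypothetical scale-free thumb test: thumb reach relative to palm length.
// Both distances shrink together as the hand moves away, so their ratio holds steadier.
function thumbExtended(lm, ratio = 0.5) { // 0.5 is a placeholder; tune it on real data
  const palm = dist(lm[0], lm[9]);  // wrist to middle-finger knuckle
  const reach = dist(lm[4], lm[5]); // thumb tip to index knuckle
  return reach > ratio * palm;
}

// Made-up hand drawn at a given scale; every point not set below, including the
// middle knuckle (9), sits at the centre (0.5, 0.5).
function fakeHand(scale) {
  const at = (dx, dy) => ({ x: 0.5 + dx * scale, y: 0.5 + dy * scale, z: 0 });
  const lm = Array.from({ length: 21 }, () => at(0, 0));
  lm[0] = at(0, 0.4);    // wrist, below the knuckles
  lm[5] = at(-0.05, 0);  // index knuckle
  lm[4] = at(-0.3, 0.1); // thumb tip, sticking out sideways
  return lm;
}

// Same pose close to the camera (scale 1) and farther away (scale 0.4):
// columns are scale, fixed 0.13 cutoff, palm-ratio test
for (const scale of [1, 0.4]) {
  const lm = fakeHand(scale);
  console.log(scale, dist(lm[4], lm[5]) > 0.13, thumbExtended(lm));
}
// 1 true true
// 0.4 false true
```

The fixed cutoff flips to "not extended" once the hand is small in the frame, while the ratio test gives the same answer at both sizes.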

Map the finger pattern to a sign

Now turn the pattern of raised fingers into meaning — no fingers = 0, index only = 1, index+middle = 2, all five = an open-palm "hi", thumb alone = 👍:

const count = up.filter(Boolean).length;  // how many of the four fingers are raised
if (!thumb && count === 0) return "0";
if (!thumb && count === 2) return "2";
if (thumb  && count === 4) return "hi";

It's a hand-coded lookup — simple, transparent, and enough for counts and a few gestures.
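Because there are only a handful of signs, the lookup can also be written as data. In this hypothetical table-driven version (my own restructuring, not the demo's code), the key records exactly which fingers are up, so index+middle reads as "2" while, say, ring+pinky matches nothing instead of also counting as two:

```javascript
// Hypothetical pattern table. Key = thumb bit, then index/middle/ring/pinky bits (1 = up).
const SIGNS = {
  "0|0000": "0",  // fist
  "0|1000": "1",  // index only
  "0|1100": "2",  // index + middle
  "1|1111": "hi", // open palm, all five up
  "1|0000": "👍", // thumb alone
};

// thumb: boolean; up: four booleans for index, middle, ring, pinky
function readSign(thumb, up) {
  const key = `${thumb ? 1 : 0}|${up.map(u => (u ? 1 : 0)).join("")}`;
  return SIGNS[key] ?? null; // null = unknown pattern, so nothing gets typed
}

console.log(readSign(false, [true, true, false, false])); // 2
console.log(readSign(false, [false, false, true, true])); // null
```

Adding a sign is then one more row rather than one more `if`.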

Hold to commit

A hand wobbles, so only "type" a sign once it's been steady for ~12 frames. That debounce stops the transcript from filling with noise as your hand moves between signs.

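if (sign === lastSign) stable++; else { lastSign = sign; stable = 0; }  // restart the count on a new sign
if (stable === 12) type(sign);  // fires once per hold, when the count reaches 12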
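Here is the hold-to-commit idea as a reusable, testable piece. `makeDebouncer` is a hypothetical helper, not the demo's code. It counts the first frame of a run as 1, so `holdFrames` is exactly the number of identical frames needed, and it never commits `null`, used here to mean "no hand or no recognised sign":

```javascript
// Hypothetical debouncer: call onCommit(sign) once a sign has been seen on
// holdFrames consecutive frames, and only once per unbroken run.
function makeDebouncer(holdFrames, onCommit) {
  let lastSign = null;
  let stable = 0; // length of the current run of identical signs
  return function push(sign) {
    if (sign === lastSign) {
      stable++;
    } else {
      lastSign = sign; // a new run starts with this frame
      stable = 1;
    }
    if (stable === holdFrames && sign !== null) onCommit(sign);
  };
}

// Fake per-frame predictions: a one-frame flicker to "1" never gets typed.
const typed = [];
const push = makeDebouncer(12, sign => typed.push(sign));
const frames = [...Array(5).fill("2"), "1", ...Array(12).fill("2"), ...Array(30).fill("hi")];
frames.forEach(sign => push(sign));
console.log(typed); // [ '2', 'hi' ]
```

At 30 fps, a 12-frame hold is 0.4 s, short enough to feel responsive but long enough to skip the in-between shapes a hand passes through.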

Scaling to real ASL

This demo recognises counts 0–5 plus a couple of gestures with pure geometry. Full ASL — dozens of letters, motion, two hands, facial cues — needs a small trained classifier sitting on top of these same 21 landmarks. But that's the beautiful part: the hard perception (finding the hand) is done for you, and the pipeline you'd build for the real thing is exactly the one here. Landmarks in, sign out, fully on-device.
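The usual first step toward that classifier is a feature vector that ignores where the hand sits in the frame and how large it appears. The sketch below is my own illustration, not part of the demo: `toFeatures` re-centres the 21 points on the wrist and divides by palm length, giving 63 numbers, and `nearestLabel` is a deliberately simple 1-nearest-neighbour stand-in for a trained model (a real one would be trained on many recorded examples). The fake hands are invented coordinates, not real poses:

```javascript
// Hypothetical feature extraction: wrist-relative, palm-length-scaled, flattened to 21 * 3 = 63 numbers.
function toFeatures(lm) {
  const wrist = lm[0];
  const palm = Math.hypot(lm[9].x - wrist.x, lm[9].y - wrist.y) || 1; // guard against a zero-size hand
  return lm.flatMap(p => [(p.x - wrist.x) / palm, (p.y - wrist.y) / palm, (p.z - wrist.z) / palm]);
}

// Stand-in classifier: return the label of the closest stored example (squared Euclidean distance).
function nearestLabel(features, examples) {
  let best = null;
  let bestDist = Infinity;
  for (const { label, vector } of examples) {
    let d = 0;
    for (let i = 0; i < vector.length; i++) d += (vector[i] - features[i]) ** 2;
    if (d < bestDist) {
      bestDist = d;
      best = label;
    }
  }
  return best;
}

// Made-up landmark sets (not real hand poses) that differ in how far the rows spread vertically.
const fakeHand = spread => Array.from({ length: 21 }, (_, i) =>
  ({ x: 0.4 + (i % 5) * 0.02, y: 0.7 - Math.floor(i / 5) * spread, z: 0 }));

const examples = [
  { label: "open", vector: toFeatures(fakeHand(0.08)) },
  { label: "closed", vector: toFeatures(fakeHand(0.02)) },
];

// The "open" set again, shrunk and shifted as if the hand moved across and away from the camera.
const query = fakeHand(0.08).map(p => ({ x: 0.1 + 0.5 * p.x, y: 0.2 + 0.5 * p.y, z: 0.5 * p.z }));
console.log(nearestLabel(toFeatures(query), examples)); // open
```

Normalizing first means the classifier only has to learn hand shape, not every position and distance a hand can appear at.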

It's also a reminder that a lot of "AI" products are 20% model and 80% turning its output into something useful.

👉 Try it with your webcam (Chrome/Edge, grant camera): https://dev48v.infy.uk/solve/day7-sign-language.html

🌐 All solutions: https://dev48v.infy.uk/solvefromzero.php

Tomorrow: live captions for any video, in the browser.
