The problem starts here: a hand, in motion, carrying meaning. Teaching a machine to read it is harder than it looks.
I was standing in a cafe in Beirut when I saw it happen.
A deaf man was trying to explain something to the barista. He signed. She stared. He tried again, slower, more deliberate, like that would help. She shook her head apologetically and reached for a notepad. He took it, wrote something down, she read it, nodded. The whole exchange took maybe four minutes for something that should have taken thirty seconds.
I stood there watching and thought: we have real-time translation for dozens of spoken languages in our pockets. Why not this?
That question became OmniSign, a real-time Lebanese Sign Language (LSL) translator. And building it taught me things about machine learning that no paper had prepared me for, because the hardest problems weren't technical. They were human.
The Dataset Problem Nobody Talks About
When you want to train a computer vision model, the standard advice is: get more data. ImageNet has over 14 million images. Common Voice has thousands of hours of speech. Even niche spoken languages have crowdsourced datasets you can start from.
Lebanese Sign Language has almost none of that.
LSL is a distinct language, not a transliteration of Arabic, not a derivative of French Sign Language, though it shares some roots. It has its own grammar, its own spatial logic, its own regional quirks. And it is used by a community that has been largely invisible to the tech world.
So before I could write a single line of model code, I had to figure out how to build a dataset from scratch.
This is what the unglamorous middle of an ML project looks like. Every frame, reviewed. Every label, decided by a human.
Finding People Who Would Actually Help
The first challenge was access. I needed signers willing to be filmed, and not just willing, but patient enough to repeat the same sign dozens of times under different conditions, at different speeds, with different lighting. And they had to trust that this wasn't going to end up as some project that got submitted, got a grade, and disappeared.
The deaf community has seen a lot of that. Technology built about them, not with them.
Getting past that took time and relationship-building, not code. It meant showing up, explaining what the goal actually was, being honest about what the system could and couldn't do. It meant involving people in decisions, not just data collection.
Once we had that trust, the filming itself was its own challenge. We recorded in different environments: different backgrounds, different light sources, indoors and outdoors, because a model trained only in a clean lab setting will fail spectacularly in a pharmacy with fluorescent lights and motion blur.
The Variation Problem
Here's something I didn't fully appreciate until I was knee-deep in footage: sign languages have dialects.
Not in the same loose way people use that word. I mean real, meaningful variation. A sign that means one thing to someone from one part of Lebanon might look subtly different to someone from another region. Age matters. Individual signers develop personal style. Some people sign large and expansive; others keep everything close to the body.
This is actually true of spoken languages too, but for speech recognition, you have decades of research and millions of data points to smooth out that variation. For LSL, every variation we encountered was a new challenge to solve with whatever data we had.
Our solution was imperfect but pragmatic: we over-indexed on signer diversity rather than sign volume. Fewer total signs, more variation per sign. The model had to learn that a sign is a category, not a specific hand shape at a specific moment.
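The post doesn't show the training pipeline, but one concrete place this decision bites is how you split the data: if clips from the same signer appear in both train and test, the model can score well by memorizing individuals. A minimal sketch of a signer-held-out split, assuming clips are stored with signer IDs (the file names, labels, and use of scikit-learn's GroupShuffleSplit here are illustrative, not the actual OmniSign setup):

```python
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical clip records: (path, sign label, signer id)
clips = [
    ("clips/hello_s01_take1.mp4",  "hello",  "s01"),
    ("clips/hello_s02_take1.mp4",  "hello",  "s02"),
    ("clips/thanks_s01_take1.mp4", "thanks", "s01"),
    ("clips/thanks_s03_take2.mp4", "thanks", "s03"),
]

paths, labels, signers = zip(*clips)

# Hold out whole signers, not individual clips: the test set contains
# people the model has never seen, so a good score means it learned the
# sign as a category rather than one person's hand shape.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(paths, labels, groups=signers))

print("train:", [paths[i] for i in train_idx])
print("test: ", [paths[i] for i in test_idx])
```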
MediaPipe hand landmarks: 21 points per hand, tracked in real time. The model doesn't see a hand. It sees a skeleton, moving through space.
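The landmark extraction itself isn't shown in the post, but a minimal sketch of pulling those 21 points per hand from a live feed with MediaPipe's Python API might look like this (the confidence threshold and the flat (x, y, z) feature format are assumptions, not necessarily what OmniSign uses):

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

cap = cv2.VideoCapture(0)  # webcam; a video file path also works

with mp_hands.Hands(
    static_image_mode=False,       # treat frames as a video stream
    max_num_hands=2,
    min_detection_confidence=0.5,  # illustrative threshold
) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                # 21 landmarks per hand, each normalized (x, y, z)
                skeleton = [(lm.x, lm.y, lm.z) for lm in hand.landmark]
                # `skeleton` is what a downstream classifier would see:
                # coordinates over time, not pixels

cap.release()
```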
What "Good Enough" Means When There's No Benchmark
This is the question that kept me up at night: how do you know your model is good?
For most ML tasks, you have benchmarks. You can compare your accuracy to the state of the art, see where you land, iterate. For LSL, there was no benchmark. No prior model to compare against. No established test set.
So I had to define what success looked like from first principles, and that forced an uncomfortable honesty: the only real measure of success was whether the people who use LSL found the tool useful.
We demoed the system to members of the deaf community. We watched how they used it. Where it hesitated, where it failed, where it surprised us by working. That feedback loop, messy and qualitative as it was, became more valuable than any metric I could compute.
The system isn't perfect. It's probably not close to perfect. But it translated in real time, in front of real people, and some of them smiled when it worked. That felt like a more honest measure of success than an accuracy number on a test set I built myself.
What I'd Tell Someone Starting This Problem
Don't start with the model. Start with the community.
Not because it's the ethical thing to do (though it is), but because you will build the wrong thing if you don't. The assumptions you make in isolation, about what signs to include, what variation looks like, what "correct" even means, will be wrong in ways that matter.
The dataset is not a preprocessing step you get through before the real work starts. The dataset is the work. In low-resource settings, every annotation decision, every filming session, every signer you include or exclude, shapes what the model can and cannot do. That deserves the same care and intention as the architecture.
And finally: ship something. An imperfect tool that someone can actually use is worth more than a perfect model that lives in a notebook. The cafe moment that started all of this, that man and that barista, they don't need 99% accuracy. They need something that works well enough, right now, in the real world.
That's what we were building toward. And we're not done yet.
If you're working on low-resource sign language AI or have LSL data you'd like to contribute, I'd genuinely love to talk. Reach me at ramikronbi.com.


