DEV Community

Ariyo Aresa

My keyboard can't speak my language: Here's how I fixed it

Nigeria is a country of hundreds of languages. One of them is Yorùbá, spoken by tens of millions of people across the world. But despite how widely spoken it is, a quiet problem keeps getting in the way: most of our devices simply weren't built with it in mind.
Here's why that matters. In Yorùbá, the same combination of letters can mean completely different things depending on the marks above or below them:

  • Ọkọ means husband
  • Ọkọ̀ means vehicle
  • Òkò means stone
  • Okó means penis
  • Ọkọ́ means hoe

Five words. Five meanings. On a keyboard without the marks, all of them collapse into "oko", because the only thing separating them is a dot or an accent.
I ran into this problem head-on while building a Yorùbá dictionary app — one where anyone, regardless of their region or dialect, could add words. Almost immediately, I hit a wall: How does someone search for a word like "òwúrò" (meaning morning) when their phone keyboard only lets them type "owuro"?

The Problem: Your Keyboard Is Missing the Marks
Standard keyboards, the ones that ship with most phones and laptops, are designed around English. And in English, O is just O. In Yorùbá, O, Ò, and Ọ are three entirely different things. They're not stylistic variations. They change the word completely.
This means a regular search will fail someone typing "oko" when what they're actually looking for is "Ọkọ̀." The database won't find it. The word just disappears like it never existed.
To fix this, I had to solve two problems: making search work even when the marks are missing, and making sure words still sort in the correct Yorùbá alphabetical order.

The solution came down to how computers store characters under the hood. An accented letter like ò may be stored as a single character, but Unicode normalization can break it down into two parts: the base letter (o) and the accent mark sitting on top of it. Once I understood that, I could work with each part separately.
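To make that decomposition concrete, here is a small sketch that lists the code points a letter breaks into. The helper name `codePoints` is made up for this illustration; `normalize("NFD")` is the standard JavaScript string method.

```javascript
// Hypothetical helper: list the Unicode code points of a string
// after canonical decomposition (NFD).
const codePoints = (text) =>
  Array.from(text.normalize("NFD"), (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );

// "\u00F2" is ò: base letter o plus a combining grave accent.
console.log(codePoints("\u00F2")); // ["U+006F", "U+0300"]

// "\u1ECD\u0300" is ọ with a grave accent: base letter o,
// combining dot below (U+0323), then combining grave (U+0300).
console.log(codePoints("\u1ECD\u0300")); // ["U+006F", "U+0323", "U+0300"]
```

The tone accent and the under-dot come out as separate combining marks, which is what makes it possible to remove one and keep the other.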

Smarter Search
For searching, I strip away all the accent marks and dots before comparing anything. So when a user types "owuro," the app quietly converts both the search term and the stored words into their plain versions before checking for a match. The user finds what they're looking for, and never has to worry about whether they have the right keyboard.

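// Search key: decompose each letter (NFD), then drop every combining mark,
// tone accents and under-dots alike (U+0300 to U+036F).
const searchNormalization = (text) => {
  return text
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, "");
};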

This turns "òwúrò" into "owuro", making it findable no matter how it's typed.
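
Here is a rough sketch of how that could plug into a lookup. The names `toSearchKey`, `entries`, and `findEntries` are hypothetical, the word list is a tiny in-memory stand-in for a real database, and the lowercasing step is an extra assumption on top of the function above, so that "oko" also matches a capitalised "Ọkọ̀".

```javascript
// Same stripping step as searchNormalization above, plus lowercasing
// (the lowercasing is an added assumption, not part of the original function).
const toSearchKey = (text) =>
  text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();

// Hypothetical sample data standing in for the dictionary's stored words.
const entries = [
  { word: "òwúrò", meaning: "morning" },
  { word: "Ọkọ̀", meaning: "vehicle" },
  { word: "Òkò", meaning: "stone" },
];

// Compare the plain form of the query with the plain form of each stored word.
const findEntries = (query, list) => {
  const key = toSearchKey(query);
  return list.filter((entry) => toSearchKey(entry.word) === key);
};

console.log(findEntries("owuro", entries)); // the "morning" entry
console.log(findEntries("oko", entries)); // both the "vehicle" and "stone" entries
```

Notice that "oko" returns more than one entry. That is expected: once the marks are stripped the words are indistinguishable, so the app has to show every candidate with its marks intact and let the user pick.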

Smarter Sorting
Sorting was trickier. In Yorùbá, E and Ẹ are genuinely different letters that belong in different places in the alphabet. But E and É are the same letter, just pronounced at a different pitch. So I couldn't strip everything away for sorting, or I'd lose the structure of the language.
Instead, I wrote a version that removes tone marks (the accents that indicate pitch) while keeping the dots under letters like Ẹ, Ọ, and Ṣ intact. That way, the dictionary sorts correctly according to actual Yorùbá alphabetical rules.

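// Sort key: decompose (NFD), then drop only the tone marks: grave (U+0300),
// acute (U+0301), and macron (U+0304). The dot below (U+0323) is kept.
const sortNormalization = (text) => {
  return text
    .normalize('NFD')
    .replace(/[\u0300-\u0301\u0304]/g, "");
};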
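
One way to use a key like this is inside a comparator. The sketch below is an assumption about how the sorting could be wired up, not necessarily how the app does it: `toSortKey` and `compareYoruba` are hypothetical names, the lowercasing is an extra assumption, and the sample words are only for illustration.

```javascript
// Same step as sortNormalization above, plus lowercasing (an added assumption).
const toSortKey = (text) =>
  text.normalize("NFD").replace(/[\u0300\u0301\u0304]/g, "").toLowerCase();

// Compare two words by the plain code-unit order of their sort keys.
// After NFD, "ẹ" is "e" followed by U+0323, which is greater than any
// ASCII letter, so every ẹ-word lands after the e-words and before "f".
const compareYoruba = (a, b) => {
  const ka = toSortKey(a);
  const kb = toSortKey(b);
  return ka < kb ? -1 : ka > kb ? 1 : 0;
};

// Illustrative sample words, deliberately out of order.
const words = ["ẹ̀wà", "ewé", "fìlà", "ẹja", "èdè"];
const sorted = [...words].sort(compareYoruba);

console.log(sorted); // ["èdè", "ewé", "ẹja", "ẹ̀wà", "fìlà"]
```

This only covers the dotted letters. The Yorùbá alphabet also treats "gb" as its own letter after "g", which a plain string comparison like this one does not account for.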

What This Makes Possible
  • Anyone can search, regardless of their keyboard: You don't need a special Yorùbá keyboard app installed. Type it the way your keyboard allows, and the dictionary will meet you there.
  • The language stays accurate in the database: The dots and accents are preserved exactly as they should be. The app just handles the translation between what users type and what's stored.
  • This isn't just a Yorùbá solution: The same approach can work for Igbo, Efik, and other tonal African languages facing the same digital barriers.

Why This Matters Beyond the Code
This isn't really a story about regular expressions or character encoding. It's about making sure our languages survive the digital age.
When tools aren't built to handle Yorùbá properly, speakers quietly stop using the language in digital spaces. Dictionaries go unbuilt. Resources go uncreated. And slowly, something irreplaceable shrinks.
Building this isn't just about writing code; it's how we make sure our mother tongues don't get left behind.
