Most dictionary popups inside language-learning tools try to do too much at once. Click a word and you get the translation, three definitions, four example sentences, frequency rank, etymology, conjugation table, audio button, save button, and a related-words list. All at once, all on the same panel.
It looks generous. It actually breaks the moment of reading.
When I built the click-to-look-up popup for TubeVocab, I started with that maximalist design and slowly stripped it down. The version that actually retains learners is intentionally quiet: one translation, one short example, one save button. Everything else lives behind a single tap.
A popup is a reading interruption, not a reference page
When someone watches a YouTube video with subtitles and clicks an unknown word, they are not opening a dictionary. They are buying themselves a half second of clarification so they can keep watching.
If the popup contains a dense reference layout, three things happen, all of which break the loop:
- The video stays paused longer than the learner intended.
- The learner starts reading the popup instead of returning to the subtitle.
- The cognitive load of choosing what to read inside the popup competes with the cognitive load of the next subtitle line.
The popup is a tooltip, not a destination. Treating it like a destination is the most common mistake in this kind of interface.
Defaults should match the dominant intent
Among learners clicking an unknown word, the dominant intent is "what does this mean, roughly, so I can keep going". My rough estimate is that about 80 percent of clicks fit that pattern.
A smaller share genuinely wants to dive in: see all senses of the word, hear it pronounced multiple ways, check a usage table, save it with context, compare regional variants. That is real, but it is the minority intent.
A good popup serves the majority intent in zero clicks and the minority intent in one click. A bad popup tries to serve both at once and ends up serving neither.
What the quiet popup actually contains
The default state in TubeVocab shows a single line for the most common translation, a single line for a short usage example pulled from the subtitle context, and a small heart icon to save. Nothing else.
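As a rough illustration, the quiet state can be thought of as a small projection of the full dictionary entry. This is a minimal sketch, not TubeVocab's actual code: the `DictionaryEntry` and `QuietView` shapes, the `toQuietView` function, and the 60-character cap are all hypothetical, and it assumes translations arrive sorted with the most common one first.

```typescript
// Hypothetical shapes for illustration; not TubeVocab's real data model.
interface DictionaryEntry {
  word: string;
  translations: string[]; // assumed to be sorted, most common first
  examples: string[];
  audioUrl?: string;
}

interface QuietView {
  translation: string; // the single most common translation
  example: string;     // one short example taken from the subtitle line
}

// Arbitrary cap so the example stays on one line (an assumption).
const MAX_EXAMPLE_LENGTH = 60;

function toQuietView(entry: DictionaryEntry, subtitleLine: string): QuietView {
  // Only the first translation is shown; the rest waits in the expanded view.
  const translation = entry.translations[0] ?? "";

  // Use the subtitle line itself as the example, truncated if it runs long.
  const trimmed = subtitleLine.trim();
  const example =
    trimmed.length > MAX_EXAMPLE_LENGTH
      ? trimmed.slice(0, MAX_EXAMPLE_LENGTH - 1).trimEnd() + "…"
      : trimmed;

  return { translation, example };
}
```

Everything the entry carries beyond those two fields stays out of the default render, which is the whole point of the quiet state.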
If the learner wants more, a single tap expands the panel into the fuller reference view with all senses, all examples, the audio button, and the save-with-tags flow. Tapping outside collapses it back to quiet mode.
That structure rewards both modes of attention. A learner who clicks ten unknown words in a five-minute video gets ten near-instant peeks. A learner who pauses on one fascinating word can drill in without leaving the popup.
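The quiet and expanded behavior described above can be sketched as a tiny state transition function. The names `PopupMode`, `PopupEvent`, and `nextMode` are hypothetical, and one transition is my assumption rather than something stated here: that tapping outside a popup that is already quiet closes it.

```typescript
// Hypothetical names for illustration; not TubeVocab's real implementation.
type PopupMode = "closed" | "quiet" | "expanded";
type PopupEvent = "clickWord" | "tapExpand" | "tapOutside";

function nextMode(mode: PopupMode, event: PopupEvent): PopupMode {
  switch (event) {
    case "clickWord":
      // Every lookup starts in the quiet state, whatever was open before.
      return "quiet";
    case "tapExpand":
      // The fuller reference view is reachable only from the quiet state.
      return mode === "quiet" ? "expanded" : mode;
    case "tapOutside":
      // Expanded collapses back to quiet; closing from quiet is an assumption.
      return mode === "expanded" ? "quiet" : "closed";
  }
}
```

The majority path is a single `clickWord` event, and the fuller view costs exactly one more event, `tapExpand`.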
The quieter the default, the higher the volume
Counterintuitively, the quieter default produced higher save rates and longer session lengths. Two reasons.
First, learners click more unknown words because the cost of clicking is genuinely low. Volume goes up, which is exactly what a vocabulary tool needs to build up a meaningful flashcard collection.
Second, the few learners who care about deep reference do not feel locked out. The fuller view is still there, just intentionally one tap away.
A maximalist popup feels generous on the surface and exhausting in practice. A quiet popup feels minimal at first and rewards extended use.
The lesson
Reference data in a learning tool should be layered, not dumped. The first layer is the answer to the question the learner actually asked: what does this word mean. Every additional layer should require a deliberate gesture to unlock.
A vocabulary tool is not a dictionary. It is a reading aid. Designing the dictionary popup to behave more like a reading aid and less like a dictionary is what made TubeVocab finally feel calm to use, even when learners click dozens of words inside a single video.