Why I scoped TubeVocab to YouTube only and skipped TikTok and Bilibili

#sideprojects #startup #product #learning

The most common feedback on a YouTube-based vocabulary tool is some version of "this is great, can you add TikTok next?" Or Instagram Reels. Or Bilibili for Chinese learners. Or Netflix. Or any platform with subtitled video.

I considered all of them, and I still built TubeVocab as a YouTube-only product on purpose. The decision was less about engineering effort than about content quality and learner outcomes.

Captions are not a commodity

A vocabulary tool that mines comprehensible input from video stands or falls on caption quality. Without a clean caption track, every downstream feature degrades: click-to-flashcard, dictation, cloze tests, vocabulary in context.

YouTube's caption ecosystem has three things going for it that other platforms do not:

A massive base of human-edited captions from creators who care about accessibility and SEO.
Automatic captions that, while imperfect, are usable for popular speakers and standard accents.
A long-standing public API that lets a learning tool read the caption track without scraping fragile HTML.

TikTok captions are heavily auto-generated, often inaccurate, and frequently burned into the video as visual overlays rather than stored as text. Instagram Reels captions have the same problems. Netflix subtitles are high quality but locked behind DRM. Bilibili has good captions, but its user base is mostly Chinese-speaking and is not the core audience for an English-learning tool.

A vocabulary tool built on the wrong caption source is built on sand.

Watch time and content length matter

YouTube is unique among video platforms in the range of video lengths it supports. A learner can find a five-minute explainer or a ninety-minute documentary on essentially any topic, in any English accent.

Vocabulary acquisition through video requires extended exposure to coherent speech. A ten-second TikTok cannot give a learner the same kind of language exposure that a fifteen-minute YouTube video can. The information density per minute is similar, but the contextual scaffolding inside a longer video helps learners attach new words to surrounding ideas.

A tool that depends on long-form context simply works better on a platform that hosts long-form content. Scoping TubeVocab to YouTube made each saved word more useful, because each word came from a more complete contextual moment.

Focused scope beats feature breadth

There is a familiar pull in product design to support every adjacent platform "for completeness". The downside is that completeness drains energy from the depth of the experience on any single platform.

For TubeVocab, supporting YouTube alone means I can build features that exploit YouTube specifically: deep links to a specific second of a video, channel-aware vocabulary grouping, recommendations based on a learner's watch history, and transcript-level cloze tests. Each of those would have to be reinvented per platform if the tool tried to serve every video source at once.
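Once captions arrive with timestamps, two of the features above become small. Here is a minimal Python sketch that assumes a hypothetical `SavedWord` record holding the video ID, the start time of the caption cue, and the cue text. This is an illustration, not TubeVocab's actual schema or code. The deep link uses YouTube's `t` query parameter to set a start offset in seconds. The cloze blanks out the saved word inside its own cue.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class SavedWord:
    """Hypothetical record: a saved word plus the caption cue it came from."""
    word: str
    video_id: str
    cue_start: float  # seconds from the start of the video
    cue_text: str


def deep_link(item: SavedWord) -> str:
    """Watch URL that starts playback at the cue, via YouTube's `t` parameter (whole seconds)."""
    query = urlencode({"v": item.video_id, "t": max(0, int(item.cue_start))})
    return f"https://www.youtube.com/watch?{query}"


def cloze(item: SavedWord, blank: str = "_____") -> str:
    """Blank out whole-word, case-insensitive occurrences of the saved word in its cue."""
    pattern = re.compile(rf"\b{re.escape(item.word)}\b", re.IGNORECASE)
    # A function replacement inserts `blank` literally, with no backslash escapes.
    return pattern.sub(lambda _match: blank, item.cue_text)


if __name__ == "__main__":
    item = SavedWord("ubiquitous", "VIDEO_ID", 83.4, "Smartphones are ubiquitous now.")
    print(deep_link(item))  # https://www.youtube.com/watch?v=VIDEO_ID&t=83
    print(cloze(item))      # Smartphones are _____ now.
```

The cloze matches whole words only, so it does not cut into longer forms such as "ubiquitously".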

Depth on one platform beats shallow coverage of five.

The exception is the rule

Plenty of learners genuinely watch a mix of YouTube and other video sources. For those learners, TubeVocab covers only part of their habit. That is fine. A vocabulary tool does not need to capture every word the learner ever encounters. It needs to capture enough words, often enough, with enough context, to build a useful flashcard deck and a stable review habit.

For learners whose YouTube watching is a meaningful chunk of their English input, scoping to YouTube is plenty. For learners whose YouTube watching is incidental, a YouTube-based tool was never going to help much anyway.

The lesson

Picking the right single platform and building deeply on it is a more honest strategy than trying to be everywhere. It also keeps the development surface small enough that a tiny team can ship real improvements every week instead of patching integration breakage on five different APIs.

For a vocabulary tool, content quality and content length set the ceiling. YouTube wins on both, by a wide enough margin that adding TikTok or Bilibili would make a worse product, not a more inclusive one.