TLDR: The default 10,000 unit/day quota will burn through in fewer than 30 naive user sessions. Three tricks pulled my per-user cost down ~50× and let me ship TubeVocab on the free tier.
When I started building TubeVocab — an ESL learning tool that turns any YouTube video into a clickable, interactive transcript for vocabulary learning — I assumed the YouTube Data API v3 would be the cheap, easy part. "It's Google. It scales. The free tier is generous." That kind of gut feeling.
I was wrong. The free tier is generous, but only if you understand how quota math actually works. Most public tutorials skip this. Here's what I learned the hard way.
The quota arithmetic nobody puts in the quickstart
Default daily quota: 10,000 units. Sounds like a lot.
Then you start reading the cost table and realize:
- `search.list` — 100 units per call. That's how you find a video by query.
- `videos.list` — 1 unit per call. That's how you fetch metadata once you have an ID.
- `captions.list` — 50 units. Lists the caption tracks available for a video.
- `captions.download` — 200 units. The actual subtitle data.
If your user-facing flow is "search a YouTube channel → pick a video → load subtitles → render the interactive player," you're looking at roughly 100 + 1 + 50 + 200 = 351 units per single user session. The 10,000 free units evaporate in 28 sessions/day.
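The arithmetic, spelled out (unit costs from the table above):

```javascript
// Per-call quota costs from the YouTube Data API v3 cost table
const COST = { searchList: 100, videosList: 1, captionsList: 50, captionsDownload: 200 };

// Naive "search → metadata → list captions → download captions" session
const perSession =
  COST.searchList + COST.videosList + COST.captionsList + COST.captionsDownload;

const DAILY_QUOTA = 10_000;
const sessionsPerDay = Math.floor(DAILY_QUOTA / perSession);

console.log(perSession);     // 351 units per session
console.log(sessionsPerDay); // 28 sessions before the quota runs out
```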
That's not a side project. That's a 30-DAU launch and you're paying for quota expansion the next morning.
Three tricks that cut my per-user cost ~50×
1. Don't use search.list for known IDs
This sounds obvious in hindsight but it took me a week to see. If a user pastes a YouTube URL, the video ID is right there in the URL. Parse it. Skip search.list entirely.
```javascript
// Bad: 100 units per pasted URL
const searchResult = await youtube.search.list({ q: pastedUrl, type: 'video', part: 'snippet' });

// Good: 0 units — regex the video ID straight out of the URL
const id = pastedUrl.match(/(?:v=|youtu\.be\/)([\w-]{11})/)?.[1];
const videoResult = await youtube.videos.list({ id, part: 'snippet,contentDetails' }); // 1 unit
```
This one change took the average pasted-URL flow from 351 units → 251 units.
2. Skip the official captions.* endpoints entirely
The captions.download endpoint costs 200 units per video AND requires OAuth (the user has to be the video owner). For non-owner subtitle access — i.e. the actual ESL use case — you need a different path.
The trick: YouTube serves the auto-generated and uploader-provided subtitles through an undocumented but stable XML endpoint that doesn't count against your quota at all. You can get the timed transcript via https://video.google.com/timedtext?lang=en&v=VIDEO_ID, parse the XML, and you're done. 0 quota units.
(Caveat: this endpoint is undocumented, so it can break. I have a fallback path that uses youtube-transcript-api style scraping. The combined approach gets ~95% subtitle hit rate without touching the official caption quota.)
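The parsing side is tiny. Here's a sketch of the zero-quota path — `parseTimedText` and `fetchTranscript` are names I'm using for illustration, and the XML shape (`<text start="…" dur="…">`) is what the endpoint has historically returned, not anything documented, so treat it as fragile:

```javascript
// Parse the timedtext XML into { start, dur, text } cues.
// Shape assumed: <text start="0.5" dur="2.1">escaped cue text</text>
function parseTimedText(xml) {
  const cues = [];
  const re = /<text start="([\d.]+)"(?: dur="([\d.]+)")?[^>]*>([\s\S]*?)<\/text>/g;
  let m;
  while ((m = re.exec(xml)) !== null) {
    cues.push({
      start: parseFloat(m[1]),
      dur: m[2] ? parseFloat(m[2]) : 0,
      // Un-escape the XML entities in cue text (&amp; last, so
      // double-escaped sequences like &amp;lt; aren't over-decoded)
      text: m[3]
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&#39;/g, "'")
        .replace(/&quot;/g, '"')
        .replace(/&amp;/g, '&'),
    });
  }
  return cues;
}

async function fetchTranscript(videoId, lang = 'en') {
  const res = await fetch(`https://video.google.com/timedtext?lang=${lang}&v=${videoId}`);
  const xml = await res.text();
  return parseTimedText(xml); // empty array ⇒ fall through to the scraping path
}
```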
After this, my "load subtitles" cost dropped from 250 → 1 unit per session.
3. Cache aggressively at the video-ID level
Every time someone watches a video on TubeVocab, the metadata + subtitle + thumbnail set is the same until the video itself changes. I run a per-video-ID cache (just SQLite — overkill is fine) with no expiry. Subsequent views of the same video cost zero quota, regardless of how many users watch it.
Once I had ~500 popular videos cached, my marginal cost per session was effectively zero. The quota is now spent only on first-time-seen videos.
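The cache itself is nothing clever — here's its shape, sketched with an in-memory `Map` for brevity (the real thing is the same keying logic over SQLite, and `fetcher` stands in for whatever quota-costing API calls you wrap):

```javascript
// Per-video-ID cache with no expiry: the first request pays quota,
// every later request — from any user — is free.
const cache = new Map();

async function getVideoData(videoId, fetcher) {
  if (cache.has(videoId)) return cache.get(videoId); // 0 quota units
  const data = await fetcher(videoId); // pays quota once per video, ever
  cache.set(videoId, data);
  return data;
}
```

The "no expiry" part is a deliberate design choice: video metadata and subtitles almost never change after upload, so staleness is a non-problem compared to quota.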
What actually shipped
After these three optimizations:
- Average new-video session: ~2 units (videos.list + occasional fallback)
- Average cached-video session: 0 units
- Daily ceiling on the free tier: ~5,000 unique new videos/day before I'd need to start budgeting
That's enough headroom for the foreseeable lifetime of a side project.
If you're building anything in the YouTube + content-analysis space — vocabulary tools, accessibility, search, analytics — the playbook is roughly: assume search.list is poison, route around captions.*, and cache by video ID forever. The free tier becomes more than generous once you stop fighting it.
For context: I built TubeVocab using exactly this stack — it's a click-to-flashcard ESL tool that turns any YouTube video into vocabulary practice. The quota math was the single most underestimated technical risk of the whole project. Hope this saves someone a week.