YouTube Has No Transcript API So I Built One (150 Users Later)

#javascript #webdev #api #youtube

You know what's wild? YouTube, a Google product, has no official API for pulling transcripts from arbitrary public videos. You can upload, search, and manage playlists through their API. But if you want the actual words spoken in someone else's video? Good luck.

I ran into this wall in late 2025 while building a content repurposing tool. I needed transcripts from YouTube videos to feed into an LLM for summarization. The YouTube Data API v3 gives you metadata, thumbnails, and view counts. But transcripts for videos you don't own? Nope. Its caption download endpoint needs OAuth and edit permission on the video.

So I built my own.

What It Actually Does

The actor loads a YouTube video page, grabs the captions attached to it (usually the auto-generated ones that YouTube creates for most videos), and returns clean text with timestamps. It supports multiple languages: you pick from whichever caption tracks are available for that video.

Here's what the output looks like:

{
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Example Video Title",
  "language": "en",
  "transcript": [
    { "text": "Welcome to this tutorial", "start": 0.0, "duration": 2.5 },
    { "text": "Today we are going to cover", "start": 2.5, "duration": 3.1 }
  ]
}
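
If you are feeding this into an LLM, you will probably want plain text or timestamped lines rather than raw segments. A minimal sketch of both, assuming the output shape shown above (the helper names are made up for this example and are not part of the actor):

```javascript
// Format seconds as m:ss, or h:mm:ss for videos longer than an hour.
function formatTimestamp(seconds) {
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = total % 60;
  const ss = String(s).padStart(2, "0");
  if (h > 0) {
    return `${h}:${String(m).padStart(2, "0")}:${ss}`;
  }
  return `${m}:${ss}`;
}

// Join every segment into one block of text, e.g. for a summarization prompt.
function toPlainText(result) {
  return result.transcript.map((seg) => seg.text.trim()).join(" ");
}

// One line per segment, prefixed with its start time.
function toTimestampedLines(result) {
  return result.transcript.map(
    (seg) => `[${formatTimestamp(seg.start)}] ${seg.text.trim()}`
  );
}

// Sample copied from the output above.
const sample = {
  videoUrl: "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  title: "Example Video Title",
  language: "en",
  transcript: [
    { text: "Welcome to this tutorial", start: 0.0, duration: 2.5 },
    { text: "Today we are going to cover", start: 2.5, duration: 3.1 },
  ],
};

console.log(toPlainText(sample));
console.log(toTimestampedLines(sample).join("\n"));
```

Keeping the segments separate until the last step means you can still link a summary sentence back to a timestamp later.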

No API key needed. No OAuth flows. Just pass in a video URL and get the transcript back.

The Numbers After 8 Months

I published this on the Apify Store and kind of forgot about it for a while. Then I checked the dashboard:

154 users have tried it
1,737 total runs across all users
It's one of my most popular actors out of 40+

The thing that surprised me was who's using it. I expected developers. And yes, developers building AI pipelines are a big chunk. But I also see researchers pulling transcripts from lecture series, content creators repurposing their own videos into blog posts, and marketing teams analyzing competitor video content.

The Hard Parts

YouTube does not make this easy. Captions are loaded dynamically through a separate request after the page renders. The URL for the caption track is embedded inside a massive JSON blob in the page source. Finding and parsing that reliably took more debugging than the actual extraction logic.
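
To give a rough idea of that parsing step, here is a minimal sketch. It is not the actor's actual code. It assumes the blob is assigned to a variable named `ytInitialPlayerResponse` and that caption tracks sit under `captions.playerCaptionsTracklistRenderer.captionTracks`, which is how watch pages have commonly looked. None of that is documented, so treat both names as assumptions that can change without notice.

```javascript
// Return the balanced {...} object starting at `start`, or null if it never closes.
// Braces inside JSON strings are skipped so they don't confuse the depth count.
function sliceBalancedJson(source, start) {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return source.slice(start, i + 1);
    }
  }
  return null;
}

// Pull the caption track list out of a watch page's HTML.
// ASSUMPTION: marker name and field path are undocumented and may change.
function extractCaptionTracks(html) {
  const markerAt = html.indexOf("ytInitialPlayerResponse");
  if (markerAt === -1) return [];
  const braceAt = html.indexOf("{", markerAt);
  if (braceAt === -1) return [];
  const raw = sliceBalancedJson(html, braceAt);
  if (raw === null) return [];
  let player;
  try {
    player = JSON.parse(raw);
  } catch {
    return []; // Not valid JSON, so the page layout probably changed.
  }
  const tracks =
    player?.captions?.playerCaptionsTracklistRenderer?.captionTracks;
  return Array.isArray(tracks) ? tracks : [];
}
```

Scanning for the matching brace is more forgiving than a regex here, because the blob contains nested objects and strings that themselves include `}` characters.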

The other challenge: some videos have manually uploaded captions, some have auto-generated ones, and some have both. The actor handles all three cases and lets you pick which language you want.
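
One way to handle those three cases is a small selection function. The sketch below assumes each track is an object with a `languageCode` and an optional `kind` that equals `"asr"` for auto-generated captions. That field convention is an assumption about YouTube's undocumented page data, and `pickCaptionTrack` is a hypothetical helper, not the actor's real logic.

```javascript
// Choose one caption track for the requested language.
// ASSUMPTION: auto-generated tracks carry kind === "asr"; manual tracks do not.
function pickCaptionTrack(tracks, language = "en", preferManual = true) {
  const matches = tracks.filter((t) => t.languageCode === language);
  if (matches.length === 0) return null; // No captions in that language.

  const manual = matches.find((t) => t.kind !== "asr");
  const auto = matches.find((t) => t.kind === "asr");

  // Fall back to the other kind when the preferred one is missing.
  return preferManual ? manual ?? auto : auto ?? manual;
}
```

Preferring manual captions by default is a judgment call: they tend to be cleaner, but the auto-generated track is often the only one available.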

Rate limiting is real too. YouTube will throttle you if you hammer it. The actor spaces out requests and uses session management to stay under the radar.
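
Spacing out requests can be as simple as a fixed gap between videos plus exponential backoff on failure. Here is a minimal sketch of that idea. The delay values and function names are illustrative assumptions, and it leaves out the session and proxy handling the actor relies on.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `task`, retrying with exponential backoff (base, 2x base, 4x base, ...).
async function withRetry(
  task,
  { retries = 3, baseDelayMs = 1000, wait = sleep } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= retries) throw err; // Out of retries, give up.
      await wait(baseDelayMs * 2 ** attempt);
    }
  }
}

// Process items one at a time with a pause between them.
async function processSequentially(
  items,
  worker,
  { gapMs = 1500, wait = sleep, ...retryOptions } = {}
) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    if (i > 0) await wait(gapMs); // No pause before the first item.
    results.push(
      await withRetry(() => worker(items[i]), { ...retryOptions, wait })
    );
  }
  return results;
}
```

The `wait` option exists so the timing can be swapped out in tests. In normal use you would call `processSequentially(videoUrls, fetchTranscript)`, where `fetchTranscript` is whatever function fetches a single video.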

Why Not Just Use a Python Library?

There are Python packages like youtube_transcript_api that do something similar. They work fine for one-off scripts. But when you need to run this at scale, on a schedule, with proxy rotation and automatic retries, you want infrastructure around it.

That's what Apify gives you. The actor runs in the cloud, handles failures gracefully, and stores results in a dataset you can export to JSON, CSV, or push to a webhook.

Try It

The actor itself is free on Apify. You only pay for the compute it uses, which is pennies per video: YouTube Transcript Extractor

If you are building anything that needs video content as text, save yourself the headache of reverse-engineering YouTube's caption system. Someone already did that part for you.

Built in Nairobi by George. 40+ actors on the Apify Store, 154 users on this one alone.